K-Nearest Neighbors

In this project you will implement the "K-Nearest Neighbors" algorithm.

Step 1: Implement K-Nearest Neighbors

Implement K-Nearest Neighbors using Euclidean distance, as discussed in class and in the textbook. Code some reasonable way of dealing with categorical (discrete-valued) features. If the target feature for a given dataset is categorical, use the mode of the K nearest neighbors to predict the value of a new instance; if the target feature is continuous, use the mean of the K nearest neighbors. You should be able to run your classifier with any positive integer for the parameter K (note that if K meets or exceeds the number of training instances, your algorithm becomes a baseline learner: it always just guesses the mean/mode of the whole training set). A sketch of this prediction routine is given below.
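
To make the idea concrete, here is a minimal sketch of the prediction routine in Python, purely for illustration (the names knn_predict and distance are hypothetical, not part of the assignment). It assumes each instance is a list of feature values and handles categorical features with a simple overlap distance: 0 if two values match, 1 otherwise.

    import math
    from collections import Counter

    def distance(a, b, is_categorical):
        # Squared differences for numeric features; overlap distance
        # (0 if equal, 1 otherwise) for categorical features.
        total = 0.0
        for x, y, cat in zip(a, b, is_categorical):
            if cat:
                total += 0.0 if x == y else 1.0
            else:
                total += (x - y) ** 2
        return math.sqrt(total)

    def knn_predict(train_X, train_y, query, k, is_categorical, target_is_categorical):
        # Sort training instances by distance to the query and keep the k nearest.
        # If k >= len(train_X), every instance is a "neighbor", so the prediction
        # collapses to the mode/mean of the whole training set (a baseline learner).
        order = sorted(range(len(train_X)),
                       key=lambda i: distance(train_X[i], query, is_categorical))
        nearest = [train_y[i] for i in order[:k]]
        if target_is_categorical:
            return Counter(nearest).most_common(1)[0][0]  # mode of the k neighbors
        return sum(nearest) / len(nearest)                # mean of the k neighbors

Note how the baseline-learner behavior falls out for free: when K meets or exceeds the training-set size, the "K nearest" neighbors are simply the entire training set.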

Step 2: Implement your Main Program

Your main program should be able to run your K-Nearest Neighbors algorithm on any given .arff dataset and any positive integer K, provided that all features are either categorical or numeric. You should be able to run your program in a manner similar to this (e.g., for K=5):

Assignment6.exe lakesTrain.arff lakesTest.arff 5

The program would then train on lakesTrain.arff and test on lakesTest.arff, and output accuracy on the test set (since the lakes dataset has a categorical target feature). If the target feature is continuous, your program should instead output root mean squared error (RMSE), i.e., the square root of the mean squared difference between predicted and actual values.
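
A sketch of this overall flow follows, again in illustrative Python. Here load_arff is a hypothetical placeholder for whatever ARFF reader you write; it is assumed to return feature rows, target values, per-feature categorical flags, and a flag for the target type. knn_predict refers to the sketch under Step 1.

    import math
    import sys

    def evaluate(test_X, test_y, predict, target_is_categorical):
        # Accuracy for a categorical target; RMSE = sqrt(mean((pred - actual)^2)) otherwise.
        preds = [predict(x) for x in test_X]
        if target_is_categorical:
            return sum(1 for p, y in zip(preds, test_y) if p == y) / len(test_y)
        return math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, test_y)) / len(test_y))

    if __name__ == "__main__":
        train_file, test_file, k = sys.argv[1], sys.argv[2], int(sys.argv[3])
        # load_arff is a placeholder for your own ARFF parser.
        train_X, train_y, is_cat, target_is_cat = load_arff(train_file)
        test_X, test_y, _, _ = load_arff(test_file)
        predict = lambda x: knn_predict(train_X, train_y, x, k, is_cat, target_is_cat)
        metric = "accuracy" if target_is_cat else "RMSE"
        print(metric, evaluate(test_X, test_y, predict, target_is_cat))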

Don't forget to check back on our CS 495 datasets; more will be added soon.

Step 3: Bonus features

You will get bonus credit for implementing neat extras. The following are some suggestions.

Step 4: Handing it in

In Educat, hand in the following: