K-Nearest Neighbors

In this project you will implement the "K-Nearest Neighbors" algorithm.

Step 1: Implement K-Nearest Neighbors

Implement K-Nearest Neighbors using Euclidean distance, as discussed in class and in the textbook. Code some reasonable way of dealing with categorical (discrete-valued) features. If the target feature for a given dataset is categorical, use the mode of the K nearest neighbors to predict the value of a new instance; if the target feature is continuous, use the mean of the K nearest neighbors. You should be able to run your classifier with any positive integer for the parameter K (note that if K meets or exceeds the number of training instances, your algorithm becomes a baseline learner: it always just guesses the mean/mode of the whole training set). A sketch of this prediction routine is given below.
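
To make the idea concrete, here is a minimal sketch of the prediction routine in Python, purely for illustration (the names knn_predict and distance are hypothetical, not part of the assignment). It assumes each instance is a list of feature values and handles categorical features with a simple overlap distance: 0 if two values match, 1 otherwise.

    import math
    from collections import Counter

    def distance(a, b, is_categorical):
        # Squared differences for numeric features; overlap distance
        # (0 if equal, 1 otherwise) for categorical features.
        total = 0.0
        for x, y, cat in zip(a, b, is_categorical):
            if cat:
                total += 0.0 if x == y else 1.0
            else:
                total += (x - y) ** 2
        return math.sqrt(total)

    def knn_predict(train_X, train_y, query, k, is_categorical, target_is_categorical):
        # Sort training instances by distance to the query and keep the k nearest.
        # If k >= len(train_X), every instance is a "neighbor", so the prediction
        # collapses to the mode/mean of the whole training set (a baseline learner).
        order = sorted(range(len(train_X)),
                       key=lambda i: distance(train_X[i], query, is_categorical))
        nearest = [train_y[i] for i in order[:k]]
        if target_is_categorical:
            return Counter(nearest).most_common(1)[0][0]  # mode of the k neighbors
        return sum(nearest) / len(nearest)                # mean of the k neighbors

Note how the baseline-learner behavior falls out for free: when K meets or exceeds the training-set size, the "K nearest" neighbors are simply the entire training set.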

Step 2: Implement your Main Program

Your main program should be able to run your K-Nearest Neighbors algorithm on any given .arff dataset and any positive integer K, provided that all features are either categorical or numeric. You should be able to run your program in a manner similar to this (e.g., for K=5):

Assignment6.exe lakesTrain.arff lakesTest.arff 5

The program would then train on lakesTrain.arff and test on lakesTest.arff, and output accuracy on the test set (since the lakes dataset has a categorical target feature). If the target feature is continuous, your program should instead output root mean squared error (RMSE), i.e., the square root of the mean squared difference between predicted and actual values.
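
A sketch of this overall flow follows, again in illustrative Python. Here load_arff is a hypothetical placeholder for whatever ARFF reader you write; it is assumed to return feature rows, target values, per-feature categorical flags, and a flag for the target type. knn_predict refers to the sketch under Step 1.

    import math
    import sys

    def evaluate(test_X, test_y, predict, target_is_categorical):
        # Accuracy for a categorical target; RMSE = sqrt(mean((pred - actual)^2)) otherwise.
        preds = [predict(x) for x in test_X]
        if target_is_categorical:
            return sum(1 for p, y in zip(preds, test_y) if p == y) / len(test_y)
        return math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, test_y)) / len(test_y))

    if __name__ == "__main__":
        train_file, test_file, k = sys.argv[1], sys.argv[2], int(sys.argv[3])
        # load_arff is a placeholder for your own ARFF parser.
        train_X, train_y, is_cat, target_is_cat = load_arff(train_file)
        test_X, test_y, _, _ = load_arff(test_file)
        predict = lambda x: knn_predict(train_X, train_y, x, k, is_cat, target_is_cat)
        metric = "accuracy" if target_is_cat else "RMSE"
        print(metric, evaluate(test_X, test_y, predict, target_is_cat))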

Don't forget to check back on our CS 495 datasets; more will be added soon.

Step 3: Bonus features

You will get bonus credit for implementing neat extras. The following are some suggestions.

Step 4: Handing it in

In Educat, hand in the following: