Naive Bayes

In this project you will implement the Naive Bayes algorithm.

Step 1: Implement Naive Bayes

Implement Naive Bayes as discussed in class and in the textbook. You only need to handle categorical features (both descriptive and target), such as those in the golf dataset.
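
As a rough illustration of the overall structure (not a required design), the sketch below counts class and feature-value frequencies at training time and predicts by multiplying the class prior with the per-feature likelihoods. The class name CategoricalNB and its fit/predict methods are my own choices, and Python is used only because the handout does not fix a language (the example run uses an .exe, so your actual submission may well be in C# or something else). Smoothing is deliberately left out here; it is added in Step 2.

from collections import Counter, defaultdict

class CategoricalNB:
    """Naive Bayes for categorical features (unsmoothed; see Step 2)."""

    def fit(self, rows, labels):
        # rows: list of feature-value lists; labels: list of target values
        self.n = len(labels)
        self.class_counts = Counter(labels)
        # counts[c][j][v] = number of class-c examples whose feature j has value v
        self.counts = {c: defaultdict(Counter) for c in self.class_counts}
        for row, c in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
        return self

    def predict(self, row):
        best_class, best_prob = None, -1.0
        for c, nc in self.class_counts.items():
            p = nc / self.n                      # prior Pr[c]
            for j, v in enumerate(row):
                p *= self.counts[c][j][v] / nc   # estimate of Pr[feature j = v | c]
            if p > best_prob:
                best_class, best_prob = c, p
        return best_class

Note that a feature value never seen with a given class makes the whole product zero, which is exactly the problem Step 2 addresses.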

Step 2: Equivalent sample size / Laplace Smoothing

To deal with the potential problem of probability estimates equal to zero, use an additive term as discussed in class (the textbook calls this Laplace smoothing). The equivalent sample size should be small. For example, instead of using 3/20 as the estimate for Pr[Temperature = cold | PlayGolf = "Yes"], you would use (3 + α)/(20 + 4α), where α is a small constant that you choose (perhaps α = 1) and 4 is the number of possible values of Temperature.
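
For concreteness, here is a small hedged helper (the name smoothed_likelihood and the default α = 1 are illustrative, not prescribed) that computes the smoothed estimate and reproduces the example above:

def smoothed_likelihood(count_vc, count_c, num_values, alpha=1.0):
    # (count of value within class + alpha) / (class count + alpha * number of values)
    return (count_vc + alpha) / (count_c + num_values * alpha)

# Example above: 3 of 20 PlayGolf = "Yes" days are cold and Temperature has
# 4 possible values, so with alpha = 1 the estimate is (3 + 1)/(20 + 4) = 1/6.
print(smoothed_likelihood(3, 20, 4, alpha=1.0))   # 0.1666...

In the classifier from Step 1, this replaces the raw ratio used for Pr[feature j = v | c], so unseen feature values no longer force the product to zero.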

Step 3: Main

Your main program should be able to run your Naive Bayes algorithm on any given .arff dataset, provided that all features (including the target feature) are categorical. You should be able to run your program in a manner similar to this:

Assignment8.exe golfTrain.arff golfTest.arff

The program then trains on golfTrain.arff, tests on golfTest.arff, and outputs its accuracy on the test set.
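
One possible shape for the driver, again sketched in Python under stated assumptions: load_arff is a deliberately minimal reader that ignores everything before @data and assumes well-formed, comma-separated categorical values with no quoting or missing entries, and CategoricalNB refers to the Step 1 sketch (with the Step 2 smoothing folded in for a real submission).

import sys

def load_arff(path):
    # Minimal ARFF reader: skip comments and header lines, then treat each line
    # after @data as comma-separated values; the last column is the target.
    rows, labels, in_data = [], [], False
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("%"):
                continue
            if line.lower().startswith("@data"):
                in_data = True
            elif in_data:
                values = [v.strip() for v in line.split(",")]
                rows.append(values[:-1])
                labels.append(values[-1])
    return rows, labels

def main():
    train_path, test_path = sys.argv[1], sys.argv[2]
    rows, labels = load_arff(train_path)
    model = CategoricalNB().fit(rows, labels)   # Step 1 sketch; add Step 2 smoothing
    test_rows, test_labels = load_arff(test_path)
    correct = sum(model.predict(r) == y for r, y in zip(test_rows, test_labels))
    print(f"Test accuracy: {correct / len(test_labels):.3f}")

if __name__ == "__main__":
    main()

Such a script would be run analogously to the example above, e.g. python assignment8.py golfTrain.arff golfTest.arff.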

Step 4: Bonus features

You will get bonus credit for implementing neat extras. The following are some suggestions.

Step 5: Handing it in

In Educat, hand in the following: