Assignment 2: Create-your-own learning task

In this project you will create a dataset to be used by the rest of the class in later machine learning projects. Build a dataset and convert it into the format given below.

Creating the dataset file

You can get an idea of what kinds of ML datasets exist by browsing the UC-Irvine archive (but don't use any of these for your dataset). There are endless data sources on the web. Some possibilities are: build up your own dataset (like we did with the images dataset in class), collect your own data, key in existing data, or find an existing database and trim it down to a reasonable set of related features. Take a look at the lakes dataset for an example. Wherever you collect your data from, you will need to reformulate it so that:

Include comments in the .arff file indicating the original source of the data and what you did to modify it into its current form. If you write a script or program to transform the data, include it when you hand in the assignment. Also make sure that the concept is neither too simple (such as a perfect correspondence between one of the descriptive features and the target feature) nor too hard (no relationship at all between the descriptive features and the target feature).
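
As a rough illustration of the two points above, here is a minimal Python sketch, not a required implementation. It assumes the standard Weka-style ARFF layout (lines starting with % are comments, followed by @relation, @attribute, and @data); check it against the format required for this assignment. The function names, the relation name toy_weather, and the toy rows are all made up for the example. The first function writes the comment header and data; the second measures how well a single nominal feature predicts the target on its own, so a value of 1.0 flags the "too simple" case.

```python
import io
from collections import Counter, defaultdict


def write_arff(out, relation, attributes, rows, comments):
    """Write rows to `out` as ARFF text, preceded by `%` comment lines.

    attributes: list of (name, kind), where kind is the string "numeric"
    or a list of nominal values. Values containing spaces or commas would
    need quoting, which this sketch does not handle.
    """
    for line in comments:
        out.write(f"% {line}\n")
    out.write(f"@relation {relation}\n\n")
    for name, kind in attributes:
        spec = kind if isinstance(kind, str) else "{" + ",".join(kind) + "}"
        out.write(f"@attribute {name} {spec}\n")
    out.write("\n@data\n")
    for row in rows:
        out.write(",".join(str(v) for v in row) + "\n")


def single_feature_accuracy(rows, feature_idx, target_idx=-1):
    """Fraction of rows predicted correctly when each value of one feature
    is mapped to its most common target value. 1.0 means that feature
    alone determines the target in this data."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[feature_idx]][row[target_idx]] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return correct / len(rows)


# Made-up toy data: (outlook, humidity, play). Not a real dataset.
rows = [
    ("sunny", "high", "no"),
    ("sunny", "low", "yes"),
    ("rainy", "high", "no"),
    ("rainy", "low", "yes"),
    ("sunny", "high", "no"),
]
attributes = [
    ("outlook", ["sunny", "rainy"]),
    ("humidity", ["high", "low"]),
    ("play", ["yes", "no"]),
]
comments = [
    "Source: (state where the data originally came from)",
    "Changes: (state what you did to get it into this form)",
]

buf = io.StringIO()
write_arff(buf, "toy_weather", attributes, rows, comments)
print(buf.getvalue())

for idx, (name, _) in enumerate(attributes[:-1]):
    print(name, single_feature_accuracy(rows, idx))
```

In this toy data, humidity alone determines the target, so a dataset like it would be too simple. The check only makes sense for nominal features: a numeric feature whose values are all distinct will always score 1.0, so bin it first or inspect it another way. A low score for every single feature does not prove the concept is learnable either; it only rules out the trivial case.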