Project description
The main aim of the project is to determine which of the ML methods discussed best classifies unseen data for a given dataset.
The tasks are as follows:
- Describe the dataset.
- Choose at least two methods from each of the statistical and rule-based groups, and one neural network algorithm.
| Statistical | Rule-based | ANN |
| Discriminants | ID3, C4.5, C5 | Perceptrons |
| k-nearest neighbor | CART | Radial basis |
| Naive Bayes | Bayes tree | DIPOL92 |
| Causal networks | Rough sets | |
- Describe the chosen algorithms, and discuss their advantages and disadvantages.
- Develop a setup for testing the prediction quality of each method. This includes software selection and a brief description of the software.
- Validate your results by using the validation methods given below. Describe these methods and their pros and cons.
| Validation methods |
| Jack-knife |
| Cross validation |
| Bootstrap |
- Compare the performance of the algorithms in terms of prediction quality, complexity, storage costs, etc.
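As an illustration of the validation step, the following is a minimal sketch of how the three validation methods might be set up around a single classifier. Everything in it is an assumption made for illustration and is not part of the project brief: the function names, the toy two-class dataset, the choice of a 1-nearest-neighbor classifier, and the number of folds and bootstrap rounds. The jack-knife is treated here as leave-one-out, which is one common reading in classifier evaluation, and the bootstrap estimate is taken on the out-of-bag items only.

```python
import random


def k_fold_splits(n, k, rng):
    """Yield (train, test) index lists for k-fold cross validation."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]


def leave_one_out_splits(n):
    """Yield (train, test) index lists, holding out one item at a time.

    Assumption: the jack-knife is read as leave-one-out here.
    """
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


def bootstrap_splits(n, rounds, rng):
    """Yield (train, test): train drawn with replacement, test out-of-bag."""
    for _ in range(rounds):
        train = [rng.randrange(n) for _ in range(n)]
        chosen = set(train)
        test = [j for j in range(n) if j not in chosen]
        if test:  # skip the rare round in which every item was drawn
            yield train, test


def nearest_neighbor_predict(train_x, train_y, x):
    """Predict the label of x with 1-NN under squared Euclidean distance."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_y[best]


def estimate_accuracy(xs, ys, splits):
    """Return the fraction of held-out items classified correctly."""
    correct = total = 0
    for train, test in splits:
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for i in test:
            correct += nearest_neighbor_predict(train_x, train_y, xs[i]) == ys[i]
            total += 1
    return correct / total


def make_toy_data(n_per_class, rng):
    """Build a hypothetical two-class dataset of 2-D Gaussian points."""
    xs, ys = [], []
    for label, (cx, cy) in ((0, (0.0, 0.0)), (1, (3.0, 3.0))):
        for _ in range(n_per_class):
            xs.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            ys.append(label)
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs are comparable
    xs, ys = make_toy_data(30, rng)
    n = len(xs)
    print("10-fold CV   :", estimate_accuracy(xs, ys, k_fold_splits(n, 10, rng)))
    print("Leave-one-out:", estimate_accuracy(xs, ys, leave_one_out_splits(n)))
    print("Bootstrap    :", estimate_accuracy(xs, ys, bootstrap_splits(n, 50, rng)))
```

Because the split generators only produce index lists, the same three generators can be reused for every chosen algorithm by swapping out the prediction function, which keeps the comparison between methods on equal footing.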
Ivo Duentsch
2009-09-10