ML-Evaluation of Classifiers
Evaluation Criteria
Predictive accuracy: the fraction of test cases the classifier labels correctly.
Efficiency
- Time to construct the model
- Time to use the model
Robustness: handling noise and missing values
Scalability: efficiency on disk-resident databases
Interpretability: understandability of, and insight provided by, the model
Compactness of the model: size of the tree, or the number of rules.
**Precision and recall measures**
Precision p = TP / (TP + FP): of the cases classified positive, the fraction that are actually positive.
Recall r = TP / (TP + FN): of the actual positive cases, the fraction that are classified positive.
Confusion Matrix
| | Classified Positive | Classified Negative |
|---|---|---|
| Actual Positive | TP | FN |
| Actual Negative | FP | TN |
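The counts in the matrix translate directly into these measures. A minimal sketch, using made-up illustrative counts:

```python
# Compute accuracy, precision, and recall from confusion-matrix counts.
# The counts passed in below are hypothetical, purely for illustration.
def evaluate(tp, fn, fp, tn):
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)  # of those classified positive, how many truly are
    recall = tp / (tp + fn)     # of the actual positives, how many were found
    return accuracy, precision, recall

acc, p, r = evaluate(tp=40, fn=10, fp=5, tn=45)
print(acc, p, r)  # 0.85, 40/45 ≈ 0.889, 0.8
```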
ROC Curve
True positive rate: TPR = TP / (TP + FN)
False positive rate: FPR = FP / (FP + TN) = 1 − True Negative Rate
How to compare 2 curves? Compute the area under the curve (AUC)
If the AUC of classifier A is larger than that of classifier B, A performs better on average; AUC = 1 means a perfect ranking, while AUC = 0.5 is no better than random guessing.
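A minimal sketch of building an ROC curve and computing its AUC from scored test examples. The scores and labels are hypothetical, and tied scores are handled naively here (each example is stepped through individually):

```python
# Build the (FPR, TPR) points of an ROC curve by ranking examples on
# classifier score, then compute the AUC by the trapezoidal rule.
def roc_auc(scores, labels):
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])  # high score first
    P = sum(labels)            # number of actual positives
    N = len(labels) - P        # number of actual negatives
    tp = fp = 0
    points = [(0.0, 0.0)]      # (FPR, TPR), starting at the origin
    for _, y in pairs:
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / N, tp / P))
    # Area under the curve via trapezoids between consecutive points.
    auc = sum((x2 - x1) * (y1 + y2) / 2
              for (x1, y1), (x2, y2) in zip(points, points[1:]))
    return points, auc

points, auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
print(auc)  # 0.75 for this toy ranking
```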
Evaluation Methods
Holdout set: The available data set D is divided into two disjoint subsets,
- the training set (for learning a model)
- the test set (for testing the model)
Important: training set should not be used in testing and the test set should not be used in learning.
- Unseen test set provides an unbiased estimate of accuracy.
The test set is also called the holdout set. (the examples in the original data set D are all labeled with classes.)
This method is mainly used when the data set D is large.
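The split above can be sketched as follows; the 70/30 ratio is an assumption for illustration:

```python
import random

# A minimal holdout sketch: shuffle D, then cut it into disjoint
# training and test sets. test_fraction=0.3 is an arbitrary choice.
def holdout_split(data, test_fraction=0.3, seed=0):
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]  # training set, test set

train, test = holdout_split(list(range(100)))
# train and test are disjoint and together cover all of D
```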
n-fold cross-validation:
The available data is partitioned into n equal-size disjoint subsets. Each subset is used once as the test set, with the remaining n−1 subsets combined as the training set.
The procedure is run n times, giving n accuracies.
The final estimated accuracy of learning is the average of the n accuracies.
10-fold and 5-fold cross-validations are commonly used.
This method is used when the available data is not large.
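The procedure can be sketched as below. `train_and_score` is a hypothetical stand-in for any learner that trains on one set and returns its accuracy on the other:

```python
# A minimal n-fold cross-validation sketch: partition the data into n
# disjoint folds, use each fold once as the test set, average the results.
def cross_validate(data, n, train_and_score):
    folds = [data[i::n] for i in range(n)]  # n disjoint, near-equal folds
    accuracies = []
    for i in range(n):
        test_set = folds[i]
        training_set = [x for j, f in enumerate(folds) if j != i for x in f]
        accuracies.append(train_and_score(training_set, test_set))
    return sum(accuracies) / n

# Dummy "learner" for illustration: reports the test-fold fraction of D.
avg = cross_validate(list(range(10)), 5, lambda tr, te: len(te) / 10)
print(avg)  # 0.2, since each of the 5 folds holds 2 of the 10 items
```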