10-Fold Cross Validation
10-fold cross validation (abbreviated "10FCV") is a system for testing trained classifiers. We use it in SpamAssassin development and QA.
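The comp.ai.neural-nets FAQ covers it well; see http://www.faqs.org/faqs/ai-faq/neural-nets/part3/section-12.html . In brief, the data is divided into k subsets ("folds") of roughly equal size, and the classifier is trained and tested k times, with a different fold held out each time.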
In SpamAssassin, we generally use k=10 – as pretty much everyone else does anyway – and we use 10FCV to test:
- new tweaks to the "Bayesian" learning classifier (the BAYES_* rules)
- new tweaks to the rescoring system (which is also a learning classifier, just at a higher level).
Traditionally, k-fold cross-validation uses a "train on k-1 folds, test on 1 fold" scheme; we use that for testing our rescoring system. For the BAYES rules, however, we use "train on 1 fold, test on k-1 folds". That classifier is very accurate when sufficiently trained, so under the traditional scheme it can be hard to get enough false positives and false negatives to distinguish improvements in accuracy.
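The two schemes differ only in which side of each split is used for training. The Python sketch below is an illustration, not SpamAssassin's actual tooling: the function names (make_folds, standard_splits, inverted_splits) and the seeded shuffle are assumptions made for this example, and it does not balance ham and spam across folds, which a real corpus split would probably want.

```python
import random


def make_folds(items, k=10, seed=0):
    """Shuffle the items and deal them into k folds of near-equal size."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Fold i takes every k-th item starting at position i.
    return [shuffled[i::k] for i in range(k)]


def standard_splits(folds):
    """Yield (train, test) pairs: train on k-1 folds, test on the held-out fold."""
    for i, held_out in enumerate(folds):
        train = [m for j, fold in enumerate(folds) if j != i for m in fold]
        yield train, list(held_out)


def inverted_splits(folds):
    """Yield (train, test) pairs: train on one fold, test on the other k-1."""
    for rest, single in standard_splits(folds):
        yield single, rest


# Hypothetical corpus of 100 message identifiers.
messages = [f"msg{n}" for n in range(100)]
folds = make_folds(messages, k=10)

for train, test in standard_splits(folds):
    assert len(train) == 90 and len(test) == 10   # rescoring-style split

for train, test in inverted_splits(folds):
    assert len(train) == 10 and len(test) == 90   # BAYES-style split
```

With the inverted split, each run trains on a tenth of the corpus and tests on the rest, so each run scores nine times as many messages as a traditional run, against a less thoroughly trained classifier. That makes classification errors numerous enough to compare.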