...
No Format |
---|
wc -l /home/corpus-rsync/corpus/Obsolete/submit-2.60-GA-run1/ham-set0.log /home/corpus-rsync/corpus/Obsolete/submit-2.60-GA-run1/spam-set0.log
210442 /home/corpus-rsync/corpus/Obsolete/submit-2.60-GA-run1/ham-set0.log
354479 /home/corpus-rsync/corpus/Obsolete/submit-2.60-GA-run1/spam-set0.log
|
...
No Format |
---|
./tenpass/split-log-into-buckets 10 \
  < /home/corpus-rsync/corpus/Obsolete/submit-2.60-GA-run1/ham-set0.log
mv split-1.log new
./tenpass/split-log-into-buckets 10 < new
wc -l split-1.log
2104 split-1.log
|
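As a quick sanity check on a split, the combined line count of the buckets can be compared with the log that was fed in. This is only a sketch: it assumes the buckets land in the current directory as split-N.log (as the commands above suggest) and that the splitter copies every input line to exactly one bucket. `check_buckets` is a hypothetical helper, not part of the tenpass scripts.

```shell
# check_buckets SOURCE: compare the combined line count of split-*.log in the
# current directory with the line count of SOURCE (hypothetical helper).
check_buckets() {
    src=$1
    # $(( ... )) strips the padding some wc implementations add.
    expected=$(( $(wc -l < "$src") ))
    actual=$(( $(cat split-*.log | wc -l) ))
    if [ "$expected" -eq "$actual" ]; then
        echo "ok: $actual lines in buckets, matching $src"
    else
        echo "mismatch: $src has $expected lines, buckets have $actual" >&2
        return 1
    fi
}
```

After the second split above, this would be run as `check_buckets new`. A mismatch is not necessarily an error: if the splitter skips some lines (comments, for instance), the totals will differ.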
...
No Format |
---|
mv split-*.log ../../logs/nonspam-jm/
./tenpass/split-log-into-buckets 10 \
  < /home/corpus-rsync/corpus/Obsolete/submit-2.60-GA-run1/spam-set0.log
mv split-1.log new
wc -l new
35437 new
|
...
Results will appear in "tenpass_results" over the course of 4 days.
These will be:
- scores.{1 .. 10}: scores and GA accuracy ratings output by GA
- {ham,spam}.log.{1 .. 10}: validation log files for that set of scores
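Because the run takes days, it can be handy to see which of the expected files have appeared so far. A minimal sketch, assuming the files sit directly inside tenpass_results under exactly the names listed above; `missing_results` is a hypothetical helper, not part of the tenpass scripts.

```shell
# missing_results [DIR]: print every expected output file that is not yet
# present in DIR (default: tenpass_results). Returns non-zero if any is missing.
missing_results() {
    dir=${1:-tenpass_results}
    missing=0
    for i in 1 2 3 4 5 6 7 8 9 10; do
        for f in "scores.$i" "ham.log.$i" "spam.log.$i"; do
            if [ ! -e "$dir/$f" ]; then
                echo "$dir/$f"
                missing=1
            fi
        done
    done
    return "$missing"
}
```

Running `missing_results` with no argument prints nothing and returns 0 once all 30 files exist.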
To perform the validation step, run:
No Format |
---|
./tenpass/10pass-compute-tcr
|
This will compute an accuracy rating, using those scores and those validation log files, for the 10 folds. Output looks like:
No Format |
---|
# TCR: 14.173333 SpamRecall: 96.002% SpamPrec: 99.367% FP: 0.31% FN: 2.01%
# TCR: 13.986842 SpamRecall: 96.143% SpamPrec: 99.320% FP: 0.33% FN: 1.94%
# TCR: 15.865672 SpamRecall: 95.579% SpamPrec: 99.608% FP: 0.19% FN: 2.22%
# TCR: 14.173333 SpamRecall: 95.532% SpamPrec: 99.461% FP: 0.26% FN: 2.25%
# TCR: 15.748148 SpamRecall: 95.532% SpamPrec: 99.608% FP: 0.19% FN: 2.25%
# TCR: 12.807229 SpamRecall: 95.014% SpamPrec: 99.409% FP: 0.28% FN: 2.51%
# TCR: 14.561644 SpamRecall: 94.779% SpamPrec: 99.654% FP: 0.17% FN: 2.63%
# TCR: 12.432749 SpamRecall: 94.309% SpamPrec: 99.504% FP: 0.24% FN: 2.86%
# TCR: 14.358108 SpamRecall: 95.859% SpamPrec: 99.414% FP: 0.28% FN: 2.08%
# TCR: 18.318966 SpamRecall: 95.953% SpamPrec: 99.707% FP: 0.14% FN: 2.03%
|
These figures can be compared with other 10FCV runs; they're a good measurement of training accuracy. In other words, they're what you came for.
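When comparing runs, it may help to reduce the ten lines to a single summary. The sketch below averages TCR, FP and FN across folds, relying on the field positions in the sample output above. Averaging per-fold values is an assumption made here for convenience, not something the tenpass scripts are known to do, and `summarize_tcr` is a hypothetical helper.

```shell
# summarize_tcr: read "# TCR: ..." lines on stdin and print the per-fold mean
# of TCR, FP% and FN%. Field positions follow the sample output format.
summarize_tcr() {
    awk '
        $1 == "#" && $2 == "TCR:" {
            n++
            tcr += $3
            fp  += $9 + 0     # "0.31%" converts to 0.31 in numeric context
            fn  += $11 + 0
        }
        END {
            if (n > 0)
                printf "folds: %d mean TCR: %.3f mean FP: %.3f%% mean FN: %.3f%%\n",
                       n, tcr / n, fp / n, fn / n
        }
    '
}
```

Assuming 10pass-compute-tcr writes its report to standard output, it could be used as `./tenpass/10pass-compute-tcr | summarize_tcr`. Note that the mean of per-fold TCR values is not the same as a TCR computed over all folds pooled together, so compare like with like.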
10-Fold Testing With The Perceptron Instead of GA
...