
...

No Format
wc -l /home/corpus-rsync/corpus/Obsolete/submit-2.60-GA-run1/ham-set0.log /home/corpus-rsync/corpus/Obsolete/submit-2.60-GA-run1/spam-set0.log
 210442 /home/corpus-rsync/corpus/Obsolete/submit-2.60-GA-run1/ham-set0.log
 354479 /home/corpus-rsync/corpus/Obsolete/submit-2.60-GA-run1/spam-set0.log

...

No Format
./tenpass/split-log-into-buckets 10 \
    < /home/corpus-rsync/corpus/Obsolete/submit-2.60-GA-run1/ham-set0.log
mv split-1.log new
./tenpass/split-log-into-buckets 10 < new
wc -l split-1.log
   2104 split-1.log
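
Two successive 10-way splits leave roughly 1% of the original 210442 ham lines, which is why split-1.log ends up with about 2104 lines. As a rough illustration of what a bucket splitter does, here is a minimal sketch that deals lines round-robin into N files. The function name `split_round_robin` is hypothetical, and this is not the real ./tenpass/split-log-into-buckets, which may assign lines to buckets differently.

```shell
# Hypothetical stand-in for a bucket splitter (NOT the real tenpass script):
# deal lines from stdin round-robin into split-1.log .. split-N.log
# in the current directory.
split_round_robin() {
    n="$1"
    awk -v n="$n" '{
        # line 1 -> split-1.log, line 2 -> split-2.log, ..., line n+1 -> split-1.log
        f = "split-" ((NR - 1) % n + 1) ".log"
        print > f
    }'
}
```

Usage would look like `split_round_robin 10 < some.log`, after which `wc -l split-*.log` shows the bucket sizes.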

...

No Format
mv split-*.log ../../logs/nonspam-jm/

./tenpass/split-log-into-buckets 10 \
    < /home/corpus-rsync/corpus/Obsolete/submit-2.60-GA-run1/spam-set0.log
mv split-1.log new
wc -l new
  35437 new

...

Results will appear in "tenpass_results", over the course of 4 days. ;)

These will be:

  • scores.{1 .. 10}: scores and accuracy ratings output by the GA
  • {ham,spam}.log.{1 .. 10}: validation log files for that set of scores
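
Since the run takes days, it can be worth confirming that every per-fold file listed above exists before going further. The following is a minimal sketch; the function name `check_tenpass_results` is hypothetical, and it assumes the files sit directly inside the results directory under exactly the names given in the list.

```shell
# Report which expected per-fold files are missing from a results directory.
# Prints one line per missing file; returns 0 only if all 30 files exist.
# Assumes names scores.N, ham.log.N and spam.log.N for N = 1..10.
check_tenpass_results() {
    dir="${1:-tenpass_results}"
    missing=0
    for i in 1 2 3 4 5 6 7 8 9 10; do
        for f in "scores.$i" "ham.log.$i" "spam.log.$i"; do
            if [ ! -e "$dir/$f" ]; then
                echo "missing: $dir/$f"
                missing=1
            fi
        done
    done
    return "$missing"
}
```

Calling `check_tenpass_results` with no argument checks "tenpass_results" in the current directory; pass another path to check a different directory.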

To perform the validation step, run:

No Format

./tenpass/10pass-compute-tcr

This will compute an accuracy rating for each of the 10 folds, using that fold's scores and validation log files. Output looks like:

No Format

# TCR: 14.173333  SpamRecall: 96.002%  SpamPrec: 99.367%  FP: 0.31%  FN: 2.01%
# TCR: 13.986842  SpamRecall: 96.143%  SpamPrec: 99.320%  FP: 0.33%  FN: 1.94%
# TCR: 15.865672  SpamRecall: 95.579%  SpamPrec: 99.608%  FP: 0.19%  FN: 2.22%
# TCR: 14.173333  SpamRecall: 95.532%  SpamPrec: 99.461%  FP: 0.26%  FN: 2.25%
# TCR: 15.748148  SpamRecall: 95.532%  SpamPrec: 99.608%  FP: 0.19%  FN: 2.25%
# TCR: 12.807229  SpamRecall: 95.014%  SpamPrec: 99.409%  FP: 0.28%  FN: 2.51%
# TCR: 14.561644  SpamRecall: 94.779%  SpamPrec: 99.654%  FP: 0.17%  FN: 2.63%
# TCR: 12.432749  SpamRecall: 94.309%  SpamPrec: 99.504%  FP: 0.24%  FN: 2.86%
# TCR: 14.358108  SpamRecall: 95.859%  SpamPrec: 99.414%  FP: 0.28%  FN: 2.08%
# TCR: 18.318966  SpamRecall: 95.953%  SpamPrec: 99.707%  FP: 0.14%  FN: 2.03%
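
For readers unfamiliar with the TCR column: total cost ratio is commonly defined as the number of spam messages divided by (lambda × false positives + false negatives), so a higher value is better. The sketch below assumes that definition and a default lambda of 5; neither is stated on this page, and the formula used by 10pass-compute-tcr itself may differ. The function name `tcr` and the counts in the usage note are made up for illustration.

```shell
# Total cost ratio under the commonly used definition (an assumption here):
#   TCR = nspam / (lambda * FP + FN)
# Arguments: nspam, false-positive count, false-negative count, optional lambda.
# lambda defaults to 5, which is also an assumption.
tcr() {
    awk -v nspam="$1" -v fp="$2" -v fn="$3" -v lambda="${4:-5}" 'BEGIN {
        cost = lambda * fp + fn
        if (cost == 0) { print "inf"; exit }   # no errors at all
        printf "%.6f\n", nspam / cost
    }'
}
```

For example, `tcr 1000 2 40` weighs 2 false positives as 10 units of cost, adds 40 false negatives, and divides 1000 by the total of 50.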

These figures can be compared with other 10FCV runs; they're a good measurement of training accuracy. In other words, they're what you came for. ;)
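
When comparing runs, a single summary line per run is easier to work with than ten separate ones. The following is a minimal sketch that reads lines in the format shown above and prints the mean, minimum and maximum TCR. The function name `summarize_tcr` is hypothetical, and piping the script into it assumes 10pass-compute-tcr writes these lines to standard output.

```shell
# Read "# TCR: <value> ..." lines on stdin and print the number of folds
# plus the mean, minimum and maximum TCR. Other lines are ignored.
summarize_tcr() {
    awk '$1 == "#" && $2 == "TCR:" {
             v = $3 + 0
             if (n == 0 || v < min) min = v
             if (n == 0 || v > max) max = v
             sum += v
             n++
         }
         END {
             if (n == 0) { print "no TCR lines found"; exit 1 }
             printf "folds=%d mean=%.6f min=%.6f max=%.6f\n", n, sum / n, min, max
         }'
}
```

Usage would be along the lines of `./tenpass/10pass-compute-tcr | summarize_tcr`. For the ten lines shown above, the mean TCR works out to about 14.64, ranging from 12.43 to 18.32.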

10-Fold Testing With The Perceptron Instead of GA

...