Comment: Include examples of mail to not include in either ham or spam corpora


(Aside: yes, it's "corpora". See PluralOfCorpus)

Mail to NOT Include in Ham or Spam

  • Mailing lists
    Do not include discussion mailing lists in either your ham or spam corpus. Mailing list messages tend to be too similar in content, and all of a list's mail is sent by the same mail server. Furthermore, it can unduly bias the results if multiple masscheck participants are subscribed to the same mailing lists. Try to stick to mail sent directly to you. Low-traffic, announce-only mailing lists are generally OK.
  • Spam Sent via Legitimate Services (Facebook, LiveJournal, etc.)
    Occasionally you will receive spam text posted to your account on services like LiveJournal or Facebook. DO NOT include such mail in either your ham or spam folder; just delete it. Why? Counting these messages as spam would cause false marks against highly safe whitelist rules like USER_IN_DEF_DKIM_WL. They do not count as ham either, because the spam URLs or spam text they contain would throw off the statistics if they showed up in the ham folder.
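
As a rough aid for applying the two exclusions above, the following Python sketch flags messages that look like mailing-list traffic or that come from a third-party service, so they can be reviewed before they end up in a corpus. It is only a heuristic under stated assumptions: the function name `exclusion_reason` and the `SERVICE_DOMAINS` list are hypothetical and not part of SpamAssassin, and the header checks (`List-Id`, `List-Post`, `Precedence: list`) will also match announce-only lists, which are generally fine to keep. Treat a hit as a prompt for manual review, not as an automatic verdict.

```python
from email import message_from_bytes
from email.utils import parseaddr

# Assumption: an illustrative list only; extend it for the services you use.
SERVICE_DOMAINS = ("facebookmail.com", "livejournal.com")


def exclusion_reason(raw_message: bytes):
    """Return a short reason to review or exclude the message, or None."""
    msg = message_from_bytes(raw_message)

    # List-Id (RFC 2919) and List-Post (RFC 2369) are typical of list traffic.
    if msg.get("List-Id") or msg.get("List-Post"):
        return "mailing list"
    if str(msg.get("Precedence") or "").strip().lower() == "list":
        return "mailing list"

    # Compare the sender's domain against the hypothetical service list.
    _, addr = parseaddr(str(msg.get("From", "")))
    domain = addr.rpartition("@")[2].lower()
    if any(domain == d or domain.endswith("." + d) for d in SERVICE_DOMAINS):
        return "sent via third-party service"

    return None
```

To use it, read each message file in a ham or spam folder as bytes and pass it to `exclusion_reason`; anything that returns a reason is a candidate for removal. Service-originated spam cannot be told apart from legitimate service notifications by headers alone, so the second check only narrows down what to look at by hand.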

Minor things that are nice to have