SpamAssassin Rules Project

(DRAFT - this part of the wiki is a discussion document, based on emails to the dev list. Please feel free to add comments, but make it clear that they are your opinion by signing your name to them. Your real name is preferred, btw.)

The Problem

Here it is, stated by DuncanFindlay: 'SpamAssassin is not as effective as it could be because of the rules that are being used to detect spam. There are two problems here:

1. The "not enough rules" problem: SpamAssassin does not have enough high quality spam-catching rules. Anecdotally, our FN ratio seems to be much higher with 3.1 than with 3.0 (we won't know for sure until the mass-checks are done). There may be a variety of reasons for this:

2. The "release cycle" problem: Any high quality rules that are incorporated into SpamAssassin are not distributed until the next release. Since rules and code are tied together, the release cycle for rules is too long. Submitted rules are not distributed while they are most effective, and rules lose their effectiveness too quickly.'

3. [added by LorenWilton] The instant an actual rule is posted on the users' list, it will lose about 80% of its effectiveness, usually within about 16 hours. Within a week it will be virtually useless. Sometimes the rule will regain some effectiveness a few months later, and in rare cases posting a rule will not affect the hit rate. But in general, publicly posting a rule body in a readable forum will negate the usefulness of the rule almost instantly.

4. [added by BobMenschel from others' discussion] SA rules development works best for rules aimed at English-language spam, since most SA rules developers who feed the distribution system speak and correspond in English, and the great majority of the testing corpora are in English. We're not as good at developing, validating, testing, or scoring rules in other languages.

The Solution

Based on the problem areas outlined above, here are the pages for each aspect of the problem, and proposed solutions:

Outstanding Tasks/Votes

Here's a list of the tasks that have fallen out of the above plan so far... we now need to vote to go forward with these, then put them into action.

First step – the sandboxes:

Phase two – mass-checking systems:

Phase three: