The main system used to measure spam-filtering accuracy in SpamAssassin is the "FP%/FN% percentages" system.
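It's quite simple. First, you scan a corpus of hand-classified mail (see HandClassifiedCorpora) to get 4 figures:
 * the number of ham (non-spam) messages in the corpus;
 * the number of spam messages in the corpus;
 * the number of false positives (FPs), i.e. ham messages the filter wrongly classified as spam;
 * the number of false negatives (FNs), i.e. spam messages the filter wrongly classified as ham.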
Next, perform this calculation:
 FP% = (number of FPs / number of ham messages) * 100
 FN% = (number of FNs / number of spam messages) * 100
and you have two numbers that simply, concisely and comprehensibly describe the filter's accuracy.
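The same calculation can be written as a small Python function. This is only an illustrative sketch, not code shipped with SpamAssassin, and the function and parameter names (`fp_fn_percentages`, `nham`, `nspam`, `fp`, `fn`) are hypothetical:

```python
def fp_fn_percentages(nham: int, nspam: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (FP%, FN%) for a hand-classified corpus.

    Hypothetical helper: nham/nspam are the ham/spam message counts,
    fp is ham flagged as spam, fn is spam let through as ham.
    """
    if nham <= 0 or nspam <= 0:
        raise ValueError("corpus must contain at least one ham and one spam message")
    if not (0 <= fp <= nham and 0 <= fn <= nspam):
        raise ValueError("error counts must lie between 0 and the matching corpus size")
    fp_pct = fp / nham * 100   # share of ham wrongly marked as spam
    fn_pct = fn / nspam * 100  # share of spam wrongly passed as ham
    return fp_pct, fn_pct


fp_pct, fn_pct = fp_fn_percentages(nham=1500, nspam=1000, fp=2, fn=30)
print(f"FP% = {fp_pct:.4f}%, FN% = {fn_pct:.1f}%")
```

Note that each percentage is taken relative to its own class (FPs over ham, FNs over spam), not over the whole corpus, so the two figures stay meaningful even when the ham/spam mix is unbalanced.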
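For example, let's say we do a test as follows:
 * 1500 ham messages, of which 2 were misclassified as spam (2 FPs);
 * 1000 spam messages, of which 30 were misclassified as ham (30 FNs).
The FP% and FN% then work out as (2 / 1500) * 100 = 0.1333% and (30 / 1000) * 100 = 3.0%, respectively.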
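In practice the four counts come from comparing each message's hand-assigned label with the filter's verdict. The Python sketch below assumes, purely for illustration, that results are available as `(is_spam_by_hand, flagged_by_filter)` pairs; SpamAssassin's own test tools record their results in a different form, and the `tally` helper is hypothetical:

```python
from typing import Iterable, Tuple


def tally(results: Iterable[Tuple[bool, bool]]) -> dict:
    """Count ham, spam, FPs and FNs from (is_spam_by_hand, flagged_by_filter) pairs.

    Hypothetical input format, used only for this illustration.
    """
    counts = {"nham": 0, "nspam": 0, "fp": 0, "fn": 0}
    for is_spam, flagged in results:
        if is_spam:
            counts["nspam"] += 1
            if not flagged:
                counts["fn"] += 1  # spam the filter let through
        else:
            counts["nham"] += 1
            if flagged:
                counts["fp"] += 1  # ham the filter wrongly flagged
    return counts


# Rebuild the example test: 1500 ham (2 flagged), 1000 spam (30 missed).
results = ([(False, True)] * 2 + [(False, False)] * 1498
           + [(True, False)] * 30 + [(True, True)] * 970)
counts = tally(results)
print(counts)  # {'nham': 1500, 'nspam': 1000, 'fp': 2, 'fn': 30}

fp_pct = counts["fp"] / counts["nham"] * 100
fn_pct = counts["fn"] / counts["nspam"] * 100
```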
The 'STATISTICS.txt' files distributed with SpamAssassin since about version 2.30 include these figures, measuring the ruleset's accuracy against a validation set of hand-classified mail:
As you can see, FP% and FN% get pride of place in the measurement scheme.
See also MeasuringAccuracy for other methods.