DNS Blocklist Accuracy Figures (as of July 2005)

Many people, whether they use SpamAssassin or not, find accuracy figures for DNSBLs useful. Here are accuracy figures for the DNS blocklists included in SpamAssassin 3.1.0, as measured during our July 2005 rescoring run. The following techniques were used to keep these figures accurate:

  • some hits were recorded from 'live' data at the time the messages were received (reused via 'mass-check --reuse'), rather than from post-facto testing
  • 9 people contributed their hit data, from a variety of geographical locations and organisational types
  • both ham and spam hit rates were measured, and the corpora were hand-verified in advance
  • the corpora use (relatively) fresh mail, received between January 2004 and July 2005

123778 spam messages and 53091 ham messages were used:

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
 176869   123778    53091    0.700   0.00    0.00  (all messages)
100.000  69.9829  30.0171    0.700   0.00    0.00  (all messages as %)

These were randomly chosen from all contributors' logs (see below). First off, the DNS blocklists.

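Note: rows are sorted by mass-check's RANK metric, which puts 'better' results near the top, and the results are in HitFrequencies format. The 'S/O', 'SPAM%', and 'HAM%' columns are the most important metrics. For a blocklist rule, S/O values approaching 1.0 are best; for a whitelist-style rule with a negative score (such as RCVD_IN_BSP_TRUSTED), values approaching 0.0 are best.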
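As a rough illustration of the S/O column, the sketch below assumes S/O is SPAM% divided by the sum of SPAM% and HAM%. That definition is an assumption, not something stated on this page, although it reproduces the values in the rows shown here. The function name `s_o_ratio` is hypothetical.

```python
def s_o_ratio(spam_pct: float, ham_pct: float) -> float:
    """Return spam_pct / (spam_pct + ham_pct).

    Assumption: this is how the S/O column is derived. Rules with no
    hits at all are given 0.5, matching the 0.500 shown for such rows.
    """
    total = spam_pct + ham_pct
    if total == 0:
        return 0.5
    return spam_pct / total


# Figures copied from the tables on this page.
print(round(s_o_ratio(24.9285, 0.0113), 3))  # RCVD_IN_XBL
print(round(s_o_ratio(23.1463, 0.3372), 3))  # RCVD_IN_BL_SPAMCOP_NET
print(round(s_o_ratio(0.0016, 0.5481), 3))   # RCVD_IN_IADB_VOUCHED
```

Because the ratio is taken over percentages rather than raw message counts, it is not skewed by the corpus containing more spam than ham.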

 17.449  24.9285   0.0113    1.000   0.97    3.90  RCVD_IN_XBL
  3.841   5.4824   0.0132    0.998   0.88    2.16  RCVD_IN_SORBS_SOCKS
  5.865   8.3690   0.0283    0.997   0.88    3.16  RCVD_IN_SBL
  9.438  13.4652   0.0490    0.996   0.84    2.23  RCVD_IN_WHOIS_INVALID
  2.237   3.1839   0.0301    0.991   0.79    0.02  RCVD_IN_SORBS_MISC
 27.913  39.8423   0.0998    0.998   0.76    2.60  RCVD_IN_DSBL
  4.914   6.9883   0.0772    0.989   0.74    0.02  RCVD_IN_SORBS_HTTP
  0.914   1.3015   0.0113    0.991   0.72    2.77  RCVD_IN_NJABL_SPAM
  7.692  10.9486   0.0979    0.991   0.72    2.43  RCVD_IN_WHOIS_BOGONS
 22.130  31.5662   0.1300    0.996   0.71    2.05  RCVD_IN_SORBS_DUL
 10.642  15.1449   0.1450    0.991   0.67    0.72  RCVD_IN_NJABL_PROXY
 18.739  26.6946   0.1921    0.993   0.61    1.95  RCVD_IN_NJABL_DUL
  0.345   0.4888   0.0094    0.981   0.57    0.20  RCVD_IN_SORBS_SMTP
  5.309   7.5062   0.1865    0.976   0.56    1.46  RCVD_IN_SORBS_WEB
 16.300  23.1463   0.3372    0.986   0.53    1.56  RCVD_IN_BL_SPAMCOP_NET
  0.166   0.0016   0.5481    0.003   0.47   -2.20  RCVD_IN_IADB_VOUCHED
  0.161   0.0016   0.5330    0.003   0.47   -4.30  RCVD_IN_BSP_TRUSTED
  0.096   0.1365   0.0000    1.000   0.41    1.00  RCVD_IN_WHOIS_HIJACKED
  0.118   0.1656   0.0075    0.956   0.41    0.10  RCVD_IN_NJABL_RELAY
  0.040   0.0533   0.0094    0.850   0.32    0.26  RCVD_IN_SORBS_ZOMBIE
  0.000   0.0000   0.0000    0.500   0.28    0.00  RCVD_IN_SORBS_BLOCK
  0.000   0.0000   0.0000    0.500   0.28    0.00  RCVD_IN_NJABL_MULTI
  0.000   0.0000   0.0000    0.500   0.28    0.00  RCVD_IN_NJABL_CGI
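For readers who want to process these figures, rows like the ones above can be split on whitespace into seven fields. This is a minimal sketch that assumes the column order shown in the header line; the `Freq` type and `parse_freq_line` helper are hypothetical names, not part of SpamAssassin.

```python
from typing import NamedTuple


class Freq(NamedTuple):
    overall: float  # OVERALL% column
    spam: float     # SPAM% column
    ham: float      # HAM% column
    s_o: float      # S/O column
    rank: float     # RANK column
    score: float    # SCORE column
    name: str       # rule name


def parse_freq_line(line: str) -> Freq:
    """Parse one data row; the name is everything after the sixth number."""
    parts = line.split(None, 6)
    if len(parts) != 7:
        raise ValueError(f"expected 7 columns, got {len(parts)}: {line!r}")
    *numbers, name = parts
    return Freq(*(float(n) for n in numbers), name.strip())


rows = [
    parse_freq_line(" 17.449  24.9285   0.0113    1.000   0.97    3.90  RCVD_IN_XBL"),
    parse_freq_line(" 16.300  23.1463   0.3372    0.986   0.53    1.56  RCVD_IN_BL_SPAMCOP_NET"),
]

# Keep only rules that hit less than 0.05% of ham in this corpus.
low_false_positive = [r.name for r in rows if r.ham < 0.05]
print(low_false_positive)
```

Splitting with a maximum of six splits keeps a name containing spaces, such as "(all messages)", in one field.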

URI blocklist lookups, against SURBL and SBL:

 17.882  25.5522   0.0000    1.000   1.00    4.50  URIBL_SC_SURBL
  9.684  13.8369   0.0019    1.000   0.98    3.81  URIBL_AB_SURBL
 34.260  48.9497   0.0132    1.000   0.98    4.09  URIBL_JP_SURBL
 36.356  51.9317   0.0414    0.999   0.90    3.01  URIBL_OB_SURBL
 30.956  44.1605   0.1695    0.996   0.66    2.14  URIBL_WS_SURBL
  0.266   0.3805   0.0000    1.000   0.56    2.80  URIBL_PH_SURBL
 22.415  31.8425   0.4370    0.986   0.49    1.64  URIBL_SBL

SPF lookups:

  3.437   4.8942   0.0396    0.992   0.80    1.38  SPF_SOFTFAIL
  1.006   1.4292   0.0207    0.986   0.71    2.43  SPF_HELO_SOFTFAIL
  2.550   3.5717   0.1676    0.955   0.53    1.14  SPF_FAIL
  2.297   3.2090   0.1695    0.950   0.52    1.07  SPF_NEUTRAL
  1.796   2.5029   0.1488    0.944   0.51    0.00  SPF_HELO_FAIL
  0.935   1.2724   0.1488    0.895   0.43    0.00  SPF_HELO_NEUTRAL
  5.334   2.5925  11.7252    0.181   0.21   -0.00  SPF_HELO_PASS
  3.267   2.6241   4.7654    0.355   0.10   -0.00  SPF_PASS

RFC-Ignorant lookups, testing against the envelope sender's domain:

  3.038   4.3352   0.0132    0.997   0.86    2.60  DNS_FROM_RFC_DSN
  1.174   1.6715   0.0151    0.991   0.75    1.94  DNS_FROM_RFC_BOGUSMX
  3.590   5.0607   0.1620    0.969   0.57    1.45  DNS_FROM_RFC_WHOIS
 13.930  19.7071   0.4615    0.977   0.47    1.71  DNS_FROM_RFC_POST
 12.120  16.7154   1.4051    0.922   0.34    0.20  DNS_FROM_RFC_ABUSE

Other network rules:

  1.898   2.7081   0.0094    0.997   0.82    3.20  NO_DNS_FOR_FROM
  1.449   2.0593   0.0245    0.988   0.74    1.51  DNS_FROM_SECURITYSAGE
  7.200  10.0898   0.4615    0.956   0.44    0.23  DNS_FROM_AHBL_RHSBL

More details of the source mass-check log files and test procedure can be found in SpamAssassin bug 4505. The full list of hit frequencies is in the STATISTICS-set3.txt file in the 3.1.0 release. Here's a list of the data files used. Note that only a randomly chosen one tenth of each file was used.

Use of --reuse for real-time network results: confirmed on: 4 users (bmenschel, jm, parker, cthielen); confirmed off: 1 user (duncf); unknown: 4 users (bzoetekouw, misak, quinlan, theo).

bash-3.00$ ls -l /home/corpus-rsync/corpus/submit/
total 2839184
-r--r--r--   1 rsync    rsync    7967268 Jul 16 18:18 ham-bayes-net-bzoetekouw.log
-r--r--r--   1 rsync    rsync    1987090 Jul 16 14:49 ham-bayes-net-cthielen.log
-r--r--r--   1 rsync    rsync    23284450 Jul 24 08:04 ham-bayes-net-daf.log
-r--r--r--   1 rsync    rsync    51469171 Jul 19 02:26 ham-bayes-net-jm.log
-r--r--r--   1 rsync    rsync    45026386 Jul 19 02:27 ham-bayes-net-jm2.log
-r--r--r--   1 rsync    rsync     294744 Jul 25 18:57 ham-bayes-net-misak.log
-r--r--r--   1 rsync    rsync    22130676 Jul 27 04:17 ham-bayes-net-parkerm.log
-r--r--r--   1 rsync    rsync    14056970 Jul 27 19:37 ham-bayes-net-quinlan.log
-r--r--r--   1 rsync    rsync    8603737 Jul 27 17:01 ham-bayes-net-rod.log
-r--r--r--   1 rsync    rsync    28410747 Jul 27 02:34 ham-bayes-net-theo.log
-r--r--r--   1 rsync    rsync    62685697 Jul 16 18:22 spam-bayes-net-bzoetekouw.log
-r--r--r--   1 rsync    rsync    11891366 Jul 16 14:50 spam-bayes-net-cthielen.log
-r--r--r--   1 rsync    rsync    96553037 Jul 24 08:09 spam-bayes-net-daf.log
-r--r--r--   1 rsync    rsync    28662170 Jul 19 02:28 spam-bayes-net-jm.log
-r--r--r--   1 rsync    rsync    209202453 Jul 19 02:34 spam-bayes-net-jm2.log
-r--r--r--   1 rsync    rsync     243487 Jul 25 18:57 spam-bayes-net-misak.log
-r--r--r--   1 rsync    rsync    39357821 Jul 27 04:19 spam-bayes-net-parkerm.log
-r--r--r--   1 rsync    rsync    41987897 Jul 27 19:39 spam-bayes-net-quinlan.log
-r--r--r--   1 rsync    rsync    97404262 Jul 27 17:03 spam-bayes-net-rod.log
-r--r--r--   1 rsync    rsync    358576609 Jul 27 02:34 spam-bayes-net-theo.log
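The one-tenth random selection mentioned above could be reproduced along the following lines. This is only a sketch: the actual procedure is described in bug 4505, not here, and the `sample_tenth` helper and its `seed` parameter are hypothetical.

```python
import random


def sample_tenth(lines, seed=None):
    """Return a random tenth (rounded down) of the given lines, without repeats.

    Assumption: each log line is one message result and lines are drawn
    uniformly. A fixed seed makes the selection repeatable.
    """
    rng = random.Random(seed)
    lines = list(lines)
    return rng.sample(lines, len(lines) // 10)


# Example with stand-in data instead of a real mass-check log.
fake_log = [f"message-{i}" for i in range(100)]
subset = sample_tenth(fake_log, seed=2005)
print(len(subset))
```

Sampling per file, rather than from the concatenated logs, keeps each contributor's share of the corpus proportional to the size of their submission.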