Trusted relays, and how header trust works
SpamAssassin will automatically attempt to figure out which Received: headers were inserted by trustworthy mailservers, and which were not. This allows it to:
- optimize DNSBL lookups
- detect when mails never left a trusted network path
- know when a Received header can be trusted for whitelisting purposes
- produce synthetic 'pseudo-headers', allowing rules to match against the message's network traversal portably
This page details the concept of 'trust' as used internally by SpamAssassin, and how it appears in the output. See also TrustPath for details on how to influence this by setting the 'trusted_networks' parameter, and DynablockIssues for authenticated mail submission issues.
Note that the trust path information is used by both network and non-net rules, so even if you're not running with network rules enabled (the "-L" switch), it's worth configuring this.
Here's an example email, with sets of headers for analysis:
Assume SpamAssassin is running on 'internal.example.com', and the TrustPath on that machine is set up to trust the DMZ machine 'dmz.example.com' at 22.214.171.124 (which is also marked as internal using internal_networks) and a trustworthy external machine 'friend.example.com' at 126.96.36.199. The message then passed through the following relays:
Untrusted source at evil.example.net [188.8.131.52]
Untrusted relay at chaos.example.net [184.108.40.206]
Untrusted relay at loser.example.org [220.127.116.11]
Untrusted relay at notrust.example.com [18.104.22.168]
Trusted relay at friend.example.com [22.214.171.124]
Trusted and internal relay at dmz.example.com [126.96.36.199]
Trusted and internal localhost handover at internal.example.com [127.0.0.1]
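The trusted/untrusted split in the list above can be modelled roughly as follows. This is a simplified Python sketch, not SpamAssassin's actual code: the function name `split_relays`, the documentation-range IP addresses, and the rule that everything behind the first untrusted relay stays untrusted are assumptions made for illustration.

```python
import ipaddress


def split_relays(relays, trusted_networks):
    """Split relay IPs (most recent first) into trusted and untrusted lists.

    Hypothetical helper: once one relay falls outside trusted_networks,
    every earlier relay is treated as untrusted too, even if it claims
    an address that would otherwise be trusted.
    """
    nets = [ipaddress.ip_network(n) for n in trusted_networks]
    trusted, untrusted = [], []
    still_trusted = True
    for ip in relays:
        addr = ipaddress.ip_address(ip)
        if still_trusted and any(addr in net for net in nets):
            trusted.append(ip)
        else:
            still_trusted = False
            untrusted.append(ip)
    return trusted, untrusted


# Example addresses only; the last entry is a forged line that
# reuses a trusted address deep inside the untrusted part of the path.
relays = ["127.0.0.1", "192.0.2.10", "198.51.100.7", "203.0.113.5", "192.0.2.10"]
trusted_networks = ["127.0.0.0/8", "192.0.2.10/32", "198.51.100.7/32"]
trusted, untrusted = split_relays(relays, trusted_networks)
```

In this example the forged final entry lands in the untrusted list, because a trusted address that appears behind an untrusted relay could have been written by anyone.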
A side note: the header lines for 'evil', 'chaos', and 'loser' could all be faked, since there is no way to know whether an untrusted host is running legitimate MTA software or is under the control of a spammer. Therefore, it's unwise to trust things you find in untrusted headers. (One exception to this is discussed at the end of this article.)
The 'X-Spam-Relays' Pseudo-headers
The last few lines of that debug output are most noteworthy – especially since rules can match against these metadata pseudo-headers. Here they are:
the 'X-Spam-Relays-Trusted' pseudoheader:
the 'X-Spam-Relays-Untrusted' pseudoheader:
There are also two more – 'X-Spam-Relays-Internal' and 'X-Spam-Relays-External'.
They are divided into trusted/untrusted and internal/external pairs, depending on the setting of 'trusted_networks' and 'internal_networks'.
You can see they list the contents of the Received header in a machine-readable, and standardised, format, so that rules can be insulated from the vagaries of the Received header, which has a tendency to look radically different between MTAs. (Some MTAs even reverse the order of the items, but look otherwise identical!)
In the samples above, they include newlines; however, in the real pseudo-headers produced by SpamAssassin, each [...] block is simply space-separated.
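To illustrate the space-separated [...] layout, here is a minimal Python sketch that splits such a string into one dictionary per relay. The function name `parse_relays`, the field names in the sample string (ip, rdns, helo, by, ident) and the addresses are assumptions for illustration; check the real pseudo-headers in your own debug output for the exact set of fields.

```python
import re


def parse_relays(pseudo_header):
    """Return one dict of key=value fields per [...] block, in order."""
    relays = []
    for block in re.findall(r"\[([^\]]*)\]", pseudo_header):
        fields = {}
        for token in block.split():
            # Split on the first "=" only; an empty value such as
            # "ident=" becomes an empty string.
            key, _, value = token.partition("=")
            fields[key] = value
        relays.append(fields)
    return relays


# Hypothetical sample in the shape described above.
sample = (
    "[ ip=203.0.113.5 rdns=mail.example.net helo=mail.example.net "
    "by=friend.example.com ident= ] "
    "[ ip=198.51.100.9 rdns= helo=chaos.example.net "
    "by=mail.example.net ident= ]"
)
parsed = parse_relays(sample)
```

The sketch assumes that field values never contain spaces or "]" characters.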
Some sample rules that use this data can be seen in the standard SpamAssassin rules file, '20_fake_helo_tests.cf'. Here is an example:
DNSBL lookups and the most recent untrusted host
DNSBL rules support '-firsttrusted' and '-untrusted' as special-case keywords to control IP address selection. These keywords do not refer to the trust status of the lines themselves! They refer to the trust status of the data that will be looked up in the DNSBL.
This hinges on a key border case. The most recent 'untrusted' header line is in an interesting grey area – the host it discusses is an untrusted host, but the data recorded about that host is, in itself, trustworthy.
Above, for example, 188.8.131.52 is listed as an untrusted host and is therefore listed in the 'X-Spam-Relays-Untrusted' pseudoheader. However, its IP address was recorded by a trusted host, so the IP address data is trustworthy.
This is the most commonly tested item in the string, since it's the most likely host to be a spam zombie or spammer MTA.
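As a rough illustration of which addresses the two keywords pick, the Python sketch below assumes the IPs from 'X-Spam-Relays-Untrusted' are already available as a list ordered from most recent to oldest. The function name `select_ips`, the set names and the addresses are hypothetical. Treating '-untrusted' as "everything except the most recent untrusted relay" is an interpretation of the description above, so verify it against the SpamAssassin documentation before relying on it.

```python
def select_ips(untrusted_ips, set_name):
    """Pick the IPs a DNSBL set would look up, based on its suffix.

    untrusted_ips: relay IPs from the untrusted list, most recent first.
    """
    if set_name.endswith("-firsttrusted"):
        # Only the most recent untrusted relay: its IP was recorded
        # by a trusted host, so the data itself is trustworthy.
        return untrusted_ips[:1]
    if set_name.endswith("-untrusted"):
        # The remaining relays: their data was recorded by
        # untrusted hosts and may be forged.
        return untrusted_ips[1:]
    raise ValueError("this sketch only models the two keywords above")


ips = ["203.0.113.5", "198.51.100.9", "198.51.100.20"]
first = select_ips(ips, "example-firsttrusted")
rest = select_ips(ips, "example-untrusted")
```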
Testing the "most recent untrusted" host in a header rule is done as follows:
Note the ^[^]]+ part of the pattern: it skips anything but "]" characters, ensuring that the match can only happen within the first [...] block of the pseudo-header string.
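The effect of that anchor can be checked outside SpamAssassin. The Python sketch below uses the same idea with Python's `re` module (where the "]" inside the character class is escaped); the helper name `first_block_matches` and the sample pseudo-header string are made up for illustration.

```python
import re


def first_block_matches(pseudo_header, field_pattern):
    """True if field_pattern matches inside the first [...] block only."""
    # "^[^\]]+ " consumes characters from the start of the string but
    # can never cross a "]", so the match cannot reach a later block.
    anchored = re.compile(r"^[^\]]+ " + field_pattern)
    return anchored.search(pseudo_header) is not None


# Hypothetical pseudo-header with two relay blocks.
header = (
    "[ ip=203.0.113.5 rdns=dsl-1.example.net helo=dsl-1.example.net "
    "by=friend.example.com ] "
    "[ ip=198.51.100.9 rdns=mx.example.org helo=mx.example.org "
    "by=dsl-1.example.net ]"
)

in_first = first_block_matches(header, r"rdns=dsl-\d+\.example\.net ")
in_second_only = first_block_matches(header, r"rdns=mx\.example\.org ")
unanchored = re.search(r"rdns=mx\.example\.org ", header) is not None
```

Here `in_first` is true, `in_second_only` is false, and `unanchored` is true. Without the anchor, the pattern would also match data from the older and less trustworthy second block.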
Checking that IP address in a DNSBL lookup using check_rbl() is performed by appending the string '-firsttrusted' to the set name:
Using Other Untrusted Hosts
The 'most recent untrusted' host is the only 'grey area', however. All the other hosts listed in the 'X-Spam-Relays-Untrusted' pseudoheader are untrusted themselves, and their details were not recorded by a trusted host. Neither the header lines nor the IP addresses they contain can be trusted, since they could have been generated by a spamware application creating fake header data. It's especially important not to trust that data for rules that could give negative points, since spammers can, and will, attempt to fake their way around your whitelisting rules.
Also worth noting: it's common for the "trusted" networks to extend further than the "internal" networks. If you are writing rules to match the host which delivered a mail into the SMTP MX server, you should use "external" instead of "untrusted", since it's common for "good" third-party senders to be put into the "trusted" list. (This is especially important for rules that match features of dynamic host senders, such as rDNS patterns etc.)
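The difference between the two boundaries can be sketched in Python. The helper `first_outside` and all addresses below are hypothetical: the DMZ host is both trusted and internal, while the friendly third-party relay is trusted but not internal.

```python
import ipaddress


def first_outside(relays, networks):
    """Return the most recent relay IP not covered by networks, or None.

    relays is ordered from most recent to oldest.
    """
    nets = [ipaddress.ip_network(n) for n in networks]
    for ip in relays:
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in nets):
            return ip
    return None


# localhost, DMZ host, friendly third party, unknown sender
relays = ["127.0.0.1", "192.0.2.10", "198.51.100.7", "203.0.113.5"]
trusted_nets = ["127.0.0.0/8", "192.0.2.10", "198.51.100.7"]
internal_nets = ["127.0.0.0/8", "192.0.2.10"]

last_external = first_outside(relays, internal_nets)
last_untrusted = first_outside(relays, trusted_nets)
```

With these example values, `last_external` is the friendly relay (198.51.100.7), which is the host that actually delivered the mail to the MX, while `last_untrusted` is one hop further back (203.0.113.5).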