Wednesday, 28. January 2004

A Domino Anti-Spam Architecture

Given Declan's promise of an impending Domino whitelist (hurrah!), I thought it about time that I published what I have so far on my promised Grand Unified Domino Anti Spam Architecture (GUDASA? There must be a better acronym than that).
It remains my contention (shared, I know, by many others) that a composite approach, taking the most refined protocol level approach and combining with the most refined content level approach, will give the best results. But what are the essential components?
You will find a large PDF version of the flow chart to the right here.
The colours of the boxes are: blue, standard Domino functionality; green, available additions to base functionality (some are free, others not); red, not currently known by me to be possible.
The chart is in two halves. The half to the left deals with what happens at the protocol level, during the SMTP phase of message delivery. From my own empirical data, I believe it possible to defeat c. 75-90% of all spam at the protocol level, without undue collateral damage, provided that all of the elements in the protocol phase of the chart are available.
The half to the right deals with what happens after the SMTP phase has successfully completed, but before and up to when an email hits a user's mailbox.
Before you complain, I am aware that the sequence of events is not displayed perfectly. In part this is due to the fact that the impact of some elements of the process is split across ranges of events. For example, DNSBL hits are known about almost immediately on connection, but a 554 rejection is only sent after MAIL FROM. However, there is enough here to illustrate what I believe are the significant elements of the process. Let's walk through the steps:
- HELO - Daniel Koffler has demonstrated a Domino Directory tweak that will permit, for example, the protocol level blocking of hosts that HELO with a fully qualified hostname or an IP that belongs to you. This skims off a fractional percentage of spam, but at very low risk of collateral damage.
- The need for a Domino whitelist is now well established but as of right now, I know of no available and supported solution. Raymond Neeves is no longer able to offer or support Intercept and I take one last opportunity to thank him for his excellent work in bringing the concept as far as he was able.
If Domino had a whitelist (and, pace Raymond, I am still using Intercept), then it would need to be invoked early for two reasons. Firstly, it should pre-empt blacklisting and, secondly, it renders any DNSBL look-up unnecessary for the hosts it covers. We currently whitelist c. 1/3 of all email here (2/3 of all email we accept), so this represents a considerable saving in DNS overhead.
- Next, we do a local look-up on banned host names and addresses. Again, no point in doing a DNSBL look-up on IPs we know locally we don't like.
- Next, we do a DNSBL look-up. This is a critical part of the process. The GUDASA works best when a) we choose the right DNSBLs, b) we whitelist sensibly and c) we have effective content filtering.
- After all of that, we might still not like the sender envelope.
- Or the recipient envelope: either we have explicitly blocked the address, or its local part does not exist in the Domino Directory.
- The other missing piece of functionality (not so critical as the whitelist, but it would be nice) is an ability to reject (server mail rules) based on raw MIME content. You can tell a lot about the spammishness of an email based on raw MIME. Does it contain lots of HTML comments (more than about two looks very suspicious)? Does it use non-existent tags? Does it use <FONT COLOR=#FFFFFF>? Does it contain web bugs? If we could expose raw MIME to server mail rules on itemisation by the SMTP server, that would form a useful starting point.
- And that's it for the protocol side. We now have an email from a host which does not claim to be one of our own and which is either listed in our whitelist or not listed on any DNSBL. It is from an address we have not chosen to block, is addressed to an address we have not chosen to block, and does not contain some obvious spam hallmarks. But it could still be spam (or a virus).
- So we pass the message through our virus scanner (you actually need to pay for one of these -- all of the other green boxes can be satisfied for free).
- We also check for other banned content. If it contains a .pif file, it's a virus even if the virus scanner doesn't think so.
- Cue kSpam. I dislike most content filtering solutions because a) they create false positives, b) they (often) miss real spam and c) they are high maintenance. kSpam has none of these characteristics, although I note Daniel Koffler's concerns about shared Bayesian scores.
- Here's a use for all that spam I showed you how to trap. If, after all of the foregoing checks, the email still gets through and is addressed to one of your spamtraps, we can feed it directly via mail-in to the kSpam database of known spam. Next time the Bayesian scores are recalculated, this new spam is taken into account and you didn't have to do anything to make it happen.
- And that's all she wrote. All you can do now is deliver the email. But by now it probably isn't spam and it definitely isn't a virus.
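The order of the protocol-level checks above can be sketched in a few lines of Python. This is only an illustration of the sequence, not how Domino does it: the `Policy` fields, the `protocol_phase` function and the rejection strings are all names I have made up for the purpose, and the DNSBL look-up is passed in as a function (`is_listed`) so the sketch runs without touching DNS. The one piece of real convention here is the query name, which for an IPv4 address is the reversed octets followed by the DNSBL zone.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Callable, Set, Tuple


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query: reversed IPv4 octets, then the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


@dataclass
class Policy:
    # All field names are hypothetical, chosen for this sketch only.
    our_names: Set[str] = field(default_factory=set)        # hostnames/IPs that are ours
    whitelist: Set[str] = field(default_factory=set)        # trusted connecting IPs
    banned_hosts: Set[str] = field(default_factory=set)     # local blacklist
    dnsbl_zones: Tuple[str, ...] = ()                       # DNSBLs to consult
    banned_senders: Set[str] = field(default_factory=set)
    banned_recipients: Set[str] = field(default_factory=set)
    known_local_parts: Set[str] = field(default_factory=set)  # stand-in for the Directory


def protocol_phase(policy: Policy, ip: str, helo: str, mail_from: str,
                   rcpt_to: str, is_listed: Callable[[str], bool]) -> str:
    """Return 'accept' or a 'reject: ...' reason, checking in the order described."""
    # 1. HELO claiming to be one of our own hostnames or IPs.
    if helo.strip("[]").lower() in policy.our_names:
        return "reject: HELO claims to be us"

    # 2. The whitelist pre-empts both the local blacklist and every DNSBL look-up.
    if ip not in policy.whitelist:
        # 3. Local look-up first: no point asking a DNSBL about a host we already ban.
        if ip in policy.banned_hosts:
            return "reject: locally banned host"
        # 4. DNSBL look-ups, only for hosts that reach this point.
        for zone in policy.dnsbl_zones:
            if is_listed(dnsbl_query_name(ip, zone)):
                return "reject: listed on " + zone

    # 5. Sender envelope.
    if mail_from.lower() in policy.banned_senders:
        return "reject: banned sender"

    # 6. Recipient envelope: explicitly blocked, or unknown local part.
    rcpt = rcpt_to.lower()
    local_part = rcpt.rsplit("@", 1)[0]
    if rcpt in policy.banned_recipients or local_part not in policy.known_local_parts:
        return "reject: recipient not accepted"

    return "accept"
```

In real use `is_listed` would perform an actual DNS query on the name and treat a successful answer as a hit. The point of the ordering is visible in the second step: for a whitelisted host the function never calls `is_listed` at all, which is where the saving in DNS overhead comes from.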
To summarise, the key elements of the GUDASA are:
- a whitelist, actively maintained
- well chosen DNSBLs -- once kSpam is well trained, you could even back off on some of the more aggressive ones
- kSpam -- I don't think you need any other solution at the content filtering level, and it's free
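As for the raw MIME hallmarks mentioned in the walkthrough (HTML comments, non-existent tags, white font text, web bugs), here is a rough Python sketch of what such a check might look like if the raw content were exposed. Everything in it is an assumption of mine: the function name `mime_hallmarks`, the short `KNOWN_TAGS` list (a real one would be much longer), and the definition of a web bug as an image of at most 1x1 pixels. It also assumes the HTML part has already been decoded from any base64 or quoted-printable transfer encoding.

```python
import re
from typing import List

# Small illustrative subset of real HTML tags; anything else counts as "non-existent".
KNOWN_TAGS = {
    "html", "head", "title", "meta", "body", "p", "br", "b", "i", "u", "a",
    "img", "font", "div", "span", "table", "tr", "td", "th", "ul", "ol", "li",
    "h1", "h2", "h3", "center", "hr", "strong", "em",
}

COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
TAG_RE = re.compile(r"<\s*([A-Za-z][A-Za-z0-9]*)")
WHITE_FONT_RE = re.compile(
    r"<font\b[^>]*\bcolor\s*=\s*[\"']?#?(?:ffffff|fff|white)\b", re.IGNORECASE)
IMG_RE = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
WIDTH_RE = re.compile(r"\bwidth\s*=\s*[\"']?([0-9]+)", re.IGNORECASE)
HEIGHT_RE = re.compile(r"\bheight\s*=\s*[\"']?([0-9]+)", re.IGNORECASE)


def is_web_bug(img_tag: str) -> bool:
    """Treat an image declared as 1x1 pixels or smaller as a web bug."""
    w = WIDTH_RE.search(img_tag)
    h = HEIGHT_RE.search(img_tag)
    return bool(w and h and int(w.group(1)) <= 1 and int(h.group(1)) <= 1)


def mime_hallmarks(html: str, max_comments: int = 2) -> List[str]:
    """Return a list of the spam hallmarks found in a decoded HTML body."""
    hits = []

    # More than about two HTML comments looks very suspicious.
    comments = COMMENT_RE.findall(html)
    if len(comments) > max_comments:
        hits.append("too many HTML comments (%d)" % len(comments))

    # Strip comments so their contents are not mistaken for tags.
    stripped = COMMENT_RE.sub("", html)

    unknown = sorted({t.lower() for t in TAG_RE.findall(stripped)} - KNOWN_TAGS)
    if unknown:
        hits.append("non-existent tags: " + ", ".join(unknown))

    if WHITE_FONT_RE.search(stripped):
        hits.append("white font text")

    if any(is_web_bug(tag) for tag in IMG_RE.findall(stripped)):
        hits.append("web bug")

    return hits
```

A server mail rule built on something like this could reject on any hit, or only when two or more hallmarks appear together. I would start with the latter, since a single white `<FONT>` tag on a dark background is not unheard of in legitimate mail.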
Category: Domino: Administration