It is somewhat surprising that, in 2012, we are still struggling fighting spam. In fact, any victory we score against botnets is just temporary, and the spam levels raise again after some time. As an example, the amount of spam received worldwide dropped dramatically when Microsoft shut down the Rustock botnet, but has been rising again since then.
For these reasons, we need new techniques to detect and block spam. Current techniques mostly fall in two categories: content analysis and origin analysis. Content analysis techniques look at what is being sent, and typically analyze the content of an email to see if it is indicative of spam (for example, if it contains words that are frequently linked to spam content). Origin analysis techniques, on the other hand, look at who is sending an email, and flag the email as spam if the sender (for example the IP address the email is coming from) is known to be malicious. Both content analysis and origin analysis techniques fall short and have problems in practice. For instance, content analysis is usually very resource intensive, and cannot be run on every email sent to large, busy mail servers. Also, it can be evaded by carefully crafting the spam email. On the other hand, origin analysis techniques often have coverage problems, and fail to detect as malicious many sources that are actually sending out spam.
In our paper B@BEL: Leveraging Email Delivery for Spam Mitigation, that got presented at the USENIX Security Symposium last August, we propose to look at how emails are sent instead. The idea behind our approach is simple: the SMTP protocol, which is used to send emails on the Internet, follows Postel's Law, which states: "Be liberal in what you accept, but conservative in what you send". As a consequence of this, email software developers can come up with their own interpretation of the SMTP protocol, and still be able to successfully send emails. We call these variations of the protocol SMTP dialects. In the paper we show how it is possible to figure out which software (legitimate of malicious) sent a certain email just by looking at the SMTP messages exchanged between the client and the server. We also show how it is possible to enumerate the dialects spoken by spamming bots, and leverage them for spam mitigation.
Although not perfect, this technique allows, if used in conjunction with existing ones, to catch more spam, and it is a useful advancements in the war against spamming botnets.
Gianluca Stringhini is a PhD candidate working as research assistant at UC Santa Barbara. His research interests are Network Security, Botnets, and Spam Mitigation. You can follow him on Twitter at @gianlucaSB