Anti-spam and spam filtering techniques
Written by Jesmond Darmanin on September 11, 2011
Spam is a waste of time and resources. Sorting genuine mail from spam every day takes a sizeable amount of time (and therefore money) over the course of a year. Installing an anti-spam filter helps ensure that minimal time is spent sifting through unwanted emails every morning. Whilst no anti-spam software can guarantee to remove all spam automatically, good spam controls will intercept most of it. Apart from saving the time that would otherwise be spent manually separating spam from valid messages, an anti-spam filter also removes spam whose content could cause embarrassment or offense.
Various anti-spam techniques have been developed and implemented since spam first started infiltrating people’s inboxes.
Spam filters combine several of these techniques to separate genuine messages from junk mail. They rely on measures such as the following:
- Word lists – Lists of words that are known to be associated with spam and are commonly found in unsolicited mail messages, such as ‘sex’ or ‘mortgage’
- Blacklists and Whitelists – Blacklists contain the known IP addresses (or email addresses) of spam senders, while whitelists contain those of trusted, non-spam senders (e.g. friends and family). Addresses in your contact list are typically whitelisted automatically, so emails from these addresses are delivered directly to your inbox
Some ISPs receive requests from legitimate companies asking to be added to the ISP's whitelist. To be approved, companies are either required to pay or must pass a series of tests proving that they do not send out spam
- Trend Analysis – Analyzing the history of email sent by an individual reveals trends that help assess whether a new email from them is likely to be genuine or spam. This can be an effective technique for reducing false positives and improving spam detection rates
- Learning or Content filters – Learning filters, such as Bayesian filters, examine the content of each email sent to and from an email address. By learning the word frequencies and patterns associated with spam and non-spam messages, they can recognize which messages are valid and should be directed to the inbox, and which are spam and should be sent to the junk folder.
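The learning-filter idea can be sketched as a tiny multinomial naive Bayes classifier, the textbook form of Bayesian spam filtering. This is a minimal illustration under stated assumptions, not how any particular product works: the class name `NaiveBayesSpamFilter`, the tokenizer and the add-one smoothing are choices made for the example, and a real filter would be trained on thousands of labelled messages rather than four.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Hypothetical minimal naive Bayes filter with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        # Learn word frequencies from a message the user has labelled "spam" or "ham".
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def _log_score(self, tokens: list[str], label: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_messages = sum(self.message_counts.values())
        # Prior: the fraction of training messages that carry this label.
        score = math.log(self.message_counts[label] / total_messages)
        for token in tokens:
            # Add-one smoothing so a word never seen with this label does not zero the score.
            count = self.word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        return score

    def classify(self, text: str) -> str:
        tokens = tokenize(text)
        spam = self._log_score(tokens, "spam")
        ham = self._log_score(tokens, "ham")
        return "spam" if spam > ham else "ham"


spam_filter = NaiveBayesSpamFilter()
spam_filter.train("cheap viagra buy now", "spam")
spam_filter.train("win money now click here", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch with the team on friday", "ham")
print(spam_filter.classify("buy cheap viagra"))  # spam
print(spam_filter.classify("agenda for friday meeting"))  # ham
```

Log-probabilities are summed instead of multiplying raw probabilities so that long messages do not underflow to zero.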
These techniques work together to make spam filtering effective; relying on just one method risks losing valid emails. For example, banks and other financial institutions frequently use words like ‘mortgage’ in valid emails, even though the word is also common in spam, so a filter relying on word lists alone could send genuine banking emails to the spam folder. By combining all these techniques, the spam filter can recognize that not every message sent to the bank containing the word ‘mortgage’ is unsolicited mail.
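A toy rule-based version of this combination shows the effect. It assumes hypothetical sender lists, a hypothetical word list and an arbitrary threshold of two distinct spam words; none of these values come from a real product. The whitelist lets the bank's ‘mortgage’ email through, a single keyword from an unknown sender is not enough to mark a message as spam, and a message stacking several spam words is still caught.

```python
import re

# Hypothetical lists for illustration; a real filter maintains and updates these.
SPAM_WORDS = {"viagra", "mortgage", "lottery", "winner"}
WHITELIST = {"loans@bank.example"}
BLACKLIST = {"offers@spam.example"}


def classify(sender: str, body: str, threshold: int = 2) -> str:
    """Return 'inbox' or 'junk' by combining sender lists with a word-list score."""
    sender = sender.lower()
    # Sender lists are checked first and override the content check.
    if sender in WHITELIST:
        return "inbox"
    if sender in BLACKLIST:
        return "junk"
    # Otherwise count how many distinct spam words appear in the body.
    words = set(re.findall(r"[a-z0-9]+", body.lower()))
    hits = len(words & SPAM_WORDS)
    return "junk" if hits >= threshold else "inbox"


print(classify("loans@bank.example", "Your mortgage statement is ready"))  # inbox
print(classify("friend@home.example", "Thinking about a new mortgage?"))  # inbox
print(classify("stranger@unknown.example", "Mortgage winner! Claim your lottery prize"))  # junk
```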
Spam Filters and their Limitations
Although spam filters are the most successful type of anti-spam software, they have certain limitations.
Firstly, spammers continuously look for ways to bypass anti-spam software. For example, to counter word lists, spammers deliberately misspell words like ‘Viagra’ as ‘v1agra’ or other variants. The word list must therefore be updated continuously to include these alternative spellings if the spam filter is to keep filtering mail effectively.
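One common countermeasure is to normalize text before matching it against the word list. The sketch below assumes a small, hand-picked table of look-alike characters (`1` to `i`, `0` to `o`, `@` to `a`, `$` to `s` and so on); the table, the word list and the function names are illustrative assumptions, and every new trick spammers invent would still need a new entry, which is exactly the maintenance burden described above.

```python
import re

# Assumed look-alike table; real filters use much larger, regularly updated tables.
LOOKALIKES = str.maketrans({"1": "i", "!": "i", "0": "o", "3": "e",
                            "4": "a", "@": "a", "$": "s", "5": "s"})

SPAM_WORDS = {"viagra", "casino"}  # hypothetical word list


def normalize(word: str) -> str:
    # Map look-alike characters to letters, then drop anything that is not a letter.
    return re.sub(r"[^a-z]", "", word.lower().translate(LOOKALIKES))


def word_list_hits(text: str) -> set[str]:
    # Split on whitespace so symbols such as '1', '@' and '$' survive until normalized.
    return {normalize(token) for token in text.split()} & SPAM_WORDS


print(normalize("v1agra"))  # viagra
print(sorted(word_list_hits("Buy v1agra and c@$1n0 chips")))  # ['casino', 'viagra']
```

Mapping characters this way is a trade-off: a character such as `1` could stand for either `i` or `l`, so an aggressive table can also turn innocent words into matches and add false positives.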
There is also the problem of false negatives and false positives. As vendors and ISPs have improved their ability to block spam, they have unfortunately also started blocking more genuine mail.
For example, to avoid the filters, spammers began adding “Re:” to their subject lines so that their messages looked like replies. Many spam filters were altered to block this tactic, but they then fell into their own trap and occasionally stopped legitimate mail.
Effective spam filters are less likely to classify a genuine email as spam (a false positive), which reduces the risk of losing genuine emails. The less effective a spam filter is, the higher the risk that it will classify spam as genuine mail (a false negative). The problem with false negatives is that not all spam is simply a link to another site or an advert for some product; some spam contains malware, and if such a message is believed to be genuine, the malware can be activated just by viewing the message in the preview pane, by opening it, or by opening an attachment.
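To make the two error types concrete, the sketch below measures them for a batch of messages whose true labels are known. The function name and the sample figures are invented for illustration; they are not measurements of any real filter.

```python
from collections import Counter


def error_rates(results: list[tuple[bool, bool]]) -> dict[str, float]:
    """Each pair is (is_spam, flagged_as_spam) for one message."""
    counts = Counter(results)
    genuine = counts[(False, False)] + counts[(False, True)]
    spam = counts[(True, True)] + counts[(True, False)]
    return {
        # Share of genuine mail wrongly sent to the junk folder.
        "false_positive_rate": counts[(False, True)] / genuine if genuine else 0.0,
        # Share of spam wrongly delivered to the inbox.
        "false_negative_rate": counts[(True, False)] / spam if spam else 0.0,
    }


# Invented sample: 100 genuine emails (5 flagged) and 100 spam emails (10 missed).
sample = [(False, False)] * 95 + [(False, True)] * 5 + [(True, True)] * 90 + [(True, False)] * 10
print(error_rates(sample))  # {'false_positive_rate': 0.05, 'false_negative_rate': 0.1}
```

Tightening a filter usually lowers the false-negative rate at the cost of a higher false-positive rate, which is the trade-off described above.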
Because some genuine emails can be misclassified as spam and sent to the junk mail folder, it is important to check both folders manually: the inbox, to ensure that no spam got through, and the junk mail folder, to check that no valid emails were marked as spam by mistake. When checking the junk folder, avoid viewing messages in the preview pane or opening them, since this could trigger malware. Double-checking is important to avoid losing important emails, but it also illustrates another problem that spam causes: wasted time.
Sender Policy Framework
Sending an email that appears to come from a trusted domain such as ‘yahoo.com’ is a simple process, and spammers use it to their full advantage. When the sender address is forged, it is difficult for ISPs to determine who the real sender is and subsequently cancel their account.
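The Sender Policy Framework (SPF) aims to prevent spammers from using forged sender addresses by checking that the sending server is authorized to send email for that specific domain. The domain's owner publishes the list of authorized servers in the domain's DNS records, and receiving mail servers check incoming messages against it, so email sent from a forged address can be rejected.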
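Under SPF the list of authorized servers is published as a DNS TXT record, and the receiving server compares the connecting server's IP address with it. The sketch below assumes the record text has already been fetched from DNS and handles only a small subset of the syntax defined in RFC 7208: the `ip4`, `ip6` and `all` mechanisms with their `+`, `-`, `~` and `?` qualifiers. The function name `spf_check` and the example record, which uses the 192.0.2.0/24 documentation address range, are illustrative assumptions.

```python
import ipaddress

# Result names for each SPF qualifier prefix, as defined in RFC 7208.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_check(record: str, client_ip: str) -> str:
    """Evaluate a simplified SPF record against the sending server's IP address.

    Only ip4, ip6 and all are handled; include, a, mx, redirect and other terms
    need further DNS lookups and are simply skipped in this sketch.
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record
    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        qualifier = "+"  # a mechanism with no prefix means "pass"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name == "all":
            return QUALIFIERS[qualifier]
        if name in ("ip4", "ip6") and value:
            network = ipaddress.ip_network(value, strict=False)
            expected_version = 4 if name == "ip4" else 6
            if network.version == expected_version == ip.version and ip in network:
                return QUALIFIERS[qualifier]
    return "neutral"  # no mechanism matched


record = "v=spf1 ip4:192.0.2.0/24 -all"
print(spf_check(record, "192.0.2.25"))  # pass
print(spf_check(record, "203.0.113.9"))  # fail
```

What happens after a `fail` result (rejecting the message, or just marking it as suspicious) is decided by the receiving server's own policy.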
SPF can stop spam to some extent, but it is only one tool and will not address every issue. For example, SPF can help block the thousands of messages that viruses or worms on an infected machine send to everyone listed in your address book, when those messages use forged sender addresses.
Ultimately, SPF is designed to address a vulnerability in the Simple Mail Transfer Protocol (SMTP), the main protocol used for sending email: SMTP has no built-in mechanism for verifying the sender's address.
Challenge-Response Systems
Challenge-Response (CR) systems maintain a list of permitted senders, effectively a whitelist. Anyone whose address is not on the list who sends an email is sent a challenge, which is usually completed by clicking on a URL, repeating a displayed code or sending a reply email. Once the challenge is completed successfully, the new sender is added to the list of permitted senders and the original email is delivered. This method is often used when signing up to newsletters. The theory behind this method is that spammers using fake sender addresses will never receive the challenge and so cannot authenticate themselves, whilst spammers who do use real addresses will not be able to reply to all of the challenges.
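The flow described above can be sketched as a small class that keeps the whitelist, holds mail from unknown senders and releases it once a challenge token comes back. The class and method names are hypothetical, and actually emailing the challenge (for example as a link containing the token) is left out; the sketch only uses Python's standard `secrets` module to make tokens hard to guess.

```python
import secrets
from typing import Optional


class ChallengeResponseFilter:
    """Hypothetical challenge-response filter: unknown senders must return a token."""

    def __init__(self, permitted: set[str]) -> None:
        self.permitted = {address.lower() for address in permitted}  # the whitelist
        self.pending: dict[str, tuple[str, str]] = {}  # token -> (sender, message)
        self.inbox: list[str] = []

    def receive(self, sender: str, message: str) -> Optional[str]:
        """Deliver mail from permitted senders; otherwise hold it and return a token."""
        sender = sender.lower()
        if sender in self.permitted:
            self.inbox.append(message)
            return None
        token = secrets.token_urlsafe(16)  # random and hard to guess
        self.pending[token] = (sender, message)
        return token  # a real system would send this to the sender, e.g. in a URL

    def respond(self, token: str) -> bool:
        """Complete a challenge: whitelist the sender and release the held message."""
        entry = self.pending.pop(token, None)
        if entry is None:
            return False  # unknown or already-used token
        sender, message = entry
        self.permitted.add(sender)
        self.inbox.append(message)
        return True


cr = ChallengeResponseFilter({"alice@home.example"})
cr.receive("alice@home.example", "Hi from Alice")  # delivered immediately
token = cr.receive("bob@work.example", "Hello from Bob")  # held pending a challenge
print(cr.inbox)  # ['Hi from Alice']
cr.respond(token)
print(cr.inbox)  # ['Hi from Alice', 'Hello from Bob']
```

A spammer who forged Bob's address never sees the token, so the held message is never released.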
Spam is likely to be around for as long as the Internet is, and spammers will keep coming up with new ideas and techniques to spam users and bypass anti-spam systems. Spam is effectively an epidemic with no proven cure; however, preventative measures can always be put in place to combat spam and the viruses it carries as much as possible and keep your computer clean and virus-free.