SpamAssassin Rules – a layman’s interpretation


I’ve been using SpamAssassin for some time now, personally and at work. I’ve been working with SpamAssassin rules to deal with spam that gets through the other filters and makes it to user mailboxes. I’ve been writing these rules in a very static way, and they are not very effective because spammers frequently change their spelling. I did a search online and found a few good resources for figuring out how to write better SpamAssassin rules, but nothing really complete or written for beginners in plain English. SpamAssassin is written in Perl, so it uses Perl regular expressions (regex) in its configuration. Since I am using a version of SpamAssassin compiled for Windows in the NoSpamToday product, I edit the local.cf file in my NST installation folder. This is where I add my custom keyword filters.

Before we get started, keep in mind that I’ll be constantly updating this post with the most recent information.

First, I’ll assume you at least know what SpamAssassin is, whether or not you are using NST (NoSpamToday, for future reference). A custom keyword filter basically has 3 lines of text for each word. The first line tells SA (SpamAssassin, for future reference) where to look (i.e. body, subject, etc.) and what to look for. The second line is the description, saying basically what the filter is doing. The last line is the score that SA adds to a message matching the first line. I’m going to post the actual content of this post as an extended entry, so the main body won’t take up your entire screen. Click the title of this post to read more.

Things to know:

Format: The format of a custom keyword filter will be like this…

header SUBJ_PHRASE Subject =~ /\bsmall\b/i
describe SUBJ_PHRASE Subject contains "small"
score SUBJ_PHRASE 1.0

The above rule scans the subject of the message for the word small. It will be an exact, whole-word match because of the \b before and after the word itself: \b marks a word boundary, so “small” matches but “smaller” does not. Any message with a subject containing the word “small” gets 1.0 added to its spam score.
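To see what the \b anchors do before putting a rule into local.cf, the pattern can be tried outside SpamAssassin. The following is only a rough sketch in Python rather than Perl; for a plain word like this one, Python's re module handles \b and case-insensitive matching the same way. The helper name subject_hits is made up for this example and is not part of SpamAssassin or NST.

```python
import re

# Same pattern as the SUBJ_PHRASE rule; re.IGNORECASE plays the role of the trailing /i.
SUBJ_PHRASE = re.compile(r"\bsmall\b", re.IGNORECASE)

def subject_hits(subject: str) -> bool:
    """Return True if the subject would trigger the rule (hypothetical helper)."""
    return SUBJ_PHRASE.search(subject) is not None

if __name__ == "__main__":
    for s in ["Small loans available", "A smaller price", "Too SMALL?"]:
        print(s, "->", subject_hits(s))
```

The second subject does not hit, because the "e" after "small" means there is no word boundary there.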

Some things to know: wild cards and syntax

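The m/ and the / denote the start and end of the expression that you are looking for. The leading m is optional when the delimiter is /, which is why a rule can start with just a slash.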

() denote a group (a ‘regexp string’): it says check for the string in parentheses at this point in the entire string. You can use it to create ‘or this or this’ scenarios using the pipe | symbol. After the closing parenthesis (or after a single character or class) you can also say how many ‘hits’ you are looking for by using the following characters:

* Match 0 or more times
+ Match 1 or more times
? Match 1 or 0 times
{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m times
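As a quick illustration of groups, the pipe, and the repeat characters working together, here is a small sketch, again in Python as a stand-in for Perl (these particular constructs are written the same way in both). The pattern and the helper name hit are invented for the example.

```python
import re

def hit(pattern: str, text: str) -> bool:
    """True if the pattern is found anywhere in the text, ignoring case."""
    return re.search(pattern, text, re.IGNORECASE) is not None

# (cheap|fre+)  : either "cheap" or "fre" followed by one or more extra e's
# ( +trial)?    : an optional group, one or more spaces and then "trial"
# !{2,}         : at least two exclamation marks
OFFER = r"\b(cheap|fre+)( +trial)? offer!{2,}"

if __name__ == "__main__":
    for text in ["Cheap offer!!", "FREE   trial offer!!!", "free offer!", "Freeee offer!!"]:
        print(repr(text), "->", hit(OFFER, text))
```

Only "free offer!" is rejected here, because {2,} demands at least two exclamation marks.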

The square brackets are for classes, and a class matches one character from the set. E.g. [a-z] says any one lowercase character between a and z. [A-Z] is the same for uppercase and [a-zA-Z] is both.

So if you did m/Hello/ it would match Hello. If you did m/Hello/i it would also match HeLLO or heLLO, as the trailing i says ‘ignore case’.

If you did Hell([a-z]) then Hella, Hellb, Hellc … Hellz would match.
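The case flag and the character class can be checked the same way. This is a sketch in Python standing in for Perl; the last pattern is my own illustration of how a class can catch look-alike spellings (the characters @ and 4 are just assumed substitutions, not taken from real spam).

```python
import re

# Without a flag the match is case-sensitive, as with m/Hello/.
print(bool(re.search(r"Hello", "heLLO")))                 # False
# re.IGNORECASE corresponds to the trailing i in m/Hello/i.
print(bool(re.search(r"Hello", "heLLO", re.IGNORECASE)))  # True

# Hell([a-z]) needs exactly one lowercase letter right after "Hell".
HELL = re.compile(r"Hell([a-z])")

# A class can also list look-alike characters swapped in for a letter.
SMALL_VARIANTS = re.compile(r"\bsm[a@4]ll\b", re.IGNORECASE)

if __name__ == "__main__":
    for word in ["Hella", "Hellz", "Hell9", "HellA"]:
        print(word, "->", HELL.search(word) is not None)
    for subject in ["Sm@ll prices", "sm4ll fees", "smell this"]:
        print(subject, "->", SMALL_VARIANTS.search(subject) is not None)
```

Writing one class in place of a fixed letter is one way to keep a rule from going stale each time the spelling changes slightly.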

m/Remove "[%*!&,]+" to make the link working!/i

That says match the phrase Remove "X" to make the link working!, where X is one or more of the characters % * ! & , in any combination.
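Here is the same expression tried against a few sample lines, as a sketch in Python standing in for Perl. It assumes the quotes in the rule are plain double quotes and that the words are separated by single spaces; the sample strings and the helper name link_trick are invented.

```python
import re

# Inside [ ] the characters % * ! & , are all taken literally;
# the + after the class means one or more of them in a row.
LINK_TRICK = re.compile(r'Remove "[%*!&,]+" to make the link working!', re.IGNORECASE)

def link_trick(body: str) -> bool:
    """Return True if the body text contains the phrase (hypothetical helper)."""
    return LINK_TRICK.search(body) is not None

if __name__ == "__main__":
    samples = [
        'Remove "%" to make the link working!',
        'remove "*!&" to make the link working!',
        'Remove "" to make the link working!',
        'Remove "x" to make the link working!',
    ]
    for s in samples:
        print(s, "->", link_trick(s))
```

The last two samples do not match: the empty quotes fail because + needs at least one character, and x is not in the class.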

About Joe

I am the author of this blog, IT engineer, husband, father, and somewhat of a nerd.

Posted on February 6, 2007, in Professional/Tech.