Spamassassin Intro and Setup

The conversation was short, and unsuccessful.

"It's really a simple request; each of the employees needs an additional paid half week each year.", I pointed out.

"For what?", my boss asked. "Vacation? Community Service? An office retreat?"

I shuffled uncomfortably. "Well, no. Filtering spam out of their email."

It sounds ridiculous, right? It isn't. Handling spam is a major expense for corporations, even if it doesn't show up as a line item in the budget.

I get about 120 spams/day (would anyone care to bet that number will drop? Didn't think so. :-) ). That's 43,800 spams a year. Let's assume I do nothing but identify and discard the spam, and can do so in 3 seconds per message. That's pretty fast, but I'm trying to come up with a conservative estimate. Even so, I'm spending 36.5 hours a year doing nothing but filtering spam.

*sigh*

I consider it possible that, since my email address is visible on a lot of web sites and articles, I get more spam than most; let's assume I get twice as much as the average email address holder. That means that the average email recipient is spending about half a work week a year filtering spam. If that's done on work hours, the equivalent of 1 salary for every hundred employees is wasted on filtering spam.
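The arithmetic behind these estimates is easy to rerun with your own numbers. The following is a minimal sketch; the function name spam_hours_per_year is made up for illustration, and the 60 spams/day figure is simply the "half of what I get" assumption from the paragraph above.

```shell
# Hypothetical helper: hours per year spent deleting spam by hand.
#   $1 = spams received per day
#   $2 = seconds spent on each spam
spam_hours_per_year() {
    awk -v per_day="$1" -v secs="$2" \
        'BEGIN { printf "%.2f\n", per_day * 365 * secs / 3600 }'
}

spam_hours_per_year 120 3   # my mailbox: prints 36.50
spam_hours_per_year 60 3    # assumed average user: prints 18.25
```

Against a 40 hour work week, 18.25 hours is roughly the half week mentioned above.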

This is why my blood pressure goes up when I see a spammer in an interview saying, "What's the problem? If you don't like the message, just delete it."

In addition to the time spent handling it, we also have to account for the time spent reporting illegal spams, the steady interruptions, and the time and equipment spent on filtering it out. I'll try to minimize this latter cost by leading you through setting up a very powerful spam filter - Spamassassin - with the world's most common mail server - Sendmail.

Overview

Your current mail setup probably acts something like this. Incoming mail arrives at port 25 of your mail server, where Sendmail is listening for incoming connections. Sendmail verifies that the mail is destined for someone with a local account. Sendmail hands the mail off to the procmail program. If you don't have a .procmailrc in your home directory on the mail server, the message is placed directly in /var/spool/mail/{your_user_id}. If you do have a .procmailrc, procmail consults the rules (such as "place all mail from mywife@hercompany.com in the family folder") in that file to decide what to do with the message.

We're going to add a few rules to .procmailrc that first run a program that scores each message on how likely it is to be spam, and then, based on this score, filter it into an almost-certainly-spam or probably-spam folder. Mail that doesn't hit either cutoff gets sent through to the original email folder.

Spamassassin

Why Spamassassin? I've used 6 different spam filtering approaches over the past 6 years (junkfilter, spambouncer, a home-grown filter, razor1, razor2, and spamassassin). All of them helped, to some degree, but none of the others have as comprehensive a list of tests as Spamassassin. Here are the tests Spamassassin can use:

Header fields: The bulk mail software tools used by spammers have certain signatures; their message-id might have a certain form, they might always screw up some portion of writing mime headers.
Body phrase identification: Yup, you guessed it. Body parts and what you can do with them, South African banks and how you get 1/5 of the take for being someone's uncle, "This is not spam", etc. Spamassassin has an amazingly good list that does a good job filtering even if you don't want to use the other features.
Bayesian filtering: No matter how good the header and body checks are, they will always fall behind the spammers and fail to take into account the characteristics of what you consider spam and what you consider legitimate mail (called "ham"). Bayesian filtering takes folders of known spam and known ham and identifies words or phrases ("tokens") that show up mostly in spam and tokens that show up mostly in ham. When a new message is being scored in the future, if it contains a lot of spammy tokens, its spam score goes up. If it contains a lot of hammy tokens, its spam score goes down. This is a much better approach than static phrase identification as it handles the case where "Nigeria" might be a legitimate word in an email to a travel agent's office, or "breast" would be legitimate in an email to a women's clinic.
Automatic white/blacklist: In much the same way, Spamassassin keeps an Automatic WhiteList, or AWL, of sender email addresses. When a new message comes in, Spamassassin goes back to the AWL database and asks "What was the average spam score for messages from this email address and IP address?" If that number for previous messages was high (likely spam), Spamassassin assumes this new message will be as well and raises the spam score for the new one. Likewise, if the number was low or negative (likely ham), Spamassassin lowers the spam likelihood score for this message.
Manual white/blacklist: And for the times when Spamassassin just doesn't know you want commercial mail from a given vendor, you can manually say "I want all mail from someone@company.com or *@company.com" and they'll arrive at your mailbox even if the spam score is very high. It's also possible to say "all mail from *@fantasy-mail.com is spam, even if the score would otherwise be borderline."
DCC, Pyzor, Razor2: Spams, by and large, get distributed to lots of people with little or no modification. The DCC, Pyzor, and Razor projects attempt to cash in on this fact by asking people to submit a message to a central database once it has been identified as spam. If I identify a message as spam at 8:45am, I'll submit it to one of these databases. When you read the same message sent to you at 9:10am, Spamassassin asks that database, "Has anyone submitted this message as spam?". The database responds, "I'm 70% sure it is because someone reported it", and now its spam likelihood goes up.
RBL (A number are consulted; see tests.): Real-time Blackhole Lists focus on the IP addresses of the mail servers that passed the message along to you. When Spamassassin asks them about a particular mail server IP address, their response may drive the spam likelihood up because that mail server is run by a known spammer, is an open relay (a misconfigured mail server that unwittingly agreed to do the major work in sending thousands of spams), or is a dial-up modem IP address (modem connected users generally don't send mail directly; they usually hand off the message to their ISP's mail server to send).
Character set and locales: This one's easy - I have no legitimate senders that would send me mail in the GB2312 or BIG5 character sets (Chinese, I believe). I've told spamassassin that mail in non-English character sets should be marked as spam.
Positive and negative scoring: As I've mentioned, the individual spamassassin rules can either rate a message's spam likelihood up (because the rule triggers on spam) or down (because it triggers on ham). For example, few spammers use the Pine mail program on Linux to send their messages; emails with a signature showing that they were created with Pine on Linux can have their spam likelihood lowered a bit.

Note that none of the above criteria, by themselves, is enough to say a message is definitely spam; I can come up with examples for any of the above where a given rule will misfire and incorrectly increase or decrease the spam score. However, when taken together, the collection is marvelously strong and accurate at identifying spam and ham.

Software Setup

There are a number of steps to take, but many of them only need to be done once by a mail server administrator.

First off, make sure that your mail server is working correctly, accepting and delivering mail. I'm assuming you're using Sendmail on an rpm-based distribution. It's not a problem if you aren't on one; for debian users the install may be as simple as "apt-get install {packagename}", and other non-rpm distribution users are probably comfortable installing these programs from source.

Spamassassin install

Instructions and hyperlinks for a number of distributions are at the download page. I'm going off the RPM approach, so I pull down the perl-Mail-Spamassassin, spamassassin, and spamassassin-tools i386 rpms from Theo Van Dinter's site.

Most of the commands in this section should be performed by the root user.

cd ~
mkdir spamassassin
cd spamassassin
wget http://spamassassin.kluge.net/perl-Mail-SpamAssassin-2.51-2.i386.rpm
wget http://spamassassin.kluge.net/spamassassin-2.51-2.i386.rpm
wget http://spamassassin.kluge.net/spamassassin-tools-2.51-2.i386.rpm
wget ftp://ftp.kluge.net/pub/felicity/RPMS/perl-Net-DNS-0.33-0tvd.noarch.rpm

Before we can install these, we need to get some perl modules. Many of these will be right on your vendor's CD; the remainder should be at Theo's supplementary RPM site.

#For Redhat 7.2
rsync -av zaphod.stearns.org::redhatmirror/pub/redhat/linux/7.2/en/os/i386/RedHat/RPMS/perl-HTML-Parser-3.25-2.i386.rpm .
rsync -av zaphod.stearns.org::redhatmirror/pub/redhat/linux/7.2/en/os/i386/RedHat/RPMS/perl-HTML-Tagset-3.03-3.i386.rpm .
#For Redhat 7.3
rsync -av zaphod.stearns.org::redhatmirror/pub/redhat/linux/7.3/en/os/i386/RedHat/RPMS/perl-HTML-Parser-3.26-2.i386.rpm .
rsync -av zaphod.stearns.org::redhatmirror/pub/redhat/linux/7.3/en/os/i386/RedHat/RPMS/perl-HTML-Tagset-3.03-14.i386.rpm .

Now we install the Spamassassin RPMs:

rpm -Uvh perl-Mail-SpamAssassin-*.i386.rpm spamassassin-*.i386.rpm perl-HTML-Parser-*.i386.rpm perl-HTML-Tagset-*.i386.rpm perl-Net-DNS-*.noarch.rpm

On RedHat 7.2, you may need to add --nodeps if rpm complains of a missing perl(HTML::Parser); the perl-HTML-Parser obviously provides this resource but doesn't appear to correctly declare so.

To avoid the overhead of starting a fresh copy of perl each time a new mail message comes in, there's a background daemon called spamd that holds most of the spam scoring code. Let's start that up:

/etc/rc.d/init.d/spamassassin start

To check that it's running, try:

[root@slartibartfast spamassassin-kit]# netstat -anp | grep spamd
tcp   0    0 127.0.0.1:783   0.0.0.0:*  LISTEN  4753/spamd -d -c -a 
unix  2    [ ]         DGRAM            5988777 4753/spamd -d -c -a

This says that spamd is running under PID 4753 - your PID will differ. It's listening on a Unix socket and TCP port 783 but only for connections coming from localhost.

Note that we don't have to do anything about making it start on next boot as the RPM has done that for us. If you're not using rpms, use whatever approach is appropriate for starting a given service in your default runlevel; you may need to run tools like ntsysv or chkconfig or may need to rename a file in /etc/rc3.d or /etc/rc5.d .

Spamassassin Client Install

To demonstrate, I'll do all the following with a bogus user called "spamtest", which I'll add now as root.

adduser spamtest

The following steps will need to be taken for each person that would like their mail filtered; substitute the correct username everywhere you see spamtest. Everything from this point on is done as the user for whom we're filtering mail.

su - spamtest
cd ~
mkdir .spamassassin
cd .spamassassin
cp -p /usr/share/spamassassin/user_prefs.template user_prefs
cat <<EOF >>user_prefs 
rewrite_subject 1
report_header 1
use_terse_report 1
defang_mime 0
report_safe 0
use_razor2 0
use_bayes 1
auto_learn 1
ok_locales en
EOF

All of the configuration options between cat and EOF will be added to user_prefs by the cat command. See "perldoc Mail::SpamAssassin::Conf" for more info on these settings.

Let's see if spamassassin's working before we go on:

spamc -R </usr/share/doc/spamassassin-*/sample-nonspam.txt
-6.3/5.0
* -6.3 -- Contains a PGP-signed message

spamc -R </usr/share/doc/spamassassin-*/sample-spam.txt
8.4/5.0
*  0.7 -- From: does not include a real name
*  0.6 -- Invalid Date: header (not RFC 2822)
*  1.4 -- Valid-looking To "undisclosed-recipients"
*  1.5 -- BODY: Information on how to work at home (2)
*  1.5 -- BODY: Drastically Reduced
*  0.8 -- BODY: List removal information
*  0.7 -- BODY: Once in a lifetime, apparently
*  0.2 -- Date: is 12 to 24 hours before Received: date
*  0.6 -- RBL: Received via a relay in relays.osirusoft.com
          [RBL check: found 142.249.10.63.relays.osirusoft.com., type: 127.0.0.3]
*  0.4 -- Message-Id is not valid, according to RFC 2822

When the nonspam message is fed into spamc, spamc hands the text off to spamd, which calculates the actual spam score. Because the message has no spam characteristics it earns no positive points, and it gets -6.3 for being PGP signed. The final score of -6.3 is far below the 5.0 needed to categorize it as spam.

The second message has a bunch of spam characteristics. None, by themselves, are enough to categorize it as spam, but together they give it a score of 8.4.
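The total is simply the sum of the points for each rule that fired. As a rough cross-check you can add the lines up yourself. This is a sketch, not part of Spamassassin: the function name sum_rule_scores is hypothetical, and it assumes the terse report format shown above, where each rule line starts with an asterisk followed by its points.

```shell
# Hypothetical helper: add up the per-rule points in a terse report read
# from standard input. Only lines starting with "*" are counted.
sum_rule_scores() {
    awk '/^\*/ { sum += $2 } END { printf "%.1f\n", sum }'
}

# The sample-spam report from above, fed in as a here-document.
sum_rule_scores <<'EOF'
8.4/5.0
*  0.7 -- From: does not include a real name
*  0.6 -- Invalid Date: header (not RFC 2822)
*  1.4 -- Valid-looking To "undisclosed-recipients"
*  1.5 -- BODY: Information on how to work at home (2)
*  1.5 -- BODY: Drastically Reduced
*  0.8 -- BODY: List removal information
*  0.7 -- BODY: Once in a lifetime, apparently
*  0.2 -- Date: is 12 to 24 hours before Received: date
*  0.6 -- RBL: Received via a relay in relays.osirusoft.com
          [RBL check: found 142.249.10.63.relays.osirusoft.com., type: 127.0.0.3]
*  0.4 -- Message-Id is not valid, according to RFC 2822
EOF
```

This prints 8.4, matching the first line of the report. The same function can sit at the end of a pipe from "spamc -R" to check any other message.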

If you don't get something like this output (format and exact score may vary, that's OK) for the two test messages, you should take the time to figure out why before going on.

Procmail setup

Now that spamc is correctly identifying mail, let's set up the spamtest user to actually use it and filter mail.

cd ~
mkdir .procmail
touch .procmail/proclog
mkdir mail
touch mail/mbox

We need to create a .procmailrc file in /home/spamtest . If the user doesn't already have one, here are some suggested starting points. If the user does have one, add the lines between "#Spamassassin start" and "#Spamassassin end" from the appropriate example to their existing file.

If the user gets their mail from this machine via IMAP or POP:

SHELL=/bin/sh
PATH=/bin:/usr/bin
PMDIR=$HOME/.procmail
LOGABSTRACT=all
MAILDIR=$HOME/mail      #you'd better make sure it exists
LOGFILE=$PMDIR/proclog   #recommended
VERBOSE=off

#Spamassassin start
:0fw: spamassassin.lock
| /usr/bin/spamc
#Spamassassin end

In the above example, procmail sends the message through spamc, which scores it and, if it is spam, changes headers such as Subject. The message is then allowed to pass straight through to its original destination (usually /var/spool/mail/spamtest). This one folder is the IMAP INBOX.

The spam messages can now be moved from folder to folder inside the user's mailreader; this job is made easier by the modified Subject line and the X-Spam-Status and X-Spam-Level headers - read on for more detail.

If the user reads their mail right on this machine (say, with pine) use this .procmailrc instead:

SHELL=/bin/sh
PATH=/bin:/usr/bin
PMDIR=$HOME/.procmail
LOGABSTRACT=all
MAILDIR=$HOME/mail      #you'd better make sure it exists
LOGFILE=$PMDIR/proclog   #recommended
VERBOSE=off
DEFAULT=$MAILDIR/mbox


#Mailing list start
#If you subscribe to any mailing lists, you might want to filter them off first:
:0:
* ^X-BeenThere: dshield@dshield.org
dshield

:0:
* ^X-BeenThere: user-mode-linux-devel@lists.sourceforge.net
uml-devel

#Mailing list end


#Spamassassin start
:0fw: spamassassin.lock
| /usr/bin/spamc

:0:
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*
spam10

:0:
* ^X-Spam-Status: Yes
spamassassin-spam

#Spamassassin end

A little explanation is needed. The first block sets up environment variables for use in the rest of the file.

The "mailing list" block is optional. If you're subscribed to mailing lists and want to have them sent off to different folders automatically, find a line in the header that identifies the list. "X-BeenThere:" is most commonly used, but "Errors-To:", "List-Id:", "Mailing-List:", "Reply-To:", "Sender:", and "X-Loop:" are other good ones to look for.

The spamassassin block is where we finally get some useful spam filtering. The first rule (the ":0fw: spamassassin.lock" and "| /usr/bin/spamc" lines) feeds the entire message off to spamc, which hands it off to spamd for spam score calculation. spamd also adds and/or modifies headers to indicate that the message is spam; in particular, it adds a "X-Spam-Status: Yes" header for any messages with a spam score 5.0 or above, and it also adds a "X-Spam-Level: *******" with the number of asterisks equal to the integer part of the score. In other words, a spam with a score of 8.3 would get a "X-Spam-Level: ********" header.

We use the X-Spam-Level header to in effect give us two cutoffs: 5.0 and 10.0. Those with a score of 10.0 and higher get relegated to a different folder, one that we don't need to check as often, or depending on your needs, at all. These are the ones that are almost certainly spam with very little chance of false positives. With those out of the way, we can send all the remaining spams (with scores between 5.0 and 9.9) to the spamassassin-spam folder. This one should probably be checked from time to time as some messages may be incorrectly identified as spam.
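To make the two cutoffs concrete, here is a small sketch that mirrors the decision the procmail rules make. It is an illustration only: procmail itself matches on the headers, not on a number, and the function names spam_level and folder_for_score are invented for this example.

```shell
# Hypothetical helper: the asterisks that go in X-Spam-Level, one per whole
# point of the score (none for scores below 1).
spam_level() {
    awk -v score="$1" 'BEGIN {
        n = int(score + 0)
        stars = ""
        for (i = 0; i < n; i++) stars = stars "*"
        print stars
    }'
}

# Hypothetical helper: which folder the rules above would choose.
folder_for_score() {
    awk -v score="$1" 'BEGIN {
        s = score + 0                    # force a numeric comparison
        if (s >= 10)     print "spam10"
        else if (s >= 5) print "spamassassin-spam"
        else             print "mbox"
    }'
}

for score in -6.3 4.9 8.3 12.0; do
    echo "$score -> $(folder_for_score "$score") [$(spam_level "$score")]"
done
```

The order of the tests matters in the same way it does in .procmailrc: a message scoring 12 also carries "X-Spam-Status: Yes", so the ten-asterisk check has to come first.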

As a side note, if the message doesn't match either test, we have no more procmail rules in this file to specify what to do with it. In that case, the message is sent to the mail folder specified in the DEFAULT variable; in this case, /home/spamtest/mail/mbox .

With this file in place, send a test message to spamtest@mydomain.com. The message should show up in /home/spamtest/mail/mbox with minimal headers saying that the message score is less than 5; it should not include the "X-Spam-Status: Yes" or "X-Spam-Level: **..." headers.

Now bounce a spam message you've received to spamtest@mydomain.com . If the spam score is high it should show up in /home/spamtest/mail/spam10 . If the score is between 5 and 9.9, it should show up in /home/spamtest/mail/spamassassin-spam .

If this didn't work, go back and find out why. That's why we do this with a test user that won't get angry if mail is lost. :-) Some files that may help in the investigation are /var/log/maillog and /home/spamtest/.procmail/proclog ; these may tell you where the mail was sent and possibly even why. With the three test messages I sent, proclog shows where they went:

From wstearns@pobox.com  Sun Mar 23 21:55:00 2003
 Subject: quick check
   Folder: /home/spamtest/mail/mbox                                         1569
From wstearns@pobox.com  Sun Mar 23 21:56:02 2003
 Subject: *****SPAM***** wow em this summer...start now
   Folder: spamassassin-spam                                                3299
From wstearns@pobox.com  Sun Mar 23 21:58:07 2003
 Subject: *****SPAM***** Earn great money from home!           DVCMXNI
   Folder: spam10                                                          10863
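Once more mail has gone through, a per-folder tally of proclog gives a quick picture of how the filter is behaving. This is a sketch that assumes the log format shown above; the function name folder_counts is hypothetical.

```shell
# Hypothetical helper: count delivered messages per folder in a procmail
# log. Relies on the "Folder: name size" lines that procmail writes when
# LOGABSTRACT is set.
#   $1 = path to the log file
folder_counts() {
    awk '$1 == "Folder:" { count[$2]++ }
         END { for (f in count) print count[f], f }' "$1" | LC_ALL=C sort -k2
}

# Example, as the spamtest user:
#   folder_counts "$HOME/.procmail/proclog"
```

For the three test messages above it would report one message each for /home/spamtest/mail/mbox, spam10, and spamassassin-spam.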

While you're doing your investigation, you may wish to temporarily restore the user's original .procmailrc or simply rename this new one to something else so that more incoming mail is not misdirected.

The result

At this point, I'm going to leave you to set up your users using this approach. Many of the checks are automatically working, including the Auto-Whitelist and Bayesian filtering, although both could be helped if you explicitly feed them known ham and spam. Razor2 is an excellent addition, and will probably show up in a future version of this article. The RBL checks should be working as well; some of your messages should include lines like:

*  0.6 -- RBL: Received via a relay in relays.osirusoft.com
          [RBL check: found 142.249.10.63.relays.osirusoft.com., type: 127.0.0.3]
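Feeding the Bayesian database by hand is done with the sa-learn tool that ships with Spamassassin. The sketch below assumes you have one mbox folder of confirmed spam and one of confirmed ham; the function name and the SA_LEARN variable are made up for this example, and you should check the sa-learn documentation for your version, since its options have changed between releases.

```shell
# Hypothetical wrapper around sa-learn. SA_LEARN can be set if the program
# lives somewhere unusual.
#   $1 = mbox file containing only spam
#   $2 = mbox file containing only ham
train_bayes() {
    "${SA_LEARN:-sa-learn}" --spam --mbox "$1" &&
    "${SA_LEARN:-sa-learn}" --ham --mbox "$2"
}

# Example, as the user whose mail is being filtered:
#   train_bayes "$HOME/mail/spam10" "$HOME/mail/mbox"
```

Only feed it folders you have checked by hand; a spam that slipped through into the ham folder teaches the filter exactly the wrong thing.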

With spamassassin in place, your users' mail should now be automatically filtered into folders. They still get just as much, but it's not a constant interruption. By filtering into "almost certainly spam" and "probably spam" folders, you actually have cut down on the number of messages to which your users have to actively pay attention. I've spent quite a bit of time teaching my spamassassin installation about known spam and ham, so I feel comfortable making the second cutoff at 8.0 (spams with a score higher than this I won't look at at all, but they're still available if I later find out that something was misclassified). This means I never look at 78% of the incoming spam, changing my yearly spam week into a yearly spam day.

And I figure I can spend 2 of my 4 free days this year writing a spam filtering article for you. :-)

Additional resources

The aracnet procmail howto
The Procmail home site, with lots of links to howtos
Spamassassin home site.

Advanced topics

Manual blacklist

By adding the following lines to ~/.spamassassin/user_prefs , you tell spamassassin that mail from any of these domains gets a +100 spam score, effectively blocking them. This list works for me, but you may wish to at least briefly look it over to see if it works for you:

blacklist_from sde@spledee.com
blacklist_from *@163.com
blacklist_from *@163.net
blacklist_from *@3fec.com
blacklist_from *@4urop.com
blacklist_from *@bluelightoffers.com
blacklist_from *@bonanzaoffers.com
blacklist_from *@deal-seeker.com
blacklist_from *@dealpatrol.com
blacklist_from *@direct.email-publisher.com
blacklist_from *@discountcertificates.com
blacklist_from *@drm.email-publisher.com
blacklist_from *@e-mailpromo.com
blacklist_from *@fantastic-bargain.com
blacklist_from *@fantasy-mail.com
blacklist_from *@free2sample.com
blacklist_from *@gr8dls.com
blacklist_from *@greatdealsdepot.net
blacklist_from *@hi-speedemail.com
blacklist_from *@hi-speedmediaoffers.net
blacklist_from *@hi-speedoffers.net
blacklist_from *@hispeedmediaoffers.com
blacklist_from *@hispeedoffers.net
blacklist_from *@hsm-mailerdirect.com
blacklist_from *@hsmediadirect.com
blacklist_from *@hsmoffers.net
blacklist_from *@hsmspecials.net
blacklist_from *@itsremarkable.com
blacklist_from *@ixpweb.com
blacklist_from *@j4un.com
blacklist_from *@j4yn.com
blacklist_from *@jfyn.com
blacklist_from *@jumpjive.com
blacklist_from *@justforyou-mail.com
blacklist_from *@justforyounewsletter.email-publisher.com
blacklist_from *@lessthanyouthought.com
blacklist_from *@lifesaversdirect.com
blacklist_from *@lotto-mail.com
blacklist_from *@mail.krazykash.com
blacklist_from *@marinedigital.com
blacklist_from *@mxdat.com
blacklist_from *@mydailyoffers.com
blacklist_from *@mypremiumoffers.com
blacklist_from *@netadoffers.com
blacklist_from *@offertoday.com
blacklist_from *@optin-offers.net
blacklist_from *@save99.com
blacklist_from *@savingshaus.com
blacklist_from *@sendgreatoffers.com
blacklist_from *@speedyvalues.com
blacklist_from *@somer.ew01.com
blacklist_from *@super-bargains.net
blacklist_from *@timesaversdirect2u.com
blacklist_from *@top-brands.net
blacklist_from *@vendeeamerica.com
blacklist_from *@yourmailsource.com
blacklist_from *@zaushon.com
blacklist_from *@*.ew01.com
blacklist_from *@*.speedi-list.com
blacklist_from *@*.verticalresponse.com
blacklist_from *@*hspeedm.com
blacklist_from *@*.*.caumraen.com
blacklist_from *@*.*.dewueld.com
blacklist_from *@*.*.festizone.com
blacklist_from *@*.*.inhauser.com
blacklist_from *@*.*.laufhuasn.com
blacklist_from *@*.*.nazlwons.com
blacklist_from *@*.*.optewian.com
blacklist_from *@*.*.pewuiea.com
blacklist_from *@*.*.queaton.com
blacklist_from *@*.*.rosevse.com
blacklist_from *@*.email-deliveries.net
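
The opposite case, mail you always want to receive, uses whitelist_from with the same wildcard syntax. A minimal sketch follows; the helper name add_whitelist is hypothetical and the addresses are the placeholder examples used earlier in this article.

```shell
# Hypothetical helper: append whitelist_from lines to a prefs file.
#   $1 = prefs file, remaining arguments = addresses or wildcard patterns
add_whitelist() {
    prefs="$1"
    shift
    for addr in "$@"; do
        printf 'whitelist_from %s\n' "$addr" >>"$prefs"
    done
}

# Example (quote the patterns so the shell leaves the * alone):
#   add_whitelist "$HOME/.spamassassin/user_prefs" 'someone@company.com' '*@company.com'
```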

Sitewide settings

If you want to make any user configuration options (like the above lines) apply to all spamassassin users, place them in /etc/mail/spamassassin/local.cf .

Sitewide filtering

Warning - currently untested, use at your own risk.

By the way, did I mention this hasn't been tested even once?

The obvious next question is, "Can I just get spamassassin to process everyone's mail without having to mess with everyone's .procmailrc?"

Obviously the answer's yes, or I wouldn't have asked the question. *grin*

Please make sure you have spamassassin working correctly for at least a test user before going for the big Kahuna. Screwing up for all your mail users tends to hurt your chances for long-term employment.

With the warnings out of the way, here's what we'll do. We're going to run spamc out of the system-wide procmail configuration file, /etc/procmailrc . Requests made in here are performed for all locally delivered messages.

Set up a conservative set of configuration choices in /etc/mail/spamassassin/local.cf . In particular, it might be a good idea to raise the default cutoff for spam to 8.0, at least initially. If everything works and you're not getting any false positives in a week, lower it to 7, and then 6.5 or 6 a week later. This gives the AWL and bayes databases a chance to learn a bit before they're really crucial.

required_hits 8
rewrite_subject 1
report_header 1
use_terse_report 1
defang_mime 0
report_safe 0
use_razor2 0
use_bayes 1
auto_learn 1
ok_locales en
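
Stepping the cutoff down week by week means editing the required_hits line each time. One way to script that is sketched below. set_required_hits is a hypothetical helper; it only changes an existing required_hits line (it will not add one), and spamd will most likely need a restart before it notices the change.

```shell
# Hypothetical helper: rewrite the required_hits line in a config file.
#   $1 = config file, $2 = new cutoff
set_required_hits() {
    cf="$1"
    cutoff="$2"
    tmp="$(mktemp)" || return 1
    if sed "s/^required_hits[[:space:]].*/required_hits $cutoff/" "$cf" >"$tmp"; then
        cat "$tmp" >"$cf"    # overwrite in place, keeping owner and mode
    fi
    rm -f "$tmp"
}

# Example, as root, one week after going live:
#   set_required_hits /etc/mail/spamassassin/local.cf 7
```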

Now tell procmail to run spamc on everyone's mail. Add these to /etc/procmailrc :

DROPPRIVS=yes

:0fw
| /usr/bin/spamc

Finally, remove the spamc recipe (the ":0fw: spamassassin.lock" and "| /usr/bin/spamc" lines) from everyone's individual /home/{user}/.procmailrc files. The scoring and header changes were already done when the message first passed through /etc/procmailrc; the custom requests like "filter all messages with a score of 10 or higher into this folder" still need to be done locally in /home/{user}/.procmailrc .

If you have a small number of users that specifically don't want their spam filtered, add a line like the following for each user or domain to /etc/mail/spamassassin/local.cf :

all_spam_to masochist1@mydomain.org
all_spam_to *@masochists.org

Existing mail in an IMAP folder

(Many thanks to Marion Bates for contributing this section.)

The above approach works fine for new mail, but what about mail that a user has already received?

If that mail is in an imap folder, you're in luck. Roger Binns has written IMAP Spam Begone to reprocess mail that is already in an IMAP folder.

There are a couple of assumptions:

You already have spamassassin installed and configured on whatever machine this script is running on.
Your main imap inbox is /var/spool/mail/username
You store custom imap mail folders in ~/mail/ and your mail client is configured accordingly. For Apple's Mail.app program, go to Preferences->Account Information and select the Advanced tab, and enter "~/mail/" under "IMAP Path Prefix". For Netscape, check under Edit, Preferences, Mail & Newsgroups, Mail servers, select the appropriate Incoming Mail Server, click Edit..., select Advanced, and enter "~/mail" in the "IMAP server directory" box. You'll also want to uncheck "Show only subscribed folders". Restart Netscape for this to take effect.

Get isbg from http://www.rogerbinns.com/isbg/isbg.py .
chmod 755 isbg.py
Make a mail folder to hold all your suspected spam (in this example, ~/mail/ilovespam)

Run isbg.py like so:

./isbg.py --imaphost your-mailserver.com --imapinbox /path/to/your/inbox --spaminbox /path/to/your/spam/folder --delete --expunge

Example:

./isbg.py --imaphost mail.goober.com --imapinbox /var/spool/mail/joeblow --spaminbox /home/joeblow/mail/ilovespam --delete --expunge

It will prompt you for your imap password, then it will go through the inbox you specified (which may take a while), and then report what it found -- something like:

4 spams found in 6 messages

The --delete option marks the messages for deletion from your inbox. The --expunge option used in conjunction with --delete will cause the marked messages to be actually removed from your inbox (they will still be in ilovespam though).

isbg is smart, and will only look through messages it hasn't seen before (i.e., it won't go through your ENTIRE inbox each time -- only new messages will be scanned.) It does this via imap's use of unique message ids.

If you run isbg with the --savepw option the first time, it will remember your imap password (saved on disk in an obfuscated way) such that you can then make a cron job to run the script automatically.
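
A cron job for this could be a small wrapper script along these lines. This is only a sketch: the function name, the ISBG variable, the script name, and the schedule in the comment are assumptions, and the host and folder names are the placeholders from the example above.

```shell
# Hypothetical wrapper for unattended runs. Assumes isbg.py was already run
# once by hand with --savepw so that it no longer prompts for a password.
#   $1 = IMAP host, $2 = inbox, $3 = spam folder
run_isbg() {
    "${ISBG:-$HOME/isbg.py}" --imaphost "$1" --imapinbox "$2" \
        --spaminbox "$3" --delete --expunge
}

# Example call:
#   run_isbg mail.goober.com /var/spool/mail/joeblow /home/joeblow/mail/ilovespam
#
# Example crontab line (every 15 minutes), if the function and the call
# above are saved in a script named $HOME/bin/run-isbg.sh:
#   */15 * * * * $HOME/bin/run-isbg.sh
```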

Because the isbg script and spamassassin can run on a machine other than the mailserver, this approach can be used to filter mail on a remote IMAP mailserver that may not be able to run Spamassassin directly.

Credits

The Spamassassin team gets 2 thumbs up from me for an excellent tool! This is the first spam filtering tool I've used that I feel confident is doing an accurate job of identifying and filtering the spam onslaught.

Bill Stearns wrote the main text of the article. Marion Bates contributed the section on ISBG. Both Marion and Drew Como were kind enough to review an early draft of this article.