The spam filtering setup on our server is pretty good – SpamAssassin with Bayesian filtering and the FuzzyOCR Plugin which I installed to deal with the rise of image-based spam last year. Still, a few email addresses that route to me are very public, and most days one or two spam messages get through the filters.

This morning I noticed something new in my inbox. I almost moved the message across into my “missed spam” folder without giving it a second thought (we train our filters with missed spam to improve the Bayesian analysis), but something caught my eye:

“That’s odd,” I thought, so I opened the PDF. (Note: unless you know what you’re doing, it’s a really bad idea to open attachments if you don’t know the sender or weren’t expecting something from them – it could be a virus.)

That’s right, it’s spam, in a PDF file. While SpamAssassin does a great job of analysing text, and even images using FuzzyOCR, it does no analysis of PDF attachments, so this one slipped through the net. (I’ve had seven copies of this so far today.)
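
To make the gap concrete: before any text analysis can happen, a filter first has to find the PDF parts inside a message. Here is a minimal sketch in Python, using only the standard library's `email` package. It is not how SpamAssassin or any existing plugin works, and the names `find_pdf_attachments` and `build_sample` are made up for illustration.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import List, Tuple


def find_pdf_attachments(raw_message: bytes) -> List[Tuple[str, bytes]]:
    """Return (filename, payload) pairs for every PDF part in a raw email.

    Hypothetical helper: a part counts as a PDF if its declared content
    type is application/pdf or its filename ends in .pdf.
    """
    msg = message_from_bytes(raw_message, policy=policy.default)
    found = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no payload of their own
        filename = part.get_filename() or ""
        is_pdf = (
            part.get_content_type() == "application/pdf"
            or filename.lower().endswith(".pdf")
        )
        if is_pdf:
            # decode=True undoes the base64 transfer encoding
            payload = part.get_payload(decode=True) or b""
            found.append((filename, payload))
    return found


def build_sample() -> bytes:
    """Build a small test message with one fake PDF attached (illustrative only)."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "me@example.com"
    msg["Subject"] = "Stock tip"
    msg.set_content("See attached.")
    msg.add_attachment(
        b"%PDF-1.4 fake body",
        maintype="application",
        subtype="pdf",
        filename="report.pdf",
    )
    return msg.as_bytes()


if __name__ == "__main__":
    for name, data in find_pdf_attachments(build_sample()):
        print(name, len(data), "bytes")
```

Finding the attachment is the easy part. The extracted bytes would still have to go through a PDF text extractor (and possibly OCR) before the usual text analysis could score them.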

What next? Well, if this type of spam continues (and there’s no reason to think it won’t) I expect we’ll see a PDF-scanning plugin for SpamAssassin before too long. Once that gains traction, the spammers will undoubtedly adapt again with some new trick to avoid the filters. Rinse and repeat.

The arms race continues…

    Download the scam.ndb.gz and phish.ndb.gz files from
    http://sanesecurity.com/clamav and it will now catch the PDF stock spams.

    I really hope you’re right.

    Check out a module I wrote called “PDFassassin”, which is a plugin for SpamAssassin.

    It scans emails for PDF attachments, uses the pdftotext utility to extract their text so it can be checked for spam, and also extracts images and runs OCR on them to pinpoint spam messages embedded in pictures.

    This plugin can really help prevent the wave of PDF spam messages from hitting your mbox.

    Details at:


    Looks like the bigger anti-spam vendors are up to scratch too: PDF spam – a step ahead of image spam

