A little while ago, Mark Liberman linked to an interesting exercise in computing all of the possible ways to spell "Viagra" without triggering spam filters (or maybe violating Pfizer's trademark).
Your host was reminded of it this morning when an interesting bit of spam showed up in his office e-mail. Now, it has to be said that for an account which is 5 years old, the address in question has been remarkably spam-free. Lately, this has started to change. Below, the text of the latest message:
This is a courtesy offer for our team of fina ncial experts to lower your Mor tgage rate and sa veyou thousands.
Our consul tants are at your disposal to assist you in reaching optimal savin gs & your goals.
We guar antee the low est rate s in the country.
You will be contacted by a fi nancial specialist promptly. Your satisfaction is our primary goal.
Our specialists will do everything they can to help you sa ve money starting today.
*We are a member of the BBB. All information is confidential.
**Ra tes as low as 3 . 05 %.
The use of spaces within words is too frequent to be merely accidental. So even though SC has no idea how most people configure their spam-filtering software, it's pretty easy to guess at the sender's strategy for avoiding being blocked by keywords.
If the words "financial", "mortgage", "save", "consultants" and "rates" were blocked from SC's e-mail, though, then plenty of other messages wouldn't get through. So he's not sure whether or not this really is a good strategy for avoiding keyword-based filters.
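To make the keyword problem concrete, here is a minimal sketch in Python, not a description of any real spam filter. The blocklist `KEYWORDS` and the two helper functions are hypothetical, built from the words named above. It compares a whole-word keyword match against one that first deletes everything except letters.

```python
import re

# Hypothetical blocklist, built from the words named in the post.
KEYWORDS = {"financial", "mortgage", "save", "consultants", "rates"}

def naive_hits(text):
    """Return blocklisted words that appear as whole words in the text as written."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return KEYWORDS & words

def despaced_hits(text):
    """Return blocklisted words found after deleting every non-letter character."""
    squashed = re.sub(r"[^a-z]", "", text.lower())
    return {k for k in KEYWORDS if k in squashed}

# First line of the message quoted above, spaces and all.
line = ("This is a courtesy offer for our team of fina ncial experts "
        "to lower your Mor tgage rate and sa veyou thousands.")

print(naive_hits(line))     # whole-word matching finds nothing
print(despaced_hits(line))  # squashing the text recovers the broken-up words
```

The whole-word check finds nothing in that line, while the squashed version recovers "financial", "mortgage" and "save". The catch is that squashing invents matches too: `despaced_hits("gas avenue")` reports "save", which is one more reason a bare keyword list makes a poor filter.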
Oh wait, I forgot something:
puckish magnuson absentee excerpt byrd zucchini execute kissing madras confront iodide dirac apprentice angora accentuate muddy confectionery gunmen tantalus angel aghast drub hamper sketchbook goat phobic
A line from a William S. Burroughs novel? Nope -- a line of text buried more than 100 lines below the end of the message reprinted above. This brought a smile to SC's face, if only out of appreciation for a clever adversary.
It would be foolish to design an e-mail filter that simply looked for a couple of keywords and dumped anything that included them. To make a safe guess, you've got to have some way of estimating the probability that a message is actually spam. One of the simplest ways to do this is with a Bayesian classifier: roughly speaking, if enough of the words in the text are ones that usually turn up in messages about mortgages, or Viagra, the message will be flagged as spam. Including all this additional text, none of which is likely to correlate with the usual spam topics, dilutes the filter-triggering terms. It's a good defense, even if the filter writers manage to defeat the insertion of spaces to break up the mortgage-related terms.
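The dilution effect can be sketched with a toy scorer. Everything numeric here is an assumption made for illustration: the per-word probabilities in `SPAM_PROB` are invented, and `UNSEEN` is an assumed, mildly innocent score for words the filter has never encountered. The scorer combines the per-word evidence in naive Bayes fashion, treating words as independent and the two classes as equally likely beforehand.

```python
import math

# Hypothetical per-word spam probabilities, invented for illustration.
SPAM_PROB = {"mortgage": 0.95, "rates": 0.85, "financial": 0.80, "save": 0.75}
# Assumed score for words the filter has never seen: slightly innocent.
UNSEEN = 0.4

def spam_score(words):
    """Combine per-word probabilities by summing log-odds (independence assumed)."""
    log_odds = 0.0
    for w in words:
        p = SPAM_PROB.get(w.lower(), UNSEEN)
        log_odds += math.log(p) - math.log(1.0 - p)
    # Convert the summed log-odds back into a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))

pitch = "save on mortgage rates".split()
# The filler line from the message quoted above.
padding = ("puckish magnuson absentee excerpt byrd zucchini execute kissing "
           "madras confront iodide dirac apprentice angora accentuate muddy "
           "confectionery gunmen tantalus angel aghast drub hamper sketchbook "
           "goat phobic").split()

print(spam_score(pitch))            # the bare pitch scores close to 1
print(spam_score(pitch + padding))  # the padded message scores far lower
```

Under these assumptions, each unfamiliar word nudges the total toward "not spam", so a long enough tail of them can outweigh a few strongly spammy terms. A filter that scores only the handful of most extreme words in a message would be much less affected by this kind of padding.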
Of course, the question that this doesn't answer is: who in their right mind would send down payment-sized amounts of money to someone using a fake e-mail address to send out messages full of spelling errors?