You've been there before. A spam email pops up in your inbox and it's utter nonsense. A rancid word jambalaya of "Hello dears," a bunch of names you never even knew could be names, and sentences not of this world. Why? Why does anyone do that? Let me explain. With the help of Harry Potter quotes.
A new breed of spam email, popping up especially in Ireland, has taken to prefacing its information-thieving efforts with random one-or-two-line Harry Potter quotes. Stuff like, "AS THE WEREWOLF REARED SNAPPING ITS LONG JAWS SIRIUS DISAPPEARED FROM HARRYS SIDE HE HAD." Weird, right? Well, only sort of.
While hilariously out-of-context quotes from JK Rowling's spellbinding opus seem to pair with "yoU have 17 nEww private mesages fr0M local singles!!!111" about as well as wine and a big ol' bowl of sour cream, it's actually an attempt to outsmart spam blocking software. As The Daily Edge explains, anti-spam engines tend to work on three levels: RBLs (blacklists of known spammer IP addresses), URIBLs (blacklists of known spam sites), and a Bayes Engine, which handles everything the other two parts can't.
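If you like seeing things in code, here's a rough sketch of that three-layer idea. Everything in it is made up for illustration: the blocklist contents, the `classify` and `bayes_score` names, and the 0.9 threshold are my assumptions, and a real engine looks up RBLs and URIBLs over the network rather than in a local set.

```python
import re

# Hypothetical blocklists, standing in for real RBL / URIBL lookups.
BLOCKED_IPS = {"203.0.113.7"}
BLOCKED_DOMAINS = {"spam.example"}


def bayes_score(body):
    """Placeholder for the statistical layer: returns a spam probability."""
    return 0.99 if "viagra" in body.lower() else 0.1


def classify(sender_ip, body, threshold=0.9):
    """Run the three checks in order and report which one fired, if any."""
    # Layer 1: is the sending address a known spammer?
    if sender_ip in BLOCKED_IPS:
        return "blocked: RBL"
    # Layer 2: does the message link to a known spam site?
    domains = re.findall(r"https?://([^/\s]+)", body)
    if any(d.lower() in BLOCKED_DOMAINS for d in domains):
        return "blocked: URIBL"
    # Layer 3: everything else goes to the statistical filter.
    if bayes_score(body) >= threshold:
        return "blocked: Bayes"
    return "delivered"
```

Calling `classify("198.51.100.1", "hello from grandma")` falls through all three layers and gets delivered, which is exactly the gap the spammers are aiming for.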
The last of those is key because Bayes Engines use statistics to figure out which words and phrases are associated with spam. "Viagra," for instance, is usually a pretty obvious tell. So the Harry Potter quotes are meant to fool the system by padding the message with words that look statistically innocent. You know, talk about the sort of wand that might be made of sacred oak and unicorn hair instead of, er... the other kind (although now I really want to see Harry-Potter-themed male enhancement spam).
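To see why innocent-looking words help the spammer, here's a minimal sketch of one common way to combine per-word spam probabilities, assuming the words are independent. The numbers in `WORD_SPAM_PROB` are invented for the example, and `spam_probability` is a hypothetical name, not any particular filter's API.

```python
import re

# Made-up per-word spam probabilities; a real filter learns these from mail.
WORD_SPAM_PROB = {
    "viagra": 0.99,
    "singles": 0.90,
    "werewolf": 0.05,
    "sirius": 0.05,
    "jaws": 0.10,
}
UNKNOWN = 0.5  # words the filter has never seen count as neutral


def spam_probability(text):
    """Combine per-word probabilities, naive-Bayes style."""
    spammy, innocent = 1.0, 1.0
    for word in re.findall(r"[a-z']+", text.lower()):
        p = WORD_SPAM_PROB.get(word, UNKNOWN)
        spammy *= p
        innocent *= 1.0 - p
    return spammy / (spammy + innocent)


plain = spam_probability("viagra singles")
padded = spam_probability("werewolf sirius jaws viagra singles")
```

With these invented numbers, `plain` comes out well above 0.9 while `padded` drops below 0.5. Same sales pitch, but the wizard words drag the overall score down.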
Here's where it gets crazy: Bayes Engines learn over time by way of emails being marked as spam. They catch on pretty quickly, in other words. This is why it frequently seems like every spammer is kinda doing the same thing, using the same sorts of phrases, sentences, etc. It comes in waves. Then, when a topic is exhausted and the mighty Spam Skynet has assimilated it completely, spammers move on to something new (perhaps a bizarre set of words or a news item or a celebrity) in an attempt to fool Bayes Engines again.
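The learning part can be sketched in a few lines. This is a toy under my own assumptions (the `LearningFilter` class, its method names, and the add-one smoothing are all hypothetical choices), but it shows how a word's reputation shifts each time someone hits "mark as spam."

```python
import re
from collections import Counter


def tokenize(text):
    """Return the distinct lowercase words in a message."""
    return set(re.findall(r"[a-z']+", text.lower()))


class LearningFilter:
    def __init__(self):
        self.spam_words = Counter()  # messages marked spam containing each word
        self.ham_words = Counter()   # legitimate messages containing each word
        self.spam_msgs = 0
        self.ham_msgs = 0

    def train(self, text, is_spam):
        """Record one message the user has labelled."""
        counts = self.spam_words if is_spam else self.ham_words
        for word in tokenize(text):
            counts[word] += 1
        if is_spam:
            self.spam_msgs += 1
        else:
            self.ham_msgs += 1

    def word_spam_prob(self, word):
        """Estimate how spammy a word is, with add-one smoothing."""
        spam_rate = (self.spam_words[word] + 1) / (self.spam_msgs + 2)
        ham_rate = (self.ham_words[word] + 1) / (self.ham_msgs + 2)
        return spam_rate / (spam_rate + ham_rate)


f = LearningFilter()
f.train("the werewolf chapter was great, see you at book club", is_spam=False)
before = f.word_spam_prob("werewolf")
for _ in range(3):
    f.train("AS THE WEREWOLF REARED you have 17 new private messages", is_spam=True)
after = f.word_spam_prob("werewolf")
```

After three spam reports, `after` is higher than `before` and has crossed 0.5: "werewolf" has gone from harmless to suspicious, and the spammers need a new trick.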
That's why spam emails are so weird, but in an almost consistent way. Spammers are paying attention to what does and doesn't get blocked and reacting accordingly.
By that logic, you'd think today's hardened veteran spam filters wouldn't let a single even slightly suspicious word slide. Heartfelt letter from grandma? She said "magic," so it's probably spam. That, however, isn't quite how it actually works. Spam filters are also programmed to essentially forget with time, precisely so that the entire dictionary doesn't end up on their banned book list.
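How a given filter forgets isn't something I can speak to, so treat this as one plausible sketch rather than the real mechanism: every so often, shrink each word's spam count, and drop words whose count fades to almost nothing. The `ForgetfulCounts` class, the 0.5 decay rate, and the 0.01 floor are all assumptions for illustration.

```python
class ForgetfulCounts:
    def __init__(self, decay=0.5, floor=0.01):
        self.decay = decay  # fraction of each count kept per aging step
        self.floor = floor  # counts below this are forgotten entirely
        self.spam = {}

    def mark_spam(self, words):
        """Bump the spam count for every word in a reported message."""
        for word in words:
            self.spam[word] = self.spam.get(word, 0.0) + 1.0

    def age(self):
        """Shrink every count and drop the ones that have faded away."""
        aged = {w: c * self.decay for w, c in self.spam.items()}
        self.spam = {w: c for w, c in aged.items() if c >= self.floor}


counts = ForgetfulCounts()
counts.mark_spam(["magic", "viagra"])
for _ in range(10):
    counts.age()
    counts.mark_spam(["viagra"])  # this word keeps getting reported
```

After ten rounds, "magic" has dropped out of `counts.spam` entirely because nobody reported it again, while "viagra" is still there with a healthy count. Grandma's letter is safe.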
TMI is a branch of Kotaku dedicated to telling you everything about my adventures in the gaming industry (and sometimes other offbeat and/or uncomfortable subjects). It's an experiment in disclosure, storytelling, interviewing, and more. The gaming industry is weird. People are weird. I am weird. You are weird. Why hide that? Let's explore it.