A Smarter Naughty Words Filter

by danhon

In my last few days at Mind Candy I’ve taken part in a meeting with the inestimable Roo and Ian of IBM’s eightbar (Steff, upon learning of their job titles as Metaverse Evangelists, instantly declared a competition to naturally work “the greatest swordfighter in the metaverse” into casual conversation).

For whatever reason, we got talking about the seemingly easy problem (at least, when presented by management) of putting together a “naughty words filter” or, in Ian’s case, a “death threat filter”. Which, when you think about it, normally goes something like this:

Requirements:

  • A naughty words filter

Spec:

  • A filter that will filter out a list of naughty words

Which, as many people know, is rather hard to do properly, and very easy to do badly, or in a way that makes it a waste of time. Ian told us of a rather amusing incident: if you’re filtering for death threats, an international (let’s say German) audience is not particularly helpful when one of the definite articles is die.
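
To make the problem concrete, here is a minimal Python sketch of the spec as written. The word list and the function name naive_filter are made up for illustration; this is an assumption about what a first attempt might look like, not anyone's actual filter.

```python
import re

# Hypothetical blacklist for a "death threat filter"; illustrative only.
BLACKLIST = {"die", "kill"}

def naive_filter(message: str) -> bool:
    """Return True if any blacklisted term appears as a whole word."""
    # Lowercase the message and split it into words (including German letters).
    words = re.findall(r"[a-zäöüß']+", message.lower())
    return any(word in BLACKLIST for word in words)

print(naive_filter("You are going to die"))      # True: flagged, as intended
print(naive_filter("Die Katze ist sehr schön"))  # True: flagged, but "die" is just "the"
```

Even with whole-word matching, the filter has no idea which language it is reading, so the German sentence trips it just as Ian described.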

Anyway.

We were talking about various strategies for maintaining an up-to-date list of naughty words when I hit upon the idea of having Urban Dictionary simply publish a list of all of their words, and blacklisting against that. Or, at least, having a human at Urban Dictionary quickly scan through for the slightly more offensive words and publish those as a blacklist, charging for the list if they felt like it.
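
As a rough sketch of how consuming such a list might look: no such published list exists as far as this post is concerned, so the file format below (one term per line, with # comments), the placeholder terms, and the function names are all assumptions made for illustration.

```python
import re

# Stand-in for a downloaded blacklist file. Contents and format are
# hypothetical: one term per line, '#' starts a comment.
PUBLISHED_LIST = """\
# curated offensive terms
darn
heck
"""

def load_blacklist(text: str) -> set[str]:
    """Parse the list into lowercase terms, skipping blanks and comments."""
    terms = set()
    for line in text.splitlines():
        term = line.strip().lower()
        if term and not term.startswith("#"):
            terms.add(term)
    return terms

def contains_naughty_word(message: str, blacklist: set[str]) -> bool:
    """Return True if any whole word in the message is on the blacklist."""
    return any(word in blacklist for word in re.findall(r"\w+", message.lower()))

blacklist = load_blacklist(PUBLISHED_LIST)
print(contains_naughty_word("Well, heck!", blacklist))         # True
print(contains_naughty_word("Heckle the speaker", blacklist))  # False
```

Keeping the list outside the code means it could be refreshed whenever a new version is published, which is the whole point of outsourcing the maintenance.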

Well, I thought it was a good idea…

Update: it’s eightbar, not 8bar.