<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Extenuating Circumstances &#187; linguistics</title>
	<atom:link href="http://danhon.com/category/linguistics/feed/" rel="self" type="application/rss+xml" />
	<link>http://danhon.com</link>
	<description>is a weblog by Dan Hon</description>
	<lastBuildDate>Mon, 06 Sep 2010 10:02:25 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>A Smarter Naughty Words Filter</title>
		<link>http://danhon.com/2007/06/06/a-smarter-naughty-words-filter/</link>
		<comments>http://danhon.com/2007/06/06/a-smarter-naughty-words-filter/#comments</comments>
		<pubDate>Wed, 06 Jun 2007 16:23:05 +0000</pubDate>
		<dc:creator>danhon</dc:creator>
				<category><![CDATA[8bar]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[filter]]></category>
		<category><![CDATA[filtering]]></category>
		<category><![CDATA[ianhughes]]></category>
		<category><![CDATA[ibm]]></category>
		<category><![CDATA[language]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[naturallanguage]]></category>
		<category><![CDATA[rooreynolds]]></category>

		<guid isPermaLink="false">http://danhon.com/2007/06/06/a-smarter-naughty-words-filter/</guid>
		<description><![CDATA[In my last few days at Mind Candy I&#8217;ve taken part in a meeting with the inestimable Roo and Ian of IBM&#8217;s eightbar (Steff, upon learning of the job titles as Metaverse Evangelists, instantly declared a competition to naturally work in &#8220;the greatest swordfighter in the metaverse&#8221; into casual conversation). For whatever reason, we got [...]]]></description>
			<content:encoded><![CDATA[<p>In my last few days at Mind Candy I&#8217;ve taken part in a meeting with the inestimable <a href="http://rooreynolds.com/">Roo</a> and <a href="http://epredator.blogspot.com/">Ian</a> of IBM&#8217;s <a href="http://www.eightbar.co.uk/">eightbar</a> (Steff, upon learning of the job titles as Metaverse Evangelists, instantly declared a competition to naturally work in &#8220;the greatest swordfighter in the metaverse&#8221; into casual conversation). </p>
<p>For whatever reason, we got talking about the seemingly easy problem (at least, when presented by management) of putting together a &#8220;naughty words filter&#8221; or, in Ian&#8217;s case, a &#8220;death threat filter&#8221;. Which, when you think about it, normally goes something like this:</p>
<p>Requirements:</p>
<ul>
<li>A naughty words filter</li>
</ul>
<p>Spec: </p>
<ul>
<li>A filter that will filter out a list of naughty words</li>
</ul>
<p>Which, as many people know, is rather hard to do properly, and very easy to do badly, or in such a way that it&#8217;s a waste of time. Ian told us of a rather amusing incident: if you&#8217;re filtering for death threats, an international (let&#8217;s say German) audience is not particularly helpful when one of the definite articles is <strong>die</strong>.</p>
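<p>To make the problem concrete, here is a minimal sketch of the naive spec above, written in Python. It is not anything Ian or IBM actually built: the function name <strong>flagged_words</strong> and the two-word blocklist are made up for illustration. It simply reports every word of a message that appears on the list, which is enough to show a perfectly innocent German sentence being flagged.</p>

```python
import re

# Hypothetical blocklist; these entries are placeholders for illustration only.
BLOCKLIST = {"die", "kill"}


def flagged_words(message, blocklist=BLOCKLIST):
    """Return the blocklisted words found in message, in order of appearance."""
    # Lowercase the message and split it into runs of letters
    # (including the German umlauts and eszett).
    words = re.findall(r"[a-zäöüß]+", message.lower())
    return [word for word in words if word in blocklist]


# An actual threat is caught...
print(flagged_words("I will kill you"))
# ...but so is "The cat is sitting on the table" in German.
print(flagged_words("Die Katze sitzt auf dem Tisch"))
```

<p>The filter has no idea what language it is reading, so the German article and the English verb look identical to it. Any fix (per-language lists, context, a human in the loop) is already a long way from the one-line spec.</p>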
<p>Anyway.</p>
<p>We were talking about various strategies for maintaining an up-to-date list of naughty words when I hit upon the idea of having <a href="http://www.urbandictionary.com/">Urban Dictionary</a> simply publish a list of all of their words, and blacklisting against that. Or, at least, having a human at Urban Dictionary quickly scan through for the slightly more offensive words and publish those as a blacklist, charging for the list if they felt like it.</p>
<p>Well, I thought it was a good idea&#8230;</p>
<p><strong>Update:</strong> it&#8217;s <a href="http://www.eightbar.co.uk/">eightbar</a>, not 8bar.</p>
]]></content:encoded>
			<wfw:commentRss>http://danhon.com/2007/06/06/a-smarter-naughty-words-filter/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
