<?xml version="1.0"?>
<rdf:RDF 
	xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns="http://purl.org/rss/1.0/">

	<channel rdf:about="http://rtnl.org.uk/words/rss.xml">
		<title>Words by c.z.robertson</title>
		<link>http://rtnl.org.uk/words/</link>
		<description>
			Technology, IP law, other stuff.
		</description>

		<items>
			<rdf:Seq>
				<rdf:li rdf:resource="http://rtnl.org.uk/words/20090417-features_in_bayesian_spam_filtering.shtml" />
				<rdf:li rdf:resource="http://rtnl.org.uk/words/20080825-orchid_chopstick.shtml" />
				<rdf:li rdf:resource="http://rtnl.org.uk/words/20080324-learn_from_the_mistakes_of_others.shtml" />
				<rdf:li rdf:resource="http://rtnl.org.uk/words/20080321-hands_of_ruin.shtml" />
				<rdf:li rdf:resource="http://rtnl.org.uk/words/20080202-weekend_reading.shtml" />
			</rdf:Seq>
		</items>

	</channel>
  
	
		
		
			
			<item rdf:about="http://rtnl.org.uk/words/20090417-features_in_bayesian_spam_filtering.shtml">
				<title>Features in Bayesian spam filtering</title>
				<link>http://rtnl.org.uk/words/20090417-features_in_bayesian_spam_filtering.shtml</link>
				<dc:date>2009-04-17T05:01:24GMT/BST</dc:date>
				<content:encoded>
					&lt;p&gt;I've recently switched my spam filter from &lt;a href=&quot;http://crm114.sourceforge.net/&quot;&gt;CRM114&lt;/a&gt; to &lt;a href=&quot;http://spamassassin.apache.org/&quot;&gt;SpamAssassin&lt;/a&gt;. I wanted to take advantage of systems like &lt;a href=&quot;http://razor.sourceforge.net/&quot;&gt;Razor&lt;/a&gt; and &lt;a href=&quot;http://pyzor.sourceforge.net/&quot;&gt;Pyzor&lt;/a&gt;, and I wanted to apply some whitelisting. On the other hand, I'm inclined to think that a Bayesian classifier approach to content filtering makes a lot more sense than SpamAssassin's collection of &lt;a href=&quot;http://spamassassin.apache.org/tests_3_2_x.html&quot;&gt;weighted tests&lt;/a&gt;. I don't know how the weights of SpamAssassin's tests were calculated. They seem very precise, but I wonder about their accuracy. Furthermore, you might expect their accuracy to vary from person to person.&lt;/p&gt;

&lt;p&gt;So here's what I was thinking: the weights of the tests should be calculated in a Bayesian way. Run each test over the email and, if it triggers, add it as a feature for Bayesian consideration. Currently, all the Bayesian spam filters that I'm aware of use only words (or tokens based on words) in the email as the features they consider. But I'd like to know how much spamminess is implied by an email coming from a machine with no reverse DNS. SpamAssassin gives that a score of 0.1 (with a score of 5 indicating spam, by default), so it has some sort of implicit notion of probability, but I'd like to see that probability calculated using Bayesian techniques.&lt;/p&gt;
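
&lt;p&gt;A minimal sketch of that idea in Python, under some loud assumptions: the message is a plain dict, the two rule functions and their names are hypothetical stand-ins (this is not SpamAssassin's API), the classifier is an ordinary naive Bayes with add-one smoothing, and spam and non-spam are treated as equally likely beforehand. Each triggered rule is simply added to the feature set alongside the words.&lt;/p&gt;

```python
import math
from collections import Counter


# Hypothetical rule tests: each takes a message dict and returns True when it triggers.
def no_reverse_dns(msg):
    return not msg.get("reverse_dns")


def recipient_not_in_to(msg):
    return msg["recipient"] not in msg.get("to", [])


# Rule names are made up for this sketch; the prefix keeps them apart from real words.
RULES = {
    "RULE:no_reverse_dns": no_reverse_dns,
    "RULE:recipient_not_in_to": recipient_not_in_to,
}


def features(msg):
    """Return the word tokens of the body plus one feature per triggered rule."""
    tokens = set(msg["body"].lower().split())
    tokens.update(name for name, rule in RULES.items() if rule(msg))
    return tokens


class NaiveBayes:
    def __init__(self):
        # Number of training messages of each class that contained each feature.
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def train(self, msg, label):
        self.totals[label] += 1
        self.counts[label].update(features(msg))

    def feature_spam_probability(self, feature):
        """P(spam | feature), with add-one smoothing and equal priors (assumptions)."""
        p_spam = (self.counts["spam"][feature] + 1) / (self.totals["spam"] + 2)
        p_ham = (self.counts["ham"][feature] + 1) / (self.totals["ham"] + 2)
        return p_spam / (p_spam + p_ham)

    def spam_probability(self, msg):
        """Combine the features present in msg by summing their log-odds."""
        log_odds = 0.0
        for feature in features(msg):
            p = self.feature_spam_probability(feature)
            log_odds += math.log(p) - math.log(1 - p)
        return 1 / (1 + math.exp(-log_odds))
```

&lt;p&gt;After training on my own labelled mail, &lt;code&gt;feature_spam_probability(&quot;RULE:no_reverse_dns&quot;)&lt;/code&gt; would be a per-person estimate of how much that test matters, which is the number I'd like to set against SpamAssassin's fixed 0.1.&lt;/p&gt;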

&lt;p&gt;Funnily, while writing this post I realised that I'm not the first person to come up with this idea:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Specific spam features (e.g. not seeing the recipient's address in the to: field) do of course have value in recognizing spam. They can be considered in this algorithm by treating them as virtual words. I'll probably do this in future versions, at least for a handful of the most egregious spam indicators. Feature-recognizing spam filters are right in many details; what they lack is an overall discipline for combining evidence.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's from Paul Graham's &lt;a href=&quot;http://www.paulgraham.com/spam.html&quot;&gt;A Plan for Spam&lt;/a&gt;, the essay that brought the world's attention to Bayesian spam filtering in the first place. If anyone knows of a system that does this, please let me know.&lt;/p&gt;

				</content:encoded>
			</item>
		
	
		
		
			
			<item rdf:about="http://rtnl.org.uk/words/20080825-orchid_chopstick.shtml">
				<title>ORCHID chopstick</title>
				<link>http://rtnl.org.uk/words/20080825-orchid_chopstick.shtml</link>
				<dc:date>2008-08-25T15:57:17GMT/BST</dc:date>
				<content:encoded>
					&lt;p&gt;
From the elaborate packaging of some very pretty chopsticks I was given last week:
&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;
Just as we can conceive a word in a grain of sand, we can perceive the universe from a pair of chopsticks. As a top company, ORCHID chopstick, which specialise in chopsticks, sale of chopsticks as craft work. The unique managing concepts of ORCHID chopstick bring up an enterprise from a pair of chopsticks.
&lt;/p&gt;
&lt;p&gt;
Chopsticks are not only a kind of dispensable tableware in our daily life, but also a perfect present for our relatives and friends. The significance of chopsticks as a present is lies in the homophonies of chopsticks in Chinese with the implied meanings of happiness and luckiness for newly-married couples, chopsticks also expresses the sincere wishes to persons who are becoming one family and having babies as soon as possible. For children, it means fast growth and for aged, it brings happiness, health and longevity. Furthermore, chopsticks come up as a pair which is indispensable of each other, therefore, chopsticks can be a symbol of solidarity and friendship. As a present for friends, chopsticks means good things come as a pair and a long lasting friendship.
&lt;/p&gt;
&lt;/blockquote&gt;

				</content:encoded>
			</item>
		
	
		
		
			
			<item rdf:about="http://rtnl.org.uk/words/20080324-learn_from_the_mistakes_of_others.shtml">
				<title>Learn from the mistakes of others</title>
				<link>http://rtnl.org.uk/words/20080324-learn_from_the_mistakes_of_others.shtml</link>
				<dc:date>2008-03-24T16:32:39GMT/BST</dc:date>
				<content:encoded>
					&lt;p&gt;If you're going to &lt;a href=&quot;http://www.flickr.com/photos/bitful/2352119883/&quot;&gt;get involved in a pillow fight&lt;/a&gt;, don't do it in a black woollen greatcoat: a) it's black, so all the little white feathers show up; b) it's wool, so all the little white feathers stick to it very well; and c) there's a lot of surface area to spend the next few days picking the little white feathers off.&lt;/p&gt;
				</content:encoded>
			</item>
		
	
		
		
			
			<item rdf:about="http://rtnl.org.uk/words/20080321-hands_of_ruin.shtml">
				<title>Hands of Ruin</title>
				<link>http://rtnl.org.uk/words/20080321-hands_of_ruin.shtml</link>
				<dc:date>2008-03-21T12:36:53GMT/BST</dc:date>
				<content:encoded>
					&lt;p&gt;Last year I decided that I should get back into music. &lt;a href=&quot;http://handsofruin.com/&quot;&gt;Hands of Ruin&lt;/a&gt; is the result.
&lt;/p&gt;&lt;p&gt;
I'm sticking with the dark, atmospheric electronica aesthetic that you might have heard in my &lt;a href=&quot;http://rtnl.org.uk/music/&quot;&gt;earlier work&lt;/a&gt;. My tools are a bit different, though. I'm now working almost exclusively with &lt;a href=&quot;http://zynaddsubfx.sourceforge.net/&quot;&gt;ZynAddSubFX&lt;/a&gt; and &lt;a href=&quot;http://www.filter24.org/seq24/&quot;&gt;Seq24&lt;/a&gt;. I'm trying not to spend too much time experimenting with new bits of software. Both of those tools are nice because they're very simple and it's easy to get musical results out of them. Unlike with &lt;a href=&quot;http://csound.sourceforge.net/&quot;&gt;Csound&lt;/a&gt;, you don't have to spend weeks building your instruments before you can create some sound.
&lt;/p&gt;&lt;p&gt;
I'm also trying to sell my music now, which is an interesting departure for me. Too much time spent reading about entrepreneurialism, probably.&lt;/p&gt;
				</content:encoded>
			</item>
		
	
		
		
			
			<item rdf:about="http://rtnl.org.uk/words/20080202-weekend_reading.shtml">
				<title>Weekend reading</title>
				<link>http://rtnl.org.uk/words/20080202-weekend_reading.shtml</link>
				<dc:date>2008-02-02T22:44:52GMT/BST</dc:date>
				<content:encoded>
					&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://www.ryanholiday.net/archives/2007/02/fight_club_moments.phtml&quot;&gt;Ryan Holiday: Fight Club Moments&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.nerve.com/PersonalEssays/Barlow/shameless/&quot;&gt;John Perry Barlow: A Ladies' Man and Shameless&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.nytimes.com/2005/03/20/magazine/20HARVARD.html?ei=5090&amp;amp;en=e9727ddcbbbd4431&amp;amp;ex=1268974800&amp;amp;partner=rssuserland&amp;amp;pagewanted=all&quot;&gt;Stephen J. Dubner on Roland Fryer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.ted.com/talks/view/id/191&quot;&gt;Matthieu Ricard at TED: Habits of happiness&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
				</content:encoded>
			</item>
		
	

</rdf:RDF>
