Words by c.z.robertson

Kling's Content is Crap

2003-01-15 11:05:38 UTC

Arnold Kling reckons that Content is Crap (via Infothought and others). If I were to judge by the quality of his article alone, I would have to say that he was right. (I'm sorry, that's unkind, but he was asking for it.)

The Commons enthusiasts believe that content publishers earn their profits by using copyright law to steal content from its creators and charge extortionary prices to consumers.

A misunderstanding like this is a bad way to start. Creative Commons is really only marginally related to the battle between artists and publishers. More importantly, it's about changing the relationship between artists and the public.

Publishers may still be involved (see Down and Out in the Magic Kingdom), as long as they don't mind dealing with the CC licenses.

[P]ublishers perform a valid economic function of filtering content and effectively distributing and selling it to consumers.

I understand the argument, but I can't agree that publishers are doing anything like a good job of it. More often publishers act as gatekeepers to the market, preventing 99.9% of artists from having access to any sort of market at all. Furthermore, the stuff they let through is so often shit (well-marketed shit, but shit nonetheless) that Kling's sewage processing analogy falls flat.

The internet has weakened publishers' ability to act as gatekeepers. As a musician I don't earn any income from my music, nor do I have thousands of adoring fans. However, I do have a few fans, and without the internet I wouldn't have any.

But there are no publishers to act as a filter on the musical "crap" that I produce. That doesn't matter. People will only find my music through links, and people will only link to me if they like my work. Just because I have a website doesn't mean that everyone has to look at it.

I agree that this system could be improved. Publishers are not the improvement I'm looking for, though. The collaborative-filtering idea I've had in my head for the last few years might be one way to tackle it, and there are other ideas as well. I'm confident that we will develop better filtering techniques than publishers.
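My own collaborative-filtering idea isn't spelled out here, so what follows is not it. It's a minimal sketch of plain user-based collaborative filtering, just to show the general principle of letting listeners' overlapping tastes do the filtering instead of a publisher. It assumes ratings are stored as a dict of listener to {track: score}; the function names and sample data are hypothetical.

```python
import math

# Hypothetical data layout: listener -> {track: rating}.
Ratings = dict[str, dict[str, float]]


def similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two listeners, treating unrated tracks as 0."""
    shared = a.keys() & b.keys()
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if dot == 0 or norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(ratings: Ratings, user: str, k: int = 3) -> list[tuple[str, float]]:
    """Predict ratings for tracks `user` hasn't rated, weighting each other
    listener's rating by how similar their taste is, and return the top k."""
    mine = ratings[user]
    totals: dict[str, float] = {}
    weights: dict[str, float] = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = similarity(mine, theirs)
        if sim <= 0:
            continue  # listeners with no overlap tell us nothing
        for track, score in theirs.items():
            if track in mine:
                continue
            totals[track] = totals.get(track, 0.0) + sim * score
            weights[track] = weights.get(track, 0.0) + sim
    predicted = {t: totals[t] / weights[t] for t in totals}
    return sorted(predicted.items(), key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    ratings = {
        "alice": {"track_a": 5, "track_b": 4},
        "bob": {"track_a": 5, "track_b": 4, "track_c": 5},
        "carol": {"track_d": 5},
    }
    print(recommend(ratings, "alice"))
```

Here only bob's taste overlaps with alice's, so track_c is suggested to her and carol's track_d is not. Nobody had to approve either track in advance.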

Also, even if the filtering system we end up creating adds a significant amount of value, we may still be able to offer it (as far as the public is concerned) for free, just as you're reading this blog or listening to my music for free.

Kling also has a suggestion:

My guess is that the major gains in value added will come from the implementation of what are called Bayesian filters.

Kling's article goes downhill from here. You're probably better off not bothering with the rest of it, with the exception of this sentence:

Even Googling could be enhanced by Bayesian filters.

...which I found very funny indeed.

Bayesian techniques have been used in information retrieval for over twenty years. The Bayesian spam-filtering techniques that are getting so much attention these days are naïve implementations of techniques that are second-nature to IR people. And that includes the people working at Google.
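For anyone who hasn't seen one, here is a minimal sketch of a multinomial naive Bayes text classifier, the basic idea behind the spam filters. It's a textbook version with add-one smoothing, not any particular filter's code; real filters differ in tokenisation and in how they combine word probabilities. The class name, tokeniser and training messages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens (real filters tokenise far more cleverly)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = {}
        self.doc_counts: Counter = Counter()
        self.vocab: set[str] = set()

    def train(self, text: str, label: str) -> None:
        words = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_posterior(self, text: str, label: str) -> float:
        """log P(label) + sum of log P(word | label); the shared P(text) is dropped."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / denom)
        return score

    def classify(self, text: str) -> str:
        """Pick the label with the highest posterior probability."""
        return max(self.doc_counts, key=lambda label: self.log_posterior(text, label))


if __name__ == "__main__":
    nb = NaiveBayes()
    nb.train("cheap pills buy now", "spam")
    nb.train("buy cheap watches now", "spam")
    nb.train("new song demo attached", "ham")
    nb.train("thoughts on the new song", "ham")
    print(nb.classify("buy cheap pills"))  # spam
```

The "naïve" part is the assumption that words occur independently given the label, which is what lets the per-word probabilities simply be multiplied (added, in log space).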

Back in the mid- to late nineties, the web search engines of the day used probabilistic techniques (which, as I understand it, are closely related to Bayesian techniques) to determine relevance. They did this using the words in the document, the words in the query, and possibly the words in other documents. Google won so big because they had the insight to use not just the words in a document but also the links pointing to it to determine relevance. (They may also be using Bayesian techniques in their analysis of links; I'm not sure. If you want, you can read about it in their paper The PageRank Citation Ranking: Bringing Order to the Web.)
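The link idea can be sketched as a simplified PageRank computed by power iteration: each page splits its rank among the pages it links to, plus a small uniform share so that every page gets some rank. This is a sketch, not Google's code; the damping factor of 0.85 is a commonly quoted value, pages with no outgoing links are assumed to spread their rank evenly, and the page names are hypothetical. Note that it ranks pages by links alone; a search engine would still have to combine a score like this with word-based relevance.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank by power iteration.

    `links` maps each page to the pages it links to. Pages with no
    outgoing links ("dangling" pages) spread their rank over every page.
    Ranks always sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the same small "random jump" share...
        new_rank = {p: (1.0 - damping) / n for p in pages}
        # ...plus a share of the rank of every page that links to it.
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    web = {
        "home": ["music", "blog"],
        "blog": ["music"],
        "music": ["home"],
        "fan_site": ["music"],
    }
    ranks = pagerank(web)
    print(max(ranks, key=ranks.get))  # music
```

The music page wins because three pages link to it, while fan_site, which nobody links to, is left with only the minimum share.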

In short, Bayesian techniques provide only a way of analysing probabilities. These techniques can be applied to many different kinds of attributes (e.g. the words in a document, incoming links to a document, the colour of the text, the length of the sentences, and so on). They've been used for many years and haven't magically solved the problem of finding good information. Kling is sadly ignorant of all this.