Wednesday, May 28, 2008

Adding Search (with Lucene)

The time has finally come for me to add search functionality to Ringlight. There is enough content now that just clicking around is getting tedious. After all, it's up to almost 200,000 files now.

There are a number of considerations to make when adding search to your site. For instance, you can usually get by pretty well with just integrating Google search into your website. This is fast, easy, and doesn't require messing with your backend code at all.

However, this is not really what I want. I want to let users search for files, not web pages, and I want the results integrated nicely with everything else. For instance, it would be cool to use a search query as a radio playlist like you can do on Hype Machine. So I'll need to build my own search engine.

This is not really that hard to do. I would recommend you read some articles and then download Managing Gigabytes for Java (MG4J). Those articles are by Tom from AudioGalaxy. You may remember AudioGalaxy as the best thing to happen and unhappen to music in my lifetime. I know I do. More importantly, it was deliciously scalable, and for the most part it was just a search engine. So don't go writing one without learning some tips from the best.

I'm sure that a little engineering and MG4J could produce a highly scalable search engine. However, I didn't really want to spend that much time on it, so I went with a higher-level solution in the form of Lucene for Java. There is also a popular version for Python. I would recommend waiting a while if you're considering using Lucy (Lucene in C with Python and Ruby bindings) because I don't consider it mature. I'd also stay away from layers on top of Lucene like Solr. If you're looking for tools to make Lucene easier to use, you're missing the point that it's already easy to use.

Download the official Lucene release and you'll see that it comes with source code for a demo. One class, IndexFiles, shows you how to add information to the search index. Another class, SearchFiles, shows you how to search the index to retrieve items. Like most demo code, it's both too simple (it doesn't cover enough use cases of the library to let you fully understand it) and too complex (it has a bunch of command-line arguments to parse and such). However, it will do. I had working search for my whole site after two days of part-time fiddling.
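If the split between an indexing step and a searching step is new to you, here is a rough sketch of the idea in plain Python. To be clear, this is not Lucene's API and not the demo code. TinyIndex and its methods are names made up for illustration, and the file names in the usage note are hypothetical. It only shows an inverted index (term to set of documents) being filled in and then queried, with the assumption that every query term must match.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each lowercase term to the ids of documents containing it.

    Hypothetical illustration only; Lucene's real classes work differently.
    """

    def __init__(self):
        # term -> set of document ids
        self.postings = defaultdict(set)

    @staticmethod
    def tokenize(text):
        # Lowercase the text and split it into runs of letters and digits.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        # Indexing step: record doc_id under every term found in the text.
        for term in self.tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        # Searching step: return the sorted ids of documents containing ALL query terms.
        terms = self.tokenize(query)
        if not terms:
            return []
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)
```

You would call add("song1.mp3", "Some Band - Some Title") once per file, then search("some band") to get back the matching file names. A real engine adds ranking, stemming, and on-disk storage on top of this, which is the part I didn't want to write myself.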

Wednesday, May 21, 2008

Napster DRM-Free, The Future and Past of Online Music

As of today, Napster offers DRM-free MP3 downloads. In one sense, it seems that the war is over, and we won. Let's take a look at the history and future of online music distribution.

First there was Napster, one of the original three peer-to-peer systems (along with Freenet and Gnutella). Unlike the other two, it was a commercial venture focused exclusively on music distribution. It became the controversial flagship of the peer-to-peer movement, and the conversation about p2p became focused exclusively on the online music distribution debate. When I was going to conferences to speak about Freenet, I would introduce it as a censorship-resistant platform for protecting the freedom of speech of political dissidents, and the audience would ask me, "How are artists going to get paid?", the jingoistic slogan propagated by the Recording Industry Association of America. You could argue that that wasn't the point, but by then you'd already gone off the topic of how to revolutionize media distribution to be efficient and fair and were now talking about the online music debate again. It wasn't even really possible to distribute MP3s over Freenet, and yet it kept coming up again and again.

After Napster, Grokster, and most notably AudioGalaxy (the best thing to happen and unhappen to music in my lifetime) were shut down, the p2p hackers went mostly underground, becoming assimilated into a number of small startups acquired by big companies. No one wanted to touch music at all. Personally, I stopped buying CDs, as I had no desire to contribute funding to the organization attacking my friends, my livelihood, and everything interesting going on in my field. I listened exclusively to my collection of unsigned, obscure bands from AudioGalaxy (bands that had paid to be listed on AudioGalaxy so as to get more exposure) for many years. Have you ever heard of The Blood Group? So good.

Online music distribution was dead, and then Apple stepped in with a not particularly new, but well-executed, plan to make money off of online music sales by selling an expensive player for their exclusive DRM format (the iPod). This was only the first step of the plan, however. Step two was to bring in DRM-free MP3s at a higher price than DRM files, offering the RIAA what they really wanted from MP3 sales, which was to make even more money selling MP3s than CDs. This is, of course, insane, as CDs contain much more value than MP3s in the form of cover art, liner notes, and a physical object to hold, and of course the costs of CD distribution are higher as well. Having a higher price for something which has both lower value and a lower cost is just not sustainable. However, it must have sounded quite appealing to the record industry that while CD sales were dropping they could actually make more money than ever before. I'm sure those PowerPoint charts went over very well in meetings. Then step three from Apple: readjust the price of DRM-free MP3s to be the same as DRM tracks. As iPod sales decline (because everyone already owns an iPod), Apple switches to a business model not dependent on the iPod and becomes the number one online music distributor. Apple has become what Napster and AudioGalaxy tried to be, the future of music distribution.

So in that sense, we won. Go p2p hackers! Oh but wait, this has nothing to do with p2p hackers, does it? This is just online music distribution. And in this sense, we've lost, or at least are still at war. Peer-to-peer became demonized because of its use for online music distribution. Now that online music distribution is okay, peer-to-peer is still getting picked on. The content industries would have you believe that p2p is only for morally corrupt kleptomaniacs bent on stealing and never paying. This doesn't explain why Limewire is still around as a commercial entity, making money off of Limewire Pro, or how services like TorrentFreedom manage to exist charging $17/month (more than a Napster subscription cost when they were doing subscriptions).

The sad fact is that the focus has been not on what you are doing (distributing music online), but on how you are doing it (peer-to-peer). This is tragic because peer-to-peer distribution is a technological revolution as important as the printing press, whereas online music distribution is only moderately interesting in the long term, being just one example of media distribution moving onto the Internet.