Tuesday, December 9, 2008

Scalable Clustering with Thrift and SQS

Since the Ringlight beta launch, we're edging up towards 100 users. It's certainly not the load that the engineers at Twitter have to deal with, but I would like to impress upon you my Law of Scaling:

Every power of ten, something different breaks (or becomes unusably slow).

So even with modest growth from 10 to 100 users, it's probably time to fix something.

One principle of scalable design is to decouple slow operations from the user interface.

For instance, subscribers of Ringlight Personal Edition have the added feature of one-click backup of all of their files. However, this operation can take a long time to complete. Even just generating the list of files to back up can be time-consuming if you have a really large number of files. Therefore, it is advantageous to move all of this out of the website and into a background process. The web application just records that you have clicked the one-click backup option and then alerts the background process that it's time to figure out exactly what needs to be done. This sort of architecture keeps your pages loading snappily and your users happy even on a heavily loaded website, because you're not wasting their time making them wait for the page to load.

There are a number of ways to communicate between your web application and the background process. Of particular interest from a scalability standpoint are message queueing services such as Starling and SQS. These allow for high scalability by letting many producers (your web server instances) talk to many consumers (your background processes).

Starling is a server written in Ruby (for Python, see Peafowl) that you run yourself. SQS is a hosted service that you pay for based on usage (number of messages sent and bandwidth used). Both are reasonable choices and have pretty similar APIs. You connect to the service, and then push strings onto a particular queue (identified by a queue name, which is also a string). Other processes can fetch strings from the queue given its name. Pretty easy! They also both have client libraries in most major languages, so integration into your app shouldn't be very difficult.
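
To give you a feel for the API, here's a minimal sketch of pushing and fetching a string with SQS using the boto library for Python (the queue name, credentials, and message body are placeholders):

import boto
from boto.sqs.message import Message

# Connect to SQS (the keys here are placeholders for your AWS credentials)
conn = boto.connect_sqs(aws_access_key_id="KEY", aws_secret_access_key="SECRET")
queue = conn.create_queue("backup-requests")  # creates or fetches the named queue

# Producer: push a string onto the queue
m = Message()
m.set_body("backup user 42")
queue.write(m)

# Consumer: fetch a string from the queue and delete it once handled
m = queue.read()
if m is not None:
    print m.get_body()
    queue.delete_message(m)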

Of course, they only support strings, so if you have fancy objects that you want to send then you'll need to serialize and deserialize them to and from strings. There are of course language-specific ways to do this (Java Object Serialization, Python Pickles, etc.), but I prefer Thrift because it's fast, efficient, and is the same in multiple languages. This is handy because you can implement different components in different languages, which is sometimes useful. For instance, my web server is in Java and my background process is in Python.
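
As a rough sketch of what serializing to and from strings looks like with Thrift in Python (cls here stands for any struct class generated by the Thrift compiler; the helper names are mine):

from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

def serialize(obj):
    # Write the Thrift struct into an in-memory buffer and return the bytes
    transport = TTransport.TMemoryBuffer()
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    obj.write(protocol)
    return transport.getvalue()

def deserialize(cls, data):
    # Rebuild a struct of the given generated class from a string
    transport = TTransport.TMemoryBuffer(data)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    obj = cls()
    obj.read(protocol)
    return obj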

Thrift also provides some additional handy components besides serialization, in particular a transport layer that provides RPC semantics over arbitrary transport mechanisms. It comes by default with socket and HTTP transports.

What I have implemented, and made available in case you find it useful, is an SQS transport for Thrift. It effectively provides cross-language multicast RPC in a few lines of code. The key piece of code is TSqsClient, which provides the SQS transport using the boto library for Python. This is the piece that you'll need to port if you want to support other languages. The rest of the code is just for example purposes and is derived from simple-thrift-queue, which is a nice example of how to build an application using Thrift. The available methods are defined in the thrift file. It's important that they are defined as async and void, as this is a one-way transport. The producer calls methods on the stub classes generated by the Thrift compiler, and these method calls are queued up in SQS. The consumer gets the method calls from SQS and calls the corresponding methods on the handler class. Additionally, there are a couple of utilities: one to fetch a single message from SQS, so you can test the producer, and one to clear the queue if you send too many messages.

One nice thing about using Thrift is that you can swap out the transport easily. You can replace my SQS transport with a Starling one, or ditch queues altogether and use sockets or HTTP. The advantage of using SQS is that the producers and consumers can all be on different machines or the same machine, it makes no difference. Used together, you have a very flexible and very scalable system with very few lines of code. Just update your thrift file and handler class to use your API and everything else is handled for you!

Monday, November 3, 2008

Ringlight Beta Launch Party

After almost a year of hacking, testing, and web design, I'm ready to release a Ringlight beta to the public.

I'd like all of my friends and coworkers to join me at Conjunctured to celebrate this momentous occasion. The party is this Wednesday, November 5th, from 6-9pm.

Austin is a great place to do a startup and I couldn't have made it this far without the community. Let's work together to make things awesome!

Wednesday, October 22, 2008

The Rackspace Cloud Event

Today I attended the Rackspace Cloud Event at the ACL stage, just a couple floors above the ACTLab. I'll skip the marketing and get right to what startups and developers might care about.
  • Rackspace bought Slicehost
  • Rackspace bought Jungle Disk
  • Their version of Google App Engine, previously Mosso, is now called "Cloud Sites"
  • Their S3 is called "Cloud Files" and is $0.15/GB per month
  • They will be offering pay-as-you-go CDN for Cloud Files through Limelight for $0.22/GB
I've always had a problem with Mosso ("Cloud Sites") in that it promises to scale your app for you, but it never says what type of scaling it actually does. Luckily, one of the engineers talked to me afterwards and answered all of my questions. Here's how it scales:
  • A master load balancer which will be hot swapped in case of failure
  • A cluster of web server load balancers
  • A cluster of app servers
  • Master-slave replicated databases
So they're basically doing all the things that are good and reasonable. I didn't have a chance to ask about caching issues such as whether a memcached cluster or any other caching solution was available. Also, while they don't do automatic sharding of your database (which would be cool, but kind of insane), they do support apps with multiple master databases and will support separate replication for each master. It really beats services which just provide a cluster of app servers in terms of ultimate scalability.

Friday, August 15, 2008

Autocomplete Form Fields (with jQuery)

I recently added a feature to Ringlight which lets you share a private file with a particular user. There is a field to enter the username that you want to share with and obviously this field needs to autocomplete as you type. This is a feature that users expect now and so it's not really optional.

jQuery makes it very easy to add this feature:
  1. Download the autocomplete extension for jQuery. You'll need both the JavaScript and CSS files.
  2. Add an input field to your form, for example: <input type="text" id="youAutocompleteMe"/>
  3. Set the cacheLength option to speed up responsiveness: options={cacheLength: "20"};
  4. Call the autocomplete extension: $("#youAutocompleteMe").autocomplete(url, options);
  5. You'll need a server-side script installed at the url you pass to autocomplete to return matches.
The autocomplete function will call the given url, passing the text currently in the input box in the q parameter. Your server-side code should return a newline-delimited list of matches.
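
My server-side code is in Java, but here's an illustrative CGI-style sketch in Python of what the script needs to do (the username list is a stand-in for a real database query):

import cgi

def matching_usernames(prefix):
    # Stand-in for a real lookup against your user table
    users = ["alice", "albert", "bob", "brendan"]
    return [u for u in users if u.startswith(prefix)]

form = cgi.FieldStorage()
query = form.getfirst("q", "")
print "Content-Type: text/plain"
print
# Newline-delimited matches, which is what the autocomplete plugin expects
print "\n".join(matching_usernames(query.lower()))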

That's it! It's pretty easy. The key to performance is setting that cacheLength parameter as it defaults to 1, which doesn't provide much caching at all.

Friday, August 8, 2008

AJAX File Upload Progress Bars (with jQuery)

I recently added progress bars (actually, a percentage instead of a bar, but it was the same to implement) to Ringlight uploads and downloads. I was surprised to find that the available server-side libraries for dealing with file uploads seemed to be inadequate for adding this functionality to my website.

The basic technique for adding progress bars is relatively simple. With jQuery:

  1. Install the Ajax File Upload extension.
  2. Install the periodic execution extension.
  3. Register some Ajax event callbacks to reveal the progress bar on the page when the upload starts and also to catch errors.
  4. Call the $.ajaxFileUpload function with the URL of the upload handler script, the id of the file input element, and the callback function to handle output from the upload handler.
  5. Have the upload handler return a Json object with an id for the upload.
  6. Call your progress bar update function with the periodic execution model: $.periodic(updateProgressBar);
  7. The updateProgressBar function should fetch the download status from a server-side script, supplying the upload id and a callback function: $.getJSON("fetchProgress", {id: id}, function(data) {/* update progress bar with data.percentDownloaded*/});
  8. The fetchProgress script should return upload progress information in a Json object. I return percentDownloaded, but you can include anything you'd like, such as upload rate. You should also provide error information here, such as if the upload failed.
  9. The callback function for fetchProgress should update the page to reflect updated progress. For instance, updating a percentage to completion could be as simple as $("#percent").empty().append(data.percentDownloaded);
This was all very simple to implement and jQuery made it possible in very few lines of code. The hard part was providing a percentDownloaded value, because it is common for browsers not to include a Content-Length field for uploaded files. The file upload handling libraries generally solve this problem by either 1) not providing a content length or 2) loading the whole file into memory (or disk, in some cases) and then finding the length of it. Either way, not very useful for a progress bar! This total failure to handle streaming files is a common problem in libraries, and if you can avoid it in the libraries you implement, the world will be most appreciative.

In the meantime, here's my workaround: calculate the file length by taking the HTTP request's Content-Length header and subtracting the size of everything which is not the file. I did this with the following shoddy algorithm (sketched in Python after the list):
  1. Extract the MIME boundary from the Content-Type header.
  2. Subtract the size of the MIME boundary twice (there is a boundary on both sides of the file).
  3. Subtract 2 because the second boundary has a trailing "--".
  4. Subtract 4 because each boundary has a trailing two-character newline (\r\n).
  5. Subtract 8 because my numbers were always off by exactly 8. I'm not sure where this additional 8 is coming from.
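
Expressed as a Python sketch (the header parsing is simplified, and this mirrors the kludge above rather than being a robust parser):

def approximate_file_length(content_length, content_type):
    # content_type looks something like: multipart/form-data; boundary=----token
    boundary = content_type.split("boundary=")[1]
    overhead = 2 * len(boundary)  # a boundary on each side of the file
    overhead += 2                 # the second boundary's trailing "--"
    overhead += 4                 # each boundary's trailing \r\n
    overhead += 8                 # mystery constant; my numbers were always off by 8
    return content_length - overhead
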
As I said, this algorithm is shoddy, a kludge not fit for use in production. However, it works for now! It needs extensive testing and tweaking on a variety of browsers. The next steps:
  1. Improve algorithm so that it's robust enough to work with most browsers
  2. Submit a patch to python's FieldStorage class to support this algorithm
  3. Submit a bug report to Mozilla requesting that they supply content length in file uploads
For now, my upload progress bars are working, so I'm happy.

Tuesday, July 29, 2008

Flash Crowd Preparation - Load Testing With Siege

The official launch of Ringlight is approaching and so it's time to prepare for launch day flash crowds, in particular from Slashdot (or Digg, etc.). On the bright side, these sorts of flash crowds are not as fearsome as they used to be, in that most website hosting these days provides adequate bandwidth and CPU to handle the load. The weak point is most likely your application itself, so it's worth load testing it.

So how much load is a Slashdot flash crowd? On the order of 1-10 requests per second. If you can handle 100 then you're more than fine. Additionally, this load only lasts about 24 hours. These are small numbers as compared to the continuous load you can expect for a popular site, but scaling up for your post-launch flash crowd is good preparation for the traffic to come.

The first step in scaling is to load test your site. Don't go writing a layer 4 load balancer in C with async I/O just yet. First, find out what is slow and just how slow it is. I like to use siege for this because it lets you start simple.

First, apt-get install siege. Then, run siege.config to make a new config file (you can edit it later).
Then, try the simplest possible test: siege http://yoursite.com/

Don't run siege on someone else's website as they are likely to think they are under attack (and they are!) and block your IP.

Siege will launch a bunch of connections and keep track of core statistics: availability (should be 100%), response time (should be less than 1 second), and transaction rate (you're shooting for 100 transactions/second).

By default, siege will just keep attacking your server until you tell it to stop (ctrl-C). To run a shorter test, use -r to specify the number of repetitions. Note, however, that this is not the number of transactions that will be made. siege uses a number of simultaneous connections (15 by default, set it with -c), so if you specify -r 10 -c 10 for 10 repetitions with 10 simultaneous connections, then there will be 100 transactions.

Other important options are -d to set the random delay between requests and -b to use no delay at all. The use of -b is less realistic in terms of real load, but will give you an idea of the maximum throughput that your system can handle.

I tested my site with siege -b -c 100 -r 100 in order to hit it hard and see the max throughput. I found that most pages could handle 100 transactions/second, but one page was doing a scant 5 t/s. Unacceptable! I added memcached caching of some of that page's internal state and it now benchmarks at about 90 t/s. That's perfectly acceptable for the launch crowd, but this benchmark relies on cache hits. With 100% cache misses, I'd be back to 5 t/s. So the real performance is somewhere in the middle, depending on the ratio of cache hits to misses. What this ratio is depends on real traffic patterns as well as the size of the cache and the number of pages. This makes it hard to say, but I can give it a shot.

The way to simulate something like this with siege is to put a number of URLs in a file called urls.txt and then run siege -i. siege will then pick URLs to hit randomly from the file. Put multiple copies of the same URL in the file in order to simulate relative weighting of the URLs. By providing a file containing all of my caching-dependent URLs, weighted based on estimated popularity, I can see how well my cache is holding up and tweak caching settings as necessary to get adequate performance.
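
For example, a urls.txt weighted two-to-one toward the front page might look like this (the URLs are placeholders), run with siege -i -f urls.txt:

http://yoursite.com/
http://yoursite.com/
http://yoursite.com/popular-page
http://yoursite.com/rarely-visited-page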

Friday, July 11, 2008

Startup Camp Austin

There's been a lot of discussion lately about Austin's startup culture.
Is Austin a good place to do a startup? Should you head to California instead? What sorts of startups work best in Austin? Where should you look for funding? Is Austin Ventures the only game in town? Why doesn't Austin have an equivalent of Tech Stars or Y Combinator?

There's a growing movement to make Austin a great place to do startups. Projects like the Startup District and various coworking spaces, both commercial and noncommercial, are examining what we need to provide in order for Austin startups to succeed.

In the spirit of these discussions, I'm organizing Startup Camp Austin, a one-day event on August 2nd where Austin's startup community will come together to discuss the issues, challenges, and advantages of having a startup in Austin.

RSVP on the Facebook Event page and sign up to give a presentation, moderate a discussion, demo your product, or pitch your idea on the BarCamp wiki.

Monday, June 30, 2008

Java and Memory Leaks

People are often surprised that I prefer Java as the primary language for coding serious applications. They assume that it must be ignorance of other languages, enforced slavery by my employer, or simple insanity. I assure you, however, that while I have experience programming in 15 different languages and I enjoy many of them, I still choose Java for doing real work.

This is because while first class functions, closures, and metaobjects are all very cool and fun, I don't think these are the important factors when writing, say, a web application that you need to scale up to lots of users. What really matters are the libraries and the tools. These will save you more time than not having to type semi-colons at the ends of lines.

An example of what I mean is memory profiling. I recently wrote a handy load testing tool for Ringlight which generates an up-to-date Google sitemap by hitting every URL on the website, comparing hashes to see if it's changed, and updating the sitemap's lastmod field. There are currently around 200,000 pages and hitting them all at once is a good test of the memory cache, database responsiveness, average page load time, etc. Interestingly enough, this process causes the server to run out of memory and crash.

Excellent! The load test revealed a memory leak, just what a load test should do. If the application were in, say, Python, what I would do is run the code under the primitive profiler that's available. This would spit out a stats file. I could then write some code using the bstats.py library to sort the stats in various ways, looking for areas which are consuming lots of memory.

Luckily, the server is in Java, so I can get a 15-day trial of YourKit Java Profiler (there are free ones as well, but YourKit is the best). My code runs on the server, the user interface runs on my local machine. They automatically communicate over the network so that I can get realtime graphs of memory consumption as my app runs. I can take snapshots of the memory state, run test scenarios, compare the snapshots to see only the memory retained between tests, drill down through paths of method calls that look suspicious, check for common memory allocation gotchas, etc. It's an excellent tool and it makes finding memory leaks easy.

In this case, the memory leak seems to be inside the Java database access layer (JDBC). It appears that this is because the MySQL JDBC driver intentionally does not garbage collect results. You must wrap the use of any ResultSet objects with a finally clause that will attempt to close them. Of course, this is just good coding style anyway and I had already done this in my application's database access methods. Unfortunately, I later decided that I didn't like the way one of these methods was called and so added an additional database access method, and this time I had forgotten to add the finally clause. As this new method became more prevalent in my code, the memory leak got worse.

Of course I'm sure that you, dear readers, would never be guilty of such inconsistent coding practices. This memory leak is a result of my fast and loose coding style. Some might say that it is this style which leads me to use languages with good tools. Others I know prefer to write their own memory profilers, object graph inspectors, and even syntax style checkers from scratch. Personally, I prefer to spend my time writing applications, at least while engaged in the professional business of writing applications. When not at work, I enjoy inventing my own impractical languages as much as anyone.

Wednesday, June 25, 2008

Mini-Bio

I realized as I was e-mailing an introduction to a new business contact today that I never really took the time to properly introduce myself. Here's the same miniature biography that I sent to my new contact:

I have worked for the last ten years in open source and peer-to-peer software, both in community projects and tech startups. I founded a number of open source peer-to-peer software projects, including Freenet, Tristero, Alluvium, and Project Snakebite. I've also worked in peer-to-peer Internet video delivery at Swarmcast as Senior Engineer and then at BitTorrent as the Director of Product Management.

I currently have a startup here in Austin called Ringlight, where I make social file-sharing software. It can be thought of as "peer-to-peer meets Web 2.0" or "google for your desktop". It indexes all of the files on all of your computers and makes them available on a website, so you can access your files from anywhere that has a web browser, send links to your friends, search, tag, bookmark, etc.

I also do some consulting in the areas of product design, product management, and engineering architecture.
I'm always happy to meet anyone in Austin with a startup and I'd love to hear what you're working on. Let me know if you'd like to have lunch anytime this week or next. Also, I will be at Jelly this Friday and most any Friday, so feel free to stop by if you'd like. Jelly is a co-working group that meets on Fridays at Cafe Caffeine. It's a great place to meet other people in startups.

Other things to check out if you want to know more about me: blog, personal website, twitter, LinkedIn profile.

Friday, June 6, 2008

Twitter Integration for your Website

Social sharing of content is a popular feature for websites. It seems like every blog post these days is accompanied with a list of bookmarklet buttons: Digg, StumbleUpon, del.icio.us, etc. However, what about adding the ability to post links to Twitter from your website? It's not quite as simple as a bookmarklet, but it's still pretty easy to do.

The Ringlight website is written in Java (the client is in Python), so when I picked a library for Twitter access, I picked jtwitter as it has no external dependencies.

It's so easy to use that it's almost too easy. Check out my code:

Twitter twit=new Twitter(username, password);
Status status=twit.updateStatus(message);

So easy, right? Now you can go to any file on Ringlight, click on Share on Twitter, and the link is posted on Twitter. You don't even have to be logged into the Ringlight website, so any random user on the Internet can start twittering links to my copy of Accelerando.

Wednesday, June 4, 2008

Cross-Platform Monitoring of Filesystem Events

A recent problem with deployment of the Ringlight client has been that users with a large number of folders have been experiencing annoying amounts of CPU usage. This is because the most fundamental functionality that Ringlight provides is periodically rescanning your filesystem to automatically find changes. Rescan too infrequently and changes won't appear on the website when expected, confusing users. Rescan too frequently and users will complain about too much CPU usage. There are many applications that require rescanning the filesystem, from virus scanners to automatic backup programs, and they all have to deal with this problem.

An attractive alternative to rescanning the filesystem is to use filesystem monitoring events. Rather than periodically scanning to see if anything has changed, let the operating system notify you when something has changed. Very efficient! Unfortunately, unlike a simple recursive traversal of directories, this approach has to be implemented separately on each major platform and each OS has its own pitfalls and gotchas. I will focus primarily on building this in Python, although the same underlying mechanisms can be used in any language with appropriate bindings.

On Windows, filesystem events are available using the Python for Windows Extensions. It is not a particularly simple API to use, being a direct binding to the Windows system calls.

On OS X, PyKQueue offers a binding to kqueue, which is also available on BSD.

On Linux, there are two kernel interfaces, dnotify and inotify, depending on whether the kernel version is lesser or greater than 2.6.13. You can call inotify directly with pyinotify. You can also use a more generic library such as Gamin, which will use either inotify or dnotify, whichever is available. Of course, really old versions of Linux don't even have dnotify, and you'll have to fall back to periodic rescanning.
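
Here's a minimal sketch of the Linux case using a recent version of pyinotify (the path and event handling are illustrative; the real client does far more bookkeeping):

import pyinotify

class Handler(pyinotify.ProcessEvent):
    # Called by the notifier for each filesystem event we registered for
    def process_default(self, event):
        print event.pathname  # record the change for the server here

wm = pyinotify.WatchManager()
mask = pyinotify.IN_CREATE | pyinotify.IN_DELETE | pyinotify.IN_MODIFY | pyinotify.IN_MOVED_TO
# rec=True walks the tree and registers every directory separately;
# auto_add=True registers new directories as they are created
wm.add_watch("/home/you/files", mask, rec=True, auto_add=True)

notifier = pyinotify.Notifier(wm, Handler())
notifier.loop()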

Every platform's filesystem monitoring API is different and each has different issues, however they generally share a common set of issues as well:

  • Network drives don't generate events - you'll have to use periodic scanning for these
  • Every folder to be watched must be registered separately - you can't request notifications for an entire directory tree. You have to register all the directories separately, and when a new directory is created you have to remember to register it as well. You sometimes need to keep a file descriptor around for each directory you're watching, so watch out for running out of file descriptors.
  • No special handling is done for shortcuts, aliases, or symlinks - if you're monitoring, say, a directory, and that directory has a shortcut (or the equivalent for that OS), you need to monitor two objects: the shortcut itself (in case its target is changed), and the targeted file or directory.
  • Sometimes deleting a directory won't send deletion events for files in that directory or its subdirectories. You have to maintain a copy of this information yourself and perform a virtual recursive delete on your database when the parent directory deletion event is received (see the sketch after this list).
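
A virtual recursive delete can be as simple as a prefix match against the paths you already know about. A rough sketch, assuming you track known paths in a set:

def virtual_recursive_delete(known_paths, deleted_dir):
    # Remove the directory and everything we know to be underneath it
    prefix = deleted_dir.rstrip("/") + "/"
    removed = [p for p in known_paths if p == deleted_dir or p.startswith(prefix)]
    for path in removed:
        known_paths.discard(path)
    return removed  # report these as deletions to the server
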
The Ringlight client, being a cross-platform application whose primary function is to monitor filesystem changes (and report them to the server, where the real fun happens), naturally takes all this into account. I am planning on releasing the filesystem monitoring core of the Ringlight client as open source, as there are no good cross-platform filesystem monitoring libraries available and it's really a shame that people have to reimplement all of this for their applications.

By the way, the users seem quite happy with the new version of the client that uses filesystem monitoring events. No complaints about excessive CPU usage anymore!

Wednesday, May 28, 2008

Adding Search (with Lucene)

The time has finally come for me to add search functionality to Ringlight. There is enough content now that just clicking around is getting tedious. After all, it's up to almost 200,000 files now.

There are a number of considerations to make when adding search to your site. For instance, you can usually get by pretty well with just integrating Google search into your website. This is fast, easy, and doesn't require messing with your backend code at all.

However, this is not really what I want. I want to let users search for files, not web pages, and I want the results integrated nicely with everything else. For instance, it would be cool to use a search query as a radio playlist like you can do on Hype Machine. So I'll need to build my own search engine.

This is not really that hard to do. I would recommend you read some articles and then download Managing Gigabytes for Java (MG4J). Those articles are by Tom from AudioGalaxy. You may remember AudioGalaxy as the best thing to happen and unhappen to music in my lifetime. I know I do. More importantly, it was deliciously scalable and for the most part it was just a search engine. So don't go writing one without learning some tips from the best.

I'm sure that a little engineering and MG4J could produce a highly scalable search engine. However, I didn't really want to spend that much time on it, so I went with a higher level solution in the form of Lucene for Java. There is also a popular version for Python. I would recommend waiting a while if you're considering using Lucy (Lucene in C with Python and Ruby bindings) because I don't consider it mature. I'd also stay away from layers on top of Lucene like Solr because if you're looking for tools to make Lucene easier to use then you're missing the point that it's already easy to use.

Download the official Lucene release and you'll see that it comes with source code for a demo. One class, IndexFiles, shows you how to add information to the search index. Another class, SearchFiles, shows you how to search the index to retrieve items. Like most demo code, it's both too simple (doesn't provide enough use cases of the library to let you fully understand it) and too complex (has a bunch of command line arguments that it has to parse and such). However, it will do. I have working search for my whole site after two days of fiddling around and working on it part-time.

Wednesday, May 21, 2008

Napster DRM-Free, The Future and Past of Online Music

As of today, Napster now offers DRM-free MP3 downloads. In one sense, it seems that the war is over, and we won. Let's take a look at the history and future of online music distribution.

First there was Napster, one of the original three peer-to-peer systems (along with Freenet and Gnutella). Unlike the other two, it was a commercial venture focused exclusively on music distribution. It became the controversial flagship of the peer-to-peer movement and the conversation about p2p became focused exclusively on the online music distribution debate. While I was going to conferences to speak about Freenet, I would introduce it as a censorship-resistant platform for protecting the freedom of speech of political dissidents and the audience would ask me, "How are artists going to get paid?", the jingoistic slogan propagated by the Recording Industry Association of America. You could argue that that wasn't the point, but by then you'd already gone off the topic of how to revolutionize media distribution to be efficient and fair and were now talking about the online music debate again. It wasn't even really possible to distribute mp3s over Freenet, and yet it kept coming up again and again.

After Napster, Grokster, and most notably AudioGalaxy (the best thing to happen and unhappen to music in my lifetime) were shut down, the p2p hackers went mostly underground, becoming assimilated into a number of small startups acquired by big companies. No one wanted to touch music at all. Personally, I stopped buying CDs as I had no desire to contribute funding to the organization attacking my friends, my livelihood, and everything interesting going on in my field. I listened exclusively to my collection of unsigned, obscure bands from AudioGalaxy (bands that had paid to be listed on AudioGalaxy so as to get more exposure) for many years. Have you ever heard of The Blood Group? So good.

Online music distribution was dead, and then Apple stepped in, with a not particularly new, but well-executed plan to make money off of online music sales by selling an expensive player for their exclusive DRM format (the iPod). This was only the first step of the plan, however. Step two was to bring in DRM-free MP3s at a higher price than DRM files, offering the RIAA what they really wanted from MP3 sales, which was making even more money selling MP3s than CDs. This is, of course, insane, as CDs contain much more value than MP3s in the form of cover art, liner notes, a physical object to hold, and of course the costs of CD distribution are higher as well. Having a higher price for something which has both lower value and a lower cost is just not sustainable. However, it must have sounded quite appealing to the record industry that while CD sales were dropping they could actually make more money than ever before. I'm sure those PowerPoint charts went over very well in meetings. Then step three from Apple: readjust the price of DRM-free MP3s to be the same as DRM tracks. As iPod sales decline (because everyone already owns an iPod), Apple switches to a business model not dependent on the iPod and becomes the number one online music distributor. Apple has become what Napster and AudioGalaxy tried to be, the future of music distribution.

So in that sense, we won. Go p2p hackers! Oh but wait, this has nothing to do with p2p hackers, does it? This is just online music distribution. And in this sense, we've lost, or at least are still at war. Peer-to-peer became demonized because of its use for online music distribution. Now that online music distribution is okay, peer-to-peer is still getting picked on. The content industries would have you believe that p2p is only for morally corrupt kleptomaniacs bent on stealing and never paying. This doesn't explain why Limewire is still around as a commercial entity, making money off of Limewire Pro, or how services like TorrentFreedom manage to exist charging $17/month (more than a Napster subscription when they were doing subscriptions).

The sad fact is that the focus has been on not what you are doing (distributing music online), but how you are doing it (peer-to-peer). This is tragic because peer-to-peer distribution is a technological revolution as important as the printing press, whereas online music distribution is only moderately interesting in the long term, being just one example of the future of media distribution being over the Internet.

Friday, April 25, 2008

Download Party

There will be a Ringlight Download Party this coming Wednesday, April 30th, at 7pm at Tech Night at Monkey Wrench Books (110 E. North Loop).

You can check out the full event info on the Ringlight news blog.

Wednesday, April 23, 2008

Lecture Series Complete, Here Are All The Links

The last installment of the lecture series was the best day ever. I had the chance to tell some funny stories from my life as a peer-to-peer hacker, and that was really fun for me. We have video and audio podcasts of the lecture, of course. I just want to make a small disclaimer about my last talk for people out there that are going to watch the video. These are anecdotes about the peer-to-peer revolution that I recited from my memories of the past ten years of hacking and hanging out. I did not employ the services of a professional fact checker. There will be inaccuracies, exaggerations, and things I made up because they sounded cool. The talk was meant to be entertaining and to give people a sense of the social history behind things they use every day. If you want an accurate portrayal of the chronology of events, check out the Wikipedia page.


Here are all the files from all the talks:

Lecture 1: Music, Television, and Film - Audio
Lecture 2: Online Communities - Audio, Video
Lecture 3: Peer-to-Peer - Audio, Video, Video of just my part, Slides
All lectures: Links to websites

In answer to the unspoken question that I have seen glinting in the eyes of so many members of the audience at these lectures: Yes, I am on Twitter.

A special thanks to Aimy for making my slides for me and Joseph for putting on this lecture series.

Thursday, April 17, 2008

Final Lecture Tomorrow, Video of Lecture Two

The final lecture will be this Friday, April 18th. The same time and place, 6:30pm at the ACTLab (CMB Studio 4B). The topic will be the history and current state of Peer-to-Peer technology.

This is my area of expertise, so I'm very excited! I hope you can make it.

Also, we have a video of lecture two.

Monday, April 14, 2008

Lecture Two Podcast and Links

The second installment of the 2008 ACTLab New Media Lecture Series went very well. There were many more people in the audience and this time the audio seemed to record properly. We have video too, but it won't be up for a while longer.

You can find all of the links for this lecture on my linkosaur.us page.
You can also download the audio podcast from Ringlight.
I'll let you know when the video is up.

Monday, April 7, 2008

Lecture One Podcast, Details on Lecture Two

The first installment of the lecture series was excellent. Unfortunately, while we are experienced in organizing and recording conferences, this was the first time that we tried to record a presentation while also presenting it! Some unexpected problems arose. First, there was no one to change the videotape, so it ran out after an hour. Second, there was no one working the mixing board, so when I demonstrated Pandora halfway through the presentation, I forgot to turn it off, thereby covering the rest of the lecture with awesome music.

Anyway, despite its flawed nature, I decided to put the podcast up on Ringlight.

The next lecture will be this Friday, April 11th. The same time and place, 6:30pm at the ACTLab (CMB Studio 4B). The topic will be interesting new developments in Online Communities. I hope to see you there!

Thursday, April 3, 2008

ACTLab Lecture Series: Music, TV and Film on the Internet

For those of you that reside in Austin, Texas, starting tomorrow (Friday) and for the next two Fridays following, I will be participating in the ACTLab Lecture Series. I will be giving tag team talks with my colleagues on what's going on in the world of the Internet.

This week the topic is:

Music, TV and Film on the Internet
Have you ever wondered how the media landscape washed into the Internet Sea? In this lecture/seminar we will look at the current state of MTF on the Internet and how technology is changing the way industry, consumers and content creators deal with distribution and consumption.

I will be covering the top ten things happening right now that you should know about.

Friday, April 4th, 6:30pm in the ACTLab, CMB Studio 4B on the UT campus on Dean Keaton and Guadalupe, across from Madam Mam's. There is a pay parking lot next to Madam Mam's and a garage behind that.

I hope to see you there!

For those of you not residing in Austin, Texas, I will put the videos of the talks on the ACTLab Ringlight share as they're available.

Friday, March 28, 2008

Adobe Flex - Flash Programming for Programmers!

One of the features that I was really excited about implementing in Ringlight was a Flash-based media player for browsing photos, videos, and audio files. I like the idea of being able to view all of my files from any computer with a web browser without having to download them first.

Unfortunately, Flash programming is not the most natural occupation for me. I find the concept of laying your code out on a timeline which trudges ever forward, causing variables to go in and out of scope based on the ticking of a clock... well, I don't like it very much. I don't particularly appreciate the Flash IDE or the fact that Flash source is in a binary file either. I don't mean to malign the art of the Flash programmer. I know you all have heroically found workarounds to all of these inconveniences and I salute your dedication. It's just not my cup of tea.

However, now we have Adobe's Flex. It fixes most of my problems with Flash programming because it's Flash programming designed for programmers. The entire source for a Flex project is in an XML file. While I am not one to jump on the XML-solves-everything wagon, it does mean that you can edit Flash applications in a text editor, check them into source control, and cut-and-paste whole projects into your blogs and chats and such.

Additionally, the compiler is free and cross-platform. This means you can compile your Flash apps right on your server. No more compiling on your desktop, uploading to the server, testing, and repeat. There's even an Apache module which will automatically detect changes in the source and recompile apps for you on demand. Sweet.

This is not to say that Flex is without problems. Flex coding can be as frustrating as Flash programming sometimes. I recently had a terrible time trying to follow online tutorials for making a Flex image browser. They all suggested using the Loader class, but it had some issues with image scaling. However, the suggested workaround for image scaling involves accessing the raw bitmap data of the image. This operation is unfortunately blocked by Adobe's inane and pointless security model, which requires that to access the raw bitmap data of an image the site hosting the image must authorize it in their crossdomain.xml file. So you are totally free to write a Flex image gallery for Flickr photos, but if you want to scale them then that's not allowed! The workaround is to just use the Image class instead of Loader. You don't get the right events to have a fancy loading animation this way, but at least it works.

Check out my Flex image loader example code. It's not finished as the buttons don't work yet, but it does load images just fine.

All in all I'd say that Flex is a breath of fresh air in the world of Flash programming. If you're a programmer then I'd recommend that you try it out.

Tuesday, March 4, 2008

Fullscreen Hardware-Accelerated H.264 Flash Video

Before I went to San Francisco to develop peer-to-peer Internet video solutions for BitTorrent, I was promoting my own peer-to-peer video streaming solution, Alluvium. There was a lot of interest at the time in p2p acceleration for video streaming and Alluvium had a unique take, pushing DVD quality videos over standard consumer broadband Internet connections. I pitched this software to KLRU, our local Austin PBS affiliate, and their comments were the same ones that I would hear from everyone: they loved it, they were shocked by the quality, they desperately wanted to get all of their content online as soon as possible, and they needed p2p in order to make the economics work out. They only had one problem with Alluvium. It used its own custom player, a necessity to achieve quality playback. They wanted a Flash player because they wanted to integrate the player directly into their website.

Unfortunately, Flash has never been synonymous with high-quality video. At the time, Flash was missing two components necessary for high-quality video: support for the H.264 codec, which makes files small enough to stream over a cable modem but with DVD quality, and hardware-accelerated fullscreen playback, which lets you watch them fullscreen without your computer grinding to a halt.

With the release of Flash 9 Beta 3 (Release #115), Flash now supports these features. With this release, Adobe is positioned to dominate the high-quality Internet video market. Unfortunately, achieving the holy grail of hardware-accelerated fullscreen H.264 video in Flash is not by any means simple and neither the official documentation nor the tutorials available online clarify things.

I have, therefore, drafted this list of tips for people pursuing this noble goal.

  • The only way to achieve fullscreen, H.264, and hardware-acceleration at the same time is to use the FLVPlayback component. Several other options which you might read about should be ignored, such as Video, VideoPlayer, NetStream, and NetConnection.
  • You must use the ActionScript 3.0 version of FLVPlayback. There is an ActionScript 2.0 version available, but it does not have fullscreen support.
  • If you are integrating the player into an existing application, you must rewrite all of your ActionScript 2.0 code to be ActionScript 3.0, as you cannot mix code from the two language variants.
  • Make sure in your Publish Settings that you are exporting as both Flash 9 and ActionScript 3.0.
  • You will need a very recent version of the Flash development environment. You will need Flash CS3 and most likely will need to install all of the available updates.
  • You will need a very recent version of the Flash plugin, at least 9.0 r115.
  • Fullscreen mode for Flash 9.0 r115 for Linux does not work well at all. Even normal quality YouTube videos play back very slowly in fullscreen mode.
  • The supported video formats are Sorenson, VP6, and H.264 codecs in either .flv, .mov, or .mp4 containers. You cannot play .avi, or .wmv files. Not all .mov and .mp4 files will play as some may use other codecs.
  • In order to make a H.264 file, use QuickTime Pro. This is just the normal QuickTime player after you pay the $30 registration fee. It produces the best quality H.264 files. For the container format, choose .mov as it is the most standard.
  • If you have other H.264 files which you have made previously, you need to make sure that the index is at the beginning of the file. Some programs put the index at the end of the file and these files will not stream. Google for the "QTIndexSwapper" program to fix files that have the index at the end.
  • If you're streaming files from an HTTP server, they must have a .flv, .mov, or .mp4 extension and not take any CGI arguments. For instance, putting the files up on a normal webserver will work fine, but serving them from a content-management or other dynamic system might not.
Once you have a player working, you might want to customize the buttons and the style of the controls. While the documentation provides several options, the only one I have found to work properly is making a new custom skin from one of the standard skins. To do this, I have provided some additional tips:

  • Find a skin you like in Program Files\Adobe\Adobe Flash CS3\en\Configuration\FLVPlayback Skins\FLA\ActionScript 3.0\
  • Copy it and edit it as you'd like.
  • To edit the controls, edit the versions in the library. The items you see on the stage are just examples for you to look at. The actual skin used by the player is created dynamically by pulling things out of the skin movie file's library.
  • Choose the skin for the FLVPlayback component in the Parameters configuration window, using the skin property.
  • In order for your skin to show up in the list of possible skins, publish it and copy the resulting .swf file to Program Files\Adobe\Adobe Flash CS3\en\Configuration\FLVPlayback Skins\ActionScript 3.0\
  • Make sure that when you upload your final player .swf file to your webserver and you also upload the skin .swf file. Also make sure that both files are readable by the webserver. Otherwise, the controls for your player may not show up.
I hope that this has helped you in your pursuit of fullscreen high-quality video in Flash. It's certainly been an epic journey for me and I'm glad to finally have a working Flash player. Unfortunately, like most media players, it has a visual glitch when switching from one video file to the next in fullscreen mode. It briefly shows the controls on every video transition. This is the exact sort of annoying behavior which we found in every video player and which eventually led us to write our own (very nice) player. However, Flash is most likely going to be the king of Internet video, so you have to work with what you have.

Wednesday, February 27, 2008

Welcome to "Step Three: Profit!"

I just wanted to take a moment to tell you about this new blog and what it's all about.

In October of 2006 I left my hometown of Austin, Texas to pursue a career opportunity in San Francisco as the Director of Product Management at BitTorrent, the startup company responsible for the popular file-sharing software of the same name.

After a year, I returned to Austin to start my own company. Working alone and with the help of some friends, I'm attempting to make the next step into entrepreneurship. This is a little crazy for me, because I am not fundamentally a business person. I'm a geek. I was a senior engineer before I made the move into product management. I have spent most of my life eschewing what I considered the "businessy" side of work in pursuit of the artistic and ideological purity of coding. All of my free time has gone into open source projects.

Why then am I suddenly keen to leave my profitable job at a hip San Francisco startup company to pursue something that I do not fully comprehend? There are a lot of reasons to move from the corporate world to a life of small business entrepreneurship. The biggest one for me is control over my own life. I want to live and work in Austin so that I can be with my girlfriend and my family. I missed the Austin culture terribly when I was away and being an entrepreneur allows you to work where you want, at what hours you want, and as hard as you want. Because you own your creations. I feel full of exciting ideas and I want the chance to explore them. When you work for a company you are very restricted in what you can do. Any innovations are generally owned by the company and working on a potentially competing product (defined very broadly) is particularly forbidden.

Also, as a geek, I love learning new things: new programming languages, new technologies, and new skills. Having read every book in the Computer section of my local book store, it's refreshing to walk into the Business section and see a vast array of titles which I know nothing about. Whether my business venture succeeds or ultimately fails, I'll learn a lot in the process and probably have some fun.

So that's what this blog is about, starting a business when you don't really know anything about business, a small business guide for geeks.

Step One: Quit your job and start a blog about your new business venture
Step Two: ???
Step Three: Profit!