Tuesday, July 29, 2008

Flash Crowd Preparation - Load Testing With Siege

The official launch of Ringlight is approaching, so it's time to prepare for launch-day flash crowds, in particular from Slashdot (or Digg, etc.). On the bright side, these sorts of flash crowds are not as fearsome as they used to be, in that most website hosting these days provides adequate bandwidth and CPU to handle the load. The weak point is most likely your application itself, so it's worth load testing it.

So how much load is a Slashdot flash crowd? On the order of 1-10 requests per second. If you can handle 100 then you're more than fine. Additionally, this load only lasts about 24 hours. These are small numbers as compared to the continuous load you can expect for a popular site, but scaling up for your post-launch flash crowd is good preparation for the traffic to come.

The first step in scaling is to load test your site. Don't go writing a layer 4 load balancer in C with async I/O just yet. First, find out what is slow and just how slow it is. I like to use siege for this because it lets you start simple.

First, apt-get install siege. Then, run siege.config to make a new config file (you can edit it later).
Then, try the simplest possible test: siege http://yoursite.com/

Don't run siege on someone else's website as they are likely to think they are under attack (and they are!) and block your IP.

Siege will launch a bunch of connections and keep track of core statistics: availability (should be 100%), response time (should be less than 1 second), and transaction rate (you're shooting for 100 transactions/second).
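If you run siege repeatedly, it can help to check the summary against those three targets automatically. Here is a minimal sketch in Python. The sample summary text is made up for illustration, and its "Label: value unit" layout is an assumption about what siege prints; check it against the output of your own version. The function names are hypothetical too.

```python
import re

# Illustrative summary only: these numbers are invented, and the
# "Label: value unit" layout is an assumption about siege's report.
SAMPLE_SUMMARY = """\
Transactions:               1000 hits
Availability:             100.00 %
Elapsed time:              10.50 secs
Response time:              0.10 secs
Transaction rate:          95.24 trans/sec
"""


def parse_siege_summary(text):
    """Map each lowercased label to the first number that follows it."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"\s*([A-Za-z ]+):\s*([0-9]+(?:\.[0-9]+)?)", line)
        if match:
            stats[match.group(1).strip().lower()] = float(match.group(2))
    return stats


def meets_launch_targets(stats):
    """Apply the targets from the post: 100% availability,
    response time under 1 second, at least 100 transactions/second."""
    return (
        stats["availability"] >= 100.0
        and stats["response time"] < 1.0
        and stats["transaction rate"] >= 100.0
    )


if __name__ == "__main__":
    stats = parse_siege_summary(SAMPLE_SUMMARY)
    print(stats)
    print("ready for launch:", meets_launch_targets(stats))
```

With the sample numbers above, the check fails only on transaction rate, which is the kind of result that tells you where to look next.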

By default, siege will just keep attacking your server until you tell it to stop (ctrl-C). To run a shorter test, use -r to specify the number of repetitions. Note, however, that this is not the number of transactions that will be made. Siege uses a number of simultaneous connections (15 by default, set it with -c), and each connection runs the full number of repetitions, so if you specify -r 10 -c 10 for 10 repetitions with 10 simultaneous connections, then there will be 100 transactions.
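The arithmetic is simple enough to do in your head, but here it is as a tiny Python sketch for planning bigger runs. The function name is made up, and it assumes each connection requests a single URL per repetition.

```python
def total_transactions(repetitions, concurrency):
    """Total requests for a run: each of the `concurrency` simultaneous
    connections performs `repetitions` requests (one URL per repetition)."""
    if repetitions < 0 or concurrency < 0:
        raise ValueError("repetitions and concurrency must be non-negative")
    return repetitions * concurrency


if __name__ == "__main__":
    # The example from the text: -r 10 -c 10
    print(total_transactions(10, 10))
```

For -r 10 -c 10 this gives 100, matching the example above.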

Other important options are -d to set the random delay between requests and -b to use no delay at all. The use of -b is less realistic in terms of real load, but will give you an idea of the maximum throughput that your system can handle.

I tested my site with siege -b -c 100 -r 100 in order to hit it hard and see the max throughput. I found that most pages could handle 100 transactions/second, but one page was doing a scant 5 t/s. Unacceptable! I added memcached caching for some of that page's internal state and it now benchmarks at about 90 t/s. That's perfectly acceptable for the launch crowd, but this benchmark relies on cache hits. With 100% cache misses, I'd be back to 5 t/s. So the real performance is somewhere in the middle, depending on the ratio of cache hits to misses. What this ratio is depends on real traffic patterns as well as the size of the cache and the number of pages. This makes it hard to say, but I can give it a shot.
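To get a rough feel for "somewhere in the middle", here is a back-of-the-envelope model in Python. It is a simplification, not a measurement: it assumes a hit costs 1/90 of a second and a miss costs 1/5 of a second (the two benchmark figures above), and that the blended throughput is the inverse of the average time per request. The function name and the model itself are my assumptions.

```python
def blended_throughput(hit_ratio, hit_tps=90.0, miss_tps=5.0):
    """Estimate transactions/second for a given cache hit ratio.

    Averages the time per request (not the rates), so slow cache
    misses weigh more heavily than a straight average would suggest.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    avg_seconds_per_request = hit_ratio / hit_tps + (1.0 - hit_ratio) / miss_tps
    return 1.0 / avg_seconds_per_request


if __name__ == "__main__":
    for ratio in (0.0, 0.5, 0.9, 0.99, 1.0):
        print(f"hit ratio {ratio:.2f}: {blended_throughput(ratio):.1f} t/s")
```

Under this model a 90% hit ratio works out to only about 33 t/s, because the misses dominate the average time. That is a good reason to test with a realistic mix of URLs rather than trusting the all-hits benchmark.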

The way to simulate something like this with siege is to put a number of URLs, one per line, in a file called urls.txt and then run siege -i -f urls.txt. Siege will then pick URLs to hit randomly from the file. Put multiple copies of the same URL in the file in order to simulate relative weighting of the URLs. By providing a file containing all of my caching-dependent URLs, weighted based on estimated popularity, I can see how well my cache is holding up and tweak caching settings as necessary to get adequate performance.
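Writing the duplicated lines by hand gets tedious, so here is a small Python sketch that generates a weighted urls.txt. The URLs and weights are placeholders I made up for illustration; substitute your own pages and popularity estimates.

```python
from typing import Dict, List

# Hypothetical pages and relative weights (higher = hit more often).
EXAMPLE_WEIGHTS = {
    "http://yoursite.com/": 5,
    "http://yoursite.com/popular-page": 3,
    "http://yoursite.com/rarely-visited": 1,
}


def weighted_url_lines(weights: Dict[str, int]) -> List[str]:
    """Repeat each URL `weight` times so a uniform random pick
    from the file approximates the desired traffic mix."""
    lines: List[str] = []
    for url, weight in weights.items():
        if weight < 1:
            raise ValueError(f"weight for {url} must be at least 1")
        lines.extend([url] * weight)
    return lines


def write_urls_file(weights: Dict[str, int], path: str = "urls.txt") -> None:
    """Write one URL per line to `path`."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(weighted_url_lines(weights)) + "\n")


if __name__ == "__main__":
    write_urls_file(EXAMPLE_WEIGHTS)
```

With these example weights the front page makes up 5 of the 9 lines in the file, so it should receive a bit over half of the randomly chosen requests.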
