Wednesday, July 11, 2012

Freefall Scaling

Freefall is a cloud-based NoSQL database which is designed from the ground up to be ultra-scalable. For the most part, users of Freefall don't need to know anything about the details. It just works. However, scalability enthusiasts might be interested in knowing what's going on under the hood.

The first step in designing Freefall to scale is that it runs on top of Google App Engine. This means that, for the most part, Google will autoscale the number of servers in order to handle the rate of incoming requests. There are plenty of parameters you can tweak on the App Engine web dashboard in order to optimize performance, but for basic functionality you don't need to change a thing.

The next step is the separation of frontend and backend services. The public API of your services, the actions and the views, are handled by the frontend servers. The transforms, the internal logic of your application and the bulk of the computation, are handled by the backend servers. This means that a client will never block while waiting for a time-consuming computation to complete. Actions and views are designed to return very quickly, freeing up the frontend servers to handle more requests, while the backend servers compute asynchronously. Therefore, when the service is overloaded with too many writes to process, the failure mode is stale data, not catastrophic collapse.

The next step is to separate reads and writes. Views are read-only. In fact, views are pre-computed and pre-serialized and cached in memory. So when you load a view all the frontend server has to do is read the pre-serialized bytes out of memory and write them to the HTTP socket. Views are therefore extremely fast. Actions are write-only. The purpose of an action is to change the application's state. All the frontend servers do for an action is to deserialize the incoming data and add the request to a queue. The actual processing of the action is done asynchronously on the backend. Actions are therefore extremely fast.

On the backend, an action and a transform are essentially the same, the only differences between them being whether it is part of the public API or internal logic and whether the input is supplied by the client or from internally stored data. From a processing perspective, they operate in the same way. The input data is loaded and the transform function is run. It makes changes to the output based on the input and the results of computation. After the output has been changed, two things happen: views are calculated, and transforms are triggered. If the particular model which is the output of the transform is marked as a view, then the state of that model is serialized and cached in memory so that it can be retrieved by the client. If any transforms are configured to be triggered by the output model then they are called and the process repeats again. Eventually all of the transforms have been processed and all of the views have been calculated and the system returns to a state of rest until the next action is performed.

That's the system in a nutshell, but I glossed over some details which are important to the technical aspects of how Freefall scales so well. If we were to process one transform at a time then that could make things very slow. So instead, App Engine can process multiple transforms at once by launching simultaneous backend servers which pull tasks from the queue. This allows for highly parallelized computation, similar to MapReduce. There are a number of ways that parallel computation can go awry, but everything works out in Freefall because of some clever design elements. First of all, the structure of transforms creates a data flow graph. Transforms can have multiple inputs, but only one output, and cycles are not allowed. The structure of the graph therefore partially serializes computation because a transform isn't executed until its inputs have changed.

Additionally, transforms are pure, side-effect free functions. So the value of the output is entirely determined by the value of the inputs (and the computation). It therefore doesn't matter what order we run the computations in as long as they have the correct inputs. This may seem confusing because the transform functions modify the output state. However, this is all a ruse to make it easier to write transforms in a more familiar syntax. Transforms do not actually modify the output model, but rather they are monadic functions, which is to say that they produce monads. A monad in this case is a list of requested changes to make to the model. The model is not actually modified, the modifications that are requested are just collected and returned at the end of the function. This is important because it means we can run the function as many times as we like without fear of it actually modifying the database. In fact, we do sometimes run the function multiple times. Freefall is a Software Transactional Memory (STM) database. We run the set of requested changes in a transaction, possibly simultaneously with a lot of other transactions. If any two transactions modify the same model then we abort and retry. The function which was aborted is rerun using the new values for the model. This is the one case in which it does matter in which order we run the functions as the rerun function might return a different output given its new input. However, this is essentially a case of two things happening simultaneously and so in the interest of moving forward one of the two simultaneous events is chosen to happen first and the other second. Deadlock is therefore avoided and consistency is maintained (because everything happens in transactions).

So that's basically all of the magic: queues, caching, asynchronous monadic functions, and software transactional memory. The result is a database that won't fall over under read load and can be scaled up to handle arbitrarily high write load by launching more backend processing servers.

No comments: