Friday, July 20, 2012

Freefall Tutorial Screencast

I made a screencast walking through the tutorial for building an example Freefall app. In this example, I built a simple leaderboard service where you can post scores and then get a sorted list of all the scores. Make sure to watch it in HD so that the text is legible.


Tuesday, July 17, 2012

Freefall Tutorial Docs

I've been working on some tutorial docs for using Freefall. There is a usage tutorial that describes all of the different commands you can use with the command line tool. There is also an app development tutorial which walks you through developing a simple leaderboard service. By the end of the tutorial you should have a leaderboard up and running on Google App Engine!

If you go through the tutorial, please let me know how it goes. Any feedback on the documentation would be helpful as I want this to be a tool which people can actually use.

Monday, July 16, 2012

Austin on Kickstarter

I was recently asked if I knew anyone that had done a Kickstarter campaign and might want to be on a crowdfunding panel here in Austin. I started thinking about all of the local folks I knew that had done Kickstarters. There were quite a few! I wanted to share them with you. These are just people I know personally (and friends of friends), so there must be a lot more projects going on in Austin that I don't know about. This is exciting to think about.

Here is the list:

Tammany Hall, The Great Fire of London - Pandasaurus Games / Nathan McNair
Inevitable - Dystopian Holdings / Jonathan Leistiko
Thunderbeam - Karakasa Games / Wiley Wiggins
growerbot - Luke Iseman
Big Poppa E's Poetry Project - Big Poppa E
The Blue Hit Recordings Project - The Blue Hit
Beatbox Beverages (not launched yet) - Beatbox Beverages / Aimy Steadman
CAT22 - me!

Wednesday, July 11, 2012

Freefall Scaling

Freefall is a cloud-based NoSQL database which is designed from the ground up to be ultra-scalable. For the most part, users of Freefall don't need to know anything about the details. It just works. However, scalability enthusiasts might be interested in knowing what's going on under the hood.

The first step in designing Freefall to scale is that it runs on top of Google App Engine. This means that, for the most part, Google will autoscale the number of servers in order to handle the rate of incoming requests. There are plenty of parameters you can tweak on the App Engine web dashboard in order to optimize performance, but for basic functionality you don't need to change a thing.

The next step is the separation of frontend and backend services. The public API of your services, the actions and the views, are handled by the frontend servers. The transforms, the internal logic of your application and the bulk of the computation, are handled by the backend servers. This means that a client will never block while waiting for a time-consuming computation to complete. Actions and views are designed to return very quickly, freeing up the frontend servers to handle more requests, while the backend servers compute asynchronously. Therefore, when the service is overloaded with too many writes to process, the failure mode is stale data, not catastrophic collapse.
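Here is a minimal sketch of this split, in plain Python rather than Freefall's actual code. A `deque` stands in for the App Engine task queue, and the `Service` class, its method names, and the per-tick `capacity` are hypothetical. It shows the failure mode described above: when writes arrive faster than the backend drains them, clients still get an immediate answer and simply read a view that lags behind.

```python
from collections import deque


class Service:
    """Hypothetical sketch: the frontend only enqueues, the backend drains later."""

    def __init__(self):
        self.queue = deque()          # pending actions (stand-in for a task queue)
        self.total = 0                # backend-owned application state
        self.view = {"total": 0}      # the last published view clients can read

    def post_action(self, amount):
        """Frontend: O(1) work, returns without waiting for any computation."""
        self.queue.append(amount)
        return "accepted"

    def read_view(self):
        """Frontend: return whatever the backend last published (possibly stale)."""
        return dict(self.view)

    def backend_tick(self, capacity):
        """Backend: process at most `capacity` queued actions, then republish."""
        for _ in range(min(capacity, len(self.queue))):
            self.total += self.queue.popleft()
        self.view = {"total": self.total}


service = Service()
for _ in range(5):
    service.post_action(1)           # five writes arrive in a burst
service.backend_tick(capacity=2)     # the backend only keeps up with two
print(service.read_view())           # {'total': 2}: stale, but not an error
```

Once the backend catches up (another `backend_tick`), the view converges to the full total without any client ever having blocked.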

The next step is to separate reads and writes. Views are read-only. In fact, views are pre-computed, pre-serialized, and cached in memory. So when you load a view, all the frontend server has to do is read the pre-serialized bytes out of memory and write them to the HTTP socket. Views are therefore extremely fast. Actions are write-only. The purpose of an action is to change the application's state. All the frontend server does for an action is deserialize the incoming data and add the request to a queue. The actual processing of the action is done asynchronously on the backend. Actions are therefore extremely fast as well.
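The read and write paths described above can be sketched as follows. This is an illustration, not Freefall's code: the post doesn't say which serialization format Freefall uses, so JSON is an assumption here, as are the function names and the HTTP 202 status. The point is that the serialization cost is paid once, when the view's model changes, so serving a view is a single dictionary lookup.

```python
import json
from collections import deque

view_cache = {}         # view name -> pre-serialized response body (bytes)
action_queue = deque()  # stand-in for the backend task queue


def publish_view(name, model):
    """Backend: serialize a view once, at the moment its model changes."""
    view_cache[name] = json.dumps(model).encode("utf-8")


def handle_view_request(name):
    """Frontend read path: one in-memory lookup, no queries or serialization."""
    return view_cache[name]


def handle_action_request(name, body):
    """Frontend write path: deserialize, enqueue, and return right away."""
    action_queue.append((name, json.loads(body)))
    return 202  # HTTP "Accepted": the work itself happens later on the backend


publish_view("highScores", [["bob", 12], ["alice", 10]])
print(handle_view_request("highScores"))  # b'[["bob", 12], ["alice", 10]]'
```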

On the backend, an action and a transform are essentially the same. The only differences are whether it is part of the public API or the internal logic, and whether its input is supplied by the client or loaded from internally stored data. From a processing perspective, they operate in the same way. The input data is loaded and the transform function is run, which makes changes to the output model based on the input and the results of its computation. After the output has been changed, two things happen: views are calculated, and transforms are triggered. If the model that is the output of the transform is marked as a view, then the state of that model is serialized and cached in memory so that it can be retrieved by the client. If any transforms are configured to be triggered by the output model, then they are called and the process repeats. Eventually all of the transforms have been processed, all of the views have been calculated, and the system returns to a state of rest until the next action is performed.
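The processing loop described above can be sketched as a worklist: each changed model is checked against the set of views, then used to trigger every transform that reads it. The registry layout, the `process` function, and the transform names are hypothetical, and this runs everything in one process, whereas Freefall spreads the work over backend servers.

```python
import json
from collections import deque

# Hypothetical registry: name -> (input models, output model, function).
# Each function receives the input models' values and returns the output's new value.
TRANSFORMS = {
    "highest": (["scores"], "best",
                lambda scores: {player: max(vals) for player, vals in scores.items()}),
    "sorted": (["best"], "highScores",
               lambda best: sorted(best.items(), key=lambda kv: kv[1], reverse=True)),
}
VIEWS = {"highScores"}  # models whose state is published to clients


def process(models, changed, view_cache, log):
    """Propagate one model change through the graph until the system is at rest."""
    pending = deque([changed])
    while pending:
        model = pending.popleft()
        # 1. Views are calculated: serialize and cache the model if it is a view.
        if model in VIEWS:
            view_cache[model] = json.dumps(models[model]).encode("utf-8")
        # 2. Transforms are triggered: run each one that takes this model as input.
        for name, (inputs, output, fn) in TRANSFORMS.items():
            if model in inputs:
                models[output] = fn(*(models[i] for i in inputs))
                log.append(name)
                pending.append(output)  # its output may trigger further transforms


# An action has just written the "scores" model; let the backend settle.
models = {"scores": {"alice": [10, 3], "bob": [7, 12]}}
cache, log = {}, []
process(models, "scores", cache, log)
print(log)                  # ['highest', 'sorted']
print(cache["highScores"])  # b'[["bob", 12], ["alice", 10]]'
```

The loop ends when the worklist is empty, which corresponds to the "state of rest" in the text; an acyclic graph guarantees it does end.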

That's the system in a nutshell, but I glossed over some details which are important to understanding how Freefall scales so well. If we processed one transform at a time, things could get very slow. So instead, App Engine can process multiple transforms at once by running several backend servers simultaneously, each pulling tasks from the queue. This allows for highly parallelized computation, similar to MapReduce. There are a number of ways that parallel computation can go awry, but everything works out in Freefall because of some clever design elements. First of all, the structure of transforms creates a data flow graph. Transforms can have multiple inputs, but only one output, and cycles are not allowed. The structure of the graph therefore partially serializes computation, because a transform isn't executed until its inputs have changed, so it always runs after the transforms that feed it.
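To make the graph argument concrete, here is a sketch using Python's standard `graphlib` module (Python 3.9+). It rejects cyclic transform graphs and groups transforms into rounds whose members have no dependencies on one another, so each round could be handed to parallel workers. The transform names are hypothetical, and organizing work into discrete rounds is a simplification: the post describes workers pulling tasks from a queue, not a round-based scheduler.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical transforms: name -> (input models, the single output model).
TRANSFORMS = {
    "highest": (["scores"], "best"),
    "names": (["profiles"], "display"),
    "join": (["best", "display"], "board"),
}


def parallel_rounds(transforms):
    """Group transforms into rounds; transforms within a round can run in parallel.

    Raises graphlib.CycleError if the transforms form a cycle.
    """
    producer = {}  # output model -> the transform that writes it
    graph = {}     # model -> set of models it depends on
    for name, (inputs, output) in transforms.items():
        producer[output] = name
        graph.setdefault(output, set()).update(inputs)

    sorter = TopologicalSorter(graph)
    sorter.prepare()  # cycle detection happens here
    rounds = []
    while sorter.is_active():
        ready = sorter.get_ready()  # every model whose inputs are all available
        batch = {producer[m] for m in ready if m in producer}
        if batch:  # models written only by actions have no transform to run
            rounds.append(batch)
        sorter.done(*ready)
    return rounds


print([sorted(r) for r in parallel_rounds(TRANSFORMS)])  # [['highest', 'names'], ['join']]
```

Because each transform has exactly one output, "which transform produces this model" is a simple lookup, and the topological order tells us `join` must wait for both `highest` and `names`.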

Additionally, transforms are pure, side-effect-free functions, so the value of the output is entirely determined by the values of the inputs (and the computation). It therefore doesn't matter in what order we run the computations, as long as they have the correct inputs. This may seem confusing, because the transform functions appear to modify the output state. However, this is all a ruse to make it easier to write transforms in a more familiar syntax. Transforms do not actually modify the output model. Rather, they are monadic functions, which is to say that they produce monads. A monad in this case is a list of requested changes to make to the model. The model is not actually modified; the requested modifications are just collected and returned at the end of the function. This is important because it means we can run the function as many times as we like without fear of it actually modifying the database.

In fact, we do sometimes run the function multiple times. Freefall is a Software Transactional Memory (STM) database. We apply each set of requested changes in a transaction, possibly simultaneously with a lot of other transactions. If two transactions modify the same model, then one of them is aborted and retried: the aborted function is rerun using the new values for the model. This is the one case in which the order of execution does matter, as the rerun function might return a different output given its new input. However, this is essentially a case of two things happening simultaneously, so in the interest of moving forward, one of the two events is chosen to happen first and the other second. Deadlock is therefore avoided and consistency is maintained (because everything happens in transactions).
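The change-list idea and the abort-and-retry behavior can be sketched together with a toy in-memory store that uses optimistic version checks. Everything here is hypothetical and only illustrates the technique: `ChangeList` plays the role of the "list of requested changes", `Store` is not Freefall's storage layer, and the sketch assumes every transform reads each model it writes, so validating the read set is enough to catch write conflicts. The `before_commit` hook exists only to force a conflict deterministically.

```python
class ChangeList:
    """Records requested changes instead of applying them."""

    def __init__(self):
        self.changes = []

    def set(self, key, value):
        self.changes.append((key, value))


class Store:
    """Toy versioned store with optimistic, all-or-nothing commits."""

    def __init__(self):
        self.data = {}
        self.versions = {}

    def snapshot(self, keys):
        values = {k: self.data.get(k) for k in keys}
        versions = {k: self.versions.get(k, 0) for k in keys}
        return values, versions

    def try_commit(self, read_versions, changes):
        # Abort if any model this transaction read has been modified since.
        if any(self.versions.get(k, 0) != v for k, v in read_versions.items()):
            return False
        for key, value in changes:
            self.data[key] = value
            self.versions[key] = self.versions.get(key, 0) + 1
        return True


def run_transform(store, fn, keys, max_attempts=10, before_commit=None):
    """Run a pure transform in a retrying transaction; return the attempt count."""
    for attempt in range(1, max_attempts + 1):
        values, versions = store.snapshot(keys)
        changes = ChangeList()
        fn(values, changes)            # safe to rerun: it only records changes
        if before_commit is not None:
            before_commit(attempt)     # hook to simulate a concurrent commit
        if store.try_commit(versions, changes.changes):
            return attempt
    raise RuntimeError("transaction kept conflicting; giving up")


def increment(values, out):
    """A pure transform: reads "count" and requests count + 1."""
    out.set("count", (values["count"] or 0) + 1)


store = Store()


def concurrent_writer(attempt):
    # On the first attempt, let another transaction commit first.
    if attempt == 1:
        run_transform(store, increment, ["count"])


attempts = run_transform(store, increment, ["count"], before_commit=concurrent_writer)
print(attempts, store.data["count"])  # 2 2
```

The first attempt is aborted because the other transaction won the race; the rerun sees the new value, so neither increment is lost.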

So that's basically all of the magic: queues, caching, asynchronous monadic functions, and software transactional memory. The result is a database that won't fall over under read load and can be scaled up to handle arbitrarily high write load by launching more backend processing servers.

Freefall: What Is It And Why Is It Awesome?

To describe it simply, Freefall is a NoSQL database. It is similar to other NoSQL databases such as CouchDB, SimpleDB, or MongoDB. However, there are some key differences in Freefall, particularly in how you use it.

Unlike most databases, you do not run Freefall on your own servers. It runs as a Google App Engine app. You run your own instance and you pay Google for the bandwidth and computation time. The reason you need to run your own instance is that Freefall isn't just a generic database. You specify the services provided by your application and then Freefall generates a custom App Engine app to provide those services. It also generates custom client libraries to call the services. So your experience as a developer is of a high-level API, provided as a library for the language of your choice, for accessing your specific services. In this way Freefall is similar to Rails: it provides most of the infrastructure and you just provide your application-specific code. Most importantly, you don't need to know anything at all about Google App Engine! It's all taken care of by Freefall. You just need to define your specific services and then you're ready to go.

For example, let's say you wanted a simple high score tracking service. You could define a "reportScores" action which reports a new score for a given playerid. You could then define a "highest" transform which discards all scores for a given player which aren't the highest seen. You could then define a "highScores" view which returns a list of the high scores. Freefall would then generate all of the code for a server that supports these functions, along with client libraries with high-level methods such as "void reportScores(String playerid, float score)". You can then deploy your server-side code to App Engine and call the client library to access your high score service.
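The logic of this example can be written as three small Python functions. Only the names `reportScores`, `highest`, and `highScores` come from the example (hence the camelCase); the data layout (a list of `(playerid, score)` pairs) and the function bodies are assumptions, and this is not Freefall's service definition format or its generated code.

```python
def reportScores(raw_scores, playerid, score):
    """Action: return the raw score log with one new entry appended."""
    return raw_scores + [(playerid, float(score))]


def highest(raw_scores):
    """Transform: discard every score that isn't the player's highest seen."""
    best = {}
    for playerid, score in raw_scores:
        if playerid not in best or score > best[playerid]:
            best[playerid] = score
    return best


def highScores(best):
    """View: (playerid, score) pairs from highest to lowest score."""
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


raw = []
for playerid, score in [("alice", 10), ("bob", 7), ("alice", 3), ("bob", 12)]:
    raw = reportScores(raw, playerid, score)
print(highScores(highest(raw)))  # [('bob', 12.0), ('alice', 10.0)]
```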

As you can see from the above example, Freefall is an MVC framework. You define actions, which are the public API for changing the state of your application. You can also define internal transforms, which derive new state from the state changed by actions or by other transforms. Eventually, the state changes reach one of the defined views, at which point the changes become publicly accessible. Actions and views together form the public API of your service, while transforms represent its internal logic. Together they form a data flow graph in which actions flow into transforms and then into views. Actions are the input and views are the output.
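One way to see the "actions are the input and views are the output" structure is to walk the graph from an action and collect every view it can reach. The graph below extends the high score example with a hypothetical `rank` transform and `playerRank` view; the adjacency-list layout is an assumption for illustration only.

```python
from collections import deque

# Hypothetical service: each node lists the nodes its output flows into.
EDGES = {
    "reportScores": ["highest"],          # action (input)
    "highest": ["highScores", "rank"],    # transform
    "rank": ["playerRank"],               # transform (hypothetical)
    "highScores": [],                     # view (output)
    "playerRank": [],                     # view (output, hypothetical)
}
ACTIONS = {"reportScores"}
VIEWS = {"highScores", "playerRank"}


def views_affected_by(action):
    """Breadth-first walk from an action to every view its changes can reach."""
    if action not in ACTIONS:
        raise ValueError(f"{action!r} is not an action")
    seen, pending = {action}, deque([action])
    while pending:
        node = pending.popleft()
        for nxt in EDGES[node]:
            if nxt not in seen:
                seen.add(nxt)
                pending.append(nxt)
    return seen & VIEWS


print(sorted(views_affected_by("reportScores")))  # ['highScores', 'playerRank']
```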

Transforms are a powerful feature of Freefall, as they allow arbitrary computations to process your data. You can do validation, authorization, sorting, joins, and filtering. Transforms are similar to CouchDB views, except that they can take multiple inputs and they can be chained by using the output of one transform as the input for another. This makes them much more flexible than CouchDB views. Additionally, transforms (and actions) are full Python functions. They can even import modules! They are stored in .py files, not in the database, so you can use version control to keep track of your code.
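As a rough illustration of multiple inputs and chaining, here are two plain Python functions in the spirit of transforms: one validates and filters raw scores, and the other joins the result with a second input (player profiles) and sorts it. The function names, the `profiles` model, and the data shapes are hypothetical; this shows the idea, not Freefall's transform signature.

```python
def valid_scores(raw_scores):
    """Transform 1 (validation and filtering): keep well-formed, non-negative scores."""
    return [
        (player, score)
        for player, score in raw_scores
        if isinstance(player, str)
        and isinstance(score, (int, float))
        and not isinstance(score, bool)  # bool is a subclass of int in Python
        and score >= 0
    ]


def named_board(scores, profiles):
    """Transform 2 (a join of two inputs, then a sort): attach display names."""
    rows = [(profiles.get(player, player), score) for player, score in scores]
    return sorted(rows, key=lambda row: row[1], reverse=True)


raw = [("p1", 50), ("p2", -5), ("p3", "oops"), ("p2", 70)]
profiles = {"p1": "Alice", "p2": "Bob"}
# Chaining: the output of valid_scores is one of the two inputs of named_board.
print(named_board(valid_scores(raw), profiles))  # [('Bob', 70), ('Alice', 50)]
```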

So when you really get down to it, Freefall is much more than a NoSQL database. It's a framework for doing data-driven server-side computations in a convenient and scalable way. Other databases just store the data and require you to do computation client-side, or they provide limited or awkward server-side computation. Freefall provides all of the power of Python on the server, without the hassle and with much better scalability than setting up your own Python-based web server.

Announcing Freefall: Cloud Services for Mobile Apps

For the past few years I've been working as a scalability consultant for Internet startups, mostly on scaling websites. People call me when their Rails servers are crushed by the popularity of their product, and I fix them. There's no magic bullet to scaling. There are a few principles of good design, and they are largely ignored, so my job is to bring things back in line with best practices for optimum scaling. For a while I've been thinking about taking these best practices and packaging them up into something people could use directly, rather than implementing the same set of optimizations for each client. However, I found that web companies tend to already be committed to a particular stack. As I've also been doing Android and iOS development lately, I thought mobile developers might be the ideal market for a new, super-scalable backend system. I've noticed that mobile developers aren't that interested in hacking backend code. They'd rather just get on with their mobile apps and leave the backend to someone else. There are a lot of cloud backend services already available, but my idea was different: a universal backend for anything from leaderboards to MMOs.

I was pitching this idea to any mobile developer who would listen at SXSW Interactive this year, and I was lucky enough to pitch it to John Warren at Minicore Studios. I've known John for a while as I am a friend of the St. Edward's Digital MBA program from which he graduated. He'd been talking about starting a game development studio, and sure enough he had done so and they had a booth at SXSW Screenburn. My pitch to John was that he would not need to hire server-side developers or sysadmins to run the servers for the online components of his games. His mobile front-end developers could just continue writing their games on Android, iOS, PC, and Xbox. There would be client libraries available in all of the necessary languages, and the developers could use them like any other library, without ever thinking about the server side of things. The system would be flexible enough for whatever he needed to do with online services, not limited to a static set of services such as leaderboards and achievements like most cloud services. Best of all, there would be no monthly fees: it would be open source and built on top of Google App Engine, so you would just pay Google for your bandwidth and computation, and only for what you use.

I guess John thought it was a good idea, because he hired me on the spot to build this service. I've been working all summer at Minicore building an open source persistent world server for the indie game development community to use free of charge. What can I say? This has been a dream opportunity for me. After spending so much time fixing broken designs, I was able to build something which follows best practices from the ground up. It's flexible, it's fast, and most of all it scales like crazy.

I'm going to post more technical details soon, as well as documentation and guides. Right now I just wanted to let the world know that this is happening. I just got the first service working, leaderboards for Minicore's Tanks for the Memories for Android, and I decided it was time to make a post. In the meantime, check out the project source (without documentation as of yet) and give a shout out to John to thank him for making this possible for the indie game community.