Thursday, June 17, 2010

The Original Introduction Problem in P2P Networks

BitCoin was released this week, a very interesting P2P currency based on proof-of-work with a novel method to deal with double-spending via a P2P timestamp server. Cool stuff.

On the BitCoin forums, a discussion was going on regarding how new BitCoin nodes connect to IRC in order to find other BitCoin nodes. This method was somewhat controversial because it was drawing the ire of the IRC network admins because it looked like they were running a botnet. Additionally, if the IRC server goes down then new users can't join the BitCoin network. However, what are you going to do? When you first run a node, it doesn't know about any other nodes. It's a tough situation.

This is a common problem in P2P, known as Original Introduction, although bootstrapping is also a good word for it. The problem with bootstrapping is that you can't decentralize it. Whether it's IRC or HTTP or DNS, the client needs to be hardcoded with an address or list of addresses which is sufficiently fresh that at least one of the listed addresses is still active. After the first node is reached, you are no longer in Original Introduction mode and can use the full range of techniques for decentralization, such as gossip. Unless, of course, you get disconnected from the network and all of your known peers go away, in which case you're back to bootstrapping.

There are two properties that are at odds when you chose a bootstrapping method: robustness (scalability/reliability) and freshness. Robustness is increased at the expense of freshness by caching on multiple servers, as is usually done with HTTP peer lists. Freshness is maximized (at least up to the TCP timeout) at the expense of robustness by having everyone connected, as with IRC. Of course, the key is finding the right mix of robustness and freshness because you need both for the bootstrap to be successful.

Here are some of my current favorite methods for bootstrapping:

Append list of fresh peers to executable or installer dynamically on download. People usually get the application from its official website, so the website is already a point of failure for new users. You're already hardcoding an address in the application, the address that the application will use to bootstrap. So instead just add fresh peers at the moment of download. You need some fancy code in the executable to read the list off the end, but I've implemented this in an NSIS installer and it's not that hard. Most software developers are upset by the idea of this method.

Connect via XMPP to Google App Engine application. This gives the freshness of IRC, but with more robust scaling. App Engine is mostly for writing web apps, but it provides email and XMPP handling as well. It would be simple to write one application that could handle peer lists via either XMPP or HTTP with the same handler code. I'm currently using this in an application and it works well and is very reliable. I only wish there was a second App Engine to use as a fallback because it does have occasional downtime.

An alternative to requiring all nodes to include the complexity of a protocol like IRC or XMPP is to have a few special sentinel nodes which sit on the network and collect addresses of connected nodes via the usual decentralized methods available to an active node. These sentinel nodes periodically upload fresh addresses, say via HTTP POST to a number of websites. A new node can then download a fresh address list from any of the websites which is currently functioning and reachable. If you have 5 sentinels each uploading every 5 minutes (staggered), then you'll have updates roughly once a minute. This is on par with IRC in terms of freshness and is robust as you care to make it by varying the number of HTTP mirrors and the number of sentinels.


Anonymous said...

Early architectures assumed that multicast- or arp- like protocols would solve this, but in practice on today's huge and NATty internet, the best solution is to have a set of rendezvous databases acting as IP address registries but not participating in the p2p protocol.

There ought to be a way to do something similar to a reverse DNS lookup, too, but with a database key.

Nick said...

The most straightforward option as far as I can tell would be to have someone (or several someones) operate a custom DNS server that serves up random peers in the network when queried. Then you can always list that name as your bootstrap address - no additional protocol support required.

Mark said...

Has there been talk of using bitcoin as a metered billing strategy for trading network resources over bittorrent? As in using bitcoins as receipts for upload transfers to be redeemed for download transfers or bandwidth? Bitcoin sets a price for system resources so it could be a way to thin out the leechers.

Anonymous said...

The problem with using bitcoin in this fashion is that bitcoin trails are publicly auditable (that's how they work) which means they are NOT untraceable.

Solution is to issue the bitcoins using a chaumian transaction server like this one:

This way the bitcoins can be traded untraceably, and they can be used to solve issues of resource allocation on anonymous networks.

The bitcoin also provides the solution of where to store the backing for the currency (no need - distributed) and how to audit it (bitcoins are publicly auditable.)

The anonymous network also solves on of Bitcoins major problems, by actually making bitcoins redeemable in something of real value: the computing resources on the p2p network.