For the last few months I’ve been thinking about two things with regard to the future of computing:

  1. All future startups/technologies will be built on scalable clustering technology (or they will fail).
  2. All existing Open Source falls short. Most existing databases (MySQL) and filesystems (XFS, ReiserFS, etC) are centralized (more on this later).

Why? My reasoning boils down to this:

  1. Consumers demand fault-tolerant software. No more getting “a massage” and kicking out hundreds of thousands of users.
  2. Consumers demand sub-second response time. They’ve been spoiled by Google.
  3. Most existing services are already consuming terabytes of data. Next-gen applications will be consuming petabytes.
  4. Lost profit from even moments of downtime is just unacceptable.

What’s required for next generation clustering applications? I’ll tell you.

Easy application repartitioning

It needs to be amazingly easy to add new hardware. Your operations department won’t scale when you have to manually manage thousands of servers. You need to have machines fail and leave them offline. New machines should be as easy install as that new lamp in your apartment – just plug it in.

Ideally the cluster will detect the presence of a new node, check that it’s healthy, repartition the cluster (move data over to it), and then bring it live. All of this needs to happen without user intervention.

Fault tolerant operation

No node in the system can fail while taking anything offline. Not only that but it’s failure can’t hurt any aspect of your current runtime performance. It has to be a transparent event that’s easily resolved. Your chain is only as strong as its weakest link.

Scalable and fast storage

Terabyte storage needs to be easy to setup and use. This doesn’t mean buying large expensive RAID arrays or storage area networks. This means buying cheap commodity boxes on gigabit (or faster) commodity networking hardware with single disks. Want faster IO? Buy more storage nodes! And since they’re fault tolerant it won’t matter that they’re cheap. You’ll also gain additional storage so this is a win/win.

Parallel and fast query execution

You need to be able to run queries within your cluster and return results to users fast. How fast? Less than 200ms. That’s not too long. That’s a blink of an eye. But it’s what Web 2.0 users are expecting.

Not so fast you say! All of this is hard!

You’d be right. It’s very hard.

Open Source is letting us down. The only main players right now are the big boys (Google, Yahoo, Microsoft, etc). It’s important to note that they’re running mostly proprietary software.

We’re starting to catch up. Google’s MapReduce is being implemented as Open Source. The Nutch Distributed Filesystem also seems promising. I actually developed a distributed filesystem while at Rojo called – RojoFS. I might be spending some time to make sure most of my ideas make it into NDFS (or rebuilding it from scratch).

Open Source also has some decent distributed memory caching. Memcached being a good example.

The database world is still falling down though. MySQL cluster is the best out there (but still early and not perfect for everyone). A lot of people have built some really impressive scalable MySQL replication installs but they’re still not what we need. They’re expensive and difficult to maintain.

If you’re building your Web 2.0 product with Web 1.0 technology your just going to fall on your face.

I’m still confident that Open Source will win the day. It always has. It always will. Sometimes it just has a little catching up to do.



  1. I agree completely. I’ve been kicking around several projects … but for all of them I need to build the cluster first. I don’t really waht to get into the business of distributed data storage and processing. I just want to write cool apps that are scalable.

  2. Yep, this is right on, and while you mention memcached, I think that people generally underappreciate (or at least under-report) how much work Brad F. at LiveJournal/SixApart has contributed to the community in this area. Perlbal is another invaluable capability, and Brad’s team has literally provided roadmaps on building such scalable systems at MySQL conferences..

  3. Dick..

    yeah.. you’re right. LiveJournal has done a bunch of cool stuff. Truth be told though must people who make significant contributions are ignored.

    http://en.wikipedia.org/wiki/Stigler‘s_law_of_eponymy

    Scott.

    Yeah.. you could use user mode linux to test your cluster software and then buy the hardware…. you wouldn’t notice any performance advantage but you could develop it and then deploy it later.

  1. 1 Productivity Hacks

    Future of Computing? Kevin thinks it is in Clusters.

    This morning I ran across Kevin Burton’s Feed Blog. Kevin was the author of NewsMonster and a co-founder of Rojo Networks (he is doing his own thing right now). He had some interesting thoughts about the future of computing. His…