Progress is being made, but my gut tells me that this thing won’t be ready for real-world use for another year or two. I’ll stick to MySQL in Spinn3r, at least for the short term.
Not that people aren’t trying to move it forward as fast as possible.
Yahoo has a huge cluster that they’re using to find bugs. Facebook has put 1.3B rows into HBase as a side project, peaking at 25k writes per second. That’s not a lot of data. The point was just to stress the system to see if it would break, and they succeeded.
There are also core design flaws. For example, they use threaded IO, which means that every client connection requires a dedicated thread. This just won’t scale. Most VMs will dump core at about 2k threads. Java threads require about 128k of memory each. Need 1k threads? Better allocate 128M of memory.
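The arithmetic behind that estimate is easy to check. Below is a minimal sketch that takes the roughly 128k-per-thread figure above at face value; the real per-thread cost depends on the VM and its stack-size settings, and the function name `thread_stack_mib` is made up for this illustration.

```python
KIB = 1024
MIB = 1024 * 1024

def thread_stack_mib(num_threads, stack_kib=128):
    """Approximate memory reserved for thread stacks, in MiB.

    stack_kib=128 is the per-thread figure quoted in the post,
    not a measured value.
    """
    return num_threads * stack_kib * KIB / MIB

# One thread per connection: memory grows linearly with connections.
for n in (1024, 2048, 10_000):
    print(f"{n:>6} threads -> ~{thread_stack_mib(n):.0f} MiB of stack")
```

At 10k connections the same model lands above a gigabyte of stack alone, before any per-connection buffers are counted.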
There’s no reason a server shouldn’t be able to handle 10k concurrent connections.
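For contrast with thread-per-connection, here is a rough sketch of the event-driven alternative: one thread multiplexing many sockets through the operating system's readiness notification. It uses Python's standard `selectors` module and a toy echo protocol purely for illustration; it says nothing about how HBase itself is or should be implemented, and the helper names `make_listener`, `run_once`, and `serve_forever` are invented here.

```python
import selectors
import socket

def make_listener(host="127.0.0.1", port=0):
    """Create a non-blocking listening socket (port 0 = pick a free port)."""
    lsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    lsock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    lsock.bind((host, port))
    lsock.listen()
    lsock.setblocking(False)
    return lsock

def run_once(sel, lsock, timeout=1.0):
    """Service every socket that is ready right now, all from one thread."""
    for key, _events in sel.select(timeout):
        if key.fileobj is lsock:
            # New connection: register it instead of spawning a thread.
            conn, _addr = lsock.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            conn = key.fileobj
            data = conn.recv(4096)
            if data:
                # Toy echo. A real server would queue writes and wait for
                # EVENT_WRITE instead of assuming the send buffer has room.
                conn.sendall(data)
            else:
                # Empty read means the peer closed the connection.
                sel.unregister(conn)
                conn.close()

def serve_forever(port=9000):
    """Run the event loop until interrupted (port 9000 is arbitrary)."""
    sel = selectors.DefaultSelector()
    lsock = make_listener(port=port)
    sel.register(lsock, selectors.EVENT_READ)
    while True:
        run_once(sel, lsock)
```

Each connection here costs a file descriptor and a small registration entry rather than a whole thread stack, which is why this style is the usual starting point for servers aiming at 10k concurrent connections.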
Rapleaf is apparently playing with it as well. Sounds like they have about 20-100G of data dumped into HBase. They’ve also built out a REST interface on top of it.
Other choices here would be KFS or federated MySQL. I’d recommend MySQL if you want to ship this year.
There’s still a lot of progress to be made here. Myself? I want a Bigtable clone written in C from the bottom up. I want it to use async IO, and I want it to be aware of cluster topology.
Also, where does this leave MySQL? I’m starting to see more and more Open Source database mind share drift into the GFS/Bigtable realm, largely because everyone knows that MySQL doesn’t scale unless you have a federated backend. The problem is that there isn’t an Open Source federated backend to choose from.
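As a rough illustration of what a federated backend has to do at minimum, here is a sketch of key-based routing. It assumes "federated" means splitting rows across independent MySQL instances by key. The host names and the `shard_for` helper are hypothetical, and the code only picks a shard without opening any database connection.

```python
import hashlib

# Hypothetical MySQL instances; replace with real hosts.
SHARDS = [
    "db0.example.internal",
    "db1.example.internal",
    "db2.example.internal",
    "db3.example.internal",
]

def shard_for(key, shards=SHARDS):
    """Map a row key to one shard with a stable hash.

    hashlib is used instead of the built-in hash() so that the mapping
    is the same across processes and restarts.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]

for key in ("user:1", "user:2", "feed:abc"):
    print(key, "->", shard_for(key))
```

Plain modulo hashing like this remaps most keys whenever the number of shards changes, so rebalancing, cross-shard queries, and failover are the parts a real federated layer would still have to solve.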