MySQL 5.1 (which is in beta right now btw) will have an interesting new feature called partitioning which will allow for a bit more scalability in certain situations. You can read more about it here, here, and here.
I don’t see this really being much of an advantage in practice. Most people want to scale their database in terms of transactions per second. You could buy three disks and put a partition on each or you could put the whole thing on a stripped RAID array. Each would have the same IO characteristics (assuming your partitioning is evenly distributed across disks).
The MySQL guys are still engineering like its 1999. They’re thinking disks are the IO bottleneck and all the machines in your cluster are identical.
Follow me here guys. Disk is the new tape, memory is the new disk, and gigabit ethernet is the new SCSI.
Case in point – NDB (or MySQL Cluster). They assume all the nodes in the cluster are the same capacity. This is #8 on the eight fallacies of distributed computing. You can’t even add/remove nodes to NDB without taking the entire cluster offline, doing a full backup restore, and bringing the system up under a new configuration. Are they serious?
I agree that the problem is a hard one but it can be solved… I had to solve it with a distributed filesystem I wrote (and will try to OSS soon).
What I really want is a cluster where the machines are just nodes providing storage. My schema sits at a layer above this and my DB layer handles IO requests within the cluster. This isn’t as far fetched as it sounds. The lbool driver provides me with a lot of this now.
The biggest problem I have is that every box is a single system image and contains the whole database. I could partition it within application logic but it seems that if MySQL were to bridge the thinking between partitioning and clustering (and fix the obvious problems with NDB) then we might have something pretty damn sweet.