I’ve now had about 24 hours to play with the Mtron SSDs and some time to benchmark them.
The good news is that the benchmarks look really solid. The drive is very competitive in terms of performance. I’m seeing about 100MB/s sequential read throughput and 80MB/s sequential write throughput.
The bad news is that they can only do about 180 random writes per second. Here are the raw performance numbers from Mtron’s data sheet:
I spent a lot of time reviewing this drive and didn’t notice this before.
The Battleship Mtron review went over this as well but didn’t spend much time on it:
Although they do perform astounding in random read operation, random write is still very sub-par on flash technology. Even though we are not benchmarking random write IOP’s I will give you some quick insight. Write performance is not yet a perfect and refined process using NAND flash and you will not have a drive that is going to write file operations as well as a high end U320 15K SCSI or SATA 10K setup. There is a company that I have been talking with directly about this NAND flash write issue called EasyCo in PA, USA. They are working on a process called MFT technology and they offer a simple MFT driver that is claiming to increase random write IOP’s on a single drive up to 15,000 IOP’s. Doug Dumitru had explained to me this technology will take your standard Mtron 16GB Professional drive and turn it into an enterprise behemoth.
I spent some time to see what EasyCo was up to and came across their Managed Flash Technology:
Managed Flash Technology (MFT) is a patent pending invention that accelerates the random write performance of both Flash Disks and Hard Disks by as much as a thousand fold.
It does this by converting random writes into chained linear writes. These writes are then done at the composite linear write speed of all the drives present in the file volume, subject only to the bandwidth limits of the disk control mechanism. In practice, even with as few as three drives present, this can result in the writing of as many as 75,000 4-kilobyte blocks a second.
As a result, MFT can dramatically improve the real-time performance of asymmetric storage devices such as Flash disks by making reads and writes symmetrical. Here, flash disk performance is typically improved 10 to 30 times, making some of these 60 times as fast as the fastest hard disk. Finally, it is possible to make clusters of as few as 20 flash drives run collectively as fast as RAM does but with a much larger storage space than RAM can practically have.
The question is: what are they doing to get such a substantial performance gain?
Here’s what I think is happening.
From what I’ve read, they take a normal Mtron drive and install a Linux kernel module which they use to interface with the drive. They then use a normal write-ahead log, keep data in memory (probably something like a 500M buffer), and maintain a binary tree of the block offsets. When the buffer fills they take the data in memory, sort it by offset, and apply the buffer to disk sequentially.
If the box crashes they have an on-disk log that they replay, probably when the drive is first mounted.
Basically, a flash-aware write-ahead log.
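To make the idea concrete, here’s a minimal sketch of that scheme. This is my guess at the approach, not EasyCo’s actual code: buffer random writes in memory keyed by offset, append each write to an on-disk log first so a crash can be replayed, and when the buffer fills, flush everything sorted by offset so the device only ever sees sequential writes. The class name and buffer size are made up for illustration.

```python
import os

class BufferedWriter:
    """Sketch of a flash-aware write-ahead log: random writes are
    buffered in memory, appended to a recovery log, and flushed to
    the data file in offset order so the device sees one sequential pass."""

    def __init__(self, data_path, log_path, buffer_limit=64):
        self.data_path = data_path
        self.log = open(log_path, "ab")
        self.buffer = {}            # block offset -> latest data for that block
        self.buffer_limit = buffer_limit

    def write(self, offset, data):
        # Append to the on-disk log first so a crash can be replayed later.
        record = offset.to_bytes(8, "big") + len(data).to_bytes(4, "big") + data
        self.log.write(record)
        self.log.flush()
        self.buffer[offset] = data  # later writes to the same block win
        if len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self):
        # Sort by offset and apply the whole buffer in one sequential sweep.
        mode = "r+b" if os.path.exists(self.data_path) else "wb"
        with open(self.data_path, mode) as f:
            for offset in sorted(self.buffer):
                f.seek(offset)
                f.write(self.buffer[offset])
        self.buffer.clear()
```

The key point is the `sorted(self.buffer)` in `flush()`: that one line is what turns 180 random writes per second into a sequential stream the drive can absorb at 80MB/s.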
Fortunately, InnoDB has a write-ahead log internally, so this should save us from needing to run a custom kernel module. Any database with a write-ahead log should be more than competitive.
I wrote a simple benchmarking utility (see Figure 1 below) to simulate an InnoDB box performing thousands of random reads and one sequential write.
The benchmark consists of 3500 dd processes running in the background, reading from the SSD and writing to /dev/null. I then have one sequential write running in the foreground, writing about 5G of data to the SSD.
The HDD holds up really well compared to the SSD, which should have an unfair advantage. So much so that I think the Linux scheduler is interfering with my benchmark. I think what’s happening is that the first few dd’s start reading in parallel and block the remaining processes. This continues with 5-10 concurrent readers until the entire chain of 3500 completes.
I’m going to rewrite the benchmark to create one large 10G file and randomly read 10G of data from locations scattered across it.
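The planned rewrite could look something like this sketch: seek to random offsets within one large pre-created file and read until the target volume is reached. Note that on a box with lots of RAM the page cache would absorb repeat reads, so in practice you’d want O_DIRECT or a file much larger than memory; this sketch ignores that, and the function name and sizes are mine.

```python
import os
import random
import time

def random_read_benchmark(path, total_bytes, block=4096):
    # Read `total_bytes` worth of `block`-sized chunks from random
    # offsets in one large file; return elapsed wall-clock seconds.
    size = os.path.getsize(path)
    start = time.time()
    with open(path, "rb") as f:
        done = 0
        while done < total_bytes:
            f.seek(random.randrange(0, size - block))
            f.read(block)
            done += block
    return time.time() - start
```

Against the real drives you’d pre-create the 10G file, then call `random_read_benchmark(path, 10 * 2**30)` and compare elapsed times between the SSD and the HDD.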
As you can see, the SSD is fast, but it’s only about 2.5x faster than the HDD. I’d expect it to be about 20-40x faster.
Figure 1. Performance of SSD vs HDD (measured in seconds)