I own a MacBook Air. I am a cool file systems person, so naturally I would buy the version with the SSD rather than the boring old crappy disk, right?
SSD – the kind built out of flash memory – is at present far less reliable than old-fashioned spinning disks. My direct personal experience, and that of my business colleagues and friends, is that flash-based storage suffers silent data corruption at an extraordinary rate. Why is this? A few reasons:
Flash-based storage has a limited number of write cycles before it wears out and stops storing the correct data. Supposedly this is not a problem because (1) current flash can do millions of erase/write cycles, and (2) SSDs implement hardware wear-leveling to spread out writes evenly over all the cells.
Let’s start with hardware wear-leveling. Basically, nearly all practical implementations of it suck. You’d imagine that it would spread out writes over all the blocks in the drive, only rewriting any particular block after every other block has been written. But I’ve heard from experts several times that hardware wear-leveling can be as dumb as a ring buffer of 12 blocks; each time you write a block, it pulls something out of the queue and sticks the old block in. If you only write one block over and over, this means that writes will be spread out over a staggering 12 blocks! My direct experience working with corrupted flash with built-in wear-leveling is that corruption was centered around frequently written blocks (with interesting patterns resulting from the interleaving of blocks from different erase blocks). As a file systems person, I know what it takes to do high-quality wear-leveling: it’s called a log-structured file system and they are non-trivial pieces of software. Your average consumer SSD is not going to have sufficient hardware to implement even a half-assed log-structured file system, so clearly it’s going to be a lot stupider than that.
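To see just how bad a 12-block ring buffer is, here's a toy simulation of that scheme. This is purely an illustration of the "dumb" design described above, not any real controller's firmware, and the block counts and ring size are assumptions for the sake of the example:

```python
import collections

def simulate_ring_wear_leveling(num_physical_blocks, ring_size, writes):
    """Toy model of the dumb wear-leveling described above: a small
    ring of spare physical blocks.  Each write of logical block 0
    takes the next spare, and the physical block it used to live in
    goes back on the ring.  Hypothetical, not any real FTL."""
    spares = collections.deque(range(ring_size))      # the 12-block ring
    mapping = {0: ring_size}                          # logical block 0 -> physical
    erase_counts = [0] * num_physical_blocks
    for _ in range(writes):
        new_phys = spares.popleft()                   # grab the next spare
        spares.append(mapping[0])                     # recycle the old block
        mapping[0] = new_phys
        erase_counts[new_phys] += 1
    return erase_counts

counts = simulate_ring_wear_leveling(1000, 12, 120_000)
worn = sum(1 for c in counts if c > 0)
print(worn, max(counts))  # 120,000 writes land on just 13 of 1000 blocks
```

Rewriting one logical block 120,000 times hammers the same 13 physical blocks roughly 9,000 times each while the other 987 blocks sit idle, which is exactly the pattern of corruption clustered around frequently written blocks.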
How about those millions of write cycles? Let’s start with a higher-level observation: Disks have been around for decades. We have decades of experience with accelerated aging of disks to predict long-term failure rates – you stick ’em in a hot room and vibrate them around and write lots of data to them, and the failure rate under these conditions can be successfully extrapolated to long-term failure rates. When you buy a disk, you can reasonably expect that it will store your data correctly, and – almost as important – that if it doesn’t store your data correctly, you’ll know because you either get an IO error or because it makes a loud clunking noise and your computer hangs.
When it comes to flash, manufacturers are handicapped when predicting long-term failure rates for a number of reasons. First, it’s hard to extrapolate failure rates under stress tests to long-term failure rates. In particular, the failure mode of a flash cell is that the charge leaks out of the cell – slowly, over time. Stress testing by writing to the cell a lot and then reading the data back is not going to test this situation. In general, we simply don’t have a lot of experience with testing flash and it will take a few years to build it up. Second, manufacturers of the device using flash are constantly switching suppliers for the actual flash memory itself. When it comes to consumer-grade flash, manufacturers have strong pressure to drive the price down and very little pressure in the direction of quality. Frequently, the manufacturer won’t have any idea where the flash chips came from for a particular device based purely on the model number because it used more than one supplier for that model. Third, failure rates will be heavily dependent on the pattern of both reads and writes, so a device that checks out fine under one test pattern will fail miserably under another load – not generally a characteristic of disks. Another fascinating aspect of flash-based SSD is that you don’t seem to get any report of checksum failures on corruption – at least I haven’t seen one in the three confirmed cases of flash corruption I’ve seen. I don’t know if this is because the device isn’t reporting it or because the OS driver isn’t listening for it, but it’s what happens.
The exception to these observations is any flash device that costs a lot of money – commercial-grade flash as opposed to consumer-grade flash. Disks vary in quality too, but it’s usually much more along the performance axis than the reliability axis. Speaking of performance, the performance of flash-based SSD has not been the huge leap over spinning disk that we expected. At present, many disks still have higher bandwidth than many SSDs. SSDs still have a performance penalty for non-sequential IO vs. sequential IO – not as high as a disk seek, but enough to drop throughput by a factor of two or so. They also have high overhead for small random writes due to the need to erase the entire erase block the target block is located in. So SSD will beat the pants off of a disk on an uncached random read workload (e.g., system boot-up), but disks have the advantage on streaming reads and both streaming and random writes, generally speaking.
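The small-random-write penalty is easy to put numbers on. Here's a back-of-the-envelope sketch, assuming a controller with no write coalescing and illustrative sizes (128 KB erase block, 4 KB write) – actual erase block sizes and controller behavior vary by device:

```python
def naive_write_amplification(erase_block_kb, write_kb):
    """Worst-case write amplification when every small random write
    forces a read-modify-erase-rewrite of the whole erase block it
    lands in.  Sizes are illustrative assumptions, not measurements."""
    return erase_block_kb / write_kb

# A 4 KB random write to a device with 128 KB erase blocks:
print(naive_write_amplification(128, 4))  # 32.0
```

In other words, each 4 KB application write can cost the device 128 KB of internal work – and every one of those amplified writes also counts against the cell's erase/write budget.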
Another purported advantage of flash is lower power usage than disks. This isn’t a straightforward equation; it depends on your usage pattern and the sophistication of the device and OS’s power-saving mode. Disks can not only be spun down; their internal electronics can also be put into power-saving mode – as can elements of the host-side adapter and link. Don’t automatically assume that your SSD will use less power than a sophisticated disk in power-saving mode.
Note that I am explicitly *not* talking about DRAM-based SSD. Those babies are fast, reliable, and very very expensive. If you have one, more power to you.
All of these equations will change as flash-based SSD gets better. Manufacturers will figure out better quality control, hardware wear-leveling will either get better or people will use log-structured file systems at the OS level, performance will improve, prices will drop. But if you are running out and buying storage today, you should buy a disk unless you fit one of the following categories: (a) You have a lot of money and there is some particular feature flash-based SSD gives you that is worth spending that money on, (b) You don’t care much about data integrity, (c) You won’t be doing a lot of writes, (d) You’re using a full-featured log-structured file system with built-in checksums.
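Category (d) deserves a quick illustration of why checksums matter here. The whole problem with silent corruption is that the read succeeds and hands back garbage; a file system with per-block checksums turns that into a detectable error. This is a minimal sketch of the idea – a toy block store, not any real file system's on-disk format:

```python
import zlib

class ChecksummedStore:
    """Toy block store that catches silent corruption the way a
    checksumming file system does: store a CRC alongside each block
    and verify it on every read.  A sketch, not a real FS."""
    def __init__(self):
        self.blocks = {}  # block number -> (crc, data)

    def write(self, n, data):
        self.blocks[n] = (zlib.crc32(data), data)

    def read(self, n):
        crc, data = self.blocks[n]
        if zlib.crc32(data) != crc:
            raise IOError(f"silent corruption detected in block {n}")
        return data

store = ChecksummedStore()
store.write(0, b"important data")
# Simulate a flash cell leaking charge and flipping bits underneath us:
crc, _ = store.blocks[0]
store.blocks[0] = (crc, b"importANT data")
try:
    store.read(0)
except IOError as e:
    print(e)  # corruption is reported instead of returned silently
```

Without the checksum, the read would quietly return the mangled data – which matches my experience of flash corruption arriving with no error report at all.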
Let’s come back to the MacBook Air. Supposedly, you would buy the SSD version because you want lower power consumption, better shock-resistance, and higher performance. You wouldn’t get the SSD because it costs a hell of a lot of money (about $1000 more) or because it has lower capacity than the hard drive version (I think it’s 40GB for the SSD and 80GB for the disk at present). The reports are that you really only notice the performance difference at boot. I’ve personally dropped my Air about a dozen times, once hard enough to dent a corner, and so far the disk is fine. Laptop disks in general have become quite reliable and it’s been years since I had one fail, even though I’m what they call a “digital nomad” – my laptop is my primary machine and I travel all the time. The battery life on my Air is stellar – 4 or 5 hours – and almost completely dominated by the display brightness. Dialing it up to max approximately halves the battery life.
Overall, I think people buy the SSD-based Air because it’s cool and new (a perfectly good reason) and because if it costs more, it must be better, right? It’s also a status symbol. My personal recommendation: buy the disk version of the Air. If you did buy the SSD version, back up frequently.
Postscript: Yes, this analysis is based on anecdotal evidence and personal experience, but I can’t afford the time to do real research unless someone pays me to. If you know someone who will, send me email!