What kind of idiot stores his home directory and his entire web content on a striped set of four disks, with no redundancy?
Me, that’s who. You’d think I would have learned from January’s fiasco that disk crashes don’t just happen to other people. But back in January, I was (relatively) lucky, as the disk that crashed was part of a mirrored set. Not so this time.
The file server actually crashes when trying to read from the faulty disk, so I had to get creative and figure out a way not only to copy its contents to a healthy disk over the network, but to do so in a manner that lets me recover from crashes and pick up where I left off. The result is ndr, the Network-assisted Disk Recovery tool.
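In broad strokes, the recovery loop is simple: read the raw device in fixed-size blocks, pad over unreadable blocks with zeros so the image stays aligned, stream everything to a receiver on a healthy machine, and persist the current offset so a crash costs at most one block. Here is a minimal sketch of that idea in Python; it is not the actual ndr code, and the device path, state file, host, port and block size are all placeholder assumptions.

    #!/usr/bin/env python3
    # Minimal sketch of a resumable raw-disk copy over TCP.
    # Not the real ndr; all names and constants below are assumptions.
    import os
    import socket

    DEVICE = "/dev/da0"            # raw device to salvage (placeholder)
    STATE = "/var/tmp/ndr.state"   # where the resume offset is persisted (placeholder)
    HOST, PORT = "backuphost", 9000
    BLOCK = 1 << 20                # 1 MiB per read

    def load_offset():
        try:
            with open(STATE) as f:
                return int(f.read().strip())
        except (FileNotFoundError, ValueError):
            return 0

    def save_offset(offset):
        with open(STATE, "w") as f:
            f.write(str(offset))

    def main():
        offset = load_offset()
        fd = os.open(DEVICE, os.O_RDONLY)
        with socket.create_connection((HOST, PORT)) as sock:
            # Tell the receiver where we are resuming so it can seek accordingly.
            sock.sendall(offset.to_bytes(8, "big"))
            while True:
                try:
                    data = os.pread(fd, BLOCK, offset)
                except OSError:
                    # Unreadable block: substitute zeros so the image stays aligned.
                    data = b"\0" * BLOCK
                if not data:
                    break
                sock.sendall(data)
                offset += len(data)
                save_offset(offset)  # a crash or reboot costs at most one block
        os.close(fd)

    if __name__ == "__main__":
        main()

The receiving end only needs to seek to the announced offset in the destination image and append whatever arrives.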
In case you’re wondering, dumping the contents of four 300 GB disks over a 100 Mbps network takes a long time. Do the math: with 100% network saturation and no protocol overhead, it takes 300,000,000,000 * 8 / 100,000,000 = 24,000 seconds, or six hours and forty minutes. With the 30 Mbps throughput I’m actually getting, it’s closer to twenty hours per disk. As I write these words, ndr has gotten 40% through the second disk…
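If you want to redo that arithmetic yourself, a few lines of Python reproduce both figures from the 300 GB disk size and the two link speeds:

    # Back-of-the-envelope transfer time for one 300 GB disk at a given link speed.
    disk_bytes = 300_000_000_000
    for mbps in (100, 30):
        seconds = disk_bytes * 8 / (mbps * 1_000_000)
        print(f"{mbps} Mbps: {seconds:,.0f} s, about {seconds / 3600:.1f} hours")
    # 100 Mbps -> 24,000 s (6.7 h); 30 Mbps -> 80,000 s (22.2 h)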
The good news is that this leaves me a lot of time to think of improvements to ndr, such as moving from a process model to a thread model and adding a curses-based interface on the server side. It also leaves me a lot of time to ponder backup strategies and how to make sure the replacement array will survive a disk failure without sacrificing too much storage capacity.
Currently, I’m leaning towards ZFS, which is now available on FreeBSD for the courageous. When suitably configured, it can recover from the loss of any one disk in a pool, without the consistency issues you get with software RAID level 5: if you lose power in the so-called “write hole”, the window where a data block has been updated but the parity block hasn’t yet (or vice versa), there is no way to tell which block needs to be reconstructed, and you lose the entire stripe. ZFS also keeps checksums of everything on disk, allowing it to detect bit rot which, with other schemes, would either go undetected or result in a degraded array with (again) no way of knowing which block is intact and which has been corrupted.
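To make the difference concrete, here is a toy illustration in plain Python; it has nothing to do with ZFS’s actual on-disk machinery, just XOR parity and a hash per block.

    from functools import reduce
    from hashlib import sha256

    def parity(blocks):
        # XOR parity across a stripe, as in RAID 5
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

    # The write hole: data written, parity not (or vice versa), then the power fails.
    data = [b"AAAA", b"BBBB", b"CCCC"]
    p = parity(data)
    data[1] = b"bbbb"            # data block rewritten, parity block still old
    print(parity(data) == p)     # False: the stripe is inconsistent, but nothing
                                 # says whether data[1] or p is the stale block

    # Per-block checksums: the corrupted block identifies itself.
    data = [b"AAAA", b"BBBB", b"CCCC"]
    sums = [sha256(b).digest() for b in data]
    data[2] = b"CxCC"            # silent corruption ("bit rot")
    bad = [i for i, b in enumerate(data) if sha256(b).digest() != sums[i]]
    print(bad)                   # [2]: only that block needs to be reconstructed

In the first case all you know is that the stripe is inconsistent; in the second, the bad block identifies itself, and ZFS can rebuild just that block from the pool’s redundancy.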
ZFS is also fully and openly documented, which is cool. I have long wanted to write a JFS implementation for FreeBSD, but as far as I can tell, the on-disk layout is documented only in a) header files from the GPLed Linux implementation and b) IBM documentation CDs that come with AIX. Neither of these is an option if you plan to reimplement JFS from scratch under a BSD license.
All of this is academic, though, until and unless I finish recovering the contents of the failed array (which at this rate is going to take several more days) and actually have something to store on the new one…