In the end, I only lost two sectors: one in the middle of an ISO file in my home directory, another somewhere in my DocumentRoot. Both files were easily recoverable. The affected file systems are now safely parked on a mirror while I get the new array up and running.
ZFS proved uncooperative at first: I had trouble getting a consistent and up-to-date set of patches, and every time I tried to create a file system and copy data over, it would panic. Pawel and I tracked it down to zfs_reclaim(), which called vdropl() directly instead of vdrop(). The catch is that vdropl() is private to vfs_subr.c and declared static; the code just happened to build and work because ZFS was being built with most warnings turned off, and most testers didn’t set CPUTYPE. Giving vdropl() external linkage and a prototype in vnode.h put an end to the kernel panics.
Oh, and by the way, bunnies are cute.
What kind of idiot stores his home directory and his entire web content on a striped set of four disks, with no redundancy?
Me, that’s who. You’d think I would have learned from January’s fiasco that disk crashes don’t just happen to other people. But back in January, I was (relatively) lucky, as the disk that crashed was part of a mirrored set. Not so this time.
The file server actually crashes when trying to read from the faulty disk, so I had to get creative and figure out a way of not only copying it to a healthy disk over the network, but doing so in a way that allows me to recover from crashes and continue where I left off. The result is ndr, the Network-assisted Disk Recovery tool. Continue reading “What we have here is a lack of redundancy” »
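The "continue where I left off" idea can be illustrated with a minimal sketch. This is not ndr itself: it leaves out the network part entirely, and the function name `resumable_copy`, the state file, and the 512-byte block size are all assumptions made for illustration. It copies a disk image block by block, records the next offset after each block, and on a rerun picks up from the recorded offset. It assumes the source is a regular file such as a disk image; `os.path.getsize()` does not report a useful size for a raw device node.

```python
import os


def resumable_copy(src_path, dst_path, state_path, block_size=512):
    """Copy src to dst block by block; a rerun resumes from state_path.

    Returns the offsets of blocks that could not be read (zero-filled in dst).
    """
    # Resume from the last recorded offset, or start at 0 on the first run.
    try:
        with open(state_path) as f:
            offset = int(f.read().strip() or 0)
    except FileNotFoundError:
        offset = 0

    size = os.path.getsize(src_path)
    bad_blocks = []
    # Open the destination without truncating it if it already exists.
    mode = "r+b" if os.path.exists(dst_path) else "w+b"

    with open(src_path, "rb") as src, open(dst_path, mode) as dst:
        while offset < size:
            want = min(block_size, size - offset)
            src.seek(offset)
            try:
                data = src.read(want)
            except OSError:
                # Unreadable block: note it and write zeros in its place.
                data = b"\0" * want
                bad_blocks.append(offset)
            if not data:
                break  # source ended earlier than expected
            dst.seek(offset)
            dst.write(data)
            dst.flush()
            offset += len(data)
            # Record progress only after the block has been written out.
            with open(state_path, "w") as f:
                f.write(str(offset))

    return bad_blocks
```

In a setup like the one described, the progress record would have to live on the healthy receiving machine rather than on the server that keeps crashing, so that it survives each crash.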
(actual diagnostic message from the on-board JMicron RAID controller on an Asus P5B-V motherboard)
I learned a few lessons on Monday:
- Lesson the first: if you have a RAID 1 array, and one of the drives suddenly drops out of it, do not simply assume it was a software error and reassign it, or you will be very unhappy a few months later when that drive really fails. Continue reading “Detect drives done, no any drive found” »