SATA is not SCSI… or is it?

One further comment on The sorry state of open source today. I did not want to include it in my previous entry, as I felt it would distract from my main point: the inaccuracies in the author’s discussion of FreeBSD.

On page 19, Béranger discusses problems with the disk drivers in Linux 2.6.20. These problems are real (though hopefully transient), and I have been bitten by them myself: on one machine, Ubuntu’s linux-image-2.6.20-14-386 would not recognize the disks at all. I could boot an older kernel, but then of course nvidia-glx, which had been updated to match the newer, non-working kernel, would not load.

Where Béranger stumbles is where he asserts—or implies—that there are fundamental differences between PATA, SATA and SCSI, and that it therefore does not make sense to use similar names (/dev/sdX) for them all.


The sorry state of The Jem Report

Jem Matzan’s The Jem Report is running a so-called editorial by Radu-Cristian Fotescu (a.k.a. Béranger) titled The sorry state of open source today. I say “so-called” because it is more of a rant than an editorial: 26 pages long and not entirely coherent.

I won’t waste your time with a point-by-point rebuttal of this piece, not least because most of what he writes is pure opinion and interpretation. I don’t necessarily agree with it—I find him a little too radical and a little too confrontational—but he’s entitled to it.

(I do agree with his views on the differences between the GPL and the BSD license, but that’s neither here nor there.)

What I take exception to are factual errors in his discussion of *BSD, and specifically of FreeBSD.


On the longevity of hard drives

By now, the fact that disk drives fail a lot more often than the vendors say they should, and for different reasons than we used to think, should be old news. However, it’s been on my mind a lot lately, as in the last three months I’ve lost two drives, and a third is starting to fail. Coincidentally, all three are Maxtor DiamondMax 10 drives, one 150 GB and two 300 GB, all SATA150. They are all well within their design life (and warranty); they have all operated well within their environmental limits; there is no reason why three out of the six Maxtor drives I have should fail in such rapid succession, while all my Western Digital drives – some of them twice as old – are fine. In fact, I’ve never lost a Western Digital drive; on the other hand, all the IBM drives I’ve had are toast, as is the only Seagate I ever bought, a Barracuda that was pretty much DOA, though I misidentified the problem and let the disk lie on a shelf while the warranty ran out.

Dead Disk Update

In the end, I only lost two sectors: one in the middle of an ISO file in my home directory, another somewhere in my DocumentRoot. Both files were easily recoverable. The affected file systems are now safely parked on a mirror while I get the new array up and running.

ZFS proved uncooperative at first: I had trouble getting a consistent and up-to-date set of patches, and every time I tried to create a file system and copy data over, it would panic. Pawel and I tracked it down to zfs_reclaim(), and finally figured out that it was caused by zfs_reclaim() calling vdropl() directly instead of vdrop(). The thing is that vdropl() is actually private to vfs_subr.c, and declared static; the code just happened to build and work because ZFS was being built with most warnings turned off, and most testers didn’t set CPUTYPE. Giving vdropl() external linkage and a prototype in vnode.h put an end to the kernel panics.

Oh, and by the way, bunnies are cute.

What we have here is a lack of redundancy

What kind of idiot stores his home directory and his entire web content on a striped set of four disks, with no redundancy?

Me, that’s who. You’d think I would have learned from January’s fiasco that disk crashes don’t just happen to other people. But back in January, I was (relatively) lucky, as the disk that crashed was part of a mirrored set. Not so this time.

The file server actually crashes when trying to read from the faulty disk, so I had to get creative and figure out a way of not only copying it to a healthy disk over the network, but doing so in a way that allows me to recover from crashes and continue where I left off. The result is ndr, the Network-assisted Disk Recovery tool.
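
To give a rough idea of what “continue where I left off” can look like, here is a minimal sketch of a checkpointed block copy. It is not ndr’s code and says nothing about how ndr works internally: the function name resumable_copy, the state file and the block size are all made up for illustration, and it copies between two local files rather than over the network.

```python
import os

BLOCK_SIZE = 64 * 1024  # hypothetical block size, chosen arbitrarily


def resumable_copy(src_path, dst_path, state_path, block_size=BLOCK_SIZE):
    """Copy src to dst block by block, recording progress in state_path.

    If the copy is interrupted (for instance because the machine crashes),
    calling the function again resumes at the last recorded offset.
    Returns the final offset, i.e. the number of bytes accounted for.
    """
    # Pick up the offset left behind by a previous, interrupted run.
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as state:
            text = state.read().strip()
            offset = int(text) if text else 0

    # Open the destination without truncating it if it already exists.
    mode = "r+b" if os.path.exists(dst_path) else "w+b"
    with open(src_path, "rb") as src, open(dst_path, mode) as dst:
        src.seek(offset)
        dst.seek(offset)
        while True:
            block = src.read(block_size)
            if not block:
                break
            dst.write(block)
            # Make sure the data is on disk before the checkpoint says so.
            dst.flush()
            os.fsync(dst.fileno())
            offset += len(block)
            # Write the checkpoint to a temporary file and rename it, so a
            # crash never leaves a half-written state file behind.
            tmp_path = state_path + ".tmp"
            with open(tmp_path, "w") as state:
                state.write(str(offset))
            os.replace(tmp_path, state_path)
    return offset
```

Syncing after every block is slow, but it keeps the recorded offset from ever running ahead of the data actually written. A real recovery tool would also have to cope with read errors on individual sectors, which this sketch ignores.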