I’ve been playing with WD Green disks, trying to solve the 4,096-byte sector problem. To summarize, Western Digital have started to move from 512-byte sectors to 4,096-byte sectors in order to reduce overhead and thereby increase the amount of data that can be stored on the same number of platters at the same areal density. These disks (specifically, the EARS and AARS series) emulate 512-byte sectors for compatibility with older BIOSes and operating systems, but the problem is that they report 512-byte logical and physical sectors instead of 512-byte logical and 4,096-byte physical sectors.
If the length of a write operation is not a multiple of 4,096, or it does not begin at an address divisible by 4,096, either the beginning or the end of the operation, or both, will cover only part of a sector. This requires the disk to do a read-modify-write operation, meaning that it has to read a complete 4,096-byte sector, update parts of it, and write it back. This is extremely inefficient, as I will demonstrate later.
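To make the arithmetic concrete, here is a minimal sketch of the alignment check (illustration code, not anything from an actual driver): a write avoids the read-modify-write penalty only if both its starting offset and its length are multiples of the physical sector size.

```c
#include <stdbool.h>
#include <stdio.h>

#define PHYS_SECTOR 4096 /* the drive's real physical sector size */

/*
 * A write touches only whole physical sectors if both its starting
 * offset and its length are multiples of the physical sector size;
 * otherwise the drive must read-modify-write at one or both ends.
 */
static bool
write_is_aligned(unsigned long long offset, unsigned long long length)
{
	return (offset % PHYS_SECTOR == 0 && length % PHYS_SECTOR == 0);
}

int
main(void)
{
	/* 4,096 bytes written at offset 4,096: aligned, no penalty. */
	printf("%d\n", write_is_aligned(4096, 4096));
	/* The same write at offset 512: both ends cover partial sectors. */
	printf("%d\n", write_is_aligned(512, 4096));
	return (0);
}
```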
The reason why this matters so much is subtle. For efficiency reasons, most modern filesystems use on-disk structures of 4,096 bytes or more, so it shouldn’t matter, right? But on PCs, for legacy reasons, the first filesystem on a disk (or rather, the first partition) usually starts at sector 63, and 63 × 512 = 32,256, which is not a multiple of 4,096. This means that every filesystem block straddles two physical sectors, so every write operation will be misaligned.
In most cases, you can work around this by making sure, when you partition a new disk, that the first partition starts on a 4,096-byte boundary – say, sector 64 instead of 63. In addition to that, the WD EARS and AARS disks have a jumper setting that makes the disk offset every read or write operation by exactly one logical sector, so what the computer thinks is logical sector 0 is actually logical sector 1, and what the computer thinks is logical sector 63 is actually logical sector 64. Unfortunately, this means that systems that use the whole disk, starting at address 0, or that already take care to align their writes on 4,096-byte boundaries, are screwed.
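Here is a sketch of the jumper’s effect; the one-sector shift is exactly as described above, though the function names are mine, not WD’s. It shows why the jumper fixes the traditional sector-63 layout while breaking already-aligned consumers:

```c
#include <stdio.h>

#define LOGICAL_SECTOR	512
#define PHYS_SECTOR	4096

/* With the jumper installed, the drive adds one to every LBA it is given. */
static unsigned long long
jumpered_lba(unsigned long long host_lba)
{
	return (host_lba + 1);
}

/* True if a logical sector starts on a physical sector boundary. */
static int
lba_is_aligned(unsigned long long lba)
{
	return (lba * LOGICAL_SECTOR % PHYS_SECTOR == 0);
}

int
main(void)
{
	/* The traditional first-partition start: 63 becomes 64, aligned. */
	printf("63 -> %d\n", lba_is_aligned(jumpered_lba(63)));
	/* A whole-disk consumer starting at 0 now lands on sector 1. */
	printf("0  -> %d\n", lba_is_aligned(jumpered_lba(0)));
	return (0);
}
```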
There is another problem: ZFS. ZFS operates on variable-sized blocks of any power of two between 512 bytes and 128 kilobytes. The only way to prevent ZFS from using block sizes smaller than 4,096 bytes is to build your vdevs from devices which advertise 4,096-byte sectors.
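Internally, ZFS derives a per-vdev alignment shift, called ashift, from the sector size the device advertises, and it never uses blocks smaller than 2^ashift. A quick sketch of that arithmetic (illustration only, not ZFS’s actual code):

```c
#include <stdio.h>

/*
 * ashift is the base-2 logarithm of the device's advertised sector
 * size; ZFS never allocates a block smaller than 1 << ashift.
 */
static unsigned
ashift_for(unsigned sector_size)
{
	unsigned shift = 0;

	while ((1u << shift) < sector_size)
		shift++;
	return (shift);
}

int
main(void)
{
	/* An EARS disk reporting 512-byte sectors: ashift 9, so ZFS
	   will happily issue 512-byte writes. */
	printf("512  -> ashift %u\n", ashift_for(512));
	/* The same disk advertising its true 4,096-byte sectors:
	   ashift 12, and no block smaller than 4,096 bytes. */
	printf("4096 -> ashift %u\n", ashift_for(4096));
	return (0);
}
```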
The ideal solution is either to a) force the disk to advertise its true physical sector size, or b) hack FreeBSD so that it recognizes these disks and treats them as having 4,096-byte sectors.
Regarding the first option, it might be possible to lobby Western Digital to release a firmware upgrade, as they did for the auto-idle issue.
As for the second option, there is an important question: should we do this unconditionally? If we do, then misaligned filesystems on existing disks will become inaccessible. However, one could argue that those filesystems are already essentially unusable due to atrocious performance.
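One way to do it conditionally would be a quirk list keyed on the drive’s identify strings, in the style of the quirk tables FreeBSD’s CAM layer already uses for other misbehaving devices. This is only a sketch; the structure, patterns and names here are illustrative, not the actual kernel API:

```c
#include <fnmatch.h>
#include <stdio.h>

/*
 * Hypothetical quirk entry: match the ATA model string and override
 * the physical sector size the device reports.
 */
struct sector_quirk {
	const char	*model_pattern;	/* fnmatch(3) pattern */
	unsigned	 phys_sector;	/* real physical sector size */
};

static const struct sector_quirk quirks[] = {
	/* WD Green EARS/AARS drives lie about their physical sectors. */
	{ "WDC WD??EARS*", 4096 },
	{ "WDC WD??AARS*", 4096 },
};

/* Return the corrected sector size, or the reported one if no quirk. */
static unsigned
quirked_sector_size(const char *model, unsigned reported)
{
	for (size_t i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++)
		if (fnmatch(quirks[i].model_pattern, model, 0) == 0)
			return (quirks[i].phys_sector);
	return (reported);
}

int
main(void)
{
	/* A quirked drive: reports 512, but we treat it as 4,096. */
	printf("%u\n", quirked_sector_size("WDC WD15EARS-00Z5B1", 512));
	return (0);
}
```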