Benchmarking Advanced Format drives

Important: due to a bug in my benchmark program, the tps numbers in this post are incorrect. See here for the correct numbers.

In the previous post, I discussed Western Digital’s “Advanced Format” drives and the problems caused by their misreporting their real, physical sector size.

I wrote a benchmark utility to demonstrate the performance penalty of unaligned accesses and to uncover a drive’s physical sector size. It writes blocks of zeroes of varying size at regular intervals: for each block size, it writes a total of 128 MB at intervals of four times the block size, at an offset that varies from 512 bytes up to half of the block size.

With the default settings, the first pass will write 131,072 1,024-byte blocks at n × 4,096, and the second pass will do the same at n × 4,096 + 512. The third, fourth and fifth passes will write 65,536 2,048-byte blocks each at n × 8,192, n × 8,192 + 512 and n × 8,192 + 1,024. It will make four more passes with 4,096-byte blocks and five with 8,192-byte blocks.
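
To make the pass structure concrete, here is a minimal sketch of the loop described above. It is not the actual phybs source: the real program takes options and times each pass to report tps and kBps, while this sketch simply hard-codes the block sizes described in this post and writes to whatever device you name on the command line. The name sketch and the exact loop structure are mine, not taken from phybs.

    /*
     * Minimal sketch of the pass structure -- not the actual phybs source.
     * For each block size, and for each offset from 0 up to half the block
     * size, write 128 MB of zeroes in blocks spaced four block sizes apart.
     * Timing and the tps/kBps report are omitted.
     */
    #include <sys/types.h>
    #include <err.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define TOTAL   (128UL * 1024 * 1024)   /* 128 MB written per pass */
    #define MINBS   1024UL                  /* smallest block size */
    #define MAXBS   8192UL                  /* largest block size */

    int
    main(int argc, char *argv[])
    {
        static char buf[MAXBS];             /* a block of zeroes */
        unsigned long bs, offset, count, i;
        int fd;

        if (argc != 2)
            errx(1, "usage: sketch <device>");
        if ((fd = open(argv[1], O_WRONLY)) < 0)
            err(1, "open");
        for (bs = MINBS; bs <= MAXBS; bs *= 2) {
            count = TOTAL / bs;
            /* offsets 0, 512, 1024, ... up to half the block size */
            for (offset = 0; offset <= bs / 2;
                offset = offset ? offset * 2 : 512) {
                for (i = 0; i < count; i++)
                    if (pwrite(fd, buf, bs,
                        (off_t)(i * bs * 4 + offset)) != (ssize_t)bs)
                        err(1, "pwrite");
                printf("%lu blocks of %lu bytes at offset %lu\n",
                    count, bs, offset);
            }
        }
        close(fd);
        return (0);
    }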

Here’s the idea: most passes will be very slow (up to half an hour per pass), but when we hit the right block size and alignment, performance will skyrocket; so on—let’s say—a WD20EARS with factory settings, passes 6 (4,096 bytes at offset 0), 10 (8,192 bytes at offset 0) and 14 (8,192 bytes at offset 4,096) should stand out from the crowd. In fact, here are the results for passes 6 through 9:

   count    size  offset    step        msec     tps    kBps
   32768    4096       0   16384       19503     138    6720
   32768    4096     512   16384     1216537       2     107
   32768    4096    1024   16384     1213479       2     108
   32768    4096    2048   16384     1214623       2     107

Pass 6 takes 20 seconds, while passes 7, 8 and 9 take 20 minutes.

Let me rephrase that: properly aligned non-sequential writes are faster than misaligned ones by a factor of sixty.

Sixty. Six zero.

We really, really need to get that fixed somehow.

That’s not the whole story, though. Let’s see how it compares to a 7,200 rpm, 2 TB Hitachi Deskstar (HDS722020ALA330) with 512-byte physical sectors:

   count    size  offset    step        msec     tps    kBps
   32768    4096       0   16384        8803     307   14889
   32768    4096     512   16384        8701     310   15063
   32768    4096    1024   16384        8735     309   15004
   32768    4096    2048   16384        8705     310   15056

The Hitachi blows through the test so fast you don’t even have time to make yourself a cup of coffee, let alone drink it.

This is a 7,200 rpm, 400 GB Caviar SE16 (WD4000AAKS)—more than three years old, so don’t expect too much:

   count    size  offset    step        msec     tps    kBps
   32768    4096       0   16384       21348     126    6139
   32768    4096     512   16384       21674     124    6047
   32768    4096    1024   16384       20799     129    6301
   32768    4096    2048   16384       21031     128    6232

So, about the same as we get from the WD20EARS with aligned writes.

Now, here’s the kicker. The last drive in my test lineup is a WD20EADS—almost the same as the WD20EARS, but with 512-byte sectors and only 32 MB cache (although cache doesn’t mean anything here—I made sure my test program writes enough data to blow through the cache on every pass).

   count    size  offset    step        msec     tps    kBps
   32768    4096       0   16384       22811     118    5745
   32768    4096     512   16384       19552     138    6703
   32768    4096    1024   16384       36945      73    3547
   32768    4096    2048   16384       50102      53    2616

Ouch. It’s not just slow, it’s also very inconsistent. I have no idea what to make of that.

Note 1: I did not mention rotational speed for the WD Green disks, because Western Digital themselves do not specify one; the spec sheet just says “IntelliPower”. Not sure what to make of that, either. Tom’s Hardware contradict themselves, saying in one review that it means 5,400 rpm, and in another that it means the speed varies. Meanwhile, my supplier claims the WD20EARS rotates at 7,200 rpm. Go figure.

Note 2: I also have a 1 TB WD10EARS, but I haven’t tested it yet. I expect it to perform pretty much as well (or as poorly, depending on your perspective) as the WD20EARS.


Update: the results for the WD10EARS are in. Strangely, it is much faster at unaligned writes than the WD20EARS, although it’s a little slower at aligned writes.

   count    size  offset    step        msec     tps    kBps
   32768    4096       0   16384       23105     116    5672
   32768    4096     512   16384       79285      34    1653
   32768    4096    1024   16384       75814      35    1728
   32768    4096    2048   16384       79920      33    1640

A naïve sequential-write benchmark (diskinfo -t) suggests that it’s about 20% slower overall. It is possible that both disks use a striped layout internally, so the WD20EARS gets better results because it has more platters. If that is the case, it should be possible to modify phybs to detect the stripe size.
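
If the drives really do stripe internally, one way to probe for it (purely a sketch of the idea, not something phybs does today) would be to time aligned writes at increasing power-of-two block sizes and look for the point where throughput stops improving: on a striped layout, that knee should fall somewhere near the stripe size. The name stripeprobe and the constants below are made up for illustration, and the interpretation of the knee is speculative.

    /*
     * Speculative sketch: time aligned, widely spaced writes at increasing
     * block sizes and print the throughput of each.  A knee in the curve
     * (throughput no longer improving with block size) would hint at an
     * internal stripe boundary.  Assumes a 64-bit long, as on amd64.
     */
    #include <sys/types.h>
    #include <sys/time.h>
    #include <err.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define TOTAL   (64UL * 1024 * 1024)    /* data written per block size */
    #define MINBS   4096UL                  /* the physical sector size */
    #define MAXBS   (1024UL * 1024)         /* largest candidate stripe size */

    int
    main(int argc, char *argv[])
    {
        struct timeval t0, t1;
        unsigned long bs, count, i;
        long usec;
        char *buf;
        int fd;

        if (argc != 2)
            errx(1, "usage: stripeprobe <device>");
        if ((fd = open(argv[1], O_WRONLY)) < 0)
            err(1, "open");
        if ((buf = calloc(1, MAXBS)) == NULL)
            err(1, "calloc");
        for (bs = MINBS; bs <= MAXBS; bs *= 2) {
            count = TOTAL / bs;
            gettimeofday(&t0, NULL);
            /* like phybs, space the writes four block sizes apart */
            for (i = 0; i < count; i++)
                if (pwrite(fd, buf, bs, (off_t)(i * bs * 4)) != (ssize_t)bs)
                    err(1, "pwrite");
            gettimeofday(&t1, NULL);
            usec = (t1.tv_sec - t0.tv_sec) * 1000000L +
                (t1.tv_usec - t0.tv_usec);
            printf("%8lu bytes: %8ld kB/s\n", bs,
                usec > 0 ? (long)(TOTAL / 1024) * 1000000L / usec : 0);
        }
        free(buf);
        close(fd);
        return (0);
    }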

28 thoughts on “Benchmarking Advanced Format drives”

  1. That depends. If you need large amounts of storage for backup or archival purposes, and performance is not a big issue, they have a better dollar-per-terabyte ratio than pretty much any other disk on the market. For instance, my regular supplier charges NOK 895 for a WD20EARS, NOK 995 for a 2 TB Samsung SpinPoint F3 EcoGreen, and NOK 999 for a 2 TB Hitachi Deskstar 7K2000 (same model that I tested). That’s a 10% price advantage in WD’s favor.

    (BTW, if you or anyone else is in a giving mood, I’d love to test the SpinPoint…)

  2. Very interesting. And shame on them for reporting the wrong blocksize!

    I just bought an SSD to put in my laptop, and I guess I’ve got to test that for sector size as well.

    My plan is to have a zfs-only setup with gpt on it, do you have any recommendations?

  3. SSDs are complicated. You can flip any bit anywhere from 1 to 0 at any time, but you can only flip it back to 1 by erasing the whole block. The “erase block size” varies; my impression is that it tends to be large (on the order of 128 kB).

    I wouldn’t run ZFS on a single-disk system, BTW. A single-vdev ZFS pool is significantly slower than UFS on the same device. Where ZFS really shines is large RAID setups with multiple raidz or raidz2 vdevs in the same pool.

  4. Well, considering that I now run a zfs-only setup on a 2.5-inch SATA 5,400 rpm drive, I think it will feel snappier with an SSD :)

    The speed is not my main concern when choosing between zfs and ufs, I consider the features of zfs to far outweigh the slowdown. Which is not to say that I like to throw away performance needlessly.

    Thanks for your input though :)

  5. Good read. However the TPS metric is incorrect because the calculation overflows. Fix (assuming sizeof(long) >= 5):

    --- phybs.c (revision 212043)
    +++ phybs.c (working copy)
    @@ -102,7 +102,7 @@
         usec = t1.tv_sec * 1000000 + t1.tv_usec;
         usec -= t0.tv_sec * 1000000 + t0.tv_usec;
         printf("%10lu%8lu%8lu\n", usec / 1000,
    -        count * 1000000 / usec,
    +        count * 1000000UL / usec,
             count * size * 1000000 / 1024 / usec);
         free(buf);
     }
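
    For what it’s worth, the wraparound matches the published numbers: assuming count is a 32-bit unsigned integer, 32,768 × 1,000,000 wraps modulo 2^32 to 2,703,228,928, and dividing that by the roughly 19,503,000 microseconds of pass 6 gives the 138 tps shown in the first table, where the true figure is about 1,680. A tiny standalone illustration (assuming a 64-bit long, as on amd64; the variable names are made up, not taken from phybs):

    #include <stdio.h>

    int
    main(void)
    {
        unsigned int count = 32768;         /* blocks written in pass 6 */
        unsigned long usec = 19503000UL;    /* elapsed time, ~19,503 ms */

        /* count * 1000000 is evaluated in 32-bit arithmetic and wraps
         * to 2,703,228,928 before the division: prints 138 */
        printf("buggy tps:   %lu\n", count * 1000000 / usec);

        /* the UL suffix promotes the whole product to unsigned long
         * (64-bit here), so nothing wraps: prints 1680 */
        printf("correct tps: %lu\n", count * 1000000UL / usec);

        return (0);
    }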

  6. Andreas and Pieter, sorry for forgetting to check my moderation queue…

    Thanks for the fix, Pieter. I’ll try to re-run the benchmarks and publish corrected figures. Luckily, the kBps figures were correct.

  7. new wd15earx, freebsd 8.2 release amd64, 4k alignment
    FS formatted with
    newfs -S 4096 -b 32768 -f 4096 -U /dev/ada1p2
    [root@timp ~/phybs]# ./phybs /dev/ada1p2
       count    size  offset    step        msec     tps    kBps

      262144     512       0    2048       37585    6974    3487

      131072    1024       0    4096       20415    6420    6420
      131072    1024     512    4096       20353    6439    6439

       65536    2048       0    8192       12144    5396   10792
       65536    2048     512    8192       12179    5380   10761
       65536    2048    1024    8192       12166    5386   10773

       32768    4096       0   16384        7490    4374   17499
       32768    4096     512   16384        6565    4990   19963
       32768    4096    1024   16384        7881    4157   16631
       32768    4096    2048   16384        8652    3787   15149

  8. No, that’s not right.

    First of all, formatting the partition has no effect, since phybs operates directly on the device, not on the filesystem. If you want to test the filesystem, you have to mount it, create a large file, and run phybs on that file.

    Secondly, you ran phybs on a partition instead of the whole disk. This means the results will be skewed unless the partition is aligned, and you didn’t show what gpart commands you used to create it.

    Finally, you ran phybs in read mode (the default), so the drive’s prefetch cache is masking the effects of unaligned accesses.

    From the data sheet, it looks like the EARX drives are identical to the EARS drives except for the SATA interface (6 Gbps instead of 3 Gbps), so the results should be pretty much the same. Try this on a scratch disk:

    # phybs -w -l 1024 /dev/ada1

  9. Thank you
    [root@timp ~/phybs]# ./phybs -w -l 1024 /dev/ada1
       count    size  offset    step        msec     tps    kBps

      131072    1024       0    4096      132845     986     986
      131072    1024     512    4096      128295    1021    1021

       65536    2048       0    8192       73596     890    1780
       65536    2048     512    8192       67192     975    1950
       65536    2048    1024    8192       67330     973    1946

       32768    4096       0   16384       16911    1937    7750
       32768    4096     512   16384       51732     633    2533
       32768    4096    1024   16384       51460     636    2547
       32768    4096    2048   16384       52040     629    2518

       16384    8192       0   32768       11221    1460   11680
       16384    8192     512   32768       48505     337    2702
       16384    8192    1024   32768       50572     323    2591
       16384    8192    2048   32768       49215     332    2663
       16384    8192    4096   32768       11123    1472   11783

  10. Just got a server back from service, and they had replaced the Seagate drives with 1.5TB WD Green Power drives.

    root#nfs1004 [/local/src/phybs] dmesg | grep da0
    da0 at mpt0 bus 0 scbus0 target 0 lun 0
    da0: Fixed Direct Access SCSI-5 device
    da0: 300.000MB/s transfers
    da0: Command Queueing enabled
    da0: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
    root#nfs1004 [/local/src/phybs] ./phybs -rw /dev/da0
       count    size  offset    step        msec     tps    kBps

      262144     512       0    2048      546207     479     239

      131072    1024       0    4096      450918     290     290
      131072    1024     512    4096      416217     314     314

       65536    2048       0    8192      378569     173     346
       65536    2048     512    8192      327592     200     400
       65536    2048    1024    8192      338574     193     387

       32768    4096       0   16384      229636     142     570
       32768    4096     512   16384      276989     118     473
       32768    4096    1024   16384      257056     127     509
       32768    4096    2048   16384      260296     125     503

    Am I right in concluding that I should create 4k-aligned devices for these before creating a zpool?

    Cheers

  11. No, you should send the disks back and ask for WD Caviar Blacks or Samsung Spinpoint F4s (not F4EG) instead.

    Seriously, I cannot overemphasize how crappy these drives are. Feel free to refer them to me if they object.

    1. Thanks, I’ll return the disks and swap them for WD Caviar Blacks now. (The supplier agreed fully that those WD GP disks were unsuitable for ZFS, and couldn’t quite understand why they had sent us those when they were aware that they would be used with ZFS.)
