ZFS has a couple of very useful functions, zfs send and zfs receive, which allow you to serialize a complete ZFS dataset and recreate it in a different location. They can also be used to serialize a delta between two snapshots and apply that delta to a previously created copy of the dataset. You see where I’m going with this… That’s right, incremental backups of a ZFS dataset or even an entire pool to a different ZFS dataset or pool.
Why would you want to perform incremental ZFS-to-ZFS backups instead of just adding redundancy to the pool, or cloning a snapshot? Because—provided the ZFS pool and filesystem versions match—it allows you to duplicate your dataset or pool on removable media (which you can store off-site), or even on a different machine across the network. This technique is far more efficient than rsync, because there is no need to compare the source and destination: ZFS already knows exactly what has changed. It also preserves the filesystem hierarchy and dataset properties.
In my case, I need to duplicate a pool onto removable media because I am replacing a server that only takes PATA disks with another that only takes SATA disks, which precludes just moving the disks over and progressively replacing them with new ones. Using this technique, when the time comes, I can slide the new server into the rack, hook up the backup disk, and restore just the parts I want to keep.
Of course, like a good little hacker, I wrote a script, which you can find here, to automate this.
The script takes two arguments: the source dataset and the destination dataset. Either of these can be the root of a ZFS pool or a dataset within a pool; they can even be datasets within the same pool, provided they do not overlap. The script selects the latest snapshot of the destination dataset (it uses a naming scheme which ensures that lexical order corresponds to chronological order), verifies that the source dataset has a snapshot with the same name, takes a new snapshot of the source dataset, and streams the difference between the old and new snapshots from the source dataset to the destination dataset. Finally, it deletes the old snapshot to allow ZFS to reclaim the space occupied by old data.
You can use this script with multiple backup disks, since it will only delete the snapshot that was actually used for the current disk. If you have one disk for each day of the week, for instance, it will delete last Monday’s snapshot once it has completed this Monday’s backup, but leave the other six in place. Likewise, if you decide to keep Sunday’s disk for a month instead of reusing it next Sunday, the script will leave the snapshot in place until you run it again with the same disk.
The script does not currently support over-the-network backups, but it should be fairly easy to implement.