The Basic Guide to Working with ZFS

A few days ago, thunderstorms rolled in unannounced. A crash of thunder hit as my lights went out and the blackened sky lit up for a moment. A few minutes before the power came back on, sirens blared as firetrucks headed down my street, stopping a few houses down. I expected my file server to be down, but I hadn’t expected it to be fried. It wouldn’t POST anymore, the OS drive was dead, one of the RAID-Z (RAID-Z1) drives had popped as well, and it seemed like most things were lost.

With how bad everything was, I had pretty much accepted I’d need to restore from backup and do some work to get everything going again. I had some work (due ASAP, naturally) scattered between several systems that synced to the share, which I’d have to track down and compare for differences. Work had been busy, so I hadn’t quite gotten everything fully backed up either.

I was dealing with a 6TB share, so that wasn’t going to be easy. To top it all off, all of my writing, financial data, etc. were on this share (they were backed up elsewhere, just not as easily accessible or as current). After a little work (most of which was finding a box with enough SATA ports to move things over), I managed to get everything back up and running. All of this is thanks to ZFS and the raw power it has. My entire digital life in one place was moved wholesale in a matter of minutes (barring the initial setup).

Let’s see what ZFS is, what it does well, and why you should consider it for your storage needs.

What Is ZFS?

ZFS (originally short for Zettabyte File System, though the Z no longer officially stands for anything) is a file system that offers a huge number of powerful features. It combines a logical volume manager (LVM-style functionality) with a file system, so it manages both the physical disks and the storage that sits on top of them. This marriage of functionality makes it possible to heal (or at least flag) bit rot and bad blocks on RAID-Z volumes (versus traditional RAID, which just duplicates the damage), and it brings compression, encryption, deduplication, data scrubbing (basically an online fsck), and more. If it’s something an enterprise-grade storage solution needs, ZFS probably has it.

ZFS was created by Sun Microsystems (development began in 2001) and released as open source with OpenSolaris in 2005. It was ported to FreeBSD in 2008, and similar efforts for Linux began around the same time. ZFS was released under the CDDL (Common Development and Distribution License), which is fundamentally incompatible with the GPL (General Public License), which is why it ships as an out-of-tree module rather than inside the Linux kernel itself. That being said, it’s easy to get via DKMS on any major Linux distribution.

Oracle later acquired Sun Microsystems and did what Oracle does best with ZFS. That being said, the open source branch lives on as OpenZFS, which is mostly feature compatible with the version Oracle offers. If it isn’t, I haven’t noticed the difference, but I’d also never intentionally use anything that directly supports Oracle.

ZFS for RAID

RAID-Z is an extremely powerful RAID solution: it pools disks the way traditional RAID does, but it has several features that make it more transparent to administrators and avoid some of traditional RAID’s shortcomings. When I originally built my server, I had the choice between traditional software RAID and ZFS.

Traditional RAID has limitations on disk sizes. If you buy 3 disks from 3 vendors (at the very least, you want drives from different batches, and mixing vendors is a surefire way to do it) for your RAID5 array, you have to be careful about sector counts and sizes. A disk with a slightly different sector count can keep you from setting up the array properly if you don’t do your homework first. With ZFS, I just tossed my drives in; it handled that sort of calculation for me and left a little slack in case a drive came up short on sectors.

RAID-Z1 replaces RAID5 in a traditional setup. Performance is roughly equivalent, but the benefit is in the features. As mentioned before, you get healing (or at least detection) of bit rot and bad sectors, and it’s a lot easier to set up with mixed drives. RAID-Z2 (equivalent to RAID6) has, in my experience, performed better than any traditional RAID6 setup I’ve used with equivalent drives on an equivalent machine.

Managing ZFS

ZFS is extremely easy to manage. It uses simple commands which make sense. You use the zpool command to work with pools themselves (think volumes or partitions built from physical disks) and the zfs command to work with the datasets inside them (think creating file systems the way mkfs would, mounting, snapshots, and similar). These divisions make sense once you look at how ZFS is structured, but initially the split of certain operations between the two commands can seem arbitrary.

Once you get a pool set up, you’ll mainly reach for zpool for day-to-day maintenance: status checks, scrubs, and swapping disks. The zfs command is what you’ll use for taking and rolling back snapshots, turning options on or off, mounting and unmounting datasets, and so on. See this ZFS cheat sheet for more.
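
As a quick illustration of the split (the pool and dataset names here are made up), the zpool side versus the zfs side looks like this:

zpool status mystorage                      # pool health, member devices, scrub progress
zpool list                                  # pool capacity and usage
zfs list                                    # datasets and their mount points
zfs snapshot mystorage/data@pre-upgrade     # take a snapshot of a dataset
zfs rollback mystorage/data@pre-upgrade     # roll that dataset back to the snapshot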

ZFS on (Debian) Linux

This part of the article will cover how to get started with ZFS. I’m going to focus on Debian, but the same basic process exists in almost all major distributions. I’m going to assume you have the drives you want installed and ready to go.

To get ZFS installed, start by editing your sources.list file and adding contrib (Ubuntu users should be able to skip this step). The easiest way is to run sudo nano -w /etc/apt/sources.list and edit the lines to look something like this: deb http://deb.debian.org/debian buster main contrib non-free (you may need to change buster to whatever release you’re using). Do this for every default Debian line, and add backports if it isn’t there already: deb http://ftp.debian.org/debian buster-backports main contrib non-free. The finished file should look something like the example below.
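
For reference, after those edits a buster sources.list might look something like this (your mirror URLs and release name may differ):

deb http://deb.debian.org/debian buster main contrib non-free
deb http://deb.debian.org/debian buster-updates main contrib non-free
deb http://security.debian.org/debian-security buster/updates main contrib non-free
deb http://ftp.debian.org/debian buster-backports main contrib non-free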

Save the file and run sudo apt update. If you’re not comfortable working with apt, see this for more information on using it. Some Debian derivatives may require a PPA, but Ubuntu and Debian both include ZFS in their standard repositories.

After the repository finishes updating, run sudo apt install zfsutils-linux zfs-dkms, which installs ZFS and the packages needed to actually make it work. Once it installs, test it with sudo modprobe zfs. If you can’t load the module, check whether UEFI Secure Boot is turned on (it breaks a lot of things, including unsigned kernel modules like the one zfs-dkms builds). I couldn’t get it to cooperate with Secure Boot on and didn’t want to waste time fighting it.
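
To double-check the install, a couple of quick sanity checks (assuming the packages above installed cleanly):

sudo modprobe zfs        # no output means the module loaded
lsmod | grep zfs         # confirm the zfs module is actually resident
sudo zpool status        # should report "no pools available" on a fresh install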

Actually Making a ZFS Pool

ZFS isn’t going to offer you much for a single disk and isn’t ideal on Linux distributions as a root file system. ZFS’ power lies in how it handles multiple disks.

Get a list of the disk IDs you’ll be working with. You don’t want to use names like /dev/sdX, as those are subject to change between boots (though if you do, you can always export and re-import the pool, see below for more).

You can run:

sudo ls -l /dev/disk/by-id
…
lrwxrwxrwx 1 root root   9 Aug 20 18:21 ata-LITEON_IT_LCS-256L9S-HP_002526106518 -> ../../sda
lrwxrwxrwx 1 root root  10 Aug 20 18:21 ata-LITEON_IT_LCS-256L9S-HP_002526106518-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 Aug 20 18:21 ata-LITEON_IT_LCS-256L9S-HP_002526106518-part2 -> ../../sda2
…

You’ll skip the individual partitions and use the whole disks (so, for our example: ata-LITEON_IT_LCS-256L9S-HP_002526106518 would be a device to use). ZFS does have some options to expand pools, but it’s best to measure twice and cut once. Let’s go over our pool options. For everything that follows, I’m going to assume each command is run from a root session or with sudo.

RAID0 Equivalent

This creates the equivalent of a striped RAID0 environment. This basically adds multiple disks together to make one larger disk. Keep in mind there is no redundancy: lose any single disk and the whole pool goes with it.

zpool create [pool name] [device1] [device2] ...

This may look something like:

zpool create mystorage ata-LITEON_IT_LCS-256L9S-HP_002526106518 ata-SOMETHINGELSE_IT_LCS-256L9S-HP_002526106519

(I was really lazy with that other disk ID)

You can do this with a single disk, but you really won’t get much out of ZFS over any other mature filesystem doing this.

RAID1 Equivalent

This creates the equivalent of a mirrored RAID1 environment. Two or more drives are mirrored, so you get the capacity of a single drive (e.g. 2 1TB drives become one 1TB pool) but gain redundancy.

zpool create [pool name] mirror [device1] [device2] …
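
Reusing the (made-up) disk IDs from the striped example, a simple two-disk mirror would look like:

zpool create mystorage mirror ata-LITEON_IT_LCS-256L9S-HP_002526106518 ata-SOMETHINGELSE_IT_LCS-256L9S-HP_002526106519
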
RAID5 Equivalent

This creates the equivalent of a RAID5 setup with ZFS. You need at least 3 drives to use this functionality. Basically, you get the total capacity minus one drive’s worth of parity, so 3 1TB drives become one 2TB pool that can survive a single disk failure. This is great for most basic media storage, but you shouldn’t throw too many drives into a single RAID5-style setup: 11 drives would give you 10 times the storage of a single drive, but any 2 disks popping would take out the whole array (keep reading for better solutions for larger arrays).

zpool create [pool name] raidz [device1] [device2] [device3] …
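
As a concrete sketch with three hypothetical disk IDs:

zpool create mystorage raidz ata-DISK_A ata-DISK_B ata-DISK_C
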
RAID6 Equivalent

RAID6 is conceptually similar to RAID5 but with an extra drive’s worth of parity. Basically, you take at least 4 drives and get the total capacity minus 2 (so, you get 3 drives of data with 5 drives, 4 with 6, etc.). The array survives up to two failed disks; a third failure kills it.

zpool create [pool name] raidz2 [device1] [device2] [device3] [device4] …

RAID-Z3

Standard RAID doesn’t really have an equivalent to this. It’s basically RAID6 with yet another drive’s worth of parity, so the array survives up to three failed disks. This option doesn’t make a lot of sense for most use cases, but it exists. For 5 disks, you get the storage equivalent of 2 (n disks gives you n – 3 drives of storage). For much larger arrays this can be okay, but there are arguably better setups at that scale for most common use cases.

zpool create [pool name] raidz3 [device1] [device2] [device3] [device4] [device5] …

RAID10 Equivalent

RAID10 is a combination of RAID1 and RAID0: you stripe across two or more mirrored pairs (so a minimum of 4 disks). The pool survives as long as both disks in any one mirrored pair don’t fail together. So, if you have disks 1, 2, and 3 mirrored by 1′, 2′, and 3′, you’re fine as long as 1 and 1′ (or 2 and 2′, etc.) don’t fail at the same time.

zpool create [pool name] mirror [devicea1] [devicea2] … mirror [deviceb1] [deviceb2] …

To expand on the above, you can have any number of disks in each mirror group (so long as each group has the same number); I used devicea and deviceb instead of numbering 1 through 4 just to make the grouping clearer. The naming only affects my notation, not the array itself.
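
So, with four hypothetical disks, two mirrored pairs striped together would be:

zpool create mystorage mirror ata-DISK_A1 ata-DISK_A2 mirror ata-DISK_B1 ata-DISK_B2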

RAID50 Equivalent

RAID50 is a combination of RAID5 and RAID0: you stripe across two or more RAID-Z groups. You get the strengths of RAID5 within each group, and the pool is fine as long as you don’t lose more than 1 disk in any single group.

zpool create [pool name] raidz [devicea1] [devicea2] [devicea3] … raidz [deviceb1] [deviceb2] [deviceb3] …

The numbering notation is similar to that of the RAID10 equivalent above.

Exotic Setups

ZFS allows many more types of setups; these are just the most common ones. There are more advanced building blocks as well, such as dedicated log, cache, and spare vdevs. You can combine raidz groups and mirrors and do all sorts of things if you’d like. ZFS allows a lot, and it’s beyond the scope of a single document to really list it all out.

Working With Your Pool

Assume you’ve made /mnt/[name] (or similar) for your ZFS data. Now we need a ZFS dataset that actually mounts there.

zfs create -o mountpoint=/mnt/[name] [pool name]/[partition name]

The -o flag passes options; here it creates a dataset (the “partition”) named [partition name] under our newly created pool, with its mount point at the folder created earlier. If you don’t want a single dataset to be able to use all of the space, you can use the quota property (or refquota).
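
For instance, with the hypothetical mystorage pool from earlier and a dataset called data mounted at /mnt/storage:

mkdir -p /mnt/storage
zfs create -o mountpoint=/mnt/storage mystorage/data
zfs list mystorage/data      # confirm it was created and mounted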

You can use zfs set to set properties after instantiation (and ideally before you fill it too much). For instance you can run:

zfs set quota=100g mypool/data

This sets the maximum space mypool/data can use, snapshots and the like included (refquota instead limits the space the data itself can use). See this for more options on quotas and similar, and see this for more about available properties.
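
Other properties are set the same way. For example (the dataset name and values here are arbitrary), you could cap just the data itself with refquota, turn on compression, and then check the results with zfs get:

zfs set refquota=80g mypool/data
zfs set compression=lz4 mypool/data
zfs get quota,refquota,compression mypool/data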

I’m going to document the basics of common operations, but you should cross-reference more thorough documentation for actual administration. This serves as a reduced set of notes for ZFS, but it also shows how comparatively easy some of these operations are.

Destroying Data

Just as you created pools and datasets, you can destroy them as well. To delete a dataset, just run the following:

zfs destroy [pool]/[data]

To delete a dataset and everything beneath it recursively, run:

zfs destroy -r [pool]/[data]

Note that zfs destroy won’t operate on a pool’s root dataset. To wipe out everything inside a pool, destroy each top-level dataset recursively as above, or destroy and recreate the pool itself.

To destroy the pool, run:

zpool destroy [pool]

Getting the Status and Fixing Things

The zpool command allows you to get information about a given zpool. Run:

zpool status [pool]

If something is wrong with some of the data, or you had a loss of power or similar, you may want to scrub the pool (think of this as an online fsck with fewer risks). Run the following to scrub a pool:

zpool scrub [pool]

You can check the status with:

zpool status [pool]

You can clear out error messages (this won’t fix underlying problems if they exist) with either:

zpool clear [pool]

Or for errors from a specific device:

zpool clear [pool] [device]

Adding and Removing Disks

You can add disks to a given pool as hot spares with the following:

zpool add [pool] spare [device(s)]

You can then manually pull a spare in to take a failed disk’s place (with the ZFS event daemon configured for it, a hot spare can also take over automatically):

zpool replace [pool] [device to replace] [spare device]

If you hadn’t added a spare first, the same form works with a brand new disk:

zpool replace [pool] [device to replace] [device to use]

Offline the disk so you can remove it:

zpool offline [pool] [device(s)]

If the disk is unnecessary for the pool, remove it with:

zpool remove [pool] [device(s)]

This won’t work if the disk is in use.

You should check the status between each command. Make sure the pool starts to resilver before proceeding with the other operations. See something like this for more thorough troubleshooting of a failed drive.
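
Put together, a manual swap for a dead drive might look something like this (the device names are made up); check the status between steps and let the resilver finish before pulling anything:

zpool add mystorage spare ata-SPARE_DISK                  # register a hot spare ahead of time
zpool status mystorage                                    # identify the failed device
zpool replace mystorage ata-FAILED_DISK ata-SPARE_DISK    # swap the failed disk for the spare
zpool status mystorage                                    # watch the resilver until it completes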

It’s possible to extend a pool’s available capacity, but that is a bit more complicated and can lead to certain issues. Look into adding vdevs if you’re interested in this process.

Importing and Exporting Pools

This is what saved me and my data. I physically moved the disks over, imported my pool on a new, clean setup, and was able to get back to business without any real issues.

You should ideally export a pool first:

zpool export [pool]

This basically unmounts the pool and flags it as exported so another system can safely take it over.

On a new system, you can import the pool to “mount” it:

zpool import [pool]

If you have issues (such as the pool not having been exported cleanly), you can try to force the import. Since my OS disk was dead, I had no way to cleanly export my pool, so it didn’t matter in my case. Run the following:

zpool import -f [pool]
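
If you don’t remember the pool’s name on the new box, running zpool import with no arguments scans the attached disks and lists anything importable. You can also point it at the stable by-id paths explicitly:

zpool import                                  # list pools available for import
zpool import -d /dev/disk/by-id mystorage     # import using stable device names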

ZFS Limitations

ZFS can do a lot, but there are some limitations. For instance, it’s not ideal to try to expand a RAID-Z by adding disks and dynamically adjusting the pool. You can do something like it with vdevs, but it’s not something that makes sense for enterprise scenarios. Resilvering is expensive, diminishes performance, and creates a state prone to failure. You also run the risk of unbalancing a pool, which leaves certain data more failure prone. For an enterprise scenario, it makes more sense to just buy new hardware and migrate.

ZFS is all about measuring twice and cutting once. Know what you’re doing when you create pools, or else you’ll have to redo things. Certain use cases are (now) possible but are strongly discouraged in high-availability scenarios. I personally wouldn’t add a disk to my pool outside of replacing a dead drive. It makes more sense to wait a bit and just create a whole new pool with a whole new setup.

This may seem a bit old guard, but the rules exist for a reason. Once you see a RAID array eat itself during a resilver or similar, you understand where these principles come from. It may not matter as much for hobbyist use cases, but anything worth doing is worth doing right in my opinion. That said, some of the new features are really cool to play with for trivial data.

Going Further

I’ve barely scratched the surface of what ZFS can do. Feel free to check out documentation from Debian or Ubuntu for more information, or read the manual if any of this is confusing. Oracle also has documentation, but their version of ZFS does differ somewhat in workflow from OpenZFS.

ZFS is an extremely powerful filesystem which can save you a lot of headaches over traditional RAID. It saved me from potential disaster (or at least a lot of wasted time). See how ZFS can help you with data storage and administration: take note of its limitations and what it does well, and you will have a much better time with it. It’s not magic, but it feels like it if you know what to use it for and why.
