The Ubuntu ZFS boot pool problem [Part I]

Sorry, as you can see from the title, this is another “not-ESP8266” article, so you can skip this if you’re only interested in our magical little wireless modules. However, if you’re running a system (laptop, server or workstation) loaded with Ubuntu and have the root/boot partitions mounted on a ZFS filesystem, you should at least stick around long enough to read through the problem description, below.

If you landed here from a web search for “can’t upgrade Ubuntu – not enough space on bpool” (or something similar), then you’re in the right place. You can probably skip the introduction to the problem (you already know what you’re facing) and go directly to the solution.

Note that in these articles I am assuming that you are familiar with Linux and confident in your own abilities to administer your own system. Because we are dealing with filesystems, I am also going to assume that you have made and verified back-ups of your whole disk(s) before you follow any of the tips below. I am not responsible for your data. You are. The advice below is given in good faith after having been tested on several of my own machines (of different types), but I cannot be held responsible for any loss of data.

The bpool Problem

You see the familiar Software Updater pop-up, announcing that there are package updates available and prompting you to install them. You scan the list of updates and everything looks good, so you hit the “Install Now” button, but instead of a progress meter, you see a pop-up error window with the message:-

The upgrade needs a total of 127 M free space on disk '/boot'. Please free at least an additional 107 M of disk space on '/boot'.

The message continues with a couple of suggestions of how to increase available space in “/boot”, but basically at this point you are unable to install updates.

You’ve just run into the “bpool problem”.


Background

For the last few releases, Ubuntu has provided an easy method for installing root on a ZFS filesystem, giving you access to snapshots and the instantaneous rollback to previous versions which it enables (as well as the possibility of mirroring) without having to go through the long and tedious process of adding ZFS support after the actual install. It was a major move forwards and the “one-click” install process made it simple for anyone to benefit from ZFS technology (not just snapshots, rollbacks and mirrors, but raidz and ZFS send/receive back-ups, too). With all of those benefits, why wouldn’t you just use it by default?

Well, as a long time ZFS user, I can tell you that it does come with a few drawbacks. It is, for instance, somewhat more difficult to judge the available space left on any physical device than it was in the pre-ZFS world (good ole’ “df -h” is very good at giving you an instant and pretty accurate impression of the current capacity of your disks). ZFS tends to muddy that simplistic view a little, until you get used to depending more on the output from “zpool list -v” and “zfs list -o space” instead of “df”.

Fragmentation is another issue. Die-hard Unix users will remember that “fragmentation” was always a problem for people who used that other operating system, not for Unixes (well, unless you started to run out of inodes, anyway). However, with ZFS you do need to keep an eye on that “FRAG” column in the output of “zpool list -v”; once it starts climbing to around the 50~60% level it’s definitely time to start planning for an upgrade and, once it’s past 70% it’s time to take some urgent action (usually, but not always, time to add more physical disk space — we’ll touch on de-fragmentation as a side-effect in the “Solutions” presented in part-II). Once it hits 80%, you’re in big trouble as, even with available free space, the system will be struggling to find usable space.
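As a rough sketch, those rule-of-thumb thresholds can be turned into a quick check. The function name frag_level and its labels are hypothetical (they are not part of ZFS), and the percentages are just the figures quoted above rather than anything ZFS itself defines:

```shell
#!/bin/sh
# Map a FRAG percentage onto the rule-of-thumb levels described above.
# frag_level is a hypothetical helper, not part of ZFS.
frag_level() {
    pct=${1%\%}                     # strip a trailing "%" if present
    case $pct in
        ''|*[!0-9]*) echo "unknown"; return 0 ;;   # e.g. "-" when not reported
    esac
    if   [ "$pct" -ge 80 ]; then echo "critical"
    elif [ "$pct" -ge 70 ]; then echo "urgent"
    elif [ "$pct" -ge 50 ]; then echo "plan-upgrade"
    else                         echo "ok"
    fi
}

# Report every pool on the system (skipped where zpool is not installed).
if command -v zpool >/dev/null 2>&1; then
    zpool list -H -o name,frag | while read -r name frag; do
        printf '%s\t%s\t%s\n' "$name" "$frag" "$(frag_level "$frag")"
    done
fi
```

The -H option drops the header line and separates the columns with tabs, which makes the output easier to feed into a loop like this one.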

These issues are a pain to come to terms with after all of these years of doing the odd “df” just to check that everything was okay, but they are still heavily outweighed by the advantages (did I mention copy-on-write or checksummed data yet?).

So, our simple ZFS installation method from Ubuntu is a real game changer and gives everyone the chance to ride on the ZFS bandwagon with minimal effort. Unfortunately, along with that simplicity and ease of installation comes another problem: sizing your filesystems. The installer is a one-size-fits-all deal; it looks at your disk size and makes a couple of fairly simplistic calculations to create a very simple partition table and then later adds a spread of ZFS datasets on top of those physical partitions. The “bpool” (for “boot-pool”) is a ZFS pool which occupies a dedicated partition just for the kernel/initramfs and EFI/grub data required to boot the system. The “rpool” (for “root-pool”) is the ZFS pool which contains datasets for /, /var and /usr (as well as sub-directories) and the user home directories. It’s all very well laid out, so that a rollback to (for instance) a previous kernel version need not affect the user’s data. However, the installation script for ZFS currently has one major issue; it limits the size of the bpool physical partition to 2GB (it can be smaller, but not bigger), no matter what the size of your physical disk.

The major drawback that people are finding to this installation mode comes from one of the strengths of ZFS — the snapshot functionality. Put simply, if you make a snapshot of say, your whole bpool, ZFS will never delete anything from that point in time. If you subsequently delete a kernel which was present when you took the snapshot, ZFS will keep the file itself, hidden away in the snapshot and only remove the named link to the file from the normal /boot directory. If, at some later time, you decide that you no longer need that snapshot (because it is outdated and you know you’re never going to revert to that old kernel again) you can destroy the snapshot and only then will you recover that disk space. This is one of the reasons that space management on a ZFS filesystem is a bit of a minefield for new users and is exactly why the “bpool problem” exists.

Every time Ubuntu pushes out an update (even a non-kernel update), the installer will automatically create a snapshot of your filesystem, so that you can roll back to the previous version if things go horribly wrong. In the case of actual kernel updates, the snapshot can consist of tens, or even hundreds of megabytes of combined kernel and initramfs files. Given that we have an upper limit of 2GB on the size of /boot (the bpool filesystem), that space doesn’t go very far if you don’t actively manage the snapshots. Worse, the Software Updater will start to fail when there is no longer enough free space in /boot to fit another version of the kernel and, to be clear, even non-kernel updates will fail to be installed if there is a kernel update still in the queue (even if you de-select it manually). The error messages are clear and quite specific (and even give tips on how to ameliorate the issue). However, by the time Software Updater flags the problem, it can already be too late to easily fix it (and how many of us know which system-generated snapshots are safe to remove and which aren’t, anyway?).

Short-term band-aids

These suggestions are not a true solution; they are quick fixes to get Software Updater working again in the shortest possible time.

Before making any changes which might permanently erase data from your filesystems, you should make a back-up of your whole disk (I’ll be repeating this at various points throughout these articles, because I really mean it — I cannot and will not be held responsible for any loss of data you might suffer while trying to follow the tips given here; I always assume that you have backed-up your data and that you have verified that those back-ups work. Batteries not included. May contain nuts.).

First, one quick fix which Software Updater lists in its output when this problem occurs: edit the /etc/initramfs-tools/initramfs.conf file and change the COMPRESS setting from “lz4” to “xz”. This won’t take effect until the next time the initramfs is regenerated (typically the next kernel update), but at that point the initramfs install will use the “xz” method, which compresses better (though more slowly). This can slice around 20MB off each of the initrd.img files in the /boot directory and so will save you about 40MB of space on each update (there are usually two versions, current and “.old”).
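If you would rather not open an editor, the same change can be sketched as a small helper. The function name set_initramfs_compress and the “.bak” back-up suffix are assumptions made for illustration, and the sed call relies on the GNU sed shipped with Ubuntu:

```shell
#!/bin/sh
# Switch the COMPRESS= line in an initramfs.conf-style file, keeping a backup.
# set_initramfs_compress is a hypothetical helper name.
set_initramfs_compress() {
    conf=$1
    method=$2
    cp -p "$conf" "$conf.bak" || return 1      # keep the original alongside
    sed -i "s/^COMPRESS=.*/COMPRESS=$method/" "$conf"
}

# Typical use, as root:
#   set_initramfs_compress /etc/initramfs-tools/initramfs.conf xz
```

Check the result with “grep COMPRESS /etc/initramfs-tools/initramfs.conf” before relying on it.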

The results of this second tip are much more difficult to predict in terms of exact space saved — snapshot whack-a-mole.

You can list your snapshots in bpool using:-

zfs list -t snap -r -o name,used,refer,creation bpool

This will get you a listing with entries that look something like this (but probably much longer):-

NAME              USED   REFER  CREATION
@autozsys_7lzjmg  197M   197M   Wed Dec 30 11:07 2020
@autozsys_9u20hc   17K   199M   Wed Jan 20  6:56 2021
@autozsys_6c4qgp    0B   199M   Thu Jan 21  6:12 2021

[Note that I have removed the pathname before the “@” symbol to fit the text without line wraps]

As you can see, the “used” and “refer” columns vary widely between the snapshots. The bottom entry (6c4qgp) has a “used” entry of zero bytes; surely that can’t be right, can it? Well, that’s the whack-a-mole function coming into play. The “used” column only counts space which is unique to that one snapshot. In this particular case, there don’t appear to have been any changes to bpool between autozsys_9u20hc and autozsys_6c4qgp, so if we destroyed 6c4qgp, we wouldn’t get any free space released back to the filesystem. So where does the “refer” come from? That’s letting us know that even though there were no changes between the last two snapshots in our list, they both have references to 199MB of data already being held in other, previous snapshots (or the filesystem itself). What this means for us is that while destroying 6c4qgp may not free up any space, it is very likely that destroying 7lzjmg or 9u20hc will cause 6c4qgp to inherit some of that referenced data, thus causing the “used” count for 6c4qgp to increase and the filesystem not to gain back as much free space as we expected (hence “whack-a-mole”, the used data count might just pop back up elsewhere).

Now, to be fair, because we’re dealing with /boot and the turnover is usually caused by the replacement of kernel/initramfs files, it is more than likely that destroying the oldest snapshot will simply delete the oldest of those kernel-related files, returning about 140MB of free space to the filesystem. Likely, but not certain, so don’t go destroying snapshots left, right and centre without checking their contents first.

LISTING OF /boot CAPACITY AND AVAILABLE SPACE
["df -h"]
Filesystem                 Size  Used Avail Use% Mounted
bpool/BOOT/ubuntu_g1cutr   145M  117M   29M  81% /boot

["zfs list"]
NAME                       USED  AVAIL     REFER  MOUNT
bpool/BOOT/ubuntu_g1cutr  1.52G  28.5M      116M  /boot

["zfs list -o space"]
NAME                      AVAIL   USED  USEDSNAP  USEDDS
bpool/BOOT/ubuntu_g1cutr  28.5M  1.52G     1.41G    116M
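
In the listing above, USEDSNAP (1.41G) accounts for a little over 90% of USED (1.52G). A minimal sketch for working that figure out automatically follows; the helper name snapshot_share is hypothetical, and it assumes the tab-separated, byte-valued output which “zfs list -H -p” produces:

```shell
#!/bin/sh
# Read "name avail used usedsnap usedds" lines (tab-separated, byte values)
# and print the percentage of each dataset's USED space held by snapshots.
# snapshot_share is a hypothetical helper name.
snapshot_share() {
    awk -F '\t' '$3 > 0 { printf "%s %d%%\n", $1, ($4 * 100) / $3 }'
}

# Feed it live data where ZFS is available.
if command -v zfs >/dev/null 2>&1; then
    zfs list -H -p -r -o name,avail,used,usedsnap,usedds bpool | snapshot_share
fi
```

A high percentage here is the tell-tale sign that snapshots, not the live /boot contents, are eating the pool.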

You can always check the contents of any given snapshot by noting where the ZFS pool is mounted and then navigating to the .zfs/snapshot directory at that point. So, for our bpool snapshots, we can see from the output of “df” or “mount” that bpool/BOOT/ubuntu_g1cutr is mounted at /boot. Listing /boot/.zfs/snapshot will give us a listing of directories which correspond to the snapshot names in the listing above. You can list each of those directories to see what files and directories are included in the snapshot. As /boot is actually quite small, you can easily do an “ls -i” on two of the snapshot directories and see which files have the same inodes and which are different (which gives a good indication of which files are shared between different snapshots and the current, live filesystem and which are unique to a given snapshot).
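
The inode comparison can be sketched as a short loop. The function name same_inode_files is hypothetical, and a matching inode number is only a good hint that a file is shared, not proof that the contents are identical:

```shell
#!/bin/sh
# Print the names of files which exist in both directories with the same
# inode number. same_inode_files is a hypothetical helper name.
same_inode_files() {
    dir_a=$1
    dir_b=$2
    for path in "$dir_a"/*; do
        name=${path##*/}
        [ -e "$dir_b/$name" ] || continue
        ino_a=$(ls -di "$path" | awk '{print $1}')
        ino_b=$(ls -di "$dir_b/$name" | awk '{print $1}')
        if [ "$ino_a" = "$ino_b" ]; then
            echo "$name"
        fi
    done
    return 0
}

# Typical use, with the snapshot names from the listing above:
#   same_inode_files /boot/.zfs/snapshot/autozsys_7lzjmg \
#                    /boot/.zfs/snapshot/autozsys_9u20hc
```

Anything which is missing from the output is a candidate for being unique to one of the two snapshots.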

Snapshots are removed using the “zfs destroy” command, by the way, not by removing the snapshot directories.
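
Before destroying anything for real, “zfs destroy” can be asked for a dry run (-n) with verbose output (-v), which reports what would be destroyed and how much space would be reclaimed. The wrapper below is only a sketch: preview_destroy is a hypothetical name, and the ZFS_CMD variable is an assumption added so the function can be exercised on a machine without ZFS:

```shell
#!/bin/sh
# Preview what destroying a snapshot would reclaim, without destroying it.
# preview_destroy is a hypothetical wrapper; it refuses any name lacking
# an "@" so that a whole dataset cannot be targeted by accident.
preview_destroy() {
    case $1 in
        *@?*) ;;
        *) echo "not a snapshot name: $1" >&2; return 1 ;;
    esac
    # -n = dry run, -v = print what would be destroyed and reclaimed
    ${ZFS_CMD:-zfs} destroy -n -v "$1"
}

# Typical use, as root:
#   preview_destroy bpool/BOOT/ubuntu_g1cutr@autozsys_7lzjmg
```

Only once the dry run reports what you expect would you repeat the “zfs destroy” without the -n option.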

Don’t forget that destroying any snapshot restricts your ability to recover to a known point in time, so I would urge you to err on the side of caution — if you’re not 100% certain of what you’re about to remove, don’t do it!

If you’re a user who likes to create their own snapshots (before a major upgrade, for instance) you might already be able to easily target some of your own snapshots as candidates for deletion (perhaps you already know that you’re not going to roll back to that ancient, previous release?).


The [partial and not very satisfactory] Solution

The stop-gap solutions listed above are just that; short term solutions which will give you some extra breathing space, but not long term fixes. To remedy the bpool problem long term, we obviously need to add a substantial chunk of extra disk space.

One brutal, but simple way of getting back more bpool space (and a solution which is very topical with the release of 21.10 almost upon us) is to re-install Ubuntu after editing the ZFS section of the install script to bypass the problem. I’m not suggesting that this is the best or most versatile answer to the issue (in most cases it won’t be), but if you happen to be on the verge of upgrading, or perhaps have a machine where all of the application and user data is mounted from a fileserver rather than local disk, this may be an option (but, as always, I would recommend a full back-up of the system before taking such a drastic step).


Just in case you didn’t quite get that… ==WARNING== The following steps will delete -ALL- of the data on your disk. Do -NOT- proceed with the steps below if you are not prepared to have your disk(s) totally wiped of all existing data.


Booting from the Ubuntu install image will drop you into the Try-or-Install header page. Select “Try Ubuntu”, which will restart the desktop to the normal, live image. From there you can open a terminal session and become root using “sudo -s” (no password required).

Using whatever editor you’re most comfortable with, open /usr/local/share/ubiquity/zsys-setup.

On (or about) line number 267, you’ll find this code:-

[ ${bpool_size} -gt 2048 ] && bpool_size=2048

You just need to comment out that line completely for the simplest fix. Doing so will allow the bpool size calculation to grab a much larger chunk of your available disk space. Note that if your disk is 50GB or less, this isn’t going to help — you’ll still end up with roughly 2GB. Conversely, if your disk is 1TB or larger, you may end up with much more bpool space than you actually need, or want. In these cases, you might want to change line 267 to use some value other than 2GB; I found 10GB (ie:- replace “2048” with “10240”) to be satisfactory for my machines, but if you happen to be a kernel developer, you might want to bump that up even further.
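
For anyone who prefers to script the change, here is a minimal sketch which swaps the 2048 cap for another value. The helper name raise_bpool_cap is hypothetical, and the pattern assumes the line looks exactly like the one quoted above; the script may differ between releases, so run “grep -n 'bpool_size=2048' /usr/local/share/ubiquity/zsys-setup” first to confirm:

```shell
#!/bin/sh
# Raise the bpool cap in the installer script from 2048 to a new value (MB).
# raise_bpool_cap is a hypothetical helper name.
raise_bpool_cap() {
    script=$1
    new_cap=$2
    # Only touch the line which clamps bpool_size; the original file is
    # kept alongside with a ".orig" suffix.
    sed -i.orig "/bpool_size=2048[[:space:]]*\$/s/2048/$new_cap/g" "$script"
}

# Typical use from the live session, as root:
#   raise_bpool_cap /usr/local/share/ubiquity/zsys-setup 10240
```

Both occurrences of 2048 on that line have to change together, which is why the substitution carries the “g” flag.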

Following the edit of the ubiquity file, you simply select “Install Ubuntu” from the desktop icon and proceed with the normal ZFS install.


Okay, that’s the gist of the bpool problem, along with a couple of suggestions for clawing back some disk space and the most drastic way of fixing the issue.

In the second part of this series, we’ll be looking at a couple of more reasonable methods of fixing existing installations without resorting to a full re-install (fair warning though — it will still involve booting from the install media, so is not entirely without risk).

And don’t forget …back up early and back up often!


Will it mirror?

[Image: old laptop with an SSD attached to the lid]

Here’s another silly one for you. What do you do if the latest release of your OS of choice ships with ZFS, but you don’t have space in your laptop for a second disk? …Answer:- Reach for the velcro.

This Sony Vaio has a Centrino Core-2 p8600 processor, so it’s not going to break any speed records, but it works well enough for day to day use. Courtesy of the Buffalo 500GB SSD taped to the lid, it now sports 1TB of disk space (500GB mirrored), which is probably well in excess of what the designers originally envisaged.

[Image: Centrino 2 sticker next to the USB ports]

This is one of those little “because I can” projects which I don’t necessarily recommend to anyone else, but at the same time, the lightness of an SSD compared to a normal hard-disk (even a 2.5″ one) means this is now an eminently practical solution if your old laptop happens to be running out of space (I do move the laptop around the house to work in different rooms at different times, but it generally doesn’t travel much further afield than a deck-chair out on the veranda).

So, will it mirror? Heck yes!

Should I mirror?

No, probably not. It’s much more sensible to use a periodic ZFS send/receive job to back up your work to an existing server, that way you don’t need to worry about the extra drain on your battery and you still have your work if your laptop is stolen or knocked off the deck of your yacht, mid-ocean (what, you mean that’s never happened to you?).
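
A periodic send/receive job might look something like the sketch below. Everything named here is an assumption for illustration: the helper names, the ZFS_CMD override, the destination host and the destination dataset are all hypothetical, and a real job would normally send incrementally (using the -i option of “zfs send”) rather than sending the full pool every time:

```shell
#!/bin/sh
# Sketch of a back-up job built on ZFS send/receive.
# backup_snapshot_name, backup_pool and ZFS_CMD are hypothetical names.
backup_snapshot_name() {
    # e.g. rpool@backup-20210121
    printf '%s@backup-%s\n' "$1" "$(date +%Y%m%d)"
}

backup_pool() {
    pool=$1          # local pool to back up
    dest_host=$2     # ssh host which holds the back-ups
    dest_fs=$3       # dataset on that host to receive into
    snap=$(backup_snapshot_name "$pool")
    # -r snapshots every dataset in the pool at the same instant
    ${ZFS_CMD:-zfs} snapshot -r "$snap" || return 1
    # -R sends the whole tree; -u leaves the received datasets unmounted
    ${ZFS_CMD:-zfs} send -R "$snap" | ssh "$dest_host" zfs receive -u -d "$dest_fs"
}

# Typical use, as root (host and dataset names are placeholders):
#   backup_pool rpool backuphost backup/laptop
```

Run from cron or a systemd timer, this keeps a copy of your work off the laptop without the battery cost of a permanently attached mirror.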

One other consideration when thinking about installing Ubuntu on ZFS: currently (as of 21.04) Ubuntu will not allow you to edit the size of the disk partitions; you must accept their optimized defaults. Unfortunately, those defaults include a /boot partition which is much too small (typically 2GB). It will work fine for a couple of months, but with every apt update, the system will automatically add a snapshot of the boot partition. When the upgrade includes a kernel update, this means that tens, or even hundreds of megabytes of storage can be used. Even when you set the system defaults to compress the kernels using “xz”, it doesn’t take too many updates before you start getting “not enough free space” messages from apt and it will refuse to continue with the update. This is not something a novice user can easily recover from (hint: deleting files on a ZFS partition doesn’t always return that free space to the system; it all depends on whether it is still being held by a snapshot).

ZFS snippet — “import”

Nowadays I can easily forget what I’ve already done.  When it comes to ZFS, that includes forgetting what pools I’d already created on a particular device.  That’s the point of this “memo to self” …zpool import is your friend.

When you want to import the pools after juggling disks between machines, but can’t remember what the heck you named the pools in the first place, just do:-

zpool import

…to display a list of all available pools which are not currently attached (or a message to the effect that there are none available).  The important point here is that it won’t actually try to import any pools with this simple command; it only lists them.

If you have a pool which is shown as available for import (ie:- it exists, but is not currently attached  —  which might be the case if you’ve just physically moved a disk from some other machine to this system), you can temporarily have it imported and attached to a temporary mount point using the “-R /tmp/NAME” option.  This can be really handy for recovering data from a “retired” drive.  For instance, an old disk has a “data_tank” pool, which had the mountpoint “/store” on the old machine.  You want to recover some data from that drive, but your current system already has “d_pool” mounted at “/store”, so you can’t simply do “zpool import data_tank”, because the mountpoint is already in use.  Your import command should instead be:-

zpool import -R /tmp/old_store data_tank

…and all of your old data will be mounted to /tmp/old_store where you can access it normally.  But wait, there’s more!

If you already have an existing pool named “data_tank” on your current system, you can have “import” rename the old pool to something different (to prevent embarrassing mistakes) by simply appending a new name to the previous command:-

zpool import -R /tmp/old_store data_tank old_data_tank

Now when you do a “zpool status” or “zpool list” the pool mounted from the old disk will show up as “old_data_tank”.
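
Put together, the recovery session might be wrapped up as below. This is only a sketch: attach_old_pool, detach_old_pool and the ZPOOL_CMD override are hypothetical names, while the pool and path names are the examples used above:

```shell
#!/bin/sh
# Attach a retired pool under a temporary root and a new name, then detach
# it again when finished. attach_old_pool, detach_old_pool and ZPOOL_CMD
# are hypothetical names.
attach_old_pool() {
    old_name=$1
    new_name=$2
    altroot=$3
    ${ZPOOL_CMD:-zpool} import -R "$altroot" "$old_name" "$new_name"
}

detach_old_pool() {
    # Cleanly release the pool so the disk can be removed or re-imported.
    ${ZPOOL_CMD:-zpool} export "$1"
}

# Typical use, as root:
#   attach_old_pool data_tank old_data_tank /tmp/old_store
#   cp -a /tmp/old_store/store/wanted_files /store/recovered/
#   detach_old_pool old_data_tank
```

The “wanted_files” and “recovered” paths in the usage lines are placeholders; substitute whatever you are actually trying to rescue.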

When things go awry

What about if you’ve already suffered some late-night brain fade and just destroyed the live pool on your current system by mistake?  Well, as long as you realize your mistake fairly promptly and haven’t already scribbled all over the disk, import can help you with that, too.  The “-D” option will show you any pools which have been destroyed, but for which ZFS can still find valid metadata.  So:-

zpool import -D

…will display all pools, including previously destroyed ones, which still appear to be available for importing.

zpool import -D -f -R /tmp/old_store data_tank old_data_tank

…will import the original “data_tank” pool (note the -D -f options to force import of a previously destroyed pool) with the new pool name of “old_data_tank” and mount it at /tmp/old_store.

More pool recovery tricks

There are additional options to “import” which can further aid recovery of incomplete pools (see the zpool manual page entries for import and check the “-F”, “-m” and “-n” options for more information on how import can provide extra help for getting out of sticky situations).
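
As one example, combining “-F” (recovery mode) with “-n” asks import to report whether a damaged pool could be made importable again, without actually attempting the recovery. The wrapper name check_recovery and the ZPOOL_CMD override in this sketch are hypothetical:

```shell
#!/bin/sh
# Ask whether recovery-mode import would succeed, without performing it.
# check_recovery and ZPOOL_CMD are hypothetical names.
check_recovery() {
    # -F = recovery mode, -n = only report whether the recovery would work
    ${ZPOOL_CMD:-zpool} import -F -n "$1"
}

# Typical use, as root:
#   check_recovery data_tank
```

Bear in mind that a real “-F” import can discard the last few transactions written to the pool, so the dry run is well worth doing first.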

Doing things nicely

While the import command will do its very best to save you from yourself, you can help things along considerably by doing the right thing and using the “export” command on any pool which you intend to re-import elsewhere at a later date.  Note that this command will make the target pool unavailable on the system where you run it (that’s the whole point …to shut down the pool cleanly by flushing any outstanding writes and marking it as exported, preventing any further modifications), but the subsequent import shouldn’t have any difficulty at all when re-importing a previously exported pool.