‡ — No invertebrates were harmed during the production of this article.
Almost exactly two weeks ago I got a (normal) pop-up message on my screen, letting me know that there were updates available from Ubuntu for my 21.10 version of Mate and that, by the way, 22.04 was now available if I wanted to upgrade. I politely refused the upgrade offer, but accepted the updates. Once the updates were completed I was prompted to reboot (and I noted that the initramfs had been updated). This is all normal stuff for the millions of us running the various versions of Ubuntu, so after finishing up what I was working on (and washing my hands) I hit the reboot button.
The laptop hasn’t booted on its own in the two weeks since (not without a recovery image or an external disk plugged in, anyway).
At first I wasn’t too concerned. This (as the ‘mericans say) isn’t my first rodeo and, if everything else failed, I’m using ZFS, so I just had to roll back to the last good snapshot, right?
As I wasted more and more time futzing around with things, I just got more and more confused. The EFI boot was working and grub was being run, but it insisted that it couldn’t find a workable filesystem. Booting into a live-CD image said there were two perfectly good ZFS pools on the disk (actually a 512GB NVMe SSD) and a “zpool scrub” passed with flying colours on both of them (bpool and rpool). Running grub-probe always came back with a message along the lines of “Nope, ain’t nuthin there that I recognize, pinhead!”.
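For anyone wanting to repeat that diagnosis, here is a minimal sketch of the checks as a shell function, meant to be run as root from a live session. The function name and the partition path are my own placeholders, not anything from the original setup, and the "-w" (wait) flag on the scrub assumes OpenZFS 2.0 or later.

```shell
# Sketch: check a pool's health, then ask GRUB what it can see on the partition.
# check_pool_and_grub is a hypothetical helper name; the partition is an assumption.
check_pool_and_grub() {
    pool="$1"      # e.g. bpool
    part="$2"      # partition holding that pool (hypothetical path, substitute your own)

    # Import without mounting any datasets (-N), unless the pool is already imported.
    zpool list "$pool" >/dev/null 2>&1 || zpool import -N "$pool" || return 1

    # Scrub and wait for completion (-w needs OpenZFS 2.0 or later).
    zpool scrub -w "$pool" || return 1
    zpool status -x "$pool"

    # Ask GRUB which filesystem driver, if any, recognises the partition.
    if grub-probe --device "$part" --target=fs; then
        echo "grub: filesystem recognised on $part"
    else
        echo "grub: nothing recognised on $part"
    fi
}
```

Calling it as `check_pool_and_grub bpool /dev/nvme0n1p3` (partition name purely illustrative) should reproduce the contradiction described above: ZFS reports a healthy pool while grub-probe fails on the very same partition.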
By this time I had convinced myself that someone had broken into Ubuntu’s update servers and slipped a Trojan into my disk headers, or maybe someone was slipping magic mushrooms into my breakfast tea, or aliens had abducted my real ZFS pools and replaced them with zombies. Anyway, I needed to get some work done, so I found an unused external USB disk, fired up a live-CD again and, because this is ZFS, did a ZFS send-receive between the original disk and the external. When it came time to run the grub install (on the external disk) it once again insisted that there was nothing at all there which it recognised as any sort of valid filesystem. Duh!
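A whole-pool send-receive of that sort can be sketched roughly as below. The helper name, the snapshot name and the destination pool name are all assumptions for illustration; the zfs flags themselves are standard.

```shell
# Sketch: replicate a whole pool to another pool with send/receive.
# replicate_pool and the default snapshot name "migrate" are hypothetical.
replicate_pool() {
    src="$1"                  # source pool, e.g. rpool
    dst="$2"                  # destination pool or dataset on the external disk
    snap="${3:-migrate}"      # snapshot name to create and send

    # Recursive snapshot, so every child dataset is captured at the same instant.
    zfs snapshot -r "${src}@${snap}" || return 1

    # -R sends the full dataset tree together with its properties;
    # -u leaves the received filesystems unmounted, -F lets receive overwrite dst.
    zfs send -R "${src}@${snap}" | zfs receive -u -F "$dst"
}
```

One thing worth noticing is that "-R" carries dataset properties across as well, compression settings included. That may be why a faithful copy made this way can inherit whatever it was that grub objected to on the original.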
Okay, nuts to this. I put a 22.04 live-CD image in, booted it up (“Oh look, you have two perfectly good ZFS pools on this external disk! Are you really sure you want me to scribble all over them?”) and installed a brand spanking new version of 22.04 onto the external, followed by a ZFS send-receive of just the USERDATA filesystem from the original NVME drive (I’d like to say that I rebooted and everything sprang into life on the external disk, but of course it didn’t. You can’t easily remove an NVME “disk” from a laptop and of course EFI and grub just kept right on trying to access it).
Eventually, I did manage to broker a truce between the laptop BIOS, EFI and grub, to the point that I could reboot the system successfully, as long as I was actually there to hit the F9 key and manually select the EFI entry for the external disk …otherwise it would stubbornly continue with a blinky dance of reset, spin disk up, flash something on the screen for 10ms, spin the disk down and repeat. I never did manage to read what that 10ms micro message was (something from those dang aliens again).
But glory hallelujah!! It was fantastic! Not only did I have Mate back again, but the stupidly useless ELAN touchpad on the HP S15 laptop actually worked properly for the first time ever!! Banzai!! Well done 22.04! I’m sorry that I refused your offer of an upgrade at the start of all of this.
Well, long story longer, that’s the way that things have stayed for the past two weeks. Not that there hasn’t been great gnashing of teeth (not recommended at my advanced age) and thrashing of Gewgull searches (and even Ubuntu’s bug database) for any hints of what could be causing my problems. Searches including both “grub” and “aliens” turned up some interesting stuff, but it seems that I’m the only semi-sentient being-man-thingy in the universe who is having problems with those things in connection with computers.
Eventually, the third neuron in the second bank started firing occasionally and I began to have flashbacks of the horrible problems I had a while back, trying to do ZFS back-ups between my Linux laptop and the mirrored disks on my FreeBSD servers. Although the FreeBSD servers could send data to a receiving Linux system, the other way around just would not work at all. It turned out that there were incompatibilities between the ZFS properties used between the two systems, even though the ZFS versions were meant to be the same (upgrading the FreeBSD servers to version 13.1 seemed to cure that issue). Now it seemed probable that there was some similar incompatibility between grub and ZFS …and, finally, it clicked.
One of those property updates which hit FreeBSD a while back was the introduction of “zstd” compression (Ubuntu introduced it a short while later). I had tried changing the compression to “zstd” on FreeBSD and noticed a worthwhile increase in compression-ratio with no apparent difference in speed, so I’d made the changes on my laptop, too. The back-ups kept on working and I was saving more disk space …what’s not to like?
Double-duh! Pinhead at work (stand clear!).
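If you want to check whether you have done the same thing to yourself, a small sketch like this lists every dataset in a pool whose compression property is some flavour of zstd. The function name is made up for this example; it relies only on the tab-separated output of "zfs get -H".

```shell
# Sketch: list datasets in a pool whose compression is set to any zstd variant.
# find_zstd_datasets is a hypothetical helper name.
find_zstd_datasets() {
    pool="${1:-bpool}"
    # -H gives script-friendly, tab-separated output: <name> TAB <value>
    zfs get -r -H -o name,value compression "$pool" |
        awk -F '\t' '$2 ~ /^zstd/ { print $1 }'
}
```

Running `find_zstd_datasets bpool` and getting any output at all is the warning sign. Bear in mind that setting the property back to something else only affects blocks written afterwards; anything already written with zstd stays that way until it is rewritten.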
Okay, so I have (I suspect, anyway) a philosophical disconnect between grub and ZFS about what constitutes a filesystem. In my book, ZFS wins (whatever grub says, I’m an ex-Sun guy and old loyalties die hard). What to do now? Well, back on Gewgull I replace “aliens” with “alternative” and get a few hits on something called “ZFSBootMenu”. I go to their GitHub repository and start reading. Nope, I must have left “aliens” in there somewhere, ‘coz I can’t understand this stuff. It starts off more or less okay (I understand their reference to FreeBSD’s boot environments …a very handy feature, sign me up), but then they start wandering off into the weeds with things like “fzf” and “dracut” (which sounds like a bad day at the vasectomy clinic) and I want to stop reading so badly that my teeth hurt (told you not to gnash). I briefly leave the README page and head over to the pre-built releases instead.
Ah, this is interesting. “ZFSBootMenu v2.0.0 …New features – Dracut is now optional;”. Phew! Well that’s a relief. Unfortunately, there are four different files (and additional source tarballs), and nothing in the v2.0.0 section explains how they differ or how to use them. A couple have “release” in their names and the others have “recovery”. Two are compressed tar files (okay, I know what those are, anyway) but the two others have a “.EFI” suffix, which I haven’t come across before, but it’s not too much of a stretch to suspect that they should be used in the Extensible Firmware Interface (that’s the BIOS to you, sonny!). But how?
Back to the README:-
“Each release includes pre-generated images (both a monolithic UEFI applications as well as separate kernel and initramfs components suitable for both UEFI and BIOS systems) based on Void Linux. Building a custom image is known to work in the following configurations…”
Well there you go then. C’mon! Chop-chop! Get on with it!
I won’t bore you with the rest of it, but despite the rambling (or partly missing) documentation, it turns out that ZFSBootMenu is really quite a nifty bootloader (grub alternative) specifically for ZFS filesystems. It does have a couple of quirks, but after a few false starts, I did manage to bring my 21.10 NVME SSD back from zombieville and ended up with a laptop which I could once again disconnect from all external cables and disks and still have it boot reliably. To do ZFSBootMenu full justice I need to put together another post with a step-by-step guide of how to recover a stuffed ZFS machine, but just in case you’re stuck up this same creek without a paddle, here are just a couple of essential hints to getting ZFSBootMenu to work:-
- Use the .EFI file (I used the “release”, but “recovery” should also work okay).
- Remember: when you’re booting using ZFSBootMenu, it assumes that the kernel and initramfs are on the root filesystem, so you’re going to be booting from “rpool”, not “bpool” (and so your kernel and initramfs need to be present on “rpool”, which is not standard in Ubuntu).
- Disconnect any other disks which you may have added to the system in previous attempts to fix this boot problem (i.e. external USB disks with copies of your original bpool/rpool filesystems). They will just confuse EFI, grub and you.
- Set “canmount=noauto” on all filesystems which have “mountpoint=/” set (this one is really important).
- Make sure that your ZFS pools can be imported without having to use “import -f” (use a live-CD or ZFSBootMenu itself to import the pools manually, fix any problems and then export them again, before you reboot).
- Lastly, for those others of you out there with non-US keyboard layouts, the “Alt” key for command mode selection within ZFSBootMenu will probably not be the “Alt” key. For my Japanese keyboard, the magic key turned out to be the one key which I had never before pressed on my keyboard …the (shudder!) “Windows” key.
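Two of the hints above (the “canmount=noauto” setting and the clean import) lend themselves to a quick sketch. Both function names are hypothetical. Skipping datasets which already have canmount set to “off” or “noauto” is my own design choice rather than anything ZFSBootMenu asks for. Run as root from a live session or from the ZFSBootMenu recovery shell.

```shell
# Sketch: set canmount=noauto on every filesystem in a pool with mountpoint=/.
# Datasets already at "off" or "noauto" are left alone (an assumption, see above).
set_noauto_on_roots() {
    pool="${1:-rpool}"
    # -H output is tab-separated: <name> TAB <mountpoint> TAB <canmount>
    zfs list -H -r -t filesystem -o name,mountpoint,canmount "$pool" |
        awk -F '\t' '$2 == "/" && $3 == "on" { print $1 }' |
        while read -r ds; do
            zfs set canmount=noauto "$ds"
        done
}

# Sketch: confirm a pool imports without needing -f, then export it again.
imports_cleanly() {
    pool="$1"
    # -N imports without mounting anything; a failure here usually means
    # the pool was not exported cleanly by whichever system last used it.
    zpool import -N "$pool" && zpool export "$pool"
}
```

Something like `set_noauto_on_roots rpool && imports_cleanly rpool` before rebooting would cover both hints in one go, assuming the pool is imported for the first call and gets exported in between.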
And the bottom line… Just don’t change your bpool to zstd compression. Okay?!? † ∇
† — From a cursory≅ check of the Grub2 code in the upstream Debian repository, it looks as though the zstd compression library was added in November of 2018. Unfortunately it seems to have been implemented only for BTRFS, not for ZFS.
≅ — “cursory” Adjective. Causes the reader to leap out of their chair and shout “G’dammit!!” (this answer is sure to get you an “A” in GCSE English).
∇ – Grub Compatibility — As usual, the folks at OpenZFS/Debian/Ubuntu are way ahead of me. If you need to create a new pool which will work with Grub2, there is already an existing compatibility file (they’re stored in /usr/share/zfs/compatibility.d). You can use it like this:-
zpool create -o compatibility=grub2 <POOL NAME> <DEVICE>
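To see what a compatibility file actually permits, here is a small sketch which checks whether a given feature name appears in it. The function name is hypothetical, and I am assuming the file lists feature names separated by newlines, whitespace or commas, with “#” starting a comment.

```shell
# Sketch: succeed if a pool feature is listed in a compatibility file.
# feature_in_compat_file is a hypothetical helper; the file format is assumed
# to be feature names separated by newlines, blanks or commas, "#" for comments.
feature_in_compat_file() {
    feature="$1"
    file="${2:-/usr/share/zfs/compatibility.d/grub2}"
    awk -v f="$feature" '
        {
            sub(/#.*/, "")                     # drop comments
            n = split($0, parts, /[, \t]+/)    # split on commas and blanks
            for (i = 1; i <= n; i++)
                if (parts[i] == f) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$file"
}
```

For example, `feature_in_compat_file zstd_compress || echo "not allowed for grub2 pools"` should, if my reading of the situation is right, print the message, since that is exactly the feature which grub cannot cope with.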
If anyone is desperate enough to have read all the way down here to the bottom, I’ll just note that this problem hasn’t been submitted as an actual bug, because the auto-submission procedure won’t let you submit a bug against a program which isn’t installed on your system and, of course, I’d removed the grub package and used ZFSBootMenu to actually get my laptop working again. However, the secondary suggested method of opening the issue as a question in LaunchPad is available here if you’re interested (and I’d like to thank Manfred Hampl for trying to help me out there).
Update — 27th July 2022 – I have submitted bug request #1982897 with a condensed description of this issue and a request for an upstream addition of “zstd” support for ZFS in the Grub2 code.
Update #2 — 27th July 2022 – Submitted bug #62821 as a feature request for the addition of “zstd” support to the Grub2 ZFS code.
Update #3 — 13th August 2022 – Updated the Grub2 bug report with a little more detail on the severity of the problem, as there hasn’t been any acknowledgement of the original submission as yet.