Grub bites man. Man stamps on grub. ‡

‡ — No invertebrates were harmed during the production of this article.

TLDR:-  Skip straight to the updates at the bottom of the page for links to the bug reports.  The short version is “You can blow away any root-on-ZFS Linux system by enabling zstd compression on bpool (or whatever the actual boot partition is called on your particular distribution)”. Do NOT do this (not even to test it); you will regret it!

Almost exactly two weeks ago I got a (normal) pop-up message on my screen, letting me know that there were updates available from Ubuntu for my 21.10 version of Mate and that, by the way, 22.04 was now available if I wanted to upgrade.  I politely refused the upgrade offer, but accepted the updates.  Once the updates were completed I was prompted to reboot (and I noted that the initramfs had been updated).  This is all normal stuff for the millions of us running the various versions of Ubuntu, so after finishing up what I was working on (and washing my hands) I hit the reboot button.

It didn’t.

It hasn’t for the last two weeks (not without a recovery image or an external disk plugged in, anyway).

At first I wasn’t too concerned.  This (as the ‘mericans say) isn’t my first rodeo and, if everything else failed, I’m using ZFS, so I just had to roll back to the last good snapshot, right?

Wrong!

As I wasted more and more time futzing around with things, I just got more and more confused. The EFI boot was working and grub was being run, but it insisted that it couldn’t find a workable filesystem.  Booting into a live-CD image said there were two perfectly good ZFS pools on the disk (actually a 512GB NVME SSD) and a “zfs scrub” passed with flying colours on both of them (bpool and rpool).  Running grub-probe always came back with a message along the lines of “Nope, ain’t nuthin there that I recognize, pinhead!”.

By this time I had convinced myself that someone had broken into Ubuntu’s update servers and slipped a Trojan into my disk headers, or maybe someone was slipping magic mushrooms into my breakfast tea, or aliens had abducted my real ZFS pools and replaced them with zombies.  Anyway, I needed to get some work done, so I found an unused external USB disk, fired up a live-CD again and, because this is ZFS, did a ZFS send-receive between the original disk and the external.  When it came time to run the grub install (on the external disk) it once again insisted that there was nothing at all there which it recognised as any sort of valid filesystem.  Duh!

Okay, nuts to this.  I put a 22.04 live-CD image in, booted it up (“Oh look, you have two perfectly good ZFS pools on this external disk!  Are you really sure you want me to scribble all over them?”) and installed a brand spanking new version of 22.04 onto the external, followed by a ZFS send-receive of just the USERDATA filesystem from the original NVME drive (I’d like to say that I rebooted and everything sprang into life on the external disk, but of course it didn’t.  You can’t easily remove an NVME “disk” from a laptop and of course EFI and grub just kept right on trying to access it).

Eventually, I did manage to broker a truce between the laptop BIOS, EFI and grub, to the point that I could reboot the system successfully, as long as I was actually there to hit the F9 key and manually select the EFI entry for the external disk  …otherwise it would stubbornly continue with a blinky dance of reset, spin disk up, flash something on the screen for 10ms, spin the disk down and repeat.  I never did manage to read what that 10ms micro message was (something from those dang aliens again).

But glory hallelujah!!  It was fantastic!  Not only did I have Mate back again, but the stupidly useless ELAN touchpad on the HP S15 laptop actually worked properly for the first time ever!!  Banzai!!  Well done 22.04!  I’m sorry that I refused your offer of an upgrade at the start of all of this.

Well, long story longer, that’s the way that things have stayed for the past two weeks.  Not that there hasn’t been great gnashing of teeth (not recommended at my advanced age) and thrashing of Gewgull searches (and even Ubuntu’s bug database) for any hints of what could be causing my problems. Searches including both “grub” and “aliens” turned up some interesting stuff, but it seems that I’m the only semi-sentient being-man-thingy in the universe who is having problems with those things in connection with computers.

Eventually, the third neuron in the second bank started firing occasionally and I began to have flashbacks of the horrible problems I had a while back, trying to do ZFS back-ups between my Linux laptop and the mirrored disks on my FreeBSD servers.  Although the FreeBSD servers could send data to a receiving  Linux system, the other way around just would not work at all.  It turned out that there were incompatibilities between the ZFS properties used between the two systems, even though the ZFS versions were meant to be the same (upgrading the FreeBSD servers to version 13.1 seemed to cure that issue).  Now it seemed probable that there was some similar incompatibility between grub and ZFS  …and, finally, it clicked. 

One of those property updates which hit FreeBSD a while back was the introduction of “zstd” compression (Ubuntu introduced it a short while later).  I had tried changing the compression to “zstd” on FreeBSD and noticed a worthwhile increase in compression-ratio with no apparent difference in speed, so I’d made the changes on my laptop, too.  The back-ups kept on working and I was saving more disk space …what’s not to like?

Double-duh!  Pinhead at work (stand clear!).

Okay, so I have (I suspect, anyway) a philosophical disconnect between grub and ZFS about what constitutes a filesystem.  In my book, ZFS wins (whatever grub says, I’m an ex-Sun guy and old loyalties die hard).  What to do now?  Well,back on Gewgull I replace “aliens” with “alternative” and get a few hits on something called “ZFSBootMenu”.  I go to their GitHub repository and start reading.  Nope, I must have left “aliens” in there somewhere, ‘coz I can’t understand this stuff.  It starts off more or less okay (I understand their reference to FreeBSD’s boot environments …a very handy feature,  sign me up), but then they start wandering off into the weeds with things like “fzf” and “dracut” (which sounds like a bad day at the vasectomy clinic) and I want to stop reading so badly that my teeth hurt (told you not to gnash).  I briefly leave the README page and head over to the pre-built releases instead. 

Ah, this is interesting.  “ZFSBootMenu v2.0.0  …New features – Dracut is now optional;”.  Phew!  Well that’s a relief.  Unfortunately, there are four different files (and additional source tarballs), but no information on what the differences are.  A couple have “release” in their names and the others have “recovery”, but no actual information in the v2.0.0 section about what the differences are or how to use them.  Two are compressed tar files (okay, I know what those are, anyway) but the two others have a “.EFI” suffix, which I haven’t come across before, but it’s not too much of a stretch to suspect that they should be used in the Extensible Firmware Interface (that’s the BIOS to you, sonny!).  But how?

Back to the README:-

Each release includes pre-generated images (both a monolithic UEFI applications [sic] as well as separate kernel and initramfs components suitable for both UEFI and BIOS systems) based on Void Linux. Building a custom image is known to work in the following configurations…

Well there you go then.  C’mon!  Chop-chop!  Get on with it!

I won’t bore you with the rest of it, but despite the rambling (or partly missing) documentation, it turns out that ZFSBootMenu is really quite a nifty bootloader (grub alternative) specifically for ZFS filesystems.  It does have a couple of quirks, but after a few false starts, I did manage to bring my 21.10 NVME SSD back from zombieville and ended up with a laptop which I could once again disconnect from all external cables and disks and still have it boot reliably.  To do ZFSBootMenu full justice I need to put together another post with a step-by-step guide of how to recover a stuffed ZFS machine, but just in case you’re stuck up this same creek without a paddle, here are just a couple of essential hints to getting ZFSBootMenu to work:-

  • Use the .EFI  file (I used the “release”, but “recovery” should also work okay).
  • Remember– When you’re booting using ZFSBootMenu, it assumes that the kernel and initramfs are on the root partition, so you’re going to be booting from “rpool”, not “bpool” (and so your kernel and initramfs need to be present on “rpool”, which is not standard in Ubuntu).
  • Disconnect any other disks which you may have added to the system in previous attempts to fix this boot problem (ie:- external USB disks with copies of your original bpool/rpool filesystems).  They will just confuse EFI, grub and you.
  • Set “canmount=noauto” on all filesystems which have “mountpoint=/” set (this one is really important).
  • Make sure that your ZFS pools can be imported without having to use “import -f” (use a live-CD or ZFSBootMenu itself to import the pools manually, fix any problems and then export them again, before you reboot).
  • Lastly, for those others of you out there with non-US keyboard layouts, the “Alt” key for command mode selection within ZFSBootMenu will probably not be the “Alt” key.  For my Japanese keyboard, the magic key turned out to be the one key which I had never before pressed on my keyboard …the (shudder!)  “Windows” key.

And the bottom line… Just don’t change your bpool to zstd compression.  Okay?!?    †  ∇


†  —  From a cursory check of the Grub2 code in the upstream Debian repository, it looks as though the zstd compression library was added in November of 2018.  Unfortunately it seems to have been implemented only for BTRFS, not for ZFS.

  —  “cursory”  Adjective.  Causes the reader to leap out of their chair and shout “G’dammit!!”  (this answer is certain to get you an “A” in GCSE English).

∇  –  Grub Compatibility  —  As usual, the folks at OpenZFS/Debian/Ubuntu are way ahead of me.  If you need to create a new pool which will work with Grub2, there is already an existing compatibility file (they’re stored in /usr/share/zfs/compatibility.d).  You can use it like this:-  

   zpool create -o compatibility=grub2  <POOL NAME>

If anyone is desperate enough to have read all the way down here to the bottom, I’ll just note that this problem hasn’t (has …see below) been submitted as an actual bug, because the auto-submission procedure won’t let you submit a bug against a program which isn’t installed on your system and, of course, I’d removed the grub package and used ZFSBootMenu to actually get my laptop working again. However, the secondary suggested method of opening the issue as a question in LaunchPad is available here if you’re interested (and I’d like to thank Manfred Hampl for trying to help me out there).

Update — 27th July 2022 – I have submitted bug request #1982897 with a condensed description of this issue and a request for an upstream addition of “zstd” support for ZFS in the Grub2 code.

Update #2 — 27th July 2022 – Submitted bug #62821 as a feature request for the addition of “zstd” support to the Grub2 ZFS code.

Update #3 — 13th August 2022 – Updated the Grub2 bug report with a little more detail on the severity of the problem, as there hasn’t been any acknowledgement of the original submission as yet.

Advertisement

There be bees here…

If you’ve never sat close to a bee hive and watched the little ladies going about their business, you’ve missed one of natures treats.  They are fascinating creatures on their own and the social cohesion and workings of the hive make them doubly so.  Bee keepers have always had to tread the delicate line between upsetting the balance of daily life in the hive against it’s internal state and the general welfare of the bees, though.  We have traditionally had to open the hives on a regular basis to check on the health of the bees themselves, as well as the state of honey and food stores.  While we still need to keep a careful eye on the health of the bees, IoT and ingenuity are making it easier to look after the bees without having to disrupt the hive with such frequent openings and I was very pleasantly surprised to come across the OpenHiveScale project a couple of days ago.

This is a new and ingenious take on a perennial favourite; a method of monitoring the weight of a beehive and thus being able to judge the amount of stored honey (summer) or food supplies (winter), without having to open the top of the hive and pull out individual frames of comb to inspect them (which always disrupts the hive, no matter how careful the beekeeper is).

The problem initially looks easy to techies.  Just pop a digital scale under the hive and automate sampling of the weight reading to have it delivered directly to your smart-phone.  Unfortunately, if you’ve tried going down this route, you will have discovered for yourself that it’s not quite that simple.  Digital scales (and load sensors in general) don’t deal well with being in a loaded condition for long periods of time (they tend to drift in a non-linear manner) and even if they did, there is still the problem of constantly supplying power to the scale at some remote (sometimes -very- remote) site.  One of the other options is to use a mechanical scale and somehow collect the data (the common low-tech version of this has a cheap, mechanical bathroom scale in a wooden housing under the beehive with some sort of mirror arrangement to allow the beekeeper to monitor the weight on his visits to the apiary).

OpenHiveScale mechanismThe OpenHiveScale project overcomes these problems by using a design based on the ancient “Steelyard” balance, with its sliding weight mechanism.  The fulcrum is folded within the frame to provide a wide load capability in a compact package.  The sliding counterbalance weight is driven by a stepper-motor and belt combination, with the “smarts” for the whole system provided by an ESP8266.  

The whole of this project (electronics, PCB, mechanical build and firmware) are all available from the OpenHiveScale GitHub repository.  Unfortunately, comments in the code are few and far between and there are no real build instructions on the site (there is an assembly_manual.pdf in the “mechanical” directory of the repository, but it is limited to diagrams, without any narrative), so you might have a difficult job duplicating their original work.  The good news is that they do sell kits.  The bad news is that they’re currently out of stock (and the price, including postage, is over €200).

The controller is modular, so if the hive is outside of WiFi coverage, an optional GSM, LoRa or SigFox card can be added (for an extra €30 to €40) to increase the range.

Power consumption is kept to a minimum by using an RTC chip to gate the enable signal on an LM3671 (low quiescent current) regulator and the recommended power source is a pack of 3 x AA alkaline batteries.

Unfortunately, the price (and lack of specific build details) make this a non-starter for all but the most well-heeled beekeepers, which is a shame.  We can only hope that this unique design will become popular enough to make it into mass production (with, hopefully,  a resulting drop in price).  It’s easy to imagine other applications for this scale, too.  Anyone who relies on tanked propane gas, for instance, would welcome the convenience of knowing just when the tanks need to be switched over (instead of having to go out at 10:30 at night, still wet and soapy from an interrupted shower).