opened 05:29AM - 04 Oct 24 UTC
bug
**Describe the bug**
If systemd is used within the initramfs and the `root=` kernel parameter is present but incorrect, it becomes extraordinarily difficult to boot manually from the emergency shell, even if the user knows which partition they want to boot from and mounts it to `/sysroot`. After mounting the partition to `/sysroot` and running `exit`, the emergency shell appears to hang indefinitely. If the user has booted with `rd.debug` enabled, the displayed logs show that systemd is stuck in a loop, re-executing parts of dracut over and over. The emergency shell is *not* given to the user again; the system just sits there looping forever.
**Distribution used**
This is reproducible on both Fedora 40 and Debian Trixie.
**Dracut version**
103
**Init system**
systemd
**To Reproduce**
1. Install Fedora 40 (it ships with dracut as the initramfs generator and thus it is easier to reproduce this there).
2. Force the GRUB menu to be shown at bootup (this can be done by running `sudo grub2-editenv - unset menu_auto_hide`).
3. Reboot.
4. At the GRUB menu, press `e` to edit the default boot entry.
5. Corrupt the `root=` kernel parameter somehow (changing one character of the UUID is enough). Also add `rd.debug` and `rd.timeout=2` kernel parameters to speed things up and get detailed logs.
6. Press Ctrl+X to boot. You will be dropped to an emergency shell.
7. Run `cd / && mount /dev/myRootPartition /sysroot -osubvol=root`, replacing `myRootPartition` as appropriate.
8. Run `exit`. The system will go into an infinite loop and never boot.
**Expected behavior**
dracut should recognize that `/sysroot` has been mounted and contains a usable root filesystem, and thus skip all other steps, proceeding directly to initramfs cleanup and switch-root.
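As a rough illustration of the kind of check meant here, the sketch below tests whether a directory looks like a usable root filesystem. The function name `sysroot_looks_usable` and the specific files it probes are assumptions chosen for illustration; this is not dracut's actual logic.

```shell
# Hypothetical helper, not part of dracut: returns 0 if the given directory
# looks like a usable root filesystem. The probed paths are assumptions.
sysroot_looks_usable() {
    root="${1:-/sysroot}"
    [ -d "$root" ] || return 1

    # A real root normally carries an os-release file.
    if [ ! -e "$root/etc/os-release" ] && [ ! -e "$root/usr/lib/os-release" ]; then
        return 1
    fi

    # It should also carry an init. -L accepts an absolute symlink that
    # cannot be resolved from inside the initramfs.
    if [ -x "$root/sbin/init" ] || [ -L "$root/sbin/init" ] \
        || [ -x "$root/usr/lib/systemd/systemd" ]; then
        return 0
    fi
    return 1
}

# Example use from the emergency shell:
# mountpoint -q /sysroot && sysroot_looks_usable /sysroot && echo "ready for switch-root"
```

If a check along these lines succeeded after the user ran `exit`, the remaining root-device waits could be skipped.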
**Additional context**
You actually *can* get the system to boot from the emergency shell at this point, but doing so is... arcane, and the steps vary from distro to distro. The way I managed to do it with Fedora 40 is:
* Mount the proper disk and subvolume to /sysroot.
* Run `cd /usr/lib/systemd/system-generators; rm dracut-rootfs-generator systemd-debug-generator systemd-gpt-auto-generator; mv systemd-fstab-generator ../systemd-sysroot-fstab-check`. You have to delete all generators from here except for `systemd-fstab-generator`, which must be moved to `/usr/lib/systemd/systemd-sysroot-fstab-check`.
* Run `rm -rf /run/systemd/generator` to get rid of all the systemd unit pieces that insist on the improperly specified root drive being present.
* Modify `/usr/lib/systemd/system/initrd-parse-etc.service` to look for `systemd-fstab-generator` in its new location: `sed -i 's/\/usr\/lib\/systemd\/system-generators\/systemd-fstab-generator/\/usr\/lib\/systemd\/systemd-sysroot-fstab-check/' /usr/lib/systemd/system/initrd-parse-etc.service`
* Run `systemctl daemon-reload`.
* Finally, run `exit`. The system will boot properly now.
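The file-level part of the Fedora 40 steps above can be collected into one function, sketched below. The function name `prep_manual_boot` and its optional prefix argument are hypothetical; the prefix exists only so the steps can be rehearsed against a scratch directory tree, and it would be left empty in the real emergency shell. The sketch assumes `sed -i` is available in the initramfs.

```shell
# Sketch of the Fedora 40 workaround. prep_manual_boot and its prefix
# argument are hypothetical names; pass no argument in the emergency shell.
prep_manual_boot() {
    prefix="${1:-}"
    sd="$prefix/usr/lib/systemd"
    gen_dir="$sd/system-generators"

    # Delete every generator except systemd-fstab-generator.
    for g in "$gen_dir"/*; do
        [ -e "$g" ] || continue
        if [ "$(basename "$g")" != systemd-fstab-generator ]; then
            rm -f "$g"
        fi
    done

    # Move systemd-fstab-generator out of the generator directory.
    if [ -e "$gen_dir/systemd-fstab-generator" ]; then
        mv "$gen_dir/systemd-fstab-generator" "$sd/systemd-sysroot-fstab-check"
    fi

    # Drop the generated units that wait on the bad root= device.
    rm -rf "$prefix/run/systemd/generator"

    # Point initrd-parse-etc.service at the moved binary.
    svc="$sd/system/initrd-parse-etc.service"
    if [ -f "$svc" ]; then
        sed -i 's|/usr/lib/systemd/system-generators/systemd-fstab-generator|/usr/lib/systemd/systemd-sysroot-fstab-check|' "$svc"
    fi
}

# In the emergency shell, after mounting the real root on /sysroot:
# prep_manual_boot
# systemctl daemon-reload
# exit
```

The loop removes whatever generators are present rather than naming them, since the set shipped in the initramfs may differ between builds.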
For those who are wondering, yes, it took approximately five hours of debugging to figure this out. :P
If you are attempting to reproduce this issue with Debian Trixie, some notes:
* Debian uses initramfs-tools by default rather than dracut, so you have to install dracut yourself. `sudo apt install dracut` as usual.
* You MUST then proceed to install systemd-cryptsetup: `sudo apt install systemd-cryptsetup`. dracut will be unable to boot your system if you don't install this, even if your disk is not encrypted.
* It is unnecessary to modify `initrd-parse-etc.service` under Debian, since it's pointed at the `systemd-sysroot-fstab-check` file by default.
* You will have to run `mount -o remount,rw /usr` near the beginning of the recovery process, since by default everything under `/usr` is mounted read-only by Debian's dracut.
I believe the root cause of this bug is that various files under `/run/systemd/generator` look for the specified root disk and carefully coordinate the timing of unit execution so that the system will not boot until the device pointed to by `root=` shows up. Simply deleting these files in the emergency shell isn't enough, though: the units are still loaded in systemd's memory, and running `systemctl daemon-reload` re-executes all of the systemd generators, which recreates the files. You also have to gut the generators themselves to get things to work properly, and depending on the distro, that can be easier said than done thanks to the multiple uses of `systemd-fstab-generator` and the occasional situation of `/usr` being mounted read-only.
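To see which generated files are involved, something like the following sketch could be run from the emergency shell. Both function names are hypothetical, and the sketch assumes the identifier part of `root=` (the text after the last `=` or `/`) appears literally inside the generated unit files, which may not hold for every generator.

```shell
# Hypothetical helpers for inspecting the generated units.

# Print the value of root= from a kernel command line file.
root_param() {
    cmdline=""
    read -r cmdline < "${1:-/proc/cmdline}" || true
    # $cmdline is left unquoted on purpose so it splits on whitespace.
    for word in $cmdline; do
        case "$word" in
            root=*) printf '%s\n' "${word#root=}" ;;
        esac
    done
}

# List files under the generator directory that mention the identifier
# part of root=, e.g. the UUID from root=UUID=... or sda2 from /dev/sda2.
units_mentioning_root() {
    dir="${1:-/run/systemd/generator}"
    spec="$(root_param "${2:-/proc/cmdline}")"
    id="${spec##*=}"
    id="${id##*/}"
    [ -n "$id" ] || return 1
    grep -rlF -- "$id" "$dir"
}

# Example use:
# units_mentioning_root
```

Unit file names escape some characters, so this only searches file contents, not names.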