Derivative Maker Automated CI Builder

Still running into some very frustrating issues with the mount point being busy. Not sure why. I restarted, unmounted everything, etc.

$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

$ sudo umount /home/ansible/derivative-binary/Whonix-Gateway-XFCE_image
umount: /home/ansible/derivative-binary/Whonix-Gateway-XFCE_image: target is busy.

Sure, the umount works if I add a -f or -l flag,

but even after that unmount I still see

$ sudo losetup -a
/dev/loop0: [65025]:1312560 (/home/ansible/derivative-binary/ae1f3d469da21e5532573f32b0f9415e14d849e8/Whonix-Gateway-XFCE-ae1f3d469da21e5532573f32b0f9415e14d849e8.Intel_AMD64.raw (deleted))

Running sudo losetup -D does not detach it, but when I reboot the machine, sudo losetup -a shows nothing (so restarting does in fact clear the loop device, but only after the forced umount).

I am not sure how all this stuff works. I love Linux, but really feelin like a dumb webdev right now :laughing:

About to rerun the build for new logs


sorry to keep buggin ya @Patrick, I am trying

Logs should be here after the build fails
https://github.com/Mycobee/derivative-maker/actions/runs/2986664249


There used to be an obscure bug in kpartx that was probably never really cleanly fixed:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=734794

Could either be the same issue here or something similar. Whatever that might be…


Can we automate rebooting the server? I am asking because I'd like to iterate faster here.

If I detect a stale mount and the folder /home/ansible exists, can I tell the build script to just reboot the server? Then I could do more builds in quick succession.

I would also temporarily change the build script to a simpler state that does not create a full build, just to quickly debug the mount issue.

If we are going to check for stray mounts in the derivative-maker scripts, here is a solution I came up with to umount them:

STRAY_MOUNT=$(df -h | grep 'ansible/derivative-binary')

if [ -n "$STRAY_MOUNT" ]; then
  MOUNTED_FOLDER=$(echo "$STRAY_MOUNT" | awk '{print $6}')
  $SUDO_TO_ROOT umount -l "$MOUNTED_FOLDER"
fi
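
One caveat with grabbing field $6: df wraps long device names onto a second line, which shifts the fields around. Taking the last field instead is a bit more robust. A small sketch (the stray_mount_target helper name is made up for illustration):

```shell
#!/bin/sh
## Sketch: extract the mount point from a df output line by taking the
## last whitespace-separated field ($NF) rather than a fixed column.
stray_mount_target() {
    printf '%s\n' "$1" | awk '{print $NF}'
}

line='/dev/mapper/loop0p1   98G  2.8G   91G   4% /home/ansible/derivative-binary/Whonix-Gateway-XFCE_image'
stray_mount_target "$line"
## → /home/ansible/derivative-binary/Whonix-Gateway-XFCE_image
```

This only holds as long as the mount point itself contains no spaces, which is the case for these build paths.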

Okay, so now the CI always reboots the VPS before running the derivative-maker build step. This should fix the issue with sudo losetup --all showing a lingering loopback device.

I think the umount steps above can fix the issue with the mounted volume, if necessary.


With the reboot implemented, the same issue is occurring with the umount:
https://github.com/Mycobee/derivative-maker/actions/runs/3018043263

I think the solution I posted above with umount -l could potentially fix it.


As for whether the build script and/or the CI scripts should check for stray mounts, I guess it is best if you decide.

The build script will check for stray mounts anyhow because non-CI builds should get a build error early when there is a stray mount. That test can stay independent from any potential test by the CI scripts.

The tests that I am currently using:

Usually a CI sets the environment variable CI=true. Depending on that, the build script could perform some action it does not do outside of CI.
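
For illustration, such a guard could look like this (a sketch; the echoed messages are placeholders):

```shell
#!/bin/sh
## Sketch: branch on the CI environment variable that most CI systems
## (GitHub Actions included) export as CI=true.
if [ "${CI:-}" = "true" ]; then
    echo "CI detected: CI-only actions may run here."
else
    echo "Not running under CI: skipping CI-only actions."
fi
```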

But whether the build script is a clean place to say "please reboot the CI" is highly questionable. It could be a rather unclean hack.

I guess -l, --lazy won't help much here. The mount is probably staying forever until reboot.

Sure, if that works.

An additional test that would probably be useful in the CI scripts (similar to the test already in the build script):

   losetup_output=$($SUDO_TO_ROOT losetup --all)

   if [ "$losetup_output" = "" ]; then
      true "INFO: Output of losetup_output is empty. No stray loop devices, OK."
      return 0
   else
      error TODO bail out here and reboot CI server
   fi

That is probably ideal and better than what I said above.

Does it work for you if you run that command manually? I mean, the command will exit without error, but will the mount point be unmounted indeed? If yes, I'll happily add it to the build script.
Actually, I might just try that now.

Yup

ansible@host:~$ df -h
Filesystem           Size  Used Avail Use% Mounted on
udev                 974M     0  974M   0% /dev
tmpfs                199M  540K  198M   1% /run
/dev/vda1             50G   17G   31G  35% /
tmpfs                992M     0  992M   0% /dev/shm
tmpfs                5.0M     0  5.0M   0% /run/lock
/dev/vda15           124M  5.9M  118M   5% /boot/efi
/dev/mapper/loop0p1   98G  2.8G   91G   4% /home/ansible/derivative-binary/Whonix-Gateway-XFCE_image
tmpfs                199M     0  199M   0% /run/user/1001

ansible@host:~$ sudo umount -l /home/ansible/derivative-binary/Whonix-Gateway-XFCE_image

ansible@host:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            974M     0  974M   0% /dev
tmpfs           199M  540K  198M   1% /run
/dev/vda1        50G   17G   31G  35% /
tmpfs           992M     0  992M   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/vda15      124M  5.9M  118M   5% /boot/efi
tmpfs           199M     0  199M   0% /run/user/1001

I am very surprised that a normal umount doesn't work but umount with -l / --lazy does. Yeah, that would mean there's some sort of bug in umount, some other low-level tool, or even the kernel.

Therefore I've already rewritten it for a more defensive unmount:
derivative-maker/unmount-helper at master · derivative-maker/derivative-maker · GitHub

It also attempts umount --lazy.

Not yet tested, but I guess that's what the CI is for.

yes it is :slight_smile:

Thanks !


As for the file automated_builder/tasks/create_vm.yml, may I suggest keeping the actual clean and build commands out of the CI?

There are various CI scripts in derivative-maker/help-steps at master · Mycobee/derivative-maker · GitHub (those starting with ci_), but these can probably all be deleted since they are nowadays defunct.

The help-steps folder, however, could be the place where we add one or several short scripts which contain the actual build command.

Or better, let's add a ci subfolder in the derivative-maker repository where the small scripts that define the actual clean and build commands reside?

Reason: if the mount continues to cause issues, I would use simpler build commands to focus on the mount issue in quicker iteration.


The following…

- name: Build new gateway VM
  shell: "dist_build_non_interactive=true /home/ansible/derivative-maker/derivative-maker --flavor whonix-gateway-xfce --target virtualbox --build >> /home/ansible/build.log 2>&1"

Would stay mostly the same. Just it would do something like this:

- name: Build new gateway VM
  shell: "/home/ansible/derivative-maker/derivative-maker/ci/build-gateway >> /home/ansible/build.log 2>&1"

(dist_build_non_interactive=true would be set within ci/build-gateway.)


Yup I will do that :slight_smile:

---
- name: Clean existing gateway VM
  shell: "dist_build_non_interactive=true /home/ansible/derivative-maker/derivative-maker --flavor whonix-gateway-xfce --target virtualbox --clean > /home/ansible/build.log 2>&1"

- name: Clean existing workstation VM
  shell: "dist_build_non_interactive=true /home/ansible/derivative-maker/derivative-maker --flavor whonix-workstation-xfce --target virtualbox --clean >> /home/ansible/build.log 2>&1"

- name: Reboot VPS for stray loop devices
  reboot:
    reboot_timeout: 60
  become: true

- name: Build new gateway VM
  shell: "dist_build_non_interactive=true /home/ansible/derivative-maker/derivative-maker --flavor whonix-gateway-xfce --target virtualbox --build >> /home/ansible/build.log 2>&1"

- name: Build new workstation VM
  shell: "dist_build_non_interactive=true /home/ansible/derivative-maker/derivative-maker --flavor whonix-workstation-xfce --target virtualbox --build >> /home/ansible/build.log 2>&1"

Not sure it would be wise to combine all of them into a single script in the ci folder?

Except for - name: Reboot VPS for stray loop devices, which might make more sense for the CI to take care of.

How about:

  1. ansible calls script in ci folder to clean vms
  2. ansible reboots machine for loop devices
  3. ansible call script in ci folder to build vms

I agree that leaving the reboot functionality in Ansible is a good idea, so Ansible knows to expect the connection to the machine to break during the reboot.
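
The three steps above could look roughly like this in the playbook (a sketch; the ci/clean-vms and ci/build-vms script names are made up for illustration):

```yaml
## Hypothetical sketch: build logic lives in repo scripts, the reboot stays in Ansible.
- name: Clean existing VMs
  shell: "/home/ansible/derivative-maker/derivative-maker/ci/clean-vms >> /home/ansible/build.log 2>&1"

- name: Reboot VPS for stray loop devices
  reboot:
    reboot_timeout: 60
  become: true

- name: Build new VMs
  shell: "/home/ansible/derivative-maker/derivative-maker/ci/build-vms >> /home/ansible/build.log 2>&1"
```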

For longer term maintenance, I have a question.

Is there any way to speed up the builds to only run the relevant steps that have been affected in the code changes?

i.e., do we need to build ../monero-gui_0.18.1.0-1_all.deb when you only change a few things in the help steps or something?

It would be nice to have a "light" build or something for iterating more quickly. Currently it takes over 1.5 hours to build fresh workstation and gateway VMs. I'd love it if you had the ability to get feedback more quickly.

I guess when troubleshooting you can always just SSH into the VPS and run the troublesome build step manually to see what is causing the problem. Just a thought; I want this CI feature to make your life easier :man_shrugging:


We have this thingy here: Whonix build script now optionally supports installing packages from Whonix remote repository rather than building packages locally

So just by adding…

--remote-derivative-packages true

No packages should be built and all packages would be downloaded from the Whonix binary repository. It would skip all the lengthy package creation.
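
Applied to the existing playbook task, that could look roughly like this (a sketch; the flag placement on the command line is assumed):

```yaml
## Hypothetical: the current gateway build task with the flag added.
- name: Build new gateway VM
  shell: "dist_build_non_interactive=true /home/ansible/derivative-maker/derivative-maker --flavor whonix-gateway-xfce --target virtualbox --build --remote-derivative-packages true >> /home/ansible/build.log 2>&1"
```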

How does that sound?

How often will we use --remote-derivative-packages true? Maybe for git commits, use it. For git tags, do it "proper" and drop it?

Though, when using --remote-derivative-packages true, we would not notice when package builds fail. But that isn't very likely, since if packages are updated, I need to build them locally anyhow.

Absolutely makes sense. In my previous build, a rookie mistake of forgetting $SUDO_TO_ROOT led to a failed build. Another 40 minutes of waiting for me now until I can see if that is fixed - unless I do a local build with local hacks, which is what we're trying to avoid with the CI.

That's also why I suggested Derivative Maker Automated CI Builder - #74 by Patrick - because then I would hack the build command down to a much simpler version, to the point where only a minimal raw image gets created, with nothing useful inside, just to test various mount/umount approaches and quickly get that fixed.


My latest commit fix · derivative-maker/derivative-maker@03f6496 · GitHub didn't get picked up by the CI on https://github.com/Mycobee/derivative-maker/actions yet.

Last time that was faster, I think. I am not complaining about the speed. Just thinking: if that commit got lost and is not there yet, it will never come.

Maybe the automated CI reboots could lead to some commits being overlooked by the CI?

This happened because I rebased and pushed pretty quickly after you commented, and didn't give you enough time to finish your stuff.

It isn't an issue with the CI or anything, simply me being a bit too trigger-happy and pushing without rebasing 03f64961 yet.
