Automate recovering free space from VDI disks

This has been talked about before:

My proposal is to automate this.

Overview

VirtualBox VDI disks can grow to take up a lot of space because free space is not automatically recovered on the host OS when the guest OS deletes files on its VDI disk. The free space can be recovered by the following steps (a condensed command sketch follows the list):

  1. install the zerofree package from Debian on the guest (zerofree overwrites free space with zeroes, needed for VirtualBox to reclaim free space)
  2. delete/merge all VirtualBox snapshots (recommended) to have only one VDI disk per guest VM (booting from a snapshot will only recover space in the snapshot’s disk)
  3. delete unnecessary user-saved files on the guest
  4. purge unused packages, remove old kernels to free up space, delete cache files, etc
  5. make sure the disk is mounted read-only (needed for zerofree to run)
    • the easiest way is to reboot into LIVE mode; if you have SYSMAINT, boot into LIVE SYSMAINT (required to be able to use sudo)
    • or edit the boot kernel command line and append init=/bin/bash to the linux line
  6. run sudo zerofree -v /dev/sdaX on the ext4 partition (/dev/sda3 for Whonix 17.4.4.6)
  7. shut down the guest
    • if booted using init=/bin/bash, first run exec init to finish booting
  8. on the host OS, run VBoxManage modifymedium "path/to/guest/disk.vdi" --compact (--compact causes all contiguous zero-filled space on guest VDIs to be reclaimed by the host)
  9. check how much space has been recovered on the host disk
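
For reference, here is the whole procedure condensed into commands. This is only a rough sketch, assuming a Debian-based guest with an ext4 root on /dev/sda3 as in the steps above; adjust devices and paths to your setup.

# inside the guest (steps 1 and 4):
sudo apt-get install zerofree
sudo apt-get autoremove --purge
sudo apt-get clean

# reboot into LIVE (SYSMAINT) mode so that / is mounted read-only, then (step 6):
sudo zerofree -v /dev/sda3

# shut down the guest (step 7), then on the host (steps 8-9):
VBoxManage modifymedium "path/to/guest/disk.vdi" --compact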

Automation

Whonix/Kicksecure could preinstall the zerofree package, and add an entry to the boot menu that performs steps 5-7. It could also add an entry to the GUI System Maintenance Panel that performs steps 4-7.
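
A minimal sketch of what the guest-side part of such a boot-menu or maintenance-panel entry could run. The script name and location are hypothetical, and it assumes it is invoked from a session where / is already mounted read-only (e.g. LIVE SYSMAINT):

#!/bin/bash
# hypothetical /usr/libexec/zerofree-root helper (name/path are assumptions)
set -eu

root_dev="$(findmnt -n -o SOURCE /)"
fstype="$(findmnt -n -o FSTYPE /)"

# zerofree only supports ext2/ext3/ext4
case "$fstype" in
    ext2|ext3|ext4) ;;
    *) echo "refusing to run zerofree on $fstype" >&2; exit 1 ;;
esac

# refuse to touch a read-write mounted root
if ! findmnt -n -o OPTIONS / | grep -qE '(^|,)ro(,|$)'; then
    echo "/ is not mounted read-only, aborting" >&2
    exit 1
fi

zerofree -v "$root_dev"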

It may be possible to safely remount the disk as read-only without rebooting (perhaps by first suspending all background/update processes?) with:

echo s | sudo tee /proc/sysrq-trigger
sleep 2
echo u | sudo tee /proc/sysrq-trigger

(sysrq s syncs all filesystems and sysrq u remounts them all read-only)
then

sudo zerofree -v /dev/sda3

(or sudo zerofree -v $(mount | grep "/dev/sda. on / type ext4" | awk '{print $1}'))

sudo mount -o remount,rw / &&
sudo mount -o remount,rw /boot/efi &&
sudo shutdown -h now

Explanatory note presented to users:

  1. manually perform steps 1-3 (or 1-4)
  2. after guest VM shutdown, run VBoxManage modifymedium "path/to/guest/disk.vdi" --compact on the host (a host-side sketch follows)
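
Step 2 could also be scripted to compact every registered VDI in one go. A rough sketch; it assumes all VMs using these disks are powered off, and the parsing of the VBoxManage list hdds text output ("Location:" lines) is an assumption:

VBoxManage list hdds \
  | awk -F': +' '/^Location:/ { print $2 }' \
  | grep -i '\.vdi$' \
  | while IFS= read -r disk; do
        echo "Compacting $disk"
        VBoxManage modifymedium "$disk" --compact
    done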

More info on zerofree is in the VirtualBox manual.

Testing

Disk space usage can be tested by saving a file with random data on the guest and then deleting it:

dd if=/dev/urandom bs=1048576 count=500 of=$HOME/500MB-random-data.test
rm $HOME/500MB-random-data.test

After shutting down the guest, the VDI disk will have grown by 500 MB on the host. Check in the VirtualBox disk manager and click Refresh; sizes in VirtualBox are only updated when the VM is shut down. Running both commands multiple times will increase the size a bit more (for example, I ran both commands 8 times and got around a 1.5 GB increase in disk usage).
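
To measure the effect, the on-disk size of the VDI can also be compared on the host before and after compacting, for example (the path is a placeholder, and the exact field names in the showmediuminfo output may vary):

du -h "path/to/guest/disk.vdi"
VBoxManage showmediuminfo "path/to/guest/disk.vdi" | grep -i size

du reports the space actually allocated on the host filesystem, which is what compacting reduces.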

Maintaining low disk usage

To keep disk usage as low as possible, users could also be encouraged to follow this workflow (see the command sketch after the list):

  1. set up the guest VM and install all user-desired software first
  2. upgrade everything, and follow steps 1-9 above
  3. create a (disposable) snapshot for daily use, use the VM from the snapshot
When you need to upgrade:
  1. restore the snapshot (without creating a snapshot of the current machine state)
  2. delete the snapshot
  3. upgrade inside Whonix and perform steps 1-9 above post-upgrade
  4. create a new disposable snapshot for daily use
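
The snapshot part of this workflow can also be done from the host command line; a sketch (the VM and snapshot names are placeholders, and the VM must be powered off when restoring or deleting a snapshot):

VBoxManage snapshot "Whonix-Workstation" take "daily-use"      # create the disposable snapshot
VBoxManage startvm "Whonix-Workstation"                        # daily use runs on top of it

# when upgrading:
VBoxManage snapshot "Whonix-Workstation" restore "daily-use"   # discard daily changes
VBoxManage snapshot "Whonix-Workstation" delete "daily-use"    # merge back into one disk
# upgrade inside the VM, perform steps 1-9 above, then:
VBoxManage snapshot "Whonix-Workstation" take "daily-use"      # new disposable snapshot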

This is a neat idea. I would submit that we should be using fstrim rather than zerofree: zerofree only works with ext2/3/4, does direct filesystem modifications which may be risky, and requires the filesystem to be unmounted (or mounted read-only), whereas fstrim works on any filesystem that has TRIM support in the kernel, does not require the filesystem to be unmounted, and (I believe) relies on the kernel's filesystem code rather than a custom reimplementation. One can even mount a filesystem with the discard option so that trimming is done automatically by the kernel (we already do this on physical hardware with SSDs because trimming results in faster SSD performance). Maybe we could just start using that option everywhere?
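
For comparison, the fstrim-based equivalent of the zerofree step would be something like this inside the guest (assuming the virtual disk controller exposes TRIM to the guest):

sudo fstrim -v /        # trim unused blocks on the root filesystem
sudo fstrim -av         # or trim every mounted filesystem that supports it

# or let the kernel trim automatically by adding the discard mount option
# to the relevant /etc/fstab entry, e.g.:
# UUID=...   /   ext4   defaults,discard   0 1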

(Combining TRIM and LUKS has some security implications: an attacker may be able to tell what kinds of data are on a disk when this is done, and may be able to determine the filesystem in use, but the data itself should still be unrecoverable. See: encryption - Is using trim on a ssd with LVM LUKS safe? - Information Security Stack Exchange. At the moment, we use TRIM with LUKS on SSDs only.)
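
For completeness: discards do not pass through a LUKS mapping unless that is explicitly allowed, for example via the discard option in /etc/crypttab or --allow-discards when opening the device manually. The device and mapping names below are placeholders:

# /etc/crypttab:
# cryptroot   UUID=...   none   luks,discard

# or when opening manually:
sudo cryptsetup open --allow-discards /dev/sda3 cryptroot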

Deleting snapshots and compacting the VDI file sounds like something that Whonix-Starter could do perhaps?

Worthy of note, libguestfs has a tool, virt-sparsify, that can do a lot of this also. It essentially boots the host’s own kernel in a virtual machine (using a supermin appliance), mounts the guest’s partitions in the VM, trims them (which does much the same job that zerofree would do), then either makes a minified copy of the disk image or makes the disk image on the host sparse. I have not yet studied the security of virt-sparsify very much, but functionally it does a good job. Of course, this assumes the host runs a Linux distro of some sort, since you have to run virt-sparsify on the host, not the guest.
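
For example (run on the host with the VM powered off; paths are placeholders), a sketch of how virt-sparsify might be used:

virt-sparsify --in-place "path/to/guest/disk.vdi"      # sparsify the image in place

# or write a sparsified copy instead of modifying the original:
virt-sparsify "path/to/guest/disk.vdi" "path/to/guest/disk-sparse.vdi"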


That’s a really cool use-case for live mode / the sysmaint session.


Maybe zerofree could be run from inside the initial ramdisk or early at boot, before the disk is mounted read-write?

However, there are some data corruption concerns about zerofree:


zerofree

zerofree device

It is possible that using this program can damage the filesystem or data on the filesystem.

Quote How to Compact a VHDX with a Linux Filesystem - Virtualization DOJO | Hyper-V

zerofree

It’s not recommended for ext4 or xfs file systems.

Quote guestfish zerofree on LVM ? - Libguestfs - Libguestfs List Archives - Richard W.M. Jones - 2011

HOWEVER, I would be cautious about using zerofree at all. It’s been checked reasonably carefully against ext2/ext3, but I don’t think anyone has looked at whether it does reasonable things on ext4 (particularly w.r.t. filesystems with extents).

So it might work, or it might silently corrupt files. It might be advisable to keep a backup and check your filesystem before and after with ‘virt-ls --checksum’.

Relevant credentials:

So this needs further research and/or we should ask Richard W.M. Jones if this is still the case.
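
If someone wants to test this, the before/after check the quote suggests could look roughly like the following on the host (VM powered off, libguestfs tools installed; the disk path is a placeholder):

virt-ls -lR --checksum=sha256 -a "path/to/guest/disk.vdi" / > before.txt
# run zerofree inside the guest, shut it down, then:
virt-ls -lR --checksum=sha256 -a "path/to/guest/disk.vdi" / > after.txt
diff before.txt after.txt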

That also raises the question: why has no other Linux distribution implemented this?


There are also some data corruption concerns about fstrim.

More on trim:

https://www.reddit.com/r/AlmaLinux/comments/zz9i25/why_is_there_no_fstrimtimer_service_running_by/


A virtualizer settings change may be required to make trim work (a configuration sketch follows the links):

VirtualBox:


KVM:
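
For VirtualBox, discard support can (as far as I know) be enabled per attached VDI with VBoxManage; for KVM/libvirt it is the discard='unmap' driver attribute. A sketch; the VM name, controller name and port are placeholders, and VirtualBox discard support is limited to VDI images:

VBoxManage storageattach "Whonix-Workstation" --storagectl "SATA" \
    --port 0 --device 0 --type hdd --medium "path/to/disk.vdi" \
    --discard on --nonrotational on

# KVM/libvirt: in the domain XML, set discard='unmap' on the disk's driver element, e.g.
# <driver name='qemu' type='qcow2' discard='unmap'/>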

Those concerns are the result of buggy hardware. We already use trim on physical SSDs, so I would argue we’re not going to incur any additional risk by enabling it on virtual machines too.

I think this person is confused about the threat model of using trim and FDE together. Even without using trim, an attacker can likely determine the filesystem in use and what kinds of files are saved on disk if the disk was not entirely wiped with pseudorandom data first. But wiping drives entirely with pseudorandom data before each OS installation is not at all practical because of the extremely long time it takes and the high amount of wear-and-tear it causes on storage, so this isn’t really much of an extra risk (unless it’s “good” for a filesystem to slowly become less and less distinctive as more data is written to it, and bad if the filesystem becomes more distinctive after data is deleted). The claim that trim allows third-party access to “arbitrary data trimmed by fstrim” is misleading; a particular SSD might choose not to immediately wipe data after a trim, but that data is encrypted, so it shouldn’t matter anyway. Similar issues apply to virtualization technology: it’s up to the hypervisor to support TRIM and implement it in whatever way it thinks is best.


Quote ext4(5) — e2fsprogs — Debian trixie — Debian Manpages

discard/nodiscard

Controls whether ext4 should issue discard/TRIM commands to the underlying block device when blocks are freed. This is useful for SSD devices and sparse/thinly-provisioned LUNs, but it is off by default until sufficient testing has been done.

https://unix.stackexchange.com/questions/649964/is-mounting-with-discard-needed-for-trim mentions a performance impact and recommends fstrim over the discard mount option.

Quote Filesystems, Disks and Volumes - KubeVirt user guide

Thick and thin volume provisioning

Sparsification can make a disk thin-provisioned, in other words it allows to convert the freed space within the disk image into free space back on the host. The fstrim utility can be used on a mounted filesystem to discard the blocks not used by the filesystem. In order to be able to sparsify a disk inside the guest, the disk needs to be configured in the libvirt xml with the option discard=unmap. In KubeVirt, every disk is passed as default with this option enabled. It is possible to check if the trim configuration is supported in the guest by running lsblk -D, and check the discard options supported on every disk.
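
The same check works in any guest, not just KubeVirt; non-zero DISC-GRAN and DISC-MAX values indicate the device accepts discard requests:

lsblk --discard        # same as lsblk -D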



Theodore Y. Ts’o (maintainer of ext4 according to Theodore Ts'o - Wikipedia) supposedly wrote in linux-ext4 - Re: discard and data=writeback:

I personally don’t bother using mount -o discard, and instead periodically run fstrim, on my personal machines. Part of that is because I’m mostly just reading and replying to emails, building kernels and editing text files, and that is not nearly as stressful on the FTL as a full-blown random write workload (for example, if you were running a database supporting a transaction processing workload).

So my advice is: If you have slow discard, just don’t use ‘discard’ mount option. What is the problem with running fstrim(8) once a day instead? That should yield much better results.
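
On Debian-based systems, the periodic-fstrim approach described above is already packaged as a systemd timer, so enabling it is a one-liner:

sudo systemctl enable --now fstrim.timer
systemctl list-timers fstrim.timer     # check when it last ran / runs next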



So perhaps fstrim would be the best solution?

Or maybe not. Issues were reported against VirtualBox.

These reports may or may not be outdated nowadays.

Further research is required.