improving compression of Whonix image downloads

7z a -t7z -m0=lzma2 -mx=3 -mfb=32 -md=1m -ms=on archive.7z Whonix-Gateway-XFCE-14.0.1.4.4.raw Whonix-Workstation-XFCE-14.0.1.4.4.raw

du -sh archive.7z
1.3G archive.7z

Yes, that’s the drawback and probably the price to pay for having much smaller download sizes.

I know, but it’s really not much work to change the default path from /usr to /opt for a deb package.
Creating deb packages is rather simple and straightforward.
The installation path has to be changed in the Makefile because the executable searches for its libs and files in paths that are compiled into the binary. Once that is done, the rest, packing the files into a deb archive, isn’t much work.
Especially if there are only a small number of Tor or Whonix specific packages.

Just a question, how many Tor or Whonix specific packages that are not Debian packages are in Whonix gateway and workstation at the moment?

I know that. I used that as an example because the rest, creating the deb, isn’t a real problem.
deb packages work just fine with files in another path like /opt, as long as the executable was built for the new path. In other words, the work is in the Makefiles and compiling; the rest is only packaging the files into a deb package.

Dependencies are not an issue, because a program that is installed to /opt by its deb package can easily search for its required libs in /usr. This is all done in the Makefile process.
And thus this can also be worked out for dependencies. There is really no big change in that.

The only exception is when you have to replace a Debian package with a Whonix specific package while other, unmodified Debian packages still depend on the replaced package.
Only then do you have a small problem, because the programs from those other Debian packages will search in /usr, not in /opt.
But that’s what symlinks are for.
So it’s still solvable. The Whonix package that replaces the Debian package just has to also create symbolic links in /usr for its files in /opt.
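As a rough illustration of the symlink idea, here is a minimal Python sketch that mirrors every file under an /opt style directory into a /usr style directory as symlinks. The function name `link_opt_into_usr` and the package directory are made up for the example, and the demo runs in a temporary directory rather than the real /opt and /usr. In an actual deb package this would more likely be done by shipping the links in the package or in a maintainer script.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def link_opt_into_usr(opt_root: Path, usr_root: Path) -> list[Path]:
    """Create a symlink under usr_root for every file found under opt_root.

    Anything that already exists in usr_root is left untouched, so a file
    owned by a real Debian package is never silently replaced.
    """
    created = []
    for target in sorted(opt_root.rglob("*")):
        if not target.is_file():
            continue
        link = usr_root / target.relative_to(opt_root)
        if link.exists() or link.is_symlink():
            continue  # do not clobber what is already there
        link.parent.mkdir(parents=True, exist_ok=True)
        link.symlink_to(target)
        created.append(link)
    return created


if __name__ == "__main__":
    # Demo inside a throwaway directory instead of the real /opt and /usr.
    with tempfile.TemporaryDirectory() as tmp:
        opt = Path(tmp, "opt", "example-pkg")  # hypothetical package root
        usr = Path(tmp, "usr")
        (opt / "lib").mkdir(parents=True)
        (opt / "lib" / "libexample.so.1").write_bytes(b"\x7fELF")
        for link in link_opt_into_usr(opt, usr):
            print(link, "->", os.readlink(link))
```

Skipping existing paths is a deliberate choice here: it keeps the sketch from overwriting files that another package owns, which is exactly the conflict case that would need manual handling.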

Well, most Debian packages that depend on another Debian package do so only because they need some specific files at the path provided by that other package. Only very few Debian packages ship with some sort of script that changes configuration information in a file provided by the package they depend on.
Thus, most of the time you can just override the dependency check for this step, because the dependency is satisfied anyway as soon as the missing packages are installed after the first boot.
Only if a package has a script that changes a config do you need to reinstall it after installing the Whonix package.
But this can be solved too, by providing the Debian package in the /var/cache/apt/archives/ directory and installing it after the other Whonix specific packages are installed.
The only drawback of this approach is that you will have some deb packages as duplicates in gateway and workstation.

That’s sad to hear.

I agree that a generic approach using a compression tool is easier and less work, but I highly doubt that it can eliminate all duplicates.
I also agree that it is a good idea to do more tests with different compression tools and other compression settings to find a better solution.
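One setting that seems worth testing is the dictionary size. LZMA can only refer back to data that lies within one dictionary size of the current position, so with -md=1m (as in the 7z command at the top) a file in the second image cannot be matched against its copy in the first image if the two are more than 1 MiB apart in the solid stream. The following Python sketch shows the effect on synthetic data using the standard `lzma` module; the payload sizes are arbitrary and the random bytes only stand in for real image content.

```python
import lzma
import os


def xz_size(data: bytes, dict_size: int, preset: int = 6) -> int:
    """Size of `data` after xz compression with the given LZMA2 dictionary size."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": preset, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))


if __name__ == "__main__":
    # `shared` stands in for files that exist in both images.
    shared = os.urandom(1024 * 1024)
    image_a = shared + os.urandom(256 * 1024)
    image_b = shared + os.urandom(256 * 1024)
    both = image_a + image_b  # what a solid archive feeds to the compressor

    print("uncompressed:", len(both))
    for dict_size in (1 << 20, 1 << 24):  # 1 MiB vs. 16 MiB
        print(f"dict {dict_size >> 20:>2} MiB:", xz_size(both, dict_size))
```

With the small dictionary the second copy of the shared data is out of reach and has to be stored again; with the larger one it can be encoded as a back reference. Real images are several GiB, so even a large dictionary will only catch duplicates that happen to sit close enough together in the stream.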

But on the other hand, I also doubt that a generic compression tool can solve the underlying problem.
The actual problem is that the compression tool would have to be clever enough to understand file systems and VM images. Only then can the duplicates really be sorted out, and generic compression tools can’t do that.
Maybe we should also take a look into backup tools. They understand that on a filesystem level or use at least the filesystem support of the system for that.

With a generic approach, you will always have duplicates in the two VM images whose byte sequences differ because the block arrangement or the metadata is different, especially when the VM image format uses compression too.
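To get a feeling for how much two raw images share regardless of where the blocks sit, one could hash fixed-size blocks and compare the sets. This is only a sketch under assumptions: the 4096 byte block size is a guess at the file system block size, the function names are made up, and it only detects duplicates that are aligned to block boundaries in an uncompressed (raw) image.

```python
from __future__ import annotations

import hashlib
import os
import tempfile


def block_hashes(path: str, block_size: int = 4096) -> set[bytes]:
    """SHA-256 digests of all fixed-size blocks in the file at `path`."""
    hashes = set()
    with open(path, "rb") as f:
        while block := f.read(block_size):
            hashes.add(hashlib.sha256(block).digest())
    return hashes


def shared_fraction(path_a: str, path_b: str, block_size: int = 4096) -> float:
    """Fraction of the distinct blocks in path_b that also occur in path_a."""
    a = block_hashes(path_a, block_size)
    b = block_hashes(path_b, block_size)
    return len(a & b) / len(b) if b else 0.0


if __name__ == "__main__":
    blocks = [os.urandom(4096) for _ in range(64)]
    with tempfile.TemporaryDirectory() as tmp:
        samples = {
            "original.raw": b"".join(blocks),
            "reordered.raw": b"".join(reversed(blocks)),      # same blocks, other order
            "shifted.raw": b"\0" * 512 + b"".join(blocks),    # same data, misaligned
        }
        for name, data in samples.items():
            with open(os.path.join(tmp, name), "wb") as f:
                f.write(data)
        base = os.path.join(tmp, "original.raw")
        for name in ("reordered.raw", "shifted.raw"):
            print(name, shared_fraction(base, os.path.join(tmp, name)))
```

The reordered sample shares all of its blocks with the original even though the byte stream is completely different, which is the kind of duplicate a stream compressor with a limited window tends to miss. The shifted sample shows the limit of this check: once the data is no longer block aligned, fixed-size hashing finds nothing.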

New suggestion 4:

I have an idea for a new approach that would simplify a lot and address all of the duplicate problems.
You could provide two VM starter images, one for Whonix Workstation and one for Whonix Gateway, with the absolute minimum needed to boot a kernel and install deb packages.

In addition to this, we provide a big data VM image which contains ALL deb packages that are required for Whonix Workstation or Whonix Gateway.

The user loads a VM configuration file that knows how to mount the initial starter VM image and the data image. Note that the data image is only mounted read-only.
Then the user boots both Whonix systems, Workstation and Gateway.

On both Workstation and Gateway, a script runs at first boot and installs all deb packages required for that system from the data partition provided by the data VM image.
Once that is done, the user has a working Whonix Workstation and Gateway system.
The data image can then be removed from the VM and deleted from the disk.
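The first-boot step described above could look roughly like the following Python sketch. Everything about the layout is an assumption made for illustration: a `pool/` directory holding one copy of every deb, plus one `<role>.list` file per system naming the packages it needs. The sketch only builds the `dpkg --install` command and runs it when `dry_run` is disabled; dependency ordering is left to dpkg.

```python
from __future__ import annotations

import subprocess
import tempfile
from pathlib import Path


def build_install_command(data_root: Path, role: str) -> list[str]:
    """Return the dpkg command that installs every package listed for `role`.

    Assumed (hypothetical) layout of the read-only data image:
      <data_root>/pool/*.deb     one copy of every package
      <data_root>/<role>.list    file names from pool/, one per line
    """
    list_file = data_root / f"{role}.list"
    names = [line.strip() for line in list_file.read_text().splitlines() if line.strip()]
    debs = [data_root / "pool" / name for name in names]
    missing = [str(deb) for deb in debs if not deb.is_file()]
    if missing:
        raise FileNotFoundError(f"listed but not in pool: {missing}")
    return ["dpkg", "--install", *(str(deb) for deb in debs)]


def first_boot_install(data_root: Path, role: str, dry_run: bool = True) -> list[str]:
    """Build the install command and, unless dry_run is set, execute it."""
    cmd = build_install_command(data_root, role)
    if not dry_run:
        subprocess.run(cmd, check=True)  # needs root inside the VM
    return cmd


if __name__ == "__main__":
    # Demo with empty placeholder files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "pool").mkdir()
        for name in ("tor.deb", "shared-lib.deb", "browser.deb"):
            (root / "pool" / name).write_bytes(b"")
        (root / "gateway.list").write_text("shared-lib.deb\ntor.deb\n")
        (root / "workstation.list").write_text("shared-lib.deb\nbrowser.deb\n")
        for role in ("gateway", "workstation"):
            print(role, first_boot_install(root, role))
```

Keeping a single pool with per-role lists is what removes the duplicates: a package needed by both systems is stored once on the data image and merely listed twice.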

By doing it this way, you don’t have to bother with /opt or /usr, or a /usr partition, you can use any compression tool, and you still eliminate most duplicate files, because all the deb files that are duplicated at the moment are unique on the data partition. The few exceptions are in the small bare starter images: they will still contain the same kernel and thus some duplicates, but the amount is very small.

Suggestion 5

Thinking about my suggestion 4, which I personally like best at the moment, I am wondering if an installer image wouldn’t be the best solution of all.
With an installer image the user would download just one image, and when the installer image boots it lets the user choose between installing a Whonix Gateway or a Whonix Workstation system.
With the installer, all problems with duplicates are gone, and an installer is also much more versatile than VM images. You could for example use an installer image to install Whonix Workstation or Gateway on real hardware.
With other architecture specific deb files, the installer image could also be used to provide a solution for non-x86 hardware, for example an ARM based Raspberry Pi.
I am also wondering why Whonix isn’t using an installer in the first place like other distributions do?


For you it may be simple but for me it’s a high mountain to climb. That’s why I call it unrealistic. If I saw an implementation of this, I might change my mind. But unless you intend to fork Whonix, this isn’t a great exercise (unless it costs you very, very little time). I estimate that this will introduce way too much extra complexity for relatively small gains. There are few people who told me that they looked into the build script, understood it, let alone contributed major things to it. So I wouldn’t want to make it even more difficult than it already is.

More and more complexity which then leads to follow up issues such as symlinks vs apparmor.

May be possible but extra source code, extra complexity.

Perhaps you’re more clever than me. However, the volunteer workforce contributing to Whonix is rather small. So while I am scared by the extra complexity, I am still useful. (Unless you fork Whonix and do everything better. I wouldn’t mind that either. I could be a contributor and also I am sure I’d find other fun things in life too.)

I concentrate on things that are easy and yet impactful, such as kloak. I would encourage you to contribute. Perhaps you could implement something that fixes Advanced Deanonymization Attacks or some of the open tickets (Query: Open Tasks)?

Whonix · GitHub

  • Historic growth. Limited time. Flood of issues.
  • Lack of source code contributors.
  • Requires development work.
  • Whonix is the only distribution I am aware of that specializes in deploying VMs for end-users. Build script, deployment methods (platform independent ova’s), it’s maintainable. During Whonix’s total lifetime, 2012 to time of writing in 2019, I managed to keep up with all underlying changes (Debian, Tor, virtualizers).

Very interesting read, thank you @Firefox for the amazing efforts you put into this well-thought reflection.

But while it’s always nice to have smaller images to download, I agree that the amount of work needed for what would eventually be a rather small gain is probably not worth it. Compared to the standard Linux distributions, Whonix is actually on the lighter side regarding the size of the downloadable image files. Ubuntu is 1.8 GB, Tails 1.2 GB, any other distro probably somewhere in between, if not more. As a matter of fact, the size of the Whonix images has already been greatly reduced since 2018 (going from 2 files of 1.7 and 2 GB to one single file of 1.6 GB for the ova file, 1.1 GB for the libvirt file).

IMHO anything that adds complexity for the end-users must be avoided if at all possible, especially for a vanilla install and first use. It may seem absurdly easy to you, but many people already struggle with the basic task of downloading and setting up VirtualBox and importing two simple .ova files (I am not mentioning KVM on purpose as its users are probably more tech savvy). I couldn’t imagine how it would be if they had to run some kind of script on top of that. Not to mention all the extra work for the already understaffed Whonix team and the many, many inevitable bugs to solve.

Speaking of which, judging by your level of knowledge and commitment, I think the Whonix project would greatly benefit from your help and contributions! :)


Okay, I understand.

What about the Debian installer?
Have you ever thought about adapting it for Whonix?

That’s more packages than I expected.

Yes. Complex thing. A ton of work.

Whonix 0.2 or so build process was based on automating (preseeding) Ubuntu installer. Fragile, messy.

Yeah. Some stuff should be merged. anon-mixmaster fits into anon-apps-config and more. Long time ago, I acted on bad advice and overdid the split.

Progress on that was made. Calamares based. Search the forums for Whonix Host or Calamares.

du -h whonix_binary/*

(Dropping unimportant files such as signatures.)

522M whonix_binary/Kicksecure-XFCE-15.0.0.5.4.libvirt.xz
813M whonix_binary/Kicksecure-XFCE-15.0.0.5.4.ova
2.6G whonix_binary/Kicksecure-XFCE-15.0.0.5.4.qcow2
3.9G whonix_binary/Kicksecure-XFCE-15.0.0.5.4.raw

du -h /home/user/VirtualBox\ VMs/Kicksecure-XFCE/Kicksecure-XFCE.vdi 

2.9G /home/user/VirtualBox VMs/Kicksecure-XFCE/Kicksecure-XFCE.vdi

The libvirt.xz is a lot smaller than the ova.

How can we improve compression of the ova?


This might not have any effect since we are already using zerofree.

https://github.com/Whonix/Whonix/commit/e4a86cd9315ded19164fbac339411bba53c31f07

Could fstrim be of any use?

https://manpages.debian.org/buster/util-linux/fstrim.8.en.html

VBoxManage convertfromraw whonix_binary/Kicksecure-XFCE-15.0.0.5.4.raw whonix_binary/Kicksecure-XFCE-15.0.0.5.4-convertfromraw-variant-stream.vmkd --format vmdk --variant Stream

That did not go a long way.

813M Kicksecure-XFCE-15.0.0.5.4-convertfromraw-variant-stream.vmkd

That is what is already happening.

813M Kicksecure-XFCE-15.0.0.5.4.ova

I figured out we might speed up the build process. Even though vmdk is not VirtualBox’s own / best supported file format, it’s defined in the OVA standard.

Current Whonix VirtualBox build process: raw image creation → create vdi from raw → create VirtualBox VM → stop here for most users → prepare_release script can create ova (which makes VirtualBox convert the vdi to vmdk)

(And when the user imports the ova, VirtualBox will convert back to vdi by default nowadays. A waste of time. But unfixable until/if OVA supports vdi. Maybe the ova could be manually created without use of VBoxManage but that could be more prone to bugs during VM import.)

The Whonix VirtualBox build process could be sped up by skipping the convert to vdi step but that would require different build --targets being implemented. Perhaps --target virtualbox-vdi (handy for use after build) and --target virtualbox-vmdk (handy for ova creation). But converting from raw to vdi takes only two minutes so maybe not worth the hassle.

When using installers (Whonix Windows Installer - Design Documentation) these might improve compression on top of it.


What happens when you xz compress the ova? If it doesn’t go down to approximately the same size as the libvirt file there must be more data in there.
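This can be tried locally without touching the release process or the import instructions. A minimal Python sketch, assuming nothing beyond the standard `lzma` module: it streams a file through an xz compressor and only counts the output bytes, so no second multi-GB file is written. The function name `xz_compressed_size` is made up for the example.

```python
import lzma
import os
import sys


def xz_compressed_size(path: str, preset: int = 6, chunk_size: int = 1 << 20) -> int:
    """Return how many bytes `path` would occupy as an .xz file, without writing it."""
    compressor = lzma.LZMACompressor(format=lzma.FORMAT_XZ, preset=preset)
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(compressor.compress(chunk))
    total += len(compressor.flush())
    return total


if __name__ == "__main__":
    # Pass the files to compare on the command line, e.g. the .ova and the .raw.
    for path in sys.argv[1:]:
        if os.path.isfile(path):
            original = os.path.getsize(path)
            print(f"{path}: {original} bytes -> {xz_compressed_size(path)} bytes as xz")
```

Running it on both the ova and the raw image would show whether the ova ends up near the size of the libvirt.xz or stays close to its current size.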


I didn’t try that as this would complicate VM import instructions.

Not necessarily. Already compressed data might not be as compressible as plain data. Or let’s say, a compressor works best with the kind of input it is designed to handle.
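A small experiment illustrating the effect, on synthetic data rather than the real images. It assumes the disk inside the ova is already deflate-compressed (which, as far as I know, is what the stream-optimized vmdk variant does) and uses `zlib` as a stand-in for that step. The sample data contains the same chunk several times, spaced further apart than deflate’s 32 KiB window, to mimic duplicates that only a compressor with a larger window can see.

```python
import lzma
import random
import zlib


def make_sample(seed: int = 0) -> bytes:
    """Text-like data in which one large chunk is repeated four times."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    vocab = ["".join(rng.choices(letters, k=rng.randint(3, 10))) for _ in range(2000)]
    chunk = " ".join(rng.choices(vocab, k=40_000)).encode()
    return chunk * 4


def compare(plain: bytes) -> dict:
    """Sizes of the data as is, after deflate, after xz, and after xz on top of deflate."""
    deflated = zlib.compress(plain, 6)
    return {
        "plain": len(plain),
        "deflate": len(deflated),
        "xz": len(lzma.compress(plain)),
        "xz_after_deflate": len(lzma.compress(deflated)),
    }


if __name__ == "__main__":
    for name, size in compare(make_sample()).items():
        print(f"{name:>17}: {size}")
```

In this setup xz on the plain data can collapse the repeated chunks, while xz on the deflate output has little left to work with because the repeats no longer appear as identical byte runs. So an xz-compressed ova staying larger than the libvirt.xz would not by itself prove that the ova contains more data.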

Sure, but you would know if it is (maybe) related to the compression algorithm.
