Information
ID: 605
PHID: PHID-TASK-rmlmclldrixkknshl67q
Author: Patrick
Status at Migration Time: open
Priority at Migration Time: Wishlist
Description
The current tar.xz compression code is a burden, since it literally takes hours. Building the VBox and KVM images through to a finished upload therefore currently takes more than a day.
Using tar with --xz and --mtime="2014-05-06 00:00:00" so the archives are deterministic.
Using --sparse
…
-S, --sparse
handle sparse files efficiently
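A quick way to see what --sparse buys you; this is a sketch with a throwaway file (not the actual image), assuming GNU tar and coreutils:

```shell
# Create a 100 MB file that is one big hole: large apparent size, ~zero disk use.
truncate -s 100M t605_sparse.img

# Without -S/--sparse, tar would store 100 MB of literal zeros; with it,
# the holes are recorded compactly in the archive.
tar --create --sparse --file t605_sparse.tar t605_sparse.img

# Apparent size of the input vs. size of the sparse-aware archive:
ls -l t605_sparse.img t605_sparse.tar
du -h t605_sparse.tar
```

The archive stays tiny because only the sparse map and the (empty) data regions are recorded.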
The replacement requirements:
- faster than the current one
- deterministic
- handles sparse files efficiently
- currently the compression reduces a sparse workstation qcow2 file with a real size of ~4.5 GB (and an apparent size of 100 GB) to a ~1.5 GB tar.xz
- the new file size should be similarly small
- (not 100 GB reduced to ~30 GB)
The priority is high, since this slowness reduces my motivation to create Non-Qubes-Whonix images.
Comments
HulaHoop
2017-01-15 02:27:14 UTC
Patrick
2017-01-15 05:09:41 UTC
quote @anonymous1
I’m not experienced as to how to improve xz compression, assuming you don’t want to experiment with lesser-known compressors.
What I do know is that freearc (with its many unique compression filters and technologies, such as srep) and nanozip are the best compressors around in terms of both speed and compression ratio. I don’t know the details of the ticket you mentioned, but I think srep might be the best tool to speed it up again. But then it is not deterministic; I’m not sure if there is a way to make it so
quote @anonymous1
Could it help trying tar implementation of other programs like 7zip?
quote @anonymous1
bsdtar or star didn’t help?
How can I speed up operations on sparse files with tar, gzip, rsync? - Unix & Linux Stack Exchange
HulaHoop
2017-01-15 15:13:37 UTC
OK, so it’s reproducibility > speed
Most compression algorithms are deterministic. Being “adaptive” in no way contradicts being “deterministic”: it only means varying behavior based on input, so if the input is the same, so will be the output.
You can easily verify this by compressing the same file several times using an algorithm of your choice (zip, gzip, bzip2, 7z, etc.) and comparing the outputs. For example on linux, you can run this command several times to compress the file /etc/fstab and compare if its checksum is the same each time: gzip < /etc/fstab | md5sum -
Though the algorithm itself is indeed deterministic, the implementation will sometimes store additional information (file permissions, timestamps, etc.) which can make it look like the output is not deterministic. Adding a touch on the file between the compress and decompress can generate a different zip even though the file’s content did not change. That being said, it’s still deterministic once all parameters are factored in.
Any Deterministic Compression Algorithms out There? - Software Engineering Stack Exchange
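The quoted check is easy to reproduce locally; this sketch uses a throwaway file instead of /etc/fstab:

```shell
# Compress the same input twice; a deterministic compressor (plus its
# header handling) must produce byte-identical output both times.
printf 'the same input every time\n' > t605_input.txt

gzip < t605_input.txt | md5sum > t605_sum1
gzip < t605_input.txt | md5sum > t605_sum2

# Reading from a redirected file, gzip stores no varying metadata between
# the two runs, so the checksums come out identical.
cmp t605_sum1 t605_sum2 && echo "deterministic: checksums match"
```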
The next-best algorithm in speed is gzip, and it explicitly supports disabling timestamps with gzip:!timestamp; see tar(1).
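For GNU gzip invoked directly (outside of tar), the corresponding switch is -n/--no-name, which omits the original file name and timestamp from the header; a small sketch with throwaway file names:

```shell
printf 'payload\n' > t605_file.txt

# -n/--no-name drops the stored file name and timestamp, so the output
# depends only on the content:
gzip -nc t605_file.txt > t605_a.gz
touch t605_file.txt          # bump the mtime only; content is unchanged
gzip -nc t605_file.txt > t605_b.gz

cmp t605_a.gz t605_b.gz && echo "identical despite the mtime change"
```

Without -n, the second archive would differ because the new mtime would be embedded in the gzip header.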
With lz4 and tar this may be possible as it is with gzip:
compression - Preserve timestamp when compressing files with lz4 on linux - Stack Overflow
anonymous1
2017-01-18 20:31:32 UTC
anonymous1
2017-01-19 08:38:09 UTC
anonymous1
2017-01-19 08:41:39 UTC
anonymous1
2017-01-19 20:29:45 UTC
@Patrick good news
tar has finally added support for SEEK_DATA/SEEK_HOLE for sparse file detection in latest version 1.29
GNU tar - News: tar 1.29 [Savannah]
Upgrading to this version should speed up your compression without changing any command. Please let me know how it goes
Patrick
2017-01-20 12:58:31 UTC
Patrick
2017-01-20 13:12:41 UTC
Patrick
2017-01-21 00:04:55 UTC
anonymous1
2017-01-21 03:25:58 UTC
HulaHoop
2017-01-23 00:10:43 UTC
anonymous1
2017-03-07 20:59:04 UTC
Patrick
2017-03-10 03:44:48 UTC
The xz command uses only about 10% CPU and about 90 MB RAM. iotop -a shows below 1%.
Building on Debian stretch with xz-utils 5.2.2-1.2 / tar 1.29b-1.1. Using ext4 as the file system.
Any idea why system load is so low? I’d like a much higher load so it goes faster.
The libvirt_compress function at the time of writing:
developer-meta-files/release/prepare_release at bb1907e319acda314a1c57df200ff1696f979971 · Kicksecure/developer-meta-files · GitHub
The compression command from bash xtrace.
tar --create --verbose --owner=0 --group=0 --numeric-owner --mode=go=rX,u+rw,a-s --sort=name --sparse '--mtime=2015-10-21 00:00Z' --xz --directory=/home/user/whonix_binary --file Whonix-Gateway-14.0.0.4.0.libvirt.xz Whonix-Gateway-14.0.0.4.0.qcow2 Whonix-Gateway-14.0.0.4.0.xml Whonix_external_network-14.0.0.4.0.xml Whonix_internal_network-14.0.0.4.0.xml
The whole prepare_release script now took 21 minutes for Whonix-Gateway. libvirt archive creation is still the part that takes the longest. Archive size: 1.2 GB.
By adding the environment variable XZ_OPT="-0", time is down to 4:45 min and size is up to 1.4 GB.
By adding the environment variable XZ_OPT="-2", time is down to 7:45 min and size is up to 1.3 GB.
(XZ_OPT="-0 --fail" makes it fail as expected. I did that as a test to see whether the environment variable XZ_OPT is honored.)
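The same honored-or-not check can be scripted; the option name in the second call is deliberately bogus, mirroring the "-0 --fail" test above:

```shell
printf 'data\n' > t605_opt.txt

# A valid option in XZ_OPT is picked up silently:
XZ_OPT="-0" xz -c t605_opt.txt > /dev/null && echo "XZ_OPT=-0 accepted"

# An invalid option makes xz fail, proving XZ_OPT is actually read
# (the option name here does not exist):
if XZ_OPT="--no-such-option" xz -c t605_opt.txt > /dev/null 2>&1; then
    echo "unexpected success"
else
    echo "bogus XZ_OPT rejected: the variable is honored"
fi
```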
! In T605#11756, @anonymous1 wrote:
@Patrick good news
tar has finally added support for SEEK_DATA/SEEK_HOLE for sparse file detection in latest version 1.29
GNU tar - News: tar 1.29 [Savannah]
Upgrading to this version should speed up your compression without changing any command. Please let me know how it goes
As per GNU tar - News: tar 1.29 [Savannah], it should automatically be using seek hole detection on systems that support it. How do I find out whether my system supports it, or how to enable it?
If you have other ideas to speed it up / shrink the size while keeping it reproducible, could you suggest changes to the prepare_release script, please? Perhaps by making a GitHub pull request?
anonymous1
2017-03-10 04:56:52 UTC
anonymous1
2017-03-10 05:03:47 UTC
anonymous1
2017-03-10 05:24:19 UTC
there is some related information here:
Utilizing multi core for tar+gzip/bzip compression/decompression - Stack Overflow
you could also try XZ Utils; it has had multi-threaded compression support for some time. Perhaps you could do this:
tar --use-compress-program=xz
but it may not be multi-threaded then. The documentation for xz states that:
Multi-threaded compression can be enabled with the --threads (-T) option.
I’m not sure if you can use this option from inside tar, though
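Two ways the -T option can reach xz from inside tar; a sketch with throwaway file names, assuming a reasonably modern GNU tar and xz:

```shell
printf 'data\n' > t605_tar.txt

# Variant 1: modern GNU tar passes a quoted command with its arguments
# through to the compressor (assumed supported by the tar in use):
tar --use-compress-program='xz --threads=0' -cf t605_a.tar.xz t605_tar.txt

# Variant 2: xz reads extra options from XZ_OPT, so plain --xz also works:
XZ_OPT="--threads=0" tar --xz -cf t605_b.tar.xz t605_tar.txt

# Both results are valid .xz streams:
xz -t t605_a.tar.xz && xz -t t605_b.tar.xz && echo "both archives valid"
```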
anonymous1
2017-03-10 05:31:18 UTC
anonymous1
2017-03-10 11:54:05 UTC
anonymous1
2017-03-10 12:25:58 UTC
It seems you could use something like this with xz utils:
export XZ_OPT="--threads=0"
-T threads, --threads=threads
Specify the number of worker threads to use. Setting threads to a special value 0 makes xz use as many threads as there are CPU cores on the system. The actual number of threads can be less than threads if the input file is not big enough for threading with the given settings or if using more threads would exceed the memory usage limit. Currently the only threading method is to split the input into blocks and compress them independently from each other. The default block size depends on the compression level and can be overridden with the --block-size=size option.
Patrick
2017-03-10 17:06:23 UTC
xz(1) — xz-utils — Debian testing — Debian Manpages
"--threads=0" results in 100% CPU usage, yay!
XZ_OPT="-0 --threads=0"
Time down to 1:35.
Size up to 1.4 GB.
XZ_OPT="-9 --extreme --threads=0"
XZ_OPT="-6 --threads=0"
4:28
1.2 GB
reproducible: yes
XZ_OPT="-6 --threads=0"
installed pxz
replaced --xz with --use-compress-program=pxz
(really uses pxz and not xz as per ps aux)
Looks like using tar with --use-compress-program=pxz really is not worth it.
Perhaps worth trying pxz directly without tar? But then we might be back to non-reproducibility. Needs testing. Does pxz support auto-detecting the maximum number of threads that can be used?
Patrick
2017-03-10 17:12:46 UTC
anonymous1
2017-03-10 17:50:03 UTC
anonymous1
2017-03-10 17:53:58 UTC
But I have a feeling it would produce different archives with different numbers of threads: single core vs dual core vs quad core vs custom VM cores
you could test this by setting the threads to 1, 2, 3, 8 and so on
Patrick
2017-03-10 18:11:32 UTC
anonymous1 (anonymous1):
anonymous1 added a comment.
But I have a feeling it would produce different archives with different number of threads, single core vs dual core vs quad core vs custom vm cores
Good point.
Just now tested --threads=1 vs --threads=8. Different checksum.
--threads=8 vs --threads=8, however, results in the same checksum.
Perhaps I should change --threads=0 to --threads=8? (Assuming a quad core with 2 threads per core?)
May not be a problem on slower machines. --threads=30 (for testing purposes, exceeding available threads) also worked for me.
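Those checksum observations can be reproduced with a throwaway input file; a sketch, not the release tooling:

```shell
# A compressible throwaway input:
head -c 1000000 /dev/zero > t605_big.bin

xz -c --threads=1 t605_big.bin | md5sum > t605_t1.sum
xz -c --threads=8 t605_big.bin | md5sum > t605_t8a.sum
xz -c --threads=8 t605_big.bin | md5sum > t605_t8b.sum
xz -c --threads=30 t605_big.bin | md5sum > t605_t30a.sum
xz -c --threads=30 t605_big.bin | md5sum > t605_t30b.sum

# Single- vs multi-threaded mode: different archive bytes.
cmp -s t605_t1.sum t605_t8a.sum || echo "threads=1 vs threads=8: different"
# The same --threads value is repeatable, even when 30 exceeds the core count.
cmp t605_t8a.sum t605_t8b.sum && echo "threads=8 is repeatable"
cmp t605_t30a.sum t605_t30b.sum && echo "threads=30 is repeatable"
```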
Patrick
2017-03-10 18:22:08 UTC
anonymous1 (anonymous1):
anonymous1 added a comment.
you could also try lowering or increasing the compression dictionary size to see how it affects the size and speed, however I don’t know the commands
Is this different from the -0 to -9 (--extreme) compression settings?
In other words… Do you think it is worth playing with various "--dict=" settings independently of the compression level setting for better speeds or smaller file sizes?
anonymous1
2017-03-10 19:02:58 UTC
I think the default settings are optimal
--threads=8 should still work on slower machines; however, it would work like --threads=4 or --threads=2, I guess. In that case choosing the default threads is up to you. Could you try with 4? It may not be too different from 8
anonymous1
2017-03-10 19:32:30 UTC
If you have 8 threads and if using more than 8 produces same checksum as 8, then what I said would be true
I would recommend 4 max, but it’s your choice
It’s also a good idea to test with same threads on different machines whether there is any variation or not
anonymous1
2017-03-10 19:40:42 UTC
Patrick
2017-03-10 19:53:59 UTC
! In T605#12649, @anonymous1 wrote:
I think the default settings are optimal
Okay.
--threads=8 should still work on slower machines, however it would work like --threads=4 or --threads=2 I guess. In that case choosing the default threads is up to you, could you try with 4? it may not be too different from 8
4 uses only 50% of CPU.
Done, made that 8:
https://github.com/Whonix/Whonix/commit/17581ebbd05cc04f5ed52637e675481ddecc0845
! In T605#12650, @anonymous1 wrote:
If you have 8 threads and if using more than 8 produces same checksum as 8, then what I said would be true
I would recommend 4 max, but it’s your choice
It’s also a good idea to test with same threads on different machines whether there is any variation or not
Theoretically, let’s say a single-core machine might produce a different checksum than a quad core due to threads. But I doubt that. It’s probably not using physical CPU threads but virtual CPU threads. top -H easily shows more than 500 virtual threads on a usual Linux system.
A few more threads than physical threads will probably have only a negligible performance penalty. 8 vs 4 should not matter on a slow system. (However, I speculate 10000 threads would cause significant overhead.)
! In T605#12665, @anonymous1 wrote:
I think in the worst case you could care less about a perfectly reproducible end archive (tar.xz) and instead focus on the extracted (tar) file being reproducible
Having the final file reproducible makes verification instructions and automation a lot easier.
Then it’s just “rebuild the libvirt.xz, and compare the hashes”.
Otherwise it’s "rebuild libvirt.qcow2, download libvirt.xz, extract the qcow2, and compare the hashes of the qcow2 files, not the libvirt.xz files."
Hypothetically, the compressed libvirt.xz could contain an exploit against xz that compromises the system during decompression. By having a reproducible libvirt.xz we can exclude that.
For now it does not really matter if libvirt.xz is reproducible. It’s very forward-thinking, since reproducible Whonix images are unfortunately still far away; see:
https://forums.whonix.org/t/is-whonix-reproducible-yet-backdoor-protection
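For illustration, the one-step verification amounts to a single hash comparison; all file names here are placeholders, not real release artifacts:

```shell
# Placeholders standing in for a local rebuild and the published download:
printf 'stand-in archive bytes\n' > t605_rebuilt.libvirt.xz
cp t605_rebuilt.libvirt.xz t605_downloaded.libvirt.xz

# One-step check: hash both archives and compare; with a reproducible
# libvirt.xz, nothing needs to be extracted first.
sha256sum t605_rebuilt.libvirt.xz t605_downloaded.libvirt.xz
cmp t605_rebuilt.libvirt.xz t605_downloaded.libvirt.xz \
    && echo "archives match: nothing needs to be extracted"
```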
anonymous1
2017-03-10 20:27:19 UTC
Could you please check how long it takes with 4 threads? Using 50% of CPU is expected; it does not necessarily mean it will take twice as long
What I expect is that a PC with 4 threads may not reproduce the same archive; even if --threads=8 doesn’t give any error, it may produce an archive as if you used --threads=4
If 4 threads doesn’t change the build time much, that would be a safer default
anonymous1
2017-03-10 20:51:50 UTC
Patrick
2017-03-10 21:13:40 UTC
anonymous1 (anonymous1):
anonymous1 added a comment.
Could you please check how long it takes with 4 threads? Using 50% of CPU is expected; it does not necessarily mean it will take twice as long
Takes 1 minute longer.
What I expect is that a PC with 4 threads may not reproduce the same archive; even if --threads=8 doesn’t give any error, it may produce an archive as if you used --threads=4
It doesn’t. I already tried a number that exceeds my physical cores by more than twice (30) and had the same checksum each time I used the same number of threads (30). I am pretty sure it’s virtual, not physical threads. At that level of abstraction, it would make little sense to see "wanted 30 threads, but just got 4 physical cores, will silently reduce threads to 8".
anonymous1
2017-03-10 21:19:18 UTC
Did you compare your --threads=30 archive with --threads=8 archive?
They may turn out to be the same, or you can try any number bigger than 8
If that’s the case, 4 is a safer default for reproducibility, with little impact on speed
anonymous1
2017-03-10 21:22:07 UTC
anonymous1
2017-03-10 21:36:16 UTC
I may be wrong; the best way to test this is maybe to create the same archive with half of the available cores in a VM. However, I can’t do this; I don’t have Debian stretch.
But at least one thing is clear: the memory requirement is directly proportional to the number of threads, and if the machine at hand does not meet those requirements, it will lower the number of threads
anonymous1
2017-03-10 21:48:25 UTC
anonymous1
2017-03-10 22:53:43 UTC
Sorry for all this confusion. I think it is only a difference in whether the program "tries" to operate in single-threaded or multi-threaded mode. When we use --threads=1 or don’t specify it (the default is 1), it compresses the whole file in a single block. However, setting --threads to 0 or anything bigger than 1 triggers the multi-threaded mode: the file is split into blocks depending on the compression level and then compressed, resulting in a difference in the archive file. How many threads are actually used is irrelevant. Changing the compression level or manually specifying the block sizes will change the outcome.
By setting --threads to 8 instead of 0 we actually enforce the multi-threaded mode (splitting the file into blocks) and prevent at least one cause of non-determinism: when --threads is set to 0 on a single-core machine, xz operates in "single-threaded mode" and compresses the file in a single block, whereas setting it to anything higher than 1 enforces "multi-threaded mode" without being multi-threaded at all, but still splits and compresses the file in blocks. You can see this with a VM.
So it should be safe to set --threads to 8 or 16 or higher; this option means: "use at most NUM threads"
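That explanation implies a testable property: in multi-threaded mode the output depends on the block layout (set by the compression level), not on how many worker threads actually run. A sketch with a throwaway file:

```shell
head -c 1000000 /dev/zero > t605_blocks.bin

xz -c --threads=2 t605_blocks.bin | md5sum > t605_mt2.sum
xz -c --threads=8 t605_blocks.bin | md5sum > t605_mt8.sum
xz -c --threads=1 t605_blocks.bin | md5sum > t605_st.sum

# Any --threads > 1 selects the same block layout at a given level,
# so the archives are identical regardless of the thread count:
cmp t605_mt2.sum t605_mt8.sum && echo "threads=2 == threads=8"
# --threads=1 compresses in a single block and therefore differs:
cmp -s t605_st.sum t605_mt2.sum || echo "single-threaded mode differs"
```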
anonymous1
2017-03-11 15:13:17 UTC
Patrick
2017-03-13 11:40:30 UTC
! In T605#12670, @anonymous1 wrote:
Did you compare your --threads=30 archive with --threads=8 archive?
Doesn’t make a difference in speed.
! In T605#12675, @anonymous1 wrote:
If you have time, could you check how long it takes with 5 or 6 threads? I think it will be nearly equal to 8; not for reproducibility reasons, just for efficient use of system resources. There is probably no reason to use 16 cores on a machine that supports it; that would be overkill
6 causes 75% CPU and takes 4:59 minutes.
Actually, I am for maximum system resource usage as the default value. Capping should be user opt-in, via custom settings through the operating system by the user. (Run in a VM or use other tools to add caps to the build script.)
Patrick
2019-04-04 18:17:58 UTC