[archived] Previous, now Deprecated Whonix Windows Installer Testing

good to hear, but too bad srep doesn’t produce deterministic results

Good day,

That might be an issue. Anyway, thank you very much for telling me beforehand; otherwise I would likely have started ripping my hair out in confusion. Sadly, that means I’ll have to adjust my verification process. I will think of a solution.

Have a nice day,

Ego

I would appreciate it if deterministic results were preferred over disk
space and extraction time.

It does not matter for the time being, but long term we’ll be making
Whonix reproducible, and then this is one less item where we need
to hunt down non-determinism.

Good day,

I see. Very reasonable and makes verification across systems easier. Will still look into trimming the installer down though, while keeping the deterministic character.

Have a nice day,

Ego

Btw we have another issue with compression of KVM images.
https://phabricator.whonix.org/T605
If you have any insights to improve that… @anonymous1

I’m not experienced in improving xz compression, assuming you don’t want to experiment with lesser-known compressors.

What I do know is that freearc (with its many unique compression filters and technologies such as srep) and nanozip are the best compressors around in terms of both speed and compression ratio. I don’t know the details of the ticket you mentioned, but I think srep might be the best tool to speed it up again. But then it is not deterministic, and I’m not sure if there is a way to make it so.

Could it help trying tar implementation of other programs like 7zip?

bsdtar or star didn’t help?

The other deduplication tool from freearc, “rep”, apparently produces deterministic results. It is not as powerful as srep, but the savings are still significant and it is fast.

It is available as a separate executable which could be used after tar like I did with srep above.

http://freearc.org/download/research/rep.zip

It is also built into freearc itself and if compressed using freearc the file would be extracted in a single step.

Some tests using different dictionary sizes:

setting       size
original      3,89 GB
rep:96mb      3,05 GB
rep:200mb     2,92 GB
rep:256mb     2,90 GB
rep:1536mb    2,84 GB

extraction time around 1:30 minutes

Decompression memory is twice the size of the dictionary, so if you choose rep:256mb it will require 512 MB of RAM on extraction.
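As a quick illustration of that rule of thumb, here is a minimal sketch that prints the estimated extraction memory for the dictionary sizes tested above. The helper name rep_decompress_ram_mb is made up for this example and is not part of rep; the factor of two is taken from the statement above, not measured.

```shell
#!/bin/sh
# Hypothetical helper (not part of rep): estimate extraction memory
# from the dictionary size, assuming "memory = 2 x dictionary".
rep_decompress_ram_mb() {
    dict_mb=$1
    echo $(( dict_mb * 2 ))
}

# Dictionary sizes from the tests above, in MB.
for dict in 96 200 256 1536; do
    echo "rep:${dict}mb -> about $(rep_decompress_ram_mb "$dict") MB RAM on extraction"
done
```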

If you decide to use rep and have any questions I would gladly help.

Ah, my fault. No need to tar in this case as it doesn’t change the results as in srep’s case. Try these commands:

compression:
rep -b256mb gateway.ova gateway.rep
rep -b256mb workstation.ova workstation.rep

decompression:
rep -d gateway.rep gateway.ova
rep -d workstation.rep workstation.ova

Though you could use any arbitrary extension instead of rep
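To make sure the round trip really restores the image byte for byte, something like the following could be used. It is only a sketch: roundtrip_ok is a made-up helper name, gateway.restored.ova is a placeholder output name, and cmp is assumed to be available (on Windows, for example, from a Unix-like environment).

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the restored file is
# byte-for-byte identical to the original.
roundtrip_ok() {
    cmp -s "$1" "$2"
}

# Possible usage with the rep commands above (file names are placeholders):
#   rep -b256mb gateway.ova gateway.rep
#   rep -d gateway.rep gateway.restored.ova
#   roundtrip_ok gateway.ova gateway.restored.ova && echo "round trip OK"
```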

I should have found this earlier: srep is deterministic too if you change the hash to one of these options:
-hash=md5
-hash=sha1
-hash=sha512
or disable it with -hash-. Is there any benefit to using hashes? Not using them will make it faster.

I think the road is clear to using this in Whonix Installer

@Patrick could you try srep for that ticket too, you may need to play with another option in case it doesn’t handle the file efficiently:

Default settings (-l512) allow to process files that are 10x larger than RAM size. Memory requirements are proportional to 1/L, so by increasing -l option value it’s possible to process even larger files. For example, with -l64k RAM usage will be about 1/1000 of filesize.

If you have more than 10 GB RAM it may not be necessary, otherwise try changing this option.
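A rough way to read the quoted note is sketched below. It assumes RAM is about filesize/10 at -l512 and scales with 1/L, which is an interpretation of the quote and not something taken from the srep source; srep_ram_estimate_mb is a made-up helper name, and real usage will differ.

```shell
#!/bin/sh
# Hypothetical helper: rough RAM estimate in MB for a given file size (MB)
# and -l value (bytes), assuming RAM ~ filesize/10 at -l512 and RAM ~ 1/L.
srep_ram_estimate_mb() {
    file_mb=$1
    l_bytes=$2
    echo $(( file_mb * 512 / (10 * l_bytes) ))
}

srep_ram_estimate_mb 4000 512      # prints 400 (about 1/10 of the file size)
srep_ram_estimate_mb 4000 65536    # prints 3 (about 1/1000, as with -l64k)
```

Integer division rounds down, so treat the numbers as orders of magnitude only.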

This link includes both 32-bit and 64-bit executables for linux and windows along with the full source code necessary to build it.

http://freearc.org/download/research/srep393a.zip

@Patrick have you seen this?

Whonix KVM:
Btw there is one more requirement for these compression tools.
Usability. These compression tools should be installable from Debian
(and Fedora) repository. Otherwise we would have to demand from users to
download, verify and install software from the internet, which can go
wrong and makes the instructions less usable and more lengthy.

freearc for example is not in Debian.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=689017

Sorry for not stating that beforehand. That was a wrong assumption for
me to make.

Windows installer:
Ideally, it works as is. An exe that works out of the box. No extra
prerequisite software download required.

according to this

bsdtar should fix your problem. Did you try it yet?

@Ego I have tested all settings of srep and these provide the optimal size and performance for decompression along with deterministic compression:

compression:
7za a -ttar whonix.tar gateway.ova workstation.ova
srep -m4f -l128 -hash- whonix.tar

decompression:
srep -d -hash- whonix.tar.srep - | 7za x -ttar -si

Decompression of that archive requires around 1 GB of RAM. Compress with this instead and decompression will not require any RAM:

srep -m4o -l128 -hash- whonix.tar

But now extraction would take around 1 minute longer

Both of these options reduce the total size from 3,89 GB to 2,06 GB
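Since determinism is the point here, a simple way to check it is to run the same compression command twice and compare the hashes of the output. The following is a minimal sketch, assuming sha256sum is available (for example from Git for Windows); runs_are_deterministic is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: runs_are_deterministic OUTFILE CMD [ARGS...]
# Runs CMD twice, hashing OUTFILE after each run.
# Returns 0 if both hashes match, 1 if they differ, 2 if CMD failed.
runs_are_deterministic() {
    out=$1
    shift
    rm -f "$out"                      # start from a clean state
    "$@" >/dev/null 2>&1 || return 2
    first=$(sha256sum "$out" | cut -d' ' -f1)
    rm -f "$out"                      # force a fresh second run
    "$@" >/dev/null 2>&1 || return 2
    second=$(sha256sum "$out" | cut -d' ' -f1)
    [ "$first" = "$second" ]
}

# Possible usage with the srep command above:
#   runs_are_deterministic whonix.tar.srep srep -m4f -l128 -hash- whonix.tar \
#       && echo "deterministic" || echo "differs between runs (or srep failed)"
```

This only shows that two runs on the same machine agree; it says nothing about other machines or srep versions.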

Good day,

Thank you very much. That’s exceptional. A tarball, though, doesn’t hash consistently in its standard implementation, as far as I know, since it includes timestamps and other attributes that depend on the machine the archive was created on.

I’ll thus have to look into a Windows compatible solution to remove those first.

Have a nice day,

Ego

Are you thinking of using some command line tool(s) to fix the file attributes? At least I know that creating a tar using these commands creates the same file each time on my machine

Good day,

I haven’t yet tried different implementations. I only know that tarballs are non-deterministic by design.

Have a nice day,

Ego

It could be deterministic:
https://www.gnu.org/software/tar/manual/html_section/tar_33.html
https://reproducible-builds.org/docs/archives/

If we have the same timestamps and filenames and create the tar using the same commands, 7za should create identical files: as I found out, it doesn’t add user names or IDs by default and uses zeros instead. But to use other options such as --mtime we need GNU tar itself, for example via Linux in a VM or cygwin/mingw. Most tar.exe binaries online are old and don’t support --mtime. Git for Windows includes the latest version of tar.exe, so it can be grabbed from there. You may also build it yourself.

We may also overwrite the file mode with --mode= option to ensure it will be the same. 7za is creating tars with mode 1777 while tar.exe creates with 644 for me.

Notes to @Patrick

For really deterministic tars, you should probably add --sort=name --owner=0 --group=0 --numeric-owner and use 00:00:00Z instead of 00:00:00 in mtime to specify the UTC timezone.
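Put together, the options mentioned here could look like the sketch below. It assumes a reasonably recent GNU tar (--sort=name is not available in older releases); make_deterministic_tar is a made-up helper name and the 2000-01-01 date is just the example timestamp used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: make_deterministic_tar OUT FILE...
# Creates a tar whose metadata does not depend on the building machine.
# Note: --sort=name only orders directory contents; files named on the
# command line are archived in the order given, so keep that order fixed.
make_deterministic_tar() {
    out=$1
    shift
    tar --sort=name \
        --owner=0 --group=0 --numeric-owner \
        --mtime='2000-01-01 00:00:00Z' \
        -cf "$out" "$@"
}

# Possible usage:
#   make_deterministic_tar whonix.tar gateway.ova workstation.ova
```

File modes are still taken from the filesystem here, so the --mode= option mentioned above may be needed as well if they differ between machines.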

Let’s compare now if 7za creates deterministic tars or not

Use the stable whonix ovas named as gateway.ova and workstation.ova. Set their timestamps to 2000-01-01 00:00:00 UTC. I used touch.exe from the “git for windows” package:

touch -d "2000-01-01 00:00:00 UTC" gateway.ova
touch -d "2000-01-01 00:00:00 UTC" workstation.ova

now let’s compress

7za a -ttar whonix.tar gateway.ova workstation.ova
srep -m4f -l128 -hash- whonix.tar

my SHA256 hashes

whonix.tar
F2265763F18717328E10FB8FA7FBC589B6E4D8C84F3DFEBDFE69213C51108557
whonix.tar.srep
0BF398F5C137360DFC2773BEA56502F24A7D302FF33759F53E9CB233678C18BD
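To compare your own results against hashes like these, a small check along the following lines might help. It is a sketch assuming sha256sum is available (it prints lowercase hex, while the hashes above are uppercase, hence the case folding); check_sha256 is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: check_sha256 FILE EXPECTED_HEX
# Succeeds if the SHA256 of FILE matches EXPECTED_HEX, ignoring letter case.
check_sha256() {
    file=$1
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    actual=$(sha256sum "$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}

# Possible usage with the hash posted above:
#   check_sha256 whonix.tar F2265763F18717328E10FB8FA7FBC589B6E4D8C84F3DFEBDFE69213C51108557 \
#       && echo "whonix.tar matches" || echo "whonix.tar differs"
```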