[archived] Previous, now Deprecated Whonix Windows Installer Testing

The other deduplication tool, “rep” from FreeArc, apparently produces deterministic results. Though it is not as powerful as srep, the results are still significant, and it is fast.

It is available as a separate executable, which could be used after tar, as I did with srep above.

http://freearc.org/download/research/rep.zip

It is also built into FreeArc itself; if the files are compressed with FreeArc, they can be extracted in a single step.

Some tests using different dictionary sizes:

setting       size
original      3.89 GB
rep:96mb      3.05 GB
rep:200mb     2.92 GB
rep:256mb     2.90 GB
rep:1536mb    2.84 GB

Extraction time: around 1 minute 30 seconds.

Decompression memory is twice the dictionary size, so if you choose rep:256mb, extraction will require 512 MB of RAM.
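The table can be turned into relative savings and extraction memory with a few lines of arithmetic. A small Python sketch using only the numbers above, assuming the “twice the dictionary” memory rule applies to every dictionary size and not just rep:256mb:

```python
# Sizes measured above (GB) for each rep dictionary setting.
ORIGINAL_GB = 3.89
RESULTS_GB = {96: 3.05, 200: 2.92, 256: 2.90, 1536: 2.84}  # dictionary MB -> output GB


def saving_percent(compressed_gb: float, original_gb: float = ORIGINAL_GB) -> float:
    """Percentage of the original size removed by deduplication."""
    return (1 - compressed_gb / original_gb) * 100


def extraction_ram_mb(dictionary_mb: int) -> int:
    """Decompression memory, assuming the 2x-dictionary rule quoted above."""
    return 2 * dictionary_mb


if __name__ == "__main__":
    for dict_mb, size_gb in RESULTS_GB.items():
        print(f"rep:{dict_mb}mb  {size_gb:.2f} GB  "
              f"saves {saving_percent(size_gb):.1f}%  "
              f"needs ~{extraction_ram_mb(dict_mb)} MB RAM to extract")
```

Under that assumption, going from rep:256mb to rep:1536mb saves only about 0.06 GB more (2.90 GB vs 2.84 GB) while extraction memory grows from 512 MB to 3072 MB, which is one argument for a mid-sized dictionary.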

If you decide to use rep and have any questions I would gladly help.

Ah, my fault. There is no need to tar in this case: unlike with srep, tarring first doesn’t change the results. Try these commands:

compression:
rep -b256mb gateway.ova gateway.rep
rep -b256mb workstation.ova workstation.rep

decompression:
rep -d gateway.rep gateway.ova
rep -d workstation.rep workstation.ova

You could use any other extension instead of .rep, though.

I should have found this earlier: srep is deterministic too, if you change the hash to one of these options:
-hash=md5
-hash=sha1
-hash=sha512
or disable it with -hash-. Is there any benefit to using hashes? Not using one will make it faster.
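Whether a given option set is deterministic can be checked empirically: run the same command twice into two different output files and compare them byte for byte. A minimal Python sketch; `is_deterministic` is a hypothetical helper, and the tool and its arguments are supplied by the caller:

```python
import filecmp
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def is_deterministic(make_cmd: Callable[[Path], Sequence[str]],
                     out_a: Path, out_b: Path) -> bool:
    """Run the same compression command twice, once per output path,
    and report whether the two outputs are byte-for-byte identical."""
    for out in (out_a, out_b):
        subprocess.run([*make_cmd(out)], check=True)
    # shallow=False forces a content comparison instead of a stat() comparison.
    return filecmp.cmp(out_a, out_b, shallow=False)
```

For example, with the rep command from this thread: `is_deterministic(lambda out: ["rep", "-b256mb", "gateway.ova", str(out)], Path("run1.rep"), Path("run2.rep"))`.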

I think the road is clear for using this in the Whonix Installer.

@Patrick, could you try srep for that ticket too? You may need to play with another option in case it doesn’t handle the file efficiently:

Default settings (-l512) allow to process files that are 10x larger than RAM size. Memory requirements are proportional to 1/L, so by increasing -l option value it’s possible to process even larger files. For example, with -l64k RAM usage will be about 1/1000 of filesize.

If you have more than 10 GB of RAM, this may not be necessary; otherwise, try changing this option.
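The scaling rule in the srep notes can be turned into a rough estimate. A back-of-the-envelope Python sketch, reading “files 10x larger than RAM” as RAM ≈ filesize/10 at -l512 and assuming memory is exactly proportional to 1/L; real usage will differ, so treat the numbers as orders of magnitude only:

```python
GB = 1024 ** 3


def srep_ram_estimate(file_bytes: float, l_value: int = 512) -> float:
    """Very rough srep memory estimate in bytes.

    Anchored at the documented default (-l512 handles files ~10x larger than
    RAM, i.e. RAM ~ file/10) and scaled by 512/L, since memory is ~1/L."""
    return file_bytes / 10 * (512 / l_value)


if __name__ == "__main__":
    ova_total = 3.89 * GB  # combined size of the two OVAs tested in this thread
    for l_value in (128, 512, 65536):
        print(f"-l{l_value}: ~{srep_ram_estimate(ova_total, l_value) / GB:.2f} GB")
```

With -l65536 (64k) this gives filesize/1280, in line with the “about 1/1000” quoted above; -l128 gives roughly 0.4 × filesize.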

This link includes both 32-bit and 64-bit executables for Linux and Windows, along with the full source code necessary to build it.

http://freearc.org/download/research/srep393a.zip

@Patrick have you seen this?

Whonix KVM:
Btw, there is one more requirement for these compression tools: usability. These compression tools should be installable from the Debian (and Fedora) repositories. Otherwise we would have to demand that users download, verify and install software from the internet, which can go wrong and makes the instructions less usable and more lengthy.

freearc for example is not in Debian.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=689017

Sorry for not stating that beforehand. That was a wrong assumption for me to make.

Windows installer:
Ideally, it works as is: an exe that works out of the box, with no extra prerequisite software download required.

According to this, bsdtar should fix your problem. Did you try it yet?

@Ego, I have tested all srep settings, and these give the best size and decompression performance while keeping compression deterministic:

compression:
7za a -ttar whonix.tar gateway.ova workstation.ova
srep -m4f -l128 -hash- whonix.tar

decompression:
srep -d -hash- whonix.tar.srep - | 7za x -ttar -si
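In this pipeline srep writes the tar to standard output (-) and 7za reads it from standard input (-si), so no intermediate whonix.tar is written. If an installer drives these tools from a program rather than a shell, the same pipe can be set up with Python’s subprocess module. A minimal sketch; `pipe` is a hypothetical helper, and it assumes srep and 7za are on PATH:

```python
import subprocess
from typing import Sequence


def pipe(producer: Sequence[str], consumer: Sequence[str]) -> None:
    """Stream producer's stdout into consumer's stdin, like `producer | consumer`."""
    src = subprocess.Popen(list(producer), stdout=subprocess.PIPE)
    dst = subprocess.Popen(list(consumer), stdin=src.stdout)
    # Drop the parent's copy of the pipe so the producer gets SIGPIPE (or a
    # write error on Windows) if the consumer exits early.
    src.stdout.close()
    dst_rc = dst.wait()
    src_rc = src.wait()
    if src_rc != 0 or dst_rc != 0:
        raise RuntimeError(f"pipeline failed: producer={src_rc}, consumer={dst_rc}")
```

Usage, mirroring the command above: `pipe(["srep", "-d", "-hash-", "whonix.tar.srep", "-"], ["7za", "x", "-ttar", "-si"])`.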

Decompression requires around 1 GB of RAM. Use this instead, and it will not require any RAM:

srep -m4o -l128 -hash- whonix.tar

However, extraction then takes around 1 minute longer.

Both options reduce the total size from 3.89 GB to 2.06 GB.

Good day,

Thank you very much. That’s exceptional. A tarball, though, doesn’t hash consistently in its standard implementation, as far as I know, since it includes timestamps and other metadata that depend on the machine the archive was created on.

I’ll thus have to look into a Windows-compatible solution to remove those first.

Have a nice day,

Ego

Are you thinking of using some command-line tool(s) to fix the file attributes? At least I know that creating a tar with these commands produces the same file each time on my machine.

Good day,

I haven’t yet tried different implementations. I only know that tarballs are, by design, non-deterministic.

Have a nice day,

Ego

It could be deterministic:
https://www.gnu.org/software/tar/manual/html_section/tar_33.html
https://reproducible-builds.org/docs/archives/

If we have the same timestamps and filenames and create the tar with the same commands, 7za should create identical files, since, as I found out, it doesn’t add user names or IDs by default; it uses zeros instead. But to use other options such as --mtime, we need GNU tar itself, e.g. on Linux in a VM or via Cygwin/MinGW. Most tar.exe binaries online are old and don’t support --mtime. Git for Windows includes a recent version of tar.exe, so it can be grabbed from there. You may also build it yourself.

We may also override the file mode with the --mode= option to make sure it is always the same. For me, 7za creates tars with mode 1777, while tar.exe uses 644.

Notes to @Patrick

For really deterministic tars, you should probably add --sort=name --owner=0 --group=0 --numeric-owner and use 00:00:00Z instead of 00:00:00 in mtime to specify the UTC timezone.
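For comparison, the same normalisations can be applied with Python’s standard tarfile module, which runs anywhere Python does, including Windows. A minimal sketch, not what Whonix uses: the fixed mode 0o644, the 2000-01-01 timestamp, GNU_FORMAT and the name `write_deterministic_tar` are illustrative choices, and it only handles regular files:

```python
import tarfile
from pathlib import Path
from typing import Iterable

FIXED_MTIME = 946684800  # 2000-01-01 00:00:00 UTC, the timestamp used in this thread


def write_deterministic_tar(out: Path, files: Iterable[Path]) -> None:
    """Write a tar whose bytes depend only on member names and contents."""
    with tarfile.open(str(out), "w", format=tarfile.GNU_FORMAT) as tar:
        for path in sorted(files, key=lambda p: p.name):   # like --sort=name
            info = tar.gettarinfo(str(path), arcname=path.name)
            if not info.isfile():
                raise ValueError(f"only regular files are handled: {path}")
            info.uid = info.gid = 0            # like --owner=0 --group=0
            info.uname = info.gname = ""       # like --numeric-owner: no names stored
            info.mtime = FIXED_MTIME           # like --mtime='2000-01-01 00:00:00Z'
            info.mode = 0o644                  # a fixed mode, like --mode=644
            with path.open("rb") as fh:
                tar.addfile(info, fh)
```

Usage: `write_deterministic_tar(Path("whonix.tar"), [Path("gateway.ova"), Path("workstation.ova")])`. Its output is only expected to match other runs of the same sketch, not a tar produced by GNU tar or 7za with the flags above.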

Now let’s check whether 7za creates deterministic tars or not.

Use the stable Whonix OVAs, named gateway.ova and workstation.ova. Set their modification timestamps to 2000-01-01 00:00:00 UTC. I used touch.exe from the “Git for Windows” package:

touch -d "2000-01-01 00:00:00 UTC" gateway.ova
touch -d "2000-01-01 00:00:00 UTC" workstation.ova
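Where touch.exe is not available, the same timestamps can be pinned with Python’s os.utime. A minimal sketch; `pin_mtime` is a hypothetical helper name:

```python
import os
from datetime import datetime, timezone
from pathlib import Path

# 2000-01-01 00:00:00 UTC as a Unix timestamp (946684800).
FIXED = datetime(2000, 1, 1, tzinfo=timezone.utc).timestamp()


def pin_mtime(path: Path, when: float = FIXED) -> None:
    """Set access and modification time, as `touch -d ...` does."""
    os.utime(path, (when, when))
```

Usage: `for name in ("gateway.ova", "workstation.ova"): pin_mtime(Path(name))`.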

Now let’s compress:

7za a -ttar whonix.tar gateway.ova workstation.ova
srep -m4f -l128 -hash- whonix.tar

My SHA-256 hashes:

whonix.tar
F2265763F18717328E10FB8FA7FBC589B6E4D8C84F3DFEBDFE69213C51108557
whonix.tar.srep
0BF398F5C137360DFC2773BEA56502F24A7D302FF33759F53E9CB233678C18BD
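To compare your own files with the hashes above on any OS, a chunked SHA-256 is enough. A minimal Python sketch: it reads 1 MiB at a time so multi-gigabyte files need not fit in memory, and compares case-insensitively because the hashes above are printed in upper case. `matches_posted` is a hypothetical helper name:

```python
import hashlib
import sys
from pathlib import Path

# SHA-256 values posted above for the 7za + srep run.
POSTED = {
    "whonix.tar": "F2265763F18717328E10FB8FA7FBC589B6E4D8C84F3DFEBDFE69213C51108557",
    "whonix.tar.srep": "0BF398F5C137360DFC2773BEA56502F24A7D302FF33759F53E9CB233678C18BD",
}


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def matches_posted(path: Path) -> bool:
    """True if the file's name is listed above and its hash matches."""
    expected = POSTED.get(path.name)
    return expected is not None and sha256_file(path).lower() == expected.lower()


if __name__ == "__main__":
    for name in sys.argv[1:]:
        p = Path(name)
        print(p.name, "OK" if matches_posted(p) else "MISMATCH")
```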

anonymous1:

Notes to @Patrick

For really deterministic tars, you should probably add --sort=name --owner=0 --group=0 --numeric-owner and use 00:00:00Z instead of 00:00:00 in mtime to specify the UTC timezone.

Thanks!

That alone did not work for me. Solution here:

https://www.whonix.org/pipermail/whonix-devel/2017-January/000852.html

This is related to recent work on using cowbuilder to build Whonix
packages as well as making orig.tar.xz / debian.tar.xz archive
generation deterministic. ( https://phabricator.whonix.org/T52 )

Relevant excerpt of genmkfile:

   LC_ALL=C.UTF-8
   TZ=UTC
   export LC_ALL TZ

   find \
      "." \
      -not -iwholename '*.git*' \
      -print0 \
         | tar \
            --null \
            --no-recursion \
            --create \
            --verbose \
            --owner=root --group=root --numeric-owner \
            --mode=go=rX,u+rw,a-s \
            --sort=name \
            --mtime='2015-10-21 00:00Z' \
            --xz \
            --file="$make_upstream_tarball_real_path" \
            -T \

That mtime option, '--mtime=2015-10-21 00:00Z', does not work for me.

Do you mean it works but you see 02:00 instead? Maybe your system time zone was not set to UTC; in that case it should only be a cosmetic issue, I think.
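The 02:00 is consistent with the archive storing a UTC instant that the listing renders in local time. A small Python sketch of that effect, assuming a viewer in a UTC+2 zone:

```python
from datetime import datetime, timedelta, timezone

# '2015-10-21 00:00Z': the trailing Z pins the instant to UTC.
instant = datetime(2015, 10, 21, 0, 0, tzinfo=timezone.utc)
epoch = int(instant.timestamp())  # the zone-independent value a tar header stores

# The same stored value rendered in UTC and in a UTC+2 zone.
as_utc = datetime.fromtimestamp(epoch, timezone.utc)
as_plus2 = datetime.fromtimestamp(epoch, timezone(timedelta(hours=2)))
print(epoch, as_utc.strftime("%H:%M"), as_plus2.strftime("%H:%M"))  # 1445385600 00:00 02:00
```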

--owner=root --group=root

This is apparently different from using --owner=0 --group=0, as it adds some IDs to the file, whereas with the latter the IDs are saved as zero. For me it is cleaner. You can see that by creating a small tar file and opening it in Notepad.
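The Notepad check can be made more precise by reading the first 512-byte header and slicing out the owner fields. A minimal Python sketch: the byte offsets are those of the ustar header layout, which GNU-format headers share for these fields. It assumes the first member has a short name, so no GNU long-name header comes first. `first_header_fields` is a hypothetical helper:

```python
import tarfile
from pathlib import Path

# Byte ranges of the owner fields in a 512-byte ustar/GNU tar header.
FIELDS = {"uid": (108, 116), "gid": (116, 124), "uname": (265, 297), "gname": (297, 329)}


def first_header_fields(tar_path: Path) -> dict:
    """Raw owner fields of the first member, as a text editor would show them
    (trailing NULs and spaces stripped)."""
    with tar_path.open("rb") as fh:
        header = fh.read(tarfile.BLOCKSIZE)
    return {name: header[a:b].rstrip(b"\0 ") for name, (a, b) in FIELDS.items()}
```

An archive written with numeric IDs only should show empty uname/gname and zero-filled uid/gid; one that records names shows b"root" in those fields.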

anonymous1:

That mtime option, '--mtime=2015-10-21 00:00Z', does not work for me.

Do you mean it works but you see 02:00 instead?

Yes.

Maybe your system time zone was not set to UTC; in that case it should only be a cosmetic issue, I think.

Right. genmkfile now sets the time zone to UTC.

--owner=root --group=root

This is apparently different from using --owner=0 --group=0, as it adds user/group names and IDs to the file as root, whereas with the latter they are not added and the IDs are saved as zero. For me it is cleaner. You can see that by creating a small tar file and opening it in Notepad.

Using the recommendation from “Archive metadata” on reproducible-builds.org:

Using root/root and --numeric-owner is a safe bet, as it will effectively record 0 as values:


GNU ar and other tools from binutils have a deterministic mode which will use zero for UIDs, GIDs, timestamps, and use consistent file modes for all files.

Even they say they are trying to achieve zeros, but somehow, when I use root, it doesn’t really record zeros the way it does when I use 0. Maybe I’m missing something, or maybe they are. They also don’t give any recommendation on that page for setting file modes, other than mentioning binutils.

As a side note, when I use --owner=0 --group=0, no names are added to the archive and the IDs are filled with zeros. With --owner=root --group=root, the IDs are definitely not filled with zeros on my machine, and the name “root” is added twice. --numeric-owner removes the names but doesn’t change the IDs, while it doesn’t (need to) do anything if 0s are used instead.
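The general point, that owner names alone change the archive bytes even when the numeric IDs are identical, can be reproduced with Python’s tarfile module. A sketch that illustrates the effect only; it does not model GNU tar’s exact behaviour on Windows, and `one_file_tar` is a hypothetical helper:

```python
import io
import tarfile


def one_file_tar(uname: str, gname: str, uid: int = 0, gid: int = 0) -> bytes:
    """Archive a fixed 3-byte file in memory; only the owner metadata varies."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        info = tarfile.TarInfo("hello.txt")
        info.size, info.mtime, info.mode = 3, 0, 0o644
        info.uid, info.gid, info.uname, info.gname = uid, gid, uname, gname
        tar.addfile(info, io.BytesIO(b"abc"))
    return buf.getvalue()


numeric_only = one_file_tar("", "")          # IDs 0, no names stored
with_names = one_file_tar("root", "root")    # same IDs, names stored as well
print(numeric_only == with_names)            # False: the names alone change the bytes
```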

This is from the strip-nondeterminism source code for ar archives:

    # mtime
    syswrite $fh,
      sprintf("%-12d", $File::StripNondeterminism::canonical_time // 0);
    # owner
    syswrite $fh, sprintf("%-6d", 0);
    # group
    syswrite $fh, sprintf("%-6d", 0);
    # file mode
    syswrite $fh,
      sprintf("%-8o", ($file_mode & oct(100)) ? oct(755) : oct(644));
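For readers less at home in Perl, here is a line-by-line Python rendering of the snippet above: it builds the same four header fields as strings. It is a sketch for illustration, not code from strip-nondeterminism, and `normalized_ar_fields` is a hypothetical name:

```python
from typing import Optional


def normalized_ar_fields(file_mode: int, canonical_time: Optional[int] = None) -> bytes:
    """mtime, owner, group and mode fields of an ar member header, normalised."""
    mtime = canonical_time if canonical_time is not None else 0   # Perl: // 0
    mode = 0o755 if file_mode & 0o100 else 0o644                  # owner-exec bit decides
    return (f"{mtime:<12d}"      # sprintf("%-12d", ...)
            + f"{0:<6d}"         # owner
            + f"{0:<6d}"         # group
            + f"{mode:<8o}"      # sprintf("%-8o", ...)
            ).encode("ascii")
```

Only the owner-execute bit (0o100) matters here, so every member ends up 0644 or 0755 regardless of its original group and other bits.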

So it is neither a safe bet nor reasonable to use “root” just to end up at “0”. That assumes root is 0, which may not be the case, especially in my case of using tar.exe on Windows. I just tried tar in Whonix, and both commands produced the same output; on Windows, however, that is not the case, and the safest bet is to use --owner=0 --group=0 --numeric-owner to keep determinism across operating systems. You may want to report this “upstream”.

I am not familiar with this: --mode=go=rX,u+rw,a-s

What does it do? What are the benefits compared to using --mode=644 or --mode=755?

Does it always produce the same permissions across systems?
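GNU tar’s --mode accepts the same symbolic syntax as chmod, applied clause by clause, so the question can be explored with a small model. The Python sketch below is a simplified model of those semantics (permission, setuid/setgid and sticky bits only), written for illustration rather than taken from tar; `normalize_mode` is a hypothetical name:

```python
import stat


def normalize_mode(mode: int, is_dir: bool = False) -> int:
    """Simplified model of --mode=go=rX,u+rw,a-s; clauses apply left to right."""
    mode &= 0o7777
    # go=rX: clear group/other bits (incl. setgid and sticky), then grant r,
    # plus x if the entry is a directory or any execute bit is already set.
    any_x = bool(mode & 0o111) or is_dir
    mode &= ~(stat.S_ISGID | stat.S_IRWXG | stat.S_ISVTX | stat.S_IRWXO)
    mode |= 0o044 | (0o011 if any_x else 0)
    # u+rw: the owner can always read and write.
    mode |= 0o600
    # a-s: drop setuid and setgid.
    mode &= ~(stat.S_ISUID | stat.S_ISGID)
    return mode
```

Under this model a regular file ends up 0644 or 0755 depending only on whether it already had some execute bit, so unlike a fixed --mode=644 it keeps executables executable. It does not erase every input difference, though: a 644 file from tar.exe stays 0644, while an input that already carries execute bits, such as the 1777 reported for 7za, ends up with them.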

Got a reply to my tar / mtime question.

http://lists.alioth.debian.org/pipermail/reproducible-builds/Week-of-Mon-20170116/008119.html

It turns out exporting TZ=UTC may not be required, but it also looks very safe and sane to do, and it prevents some confusion, so it is probably good to keep.


I don’t know. My approach is rather basic: I follow authoritative arguments here. I chose the Debian Reproducible Builds team as the experienced experts on the topic and follow their recommendations as long as they seem sensible. This was introduced here:

commit 0fe840b4dd3c82b88a2d62550de94d11c3f5731d
Author: Patrick Schleizer <adrelanos@riseup.net>
Date:   Thu Jan 19 09:40:35 2017 +0000

    add --mode=go=rX,u+rw,a-s to tar to avoid non-determinism
    
    as suggested by https://wiki.debian.org/ReproducibleBuilds/VaryingPermissionsInTarballs

Then the strategy is to keep testing it. Should issues arise (non-determinism reported), I’d investigate further. For Whonix 14, Whonix deb reproducibility was only on a best-effort basis. ( https://phabricator.whonix.org/T52 )
More progress is scheduled during Whonix 15 development. ( T615: use Reproducible Builds Experimental Toolchain by Debian )
(Or earlier if someone contributes.)


Having said that, you seem to be knowledgeable on the topic.

Please consider re-posting that question on the Debian reproducible builds mailing list.
( Reproducible-builds Info Page )

That would quite likely lead to a more informed answer to your question, and it would also be a great service to Whonix.


Your root vs 0 argument seems solid. Could you report it on the reproducible builds mailing list please?

https://lists.reproducible-builds.org/pipermail/rb-general/2017-January/000287.html

https://lists.alioth.debian.org/pipermail/reproducible-builds/Week-of-Mon-20170116/008127.html
