Ok, as I expected it is not even worth using a regular compressor like 7zip or freearc at all.
Using only srep without any modification (drag&drop to srep.exe) the total size was reduced to 2,12 GB.
In my test, extracting the ova images during the Whonix Installer took around 5 minutes, extracting the above compressed file (2,06 GB) took around 7 minutes, and extracting the srep compressed file (2,12 GB) took about 1 minute.
Try for yourself
So both the user and server could save about 1,5 GB bandwidth per download and extraction could be around 4-5 times faster
One small issue I am seeing is that when new Non-Qubes-Whonix images are announced in the Whonix blog, the Windows installer ones will lag behind. Let’s say I post a call for testing blog post. Will these get a Windows installer? Or will only stable Whonix releases get an installer? The Windows installer is great to have either way!
In an ideal world, building Windows UI and Windows installer would be done in a single run of the build script. I.e. the Windows UI and installer source code referenced here https://github.com/Whonix/Whonix in a Windows sub folder or so.
Is this realistic at all? Obviously, Windows tools are required as build dependencies for the build process. Build security would also be involved. Windows dependencies are often only downloadable over http and provide no gpg signatures. These would then be fully capable of compromising all build images, unless they are run within a VM.
We probably cannot automate installation and usage of VisualStudio for all sorts of reasons (gui, legal). Perhaps the Whonix build script could fetch a binary build of the Whonix Windows UI and gpg verify it. For the Windows installer, however, that obviously wouldn’t work since we need to put the newly built ovas inside there.
Not just NSA. I wonder if such a huge entity gets interested, that is also on the level of the CIA. Wouldn’t they just physically tamper with developer machines while no one is at home?
Of course very much desirable to make sure it is backdoor free.
That however is very hard. The ovas are not reproducible yet. This is nearly impossible before Debian raw VM images are fully reproducible, which I have been told is a few years away. Installable Debian debs are not reproducible yet. Let alone installed Debian packages.
For more info on reproducible builds, have a look at the following pages:
Again, thank you very much. Just tested it and seem to be able to get some very good results indeed. Will look at ways of utilizing it for future versions.
Honestly, I personally feel like only stable releases should find their way into the Installers. Anything else might overcomplicate things. For those brave enough to test a newer version, they may still use the standard VBox import features.
Sadly compiling everything “in one go” is not possible at the moment. Regarding compromise via the toolset, there are multiple things making this rather hard:
1.) I compile nsisbi from source each and every time I build the installer.
2.) VisualStudio (being freshly installed for every version) does sanity checks by itself during download and installation.
3.) Every other file gets verified by me.
4.) I compile each version on multiple independent EC2-Instances which are accessed via a virtual machine to begin with. These are only used once for their respective purpose, then destroyed.
5.) Once compiled, I verify the binary with a fresh and only once used GPG-Key.
6.) Verify and test this version in another, separate VM on my PC.
7.) Compare the signature from this version to one I got from an installer I simply made on my PC.
8.) Once everything is in order, sign it with my own, permanent GPG key.
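The comparison in steps 4.) and 7.) can be sketched roughly as below. This is only an illustration, not the actual procedure: the helper names (`sha256_of`, `builds_agree`) and the choice of SHA-256 are assumptions, and it presumes the installers built on the different machines have already been copied to one place.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_agree(paths):
    """Hash every build and report whether all digests are identical.

    Returns (agree, digests). `agree` is False for an empty list,
    because there is nothing to compare.
    """
    digests = {str(path): sha256_of(path) for path in paths}
    agree = len(set(digests.values())) == 1
    return agree, digests
```

Calling `builds_agree(["ec2-build.exe", "local-build.exe"])` (hypothetical file names) returns whether the two files are bit-for-bit identical plus the individual digests. It only makes sense because the installer build appears to be deterministic for a given compiler version.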
The process is rather complicated and takes quite a lot of time, though from what I can see, there seems to be a rather small attack surface.
The easiest way to ensure that there is really no code “in there”, which somehow slipped through my compiling procedures, would really be to compile everything by yourself.
As mentioned, I literally do not save anything from any previous compilation. I always start out on what are fresh machines with nothing on them. So I myself can only rely on the source code found and retrieved from Github.
Regarding the utilization of srep, I think it can only receive a single file at a time, and the way to achieve the best compression is to compress the two images together. So storing the images in a zip/7z file without compression and then using srep would be easy, but would require extracting the files twice. Do you know of any way to achieve this extraction in a single step? I think freearc or peazip could be the easiest way to use srep as a custom compression method. I actually just did this with freearc successfully, but extracting gave me some error. If I can figure it out I will let you know.
So far the only way I found to extract in a single step is to create a tar using 7za and using the commands below. The extraction below doesn’t create any more temporary files (whonix.tar) and takes about 2 minutes for me currently. However srep by default requires 928 MB RAM for decompression of whonix.tar.srep. There is another option (srep -m3o whonix.tar) which will require almost no memory during decompression but it extends decompression time, for me it took a little longer than 3 minutes.
7za a -ttar whonix.tar gateway.ova workstation.ova
srep -d whonix.tar.srep - | 7za x -ttar -si
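If the extraction has to be started from a program rather than a command prompt, the same pipe can be set up as in the rough sketch below. It uses only the two command lines given above. The function names are hypothetical, and it assumes `srep` and `7za` can be found on the PATH.

```python
import subprocess


def extraction_commands(archive="whonix.tar.srep"):
    """Return the two command lines from the post as argument lists."""
    srep_cmd = ["srep", "-d", archive, "-"]   # decompress, writing the tar to stdout
    untar_cmd = ["7za", "x", "-ttar", "-si"]  # read the tar from stdin and extract it
    return srep_cmd, untar_cmd


def extract(archive="whonix.tar.srep"):
    """Run srep and 7za connected by a pipe, so no whonix.tar is written to disk.

    Returns the exit codes as (srep_code, untar_code).
    """
    srep_cmd, untar_cmd = extraction_commands(archive)
    srep = subprocess.Popen(srep_cmd, stdout=subprocess.PIPE)
    untar = subprocess.Popen(untar_cmd, stdin=srep.stdout)
    # Close our copy of the pipe so srep notices if 7za exits early.
    srep.stdout.close()
    untar_code = untar.wait()
    srep_code = srep.wait()
    return srep_code, untar_code
```

An installer would call `extract()` and treat any non-zero exit code as a failed extraction.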
There cannot be an effective anti malware check during download or installation. Once it gets executed, if infected with malware, it would be game over for that machine. I mean, if the downloaded visualstudio setup had malware attached, then no check within the visualstudio setup can detect that. What infests first in memory (and is sophisticated) wins.
Of course, it is impossible to hold Windows build security to the same standards as Linux build security. Since VisualStudio is neither provided with gpg signatures nor as source archive, downloading it over https is the most that can be expected.
Compilation on EC2 is scary. It’s reasonable to expect Amazon build machines being compromised. But since you use it only for comparison, this is actually superb.
I expected it to be built on your PC only.
I am surprised the comparison of your EC2 build vs Windows build is even possible. The ova images are unfortunately not built deterministically. But the build process of the Windows installer apparently is deterministic?
Wouldn’t they just physically tamper with developer machines while no one is at home?
I don’t think any of this could prevent physical attacks on the build machine. You don’t buy new hardware for each build in random stores?
Rather a theoretic question. The build security on my build is a better target. Infect everyone except Qubes-Whonix users. I don’t think it’s possible to defeat silent unnoticed physical hardware tampering when no one is at home. Only reproducible builds and people actually comparing could exclude that threat.
As long as the version of the compiler used is the same, they appear to be completely identical, bit-for-bit.
That’s why I use EC2: to compare against something I can be somewhat certain is independent of my machines at home. Even if we consider EC2 compromised (which we should), the likelihood of it and my compilation at home giving out the same result is rather low (especially compared to the alternative of not comparing at all). Adding to that, the VM I use isn’t connected to the internet while compiling, making it even harder.
Maybe I should get myself a cheap system which doesn’t get any connection to the internet for additional comparison though. I wanted a reason to create an air-gapped system with a custom BIOS and desoldered ports anyway. To keep such a system as safe as possible, I would have to compare signatures by hand/eye though, as any connection via USB, etc. would negate the advantages.
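Comparing by hand/eye is easier when the digest is printed in short groups. The sketch below is only an illustration: `grouped_digest` is a hypothetical helper and SHA-256 is an assumed choice of hash.

```python
import hashlib


def grouped_digest(data, group=4, per_line=8):
    """Format the SHA-256 of `data` as short hex groups, a few groups per line."""
    hex_digest = hashlib.sha256(data).hexdigest()
    groups = [hex_digest[i:i + group] for i in range(0, len(hex_digest), group)]
    lines = [" ".join(groups[i:i + per_line])
             for i in range(0, len(groups), per_line)]
    return "\n".join(lines)
```

With the defaults, the 64 hex characters come out as two lines of eight four-character groups, which can be read aloud or checked off group by group against the output shown on the connected machine.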
Is likely going to be rather impossible. Even if you flash your BIOS every time you use your system, keyboard controllers and others have become capable enough to be problematic.
If we’d assume your machine compromised, you should assume all output seen from EC2 compromised also. It may be fine and have correct checksums on EC2, but the malware could replace the hashsum and adjust real and shown hashsum to fool you.
Sure, that’s an incredibly sophisticated attack. But there has been “comparable” sophisticated malware, well, Stuxnet comes to mind.
Against the sophistication and threat model we are discussing here, that would not help either.
I’d like to repeat: I am discussing this for the rather theoretical purpose of exchanging information. I might learn a thing that may be good to know later. I am not suggesting actual improvements. The only thing which is a serious wish is deterministic builds. I really want them for Whonix generally as soon as possible, since that would be a solid “perfectly secure” approach.
I can understand it in regards to the first quote (breaking into a VM is potentially possible), though why wouldn’t the second quote (regarding USB) help? The scenario I’ve outlined would mean that there is one system which has no physical port, no connection to the internet, open source hardware and BIOS, as well as no other connection to the outside world.
If we are talking about something similar to Stuxnet, this kind of protection should actually help, as an attacker can’t use USB, etc to gain access.
If independent builds were made on there, whose sole purpose is to compare their signature by hand with one of a build done on a machine which in the end is connected, I have a hard time seeing how an attacker would be able to sneak any modifications by.
That might be an issue. Anyways, thank you very much, for telling me beforehand, otherwise I would have likely started to rip out my hair in confusion. That sadly would mean that I’d have to adjust my verification process. Will think of a solution.
I’m not experienced as to how to improve xz compression, assuming you don’t want to experiment with lesser-known compressors.
What I do know is that freearc (with its many unique compression filters and technologies such as srep) and nanozip are the best compressors around in terms of both speed and compression ratio. I don’t know the details of the ticket you mentioned, but I think srep might be the best tool to speed it up again. But then it is not deterministic, and I’m not sure if there is a way to make it so.
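Whether a given compressor is repeatable can at least be checked empirically: compress the same input several times and compare the outputs. The sketch below is a rough illustration that uses Python's built-in `lzma` module (xz format) as a stand-in. `is_deterministic` and `xz_compress` are hypothetical helper names. To test srep the same way, one would have to wrap a call to the srep binary in a similar function, which is not shown here.

```python
import hashlib
import lzma


def is_deterministic(compress, data, runs=3):
    """Compress the same input `runs` times and check that every output is identical."""
    digests = {hashlib.sha256(compress(data)).hexdigest() for _ in range(runs)}
    return len(digests) == 1


def xz_compress(data):
    """Single-threaded xz compression via the standard library (stand-in compressor)."""
    return lzma.compress(data, format=lzma.FORMAT_XZ, preset=6)
```

For example, `is_deterministic(xz_compress, b"whonix" * 10000)` reports whether repeated runs gave identical bytes. Passing this check only shows repeatability on one machine with one library version. It does not prove that another version or another machine would produce the same output.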