Consider adding Retry Loop into build script

Preface:

I’ve done probably a couple dozen --install-to-root builds over the past few weeks.

Some were in Qubes and some were in VirtualBox.

Some were version 8.2, some 8.6.6.0, some 8.6.6.7.

All were through a Tor internet connection.

I seem to be getting inconsistent build errors at various times, even with the same build configuration.

For example, one time a package dependency is not met and throws an error, and the next time it doesn’t, etc.

It seems like a number of these build errors may be due to temporary conditions, such as Tor/internet interruptions or system resource consumption issues.

In those cases, simply waiting a little, refreshing the Tor circuit, etc., and then retrying the failed build step might actually succeed.

Suggestion:

Much of the infrastructure to catch build errors, and pause the script, seems to be in place already. For example, I often press “c + enter” to ignore and continue in my test builds.

I think a “RETRY” option should be added that loops back and tries the failed build step again.

For example, along with the other options: “Press r and enter to retry this step.”

That would avoid having to either ignore the step after a single attempt or quit the build script entirely and start all over from the beginning.

It seems like a Retry Loop could help recover from some build error conditions on the fly, instead of having to start from the top and hope for the best again.

This retry capability could also help one further sort out real persistent build errors that the project should know about from temporary conditional ones that are individual and fleeting.
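
To make the idea concrete, here is a rough sketch of what such a prompt could look like in bash. It is only an illustration under my own assumptions: `run_with_retry_prompt` is a made-up helper name, and none of this is taken from the actual build script.

```shell
#!/bin/bash

# Hypothetical helper, not taken from the Whonix build script.
# Runs the given command. On failure it asks on stdin whether to
# retry, ignore the error and continue, or give up.
run_with_retry_prompt() {
   local exit_code answer
   while true; do
      "$@" && return 0
      exit_code=$?
      echo "ERROR: '$*' failed with exit code $exit_code." >&2
      echo "- Press c and enter to ignore the error and continue." >&2
      echo "- Press r and enter to retry this step." >&2
      echo "- Press enter to give up." >&2
      # An empty answer (or end of input) falls through to "give up".
      answer=""
      read -r answer || true
      case "$answer" in
         r) continue ;;
         c) return 0 ;;
         *) return "$exit_code" ;;
      esac
   done
}
```

A build step would then be wrapped as `run_with_retry_prompt some_build_step`, where `some_build_step` stands for whatever command is being run. Since the loop only ends on success, "c" or giving up, the same step can be retried any number of times in a row.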

Also, it seems that when the build script is using apt-get to download multiple packages, and sometimes fails on one, it doesn’t have error catching infrastructure to handle that.

For example, I recently got:

Err http://security.debian.org/ wheezy/updates/main python-magic i386 5.11-2+deb7u5 [39.1 kB] Connection failed.

The script did not seem to pause for this failed download, where, depending upon the package, something like this maybe could harm the final build and/or spawn additional errors further in the build script which could not be fixed without going back and getting the downloaded package.

So, at some point, adding error catching and a Retry Loop capability to other parts, like these package downloads, could be very helpful for ensuring stable and successful builds.

Thanks!

I don’t see how this is possible. Please post two logs to compare so this can be fixed.

I am always eager to improve the build environment to make development more comfortable.

Let’s start simple with a retry feature for failed bash commands in the error handler?

I just did some changes to implement that:
https://github.com/Whonix/Whonix/commit/4ff311cd80d0b44199c0bc7b140a953e62c54908
https://github.com/Whonix/Whonix/commit/721e470db0a37c57fede0fa49beefae4eca7ee44
https://github.com/Whonix/Whonix/commit/deac7955d36115ea9324f0b5d68f24a087f18138
https://github.com/Whonix/Whonix/commit/389d58aae4b88da3905b501830363a8759c23271
https://github.com/Whonix/Whonix/commit/91a427f336b5ba9ae417788272572d34be7fbd04

Example log:

sudo ./build-steps.d/1100_prepare-build-machine 
+ true 'INFO: Currently running script: ./build-steps.d/1100_prepare-build-machine'
+++ dirname ./build-steps.d/1100_prepare-build-machine
++ cd ./build-steps.d
++ pwd
+ MYDIR=/home/user/whonix_dot/Whonix/build-steps.d
+ cd /home/user/whonix_dot/Whonix/build-steps.d
+ cd ..
+ cd help-steps
+ WHONIX_BUILD_INTERNALRUN=1
+ whonix_build_on_operating_system_detect_skip=1
+ source pre
++ set +x
+ source variables
++ set +x
INFO: Setting... export UWT_DEV_PASSTHROUGH="1"
++ bash -n /home/user/whonix_dot/Whonix/help-steps/parse-cmd
++ source /home/user/whonix_dot/Whonix/help-steps/parse-cmd
++ whonix_build_cmdoptions
++ trap error_handler_general ERR INT TERM
++ '[' '!' 1 = 1 ']'
++ build_machines=0
++ :
++ case $1 in
++ break
++ '[' '' = '' ']'
++ echo 'INFO: No build target (such as --486-linux, --32bit-linux, --64bit-linux, --64bit-kfreebsd [untested] or--32bit-kfreebsd [untested] has been chosen.
Defaulting to BUILD_TARGET_ARCH i386 and setting BUILD_KERNEL_PKGS to:
   linux-image-486
   linux-headers-486
   linux-image-686-pae
   linux-headers-686-pae'
INFO: No build target (such as --486-linux, --32bit-linux, --64bit-linux, --64bit-kfreebsd [untested] or--32bit-kfreebsd [untested] has been chosen.
Defaulting to BUILD_TARGET_ARCH i386 and setting BUILD_KERNEL_PKGS to:
   linux-image-486
   linux-headers-486
   linux-image-686-pae
   linux-headers-686-pae
++ export BUILD_TARGET_ARCH=i386
++ BUILD_TARGET_ARCH=i386
++ export 'BUILD_KERNEL_PKGS=linux-image-486 linux-headers-486 linux-image-686-pae linux-headers-686-pae'
++ BUILD_KERNEL_PKGS='linux-image-486 linux-headers-486 linux-image-686-pae linux-headers-686-pae'
++ '[' i386 = '' ']'
++ '[' '' = 1 ']'
++ '[' '!' 1 = 1 ']'
++ '[' '' = 1 ']'
++ '[' '' = 1 ']'
++ '[' '' = 1 ']'
++ '[' 0 -gt 1 ']'
++ '[' 0 -le 0 ']'
++ '[' '' = 1 ']'
++ '[' 1 = 1 ']'
++ true
++ '[' '' = 1 ']'
++ '[' '' = true ']'
++ '[' '' = true ']'
++ error 'You must choose --install-to-root, --virtualbox or --qcow2.'
/home/user/whonix_dot/Whonix/help-steps/parse-cmd: line 370: error: command not found
+++ error_handler_general
+++ error_handler_shared
+++ last_failed_exit_code=127
+++ last_failed_bash_command='error "${red}${bold}You must choose --install-to-root, --virtualbox or --qcow2.${reset}"'
+++ error_handler_shared_process
++++ caller
+++ last_caller='46 pre'
+++ last_script=pre
+++ benchmark_time_end
++++ date +%s
+++ benchmark_time_end=1410964589
+++ benchmark_took_seconds=0
++++ convertsecs 0
++++ (( h=0/3600 ))
++++ (( m=(0%3600)/60 ))
++++ (( s=0%60 ))
++++ printf '%02d:%02d:%02d\n' 0 0 0
++++ true
+++ benchmark_took_time=00:00:00
+++ true
+++ true '
############################################################
ERROR in pre detected!
(benchmark: 00:00:00)
last_failed_bash_command: error "${red}${bold}You must choose --install-to-root, --virtualbox or --qcow2.${reset}"
last_failed_exit_code: 127
caller: 46 pre
ERROR in pre!
############################################################
'
+++ '[' '!' '' = 1 ']'
+++ read -p 'ERROR in pre detected!
Please have a look above "error_handler_general", note the command that failed, its output and last_failed_exit_code.
- Please enter c and press enter to ignore the error and continue building. (Recommended against!)
- Please press r and enter to retry.
- Please press s and enter to open an chroot interactive shell.
- Please press enter to cleanup and exit. ' answer
ERROR in pre detected!
Please have a look above "error_handler_general", note the command that failed, its output and last_failed_exit_code.
- Please enter c and press enter to ignore the error and continue building. (Recommended against!)
- Please press r and enter to retry.
- Please press s and enter to open an chroot interactive shell.
- Please press enter to cleanup and exit. r
+++ ignore_error=
+++ error_handler_do_retry=
+++ '[' r = continue ']'
+++ '[' r = c ']'
+++ '[' r = s ']'
+++ '[' r = shell ']'
+++ '[' r = r ']'
+++ ignore_error=true
+++ error_handler_do_retry=true
+++ error_handler_retry
+++ true 'INFO: Retrying last_failed_bash_command...: error "${red}${bold}You must choose --install-to-root, --virtualbox or --qcow2.${reset}" '
+++ eval error '"${red}${bold}You' must choose --install-to-root, --virtualbox or '--qcow2.${reset}"'
++++ error 'You must choose --install-to-root, --virtualbox or --qcow2.'
pre: line 31: error: command not found
+++ retry_last_failed_bash_command_exit_code=127
+++ '[' 127 = 0 ']'
+++ true 'INFO: Retry failed. exit code of last_failed_bash_command: 127 '
+++ last_failed_exit_code=127
+++ last_failed_bash_command='error "${red}${bold}You must choose --install-to-root, --virtualbox or --qcow2.${reset}"'
+++ error_handler_shared_process
++++ caller
+++ last_caller='39 pre'
+++ last_script=pre
+++ benchmark_time_end
++++ date +%s
+++ benchmark_time_end=1410964598
+++ benchmark_took_seconds=9
++++ convertsecs 9
++++ (( h=9/3600 ))
++++ (( m=(9%3600)/60 ))
++++ (( s=9%60 ))
++++ printf '%02d:%02d:%02d\n' 0 0 9
++++ true
+++ benchmark_took_time=00:00:09
+++ true
+++ true '
############################################################
ERROR in pre detected!
(benchmark: 00:00:09)
last_failed_bash_command: error "${red}${bold}You must choose --install-to-root, --virtualbox or --qcow2.${reset}"
last_failed_exit_code: 127
caller: 39 pre
ERROR in pre!
############################################################
'
+++ '[' '!' '' = 1 ']'
+++ read -p 'ERROR in pre detected!
Please have a look above "error_handler_general", note the command that failed, its output and last_failed_exit_code.
- Please enter c and press enter to ignore the error and continue building. (Recommended against!)
- Please press r and enter to retry.
- Please press s and enter to open an chroot interactive shell.
- Please press enter to cleanup and exit. ' answer
ERROR in pre detected!
Please have a look above "error_handler_general", note the command that failed, its output and last_failed_exit_code.
- Please enter c and press enter to ignore the error and continue building. (Recommended against!)
- Please press r and enter to retry.
- Please press s and enter to open an chroot interactive shell.
- Please press enter to cleanup and exit. 
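
For readers who do not want to dig through the commits, the mechanism visible in the log can be sketched roughly as follows. This is a simplified, non-interactive stand-in written for illustration: the variable and function names follow the log, but the function bodies are assumptions and not the code from help-steps/pre.

```shell
#!/bin/bash

# Sketch only. Variable and function names follow the log where possible;
# the bodies are assumptions, not the code from help-steps/pre.

error_handler_general() {
   # Capture these first, before any other command overwrites them.
   last_failed_exit_code="$?"
   last_failed_bash_command="$BASH_COMMAND"
   echo "ERROR: last_failed_bash_command: $last_failed_bash_command" >&2
   echo "ERROR: last_failed_exit_code: $last_failed_exit_code" >&2
   error_handler_retry
}

error_handler_retry() {
   echo "INFO: Retrying last_failed_bash_command..." >&2
   retry_last_failed_bash_command_exit_code=0
   # eval re-expands the saved command text. The "||" keeps a second
   # failure from triggering the ERR trap again.
   eval "$last_failed_bash_command" || retry_last_failed_bash_command_exit_code="$?"
   if [ "$retry_last_failed_bash_command_exit_code" = 0 ]; then
      echo "INFO: Retry succeeded." >&2
   else
      echo "INFO: Retry failed. exit code: $retry_last_failed_bash_command_exit_code" >&2
   fi
}

# Any simple command that exits non-zero after this line runs the handler.
trap error_handler_general ERR
```

One caveat of this approach: the retry re-evaluates the whole command text, so a command that had already done part of its work before failing repeats that part as well.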

What do you think?

Would you like to have an auto retry feature?

Let’s see if the bash command retry feature suffices or if a build step retry feature will be required.

Since it’s a more complex change that needs more testing, that feature will be added to Whonix 10 and above.

[
Interestingly, you could also use this change in Whonix 9 without the git commits.

You could copy the edited/new functions
error_handler_shell
error_handler_retry
error_handler_shared
error_handler_shared_process
error_handler_exit
from https://github.com/Whonix/Whonix/blob/91a427f336b5ba9ae417788272572d34be7fbd04/help-steps/pre and paste them into a build configuration file. This is untested, but by doing that, you could re-implement those functions.
]

[quote]
Also, it seems that when the build script is using apt-get to download multiple packages, and sometimes fails on one, it doesn't have error catching infrastructure to handle that.
Err http://security.debian.org/ wheezy/updates/main python-magic i386 5.11-2+deb7u5 [39.1 kB] Connection failed.
The script did not seem to pause for this failed download, where, depending upon the package, something like this maybe could harm the final build and/or spawn additional errors further in the build script which could not be fixed without going back and getting the downloaded package.
[/quote]
It is not possible to break that early. At least not without a giant hack (parsing stdout) or patching apt-get. The build script waits until apt-get exits. It's up to apt-get when it exits. As far as I know, there is no exit-as-soon-as-first-download-failed feature. However, once apt-get has finished, it should reliably exit with a non-zero exit code if a download failed, which the build script will reliably notice and report.

Great. Thanks Patrick!

[quote=“Patrick, post:2, topic:500”][quote author=WhonixQubes link=topic=525.msg4040#msg4040 date=1410959036]
For example, one time a package dependency is not met and throws an error, and the next time it doesn’t, etc.
[/quote]
I don’t see how this is possible. Please post two logs to compare so this can be fixed.[/quote]

I don’t have two comparative logs handy, but I would imagine this could come from a failed package download over Tor.

Back in my 8.2 builds, I mentioned that new error with the “liblwres80” package, but before mentioning that, my 8.2 builds had completed without throwing an error for that package.

I do often see Debian package downloads failing over certain tor circuits and succeeding after refreshing the Tor circuits.

So I’m guessing these inconsistent package dependency errors could simply be due to failed downloads over Tor.

Awesome to hear that

Sounds good.

[quote=“Patrick, post:2, topic:500”]I just did some changes to implement that:

What do you think?[/quote]

With a brief review and without being very acclimated to Whonix code yet, it looks good.

I’ll have to do a development build at some point and look at it more closely.

Would want to make sure if a step continues to fail, one can retry an indefinite number of times in a row.

Yes. Unless there are certain steps where this clearly doesn’t make sense for some reason, then I generally think retrying steps is a straight-forward choice to make and could be auto-retried maybe even a few times by default before prompting the user to intervene. Or maybe even some conditional logic to set the number of retries in a command line option flag, like (--auto-retry=20), with a default number of retries of maybe 0, 1, or a few, if this option flag is not set.

Yup. That might do.

Sounds good.

Good to know.

Yeah, that’s likely what happened in that case.

If after going through all of the apt-get downloads, and it reports a non-zero exit code, then I’m guessing the bash command retry feature would handle this, retry the apt-get downloads, recognize not needing the ones it already has, and attempt the ones that failed before?

This would be good if so.

Quite likely.

[quote]
Would want to make sure if a step continues to fail, one can retry an indefinite number of times in a row.
[/quote]
This is tested; it was already possible with the last commit.
[quote]
Yes. Unless there are certain steps where this clearly doesn't make sense for some reason, then I generally think retrying steps is a straight-forward choice to make and could be auto-retried maybe even a few times by default before prompting the user to intervene. Or maybe even some conditional logic to set the number of retries in a command line option flag, like (--auto-retry=20), with a default number of retries of maybe 0, 1, or a few, if this option flag is not set.
[/quote]

Added --auto-retry (default: 1) and --wait-auto-retry (default: 5) (wait before auto retry in seconds) to error handler.

Implemented in https://github.com/Whonix/Whonix/tree/8d6edb1e77e1d9f60578d013d0d046b88ace0164, commit https://github.com/Whonix/Whonix/commit/8d6edb1e77e1d9f60578d013d0d046b88ace0164.

I guess an --auto-retry default of 1 is good, because some errors, such as a missing command line option, aren’t worth retrying more often. It would unnecessarily flood stdout. However, maybe you have another idea here.

Anything else missing here to ease development? I could also easily add an option for a hook to dispatch (evaluate) a custom command before auto retry. One could then automatically run a custom newnym/wait script or something like that.
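
To illustrate how the two options play together, here is a minimal sketch. The variables `auto_retry` and `wait_auto_retry` and the function `run_with_auto_retry` are hypothetical stand-ins for the two options. The real implementation is in the commit linked above and is not structured like this.

```shell
#!/bin/bash

# Hypothetical stand-ins for --auto-retry (default: 1) and
# --wait-auto-retry (default: 5 seconds).
auto_retry="${auto_retry:-1}"
wait_auto_retry="${wait_auto_retry:-5}"

# Runs the given command and retries it automatically, up to
# $auto_retry times, waiting $wait_auto_retry seconds before each retry.
run_with_auto_retry() {
   local attempt=0 exit_code
   while true; do
      "$@" && return 0
      exit_code=$?
      if [ "$attempt" -ge "$auto_retry" ]; then
         echo "ERROR: '$*' still failing after $attempt auto retries (exit code $exit_code)." >&2
         return "$exit_code"
      fi
      attempt=$((attempt + 1))
      echo "INFO: auto retry $attempt of $auto_retry in $wait_auto_retry seconds..." >&2
      sleep "$wait_auto_retry"
   done
}
```

With the default of 1, a failing command is run twice in total before the error is passed on, which keeps the output short for errors that will never go away by themselves.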

[quote]
If after going through all of the apt-get downloads, and it reports a non-zero exit code, then I'm guessing the bash command retry feature would handle this, retry the apt-get downloads, recognize not needing the ones it already has, and attempt the ones that failed before?
[/quote]
Yes.

Nice.

Sounds good.

Nothing notable that I can think of at the moment. Will keep my mind open.

This could probably be useful in certain circumstances. I could see doing such a newnym/wait procedure.

Thanks again!

Implemented --dispatch-before-retry and --dispatch-after-retry in https://github.com/Whonix/Whonix/tree/c4c11ee2e8219980293788c1a29bbab6e60989c0, git commit https://github.com/Whonix/Whonix/commit/c4c11ee2e8219980293788c1a29bbab6e60989c0.

You could use something like:

--dispatch-before-retry /path/to/your/script
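
Conceptually, the hooks wrap the retry roughly like this. It is a simplified sketch, not the code from the commit: the variables `dispatch_before_retry` and `dispatch_after_retry` and the function `retry_once_with_hooks` are hypothetical names chosen for illustration.

```shell
#!/bin/bash

# Hypothetical stand-ins for --dispatch-before-retry and
# --dispatch-after-retry. Empty means "no hook configured".
dispatch_before_retry="${dispatch_before_retry:-}"
dispatch_after_retry="${dispatch_after_retry:-}"

# Retries the given command once, evaluating the hooks around it.
retry_once_with_hooks() {
   local exit_code=0
   if [ -n "$dispatch_before_retry" ]; then
      # A failing hook is reported but does not prevent the retry itself.
      eval "$dispatch_before_retry" || echo "WARNING: dispatch_before_retry failed." >&2
   fi
   "$@" || exit_code=$?
   if [ -n "$dispatch_after_retry" ]; then
      eval "$dispatch_after_retry" || echo "WARNING: dispatch_after_retry failed." >&2
   fi
   # The result of the retried command is what the caller sees.
   return "$exit_code"
}
```

The script behind /path/to/your/script could be as small as a `sleep 60`, or it could request a new Tor circuit and then wait a bit.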

[quote=“Patrick, post:6, topic:500”][quote author=WhonixQubes link=topic=525.msg4046#msg4046 date=1410973841]
This could probably be useful in certain circumstances. I could see doing such a newnym/wait procedure.
[/quote]
Implemented --dispatch-before-retry and --dispatch-after-retry in https://github.com/Whonix/Whonix/tree/c4c11ee2e8219980293788c1a29bbab6e60989c0, git commit https://github.com/Whonix/Whonix/commit/c4c11ee2e8219980293788c1a29bbab6e60989c0.

You could use something like:

--dispatch-before-retry /path/to/your/script
[/quote]

Nice! Thanks Patrick. :slight_smile: