whonix-ws-16 Template fails to update due to timing issue

Could also be an issue with Debian’s fasttrack repository. To further debug this issue, Enable Debian Fasttrack Repository in Debian Template.

I hit this issue very often, and not only with the fasttrack repo. It’s enough to be unlucky enough to update just after Debian updates the repo (which AFAIR happens every 6h). It is very annoying, especially for openQA tests, where such a failure makes the whole 3h+ job fail. Recently (because of fasttrack?) it started to happen more often, to the point that I got annoyed enough to apply a workaround by setting the clock 5 minutes forward in Whonix. This also breaks split-gpg tests (where the generated test key is “not yet valid”) and we have a similar workaround there too.

Maybe you can adjust sdwdate to shift the clock only forward and not backward in templates? It does reduce its properties, but maybe for templates (which have limited network connectivity anyway) it isn’t an issue?

That’s already the case.

sdwdate isn’t running in Qubes templates.

sudo systemctl status sdwdate

● sdwdate.service - Secure Distributed Web Date
     Loaded: loaded (/lib/systemd/system/sdwdate.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/sdwdate.service.d
             └─20_arch_syscall_whitelist.conf, 40_qubes.conf
     Active: inactive (dead)
  Condition: start condition failed at Sat 2021-11-20 11:26:08 UTC; 3h 39min ago
             └─ ConditionPathExists=!/run/qubes/this-is-templatevm was not met
       Docs: https://www.whonix.org/wiki/sdwdate

Nov 20 11:26:08 host systemd[1]: Condition check resulted in Secure Distributed Web Date being skipped.

Generally, sdwdate as implemented now only sets time forward, never backwards.
It is even quite careful about it → sdwdate Time Replay Protection.
The idea is that forward should always be safe (or at least safer) but backwards could produce a mess […].
How/if time/sdwdate should be handled / could be improved in Qubes templates is another interesting question for a different forum thread.

Since sdwdate currently isn’t running inside Qubes templates, I don’t understand how Whonix/sdwdate could cause such an issue.

Non-Qubes-Whonix users haven’t reported any APT sources “not valid yet” issues for a very long time. (I cannot recall the last one.)

If this happens, in dom0, please run:

qvm-run --pass-io whonix-gw-16 "date --utc" && date --utc

Replace whonix-gw-16 with the template that is currently experiencing issues. That will show the time in dom0 and the time in the Qubes TemplateVM. Then of course it would be interesting to compare that time with other Qubes Templates.

The time difference should just be 1 or 2 seconds. It is for me. And even that minor time delay might be from the command execution delay or Boot Clock Randomization - Kicksecure. That minor difference isn’t the issue here.

Perhaps your dom0 clock is slow or fast?

Ok, then I need to investigate it in more detail, because I had assumed sdwdate was the reason. Anyway, I have hit this issue in Whonix templates only and not in plain Debian templates, so it is somehow related.

One difference comes to mind…

Non-Whonix Templates are using qubes-sync-time.service / qubes-sync-time.timer.

Whonix templates are not using it.

sudo systemctl status qubes-sync-time.service
● qubes-sync-time.service - Update time from ClockVM
     Loaded: loaded (/lib/systemd/system/qubes-sync-time.service; static)
    Drop-In: /usr/lib/systemd/system/qubes-sync-time.service.d
             └─40_qubes-whonix.conf
     Active: inactive (dead)
TriggeredBy: ● qubes-sync-time.timer
  Condition: start condition failed at Sat 2021-11-20 11:26:05 UTC; 3h 59min ago
             └─ ConditionPathExists=!/usr/lib/qubes-whonix was not met

Nov 20 11:26:05 host systemd[1]: Condition check resulted in Update time from ClockVM being skipped.

When Templates (or any Qubes VM) are started, they initially get their clock from dom0, which is probably a Xen virtualizer feature? That is maybe why for me the clock is correct.

Maybe the issue is happening with long running Qubes-Whonix Templates? Maybe the issue would happen generally with long running Qubes templates that are not using qubes-sync-time? That would also explain why I am not experiencing it since I don’t leave Templates long running.

What’s the purpose of qubes-sync-time? I couldn’t find any documentation/rationale for it. If there is, please let me know. Eager to read.

Is it because otherwise Xen DomU’s would start to clock drift more and more?

Generally, yes, that’s the reason. In practice it happens when you pause a VM, or in some cases (depending on domU kernel config) under high load. The service is triggered after system suspend and on similar occasions (in addition to a timer).

As for the time on domU boot, it indeed may be the case. Xen does provide some initial clock to domU, but I’m not sure if it’s synchronized properly in all the cases. It is a different clock than dom0 uses.

In case of tests, there is no long system uptime involved. The system is started just before performing updates. And since other templates do not have this issue, I think it’s pretty clear that at least dom0 clock is correct.

[user@dom0 ~]$ qvm-run -ap whonix-ws-16 'date --utc' && date --utc
Sat 20 Nov 2021 03:42:51 PM UTC
Sat 20 Nov 2021 03:43:24 PM UTC
[user@dom0 ~]$ qvm-run -ap debian-11 'date --utc' && date --utc
Sat 20 Nov 2021 03:44:58 PM UTC
Sat 20 Nov 2021 03:44:59 PM UTC

This is after fresh system start, and also templates are just started here.
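
To put numbers on the output above, here is a minimal Python sketch that parses each pair of `date --utc` lines and reports how far the template clock lags behind dom0. The helper name `clock_lag_seconds` and the format string are assumptions made for illustration; the format matches the output shown above and would need adjusting for other locales.

```python
from datetime import datetime

# Format of the `date --utc` output shown above (assumes an English/C locale).
DATE_FORMAT = "%a %d %b %Y %I:%M:%S %p UTC"


def clock_lag_seconds(template_line: str, dom0_line: str) -> float:
    """Return how many seconds the template clock is behind dom0.

    A negative result means the template clock is ahead of dom0.
    """
    template_time = datetime.strptime(template_line.strip(), DATE_FORMAT)
    dom0_time = datetime.strptime(dom0_line.strip(), DATE_FORMAT)
    return (dom0_time - template_time).total_seconds()


if __name__ == "__main__":
    # (template output, dom0 output) pairs copied from the session above.
    samples = {
        "whonix-ws-16": ("Sat 20 Nov 2021 03:42:51 PM UTC",
                         "Sat 20 Nov 2021 03:43:24 PM UTC"),
        "debian-11": ("Sat 20 Nov 2021 03:44:58 PM UTC",
                      "Sat 20 Nov 2021 03:44:59 PM UTC"),
    }
    for name, (template_line, dom0_line) in samples.items():
        lag = clock_lag_seconds(template_line, dom0_line)
        print(f"{name}: {lag:+.0f} s relative to dom0")
```

With the values above this gives 33 s for whonix-ws-16 and 1 s for debian-11. Both figures include the small delay of running the two commands one after the other.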

I have disabled (systemctl mask) qubes-sync-time service in debian-11, and still got correct time there (after template restart ofc).

Also suspend/resume time fix is disabled in Qubes-Whonix templates.
(sdwdate /usr/libexec/sdwdate/suspend-post)

Are you using suspend/resume? That might cause the time in Qubes-Whonix Templates to become stale.

As a workaround, does it help to shutdown and restart Qubes-Whonix Templates?

Maybe Qubes-Whonix Templates should use qubes.GetRandomizedTime?


Reminds me of ⚓ T387 Qubes-Whonix-Gateway as ClockVM. I still haven’t gotten around to implementing it since it’s kind of complex, not pretty, and daunting to implement. A more realistic option might be using sdwdate inside the ClockVM (sys-net) or using a Kicksecure-based sys-net, which comes with sdwdate pre-installed.

All the above checks are just after startup, no suspend nor long uptime is involved.

Looks good enough.
Please re-run in case this issue happens again. Or could you please add it to the Qubes Q/A scripts? Seems quite useful anyhow to have this information handy. Checking/comparing dom0/VM time before proceeding with any updates.

Yet another cause could be Debian indeed setting a wrong valid-from field.

When this issue is happening, could you please check this link/file?
https://fasttrack.debian.net/debian/dists/bullseye-fasttrack/Release

At time of writing the interesting fields are:

Date: Sat, 20 Nov 2021 15:30:10 UTC
Valid-Until: Sat, 27 Nov 2021 15:30:10 UTC

VM:

date --utc

Sat 20 Nov 2021 04:00:03 PM UTC

dom0:

date --utc

Sat Nov 20 16:00:29 UTC 2021

(Few seconds delay due to thinking, typing.)
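
The check that matters here can be sketched as follows: a Release file is only accepted while the local clock lies between its Date and Valid-Until fields. This is a simplified illustration in Python, not APT's actual implementation. The names `parse_release_time` and `release_status` are hypothetical, and the field format is assumed to match the values quoted above.

```python
from datetime import datetime, timedelta, timezone

# Format of the Date / Valid-Until fields as quoted above.
RELEASE_FORMAT = "%a, %d %b %Y %H:%M:%S UTC"


def parse_release_time(value: str) -> datetime:
    """Parse a Release file timestamp into a UTC-aware datetime."""
    parsed = datetime.strptime(value.strip(), RELEASE_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


def release_status(date_field: str, valid_until_field: str, now: datetime) -> str:
    """Classify the Release file relative to the local clock `now` (UTC-aware)."""
    published = parse_release_time(date_field)
    expires = parse_release_time(valid_until_field)
    if now < published:
        return "not valid yet"
    if now > expires:
        return "expired"
    return "valid"


if __name__ == "__main__":
    date_field = "Sat, 20 Nov 2021 15:30:10 UTC"
    valid_until = "Sat, 27 Nov 2021 15:30:10 UTC"

    # VM clock from the output above: 04:00:03 PM UTC.
    vm_now = datetime(2021, 11, 20, 16, 0, 3, tzinfo=timezone.utc)
    print(release_status(date_field, valid_until, vm_now))

    # Hypothetical case: an update started 10 s after publication
    # on a clock that runs 30 s behind.
    lagging_now = parse_release_time(date_field) + timedelta(seconds=10 - 30)
    print(release_status(date_field, valid_until, lagging_now))
```

The second case shows why even a lag of a few tens of seconds can be enough when an update starts shortly after the repository publishes a new Release file.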

The different time zones might be an issue. dom0 / Templates are “normal” but Qubes-Whonix templates are in UTC.

For Debian 11 / bullseye in sdwdate a change was required (and introduced since the first Debian 11 based Whonix version 16) to set in python:

import os
import time

os.environ["LC_TIME"] = "C"
os.environ["TZ"] = "UTC"
time.tzset()

This python change may need to be added to any Qubes source code. For example the python based Qubes dom0 file /etc/qubes-rpc/qubes.GetDate might need it.

Furthermore, to improve robustness (and perhaps even fix this bug), any invocation of date in Qubes shell/bash scripts should pass the --utc option, i.e. date --utc.

Internally, programmatically, Qubes should always handle time in UTC. The time shown to the user when manually running date or looking at the systray can remain as is, in the user’s local timezone, no problem.
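
As a rough illustration of this suggestion, the following Python sketch forces the process into UTC the same way the sdwdate snippet does and, as an alternative, formats a timestamp through an explicitly UTC-aware datetime that does not depend on the TZ environment variable at all. The helper names are hypothetical and nothing here is taken from Qubes source code. Note that time.tzset() is only available on Unix.

```python
import os
import time
from datetime import datetime, timezone


def force_utc_environment() -> None:
    """Make time functions in this process use UTC (Unix only)."""
    os.environ["LC_TIME"] = "C"
    os.environ["TZ"] = "UTC"
    time.tzset()


def utc_timestamp(epoch_seconds: float) -> str:
    """Format an epoch timestamp in UTC, independent of the TZ variable."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


if __name__ == "__main__":
    force_utc_environment()
    # After tzset(), local time and UTC agree for this process.
    print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(0)))
    print(utc_timestamp(time.time()))
```

The second helper is the more defensive choice for code that only needs to produce or compare timestamps, since it keeps working even if another part of the program changes TZ later.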

The Whonix template is ~half a minute in the past. If it’s close enough that the 5 minutes makes a difference (date -s +5min is enough to fix), then I can very well believe that 30s in the past may be problematic at times too. Something clearly is making Whonix’s clock be in the past, and I’d say we should avoid it at all regardless if that’s 30s or 5min.

added to wiki just now…

Qubes Time Synchronization Features:

Xen DomU’s get their initial time from Xen (Qubes dom0) at VM start.

  • qvm-sync-clock.service
  • qvm-sync-clock.timer

qvm-sync-clock gets time from Qubes ClockVM.

It is used because otherwise Xen DomU’s would start to clock drift more and more. See this answer by marmarek.

qvm-sync-clock is unwanted in sys-whonix and anon-whonix because sdwdate runs there.

qvm-sync-clock is disabled in Qubes-Whonix ™ Templates until version Qubes-Whonix ™ 16.0.3.7. To be re-considered for later versions. Qubes-Whonix ™ Templates get their time from dom0 at VM startup, which is then randomized using Boot Clock Randomization.

Future: qvm-sync-clock should be equally safe to run inside Qubes-Whonix ™ Templates, if passed through clock-random-manual-cli.


Now working on making timesync in Qubes-Whonix Template similar to Non-Qubes Templates.


https://github.com/Whonix/sdwdate/commit/6215a9ea996e9db970059c3b4ad58d17016b7483

Time to revise:

New plan:
As for Qubes-Whonix Templates, make it similar to /etc/qubes-rpc/qubes.SetDateTime.

I thought suspend/resume as well as long running Templates should be improved by modifying qvm-sync-clock for Qubes-Whonix. Started working on it:

Not in use yet. And was probably in vain. And probably low priority.

Because…


Then if this “small” difference is causing so many issues… The reason for that is probably:

If someone would like to try if that is the case, try the workaround of disabling boot clock randomization.

Related boot clock randomization development discussion:
Boot Clock Randomization - bootclockrandomization

As for usefulness of Boot Clock Randomization in Qubes-Whonix Template, see:

I am still not sure Boot Clock Randomization is the cause since then this issue should be equally happening to Non-Qubes-Whonix users and a lot more users. But to find out…

Now I am a bit more sure this is caused by Boot Clock Randomization. It’s not happening in Non-Qubes-Whonix since that is using sdwdate which has much higher accuracy than Boot Clock Randomization. Qubes-Whonix Templates however use only Boot Clock Randomization and do not use sdwdate at time of writing.

Still not sure why this isn’t happening more often to more users and wasn’t reported earlier.

In summary:

  • Non-Qubes-Whonix: Boot Clock Randomization + sdwdate
  • Qubes-Whonix App Qubes: Boot Clock Randomization + sdwdate
  • Qubes-Whonix Templates: Boot Clock Randomization only

Accuracy:

  • Boot Clock Randomization: +/- 180 seconds
  • sdwdate: quite good, often +/- 1 second
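
A back-of-the-envelope estimate may help explain why this is rare for most users yet not negligible. The Python sketch below assumes, purely for illustration, that Boot Clock Randomization shifts the clock by an offset drawn uniformly from -180 to +180 seconds, that a repository publishes a new Release file every 6 hours (as recalled earlier in this thread), and that an update starts at a uniformly random moment. None of these assumptions is confirmed here, and the function names are hypothetical.

```python
# Assumed maximum clock offset in seconds (from the accuracy figure above).
MAX_OFFSET = 180.0
# Assumed interval between repository updates in seconds (6 hours, per this thread).
REPO_INTERVAL = 6 * 3600.0


def failure_probability(seconds_since_publication: float,
                        max_offset: float = MAX_OFFSET) -> float:
    """Chance that the local clock is still before the Release file's Date.

    Assumes the clock offset is uniform on [-max_offset, +max_offset], so the
    update fails when offset < -seconds_since_publication.
    """
    if seconds_since_publication >= max_offset:
        return 0.0
    elapsed = max(seconds_since_publication, 0.0)
    return (max_offset - elapsed) / (2.0 * max_offset)


def overall_failure_rate(max_offset: float = MAX_OFFSET,
                         repo_interval: float = REPO_INTERVAL) -> float:
    """Average failure chance for an update started at a random moment.

    Integrating failure_probability over one interval gives max_offset / 4.
    """
    return (max_offset / 4.0) / repo_interval


if __name__ == "__main__":
    for elapsed in (0, 30, 120, 180):
        print(f"{elapsed:>3} s after publication: {failure_probability(elapsed):.1%}")
    print(f"random moment in a 6 h cycle: {overall_failure_rate():.2%}")
```

Under these assumptions, roughly 0.2% of randomly timed updates would fail per repository, while an update that happens to start right after publication fails up to half the time. Each additional repository (such as fasttrack) adds its own publication times, which would fit the observation that the failures became more frequent.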

Should sdwdate be run inside Qubes Templates?

It would probably be easy to make sdwdate connect through Qubes UpdatesProxy - similar to how already APT and Tor Browser Downloader by Whonix ™ are using networking in Qubes-Whonix Templates.

A more secure solution might be something like ⚓ T387 Qubes-Whonix-Gateway as ClockVM, i.e. Qubes-Whonix Templates receiving a timestamp with sdwdate accuracy from an App Qube, but that is also considerably harder to implement.

@Patrick & @marmarek : Thanks a lot for all your work on tracking down the root cause of the reported issue.

It is not clear to me, if at the moment you need additional information from my system.

FYI: Qubes Updater has (again) successfully updated several fedora templates - but - failed to update ‘whonix-ws-16’.