Massive time skew problems with Whonix-Gateway

Whonix 11 on VirtualBox, running on a Linux host.

For some reason my Whonix-Gateway machine keeps losing the correct time. It doesn’t happen as a sudden change; it drifts gradually, so the longer I leave the machine running, the further out of sync the time gets. Eventually the system time becomes so inaccurate that my Tor circuits begin failing. At that point I can’t even run the Timesync command any more, and the only way I can correct things is to set the system time manually.

I’ve just run the Timesync command twice on the Gateway machine, the first time to set a baseline, and the second time approx 10 minutes later. In that 10 minutes the time skew was already more than 100 seconds.
It continues to get worse over time, so if I leave the Gateway running for a couple of days the time will end up being incorrect by several HOURS.

My Workstation machines (running on the same physical host) don’t seem to be having the same problem.

Any ideas why this might be happening, or (more importantly) how I can try and prevent it?

edit - Same test again with approx 30 minute interval, and the time skew was already 300 seconds. I’ll continue to keep an eye on it, but it actually seems to be scaling fairly accurately at around 100 seconds every 10 minutes.
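To put those two measurements on the same scale, here is a rough Python sketch of the arithmetic. It assumes the skew grows linearly and takes the approximate figures above (100 seconds in 10 minutes, 300 seconds in 30 minutes) at face value; the function name `drift_rate` is just an illustrative helper.

```python
def drift_rate(skew_seconds: float, interval_seconds: float) -> float:
    """Seconds of skew accumulated per second of real time."""
    return skew_seconds / interval_seconds


# Approximate measurements reported above: (skew in seconds, interval in seconds).
measurements = [(100, 10 * 60), (300, 30 * 60)]

for skew, interval in measurements:
    rate = drift_rate(skew, interval)
    # Extrapolate linearly to a full day (an assumption, not a measurement).
    per_day_hours = rate * 24 * 3600 / 3600
    print(f"{skew} s over {interval // 60} min: "
          f"{rate:.1%} drift, roughly {per_day_hours:.1f} h per day")
```

Both measurements give the same rate of about one second lost for every six seconds of real time, which fits with the clock ending up hours off after a couple of days.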

Check the sdwdate log.

The logs initially show success, but then things quickly start going south.

Any thoughts?

No. Further debug information required.

Stop sdwdate.

Empty the whole log.

(Mark all. Delete. Save.)

Reboot.

Share the full sdwdate log.

Thank you Patrick. I have just cleared the log and rebooted, so I shall wait until my Tor circuits start failing again and then post the complete log file.

Just to make sure that I’m clear, there shouldn’t be anything ‘sensitive’ in that log which I might want to alter/remove before posting it?

Right.

Thanks Patrick.

Here is my complete log file after clearing and restarting the machine.

http://m.uploadedit.com/ba3k/1444112936909.txt

Thanks.

The same doesn’t happen on the workstation?

Nothing seems wrong with sdwdate or sclockadj. It must have something to do with VirtualBox. I made a quick search but could not find anything mentioning such a large skew.

Assuming that the drift is real [it must be if Tor fails, but you could still run “date” in a terminal right after an sdwdate cycle has completed, to compare], the only option that comes to mind at the moment is to re-install a fresh Gateway, keeping that one for investigation, if the issue does not replicate in the new one.

[quote=“Patrick, post:9, topic:1452”]Thanks.

The same doesn’t happen on the workstation?[/quote]

Once the time on my gateway becomes so far out of sync that my Tor circuits begin failing, then of course the timesync process begins failing on my workstation machines as well.

However, for some reason the time on my workstations never seems to be nearly as far out of date.

As an example, I rebooted my workstation and gateway machines at the same time, performed an initial timesync on both [verifying that the time was correct at this stage], and then left them both running together.
I checked them again a little bit under 24 hours later, and this was the result:

Actual UTC time: 07:15
Workstation time : 07:00
Gateway time: 05:53

I realise that 15 minutes time skew [over approx 24 hours] still represents a problem, however it never seems to be nearly as bad as my gateway.
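For comparison, a minimal Python sketch that turns the three readings above into lag and drift percentages. It assumes exactly 24 hours elapsed (the real interval was a little under that) and that both clocks were correct at the start; `minutes_behind` is a hypothetical helper name.

```python
from datetime import datetime


def minutes_behind(reference: str, observed: str) -> float:
    """Minutes by which `observed` lags `reference` (both HH:MM, same day)."""
    fmt = "%H:%M"
    delta = datetime.strptime(reference, fmt) - datetime.strptime(observed, fmt)
    return delta.total_seconds() / 60


ACTUAL_UTC = "07:15"
ELAPSED_HOURS = 24  # assumption: the interval was "a little bit under 24 hours"

readings = {"Workstation": "07:00", "Gateway": "05:53"}

for name, clock in readings.items():
    lag = minutes_behind(ACTUAL_UTC, clock)
    percent = lag / (ELAPSED_HOURS * 60) * 100
    print(f"{name}: {lag:.0f} min behind, about {percent:.1f}% drift")
```

By this estimate the gateway was 82 minutes behind and the workstation 15 minutes behind, so the gateway drifted several times faster, although more slowly than the 100 seconds per 10 minutes seen in the earlier short tests.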

Debugging this could become a bit hairy. Let’s check first if something from the outside interferes with timesync.

First of all, please stop sdwdate on the gateway [in a working state if you prefer].

Then compare Whonix-Gateway clock with the host clock. To begin the test, the clocks should not match. There should be at least 10 seconds or so difference.

Can you leave Whonix-Gateway running for >= 24 hours and check that the time difference does not change? (There should not be more than one or two seconds of drift.)
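The comparison could be sketched in Python roughly like this. It is only an illustration: the epoch values below are made up, and it assumes you note the clock on both host and gateway at nearly the same moment (for example with `date +%s` in a terminal on each) at the start and at the end of the test.

```python
def offset_change(start, end):
    """Change in the gateway-minus-host offset between two samples.

    Each sample is a (host_epoch_seconds, gateway_epoch_seconds) pair.
    """
    start_offset = start[1] - start[0]
    end_offset = end[1] - end[0]
    return end_offset - start_offset


TOLERANCE_SECONDS = 2  # "not more than one or two seconds" of drift

# Hypothetical readings, for illustration only.
start = (1_444_100_000, 1_444_100_015)  # gateway 15 s ahead of host
end = (1_444_186_400, 1_444_186_416)    # 24 h later, gateway 16 s ahead

change = offset_change(start, end)
verdict = "stable" if abs(change) <= TOLERANCE_SECONDS else "drifting"
print(f"offset changed by {change} s over the test: {verdict}")
```

If the offset stays within the tolerance while sdwdate is stopped, the virtual clock itself is keeping time and something else is moving it; if it does not, the drift comes from the VM's timekeeping.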

sclockadj2 (see TimeSync: Whonix ™ Time Synchronization Mechanism) might also fix this, but who knows if/when it gets finished.

Moving this to the VirtualBox sub forum for now as long as it does not happen for other virtualizers.

I could not reproduce this in either VirtualBox or Qubes. sclockadj is capable of setting back the clock.