[REOPENED] sclockadj stuck at 100% CPU

0.0005 is the default value in Whonix 9. Unchanged in Whonix 10. As per sdwdate config:

That usually works well, except on some strange occasions. It’s the same time-changing algorithm as NTP:

0.0005 increments are fine (although I don’t see why min and max are the same. I thought we wanted random), but 25 hours?

According to my debug output, the total amount was going down just fine. But it was going to take a lot more than 30 iterations – more like 180 million.
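
To get a sense of scale (reading the 0.0005 as seconds per adjustment step and assuming roughly one step per second, which is my assumption rather than something taken from the shipped config), a 25-hour offset works out to:

# back-of-the-envelope only: 25 h ≈ 90,000 s, step 0.0005 s
echo $(( 90000 * 2000 ))          # 180,000,000 steps
echo $(( 90000 * 2000 / 86400 ))  # ≈ 2,083 days at ~1 step per second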

Is it possible to turn off this ruby script for good? My laptop is going to fly away because of the speed of its fans. :wink:

There are many ways, but none of them is a solution, because then you would differ from other Whonix users.

Whonix 9 sdwdate default config:

Try this, which uses date and clock jumps rather than sclockadj.

echo 'SDWDATE_USE_SCLOCKADJ_WHEN_DAEMON="0"' | sudo tee /etc/sdwdate.d/50_user_sclockadj_disable

Or try this, which disables sdwdate after its first run, which will let the clock drift after a while.

echo 'INTERVAL="0"' | sudo tee /etc/sdwdate.d/50_user_exit_after_first

In any case, restart sdwdate afterwards.

service sdwdate restart
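
To confirm the drop-in took effect and that sdwdate came back up, something like the following should do (assuming sdwdate logs to /var/log/sdwdate.log, as the log excerpts later in this thread suggest):

cat /etc/sdwdate.d/50_user_*
service sdwdate status
tail -f /var/log/sdwdate.log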

---

For a real solution, please help get this fixed. Try replacing your version of sclockadj with the latest one:
https://raw.githubusercontent.com/Whonix/sdwdate/master/usr/lib/sdwdate/sclockadj

Then restart sdwdate. Perhaps that helps.
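
A possible way to do that swap by hand, sketched under the assumption that curl is available in the VM and that the installed path matches the one shown in the logs below; back up the original first:

sudo cp /usr/lib/sdwdate/sclockadj /usr/lib/sdwdate/sclockadj.bak
sudo curl -o /usr/lib/sdwdate/sclockadj https://raw.githubusercontent.com/Whonix/sdwdate/master/usr/lib/sdwdate/sclockadj
sudo chmod +x /usr/lib/sdwdate/sclockadj
sudo service sdwdate restart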

Or upgrade [in some VM] to the testers repository (Whonix APT Repository); then you’ll get Whonix 10-ish packages, which come with a more recent sclockadj and a more recent sdwdate. Both have stricter error handling and more debugging code added.

This is somewhat of a strange bug that really should be fixed, because sclockadj seems to work in more than 99% of cases. This issue might just be one manifestation, and it might have other consequences. It could be a hardware bug, a specific, rare VirtualBox bug, or who knows what.

service sdwdate restart
Did the job. Maybe it's not a great solution, but the 100% CPU bug occurs very rarely, maybe on 1 in 10 Whonix starts. Thanks Patrick, keep up the good work.

On which platform was this happening? VirtualBox, KVM or Qubes? @c846478 @John_Howard

The 100% CPU bug happens reproducibly after suspend and resume in Qubes-Whonix:
https://github.com/QubesOS/qubes-issues/issues/1764#issuecomment-187089419


https://forums.whonix.org/t/sclockadj2-slow-clock-adjuster-prevent-fingerprintable-clock-adjustments

Also happens reproducibly when launching whonix-ws disposable VMs in Qubes.

⚓ T695 “Whonix running as Qubes DispVM uses saved clock” was closed as wontfix because it was assumed to be caused by DVM savefiles. It may actually be a result of broken sclockadj.

There are lots of changes happening soon (Qubes 4, a new dispVM implementation, the sclockadj rewrite from Ruby to C, suspend-resume hooks), so it may be better to wait before investigating. I will document what I’m seeing for future reference.

Qubes 3.2, Whonix 13

Using DispVM (DVM) based on DispVMTemplateVM (DTVM) based on TemplateVM (TVM = anon-whonix)

If the DTVM savefile is constructed while the sdwdate service is in the middle of iterating through time servers, then when a DVM is launched the results will be highly skewed. If large (insane?) time jumps are sent to sclockadj, the clock is not adjusted and the CPU runs at 100%.

disp1: sdwdate.log

date -s is used to set the time. No issues, but the time is 2 days in the past.

2017-11-26 17:25:50,054 - sdwdate.log - INFO - Fetching remote times, **start Sun Nov 26 17:25:50 UTC 2017** (unixtime 1511717150.05)
2017-11-28 21:10:04,301 - sdwdate.log - INFO - Pool 2 last url: #.onion, web unixtime: 1511717009, web time: Sun Nov 26 17:23:29 UTC 2017, diff: -186395 seconds
2017-11-28 21:10:04,301 - sdwdate.log - INFO - Pool 3 last url: #.onion, web unixtime: 1511717012, web time: Sun Nov 26 17:23:32 UTC 2017, diff: -186392 seconds
2017-11-28 21:10:09,858 - sdwdate.log - INFO - Pool 1 last url: #.onion, web unixtime: 1511903265, web time: Tue Nov 28 21:07:45 UTC 2017, diff: -144 seconds
2017-11-28 21:10:09,859 - sdwdate.log - INFO - Fetching remote times, **end Tue Nov 28 21:10:09 UTC 2017** (unixtime 1511903409.86)
2017-11-28 21:10:09,869 - sdwdate.log - INFO - Pool differences, sorted: [-186395, -186392, -144]
2017-11-28 21:10:09,870 - sdwdate.log - INFO - Median time difference: -186392
2017-11-28 21:10:09,870 - sdwdate.log - INFO - Seconds to add: + 0.832422362
2017-11-28 21:10:09,870 - sdwdate.log - INFO - New time difference: -186391.167578
2017-11-28 21:10:09,870 - sdwdate.log - INFO - Old unixttime: 1511903409.87
2017-11-28 21:10:09,870 - sdwdate.log - INFO - New unixtime : 1511717018.7
2017-11-26 17:23:38,702 - sdwdate.log - INFO - Instantly setting the time by using command "sudo /bin/date --set @1511717018.7"
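
For reference, the median difference from this log converted to days (which is why the DispVM clock ends up roughly 2 days in the past):

echo $(( 186392 / 86400 ))   # ≈ 2 days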

sdwdate restarted to get correct time:

2017-11-26 17:30:19,352 - sdwdate.log - INFO - Fetching remote times, start Sun Nov 26 17:30:19 UTC 2017 (unixtime 1511717419.35)
2017-11-26 17:30:23,906 - sdwdate.log - INFO - Pool 1 last url: #.onion, web unixtime: 1511903669, web time: Tue Nov 28 21:14:29 UTC 2017, diff: 186246 seconds
2017-11-26 17:30:23,907 - sdwdate.log - INFO - Pool 2 last url: #.onion, web unixtime: 1511903670, web time: Tue Nov 28 21:14:30 UTC 2017, diff: 186247 seconds
2017-11-26 17:30:23,907 - sdwdate.log - INFO - Pool 3 last url: #.onion, web unixtime: 1511903669, web time: Tue Nov 28 21:14:29 UTC 2017, diff: 186246 seconds
2017-11-26 17:30:23,907 - sdwdate.log - INFO - Fetching remote times, end Sun Nov 26 17:30:23 UTC 2017 (unixtime 1511717423.91)
2017-11-26 17:30:23,919 - sdwdate.log - INFO - Pool differences, sorted: [186246, 186246, 186247]
2017-11-26 17:30:23,919 - sdwdate.log - INFO - Median time difference: 186246
2017-11-26 17:30:23,919 - sdwdate.log - INFO - Seconds to add: - 0.594058128
2017-11-26 17:30:23,920 - sdwdate.log - INFO - New time difference: 186245.405942
2017-11-26 17:30:23,928 - sdwdate.log - INFO - Gradually adjusting the time by running sclockadj, PID=2400

cmd: /usr/lib/sdwdate/sclockadj --no-debug --no-verbose --no-systohc --no-first-wait --move-min 5000000 --move-max 5000000 --wait-min 1000000000 --wait-max 1000000000 --add 186243100204744
results in 100% CPU (the --add parameter is not the actual value from this example, but it is close)
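
Rough arithmetic on why that invocation effectively never finishes (values taken from the command above; treating the parameters as nanoseconds is my reading of the flags, not verified against the sclockadj source):

# --add 186243100204744 ns ≈ 186,243 s to adjust
# --move-min/--move-max 5000000 ns = 5 ms adjusted per step
# --wait-min/--wait-max 1000000000 ns = 1 s sleep per step
echo $(( 186243100204744 / 5000000 ))          # ≈ 37,248,620 steps
echo $(( 186243100204744 / 5000000 / 86400 ))  # ≈ 431 days at one step per second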

These results can be reproduced for arbitrarily many dispVMs.
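
Since restarting sdwdate reliably recovers the correct time (as above), one possible stopgap for the DispVM case, offered only as a hypothetical workaround and not something suggested elsewhere in this thread, would be to restart the service at VM startup via Qubes’ /rw/config/rc.local hook in the DVM template:

echo 'service sdwdate restart' | sudo tee -a /rw/config/rc.local
sudo chmod +x /rw/config/rc.local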
