Tor connection dies randomly and needs manual restart

The Tor connection in Whonix-Gateway dies, and when it does, it doesn’t recover. /var/run/tor/log says:

[NOTICE] Tried for 100 seconds to get a connection to [scrubbed]:[xxx]. Giving up.

Probably all selected guard nodes are gone and Tor is not picking new ones. The only solution is to restart Tor manually:

sudo systemctl restart tor@default

After restarting Tor the connection is up again.

Is it safe to create a cron job that checks Tor connectivity and restarts the Tor daemon? Does Whonix have a built-in tool for checking that?

Alternatively, can Tor be configured to pick other guard nodes when the current ones have been unreachable for a long time? How insecure would that be?

The second issue: /var/run/tor/log keeps growing, and when the tmpfs reaches 100%, the Tor connection breaks. The only fix is to restart Whonix-Gateway or (if still possible) stop Tor, delete this file and start Tor again:

sudo systemctl stop tor@default
sudo rm /var/run/tor/log
sudo systemctl start tor@default
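
The manual cleanup above could be gated on the size of the log so that it only runs when needed. This is a minimal sketch, not a Whonix tool: the helper name `log_exceeds` and the 50 MiB limit are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if FILE exists and is larger than LIMIT bytes.
log_exceeds() {
    local file="$1" limit="$2" size
    [ -f "$file" ] || return 1
    # Arithmetic expansion strips any padding that wc may print.
    size=$(( $(wc -c < "$file") ))
    [ "$size" -gt "$limit" ]
}

# Assumed limit of 50 MiB; pick a value well below the tmpfs size.
LIMIT_BYTES=$((50 * 1024 * 1024))

# Same three commands as above, run only when the log is too large.
if log_exceeds /var/run/tor/log "$LIMIT_BYTES"; then
    sudo systemctl stop tor@default
    sudo rm /var/run/tor/log
    sudo systemctl start tor@default
fi
```

This only treats the symptom. Lowering the log verbosity is the cleaner fix if the log is dominated by low-severity messages.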

Most of the logged information is useless. Where can I change the logging level?
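
Before changing the level (Tor's `Log` option in the torrc sets the minimum severity that gets written), it may help to see which severities actually fill the log. A rough sketch, assuming the log uses the bracketed tags shown in this thread; the helper name `count_by_severity` is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: prints "<count> <severity>" for each severity tag in a Tor log.
# Tags are matched case-insensitively because both [NOTICE] and [notice] appear in the thread.
count_by_severity() {
    grep -oiE '\[(debug|info|notice|warn|err)\]' "$1" \
        | tr 'A-Z' 'a-z' | sort | uniq -c | awk '{print $1, $2}'
}

# Summarize the log mentioned above, if it is readable.
if [ -r /var/run/tor/log ]; then
    count_by_severity /var/run/tor/log
fi
```

If nearly all lines are `[notice]`, raising the minimum severity to `warn` would likely shrink the log the most.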

Could be a general Tor network issue. An active attack on specific Tor users is also conceivable as part of a search for a specific user. If traffic keeps getting interrupted on one side, it is also disrupted at the destination, and the two can be correlated.

No.

I recommend:

One issue = one forum thread please.
Please create a separate forum thread for that.

This issue may occur when the host or guest system runs out of RAM and starts swapping. Whether this can cause the error “Tried for X seconds to get a connection to [scrubbed]” needs further examination. It looks like the entry node fails circuits, but in my experiment with multiple Whonix-Workstation machines running (some software needs a lot of RAM, so I increased RAM in the VM settings to 1.5 GB while Whonix-Gateway remained at only 384 MB), the Tor connection was almost dead due to failing circuits.

If this is true, the solution is to add more RAM to the physical machine.

Problem still exists. Example errors:

[warn] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)
[warn] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)
[warn] Tried for 120 seconds to get a connection to [scrubbed]:0. Giving up. (waiting for circuit)

[warn] Guard [guard name] (guard ID) is failing an extremely large amount of circuits. This could indicate a route manipulation attack, extreme network overload, or a bug. Success counts are 71/243. Use counts are 56/56. 70 circuits completed, 0 were unusable, 0 collapsed, and 10 timed out. For reference, your timeout cutoff is 60 seconds.

[notice] Closed 1 streams for service [scrubbed].onion for reason resolve failed. Fetch status: No more HSDi available to query.

[notice] No circuits are opened. Relaxed timeout for circuit 1945 (a General-purpose client 3-hop circuit state waiting to see how other guards perform with channel state open) to 85410ms. However, it appears the circuit has timed out anyway.

[warn] Giving up on launching a rendezvous circuit to [scrubbed] for hidden service [scrubbed]

This usually happens when the host system is under heavier load than when idle. Has anyone else experienced this issue?

Stopping Tor and deleting the state file helps (another guard is picked), but this is a very bad solution for security reasons.

Vanguards is probably closing circuits. Changing the vanguards configuration in /etc/tor/anon-vanguards.conf would help, but which of its options can be safely changed without lowering anonymity?

Also, which config options should be changed to examine this problem better? To solve this issue, you would need to know more about what exactly is going on in the system.

Probably Generic Bug Reproduction required. Specifically Tor Generic Bug Reproduction.

The same would probably happen if Tor was installed on the host operating system. Could you please test if that is the case?

If so, that would be unspecific to Whonix.

After upgrading Whonix-Gateway to the latest version it’s even worse.

The upgrade process:

sudo apt update
sudo apt full-upgrade
sudo apt autoremove
sudo reboot
sudo release-upgrade
sudo reboot
sudo apt autoremove

Changed the guard (not recommended for security) a few times with:

sudo systemctl stop tor@default
sudo rm /var/lib/tor/state
sudo systemctl start tor@default

What anon-log shows (selected lines from the logs; these are very frequent):

vanguards.service:
Tor bug #29699: Got 1 dropped cell on circ ... (in state HS_SERVICE_INTRO HSSI_ESTABLISHED; old state HS_SERVICE_INTRO HSSI_CONNECTING)
We force closed circuit ...
Possible Tor bug, or possible attack if very frequent: Got 1 dropped cell on circ ... (in state HS_SERVICE_REND HSSR_JOINED; old state HS_SERVICE_REND HSSR_CONNECTING)
Tor has been failing all circuits for 30 seconds!
Tor has been failing all circuits for 60 seconds!
Tor has been failing all circuits for 90 seconds!
Circ ... exceeded CIRC_MAX_HSDESC_KILOBYTES: ... > ...
Possible Tor bug, or possible attack if very frequent: Got 1 dropped cell on circ ... (in state GENERAL None; old state None None)
Possible Tor bug, or possible attack if very frequent: Got 2 dropped cell on circ ... (in state GENERAL None; old state None None)
Possible Tor bug, or possible attack if very frequent: Got 3 dropped cell on circ ... (in state GENERAL None; old state None None)
.......... up to 60 .........

What /var/run/tor/log shows (the first line is very frequent):

[notice] Tried for 120 seconds to get a connection to [scrubbed]:80. Giving up. (waiting for circuit)
[notice] Your network connection speed appears to have changed. Resetting timeout to 60000ms after 18 times and 520 buildtimes.
[notice] We tried for 16 seconds to connect to '[scrubbed]' using exit ..... at ..... Retrying on new circuit.
[warn] Invalid hostname [scrubbed]; rejecting
[notice] Failed to find node for hop #1 of our path. Discarding this circuit.
[warn] Guard ... is failing a very large amount of circuits. Most likely this mean the Tor network is overloaded, but it could also mean an attack against you or potentially the guard itself. Success counts are 126/247. Use counts are 98/98. 123 circuits completed, 0 were unusable, 0 collapsed, and 10 timed out. For reference, your timeout cutoff is 60 seconds.
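
To put the guard warnings in perspective, the "Success counts are X/Y" figures can be turned into a percentage. A small sketch, assuming the message format quoted above; the helper name `guard_success_rate` is invented for illustration.

```shell
#!/bin/bash
# Hypothetical helper: reads log lines on stdin, extracts "Success counts are X/Y"
# and prints the counts together with the success percentage.
guard_success_rate() {
    grep -oE 'Success counts are [0-9]+/[0-9]+' \
        | awk '{ split($4, c, "/"); if (c[2] > 0) printf "%s %.1f%%\n", $4, 100 * c[1] / c[2] }'
}

# Apply it to the log mentioned above, if it is readable.
if [ -r /var/run/tor/log ]; then
    guard_success_rate < /var/run/tor/log
fi
```

For the warning quoted above (126/247) this works out to roughly 51% of circuits succeeding through that guard.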

My old approach to fix connectivity issues is a cron job that:

1. Connect to one of hidden services (60 ms timeout)
2. Try again (120 ms timeout)
3. If no success, then: systemctl restart tor@default
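
The three steps above can be sketched roughly as follows. This is not the exact script used here: the probe URL is a placeholder, the SOCKS address `127.0.0.1:9050` is an assumed Tor SocksPort, and the timeouts 60 and 120 are passed to `curl --max-time` as seconds, which is an assumption about what the list means.

```shell
#!/bin/bash
# Sketch of the watchdog described above. All values below are assumptions.
PROBE_URL="${PROBE_URL:-http://example.onion/}"   # placeholder, replace with a real onion service
SOCKS_ADDR="${SOCKS_ADDR:-127.0.0.1:9050}"        # assumed Tor SocksPort

# Returns 0 if the probe URL answers within the given number of seconds.
probe() {
    curl --silent --output /dev/null --max-time "$1" \
         --socks5-hostname "$SOCKS_ADDR" "$PROBE_URL"
}

# Step 3 from the list above.
restart_tor() {
    systemctl restart tor@default
}

# Two attempts with increasing timeouts, then restart Tor.
check_and_restart() {
    probe "$1" && return 0
    probe "$2" && return 0
    restart_tor
}

# Run only when explicitly enabled, e.g. from root's crontab:
#   RUN_WATCHDOG=1 /path/to/this/script
if [ "${RUN_WATCHDOG:-0}" = "1" ]; then
    check_and_restart 60 120
fi
```

Keeping `probe` and `restart_tor` as separate functions makes it easy to swap the probe for another check or to log each restart before it happens.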

After the Gateway upgrade, the cron job restarts the Tor service very frequently.

Is restarting the correct approach? I did this because Tor itself could not recover from all connections being broken for hours, and I noticed that restarting Tor usually helps. However, it takes about 10 minutes for all connections to work again.

Troubleshooting

When the host system is under load, Tor connections in Whonix-Gateway are most likely to fail. Dunno why. High system load may cause response latencies.

If there is some latency, do Tor nodes close/fail circuits with the Gateway?

Also checking RAM (it constantly changes a bit):

MiB Mem:  331.5 total  7.4 free    250.5 used  82.2 buff/cache
MiB Swap: 496.0 total  473.6 free  22.4 used   83.3 buff/cache

When RAM is lowered to 256 MB, no hidden service works.
RAM is now 384 MB, which should be fine. If this is the cause, I will try adding more RAM.
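
To watch memory pressure on the Gateway over time rather than from a single snapshot, the kernel's figures can be read from /proc/meminfo. A minimal sketch; the helper name `mem_summary` is an assumption, and the optional file argument exists only to make the function easy to try on saved copies.

```shell
#!/bin/bash
# Hypothetical helper: prints MemAvailable and SwapFree in MiB from a
# meminfo-style file (defaults to /proc/meminfo, whose values are in kB).
mem_summary() {
    awk '/^MemAvailable:/ { printf "MemAvailable %d MiB\n", $2 / 1024 }
         /^SwapFree:/     { printf "SwapFree %d MiB\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Print the current values on Linux systems.
if [ -r /proc/meminfo ]; then
    mem_summary
fi
```

Logging this output alongside the times of the Tor warnings would show whether the failures line up with low available memory or heavy swap use.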

What happens after update

  1. Even more frequent connectivity issues:
    • hidden services do not work or randomly stop working (0xF2 not connected to introduction point or just connection timeout)
    • also sometimes no Internet connection in Whonix-Workstation
    • very long wait the first time any hidden service is opened
  2. Changing guard (not recommended) helps for a while or doesn’t help.

Tor version is 0.4.7.13 and it also shows:

[warn] Tor was compiled with zstd 1.5.2 but is running with zstd 1.5.4. For safety, we'll avoid advanced zstd functionality.

Questions

  1. Tor 0.4.8 introduced Proof of Work. Should I update to resolve the issues?
  2. Should I pick a specific guard node instead of a random one? How safe is it?
  3. What should be changed in the Tor configuration or any related services (vanguards, etc.) to make hidden services respond faster and to fix the connectivity issues?
  4. Is it a good approach to automatically restart Tor when the connection is dead? Is there a way to force Tor to rebuild circuits?

I’m still reading the Whonix docs, but any ideas for fixing these issues are appreciated.

There is 1 issue.

Unspecific to Whonix.