Issues with Whonix-Gateway cracking out after 1-2 days

Hello,

I am using Whonix-Gateway to host a private hidden service. The traffic is not really intense, just moderate. I am running Whonix-Gateway in VirtualBox for convenience, because I also use VirtualBox for other VMs on the same host.

Right now the problem is that Whonix-Gateway cracks out every 2 days or so, again and again. I have to reboot the VM, and then the problem disappears. It is not a VM crash or anything like that; I can open VirtualBox and still see the shell. When I check top I see no RAM issues or swapping, it just looks like usual.

But when it happens, basically the whole internet goes down. The HTTP hidden services go down first, then TCP. If a TCP connection was already established before, it keeps working until the next reconnection, then it is also gone for good. At the same time exit connections stop working and I get no internet in the browser, so it affects both directions.

Do you have any idea what the problem could be? I can't even figure out where to look, since RAM doesn't look like an issue. I'm already fed up with poking it every 2 days to make it work again.

And a second question. Since this happens for some reason, do I have any option to set up a second Whonix-Gateway in the same network? I need my service to be online on at least one of its onion domains at all times. As a workaround I could run two Whonix-Gateways that host different onion domains and reboot daily by cron at different times. I cannot do that with just one gateway, because it takes about 10 minutes for the onion domains to propagate after a reboot, while with two gateways it could work without any downtime. So is it possible to change that 10.152.152.10 IP to something else, or is it hardcoded to the bone?

Thanks, I appreciate any advice

No. To debug:

  1. Generally: how would one debug this if it was not a Whonix VM, but a Debian buster based VM? I suggest researching and debugging this as per the Self Support First Policy for Whonix.

  2. Still not Whonix specific…

Hardware Issues?

General VirtualBox Troubleshooting Steps?

But we do have documentation on how to watch logs… which can be useful both inside the VM and on the host.
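For example, inside the VM the standard systemd tooling already goes a long way; nothing here is Whonix-specific:

# Follow all logs live while reproducing the problem.
sudo journalctl -f
# After the gateway "cracks out", list warnings and errors from the current boot.
sudo journalctl -b --priority=warning
# Check the Tor unit specifically (tor@default on Debian based systems).
sudo journalctl -u tor@default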

Also…

Perhaps OnionBalance. Quote from Onion Services - Whonix:

OnionBalance [archive] can help to prevent de-anonymization of an onion service by protecting it from becoming unavailable through denial of service attacks (DDOS). OnionBalance is mentioned in the security readme [archive] by vanguards author and Tor developer Mike Perry where he discusses attacks against onion services and defenses. OnionBalance [archive] is now available for onion v3 services [18], see: Cooking with Onions: Reclaiming the Onionbalance [archive].

We don't have detailed documentation for that yet. It is probably not Whonix-specific anyway.
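Roughly, the frontend side boils down to a small YAML file listing the backend onion addresses. A sketch only, from memory of the upstream OnionBalance v3 documentation; the key path and addresses are placeholders:

# onionbalance config.yaml sketch (placeholder key path / addresses)
services:
- key: /home/user/frontend.key               # private key of the public-facing onion
  instances:
  - address: backendonionaddressone.onion    # onion served by gateway 1
  - address: backendonionaddresstwo.onion    # onion served by gateway 2

If I read the v3 documentation correctly, each backend instance additionally needs HiddenServiceOnionBalanceInstance 1 inside its HiddenService torrc block and an ob_config file in its HiddenServiceDir naming the frontend address.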

Related:

No.

OnionBalance is about a single onion domain going down and protecting against that, but in my case the whole gateway cracks out, so I can't even browse websites (outgoing exit connections). And it doesn't recover on its own either; it was down for 14 hours once until I rebooted Whonix-Gateway. So it's some bug, though it's not yet clear whether it is in Whonix or in the Tor software. I have experienced no DDoS yet, though.

Hardware issues can't be the cause: there are no issues with other VMs, I have been using this powerful dedicated server for a long time already, and no VM has ever crashed or shown anything else I would have noticed. Also, hardware failures happen randomly, but this thing cracks out like clockwork: basically every 1-2 days it needs a reboot, and then the same thing happens again.

So I read the Multiple Whonix-Gateway documentation, where it recommends separating the internal VirtualBox network by renaming it from Whonix to Whonix1 on the second VM.

But that won't work for me, because my backend VM, which hosts the website and other services, would then need two different interfaces on the same subnet. I am fairly concerned it won't work like that.

Anyway, what should I do to change the 10.152.152.10 gateway IP to something else (I mean on the second Whonix-Gateway VM)? Will it work if I just change it in /etc/network/interfaces, or do I need to edit some other configs as well?

Using multiple gateways / workstations should work too with OnionBalance. If one gateway is overloaded (DDoS) or down (power loss, software issues), another would take over.

You can find out in which places 10.152.152.10 is written by grepping the Whonix source code.

For example:

mygrep -r 10.152.152.10

includes:

packages/whonix-ws-network-conf/etc/network/interfaces.d/30_non-qubes-whonix:       gateway 10.152.152.10

which means the package whonix-ws-network-conf contains a file /etc/network/interfaces.d/30_non-qubes-whonix that includes the string 10.152.152.10.

The package whonix-developer-meta-files you can ignore. Files in /debian folders you can ignore. Lots of other files are probably also not required. It might make sense to write a script to replace the IPs; str_replace might be very useful.

mygrep -rl 10.152.152.10 | grep --invert-match whonix-developer-meta-files | grep --invert-match /debian
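A rough, untested sketch of such a replacement script, using sed instead of str_replace and assuming it is run from the root of the Whonix source tree:

#!/bin/bash
# Untested sketch: replace the gateway IP throughout a Whonix source tree.
# Exclusions are only the ones mentioned above.
grep -rl "10.152.152.10" . \
  | grep --invert-match whonix-developer-meta-files \
  | grep --invert-match /debian \
  | while read -r file_name ; do
      sed -i 's/10\.152\.152\.10/10.152.152.11/g' "$file_name"
    done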

Pointers only. I won’t be providing tested, step by step instructions.

Indeed, complex to set up with a single backend server, at least as long as nobody writes step-by-step instructions on how to set up OnionBalance with Whonix for a single backend.

So I logged into the Whonix-Gateway VM in its cracked-out state via VirtualBox RDP (the physical screen of the VM).

Date and time - correct
Internet as root user on Whonix-Gateway (curl ifconfig.me) - DNS times out
Internet as root user on Whonix-Gateway (curl 1.1.1.1) - still times out

sudo -u clearnet curl ifconfig.me - not working
sudo -u clearnet UWT_DEV_PASSTHROUGH=1 curl --tlsv1.2 --proto =https -H 'Host: check.torproject.org' -k https://116.202.120.81 - working, shows HTML
So clearnet is working, and Tor should be able to work.

RAM - 300 MB free

journalctl - nothing fishy there

I added logging to Tor,
then did service tor restart.
Now vanguards says "Tor daemon connection failed: [No such file or directory]",
and in journalctl: Tor 0.4.2.7 died: Caught signal 11, followed by a stack trace.

But that happened only after I restarted the tor service.

So it looks like a Tor software problem, though I don't have any of those problems on other Debian machines.
I can't keep the Tor debug log enabled; within the first minute it bloated to a 100 MB file, and I just don't have that much space.
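In case it helps, a less heavy alternative to full debug logging would be something like this in Tor's user config (a sketch; I believe Whonix reads user settings from /usr/local/etc/torrc.d/50_user.conf, and the log directory must be writable by the debian-tor user):

# /usr/local/etc/torrc.d/50_user.conf (path assumed)
# "info" is far less verbose than "debug" but still records circuit and
# hidden service events.
Log info file /var/log/tor/info.log
# Keep notices going to the journal as before.
Log notice syslog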

I did a Debian package update, but I don't know what else I can do except set up a second Whonix-Gateway and reboot them with cron daily at different times.

So it's still a mystery right now. No doubt it's related to the Tor version in Whonix, but it's not clear whether it's a Tor bug or a Tor bug induced by something in the Whonix environment, because on vanilla Debian I have never seen this kind of thing.

Two Whonix-Gateways, each with 99% uptime and their downtime at different times, plus OnionBalance, might be a reasonably stable solution.
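For the cron part of that workaround, something like this on the host should do (a sketch; the VM names are placeholders, VBoxManage is assumed to be in cron's PATH, and it must run from the crontab of the user that owns the VMs):

# Host crontab sketch: reset the two gateways 12 hours apart, so at least
# one of them stays up while the other republishes its onion descriptors.
0 4  * * * VBoxManage controlvm "Whonix-Gateway-1" reset
0 16 * * * VBoxManage controlvm "Whonix-Gateway-2" reset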

Please report that stack trace to the Tor Project issue tracker so it can be fixed in Tor.

I have lost that stack trace for now; it got overwritten after the reboot.
But I will get it next time, in 24 hours or so.
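In the meantime, making the journal persistent should keep it across reboots (standard systemd, nothing Whonix-specific):

# Keep journald logs on disk (with the default Storage=auto setting)
# so the next crash trace survives a reboot.
sudo mkdir -p /var/log/journal
sudo systemctl restart systemd-journald
# Later, previous boots can be inspected with:
journalctl --list-boots
sudo journalctl -b -1 -u tor@default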


You said 10.152.152.10 was not hardcoded, but it was. A lot of files contain 10.152.152.10. Or you probably meant recompiling Whonix myself, but that is just redundant.

Anyway, if anyone wants to change the gateway IP to .11, it's easy to do with a simple sed approach:

grep -R "10.152.152.10" /etc | awk -F: '{ print $1 }' | sort | uniq | while read p; do echo sed -i \'s/10[.]152[.]152[.]10/10.152.152.11/g\' $p; done | bash
grep -R "10.152.152.10" /usr | awk -F: '{ print $1 }' | sort | uniq | while read p; do echo sed -i \'s/10[.]152[.]152[.]10/10.152.152.11/g\' $p; done | bash

And then, after a reboot, it just works with the new gateway IP.

str_replace is a ton easier than sed.

It depends on your definition of hardcoded. Many configuration files, such as the Tor configuration or ifupdown, do not have any scripting / variable support.

A post was split to a new topic: multiple Whonix-Gateway / Whonix-Workstation

You might be right about hardware issues. I noticed that a reboot doesn't really help when the problem happens: after a reboot it gets online and the gateway works, but only partially. In today's case no onions came online even after an hour, until I did a reset of the VM.

It might be some Intel Xeon virtualization issue on the dedicated server (for some reason they are very cheap), or some other hardware issue. Though it still happens only with the Whonix VMs.

So only resetting the VM gives it a clean, clear start. I am fine with the current workaround: I just reset them by cron daily now, and also use a bunch of load-balancing software, so uptime is fine.

Thus it doesn't look like a common problem; it might be anything in my case.


OK, I had to replace that Whonix-Gateway with a new one.

It got worse and went mad. I hadn't done any setup except the hidden services. It worked fine for about 6 months, then those lags started happening, and now it's completely dead. Onion addresses just never come online anymore, and the internet only starts working about 30 minutes after a reboot. So it looks like it just got too old and died a natural death.
I even got a kernel panic while I was exporting the onion keys.

It's not the first time I've seen a Whonix-Gateway get spoiled. It even happens locally, though rarely. Then I just replace it with a new version. Right now I have set live mode as the default GRUB mode and hope it helps the new one live longer. I have no idea how Debian can do such harm to itself just by handling traffic.

And there are still no issues on other VMs, and no problems noticed on the host.

I'm sorry to bother you, but it still doesn't work properly.

So I did the reinstall and everything, and I set up a load balancer; it got better. Right now uptime is about 100% because of my workarounds with load balancers and those cron VM resets, but I still want to make it work properly.

Uptime is 100% because at least one of the two Whonix VMs is online at any particular moment. But whenever I log in to the load balancer, I rarely see them both online. Only after a restart does it work for some time.

Oct 06 13:18:58 host vanguards[737]: WARNING[Tue Oct 06 13:18:58 2020]: Possible Tor bug, or possible attack if very frequent: Got 1 dropped cell on circ 1705 (in state CIRCUIT_PADDING None; old state HS_CLIENT_INTRO HSCI_INTRO_SENT)
Oct 06 13:18:58 host vanguards[737]: NOTICE[Tue Oct 06 13:18:58 2020]: We force-closed circuit 1705
Oct 06 13:18:59 host vanguards[737]: WARNING[Tue Oct 06 13:18:59 2020]: Possible Tor bug, or possible attack if very frequent: Got 1 dropped cell on circ 1730 (in state CIRCUIT_PADDING None; old state HS_CLIENT_INTRO HSCI_INTRO_SENT)
Oct 06 13:18:59 host vanguards[737]: NOTICE[Tue Oct 06 13:18:59 2020]: We force-closed circuit 1730
Oct 06 13:19:00 host vanguards[737]: WARNING[Tue Oct 06 13:19:00 2020]: Possible Tor bug, or possible attack if very frequent: Got 1 dropped cell on circ 1738 (in state CIRCUIT_PADDING None; old state HS_CLIENT_INTRO HSCI_INTRO_SENT)
Oct 06 13:19:00 host vanguards[737]: NOTICE[Tue Oct 06 13:19:00 2020]: We force-closed circuit 1738
Oct 06 13:19:00 host vanguards[737]: WARNING[Tue Oct 06 13:19:00 2020]: Possible Tor bug, or possible attack if very frequent: Got 1 dropped cell on circ 1736 (in state CIRCUIT_PADDING None; old state HS_CLIENT_INTRO HSCI_INTRO_SENT)
Oct 06 13:19:00 host vanguards[737]: NOTICE[Tue Oct 06 13:19:00 2020]: We force-closed circuit 1736
Oct 06 13:19:01 host vanguards[737]: WARNING[Tue Oct 06 13:19:01 2020]: Possible Tor bug, or possible attack if very frequent: Got 1 dropped cell on circ 1735 (in state CIRCUIT_PADDING None; old state HS_CLIENT_INTRO HSCI_INTRO_SENT)
Oct 06 13:19:01 host vanguards[737]: NOTICE[Tue Oct 06 13:19:01 2020]: We force-closed circuit 1735
Oct 06 13:30:09 host vanguards[737]: WARNING[Tue Oct 06 13:30:09 2020]: Possible Tor bug, or possible attack if very frequent: Got 1 dropped cell on circ 2089 (in state CIRCUIT_PADDING None; old state HS_CLIENT_INTRO HSCI_INTRO_SENT)
Oct 06 13:30:09 host vanguards[737]: NOTICE[Tue Oct 06 13:30:09 2020]: We force-closed circuit 2089
Oct 06 13:30:09 host vanguards[737]: WARNING[Tue Oct 06 13:30:09 2020]: Possible Tor bug, or possible attack if very frequent: Got 1 dropped cell on circ 2088 (in state CIRCUIT_PADDING None; old state HS_CLIENT_INTRO HSCI_INTRO_SENT)
Oct 06 13:30:09 host vanguards[737]: NOTICE[Tue Oct 06 13:30:09 2020]: We force-closed circuit 2088
Oct 06 13:30:12 host vanguards[737]: WARNING[Tue Oct 06 13:30:12 2020]: Possible Tor bug, or possible attack if very frequent: Got 1 dropped cell on circ 2117 (in state CIRCUIT_PADDING None; old state HS_CLIENT_INTRO HSCI_INTRO_SENT)
Oct 06 13:30:12 host vanguards[737]: NOTICE[Tue Oct 06 13:30:12 2020]: We force-closed circuit 2117
Oct 06 13:30:12 host vanguards[737]: WARNING[Tue Oct 06 13:30:12 2020]: Possible Tor bug, or possible attack if very frequent: Got 1 dropped cell on circ 2119 (in state CIRCUIT_PADDING None; old state HS_CLIENT_INTRO HSCI_INTRO_SENT)
Oct 06 13:30:12 host vanguards[737]: NOTICE[Tue Oct 06 13:30:12 2020]: We force-closed circuit 2119
Oct 06 13:41:32 host vanguards[737]: WARNING[Tue Oct 06 13:41:32 2020]: Possible Tor bug, or possible attack if very frequent: Got 1 dropped cell on circ 2503 (in state CIRCUIT_PADDING None; old state HS_CLIENT_INTRO HSCI_INTRO_SENT)
Oct 06 13:41:32 host vanguards[737]: NOTICE[Tue Oct 06 13:41:32 2020]: We force-closed circuit 2503
Oct 06 13:52:56 host vanguards[737]: WARNING[Tue Oct 06 13:52:56 2020]: Possible Tor bug, or possible attack if very frequent: Got 1 dropped cell on circ 2722 (in state CIRCUIT_PADDING None; old state HS_CLIENT_INTRO HSCI_INTRO_SENT)
Oct 06 13:52:56 host vanguards[737]: NOTICE[Tue Oct 06 13:52:56 2020]: We force-closed circuit 2722
Oct 06 13:53:52 host vanguards[737]: WARNING[Tue Oct 06 13:53:52 2020]: Possible Tor bug, or possible attack if very frequent: Got 1 dropped cell on circ 2753 (in state CIRCUIT_PADDING None; old state HS_CLIENT_INTRO HSCI_INTRO_SENT)
Oct 06 13:53:52 host vanguards[737]: NOTICE[Tue Oct 06 13:53:52 2020]: We force-closed circuit 2753
Oct 06 13:53:53 host vanguards[737]: WARNING[Tue Oct 06 13:53:53 2020]: Possible Tor bug, or possible attack if very frequent: Got 1 dropped cell on circ 2778 (in state CIRCUIT_PADDING None; old state HS_CLIENT_INTRO HSCI_INTRO_SENT)
Oct 06 13:53:53 host vanguards[737]: NOTICE[Tue Oct 06 13:53:53 2020]: We force-closed circuit 2778
Oct 06 13:53:56 host vanguards[737]: WARNING[Tue Oct 06 13:53:56 2020]: Possible Tor bug, or possible attack if very frequent: Got 1 dropped cell on circ 2788 (in state CIRCUIT_PADDING None; old state HS_CLIENT_INTRO HSCI_INTRO_SENT)
Oct 06 13:53:56 host vanguards[737]: NOTICE[Tue Oct 06 13:53:56 2020]: We force-closed circuit 2788

Right now I was able to catch this log; Tor says it can't reach the entry guards or something. The second Whonix VM works fine at this moment. I tried to attach a nyx screenshot, but I don't have permission to post images. Anyway, it shows the same log and only a few kB/s, which is just Tor negotiation traffic; it should be much more when actually working. I wonder what is going on: could it be some networking problem, or the entry guards thinking I am DDoSing them (but it's just the Tor application), or could it be that two Whonix-Gateways can't work from one IP (but that always worked before), or might it be some attack with Tor protecting itself (though there is no DDoSing happening, but maybe some other kind of attack)?

I just need to understand what those logs mean. If the entry guards really block me, I can buy a VPS for a SOCKS proxy so the second Whonix-Gateway has a different IP (or set up a bridge there).

Please see this:

I read that, but I don't get what is happening specifically in my case. Why can't it connect to the entry guard? Is there a way I can expand the set of entry guards for my Whonix-Gateway without degrading anonymity?

How can I mitigate this issue with vanguards? I don't want to disable it either. But it sometimes just blocks Tor completely, even outgoing gateway traffic.

From the Tor entry guards wiki:

The safest decision is to persist with poor performance and wait for normal guard rotation.

That sounds too defeatist. Are there any ways I can make it more stable without degrading security or forcing guard rotation?

If users feel compelled in their circumstances to proceed despite the anonymity risks, then it may be safer to first try:

One of the fallback primary entry guards.
A configured bridge.
Chaining other tunnels with Tor.
Creating a fresh Whonix-Gateway ™ (sys-whonix) and copying across the Tor state file.
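For the last of those options, I assume copying the Tor state file would amount to roughly this (untested; paths follow Debian's tor packaging, and the transfer path is a placeholder):

# On the old gateway:
sudo systemctl stop tor@default
sudo cp /var/lib/tor/state /tmp/tor-state-backup
# Transfer the file to the new gateway (e.g. via shared folder), then there:
sudo systemctl stop tor@default
sudo cp /path/to/tor-state-backup /var/lib/tor/state
sudo chown debian-tor:debian-tor /var/lib/tor/state
sudo systemctl restart tor@default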

Well, I thought about this already. I will try setting up a private bridge, but I don't know how much it will change anything.

OK, I am lost right now. I see that bridges get broadcast to the bridge authority, and that's not what I want for my private relay. I was thinking of some private relay to mitigate the problem. I need your reply before I start messing around.
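What I have in mind on the VPS side is roughly this torrc (only a sketch based on my reading of the Tor manual; the obfs4 port is a placeholder and obfs4proxy is assumed to be installed):

# torrc sketch for a private, unpublished obfs4 bridge on the VPS
BridgeRelay 1
ORPort auto
ServerTransportPlugin obfs4 exec /usr/bin/obfs4proxy
ServerTransportListenAddr obfs4 0.0.0.0:9443
ExtORPort auto
# The part that matters for keeping it private: do not publish the
# descriptor to the bridge authority / BridgeDB.
PublishServerDescriptor 0

If I understand it correctly, obfs4proxy then writes the finished bridge line (with cert= and iat-mode) to /var/lib/tor/pt_state/obfs4_bridgeline.txt, which is what would go into the gateway's bridge settings.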

I don't know that. Whonix is a downstream Linux distribution, a research and implementation project. We take existing components such as Tor and vanguards and then implement and bundle them according to their documentation. Therefore, I am the wrong person to ask.

The critical chapter is this one:
Vanguards - Tor Anonymity Improvement

And the critical quote is this one:

Events that are detected by heuristics that still need tuning are at NOTICE level. They may be a bug, a false positive, or an actual attack. If in doubt, don’t panic. Please check the Github issues [archive] to see if any known false positives are related to these lines, and if not, consider filing an issue. Please redact any relay fingerprints from the messages before posting.

I doubt this issue is caused by anything Whonix does. The onion domain on the whonix.org server, which isn't used for providing location privacy and doesn't require anonymity since it's a public server reachable over clearnet, has been experiencing similar reachability issues and vanguards warnings.

And if this was an attack on the Tor network (it happened before with the Tor botnet attack), or against you specifically, what can I say…

Hosting a high-availability ("99.99%" online) onion service, let alone a high-traffic, load-balanced one, is certainly non-trivial. It's not something we have researched and documented in the Whonix documentation yet. Therefore, until then, the Self Support First Policy for Whonix is the only way to resolve this.

Final conclusion on this:
Some authority in XXX country seems to be trying to mess with Tor.
I have 3 Whonix-Gateways right now, and all 3 went offline suddenly. Everything went mad; they said they could not connect to an entry guard, and nothing helped, not even reboots of all 3 gateways.

So I changed the setup. One Whonix-Gateway is on regular bridges now, the second one is on obfs4 bridges, and the third one connects directly. The result is as follows:
The bridged ones work perfectly now.
The direct one goes online randomly, but mostly it's offline.
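For reference, on the bridged gateways the change boils down to something like this in Tor's user config (a sketch; the bridge line is a placeholder, and I believe Whonix reads user settings from /usr/local/etc/torrc.d/50_user.conf):

# /usr/local/etc/torrc.d/50_user.conf (path assumed) on a bridged gateway
UseBridges 1
# ClientTransportPlugin may already be provided by Whonix's defaults;
# only add it if it is not.
ClientTransportPlugin obfs4 exec /usr/bin/obfs4proxy
# Placeholder bridge line; substitute the real IP:port, fingerprint and cert.
Bridge obfs4 192.0.2.10:9443 0123456789ABCDEF0123456789ABCDEF01234567 cert=PLACEHOLDER iat-mode=0
# For plain (non-obfs4) bridges the line is just: Bridge IP:PORT FINGERPRINT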

So I suppose someone really is doing some shady stuff with onion traffic at the XXX location. I don't know how else to explain this. But it only started recently; before that it was fine, which is why I was confused.