Working on a theory that it’s keepalive related, and maybe some sort of ‘new’ thing since our OS version upgrade. Looking at experimenting with keepalive settings between nginx, Varnish and Apache soon, or possibly removing Varnish altogether and using nginx’s cache instead. Thanks
Splitting into a new thread.
I’ve disabled KeepAlive in Apache, working on a hunch. May or may not impact performance though.
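For reference, this is the sort of change I mean — a minimal sketch of turning off persistent connections in the Apache config (exact file location varies by distro, e.g. `/etc/apache2/apache2.conf` on Debian; the commented values are the usual defaults, not our production settings):

```apache
# Disable HTTP keepalive so each request uses a fresh connection.
# Frees idle workers sooner, at the cost of per-request TCP overhead.
KeepAlive Off

# If keepalive is re-enabled later, these are the related knobs:
# KeepAliveTimeout 5
# MaxKeepAliveRequests 100
```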
Will monitor for any further 503s and, if there are, I need to catch one ‘in the act’ with varnishlog to see the actual error reported.
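Something like this should do it — a sketch of filtering varnishlog down to transactions that returned a 503 (VSL query syntax as in Varnish 4+; adjust if you’re on an older release):

```shell
# Show full request groups where a 503 was delivered,
# including the backend fetch that failed.
varnishlog -g request -q 'RespStatus == 503'

# Or write the matching transactions to a file for later inspection:
# varnishlog -g request -q 'RespStatus == 503' -w /var/log/varnish-503.log
```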
It wasn’t KeepAlive, but I am pretty sure it was related to PHP opcache (which indeed was a ‘new’ thing as of a few weeks ago as part of a big upgrade).
Since I adjusted an opcache setting about 6 or 7 hours ago, there’ve been no more random 503s. Hopefully I’ve caught it…
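For anyone following along, opcache tuning lives in `php.ini` (or a `conf.d` fragment). I won’t pretend these are the exact values involved — the directives below are just the usual suspects when opcache misbehaves under load, with illustrative numbers:

```ini
; Illustrative opcache settings -- not necessarily the one adjusted here.
opcache.memory_consumption=256    ; MB of shared opcode cache
opcache.max_accelerated_files=20000
opcache.validate_timestamps=1
opcache.revalidate_freq=60        ; seconds between file timestamp checks
```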
Please let me know in this thread if you encounter the instant 503 ‘guru meditation’ Varnish error again.
Nope, it was not opcache, though it was an important thing to fix anyway.
Just saw it happen again. This time I counted the running Apache processes: exactly 150, which is the limit of MaxRequestWorkers. I’d normally expect to see this limit mentioned in the error log, which is why I didn’t suspect it earlier, but given the ‘coincidence’, I now think that’s the cause. Have adjusted that value up. Will see how it goes…
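For the record, the relevant knobs in the MPM config look roughly like this (numbers are illustrative, not our actual values; note that under prefork, ServerLimit must be raised alongside MaxRequestWorkers or the increase is silently capped):

```apache
# mpm_prefork example -- raise the ceiling on concurrent workers.
<IfModule mpm_prefork_module>
    ServerLimit          300
    MaxRequestWorkers    300
    # When the limit is hit, Apache normally logs:
    #   "server reached MaxRequestWorkers setting"
</IfModule>
```

A quick way to count workers at the moment of a spike is `ps -C apache2 --no-headers | wc -l` (process name may be `httpd` depending on distro).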
Pretty sure it’s Phabricator (as usual) causing some sort of huge spike in traffic - whenever it occurs, there’s a huge spate of 503s on the Phabricator .onion, all or mostly on /diffusion URLs, e.g. http://phabricator.dds6qkxpwdeubwucdiaord2xgbbeyds25rbsgr73tbfpqpt4a6vjwsyd.onion/diffusion/WHONIX/history/pidgin/whonix_shared/usr/lib/timesync/30_run-sdwdate;18.104.22.168.7-developers-only
We’ve made some adjustments that should technically eliminate all ‘Guru Meditation’ errors, though a big rush of traffic, particularly to the wiki, may still cause some form of timeout. However, we’re hoping the performance adjustments prevent those stampedes either way.
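I won’t spell out every change here, but one common way to stop Varnish returning ‘Guru Meditation’ pages during backend trouble is grace mode: serve a slightly stale cached object while a single background fetch refreshes it, which also tames stampedes on hot pages. A hedged sketch in Varnish 4+ VCL (the 1h value is an assumption, not our actual config):

```vcl
sub vcl_backend_response {
    # Keep objects servable for up to an hour past their TTL.
    # If the backend is slow or down, clients get stale content
    # instead of a 503, and only one request triggers the refresh.
    set beresp.grace = 1h;
}
```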
One other source of 503s turned out to be intermittent hardware issues, which are also being addressed but obviously can’t be fixed with software optimisations. So we’re fighting a multi-headed snake right now.
Closely monitoring since today’s changes.