Increased memory usage?

I’m observing that Whonix (mostly Workstation, but Gateway too) takes more and more time to start. This seems to be connected with higher memory usage, which causes swapping during startup (until dynamic memory management kicks in), sometimes to the point of a startup timeout. I haven’t observed a specific change or point in time where it got worse; it feels like it slowly got worse and worse over a longer period of time. At this point the default 400MiB initial memory isn’t feasible anymore - I’m going to increase it for Whonix Workstation specifically: “Increase default initial memory for Whonix workstation” by marmarek, Pull Request #24 in QubesOS/qubes-core-admin-addon-whonix on GitHub. While the initial memory change may not look like much, it is needed because of greater memory usage overall - not only during startup. This is especially bad for users running several Workstation VMs.

One recent(?) change that I think contributes to this is enabling memlockd. This service takes about 20 MiB of RAM (and forces itself to remain in RAM). It seems to be part of the emergency shutdown feature (security-misc package). But this looks wrong for a few reasons:

  • emergency shutdown as described doesn’t make sense in a Qubes VM - the VM won’t see a USB stick being yanked this way
  • the emergency shutdown feature is disabled for now, yet the memlockd service is started
  • the emergency shutdown feature doesn’t even use the standard memlockd.service unit - it starts its own instance - which, when enabled, will use even more RAM (although hopefully some of that will be shared between the two instances)

Another thing I noticed is the sdwdate-watcher process. It takes about 40 MiB of actual RAM (RSS). I don’t think it’s anything new, but if I see it correctly, it’s a very simple script that imports all of Qt5 just to monitor changes in a single file (and also runs a timer that does a no-op call every 500ms…).

There are probably more improvement opportunities, but those two were easy to find.

As for other reasons for slow startup, systemd-analyze blame points at plocate-updatedb.service - it’s the second-slowest unit at startup (just after apparmor.service), while it doesn’t look really essential (or is it?). AppArmor also takes quite a bit of time to start, but I don’t think there is much room for improvement there…

Will be investigated soonish.

Would it be sufficient to implement this only for Whonix 18 (Debian 13 / trixie based) and above? This fix would arrive with the first Whonix 18 release.

This should be released as soon as possible.

I’d like there to be a solution before the final Qubes 4.3 release (since we have -rc1, the estimated release is in a few months, like 3 or 4). Is Whonix 18 feasible in this timeframe?

Or maybe I can simply modify my PR to increase memory only on Whonix 17 - then we have a workaround for the old version, and the new version will have a proper fix (the minor downside is that qubes upgraded in place from 17 to 18 will keep the increased memory).

Most likely, yes.

I would much rather put up with the increased startup time until whonix-18 gets released (which seems to be soon anyway). I wouldn’t want this “stealth” RAM increase for Whonix qubes to persist indefinitely without my noticing when I do the in-place upgrade of my whonix-17 templates to whonix-18.

True, we should probably add conditionals to keep it from running in Qubes OS. It would need to run directly in dom0 to be useful.

Hmm… for some reason I thought (or at least think I thought) that memlockd’s service wasn’t enabled by default. Maybe that’s because I only looked at it in a sysmaint session, where a lot of things are disabled by default… At any rate, that certainly shouldn’t be happening; if we continue to use memlockd, we should ship configuration to keep memlockd disabled by default.

We should also determine if memlockd is even needed - maybe systemd already locks itself into memory sufficiently? Or maybe we can just lock it into memory manually rather than pulling in an extra dependency?

True.

The reasoning for including memlockd in emerg-shutdown was this: even though systemd is very crash-resilient (even if it receives a signal like SIGSEGV, it tries to avoid truly terminating, since PID 1 exiting causes a kernel panic), in theory the code in systemd that provides crash resilience could be swapped out or otherwise not present in memory, or some code it relied on within libc could be absent. That would result in the crash resilience code re-crashing, which didn’t sound safe at all, so I wanted to lock systemd and its dependencies in memory so that the system wouldn’t panic before it managed to shut down. The downside is increased memory usage, and it makes even things like paging a bit more difficult in heavily constrained environments.

sdwdate-watcher will be going away in the split sdwdate-gui rewrite (which should be merged into the Kicksecure 18 code soon-ish), so hopefully that will help things a bit there.

Edit: It looks like the new sdwdate-gui-client, written as part of the effort to split sdwdate-gui in a more efficient way for Qubes OS, is going to be doing the equivalent job, and it also pulls in all of Qt for its work. Maybe it can be rewritten to not use Qt? It isn’t graphical so we could do that potentially.

Edit 2: Porting away from Qt in order to watch for filesystem events is apparently also not easy. sdwdate-gui-client needs to be able to respond to both filesystem and network events, which means either it needs to be able to use something like select or poll on both an inotify fd and a socket fd, or it needs to use a framework that allows it. The Python inotify bindings don’t look like they’ll make that possible without reaching into private object components, and the framework we’re using that makes things possible now is Qt, which is the very thing we’re trying to avoid. Rewriting this in C would make things easier here, but then we have to fight with architecture-dependent and memory-unsafe code…

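Unfortunately, that 500ms no-op timer is necessary to get PyQt applications to respond to Ctrl+C keypresses in a terminal (Python signal handlers only run when control returns to the interpreter, which doesn’t happen while Qt’s C++ event loop sits idle). This is a somewhat common pattern in our Qt code; we don’t know of a better solution yet.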
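For comparison, if the client moves off Qt to an asyncio event loop, Ctrl+C handling should no longer need a polling timer. A minimal stdlib-only sketch, assuming a Unix system and that the code runs in the main thread (a requirement of `loop.add_signal_handler`); `wait_for_sigint` and `main` are hypothetical names, not existing sdwdate-gui code:

```python
import asyncio
import signal


async def wait_for_sigint():
    """Block until SIGINT (Ctrl+C) is received, without a polling timer."""
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    # add_signal_handler() registers a wakeup fd for the signal, so the idle
    # event loop is woken as soon as Ctrl+C arrives; no periodic no-op needed.
    loop.add_signal_handler(signal.SIGINT, stop.set)
    try:
        await stop.wait()
    finally:
        # Restores the default SIGINT behaviour (KeyboardInterrupt).
        loop.remove_signal_handler(signal.SIGINT)


async def main():
    # Hypothetical: the client's file and socket watchers would be set up here.
    print("Running; press Ctrl+C to exit")
    await wait_for_sigint()
    print("Got SIGINT, shutting down cleanly")
```

Calling `asyncio.run(main())` from the entry point would then exit cleanly on Ctrl+C while the process stays fully idle in between.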

plocate-updatedb.service seems to be pulled in by Catfish (a file search tool we include by default in Whonix-Workstation and Kicksecure). It is apparently a timer-triggered unit that attempts to run daily, at semi-random times of day I believe.

I have mixed feelings here - Catfish appears to work in my anon-whonix AppVM, so it might be useful to users. On the other hand, plocate isn’t even usable by default on Whonix because it’s a setgid executable, so permission-hardener strips its setgid bit and refuses to let anything other than root execute it. Maybe we’re better off removing it if it’s causing startup speed issues and pulls in things that the average user can’t use at all.

(On the topic of Catfish, systemd-analyze blame doesn’t actually show anything related to plocate on my machine, so I’m wondering if this is a red herring.)

Edit: So this is very weird. plocate is present on my Whonix-Workstation VM under Qubes R4.3. But the latest build log for Whonix-Workstation (log_2025-08-12_12-11-23.xz under build-r43-templates-community in QubesOS/build-logs-tertiary on GitHub) doesn’t show plocate being installed, and Catfish doesn’t depend on it; it only recommends it (which shouldn’t pull it in, since Whonix is built with the no-recommends flavor enabled by default in qubes-builderv2’s config). So how on earth is it even ending up in the image to begin with?

Edit 2: Did more research and discussed with Marek; I believe this is not part of the base Whonix-Workstation template but was probably pulled in while doing other work. Ignoring for now.

A couple more possible improvements I can see looking at htop and pstree -a | grep -C1000 sleep in a Whonix-Workstation DispVM:

  • An awful lot of things appear to call sleep infinity, judging by the number of times I see it in htop. Every sleep process eats about 1.7 MB (??? This is according to htop, but KDE System Monitor thinks it only uses a bit over 100 KiB, and another memory usage check told me it uses 253 KiB; I have no idea who’s right), so each sleep infinity may essentially throw away a megabyte and a half of RAM more or less permanently. Other code seems to run sleep in a loop, which isn’t much better from a memory consumption standpoint (looking at msgdispatcher here, which runs sleep -- 10 over and over before checking some PIDs). System components that do this or something similar are:

    • anon-ws-disable-stacked-tor’s tor.anondist (seems to be used to keep a dummy Tor wrapper from ever exiting so that systemd doesn’t restart it over and over) (Pushed a commit that replaces the sleep with an all-Bash implementation of sleep)
    • sdwdate (some component sleeps for a long amount of time) (Used to use sleep to work around a Python bug which is no longer an issue, switched to using Python time.sleep)
    • systemcheck’s canary-daemon (sleeps for an hour, then runs /usr/libexec/systemcheck/canary before sleeping for another hour in a loop) (Switched to all-Bash implementation of sleep)
    • msgdispatcher, as mentioned above (Switched to all-Bash implementation of sleep)
    • /usr/bin/qubes-session (not sure why)
    • usertest2.service, part of msgcollector; it literally just runs /usr/bin/sleep infinity, and I do not know why
  • privleapd is taking up almost 10 MB. Getting around this is hard, since privleap is a critical component for keeping sudo non-executable for users: it allows opening up specific privileged operations without a SUID executable. Maybe it could be rewritten in a lighter programming language, or maybe there’s room for memory optimization there. 10 MB seems like an awful lot for the job it does, but it is written in Python, so…
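The three different numbers for one sleep process are plausibly measuring different things: htop’s RES column is RSS, which counts shared libc pages in full for every process that maps them, while other tools may report PSS (shared pages split among the processes sharing them) or private pages only (USS). Which metric each of those tools uses is an assumption here, not something verified. A minimal sketch for reading all three for one PID from /proc/<pid>/smaps_rollup (Linux 4.14+); the function names are hypothetical:

```python
# Fields of interest in /proc/<pid>/smaps_rollup, all reported in kB.
_FIELDS = ("Rss", "Pss", "Private_Clean", "Private_Dirty")


def parse_smaps_rollup(text):
    """Return the selected fields (in KiB) from smaps_rollup content."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        # The first line is an address-range header; only "Name: N kB"
        # lines whose name is in _FIELDS are kept.
        if sep and key in _FIELDS:
            values[key] = int(rest.split()[0])
    return values


def memory_summary(pid):
    """Return RSS, PSS and USS of a process in KiB."""
    with open(f"/proc/{pid}/smaps_rollup") as f:
        v = parse_smaps_rollup(f.read())
    return {
        # RSS: every resident page, shared library pages counted in full.
        "rss_kib": v["Rss"],
        # PSS: each shared page divided among the processes mapping it.
        "pss_kib": v["Pss"],
        # USS: private pages only, roughly what exiting the process frees.
        "uss_kib": v["Private_Clean"] + v["Private_Dirty"],
    }
```

Running memory_summary() on the PID of a sleep infinity process (as root, or as the owning user) would show how much of the 1.7 MB is actually private to it.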

pyinotify and asyncio can do that. We use that in a few places, for example qui/clipboard.py in QubesOS/qubes-desktop-linux-manager (a graphical application) or qrexec/policy/utils.py in QubesOS/qubes-core-qrexec (a non-graphical application).
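The core of that approach is waiting on several file descriptors from one event loop. The sketch below uses only the standard library: loop.add_reader() watches two fds, with an os.pipe() standing in for the inotify fd (pyinotify’s own asyncio integration is not shown) and a socket for the network side. watch_two_fds and its callbacks are hypothetical names for illustration:

```python
import asyncio
import os


async def watch_two_fds(file_fd, sock, on_file_event, on_message, stop):
    """Wait on a file-event fd and a socket in one asyncio event loop.

    file_fd stands in for an inotify descriptor; here its bytes are passed
    to on_file_event unparsed.
    """
    loop = asyncio.get_running_loop()
    sock.setblocking(False)

    def file_ready():
        on_file_event(os.read(file_fd, 4096))

    def sock_ready():
        data = sock.recv(4096)
        if data:
            on_message(data)
        else:  # peer closed the connection: stop watching
            stop.set()

    loop.add_reader(file_fd, file_ready)
    loop.add_reader(sock.fileno(), sock_ready)
    try:
        await stop.wait()
    finally:
        loop.remove_reader(file_fd)
        loop.remove_reader(sock.fileno())
```

With a real inotify fd, file_ready would have to decode struct inotify_event records instead of treating the bytes as opaque, which is the part pyinotify takes care of.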

I haven’t used asyncio previously (looked into it a few times but ended up finding simpler, less involved solutions each time). I’ll work with it and see if I can get something good to work. Hopefully it will be lighter than using Qt.

dpkg -S /lib/systemd/system/plocate-updatedb.service

plocate: /lib/systemd/system/plocate-updatedb.service

I checked the latest templates for Qubes R4.2 (from the Qubes community testing repository).

dpkg -l | grep plocate

kicksecure-17:

  • plocate installed: no

whonix-gateway-17:

  • plocate installed: no

whonix-workstation-17:

  • plocate installed: no

I have also checked the build logs for R4.2 and R4.3. No sign of the plocate package. (Except a mention as “recommended package”, but it does not get installed.)
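A caveat on the dpkg -l | grep plocate check: grep also matches packages that were removed but not purged (status rc) and any package whose name merely contains the string. A stricter check looks at the status column. A minimal sketch parsing dpkg -l output (the function names are hypothetical); dpkg-query -W -f='${Status}\n' plocate is an alternative that asks dpkg directly:

```python
def installed_packages(dpkg_l_output):
    """Return names of packages whose dpkg -l status is 'ii' (installed)."""
    installed = set()
    for line in dpkg_l_output.splitlines():
        fields = line.split()
        # Data rows start with a status such as 'ii' or 'rc';
        # the header and separator lines do not.
        if len(fields) >= 2 and fields[0] == "ii":
            # Strip a multiarch suffix such as ':amd64'.
            installed.add(fields[1].split(":")[0])
    return installed


def is_installed(dpkg_l_output, package):
    return package in installed_packages(dpkg_l_output)
```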

In version 18 and above, we plan on introducing /usr/lib/systemd/system/ensure-shutdown.service.

ensure-shutdown. Not emergency shutdown.

ensure-shutdown will also require memlockd but not memlockd.service.

Aaron has already disabled memlockd.service by default: commit cd44a7e in ArrayBolt3/security-misc, “Disable memlockd service by default, fix systemd path” (GitHub).

Useful in a Qubes sys-usb?

Related: QubesOS/qubes-issues issue #10174, “Consider the impact of USBGuard in sys-usb” (GitHub)

Booting Qubes OS from USB is currently incompatible with sys-usb. Furthermore, forcefully shutting down sys-usb isn’t enough to shut down the whole system. So no, not useful in sys-usb.

A feature like this would need to be integrated with dom0 somehow. And there are existing options for that:

The issue applies to Whonix 18 too, at least in a few cases:

I haven’t looked deeper into what uses most memory on Whonix 18, at least not yet.

Looking at memory usage on my end, some low-hanging fruit I see:

  • polkit-mate-authentication-agent-1. This is necessary for authentication popup prompts to work. It’s useful in sysmaint sessions, and in user sessions if user-sysmaint-split is not installed, but otherwise it’s useless since polkit-agent-helper-1 isn’t able to run in an unprivileged user session anyway. We might be able to save somewhere between 20 and 70 megabytes of memory if we can get rid of that in user sessions.
  • smart-notifier. This probably shouldn’t be running. smart-notifier was included even in Qubes VMs in case someone passed through a physical hard disk to one of them, but the number of people who are going to do that is probably very small, and it’s eating almost 40 megabytes of memory. It’s not worth it.
  • sdwdate_gui_client.py, which can probably be rewritten to be lighter by porting to asyncio as mentioned above. Other work got in the way of doing that, but it’s probably a good time to just do it.
  • usbguard-daemon, usbguard-notifier, and usbguard-dbus. These are useful in sys-usb. Whonix-Gateway will not be used as sys-usb (or at least it really should not be). Should be easy to disable there, together these should save about 26 megabytes.
  • upowerd is eating about 5 megabytes; it’s obviously not useful in Qubes, since dom0 handles power management.
  • There’s a Bash script called sdwdate-start-anondate-set-file-watcher that allows sdwdate to restart Tor without having to weaken sdwdate’s sandbox so that it could do the restart itself. It is probably using around 5.5 megabytes or more; it could probably be rewritten in C to make it substantially lighter and then compiled on demand. (Edit: Upon further experimentation, turning this off only saved about 1 megabyte of memory, so ignoring for now.)

Beyond that I’m not sure there’s a whole lot else that can be done, but with all that put together we stand to gain 60-70+ megabytes by implementing all of the above (a lot more if we can get polkit-mate-authentication-agent-1 to go away), which is a decent chunk.

That sounds like a good idea for user sessions with user-sysmaint-split. How is it started? If it’s a .desktop file, we support a mechanism similar to ConditionPathExists from systemd (IIUC originally GNOME-only) - see window-icon-updater/qubes-icon-sender.desktop in QubesOS/qubes-gui-agent-linux. For other files we have qvm-service-wrapper (see autostart-dropins/volumeicon.desktop in QubesOS/qubes-core-agent-linux), but that’s likely less useful here, since user vs. sysmaint mode isn’t controlled by a qvm-service.

Hm, I found appvm-scripts/etc/xdgautostart/qubes-polkit-gnome-authentication-agent-1.desktop in QubesOS/qubes-gui-agent-linux - but that’s the GNOME one, not the MATE one. Or is it the same?

smart-notifier also sounds like a good idea to disable (can be made conditional on qvm-service with the same name - if somebody really wants to re-enable it).
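A minimal sketch of that opt-in logic, assuming the usual Qubes conventions that each enabled qvm-service appears as an empty flag file under /run/qubes-service/ and that /usr/share/qubes/marker-vm marks a Qubes VM; should_start is a hypothetical helper. In a systemd unit the Qubes part would simply be ConditionPathExists=/run/qubes-service/smart-notifier, which, unlike this sketch, would also skip the unit outside Qubes:

```python
import os

# Assumed paths (see lead-in): one empty flag file per enabled qvm-service,
# and a marker file present only in Qubes VMs.
QUBES_SERVICE_DIR = "/run/qubes-service"
QUBES_VM_MARKER = "/usr/share/qubes/marker-vm"


def should_start(service, marker=QUBES_VM_MARKER, service_dir=QUBES_SERVICE_DIR):
    """Decide whether an opt-in helper such as smart-notifier should run."""
    if not os.path.exists(marker):
        return True  # not a Qubes VM: keep the existing default (enabled)
    # In Qubes, run only if the qvm-service of the same name was enabled.
    return os.path.exists(os.path.join(service_dir, service))
```

A user who really wants it back would then run qvm-service --enable <vm> smart-notifier in dom0.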

I think those two should be more than enough to fix the issue. The default initial memory is 400MB, and it hits OOM only sometimes (in PVH), so it feels like not much is needed.

It is a desktop file. Really the memory savings would be valuable outside of Qubes too though, so I implemented a small service in usability-misc that just moves the desktop file out of the way on boot if the system is booted in a mode where it’s not needed. I’m realizing as I type this that the mechanism for moving the file back should be made slightly more robust, but that’s a solution that will work everywhere.
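One way to make the move-back step more robust is to make the toggle idempotent and to let a freshly installed copy win over a stale parked one. The sketch below assumes hypothetical active/parked paths and leaves out how the boot mode is detected; it is not the usability-misc implementation:

```python
import os


def set_autostart(active_path, parked_path, enabled):
    """Park or restore an autostart .desktop file; return True if active.

    Idempotent: repeated calls with the same `enabled` change nothing.
    os.replace() is atomic when both paths are on the same filesystem.
    """
    if enabled:
        if os.path.exists(active_path):
            # The active copy wins (e.g. a package upgrade reinstalled it);
            # drop any stale parked copy instead of overwriting the new one.
            if os.path.exists(parked_path):
                os.remove(parked_path)
        elif os.path.exists(parked_path):
            os.replace(parked_path, active_path)
    elif os.path.exists(active_path):
        os.makedirs(os.path.dirname(parked_path), exist_ok=True)
        os.replace(active_path, parked_path)
    return os.path.exists(active_path)
```

Calling it on every boot with `enabled` derived from the current mode means a missed run is corrected by the next one.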

I just moved it to a different metapackage, so it shouldn’t even be installed on Qubes anymore.

I did implement a lot of the other solutions too, or at least started work on implementing them. The polkit agent solution will only work in a limited number of scenarios, and smart-notifier was an entirely new addition that just wasn’t expected to take up so much memory. We were having memory usage issues before that, so hopefully this will get RAM usage actually lower than last time we were using too much RAM.