Onion-grater crashes

I noticed that sdwdate would occasionally fail on the Workstation when using KVM.

The Workstation's sdwdate logs mention that the Gateway's control port could not be reached.

On the Gateway, onion-grater.service is crashing.

The seccomp logs show that onion-grater is repeatedly forbidden from calling syscall 25 (mremap).

The onion-grater service unit does not allowlist mremap in its SystemCallFilter.
As a workaround, I added mremap via an override, and onion-grater stopped crashing.


I haven’t seen any sdwdate or onion-grater issues since I applied this workaround.

Reproducible? @HulaHoop

I’m having difficulty confirming this syscall is actually needed.

onion-grater is written in Python. It uses four third-party libraries, psutil, stem, yaml, and sdnotify (all in Debian’s repos). Together with Python as a dependency, there are five pieces of software that could be trying to call mremap().

Python only appears to use mremap for its mmap.resize() call according to a code search. /usr/lib/onion-grater (which the failing systemd unit calls) does not call mmap.resize(), so a dependency would have to be at fault. Of the four remaining dependencies:

At this point, the only things that could still be calling it are a Debian-added patch or a downstream dependency of one of the four dependencies above. Digging deeper than that would be difficult, though, and I’m reluctant to do it without having reproduced the bug myself. If you could show what is calling mremap(), that would make me more comfortable with adding an exception for it.
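
For reference, the mmap.resize() path mentioned above can be exercised in isolation. This is a minimal sketch, assuming Linux and CPython, where resizing an anonymous mapping is expected to go through mremap(); the function name grow_anonymous_mapping is made up for illustration and is not part of onion-grater. Running it under a syscall tracer should show whether mremap is actually issued.

```python
import mmap


def grow_anonymous_mapping(initial=mmap.PAGESIZE, grown=4 * mmap.PAGESIZE):
    """Resize an anonymous mapping and report its new length and contents.

    Hypothetical helper for illustration only. Returns None if the
    platform cannot resize mappings (for example, no mremap support).
    """
    mapping = mmap.mmap(-1, initial)  # -1 requests an anonymous mapping
    try:
        mapping[:5] = b"hello"
        try:
            # On Linux this is the call that is expected to reach mremap().
            mapping.resize(grown)
        except (SystemError, OSError, ValueError):
            return None
        return len(mapping), mapping[:5]
    finally:
        mapping.close()


if __name__ == "__main__":
    print(grow_anonymous_mapping())
```

Since /usr/lib/onion-grater does not call mmap.resize(), this only shows what the Python-level trigger would look like, not what is happening in the crashing service.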

I investigated this some more, and I believe this is caused by a CPython change in Python 3.13.

First I turned on core dumps and set ptrace_scope=0 by changing the relevant settings in

  • usr/lib/systemd/coredump.conf.d/30_security-misc.conf
  • usr/lib/sysctl.d/30_security-misc_ptrace-disable.conf
  • usr/lib/sysctl.d/990-security-misc.conf
  • etc/security/limits.d/30_security-misc.conf

Then I installed systemd-coredump, gdb, libc6-dbg and python3-dbg.
After a reboot I could inspect the relevant coredumps.

CPython 3.13 introduced PyMem_RawRealloc to its Limited C API:

https://docs.python.org/3/whatsnew/3.13.html#limited-c-api-changes
https://docs.python.org/3/c-api/memory.html#c.PyMem_RawRealloc

When CPython is parsing the onion-grater source code, it tries to reallocate a list via list_resize, which calls PyMem_RawRealloc, which in turn calls glibc realloc:

https://github.com/python/cpython/blob/3.13/Objects/listobject.c#L181
https://github.com/python/cpython/blob/3.13/Objects/obmalloc.c#L80
https://github.com/python/cpython/blob/3.13/Objects/mimalloc/alloc-override.c#L134

glibc realloc calls mremap_chunk which tries to syscall mremap.
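
The realloc step can be tried outside of onion-grater as well. This is a rough sketch, assuming a glibc-based Linux system: it calls malloc and realloc through ctypes on a block large enough that glibc would normally serve it with mmap, so growing it should take the mremap_chunk path. The function name and the block sizes are my own choices for illustration, and the script itself cannot observe the syscall; a tracer or the seccomp filter would be needed for that.

```python
import ctypes
import ctypes.util


def realloc_large_block(initial=1 << 20, grown=4 << 20):
    """Grow a large malloc'd block with realloc and check its contents.

    Hypothetical helper for illustration only. Returns True if the first
    bytes survived the reallocation.
    """
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.malloc.restype = ctypes.c_void_p
    libc.malloc.argtypes = [ctypes.c_size_t]
    libc.realloc.restype = ctypes.c_void_p
    libc.realloc.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.free.restype = None
    libc.free.argtypes = [ctypes.c_void_p]

    block = libc.malloc(initial)
    if not block:
        raise MemoryError("malloc failed")
    ctypes.memset(block, 0x5A, initial)  # fill with a recognizable pattern

    # Growing an mmap-backed chunk is where glibc is expected to use mremap.
    bigger = libc.realloc(block, grown)
    if not bigger:
        libc.free(block)
        raise MemoryError("realloc failed")

    preserved = ctypes.string_at(bigger, 16) == b"\x5a" * 16
    libc.free(bigger)
    return preserved


if __name__ == "__main__":
    print(realloc_large_block())
```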

To verify that this is the cause, I increased glibc.malloc.mmap_threshold by overriding onion-grater.service:

[Service]
Environment="GLIBC_TUNABLES=glibc.malloc.mmap_threshold=50000000"

This fixed the crash!

Setting mmap_threshold to a lower limit like "GLIBC_TUNABLES=glibc.malloc.mmap_threshold=5000" brought the crash back.
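
To get a feel for why the two thresholds behave so differently, here is a small sketch that estimates how long a Python list has to get before its item buffer exceeds a given size. It assumes CPython, where sys.getsizeof of a list minus sys.getsizeof of an empty list approximates the item buffer; the function name first_length_over is made up for illustration, and glibc adds its own per-chunk overhead, so the numbers are only approximate.

```python
import sys


def first_length_over(threshold_bytes, max_items=1_000_000):
    """Return (length, buffer_bytes) for the first list length whose item
    buffer exceeds threshold_bytes, or None if max_items is reached first.

    Hypothetical helper for illustration only; relies on CPython's
    list size reporting.
    """
    empty = sys.getsizeof([])
    items = []
    for _ in range(max_items):
        items.append(None)
        buffer_bytes = sys.getsizeof(items) - empty
        if buffer_bytes > threshold_bytes:
            return len(items), buffer_bytes
    return None


if __name__ == "__main__":
    # A 5000-byte threshold is crossed by a fairly small list.
    print(first_length_over(5000))
    # A 50000000-byte threshold is not reached within a million items.
    print(first_length_over(50_000_000))
```

With the low threshold, even modest lists would plausibly end up in mmap-backed chunks and hit the mremap path on their next resize, while the high threshold keeps them on the regular heap.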

I can confirm that the glibc implementation of realloc() calls mremap() in the version of glibc in Debian trixie.

I can also confirm that Python calls realloc() internally in more than a couple of spots.

IMO this is enough justification to add the mremap syscall to the exception list. I’d rather not mess with the internals of glibc via a tunable if we can avoid it.

While I haven’t run into it so far, it seems like a legit bug that OP and Array managed to hunt down. I made a thread about some of my experiences on the dev subforum. I don’t know whether these problems are specific to KVM.