Apply systemd sandboxing by default to some services

The PAM-related changes most likely broke the whonixcheck systemd hardening.

Jul 13 19:46:53 host whonixcheckdaemon[25502]: sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the ‘nosuid’ option set or an NFS file system without root privileges?

To test:

sudo systemctl daemon-reload && sudo systemctl restart whonixcheck && sleep 1 && sudo systemctl status whonixcheck.service | cat

Again, most of the hardening options don’t work. The unit file below marks each one as ok or fail.

## Copyright (C) 2012 - 2018 ENCRYPTED SUPPORT LP <adrelanos@riseup.net>
## See the file COPYING for copying conditions.

[Unit]
Description=whonixcheck
Documentation=https://www.whonix.org/wiki/whonixcheck

After=network.target
Wants=network.target

After=rinetd.service
After=tor.service
After=tor@default.service
After=onion-grater.service
After=whonix-firewall.service
After=whonix-firewall-sdwdate-watcher.service

Requires=msgcollector.service

[Service]
Type=simple
User=user
Group=user
ExecStart=/usr/lib/whonixcheckdaemon
SuccessExitStatus=143
#KillMode=process
TimeoutSec=30
Restart=always

# Hardening.
#ProtectSystem=strict

## ok
ProtectHome=true

## fail
#ProtectKernelTunables=true

## fail
#ProtectKernelModules=true

## ok
ProtectControlGroups=true

## ok
PrivateTmp=true

## ok
PrivateMounts=true

## fail
#PrivateDevices=true

## fail
#MemoryDenyWriteExecute=true

## fail
#NoNewPrivileges=true

## fail
#RestrictRealtime=true

## fail
#SystemCallArchitectures=native

## fail
#RestrictNamespaces=true

## fail
#RestrictAddressFamilies=AF_UNIX AF_INET

## fail
#SystemCallFilter=wait4 read close execve open write rt_sigprocmask stat munmap mprotect clone mmap fstat access brk poll rt_sigaction select ioctl recvfrom getuid getgid getegid pipe getpid futex arch_prctl lseek rt_sigreturn geteuid fcntl getdents dup2 readlink sync getsid unlink sysinfo uname connect setresuid lstat newfstatat sendto getrlimit statfs faccessat sendmsg getppid setgroups bind umask fchmod writev mremap msync madvise dup alarm socket recvmsg shutdown getsockname getpeername socketpair getsockopt setsockopt kill getcwd chdir fchdir rename mkdir chmod chown lchown getrusage setuid setgid setpgid getpgrp setsid getgroups getresuid setresgid getresgid getpgid capget sigaltstack fstatfs prctl setrlimit gettid getxattr sched_getaffinity set_tid_address fadvise64 timer_create timer_settime openat unlinkat fchmodat ppoll set_robust_list utimensat getrandom

[Install]
WantedBy=multi-user.target

But even with only the few working options enabled, there is still an error message.

Jul 13 20:05:54 host PAM-CGFS[634]: cgroupfs v1: Failed to escape to init’s cgroup
Jul 13 20:05:54 host PAM-CGFS[634]: cgroupfs v1: Failed to enter cgroups
Jul 13 20:05:54 host PAM-CGFS[634]: Failed to enter user cgroup /user/root/0 for user root

I don’t think systemd hardening for whonixcheck is very important. It runs under user whonixcheck automatically anyhow. (/usr/bin/whonixcheck does that.) And when users run whonixcheck manually, it would not have the systemd seccomp protections anyhow.

Therefore disabled in git master. Please retest and fix if you like.

The PAM-related changes didn’t do anything with sudo, so I find it unlikely that they’re the reason for this.

Doesn’t kloak already run as a privileged process? By definition of its function it needs to access the input device.

That doesn’t mean it can’t be sandboxed. It’s actually more of a reason to sandbox it.

The sandboxing doesn’t restrict access to any devices.

Yeah. The original systemd unit file certainly has room for improvement.

Maybe it could/should even run under a limited user account kloak.

If you want to do that too…

Btw, please also consider the “Port to sysusers.d mechanism?” proposal.

I’m not sure that would be a good idea. The sandboxing removes all capabilities from kloak and it only works because the root user owns the devices kloak needs access to. Running it under a different user will make it so it doesn’t have permission to access those devices and we’ll need to give it the CAP_DAC_OVERRIDE capability which is dangerous as it allows it to bypass any DAC permission check. Any possible advantage to running it as a different user would mostly be lost.

Members of group input are allowed to write to /dev/input devices. So user kloak could be a member of group input.

ls -la /dev/input/

crw-rw---- 1 root input
crw-rw---- 1 root input

By being a member of the input group no CAP_DAC_OVERRIDE capability would be required?

Kloak doesn’t actually write to /dev/input. It writes to /dev/uinput.

At least, that’s what it says when starting it

* Started kloak : Keystroke-level Online Anonymizing Kernel
* Reading from  : /dev/input/event5 (dakai PS/2+USB Keyboard)
* Writing to    : /dev/uinput
* Maximum delay : 100 ms

/dev/uinput isn’t owned by the input group. It’s owned by the root group.

ls -la /dev/uinput

crw------- 1 root root

It seems like we can change what group it’s owned by with a udev rule so we can probably make it owned by the kloak group.
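
A udev rule along these lines might do it. This is a minimal sketch, not a tested configuration: the kloak group, the rule file name 80-kloak-uinput.rules and the helper function write_uinput_rule are assumptions for illustration, and the rule assumes /dev/uinput is exposed through the misc subsystem.

```shell
#!/bin/sh
# Sketch: write a udev rule that hands /dev/uinput to a dedicated group.
# Assumptions (not from this thread): the group name "kloak", the rule
# file name, and this helper function itself.

write_uinput_rule() {
    rules_dir=$1          # e.g. /etc/udev/rules.d
    group=${2:-kloak}     # hypothetical group that kloak would run under

    # Match the uinput device node, give the group read/write access,
    # and keep it inaccessible to everyone else.
    cat > "$rules_dir/80-kloak-uinput.rules" <<EOF
KERNEL=="uinput", SUBSYSTEM=="misc", GROUP="$group", MODE="0660"
EOF
}
```

As root, this would be called as write_uinput_rule /etc/udev/rules.d, followed by udevadm control --reload-rules and udevadm trigger. Afterwards ls -la /dev/uinput should show the new group.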

Another thing I think we should look into is creating device policies for our sandboxes so the service can only access a limited number of devices.

PrivateDevices=true sets up a new /dev mount and adds a few necessary devices like /dev/null and /dev/random. Some services need access to more than just those (e.g. kloak needs access to input devices), so we can either leave out PrivateDevices (less secure) or create a device policy so that the service can only access the devices it needs (more secure).

Currently, we just leave out PrivateDevices but that isn’t great.

I haven’t had much luck with creating device policies though. The documentation is here.

Debian doesn’t seem to use this much. Running

grep "DeviceAllow" /lib/systemd/system/*.service

only shows OpenVPN stuff.
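
For kloak, a device policy might look roughly like the drop-in below. It is a sketch under assumptions, not something I have verified against kloak: the drop-in file name and the helper write_kloak_device_policy are hypothetical. It uses DevicePolicy=closed together with DeviceAllow instead of PrivateDevices, because as far as I understand PrivateDevices=true builds a /dev that does not contain the physical input devices at all.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that restricts kloak to the devices it
# needs. The file name and this helper are hypothetical.

write_kloak_device_policy() {
    dropin_dir=$1   # e.g. /etc/systemd/system/kloak.service.d
    mkdir -p "$dropin_dir"

    cat > "$dropin_dir/40-device-policy.conf" <<'EOF'
[Service]
# Only the standard pseudo devices (/dev/null, /dev/zero, /dev/random, ...)
# plus the DeviceAllow= entries below are accessible.
DevicePolicy=closed
# Keyboard events under /dev/input/ (the "input" character device group).
# "rw" is a cautious guess; read-only may turn out to be enough.
DeviceAllow=char-input rw
# The device kloak writes the delayed events to.
DeviceAllow=/dev/uinput rw
EOF
}
```

After writing the drop-in, sudo systemctl daemon-reload and a restart of the service would be needed for it to take effect.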

Great guides for using systemd to sandbox services. There may be features in there beyond what we do.

https://github.com/Whonix/onion-grater/pull/9

We should look into setting a more restrictive umask per-service (not global like we tried before) and whitelisting IP addresses.

UMask=0077
IPAddressDeny=any
IPAddressAllow=...

I tried these but it broke onion-grater. Exactly which IP addresses does onion-grater connect to? We can create a whitelist of them.

Also see: Linux Hardening Guide | Madaidan's Insecurities

Is the problem behind the “disable systemd hardening” commit (Kicksecure/sdwdate@b5f0ea1) still present? I can’t reproduce it with the sandboxing enabled. I can send a PR to strengthen sdwdate’s sandboxing too if that isn’t an issue anymore.

I see no issues on my end. Please test.

I had to remove MemoryDenyWriteExecute from onion-grater and sdwdate because something in Python seems to be using RWX memory for some reason.

You may get these errors:

Unknown lvalue 'ProtectKernelLogs' in section 'Service', ignoring
Unknown lvalue 'ProtectHostname' in section 'Service', ignoring
Unknown lvalue 'ProtectProc' in section 'Service', ignoring
Unknown lvalue 'ProcSubset' in section 'Service', ignoring

These are harmless. Those options only exist in later versions of systemd but I added them for forward compatibility. Another option to look into: https://github.com/systemd/systemd/commit/407234203b41e0a27b3229347c1ad6b2b17e3c21
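
To confirm that only those forward-compatibility options are being ignored (and not, say, a misspelled directive), the option names can be pulled out of the log. A small sketch: the helper name ignored_options is made up for illustration, and it assumes the message format shown above.

```shell
#!/bin/sh
# Sketch: read log lines on stdin and print the unique option names that
# systemd reported as "Unknown lvalue". The function name is hypothetical.

ignored_options() {
    # Keep only lines containing the warning, extract the quoted option
    # name, and de-duplicate the result.
    sed -n "s/.*Unknown lvalue '\([^']*\)' in section.*/\1/p" | sort -u
}
```

It could be fed from the journal, for example sudo journalctl -b --no-pager | ignored_options. Anything in the output other than the four options above would be worth a closer look.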

All merged.

Awesome! I will test this over the next few days and report back should there be any issues.

Broken.

The sdwdate log will show:

/bin/date: cannot set date: Operation not permitted

During testing, these might help:

  • sudo sdwdate-clock-jump
  • sudo systemctl restart sdwdate

The first sets the time using date. The latter uses sclockadj, once the time has been set with date for the first time after boot.

sudo systemctl stop sdwdate is also broken: it takes too long when it should be (almost) instant. Probably the signal is no longer received by sdwdate. (systemd then sends SIGKILL after the timeout, but that’s bad.)

Try adding kill getsockopt unlink to SystemCallFilter, adding /var/lib/sdwdate/ to ReadWriteDirectories, and commenting out PrivateUsers.

## Sandboxing.
AmbientCapabilities=CAP_SYS_TIME
CapabilityBoundingSet=CAP_SYS_TIME
ProtectSystem=strict
ReadWriteDirectories=/run/sdwdate/ /var/lib/sdwdate/
ProtectHome=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
ProtectKernelLogs=true
ProtectHostname=true
ProtectProc=invisible
ProcSubset=pid
PrivateTmp=true
#PrivateUsers=true
PrivateDevices=true
NoNewPrivileges=true
LockPersonality=true
RestrictRealtime=true
RestrictSUIDSGID=true
RestrictAddressFamilies=AF_UNIX AF_INET
RestrictNamespaces=true
SystemCallFilter=wait4 select futex read stat close openat fstat lseek mmap rt_sigaction getdents64 mprotect ioctl recvfrom munmap brk rt_sigprocmask fcntl getpid write access socket sendto dup2 clone execve getrandom geteuid getgid madvise getuid getegid readlink pipe rt_sigreturn connect pipe2 prlimit64 set_robust_list dup arch_prctl lstat set_tid_address sysinfo sigaltstack rt_sigsuspend shutdown timer_settime mkdir timer_create statfs getcwd setpgid setsockopt uname bind getpgrp getppid getpeername chdir poll getsockname fadvise64 clock_settime kill getsockopt unlink
SystemCallArchitectures=native

Does that fix the issue?
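
When widening SystemCallFilter like this, it can help to compare the syscalls a program actually makes against the allowlist instead of guessing. A sketch with a hypothetical helper, missing_syscalls; the observed syscall names have to come from elsewhere, for example from tracing the unsandboxed service.

```shell
#!/bin/sh
# Sketch: print every observed syscall that is not in the allowlist.
# The function name and calling convention are made up for illustration.
#   $1      the SystemCallFilter= value (whitespace-separated names)
#   $2...   syscall names that were observed

missing_syscalls() {
    # Collapse any run of whitespace into a single space so the
    # word-boundary match below works.
    allowed=$(printf '%s' "$1" | tr -s '[:space:]' ' ')
    shift
    for sc in "$@"; do
        case " $allowed " in
            *" $sc "*) ;;                  # already allowed
            *) printf '%s\n' "$sc" ;;      # would be blocked by the filter
        esac
    done
}
```

For example, missing_syscalls "read write close" read kill unlink prints kill and unlink, one per line.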

That works better. However, one issue remains: process.wait(timeout_seconds) in remote_times.py is broken and hangs forever. To reproduce, try this:

File:

/usr/lib/python3/dist-packages/sdwdate/remote_times.py

change

timeout_seconds = 120

to

timeout_seconds = 2

And then restart sdwdate.

Meanwhile, I will revert these changes to unbreak sdwdate in the Whonix developers repository. Once this is fixed, I’ll look into it ASAP.
