set oops=panic kernel parameter or kernel.panic_on_oops=1 sysctl for better security

madaidan · July 2, 2019, 5:02pm

oops=panic might be another good kernel parameter to use. This makes the kernel panic on certain errors (oopses) to prevent the kernel from continuing to run a flawed process. Kernel exploits can also cause oopses. This is similar to mce=0.

This can also be set with kernel.panic_on_oops=1 with sysctl.
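Both forms end up in the same kernel setting, which can be read back from /proc/sys/kernel/panic_on_oops. A minimal sketch for inspecting it follows; the function name show_panic_on_oops and the PANIC_ON_OOPS_FILE override are made up for illustration and are not part of any package.

```shell
#!/bin/sh
# Sketch: report the current kernel.panic_on_oops setting.
# PANIC_ON_OOPS_FILE is a hypothetical override, useful for testing only.
PANIC_ON_OOPS_FILE="${PANIC_ON_OOPS_FILE:-/proc/sys/kernel/panic_on_oops}"

show_panic_on_oops() {
    value=$(cat "$PANIC_ON_OOPS_FILE") || return 1
    case "$value" in
        0) echo "kernel.panic_on_oops=0 (kernel tries to continue after an oops)" ;;
        1) echo "kernel.panic_on_oops=1 (kernel panics on an oops)" ;;
        *) echo "unexpected value: $value"; return 1 ;;
    esac
}

# To change the setting at runtime (requires root):
#   sysctl -w kernel.panic_on_oops=1
```

Calling show_panic_on_oops before and after the sysctl call is a quick way to confirm that the change took effect.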

The problem with this is that sometimes buggy drivers will cause harmless oopses which will cause unnecessary kernel panics. This might be less likely if this is enabled in a virtual machine.

This was suggested to Tails but they didn’t accept it as it would cause many hardware-related kernel panics.

A good way around the errors was suggested.

I think a nice, albeit slightly hacky compromise would be to have kernel.panic_on_oops=0 upon boot, and then set kernel.panic_on_oops=1 after the GNOME desktop starts up if the kernel has not oopsed by then. That would opportunistically provide a security improvement on supported hardware.

For some reason, the Tails devs didn’t include this, they didn’t seem to notice it at all.

A good way to implement this would be to add sysctl kernel.panic_on_oops=1 inside /etc/X11/Xsession.d/50panic_on_oops.

More kernel parameters were mentioned but nothing came of them.

Additional options I am looking into are reboot=cold (may make certain types of cold-boot attacks harder if memory is not removed from the system), acpi=copy_dsdt (may harden the system slightly from buggy BIOSes), and elevator=deadline (might reduce kernel surface area, with a nice side effect of improving USB and SSD performance). I may post rationale for them as well if they turn out to be useful security-wise.

Does anyone know if these would have any security advantage? I can’t see how they would.

Another issue was opened about the panic_on_oops part but no useful information is there.

Checking the Tails issue tracker for kernel related issues may have more useful information but I couldn’t find much more there.

Patrick · July 3, 2019, 6:57am

madaidan via Whonix Forum:

oops=panic might be another good kernel parameter to use. This makes the kernel panic on certain errors (oopses) to prevent the kernel from continuing to run a flawed process. Kernel exploits can also cause oopses. This is similar to mce=0.

This can also be set with kernel.panic_on_oops=1 with sysctl.

The problem with this is that sometimes buggy drivers will cause harmless oopses which will cause unnecessary kernel panics. This might be less likely if this is enabled in a virtual machine.

So for one we could have a systemd unit file with ConditionVirtualization=true that enables it as early as possible.
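Outside of a unit file, the same condition can be approximated in a script with systemd-detect-virt, which exits 0 when it detects virtualization. The following is only a sketch under that assumption; enable_if_virtualized, DETECT_CMD and SYSCTL_CMD are hypothetical names introduced here so the logic can be tried without root.

```shell
#!/bin/sh
# Sketch: enable panic_on_oops only inside a virtual machine or container,
# roughly what ConditionVirtualization=true would gate in a unit file.
# DETECT_CMD and SYSCTL_CMD are hypothetical overrides for dry runs.
DETECT_CMD="${DETECT_CMD:-systemd-detect-virt --quiet}"
SYSCTL_CMD="${SYSCTL_CMD:-sysctl -w}"

enable_if_virtualized() {
    # systemd-detect-virt exits 0 when any virtualization is detected.
    if $DETECT_CMD; then
        $SYSCTL_CMD kernel.panic_on_oops=1
    else
        echo "no virtualization detected, leaving kernel.panic_on_oops unchanged"
    fi
}
```

Setting SYSCTL_CMD=echo gives a dry run that only prints the setting it would apply.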

This was suggested to Tails but they didn’t accept it as it would cause many hardware-related kernel panics.

Harden Tails kernel with security-related kernel parameters (Tails issue #11143, GitLab)

A good way around the errors was suggested.

I think a nice, albeit slightly hacky compromise would be to have kernel.panic_on_oops=0 upon boot, and then set kernel.panic_on_oops=1 after the GNOME desktop starts up if the kernel has not oopsed by then. That would opportunistically provide a security improvement on supported hardware.

For some reason, the Tails devs didn’t include this, they didn’t seem to notice it at all.

Might have slipped through? Feel free to create a new Tails ticket
pointing at this.

A good way to implement this would be to add sysctl kernel.panic_on_oops=1 inside /etc/X11/Xsession.d/50panic_on_oops.

Sounds good.

Or perhaps better a systemd unit file which runs last?

Also /etc/profile.d.

Please use /etc/X11/Xsession.d and /etc/profile.d to only dispatch the
hook. The actual implementation should go to a separate script.

/etc/profile.d, /etc/X11/Xsession.d and /etc/xdg/autostart could each have a 91_last file which creates a done file in /var/run (keep file permissions in mind), which then gets used by the script to know if it is early enough to enable kernel.panic_on_oops=1?

madaidan · July 3, 2019, 2:23pm

A systemd unit file would be better as it would enable it earlier and won’t depend on Xorg (so it can be used on CLI and wayland). I’m not sure exactly how it could be configured so that it is past the hardware errors stage.

A systemd unit file using targets sounds better to me. We could probably add something like After=graphical.target or something similar.

Patrick · July 4, 2019, 3:21am

That’s why it should be last, after graphical Xorg since that could cause hardware bugs, I assume.

madaidan · July 8, 2019, 10:13pm

Xsession.d scripts don’t work. I think they are run as a user rather than root so it won’t work.

A systemd service with After=graphical.target doesn’t work either. I can use After=lightdm.target instead, but this isn’t really portable.

Patrick · July 8, 2019, 10:46pm

No problem. Can use sudo --non-interactive /path/to/wrapper/script and then an /etc/sudoers.d/security-misc exception.

user ALL=NOPASSWD: /path/to/wrapper/script

Or perhaps

ALL ALL=NOPASSWD: /path/to/wrapper/script

not sure.

(Then the wrapper script would assume running as root. The idea is to keep the code in Xsession.d minimal, having the hooks there only.)
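The dispatch part of this could look roughly like the sketch below. It is not the merged implementation; dispatch_hook and the WRAPPER and SUDO_CMD variables are hypothetical names, and the wrapper path is the placeholder used above.

```shell
#!/bin/sh
# Sketch of an Xsession.d-style hook: it only dispatches to a root-owned
# wrapper via sudo and contains no logic of its own.
# WRAPPER is the placeholder path from this thread, not a real location.
WRAPPER="${WRAPPER:-/path/to/wrapper/script}"
SUDO_CMD="${SUDO_CMD:-sudo --non-interactive}"

dispatch_hook() {
    if [ -f "$WRAPPER" ]; then
        # --non-interactive makes sudo fail instead of prompting when the
        # sudoers exception is missing; report that, but do not abort.
        $SUDO_CMD "$WRAPPER" || echo "panic_on_oops wrapper failed" >&2
    fi
    return 0
}
```

Always returning 0 is a deliberate choice in this sketch, so that a missing or failing wrapper cannot block the graphical login.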

madaidan · July 8, 2019, 11:03pm

That worked great.

Patrick · July 9, 2019, 10:57am

Merged. Could you please add some comments why we choose this implementation, and link to this thread?

madaidan · July 9, 2019, 1:54pm

Where should that be? In the readme, script or Xsession.d file?

Patrick · July 10, 2019, 7:26am

I guess there. The other files could have a brief comment on where to find the comment.

Patrick · July 10, 2019, 7:30am

Our current implementation is lacking a critical feature, the part:

if the kernel has not oopsed by then

We are currently enabling it unconditionally. Our current implementation has little or no(?) advantage over using an /etc/sysctl.d configuration snippet.

/usr/lib/security-misc/panic-on-oops needs to check whether there have been any oopses yet, and:

if yes, don’t enable
if no, enable

Echo the result either way, since this output would probably be visible in the systemd journal or some other log?
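One possible way to implement that check, offered as a sketch rather than the actual security-misc script: the kernel sets bit 7 (value 128) in /proc/sys/kernel/tainted once it has hit an oops or BUG since boot. The function name panic_on_oops_if_clean and the TAINTED_FILE and SYSCTL_CMD overrides are hypothetical.

```shell
#!/bin/sh
# Sketch: enable kernel.panic_on_oops only if no oops has been recorded.
# Assumption: taint bit 7 (value 128) means "there was an OOPS or BUG".
# TAINTED_FILE and SYSCTL_CMD are hypothetical overrides for dry runs.
TAINTED_FILE="${TAINTED_FILE:-/proc/sys/kernel/tainted}"
SYSCTL_CMD="${SYSCTL_CMD:-sysctl -w}"

panic_on_oops_if_clean() {
    if ! tainted=$(cat "$TAINTED_FILE" 2>/dev/null); then
        echo "cannot read $TAINTED_FILE, not enabling kernel.panic_on_oops"
        return 1
    fi
    if [ $(( tainted & 128 )) -ne 0 ]; then
        echo "oops already recorded, not enabling kernel.panic_on_oops"
    else
        $SYSCTL_CMD kernel.panic_on_oops=1 >/dev/null &&
            echo "no oops recorded, enabled kernel.panic_on_oops"
    fi
}
```

The result is echoed in every branch, so whichever log captures the script's output shows why the setting was or was not applied.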

madaidan · July 10, 2019, 3:03pm

I don’t think that will be necessary. By the time Xorg has started, all the oopses caused by hardware will already have happened so it won’t trigger a kernel panic.

Patrick · July 11, 2019, 5:56am

Don’t some of these oopses keep happening all the time once some component causes them?

HulaHoop · July 11, 2019, 1:55pm

Will it panic the kernel only when attempting to run the flawed process?

madaidan · July 11, 2019, 3:09pm

Not sure. I don’t get any oopses on my hardware so I can’t test.

No, it panics when it gets an oops.

Patrick · July 11, 2019, 6:02pm

Will it kill only the one process or will the whole machine crash?

HulaHoop · July 11, 2019, 7:19pm

From what I’ve read the entire machine will fail instantly.

I worry that it might cause data loss after an update on hardware with buggy drivers. It won’t be possible to boot into the system to temporarily disable this or to salvage data even if a compromise happened (unlikely they will leave a trace).

I feel it is a weaker stop gap for the lack of a hardened mprotect (like grsec) that had this property of terminating processes if it detected bruteforce by an exploit.

Kernel oops has the potential for false positives where it prevents a buggy machine from booting while any advanced adversary will probably be competent enough to tailor their exploit not to trigger this.

madaidan · July 11, 2019, 10:17pm

The whole machine will crash.

It would be possible to chroot into it and recover any data.

That’s the reason we did the Xsession.d approach. Now we just need to figure out how to detect an oops and only set kernel.panic_on_oops=1 if there haven’t been any oopses.

HulaHoop · July 12, 2019, 2:15am

Can you please write something for the wiki on that? Will this require a livecd? What about starting up LUKS encrypted systems?

Patrick · July 12, 2019, 8:11am

I would bet that the same oopses can happen over and over again at random times. It is probably whenever something unexpected triggers an unexpected kernel code path, which can happen many times.

madaidan via Whonix Forum:

It would be possible to chroot into it and recover any data.

Would be good to have chroot recovery instructions for all platforms
anyhow for other reasons.

However… What about booting into single-user mode (recovery mode) from the boot menu and disabling this feature? We need that documented anyhow too.

Also, what about a kernel parameter to disable this feature? How to set kernel parameters is also something we should have documented.

That’s the reason we did the Xsession.d approach. Now we just need to figure out how to detect an oops and only set kernel.panic_on_oops=1 if there haven’t been any oopses.

Parsing dmesg?
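A rough sketch of what parsing dmesg could look like. It assumes that oops reports contain the literal string "Oops:", which is a heuristic that may not hold on every architecture or kernel version; count_oops_lines is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count kernel log lines that look like an oops report.
# Assumption: oops reports contain the case-sensitive string "Oops:".
# Reads the file given as $1, or standard input when no file is given.
count_oops_lines() {
    # grep -c prints the number of matching lines but exits 1 when that
    # number is 0, so the exit status is normalised with "|| true".
    grep -c 'Oops:' "${1:--}" || true
}

# Example use (dmesg may require root when kernel.dmesg_restrict=1):
#   if [ "$(dmesg | count_oops_lines)" = "0" ]; then
#       sysctl -w kernel.panic_on_oops=1
#   fi
```

Matching is case-sensitive on purpose, so an oops=panic entry on the logged kernel command line is not counted as an oops.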