Should all kernel patches for CPU bugs be unconditionally enabled? Vs Performance vs Applicability

Trying to decode this mostly fluff video by Intel on L1TF. Quote from https://www.youtube.com/watch?v=n_pa2AisRUs:

In some cases where it can’t be guaranteed that all virtualized operating systems have been updated, some customers may choose to take additional actions. First, coupled with the L1 cache flush, they can ensure that only trusted siblings have access to the same processor core. This capability is called core scheduling and is already supported by some hypervisors. If core scheduling isn’t available and they suspect there might be potentially untrustworthy siblings sharing access to the L1 cache, it may be appropriate to take further action, like only allowing one thread to run per core. This demonstrates what would happen if SMT were disabled. While these actions might be applicable to a relatively small portion of the overall market, we think it’s important to provide solutions for all our customers.

Did they really mean “if the microcode, host and guest operating systems are patched, then there is no need to disable SMT”?

In other words, did they really mean “only if guests are not patched it might make sense to pin CPUs to guests and/or to disable SMT”?

Is it true that SMT needs to be disabled when running unpatched guests? Would a malicious guest that uses kexec to load a malicious kernel be unable to read secrets of another VM sharing the same L1 cache if that other VM was patched?

This has to remain future work. While many researchers keep finding more and more attacks and kernel developers invent mitigations, there seems to be a severe shortage of people working on analysis tools such as spectre-meltdown-checker, or on enabling mitigations for as many users as easily as possible, for example through installation of a package such as security-misc or a distribution such as Kicksecure or Whonix-Host. I am not aware of anyone else working on that currently. There are way too many vulnerabilities, mitigations, use scenarios and processors, and therefore resulting combinations, which makes this hard to implement.

For a start, we could use a wiki table with an overview. I am not even sure yet what contents such a table would need: vulnerabilities, mitigations, use scenarios, processor(s) (families), locally exploitable, remotely exploitable, local information leak, remote information leak (NetSpectre), relevant when using a hypervisor vs not using a hypervisor.

Qubes disabled HT.
( Safe use of Hyperthreading when Xen stable includes new sched-gran parameter · Issue #5547 · QubesOS/qubes-issues · GitHub )
Performance is OK for me with disabled HT but it’s a powerful machine.

I think we should disable HT. Qubes disables it too.

Quote Red Hat Customer Portal - Access to 24x7 support and knowledge

Recent Intel CPU vulnerabilities (L1TF and MDS) cannot be fully mitigated in software without disabling Simultaneous Multi-Threading.

Therefore we better disable it.

This can have a substantial performance impact and is only necessary for certain workloads, so for compatibility reasons, SMT is enabled by default.

I guess for good security, switch from Windows to a Linux distribution. And I guess for advanced security, use Whonix, Whonix-Host, Kicksecure. For our target user it seems appropriate to disable SMT.

In addition, the Intel TAA vulnerability cannot be fully mitigated without disabling either of SMT or the Transactional Synchronization Extensions (TSX).

SMT and TSX should be disabled on affected Intel processors under the following circumstances:

  1. A bare-metal host runs untrusted virtual machines, and other arrangements have not been made for mitigation.
  2. A bare-metal host runs untrusted code outside a virtual machine.

In our threat model we deem most code untrusted. We don’t want to trust any code but sometimes we have to. That’s why we use Linux user account separation and mandatory access controls: some code is considered not deliberately malicious (such as the browser) but potentially exploitable, and is therefore considered untrusted. Therefore I think “runs untrusted code outside a virtual machine” always applies in our threat model.

Otherwise things would also be more complicated:

  • Kicksecure users not using a hypervisor: no need to disable SMT
  • Whonix host users running VMs: should disable SMT

SMT can be conditionally disabled by passing mitigations=auto,nosmt on the kernel command line. This will disable SMT only if required for mitigating a vulnerability. This approach has two caveats:

  1. It does not protect against unknown vulnerabilities in SMT.

I guess, given the quantity and speed at which vulnerabilities are published, we again ought to disable SMT here.

Alternatively, SMT can be unconditionally disabled by passing nosmt on the kernel command line. This provides the most protection and avoids possible behavior changes on upgrades, at the cost of a potentially unnecessary reduction in performance.
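To make the difference between the two approaches quoted above a bit more tangible, here is a minimal Python sketch that classifies a kernel command line by how it treats SMT. The function names and the returned labels are made up for illustration. It only looks at the parameters quoted in this thread and does not model how the kernel resolves repeated or conflicting options.

```python
def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Return the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read()


def smt_policy(cmdline: str) -> str:
    """Classify how a kernel command line treats SMT.

    The returned labels are illustrative, not kernel terminology.
    """
    params = cmdline.split()
    # Plain "nosmt" (or "nosmt=force") disables SMT on every CPU.
    if "nosmt" in params or "nosmt=force" in params:
        return "unconditional"
    # Options ending in ",nosmt" (e.g. "mitigations=auto,nosmt") disable
    # SMT only if the kernel considers it necessary for a known vulnerability.
    if any(p.endswith(",nosmt") for p in params):
        return "conditional"
    return "kernel default"
```

On a running system, `smt_policy(read_cmdline())` would report which of the three cases applies to the current boot.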


+1


OK


The number of new bugs that keep being uncovered is staggering and will make anyone’s head spin:


After the recent reconsideration, yes.

Seems like a good idea.

https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/spectre.html

High security mode

All Spectre variant 2 mitigations can be forced on at boot time for all programs (See the “on” option in Mitigation control on the kernel command line). This will add overhead as indirect branch speculations for all programs will be restricted.

Done above.

Using that now.


Please let me know should any mitigation not be unconditionally enabled yet.


This is an important point. I was wondering about this too. And I could be wrong here. It would be very much desirable to simplify this. However, I think mitigations=auto isn’t the same as mitigations=force, while the latter does not seem to exist yet. This seems like a missing kernel feature but I wasn’t sure it’s worth a feature request.

mitigations=auto means “conditionally enable” while mitigations=force would mean “unconditionally enable”.

Hardware vulnerabilities — The Linux Kernel documentation and following pages don’t mention mitigations=.

quote The kernel's command-line parameters — The Linux Kernel documentation

mitigations=
[X86,PPC,S390,ARM64] Control optional mitigations for
CPU vulnerabilities. This is a set of curated,
arch-independent options, each of which is an
aggregation of existing arch-specific options.

auto (default)

I.e. auto is already the default. mitigations=auto would essentially do nothing.

auto,nosmt
Mitigate all CPU vulnerabilities, disabling SMT
if needed. This is for users who always want to
be fully mitigated, even if it means losing SMT.
Equivalent to: l1tf=flush,nosmt [X86]
               mds=full,nosmt [X86]
               tsx_async_abort=full,nosmt [X86]

It does not set l1tf=full,force, nosmt=force, tsx=off, spectre_v2=on, pti=on.
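A quick way to see the gap between `mitigations=auto,nosmt` and the stricter settings listed above is to check a command line for each of them. The following Python sketch does that. The list of parameters is taken from this post. The function name is hypothetical and the check is a plain string comparison, so it does not know about options that imply one another.

```python
from typing import List

# Stricter parameters that, per the post above, "mitigations=auto,nosmt"
# does not set on its own.
STRICT_PARAMS = (
    "l1tf=full,force",
    "nosmt=force",
    "tsx=off",
    "spectre_v2=on",
    "pti=on",
)


def missing_strict_params(cmdline: str) -> List[str]:
    """Return the stricter parameters that are absent from a command line."""
    present = set(cmdline.split())
    return [p for p in STRICT_PARAMS if p not in present]
```

For example, passing the contents of `/proc/cmdline` to `missing_strict_params` lists what would still have to be added to reach the unconditional configuration discussed here.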

Kernel — CLIP OS 5.0.0_beta3 documentation writes

  • mitigations: This parameter controls optional mitigations for CPU vulnerabilities in an arch-independent and more coarse-grained way. For now, we keep using arch-specific options for the sake of explicitness. Not setting this parameter equals setting it to auto, which itself does not update anything.

I don’t see any point in forcing these to be enabled. The kernel will automatically detect if the CPU is vulnerable and if it is, will enable the mitigations. If it isn’t vulnerable, it won’t enable the mitigations but that doesn’t matter since they aren’t needed.
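What the kernel has detected can be inspected under `/sys/devices/system/cpu/vulnerabilities/`, where each file holds a one-line status. Below is a minimal Python sketch for reading it. It assumes the status lines begin with "Not affected", "Mitigation" or "Vulnerable", which is how I understand the kernel ABI documentation. The function names and labels are my own.

```python
from pathlib import Path

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def classify(status: str) -> str:
    """Reduce a sysfs status line to a coarse, illustrative label."""
    status = status.strip()
    if status.startswith("Not affected"):
        return "not affected"
    if status.startswith("Mitigation"):
        return "mitigated"
    if status.startswith("Vulnerable"):
        return "vulnerable"
    # Anything else (e.g. a status format this sketch does not know).
    return "unknown"


def report(vuln_dir: Path = VULN_DIR) -> dict:
    """Map each vulnerability name to (label, raw status line)."""
    result = {}
    for entry in sorted(vuln_dir.iterdir()):
        raw = entry.read_text().strip()
        result[entry.name] = (classify(raw), raw)
    return result
```

Note that this only shows the kernel's own view. It cannot tell whether the detection itself is correct for a given CPU model.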

Debian disables TSX by default with CONFIG_X86_INTEL_TSX_MODE_OFF=y but I guess the tsx=off parameter is useful for kernels with it enabled by default.

nosmt=force is redundant. We already disable SMT with mds=full,nosmt.

tsx_async_abort=full,nosmt is redundant as TSX is already disabled.


Which ones? All recent ones or only the ones mentioned in your last post?

Yes.

mds=full,nosmt does not make that redundant.
Though, l1tf=full,force makes it redundant.

l1tf=full,force https://www.kernel.org/doc/Documentation/admin-guide/kernel-parameters.txt says

Implies the ‘nosmt=force’ command line option. (i.e. sysfs control of SMT is disabled.)

mds=full,nosmt does not say “Implies the ‘nosmt=force’”.

The reason why I am interested in nosmt=force is:

Force disable SMT, cannot be undone via the sysfs control file.

Which seems useful at least for purposes of Untrusted Root - improve Security by Restricting Root.

I see your point. But I am not sure.

As per TAA - TSX Asynchronous Abort — The Linux Kernel documentation here are some possible combinations (these are listed in a table on that page):


  • tsx=on with tsx_async_abort=full,nosmt: As above, cross-thread attacks on SMT mitigated.
  • tsx=off with tsx_async_abort=full,nosmt: TSX might be disabled if microcode provides a TSX control MSR. If so, system is not vulnerable.

What also confuses me:

Default mitigations

The kernel’s default action for vulnerable processors is:

Deploy TSX disable mitigation (tsx_async_abort=full tsx=off).

A combination of tsx=off and tsx_async_abort=full.

Why didn’t they simply write:

Deploy TSX disable mitigation (tsx=off).

Therefore perhaps better to keep tsx_async_abort=full,nosmt?


3 posts were merged into an existing topic: Whonix vulerable due to missing processor microcode packages? spectre / meltdown / retpoline / L1 Terminal Fault (L1TF)

All except maybe tsx=off.

mds=full,nosmt:

full,nosmt - Enable MDS mitigation and disable SMT on vulnerable CPUs

nosmt=force:

nosmt=force: Force disable SMT, cannot be undone via the sysfs control file.

They both do essentially the same thing. mds=full,nosmt can’t be undone via sysfs either.

user@host:~$ cat /sys/devices/system/cpu/smt/control
notsupported
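For reference, here is a small Python sketch that interprets the value of that control file. The list of states is what I recall from the kernel's sysfs ABI documentation, so treat it as an assumption. The function names are made up. If that list is right, `notsupported` means the CPU (or the VM's virtual CPU) has no SMT at all, while `forceoff` is the state that cannot be undone via sysfs. That would mean the output above may not show by itself what `mds=full,nosmt` does on an SMT-capable machine.

```python
SMT_CONTROL = "/sys/devices/system/cpu/smt/control"

# States as I understand the kernel ABI documentation (assumption).
SMT_STATES = {
    "on": "SMT is enabled",
    "off": "SMT is disabled, but can be turned back on via sysfs",
    "forceoff": "SMT is force disabled and cannot be changed via sysfs",
    "notsupported": "the CPU does not support SMT",
    "notimplemented": "runtime SMT control is not implemented for this architecture",
}


def describe_smt(control: str) -> str:
    """Return a human-readable description of an smt/control value."""
    return SMT_STATES.get(control.strip(), "unrecognised value")


def can_toggle_via_sysfs(control: str) -> bool:
    """True only for the states in which writing to the file can change SMT."""
    return control.strip() in ("on", "off")


def read_smt_control(path: str = SMT_CONTROL) -> str:
    """Read the current SMT control state (Linux only)."""
    with open(path) as f:
        return f.read()
```

Comparing `can_toggle_via_sysfs(read_smt_control())` after booting with `mds=full,nosmt` and then with `nosmt=force` on a machine that actually has SMT would settle whether the two really behave the same.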

My interpretation of the docs is that using tsx=off and tsx_async_abort=full,nosmt together will only disable TSX if the mitigation isn’t available. Just using tsx=off should disable TSX regardless.


I might still misinterpret that. Please send a pull request since I guess that is easier than trying to ask/explain it.


Intuition tells me that

  • if tsx=off is set, i.e. if tsx is disabled anyhow, there is no need for tsx_async_abort=full.
  • but on the other hand, if tsx=off is set anyhow, tsx_async_abort=full is superfluous and shouldn’t do harm. Therefore, when in doubt, set it.

However, quote TAA - TSX Asynchronous Abort — The Linux Kernel documentation contradicts my intuition:

Default mitigations
The kernel’s default action for vulnerable processors is:
Deploy TSX disable mitigation (tsx_async_abort=full tsx=off).

Therefore I interpret tsx_async_abort=full tsx=off as a valid, sensible, non-illogical option. Switching tsx_async_abort=full to tsx_async_abort=full,nosmt is just an extension of that.

The kernel’s default action for vulnerable processors is tsx_async_abort=full tsx=off. Therefore I decided to set tsx_async_abort=full tsx=off regardless, to reach the goal of unconditionally enabling mitigations even if the processor is not detected as vulnerable. [Plus add ,nosmt.]

I guess we’ll get only higher certainty on that by asking the kernel developers and/or reading the kernel source code.


No.

My interpretation of the docs is that using tsx=off and tsx_async_abort=full,nosmt together will only disable TSX if the mitigation isn’t available. Just using tsx=off should disable TSX regardless.

Using tsx_async_abort=full along with tsx=off will still enable TSX which can leave us open to 0days the mitigation doesn’t prevent.

Just using tsx=off should leave TSX always disabled and protect us from all TSX vulnerabilities.


Could you please look this up in the kernel source code?


I couldn’t find anything.

I think we should just leave all CPU mitigation things to the kernel default as it automatically enables needed mitigations if the device is vulnerable.

Is this a complete reversal of your original post here Should all kernel patches for CPU bugs be unconditionally enabled? Vs Performance vs Applicability?


The original post was more of a question than a suggestion, but yes.


I see. Well these questions / this discussion actually convinced me that this is a good idea. I did re-read this whole forum thread just now. These were the main points which convinced me most:

We’re passing the spectre-meltdown-checker --paranoid test.

Then also reading this strengthened this:

Quoting specifically:

Kroah-Hartman dispelled the idea that an issue like Spectre has a single fix. “We are still fixing Spectre 1.0 issues [almost] two years later. It’s taken a couple of thousand patches over [almost] two years. Always take the latest kernel and always take the latest BIOS update.”

There are way too many CPU models and this is a huge mess. I don’t think we have enough people who own each CPU and test whether these vulnerabilities are properly detected and mitigated. Looking at how few people care about spectre-meltdown-checker (you’d expect more contributions there), I don’t think there is a strong handle on this issue yet.

We want the High security mode.

Edit by Patrick:
fix typo - propeller → properly


This is a great point. I didn’t think about that. Considering that, forcing mitigations sounds like a really good idea.


madaidan via Whonix Forum:

Improve CPU mitigations documentation by madaidan · Pull Request #58 · Kicksecure/security-misc · GitHub

Merged.
