Should all kernel patches for CPU bugs be unconditionally enabled? Vs Performance vs Applicability

AFAIK Linux enables the mitigations by default whenever possible, using auto. Debian also backported patches against MDS. [1]

It is only recently (Linux v5.2) that there is a unified switch for disabling all mitigations for performance on machines not facing the network.[2]

[1] Debian Patches New Intel MDS Security Vulnerabilities in Debian Linux Stretch

[2] Spectre/Meltdown Mitigations Can Now Be Toggled With Convenient "mitigations=" Option - Phoronix

Not really. In the case of MDS, a microcode update is needed for the mitigation to have any effect.

IMPORTANT: There is no software fallback mechanism available for processors that have not received microcode updates from Intel. Mitigation is only possible if Intel has provided a microcode update for your processor.

I really don’t think we should mess with the default Spectre/Meltdown kernel switches in this case, as upstream is clearly aware of the problem and is taking appropriate action. Non-default settings will confuse users and might leave them open to other attacks they are otherwise safe from.

EPT is the hardware virtualization feature for safe and fast memory isolation. It lets hypervisors drop shadow page tables implemented in software. You don’t want to disable this.

Yes, everything should be left as is. It comes as auto by default. The KVM guest kernel will enforce the mitigations whenever it detects that the necessary microcode extensions are activated.


Quote L1TF - L1 Terminal Fault — The Linux Kernel documentation

The kernel does not by default enforce the disabling of SMT, which leaves SMT systems vulnerable when running untrusted guests with EPT enabled.

Since we disable SMT (mds=full,nosmt) that should be ok?

As per L1TF - L1 Terminal Fault — The Linux Kernel documentation there is also a separate kernel boot parameter nosmt=force which I find confusing. Should we set that as well?

Let’s please reconsider this. Quote L1TF - L1 Terminal Fault — The Linux Kernel documentation

The kernel does not by default enforce the disabling of SMT, which leaves SMT systems vulnerable when running untrusted guests with EPT enabled.

The administrators of cloud and hosting setups have to carefully analyze the risk for their scenarios and make the appropriate mitigation choices, which might even vary across their deployed machines and also result in other changes of their overall setup. There is no way for the kernel to provide a sensible default for this kind of scenarios.

My reading of that page is that the kernel developers do not want to set secure defaults for all scenarios due to:

  • the huge performance degradation as well as
  • breaking existing systems that use unattended upgrades.

But we have secure-by-default development goals and more flexibility to declare some scenarios either “then you need to change the settings back” or unsupported.

L1TF = L1 Terminal Fault

Should we set l1tf kernel boot parameter?

[X86] Control mitigation of the L1TF vulnerability on affected CPUs

Need to make sure this doesn’t break any virtualizer at the host or any guest VMs.

This is because currently our security-misc config passes

sudo spectre-meltdown-checker ; echo $?

but fails

sudo spectre-meltdown-checker --paranoid ; echo $?

due to

STATUS: VULNERABLE (enable L1D unconditional flushing and disable Hyper-Threading to fully mitigate the vulnerability)

Setting kernel boot parameter l1tf=full,force fixes that.
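To confirm that a parameter such as l1tf=full,force actually reached the running kernel, one can look for it in /proc/cmdline. The following is only a minimal sketch; has_kernel_param is a hypothetical helper name, not something shipped by security-misc or spectre-meltdown-checker.

```shell
#!/bin/sh
# has_kernel_param PARAM [CMDLINE]
# Hypothetical helper: returns 0 if PARAM appears as a whole word on the
# kernel command line, 1 otherwise. CMDLINE is optional and defaults to the
# running kernel's /proc/cmdline.
has_kernel_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline 2>/dev/null)}
    # Pad with spaces so that only whole-word matches count.
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_kernel_param "l1tf=full,force"; then
    echo "l1tf=full,force is on the kernel command line"
else
    echo "l1tf=full,force is not on the kernel command line"
fi
```

The whole-word match is deliberate: l1tf=full should not be reported as present just because l1tf=full,force is.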


These mitigations are much more important on the host for host/VM isolation, but they are also needed for interprocess separation inside a VM. If anyone can look up their status in default Debian kernels, it would be useful so we know what needs to be enabled in security-misc, documented, and so on.
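For looking up the status on a given kernel, the kernel exposes one file per known vulnerability under /sys/devices/system/cpu/vulnerabilities. A small sketch that prints them all follows; list_cpu_vulns is a hypothetical helper name, and the optional directory argument exists only so it can be pointed at a test directory.

```shell
#!/bin/sh
# list_cpu_vulns [DIR]
# Hypothetical helper: prints "name: status" for every file in the sysfs
# vulnerabilities directory (or in DIR, if given).
list_cpu_vulns() {
    dir=${1:-/sys/devices/system/cpu/vulnerabilities}
    for f in "$dir"/*; do
        # Skip the unexpanded glob when the directory is missing or empty.
        [ -f "$f" ] || continue
        printf '%s: %s\n' "$(basename "$f")" "$(cat "$f")"
    done
    return 0
}

list_cpu_vulns
```

Running this on a stock Debian kernel, on the host and inside a VM, would give the per-vulnerability status asked for above without needing any extra package.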

The separate option is to guard against disabling via sysfs. I’d enable it to make things more robust in case of malicious actions in a guest.

One important consideration is to find out whether AMD systems are unfairly impacted by these mitigations when they don’t really apply to them. In that case we should turn them on only for certain CPU families.

Agreed.


Without disabling HT the vuln would be there, but I heard HT disabling really cripples system perf.



Trying to decode this mostly fluff talk video by Intel on L1TF. Quote https://www.youtube.com/watch?v=n_pa2AisRUs

In some cases where it can’t be guaranteed that all virtualized operating systems have been updated, some customers may choose to take additional actions. First, coupled with the L1 cache flush, they can ensure that only trusted siblings have access to the same processor core. This capability is called core scheduling and is already supported by some hypervisors. If core scheduling isn’t available and they suspect there might be potentially untrustworthy sibling sharing access to the L1 cache, it may be appropriate to take further action, like only allowing one thread to run per core. This demonstrates what would happen if SMT were disabled. While these actions might be applicable to a relatively small portion of the overall market, we think it’s important to provide solutions for all our customers.

Did they really mean “if the microcode, host and guest operating systems are patched, then there is no need to disable SMT”?

In other words, did they really mean “only if guests are not patched it might make sense to pin CPUs to guests and/or to disable SMT”?

Is it true that SMT needs to be disabled when running unpatched guests? A malicious guest using kexec to load a malicious kernel couldn’t read secrets by another VM using the same L1 cache if that other VM was patched?

This has to remain future work. While many researchers keep finding more and more attacks and kernel developers invent mitigations, there seems to be a severe shortage of people working on analysis tools such as spectre-meltdown-checker, or on enabling mitigations for as many users as easily as possible, for example through installation of a package such as security-misc or a distribution such as Kicksecure or Whonix-Host. I am not aware of anyone else working on that currently. There are far too many vulnerabilities, mitigations, use scenarios and processors, and therefore resulting combinations, which makes this hard to implement.

For a start, we could use a wiki table with an overview. Not even sure yet what contents such a table would need. Perhaps: vulnerabilities, mitigations, use scenarios, processor(s) (families), locally exploitable, remotely exploitable, local information leak, remote information leak (NetSpectre), relevant when using a hypervisor vs not using a hypervisor.

Qubes disabled HT.
( Safe use of Hyperthreading when Xen stable includes new sched-gran parameter · Issue #5547 · QubesOS/qubes-issues · GitHub )
Performance is OK for me with disabled HT but it’s a powerful machine.

I think we should disable HT. Qubes disables it too.

Quote Red Hat Customer Portal - Access to 24x7 support and knowledge

Recent Intel CPU vulnerabilities (L1TF and MDS) cannot be fully mitigated in software without disabling Simultaneous Multi-Threading.

Therefore we better disable it.

This can have a substantial performance impact and is only necessary for certain workloads, so for compatibility reasons, SMT is enabled by default.

I guess for good security, switch from Windows to a Linux distribution. And I guess for advanced security, use Whonix, Whonix-Host, Kicksecure. For our target user it seems appropriate to disable SMT.

In addition, the Intel TAA vulnerability cannot be fully mitigated without disabling either of SMT or the Transactional Synchronization Extensions (TSX).

SMT and TSX should be disabled on affected Intel processors under the following circumstances:

  1. A bare-metal host runs untrusted virtual machines, and other arrangements have not been made for mitigation.
  2. A bare-metal host runs untrusted code outside a virtual machine.

In our threat model we deem most code untrusted. We don’t want to trust any code but sometimes we have to. That’s why we use Linux user account separation and mandatory access controls: some code is considered not deliberately malicious (such as the browser) but potentially exploitable (therefore considered untrusted). Therefore I think “runs untrusted code outside a virtual machine” always applies in our threat model.

Otherwise things would also be more complicated:

  • Kicksecure users not using a hypervisor: no need to disable SMT
  • Whonix host users running VMs: should disable SMT

SMT can be conditionally disabled by passing mitigations=auto,nosmt on the kernel command line. This will disable SMT only if required for mitigating a vulnerability. This approach has two caveats:

  1. It does not protect against unknown vulnerabilities in SMT.

Given the quantity and speed at which vulnerabilities are published, I guess we again ought to disable SMT here.

Alternatively, SMT can be unconditionally disabled by passing nosmt on the kernel command line. This provides the most protection and avoids possible behavior changes on upgrades, at the cost of a potentially unnecessary reduction in performance.


+1


OK


The amount of new bugs that keeps being uncovered is staggering and will make anyone’s head spin:


After the recent reconsideration, yes.

Seems like a good idea.

https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/spectre.html

High security mode

All Spectre variant 2 mitigations can be forced on at boot time for all programs (See the “on” option in Mitigation control on the kernel command line). This will add overhead as indirect branch speculations for all programs will be restricted.

Done above.

Using that now.


Please let me know should any mitigation not be unconditionally enabled yet.


This is an important point. I was wondering about this too. And I could be wrong here. It would be very much desirable to simplify this. However, I think mitigations=auto isn’t the same as mitigations=force, while the latter does not seem to exist yet. This seems like a missing kernel feature but I wasn’t sure it’s worth a feature request.

mitigations=auto means “conditionally enable” while mitigations=force would mean “unconditionally enable”.

Hardware vulnerabilities — The Linux Kernel documentation and following pages don’t mention mitigations=.

quote The kernel's command-line parameters — The Linux Kernel documentation

mitigations=
[X86,PPC,S390,ARM64] Control optional mitigations for
CPU vulnerabilities. This is a set of curated,
arch-independent options, each of which is an
aggregation of existing arch-specific options.

auto (default)

I.e. auto is already the default. mitigations=auto would essentially do nothing.

auto,nosmt
    Mitigate all CPU vulnerabilities, disabling SMT
    if needed. This is for users who always want to
    be fully mitigated, even if it means losing SMT.
    Equivalent to: l1tf=flush,nosmt [X86]
                   mds=full,nosmt [X86]
                   tsx_async_abort=full,nosmt [X86]

It does not set l1tf=full,force, nosmt=force, tsx=off, spectre_v2=on, pti=on.
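If those parameters were to be set explicitly, one way on Debian is a drop-in file under /etc/default/grub.d, which is sourced as shell when the GRUB configuration is regenerated. The following is only a sketch: the file name is an assumption, and the parameter list is simply the one named above, not a recommendation that all of them are needed.

```shell
# Sketch of a drop-in such as /etc/default/grub.d/40_cpu_mitigations.cfg.
# The file name is an assumption; the parameters are the ones listed above.
# Each line appends to whatever earlier configuration files already set.
GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX:-} l1tf=full,force"
GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX nosmt=force"
GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX tsx=off"
GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX spectre_v2=on"
GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX pti=on"
```

After changing such a file, sudo update-grub and a reboot are needed before the new command line takes effect. Keeping one parameter per line makes it easy to drop a single one later if it turns out to be redundant.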

Kernel — CLIP OS 5.0.0_beta3 documentation writes

  • mitigations : This parameter controls optional mitigations for CPU vulnerabilities in an arch-independent and more coarse-grained way. For now, we keep using arch-specific options for the sake of explicitness. Not setting this parameter equals setting it to auto , which itself does not update anything.

I don’t see any point in forcing these to be enabled. The kernel will automatically detect if the CPU is vulnerable and if it is, will enable the mitigations. If it isn’t vulnerable, it won’t enable the mitigations but that doesn’t matter since they aren’t needed.
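What the kernel's automatic detection decided can be read back from sysfs. As far as I know, each status file starts with "Not affected", "Mitigation:" or "Vulnerable", and some mitigated entries carry an additional "SMT vulnerable" remark. A sketch that lists only the entries needing attention follows; needs_attention is a hypothetical helper name, and the string matching rests on the status format just described.

```shell
#!/bin/sh
# needs_attention [DIR]
# Hypothetical helper: prints the names of vulnerability entries whose status
# starts with "Vulnerable" or mentions "SMT vulnerable".
needs_attention() {
    dir=${1:-/sys/devices/system/cpu/vulnerabilities}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        case $(cat "$f") in
            Vulnerable*|*"SMT vulnerable"*) basename "$f" ;;
        esac
    done
    return 0
}

needs_attention
```

Empty output would support the point that the defaults already cover this CPU. Any name printed is a case where auto left something open, typically because of SMT or missing microcode.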

Debian disables TSX by default with CONFIG_X86_INTEL_TSX_MODE_OFF=y but I guess the tsx=off parameter is useful for kernels with it enabled by default.

nosmt=force is redundant. We already disable SMT with mds=full,nosmt.

tsx_async_abort=full,nosmt is redundant as TSX is already disabled.


Which ones? All recent ones or only the ones mentioned in your last post?

Yes.

mds=full,nosmt does not make that redundant.
Though, l1tf=full,force makes it redundant.

l1tf=full,force https://www.kernel.org/doc/Documentation/admin-guide/kernel-parameters.txt says

Implies the ‘nosmt=force’ command line option. (i.e. sysfs control of SMT is disabled.)

mds=full,nosmt does not say “Implies the ‘nosmt=force’”.

The reason why I am interested in nosmt=force is:

Force disable SMT, cannot be undone via the sysfs control file.

Which seems useful at least for purposes of Untrusted Root - improve Security by Restricting Root.

I see your point. But I am not sure.

As per TAA - TSX Asynchronous Abort — The Linux Kernel documentation here are some possible combinations (these are listed in a table on that page):


  • tsx=on
  • tsx_async_abort=full,nosmt
  • As above, cross-thread attacks on SMT mitigated.

  • tsx=off
  • tsx_async_abort=full,nosmt
  • TSX might be disabled if microcode provides a TSX control MSR. If so, system is not vulnerable.

What also confuses me:

Default mitigations

The kernel’s default action for vulnerable processors is:

Deploy TSX disable mitigation (tsx_async_abort=full tsx=off).

A combination of tsx=off and tsx_async_abort=full.

Why didn’t they write simpler:

Deploy TSX disable mitigation (tsx=off).

Therefore perhaps better to keep tsx_async_abort=full,nosmt?



All except maybe tsx=off.

mds=full,nosmt:

full,nosmt - Enable MDS mitigation and disable SMT on vulnerable CPUs

nosmt=force:

nosmt=force: Force disable SMT, cannot be undone via the sysfs control file.

They both do essentially the same thing. mds=full,nosmt can’t be undone via sysfs either.

user@host:~$ cat /sys/devices/system/cpu/smt/control
notsupported
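For reading that file: the kernel's sysfs documentation lists the values on, off, forceoff, notsupported and notimplemented for smt/control. The sketch below maps each value to a short description; describe_smt_control is a hypothetical helper name and the wording of the descriptions is mine. Note that notsupported is documented as meaning the CPU has no SMT at all, so on its own it may not show whether a boot parameter locked the switch; forceoff is the value that would indicate that.

```shell
#!/bin/sh
# describe_smt_control VALUE
# Hypothetical helper: explains a value read from
# /sys/devices/system/cpu/smt/control.
describe_smt_control() {
    case $1 in
        on)             echo "SMT enabled; can be changed via sysfs" ;;
        off)            echo "SMT disabled; can be re-enabled via sysfs" ;;
        forceoff)       echo "SMT force-disabled; cannot be changed via sysfs" ;;
        notsupported)   echo "CPU does not support SMT" ;;
        notimplemented) echo "SMT control not implemented on this architecture" ;;
        *)              echo "unknown state: $1" ;;
    esac
}

control=/sys/devices/system/cpu/smt/control
if [ -r "$control" ]; then
    describe_smt_control "$(cat "$control")"
else
    echo "$control is not readable on this system"
fi
```

Comparing the output after booting with mds=full,nosmt and with nosmt=force on an SMT-capable machine would settle whether the two really behave the same with respect to sysfs.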

My interpretation of the docs is that using tsx=off and tsx_async_abort=full,nosmt together will only disable TSX if the mitigation isn’t available. Just using tsx=off should disable TSX regardless.


I might still misinterpret that. Please send a pull request since I guess that is easier than trying to ask/explain it.


Intuition tells me that

  • if tsx=off is set, i.e. if tsx is disabled anyhow, there is no need for tsx_async_abort=full.
  • but on the other hand if tsx=off is set anyhow, tsx_async_abort=full is superfluous and shouldn’t do harm. Therefore in doubt, set it.

However, quote TAA - TSX Asynchronous Abort — The Linux Kernel documentation contradicts my intuition:

Default mitigations
The kernel’s default action for vulnerable processors is:
Deploy TSX disable mitigation (tsx_async_abort=full tsx=off).

Therefore I interpret tsx_async_abort=full tsx=off as a valid, sensible, non-illogical option. Switching tsx_async_abort=full to tsx_async_abort=full,nosmt is just an extension of that.

The kernel’s default action for vulnerable processors is tsx_async_abort=full tsx=off. Therefore I concluded to set tsx_async_abort=full tsx=off regardless, to reach the goal of unconditionally enabling mitigations even if the processor is not detected as vulnerable. [Plus add ,nosmt.]

I guess we’ll get only higher certainty on that by asking the kernel developers and/or reading the kernel source code.


No.

My interpretation of the docs is that using tsx=off and tsx_async_abort=full,nosmt together will only disable TSX if the mitigation isn’t available. Just using tsx=off should disable TSX regardless.

Using tsx_async_abort=full along with tsx=off will still enable TSX which can leave us open to 0days the mitigation doesn’t prevent.

Just using tsx=off should leave TSX always disabled and protect us from all TSX vulnerabilities.
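One way to test this empirically rather than from the docs is to boot with each combination and check whether the CPU still advertises the TSX feature flags rtm and hle in /proc/cpuinfo. This is only a sketch: tsx_flags is a hypothetical helper name, and it assumes that a kernel which really disables TSX also hides these flags, which should be verified on the hardware in question.

```shell
#!/bin/sh
# tsx_flags [CPUINFO]
# Hypothetical helper: prints the TSX-related CPU flags (rtm, hle) found on
# the first "flags" line of CPUINFO, one per line; prints nothing if neither
# is present. CPUINFO defaults to /proc/cpuinfo.
tsx_flags() {
    cpuinfo=${1:-/proc/cpuinfo}
    # Split the flags line into one word per line, keep exact matches only.
    grep -m1 '^flags' "$cpuinfo" 2>/dev/null | tr ' \t' '\n\n' | grep -x -e rtm -e hle
    return 0
}

found=$(tsx_flags)
if [ -n "$found" ]; then
    echo "TSX flags exposed:" $found
else
    echo "No rtm/hle flags exposed"
fi
```

If the flags are still listed with tsx=off plus tsx_async_abort=full,nosmt but gone with tsx=off alone, that would support the interpretation above; identical results would suggest the two settings do not interfere.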


Could you please look this up in the kernel source code?


I couldn’t find anything.

I think we should just leave all CPU mitigation things to the kernel default as it automatically enables needed mitigations if the device is vulnerable.