@Indanz Indanz commented Sep 26, 2024

Implementation of the FPU RFC.

Simplest version of semi-lazy FPU switching.

Still work in progress, but in good enough shape for wider review. For previous development, see here.

TODO:

  • Test virtualisation on AArch64, AArch32 and 32-bit x86 (x86_64 done).
  • Test and benchmark.

Possible Optimisations:

  • Try alternative domain switch variants.
  • Try switchFpuState with or without non-NULL ksActiveFPUState.
  • Try global ksFPUEnabled.

@Indanz Indanz added verification Needs formal verification input/change, or is motivated by verification hw-test sel4test hardware builds + runs for this PR labels Sep 26, 2024
@Indanz Indanz marked this pull request as draft September 26, 2024 18:04
@Indanz Indanz mentioned this pull request Sep 26, 2024
Indanz commented Sep 26, 2024

For some reason if (config_set(CONFIG_HAVE_FPU)) doesn't work to hide things. Easily fixed by creating an empty fpuRelease, but I wouldn't expect that would be necessary. Any suggestions?

Edit: I fixed it by using ifdef.

@Indanz Indanz self-assigned this Sep 26, 2024
@Indanz Indanz force-pushed the fpu branch 3 times, most recently from 4ce3c41 to 5873c16 Compare September 27, 2024 13:18
@Indanz Indanz requested a review from chrisguikema October 17, 2024 12:10
@Indanz Indanz force-pushed the fpu branch 2 times, most recently from a31dae8 to 8f3f1b2 Compare October 31, 2024 17:06
Indanz commented Nov 5, 2024

@chrisguikema If you have time, could you review and test the x86 VCPU changes on your VMMs? It would be good to know whether there is any performance degradation too. I don't expect any, but if there is, it's probably a bug.

I'm not sure whether the guest's requested CR0.TS is being preserved or not (because I clear TS when unconditionally enabling the FPU for the guest). Is the CR0 used by guests the same as the CR0 used by hosts, or does it have its own architectural copy? I guess it has its own copy because of VMX_GUEST_CR0, but I'm not sure what the interaction is with CR0.

Edit: Or was it @Furao who promised me this at the summit? I forgot.

@Indanz Indanz force-pushed the fpu branch 2 times, most recently from e0d3443 to ad1be9f Compare February 27, 2025 21:12
corlewis commented Mar 24, 2025

Just a couple of last changes that we'd like to make to this for verification. Would it be ok to move prepareSetDomain out of setDomain and into decodeDomainInvocation? That matches up better with how the specifications have been written and allows us to separate out the proofs about FPU and domain state.

Separately, we need to label enableFpu with MODIFIES: phantom_machine_state, as it is where we currently have the machine interface for the proofs, and we need to know that it only modifies state that is not in the model.

See corlewis@ef617bc for the versions of these changes that I've been working with.

Also, when you have a chance another rebase would be helpful, to get the most recent verification-visible changes.

With these we're able to complete the functional correctness proofs for AArch64 and expect the other architectures to be relatively straightforward (although keeping in mind that the verified x86 configuration does not have hypervisor support so we won't have to handle the complications there that you've been dealing with).

@corlewis
> Just a couple of last changes that we'd like to make to this for verification. Would it be ok to move prepareSetDomain out of setDomain and into decodeDomainInvocation? That matches up better with how the specifications have been written and allows us to separate out the proofs about FPU and domain state.

Oh, I've just realised that if we do make this change then for consistency we should probably wrap them up in a new invokeDomain function and call that from decodeDomainInvocation, like we do with all of the other decode functions.

Indanz commented Mar 27, 2025

Okay, done. Let me know if this is fine.

@corlewis
> Okay, done. Let me know if this is fine.

Sorry, I thought that the new invokeDomainSetSet function would be fine but it returning an exception causes a surprising amount of pain in the proofs. Do you have any objections to making it a void function, like in corlewis@c852075? With that I've been able to update all of the proofs. There is also a precedent for this pattern in the various invokeIRQHandler functions that can be seen at https://github.com/seL4/seL4/blob/master/src/object/interrupt.c#L86.

@Indanz
Copy link
Contributor Author

Indanz commented Mar 31, 2025

> Sorry, I thought that the new invokeDomainSetSet function would be fine but it returning an exception causes a surprising amount of pain in the proofs. Do you have any objections to making it a void function, like in corlewis@c852075?

That's surprising, as that's how all other invoke functions are in that file. Actually, looking at all invoke functions, only the IRQ ones are void, all others return exception_t. I have no objection to changing it, but it might be worth looking into why this one is so hard compared to the others? Perhaps you can simplify the proofs a lot if you change the other invoke functions to void too.

@corlewis
> That's surprising, as that's how all other invoke functions are in that file. Actually, looking at all invoke functions, only the IRQ ones are void, all others return exception_t. I have no objection to changing it, but it might be worth looking into why this one is so hard compared to the others? Perhaps you can simplify the proofs a lot if you change the other invoke functions to void too.

Honestly, in this case a fair bit of the pain is probably because I already had working proofs and I'm now trying to minimise the changes I'm making for invokeDomainSetSet. If we'd had this from the start I would have just copied the specifications and proofs of the other functions and it would presumably have been fine.

One thing when considering the other invoke functions is that some of them actually can be preempted and return an exception, so those ones can't be changed to void. Now that I've done this though I think that in the cases where they can't be preempted it actually is nicer to be void, but I don't think that it's enough of an improvement to be worth going through and updating them.

lsf37 commented Mar 31, 2025

> One thing when considering the other invoke functions is that some of them actually can be preempted and return an exception, so those ones can't be changed to void. Now that I've done this though I think that in the cases where they can't be preempted it actually is nicer to be void, but I don't think that it's enough of an improvement to be worth going through and updating them.

When there is a quiet period at some point, we should consider it, because it is one of the rare cases where the C type system actually does help a little bit on the C side (when it returns void we know there is no preemption). Axel had a pull request up a while ago to change some of the existing invocations, but I never got around to doing the update. For the same reasons we don't want to do them now :-), but not because it's a bad idea as such.

lsf37 commented Jul 23, 2025

Tests are looking good now, I'll rebase for the cmake restyling.

Indanz and others added 7 commits July 24, 2025 10:12
Aligning saves one cycle per LDP/STP instruction.

Signed-off-by: Indan Zupancic <[email protected]>
It was blindly copied from AArch32 and has never been correct.

Signed-off-by: Indan Zupancic <[email protected]>
Otherwise ksActiveFPUState may point to freed memory after a vcpu
with active FPU state gets deleted.

Signed-off-by: Indan Zupancic <[email protected]>
Save current FPU state on domain switch.

Signed-off-by: Indan Zupancic <[email protected]>
Co-authored-by: Corey Lewis <[email protected]>
Signed-off-by: Indan Zupancic <[email protected]>
Retain the ridiculous name to make clear which invocation is being
handled.

Rename tptr to tcb for consistency within the file.

We have a dom_t type, use it as early as possible.

Signed-off-by: Indan Zupancic <[email protected]>
Indanz and others added 16 commits July 24, 2025 10:20
Add flags to tcb_t and the seL4_TCBFlag_fpuDisabled flag.

Enums are signed, so make TCB flags word_t to keep them unsigned.

Signed-off-by: Indan Zupancic <[email protected]>
Signed-off-by: Corey Lewis <[email protected]>
Remove fault-based FPU state saving and loading.

Signed-off-by: Indan Zupancic <[email protected]>
Signed-off-by: Gerwin Klein <[email protected]>
On x86 both the VM monitor and the VM guest have their own FPU state.
The monitor's FPU is controlled with the seL4_TCBFlag_fpuDisabled
TCB flag. For the guest running on the VCPU, the FPU is always
enabled.

Signed-off-by: Indan Zupancic <[email protected]>
Trivial change, except for x86 virtualisation and bootup.

Normally the seL4_TCBFlag_fpuDisabled flag is used to decide whether
to enable or disable the FPU. However, with x86 virtualisation there
is only one TCB. Use the task flag for the host and always enable FPU
for the guest.

When either the host or the guest's FPU is loaded, ksCurFPUOwner will
point to our TCB. saveFpuState and loadFpuState check fpu_active and
do the right thing depending on the current FPU user. To make this
work it was necessary to pass tcb_t to saveFpuState and loadFpuState.
Use the idle thread to store the initial FPU state. x86KSnullFpuState
is kept to simplify verification.

However, when Arch_initFpu is called, the idle thread hasn't been
created yet, so move the initialisation to after create_idle_thread,
but before create_initial_thread, as that leads to Arch_initFpuContext
using x86KSnullFpuState.

Also initialise VCPU FPU registers correctly for x86, otherwise the
initial state is wrong and can't be loaded when XSAVES is enabled.

Signed-off-by: Indan Zupancic <[email protected]>
Add config choice and change the default from XSAVEOPT to XSAVE.

The first config choice is used as the default option. Only XSAVE
is guaranteed to always work, the others require newer CPUs.

Get rid of dubious FPU state headers, we don't need them:
- XCOMP_BV_COMPACTED_FORMAT is set by xsavec or xsaves.
- We can init MXCSR with the ldmxcsr instruction.

Only system state should be configured in IA32_XSS_MSR,
setting FPU user state bits causes an exception.

All memory should be zeroed already, no need to do it again.

See also issue seL4#179.

Signed-off-by: Indan Zupancic <[email protected]>
Unused on ARM; on RISC-V it is actually the final decision and not
a cache.

Signed-off-by: Indan Zupancic <[email protected]>
To ease verification.

Fairly trivial change, except for x86, of course.

Signed-off-by: Indan Zupancic <[email protected]>
It's not only used when deleting a thread any more.

Swapping fpuRelease and dissociateVCPUTCB makes no practical
difference as they are independent, but it simplifies
verification slightly.

Signed-off-by: Indan Zupancic <[email protected]>
Also release FPU when dissociating a vcpu, otherwise ksCurFPUOwner
may be wrong.

fpuRelease() sets ksCurFPUOwner to NULL; if this happens before
we return to the host, it may end up with the FPU disabled and no
FPU state loaded. Instead of hunting down and handling all obscure
corner cases where this might happen (dissociating VCPUs, cross-core
operations), just check for this in vcpu_fpu_to_host().

Signed-off-by: Indan Zupancic <[email protected]>
disableFpu() is always called at boot and calls one of those.

Signed-off-by: Indan Zupancic <[email protected]>
Signed-off-by: Indan Zupancic <[email protected]>
Signed-off-by: Corey Lewis <[email protected]>
On 32-bit ARM the fpexc system register is set by loadFpuState,
which includes the FPU enable/disable bit FPEXC_EN_BIT. This
register is part of the usercontext and needs to be initialised
correctly, otherwise the FPU will be disabled by loadFpuState.

Before, this bug was hidden because the FPU was enabled lazily
after a trap. This bug just caused one extra FPU trap at first
FPU use for each task: The first handleFPUFault would fail to
enable the FPU, causing another FPU trap when user space gets
restarted.

On the second FPU fault, switchLocalFpuOwner calls enableFpu first
and then calls saveFpuState because ksActiveFPUState is set to the
current task's FPU state. Then it gets saved with FPEXC_EN_BIT set
and the task can continue with the FPU actually enabled.

This also means that with the old code, the initial FPEXC state
of each task was equal to the previous active FPU task's.

Signed-off-by: Indan Zupancic <[email protected]>
According to Arm documentation, the exception-handling routine
needs to zero the following bits in FPEXC:

DEX, IDF, IXF, UFF, OFF, DZF and IOF

In seL4, the user space fault handler should do this, as the kernel
doesn't know when user space is done handling the FPU trap.

In case of virtualisation, the guest kernel should clear these bits.

Now that the seL4 kernel doesn't handle FPU traps itself any more,
user space can handle asynchronous traps if it wants to.

Signed-off-by: Indan Zupancic <[email protected]>
Signed-off-by: Indan Zupancic <[email protected]>
lsf37 commented Jul 24, 2025

Alright, things are finally passing. The link check is failing on the RFC link that will only exist once merged, and the zcu106 failure is unrelated.

Will go ahead with the merge!

@lsf37 lsf37 merged commit 22f816a into seL4:master Jul 24, 2025
85 of 110 checks passed