FPU Context Switching #1325
Conversation
For some reason …. Edit: I fixed it by using an ifdef.
@chrisguikema If you have time, could you review and test the x86 VCPU changes on your VMMs? It would be good to know whether there is any performance degradation too. I don't expect any, but if there is, it's probably a bug. I'm not sure whether the guest's requested CR0.TS is being preserved (because I clear TS when unconditionally enabling the FPU for the guest). Is the CR0 used by guests the same as the CR0 used by hosts, or does it have its own architectural copy? I guess it has its own copy because of VMX_GUEST_CR0, but I'm not sure what the interaction is with CR0. Edit: Or was it @Furao who promised me this at the summit? I forgot.
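The CR0 question above can be made concrete with a small model. Under VMX, the guest runs on its own CR0 (the VMCS guest CR0 field), and a read shadow holds what the guest observes for hypervisor-owned bits. The sketch below is illustrative only: the struct and function names are invented for this example and are not seL4 or VMCS identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the VMX CR0 split discussed above: the guest runs
 * on a separate architectural CR0, while the read shadow is what the guest
 * observes for bits owned by the hypervisor. Names are illustrative. */
#define CR0_TS (1u << 3)

typedef struct {
    uint32_t guest_cr0;       /* hardware CR0 while the guest runs */
    uint32_t cr0_read_shadow; /* value the guest sees for host-owned bits */
} vmcs_model_t;

/* Enable the FPU for the guest without losing the guest's requested TS:
 * clear TS in the effective guest CR0, but record the guest's wish in the
 * read shadow so a guest read of CR0 still returns it. */
static void enable_guest_fpu(vmcs_model_t *v, uint32_t guest_requested_cr0)
{
    v->cr0_read_shadow = guest_requested_cr0;
    v->guest_cr0 = guest_requested_cr0 & ~CR0_TS;
}
```

With this split, unconditionally clearing TS for the guest does not clobber the value the guest believes it wrote.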
Just a couple of last changes that we'd like to make to this for verification. Would it be ok to move …? Separately, we need to label …. See corlewis@ef617bc for the versions of these changes that I've been working with. Also, when you have a chance, another rebase would be helpful, to get the most recent verification-visible changes. With these we're able to complete the functional correctness proofs for AArch64, and we expect the other architectures to be relatively straightforward (although keep in mind that the verified x86 configuration does not have hypervisor support, so we won't have to handle the complications there that you've been dealing with).
Oh, I've just realised that if we do make this change, then for consistency we should probably wrap them up in a new ….
Okay, done. Let me know if this is fine.
Sorry, I thought that the new ….
That's surprising, as that's how all other invoke functions are in that file. Actually, looking at all the invoke functions, only the IRQ ones are void; all others return exception_t. I have no objection to changing it, but it might be worth looking into why this one is so hard compared to the others. Perhaps you can simplify the proofs a lot if you change the other invoke functions to void too.
Honestly, in this case a fair bit of the pain is probably because I already had working proofs and I'm now trying to minimise the changes I'm making for …. One thing to keep in mind when considering the other invoke functions is that some of them actually can be preempted and return an exception, so those ones can't be changed to void. Now that I've done this, though, I think that in the cases where they can't be preempted it actually is nicer to be void, but I don't think it's enough of an improvement to be worth going through and updating them.
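The distinction being discussed can be sketched as follows. This is an illustrative model, not seL4 source: the names invokePreemptible, invokeSetFlags, and preempt_pending are invented for the example; only exception_t and EXCEPTION_NONE echo real seL4 identifiers.

```c
#include <assert.h>

/* Sketch of the two invoke-function styles: an invocation that can be
 * preempted must propagate an exception_t so the caller can restart it,
 * while one that cannot be preempted can honestly be void. */
typedef enum { EXCEPTION_NONE, EXCEPTION_PREEMPTED } exception_t;

static int preempt_pending; /* stand-in for preemption-point state */

static exception_t invokePreemptible(void)
{
    if (preempt_pending) {
        return EXCEPTION_PREEMPTED; /* caller must restart the operation */
    }
    /* ... long-running work with preemption points ... */
    return EXCEPTION_NONE;
}

static int flags; /* stand-in for a TCB field */

static void invokeSetFlags(int set)
{
    flags = set; /* cannot fail and cannot be preempted: void is honest */
}
```

The void signature lets the C type system document "cannot fail" directly, which is the benefit mentioned in the next comment.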
When there is a quiet period at some point, we should consider it, because it is one of the rare cases where the C type system actually does help a little bit on the C side (when it returns …).
Tests are looking good now, I'll rebase for the cmake restyling. |
Aligning saves one cycle per LDP/STP instruction. Signed-off-by: Indan Zupancic <[email protected]>
It was blindly copied from AArch32 and has never been correct. Signed-off-by: Indan Zupancic <[email protected]>
Signed-off-by: Indan Zupancic <[email protected]>
Otherwise ksActiveFPUState may point to freed memory after a vcpu with active FPU state gets deleted. Signed-off-by: Indan Zupancic <[email protected]>
Save current FPU state on domain switch. Signed-off-by: Indan Zupancic <[email protected]>
Co-authored-by: Corey Lewis <[email protected]> Signed-off-by: Indan Zupancic <[email protected]>
Retain the ridiculous name to make clear which invocation is being handled. Rename tptr to tcb for consistency within the file. We have a dom_t type, use it as early as possible. Signed-off-by: Indan Zupancic <[email protected]>
Add a flags field to tcb_t and the seL4_TCBFlag_fpuDisabled flag. Enums are signed, so make the TCB flags word_t to keep them unsigned. Signed-off-by: Indan Zupancic <[email protected]> Signed-off-by: Corey Lewis <[email protected]>
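The signedness point in this commit can be shown with a tiny standalone sketch (names invented for the example, not seL4 source): a C enumeration constant has type int, so arithmetic on it is signed, whereas a word_t flag stays unsigned.

```c
#include <stdint.h>

/* "Enums are signed": in C an enumeration constant has type int, so
 * arithmetic on it is signed and can go negative. Storing TCB flags in a
 * word_t keeps all flag arithmetic unsigned. Illustrative sketch only. */
typedef uint64_t word_t;

enum { enumFlag = 1 };            /* type int, signed */
static const word_t wordFlag = 1; /* unsigned 64-bit word */
```

Subtracting past zero makes the difference visible: the enum result is a negative int, while the word_t result wraps to a large unsigned value.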
Remove fault-based FPU state saving and loading. Signed-off-by: Indan Zupancic <[email protected]> Signed-off-by: Gerwin Klein <[email protected]>
On x86 both the VM monitor and the VM guest have their own FPU state. The monitor's FPU is controlled with the seL4_TCBFlag_fpuDisabled TCB flag. For the guest running on the VCPU, the FPU is always enabled. Signed-off-by: Indan Zupancic <[email protected]>
Trivial change, except for x86 virtualisation and bootup. Normally the seL4_TCBFlag_fpuDisabled flag is used to decide whether to enable or disable the FPU. However, with x86 virtualisation there is only one TCB. Use the task flag for the host and always enable FPU for the guest. When either the host or the guest's FPU is loaded, ksCurFPUOwner will point to our TCB. saveFpuState and loadFpuState check fpu_active and do the right thing depending on the current FPU user. To make this work it was necessary to pass tcb_t to saveFpuState and loadFpuState. Use the idle thread to store the initial FPU state. x86KSnullFpuState is kept to simplify verification. However, when Arch_initFpu is called, the idle thread hasn't been created yet, so move the initialisation to after create_idle_thread, but before create_initial_thread, as that leads to Arch_initFpuContext using x86KSnullFpuState. Also initialise VCPU FPU registers correctly for x86, otherwise the initial state is wrong and can't be loaded when XSAVES is enabled. Signed-off-by: Indan Zupancic <[email protected]>
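The host/guest dispatch described in this commit can be modelled in a few lines. This is a hypothetical sketch: the struct layout, the fpu_active field, and the hw_fpu stand-in are invented for the example; only the function names saveFpuState and loadFpuState echo the commit message.

```c
#include <assert.h>
#include <string.h>

/* Model of the single-TCB x86 virtualisation case: saveFpuState and
 * loadFpuState check which FPU state (host or guest) is currently the
 * active one and operate on that. Illustrative, not seL4 source. */
typedef struct { unsigned char regs[16]; } fpu_state_t;

typedef struct {
    fpu_state_t hostState;
    fpu_state_t guestState;
    int         fpu_active; /* nonzero: the guest's FPU state is loaded */
} tcb_t;

static fpu_state_t hw_fpu;  /* stand-in for the hardware FPU registers */

static fpu_state_t *activeState(tcb_t *tcb)
{
    return tcb->fpu_active ? &tcb->guestState : &tcb->hostState;
}

static void saveFpuState(tcb_t *tcb) { *activeState(tcb) = hw_fpu; }
static void loadFpuState(tcb_t *tcb) { hw_fpu = *activeState(tcb); }
```

Passing the tcb_t in lets one pair of save/load functions serve both the host path and the guest path, which is why the commit changes their signatures.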
Add config choice and change the default from XSAVEOPT to XSAVE. The first config choice is used as the default option. Only XSAVE is guaranteed to always work, the others require newer CPUs. Get rid of dubious FPU state headers, we don't need them: - XCOMP_BV_COMPACTED_FORMAT is set by xsavec or xsaves. - We can init MXCSR with the ldmxcsr instruction. Only system state should be configured in IA32_XSS_MSR, setting FPU user state bits causes an exception. All memory should be zeroed already, no need to do it again. See also issue seL4#179. Signed-off-by: Indan Zupancic <[email protected]>
Unused on ARM, on RISC-V it is actually the final decision and not a cache. Signed-off-by: Indan Zupancic <[email protected]>
To ease verification. Fairly trivial change, except for x86, of course. Signed-off-by: Indan Zupancic <[email protected]>
It's not only used when deleting a thread any more. Swapping fpuRelease and dissociateVCPUTCB makes no practical difference as they are independent, but it simplifies verification slightly. Signed-off-by: Indan Zupancic <[email protected]>
Also release the FPU when dissociating a VCPU, otherwise ksCurFPUOwner may be wrong. fpuRelease() sets ksCurFPUOwner to NULL; if this happens before we return to the host, the host may end up with the FPU disabled and no FPU state loaded. Instead of hunting down and handling all the obscure corner cases where this might happen (dissociating VCPUs, cross-core operations), just check for this in vcpu_fpu_to_host(). Signed-off-by: Indan Zupancic <[email protected]>
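The "check in one place" approach from this commit can be sketched as follows. The bodies are a hypothetical model (the fpuLoaded field and the reload step are invented); only fpuRelease, vcpu_fpu_to_host, and ksCurFPUOwner echo names from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the ksCurFPUOwner safety net: rather than handling every
 * corner case that can release the FPU while a guest runs, re-claim it
 * on the single path back to the host. Illustrative, not seL4 source. */
typedef struct tcb { int fpuLoaded; } tcb_t;

static tcb_t *ksCurFPUOwner;

static void fpuRelease(tcb_t *tcb)
{
    (void)tcb;
    ksCurFPUOwner = NULL; /* e.g. vcpu dissociation or a cross-core op */
}

/* On returning from the guest to the host, detect that the FPU was
 * released in the meantime and re-claim it for this thread. */
static void vcpu_fpu_to_host(tcb_t *tcb)
{
    if (ksCurFPUOwner != tcb) {
        tcb->fpuLoaded = 1; /* stand-in for reloading the host state */
        ksCurFPUOwner = tcb;
    }
}
```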
disableFpu() is always called at boot and calls one of those. Signed-off-by: Indan Zupancic <[email protected]>
Signed-off-by: Indan Zupancic <[email protected]>
Signed-off-by: Corey Lewis <[email protected]>
On 32-bit ARM the FPEXC system register is set by loadFpuState, and it includes the FPU enable/disable bit FPEXC_EN_BIT. This register is part of the user context and needs to be initialised correctly, otherwise the FPU will be disabled by loadFpuState. Before, this bug was hidden because the FPU was enabled lazily after a trap. The bug just caused one extra FPU trap at first FPU use for each task: the first handleFPUFault would fail to enable the FPU, causing another FPU trap when user space was restarted. On the second FPU fault, switchLocalFpuOwner calls enableFpu first and then calls saveFpuState, because ksActiveFPUState is set to the current task's FPU state. The state then gets saved with FPEXC_EN_BIT set and the task can continue with the FPU actually enabled. This also means that with the old code, the initial FPEXC state of each task was equal to that of the previous active FPU task. Signed-off-by: Indan Zupancic <[email protected]>
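The fix described above amounts to seeding FPEXC in the initial user context. A minimal sketch, assuming the Arm FPEXC layout (EN is bit 30) but with an invented context struct and helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the AArch32 fix: FPEXC lives in the user context and is
 * restored by loadFpuState, so the EN bit must be set when a thread's
 * context is initialised, or its FPU comes up disabled. The bit position
 * follows the Arm FPEXC layout; the struct is illustrative. */
#define FPEXC_EN_BIT (1u << 30)

typedef struct {
    uint32_t fpexc;
    /* ... other user-context registers ... */
} user_context_t;

static void initFpexc(user_context_t *ctx)
{
    ctx->fpexc = FPEXC_EN_BIT; /* FPU enabled from the very first restore */
}
```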
According to Arm documentation, the exception-handling routine needs to zero the following bits in FPEXC: DEX, IDF, IXF, UFF, OFF, DZF and IOF. In seL4, the user space fault handler should do this, as the kernel doesn't know when user space is done handling the FPU trap. In case of virtualisation, the guest kernel should clear these bits. Now that the seL4 kernel doesn't handle FPU traps itself any more, user space can handle asynchronous traps if it wants to. Signed-off-by: Indan Zupancic <[email protected]>
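For reference, the bits named above map onto the VFP FPEXC layout as follows (bit positions per the Arm documentation; the helper function itself is an illustrative sketch, not seL4 or libsel4 code):

```c
#include <assert.h>
#include <stdint.h>

/* FPEXC exception bits a (now user-level) trap handler must clear,
 * per the VFP FPEXC layout: DEX plus the cumulative exception flags. */
#define FPEXC_DEX (1u << 29)
#define FPEXC_IDF (1u << 7)
#define FPEXC_IXF (1u << 4)
#define FPEXC_UFF (1u << 3)
#define FPEXC_OFF (1u << 2)
#define FPEXC_DZF (1u << 1)
#define FPEXC_IOF (1u << 0)

#define FPEXC_TRAP_BITS \
    (FPEXC_DEX | FPEXC_IDF | FPEXC_IXF | FPEXC_UFF | FPEXC_OFF | \
     FPEXC_DZF | FPEXC_IOF)

/* What a user-space fault handler would do with the FPEXC value before
 * writing it back: zero exactly the trap-related bits, keep the rest. */
static uint32_t clearFpuTrapBits(uint32_t fpexc)
{
    return fpexc & ~FPEXC_TRAP_BITS;
}
```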
Signed-off-by: Indan Zupancic <[email protected]>
Signed-off-by: Gerwin Klein <[email protected]>
Alright, things are finally passing. The link check is failing on the RFC link that will only exist once this is merged, and the zcu106 failure is unrelated. Will go ahead with the merge!
Implementation of the FPU RFC.
Simplest version of semi-lazy FPU switching.
Still work in progress, but in good enough shape for wider review. For previous development, see here.
TODO:
Possible Optimisations: