Adding support for ppc64le #892
@FreddieWitherden @joseemoreira @MichaelMeissner you might like to take a look at this.
In the code generator there are quite a few large switch/case blocks. These can be eliminated by repurposing some of the unused bits in the instruction value to indicate the type of the instruction. Then checking the form of an instruction becomes a simple bitwise operation rather than a massive switch/case block. This is a trick which is used by both the x86 and ARM generators to improve performance and simplify the code. Also, please check everything compiles as C89 (no // comments and all variables must be defined at the start of a block).
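To make the suggestion above concrete, here is a minimal C89 sketch of the idea. The macro and function names, and the choice of the RT field (bits 21-25) to hold the form tag, are assumptions made purely for illustration; they are not the layout used by this PR or by the x86/ARM generators. Only the `addi` and `add` opcode templates are real PowerPC encodings. Returning 0 on a form mismatch is likewise just a convention picked for this sketch.

```c
/*
 * Sketch only: all PPC_* names and the use of bits 21-25 for the form tag
 * are hypothetical, chosen for illustration.
 */

/* The RT field (bits 21-25, counting from the least-significant bit) is zero
 * in both opcode templates below, so a form tag can be stored there. */
#define PPC_FORM_SHIFT 21
#define PPC_FORM_MASK  (0x1fUL << PPC_FORM_SHIFT)
#define PPC_FORM_D     (1UL << PPC_FORM_SHIFT)
#define PPC_FORM_XO    (2UL << PPC_FORM_SHIFT)

/* Opcode templates (addi: primary opcode 14; add: primary opcode 31,
 * extended opcode 266) with the hypothetical form tag OR-ed in. */
#define PPC_INSTR_ADDI (0x38000000UL | PPC_FORM_D)
#define PPC_INSTR_ADD  (0x7c000214UL | PPC_FORM_XO)

/* The form check is a single mask instead of a switch over every opcode. */
static unsigned long ppc_instr_form(unsigned long instr) {
  return instr & PPC_FORM_MASK;
}

/* Encode a D-form instruction: opcode | RT | RA | 16-bit immediate.
 * Returns 0 if the instruction ID is not tagged as D-form. */
static unsigned long ppc_encode_d(unsigned long instr, unsigned int rt,
                                  unsigned int ra, int si) {
  unsigned long op;
  if (ppc_instr_form(instr) != PPC_FORM_D) {
    return 0;
  }
  op = instr & ~PPC_FORM_MASK; /* strip the tag before emitting */
  op |= ((unsigned long)rt & 0x1fUL) << 21;
  op |= ((unsigned long)ra & 0x1fUL) << 16;
  op |= (unsigned long)si & 0xffffUL;
  return op & 0xffffffffUL;
}

/* Encode an XO-form instruction: opcode | RT | RA | RB | extended opcode.
 * Returns 0 if the instruction ID is not tagged as XO-form. */
static unsigned long ppc_encode_xo(unsigned long instr, unsigned int rt,
                                   unsigned int ra, unsigned int rb) {
  unsigned long op;
  if (ppc_instr_form(instr) != PPC_FORM_XO) {
    return 0;
  }
  op = instr & ~PPC_FORM_MASK; /* strip the tag before emitting */
  op |= ((unsigned long)rt & 0x1fUL) << 21;
  op |= ((unsigned long)ra & 0x1fUL) << 16;
  op |= ((unsigned long)rb & 0x1fUL) << 11;
  return op & 0xffffffffUL;
}
```

With this layout, `ppc_encode_d(PPC_INSTR_ADDI, 3, 4, 1)` yields the word for `addi 3,4,1`, and passing an XO-form ID to the D-form encoder is rejected by the mask check rather than by a per-opcode case list.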
Thanks for your contribution. I'm with @FreddieWitherden here in terms of potential for optimization. I also suggest we merge for now into a branch "feature_ppc64le" and then get CI up and running before merging into main. Is there a chance that we could run CI in IBM cloud in a similar way as we use GVT3 in AWS? Would you folks be able to provide credits for this?
Here is an option which was announced today: https://lists.osuosl.org/pipermail/openpower/Week-of-Mon-20240819/000110.html
I'll investigate the IBM cloud power10 access for CI. Also I found a bug for FP64 when m and n are small, k is large, and k % 4 == 0. I'll fix this later today.
Getting IBM Cloud credit may be difficult. I could get a power10 server spun up, but I'm not sure how long it would be available for. I think the option pointed out by @breuera might be best. Here are some further links:
Added initial FP32-POWER GEMM-microkernel.
Removed trailing white spaces.
Increased reuse-distance of GPRs w.r.t. VSX-loads and -stores.
Added wrapper for power fixed-point compare.
Signed-off-by: Will Trojak <[email protected]>
Signed-off-by: WillTrojak <[email protected]>
0331e90 to 9ef51a6
@FreddieWitherden I embedded a form ID in the 32- and 64-bit opcodes, and did some other cleanup. It reduced the line count a fair bit.
Signed-off-by: WillTrojak <[email protected]>
Signed-off-by: Will Trojak <[email protected]>
Hi @WillTrojak, thanks for the work on PPC! I have submitted the form above and listed you as the advocate. I agree with Alex, we can easily merge this into a branch, create a CI loop in GitHub and work there. When we're happy, we merge to main. We do this with other feature branches, too.
@rengolin I actually already have a power10 server via OSUOSL for libxsmm. Email me and I can add you to the instance to set up the CI stuff. I found with OpenBLAS that the performance was 10-20% lower than on bare metal, as they seem to be running some OpenShift cluster. Sorry about being so slow on this PR. I have a number of updates I was going to push in the next few days which improve the performance quite a bit. I'm just testing and solving some regression issues with power9.
Oh, it seems the process has started already. Perhaps you'll be contacted?
Yeah, I'm not too worried about performance numbers in a VM. It won't be a constant factor anyway, so I mostly care about conformance in this CI loop. If you have access to a bare metal machine, you can run benchmarks and report the numbers on the PR, that should be fine.
@rengolin I pushed an update. I'm pretty happy with the state of the kernels now being produced, especially for FP32. I've spoken to @FreddieWitherden and we have a plan for spare kernels, but I'll sort this in a separate PR.
Ok, so I changed the base branch to https://github.com/libxsmm/libxsmm/tree/feature_powerpc until we work on functionality, but we'll need to rebase to If you have more code on this, please submit PRs against the
Great. I've been working on rebasing it. When I run the code before the rebase everything's fine, but after the rebase it segfaults. Did something change in the way arguments are passed to kernels? There seem to have been some changes in
This adds F32 and F64 support for PowerPC LE 64, by building on some initial work by @breuera.
Features:
Not currently supported: