Runtime kernel management #129
Comments
When the architecture is decided automatically (e.g., CPUID), will that be done on a per-kernel level, or only the first time a BLIS function is invoked? I love the idea of this, and my software currently uses BLIS for its BLAS interface when
@ShadenSmith good point! It seems OpenBLAS DTRT (does the right thing) here, so it may be worthwhile to check their solution?
@ShadenSmith Thanks for your question. Even after runtime kernel management is implemented (with a CPUID-based heuristic for choosing the kernels at runtime), could you clarify which application of CPUID you are referring to?
@fgvanzee I believe the question is "will BLIS check cpuid every time dgemm is called?", and from previous discussions I believe the answer is no. The only time you'd want to check more than once is on heterogeneous systems like big.LITTLE, and that requires a more streamlined solution than checking cpuid each and every time anyway. The implementation in TBLIS uses (local) static initialization to perform the check and then caches the result. In BLIS, there is already some library initialization code that gets performed exactly once that this could piggyback on.
@devinamatthews You are correct. The value would be queried once, probably at library initialization, and then cached. However, this behavior could become configurable in the future to accommodate heterogeneous architectures.
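(For readers following along, here is a minimal C sketch of the query-once-and-cache pattern described above. The names query_hw_arch_id, arch_id, and arch_t are invented for illustration and are not actual BLIS or TBLIS symbols; this is just one way to guarantee a single hardware query, here via pthread_once.)

```c
#include <pthread.h>

typedef enum { ARCH_GENERIC, ARCH_HASWELL, ARCH_ZEN } arch_t;

/* Placeholder probe: a real implementation would execute the cpuid
   instruction (or an OS-specific equivalent) and map the result to
   an arch_t value. */
static arch_t query_hw_arch_id( void )
{
    return ARCH_GENERIC;
}

static arch_t         cached_arch;
static pthread_once_t arch_once = PTHREAD_ONCE_INIT;

/* Runs exactly once, no matter how many threads call arch_id(). */
static void arch_init( void )
{
    cached_arch = query_hw_arch_id();
}

/* Every library entry point would consult this instead of re-running
   the hardware query. */
arch_t arch_id( void )
{
    pthread_once( &arch_once, arch_init );
    return cached_arch;
}
```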
Yes, this is what I meant. Thanks for the clarification and the quick response. @fgvanzee, calling just once is great. A very exciting feature!
Just a quick update. I haven't made much progress on this issue since early June, but I have recently resumed working on it. Thanks for your patience.
Systems using the GNU libc support indirect functions, notably via the GCC ifunc function attribute. This allows the implementation behind a public symbol to be chosen at load time by a resolver function. @fgvanzee, is this what you had in mind? Regardless, thanks a lot for looking into it!
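(For reference, a bare-bones sketch of the ifunc mechanism mentioned above, assuming GCC on an x86-64 ELF/glibc system. The function names are made up for illustration; this is not how BLIS implements its dispatch.)

```c
/* Two implementations of the same routine, e.g. tuned for different
   microarchitectures.  Here they share a trivial body for brevity. */
static double dot_generic( const double* x, const double* y, int n )
{
    double s = 0.0;
    for ( int i = 0; i < n; i++ ) s += x[i] * y[i];
    return s;
}

static double dot_avx2( const double* x, const double* y, int n )
{
    /* Imagine AVX2 intrinsics here. */
    return dot_generic( x, y, n );
}

typedef double (*dot_fn_t)( const double*, const double*, int );

/* The resolver runs once, when the dynamic loader binds the symbol,
   and returns the pointer that the public symbol should resolve to. */
static dot_fn_t resolve_dot( void )
{
    __builtin_cpu_init();
    return __builtin_cpu_supports( "avx2" ) ? dot_avx2 : dot_generic;
}

/* The public symbol 'dot' is an indirect function backed by the
   resolver above. */
double dot( const double* x, const double* y, int n )
    __attribute__(( ifunc( "resolve_dot" ) ));
```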
@civodul Thanks for chiming in. It's good to know that GNU libc supports this way of selecting functions at runtime (I was unaware). My plans do not depend on this sort of feature, however. I am planning what is hopefully a more portable solution (one that does not rely on GNU libc) that builds all of the necessary object files and symbols into the same library, with all the necessary name-mangling built into the build system and source code. Where applicable, architecture-specific functions will be looked up via arrays of function pointers, indexed by special architecture id values. The last step of this project, which is not too far off, will be to write the CPUID-based code that maps CPUID return values to the BLIS-specific architecture ids. Other architecture families (e.g. ARM, Power, etc.) will need their own solutions if we are to support auto-detection at runtime (or configure-time, for that matter).
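(To make the function-pointer-table idea concrete, here is a self-contained C sketch under invented names; the real BLIS symbols and kernel signatures differ, and the microkernel bodies are reduced to stubs.)

```c
#include <stdio.h>

/* Architecture ids double as indices into the kernel tables below. */
typedef enum { ARCH_GENERIC = 0, ARCH_HASWELL, ARCH_ZEN, ARCH_NUM } arch_t;

typedef void (*dgemm_ukr_t)( void );  /* signature reduced for the sketch */

/* Stand-ins for per-architecture microkernels.  In a multi-architecture
   build these would come from separately compiled, name-mangled objects. */
static void dgemm_ukr_generic( void ) { puts( "generic dgemm microkernel" ); }
static void dgemm_ukr_haswell( void ) { puts( "haswell dgemm microkernel" ); }
static void dgemm_ukr_zen    ( void ) { puts( "zen dgemm microkernel" ); }

/* One entry per sub-configuration built into the library. */
static const dgemm_ukr_t dgemm_ukr_table[ ARCH_NUM ] =
{
    [ ARCH_GENERIC ] = dgemm_ukr_generic,
    [ ARCH_HASWELL ] = dgemm_ukr_haswell,
    [ ARCH_ZEN     ] = dgemm_ukr_zen,
};

int main( void )
{
    arch_t id = ARCH_HASWELL;   /* would come from the CPUID heuristic */
    dgemm_ukr_table[ id ]();    /* dispatch through the table */
    return 0;
}
```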
Please pull the CPUID code from TBLIS, as that should be close to a ready-made solution.
@devinamatthews Thanks, Devin. I'll definitely take a look when the time comes.
Quick update. I've recently pushed a new commit (453deb2) and branch (named 'rt') that finally implements the feature mentioned in this issue. The only thing(s) missing are the heuristics (e.g. CPUID-based code) that allow us to choose an architecture at run-time when multiple configurations (microarchitectures) are included in the build. I'll be working on that next, along with updating the wiki documentation to describe how to add support for a new configuration, either permanently or as a prototype. But for now, using the
Additional update. Commit 2c51356 implements the remainder of the work I had planned for this issue. There will inevitably be cleanup and tweaks going forward, but the core of the effort--support for runtime kernel management as well as the multi-architecture builds + runtime hardware detection feature that many have asked for--has been implemented. If you are inclined, feel free to give it a try. (For example, you could try configuring with the
Notice that, for now, the configure-time hardware detection uses different code than that used at runtime. Ideally, I would merge the two so that there is just one set of code to maintain, and also so that the two follow the same rules.
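(As an aside for anyone curious what such a runtime hardware-detection heuristic typically looks like, here is a hedged C sketch using GCC/Clang's x86 CPU-feature builtins rather than raw cpuid. The arch ids and feature-to-architecture rules are invented for illustration; they are not the rules BLIS itself uses.)

```c
#include <stdio.h>

typedef enum { ARCH_GENERIC, ARCH_SANDYBRIDGE, ARCH_HASWELL } arch_t;

/* Map CPU features to an architecture id.  Requires GCC or Clang on
   x86; other families (ARM, Power, ...) need their own probes. */
static arch_t detect_arch( void )
{
    __builtin_cpu_init();

    if ( __builtin_cpu_supports( "avx2" ) && __builtin_cpu_supports( "fma" ) )
        return ARCH_HASWELL;      /* AVX2 + FMA -> haswell-class kernels */
    if ( __builtin_cpu_supports( "avx" ) )
        return ARCH_SANDYBRIDGE;  /* AVX only   -> sandybridge-class     */

    return ARCH_GENERIC;          /* fall back to portable C kernels     */
}

int main( void )
{
    printf( "detected architecture id: %d\n", (int) detect_arch() );
    return 0;
}
```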
@fgvanzee thanks so much for your work on this! Excited to test. What's the best way to compile a BLIS now that contains all possible kernels for a given architecture (i.e., i386 or arm64 or x86_64)? Tried to go through your commits but I am not sure there is a configure_registry selector that would do that out of the box.
@iotamudelta You're welcome! I'm excited to provide this feature to the community. Some of your questions may be answered once I write documentation on the new configuration system. (Several of the wikis need to be updated.) I'll get the new documentation set up first, and then merge the rt branch. The short answer is that you need to define a configuration family in the configuration registry. You may also have noticed a confusing syntax for some sub-configurations in the registry.
This means that the
I'll explain all of this in the updated wikis too.
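(A rough illustration of what configuration-family entries in a registry of this kind can look like. This is not a verbatim excerpt from BLIS's registry file, and the family members and slash entries below are invented; consult the BLIS documentation for the real syntax.)

```
# Family entries: a family name maps to the sub-configurations that
# get built together into one multi-architecture library.
x86_64:  intel64 amd64
intel64: haswell sandybridge penryn generic
amd64:   zen excavator bulldozer generic

# Sub-configuration entries may use a slash syntax to pull in kernel
# sets registered under other configuration names.
zen:     zen/haswell
```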
@fgvanzee for the FreeBSD port, it'd be great if we could have repositories that include all applicable configurations for x86, x86_64, power, and arm64, respectively. I'm happy to help out. Will first need to get the current state to work for FreeBSD. Either way: this is very nice. Unrelated, but you may be interested: on FreeBSD-HEAD and an AMD Carrizo, BLIS is competitive with / slightly faster than OpenBLAS for dgemms in my tests.
@iotamudelta I'm not that surprised that BLIS is highly competitive on an AMD Excavator core, as I vaguely remember observing this myself when I was first writing that microkernel. But it's good to know that others are seeing the same thing.
AFAIK this is done now; closing.
Agreed, this is done.
BLIS currently only allows building support for one architecture (configuration) at a time. In the future, it should allow the user to build support for multiple architectures into the same library. The "correct" architecture would then be selected at runtime according to some method. That method could be CPUID or an equivalent, or the architecture could be set to a default in some other way and then later changed manually by the user. Ideally, runtime kernel management should even allow users to link in their own kernel files at link-time, which they could then switch to via the same mechanism used to switch among the pre-defined architectures.
This feature will require substantial changes to the build system, primarily configure, Makefile, and the make_defs.mk files. It will also redefine what we think of as a configuration and require a reorganization (and renaming) of the files in the top-level kernels directory. A registry will be needed to associate actual configuration names (e.g. haswell) with multi-architecture configuration aliases (e.g. intel64). Finally, it will require a change to the reference kernel files (as well as a relocation) so that their names can be mangled according to the targeted architecture, allowing us to build one set of reference kernels per supported configuration.
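(To illustrate the kind of per-configuration name mangling described above, here is a hedged C sketch. The macro names, the ARCH_SUFFIX build flag, and the simplified row-major kernel are all invented for illustration; the actual BLIS reference kernels and mangling macros differ.)

```c
/* ref_dgemm.c -- imagine this file compiled once per sub-configuration,
   with the build system passing e.g. -DARCH_SUFFIX=haswell or
   -DARCH_SUFFIX=zen on each pass. */

#ifndef ARCH_SUFFIX
#define ARCH_SUFFIX generic
#endif

/* Two-level paste so that ARCH_SUFFIX is expanded before pasting. */
#define PASTE2( name, suffix )  name ## _ ## suffix
#define PASTE( name, suffix )   PASTE2( name, suffix )
#define MANGLE( name )          PASTE( name, ARCH_SUFFIX )

/* Expands to dgemm_ref_haswell, dgemm_ref_zen, dgemm_ref_generic, ...
   so one source file yields one reference kernel per configuration. */
void MANGLE( dgemm_ref )( int m, int n, int k,
                          const double* a, const double* b, double* c )
{
    /* Plain triple loop over row-major matrices: C += A * B. */
    for ( int i = 0; i < m; i++ )
        for ( int j = 0; j < n; j++ )
        {
            double s = 0.0;
            for ( int p = 0; p < k; p++ )
                s += a[ i*k + p ] * b[ p*n + j ];
            c[ i*n + j ] += s;
        }
}
```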