'VFT dispatching' to call into SIMD-ISA-specific code #2364

Open
@kfjahnke

Description

Hi! This is more of a little excursion than a 'true' issue, but it's about a technique which I've found useful and would like to share. The occasion is that I'm extending my library zimt to use highway's foreach_target mechanism.

My first remark - before I start out on the issue proper - is about this mechanism. I knew it was there and I thought it might be a good idea to use it, but the documentation was thin and I had a working solution already. Before I turned my attention to zimt again this autumn, I did a lot of reading in the SIMD literature, and I also decided to have a closer look at highway's foreach_target mechanism. Lacking extensive documentation, I sat myself down and read the code. Only then did I realize just how well-thought-out and useful it actually is. Yes, using it is slightly intrusive to the client code, but you've really done a good job of hiding the complexity and making it easy to 'suck' code into the SIMD-ISA-specific nested namespaces and dispatch to it. But here I do actually have some criticism - to figure that out I had to read and understand the code! It would have been much easier had there been some sort of technical outline, paper, or similar to explain the concept. This criticism goes beyond this specific topic - I think you'd be well advised to improve the documentation to address a wider user base.

My first step in introducing highway's multi-ISA capability into zimt was to introduce 'corresponding' nested namespaces in both my library's 'zimt' namespace and in the 'project' namespace (let's use this one for user-side code). I had a hard time initially figuring out the namespace scheme, probably because of the name of the central macro HWY_NAMESPACE. The naming is unfortunate - of course it's a symbol for a namespace, but a name like HWY_SIMD_ISA would have hinted at its semantics, not at its syntax. With the namespaces set up, I could use foreach_target.h and dynamic dispatch. But I found the way to introduce the ISA-specific code via a free function verbose, so I tried to figure out a way to make this more concise and manageable. I 'bent' some of the code I used in lux to the purpose, and this is where I come to 'VFT dispatching'. The concept is quite simple:

  • Instead of using free functions, I work with virtual member functions in a 'dispatch' object
  • The dispatch object's base class has declarations of these as pure virtual member functions
  • In each ISA-specific nested namespace I inherit from the dispatch base class and provide the implementations
  • I have a 'get_dispatch' routine (using highway's dynamic dispatch) which yields a dispatch base class pointer which does in fact point to an object of the derived class with the ISA-specific implementations

So this is where the 'VFT' in 'VFT dispatching' comes from: it uses the virtual function table of a class with virtual functions. The language guarantees that a virtual call through a base class pointer reaches the derived class's override; compilers typically implement this with VFTs whose layout is compatible across all classes derived from the base (otherwise the mechanism could not function). What do I gain? The base class pointer is a uniform handle to a - possibly large - set of functions I want to keep ISA-specific versions of. Dispatching is as simple as calling through the dispatcher base class pointer, so once I have obtained it, it serves as a conduit:

// inside code submitted to foreach_target I have:

struct _dispatch
: public dispatch // inherit from dispatch base class
{
   ...
} ;

static const _dispatch local_dispatch ; // instantiate the derived class

const dispatch * const _get_dispatch() // install the ISA-specific fetch routine; the name must match the HWY_EXPORT below
{
  return & local_dispatch ; // the pointer converts implicitly to the base class pointer, as declared
}

// in the main body of code (in HWY_ONCE) I use highway's dynamic dispatch:

HWY_EXPORT(_get_dispatch);

const dispatch * const get_dispatch()
{
  return HWY_DYNAMIC_DISPATCH(_get_dispatch)() ;
}

// with a member function 'foo' in the dispatcher, I can now dispatch like this:

auto dp = get_dispatch() ;
bar = dp->foo() ; // calls the ISA-specific variant
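
To try the mechanism in isolation, here is a minimal single-file sketch which does not use highway at all. The namespaces N_ISA_A and N_ISA_B are hypothetical stand-ins for the nested namespaces foreach_target would generate, and the bool argument to get_dispatch is an assumption made only for this illustration - it takes the place of HWY_DYNAMIC_DISPATCH, which would pick the variant by looking at the CPU:

```cpp
#include <cassert>

namespace project {

// Dispatch base class: the uniform handle seen by calling code.
struct dispatch {
  virtual int foo() const = 0;
};

// N_ISA_A and N_ISA_B are hypothetical stand-ins for the nested
// namespaces that foreach_target would generate, one per SIMD ISA.
namespace N_ISA_A {
struct _dispatch : public dispatch {
  int foo() const override { return 1; }  // pretend: variant for ISA 'A'
};
static const _dispatch local_dispatch{};
const dispatch* _get_dispatch() { return &local_dispatch; }
}  // namespace N_ISA_A

namespace N_ISA_B {
struct _dispatch : public dispatch {
  int foo() const override { return 2; }  // pretend: variant for ISA 'B'
};
static const _dispatch local_dispatch{};
const dispatch* _get_dispatch() { return &local_dispatch; }
}  // namespace N_ISA_B

// Assumption for this sketch: a plain flag replaces highway's dynamic
// dispatch, which would choose the variant from the CPU's capabilities.
const dispatch* get_dispatch(bool prefer_b) {
  return prefer_b ? N_ISA_B::_get_dispatch() : N_ISA_A::_get_dispatch();
}

// Usage: obtain the handle once, then call through it.
int run_example() {
  const dispatch* dp = get_dispatch(true);
  int bar = dp->foo();  // resolved via the VFT to N_ISA_B's override
  assert(bar == 2);
  return bar;
}

}  // namespace project
```

The calling code only ever sees the base class, so adding a further variant means adding one more namespace and one more branch in the selector.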

This is more or less it, with one more little twist which I also found useful. In a first approach, I wrote out the declaration of the pure virtual member function in the base class, and again the declaration (now no longer pure) in the derived, ISA-specific, class. This is error-prone, so I now use an interface header, introducing the member functions via a macro. In a header 'interface.h' I put macro invocations only:

// register int dummy ( float z ) for dispatch:

ZIMT_REGISTER(int, dummy, float z)

Then I can #include this header into the class declarations, #defining the macro differently:

// the dispatch base class is coded like this:
struct dispatch
{
#define ZIMT_REGISTER(RET,NAME,...) virtual RET NAME ( __VA_ARGS__ ) const = 0 ;
#include "interface.h"
#undef ZIMT_REGISTER
} ;
// the ISA-specific derived class (inside code submitted to foreach_target):
struct _dispatch
: public dispatch
{
#define ZIMT_REGISTER(RET,NAME,...) RET NAME ( __VA_ARGS__ ) const ;
#include "interface.h"
#undef ZIMT_REGISTER
} ;
// followed, inside the same nested namespace, by the definition(s)
int _dispatch::dummy (float z) const
{
  return 23 ;
}

This ensures that the declarations are consistent. For the actual implementation, the signature has to be written out once again, but since there is a declaration, providing a definition with a different signature is an error, and when providing the implementation, coding with the signature 'in sight' is advisable anyway - especially when the argument list becomes long. The 'interface.h' header provides a good reference to the set of functions using the dispatcher, and additional dispatch-specific functionality can be coded for the lot. I think it makes a neat addition to VFT dispatching.
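
For experimenting in a single file, the same idea can be sketched with a list macro instead of a re-included header. This is a variation for illustration, not what zimt does: ZIMT_INTERFACE, ZIMT_DECLARE_PURE, ZIMT_DECLARE, the second function 'twice' and the namespace N_ISA_A are all hypothetical names. I have also added 'override' to the derived declarations, so that a declaration which fails to match the base class is rejected by the compiler:

```cpp
#include <cassert>

// Hypothetical list macro standing in for the contents of 'interface.h':
// one REGISTER(return type, name, argument list) entry per function.
#define ZIMT_INTERFACE(REGISTER)   \
  REGISTER(int, dummy, float z)    \
  REGISTER(float, twice, float z)

// Base class: every registered function becomes a pure virtual member.
struct dispatch {
#define ZIMT_DECLARE_PURE(RET, NAME, ...) virtual RET NAME(__VA_ARGS__) const = 0;
  ZIMT_INTERFACE(ZIMT_DECLARE_PURE)
#undef ZIMT_DECLARE_PURE
};

// Hypothetical stand-in for one ISA-specific nested namespace.
namespace N_ISA_A {

// Derived class: the same list, now expanded to overriding declarations.
struct _dispatch : public dispatch {
#define ZIMT_DECLARE(RET, NAME, ...) RET NAME(__VA_ARGS__) const override;
  ZIMT_INTERFACE(ZIMT_DECLARE)
#undef ZIMT_DECLARE
};

// The definitions write out the signatures once more; a mismatch with
// the generated declarations is a compile-time error.
int _dispatch::dummy(float /* z */) const { return 23; }
float _dispatch::twice(float z) const { return 2.0f * z; }

static const _dispatch local_dispatch{};

const dispatch* _get_dispatch() { return &local_dispatch; }

}  // namespace N_ISA_A
```

Forgetting to define one of the registered functions shows up as a linker error rather than silently missing functionality, because every entry in the list is declared in the derived class.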

To wrap up, I'd like to point out that this mechanism is generic and can be used to good effect for all sorts of dispatches. If appropriate derived dispatch classes are coded along with a mechanism to pick a specific one, it can function quite independently of highway's dispatch. It can also be used to 'pull in' code which doesn't even use highway - e.g. code with vector-friendly small loops ('goading') relying on autovectorization. Such code will still benefit from being re-compiled several times with ISA-specific flags, be it with highway's foreach_target or by setting up separate TUs with externally supplied ISA-specific compiler flags. The latter is what I currently do in lux, but it requires quite some 'scaffolding' code in cmake.

Comments welcome! I hope you find this useful - I intended to share useful bits here every now and then and it's been a while since the last one (about goading), but better late than never. If you're interested, you can have a peek into zimt's new multi_isa branch, where I have a first working example using the method (see linspace.cc and driver.cc in the examples section). If you don't approve of my intruding into your issue space, let me know.
