NIMD: A Write Once SIMD Library for Nim wrapping libsimdpp.
NIMD supports a large part of libsimdpp; see the libsimdpp docs for details.
NIMD allows you to use SIMD vectorization without the need for specific intrinsics.
Courtesy of libsimdpp:

> On architectures that support different SIMD instruction sets, the library allows the same source code files to be compiled for each SIMD instruction set and then hooked into an internal or third-party dynamic dispatch mechanism. This allows the capabilities of the processor to be queried at runtime and the most efficient implementation to be selected.
```nim
import nimd

var x = Vector(1.float32, 2, 3, 4)
let
  y = Vector(4.float32, 3, 2, 1)
  z = x * y / x # standard ops
x[3] = 9        # insertion
echo z          # float32x4[4.0, 3.0, 2.0, 1.0]
echo x          # float32x4[1.0, 2.0, 3.0, 9.0]
echo y.min(x)   # float32x4[1.0, 2.0, 2.0, 1.0]
```
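The `*` and `/` above are two of the "standard ops"; presumably `+` and `-` are exported the same way. A hedged sketch (not verified against NIMD's full operator set):

```nim
import nimd

let
  x = Vector(1.float32, 2, 3, 4)
  y = Vector(4.float32, 3, 2, 1)

# Assuming `+` and `-` are provided alongside `*` and `/`:
echo x + y  # element-wise addition
echo x - y  # element-wise subtraction
```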
```shell
nimble install https://github.com/CircArgs/nimd.git
```
You will also need libsimdpp (as of this writing, NIMD works with commit f7ab03f). Clone it somewhere:

```shell
git clone https://github.com/p12tic/libsimdpp.git
```

Let's call the absolute path to libsimdpp `MY_LIBSIMDPP_PATH`.
We can build projects derivative of NIMD by using the following Nim compiler flags:

- `cpp`: use the C++ backend
- `-t:"-I$MY_LIBSIMDPP_PATH"`: a compile-time flag telling NIMD where libsimdpp is located

Note: you could also add these to a `config.nims`.

```shell
nim cpp -t:"-I/home/nick/Projects/testnimd/libsimdpp" -r src/testnimd.nim
```
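The note above mentions `config.nims`; a possible NimScript equivalent of the command-line invocation might look like the following (a sketch, not taken from the NIMD repository — substitute your own libsimdpp path):

```nim
# config.nims -- sketch of the compiler flags above in NimScript form.
# Replace the include path with your own MY_LIBSIMDPP_PATH.
switch("backend", "cpp")  # use the C++ backend
switch("t", "-I/home/nick/Projects/testnimd/libsimdpp")  # where libsimdpp lives
```

With this in place, `nim r src/testnimd.nim` should pick up both flags automatically.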
NIMD supports the following vector types:

| base_type | size_type | simd_width | simd_type |
|---|---|---|---|
uint | uint8 | 16 | uint8x16 |
int | int8 | 16 | int8x16 |
uint | uint16 | 8 | uint16x8 |
int | int16 | 8 | int16x8 |
uint | uint32 | 4 | uint32x4 |
int | int32 | 4 | int32x4 |
uint | uint64 | 2 | uint64x2 |
int | int64 | 2 | int64x2 |
float | float32 | 4 | float32x4 |
float | float64 | 2 | float64x2 |
uint | uint8 | 32 | uint8x32 |
int | int8 | 32 | int8x32 |
uint | uint16 | 16 | uint16x16 |
int | int16 | 16 | int16x16 |
uint | uint32 | 8 | uint32x8 |
int | int32 | 8 | int32x8 |
uint | uint64 | 4 | uint64x4 |
int | int64 | 4 | int64x4 |
float | float32 | 8 | float32x8 |
float | float64 | 4 | float64x4 |
uint | uint8 | 64 | uint8x64 |
int | int8 | 64 | int8x64 |
uint | uint16 | 32 | uint16x32 |
int | int16 | 32 | int16x32 |
uint | uint32 | 16 | uint32x16 |
int | int32 | 16 | int32x16 |
uint | uint64 | 8 | uint64x8 |
int | int64 | 8 | int64x8 |
float | float32 | 16 | float32x16 |
float | float64 | 8 | float64x8 |
mask_uint | mask_uint8 | 16 | mask_uint8x16 |
mask_int | mask_int8 | 16 | mask_int8x16 |
mask_uint | mask_uint16 | 8 | mask_uint16x8 |
mask_int | mask_int16 | 8 | mask_int16x8 |
mask_uint | mask_uint32 | 4 | mask_uint32x4 |
mask_int | mask_int32 | 4 | mask_int32x4 |
mask_uint | mask_uint64 | 2 | mask_uint64x2 |
mask_int | mask_int64 | 2 | mask_int64x2 |
mask_float | mask_float32 | 4 | mask_float32x4 |
mask_float | mask_float64 | 2 | mask_float64x2 |
mask_uint | mask_uint8 | 32 | mask_uint8x32 |
mask_int | mask_int8 | 32 | mask_int8x32 |
mask_uint | mask_uint16 | 16 | mask_uint16x16 |
mask_int | mask_int16 | 16 | mask_int16x16 |
mask_uint | mask_uint32 | 8 | mask_uint32x8 |
mask_int | mask_int32 | 8 | mask_int32x8 |
mask_uint | mask_uint64 | 4 | mask_uint64x4 |
mask_int | mask_int64 | 4 | mask_int64x4 |
mask_float | mask_float32 | 8 | mask_float32x8 |
mask_float | mask_float64 | 4 | mask_float64x4 |
mask_uint | mask_uint8 | 64 | mask_uint8x64 |
mask_int | mask_int8 | 64 | mask_int8x64 |
mask_uint | mask_uint16 | 32 | mask_uint16x32 |
mask_int | mask_int16 | 32 | mask_int16x32 |
mask_uint | mask_uint32 | 16 | mask_uint32x16 |
mask_int | mask_int32 | 16 | mask_int32x16 |
mask_uint | mask_uint64 | 8 | mask_uint64x8 |
mask_int | mask_int64 | 8 | mask_int64x8 |
mask_float | mask_float32 | 16 | mask_float32x16 |
mask_float | mask_float64 | 8 | mask_float64x8 |
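Judging from the float32 example above, the element type of the first argument to `Vector` selects the row of this table. A hedged sketch (assuming the constructor accepts integer element types the same way):

```nim
import nimd

# int32 elements, 4 lanes -> int32x4, per the table above.
let a = Vector(1.int32, 2, 3, 4)
let b = Vector(10.int32, 20, 30, 40)

echo a * b  # element-wise multiply, as in the float32x4 example
```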