Have you considered using the core::simd module for SIMD? I went through the effort of porting the decode_two_unsafe function, and it seems to have the same performance for me.
Here's a godbolt link with a simplified implementation of it using both the core::simd and core::arch::x86_64 modules. The core::simd implementation actually has no unsafe code besides the transmute, although it does expect a [u8; 16] as input. It also compiles on other platforms, since the core::simd module is meant to be portable. I haven't tested it on anything other than x86_64, but it's supposed to act exactly the same.
I did port the function in the actual library so I could benchmark it, but the code is really messy, so I only sent the godbolt link for now. I'll probably put it in a branch of my fork, but again, it's really messy and just thrown together.
I did consider using std::simd but it isn’t stable yet. It would be nice to get support for other architectures for “free” but at this scale every CPU cycle counts, so it would probably still be best to use architecture-specific intrinsics.
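To make the trade-off concrete, here is a minimal sketch of the pattern that works on stable Rust today: an architecture-specific intrinsic behind a cfg gate, with a scalar fallback for every other target. It is not code from the library or from the godbolt link. The function name continuation_mask and the choice of operation (collecting the top bit of each of 16 bytes) are assumptions picked only for illustration.

```rust
/// Hypothetical helper: returns a 16-bit mask whose bit `i` is the
/// most significant bit of `bytes[i]`.
#[cfg(target_arch = "x86_64")]
fn continuation_mask(bytes: [u8; 16]) -> u16 {
    use core::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_movemask_epi8};

    // SAFETY: SSE2 is part of the x86_64 baseline, and the unaligned
    // load reads exactly the 16 bytes owned by `bytes`.
    unsafe {
        let v = _mm_loadu_si128(bytes.as_ptr() as *const __m128i);
        // movemask packs the top bit of each byte lane into the low 16 bits.
        _mm_movemask_epi8(v) as u16
    }
}

/// Scalar fallback with the same result for non-x86_64 targets.
#[cfg(not(target_arch = "x86_64"))]
fn continuation_mask(bytes: [u8; 16]) -> u16 {
    let mut mask = 0u16;
    for (i, b) in bytes.iter().enumerate() {
        mask |= ((*b >> 7) as u16) << i;
    }
    mask
}
```

With core::simd the two cfg branches would collapse into one portable function, but that currently needs a nightly compiler, which is the stability concern raised above.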