Use core::simd? #6

Open
Inconn opened this issue Jan 2, 2024 · 3 comments

Comments

@Inconn

Inconn commented Jan 2, 2024

Have you considered using the core::simd module for SIMD? I went through the effort of porting the decode_two_unsafe function, and it seems to have the same performance for me.

Here's a godbolt link with a simplified implementation of it using both the core::simd and core::arch::x86_64 modules. The core::simd implementation actually has no unsafe code besides the transmute, although it does expect a [u8; 16] as input. It also compiles on other platforms, since the core::simd module is meant to be portable. I haven't tested it on anything other than x86_64, but it's supposed to act exactly the same.

I did port the function in the actual library so I could benchmark it, but the code is really messy, so I've only sent the godbolt link for now. I'll probably put it in a branch of my fork, but again, it's really messy and just thrown together.

@Inconn
Author

Inconn commented Jan 2, 2024

@as-com
Owner

as-com commented Jan 3, 2024

I did consider using std::simd but it isn’t stable yet. It would be nice to get support for other architectures for “free” but at this scale every CPU cycle counts, so it would probably still be best to use architecture-specific intrinsics.

@andrewgazelka

I'd argue it would be nice to at least be able to compile on aarch64. My workflow is often to develop on my MacBook Pro and deploy to a Linux server.
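One way to keep the architecture-specific intrinsics and still compile on aarch64 is to gate the intrinsic path with cfg and fall back to scalar code elsewhere. The sketch below works on stable Rust. It is not taken from this library: high_bit_mask and high_bit_mask_scalar are hypothetical names, and the operation (collecting the top bit of each of 16 bytes into a mask) is only a stand-in for whatever a real decode routine would do.

```rust
/// Hypothetical portable fallback: bit i of the result is set when
/// byte i of the input has its most significant bit set.
pub fn high_bit_mask_scalar(bytes: &[u8; 16]) -> u16 {
    let mut mask = 0u16;
    for (i, b) in bytes.iter().enumerate() {
        mask |= ((*b >> 7) as u16) << i;
    }
    mask
}

/// x86_64 path: uses SSE2 intrinsics from core::arch.
#[cfg(target_arch = "x86_64")]
pub fn high_bit_mask(bytes: &[u8; 16]) -> u16 {
    use core::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_movemask_epi8};
    // SAFETY: SSE2 is part of the x86_64 baseline, the reference guarantees
    // 16 readable bytes, and _mm_loadu_si128 accepts unaligned pointers.
    unsafe {
        let v = _mm_loadu_si128(bytes.as_ptr() as *const __m128i);
        _mm_movemask_epi8(v) as u16
    }
}

/// Every other architecture (aarch64 included) compiles the scalar version.
#[cfg(not(target_arch = "x86_64"))]
pub fn high_bit_mask(bytes: &[u8; 16]) -> u16 {
    high_bit_mask_scalar(bytes)
}
```

Callers only ever see high_bit_mask, so the x86_64 build keeps the hand-written intrinsics while other targets still build. A NEON-specific path could be added later behind its own cfg if the scalar fallback turns out to be too slow.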
