Add warp-aggregated atomic increment #735
Moving this functionality from cuml to raft, as discussed in rapidsai/cuml#4803.
4a76a73 to a001020
Looks good. Please add a unit test.
Faster atomic counter increment using warp-aggregated atomics. Useful for filtering. Adapted from: https://developer.nvidia.com/blog/cuda-pro-tip-optimized-filtering-warp-aggregated-atomics/
a001020 to 04ea9a7
I have added a unit test.
Thanks @ahendriksen for adding the test! The PR looks good to me.
LGTM. Just a tiny question / nitpick in the test. Thanks @ahendriksen!
out_device.data());

// Copy data to host
RAFT_CUDA_TRY(cudaMemcpy(out_host.data(),
Can we use raft::copy here?
@gpucibot merge
Implement fast atomic counter increment using warp-aggregated atomics. Useful
for filtering.
Adapted from:
https://developer.nvidia.com/blog/cuda-pro-tip-optimized-filtering-warp-aggregated-atomics/
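
For readers unfamiliar with the technique, here is a minimal sketch of a warp-aggregated increment used for filtering, in the spirit of the linked blog post. It is not the code from this PR: the names warp_aggregated_inc, filter_positive and filter_positive_host are hypothetical, the counter is a plain int, and CUDA error checking is omitted for brevity. The idea is that the threads of a warp that are active at the call elect one leader, the leader performs a single atomicAdd for the whole group, and each thread then derives its own slot from the leader's result plus its rank in the group.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

// Hypothetical helper (not RAFT's API): reserves one slot per calling thread
// while issuing only one atomicAdd per group of currently active threads.
__device__ int warp_aggregated_inc(int* counter)
{
  // Threads of this warp that reached this call together.
  cg::coalesced_group active = cg::coalesced_threads();
  int base = 0;
  if (active.thread_rank() == 0) {
    // The leader reserves a contiguous range for the whole group.
    base = atomicAdd(counter, static_cast<int>(active.size()));
  }
  // Broadcast the leader's base offset, then add this thread's rank.
  return active.shfl(base, 0) + static_cast<int>(active.thread_rank());
}

// Keeps the strictly positive elements of `in`; output order is unspecified.
__global__ void filter_positive(const int* in, int n, int* out, int* counter)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n && in[i] > 0) { out[warp_aggregated_inc(counter)] = in[i]; }
}

// Host wrapper for illustration only; error checking is omitted.
std::vector<int> filter_positive_host(const std::vector<int>& in)
{
  int n = static_cast<int>(in.size());
  if (n == 0) { return {}; }

  int* d_in      = nullptr;
  int* d_out     = nullptr;
  int* d_counter = nullptr;
  cudaMalloc(&d_in, n * sizeof(int));
  cudaMalloc(&d_out, n * sizeof(int));
  cudaMalloc(&d_counter, sizeof(int));
  cudaMemcpy(d_in, in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
  cudaMemset(d_counter, 0, sizeof(int));

  int threads = 256;
  int blocks  = (n + threads - 1) / threads;
  filter_positive<<<blocks, threads>>>(d_in, n, d_out, d_counter);

  // A blocking cudaMemcpy on the default stream waits for the kernel.
  int count = 0;
  cudaMemcpy(&count, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
  std::vector<int> out(count);
  cudaMemcpy(out.data(), d_out, count * sizeof(int), cudaMemcpyDeviceToHost);

  cudaFree(d_in);
  cudaFree(d_out);
  cudaFree(d_counter);
  return out;
}
```

Calling filter_positive_host on {3, -1, 0, 7, -5, 2} should return the three positive values in some order. The ordering is not deterministic because groups from different warps race for ranges of the counter; only the set of kept elements and the final count are well defined.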