Speedup make_blobs by up to 2x by fixing inefficient kernel launch configuration #1100

Merged: 3 commits, Dec 14, 2022

Nyrio (Contributor) commented Dec 14, 2022

The kernel generates two elements per iteration and attempts to write the second element at an offset equal to the grid stride. However, the grid stride is currently computed to be greater than the length of the generated array, so the second write always falls out of bounds and that value is never used. By using a grid stride of half the array size, both values are stored, which speeds up the kernel by nearly 2x in some cases (see perf charts in the PR comments).

Note: this effectively modifies many test inputs, so keep that in mind when comparing results from before and after the change.
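To illustrate the indexing issue described above, here is a minimal host-side model in Python. It is a sketch under stated assumptions, not the actual kernel: the function name `simulate`, the example length of 1000, and the "before" stride of 1024 are hypothetical, and the real launch configuration may compute its stride differently. The model only assumes what the description states: each thread produces two values per iteration and stores the second one `stride` elements after the first.

```python
def simulate(length, stride):
    """Model `stride` threads that each generate a pair of values per
    iteration and store them at idx and idx + stride (assumed layout)."""
    filled = [False] * length
    generated = 0
    # One pass of all threads covers up to 2 * stride output slots.
    for base in range(0, length, 2 * stride):
        for tid in range(stride):
            first = base + tid
            if first >= length:
                break  # this thread and all later ones are out of range
            generated += 2  # both values are computed regardless
            filled[first] = True
            second = first + stride
            if second < length:
                filled[second] = True  # otherwise the second value is dropped
    return generated, sum(filled)


length = 1000
# Hypothetical "before": stride rounded up past the array length.
print(simulate(length, 1024))
# "After": stride of half the array size (rounded up).
print(simulate(length, (length + 1) // 2))
```

In this model, the oversized stride generates 2000 values to fill 1000 slots, while the halved stride generates 1000 values for the same 1000 slots, which is consistent with the "nearly 2x" figure when value generation dominates the runtime.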

@Nyrio Nyrio requested a review from a team as a code owner December 14, 2022 13:23
@github-actions github-actions bot added the cpp label Dec 14, 2022
Nyrio (Contributor, Author) commented Dec 14, 2022

This is a before/after benchmark:

[Image: 2022-12-14_make_blobs]

@Nyrio Nyrio added the "3 - Ready for Review", "improvement" (Improvement / enhancement to an existing function), and "non-breaking" (Non-breaking change) labels Dec 14, 2022
cjnolet (Member) commented Dec 14, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 51c45b0 into rapidsai:branch-23.02 Dec 14, 2022