Rename kernel arch finding function for dispatch #1536

Merged
merged 82 commits into from
May 19, 2023

Conversation

mdoijade
Contributor

-- The kernel arch reported by cudaFuncAttributes::ptxVersion depends on which archs the kernel was compiled for, not only on the device it runs on, so we should rename kernel_runtime_arch() to kernel_virtual_arch().
-- Update the comments accordingly to reflect this.
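
To illustrate why "virtual" is the more accurate name, here is a minimal host-only sketch. It is not the code from this PR: `model_kernel_virtual_arch` is a hypothetical helper, and the selection rule it encodes (the highest compiled arch that does not exceed the device arch) is a simplifying assumption about how the reported value behaves. In real code the value would come from calling `cudaFuncGetAttributes` and reading `ptxVersion`, which is encoded as major * 10 + minor.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified model (an assumption, not the library's implementation):
// among the architectures a kernel was compiled for, pick the highest one that
// does not exceed the device's architecture.
// Archs use the major * 10 + minor encoding, e.g. 80 for compute_80.
// Returns 0 when none of the compiled architectures is usable on the device.
inline int model_kernel_virtual_arch(const std::vector<int>& compiled_archs, int device_arch)
{
  int best = 0;
  for (int arch : compiled_archs) {
    if (arch <= device_arch) { best = std::max(best, arch); }
  }
  return best;
}
```

Under this model, a kernel compiled for 70 and 80 and launched on an 86 device reports 80 rather than 86, so the value describes the compilation target that was picked, not the device the kernel runs on.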

mdoijade and others added 30 commits November 18, 2022 20:40
…ck/warp shape size, this now touches the perf of fusedL2NN simt kernel
… from per row to multi-rows. now this kernel is 1.3x to 1.8x faster as k value increases perf gets better than fusedL2NN simt kernel
… than per warp multi row lock, cleanup and doc update
… tile iterator, this both improves perf by 20%+ compared to previous version by reducing gmem atomics+coalesced stores
mdoijade and others added 22 commits May 5, 2023 02:40
…election, add comments on register spills tile shape, add test case for veclen=2
@mdoijade mdoijade requested a review from a team as a code owner May 19, 2023 12:13
@github-actions github-actions bot added the cpp label May 19, 2023
@mdoijade mdoijade added doc Documentation non-breaking Non-breaking change labels May 19, 2023
Member

@cjnolet cjnolet left a comment

LGTM!

@cjnolet
Member

cjnolet commented May 19, 2023

/merge

@rapids-bot rapids-bot bot merged commit 0154e8e into rapidsai:branch-23.06 May 19, 2023
Labels
cpp doc Documentation non-breaking Non-breaking change
3 participants