Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve parallelism of refine host #2059

Merged

Conversation

anaruse
Copy link
Contributor

@anaruse anaruse commented Dec 13, 2023

This PR addresses #2058 by changing the thread parallelism method.

In the first half of the refine process, the distance calculation is performed on all candidate vectors, i.e., the number of queries * the original top-k vectors. Since the distance calculations for each vector can be performed independently, this part is thread-parallelized assuming that maximum parallelism is the number of queries * original top-k. This means that even if the number of queries is 1, this part can be executed in thread parallel.

On the other hand, the second half of the refine process, the so-called top-k calculation, can be performed independently for each query, but it is difficult to thread parallelize the calculation for a given query, Therefore, this part is parallelized assuming the maximum parallelism is the number of queries, as in the current implementation.

@anaruse anaruse requested a review from a team as a code owner December 13, 2023 07:28
Copy link

copy-pr-bot bot commented Dec 13, 2023

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the cpp label Dec 13, 2023
@achirkin
Copy link
Contributor

/ok to test

@achirkin achirkin added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Dec 13, 2023
@tfeher
Copy link
Contributor

tfeher commented Dec 13, 2023

/ok to test

@achirkin
Copy link
Contributor

/ok to test

Copy link
Contributor

@achirkin achirkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, apart from a small nitpick, LGTM.

Comment on lines 98 to 99
if (size_t(suggested_n_threads) > n_queries) { suggested_n_threads = n_queries; }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: the check is not redundant thank to the new code

Suggested change
if (size_t(suggested_n_threads) > n_queries) { suggested_n_threads = n_queries; }

@cjnolet
Copy link
Member

cjnolet commented Dec 15, 2023

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Dec 15, 2023

/merge

@cjnolet
Copy link
Member

cjnolet commented Dec 19, 2023

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Jan 3, 2024

/ok to test

@wphicks
Copy link
Contributor

wphicks commented Jan 4, 2024

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Jan 6, 2024

@anaruse it looks like there's a minor compile error in this PR, but otherwise the changes are ready to be merged.

@cjnolet
Copy link
Member

cjnolet commented Jan 7, 2024

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Jan 9, 2024

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Jan 9, 2024

/ok to test

@rapids-bot rapids-bot bot merged commit 3b88d17 into rapidsai:branch-24.02 Jan 9, 2024
61 checks passed
pentschev pushed a commit to pentschev/raft that referenced this pull request Jan 9, 2024
This PR addresses rapidsai#2058 by changing the thread parallelism method.

In the first half of the `refine` process, the distance calculation is performed on all candidate vectors, i.e., the number of queries * the original top-k vectors. Since the distance calculations for each vector can be performed independently, this part is thread-parallelized assuming that maximum parallelism is the number of queries * original top-k. This means that even if the number of queries is 1, this part can be executed in thread parallel.

On the other hand, the second half of the `refine` process, the so-called top-k calculation, can be performed independently for each query, but it is difficult to thread parallelize the calculation for a given query, Therefore, this part is parallelized assuming the maximum parallelism is the number of queries, as in the current implementation.

Authors:
  - Akira Naruse (https://github.com/anaruse)
  - Corey J. Nolet (https://github.com/cjnolet)
  - William Hicks (https://github.com/wphicks)

Approvers:
  - Artem M. Chirkin (https://github.com/achirkin)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#2059
ChristinaZ pushed a commit to ChristinaZ/raft that referenced this pull request Jan 17, 2024
This PR addresses rapidsai#2058 by changing the thread parallelism method.

In the first half of the `refine` process, the distance calculation is performed on all candidate vectors, i.e., the number of queries * the original top-k vectors. Since the distance calculations for each vector can be performed independently, this part is thread-parallelized assuming that maximum parallelism is the number of queries * original top-k. This means that even if the number of queries is 1, this part can be executed in thread parallel.

On the other hand, the second half of the `refine` process, the so-called top-k calculation, can be performed independently for each query, but it is difficult to thread parallelize the calculation for a given query, Therefore, this part is parallelized assuming the maximum parallelism is the number of queries, as in the current implementation.

Authors:
  - Akira Naruse (https://github.com/anaruse)
  - Corey J. Nolet (https://github.com/cjnolet)
  - William Hicks (https://github.com/wphicks)

Approvers:
  - Artem M. Chirkin (https://github.com/achirkin)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#2059
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cpp improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
Development

Successfully merging this pull request may close these issues.

5 participants