
Increase parallelism in allgatherv #525

Merged
merged 3 commits into rapidsai:branch-22.04 on Feb 23, 2022

Conversation

seunghwak
Contributor

allgatherv is implemented using multiple NCCL broadcast operations.

Previously, RAFT performed these broadcast operations sequentially, creating a hot spot around the root node in each broadcast operation.

This PR places the multiple broadcast operations between ncclGroupStart and ncclGroupEnd, increasing parallelism and stressing the communication interconnect more evenly.
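
For illustration, here is a minimal sketch of the idea, not RAFT's actual code: building allgatherv from one ncclBroadcast per rank and wrapping the loop in an NCCL group so the broadcasts can proceed concurrently. The function and parameter names (recvcounts, displs, dtype_size, etc.) are assumptions for the sketch.

```cpp
#include <cuda_runtime.h>
#include <nccl.h>

// Sketch: allgatherv built from one ncclBroadcast per rank. Wrapping the loop
// in ncclGroupStart/ncclGroupEnd lets NCCL launch the broadcasts together
// instead of serializing them around each root. Error checking omitted.
void allgatherv_grouped(const void* sendbuf,
                        void* recvbuf,
                        const size_t* recvcounts,  // elements contributed by each rank
                        const size_t* displs,      // element offset of each rank's segment in recvbuf
                        ncclDataType_t dtype,
                        size_t dtype_size,         // size of one element in bytes
                        int num_ranks,
                        ncclComm_t comm,
                        cudaStream_t stream)
{
  ncclGroupStart();
  for (int root = 0; root < num_ranks; ++root) {
    // Every rank receives rank `root`'s segment into the matching offset of
    // recvbuf; the send buffer is only read on the root itself.
    ncclBroadcast(sendbuf,
                  static_cast<char*>(recvbuf) + displs[root] * dtype_size,
                  recvcounts[root],
                  dtype,
                  root,
                  comm,
                  stream);
  }
  ncclGroupEnd();  // all broadcasts are issued here as one fused group
}
```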

@seunghwak seunghwak requested a review from a team as a code owner February 22, 2022 22:52
@seunghwak seunghwak self-assigned this Feb 22, 2022
@github-actions github-actions bot added the cpp label Feb 22, 2022
@seunghwak seunghwak added the 3 - Ready for Review, improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) labels and removed the cpp label Feb 22, 2022
Member

@cjnolet cjnolet left a comment


LGTM

Member

@trivialfis trivialfis left a comment


I think there's a limit on the number of calls inside a group; quoting the NCCL documentation:

Also note, that there is a maximum of 2048 NCCL operations that can be inserted between the ncclGroupStart and ncclGroupEnd calls.

@seunghwak
Contributor Author

I think there's a limit on the number of calls inside a group; quoting the NCCL documentation:

Also note, that there is a maximum of 2048 NCCL operations that can be inserted between the ncclGroupStart and ncclGroupEnd calls.

Thanks, yeah... I don't think cuGraph will ever hit this limit, as we run allgatherv on a sub-communicator (unless we work on multi-million GPUs), but this can definitely happen in other use cases working on the global communicator.

I will make an update to accommodate this.
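
For reference, one way to accommodate the limit is to issue the broadcasts in batches so that no single group exceeds 2048 operations. This is a sketch only, not the PR's actual change; it reuses the hypothetical names from the sketch above, and the batching constant simply mirrors the documented limit.

```cpp
#include <algorithm>
#include <cuda_runtime.h>
#include <nccl.h>

// Sketch: same allgatherv as above, but the per-rank broadcasts are issued in
// batches so no single ncclGroupStart/ncclGroupEnd pair exceeds NCCL's
// documented limit of 2048 operations per group. Error checking omitted.
void allgatherv_grouped_batched(const void* sendbuf,
                                void* recvbuf,
                                const size_t* recvcounts,
                                const size_t* displs,
                                ncclDataType_t dtype,
                                size_t dtype_size,
                                int num_ranks,
                                ncclComm_t comm,
                                cudaStream_t stream)
{
  constexpr int kMaxOpsPerGroup = 2048;  // documented NCCL per-group limit
  for (int first = 0; first < num_ranks; first += kMaxOpsPerGroup) {
    int last = std::min(first + kMaxOpsPerGroup, num_ranks);
    ncclGroupStart();
    for (int root = first; root < last; ++root) {
      ncclBroadcast(sendbuf,
                    static_cast<char*>(recvbuf) + displs[root] * dtype_size,
                    recvcounts[root],
                    dtype,
                    root,
                    comm,
                    stream);
    }
    ncclGroupEnd();  // flush this batch before opening the next group
  }
}
```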

@seunghwak
Contributor Author

@seunghwak are you ready for this to be merged?

No; as @trivialfis said, this code can fail if someone runs allgatherv with more than 2048 GPUs. I guess no one will do this in the short term, but it's better to address this now to be future-proof.

@trivialfis
Member

Maybe just a check, since this will not happen in the foreseeable future. ;-)
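
A check alone could be a hypothetical guard placed at the top of the grouped allgatherv sketched earlier (again, an assumption, not the PR's code):

```cpp
#include <stdexcept>

// Hypothetical guard: refuse to build a single NCCL group that would exceed
// the documented 2048-operation limit, rather than splitting into batches.
if (num_ranks > 2048) {
  throw std::runtime_error("allgatherv: number of ranks exceeds the 2048-op NCCL group limit");
}
```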

@github-actions github-actions bot added the cpp label Feb 23, 2022
@seunghwak
Contributor Author

OK, done; now I believe this PR is ready to be merged.

@cjnolet
Member

cjnolet commented Feb 23, 2022

@gpucibot merge

@cjnolet
Member

cjnolet commented Feb 23, 2022

rerun tests

@rapids-bot rapids-bot bot merged commit 5de31e0 into rapidsai:branch-22.04 Feb 23, 2022
@cjnolet
Member

cjnolet commented Feb 23, 2022

@gpucibot merge

Labels
3 - Ready for Review, cpp, improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)