-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use rapids-cmake new parallel testing feature #1623
Use rapids-cmake new parallel testing feature #1623
Conversation
cpp/test/util/device_atomics.cu
Outdated
// Copy data to host | ||
RAFT_CUDA_TRY(cudaStreamSynchronize(s)); //ensure stream is done before copy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use a cudaMemcpyAsync below instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we can, and done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use raft::copy and raft::update_host instead of calling cuda directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no existing raft calls ( or handles ) in this test. The original version was calling cudaMemcpy
on memory being written to by a kernel on a different stream.
Corrects multiple test issues uncovered by running tests in parallel. Splits the matrix and neighbor tests into multiple executables to improve throughput Previous performance on 1 GPU: 830sec New performance on 1 GPU: 692sec New performance on 2 GPU: 418sec
29e5659
to
aa0b622
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
/merge |
Corrects multiple test issues uncovered by running tests in parallel. Splits the matrix and neighbor tests into multiple executables to improve throughput
Previous performance on 1 GPU: 830sec
New performance on 1 GPU: 692sec
New performance on 2 GPU: 418sec