ANN-benchmarks: avoid using the dataset during search when possible #1657

Merged

Contributor

@achirkin achirkin commented Jul 20, 2023

This PR changes the ANN benchmark's dataset.h to defer reading the data until it is definitely needed. This avoids copying or reading huge datasets in search benchmarks when the database (index) is already built.
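The deferred-read idea can be illustrated with a minimal sketch. This is not the code from dataset.h: the class name `LazyDataset`, the `loader` callback, and `is_loaded()` are hypothetical, and the sketch assumes the row count and dimension are known without reading the payload.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a benchmark dataset; not the actual dataset.h class.
template <typename T>
class LazyDataset {
 public:
  // `loader` is an assumed callback that reads the whole base set from storage.
  LazyDataset(std::size_t rows, std::size_t dim, std::function<std::vector<T>()> loader)
    : rows_(rows), dim_(dim), loader_(std::move(loader))
  {
  }

  // Metadata accessors never touch the payload.
  std::size_t base_set_size() const { return rows_; }
  std::size_t dim() const { return dim_; }

  // The first call triggers the load; later calls reuse the cached buffer.
  const T* base_set() const
  {
    if (!loaded_) {
      data_   = loader_();
      loaded_ = true;
    }
    return data_.data();
  }

  // True once the payload has been read.
  bool is_loaded() const { return loaded_; }

 private:
  std::size_t rows_;
  std::size_t dim_;
  std::function<std::vector<T>()> loader_;
  mutable std::vector<T> data_;   // filled lazily from a const accessor
  mutable bool loaded_ = false;
};
```

With this shape, a search-only benchmark that loads a prebuilt index and never calls `base_set()` does not pay for reading the base vectors, while a build benchmark triggers the read on first access.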

@achirkin achirkin added 3 - Ready for Review improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jul 20, 2023
@achirkin achirkin requested a review from tfeher July 20, 2023 12:25
@achirkin achirkin requested a review from a team as a code owner July 20, 2023 12:25
@github-actions github-actions bot added the cpp label Jul 20, 2023
Contributor

@tfeher tfeher left a comment

Thanks @achirkin for the PR. This is useful for performing benchmarks with already saved indices. LGTM.

 if (!d_base_set_) {
   base_set();
-  RAFT_CUDA_TRY(cudaMalloc((void**)&d_base_set_, base_set_size_ * dim_ * sizeof(T)));
+  RAFT_CUDA_TRY(cudaMalloc((void**)&d_base_set_, base_set_size() * dim() * sizeof(T)));
Contributor

I wonder why we are not using the RMM allocator here? (I know it was not changed in this PR, and fixing it should be a separate task.)

Contributor Author

I think that was a straight file copy from the separate ann benchmarks project.

Member

Yep, this has been on the TODO list, but we rushed this out in 23.04 and haven't gotten back to it.

Member

Relevant issue here: #1367. Item number 2

Member

cjnolet commented Jul 20, 2023

/merge

@rapids-bot rapids-bot bot merged commit 06b3aa0 into rapidsai:branch-23.08 Jul 20, 2023