Make IVF-PQ build index in batches when necessary #1056

Merged

achirkin
Contributor

Before this patch, when the input data was not directly accessible from the device, the build and extend functions mapped it using cudaHostRegister. Although this approach was fast, it could fail when the input data was too large to fit in device memory.
This PR changes the logic of build and extend so that the data is loaded in batches when necessary. Moreover, when the passed pointer refers to a memory-mapped file (e.g. one created with the mmap system call), the input may even be larger than host memory.
The build makes one pass through the input (to sample the training set), and the extend makes at most two passes.
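
To illustrate the batching idea described above, here is a minimal host-only sketch in plain C++. It is not the code from this PR: `for_each_batch` and `batched_sum` are hypothetical names, the `std::vector` staging buffer stands in for a bounded device allocation, and `std::memcpy` stands in for a host-to-device copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not RAFT API): visits `n_rows` rows of `dim` floats in
// chunks of at most `max_batch_rows` rows, so that only one chunk has to be
// resident in the staging buffer at a time. Returns the number of batches.
template <typename Fn>
std::size_t for_each_batch(const float* data, std::size_t n_rows, std::size_t dim,
                           std::size_t max_batch_rows, Fn&& process)
{
  assert(max_batch_rows > 0);
  // Stand-in for a device buffer sized for the largest batch.
  std::vector<float> staging(std::min(n_rows, max_batch_rows) * dim);
  std::size_t n_batches = 0;
  for (std::size_t offset = 0; offset < n_rows; offset += max_batch_rows) {
    // The last batch may be shorter than the others.
    const std::size_t rows = std::min(max_batch_rows, n_rows - offset);
    // Stand-in for a host-to-device copy of this chunk.
    std::memcpy(staging.data(), data + offset * dim, rows * dim * sizeof(float));
    process(staging.data(), offset, rows);
    ++n_batches;
  }
  return n_batches;
}

// Example consumer: one pass over the input that sums every element.
inline double batched_sum(const std::vector<float>& data, std::size_t dim,
                          std::size_t max_batch_rows)
{
  double total = 0.0;
  for_each_batch(data.data(), data.size() / dim, dim, max_batch_rows,
                 [&](const float* batch, std::size_t /*row_offset*/, std::size_t rows) {
                   for (std::size_t i = 0; i < rows * dim; ++i) { total += batch[i]; }
                 });
  return total;
}
```

Because the source pointer is only read one chunk at a time, the same loop would work if `data` pointed into a memory-mapped file larger than host memory, assuming the operating system pages the file in on demand.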

@achirkin achirkin requested a review from a team as a code owner November 30, 2022 16:29
@github-actions github-actions bot added the cpp label Nov 30, 2022
@achirkin achirkin added 3 - Ready for Review improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Nov 30, 2022
@achirkin
Contributor Author

run tests

Contributor

@tfeher tfeher left a comment

Thanks Artem for this PR! It is a larger refactoring of the index building, and it is nice to see the improved modularity of the extend method as a result. See my comments below.

cpp/include/raft/spatial/knn/detail/ivf_pq_build.cuh Outdated Show resolved Hide resolved
@achirkin achirkin requested a review from tfeher December 19, 2022 16:57
Contributor

@tfeher tfeher left a comment

Thanks Artem for the updates! It is great that the functions to store the encoded datasets can be replaced with more concise and faster kernels! The PR looks good to me.

cpp/include/raft/spatial/knn/detail/ivf_pq_build.cuh Outdated Show resolved Hide resolved
@codecov-commenter

Codecov Report

Base: 87.68% // Head: 87.68% // No change to project coverage 👍

Coverage data is based on head (0c69e89) compared to base (96578a1).
Patch has no changes to coverable lines.

Additional details and impacted files
@@              Coverage Diff              @@
##           branch-23.02    #1056   +/-   ##
=============================================
  Coverage         87.68%   87.68%           
=============================================
  Files                20       20           
  Lines               471      471           
=============================================
  Hits                413      413           
  Misses               58       58           

@achirkin achirkin added 2 - In Progress Currenty a work in progress 5 - DO NOT MERGE Hold off on merging; see PR for details and removed 5 - Ready to Merge labels Jan 5, 2023
@achirkin
Contributor Author

achirkin commented Jan 5, 2023

WIP: there's a bug somewhere that makes the recall drop non-deterministically in tests. IvfPq/f32_f32_i64.build_extend_search/17 fails most often, roughly once every 4-8 runs when tested in a loop. I suspect the 'extend' code is missing a CUDA sync somewhere.

@achirkin achirkin added 3 - Ready for Review and removed 5 - DO NOT MERGE Hold off on merging; see PR for details 2 - In Progress Currenty a work in progress labels Jan 6, 2023
@achirkin
Contributor Author

achirkin commented Jan 6, 2023

Update: the root of the problem was that list_offsets() sometimes ended up with incorrect values. I'm not sure exactly why the old way of computing padded offsets was incorrect, but switching to thrust::transform_iterator made the tests pass reliably (verified by running the offending tests in a loop more than 1000 times).
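
For readers unfamiliar with the pattern, the general "transform, then scan" way of computing padded offsets can be sketched on the host in plain C++. This is only an illustration, not the PR's code: `round_up`, `padded_offsets` and the `align` parameter are hypothetical, and it assumes each list is padded up to a multiple of `align` elements. On the device, wrapping the sizes in a thrust::transform_iterator would play the role of the `std::transform` step here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: rounds `size` up to the nearest multiple of `align`.
inline std::uint32_t round_up(std::uint32_t size, std::uint32_t align)
{
  return ((size + align - 1) / align) * align;
}

// Hypothetical helper: given per-list sizes, returns sizes.size() + 1 offsets,
// where offsets[i] is the start of list i in padded storage and the last
// element is the total padded length.
inline std::vector<std::uint32_t> padded_offsets(const std::vector<std::uint32_t>& sizes,
                                                 std::uint32_t align)
{
  assert(align > 0);
  std::vector<std::uint32_t> offsets(sizes.size() + 1, 0);
  // Step 1 (transform): write each padded size one slot to the right,
  // leaving offsets[0] == 0.
  std::transform(sizes.begin(), sizes.end(), offsets.begin() + 1,
                 [align](std::uint32_t s) { return round_up(s, align); });
  // Step 2 (scan): a running sum over the padded sizes yields the offsets.
  std::partial_sum(offsets.begin() + 1, offsets.end(), offsets.begin() + 1);
  return offsets;
}
```

For example, with list sizes 3, 0, 33, 32 and `align` set to 32, the padded sizes are 32, 0, 64, 32 and the resulting offsets are 0, 32, 32, 96, 128.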

@benfred
Member

benfred commented Jan 6, 2023

/merge

@rapids-bot rapids-bot bot merged commit 9944b3a into rapidsai:branch-23.02 Jan 6, 2023
Labels
3 - Ready for Review cpp improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
4 participants