Make IVF-PQ build index in batches when necessary #1056

achirkin · 2022-11-30T16:29:17Z

Before this patch, when the input data was not directly accessible from the device, the build and extend functions mapped it using cudaHostRegister. Although this approach was rather fast, it could fail when the input data was too large to fit in the device memory.
This PR changes the logic of build and extend so that the data is loaded in batches when necessary. Moreover, when the passed pointer represents a mapped file (e.g. one created with the mmap system call), the size of the input may even be larger than the host memory.
The build does one pass through the input (to sample the training set), and the extend does at most two passes.
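
As an illustration of the batching idea only (not the code in this PR), the following host-only C++ sketch walks a row-major dataset in bounded chunks. The name `for_each_batch`, its parameters, and the staging vector are hypothetical; the vector stands in for a fixed-size device buffer, and in a CUDA build the copy would be a host-to-device transfer instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (an assumption, not part of RAFT): visits `n_rows` rows
// of `dim` floats in chunks of at most `max_batch_rows` rows. Each chunk is
// copied into a reusable staging buffer whose size is bounded by the batch
// size, so the whole input never has to be resident at once.
template <typename F>
std::size_t for_each_batch(const float* data,
                           std::size_t n_rows,
                           std::size_t dim,
                           std::size_t max_batch_rows,
                           F&& process)
{
  assert(data != nullptr || n_rows == 0);
  if (n_rows == 0 || max_batch_rows == 0) { return 0; }

  // Stand-in for a device allocation sized for one batch.
  std::vector<float> staging(std::min(n_rows, max_batch_rows) * dim);

  std::size_t n_batches = 0;
  for (std::size_t offset = 0; offset < n_rows; offset += max_batch_rows) {
    const std::size_t rows = std::min(max_batch_rows, n_rows - offset);
    // Copy only the current batch; `data` may point into a memory-mapped file.
    std::copy(data + offset * dim, data + (offset + rows) * dim, staging.begin());
    // The callback receives the batch, its first row index, and its row count.
    process(staging.data(), offset, rows);
    ++n_batches;
  }
  return n_batches;
}
```

A caller would pass a lambda that consumes each batch, for example one that assigns the rows of the batch to clusters. The peak size of the staging buffer depends on `max_batch_rows` rather than on `n_rows`, which is the property the description above relies on.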

achirkin · 2022-12-12T11:56:42Z

run tests

tfeher

Thanks Artem for this PR! It is a larger refactoring of the index building, and it is nice to see the improved modularity of the extend method as a result. See my comments below.

cpp/include/raft/spatial/knn/detail/ivf_pq_build.cuh

cpp/include/raft/spatial/knn/detail/ann_utils.cuh

cpp/include/raft/spatial/knn/detail/ivf_pq_build.cuh

tfeher

Thanks Artem for the updates! It is great that the functions to store the encoded datasets can be replaced with more concise and faster kernels! The PR looks good to me.

cpp/include/raft/spatial/knn/detail/ivf_pq_build.cuh

codecov-commenter · 2023-01-04T17:40:29Z

Codecov Report

Base: 87.68% // Head: 87.68% // No change to project coverage 👍

Coverage data is based on head (0c69e89) compared to base (96578a1).
Patch has no changes to coverable lines.

Additional details and impacted files

@@              Coverage Diff              @@
##           branch-23.02    #1056   +/-   ##
=============================================
  Coverage         87.68%   87.68%           
=============================================
  Files                20       20           
  Lines               471      471           
=============================================
  Hits                413      413           
  Misses               58       58

achirkin · 2023-01-05T15:42:08Z

WIP: there's a bug somewhere that makes the recall drop non-deterministically in tests. IvfPq/f32_f32_i64.build_extend_search/17 fails most often, roughly once every 4-8 runs when tested in a loop. I suspect the 'extend' code is missing a CUDA sync somewhere.

achirkin · 2023-01-06T16:05:32Z

Update: the root of the problem was that list_offsets() sometimes ended up with incorrect values. I'm not sure exactly why the old way of computing padded offsets was incorrect, but switching to thrust::transform_iterator made the tests pass reliably (verified by running the offending tests in a loop more than 1000 times).
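
For readers unfamiliar with the term, here is a minimal host-only sketch of what a padded cumulative sum computes. It is not the code from this PR: `round_up`, `padded_offsets`, and the alignment value in the usage note are hypothetical, and a plain loop stands in for the scan.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Round `x` up to the next multiple of `align` (requires align > 0).
inline std::uint64_t round_up(std::uint64_t x, std::uint64_t align)
{
  return (x + align - 1) / align * align;
}

// Hypothetical host-side analogue (an assumption, not the RAFT code):
// offsets[0] is 0 and offsets[i + 1] is the sum of the first i + 1 list
// sizes, each padded to a multiple of `align` before it is accumulated.
inline std::vector<std::uint64_t> padded_offsets(const std::vector<std::uint32_t>& sizes,
                                                 std::uint64_t align)
{
  assert(align > 0);
  std::vector<std::uint64_t> offsets(sizes.size() + 1, 0);
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    // The padding is applied per element, then the running sum is taken.
    offsets[i + 1] = offsets[i] + round_up(sizes[i], align);
  }
  return offsets;
}
```

With an assumed alignment of 32, sizes `{5, 32, 0, 33}` give offsets `{0, 32, 64, 64, 128}`. On the GPU the same result can be expressed as an inclusive scan over a transform iterator that applies the rounding, which is the shape of the fix described above.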

benfred · 2023-01-06T22:14:03Z

/merge

Make ivf-pq build index in batches when necessary

4d4512f

achirkin requested a review from a team as a code owner November 30, 2022 16:29

github-actions bot added the cpp label Nov 30, 2022

achirkin added the 3 - Ready for Review, improvement, and non-breaking labels Nov 30, 2022

cjnolet assigned achirkin Dec 6, 2022

achirkin added 8 commits December 15, 2022 09:18

Merge branch 'branch-23.02' into enh-ivf-pq-batched-building

b42b848

Adjust the logic for choosing the batch sizes and memory type to avoid out-of-memory exceptions at the cost of longer build times

6c5d024

Merge remote-tracking branch 'rapidsai/branch-23.02' into enh-ivf-pq-batched-building

e977f55

Fix integer comparison with different signedness

9645b11

Merge remote-tracking branch 'rapidsai/branch-23.02' into enh-ivf-pq-batched-building

3e8cc03

Use raft operators in place of thrust

7155eef

Explain the in-place transform of the trainset in comments

66e58db

Add docs to process_and_fill_codes

b5491f6

tfeher requested changes Dec 17, 2022

achirkin added 5 commits December 19, 2022 16:54

extend-fill codes and indices in a single kernel, exec time scaling linearly with the batch size

232b772

Update documentation

1c3800c

Update documentation

9a6c6f7

Wrap the allocation error with the raft logic error in case if the index size is too big

63a6caa

Merge remote-tracking branch 'rapidsai/branch-23.02' into enh-ivf-pq-batched-building

e28ca3d

achirkin requested a review from tfeher December 19, 2022 16:57

tfeher approved these changes Dec 19, 2022

cpp/include/raft/spatial/knn/detail/ivf_pq_build.cuh (outdated)

achirkin added 2 commits December 20, 2022 07:41

Avoid narrowing conversion error for long->size_t in the error reporting

81250e8

Split out flat_compute_residuals code from process_and_fill_codes

45920ec

achirkin added 5 - Ready to Merge and removed 3 - Ready for Review labels Dec 20, 2022

cjnolet mentioned this pull request Jan 3, 2023

New helper: using_mapped_memory_t #1046

Closed

Merge branch 'branch-23.02' into enh-ivf-pq-batched-building

0c69e89

achirkin added the 2 - In Progress and 5 - DO NOT MERGE labels and removed the 5 - Ready to Merge label Jan 5, 2023

Fix somehow incorrect inclusive scan / padded cumulative sum

c66cc02

achirkin added the 3 - Ready for Review label and removed the 5 - DO NOT MERGE and 2 - In Progress labels Jan 6, 2023

Merge remote-tracking branch 'rapidsai/branch-23.02' into enh-ivf-pq-batched-building

7fe84a9

rapids-bot bot merged commit 9944b3a into rapidsai:branch-23.02 Jan 6, 2023
