-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEA] Add pre-filtering to CAGRA #1811
Conversation
/ok to test |
/ok to test |
1 similar comment
/ok to test |
/ok to test |
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @enp1s0 for this PR, I had a quick look at it. The changes in the template instantiations make the PR look large, but the relevant changes are more concise.
Overall the PR looks good, but it should be explained how the filtering works. You gave a detailed description in this comment, and it would be good to add part of it as code comments, or at least include it in the PR description.
cpp/include/raft/neighbors/detail/cagra/search_multi_kernel.cuh
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found those 2 changes while combining pre-filtering with bitset. I think that this behavior is not present in multi-kernel
cpp/include/raft/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh
Outdated
Show resolved
Hide resolved
cpp/include/raft/neighbors/detail/cagra/search_multi_cta_kernel-inl.cuh
Outdated
Show resolved
Hide resolved
Co-authored-by: Tamas Bela Feher <[email protected]>
…l-inl.cuh Co-authored-by: Micka <[email protected]>
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @enp1s0 for the update LGTM.
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for getting this feature ready for 23.10, @enp1s0! Others have already done great reviews, but I see just one API thing that I'd like to address before we end up with too many different filtering implementations.
@@ -37,6 +37,18 @@ struct none_ivf_sample_filter { | |||
} | |||
}; | |||
|
|||
/* A filter that filters nothing. This is the default behavior. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please use one sample filtering API across all the ANN indexes? For example, a user should be able to instantiate a filter and use it in IVF-flat, ivf-pq, brute-force, or cagra. Is there anything stopping us from being able to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lowener What do you think about the @cjnolet 's suggestion? This code is taken from your PR. I think we can use none_ivf_sample_filter
as the default filter instead of none_cagra_sample_filter
, but the name is not good because CAGRA is not ivf. I prefer to rename none_ivf_sample_filter
to none_sample_filter
but it changes the API so I'm not sure it is the best way for users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have been communicating to users that sample filtering has been in an experimental state because we haven't had any actual optimized filtering primitives like bitsets, bitmaps, or bloom filters for them to use with the filters. It will be okay to make small adjustments to the APIs at this point, especially if they are consolidating allowing for more reuse.
/merge |
Closes #1600. To be merged after #1803 and #1811 This PR adds `bitset_filter` to filter an index with a bitset. Authors: - Micka (https://github.com/lowener) - tsuki (https://github.com/enp1s0) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet)
This PR adds the pre-filtering feature to the CAGRA search implementations.
Rel: taken over from #1765
Algorithm
The pre-filtering algorithm removes a node that should not be in the final result after it has behaved as a parent node. This way, the nodes that should not be in the final result are also used in the graph traversal, avoiding potential performance degradation.
Changes