-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEA] Support vector deletion in ANN IVF #1831
Conversation
@lowener it looks like these changes are making great progress and should be wrapped up soon. I think it's time to start proactively gathering benchmarks for the CAGRA deletion all the way through. Please focus exclusively on CAGRA for 23.10 since that's what we've promised. We can add deletion to the others in 23.12 (or 23.10 if there's time after CAGRA). Benchmarks will be super important. I would make sure to benchmark various different cases. Maybe stepping up in powers of 2 would be helpful? (2%, 4%, 8% 16% 32% 64%). |
9661814
to
6c6008e
Compare
6c6008e
to
81dad56
Compare
Benchmark result on my local machine. IVF-Flat:
IVF-PQ:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one question in hopes of making the filtering API consistent across index types, but otherwise this looks very good. The minimal overhead in the benchmark results is very impressive.
* @tparam filter_t | ||
*/ | ||
template <typename index_t, typename filter_t> | ||
struct ivf_to_sample_filter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to provide a unified filtering interface, would it make sense to wrap any filter that is passed to search_with_filtering
for IVF methods in this struct? I have trouble imagining a scenario in which the outer search would really want to make use of a filter that takes into account the in-cluster ID rather than the overall ID, and that would establish a rule that the filter's signature is just (query_index, sample_index) -> bool
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexanderguzhva is the original creator of this filtering for IVF indexes. Do you have any examples or use-cases of filtering applied to a cluster/ids in the cluster?
If we can't find a use-case then I think it make sense to wrap every filter with ivf_to_sample_filter
for a consistent filtering API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lowener Yes. https://github.com/rapidsai/raft/blob/branch-23.08/cpp/include/raft/neighbors/sample_filter_types.hpp contains commented out samples, which resemble the way it was originally used in Meta. If I remember correctly, there was some additional semantic information, associated with every sample in every cluster, and a cluster id was necessary to be known.
Technically, it is trivial to write a template wrapper that accepts two types of delegates, both (query_index, sample_index) -> bool
and (query_index, cluster_index, sample_index) -> bool
and put one inside ivfflat
/ ivfpq
code. This is how I would do it.
/merge |
PR based on the new Bitset feature (rapidsai#1803) to support vector deletion in ANN. Closes rapidsai#1177. Closes rapidsai#1620. This PR adds `ivf_to_sample_filter` that acts as an intermediate filter to use an IVF index with a bitset filter. Authors: - Micka (https://github.com/lowener) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - William Hicks (https://github.com/wphicks) URL: rapidsai#1831
PR based on the new Bitset feature (#1803) to support vector deletion in ANN.
Closes #1177.
Closes #1620.
This PR adds
ivf_to_sample_filter
that acts as an intermediate filter to use an IVF index with a bitset filter.