Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Add pre-filtering to CAGRA #1811

Merged
merged 44 commits into from
Sep 25, 2023

Conversation

enp1s0
Copy link
Member

@enp1s0 enp1s0 commented Sep 8, 2023

This PR adds the pre-filtering feature to the CAGRA search implementations.

Rel: taken over from #1765

Algorithm

The pre-filtering algorithm removes a node that should not be in the final result after it has behaved as a parent node. This way, the nodes that should not be in the final result are also used in the graph traversal, avoiding potential performance degradation.

Changes

  • Add filtering operation on a parent node after internal top-M buffer candidate calculation.
  • Add filtering operation to result buffer before storing them in the device memory.

@enp1s0 enp1s0 requested a review from a team as a code owner September 8, 2023 05:34
@copy-pr-bot
Copy link

copy-pr-bot bot commented Sep 8, 2023

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@enp1s0 enp1s0 self-assigned this Sep 8, 2023
@github-actions github-actions bot added the cpp label Sep 8, 2023
@enp1s0 enp1s0 removed the cpp label Sep 8, 2023
@cjnolet
Copy link
Member

cjnolet commented Sep 9, 2023

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Sep 12, 2023

/ok to test

1 similar comment
@cjnolet
Copy link
Member

cjnolet commented Sep 12, 2023

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Sep 20, 2023

/ok to test

@lowener
Copy link
Contributor

lowener commented Sep 20, 2023

/ok to test

Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @enp1s0 for this PR, I had a quick look at it. The changes in the template instantiations make the PR look large, but the relevant changes are more concise.

Overall the PR looks good, but it should be explained how the filtering works. You gave a detailed description in this comment, and it would be good to add part of it as code comments, or at least include it in the PR description.

Copy link
Contributor

@lowener lowener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found those 2 changes while combining pre-filtering with bitset. I think that this behavior is not present in multi-kernel

@enp1s0
Copy link
Member Author

enp1s0 commented Sep 21, 2023

Thank you, @tfeher and @lowener, for the review. I have fixed the code and explained how the filtering works in the PR description.

@lowener
Copy link
Contributor

lowener commented Sep 21, 2023

/ok to test

Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @enp1s0 for the update LGTM.

@lowener
Copy link
Contributor

lowener commented Sep 22, 2023

/ok to test

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for getting this feature ready for 23.10, @enp1s0! Others have already done great reviews, but I see just one API thing that I'd like to address before we end up with too many different filtering implementations.

@@ -37,6 +37,18 @@ struct none_ivf_sample_filter {
}
};

/* A filter that filters nothing. This is the default behavior. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please use one sample filtering API across all the ANN indexes? For example, a user should be able to instantiate a filter and use it in IVF-flat, ivf-pq, brute-force, or cagra. Is there anything stopping us from being able to do this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lowener What do you think about the @cjnolet 's suggestion? This code is taken from your PR. I think we can use none_ivf_sample_filter as the default filter instead of none_cagra_sample_filter, but the name is not good because CAGRA is not ivf. I prefer to rename none_ivf_sample_filter to none_sample_filter but it changes the API so I'm not sure it is the best way for users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have been communicating to users that sample filtering has been in an experimental state because we haven't had any actual optimized filtering primitives like bitsets, bitmaps, or bloom filters for them to use with the filters. It will be okay to make small adjustments to the APIs at this point, especially if they are consolidating allowing for more reuse.

@cjnolet
Copy link
Member

cjnolet commented Sep 25, 2023

@enp1s0 @lowener I'm okay doing the rename as a follow-on to speed up reviews for the PRs that rely on this feature.

@cjnolet
Copy link
Member

cjnolet commented Sep 25, 2023

/merge

@rapids-bot rapids-bot bot merged commit cb24d99 into rapidsai:branch-23.10 Sep 25, 2023
raydouglass pushed a commit that referenced this pull request Oct 3, 2023
Closes #1600.
To be merged after #1803 and #1811
This PR adds `bitset_filter` to filter an index with a bitset.

Authors:
   - Micka (https://github.com/lowener)
   - tsuki (https://github.com/enp1s0)
   - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
   - Corey J. Nolet (https://github.com/cjnolet)
@enp1s0 enp1s0 deleted the 23.10-cagra-filter branch November 22, 2024 02:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cpp enhancement New feature or request improvement Improvement / enhancement to an existing function Milvus non-breaking Non-breaking change REDIS Vector Search
Projects
Development

Successfully merging this pull request may close these issues.

4 participants