-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add eps-neighbor search via RBC #2028
Conversation
This is awesome! So can we support arbitrarily large dimensions now or are we still limited to what can be fused into the knn with k-selection? (I admit I haven't skimmed the whole PR yet) |
Sorry, my formulation was a bit unclear. 2/3-D limitation still holds for k-nn queries, but it is removed for eps-nn. |
@cjnolet , I believe this is ready for review now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for these changes @mfoerste4. This is a large PR, though, so I'm going to take a couple different passes. I went through the general API hanges first. I'll look through the algorithmic changes more closely but I wanted to provide some initial feedback before it gets too late for 24.02.
@@ -67,6 +67,28 @@ void knn_query(raft::resources const& handle, | |||
bool perform_post_filtering = true, | |||
float weight = 1.0) RAFT_EXPLICIT; | |||
|
|||
template <typename idx_t, typename value_t, typename int_t, typename matrix_idx_t> | |||
void epsUnexpL2NeighborhoodRbc(raft::resources const& handle, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have an opportunity here to choose a nicer and more user-friendly name. I'd prefer something like "eps_nn" and we use knn
to designate the k-nearest neighbors, since we are already in the ball_cover namespace. Please also use mdspan-based APIs for these. The pointer-based functions are deprecated in the public APIs, so we shouldn't be introducing any new ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I have changed the API accordingly.
template <typename value_idx, | ||
typename value_t, | ||
typename value_int = std::uint32_t, | ||
typename matrix_idx = std::uint32_t> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too wrt new template args. If we're defaulting it to be the same as value_int
, can we just reuse it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Malte for this work! The main issue that I see actually coincides with what @cjnolet has already pointed out: we would like to simplify the template parameters and reduce the number of instantiations.
I have started to play with the idea of keeping all idx in int64_t in a separate branch. This commit fixes the compilation issues that we have in the CI, but some issues still remain with the test.
I am still looking into some of the implementation details, I will get back to you soon.
@cjnolet , I have removed all double precision instantiations to reduce compile time and binary size. The double-path was only partly used via cuml-dbscan but caused a 2x factor for all major rbc and knn instances. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mfoerste4 for the update, looks good now, just a few small things:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mfoerste4 for the update, the PR looks good to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR LGTM! Thanks again @mfoerste4!
/merge |
@cjnolet , sorry for the last minute fix - just noticed this while building with cuml. |
This PR adds support for an eps-neighborhood search based on random ball cover. The algorithm re-uses the existing rbc index creation.
Changes:
raft::neighbors::ball_cover::epsUnexpL2NeighborhoodRbc
Note: The PR also contains a fix for the vertex degree computation in the brute force algorithm
spatial::knn::detail::epsUnexpL2SqNeighborhood
.Related to #1984 and #517