Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Masked NN for connect_components #1445

Merged

Conversation

tarang-jain
Copy link
Contributor

@tarang-jain tarang-jain commented Apr 20, 2023

Replace fused L2 Nearest Neighbors in connect_components with masked NN.
Closes #743
Closes #1569

@github-actions github-actions bot added the cpp label Apr 20, 2023
@tarang-jain tarang-jain added non-breaking Non-breaking change and removed cpp labels Apr 20, 2023
@github-actions github-actions bot added the cpp label Apr 22, 2023
@tarang-jain tarang-jain added the feature request New feature or request label Apr 26, 2023
@tarang-jain tarang-jain marked this pull request as ready for review May 11, 2023 03:41
@tarang-jain tarang-jain requested a review from a team as a code owner May 11, 2023 03:41
Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took another pass. I think it's almost there. Mostly just a couple minor things but would still like to see more assertion cases for scatter.cu tests.

@@ -59,21 +59,41 @@ value_idx get_n_components(value_idx* colors, size_t n_rows, cudaStream_t stream
* @param[in] orig_colors array containing component number for each row of X
* @param[in] n_rows number of rows in X
* @param[in] n_cols number of cols in X
* @param[in] reduction_op
* @param[in] metric
* @param[in] reduction_op reduction operation for computing nearest neighbors. The reduction
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also rename this file to cross_component_nn?

cpp/test/sparse/neighbors/connect_components.cu Outdated Show resolved Hide resolved
INSTANTIATE_TEST_CASE_P(ScatterTests, test_name, ::testing::ValuesIn(test_inputs))

const std::vector<ScatterInputs<int>> inputs_i32 =
raft::util::itertools::product<ScatterInputs<int>>({2000}, {6, 31, 129}, {2, 3, 6}, {1234ULL});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need more cases here (e.g. different parameter settings).

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, @tarang-jain! LGTM, but we need to make sure we synchronize the merge with cuml.

Copy link
Contributor

@ahendriksen ahendriksen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the hard work! Just a single comment related to RAFT_EXPLICIT_INSTANTIATE.

cpp/test/sparse/neighbors/cross_component_nn.cu Outdated Show resolved Hide resolved
Copy link
Contributor

@ahendriksen ahendriksen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@cjnolet
Copy link
Member

cjnolet commented Jul 25, 2023

/merge

@rapids-bot rapids-bot bot merged commit a66c3a3 into rapidsai:branch-23.08 Jul 25, 2023
rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request Jul 26, 2023
Make HDBSCAN MST reduction operation compatible with [#1445](rapidsai/raft#1445) raft PR. This PR updates the Mutual Reachability reduction op in HDBSCAN in the following ways:
1. Colors information is no longer needed in the reduction op because the new `mst` implementation in raft ensures that distances between points of the same color are not computed.
2. Adds gather and scatter functions to rearrange the core distances within the reduction op so that they are aligned with the sort-plan wherein rows in the input matrix and core distances are rearranged so that the training data points are sorted by color.
Closes #5456

Authors:
  - Tarang Jain (https://github.com/tarang-jain)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Divye Gala (https://github.com/divyegala)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #5386
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Breaking change CMake cpp feature request New feature or request
Projects
Development

Successfully merging this pull request may close these issues.

[FEA] In-place gather primitive [FEA] SLHC should compute distances only across components/colors
4 participants