Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Remaining sparse semiring distances #261

Merged

Conversation

cjnolet
Copy link
Member

@cjnolet cjnolet commented Jun 8, 2021

This PR is intended to be merged after #207 (hash table strategy) has been merged.

This PR introduces the following distances:

  • Hamming
  • Jensen-Shannon
  • Russell-Rao
  • KL-Divergence
  • Correlation

Most of the changes here are from #207 and will be reviewed in that PR. The only files that need to be reviewed for this PR are sparse/distance/l2_distance.cuh, sparse/distance/bin_distance.cuh, sparse/distance/lp_distances.cuh, and their corresponding gtests: test/sparse/distance.cuh

rapids-bot bot pushed a commit that referenced this pull request Jun 23, 2021
This branch includes several new features and optimizations:

1. Introduces a hash table strategy to sparsify the vector in the coo spmv shared memory
2. Adds a batching strategy for rows with nnz too large to fit into shared memory
3. Removes the need for the cusparse csrgemm
4. Uses raft handle in distances_config_t rather than accepting each resource explicitly
5. Removes the naive CSR semiring code

This PR is also required to merge #261, which introduces the remaining distances

Authors:
  - Divye Gala (https://github.com/divyegala)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #269
@github-actions github-actions bot added the CMake label Jun 23, 2021
@github-actions github-actions bot removed the CMake label Jun 23, 2021
Copy link
Member

@dantegd dantegd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just had one quick comment/question about using RMM directly, the code for the distances looks good!

raft::mr::device::buffer<value_t> R_sq_norms(alloc, stream, n);

// sum for mean
raft::mr::device::buffer<value_t> Q_norms(alloc, stream, m);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it wouldn't be too many changes, could you use rmm uvectors in general instead? (just adding a single comment to avoid repeated ones)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I don't think that would be too much to do for this PR.

@cjnolet
Copy link
Member Author

cjnolet commented Jul 7, 2021

rerun tests

@cjnolet
Copy link
Member Author

cjnolet commented Jul 7, 2021

@dantegd, I replaced the usage of the raft device buffer w/ the rmm device uvector everywhere. This is ready to be merged.

@dantegd
Copy link
Member

dantegd commented Jul 12, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit f94780c into rapidsai:branch-21.08 Jul 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review cpp feature request New feature or request non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants