Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Add chebyshev, canberra, minkowksi and hellinger distance metrics #276

Merged
merged 43 commits into from
Jul 6, 2021

Conversation

mdoijade
Copy link
Contributor

@mdoijade mdoijade commented Jun 15, 2021

This change will be followed up by cuML side change to use these metrics and update the pytest accordingly.

mdoijade added 30 commits May 12, 2021 22:03
…r usage in all contraction based kernels so that n is along x dir and m is along y dir blocks
…kernels. --add launch config generator function to launch optimal grid size kernel for these pairwise dist kernels
…ed up over previous version. -- improve logic of the grid launch config generator for x-dir blocks
… for subsequent gridStrideX variations. this overall improves perf of fusedL2NN to 1.85x over previous version. --Also remove checking keys only check values in fusedL2nn test case, as it may happen a row has multiple keys with same min val
…und in launchConfigGenerator. --Use constexpr in shmemSize.
…e sure next grid stride doesn't pollute shmem before completion of this calculation
@mdoijade
Copy link
Contributor Author

@teju85 @cjnolet please help with review.

@cjnolet cjnolet added feature request New feature or request non-breaking Non-breaking change labels Jun 15, 2021
Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Just very minor things.

@@ -0,0 +1,159 @@
/*
* Copyright (c) 2018-2021, NVIDIA CORPORATION.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a new file, we should just put 2021 here.

@@ -0,0 +1,156 @@
/*
* Copyright (c) 2018-2021, NVIDIA CORPORATION.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

* @tparam OutType output data-type (for C and D matrices)
* @tparam FinalLambda user-defined epilogue lamba
* @tparam Index_ Index type
* @param m number of rows of A and C/D
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add the ins/outs here as well for consistency?

* It computes the following equation: cij = max(cij, op(ai-bj))
* @tparam InType input data-type (for A and B matrices)
* @tparam AccType accumulation data-type
* @tparam OutType output data-type (for C and D matrices)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, maybe add the ins/outs for consistency w/ the above docs (also apply to the distances below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@mdoijade mdoijade changed the title Add chebyshev, canberra, minkowksi and hellinger distance metrics [REVIEW] Add chebyshev, canberra, minkowksi and hellinger distance metrics Jun 17, 2021
@mdoijade
Copy link
Contributor Author

@cjnolet @teju85 the cuML side PR is passing - rapidsai/cuml#3990
the failure in those tests is in dbscan due to some recent numpy update.
All tests pertaining to these distance metrics are passing.

@cjnolet
Copy link
Member

cjnolet commented Jun 24, 2021

@mdoijade, the dbscan / numpy issue should be fixed in this PR: rapidsai/cuml#4012.

Would it reduce the compile time at all if we just used function arguments for the individual distances and removed the explicit templates for them?

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@dantegd dantegd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we merge this PR, could you quantify the impact of the changes in cuML's compilation time (locally at least)?

@mdoijade
Copy link
Contributor Author

Before we merge this PR, could you quantify the impact of the changes in cuML's compilation time (locally at least)?

@dantegd
Without new distance metrics cuML total build time
real 9m24.016s
user 118m8.600s
sys 3m43.230s

With these 4 additional distance metrics cuML total build time
real 18m11.113s
user 120m28.920s
sys 3m34.880s

It is roughly 2x build time increase, and this reduction is after my change of using compiled cuML pairwise dist kernels instead of each time calling raft API wherever it is feasible without that change build time is about 25mins.
This increase is coming from pairwise_distance.cu

@mdoijade
Copy link
Contributor Author

mdoijade commented Jun 29, 2021

@dantegd @cjnolet @teju85
I split the pairwise_distance.cu compilation into multiple source files for each distance metric to enable parallel build in it,
this has helped by a certain extent from previous effort like around 2-3 mins build time reduction.
I am seeing that minkowksi kernels are taking longer than others with CUDA 11.0 it has "powf(), pow()" functions in it that is the only difference.
so after disabling minkowski kernels (other 3 new kernels enabled) the time take for full cuML build with CUDA 11.0
real 9m35.232s
user 113m56.715s
sys 3m44.966s
this is almost same time as without these kernels.

With these 4 additional distance metrics cuML total build time without splitting - CUDA 11.0
real 18m11.113s
user 120m28.920s
sys 3m34.880s

Without new distance metrics cuML build time - CUDA 11.0 (current baseline)
real 9m24.016s
user 118m8.600s
sys 3m43.230s

After splitting the distance prims in pairwise_distance.cu into individual source files (with minkowski) - CUDA 11.0
real 17m44.648s
user 127m42.492s
sys 3m48.588s

Without minkowski - CUDA 11.0
real 9m35.232s
user 113m56.715s
sys 3m44.966s

With minkowski - cuda 11.2 + split pairwise distance
real 9m57.291s
user 123m47.790s
sys 3m54.819s

With minkowski - cuda 11.2 + without split pairwise distance
real 12m22.500s
user 118m22.608s
sys 3m45.721s

So it is evident that this splitting of pairwise_distance.cu does help reduce build time.
and there is some compilation slowness issue with cuda 11.0 for pow(), powf() which seems to be resolved in cuda 11.2.
cuML PR - rapidsai/cuml#3990

@mdoijade
Copy link
Contributor Author

mdoijade commented Jul 5, 2021

@dantegd can we merge this now? and also cuML PR - rapidsai/cuml#3990

@dantegd
Copy link
Member

dantegd commented Jul 6, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 82061e0 into rapidsai:branch-21.08 Jul 6, 2021
wphicks added a commit to wphicks/raft that referenced this pull request Jul 7, 2021
This reverts commit 82061e0. This PR
appears to have resulted in a substantial increase in compilation time,
causing timeouts in CI for cuML PR 4029. These changes should be
reinstated once the cause of the increased compilation time has been
determined.
rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request Jul 8, 2021
)

This PR relies on RAFT PR rapidsai/raft#276 which adds these distance metrics support.

Authors:
  - Mahesh Doijade (https://github.com/mdoijade)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Corey J. Nolet (https://github.com/cjnolet)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #3990
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
…pidsai#3990)

This PR relies on RAFT PR rapidsai/raft#276 which adds these distance metrics support.

Authors:
  - Mahesh Doijade (https://github.com/mdoijade)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Corey J. Nolet (https://github.com/cjnolet)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: rapidsai#3990
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake cpp feature request New feature or request non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants