[REVIEW] Add chebyshev, canberra, minkowski and hellinger distance metrics #276
Conversation
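For reference, the four metrics added here have the standard textbook definitions below, for vectors a and b of length k; the exact normalization used in the implementation (e.g. for Hellinger) may differ slightly.

```latex
% Standard definitions of the metrics added in this PR (illustrative reference only).
\begin{align*}
d_{\mathrm{Chebyshev}}(a, b) &= \max_{i} |a_i - b_i| \\
d_{\mathrm{Canberra}}(a, b)  &= \sum_{i} \frac{|a_i - b_i|}{|a_i| + |b_i|} \\
d_{\mathrm{Minkowski}}(a, b) &= \Bigl(\sum_{i} |a_i - b_i|^{p}\Bigr)^{1/p} \\
d_{\mathrm{Hellinger}}(a, b) &= \frac{1}{\sqrt{2}} \sqrt{\sum_{i} \bigl(\sqrt{a_i} - \sqrt{b_i}\bigr)^{2}}
\end{align*}
```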
…r usage in all contraction based kernels so that n is along x dir and m is along y dir blocks
…kernels. --add launch config generator function to launch optimal grid size kernel for these pairwise dist kernels
…ed up over previous version. -- improve logic of the grid launch config generator for x-dir blocks
…ced val for pre-volta arch
… for subsequent gridStrideX variations. This overall improves perf of fusedL2NN to 1.85x over the previous version. --Also remove the key checks and only check values in the fusedL2NN test case, as it may happen that a row has multiple keys with the same min val
…und in launchConfigGenerator. --Use constexpr in shmemSize.
…e sure next grid stride doesn't pollute shmem before completion of this calculation
…t iteration of grid stride
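The launch-config generator mentioned in the commits above is, roughly, an occupancy-driven grid-size calculation. Below is a minimal sketch of the idea, using a hypothetical helper name and a simplified work decomposition; it is not the actual RAFT implementation.

```cuda
#include <algorithm>
#include <cuda_runtime.h>

// Hypothetical helper: choose a grid that fills the GPU for a given kernel,
// block shape and dynamic shared-memory footprint, and let the kernel
// grid-stride over any remaining tiles. Error checking omitted for brevity.
template <typename KernelT>
dim3 launchConfigGenerator(KernelT kernel, dim3 block, size_t shmemSize,
                           int tilesX, int tilesY)
{
  int dev = 0;
  cudaGetDevice(&dev);

  int numSMs = 0;
  cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, dev);

  // Maximum resident blocks per SM for this kernel / block size / shmem combo.
  int blocksPerSM = 0;
  cudaOccupancyMaxActiveBlocksPerMultiprocessor(
    &blocksPerSM, kernel, static_cast<int>(block.x * block.y), shmemSize);

  // Launch at most a "full occupancy" grid, and never more blocks than tiles.
  int maxBlocks = std::max(1, blocksPerSM * numSMs);
  int gridX     = std::min(tilesX, maxBlocks);
  int gridY     = std::min(tilesY, std::max(1, maxBlocks / std::max(1, gridX)));
  return dim3(gridX, gridY, 1);
}
```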
Looks great. Just very minor things.
@@ -0,0 +1,159 @@
/*
 * Copyright (c) 2018-2021, NVIDIA CORPORATION.
Since this is a new file, we should just put 2021 here.
@@ -0,0 +1,156 @@
/*
 * Copyright (c) 2018-2021, NVIDIA CORPORATION.
Here as well
done.
* @tparam OutType output data-type (for C and D matrices)
* @tparam FinalLambda user-defined epilogue lambda
* @tparam Index_ Index type
* @param m number of rows of A and C/D
Should we add the ins/outs here as well for consistency?
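A small sketch of what the requested [in]/[out] annotations might look like; the parameter names beyond `m` are illustrative, not taken from the actual file.

```cuda
/**
 * @tparam OutType     output data-type (for C and D matrices)
 * @tparam FinalLambda user-defined epilogue lambda
 * @tparam Index_      index type
 * @param[in]  m      number of rows of A and C/D
 * @param[in]  n      number of columns of B and C/D
 * @param[in]  k      number of columns of A and rows of B
 * @param[in]  pA     input matrix A            (illustrative name)
 * @param[in]  pB     input matrix B            (illustrative name)
 * @param[out] pD     output matrix D           (illustrative name)
 * @param[in]  fin_op the final element-wise epilogue lambda (illustrative name)
 */
```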
* It computes the following equation: cij = max(cij, op(ai-bj))
* @tparam InType input data-type (for A and B matrices)
* @tparam AccType accumulation data-type
* @tparam OutType output data-type (for C and D matrices)
Same here, maybe add the ins/outs for consistency w/ the above docs (also apply to the distances below)
done.
…and reverting in post completion. also add unrolls to ldg arrays in contractions
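The shared-memory guard and the unrolled ldg arrays referred to in these commits follow a common grid-stride pattern; below is a minimal, hypothetical sketch of that pattern, not the actual contraction kernel.

```cuda
// Illustrative grid-stride pattern: each block walks the tile space along x,
// reusing the same shared-memory buffer across iterations.
template <typename DataT, int kItemsPerThread>
__global__ void pairwiseTileKernel(const DataT* x, DataT* out, int nTiles)
{
  extern __shared__ DataT smem[];

  for (int tile = blockIdx.x; tile < nTiles; tile += gridDim.x) {
    DataT ldgRegs[kItemsPerThread];

    // Unrolled global loads into per-thread registers (the "ldg arrays").
#pragma unroll
    for (int i = 0; i < kItemsPerThread; ++i) {
      ldgRegs[i] = x[tile * blockDim.x * kItemsPerThread
                     + threadIdx.x * kItemsPerThread + i];
    }

    // Stage data through shared memory for the block-wide computation.
    smem[threadIdx.x] = ldgRegs[0];
    __syncthreads();

    // ... tile computation using smem would go here ...
    out[tile * blockDim.x + threadIdx.x] = smem[threadIdx.x];

    // Make sure every thread is done with smem before the next grid-stride
    // iteration overwrites it.
    __syncthreads();
  }
}
```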
@cjnolet @teju85 The cuML-side PR is passing - rapidsai/cuml#3990
@mdoijade, the dbscan / numpy issue should be fixed in this PR: rapidsai/cuml#4012. Would it reduce the compile time at all if we just used function arguments for the individual distances and removed the explicit templates for them?
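For context, the suggestion contrasts roughly the following two dispatch styles; the names and signatures below are hypothetical, not RAFT's actual API.

```cuda
// Style 1: one instantiation per metric -- fast at runtime, but every metric
// multiplies the number of kernel instantiations the compiler must build.
enum class DistanceType { L2, Chebyshev, Canberra, Minkowski, Hellinger };

template <DistanceType metric, typename DataT>
void pairwise_distance_impl(const DataT* x, const DataT* y, DataT* out,
                            int m, int n, int k, DataT p);

// Style 2: pass the metric as a function argument and branch inside a single
// instantiation -- fewer instantiations (shorter compile), at the cost of a
// runtime switch and possibly some lost metric-specific optimization.
template <typename DataT>
void pairwise_distance(const DataT* x, const DataT* y, DataT* out,
                       int m, int n, int k, DistanceType metric, DataT p);
```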
LGTM
Before we merge this PR, could you quantify the impact of the changes in cuML's compilation time (locally at least)?
@dantegd With these 4 additional distance metrics, the cuML total build time is roughly a 2x increase, and that number is already after my change of using compiled cuML pairwise distance kernels instead of calling the RAFT API each time wherever feasible; without that change the build time is about 25 mins.
@dantegd @cjnolet @teju85 I measured the cuML total build time for the following configurations:
- With the 4 additional distance metrics, without splitting - CUDA 11.0
- Without the new distance metrics - CUDA 11.0 (current baseline)
- After splitting the distance prims, without minkowski - CUDA 11.0
- With minkowski - CUDA 11.2 + split pairwise distance
- With minkowski - CUDA 11.2 + without split pairwise distance

So it is evident that this splitting of the pairwise distance prims reduces the build time.
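The splitting referred to here is essentially the standard trick of moving explicit template instantiations into separate translation units, so each metric/type combination compiles independently and is not rebuilt by every includer. A hedged sketch with hypothetical file and function names:

```cuda
// pairwise_distance.cuh -- declaration only; no kernel bodies instantiated here.
template <typename DataT>
void canberra_distance(const DataT* x, const DataT* y, DataT* out,
                       int m, int n, int k);

// Tell includers that the float instantiation lives in some other TU.
extern template void canberra_distance<float>(const float*, const float*,
                                              float*, int, int, int);

// canberra_float.cu -- one small TU per metric/type; compiles in parallel
// with the others:
//   #include "pairwise_distance_impl.cuh"
//   template void canberra_distance<float>(const float*, const float*,
//                                          float*, int, int, int);
```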
@dantegd Can we merge this now? And also the cuML PR - rapidsai/cuml#3990
@gpucibot merge
This reverts commit 82061e0. This PR appears to have resulted in a substantial increase in compilation time, causing timeouts in CI for cuML PR 4029. These changes should be reinstated once the cause of the increased compilation time has been determined.
This PR relies on RAFT PR rapidsai/raft#276, which adds support for these distance metrics.

Authors:
- Mahesh Doijade (https://github.com/mdoijade)

Approvers:
- Dante Gama Dessavre (https://github.com/dantegd)
- Corey J. Nolet (https://github.com/cjnolet)
- AJ Schmidt (https://github.com/ajschmidt8)

URL: #3990
This change will be followed up by a cuML-side change to use these metrics and update the pytests accordingly.