Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster matrix-vector-ops #401

Merged
merged 25 commits into from
Jan 12, 2022

Conversation

achirkin
Copy link
Contributor

Introduce matrixLinewiseOp for applying row- or column-wise operations on matrices with (templated) fixed number of vectors. This is a rewriting of matrixVectorOp.

The new primitive is on average 2x faster for various numbers of columns/rows. In general case: it improves performance by reusing the vector values across multiple matrix rows/columns (trying to load vector value once or at least cache it). In edge case: it tries to use vectorized load/store operations on the input/output matrices even when the pointers are not properly aligned, or the vectors' length is not multiple of the alignment.

@achirkin achirkin requested review from a team as code owners November 25, 2021 14:42
@achirkin achirkin added 3 - Ready for Review enhancement New feature or request improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Nov 25, 2021
@achirkin achirkin force-pushed the enh-faster-linewise-ops branch from d4a1e69 to 5c63642 Compare November 25, 2021 15:01
@achirkin
Copy link
Contributor Author

I have a question here. I've used NVTX profiling for measuring the execution time in the old and new versions. However, I've noticed it was not used before. Shall I remove it from the PR?

Also, if you want to measure the execution time of the old version, check out the last commit before I replaced the matrixVectorOp implementation (Cosmetics and tests #bb30615).
The script I used:

# TEST_FILTER=LinewiseOp/Gigabyte_float_int.run/0
# FBASENAME=LinewiseOp_Gigabyte_float_new

TEST_FILTER=LinewiseOp/Gigabyte_float_int.run/9
FBASENAME=LinewiseOp_Gigabyte_float_orig

nsys profile -f true -o ${FBASENAME} \
  -c cudaProfilerApi -t "cublas,cuda,osrt,nvtx" \
  ./cpp/build/test_raft -vv --gtest_filter=$TEST_FILTER

@achirkin
Copy link
Contributor Author

nsys

@tfeher tfeher self-assigned this Dec 6, 2021
int BlockSize,
typename Lambda,
typename... Vecs>
void matrixLinewiseVecCols(Type* out,
Copy link
Member

@cjnolet cjnolet Dec 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tagging @divyegala for his ongoing work to hide implementation details of dense linalg primitives.

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, the changes here look great but we've been bitten a few times recently by making changes in RAFT that inadvertently break cuML (or its CI). The longer-term goal as RAFT becomes more stable is to get to a point where it can act independently from consuming projects. For now, however, we should open a PR in cuml that builds against this branch and verify CI runs successfully.

@achirkin
Copy link
Contributor Author

achirkin commented Dec 6, 2021

Yeah, actually, I just found out today that cuml's CD solver tests crash with my changes in raft on a matrix of size (1×2).

@achirkin achirkin added the 2 - In Progress Currenty a work in progress label Dec 6, 2021
achirkin added a commit to achirkin/cuml that referenced this pull request Dec 8, 2021
@achirkin achirkin force-pushed the enh-faster-linewise-ops branch from 488754e to 40e1a7e Compare December 10, 2021 14:25
Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @achirkin for this PR. I have reviewed the row-wise code path. Overall it looks great, but it is difficult to follow the complex indexing logic. I have left a few suggestions where the description could be improved.

cpp/include/raft/pow2_utils.cuh Outdated Show resolved Hide resolved
cpp/include/raft/matrix/detail/linewise_op.cuh Outdated Show resolved Hide resolved
cpp/include/raft/matrix/detail/linewise_op.cuh Outdated Show resolved Hide resolved
cpp/include/raft/matrix/detail/linewise_op.cuh Outdated Show resolved Hide resolved
Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have finished the review, I have a few additional comments toward improving the description of the algorithm. Apart from these, it looks good.

cpp/test/matrix/linewise_op.cu Outdated Show resolved Hide resolved
cpp/include/raft/matrix/detail/linewise_op.cuh Outdated Show resolved Hide resolved
@achirkin achirkin requested a review from tfeher December 17, 2021 11:13
@achirkin
Copy link
Contributor Author

rerun tests

Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @achirkin for addressing the issues and improving the description of the functions! It looks good to me.

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updats mostly look good from my end. I realized I had not submitted my original review, which had a bunch of requests for docs and have been addressed already from Tamas' review.

I just have a little question and a trivial update to use new (non-deprecated) macros.

cpp/include/raft/matrix/detail/linewise_op.cuh Outdated Show resolved Hide resolved
cpp/include/raft/matrix/detail/linewise_op.cuh Outdated Show resolved Hide resolved
Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM, thanks for adding the constants

@cjnolet
Copy link
Member

cjnolet commented Jan 12, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 15fd1d3 into rapidsai:branch-22.02 Jan 12, 2022
@achirkin achirkin deleted the enh-faster-linewise-ops branch March 31, 2022 06:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review CMake cpp enhancement New feature or request improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants