Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cagra template instantiations #1650

Merged

Conversation

tfeher
Copy link
Contributor

@tfeher tfeher commented Jul 17, 2023

Cagra was introduced header only in #1375. This PR adds a precompiled single- and multi-cta search kernels to libraft.so.

The single- and multi-cta search kernels were moved to separate header files to make it easier to specify extern template instantiations for these.

The macros for dispatching the kernels were replaced by functions. We define explicit instantiations for the top level dispatch functions. (This is in contrast to #1428 where the kernels themselves were instantiated, which resulted in a large number of parameter combinations that had to be explicitly spelled out.)

This PR fixes #1443.

@tfeher tfeher requested review from a team as code owners July 17, 2023 12:03
@tfeher
Copy link
Contributor Author

tfeher commented Jul 17, 2023

Note for reviewers, it is best to review this PR commit by commit:

  • Move CAGRA search kernels to separate header files (without code change) 87c501a
  • Replace CAGRA search kernel dispatch macros with functions 700112f
  • Add explicit template instantiations for CAGRA 5e74e7e

@tfeher tfeher self-assigned this Jul 17, 2023
@tfeher tfeher added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jul 17, 2023
@@ -319,6 +319,14 @@ if(BUILD_TESTS)
test/neighbors/ann_cagra/test_int8_t_uint32_t.cu
test/neighbors/ann_cagra/test_uint8_t_uint32_t.cu
test/neighbors/ann_cagra/test_float_int64_t.cu
src/neighbors/detail/cagra/search_multi_cta_float_uint64_dim128_t8.cu
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: to libraft.so we add precompiled cagra kernels for 32-bit index type only. We have test for 64-bit as well, therefore we add extra object files for the tests here.

@tfeher tfeher force-pushed the cagra_template_instantiations_final branch from d3300bd to 816bc25 Compare July 17, 2023 13:14
@tfeher tfeher force-pushed the cagra_template_instantiations_final branch from 816bc25 to 5e74e7e Compare July 17, 2023 13:18
Copy link
Member

@divyegala divyegala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good to me! Just want to make sure the compile time decrease/binary size increase is roughly equivalent to #1428 before I approve this since we add another templated function in this PR.

@tfeher
Copy link
Contributor Author

tfeher commented Jul 17, 2023

This PR adds 150 MB to libraft.so. Status before the PR:

Compilation unit size (MB) parallel build time
libraft.so 453 0 (considering cagra components only)
cagra test 205.6 0:29:42
cagra prims bench 50 0:23:14
cagra ann bench 152.5 1:14:12

After the PR

Compilation unit size (MB) parallel build time
libraft.so 601.2 0:08:51
cagra test 59.7 0:07:07
cagra prims bench  
cagra ann bench 5.9 0:00:10

Changes in Cagra prims bench are not yet measured, expected to be similar to cagra ann bench.

@tfeher
Copy link
Contributor Author

tfeher commented Jul 17, 2023

The binary size will be reduced once #1459 is solved. Here is a list of added compilation units:

File Compile time (h : m : s) size (MB)
search_single_cta_float_uint32_dim1024_t32.cu.o 0:07:30 11.8
search_single_cta_float_uint32_dim256_t16.cu.o 0:07:23 10.6
search_single_cta_float_uint32_dim128_t8.cu.o 0:07:21 10.1
search_single_cta_uint8_uint32_dim1024_t32.cu.o 0:07:12 13.1
search_single_cta_float_uint32_dim512_t32.cu.o 0:07:12 11.3
search_single_cta_int8_uint32_dim1024_t32.cu.o 0:07:02 13.1
search_single_cta_int8_uint32_dim512_t32.cu.o 0:07:01 11.8
search_single_cta_uint8_uint32_dim512_t32.cu.o 0:06:55 11.9
search_single_cta_int8_uint32_dim128_t8.cu.o 0:06:55 11.5
search_single_cta_uint8_uint32_dim128_t8.cu.o 0:06:54 11.5
search_single_cta_uint8_uint32_dim256_t16.cu.o 0:06:53 11.7
search_single_cta_int8_uint32_dim256_t16.cu.o 0:06:20 11.7
search_multi_cta_float_uint32_dim1024_t32.cu.o 0:01:33 0.8
search_multi_cta_uint8_uint32_dim1024_t32.cu.o 0:01:30 0.9
search_multi_cta_int8_uint32_dim1024_t32.cu.o 0:01:30 0.9
search_multi_cta_float_uint32_dim128_t8.cu.o 0:01:24 0.5
search_multi_cta_float_uint32_dim256_t16.cu.o 0:01:22 0.6
search_multi_cta_float_uint32_dim512_t32.cu.o 0:01:22 0.6
search_multi_cta_uint8_uint32_dim256_t16.cu.o 0:01:21 0.7
search_multi_cta_uint8_uint32_dim512_t32.cu.o 0:01:20 0.9
search_multi_cta_int8_uint32_dim512_t32.cu.o 0:01:20 0.9
search_multi_cta_int8_uint32_dim256_t16.cu.o 0:01:20 0.7
search_multi_cta_int8_uint32_dim128_t8.cu.o 0:01:20 0.7
search_multi_cta_uint8_uint32_dim128_t8.cu.o 0:01:19 0.7
total 1:41:20 149.0

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good enough in its current state to approve for 23.08. I do think we should continue to work on the binary size and build times, but that's also a longer-term need.

#include "../ann_cagra.cuh"
#include "search_kernel_uint64_t.cuh"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are already testing for uint32_t here but int64_t has become the "standard" indexing type across RAFT's ANN algos. Could we test int64_t here just so we're covering both unsigned and signed types here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can make the change in a follow up PR. Under the hood IdxT is converted to unsigned type, so it is only a matter of testing the outer wrapper layers.

@cjnolet
Copy link
Member

cjnolet commented Jul 19, 2023

/merge

@rapids-bot rapids-bot bot merged commit 2ee9a57 into rapidsai:branch-23.08 Jul 19, 2023
@tfeher tfeher deleted the cagra_template_instantiations_final branch October 25, 2023 21:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake cpp improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Vector Search
Projects
Development

Successfully merging this pull request may close these issues.

Re-introduce CAGRA template instantiations to reduce compile time
3 participants