Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support uint64_t in CAGRA index data type #1514

Merged
merged 28 commits into from
May 19, 2023

Conversation

enp1s0
Copy link
Member

@enp1s0 enp1s0 commented May 15, 2023

This PR updates CAGRA::search to support uint64_t in the index data type. This update is required to implement the RAFT ANN benchmark for CAGRA.

@enp1s0 enp1s0 requested review from a team as code owners May 15, 2023 05:35
@enp1s0 enp1s0 self-assigned this May 15, 2023
@enp1s0 enp1s0 added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change and removed cpp CMake labels May 15, 2023
@enp1s0
Copy link
Member Author

enp1s0 commented May 15, 2023

Currently, some unit tests fail because recalls of them are smaller than the threshold.

@enp1s0
Copy link
Member Author

enp1s0 commented May 15, 2023

TODO:

  • hash table
  • mottainai bit

@enp1s0
Copy link
Member Author

enp1s0 commented May 15, 2023

Fixed the low recall problem and did the to-do things but found that this update is not necessary to implement the RAFT RNN benchmark for CAGRA (:cry:). However, I will proceed with this PR to ensure the completeness of the index data type.
(In the current mainstream implementation, some compilation errors occur when specifying uint64_t as the index data type. This PR address them.)

@enp1s0
Copy link
Member Author

enp1s0 commented May 15, 2023

@tfeher Can you check the code and let me know if there are any problems?

@enp1s0 enp1s0 added bug Something isn't working and removed improvement Improvement / enhancement to an existing function labels May 15, 2023
@tfeher
Copy link
Contributor

tfeher commented May 15, 2023

Thanks @enp1s0 for this PR! I think it is very useful to update the code to work with other index type (even if we would mainly use with 32bit indices). I will have a look at the changes.

Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @enp1s0 for this PR. It is great to have the freedom to customize the index type paramater. I am somewhat concerned about the test size and compile time maybe it would be sufficient to test limited template type combinations in int64 mode? (In practice it is expected that uint32_t would be sufficient in most of the cases.)

Comment on lines 319 to 321
test/neighbors/ann_cagra/test_float_uint64_t.cu
test/neighbors/ann_cagra/test_int8_t_uint64_t.cu
test/neighbors/ann_cagra/test_uint8_t_uint64_t.cu
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding these effectively doubles our test binary size:

File Compile time Size
test_float_uint32_t.cu.o 43:55 min 106.657 MB
test_float_uint64_t.cu.o 42:24 min 98.597 MB
test_uint8_t_uint32_t.cu.o 41:31 min 117.630 MB
test_int8_t_uint32_t.cu.o 41:29 min 116.698 MB
test_int8_t_uint64_t.cu.o 40:04 min 106.032 MB
test_uint8_t_uint64_t.cu.o 39:56 min 107.312 MB

See #1499 (comment) for potential impact on packages.

(Note that compile time is improved by #1428 by enabling parallel build for different template parameters, but binary size is not changed there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed test_int8_t_uint64_t and test_uint8_t_uint64_t

cpp/test/neighbors/ann_cagra/test_float_uint64_t.cu Outdated Show resolved Hide resolved
@enp1s0 enp1s0 closed this May 17, 2023
@enp1s0 enp1s0 force-pushed the cagra-64bit-index branch from 9fa8c9e to d99d249 Compare May 17, 2023 03:48
@enp1s0 enp1s0 reopened this May 17, 2023
@enp1s0
Copy link
Member Author

enp1s0 commented May 17, 2023

Thanks @tfeher for reviewing the changes.

it would be sufficient to test limited template type combinations in int64 mode?

Yes, I agree with you. I'll remove the (DataT, IdxT) = (int8, uint64) and (uint8, uint64) tests.

@enp1s0
Copy link
Member Author

enp1s0 commented May 17, 2023

@tfeher I have a question about supporting signed 64-bit int (int64_t) as the index data type.
Currently, int64_t does not work as expected, with the recall becoming zero when using it. I'm not sure of the cause, but some parts of the current CAGRA kernels assume the index data type is unsigned. While it is true that the index is always positive, I understand that it is preferred to support signed data types to have freedom in the index data types.

While it is possible to modify the kernels to support int64_t and make them work as expected, I prefer an alternative approach: using kernel functions designed for unsigned index data types and using reinterpret_cast to treat the signed types as unsigned. The advantages of this implementation are as follows:

  1. We can use the same kernel functions for both signed and unsigned index types, which helps to keep the binary size of the library from increasing.
  2. We can avoid the need for additional out-of-range memory access checks for signed types.

This implementation should not have any disadvantages because, if the data type is signed, the index is always positive, and the range of representation for unsigned types is larger than that of signed types.

What do you think about it?

@tfeher
Copy link
Contributor

tfeher commented May 17, 2023

using kernel functions designed for unsigned index data types and using reinterpret_cast to treat the signed types as unsigned

This sounds good to me!

@enp1s0
Copy link
Member Author

enp1s0 commented May 19, 2023

@tfeher I have fixed the codes! Can you look at them again?

Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @enp1s0 for the updates, the PR looks good to me!

@cjnolet
Copy link
Member

cjnolet commented May 19, 2023

/merge

@rapids-bot rapids-bot bot merged commit 1f61b47 into rapidsai:branch-23.06 May 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working CMake cpp non-breaking Non-breaking change
Projects
Development

Successfully merging this pull request may close these issues.

4 participants