Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix euclidean_dist in IVF-Flat search #1122

Merged
merged 5 commits into from
Jan 10, 2023

Conversation

Nyrio
Copy link
Contributor

@Nyrio Nyrio commented Jan 5, 2023

Solves #1058

This is a tricky bug so the fix deserves some explanation. The previous implementation of euclidean_dist was the following in vectorized cases, where x and y are int32 vectors of 4 int8 each and acc is a single int32 number to accumulate the distance in:

// Compute vectorized absolute differences independently.
const auto diff = static_cast<int32_t>(__vabsdiffs4(x, y));
// Square, reduce, and add to the accumulator.
acc = dp4a(diff, diff, acc);

Now consider the following case:

x = 0x80; // -128, 0, 0, 0
y = 0x7f; //  127, 0, 0, 0

The difference between -128 and 127 is 255, represented as FF (__vabsdiffs4 is smart enough not to compute abs(a-b) which would result in 01). However, if we call the signed version of dp4a, FF is cast from int8 to int32 as FFFFFFFF (or -1). The square of -1 is 1, which is added to acc (instead of 65025).

As the output of __vabsdiffs4 is correct when considered as an unsigned number, and as addition is the same for signed and unsigned in 2's complement (and acc is positive anyway), the easiest fix is to use the unsigned version of dp4a, which will cast overflowed differences properly to 32 bits. The previous code simply becomes:

const auto diff = __vabsdiffs4(x, y);
acc = dp4a(diff, diff, static_cast<uint32_t>(acc));

Additionally, to avoid underflows in the non-vectorized unsigned case, I replaced the subtraction with __usad (absolute difference of unsigned numbers). Note that using the subtraction was correct anyway, because the addition/subtraction is the same for unsigned and signed integers, as well as the least significant half of the multiplication (which is the part that is stored), and the square of a number is also the square of its opposite. Consider:

uint32_t a = 10;
uint32_t b = 20;
uint32_t c = a - b; // fffffff6, i.e -10 or 4294967286
uint32_t d = c * c; // (ffffffec)00000064, i.e 100

@Nyrio Nyrio requested review from a team as code owners January 5, 2023 17:00
@Nyrio Nyrio added bug Something isn't working 3 - Ready for Review non-breaking Non-breaking change labels Jan 5, 2023
@Nyrio Nyrio requested review from tfeher and achirkin January 5, 2023 17:03
@Nyrio Nyrio self-assigned this Jan 5, 2023
Copy link
Contributor

@achirkin achirkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that must have been a tough investigation, good job! You've explained the problem very nice here, but maybe it's worth adding a short gist of it (or a link) in the code?

@Nyrio Nyrio requested a review from achirkin January 9, 2023 11:35
Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Louis for fixing this problem, the PR looks good to me!

Could you update the second part of the PR description? You say

added comment in the unsigned, non-vectorized implementation

But that has changed during the review.

@Nyrio
Copy link
Contributor Author

Nyrio commented Jan 9, 2023

Could you update the second part of the PR description?

Ah, yes, done.

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM. Thanks @Nyrio!

Copy link
Contributor

@achirkin achirkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@cjnolet
Copy link
Member

cjnolet commented Jan 10, 2023

/merge

@rapids-bot rapids-bot bot merged commit b5c2b39 into rapidsai:branch-23.02 Jan 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review bug Something isn't working cpp non-breaking Non-breaking change python
Projects
Development

Successfully merging this pull request may close these issues.

4 participants