[CPU][ARM] Fixed cvt_copy fast path for mha_single_token_kernel #28265

Open

dmitry-gorokhov
Contributor

Details:

  • This PR fixes incorrect cvt_copy routine behavior inside the mha_single_token kernel on ARM platforms when __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is defined on the system and the fp32 inference scalar code path is chosen.
  • Additionally, the cvt_copy implementation is refactored via template specialization for better readability.
  • Follow-up after [CPU] [ARM] SVE FP16 functions for MHASingleToken kernel #28182
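
As a rough illustration of what "refactored via template specialization" can look like, here is a minimal sketch. It is not the code from this PR: the name `cvt_copy_sketch` and the choice of `float` for the same-type fast path are assumptions made for the example. The generic template converts element by element, while an explicit specialization handles the case where source and destination types match and a plain copy is enough.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical sketch; names and types are illustrative, not OpenVINO's.
// Generic path: element-wise conversion between different element types.
template <typename TDst, typename TSrc>
void cvt_copy_sketch(TDst* dst, const TSrc* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = static_cast<TDst>(src[i]);
    }
}

// Fast path: identical types on both sides, so a raw byte copy suffices.
template <>
inline void cvt_copy_sketch<float, float>(float* dst, const float* src, std::size_t n) {
    std::memcpy(dst, src, n * sizeof(float));
}
```

Keeping each type combination in its own specialization makes it easier to see which path a given call takes, compared with one function body that mixes preprocessor branches for all cases.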

@dmitry-gorokhov dmitry-gorokhov added the platform: arm OpenVINO on ARM / ARM64 label Jan 3, 2025
@dmitry-gorokhov dmitry-gorokhov added this to the 2025.0 milestone Jan 3, 2025
@dmitry-gorokhov dmitry-gorokhov requested review from a team as code owners January 3, 2025 11:58
@github-actions github-actions bot added the category: CPU OpenVINO CPU plugin label Jan 3, 2025
@dmitry-gorokhov dmitry-gorokhov force-pushed the feature/fix_mha_single_token_kernel branch from 71372fa to c60cd0c Compare January 3, 2025 12:44
# endif
svfloat16_t b1 = svld1_f16(pg, reinterpret_cast<const float16_t*>(src + i));
svst1_f16(pg, reinterpret_cast<float16_t*>(dst + i), b1);
i += inc;
Contributor

Shouldn't we add this loop in the HAVE_SVE branch?

    for (; i < n; i++) {
        dst[i] = src[i];
    }

Contributor Author

We shouldn't, because SVE handles tails inside the vector loop:
[image]
In fact, this is one of the major concepts of SVE: generalized tail handling.
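
To make the point about tail handling concrete, here is a minimal sketch of a predicated SVE copy loop. It is not the kernel from this PR: `copy_f32_sketch` is a hypothetical helper, and the scalar fallback exists only so the example also builds on targets without SVE. The predicate produced by `svwhilelt_b32_u64` enables only the lanes whose index is below `n`, so the last, partial vector is processed by the same loop body and no trailing scalar loop is needed.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__ARM_FEATURE_SVE)
#    include <arm_sve.h>
#endif

// Hypothetical helper for illustration only.
void copy_f32_sketch(float* dst, const float* src, std::size_t n) {
#if defined(__ARM_FEATURE_SVE)
    const std::uint64_t lanes = svcntw();  // 32-bit lanes per SVE vector
    const std::uint64_t total = static_cast<std::uint64_t>(n);
    for (std::uint64_t i = 0; i < total; i += lanes) {
        // Lane k is active only if i + k < total; inactive lanes are neither
        // loaded nor stored, which covers the tail of the array.
        const svbool_t pg = svwhilelt_b32_u64(i, total);
        svst1_f32(pg, dst + i, svld1_f32(pg, src + i));
    }
#else
    // Portable fallback for builds without SVE.
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
#endif
}
```

With a length such as 37, which is not a multiple of any SVE vector length, the final iteration runs with a partially active predicate and elements past the end are left untouched.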

@@ -100,20 +119,21 @@ void cvt_copy(TA* dst, TB* src, size_t n) {
svst1_f32(pg, _dst + i, b1);
i += inc;
}
Contributor

The same comment applies here.

@dmitry-gorokhov dmitry-gorokhov added this pull request to the merge queue Jan 8, 2025