src: cpu: aarch64: Enable matmul and convolution static quantisation through ACL. #2198

renato-arantes · 2024-11-04T11:01:08Z

Description

This is to enable matmul and conv static quantization operations through ACL. Currently, the supported data type combinations are s8:s8:s8 and u8:s8:u8.

Checklist

General

Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
Have you formatted the code using clang-format?

Radu2k

LGTM

Radu2k · 2024-12-10T13:10:12Z

@mgouicem

mgouicem · 2024-12-11T14:01:09Z

src/cpu/aarch64/matmul/acl_lowp_matmul_sq.cpp

+}
+
+status_t acl_lowp_matmul_sq_t::execute(const exec_ctx_t &ctx) const {
+    std::lock_guard<std::mutex> _lock {this->mtx};


Maybe add a comment on which part below is not thread safe. That would help remove the lock later in time.

For some reason, I do not see any new comments. Did you forget to push your changes?

I removed the code from src/common/convolution_pd.hpp as you requested. Now the bias is set to f32 from inside src/cpu/aarch64/acl_gemm_convolution.cpp, and consequently, the types match and so the #ifdef inside convolution_pd.hpp is not needed anymore. The change is there, but it is mostly about removing the previous one!

Radu2k · 2024-12-13T11:05:30Z

@mgouicem Can we please close this?

src/common/convolution_pd.hpp

theComputeKid · 2024-12-16T10:58:15Z

@renato-arantes you have put in a merge commit. You need to squash and rebase.

renato-arantes · 2024-12-16T11:13:13Z

@mgouicem, I have addressed all the comments.

vpirogov · 2024-12-17T20:23:53Z

@theComputeKid, @renato-arantes,

Looks like this PR passed benchdnn smoke set, but introduced fails caught by extended coverage in #2263. Sample fails:

198/208 Test #186: test_benchdnn_modeC_conv_ci_cpu .........................***Exception: SegFault856.02 sec
<snip>
run: --conv --dir=FWD_I --dt=s8:s8:s8 --attr-post-ops=linear:2:1 ic3ih5oc16oh5kh3ph1n"conv_basic_2d:1st"
[   0][DST][0:0:0:0] exp_f32:         -23 exp:         -23 got:        -128 diff:     105 rdiff: 4.56522
[   1][DST][0:0:0:1] exp_f32:         -35 exp:         -35 got:        -128 diff:      93 rdiff: 2.65714
[   2][DST][0:0:0:2] exp_f32:        -111 exp:        -111 got:        -128 diff:      17 rdiff:0.153153
[   3][DST][0:0:0:3] exp_f32:         -55 exp:         -55 got:        -128 diff:      73 rdiff: 1.32727
[   4][DST][0:0:0:4] exp_f32:          73 exp:          73 got:         127 diff:      54 rdiff:0.739726
[   5][DST][0:0:1:0] exp_f32:         -47 exp:         -47 got:        -128 diff:      81 rdiff:  1.7234
[   6][DST][0:0:1:1] exp_f32:          81 exp:          81 got:         127 diff:      46 rdiff:0.567901
[   7][DST][0:0:1:2] exp_f32:         -23 exp:         -23 got:        -128 diff:     105 rdiff: 4.56522
[   8][DST][0:0:1:3] exp_f32:         125 exp:         125 got:         127 diff:       2 rdiff:   0.016
[   9][DST][0:0:1:4] exp_f32:          21 exp:          21 got:         127 diff:     106 rdiff: 5.04762
[COMPARE_STATS][DST]: trh=0 err_max_diff:     126 err_max_rdiff:     126 all_max_diff:     126 all_max_rdiff:     126
[PRIM_REF][INFO]: L2_size:524288 bytes; per_core_L3_size:1048576 bytes; nthr:16; impl_name:indirect_gemm:acl
[PRIM_REF][REPRO]: --conv --dir=FWD_I --attr-post-ops=linear:2:1 ic3ih5oc16oh5kh3ph1n"conv_basic_2d:1st"
30152:FAILED (errors:695 total:800) __REPRO: --conv --dir=FWD_I --dt=s8:s8:s8 --attr-post-ops=linear:2:1 ic3ih5oc16oh5kh3ph1n"conv_basic_2d:1st"
<snip>
run: --conv --dir=FWD_I --dt=s8:s8:s8 --attr-post-ops=linear:2:1 ic16ih5oc16oh5kh1ph0n"conv_basic_2d:1x1"
<snip>
207/208 Test #193: test_benchdnn_modeC_matmul_ci_cpu .......................***Exception: SegFault1346.38 sec
run: --matmul --dt=s8:s8:s8 --bia_dt=u8 77x133:133x117

Full log is here.

vpirogov · 2024-12-17T20:29:50Z

As the result I had to disable all required checks for AArch64 to avoid blocking promotion of PRs for other platforms.

theComputeKid · 2024-12-17T22:12:40Z

Thought I was doing a good deed by expanding CI but alas no good deed goes unpunished.

@Sqvid do you know why we didn't catch this internally? We need to fix this ASAP so we can get our public CI working again.

Change-Id: Ifaecea0c53d270e7dd3e4c38b4261456e40c7418 Signed-off-by: Renato Arantes <[email protected]> Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/oncpuml/oneDNN/+/706947 Tested-by: svc_mongoosetron <[email protected]> Reviewed-by: Radu Salavat <[email protected]> IP-Review: Hamza Butt <[email protected]>

renato-arantes requested review from a team as code owners November 4, 2024 11:01

renato-arantes marked this pull request as draft November 4, 2024 11:02

renato-arantes changed the title ~~src: cpu: aarch64: Enable matmul static quantisation through ACL.~~ src: cpu: aarch64: Enable matmul and convolution static quantisation through ACL. Nov 4, 2024

github-actions bot added the platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 label Nov 7, 2024

renato-arantes force-pushed the master branch 2 times, most recently from efad4f6 to d728790 Compare December 4, 2024 14:15

renato-arantes marked this pull request as ready for review December 5, 2024 08:25

theComputeKid approved these changes Dec 5, 2024

View reviewed changes

Radu2k approved these changes Dec 5, 2024

View reviewed changes

milpuz01 mentioned this pull request Dec 6, 2024

Failing to get oneDNN patch when building Tool-Solutions PyTorch docker image ARM-software/Tool-Solutions#260

Closed

mgouicem reviewed Dec 11, 2024

View reviewed changes

renato-arantes force-pushed the master branch 2 times, most recently from 25271ff to 91524da Compare December 13, 2024 10:47

mgouicem reviewed Dec 13, 2024

View reviewed changes

src/common/convolution_pd.hpp Outdated Show resolved Hide resolved

src/common/convolution_pd.hpp Outdated Show resolved Hide resolved

renato-arantes added 2 commits December 16, 2024 11:00

src: cpu: aarch64: Enable convolution static quantisation.

554a2b7

src: cpu: aarch64: Enable matmul static quantisation.

360a144

renato-arantes force-pushed the master branch from 9c53fdc to 360a144 Compare December 16, 2024 11:02

mgouicem approved these changes Dec 16, 2024

View reviewed changes

theComputeKid merged commit f4fcbb6 into uxlfoundation:main Dec 16, 2024
21 of 22 checks passed

This was referenced Dec 19, 2024

Revert PR for static quantisation for both matmul and convolution #2293

Merged

Fix CI failing tests for matmul and convolution #2285

Closed

vpirogov mentioned this pull request Jan 8, 2025

src: cpu: aarch64: Re-enable matmul static quantisation through ACL. #2308

Merged

2 tasks

src: cpu: aarch64: Enable matmul and convolution static quantisation through ACL. #2198

src: cpu: aarch64: Enable matmul and convolution static quantisation through ACL. #2198

Uh oh!

Conversation

renato-arantes commented Nov 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

General

Uh oh!

Radu2k left a comment

Choose a reason for hiding this comment

Uh oh!

Radu2k commented Dec 10, 2024

Uh oh!

mgouicem Dec 11, 2024

Choose a reason for hiding this comment

Uh oh!

renato-arantes Dec 11, 2024

Choose a reason for hiding this comment

Uh oh!

mgouicem Dec 16, 2024

Choose a reason for hiding this comment

Uh oh!

renato-arantes Dec 16, 2024

Choose a reason for hiding this comment

Uh oh!

Radu2k commented Dec 13, 2024

Uh oh!

Uh oh!

Uh oh!

theComputeKid commented Dec 16, 2024

Uh oh!

renato-arantes commented Dec 16, 2024

Uh oh!

Uh oh!

vpirogov commented Dec 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vpirogov commented Dec 17, 2024

Uh oh!

theComputeKid commented Dec 17, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

renato-arantes commented Nov 4, 2024 •

edited

Loading

vpirogov commented Dec 17, 2024 •

edited

Loading