Qualcomm AI Engine Direct - Fix static cache #16423

chenweng-quic · 2026-01-02T03:59:53Z

Summary

Fix the static cache used by T5 and Whisper
Fix online prepare for mutable buffers
Fix minor issues

Test plan

python ${EXECUTORCH_ROOT}/backends/qualcomm/tests/test_qnn_delegate.py -k test_t5 --model SM8650 --build_folder build-android --host <host> --device <device_id> --executorch_root ${EXECUTORCH_ROOT} --artifact_dir ./t5_artifact --qa_dataset SQuAD-v1.1.csv
python ${EXECUTORCH_ROOT}/backends/qualcomm/tests/test_qnn_delegate.py -k test_whisper --model SM8650 --build_folder build-android --host <host> --device <device_id> --executorch_root ${EXECUTORCH_ROOT} --artifact_dir ./whisper_artifact_dir

cc @cccclai @cbilgin

- Fix t5 and whisper
- Fix online prepare for mutable buffer
- Fix minor issues

pytorch-bot · 2026-01-02T03:59:57Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16423

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

B200 runners are down due to network issues

✅ You can merge normally! (1 Unrelated Failure)

As of commit ba138a5 with merge base c730feb:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

pull / android / run-emulator (gh) (#16137)
Timeout waiting for emulator to boot.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-01-02T04:00:34Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

mergennachin

See inline comments

mergennachin · 2026-01-02T18:16:43Z

backends/qualcomm/builders/op_copy.py

+            )
+            multiples = []
+            for i in range(len(reshape_tensor.shape)):
+                multiples.append(output_tensor.shape[i] // reshape_tensor.shape[i])


assert output_tensor.shape[i] % reshape_tensor.shape[i] == 0, f"Shape mismatch at dim {i}: {output_tensor.shape[i]} not divisible by {reshape_tensor.shape[i]}"
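
A minimal sketch of the divisibility check suggested above, written in plain Python with tuples standing in for tensor shapes. The helper name `compute_tile_multiples` and the extra rank check are assumptions for illustration and are not taken from the PR.

```python
from typing import List, Sequence


def compute_tile_multiples(
    output_shape: Sequence[int], reshape_shape: Sequence[int]
) -> List[int]:
    """Return per-dimension repeat counts, failing loudly on a non-integer ratio."""
    # Assumption: both shapes have the same rank (not checked in the original loop).
    assert len(output_shape) == len(reshape_shape), (
        f"Rank mismatch: {len(output_shape)} vs {len(reshape_shape)}"
    )
    multiples = []
    for i, (out_dim, in_dim) in enumerate(zip(output_shape, reshape_shape)):
        # Without this check, floor division would silently truncate the multiple.
        assert in_dim > 0 and out_dim % in_dim == 0, (
            f"Shape mismatch at dim {i}: {out_dim} not divisible by {in_dim}"
        )
        multiples.append(out_dim // in_dim)
    return multiples


print(compute_tile_multiples((4, 8, 16), (1, 8, 4)))  # [4, 1, 4]
```

With shapes such as `(5,)` and `(2,)`, the assertion fires instead of producing a multiple of 2 that would tile to the wrong size.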

mergennachin · 2026-01-02T18:17:51Z

examples/qualcomm/oss_scripts/whisper/whisper_model.py

+        for idx in range(len(self.static_cache.layers)):
+            self.register_buffer(f"key_cache_{idx}", self.static_cache.layers[idx].keys)
+        for idx in range(len(self.static_cache.layers)):
+            self.register_buffer(
+                f"value_cache_{idx}", self.static_cache.layers[idx].values
+            )


nitpick: can do in one loop
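
A sketch of what the single-loop version might look like. To keep it runnable without PyTorch, `BufferHolder` is a hypothetical stand-in for `torch.nn.Module` that only records buffer names, and the layer objects are plain namespaces; neither comes from the PR.

```python
from types import SimpleNamespace


class BufferHolder:
    """Hypothetical stand-in for torch.nn.Module; it only records buffers by name."""

    def __init__(self, layers):
        self.buffers = {}
        # One pass registers both the key and the value cache of each layer.
        for idx, layer in enumerate(layers):
            self.register_buffer(f"key_cache_{idx}", layer.keys)
            self.register_buffer(f"value_cache_{idx}", layer.values)

    def register_buffer(self, name, tensor):
        # Mimics the (name, tensor) call shape of nn.Module.register_buffer.
        self.buffers[name] = tensor


# Placeholder strings stand in for the real cache tensors.
layers = [SimpleNamespace(keys=f"k{i}", values=f"v{i}") for i in range(2)]
holder = BufferHolder(layers)
print(list(holder.buffers))
# ['key_cache_0', 'value_cache_0', 'key_cache_1', 'value_cache_1']
```

Note that a single loop interleaves key and value registrations, so the registration order differs from the two-loop version. The thread does not say whether that order matters for the exported program.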

mergennachin · 2026-01-02T18:21:33Z

backends/qualcomm/quantizer/qconfig.py

 ) -> QuantizationConfig:
-    # the smallest scale: 0.0001 / 255
-    extra_args: Dict[str, Any] = {"eps": 2**-21}
+    # the smallest scale defaults to 0.0001 / 255


# At top of file
DEFAULT_EPS_8BIT = 0.0001 / 255
DEFAULT_EPS_16BIT = 0.0001 / 65535

# In functions
extra_args: Dict[str, Any] = {"eps": eps if eps else DEFAULT_EPS_8BIT}

You are repeating this pattern over and over again: 2 times for 8-bit and 6 times for 16-bit.
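
One way the repeated pattern could be factored out, sketched under assumptions: the helper name `build_extra_args` and its signature are hypothetical and do not appear in `qconfig.py`. The sketch uses an `is None` test rather than truthiness, so an explicitly passed `eps` is always honored.

```python
from typing import Any, Dict, Optional

# Smallest scale defaults, as proposed in the review comment.
DEFAULT_EPS_8BIT = 0.0001 / 255
DEFAULT_EPS_16BIT = 0.0001 / 65535


def build_extra_args(
    eps: Optional[float] = None, default_eps: float = DEFAULT_EPS_8BIT
) -> Dict[str, Any]:
    """Hypothetical helper: pick the caller's eps, else the bit-width default."""
    return {"eps": default_eps if eps is None else eps}


# 8-bit config with the default, 16-bit config with the default, and an override.
print(build_extra_args())
print(build_extra_args(default_eps=DEFAULT_EPS_16BIT))
print(build_extra_args(eps=2**-21))
```

Each quantization config function would then call the helper with the default that matches its bit width instead of repeating the conditional inline.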

Qualcomm AI Engine Direct - Fix t5 and whisper

2e817da

- Fix t5 and whisper
- Fix online prepare for mutable buffer
- Fix minor issues

chenweng-quic requested a review from cccclai as a code owner January 2, 2026 03:59

meta-cla bot added the CLA Signed label Jan 2, 2026

chenweng-quic added the module: qnn label Jan 2, 2026

chenweng-quic and others added 2 commits January 2, 2026 14:06

fix ConvertBmmToMatmul pass

e2db633

Update convert_bmm_to_matmul.py

ba138a5

mergennachin approved these changes Jan 2, 2026
