NPUW: Fix cases when FOLDing is not enabled #28255

Merged
4 commits merged into openvinotoolkit:master from dm/npuw_fix_nofold on Jan 6, 2025

@dmatveev dmatveev (Contributor) commented Jan 2, 2025

Details:

  • CompiledModel: do API-modifying transformations before passing the model down to ICompiledModel constructor

    • Without that, compiling a model with weights and SLICE_OUT:YES caused a mismatch between the allocated and the set tensor sizes. NPUW's CompiledModel first initializes its inputs() / outputs() via the parent class, ov::ICompiledModel, and then modifies the shape of the output tensor if the SLICE_OUT transformation was applied.
    • When an Infer Request is created, its outputs are allocated based either on the model information (above) or on the function information, if the output is produced by a function.
    • SLICE_OUT altered function bodies, so when the last graph's output came from a function (WITH folding), the metadata mismatch went unnoticed. When folding is disabled, however, the output tensor is created based on the original (untransformed) model information and is then passed to a transformed subgraph, causing an assert failure.
    • Moving the parent-model-side API-altering transformations before the initialization of NPUW's CompiledModel solved the problem.
  • UnfoldInferRequest: run unpack_closure for functions only.

    • If FOLDing is disabled, there are no functions, but unpack_closure was still called, leading to an assert.
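
The "functions only" guard for unpack_closure can be illustrated with a small standalone sketch. It is not the NPUW code: `Subgraph`, `function_body`, and `prepare_all` are hypothetical names made up for this example, and `unpack_closure` here is only a stand-in that asserts when it is handed a subgraph that is not a function call.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical description of one subgraph (not an OpenVINO type).
struct Subgraph {
    // Index of the function body this subgraph calls;
    // empty when the subgraph is not a function call (e.g. folding is off).
    std::optional<std::size_t> function_body;
};

// Stand-in for unpack_closure: it is only meaningful for function calls
// and asserts otherwise, mirroring the failure described above.
void unpack_closure(const Subgraph& sg) {
    assert(sg.function_body.has_value() && "unpack_closure needs a function");
    // ... closure tensors for the function body would be unpacked here ...
}

// Walks all subgraphs and unpacks closures for function calls only.
// Returns how many closures were unpacked.
std::size_t prepare_all(const std::vector<Subgraph>& subgraphs) {
    std::size_t unpacked = 0;
    for (const auto& sg : subgraphs) {
        if (sg.function_body.has_value()) {  // the guard: functions only
            unpack_closure(sg);
            ++unpacked;
        }
    }
    return unpacked;
}
```

With no function calls at all (the no-folding case), `prepare_all` simply returns 0 instead of tripping the assertion.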

Tickets:

  • ticket-id

- CompiledModel: do API-modifying transformations before passing
  the model down to ICompiledModel constructor
- UnfoldInferRequest: run unpack_closure for functions only
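
The ordering issue behind the first item can be sketched in isolation. This is a minimal illustration under stated assumptions, not the actual change: `Model`, `BaseCompiled`, `TransformAfter`, `TransformBefore`, and `slice_out` are hypothetical stand-ins, and `slice_out` merely shrinks one output dimension to mimic a transformation that changes the output shape.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All names in this sketch are hypothetical, not OpenVINO types.
using Shape = std::vector<std::size_t>;

struct Model {
    Shape output_shape;
};

// Stand-in for an API-modifying pass: shrinks axis 1 of the output to 1.
Model slice_out(Model m) {
    if (m.output_shape.size() > 1) {
        m.output_shape[1] = 1;
    }
    return m;
}

// Stand-in for the parent class: snapshots output metadata at construction.
class BaseCompiled {
public:
    explicit BaseCompiled(const Model& m) : m_output_shape(m.output_shape) {}
    const Shape& output_shape() const { return m_output_shape; }
private:
    Shape m_output_shape;
};

// Problematic order: the base sees the original model,
// while the subgraph that will run is transformed afterwards.
class TransformAfter : public BaseCompiled {
public:
    explicit TransformAfter(const Model& m)
        : BaseCompiled(m), m_subgraph(slice_out(m)) {}
    bool shapes_agree() const { return output_shape() == m_subgraph.output_shape; }
private:
    Model m_subgraph;
};

// Fixed order: the transformation runs before the base is initialized,
// so the advertised output shape matches what the subgraph produces.
class TransformBefore : public BaseCompiled {
public:
    explicit TransformBefore(const Model& m)
        : BaseCompiled(slice_out(m)), m_subgraph(slice_out(m)) {
        assert(output_shape() == m_subgraph.output_shape);
    }
    bool shapes_agree() const { return output_shape() == m_subgraph.output_shape; }
private:
    Model m_subgraph;
};
```

For a model with output shape `{1, 1024, 32000}`, `TransformAfter` ends up advertising the original shape while its subgraph produces `{1, 1, 32000}`; `TransformBefore` advertises the transformed shape, so a tensor allocated from the advertised metadata fits the subgraph.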
@dmatveev dmatveev added this to the 2025.0 milestone Jan 2, 2025
@dmatveev dmatveev self-assigned this Jan 2, 2025
@dmatveev dmatveev requested review from a team as code owners January 2, 2025 17:32
@github-actions github-actions bot added category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin labels Jan 2, 2025
@dmatveev dmatveev added this pull request to the merge queue Jan 6, 2025
Merged via the queue into openvinotoolkit:master with commit dd6a128 Jan 6, 2025
161 checks passed
@dmatveev dmatveev deleted the dm/npuw_fix_nofold branch January 6, 2025 17:19
MirceaDan99 pushed a commit to MirceaDan99/openvino that referenced this pull request Jan 22, 2025