Add normalized and group-boundary AutoQuant scoring by meenchen · Pull Request #1878 · NVIDIA/Model-Optimizer

meenchen · 2026-07-01T19:58:12Z

What does this PR do?

Type of change: New feature

This PR adds two complementary AutoQuant scoring controls:

constraints["score_model"]="per_element" normalizes selector coefficients by the number of represented weight elements while preserving the configured effective-bit budget.
method="group_recon" with score_boundary="group" measures normalized reconstruction error at shared attention or MLP group outputs. This changes where sensitivity is measured without forcing the corresponding projection modules to share a quantization recipe.
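As a rough illustration of the first control, the sketch below divides each raw sensitivity score by the number of weight elements it represents. The function name, the dictionary layout, and the numbers are hypothetical and chosen only for this example; the PR's actual selector code is not shown here.

```python
def per_element_scores(raw_scores, num_elements):
    """Divide each raw score by the number of weight elements it represents.

    Both arguments are hypothetical dicts keyed by module name.
    """
    normalized = {}
    for name, score in raw_scores.items():
        count = num_elements[name]
        if count <= 0:
            raise ValueError(f"{name} must represent at least one weight element")
        normalized[name] = score / count
    return normalized


# Toy values: the larger module has the larger raw score,
# but the smaller module is more sensitive per weight element.
raw = {"q_proj": 8.0, "down_proj": 12.0}
sizes = {"q_proj": 4, "down_proj": 12}
print(per_element_scores(raw, sizes))  # {'q_proj': 2.0, 'down_proj': 1.0}
```

With raw scores, down_proj ranks as more sensitive simply because it holds more weights; after normalization the ordering flips. The sketch leaves the budget untouched, in line with the description's note that the effective-bit budget is preserved.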

It also keeps shared-expert gate/up/down projections in one deployable fused-MoE recipe decision, persists the new scoring metadata in AutoQuant checkpoints, validates restored-state compatibility, and exposes the controls through the HF PTQ CLI.
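One way to picture the restored-state compatibility check is a comparison between the scoring metadata stored in a checkpoint and the settings requested for the current run. The sketch below is an assumption for illustration only: the key names mirror the options named in this description, but the function, the dictionary layout, and the choice to treat missing keys as the documented defaults are hypothetical.

```python
# Defaults as stated in this PR description.
DEFAULTS = {"score_model": "raw", "method": "gradient", "score_boundary": "local"}


def check_restored_state(saved_metadata, requested):
    """Raise ValueError if a restored search state used different scoring settings.

    Missing keys fall back to the defaults (an assumption made here so that
    checkpoints written before these options existed would still compare cleanly).
    """
    mismatches = []
    for key, default in DEFAULTS.items():
        saved = saved_metadata.get(key, default)
        wanted = requested.get(key, default)
        if saved != wanted:
            mismatches.append(f"{key}: checkpoint has {saved!r}, run requested {wanted!r}")
    if mismatches:
        raise ValueError("Incompatible AutoQuant search state: " + "; ".join(mismatches))


# An older checkpoint with no scoring metadata matches a default run.
check_restored_state({}, {"method": "gradient"})
```

Failing loudly on a mismatch avoids silently reusing scores that were computed at a different boundary or with a different normalization.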

Existing behavior remains the default: score_model="raw", method="gradient", and score_boundary="local".

Usage

import modelopt.torch.quantization as mtq

# model, calib_dataloader, and forward_step are assumed to be defined by the caller.
model, search_state = mtq.auto_quantize(
    model,
    constraints={
        "effective_bits": 5.5,
        "cost_model": "active_moe",
        "score_model": "per_element",
    },
    data_loader=calib_dataloader,
    forward_step=forward_step,
    quantization_formats=[mtq.NVFP4_DEFAULT_CFG, mtq.FP8_DEFAULT_CFG],
    method="group_recon",
    score_boundary="group",
)

Equivalent HF PTQ flags:

--auto_quantize_method group_recon \
--auto_quantize_score_model per_element \
--auto_quantize_score_boundary group
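For readers who want to see how flags like these could be declared, here is a minimal argparse sketch. It is not the HF PTQ script's actual parser: only the flag names, the new values, and the defaults come from this description, and the method flag is left unrestricted because the full list of supported methods is not stated here.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the three flags named in this PR."""
    parser = argparse.ArgumentParser(description="AutoQuant scoring flags (sketch)")
    # Left as a free-form string: other method names may exist.
    parser.add_argument("--auto_quantize_method", default="gradient")
    parser.add_argument(
        "--auto_quantize_score_model", choices=["raw", "per_element"], default="raw"
    )
    parser.add_argument(
        "--auto_quantize_score_boundary", choices=["local", "group"], default="local"
    )
    return parser


args = build_parser().parse_args(
    [
        "--auto_quantize_method", "group_recon",
        "--auto_quantize_score_model", "per_element",
        "--auto_quantize_score_boundary", "group",
    ]
)
print(args.auto_quantize_method, args.auto_quantize_score_model, args.auto_quantize_score_boundary)
```

Omitting all three flags yields the existing defaults, which matches the backward-compatibility statement above.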

Testing

72 passed: focused non-distributed AutoQuant unit suite
1 passed: distributed AutoQuant checkpoint/search test
3 passed: HF PTQ argument tests
All changed-file pre-commit hooks passed, including Ruff, mypy, RST checks, and Bandit
bash -n passed for the modified HF PTQ shell entry points
Qwen3.6-35B end-to-end checkpoint generation and vLLM serving smoke test passed on the implementation-equivalent pre-rename revision; the terminology-only revision is covered by the unit and CLI tests above

Before your PR is "Ready for review"

Make sure you read and follow the Contributor guidelines and that your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

Is this change backward compatible?: ✅
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
Did you write any new necessary tests?: ✅
Did you update Changelog?: ✅
Did you get Claude approval on this PR?: N/A (draft)

Additional Information

The terms group scoring and recipe grouping are intentionally distinct: group scoring changes the output boundary used to measure a projection's perturbation, while recipe grouping constrains multiple modules to use the same quantization format.
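The effect of moving the scoring boundary can be sketched with a toy two-projection group. Everything below is an assumption for illustration: the relative squared error used as the "normalized reconstruction error", the tiny matrices, and the helper names are not taken from the PR's implementation.

```python
def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def normalized_error(reference, perturbed):
    """Relative squared error: ||ref - pert||^2 / ||ref||^2 (0.0 if ref is all zeros)."""
    num = sum((r - p) ** 2 for r, p in zip(reference, perturbed))
    den = sum(r ** 2 for r in reference)
    return num / den if den > 0 else 0.0


# Toy group: an "up" projection followed by a "down" projection.
up = [[1.0, 0.0], [0.0, 1.0]]
up_q = [[1.0, 0.0], [0.0, 0.5]]  # stand-in for quantized "up" weights
down = [[1.0, 0.0]]              # the group output ignores the second channel
x = [2.0, 2.0]

# Local boundary: compare the projection's own outputs.
local_err = normalized_error(matvec(up, x), matvec(up_q, x))

# Group boundary: compare the shared group outputs instead.
group_err = normalized_error(
    matvec(down, matvec(up, x)), matvec(down, matvec(up_q, x))
)

print(local_err, group_err)  # 0.125 0.0
```

Only the "up" projection is perturbed in both measurements, so the group boundary changes where its sensitivity is read off without tying its format choice to the "down" projection. That is the group-scoring side of the distinction; recipe grouping would instead force both projections to pick the same format.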

Signed-off-by: weimingc <[email protected]>

copy-pr-bot · 2026-07-01T19:58:16Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-07-01T19:58:46Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 945959f4-2af6-4f44-bcc8-39c777ebe0fe


github-actions · 2026-07-01T20:09:27Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1878/
Built to branch `gh-pages` at 2026-07-01 20:09 UTC. Preview will be ready when the GitHub Pages deployment is complete.

codecov · 2026-07-01T20:16:09Z

Codecov Report

❌ Patch coverage is 94.83871% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.26%. Comparing base (1b03381) to head (876f6e4).
⚠️ Report is 1 commit behind head on main.

Files with missing lines	Patch %	Lines
modelopt/torch/quantization/algorithms.py	94.44%	8 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1878      +/-   ##
==========================================
+ Coverage   61.17%   61.26%   +0.09%     
==========================================
  Files         515      515              
  Lines       57207    57357     +150     
==========================================
+ Hits        34994    35141     +147     
- Misses      22213    22216       +3

Flag	Coverage Δ
unit	`55.02% <94.83%> (+0.11%)`	⬆️


Add normalized and group-boundary AutoQuant scoring

876f6e4

Signed-off-by: weimingc <[email protected]>

meenchen wants to merge 1 commit into main from weimingc/autoquant-per-element-group-scoring
