HIGGS Quantization Support #34997
Conversation
cc @MekkCyber
Failed tests look like a problem on the runner's end.
Thanks for integrating this new quantization method so fast! I left some comments, and don't forget to also update the documentation so that users know how to use it!
Co-authored-by: Marc Sun <[email protected]>
Hey @BlackSamorez, thanks for adding this quantization method so quickly! I added some very small nits.
```docker
RUN python3 -m pip install git+https://github.com/NetEase-FuXi/EETQ.git

# Add flute-kernel and fast_hadamard_transform for quantization testing
RUN python3 -m pip install --no-cache-dir flute-kernel==0.2.6
```
The docker image will be deployed on an instance with CUDA 11.8, but on the FLUTE GitHub I noticed you need to specify https://flute-ai.github.io/whl/cu118 in that case.
Thanks, updated.
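For context, a sketch of what the CUDA-11.8-specific install could look like, pointing pip at the wheel index mentioned above; the exact flag placement follows the pattern in the FLUTE README and is an assumption here:

```docker
# Hypothetical variant of the RUN line above for a CUDA 11.8 host,
# installing from the FLUTE wheel index instead of PyPI:
RUN python3 -m pip install --no-cache-dir flute-kernel==0.2.6 -i https://flute-ai.github.io/whl/cu118
```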
```python
nb_fbgemm_linear = 0
for module in model.modules():
    if isinstance(module, HiggsLinear):
        nb_fbgemm_linear += 1
```
I think you meant `nb_higgs_linear` 😉
Sure. Fixed
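For reference, the renamed counter can also be written compactly; a minimal sketch, assuming `HiggsLinear` is the quantized layer class added in this PR:

```python
# Count HIGGS-quantized linear layers after module replacement.
nb_higgs_linear = sum(isinstance(m, HiggsLinear) for m in model.modules())
```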
```python
for m in module_tree:
    parent = parent._modules[m]
return parent
```
Sorry if I'm mistaken, but I don't believe we use this function anywhere.
Removed the unused function. Thanks!
Co-authored-by: Mohamed Mekkouri <[email protected]>
@SunMarc Hi! Is there a chance you could take a look at it this week? We wanted to release it before NeurIPS.
I'll do that now. When is the deadline? After my review, I still need this to be reviewed by a core maintainer.
There isn't a hard deadline, but it would be very nice to have this merged this week or early next week.
SunMarc left a comment:
Thanks for your work! Just a few nits.
docs/source/en/quantization/higgs.md (Outdated)
## Usage Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, HiggsConfig

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    quantization_config=HiggsConfig(bits=4),
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")

tokenizer.decode(model.generate(
    **tokenizer("Hi,", return_tensors="pt").to(model.device),
    temperature=0.5,
    top_p=0.80,
)[0])
```
Small suggestion: what would be even better is to link a Colab notebook as a demo.
Sadly, T4 is not among the supported GPUs yet, so no Colab demo for now.
Makes sense, thanks! It would be nice to add a section specifying which GPUs work with it.
# HIGGS

HIGGS is a 0-shot quantization algorithm that combines Hadamard preprocessing with MSE-Optimal quantization grids to achieve lower quantization error and SOTA performance. You can find more information in the paper [arxiv.org/abs/2411.17525](https://arxiv.org/abs/2411.17525).

Runtime support for HIGGS is implemented through [FLUTE](https://arxiv.org/abs/2407.10960) and its [library](https://github.com/HanGuo97/flute).
Small suggestion: if you can share pre-quantized models in an HF org and link them here, that would be nice as well!
Co-authored-by: Marc Sun <[email protected]>
Pre-quantized a bunch of models (including …
Gentle ping @ArthurZucker

Can you fix the CI with …
SunMarc left a comment:
Thanks for adding new features!
```python
model._modules[name].weight.data = module(
    torch.eye(in_features, device=module.scales.device, dtype=module.scales.dtype)
).T.contiguous()
```
Smart way to perform dequantization. This could be added to all quant methods, no? cc @MekkCyber
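For intuition, the trick works because a bias-free linear layer applied to the identity matrix returns its effective weight transposed: `linear(I) == I @ W.T == W.T`. A minimal sketch with a plain `nn.Linear` standing in for the quantized module (illustrative only, not the PR's code):

```python
import torch
import torch.nn as nn

# A plain nn.Linear stands in for a quantized layer such as HiggsLinear.
in_features, out_features = 8, 4
linear = nn.Linear(in_features, out_features, bias=False)

with torch.no_grad():
    # Feeding the identity through the layer materializes its effective
    # (dequantized) weight, transposed; .T recovers the weight itself.
    recovered = linear(torch.eye(in_features)).T.contiguous()

assert torch.allclose(recovered, linear.weight)
```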
ArthurZucker left a comment:
Thanks a lot for your hard work and for introducing this new method!
Super small nit and @SunMarc will merge 🤗
Things to recheck later: …
SunMarc left a comment:
Thanks for iterating! I will merge this when the CI is green.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Quantization tests are nightly, right?
That's right!
HIGGS 0-Shot Quantization
HIGGS is a new 0-shot quantization algorithm that combines Hadamard preprocessing with MSE-Optimal quantization grids to achieve lower quantization error and SOTA performance. You can find more information in the paper [arxiv.org/abs/2411.17525](https://arxiv.org/abs/2411.17525).
Runtime support for HIGGS is implemented through [FLUTE](https://arxiv.org/abs/2407.10960) and its [library](https://github.com/HanGuo97/flute).
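For intuition only, here is a toy sketch of the two ingredients named above. This is not the HIGGS/FLUTE implementation (which uses vector grids and fused GPU kernels); the scalar grid, group size, and function name are illustrative assumptions:

```python
import torch
from scipy.linalg import hadamard

def higgs_like_quantize(w: torch.Tensor, grid: torch.Tensor, group: int = 64) -> torch.Tensor:
    """Toy version: rotate weight groups with an orthonormal Hadamard transform
    (entries become approximately Gaussian), snap each entry to the nearest
    grid point, then rotate back to obtain the dequantized weights."""
    H = torch.tensor(hadamard(group), dtype=w.dtype) / group**0.5  # orthonormal: H @ H.T == I
    x = w.reshape(-1, group) @ H                                   # Hadamard preprocessing
    idx = (x.unsqueeze(-1) - grid).abs().argmin(dim=-1)            # nearest grid point per entry
    return (grid[idx] @ H.T).reshape(w.shape)                      # inverse rotation

# Example: snap a random weight matrix to a crude 8-point (3-bit) scalar grid.
w = torch.randn(256, 256)
w_hat = higgs_like_quantize(w, torch.linspace(-2.5, 2.5, 8))
print((w - w_hat).pow(2).mean())  # quantization MSE
```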
This PR adds support for HIGGS+FLUTE into `transformers`, allowing for low-error 0-shot quantization and fast LLM inference.

Fixes # (issue)
Before submitting

- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.