Fix: Falcon processor doesn't account for a layout difference of qkv between transformers and GGUF #35088
base: main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
LGTM! cc @Isotr0py for a second look. In this refactor PR, he removed `model_size`. Maybe there is a better check to deduce whether `new_decoder_architecture` should be set to `True` or not.
```python
if model_size == "40b":
    parsed_parameters["config"]["new_decoder_architecture"] = True
```
Can you add tests for the 40b model, even if we won't run them on the CI (add @Skip)?
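For reference, a rough sketch of what such a skipped test could look like; the repo id, file name, and generation check below are placeholders and are not taken from the actual test suite:

```python
import unittest

from transformers import AutoModelForCausalLM, AutoTokenizer


class FalconGgufIntegrationTest(unittest.TestCase):
    @unittest.skip("Falcon 40B GGUF is too large to run on CI")
    def test_falcon_40b_q2_k(self):
        # Placeholder repo/file, based on the checkpoint linked later in this thread.
        model_id = "maddes8cht/tiiuae-falcon-40b-instruct-gguf"
        gguf_file = "tiiuae-falcon-40b-instruct-Q2_K.gguf"

        tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=gguf_file)
        model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=gguf_file)

        inputs = tokenizer("Hello", return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=10)
        # Only check that generation runs and produces new tokens.
        self.assertGreater(out.shape[1], inputs["input_ids"].shape[1])
```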
In fact, I'm thinking about avoiding the `model_size` extracted from the file name, because some user fine-tuned models may use custom filenames without "40b" in them explicitly. (I also removed the `model_size` logic in #34385.) Since `new_decoder_architecture` implies two layernorms for attention in each decoder layer, I would prefer to check for the existence of `attn_norm_2` to determine `new_decoder_architecture`. You can refer to https://huggingface.co/maddes8cht/tiiuae-falcon-40b-instruct-gguf?show_file_info=tiiuae-falcon-40b-instruct-Q2_K.gguf, which indeed has `attn_norm_2` params.
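As a rough illustration, assuming the `gguf` Python package and a locally downloaded checkpoint (the path below is a placeholder), the check can be done directly on the tensor names:

```python
from gguf import GGUFReader

# Placeholder path to a locally downloaded Falcon GGUF checkpoint.
reader = GGUFReader("tiiuae-falcon-40b-instruct-Q2_K.gguf")

# Falcon 40B decoder layers carry a second attention layernorm ("attn_norm_2"),
# so its presence is a reasonable signal for new_decoder_architecture.
has_attn_norm_2 = any("attn_norm_2" in tensor.name for tensor in reader.tensors)
print("new_decoder_architecture:", has_attn_norm_2)
```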
Suggested change:

```diff
- if model_size == "40b":
-     parsed_parameters["config"]["new_decoder_architecture"] = True
+ new_decoder_architecture = any("attn_norm_2" in tensor.name for tensor in reader.tensors)
+ parsed_parameters["config"]["new_decoder_architecture"] = new_decoder_architecture
```
cc @MekkCyber
What does this PR do?
For Falcon 40B in the `transformers` modeling code, the Q, K, and V tensors are fused and stored in an interleaved manner. This means that, for each group, the Q tensors for all heads in the group are stacked together, followed by the K and V matrices of that group. However, in the GGUF layout, `falcon.tensor_data_layout` is set to `jploski`, which changes how the fused Q, K, and V tensors are stored (more info here). In this layout, the Q, K, and V tensors are stored sequentially instead of interleaved. This sequential storage makes them easier to use on the `llama_cpp` side. To handle this difference, the PR processes the `qkv` tensors to convert them back into the interleaved format required for `transformers` modeling.

Additionally, this PR adds a new field, `new_decoder_architecture`, to the configuration. This is necessary to ensure the Falcon modeling code handles the `fused_qkv` sizes correctly. The PR also fixes the name of the `num_key_value_heads` field in the Falcon configuration, changing it to the correct name, `num_kv_heads`.
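To make the layout difference concrete, here is a minimal numpy sketch of the kind of conversion described above. It is not the PR's actual code; the function name, argument names, and shape assumptions are illustrative only:

```python
import numpy as np


def interleave_falcon_qkv(weights, num_attention_heads, num_kv_heads, head_dim, hidden_size):
    """Convert a fused qkv weight from the sequential GGUF ("jploski") layout,
    where all Q rows come first, then all K rows, then all V rows, into a
    per-group interleaved layout of [Q heads of the group, K, V]."""
    q_rows = num_attention_heads * head_dim
    kv_rows = num_kv_heads * head_dim
    q, k, v = np.split(weights, [q_rows, q_rows + kv_rows], axis=0)

    heads_per_group = num_attention_heads // num_kv_heads
    q = q.reshape(num_kv_heads, heads_per_group * head_dim, hidden_size)
    k = k.reshape(num_kv_heads, head_dim, hidden_size)
    v = v.reshape(num_kv_heads, head_dim, hidden_size)

    # Stack each group as [Q heads of the group, K, V], then flatten back to 2D.
    qkv = np.concatenate([q, k, v], axis=1)
    return qkv.reshape(-1, hidden_size)
```

With Falcon 40B's configuration (128 attention heads, 8 KV groups, head_dim 64, hidden_size 8192), both the input and the output of this function have shape (9216, 8192); only the row ordering changes.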
Who can review?
@SunMarc @LysandreJik