
Fix: Falcon processor doesn't account for a layout difference of qkv between transformers and GGUF #35088

Open
wants to merge 8 commits into main

Conversation

MekkCyber
Contributor

What does this PR do?

For Falcon 40B, the transformers modeling code stores the fused Q, K, and V tensors in an interleaved manner: for each KV group, the Q tensors of all heads in the group are stacked together, followed by that group's K and V matrices. In the GGUF layout, however, falcon.tensor_data_layout is set to jploski, which changes how the fused Q, K, and V tensors are stored (more info here): the Q, K, and V tensors are stored sequentially instead of interleaved, which is easier to consume on the llama_cpp side. To handle this difference, the PR processes the qkv tensors to convert them back into the interleaved format required by the transformers modeling code.
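
As an illustration, here is a minimal sketch of such a de-interleaving step, assuming a fused weight of shape [(num_heads + 2 * num_kv_heads) * head_dim, hidden_size]; the helper name and exact reshapes below are illustrative, not the PR's actual implementation:

```python
import torch

def jploski_qkv_to_interleaved(qkv_weight, num_heads, num_kv_heads, head_dim):
    """Hypothetical helper: convert a fused QKV weight stored sequentially
    (GGUF jploski layout: all Q rows, then all K rows, then all V rows) into
    the per-group interleaved layout described above."""
    hidden_size = qkv_weight.shape[-1]
    q_rows = num_heads * head_dim
    kv_rows = num_kv_heads * head_dim
    q, k, v = torch.split(qkv_weight, [q_rows, kv_rows, kv_rows], dim=0)

    heads_per_group = num_heads // num_kv_heads
    # Group the query heads by KV group, then stack [Q heads, K head, V head]
    # for each group, which is the interleaved ordering used by transformers.
    q = q.reshape(num_kv_heads, heads_per_group * head_dim, hidden_size)
    k = k.reshape(num_kv_heads, head_dim, hidden_size)
    v = v.reshape(num_kv_heads, head_dim, hidden_size)
    return torch.cat([q, k, v], dim=1).reshape(-1, hidden_size)
```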

Additionally, this PR adds a new field, new_decoder_architecture, to the configuration; it is needed so that the Falcon modeling code computes the fused_qkv sizes correctly. The PR also fixes the name of the num_key_value_heads field in the Falcon configuration, renaming it to the correct name, num_kv_heads.
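
For context, a hedged sketch of what these two config-side changes amount to; the dict name and GGUF keys below are placeholders rather than the exact ones used in transformers:

```python
# Hypothetical GGUF-metadata-to-FalconConfig mapping (names are illustrative).
FALCON_GGUF_CONFIG_MAPPING = {
    "attention.head_count": "num_attention_heads",
    # Renamed target: FalconConfig exposes `num_kv_heads`, not `num_key_value_heads`.
    "attention.head_count_kv": "num_kv_heads",
}

# New field filled in while parsing the GGUF file, so that the Falcon modeling
# code computes the fused_qkv sizes correctly (e.g. True for Falcon 40B).
parsed_parameters = {"config": {}}
parsed_parameters["config"]["new_decoder_architecture"] = True
```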

Who can review?

@SunMarc @LysandreJik

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@MekkCyber MekkCyber requested a review from SunMarc December 10, 2024 15:26
Member

@SunMarc left a comment

LGTM! cc @Isotr0py for a second look. In this refactor PR, he removed model_size. Maybe there is a better check to deduce whether new_decoder_architecture should be set to True or not.

Comment on lines +361 to +362
if model_size == "40b":
    parsed_parameters["config"]["new_decoder_architecture"] = True
Member

Can you add tests for the 40b even if we won't run them on the CI (add @Skip)?

Collaborator

@Isotr0py Dec 10, 2024

In fact, I'm thinking about avoiding the use of model_size extracted from the file name, because some user fine-tuned models may use custom filenames without "40b" in them explicitly. (I also removed the model_size logic in #34385.)

Since "new_decoder_architecture" implies 2 layernorms for attention in each decoder layer, I would prefer to check for the existence of attn_norm_2 to determine "new_decoder_architecture".

You can refer to https://huggingface.co/maddes8cht/tiiuae-falcon-40b-instruct-gguf?show_file_info=tiiuae-falcon-40b-instruct-Q2_K.gguf, which indeed has attn_norm_2 params.

Collaborator

Suggested change
- if model_size == "40b":
-     parsed_parameters["config"]["new_decoder_architecture"] = True
+ new_decoder_architecture = any("attn_norm_2" in tensor.name for tensor in reader.tensors)
+ parsed_parameters["config"]["new_decoder_architecture"] = new_decoder_architecture

