Add Zamba2 #34517

Draft · wants to merge 73 commits into main

Conversation

@pglorio (Contributor) commented Oct 30, 2024

What does this PR do?

This PR adds support for the Zamba2 architecture created by Zyphra Technologies.

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker

@pglorio marked this pull request as draft October 30, 2024 17:57
@pglorio (Contributor, Author) commented Nov 11, 2024

Hey @Arthur,

Thank you again for your help in getting Zamba2 into transformers! The PR is now finally ready to be reviewed. I added the documentation and all unit tests pass, including slow tests.

A few remarks, mostly related to modular transformers:

  1. To generate the modeling and configuration files I used utils/modular_model_converter.py from a previous commit, because the most recent version of this script, which followed from a large refactoring, produces an error that I was not able to fix:
Converting src/transformers/models/zamba2/modular_zamba2.py to a single model single file format
Traceback (most recent call last):
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1510, in <module>
    converted_files = convert_modular_file(file_name, args.old_model_name, args.new_model_name)
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1447, in convert_modular_file
    for file, module in create_modules(cst_transformers).items():
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1387, in create_modules
    nodes_to_add, file_type, new_imports = get_class_node_and_dependencies(modular_mapper, class_name, node, files)
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1337, in get_class_node_and_dependencies
    new_node_dependencies, new_imports = check_dependencies_and_create_import_node(
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1283, in check_dependencies_and_create_import_node
    class_dependencies = {dep for dep in new_dependencies if m.matches(mapper.global_nodes[dep], m.ClassDef())}
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1283, in <setcomp>
    class_dependencies = {dep for dep in new_dependencies if m.matches(mapper.global_nodes[dep], m.ClassDef())}
KeyError: 'Zamba2Config'

I carefully compared Zamba2Config with the config classes of other models that also use modular (such as Gemma2Config), and they appear to have a consistent format. Relatedly, the utils/modular_model_converter.py in the current PR (path) is the version from the previous commit mentioned above.

  2. After running utils/modular_model_converter.py, the generated modeling and configuration files contained unintended code that I had to update. All these modifications are in this commit. In particular, the produced modeling file contains Zamba2DynamicCache, which is the correct cache for Zamba2, as well as HybridMambaAttentionDynamicCache, which is the cache of Zamba and is not relevant to Zamba2, so I deleted HybridMambaAttentionDynamicCache and its related references.

  3. I ran make fixup and all zamba-related tests pass, with the exception of python utils/check_modular_conversion.py. That check fails because of the modifications mentioned in the previous point.

  4. I slightly edited the Zamba2MambaMixer compared to the original Mamba2Mixer of mamba2; the main difference is that I added these lines, which was necessary to appropriately process the mamba2 cache (note that this step already existed in the torch forward in these lines). See the sketch right after this list for the general shape of that step.
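
For context, the cache-processing step mentioned in point 4 is the usual Mamba-style update of the cached convolution state during single-token decoding. The snippet below is a minimal illustrative sketch, not the code added in this PR; the function name update_conv_state and the (batch, channels, kernel_size) buffer layout are assumptions modeled on the mamba2 implementation.

import torch

def update_conv_state(conv_states, layer_idx, hidden_states, conv_weight, conv_bias=None):
    # Illustrative sketch: shift the cached convolution window left by one slot,
    # append the newest token's features, and compute the depthwise causal-conv
    # output directly from the cache (single-token decoding path).
    conv_state = conv_states[layer_idx]                        # (batch, channels, kernel_size)
    conv_state = torch.roll(conv_state, shifts=-1, dims=-1)
    conv_state[:, :, -1] = hidden_states[:, 0, :]              # newest token, shape (batch, channels)
    conv_states[layer_idx] = conv_state
    out = torch.sum(conv_state * conv_weight[:, 0, :], dim=-1)  # depthwise conv as a weighted sum
    if conv_bias is not None:
        out = out + conv_bias
    return out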

Looking forward to your feedback. Thanks so much!

@pglorio (Contributor, Author) commented Dec 20, 2024

Thank you so much for this feedback @Cyrilvallez. I realized that the issue with the unit tests was inside the torch_forward method of the mamba2 mixer (when I ran them locally, the unit tests used the cuda_kernels method instead). I fixed that method here: 1 2 3.
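
(As background on why this only surfaced in CI: the mamba-style mixers choose between the fused CUDA kernels and the pure-PyTorch path at runtime. The sketch below is a simplified, self-contained illustration of that dispatch idea, assuming the same pattern as the upstream mamba2 mixer; pick_forward_path is a hypothetical helper, not transformers code.)

import torch

def pick_forward_path(module: torch.nn.Module, fast_kernels_available: bool) -> str:
    # The fused CUDA kernels are used only when the optional kernel packages are
    # installed and the weights live on a CUDA device; otherwise the pure-PyTorch
    # torch_forward path runs, which is why CPU-only CI exercised torch_forward
    # while a local GPU run exercised cuda_kernels_forward.
    on_cuda = any(p.device.type == "cuda" for p in module.parameters())
    return "cuda_kernels_forward" if fast_kernels_available and on_cuda else "torch_forward"

layer = torch.nn.Linear(4, 4)                                   # CPU module, e.g. what CI runs on
print(pick_forward_path(layer, fast_kernels_available=False))   # -> torch_forward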

By the way, we originally took the torch_forward from the mamba2 model, so the same issues hold there. In particular, running this:

import torch
from transformers import Mamba2Config, Mamba2ForCausalLM

config = Mamba2Config(
    num_heads=8,
    n_groups=8,
    state_size=2,
    head_dim=8,
    conv_kernel=4,
    chunk_size=8,
    vocab_size=99,
    hidden_size=32,
    num_hidden_layers=4,
    hidden_act="silu",
    hidden_dropout_prob=0.1,
    max_position_embeddings=512,
)
model = Mamba2ForCausalLM(config)

inputs = {
    "input_ids": torch.tensor(
        [[86, 6, 51, 3, 12, 15, 33, 18, 4, 92],
         [69, 66, 49, 45, 48, 44, 61, 56, 68, 85]]
    ),
    "attention_mask": torch.tensor(
        [[0, 0, 1, 1, 1, 0, 0, 0, 1, 0],
         [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]]
    ),
}

outputs_cpu = model.generate(**inputs, max_new_tokens=5, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
model = model.to('cuda')
inputs = {key: tensor.to(device=model.device) for key, tensor in inputs.items()}
outputs_cuda = model.generate(**inputs, max_new_tokens=5, return_dict_in_generate=False, output_scores=False, use_cache=True, num_beams=1, do_sample=False)
print(torch.all(outputs_cpu == outputs_cuda.cpu()).item())

returns False.

@pglorio mentioned this pull request Jan 7, 2025
@pglorio (Contributor, Author) commented Jan 14, 2025

Hi @Cyrilvallez and @ArthurZucker,

I updated the attention forward to the new standard of transformers here and here.

I ran all final tests, including @slow tests, and everything appears to pass!

@Cyrilvallez (Member) left a comment

Nice work for the refactor! Almost ready, left some final comments but overall quite nice! 🤗

src/transformers/models/zamba2/modular_zamba2.py: review comments on several hunks (outdated, resolved)
Comment on lines 1415 to 1417
"ZambaModelTester",
"Zamba2ModelTester",
"RwkvModelTester",
@Cyrilvallez (Member):

cc @ydshieh here to ensure this change is necessary, as I'm not familiar with this new part!

@pglorio (Contributor, Author) replied:

@ydshieh for context: when running this test, the model config is forced to have num_hidden_layers=1, but the other config parameters are not updated accordingly, so model initialization errors out because these parameters are no longer consistent. I imagine that is probably also why Zamba was added to this list. (A hypothetical illustration of the mismatch follows below.)
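
(To make the mismatch concrete, here is a hypothetical, self-contained illustration; TinyHybridConfig and layers_block_type are stand-ins, not the actual test or config code.)

class TinyHybridConfig:
    # Stand-in for a hybrid-model config whose per-layer settings are sized to the depth.
    def __init__(self, num_hidden_layers=6):
        self.num_hidden_layers = num_hidden_layers
        self.layers_block_type = (["mamba", "mamba", "hybrid"] * num_hidden_layers)[:num_hidden_layers]

config = TinyHybridConfig(num_hidden_layers=6)
config.num_hidden_layers = 1  # what the shared test forces, without touching the per-layer settings
inconsistent = len(config.layers_block_type) != config.num_hidden_layers
print(f"layers_block_type entries: {len(config.layers_block_type)}, "
      f"num_hidden_layers: {config.num_hidden_layers}, inconsistent: {inconsistent}")
# Building the model then iterates the per-layer settings and errors out on this mismatch.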

tests/models/zamba2/test_modeling_zamba2.py: additional review comments (resolved)
@pglorio (Contributor, Author) commented Jan 16, 2025

Thank you @Cyrilvallez for the review. I addressed the comments above, although there are a couple of pending points.

All zamba-related tests appear to pass.

@pglorio (Contributor, Author) commented Jan 17, 2025

Hello @Cyrilvallez, I ran all model tests on two GPUs, and after a couple of minor fixes everything appears to work now. I'm skipping this test as it gives an error related to mamba2 kernels; I verified that mamba2 indeed skips that test here.

Separately, when running utils/check_modular_conversion.py I get the following error:

Differences found between the generated code and src/transformers/models/zamba2/modeling_zamba2.py:

--- src/transformers/models/zamba2/modeling_zamba2.py_generated
+++ src/transformers/models/zamba2/modeling_zamba2.py
@@ -313,6 +313,13 @@
     return attn_output, attn_weights


+def rotate_half(x):
+    """Rotates half the hidden dims of the input."""
+    x1 = x[..., : x.shape[-1] // 2]
+    x2 = x[..., x.shape[-1] // 2 :]
+    return torch.cat((-x2, x1), dim=-1)
+
+
 def apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
     """Applies Rotary Position Embedding to the query and key tensors.

@@ -338,13 +345,6 @@
     q_embed = (q * cos) + (rotate_half(q) * sin)
     k_embed = (k * cos) + (rotate_half(k) * sin)
     return q_embed, k_embed
-
-
-def rotate_half(x):
-    """Rotates half the hidden dims of the input."""
-    x1 = x[..., : x.shape[-1] // 2]
-    x2 = x[..., x.shape[-1] // 2 :]
-    return torch.cat((-x2, x1), dim=-1)

which I was not getting before, even though this part is identical.
