
run_mlm_flax on TPU v5 pods #35205

@peregilk

System Info

Latest versions of both transformers and jax

Who can help?

@ArthurZucker I am trying to use run_mlm_flax.py to train a RoBERTa model on a v5-256 pod. However, while a single v3-8 is capable of running with per_device_batch_size=128, the v5-256 is only able to run with per_device_batch_size=2. Any ideas?
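
For context, a minimal sketch (assuming the v5-256 exposes 256 chips in total, that the quoted batch sizes are per device, and plain data parallelism as in the example scripts) of how the per-device batch size combines with the device count into the global batch size:

```python
import jax

# On a multi-host pod each process only sees its local chips, while
# jax.device_count() reports the global chip count across all hosts.
print("process:", jax.process_index(),
      "local devices:", jax.local_device_count(),
      "global devices:", jax.device_count())

per_device_batch_size = 2  # value that fits on the v5-256 in this report
global_batch_size = per_device_batch_size * jax.device_count()

# Assuming 256 chips on the v5-256, this gives 2 * 256 = 512 examples per step,
# versus 128 * 8 = 1024 on a single v3-8.
print("global batch size:", global_batch_size)
```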

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Using the default example code, unmodified.
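
A rough, hypothetical sketch of the data-parallel pattern the example script follows (not the actual script; shapes and names here are illustrative): each local device receives its own per_device_batch_size slice, so per-device memory use should in principle be independent of pod size.

```python
import jax
import jax.numpy as jnp
import numpy as np

per_device_batch_size = 2
seq_len = 512  # illustrative sequence length

# Each local device gets one slice of the batch along the leading axis.
batch = np.zeros((jax.local_device_count(), per_device_batch_size, seq_len),
                 dtype=np.int32)

@jax.pmap
def forward(x):
    # Placeholder standing in for the model's forward pass.
    return jnp.sum(x, axis=-1)

out = forward(batch)
print(out.shape)  # (local_device_count, per_device_batch_size)
```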

Expected behavior

I would expect a v5-256 pod to run a lot faster here than a single v3-8.
