System Info
Latest versions of both transformers and jax.
Who can help?
@ArthurZucker I am trying to use run_mlm_flax.py to train a RoBERTa model on a v5-256 pod. However, while a single v3-8 is capable of running with per_device_batch_size=128, the v5-256 is only able to run with per_device_batch_size=2. Any ideas?
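For context, here is a minimal diagnostic sketch (my own snippet, not part of the script) showing the device topology and the resulting global batch size on the pod. It assumes run_mlm_flax.py derives the global batch as the per-device value times jax.device_count(), which is how I read the script:

```python
import jax

# Diagnostic run on each host before launching training.
print("global devices:", jax.device_count())       # all chips in the slice
print("local devices:", jax.local_device_count())   # chips attached to this host
print("process:", jax.process_index(), "of", jax.process_count())

# Assumption: run_mlm_flax.py computes the global train batch as
# per_device_train_batch_size * jax.device_count(), so the same per-device
# value implies a much larger global batch on a v5-256 slice than on a v3-8.
per_device_train_batch_size = 2  # largest value that fits for me on the v5-256
global_batch_size = per_device_train_batch_size * jax.device_count()
print("global batch size:", global_batch_size)
```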
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Using the default run_mlm_flax.py example script without modifications; a sketch of the launch is below.
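This is roughly how I launch it on each TPU host, written as a Python sketch. The dataset name, config, and hyperparameter values are placeholders standing in for my actual run; the flags are the standard run_mlm_flax.py arguments:

```python
import subprocess
import sys

# Rough sketch of my launch; dataset and hyperparameters are placeholders.
# per_device_train_batch_size=2 is the largest value that runs on the v5-256
# (128 works on a single v3-8).
cmd = [
    sys.executable, "run_mlm_flax.py",
    "--model_type", "roberta",
    "--config_name", "roberta-base",
    "--tokenizer_name", "roberta-base",
    "--dataset_name", "wikitext",                  # placeholder dataset
    "--dataset_config_name", "wikitext-2-raw-v1",  # placeholder config
    "--max_seq_length", "128",
    "--per_device_train_batch_size", "2",
    "--per_device_eval_batch_size", "2",
    "--learning_rate", "1e-4",
    "--num_train_epochs", "1",
    "--output_dir", "./roberta-mlm",
]
subprocess.run(cmd, check=True)
```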
Expected behavior
I would expect a v5-256 to run much faster than a single v3-8 here.