Wondering: if I want to train a model similar to llama-70b from scratch on 2 GPUs with 24 GB of memory each, will tinygrad automatically split the model for training? I am talking about model parallelism, not data parallelism.

---
Sharding the model is supported; we don't explicitly distinguish model parallelism from data parallelism. See `tinygrad/test/test_multitensor.py`, line 25 at ce46a7e:

`# shard_x is "data parallel"`

It's not automatic and you need to specify how you want to shard the model. This example shards 70B llama onto 6 GPUs for serving, and you would need a lot more GPUs for training. See `tinygrad/examples/llama.py`, line 283 at ce46a7e:

`for k,v in nn.state.get_state_dict(model).items():`
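To make the two modes concrete, here's a rough sketch of how the same `shard`/`shard_` call covers both. The two-layer model, device list, shapes, and per-weight axis choices below are made up for illustration, and it assumes two devices of your default backend are actually available; llama.py does the equivalent by picking an axis for each real weight based on its key name.

```python
# Illustrative sketch only: ToyMLP, the shapes, and the axis choices are assumptions,
# not code from the tinygrad repo.
from tinygrad import Tensor, Device, nn

GPUS = tuple(f"{Device.DEFAULT}:{i}" for i in range(2))  # e.g. ("CUDA:0", "CUDA:1")

class ToyMLP:
  def __init__(self):
    self.w1 = nn.Linear(256, 1024, bias=False)
    self.w2 = nn.Linear(1024, 256, bias=False)
  def __call__(self, x: Tensor) -> Tensor:
    return self.w2(self.w1(x).relu())

model = ToyMLP()

# "model parallel": split each weight matrix across the GPUs, Megatron-style
# (first layer split along its output dim, second layer along its input dim)
for k, v in nn.state.get_state_dict(model).items():
  v.shard_(GPUS, axis=0 if k.startswith("w1") else 1)

x = Tensor.rand(32, 256).shard(GPUS, axis=None)  # replicate the input on every GPU
print(model(x).shape)                            # (32, 256)

# "data parallel" is the same call with the roles flipped: replicate the weights
# (axis=None) and shard the input along its batch axis instead:
#   x = Tensor.rand(32, 256).shard(GPUS, axis=0)
```

The only knob is the `axis` argument: `None` replicates a tensor on every device, an integer splits it along that dimension, and whether that amounts to "data parallel" or "model parallel" just depends on whether you split the activations or the weights.

---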
Just want to know your thoughts on tinygrad's `shard` vs PyTorch + Accelerate (Hugging Face).