Answer (selected by dhandhalyabhavik):

Sharding the model is supported; we don't explicitly distinguish model parallelism from data parallelism.

# shard_x is "data parallel"

It's not automatic: you need to specify how you want to shard the model. This example shards the 70B LLaMA onto 6 GPUs for serving; you would need a lot more GPUs for training.

See tinygrad/examples/llama.py, line 283 at commit ce46a7e:

`for k,v in nn.state.get_state_dict(model).items():`
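For readers who want to see what that loop amounts to, below is a minimal sketch of both regimes, assuming two devices. `TinyModel`, `GPUS`, and the axis choices are illustrative assumptions, not what llama.py actually does (there the axis is chosen per parameter name); `Tensor.shard`/`shard_` and `nn.state.get_state_dict` are the tinygrad APIs the example relies on.

```python
# A minimal sketch, not the llama.py code: TinyModel, GPUS, and the axis choices
# below are illustrative assumptions.
from tinygrad import Tensor, Device, nn

class TinyModel:
  def __init__(self):
    self.l1 = nn.Linear(512, 2048)
    self.l2 = nn.Linear(2048, 512)
  def __call__(self, x: Tensor) -> Tensor:
    return self.l2(self.l1(x).relu())

GPUS = tuple(f"{Device.DEFAULT}:{i}" for i in range(2))  # e.g. ("NV:0", "NV:1")

# Data parallelism ("shard_x"): replicate every weight on all devices, then shard
# the *input* on its batch axis, so each device runs the full model on its own slice.
dp_model = TinyModel()
for k, v in nn.state.get_state_dict(dp_model).items():
  v.shard_(GPUS)                               # axis=None copies the tensor to every device
x = Tensor.rand(8, 512).shard(GPUS, axis=0)    # split the batch across the devices
print(dp_model(x).shape)                       # (8, 512)

# Model parallelism (what the llama.py loop above sets up): shard each weight itself.
# llama.py picks the axis per parameter name; here 2-D weights are simply split on
# their output dimension and 1-D tensors (biases) are replicated.
mp_model = TinyModel()
for k, v in nn.state.get_state_dict(mp_model).items():
  v.shard_(GPUS, axis=0 if v.ndim > 1 else None)
```

The difference between the two regimes is only which tensors get a shard axis: replicated weights plus a batch-sharded input gives data parallelism, while axis-sharded weights gives model parallelism, which is why the quoted comment calls sharding x "data parallel".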
