
Add an example that trains the torchtitan version of llama. #8400

Merged: 6 commits from hanq_hybrid_mesh into master on Nov 23, 2024

Conversation

qihqi (Collaborator) commented on Nov 20, 2024:

A few bugs were fixed along the way:

  • The silu.default lowering now goes through the direct lowering path.
  • The BlockSpec computation was moved inside flash_attention, because the query length can change when shard_map is applied (see the sketch after this list).

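To illustrate the second fix, here is a minimal Pallas-style sketch, not the PR's actual kernel: the kernel below is a plain single-pass attention rather than real streaming flash attention, and it assumes the newer pl.BlockSpec(block_shape, index_map) API. The point is that the grid and BlockSpecs are computed inside flash_attention from the shapes the call actually sees, which under shard_map are the per-shard shapes:

import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def _attn_kernel(q_ref, k_ref, v_ref, o_ref):
    # Plain attention over one query block; a real flash_attention kernel
    # instead streams over k/v blocks with an online softmax.
    q, k, v = q_ref[...], k_ref[...], v_ref[...]
    s = jnp.dot(q, k.T) / jnp.sqrt(jnp.asarray(q.shape[-1], q.dtype))
    o_ref[...] = jnp.dot(jax.nn.softmax(s, axis=-1), v)

def flash_attention(q, k, v, block_q=128):
    # Compute the grid and BlockSpecs *inside* the function, from the shapes
    # this call actually sees. Under shard_map, q.shape[0] is the per-shard
    # query length, which can differ from the global length.
    q_len, head_dim = q.shape
    block_q = min(block_q, q_len)
    grid = (q_len // block_q,)
    return pl.pallas_call(
        _attn_kernel,
        grid=grid,
        in_specs=[
            pl.BlockSpec((block_q, head_dim), lambda i: (i, 0)),  # one q block
            pl.BlockSpec(k.shape, lambda i: (0, 0)),              # full k
            pl.BlockSpec(v.shape, lambda i: (0, 0)),              # full v
        ],
        out_specs=pl.BlockSpec((block_q, head_dim), lambda i: (i, 0)),
        out_shape=jax.ShapeDtypeStruct((q_len, head_dim), q.dtype),
    )(q, k, v)

Under shard_map, each program instance receives only its shard of q, so q.shape[0] above is the per-shard query length; precomputing the BlockSpecs from the global shape outside the function would bake in the wrong query length.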
@qihqi qihqi changed the title from "Add hybrid mesh" to "Add an example that trains the torchtitan version of llama." Nov 22, 2024
@qihqi qihqi marked this pull request as ready for review November 22, 2024 21:41
@qihqi qihqi requested review from tengyifei and JackCaoG November 22, 2024 21:41
# Default libtpu/XLA flags enabling async collective fusion and
# compute/collective overlap on TPU; setdefault preserves any value the
# user has already exported.
tpu_args = "--xla_tpu_enable_async_collective_fusion_fuse_all_gather=true --xla_tpu_megacore_fusion_allow_ags=false --xla_enable_async_collective_permute=true --xla_tpu_enable_ag_backward_pipelining=true --xla_tpu_enable_data_parallel_all_reduce_opt=true --xla_tpu_data_parallel_opt_different_sized_ops=true --xla_tpu_enable_async_collective_fusion=true --xla_tpu_enable_async_collective_fusion_multiple_steps=true --xla_tpu_overlap_compute_collective_tc=true --xla_enable_async_all_gather=true"
os.environ.setdefault('LIBTPU_INIT_ARGS', tpu_args)

_setup_default_env()
A reviewer (Collaborator) commented:

Did you test that changing the environ here actually gets picked up by XLA? For example, I was wondering whether import jax or some other import would cause the TPU backend to be initialized and then ignore future changes to XLA flags.

qihqi (Collaborator, author) replied:

hmmm... moving to the top to be safe.
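For context, a minimal sketch of the resulting "flags first" layout (the abbreviated flag string and import list below are illustrative, not the PR's exact file). LIBTPU_INIT_ARGS is read when the TPU backend initializes, so it has to be in the environment before any import that can trigger that initialization:

import os

# Set libtpu/XLA flags before importing anything that might initialize the
# TPU backend; once the backend is up, changes to LIBTPU_INIT_ARGS are ignored.
os.environ.setdefault(
    'LIBTPU_INIT_ARGS',
    '--xla_tpu_enable_async_collective_fusion=true '
    '--xla_tpu_overlap_compute_collective_tc=true',  # abbreviated flag list
)

import jax  # noqa: E402 -- deliberately imported only after the env var is set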

@qihqi qihqi requested a review from tengyifei November 22, 2024 22:26
@qihqi qihqi merged commit 31d348e into master Nov 23, 2024
3 checks passed
@qihqi qihqi deleted the hanq_hybrid_mesh branch November 23, 2024 01:04