Pull requests: karpathy/nanoGPT
Pull requests list
Merge for comprehension when filtering parameters without grad
#574 opened Nov 22, 2024 by tsdeng
added fix to type comparison to enable fused AdamW
#569 opened Oct 30, 2024 by seanjudelyons
Updated README.md to include table of contents, why this project is useful, and how to contribute, and added an output for one command
#558 opened Sep 25, 2024 by arhaque09
free up state_dict variable memory after loading checkpoint
#533 opened Jul 10, 2024 by adistomar
fix(train.py): mfu estimation to respect CPU-GPU sync point
#527 opened Jun 23, 2024 by JasonLiJT
Fix: conditional use of GradScaler based on device_type and dtype in train.py
#481 opened May 9, 2024 by BRAINIAC2677
PyTorch nn.LayerNorm now takes bias arg - removed custom class
#454 opened Mar 10, 2024 by calmitchell617
Fix a small bug in the attention bias calculation when flash attention is not available
#398 opened Nov 28, 2023 by tbuthfer