Insights: deepspeedai/DeepSpeed
Overview
2 Pull requests merged by 2 people
-
Fix fp8 gemm
#7265 merged
May 8, 2025 -
add Makefile to ease maintenance
#7267 merged
May 7, 2025
1 Pull request opened by 1 person
-
Update to use torch2.7 for nv-torch-latest
#7273 opened
May 8, 2025
1 Issue closed by 1 person
-
About offload stage3 source code learning problems
#6735 closed
May 8, 2025
5 Issues opened by 5 people
-
[REQUEST] New Integration - NeptuneMonitor
#7274 opened
May 8, 2025 -
[BUG]"DeepSpeedZeRoOffload missing '_restore_from_bit16_weights' method when loading checkpoints"
#7272 opened
May 6, 2025 -
[REQUEST] Equivalent of FSDP ignore_params or ignore_modules for DeepSpeed Zero 3
#7271 opened
May 6, 2025
20 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
set `device_id` in torch's `init_process_group`
#7266 commented on
May 7, 2025 • 4 new comments -
Ulysses SP for HF Integration
#7268 commented on
May 8, 2025 • 0 new comments -
Avoid graph break by removing another redundant requires grad false
#7263 commented on
May 8, 2025 • 0 new comments -
rollback #6726
#7258 commented on
May 8, 2025 • 0 new comments -
Fix AutoTP gathering replaced layer params when bias is not None
#7257 commented on
May 8, 2025 • 0 new comments -
Fix issue with symint input
#7243 commented on
May 8, 2025 • 0 new comments -
DeepNVMe update
#7215 commented on
May 2, 2025 • 0 new comments -
Enable torch.autocast with ZeRO
#6993 commented on
May 7, 2025 • 0 new comments -
Improve overflow handling in ZeRO
#6976 commented on
May 8, 2025 • 0 new comments -
[BUG]Issues with Running DeepSpeed Zero2 & Zero3 Not Taking Effect
#7026 commented on
May 8, 2025 • 0 new comments -
[BUG] - Multiple 5090s failing on deepspeed.initialize()
#7261 commented on
May 8, 2025 • 0 new comments -
nv-nightly CI test failure
#7140 commented on
May 8, 2025 • 0 new comments -
nv-torch-nightly-v100 CI test failure
#7195 commented on
May 8, 2025 • 0 new comments -
[BUG] DeepCompile in ZeRO-1 fails to do the forward pass
#7229 commented on
May 7, 2025 • 0 new comments -
where is pretrain_zeropp_gpt.py
#3840 commented on
May 6, 2025 • 0 new comments -
[REQUEST]Does the current version support distributed fine-tuning on mac devices (M2-M4)?
#7148 commented on
May 6, 2025 • 0 new comments -
[BUG] ZeRO 3 error: expected the next 4 parameters in the parameter fetch queue to be ... but got ()
#3599 commented on
May 4, 2025 • 0 new comments -
[BUG] Zero2 offload overflow
#5241 commented on
May 3, 2025 • 0 new comments -
nv-ds-chat CI test failure
#7213 commented on
May 3, 2025 • 0 new comments -
[BUG] [ERROR] [autotuner.py:699:model_info_profile_run] The model is not runnable with DeepSpeed with error = (
#4759 commented on
May 3, 2025 • 0 new comments