@tjruwase
Contributor

@tjruwase tjruwase commented Jan 28, 2025

Fix #5241: Improve overflow handling

- [x] ZeRO 1
- [x] ZeRO 2
- [ ] ZeRO 3
- [ ] BF16Optimizer

Enable pydantic configuration for mixed precision

- [x] bf16
- [x] fp16

@tjruwase
Contributor Author

@delock, @inkcherry, can you please help investigate the failing xpu-max1100 CI? Thanks!

@delock
Collaborator

delock commented Feb 5, 2025

> @delock, @inkcherry, can you please help investigate the failing xpu-max1100 CI? Thanks!

@tjruwase thanks! Our engineer is looking into it.

@sayakpaul

Any ETA on this for merge?

@tjruwase
Contributor Author

tjruwase commented Jun 6, 2025

> Any ETA on this for merge?

Since CI now looks fine, this should be merged by 06/13/25. Thanks for the patience.

@loadams loadams enabled auto-merge (squash) June 9, 2025 16:39
@loadams loadams merged commit e440506 into master Jun 9, 2025
12 checks passed
@loadams loadams deleted the olruwase/ds_5241 branch June 9, 2025 17:30
deepcharm pushed a commit to deepcharm/DeepSpeed that referenced this pull request Jun 16, 2025
Fix deepspeedai#5241: Improve overflow handling
- [x] ZeRO 1
- [x] ZeRO 2
- [ ] ZeRO 3
- [ ] BF16Optimizer

Enable pydantic configuration for mixed precision
- [x] bf16
- [x] fp16

---------

Signed-off-by: Olatunji Ruwase <[email protected]>
Signed-off-by: Fabien Dupont <[email protected]>
Signed-off-by: Logan Adams <[email protected]>
Signed-off-by: inkcherry <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Xinyu Lian <[email protected]>
Co-authored-by: loadams <[email protected]>
Co-authored-by: Omar Elayan <[email protected]>
Co-authored-by: Fabio Geraci <[email protected]>
Co-authored-by: Sam Foreman <[email protected]>
Co-authored-by: Fabien Dupont <[email protected]>
Co-authored-by: Liangliang Ma <[email protected]>
Co-authored-by: inkcherry <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Signed-off-by: Max Kovalenko <[email protected]>
Antlera pushed a commit to Antlera/DeepSpeed that referenced this pull request Jun 27, 2025
mauryaavinash95 pushed a commit to DataStates/DeepSpeed that referenced this pull request Oct 4, 2025
Development

Successfully merging this pull request may close these issues.

[BUG] Zero2 offload overflow