Tags: Young768/DeepSpeed
Tags
[zero_to_fp32] adapt to 4-bytes alignment in z2 (deepspeedai#1372) Co-authored-by: Olatunji Ruwase <[email protected]>
Reducing the memory-overhead of creating model for multi-GPU run (deepspeedai#1244) Co-authored-by: Jeff Rasley <[email protected]>
DeepSpeed MoE (deepspeedai#1310) Co-authored-by: Alex Muzio <[email protected]> Co-authored-by: Ammar Ahmad Awan <[email protected]> Co-authored-by: Conglong Li <[email protected]> Co-authored-by: Felipe Cruz Salinas <[email protected]> Co-authored-by: Jeff Rasley <[email protected]> Co-authored-by: Reza Yazdani <[email protected]> Co-authored-by: Samyam Rajbhandari <[email protected]> Co-authored-by: Shaden Smith <[email protected]> Co-authored-by: Young Jin Kim <[email protected]> Co-authored-by: bapatra <[email protected]>
Use correct input size for splits (deepspeedai#1284)
* Use correct input size for splits
* Use smarter partitioning
[Doc] round_robin_gradients (deepspeedai#1261)
* Fix docstring
* Make screenshots clickable for easier viewing
* Navigation menu in alphabetical order; More clickable screenshots
* Rename 1Cycle doc
* Tweak naming
* Remove no longer used flag
* ZeRO3 Offload release
* Single GPU results
* Rearrange figures
* Single GPU text
* tweak intro
* zero3-offload section
* Add asynchronous i/o docs
* Fix print_per_steps doc
* Document round_robin_gradients
* Tweak description
* Trigger CI
revert part of deepspeedai#1220 (deepspeedai#1221)
deepspeedai#1220 fixed the leak but led to another problem. Reverting that part so that we can do the release; we will work on it after the release. @jeffra
clean up logging (deepspeedai#1190) Co-authored-by: Jeff Rasley <[email protected]>