
Conversation

@leejianwoo-collab

fix: #7747
Hi DeepSpeed team,

I'm submitting this fix to resolve issue #7747: under the DeepSpeed bf16 configuration, MoE router parameters are forcibly cast to bf16, causing a dtype mismatch in the fp32 routing logic.

Key changes:

Added a should_preserve_dtype() helper to check a parameter's preservation flag (see the sketch after this list)
Extended parameter processing in _setup_for_real_optimizer() to handle mixed-precision scenarios
Updated storage management in _update_storage_to_flattened_tensor() to preserve the original data types
Included comprehensive tests and documentation
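To make the intent concrete, here is a minimal sketch of the helper and of how the flag could be honored when parameters are cast; the helper name comes from this PR, while the surrounding cast_params_for_bf16() function is a simplified, hypothetical stand-in for the optimizer's actual processing path:

```python
import torch

def should_preserve_dtype(param: torch.Tensor) -> bool:
    """Return True if the parameter is flagged to keep its original dtype."""
    return getattr(param, "preserve_dtype", False)

def cast_params_for_bf16(params):
    """Illustrative only: cast parameters to bf16 unless flagged for preservation."""
    out = []
    for p in params:
        if should_preserve_dtype(p):
            out.append(p)  # keep the original dtype (e.g. fp32 router weights)
        else:
            out.append(p.to(torch.bfloat16))  # normal bf16 mixed-precision path
    return out
```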
Usage:
Users can now mark specific parameters with param.preserve_dtype = True to keep their original precision while other parameters still benefit from bf16 mixed-precision training. For example:
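A short usage sketch (the gate module below is illustrative; preserve_dtype is the flag introduced by this PR):

```python
import torch

# Hypothetical MoE gate: a small linear layer producing routing logits.
router = torch.nn.Linear(1024, 8)

# Flag the router parameters so the bf16 optimizer keeps them in fp32.
for p in router.parameters():
    p.preserve_dtype = True
```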

This solution is backward compatible and provides an official mechanism for handling numerically sensitive modules like MoE routers. I've tested this thoroughly and believe it will be valuable for users facing similar precision-related issues.

Looking forward to your feedback and review. Thank you for your time and consideration!

Best regards,

@tohtana
Contributor

tohtana commented Jan 1, 2026

Hi @leejianwoo-collab,
This PR removes a lot of code in the BF16 optimizer, and I’m concerned it may break existing DeepSpeed features. Could you keep the current behavior?
I agree that adding parameter-level precision control is very important. If you think we should deprecate some features as part of this PR, we’re happy to discuss.

