Isn't min-SNR training strategy counter-intuitive? #13054
Replies: 2 comments 3 replies
-
@drhead could you please explain this phenomenon?
-
Great question — this tripped me up initially too, but it makes sense once you think about it from a signal processing perspective. The key insight is that min-SNR weighting is not about task difficulty — it's about gradient signal quality. At high-noise timesteps (t → T), the noised input is almost pure noise, so each per-sample loss gradient is a very high-variance estimate of the true descent direction. Without min-SNR weighting, these noisy high-t gradients dominate training because their magnitude is large. This is analogous to a classical signal processing problem: a high-variance estimator with poor SNR can degrade overall system performance even if it carries some information. The optimal strategy (in a Wiener filter sense) is to attenuate components in proportion to their noise power. That's exactly what min-SNR does: it attenuates the loss weight at high-noise timesteps to prevent those noisy gradients from overwhelming the useful signal from low/mid-noise timesteps. Think of it this way: when you average readings from several sensors, you weight each one by its reliability, not by how hard its measurement problem is.
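The attenuation intuition can be checked numerically. Below is a minimal sketch (plain NumPy; the setup and names are mine, not from the paper): two unbiased estimators of the same quantity, one low-variance and one high-variance, combined with equal weights versus inverse-variance weights.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 1.0
n = 100_000

# Two unbiased estimators of the same quantity with very different noise levels.
clean = true_value + rng.normal(0.0, 0.1, n)   # low-variance ("low-noise timestep")
noisy = true_value + rng.normal(0.0, 3.0, n)   # high-variance ("high-noise timestep")

# Equal weighting lets the noisy estimator dominate the error.
equal = 0.5 * clean + 0.5 * noisy

# Inverse-variance weighting attenuates the noisy component.
w_clean, w_noisy = 1 / 0.1**2, 1 / 3.0**2
inv_var = (w_clean * clean + w_noisy * noisy) / (w_clean + w_noisy)

print(np.var(equal))    # ~2.25: dominated by the noisy estimator
print(np.var(inv_var))  # ~0.01: close to using the clean estimator alone
```

Both combinations are unbiased, but only the inverse-variance one keeps the noisy component from wrecking the result — the same argument made above for high-t gradients.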
The paper (Section 3.2) shows this formally: the min-SNR-$\gamma$ weight for $x_0$-prediction is $w_t = \min\{\mathrm{SNR}(t), \gamma\}$ (equivalently $\min\{\mathrm{SNR}(t), \gamma\}/\mathrm{SNR}(t)$ for $\epsilon$-prediction). So it's not "easy tasks get high weight" — it's "timesteps where the gradient signal is reliable get preserved, noisy timesteps get attenuated." Classic matched filtering intuition applied to diffusion training.
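To make the weighting concrete, here is a minimal NumPy sketch of the min-SNR-$\gamma$ weight for $x_0$-prediction. It assumes the standard DDPM linear beta schedule; the variable names are mine for illustration:

```python
import numpy as np

# Standard DDPM linear beta schedule (assumed here for illustration).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_cumprod = np.cumprod(1.0 - betas)

# SNR(t) = alpha_bar_t / (1 - alpha_bar_t): huge at t ~ 0, tiny at t ~ T.
snr = alphas_cumprod / (1.0 - alphas_cumprod)

# min-SNR-gamma weight for x0-prediction: clip the SNR at gamma.
gamma = 5.0
weights = np.minimum(snr, gamma)

print(weights[0])    # 5.0: capped at gamma for the lowest-noise timestep
print(weights[-1])   # tiny: high-noise gradients are heavily attenuated
```

The weight is monotonically non-increasing in t: early (low-noise) timesteps are capped at $\gamma$ so they can't dominate either, while late (high-noise) timesteps fall off with the SNR itself.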
-
Consider the case 'prediction_type' = 'sample' (predict $x_0$). The min-SNR training strategy assigns a weight to each sample's loss (as in the figure below, where SNR values greater than $\gamma$ are replaced with $\gamma$):

This is quite counter-intuitive: as t → 0 the denoising task of the diffusion model is easier than as t → T. One would therefore expect the weights at t → T to be higher than the weights at t → 0 — i.e., the inverse of the min-SNR weighting — but the paper https://arxiv.org/abs/2303.09556 suggests the opposite.
Can anyone explain this?