[Numpy] Further simplifying log of Softmax for NLL loss, to use LSE trick for numerical stability #21
NLLLoss is currently computed directly on the log of the softmax, and with some higher learning rates you can get unstable training / `inf` loss and a RuntimeWarning from the div by 0 in `nlls = -np.log(probs_targets)`.
Example with the same `lr = 0.1` in the numpy and pytorch implementations:
To bring it more in line with pytorch, I further simplify the log of the softmax so that it uses the LogSumExp trick.
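Concretely, with $m = \max_j z_j$ the log-softmax simplifies to

$$\log \mathrm{softmax}(z)_i = \log \frac{e^{z_i - m}}{\sum_j e^{z_j - m}} = (z_i - m) - \log \sum_j e^{z_j - m},$$

so the log is never taken of a probability that has underflowed to exactly 0.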
You had already done `exp_logits = np.exp(logits - logits_max)`; I just take the log of the sum of that. I get the actual probabilities back for the cache by taking the exponential of `log_probs`.
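Roughly what the forward pass looks like with this change (a minimal sketch; the names `logits`, `targets`, and the cached `probs` follow the existing code, but the exact shapes and caching details are assumptions):

```python
import numpy as np

def log_softmax_nll(logits, targets):
    """NLL loss computed from log-softmax via the LogSumExp trick.

    logits:  (N, C) array of unnormalized scores
    targets: (N,) array of integer class indices
    """
    # Shift by the row-wise max for numerical stability (as before)
    logits_max = np.max(logits, axis=1, keepdims=True)
    shifted = logits - logits_max

    # log softmax = shifted - log(sum(exp(shifted))); no log of a raw probability
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))

    # NLL loss: negative log-probability of each target class
    nlls = -log_probs[np.arange(len(targets)), targets]
    loss = nlls.mean()

    # Recover the actual probabilities for the cache / backward pass
    probs = np.exp(log_probs)
    return loss, probs
```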
This doesn't have much impact on the result, tbh, and could simply be left as an exercise idea.