
[Numpy] Further simplifying log of Softmax for NLL loss, to use LSE trick for numerical stability #21

Open · casinca wants to merge 2 commits into master

Conversation


casinca commented on Aug 9, 2024

The NLL loss is currently computed directly on the log of the softmax probabilities, and with some high learning rates you can get unstable training, an inf loss, and a RuntimeWarning from the divide by zero in nlls = -np.log(probs_targets) (a target probability underflows to 0, and NumPy reports log(0) as a division by zero).

Example with the same lr = 0.1 in the NumPy and PyTorch implementations:
[screenshot: RuntimeWarning / inf loss in the NumPy run vs. a finite loss in PyTorch]
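For reference, here is a minimal way to reproduce the failure mode outside of training. The logit values and target indexing below are made up for illustration; only the nlls = -np.log(probs_targets) line mirrors the current code. Once the logit gap is large enough, the target's softmax probability underflows to exactly 0 and np.log emits the divide-by-zero warning:

```python
import numpy as np

# Hypothetical logits with a large gap, e.g. after a few big updates at lr = 0.1
logits = np.array([[800.0, 0.0]])
targets = np.array([1])

logits_max = logits.max(axis=1, keepdims=True)
exp_logits = np.exp(logits - logits_max)                 # exp(-800) underflows to 0.0
probs = exp_logits / exp_logits.sum(axis=1, keepdims=True)

probs_targets = probs[np.arange(len(targets)), targets]  # -> [0.]
nlls = -np.log(probs_targets)                            # RuntimeWarning: divide by zero encountered in log
print(nlls)                                              # [inf]
```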

To get it more in line with PyTorch, I further simplify the log of the softmax so that it uses the LogSumExp trick. You had already done exp_logits = np.exp(logits - logits_max); I just take the log of the sum of that and subtract it from the shifted logits, instead of normalizing first and taking the log afterwards.

I get the actual probabilities back for the cache by taking the exponential of log_probs.

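A minimal sketch of the idea (the function name, shapes, and the batched target indexing are just for illustration; the actual change follows the repo's existing variable names):

```python
import numpy as np

def log_softmax_nll(logits, targets):
    """Stable NLL via log-softmax with the LogSumExp trick (illustrative sketch)."""
    logits_max = logits.max(axis=1, keepdims=True)
    shifted = logits - logits_max
    # log softmax(x)_i = (x_i - max) - log(sum_j exp(x_j - max))
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    probs = np.exp(log_probs)                    # recover the probabilities for the cache
    nlls = -log_probs[np.arange(len(targets)), targets]
    return nlls, probs

nlls, probs = log_softmax_nll(np.array([[800.0, 0.0]]), np.array([1]))
print(nlls)   # [800.] -- finite, instead of inf
```

Because the loss is read off log_probs rather than from np.log(probs), it stays finite even when probs underflows to 0 for some classes, which is essentially what PyTorch does with log_softmax + NLL.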
This doesn't have much impact on the results here tbh, and could simply be left as an exercise idea.
