On word embeddings - Part 2: Approximating the Softmax

The softmax layer is a core part of many current neural network architectures. When the number of output classes is very large, such as in the case of language modelling, computing the softmax becomes very expensive. This post gives an overview of approximations that make the computation more efficient.
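As a point of reference for why this is costly, the full softmax can be written as follows (the notation here is assumed for illustration: \(h\) is the network's hidden state for context \(c\), and \(v'_w\) is the output embedding of word \(w\) in the vocabulary \(V\)):

$$ p(w \mid c) = \frac{\exp(h^\top v'_w)}{\sum_{w_i \in V} \exp(h^\top v'_{w_i})} $$

The normalisation term in the denominator requires a sum over all \(|V|\) words, so the cost of every prediction grows linearly with the vocabulary size, which is the bottleneck the approximations below try to avoid.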