Table of Contents:

- Gradient checks
- Sanity checks
- Babysitting the learning process
  - Loss function
  - Train/val accuracy
  - Weights:Updates ratio
  - Activation/Gradient distributions per layer
  - Visualization
- Parameter updates
  - First-order (SGD), momentum, Nesterov momentum
  - Annealing the learning rate
  - Second-order methods
  - Per-parameter adaptive learning rates (Adagrad, RMSProp)
- Hyperparameter Optimization
- Evaluation