- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- arXiv: https://arxiv.org/abs/1502.03167
The problem of internal covariate shift
In neural networks, training proceeds faster when the distribution of the input data is whitened. Decorrelating the features and normalizing them to mean 0 and variance 1 is especially common in image processing.
However, in a multi-layer neural network, even if the input to the first layer is whitened, the weight parameters keep being updated, so every later layer ends up learning from inputs whose distribution keeps changing. Parameters that have already been learned must be re-adapted to each new input distribution, and that effort is wasted. The authors call this phenomenon internal covariate shift.
When the distributions are left uncontrolled like this, the vanishing gradient problem becomes more likely.
When that happens, countermeasures such as
- setting the learning rate small
- initializing the weights carefully
become necessary.
This paper proposes Batch Normalization as a way to avoid the problems above.
Batch Normalization normalizes each activation to mean 0 and variance 1 within a mini-batch, as follows.
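For a mini-batch $\{x_1, \dots, x_m\}$, the transform (Algorithm 1 in the paper) is:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i,\qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}(x_i - \mu_B)^2$$

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}},\qquad y_i = \gamma\,\hat{x}_i + \beta$$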
Looking at the last line, the normalized value is scaled by gamma and shifted by beta, which allows fine adjustment of the output. For example, the effect of Batch Normalization can even be cancelled (by setting gamma to the standard deviation and beta to the mean).
However, gamma and beta are learned parameters, not something a person sets freely, so I read this simply as giving the layer more expressive power.
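A minimal NumPy sketch of the training-time forward pass (the function name and shapes are my own; this is an illustration, not the paper's code):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Training-time Batch Normalization over a mini-batch x of shape (m, d)."""
    mu = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                    # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize to mean 0, variance 1
    return gamma * x_hat + beta            # scale and shift (learned parameters)

# Setting gamma to the batch std and beta to the batch mean reproduces the
# input (up to eps), i.e. it cancels the normalization:
x = np.random.randn(32, 4) * 3.0 + 1.0
y = batchnorm_forward(x, gamma=np.sqrt(x.var(axis=0)), beta=x.mean(axis=0))
print(np.allclose(x, y, atol=1e-3))  # True
```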
Batch Normalization has these properties:
- the mean and variance are normalized per mini-batch, so the extra computation is cheap
- the operation is differentiable per mini-batch
As for the algorithm:
At training time, normalization is done per mini-batch. At test (inference) time, on the other hand, normalization uses statistics learned from the whole training set: the mean over the training data and the unbiased estimate of its variance.
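A sketch of that train/inference difference, assuming running averages are kept during training (the momentum-style update is a common implementation choice; the paper itself computes the population statistics over the training set after training):

```python
import numpy as np

class BatchNorm:
    """Minimal 1-D Batch Norm sketch; variable names are my own."""
    def __init__(self, d, eps=1e-5, momentum=0.9):
        self.gamma, self.beta = np.ones(d), np.zeros(d)
        self.running_mean, self.running_var = np.zeros(d), np.ones(d)
        self.eps, self.momentum = eps, momentum

    def forward(self, x, train=True):
        if train:
            m = x.shape[0]
            mu, var = x.mean(axis=0), x.var(axis=0)
            # Track population statistics; the m/(m-1) correction matches
            # the unbiased variance estimate used at inference in the paper.
            self.running_mean = self.momentum * self.running_mean + (1 - self.momentum) * mu
            self.running_var = self.momentum * self.running_var + (1 - self.momentum) * var * m / (m - 1)
        else:
            mu, var = self.running_mean, self.running_var
        x_hat = (x - mu) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta
```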
Where should the Batch Norm layer be inserted? Usually right after a fully-connected or convolutional layer, just before the activation function, as in the sketch below.
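For instance, a minimal PyTorch sketch (the layer sizes are arbitrary):

```python
import torch.nn as nn

# Affine layer -> Batch Norm -> activation.
model = nn.Sequential(
    nn.Linear(784, 100),
    nn.BatchNorm1d(100),  # inserted right after the fully-connected layer
    nn.ReLU(),
    nn.Linear(100, 10),
)
```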
MNIST results
The figure is pasted as-is from the paper. It shows faster convergence, and the input distributions are visibly more stable.
Points I could not work out
The meaning of "regularization"
Claims about a regularization effect of Batch Norm appear here and there in the paper, but I could not quite see what this regularization means.
Furthermore, batch normalization regularizes the model and reduces the need for Dropout (Srivastava et al., 2014).
When training with Batch Normalization, a training example is seen in conjunction with other examples in the mini-batch, and the training network no longer produces deterministic values for a given training example.
Each training example is normalized using the other examples that happen to be in the same mini-batch, and since the choice of mini-batch is not deterministic, the way a given example is normalized is not deterministic either. Is the meaning that this variability effectively augments the training data, which gives a regularization effect?
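A toy illustration of that reading (entirely my own, just to make the non-determinism concrete): the same example, normalized inside two different mini-batches, yields two different values.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = np.array([1.0])  # the example we follow

def normalize_in_batch(x0, others, eps=1e-5):
    batch = np.concatenate([x0, others])
    return (x0 - batch.mean()) / np.sqrt(batch.var() + eps)

# Two random mini-batch companions -> two different normalized values for x0.
print(normalize_in_batch(x0, rng.normal(size=31)))
print(normalize_in_batch(x0, rng.normal(size=31)))
```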
The meaning of "Reduce the photometric distortions"
"Reduce the photometric distortions" is given as the last of the techniques applied to the batch-normalized network; does this mean applying less data augmentation to the training data?
Because batch-normalized networks train faster and observe each training example fewer times, we let the trainer focus on more "real" images by distorting them less.
(Addendum) Backpropagation through the computational graph
The blog post below walks carefully through backpropagation for Batch Normalization using a computational graph:
Understanding the backward pass through Batch Normalization Layer
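For reference, a NumPy sketch of that backward pass (names are my own; the gradients follow from applying the chain rule to the forward equations above):

```python
import numpy as np

def batchnorm_backward(dy, x, gamma, eps=1e-5):
    """Gradients of Batch Norm w.r.t. x, gamma, beta, by the chain rule."""
    m = x.shape[0]
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    std = np.sqrt(var + eps)
    x_hat = (x - mu) / std

    dgamma = (dy * x_hat).sum(axis=0)
    dbeta = dy.sum(axis=0)

    dx_hat = dy * gamma
    dvar = (dx_hat * (x - mu)).sum(axis=0) * -0.5 * std**-3
    dmu = -dx_hat.sum(axis=0) / std + dvar * (-2.0 / m) * (x - mu).sum(axis=0)
    dx = dx_hat / std + dvar * 2.0 * (x - mu) / m + dmu / m
    return dx, dgamma, dbeta
```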