Neural vocoders are now commonplace, with many variants tailored to different uses.
Their origin is WaveNet.
WaveNet itself is no longer used, but its core ideas have become standard practice and its modules show up everywhere.
This post looks back at WaveNet, a neoclassic of sorts.
Summary
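WaveNet solves the audio waveform generation task with the following combination.
- Generative model: sampling from an 8-bit μ-law softmax
- Autoregression: the output x_{t-1} is fed back as input for generating x_t
- CausalConv: processes the autoregressive input with Conv
- DilatedConv: a wide receptive field from a sparse Conv
- Gated Activation Unit: a gated activation function that replaces ReLU
- Residual connection: efficient training
- Skip connection: efficient training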
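Fundamentally, it all comes down to "take in a wide span of autoregressive input and you get a high-quality audio waveform generation model".
As an idea it is almost too simple. But because WaveNet actually pulled it off, the world changed drastically between Before WaveNet and After WaveNet.
Official demo.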
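Generative Model
Each sample is modeled as an N-bit probability distribution and trained by maximum likelihood estimation.
With the 16-bit samples typical of audio there would be 2^16 = 65,536 classes and the softmax would blow up, so the μ-law algorithm is used to reduce this to 8 bits (256 classes).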
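As a rough illustration of the μ-law step described above, here is a minimal sketch in plain Python. The function names `mu_law_encode` and `mu_law_decode` are made up for this example, and the rounding scheme is one reasonable choice rather than necessarily what the authors used. It assumes amplitudes are already normalized to [-1, 1] and uses μ = 255, which gives 256 classes.

```python
import math

def mu_law_encode(x, mu=255):
    """Compress an amplitude in [-1, 1] and quantize it to a class index in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clip to the valid range
    # mu-law companding: sign(x) * ln(1 + mu*|x|) / ln(1 + mu)
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # map [-1, 1] to [0, mu] and round to the nearest integer class
    return int((compressed + 1.0) / 2.0 * mu + 0.5)

def mu_law_decode(index, mu=255):
    """Map a class index in [0, mu] back to an approximate amplitude in [-1, 1]."""
    compressed = 2.0 * index / mu - 1.0
    # inverse companding: sign(y) * ((1 + mu)^|y| - 1) / mu
    return math.copysign(math.expm1(abs(compressed) * math.log1p(mu)) / mu, compressed)
```

Quantization is finer near zero and coarser near full scale, so quiet samples keep more resolution than a linear 8-bit quantizer would give them.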
Autoregression
The value sampled from the generative model is simply fed back in as the next input.
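The feedback loop can be sketched as follows. Note that `toy_logits` is a purely hypothetical stand-in for the trained network (it just favors classes near the previous sample), so the output is not meaningful audio. Only the loop structure, sample then append then condition on the result, is the point here.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_logits(history, num_classes=256):
    """Hypothetical stand-in for WaveNet: prefers classes close to the last sample."""
    last = history[-1]
    return [-abs(k - last) / 4.0 for k in range(num_classes)]

def generate(num_steps, seed=0, num_classes=256):
    """Sample one class per step, feeding each sample back as input."""
    rng = random.Random(seed)
    samples = [num_classes // 2]  # start near the class for zero amplitude
    for _ in range(num_steps):
        probs = softmax(toy_logits(samples, num_classes))
        next_class = rng.choices(range(num_classes), weights=probs, k=1)[0]
        samples.append(next_class)  # the sampled output becomes the next input
    return samples[1:]
```

Because each step waits for the previous sample, generation is inherently sequential, even though training can process all time steps in parallel.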
CausalConv
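A right-triangle-shaped receptive field.
Because the model is autoregressive, letting it use future information without a mask would leak the answer (it could learn an identity mapping and reach 100% accuracy).
This is handled by making the Conv kernel asymmetric (= zero-masking the t+k part).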
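A minimal single-channel sketch of the asymmetric kernel, in plain Python. The name `causal_conv1d` is made up for this example, and real implementations typically get the same effect by left-padding the input. Here the output at position t reads only x[t], x[t-1], and so on, so it can safely be used to predict x[t+1].

```python
def causal_conv1d(x, kernel):
    """y[t] = sum_k kernel[k] * x[t - k]; samples before t = 0 count as zero."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if t - k >= 0:  # never index into the future or before the start
                acc += w * x[t - k]
        y.append(acc)
    return y
```

Changing a later input sample leaves every earlier output unchanged, which is exactly the no-leak property needed for autoregressive training.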
DilatedConv
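A Conv whose large kernel has bald patches (holes).
It improves the size of the receptive field relative to the computational cost (the higher the dilation factor, the sparser and wider).
It vaguely resembles stride and looks as if it would shrink the dimension, but with stride=1 and suitable padding the input and output lengths match (it is equivalent to an ordinary large kernel that is mostly zeros). Pooling or strided conv, by contrast, does shrink the dimension.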
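To make the "large kernel full of zeros" equivalence and the receptive-field growth concrete, here is a small single-channel sketch in plain Python. The function names are invented for this example. The `receptive_field` formula assumes a stack of stride-1 layers that all share one kernel size.

```python
def dilated_causal_conv1d(x, kernel, dilation):
    """y[t] = sum_k kernel[k] * x[t - k * dilation]; out-of-range taps count as zero."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y  # same length as x: stride is 1, only the tap spacing changes

def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 dilated conv layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

For example, `receptive_field(2, [1, 2, 4, 8, 16, 32, 64, 128, 256, 512])` gives 1024 with only ten two-tap layers, whereas ten undilated layers would give 11.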
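Gated Activation Unit
The Gated Activation Unit is a kind of activation function/unit (see the separate explainer). It was introduced as a replacement for ReLU because it proved useful in the audio domain [1].
It still appears frequently in Conv-based processing today.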
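The unit multiplies a tanh "filter" branch by a sigmoid "gate" branch, z = tanh(W_f * x) ⊙ σ(W_g * x). The sketch below shows only the element-wise gating in plain Python and assumes the two convolution outputs have already been computed. The names `filter_out` and `gate_out` are illustrative.

```python
import math

def sigmoid(v):
    """Logistic function mapping any real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def gated_activation(filter_out, gate_out):
    """Element-wise tanh(filter) * sigmoid(gate) over two equal-length lists."""
    return [math.tanh(f) * sigmoid(g) for f, g in zip(filter_out, gate_out)]
```

The tanh branch proposes a value in (-1, 1) and the sigmoid branch decides how much of it passes through, so a strongly negative gate suppresses the output regardless of the filter value.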
Residual Connection
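WaveNet is autoregressive, so there is no downsampling along the time axis. Each layer's input and output therefore have the same length, which makes Residual Connections easy to adopt.
Each widening of the receptive field makes the network deeper, so they are adopted as a deep-learning best practice [2].
Each ResBlock contains DilatedConv-GAU-pointwiseConv.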
Skip Connection
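A branch of each layer's output is connected directly to the final layers through a skip connection.
Adopted for the same reason as the Residual Connection.
This is not in the paper, but I think there is also an intent to decide the final output from both the wide context obtained by stacking layers and the local information arriving through the skip connections (≈ autoregression), much like U-Net.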
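How the residual and skip paths fit together can be sketched with a heavily simplified single-channel model in plain Python. Everything here is an assumption for illustration: scalar weights stand in for the pointwise (1x1) convolutions, the kernel size is fixed at 2, separate `res` and `skip` weights are used (some implementations share one pointwise conv), and the `params` keys are invented names.

```python
import math

def dilated_causal_conv(x, w, dilation):
    """Two-tap causal conv: w[0] weights x[t], w[1] weights x[t - dilation]."""
    return [w[0] * x[t] + (w[1] * x[t - dilation] if t >= dilation else 0.0)
            for t in range(len(x))]

def residual_block(x, params, dilation):
    """DilatedConv -> gated activation -> pointwise scaling, with two outputs."""
    f = dilated_causal_conv(x, params["filter"], dilation)
    g = dilated_causal_conv(x, params["gate"], dilation)
    z = [math.tanh(a) * (1.0 / (1.0 + math.exp(-b))) for a, b in zip(f, g)]
    residual = [xi + params["res"] * zi for xi, zi in zip(x, z)]  # goes to the next block
    skip = [params["skip"] * zi for zi in z]  # goes straight to the output stage
    return residual, skip

def wavenet_stack(x, layers):
    """Run blocks in sequence and sum every block's skip output."""
    skip_total = [0.0] * len(x)
    h = x
    for params, dilation in layers:
        h, skip = residual_block(h, params, dilation)
        skip_total = [s + k for s, k in zip(skip_total, skip)]
    return skip_total
```

The summed skip outputs mix contributions from shallow blocks (narrow context) and deep blocks (wide context), which is the U-Net-like reading suggested above.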
Conclusion
As described above, WaveNet just secures a wide receptive field, makes it efficient to train, and then models the distribution autoregressively.
That is all, and yet it changed the world.
Original paper
@misc{1609.03499, Author = {Aaron van den Oord and Sander Dieleman and Heiga Zen and Karen Simonyan and Oriol Vinyals and Alex Graves and Nal Kalchbrenner and Andrew Senior and Koray Kavukcuoglu}, Title = {WaveNet: A Generative Model for Raw Audio}, Year = {2016}, Eprint = {arXiv:1609.03499}, }
[1] “In our initial experiments, we observed that this non-linearity worked significantly better than the rectified linear activation function (Nair & Hinton, 2010) for modeling audio signals.” Oord, et al. (2016). WaveNet: A Generative Model for Raw Audio.
[2] “residual … connections are used … to speed up convergence and enable training of much deeper models.” Oord, et al. (2016).