- Introduction
- Preparation
- Implementation overview
- Environmental sound synthesis experiments
- Behind the scenes of the implementation
- Conclusion
Introduction
I previously wrote an article about implementing Onoma-to-Wave:
Onoma-to-Wave is a model that converts an onomatopoeic word (a character string) into an environmental sound (a spectrogram). It consists of an RNN-based encoder and decoder, i.e., it has a so-called Sequence-to-Sequence (Seq2Seq) structure. A model that replaces these components with a Transformer has in fact already been proposed by the authors of Onoma-to-Wave.
Y. Okamoto, K. Imoto, S. Takamichi, T. Fukumori, and Y. Yamashita, "Environmental sound synthesis from onomatopoeic words using a Transformer" (in Japanese), Proceedings of the 2021 Autumn Meeting of the Acoustical Society of Japan, pp. 943-946, 2021.
The above is a non-refereed proceedings paper from a domestic Japanese conference, and as of October 2022 the Transformer-based model itself has not been published as a peer-reviewed paper. However, the following paper by the authors carries out subjective evaluation experiments that include this model.
Demonstrations of the authors' environmental sound synthesis can be listened to below.
The following PDF file should be helpful for grasping the model as a whole.
Since no implementation of the Transformer version of Onoma-to-Wave had been released as of this writing (2022), I implemented it myself and attempted to reproduce the environmental sound synthesis experiments.
Preparation
As in the article linked at the beginning, the dataset is the RWCP real-environment speech and sound database (Real World Computing Partnership Sound Scene Database; RWCP-SSD).
Implementation overview
The source code is available in the repository below. Enjoy!
As before, the code is divided into two folders.
| Folder | Description |
|---|---|
| unconditional | Not conditioned on the acoustic event (input is the onomatopoeia only) |
| conditional | Conditioned on the acoustic event (inputs are the onomatopoeia and the acoustic event label) |
Each folder contains the following main scripts.
| File | Function |
|---|---|
| preprocess.py | Runs the various preprocessing steps |
| training.py | Trains the models |
| inference.py | Synthesizes environmental sounds with the trained models (synthesis for test data not used in training) |
| synthesis.py | Synthesizes environmental sounds with the trained models (for arbitrary onomatopoeia given as phoneme strings) |
In addition to the preprocessing introduced in the previous article, preprocess.py now also extracts acoustic features from the audio data and saves them. The reason is discussed at the end of this article, for reference.
synthesis.py differs from inference.py in that it does not use Dataset/DataLoader, and instead lets you freely specify the onomatopoeia as a phoneme string. For example, the onomatopoeia "ビイイイイ" (/b i i i i i/) can be given simply as "b i i i i i". When trying out synthesis with various onomatopoeia on hand, synthesis.py is the more convenient of the two.
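For illustration, the conversion synthesis.py needs — from a space-separated phoneme string such as "b i i i i i" to a tensor of integer IDs — might look like the following. The phoneme inventory and helper here are hypothetical; the actual mapping lives in mapping_dict.py.

```python
import torch

# Hypothetical phoneme inventory; the real one is defined by MappingDict.
phonemes = ["<pad>", "<sos>", "<eos>", "a", "b", "ch", "i", "i:",
            "k", "N", "o", "q", "r", "s", "sh", "u"]
ph2id = {p: i for i, p in enumerate(phonemes)}

def encode(onomatopoeia: str) -> torch.Tensor:
    """Convert a space-separated phoneme string into an ID tensor,
    wrapping it in start/end symbols."""
    ids = [ph2id["<sos>"]] + [ph2id[p] for p in onomatopoeia.split()] + [ph2id["<eos>"]]
    return torch.tensor(ids, dtype=torch.long)

print(encode("b i i i i i"))  # tensor([1, 4, 6, 6, 6, 6, 6, 2])
```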
As in the previous article, the various settings are written in a yaml file (config.yaml), which the scripts read in. The workflow for running the code is therefore:
- Edit config.yaml
- Run preprocessing with preprocess.py
- Train the models with training.py
- Synthesize environmental sounds with inference.py / synthesis.py
Note that preprocess.py only needs to be run once.
There are several conceivable ways to embed the acoustic event label needed for conditioning the model; this article presents results obtained with the following embedding method:
Concatenate the event label with the memory (encoder output), then apply a linear transformation.
In addition to this, the paper 「RWCP 音声・音響データベースを用いた環境音・効果音合成の検討とオノマトペ拡張データセットの構築」 (a study on synthesizing environmental sounds and sound effects with the RWCP Sound Scene Database, and on building an extended onomatopoeia dataset) also adopts a second embedding method: concatenating the event label with the decoder input and applying a linear transformation.

Preliminary experiments preceding this article confirmed that embedding into the memory alone is sufficiently effective for environmental sound synthesis. Incidentally, the present implementation supports both embedding methods, switchable independently (both on, both off, or either one alone).
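A minimal sketch of the embedding method used here — concatenating the event label with the encoder output (memory) and projecting back with a linear layer. The module name and one-hot label format are my own choices, not the repository's actual code; dimensions follow the tables later in the article.

```python
import torch
import torch.nn as nn

class MemoryConditioner(nn.Module):
    """Concatenate a one-hot acoustic event label to every memory frame,
    then project back to the model dimension with a linear layer."""
    def __init__(self, d_model=512, num_events=10):
        super().__init__()
        self.proj = nn.Linear(d_model + num_events, d_model)

    def forward(self, memory, event_onehot):
        # memory: (batch, src_len, d_model), event_onehot: (batch, num_events)
        label = event_onehot.unsqueeze(1).expand(-1, memory.size(1), -1)
        return self.proj(torch.cat([memory, label], dim=-1))

cond = MemoryConditioner()
memory = torch.randn(2, 12, 512)
onehot = torch.eye(10)[torch.tensor([3, 7])]  # event labels for the two examples
out = cond(memory, onehot)                    # shape: (2, 12, 512)
```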
Environmental sound synthesis experiments
Experimental conditions
The following ten types of environmental sounds from RWCP-SSD were used in the experiments (identifiers in the database shown in parentheses).
- Hitting a metal trash can (trashbox)
- Hitting a cup (cup1)
- Ringing bells (bells5)
- Tearing paper (tear)
- Hitting a drum (drum)
- Shaking maracas (maracas)
- Blowing a whistle (whistle3)
- Grinding coffee beans in a mill (coffmill)
- Running an electric shaver (shaver)
- An alarm clock ringing (clock1)
For each acoustic event, 95 audio files were used for training, and for each audio file 15 of its associated onomatopoeia were randomly selected and used for training.
To keep the article from getting too long, the detailed experimental conditions are shown in collapsible form. Click only if you really want to see the network settings and the like.
Show experimental conditions
The computing environment is as follows.
| Environment | Version etc. |
|---|---|
| OS | Ubuntu 22.04 |
| CPU | Intel i9-9900K |
| GPU | RTX 2070 |
| Python | 3.10.6 |
| PyTorch | 1.12 |
The basic training settings are as follows.

| Item | Setting |
|---|---|
| Mini-batch size | 32 |
| Number of epochs | 1500 for Seq2SeqTransformer, 1000 for Mel2Linear |
| Learning rate | 0.0003 for Seq2SeqTransformer, 0.0001 for Mel2Linear |
| Gradient clipping threshold | 1.0 |
For training the Transformer, the learning-rate scheduling adopted in the "Attention Is All You Need" paper was used, with 300 warm-up epochs. No scheduling was used for Mel2Linear.
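That schedule scales the learning rate as d_model^-0.5 · min(step^-0.5, step · warmup^-1.5): linear warm-up followed by inverse-square-root decay. A sketch applied per epoch (not the actual TransformerLR code):

```python
def transformer_lr(epoch, d_model=512, warmup=300, scale=1.0):
    """Noam learning-rate schedule from "Attention Is All You Need",
    applied per epoch: linear warm-up, then inverse-square-root decay."""
    step = max(epoch, 1)
    return scale * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)
```

The rate rises over the 300 warm-up epochs, peaks at epoch 300, and decays afterwards.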
The acoustic feature settings are as follows.

| Item | Setting |
|---|---|
| Acoustic features | 80-dimensional mel spectrogram |
| FFT window length | 2048 |
| Frame length | 2048 |
| Frame shift | 512 |
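As a rough sketch of the kind of feature extraction preprocess.py performs with these settings, an 80-band log-mel spectrogram can be computed in plain NumPy as below. This is not the repository's actual code, and the 16 kHz sample rate is only a placeholder assumption.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def log_mel_spectrogram(wav, sr=16000, n_fft=2048, hop=512, n_mels=80):
    # Frame length = FFT window length = 2048, frame shift = 512, as in the table.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wav) - n_fft) // hop
    frames = np.stack([wav[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2        # (n_frames, n_fft//2+1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T      # (n_frames, n_mels)
    return np.log(np.maximum(mel, 1e-10))
```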
Next, the settings around the Transformer itself.

| Item | Setting |
|---|---|
| Embedding dimension | 512 |
| Number of multi-head attention heads | 4 |
| Optimizer | RAdam |
| Number of encoder layers | 3 |
| Number of decoder layers | 3 |
| Dimension of the position-wise feed-forward network | 1536 |
| Activation function | ReLU |
| Dropout rate | 0.1 |

For the scaled positional encoding, the dropout rate was set to 0.1.
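Scaled positional encoding, as used in Transformer-based TTS, is ordinary sinusoidal positional encoding multiplied by a learnable scalar. A sketch (my own module, assuming the embedding dimension and dropout rate from the tables above):

```python
import math
import torch
import torch.nn as nn

class ScaledPositionalEncoding(nn.Module):
    """Sinusoidal positional encoding with a learnable scale (alpha)."""
    def __init__(self, d_model=512, dropout=0.1, max_len=5000):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1))  # learnable scale
        self.dropout = nn.Dropout(dropout)
        pos = torch.arange(max_len).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, x):  # x: (batch, time, d_model)
        x = x + self.alpha * self.pe[:, : x.size(1)]
        return self.dropout(x)
```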
The Encoder Prenet settings are as follows; three layers of 1-D CNN are stacked.

| Item | Setting |
|---|---|
| Embedding dimension | 512 |
| 1d-CNN channels | 512 |
| 1d-CNN kernel size | 3 |
| Number of 1d-CNN layers | 3 |
| Activation function | ReLU |
| Dropout rate | 0.5 |
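A sketch of such a prenet, matching the table above. Details beyond the table (BatchNorm between convolution and activation, the final linear projection) are my assumptions, not necessarily the article's implementation.

```python
import torch
import torch.nn as nn

class EncoderPrenet(nn.Module):
    """Embed input tokens, then apply three Conv1d blocks and a projection."""
    def __init__(self, vocab_size, d_model=512, channels=512,
                 kernel=3, layers=3, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        blocks = []
        for i in range(layers):
            blocks += [
                nn.Conv1d(d_model if i == 0 else channels, channels,
                          kernel, padding=kernel // 2),
                nn.BatchNorm1d(channels),  # assumption: BN after each conv
                nn.ReLU(),
                nn.Dropout(dropout),
            ]
        self.convs = nn.Sequential(*blocks)
        self.proj = nn.Linear(channels, d_model)

    def forward(self, tokens):                  # tokens: (batch, time)
        x = self.embed(tokens).transpose(1, 2)  # (batch, d_model, time)
        x = self.convs(x).transpose(1, 2)       # (batch, time, channels)
        return self.proj(x)                     # (batch, time, d_model)
```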
The Decoder Prenet is as follows. It consists of two fully connected layers, and dropout is kept enabled at both training and inference time; the aim is to secure diversity in the synthesized sounds.

| Item | Setting |
|---|---|
| Embedding dimension | 512 |
| Number of layers | 2 |
| Activation function | ReLU |
| Dropout rate | 0.5 |
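The always-on dropout can be written with F.dropout(training=True), so the prenet stays stochastic even in eval mode. A sketch under the settings above (module name and input dimension are my assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderPrenet(nn.Module):
    """Two fully connected layers whose dropout stays active even in eval
    mode, for diversity in the synthesized sounds."""
    def __init__(self, in_dim=80, hidden=512, dropout=0.5):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.p = dropout

    def forward(self, x):
        # training=True forces stochastic dropout at inference as well.
        x = F.dropout(F.relu(self.fc1(x)), p=self.p, training=True)
        x = F.dropout(F.relu(self.fc2(x)), p=self.p, training=True)
        return x
```

Two forward passes on the same input give different outputs even after `.eval()`, which is exactly the intended behavior.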
The Postnet settings are as follows.

| Item | Setting |
|---|---|
| Embedding dimension | 512 |
| 1d-CNN channels | 512 |
| 1d-CNN kernel size | 5 |
| Number of 1d-CNN layers | 5 |
| Activation function | tanh |
| Dropout rate | 0.5 |
The CBHG settings are as follows.

| Item | Setting |
|---|---|
| Number of banks in the convolution bank | 8 |
| Number of Highway Net layers | 4 |
| Projection dimension | 512 |
| Dropout rate | 0.5 |
Experimental results
Without conditioning on the acoustic event, i.e., with only the onomatopoeia as input, many synthesis failures were observed: the output was synthesized as a sound of a different class (acoustic event), or sounds of other classes were mixed in at the beginning. With conditioning, however, synthesized sounds free of class mismatch were obtained.
- bells5 (onomatopoeia: リンリーン, phonemes: / r i N r i: N /)

| Without conditioning | With conditioning |
|---|---|

- clock1 (onomatopoeia: チリリリリ, phonemes: / ch i r i r i r i r i /)

| Without conditioning | With conditioning |
|---|---|

- coffmill (onomatopoeia: ガサガサ, phonemes: / g a s a g a s a /)

| Without conditioning | With conditioning |
|---|---|

- cup1 (onomatopoeia: チーンッ, phonemes: / ch i: N q /)

| Without conditioning | With conditioning |
|---|---|

- drum (onomatopoeia: ポンッ, phonemes: / p o N q /)

| Without conditioning | With conditioning |
|---|---|

- maracas (onomatopoeia: シャカ, phonemes: / sh a k a /)

| Without conditioning | With conditioning |
|---|---|

- shaver (onomatopoeia: ビービー, phonemes: / b i: b i: /)

| Without conditioning | With conditioning |
|---|---|

- tear (onomatopoeia: スウィイイイイン, phonemes: / s u w i i i i i N /)

| Without conditioning | With conditioning |
|---|---|

- trashbox (onomatopoeia: ポーン, phonemes: / p o: N /)

| Without conditioning | With conditioning |
|---|---|

- whistle3 (onomatopoeia: ピイッ, phonemes: / p i i q /)

| Without conditioning | With conditioning |
|---|---|
Next, synthesis examples in which a common onomatopoeia is given while the conditioning class is varied. Here "Seq2Seq" refers to the synthesis method of the original Onoma-to-Wave.
- Onomatopoeia: ビイイイイイ (/ b i i i i i i /)

| Seq2Seq | Transformer |
|---|---|
| Conditioned on "whistle3" | Conditioned on "whistle3" |
| Conditioned on "shaver" | Conditioned on "shaver" |
| Conditioned on "tear" | Conditioned on "tear" |

- Onomatopoeia: シャリシャリ (/ sh a r i sh a r i /)

| Seq2Seq | Transformer |
|---|---|
| Conditioned on "maracas" | Conditioned on "maracas" |
| Conditioned on "coffmill" | Conditioned on "coffmill" |

- Onomatopoeia: チーッ (/ ch i: q /)

| Seq2Seq | Transformer |
|---|---|
| Conditioned on "cup1" | Conditioned on "cup1" |
| Conditioned on "whistle3" | Conditioned on "whistle3" |
| Conditioned on "shaver" | Conditioned on "shaver" |

- Onomatopoeia: ボンッ (/ b o N q /)

| Seq2Seq | Transformer |
|---|---|
| Conditioned on "drum" | Conditioned on "drum" |
| Conditioned on "trashbox" | Conditioned on "trashbox" |

- Onomatopoeia: リンリン (/ r i N r i N /)

| Seq2Seq | Transformer |
|---|---|
| Conditioned on "bells5" | Conditioned on "bells5" |
| Conditioned on "clock1" | Conditioned on "clock1" |
Behind the scenes of the implementation
- In implementing the model's Encoder Prenet, Decoder Prenet, and Postnet, I drew heavily on ttslearn by Ryuichi Yamamoto. I adapted it almost as-is, with minor modifications; the copyright notice from ttslearn is of course retained.
- The Transformer itself uses PyTorch's official modules. What first puzzled me about those modules was how to supply the masks. Here the source-side input is the onomatopoeia (a character string) and the target-side input is the mel spectrogram (a sequence of spectral frames), following the basic formulation for sequence data. Under this setup, the mask for source self-attention (src_mask) is None, and the mask for source-target attention (memory_mask) is also None; that is, no mask is needed to exclude particular source positions from the attention computation. The mask for target self-attention (tgt_mask) is the step-shaped (causal) mask that prevents looking ahead. In addition, the length of each onomatopoeia and the number of frames are retained for every item in a mini-batch, and the corresponding padding masks (src_key_padding_mask, tgt_key_padding_mask, memory_key_padding_mask) are supplied as well.
- Why feature extraction was added to preprocess.py: in the previous article's Onoma-to-Wave implementation, the acoustic features were re-extracted at Dataset construction time for every training run. Repeating the same processing at every run is wasteful and a breeding ground for bugs, so it needed improvement. The extraction therefore no longer happens when the Dataset is built; it runs exactly once inside preprocess.py, and subsequent training runs simply load the pre-extracted features. Note that, in exchange for not re-extracting every time, this implicitly assumes the whole set of features fits in memory (about 130 MB here). The discussion of preprocessing in the following slides was a helpful reference (around slide 15).
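The mask setup described above for torch.nn.Transformer can be sketched as follows: src_mask and memory_mask stay None, tgt_mask is a causal upper-triangular mask, and the padding masks are derived from the per-example lengths (the helper itself is my own, not the repository's util.py):

```python
import torch

def make_masks(src_lens, tgt_lens, device="cpu"):
    """Build the masks passed to torch.nn.Transformer in this setup.
    Boolean convention: True marks a position that attention must ignore."""
    S, T = max(src_lens), max(tgt_lens)
    # Causal mask for target self-attention (prevents looking ahead).
    tgt_mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=device),
                          diagonal=1)
    # Padding masks from the retained onomatopoeia / frame lengths.
    src_kpm = torch.stack([torch.arange(S, device=device) >= L for L in src_lens])
    tgt_kpm = torch.stack([torch.arange(T, device=device) >= L for L in tgt_lens])
    # src_mask is None; memory_key_padding_mask reuses the source padding mask.
    return None, tgt_mask, src_kpm, tgt_kpm, src_kpm

src_mask, tgt_mask, src_kpm, tgt_kpm, mem_kpm = make_masks([5, 3], [7, 6])
```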
- The auxiliary modules I created are as follows.
| File | Function |
|---|---|
| dataset.py | Defines OnomatopoeiaDataset, the dataset class |
| mapping_dict.py | Defines MappingDict, a class providing dictionaries that convert between onomatopoeia (alphabetic) sequences and numeric sequences |
| models.py | Defines the Seq2SeqTransformer class, which converts onomatopoeia into log-mel spectrograms, and the Mel2Linear class, which converts log-mel spectrograms into log linear-scale spectrograms |
| module.py | Defines the modules (classes) needed to define the Seq2SeqTransformer and Mel2Linear classes |
| scheduler.py | Defines TransformerLR, a learning-rate scheduler class usable with Seq2SeqTransformer |
| trainer.py | Defines Trainer, a class that runs the mini-batch training loop and also generates environmental sounds with trained models |
| util.py | Defines auxiliary functions such as mask creation and acoustic feature extraction |
The class name Seq2SeqTransformer comes from a PyTorch tutorial. Thinking that a class name specialized to this environmental sound synthesis task would be better, I considered alternatives such as Onoma2WaveTransformer and Onoma2Mel, but neither quite fit: the former feels too long, while the latter nicely reflects the concrete processing (conversion from onomatopoeia to log-mel spectrogram) but loses the sense that a Transformer is doing the work.

Because the Seq2SeqTransformer and Mel2Linear classes are trained independently, I initially considered giving each its own Trainer class in a separate module (script). However, needlessly multiplying modules did not seem like clean design, so the training and inference of both classes were consolidated into a single Trainer class, with each training stage toggled on or off in the config file.

The TransformerLR implementation is the one introduced in the article below, which was in fact groundwork for this article.
Conclusion
Through this implementation, I feel I have come to understand the Transformer a little better. I learned a lot, especially around self-attention and multi-head attention, including how to use the masks. Implementation really is a shortcut to understanding.