Articles for the month starting 2021-11-01
manipulation = call(sound, "To Manipulation", 0.01, 75, 600)
pitch_tier = call(manipulation, "Extract pitch tier")
call(pitch_tier, "Multiply frequencies", sound.xmin, sound.xmax, 2)
call([pitch_tier, manipulation], "Replace pitch tier")
s…
VCC2020 T10 model 1 (top score). An ASR-based rec-syn system that achieves MOS 4.0 & similarity 3.6. Models. ASR: SI-ASR (same as N10?). Conversion model: Encoder-Decoder model (cf. S2S). Encoder: LSTM -> 2x time-compressing concat2 -> LSTM. Decoder: attention-equipped AR-…
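Non-local Neural Networks (2018). The intuition behind the module: "Give me everything I want, and only that." FC: takes in all elements no matter what. Conv: takes in only a hard-coded local neighborhood. RNN: directly takes in only hidden_{t-1}. => Based on the current value, dynamically take in only the wanted elements from the full length…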
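The "dynamically pick what you want from the full length" idea can be sketched in NumPy as below. This is only a minimal illustration, loosely following the embedded-Gaussian (softmax) form; the function name `non_local`, the weight names, and the shapes are assumptions for this sketch, and the output projection used in the paper's block is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def non_local(x, w_theta, w_phi, w_g):
    """x: (T, C). Each position gathers from all T positions,
    with weights computed from the current values themselves."""
    theta = x @ w_theta                        # (T, D) what each position asks for
    phi = x @ w_phi                            # (T, D) what each position offers
    g = x @ w_g                                # (T, C) the content to be gathered
    weights = softmax(theta @ phi.T, axis=-1)  # (T, T) data-dependent, full-length
    return x + weights @ g, weights            # residual connection

rng = np.random.default_rng(0)
T, C, D = 6, 4, 3  # hypothetical sizes
x = rng.normal(size=(T, C))
y, weights = non_local(
    x,
    rng.normal(size=(C, D)),
    rng.normal(size=(C, D)),
    rng.normal(size=(C, C)),
)
print(y.shape)  # (6, 4)
```

Unlike a Conv (fixed local window) or an RNN (only the previous hidden state), the `(T, T)` weight matrix here spans every position and is recomputed from the input each time.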
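A "we tried adding pitch prediction to FastSpeech" paper. As with Duration, a PitchPredictor is trained at the phoneme level. The predicted scalar is converted to the same feature dimension as the latent and then, surprisingly, simply summed! Because a segFC maps it into the feature dimension, this is learnable. Around here, in the pitch dimension too…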
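The "project the scalar, then just sum" step might look like the following NumPy sketch. It assumes that "segFC" means a position-wise (per-phoneme) fully connected layer; the variable names and sizes are hypothetical, and the weights are random stand-ins for parameters that would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)
n_phonemes, feat_dim = 5, 8  # hypothetical sizes

latent = rng.normal(size=(n_phonemes, feat_dim))  # one latent vector per phoneme
pitch = rng.normal(size=(n_phonemes, 1))          # one predicted scalar per phoneme

# Position-wise linear layer 1 -> feat_dim (assumed reading of "segFC").
w = rng.normal(size=(1, feat_dim))
b = np.zeros(feat_dim)
pitch_emb = pitch @ w + b                         # (n_phonemes, feat_dim)

# The conditioning itself is nothing more than an element-wise sum.
conditioned = latent + pitch_emb
print(conditioned.shape)  # (5, 8)
```

Because the projection is learned, the model can decide how the pitch value is spread over the feature dimensions, which is what makes a plain sum workable.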
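Fast, skillful (whether it is cheap is debatable): FastSpeech. Overview: a Transformer transforms the phoneme sequence, the result is dynamically upsampled, and a Transformer converts that sequence into a mel-spec. That's all. Dynamic upsampling is performed by LengthRegulator, with the per-phoneme factor from DurationPredictor…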
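The dynamic upsampling can be sketched in a few lines of NumPy: each phoneme vector is repeated as many times as its duration says. The function name `length_regulate` and the toy values are assumptions for illustration; in the real model the durations would come from the duration predictor.

```python
import numpy as np

def length_regulate(phoneme_feats, durations):
    """Repeat each phoneme's feature vector `durations[i]` times along time.

    phoneme_feats: (n_phonemes, feat_dim)
    durations:     (n_phonemes,) non-negative integers (frames per phoneme)
    """
    return np.repeat(phoneme_feats, durations, axis=0)

feats = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])  # 3 phonemes
durations = np.array([2, 0, 3])                         # hypothetical predictions
frames = length_regulate(feats, durations)
print(frames.shape)  # (5, 2)
```

A duration of 0 simply drops that phoneme from the frame sequence, and the output length is the sum of the durations.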
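We want the model to learn to take specific values as its intermediate representation. A: Pray that the model bias makes it learn that way naturally. B: Split the model and train the parts separately. C: Set a Loss on that intermediate representation. D: Set a Loss, and in addition pass the ground-truth data to the next layer (teacher-forcing style) …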
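A pattern that places Convs with different kernel sizes in parallel. It can also be viewed as a different kernel size per channel. The output dimensions are determined by the stride and the number of channels, so they are unrelated to whether you go multi-resolution. My impression is that this gets reinvented all over the place. Tacotr…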
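A minimal NumPy sketch of the parallel-kernel pattern follows. It assumes stride 1 and "same" padding (the excerpt does not state the padding), a single input channel, and one output channel per branch; the kernel sizes and random weights are hypothetical.

```python
import numpy as np

def conv1d_same(x, kernel):
    # Stride-1 convolution with "same" padding: output length equals len(x)
    # as long as the kernel is not longer than the input.
    return np.convolve(x, kernel, mode="same")

rng = np.random.default_rng(0)
x = rng.normal(size=100)     # single-channel input, 100 time steps
kernel_sizes = [1, 3, 5, 7]  # one parallel branch per kernel size

branches = [conv1d_same(x, rng.normal(size=k)) for k in kernel_sizes]
out = np.stack(branches, axis=0)  # branches become output channels
print(out.shape)  # (4, 100)
```

Every branch yields the same time length regardless of its kernel size, so stacking along the channel axis works without any extra alignment.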
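Claim: 'If you want to do TTS, "a good char2spec model + a spec2wave WaveNet" works better than directly conditioning WaveNet on complex features.' Overview: an Attention Seq-to-Seq generates a mel spectrogram from the character string, and WaveNet generates the waveform. An LSTM Encoder swallows the text whole, and the final output is taken as z and…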
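For VTubers, 3D is "a wonderful option that can be put to good use if you have it", but it is not a necessary condition. If 3D were the essence, VTubers logically could not beat YouTubers, because a physical human is as 3D as it gets. And in fact, VTubers are spreading explosively into areas that do not insist on 3D…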
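Uses CPC for pre-training multilingual ASR and achieves performance equal to or better than existing supervised models. Background: what to do when data is scarce => pre-training on large data from a nearby domain & Transfer learning. For ASR, if something phoneme-like can be pre-trained, it seems it could be shared => CPC. Method: CPC…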
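After optimizing the model and Loss of MelGAN, the final output is given multiple channels, each predicting a subband. Commonly known as MB-MelGAN. Model: MelGAN-based, i.e. ConvT1d-based. By introducing ResBlock and enlarging the receptive field with DilatedConv, the full-band model itself is first…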