Articles for the year starting 2021-01-01
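A provider prepares 50 SIMs and 100 pocket Wi-Fi devices. Virtual SIMs are dynamically assigned to the pocket Wi-Fi devices. Normally 100 devices would need 100 SIMs, but half that number suffices, so the service can be offered cheaply. However, once the capacity of the 50 SIMs is used up, the devices naturally can no longer connect, so…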
A collection of articles on WaveRNN
A survey of research that uses the STFT of the generated waveform as a loss function in speech waveform generation tasks. Parallel WaveGAN NSF HiFi-GAN MultiBand-MelGAN StyleMelGAN My impression is that every SoTA GAN-based vocoder adopts it. model loss name reference loss intent PWG1 mu…
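System properties: stability is the only thing that can hold good and evil in check. It is neither that evil perishes nor that good spreads; a system drifts, fluctuating, toward higher stability and lower energy. Even if you cast a stone into it, as long as the energy structure does not change it is likely to revert (temporarily, heat…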
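A differentiable method for sampling from a probability distribution. Add noise to a probability vector and take the argmax to get an index, which can be turned directly into a one-hot vector. => With a carefully chosen noise, the samples follow the distribution (Gumbel-Max Trick). We want to sample, but we also want to differentiate a…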
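The Gumbel-Max Trick mentioned in this excerpt can be sketched as follows. This is a minimal illustration, not code from the article: the function names are hypothetical, and it assumes the usual formulation in which Gumbel(0, 1) noise is added to the log-probabilities before taking the argmax.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one sample from the standard Gumbel(0, 1) distribution."""
    # random() returns a value in [0, 1); clamp away from 0 to avoid log(0).
    u = max(rng.random(), 1e-12)
    return -math.log(-math.log(u))


def gumbel_max_sample(probs, rng):
    """Return an index distributed according to `probs`.

    Computes argmax_i(log p_i + g_i) with independent Gumbel noise g_i.
    """
    scores = [
        math.log(p) + sample_gumbel(rng) if p > 0 else -math.inf
        for p in probs
    ]
    return max(range(len(probs)), key=scores.__getitem__)


def one_hot(index, size):
    """Turn the sampled index into a one-hot vector."""
    return [1.0 if i == index else 0.0 for i in range(size)]


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.1, 0.6, 0.3]
    idx = gumbel_max_sample(probs, rng)
    print(idx, one_hot(idx, len(probs)))
```

The argmax itself is not differentiable; this sketch covers only the sampling half of the idea.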
manipulation = call(sound, "To Manipulation", 0.01, 75, 600) pitch_tier = call(manipulation, "Extract pitch tier") call(pitch_tier, "Multiply frequencies", sound.xmin, sound.xmax, 2) call([pitch_tier, manipulation], "Replace pitch tier") s…
VCC2020 T10 model (top score). Achieves MOS 4.0 & similarity 3.6 with ASR-based rec-syn. Models ASR SI-ASR (same as N10?) Conversion model Encoder-Decoder model (≠ S2S). Encoder LSTM -> 2x time-compressing concat -> LSTM Decoder attention-equipped AR-…
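Non-local Neural Networks (2018) What each module is thinking: "Give me all of what I want, and only that." FC: just combine every element. Conv: combine only a hard-coded local neighborhood. RNN: directly combine only hidden_{t-1}. => Based on the current value, dynamically combine only the wanted elements from the full length…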
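A paper that made FastSpeech handle pitch estimation as well. As with Duration, a PitchPredictor is trained at the phoneme level. The predicted scalar is converted to the same feature dimension as the latent and then, surprisingly, simply summed. Because segFC projects it to the feature dimension, this part is learnable, and around here, in the pitch dimension as well…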
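Fast and skillful (whether it is cheap is debatable): FastSpeech. Overview: a Transformer performs sequence conversion on the phoneme sequence, the result is dynamically upsampled, and another Transformer converts that sequence to a mel-spec. That is all. The dynamic upsampling is performed by the LengthRegulator, and the per-phoneme rate is dynamically … by the DurationPredictor…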
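The dynamic upsampling described in this excerpt can be illustrated with a small sketch. It is not the article's or the paper's implementation: `length_regulate` is a hypothetical name, and it assumes the per-phoneme rate has already been rounded to a non-negative integer number of frames.

```python
def length_regulate(phoneme_features, durations):
    """Repeat each phoneme's feature `durations[i]` times along the time axis.

    phoneme_features: one feature (any object) per phoneme.
    durations: one non-negative integer frame count per phoneme (assumed
               to come from a duration predictor, already rounded).
    """
    if len(phoneme_features) != len(durations):
        raise ValueError("features and durations must have the same length")
    frames = []
    for feature, duration in zip(phoneme_features, durations):
        if duration < 0:
            raise ValueError("durations must be non-negative")
        # A duration of 0 drops the phoneme from the frame sequence.
        frames.extend([feature] * duration)
    return frames


if __name__ == "__main__":
    print(length_regulate(["k", "a", "t"], [2, 4, 1]))
```

The output length equals the sum of the durations, which is how a phoneme-rate sequence becomes a frame-rate sequence.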
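We want the model to learn to take specific values as an intermediate representation. A: Pray that the model's bias leads it to learn that way naturally. B: Split the model and train the parts separately. C: Set a loss on that intermediate representation. D: Set a loss and additionally pass the ground-truth data to the next layer (teacher-forcing style). …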
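A pattern that places Convs with different kernel sizes in parallel. It can also be viewed as each channel having a different kernel size. The output dimension is determined by the stride and the number of channels, so it is independent of whether the layer is multi-resolution. My impression is that this has been reinvented in a great many places. Tacotr…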
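Claim: "If you are doing TTS, a 'good char2spec model + spec2wave WaveNet' is better than conditioning WaveNet directly on complex features." Overview: an attention seq-to-seq model generates a mel spectrogram from a character string, and WaveNet generates the waveform. An LSTM encoder swallows the sentence whole, and the final output as z…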
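For VTubers, 3D is "a wonderful option that can be put to good use if available", but it is not a necessary condition. If 3D-ness were the essence, VTubers logically could not beat YouTubers, because a physical human has the ultimate degree of 3D-ness. And in practice, VTubers have been spreading explosively into areas where 3D does not matter…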
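Uses CPC to pre-train multilingual ASR and achieves performance equal to or better than existing supervised models. Background: what to do when data is scarce? => Pre-training & transfer learning on large data from a nearby domain. For ASR, if something phoneme-like can be pre-trained, surely it can be shared => CPC. Method: CPC's…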
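On top of optimizing the model and loss relative to MelGAN, the final output is given multiple channels, each predicting one subband. Commonly called MB-MelGAN. Model: MelGAN-based, i.e. ConvT1d-based. By introducing ResBlocks and enlarging the receptive field with DilatedConv, the full-band model itself first…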
Libri-light is a corpus generated from LibriVox, which makes it a relative of LibriSpeech. Unlabelled Speech Training Set unlab-60k unlab-6k unlab-600 Dev and Test Set (totally same as LibriSpeech) dev-clean: 5.4 hours dev-other: 5.3 hours test-clean: …
Neural vocoders are now commonplace, with a wide variety suited to different uses. Their origin is WaveNet. WaveNet itself is no longer used today, but its fundamental ideas have become commonplace and its modules are used everywhere. Here we look back at WaveNet, a neo-classic of sorts. Summar…
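Generative models: learning the entire sample distribution. Generative models are fairly sophisticated models. They try to model everything, including the variety of very rare samples. In practical use, ignoring the fine details of the distribution often gives better results. => Generative models and the "temperature" parameter:…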
Generative models are often given a temperature parameter. This article explains what it is, what it does, and how it is useful. tempered softmax Dahl, et al. (2017). Pixel Recursive Super Resolution. Parmar, et al. (2018). Image Transformer. "truncation tric…
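As a rough illustration of the tempered softmax named in this excerpt, the sketch below divides the logits by a temperature before normalizing. The function name and signature are assumptions made for this example, not taken from the article or the cited papers.

```python
import math


def tempered_softmax(logits, temperature=1.0):
    """Softmax of logits / temperature.

    temperature < 1 sharpens the distribution toward the largest logit;
    temperature > 1 flattens it toward uniform.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum for numerical stability; the result is unchanged.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [1.0, 2.0, 3.0]
    for t in (0.5, 1.0, 2.0):
        print(t, tempered_softmax(logits, t))
```

Lowering the temperature is one way to ignore the rare tail of a learned distribution at sampling time.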
A lock file aims to guarantee behavior by pinning dependency versions exactly. Some dependencies, because of the hardware they target, require a hardware-specific version (with identical behavior). These two do not mix well. Pinning strictly with a lock file means the hardware…
BLAS: Intel MKL, OpenBLAS MKL-DNN PyTorch TorchScript A JIT-compiled model that can be called from libtorch. Optimization seems to run during roughly the first 20 loops (ref). It also does things like fused ops. BLAS PyTorch automatically detects and uses a BLAS library via CMake at pip install time. …
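The purpose of a Makefile is to run recipes/commands with no wasted work, based on file modification times and the file dependency tree. Consider the triple (Target, Prerequisites, Recipe). Running make Target checks whether the Prerequisites files have been updated; if they have, the Recipe is run, and if not…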
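The (Target, Prerequisites, Recipe) rule can be sketched in a few lines. This is a deliberately simplified model, not how make is implemented: it handles a single rule, does not recurse into the prerequisites' own rules, and the function names are made up for illustration.

```python
import os


def needs_rebuild(target, prerequisites):
    """Return True if `target` is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


def make(target, prerequisites, recipe):
    """Run `recipe` (a zero-argument callable) only when the target is stale.

    Returns True if the recipe ran, False if the target was up to date.
    """
    if needs_rebuild(target, prerequisites):
        recipe()
        return True
    return False
```

Calling `make("out.txt", ["in.txt"], build)` twice in a row runs `build` at most once, provided `build` writes `out.txt` and `in.txt` is not modified in between.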
There are no API docs, so this serves as a list that gives an overview of the whole. config <=> others I/O from empty (.create()) dictionary (.create({"key": "value"})) list (.create([1, 2, 3])) YAML string (.create("k1: v1 \n k2: v2")) YAML file (.load(path)) dot-list (.fro…
Python has type hints but performs no validation at run time. However, Python keeps the type hints as annotations until run time, so custom validation can be built in. OmegaConf provides validation when Structured configs are used. MISSIN…
Basics: "The Python runtime does not enforce function and variable type annotations." (Python Docs) The basic uses are type checking and IDE/linter support. Python lets you easily retrieve annotations (X.__annotations__), so these types can be used at run time. That is…
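To make the point about run-time access concrete, here is a minimal sketch of custom validation built on `__annotations__`. The `TrainConfig` class and `validate_annotations` helper are hypothetical names for this example, and the check covers only plain classes such as `int` or `float`, not generics like `list[int]`.

```python
from dataclasses import dataclass


def validate_annotations(obj):
    """Check each annotated attribute of `obj` against its annotated class.

    Annotations that are not plain classes (e.g. typing generics) are skipped.
    Returns `obj` unchanged when every check passes.
    """
    for name, expected in type(obj).__annotations__.items():
        value = getattr(obj, name)
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(
                f"{name} should be {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    return obj


@dataclass
class TrainConfig:
    batch_size: int
    lr: float


if __name__ == "__main__":
    # Python itself accepts the wrong type; the custom check rejects it.
    bad = TrainConfig(batch_size="32", lr=0.1)
    try:
        validate_annotations(bad)
    except TypeError as err:
        print(err)
```

Constructing `TrainConfig(batch_size="32", lr=0.1)` succeeds on its own, which is the non-enforcement the quoted docs describe.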
Features of Hydra: a rich set of utilities for swapping configs at run time. Defaults List: lets a yaml call/compose other yamls from inside it. Config Group: calls other yamls via CLI arguments, based on the directory structure. Split the config files by component and, using directories, …
Example: a 125 ms FIR filter. The number of samples in 125 ms depends on sr: 2000 samples at sr=16kHz, 1000 samples at sr=8kHz. Splitting the band in two turns 16kHz/2000 samples into 8kHz/1000 samples x2. If the FIR filter is applied every time a new sample arrives, …
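The arithmetic in this example can be checked with a short sketch. It assumes a direct-form FIR filter that costs one multiply-accumulate (MAC) per tap for every incoming sample, and that each subband runs at the full rate divided by the number of bands; the function names are made up for illustration.

```python
def fir_taps(duration_ms, sample_rate_hz):
    """Number of samples (taps) covered by a filter of the given duration."""
    return duration_ms * sample_rate_hz // 1000


def macs_per_second(duration_ms, sample_rate_hz, n_bands=1):
    """Multiply-accumulates per second when filtering every new sample.

    Assumes n_bands equal subbands, each at sample_rate_hz / n_bands,
    each with its own filter of the same duration.
    """
    band_rate = sample_rate_hz // n_bands
    return n_bands * band_rate * fir_taps(duration_ms, band_rate)


if __name__ == "__main__":
    print(fir_taps(125, 16000))            # taps at 16 kHz
    print(fir_taps(125, 8000))             # taps at 8 kHz
    print(macs_per_second(125, 16000, 1))  # full band
    print(macs_per_second(125, 16000, 2))  # two subbands
```

Under these assumptions, two subbands at 8 kHz with 1000 taps each need half the multiply-accumulates of one full band at 16 kHz with 2000 taps.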
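Multiband-WaveRNN is a model that, under the hypothesis that "WaveRNN has expressive power to spare", tasks a WaveRNN of unchanged size with predicting N subbands simultaneously. Remarkably, it actually succeeds at N-band prediction with no difference in MOS. Because the operating frequency can be cut to 1/N, RTF improves substantially. Background / Model Wa…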
A Gated Activation Unit is a kind of activation function/unit. output = tanh(W_filter ∗ input) ⊙ σ(W_gate ∗ input) It can be viewed as applying gating, using the 0~1 values produced by sigmoid(conv'(input)), to the output that tanh(conv(input)) has transformed nonlinearly. In Gated PixelRNN…
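The formula output = tanh(W_filter ∗ input) ⊙ σ(W_gate ∗ input) can be written out as a small pure-Python sketch for a single-channel 1-D signal. The helper names are hypothetical, the kernels are assumed to have odd length, and zero padding is one choice among several (a causal layout would pad only on the left).

```python
import math


def conv1d_same(x, kernel):
    """1-D cross-correlation with zero padding; output length equals len(x).

    Assumes an odd kernel length.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_activation(x, w_filter, w_gate):
    """tanh(w_filter * x) multiplied elementwise by sigmoid(w_gate * x)."""
    filtered = conv1d_same(x, w_filter)
    gate = conv1d_same(x, w_gate)
    return [math.tanh(f) * sigmoid(g) for f, g in zip(filtered, gate)]


if __name__ == "__main__":
    signal = [0.0, 1.0, -2.0, 0.5]
    print(gated_activation(signal, [0.5, 1.0, 0.5], [0.0, 2.0, 0.0]))
```

Because the gate lies in (0, 1) and tanh lies in (-1, 1), every output value stays inside (-1, 1).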