Articles for the one month from 2024-12-01
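Introduction / Development environment / Building the image simply / Handling dockerignore / Copying only the files needed at build time / Installing torch from a whl / Building with multi-stage builds. Introduction: In the repository below, the processing that converts a model to onnx and ort models is run in docker…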
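Introduction / Background / What I want to do / How to achieve it / Implementation in Actions / Actions configuration. Introduction: This post covers a CI job that pulls in asset diffs between branches in an asset environment. The goal is something that automatically goes as far as creating a PR like the one below. As a demo…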
Introduction / Development environment / Setup / Asking a single agent a user question / Adding a memory system / Asking questions to multiple agents / Multi-year simulation with multiple agents. Introduction: Day 18 of the LLM・LLM活用 Advent Calendar! genagents is a…
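Introduction / Preliminary survey / Development environment / Evaluation data / Target models / Evaluation results / Evaluation method / wespeaker / xvector_jtubespeech. Introduction: When transcribing audio that contains several people's voices, speaker diarization is the technique for estimating "who spoke when"…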
Development environment / Setup / Run. Development environment: python 3.9, uv. Setup: Install the libraries: uv pip install datasets[audio] soundfile pydub. Run: The following downloads the dataset and saves it in wav format: from datasets import load_dataset i…
Introduction / Development environment / Setup / Run. Introduction: This time I try speaker diarization with the standard combination of pyannote and whisper. The sample repository for the article is published below: github.com. I have also tried other libraries in the past, so for what other libraries…
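Introduction / Development environment / Setup / Running from the CLI / Running from Python code / References. Introduction: I try speaker diarization with wespeaker. The model is the following: huggingface.co. The repository with the contents of the article is published below: github.com. Development environment: windows11 python…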
Introduction / Development environment / Setup / Running speaker diarization. Introduction: I try speaker diarization of audio using powerset_calibration. The datasets in the paper do not include Japanese, so to use it on Japanese audio you need to train it yourself…
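Introduction / Development environment / Cause / Workaround / References. Introduction: When running nvidia/parakeet-tdt_ctc-0.6b-ja, the speech recognition model for Japanese audio published by NVIDIA, on Windows, the following error came up, so this post documents the workaround: packages\nemo\collections\asr\models\configs\a…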
Introduction / Development environment / Setup / Running transcription. Introduction: reazon-research has published a new speech recognition model, so I try it out. "ReazonSpeech v2.1 has been released! The new Japanese ASR model added in v2.1, 'ReazonSpeech-k2-v2', is provided in ONNX format…
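Introduction / Development environment / Setup / Run. Introduction: The code for CosyVoice's supervised speech tokenizer had not been released, but according to an issue a reproduction has been implemented in the repository below, so I run that here: github.com. A repository with the library versions pinned is below…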
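Introduction / Development environment / About DNSMOS / Environment setup / Evaluating audio files. Introduction: Day 12 of the AI声づくり技術研究会 Advent Calendar. This post is about evaluating the audio in a dataset, one of the important parts of building datasets for speech synthesis, and covers relatively new evaluation methods and librar…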
Introduction / Development environment / Setup / Inference from an audio file / Running the real-time interactive demo. Introduction: I run Freeze-Omni, a speech-to-speech library. Unlike the conventional speech to text (STT) → text to text (LLM) → text to speech (S…
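Introduction / Development environment / Setup / Converting to hiragana / Changing the dictionary. Introduction: In some TTS training setups, every string has to be converted to hiragana. I try doing this with sudachi, which is easy to use for the purpose. The sample repository is published below: github.com. Develop…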
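A minimal sketch of the last step in a hiragana conversion like the one described above. It assumes that the morphological analyzer returns readings in katakana (as far as I know, this is what SudachiPy's `reading_form()` does), so what remains is a katakana-to-hiragana shift. The function name `kata_to_hira` is hypothetical and does not come from the article or its repository; the analyzer call is deliberately left out and only mentioned in a comment.

```python
# Assumption: the readings come from a tokenizer such as SudachiPy, e.g. by
# joining each morpheme's reading. That call is NOT shown here; this block
# only covers the pure-Python katakana -> hiragana step.

KATAKANA_START = 0x30A1  # ァ
KATAKANA_END = 0x30F6    # ヶ
OFFSET = 0x60            # distance between the katakana and hiragana blocks


def kata_to_hira(text: str) -> str:
    """Map katakana letters to hiragana; leave every other character as-is."""
    out = []
    for ch in text:
        code = ord(ch)
        if KATAKANA_START <= code <= KATAKANA_END:
            out.append(chr(code - OFFSET))
        else:
            # The long-vowel mark (ー), punctuation, ASCII, kanji, etc. pass through.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(kata_to_hira("トウキョウ"))
```

Characters outside the katakana letter range are passed through unchanged. The long-vowel mark therefore stays as `ー`, which may or may not be what a given TTS front end expects.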
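Introduction / Development environment / Setup / Running speaker separation. Introduction: I try various things on audio data using the recently released WeSpeaker. This time I separate the speakers in an audio file and determine the number of speakers: github.com. The sample repository is published below: https:…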
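Introduction / Demo / Development environment / Implementation. Introduction: When debugging ray hit detection in Unity, Gizmo or Drawline is very often used. This time, as a method for cases where Gizmo cannot be used (the class does not inherit MonoBehaviour, or the update call cannot be propagated to OnDrawGizmos), Dr…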