Being able to transcribe the audio playing on a PC in real time is handy, for example when watching videos that have no subtitles or when joining online meetings with people abroad.
Whisper, which OpenAI released recently, comes with a tool that transcribes audio files, but not one that processes audio in real time.
So I wrote a Python script that transcribes in real time.
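Loopback recording
With SoundCard, you can record the audio that is being played on the PC. Install it with:
pip install SoundCard
Recording is then done like this:
import soundcard as sc

SAMPLE_RATE = 16000
BUFFER_SIZE = 4096

# Record the default speaker's output through a loopback "microphone"
with sc.get_microphone(id=str(sc.default_speaker().name), include_loopback=True).recorder(samplerate=SAMPLE_RATE, channels=1) as mic:
    while True:
        data = mic.record(BUFFER_SIZE)  # shape (BUFFER_SIZE, 1)
Note that this does not work on any thread other than the main thread, so the recording must run on the main thread.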
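Because `mic.record(BUFFER_SIZE)` returns a fixed block of frames, the fill loop in the full script overshoots the target length (`SAMPLE_RATE * INTERVAL`) by less than one block, which is why its buffer is allocated with `BUFFER_SIZE` extra samples. The sketch below checks that arithmetic without any audio hardware. `fake_record` is a hypothetical stand-in for `mic.record` that returns `(frames, channels)`-shaped rows of silence, and a nonzero `start_len` plays the role of leftover audio carried over from the previous chunk.

```python
SAMPLE_RATE = 16000
INTERVAL = 3
BUFFER_SIZE = 4096


def fake_record(numframes, channels=1):
    """Hypothetical stand-in for mic.record: numframes rows of `channels` zero samples."""
    return [[0.0] * channels for _ in range(numframes)]


def fill_buffer(start_len):
    """Fill a buffer that already holds start_len samples until it holds at least
    INTERVAL seconds, mirroring the `while n < SAMPLE_RATE * INTERVAL` loop.
    Returns the final sample count n."""
    capacity = SAMPLE_RATE * INTERVAL + BUFFER_SIZE
    buffer = [0.0] * capacity
    n = start_len
    while n < SAMPLE_RATE * INTERVAL:
        data = fake_record(BUFFER_SIZE)
        flat = [row[0] for row in data]  # channels=1: keep the single column, like reshape(-1)
        buffer[n:n + len(flat)] = flat
        n += len(flat)
    # The slice assignment never ran past the end, so the list did not grow
    assert len(buffer) == capacity
    return n


if __name__ == "__main__":
    for start in (0, 1000, SAMPLE_RATE * INTERVAL - 1):
        print(start, fill_buffer(start))
```

Even in the worst case (one sample short of the target), a single extra block still fits in the `BUFFER_SIZE` margin.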
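Real-time transcription
For Whisper to recognize speech, it needs a piece of audio long enough to contain roughly one sentence.
For real-time transcription, the audio is therefore cut every few seconds and each piece is processed in turn.
Cutting the audio in the middle of a word tends to cause misrecognition, so it is better to cut in a silent stretch whenever possible.
The script therefore looks for a silent stretch near the end of the buffered audio and cuts there.
The audio left over after the cut is prepended to the next piece.
The silent stretch is taken as the position where the moving average of the audio power (squared amplitude) is lowest within the last fifth of the buffered audio, i.e. from the 4/5 point to the end.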
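The cut-point search and the carry-over can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the script itself: `moving_average` is a hypothetical helper (a centered moving average with zero padding at the edges) standing in for the script's `np.convolve(..., 'same')`, and the 100-sample window matches the script's kernel `b`.

```python
def moving_average(values, window):
    """Centered moving average with zero padding at the edges, via prefix sums."""
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    size = len(values)
    out = []
    for i in range(size):
        lo = max(0, i - window // 2)               # first sample inside the window
        hi = min(size, i + (window - 1) // 2 + 1)  # one past the last sample
        out.append((prefix[hi] - prefix[lo]) / window)
    return out


def find_cut(audio, n, window=100):
    """Index at which to split audio[:n]: the quietest point in the last fifth.

    Searches from m = n*4//5 to n, smooths the power (squared amplitude)
    and returns the position of the first minimum.
    """
    m = n * 4 // 5
    power = [x * x for x in audio[m:n]]
    smoothed = moving_average(power, window)
    return m + min(range(len(smoothed)), key=smoothed.__getitem__)


def split_at_silence(audio, n, window=100):
    """Return (chunk to transcribe now, leftover to prepend to the next buffer)."""
    cut = find_cut(audio, n, window)
    return audio[:cut], audio[cut:n]


if __name__ == "__main__":
    n = 10000
    audio = [0.5] * n               # synthetic "loud" signal
    audio[9000:9400] = [0.0] * 400  # a silent stretch inside the last fifth
    chunk, rest = split_at_silence(audio, n)
    print(len(chunk), len(rest))
```

With this synthetic signal the cut lands at the first position whose whole window lies inside the silent stretch, and `rest` is what would be copied to the front of the next buffer.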
Whisper processing
Converting audio to text with Whisper can be done almost exactly as in the sample code in the README.
The float32 audio is padded (or trimmed) to 30 seconds with whisper.pad_or_trim, converted to a log-Mel spectrogram with whisper.log_mel_spectrogram, and decoded to text with whisper.decode.
The spoken language can be identified with model.detect_language.
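The pre- and post-processing around the Whisper calls is simple enough to illustrate without the model. The sketch below is a pure-Python illustration, not Whisper's code: the real `whisper.pad_or_trim` operates on NumPy arrays or tensors, `has_speech` mirrors the script's `(audio ** 2).max() > 0.001` check, and the probability dictionary in the usage note is a hypothetical example of what `model.detect_language` returns.

```python
N_SAMPLES = 30 * 16000  # Whisper works on 30-second windows at 16 kHz: 480000 samples


def has_speech(audio, threshold=0.001):
    """Energy gate from recognize(): only transcribe chunks whose peak power exceeds threshold."""
    return max(x * x for x in audio) > threshold


def pad_or_trim(audio, length=N_SAMPLES):
    """Illustration of whisper.pad_or_trim for a 1-D list:
    cut everything past `length`, or zero-pad at the end up to `length`."""
    if len(audio) >= length:
        return audio[:length]
    return audio + [0.0] * (length - len(audio))


def most_likely_language(probs):
    """Same selection as max(probs, key=probs.get) in the script."""
    return max(probs, key=probs.get)
```

For example, a 3-second chunk (48000 samples) becomes 480000 samples whose last 432000 are zeros, and `most_likely_language({"en": 0.9, "ja": 0.05, "de": 0.05})` picks `"en"`.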
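Asynchronous recording and transcription
Because transcription may not keep up with the incoming audio, recording and transcription run asynchronously.
The recording side cuts the audio into processing units and puts them on a queue; the transcription side runs in a separate thread, taking units off the queue and processing them.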
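The queue-based split between recording and transcription is the standard producer/consumer pattern. The sketch below shows it with the standard library only. `transcribe` is a hypothetical stand-in for the Whisper calls, and the `None` sentinel is an addition for the sketch so the worker can be stopped and joined.

```python
import queue
import threading


def run_pipeline(chunks, transcribe):
    """Main thread enqueues chunks (the producer); a worker thread transcribes them (the consumer).

    `transcribe` is a hypothetical stand-in for the Whisper calls.
    A None sentinel tells the worker to stop.
    """
    q = queue.Queue()
    results = []

    def worker():
        while True:
            chunk = q.get()
            if chunk is None:  # sentinel: no more audio
                break
            results.append(transcribe(chunk))

    th = threading.Thread(target=worker, daemon=True)
    th.start()
    for chunk in chunks:  # in the real script, this is the recording loop
        q.put(chunk)
    q.put(None)
    th.join()
    return results
```

In the article's script the recording loop never ends, so there is no sentinel; instead the worker is a daemon thread, which ends together with the process when it is stopped with Ctrl+C.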
The code that implements all of the above is shown below.
LoopbackWhisper.py
import whisper
import soundcard as sc
import threading
import queue
import numpy as np
import argparse

SAMPLE_RATE = 16000  # Whisper expects 16 kHz audio
INTERVAL = 3         # seconds of audio per recognition unit
BUFFER_SIZE = 4096   # frames read per mic.record() call

parser = argparse.ArgumentParser()
parser.add_argument('--model', default='base')
args = parser.parse_args()

print('Loading model...')
model = whisper.load_model(args.model)
print('Done')

q = queue.Queue()
b = np.ones(100) / 100  # 100-sample moving-average kernel

# FP16 only on a CUDA GPU; use FP32 on CPU, where half-precision decoding can fail
options = whisper.DecodingOptions(fp16=(model.device.type == 'cuda'))

def recognize():
    while True:
        audio = q.get()
        # skip chunks whose peak power is too low (treated as silence)
        if (audio ** 2).max() > 0.001:
            audio = whisper.pad_or_trim(audio)

            # make log-Mel spectrogram and move to the same device as the model
            mel = whisper.log_mel_spectrogram(audio).to(model.device)

            # detect the spoken language
            _, probs = model.detect_language(mel)

            # decode the audio
            result = whisper.decode(model, mel, options)

            # print the recognized text
            print(f'{max(probs, key=probs.get)}: {result.text}')

th_recognize = threading.Thread(target=recognize, daemon=True)
th_recognize.start()

# start recording (must run on the main thread)
with sc.get_microphone(id=str(sc.default_speaker().name), include_loopback=True).recorder(samplerate=SAMPLE_RATE, channels=1) as mic:
    audio = np.empty(SAMPLE_RATE * INTERVAL + BUFFER_SIZE, dtype=np.float32)
    n = 0
    while True:
        # fill the buffer until it holds at least INTERVAL seconds
        while n < SAMPLE_RATE * INTERVAL:
            data = mic.record(BUFFER_SIZE)
            audio[n:n+len(data)] = data.reshape(-1)
            n += len(data)

        # find silent periods: quietest point in the last fifth of the buffer
        m = n * 4 // 5
        vol = np.convolve(audio[m:n] ** 2, b, 'same')
        m += vol.argmin()
        q.put(audio[:m])

        # carry the audio after the cut over to the start of a new buffer
        audio_prev = audio
        audio = np.empty(SAMPLE_RATE * INTERVAL + BUFFER_SIZE, dtype=np.float32)
        audio[:n-m] = audio_prev[m:n]
        n = n-m
※ Added 2022/10/16: added daemon=True so that the script can be stopped with Ctrl+C.
GitHub: TadaoYamaoka/LoopbackWhisper
Example run
Run the script from the command line and play audio on the PC; the recognized text is printed.
D:\src\LoopbackWhisper>python LoopbackWhisper.py
Loading model...
Done
en: And so my fellow Americans
en: Ask not.
en: What your country can do for you?
en: Ask what you can do for your country.
To use a different model, specify it with the --model option.
>python LoopbackWhisper.py --model large
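Transcribing on a server
On a PC without a GPU, the processing load becomes heavy, which may get in the way of, for example, an online meeting.
In that case, the PC can do only the recording and send the audio to a server in real time, where it is transcribed.
Since a PC may not have a Python environment, I wrote a tool that records in C# and sends the audio over a socket to a server that does the transcription.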
GitHub: TadaoYamaoka/StreamingWhisper
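The wire format used by the C# client and server is not described in this post, so the sketch below assumes a hypothetical one: each chunk is sent as a 4-byte little-endian byte count followed by little-endian float32 samples. It shows only how the Python side might frame and read one chunk; `send_chunk`, `recv_exact` and `recv_chunk` are illustrative names, and `socket.socketpair` stands in for a real client/server connection.

```python
import socket
import struct


def send_chunk(sock, samples):
    """Send one audio chunk: a 4-byte little-endian length, then float32 samples (hypothetical framing)."""
    payload = struct.pack(f"<{len(samples)}f", *samples)
    sock.sendall(struct.pack("<I", len(payload)) + payload)


def recv_exact(sock, size):
    """Read exactly `size` bytes, since recv may return fewer than requested."""
    buf = bytearray()
    while len(buf) < size:
        part = sock.recv(size - len(buf))
        if not part:
            raise ConnectionError("socket closed mid-message")
        buf.extend(part)
    return bytes(buf)


def recv_chunk(sock):
    """Receive one chunk sent by send_chunk and return its samples as a list of floats."""
    (size,) = struct.unpack("<I", recv_exact(sock, 4))
    payload = recv_exact(sock, size)
    return list(struct.unpack(f"<{size // 4}f", payload))


if __name__ == "__main__":
    client, server = socket.socketpair()
    send_chunk(client, [0.0, 0.5, -0.25])
    print(recv_chunk(server))  # -> [0.0, 0.5, -0.25]
    client.close()
    server.close()
```

The length prefix matters because TCP is a byte stream with no message boundaries; on the server, each received list would then be turned into a float32 array and passed through the same Whisper steps as in the local script.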
Summary
This post described how to transcribe the audio playing on a PC in real time with Whisper.
To make real-time processing work, the implementation avoids cutting the audio in the middle of a word and runs recording and transcription asynchronously.
It also described how to keep the load off the PC by sending the audio to a server over a socket and transcribing it on the server side.