I am trying to recognize speech from Discord channel audio, but Vosk is putting out empty strings #1634

Open
@saipavankumar-muppalaneni

Description

I have tried every combination of model, sample rate, and channel settings I could think of, but I cannot get any recognized speech out of Vosk, only empty strings. I ran the same sample through several free speech recognition websites and they all transcribed it correctly.
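Since the settings in question are sample rate and channels, it may help to confirm what the WAV file actually contains before it reaches the recognizer. The following is a minimal sketch using only the Python standard library; `describe_wav` is a hypothetical helper name, not part of Vosk. It reports the header fields and, for 16-bit audio, the peak sample value, so a silent or near-silent file can be ruled out.

```python
import array
import sys
import wave


def describe_wav(path):
    """Return header fields and the peak sample value of a WAV file.

    Hypothetical diagnostic helper (not part of Vosk). The peak is only
    computed for 16-bit PCM data; otherwise it is reported as None.
    """
    with wave.open(path, "rb") as wf:
        info = {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "frame_rate_hz": wf.getframerate(),
            "frames": wf.getnframes(),
            "compression": wf.getcomptype(),
        }
        raw = wf.readframes(wf.getnframes())

    rate = info["frame_rate_hz"]
    info["duration_s"] = info["frames"] / rate if rate else 0.0

    if info["sample_width_bytes"] == 2:
        # WAV stores 16-bit samples as little-endian signed integers.
        samples = array.array("h")
        samples.frombytes(raw)
        if sys.byteorder == "big":
            samples.byteswap()
        info["peak"] = max((abs(s) for s in samples), default=0)
    else:
        info["peak"] = None
    return info
```

A peak close to 0 would suggest the file is effectively silent, and a channel count other than 1 would explain why the mono check in the code below rejects the file.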

def transcribe_audio(audio_file):
    global model, recognizer
    if not model:
        print("Error: Vosk model not initialized.")
        return

    wf = wave.open(audio_file, "rb")
    if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
        print("Audio file must be WAV format mono PCM.")
        return

    recognizer = KaldiRecognizer(model, wf.getframerate())
    while True:
        data = wf.readframes(4000)
        if len(data) == 0:
            break
        if recognizer.AcceptWaveform(data):
            result = recognizer.Result()
            print(result)
            # transcription = result[14:-3]  # Extract the transcribed text
            # print(transcription)

    if recognizer.FinalResult():
        result = recognizer.FinalResult()
        print(result)
        # transcription = result[14:-3]  # Extract the transcribed text
        # print(transcription)

def init_vosk():
    global model
    if not model:
        try:
            model = Model(model_name="vosk-model-small-en-us-0.15")
            print("Vosk model loaded successfully.")
        except Exception as e:
            print(f"Error loading Vosk model: {e}")
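One detail in `transcribe_audio` above may be worth checking: `FinalResult()` is called twice, once in the `if` condition and once to fetch the text. As far as I understand Vosk's behaviour, the first call finalizes the utterance, so the second call can come back with an empty `"text"` field. The sketch below calls `FinalResult()` exactly once and parses the JSON instead of slicing the string. `transcribe_wav` and `make_recognizer` are hypothetical names introduced for illustration; the recognizer is passed in as a factory so the function can be exercised without a model loaded.

```python
import json
import wave


def transcribe_wav(path, make_recognizer, chunk_frames=4000):
    """Transcribe a mono 16-bit PCM WAV file and return the recognized text.

    make_recognizer is a hypothetical factory: it receives the file's sample
    rate and returns an object exposing AcceptWaveform(), Result() and
    FinalResult(), as vosk.KaldiRecognizer does.
    """
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
            raise ValueError("Audio file must be mono 16-bit PCM WAV.")

        recognizer = make_recognizer(wf.getframerate())
        pieces = []
        while True:
            data = wf.readframes(chunk_frames)
            if not data:
                break
            # AcceptWaveform() returns True when an utterance has ended.
            if recognizer.AcceptWaveform(data):
                pieces.append(json.loads(recognizer.Result()).get("text", ""))

        # Call FinalResult() once only and keep its return value.
        pieces.append(json.loads(recognizer.FinalResult()).get("text", ""))

    return " ".join(p for p in pieces if p)


# Usage with Vosk (assumes the vosk package is installed and the model
# can be loaded by name, as in init_vosk above):
#   from vosk import Model, KaldiRecognizer
#   model = Model(model_name="vosk-model-small-en-us-0.15")
#   print(transcribe_wav("sample.wav", lambda rate: KaldiRecognizer(model, rate)))
```

For short clips, the whole transcription often arrives only through `FinalResult()`, because `AcceptWaveform()` may never report a completed utterance before the file ends.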
