mp4grep is a tool that transcribes and searches audio files, caching the results for fast repeated searches. Out of the box, it only supports single-channel, 16000 Hz wav files. mp4grep ships with mp4grep-convert, which converts mp3, mp4, ogg, webm, mov, wav, and avi to the correct format.
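To see whether a file already meets the input requirement, you can inspect its header before transcribing. The following is a minimal sketch using Python's standard wave module; the function name is hypothetical and is not part of mp4grep.

```python
import wave


def is_mp4grep_ready(path):
    """Return True if the wav file at `path` is mono and sampled at 16000 Hz.

    Hypothetical helper for illustration; mp4grep itself does not provide it.
    """
    with wave.open(path, "rb") as wav:
        return wav.getnchannels() == 1 and wav.getframerate() == 16000
```

Files that fail this check (or that are not wav files at all, in which case wave.open raises wave.Error) would need to be converted first.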
mp4grep depends on Vosk to transcribe audio. You can download models from Vosk's official list.
The latest release provides a pre-built executable for x86 Linux. You can also build from source by following the latest build instructions, which require installing the OCaml compiler.
The mp4grep executable only takes single-channel, 16000 Hz wav files as input. Running make install also provides you with mp4grep-convert, which is a Bash script that will take directories or audio files as its arguments, extract audio files from directories, and convert them to wav files using ffmpeg.
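The conversion step can be pictured roughly as follows. This is a sketch in Python, not the actual mp4grep-convert script (which is written in Bash): the function names are hypothetical, and recursing into subdirectories is an assumption. The ffmpeg options used are its standard -ac (audio channels) and -ar (audio sample rate) flags.

```python
import os

# Extensions this README lists as convertible.
MEDIA_EXTENSIONS = {".mp3", ".mp4", ".ogg", ".webm", ".mov", ".wav", ".avi"}


def collect_media(paths):
    """Expand directories into the media files they contain; keep plain files as given.

    Assumption: directories are searched recursively.
    """
    found = []
    for path in paths:
        if os.path.isdir(path):
            for root, _dirs, files in os.walk(path):
                for name in sorted(files):
                    if os.path.splitext(name)[1].lower() in MEDIA_EXTENSIONS:
                        found.append(os.path.join(root, name))
        else:
            found.append(path)
    return found


def ffmpeg_command(src, dst):
    """Build an ffmpeg argument list that writes mono, 16000 Hz audio to `dst`."""
    # -ac 1 selects one audio channel; -ar 16000 sets the sample rate in Hz.
    return ["ffmpeg", "-i", src, "-ac", "1", "-ar", "16000", dst]
```

Each argument list could then be passed to subprocess.run(cmd, check=True), with dst given a .wav extension so ffmpeg selects the wav container.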
mp4grep was previously written in Java, and later in C++. Although we learned a lot from using those languages, we've moved to OCaml because we think its robustness will help mp4grep to survive and improve with time. The OCaml ecosystem is unfamiliar to most people: if you're building from source, it's important to be aware of which compiler you are using and where dependencies are stored. We recommend following the latest build instructions and using Opam.
Prior versions of mp4grep came bundled with an ffmpeg executable or made calls to ffmpeg in the user's shell. Both approaches caused compatibility issues and hidden transcription errors, and were also insecure and unpredictable. Although it's inconvenient to convert files before transcribing them, we thought that the alternative was worse. New versions will keep conversion and transcription separate until we can find a better option.
Pull requests are welcome. Please open a pull request if you have a bug to fix or a cool idea.
mp4grep currently supports Linux.