A simple Python script for transcribing and translating videos locally, combining Faster-Whisper's transcription with the Argos Translate and Opus-MT translation engines.
No internet required* - everything is processed locally!
*after the dependencies and models are downloaded
- Python 3.10.18 (specifically this version due to dependency constraints)
- FFmpeg: For audio extraction from videos
- NVIDIA GPU: For CUDA acceleration (optional; CPU is supported but slower)
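To confirm FFmpeg is installed and on your PATH before starting, run:
ffmpeg -version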
# 1. Navigate to project
cd goober
# 2. Initiate and activate virtual environment
uv venv --python 3.10.18
source .venv/Scripts/activate
# 3. Sync dependencies
uv sync
# 4. Run interactive mode
uv run python main.py
Note
First run may take longer due to model downloads. All models are cached locally for future use.
This project uses uv for fast Python package management:
# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh # bash
# or
powershell -ExecutionPolicy Bypass -c "irm https://astral.sh/uv/install.ps1 | iex" # PowerShell
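You can verify the installation with:
uv --version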
# Clone this repository
git clone https://github.com/narendnp/goober
# Navigate to project directory
cd goober
# Create a virtual environment
uv venv --python 3.10.18
# Activate the virtual environment
source .venv/Scripts/activate # bash
# or
.venv/Scripts/Activate.ps1 # PowerShell
# Install dependencies
uv sync
Note
Make sure the correct CUDA version for your GPU is defined in the pyproject.toml file before running uv sync. For more information, go here.
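The exact layout of this repo's pyproject.toml may differ, but as an illustrative sketch, uv's documented pattern for pinning torch to a specific CUDA build looks like this (cu124 is a placeholder, not necessarily what this project uses):
[[tool.uv.index]]
name = "pytorch-cu124"  # placeholder; match cu1xx to your installed CUDA version
url = "https://download.pytorch.org/whl/cu124"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-cu124" }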
If you plan to use Opus-MT translation engine, you need to install NLTK data:
# Enter Python environment
uv run python
# In Python interpreter:
>>> import nltk
>>> nltk.download('punkt_tab')
>>> exit()
For more information, go here.
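Alternatively, you can do the same download as a one-liner without opening the interpreter:
uv run python -c "import nltk; nltk.download('punkt_tab')"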
For GPU acceleration, ensure you have:
- NVIDIA GPU with CUDA support
- CUDA toolkit installed
- PyTorch with CUDA support (already included in dependencies)
Note
Make sure the correct CUDA version for your GPU is defined in the pyproject.toml file. For more information, go here.
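To quickly check that the bundled PyTorch build can actually see your GPU (torch.cuda.is_available and torch.version.cuda are standard PyTorch calls), run:
uv run python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"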
Inside the virtual environment, run the main script:
uv run python main.py
The script will prompt you for:
- Video path: Path to your input video file
- Source language: Language code (e.g., en, fr, ja) or auto for detection
- Target language: Desired translation language (e.g., id, en, es)
- Silence duration: Minimum silence duration for VAD (default: 500ms)
- Threshold: VAD sensitivity (0.1-1.0, default: 0.5)
- Translation engine: argos (faster) or opus (more accurate)
You can also run the script directly from the command line:
uv run src/tl_argos.py "path/to/video.mp4" \
--language auto \
--to en \
--vad-ms 500 \
--vad-threshold 0.5

uv run src/tl_opus.py "path/to/video.mp4" \
--language auto \
--to en \
--vad-ms 500 \
--vad-threshold 0.5
Note
First run may take longer due to model downloads. All models are cached locally for future use.
- --model: Whisper model size (large-v3, distil-large-v3, medium, small). Larger models are more accurate but slower. Default: large-v3
- --device: Processing device (cuda for GPU, cpu for CPU)
- --compute-type: Precision level (float16, int8_float16)
- --language: Source language code or auto for detection
- --to: Target language code (required)
- --beam-size: Beam search size for transcription (default: 5)
- --vad-ms: Minimum silence duration in milliseconds (default: 500)
- --vad-threshold: Speech detection sensitivity (0.1-1.0, default: 0.5)
- --no-vad: Disable VAD filtering
- --batch-size: Translation batch size (default: 32)
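For example, combining the options above, a CPU-only run with a smaller model might look like:
uv run src/tl_argos.py "path/to/video.mp4" --language ja --to en --device cpu --model small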
Faster-Whisper, Argos Translate, and Opus-MT generally support a wide array of languages.
These are some of the popular language codes:
- English: en
- Indonesian: id
- Spanish: es
- French: fr
- German: de
- Japanese: ja
- Chinese: zh
- Korean: ko
- Arabic: ar
Please refer to the respective library's documentation for more details.
The tool generates two subtitle files in the directory of the video file:
- Original transcription: {video_name}.orig.srt
- Translated subtitles: {video_name}.{target_lang}.srt
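For example, with a (hypothetical) input file movie.mp4 and --to en, you would get movie.orig.srt and movie.en.srt next to the video.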
1. Failed to build/install fasttext library (Windows)
- Try installing it using the pre-built wheel (credit to FKz11). Inside this repo's directory, run:
uv pip install https://github.com/FKz11/fasttext-0.9.3-windows-wheels/releases/download/0.9.3/fasttext-0.9.3-cp310-cp310-win_amd64.whl
- Then re-run uv sync
Q: Why is it named goober?
A: Because I can see how gooners would use this to generate subtitles to watch JAV. (I know this doesn't explain it but I just think it's funny).
MIT License. See LICENSE file for details.
- Faster-Whisper: Speech-to-text transcription
- Argos Translate: Fast translation library
- Opus-MT: High-quality translation models
- FFmpeg: Audio/video processing
- UV: Ultra-fast Python dependency management
