- Transcribes audio (e.g., WAV) into time‑aligned musical note events.
- Writes two outputs:
.midStandard MIDI for DAWs/notation..txtcompact, LLM‑friendly text listing notes and basic meta.
- Prefers a high‑quality polyphonic model (Basic Pitch) when installed; otherwise falls back to a monophonic f0‑based method.
- Polyphonic path (preferred):
- Uses Spotify’s Basic Pitch (>= 0.4) to infer multiple simultaneous notes from the input audio file.
- Internally runs a Core ML model (mlpackage) via
coremltoolson macOS; no manual model download needed.
- Monophonic fallback:
- Uses
librosa.pyinto track f0, then segments stable pitch regions into discrete notes.
- Uses
- Musical meta:
- Tempo estimated via
librosa.beat.beat_track. - Key estimated via chroma features + Krumhansl‑Schmuckler profiles (simple best‑match label).
- Tempo estimated via
- MIDI writing:
- Uses
mido; converts seconds→ticks using the estimated tempo and emits note on/off with velocities.
- Uses
- Language/runtime: Python 3.9–3.11.
- Core libraries:
numpy,librosa,soundfile,mido. - Optional polyphonic:
basic-pitch(>= 0.4.0). The code calls the newer API by passing the audio file path intobasic_pitch.inference.predict. - Package layout:
audio2llm/transcribe.py— transcription pipeline (polyphonic or mono fallback) + tempo/key estimation.audio2llm/midi.py— write events to MIDI.audio2llm/textfmt.py— produce LLM‑friendly text lines.audio2llm/types.py— dataclasses for notes and meta.audio2llm/cli.py— command‑line interface.
Option A — Conda (recommended)
- Create and activate an env named
audio2llmfrom the provided spec:conda env create -f environment.yml -n audio2llmconda activate audio2llm
- Install polyphonic support (recommended):
pip install basic-pitch
- macOS Apple Silicon note: Basic Pitch 0.4 runs via Core ML and installs
coremltoolsautomatically. If polyphonic still appears to fall back to mono, ensurebasic-pitchis installed and re‑run. In some setups, installingonnxruntime==1.18.1can help, but it is not strictly required for Core ML.
Option B — Minimal pip (mono only)
pip install numpy librosa soundfile mido
CLI
- Basic run (auto polyphonic if available):
python -m audio2llm.cli input.wav --out-midi output.mid --out-text output.txt --prefer-polyphonic
- Force monophonic fallback:
python -m audio2llm.cli input.wav --out-midi output.mid --out-text output.txt --no-polyphonic
API
- Example:
from audio2llm import transcribe_audio, save_midi, events_to_linesresult = transcribe_audio("input.wav", prefer_polyphonic=True)save_midi(result.events, "out.mid", tempo_bpm=result.meta.tempo_bpm)lines = events_to_lines(result.events, tempo_bpm=result.meta.tempo_bpm, key=result.meta.key)
-
Text format: header lines then one NOTE per line, e.g.
META tempo=118.123 key=G majorNOTE onset=0.512000 dur=0.240000 pitch=67 vel=92 track=0 -
Fields:
onset: seconds from startdur: duration in secondspitch: MIDI note number (0–127)vel: MIDI velocity (1–127)track: instrument/part index (0 unless a polyphonic model supplies parts)
- Polyphonic didn’t kick in (text looks monophonic, few notes):
- Verify
pip show basic-pitchin your env. - Reinstall:
pip install --upgrade basic-pitch - On macOS, ensure your Python can import
coremltools(installed with Basic Pitch). If issues persist, trypip install onnxruntime==1.18.1and rerun.
- Verify
- MIDI file loads with unexpected tempo:
- Tempo is estimated; you can override when writing MIDI via
save_midi(..., tempo_bpm=...).
- Tempo is estimated; you can override when writing MIDI via