Chatterbox Streaming API Docker

A high-performance text-to-speech API with streaming capabilities, featuring OpenAI-compatible endpoints and multiple voice support. Built with FastAPI and optimised for Docker deployment with GPU acceleration.

🚀 Features

OpenAI-Compatible API: Drop-in replacement for OpenAI's TTS API endpoints
Real-time Streaming: Low-latency audio generation with chunked streaming
Multiple Voices: Six distinct voices (alloy, echo, fable, onyx, nova, shimmer)
GPU Acceleration: CUDA support for fast inference
Live Chat Integration: Real-time LLM + TTS chat with Ollama
Cross-Platform: Windows, macOS, and Linux support
Docker Ready: Easy deployment with Docker Compose
Voice Upload: Custom voice upload functionality

📋 Prerequisites

Docker and Docker Compose
NVIDIA GPU with CUDA support (recommended)
Python 3.10+ (for local development)
Ollama (for chat functionality)

🛠️ Quick Start

1. Clone the Repository

git clone https://github.com/dwain-barnes/chatterbox-streaming-api-docker.git
cd chatterbox-streaming-api-docker

2. Build and Run with Docker

# Build the image with voice support
docker build -f Dockerfile.voices -t chatterbox-tts-voices .

# Run the container
docker run -d --gpus all -p 5001:5001 --name chatterbox-tts-voices chatterbox-tts-voices

Or use Docker Compose:

docker-compose up -d

3. Verify Installation

# Check server health
curl http://localhost:5001/health

# List available voices
curl http://localhost:5001/v1/voices
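
The server may need a moment after the container starts before it answers requests, so a single curl call can fail even when the setup is fine. The following is a minimal sketch that polls the health endpoint using only the standard library. The helper names `http_status` and `wait_for_health` are hypothetical, and the sketch assumes that an HTTP 200 from `/health` means the server is ready.

```python
import time
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except OSError:
        return 0  # connection refused, timeout, DNS failure, ...


def wait_for_health(url, attempts=30, delay=2.0, fetch=http_status, sleep=time.sleep):
    """Poll `url` until it returns 200 or the attempts run out. Returns True when healthy."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next try, but not after the last one
    return False


if __name__ == "__main__":
    # Assumption: HTTP 200 on /health means the server is ready.
    ok = wait_for_health("http://localhost:5001/health")
    print("server ready" if ok else "server did not become healthy")
```

The `fetch` and `sleep` parameters exist only so the polling loop can be exercised without a running server.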

4. Test Voice Generation

# Test with curl
curl -X POST http://localhost:5001/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input":"Hello, this is a test of the text-to-speech system.","voice":"alloy","model":"tts-1"}' \
  --output test.wav

# Or use the provided test scripts
./voice_test_script.sh        # Linux/macOS
voice_test.bat               # Windows

🎯 Usage Examples

Basic TTS API Call

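import requests

# Request a complete audio file from the OpenAI-compatible endpoint
response = requests.post(
    "http://localhost:5001/v1/audio/speech",
    json={
        "input": "Hello, world! This is Chatterbox speaking.",
        "voice": "nova",
        "model": "tts-1"
    },
    timeout=120,
)
response.raise_for_status()  # fail early instead of saving an error body as audio

with open("output.wav", "wb") as f:
    f.write(response.content)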

Streaming TTS

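import requests

# stream=True lets requests hand over audio chunks as they arrive
response = requests.post(
    "http://localhost:5001/v1/audio/speech/stream",
    json={
        "input": "This is a streaming example with real-time audio generation.",
        "voice": "echo",
        "stream": True
    },
    stream=True,
    timeout=120,
)
response.raise_for_status()  # stop here if the server returned an error

with open("streaming_output.wav", "wb") as f:
    for chunk in response.iter_content(chunk_size=1024):
        if chunk:  # skip empty keep-alive chunks
            f.write(chunk)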
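
To see whether streaming actually lowers latency on your machine, it helps to measure how long the first audio chunk takes to arrive. The sketch below wraps the chunk loop in a small helper that reports the total bytes written and the time to the first non-empty chunk. `save_stream` and its parameters are hypothetical names introduced for illustration and are not part of this repository.

```python
import time
from typing import BinaryIO, Iterable, Optional, Tuple


def save_stream(chunks: Iterable[bytes], out: BinaryIO,
                started_at: Optional[float] = None,
                clock=time.perf_counter) -> Tuple[int, Optional[float]]:
    """Write chunks to `out`; return (total_bytes, seconds_until_first_chunk)."""
    start = clock() if started_at is None else started_at
    first_chunk_latency = None
    total = 0
    for chunk in chunks:
        if not chunk:
            continue  # ignore empty chunks
        if first_chunk_latency is None:
            first_chunk_latency = clock() - start
        out.write(chunk)
        total += len(chunk)
    return total, first_chunk_latency


if __name__ == "__main__":
    import requests  # third-party, same dependency as the example above

    started = time.perf_counter()  # start timing before the request is sent
    response = requests.post(
        "http://localhost:5001/v1/audio/speech/stream",
        json={"input": "Measuring streaming latency.", "voice": "echo", "stream": True},
        stream=True,
        timeout=120,
    )
    response.raise_for_status()
    with open("streaming_output.wav", "wb") as f:
        total, latency = save_stream(response.iter_content(chunk_size=1024), f, started_at=started)
    if latency is None:
        print("no audio received")
    else:
        print(f"received {total} bytes, first chunk after {latency:.2f}s")
```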

Real-time Chat with LLM + TTS

First, ensure Ollama is running:

# Pull the model and start the Ollama server (Ollama must already be installed)
ollama pull gemma3:latest
ollama serve

Then start the chat:

# Basic chat
python ultra_simple_chat.py

# With custom voice and model
python ultra_simple_chat.py --voice fable --llm-model gemma3:latest
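
A common way to keep an LLM + TTS chat responsive is to send text to the TTS endpoint sentence by sentence while the LLM is still generating. The sketch below shows that idea in isolation. It is not the code from `ultra_simple_chat.py`; the function name `sentences_from_tokens` and the simple punctuation-based splitting rule are assumptions made for illustration.

```python
import re
from typing import Iterable, Iterator

# Split after '.', '!' or '?' when followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text fragments."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left when the stream ends


if __name__ == "__main__":
    fragments = ["Hello", " there.", " How are", " you? I", " am fine"]
    for sentence in sentences_from_tokens(fragments):
        print(sentence)
```

Each yielded sentence could then be posted to `/v1/audio/speech` or `/v1/audio/speech/stream` while the next one is still being generated. The splitting rule is deliberately naive and will also break on abbreviations such as "e.g. this".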

🎙️ Available Voices

Voice	Characteristics
alloy	Professional, neutral, balanced
echo	Clear, articulate, precise
fable	Warm, friendly, storytelling
onyx	Deep, authoritative, commanding
nova	Energetic, enthusiastic, bright
shimmer	Soft, gentle, soothing
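
How the server reacts to an unknown voice name is not documented here, so a client may want to check the name before sending a request. This is a minimal client-side sketch; `SUPPORTED_VOICES` and `normalize_voice` are hypothetical names, and the tuple only covers the six built-in voices listed above, so custom uploaded voices would need to be added to it.

```python
SUPPORTED_VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")


def normalize_voice(name: str, default: str = "alloy") -> str:
    """Return a supported voice name, falling back to `default` for unknown input."""
    candidate = name.strip().lower()
    return candidate if candidate in SUPPORTED_VOICES else default


if __name__ == "__main__":
    print(normalize_voice(" Nova "))  # known voice, cleaned up
    print(normalize_voice("robot"))   # unknown voice, falls back to the default
```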

🔧 Configuration

Environment Variables

# Docker environment
NVIDIA_VISIBLE_DEVICES=all
PYTHONUNBUFFERED=1

# Model configuration
TORCH_HOME=/app/.cache
HF_HOME=/app/.cache
TRANSFORMERS_CACHE=/app/.cache

Server Parameters

python server.py [voices_dir] [supported_voices] [options]

Options:
  --host          Host to bind to (default: 0.0.0.0)
  --port          Port to bind to (default: 5001)
  --device        Device to use: cuda/mps/cpu (default: cuda)
  --exaggeration  Exaggeration factor 0-1 (default: 0.5)
  --cfg-weight    CFG weight 0-1 (default: 0.5)
  --temperature   Temperature for sampling (default: 0.8)
  --chunk-size    Chunk size for streaming (default: 50)
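
When the server is launched from another script, it can be convenient to build the command line programmatically and check the documented value ranges first. The helper below is a sketch that uses only the flags listed above; `server_command` is a hypothetical name, and the range checks are a client-side assumption based on the "0-1" ranges in the option descriptions, not behaviour confirmed for `server.py` itself.

```python
from typing import List, Sequence

VALID_DEVICES = ("cuda", "mps", "cpu")


def server_command(voices_dir: str, voices: Sequence[str], device: str = "cuda",
                   port: int = 5001, exaggeration: float = 0.5,
                   cfg_weight: float = 0.5) -> List[str]:
    """Build the argument list for `python server.py ...` from validated settings."""
    if device not in VALID_DEVICES:
        raise ValueError(f"device must be one of {VALID_DEVICES}, got {device!r}")
    for flag, value in (("--exaggeration", exaggeration), ("--cfg-weight", cfg_weight)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{flag} must be between 0 and 1, got {value}")
    return [
        "python", "server.py", voices_dir, ",".join(voices),
        "--device", device,
        "--port", str(port),
        "--exaggeration", str(exaggeration),
        "--cfg-weight", str(cfg_weight),
    ]


if __name__ == "__main__":
    import shlex

    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(server_command("./voices", ["alloy", "echo"], device="cpu")))
```

The returned list can be passed directly to `subprocess.run` without going through a shell.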

Chat Configuration

python ultra_simple_chat.py [options]

Options:
  --llm-url       Ollama URL (default: http://localhost:11434)
  --llm-model     LLM model (default: gemma2:latest)
  --tts-url       TTS URL (default: http://localhost:5001)
  --voice         TTS voice (default: alloy)
  --system-prompt Custom system prompt
  --debug         Enable debug mode

📚 API Reference

OpenAI-Compatible Endpoints

POST `/v1/audio/speech`

Generate complete audio from text.

Request:

{
  "model": "tts-1",
  "input": "Text to speak",
  "voice": "alloy",
  "response_format": "wav",
  "speed": 1.0
}

Response: Audio file (WAV/MP3)
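
For environments where installing `requests` is not an option, the same request can be built with the Python standard library. The sketch below assembles the JSON body from the fields shown above. `build_speech_request` is a hypothetical helper name, and the empty-input check is a client-side assumption rather than documented server behaviour.

```python
import json
import urllib.request


def build_speech_request(text: str, voice: str = "alloy", model: str = "tts-1",
                         response_format: str = "wav", speed: float = 1.0,
                         base_url: str = "http://localhost:5001") -> urllib.request.Request:
    """Build a POST request for /v1/audio/speech without sending it."""
    if not text.strip():
        raise ValueError("input text must not be empty")
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_speech_request("Text to speak")
    # Send the request and save the returned audio bytes.
    with urllib.request.urlopen(req, timeout=120) as resp, open("speech.wav", "wb") as f:
        f.write(resp.read())
```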

POST `/v1/audio/speech/stream`

Generate streaming audio from text.

Request:

{
  "model": "tts-1", 
  "input": "Text to speak",
  "voice": "nova",
  "stream": true
}

Response: Streaming audio chunks

GET `/v1/models`

List available models.

GET `/v1/voices`

List available voices.

POST `/v1/voices/upload`

Upload custom voice file.

🧪 Development

Local Development Setup

# Install dependencies
pip install -r requirements.txt

# Install Chatterbox TTS
pip install git+https://github.com/davidbrowne17/chatterbox-streaming.git

# Run server locally
python server.py ./voices alloy,echo,fable,onyx,nova,shimmer --device cpu

Testing

# Test all voices
./voice_test_script.sh

# Debug mode (no chunking)
python debug_chat.py --debug

# Simple chat test
python ultra_simple_chat.py --debug

🐳 Docker Configuration

Build Options

# Standard build
docker build -t chatterbox-tts .

# With voice support
docker build -f Dockerfile.voices -t chatterbox-tts-voices .

# Development build
docker build --target development -t chatterbox-tts-dev .

Docker Compose

The docker-compose.yml includes:

GPU support
Volume mounting for voices
Health checks
Restart policies
Environment configuration

🔍 Troubleshooting

Common Issues

GPU not detected:

# Check NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi

# Fallback to CPU
docker run -p 5001:5001 chatterbox-tts-voices python server.py /app/voices alloy,echo --device cpu

Audio playback issues on Windows:

# Use debug version
python debug_chat.py

# Check audio format (requires the `file` utility, e.g. from WSL)
file test.wav

Ollama connection issues:

# Check Ollama status
curl http://localhost:11434/api/tags

# Restart Ollama
ollama serve
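
When it is unclear which of the two services is down, a quick TCP check of both ports narrows the problem before digging into logs. This is a minimal sketch using the default ports mentioned in this README (5001 for the TTS server, 11434 for Ollama); `port_open` is a hypothetical helper, and a successful connection only shows that something is listening, not that the service is healthy.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, timed out, or host not resolvable


if __name__ == "__main__":
    for name, port in (("TTS server", 5001), ("Ollama", 11434)):
        state = "reachable" if port_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```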

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Chatterbox TTS - Core TTS engine
FastAPI - Web framework
Ollama - Local LLM inference
OpenAI - API compatibility standards
davidbrowne17 - Streaming version

Disclaimer

Don't use this model to do bad things.

⭐ Star this repository if you find it useful!
