Stimm Voice Agent Platform

The Open Source Voice Agent Platform
Orchestrate ultra-low latency AI pipelines for real-time conversations over WebRTC.

demo_stimm.mp4

🔊 Sound on! Watch the ultra-low latency (< 1s) in action.


A modular, real-time AI voice assistant platform built with Python (FastAPI) and Next.js. This project provides a flexible infrastructure for creating, managing, and interacting with voice agents using various LLM, TTS, and STT providers.

📚 Read the Full Documentation

🚀 Features

  • Real-time Voice Interaction: Low-latency voice conversations using WebRTC and WebSocket transports.
  • SIP Telephony Integration: Connect incoming phone calls to AI agents via the SIP protocol.
  • Modular AI Providers:
    • LLM: Support for Groq, Mistral, OpenRouter, and local Llama.cpp.
    • TTS: Deepgram, ElevenLabs, Async.ai, and local Kokoro.
    • STT: Deepgram and local Whisper.
  • Administrable RAG Configurations: Create and manage multiple RAG configurations with Qdrant and per‑agent knowledge bases.
  • Agent Management: Admin interface to configure and manage multiple agents with different personalities and provider settings.
  • Modern Frontend: Responsive web interface built with Next.js 16 and Tailwind CSS.
  • Robust Infrastructure: Dockerized deployment with Traefik reverse proxy, PostgreSQL for data persistence, and Alembic for migrations.
  • Voice Activity Detection: Integrated Silero VAD for accurate speech detection.
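The modular provider idea above can be illustrated with a small registry keyed by provider name. This is only a sketch under stated assumptions: `LLMProvider`, `EchoLLM`, `UpperLLM`, `LLM_REGISTRY`, `AgentConfig`, and `build_llm` are hypothetical names invented for this example and are not taken from the Stimm codebase, whose actual provider interfaces may look quite different.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


class LLMProvider(Protocol):
    """Hypothetical interface; Stimm's real provider classes may differ."""

    def complete(self, prompt: str) -> str: ...


class EchoLLM:
    """Stand-in provider that echoes the prompt (illustration only)."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UpperLLM:
    """Second stand-in provider, to show that providers are swappable."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


# Hypothetical registry: provider name -> zero-argument factory.
LLM_REGISTRY: Dict[str, Callable[[], LLMProvider]] = {
    "echo": EchoLLM,
    "upper": UpperLLM,
}


@dataclass
class AgentConfig:
    """Minimal per-agent settings; only the LLM choice is modeled here."""

    name: str
    llm: str


def build_llm(config: AgentConfig) -> LLMProvider:
    """Look up the configured provider and instantiate it."""
    try:
        factory = LLM_REGISTRY[config.llm]
    except KeyError:
        raise ValueError(f"unknown LLM provider: {config.llm!r}") from None
    return factory()
```

With this shape, switching an agent from one provider to another is a configuration change (`AgentConfig(name="support", llm="upper")`) rather than a code change, which is the property the feature list describes.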

💡 Why Stimm?

  • Ultra-low latency thanks to an optimized Silero VAD and LiveKit's real-time media pipeline.
  • Provider-agnostic (LLM, TTS, STT): choose any AI stack.
  • Trimmable dependencies: Heavy build-time dependencies (like pyaudio) are optional, keeping the core image lean and secure.
  • Lightweight core: Pure ONNX Runtime inference for fast installation.
  • Scalable architecture: Docker, Traefik, Postgres, and a technical foundation designed for production deployment.

🧩 Use Cases

  • Customer support voicebots: Handle common queries automatically.
  • Interactive phone-based assistants (SIP): Connect traditional telephony to AI.
  • Real-time agent demos: Perfect for AI research and prototyping.
  • On-premise conversational agents: Deploy securely with AGPL-friendly terms.

🔄 How it Works

graph LR
    %% High Contrast Theme
    classDef person fill:#FF007F,stroke:#fff,stroke-width:4px,color:white,font-weight:bold;
    classDef transport fill:#00B0FF,stroke:#fff,stroke-width:2px,color:white,font-weight:bold;
    classDef input fill:#7F00FF,stroke:#fff,stroke-width:2px,color:white,font-weight:bold;
    classDef brain fill:#FFD700,stroke:#FF8C00,stroke-width:3px,color:black,font-weight:bold,stroke-dasharray: 5 5;
    classDef output fill:#00E676,stroke:#fff,stroke-width:2px,color:black,font-weight:bold;

    %% Multi-platform User
    User([👤 📱 💻 📞 User]):::person

    %% LiveKit Layer
    subgraph "📡 LiveKit Infrastructure"
        direction TB
        Room[🔄 Real-time Room <br/> WebRTC / SIP]:::transport
    end

    %% Stimm Core Layer
    subgraph "⚡ Stimm Core"
        direction TB
        Hear[👂 Hear & Transcribe]:::input
        Think(🧠 Think & Retrieve):::brain
        Speak[🗣️ Speak & Respond]:::output
    end

    %% Connections
    User <==>|Audio Stream| Room
    Room ==>|Raw Audio| Hear
    Hear ==>|Text| Think
    Think ==>|Text| Speak
    Speak ==>|Synthesized Audio| Room

    %% Link Styles
    linkStyle default stroke-width:3px,fill:none,stroke:#666
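The Hear, Think, Speak flow in the diagram can be read as three chained asynchronous stages. The sketch below shows that shape for a single conversational turn. It assumes nothing about Stimm's internals: `hear`, `think`, `speak`, `handle_turn`, and `run_turn` are hypothetical stand-ins, and the "audio" is plain UTF-8 bytes so the example runs without any STT, LLM, or TTS provider.

```python
import asyncio


# Each stage is a hypothetical stand-in for a real provider call.
async def hear(audio: bytes) -> str:
    """Stand-in for speech-to-text: decode the bytes as text."""
    return audio.decode("utf-8")


async def think(text: str) -> str:
    """Stand-in for the LLM (and any retrieval): build a reply."""
    return f"You said: {text}"


async def speak(text: str) -> bytes:
    """Stand-in for text-to-speech: encode the reply as bytes."""
    return text.encode("utf-8")


async def handle_turn(audio: bytes) -> bytes:
    """Run one turn through the three stages in order."""
    transcript = await hear(audio)
    reply = await think(transcript)
    return await speak(reply)


def run_turn(audio: bytes) -> bytes:
    """Synchronous convenience wrapper around handle_turn."""
    return asyncio.run(handle_turn(audio))
```

Calling `run_turn(b"hello")` returns `b"You said: hello"`. A real-time system would more likely stream partial results between stages instead of awaiting each one to completion; this sketch only shows the order of the stages.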

🏁 Quick Start

Get Stimm up and running in minutes:

# Clone the repository
git clone https://github.com/stimm-ai/stimm.git
cd stimm

# Set up environment (copies .env.example files)
chmod +x scripts/setup_env.sh
./scripts/setup_env.sh

# Start all services with Docker Compose
docker compose up --build

Once the services are running, open your browser to:

For detailed instructions, refer to the Full Documentation or check the guides below:

💻 Development

For local development, see the Development Guide in the documentation.

Quick Development Setup

# Start supporting services (PostgreSQL, Qdrant, Traefik, LiveKit, Redis, SIP)
docker compose up -d postgres qdrant traefik livekit redis sip

# Install Python dependencies (--all-extras includes the audio extra used for local CLI audio/testing)
uv sync --group dev --group docs --all-extras

# Set up environment files and Python path (optional)
./scripts/setup_env.sh

# Run backend locally
uv run python -m src.main

# In another terminal, run frontend
cd src/front
npm install
npm run dev

🛡️ Security & Quality

Stimm integrates open-source SAST (Static Application Security Testing) and SCA (Software Composition Analysis) tools into its development workflow.

Code Security (SAST)

  • Semgrep: Multi-language security scanner for finding complex vulnerabilities.
  • Bandit: Python-specific security scanner for common vulnerabilities.
  • ESLint SonarJS: Advanced code quality and security rules for the React frontend, providing SonarQube-like analysis.

Dependency Security (SCA)

  • pip-audit: Checks Python dependencies for known vulnerabilities.
  • npm audit: Checks JavaScript dependencies for known vulnerabilities.

Running All Checks

Developers can run the entire suite of security, quality, and formatting checks with a single command:

# Run all checks on all files in the repository
pre-commit run --all-files

This command runs Ruff (lint/format), Prettier, Bandit, Semgrep, pip-audit, npm audit, and frontend-security-lint in sequence. Run it before submitting a pull request.

Quality & Standards

  • Ruff: Extremely fast Python linter and formatter.
  • ESLint & Prettier: Frontend linting and formatting.

🤝 Contributing

We welcome contributions! Please read our Contributing Guide for details on how to submit pull requests, report issues, and our code of conduct.

By contributing, you agree to the Contributor License Agreement (CLA).

⚖️ License

Stimm is open-source software licensed under the GNU Affero General Public License v3.0 (AGPL v3). See the LICENSE file for details.

Trademark Notice: The name "Stimm" and the Stimm logo are exclusive trademarks of the project maintainers and are not covered by the open-source license. Derivative works must remove the logo and change the name to avoid suggesting official affiliation.

⚡ Acknowledgments

Built with LiveKit

Stimm relies on LiveKit for high-performance real-time media transport (WebRTC).

Disclaimer: Stimm is an independent project and is not affiliated with, endorsed by, or sponsored by LiveKit.