Distill


A reliability layer for LLM context. Deterministic deduplication that removes redundancy before it reaches your model.


Data Sources → Distill → LLM
(docs, code, memory, tools)    (reliable outputs)

The Problem

LLM outputs are unreliable because context is polluted.

30-40% of context assembled from multiple sources is semantically redundant: the same information from docs, code, memory, and tools competes for attention. This leads to:

  • Non-deterministic outputs - Same workflow, different results
  • Confused reasoning - Signal diluted by repetition
  • Production failures - Works in demos, breaks at scale

How It Works

Query → Over-fetch (50) → Cluster → Select → MMR Re-rank (8) → LLM
  1. Over-fetch - Retrieve 3-5x more chunks than needed
  2. Cluster - Group semantically similar chunks (agglomerative clustering)
  3. Select - Pick best representative from each cluster
  4. MMR Re-rank - Balance relevance and diversity

Result: Deterministic, diverse context in ~12ms. No LLM calls. Fully auditable.
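
The final re-rank step is standard maximal marginal relevance (MMR): greedily pick the chunk that best balances relevance to the query against similarity to chunks already selected, weighted by --lambda. Below is a minimal Go sketch of that step, assuming unit-normalized embeddings so the dot product equals cosine similarity; the function and variable names are illustrative, not Distill's internal API.

// mmr.go: a minimal sketch of the MMR re-rank step.
package main

import "fmt"

// cosine returns the dot product; for unit-length vectors this equals cosine similarity.
func cosine(a, b []float64) float64 {
	var s float64
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// mmrRerank greedily selects up to k chunks, trading off relevance to the query
// (weight lambda) against similarity to already-selected chunks (weight 1-lambda).
func mmrRerank(query []float64, chunks [][]float64, lambda float64, k int) []int {
	selected := []int{}
	used := make([]bool, len(chunks))
	for len(selected) < k && len(selected) < len(chunks) {
		bestIdx, bestScore := -1, -1e18
		for i := range chunks {
			if used[i] {
				continue
			}
			relevance := cosine(query, chunks[i])
			redundancy := 0.0 // max similarity to anything already selected
			for _, j := range selected {
				if s := cosine(chunks[i], chunks[j]); s > redundancy {
					redundancy = s
				}
			}
			score := lambda*relevance - (1-lambda)*redundancy
			if score > bestScore {
				bestIdx, bestScore = i, score
			}
		}
		selected = append(selected, bestIdx)
		used[bestIdx] = true
	}
	return selected
}

func main() {
	// Toy unit vectors: chunk 1 duplicates chunk 0, chunk 2 is diverse.
	query := []float64{1, 0, 0}
	chunks := [][]float64{{0.96, 0.28, 0}, {0.96, 0.28, 0}, {0.6, 0, 0.8}}
	fmt.Println(mmrRerank(query, chunks, 0.5, 2)) // prints [0 2]: the duplicate is dropped
}

Because the selection is a greedy argmax over fixed scores with a fixed tie-break order, the same input always yields the same output, which is what makes the step deterministic and auditable.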

Installation

Binary (Recommended)

Download from GitHub Releases:

# macOS (Apple Silicon)
curl -sL $(curl -s https://api.github.com/repos/Siddhant-K-code/distill/releases/latest | grep "browser_download_url.*darwin_arm64.tar.gz" | cut -d '"' -f 4) | tar xz

# macOS (Intel)
curl -sL $(curl -s https://api.github.com/repos/Siddhant-K-code/distill/releases/latest | grep "browser_download_url.*darwin_amd64.tar.gz" | cut -d '"' -f 4) | tar xz

# Linux (amd64)
curl -sL $(curl -s https://api.github.com/repos/Siddhant-K-code/distill/releases/latest | grep "browser_download_url.*linux_amd64.tar.gz" | cut -d '"' -f 4) | tar xz

# Linux (arm64)
curl -sL $(curl -s https://api.github.com/repos/Siddhant-K-code/distill/releases/latest | grep "browser_download_url.*linux_arm64.tar.gz" | cut -d '"' -f 4) | tar xz

# Move to PATH
sudo mv distill /usr/local/bin/

Or download directly from the releases page.

Go Install

go install github.com/Siddhant-K-code/distill@latest

Docker

docker pull ghcr.io/siddhant-k-code/distill:latest
docker run -p 8080:8080 -e OPENAI_API_KEY=your-key ghcr.io/siddhant-k-code/distill

Build from Source

git clone https://github.com/Siddhant-K-code/distill.git
cd distill
go build -o distill .

Quick Start

1. Standalone API (No Vector DB Required)

Start the API server and send chunks directly:

export OPENAI_API_KEY="your-key"  # For embeddings
distill api --port 8080

Deduplicate chunks:

curl -X POST http://localhost:8080/v1/dedupe \
  -H "Content-Type: application/json" \
  -d '{
    "chunks": [
      {"id": "1", "text": "React is a JavaScript library for building UIs."},
      {"id": "2", "text": "React.js is a JS library for building user interfaces."},
      {"id": "3", "text": "Vue is a progressive framework for building UIs."}
    ]
  }'

Response:

{
  "chunks": [
    {"id": "1", "text": "React is a JavaScript library for building UIs.", "cluster_id": 0},
    {"id": "3", "text": "Vue is a progressive framework for building UIs.", "cluster_id": 1}
  ],
  "stats": {
    "input_count": 3,
    "output_count": 2,
    "reduction_pct": 33,
    "latency_ms": 12
  }
}

With pre-computed embeddings (no OpenAI key needed):

curl -X POST http://localhost:8080/v1/dedupe \
  -H "Content-Type: application/json" \
  -d '{
    "chunks": [
      {"id": "1", "text": "React is...", "embedding": [0.1, 0.2, ...]},
      {"id": "2", "text": "React.js is...", "embedding": [0.11, 0.21, ...]},
      {"id": "3", "text": "Vue is...", "embedding": [0.9, 0.8, ...]}
    ]
  }'
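
The same endpoint can be called from any HTTP client. Below is a minimal Go sketch based on the request and response shapes shown above; the Go struct names and the toy 2-dimensional embeddings are illustrative only (real embeddings have hundreds of dimensions).

// dedupe_client.go: a sketch of calling /v1/dedupe with pre-computed embeddings.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type Chunk struct {
	ID        string    `json:"id"`
	Text      string    `json:"text"`
	Embedding []float64 `json:"embedding,omitempty"`
}

type DedupeRequest struct {
	Chunks []Chunk `json:"chunks"`
}

type DedupeResponse struct {
	Chunks []struct {
		ID        string `json:"id"`
		Text      string `json:"text"`
		ClusterID int    `json:"cluster_id"`
	} `json:"chunks"`
	Stats struct {
		InputCount   int `json:"input_count"`
		OutputCount  int `json:"output_count"`
		ReductionPct int `json:"reduction_pct"`
		LatencyMs    int `json:"latency_ms"`
	} `json:"stats"`
}

func main() {
	req := DedupeRequest{Chunks: []Chunk{
		{ID: "1", Text: "React is...", Embedding: []float64{0.1, 0.2}},
		{ID: "2", Text: "React.js is...", Embedding: []float64{0.11, 0.21}},
		{ID: "3", Text: "Vue is...", Embedding: []float64{0.9, 0.8}},
	}}
	body, _ := json.Marshal(req)

	resp, err := http.Post("http://localhost:8080/v1/dedupe", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out DedupeResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Printf("kept %d of %d chunks\n", out.Stats.OutputCount, out.Stats.InputCount)
	for _, c := range out.Chunks {
		fmt.Println(c.ID, "-", c.Text)
	}
}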

2. With Vector Database

Connect to Pinecone or Qdrant for retrieval + deduplication:

export PINECONE_API_KEY="your-key"
export OPENAI_API_KEY="your-key"

distill serve --index my-index --port 8080

Query with automatic deduplication:

curl -X POST http://localhost:8080/v1/retrieve \
  -H "Content-Type: application/json" \
  -d '{"query": "how do I reset my password?"}'

3. MCP Integration (AI Assistants)

Works with Claude, Cursor, Amp, and other MCP-compatible assistants:

distill mcp

Add to Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "distill": {
      "command": "/path/to/distill",
      "args": ["mcp"]
    }
  }
}

See mcp/README.md for more configuration options.

CLI Commands

distill api       # Start standalone API server
distill serve     # Start server with vector DB connection
distill mcp       # Start MCP server for AI assistants
distill analyze   # Analyze a file for duplicates
distill sync      # Upload vectors to Pinecone with dedup
distill query     # Test a query from command line

Configuration

Environment Variables

OPENAI_API_KEY      # For text → embedding conversion (see note below)
PINECONE_API_KEY    # For Pinecone backend
QDRANT_URL          # For Qdrant backend (default: localhost:6334)
DISTILL_API_KEYS    # Optional: protect your self-hosted instance (see below)

Protecting Your Self-Hosted Instance

If you're exposing Distill publicly, set DISTILL_API_KEYS to require authentication:

# Generate a random API key
export DISTILL_API_KEYS="sk-$(openssl rand -hex 32)"

# Or multiple keys (comma-separated)
export DISTILL_API_KEYS="sk-key1,sk-key2,sk-key3"

Then include the key in requests:

curl -X POST http://your-server:8080/v1/dedupe \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{"chunks": [...]}'

If DISTILL_API_KEYS is not set, the API is open (suitable for local/internal use).
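
From application code, the key goes in the same Authorization header. A small Go sketch, where the server URL, key, and payload are placeholders taken from the examples above:

// auth_request.go: a sketch of an authenticated dedupe call.
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	payload := []byte(`{"chunks": [{"id": "1", "text": "React is a JavaScript library for building UIs."}]}`)

	req, err := http.NewRequest("POST", "http://your-server:8080/v1/dedupe", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer sk-your-key")
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(body))
}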

About OpenAI API Key

When you need it:

  • Sending text chunks without pre-computed embeddings
  • Using text queries with vector database retrieval
  • Using the MCP server with text-based tools

When you DON'T need it:

  • Sending chunks with pre-computed embeddings (include "embedding": [...] in your request)
  • Using Distill purely for clustering/deduplication on existing vectors

What it's used for:

  • Converts text to embeddings using text-embedding-3-small model
  • ~$0.00002 per 1K tokens (very cheap)
  • Embeddings are used only for similarity comparison, never stored

Alternatives:

  • Bring your own embeddings - include "embedding" field in chunks
  • Self-host an embedding model - set EMBEDDING_API_URL to your endpoint

Parameters

Parameter        Description                                      Default
--threshold      Clustering distance (lower = stricter)           0.15
--lambda         MMR balance: 1.0 = relevance, 0.0 = diversity    0.5
--over-fetch-k   Chunks to retrieve initially                     50
--target-k       Chunks to return after dedup                     8

Self-Hosting

Docker (Recommended)

Use the pre-built image from GitHub Container Registry:

# Pull and run
docker run -p 8080:8080 -e OPENAI_API_KEY=your-key ghcr.io/siddhant-k-code/distill:latest

# Or with a specific version
docker run -p 8080:8080 -e OPENAI_API_KEY=your-key ghcr.io/siddhant-k-code/distill:v0.1.0

Docker Compose

# Start Distill + Qdrant (local vector DB)
docker-compose up

Build from Source

docker build -t distill .
docker run -p 8080:8080 -e OPENAI_API_KEY=your-key distill api

Fly.io

fly launch
fly secrets set OPENAI_API_KEY=your-key
fly deploy

Render

Deploy to Render

Or manually:

  1. Connect your GitHub repo
  2. Set environment variables (OPENAI_API_KEY)
  3. Deploy

Railway

Connect your repo and set OPENAI_API_KEY in environment variables.

Architecture

┌─────────────────────────────────────────────────────────┐
│                      Your App                           │
└─────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────┐
│                      Distill                            │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐    │
│  │ Fetch   │→ │ Cluster │→ │ Select  │→ │  MMR    │    │
│  │  50     │  │   12    │  │   12    │  │   8     │    │
│  └─────────┘  └─────────┘  └─────────┘  └─────────┘    │
│       2ms         6ms         <1ms         3ms          │
└─────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────┐
│                       LLM                               │
└─────────────────────────────────────────────────────────┘

Supported Backends

  • Pinecone - Fully supported
  • Qdrant - Fully supported
  • Weaviate - Coming soon

Use Cases

  • Code Assistants - Dedupe context from multiple files/repos
  • RAG Pipelines - Remove redundant chunks before LLM
  • Agent Workflows - Clean up tool outputs + memory + docs
  • Enterprise - Deterministic outputs for compliance

Why Distill?

               LLM Compression   Distill
Latency        ~500ms            ~12ms
Deterministic  No                Yes
Auditable      No                Yes
Lossless       No                Yes

Contributing

Contributions welcome! Please read the contributing guidelines first.

# Run tests
go test ./...

# Build
go build -o distill .

License

AGPL-3.0 - see LICENSE

For commercial licensing, contact: [email protected]
