video-db/deepsearch
DeepSearch

Stateful, multi-turn video retrieval with a production-grade indexing pipeline built on VideoDB.
Explore the docs »

Quick Start · Features · How It Works · Python Integration · LLM Routing · Report Bug


Why DeepSearch

Traditional video search often collapses complex intent into one embedding query.

DeepSearch improves relevance by combining:

  • Indexing orchestration (scene extraction, transcript, object detection, multimodal enrichment, summary indexes)
  • Retrieval orchestration (LLM-planned multi-index search, validator loop, reranking, and follow-up refinement)
  • Stateful retrieval memory (context-aware refinement across conversational turns)

Features

  • Index from a public video URL or an existing VideoDB media ID
  • Structured indexing telemetry via event callbacks for progress visibility
  • Multi-turn search sessions with search, followup, resume_session
  • Explainable clip results (primary subquery, primary index, supporting subqueries)
  • Robust conversational continuity with persisted session state across turns
  • Pluggable stores with SQLite defaults for sessions, index records, metadata, and artifacts
  • Configurable model routing and per-stage model overrides
  • Vision-aware enrichment for stronger multimodal retrieval quality
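The pluggable-store idea above can be pictured with a minimal sketch built on the stdlib sqlite3 module — the table name, columns, and JSON state blob here are illustrative assumptions, not DeepSearch's actual schema:

```python
import json
import sqlite3

class SessionStore:
    """Toy session store: persists per-session retrieval state as a JSON blob."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS sessions (session_id TEXT PRIMARY KEY, state TEXT)"
        )

    def save(self, session_id, state):
        self.db.execute(
            "INSERT OR REPLACE INTO sessions VALUES (?, ?)",
            (session_id, json.dumps(state)),
        )
        self.db.commit()

    def load(self, session_id):
        row = self.db.execute(
            "SELECT state FROM sessions WHERE session_id = ?", (session_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None

store = SessionStore()
store.save("s-1", {"turn": 2, "last_query": "rainy night scenes"})
print(store.load("s-1")["last_query"])  # rainy night scenes
```

Point the store at a file path instead of `:memory:` and the same state survives process restarts, which is what makes multi-turn resumption possible.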

How It Works

DeepSearch has two connected runtimes:

  • Indexing runtime builds semantic indexes from scenes, transcript, and optional object signals.
  • Retrieval runtime runs a stateful LangGraph loop that plans queries, validates candidates, reranks clips, and supports follow-up turns.

For each user query, DeepSearch returns ranked clips with explainability fields so you can see why each clip matched.
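A ranked clip with explainability fields might look like the following — the field names echo the feature list above (primary subquery, primary index, supporting subqueries), but the exact result shape is an assumption, not the real schema:

```python
# Illustrative shape of one ranked clip in a DeepSearch response (assumed, not the real schema).
clip = {
    "video_id": "m-123",
    "start": 412.0,        # clip start, seconds
    "end": 428.5,          # clip end, seconds
    "score": 0.91,         # reranker score
    "primary_subquery": "rainy night exterior",
    "primary_index": "scene",                 # which index produced the match
    "supporting_subqueries": ["emotional dialogue"],
}

# A caller can turn these fields into a one-line explanation:
why = f"matched '{clip['primary_subquery']}' via the {clip['primary_index']} index"
print(why)
```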


Quick Start

Prerequisites

  • Python 3.10+
  • uv 0.8+
  • VideoDB API key (console.videodb.io)
  • OpenAI-compatible API key for the configured LLM route

1) Install

Recommended install (best retrieval quality):

uv sync --extra detection

DeepSearch uses object detection during indexing to add object-level visual signals (for example: person, laptop, car, traffic sign) into scene metadata. Retrieval then uses those signals during ranking and refinement, which improves results for object-centric queries.
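As a rough illustration of why object signals help, consider a toy reranker that boosts scenes whose detected labels appear in the query — the boost weight and metadata shape are assumptions, not DeepSearch's actual ranking logic:

```python
def rank_scenes(query, scenes, label_boost=0.2):
    """Toy ranking: base similarity score plus a boost per detected label found in the query."""
    q = query.lower()
    ranked = []
    for scene in scenes:
        score = scene["similarity"]
        score += label_boost * sum(1 for label in scene["labels"] if label in q)
        ranked.append((score, scene["id"]))
    return [sid for _, sid in sorted(ranked, reverse=True)]

scenes = [
    {"id": "s1", "similarity": 0.70, "labels": ["person", "laptop"]},
    {"id": "s2", "similarity": 0.75, "labels": ["car"]},
]
print(rank_scenes("person typing on a laptop", scenes))  # ['s1', 's2']
```

Without the label boost, s2 would win on raw similarity alone; the object signal is what lifts the object-centric match.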

If you want a lighter setup without local detector dependencies, you can still run DeepSearch by installing only the base dependencies and disabling detection in the config.

Base-only install:

uv sync

Then set indexing.object_detection.mode to a non-local value in deepsearch_config.yaml to skip detection:

indexing:
  object_detection:
    mode: "off"   # quote the value: a bare `off` parses as the YAML boolean false

2) Configure env

cp .env.sample .env

Set at minimum:

  • VIDEO_DB_API_KEY
  • OPENAI_API_KEY

Optional:

  • DEEPSEARCH_DB_PATH
  • DEEPSEARCH_CONFIG (defaults to deepsearch_config.yaml)

If you use a different LLM route:

  • OpenRouter route: set OPENROUTER_API_KEY
  • Vercel AI Gateway route: set VERCEL_AI_GATEWAY_API_KEY and optionally VERCEL_AI_GATEWAY_BASE_URL

3) Index a video

--collection-id is optional. If you already have a VideoDB collection, pass its ID. If you leave it empty, DeepSearch falls back to your account default collection via the SDK.

uv run python index_video.py \
  [--collection-id <collection_id>] \
  --video-url <public_video_url>

Or index an existing VideoDB media object:

uv run python index_video.py \
  [--collection-id <collection_id>] \
  --media-id <media_id>

If your source video is local, upload it to VideoDB first, copy the returned media_id, then run indexing with --media-id.

import videodb

conn = videodb.connect(api_key="YOUR_VIDEO_DB_API_KEY")
collection = conn.get_collection()  # or conn.get_collection("<collection_id>")

# Upload local media file
video = collection.upload(file_path="./videos/my_video.mp4", name="My Local Video")

print("media_id:", video.id)

Then index it with DeepSearch:

uv run python index_video.py \
  [--collection-id <collection_id>] \
  --media-id <media_id>

4) Run interactive retrieval

uv run python run_deepsearch.py \
  [--collection-id <collection_id>] \
  --query "rainy night scenes with emotional dialogue"

Interactive commands:

  • /more for next page
  • /help for command help
  • /exit to end and print session_id

Python Integration

If you are integrating DeepSearch directly inside your app, use DeepSearchClient:

from deepsearch import DeepSearchClient

client = DeepSearchClient(config="deepsearch_config.yaml")

# Index from a public URL
manifest = client.index_video(
    collection_id="c-...",
    video_url="https://example.com/video.mp4",
)

# Or index an existing VideoDB media
# manifest = client.index_video(collection_id="c-...", media_id="m-...")

session = client.start_session(collection_id="c-...", page_size=5)
first = session.search("find product demo moments")
next_page = session.followup(ui_event={"type": "show_more"})
refined = session.followup(text="only include scenes with pricing discussion")

Configuration

DeepSearch supports typed config, dict config, and YAML-file config.

  • Default config file: deepsearch_config.yaml
  • Config schema: deepsearch/config/schema.py
  • Environment overrides via DeepSearchConfig.from_env() with the DEEPSEARCH_ prefix; nested keys use double underscores

Example:

export DEEPSEARCH_RETRIEVAL__PAGE_SIZE=20
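A sketch of how the DEEPSEARCH_ prefix and double-underscore nesting might map onto nested config keys — the parsing details here are an assumption about from_env()'s behavior, shown with plain dicts:

```python
def env_overrides(environ, prefix="DEEPSEARCH_"):
    """Map PREFIX_SECTION__KEY=value env vars onto a nested dict of overrides."""
    overrides = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        node = overrides
        *parents, leaf = name[len(prefix):].lower().split("__")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return overrides

env = {"DEEPSEARCH_RETRIEVAL__PAGE_SIZE": "20", "PATH": "/usr/bin"}
print(env_overrides(env))  # {'retrieval': {'page_size': '20'}}
```

So the export above would override retrieval.page_size for that run without touching the YAML file.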

LLM Routing and Per-Node Models

DeepSearch uses one configured LLM route for a run (OpenAI-compatible, OpenRouter, or Vercel AI Gateway), while still letting you set different models per indexing/retrieval node.

Route examples

OpenRouter route:

llm:
  route: openrouter
  provider_mode: openrouter
  openrouter:
    enabled: true
    api_key_env: OPENROUTER_API_KEY

Vercel AI Gateway route:

llm:
  route: vercel_ai_sdk_python
  provider_mode: direct

Per-node model overrides

llm:
  models:
    indexing:
      scene_enrichment: openai/o3
      subplot_summary: openai/o3-mini
      final_summary: openai/o3
    retrieval:
      planner: openai/o3
      paraphrase: openai/gpt-4o-mini
      validator: openai/o3-mini
      none_analyzer: openai/o3-mini
      interpreter: openai/o3
      reranker: openai/o3

Use model IDs valid for your selected route/provider.
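The per-node override table amounts to a lookup with a fallback — a minimal sketch, assuming a default model applies when a node has no override (the default shown here is hypothetical):

```python
def resolve_model(models, stage, node, default="openai/gpt-4o-mini"):
    """Return the configured model for a pipeline node, falling back to a default."""
    return models.get(stage, {}).get(node, default)

models = {
    "retrieval": {"planner": "openai/o3", "reranker": "openai/o3"},
}
print(resolve_model(models, "retrieval", "planner"))    # openai/o3
print(resolve_model(models, "retrieval", "validator"))  # openai/gpt-4o-mini
```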


Project Structure

deepsearch/
├── client.py                     # Public client/session entrypoints
├── indexing/                     # Indexing pipeline + stage contracts
├── retrieval/                    # LangGraph retrieval graph + nodes
├── providers/                    # LLM and detector provider adapters
├── stores/                       # Session/metadata/index record stores
├── config/                       # Typed config schema and defaults
├── telemetry/                    # Logging utilities
└── errors/                       # Error taxonomy and typed errors

index_video.py                    # CLI script for indexing
run_deepsearch.py                 # Interactive retrieval script
sample_end_user_usage.py          # End-user API walkthrough
deepsearch_config.yaml            # Example config
docs/PRD.md                       # Product requirements draft
docs/specs.md                     # Technical specs draft

Troubleshooting

Local detector import errors

If the detection stage raises import errors for missing modules (torch, transformers, etc.), either install the detection extras or disable local detection mode in the config.

uv sync --extra detection

Resume an indexing run

If indexing fails after upload/extract, rerun index_video.py with the printed media_id (and optionally --force-reindex) to continue from persisted artifacts.


Community & Support


Made with ❤️ by the VideoDB team

