layout	default
title	Haystack Tutorial
nav_order	23
has_children	true
format_version	v2

Haystack: Deep Dive Tutorial

Project: Haystack — An open-source framework for building production-ready LLM applications, RAG pipelines, and intelligent search systems.

Why This Track Matters

Haystack is an open-source framework for building production-ready LLM applications, RAG pipelines, and intelligent search systems, and it is increasingly relevant to developers working with modern AI/ML infrastructure. This track helps you understand its architecture, key patterns, and production considerations.

This track focuses on:

getting started with Haystack
document stores
retrievers and search
generators and LLMs

What Is Haystack?

Haystack is an open-source LLM framework by deepset for building composable AI pipelines. It provides a modular, component-based architecture that combines retrieval, generation, and evaluation into production-ready workflows. Haystack supports dozens of LLM providers, vector databases, and retrieval strategies out of the box.
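To make "component-based architecture with typed inputs/outputs" concrete, here is a minimal plain-Python sketch of the idea, not Haystack's API. Each hypothetical `ToyComponent` declares the types of its input and output sockets, and the hypothetical `ToyPipeline.connect` rejects a connection whose types differ. The `"component.socket"` naming is loosely modeled on how Haystack pipelines name connections; the class names, fields, and strict equality check are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyComponent:
    """Hypothetical stand-in for a pipeline node: a name plus declared socket types."""
    name: str
    inputs: Dict[str, type]
    outputs: Dict[str, type]


@dataclass
class ToyPipeline:
    """Hypothetical graph that refuses connections whose socket types do not match."""
    components: Dict[str, ToyComponent] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)

    def add_component(self, component: ToyComponent) -> None:
        self.components[component.name] = component

    def connect(self, sender: str, receiver: str) -> None:
        # Both endpoints use the form "component.socket"
        s_name, s_socket = sender.split(".", 1)
        r_name, r_socket = receiver.split(".", 1)
        s_type = self.components[s_name].outputs[s_socket]
        r_type = self.components[r_name].inputs[r_socket]
        # Validate at connect time, before anything runs
        if s_type != r_type:
            raise TypeError(
                f"{sender} produces {s_type.__name__}, but {receiver} expects {r_type.__name__}"
            )
        self.edges.append((sender, receiver))


pipe = ToyPipeline()
pipe.add_component(ToyComponent("retriever", inputs={"query": str}, outputs={"documents": list}))
pipe.add_component(
    ToyComponent("prompt_builder", inputs={"documents": list, "question": str}, outputs={"prompt": str})
)
pipe.connect("retriever.documents", "prompt_builder.documents")  # list -> list: accepted
```

Calling `pipe.connect("retriever.documents", "prompt_builder.question")` would raise `TypeError`, because a `list` output cannot feed a `str` input. Catching the mismatch while wiring the graph, rather than mid-run, is the benefit of declaring socket types; a real framework's compatibility rules are likely more elaborate than this equality test.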

Feature	Description
Pipeline System	Directed graph of components with typed inputs/outputs and automatic validation
RAG	First-class retrieval-augmented generation with hybrid search (BM25 + embedding)
Multi-Provider	OpenAI, Anthropic, Cohere, Google, Hugging Face, Ollama, and more
Document Stores	In-memory, Elasticsearch, OpenSearch, Pinecone, Qdrant, Weaviate, Chroma, pgvector
Evaluation	Built-in metrics (MRR, MAP, NDCG) and LLM-based evaluation components
Custom Components	`@component` decorator for building reusable pipeline nodes with typed I/O
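The Evaluation row lists MRR among the built-in retrieval metrics. As a plain-Python illustration of the metric itself (not Haystack's evaluator code), mean reciprocal rank averages, over all queries, 1/rank of the first relevant document in each result list, counting 0 when no relevant document was retrieved. The function name and the document IDs are made-up sample data.

```python
from typing import Sequence, Set


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]], relevant: Sequence[Set[str]]) -> float:
    """Average of 1/rank of the first relevant document per query (0 if none is retrieved)."""
    if len(rankings) != len(relevant):
        raise ValueError("need exactly one relevance set per query")
    if not rankings:
        return 0.0
    total = 0.0
    for ranking, relevant_ids in zip(rankings, relevant):
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in relevant_ids:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)


# Query 1: first relevant at rank 2; query 2: rank 1; query 3: never retrieved
score = mean_reciprocal_rank(
    [["d3", "d1", "d2"], ["d5", "d4"], ["d9"]],
    [{"d1"}, {"d5"}, {"d7"}],
)
print(score)  # (1/2 + 1/1 + 0) / 3 = 0.5
```

MRR rewards only the position of the first relevant hit, so it suits question answering where one good passage is enough; MAP and NDCG, also listed in the table, account for every relevant document in the ranking.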

Current Snapshot (auto-updated)

repository: deepset-ai/haystack
stars: about 24.6k
latest release: v2.26.1 (published 2026-03-20)

Mental Model

graph TB
    subgraph Ingestion["Ingestion Pipeline"]
        FILES[File Converters]
        SPLIT[Document Splitter]
        EMBED_D[Document Embedder]
        WRITER[Document Writer]
    end

    subgraph Store["Document Stores"]
        MEM[In-Memory]
        ES[Elasticsearch]
        PG[pgvector]
        VEC[Pinecone / Qdrant / Weaviate]
    end

    subgraph Query["Query Pipeline"]
        EMBED_Q[Query Embedder]
        BM25[BM25 Retriever]
        EMB_RET[Embedding Retriever]
        JOINER[Document Joiner]
        RANKER[Ranker]
        PROMPT[Prompt Builder]
        GEN[Generator / LLM]
    end

    FILES --> SPLIT --> EMBED_D --> WRITER
    WRITER --> Store

    Store --> BM25
    Store --> EMB_RET
    EMBED_Q --> EMB_RET
    BM25 --> JOINER
    EMB_RET --> JOINER
    JOINER --> RANKER --> PROMPT --> GEN
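In the query pipeline above, the Document Joiner has to merge two ranked lists: one from the BM25 Retriever and one from the Embedding Retriever. One common fusion strategy is reciprocal rank fusion (RRF), sketched below in plain Python. The function name, the `k=60` smoothing constant (a commonly used default for RRF), and the hard-coded ID lists standing in for retriever output are assumptions for illustration, not Haystack's `DocumentJoiner` implementation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank of d in that list)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order
    return sorted(scores, key=lambda d: (-scores[d], d))


# Stand-ins for the two retriever outputs (hypothetical document IDs)
bm25_hits = ["doc_a", "doc_b", "doc_c"]
embedding_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, embedding_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Because RRF uses only ranks, it sidesteps the fact that BM25 scores and embedding similarities live on different scales: documents near the top of both lists (`doc_a`, `doc_c`) rise above documents found by only one retriever, and the Ranker downstream can then reorder the merged list.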

Chapter Guide

Chapter	Topic	What You'll Learn
1. Getting Started	Setup	Installation, first RAG pipeline, architecture overview
2. Document Stores	Storage	Store backends, indexing, preprocessing, multi-store patterns
3. Retrievers & Search	Retrieval	BM25, embedding, hybrid search, filtering, re-ranking
4. Generators & LLMs	Generation	Multi-provider LLMs, prompt engineering, streaming, chat
5. Pipelines & Workflows	Composition	Pipeline graph, branching, loops, serialization, async
6. Evaluation & Optimization	Quality	Retrieval metrics, LLM evaluation, A/B testing, optimization
7. Custom Components	Extensibility	@component decorator, typed I/O, testing, packaging
8. Production Deployment	Operations	REST API, Docker, Kubernetes, monitoring, scaling

Tech Stack

Component	Technology
Language	Python 3.9+
Pipeline Engine	Custom directed graph with topological execution
Serialization	YAML / JSON pipeline definitions
Embeddings	Sentence Transformers, OpenAI, Cohere, Fastembed
Vector Search	FAISS, Pinecone, Qdrant, Weaviate, Chroma, pgvector
Text Search	Elasticsearch, OpenSearch, BM25 (in-memory)
LLM Providers	OpenAI, Anthropic, Google, Cohere, Hugging Face, Ollama
API Layer	Hayhooks (FastAPI-based pipeline serving)
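The Tech Stack table describes the pipeline engine as a custom directed graph with topological execution. The toy runner below illustrates that idea with the standard library's `graphlib.TopologicalSorter` (available since Python 3.9, the stated minimum): each step runs only after all of its upstream steps and receives the pipeline inputs merged with their outputs. The `run_pipeline` helper, the step functions, and the merging rule are hypothetical simplifications, not Haystack's engine. The sketch handles acyclic graphs only (`graphlib` raises `CycleError` on a cycle), whereas the chapter guide notes that Haystack pipelines can also contain loops.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Set, Tuple

Step = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_pipeline(steps: Dict[str, Step], edges: List[Tuple[str, str]], inputs: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Run steps in topological order; each sees the pipeline inputs plus its predecessors' outputs."""
    predecessors: Dict[str, Set[str]] = {name: set() for name in steps}
    for src, dst in edges:
        predecessors[dst].add(src)
    outputs: Dict[str, Dict[str, Any]] = {}
    for name in TopologicalSorter(predecessors).static_order():
        data = dict(inputs)
        for src in sorted(predecessors[name]):  # sorted for a deterministic merge order
            data.update(outputs[src])
        outputs[name] = steps[name](data)
    return outputs


# Hypothetical steps mirroring the query pipeline in miniature
def bm25(data): return {"bm25_docs": ["doc_a", "doc_b"]}
def embedding(data): return {"emb_docs": ["doc_b", "doc_c"]}
def joiner(data):
    extra = [d for d in data["emb_docs"] if d not in data["bm25_docs"]]
    return {"documents": data["bm25_docs"] + extra}
def prompt(data): return {"prompt": f"Q: {data['query']} | context: {', '.join(data['documents'])}"}


result = run_pipeline(
    {"bm25": bm25, "embedding": embedding, "joiner": joiner, "prompt": prompt},
    [("bm25", "joiner"), ("embedding", "joiner"), ("joiner", "prompt")],
    {"query": "what is rag?"},
)
print(result["prompt"]["prompt"])  # Q: what is rag? | context: doc_a, doc_b, doc_c
```

Topological order guarantees that `joiner` never runs before both retrievers have produced output, which is the property a graph-based engine relies on when fan-in steps such as the Document Joiner wait for several inputs.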

Ready to begin? Start with Chapter 1: Getting Started.

Built with insights from the Haystack repository and community documentation.

What You Will Learn

Core architecture and key abstractions
Practical patterns for production use
Integration and extensibility approaches

Navigation & Backlinks

Full Chapter Map

Source References

Haystack

Generated by AI Codebase Knowledge Builder


