
🥤 RAGLite

RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite.

Features

Configurable
  • 🧠 Choose any LLM supported by LiteLLM, including local llama.cpp models via llama-cpp-python
  • 💾 Choose either PostgreSQL or SQLite as your keyword and vector search database
  • 🥇 Choose any reranker supported by rerankers
Fast and permissive
  • ❤️ Only lightweight and permissive open source dependencies (e.g., no PyTorch or LangChain)
  • 🚀 Acceleration with Metal on macOS, and CUDA on Linux and Windows
Unhobbled
  • ✂️ Optimal level 4 semantic chunking
  • 🧬 Multi-vector chunk embedding with late chunking
  • 🔍 Hybrid search that combines keyword search with vector search¹
  • 🍰 Optimal closed-form query adapter computed from your own evals
Extensible
  • 💬 Optional customizable ChatGPT-like frontend for web, Slack, and Teams with Chainlit
  • ✍️ Optional conversion of any input document to Markdown with Pandoc
  • ✅ Optional evaluation of retrieval and generation performance with Ragas

Installing

First, install spaCy's multilingual sentence model:

# Install spaCy's xx_sent_ud_sm:
pip install https://github.com/explosion/spacy-models/releases/download/xx_sent_ud_sm-3.7.0/xx_sent_ud_sm-3.7.0-py3-none-any.whl
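
Optionally, you can verify that the model installed correctly by loading it with spaCy's standard loading API (a quick sanity check, not required):

# Check that xx_sent_ud_sm loads:
python -c "import spacy; spacy.load('xx_sent_ud_sm')"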

Next, optionally install an accelerated llama-cpp-python precompiled binary (recommended):

# Configure which llama-cpp-python precompiled binary to install (⚠️ only v0.2.88 is supported right now):
LLAMA_CPP_PYTHON_VERSION=0.2.88
PYTHON_VERSION=310  # Your Python version without the dot, e.g., 310 for Python 3.10.
ACCELERATOR=metal|cu121|cu122|cu123|cu124  # Pick exactly one accelerator.
PLATFORM=macosx_11_0_arm64|linux_x86_64|win_amd64  # Pick exactly one platform.

# Install llama-cpp-python:
pip install "https://github.com/abetlen/llama-cpp-python/releases/download/v$LLAMA_CPP_PYTHON_VERSION-$ACCELERATOR/llama_cpp_python-$LLAMA_CPP_PYTHON_VERSION-cp$PYTHON_VERSION-cp$PYTHON_VERSION-$PLATFORM.whl"

Finally, install RAGLite with:

pip install raglite

To add support for a customizable ChatGPT-like frontend, use the chainlit extra:

pip install "raglite[chainlit]"

To add support for filetypes other than PDF, use the pandoc extra:

pip install "raglite[pandoc]"

To add support for evaluation, use the ragas extra:

pip install "raglite[ragas]"

Using

Overview

  1. Configuring RAGLite
  2. Inserting documents
  3. Searching and Retrieval-Augmented Generation (RAG)
  4. Computing and using an optimal query adapter
  5. Evaluation of retrieval and generation
  6. Serving a customizable ChatGPT-like frontend

1. Configuring RAGLite

Tip

🧠 RAGLite extends LiteLLM with support for llama.cpp models using llama-cpp-python. To select a llama.cpp model (e.g., from bartowski's collection), use a model identifier of the form "llama-cpp-python/<hugging_face_repo_id>/<filename>@<n_ctx>", where n_ctx is an optional parameter that specifies the context size of the model.
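
For example, the identifier used in the local config below breaks down as follows:

# Anatomy of a llama.cpp model identifier:
#   Hugging Face repo id: bartowski/Meta-Llama-3.1-8B-Instruct-GGUF
#   Filename (glob):      *Q4_K_M.gguf
#   Context size (n_ctx): 8192
llm = "llama-cpp-python/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/*Q4_K_M.gguf@8192"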

Tip

💾 You can create a PostgreSQL database in a few clicks at neon.tech.

First, configure RAGLite with your preferred PostgreSQL or SQLite database and any LLM supported by LiteLLM:

from raglite import RAGLiteConfig

# Example 'remote' config with a PostgreSQL database and an OpenAI LLM:
my_config = RAGLiteConfig(
    db_url="postgresql://my_username:my_password@my_host:5432/my_database"
    llm="gpt-4o-mini",  # Or any LLM supported by LiteLLM.
    embedder="text-embedding-3-large",  # Or any embedder supported by LiteLLM.
)

# Example 'local' config with a SQLite database and a llama.cpp LLM:
my_config = RAGLiteConfig(
    db_url="sqlite:///raglite.sqlite",
    llm="llama-cpp-python/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/*Q4_K_M.gguf@8192",
    embedder="llama-cpp-python/lm-kit/bge-m3-gguf/*F16.gguf",
)

You can also configure any reranker supported by rerankers:

from rerankers import Reranker

# Example remote API-based reranker:
my_config = RAGLiteConfig(
    db_url="postgresql://my_username:my_password@my_host:5432/my_database"
    reranker=Reranker("cohere", lang="en", api_key=COHERE_API_KEY)
)

# Example local cross-encoder reranker per language (this is the default):
my_config = RAGLiteConfig(
    db_url="sqlite:///raglite.sqlite",
    reranker=(
        ("en", Reranker("ms-marco-MiniLM-L-12-v2", model_type="flashrank")),  # English
        ("other", Reranker("ms-marco-MultiBERT-L-12", model_type="flashrank")),  # Other languages
    )
)

2. Inserting documents

Tip

✍️ To insert documents other than PDF, install the pandoc extra with pip install raglite[pandoc].

Next, insert some documents into the database. RAGLite will take care of the conversion to Markdown, optimal level 4 semantic chunking, and multi-vector embedding with late chunking:

# Insert documents:
from pathlib import Path
from raglite import insert_document

insert_document(Path("On the Measure of Intelligence.pdf"), config=my_config)
insert_document(Path("Special Relativity.pdf"), config=my_config)

3. Searching and Retrieval-Augmented Generation (RAG)

3.1 Simple RAG pipeline

Now you can run a simple but powerful RAG pipeline that consists of retrieving the most relevant chunk spans (each of which is a list of consecutive chunks) with hybrid search and reranking, converting the user prompt to a RAG instruction and appending it to the message history, and finally generating the RAG response:

from raglite import create_rag_instruction, rag, retrieve_rag_context

# Retrieve relevant chunk spans with hybrid search and reranking:
user_prompt = "How is intelligence measured?"
chunk_spans = retrieve_rag_context(query=user_prompt, num_chunks=5, config=my_config)

# Append a RAG instruction based on the user prompt and context to the message history:
messages = []  # Or start with an existing message history.
messages.append(create_rag_instruction(user_prompt=user_prompt, context=chunk_spans))

# Stream the RAG response:
stream = rag(messages, config=my_config)
for update in stream:
    print(update, end="")

# Access the documents cited in the RAG response:
documents = [chunk_span.document for chunk_span in chunk_spans]

3.2 Advanced RAG pipeline

Tip

🥇 Reranking can significantly improve the output quality of a RAG application. To add reranking to your application: first search for a larger set of 20 relevant chunks, then rerank them with a rerankers reranker, and finally keep the top 5 chunks, which is exactly what the advanced pipeline below does.

In addition to the simple RAG pipeline, RAGLite also offers more advanced control over the individual steps of the pipeline. A full pipeline consists of several steps:

  1. Searching for relevant chunks with keyword, vector, or hybrid search
  2. Retrieving the chunks from the database
  3. Reranking the chunks and selecting the top 5 results
  4. Extending the chunks with their neighbors and grouping them into chunk spans
  5. Converting the user prompt to a RAG instruction and appending it to the message history
  6. Streaming an LLM response to the message history
  7. Accessing the cited documents from the chunk spans

# Search for chunks:
from raglite import hybrid_search, keyword_search, vector_search

user_prompt = "How is intelligence measured?"
chunk_ids_vector, _ = vector_search(user_prompt, num_results=20, config=my_config)
chunk_ids_keyword, _ = keyword_search(user_prompt, num_results=20, config=my_config)
chunk_ids_hybrid, _ = hybrid_search(user_prompt, num_results=20, config=my_config)

# Retrieve chunks:
from raglite import retrieve_chunks

chunks_hybrid = retrieve_chunks(chunk_ids_hybrid, config=my_config)

# Rerank chunks and keep the top 5 (optional, but recommended):
from raglite import rerank_chunks

chunks_reranked = rerank_chunks(user_prompt, chunks_hybrid, config=my_config)
chunks_reranked = chunks_reranked[:5]

# Extend chunks with their neighbors and group them into chunk spans:
from raglite import retrieve_chunk_spans

chunk_spans = retrieve_chunk_spans(chunks_reranked, config=my_config)

# Append a RAG instruction based on the user prompt and context to the message history:
from raglite import create_rag_instruction

messages = []  # Or start with an existing message history.
messages.append(create_rag_instruction(user_prompt=user_prompt, context=chunk_spans))

# Stream the RAG response:
from raglite import rag

stream = rag(messages, config=my_config)
for update in stream:
    print(update, end="")

# Access the documents cited in the RAG response:
documents = [chunk_span.document for chunk_span in chunk_spans]

4. Computing and using an optimal query adapter

RAGLite can compute and apply an optimal closed-form query adapter to the prompt embedding to improve the output quality of RAG. To benefit from this, first generate a set of evals with insert_evals and then compute and store the optimal query adapter with update_query_adapter:

# Improve RAG with an optimal query adapter:
from raglite import insert_evals, update_query_adapter

insert_evals(num_evals=100, config=my_config)
update_query_adapter(config=my_config)  # From here, every vector search will use the query adapter.
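
For example, any subsequent vector search (as introduced in the advanced pipeline above) now automatically benefits from the query adapter:

# Vector search now applies the query adapter under the hood:
from raglite import vector_search

chunk_ids, _ = vector_search("How is intelligence measured?", num_results=10, config=my_config)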

5. Evaluation of retrieval and generation

If you installed the ragas extra, you can use RAGLite to answer the evals and then evaluate the quality of both the retrieval and generation steps of RAG using Ragas:

# Evaluate retrieval and generation:
from raglite import answer_evals, evaluate, insert_evals

insert_evals(num_evals=100, config=my_config)
answered_evals_df = answer_evals(num_evals=10, config=my_config)
evaluation_df = evaluate(answered_evals_df, config=my_config)
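
The resulting evaluation_df can be explored like any other DataFrame; for example, a rough overall summary could be computed as follows (a sketch that assumes the Ragas metric columns are numeric):

# Summarize the Ragas metrics across all answered evals:
print(evaluation_df.mean(numeric_only=True))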

6. Serving a customizable ChatGPT-like frontend

If you installed the chainlit extra, you can serve a customizable ChatGPT-like frontend with:

raglite chainlit

The application is also deployable to web, Slack, and Teams.

You can specify the database URL, LLM, and embedder directly in the Chainlit frontend, or with the CLI as follows:

raglite chainlit \
    --db_url sqlite:///raglite.sqlite \
    --llm llama-cpp-python/bartowski/Llama-3.2-3B-Instruct-GGUF/*Q4_K_M.gguf@4096 \
    --embedder llama-cpp-python/lm-kit/bge-m3-gguf/*F16.gguf

To use an API-based LLM, make sure to include your credentials in a .env file or supply them inline:

OPENAI_API_KEY=sk-... raglite chainlit --llm gpt-4o-mini --embedder text-embedding-3-large

[Demo video: raglite-chainlit.mov]

Contributing

Prerequisites
1. Set up Git to use SSH
  1. Generate an SSH key and add the SSH key to your GitHub account.
  2. Configure SSH to automatically load your SSH keys:
    cat << EOF >> ~/.ssh/config
    
    Host *
      AddKeysToAgent yes
      IgnoreUnknown UseKeychain
      UseKeychain yes
      ForwardAgent yes
    EOF
2. Install Docker
  1. Install Docker Desktop.
3. Install VS Code or PyCharm
  1. Install VS Code and VS Code's Dev Containers extension. Alternatively, install PyCharm.
  2. Optional: install a Nerd Font such as FiraCode Nerd Font and configure VS Code or configure PyCharm to use it.
Development environments

The following development environments are supported:

  1. ⭐️ GitHub Codespaces: click on Code and select Create codespace to start a Dev Container with GitHub Codespaces.
  2. ⭐️ Dev Container (with container volume): click on Open in Dev Containers to clone this repository in a container volume and create a Dev Container with VS Code.
  3. Dev Container: clone this repository, open it with VS Code, and run Ctrl/⌘ + ⇧ + P → Dev Containers: Reopen in Container.
  4. PyCharm: clone this repository, open it with PyCharm, and configure Docker Compose as a remote interpreter with the dev service.
  5. Terminal: clone this repository, open it with your terminal, and run docker compose up --detach dev to start a Dev Container in the background, and then run docker compose exec dev zsh to open a shell prompt in the Dev Container.
Developing
  • This project follows the Conventional Commits standard to automate Semantic Versioning and Keep A Changelog with Commitizen.
  • Run poe from within the development environment to print a list of Poe the Poet tasks available to run on this project.
  • Run poetry add {package} from within the development environment to install a run-time dependency and add it to pyproject.toml and poetry.lock. Add --group test or --group dev to install a CI or development dependency, respectively (see the example below).
  • Run poetry update from within the development environment to upgrade all dependencies to the latest versions allowed by pyproject.toml.
  • Run cz bump to bump the package's version, update the CHANGELOG.md, and create a git tag.
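
For example, a typical dependency and release workflow looks like this (httpx and pytest are hypothetical package names used purely for illustration):

# List the available Poe the Poet tasks:
poe

# Add a run-time dependency (httpx is just an example):
poetry add httpx

# Add a CI dependency instead:
poetry add pytest --group test

# Upgrade all dependencies to the latest versions allowed by pyproject.toml:
poetry update

# Bump the package's version, update CHANGELOG.md, and create a git tag:
cz bump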

Footnotes

  1. We use PyNNDescent until sqlite-vec is more mature.