SQLite Database Engine

Overview

Relevant Files

src/sqlite.h.in - Public API interface definition
src/sqliteInt.h - Internal data structures and interfaces
src/main.c - Core library initialization and connection management
src/vdbe.h - Virtual Database Engine interface
src/pager.h - Page cache and transaction management
src/btree.h - B-Tree storage engine interface

SQLite is a self-contained, serverless SQL database engine written in C. It is designed for embedded use, providing a complete relational database in a single library file. The codebase is modular, with distinct subsystems handling parsing, query execution, storage, and transactions.

Architecture Overview

SQLite follows a layered architecture with clear separation of concerns:

Core Components

Parser & Compiler - The SQL parser (generated from parse.y by the Lemon parser generator) converts SQL text into an abstract syntax tree. The compiler then generates bytecode for the Virtual Database Engine (VDBE).

Virtual Database Engine (VDBE) - An abstract machine that executes prepared statements. Each VDBE instruction (VdbeOp) performs a specific database operation. The VDBE provides a layer of abstraction between SQL semantics and storage implementation.

Query Optimizer - Located in where.c, analyzes WHERE clauses and generates efficient access plans. It determines whether to use indexes, the join order, and other optimization strategies.

B-Tree Storage Engine - Implements the core storage structure using B-Trees. Each table and index is stored as a B-Tree, with pages as the fundamental unit of storage. The B-Tree interface is defined in btree.h.

Pager - Manages the page cache, handles transactions, and implements the journal mechanism for crash recovery. It sits between the B-Tree and the VFS layer, providing ACID guarantees.

VFS Layer - Provides an abstraction for operating system file I/O. Platform-specific implementations exist for Unix (os_unix.c) and Windows (os_win.c).

Key Data Structures

The sqlite3 structure (in sqliteInt.h) represents a database connection and contains:

Array of attached databases (aDb)
Active VDBE list
Mutex for thread safety
Configuration and state flags

The Db structure represents each attached database file with its B-Tree (pBt) and schema (pSchema).

The Schema structure maintains metadata: tables, indexes, triggers, and foreign keys indexed by name in hash tables.

Build System

SQLite uses a sophisticated build process that generates several files:

sqlite3.h - Generated from src/sqlite.h.in with version information
parse.c - Generated from parse.y by Lemon
opcodes.h - Generated by scanning vdbe.c
sqlite3.c - The "amalgamation" combining all source files into one compilation unit

The amalgamation improves performance by enabling cross-procedure compiler optimizations.

Architecture & Core Subsystems

Relevant Files

src/parse.y
src/vdbe.c
src/vdbeInt.h
src/where.c
src/btree.c
src/pager.c
src/os.c

SQLite's architecture follows a layered design, with each subsystem handling a specific responsibility. Understanding these layers is essential for working with the codebase.

The Execution Pipeline

SQL statements flow through SQLite in a well-defined sequence:

Parser (parse.y) - Converts SQL text into an abstract syntax tree using the Lemon parser generator
Compiler - Generates VDBE bytecode from the AST
Virtual Database Engine (VDBE) - Executes the bytecode
B-Tree Storage - Manages persistent data structures
Pager - Handles page caching and transactions
VFS Layer - Abstracts OS file I/O

Core Subsystems

Parser & Compiler

The parser (parse.y) is a Lemon grammar file that defines SQLite's SQL syntax. It generates a C parser that builds an abstract syntax tree. The compiler then walks this tree and emits VDBE bytecode instructions. This separation allows SQL semantics to be independent of execution details.

Virtual Database Engine (VDBE)

The VDBE (vdbe.c, vdbeInt.h) is an abstract machine that executes prepared statements. Each instruction (VdbeOp) performs a specific operation: arithmetic, comparisons, table scans, index lookups, or data manipulation. The VDBE maintains:

Memory cells - Temporary storage for values during execution
Cursors - Pointers to table/index positions (wrapped in VdbeCursor)
Program counter - Current instruction address

Cursors can be of multiple types: B-tree cursors for tables/indexes, sorters for ORDER BY, or virtual table cursors.

Query Optimizer (WHERE Clause)

The where.c module analyzes WHERE clauses and generates efficient access plans. It determines:

Whether to use indexes or full table scans
Join order for multi-table queries
Cost estimates for different strategies

The optimizer generates VDBE code that implements the chosen plan.

B-Tree Storage Engine

The B-Tree (btree.c, btree.h) implements the core storage structure. Each table and index is stored as a B-Tree, with pages as the fundamental unit. Key operations include:

Opening/closing B-Trees
Cursor navigation (seek, next, previous)
Insert/delete/update operations
Page allocation and management

Pager & Transaction Management

The Pager (pager.c) sits between B-Tree and VFS, managing:

Page cache - In-memory buffer of recently accessed pages
Transactions - ACID guarantees via rollback journals or WAL
Locking - File-level locks to prevent concurrent corruption
Journal mechanism - Records original page content for rollback

The pager maintains strict invariants to ensure crash recovery works correctly.

VFS Layer

The VFS (os.c, os_unix.c, os_win.c) abstracts OS file operations. It provides:

File open/close/read/write
Locking primitives
Sector size detection
Platform-specific optimizations

Data Flow Example

A SELECT query flows through the system as follows:

SQL Text
  ↓
Parser (parse.y) → AST
  ↓
Compiler → VDBE Bytecode
  └→ WHERE optimizer (where.c) chooses the access plan at prepare time
  ↓
VDBE Executor
  ├→ Opens cursors via B-Tree (btree.c)
  ├→ Pager fetches pages from cache or disk (pager.c)
  └→ VFS reads from file (os.c)
  ↓
Result rows

Key Design Principles

Layered abstraction - Each layer has a clean interface, enabling independent optimization
Page-based storage - All data is organized in fixed-size pages for efficient I/O
Cursor-based access - Queries navigate data structures using cursor objects
Transaction safety - Pager ensures ACID properties through journaling
Virtual machine - VDBE bytecode provides a portable, debuggable execution model

SQL Parsing & Code Generation

Relevant Files

src/parse.y - Lemon grammar file defining SQL syntax rules
src/tokenize.c - Lexical analyzer that breaks SQL into tokens
src/prepare.c - Statement preparation and schema loading
src/build.c - Parser action handlers for DDL statements
tool/lemon.c - Lemon parser generator (generates parse.c from parse.y)

SQLite's SQL parsing pipeline transforms raw SQL text into executable bytecode through three main stages: tokenization, parsing, and code generation.

Tokenization

The lexical analyzer in tokenize.c breaks SQL input into tokens using sqlite3GetToken(). This function uses a character classification table (aiClass[]) to dispatch on the first byte of each token, then scans the rest of the token to decide whether it is a keyword, identifier, operator, string literal, or other token type. Each token receives a type code such as TK_SELECT, TK_WHERE, or TK_ID, which is then fed to the parser.
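The table-driven dispatch described above can be illustrated with a much smaller tokenizer. This is only a sketch: classify(), get_token(), and the CC_* and TOK_* names are hypothetical stand-ins, not SQLite's identifiers, and keyword lookup is omitted, so keywords come back as plain identifiers.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical character classes; SQLite's real aiClass[] table is larger. */
enum CharClass { CC_SPACE, CC_DIGIT, CC_ALPHA, CC_QUOTE, CC_OTHER };

/* Hypothetical token types, loosely mirroring the TK_* codes. */
enum TokType { TOK_SPACE, TOK_INTEGER, TOK_ID, TOK_STRING, TOK_ILLEGAL, TOK_PUNCT };

static enum CharClass classify(unsigned char c){
  if( isspace(c) ) return CC_SPACE;
  if( isdigit(c) ) return CC_DIGIT;
  if( isalpha(c) || c=='_' ) return CC_ALPHA;
  if( c=='\'' ) return CC_QUOTE;
  return CC_OTHER;
}

/* Scan one token starting at z; store its type and return its length in bytes. */
static size_t get_token(const char *z, enum TokType *type){
  size_t i = 1;
  assert( z!=NULL && z[0]!='\0' );
  switch( classify((unsigned char)z[0]) ){
    case CC_SPACE:
      while( isspace((unsigned char)z[i]) ) i++;
      *type = TOK_SPACE; return i;
    case CC_DIGIT:
      while( isdigit((unsigned char)z[i]) ) i++;
      *type = TOK_INTEGER; return i;
    case CC_ALPHA:
      while( isalnum((unsigned char)z[i]) || z[i]=='_' ) i++;
      *type = TOK_ID; return i;
    case CC_QUOTE:
      /* A doubled quote ('') is an escaped quote inside the literal. */
      for(; z[i]; i++){
        if( z[i]=='\'' ){
          if( z[i+1]=='\'' ) i++; else { *type = TOK_STRING; return i+1; }
        }
      }
      *type = TOK_ILLEGAL; return i;   /* unterminated literal */
    default:
      *type = TOK_PUNCT; return 1;
  }
}
```

A caller would invoke get_token() repeatedly, advancing the input pointer by the returned length and skipping TOK_SPACE tokens before handing the rest to a parser.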

Parsing with Lemon

SQLite uses Lemon, a custom LALR(1) parser generator, to build its parser from the grammar file parse.y. The grammar defines SQL syntax rules with associated C code actions. When a rule reduces, its action executes—for example, when a SELECT statement is recognized, the action calls sqlite3SelectNew() to construct a Select structure.

Key grammar sections include:

DDL statements - CREATE TABLE, DROP TABLE, CREATE INDEX
DML statements - SELECT, INSERT, UPDATE, DELETE
Expressions - Binary operators, functions, CASE expressions, window functions
Clauses - WHERE, GROUP BY, ORDER BY, LIMIT, JOIN

The parser maintains a Parse context structure that accumulates metadata during parsing, including error state, temporary register allocation, and code generation state.

Code Generation

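As grammar rules reduce, parser actions call code-generation routines that emit VDBE bytecode. DDL handlers live in build.c, while DML and query generators live in their own files (insert.c, update.c, delete.c, select.c). For example: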

sqlite3StartTable() and sqlite3EndTable() handle table creation
sqlite3Insert(), sqlite3Update(), sqlite3DeleteFrom() generate DML code
sqlite3Select() generates SELECT query execution plans

The sqlite3FinishCoding() function finalizes the VDBE program after a statement is fully parsed, for example by appending the terminating OP_Halt and the setup code that starts the transaction and verifies the schema cookie.

Data Flow

The entire pipeline is driven by sqlite3RunParser() in tokenize.c, which iteratively tokenizes and feeds tokens to the Lemon-generated parser until the SQL statement is complete.

Query Execution & Virtual Machine

Relevant Files

src/vdbe.c - Core VDBE execution engine
src/vdbeInt.h - VDBE internal structures and definitions
src/vdbeapi.c - Public VDBE API (sqlite3_step, sqlite3_finalize)
src/vdbeaux.c - VDBE auxiliary functions (creation, opcodes)
src/vdbemem.c - Memory cell management

The VDBE (Virtual Database Engine) is SQLite's bytecode interpreter that executes prepared SQL statements. The code generator translates parsed SQL into a sequence of low-level instructions, and the VDBE runs those instructions to manipulate data, manage cursors, and control program flow.

Architecture Overview

The VDBE operates as a register-based virtual machine with three core components:

1. Instruction Set (Opcodes)

Each instruction is a VdbeOp structure containing:

opcode - The operation type (e.g., OP_OpenRead, OP_SeekGE, OP_Add)
p1, p2, p3 - Three integer operands (register indices, jump targets, counts)
p4 - A fourth parameter (pointers to KeyInfo, FuncDef, or other structures)
p5 - Flags modifying opcode behavior

Opcodes fall into categories: control flow (OP_Goto, OP_If), arithmetic (OP_Add, OP_Multiply), comparisons (OP_Eq, OP_Lt), cursor operations (OP_OpenRead, OP_Next), and data manipulation (OP_Insert, OP_Delete).

2. Memory Cells (Registers)

The VDBE maintains an array of Mem structures (memory cells) that store intermediate values during execution. Each Mem can hold:

SQL NULL, INTEGER, REAL, TEXT, or BLOB values
Multiple cached representations (e.g., both integer and string forms)
Flags indicating which representations are valid
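The idea of a cell that caches more than one representation of the same value can be sketched as follows. The Cell type, the CELL_* flag bits, and the helper functions are hypothetical simplifications; SQLite's actual Mem structure and MEM_* flags in vdbeInt.h carry considerably more state.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bits; SQLite's MEM_* flags are defined in vdbeInt.h. */
#define CELL_NULL 0x01
#define CELL_INT  0x02
#define CELL_STR  0x04

/* Toy memory cell: one value, possibly with two valid representations. */
typedef struct Cell {
  unsigned flags;      /* which of the fields below are valid */
  long long i;         /* integer form, valid if CELL_INT is set */
  char z[32];          /* text form, valid if CELL_STR is set */
} Cell;

static void cell_set_null(Cell *p){ p->flags = CELL_NULL; }

static void cell_set_int(Cell *p, long long v){
  p->i = v;
  p->flags = CELL_INT;            /* any cached text form is now stale */
}

/* Return the text form, rendering and caching it on first use. */
static const char *cell_text(Cell *p){
  assert( p!=NULL );
  if( p->flags & CELL_NULL ) return NULL;
  if( (p->flags & CELL_STR)==0 ){
    snprintf(p->z, sizeof p->z, "%lld", p->i);
    p->flags |= CELL_STR;         /* both forms are valid from here on */
  }
  return p->z;
}
```

Assigning a new integer clears the text flag, so a stale string is never returned. The same principle is what the validity flags in a memory cell are for.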

3. Cursors

VdbeCursor objects track positions in tables and indexes. Four cursor types exist:

CURTYPE_BTREE - B-tree cursors for tables/indexes
CURTYPE_SORTER - Sorter cursors for ORDER BY operations
CURTYPE_VTAB - Virtual table cursors
CURTYPE_PSEUDO - Single-row pseudotables

Execution Model

The Vdbe structure represents a prepared statement and contains:

aOp[] - Array of instructions
aMem[] - Array of memory cells
apCsr[] - Array of open cursors
pc - Program counter (current instruction)
eVdbeState - State machine (INIT, READY, RUN, HALT)

Execution begins with sqlite3_step(), which calls sqlite3VdbeExec(). This function loops through instructions, incrementing the program counter and executing each opcode's case statement. Jumps modify the program counter; most opcodes proceed sequentially.
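The fetch, dispatch, and advance loop described above can be illustrated with a miniature register machine. This is a sketch under stated assumptions, not SQLite's interpreter: the T_* opcodes, ToyOp, toy_exec(), and toy_sum_to() are all hypothetical names invented for the example.

```c
#include <assert.h>

/* Hypothetical opcodes for a miniature register machine. */
enum { T_INTEGER, T_ADD, T_ADDIMM, T_IFPOS, T_GOTO, T_HALT };

typedef struct ToyOp { int opcode, p1, p2, p3; } ToyOp;

/* Run aOp[] against registers aMem[]; return register iResult at halt. */
static long toy_exec(const ToyOp *aOp, int nOp, long *aMem, int iResult){
  int pc = 0;
  while( pc<nOp ){
    const ToyOp *pOp = &aOp[pc];
    switch( pOp->opcode ){
      case T_INTEGER:  /* r[p2] = p1 */
        aMem[pOp->p2] = pOp->p1; break;
      case T_ADD:      /* r[p3] = r[p1] + r[p2] */
        aMem[pOp->p3] = aMem[pOp->p1] + aMem[pOp->p2]; break;
      case T_ADDIMM:   /* r[p1] += p2 */
        aMem[pOp->p1] += pOp->p2; break;
      case T_IFPOS:    /* if r[p1]>0, jump to p2 */
        if( aMem[pOp->p1]>0 ){ pc = pOp->p2; continue; }
        break;
      case T_GOTO:     /* unconditional jump to p2 */
        pc = pOp->p2; continue;
      case T_HALT:
        return aMem[iResult];
      default:
        assert( !"unknown opcode" );
    }
    pc++;              /* most opcodes fall through to the next instruction */
  }
  return aMem[iResult];
}

/* Example program: sum the integers n, n-1, ..., 1 using a counting loop. */
static long toy_sum_to(int n){
  long aMem[3] = {0, 0, 0};
  const ToyOp prog[] = {
    { T_INTEGER, n,  1, 0 },  /* 0: r[1] = n (loop counter) */
    { T_INTEGER, 0,  2, 0 },  /* 1: r[2] = 0 (running sum)  */
    { T_ADD,     1,  2, 2 },  /* 2: r[2] = r[1] + r[2]      */
    { T_ADDIMM,  1, -1, 0 },  /* 3: r[1] -= 1               */
    { T_IFPOS,   1,  2, 0 },  /* 4: loop while r[1] > 0     */
    { T_HALT,    0,  0, 0 },  /* 5: stop                    */
  };
  return toy_exec(prog, 6, aMem, 2);
}
```

Jump opcodes set the program counter directly and skip the increment. Every other opcode falls through to pc++, which is the pattern the paragraph above describes.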

Key Operations

Table Scans: OP_OpenRead opens a cursor on a table. OP_Rewind positions at the first row, OP_Next advances to subsequent rows, and OP_Column extracts column values into memory cells.

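Index Lookups: OP_SeekRowid locates a table row by its rowid, while OP_SeekGE, OP_SeekGT, OP_SeekLE, and OP_SeekLT position a B-tree cursor relative to a key. Both rely on binary search within B-tree pages, enabling efficient WHERE clause evaluation.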

Aggregation: OP_AggStep accumulates values for aggregate functions (SUM, COUNT, etc.), with OP_AggFinal computing the final result.

Sorting: OP_SorterOpen creates a sorter cursor, OP_SorterInsert adds rows, OP_SorterSort sorts them, and OP_SorterData retrieves each sorted row as OP_SorterNext advances the cursor.

Memory Management

vdbemem.c handles memory cell lifecycle. sqlite3VdbeMemSetInt64(), sqlite3VdbeMemSetNull(), and similar functions manage value assignment. The MEM_Dyn flag indicates dynamically allocated strings/blobs requiring cleanup via destructors.

State Transitions

A VDBE progresses through states: INIT (construction), READY (prepared), RUN (executing), and HALT (finished). sqlite3VdbeReset() returns to READY for reuse; sqlite3_finalize() deallocates the entire structure.

Query Optimization & Planning

Relevant Files

src/where.c
src/whereInt.h
src/wherecode.c
src/whereexpr.c
src/select.c

SQLite's query optimizer transforms WHERE clauses into efficient execution plans by analyzing constraints, selecting indexes, and determining the optimal join order. The process involves three main phases: clause decomposition, loop generation, and path selection.

Core Data Structures

The optimizer uses several key structures to represent query plans:

WhereClause: Decomposes the WHERE clause into individual terms connected by AND/OR operators. Each term is analyzed for indexability and constraint strength.
WhereLoop: Represents a single table scan strategy, including which index to use, how many equality constraints apply, and estimated costs.
WherePath: Represents a complete join order combining multiple WhereLoops. The solver evaluates many paths to find the lowest-cost plan.
WhereInfo: The main context object holding the complete query plan state, including all loops, levels, and metadata.

Clause Analysis & Term Extraction

The optimizer begins by decomposing the WHERE clause into individual terms via whereClauseInsert() and exprAnalyze(). Each term is classified by its operator type (equality, range, IN, etc.) and marked with flags indicating whether it can use an index. OR expressions are recursively decomposed into separate WhereClause objects, allowing the optimizer to evaluate different branches independently.

Index Selection & Loop Generation

For each table in the FROM clause, the optimizer generates candidate WhereLoop objects representing different scan strategies:

Full table scan
Index scans using available indexes
Rowid lookups for constraints on the rowid or an INTEGER PRIMARY KEY

Cost estimation uses logarithmic approximations (LogEst) to avoid overflow. For index scans, costs account for seek operations plus sequential scanning:

cost = nSeek * (log(nRow) + K * nVisit)

where K varies based on index size relative to table size.
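To make the logarithmic representation concrete, the sketch below approximates 10*log2(x) using integer arithmetic only, so that multiplying two estimates becomes adding two small integers. The names log_est() and log_est_product() are hypothetical, and the lookup table and constants are an assumption modeled on the LogEst idea rather than a verified copy of SQLite's code.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate 10*log2(x) with integer arithmetic only (sketch). */
static int log_est(uint64_t x){
  static const int a[] = { 0, 2, 3, 5, 6, 7, 8, 9 };
  int y = 40;
  if( x<8 ){
    if( x<2 ) return 0;
    while( x<8 ){ y -= 10; x <<= 1; }     /* scale small values up */
  }else{
    while( x>255 ){ y += 40; x >>= 4; }   /* scale large values down fast */
    while( x>15 ){ y += 10; x >>= 1; }
  }
  assert( x>=8 && x<=15 );                /* normalized to one octave */
  return a[x&7] + y - 10;
}

/* In the log domain, a product of estimates becomes a sum. */
static int log_est_product(uint64_t a, uint64_t b){
  return log_est(a) + log_est(b);
}
```

Because each doubling adds only 10 to the estimate, even very large row counts fit in a small integer, which is the overflow-avoidance property mentioned above.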

Path Solver Algorithm

The wherePathSolver() function searches for a low-cost join order using a bounded, dynamic-programming-style search. It builds paths incrementally:

Start with the best single-table paths
For each subsequent table, extend existing paths by adding new loops
Keep only the M best paths at each stage (M depends on the number of tables; 10 is typical for larger joins)
Compare paths using a vector metric: (cost, row count, unsorted cost)

Bounding the number of retained paths avoids the exponential cost of enumerating every join order while still exploring promising alternatives. The result is a good plan, though not a guaranteed optimum.
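The incremental path-building steps above can be sketched as a small bounded search. Everything here is a hypothetical simplification: ToyPath, solve_join(), and the cost model (each table has a fixed per-scan cost and a fixed number of output rows per outer row) are assumptions for illustration, and the real solver also weighs sort order and index choice.

```c
#include <assert.h>
#include <string.h>

#define MAX_TAB    8
#define MAX_CHOICE 10

typedef struct ToyPath {
  unsigned mask;   /* bitmask of tables already placed in the join order */
  double cost;     /* accumulated cost of the loops so far */
  double nRow;     /* estimated rows produced by the loops so far */
} ToyPath;

/* Return the lowest cost found when keeping at most mxChoice paths per level. */
static double solve_join(const double *aRow, const double *aCost,
                         int nTab, int mxChoice){
  ToyPath aFrom[MAX_CHOICE], aTo[MAX_CHOICE];
  int nFrom = 1, nTo, i, j, k;
  assert( nTab>=1 && nTab<=MAX_TAB && mxChoice>=1 && mxChoice<=MAX_CHOICE );
  aFrom[0].mask = 0; aFrom[0].cost = 0.0; aFrom[0].nRow = 1.0;
  for(i=0; i<nTab; i++){                 /* one level per table placed */
    nTo = 0;
    for(j=0; j<nFrom; j++){
      for(k=0; k<nTab; k++){
        ToyPath c;
        int slot, worst;
        if( aFrom[j].mask & (1u<<k) ) continue;   /* table already used */
        c.mask = aFrom[j].mask | (1u<<k);
        c.cost = aFrom[j].cost + aFrom[j].nRow*aCost[k];
        c.nRow = aFrom[j].nRow*aRow[k];
        /* If a path over the same set of tables exists, keep the cheaper. */
        for(slot=0; slot<nTo && aTo[slot].mask!=c.mask; slot++){}
        if( slot<nTo ){
          if( c.cost<aTo[slot].cost ) aTo[slot] = c;
        }else if( nTo<mxChoice ){
          aTo[nTo++] = c;
        }else{
          /* Beam is full: replace the most expensive path if c is cheaper. */
          for(worst=0, slot=1; slot<nTo; slot++){
            if( aTo[slot].cost>aTo[worst].cost ) worst = slot;
          }
          if( c.cost<aTo[worst].cost ) aTo[worst] = c;
        }
      }
    }
    memcpy(aFrom, aTo, sizeof(ToyPath)*(size_t)nTo);
    nFrom = nTo;
  }
  return aFrom[0].cost;   /* at the last level every path covers all tables */
}
```

With per-row counts {100, 10, 1} and scan costs {100, 10, 1}, the search settles on the order that scans the smallest table first, which keeps the multiplier applied to the expensive table low.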

ORDER BY Optimization

The optimizer evaluates whether index ordering can satisfy ORDER BY clauses without explicit sorting. The wherePathSatisfiesOrderBy() function checks if loop output naturally provides the required order, considering:

Index column ordering
Equality constraints that reduce the search space
UNIQUE and NOT NULL properties for order-distinctness

If an index provides partial ordering, the optimizer calculates reduced sorting costs.

Cost Tuning & Heuristics

SQLite applies several heuristics to improve plan quality:

Star-schema optimization: Reduces costs for queries with a central fact table
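Skip-scan: Uses an index even when its leftmost columns are unconstrained, by iterating over the distinct values of those columns (requires ANALYZE statistics)
Automatic indexes: Builds a transient index for the duration of a statement when no suitable persistent index exists, typically for the inner loop of a join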
ORDER BY LIMIT optimization: Skips rows that won't fit in the result set

The interstage heuristic runs between two solver passes to disable suboptimal loops after the first pass identifies a good plan.

Simple Query Fast Path

For common single-table queries with simple equality constraints, whereShortCut() bypasses the full solver, directly generating an optimized plan. This reduces preparation time for the most frequent query patterns.

Storage Engine & Transactions

Relevant Files

src/btree.c - B-tree implementation and page management
src/btreeInt.h - B-tree internal structures and file format
src/pager.c - Page cache and transaction state machine
src/wal.c - Write-Ahead Log implementation
src/pcache.c - Page cache management

SQLite uses a sophisticated storage engine built on B-trees with multiple transaction modes to ensure ACID compliance. The system manages data through pages, maintains consistency via locking, and provides two distinct journaling strategies.

B-Tree Storage Structure

The database file is divided into fixed-size pages (typically 4KB). Each page can be a B-tree node, freelist page, overflow page, or pointer-map page. The first page contains a 100-byte file header with metadata: page size, format versions, schema cookie, and file change counter. B-tree pages store cells (key-value pairs) with a header, cell pointer array, and cell content area. Large payloads spill to overflow pages.

// Page header structure (8 bytes on leaf pages, 12 bytes on interior pages)
// Offset 0: Flags (intkey, zerodata, leafdata, leaf)
// Offset 1-2: First freeblock offset
// Offset 3-4: Number of cells
// Offset 5-6: Cell content area start
// Offset 7: Fragmented free bytes
// Offset 8-11: Right child pointer (interior nodes only)
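The layout above can be decoded with a few big-endian reads. The following is a minimal sketch: PageHeader, get2(), get4(), and parse_page_header() are hypothetical names, and the caller is assumed to pass a pointer to the start of the page header (on page 1 that is after the 100-byte file header).

```c
#include <assert.h>
#include <stdint.h>

typedef struct PageHeader {
  uint8_t  flags;          /* page type flags */
  uint16_t firstFreeblock; /* offset of first freeblock, 0 if none */
  uint16_t nCell;          /* number of cells on the page */
  uint32_t cellContent;    /* start of the cell content area */
  uint8_t  nFragBytes;     /* fragmented free bytes */
  uint32_t rightChild;     /* right-most child page; 0 on leaf pages */
  int      isLeaf;         /* non-zero when the leaf flag (0x08) is set */
} PageHeader;

/* Multi-byte fields are stored big-endian. */
static uint16_t get2(const uint8_t *p){ return (uint16_t)((p[0]<<8) | p[1]); }
static uint32_t get4(const uint8_t *p){
  return ((uint32_t)p[0]<<24) | ((uint32_t)p[1]<<16) | ((uint32_t)p[2]<<8) | p[3];
}

/* Decode the header at a[0..]; return the header size in bytes (8 or 12). */
static int parse_page_header(const uint8_t *a, PageHeader *h){
  assert( a!=0 && h!=0 );
  h->flags = a[0];
  h->firstFreeblock = get2(&a[1]);
  h->nCell = get2(&a[3]);
  h->cellContent = get2(&a[5]);
  if( h->cellContent==0 ) h->cellContent = 65536;  /* 0 encodes 65536 */
  h->nFragBytes = a[7];
  h->isLeaf = (a[0] & 0x08)!=0;
  h->rightChild = h->isLeaf ? 0 : get4(&a[8]);
  return h->isLeaf ? 8 : 12;
}
```

The returned size tells the caller where the cell pointer array begins, since that array immediately follows the header.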

Transaction State Machine

The pager implements a seven-state machine controlling transaction lifecycle:

OPEN - No transaction active, file may be unlocked
READER - Read transaction active, SHARED lock held
WRITER_LOCKED - Write transaction started, RESERVED lock acquired
WRITER_CACHEMOD - Pages modified in cache, journal opened
WRITER_DBMOD - Changes written to database file, EXCLUSIVE lock held
WRITER_FINISHED - All writes synced, ready to commit
ERROR - Unrecoverable error state

State transitions enforce strict ordering: OPEN → READER → WRITER_LOCKED → WRITER_CACHEMOD → WRITER_DBMOD → WRITER_FINISHED → READER → OPEN.

Locking Protocol

SQLite uses a five-level file locking hierarchy:

NO_LOCK - No access to database
SHARED_LOCK - Multiple readers allowed, no writers
RESERVED_LOCK - One writer preparing, readers still allowed
PENDING_LOCK - Writer waiting for readers to finish
EXCLUSIVE_LOCK - Single writer, no other access

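Locks are normally acquired in increasing order: NO_LOCK → SHARED_LOCK → RESERVED_LOCK → PENDING_LOCK → EXCLUSIVE_LOCK. Only one connection at a time may hold RESERVED_LOCK or stronger, which serializes writers and supports serializable isolation.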
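The compatibility rules implied by the five lock levels can be written as a small predicate. This is a sketch of the protocol as described in the list above, not the VFS implementation: can_acquire() is a hypothetical helper, and it models other connections only by the strongest lock any of them holds.

```c
#include <assert.h>

/* Lock levels in increasing strength, named after the list above. */
typedef enum {
  NO_LOCK, SHARED_LOCK, RESERVED_LOCK, PENDING_LOCK, EXCLUSIVE_LOCK
} LockLevel;

/* Can a connection acquire `want` when the strongest lock held by any
** OTHER connection is `other`? Returns 1 if yes, 0 if it must wait. */
static int can_acquire(LockLevel want, LockLevel other){
  switch( want ){
    case NO_LOCK:        return 1;
    case SHARED_LOCK:    return other<=RESERVED_LOCK;  /* blocked by PENDING */
    case RESERVED_LOCK:
    case PENDING_LOCK:   return other<=SHARED_LOCK;    /* one writer at a time */
    case EXCLUSIVE_LOCK: return other==NO_LOCK;        /* all readers gone */
  }
  assert( !"invalid lock level" );
  return 0;
}
```

The asymmetry is the point of PENDING_LOCK: existing readers may finish, but no new reader can start once a writer is waiting, so the writer is not starved.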

Journaling Modes

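Rollback Journal (Default): Before modifying a page, its original content is written to a journal file. The journal is synced to disk before the database file is overwritten, and the transaction commits at the moment the journal is deleted (or truncated or invalidated, depending on the journal mode). On rollback or crash recovery, the journal is played back to restore the original pages. This guarantees atomicity: either all changes persist or none do.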
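The save-before-write idea behind the rollback journal can be shown with an in-memory toy. This sketch assumes a hypothetical ToyDb with a handful of tiny pages. It ignores syncing, locking, and crash recovery, all of which the real pager must handle.

```c
#include <assert.h>
#include <string.h>

#define N_PAGE   4
#define PAGE_SZ  8

/* Toy database: a few fixed-size pages plus an in-memory rollback journal. */
typedef struct ToyDb {
  char page[N_PAGE][PAGE_SZ];     /* current page contents */
  char journal[N_PAGE][PAGE_SZ];  /* original contents of journaled pages */
  int  inJournal[N_PAGE];         /* 1 if page i was saved in this transaction */
} ToyDb;

/* Write to a page, first saving its original content if not yet journaled. */
static void toy_write(ToyDb *db, int pgno, const char *data){
  assert( pgno>=0 && pgno<N_PAGE );
  if( !db->inJournal[pgno] ){
    memcpy(db->journal[pgno], db->page[pgno], PAGE_SZ);
    db->inJournal[pgno] = 1;
  }
  strncpy(db->page[pgno], data, PAGE_SZ-1);
  db->page[pgno][PAGE_SZ-1] = '\0';
}

/* Commit: discard the journal, making the new page contents permanent. */
static void toy_commit(ToyDb *db){
  memset(db->inJournal, 0, sizeof db->inJournal);
}

/* Rollback: copy every journaled page back, then discard the journal. */
static void toy_rollback(ToyDb *db){
  for(int i=0; i<N_PAGE; i++){
    if( db->inJournal[i] ) memcpy(db->page[i], db->journal[i], PAGE_SZ);
  }
  memset(db->inJournal, 0, sizeof db->inJournal);
}
```

Each page is journaled at most once per transaction, so the journal always holds the content from before the transaction began, no matter how many times the page is rewritten.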

Write-Ahead Log (WAL): Changes are appended to a separate WAL file, and the database file itself is left untouched until a checkpoint. Multiple transactions accumulate in the WAL. Periodically, a checkpoint transfers WAL content back to the database. WAL enables concurrent readers to view consistent snapshots while a writer appends new frames.

Page Cache Management

The page cache (PCache) maintains dirty pages in LRU order. Dirty pages are tracked in a doubly-linked list with metadata about sync status. The cache implements a stress callback to evict pages when memory pressure occurs. Clean pages (matching disk content) can be evicted freely; dirty pages require journal writes first.

// Abridged: the real struct PCache in src/pcache.c has additional fields.
struct PCache {
  PgHdr *pDirty, *pDirtyTail;   // Dirty pages in LRU order
  PgHdr *pSynced;               // Last synced page in the dirty list
  int szCache;                  // Configured cache size
  int (*xStress)(void*,PgHdr*); // Callback that tries to make a page clean
};

ACID Guarantees

Atomicity: The rollback journal or WAL ensures all-or-nothing commits.
Consistency: B-tree invariants are maintained under the protection of file locks.
Isolation: SHARED and EXCLUSIVE locks provide serializable isolation.
Durability: Sync operations ensure committed data survives crashes.

The file change counter (a 4-byte field at offset 24 of the file header) signals cache invalidation across processes.

Extensions & Optional Features

Relevant Files

ext/fts5/fts5_main.c
ext/rtree/rtree.c
ext/session/sqlite3session.c
ext/rbu/sqlite3rbu.c
ext/recover/sqlite3recover.c
src/loadext.c
src/sqlite3ext.h

SQLite provides a modular extension system that allows optional features to be compiled into the core, loaded dynamically, or registered as auto-extensions. Extensions enhance SQLite with specialized functionality for full-text search, spatial indexing, change tracking, and database recovery.

Extension Loading Mechanism

Loadable extensions are loaded at run time through sqlite3_load_extension() in src/loadext.c. More broadly, an extension can reach a connection in one of three ways:

Compiled-in extensions – Built directly into the SQLite library via compile-time flags like SQLITE_ENABLE_FTS5 or SQLITE_ENABLE_RTREE
Loadable extensions – Dynamically loaded shared libraries (.so, .dylib, .dll) at runtime
Auto-extensions – Registered via sqlite3_auto_extension() and automatically loaded for every new database connection

The loader searches for entry points named sqlite3_<name>_init() or sqlite3_extension_init() and passes the sqlite3_api_routines structure containing all public SQLite APIs.
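The sqlite3_<name>_init() naming convention can be illustrated by deriving the name from a library path. The sketch below assumes the rule as documented for loadable extensions: take the basename, drop a leading "lib", keep the ASCII letters up to the first ".", and lowercase them. The function entry_point_name() is hypothetical and only handles "/" as a path separator.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Derive "sqlite3_X_init" from a library path.
** Returns 0 on success, -1 if zOut is too small. */
static int entry_point_name(const char *zFile, char *zOut, size_t nOut){
  const char *z;
  size_t n;
  assert( zFile!=NULL && zOut!=NULL );
  z = strrchr(zFile, '/');
  z = z ? z+1 : zFile;                       /* basename */
  if( strncmp(z, "lib", 3)==0 ) z += 3;      /* drop a leading "lib" */
  n = (size_t)snprintf(zOut, nOut, "sqlite3_");
  if( n>=nOut ) return -1;
  for(; *z && *z!='.'; z++){                 /* stop at the first '.' */
    if( isalpha((unsigned char)*z) ){
      if( n+1>=nOut ) return -1;
      zOut[n++] = (char)tolower((unsigned char)*z);
    }
  }
  if( n+strlen("_init")>=nOut ) return -1;
  strcpy(&zOut[n], "_init");
  return 0;
}
```

Digits and punctuation in the filename are skipped, so a versioned library name still maps to a stable entry point.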

Core Extensions

FTS5 (Full-Text Search) – ext/fts5/fts5_main.c

Provides advanced full-text search with tokenization, phrase queries, and ranking. FTS5 uses a virtual table interface and manages its own index storage. It supports custom tokenizers and auxiliary functions for relevance scoring.

R-Tree (Spatial Indexing) – ext/rtree/rtree.c

Implements R-tree and R*-tree data structures for spatial queries. Stores tree nodes in three backing tables (_node, _parent, _rowid) and supports rectangular range queries with efficient bounding-box filtering.

Session Module – ext/session/sqlite3session.c

Records database changes into changesets that can be applied to other databases. Uses preupdate hooks to capture old and new values, enabling change tracking, replication, and conflict resolution workflows.

RBU (Resumable Bulk Update) – ext/rbu/sqlite3rbu.c

Performs large database updates in three stages: (1) write changes to an OAL file, (2) atomically rename to WAL, (3) checkpoint incrementally. Supports resumption if interrupted, making it ideal for mobile and embedded systems.

Recover Module – ext/recover/sqlite3recover.c

Recovers data from corrupted databases by scanning pages and reconstructing table schemas. Extracts records from freelist pages and attempts to recover deleted data when configured.

Configuration and Compilation

Extensions are enabled via compile-time flags in the build system:

SQLITE_ENABLE_FTS5 – Enable FTS5 full-text search
SQLITE_ENABLE_RTREE – Enable R-tree spatial indexing
SQLITE_ENABLE_SESSION – Enable session/changeset module (also requires SQLITE_ENABLE_PREUPDATE_HOOK)
SQLITE_ENABLE_RBU – Enable resumable bulk update
SQLITE_ENABLE_DBPAGE_VTAB – Enable the sqlite_dbpage virtual table that the recovery extension relies on

The --all build flag enables FTS4, FTS5, RTREE, GEOPOLY, SESSION, DBPAGE, DBSTAT, and CARRAY extensions together.

Extension API

Extensions access SQLite through the sqlite3_api_routines structure defined in src/sqlite3ext.h. This provides stable ABI compatibility across SQLite versions. Key capabilities include:

Creating virtual tables via sqlite3_create_module()
Registering functions via sqlite3_create_function()
Accessing database handles and prepared statements
Memory management and error reporting

The SQLITE_EXTENSION_INIT2() macro initializes the API pointer for loadable extensions, enabling them to call SQLite functions through the provided thunk layer.

Build System & Testing

Relevant Files

main.mk - Primary POSIX-compatible makefile
Makefile.msc - Windows MSVC build configuration
configure - AutoSetup configuration script
tool/mksqlite3c.tcl - Amalgamation generator
test/all.test - Master test suite runner

Build Architecture

SQLite uses a dual-makefile system supporting both POSIX and Windows platforms. The main.mk file is POSIX-compatible and included by platform-specific makefiles. Configuration is handled by AutoSetup, which processes auto.def and generates Makefile from Makefile.in.

Key build modes:

Amalgamation mode (USE_AMALGAMATION=1): Combines all source files into sqlite3.c and sqlite3.h for simpler distribution and better compiler optimizations
Non-amalgamation mode: Builds from individual source files in src/ directory
Static vs. shared libraries: Controlled by ENABLE_LIB_STATIC and ENABLE_LIB_SHARED

Amalgamation Generation

The amalgamation process is a two-step build:

Target source preparation (.target_source target):
- Copies all source files to tsrc/ directory
- Generates parser files using Lemon (parse.c, parse.h)
- Generates keyword hash table (keywordhash.h)
- Post-processes vdbe.c with tool/vdbe-compress.tcl to reduce the stack usage of sqlite3VdbeExec()
Amalgamation creation (tool/mksqlite3c.tcl):
- Merges all C files into single sqlite3.c
- Deduplicates header includes using amalgamator directives
- Optionally includes #line macros for debugging
- Supports custom extensions via EXTRA_SRC

Build Targets

make all              # Build libraries and shell
make lib              # Static library only
make so               # Shared library only
make sqlite3          # CLI shell executable
make testprogs        # All test executables

Testing Framework

SQLite employs a comprehensive multi-layer testing strategy:

Test execution levels:

quicktest: Fast smoke test (<3 minutes)
alltest: Full TCL test suite via testfixture
fulltest: All tests plus fuzzing
releasetest: Pre-release validation with source tree checks
devtest: Developer testing with standard configurations

Test infrastructure:

testfixture - TCL-based test harness linking against SQLite
test/all.test - Master test orchestrator running 50+ test configurations
test/testrunner.tcl - Parallel test execution with job scheduling
test/permutations.test - Configuration matrix defining test variants

Test configurations include single-threaded, multi-threaded, memory subsystems, WAL modes, journal modes, and cache configurations.

Compiler Configuration

Feature flags control compilation:

OPT_FEATURE_FLAGS - Enable/disable features (-DSQLITE_ENABLE_*, -DSQLITE_OMIT_*)
SHELL_OPT - CLI shell-specific options
CFLAGS.{feature} - Per-feature compiler flags (readline, ICU, zlib)
LDFLAGS.{feature} - Per-feature linker flags

Windows builds use Makefile.msc with MSVC-specific options for optimization levels, debug symbols, and runtime linking.

Installation

Standard autotools-compatible installation directories:

$(prefix)/bin       - Executables (sqlite3)
$(prefix)/lib       - Libraries (libsqlite3.so, libsqlite3.a)
$(prefix)/include   - Headers (sqlite3.h)
$(prefix)/lib/pkgconfig - pkg-config files

Shared-library installation on Unix creates versioned symlinks for compatibility with legacy libtool-style naming.