Overview
Relevant Files
- src/sqlite.h.in - Public API interface definition
- src/sqliteInt.h - Internal data structures and interfaces
- src/main.c - Core library initialization and connection management
- src/vdbe.h - Virtual Database Engine interface
- src/pager.h - Page cache and transaction management
- src/btree.h - B-Tree storage engine interface
SQLite is a self-contained, serverless SQL database engine written in C. It is designed for embedded use, providing a complete relational database in a single library file. The codebase is modular, with distinct subsystems handling parsing, query execution, storage, and transactions.
Architecture Overview
SQLite follows a layered architecture with clear separation of concerns.
Core Components
Parser & Compiler - The SQL parser (generated from parse.y by the Lemon parser generator) converts SQL text into an abstract syntax tree. The compiler then generates bytecode for the Virtual Database Engine (VDBE).
Virtual Database Engine (VDBE) - An abstract machine that executes prepared statements. Each VDBE instruction (VdbeOp) performs a specific database operation. The VDBE provides a layer of abstraction between SQL semantics and storage implementation.
Query Optimizer - Located in where.c, analyzes WHERE clauses and generates efficient access plans. It determines whether to use indexes, the join order, and other optimization strategies.
B-Tree Storage Engine - Implements the core storage structure using B-Trees. Each table and index is stored as a B-Tree, with pages as the fundamental unit of storage. The B-Tree interface is defined in btree.h.
Pager - Manages the page cache, handles transactions, and implements the journal mechanism for crash recovery. It sits between the B-Tree and the VFS layer, providing ACID guarantees.
VFS Layer - Provides an abstraction for operating system file I/O. Platform-specific implementations exist for Unix (os_unix.c) and Windows (os_win.c).
Key Data Structures
The sqlite3 structure (in sqliteInt.h) represents a database connection and contains:
- Array of attached databases (aDb)
- Active VDBE list
- Mutex for thread safety
- Configuration and state flags
The Db structure represents each attached database file with its B-Tree (pBt) and schema (pSchema).
The Schema structure maintains metadata: tables, indexes, triggers, and foreign keys indexed by name in hash tables.
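The relationship between these structures can be sketched as below. This is a heavily simplified illustration, not the real definitions from sqliteInt.h: the Conn type stands in for struct sqlite3, the zDbSName field and the find_db_index() helper are assumptions made for the example, and the real lookup is case-insensitive.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the structures described above. */
typedef struct Btree Btree;            /* opaque in this sketch */
typedef struct Schema {
  int schema_cookie;                   /* the real Schema holds hash tables of tables, indexes, ... */
} Schema;

typedef struct Db {
  const char *zDbSName;                /* schema name such as "main" or "temp" (assumed field) */
  Btree *pBt;                          /* B-Tree for this database file */
  Schema *pSchema;                     /* parsed schema metadata */
} Db;

typedef struct Conn {                  /* plays the role of struct sqlite3 */
  Db *aDb;                             /* array of attached databases */
  int nDb;                             /* number of entries in aDb */
} Conn;

/* Hypothetical helper: return the index of the attached database
** named zName, or -1 if no such database is attached. */
int find_db_index(const Conn *db, const char *zName){
  for(int i=0; i<db->nDb; i++){
    if( strcmp(db->aDb[i].zDbSName, zName)==0 ) return i;
  }
  return -1;
}
```

With two entries named "main" and "temp", find_db_index() returns 0 and 1 respectively, and -1 for any other name.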
Build System
SQLite uses a sophisticated build process that generates several files:
- sqlite3.h - Generated from src/sqlite.h.in with version information
- parse.c - Generated from parse.y by Lemon
- opcodes.h - Generated by scanning vdbe.c
- sqlite3.c - The "amalgamation" combining all source files into one compilation unit
The amalgamation improves performance by enabling cross-procedure compiler optimizations.
Architecture & Core Subsystems
Relevant Files
- src/parse.y
- src/vdbe.c
- src/vdbeInt.h
- src/where.c
- src/btree.c
- src/pager.c
- src/os.c
SQLite's architecture follows a layered design, with each subsystem handling a specific responsibility. Understanding these layers is essential for working with the codebase.
The Execution Pipeline
SQL statements flow through SQLite in a well-defined sequence:
- Parser (parse.y) - Converts SQL text into an abstract syntax tree using the Lemon parser generator
- Compiler - Generates VDBE bytecode from the AST
- Virtual Database Engine (VDBE) - Executes the bytecode
- B-Tree Storage - Manages persistent data structures
- Pager - Handles page caching and transactions
- VFS Layer - Abstracts OS file I/O
Core Subsystems
Parser & Compiler
The parser (parse.y) is a Lemon grammar file that defines SQLite's SQL syntax. It generates a C parser that builds an abstract syntax tree. The compiler then walks this tree and emits VDBE bytecode instructions. This separation allows SQL semantics to be independent of execution details.
Virtual Database Engine (VDBE)
The VDBE (vdbe.c, vdbeInt.h) is an abstract machine that executes prepared statements. Each instruction (VdbeOp) performs a specific operation: arithmetic, comparisons, table scans, index lookups, or data manipulation. The VDBE maintains:
- Memory cells - Temporary storage for values during execution
- Cursors - Pointers to table/index positions (wrapped in VdbeCursor)
- Program counter - Current instruction address
Cursors can be of multiple types: B-tree cursors for tables/indexes, sorters for ORDER BY, or virtual table cursors.
Query Optimizer (WHERE Clause)
The where.c module analyzes WHERE clauses and generates efficient access plans. It determines:
- Whether to use indexes or full table scans
- Join order for multi-table queries
- Cost estimates for different strategies
The optimizer generates VDBE code that implements the chosen plan.
B-Tree Storage Engine
The B-Tree (btree.c, btree.h) implements the core storage structure. Each table and index is stored as a B-Tree, with pages as the fundamental unit. Key operations include:
- Opening/closing B-Trees
- Cursor navigation (seek, next, previous)
- Insert/delete/update operations
- Page allocation and management
Pager & Transaction Management
The Pager (pager.c) sits between B-Tree and VFS, managing:
- Page cache - In-memory buffer of recently accessed pages
- Transactions - ACID guarantees via rollback journals or WAL
- Locking - File-level locks to prevent concurrent corruption
- Journal mechanism - Records original page content for rollback
The pager maintains strict invariants to ensure crash recovery works correctly.
VFS Layer
The VFS (os.c, os_unix.c, os_win.c) abstracts OS file operations. It provides:
- File open/close/read/write
- Locking primitives
- Sector size detection
- Platform-specific optimizations
Data Flow Example
A SELECT query flows through the system as follows:
SQL Text
↓
Parser (parse.y) → AST
↓
Compiler → VDBE Bytecode
↓
VDBE Executor
├→ WHERE optimizer (where.c) generates access plan
├→ Opens cursors via B-Tree (btree.c)
├→ Pager fetches pages from cache or disk (pager.c)
└→ VFS reads from file (os.c)
↓
Result rows
Key Design Principles
- Layered abstraction - Each layer has a clean interface, enabling independent optimization
- Page-based storage - All data is organized in fixed-size pages for efficient I/O
- Cursor-based access - Queries navigate data structures using cursor objects
- Transaction safety - Pager ensures ACID properties through journaling
- Virtual machine - VDBE bytecode provides a portable, debuggable execution model
SQL Parsing & Code Generation
Relevant Files
- src/parse.y - Lemon grammar file defining SQL syntax rules
- src/tokenize.c - Lexical analyzer that breaks SQL into tokens
- src/prepare.c - Statement preparation and schema loading
- src/build.c - Parser action handlers for DDL statements
- tool/lemon.c - Lemon parser generator (generates parse.c from parse.y)
SQLite's SQL parsing pipeline transforms raw SQL text into executable bytecode through three main stages: tokenization, parsing, and code generation.
Tokenization
The lexical analyzer in tokenize.c breaks SQL input into tokens using sqlite3GetToken(). This function uses a character classification table (aiClass[]) to efficiently categorize each byte as a keyword, identifier, operator, string literal, or other token type. Tokens are classified into types like TK_SELECT, TK_WHERE, TK_ID, etc., which are then fed to the parser.
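The table-driven idea can be illustrated with a small sketch. It is not the code from tokenize.c: the class names, token names, and the get_token() function are invented for the example. It only mirrors the general shape described above, returning the token length and reporting the token type through a pointer. Keyword recognition is omitted, so SELECT comes back as a plain identifier here.

```c
#include <stddef.h>

enum CharClass { CC_SPACE, CC_ALPHA, CC_DIGIT, CC_OTHER };
enum TokType   { TOK_SPACE, TOK_ID, TOK_INTEGER, TOK_PUNCT, TOK_END };

static unsigned char class_table[256];   /* one class per possible byte value */
static int table_ready = 0;

/* Fill the classification table once. */
static void init_class_table(void){
  for(int c=0; c<256; c++){
    if( c==' ' || c=='\t' || c=='\n' || c=='\r' ){
      class_table[c] = CC_SPACE;
    }else if( (c>='a' && c<='z') || (c>='A' && c<='Z') || c=='_' ){
      class_table[c] = CC_ALPHA;
    }else if( c>='0' && c<='9' ){
      class_table[c] = CC_DIGIT;
    }else{
      class_table[c] = CC_OTHER;       /* includes the NUL terminator */
    }
  }
  table_ready = 1;
}

/* Return the length in bytes of the token starting at z and store
** its type in *type. The first byte selects the scanning rule. */
int get_token(const unsigned char *z, int *type){
  int i = 1;
  if( !table_ready ) init_class_table();
  if( z[0]==0 ){ *type = TOK_END; return 0; }
  switch( class_table[z[0]] ){
    case CC_SPACE:
      while( class_table[z[i]]==CC_SPACE ) i++;
      *type = TOK_SPACE;
      return i;
    case CC_ALPHA:
      while( class_table[z[i]]==CC_ALPHA || class_table[z[i]]==CC_DIGIT ) i++;
      *type = TOK_ID;
      return i;
    case CC_DIGIT:
      while( class_table[z[i]]==CC_DIGIT ) i++;
      *type = TOK_INTEGER;
      return i;
    default:
      *type = TOK_PUNCT;               /* single-byte operator or punctuation */
      return 1;
  }
}
```

A caller advances through the input by repeatedly calling get_token() and adding the returned length to its position, skipping TOK_SPACE tokens, until TOK_END is reported.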
Parsing with Lemon
SQLite uses Lemon, a custom LALR(1) parser generator, to build its parser from the grammar file parse.y. The grammar defines SQL syntax rules with associated C code actions. When a rule reduces, its action executes—for example, when a SELECT statement is recognized, the action calls sqlite3SelectNew() to construct a Select structure.
Key grammar sections include:
- DDL statements - CREATE TABLE, DROP TABLE, CREATE INDEX
- DML statements - SELECT, INSERT, UPDATE, DELETE
- Expressions - Binary operators, functions, CASE expressions, window functions
- Clauses - WHERE, GROUP BY, ORDER BY, LIMIT, JOIN
The parser maintains a Parse context structure that accumulates metadata during parsing, including error state, temporary register allocation, and code generation state.
Code Generation
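As grammar rules reduce, the parser actions call code-generation routines that emit VDBE bytecode. DDL handlers live in build.c; DML and query generators live in their own modules (for example insert.c, update.c, delete.c, and select.c). For example:
- sqlite3StartTable() and sqlite3EndTable() handle table creation
- sqlite3Insert(), sqlite3Update(), sqlite3DeleteFrom() generate DML code
- sqlite3Select() generates SELECT query execution plans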
The sqlite3FinishCoding() function finalizes the VDBE program after a statement is fully parsed, adding cleanup code and optimizations.
Data Flow
The entire pipeline is driven by sqlite3RunParser() in tokenize.c, which iteratively tokenizes and feeds tokens to the Lemon-generated parser until the SQL statement is complete.
Query Execution & Virtual Machine
Relevant Files
- src/vdbe.c - Core VDBE execution engine
- src/vdbeInt.h - VDBE internal structures and definitions
- src/vdbeapi.c - Public VDBE API (sqlite3_step, sqlite3_finalize)
- src/vdbeaux.c - VDBE auxiliary functions (creation, opcodes)
- src/vdbemem.c - Memory cell management
The VDBE (Virtual Database Engine) is SQLite's bytecode interpreter that executes prepared SQL statements. It translates parsed SQL into a sequence of low-level instructions that manipulate data, manage cursors, and control program flow.
Architecture Overview
The VDBE operates as a register-based virtual machine with three core components:
1. Instruction Set (Opcodes)
Each instruction is a VdbeOp structure containing:
- opcode - The operation type (e.g., OP_OpenRead, OP_SeekRowid, OP_Add)
- p1, p2, p3 - Three integer operands (register indices, jump targets, counts)
- p4 - A fourth parameter (pointers to KeyInfo, FuncDef, or other structures)
- p5 - Flags modifying opcode behavior
Opcodes fall into categories: control flow (OP_Goto, OP_If), arithmetic (OP_Add, OP_Multiply), comparisons (OP_Eq, OP_Lt), cursor operations (OP_OpenRead, OP_Next), and data manipulation (OP_Insert, OP_Delete).
2. Memory Cells (Registers)
The VDBE maintains an array of Mem structures (memory cells) that store intermediate values during execution. Each Mem can hold:
- SQL NULL, INTEGER, REAL, TEXT, or BLOB values
- Multiple cached representations (e.g., both integer and string forms)
- Flags indicating which representations are valid
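The idea of caching several representations behind validity flags can be sketched as follows. The Cell type, the CELL_* flags, and the helper functions are invented for this illustration and are far simpler than the real Mem structure: only integers and NULL are handled, and the text form is produced lazily the first time it is requested.

```c
#include <stdio.h>
#include <stdint.h>

#define CELL_NULL 0x01   /* value is SQL NULL */
#define CELL_INT  0x02   /* field i holds a valid integer */
#define CELL_STR  0x04   /* field z holds a valid text rendering */

typedef struct Cell {
  unsigned flags;        /* which representations are currently valid */
  int64_t i;             /* integer representation */
  char z[24];            /* cached text representation */
} Cell;

/* Assigning a new value invalidates any cached text form. */
void cell_set_int(Cell *p, int64_t v){
  p->i = v;
  p->flags = CELL_INT;
}

void cell_set_null(Cell *p){
  p->flags = CELL_NULL;
}

/* Return the text form, computing and caching it on first use.
** Returns NULL when the cell holds SQL NULL. */
const char *cell_text(Cell *p){
  if( p->flags & CELL_NULL ) return NULL;
  if( (p->flags & CELL_STR)==0 ){
    snprintf(p->z, sizeof(p->z), "%lld", (long long)p->i);
    p->flags |= CELL_STR;          /* both forms are now valid */
  }
  return p->z;
}
```

After cell_text() runs once, the cell carries both CELL_INT and CELL_STR, so a later request for either form needs no conversion until the value is reassigned.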
3. Cursors
VdbeCursor objects track positions in tables and indexes. Four cursor types exist:
- CURTYPE_BTREE - B-tree cursors for tables/indexes
- CURTYPE_SORTER - Sorter cursors for ORDER BY operations
- CURTYPE_VTAB - Virtual table cursors
- CURTYPE_PSEUDO - Single-row pseudotables
Execution Model
The Vdbe structure represents a prepared statement and contains:
- aOp[] - Array of instructions
- aMem[] - Array of memory cells
- apCsr[] - Array of open cursors
- pc - Program counter (current instruction)
- eVdbeState - State machine (INIT, READY, RUN, HALT)
Execution begins with sqlite3_step(), which calls sqlite3VdbeExec(). This function loops through instructions, incrementing the program counter and executing each opcode's case statement. Jumps modify the program counter; most opcodes proceed sequentially.
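The fetch, dispatch, and jump pattern described above can be shown with a toy interpreter. Everything here is an assumption made for illustration: the TOY_* opcodes, the ToyOp layout, and toy_sum_to() are invented, and only the overall shape (an instruction array, a register array, and a program counter driven by a switch) follows the description.

```c
#include <stdint.h>

/* Invented opcodes; the names only echo the style of VDBE opcodes. */
enum { TOY_Integer, TOY_Add, TOY_DecrJumpPos, TOY_Halt };

typedef struct ToyOp {
  int opcode;
  int p1, p2, p3;        /* operands: constants, register indices, jump targets */
} ToyOp;

/* Run the program and return the value left in register iResult. */
int64_t toy_exec(const ToyOp *aOp, int nOp, int64_t *aMem, int iResult){
  int pc = 0;
  while( pc>=0 && pc<nOp ){
    const ToyOp *pOp = &aOp[pc++];           /* default: fall through to next op */
    switch( pOp->opcode ){
      case TOY_Integer:                      /* r[p2] = p1 */
        aMem[pOp->p2] = pOp->p1;
        break;
      case TOY_Add:                          /* r[p3] = r[p1] + r[p2] */
        aMem[pOp->p3] = aMem[pOp->p1] + aMem[pOp->p2];
        break;
      case TOY_DecrJumpPos:                  /* r[p1]--; jump to p2 if still > 0 */
        if( --aMem[pOp->p1] > 0 ) pc = pOp->p2;
        break;
      case TOY_Halt:
      default:
        return aMem[iResult];
    }
  }
  return aMem[iResult];
}

/* Build and run a five-instruction program computing n + (n-1) + ... + 1
** for n >= 1. */
int64_t toy_sum_to(int n){
  const ToyOp prog[] = {
    { TOY_Integer,     n, 1, 0 },   /* 0: r1 = n (loop counter) */
    { TOY_Integer,     0, 2, 0 },   /* 1: r2 = 0 (accumulator)  */
    { TOY_Add,         1, 2, 2 },   /* 2: r2 = r1 + r2          */
    { TOY_DecrJumpPos, 1, 2, 0 },   /* 3: r1--; if r1>0 goto 2  */
    { TOY_Halt,        0, 0, 0 },   /* 4: stop                  */
  };
  int64_t aMem[3] = { 0, 0, 0 };
  return toy_exec(prog, 5, aMem, 2);
}
```

Only the jump opcode assigns to pc; every other case relies on the increment performed during fetch, which is the same division of labour the paragraph above describes.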
Key Operations
Table Scans: OP_OpenRead opens a cursor on a table. OP_Rewind positions at the first row, OP_Next advances to subsequent rows, and OP_Column extracts column values into memory cells.
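Index Lookups: OP_SeekRowid locates a table row by its rowid, while OP_SeekGE, OP_SeekGT, OP_SeekLE, and OP_SeekLT position a cursor relative to a key. Each performs a binary search down the B-tree through a B-tree cursor, enabling efficient WHERE clause evaluation.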
Aggregation: OP_AggStep accumulates values for aggregate functions (SUM, COUNT, etc.), with OP_AggFinal computing the final result.
Sorting: OP_SorterOpen creates a sorter cursor, OP_SorterInsert adds rows, and OP_SorterData retrieves sorted results.
Memory Management
vdbemem.c handles memory cell lifecycle. sqlite3VdbeMemSetInt64(), sqlite3VdbeMemSetNull(), and similar functions manage value assignment. The MEM_Dyn flag indicates dynamically allocated strings/blobs requiring cleanup via destructors.
State Transitions
A VDBE progresses through states: INIT (construction), READY (prepared), RUN (executing), and HALT (finished). sqlite3VdbeReset() returns to READY for reuse; sqlite3_finalize() deallocates the entire structure.
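A minimal sketch of this lifecycle, assuming a toy statement that simply yields a fixed number of rows. The ToyStmt type and its functions are hypothetical and only model the four states named above; they are not the real Vdbe API.

```c
enum StmtState { ST_INIT, ST_READY, ST_RUN, ST_HALT };

typedef struct ToyStmt {
  enum StmtState state;
  int nRow;              /* rows this toy statement will produce */
  int iRow;              /* rows produced so far */
} ToyStmt;

/* Construction: the statement starts in INIT. */
void toy_stmt_init(ToyStmt *p, int nRow){
  p->state = ST_INIT;
  p->nRow = nRow;
  p->iRow = 0;
}

/* Preparation finished: INIT -> READY. */
void toy_stmt_ready(ToyStmt *p){
  if( p->state==ST_INIT ) p->state = ST_READY;
}

/* Step: returns 1 for a row, 0 when done, -1 if called in INIT or HALT. */
int toy_stmt_step(ToyStmt *p){
  if( p->state==ST_READY ) p->state = ST_RUN;
  if( p->state!=ST_RUN ) return -1;
  if( p->iRow<p->nRow ){
    p->iRow++;
    return 1;
  }
  p->state = ST_HALT;
  return 0;
}

/* Reset: any prepared statement goes back to READY for reuse. */
void toy_stmt_reset(ToyStmt *p){
  if( p->state!=ST_INIT ){
    p->state = ST_READY;
    p->iRow = 0;
  }
}
```

In this sketch a halted statement refuses further steps until it is reset, which mirrors the reuse path through READY described above.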
Query Optimization & Planning
Relevant Files
- src/where.c
- src/whereInt.h
- src/wherecode.c
- src/whereexpr.c
- src/select.c
SQLite's query optimizer transforms WHERE clauses into efficient execution plans by analyzing constraints, selecting indexes, and determining the optimal join order. The process involves three main phases: clause decomposition, loop generation, and path selection.
Core Data Structures
The optimizer uses several key structures to represent query plans:
- WhereClause: Decomposes the WHERE clause into individual terms connected by AND/OR operators. Each term is analyzed for indexability and constraint strength.
- WhereLoop: Represents a single table scan strategy, including which index to use, how many equality constraints apply, and estimated costs.
- WherePath: Represents a complete join order combining multiple WhereLoops. The solver evaluates many paths to find the lowest-cost plan.
- WhereInfo: The main context object holding the complete query plan state, including all loops, levels, and metadata.
Clause Analysis & Term Extraction
The optimizer begins by decomposing the WHERE clause into individual terms via whereClauseInsert() and exprAnalyze(). Each term is classified by its operator type (equality, range, IN, etc.) and marked with flags indicating whether it can use an index. OR expressions are recursively decomposed into separate WhereClause objects, allowing the optimizer to evaluate different branches independently.
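The decomposition step can be sketched as a recursive split on a connective. The Expr and TermList types and the split_terms() function below are simplified assumptions for illustration, not the optimizer's actual data structures: a subtree whose operator matches is descended into, and anything else is recorded as one term.

```c
#include <stddef.h>

enum ExprOp { EX_AND, EX_OR, EX_EQ, EX_LT };

typedef struct Expr {
  int op;                       /* one of enum ExprOp */
  const struct Expr *pLeft;     /* operands for AND/OR nodes */
  const struct Expr *pRight;
  const char *zText;            /* label for leaf comparisons */
} Expr;

#define MAX_TERMS 16

typedef struct TermList {
  const Expr *a[MAX_TERMS];     /* collected terms, in left-to-right order */
  int n;
} TermList;

/* Split pExpr on operator `op` (EX_AND or EX_OR), appending each
** resulting term to pList. Returns 0 on success, -1 if the list is full. */
int split_terms(TermList *pList, const Expr *pExpr, int op){
  if( pExpr==NULL ) return 0;
  if( pExpr->op==op ){
    if( split_terms(pList, pExpr->pLeft, op)!=0 ) return -1;
    return split_terms(pList, pExpr->pRight, op);
  }
  if( pList->n>=MAX_TERMS ) return -1;
  pList->a[pList->n++] = pExpr;
  return 0;
}
```

For a=1 AND (b<2 OR c=3) AND d=4, splitting on EX_AND yields three terms; the middle one is the whole OR subtree, which can then be split on EX_OR into its own two-term list.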
Index Selection & Loop Generation
For each table in the FROM clause, the optimizer generates candidate WhereLoop objects representing different scan strategies:
- Full table scan
- Index scans using available indexes
- Rowid lookups for primary key constraints
Cost estimation uses logarithmic approximations (LogEst) to avoid overflow. For index scans, costs account for seek operations plus sequential scanning:
cost = nSeek * (log(nRow) + K * nVisit)
where K varies based on index size relative to table size.
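To make the logarithmic representation concrete, the sketch below assumes a LogEst stores roughly 10*log2(x) as a small integer, so that multiplying two estimates becomes adding their LogEst values. The log_est() function is an integer-only approximation written for this example; treat the rounding details as illustrative.

```c
#include <stdint.h>

typedef short LogEst;    /* approximately 10*log2(x) */

/* Convert a positive count to its logarithmic estimate using only
** shifts and a small lookup table for the fractional part. */
LogEst log_est(uint64_t x){
  static const LogEst a[] = { 0, 2, 3, 5, 6, 7, 8, 9 };
  LogEst y = 40;
  if( x<8 ){
    if( x<2 ) return 0;
    while( x<8 ){ y -= 10; x <<= 1; }     /* scale up into [8,16) */
  }else{
    while( x>255 ){ y += 40; x >>= 4; }   /* 4 bits at a time: +40 */
    while( x>15 ){ y += 10; x >>= 1; }    /* 1 bit at a time: +10 */
  }
  return (LogEst)(a[x&7] + y - 10);
}
```

With this encoding, log_est(8) is 30 and log_est(128) is 70; their sum, 100, equals log_est(1024), so a product of row counts never has to be formed explicitly and cannot overflow.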
Path Solver Algorithm
The wherePathSolver() function uses dynamic programming to find the optimal join order. It builds paths incrementally:
- Start with N best single-table paths
- For each subsequent table, extend existing paths by adding new loops
- Keep only the M best paths at each stage (typically M=10)
- Compare paths using a vector metric: (cost, row count, unsorted cost)
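Capping the number of retained paths avoids the factorial blow-up of enumerating every join order while still exploring promising alternatives; the trade-off is that the chosen plan is not guaranteed to be the global optimum.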
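The incremental search can be sketched as a bounded dynamic program. This is a simplified illustration under stated assumptions, not the real solver: the Loop and Path types, the single scalar cost, and the cost model (running cost scaled by the rows produced so far) are invented for the example, whereas the real comparison also considers row counts and sort order.

```c
#include <assert.h>
#include <string.h>

#define MAX_TABLES 8
#define MAX_KEEP   4

typedef struct Loop {
  unsigned self;      /* bit for the table this loop scans */
  unsigned prereq;    /* tables that must already be in the path */
  double rRun;        /* cost of running the loop once */
  double nOut;        /* rows produced per run */
} Loop;

typedef struct Path {
  unsigned mask;      /* tables joined so far */
  double cost;        /* accumulated cost */
  double nRow;        /* rows flowing out of the path */
  int order[MAX_TABLES];
  int nLoop;
} Path;

/* Find the cheapest join order, keeping at most mxKeep partial paths per
** level. Writes loop indexes to aOrder and returns the cost, or -1.0
** if no complete path exists. */
double solve_join(const Loop *aLoop, int nLoop, int nTable, int mxKeep, int *aOrder){
  Path aFrom[MAX_KEEP], aTo[MAX_KEEP];
  int nFrom = 1, nTo, i, j, k, best;
  assert( nTable<=MAX_TABLES && mxKeep>=1 && mxKeep<=MAX_KEEP );
  memset(aFrom, 0, sizeof(aFrom));
  aFrom[0].nRow = 1.0;                       /* the empty path */
  for(int level=0; level<nTable; level++){
    nTo = 0;
    for(i=0; i<nFrom; i++){
      for(j=0; j<nLoop; j++){
        const Loop *l = &aLoop[j];
        if( aFrom[i].mask & l->self ) continue;        /* table already joined */
        if( l->prereq & ~aFrom[i].mask ) continue;     /* prerequisite missing */
        Path p = aFrom[i];
        p.cost += p.nRow * l->rRun;
        p.nRow *= l->nOut;
        p.mask |= l->self;
        p.order[p.nLoop++] = j;
        for(k=0; k<nTo && aTo[k].mask!=p.mask; k++){}
        if( k<nTo ){                         /* same table set: keep the cheaper */
          if( p.cost<aTo[k].cost ) aTo[k] = p;
        }else if( nTo<mxKeep ){
          aTo[nTo++] = p;
        }else{                               /* full: evict the worst if p is better */
          int worst = 0;
          for(k=1; k<nTo; k++) if( aTo[k].cost>aTo[worst].cost ) worst = k;
          if( p.cost<aTo[worst].cost ) aTo[worst] = p;
        }
      }
    }
    if( nTo==0 ) return -1.0;
    memcpy(aFrom, aTo, sizeof(Path)*(size_t)nTo);
    nFrom = nTo;
  }
  best = 0;
  for(i=1; i<nFrom; i++) if( aFrom[i].cost<aFrom[best].cost ) best = i;
  for(i=0; i<nTable; i++) aOrder[i] = aFrom[best].order[i];
  return aFrom[best].cost;
}
```

Given a full scan of table A, a full scan of table B, and an indexed lookup on B that requires A, the search settles on scanning A first and probing B through the index, because the nested full scan is two orders of magnitude more expensive under this cost model.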
ORDER BY Optimization
The optimizer evaluates whether index ordering can satisfy ORDER BY clauses without explicit sorting. The wherePathSatisfiesOrderBy() function checks if loop output naturally provides the required order, considering:
- Index column ordering
- Equality constraints that reduce the search space
- UNIQUE and NOT NULL properties for order-distinctness
If an index provides partial ordering, the optimizer calculates reduced sorting costs.
Cost Tuning & Heuristics
SQLite applies several heuristics to improve plan quality:
- Star-schema optimization: Reduces costs for queries with a central fact table
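- Skip-scan: Uses an index even when its leftmost column is unconstrained, by iterating over the distinct values of that column (worthwhile when it has few distinct values)
- Automatic indexes: Builds a transient index for the duration of a statement when no suitable persistent index exists, typically for the inner loop of a join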
- ORDER BY LIMIT optimization: Skips rows that won't fit in the result set
The interstage heuristic runs between two solver passes to disable suboptimal loops after the first pass identifies a good plan.
Simple Query Fast Path
For common single-table queries with simple equality constraints, whereShortCut() bypasses the full solver, directly generating an optimized plan. This reduces preparation time for the most frequent query patterns.
Storage Engine & Transactions
Relevant Files
- src/btree.c - B-tree implementation and page management
- src/btreeInt.h - B-tree internal structures and file format
- src/pager.c - Page cache and transaction state machine
- src/wal.c - Write-Ahead Log implementation
- src/pcache.c - Page cache management
SQLite uses a sophisticated storage engine built on B-trees with multiple transaction modes to ensure ACID compliance. The system manages data through pages, maintains consistency via locking, and provides two distinct journaling strategies.
B-Tree Storage Structure
The database file is divided into fixed-size pages (typically 4KB). Each page can be a B-tree node, freelist page, overflow page, or pointer-map page. The first page contains a 100-byte file header with metadata: page size, format versions, schema cookie, and file change counter. B-tree pages store cells (key-value pairs) with a header, cell pointer array, and cell content area. Large payloads spill to overflow pages.
// Page header structure (8-12 bytes)
// Offset 0: Flags (intkey, zerodata, leafdata, leaf)
// Offset 1-2: First freeblock offset
// Offset 3-4: Number of cells
// Offset 5-6: Cell content area start
// Offset 7: Fragmented free bytes
// Offset 8-11: Right child pointer (interior nodes only)
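The layout above can be decoded with a few byte reads. The sketch below assumes multi-byte fields are big-endian, that the leaf flag is bit 0x08, and that a stored cell-content offset of 0 means 65536; the PageHeader type and parse_page_header() are names invented for this example. On page 1 the B-tree header starts after the 100-byte file header, so the caller would pass a pointer to that position.

```c
#include <stddef.h>
#include <stdint.h>

#define PTF_INTKEY   0x01
#define PTF_ZERODATA 0x02
#define PTF_LEAFDATA 0x04
#define PTF_LEAF     0x08

typedef struct PageHeader {
  uint8_t  flags;
  int      isLeaf;
  int      headerSize;          /* 8 for leaf pages, 12 for interior pages */
  uint16_t firstFreeblock;
  uint16_t nCell;
  uint32_t cellContentStart;
  uint8_t  nFragBytes;
  uint32_t rightChild;          /* 0 on leaf pages */
} PageHeader;

static uint16_t get2(const uint8_t *p){
  return (uint16_t)((p[0]<<8) | p[1]);
}
static uint32_t get4(const uint8_t *p){
  return ((uint32_t)p[0]<<24) | ((uint32_t)p[1]<<16) | ((uint32_t)p[2]<<8) | p[3];
}

/* Decode the B-tree page header found at a[0..n-1].
** Returns 0 on success, -1 if the buffer is too short. */
int parse_page_header(const uint8_t *a, size_t n, PageHeader *pOut){
  uint32_t start;
  if( n<8 ) return -1;
  pOut->flags = a[0];
  pOut->isLeaf = (a[0] & PTF_LEAF)!=0;
  pOut->headerSize = pOut->isLeaf ? 8 : 12;
  if( n<(size_t)pOut->headerSize ) return -1;
  pOut->firstFreeblock = get2(&a[1]);
  pOut->nCell = get2(&a[3]);
  start = get2(&a[5]);
  pOut->cellContentStart = start==0 ? 65536u : start;
  pOut->nFragBytes = a[7];
  pOut->rightChild = pOut->isLeaf ? 0 : get4(&a[8]);
  return 0;
}
```

A first byte of 0x0d (intkey, leafdata, leaf) identifies a table leaf page with an 8-byte header, while 0x05 identifies a table interior page whose header carries the extra 4-byte right child pointer.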
Transaction State Machine
The pager implements a seven-state machine controlling transaction lifecycle:
- OPEN - No transaction active, file may be unlocked
- READER - Read transaction active, SHARED lock held
- WRITER_LOCKED - Write transaction started, RESERVED lock acquired
- WRITER_CACHEMOD - Pages modified in cache, journal opened
- WRITER_DBMOD - Changes written to database file, EXCLUSIVE lock held
- WRITER_FINISHED - All writes synced, ready to commit
- ERROR - Unrecoverable error state
State transitions enforce strict ordering: OPEN → READER → WRITER_LOCKED → WRITER_CACHEMOD → WRITER_DBMOD → WRITER_FINISHED → READER → OPEN.
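The ordering can be expressed as a small transition check. The enum and pager_transition_ok() below are illustrative names, not the pager's own code. Beyond the forward chain listed above, the sketch assumes that any writer state may drop back to READER (commit or rollback), that any writer state may enter ERROR, and that ERROR is left only by returning to OPEN.

```c
enum PagerState {
  PS_OPEN, PS_READER, PS_WRITER_LOCKED, PS_WRITER_CACHEMOD,
  PS_WRITER_DBMOD, PS_WRITER_FINISHED, PS_ERROR
};

static int is_writer(enum PagerState s){
  return s>=PS_WRITER_LOCKED && s<=PS_WRITER_FINISHED;
}

/* Return 1 if moving from `from` to `to` is permitted in this sketch. */
int pager_transition_ok(enum PagerState from, enum PagerState to){
  switch( from ){
    case PS_OPEN:
      return to==PS_READER;
    case PS_READER:
      return to==PS_OPEN || to==PS_WRITER_LOCKED;
    case PS_WRITER_LOCKED:
    case PS_WRITER_CACHEMOD:
    case PS_WRITER_DBMOD:
      /* next writer stage, end of transaction, or failure */
      return to==(enum PagerState)(from+1) || to==PS_READER || to==PS_ERROR;
    case PS_WRITER_FINISHED:
      return to==PS_READER || to==PS_ERROR;
    case PS_ERROR:
      return to==PS_OPEN;
  }
  return 0;
}
```

A check of this kind is useful as a debugging assertion: is_writer() groups the four writer stages, and any attempt to skip a stage, such as OPEN directly to WRITER_LOCKED, is rejected.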
Locking Protocol
SQLite uses a five-level file locking hierarchy:
- NO_LOCK - No access to database
- SHARED_LOCK - Multiple readers allowed, no writers
- RESERVED_LOCK - One writer preparing, readers still allowed
- PENDING_LOCK - Writer waiting for readers to finish
- EXCLUSIVE_LOCK - Single writer, no other access
Locks are escalated in order: NO_LOCK → SHARED → RESERVED → PENDING → EXCLUSIVE. The PENDING level blocks new readers so that a waiting writer is not starved, and the ordering as a whole supports serializable isolation.
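The compatibility rules implied by the five levels can be captured in one function. This is a sketch with invented names: it answers whether a connection may acquire a given lock when the strongest lock held by any other connection is known, and it models the rules as described above rather than the VFS implementation.

```c
enum LockLevel { LK_NONE = 0, LK_SHARED, LK_RESERVED, LK_PENDING, LK_EXCLUSIVE };

/* Return 1 if a connection may acquire `want` while the strongest lock
** held by any other connection is `heldByOther`. */
int lock_compatible(enum LockLevel heldByOther, enum LockLevel want){
  switch( want ){
    case LK_NONE:
      return 1;
    case LK_SHARED:
      /* readers coexist with readers and with one RESERVED writer,
      ** but PENDING and EXCLUSIVE shut new readers out */
      return heldByOther<=LK_RESERVED;
    case LK_RESERVED:
    case LK_PENDING:
      /* only one connection at a time may be preparing to write */
      return heldByOther<=LK_SHARED;
    case LK_EXCLUSIVE:
      /* requires that nobody else holds any lock */
      return heldByOther==LK_NONE;
  }
  return 0;
}
```

For example, a new reader is admitted while another connection holds RESERVED but refused once that connection reaches PENDING, which is how a waiting writer eventually drains the readers.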
Journaling Modes
Rollback Journal (Default): Before modifying a page, its original content is written to a journal file. On commit, the journal is synced and deleted. On rollback, the journal is replayed to restore original pages. This guarantees atomicity: either all changes persist or none do.
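The save-before-write rule can be demonstrated with an in-memory toy. Everything here is hypothetical (the ToyDb type, the tiny page size, and the toy_* functions), and it omits syncing and files entirely; it shows only that journaling the original content of a page the first time it is touched is enough to undo any number of later writes.

```c
#include <string.h>

#define TOY_PAGE_SIZE 16
#define TOY_N_PAGES   4

typedef struct ToyDb {
  unsigned char page[TOY_N_PAGES][TOY_PAGE_SIZE];     /* the "database file" */
  unsigned char journal[TOY_N_PAGES][TOY_PAGE_SIZE];  /* saved originals */
  int journaled[TOY_N_PAGES];                         /* 1 if page is in journal */
  int inTxn;
} ToyDb;

void toy_init(ToyDb *db){ memset(db, 0, sizeof(*db)); }

void toy_begin(ToyDb *db){
  memset(db->journaled, 0, sizeof(db->journaled));
  db->inTxn = 1;
}

/* Write one byte. The original page is journaled on first modification only. */
int toy_write(ToyDb *db, int pgno, int offset, unsigned char value){
  if( !db->inTxn ) return -1;
  if( pgno<0 || pgno>=TOY_N_PAGES ) return -1;
  if( offset<0 || offset>=TOY_PAGE_SIZE ) return -1;
  if( !db->journaled[pgno] ){
    memcpy(db->journal[pgno], db->page[pgno], TOY_PAGE_SIZE);
    db->journaled[pgno] = 1;
  }
  db->page[pgno][offset] = value;
  return 0;
}

/* Commit: the journal is simply discarded. */
void toy_commit(ToyDb *db){
  memset(db->journaled, 0, sizeof(db->journaled));
  db->inTxn = 0;
}

/* Rollback: replay the journal to restore every modified page. */
void toy_rollback(ToyDb *db){
  for(int i=0; i<TOY_N_PAGES; i++){
    if( db->journaled[i] ){
      memcpy(db->page[i], db->journal[i], TOY_PAGE_SIZE);
      db->journaled[i] = 0;
    }
  }
  db->inTxn = 0;
}
```

Writing the same page twice inside one transaction journals it once, so rollback restores the content as it was at toy_begin(), not the intermediate value.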
Write-Ahead Log (WAL): Changes are written to a separate WAL file before modifying the database. Multiple transactions accumulate in the WAL. Periodically, a checkpoint transfers WAL content back to the database. WAL enables concurrent readers to view consistent snapshots while writers append new frames.
Page Cache Management
The page cache (PCache) maintains dirty pages in LRU order. Dirty pages are tracked in a doubly-linked list with metadata about sync status. The cache implements a stress callback to evict pages when memory pressure occurs. Clean pages (matching disk content) can be evicted freely; dirty pages require journal writes first.
struct PCache {
  PgHdr *pDirty, *pDirtyTail;     // Dirty pages in LRU order
  PgHdr *pSynced;                 // Last synced page for optimization
  int szCache;                    // Configured cache size
  int (*xStress)(void*,PgHdr*);   // Eviction callback
};
ACID Guarantees
- Atomicity: Journal/WAL ensures all-or-nothing commits.
- Consistency: B-tree invariants maintained via locking.
- Isolation: SHARED/EXCLUSIVE locks provide serializable isolation.
- Durability: Sync operations ensure committed data survives crashes.
The file change counter (a 4-byte big-endian integer at offset 24 of the file header) signals cache invalidation across processes.
Extensions & Optional Features
Relevant Files
- ext/fts5/fts5_main.c
- ext/rtree/rtree.c
- ext/session/sqlite3session.c
- ext/rbu/sqlite3rbu.c
- ext/recover/sqlite3recover.c
- src/loadext.c
- src/sqlite3ext.h
SQLite provides a modular extension system that allows optional features to be compiled into the core, loaded dynamically, or registered as auto-extensions. Extensions enhance SQLite with specialized functionality for full-text search, spatial indexing, change tracking, and database recovery.
Extension Loading Mechanism
SQLite supports three ways of making an extension available; run-time loading goes through sqlite3_load_extension() in src/loadext.c:
- Compiled-in extensions – Built directly into the SQLite library via compile-time flags like SQLITE_ENABLE_FTS5 or SQLITE_ENABLE_RTREE
- Loadable extensions – Dynamically loaded shared libraries (.so, .dylib, .dll) at runtime
- Auto-extensions – Registered via sqlite3_auto_extension() and automatically loaded for every new database connection
The loader searches for entry points named sqlite3_<name>_init() or sqlite3_extension_init() and passes the sqlite3_api_routines structure containing all public SQLite APIs.
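How a name of the form sqlite3_<name>_init() might be derived from a library filename can be sketched as below. The rule used here is an assumption based on the documented behaviour of sqlite3_load_extension(): take the part of the filename after the last "/", drop a leading "lib", and keep the lower-cased ASCII letters up to the first ".". The derive_entry_point() function is invented for this example and does not handle platform-specific path separators.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Build "sqlite3_<name>_init" from a shared-library path.
** Returns 0 on success, -1 if zOut is too small. */
int derive_entry_point(const char *zFile, char *zOut, size_t nOut){
  char name[64];
  size_t n = 0;
  int rc;
  const char *z = strrchr(zFile, '/');
  z = z ? z+1 : zFile;                       /* basename */
  if( strncmp(z, "lib", 3)==0 ) z += 3;      /* omit a leading "lib" */
  for(; *z && *z!='.'; z++){
    /* keep alphabetic characters only, lower-cased */
    if( isalpha((unsigned char)*z) && n+1<sizeof(name) ){
      name[n++] = (char)tolower((unsigned char)*z);
    }
  }
  name[n] = '\0';
  rc = snprintf(zOut, nOut, "sqlite3_%s_init", name);
  return (rc<0 || (size_t)rc>=nOut) ? -1 : 0;
}
```

Under this rule a file named libfts5ext.so maps to sqlite3_ftsext_init, because digits are dropped along with the "lib" prefix and the extension.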
Core Extensions
FTS5 (Full-Text Search) – ext/fts5/fts5_main.c
Provides advanced full-text search with tokenization, phrase queries, and ranking. FTS5 uses a virtual table interface and manages its own index storage. It supports custom tokenizers and auxiliary functions for relevance scoring.
R-Tree (Spatial Indexing) – ext/rtree/rtree.c
Implements R-tree and R*-tree data structures for spatial queries. Stores tree nodes in three backing tables (_node, _parent, _rowid) and supports rectangular range queries with efficient bounding-box filtering.
Session Module – ext/session/sqlite3session.c
Records database changes into changesets that can be applied to other databases. Uses preupdate hooks to capture old and new values, enabling change tracking, replication, and conflict resolution workflows.
RBU (Resumable Bulk Update) – ext/rbu/sqlite3rbu.c
Performs large database updates in three stages: (1) write changes to an OAL file, (2) atomically rename to WAL, (3) checkpoint incrementally. Supports resumption if interrupted, making it ideal for mobile and embedded systems.
Recover Module – ext/recover/sqlite3recover.c
Recovers data from corrupted databases by scanning pages and reconstructing table schemas. Extracts records from freelist pages and attempts to recover deleted data when configured.
Configuration and Compilation
Extensions are enabled via compile-time flags in the build system:
- SQLITE_ENABLE_FTS5 – Enable FTS5 full-text search
- SQLITE_ENABLE_RTREE – Enable R-tree spatial indexing
- SQLITE_ENABLE_SESSION – Enable session/changeset module
- SQLITE_ENABLE_RBU – Enable resumable bulk update
- SQLITE_ENABLE_RECOVER – Enable database recovery
The --all build flag enables FTS4, FTS5, RTREE, GEOPOLY, SESSION, DBPAGE, DBSTAT, and CARRAY extensions together.
Extension API
Extensions access SQLite through the sqlite3_api_routines structure defined in src/sqlite3ext.h. This provides stable ABI compatibility across SQLite versions. Key capabilities include:
- Creating virtual tables via sqlite3_create_module()
- Registering functions via sqlite3_create_function()
- Accessing database handles and prepared statements
- Memory management and error reporting
The SQLITE_EXTENSION_INIT2() macro initializes the API pointer for loadable extensions, enabling them to call SQLite functions through the provided thunk layer.
Build System & Testing
Relevant Files
- main.mk - Primary POSIX-compatible makefile
- Makefile.msc - Windows MSVC build configuration
- configure - AutoSetup configuration script
- tool/mksqlite3c.tcl - Amalgamation generator
- test/all.test - Master test suite runner
Build Architecture
SQLite uses a dual-makefile system supporting both POSIX and Windows platforms. The main.mk file is POSIX-compatible and included by platform-specific makefiles. Configuration is handled by AutoSetup, which evaluates auto.def and generates Makefile from Makefile.in.
Key build modes:
- Amalgamation mode (USE_AMALGAMATION=1): Combines all source files into sqlite3.c and sqlite3.h for simpler distribution and better compiler optimizations
- Non-amalgamation mode: Builds from individual source files in src/ directory
- Static vs. shared libraries: Controlled by ENABLE_LIB_STATIC and ENABLE_LIB_SHARED
Amalgamation Generation
The amalgamation process is a two-step build:
1. Target source preparation (.target_source target):
  - Copies all source files to tsrc/ directory
  - Generates parser files using Lemon (parse.c, parse.h)
  - Generates keyword hash table (keywordhash.h)
  - Compresses VDBE code
2. Amalgamation creation (tool/mksqlite3c.tcl):
  - Merges all C files into single sqlite3.c
  - Deduplicates header includes using amalgamator directives
  - Optionally includes #line macros for debugging
  - Supports custom extensions via EXTRA_SRC
Build Targets
make all # Build libraries and shell
make lib # Static library only
make so # Shared library only
make sqlite3 # CLI shell executable
make testprogs # All test executables
Testing Framework
SQLite employs a comprehensive multi-layer testing strategy:
Test execution levels:
- quicktest: Fast smoke test (<3 minutes)
- alltest: Full TCL test suite via testfixture
- fulltest: All tests plus fuzzing
- releasetest: Pre-release validation with source tree checks
- devtest: Developer testing with standard configurations
Test infrastructure:
- testfixture - TCL-based test harness linking against SQLite
- test/all.test - Master test orchestrator running 50+ test configurations
- test/testrunner.tcl - Parallel test execution with job scheduling
- test/permutations.test - Configuration matrix defining test variants
Test configurations include single-threaded, multi-threaded, memory subsystems, WAL modes, journal modes, and cache configurations.
Compiler Configuration
Feature flags control compilation:
- OPT_FEATURE_FLAGS - Enable/disable features (-DSQLITE_ENABLE_*, -DSQLITE_OMIT_*)
- SHELL_OPT - CLI shell-specific options
- CFLAGS.{feature} - Per-feature compiler flags (readline, ICU, zlib)
- LDFLAGS.{feature} - Per-feature linker flags
Windows builds use Makefile.msc with MSVC-specific options for optimization levels, debug symbols, and runtime linking.
Installation
Standard autotools-compatible installation directories:
$(prefix)/bin - Executables (sqlite3)
$(prefix)/lib - Libraries (libsqlite3.so, libsqlite3.a)
$(prefix)/include - Headers (sqlite3.h)
$(prefix)/lib/pkgconfig - pkg-config files
DLL installation on Unix creates versioned symlinks for compatibility with legacy libtool-style naming.