Planet PostgreSQL

Jobin Augustine: What Is in pg_gather Version 33 ?

Mon, 16 Mar 2026 17:03:18 +0000

It started as a humble personal project a few years back. The objective was to convert all my PostgreSQL notes and learning into an automatic diagnostic tool, such that even a new DBA can easily spot the problems. The idea was simple: a simple tool which doesn’t need any installation but does all possible analysis and […]

Cornelia Biacsics: Contributions for week 10, 2026

Mon, 16 Mar 2026 08:20:47 +0000

On Tuesday March 10, 2026 PUG Belgium met for the March edition, organized by Boriss Mejias and Stefan Fercot.

Speakers:

Esteban Zimanyi
Thijs Lemmens
Yoann La Cancellera

Robert Haas organized a Hacking Workshop on Tuesday March 10, 2026. Tomas Vondra discussed questions about one of his talks.

PostgreSQL Edinburgh meetup Mar 2026 met on Thursday March 12, 2026

Speakers:

Radim Marek
Jimmy Angelakos

FOSSASIA Summit 2026 took place from Sunday March 8 to Tuesday March 10, 2026 in Bangkok.

PostgreSQL speakers:

Koji Annoura
Charly Batista
Gary Evans
Joe Conway
Suraj Kharage
Robert Treat
Sameer Kumar
Roneel Kumar
Sivaprasad Murali
Yugo Nagata
Denis Smirnov
Vaibhav Dalvi
Gyeongseon Park
Bo Peng
Brian McKerr
Chris Travers
Jirayut Nimsaeng
Gilles Darold
Rajni Baliyan

PostgreSQL Conference India took place in Bengaluru (India) from March 11 to March 13, 2026.

Organizers:

Pavan Deolasee
Ashish Kumar Mehra
Nikhil Sontakke
Hari Kiran
Rushabh Lathia

Talk Selection Committee:

Amul Sul
Dilip Kumar
Marc Linster
Thomas Munro
Vigneshwaran c

Speakers:

Abhijeet Rajurkar
Aditya Duvuri
Ajit Awekar
Amit Kumar Singh
Amogh Bharadwaj
Amul Sul
Andreas Scherbaum
Ashutosh Bapat
Avinash Vallarapu
Boopathi Parameswaran
Claire Giordano
Danish Khan
Deepak R Mahto
Dilip Kumar
Divya Bhargov
Dr. M. J. Shankar Raman
Franck Pachot
Hari Kiran
Hari Prasad
Harish Perumal
Jayant Haritsa
Jim Mlodgenski
Jobin Augustine
Joe Conway
Kanthanathan S
Kevin Biju
Koji Annoura
Kranthi Kiran Burada
Lalit Choudhary
Michael Zhilin
Mithun Chicklore Yogendra
Mohit Agarwal
NarendraSingh Tawar
Neel Patel
Neeta Goel
Nikhil Chawla
Nikhil Sontakke
Nishad Mankar
Palak Chaturvedi
Pavan Deolasee
Pushkar Khadilkar
Rahila Syed
Rajeev Rastogi
Rajkumar Raghuwanshi
René Cannaò
Ripunjay Tripathi
Rohith BCS
Roneel Rohitesh Kumar
Sai Srirampur
Sameer Kumar
Samuel Cherukutty
Sashikanta Pattanayak
Sathakathullah Abdul Kafar
Saurabh Gupta
Shashidhar
Shlok Kumar Kyal
Shriram Muthukrishnan
Srinath Reddy Sadipiralla
Sumedh Pathak
Suresh dash
Tom Kincaid
Vaibhav Popat
Vaijayanti Bharadwaj
Venkat Akhil Pavuluri
Vinay Paladi
Vishnu R Nambiar
Wazir Ahmed

Volunteers:

Aarti Nadekar
Aditya Sanjay Raje
Ashesh Vashi
Khushboo Vashi
Pinaz Raut
Rahila Syed

Community Blog Posts:

SCaLE23x by Gabrielle Roth

Richard Yen: Learning AI Fast with pgEdge's RAG

Mon, 16 Mar 2026 08:00:00 +0000

Introduction

If you’ve been paying attention to the technology landscape recently, you’ve probably noticed that AI is everywhere. New frameworks, new terminology, and a dizzying array of acronyms and jargon: LLM, RAG, embeddings, vector databases, MCP, and more.

Honestly, it’s been difficult to figure out where to start. Many tutorials either dive deep into machine learning theory (Bayesian transforms?) or hide everything behind a single API call to a hosted model. Neither approach really explains how these systems actually work.

Recently I spent some time experimenting with the pgEdge AI tooling after hearing Shaun Thomas’ talk at a PrairiePostgres meetup. He talked about how to set up the various components of an AI chatbot system, starting from ingesting documents into a Postgres database, vectorizing the text, setting up a RAG and then an MCP server.

When I got home I wanted to try it out for myself – props to the pgEdge team for making it all free and open-source! What surprised me most was not just that everything worked, but how easy it was to get a complete AI retrieval pipeline running locally. More importantly, it turned out to be one of the clearest ways I’ve found to understand how modern AI systems are constructed behind the scenes. Thanks so much, Shaun!

The pgEdge AI Components

The pgEdge AI ecosystem provides several small tools that fit together naturally. I’ll go through them quickly here:

Doc Converter – The doc-converter normalizes documents into a format that is easy to process downstream. Whether the input is PDF, HTML, Markdown, or plain text, the converter produces clean text output suitable for ingestion.
Vectorizer – The vectorizer handles the process of converting text chunks into embeddings. These embeddings are numeric representations of text that capture semantic meaning. Once generated, they can be stored inside PostgreSQL using pgvector and queried with similarity search.
Retrieval-Augmented Generation (RAG) Server – The RAG framework ties everything together. It orchestrates:
1. embedding the user’s query
2. retrieving similar document chunks
3. assembling prompt context
4. sending the prompt to an LLM
5. returning the generated response

When the full system is running, you essentially have a ChatGPT- or Gemini-style assistant running on your laptop.

Running Everything Locally with Ollama

With ChatGPT and Gemini, getting tokens or sharing my payment info was a blocker, especially when I just wanted to test things for educational purposes. Through Shaun’s presentation, I was introduced to Ollama, which is a great alternative if you’re okay with slower performance (especially on an 8GB M1 Mac Mini).

I was pleasantly surprised at how easy it was to run the entire pipeline without relying on external AI APIs. Specifically, I used the embeddinggemma model for generating embeddings. This meant the entire stack could run locally, no API keys required! Running everything locally removes those barriers and definitely makes experimentation much easier.

Understanding RAG by Actually Running It

One of the most confusing concepts in learning AI prior to Shaun’s talk was Retrieval-Augmented Generation (RAG). I learned that what a RAG does is:

Before asking the LLM to answer a question, retrieve relevant information and include it in the prompt.

With the pgEdge pipeline, the flow becomes very visible.

Documents are converted into clean text
Text is split into chunks
Chunks are embedded into vectors
Vectors are stored in PostgreSQL
A question is embedded into a vector
A similarity search finds relevant chunks
Those chunks are inserted into the prompt
The LLM generates the response

From this, I realized that the LLM is not storing my data. Instead, the system retrieves relevant information on demand and feeds it into the prompt. The RAG is a facilitator to the LLM’s response.
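To make the retrieve-then-prompt flow above concrete, here is a minimal, self-contained sketch in Python. It is not the pgEdge implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and every function name and sample chunk here is hypothetical. It only illustrates that the question is embedded, compared against stored chunks, and the best matches are pasted into the prompt before any LLM is involved.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, chunks, top_k=2):
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, context_chunks):
    """Assemble the prompt that would be sent to the LLM."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical document chunks, standing in for rows stored in PostgreSQL.
CHUNKS = [
    "Autovacuum removes dead tuples left behind by updates and deletes.",
    "The WAL records changes so the server can recover after a crash.",
    "Indexes speed up lookups at the cost of slower writes.",
]

if __name__ == "__main__":
    question = "What does autovacuum do with dead tuples?"
    print(build_prompt(question, retrieve(question, CHUNKS, top_k=1)))
```

The LLM never sees the whole corpus here, only the chunks that `retrieve` selects, which is the point the paragraph above makes.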

The Role of the Vectorizer

The vectorizer is a crucial step in the pipeline. Its job is to convert human language into embeddings, which are high-dimensional numeric representations of meaning. With vectors, searching with natural language becomes possible, instead of old-fashioned keyword matches.

Once the embeddings (vectorized documents) are stored in PostgreSQL using pgvector, everything starts to look familiar again for database engineers:

indexing
storage
similarity search
ranking results

Managing these things looks pretty doable for a database guy like me 😂

Don’t Try This At Home!

After writing about the pgEdge stack I wanted to make it as easy as possible for others to reproduce the same experience, so I packaged everything into a Docker Compose project.

Clone the repository and run:

git clone https://github.com/richyen/learn-ai-with-postgres.git
cd learn-ai-with-postgres
mkdir documents # put some txt files in there for vectorization
docker compose up --build

That single command:

Builds a custom PostgreSQL image with pgvector and pgedge_vectorizer compiled in
Starts an Ollama container and pulls the embeddinggemma and glm-4.7-flash models locally
Runs pgedge-docloader to ingest any documents you’ve put into the documents/ folder
Calls pgedge_vectorizer.enable_vectorization(), which starts background workers inside Postgres that chunk and embed every page
Starts the RAG server on port 8080

No API keys, no cloud services. Everything runs on your own hardware.

Once the RAG server is up (watch for the setup container to exit cleanly), try asking it a question:

curl -s -X POST http://localhost:8080/v1/pipelines/pg-docs \
  -H "Content-Type: application/json" \
  -d '{"query": "How does autovacuum decide when to run?"}' \
  | jq .

The answer comes back a few seconds later, grounded in the actual PostgreSQL documentation:

{
  "answer": "Autovacuum in PostgreSQL is triggered based on thresholds defined by two parameters: autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor. The daemon considers a table eligible for vacuuming when the number of dead tuples exceeds the threshold plus (scale_factor × total row count) ..."
}
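The formula quoted in that answer can be checked by hand. The sketch below is only an illustration, assuming the stock PostgreSQL defaults of 50 for autovacuum_vacuum_threshold and 0.2 for autovacuum_vacuum_scale_factor; the function names are hypothetical, and a real table may override both settings.

```python
def vacuum_trigger_point(row_count, threshold=50, scale_factor=0.2):
    """Dead-tuple count above which autovacuum considers a table for vacuuming.

    Defaults mirror the stock values of autovacuum_vacuum_threshold and
    autovacuum_vacuum_scale_factor; per-table settings may differ.
    """
    return threshold + scale_factor * row_count


def needs_vacuum(dead_tuples, row_count, threshold=50, scale_factor=0.2):
    """True when the dead-tuple count exceeds the trigger point."""
    return dead_tuples > vacuum_trigger_point(row_count, threshold, scale_factor)


if __name__ == "__main__":
    rows = 1_000_000
    print(f"trigger point for {rows} rows: {vacuum_trigger_point(rows):.0f} dead tuples")
    print("vacuum at 150,000 dead tuples?", needs_vacuum(150_000, rows))
    print("vacuum at 250,000 dead tuples?", needs_vacuum(250_000, rows))
```

With the defaults, a million-row table has to accumulate a little over 200,000 dead tuples before autovacuum picks it up, which is why the scale factor is often lowered on large tables.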

You can also run raw similarity searches directly in SQL to see exactly what the retrieval step is doing before the LLM touches anything:

SELECT
    d.title,
    left(c.content, 200) AS snippet
FROM documents_content_chunks c
JOIN documents d ON c.source_id = d.id
WHERE c.embedding IS NOT NULL
ORDER BY c.embedding <=>
    pgedge_vectorizer.generate_embedding('autovacuum threshold configuration')
LIMIT 5;

This is the same pgvector <=> (cosine distance) operator the RAG server uses internally — you can inspect the retrieval step at any time without going through the HTTP API.
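For readers who have not met cosine distance before, the following sketch shows what the operator computes, assuming the usual definition of cosine distance as one minus cosine similarity. The helper names and the sample rows are hypothetical; in the real pipeline the ordering is done by PostgreSQL, not Python.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for same direction, 1 for orthogonal, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query_vec, rows, limit=5):
    """Mimic ORDER BY embedding <=> query LIMIT n over (label, vector) pairs."""
    return sorted(rows, key=lambda row: cosine_distance(row[1], query_vec))[:limit]


if __name__ == "__main__":
    # Hypothetical 2-dimensional 'embeddings' for three chunks.
    rows = [("far", [-1.0, 0.0]), ("close", [0.9, 0.1]), ("sideways", [0.0, 1.0])]
    for label, vec in nearest([1.0, 0.0], rows, limit=3):
        print(label, round(cosine_distance(vec, [1.0, 0.0]), 3))
```

Because only the angle between vectors matters, a vector and a scaled copy of it have distance zero; smaller values sort first, matching the ascending ORDER BY in the query above.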

Embeddings are generated in the background by Postgres workers, so you can start querying as soon as a few hundred chunks are ready. Watch the progress with:

psql postgresql://postgres:password@localhost:5432/pgai -c "
SELECT
  (SELECT count(*) FROM documents)                                             AS total_docs,
  (SELECT count(*) FROM documents_content_chunks WHERE embedding IS NOT NULL)  AS vectorized;
"

The project also includes the pgedge-postgres-mcp server on port 8081, which exposes the knowledge base via the Model Context Protocol — so it can be wired directly into Claude Desktop, VS Code Copilot, or any other MCP-compatible client.

Final Thoughts

There’s a lot of pressure right now to “learn AI,” but that phrase can mean many different things. For people coming from infrastructure, databases, or backend engineering, one of the most approachable paths is simply:

build a small RAG pipeline and observe how the pieces fit together.

The pgEdge tooling made this surprisingly straightforward. Instead of assembling half a dozen unrelated frameworks, the components already fit together:

doc ingestion
vectorization
PostgreSQL storage
retrieval
prompt generation
LLM response

Once I saw the entire flow working end-to-end, the AI ecosystem made a lot more sense. Setting up the pgEdge RAG stack turned out to be a surprisingly effective way to see that architecture in action.

Enjoy!

Dave Page: AI Features in pgAdmin: AI Insights for EXPLAIN Plans

Mon, 16 Mar 2026 06:31:22 +0000

This is the third and final post in a series covering the new AI functionality in pgAdmin 4. In the first post, I covered LLM configuration and the AI-powered analysis reports, and in the second, I introduced the AI Chat agent for natural language SQL generation. In this post, I'll walk through the AI Insights feature, which brings LLM-powered analysis to PostgreSQL EXPLAIN plans.

Anyone who has spent time optimising PostgreSQL queries knows that reading EXPLAIN output is something of an acquired skill. pgAdmin has long provided a graphical EXPLAIN viewer that makes the plan tree easier to navigate, along with analysis and statistics tabs that surface key metrics, but interpreting what you're seeing and deciding what to do about it still requires a solid understanding of the query planner's behaviour. The AI Insights feature aims to bridge that gap by providing an expert-level analysis of your query plans, complete with actionable recommendations.

Where to Find It

AI Insights appears as a fourth tab in the EXPLAIN results panel, alongside the existing Graphical, Analysis, and Statistics tabs. It's only visible when an LLM provider has been configured, so if you don't see it, check that you've set up a provider in Preferences (as described in the first post). The tab header simply reads 'AI Insights'.

To use it, run a query with EXPLAIN (or EXPLAIN ANALYZE for the most useful results, since actual execution timings give the AI much more to work with), and then click on the AI Insights tab. The analysis starts automatically when you switch to the tab, or you can trigger it manually with the Analyze button.

What the Analysis Provides

The AI Insights analysis produces three sections:

Summary

A concise paragraph providing an overall assessment of the query plan's performance characteristics. This gives you a quick sense of whether the plan is generally healthy or has significant issues worth investigating. For well-optimised queries, the summary will confirm that the plan looks reasonable; for problematic ones, it highlights the key areas of concern.

Performance Bottlenecks

This is the heart of the analysis. The AI examines the plan tree and identifies specific nodes that may be causing performance problems. Each bottleneck is presented as a card showing:

A severity, classified as high, medium, or low, with colour-coded indicators (red for high, orange for medium, blue for low) so you can quickly spot the most important issues

The specific plan node involved (for example, 'Seq Scan on orders' or 'Nested Loop')

Issue: A brief description of the problem

A more thorough explanation of why this is a problem and what impact it has on query performance

The types of issues the analysis looks for include sequential scans on large tables where an index might help, nested loops with high row counts that suggest missing indexes or poor join ordering, large variances between estimated and actual row counts (which usually indicate stale statistics), sort operations on large datasets without supporting indexes, hash joins spilling to disk, and bitmap heap scans with excessive recheck conditions.

Importantly, the analysis also applies contextual judgement. Not every sequential scan is a problem; scanning a small lookup table sequentially is often faster than using an index, and the AI takes table size and selectivity into account when deciding whether to flag something as an issue.
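One of the checks listed above, a large gap between estimated and actual row counts, is easy to approximate by hand. The sketch below is not how pgAdmin or the LLM does it; it simply walks an EXPLAIN (ANALYZE, FORMAT JSON) plan tree using the standard "Plan Rows", "Actual Rows", and "Plans" keys. The factor of 10 is an arbitrary assumption, and the sample plan is made up for illustration.

```python
def iter_nodes(plan):
    """Yield every node in an EXPLAIN (FORMAT JSON) plan tree, parent first."""
    yield plan
    for child in plan.get("Plans", []):
        yield from iter_nodes(child)


def misestimated_nodes(plan, factor=10.0):
    """Return (node type, estimated, actual) where the two differ by >= factor.

    Note: with EXPLAIN ANALYZE, "Actual Rows" is an average per loop, so a
    careful check would also take "Actual Loops" into account.
    """
    flagged = []
    for node in iter_nodes(plan):
        estimated = node.get("Plan Rows")
        actual = node.get("Actual Rows")
        if estimated is None or actual is None:
            continue  # plain EXPLAIN output has no actual row counts
        ratio = max(estimated, 1) / max(actual, 1)
        if ratio >= factor or ratio <= 1.0 / factor:
            flagged.append((node["Node Type"], estimated, actual))
    return flagged


# Hypothetical plan fragment, shaped like the "Plan" object in EXPLAIN JSON output.
SAMPLE_PLAN = {
    "Node Type": "Hash Join",
    "Plan Rows": 100,
    "Actual Rows": 48000,
    "Plans": [
        {"Node Type": "Seq Scan", "Relation Name": "orders",
         "Plan Rows": 100, "Actual Rows": 50000},
        {"Node Type": "Index Scan", "Relation Name": "customers",
         "Plan Rows": 1, "Actual Rows": 1},
    ],
}

if __name__ == "__main__":
    for node_type, estimated, actual in misestimated_nodes(SAMPLE_PLAN):
        print(f"{node_type}: estimated {estimated}, actual {actual}")
```

A fixed ratio like this has none of the contextual judgement described above; it flags candidates for a human (or the AI) to look at, nothing more.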

Recommendations

Each identified bottleneck comes with one or more prioritised recommendations for addressing it. Recommendations are numbered by priority, with the most impactful changes listed first. Each recommendation includes:

A short description of the suggested change

The reason this change will help, connecting the recommendation back to the specific bottleneck

Where applicable, the exact SQL statement to implement the recommendation

This last point is particularly valuable. Rather than telling you "consider adding an index" and leaving you to work out the details, the analysis provides the actual statement with the appropriate table name, column list, and index type. Each SQL code block has a copy button and an 'Insert into Editor' button that places the SQL directly into your query editor, so you can review and execute it with minimal friction.

Recommendations aren't limited to index creation, however. You might see suggestions to refresh statistics on tables where they are stale, to adjust memory settings for queries that are spilling sorts or hash operations to disk, to rewrite suboptimal query structures, or to consider partial indexes when a full index would be unnecessarily large.

A Worked Example

To give a sense of what this looks like in practice, imagine you run EXPLAIN ANALYZE on a query that joins a large table with a second table and filters by date range. The AI Insights analysis might produce something like this:

Summary: The query takes 2.3 seconds to execute, with the majority of time spent on a sequential scan of the large table. The join to the second table is well-optimised using an index lookup, but the date range filter is not supported by an index, causing a full table scan of 4.2 million rows.

Bottleneck (High Severity): Sequential Scan on the large table, scanning 4,200,000 rows but returning only 12,500. The planner estimated 15,000 rows, suggesting statistics are reasonably up to date, but the lack of an index forces a full scan.

Recommendation: Create an index on the date column.

Recommendation: If queries typically also filter by status, consider a composite index.

You could click 'Insert into Editor' on either recommendation, review the statement, execute it, and then re-run your EXPLAIN ANALYZE to see the improvement.
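The numbers in this example are worth a quick sanity check, since they carry the whole argument for the index. The short sketch below just does the arithmetic with the figures quoted above; the function names are hypothetical.

```python
def selectivity(rows_returned, rows_scanned):
    """Fraction of scanned rows that survive the filter."""
    return rows_returned / rows_scanned


def estimate_ratio(estimated_rows, actual_rows):
    """How far the planner's estimate is from reality (1.0 means exact)."""
    return estimated_rows / actual_rows


if __name__ == "__main__":
    scanned, returned, estimated = 4_200_000, 12_500, 15_000
    print(f"selectivity: {selectivity(returned, scanned):.2%}")           # selectivity: 0.30%
    print(f"estimate/actual: {estimate_ratio(estimated, returned):.2f}")  # estimate/actual: 1.20
```

A filter that keeps roughly 0.3% of the rows is a strong candidate for an index, and an estimate within 20% of the actual count supports the remark that the statistics are reasonably current.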

Downloading Reports

If you want to save or share the analysis, the Download button exports a complete Markdown report including the original SQL query, the raw execution plan, and the full AI analysis with all bottlenecks and recommendations. The file is named with the current date for easy filing.

Regenerating and Stopping

Because LLM responses can vary between invocations, you might occasionally want to get a second opinion on the same plan. The Regenerate button reruns the analysis from scratch, which can sometimes surface different insights or provide alternative recommendations. If a new EXPLAIN is run whilst the AI Insights tab is visible, the analysis will automatically trigger for the new plan.

If the analysis is taking longer than expected (the timeout is five minutes, though most analyses complete in well under a minute), you can click the Stop button to cancel the in-flight request. The panel will show an 'Analysis stopped' message and you can choose to retry or move on.

How It Works Under the Hood

When you trigger an analysis, the frontend sends the full EXPLAIN plan (in JSON format) and the original SQL query to a backend endpoint via a streaming HTTP request. The backend constructs a prompt that instructs the LLM to act as a PostgreSQL performance expert, providing it with detailed guidelines on what to look for in query plans and how to classify severity. The LLM's response is parsed as structured JSON (with bottlenecks, recommendations, and summary as separate fields), which allows the frontend to render each piece with appropriate formatting and interactivity.

The streaming architecture means you see a 'thinking' indicator whilst the analysis is in progress, with rotating messages such as 'Analyzing query plan...', 'Examining node costs...', 'Looking for sequential scans...', and 'Evaluating join strategies...'. Results appear as soon as the LLM completes its response, without needing to reload or poll.
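To picture what "parsed as structured JSON" might look like, here is a small sketch. It is not pgAdmin's code: only the three top-level fields (summary, bottlenecks, recommendations) come from the description above, while the inner keys such as "severity" and "node", the function name, and the sample payload are assumptions made for illustration.

```python
import json

# Sort order for the severity levels mentioned in the post.
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def parse_analysis(raw):
    """Parse an LLM response and order bottlenecks from most to least severe.

    The inner key names ("severity", "node", "issue") are hypothetical.
    """
    data = json.loads(raw)
    bottlenecks = sorted(
        data.get("bottlenecks", []),
        key=lambda b: SEVERITY_ORDER.get(str(b.get("severity", "")).lower(), len(SEVERITY_ORDER)),
    )
    return {
        "summary": data.get("summary", ""),
        "bottlenecks": bottlenecks,
        "recommendations": data.get("recommendations", []),
    }


# Made-up response payload in the assumed shape.
SAMPLE_RESPONSE = json.dumps({
    "summary": "Most time is spent in one sequential scan.",
    "bottlenecks": [
        {"severity": "low", "node": "Sort", "issue": "Small in-memory sort"},
        {"severity": "high", "node": "Seq Scan on orders", "issue": "Full table scan"},
    ],
    "recommendations": [{"title": "Index the date column"}],
})

if __name__ == "__main__":
    analysis = parse_analysis(SAMPLE_RESPONSE)
    print(analysis["summary"])
    for item in analysis["bottlenecks"]:
        print(f'[{item["severity"]}] {item["node"]}: {item["issue"]}')
```

Keeping the three pieces as separate fields is what lets a frontend render severity badges and copyable SQL independently, rather than scraping them out of free text.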

Getting the Most from AI Insights

A few suggestions for making the most of this feature:

The actual execution timings and row counts from EXPLAIN ANALYZE give the AI significantly more information to work with. Plain EXPLAIN provides only the planner's estimates, which limits the depth of analysis possible.

The AI provides excellent starting points, but you should consider your specific workload patterns before creating indexes. An index that helps one query might slow down write-heavy operations on the same table. Use the recommendations as informed suggestions that merit testing rather than directives to follow without question.

Even if you're already experienced with EXPLAIN output, the detailed explanations of why specific plan nodes are problematic can help reinforce your understanding or occasionally highlight something you might have overlooked. For less experienced users, it's an excellent way to build up familiarity with how PostgreSQL executes queries.

If the AI Insights analysis identifies issues but you want to explore further (perhaps to understand your data distribution or check current index usage statistics), switch to the AI Chat agent and ask follow-up questions. The two features complement each other well.

Wrapping Up the Series

Across these three posts, I've covered the full range of AI functionality now available in pgAdmin 4: the LLM configuration that underpins everything, the AI-powered security, performance, and schema design reports for proactive database analysis, the AI Chat agent for natural language SQL generation and database exploration, and the AI Insights feature for query plan optimisation.

All of these features are designed to enhance rather than replace your expertise. They lower the barrier to performing analyses that would otherwise require significant time and specialist knowledge, whilst keeping you firmly in control of what actually gets executed against your database. Whether you use a cloud-hosted model from Anthropic or OpenAI, or prefer to keep everything local with Ollama or Docker Model Runner, the AI features adapt to your environment and preferences.

Give them a try; I think you'll find they become a natural part of your PostgreSQL workflow. And as always, we welcome feedback and contributions from the community.

Pavel Luzanov: PostgreSQL 19: part 4 or CommitFest 2026-01

Mon, 16 Mar 2026 00:00:00 +0000

Continuing the series of CommitFest 19 reviews, today we’re covering the January 2026 CommitFest.

The highlights from previous CommitFests are available here: 2025-07, 2025-09, 2025-11.

Partitioning: merging and splitting partitions
pg_dump[all]/pg_restore: dumping and restoring extended statistics
file_fdw: skipping initial rows
Logical replication: enabling and disabling WAL logical decoding without server restart
Monitoring logical replication slot synchronization delays
pg_available_extensions shows extension installation directories
New function pg_get_multixact_stats: multixact usage statistics
Improvements to vacuum and analyze progress monitoring
Vacuum: memory usage information
vacuumdb --dry-run
jsonb_agg optimization
LISTEN/NOTIFY optimization
ICU: character conversion function optimization
The parameter standard_conforming_strings can no longer be disabled

...

Ashutosh Bapat: Professional karma

Sat, 14 Mar 2026 05:48:00 +0000

In the very early days of my career, an incident made me realise that performing my job irresponsibly would affect me adversely, not because it would hurt my position, but because it could affect the rest of my life as well. I was part of a team that produced software used by a financial institution where I held my account. A bug in the software caused a failure which made several accounts, including my own bank account, inaccessible! Fortunately I wasn't the one who introduced that bug, and neither was any other software engineer working on the product. It had simply crept through the cracks that the age-old software had developed as it went through many improvements, something that happens to all architectures, software or otherwise, in the world. That was an enlightening and eye-opening experience. But professional karma is not always bad; many times it's good. When the humble work I do to earn my living also improves my living, it gives me immense satisfaction. It means that it's also improving billions of lives across the globe in the same way.

When I was a postgraduate student at IIT Bombay, I often travelled by train, both local and intercity. The online ticketing system for long-distance trains was still in its early stages. Local train tickets were still issued at stations, and getting one required standing in a long queue. Fast forward to today: you can buy a local train ticket on a mobile app or at a kiosk at the station by paying online through UPI. On my recent trip to IIT Bombay I bought such a ticket using GPay in a few seconds. And you know what? UPI uses PostgreSQL as an OLTP database in its system. I didn't have to go through the same experience again, thanks to that same education and the work I am doing. Students at my alma mater no longer have to go through that painful experience, thanks to the many PostgreSQL contributors who were once students and might have had similar painful experiences in their own lives.

At PGConf.India, Koji Annoura, who is a graph database expert, talked about our ongoing work on SQL/PGQ. He is also a certified coffee professional, and coffee is a drink that I greatly enjoy! He talked about improving the coffee supply chain using graph databases. He is using SQL/PGQ, the software I am co-authoring with Peter Eisentraut. I love coffee, and my work is helping me procure better quality coffee at a cheaper price! Someone telling me that my work was useful to them gives me immense satisfaction, irrespective of the size of the cause. As software developers we don't often get to hear that from end users. Open source software and the conferences around it give us that opportunity.

Professional life is full of stress: stress to get work done, to get paid, to secure a job, to get promoted and what not. That stress is so overwhelming that we often lose sight of the greater purpose. We often fail to notice that our work has other, arguably greater, benefits. These simple moments are enough to motivate me to continue doing my work, leaving behind the stress that it carries.

Shane Borden: More Obscure Things That Make It Go “Vacuum” in PostgreSQL

Fri, 13 Mar 2026 15:51:40 +0000

I previously blogged about ensuring that the “ON CONFLICT” directive is used in order to keep vacuum from having to do additional work. I later also demonstrated how using the MERGE statement accomplishes the same thing.

You can read the original blogs here: Reduce Vacuum by Using “ON CONFLICT” Directive and Follow-Up: Reduce Vacuum by Using “ON CONFLICT” Directive.

Now, in another recent customer case, I was chasing down why the application was triggering tens of thousands of foreign key and constraint violations per day, and I began to wonder whether these kinds of errors also caused additional vacuum work, as described in those previous blogs. Sure enough, it DEPENDS.

Let’s set up a quick test to demonstrate:

/* Create related tables: */
CREATE TABLE public.uuid_product_value (
        id int PRIMARY KEY,
        pkid text,
        value numeric,
        product_id int,
        effective_date timestamp(3)
        );

CREATE TABLE public.uuid_product (
        product_id int PRIMARY KEY
        );

ALTER TABLE uuid_product_value
    ADD CONSTRAINT uuid_product_value_product_id_fk 
    FOREIGN KEY (product_id) 
    REFERENCES uuid_product (product_id) ON DELETE CASCADE;

/* Insert some mocked up data */
INSERT INTO public.uuid_product VALUES ( 
        generate_series(0,200));

INSERT INTO public.uuid_product_value VALUES ( 
        generate_series(0,10000), 
        gen_random_uuid()::text,
        random()*1000,
        ROUND(random()*100),
        current_timestamp(3));
 
/* Vacuum Analyze Both tables */
VACUUM (VERBOSE, ANALYZE) uuid_product;
VACUUM (VERBOSE, ANALYZE) uuid_product_value;

/* Verify that there are no dead tuples: */
SELECT
    schemaname,
    relname,
    n_live_tup,
    n_dead_tup
FROM
    pg_stat_all_tables
WHERE
    relname in ('uuid_product_value', 'uuid_product');
 
 schemaname |      relname       | n_live_tup | n_dead_tup
------------+--------------------+------------+------------
 public     | uuid_product_value |      10001 |          0
 public     | uuid_product       |        201 |          0

Then, let’s issue a simple insert that will violate the FK and check to see if dead tuples were generated:

/* Insert a row that violates the FK, without the ON CONFLICT directive */
INSERT INTO public.uuid_product_value VALUES ( 
        generate_series(10001,10001), 
        gen_random_uuid()::text,
        random()*1000,
        202,  /* we know this product_id doesn't exist in the parent */
        current_timestamp(3));

ERROR:  insert or update on table "uuid_product_value" violates foreign key constraint "uuid_mod_test_product_id_fk"
DETAIL:  Key (product_id)=(202) is not present in table "uuid_product".
Time: 3.065 ms

And now check the tuple stats:

SELECT
    schemaname,
    relname,
    n_live_tup,
    n_dead_tup
FROM
    pg_stat_all_tables
WHERE
    relname in ('uuid_product_value', 'uuid_product');

 schemaname |      relname       | n_live_tup | n_dead_tup
------------+--------------------+------------+------------
 public     | uuid_product_value |      10001 |          1
 public     | uuid_product       |        201 |          0

Sure enough, we now have a dead row as a result of the FK violation on the insert. But, will an “ON CONFLICT” directive help us in this scenario like in the others?

/* Insert a row that violates the FK, but with the ON CONFLICT directive */
INSERT INTO public.uuid_product_value VALUES ( 
        generate_series(10001,10001), 
        gen_random_uuid()::text,
        random()*1000,
        202,  /* we know this product_id doesn't exist in the parent */
        current_timestamp(3)) ON CONFLICT DO NOTHING;

/* Verify the tuple stats: */
SELECT
    schemaname,
    relname,
    n_live_tup,
    n_dead_tup
FROM
    pg_stat_all_tables
WHERE
    relname in ('uuid_product_value', 'uuid_product');

 schemaname |      relname       | n_live_tup | n_dead_tup
------------+--------------------+------------+------------
 public     | uuid_product_value |      10001 |          2
 public     | uuid_product       |        201 |          0

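Unfortunately, it does not solve this problem: the dead tuple count went from 1 to 2. “ON CONFLICT” only handles unique and exclusion constraint violations, not foreign key violations. So we really need to be cognizant of FK violations and their effect on vacuum. Now what about trying to insert a NULL into a NOT NULL column? Will that result in a dead row? Let’s check.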

/* Alter a column to NOT NULL */
ALTER TABLE public.uuid_product_value
    ALTER COLUMN pkid SET NOT NULL;

/* Check the table definition */
                        Table "public.uuid_product_value"
     Column     |              Type              | Collation | Nullable | Default
----------------+--------------------------------+-----------+----------+---------
 id             | integer                        |           | not null |
 pkid           | text                           |           | not null |
 value          | numeric                        |           |          |
 product_id     | integer                        |           |          |
 effective_date | timestamp(3) without time zone |           |          |
Indexes:
    "uuid_mod_test_pkey" PRIMARY KEY, btree (id)
    "uuid_mod_test_product_id_idx" btree (product_id) WHERE id >= 1 AND id <= 1000
    "uuid_mod_test_product_id_idx1" hash (product_id)
Foreign-key constraints:
    "uuid_mod_test_product_id_fk" FOREIGN KEY (product_id) REFERENCES uuid_product(product_id) ON DELETE CASCADE

/* Insert a row that violates the NOT NULL constraint */
INSERT INTO public.uuid_product_value VALUES ( 
        generate_series(10001,10001), 
        NULL,
        random()*1000,
        200,
        current_timestamp(3));
ERROR:  null value in column "pkid" of relation "uuid_product_value" violates not-null constraint
DETAIL:  Failing row contains (10001, null, 613.162063338205, 200, 2026-03-13 14:25:28.758).

/* Verify the tuple stats: */
SELECT
    schemaname,
    relname,
    n_live_tup,
    n_dead_tup
FROM
    pg_stat_all_tables
WHERE
    relname in ('uuid_product_value', 'uuid_product');

 schemaname |      relname       | n_live_tup | n_dead_tup
------------+--------------------+------------+------------
 public     | uuid_product_value |      10001 |          2
 public     | uuid_product       |        201 |          0

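As you can see, a violation of the NOT NULL constraint does not have the same behavior as a violation of the FK constraint: the dead tuple count stays at 2. This is consistent with how PostgreSQL enforces the two. A NOT NULL constraint is checked before the row is written to the table, whereas a foreign key is checked by a trigger after the row has already been inserted, so the rejected row is left behind as a dead tuple. It’s always good to know and relay to the application development staff what operations are going to result in more work for the database and adjust the code accordingly. Enjoy!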

Shaun Thomas: Using Patroni to Build a Highly Available Postgres Cluster—Part 2: Postgres and Patroni

Fri, 13 Mar 2026 06:12:14 +0000

Welcome to Part two of our series about building a High Availability Postgres cluster using Patroni! Part one focused entirely on establishing the DCS using etcd, providing the critical layer that Patroni uses to store metadata and guarantee its leadership token uniqueness across the cluster.

With this solid foundation, it's now time to build the next layer in our stack: Patroni itself. Patroni does the job of managing the Postgres service and provides a command interface for node administration and monitoring. Technically the Patroni cluster is complete at the end of this article, but stick around for part three where we add the routing layer that brings everything together.

Hopefully you still have the three VMs where you installed etcd. Those will be the same place where everything else happens, so if you haven’t already gone through the steps in part one, come back when you’re ready.

Otherwise, let’s get started!

Installing Postgres

The Postgres community site has an incredibly thorough page dedicated to installation on various platforms. For the sake of convenience, this guide includes a simplified version of the Debian instructions. Perform these steps on all three servers.

Start by setting up the PGDG repository:

Then install your favorite version of Postgres. For the purposes of this guide, we’re also going to stop Postgres and drop the initial cluster the Postgres package creates. Patroni will recreate all of this anyway, and it should be in control.

It’s also important to completely disable the default Postgres service since Patroni will be in charge:

Finally, install the version of Patroni included in the PGDG repositories. This should be available on supported platforms like Debian and RedHat variants, but if it isn’t, you may have to resort to the official installation instructions.

Once that command completes, we should have three fresh VMs ready for configuration.

Configuring Patroni the easy way

The Debian Patroni package provides a tool called that transforms a Patroni template into a configuration file customized specifically for Debian systems. Before using it, it’s necessary to modify part of that template to use etcd, as ZooKeeper is the default. Perform these steps on all three servers.

Note that the YAML header shows “etcd3” rather than simply “etcd”. Patroni uses etcd2 by default for backward compatibility purposes, and version 3 requires a much different communication protocol.

Then create the rest of the config with a single command:

This creates a file named in the configuration directory, which systemd uses when managing this specific cluster. We’ll also need this for invoking .

Understanding Patroni configuration

Despite the fact that the configuration file is already complete, it’s important to actually understand the purpose of each section and what it does. This will enable users of other platforms to manually configure Patroni if necessary.

Let’s start with the topmost section dedicated to the DCS:

When Patroni writes to the DCS, all keys start at the path specified by the parameter. Similarly as one DCS may host multiple clusters, keys for this cluster must include in the key path. The indicates how Patroni should refer to this individual node. The configuration tool actually uses the DCS to see which names are already reserved so each VM will be uniquely identified. Go ahead and check all three to make sure they’re correct.

The next section, labeled , determines how Patroni should create the initial Postgres cluster, the parameters to use, and other important information. It’s also pretty long, so let’s look at each individual portion:

Normally Patroni uses when creating a new cluster, but for full compatibility with Debian organization quirks regarding Postgres, the configuration specifies an alternative command. This short section will likely only appear on a Debian system.

Next comes the section under . All of these parameters should be covered in the Dynamic Configuration Settings documentation, but we’ll explain the important ones. It’s important to note that any settings defined here actually persist in the DCS layer and apply to Patroni on all nodes. After initialization, the only way to change these parameters is through the utility. It’s a good idea to make sure all of these are set properly, as changing them later is somewhat inconvenient.

These parameters define how Patroni interacts with the DCS layer and how it should manage certain Postgres features. Remember that the leadership token determines which node is the primary, so defines how long that lease should last, controls how long to wait between lease renewals, and says how long to wait for a response from the DCS.

We’ve included in this output because the leader race isn’t quite absolute. If Patroni promotes a node to primary, or determines Postgres has failed, it has up to this timeout before it forces a failover. The provides a grace period for crash recovery to complete, but you may find the default of five minutes is much too long. Another important parameter here is , which tells Patroni it should manage the Postgres configuration setting by automatically using names of other nodes in the cluster. This is how you would enable synchronous replication in Patroni.

Next is the section under :

This section defines how Patroni should operate the Postgres service. The first few parameters control how Patroni recycles old primary nodes, such as using the utility when possible, and whether it should erase the data directory as a last resort. Patroni also uses replication slots for replicas by default to prevent unnecessary replica rebuilds in failure scenarios.

You can also pass GUC settings directly to Postgres on all nodes through the section. This is useful for providing important cluster-wide settings that may not be hardware dependent, such as , , or .

The final section under the heading is :

You’ll want to customize this section before starting Patroni; it uses this to build the pg_hba.conf file that controls incoming connection access. The default will allow connections on the server’s subnet if you uncomment the disabled line, otherwise it’s local access only.

Next is another section, but this is a top-level header meant to tell Patroni how it should handle Postgres on this specific server. These sections are explained in more detail in the Patroni YAML Configuration Settings documentation.

This example starts with some Debian-specific content:

As before, this is so Debian can integrate with the other packaged Postgres tooling, so it’s safe to skip on other platforms. After that comes a few pertinent parameters for handling connections:

This sample effectively tells Patroni how it should connect to the local Postgres service for administrative actions. Patroni uses unix sockets when possible using these settings, which makes sense as Patroni runs as the postgres OS user and has direct socket access.

Then comes a fun section that defines several paths:

Patroni knows it will be installed in several different environments where Postgres and configuration directories may be in completely arbitrary locations. These are the defaults for Postgres 18 running on a Debian system.

Lastly there’s a second parameters section, meant for parameters that should only apply to this specific Postgres server:

Nothing here should be surprising; it’s mostly just log storage for the local instance and where the unix socket directory is located. These are likely to be universal across the cluster, but it’s safer to leave them out of the DCS section. If there is ever any variance caused by a hardware or OS distribution migration, you’ll want to have the ability to change these locally.

In any case, take some time to examine the file on each node to spot-check it for any mistakes.
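The lease timing parameters described above interact with each other, and a small check can make that relationship concrete. The sketch below assumes the three settings are Patroni's ttl, loop_wait, and retry_timeout, and applies the rule from the Patroni documentation that loop_wait plus twice retry_timeout should not exceed ttl. The function name is hypothetical and the values are only examples.

```python
def lease_settings_are_consistent(ttl: int, loop_wait: int, retry_timeout: int) -> bool:
    """Return True when the lease can survive one loop plus two DCS retries.

    Assumed rule (from the Patroni documentation):
        loop_wait + 2 * retry_timeout <= ttl
    """
    return loop_wait + 2 * retry_timeout <= ttl


# Example values only: a 30 second lease renewed every 10 seconds.
print(lease_settings_are_consistent(ttl=30, loop_wait=10, retry_timeout=10))
# Shortening the lease without touching the other two breaks the rule.
print(lease_settings_are_consistent(ttl=20, loop_wait=10, retry_timeout=10))
```

If the check fails for the values in your DCS configuration, it is probably worth revisiting them before relying on automatic failover.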

Starting and validating Patroni

The Patroni package provides a standard systemd service file; simply enable and start the service on all VMs.

One of the three nodes will “win” the leader race and become the primary for the cluster. Patroni then invokes the command on that system to create the data and configuration directories before starting Postgres. On the other nodes, Patroni calls instead to create new streaming replicas. If you want a specific node to start as the primary, simply start Patroni on that node and wait for it to establish a cluster before starting the service on the other two.

The end result on all three systems should be a new “demo” database visible to :

The next step is to check the status of the Patroni cluster itself. You should be able to run this command from any node as the OS user. It will also work as , but now that Patroni is installed and managing the cluster, it’s best to avoid relying on the root user.

This output tells us the cluster is healthy and operational, node 1 is the current primary, both replicas are streaming, and there’s no replication lag. Success!

Editing the cluster configuration

The last step that might be necessary is to modify the cluster configuration stored in the DCS layer. These are the Postgres parameters and pg_hba.conf entries used to bootstrap the initial state of the cluster, and it’s easy to make mistakes early on.

Once again, comes to the rescue:

Patroni loads the current DCS config into the current default editor, and in our case it looks like this:

Use this as an opportunity to fix any missing HBA lines, or add any Postgres parameters that should apply to all nodes. For example, add under to enable logical replication:

Since changing the parameter requires a Postgres restart, use to restart the nodes in the cluster:

Then check with Postgres to verify that the setting changed as expected. This is the output from node 3, even though I modified the DCS and restarted the cluster from node 1:

Finishing up

Now you know why this series was broken into three parts! Setting up Patroni isn’t too difficult by itself, but getting the configuration right, knowing how and why each section works the way it does, and continuing to modify the cluster after deployment, is a complex process. But if you followed along, you should have a fully operational Patroni cluster at this very moment.

Technically you can even stop here and skip the third and final installment of this series. Postgres supports multi-host connection strings, and specifying for the restricts connections to the primary node. Connecting with psql might look like this:

But what if, in some distant future, we change server names, or add more nodes to the cluster, or want other connection restrictions? That’s where the routing layer comes in, and what fully completes a Patroni deployment.

So come back next week to learn about HAProxy and how it provides that critical and final component!
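For readers who want to experiment with the multi-host connection strings mentioned above before the routing layer arrives, here is a minimal sketch that builds a libpq-style URI. It assumes the libpq parameter target_session_attrs with the value read-write, which asks the client to keep only a connection that accepts writes. The host names pg1, pg2, and pg3 and the helper name are hypothetical; "demo" is the database from earlier in this guide.

```python
from urllib.parse import quote, urlencode


def multi_host_uri(hosts, dbname, user, port=5432, target_session_attrs="read-write"):
    """Build a libpq connection URI that lists every cluster member."""
    # libpq tries the hosts in order until one satisfies target_session_attrs.
    host_list = ",".join(f"{host}:{port}" for host in hosts)
    query = urlencode({"target_session_attrs": target_session_attrs})
    return f"postgresql://{quote(user)}@{host_list}/{quote(dbname)}?{query}"


uri = multi_host_uri(["pg1", "pg2", "pg3"], dbname="demo", user="postgres")
print(uri)
```

The resulting string can be passed to psql or any libpq-based driver. The drawback remains that every client has to carry the full host list.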

Deepak Mahto: PGConf India 2026: PostgreSQL Query Tuning: A Foundation Every Database Developer Should Build

Fri, 13 Mar 2026 01:12:23 +0000

Most of the PostgreSQL tuning advice folks chase consists of quick fixes, not an understanding of why the planner chose one path or join over another, more optimal one.

Tuning should not start with running ANALYZE on the tables involved in the query, but with the intent to understand what is causing the issue and why the planner could not choose the optimal path on its own.

Most of the fixes we reach for in SQL tuning look like this:

Add an index. 
Rewrite the query. 
Bump work_mem. 
Done.

Except it’s not done. The same problem comes back, different query, different table, same confusion.

The Real Problem

A slow query is a symptom. Statistics, DDL, query style, and PG version are the actual culprits.

Before you touch anything, you need to answer five questions — in order:

Find it — which query actually hurts the most right now?
Read the plan — what is the planner doing and where is it wrong?
Check statistics — is the planner even working with accurate data?
Check the DDL — is your schema helping or hiding the answer?
Check GUCs & version — are the defaults silently working against you?

5-Dimension SQL Tuning Framework

Most developers skip straight to question two. Many skip to indexes without asking any question at all.

What I Covered at PGConf India 2026

I presented this framework at PGConf India yesterday to a room full of developers and DBAs, with sharp questions and a lot of “I’ve hit exactly this” moments.

The slides cover core foundations for approaching Query Tuning and production gotchas including partition pruning, SARGability, CTE fences, and correlated column statistics.

Slide – PostgreSQL Query Tuning: A Foundation Every Database Developer Should Build

Pavel Luzanov: PostgreSQL 19: part 3 or CommitFest 2025-11

Fri, 13 Mar 2026 00:00:00 +0000

This article reviews the November 2025 CommitFest.

For the highlights of the previous two CommitFests, check out our last posts: 2025-07, 2025-09.

Planner: eager aggregation
Converting COUNT(1) and COUNT(not_null_col) to COUNT(*)
Parallel TID Range Scan
COPY … TO with partitioned tables
New function error_on_null
Planner support functions for optimizing set-returning functions (SRF)
SQL-standard style functions with temporary objects
BRIN indexes: using the read stream interface for vacuuming
WAIT FOR: waiting for synchronization between replica and primary
Logical replication of sequences
pg_stat_replication_slots: a counter for memory limit exceeds during logical decoding
pg_buffercache: buffer distribution across OS pages
pg_buffercache: marking buffers as dirty
Statistics reset time for individual relations and functions
Monitoring the volume of full page images written to WAL
New parameter log_autoanalyze_min_duration
psql: search path in the prompt
psql: displaying boolean values
pg_rewind: skip copying WAL segments already present on the target server
pgbench: continue running after SQL command errors

...

Vibhor Kumar: Transparent Column Encryption in PostgreSQL: Security Without Changing Your SQL

Thu, 12 Mar 2026 15:19:49 +0000

There is a moment in many database reviews when the room becomes a little too quiet.

Someone asks:

“Which columns in this database are encrypted?”

At first, the answers sound reassuring.

“We use TLS.”

“The disks are encrypted.”

“The application handles sensitive fields.”

And then the real picture starts to emerge.

Some values are encrypted in one service but not another.

Some migrations remembered to apply encryption.

Some scripts did not.

Some backups are safe in theory, but no one wants to test that theory the hard way.

That is the uncomfortable truth of database security:

encryption is often present, but not always enforced where the data actually lives.

That is exactly the problem I wanted to explore with the PostgreSQL extension:

column_encrypt: https://github.com/vibhorkum/column_encrypt

This extension provides transparent column-level encryption using custom PostgreSQL datatypes so developers can read and write encrypted columns without changing their SQL queries.

And perhaps the most human part of this project is this:

the idea for this project started back in 2016.

It stayed with me for years as one of those engineering ideas that never quite leaves your mind — the thought that PostgreSQL itself could enforce encryption at the column level.

Now I’ve finally decided to release it.

This is the first public version. It’s a starting point — useful, practical, and hopefully something the PostgreSQL community can explore and build upon.

Why This Matters

Encryption conversations often focus first on infrastructure.

We encrypt disks.
We use TLS connections.
We protect credentials.

All of these are important.

But once data is inside the database, a different question matters:

What happens if someone gains access to the database itself?

That access might come from:

a leaked backup
an overprivileged account
a dump file
a compromised service
an operational mistake

At that point infrastructure encryption has already done its job.

The real question becomes:

Are the most sensitive columns still readable?

That is where column-level encryption becomes critical.

Not as a compliance checkbox.

But as blast-radius reduction.

Because security is not only about preventing breaches — it is also about limiting the damage when something goes wrong.

The Problem with Application-Level Encryption

Many teams implement encryption in the application layer.

In theory this works.

In practice it often becomes fragmented over time.

Different services implement encryption differently.

Migration scripts forget encryption steps.

ETL jobs bypass application logic.

Support scripts accidentally insert plaintext values.

The result is predictable:

inconsistent encryption behavior
scattered security logic
difficult auditing
accidental plaintext storage

The biggest limitation is simple:

the database itself cannot enforce encryption consistently.

If someone forgets to encrypt a value, PostgreSQL will store it as plaintext.

That’s where database-level encryption changes the story.

Transparent Column Encryption in PostgreSQL

The column_encrypt extension moves encryption directly into PostgreSQL.

It introduces two encrypted data types:

ENCRYPTED_TEXT
ENCRYPTED_BYTEA

These types perform encryption and decryption automatically at the datatype level.

That means:

On INSERT or UPDATE the plaintext value is encrypted
On SELECT the ciphertext is decrypted
SQL queries remain unchanged

In other words, developers interact with encrypted columns as if they were normal data.

Example: Using Encrypted Columns

Create a table with an encrypted column:

CREATE TABLE secure_data (
id SERIAL,
ssn ENCRYPTED_TEXT
);

Insert values normally:

INSERT INTO secure_data(ssn) VALUES ('888-999-2045');
INSERT INTO secure_data(ssn) VALUES ('888-999-2046');
INSERT INTO secure_data(ssn) VALUES ('888-999-2047');

Query the data:

SELECT * FROM secure_data;

With the correct key loaded in the session, PostgreSQL returns decrypted values.

 id |     ssn
----+--------------
  1 | 888-999-2045
  2 | 888-999-2046
  3 | 888-999-2047

Without the key loaded:

ERROR: cannot decrypt data, because key was not set

This behavior ensures encrypted columns remain protected.

Why Database-Level Encryption Can Be Better

Moving encryption into PostgreSQL has several advantages.

Encryption becomes part of the schema

Encrypted columns are visible in the table definition. Security becomes part of database design rather than scattered across application code.

SQL remains simple

Developers can use normal SQL statements without wrapping every query in encryption functions.

Consistent enforcement

PostgreSQL itself ensures encrypted storage. Developers cannot accidentally bypass encryption.

Safer backups and dumps

Even if a database dump leaks, sensitive columns remain encrypted.

Easier security audits

Encrypted data types make it easy to identify which columns contain protected data.

How It Works Internally

The extension registers two custom PostgreSQL base types backed by bytea.

On INSERT / UPDATE

The type input functions:

col_enc_text_in
col_enc_bytea_in

encrypt plaintext values using the active Data Encryption Key (DEK).

On SELECT

The output functions:

col_enc_text_out
col_enc_bytea_out

decrypt ciphertext values using the loaded key.

Encryption and decryption occur transparently at the type boundary.

Key Management Model

The extension uses a two-tier key architecture.

Data Encryption Key (DEK)

Used to encrypt column data.

Stored encrypted in the database.

Key Encryption Key (KEK)

A master passphrase used to wrap the DEK.

The KEK is never stored inside PostgreSQL.

Each session must load the key before accessing encrypted data.

Security Features Built Into the Extension

Several operational safeguards are included.

Log masking

A PostgreSQL emit_log_hook prevents sensitive key material from appearing in logs or pg_stat_activity.

Row-Level Security

The internal cipher_key_table uses Row-Level Security to restrict access.

Secure memory cleanup

Keys stored in session memory are securely wiped when removed.

Key version header

Each ciphertext contains a key version identifier to support key rotation.

Key Rotation Support

The extension provides a helper function to re-encrypt existing data with a new key.

SELECT cipher_key_reencrypt_data(
    'public',
    'secure_data',
    'ssn'
);

This allows encrypted data to be rotated to new keys without losing access to existing values.
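To illustrate why a key version header makes rotation possible, here is a small self-contained sketch. Everything in it is an assumption for illustration: the 2-byte header layout, the keyring dictionary, and the function names are hypothetical and are not the extension's actual format or API. The XOR keystream is a deliberately simple placeholder and is not secure.

```python
import hashlib
import struct

# Hypothetical layout: a 2-byte big-endian key version, then the ciphertext.
HEADER = struct.Struct(">H")


def _toy_xor(key: bytes, data: bytes) -> bytes:
    """Placeholder cipher (SHA-256 counter keystream). NOT secure, demo only."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + struct.pack(">Q", offset // 32)).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def encrypt(keyring: dict, version: int, plaintext: bytes) -> bytes:
    # Prefix the ciphertext with the version of the key that produced it.
    return HEADER.pack(version) + _toy_xor(keyring[version], plaintext)


def decrypt(keyring: dict, blob: bytes) -> bytes:
    # Read the header first so the matching key can be selected.
    (version,) = HEADER.unpack_from(blob)
    return _toy_xor(keyring[version], blob[HEADER.size:])


def reencrypt(keyring: dict, blob: bytes, new_version: int) -> bytes:
    # Rotation: decrypt with whichever key the header names, encrypt with the new one.
    return encrypt(keyring, new_version, decrypt(keyring, blob))


keyring = {1: b"old-key", 2: b"new-key"}
old_blob = encrypt(keyring, 1, b"888-999-2045")
new_blob = reencrypt(keyring, old_blob, 2)
print(HEADER.unpack_from(new_blob)[0], decrypt(keyring, new_blob))
```

Because each value names the key that encrypted it, old and new ciphertexts can coexist in the same column while a rotation is in progress.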

Querying Encrypted Data

The extension supports hash indexes for equality comparisons.

Example:

CREATE INDEX idx_ssn
ON secure_data USING hash(ssn);

This allows queries such as:

SELECT * FROM secure_data
WHERE ssn = '888-999-2045';

┌─────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├─────────────────────────────────────────────────────────────────────────────┤
│ Index Scan using idx_ssn on secure_data (cost=0.00..12.02 rows=1 width=36) │
│ Index Cond: (ssn = '888-999-2045'::encrypted_text) │
└─────────────────────────────────────────────────────────────────────────────┘
(2 rows)

What Should Be Considered Before Using It

Because this is the first public version of a project originally started in 2016, it should be approached thoughtfully.

Before production use:

perform code and security reviews
validate it against your PostgreSQL version
test backup and failover behavior
evaluate connection pooling scenarios
carefully manage session keys
practice key rotation procedures

Encryption components deserve extra scrutiny.

When Column Encryption Is Most Useful

This approach works well for sensitive identifiers such as:

social security numbers
financial account numbers
API tokens
healthcare identifiers
personal identification data

These fields typically require strong protection but are not heavily indexed.

When It May Not Be the Right Fit

Be cautious when encrypting columns that:

are frequently used in range queries
participate in large joins
require advanced indexing strategies

Encryption works best when applied selectively to high-value data.

PostgreSQL’s Extensibility Makes This Possible

PostgreSQL’s extension architecture allows developers to extend the database engine itself.

Well-known examples include:

PostGIS
pgvector
TimescaleDB
pgcrypto

column_encrypt explores how PostgreSQL’s extensibility can also strengthen database security.

Try It Yourself

Clone and install the extension:

git clone https://github.com/vibhorkum/column_encrypt.git
cd column_encrypt
make
make install

Add to postgresql.conf:

shared_preload_libraries = '$libdir/column_encrypt'

Restart PostgreSQL and create the extension:

CREATE EXTENSION column_encrypt;

You can then start defining encrypted columns using ENCRYPTED_TEXT or ENCRYPTED_BYTEA.

Feedback Welcome

This project is being released as a first version, and I would genuinely value feedback from the PostgreSQL community.

If you try it, review it, or have ideas for improving it, please share your feedback on the GitHub repository:

https://github.com/vibhorkum/column_encrypt

That could include:

design suggestions
security observations
performance considerations
compatibility findings
operational lessons
ideas for future enhancements

Good open source gets better through review, discussion, and real-world use. This project should be no different.

Final Thoughts

Some ideas take time.

The concept behind column_encrypt began in 2016, and this release represents its first public version.

It is not the final word on column-level encryption in PostgreSQL, but it explores a design direction I still believe is important: moving encryption closer to where the data actually lives.

Security often fails when it depends on every developer, every script, and every service remembering to do the right thing.

Systems become safer when guardrails are built into the architecture.

Transparent column encryption is one way to move toward that goal.

And if this project sparks feedback, discussion, or improvements from the community, that would be a very good next chapter.

Repository

https://github.com/vibhorkum/column_encrypt

Richard Yen: Debugging RDS Proxy Pinning: How a Hidden JIT Toggle Created Thousands of Pinned Connections

Thu, 12 Mar 2026 08:00:00 +0000

Introduction

When using AWS RDS Proxy, the goal is to achieve connection multiplexing: many client connections share a much smaller pool of backend PostgreSQL connections, giving more resources per connection and keeping query execution running smoothly.

However, if the proxy detects that a session has changed internal state in a way it cannot safely track, it pins the client connection to a specific backend connection. Once pinned, that connection can never be multiplexed again. This was the case with a recent database I worked on.

In this case, we observed the following:

extremely high CPU usage
relatively high LWLock wait times
OOM killer activity on the database, maybe once every day or two
thousands of active connections

What was strange about it all was that the queries involved were relatively simple, with at most one join.

Finding the Pinning Source

To get to the root cause, one option was to look in pg_stat_statements. However, that approach had two problems:

Getting a clean snapshot of the statistics while thousands of queries were being actively processed would be tricky.
pg_stat_statements normalizes queries and does not expose the values passed to parameter placeholders.

Instead, to see the actual parameters, we briefly enabled log_statement = 'all'. This immediately surfaced something interesting in the logs, which I could download and review at my own pace.

What we saw were statements like SELECT set_config($2,$1,$3) with parameters related to JIT configuration – that was the first real clue.

Getting to the Bottom

After tracing the behavior through the stack, the root cause turned out to be surprisingly indirect. The application created new connections through SQLAlchemy’s asyncpg dialect, and we needed to drill down into that driver’s behavior.

Step 1 – Reviewing how SQLAlchemy registers JSON codecs

During connection initialization, SQLAlchemy runs an on_connect hook:

def connect(conn):
    conn.await_(self.setup_asyncpg_json_codec(conn))
    conn.await_(self.setup_asyncpg_jsonb_codec(conn))

This registers optimized JSON and JSONB codecs.

Step 2 – Observing how asyncpg introspects type metadata

Registering those codecs requires looking up type OIDs in pg_catalog.

That triggers asyncpg’s internal function: _introspect_types()

Step 3 – Catching asyncpg temporarily disabling JIT

Inside _introspect_types() there is this block:

async def _introspect_types(self, typeoids, timeout):
    if self._server_caps.jit:
        cfgrow, _ = await self.__execute(
            """SELECT current_setting('jit') AS cur,
                      set_config('jit', 'off', false) AS new""",
        )

The intent is harmless: asyncpg avoids rare edge cases with complex type queries by temporarily disabling JIT, running the introspection query, and then restoring the setting. For direct PostgreSQL connections, this is perfectly fine.

Unfortunately, set_config() changes session state, and RDS Proxy cannot safely track this change, so it pins the client connection to a backend session. Once pinned, that connection cannot be multiplexed again for the rest of the session.

In short, since every connection initialization triggers the JIT toggle, every RDS Proxy connection gets pinned to a database connection, which defeats the purpose of RDS Proxy’s connection multiplexing. With thousands of live connections doing relatively little, Postmaster develops a lot of LWLock overhead, memory buffers don’t get flushed, and the OOM killer can be invoked when the conditions are right.

The Fix

The key observation is that asyncpg only runs the JIT toggle if it believes the server supports JIT.

That capability is stored in an internal structure _server_caps. If jit is set to False, asyncpg skips the entire block.

So we added a SQLAlchemy connection hook:

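from sqlalchemy import event

# insert=True places this listener ahead of SQLAlchemy's own connect handlers.
@event.listens_for(engine.sync_engine, "connect", insert=True)
def _prevent_rds_proxy_session_pinning(dbapi_connection, connection_record):
    raw_conn = dbapi_connection._connection
    if hasattr(raw_conn, "_server_caps") and raw_conn._server_caps.jit:
        # Report JIT as unsupported so asyncpg skips its set_config() toggle.
        raw_conn._server_caps = raw_conn._server_caps._replace(jit=False)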

This configuration does the following:

Registers a connection hook so that it runs every time a new connection is created.
Runs the hook before SQLAlchemy’s own hooks. Passing insert=True ensures our handler runs before SQLAlchemy’s on_connect logic, which is important because the JSON codec registration is what triggers the introspection.
Disables the JIT capability flag. By using _server_caps._replace(jit=False), we tell asyncpg to skip the set_config() block entirely.
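The mechanism behind the hook can be tried without a database. The sketch below uses a hypothetical stand-in class and a reduced, made-up capability tuple in place of asyncpg's real connection object. It only demonstrates how replacing a namedtuple field causes the guarded block to be skipped.

```python
from collections import namedtuple

# Hypothetical stand-in for asyncpg's capability record; only jit matters here.
ServerCaps = namedtuple("ServerCaps", ["plpgsql", "jit"])


class FakeConnection:
    """Simplified model of a connection that toggles JIT during introspection."""

    def __init__(self):
        self._server_caps = ServerCaps(plpgsql=True, jit=True)
        self.executed = []

    def introspect_types(self):
        # Mirrors the guard quoted above: the toggle runs only if jit is truthy.
        if self._server_caps.jit:
            self.executed.append("set_config('jit', 'off', false)")
        self.executed.append("introspection query")


def disable_jit_capability(conn):
    # Same idea as the hook: namedtuples are immutable, so build a modified copy.
    if hasattr(conn, "_server_caps") and conn._server_caps.jit:
        conn._server_caps = conn._server_caps._replace(jit=False)


default_conn = FakeConnection()
default_conn.introspect_types()

patched_conn = FakeConnection()
disable_jit_capability(patched_conn)
patched_conn.introspect_types()

print(default_conn.executed)
print(patched_conn.executed)
```

In this model, the patched connection never issues the session-level set_config() call that would cause pinning.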

The Result

After deploying the asyncpg fix, we saw the number of pinned sessions drop precipitously.

Of course, we were still seeing many pinned sessions, which we continued to deal with through other fixes, but this first step produced an improvement of over 50%.

Other Fix Attempts That Didn’t Work

Before landing on this fix, we attempted a few other approaches.

First, we attempted to disable JIT via connection parameters by setting server_settings={"jit": "off"}. This fails because RDS Proxy rejects it with a message like:

FeatureNotSupportedError:
RDS Proxy currently doesn't support the option jit

We also tried disabling prepared statement caching with prepared_statement_cache_size=0 in the configuration. This didn’t work because it only prevents pinning caused by named prepared statements; it does not prevent pinning caused by set_config().

The only fix that worked was to add the pin-prevention hook as described above.

Lessons Learned

A few takeaways from this debugging experience:

RDS Proxy pinning can come from unexpected places. Even small session-level changes can disable multiplexing.
pg_stat_statements hides parameter values. It’s great for query patterns, but it does not expose bound parameters, which can hide critical clues. Sometimes the fastest diagnostic tool is temporarily enabling log_statement = 'all', which quickly exposed the params in the set_config() call.
SQLAlchemy and asyncpg do have some quirks that need to be addressed when using them with RDS Proxy.

Final Thoughts

The entire chain looked like this:

SQLAlchemy connection
 → asyncpg codec registration
 → asyncpg type introspection
 → temporary JIT disable via set_config()
 → RDS Proxy detects session state change
 → connection gets pinned

A single hidden configuration toggle resulted in thousands of pinned sessions.

Once identified, the fix was only a few lines of code.

But getting there required following the entire stack – from SQLAlchemy to asyncpg to PostgreSQL to RDS Proxy.

Hopefully this saves someone else a few hours (or days) of debugging.

gabrielle roth: SCaLE23x

Thu, 12 Mar 2026 00:38:49 +0000

I’m back from Pasadena after SCaLE23x and another installment of PostgreSQL@SCaLE! It was really just wonderful this year, seeing old friends and making new ones, talking to people and soaking up knowledge. I’m looking forward to implementing what I learned. Expo Hall: We had a lot of booth volunteers this year. Thank you all so much; […]

Bruce Momjian: The MySQL Shadow

Wed, 11 Mar 2026 14:15:02 +0000

For much of Postgres's history, it has lived in the shadow of other relational systems, and for a time even in the shadow of NoSQL systems. Those shadows have faded, but it is helpful to reflect on this outcome.

On the proprietary side, most database products are now in maintenance mode. The only database to be consistently compared to Postgres was Oracle. Long-term, Oracle was never going to be able to compete against an open source development team, just like Sun's Solaris wasn't able to compete against open source Linux. Few people would choose Oracle's database today, so it is effectively in legacy mode. The Oracle shadow is clearly fading. In fact, almost all enterprise infrastructure software is open source today.

The MySQL shadow is more complex. MySQL is not proprietary, since it is distributed as open source, so it had the potential to ride the open source wave into the enterprise, and it clearly did from the mid-1990s to the mid-2000s. However, something changed, and MySQL has been in steady decline for decades. Looking back, people want to ascribe a reason for the decline:

Sun buying MySQL AB
Oracle buying Sun
Poor stewardship of MySQL by Oracle, including recent layoffs

Vibhor Kumar: Beyond Features: What a PostgreSQL Strategy Discussion Taught Me About Calm, Modern Platforms

Wed, 11 Mar 2026 13:36:44 +0000

Last December, I was part of a long enterprise discussion centered on PostgreSQL.

On paper, it looked familiar: a new major release, high availability and scale, Aurora migration, monitoring, operational tooling, and the growing conversation around AI-assisted operations.

The usual ingredients were all there.

But somewhere in the middle of that day, the tone of the room changed.

It did not change when we talked about new PostgreSQL capabilities. It changed when the conversation moved to upgrades, patching, monitoring quality, and operational control.

That was the moment I realized this was not really a feature discussion.

It was a trust discussion.

Not trust in PostgreSQL as a database. That question is mostly behind us.

It was trust in something more practical: can this platform evolve without exhausting the team responsible for it? Can it scale without becoming harder to reason about? Can it be upgraded without becoming a quarterly trauma ritual? Can it be monitored without operators drowning in false signals? Can it support modernization without making every change feel dangerous?

That, to me, is where the PostgreSQL conversation has matured.

A modern PostgreSQL platform is not defined only by what it can do. It is defined by how calmly it can change.

Why this matters now

This matters because PostgreSQL is no longer entering the enterprise through side doors. In many organizations, it is already trusted with serious workloads and is increasingly central to modernization plans.

That changes the questions.

A few years ago, teams often asked whether PostgreSQL was ready for enterprise use. Today, the better question is whether the operating model around PostgreSQL is ready for enterprise reality.

Because the database can be strong while the surrounding practice is weak.

That is where many teams struggle. They like PostgreSQL, but lag on upgrades. They have HA designs, but unclear failure playbooks. They have monitoring, but poor signal quality. They use managed PostgreSQL services, but feel boxed in over time. They want automation and intelligent assistance, but are not always clear what uncertainty they are trying to remove.

The technology is often ready. The operating discipline around it is where the real work lives.

What we were really discussing

Even though the conversation touched several areas, the underlying questions were surprisingly consistent.

We were really discussing:

How to stay current on PostgreSQL releases without upgrade drama
How to think about availability and scale without operational chaos
How to evaluate Aurora-to-PostgreSQL migration without turning it into ideology
How to improve monitoring quality so operators can act faster and guess less
How to use guided operations only where they reduce real uncertainty

That is a more useful frame than just listing technologies.

Most platform pain does not come from lack of features. It comes from lack of confidence in change.

PostgreSQL 18: a release only creates value if it is reachable

One obvious part of the discussion was PostgreSQL 18.

A new major release always matters. It brings new capabilities, new opportunities, and another step forward in a long tradition of serious engineering.

But in real environments, the most important question is not, “What’s new?”

It is this: can we adopt it without making life harder for the team?

That is where enterprise PostgreSQL gets real.

A release only creates value if it is reachable. Not downloadable. Not demo-ready. Operationally reachable.

That means teams can:

Validate it with confidence
Test application behavior early
Align rollout with workload patterns
Standardize the upgrade motion
Avoid turning major version change into a hero project

Too many environments still postpone upgrades until they become politically and technically painful. The version gap grows. Dependencies pile up. Risk accumulates quietly. Then one day the “future upgrade” becomes a transformation project nobody wants to sponsor.

The better pattern is less dramatic: stay reasonably current, validate earlier, and make major upgrades boring.

That word is worth defending.

In PostgreSQL operations, boring is beautiful. A quiet upgrade is not a lack of ambition. It is a sign that the team has built enough discipline to let progress happen without ceremony and fear.

A practical PostgreSQL upgrade test

If a team says it is “ready” for a major PostgreSQL upgrade, I think it should be able to answer these questions clearly:

Do we know which applications, extensions, and drivers need compatibility testing?
Do we have a rehearsal path in a lower environment that resembles production closely enough to matter?
Do we know our acceptable rollback posture if the cutover does not behave as expected?
Have we aligned the upgrade window to real workload behavior instead of habit?
Can we explain the upgrade sequence in plain language without relying on one hero engineer?

If those answers are fuzzy, the problem is usually not PostgreSQL. It is upgrade discipline.

Availability and scale: PostgreSQL gets harder when the business stops tolerating pauses

High availability is one of those areas where teams can sound mature long before they actually are.

The vocabulary is easy: resilience, failover, RPO, RTO, locality, five nines.

The hard part is building a PostgreSQL platform that behaves predictably when real conditions get messy.

That is where the distributed side of the discussion became important. Not because “distributed” sounds impressive, but because some environments genuinely need:

Broader availability expectations
Regional workload handling
Scale without conflict
Architecture that remains understandable during failure

One capability that stood out in the conversation was regional workload transfer to the closest database while ensuring no conflicts occur.

That is not just a technical feature description. It points to something bigger: keeping the platform coherent while demand shifts and geography matters.

And coherence matters more than many teams admit.

Adding nodes is easy to put on a slide. Adding trust is much harder.

A PostgreSQL platform only becomes more resilient if the people operating it understand:

What kinds of failures it is designed to absorb
What behavior is automatic versus manual
What consistency guarantees matter for the workload
How upgrades and patching will happen without destabilizing the design

Without that clarity, scale becomes ambiguity. And ambiguity is expensive.

Five nines is not something you announce. It is something you earn through engineering discipline and operational clarity.

What good HA/DR looks like in PostgreSQL

A high-availability PostgreSQL design is not “good” just because replication is configured.

A mature design should let the team answer these questions fast:

If the primary fails, what happens next?
Is failover automatic, manual, or operator-assisted?
Who is responsible for deciding whether the system should fail over?
What happens to client routing and reconnection?
How do we prevent split-brain or conflicting writes in more advanced topologies?
How do we patch and upgrade the environment without violating the resilience design?

If those answers are not operationally clear, the architecture may look strong but still behave weakly under pressure.

Aurora migration: the real question is not “managed or unmanaged?”

Another meaningful part of the day was around Aurora and the path beyond it.

Aurora solves real problems. It gives many teams speed, convenience, and a managed experience that can be a good fit, especially early on.

But enterprises have a way of growing into harder questions.

Eventually the conversation stops being “Is it managed?” and becomes “Is it manageable on our terms?”

That is a different question altogether.

Because the trade-off is not simply convenience versus complexity. It is often convenience versus control.

Control over:

Upgrade timing
Patching choices
Extension strategy
Operational consistency across environments
Visibility into how the system behaves
Where responsibility truly lives when something goes wrong

These edges do not always show up on day one. They tend to appear later, when the estate gets bigger, the stakes get higher, and standardization starts to matter more than convenience.

The mistake here is to turn the conversation ideological.

This is not “managed is bad” or “self-managed is pure.” That is not serious thinking.

The serious question is this: how do we regain control without reintroducing chaos?

That is why good Aurora-to-PostgreSQL thinking starts with the operating model, not the migration utility.

Before teams move, they should be clear on:

Upgrade and patch discipline
Backup and recovery assumptions
Workload-aware change windows
Extension needs
Monitoring expectations
Blast-radius containment during transition

Good migration planning is not just about moving bytes. It is about rebuilding confidence.

A practical Aurora migration checkpoint

Before moving off Aurora into a broader PostgreSQL operating model, teams should test themselves with a few honest questions:

What problem are we actually trying to solve: cost, control, extensions, standardization, or something else?
What operating burden will we newly own after the move?
Do we already have the monitoring, backup, patching, and upgrade discipline to own that burden well?
Are we migrating to a clearer operating model, or just migrating away from frustration?
Can we stage the move in a way that limits blast radius and preserves confidence?

A surprising number of migrations get weaker not because PostgreSQL is hard, but because the target operating model was never designed properly.

Monitoring: the hidden tax most teams normalize

If there was one part of the conversation that felt universally familiar, it was monitoring and operational noise.

Most operational pain does not arrive as one dramatic outage. It arrives as accumulation.

Too many alerts. Too many dashboards. Too many things that look urgent but are not. Too many moments where capable people have to guess which signal matters.

That creates a tax that is easy to underestimate.

Alert fatigue becomes decision fatigue. Decision fatigue becomes hesitation. Hesitation becomes fear of change. And fear of change is where modernization quietly slows down.

This is why observability in PostgreSQL environments should be judged by a stricter standard than “we collect enough metrics.”

The real question is whether the system helps operators decide what to do next.

That means:

Reducing false positives
Improving actionability
Surfacing likely next actions
Making the operational context clearer, not noisier

It also means treating workload-aware time blocks as a serious operational tool.

Good teams do not only ask, “When is the maintenance window?” They also ask, “When is this workload least sensitive to change?” and “When will the platform absorb this patch, tuning adjustment, or upgrade most safely?”

That shift sounds small. It is not.

It is one of the clearest signs that a PostgreSQL team has moved from reactive maintenance to intentional operations.

A simple observability test for PostgreSQL teams

I like to ask teams four blunt questions:

How many alerts fired last week?
How many required human action?
How many were false positives or low-value noise?
For the alerts that mattered, was the next action obvious?

If the team cannot answer those questions, observability may be collecting data without improving decisions.

That is not observability maturity. That is metric accumulation.

Guided operations and AI: useful only if they reduce ambiguity

The conversation also touched on AI-assisted operations.

This is an area where it is very easy to sound futuristic and very hard to be useful.

So I prefer a simple standard.

AI has value in PostgreSQL operations only if it reduces ambiguity.

That means helping with things like:

Identifying which signals matter
Reducing noise
Suggesting meaningful next actions
Improving patching and upgrade readiness
Offering explainable optimization guidance
Aligning recommendations with workload behavior

If it does those things, it helps.

If it simply adds another glossy layer of “intelligence” without reducing operator burden, then it is not solving the real problem.

PostgreSQL teams do not need more spectacle. They need fewer uncertain moments.

That is where guided operations can actually matter: not by replacing human judgment, but by helping good teams use that judgment more effectively.

What teams often get wrong

Many platform problems are not mysterious. They are patterns.

1. Treating upgrades as rare events

When teams delay major PostgreSQL upgrades too long, they are not preserving stability. They are usually accumulating future pain.

2. Confusing HA design with resilience

Replication and topology alone do not create trust. Resilience requires clear failure behavior and clear operator understanding.

3. Measuring observability by volume

More dashboards and more alerts are not signs of maturity. Actionability is.

4. Migrating without redesigning the operating model

Moving from Aurora or any managed environment without rethinking patching, monitoring, and change discipline is a recipe for disappointment.

5. Talking about AI without defining the ambiguity it removes

If intelligent assistance does not reduce guesswork, it is decoration.

What good looks like in a mature PostgreSQL platform

A mature PostgreSQL platform is not one that never has incidents. That is fantasy.

It is one where the team has built enough clarity and discipline that change does not feel theatrical.

In practice, that means:

Major upgrades are planned early and executed routinely
Patching follows a repeatable cadence
HA/DR design is explainable under pressure
Monitoring produces clear actions instead of constant noise
Maintenance windows reflect workload behavior, not just habit
Automation supports judgment instead of masking weak process
Intelligent assistance is used where it reduces real uncertainty

That is what calm looks like in a serious PostgreSQL estate.

Not perfection. Not magic. Just fewer surprises and better decisions.

The five pillars of calm PostgreSQL operations

If I had to reduce this whole experience into a practical framework, it would be this:

1. Currency

Stay reasonably current on major PostgreSQL releases so upgrades remain manageable.

2. Clarity

Make HA, failover, ownership, and change behavior understandable before they are tested under pressure.

3. Control

Know when managed convenience is still helping and when it has become an operational constraint.

4. Signal

Reduce false alerts and improve actionability so operators can focus on what matters.

5. Change discipline

Patch, tune, and upgrade in workload-aware windows with repeatable playbooks.

That is not flashy. It is effective.

What to do on Monday

A good blog post should not just leave readers with ideas. It should leave them with work worth doing.

If you run PostgreSQL today, start with five practical steps.

1. List the top three changes your team currently avoids

Be honest. Which changes create the most hesitation? Major upgrades? Failover testing? Tuning changes? Patch cycles?

2. Identify why those changes feel risky

Is it testing weakness? Tooling gaps? Poor rollback confidence? Unclear ownership? Monitoring noise?

3. Review your version strategy

Are you staying reasonably current, or are you drifting into large, painful upgrade gaps?

4. Audit your alerts

How many alerts actually require action? How many are just operational wallpaper?

5. Revisit your maintenance windows

Are they based on habit and calendar tradition, or on actual workload behavior?

These five steps will usually tell you more about PostgreSQL maturity than a dozen architecture slides.

How I helped the team move the conversation forward

One part of the discussion I valued most was helping turn broad platform themes into practical operating questions.

That meant helping the team think through:

How to approach PostgreSQL modernization as an operating model, not just a technology choice
How to make major version upgrades and patching more repeatable and less disruptive
How to evaluate HA/DR and distributed PostgreSQL through the lens of clarity, resilience, and operational predictability
How to frame Aurora migration as a question of control, standardization, and long-term manageability
How to improve monitoring by reducing false alerts and making signals more actionable
How workload-aware maintenance windows can reduce risk and make change easier to absorb
How guided operations and intelligent assistance can be useful only when they reduce real uncertainty for operators

For me, that is where these conversations become valuable: when architecture discussion turns into operational clarity, and when strategy starts becoming something a team can actually execute.

Because most teams do not need more theory. They need a clearer path.

Final thought

What stayed with me from that day was not the product list or the roadmap sequence.

It was the pattern in the questions.

The best teams in the room were not looking for magic. They were looking for confidence.

Confidence that PostgreSQL could keep evolving without exhausting the people responsible for it. Confidence that upgrades could become routine. Confidence that scale would not create confusion. Confidence that operations could become calmer, not louder.

That, to me, is where the real value of PostgreSQL lives now.

Not just in features. Not just in architecture diagrams. Not just in benchmarks.

But in helping real teams make real changes safely, repeatedly, and without losing their nerve.

That is not fluff.

That is the work.

Floor Drees: The Future of Postgres on the agenda: EDB’s PGConf.dev Preview

Wed, 11 Mar 2026 12:29:11 +0000

PGConf.dev is heading to Vancouver, Canada, from May 19–22, bringing together the users, developers, and community organizers driving the future of PostgreSQL. EDB is proud to be a Gold-level sponsor this year, with our own Robert Haas serving as an organizer and Jacob Champion contributing to the Program Committee. Following a highly successful Call for Papers, we’ve put together this preview of the EDB-led sessions you won't want to miss.

Lukas Fittl: The Dilemma of the ‘AI DBA’

Wed, 11 Mar 2026 00:00:00 +0000

Like many in the industry, my perspective on AI tools has shifted considerably over the past year, specifically when it comes to software engineering tasks. Going from “this is nice, but doesn’t really solve complex tasks for me” to “this actually works pretty well for certain use cases.” But the more capable these tools become, the sharper one dilemma gets: you can hand off the work, but an AI agent won’t ultimately be responsible when the database goes down and your app stops working. For…

Lætitia AVROT: work_mem: it's a trap!

Wed, 11 Mar 2026 00:00:00 +0000

My friend Henrietta Dombrovskaya pinged me on Telegram. Her production cluster had just been killed by the OOM killer after eating 2 TB of RAM. work_mem was set to 2 MB. Something didn’t add up. Hetty, like me, likes playing with monster hardware. 2 TB of RAM is not unusual in her world. But losing the whole cluster to a single query during peak operations is a very different kind of problem from a 3am outage.

Virender Singla: The Part of PostgreSQL We Discuss the Most — 2

Tue, 10 Mar 2026 17:27:35 +0000

PostgreSQL and Oracle Implementation

In Part 1, we explored the general concepts of MVCC and the implications of storing data snapshots either out-of-place or within heap storage. We can now map these methodologies to specific database engines.

The PostgreSQL MVCC implementation aligns with the DatabaseI model, whereas Oracle and MySQL are closely related to the DatabaseO model. Specifically, Oracle utilizes block versioning and stores older versions in a separate storage area known as UNDO, while PostgreSQL employs row versioning.

These engines further optimize their respective in-place or out-of-place MVCC strategies:

Oracle (DatabaseO) Delta Storage: To improve efficiency, Oracle avoids copying an entire block to UNDO. Instead, it only stores the modified columns as a “delta.” Consequently, when a query requires an older image, the engine applies this delta to the current heap block to reconstruct the previous state.
PostgreSQL (DatabaseI) Visibility Map (VM): To mitigate the overhead of scanning the entire heap for garbage collection, PostgreSQL uses a Visibility Map. This data structure maintains per-block information about the heap, allowing the garbage collector to identify specific blocks containing garbage instead of performing a full table scan.
Heap Only Tuple (HOT) Optimization: PostgreSQL addresses the continuous index churn caused by new physical addresses (ctid) through HOT optimization. If a new row version fits within the same block as the previous version and the update changes no indexed column, the indexes are not updated. Instead, index access lands on the heap block, accessing the old version, which then chains directly to the new version within the same block. Note that it’s still a single block fetch.
Row Locking Mechanism: PostgreSQL utilizes the visibility counters to manage row locking as well, whereas Oracle employs a distinct data structure located in the block header for this purpose.
Handling Multiple Data Versions: When a row undergoes multiple updates, Oracle maintains all historical versions in UNDO, linking them via pointers with the head of the chain anchored in the block header. Hence, the header only needs a single counter to redirect to this UNDO chain. However, if a record has been updated many times — such as ten iterations — a SELECT operation may have to traverse the entire UNDO sequence to reconstruct the required snapshot.
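
The HOT rule from the list above can be pictured as a small decision function. This is only a toy sketch, not PostgreSQL's implementation: the Page class, its byte accounting, and the function name are hypothetical. It also assumes the condition that a HOT update must not change any indexed column.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for one heap block; sizes are in bytes."""
    size: int = 8192
    used: int = 0

    def free(self) -> int:
        return self.size - self.used


def can_use_hot(page: Page, new_tuple_bytes: int,
                changed_columns: set[str], indexed_columns: set[str]) -> bool:
    """Toy model: the new row version can stay on the same page and
    skip index maintenance only if it fits and touches no indexed column."""
    fits_on_same_page = new_tuple_bytes <= page.free()
    touches_index = bool(changed_columns & indexed_columns)
    return fits_on_same_page and not touches_index


half_full = Page(used=4096)
nearly_full = Page(used=8100)
indexed = {"id", "email"}

# Room on the page, non-indexed column changed: HOT is possible.
print(can_use_hot(half_full, 120, {"status"}, indexed))
# No room on the page: the new version goes elsewhere and indexes are updated.
print(can_use_hot(nearly_full, 120, {"status"}, indexed))
# An indexed column changed: indexes must be updated regardless of space.
print(can_use_hot(half_full, 120, {"email"}, indexed))
```

The second case is the one the next section is concerned with: once the block is full, every update pays for index maintenance.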

Scenario when the Block fills

PostgreSQL has a higher probability of filling blocks quickly because it stores old row versions in place within the heap, whereas Oracle typically only fills a block further when an update causes a row’s width to expand. The two engines also differ in how they handle index updates when the modified row version must be stored in a different heap block because the original block has no free space. While the physical address changes in both systems, PostgreSQL updates the index entries in this scenario, whereas Oracle maintains the existing index entry. In Oracle, an index fetch will still land on the original heap block and then be redirected to the new heap block, a process known as “Row Migration” that results in two I/O operations. In summary, the architecture of PostgreSQL results in increased UPDATE latency due to heightened index churn, whereas in Oracle the performance impact is shifted to SELECT operations, which require an additional fetch to retrieve data. Note, though, that this design choice is independent of their respective MVCC implementations.

In MySQL, data is structured within a clustered index and secondary indexes reference this data using logical primary key pointers rather than physical addresses. Because of this logical mapping, secondary indexes remain unaffected when the physical location of rows within MySQL changes.

Inefficiencies in the PostgreSQL Garbage Collection!

More Bloat?

The issue of bloat caused by PostgreSQL’s MVCC implementation is a frequent subject of debate. While Oracle bloat is less commonly discussed, long-running queries in that system can still experience significant performance degradation when forced to retrieve older data versions from UNDO blocks. To illustrate this, consider a scenario where an UPDATE statement modifies every row in a table. If a large query begins just before this update and performs a full table scan before any garbage collection occurs, PostgreSQL must fetch nearly double the actual table size because two versions of every row now exist within the heap. Oracle must also access these previous images, doing so by reaching into both the heap and the dedicated UNDO storage. In this sense, “garbage” or versioning bloat impacts the I/O of both engines.

However, a critical distinction remains: Oracle queries are only impacted when they specifically require a previous block image; all new queries are served directly from the updated heap blocks. In contrast, PostgreSQL bloat continues to impact every query, regardless of when it started, because the outdated versions remain interleaved with live data. Moreover, PostgreSQL bloat is often permanent. Even after the autovacuum process removes “dead tuples,” the physical table size typically does not shrink, meaning full table scans must still traverse empty space. This is not a significant concern, though, as incoming data will eventually occupy these partially filled blocks. Although PostgreSQL can truncate empty blocks at the end of a file to return space to the OS, this requires an exclusive lock and can introduce other operational challenges, such as query conflicts on replicas.

Slow Default Configurations

In PostgreSQL, vacuuming is the primary garbage collection mechanism, managed by the “autovacuum” background worker. This process purges obsolete row versions based on a number of configurable thresholds. The instance-wide default settings, such as triggering a cleanup only when obsolete data reaches 20% of the table size, are often too conservative for modern production workloads. This fixed percentage scales poorly in enterprise environments where tables reach hundreds of gigabytes, leading to significant bloat before maintenance begins. Beyond these default processing speeds, garbage collection can be further delayed or blocked by long-running transactions that require access to older row versions, preventing the reclamation of space. Also, concerns that autovacuum workers might consume resources and interfere with live traffic often lead administrators to favor less aggressive configurations or rely on manual vacuuming during off-peak hours. This necessitates time-consuming, table-level tuning based on specific sizes and workloads rather than relying on a one-size-fits-all approach.
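
To make the 20% figure concrete, here is a minimal sketch of the documented trigger formula (base threshold plus scale factor times the number of tuples). It assumes the long-standing defaults of 50 for autovacuum_vacuum_threshold and 0.2 for autovacuum_vacuum_scale_factor, and it ignores any other setting that may influence when a table is vacuumed. The function name is made up for illustration.

```python
def autovacuum_vacuum_trigger(tuples: int,
                              base_threshold: int = 50,
                              scale_factor: float = 0.2) -> int:
    """Number of dead tuples that must accumulate before autovacuum
    vacuums a table: base_threshold + scale_factor * tuples."""
    return base_threshold + round(scale_factor * tuples)


# The same percentage means very different absolute amounts of garbage.
for rows in (10_000, 10_000_000, 2_000_000_000):
    trigger = autovacuum_vacuum_trigger(rows)
    print(f"{rows:>13,} rows -> vacuum after {trigger:,} dead tuples")

# A per-table override with a much smaller scale factor (assumed value).
print(autovacuum_vacuum_trigger(2_000_000_000, scale_factor=0.01))
```

The last line hints at why table-level tuning is common: for very large tables a much smaller scale factor keeps the absolute amount of garbage bounded.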

Index Churn

Index maintenance adds another layer of complexity; cleaning indexes is significantly more expensive than heap cleanup. It requires the autovacuum process to collect row pointers (ctid) and perform exhaustive index scans, a process that slows down considerably as the number of indexes increases.

Right value for FILLFACTOR!

Both systems provide configuration settings, such as PostgreSQL’s FILLFACTOR, to determine the percentage of a block to fill versus the amount reserved for future updates. To reduce index churn, FILLFACTOR (defaults to 100, or 100% occupancy) is a critical setting in PostgreSQL. By reducing this value, administrators reserve space within blocks to facilitate HOT updates; however, identifying the optimal setting can be challenging. This contrasts with Oracle’s architecture, which typically leaves 10% of each block reserved for future updates and is generally less sensitive to these issues. Further details regarding FILLFACTOR can be found in sections 1 and 2.
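
As a rough illustration of what a given FILLFACTOR buys, the sketch below estimates how many bytes per block are left free for later updates. It assumes the default 8 KB block size and ignores the page header and per-row overhead, so the numbers are approximate. The function name is hypothetical.

```python
BLOCK_SIZE = 8192  # default PostgreSQL block size in bytes (assumed here)


def reserved_bytes(fillfactor: int, block_size: int = BLOCK_SIZE) -> int:
    """Approximate bytes per block that inserts leave free for later updates."""
    if not 10 <= fillfactor <= 100:
        raise ValueError("table fillfactor must be between 10 and 100")
    return block_size * (100 - fillfactor) // 100


for ff in (100, 90, 80, 70):
    print(f"fillfactor={ff:>3}: about {reserved_bytes(ff):,} bytes reserved per block")
```

Comparing the reserved bytes with the typical row width gives a first guess at how many HOT updates a block can absorb before it fills.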

Oracle Garbage Collection

Oracle’s garbage collection is largely autonomous; it manages versioning pressure internally without requiring the same level of granular manual tuning or exposure of maintenance settings to the user. Oracle users primarily focus on two administrative tasks: ensuring adequate UNDO space — similar to managing standard data file storage — and setting a Time-to-Live via the undo_retention parameter to regulate garbage cleanup.

It is also important to note that Oracle indexes accumulate garbage, as previously detailed in the Index MVCC versioning section. Unlike PostgreSQL, Oracle does not feature a dedicated garbage collection process for its indexes. It performs cleanup lazily or opportunistically, reclaiming space only when subsequent transactions happen to traverse those specific index blocks. Notably, index garbage is only produced during DELETE operations or UPDATEs that modify an indexed column; standard UPDATEs, even those that alter a row’s physical address (due to Row Migration), do not contribute to this.

MySQL obviously also has this Index garbage problem. To address this, MySQL includes primary keys in undo records, enabling the removal of garbage from clustered indexes during the cleanup of these undo records.

The (In)Famous Transaction ID Wraparound Issue

The “Transaction ID Wraparound” issue is a well-known phenomenon in PostgreSQL that can lead to significant database downtime. Why is this predominantly a PostgreSQL concern?

Storage and Performance Concerns

PostgreSQL utilizes 4-byte Transaction ID (XID) counters, known as xmin and xmax, for visibility tracking on a per-row basis. With a 4-billion transaction limit, the system must eventually recycle these IDs. While the autovacuum process typically manages this cleanup, high-volume workloads can exhaust these IDs rapidly, so autovacuum needs to be tuned well for such workloads. If XID recycling is delayed or obstructed by a blocker, the exhaustion of all available IDs triggers an enforced outage known as a Transaction Wraparound.
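
Recycling is possible because 32-bit XIDs are compared on a circle, not on a number line. The sketch below models that comparison for ordinary XIDs as a signed 32-bit difference. The function name is illustrative, and the handling of special reserved XIDs is left out.

```python
XID_SPACE = 2 ** 32


def xid_precedes(a: int, b: int) -> bool:
    """True when XID a is older than XID b on the 32-bit circle,
    i.e. when (a - b) is negative if read as a signed 32-bit integer."""
    diff = (a - b) % XID_SPACE
    return diff >= XID_SPACE // 2


# Ordinary case: 100 is older than 200.
print(xid_precedes(100, 200))
# After the counter wraps, a huge XID is still older than a small new one.
print(xid_precedes(4_294_967_290, 5))
# But an XID more than 2**31 behind stops looking old, which is why
# old rows have to be dealt with before they fall that far behind.
print(xid_precedes(5, 2 ** 31 + 10))
```

The last case is the heart of the wraparound problem: only about half of the 4-billion space can lie in the past at any moment.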

In contrast, Oracle stores XIDs within the block header. This architectural choice allowed Oracle to utilize larger data types: 6 bytes prior to Oracle 12, and 8 bytes in subsequent versions. While the 6-byte implementation did cause a notable scare in 2012 due to a specific bug, the larger capacity significantly extends the runway. MySQL uses a 6-byte DB_TRX_ID.

And 8 bytes is not just double 4 bytes; the value space is 4,294,967,296 times larger 🙂

Here is the straightforward math behind it:

4-byte capacity: 32 bits ($2^{32}$) gives you 4,294,967,296 possible values.

6-byte capacity: 48 bits ($2^{48}$) gives you 281,474,976,710,656 possible values.

8-byte capacity: 64 bits ($2^{64}$) gives you 18,446,744,073,709,551,616 possible values.

The Burn Rate: At a very high enterprise throughput of 20,000 transactions per second (TPS), it would take over 440 years of continuous, uninterrupted processing to exhaust the 6-byte limit. Even at an extreme 100,000 TPS, a system would still have nearly 90 years of runway.
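
The burn-rate figures can be reproduced with a few lines of arithmetic. This sketch assumes the whole counter range is usable and a year of 365.25 days. For 4-byte PostgreSQL XIDs the practical runway is shorter still, since wraparound protection kicks in well before the full range is consumed.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_to_exhaust(id_bytes: int, tps: int) -> float:
    """Years of continuous load needed to consume every value of an
    id_bytes-wide counter at tps transactions per second."""
    return 2 ** (8 * id_bytes) / tps / SECONDS_PER_YEAR


for id_bytes in (4, 6, 8):
    for tps in (20_000, 100_000):
        years = years_to_exhaust(id_bytes, tps)
        if years < 1:
            print(f"{id_bytes}-byte @ {tps:>7,} TPS: {years * 365.25:.1f} days")
        else:
            print(f"{id_bytes}-byte @ {tps:>7,} TPS: {years:,.0f} years")
```

At 20,000 TPS a 4-byte counter lasts only a couple of days, which is why the same workload that is harmless for a 6-byte counter makes timely XID recycling essential in PostgreSQL.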

Why has PostgreSQL not adopted larger XID storage? While technically feasible, the primary deterrent is that storing these visibility counters on a per-row basis raises significant concerns regarding performance and storage overhead. Interestingly, there is an ongoing thread on the hackers mailing list discussing the possibility of storing 8-byte XIDs at the block level and combining them with row-level XIDs to determine visibility. Also, some PostgreSQL variants already utilize such methods to bypass the 4-byte limitation.

Index Visibility

PostgreSQL omits Transaction IDs (XIDs) from its indexes, likely to avoid the substantial per-row storage overhead that would otherwise be required for each index entry. Consider the impact of indexing a 2-byte SMALLINT column: adding two 4-byte XIDs alongside the CTID would significantly inflate the entry size. Hence, row visibility cannot be determined solely at the index level; the system must instead consult the heap row to verify visibility. To optimize this process and avoid constant heap access, PostgreSQL utilizes the Visibility Map (VM), allowing a transaction to quickly check if an entire block is visible. This architecture is why Index-Only Scans in PostgreSQL are not truly “index-only” in the absolute sense.

For this exact reason, a DELETE operation in PostgreSQL does not immediately modify any index entries. Cleaning up these entries is prohibited at that stage because concurrent sessions might still be interested in accessing those deleted records. In the absence of visibility counters in the index, other sessions must independently verify a row’s deletion status by consulting the visibility information of the heap row itself. These index entries are only permanently removed once the autovacuum process confirms that no active transactions can still access the previous versions. That means that after a DELETE and before VACUUM happens, every query accessing those rows has to consult the heap to check the row status. As an optimization, PostgreSQL keeps delete bits within the index to signify deletions, but a DELETE does not toggle them either. PostgreSQL sets these bits opportunistically during a SELECT if it determines that the old entries are no longer visible to any other transaction. Hence subsequent SELECTs do not need to consult the heap.

Segregation of work

Another challenge with XID recycling is the multifaceted nature of the autovacuum process, which handles dead tuple cleanup, statistics collection, and recycling. Hence, even when XID exhaustion is imminent, autovacuum may still be occupied with heap and index cleaning. This has been largely addressed through a fail-safe mechanism: once XID consumption reaches a specific configurable threshold, the system bypasses other tasks to focus exclusively on XID recycling.

Incremental XID recycling?

The XID cleanup process is not incremental in PostgreSQL. That is, there is no mechanism to isolate and prioritize the blocks or rows containing the oldest XIDs; the system cannot simply clean those first to bring the database back online while continuing to recycle remaining IDs in the background. Instead, the autovacuum process is forced to scan every block identified by the Visibility Map as requiring recycling, a task that can span many hours for exceptionally large tables.

The Role of Blockers

While the default autovacuum configuration is frequently blamed for the buildup of dead tuples and transaction wraparound risks, the true cause often lies with various blockers, such as long-running queries and transactions, replication issues, prepared transactions, or even temporary tables, as I recently came across. It is crucial to pay close attention to these blockers and mitigate them by using timeout functionality, particularly by setting limits on query run times.

In contrast, Oracle utilizes a specific undo_retention flag that acts as a Time-to-Live (TTL) for garbage data; once this threshold is met, the data is purged regardless of whether other transactions might still require it. PostgreSQL, however, maintains a stricter dependency: a single transaction executing a command like pg_sleep(infinite) will effectively stall all garbage collection and XID recycling for any transaction IDs following it.
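The stalling effect can be illustrated with a toy model. This is a sketch, not PostgreSQL's actual algorithm: it assumes the cleanup horizon is simply the oldest XID among open transactions, and that a dead row version is removable only if the transaction that deleted it is older than that horizon. The function and variable names are invented for illustration.

```python
def removable(dead_rows, active_xids, next_xid):
    """Return the dead row versions that cleanup may remove.

    dead_rows   -- list of (row_name, deleting_xid) pairs
    active_xids -- XIDs of transactions that are still open
    next_xid    -- the next XID to be assigned (used when nothing is open)
    """
    # Toy horizon: the oldest open transaction, or next_xid if none are open.
    horizon = min(active_xids) if active_xids else next_xid
    return [name for name, deleting_xid in dead_rows if deleting_xid < horizon]


dead = [("r1", 90), ("r2", 150), ("r3", 400)]

# A transaction opened at XID 100 and never closed pins the horizon at 100.
blocked = removable(dead, active_xids=[100, 500], next_xid=501)

# Once the blocker ends, the horizon advances and more garbage is reclaimable.
unblocked = removable(dead, active_xids=[500], next_xid=501)
print(blocked, unblocked)
```

With the blocker open, only `r1` can be cleaned; everything deleted after XID 100 has to wait, no matter how aggressively cleanup is tuned.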

Visibility of Overflow Pages

In databases, a row that is too wide to fit within a standard heap block (often due to Large Object or LOB data) is split and stored in fixed-size segments across overflow pages, with a redirection pointer remaining in the main page row. In PostgreSQL this is known as a TOAST table.

A critical architectural nuance in PostgreSQL is that, for simplicity, these overflow pages maintain their own visibility counters, xmin and xmax, despite having no independent identity outside of the main table: TOAST entries are always accessed via the main table row, which governs overall visibility. This creates a unique maintenance requirement. For instance, a 1TB table that stores only 10GB in the main table while the rest resides in TOAST still requires XID recycling at both locations.

Customer User Journeys (CUJs)

Ultimately, the typical end-user experience for each system can be summarized as follows:

Oracle Journey

Issue: A daily query suddenly fails with a “snapshot too old” error.
Cause: Older UNDO data versions have been cleaned up.
Investigation: Determine why the query duration increased (e.g., higher data volume or a changed execution plan).
Resolution: Apply a quick fix by increasing UNDO space or pursue a long-term solution by tuning the query.

PostgreSQL Journey

Issue: The system approaches a transaction ID wraparound or performance degrades due to accumulated dead tuples.
Cause: Excessive “garbage” collection lag is impacting the workload.
Investigation: Evaluate if autovacuum is too slow; this often leads to a repetitive cycle of tuning flags at the table level.
Resolution: The administrator must pinpoint the specific blocker and terminate it to permit autovacuum to catch up.

The distinction between these journeys is clear: in one scenario, the impact is localized to a specific query, whereas in the other, it affects the entire system workload.

Final Thoughts

Conceptually, both in-place and out-of-place data versioning strategies present distinct advantages and challenges, but in practice PostgreSQL’s in-place MVCC implementation has more drawbacks. In fact, a discussion was initiated a few years ago regarding zheap, an initiative to implement out-of-place versioning in Postgres. The community has also made rapid progress in optimizing and refining the autovacuum process. Recent advancements in the cleanup process include the ability to skip indexes during maintenance, parallel index cleanup, and a more streamlined memory architecture. Additionally, the system now features bottom-up deletion and reduced WAL thrashing.

While managed service providers offer database solutions, the shared responsibility model means that customers remain responsible for the intricate task of tuning the autovacuum. Until PostgreSQL achieves a fully autonomous garbage collection system for its in-place MVCC implementation, we must remain diligent in monitoring and tuning our critical production databases.

Thank you for reading. Suggestions and feedback are appreciated.

Virender Singla: The Part of PostgreSQL We Discuss the Most — 1

Tue, 10 Mar 2026 17:26:58 +0000

Early in my PostgreSQL journey, I often sensed that a conversation between two Postgres professionals inevitably revolves around vacuuming. That lighthearted observation still remains relevant, as my LinkedIn feeds are often filled with discussions around vacuuming and comparing PostgreSQL’s Multi-Version Concurrency Control (MVCC) implementation to other engines like Oracle or MySQL. Given that people are naturally drawn to the most complex components of a system, I will continue this journey by exploring a detailed comparison of these database architectures focused on the MVCC implementations.

What is MVCC?

Stone age databases relied on strict locking mechanisms to handle concurrency, which proved inefficient under heavy load. In these traditional models, a read operation required a shared lock that prevented other transactions from updating the record. Conversely, write operations required exclusive locks that blocked incoming reads. This resulted in significant lock contention, where readers blocked writers and writers blocked readers.

To solve this, RDBMS implemented MVCC. The idea was very simple. Rather than overwriting data immediately, maintain multiple versions of data simultaneously. This allows transactions to view a consistent snapshot of the database as it existed at a specific point in time. For instance, if User 1 starts reading a table just before User 2 starts modifying a record, User 1 sees the original version of the data without hindering User 2’s progress. Without MVCC, the system would be forced to either serialize all access — making User 2 wait — or risk data consistency anomalies like dirty or non-repeatable reads where User 1 sees uncommitted changes that might eventually be rolled back.

Database engines utilize various architectures to manage this data versioning. A particularly notable point of discussion is the comparison between “in-place” and “out-of-place” data versioning techniques. Let’s examine these approaches more closely.

Explaining In-Place and Out-of-Place Data Versioning

Theoretical Framework:

To explore the core distinctions between MVCC implementations, let us consider two RDBMS utilizing row-based storage models: DatabaseI (in-place data versioning) and DatabaseO (out-of-place data versioning). These placeholder names are intended to represent the primary methodologies of PostgreSQL and Oracle, respectively, facilitating a comparison of their MVCC implementations without delving into exhaustive internal details at this level. In the subsequent analysis, we will map these conceptual models to the practical implementations found in PostgreSQL and Oracle.

Fundamentals of Table Storage:

In an RDBMS, the fundamental unit of storage is a block or page, typically sized in kilobytes (though column-oriented models often use larger block sizes). A table, or heap, is essentially a collection of these blocks. Each block contains data rows along with a header that stores essential metadata for maintaining consistency and performing checksums. The block represents the smallest granular unit for reading. When a transaction queries a table, it accesses these heap blocks to retrieve the required data.

In-Place and Out-of-Place Data Versioning:

To understand the nuances of MVCC, consider the implications of a standard update operation, such as modifying a user’s account balance:

UPDATE accounts SET balance = balance + 100 WHERE user_id = 100;

While both DatabaseO and DatabaseI must preserve the before and after states of the data to ensure isolation, they diverge in their storage strategies for these versions.

DatabaseO: Block-Based Versioning (Copy-on-Write):

Operating as a Copy-on-Write (CoW) system, DatabaseO employs a block-based strategy. When an update occurs, the engine reads the block into memory, copies the original version to a dedicated storage area for older images, modifies the data in memory, and then writes the updated block back to the heap. This ensures the heap always contains the most current data, while older versions are retrieved from separate storage via redirection structures in the heap block header.

DatabaseI: Row-Level Versioning (In-Place):

Conversely, DatabaseI uses a row-level approach where the original data is not moved. Instead, it creates a new data version with the updated data and stores it directly within the existing heap block alongside the previous version. The process involves reading the block, adding the new row in its complete format (as it is a row-based engine), and writing the modified block back to the heap. Hence, a single block holds multiple iterations of the same row, keeping both states in one physical location.

In essence, one approach utilizes block versioning while the other relies on row versioning.

Understanding Snapshot Visibility

As MVCC functions by maintaining multiple versions of data, allowing a transaction to view data as it existed when that transaction began, the fundamental question then becomes: how does a transaction determine which version of this snapshot to read?

To manage this, the database assigns each transaction a unique Transaction ID (call it the XID), which acts as a database clock. These XIDs must be stored alongside the data to track when changes occurred and ensure that the appropriate transaction reads the correct data version.

DatabaseO, which utilizes block versioning, stores the XID in the block header so it can distinguish between two block versions. Conversely, DatabaseI, which employs row versioning, stores the XID within the header of each individual row so it can distinguish between two row versions.

A Practical Look at Snapshot Visibility

Consider a scenario where a row is inserted at XID=0 and subsequently updated at XID=100:

Visibility in DatabaseO (Block-Level Versioning):

In DatabaseO, an update causes the original XID=0 block image to be moved to separate storage while the heap block is updated with new data and a new XID of 100. Now if a transaction with an XID < 100 (already running before you executed the UPDATE) attempts to read the heap block, it finds the heap block header at XID=100. Recognizing that the data has changed since it began, it retrieves the old block image from the separate storage. Transactions with an XID ≥ 100 can read the current heap block directly as they are interested in reading the most recent version of the data.

At INSERT (XID = 0):

Data-Snapshot1: XID=0 /* heap block */

At UPDATE (XID = 100):

Data-Snapshot1: XID=0 /* separate storage */

Data-Snapshot2: XID=100 /* heap block */
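The DatabaseO lookup described above can be sketched in a few lines of Python. This follows the article's simplified rules only: a reader uses the heap block if its XID is at least the block header XID, and otherwise falls back to an older image. The dictionary layout, the `read_block` name, and the balance values are assumptions made for illustration.

```python
def read_block(heap_block, old_images, reader_xid):
    """Return the block image a reader should see under the toy rules.

    heap_block -- dict with the current 'xid' and 'data'
    old_images -- earlier images of the same block, newest first
    reader_xid -- XID of the reading transaction
    """
    if reader_xid >= heap_block["xid"]:
        return heap_block["data"]  # current heap block is new enough
    # Otherwise walk the older images until one is not newer than the reader.
    for image in old_images:
        if reader_xid >= image["xid"]:
            return image["data"]
    return None  # no version existed for this reader


heap = {"xid": 100, "data": "balance=200"}
separate_storage = [{"xid": 0, "data": "balance=100"}]

old_reader = read_block(heap, separate_storage, reader_xid=50)
new_reader = read_block(heap, separate_storage, reader_xid=120)
print(old_reader, new_reader)
```

The reader at XID 50 is redirected to the image in separate storage, while the reader at XID 120 is served directly from the heap block.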

Visibility in DatabaseI (Row-Level Versioning):

In DatabaseI, both row versions (XID=0 and XID=100) are stored within the heap block. When a transaction with an XID < 100 reads the block, it reads both rows but only returns the version at XID=0, filtering out the newer version because its XID is greater than the transaction XID, meaning that version did not exist when the transaction started.

For transactions where the XID ≥ 100, the process becomes slightly more complex, because the transaction XID is now greater than or equal to the XIDs of both rows, so an insertion XID alone cannot tell them apart. To resolve this, DatabaseI has to include two XID markers per row: one set upon insertion (XID1) and another set upon deletion (XID2):

At INSERT (XID = 0):

Data-Snapshot1: XID1=0, XID2=NULL /* Inserted by XID 0 */

At UPDATE (XID = 100):

Data-Snapshot1: XID1=0, XID2=100 /* Inserted by XID 0, Deleted by XID 100 */

Data-Snapshot2: XID1=100, XID2=NULL /* Inserted by XID 100 */

A row is visible to a transaction only if that transaction’s XID falls between the row’s XID1 and XID2, that is, XID1 ≤ transaction XID < XID2, where a NULL XID2 means the row has not been deleted. Thus, a transaction with an XID ≥ 100 knows to ignore the original row because it was effectively “deleted” (replaced) at XID=100.
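The two-marker rule can be expressed as a small Python sketch. It implements only the simplified rule from this example (no commit status, locking, or isolation levels); the `RowVersion` type, the helper names, and the balance values are hypothetical.

```python
from typing import NamedTuple, Optional


class RowVersion(NamedTuple):
    data: str
    xid1: int             # XID that inserted this version
    xid2: Optional[int]   # XID that deleted or replaced it, None if still live


def is_visible(row: RowVersion, reader_xid: int) -> bool:
    """Toy rule: inserted at or before the reader, and not yet deleted for it."""
    inserted = row.xid1 <= reader_xid
    deleted = row.xid2 is not None and row.xid2 <= reader_xid
    return inserted and not deleted


# State of the heap block after the UPDATE at XID=100.
block = [
    RowVersion("balance=100", xid1=0, xid2=100),
    RowVersion("balance=200", xid1=100, xid2=None),
]


def visible_rows(reader_xid: int):
    """Return the data of every row version the reader may see."""
    return [row.data for row in block if is_visible(row, reader_xid)]


before_update = visible_rows(50)
at_update = visible_rows(100)
after_update = visible_rows(250)
print(before_update, at_update, after_update)
```

Each reader sees exactly one version of the row: the original for XID 50, and the updated one for XIDs 100 and 250.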

Final Considerations

These examples illustrate simplified visibility rules. Real-world concurrent environments involve additional complexities, such as row locking, multiple isolation levels, and the status of transactions (committed vs. rolled back).

The Impact of MVCC on Index Scans

The analysis thus far has focused on transactions accessing heap blocks directly. In OLTP environments where low latency is critical, a transaction typically targets an index entry before reaching the heap. This raises a key question: how does the MVCC architecture handle index-based access?

Index Storage and Row Locators

Each row in a heap block is identified by a physical ID, serving as its unique address in the heap block. B+ Tree indexes store column values alongside this row locator to precisely identify the row’s position within the heap block.

Impact of Data Versioning on Indexes

The choice of MVCC architecture significantly impacts index maintenance:

DatabaseO: Because the heap block contains the modified data within the same row, the physical ID location could remain constant. Consequently, the indexes do not require updates when data changes.
DatabaseI: Each new row version within the heap block receives a new physical ID. This necessitates updating all associated indexes to point to the new location. That means a lot of Index churn.

Note: If an indexed column value itself is modified, the index entry must be updated in both DatabaseO and DatabaseI.

The Intricacies of Index Versioning

We discussed above data versioning in the heap and its impact on indexes. Indexes, however, may also hold multiple versions of data. Implementing MVCC within an index is inherently more complex than managing versioning within heap blocks. In DatabaseO, heap versioning can simply redirect transactions to read older data versions in separate storage, but index versioning forces both DatabaseI and DatabaseO to store data snapshot versions directly within the index blocks to maintain searchability for active queries. Imagine if a value “5” were deleted from the index and moved to a separate image: a search operation would be unable to locate it in the index structure.

For example, consider updating an indexed value from awesome to outstanding. Due to the balanced nature of tree structures, these entries may reside on different leaf blocks, each assigned specific visibility counters.

DatabaseO:

At INSERT (XID = 0):

“awesome”: XID=0 /* Index Block */

At UPDATE (XID = 100):

“awesome”: XID=0 /* Index Block 1 */

“outstanding”: XID=100 /* Index Block 2 */

In the separate storage, a previous image of Index Block 2 is captured prior to the insertion of the “outstanding” value. Conversely, since no modifications were made to Index Block 1, no previous image was generated for it.

DatabaseI:

At INSERT (XID = 0):

“awesome”: XID1=0, XID2=NULL /* Index Block, Inserted by XID 0 */

At UPDATE (XID = 100):

“awesome”: XID1=0, XID2=100 /* Index Block 1, Inserted by XID 0, Deleted by XID 100 */

“outstanding”: XID1=100, XID2=NULL /* Index Block 2, Inserted by XID 100 */

Querying the New Value: “outstanding”

When DatabaseO receives a query for “outstanding,” it accesses the relevant leaf block and checks the XID in the block header. The system then determines whether to read the current version (if transaction XID >= block header XID) or to seek a before-image (if transaction XID < block header XID), though that before-image does not contain the “outstanding” value itself. Similarly, DatabaseI evaluates the XIDs stored within the index entry to decide if the value should be read.

Querying the Old Value: “awesome”

DatabaseI evaluates the XIDs stored within the index entry to decide if the value should be read. But in DatabaseO retrieving the legacy value “awesome” is more difficult because the system must signal to new transactions that the entry is replaced or deleted. This typically requires visibility counters paired with a dedicated delete bit. In DatabaseO, a query for “awesome” checks the current block for a deletion marker or fetches the older image from separate storage where the delete bit is absent, depending on the transaction’s relationship to the block header XID.

DatabaseO:

At INSERT (XID = 0):

“awesome”: XID=0, Del_Bit=0 /* Index Block */

At UPDATE (XID = 100):

“awesome”: XID=100, Del_Bit=1 /* Index Block 1 */

“outstanding”: XID=100, Del_Bit=0 /* Index Block 2 */

Now previous images of both Index Block 1 and Index Block 2 are captured in the separate storage. It is also important to note that indexes now retain “garbage” (obsolete values), which necessitates a structured cleanup regardless of whether DatabaseO or DatabaseI is used.

Comparing In-Place and Out-of-Place Data Versioning

Now let’s look at the high-level differences between these two versioning models:

Write Performance during Version Creation

At the moment of version creation, DatabaseI appears more efficient. By placing the new version in the same block as the old one, it minimizes I/O; the system fetches a block, produces the new version within that same block, and writes it back in a single operation. In contrast, DatabaseO requires at least two random I/O operations: one to write the heap block and another to write the old image to a separate file.

Read Performance

For read operations, DatabaseI maintains an I/O advantage by fetching only a single block and filtering the versions locally. DatabaseO, however, may need to fetch the current heap page and then may perform an additional fetch from separate storage if the transaction requires a version older than what the current heap block provides. Of course, if a transaction requires the most recent image of the data, both databases would only need to fetch the heap block.

Block Density

In terms of storage density, however, DatabaseI encounters a notable hurdle. Its practice of maintaining multiple versions within the same block causes heap blocks to reach capacity much more quickly. Conversely, DatabaseO stores the pre-update data image in a separate location, allowing the row to occupy essentially the same amount of space within the heap. Both models share a common exception: when a brief value is replaced by a substantially larger one (for example, updating “country” from “USA” to “United States of America”), the block may fill up rapidly regardless of the underlying architecture. This accelerated filling of blocks in DatabaseI creates additional downstream effects on both read and write performance, which will be explored in greater detail in a subsequent section.

Efficiency of the Cleanup Process

The continuous generation of new row versions presents a fundamental challenge in MVCC design. Databases would encounter unrestricted growth without a structured approach to purge outdated data. This maintenance, termed garbage collection, identifies and removes legacy data snapshots once it is certain that no active transaction requires access to them. DatabaseI stores the garbage directly with live data in the heap, while DatabaseO relocates it to a separate storage area.

When it comes to the actual garbage collection, DatabaseO has a distinct advantage. By isolating versioned data in a dedicated area, it can clean up obsolete rows more efficiently. DatabaseI, by contrast, must scan entire datasets to differentiate between live rows and garbage mixed within the heap blocks, making the identification and removal process more complex.

Snapshot Visibility

Snapshot visibility is more straightforward in DatabaseO than in DatabaseI. DatabaseI is at a notable disadvantage because it must maintain two visibility counters for every individual row, whereas DatabaseO only requires a single counter (along with a redirection structure) located within the block header.

Rollback Mechanism

In the event of a transaction rollback, the architectural differences between the two systems become even more apparent. For DatabaseI, the garbage collection process is essentially part of its normal routine; it must eventually clean up a row version regardless of whether the transaction is committed or rolled back, and it simply uses the transaction status to determine which version is obsolete. Conversely, DatabaseO always maintains the most recent image directly in the heap block. To perform a rollback, it must actively undo the changes by restoring the original data from separate storage back to the heap. This creates additional I/O overhead during the rollback process. Moreover, until the rollback is finalized, any other transaction attempting to access that heap block will detect the uncommitted XID in the heap block and be redirected to separate storage to find a consistent version of the data.

Index Churn

Just as with heap blocks, DatabaseI indexes naturally consume more storage because they must maintain visibility counters for every row. Also, the shifting physical addresses of rows in DatabaseI appear to cause significantly higher index churn. While data visibility within the index follows a similar logic across both systems, DatabaseO requires additional bits to track entry deletions.

Closing

Conceptually, both in-place and out-of-place data versioning strategies present distinct advantages and challenges. We will map these fundamental concepts to the practices followed by PostgreSQL and Oracle in Part 2…

Floor Drees: Shaping SQL in São Paulo

Tue, 10 Mar 2026 13:37:56 +0000

Last week, EDB engineers Matheus Alcantara and Euler Taveira attended the ISO/IEC SQL Standards Committee meeting in São Paulo as invited guests, supported remotely by veteran member Peter Eisentraut. The duo compared the collaborative environment to a PostgreSQL "Commitfest," where technical papers are proposed, debated, and refined much like code patches.

Andrew Dunstan: Validating the shape of your JSON data

Tue, 10 Mar 2026 10:13:17 +0000

One of the great things about PostgreSQL's jsonb type is the flexibility it gives you — you can store whatever structure you need without defining columns up front. But that flexibility comes with a trade-off: there's nothing stopping bad data from getting in. You can slap a CHECK constraint on a jsonb column, but writing validation logic in SQL or PL/pgSQL for anything beyond the trivial gets ugly fast.

I've been working on a PostgreSQL extension called json_schema_validate that solves this problem by letting you validate JSON and JSONB data against JSON Schema specifications directly in the

Dave Page: AI Features in pgAdmin: The AI Chat Agent

Tue, 10 Mar 2026 05:44:17 +0000

This is the second in a series of three blog posts covering the new AI functionality in pgAdmin 4. In the first post, I covered LLM configuration and the AI-powered analysis reports. In this post, I'll introduce the AI Chat agent in the query tool, and in the third, I'll explore the AI Insights feature for EXPLAIN plan analysis.

If you've ever found yourself staring at a database schema you didn't design, trying to work out the right joins to answer a seemingly simple question, you'll appreciate what the AI Chat agent brings to pgAdmin's query tool. Rather than having to alt-tab to an external AI service, paste in your schema, describe what you need, and then copy the resulting SQL back into your editor, the entire conversation now happens within the query tool itself, with full awareness of your actual database structure.

Finding the AI Assistant

The AI Chat agent appears as a new tab alongside the Query and Query History tabs in the left panel of the query tool. It's labelled 'AI Assistant' and is only visible when an LLM provider has been configured (as described in the first post in this series). The panel header shows which LLM provider and model are currently active, so you always know what's generating your responses.

Natural Language to SQL

The core capability of the AI Chat agent is translating natural language questions into SQL queries. You type what you want to know in plain English (or whatever language you're comfortable with), and the assistant generates the corresponding SQL, complete with an explanation of what it does and why it was written that way.

For example, you might type something like:

The assistant will first inspect your database schema to understand the available tables and relationships, then generate an appropriate query. The response includes both the SQL and a brief explanation, so you can understand what the query is doing before you run it.

What makes this particularly useful is that the assistant doesn't just guess at your schema; it actively inspects the database using a set of tools that allow it to discover schemas, tables, columns, constraints, and indexes. This means the generated SQL uses your actual table and column names, respects your foreign key relationships, and takes advantage of your existing indexes where appropriate.

How the Agent Works

Behind the scenes, the AI Chat agent operates as a tool-using LLM agent with access to four database inspection tools:

get_database_schema: Lists all schemas, tables, and views in the connected database

get_table_info: Retrieves detailed column, constraint, and index information for a specific table

get_table_columns: Gets column names, data types, nullability, and defaults for a table

execute_sql_query: Runs read-only SELECT queries to understand data structure and content

When you send a message, the assistant typically begins by calling get_database_schema to understand what tables are available, then drills into specific tables with get_table_info to understand columns and relationships, and finally constructs the appropriate SQL. This tool-use loop can iterate multiple times for complex requests; the assistant might need to inspect several tables, check column types, or even run a quick exploratory query before it can generate the final answer.

All of this happens within a strict safety boundary. The execute_sql_query tool runs exclusively within a read-only transaction, results are capped at 1,000 rows, and the maximum number of tool call iterations is configurable (defaulting to 20) through the preferences. The assistant cannot modify your data; it can only read and inspect the database structure.

Working with Generated SQL

When the assistant generates a SQL query, it's presented in a syntax-highlighted code block with three action buttons:

Copy: Copies the SQL to your clipboard

Insert at Cursor: Inserts the SQL at the current cursor position in the query editor, which is handy if you want to incorporate it into a larger script

Replace Query: Replaces the entire contents of the query editor with the generated SQL

The generated SQL is automatically formatted according to your editor preferences for keyword case, identifier case, data type case, and function case, so it blends naturally with the rest of your code.

Conversational Context

The chat maintains a full conversation history within the session, so you can refine your requests iteratively. If the first query isn't quite what you wanted, you can say something like "Actually, filter that to just orders from the last 30 days" and the assistant will adjust the previous query accordingly. The assistant is also smart enough to ask clarifying questions when your request is ambiguous; if you ask for 'the users table' but there are multiple schemas each containing a users table, it will ask which one you mean rather than guessing.

You can navigate through your previous messages using the up and down arrow keys, much like command-line history, which is convenient when you want to rephrase or resubmit an earlier question. The Shift+Enter combination lets you type multi-line messages, whilst pressing Enter on its own sends the message.

Beyond SELECT Queries

The AI Chat agent isn't limited to SELECT queries. It can generate INSERT, UPDATE, DELETE, and DDL statements as well. If you ask it to "add a created_at timestamp column to the users table with a default of now()", it will generate the appropriate statement. For UPDATE and DELETE operations, the assistant is instructed to always include WHERE clauses, providing a useful safety net against accidentally modifying every row in a table.

That said, it's worth emphasising that the generated SQL is always presented for your review before execution. The assistant never runs modification queries automatically; it generates the SQL and presents it to you, and you decide whether to run it. This keeps you firmly in control.

Streaming Responses

Responses are streamed to the browser via Server-Sent Events (SSE), so you see progress in real time rather than waiting for the complete response. Whilst the assistant is working, you'll see animated thinking messages with PostgreSQL-themed phrases such as 'Consulting the elephant...', 'Traversing the B-tree...', and 'Vacuuming the catalog...' that rotate every couple of seconds to let you know the analysis is in progress. If a request is taking too long (there is a five-minute timeout), you can click the Stop button to cancel the in-flight request and try a different approach.

Practical Tips

Having worked with the AI Chat agent extensively during development, here are a few observations that might help you get the most from it:

Be specific about what you want. "Show me user activity" is vague, but "show me the number of logins per day for the last month, grouped by user role" gives the assistant enough context to generate precise SQL.

Use it for exploration. When you're working with an unfamiliar database, asking questions like "what tables contain customer data?" or "how are orders related to products?" can be faster than manually browsing through the schema tree.

Review the generated SQL before running it. The assistant is generally very good, but it's working with an LLM under the hood, and LLMs can occasionally produce incorrect or suboptimal queries. Always review what's been generated, especially for modification operations.

Take advantage of the conversation flow. Start broad and refine iteratively; it's much more natural than trying to specify everything in a single message.

What's Next

In the final post in this series, I'll cover the AI Insights feature in the EXPLAIN plan viewer, which analyses your query execution plans and provides actionable optimisation recommendations, including specific index creation statements that you can insert directly into the editor. If you've ever found EXPLAIN output difficult to interpret, this feature is for you.

Yuwei Xiao: Introducing pg_duckpipe: Real-Time CDC for Your Lakehouse

Tue, 10 Mar 2026 00:00:00 +0000

Automatically keep a fast, analytical copy of your PostgreSQL tables, updated in real time with no external tools needed.

Umair Shahid: Thinking of PostgreSQL High Availability as Layers

Mon, 09 Mar 2026 14:03:16 +0000

High availability for PostgreSQL is often treated as a single, big, dramatic decision: “Are we doing HA or not?”

That framing pushes teams into two extremes:

a “hero architecture” that costs a lot and still feels tense to operate, or
a minimalistic architecture that everyone hopes will just keep running.

A calmer way to design this is to treat HA and DR as layers. You start with a baseline, then add specific capabilities only when your RPO/RTO and budget justify them.

Let us walk through the layers from “single primary” to “multi-site DR posture”.

Start with outcomes

Before topology, align on three things:

1. Failure scope

- A database host fails
- A zone or data center goes away
- A full region outage happens
- Human error

2. RPO (Recovery Point Objective)

- We can tolerate up to 15 minutes of data loss
- We want close to zero

3. RTO (Recovery Time Objective)

- We can be back in 30 minutes
- We want service back in under 2 minutes

Here is my stance (and it saves money!): You get strong availability outcomes by layering in the right order.

Layer 0 – Single primary (baseline, no backups)

This is the baseline: one PostgreSQL primary in one site. All reads and writes go to it.

That is it. No replicas. No archiving. No backup flow in this model.

What you get:

simplicity
low cost
low operational overhead

What it means operationally:

Your “recovery plan” is effectively “rebuild and rehydrate from wherever you can” (which might be infrastructure snapshots, application-level rebuilds, or other ad hoc processes depending on your environment).
Your availability depends heavily on the stability of the underlying host, storage, and platform.

If you are running Layer 0, the best mindset is: keep it stable and observable.

solid monitoring (latency, errors, saturation)
sane maintenance (bloat, stats, connection hygiene)
predictable change management

Layer 0 is not a “bad” architecture. It is simply the baseline. The moment you want a reliable recovery posture, you move to Layer 1.

Layer 1 – Add offsite backups (your first real safety net)

Layer 1 keeps the same single primary in Site A, and adds backup storage in Site B.

This model introduces a defined recovery path.

What you gain:

You can lose the primary server and still recover your data.
You can meet an RPO that is “last successful backup” (which is often perfectly acceptable for many systems).

Practical ways teams implement this:

pgBackRest or Barman sending backups to object storage (often in another region/account)
retention policies that reflect compliance and business needs

An important point to note here – a backup is only as good as its ‘restorability’. If you can’t restore a backup, there is no point in taking one. Best practice is to run periodic drills to test the restore procedure, measure the time it takes, and verify the data it restores.

Layer 2 – Add WAL archiving (PITR-ready recovery)

Layer 2 builds on Layer 1 by adding WAL archiving from Site A to Site B.

This is where recovery becomes precise and continuous.

Backups alone restore you to “the last backup.” WAL archiving lets you restore to a point in time.

What you gain:

PITR (Point-in-Time Recovery)
Tighter RPO
A clean response to human error

The habit that makes this layer valuable:

restore drills
timed drills
runbooks that a tired engineer can follow at 2 AM

Layer 2 is one of the highest-ROI layers in the entire model because it turns recovery into a controlled process rather than improvisation.
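
One way to make the "clean response to human error" concrete is to label a known-good moment in the WAL stream before a risky change. This is a minimal sketch, assuming WAL archiving is already working and the statements run on the primary; the restore point name is hypothetical.

```sql
-- Mark a named point in the WAL stream before a risky change.
-- 'before_orders_migration' is a hypothetical label.
SELECT pg_create_restore_point('before_orders_migration');

-- Optionally close the current WAL segment so it is handed to the archiver promptly.
SELECT pg_switch_wal();
```

During a point-in-time restore, the recovery_target_name setting can then refer to that label instead of a guessed timestamp.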

Layer 3 – Add a hot standby

Layer 3 keeps backups + WAL archiving, and adds a hot standby in Site A (often in a different zone or DC).

Primary → standby uses asynchronous streaming replication.

What you gain:

much faster RTO (fail over to the standby instead of rebuilding)
the option for load balancing (route read queries to the standby)
planned switchovers for maintenance that do not disrupt operations

Additional monitoring requirements:

replication lag
WAL generation rate
standby replay delay
failover readiness
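
The first three items in the list above can be read from the standard statistics views. A minimal sketch, assuming a streaming replication setup like the one described; alert thresholds are left to you.

```sql
-- On the primary: one row per connected standby.
SELECT application_name,
       state,
       sync_state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes,
       replay_lag
FROM pg_stat_replication;

-- On the standby: time since the last replayed transaction was committed on the primary.
SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;
```

Note that the standby-side value also grows while the primary is idle, so it is best read together with the byte lag from the primary.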

This is also where teams choose between:

disciplined manual failover
automatic failover using an HA manager

Either path works when it is tested and documented.

Layer 4 – Add synchronous replication

Layer 4 is where teams typically run a primary and multiple standbys, using:

synchronous replication for stronger data guarantees, and
asynchronous replication for flexibility and additional redundancy.

What you gain:

near-zero data loss for transactions protected by synchronous commit

What you accept:

added write latency
more explicit failure handling

An important part of the policy:

When the synchronous standby is unavailable, do you prefer continued writes (async mode) or do you prefer waiting until sync returns?

Teams that decide this up front operate Layer 4 calmly. Teams that leave it implicit tend to discover their “real” policy during an incident.
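
As a sketch of what writing the policy down can look like: with a non-empty synchronous_standby_names and no synchronous standby available, commits wait. Continuing writes means someone (an operator or an HA manager) has to change the setting. The standby names below are hypothetical application_name values.

```sql
-- Require one of two named standbys to confirm each commit.
-- standby_a and standby_b are hypothetical application_name values.
ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (standby_a, standby_b)';
SELECT pg_reload_conf();

-- Check which standby is currently acting as the synchronous one.
SELECT application_name, sync_state FROM pg_stat_replication;

-- "Continue writes" policy: clearing the list makes commits stop waiting.
-- ALTER SYSTEM SET synchronous_standby_names = '';
-- SELECT pg_reload_conf();
```

Whether the commented-out fallback is run by hand, by automation, or never is exactly the decision this layer asks you to make explicit.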

Layer 5 – Add a warm standby in Site B

Layer 5 is where you treat a second site as a true recovery location, adding regional redundancy.

You keep your HA setup in Site A and maintain a warm standby in Site B, fed by backups and WAL archives that are continuously applied to the standby node.

What you gain:

a cleaner plan for site-level outages
a faster recovery path to Site B, reducing RTO

This layer also forces a useful reality check: DR is not only a database design. You also want:

routing (DNS/LB) that can switch cleanly
application configuration that supports failover
secrets and access that work in the DR site
rehearsed runbooks

When those pieces are ready, Layer 5 feels like a controlled switchover instead of a high-stress scramble.

Common gotchas that show up in production

These are the ones I see repeatedly:

Backups exist; restore is untested. At best, this is Schrödinger’s backup – and you will only know when there is an outage.
WAL archiving is configured but not monitored. You want to make sure the consumer is consuming the files, so they don’t pile up on the producer.
Replication slots retain WAL longer than expected. This needs to be monitored, and you need to ask ‘why’.
Synchronous replication without a clear failure policy. Write the rule down, test it, and make it visible to the on-call team.
Read traffic routed to standbys without thinking about staleness. Replica reads are great when you choose the right queries and accept the consistency model.
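
Two of these gotchas, unmonitored archiving and replication slots retaining WAL, can be checked with catalog queries. A minimal sketch, assuming it runs on the primary (the wal_status column is available from PostgreSQL 13):

```sql
-- Archiving health: a growing failed_count or a stale last_archived_time needs attention.
SELECT archived_count, last_archived_wal, last_archived_time,
       failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver;

-- WAL retained on behalf of each replication slot, largest first.
SELECT slot_name, slot_type, active, wal_status,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC NULLS LAST;
```

An inactive slot with a large retained_wal value is usually the first place to ask "why".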

The post Thinking of PostgreSQL High Availability as Layers appeared first on Stormatics.

Cornelia Biacsics: Contributions for week 9, 2026

Mon, 09 Mar 2026 10:31:43 +0000

The community met on Wednesday, March 4, 2026 for the 7th PostgreSQL User Group NRW MeetUp (Cologne, ORDIX AG). It was organised by Dirk Krautschick and Andreas Baier.

Speakers:

Robin Riel
Jan Karremans

PostgreSQL Berlin March 2026 Meetup took place on March 5, 2026 organized by Andreas Scherbaum and Sergey Dudoladov.

Speakers:

Andreas Scherbaum
Tudor Golubenco
Narendra Tawar
Kai Wagner

Kai Wagner wrote about his experience at the meetup: PostgreSQL Berlin Meetup - March 2026

Andreas Scherbaum wrote a blog posting about the Meetup.

SCALE 23x (March 5-8, 2026) had a dedicated PostgreSQL track, filled by the following contributions

Trainings:

Elizabeth Christensen
Devrim Gunduz
Ryan Booz

Talks:

Nick Meyer
Tristan Ahmadi
Alexandra Wang
Christophe Pettus
Max Englander
Magnus Hagander
Bruce Momjian
Robert Treat
Payal Singh
German Eichberger
Jimmy Angelakos
Justin Frye

SCALE 23x PostgreSQL Booth volunteers:

Bruce Momjian
Christine Momjian
Gabrielle Roth
Jennifer Scheuerell
Magnus Hagander
Devrim Gunduz
Elizabeth Garret Christensen
Robert Treat
Pavlo Golub
Phill Vacca
Jimmy Angelakos
Erika Miller
Aya Griswold
Alex Wood
Donald Wong
Derya Gumustel

Dave Page: AI Features in pgAdmin: Configuration and Reports

Mon, 09 Mar 2026 05:31:29 +0000

This is the first in a series of three blog posts covering the new AI functionality coming in pgAdmin 4. In this post, I'll walk through how to configure the LLM integration and introduce the AI-powered analysis reports; in the second, I'll cover the AI Chat agent in the query tool; and in the third, I'll explore the AI Insights feature for EXPLAIN plan analysis.

Anyone who manages PostgreSQL databases in a professional capacity knows that keeping on top of security, performance, and schema design is an ongoing endeavour. You might have a checklist of things to review, or perhaps you rely on experience and intuition to spot potential issues, but it is all too easy for something to slip through the cracks, especially as databases grow in complexity. We've been thinking about how AI could help with this, and I'm pleased to introduce a suite of AI-powered features in pgAdmin 4 that bring large language model analysis directly into the tool you already use every day.

Configuring the LLM Integration

Before any of the AI features can be used, you'll need to configure an LLM provider. pgAdmin supports four providers out of the box, giving you flexibility to choose between cloud-hosted models and locally-running alternatives:

Anthropic (Claude models)

OpenAI (GPT models)

Ollama (locally-hosted open-source models)

Docker Model Runner (built into Docker Desktop 4.40 and later)

Server Configuration

At the server level, there is a master switch in (or, more typically, ) that controls whether AI features are available at all:

When is set to , all AI functionality is hidden from users and cannot be enabled through preferences. This gives administrators full control over whether AI features are permitted in their environment, which is particularly important in organisations with strict data governance policies.

Below the master switch, you'll find default configuration for each provider:

For the cloud providers (Anthropic and OpenAI), API keys are read from files on disk rather than being stored directly in the configuration, which is a deliberate security choice. The key file should contain nothing but the API key itself, with no additional whitespace or formatting. For Ollama and Docker Model Runner, you simply provide the API URL for the local service (typically for Ollama and for Docker).

User Preferences

Whilst the server configuration sets the defaults and boundaries, individual users can customise their AI settings through the Preferences dialog under the 'AI' section. The preferences are organised into categories:

AI Configuration contains the general settings:

Default Provider: Users can select their preferred provider from a dropdown, or choose 'None (Disabled)' to turn off AI features for their account. This setting only takes effect if LLM_ENABLED is True in the server configuration.

Max Tool Iterations: Controls how many tool call rounds the AI is allowed to perform during a single conversation, with a default of 20. Higher values allow more complex analyses but consume more resources.

Each provider has its own category with provider-specific settings:

Anthropic: API Key File path and Model selection

OpenAI: API Key File path and Model selection

Ollama: API URL and Model selection

Docker Model Runner: API URL and Model selection

One particularly nice touch is that the model selection dropdowns are populated dynamically. When you configure an API key or URL and click the refresh button, pgAdmin queries the provider's API to fetch the list of available models. For Ollama, it even shows the model sizes so you can see at a glance how much disk space each model is using. The model selectors also support typing in custom model names, so you're not limited to whatever the API returns; if you know the exact model identifier you want to use, you can simply type it in.

AI Analysis Reports

With the LLM configured, you gain access to three types of AI-powered analysis reports that can be generated from the browser tree context menu. Simply right-click on a server, database, or schema and select the appropriate report from the 'AI Analysis' submenu.

Security Reports

The security report examines your PostgreSQL configuration from a security perspective, covering a comprehensive range of areas:

Authentication Configuration: Password policies, SSL/TLS settings, authentication methods, and connection security

Access Control and Roles: Superuser accounts, privileged roles, login roles without password expiry, and role privilege assignments

Network Security: Listen addresses, connection limits, and pg_hba.conf rules

Encryption and SSL: SSL/TLS configuration, password encryption methods, and data-at-rest encryption settings

Object Permissions: Schema, table, and function access control lists, default privileges, and ownership (at database scope)

Row-Level Security: RLS policies, RLS-enabled tables, and policy coverage analysis

Security Definer Functions: Functions running with elevated privileges and their permission settings

Audit and Logging: Connection logging, statement logging, error logging, and audit trail configuration

Extensions: Installed extensions and their security implications

Security reports can be generated at the server level (covering server-wide configuration such as authentication and network settings), the database level (adding object permissions and RLS analysis), or the schema level (focusing on a specific schema's security posture).
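
To give a feel for the metadata behind a couple of these areas, here is a sketch of catalog queries you could run by hand. These are illustrative assumptions only, not the queries pgAdmin itself executes.

```sql
-- Login roles that are superusers or have no password expiry set.
SELECT rolname, rolsuper, rolvaliduntil
FROM pg_roles
WHERE rolcanlogin
  AND (rolsuper OR rolvaliduntil IS NULL)
ORDER BY rolsuper DESC, rolname;

-- Tables with row-level security enabled, and whether it is also forced for the owner.
SELECT relnamespace::regnamespace AS schema_name, relname, relforcerowsecurity
FROM pg_class
WHERE relkind IN ('r', 'p')
  AND relrowsecurity;
```

Both queries read only catalog metadata, which matches the kind of input the reports are described as using.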

Performance Reports

The performance report analyses your server and database configuration for potential optimisation opportunities:

Memory Configuration: shared_buffers, work_mem, effective_cache_size, maintenance_work_mem, and related settings

Checkpoint and WAL: Checkpoint settings, WAL configuration, and background writer statistics

Autovacuum Configuration: Autovacuum settings, tables needing vacuum, and dead tuple accumulation

Query Planner Settings: Cost parameters, statistics targets, JIT compilation, and planner optimisation settings

Parallelism and Workers: Parallel query configuration and worker process settings

Connection Management: Maximum connections, reserved connections, timeouts, and current connection status

Cache Efficiency: Buffer cache hit ratios, database-level cache statistics, and table-level I/O patterns

Index Analysis: Index utilisation, unused indexes, tables that might benefit from additional indexes, and index size analysis

Query Performance: Slowest queries and most frequent queries (when pg_stat_statements is available)

Replication Status: Replication lag, standby status, and WAL sender statistics

Performance reports are available at both the server and database levels, with database-level reports including additional detail on index usage and cache efficiency for that specific database.
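
As a rough illustration of the statistics behind the cache efficiency and index analysis sections, the following hand-written queries read the standard statistics views. They are a sketch under my own assumptions, not pgAdmin's actual queries.

```sql
-- Buffer cache hit ratio per database.
SELECT datname,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS cache_hit_pct
FROM pg_stat_database
WHERE datname IS NOT NULL;

-- Indexes never scanned since statistics were last reset, largest first.
SELECT schemaname, relname, indexrelname, idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```

An index with idx_scan = 0 may still back a primary key or unique constraint, so treat the second result as a list of candidates to review rather than a drop list.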

Schema Design Reports

The design review report examines your database schema for structural quality and best practices:

Table Structure: Table definitions, column counts, sizes, ownership, and documentation coverage

Primary Key Analysis: Primary key design and tables lacking primary keys

Referential Integrity: Foreign key relationships, orphan references, and relationship coverage

Index Strategy: Index definitions, duplicate indexes, index types, and coverage analysis

Constraints: Check constraints, unique constraints, and data validation coverage

Normalisation Analysis: Repeated column patterns, potential denormalisation issues, and data redundancy

Naming Conventions: Table and column naming patterns, consistency analysis, and naming standard compliance

Data Type Review: Data type usage patterns, type consistency, and type appropriateness

Design reports are available at the database and schema levels, allowing you to review either an entire database's schema design or focus on a specific schema.
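
One of the checks listed above, tables lacking primary keys, is easy to sketch directly against the catalogs. This is an illustrative query of my own, not the one pgAdmin runs.

```sql
-- Ordinary tables outside the system schemas that have no primary key constraint.
SELECT c.relnamespace::regnamespace AS schema_name,
       c.relname AS table_name
FROM pg_class c
WHERE c.relkind = 'r'
  AND c.relnamespace::regnamespace::text NOT IN ('pg_catalog', 'information_schema')
  AND NOT EXISTS (
        SELECT 1
        FROM pg_constraint con
        WHERE con.conrelid = c.oid
          AND con.contype = 'p'
      )
ORDER BY 1, 2;
```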

How the Reports Work

Under the hood, the report generation follows a sophisticated multi-stage pipeline that keeps each LLM interaction within manageable token limits whilst still producing comprehensive output:

Planning: The LLM first reviews the available analysis sections and the database context (server version, table count, available extensions, and so on), then selects which sections are most relevant to analyse. This means the report is tailored to your specific environment rather than running every possible check regardless of applicability.

Data Gathering: For each selected section, pgAdmin executes a set of SQL queries against the database to collect the relevant configuration data, statistics, and metadata.

Section Analysis: Each section's data is sent to the LLM independently for analysis. The LLM classifies findings by severity (Critical, Warning, Advisory, or Good) and provides specific, actionable recommendations, including SQL commands where relevant.

Synthesis: Finally, the individual section analyses are combined into a cohesive report with an executive summary, a critical issues section aggregating the most important findings, the detailed section analyses, and a prioritised list of recommendations.

As the pipeline works through these stages, the UI shows real-time progress updates: the current stage name (Planning Analysis, Gathering Data, Analysing Sections, Creating Report), a description of what's being processed (for example, 'Analysing Memory Configuration...'), and a progress bar showing how many sections have been completed out of the total. Once all four stages are finished, the completed report is rendered in the panel in one go. Each report can also be downloaded as a Markdown file for archiving or sharing with colleagues.

The reports are designed to be genuinely useful rather than generic. Because the LLM receives actual data from your database (configuration settings, role definitions, table statistics, and index information), its analysis is grounded in reality. A security report will flag your specific rules that might be overly permissive, a performance report will identify your specific tables that are missing useful indexes, and a design report will point out your specific naming inconsistencies.

A Note on Privacy and Data

It is worth noting that when using cloud-hosted LLM providers (Anthropic or OpenAI), the database metadata and configuration data gathered for reports is sent to those providers' APIs. No actual table data is sent for the reports (only metadata, configuration settings, and statistics), but administrators should be aware of this and ensure it aligns with their organisation's data handling policies. For environments where sending any data externally is not acceptable, the Ollama and Docker Model Runner options allow you to run models entirely locally.

Getting Started

If you'd like to try the AI features, the quickest way to get started is to configure an API key for either Anthropic or OpenAI, set the default provider in Preferences, and then right-click on a server in the browser tree to generate your first report. If you prefer to keep everything local, installing Ollama and pulling a model such as is straightforward, and Docker Desktop users on version 4.40 or later can enable the built-in model runner without any additional setup.

In the next post, I'll cover the AI Chat agent in the query tool, which brings natural language to SQL translation directly into your workflow, along with database-aware conversational assistance. Stay tuned.

Radim Marek: Production Query Plans Without Production Data

Sun, 08 Mar 2026 21:15:56 +0000

In the previous article we covered how the PostgreSQL planner reads pg_class and pg_statistic to estimate row counts, choose join strategies, and decide whether an index scan is worth it. The message was clear: when statistics are wrong, everything else goes with it.

Streaming replication is a bit-for-bit copy, so all replicas share the same statistics as the primary server.

But there was one thing we didn't talk about. Statistics are specific to the database cluster that generated them. The primary way to populate them is `ANALYZE`, which requires the actual data.

PostgreSQL 18 changed that. Two new functions, pg_restore_relation_stats and pg_restore_attribute_stats, write numbers directly into the catalog tables. Combined with pg_dump --statistics-only, you can treat optimizer statistics as a deployable artifact: compact, portable, plain SQL.

The feature was driven by the upgrade use case. In the past, major version upgrades left pg_statistic empty, forcing you to run ANALYZE, which might take hours on large clusters. With PostgreSQL 18, upgrades now transfer statistics automatically. But that's just the beginning. The same logic lets you export statistics from production and inject them anywhere: a test database, a local debugging session, or a CI pipeline.

The problem

Your CI database has 1,000 rows. Production has 50 million. The planner makes completely different decisions for each. Running EXPLAIN in CI tells you nothing about the production plan. This is the core premise behind RegreSQL. Catching query plan regressions in CI is far more reliable when the planner sees production-scale statistics.

The same applies to debugging. A query is slow in production and you want to reproduce the plan locally, but your database has different statistics, and the planner chooses the path you would expect for a small local data set. Porting production statistics gives you a snapshot of the reasoning the planner does in production, without actually going to production.

pg_restore_relation_stats

The first function behind portable PostgreSQL statistics is pg_restore_relation_stats. It writes table-level data directly into pg_class, taking its arguments as variadic name/value pairs.

SELECT pg_restore_relation_stats(
    'schemaname', 'public',
    'relname', 'orders',
    'relpages', 123513::integer,
    'reltuples', 50000000::real,
    'relallvisible', 123513::integer,
    'relallfrozen', 120000::integer
);

But that's just an example. Let's modify some real statistics to see the full value. We will create a small table, inject fake production-like statistics, and watch the planner change its mind.

CREATE TABLE test_orders (
    id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id integer NOT NULL,
    amount numeric(10,2) NOT NULL,
    status text NOT NULL DEFAULT 'pending',
    created_at date NOT NULL DEFAULT CURRENT_DATE
);

INSERT INTO test_orders (customer_id, amount, status, created_at)
SELECT
    (random() * 9999 + 1)::int,
    (random() * 5000 + 5)::numeric(10,2),
    (ARRAY['pending','shipped','delivered','cancelled'])[floor(random()*4+1)::int],
    '2024-01-01'::date + (random() * 365)::int
FROM generate_series(1, 10000);

CREATE INDEX ON test_orders (created_at);
CREATE INDEX ON test_orders (status);
ANALYZE test_orders;

When you check the current statistics, they show what you would expect.

SELECT relname, relpages, reltuples
FROM pg_class WHERE relname = 'test_orders';

   relname   | relpages | reltuples
-------------+----------+-----------
 test_orders |       74 |     10000
(1 row)

With 10,000 rows across 74 pages, the planner picks a sequential scan.

EXPLAIN SELECT * FROM test_orders WHERE created_at > '2024-06-01';

                           QUERY PLAN
-----------------------------------------------------------------
 Seq Scan on test_orders  (cost=0.00..199.00 rows=5891 width=26)
   Filter: (created_at > '2024-06-01'::date)
(2 rows)

Now inject production-scale table stats:

SELECT pg_restore_relation_stats(
    'schemaname', 'public',
    'relname', 'test_orders',
    'relpages', 123513::integer,
    'reltuples', 50000000::real,
    'relallvisible', 123513::integer
);

And you might be surprised by the result.

EXPLAIN SELECT * FROM test_orders WHERE created_at > '2024-06-01';

                            QUERY PLAN
------------------------------------------------------------------
 Seq Scan on test_orders  (cost=0.00..448.45 rows=17649 width=26)
   Filter: (created_at > '2024-06-01'::date)

The planner is still using a sequential scan. Only the estimated number of rows has changed. Why? If you remember from the previous article, this is where column-level statistics come into play. The histogram bounds still describe the original 10,000 rows we inserted.
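
To see the column-level statistics the planner is still working from, you can read them back through the pg_stats view. A minimal sketch, assuming the test_orders table created above lives in the public schema:

```sql
-- Column-level statistics currently stored for created_at.
SELECT attname, n_distinct, correlation, histogram_bounds
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename  = 'test_orders'
  AND attname    = 'created_at';
```

At this point the histogram bounds are the ones ANALYZE computed from the 10,000 inserted rows, regardless of what pg_class now claims about the table size.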

pg_restore_attribute_stats

This function writes column-level statistics into pg_statistic, the same catalog that ANALYZE populates with MCVs, histograms, and correlation.

In the previous section, we left the planner stuck on a sequential scan despite believing the table has 50 million rows. The missing piece is column-level statistics. Let's pick up where we left off and inject histogram bounds for created_at.

SELECT pg_restore_attribute_stats(
    'schemaname', 'public',
    'relname', 'test_orders',
    'attname', 'created_at',
    'inherited', false::boolean,
    'null_frac', 0.0::real,
    'avg_width', 4::integer,
    'n_distinct', -0.05::real,
    'histogram_bounds', '{2019-01-01,2019-07-01,2020-01-01,2020-07-01,2021-01-01,2021-07-01,2022-01-01,2022-07-01,2023-01-01,2023-07-01,2024-01-01}'::text,
    'correlation', 0.98::real
);

Now the planner knows the data spans 5 years. A query filtering on the last 6 months of 2024 covers a narrow slice.

EXPLAIN SELECT * FROM test_orders WHERE created_at > '2024-06-01';

                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Index Scan using test_orders_created_at_idx on test_orders  (cost=0.29..153.21 rows=6340 width=26)
   Index Cond: (created_at > '2024-06-01'::date)

Histogram bounds divide the non-MCV portion of the data into equal-population buckets. If most_common_vals accounts for most of the data, the histogram covers only the remaining tail. The number of buckets is controlled by default_statistics_target (default 100, meaning 101 bounds).

And that's a plan flip! The histogram tells the planner the data spans 2019–2024, so > '2024-06-01' matches a narrow tail. A small fraction of 50 million rows. The index scan that was ignored before is now the obvious choice. Table-level stats set the scale, column-level stats shaped the selectivity, and together they changed the plan.

The correlation statistic tells the planner how closely the physical row order matches the column's sort order. A value near 1.0 means sequential access patterns, which makes an index scan cheaper because the next row is likely on the same or an adjacent page. For time-series data like created_at, where rows are inserted chronologically, correlation is typically very high.

Injecting a skewed distribution

The same function handles MCV lists. In production, your status column isn't uniform: 95% of orders are delivered and 1.5% are pending.

SELECT pg_restore_attribute_stats(
    'schemaname', 'public',
    'relname', 'test_orders',
    'attname', 'status',
    'inherited', false::boolean,
    'null_frac', 0.0::real,
    'avg_width', 9::integer,
    'n_distinct', 5::real,
    'most_common_vals', '{delivered,shipped,cancelled,pending,returned}'::text,
    'most_common_freqs', '{0.95,0.015,0.015,0.015,0.005}'::real[]
);

You can see

EXPLAIN SELECT * FROM test_orders WHERE status = 'pending';

                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Bitmap Heap Scan on test_orders  (cost=8.93..90.42 rows=599 width=27)
   Recheck Cond: (status = 'pending'::text)
   ->  Bitmap Index Scan on test_orders_status_idx  (cost=0.00..8.78 rows=599 width=0)
         Index Cond: (status = 'pending'::text)
(4 rows)

and compare it with

EXPLAIN SELECT * FROM test_orders WHERE status = 'delivered';

                            QUERY PLAN
------------------------------------------------------------------
 Seq Scan on test_orders  (cost=0.00..448.45 rows=28458 width=27)
   Filter: (status = 'delivered'::text)
(2 rows)

Same column, same operator, different plans. The planner uses a bitmap index scan for pending (1.5%, rare enough to justify the index) and a sequential scan for delivered (95%, most of the table). The selectivity ratios from the MCV list drive the plan choice.

You might have noticed the row estimates (599 and 28,458) are lower than you'd expect for a 50-million-row table. The planner checks the actual physical file size. Our table is only 74 pages on disk, not the 123,513 we injected. Hence the planner scales reltuples down proportionally. The absolute numbers shrink, but the ratios between them stay correct, and it's the ratios that determine plan shape. When you use pg_dump --statistics-only in practice, you're typically restoring into a database with comparable data volume, so the estimates align naturally.
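
The scaling can be reproduced with plain arithmetic. This is a simplified sketch of the calculation described above, using the injected values, the 74 pages shown earlier, and the 0.95 frequency for delivered; the planner's internal rounding may differ in detail.

```sql
-- Actual number of pages on disk for the table.
SELECT pg_relation_size('test_orders') / current_setting('block_size')::int AS actual_pages;

-- Injected density (reltuples / relpages) applied to the 74 real pages,
-- then multiplied by the MCV frequency of 'delivered'.
SELECT round(50000000.0 / 123513 * 74)        AS scaled_tuples,   -- 29956
       round(round(50000000.0 / 123513 * 74) * 0.95) AS delivered_rows; -- 28458
```

The second result matches the rows=28458 estimate in the sequential scan plan for status = 'delivered'.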

pg_dump

The functions we covered are the mechanics. For operational use, pg_dump provides everything you need. PostgreSQL 18 added three flags.

Flag	Effect
`--statistics`	dump the statistics (you have to request it explicitly)
`--statistics-only`	dump only the statistics, not schema or data
`--no-statistics`	do not dump statistics

When you export the statistics for your production database

pg_dump --statistics-only -d production_db > stats.sql

you will see that the output is a series of SELECT pg_restore_relation_stats(...) and SELECT pg_restore_attribute_stats(...) calls, exactly as we explained above.

The full workflow to turn your production data into testable plans might look like this:

# 1. dump schema from production
pg_dump --schema-only -d production_db > schema.sql

# 2. dump statistics from production
pg_dump --statistics-only -d production_db > stats.sql

# 3. create test database with schema
createdb test_db
psql -d test_db -f schema.sql

# 4. load fixture data (optional; masked, minimal)
psql -d test_db -f fixtures.sql

# 5. inject production statistics
psql -d test_db -f stats.sql

# 6. query plans now match production
psql -d test_db -c "EXPLAIN SELECT * FROM test_orders WHERE status = 'pending'"

Statistics dumps are tiny. A database with hundreds of tables and thousands of columns produces a statistics dump under 1MB. The production data might be hundreds of GB. The statistics that describe it fit in a text file.

Keeping injected statistics alive

Now you might ask yourself, where's the catch? And there's a big one: autovacuum will eventually kick in and run ANALYZE, which will overwrite your injected statistics with real numbers, and you are back where you started.

To prevent this, disable autovacuum analyze on the tables you've injected.

-- disable autovacuum
ALTER TABLE test_orders SET (autovacuum_enabled = false);

-- or set the analyze threshold so high it never kicks in
ALTER TABLE test_orders SET (autovacuum_analyze_threshold = 2147483647);

Be careful here.

If you're also writing data to these tables in dev (running migrations, loading fixtures, testing inserts), the injected statistics will drift further from reality with every write. The planner will plan based on a production distribution that no longer reflects the local data.

For read-only query plan testing this is exactly what you want. For integration tests that modify data, you may need to re-inject statistics after each test run.

And please, never ever do this in production!
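
To confirm which per-table overrides are in place, and to return a table to normal behaviour once testing is finished, something like the following should work. It is a minimal sketch that assumes the test_orders table from the earlier examples.

```sql
-- Show the storage parameters currently set on the table.
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'test_orders';

-- Undo the overrides and replace the injected statistics with real ones.
ALTER TABLE test_orders RESET (autovacuum_enabled, autovacuum_analyze_threshold);
ANALYZE test_orders;
```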

What's not covered?

As we have seen earlier, it's not worth trying to inject relpages, as the planner checks the actual file size and scales reltuples proportionally. This limits the absolute row counts the planner might estimate; i.e., to get numbers comparable to the production environment, you would still have to create a comparable data volume (which isn't a problem for the primary use case of this feature, restoring backups).

It's also worth noting that extended statistics created with CREATE STATISTICS (multivariate correlations, distinct counts across column groups, and MCV lists for column combinations) are not covered in PostgreSQL 18. Those still require ANALYZE after restore. PostgreSQL 19 will close this gap with pg_restore_extended_stats().
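
To find out which tables are affected, you can list the extended statistics objects defined in the database. A minimal sketch against the pg_statistic_ext catalog:

```sql
-- Extended statistics objects and the tables they belong to.
SELECT stxnamespace::regnamespace AS schema_name,
       stxname                    AS statistics_name,
       stxrelid::regclass         AS table_name,
       stxkind
FROM pg_statistic_ext
ORDER BY 1, 2;
```

Any table that appears here still needs an ANALYZE on the target side before its extended statistics are populated.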

Security

The restore functions require the MAINTAIN privilege on the target table. This is the same privilege needed for ANALYZE, VACUUM, REINDEX, and CLUSTER, and it was introduced in PostgreSQL 17.

The easiest way to grant it for automation:

GRANT pg_maintain TO ci_service_account;

This grants MAINTAIN on all tables in the database. Enough for a CI pipeline to inject statistics without needing superuser.
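
If membership in pg_maintain is broader than you want, the privilege can also be granted per table. A minimal sketch, reusing the ci_service_account role and the test_orders table from the examples above:

```sql
-- Grant MAINTAIN on a single table instead of role membership.
GRANT MAINTAIN ON TABLE test_orders TO ci_service_account;

-- Verify that the role now holds the privilege.
SELECT has_table_privilege('ci_service_account', 'test_orders', 'MAINTAIN');
```

This keeps the CI account limited to the tables whose statistics it actually injects.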

Bruce Momjian: New Presentation

Sat, 07 Mar 2026 18:45:01 +0000

I just gave a new presentation at SCALE titled The Wonderful World of WAL. I am excited to have a second new talk this year. (I have one more queued up.)

I have always wanted to do a presentation about the write-ahead log (WAL) but I was worried there was not enough content for a full talk. As more features were added to Postgres that relied on the WAL, the talk became more feasible, and at 103 slides, maybe I waited too long.

I had a full hour to give the talk at SCALE, and that was helpful. I was able to answer many questions during the talk, and that was important — many of the later features rely on earlier ones, e.g., point-in-time recovery (PITR) relies heavily on crash recovery, and if you don't understand how crash recovery works, you can't understand PITR. By taking questions at the end of each section, I could be sure everyone understood. The questions showed that the audience of 46 understood the concepts because they were asking about the same issues we dealt with in designing the features:

How does server start know if crash recovery is needed?
Can dirty shared buffers be written to storage before the WAL for the transaction that dirtied them is written?
Can the WAL and heap/index storage get out of sync?
How is the needed WAL accurately retained for replica servers?
Can logical replicas be used as failover servers?

Gabriele Bartolini: From proposal to PR: how to contribute to the new CloudNativePG extensions project

Sat, 07 Mar 2026 06:36:35 +0000

In this article I walk you through the journey of adding the pg_crash extension to the new CloudNativePG extensions project. It explores the transition from legacy standalone repositories to a unified, Dagger-powered build system designed for PostgreSQL 18 and beyond. By focusing on the Image Volume feature and minimal operand images, the post provides a step-by-step guide for community members to contribute and maintain their own extensions within the CloudNativePG ecosystem.