Latest Blog Posts

Postgres backend statistics (Part 1): I/O statistics
Posted by Bertrand Drouvot on 2025-01-07 at 05:26

Introduction

PostgreSQL 18 should normally (there is always a risk of something being reverted before its GA release) include this commit: Add backend-level statistics to pgstats.

commit 9aea73fc61d4e77e000724ce0b2f896590a10e03
Author: Michael Paquier 
Date:   Thu Dec 19 13:19:22 2024 +0900

Add backend-level statistics to pgstats

This adds a new variable-numbered statistics kind in pgstats, where the
object ID key of the stats entries is based on the proc number of the
backends.  This acts as an upper-bound for the number of stats entries
that can exist at once.  The entries are created when a backend starts
after authentication succeeds, and are removed when the backend exits,
making the stats entry exist for as long as their backend is up and
running.  These are not written to the pgstats file at shutdown (note
that write_to_file is disabled, as a safety measure).

Currently, these stats include only information about the I/O generated
by a backend, using the same layer as pg_stat_io, except that it is now
possible to know how much activity is happening in each backend rather
than an overall aggregate of all the activity.  A function called
pg_stat_get_backend_io() is added to access this data depending on the
PID of a backend.  The existing structure could be expanded in the
future to add more information about other statistics related to
backends, depending on requirements or ideas.

Auxiliary processes are not included in this set of statistics.  These
are less interesting to have than normal backends as they have dedicated
entries in pg_stat_io, and stats kinds of their own.

This commit includes also pg_stat_reset_backend_stats(), function able
to reset all the stats associated to a single backend.

This means that, thanks to the new pg_stat_get_backend_io() function, we can see the I/O activity of each individual backend (in addition to the overall aggregate still available through the pg_stat_io view).
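For instance, a backend can look at its own counters with something like this (a minimal sketch based on the commit message above; any PID of a live backend taken from pg_stat_activity works as well):

-- Per-backend I/O statistics of the current session
SELECT * FROM pg_stat_get_backend_io(pg_backend_pid());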

Let’s look at some examples

Thanks to this new pg_stat_get_backend_io

[...]

Running an Async Web Query Queue with Procedures and pg_cron
Posted by Paul Ramsey in Crunchy Data on 2025-01-06 at 14:30

The number of cool things you can do with the http extension is large, but putting those things into production raises an important problem.

The amount of time an HTTP request takes, 100s of milliseconds, is 10 to 20 times longer than the amount of time a normal database query takes.

This means that potentially an HTTP call could jam up a query for a long time. I recently ran an HTTP function in an update against a relatively small 1000 record table.

The query took 5 minutes to run, and during that time the table was locked to other access, since the update touched every row.

This was fine for me on my developer database on my laptop. In a production system, it would not be fine.

Geocoding, For Example

A really common table layout in a spatially enabled enterprise system is a table of addresses with an associated location for each address.

CREATE EXTENSION postgis;

CREATE TABLE addresses (
  pk serial PRIMARY KEY,
  address text,
  city text,
  geom geometry(Point, 4326),
  geocode jsonb
);

CREATE INDEX addresses_geom_x
  ON addresses USING GIST (geom);

INSERT INTO addresses (address, city)
  VALUES ('1650 Chandler Avenue', 'Victoria'),
         ('122 Simcoe Street', 'Victoria');

New addresses get inserted without known locations. The system needs to call an external geocoding service to get locations.

SELECT * FROM addresses;
 pk |       address        |   city   | geom | geocode
----+----------------------+----------+------+---------
  8 | 1650 Chandler Avenue | Victoria |      |
  9 | 122 Simcoe Street    | Victoria |      |

When a new address is inserted into the system, it would be great to geocode it. A trigger would make a lot of sense, but a trigger will run in the same transaction as the insert. So the insert will block until the geocode call is complete. That could take a while. If the system is under load, inserts will pile up, all waiting for their geocodes.

Procedures to the Rescue

A better performing approach would be to insert the address right away

[...]
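The excerpt above is truncated, but the shape of the approach it describes (insert immediately, geocode asynchronously in the background) can be sketched roughly as follows. This is a hypothetical sketch, not the article's actual code: geocode_address() stands in for a function that calls the external geocoding service via the http extension.

-- Hypothetical async geocoding queue: process addresses that have no
-- geocode yet, committing after each row so locks are released quickly.
CREATE OR REPLACE PROCEDURE geocode_pending()
LANGUAGE plpgsql AS $$
DECLARE
  r record;
BEGIN
  FOR r IN SELECT pk, address, city FROM addresses WHERE geocode IS NULL LOOP
    UPDATE addresses
       SET geocode = geocode_address(r.address, r.city)  -- hypothetical helper using http
     WHERE pk = r.pk;
    COMMIT;  -- procedures may commit, so each row is released as soon as it is done
  END LOOP;
END;
$$;

-- Poll the queue once a minute with pg_cron.
SELECT cron.schedule('* * * * *', 'CALL geocode_pending()');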

Waiting for PostgreSQL 18 – psql: Add more information about service name
Posted by Hubert 'depesz' Lubaczewski on 2025-01-06 at 09:43
On 18th of December 2024, Michael Paquier committed patch: psql: Add more information about service name   This commit adds support for the following items in psql, able to show a service name, when available: - Variable SERVICE. - Substitution %s in PROMPT{1,2,3}.   This relies on 4b99fed7541e, that has made the service name available … Continue reading "Waiting for PostgreSQL 18 – psql: Add more information about service name"
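As a quick, hedged illustration of the two additions (assuming a pg_service.conf entry named mydb), connect via the service, inspect the new SERVICE variable, and add the new %s substitution to the prompt:

$ psql "service=mydb"
postgres=# \echo :SERVICE
mydb
postgres=# \set PROMPT1 '%s %/%R%# '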

PostgreSQL Post Statistics for 2025
Posted by Stefanie Janine on 2025-01-05 at 23:00

A short review of PostgreSQL post rankings in 2024. The counting is limited, as there are no server logs and no cookies used at ProOpenSource websites.

The only tracking used is images served from another instance running Matomo. As some more privacy-protective browsers block third-party content, those visits are not counted.

Overall Statistics

As you can see in the image below, traffic has gone up every year since the blogs started in 2021.

Overall statistics since 2021

First Guest Post

Last year the blog had its first guest post, by Emma Saroyan, who wrote about her experiences at PGConf NYC.

Emma Saroyan presenting at PGConf NYC

Top Three Posts in 2024

Number 1

Handling BLOBs In PostgreSQL was published on October 16, 2024; it is a bit unexpected to see it at No. 1.

Number 2

PostgreSQL with PostGIS on Android was published on February 13, 2023. It had nearly the same number of visits in 2023 as in 2024.

Number 3

PostgreSQL Connection Poolers, published on July 29, 2024, seems to be interesting to a lot of people.

Browser Statistics

The pie chart shows that most people are using Chrome and Chrome Mobile to read the blog.

Browser statistics as pie chart for 2024

Country Statistics

Most visits in 2024 came from the USA.

World map with countries visiting the blog in 2024

Most Visited Entry Page

The conference calendar has been the most successful entry page in 2024.

PgPedia Week, 2025-01-05
Posted by Ian Barwick on 2025-01-05 at 20:42

A very short "Week" this week, as the end-of-year holiday season inevitably sees a global lull in activity - we're all only human, after all. Wishing everyone a happy new PostgreSQL year!

PostgreSQL 18 changes

No user-visible features or other changes were added this week.

more...

Investigating Memoize's Boundaries
Posted by Andrei Lepikhov in Postgres Professional on 2025-01-03 at 14:01

During the New Year holiday week, I want to glance at one of Postgres' most robust features: the internal caching technique for query trees, also known as memoisation.

Introduced with commit 9eacee2 in 2021, the Memoize node fills the performance gap between HashJoin and parameterised NestLoop: with a couple of big tables, we sometimes need to join only small subsets of their rows. In that case, the parameterised NestLoop algorithm does the job much faster than HashJoin. However, the size of the outer input is critical for performance and may cause NestLoop to be rejected just because of the massive repetitive scans of the inner input.

When predicting multiple duplicates in the outer column that participate as a parameter in the inner side of a join, the optimiser can insert a Memoize node. This node caches the results of the inner query subtree scan for each parameter value and reuses these results if the known value from the outer side reappears later.

This feature is highly beneficial. However, user migration reports indicate that there are still some cases in PostgreSQL where this feature does not apply, leading to significant increases in query execution time. In this post, I will compare the caching methods for intermediate results in PostgreSQL and SQL Server.

Memoisation for SEMI/ANTI JOIN

Let me introduce a couple of tables:

DROP TABLE IF EXISTS t1,t2;
CREATE TABLE t1 (x integer);
INSERT INTO t1 (x)
  SELECT value % 10 FROM generate_series(1,1000) AS value;
CREATE TABLE t2 (x integer, y integer);
INSERT INTO t2 (x,y)
  SELECT value, value%100 FROM generate_series(1,100000) AS value;
CREATE INDEX t2_idx ON t2(x,y);
VACUUM ANALYZE t1,t2;

In Postgres, a simple join of these tables prefers parameterised NestLoop with memoisation:

EXPLAIN (COSTS OFF)
SELECT t1.* FROM t1 JOIN t2 ON (t1.x = t2.x);
/*
 Nested Loop
   ->  Seq Scan on t1
   ->  Memoize
         Cache Key: t1.x
         Cache Mode: logical
         ->  Index Scan using t2_idx on t2
               Index Cond: (x = t1.x)
*/

The smalle

[...]
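As an aside, a convenient way to check how well the Memoize cache in a plan like the one above performs is EXPLAIN ANALYZE: the Memoize node then reports hit, miss, eviction and memory-usage counters, so a low hit ratio immediately shows that memoisation is not paying off.

EXPLAIN (ANALYZE, COSTS OFF)
SELECT t1.* FROM t1 JOIN t2 ON (t1.x = t2.x);
-- look for a line like:  Hits: ...  Misses: ...  Evictions: ...  Memory Usage: ...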

SQL/JSON Path Playground Update
Posted by David Wheeler in Tembo on 2024-12-31 at 20:40

Based on the recently-released Go JSONPath and JSONTree playgrounds, I’ve updated the design of the SQL/JSON Playground. It now comes populated with sample JSON borrowed from RFC 9535, as well as a selection of queries that randomly populate the query field on each reload. I believe this makes the playground nicer to start using, not to mention more pleasing to the eye.

The playground has also been updated to use the recently-released sqljson/path v0.2 package, which replicates a few changes included in the PostgreSQL 17 release. Notably, the .string() function no longer uses a time zone or variable format for dates and times.

Curious to see it in action? Check it out!

Challenges of Postgres Containers
Posted by Jeremy Schneider on 2024-12-31 at 10:52

Many enterprise workloads are being migrated from commercial databases like Oracle and SQL Server to Postgres, which brings anxiety and challenges for mature operational teams. Learning a new database like Postgres sounds intimidating. In practice, most of the concepts directly transfer from databases like SQL Server and Oracle. Transactions, SQL syntax, explain plans, connection management, redo (aka transaction/write-ahead logging), backup and recovery – all have direct parallels. The two biggest differences in Postgres are: (1) vacuum and (2) the whole “open source” and decentralized development paradigm… once you learn those, the rest is gravy. Get a commercial support contract if you need to, try out some training; there are several companies offering these. Re-kindle the curiosity that got us into databases originally, take your time learning day-by-day, connect with other Postgres people online where you can ask questions, and you’ll be fine!

Nonetheless: the anxiety is compounded when you’re learning two new things: both Postgres and containers. I pivoted to Postgres in 2017, and I’m learning containers now. (I know I’m 10 years late getting off the sidelines and into the containers game, but I was doing lots of other interesting things!)

Postgres was already one of the most-pulled images on Docker Hub back in 2019 (10M+) and unsurprisingly it continues to be among the most-pulled images today (1B+). Local development and testing with Postgres has never been easier. For many developers, docker run -e POSTGRES_PASSWORD=mysecret postgres has replaced installers and package managers and desktop GUIs in their local dev & test workflows.

With the widespread adoption of kubernetes, the maturing of its support for stateful workloads, and the growing availability of Postgres operators – containers are increasingly being used throughout the full lifecycle of the database. They aren’t just for dev & test: they’re for production too.

Containers will dominate the future of Postgres, if only becaus

[...]

Waiting for PostgreSQL 18 – Add UUID version 7 generation function.
Posted by Hubert 'depesz' Lubaczewski on 2024-12-31 at 09:37
On 11th of December 2024, Masahiko Sawada committed patch: Add UUID version 7 generation function.   This commit introduces the uuidv7() SQL function, which generates UUID version 7 as specified in RFC 9562. UUIDv7 combines a Unix timestamp in milliseconds and random bits, offering both uniqueness and sortability.   In our implementation, the 12-bit sub-millisecond … Continue reading "Waiting for PostgreSQL 18 – Add UUID version 7 generation function."
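A minimal taste of the new function (uuidv7() is the function added by the commit; generate_series() merely produces a few values to show that they sort in generation order):

-- Successive uuidv7() values sort by generation time, since the high bits
-- encode a millisecond Unix timestamp.
SELECT uuidv7() FROM generate_series(1, 3);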

Can we use this index, please? – Why not?
Posted by Henrietta Dombrovskaya on 2024-12-30 at 02:30

It’s Christmas time and relatively quiet in my day job, so let’s make it story time again! One more tale from the trenches: how wrong can you go with one table and one index?

Several weeks ago, a user asked me why one of the queries had an “inconsistent performance.” According to the user, “Sometimes it takes three minutes, sometimes thirty, or just never finishes.” After taking a look at the query, I could tell that the actual problem was not the 30+ minutes, but 3 minutes – when you have a several hundred million row table and your select yields just over a thousand rows, it’s a classical “short query,” so you should be able to get results in milliseconds.

The original query was over a view with self-join, and at first, I suspected that something was wrong with the view itself, but then I got it down to one SELECT from one table, which was indeed super-slow: taking minutes while it should have taken seconds. The “inconsistency” was due to the high I/O and dependent on what was in the shared buffers at the execution time. The query looked like this:

SELECT * FROM large_table
  WHERE col1='AAA'
  AND col2='BCD'
  AND created_at BETWEEN '01-01-2012' AND '12-31-2012'
  AND extract (hour FROM created_at)=16
  AND extract (minute FROM created_at)=15

There was an index on all of the attributes which were referenced in the query:

CREATE INDEX large_table_index ON large_table (col1, col2, created_at);

The query plan looked perfect: INDEX SCAN using that index; however, the query was incredibly slow because, for each fetched record, the hour and minute had to be verified (and you’ve already guessed that the table was not only large but also wide).

According to the execution plan, the number of rows selected during the index scan was about 30M, and subsequent filtering reduced it to a little bit over 1K. I started to think that as ridiculous as it sounds, it could be a good idea to create an additional partial index or to include the “hour” and “minute” parts into the index. (Un)fortunately, both of

[...]
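The excerpt ends before the resolution, but one way the "include the hour and minute parts" idea could be written is an expression index. This is only a hedged sketch of that idea, not the fix the author ultimately chose, and it assumes created_at is a timestamp without time zone (with timestamptz the extract() expressions are not immutable and cannot be indexed directly):

-- Hypothetical expression index covering the extracted parts used as filters
CREATE INDEX large_table_hour_minute_idx
  ON large_table (col1, col2,
                  (extract(hour FROM created_at)),
                  (extract(minute FROM created_at)),
                  created_at);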

PgPedia Week, 2024-12-29
Posted by Ian Barwick on 2024-12-29 at 20:21

Another calendar year draws to an end, so this will be the last PgPedia Week of 2024.

PostgreSQL 18 changes

As always, the end-of-year holidays mean commit log activity is lower than usual, but we did see one new potential performance improvement, with commit 58a359e5 (" Speedup tuple deformation with additional function inlining ") promising query performance increases of around 5-20% in deform-heavy OLAP-type workloads.

more...

CloudNativePG in 2024: Milestones, Innovations, and Reflections
Posted by Gabriele Bartolini in EDB on 2024-12-29 at 11:27

2024 was a transformative year for CloudNativePG, marked by significant contributions to the Kubernetes ecosystem, increased adoption, and a growing community. This article reflects on key milestones, including the integration of advanced Kubernetes features, conference highlights, and personal insights, while looking ahead to the opportunities awaiting PostgreSQL in the cloud-native era.

Name Collision of the Year: Vector
Posted by Elizabeth Garrett Christensen in Crunchy Data on 2024-12-26 at 13:30

I can’t get through a zoom call, a conference talk, or an afternoon scroll through LinkedIn without hearing about vectors. Do you feel like the term vector is everywhere this year? It is. Vector actually means several different things, and it's confusing. Vector means AI data, GIS locations, digital graphics, a type of query optimization, and more. The terms and uses are related, sure. They all stem from the same original concept. However, their practical applications are quite different. So “Vector” is my choice for this year’s name collision of the year.

In this post I want to break down the vector. The history of the vector, how vectors were used in the past and how they evolved to what they are today (with examples!).

The original vector

The idea that vectors are based on goes back to the 1600s, when René Descartes developed the Cartesian XY coordinate system to represent points in space. Descartes didn't use the word vector, but he did develop a numerical representation of a location and direction. A numerical location is the foundational concept of the vector, used for measuring spatial relationships.

The first use of the term vector was in the 1840s by an Irish mathematician named William Rowan Hamilton. Hamilton defined a vector as a quantity with both magnitude and direction in three-dimensional space. He used it to describe geometric directions and distances, like arrows in 3D space. Hamilton combined his vectors with several other math terms to solve problems with rotation and three dimensional units.


The word Hamilton chose, vector, comes from the Latin word vehere meaning ‘to carry’ or ‘conveyor’ (yes, same origin for the word vehicle). We assume Hamilton chose this Latin word origin to emphasize the idea of a vector carrying a point from one location to another.

There’s a book about the history of vectors published just this year, and a nice summary here. I’ve already let Santa know this is on my list this year.

Mathematical vectors

Building upon Hamilto

[...]

CNPG Recipe 16 - Balancing Data Durability and Self-Healing with Synchronous Replication
Posted by Gabriele Bartolini in EDB on 2024-12-26 at 09:57

CloudNativePG 1.25 enhances control of PostgreSQL synchronous replication with a new dataDurability option, allowing you to choose between prioritising data consistency or self-healing capabilities. This article explains the feature, contrasts it with previous approaches, and provides guidance on migrating to the new API format.

PG Day Chicago 2025 – the CfP is open – please submit!
Posted by Henrietta Dombrovskaya on 2024-12-25 at 05:26

Dear Postgres Community! Happy Holidays! Whatever you celebrate, there are high chances you have several additional days off at the end of the year. That being said, please consider using some of this time to submit a proposal or two or three to PG Day Chicago 2025!

The CfP is open until January 20, but why leave it to the very last minute?! Our amazing CfP committee will work very hard to evaluate all proposals within a very short period of time, but you can make their work easier by not leaving it to the very last minute.

One more thing to consider. This year, we will have two tracks instead of three (due to the size of the venue), which unfortunately will reduce the number of talks we will be able to accept, but at the same time, each talk will have a better chance of attracting a larger audience.

And last but not least – don’t you love our great city?! As a famous “Christmas in Chicago” song goes, “I love Christmas in Chicago, that’s my kind of Christmas, it’s like no other Christmas at all!” I believe all of the above applies to PG Day Chicago – it’s like no other Postgres event! I can’t even describe how much I am looking forward to seeing all my friends in Chicago in April, and hopefully meeting new Postgres enthusiasts!

The CfP link.

PostgreSQL 17 64-bit for Windows FDWs
Posted by Regina Obe in PostGIS on 2024-12-23 at 08:50

We are pleased to provide binaries for file_textarray_fdw and odbc_fdw for PostgreSQL 17 Windows 64-bit.

To use these, copy the files into the same-named folders of your PostgreSQL 17 Windows 64-bit install and then run CREATE EXTENSION as usual in the databases of your choice. More details are in the packaged README.txt.

These were compiled against PostgreSQL 17.2 using msys2 / mingw64 and tested against PostgreSQL 17.2 EDB windows distribution.

Continue reading "PostgreSQL 17 64-bit for Windows FDWs"

Optimizing access to partitioned tables
Posted by Henrietta Dombrovskaya on 2024-12-23 at 00:06

I am unsure whether others have this problem, but I find myself explaining the benefits (and what's not) of partitioning tables in Postgres over and over again.

Most times, developers have unrealistic expectations that if they partition a gigantic table, “it will be faster to select from it.” I always have to explain that the goal and the purpose of partitioning is improving maintenance, not query optimization, and if the execution speed of short queries remains the same as before partitioning, they should consider it a success (yes, there are no rules without exceptions, and there are cases of performance improvement, but those are exceptions indeed).

The next thing I tell developers is that for the queries not to slow down, they need to explicitly include the values of the attributes used for range partitioning. Quite often, this requirement is not obvious, and moreover, it requires a significant rewrite of the SQL statements in use.

From an application developer’s perspective, they do just that:

SELECT a, b, c
FROM partitioned_table
WHERE p IN
  (SELECT p FROM other_table WHERE r between 1 and 100)

You can’t imagine how many times I’ve heard: But I am selecting from a specific partition! And I had to explain that there is no way for Postgres to know before the execution starts which partition it is going to be!

When developers ask me what they should do, I advise them to calculate the value first and use it in a query. This approach works all the time, but to be honest, here is the question I have for the community: why can’t the query planner do it? Let me be clear: I know why it can’t do it now, but what exactly fundamentally prevents the smart people who make Postgres better from making this change in the planner’s behavior? I mean, if I can figure it out, why can’t Postgres?
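In practice that advice boils down to resolving the partition-key values before the main query runs, so the planner sees literals and can prune at plan time. A hedged sketch of the two-step pattern, reusing the example above:

-- Step 1: resolve the partition-key values (in the application or a separate statement).
SELECT p FROM other_table WHERE r between 1 and 100;

-- Step 2: feed the resulting literals back in; the planner can now prune
-- partitions at plan time instead of touching all of them.
SELECT a, b, c
FROM partitioned_table
WHERE p IN (1, 17, 42);  -- values obtained in step 1 (illustrative)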

PostGIS Patch Releases
Posted by Regina Obe in PostGIS on 2024-12-23 at 00:00

The PostGIS development team is pleased to provide bug fix releases for 3.5.1, 3.4.4, 3.3.8, 3.2.8, 3.1.12

Please refer to the links above for more information about the issues resolved by these releases.

PgPedia Week, 2024-12-22
Posted by Ian Barwick on 2024-12-22 at 21:35

The year may be winding down, but PostgreSQL development is still ploughing ahead unabated. The pgPedia elves have also been busy and added a bunch of entries for the various advisory locking system functions.

PostgreSQL 18 changes this week

  • libpq: PQservice() function added
  • psql: variable SERVICE added; prompt configuration option %s added; various tab completion improvements
  • pg_stat_reset_backend_stats()
  • pg_stat_get_backend_io()

more...

PG Phriday: Kubernetes Killed the High Availability Star
Posted by Shaun M. Thomas in Tembo on 2024-12-20 at 18:26

Postgres Conference Seattle 2024 partnered up with PASS this year to present a united database front. They accepted my “Kubernetes Killed the High Availability Star” talk, which I graciously gave on the last day of the conference. The next talk in that room wasn’t for another hour, so I had plenty of time to talk shop with attendees, about the future of Postgres, high availability, and Kubernetes in general.

If you weren’t there and missed out on the fun, this is your chance to catch up and enjoy a few of my notorious bad puns along the way. Let me tell you why the concept of Postgres HA is dead.

2025.pgconf.dev needs your submissions!
Posted by Robert Haas in EDB on 2024-12-20 at 14:10

The call for proposals for 2025.pgconf.dev has been extended to January 6, 2025, otherwise known as "very soon". I'm writing this post to encourage you to submit, if you haven't done so yet, regardless of whether you have submitted to 2024.pgconf.dev or its predecessor, PGCon, in the past. The event will only be as good as the content you (collectively) submit, and having found much value in these events over the years that I've been participating, I very much want our future events to be as good as those in the past, or, hopefully, even better. But what makes a good event, and what kind of talk should you submit?

Read more »

pg_power: initialization and basic setup
Posted by Abhishek Chanda on 2024-12-20 at 02:46

I have been playing around with the powercap framework. I wrote a postgres extension that shows the energy usage of a query. Postgres has a hook mechanism that allows an extension to override the default executor. This implementation is very simple: the extension records the current energy reading when a query starts and then calls the actual executor that runs the query. When the query finishes, a second hook records the current energy reading. The overall energy usage of this query is the difference between the two values.

Why Isn’t My Query Benefiting from Partition Pruning?
Posted by Shane Borden on 2024-12-18 at 20:40

Recently I had a customer come to me with a poorly performing query that had a perfectly fine execution time on Oracle, but once migrated to Postgres the query was slow. Upon investigation, it was found that even though the “WHERE” and “JOIN” clauses had explicitly specified partition keys and were joined on the partition key, “pruning” was only happening on one part of the query.

The setup for the test is at the bottom of the blog post: Jump to Test Case Setup

Original Query

The tables in question are all partitioned by “hire_date” and, as you can see in the “WHERE” clause below, there should be partition pruning because the “hire_date” from the driving table is equal to that of the joined table, yet we did not get partition pruning:

WITH top_emp AS (
   SELECT 
       DISTINCT
       id,
       first_name,
       last_name,
       hire_date,
       birth_date
   FROM
       employees.employee_part emp
       INNER JOIN employees.salary sal on (sal.employee_id = emp.id)
   WHERE
       emp.hire_date = '1985-01-01'::date
       and emp.gender = 'M'
       and emp.birth_date between '1957-01-01'::date and '1970-01-01'::date
       and sal.amount > 50000
)
SELECT
    dept.employee_id,
    dept.department_id,
    dept.hire_date
FROM
    top_emp emp1,
    employees.department_employee_part dept
WHERE
    emp1.hire_date = dept.hire_date and
    emp1.id = dept.employee_id;

/* Explain Plan */
 Nested Loop  (cost=1467.16..2646.20 rows=307 width=17) (actual time=4.565..4.613 rows=4 loops=1)
   Output: dept.employee_id, dept.department_id, dept.hire_date
   Buffers: shared hit=330
   ->  Unique  (cost=1466.87..1471.11 rows=37 width=31) (actual time=4.537..4.560 rows=4 loops=1)
         Output: emp.id, emp.first_name, emp.last_name, emp.hire_date, emp.birth_date
         Buffers: shared hit=318
         ->  Sort  (cost=1466.87..1467.72 rows=339 width=31) (actual time=4.534..4.539 rows=69 loops=1)
               Output: emp.id, emp.first_name, emp.last_name, emp.hire_date, emp.birth_date
               So
[...]
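The plan above is truncated, but a common workaround in this situation (not necessarily the one the post arrives at) is to repeat the literal partition-key predicate on the joined partitioned table itself, so the planner can prune its partitions at plan time as well:

-- Hedged sketch: the literal hire_date predicate is duplicated on dept
SELECT
    dept.employee_id,
    dept.department_id,
    dept.hire_date
FROM
    top_emp emp1,
    employees.department_employee_part dept
WHERE
    emp1.hire_date = dept.hire_date
    AND dept.hire_date = '1985-01-01'::date
    AND emp1.id = dept.employee_id;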

Contributions for the week of 2024-12-02 (week 49)
Posted by Pavlo Golub in postgres-contrib.org on 2024-12-18 at 12:09

Federico Campoli and Vik Fearing organised PGDay/MED 2025 in Naples, Italy. Speakers included Gabriele Quaresima, Ryan Booz, Gianluca Padovani, Patrick Lauer, Stefanie Janine Stölting, Fabiana Farias, Pavlo Golub, and Devrim Gündüz.

OpenDay 2025 in Bolzano (Italy)
Posted by Luca Ferrari on 2024-12-18 at 00:00

Prepare for the next great event by PgTraining!

OpenDay 2025 in Bolzano (Italy)

PgTraining is organizing next year's event, OpenDay 2025, which will be held on April 11th in Bolzano, Italy.

The event will be totally free but registration is required because the room assigned has a fixed number of seats.

Please note that all the talks will be in Italian.

The event will be held at the NOI Techpark.

We are working on the schedule, but the day will be organized into a talks session and a laboratory/practical session, the former in the morning and the latter in the afternoon.

Please see the official event page for more details and stay tuned for updates!

Postgres Cafe: Exploring pgroll for zero-downtime schema changes
Posted by Gülçin Yıldırım Jelínek in Xata on 2024-12-18 at 00:00
Discover how pgroll helps teams manage schema changes with ease in our new Postgres Cafe blog and video series.

Some of My Favorite Things – Postgres Queries
Posted by Shane Borden on 2024-12-17 at 18:55

In the spirit of the holiday season, I thought I would write a quick post regarding some of my favorite queries that I use on a day to day basis working on Postgres. Some of these queries I have developed and others were found on the internet (hat tip to those who have previously posted) and further refined.

Many more are found on my github site:

https://github.com/shane-borden/sqlScripts/tree/master/postgres

Hope these queries can also help you in your day to day quest to make Postgres run better!

The first three queries rank top SQL from pg_stat_statements by execution count, mean_exec_time, or total_exec_time. I love to use these queries to get a quick glance at what I should focus on tuning. Given that pg_stat_statements tracks lots of things, you can filter out certain “query text” as necessary so that it does not influence the ranking.

Top SQL by Mean Exec Time

WITH
hist AS (
SELECT queryid::text,
       SUBSTRING(query from 1 for 1000) query,
       ROW_NUMBER () OVER (ORDER BY mean_exec_time::numeric DESC) rn,
       SUM(mean_exec_time::numeric) mean_exec_time
  FROM pg_stat_statements
 WHERE queryid IS NOT NULL
                AND query::text not like '%pg_%' 
                AND query::text not like '%g_%'
        /* Add more filters here */
 GROUP BY
       queryid,
       SUBSTRING(query from 1 for 1000),
       mean_exec_time::numeric
),
total AS (
SELECT SUM(mean_exec_time::numeric) mean_exec_time FROM hist
)
SELECT DISTINCT
       h.queryid::text,
       ROUND(h.mean_exec_time::numeric,3) mean_exec_time,
       ROUND(100 * h.mean_exec_time / t.mean_exec_time, 1) percent,
       h.query
  FROM hist h,
       total t
 WHERE h.mean_exec_time >= t.mean_exec_time / 1000 AND rn <= 14
 UNION ALL
SELECT 'Others',
       ROUND(COALESCE(SUM(h.mean_exec_time), 0), 3) mean_exec_time,
       COALESCE(ROUND(100 * SUM(h.mean_exec_time) / AVG(t.mean_exec_time), 1), 0) percent,
       NULL sql_text
  FROM hist h,
       total t
 WHERE h.mean_exec_time < t.mean_exec_time / 1000 O
[...]

pg_incremental: Incremental Data Processing in Postgres
Posted by Marco Slot in Crunchy Data on 2024-12-17 at 13:30

Today I’m excited to introduce pg_incremental, a new open source PostgreSQL extension for automated, incremental, reliable batch processing. This extension helps you create processing pipelines for append-only streams of data, such as IoT / time series / event data workloads.

Notable pg_incremental use cases include:

  • Creation and incremental maintenance of rollups, aggregations, and interval aggregations
  • Incremental data transformations
  • Periodic imports or export of new data using standard SQL

After you set up a pg_incremental pipeline, it runs forever until you tell Postgres to stop. There’s a lot you can do with pg_incremental and we have a lot of thoughts on why we think it’s valuable. To help you navigate, you can jump directly to whichever of the examples feels most relevant to you.

Why incremental processing?

My team has been working on handling data-intensive workloads in PostgreSQL for many years. The most data-intensive workloads are usually the ones with a machine-generated stream of event data, and we often find that the best solution for handling those workloads in PostgreSQL involves incremental data processing.

For example, a common pattern in PostgreSQL is to periodically pre-aggregate incoming event data into a summary table. In that model, writes (esp. batch loads) are fast because they do not trigger any immediate processing. The incremental aggregation is fast because it only processes new rows, and queries from dashboards are fast because they hit an indexed summary table. I originally developed pg_cron for this purpose, but creating an end-to-end pipeline sti

[...]
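The excerpt is cut short, but the pre-aggregation pattern it describes can be sketched in plain SQL. This is the generic pattern rather than pg_incremental's actual API, and all table and column names are illustrative; pg_incremental automates the bookkeeping that this sketch does by hand:

-- Raw append-only event stream and an indexed summary table
CREATE TABLE events (
  event_id   bigint GENERATED ALWAYS AS IDENTITY,
  created_at timestamptz NOT NULL DEFAULT now(),
  payload    jsonb
);

CREATE TABLE events_per_hour (
  hour        timestamptz PRIMARY KEY,
  event_count bigint NOT NULL
);

CREATE TABLE watermark (last_event_id bigint NOT NULL);
INSERT INTO watermark VALUES (0);

-- Incremental step: aggregate only the rows that arrived since the last run
-- and merge them into the summary. A real pipeline would also advance the
-- watermark in the same transaction.
INSERT INTO events_per_hour (hour, event_count)
SELECT date_trunc('hour', created_at), count(*)
FROM events
WHERE event_id > (SELECT last_event_id FROM watermark)
GROUP BY 1
ON CONFLICT (hour)
DO UPDATE SET event_count = events_per_hour.event_count + EXCLUDED.event_count;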

Window functions to the rescue
Posted by Florent Jardin in Dalibo on 2024-12-17 at 12:00

PostgreSQL comes with a variety of functions that allow you to group rows into a “window” and perform calculations on that window. By using these functions, you can create more advanced and efficient queries for analyzing your database.

In early 2023, I contributed to a project that converts data models to PostgreSQL, called db_migrator. On this occasion, I (re)discovered the power of these window functions with the SQL language. In this article, I revisit a specific case of transforming the upper bounds of a partitioned table into an array of boundaries.
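The specific case mentioned, turning per-partition upper bounds into pairs of boundaries, can be sketched with a window function. A minimal, hypothetical example (the table and column names are mine, not from the article):

CREATE TABLE partition_bounds (partition_name text, upper_bound date);
INSERT INTO partition_bounds VALUES
  ('p2022', '2023-01-01'),
  ('p2023', '2024-01-01'),
  ('p2024', '2025-01-01');

-- lag() pairs each partition's upper bound with its predecessor's, giving
-- (lower_bound, upper_bound); the first partition gets -infinity.
SELECT partition_name,
       lag(upper_bound, 1, '-infinity'::date)
         OVER (ORDER BY upper_bound) AS lower_bound,
       upper_bound
FROM partition_bounds
ORDER BY upper_bound;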

Quick Benchmark: Improvements to Large Object Dumping in Postgres 17
Posted by Michael Banck in credativ on 2024-12-17 at 09:00

Version 17 of PostgreSQL has been released for a while. One of the many features is a change by Tom Lane called “Rearrange pg_dump’s handling of large objects for better efficiency”. In the past, we have seen our customers have several problems with a large number of large objects being a performance issue for dump/restore. The main reason for this is that large objects are quite unlike TOAST (The Oversized Attribute Storage Technique): while TOASTed data is completely transparent to the user, large objects are stored out-of-line in the pg_largeobject table, with the link to the particular row in that table being an OID stored in the user table itself.

Introduction To Large Objects

Here is an example on how large objects can be used:

postgres=# CREATE TABLE test(id BIGINT, blob OID);
CREATE TABLE
postgres=# INSERT INTO test VALUES (1, lo_import('/etc/issue.net'));
INSERT 0 1
postgres=# SELECT * FROM test;
 id | blob
----+-------
  1 | 33280
(1 row)

postgres=# SELECT * FROM pg_largeobject;
 loid  | pageno |                    data
-------+--------+--------------------------------------------
 33280 |      0 | \x44656269616e20474e552f4c696e75782031320a
(1 row)

postgres=# SELECT lo_export(test.blob, '/tmp/foo') FROM test;
 lo_export
-----------
         1
(1 row)

postgres=# SELECT pg_read_file('/tmp/foo');
    pg_read_file
---------------------
 Debian GNU/Linux 12+

(1 row)

postgres=# INSERT INTO test VALUES (1, lo_import('/etc/issue.net'));
INSERT 0 1

Now if we dump the database in custom format with both version 16 and 17 of pg_dump and then use pg_restore -l to display the table of contents (TOC), we see a difference:

$ for version in 16 17; do /usr/lib/postgresql/$version/bin/pg_dump -Fc -f lo_test_$version.dmp; \
> pg_restore -l lo_test_$version.dmp | grep -v ^\; > lo_test_$version.toc; done
$ diff -u lo_test_{16,17}.toc
--- lo_test_16.toc  2024-12-11 09:05:46.550667808 +0100
+++ lo_test_17.toc  2024-12-11 09:05:46.594670235 +0100
@@ -1,5 +1,4 @@
 215; 1259 33277 TABLE public test postgres
-3348; 2613 
[...]
