Luca Ferrari

pgenv 1.4.3 is out!

2025-09-21T00:00:00+00:00

A new minor release for the beloved tool to build and manage multiple PostgreSQL instances.

pgenv 1.4.3 is out! This minor release fixes a problem in the build of release candidate versions (e.g., 18rc1) by stripping out all the text part from a version number using a Bash regular expression.
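The idea of stripping the textual part from a version number can be sketched as follows. This is only an illustration under stated assumptions: the function name strip_version_text is hypothetical, and the sketch uses a sed regular expression to stay portable, rather than the Bash regular expression that pgenv itself uses.

```shell
# Hypothetical helper, not pgenv's actual code: keep only the leading
# numeric part of a PostgreSQL version string.
strip_version_text() {
    # Keep the leading digits, optionally followed by dot-separated
    # digits, and drop everything after them (e.g. "rc1", "beta2").
    printf '%s\n' "$1" | sed -E 's/^([0-9]+(\.[0-9]+)*).*$/\1/'
}

strip_version_text 18rc1    # prints 18
strip_version_text 17.1     # prints 17.1
```

A version that is already purely numeric passes through unchanged, so the same helper can be applied to every version string.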

PgTraining OpenDay is over!

2025-04-14T00:00:00+00:00

We are proud of what we have done in Bolzano.

Last Friday was PgTraining OpenDay in Bolzano, a free-of-charge day entirely dedicated to PostgreSQL that we at PgTraining organized.

We held the event in the spectacular NOI Techpark in Bozen (Bolzano), northern Italy, and the room we had was simply amazing: everything was arranged in a very professional and clean way.

Chris, our host, Enrico and yours truly gave several talks on cool topics like (but not limited to):

vector support in PostgreSQL via PgVector, and what you can do with such a tool to create RAG applications;
connection pooling (with regard to pgagroal);
logical replication and hot upgrade.

The afternoon was the more practical part, when we showed a few live demos to the audience.

All the material, slides and code samples, including a few Docker images, is available on the PgTraining GitLab repository; more material will be available in the coming days.

As a joke, during the afternoon, Chris embedded the whole Raku documentation in his RAG application and we enjoyed asking the application about how to connect Raku to a PostgreSQL database, with a couple of very detailed and accurate answers.

The audience was very interested in all the topics, and we are happy with such a good day.

We hope to host another event like this soon, and we would like to get some more feedback, even “bad”, in order to arrange an even better event!

pgagroal now has docker files!

2025-04-03T00:00:00+00:00

An important contribution to pgagroal.

Thanks to a contribution by Arshdeep, the pgagroal connection pooler now also has Docker images available in the repository.

There are two Docker files: one based on Alpine Linux and one based on Rocky Linux 9.

Thanks to these Docker files, it should be simpler to test and play with the connection pooler.

pgenv 1.4.0 is out!

2025-03-10T00:00:00+00:00

A new version with an interesting improvement in the configuration management.

pgenv 1.4.0 is out with an interesting improvement regarding the configuration management.

When you install, and then use, a specific PostgreSQL version, pgenv loads the configuration used to start the instance from a file named after that specific PostgreSQL version. For instance, if you are running version 17.1, then pgenv will load the configuration from a file named 17.1.conf. If that file does not exist, the pgenv script will try to load the default configuration file default.conf.

Now, thanks to the work done in the pgenv development, it is possible to allow for multiple configuration files with overrides. In particular, pgenv will load more than one configuration file with narrowing context related to the PostgreSQL version. Therefore, using a 17.1 PostgreSQL version will trigger the loading of the following files:

default.conf
17.conf
17.1.conf

Note the addition of the major-version-specific configuration file (17.conf in the example above).

This new loading chain makes pgenv load configuration from the default context down to the most specific one. It also makes sharing configuration quicker when you are interested only in the major version configuration.
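The loading order described above can be sketched in a few lines of shell. This is a minimal illustration and not pgenv's actual code: the function names config_chain and load_config_chain are hypothetical, and the sketch assumes that configuration files are plain shell fragments that can be sourced from a given directory.

```shell
# Hypothetical sketch, not pgenv's code: list the configuration files
# that apply to a version, from the most generic to the most specific.
config_chain() {
    version=$1
    major=${version%%.*}          # 17.1 -> 17
    printf '%s\n' default.conf "${major}.conf"
    # Avoid listing the same file twice when only a major version is given.
    if [ "$version" != "$major" ]; then
        printf '%s\n' "${version}.conf"
    fi
}

# Source every file of the chain found in directory $2, in order, so
# that later (more specific) files override earlier ones.
load_config_chain() {
    for file in $(config_chain "$1"); do
        [ -r "$2/$file" ] && . "$2/$file"
    done
    return 0
}

config_chain 17.1    # prints default.conf, 17.conf and 17.1.conf, one per line
```

Because the files are read from the least to the most specific, a setting placed in 17.conf applies to every 17.x instance unless a file such as 17.1.conf overrides it.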

OpenDay 2025 by PgTraining

2025-03-10T00:00:00+00:00

There are still seats available for this entire day dedicated to PostgreSQL!

PgTraining is organizing a free, full-day event dedicated to PostgreSQL, where people will meet in person.

The event, which will be held in the great NOI Techpark in Bolzano (Italy), will be organized in two parts:

a talk session in the morning;
a laboratory session in the afternoon.

Please note that the event will be held in Italian only.

The schedule of the day is available, and there are still a few seats left (but you need to register in order to participate).

Open Day 2025 in Bolzano (Italy): schedule available

2025-01-20T00:00:00+00:00

The schedule of the free event is available!

The schedule of the next free event organized by PgTraining is available.

The event, which will be held in the great NOI Techpark in Bolzano (Italy), will be organized in two parts:

a talk session in the morning;
a laboratory session in the afternoon.

Due to the nature of the afternoon session, it is recommended to bring your own laptop in order to test everything the laboratory will introduce.

There are still seats available, but please note that you have to reserve your own seat to participate. Follow this link to reserve your seat!

The importance of testing with not-so-usual setups

2025-01-16T00:00:00+00:00

How we discovered a trivial bug in pgagroal

This week we found a trivial and silly bug in [pgagroal](https://github.com/agroal/pgagroal).

This post is a brief description of that bug, not because it is important in itself, but because the way we discovered it emphasizes how important it is to randomize the configuration of a system. It is a well-known concept, yet we all still tend to fail at it, partly because we lack the time to configure and test all the possibilities (thank God there is automation!).

As often happens, the bug was caused by a memory allocation problem.

As often happens in these cases, the fix is a very short trophy patch.

Again, the aim of this post is not to discuss a one-line patch, but rather the importance of running and testing with different tools and setups.

The memory bug

The bug is described in a dedicated issue. What is interesting, as often happens when dealing with bugs, is how long it went unnoticed.

While working on and testing other work-in-progress features of pgagroal, I was encouraged to compile the project using clang instead of my usual gcc. The result was discouraging, since I was no longer able to start the program:

% pgagroal
pgagroal: Unknown key <ev_backend> with value <io_uring> in section [pgagroal] (line 46 of file </etc/pgagroal/pgagroal.conf>)
2025-01-13 12:38:09 WARN  configuration.c:482 pgagroal: max_connections (20) is greater than allowed (8)
2025-01-13 12:38:09 DEBUG configuration.c:3074 PID file automatically set to: [/tmp/pgagroal.54322.pid]
=================================================================
==17659==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x502000009495 at pc 0x559726597f10 bp 0x7ffc9c1e2360 sp 0x7ffc9c1e1b00
WRITE of size 6 at 0x502000009495 thread T0
    #0 0x559726597f0f in vsprintf (/usr/local/bin/pgagroal+0x58f0f) (BuildId: 16ffc1dab018cfa8eed6b5cc7e6981bc6e861325)
    #1 0x55972659900e in sprintf (/usr/local/bin/pgagroal+0x5a00e) (BuildId: 16ffc1dab018cfa8eed6b5cc7e6981bc6e861325)
    #2 0x7f1cd144820b in bind_host /home/luca/pgagroal/src/libpgagroal/network.c:613:4
    #3 0x7f1cd1447bfc in pgagroal_bind /home/luca/pgagroal/src/libpgagroal/network.c:104:17
    #4 0x55972664e91c in main /home/luca/pgagroal/src/main.c:961:11
    #5 0x7f1cd10295cf in __libc_start_call_main (/lib64/libc.so.6+0x295cf) (BuildId: d78a44ae94f1d320342e0ff6c2315b2b589063f8)
    #6 0x7f1cd102967f in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2967f) (BuildId: d78a44ae94f1d320342e0ff6c2315b2b589063f8)
    #7 0x559726571a94 in _start (/usr/local/bin/pgagroal+0x32a94) (BuildId: 16ffc1dab018cfa8eed6b5cc7e6981bc6e861325)

0x502000009495 is located 0 bytes after 5-byte region [0x502000009490,0x502000009495)
allocated by thread T0 here:
    #0 0x55972660d04d in calloc (/usr/local/bin/pgagroal+0xce04d) (BuildId: 16ffc1dab018cfa8eed6b5cc7e6981bc6e861325)
    #1 0x7f1cd14481a3 in bind_host /home/luca/pgagroal/src/libpgagroal/network.c:607:12
    #2 0x7f1cd1447bfc in pgagroal_bind /home/luca/pgagroal/src/libpgagroal/network.c:104:17
    #3 0x55972664e91c in main /home/luca/pgagroal/src/main.c:961:11
    #4 0x7f1cd10295cf in __libc_start_call_main (/lib64/libc.so.6+0x295cf) (BuildId: d78a44ae94f1d320342e0ff6c2315b2b589063f8)

If I had not been too lazy to test with another compiler, even occasionally, I would have discovered the problem sooner.

Lesson learned #1: using a different toolchain can speed up the discovery of issues.

However, we were testing and building pgagroal in different environments and with different toolchains, so how did this go unnoticed?

Simple answer: because developers tend to be lazy. We tend to use the same setup over and over, and to follow the guides and howtos.

Understanding the bug

The stack trace reports a problem related to a calloc call and mentions a “5-byte region”:

...
0x502000009495 is located 0 bytes after 5-byte region [0x502000009490,0x502000009495)
allocated by thread T0 here:
    #0 0x55972660d04d in calloc (/usr/local/bin/pgagroal+0xce04d) (BuildId: 16ffc1dab018cfa8eed6b5cc7e6981bc6e861325)
...

and luckily enough, we get also a line number and a file to look at:

...
    #2 0x7f1cd144820b in bind_host /home/luca/pgagroal/src/libpgagroal/network.c:613:4
...

Let’s start from there. The code within network.c, around that line, was doing the following:

char* sport;

sport = calloc(1, 5);

In essence, the code is allocating a string to hold a number that represents a TCP/IP port number. Using calloc replaces the well-known pattern of malloc followed by memset to zero-fill the memory.

It is quite simple now to spot the bug: the upper bound of a TCP/IP port is 65535, five digits, but the string also needs the \0 terminator, for a total of six bytes. And this is why all of this went unnoticed before: the guide for pgagroal suggests using the TCP/IP port 2345 (the reverse of the PostgreSQL default port) to listen for connections. Since 2345 is made of four digits, there is room for the string terminator.

However, on my setup, I use the port 54322, which is five digits, hence the string terminator overflows the calloc-ated sport buffer.

The trophy patch was embarrassing (see this commit):

-   sport = calloc(1, 5);
+   sport = calloc(1, 6);
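
The arithmetic behind the patch can be checked with a tiny shell sketch. The helper name bytes_for_port is hypothetical and only illustrates the counting: a number stored as a C string needs one byte per digit plus one byte for the terminator.

```shell
# Bytes needed to store a number as a C string: its digits plus the
# terminating NUL byte.
bytes_for_port() {
    port=$1
    echo $(( ${#port} + 1 ))
}

bytes_for_port 2345     # prints 5: fits in calloc(1, 5)
bytes_for_port 54322    # prints 6: overflows calloc(1, 5)
bytes_for_port 65535    # prints 6: the largest TCP port needs 6 bytes
```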

Lesson learned #2: use a non-standard setup in order to look for problems.

Conclusions

This short story emphasizes, once again, how important it is to change your own development environment and toolchain, as well as your setup, in order to ease and speed up the identification of problems. If I had not changed the toolchain, I wouldn’t have seen the problem. And if I had not been using a setup different from the “default” one, I wouldn’t have seen the problem.

Note that both conditions had to occur for us to discover the problem.

And this is the important remark about the whole story.

OpenDay 2025 in Bolzano (Italy)

2024-12-18T00:00:00+00:00

Prepare for the next great event by PgTraining!

PgTraining is organizing next year’s event, namely OpenDay 2025, which will be held on April 11th in Bolzano, Italy.

The event will be totally free, but registration is required because the assigned room has a fixed number of seats.

Please note that all the talks will be in Italian.

The event will be held at the NOI Techpark.

We are working on the schedule, but the day will be organized in a talks-session and a laboratory/practical session, the former in the morning, the latter in the afternoon.

Please see the official event page for more details and stay tuned for updates!

PL/Perl now ties %ENV

2024-11-21T00:00:00+00:00

A small but great improvement in the security of PL/Perl.

PL/Perl is a great language, since it ties together two of my favourite pieces of technology: PostgreSQL and Perl.

While I usually refer to PL/Perl as the capability to run Perl code within PostgreSQL, the correct name is either PL/Perl or PL/PerlU, where the former is the trusted language and the latter is the untrusted one.

A trusted language means that the code will run in a PostgreSQL sandbox, with lower permissions than a normal application.

This commit introduces a new protection level in the trusted language PL/Perl: it prevents the modification of the %ENV hash, which represents the environment settings for the running code. An official CVE for the problem has been issued.

The trick to prevent modifications is really elegant, as often Perl is: using a tied hash to wrap up %ENV.

It works as follows:

create a new class that implements the hash protector
tie the %ENV to this new class
provide warnings when something tries to modify the %ENV tied hash.

Let’s explain it a little better. First of all, a new class to wrap the hash is created:

package PostgreSQL::InServer::WarnEnv;

use strict;
use warnings;
use Tie::Hash;
our @ISA = qw(Tie::StdHash);

sub STORE  { warn "attempted alteration of \$ENV{$_[1]}"; }
sub DELETE { warn "attempted deletion of \$ENV{$_[1]}"; }
sub CLEAR  { warn "attempted clearance of ENV hash"; }

The PostgreSQL::InServer::WarnEnv class inherits from Tie::StdHash, a tie-able hash that already defines all the required methods, so that you only need to override those you are interested in. In particular, WarnEnv overrides STORE, DELETE and CLEAR, which are the methods invoked when storing a value in the hash, deleting a value from it, or clearing it entirely.

Then it suffices to tie %ENV to this class, and in fact the PL/Perl implementation does:

tie %main::ENV, 'PostgreSQL::InServer::WarnEnv', %ENV or die $!;

that applies WarnEnv as the class behind the behaviour of %main::ENV, keeping all the values of %ENV (which has been changed to a normal hash). From now on, trying to modify %ENV will result in a warning according to the method used.

This patch has been backported to older PostgreSQL versions, down to 12.

dbicdump: using PostgreSQL schemas as package separator in produced Perl classes

2024-11-18T00:00:00+00:00

A way to instrument dbicdump to use PostgreSQL schemas as package separators.

Perl DBIx::Class is a great Object Relational Mapper (ORM), and I use it regularly with dbicdump, which is a tool to synchronize your existing database structure with the classes your program is going to use.

PostgreSQL being PostgreSQL, a great rock-solid database we all love, allows us to organize tables into schemas, a flat namespace that is usually transparent to the user because the default schema, public, is by default in the search_path of every user.

But how to take advantage of PostgreSQL schemas and DBIx::Class packages?

Well, it turns out that this is possible, with a little customization of the way you synchronize your own data structure.

Example Database

Assume we have an example database with a couple of tables, namely product and orders, each one replicated into two different schemas named italy and japan respectively. Note that this is probably not the best design for your database; it serves only as an example to give a quick and easy idea of how to achieve things.

The database is created as follows:

dbic=> CREATE SCHEMA italy;
CREATE SCHEMA
dbic=> CREATE SCHEMA japan;
CREATE SCHEMA
dbic=> CREATE TABLE italy.product( pk serial,
                                   code text,
                                   description text,
                                   primary key( pk ),
                                   unique( code ) );
CREATE TABLE

dbic=> CREATE TABLE japan.product( pk serial,
                                   code text,
                                   description text,
                                   primary key( pk ),
                                   unique( code ) );
CREATE TABLE

dbic=> CREATE TABLE italy.orders( pk serial,
                                  product int not null,
                                  qty int default 0,
                                  primary key ( pk ),
                                  foreign key( product ) references italy.product( pk ) );
CREATE TABLE

dbic=> CREATE TABLE japan.orders( pk serial,
                                  product int not null,
                                  qty int default 0,
                                  primary key ( pk ),
                                  foreign key( product ) references japan.product( pk ) );
CREATE TABLE

Let’s populate the tables with a few rows:

dbic=> insert into italy.product( code, description )
       values( 'it01', 'An italian product' );
INSERT 0 1

dbic=> insert into japan.product( code, description )
       values( 'jp01', 'A japanese product' );
INSERT 0 1

dbic=> insert into japan.product( code, description )
       values( 'jp02', 'A japanese product' );
INSERT 0 1


dbic=> insert into italy.orders( product, qty )
       select p.pk, ( random() * 100 )::int
       from italy.product p, generate_series( 1, 5 ) v;
INSERT 0 5

dbic=> insert into japan.orders( product, qty )
       select p.pk, ( random() * 100 )::int
       from japan.product p, generate_series( 1, 5 ) v;
INSERT 0 10

Dumping the schema via `dbicdump`

In order to dump the schema via dbicdump, you need to pass several additional options:

the schema names to dump, in our example italy and japan;
the moniker parts to use, that is, how the class name will be built. By default the moniker is set to name, which means it will call the name method (i.e., the table name). In our example, we need to use both name and schema, with the latter before the former;
the moniker parts separator, that is, the string used to separate the parts of the name. Since we want to produce modules with their own namespace, we will use the Perl namespace separator, that is ::.

This translates to a command line like the following:

% dbicdump -o dump_directory=/home/luca/tmp            \
           -o components='["InflateColumn::DateTime"]' \
           -o moniker_parts='["schema", "name"]'       \
           -o moniker_part_separator='::'              \
           -o db_schema='["public", "italy", "japan"]' \
           Example::Schema                             \
           'dbi:Pg:dbname=dbic;host=rachel;port=5432'  \
           luca superSecretPassword

Dumping manual schema for Example::Schema to directory /home/luca/tmp ...
Schema dump completed.

The parameters passed to dbicdump are the following:

dump_directory where to store the Perl code produced;
components='["InflateColumn::DateTime"]' this is not mandatory for this post example, but is a good habit to get automatic date/time data type conversions;
moniker_parts='["schema", "name"]' this tells dbicdump to compose the name of a class mapped onto a table as the schema name plus the table name, which is what we want;
moniker_part_separator='::' this tells dbicdump to use the Perl name separator (i.e., package separator ::) between the schema name and the table name;
db_schema='["public", "italy", "japan"]' this tells dbicdump to dump the public, italy and japan schemas, i.e., where to look for tables.

The resulting tree is as follows:

% tree Example
Example
├── Schema
│   └── Result
│       ├── Italy
│       │   ├── Order.pm
│       │   └── Product.pm
│       └── Japan
│           ├── Order.pm
│           └── Product.pm
└── Schema.pm

That is, the table italy.product has been translated to Italy::Product, and the others similarly.

Using the table structure

In order to use the Perl classes, and most notably, to query the tables, there is the need to pass the class names into the resultset method.

As an example:

#!perl

use v5.40;
use Example::Schema;
use Example::Schema::Result::Italy::Product;
use Example::Schema::Result::Japan::Product;


my $db = Example::Schema->connect(  'dbi:Pg:dbname=dbic;host=rachel;port=5432' ,
				    'luca',
				    'superSecretPassword' );


my @italian_products  = $db->resultset( 'Italy::Product' )->all;
my @japanese_products = $db->resultset( 'Japan::Product' )->all;

say "There are " . scalar( @italian_products ) . " italian products";
say "There are " . scalar( @japanese_products ) . " japanese products";


for my $product ( @italian_products ) {
    say "[ITALY] " . join( " | ", $product->code, $product->description );
}

for my $product ( @japanese_products ) {
    say "[JAPAN] " . join( " | ", $product->code, $product->description );
}

Note how the resultset method does not accept the table name, rather the Perl module name. In other words, italy.product does not work, while Italy::Product works.

In fact, enabling DBIC_TRACE and running the sample program produces the following output:

% export DBIC_TRACE=1

% perl test.pl
SELECT me.pk, me.code, me.description FROM italy.product me:
SELECT me.pk, me.code, me.description FROM japan.product me:
There are 1 italian products
There are 2 japanese products
[ITALY] it01 | An italian product
[JAPAN] jp01 | A japanese product
[JAPAN] jp02 | A japanese product

As you can see, the queries are correctly translated into <schema>.<tablename>. This is thanks to the fact that the table method in every class has been invoked with the fully qualified name. As an example:

% less Example/Schema/Result/Italy/Product.pm

...
__PACKAGE__->table("italy.product");
...

Using Relationships

Once it is clear how the tables are named, it is quite simple to query relationships. Let’s do it programmatically first:

#!perl

use v5.40;
use Example::Schema;
use Example::Schema::Result::Italy::Product;
use Example::Schema::Result::Japan::Product;


my $db = Example::Schema->connect(  'dbi:Pg:dbname=dbic;host=rachel;port=5432' ,
                                    'luca',
                                    'superSecretPassword' );

my @italian_orders  = $db->resultset( 'Italy::Order' )->all;
my @japanese_orders = $db->resultset( 'Japan::Order' )->all;

for ( @italian_orders ) {
    say sprintf "[ITALY] qty = %d for product %s" ,
	    $_->qty,
	    join( "|", $_->product->code, $_->product->description );

}

for ( @japanese_orders ) {
    say sprintf "[JAPAN] qty = %d for product %s" ,
	    $_->qty,
	    join( "|", $_->product->code, $_->product->description );

}

But let’s assume we want to query all the products that have at least one order of a given quantity (again, this is an example). This can be done as follows:

my @italian_products = $db->resultset( 'Italy::Order' )
	->search_related( 'product' )
	->search( { qty => 36 } )
	->all
	;

Let’s dissect this:

resultset( 'Italy::Order' ) is what we search first;
search_related( 'product' ) is what we join and extract then;
search( { qty => 36 } ) is the search condition (i.e., the WHERE clause);
all is the materialization of the result set.

The above translates to the following query (again obtained by enabling DBIC_TRACE):

SELECT product.pk, product.code, product.description
FROM italy.orders me
JOIN italy.product product
ON product.pk = me.product WHERE ( qty = ? ): '36'

Wait a minute! What is that product name that appears in the search_related method? Why is it not Italy::Product as before? This is due to how DBIx::Class handles relationships: every relationship gets a name that is used to tell DBIx what to join. Inspecting Italy::Order you can find something like the following:

__PACKAGE__->belongs_to(
  "product",
  "Example::Schema::Result::Italy::Product",
  { pk => "product" },
  { is_deferrable => 0, on_delete => "NO ACTION", on_update => "NO ACTION" },
);

The string "product" is the name of this join relationship, that has to be used when telling DBIx to join another table from Italy::Order.

This is a kind of trick used by DBIx, so that having an Order you can simply spell $order->product->code and it will work fine. You can rename such an association as you like (taking care not to upset the code generated by dbicdump), and use the name you like the most in joins, but I strongly recommend that you avoid this. Rather, design your tables better.

Conclusion

DBIx::Class is a very powerful and elegant ORM, and dbicdump allows you to organize your code in packages following the same clean order you can achieve with PostgreSQL schemas.

psql watch now has a row limit

2024-11-06T00:00:00+00:00

A new feature introduced with PostgreSQL 17.

psql \watch now has a row limit

I often use \watch, an internal command of the great text client psql that re-executes the last query at a fixed interval. Essentially, it works like watch(1) on a Unix machine.

In the latest major release of PostgreSQL, the \watch command has gained an interesting new feature: the min_rows limit. The idea is to make \watch stop automatically as soon as the executed query returns fewer than the specified number of rows. This is great, in my opinion, since I often launch \watch only to come back to the terminal and find a lot of empty executions, therefore needing to scroll up the terminal or the log to find the last point where something did happen. With this feature, \watch will stop for me!

As a very trivial example, imagine launching a well-known pgbench test (with short time parameters for the sake of this article):

% pgbench -T 120 -c 4 -n -U pgbench pgbench

and on a psql terminal, use \watch to wait for pgbench to finish:

pgbench=> select query, wait_event
          from pg_stat_activity
          where datname = current_database() and usename = 'pgbench';

pgbench=> \watch m=1

Whenever the query returns fewer than one row, which means no more processes are connected to the database, \watch will stop:

pgbench=> \watch m=1

                             Wed Nov  6 07:34:20 2024 (every 2s)

                                    query                                     |  wait_event
------------------------------------------------------------------------------+---------------
 UPDATE pgbench_accounts SET abalance = abalance + 4197 WHERE aid = 9520572;  | DataFileWrite
 UPDATE pgbench_accounts SET abalance = abalance + 1973 WHERE aid = 98188924; | DataFileRead
 UPDATE pgbench_accounts SET abalance = abalance + 4479 WHERE aid = 2905019;  | DataFileWrite
 UPDATE pgbench_accounts SET abalance = abalance + 2554 WHERE aid = 17213075; | DataFileRead
(4 rows)

                              Wed Nov  6 07:34:22 2024 (every 2s)

                                     query                                     |  wait_event
-------------------------------------------------------------------------------+---------------
 BEGIN;                                                                        |
 UPDATE pgbench_accounts SET abalance = abalance + -3617 WHERE aid = 4375996;  | DataFileWrite
 UPDATE pgbench_accounts SET abalance = abalance + 1388 WHERE aid = 75850626;  | DataFileWrite
 UPDATE pgbench_accounts SET abalance = abalance + -3372 WHERE aid = 15435869; | DataFileRead
(4 rows)

Wed Nov  6 07:34:24 2024 (every 2s)

 query | wait_event
-------+------------
(0 rows)

Clearly, this has its own drawbacks: imagine a long-running job that pauses, releasing the connection, only to resume its activity later. If \watch executes the query during the pause, the command will stop and you will not get any update when the activity resumes. An example of this is monitoring the pg_stat_progress_xxx views, for example pg_stat_progress_vacuum to see when the cleaning of a table is performed.

However, keeping in mind a good condition (number of rows) to get \watch on track, and being able to make it stop automatically is a feature that really helps me in my daily activity.
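
The stopping rule can be mimicked outside psql with a small shell loop. This is only an emulation under stated assumptions, not how psql implements \watch: the function name watch_min_rows is hypothetical, and every non-empty output line of the command is counted as a row.

```shell
# Hypothetical emulation of the stopping rule, not psql code: rerun a
# command every <interval> seconds until it prints fewer than
# <min_rows> non-empty lines.
# Usage: watch_min_rows <min_rows> <interval> <command> [args...]
watch_min_rows() {
    min_rows=$1
    interval=$2
    shift 2
    while :; do
        output=$("$@")
        printf '%s\n' "$output"
        # grep -c exits with a non-zero status when the count is zero,
        # hence the "|| true" to keep the loop going under "set -e".
        rows=$(printf '%s\n' "$output" | grep -c . || true)
        [ "$rows" -lt "$min_rows" ] && break
        sleep "$interval"
    done
}

# Example (needs a reachable database, so it is left commented out):
# watch_min_rows 1 2 psql -At -c "select query from pg_stat_activity where usename = 'pgbench'"
```

As with \watch, the command runs at least once, and the last (short) output is still printed before the loop ends.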

PostgreSQL is super solid in enforcing (well established) constraints!

2024-11-06T00:00:00+00:00

A note about migrating from other databases…

Well, let’s turn that around: SQLite3 is somehow too flexible in allowing you to store data!

We all know that.

And we have all been fighting situations where we have a well-defined structure in SQLite3 and, once we try to migrate to PostgreSQL, a bad surprise arrives! As an example, today I was trying to migrate a Django project with the built-in loaddata from a dumpdata, and sadly:

django.db.utils.DataError:
    Problem installing fixture '/home/luca/git/respi/respiato/sql/data.respi.json':
	   Could not load respi.PersonPhoto(pk=30647):
	       value too long for type character varying(20)

So in my SQLite3 tables some fields (at least one) have exceeded the size of the varchar(20), and while PostgreSQL correctly refuses to store such value(s), SQLite3 happily gets them into the database without warning you!

The fix, in this particular case, is quite simple: issuing an ALTER TABLE personphoto ALTER COLUMN file_path TYPE VARCHAR(50) suffices. I could also have used text, but I would like to keep crazy values coming from my application under control.

The point is: sooner or later, you will run into a constraint your stack is not honoring, so be prepared for some trouble.

Using PostgreSQL in the first place would have made long-term maintenance easier, in my opinion.

PostgreSQL 17 WAL Summarization

2024-10-21T00:00:00+00:00

A new interesting feature in the management of WALs.

PostgreSQL 17 adds a cool new feature in the management of the Write Ahead Logs (WALs): WAL summarization.

Two settings control the WAL Summarization:

summarize_wal (by default set to off) indicates if the summaries have to be produced;
wal_summary_keep_time indicates the amount of time (usually days) to keep the summaries before proceeding to an automatic cleanup.

Documentation for these two settings can be found in the official documentation. Turning on summarize_wal makes another process appear in the list of PostgreSQL processes: the walsummarizer:

$ ps -auxw | grep postgres
postgres       1  0.0  0.1 221044 29824 ?        Ss   13:45   0:00 postgres
postgres      27 11.2  0.0  74668  6768 ?        Ss   13:45   4:09 postgres: logger
postgres      28  0.0  0.3 221312 52088 ?        Ss   13:45   0:00 postgres: checkpointer
postgres      29  0.0  0.0 221188  9080 ?        Ss   13:45   0:00 postgres: background writer
postgres      31  0.0  0.0 221164 11768 ?        Ss   13:45   0:00 postgres: walwriter
postgres      32  0.0  0.0 222608  9720 ?        Ss   13:45   0:00 postgres: autovacuum launcher
postgres      33  0.0  0.0 222616  9208 ?        Ss   13:45   0:00 postgres: logical replication launcher
postgres     289  0.0  0.0 221652  7696 ?        Ss   13:57   0:01 postgres: walsummarizer

This process is in charge of keeping an eye on what changes on disk, so as to produce the summaries.

WAL summaries are kept in the pg_wal directory, under the summaries subdirectory, hence in a very risky zone to walk into!

$ ls -1 $PGDATA/pg_wal/summaries
000000010000000023001320000000002305F8D8.summary
00000001000000002305F8D80000000026000028.summary
0000000100000000260000280000000029E52AB8.summary
000000010000000029E52AB8000000002BC82690.summary
00000001000000002BC82690000000002E015A98.summary

The summaries are used to enable the very cool new feature of incremental backups: since version 17 the pg_basebackup is able to take incremental backups. The idea is as follows: you run a first pg_basebackup as usual, so to take a so called full backup. Then you take other backups specifying to pg_basebackup the --incremental option, passing the manifest of the previous backup. The command will try to understand what changed from the previous backup on disk and copy over only blocks that have been changed.

Before version 17 the only way to take a good incremental backup was to use tools like the excellent pgbackrest, that was able to do exactly that.

Summaries are used to know which blocks on disk have changed since the last backup, so to inform pg_basebackup about what is needed to be copied over. WAL summaries are much smaller than the WALs themselves, and therefore can be stored for a pretty much long period with regard to the WALs. In particular, in order to be able to peform an incremental backup, there must be all the summaries covering the timeframe from the previos backup to the current moment, otherwise there will be no possibility to perform an incremental backup. Hence the need for a wal_summary_keep_time tunable that resembles to me the old days of wal_keep_segments, with all the related problems and workarounds.

Incremental backups then need to be re-assembled into a single backup by means of a new tool called pg_combinebackup, not discussed here.

One thing that scares me a lot is that there is no way to automatically delete the summaries once summarization is turned off after having been enabled. In other words, the user is required to remove the no longer useful summaries if the summarizer process is turned off. Since the summaries live in a subdirectory of pg_wal, and the latter is such a risky place to be in, I believe a distracted user could do great damage to the system.

pgenv 1.3.8 is out!

2024-10-17T00:00:00+00:00

A new release of pgenv that simplifies the management of PostgreSQL 17.

pgenv 1.3.8 is out!

Yesterday, David Wheeler released version 1.3.8 of pgenv, which solves a few problems in dealing with the latest PostgreSQL release, version 17.

The build workflow of PostgreSQL 17 has slightly changed, so that new dependencies are required to produce the documentation. Thanks to the work by Brian Salehi, the pgenv build command now performs a make world-bin (essentially, world-bin is the target to build PostgreSQL without documentation). The documentation package is downloaded separately, since the pre-built documentation has been removed from the source tree and is now available as a separate tarball.

Moreover, this release includes another small contribution by Brian that improves the descriptive messages about dependencies.

Enjoy!

PostgreSQL adds the login type for event triggers

2024-10-03T00:00:00+00:00

It is now possible to catch a login event.

PostgreSQL 17 adds a new firing event for event triggers: login. Therefore it is now possible to catch a login attempt on a database.

Caution: this is not the same as Oracle logon triggers, even if the functionality looks similar to me.

However, thanks to this, it is now possible to get some more information when a login attempt succeeds.

In order to implement a poor man’s auditing (don’t do this at home!) to briefly demonstrate this feature, you can:

postgres=# CREATE TABLE wrong_audit( pk int generated always as identity
          , who text
          , ts  timestamp default current_timestamp );
CREATE TABLE

postgres=# grant INSERT on table wrong_audit to public;
GRANT


postgres=# create or replace
	function f_etr_audit() returns event_trigger
	as $code$
	begin
     	insert into wrong_audit( who ) select current_role;
	end
	$code$
	language plpgsql;
CREATE FUNCTION

postgres=# create event trigger
	 poor_auditing on login
	 execute function f_etr_audit();
CREATE EVENT TRIGGER

And now, when you connect to the database, you will see the table getting populated.

postgres=# table wrong_audit;
 pk |   who    |             ts
----+----------+----------------------------
  1 | postgres | 2024-10-03 11:39:27.659018
  2 | postgres | 2024-10-03 11:40:01.057011
  3 | postgres | 2024-10-03 11:46:06.38925
  4 | luca     | 2024-10-03 11:46:44.621835
  5 | luca     | 2024-10-03 11:46:46.389537
  6 | postgres | 2024-10-03 11:46:53.789339

There are a few things to note.

First of all, there is the need to grant the INSERT permission to the users that are going to fire the event, i.e., the users that are going to connect, or the trigger will not be able to execute. Obviously, there are other ways to do this, like setting permissions on the function itself.

Most important: if the trigger fails (due to an exception), the login attempt is aborted. For example, imagine that I remove the permissions on the table:
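As a minimal sketch of one such alternative (an assumption, not something tested in this post), the function could be declared SECURITY DEFINER, so that it runs with the privileges of its owner and the INSERT grant to public is no longer needed. Within a SECURITY DEFINER function current_role reports the function owner, hence the switch to session_user. The helper name harden_audit_trigger is made up, and the table is assumed to live in the public schema.

```shell
#!/bin/sh
# Sketch: run the audit function with its owner's privileges instead of
# granting INSERT to everybody. harden_audit_trigger is a hypothetical name.
harden_audit_trigger() {
    psql -X -v ON_ERROR_STOP=1 -d "${1:-postgres}" <<'SQL'
CREATE OR REPLACE FUNCTION f_etr_audit() RETURNS event_trigger
AS $code$
BEGIN
    -- within SECURITY DEFINER, current_role is the function owner:
    -- session_user still identifies who is logging in
    INSERT INTO wrong_audit( who ) SELECT session_user;
END
$code$
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = public, pg_temp;  -- assumes wrong_audit lives in public

REVOKE INSERT ON TABLE wrong_audit FROM public;
SQL
}
```

The function has to be owned by a role that is allowed to insert into wrong_audit, for example the owner of the table.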

% psql -h localhost -U luca postgres
psql: error: connection to server at "localhost" (127.0.0.1), port 5432 failed: FATAL:  permission denied for table wrong_audit
CONTEXT:  SQL statement "insert into wrong_audit( who ) select current_role"
PL/pgSQL function f_etr_audit() line 2 at SQL statement

The connection is aborted due to the problem in completing the function.

Last but not least, the trigger function should not be a long running one, or the user will be stuck waiting for the trigger to complete.

Now, to remind me of Oracle logon triggers, let’s complicate the above example a little (don’t try this at home):

postgres=# alter table wrong_audit add column db text;


postgres=# create or replace function f_etr_audit()
returns event_trigger
as $code$
declare
        me text;
        db text;
begin
        SELECT current_role, current_database()
        INTO me, db;

        IF me = 'luca' AND db = 'postgres' THEN
           RAISE 'Get out of here!';
        END IF;

        insert into wrong_audit( who, db ) VALUES( me, db );
end
$code$
language plpgsql;

And now the poor bastard me when trying to connect to postgres gets:

% psql -h localhost -U luca postgres
psql: error: connection to server at "localhost" (127.0.0.1), port 5432 failed: FATAL:  Get out of here!
CONTEXT:  PL/pgSQL function f_etr_audit() line 10 at RAISE

while other users can still connect, and the table gets populated more and more.

PostgreSQL 17 allow_alter_system tunable

2024-10-03T00:00:00+00:00

PostgreSQL 17 includes a new (among others) tunable to control the ALTER SYSTEM command.

PostgreSQL 17 allow_alter_system tunable

Among the new excellent features of PostgreSQL 17, one captured my attention: the capability to disable the ALTER SYSTEM command via the tunable [allow_alter_system](https://www.postgresql.org/docs/current/runtime-config-compatible.html#GUC-ALLOW-ALTER-SYSTEM).

The allow_alter_system is a boolean setting that is turned on by default, meaning that it is always possible to execute ALTER SYSTEM on the environment (as in previous versions). When turned off, the system will report an error, refusing to execute the command:

postgres=# alter system set work_mem to '512MB';
ERROR:  ALTER SYSTEM is not allowed in this environment

postgres=# show allow_alter_system ;
 allow_alter_system
--------------------
 off
(1 row)

The idea, as explained in the documentation, is to prevent mistakes when PostgreSQL is managed externally, or with an external tool, so that it is not possible to accidentally overwrite a configuration managed outside the database itself (i.e., via traditional files).

The annotation for the tunable explains it:

postgres=# select name, context, category, short_desc, extra_desc from pg_settings where name = 'allow_alter_system';
-[ RECORD 1 ]--------------------------------------------------------------------------------------------------------------
name       | allow_alter_system
context    | sighup
category   | Version and Platform Compatibility / Other Platforms and Clients
short_desc | Allows running the ALTER SYSTEM command.
extra_desc | Can be set to off for environments where global configuration changes should be made using a different method.
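Since the context is sighup, a configuration reload is enough to change the setting. A minimal sketch, assuming that the cluster configuration lives in $PGDATA/postgresql.conf and that pg_ctl is in the PATH (disable_alter_system is a made-up helper name):

```shell
#!/bin/sh
# Sketch: turn off ALTER SYSTEM by editing the configuration file,
# then ask the server to reload its configuration.
disable_alter_system() {
    pgdata=${1:-$PGDATA}
    # Append the setting: the last occurrence in the file wins.
    echo "allow_alter_system = off" >> "$pgdata/postgresql.conf" || return 1
    # sighup context: no restart needed, a reload is enough.
    pg_ctl -D "$pgdata" reload
}
```

After the reload, show allow_alter_system should report off.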

There are two important things to keep in mind when using this new feature:

this is not a security feature, it does not add any extra security layer;
postgresql.auto.conf will always be loaded as the last included file, therefore setting the tunable to off will not change the configuration machinery of PostgreSQL, nor will it prevent external tools from operating on postgresql.auto.conf directly (thus simulating ALTER SYSTEM).

Last but not least, keep in mind that the system raises an error, thus aborting your existing scripts when this feature is set to off. In my opinion, being able to choose whether ALTER SYSTEM raises an error or is silently ignored would have been better, so that even automated scripts could run without any side effects and without interruptions due to errors.

SQLite3 Vacuum and Autovacuum

2024-09-23T00:00:00+00:00

Similarly to PostgreSQL, SQLite3 also needs some care…

SQLite3 Vacuum and Autovacuum

Today I discovered, by accident I must confess, that PostgreSQL is not the only database requiring VACUUM: SQLite3 does too.

And there’s more: SQLite3 includes an auto-vacuum too! They behave similarly, at least in theory, to their PostgreSQL counterparts, but clearly there is no autovacuum daemon or process. Moreover, the configuration is simpler and I’ve not found any thresholds like the ones we have in PostgreSQL. In the following, I explain how VACUUM works in SQLite3, at least at a glance.

SQLite3 does not have a full enterprise-level MVCC machinery as PostgreSQL has, but when tuples or tables are updated or deleted, fragmentation and unreclaimed space mean that the database file never shrinks. Similarly to what PostgreSQL does, the now empty space (no longer occupied by old tuples) is kept for future usage, so the net effect is that the database grows without ever shrinking, even after a large data removal.

VACUUM is the solution that SQLite3 also uses to reclaim space.

VACUUM is a command available at the SQLite3 prompt to start a manual space reclaiming. It works by copying the database file content into another (temporary) file and restructuring it, so nothing really fancy and new here!

Then comes auto-vacuum, which is turned off by default. Auto-vacuum works either in full mode or in incremental mode. The former is the most aggressive, and happens at every COMMIT. The latter is less intrusive, and “prepares” what the vacuum process has to do, without performing it. It is only when [incremental_vacuum](https://sqlite.org/pragma.html#pragma_incremental_vacuum) is launched that the space is freed. Therefore, auto-vacuum in SQLite3 either executes at each COMMIT or is postponed until it is considered safe to execute.
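The incremental mode can be sketched with a couple of shell functions wrapping the sqlite3 command line shell. This is a minimal sketch that assumes sqlite3 is installed; both function names are made up for illustration.

```shell
#!/bin/sh
# Sketch of the incremental auto-vacuum mode, driven through the sqlite3
# command line shell. Both function names are hypothetical.

# Switch a database file to incremental auto-vacuum.
enable_incremental_autovacuum() {
    # On an existing database the new mode takes effect only once the
    # file is rebuilt, hence the VACUUM right after the PRAGMA.
    sqlite3 "$1" 'PRAGMA auto_vacuum = INCREMENTAL; VACUUM;'
}

# Give the free pages back to the filesystem, when convenient.
release_free_pages() {
    # Without an argument the pragma releases the whole freelist.
    sqlite3 "$1" 'PRAGMA incremental_vacuum;'
}
```

The current mode can be inspected with PRAGMA auto_vacuum, which reports 0 for none, 1 for full and 2 for incremental.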

pg_dump and --if-exists little gem

2024-06-26T00:00:00+00:00

An option I was not aware of…

pg_dump and --if-exists little gem

pg_dump is a very useful tool to dump (and hence prepare to restore) a single PostgreSQL database.

When I use it, I usually add the options:

--clean to DROP the database I’m dumping;
--create to issue a CREATE DATABASE and reconnect to it.

Thanks to the above options, I’m pretty sure that I’m going to start over from a clean situation when restoring the dump. This is particularly useful, in my opinion, when developing a new application and needing to start over from scratch.

However, the result of the --clean option is that the SQL file begins, after the usual preamble, with something like:

DROP DATABASE miniondb;

While this is what I want, if I need to restore the backup on a fresh machine, where the target database was not already in place, the restore will cause a warning saying that the database cannot be dropped because it does not exist (yet).

And this could be annoying from time to time!

But PostgreSQL being such a great and advanced piece of software, pg_dump provides an option to add the very useful IF EXISTS to DROP DATABASE: --if-exists comes to the rescue!

% pg_dump --clean --create --if-exists ...

The above will result in the statement DROP DATABASE IF EXISTS miniondb; which in turn will stop annoying me when the database is not already in place.

After all, the documentation for the --clean option already mentions it clearly:

If any of the objects do not exist in the destination database,
ignorable error messages will be reported during restore,
unless --if-exists is also specified.

and even more in the documentation of the option itself:

--if-exists
     Use DROP ... IF EXISTS commands to drop objects in --clean mode. This suppresses “does
     not exist” errors that might otherwise be reported. This option is not valid unless
     --clean is also specified.

Note that --if-exists refers to objects, not only the whole database!

PgTraining Free Online Event: Material Available

2024-04-23T00:00:00+00:00

The material and the videos are now online!

PgTraining Free Online Event: Material Available

Last Friday, April 19th, we held the fourth edition of our webinar dedicated entirely to PostgreSQL, provided by PgTraining.

As in the previous editions, we had three talks and an open discussion at the end. The talks (all in Italian) were:

Introduzione al linguaggio PL/Java (“An introduction to the PL/Java language”), by yours truly;
PgVector - in R768 nessuno può sentirti urlare (“PgVector - in R768 nobody can hear you screaming”), by Chris Mair;
Repliche logiche e migrazione di versione a caldo da PostgreSQL 12 a PostgreSQL 16 (“Logical replication and hot upgrade from PostgreSQL 12 to PostgreSQL 16”), by Enrico Pirozzi

The material is available on our Gitlab repository and such repository contains also links and material from the previous editions!

Some material is still being updated, so if it is not already there, it will appear soon.

Using PL/Java: need for clarifications

2024-04-22T00:00:00+00:00

Sometimes it happens: I write something in a rush, and present it in a suboptimal way. And then I get advice!

Using PL/Java: need for clarifications

In January, I wrote an article about installing PL/Java on Rocky Linux, and about some of the difficulties I had in achieving a fully operational installation, even though I did not dig deep enough into the problems that I encountered.

Chapman Flack, the most active developer in the project at the moment, took the time to write me a very detailed email with a lot of suggestions for improvements, providing corrections to some of the misconceptions I presented in that article.

I’m really glad to have received all those insights, and in order to spread the word, I’m writing here another article that, hopefully, fixes my mistakes. I’m not following the same order in which Chapman presented them to me, since in my opinion some issues are much more important than others, so I present them from the most important to the least important.

Editing the `java.policy` file

In my previous article, I advised readers to edit java.policy in case there was a problem with Java permissions when executing PL/Java code. Despite the fact that I clearly stated that relaxing the policy to grant all permissions was not a good idea, Chapman emphasized two main problems in my example: 1) I was editing the main policy file, therefore changing the policy rules for all the Java code, not only for the PL/Java one; 2) adding java.security.AllPermission made no distinction between trusted and untrusted languages.

Chapman pointed out that PL/Java uses a customized policy file, that can be found in the PostgreSQL configuration directory, hence in $(pg_config --sysconfdir). This customizable configuration is available since PL/Java version 1.6, and is documented here in the section “Permissions available in sandboxed/unsandboxed PL/Java”. This file defines two main principals:

grant principal org.postgresql.pljava.PLPrincipal$Sandboxed * {
};


grant principal org.postgresql.pljava.PLPrincipal$Unsandboxed * {

        permission java.io.FilePermission
                "<<ALL FILES>>", "read,readlink,write,delete";
};

The first principal, PLPrincipal$Sandboxed, does not add any particular permission, while PLPrincipal$Unsandboxed adds the permission to interact with the filesystem.

It is interesting to note that the pljava.policy file masks the ~/.java.policy one (if it exists), meaning that the latter is not used by PL/Java at all. However, the special property pljava.policy_urls can be set to point to and include additional (cumulative) policy files.

Conclusion: configuring the pljava.policy file is the right way to make permissions available to the PL/Java code in a fine-grained manner, without having to deal with the system-wide set of permissions.

Hopefully, there is no need to `SET pljava.libjvm_location`

Chapman provided me a link to the PL/Java packaging documentation, which contains a section named “What is the default pljava.libjvm_location?” that explains how package maintainers have information about where the default JVM installation is on the target system. With such information, PL/Java pre-built packages could come pre-configured with the JVM location of the default installation on the system. So far, this seems to be the case for the Ubuntu package, while on my Rocky Linux it does not seem to be (or I messed up the JVM installation).

Therefore, it is possible that there is no need to set pljava.libjvm_location if the package you installed already knows where the default JVM installation is on your operating system. However, knowing the aim of this variable and checking/configuring it allows database administrators to make PL/Java use a different (and specific) JVM.

Using the `pljava-api` (locally)

In my previous post, I wrote that in order to compile Java code against PL/Java there is the need for the API jar installed, namely pljava-api-x.y.z.jar. In order to get the API jar on the development machine, I wrote that you need to download the source code and compile it (using Apache mvn) and that this step is not simple at all, since it could require extra dependencies for the native code bindings.

Chapman pointed out that when you install PL/Java from the PGDG distribution, you also get the above API jar installed in the PostgreSQL shared folder:

$ ls $(pg_config --sharedir)/pljava/*.jar
/usr/share/postgresql/16/pljava/pljava-1.6.7.jar
/usr/share/postgresql/16/pljava/pljava-api-1.6.7.jar
/usr/share/postgresql/16/pljava/pljava-examples-1.6.7.jar

Therefore, there is no need to compile the API jar by yourself: you can use the one already installed in the PostgreSQL directory.

However, in order to make Apache Maven (mvn) aware of where the API jar is, you need to install the JAR locally into the Maven repository, for example:

$ mvn install:install-file \
   -Dfile=$(pg_config --sharedir)/pljava/pljava-api-1.6.7.jar \
   -DgroupId=org.postgresql \
   -DartifactId=pljava-api \
   -Dversion=1.6.7 \
   -Dpackaging=jar

After the above, it is possible to compile Java code against the PL/Java API!

Information in the `sqlj.jar_repository` table

The sqlj.jar_repository table contains the unique (short) name given to every installed JAR, as well as the location the JAR was loaded from (jarorigin):

testdb=# select jarname, jarorigin from sqlj.jar_repository;
 jarname |        jarorigin
---------+--------------------------
 PWC258  | file:///tmp/PWC258-1.jar
 PWC260  | file:///tmp/PWC260-1.jar
 PWC257  | file:///tmp/PWC257-1.jar
 PWC263  | file:///tmp/PWC263-1.jar
 pwc266  | file:///tmp/PWC266-1.jar
 PWC264  | file:///tmp/PWC264-1.jar
 PWC259  | file:///tmp/PWC259-1.jar
 PWC262  | file:///tmp/PWC262-1.jar
 PWC65   | file:///tmp/PWC265-1.jar
(9 rows)

In my previous article, I poorly explained this concept: when the install_jar function is executed, it accepts as its first argument the URI from which the JAR is going to be loaded, and that value is stored into the jarorigin field. Once the JAR is deployed, this field has no useful meaning other than recording the original location of the JAR; it does not provide information about where the JAR currently is. For example, if it was on local storage, the JAR file could even be removed, since sqlj.install_jar copies the jar content into the database (I guess into the sqlj.jar_entry table).

Is there a round-trip of data between PostgreSQL and PL/Java?

Again, I poorly explained this concept in my previous article, stating that “[…] using an external language like PL/Java means that PostgreSQL has to manage the round-trip of data between the database and the virtual machine, with the latter being fired at first execution.”

PL/Java exploits JNI to communicate with the PostgreSQL backend process, and the communication happens within the same process. Therefore there is no round-trip, at least not one involving a different process (i.e., inter-process communication). However, there is still the need to properly convert complex data structures from Java types to PostgreSQL ones and vice versa, and that was what I meant with the wrong term “round-trip”.

Conclusions

The above is a set of details towards a better understanding of how PL/Java works. I have to admit that I’m really sorry about what is probably the worst mistake I made, that is providing all the permissions to all the Java code running on the machine. It is embarrassing, since in the past I also did a lot of work on the Java policy mechanism, but it has been so long since I last developed in Java that I forgot all the good practices!

Besides, I hope this is going to better explain how to use PL/Java, and quite frankly I’m really happy to see that pretty much all my problems have a very straightforward solution that PL/Java developers have already addressed. This, again, emphasizes the maturity of the project!

pgenv: run once scripts

2024-04-15T00:00:00+00:00

A new feature to run a single script at the very beginning of the cluster lifecycle.

pgenv: run once scripts

Today pgenv got a new release that provides a simple, but quite useful, feature: the capability to run a custom script the first time the instance is started.

The idea is simple: after the initdb phase, if the user has configured a PGENV_SCRIPT_FIRSTSTART executable, the system will run that script against the (just) started instance. This is different from the PGENV_SCRIPT_POSTSTART script, since the latter is executed every time the cluster is started, while PGENV_SCRIPT_FIRSTSTART is run only the first time the database cluster is started.

The aim of this script is, hence, to create users and databases, or to populate some initial data.
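As an illustration, a first-start script could look like the following sketch. Everything in it is an assumption: the role and database names are invented, and it connects as the postgres superuser using the defaults or the PG* environment variables, since what pgenv passes to the script is not covered here.

```shell
#!/bin/sh
# Hypothetical first-start script: create an application role and database.
# bootstrap_cluster, app_user and app_db are invented names.
bootstrap_cluster() {
    # Statements read from standard input run one by one in autocommit
    # mode, which CREATE DATABASE requires.
    psql -X -U postgres -d postgres -v ON_ERROR_STOP=1 <<'SQL'
CREATE ROLE app_user LOGIN;
CREATE DATABASE app_db OWNER app_user;
SQL
}
```

Saved as an executable file that ends by calling bootstrap_cluster, this could be the script PGENV_SCRIPT_FIRSTSTART points to.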

PostgreSQL 16 Coin

2024-04-14T00:00:00+00:00

I just got the coin in the mail!

PostgreSQL 16 Coin

I just received in the mail the PostgreSQL 16 Coin with a great artwork!

I’m really happy to be part of this great community!

pgagroal-cli minor bug fixes

2024-03-21T00:00:00+00:00

A few changes to a part of pgagroal.

pgagroal-cli minor bug fixes

In the past days I pushed a few trophy patches to pgagroal-cli, the command line tool to administer a pgagroal connection pooler, in order to fix minor issues that produced unexpected results.

The bugs were all harmless, since they only affected what pgagroal-cli was producing as output to the user, but could have been confusing for some use cases, hence the need to fix them.

There is quite a momentum around the pgagroal project, and the activity around the issues has increased with the arrival of new contributors!

pgagroal command refactoring (again!) and a new contributor!

2024-03-15T00:00:00+00:00

Changes in pgagroal-cli and pgagroal-admin.

pgagroal command refactoring (again!) and a new contributor!

Last year I introduced a way in pgagroal-cli and pgagroal-admin to arrange commands in a more consistent and manageable way, deprecating some commands too.

Today, a new contributor to the project, Henrique de Carvalho, committed a patch that greatly improves the way commands are handled internally.

The users will not notice any particular difference, except that also a bug has been fixed in handling deprecated commands, but the changes in the code are very important: now all the commands are organized in a list of structs that provide a more accurate way of handling errors, missing arguments or command parts, and logging.

I began thinking about this refactoring months ago, but never got the time to dig into the changes. However, it all began with an annoying problem with some misspelled commands, which reported a wrong error message to the user.

And now, thanks to the contributions of Henrique, pgagroal has taken another step towards a more complete and robust system.

PgTraining Online Event 2024 (italian)

2024-03-15T00:00:00+00:00

We are back with another event!

PgTraining Online Event 2024 (italian)

PgTraining, the amazing Italian professionals who spread the word about PostgreSQL and whom I joined in the last years, is organizing another online event (webinar) on April 19th, 2024.
Following the success of the previous edition(s), we decided to provide another afternoon full of PostgreSQL talks, in the hope to improve the adoption of this great database.

The event will consist of three hours of talks about PL/Java, PgVector and hot upgrade via logical replication.
As for the previous editions, the webinar will be presented in Italian. Attendees will be free to actively participate and ask questions both during the talks and at the end of the whole event.

In the pure spirit of PgTraining, the event will be free of charge, but registration is required to participate and the number of available seats is limited, so hurry up and get your free ticket as soon as possible!
The material will be available for free after the event has completed, but no live recording will be available.

pgagroal 1.6.0 has been released

2024-02-24T00:00:00+00:00

pgagroal, the fast connection pooler for PostgreSQL, has reached a new stable release!

pgagroal 1.6.0 has been released

A couple of days ago, pgagroal version 1.6.0 was released.

This new version includes a lot of new features and small improvements that make pgagroal much more user-friendly and easy to adopt as a connection pooler. The main contribution, from yours truly, has been the command line refactoring and the JSON support. The command line now supports commands and subcommands, like for example conf get and conf set, and a more consistent set of commands. The JSON command output allows for easy automation and provides a stable command output, so as to ease the adoption in different scenarios.

But there’s more: a lot of other tickets have been solved during this release, and there is now support for Mac OSX. Moreover, it is now possible to retrieve and set configuration values at run-time, thus without the need to manually edit the configuration file and reload the daemon.

There is an initial experimental support for client certificates, and it is now possible to determine how long a connection must live.

A better handling of the configuration files, hence a better detection and reporting of misconfiguration, as well as a better error messaging system, completes the release.

The list of contributors is also expanding, and this is good and exciting!

Give pgagroal a try, you will be amazed by the capabilities of this connection pooler!

Using PL/Java to Return SETOF RECORD

2024-02-13T00:00:00+00:00

A simple way to return multiple records from PL/Java

Using PL/Java to Return SETOF RECORD

PL/Java allows a quite easy implementation of result set providers, objects that produce rows that can be used as tables in queries. In order to produce a result set, the main steps are: 1) implement the ResultSetProvider interface and its methods to effectively produce the data; 2) build a PL/Java function that instantiates the above ResultSetProvider, so that PL/Java wraps that function into a RETURNS SETOF record SQL function.

In the following there is a quite simple demonstration of the production of records from PL/Java.

Implementing the `ResultSetProvider`

PL/Java has the ResultSetProvider interface that requires the implementation of two methods:

assignRowValues that is called for every row in the result set, and must return true to indicate that a new row has been added to the result set, or false to indicate that the result set is complete and no more rows will be added;
close that is called once no more rows are needed, for example after assignRowValues has returned false.

The assignRowValues function accepts two arguments:

a ResultSet object that is the container for the values of the row being produced;
an integer value (an int in the code below) indicating the current row for which the method has been called. This counter starts at zero, as in normal Java list/array manipulations, not as in SQL.

Therefore, it is possible to implement the two methods as follows:

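package PWC256;

import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Calendar;
import java.util.logging.Level;
import java.util.logging.Logger;

import org.postgresql.pljava.ResultSetProvider;
// the following three are needed by the rs_producer_pljava function shown later
import org.postgresql.pljava.annotation.Function;
import static org.postgresql.pljava.annotation.Function.Effects.IMMUTABLE;
import static org.postgresql.pljava.annotation.Function.OnNullInput.RETURNS_NULL;

public class Task1 implements ResultSetProvider {

    private final static Logger logger = Logger.getAnonymousLogger();

    // upper bound for the row counter, decided at construction time
    private int maxRows = 10;

    public Task1( int maxRows ) {
        super();
        this.maxRows = maxRows;
    }

    @Override
    public boolean assignRowValues( ResultSet rs, int row )
        throws SQLException {

        // the counter starts at 0, so rows 0..maxRows pass this check
        if ( row > maxRows )
            return false;

        logger.info( String.format( "Producing row %d/%d", row, maxRows ) );

        // fill the four columns of the current row
        rs.updateString( 1, String.format( "Row %d out of %d from %s",
                                           row,
                                           maxRows,
                                           this.getClass().getName() ) );
        rs.updateInt( 2, row );
        rs.updateInt( 3, maxRows );
        rs.updateDate( 4, new java.sql.Date( Calendar.getInstance().getTimeInMillis() ) );
        return true;
    }

    @Override
    public void close() {
        logger.info( "Closing resultset" );
    }
}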

The assignRowValues function simply sets on the ResultSet a string field, two integers and one date field. The production of the result set ends when the row counter exceeds the maxRows value decided when the class is instantiated; since the counter starts at zero, rows 0 through maxRows are produced.

Creating a function to call the producer

It is possible to create a PL/Java function that instantiates the above class, returning it. In order for PL/Java to understand that the function will produce a result set, the function must return a ResultSetProvider.

    @Function( onNullInput = RETURNS_NULL, effects = IMMUTABLE )
    public static final ResultSetProvider rs_producer_pljava() throws SQLException {
		logger.log( Level.INFO, "Entering rs_producer_pljava" );

		Task1 producer = new Task1( 20 );
		return producer;
    }

Once the function has been compiled, and the JAR installed, there will be a function defined as:

testdb=> \sf rs_producer_pljava
CREATE OR REPLACE FUNCTION public.rs_producer_pljava()
 RETURNS SETOF record
 LANGUAGE java
 IMMUTABLE STRICT
AS $function$PWC256.Task1.rs_producer_pljava()$function$

Note how the function has been created as RETURNS SETOF record and will call the PL/Java function, that in turn will instantiate the ResultSetProvider.

Using the function

It is now possible to query the function from SQL:

testdb=> select j.* from rs_producer_pljava() as j(t text, r int, m int, d date);
INFO:   PWC256.Task1 Entering rs_producer_pljava
INFO:   PWC256.Task1 Producing row 0/20
INFO:   PWC256.Task1 Producing row 1/20
INFO:   PWC256.Task1 Producing row 2/20
...
INFO:   PWC256.Task1 Producing row 18/20
INFO:   PWC256.Task1 Producing row 19/20
INFO:   PWC256.Task1 Producing row 20/20
INFO:   PWC256.Task1 Closing resultset

                t                 | r | m  |     d
-----------------------------------+---+----+------------
 Row 0 out of 20 from PWC256.Task1 | 0 | 20 | 2024-02-07
 Row 1 out of 20 from PWC256.Task1 | 1 | 20 | 2024-02-07
 Row 2 out of 20 from PWC256.Task1 | 2 | 20 | 2024-02-07
 Row 3 out of 20 from PWC256.Task1 | 3 | 20 | 2024-02-07
 Row 4 out of 20 from PWC256.Task1 | 4 | 20 | 2024-02-07
...

From the log messages it is possible to see that the provider is being used to produce the records, and at the end it is closed.

Passing dynamically the number of rows to produce

What if there is the need to decide dynamically how many rows the ResultSetProvider has to produce? It simply requires changing the PL/Java function so that it accepts an integer argument:

    @Function( onNullInput = RETURNS_NULL, effects = IMMUTABLE )
    public static final ResultSetProvider rs_producer_pljava( int howManyRows ) throws SQLException {
		logger.log( Level.INFO, "Entering rs_producer_pljava" );

		if ( howManyRows <= 0 )
		    howManyRows = 5;

		Task1 producer = new Task1( howManyRows );
		return producer;
    }

And it is then possible to query the function with the following query:

testdb=> select j.* from rs_producer_pljava( 3 ) as j(t text, r int, m int, d date);
INFO:   PWC256.Task1 Entering rs_producer_pljava
INFO:   PWC256.Task1 Producing row 0/3
INFO:   PWC256.Task1 Producing row 1/3
INFO:   PWC256.Task1 Producing row 2/3
INFO:   PWC256.Task1 Producing row 3/3
INFO:   PWC256.Task1 Closing resultset
                t                 | r | m |     d
----------------------------------+---+---+------------
 Row 0 out of 3 from PWC256.Task1 | 0 | 3 | 2024-02-07
 Row 1 out of 3 from PWC256.Task1 | 1 | 3 | 2024-02-07
 Row 2 out of 3 from PWC256.Task1 | 2 | 3 | 2024-02-07
(2 rows)

Conclusions

It is quite simple to use PL/Java to implement a row producer, even based on already existing code.

pgagroal-cli gains JSON output

2024-02-10T00:00:00+00:00

A new feature of pgagroal-cli that takes another step towards full automation.

pgagroal-cli gains JSON output

At last, I made it: a commit in pgagroal to support JSON output. Getting this work completed has been quite hard and long, not because of the technological challenge, but rather because of all the little details, like continuous integration. For the record, I started this work last November (of course, working on it slowly, on and off).

What is all of this about?

The idea is to provide JSON-based output to pgagroal-cli, the command line interface and main management tool for the pgagroal connection pool.

I have to admit that I hate JSON with a passion, and I rank it at the top of my worst formats, together with XML. So, why did I spend so much time on this patch? Unless you are living under a rock, you have probably seen how many tools nowadays provide a JSON output format. The main reason is that this format, while still being human readable (ehm, to some extent!), allows for easy automation. There are tons of JSON parsers out there, and even our beloved database PostgreSQL has very rich JSON support. Therefore, having a consistent and automatically parsable command output will ease the automation, and hence the adoption, of pgagroal.

To some extent, this work is the natural continuation of the work I started almost one year ago to make the pgagroal-cli command line more consistent and understandable. For example, I added commands to handle the configuration directly from the command line (see for example this commit) and to have a more compact and consistent set of commands (see for example this commit and the following ones I made).

How to use the JSON output

The pgagroal-cli command now supports an optional command line flag, --format, that allows switching from the default text-based output to the new JSON format. As the documentation states, the default output is the text format, so not specifying any --format option is totally equivalent to specifying --format text.

On the other hand, to turn on the JSON output format, it is required to pass --format json on the command line. As an example, the output of a command will look like the following:

% pgagroal-cli ping --format json
{
        "command":      {
                "name": "ping",
                "status":       "OK",
                "error":        0,
                "exit-status":  0,
                "output":       {
                        "status":       1,
                        "message":      "running"
                }
        },
        "application":  {
                "name": "pgagroal-cli",
                "major":        1,
                "minor":        6,
                "patch":        0,
                "version":      "1.6.0"
        }
}
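
A script can consume this output with any JSON parser. The following is a minimal Python sketch that parses the sample reply shown above; the helper name is_pool_running is hypothetical, and treating error equal to 0 together with the message "running" as success is an assumption based only on this sample, not on the pgagroal documentation.

```python
import json

# Sample reply copied from the `pgagroal-cli ping --format json` run above.
# In a real script the text would come from the command itself, for instance
# from the standard output captured via the subprocess module.
PING_OUTPUT = """
{
    "command": {
        "name": "ping",
        "status": "OK",
        "error": 0,
        "exit-status": 0,
        "output": {"status": 1, "message": "running"}
    },
    "application": {
        "name": "pgagroal-cli",
        "major": 1, "minor": 6, "patch": 0,
        "version": "1.6.0"
    }
}
"""


def is_pool_running(raw_json: str) -> bool:
    """Return True when the reply reports no error and a running pool.

    Hypothetical helper: the success criteria are an assumption derived
    from the sample output only.
    """
    reply = json.loads(raw_json)
    command = reply["command"]
    return command["error"] == 0 and command["output"]["message"] == "running"


print(is_pool_running(PING_OUTPUT))
```

The point of the sketch is that the script never has to scrape text: it navigates named keys, so cosmetic changes to the human readable output do not affect it.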

Format of JSON output

The JSON output has a fixed structure made of a few predefined objects, which include:

command: an object that contains the command the pgagroal server has executed (or has been requested to execute);
application: an object that reports the name and version of the application that requested the command (so far, always pgagroal-cli).

The command object, in turn, contains other information, like the status of the command and the notification of errors, as well as output, an object that contains the command output (if any) and the command status.

As an example, consider a more verbose command like status:

% pgagroal-cli status --format json
{
        "command":      {
                "name": "status",
                "status":       "OK",
                "error":        0,
                "exit-status":  0,
                "output":       {
                        "status":       {
                                "message":      "Running",
                                "status":       1
                        },
                        "connections":  {
                                "active":       0,
                                "total":        2,
                                "max":  15
                        },
                        "databases":    {
                                "disabled":     {
                                        "count":        0,
                                        "state":        "disabled",
                                        "list": []
                                }
                        }
                }
        },
        "application":  {
                "name": "pgagroal-cli",
                "major":        1,
                "minor":        6,
                "patch":        0,
                "version":      "1.6.0"
        }
}

As you can see, this command has a more extended output section that includes much more information and reports, in a different form, what the normal text command would have printed.

Every command has a different output format; this means that, in order to interpret the output of a command, there is the need to read the documentation for that command.
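
Since the command-specific part always lives under the output key, a consumer can keep the generic handling in one place and write a small extractor per command. Here is a hedged Python sketch for the status reply shown above; the function name connection_summary and the shape of the returned dictionary are my own choices for illustration, and the sample is trimmed to the command object.

```python
import json

# Trimmed copy of the `pgagroal-cli status --format json` reply shown above.
STATUS_OUTPUT = """
{
    "command": {
        "name": "status",
        "status": "OK",
        "error": 0,
        "exit-status": 0,
        "output": {
            "status": {"message": "Running", "status": 1},
            "connections": {"active": 0, "total": 2, "max": 15},
            "databases": {
                "disabled": {"count": 0, "state": "disabled", "list": []}
            }
        }
    }
}
"""


def connection_summary(raw_json: str) -> dict:
    """Flatten the interesting parts of a status reply (hypothetical helper)."""
    output = json.loads(raw_json)["command"]["output"]
    connections = output["connections"]
    return {
        "state": output["status"]["message"],
        "active": connections["active"],
        "total": connections["total"],
        "max": connections["max"],
        "disabled_databases": list(output["databases"]["disabled"]["list"]),
    }


summary = connection_summary(STATUS_OUTPUT)
print(f"{summary['state']}: {summary['active']} active out of {summary['max']}")
```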

Moreover, it is interesting to note that, due to refactoring of the code, the text command output has slightly changed, so chances are that if you based your automation on such format you are going to break your scripts. This is an excellent motivation to switch to the new JSON output format!

Under the hood, all the complex commands like the above status have been refactored to talk only in JSON, therefore the text output format is nowadays just a rendering of the JSON sent over the communication protocol.

What about `pgagroal-cli` friends?

The other main command, pgagroal-admin, has deliberately not been migrated to JSON: I don’t believe that we need a lot of automation on this command, hence we don’t need to provide JSON output. Moreover, the command is not very verbose and does not produce much output; on the other hand, it requires an interactive session with the user.

Therefore, I don’t see the need to port JSON output to this command.

A Brief History

This patch is, as often happens, the result of a lot of trial and error, both in the implementation and in the design.

In the beginning, I thought of adding an explicit --json command line flag to indicate the need for JSON output, but I later changed my mind in favor of the more general and extensible --format, which allows for the future addition of other output formats, should the need arise.

I implemented a first prototype using the json-c library, then switched to cJSON. I have to say that, even if both the libraries deal with JSON, they have a quite different approach in how to build a JSON object. I tend to prefer cJSON because it has a less structured approach to add scalar values.

Towards the end of the patch, we had to deal with a lot of issues with the port to OSX, and it took us a few days to discover that we had not fully updated the section of the CMakeLists.txt file related to linking on OSX.

Conclusions

pgagroal is growing more and more, and I believe that this new JSON feature will open the road for new exciting developments and integrations with other systems, thus promoting the adoption of this tool in the PostgreSQL ecosystem!

Installing PostgreSQL 16 (development) on Rocky Linux 9: the Perl::IPC::Run problem

2024-02-08T00:00:00+00:00

A possible solution to a common problem

Installing PostgreSQL 16 on Rocky Linux 9: the Perl::IPC::Run problem

Today I was preparing a new machine, based on Rocky Linux 9, for some development activity. I was installing PostgreSQL 16 and the development stuff I need, so I was executing (after having imported the PGDG repository), the usual:

% sudo dnf install postgresql16.x86_64 \
                   postgresql16-contrib.x86_64 \
                   postgresql16-devel.x86_64 \
                   postgresql16-libs.x86_64 \
                   postgresql16-plperl.x86_64 \
                   postgresql16-server.x86_64

...
Error:
 Problem: cannot install the best candidate for the job
  - nothing provides perl(IPC::Run) needed by postgresql16-devel-16.1-2PGDG.rhel9.x86_64 from pgdg16
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)

Apparently, I was not able to find a perl-IPC-Run package in the Rocky Linux repositories, nor in the EPEL ones.

The correct way is to enable the crb repository:

% sudo dnf config-manager --set-enabled crb

And that’s it:

% dnf search Perl-IPC-Run

==================================== Name Matched: Perl-IPC-Run ====================================
perl-IPC-Run.noarch : Perl module for interacting with child processes
perl-IPC-Run3.noarch : Run a subprocess in batch mode

Another approach is to install it the Perl way!

I prefer to use cpanm as my Perl package manager nowadays, but cpan and others work equally well:

% sudo dnf install perl-App-cpanminus.noarch

% sudo cpanm IPC::Run
--> Working on IPC::Run
Fetching http://www.cpan.org/authors/id/T/TO/TODDR/IPC-Run-20231003.0.tar.gz ... OK
Configuring IPC-Run-20231003.0 ... OK
Building and testing IPC-Run-20231003.0 ... OK
Successfully installed IPC-Run-20231003.0 (upgraded from 20200505.0)
1 distribution installed

pgenv gains a new command (and contributor!)

2024-02-06T00:00:00+00:00

A new command in the pgenv script.

pgenv gains a new command (and contributor!)

pgenv, the PostgreSQL binary manager written as a Bourne Again Shell script, has gained a new command: status.

The idea of this command is to report the status of the selected PostgreSQL instance, mainly whether it is running or not. Behind the scenes, the implementation exploits the pg_ctl command for the selected instance, stopping the execution immediately if the user has not selected any instance.

The output of pg_ctl has been mangled to appear a little less verbose, in particular the pg_ctl: prefix has been removed.

Brian Salehi is the author of this patch, and hopefully a new contributor who will help improve pgenv again and again.

As an example, when using the new status command you will get something like the following:

% pgenv status
server is running (PID: 51503)
/usr/pgsql-16/bin/postgres "-D" "/postgres/16/data"

Changing a Column from Integer to Boolean in One Transaction

2024-02-05T00:00:00+00:00

A way to fix some oddity that comes from other databases.

Changing a Column from Integer to Boolean in One Transaction

I was migrating a database from SQLite3 to PostgreSQL, not because the former isn’t good, rather because the latter shines!

SQLite3 does not have booleans, so the usual trick to simulate them is to use integer columns (or characters, or whatever works for you). I was in this situation with a table classification having a miscellaneous column with only two values: 1 to indicate true and 0 to indicate false. Moreover, the column had a default value set to 0 (i.e., false).

While this is not a problem in itself, it is really annoying when doing queries and data manipulation. Luckily, PostgreSQL allows for a quick fix of the column, migrating its data type to another one. Unluckily, there is no implicit cast from integer to boolean, so PostgreSQL is not able to understand how to migrate the values on its own, but it is quite simple to instruct it to follow the right path.

First of all, there is the need to check the original column values to identify if, by accident, some not-boolean-ish values have been stored. This is really simple, since you can do something like:

testdb=> SELECT count(*), miscellaneous
         FROM classification
         WHERE miscellaneous NOT IN ( 0, 1 )
         GROUP BY miscellaneous;
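
One detail worth keeping in mind is that NOT IN never matches NULL, so rows holding NULL slip through the check above. The following Python sketch reproduces the check on the SQLite3 side, using an in-memory database and the standard sqlite3 module; the sample rows are invented for illustration, and the extra IS NULL condition is my addition, not part of the original query.

```python
import sqlite3

# In-memory stand-in for the original SQLite3 table; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE classification (miscellaneous INTEGER DEFAULT 0)")
conn.executemany(
    "INSERT INTO classification (miscellaneous) VALUES (?)",
    [(0,), (1,), (1,), (2,), (2,), (7,), (None,)],
)

# Same check as above, extended with IS NULL: `NULL NOT IN (0, 1)` evaluates
# to NULL, so rows holding NULL would otherwise be silently skipped.
suspicious = conn.execute(
    """
    SELECT count(*), miscellaneous
    FROM classification
    WHERE miscellaneous NOT IN (0, 1)
       OR miscellaneous IS NULL
    GROUP BY miscellaneous
    ORDER BY miscellaneous
    """
).fetchall()

for count, value in suspicious:
    print(f"{count} row(s) with unexpected value {value!r}")
```

If the list comes back empty, the column only holds 0 and 1 and the conversion is safe; otherwise it is better to decide explicitly what every odd value should become before changing the data type.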

Now it is time to migrate the column.

PostgreSQL allows for transactional DDL statements, that means you can run multiple DDL statements within a transaction. Therefore, within a single transaction, it is possible to:

drop the column default value;
change the column data type, telling PostgreSQL about how to migrate the data;
assign a new default value to the new column.

Moreover, PostgreSQL is able to execute a single ALTER TABLE with multiple ALTER COLUMN clauses, something that reminds me of Oracle’s ALTER TABLE MODIFY ( ) expression:

testdb=> alter table classification
            alter column miscellaneous drop default,
            alter column miscellaneous set data type boolean
                 using
                 case miscellaneous when 1 then true else false end,
            alter column miscellaneous set default false;

Done!

The first alter column statement removes the default value, the second one uses a case to convert an integer into a boolean, and the last one adds a default value.

A more verbose way of doing the same thing is:

testdb=> BEGIN;
testdb=> alter table classification alter column miscellaneous drop default;
testdb=> alter table classification alter column miscellaneous set data type boolean
                 using
                 case miscellaneous when 1 then true else false end;

testdb=> alter table classification  alter column miscellaneous set default false;
testdb=> COMMIT;

The documentation and examples for ALTER TABLE provide more details about how to change the data type in similar situations.

'generated always as identity' columns do not have default values (or do they?)

2024-01-29T00:00:00+00:00

Something strange I discovered while using DBIx::Class and DBI.

‘generated always as identity’ columns do not have default values (or do they?)

PostgreSQL has two ways of defining what other databases call an auto-increment column:

serial
generated always as identity

The former, serial, is the oldest way of declaring an auto-increment column: it creates a sequence and attaches the default value of the column to the nextval() of the sequence. The latter, generated always as identity, is the newest (even if not so new!) declarative way of doing the same thing serial does: it creates a sequence and ties it to the table column.

So what is the difference?

In short, with serial you get two independent objects (a column and a sequence) that behave separately, even if the value of the column is tied to the next value of the sequence. If there is the need to restart the counter, the table does not know anything about it, so you need to explicitly work against the sequence. On the other hand, with generated always as identity, the table column knows the sequence used to populate itself, and therefore it is possible to reset the sequence by working against the table. So, for instance, you can do an ALTER TABLE foo ALTER COLUMN pk RESTART; without having to know the name of the sequence behind the pk column.

I usually explain that generated always as identity is the best way to proceed, because it gives all the advantages of serial with a more declarative way of handling special cases.

So, you should use generated always as identity, except when you should not!

The beginning of the problems: `dbicdump` and `DBIx::Class::Schema::Loader`

I was migrating a schema from SQLite3 to our beloved database, so I converted every SQLite3 autoincrement column to int generated always as identity. So far, so good, simple enough.

Then I used Perl DBIx::Class::Schema::Loader and dbicdump to dump the database structure into a so-called schema, with objects to use.

And last, I used the generated objects. And this is where the problems began…

DBIx::Class was complaining about my auto-increment columns not being defined as such. What was wrong?

I had a quick look at one of the generated classes, and I got the following:

__PACKAGE__->add_columns(
  "pk",
  {
    data_type         => "integer",
    is_nullable       => 0,
  },
...

Comparing it to the SQLite3 definition

__PACKAGE__->add_columns(
  "pk",
  {
    data_type         => "integer",
    is_auto_increment => 1,
    is_nullable       => 0,
  },
...

immediately revealed that the is_auto_increment attribute was missing for the same column.

DBIx::Class::Schema::Loader was not understanding the column definition, at least not as I was expecting.

Investigating the problem

I decided to create a simple table with the two possible column types, and dump the schema to see what happens.

With a PostgreSQL table defined as:

testdb=> \d foo
                             Table "public.foo"
 Column |  Type   | Collation | Nullable |             Default
--------+---------+-----------+----------+---------------------------------
 pk     | integer |           | not null | generated always as identity
 kp     | integer |           | not null | nextval('foo_kp_seq'::regclass)

the resulting output from dbicdump is:

__PACKAGE__->add_columns(
  "pk",
  {
    data_type         => "integer",
    is_nullable       => 0,
  },
  "kp",
  {
    data_type         => "integer",
    is_auto_increment => 1,
    is_nullable       => 0,
    sequence          => "foo_kp_seq",
  },
  ...

It is clear that dbicdump is able to understand serial columns, while it is not able to understand the default value of generated always as identity!

More investigation: `DBIx::Class::Schema::Loader::DBI::Pg`

I was puzzled by the problem, so I decided to dig into how DBIx::Class::Schema::Loader understands the definition of PostgreSQL table columns. It turned out that DBIx::Class::Schema::Loader::DBI::Pg is the specific driver behind how the loader interacts with PostgreSQL meta information.

In particular, the _columns_info_for function tries to get the metadata for every column of the table, and within it you can find a piece of code like the following:

 # process SERIAL columns
 if ( ${ $info->{default_value} } =~ /\bnextval\('([^:]+)'/i ) {
   $info->{is_auto_increment} = 1;
   $info->{sequence}          = $1;
   delete $info->{default_value};
 }

Despite the comment, it is clear that the branch only checks whether the default_value looks like a nextval() call. There are no other places in the method that handle the auto-increment flag and the sequence.
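
To see what that pattern does and does not catch, here is a small Python transcription of the same regular expression. It is only a sketch to play with the matching logic, not the loader's code: the function name sequence_from_default is hypothetical, and a column without any default is modelled here as None.

```python
import re
from typing import Optional

# Python transcription of the Perl pattern shown above.
SERIAL_DEFAULT = re.compile(r"\bnextval\('([^:]+)'", re.IGNORECASE)


def sequence_from_default(default_value: Optional[str]) -> Optional[str]:
    """Return the sequence name found in a column default, or None."""
    if default_value is None:
        return None
    match = SERIAL_DEFAULT.search(default_value)
    return match.group(1) if match else None


# A serial column exposes its sequence through the default expression...
print(sequence_from_default("nextval('foo_kp_seq'::regclass)"))
# ...while a column with no default expression gives the check nothing to match.
print(sequence_from_default(None))
```

Any column whose auto-increment behaviour is not expressed as a nextval() default is therefore invisible to this kind of test.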

But what is that $info hash? It comes from DBI::column_info.

More and more investigation: `DBI::column_info`

The DBI driver interface has a column_info method that provides a hash with a lot of useful information about the column definition.

It is really simple to write a dummy Perl program to dump the structure of the foo table presented before:

use v5.38;

use DBI;
use Data::Dumper;

my $db = DBI->connect( 'dbi:Pg:dbname=testdb;host=venkman;port=5432',
                       q/luca/,
                       q/XXXXXXXXXXX/ );

# testdb=> \d foo
#                             Table "public.foo"
#  Column |  Type   | Collation | Nullable |             Default
# --------+---------+-----------+----------+---------------------------------
#  pk     | integer |           | not null | generated always as identity
#  kp     | integer |           | not null | nextval('foo_kp_seq'::regclass)

# dump the metadata of the identity column
my $statement = $db->column_info( undef, q/public/, q/foo/, q/pk/ );
while ( my $row = $statement->fetchrow_hashref ) {
    say Dumper( $row );
}

say "==================================";

# dump the metadata of the serial column
$statement = $db->column_info( undef, q/public/, q/foo/, q/kp/ );
while ( my $row = $statement->fetchrow_hashref ) {
    say Dumper( $row );
}

The program produces the following (trimmed) output:

% perl ~/tmp/test.pl
$VAR1 = {
          'TYPE_NAME' => 'integer',
          'pg_schema' => 'public',
          'pg_type' => 'integer',
          'NULLABLE' => 0,
          'COLUMN_DEF' => undef,
          'IS_NULLABLE' => 'NO',
          'pg_column' => 'pk',
          'COLUMN_NAME' => 'pk',
          'pg_table' => 'foo',
          'TABLE_NAME' => 'foo',
          'TABLE_SCHEM' => 'public',
          'pg_constraint' => undef
		  ...
        };

==================================
$VAR1 = {
          'pg_column' => 'kp',
          'COLUMN_NAME' => 'kp',
          'DECIMAL_DIGITS' => undef,
          'pg_table' => 'foo',
          'TABLE_NAME' => 'foo',
          'TABLE_SCHEM' => 'public',
          'TYPE_NAME' => 'integer',
          'pg_type' => 'integer',
          'NULLABLE' => 0,
          'COLUMN_DEF' => 'nextval(\'foo_kp_seq\'::regclass)',
          ...
        };

The interesting part is COLUMN_DEF that defines the default value of the column: note how in the case of serial there is the nextval() call, while in the case of generated always as identity there is nothing. This is the problem.

Don’t blame `DBI`!

This is not a bug of DBI in a strict sense, so don’t blame the Perl Database Interface!

In fact, PostgreSQL is a little tricky about giving back information about generated always as identity columns:

testdb=> select a.attname, a.attidentity,
		(select pg_get_expr( d.adbin, d.adrelid, true )
		from pg_attrdef d
		where d.adrelid = a.attrelid
		and d.adnum = a.attnum )
		from pg_attribute a
		where a.attrelid = 'foo'::regclass
		and a.attname in ( 'pk', 'kp' );

 attname | attidentity |           pg_get_expr
---------+-------------+---------------------------------
 kp      |             | nextval('foo_kp_seq'::regclass)
 pk      | a           |
(2 rows)

As you can see, the only thing it is possible to extract is the fact that the column has been defined as an identity one (attidentity = a ).
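
An introspection tool could therefore combine the two pieces of information. The following Python sketch shows one possible rule, applied to the two rows returned by the query above. It is an illustration under my own assumptions, not how DBD::Pg or the schema loader work; the function name is hypothetical. In pg_attribute, attidentity is 'a' for generated always, 'd' for generated by default, and empty for a column that is not an identity one.

```python
import re
from typing import Optional

# Same nextval() pattern used for serial columns.
SERIAL_DEFAULT = re.compile(r"\bnextval\('([^:]+)'", re.IGNORECASE)


def is_auto_increment(attidentity: str, default_expr: Optional[str]) -> bool:
    """Guess whether a column is auto-increment (hypothetical rule)."""
    # 'a' = GENERATED ALWAYS, 'd' = GENERATED BY DEFAULT, '' = not identity
    if attidentity in ("a", "d"):
        return True
    # fall back to the serial-style default expression
    return bool(default_expr and SERIAL_DEFAULT.search(default_expr))


# Rows as returned by the catalog query above: (attname, attidentity, default)
rows = [
    ("kp", "", "nextval('foo_kp_seq'::regclass)"),
    ("pk", "a", None),
]

for attname, attidentity, default_expr in rows:
    print(attname, is_auto_increment(attidentity, default_expr))
```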

How does `DBD::Pg` find out the information about a column?

It turned out that DBD::Pg uses a very long query to extract the information about a column. I’ve opened a ticket on DBD::Pg.

Conclusions

I never thought about the possible difficulty in introspecting a generated always as identity column. While I’m still convinced that it is better to use such columns instead of serial, the introspective frameworks can gain more information from serial column definitions.

Learn PostgreSQL - second edition - Tech Bits

2024-01-27T00:00:00+00:00

Enrico and I talk about our latest book on Doug’s Tech Bits show!

Learn PostgreSQL - second edition - Tech Bits

I’m really glad that Enrico and I were hosted on the great Doug’s Tech Bits show. You can watch the podcast on YouTube.

I would like to thank Doug Ortiz for his excellent work.

Installing PL/Java on PostgreSQL 16 and Rocky Linux

2024-01-17T00:00:00+00:00

A short recap on some issues when dealing with PL/Java and Rocky Linux.

Installing PL/Java on PostgreSQL 16 and Rocky Linux

It has been a while since I last used PL/Java, and that’s mostly due to the fact that I (luckily) use much more Perl (and hence, PL/Perl) in my everyday activity than Java.

I decided to implement a few functionalities in Java, and so here comes another installation of PL/Java. Installing it on Rocky Linux has been a little tricky, so here is a short recap of what to do.

I wrote about PL/Java in my book PostgreSQL 11 Server Side Programming Quick Start Guide.

As usual within the PostgreSQL ecosystem, PL/Java has very rich documentation.

IMPORTANT (2024-04-22): this article contains a few mistakes that have been addressed in my other article, so please make sure to read that article too!

Is it worth it?

I had to answer this question over and over: is it worth using PL/Java for PostgreSQL triggers, functions, procedures and so on?

As usual in these cases, there is no single answer. First of all, using an external language like PL/Java means that PostgreSQL has to manage the round-trip of data between the database and the virtual machine, with the latter being fired up at first execution. In short: performance is good, but never as good as with native languages. Second, PL/Java brings all the complexity of a formally compiled language, therefore making changes to the code is not as simple as in scripting languages. Last, in my opinion, it does make sense if you need to bring some Java stuff into your scenario, either because it is a language you are really proficient in, or because you already have libraries and utilities that you don’t want to convert into a database-usable form.

Installing PL/Java on Rocky Linux

Thanks to the PGDG, the official PostgreSQL repositories include an already available PL/Java package. Therefore, installing PL/Java is as simple as:

% sudo dnf install pljava_16.x86_64

This makes the executable available, that is PostgreSQL will be able to run Java stuff within the database.

Problem during compilation of PL/Java

If you need to develop against the PL/Java API, you need not only the executable, but also the whole library, that is compiled via Apache Maven. During the compilation, I got a few problems, most notably a gssapi related one.

I dug a little more using the -X flag:

% mvn -X clean install
...

In file included from /home/luca/pljava-1_6_6/pljava-so/src/main/c/InstallHelper.c:21:
/usr/pgsql-16/include/server/libpq/libpq-be.h:32:10: fatal error: gssapi/gssapi.h: No such file or directory
   32 | #include <gssapi/gssapi.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.

In order to solve the problem, I had to install the Kerberos development package:

% sudo dnf install krb5-devel.x86_64

and relaunching mvn worked as expected.

Using PL/Java

In order to use PL/Java there could be the need to relax the JVM security constraints. I don’t recommend granting all permissions, but it is the quickest way to get PL/Java able to run. Edit the file /usr/lib/jvm/java/lib/security/default.policy and make sure the very last section appears as follows:

// permissions needed by applications using java.desktop module
grant {
 permission java.security.AllPermission;
 ...
}

Inform PostgreSQL and PL/Java about where the JVM is located

Before being able to use PL/Java there is the need to inform PostgreSQL about where the JVM is located (and hence, which one to use). This is achieved by a SET command:

testdb=# alter database testdb
     set
	 pljava.libjvm_location = '/usr/lib/jvm/java-11-openjdk-11.0.21.0.9-2.el9.x86_64/lib/server/libjvm.so';

and after this, it is possible to install PL/Java:

testdb=# create extension pljava;

Install a JAR

PL/Java, being Java, works on the concept of jar archives. The JAR needs to be installed into PostgreSQL in order for PL/Java to be able to run its code. Installing a jar means that you need to inform PL/Java and PostgreSQL about the jar location.

testdb=> select sqlj.install_jar( 'file:///tmp/proj-0.0.1-SNAPSHOT.jar',
                                  'fluca',
								  true );

The first parameter to install_jar is the URI of the jar, the second is a short name assigned to the jar, and the last indicates whether the deployment must be performed.

Set the classpath

Java has the notion of classpath and so does PL/Java. In order to use a function within an installed jar, there is the need to map the PostgreSQL schema to the Java classpath, in particular to the jar.

testdb=> select sqlj.set_classpath('public', 'fluca');

The jar named fluca will be added to the classpath of the public PostgreSQL schema, so that when you refer to a method in the public schema, PL/Java will search within the fluca jar.

Assuming the jar contains the classic Hello World function, the final result is something like:

testdb=> \sf hello
CREATE OR REPLACE FUNCTION public.hello(towhom character varying)
 RETURNS character varying
 LANGUAGE java
AS $function$java.lang.String=com.example.proj.Hello.hello(java.lang.String)$function$

which makes very clear that public.hello is mapped to Hello.hello in the Java space.

Where is my Java stuff?

PL/Java creates a schema sqlj that is used to handle both functions and tables that route stuff from PostgreSQL to Java and back.

In particular, sqlj.jar_repository contains an entry for every installed jar, so that you can for instance know where a jar is located:

testdb=> select jarid, jarname, jarorigin from sqlj.jar_repository;
 jarid | jarname |              jarorigin
-------+---------+-------------------------------------
     3 | fluca   | file:///tmp/proj-0.0.1-SNAPSHOT.jar

The table sqlj.classpath_entry shows how jars are mapped into PostgreSQL schemas:

testdb=> select r.jarname, r.jarorigin, c.schemaname
         from sqlj.jar_repository r join sqlj.classpath_entry c on c.jarid = r.jarid;
 jarname |              jarorigin              | schemaname
---------+-------------------------------------+------------
 fluca   | file:///tmp/proj-0.0.1-SNAPSHOT.jar | public

From the above it is possible to get the jar short name, the location of the jar on disk and the PostgreSQL schema the jar has been mapped to.

There are other interesting functions, like get_classpath, set_classpath and obviously remove_jar and replace_jar.

Conclusions

PL/Java is a very powerful tool and an interesting language to extend the already rich set of features that PostgreSQL provides.

Learn PostgreSQL (second edition): screencasts available!

2023-12-07T00:00:00+00:00

An example of how to run the Docker images provided by the Github repository.

Learn PostgreSQL (second edition): screencasts available!

One of the improvements we made while rewriting and updating the book was to introduce Docker images that the readers can launch as a safe environment to test the concepts expressed in the book. This has several advantages, most notably the fact that the user does not need to install a separate PostgreSQL instance on her own and fill it with data that could slightly change from chapter to chapter according to the different examples. Another advantage is that, in the case the user damages the data and wants to restore it, the container can be erased and a new one can be built from scratch.

In order to help users quickly access the PostgreSQL containers, sparing them from writing long and boring Docker commands, we built a simple shell script named run-pg-docker.sh that optionally accepts the name of the chapter image and drops the user into the container, logged in as the postgres user. Therefore, with just a simple command, the reader can jump into the PostgreSQL container and start running all the commands and examples detailed in the book!

And, in order to better demonstrate how to quickly jump in, there are a couple of asciinema screencasts to let the readers see how the container process is launched.

The first screencast shows how to run the so called standalone container, the catch-all container used whenever there is no need for a per-chapter specific container:

The second screencast, on the other hand, shows how to run a specific per-chapter container, in particular the Chapter 10 container (related to users, roles and permissions):

Please consider that the time required for the container to fire up depends on the speed of the Internet connection, of the host machine and on the already downloaded artifacts (i.e., re-launching a container for the second time will require less time).

Resources

The Github repository for downloading examples, Docker images (via docker-compose) and in general source files is available at this URL.

The Learn PostgreSQL second edition book can be found at this link.

Please consider that some output of the screencasts could be different from the one you get on your system, and that over time the configuration files for the Docker images could slightly change depending on readers’ suggestions and comments.

Learn PostgreSQL - second edition

2023-11-21T00:00:00+00:00

Another edition of our complete book is out there!

Learn PostgreSQL - second edition

Last Halloween, the second edition of our book Learn PostgreSQL was released!

I’m very proud of all the work that my friend Enrico (co-author) and I have done, not merely to update this revision of the book, which now covers PostgreSQL 16, but also to provide new content, examples and, most notably, a new approach to help readers understand the concepts expressed in the book.

In fact, with this new edition, readers will have access to a set of Docker containers that can be used to quickly fire up a PostgreSQL instance and get their hands on the examples and exercises!

Moreover, every chapter now has a Verify your knowledge ending section, made of questions and short answers to point the reader to the most important concepts of the chapter itself.

While the overall structure of the chapters has remained the same, we had the great chance to improve almost all the content in order to better explain concepts and terminology.

I strongly believe *this is not a simple *update of the book, rather it is a full upgrade! **

And after almost a month in the wild, the reviews for the book confirm my feelings{target=”_blank”}!

As always, me and Enrico will enjoy any feedback and errata that can help us improve, and other readers to get a better experience.

And this post cannot conclude without giving a very warm and special thanks to our technical reviewers Chris Mair and Silvio Trancanella, who helped us a lot improving the quality and readability of the book. I thank also all people at Packt that helped and assisted us during this work.

pgagroal: where is my configuration?

2023-11-13T00:00:00+00:00

A new command to display where the configuration files are located.

pgagroal: where is my configuration?

I implemented a new command in pgagroal, conf ls. The aim of the command is very simple: display where the configuration files are located. In fact, pgagroal configuration is split into several configuration files, and sometimes it could be useful to ask the running system where a configuration file is.

The command works as follows:

% pgagroal conf ls
Main Configuration file:   /etc/pgagroal/pgagroal.conf
HBA file:                  /etc/pgagroal/pgagroal_hba.conf
Limit file:                /etc/pgagroal/pgagroal_databases.conf
Frontend users file:       /etc/pgagroal/pgagroal_frontend_users.conf
Admins file:               /etc/pgagroal/pgagroal_admins.conf
Superuser file:
Users file:                /etc/pgagroal/pgagroal_users.conf

If a configuration file has not been specified, the corresponding value will be left empty; otherwise, the full path to the configuration file will be displayed.

This is another small addition towards a more consistent and useful command line interface.

pgagroal new commands: 'ping' and 'status details'

2023-10-30T00:00:00+00:00

Another little improvement to the interface for pgagroal

pgagroal new commands: ‘ping’ and ‘status details’

When I committed the major command refactoring in pgagroal-cli, I also introduced a simple way to deprecate a command, so that the user running the old version of a command is warned about switching to the new interface.

This led me to think that I could not only refactor pgagroal-cli commands in a more coherent way, grouping similar commands together, but also change existing commands by deprecating them.

That is what I did in this commit, where I replaced the is-alive command with ping and details with status details.

The `ping` command

I have to confess: the name was inspired by the MySQL Admin tool, which has a similar command.

The idea of ping is to test if the connection pooler is alive, and it replaces the old command is-alive:

$ pgagroal-cli ping --verbose
pgagroal-cli: Success (0)


$ pgagroal-cli is-alive --verbose
pgagroal-cli: command <is-alive> has been deprecated by <ping> since version 1.6
pgagroal-cli: Success (0)

Please note that, as documented, the ping command does not print anything if the pooler is running.

The `status details` command

The status command prints summary information about the pooler, while the details command prints the same summary plus more verbose and detailed information about every connection.

Why not group these two commands? This is the aim of having status details:

status will work as before;
status enhanced with details will provide more verbose output.

$ pgagroal-cli status
Status:              Running
Active connections:  0
Total connections:   0
Max connections:     15


$ pgagroal-cli status details
Status:              Running
Active connections:  0
Total connections:   0
Max connections:     15
---------------------
Server:              venkman
Host:                venkman
Port:                5432
State:               Not init
---------------------
---------------------
Server:              a
Host:                spengler
Port:                5432
State:               Not init
---------------------
---------------------
Server:              b
Host:                spengler
Port:                6432
State:               Not init
---------------------
---------------------
Database:            testdb
Username:            luca
Active connections:  0
Max connections:     2
Initial connections: 1
Min connections:     1
---------------------
---------------------
Database:            all
Username:            luca
Active connections:  0
Max connections:     10
Initial connections: 2
Min connections:     1
---------------------
---------------------
Database:            pgbench
Username:            pgbench
Active connections:  0
Max connections:     2
Initial connections: 1
Min connections:     1
---------------------
Connection    0:     Not init
Connection    1:     Not init
Connection    2:     Not init
Connection    3:     Not init
Connection    4:     Not init
Connection    5:     Not init
Connection    6:     Not init
Connection    7:     Not init
Connection    8:     Not init
Connection    9:     Not init
Connection   10:     Not init
Connection   11:     Not init
Connection   12:     Not init
Connection   13:     Not init
Connection   14:     Not init

Therefore, the total number of main commands in pgagroal-cli has now shrunk, since a few of them have been grouped.

Conclusions

While it may sound trivial, having a coherent and easy to understand command line interface is key to making the project approachable by mere mortals. That’s why I strongly believe the refactoring of the commands in pgagroal-cli is going to play a very important role in the connection pooler adoption.

Installing pgBackRest on Amazon Linux (by sources)

2023-10-23T00:00:00+00:00

A recap on how to compile pgBackRest on Amazon Linux.

Installing pgBackRest on Amazon Linux (by sources)

I had the need to install pgBackRest on Amazon Linux machines.

Unfortunately, even though Amazon Linux 2023 is a Red Hat-like operating system, I could not get any version of the official PGDG repository to install. Therefore, I decided to install from sources, compiling the latest 2.48 version.

In order to achieve the final result, I had to install the following packages:

$ sudo dnf install postgresql15-server-devel.x86_64
$ sudo dnf install libxml2-static.x86_64
$ sudo dnf install -y libxml2-devel.x86_64
$ sudo dnf install -y libyaml-devel.x86_64
$ sudo dnf install -y bzip2-devel.x86_64

After this, I was able to download and compile pgBackRest:

$ wget https://github.com/pgbackrest/pgbackrest/archive/refs/tags/release/2.48.tar.gz
$ tar xzf 2.48.tar.gz
$ cd pgbackrest-release-2.48/src
$ ./configure && make && sudo make install

I tested it, and it is as solid as only pgBackRest can be!

Using psql Variables to Introspect Your Script

2023-10-23T00:00:00+00:00

A little trick to monitor your own running transaction in terms of time and data size.

Using psql Variables to Introspect Your Script

psql is by far my favourite SQL text client: it has features that even the most expensive database tools do not provide. One very interesting property of psql is its support for internal variables, pretty much like the variables you can find in a shell.

Since I often find myself running some queries to get information about a transaction, in terms of time and quantity of data manipulated, and doing the math manually, I decided that psql can do this for me by means of variables.

The Use Case: Quantitative Data About a Transaction

I want to run a long transaction that does some data manipulation and transformation, and I want to get an idea of how much such a transaction is going to cost me, so that I can estimate how to apply the same transformation in production.

Usually, I begin the transaction by having a look at the current time and WAL position, and I do the same at the end of the transaction. Computing the difference between the values gives me a hint about the wall clock time and the amount of data (assuming no other activity is going on in the database). As an example:

testdb=> BEGIN;
BEGIN
testdb=*> SELECT clock_timestamp() AS begin_clock
testdb-*> , pg_current_wal_lsn() AS begin_lsn;
         begin_clock          | begin_lsn
------------------------------+------------
 2023-09-29 10:32:05.51654+02 | 2/A39CC3C0
(1 row)

testdb=*> INSERT INTO t( t )
testdb-*> SELECT 'Dummy ' || v
testdb-*> FROM generate_series( 1, 1000000 ) v;
INSERT 0 1000000
testdb=*> SELECT clock_timestamp() AS end_clock
, pg_current_wal_lsn() AS end_lsn;
           end_clock           |  end_lsn
-------------------------------+------------
 2023-09-29 10:32:48.511892+02 | 2/A81AC000
(1 row)

testdb=*> COMMIT;

Now that I have the times and WAL LSN positions, I can manually compute the cost of this transaction by copying and pasting the results:

testdb=> SELECT '2023-09-29 10:32:48.511892+02'::timestamp
         - '2023-09-29 10:32:05.51654+02'::timestamp AS wall_clock
        , pg_size_pretty( pg_wal_lsn_diff( '2/A81AC000', '2/A39CC3C0' ) ) as size;
   wall_clock    | size
-----------------+-------
 00:00:42.995352 | 72 MB

So the transaction took about 43 seconds and produced around 72 MB of data (in the WALs). Note that I had to manually copy and paste every single value for the query to compute the differences I want.
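To make the arithmetic behind pg_wal_lsn_diff more tangible, here is a minimal shell sketch that reproduces it for the two LSNs above. It assumes an LSN is written as two hexadecimal numbers separated by a slash, the first being the high 32 bits of the byte position; the function names lsn_to_bytes and lsn_diff are made up for this example.

```shell
# Convert an LSN written as HIGH/LOW (two hexadecimal numbers)
# into an absolute byte position.
lsn_to_bytes() (
    high=${1%%/*}   # part before the slash
    low=${1##*/}    # part after the slash
    echo $(( (0x$high << 32) + 0x$low ))
)

# Difference in bytes between two LSNs, in the same argument
# order used with pg_wal_lsn_diff above: end first, begin second.
lsn_diff() (
    echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
)

bytes=$(lsn_diff 2/A81AC000 2/A39CC3C0)
# Round to the nearest MB (1 MB = 1048576 bytes)
echo "$bytes bytes, about $(( (bytes + 524288) / 1048576 )) MB"
```

Running it prints 75365440 bytes, about 72 MB, which agrees with the size reported by the query.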

Using `psql` variables to obtain the computation automatically

If I store the begin and end values into psql variables, I can always use the very same query to compute the results, without having to copy and paste the single values.

This trick is made possible by the special command \gset, that allows for the declaration and definition of variables out of a query result.

testdb=> BEGIN;
BEGIN

testdb=*> SELECT clock_timestamp() AS clock
, pg_current_wal_lsn() AS lsn \gset begin_

testdb=*> INSERT INTO t( t )
SELECT 'Dummy ' || v
FROM generate_series( 1, 1000000 ) v;
INSERT 0 1000000

testdb=*> SELECT clock_timestamp() AS clock
, pg_current_wal_lsn() AS lsn \gset end_

testdb=*> SELECT :'end_clock'::timestamp - :'begin_clock'::timestamp as wall_clock
, pg_size_pretty( pg_wal_lsn_diff( :'end_lsn', :'begin_lsn' ) ) as size;
   wall_clock    | size
-----------------+-------
 00:00:11.400421 | 72 MB


testdb=*> COMMIT;
COMMIT

The two queries that get the timing and WAL LSN information are similar, and use a \gset begin_ and a \gset end_ command respectively. The first command takes the output of the query and, for each column, creates a variable with the given prefix (begin_) and the column name, therefore begin_clock and begin_lsn. The second query does the very same with the prefix end_, therefore creating the end_clock and end_lsn variables.

The interesting part is the last query, which is now totally automated and computes the differences between the end_ and begin_ values (please note the quoting and casting). Thanks to this little trick, I can now place such queries at the boundaries of my scripts and get as output the result I want or need to monitor the transaction.
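As a possible way to place those queries at the boundaries of a script without retyping them, the following shell sketch prints a psql script that wraps an existing SQL file between the two \gset queries. The function name wrap_with_introspection is hypothetical, and the sketch assumes the wrapped file does not open or close a transaction on its own.

```shell
# Print a psql script that runs the SQL file given as first argument
# inside a transaction, surrounded by the introspection queries.
wrap_with_introspection() {
    # Quoted here-documents: the shell must not touch \gset or :'variables'
    cat <<'SQL'
BEGIN;
SELECT clock_timestamp() AS clock
     , pg_current_wal_lsn() AS lsn \gset begin_
SQL
    cat "$1"
    cat <<'SQL'
SELECT clock_timestamp() AS clock
     , pg_current_wal_lsn() AS lsn \gset end_
SELECT :'end_clock'::timestamp - :'begin_clock'::timestamp AS wall_clock
     , pg_size_pretty( pg_wal_lsn_diff( :'end_lsn', :'begin_lsn' ) ) AS size;
COMMIT;
SQL
}
```

The output can then be piped into psql, for instance wrap_with_introspection my_script.sql | psql testdb, where my_script.sql is a placeholder for your own file.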

Clearly, this approach can be extended, so you can have variables to track the number of tuples, the number of tables created or deleted, and so on. The key idea is to have a kind of catch-all set of queries that depend on variables you will define systematically in your scripts.

Why is the second transaction faster than the first one?

In the above example I showed two identical transactions, but the first one is slower, in terms of execution time, than the second one. The answer is simple: in the first transaction I was literally typing in the SQL statements, while in the second I was recalling them from the psql history. It is only a matter of typing the statements!

Conclusions

When I do professional training and present the psql command line client, I see disappointment on my trainees’ faces. However, the more I go on explaining how flexible and powerful psql is, the more the classroom likes it. Thanks to the capability to automagically set variables from a query output, psql allows you to automate some tasks, including your own script introspection.

pgagroal command refactoring

2023-10-16T00:00:00+00:00

A new, cleaner, set of commands for pgagroal.

pgagroal command refactoring

It took me more than one year to get this patch in! The reason was not that this piece of code is particularly complex, but rather that it touches pretty much the whole human interface that pgagroal exposes to the user.

When I started using pgagroal I felt uncomfortable with its command line interface. Commands had been added as the project improved, but there was no clear grouping of related commands, and most of them had a weird meaning, at least to me.

As an example, the reset command was dealing with the Prometheus reset, while reset-server was truly dealing with the pgagroal reset. That sounded weird to me, since I believe pgagroal should deal first with itself, and then with other components, so the order of the commands appeared wrong to me.

Another example was the flush-xxx set of commands: flush-gracefully, flush-all and flush-idle. Why not group those commands into a big flush group and then pass what to actually flush as a parameter?

You probably get the point behind my rants. That’s why I started to develop this patch to refactor the command line of pgagroal-cli and pgagroal-admin to:

have command groups
provide more concise and sane defaults
handle commands and subcommands, like git and other command line oriented tools do.

Since this change would have broken the command line, I also decided to keep accepting the old commands and to emit warnings. Therefore I developed this patch with the option to parse the new set of commands as well as the old ones, printing out a warning if the user was still using the old ones. This makes backward compatibility easy, and pushes the user towards the new set of commands, so that the future removal of the old commands will not break external tools or scripts.

I’m not going to discuss the whole set of new commands, since the documentation already does this task. However, just to give you an idea of what the commands now look like, consider the flush commands:

# old commands
$ pgagroal-cli flush-all
$ pgagroal-cli flush-idle
$ pgagroal-cli flush-gracefully

# new commands
$ pgagroal-cli flush all
$ pgagroal-cli flush idle
$ pgagroal-cli flush gracefully

$ pgagroal-cli flush  # same as flush gracefully

As you can see, the new command is just flush, and it accepts a subcommand that can be one of idle, all or gracefully, used to specify the mode in which to execute the flush command. This also introduces a new default behavior: if the user does not specify how to flush, the graceful mode is automatically selected.
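The idea of a command group with a default subcommand can be sketched in a few lines of shell. This is only an illustration of the behavior described above, not the pgagroal-cli implementation (which is written in C), and the function name flush_mode is hypothetical.

```shell
# Hypothetical dispatcher for the flush command group:
# print the flush mode selected by the (optional) subcommand.
flush_mode() (
    mode=${1:-gracefully}   # no subcommand given: fall back to graceful mode
    case "$mode" in
        idle|all|gracefully)
            echo "$mode"
            ;;
        *)
            echo "unknown flush mode: $mode" >&2
            exit 1
            ;;
    esac
)

flush_mode          # prints "gracefully"
flush_mode idle     # prints "idle"
```

Keeping the default inside the group means that the bare flush command always does the least disruptive thing.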

If the user types in the old command, pgagroal-cli will emit a warning like the following:

$ pgagroal-cli flush-idle
WARN: command <flush-idle> has been deprecated by <flush idle> since version 1.6.0

In this way, we keep compatibility with previous versions while trying to teach the users the new way to execute a command.
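The same mechanism can be mimicked in user scripts that still contain the old commands. The following sketch translates a deprecated command into its replacement while warning on standard error; the function name translate_command is hypothetical, the table only lists the flush commands mentioned in this post, and the warning text merely imitates the one shown above.

```shell
# Hypothetical translation of old commands into new ones:
# warn on stderr, print the new command on stdout.
translate_command() (
    case "$1" in
        flush-all)        new="flush all" ;;
        flush-idle)       new="flush idle" ;;
        flush-gracefully) new="flush gracefully" ;;
        *)
            # not a deprecated command: pass it through untouched
            echo "$*"
            exit 0
            ;;
    esac
    echo "WARN: command <$1> has been deprecated by <$new> since version 1.6.0" >&2
    echo "$new"
)

translate_command flush-idle    # warns, then prints "flush idle"
```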

Sooner or later, old commands will be removed! Therefore users should start updating their tools and scripts to the new interface.

The conf set of commands is probably the one that groups the most subcommands. In fact, the conf command includes the set, get and reload subcommands, which were respectively config-set, config-get and reload. Note how the reload subcommand now makes it clearer what is going to be reloaded (i.e., the conf). In other words, in my opinion, pgagroal-cli conf reload is much clearer and less error prone than pgagroal-cli reload.

Some commands changed their name, and this was due to a clash in the default actions. For example, the reset-server and reset commands have been moved into the clear group, as clear server (the default) and clear prometheus respectively.

Similarly, also pgagroal-admin has been updated, so that for instance pgagroal-admin add-user is now pgagroal-admin user add.

Thanks to the very well structured pgagroal source code, and to the introduction of a few new utility functions to handle command line arguments, these changes will allow the introduction of new commands and groups with a more consistent command line experience!

pgenv version 1.3.3 released

2023-09-28T00:00:00+00:00

A new release for the PostgreSQL binary manager.

pgenv version 1.3.3 released

pgenv release 1.3.3 is now available.

This release introduces two main environment variables to instruct the application about configuration files.

The first variable is PGENV_CONFIGURATION_FILE: this variable can be set to force pgenv to use a custom configuration file, without having to guess which file to use depending on the specific PostgreSQL version in use. By default, pgenv looks for a configuration file named after the PostgreSQL version or, if not found, for a default.conf configuration file. Using the above variable, it is now possible to tell pgenv where a configuration file is, and this allows the same configuration file to be used over and over, regardless of the PostgreSQL version.

% export PGENV_CONFIGURATION_FILE=~/git/dot-files/pgenv/luca.conf

% pgenv rebuild 16.0
Using PGENV_ROOT /home/luca/git/pgenv
[DEBUG] Configuration file forced by environment variable PGENV_CONFIGURATION_FILE = /home/luca/git/dot-files/pgenv/luca.conf
[DEBUG] Configuration file forced by environment variable PGENV_CONFIGURATION_FILE = /home/luca/git/dot-files/pgenv/luca.conf
[DEBUG] Looking for configuration in /home/luca/git/dot-files/pgenv/luca.conf

...

As you can see, pgenv will now use the specified configuration file.
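The lookup order described above can be summarized with a small shell sketch. It is not taken from the pgenv source code: the function name find_configuration_file is hypothetical, and the version-specific file name (the version followed by .conf, inside the config directory under PGENV_ROOT) is an assumption made for illustration.

```shell
# Hypothetical lookup of the configuration file for a PostgreSQL version.
# $1 is the version, e.g. 16.0; PGENV_ROOT is expected to be set.
find_configuration_file() (
    version=$1
    config_dir="$PGENV_ROOT/config"
    if [ -n "${PGENV_CONFIGURATION_FILE:-}" ]; then
        echo "$PGENV_CONFIGURATION_FILE"       # forced by the environment
    elif [ -f "$config_dir/$version.conf" ]; then
        echo "$config_dir/$version.conf"       # version-specific file (assumed name)
    else
        echo "$config_dir/default.conf"        # catch-all file
    fi
)
```

With PGENV_CONFIGURATION_FILE set, the function always prints that path, whatever version is passed.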

The other variable added is PGENV_WRITE_CONFIGURATION_FILE_AUTOMATICALLY, which, if set to a false value (e.g., 0, no), prevents pgenv from writing or overwriting a configuration file once a build or rebuild is completed. The normal behavior is to let pgenv write or overwrite the configuration file if this variable is not set at all or is set to a true value (e.g., 1, yes), and this is the behavior of previous releases. From now on, if you set this variable to a false value, pgenv will not create (nor overwrite) a configuration file at the end of a build phase.

% export PGENV_WRITE_CONFIGURATION_FILE_AUTOMATICALLY=no
% pgenv rebuild 16.0
...
[DEBUG] Not writing config file automatically: set `PGENV_WRITE_CONFIGURATION_FILE_AUTOMATICALLY` to a true value to enable the automatic file writing
PostgreSQL 16.0 built

As you can see from the above, pgenv reports that it is not writing the configuration file for this build. Thanks to this option, you can be sure your carefully crafted configuration file will never be overwritten accidentally (please note that pgenv always makes a backup copy before overwriting an existing file).

Last, the new subcommand config path has been added: the idea is to show the user where pgenv expects to find a configuration file, and which one it is going to use when no custom changes are in place.
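A boolean environment variable of this kind is usually checked with a simple case statement. The sketch below is not the pgenv code: the function name should_write_configuration is hypothetical, and only the values named in this post (0 and no) are treated as false, while any other value, or no value at all, is assumed to mean true.

```shell
# Hypothetical check: succeed (exit status 0) when the configuration
# file should be written, fail when the user asked not to write it.
should_write_configuration() {
    case "${PGENV_WRITE_CONFIGURATION_FILE_AUTOMATICALLY:-yes}" in
        0|no) return 1 ;;   # false value: leave the configuration file alone
        *)    return 0 ;;   # unset or true value: keep the previous behavior
    esac
}

if should_write_configuration; then
    echo "writing the configuration file"
else
    echo "not writing the configuration file"
fi
```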

% pgenv config path
Using PGENV_ROOT /home/luca/git/pgenv
/home/luca/git/pgenv/config/default.conf

Functions to Validate User's Input

2023-09-25T00:00:00+00:00

PostgreSQL 16 introduces a couple of functions to validate user’s input.

Functions to Validate User’s Input

PostgreSQL 16 introduces a couple of new built-in functions: [pg_input_is_valid](https://www.postgresql.org/docs/16/functions-info.html#FUNCTIONS-INFO-VALIDITY-TABLE) and [pg_input_error_info](https://www.postgresql.org/docs/16/functions-info.html#FUNCTIONS-INFO-VALIDITY-TABLE).

Both functions accept two strings, the first one being the value to be validated, and the second one being the type to which you want to cast the value. This can be useful because you can check ahead of time whether a given value (expressed as a string) can be converted into a specific data type without raising an exception.

The first use case that comes to my mind is the conversion of some stringified date into an actual date, for example when importing data from an external source like a text file. Let’s see this in action:

testdb=> select * from pg_input_is_valid( '1978-07-19', 'timestamp' );
 pg_input_is_valid
-------------------
 t
(1 row)

testdb=> select * from pg_input_error_info( '1978-07-19', 'timestamp' );
 message | detail | hint | sql_error_code
---------+--------+------+----------------
         |        |      |
(1 row)

With a valid date, the pg_input_is_valid function returns true and pg_input_error_info returns a single row made only of NULL values. But what happens if the date is in a wrong format?

testdb=> \x
Expanded display is on.
testdb=> select * from pg_input_is_valid( '1978-19-07', 'timestamp' );
-[ RECORD 1 ]-----+--
pg_input_is_valid | f

testdb=> select * from pg_input_error_info( '1978-19-07', 'timestamp' );
-[ RECORD 1 ]--+--------------------------------------------------
message        | date/time field value out of range: "1978-19-07"
detail         |
hint           | Perhaps you need a different "datestyle" setting.
sql_error_code | 22008

As you can see from the above example, a date/time in the wrong format is reported as invalid without raising any error, and thanks to these functions we can now discover what the problem is before using the value.

Another example, just to clarify more:

testdb=> select pg_input_error_info( '4 months', 'interval' );
-[ RECORD 1 ]-------+------
pg_input_error_info | (,,,)

testdb=> select pg_input_error_info( '4 mesi', 'interval' );
-[ RECORD 1 ]-------+---------------------------------------------------------------
pg_input_error_info | ("invalid input syntax for type interval: ""4 mesi""",,,22007)

It is therefore quite easy to use such checks in your own functions:

testdb=> CREATE OR REPLACE FUNCTION input_check( t text[] ) RETURNS int
AS $CODE$
DECLARE
  current text; ok int := 0;  e text;
BEGIN
  FOREACH current IN ARRAY t LOOP
    IF pg_input_is_valid( current, 'date' ) THEN
       ok := ok + 1;
    ELSE
       SELECT message
       INTO e
       FROM pg_input_error_info( current, 'date' );
       RAISE DEBUG 'Skipping [%] because is not valid: %', current, e;
   END IF;
  END LOOP;

  RETURN ok;
END
$CODE$
LANGUAGE plpgsql;
CREATE FUNCTION

that, once invoked with the following input, provides the result as shown below:

testdb=> select input_check( array[ '2023-09-25', 'luca', '0001-01-01', 'Sat 23 Sep 2023', 'Feb 30 2023' ] );
DEBUG:  Skipping [luca] because is not valid: invalid input syntax for type date: "luca"
DEBUG:  Skipping [Feb 30 2023] because is not valid: date/time field value out of range: "Feb 30 2023"
 input_check
-------------
           3
(1 row)

psql \watch improvements

2023-09-22T00:00:00+00:00

A nice addition to the command \watch in the PostgreSQL command line client.

psql \watch improvements

psql is the best command line SQL client ever, and it gets improved constantly. With the new release of PostgreSQL 16, psql also gets a nice new addition: the capability to stop a \watch command loop after a specific number of iterations.

In this article I briefly show how the new feature works.

What is `\watch`?

The special command \watch is similar to the Unix command line utility watch(1): it repeats a specific command (in this case, an SQL statement) at regular time intervals.
I tend to use this, for example, when I want to monitor some progress or some catalogs: I write the query that will produce the result I want to observe, and then use \watch to schedule regular repetitions of the query. For instance:

testdb=# SELECT * FROM pg_stat_progress_cluster;
...

testdb=# \watch 5

The above example shows me what a running CLUSTER or VACUUM FULL is doing, refreshing every 5 seconds.

One problem of the \watch command is that it loops forever, meaning you need to manually stop it (e.g., CTRL-c). Another approach is to raise an exception when the query has to stop. As a nasty example:

testdb=# WITH exit(x) AS ( SELECT count(*) FROM pg_stat_progress_cluster )
	   SELECT p.*
	   FROM pg_stat_progress_cluster p, exit e
	   WHERE 1 / e.x > 0
	   ;

The above query will raise a division by zero as soon as there are no more entries in the pg_stat_progress_cluster view, and this will in turn stop the \watch command:

testdb=# \watch 1
                                                                          Wed 20 Sep 2023 09:08:18 PM CEST (every 1s)

  pid  | datid | datname | relid |   command   |      phase       | cluster_index_relid | heap_tuples_scanned | heap_tuples_written | heap_blks_total | heap_blks_scanned | index_rebuild_count
-------+-------+---------+-------+-------------+------------------+---------------------+---------------------+---------------------+-----------------+-------------------+---------------------
 78406 | 16385 | testdb  |  2612 | VACUUM FULL | rebuilding index |                   0 |                   6 |                   6 |               1 |                 1 |                   0
(1 row)

                                                                              Wed 20 Sep 2023 09:08:19 PM CEST (every 1s)

  pid  | datid | datname | relid |   command   |          phase           | cluster_index_relid | heap_tuples_scanned | heap_tuples_written | heap_blks_total | heap_blks_scanned | index_rebuild_count
-------+-------+---------+-------+-------------+--------------------------+---------------------+---------------------+---------------------+-----------------+-------------------+---------------------
 78406 | 16385 | testdb  |  3603 | VACUUM FULL | performing final cleanup |                   0 |                 551 |                 551 |               3 |                 3 |                   1
(1 row)

ERROR:  division by zero

While the above approach is, in my opinion, ugly, it serves to stop \watch at some point in the future, so it could be useful to collect historical information without having a bunch of empty executions.

The new `\watch` count option

Starting from psql version 16, the \watch command has an option to indicate after how many iterations it has to stop by itself. The online help of the command is now as follows:

testdb=> \?
General
...
  \watch [[i=]SEC] [c=N] execute query every SEC seconds, up to N times

So it is now possible to ask \watch to repeat the statement, for instance, 7 times, once every 2 seconds:

testdb=> \watch i=2 c=7
...

Note that, if you want to specify the iteration count, you have to use the named parameter c, or the command will not understand your intention. The interval parameter, on the other hand, does not need to be named. To summarize:

\watch 2 is valid and will repeat the command every 2 seconds without stopping;
\watch 2 c=7 is valid and will repeat the command every 2 seconds, stopping after 7 iterations;
\watch i=2 c=7 ditto;
\watch c=7 i=2 ditto;
\watch 2 7 is not valid because both numbers will be considered as an interval and psql will abort with \watch: interval value is specified more than once.
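The rules in the list above can be condensed into a short shell sketch. It only illustrates the accepted combinations and is not how psql implements the parsing; the function name parse_watch_args is hypothetical, and the error message is the one quoted above.

```shell
# Illustrative parser for the arguments of \watch:
# a bare number or i=SEC sets the interval, c=N sets the iteration count.
parse_watch_args() (
    interval=""
    count=""
    for arg in "$@"; do
        case "$arg" in
            c=*)
                count=${arg#c=}
                ;;
            *)
                # anything else is an interval, named (i=SEC) or not
                if [ -n "$interval" ]; then
                    printf '%s\n' '\watch: interval value is specified more than once' >&2
                    exit 1
                fi
                interval=${arg#i=}
                ;;
        esac
    done
    echo "interval=${interval:-default} count=${count:-unlimited}"
)

parse_watch_args 2 c=7     # prints "interval=2 count=7"
parse_watch_args c=7 i=2   # prints "interval=2 count=7"
```

Calling parse_watch_args 2 7 fails because the second bare number is taken as another interval, exactly as in the last item of the list.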

Conclusions

psql is a command line client with a lot of features that can help in interacting with PostgreSQL, and it gets improved release after release. In my experience, there is no other command line client with as many features as psql, and even this small addition to \watch makes it more valuable.

FOR loops automatically declared variables in PL/PgSQL

2023-09-19T00:00:00+00:00

In PL/PgSQL FOR loops the iterator is automatically declared, and this could bring some problems.

FOR loops automatically declared variables in PL/PgSQL

Consider the following simple function that returns a table made of three columns:

CREATE OR REPLACE FUNCTION
public.a_table()
RETURNS TABLE( i int, j int, k int )
AS $CODE$

BEGIN
	FOR i IN 1 .. 2 LOOP
	    FOR j IN 1 .. 2 LOOP
	    	FOR k IN 1 .. 2 LOOP
		    RAISE INFO 'i=%, j=%, k=%', i, j, k;
		    RETURN NEXT;
		END LOOP;
	    END LOOP;
	END LOOP;
END

$CODE$
LANGUAGE plpgsql
VOLATILE
;

What is the result of invoking the above function?
Depending on how well you know the FOR loop in PL/PgSQL, it could be surprising:

testdb=> select * from a_table();
INFO:  i=1, j=1, k=1
INFO:  i=1, j=1, k=2
INFO:  i=1, j=2, k=1
INFO:  i=1, j=2, k=2
INFO:  i=2, j=1, k=1
INFO:  i=2, j=1, k=2
INFO:  i=2, j=2, k=1
INFO:  i=2, j=2, k=2
 i | j | k
---+---+---
   |   |
   |   |
   |   |
   |   |
   |   |
   |   |
   |   |
   |   |
(8 rows)

Why is the result set empty even if the variables have values?
Because the FOR iterator is automatically declared and scoped to the loop itself. The PostgreSQL Documentation explains it:

The variable name is automatically defined as type integer and exists only inside the loop (any existing definition of the variable name is ignored within the loop).

It should be clear that I’m referring to the integer FOR loop variant here. However, the problem is that while i, j and k are defined as variables for the function (the returning columns), the FOR loops create variables with the same names but an inner scope, so that it is not possible to refer to the returning columns.

Please note that the problem is not caught even with the warnings about shadowed variables:

testdb=> SET plpgsql.extra_warnings TO 'shadowed_variables';
SET
testdb=> select * from a_table();
INFO:  i=1, j=1, k=1
INFO:  i=1, j=1, k=2
INFO:  i=1, j=2, k=1
INFO:  i=1, j=2, k=2
INFO:  i=2, j=1, k=1
INFO:  i=2, j=1, k=2
INFO:  i=2, j=2, k=1
INFO:  i=2, j=2, k=2
 i | j | k
---+---+---
   |   |
   |   |
   |   |
   |   |
   |   |
   |   |
   |   |
   |   |
(8 rows)

Therefore, so far, the only solution is to choose the names of the iterators appropriately, and of course to set the returning variables accordingly:

CREATE OR REPLACE FUNCTION
public.a_table()
RETURNS TABLE( i int, j int, k int )
AS $CODE$

BEGIN
	FOR ii IN 1 .. 2 LOOP
	    FOR jj IN 1 .. 2 LOOP
	    	FOR kk IN 1 .. 2 LOOP
		    i := ii;
		    j := jj;
		    k := kk;
		    RAISE INFO 'i=%, j=%, k=%', i, j, k;
		    RETURN NEXT;
		END LOOP;
	    END LOOP;
	END LOOP;
END

$CODE$
LANGUAGE plpgsql
VOLATILE
;

The above in fact results in what you probably are expecting:

testdb=> select * from a_table();
INFO:  i=1, j=1, k=1
INFO:  i=1, j=1, k=2
INFO:  i=1, j=2, k=1
INFO:  i=1, j=2, k=2
INFO:  i=2, j=1, k=1
INFO:  i=2, j=1, k=2
INFO:  i=2, j=2, k=1
INFO:  i=2, j=2, k=2
 i | j | k
---+---+---
 1 | 1 | 1
 1 | 1 | 2
 1 | 2 | 1
 1 | 2 | 2
 2 | 1 | 1
 2 | 1 | 2
 2 | 2 | 1
 2 | 2 | 2

Using `plpgsql_check` as a possible help

This is a post update thanks to the comment of Pavel Stěhule on 2023-09-20.

The plpgsql_check extension could help in finding the problem described above. Covering plpgsql_check here is out of scope; however, this is how the extension can provide some help:

testdb=# CREATE EXTENSION plpgsql_check;
CREATE EXTENSION

testdb=# SELECT message, level FROM plpgsql_check_function_tb( 'a_table()' );
           message           |     level
-----------------------------+---------------
 unmodified OUT variable "i" | warning extra
 unmodified OUT variable "j" | warning extra
 unmodified OUT variable "k" | warning extra
(3 rows)

As you can see, the check does not understand the actual problem, namely that the variables are all masked by the scope defined in the FOR loops, but at least it reveals that the output variables are never modified within the function code. Knowing that such variables have not been modified means that the function is probably not achieving what it is meant to, and that should trigger some extra checks by the developers.

Using Emacs and YASnippet to quickly write PostgreSQL functions

2023-09-08T00:00:00+00:00

How a simple snippet can allow you to save time and improve your PostgreSQL code quality.

Using Emacs and YASnippet to quickly write PostgreSQL functions

I love Emacs, and I also love PostgreSQL.
Whenever I have to write PostgreSQL code, I use Emacs.
Emacs can help me improve code quality, for example when writing PostgreSQL functions. I use YASnippet as a package to provide the basic template for a PostgreSQL function.

A PostgreSQL Function Template (in action)

Before explaining the concept, let’s see a couple of short videos that demonstrate my snippet in action:

A PostgreSQL Function Template (the code)

The code for the template is the following one (I may change some bits here and there as time goes by):

# -*- mode: snippet -*-
# name: PostgreSQL Function
# key: function
# --
--
-- Function ${1:function_name}
-- Schema   ${2:public}
--
-- Description:
-- $3
--
-- Return Type: ${4:VOID}
--
CREATE ${5:OR REPLACE} FUNCTION
$2.$1($6)
RETURNS $4
AS $CODE$

$0

$CODE$
LANGUAGE ${8:plpgsql}
VOLATILE
;

The preamble is used by Emacs to understand the template name and the key that triggers it. What follows is SQL code that works as a template for a function. Every $n placeholder is a tab stop that can be used to place the cursor within the text. For example, ${1:function_name} is the first (1) tab stop, which presents the default text function_name that is overwritten as soon as I type something. The name of the function is then automatically replicated in the other $1 placeholder.
Note how I begin with the documentation first, and then jump to the function code. This is a very important added value: by writing the documentation first I ensure every piece of code has at least some documentation and, thanks to the placeholders, what I write in the documentation is used to name the function and its return type.
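To make the placeholder mechanics more concrete, here is a sketch of what the template could expand to once every tab stop has been filled in. The function name add_one, its argument, the description, and the body are hypothetical values chosen only for illustration; they are not part of the snippet itself.

```sql
--
-- Function add_one
-- Schema   public
--
-- Description:
-- Returns the given integer incremented by one.
--
-- Return Type: int
--
CREATE OR REPLACE FUNCTION
public.add_one( i int )
RETURNS int
AS $CODE$
BEGIN
    -- hypothetical body, typed at the $0 exit point of the snippet
    RETURN i + 1;
END
$CODE$
LANGUAGE plpgsql
VOLATILE
;
```

Typing the name and the return type once, in the comment header, is enough to have them mirrored in the CREATE FUNCTION statement.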

Conclusions

Emacs and YASnippet can be a very powerful aid when writing PostgreSQL code. While this post focuses on functions, it is possible to provide templates for other kinds of code as well, like procedures, triggers, and so on.

Using custom variables as per-session global variables

2023-08-24T00:00:00+00:00

A possible trick to emulate per-session global variables.

Using custom variables as per-session global variables

In a thread on the Italian mailing list we were discussing session global variables, something I believe is a bad idea no matter what problem you are trying to solve; a more database-oriented approach (e.g., temporary tables) could probably solve it.

One thing I did not know, and I discovered thanks to the above discussion (credits to Andrea Adami) is that PostgreSQL allows the definition of custom variables by means of SET. Well, SET is of course the way to configure a GUC, that is a configuration parameter of the cluster. As you probably know, all GUCs that have a name without a namespace are cluster-wide, while those with a prefix belong to an extension.

Since PostgreSQL does not know in advance if an extension has been loaded or not, and since extensions can be loaded at run-time, the cluster allows the user to set parameters that contain a prefix in the name. Documentation can be found here. Therefore, it is possible to use SET to define a fake GUC variable to be used in queries and functions.

As an example:

testdb=> SET fluca1978.favourite_database TO 'PostgreSQL';
SET
testdb=> SHOW fluca1978.favourite_database;
 fluca1978.favourite_database
------------------------------
 PostgreSQL
(1 row)
testdb=> SELECT 'Luca loves ' || current_setting( 'fluca1978.favourite_database' );
       ?column?
-----------------------
 Luca loves PostgreSQL
(1 row)

The variable behaves like a user-context parameter, and also honors transaction boundaries:

testdb=> SELECT 'Luca loves ' || current_setting( 'fluca1978.favourite_database' );
       ?column?
-----------------------
 Luca loves PostgreSQL
(1 row)

testdb=> BEGIN;
BEGIN
testdb=*> SET fluca1978.favourite_database TO 'Oracle';
SET
testdb=*> SELECT 'Luca loves ' || current_setting( 'fluca1978.favourite_database' );
     ?column?
-------------------
 Luca loves Oracle
(1 row)

-- argh!
-- rollback!

testdb=*> ROLLBACK;
ROLLBACK
testdb=> SELECT 'Luca loves ' || current_setting( 'fluca1978.favourite_database' );
       ?column?
-----------------------
 Luca loves PostgreSQL
(1 row)

Clearly, this kind of variable is session-scoped and cannot be shared among different sessions:

testdb=> SELECT pg_backend_pid(), current_setting( 'fluca1978.favourite_database' );
 pg_backend_pid | current_setting
----------------+-----------------
            857 | PostgreSQL
(1 row)


-- in another session

testdb=> SELECT pg_backend_pid(), current_setting( 'fluca1978.favourite_database' );
ERROR:  unrecognized configuration parameter "fluca1978.favourite_database"
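The error raised in the other session can be avoided, and the variable can also be scoped to a single transaction, by means of the built-in functions current_setting and set_config. The following is only a sketch, assuming a reasonably recent PostgreSQL version where current_setting accepts the optional missing_ok argument; the variable name is the same one used above.

```sql
-- the second argument (missing_ok) set to true asks for a NULL
-- instead of an error when the parameter has never been defined
SELECT current_setting( 'fluca1978.favourite_database', true ) AS maybe_value;

BEGIN;
-- the third argument (is_local) set to true makes the value
-- last only until the end of the current transaction,
-- similarly to what SET LOCAL does
SELECT set_config( 'fluca1978.favourite_database', 'PostgreSQL', true );
SELECT current_setting( 'fluca1978.favourite_database' );
COMMIT;
```

Using set_config instead of SET also has the advantage that the value can be computed by a query, since it is an ordinary function call.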

Conclusions

I don’t recommend this usage of dynamic session-scoped variables, since a temporary table is usually a better idea and provides pretty much the same solution. It is however interesting to know that PostgreSQL has this behavior. Clearly, it is important to avoid clashes between your variable names and the GUCs really defined by an extension (a risk that you don’t run with temporary tables).

A Possible Way to Implement a Shift Function in PL/PgSql (part 2)

2023-08-03T00:00:00+00:00

Creating a shift-like function for manipulating arrays in PL/PgSQL.

A Possible Way to Implement a Shift Function in PL/PgSql (part 2)

After my post about how to implement a shift-like operation in PostgreSQL I got some comments and suggestions, most notably a pure SQL implementation provided by Stefan Stefanov, to whom the credit for the solution belongs, and which I explain in this (second) article on the subject.
In the following you will find Stefan Stefanov’s solution, a PL/Perl implementation I made in the meantime, and a little benchmarking to see how all the approaches compare to each other.

A pure SQL Implementation (credits to Stefan Stefanov)

The following is the function proposed by Stefan Stefanov:

CREATE OR REPLACE FUNCTION array_shift(arr anyarray, loops integer DEFAULT 1)
 RETURNS TABLE(head anyelement, tail anyarray)
 LANGUAGE sql
AS $function$

   with arr_tbl(el, arr_index) as (
       select * from unnest(arr) with ordinality
   )
   select (select el from arr_tbl where arr_index = loops),
          (select array_agg(el order by arr_index)
                  from arr_tbl where arr_index > loops);
$function$;

As you can see, this is a very clever approach that exploits only SELECT statements to get the final result. The arr_tbl CTE explodes the array by means of the PostgreSQL built-in unnest function, and returns the array as a table with the ordinality, that is an automatically added column that works as a row number. The output of the CTE is similar to the following one:

testdb=> select * from unnest( array['alfa','beta', 'gamma' ] ) with ordinality;
 unnest | ordinality 
--------+------------
 alfa   |          1
 beta   |          2
 gamma  |          3
(3 rows)
	

The main SELECT returns two columns, each one computed by a subquery. The first subquery extracts the last element removed by the shift operation, that is the one with the ordinality (i.e., row number) equal to the number of loops. Assuming loops = 2, it extracts the beta value from the above table. This is what I called the head in my functions.
The other subquery extracts the elements with an ordinality greater than the number of shifts, that is all the remaining elements, and then re-aggregates them into an array by means of the PostgreSQL built-in array_agg function.

The beauty of this idea is that everything is built on top of queries, that is the array is transformed into a table and then back into an array, but all the computation is done as cascading SELECT.
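To see the two subqueries at work without creating the function, the same query can be run inline. This is just a sketch of the technique with a literal array and a hard-coded number of shifts (2), both chosen for illustration:

```sql
WITH arr_tbl( el, arr_index ) AS (
    -- explode the array into rows, numbering each element
    SELECT * FROM unnest( array[ 'alfa', 'beta', 'gamma' ] ) WITH ORDINALITY
)
SELECT -- the element at position 2 is the last one shifted out
       ( SELECT el FROM arr_tbl WHERE arr_index = 2 ) AS head,
       -- everything after position 2 is rebuilt as an array
       ( SELECT array_agg( el ORDER BY arr_index )
         FROM arr_tbl WHERE arr_index > 2 ) AS tail;
```

With the three-element array above, head is beta and tail is {gamma}.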

A PL/Perl implementation

Since Perl comes with a natural shift operator, why not use it as a wrapper to shift a PostgreSQL array?
The only drawback of this approach is that PL/Perl does not allow passing an anyarray argument to a function, so there is the need to make a type-specific implementation:

CREATE OR REPLACE FUNCTION
shift_plperl(  text[],
                int default 1 )
RETURNS TABLE( head text, tail text[] )
AS $CODE$
   my ( $array, $loops ) = @_;
   my ( $head );

   $head = shift $array->@* for ( 1 .. $loops );
   return_next( { head => $head, tail => $array } );
   return undef;
$CODE$
LANGUAGE plperl;

The tests

The tests have been done, as in the previous post, with a code block similar to the following:

testdb=> DO LANGUAGE plpgsql
$CODE$
DECLARE
        a text[];
        ts_begin timestamp;
        ts_end   timestamp;
        iter     int;
        i        int;
BEGIN

        iter := 7000;

        -- initialize the array
        ts_begin := clock_timestamp();
        SELECT '{' || string_agg( v::text, ',' ) || '}'
        INTO a
        FROM generate_series( 1, iter / 2 + 5 ) v;
        ts_end := clock_timestamp();

        RAISE INFO 'Array allocation = %', ( ts_end - ts_begin );

        -- time the iterative shift function
        ts_begin := clock_timestamp();
        FOR i IN 1 .. iter LOOP
            PERFORM shift( a, iter / 2 );
        END LOOP;
        ts_end := clock_timestamp();

        RAISE INFO 'Using shift for % iterations over % elements = %',
                   iter,
                   array_length( a, 1 ),
                   ( ts_end - ts_begin );

        ts_begin := clock_timestamp();
        FOR i IN 1 .. iter LOOP
            PERFORM shiftx( a, iter / 2 );
        END LOOP;
        ts_end := clock_timestamp();

        RAISE INFO 'Using shiftx for % iterations over % elements = %',
                   iter,
                   array_length( a, 1 ),
                   ( ts_end - ts_begin );



        ts_begin := clock_timestamp();
        for i in 1 .. iter loop
                perform array_shift( a, iter / 2 );
        end loop;
        ts_end := clock_timestamp();

        RAISE INFO 'Using array_shift for % iterations over % elements = %',
                   iter, array_length( a, 1 ),
                        ( ts_end - ts_begin );

        ts_begin := clock_timestamp();
        for i in 1 .. iter loop
                perform shift_plperl( a, iter / 2 );
        end loop;
        ts_end := clock_timestamp();

        RAISE INFO 'Using shift_plperl for % iterations over % elements = %',
                   iter, array_length( a, 1 ),
                        ( ts_end - ts_begin );

END
$CODE$;

where changing the iter variable makes the code run more shifts against the same array.

In the following table, I show some results obtained on the same tiny crappy virtual machine. Please consider that the functions used are:

shift is a PL/PgSQL iteration-based approach where, at each iteration, the leftmost element of the array is removed;
shiftx is a PL/PgSQL approach that slices the array;
array_shift is the SQL function that executes the single query proposed by Stefan Stefanov;
shift_plperl is a PL/Perl function that exploits the Perl shift operator.

Iterations	shifts	`shift`	`shiftx`	`array_shift`	`shift_plperl`
2000	1000	`23.03905` secs	`00.007952` secs	`00.518305` secs	`00.604843` secs
5000	2500	`05:44.236687` mins	`00.020937` secs	`03.039885` secs	`03.672881` secs
7000	3500	`15:23.3999` mins	`00.033396` secs	`05.97662` secs	`07.132485` secs
8000	4000	`23:02.73513` mins	`00.044517` secs	`07.445447` secs	`10.236968` secs
10000	5000	`44:46.496029` mins	`00.048962` secs	`12.211704` secs	`15.091066` secs
12000	6000	`01:16:53.758594` hours	`00.060169` secs	`17.198828` secs	`20.911864` secs

Long story short: the PL/PgSQL iteration-based approach (shift) is by far the slowest, while the array-slice approach (shiftx) is the fastest one. The query-only and PL/Perl approaches are comparable, with the former being a little faster than the latter, probably because PL/Perl needs to marshal the arguments in and out of the function.

Clearly, the above is not a complete benchmarking, and has not been executed multiple times to get average results. However, the above does suffice in providing an idea of how the different approaches relate to each other.

Where is the SQL based solution spending its time?

It’s interesting to try to understand where Stefan Stefanov’s solution is spending most of its execution time, and EXPLAIN comes to the rescue here.

testdb=> explain analyze with 
   arr as ( select array_agg( v ) v from generate_series( 1, 100000 ) v )
   ,arr_tbl(el, arr_index) as (
       select u.* from unnest( (select * from arr) ) with ordinality u 
   )
   select (select el from arr_tbl where arr_index = 5000), 
          (select array_agg(el order by arr_index) 
                          from arr_tbl where arr_index > 5000);
						  
                                                                    QUERY PLAN                                                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=1250.60..1250.61 rows=1 width=36) (actual time=105.725..105.727 rows=1 loops=1)
   CTE arr_tbl
     ->  Function Scan on unnest u  (cost=1250.03..1250.13 rows=10 width=12) (actual time=59.127..66.255 rows=100000 loops=1)
           InitPlan 1 (returns $0)
             ->  Aggregate  (cost=1250.01..1250.02 rows=1 width=32) (actual time=51.840..51.841 rows=1 loops=1)
                   ->  Function Scan on generate_series v  (cost=0.00..1000.00 rows=100000 width=4) (actual time=30.879..40.827 rows=100000 loops=1)
   InitPlan 3 (returns $2)
     ->  CTE Scan on arr_tbl  (cost=0.00..0.22 rows=1 width=4) (actual time=60.106..79.291 rows=1 loops=1)
           Filter: (arr_index = 5000)
           Rows Removed by Filter: 99999
   InitPlan 4 (returns $3)
     ->  Aggregate  (cost=0.23..0.24 rows=1 width=32) (actual time=26.330..26.331 rows=1 loops=1)
           ->  CTE Scan on arr_tbl arr_tbl_1  (cost=0.00..0.22 rows=3 width=12) (actual time=0.270..7.439 rows=95000 loops=1)
                 Filter: (arr_index > 5000)
                 Rows Removed by Filter: 5000
 Planning Time: 0.323 ms
 Execution Time: 106.301 ms

						  

The main node where time is consumed is the CTE Scan to find out the head: it consumes more than 20 milliseconds. That node is produced by the first main subquery, and it requires a scan of the materialized CTE. Clearly I’m not considering the time consumed to produce the array, i.e., the InitPlan 1, because it is used only to feed the query.

Conclusions

While it is easy enough to implement a shift-like operation for PostgreSQL arrays, either by PL/PgSQL or a nested query, performance will never match that of PostgreSQL array slicing.

A Possible Way to Implement a Shift Function in PL/PgSql

2023-08-01T00:00:00+00:00

Creating a shift-like function for manipulating arrays in PL/PgSQL.

A Possible Way to Implement a Shift Function in PL/PgSql

PostgreSQL supports arrays in an excellent way, but it does not provide a shift-like function. A shift function takes an array as input and removes the first (left-most) element from the array. This is quite simple to do in PostgreSQL, since array slices are easy to use. However, a slice returns the modified (shifted) array, not the shifted element.

It is possible to implement a very simple function in PL/PgSQL that accepts an array of any type (anyarray) and returns a table-like result set, with the removed element and the resulting array. The following is a straightforward implementation:
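As a minimal sketch of what a slice gives you, the following query takes a literal array (chosen only for illustration) and extracts both the first element and the slice that starts from the second one:

```sql
SELECT a[ 1 ] AS head,                        -- the left-most element
       a[ 2 : array_length( a, 1 ) ] AS tail  -- the slice with all the others
FROM ( SELECT array[ 'cat', 'dog', 'parrot' ] AS a ) AS t;
```

The head column contains cat, while the tail column contains {dog,parrot}; the two values come from two separate expressions, which is why a wrapping function is handy.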

CREATE OR REPLACE FUNCTION
shift( a anyarray,
       loops int default 1,
       emit_intermediate boolean default false )
RETURNS TABLE( head text, tail anyarray, step int )
AS $CODE$
BEGIN
	-- check that the array is good and has
	-- at least one element (array_length returns
	-- NULL on an empty array, hence the coalesce)
	IF a IS NULL OR coalesce( array_length( a, 1 ), 0 ) < 1 THEN
	   RETURN;
	END IF;

	-- if the array has fewer elements than those
	-- to shift, do only the max available shifting
	IF loops > array_length( a, 1 ) THEN
	   loops := array_length( a, 1 );
	END IF;

	-- initialize the returning array and the
	-- number of steps
	tail := a;
	step := 1;

	WHILE loops > 0 LOOP
		head := tail[ 1 ];
		tail := tail[ 2 : array_length( tail, 1 ) ];

		IF emit_intermediate OR loops = 1 THEN
		   RETURN NEXT;
		END IF;

		loops := loops - 1;
		step  := step + 1;
	END LOOP;

	RETURN;
END
$CODE$
LANGUAGE plpgsql;

The idea is quite simple: the function accepts the array to shift and, optionally, the number of times the shift operation has to be performed, as well as a flag to indicate if the intermediate steps have to be emitted. The function iterates over the number of shifts to be performed, and removes the head element from the array.
As an example:

testdb=> select * from shift( array[ 'cat', 'dog', 'parrot' ] );
 head |     tail     | step
------+--------------+------
 cat  | {dog,parrot} |    1
(1 row)

As you can see, the first element of the array, cat, is removed from the array, which becomes {dog,parrot}; the removed (shifted) element is returned as head and the remaining array as tail. The step column indicates which iteration the result refers to.

As another example, let’s do a shift twice, emitting intermediate results in the meantime:

testdb=> select * from shift( array[ 'cat', 'dog', 'parrot' ], 2, true );
 head |     tail     | step
------+--------------+------
 cat  | {dog,parrot} |    1
 dog  | {parrot}     |    2
(2 rows)

As you can see, this time two tuples are emitted. In the first one (step = 1), the cat element is removed and the {dog,parrot} array is returned. At the second iteration (step = 2), the dog element is removed from the array and the remaining {parrot} is returned. Without emitting the intermediate results, the function always returns a single tuple:

testdb=> select * from shift( array[ 'cat', 'dog', 'parrot' ], 2 );
 head |   tail   | step
------+----------+------
 dog  | {parrot} |    2
(1 row)

Usage Example

As an example, it is possible to use the shift function in another piece of PL/PgSQL code by means of an assignment, for example a SELECT INTO:

DO LANGUAGE plpgsql $CODE$
DECLARE
	a text[];
	h text;
	i int;
BEGIN
	a := array[ 'alfa', 'beta', 'gamma', 'delta' ];

	FOR i IN 1 .. 2 LOOP
		SELECT head, tail
		INTO h, a
		FROM shift( a );

		RAISE INFO 'Removed <%> = %', h, a;
	END LOOP;
END
$CODE$;

that produces the following dummy output:

INFO:  Removed <alfa> = {beta,gamma,delta}
INFO:  Removed <beta> = {gamma,delta}
DO

Efficiency Considerations

The function shift, as it is implemented, is not really efficient because it performs a set of iterations over the given array. In the case there is no need to emit the intermediate stages, it is possible to shrink the function to a couple of operations, mainly an array slice. The following is a possible, more efficient implementation:

CREATE OR REPLACE FUNCTION
shiftx( a anyarray,
        loops int default 1 )
RETURNS TABLE( head text, tail anyarray )
AS $CODE$
BEGIN
	-- check that the array is good and has
	-- at least one element (array_length returns
	-- NULL on an empty array, hence the coalesce)
	IF a IS NULL OR coalesce( array_length( a, 1 ), 0 ) < 1 THEN
	   RETURN;
	END IF;

	-- if the array has fewer elements than those
	-- to shift, do only the max available shifting
	IF loops > array_length( a, 1 ) THEN
	   loops := array_length( a, 1 );
	END IF;

	-- initialize the returning array
	-- and the head of the last element
	head := a[ loops ];
	tail := a[ 1 + loops : array_length( a, 1 ) ];

	RETURN NEXT;
	RETURN;
END
$CODE$
LANGUAGE plpgsql;

The above implementation returns a single tuple, where the head is the last removed element, that is the one at the loops offset, while the tail is the slice of the array that remains after the removal. Since this implementation does not perform any iteration, it is faster when multiple shifts are requested.

It is possible to perform a quick and dirty performance test with a DO block that performs a few thousand iterations and compares the results:

DO LANGUAGE plpgsql
$CODE$
DECLARE
	a text[];
	ts_begin timestamp;
	ts_end   timestamp;
	iter     int;
	i        int;
BEGIN

	iter := 2000;

	-- initialize the array
	SELECT '{' || string_agg( v::text, ',' ) || '}'
	INTO a
	FROM generate_series( 1, iter ) v;

	ts_begin := clock_timestamp();
	FOR i IN 1 .. iter LOOP
	    PERFORM shift( a, iter / 2 );
	END LOOP;
	ts_end := clock_timestamp();

	RAISE INFO 'Using shift for % iteration over % elements = %',
	      	   iter,
		   array_length( a, 1 ),
		   ( ts_end - ts_begin );


	ts_begin := clock_timestamp();
	FOR i IN 1 .. iter LOOP
	    PERFORM shiftx( a, iter / 2 );
	END LOOP;
	ts_end := clock_timestamp();

	RAISE INFO 'Using shiftx for % iteration over % elements = %',
	      	   iter,
		   array_length( a, 1 ),
		   ( ts_end - ts_begin );


END
$CODE$;

The above builds an array of 2000 elements and performs 2000 shifts with a cardinality of 1000, that is, it removes the first 1000 elements from the array and repeats the operation 2000 times.
The results are the following, clearly depending on the machine they are run on:

INFO:  Using shift for 2000 iteration over 2000 elements = 00:01:04.686821
INFO:  Using shiftx for 2000 iteration over 2000 elements = 00:00:00.055818
DO
Time: 64746,850 ms (01:04,747)

As you can see, shiftx is clearly faster than the iterating shift version. This is even more evident if we raise the number of iterations to 5000:

INFO:  Using shift for 5000 iteration over 2505 elements = 00:05:40.513576
INFO:  Using shiftx for 5000 iteration over 2505 elements = 00:00:00.078997
DO
Time: 340597,743 ms (05:40,598)

Conclusions

Thanks to array slices it is simple enough to implement a shift-like functionality in PL/PgSQL. The problem of the approach described here is that there is no easy way to modify the array in place, for example through a reference, so the functions need to return a compound result like a table.

PostgreSQL 16 introduces a few new statistic fields for tables and indexes

2023-07-31T00:00:00+00:00

An addition to the pg_stat_xxx_tables and pg_stat_xxx_indexes that can help a lot in finding out seldom used stuff.

PostgreSQL 16 introduces a few new statistic fields for tables and indexes

PostgreSQL 16 adds two important timestamp fields to the statistics about tables and indexes, most notably pg_stat_all_tables and pg_stat_all_indexes. Clearly, such fields are also available in the user and system views, like for instance pg_stat_user_tables and pg_stat_user_indexes. These two fields contain the last time a sequential scan happened against a table and the last time an index was scanned (i.e., the index was used to extract data, and hence read). As for all things statistics in PostgreSQL, the information is not in real time; rather, it is defined at a transaction boundary.

Before these two fields were added, the statistics catalog provided only quantitative information, clearly less accurate, because it required the database administrator to guess how the system was behaving, for example to understand if an index was unused. Clearly, a table with a huge number of sequential scans is a good candidate for a few indexes to be created, while on the other hand an index with a very low usage counter is a good candidate for removal. With PostgreSQL 16 it is now possible to better decide what to do in the above cases, by understanding how far in the past a particular event happened, and hence to better understand how acting on such a table or index will affect the database.

In this article, I give a very short presentation of what it is like to query the new fields in the statistics catalogs. The tests have been done on PostgreSQL 16 beta-2.

Creating a simple workbench

Let’s create a very simple table and populate it:

testdb=> CREATE TABLE t( pk int generated always as identity
         , t text
        , primary key ( pk ) );
CREATE TABLE

testdb=> insert into t( t )
         select 'Test tuple #' || v
        from generate_series(1, 10000000 ) v;
INSERT 0 10000000

and add an index to the table

testdb=> CREATE INDEX idx_t_even ON t( pk ) WHERE pk % 2 = 0;
CREATE INDEX

The above brings up a table and indexes with the following sizes:

testdb=> SELECT relname, pg_size_pretty( pg_relation_size( oid ) )
FROM pg_class
WHERE relname IN ( 't', 'idx_t_even', 't_pkey' );
  relname   | pg_size_pretty
------------+----------------
 t          | 498 MB
 t_pkey     | 214 MB
 idx_t_even | 107 MB
(3 rows)

Let’s have a look at the statistics

In the pg_stat_xxx_tables and pg_stat_xxx_indexes there are now two new fields named last_seq_scan and last_idx_scan respectively. These fields are timestamps and contain the timestamp of the last time a sequential scan or an index scan has been performed.

For example:

testdb=> SELECT relname, seq_scan, now() - last_seq_scan as seq_scan_age, idx_scan
         FROM pg_stat_user_tables WHERE relname = 't';
-[ RECORD 1 ]+----------------
relname      | t
seq_scan     | 3
seq_scan_age | 00:04:57.627383
idx_scan     | 0

that gives the idea that the table has been read sequentially three times, the last of which happened nearly five minutes ago. And in fact, if we perform another query on the table, the statistics get updated:

testdb=> SELECT count(*) FROM t;
-[ RECORD 1 ]---
count | 10000000

testdb=> SELECT relname, seq_scan, now() - last_seq_scan as seq_scan_age, idx_scan FROM pg_stat_user_tables WHERE relname = 't';
-[ RECORD 1 ]+----------------
relname      | t
seq_scan     | 6
seq_scan_age | 00:00:01.508468
idx_scan     | 0

What about the index? Well, the pg_stat_user_indexes shows information about the indexes and, in this case, the last_idx_scan is the added field:

testdb=> SELECT relname, indexrelname, idx_scan, now() - last_idx_scan FROM pg_stat_user_indexes WHERE relname = 't';
-[ RECORD 1 ]+-----------
relname      | t
indexrelname | t_pkey
idx_scan     | 0
?column?     |
-[ RECORD 2 ]+-----------
relname      | t
indexrelname | idx_t_even
idx_scan     | 0
?column?     |

Even in this case, when the index is used the last_idx_scan field is updated accordingly:

testdb=> SELECT relname, indexrelname, idx_scan, now() - last_idx_scan FROM pg_stat_user_indexes WHERE relname = 't';
-[ RECORD 1 ]+----------------
relname      | t
indexrelname | t_pkey
idx_scan     | 0
?column?     |
-[ RECORD 2 ]+----------------
relname      | t
indexrelname | idx_t_even
idx_scan     | 1
?column?     | 00:00:01.885197
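Building on the new field, a query like the following could be used to list the indexes that have not been scanned recently. It is only a sketch: it assumes PostgreSQL 16 or later (where last_idx_scan exists), and the 30 days threshold is an arbitrary value chosen for illustration.

```sql
SELECT schemaname, relname, indexrelname, idx_scan,
       now() - last_idx_scan AS idx_scan_age
FROM pg_stat_user_indexes
-- a NULL timestamp means no scan has been recorded
-- since the statistics were last reset
WHERE last_idx_scan IS NULL
   OR last_idx_scan < now() - interval '30 days'
ORDER BY last_idx_scan NULLS FIRST;
```

Indexes that back a primary key or a unique constraint can show up in the result even if they are needed, so the output should be read as a list of candidates to investigate, not as a list of indexes to drop.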

Conclusions

Before PostgreSQL 16, the pg_stat_xxx_tables and pg_stat_xxx_indexes provided only quantitative information about the number of sequential scans and index usage; now it is also possible to have an idea of when such an event last happened. This is important because it can quickly reveal how your indexes are performing without requiring you to reset the statistics and start monitoring them from scratch.

PostgreSQL Cluster Connection Limits

2023-07-27T00:00:00+00:00

A brief look to understand how the main cluster connection limits work.

PostgreSQL Cluster Connection Limits

PostgreSQL has two main connection limit tunables that allow the system administrator to decide what is the maximum number of connections the cluster will support and, in case an emergency activity has to be performed, what part of such connections is reserved for superusers.

PostgreSQL 16 is going to introduce a new parameter named reserved_connections alongside the other two, max_connections and superuser_reserved_connections:

% psql -U postgres -h localhost -c 'SHOW SERVER_VERSION;'
 server_version
----------------
 16beta2
(1 row)

% psql -U postgres -h localhost -c "SELECT name, setting FROM pg_settings WHERE name like '%connections' and name not like 'log%'; "
              name              | setting
--------------------------------+---------
 max_connections                | 100
 reserved_connections           | 0
 superuser_reserved_connections | 3

The above are the default settings, which have not changed for several PostgreSQL releases.

The idea is to allow a fine-grained tuning of how connections will be limited depending on the user asking for them. In this article, I try to briefly explain the difference between the two main settings (max_connections and superuser_reserved_connections) and the freshly introduced one (reserved_connections).

The Connection Limits Settings: `max_connections` and `superuser_reserved_connections`

First of all, the main idea is that the cluster is going to accept no more connections than max_connections, hence 100 in the above. Among the max_connections available, superuser_reserved_connections will be kept free for incoming connections from superuser roles.
In other words, clients and applications will be able to establish max_connections - superuser_reserved_connections connections.

It is simple enough to demonstrate this by means of pgbench:

% pgbench -U pgbench -T 60 -P 5 -n -c 100 -h localhost pgbench
pgbench (15.3, server 16beta2)
pgbench: error: connection to server at "localhost" (::1), port 5432 failed: FATAL:  remaining connection slots are reserved for roles with SUPERUSER
pgbench: error: could not create connection for client 97

In the above, I asked pgbench to create 100 concurrent connections, that is the max_connections value. That fails because three connections are reserved to superusers.

It is possible to demonstrate this using pgbench and simultaneously opening other connections. In a terminal launch the following:

% pgbench -U pgbench -T 120 -P 5 -n -c 97 -h localhost pgbench

that will consume all available user-level connections and will last for two minutes. Meanwhile, in another terminal, if you try to log in as a non-superuser you get an error, while a superuser can connect:

% psql -U pgbench -h localhost pgbench
psql: error: connection to server at "localhost" (::1), port 5432 failed: FATAL:  remaining connection slots are reserved for roles with SUPERUSER

% psql -U postgres -h localhost pgbench
psql (15.3, server 16beta2)
WARNING: psql major version 15, server major version 16.
         Some psql features might not work.
Type "help" for help.

pgbench=#

The New Connection Limit `reserved_connections`

As already written, this is a new parameter introduced by PostgreSQL 16. It reserves connection slots for the roles that have been granted the pg_use_reserved_connections role, and is a way to make some non-superuser roles more powerful by granting them an extra capability.

First of all, let’s set the parameter to 10 connections; please note that, being a connection related parameter, it requires a restart of the cluster.

% psql -h localhost -U postgres -c 'ALTER SYSTEM SET reserved_connections TO 10;'
ALTER SYSTEM

% pgenv stop
% pgenv start

In the above I use pgenv as my PostgreSQL manager, but that is not the important part. After that, there is the need to grant the pg_use_reserved_connections role to some user(s):

% psql -U postgres -h localhost -c 'GRANT pg_use_reserved_connections TO luca;'
GRANT ROLE

It is now time to try:

% pgbench -U pgbench -T 120 -P 5 -n -c 97 -h localhost pgbench
pgbench (15.3, server 16beta2)
pgbench: error: connection to server at "localhost" (::1), port 5432 failed: FATAL:  remaining connection slots are reserved for roles with privileges of the "pg_use_reserved_connections" role
pgbench: error: could not create connection for client 87

As you can see, pgbench is no longer able to obtain up to 97 connections because now 10 are reserved for non-superuser roles with the pg_use_reserved_connections role. Therefore, the only way to make it work is to lower the concurrent connections to max_connections - reserved_connections - superuser_reserved_connections, that means 100 - 10 - 3 = 87.

% pgbench -U pgbench -T 120 -P 5 -n -c 87 -h localhost pgbench

and while the above is working, you can try to connect from another concurrent session:

% psql -U pgbench -h localhost pgbench
psql: error: connection to server at "localhost" (::1), port 5432 failed: FATAL:  remaining connection slots are reserved for roles with privileges of the "pg_use_reserved_connections" role

% psql -U luca -h localhost pgbench
psql (15.3, server 16beta2)
WARNING: psql major version 15, server major version 16.
         Some psql features might not work.
Type "help" for help.

pgbench=>

In the first attempt, the connection fails because the pgbench user does not have any more connection slots to use, or better, there are no more ordinary connection slots within the cluster to use.
However, the user luca succeeds in connecting because he has the special pg_use_reserved_connections role and there are still reserved slots available.

It is important to note that, even if your cluster does not have any role with pg_use_reserved_connections, once the setting reserved_connections is not zero the cluster will keep such connection slots reserved! In other words, use reserved_connections only when you are sure you are going to grant the role to a few users.
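The arithmetic used throughout this section can be checked directly from SQL. The following is only a sketch, assuming PostgreSQL 16 or later (on older versions neither the reserved_connections setting nor the pg_use_reserved_connections role exists, so both queries would fail); the column alias is an arbitrary name.

```sql
-- slots left to ordinary (non-privileged) roles
SELECT current_setting( 'max_connections' )::int
       - current_setting( 'reserved_connections' )::int
       - current_setting( 'superuser_reserved_connections' )::int
       AS ordinary_slots;

-- non-superuser roles that can use the reserved slots
SELECT rolname
FROM pg_roles
WHERE NOT rolsuper
  AND rolname <> 'pg_use_reserved_connections'
  AND pg_has_role( rolname, 'pg_use_reserved_connections', 'MEMBER' );
```

With the values used in this article (100, 10 and 3) the first query gives 87. If the second query returns no rows while reserved_connections is greater than zero, the reserved slots are kept aside without any role able to use them.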

Conclusions

PostgreSQL is able to prevent the system administrator from being locked out of the cluster, even when the number of connections is approaching the maximum allowed. Thanks to the new parameter reserved_connections added in the upcoming PostgreSQL 16, it will be possible to fine-tune the connection allowance even better!

A PL/PgSQL Simple Roman Number Translator

2023-07-24T00:00:00+00:00

A way to decode a Roman number into an Arabic one and vice-versa using PL/PgSQL.

A PL/PgSQL Simple Roman Number Translator

In the last Weekly Challenge 227 the second task was about building a simple Roman numbers calculator. Since I usually try to implement those tasks also in PL/PgSQL (as well as in PL/Perl), I tried to implement such a calculator and, along the way, I implemented a couple of simple functions to translate a number from and to Roman notation.
In this short post I explain how the two functions work.
My approach is based on a lookup table that stores Arabic and Roman correspondences for special cases and base units.

The lookup table

I defined a lookup table, that can be in whatever schema you want, even temporary, and that is populated with base units and some special cases:

CREATE SCHEMA IF NOT EXISTS fluca1978;
CREATE  TABLE IF NOT EXISTS fluca1978.roman( r text, n int, repeatable boolean );

TRUNCATE TABLE fluca1978.roman;

INSERT INTO fluca1978.roman
VALUES
('I', 1, true )
,( 'IV', 4, false )
,( 'V', 5, false )
,( 'IX', 9, false )
,( 'X', 10, true )
,( 'XL', 40, false )
,( 'L', 50, false )
,( 'XC', 90, false )
,( 'C', 100, true )
,( 'CD', 400, false )
,( 'D', 500, false )
,( 'CM', 900, false )
,( 'M', 1000, true );

The r field holds the Roman value for a number n, while the repeatable flag indicates if the symbol can be repeated consecutively in the same string. For example, I can be repeated to form III, while IV cannot be repeated into IVIV. This will be useful during validation.

Validating a Roman String

The following function performs a minimal validation of a given input string that is supposed to be a Roman number:

CREATE OR REPLACE FUNCTION
fluca1978.validate_roman( r text )
RETURNS boolean
STRICT
AS $CODE$
DECLARE
	current_record fluca1978.roman%rowtype;
	rx text;
	matches int;
BEGIN


	FOR current_record IN SELECT * FROM fluca1978.roman ORDER BY n DESC LOOP
	    RAISE DEBUG 'Iterating over Roman value % = %', current_record.r, current_record.n;

	    matches := 0;
	    rx := format( '^%s', current_record.r );

	    WHILE r ~ rx LOOP
	    	  matches := matches + 1;
		  RAISE DEBUG 'Input string % -> % matches the Roman value %', r, matches, current_record.r;

		  IF NOT current_record.repeatable AND matches > 1 THEN
		     RAISE DEBUG 'Roman symbol % cannot be repeated!', current_record.r;
		     RETURN false;
		  END IF;

		  r := regexp_replace( r, rx, '' );
		  EXIT WHEN length( r ) = 0;
	    END LOOP;

 	   EXIT WHEN length( r ) = 0;
	END LOOP;

	IF length( r ) > 0 THEN
	   RETURN false;
	END IF;

	RETURN true;
END
$CODE$
LANGUAGE plpgsql;

The idea is simple: I scan the lookup table in descending order, from the biggest value to the smallest one. At each iteration, I check if the current Roman string starts with the Roman letter (or couple of letters). If that is the case, I keep track of how many matches I’ve found, then remove the Roman symbol from the beginning of the string. Then I check whether the same letter/symbol can be found again at the beginning of the string and, if so, I ensure it is a repeatable value, otherwise there is an error. If everything goes well, the string r will end up empty due to the substitutions, otherwise if some characters remain the string is wrong. That happens, for example, when the Roman values on the right are bigger than those on the left.
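
To experiment with the same greedy check outside the database, here is a minimal Python sketch of the idea. It is not the PL/PgSQL function itself: the ROMAN list just copies the content of the lookup table, and startswith plays the role of the ^ anchored regular expression.

```python
# (symbol, value, repeatable) rows, same content as the lookup table
ROMAN = [
    ("I", 1, True), ("IV", 4, False), ("V", 5, False), ("IX", 9, False),
    ("X", 10, True), ("XL", 40, False), ("L", 50, False), ("XC", 90, False),
    ("C", 100, True), ("CD", 400, False), ("D", 500, False),
    ("CM", 900, False), ("M", 1000, True),
]


def validate_roman(r: str) -> bool:
    """Greedy prefix check, from the biggest symbol to the smallest."""
    for symbol, _value, repeatable in sorted(ROMAN, key=lambda row: -row[1]):
        matches = 0
        while r.startswith(symbol):
            matches += 1
            if not repeatable and matches > 1:
                return False          # e.g., IVIV
            r = r[len(symbol):]       # drop the matched prefix
        if not r:
            break                     # everything has been consumed
    return r == ""                    # leftovers mean a malformed string


print(validate_roman("MCMLXXVIII"))  # True
print(validate_roman("IVIV"))        # False
print(validate_roman("IXC"))         # False
```

As in the function above, the validation is minimal: a repeatable symbol can appear any number of times, so a string like IIII is accepted.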

Converting from Roman to Arabic

The following function does the conversion starting from a Roman string:

CREATE OR REPLACE FUNCTION
fluca1978.from_roman( r text )
RETURNS int
STRICT
AS $CODE$
DECLARE
	v int := 0;
	current_record fluca1978.roman%rowtype;
	rx text;
BEGIN
	IF r = '' THEN
	   RETURN 0;
	END IF;

	IF NOT fluca1978.validate_roman( r ) THEN
	   RETURN 0;
	END IF;

	FOR current_record IN SELECT * FROM fluca1978.roman ORDER BY n DESC LOOP
	     RAISE DEBUG 'Iterating over Roman value % = %', current_record.r, current_record.n;

	     rx := format( '^%s', current_record.r );
	    WHILE r ~ rx LOOP
		RAISE DEBUG 'Input string % matches the Roman value %', r, current_record.r;

	        v := v + current_record.n;
	        r := regexp_replace( r, rx, '' );
	    END LOOP;
	END LOOP;

	RAISE DEBUG 'Converted value is %', v;
	RETURN v;
END
$CODE$
LANGUAGE plpgsql;

It is really similar to the validating function: it iterates over the lookup table and, at each step, tries to decode the beginning of the string as the biggest possible value.

It is possible to see the workflow of the function by means of using the DEBUG log level:

testdb=> set client_min_messages to debug;
SET
testdb=> select fluca1978.from_roman( 'MCMLXXVIII' );
DEBUG:  Iterating over Roman value M = 1000
DEBUG:  Input string MCMLXXVIII matches the Roman value M
DEBUG:  Iterating over Roman value CM = 900
DEBUG:  Input string CMLXXVIII matches the Roman value CM
DEBUG:  Iterating over Roman value D = 500
DEBUG:  Iterating over Roman value CD = 400
DEBUG:  Iterating over Roman value C = 100
DEBUG:  Iterating over Roman value XC = 90
DEBUG:  Iterating over Roman value L = 50
DEBUG:  Input string LXXVIII matches the Roman value L
DEBUG:  Iterating over Roman value XL = 40
DEBUG:  Iterating over Roman value X = 10
DEBUG:  Input string XXVIII matches the Roman value X
DEBUG:  Input string XVIII matches the Roman value X
DEBUG:  Iterating over Roman value IX = 9
DEBUG:  Iterating over Roman value V = 5
DEBUG:  Input string VIII matches the Roman value V
DEBUG:  Iterating over Roman value IV = 4
DEBUG:  Iterating over Roman value I = 1
DEBUG:  Input string III matches the Roman value I
DEBUG:  Input string II matches the Roman value I
DEBUG:  Input string I matches the Roman value I
DEBUG:  Converted value is 1978
 from_roman
------------
       1978
(1 row)

As you can see, at every iteration the function removes the leftmost symbol from the string and continues to see what it can find next.
The matching is performed by building a regular expression used as the condition of the WHILE loop: the condition has the beginning-of-string anchor ^ followed by whatever Roman symbol is in the current record out of the lookup table. There is no need to guard against repetitions of two-letter symbols here: you cannot express 8 as IVIV, but such a string has already been rejected by validate_roman before the loop starts.

Converting From Arabic to Roman

The following function does the opposite: converts an integer into a roman string.

CREATE OR REPLACE FUNCTION
fluca1978.to_roman( n int )
RETURNS text
STRICT
AS $CODE$

DECLARE
	roman_value text := '';
    current_record fluca1978.roman%rowtype;
BEGIN
	IF n <= 0 THEN
		RAISE DEBUG 'Cannot convert zero!';
		RETURN NULL;
	END IF;

	FOR current_record IN SELECT * FROM fluca1978.roman ORDER BY n DESC LOOP

	    WHILE n >= current_record.n LOOP
			RAISE DEBUG 'The value % is greater than % so appending a %', n, current_record.n, current_record.r;
			roman_value := roman_value || current_record.r;
			n := n - current_record.n;
			EXIT WHEN length( current_record.r ) = 2;
	    END LOOP;
	END LOOP;

	RAISE DEBUG 'Computed value is %', roman_value;
	RETURN roman_value;
END
$CODE$
LANGUAGE plpgsql;

Again, thanks to the debug output it is easy to understand the workflow of the converter:

testdb=> select fluca1978.to_roman( 1978 );
DEBUG:  The value 1978 is greater than 1000 so appending a M
DEBUG:  The value 978 is greater than 900 so appending a CM
DEBUG:  The value 78 is greater than 50 so appending a L
DEBUG:  The value 28 is greater than 10 so appending a X
DEBUG:  The value 18 is greater than 10 so appending a X
DEBUG:  The value 8 is greater than 5 so appending a V
DEBUG:  The value 3 is greater than 1 so appending a I
DEBUG:  The value 2 is greater than 1 so appending a I
DEBUG:  The value 1 is greater than 1 so appending a I
DEBUG:  Computed value is MCMLXXVIII
  to_roman
------------
 MCMLXXVIII
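
A quick way to gain confidence in the two conversions is a round trip: converting a number to Roman and back must give the original number. The following Python sketch mirrors the logic of the two functions under a couple of assumptions: the ROMAN list copies the lookup table, from_roman skips the validation step, and the early exit for two-letter symbols is omitted because, after the subtraction, the remainder is always smaller than the symbol value.

```python
from typing import Optional

# (symbol, value) pairs, same content as the lookup table, biggest first
ROMAN = [
    ("M", 1000), ("CM", 900), ("D", 500), ("CD", 400), ("C", 100),
    ("XC", 90), ("L", 50), ("XL", 40), ("X", 10), ("IX", 9),
    ("V", 5), ("IV", 4), ("I", 1),
]


def to_roman(n: int) -> Optional[str]:
    """Append the biggest symbol that still fits, then subtract its value."""
    if n <= 0:
        return None
    out = ""
    for symbol, value in ROMAN:
        while n >= value:
            out += symbol
            n -= value
    return out


def from_roman(r: str) -> int:
    """Strip known symbols from the left, summing their values (no validation)."""
    total = 0
    for symbol, value in ROMAN:
        while r.startswith(symbol):
            total += value
            r = r[len(symbol):]
    return total


print(to_roman(1978))            # MCMLXXVIII
print(from_roman("MCMLXXVIII"))  # 1978
# round trip over the classic range of Roman numbers
print(all(from_roman(to_roman(i)) == i for i in range(1, 4000)))  # True
```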

Caching Results

It is, clearly, very simple to define a materialized view or a cache table to handle all values for a faster lookup. As an example, imagine creating a table that serves as a cache:

CREATE TABLE IF NOT EXISTS fluca1978.roman_cache_table( n int, r text );
TRUNCATE TABLE fluca1978.roman_cache_table;

INSERT INTO fluca1978.roman_cache_table( n, r )
SELECT n, r
FROM   fluca1978.roman
ORDER BY n;

and then a function that, given a number, checks whether the caching table contains such a number, otherwise populates the table from the last cached value up to the given one:

CREATE OR REPLACE FUNCTION
fluca1978.roman_cache( x int )
RETURNS text
STRICT
AS $CODE$
DECLARE
	max_cached_value int;
	i int;
	v text;
BEGIN
	SELECT max( n )
	INTO max_cached_value
	FROM fluca1978.roman_cache_table;

	RAISE DEBUG 'Max cached value % and looking for %', max_cached_value, x;

	IF max_cached_value IS NULL OR x > max_cached_value THEN
	   IF max_cached_value IS NULL THEN
	      max_cached_value := 1;
	   END IF;
	   RAISE DEBUG 'Repopulating the cache from % to %', max_cached_value, x;

	   FOR i IN max_cached_value + 1 .. x LOOP
	   	   INSERT INTO fluca1978.roman_cache_table( n, r )
	   	   SELECT i, fluca1978.to_roman( i );
	   END LOOP;
	END IF;

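	SELECT r
	INTO v
	FROM fluca1978.roman_cache_table
	WHERE n = x;

	-- the seed rows leave gaps (e.g., 2 and 3): compute and store on a miss
	IF NOT FOUND AND x > 0 THEN
	   v := fluca1978.to_roman( x );
	   INSERT INTO fluca1978.roman_cache_table( n, r )
	   VALUES ( x, v );
	END IF;

	RETURN v;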
END
$CODE$
LANGUAGE plpgsql;

When you query the above function, the system inspects the roman_cache_table for the requested Arabic number and, if the number is in there, returns the cached Roman value. If the number is greater than the max value within the caching table, the function first populates the table up to the given number.

Conclusions

With some patience and a few iterations, it is possible to create a fully functional Roman Number Converter, and hence also a calculator. Clearly, this kind of task is much simpler with Perl (and PL/Perl), but PL/PgSQL can handle it too with a little more verbosity. Code from the above examples can be found on my Github repository.

Multi-Dimensional Arrays in PostgreSQL

2023-05-18T00:00:00+00:00

A look at how PostgreSQL handles multi-dimensional arrays.

Multi-Dimensional Arrays in PostgreSQL

PostgreSQL supports arrays of various types, and handles also multi-dimensional arrays. Except that it does not support multi-dimensional arrays!

Allow me to better explain. A multi-dimensional array is just an array that contains other arrays. In this sense, PostgreSQL does not provide a pure native multi-dimensional array type, even if you can declare one.

Let’s see this in action by means of pg_typeof:

testdb=> select pg_typeof(  array[ array[ 1, 2 ],
                                   array[ 3, 4 ] ]::int[][] );
 pg_typeof
-----------
 integer[]
(1 row)

As you can see, the above matrix is reported to be a single flat array.

Consider now the following function, that accepts a multi-dimensional array and returns a table:

CREATE OR REPLACE FUNCTION
f_matrix( int[][] )
RETURNS TABLE( a int, b int )
AS $CODE$
   my ( $matrix ) = @_;

   for my $row ( 0 .. $matrix->@* - 1 ) {
       for my $column ( 0 .. $matrix->[ $row ]->@* - 1 ) {
       	   return_next( { a => $row + 1,
	   		  b => $matrix->[ $row ]->[ $column ]
			} );
       }
   }

return undef;
$CODE$
LANGUAGE plperl;

The above function, when invoked with a multi-dimensional array, works as expected:

testdb=> select *
         from f_matrix( array[ array[ 1, 2 ],
		                       array[ 3, 4 ] ]::int[][] );
 a | b
---+---
 1 | 1
 1 | 2
 2 | 3
 2 | 4
(4 rows)

However, if you inspect the function, its signature clearly tells that the input parameter is a flat array:

testdb=> \df f_matrix
                              List of functions
 Schema |   Name   |      Result data type       | Argument data types | Type
--------+----------+-----------------------------+---------------------+------
 public | f_matrix | TABLE(a integer, b integer) |   integer[]         | func
(1 row)

The same result, clearly, can be achieved in PL/PgSQL, for example by implementing the following function:

CREATE OR REPLACE FUNCTION
f_matrix( m int[][] )
RETURNS TABLE( a int, b int )
AS $CODE$
DECLARE
   r int;
   c int;
BEGIN
	FOR r IN 1 .. array_length( m, 1 ) LOOP
	    FOR c IN 1 .. array_length( m, 2 ) LOOP
	    	a := r;
		    b := m[ r ][ c ];
		    RETURN NEXT;
	    END LOOP;
	END LOOP;

RETURN;
END
$CODE$
LANGUAGE plpgsql;

In conclusion, PostgreSQL manages multi-dimensional arrays as flat lists, like what you would do in the C programming language. This does not mean that you cannot use multi-dimensional arrays in a comfortable and efficient way, rather that you need to take into account how they are handled by the database engine, especially when passing them to a function.
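
The "flat list" idea can be sketched outside the database. The snippet below is a plain Python illustration of row-major storage, the layout C uses for its arrays, and not a description of how PostgreSQL stores data on disk; the names flat, n_cols and at are made up for the example.

```python
# a 2x2 matrix stored row after row, as C lays out int m[2][2]
flat = [1, 2, 3, 4]
n_cols = 2


def at(row: int, col: int) -> int:
    """Element at 1-based (row, col), computed on the flat list."""
    return flat[(row - 1) * n_cols + (col - 1)]


print(at(1, 2))  # 2
print(at(2, 1))  # 3
```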

Table name as function arguments: a few checks

2023-05-11T00:00:00+00:00

How to check if a given table name exists and where to find it.

Table name as function arguments: a few checks

Often I write some piece of code, usually a function or a procedure, that must operate dynamically on a table. To achieve this, I often pass the table name as an argument to the function.
The function should always check that the table exists and, moreover, should always use the fully qualified name of the table to avoid schema conflicts and search_path pollution problems. Lastly, sometimes I pass the table as a relative name, while other times I want to pass a fully qualified name to the function.

I have a template for doing these minimal checks; clearly it is just an idea on how to improve your own functions when dealing with table names. Imagine a simple function that accepts a table name, as follows:

CREATE OR REPLACE FUNCTION
f_do_stuff_on_table( t_name text )
RETURNS bool
AS $CODE$
DECLARE
	s_name text;
	info text[];
	pg_version  int;
	qualified_name text;
BEGIN


	-- parse the schema name
	info := parse_ident( t_name );
	IF array_length( info, 1 ) = 2 THEN
	   s_name := info[ 1 ];
	   t_name := info[ 2 ];
	ELSE
	   	-- try to understand if PostgreSQL 15 or higher
		SELECT setting::int
		INTO pg_version
		FROM pg_settings
		WHERE name = 'server_version_num';

		IF pg_version >= 150000 THEN
		   SELECT current_role
		   INTO   s_name;
		ELSE
		   s_name := 'public';
		END IF;

	END IF;


	-- check if the table exists
	PERFORM c.oid
	FROM pg_class c
	JOIN pg_namespace n
	ON n.oid = c.relnamespace
	WHERE c.relkind = 'r'
	AND   n.nspname = s_name
	AND   c.relname = t_name;

	IF NOT FOUND THEN
	   RAISE 'Table %.% does not exist, cannot proceed!', s_name, t_name;
	END IF;

	qualified_name := format( '%I.%I', s_name, t_name );
	RAISE DEBUG 'Table %', qualified_name;

	RETURN true;

END
$CODE$
LANGUAGE plpgsql;

The function accepts t_name, which can be a relative name (e.g., foo) or an absolute name like public.foo.
Initially the function exploits the parse_ident internal PostgreSQL function to get an array of elements, where the first one represents the schema name and the second one represents the table name. Thanks to this, and checking if the returned array has a size of 2, I can discriminate on what the function has received as an argument.
If the function received a fully qualified table name, I store the schema into s_name and rewrite t_name with only its relative name, and nothing more has to be done on the naming part. On the other hand, if the function received a relative name, I must use a default schema, which generally speaking is public, except on PostgreSQL 15 or later, where I use the current role name. Therefore, I get the PostgreSQL version number and decide what value s_name will assume, either public or the current_role (interpolated) value.
Once I have both the schema and the relative table name, I can check for the table in pg_class, joined with pg_namespace to confirm that the table is in such schema. If the table is not there, I can RAISE an exception and stop the function right there, otherwise I can build a qualified name and go on with the rest of the work.

Extracting the list of columns from the catalogs

2023-05-08T00:00:00+00:00

A simple look at the PostgreSQL catalogs to get the list of a table’s columns.

Extracting the list of columns from the catalogs

The special catalog pg_attribute keeps track of every column that your tables hold. However, before using such a catalog, you need to keep in mind some basic rules. In particular, every attribute in the catalog has an ordinal number named attnum: when the number is positive the attribute refers to a user-defined column, whenever it is negative it represents a PostgreSQL system column. Moreover, the special column attisdropped indicates if the attribute has been dropped. Finally, when an attribute is dropped, it takes a special name in the catalog, like ........pg.dropped.5........ (the number being its attnum).

Imagine we create a dummy table as follows:

testdb=> CREATE TABLE foo(
a int
, b char
, c int
, d int
, e int
, f int
, g char
, z text
, y char
, k int
, j int
);

testdb=> ALTER TABLE foo DROP COLUMN e;

Getting the list of columns is easy via the catalog pg_attribute:

testdb=> SELECT attname, attnum
FROM pg_attribute
WHERE attrelid = 'foo'::regclass
AND NOT attisdropped;
 attname  | attnum
----------+--------
 tableoid |     -6
 cmax     |     -5
 xmax     |     -4
 cmin     |     -3
 xmin     |     -2
 ctid     |     -1
 a        |      1
 b        |      2
 c        |      3
 d        |      4
 f        |      6
 g        |      7
 z        |      8
 y        |      9
 k        |     10
 j        |     11
(16 rows)

There are a few things to note in the above output: 1) attnum reflects the order in which the columns have been added to the table, in fact it respects the original table definition; 2) if attnum is positive then the attribute is a user-defined one, that is a column you added to the table; 3) if attnum is negative then the attribute has been added by the system (i.e., PostgreSQL) for its internal usage; 4) all attributes listed in pg_attribute can be queried by the user; 5) the dropped column e is missing: note how attnum skips the value 5.

It is now simple enough to get a list of columns and paste it into a query:

testdb=> SELECT string_agg( attname, ',' ORDER BY attname )
FROM pg_attribute
WHERE attrelid = 'foo'::regclass
AND NOT attisdropped;
                      string_agg
-------------------------------------------------------
 a,b,c,cmax,cmin,ctid,d,f,g,j,k,tableoid,xmax,xmin,y,z
(1 row)

testdb=> SELECT a,b,c,cmax,cmin,ctid,d,f,g,j,k,tableoid,xmax,xmin,y,z FROM foo;
-[ RECORD 1 ]--------
a        |
b        |
c        |
cmax     | 0
cmin     | 0
ctid     | (0,1)
d        |
f        |
g        |
j        |
k        |
tableoid | 44309
xmax     | 0
xmin     | 2075116
y        |
z        | test tuple

Clearly, you can manipulate the query to build something that allows you to choose between user columns and system columns, for example:

testdb=> WITH user_columns AS ( SELECT attname FROM pg_attribute
                                WHERE attrelid = 'foo'::regclass AND attnum > 0
								AND NOT attisdropped
								ORDER BY 1 )
, system_columns AS ( SELECT attname FROM pg_attribute
                      WHERE attrelid = 'foo'::regclass AND attnum < 0
					  AND NOT attisdropped
					  ORDER BY 1 )
SELECT string_agg( c.attname, ', ' )
FROM user_columns c;
           string_agg
---------------------------------
 a, b, c, d, f, g, j, k, y, z
(1 row)

You can even build something a little more complex, in order to get for instance the definition of a trigger (or something like that):

testdb=> WITH user_columns AS ( SELECT attname FROM pg_attribute
                                WHERE attrelid = 'foo'::regclass AND attnum > 0
								AND NOT attisdropped
								ORDER BY 1 )
, system_columns AS ( SELECT attname FROM pg_attribute
                      WHERE attrelid = 'foo'::regclass AND attnum < 0
					  AND NOT attisdropped
					  ORDER BY 1 )
, user_columns_list AS ( SELECT string_agg( c.attname , ',' ) as l
                         FROM user_columns c )
SELECT 'CREATE TRIGGER tr_foo_ins '
      || ' BEFORE UPDATE OF '
	  || ucl.l
	  || ' ON foo  FOR EACH ROW EXECUTE PROCEDURE f_tr_foo_ins() '
FROM user_columns_LIST ucl;

         ?column?
--------------------------------------------------------------------------------------------------------------------------
 CREATE TRIGGER tr_foo_ins  BEFORE UPDATE OF a,b,c,d,f,g,j,k,y,z ON foo  FOR EACH ROW EXECUTE PROCEDURE f_tr_foo_ins()

This can be pushed into a function or an EXECUTE dynamic query to provide a dynamically generated statement.
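
When the statement is assembled on the client side instead, the same text can be built from the column list. The following Python sketch is only an illustration under stated assumptions: the helper names quote_ident and build_update_trigger are made up, and the quoting simply wraps every identifier in double quotes, doubling any embedded double quote.

```python
from typing import Iterable


def quote_ident(name: str) -> str:
    """Double-quote an identifier, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def build_update_trigger(table: str, columns: Iterable[str],
                         trigger: str, function: str) -> str:
    """Assemble a CREATE TRIGGER ... BEFORE UPDATE OF statement as text."""
    column_list = ", ".join(quote_ident(c) for c in columns)
    return (f"CREATE TRIGGER {quote_ident(trigger)} "
            f"BEFORE UPDATE OF {column_list} "
            f"ON {quote_ident(table)} "
            f"FOR EACH ROW EXECUTE PROCEDURE {quote_ident(function)}()")


# the column list would come from the pg_attribute query shown above
print(build_update_trigger("foo", ["a", "b"], "tr_foo_ins", "f_tr_foo_ins"))
```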

PgTraining Online Event 2023: available material

2023-04-28T00:00:00+00:00

Links to the material of our last free online event related to PostgreSQL!

PgTraining Online Event 2023: available material

On April 14th, 2023, we had another free online event related to PostgreSQL, namely PgTraining Online Event 2023. The recordings and the slides are now available:

Il linguaggio PL/Perl, by yours truly, slides and the video
Il linguaggio PL/Python, by Chris Mair, slides and video
Le novità di PostgreSQL 15, by Enrico Pirozzi, slides

Talks are listed in the same order they appeared in the afternoon on the live stream.

I would like to thank my friends Enrico and Chris for being able, again, to deliver with me a very nice and interesting mini-conference.

We hope to be able to deliver much more PostgreSQL related content, and in case you are interested and have a particular subject you would like us to talk about, please feel free to contact us!

How much does it take to compile PostgreSQL (on my machines)?

2023-04-11T00:00:00+00:00

A few considerations on how fast (or slow) it can be to compile PostgreSQL

How much does it take to compile PostgreSQL (on my machines)?

A couple of months ago I bought a new laptop, an Acer Aspire 5 A515-45-R9EC, with an AMD Ryzen 5 5500U CPU. It is the first Ryzen processor I have ever had, so I was curious to see how it performs, and what better test for me than compiling PostgreSQL from scratch?
I fired up pgenv and did a simple pgenv build 15.2 to compile the whole PostgreSQL 15.2 distribution. I was quite happy with the result, and I posted the following:

My new "mule" laptop with AMD Ryzen 5 5500U
compiles #emacs 28.2 in 86 secs and #PostgreSQL 15.2 in 226 secs. There's no time for a coffee anymore!

Then a friend of mine pointed out that those were not great times, sob!
This triggered some curiosity, so I decided to compare my three machines to see how they perform. I report a single timing for each, but all times are pretty stable and change only by a couple of seconds between runs (in other words, this is not a real benchmark!):

CPU	cores	threads	compilation type	seconds
Intel i5-5257U	2	4	make -j	166
Intel i5-10500T	6	12	make -j	57
AMD Ryzen 5 5500U	6	12	make -j	61

make -j is used to exploit all the parallelism available on the machine, and during the builds the computers were not doing anything else. The slowest is, obviously, the machine with two cores. It is interesting to note that all the machines are low-power ones, therefore they do not perform as well as a desktop or server would.

I then decided to compile another beast: my favourite editor Emacs 28.2!

CPU	cores	threads	compilation type	seconds
Intel i5-5257U	2	4	make -j	76
Intel i5-10500T	6	12	make -j	31
AMD Ryzen 5 5500U	6	12	make -j	33

Again, times are not so bad after all. But the above results are with the maximum parallelism, what happens in normal conditions?

CPU	cores	threads	compilation type	seconds to compile PostgreSQL 15.2	seconds to compile Emacs 28.2
Intel i5-10500T	6	12	make	211	106
AMD Ryzen 5 5500U	6	12	make	193	92

As a final note, if I virtualize the Intel i5-10500T as a single CPU with a single core, the time to compile PostgreSQL grows to 250 seconds. Clearly, such a time is comparable with the sequential make on the same CPU.

pgagroal: setting configuration at run-time

2023-04-06T00:00:00+00:00

A new feature added to pgagroal that allows users to dynamically change the configuration.

pgagroal: setting configuration at run-time

I’m happy since today my contribution to pgagroal has been merged. Last year I added the config-get command to pgagroal-cli: such command allows users to get information about how pgagroal is configured.

The natural improvement over the above work was the config-set command, and now pgagroal-cli has one (see this commit)! It took me a few months to complete the work, since I was very busy with my day job: I had a working prototype before Christmas, but then I left it there for the future me to find some time to complete the effort. And in the last month, I had some spare time, so I completed it!

pgagroal-cli config-set

The new command allows you to dynamically change some configuration values. Clearly, not everything can be changed at run-time without a daemon restart. I wanted the command to be useful also in automation scripts, so I made it report back the actual value of the configuration parameter. Therefore, comparing the desired value and the obtained value confirms whether the change has been applied or not.

Similarly to the config-get command, config-set also accepts contexts:

pgagroal (or nothing) means that the specified configuration parameter is within the [pgagroal] configuration section;
limit means that the user requested to change a limit entry;
server means that the user wants to change something about a server section;
hba means that the user wants to change an HBA entry.

In the case of limit, hba and server contexts, the entry to modify must be identified with a name. While in a server configuration the name must be unique, limit and HBA entries may not have unique names, so the first match wins.

As an example, imagine the user wants to change the max_connections setting. Such setting is within the [pgagroal] section, so the following two commands are equivalent:

$ pgagroal-cli config-set max_connections 100
40

$ pgagroal-cli config-set pgagroal.max_connections 100
40

In both the above cases, the system is returning 40 instead of the desired value 100. That’s normal, since changing max_connections requires a restart of the daemon; this can be better understood using the --verbose flag:

$ pgagroal-cli config-set max_connections 100 --verbose
max_connections = 40
pgagroal-cli: Error (2)

Clearly the system reports there was an error, and provides the information that max_connections is set at 40.
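
In an automation script, the comparison between the desired and the obtained value could be sketched as follows. This is only a sketch under a few assumptions: that pgagroal-cli is available in the PATH and can reach the daemon with its default settings, and that the resulting value is printed on standard output as in the examples above; the names run_cli and config_set_applied are made up for illustration.

```python
import subprocess
from typing import Callable, List


def run_cli(args: List[str]) -> str:
    """Run pgagroal-cli with the given arguments and return what it prints."""
    done = subprocess.run(["pgagroal-cli", *args],
                          capture_output=True, text=True)
    return done.stdout.strip()


def config_set_applied(key: str, wanted: str,
                       runner: Callable[[List[str]], str] = run_cli) -> bool:
    """True when the value reported back equals the requested one."""
    obtained = runner(["config-set", key, wanted])
    return obtained == wanted
```

With the configuration of this example, config_set_applied( "max_connections", "100" ) would be false, because the daemon answers 40 until it is restarted. The runner argument exists only to make the function easy to try without a running daemon.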

Another example is when the user wants to change a server or limit value, where the context has to be specified:

$ pgagroal-cli config-set server.venkman.port 6432
6432

$ pgagroal-cli config-set limit.pgbench.max_size 2
2

In the above, the user changes the venkman server port, and the pgbench user limit entry.

If the system, that is the pgagroal daemon, cannot apply the requested change on the fly, the logs will be populated accordingly:

DEBUG Trying to change main configuration setting <max_connections> to <100>
INFO  Restart required for max_connections - Existing 40 New 100
WARN  1 settings cannot be applied
DEBUG pgagroal_management_write_config_set: unable to apply changes to <max_connections> -> <100>

Clearly the messages may vary depending on the log level configuration.

For more information, please see the official documentation.

PgTraining online webinar on 2023-04-14 (Italian): schedule available!

2023-04-02T00:00:00+00:00

Yet another online event organized by PgTraining!

PgTraining online webinar on 2023-04-14 (Italian): schedule available

PgTraining, the amazing Italian group of people that spreads the word about PostgreSQL and that I joined in recent years, is organizing another online event (webinar) on April 14th, 2023.
The schedule of the event will be as follows:

3 pm: gathering and welcoming the participants;
3:15 pm: Il linguaggio PL/Perl, by yours truly;
4 pm: Il linguaggio PL/Python, by Chris Mair;
4:45 pm: Nuove funzionalità di PostgreSQL 15, by Enrico Pirozzi;
5:30 pm: closing.

The afternoon is free of charge, but registration is required, so hurry up and get your free ticket (seats are limited).

Packt will offer a discount for buying a copy of the book Learn PostgreSQL by Luca Ferrari and Enrico Pirozzi.

So, what are you waiting for? There are no reasons to skip this event!

PgTraining online webinar on 2023-04-14 (Italian)

2023-02-21T00:00:00+00:00

Yet another online event organized by PgTraining!

PgTraining online webinar on 2023-04-14 (Italian)

PgTraining, the amazing Italian group of people that spreads the word about PostgreSQL and that I joined in recent years, is organizing another online event (webinar) on April 14th, 2023.
Following the success of the previous editions, we decided to provide another afternoon full of PostgreSQL talks, in the hope of improving the adoption of this great database.

The event will consist of three hours of talks about PL/Perl, PL/Python and all the news in PostgreSQL 15.
As for the previous editions, the webinar will be presented in Italian. Attendees will be free to actively participate and ask questions both during the talks and at the end of the whole event.

In the pure spirit of PgTraining, the event will be free of charge, but registration is required and the number of available seats is limited, so hurry up and get your free ticket as soon as possible!
The material will be available for free after the event has completed, but no live recording will be available.

Invoking (your own) Perl from PL/Perl

2023-02-06T00:00:00+00:00

A glance at how to invoke Perl code within PL/Perl code.

Invoking (your own) Perl from PL/Perl

Invoking your own Perl code from PL/Perl, how hard can it be?
Well, it turns out that it can be harder than you think. PL/Perl is made to allow Perl to interact with the SQL world. Assume a function named get_primes needs to invoke another Perl function, is_prime, to test whether a number is prime or not.
How is it possible to chain the function invocation?

Invoking PL/Perl from PL/Perl via a query

One obvious possibility is to wrap is_prime into a PL/Perl function. Since a PL/Perl function is, well, an ordinary function, it is always possible to call it the SQL way, as another ordinary function.

CREATE OR REPLACE FUNCTION is_prime( int )
RETURNS bool AS $CODE$
  my ( $n ) = @_;
  for ( 2 .. $n ) {
    last if $_ >= ( $n / 2 ) + 1;
    return 0 if $n % $_ == 0;
  }
  return 1;
$CODE$ LANGUAGE plperl;

CREATE OR REPLACE FUNCTION get_primes( int )
RETURNS SETOF int AS $CODE$
  # start from 2: the loop in is_prime never runs for 1, which would be reported as prime
  for ( 2 .. $_[ 0 ] ) {
    my $result_set = spi_exec_query( "SELECT is_prime( $_ )" );
    return_next( $_ ) if ( $result_set->{ rows }[0]->{ is_prime } eq 't' );
  }
  return undef;
$CODE$ LANGUAGE plperl;

The function get_primes builds a query (SELECT is_prime( $_ );) that is executed several times in order to get the result.
Advantages of this approach are that this is the natural way to query PostgreSQL functions, and therefore it would be possible to mix and match PL/Perl functions with other PL-functions. The main drawback is that this approach is tedious and error prone, since there is the need to build SQL queries. Moreover, handling invocation and argument passing will slow down the execution of the main function.

Using anonymous subroutines

Luckily, Perl allows the definition of a subroutine within another subroutine, and to call it when required. One way to achieve this is by code references.

CREATE OR REPLACE FUNCTION get_prime( int )
RETURNS SETOF int AS $CODE$
  my $is_prime = sub {
	 my ( $n ) = @_;
	 for ( 2 .. $n ) {
	   last if $_ >= ( $n / 2 ) + 1;
	   return 0 if $n % $_ == 0;
	 }
	 return 1;
  };

  for ( 2 .. $_[ 0 ] ) {  # start from 2: $is_prime would report 1 as prime
    return_next( $_ ) if $is_prime->( $_ );
  }

  return undef;
$CODE$ LANGUAGE plperl;

This makes it very easy and natural to invoke Perl code (in this example, the $is_prime function) within PL/Perl. The main advantage of this approach is that it is all written in pure Perl. The main drawback is that the $is_prime function is now private to the scope of the get_prime function, and therefore cannot be reused by other functions.

Injecting a line of code at PL/Perl boot

PostgreSQL provides a set of plperl GUCs that allow you to set different properties of the Perl environment. One possibility is to pre-declare a function so that, once the PL/Perl engine runs, the function is already there. Unluckily, GUCs do not allow a setting to be split across different lines. Luckily, Perl is not Python, so you can write your own code in a single line.

# in postgresql.conf
plperl.on_plperl_init = 'sub is_prime { ...  }'

Therefore, when the PL/Perl engine starts, it gets the is_prime sub defined for free. This means that get_primes can be simply written as:

CREATE OR REPLACE FUNCTION get_primes( int )
RETURNS SETOF int AS $CODE$
   return [ grep { is_prime( $_ ) } ( 2 .. $_[ 0 ] ) ];
$CODE$ LANGUAGE plperl;

The main advantage of this approach is that it is simple. The main drawback is that it is complex, too. Writing code within a single line is not a good habit. Most notably, this makes the code declared in plperl.on_plperl_init available to every instance of the PL/Perl engine, and this is a security risk!

Injecting a module at PL/Perl boot

Following a similar approach, it is possible to place your custom code into a module and make PL/Perl load that module before the execution starts. The first step is to provide a Perl module and place it where PostgreSQL can find it on the filesystem.

# $PGDATA/conf.d/fluca1978.pm
sub is_prime {
	 my ( $n ) = @_;
	 for ( 2 .. $n ) {
	   last if $_ >= ( $n / 2 ) + 1;
	   return 0 if $n % $_ == 0;
	 }
	 return 1;
}

1;

Now, it is possible to load the module either in plperl.on_init or in plperl.on_plperlu_init:

plperl.on_init = 'use lib q{/postgres/15/data/conf.d}; use fluca1978;'

Last, since the module has been loaded for every PL/Perl engine, the get_primes function remains as simple as:

CREATE OR REPLACE FUNCTION get_primes( int )
RETURNS SETOF int AS $CODE$
  return [ grep { is_prime( $_ ) } ( 2 .. $_[ 0 ] ) ];
$CODE$ LANGUAGE plperl;

If plperl.on_init does not include the use statement, the function get_primes has to be defined as a plperlu function that loads the module itself.

The main advantage of this approach is that it provides modularity of available code. The main drawback, as for the previous approach, is that it injects code into every PL/Perl engine that is going to be started.

Using shared code

PL/Perl provides the %_SHARED hash, which is shared among functions running within the same connection and with the same user. This allows for storing an anonymous subroutine in the %_SHARED hash and using it later.

The first step is to initialize the hash with the anonymous subroutine:

CREATE OR REPLACE FUNCTION my_plperl_init()
RETURNS VOID AS $CODE$
  $_SHARED{ is_prime } = sub {
	     my ( $n ) = @_;
	     for ( 2 .. $n ) {
	       last if $_ >= ( $n / 2 ) + 1;
	       return 0 if $n % $_ == 0;
	     }
	     return 1;
  };
$CODE$ LANGUAGE plperl;

Then, it is possible to make get_primes use the shared reference:

CREATE OR REPLACE FUNCTION get_primes( int )
RETURNS SETOF int AS $CODE$
   return [ grep { $_SHARED{ is_prime }->( $_ ) } ( 2 .. $_[ 0 ] ) ];
$CODE$ LANGUAGE plperl;

The main advantage of this approach is that it does not require a full code injection: only sessions that require the shared code will use it. The main drawback is that there is no real sharing of code, it is just temporary code living in the session space. Moreover, this approach requires an initialization phase (my_plperl_init must be called in each session before get_primes), which can be error prone.

Conclusions

PL/Perl being, well, Perl, there’s more than one way to do it!
Depending on the aims and constraints, there are different ways to invoke Perl code from other PL/Perl code. The main considerations, when choosing one approach or another, are related to code reusability and performance. Wrapping Perl into PL/Perl provides better code reusability, but requires more resources and bloats the code. Using a pure Perl approach provides the best performance and code readability, but can open the door to some security risks.

PostgreSQL command line colors!

2023-01-23T00:00:00+00:00

A simple way to make the PostgreSQL command line interface more attractive!

PostgreSQL command line colors!

Did you know that PostgreSQL tools can, under specific circumstances, display colors?
Well, I didn’t know until I came across this section in the documentation that explains it.
There are two different environment variables named PG_COLOR and PG_COLORS respectively. The first (note the singular) decides if the colors have to be activated or not, while the second contains the sequence of colors.
Clearly, colors are related to errors and other messages regarding a tool and not SQL errors!

Let’s see this in action:

As you can see, after setting PG_COLOR to always, both psql and pg_dump show the error with a red color and the message tag with a bold face. You can change the default color behaviour by setting the values in the PG_COLORS environment variable, so for example you can turn the errors to purple:

The PG_COLORS variable is a string that contains the log level (e.g., error) followed by the color code (e.g., 01;31 means bold red). The same color palette that you use in the shell and in printf(1)-like escape sequences can be applied to the PG_COLORS variable. You can even make the text blink:

As far as I understand, the command line tools adapt to the colors through the logging subsystem.
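A minimal shell sketch of the settings discussed above. The color values are ordinary SGR codes (01;35 is bold magenta, a leading 05 adds blinking), and the keys other than error keep what I believe are the documented defaults. The preview_color helper is hypothetical: it only previews an escape sequence in the terminal and is not something the PostgreSQL tools run.

```shell
# Always colorize diagnostics of PostgreSQL client tools (psql, pg_dump, ...).
export PG_COLOR=always

# Each entry is key=SGR-codes, entries are separated by colons.
# Here errors become bold magenta ("purple") instead of bold red.
export PG_COLORS='error=01;35:warning=01;35:note=01;36:locus=01'

# Hypothetical helper: show how a given SGR code renders in this terminal.
preview_color() {
  printf '\033[%sm%s\033[0m\n' "$1" "$2"
}

preview_color '01;35' 'error: '      # bold magenta
preview_color '05;01;31' 'error: '   # blinking bold red, if the terminal supports it
```

After exporting the two variables, any failing invocation of psql or pg_dump in the same shell should show its error tag in the new color.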

Handling NULLs and Empty values in PL/Perl

2023-01-17T00:00:00+00:00

How to correctly detect an SQL NULL value in PL/Perl.

Handling NULLs and Empty values in PL/Perl

Perl has a very simple concept of truth: everything that is a non-empty, non-zero value is true!
It’s that simple!

The problem with PL/Perl, the PostgreSQL procedural language, is that SQL provides NULL values, which are somehow equivalent to Perl undef values. But unlike Perl, in SQL an empty string or a 0 value is not NULL!
This implies that if you pass a NULL value to a PL/Perl function (or piece of code), Perl will see it as undef, and this is good. The problem is that the SQL values 0 and '' (empty string) will be treated by Perl as false values even though they are not NULL.
Luckily, the rule is simple: use Perl’s defined operator to see if a value is NULL in the SQL sense.

Let’s see this with a very trivial code example:

CREATE OR REPLACE FUNCTION
plperl_catch_nulls( int, text )
RETURNS VOID
AS $CODE$
   my $arg = 1;

  for ( @_ ) {
      elog(INFO, "Input argument number $arg [$_] is false (as Perl)" ) if ! $_;
      elog(INFO, "Input argument number $arg [$_] is NULL (as SQL)" ) if ! defined $_;
      elog(INFO, "Input argument number $arg is valid [$_]" ) if defined( $_ ) && $_;
      $arg++;
  }


$CODE$
LANGUAGE plperl;

Let’s try the function with a few different sets of arguments:

testdb=> select plperl_catch_nulls( 19, 'Hello World!' );
INFO:  Input argument number 1 is valid [19]
INFO:  Input argument number 2 is valid [Hello World!]
 plperl_catch_nulls
--------------------

(1 row)

testdb=> select plperl_catch_nulls( 0, '' );
INFO:  Input argument number 1 [0] is false (as Perl)
INFO:  Input argument number 2 [] is false (as Perl)
 plperl_catch_nulls
--------------------

(1 row)

testdb=> select plperl_catch_nulls( NULL, NULL );
INFO:  Input argument number 1 [] is false (as Perl)
INFO:  Input argument number 1 [] is NULL (as SQL)
INFO:  Input argument number 2 [] is false (as Perl)
INFO:  Input argument number 2 [] is NULL (as SQL)
 plperl_catch_nulls
--------------------

(1 row)

As you can see, when the argument is NULL both the false (as Perl) and the NULL (as SQL) branches are triggered, while non-NULL values that Perl considers false (0 and the empty string) trigger only the false (as Perl) branch.

If you like Perl's easy way of thinking, and do not want to go deep into the NULL/defined stuff, you can define your function as STRICT, which makes PostgreSQL skip the function invocation (and return NULL) when at least one argument is NULL.

From Numbers to Words using Perl (and Lingua::)!

2023-01-12T00:00:00+00:00

How to convert a digit into a sentence with the power of Perl.

From Numbers to Words using Perl (and Lingua::)!

A few days ago I came across a question on Facebook regarding the conversion of a number into its English representation. First of all, I hate Facebook with a passion and I strongly encourage people that have questions related to PostgreSQL to use mailing lists and IRC channels.
Despite that, how can we convert a number, let’s say 19, to nineteen?
The first thing that came to my mind was the excellent Lingua:: set of Perl modules. And since Perl is a very well supported language in every vanilla PostgreSQL instance, why not create a simple wrapper in PL/Perl?
So, as trivial as it is, here you can find a simple implementation to translate a number into an English sentence.

Installing `Lingua::EN::Numbers`

Before you can use my wrapper, you need to install the Perl module Lingua::EN::Numbers in your system, so that PostgreSQL can find it. One easy way to achieve this is by means of cpanm:

% cpanm Lingua::EN::Numbers

Once the module is installed, you can install the procedure.

The `num2words` PL/Perl function: a first simple implementation

The function is a simple wrapper around the super powerful num2en function loadable via Lingua::EN::Numbers. Since the PL/Perl function requires an external module, the function has to be defined as plperlu, therefore potentially unsafe.

CREATE OR REPLACE FUNCTION
	num2words(  numeric default 0 )
RETURNS text
STRICT
AS $CODE$
   use Lingua::EN::Numbers qw/ num2en /;

   my ( $number ) = @_;

   num2en( $number );
$CODE$
LANGUAGE plperlu;

The function is defined as STRICT, and therefore it returns NULL on NULL input. The function accepts a numeric input, therefore even a very large number, and passes it to the num2en function, returning the result.
As an example of invocations:

testdb=> select num2words();
 num2words
-----------
 zero
(1 row)
testdb=> select num2words( 19071978 );
                               num2words
------------------------------------------------------------------------
 nineteen million, seventy-one thousand, nine hundred and seventy-eight
(1 row)
testdb=> select num2words( 1.23 );
      num2words
---------------------
 one point two three
(1 row)

The `num2words` function: a multi-language approach

It is possible to improve the function to support multiple languages, for example specifying the language as an argument. One problem is that not all the Lingua::*::Numbers modules behave the same, so it is not simple to dynamically load a module and a function depending on the input argument. For example, in the English module the function to use is num2en, while in the Italian one it is number_to_it, so there is no well established naming pattern.
Here is a multi-language implementation:

CREATE OR REPLACE FUNCTION
	num2words( numeric default 0, text default 'en' )
RETURNS text
STRICT
AS $CODE$
   my ( $number, $language ) = @_;
   $language = 'en' unless( $language );

   if ( $language =~ /^en(glish)?$/i ) {
      use Lingua::EN::Numbers qw/ num2en /;
      num2en( $number );
   }
   elsif( $language =~ /^it(alian)?$/i ) {
     use Lingua::IT::Numbers qw/ number_to_it /;
     number_to_it( $number );
   }
   else {
   	elog( NOTICE, "Unsupported language $language" );
	return undef;
   }
$CODE$
LANGUAGE plperlu;

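The function accepts the locale (two letters) or the full language name (e.g., italian) in a case insensitive manner. Every branch of the if-elsif-else names the appropriate module and then evaluates the appropriate function. Note that use is a compile-time statement in Perl, so both modules are actually loaded when the function is compiled, regardless of the branch taken at run time.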
Clearly, the function needs to access every language specific module, so before you can use the above PL/Perl function you need to install the Perl modules on the machine:

% sudo cpanm Lingua::EN::Numbers
% sudo cpanm Lingua::IT::Numbers

Conclusions

I love the capability to melt Perl into PostgreSQL, giving the best of two worlds in a very quick and smart way!

Using table aliases in UPSERTs

2022-12-07T00:00:00+00:00

A simple way to achieve a “counter-like” behavior using UPSERTs.

Using table aliases in UPSERTs

PostgreSQL has had the UPSERT feature for a while, now somehow overtaken by the MERGE command.
One interesting feature of UPSERT is that it can quickly help you to implement a counter-like approach, but often you need to use table aliasing in order for the feature to be able to accumulate results.
Allow me to explain with a very simple example: we want to count the occurrences of every letter in a bunch of text, and we want the count to accumulate between different calls. One possible solution is to use a table, even a TEMPORARY one, to store the letter and a counter, and then populate such a table.

With an UPSERT the thing reduces to:

testdb=> create temp table letters( l char primary key, c int default 0 ) on commit delete rows;
CREATE TABLE

testdb=> begin;
BEGIN

testdb=*> with flow as ( select l, count(*) as n
               from regexp_split_to_table( 'fhcgfeettrrzz', '' ) l
               group by l )
insert into letters as mem  -- table alias
select l, n
from flow
on conflict( l )
do update set c = mem.c + excluded.c; -- accumulate !
INSERT 0 8

-- repeat with different sources of text

testdb=*> table letters;
 l | c
---+----
 a | 12
 b | 20
 r |  2
 z |  2
 g |  1
 c |  1
 t |  2
 h |  1
 f | 18
 e | 18
(10 rows)

The flow CTE simulates our flow of letters, and provides an instant count of the occurrences. We want such counts to accumulate into the letters table, and we want to use the UPSERT feature to do so. When a conflict is found on the primary key l, the INSERT degenerates into an UPDATE and the c counter must be increased by the value of the excluded tuple counter. Here the problem arises: how can we refer to the current value of the counter? We need a table alias, in our case mem, to refer to the current tuple.
If we omit the mem qualifier from the update statement, the query will refuse to work because it is ambiguous which tuple's column we are referring to:

testdb=*> with flow as ( select l, count(*) as n
               from regexp_split_to_table( 'fhcgfeettrrzz', '' ) l
               group by l )
insert into letters as mem
select l, n
from flow
on conflict( l )
do update set c = c + excluded.c
;
ERROR:  column reference "c" is ambiguous
LINE 8: do update set c = c + excluded.c

Note that the column name on the left of the assignment does not need to be disambiguated, since the target of SET can only be a column of the row being updated in the letters table.

pgagroal: getting run-time configuration

2022-11-30T00:00:00+00:00

A new command to interactively get the pgagroal runtime configuration.

pgagroal: getting run-time configuration

pgagroal, the fast connection pooler for PostgreSQL, is gaining new features! In this commit I introduced a new command for pgagroal-cli: config-get. Such command allows the user to specify the name of a configuration parameter and get back the value the pooler is using. As an example:

% pgagroal-cli config-get max_connections
300

% pgagroal-cli config-get max_connections --verbose
max_connections = 300

When the command is invoked with the --verbose flag, the application responds with a full configuration line that can then be copied and pasted into a new configuration file.

INI sections

The config-get command also allows for the specification of sections. For example, if your pgagroal.conf configuration file looks like the following:

% cat pgagroal.conf
[pgagroal]
...

[venkman]
host = 192.168.2.2
port = 5432
primary = off

it is possible to query the command with the section name, and the application will dig into the INI file section:

% pgagroal-cli config-get venkman.host
192.168.2.2

Thanks to this, it is possible to get all the main configuration out of pgagroal-cli.

Limit and HBA entries

Similarly to the sections, the config-get command allows the specification of parameters to search for a limit entry or an HBA entry. In such case, the key to search for is the username to match for an HBA entry and the database to match for a limit entry. The special key prefix limit or hba allows the command to understand where to dig. As an example:

% pgagroal-cli config-get limit.testdb.max_size
10

% pgagroal-cli config-get hba.luca.method
md5

See the pgagroal-cli documentation for more examples and details.


PostgreSQL scary settings: data_sync_retry

2022-11-24T00:00:00+00:00

A look at how this setting works.

PostgreSQL scary settings: data_sync_retry

Is this a scary setting? Well, of course not!
However, it is a setting that you should not touch unless you are really, really aware of what you are doing.

data_sync_retry is a setting that instructs the cluster to retry after an fsync failure related to data pages. What does that mean? As we all know, PostgreSQL has to flush data, sooner or later, from memory to the data files, and this happens with fsync(2), an operating system call that forces the data to be flushed from memory to the filesystem layer and, hopefully, to the disk into the data files.
Clearly, PostgreSQL cannot allow any data loss, even of one bit, and therefore the system takes great care about what happens when flushing data. In normal circumstances, if fsync(2) fails, data has not been written to disk and there is nothing PostgreSQL can do about it. Since this is a huge problem, PostgreSQL issues a PANIC and crashes. That’s not so good, but it is safe after all, since it means we are going to recover from the WALs and therefore we are not going to lose any data.
If we force a retry, by setting data_sync_retry to on, PostgreSQL will not crash, so that later on the flush-to-disk can be retried (hence the name of this setting).

So, why should you not enable such a great setting that promises to avoid crashes even when fsync(2) fails?
The problem is that, after an fsync(2) failure, the status of the kernel page cache is unknown. This means that the page, which was moved to the page cache (i.e., the filesystem layer) to be flushed, could have been removed from the kernel cache even if it did not hit the disk (because fsync(2) was responsible for this, and it failed). In such a situation, PostgreSQL could have discarded the dirty page from the shared buffers, the kernel could have thrown away the page, and the page never hit the disk. At this point, a retry happens, and the operating system reports a success, making things even worse! And here’s where the data loss happens!

How does PostgreSQL handle the above?

If you take a look at the code, in particular at the function data_sync_elevel:

int
data_sync_elevel(int elevel)
{
	return data_sync_retry ? elevel : PANIC;
}

you will see that, unless the data_sync_retry option is enabled, the function returns PANIC. This function is used whenever an fsync(2)-like call fails, for example:

rc = sync_file_range(fd, offset, nbytes,
					 SYNC_FILE_RANGE_WRITE);
if (rc != 0)
{
	int			elevel;

    ...
    elevel = data_sync_elevel(WARNING);

	ereport(elevel,
			(errcode_for_file_access(),
			 errmsg("could not flush dirty data: %m")));
}

For short: when there is an error in data syncing, PostgreSQL has to decide at which level to report the problem. Here the system chooses, through data_sync_elevel, between WARNING (in which case there will be another try) and PANIC (the default).

Conclusions

Similarly to what happens for fsync, which you should never change and always keep set to on, the data_sync_retry setting must never be touched and should always retain its default value off. Turning on this parameter could result in data loss, and surely does not provide you any performance benefit.
There is an interesting thread to read about this that provides useful concepts and explanations.

Emacs(client) as editor in psql

2022-11-16T00:00:00+00:00

At last I found a way to use Emacs (as a client) within psql!

Emacs(client) as editor in psql

psql is an amazing interactive SQL swiss-army knife terminal thingy that can really turn your day! Quite frankly, in my professional training activity, I always tell participants to learn to use psql for several reasons, and I also ask them if any of their interactive SQL terminals has the same set of features, without ever getting an answer back!

One nice thing that psql provides, is the capability to edit a complex query directly within your editor of choice.

My editor of choice is Emacs!

Starting Emacs every time I have to edit a query buffer (via \e command) is awkward: even on recent hardware, Emacs startup is slow. Thankfully, there is a trick: Emacs embeds a client-server approach. And it is a real client-server approach.

The idea is to have Emacs started as a daemon, and every time you need a new editor frame (ehm, every time you need to edit something), you can invoke the special command emacsclient, asking it to attach to the running daemon. As a result, while the daemon startup is as slow as that of a normal emacs instance, the emacsclient editing session is blazingly fast!

So far so good! Ehm, no, well, not for me. I had a lot of trouble trying to configure psql to use emacsclient as an editor. And, shame on me, I had trouble because I was using the wrong set of flags to launch emacsclient! To make things worse: I’m using ZSH as my default shell, and for some strange reason I need to investigate, the shell does not work really well with commands and flags within the same environment variable.

TL;DR - How to do that?

There are two ways to change the default editor that psql is going to use to edit a query buffer:

set the well known EDITOR environment variable;
set the psql specific PSQL_EDITOR environment variable.

Since I’m a whole-Emacs kind of guy, I decided for the first, so that whenever I need to use $EDITOR, I will be dropped into my comfortable Emacs environment.

Therefore, in order to achieve this, after some research on the flags, simply put the following in your .zshrc configuration file:

export EDITOR="emacsclient -t"

or do the same with PSQL_EDITOR.

Then, when you are in the psql application, hit \e and watch Emacs quickly appear. The very first time, when you come back to your psql session, you will notice a few lines of warnings coming out from Emacs itself:

testdb=> \e
emacsclient: can't find socket; have you started the server?
emacsclient: To start the server in Emacs, type "M-x server-start".
Starting Emacs daemon.
Emacs daemon should have started, trying to connect again
testdb=>

That’s because, the very first time, emacsclient does not find any emacs instance running as a daemon, so it starts one by itself, waits a moment, and connects back to the server. This is clearly explained in the messages, which are therefore just warnings.

What about ZSH (and Bash)?

One of the reasons it took me so long to understand how to configure emacsclient was that, for some reason I don’t know (yet), ZSH behaves nastily when you try to launch a command stored in an environment variable, assuming such a variable contains spaces. Why? Because in order to set the variable with spaces, you have to quote your command line, and at that point ZSH assumes the whole string is a single command:

% echo $EDITOR
emacsclient -t

% $EDITOR
zsh: command not found: emacsclient -t

The same does not happen in Bash, that behaves as expected:

% bash
$ echo $EDITOR
emacsclient -t
$ $EDITOR
...

So, I’ve also found a place where Bash seems to behave more intuitively than ZSH does (and no, I’m not going to switch back to Bash for this reason!).
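A likely explanation is word splitting: POSIX shells such as Bash split the result of an unquoted variable expansion on whitespace, while ZSH does not do so by default (its SH_WORD_SPLIT option is off). The sketch below shows the difference in terms of argument counts; count_args and EDITOR_CMD are hypothetical names used only for illustration.

```shell
# Hypothetical helper: report how many arguments it received.
count_args() {
  echo "$#"
}

EDITOR_CMD='emacsclient -t'

# Unquoted expansion: Bash splits the value on whitespace, so the command
# name and its flag arrive as two separate words.
count_args $EDITOR_CMD      # prints 2 in Bash

# Quoted expansion: a single word. ZSH treats the unquoted form like this
# one, hence "command not found: emacsclient -t".
count_args "$EDITOR_CMD"    # prints 1

# In ZSH the splitting can be requested explicitly with the = flag:
#   ${=EDITOR}
# while this form behaves the same in both shells:
#   eval "$EDITOR"
```

This only matters when you launch $EDITOR by hand from the ZSH prompt; the psql session shown above started emacsclient correctly from the same variable.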

Colors and themes!

I tend to use a dark color scheme in all my activities, including the terminals I use to SSH into machines. In these circumstances, Emacs has a very poor default color choice. Assuming you don’t have a specific Emacs startup configuration (and you should, believe me!), you can add a line like the following to one of your startup files (e.g., ~/.emacs or ~/.emacs.d/init.el):

(load-theme 'tango-dark)

This will make your Emacs experience on dark terminals a lot better!

Please consider that changes will not be reflected in any running daemon, so you have to restart Emacs (or re-evaluate the startup files) once you have changed them!

Conclusions

Emacs, thanks to its client-server component, can really improve your already awesome experience within psql! For instance, you can use external tools to reformat your queries while you are writing them in the editor.

For more documentation on how to customize emacsclient see the official documentation.

PostgreSQL 15: logging in JSON

2022-10-21T00:00:00+00:00

PostgreSQL 15 has now the capability to output logs in JSON format!

PostgreSQL 15: logging in JSON

The freshly released PostgreSQL 15 introduces a lot of new features and improvements, but one, in my opinion, is going to change the way our favourite database is monitored: the capability to log daemon status in JSON.

Essentially, the log_destination configuration parameter now has another enumerated value: jsonlog. When this value is added to log_destination, PostgreSQL will start to emit JSON structured logs. Here is a simple configuration example:

% grep log_destination /postgres/15/data/postgresql.conf
log_destination = 'stderr,jsonlog'

and this is how the log directory appears right after the configuration has been reloaded:

% sudo -u postgres ls /postgres/15/data/log -1
postgresql-Fri.json
postgresql-Fri.log

Clearly the logs will contain the same values, but in different formats.

There is more: when there’s more than one value set in log_destination, PostgreSQL will store a file named current_logfiles, where each line will represent the format and the current logfile where PostgreSQL has to store the data:

% sudo -u postgres cat /postgres/15/data/current_logfiles
stderr log/postgresql-Fri.log
jsonlog log/postgresql-Fri.json

In this way, not only PostgreSQL, but even the sysadmin can keep track of where the system is going to log right now, and this is useful especially when there’s a log rotation in place.
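As a minimal sketch of how a sysadmin script could use that file, the following shell function looks up the current log file for a given format. The current_log_for helper is hypothetical, and the sample content is copied from the listing above into a scratch file so that the sketch does not need a running cluster; it also assumes that log paths do not contain spaces.

```shell
# Hypothetical helper: print the log file currently used for a given format,
# reading a file laid out like current_logfiles ("<format> <path>" per line).
current_log_for() {
  awk -v fmt="$1" '$1 == fmt { print $2 }' "$2"
}

# Sample content taken from the example above, written to a scratch file.
sample=$(mktemp)
printf '%s\n' 'stderr log/postgresql-Fri.log' 'jsonlog log/postgresql-Fri.json' > "$sample"

json_log=$(current_log_for jsonlog "$sample")
echo "$json_log"    # log/postgresql-Fri.json

rm -f "$sample"
```

On a real cluster the second argument would be the current_logfiles file inside the data directory, and the printed path is relative to that directory.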

On the SQL side, the function pg_current_logfile() can optionally accept the log format (the same specified in log_destination) and provide the current log file depending on the chosen format:

testdb=# select pg_current_logfile();
   pg_current_logfile
------------------------
 log/postgresql-Fri.log

testdb=# select pg_current_logfile( 'jsonlog' );
   pg_current_logfile
-------------------------
 log/postgresql-Fri.json

testdb=# select pg_current_logfile( 'stderr' );
   pg_current_logfile
------------------------
 log/postgresql-Fri.log

I suspect we will see more and more log crunching applications switch over to the new JSON log format!

PostgreSQL ASCII numeric operators

2022-10-10T00:00:00+00:00

PostgreSQL has some special ways to provide numeric operators by means of ASCII chars.

PostgreSQL ASCII numeric operators

PostgreSQL has some ASCII representations of commonly used numeric operators. They may not be well known, since I suspect pretty much everyone is using the equivalent functions, and moreover it is not so simple to find them in the documentation by means of a search.

In any case, here they are:

|/ is the same as sqrt;
||/ is the same as cbrt;
@ is the same as abs.

And of course, it is quite easy to test such operators in action:

testdb=> SELECT
            sqrt( 81 ) as sqrt,
            |/ 81 as root,
            cbrt( 1000 ) as cube_root,
            ||/ 1000 as root3,
            abs( -19 ) as abs,
            @ -19 as absolute;

-[ RECORD 1 ]-
sqrt      | 9
root      | 9
cube_root | 10
root3     | 10
abs       | 19
absolute  | 19

Quite frankly, I believe the function operators are more readable, in particular since I’ve never seen (yet) such operators in other programming languages.

pgenv 1.3.2 is out!

2022-09-20T00:00:00+00:00

A new release of the PostgreSQL virtual environment manager.

`pgenv` 1.3.2 is out!

Today we released version 1.3.2 of pgenv, the binary manager for PostgreSQL.
This release fixes a quite subtle bug in the handling of the configuration that prevented custom settings from being correctly loaded back into the running system. Users are encouraged to upgrade as soon as possible.

A description of the problem

17bob17 noticed the issue: when you edited your configuration file, either the default or a per-version one, and changed settings in a (Bash) array, the configuration was not correctly loaded.
It took a lot of time to figure out that the problem was not directly in the way the configuration was loaded, but rather in the way the configuration was stored.
When pgenv started to handle configuration settings as arrays, it began using declare -p as a way to print out a Bash compatible representation of the array, and such representation was stored in the configuration file. The problem was that declare -p assumes you want to use declare again when you re-evaluate the variable (array), and so it emits a declare -a statement as its output.
The configuration is then loaded within the pgenv_configuration_load function, and declare run inside a function has the same effect as local, that is, it scopes the variables to that function. Therefore, as soon as pgenv_configuration_load ends its job, the function-scoped variables are gone and the previous ones are kept with their default values. It is a boring shadowing problem due to inner contexts.
One possible solution could have been to pass the -g flag to declare, to force the variable to be global and therefore not scoped to the function, but this flag is not available in every Bash version and implementation.
The -x flag to declare, which exports the variable, did not have any effect either.

Therefore, the current release removes the use of declare altogether when the configuration is sourced back (loaded).
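The scoping problem can be reproduced with a few lines of Bash. This is a simplified sketch and not the actual pgenv code: SETTINGS, load_with_declare and load_plain are hypothetical names standing in for a configuration array and for the loading function.

```shell
#!/usr/bin/env bash
# Hypothetical setting, standing in for one of the pgenv configuration arrays.
SETTINGS=( 'default' )

# A `declare -a ...` line evaluated inside a function creates a
# function-local array that shadows the global one.
load_with_declare() {
  declare -a SETTINGS=( 'custom' )
}

# A plain assignment, with no `declare`, updates the global array.
load_plain() {
  SETTINGS=( 'custom' )
}

load_with_declare
after_declare="${SETTINGS[0]}"
echo "after declare: $after_declare"          # still "default"

load_plain
after_plain="${SETTINGS[0]}"
echo "after plain assignment: $after_plain"   # now "custom"
```

The first call leaves the global value untouched, which is the behavior users observed; the second mirrors the idea behind the fix, that is, loading the values without declare.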

pgagroal 1.5.0 released!

2022-09-15T00:00:00+00:00

A new release of the pgagroal connection pooler.

`pgagroal` 1.5.0 released!

[pgagroal](https://agroal.github.io/pgagroal/) is a fast connection pooler for PostgreSQL, written in the C language.
A couple of weeks ago a new version, 1.5.0, was released. I’m writing about this just now because I was on holiday!
The new release brings a new set of features, in particular a lot of small checks within the configuration file setup (e.g., avoiding duplicated servers or wrong parameters) and a lot of new logging capabilities, including log rotation and log line prefix.
Other areas of improvements include code clean-up, shell completion for command line tools, and portability towards FreeBSD and OpenBSD systems.
Last but not least, a new set of tutorials will help the newcomers to correctly start using pgagroal!

Shell completions for pgagroal

2022-08-19T00:00:00+00:00

A small patch to ease the use of pgagroal tools.

Shell completions for `pgagroal`

In the beginning of the current month I pushed a commit that introduces shell completions for pgagroal related commands, in particular pgagroal-cli (used to manage the pooler) and pgagroal-admin (used to manage authentication and users).
The shell completions work only for Bash and Zsh, and allow you to hit <TAB> after a command and get it automatically completed with the appropriate options.
While importing the completions in Bash is as simple as sourcing the file, in Zsh you need to enable the completion framework. Detailed instructions about how to enable the completions have been placed in the tutorials.

PostgreSQL 15: changes in the public schema permissions

2022-07-15T00:00:00+00:00

The upcoming new release of PostgreSQL makes some changes to the public schema permissions.

PostgreSQL 15: changes in the `public` schema permissions

In PostgreSQL 15 the default public schema that every database has will come with a different set of permissions. Before PostgreSQL 15, every user could create objects in the public schema of a database, even one they did not own. Starting with the upcoming version, only the database owner is granted full access to the public schema, while other users need an explicit GRANT.

Imagine the user luca is the owner of the database testdb: he can do whatever he wants in that database.

testdb=> SHOW server_version;
 server_version
----------------
 15beta2
(1 row)

testdb=> SELECT current_role, current_user;
 current_role | current_user
--------------+--------------
 luca         | luca
(1 row)

testdb=> CREATE TABLE mytable( t text );
CREATE TABLE

On the other hand, another user, let’s say pgbench, cannot:

testdb=> SELECT current_role, current_user;
 current_role | current_user
--------------+--------------
 pgbench      | pgbench
(1 row)

testdb=> CREATE TABLE mytable2( t text );
ERROR:  permission denied for schema public
LINE 1: CREATE TABLE mytable2( t text );

testdb=> select * from mytable;
ERROR:  permission denied for table mytable

That means that public is now managed like a user defined schema, and therefore an explicit GRANT must be executed in order to allow other users to operate on it.
What has changed is that the CREATE permission on the public schema is no longer granted to everyone, while USAGE is as before. Therefore, in order to allow non-owners to create objects, an explicit GRANT CREATE ON SCHEMA public TO pgbench statement must be executed.
This affects newly created databases, not those restored from previous backups.
But there is a trick that brings back the previous behavior: if you set the permissions on template1 (or on another template database), every new database created from it gets them for free:

template1=# GRANT CREATE ON SCHEMA public TO PUBLIC;
GRANT

template1=# CREATE DATABASE newdb WITH OWNER luca;
CREATE DATABASE

And now, connecting as a non-owner user:

% psql -U pgbench -h localhost newdb

newdb=> create table foo( i int );
CREATE TABLE

the permissions are the same as in previous PostgreSQL versions.
It is not clear whether the above trick will still work once PostgreSQL 15 exits the beta status; in any case, I discourage you from adopting it. Revoking privileges on the `public` schema by default could be annoying, but it is a good choice in terms of security and forces you to decide how to deal with permissions.

PostgreSQL 15: changes in the low level backup functions

2022-07-13T00:00:00+00:00

The upcoming new release of PostgreSQL makes some changes to the low level backup functions.

PostgreSQL 15: changes in the low level backup functions

The upcoming PostgreSQL 15 release makes a few changes to the low level backup functions.
Nowadays I suspect nobody, except backup solution developers, knows or uses such functions, but I clearly remember when we developed our own scripts to do a continuous backup using functions like pg_start_backup() and pg_stop_backup().
You should use other backup solutions today, like the great pgBackRest.
In any case, what are the changes?
As you can read in [the release notes](https://www.postgresql.org/docs/15/release-15.html){:target="_blank"}, there are mainly two:

functions have been renamed to a more consistent naming scheme.
a few functions and modes have been removed.

The functions are now named with the pg_backup_ prefix, so pg_start_backup() becomes pg_backup_start() and, similarly, pg_stop_backup() becomes pg_backup_stop(). Quite frankly I like this decision: it makes the names simpler to search for and to remember.
Moreover, the exclusive backup mode, deprecated since version 9.6 (if I remember correctly), is gone. Back in the day this was the only way to perform a low level backup, but it has been deprecated for a long time. One of the problems with exclusive backups is that the system creates a label file that prevents the primary from restarting after a crash, and in turn this led people to delete the label file on standby servers too. This is no longer a problem, and the pg_backup_start() and pg_backup_stop() functions no longer accept the exclusive backup parameter.
As a consequence of this choice, the functions pg_is_in_backup() and pg_backup_start_time() have been removed, because they were meaningful only for exclusive backups, which do not exist anymore.

A new pgenv release

2022-06-27T00:00:00+00:00

A new release of the PostgreSQL virtual environment manager.

A new `pgenv` release

Today we released version 1.3.1 of pgenv, the binary manager for PostgreSQL.
This release fixes an annoying bug on Mac OS X (Bash) that was preventing pgenv from working properly on that platform. The bug was introduced by a commit of mine that fixed another bug about wrong configuration reload.
Unluckily, the bug went unnoticed because I don’t have access to a Mac OS X machine, and I was quite sure Bash was much more portable than what we discovered!

Anyway, sorry for the bug, and please update your version of pgenv and enjoy it!

Ordinality in function queries

2022-06-23T00:00:00+00:00

A trick about queries that involve functions.

Ordinality in function queries

The PostgreSQL SELECT statement allows you to query functions that return a result set (either a SETOF or a TABLE), using them as the source of tuples for the query itself.
There is nothing surprising about that!
However, the SELECT statement, when invoked against a function that provides a result set, allows an extra clause to appear: [WITH ORDINALITY](https://www.postgresql.org/docs/14/sql-select.html){:target="_blank"}. This clause adds a column of type bigint to the result set, holding the position of each tuple as it was returned by the function.

Why is this important? Because you don’t need your function to provide a tuple counter by itself.

`WITH ORDINALITY` in action

Let’s take a simple example to understand how it works. Let’s create a function that returns a table:

CREATE OR REPLACE FUNCTION animals( l int DEFAULT 5,
                                    animal text DEFAULT 'cat',
                                    owner text DEFAULT 'nobody' )
RETURNS TABLE( pk int, description text, mood text )
AS $CODE$
DECLARE
        i int := 0;
        j int := 0;
BEGIN
        FOR i IN 1 .. l LOOP
            pk          := i;
            description := format( '%s #%s owned by %s', animal, i, owner );
            j           := random() * 100;
            IF j % 2 = 0 THEN
               mood     := 'good';
            ELSE
              mood      := 'bad';
            END IF;

            RAISE DEBUG 'Generating % # % with mood %', animal, i, mood;
            RETURN NEXT;
        END LOOP;
RETURN;

END
$CODE$
LANGUAGE plpgsql;

The above function animals() produces, for each tuple, the index of the generated tuple (i.e., a counter), a simple numbered description of the animal, and a randomly selected mood.
It is easy to test it out:

testdb=> SELECT * FROM animals();
 pk |      description       | mood
----+------------------------+------
  1 | cat #1 owned by nobody | bad
  2 | cat #2 owned by nobody | good
  3 | cat #3 owned by nobody | bad
  4 | cat #4 owned by nobody | bad
  5 | cat #5 owned by nobody | bad
(5 rows)

The pk column contains the index of the generated tuple, so we know that the cat #1 tuple has been generated first, cat #2 second, and so on.
Let’s bring WITH ORDINALITY in:

testdb=> SELECT * FROM animals() WITH ORDINALITY;
 pk |      description       | mood | ordinality
----+------------------------+------+------------
  1 | cat #1 owned by nobody | good |          1
  2 | cat #2 owned by nobody | bad  |          2
  3 | cat #3 owned by nobody | good |          3
  4 | cat #4 owned by nobody | good |          4
  5 | cat #5 owned by nobody | good |          5

The WITH ORDINALITY clause must follow the function it applies to. The clause appends a new column to the result set, named ordinality by default, containing a progressive counter. Note how pk and ordinality contain the very same value: WITH ORDINALITY keeps track for you of the tuples produced by the result set stream (the function), so you don’t need to compute the counter by yourself.
Clearly, this also works when the tuples are reordered, because the clause does not number the tuples as they appear in the output, but rather in the sequence in which they have been added to the result set:

testdb=> SELECT * FROM animals() WITH ORDINALITY
         ORDER BY random();
 pk |      description       | mood | ordinality
----+------------------------+------+------------
  4 | cat #4 owned by nobody | good |          4
  2 | cat #2 owned by nobody | good |          2
  3 | cat #3 owned by nobody | good |          3
  5 | cat #5 owned by nobody | good |          5
  1 | cat #1 owned by nobody | good |          1
(5 rows)

It is also possible to rename the ordinality column with an alias, like the following:

testdb=> SELECT * FROM animals() WITH ORDINALITY
                       AS cat(i, name, mood, n)
                       ORDER BY random();
 i |          name          | mood | n
---+------------------------+------+---
 4 | cat #4 owned by nobody | good | 4
 1 | cat #1 owned by nobody | good | 1
 2 | cat #2 owned by nobody | bad  | 2
 5 | cat #5 owned by nobody | bad  | 5
 3 | cat #3 owned by nobody | good | 3
(5 rows)

Clearly, you have to alias the whole result set, not a single column!

`WITH ORDINALITY` as a filtering condition

Having the automatically named ordinality column, or a custom chosen named column, it is possible to add such column to the WHERE clause of a query:

testdb=> SELECT * FROM animals() WITH ORDINALITY
                       AS cat(i, name, mood, n)
                       WHERE n % 2 = 0
                       ORDER BY random();
 i |          name          | mood | n
---+------------------------+------+---
 4 | cat #4 owned by nobody | bad  | 4
 2 | cat #2 owned by nobody | bad  | 2
(2 rows)

As you can see, the above query filters on the n column to keep only the even-numbered tuples.

`WITH ORDINALITY` vs `row_number()`

You may think that the window function [row_number()](https://www.postgresql.org/docs/14/functions-window.html){:target="_blank"} does the same job as WITH ORDINALITY, at least in the function call scenario. However, the row_number() window function is a different beast: it numbers the rows within a window defined over the result set, following the ordering of that window. In short, window functions cover a different set of problems!
Therefore, even if the following seems to produce the very same result:

testdb=> SELECT *, row_number() OVER () FROM animals() WITH ORDINALITY;
 pk |      description       | mood | ordinality | row_number
----+------------------------+------+------------+------------
  1 | cat #1 owned by nobody | good |          1 |          1
  2 | cat #2 owned by nobody | bad  |          2 |          2
  3 | cat #3 owned by nobody | bad  |          3 |          3
  4 | cat #4 owned by nobody | good |          4 |          4
  5 | cat #5 owned by nobody | bad  |          5 |          5
(5 rows)

as soon as you define the window in a more specific way, for example by giving it an ordering, you see different results:

testdb=> SELECT *, row_number() OVER ( order by pk desc ) FROM animals() WITH ORDINALITY;
 pk |      description       | mood | ordinality | row_number
----+------------------------+------+------------+------------
  5 | cat #5 owned by nobody | bad  |          5 |          1
  4 | cat #4 owned by nobody | bad  |          4 |          2
  3 | cat #3 owned by nobody | bad  |          3 |          3
  2 | cat #2 owned by nobody | good |          2 |          4
  1 | cat #1 owned by nobody | good |          1 |          5

In the above, you can see that the last row produced by the function (the one with ordinality set to 5) is the first row encountered by row_number().
Another example of different results can be quickly obtained when joining:

testdb=> SELECT *, row_number() OVER ()
         FROM animals() WITH ORDINALITY,
         generate_series(1, 3) WITH ORDINALITY as x(gs, counter);
 pk |      description       | mood | ordinality | gs | counter | row_number
----+------------------------+------+------------+----+---------+------------
  1 | cat #1 owned by nobody | bad  |          1 |  1 |       1 |          1
  2 | cat #2 owned by nobody | good |          2 |  1 |       1 |          2
  3 | cat #3 owned by nobody | good |          3 |  1 |       1 |          3
  4 | cat #4 owned by nobody | good |          4 |  1 |       1 |          4
  5 | cat #5 owned by nobody | good |          5 |  1 |       1 |          5
  1 | cat #1 owned by nobody | bad  |          1 |  2 |       2 |          6
  2 | cat #2 owned by nobody | good |          2 |  2 |       2 |          7
  3 | cat #3 owned by nobody | good |          3 |  2 |       2 |          8
  4 | cat #4 owned by nobody | good |          4 |  2 |       2 |          9
  5 | cat #5 owned by nobody | good |          5 |  2 |       2 |         10
  1 | cat #1 owned by nobody | bad  |          1 |  3 |       3 |         11
  2 | cat #2 owned by nobody | good |          2 |  3 |       3 |         12
  3 | cat #3 owned by nobody | good |          3 |  3 |       3 |         13
  4 | cat #4 owned by nobody | good |          4 |  3 |       3 |         14
  5 | cat #5 owned by nobody | good |          5 |  3 |       3 |         15
(15 rows)

For every generate_series() tuple (column counter) there are five animals() tuples (column ordinality), each one progressively tracked by row_number().

Conclusions

Why is this ordinality thing important?
You may be tempted to include in your function result sets some extra information that eases the post-processing of the result set itself. This practice should be avoided when the “external world” (i.e., the query using the function) is able to add such extra information by itself. You will not only avoid wasting resources, but also keep your code cleaner and more readable.

An introduction to pgagroal (Italian)

2022-06-10T00:00:00+00:00

The video recording about my pgagroal talk.

An introduction to pgagroal (Italian)

At the past PgTraining online event we had a set of amazing talks. I gave an introduction to pgagroal, a very interesting connection pooler for PostgreSQL.

Here is the recording of my talk (in Italian):

and here are the slides (in Italian).

And don’t forget to glance at all the other online material shared about the online event!

pgenv `switch`

2022-05-11T00:00:00+00:00

pgenv 1.3.0 adds a new command: switch

`pgenv` `switch`

pgenv, a simple but great shell script that helps manage several PostgreSQL instances on your machine, has been improved in the last few days.

Thanks to the contribution of Nils Dijk @thanodnl on GitHub, there is now a new command named switch that allows you to quickly prepare the whole environment for a different PostgreSQL version without having to start it.

The problem, as described in this pull request, was that the use command, trying to be smart, starts a PostgreSQL instance once it has been chosen. On the other hand, switch allows you to pre-select the PostgreSQL instance to use without starting it. This is handy, for example, when you want to compile some code against a particular version of PostgreSQL (managed by pgenv) but don’t want to waste your computer resources starting up PostgreSQL.
To some extent, switch can be thought of as an efficient equivalent of:

% pgenv use 14.2
% pgenv stop

The command has been implemented as a subcase of use, but while use does fire up an instance, switch does not.
However, in the case an instance is already running, switching to a new instance will stop the previously running one!

Other minor contributions

If you have pgenv on your radar, you have probably seen another release in the last few days, which fixed a bug spotted by Nils Dijk in the management of the configuration.

Conclusions

pgenv keeps growing and adding new features, and is becoming a more complex beast than it was in the beginning. Hopefully, it can help your workflow too!

Don't forget the PgTraining online webinar on 2022-04-29 (Italian)

2022-04-20T00:00:00+00:00

Yet another online event organized by PgTraining!

Don’t forget the PgTraining online webinar on 2022-04-29 (Italian)

There are still some seats available for another great online event brought to you by PgTraining!

Don’t forget to get your free of charge access to the online event, which will be held in Italian on April 29th: hurry up and get your free ticket.

Formatting SQL code with pgFormatter within Emacs

2022-04-13T00:00:00+00:00

Editing SQL and PostgreSQL related code within Emacs, in a beautiful way!

Formatting SQL code with pgFormatter within Emacs

pgFormatter is a great Perl 5 tool that parses SQL input and reformats it in a beautiful way.
Despite the name, it works with any piece of SQL code, since it supports the standards from SQL-92 to SQL-2011, plus all the little keywords and details that are specific to PostgreSQL.

Being an Emacs addict myself, I reasoned about how to “plug in” pgFormatter into Emacs, and I came up with a short and ugly snippet of code that does the trick.
But, Emacs being what it is, there is no particular need to plug in such code, as I will show you in a moment.

Use `pgFormatter` from Emacs, the portable way

Emacs allows users to run a shell command over a region or a buffer content. The M-| key (mnemonic: pipe) does that. With the universal prefix (C-u) it can also replace the region or buffer you are running the command against.
This means that, given your own Emacs instance, you can format the code within the region by simply doing

C-u M-| pg_format

where pg_format is the name of the executable of pgFormatter (e.g., it is called like that in Rocky Linux).

A more lispy approach

I developed a simple and ugly snippet of Lisp that can be loaded into Emacs to make the pgFormatter usage quicker.

(defun pgformatter-on-region ()
  "A function to invoke pgFormatter as an external program."
  (interactive)
  (let ((b (if mark-active (min (point) (mark)) (point-min)))
        (e (if mark-active (max (point) (mark)) (point-max)))
        (pgfrm "/usr/bin/pg_format" ) )
    (shell-command-on-region b e pgfrm (current-buffer) 1)) )

The above piece of code defines an interactive function (i.e., a function that can be invoked with M-x) named pgformatter-on-region. The function defines three variables:

b is the beginning of the region to format;
e is the end of the region to format;
pgfrm is the path to the executable to invoke.

If there is an active region (i.e., mark-active) the function operates over the region; otherwise it operates on the whole buffer.
In the end, the function invokes the shell command pgfrm using the internal interactive function shell-command-on-region over the current buffer. The last argument, 1, indicates that I want to replace the content of the current region (or buffer) with the command output.

In order to execute the formatting, I then just need to M-x pgformatter-on-region with either an active region or not. It is also possible to bind the function to a keyboard sequence with something like:

(global-set-key (kbd "C-i") 'pgformatter-on-region)

or a local key map entry.

The ending result is something like the following:

pgbadger incremental mode via SSH

2022-04-06T00:00:00+00:00

How great is pgbadger?

pgbadger incremental mode via SSH

pgbadger is a great tool, and quite frankly I suggest that everyone I talk to about PostgreSQL install it!
Why?
It is cheap and does its job in analyzing logs and providing you insights about what happened in your cluster.

A few days ago I caught a behavior that looked strange to me. pgbadger has a very handy incremental mode that allows you to keep it running, processing new logs every day (or whatever period you choose), and get historical and up-to-date insights. Well, when downloading a file over an SSH connection, this incremental behavior was not working.
Uhm, I was sure it was working, since I use it quite often, but I was unable to understand what I was missing in the configuration of pgbadger. After a few experiments and comparisons with other working systems of mine, I found that the -r (remote) flag was able to work over SSH, while a “simple” URI like ssh://me@you//var/postgresql/logs was not.

I reported the issue, and in less than a week the problem was fixed!
Well, this is unfair: it is true that the final commit came a week after the initial issue, but a commit aimed at fixing the problem appeared after only 48 hours; the rest of the time was spent exchanging tests and their results across time zones.

This is something you simply don’t get in your commercial ecosystem!

Thanks for the great work and keep this useful project going!

pgagroal log rotation and formatting

2022-04-04T00:00:00+00:00

My small contributions to pgagroal.

`pgagroal` log rotation and formatting

A few weeks ago I implemented a small contribution to [pgagroal](https://agroal.github.io/pgagroal/){:target="_blank"}, the high-performance PostgreSQL connection pooler, in order to implement log rotation and log formatting.
In the end, my contribution was accepted and merged, but I did not find the time to write about it until now.

The issues

As you can read in the issues about log rotation and log formatting, pgagroal was born with very minimal support for logging. With “minimal” I mean that the log file name was not strftime(3) compatible, therefore no placeholders were available in the log filename, and at the same time there was no rotation of logs at all.
Therefore I decided to try to implement both these features, and here is a short description of what I did.

`strftime`

This was the first place to start: supporting strftime(3) compatible strings in the log_filename parameter. This was quite easy, since the only need was to use strftime when opening the log file.
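As a rough illustration of the idea (a minimal sketch, not the code merged into pgagroal), expanding a strftime(3) pattern into the actual file name could look like the following. The function names expand_log_filename and open_log_file and the example pattern are hypothetical, and UTC is used here only to keep the example deterministic:

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper, not pgagroal's actual code: expands a
 * strftime(3) pattern such as "pgagroal-%Y-%m-%d.log" into `out`.
 * Returns 0 on success, -1 if the expansion fails or does not fit. */
int
expand_log_filename(const char* pattern, time_t when, char* out, size_t out_size)
{
   struct tm tm_buf;

   assert(pattern != NULL && out != NULL);

   /* UTC is an assumption of this sketch; a real pooler may prefer local time */
   if (gmtime_r(&when, &tm_buf) == NULL)
   {
      return -1;
   }

   /* strftime returns 0 when the result does not fit into the buffer */
   if (strftime(out, out_size, pattern, &tm_buf) == 0)
   {
      return -1;
   }

   return 0;
}

/* Hypothetical helper: opens the log file named after the expanded pattern.
 * `mode` is a regular fopen(3) mode, e.g. "a" to append or "w" to truncate. */
FILE*
open_log_file(const char* pattern, time_t when, const char* mode)
{
   char path[1024];

   if (expand_log_filename(pattern, when, path, sizeof(path)) != 0)
   {
      return NULL;
   }

   return fopen(path, mode);
}
```

Calling expand_log_filename() again with the current time yields a new name as soon as one of the placeholders changes, which is what makes the same pattern usable for rotated files.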

Log rotation

This was a much bigger task. First of all, I had to ensure that every time a new rotation was required, the system was able to notice it.
I decided to perform the check every time a new log entry is written. This is clearly an inefficient approach, since the system checks for rotation much more often than required, and it can also slow down the logging system. However, the idea is that pgagroal is not going to log so much data as to be impacted by the continuous check for rotation. Moreover, this was a kind of forced choice, since the logging is not done via a separate process.
The next step was to ensure that, once a log rotation is required, the system can rotate the log file. In order to do that, two changes were required:

the log_filename should be able to produce different names, in particular by becoming a strftime(3) compatible string;
the log file cannot be opened only at application startup, so I needed a dedicated function to re-open the log file with a new name when needed.

In order to speed things up a little, I also provided a utility function to test whether log rotation is enabled. Therefore, when rotation is disabled, none of the above will ever happen.
Of course, the user must have some parameters to control log rotation, so I introduced log_rotation_age and log_rotation_size, both accepting strings that have to be converted into seconds and bytes respectively. And I provided the parsing functions as well.
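To give an idea of what such parsing involves, here is a minimal sketch in C. It is not the parser contributed to pgagroal: the function names and the accepted suffixes (B, K, M, G for sizes and S, M, H, D for ages, case insensitive, with a bare number meaning bytes or seconds) are assumptions made for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* One accepted suffix and the value of a single unit (hypothetical table entry) */
struct unit
{
   char suffix;              /* upper-case suffix letter */
   unsigned long multiplier; /* how much one unit is worth */
};

/* Parses "<digits>[suffix]" and stores the scaled value into `out`.
 * Returns 0 on success, -1 on malformed input or overflow. */
int
parse_scaled(const char* text, const struct unit* units, size_t n_units, unsigned long* out)
{
   char* end = NULL;
   unsigned long value;
   unsigned long multiplier = 1;

   assert(out != NULL);

   /* reject NULL, empty strings, signs and leading blanks */
   if (text == NULL || !isdigit((unsigned char)text[0]))
   {
      return -1;
   }

   errno = 0;
   value = strtoul(text, &end, 10);
   if (errno != 0)
   {
      return -1;
   }

   if (*end != '\0')
   {
      int found = 0;

      for (size_t i = 0; i < n_units; i++)
      {
         if (toupper((unsigned char)*end) == units[i].suffix)
         {
            multiplier = units[i].multiplier;
            found = 1;
            break;
         }
      }

      /* unknown suffix, or extra characters after the suffix */
      if (!found || end[1] != '\0')
      {
         return -1;
      }
   }

   /* refuse values that would overflow once scaled */
   if (value > ULONG_MAX / multiplier)
   {
      return -1;
   }

   *out = value * multiplier;
   return 0;
}

/* Size strings: assumed suffixes B, K, M, G (powers of 1024) */
int
parse_rotation_size(const char* text, unsigned long* bytes)
{
   static const struct unit units[] = {
      {'B', 1UL}, {'K', 1024UL}, {'M', 1024UL * 1024UL}, {'G', 1024UL * 1024UL * 1024UL}};

   return parse_scaled(text, units, sizeof(units) / sizeof(units[0]), bytes);
}

/* Age strings: assumed suffixes S, M, H, D */
int
parse_rotation_age(const char* text, unsigned long* seconds)
{
   static const struct unit units[] = {
      {'S', 1UL}, {'M', 60UL}, {'H', 3600UL}, {'D', 86400UL}};

   return parse_scaled(text, units, sizeof(units) / sizeof(units[0]), seconds);
}
```

With these assumptions, "10M" as a size becomes 10485760 bytes and "2h" as an age becomes 7200 seconds, while anything that cannot be parsed is reported as an error so that the caller can decide to disable rotation.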
Rotating the log on a size basis was quite simple: I have to test whether the file size has exceeded log_rotation_size. It was much more difficult to implement the rotation by age.
The rotation by age was implemented like this: a global variable keeps track of the last time the log file has been opened or reopened. Then, when I need to check for log rotation, I take the current time and compute the difference between it and the last time the log file has been opened; if such number of seconds is greater than log_rotation_age, the log must be closed and re-opened.
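The two checks just described can be sketched as follows. This is only an illustration under stated assumptions, not the pgagroal implementation: the struct log_state type, its fields and the function name rotation_required are hypothetical, and treating a value of 0 as "this kind of rotation is disabled" is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical state: the post only says that a global variable
 * remembers when the log file was last opened or re-opened. */
struct log_state
{
   time_t opened_at;            /* last time the log file was (re)opened */
   unsigned long bytes_written; /* current size of the log file */
   unsigned long rotation_age;  /* seconds; 0 is assumed to disable age rotation */
   unsigned long rotation_size; /* bytes; 0 is assumed to disable size rotation */
};

/* Returns true when the log must be closed and re-opened. */
bool
rotation_required(const struct log_state* state, time_t now)
{
   assert(state != NULL);

   /* size based: the file has exceeded the configured size */
   if (state->rotation_size > 0 && state->bytes_written > state->rotation_size)
   {
      return true;
   }

   /* age based: more seconds than allowed have elapsed since the last (re)open */
   if (state->rotation_age > 0 && difftime(now, state->opened_at) > (double)state->rotation_age)
   {
      return true;
   }

   return false;
}
```

Since both comparisons are strict and the check only runs when an entry is written, a file can slightly outgrow the configured size or outlive the configured age before it is rotated.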

There is an important consequence of the above logic: the rotation does not happen exactly when it is supposed to happen, but always with a size/time delay. In other words, a log file is allowed to exceed log_rotation_size by a single log entry (therefore, a few bytes), and no time based rotation will happen before a new log entry is flushed. This means that on a server with little activity, you could see the rotation happen much later than what you configured.

Last, I had to implement some kind of truncation when a log rotates. Luckily, there was already a parameter for that, named log_mode, used to open a log file in append or truncate mode. I reused such logic in the function that re-opens the log file.

Log Formatting

The last piece to add to the picture was the log formatting option. This was, after all, quite easy: every log entry was flushed with a fixed strftime(3) preamble; it was sufficient to provide an option, log_line_prefix, to pass to strftime(3) as a configurable preamble.

Glance at contributed code

Here you can find a glance at the contributed code:

log_rotation_enabled() returns true if the log rotation is active. The log rotation is automatically disabled if some configuration parameter is not set accordingly;
log_rotation_disable() turns off the log rotation. This is done as a last resort when the configuration is miswritten;
log_rotation_required() checks whether a log rotation is required right now, either by age or by size;
log_rotation_set_next_rotation() computes the next age at which a rotation by time should be triggered;
log_file_open() opens or re-opens the log file using strftime(3) against the log_filename configuration parameter. Every time the rotation must happen this function is invoked;
log_file_rotate() performs the log rotation.

As an example, the log_file_rotate() function is really simple:

void
log_file_rotate(void)
{
   if (log_rotation_enabled())
   {
      fflush(log_file);
      fclose(log_file);
      log_file_open();
   }
}

and as you can see, it flushes the current log, closes it and re-opens it. And all the magic happens when the log entry is written:

            vfprintf(log_file, fmt, vl);
            fprintf(log_file, "\n");
            fflush(log_file);

            if (log_rotation_required())
            {
               log_file_rotate();
            }

Briefly, after the fprintf and the fflush the system asks itself if a log rotation is required, and in case, rotates the log file.

Conclusions

While pgagroal has a basic logging mechanism, this contribution provides the log rotation features in a semi-precise way. Contributing to this was fun, even if hard in some aspects, in particular because it has been way too long since I last developed something in C.
pgagroal is a promising project, and I’m sure it is going to quickly show all its potential!

Perl code reuse in Pl/Perl thru pg_proc and anonymous code blocks

2022-03-09T00:00:00+00:00

Perl is great. PostgreSQL is great. And great plus great means super-powers!

Perl code reuse in Pl/Perl thru pg_proc and anonymous code blocks

PostgreSQL allows you to write executable code, e.g., FUNCTIONs and PROCEDUREs in Perl thru its extension language Pl/Perl (plperl and plperlu). But sometimes there is the need to use the same Perl block over and over again across different code blocks and functions.
There are different approaches, most notably the module one: abstract your behavior into a module and load it whenever you need. Yeah, this means using plperlu, but it is a fair tradeoff.

However, keeping in mind how PostgreSQL stores procedures and their code, it is possible to use a more fancy approach. In this post I show you a couple of simple examples as proofs of concept, clearly in order to push this into production there is the need for a more sophisticated approach.

An easy function in Pl/Perl

Let’s start simple: a Pl/Perl function that tells whether a number is prime.

CREATE OR REPLACE FUNCTION
fluca.is_prime( int )
RETURNS bool
AS $CODE$
   return 1 if $_[0] <= 2;

   for my $i ( 2 .. $_[0] - 1 ) {
       return 0 if $_[0] % $i == 0;
   }

   return 1;
$CODE$
LANGUAGE plperl;

Quite simple, uh? Now imagine that we need to prepare another function that needs to generate prime numbers, and thus needs to know if a given number is prime or not.
One approach could be to call the above fluca.is_prime() function, but this will slow down the whole process. But after all, this is the building block logic of functions!
Another approach could be to take the above bunch of Perl code, create a module out of it, and use it wherever needed. But it is not the approach followed here.
Yet another approach could be to store a code reference into the %_SHARED global hash.
Last, why not query the catalog pg_proc to extract the source code of the above function and wrap it into another Perl anonymous code block? It goes like this:

CREATE OR REPLACE FUNCTION
fluca.generate_primes_up_to( int )
RETURNS SETOF int
AS $CODE$

my $query = "select prosrc from pg_proc where proname = 'is_prime' and pronamespace = ( select oid from pg_namespace where nspname = 'fluca' );";
my $code = spi_exec_query( $query, 1 )->{ rows }[ 0 ]->{ prosrc };

my $is_prime = eval( "sub { $code }; " );

for my $n ( 1 .. $_[0] ) {
    return_next( $n ) if $is_prime->( $n );
}

return undef;

$CODE$
LANGUAGE plperl;

The $query statement selects the pg_proc.prosrc text field that contains the source code, that is whatever you have written between the $CODE$ separators. That’s because plperl is a procedural language, therefore its source code is stored in the system catalog.
Having stated that, the $code string contains the block of code, so it is as if the variable were declared as follows:

my $code = " return 1 if $_[0] <= 2;

   for my $i ( 2 .. $_[0] - 1 ) {
       return 0 if $_[0] % $i == 0;
   }

   return 1;";

Seems like a Perl sub, but it is not (yet). There is the need to wrap the bunch of code into a sub declaration, and this is the easy part, and then we need to compile it. That’s the task of eval( "sub { $code };" ), that creates an anonymous subroutine with the source code extracted from the other function.
Such code is stored into a scalar $is_prime that is then used as a standard anonymous subroutine via ->.
And that’s all!

Advantages

The main advantage of the above approach is that whenever a change is made to fluca.is_prime(), the same change is immediately reflected in fluca.generate_primes_up_to(), because the source code of the former is always queried at the time the latter starts its execution.

Drawbacks

Time!
Extracting the code and compiling it every time requires time and resources, so it can be a pitfall for big Perl code blocks. There are different modules that can help in this scenario, e.g., [Parse::Perl](https://metacpan.org/pod/Parse::Perl){:target="_blank"}.
A hidden drawback is that the two functions are not explicitly coupled, so if fluca.is_prime is accidentally deleted, the other function will no longer be able to run at all!

The `SETOF` problem

Reusing a piece of code that returns a scalar is simple, but what about functions that return sets?
Assume there is the need for a function that returns all the even numbers up to a limit, and does that efficiently, that is returning one value at a time.

CREATE OR REPLACE FUNCTION
fluca.generate_evens( int )
RETURNS SETOF int
AS $CODE$
for ( 1 .. $_[0] ) {
  return_next( $_ ) if $_ % 2 == 0;
}

return undef;
$CODE$
LANGUAGE plperl;

While the function is really simple, the problem of the sets arises immediately: Pl/Perl provides particular ways of interacting with PostgreSQL, and return_next is one of them. Long story short: return_next makes the function yield, adding a new element to the current result set.
Since (regular) Perl has neither a return_next operator nor a function with that name, how can such code be translated? A very inefficient approach is to put the result set into an array and return the whole array. It is not the same as return_next, because there is no yielding, but it can work. Therefore, a function that wants to use the previous code could inject an array into the function prologue and substitute return_next with a regular push into that array.
Imagine we want to build a function that computes odd numbers on top of the even ones; the code looks like the following snippet.

CREATE OR REPLACE FUNCTION
fluca.generate_odds( int )
RETURNS SETOF int
AS $CODE$
my $query = "select prosrc from pg_proc where proname = 'generate_evens' and pronamespace = ( select oid from pg_namespace where nspname = 'fluca' );";
my $code = spi_exec_query( $query, 1 )->{ rows }[ 0 ]->{ prosrc };

$code = "my \@return_values;\n" . $code ;
$code =~ s/return_next\s*\(/push( \@return_values,/g;
$code =~ s/return\s+undef\s*;/return \@return_values;/g;

my $generate_evens = eval( "sub { $code }; " );
my @odds = map( { $_ + 1 } $generate_evens->( $_[0] ) );

for ( @odds ) {
  return_next( $_ );
}

return undef;
$CODE$
LANGUAGE plperl;

The base idea is the same as in the previous case: query pg_proc to get the source of the function and store it into $code.
Then, add the declaration for an array, named @return_values (a better and unique name should be chosen), and substitute, by means of a regular expression, every return_next with a push onto that array, also replacing any return undef (that in Pl/Perl is the way to end the result set) with a return of the array itself.
Yeah, I hear you screaming! This is surely something not to do in production, but it is a quick and dirty way to make Perl do what you want.
As in the previous case, store the result of evaluating the rewritten $code into a scalar named $generate_evens and use it as you prefer.

Danger Will Robinson!
The regex substitution is awful because it goes beyond the scope where return_next applies. Imagine applying the same technique recursively to fluca.generate_odds(): there is a return_next inside the code extracted from pg_proc and another return_next in the outer scope, used by the function itself. The regular expression cannot tell the scopes apart, so both return_next calls look the same and get substituted in the very same manner. And that’s why you should not use such an approach in production! Again, there are Perl modules to take care of these details and get things done in the right way.
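
The scope problem is easy to reproduce on plain strings. The following is a minimal sketch in Python that applies substitutions equivalent to the ones used in fluca.generate_odds() to two pieces of Perl source kept as text; `rewrite`, `inner` and `outer` are hypothetical names chosen for the illustration.

```python
import re


def rewrite(source):
    """Apply substitutions equivalent to those in fluca.generate_odds()."""
    source = re.sub(r"return_next\s*\(", "push( @return_values,", source)
    source = re.sub(r"return\s+undef\s*;", "return @return_values;", source)
    return "my @return_values;\n" + source


# Code fetched from pg_proc: here the rewrite is what we want.
inner = (
    "for ( 1 .. $_[0] ) {\n"
    "  return_next( $_ ) if $_ % 2 == 0;\n"
    "}\n"
    "return undef;\n"
)

# Code of a caller that has its own, legitimate, return_next.
outer = (
    "for ( @odds ) {\n"
    "  return_next( $_ );\n"
    "}\n"
    "return undef;\n"
)

# Both texts lose their return_next: the regex has no notion of scope.
print(rewrite(inner))
print(rewrite(outer))
```

After the rewrite, the outer code no longer feeds the PostgreSQL result set at all, which is exactly the failure described above.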

Conclusions

Perl is great. It allows you to build dynamic code in a very dynamic way.
PostgreSQL is great. It allows you to inspect every single part of the system, including executable code.
Perl’s ability to pull some code out of a PostgreSQL table (pg_proc) into a scalar, and to use it later on, allows for code sharing among Pl/Perl functions and routines.
It is up to you to decide to shoot yourself in the foot or hit something valuable!

Pl/Perl Recursion

2022-03-02T00:00:00+00:00

Some thoughts and experiments in Pl/Perl recursion.

Pl/Perl Recursion

While solving the Perl Weekly Challenge 154, I provided a couple of possible solutions in Pl/Perl, the widely available Perl integration within PostgreSQL.
One of the tasks, Padovan numbers, required recursion, and that is not as simple as it may seem to implement in Pl/Perl.
Why?
Because Pl/Perl does not expose Perl objects; rather, it is a way to execute Perl within SQL objects (e.g., functions). This means that SQL objects are (clearly) the first class objects available, so you always have to use SQL functions to recurse.
Except when you don’t want to!

But let’s start simple and see how to solve the problem.

Padovan numbers

A Padovan number is defined as the sum of two earlier numbers in the sequence, namely those found two and three positions before it. In particular:

P(0) = P(1) = P(2) = 1, the first three elements of the sequence are equal;
P(n) = P(n - 3) + P(n - 2)

This is great for recursion, because you can define a function in Pl/pgSQL as follows:

CREATE OR REPLACE FUNCTION
pwc154.padovan( i int )
RETURNS int
AS $CODE$
BEGIN
    IF i <= 2 THEN
       RETURN 1;
    END IF;

    RETURN pwc154.padovan( i - 3 ) + pwc154.padovan( i - 2 );
END
$CODE$
LANGUAGE plpgsql;
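
To cross-check the values outside the database, here is a minimal sketch of the same recursion in Python; it mirrors the structure of pwc154.padovan and is meant only as an illustration of the definition.

```python
def padovan(i):
    """Naive recursive Padovan number, same structure as pwc154.padovan."""
    if i <= 2:
        return 1
    return padovan(i - 3) + padovan(i - 2)


# First elements of the sequence, P(0) .. P(10).
print([padovan(n) for n in range(11)])
```

Every call above the base case spawns two more calls, so the amount of work grows exponentially with the requested index.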

Translating to Pl/Perl and the problem of recursion

The above Pl/pgSQL function cannot be translated literally to Pl/Perl; the following will not work:

CREATE OR REPLACE FUNCTION
padovan_not_working( int )
RETURNS int
AS $CODE$
  return 1 if( $_[0] <= 2 );
  return padovan_not_working( $_[0] - 3 )
       + padovan_not_working( $_[0] - 2 );
$CODE$
LANGUAGE plperl;

In fact, padovan_not_working is a function on the SQL side, and thus cannot be called by Pl/Perl as a Perl function.
One easy solution could be to accept the fact that the resulting function is an SQL object and interact with it accordingly:

CREATE OR REPLACE FUNCTION
pwc154.padovan_plperl( int )
RETURNS int
AS $CODE$
 return 1 if $_[0] <= 2;
 my ( $a, $b ) = ( $_[ 0 ] - 3, $_[ 0 ] - 2 );
 my $rs = spi_exec_query( "SELECT pwc154.padovan_plperl( $a ) + pwc154.padovan_plperl( $b ) AS p" );
 return $rs->{ rows }[ 0 ]->{ p };

$CODE$
LANGUAGE plperl;

As you can see, the function invokes itself by means of an SQL query.

Using a closure

It is possible to use a closure to hold the reference to an anonymous code block, so that it is possible to implement the recursion as follows:

CREATE OR REPLACE FUNCTION
plperl_padovan_recursive( int )
RETURNS int
AS $CODE$
  my $padovan;
  $padovan = sub {
    return 1 if $_[0] <= 2;
    return $padovan->( $_[0] - 3 ) + $padovan->( $_[0] - 2 );
  };

  return $padovan->( $_[0] );
$CODE$
LANGUAGE plperl;

No need for queries, no need for external modules, but there is a memory leak: the closure refers to $padovan, which in turn holds the closure, and Perl’s reference counting never frees such a cycle.

Using `Sub::Recursive`

There is a module, named Sub::Recursive, that does exactly what I would like to do: it allows defining an anonymous code block that can recursively invoke itself without any leak.
The only drawback is that the function must be run as untrusted Pl/Perl (plperlu) because it needs to load a module from outside the PostgreSQL server (and of course, the module must be installed on the system, cpanm is your friend!):

CREATE OR REPLACE FUNCTION
plperl_padovan( int )
RETURNS int
AS $CODE$
use Sub::Recursive;
my $padovan = recursive {
    return 1 if $_[0] <= 2;
    return $REC->( $_[0] - 3 ) + $REC->( $_[0] - 2 );
};

  return &$padovan( $_[0] );
$CODE$
LANGUAGE plperlu;

That’s it! No need for queries, no need for %_SHARED, no need for closures (apparently), just Perl!
But, there is the need for plperlu!
The module provides the special keyword recursive, which accepts a code block; within that block, $REC holds a reference to the code block itself.

Using `%_SHARED` ?

Another way to use recursion is by means of the Pl/Perl global hash %_SHARED, that is used to share whatever object you want across different functions. The idea is to share a function, so that it is possible to invoke it directly later on.
The implementation could be as follows:

CREATE OR REPLACE FUNCTION
plperl_padovan_init()
RETURNS VOID
AS $CODE$
  my $padovan;
  $padovan = sub {
    return 1 if $_[0] <= 2;
    return $padovan->( $_[0] - 3 ) + $padovan->( $_[0] - 2 );
  };

  $_SHARED{ padovan } = $padovan;
$CODE$
LANGUAGE plperl;


SELECT plperl_padovan_init();

CREATE OR REPLACE FUNCTION
plperl_padovan_shared( int )
RETURNS int
AS $CODE$
  my $padovan = $_SHARED{ padovan };
  return $padovan->( $_[0] );
$CODE$
LANGUAGE plperl;

The first function, plperl_padovan_init, installs a code reference $padovan into the global Pl/Perl hash %_SHARED, so that other functions can obtain such a code reference. The code is the same as in the other examples.
Then the function is explicitly invoked, so that the code reference is installed.
Later on, the plperl_padovan_shared function gets the code reference and uses it as a normal function.

Quick and dirty comparison

I’ve done a very short one-launch comparison among the approaches, excluding the one based on %_SHARED because it is very similar to the approach using pure recursion via a code reference. Just for the record, asking the %_SHARED-based approach to compute the 50th Padovan number requires around 0.5 secs, which is, as expected, in line with the other similar approaches.
As the Padovan number to compute increases, the Perl approaches based on the plain closure and on Sub::Recursive become really similar in terms of execution time. The approach that performs a query to recurse is, as you can imagine, the slowest one, and its performance degrades very quickly as the numbers grow.

The following table summarizes times depending on the generated number:

Padovan number	`Sub::Recursive`	closure	query
10	12 ms	20.39 ms	17.61 ms
20	0.51 ms	18.18 ms	12 ms
30	1.2 ms	17.9 ms	27.99 ms
40	31.9 ms	30.01 ms	323.4 ms
50	0.50 secs	0.52 secs	5.9 secs
60	12.76 secs	10.35 secs	99.77 secs
65	35.09 secs	35.78 secs	385 secs
70	142 secs	145 secs	1580 secs
71	187.89 secs	196.24 secs	2122.8 secs

It is not possible to keep increasing the Padovan number because of integer overflow; I would have to adjust the functions to return bigint, but in any case I’m not expecting very different trends.
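
To see where the int return type runs out, without waiting minutes for the naive recursion, the sequence can be computed iteratively. This is a minimal sketch in Python (not one of the Pl/Perl functions measured above); `padovan_iterative` and `first_overflowing_index` are hypothetical helper names, and `INT4_MAX` assumes the usual 32 bit signed range of the PostgreSQL int type.

```python
INT4_MAX = 2**31 - 1  # upper bound of a 32 bit signed integer


def padovan_iterative(n):
    """Return P(n) in linear time, keeping only three consecutive values."""
    a, b, c = 1, 1, 1  # P(k), P(k+1), P(k+2), starting at k = 0
    for _ in range(n):
        # Slide the window by one: P(k+3) = P(k) + P(k+1).
        a, b, c = b, c, a + b
    return a


def first_overflowing_index(limit=INT4_MAX):
    """Smallest n such that P(n) is greater than `limit`."""
    n = 0
    while padovan_iterative(n) <= limit:
        n += 1
    return n


print(first_overflowing_index())
```

The iterative form also shows that the long times in the table come from the naive recursion, which recomputes the same values over and over, and not from the size of the numbers.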

Conclusions

Recursion in Pl/Perl can be hard to implement and can require fancy approaches like the closure-based ones. First of all, you need to decide if you can deal with untrusted languages: if so, installing a module is probably the easiest and most appropriate approach. If you don’t want to deal with untrusted code, you need to decide whether you prefer a pure Perl approach, in which case a code reference is the choice, or you want something that can be invoked by other languages. The latter means using a more SQL-oriented approach, while the former means sticking with a code reference, either used immediately or by means of some sort of shared storage.

Contributing to pgagroal (and pgmoneta?)

2022-02-23T00:00:00+00:00

My small contributions to two interesting projects.

Contributing to `pgagroal` (and `pgmoneta`?)

[pgagroal](https://agroal.github.io/pgagroal/){:target="_blank"} is an interesting high-performance PostgreSQL connection pooler. I started using and studying it at the end of the past year, and due to some discussions on the project discussions page, I decided to have a look at the source code.
The above resulted in a few small contributions that have been merged:

small changes to configuration error messages: during some testing I noted that there was a misleading, at least to me, FATAL log message when the configuration of limits was inconsistent; this patch tries to improve the situation, which was not strictly related to the log message, but rather to the way the configuration inconsistency was propagated through the application stack. The patch has been squashed and merged.
master key error messages: I was frustrated one day while trying to insert a master-key password for the pgagroal vault, because I was inserting a too short password and the system kept asking me for the password without any error message. This patch, squashed and merged, improves the verbosity of the application in such a condition.
generate a PID file depending on the listening socket: this was required because sometimes I was not able to stop pgagroal. The problem was that launching two different instances on the same machine could result in some problems when the configuration was duplicated. The solution was to use a guard PID file based on the listening socket, so that the daemon could not be started twice on the same host in such conditions.
verbosity about master-key vault: provides an informational string to the user about where, on disk, the vault has been stored.

The above are very small contributions, due also to the fact that I don’t know pgagroal well enough (yet) to be confident in making more complex contributions. Also, as you can see from the pull requests, my Emacs decided to wipe out the code style several times, leaving the merges pending. Moreover, it has been years since I developed something in C!

Getting in touch with pgagroal led me to discover another interesting project: [pgmoneta](https://github.com/pgmoneta/pgmoneta){:target="_blank"}, a backup solution for PostgreSQL. I did not have very much time to inspect and study this project, but at least I was able to re-propose a similar patch about the guard PID file.

What’s next? Well, I’m trying to implement log rotation on pgagroal, but I’m not yet ready to publish a proposal. So far, I’m testing my work, so stay tuned!

Perl Weekly Challenge 153: recursive CTEs

2022-02-22T00:00:00+00:00

My personal solutions to the Perl Weekly Challenge.

Perl Weekly Challenge 153: recursive CTEs

This is a short tour about my solutions to the Challenge 153 done in PostgreSQL.

Task 1
Task 2

PWC 153 - Task 1

This task was about producing left factorial numbers, where each value is computed by summing all the previously computed factorials.
I decided to implement it on top of a recursive CTE named factorials that, well, computes factorials. That was the easy part, then I needed to compute the sum of all the values less than the current one. Let’s use a LATERAL JOIN for the task:

with recursive factorials as
(
   SELECT 0::numeric as num
         ,1::numeric as fac

   UNION

   SELECT f.num + 1
         , ( f.num + 1 ) * f.fac
   FROM factorials f
   WHERE f.num < 1000
)
SELECT f.num, sum( w.fac ) as left_factorial
FROM factorials f, LATERAL
( SELECT ff.fac FROM factorials ff WHERE ff.num < f.num ORDER BY ff.num ) w
WHERE f.num <= 10
GROUP BY f.num, f.fac
ORDER BY f.num
;

I limit the output to the first 10 values, as requested by the task. The factorials CTE computes all the factorials, and then I join LATERAL with a subquery w that selects all the factorial values for entries less than the current one. Therefore, using the built-in sum function in the outer query solves the problem.
Clearly, this is not a particularly efficient solution, but it is a good example of what recursive CTEs and LATERAL can do when combined together.
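
The values produced by the query can be cross-checked with a few lines of code. This is a minimal sketch in Python, under the definition used above (the left factorial of n is the sum of the factorials of all numbers less than n); `left_factorials` is a hypothetical helper name.

```python
def left_factorials(count):
    """Return the left factorials of 1 .. count.

    The left factorial of n is 0! + 1! + ... + (n-1)!.
    """
    result = []
    factorial = 1      # holds (n-1)!, starting from 0!
    running_sum = 0
    for n in range(1, count + 1):
        running_sum += factorial   # add (n-1)!
        result.append(running_sum)
        factorial *= n             # move on to n!
    return result


print(left_factorials(10))
# [1, 2, 4, 10, 34, 154, 874, 5914, 46234, 409114]
```

Unlike the LATERAL join, the running sum visits every factorial only once.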

PWC 153 - Task 2

Similar to the previous task, but simpler: see if a given number is made of digits whose factorials sum up to the number itself. As an example, 145 is such a number because it can be expressed as 1! + 4! + 5!.
Having a recursive CTE to compute factorials from the previous task, I decided to use the same starting point. However, this time, I used a psql variable named needle to which I assign the value I want to test:

\set needle 145

with recursive factorials as
(
   SELECT 0::numeric as num
         ,1::numeric as fac

   UNION

   SELECT f.num + 1
         , ( f.num + 1 ) * f.fac
   FROM factorials f
   WHERE f.num < 1000
)
SELECT CASE sum( f.fac ) WHEN :needle THEN :needle || ' OK' ELSE :needle || ' KO' END AS factorions
FROM factorials f JOIN regexp_split_to_table( :needle::text, '' ) w(n)
ON w.n = f.num::text
;

The trick here is that I join factorials with regexp_split_to_table, which returns all the digits as a set of tuples. Then, in the outer query, I sum the factorials of every digit and see if the result is still the needle, producing an OK string or a KO one.
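
The digit-by-digit check is compact enough to be verified by hand. A minimal sketch in Python, where `is_factorion` is a hypothetical name and the string conversion plays the role of regexp_split_to_table:

```python
from math import factorial


def is_factorion(number):
    """True when the factorials of the decimal digits sum to the number."""
    return sum(factorial(int(digit)) for digit in str(number)) == number


print(is_factorion(145))  # True: 1! + 4! + 5! = 1 + 24 + 120
print(is_factorion(146))  # False
```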

PgTraining online webinar on 2022-04-29 (Italian)

2022-02-05T00:00:00+00:00

Yet another online event organized by PgTraining!

PgTraining online webinar on 2022-04-29 (Italian)

PgTraining, the amazing Italian group of people that spreads the word about PostgreSQL and that I joined in the last years, is organizing another online event (webinar) on April 29th, 2022.
Following the success of the previous edition(s), we decided to provide another afternoon full of PostgreSQL talks, in the hope of improving the adoption of this great database.

The event will consist of three hours of talks about connection pooling, timeseries extensions and column storage internals.
As for the previous editions, the webinar will be presented in Italian. Attendees will be free to actively participate and ask questions both during the talks and at the end of the whole event.

In the pure spirit of PgTraining, the event will be free of charge, but registration is required to participate and the number of available seats is limited, so hurry up and get your free ticket as soon as possible!
The material will be available for free after the event has completed, but no live recording will be available.

Pentagon numbers

2022-01-11T00:00:00+00:00

A couple of different solutions to an interesting problem.

Pentagon numbers

For a few weeks now, I have been implementing some of the tasks of the Perl Weekly Challenge in PostgreSQL-specific code. One interesting problem was this week’s task 2 of Challenge 147: finding a pair of pentagon numbers whose sum and difference are both pentagon numbers as well.
In this post, I discuss two possible solutions to the task.

What is a Pentagon Number?

A pentagon number is defined as the value of the expression n * ( 3 * n - 1 ) / 2, therefore the pentagon number corresponding to 3 is 12.
The task required finding a pair of pentagon numbers such that:

P(n1) + P(n2) = P(x)
P(n1) - P(n2) = P(y)

It does not matter what x and y are, but both the sum and the difference of the two pentagon numbers P(n1) and P(n2) must be pentagon numbers too.
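
Before moving to the database, the two conditions can be checked with a brute-force search. This is a minimal sketch in Python, not the approach used in the rest of the post: `pentagon` and `find_pentagon_pair` are hypothetical names, the default limit of 5000 is an assumption on the range to explore, and the difference is taken as larger minus smaller.

```python
def pentagon(n):
    """Pentagon number for n: n * ( 3 * n - 1 ) / 2."""
    return n * (3 * n - 1) // 2


def find_pentagon_pair(limit=5000):
    """Return the first (n1, n2), n1 < n2, whose pentagon numbers have
    both a sum and a difference that are pentagon numbers within `limit`."""
    values = [pentagon(n) for n in range(1, limit + 1)]
    known = set(values)  # constant time membership test
    for hi in range(2, limit + 1):
        p_hi = values[hi - 1]
        for lo in range(1, hi):
            p_lo = values[lo - 1]
            if p_hi + p_lo in known and p_hi - p_lo in known:
                return lo, hi  # stop at the first match
    return None


print(find_pentagon_pair())
```

The set lookup plays the role of the EXISTS subqueries, and returning at the first match is what keeps the search short.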

The first approach: a record based function

The first solution I came up with was inspired by the solution I provided in Raku, and is, quite frankly, a kind of record-based approach.
First of all, I define an IMMUTABLE function named f_pentagon that computes the pentagon number value starting from a given number, so that f_pentagon( 3 ) returns 12. Why do I need a function? Because I want to implement a table with a stored generated column to keep track of numbers and their pentagon values.
For that reason, I created a pentagons table with a generic n column that represents the starting value and the p column that represents the computed pentagon value.

CREATE OR REPLACE FUNCTION
f_pentagon( n bigint )
RETURNS bigint
AS
$CODE$
        SELECT ( n * ( 3 * n - 1 ) / 2 );
$CODE$
LANGUAGE sql
IMMUTABLE;


DROP TABLE IF EXISTS pentagons;
CREATE TABLE pentagons
(
        n bigint
        , p bigint GENERATED ALWAYS AS ( f_pentagon( n ) ) STORED
);



INSERT INTO pentagons( n )
SELECT generate_series( 1, 5000 );

I inserted into the table 5000 records because I know, from the Raku solution, that what I’m looking for is within such range of values. It is, of course, possible to increase that limit to find out other values.
The table content looks therefore like the following:

testdb=> select * from pentagons limit 10;
 n  |  p
----+-----
  1 |   1
  2 |   5
  3 |  12
  4 |  22
  5 |  35
  6 |  51
  7 |  70
  8 |  92
  9 | 117
 10 | 145

Now it is possible to implement a function, named f_pentagons_pairs, that scans the above table searching for the required values. The function returns a TABLE, even if only one row will be returned: since I want to output multiple values, I decided to implement it as a row-returning function. In particular, the returned information is:

n1 is the first number;
n2 is the second number;
s is the sum of the pentagons, that is P(n1) + P(n2);
d is the difference of the pentagons, that is abs( P(n1) - P(n2) );
ps is the number whose pentagon corresponds to the sum of the two pentagons, that is P(ps) = P(n1) + P(n2);
pd is the number whose pentagon corresponds to the difference of the two pentagons, that is P(pd) = abs( P(n1) - P(n2) );

The function is the following one:

CREATE OR REPLACE FUNCTION
f_pentagons_pairs()
RETURNS TABLE ( n1 bigint, n2 bigint,  s bigint, d bigint, ps bigint, pd bigint )
AS $CODE$
DECLARE
        current_tuple pentagons%rowtype;
        other_tuple   pentagons%rowtype;
        fnd           int := 0;
BEGIN

        FOR current_tuple IN SELECT * FROM pentagons ORDER BY n LOOP
            SELECT *
            INTO other_tuple
            FROM pentagons pp
            WHERE EXISTS(
                  SELECT *
                  FROM pentagons ps
                  WHERE ps.p = current_tuple.p + pp.p
                  )
           AND EXISTS (
               SELECT *
               FROM pentagons ps
               WHERE ps.p = abs( current_tuple.p - pp.p )
           );


           IF FOUND THEN
              SELECT current_tuple.n
                     , other_tuple.n
                     , current_tuple.p + other_tuple.p
                     , abs( current_tuple.p - other_tuple.p )
                     , p1.n
                     , p2.n
              INTO n1, n2, s, d, ps, pd
              FROM pentagons p1, pentagons p2
              WHERE p1.p = current_tuple.p + other_tuple.p
              AND   p2.p = abs( current_tuple.p - other_tuple.p );

              RAISE INFO 'P(%) + P(%) = P(%) =  %',
                         n1, n2, ps, s;

             RAISE INFO 'P(%) - P(%) = P(%) =  %',
                        n1, n2, pd, d;


              fnd := fnd + 1;
              RETURN NEXT;
              RETURN;
           END IF;

        END LOOP;

        RETURN;
END
$CODE$
LANGUAGE plpgsql;

It is quite simple to understand:

it performs a one-record-at-a-time loop, placing every row of pentagons into current_tuple;
it searches for an other_tuple in pentagons such that both the sum and the difference EXISTS in pentagons at the very same time. This means that other_tuple and current_tuple lead to a sum and a difference that are both pentagon numbers;
when such a tuple is FOUND, the output tuple is built.

In order to get the reverse values that lead to the sum and difference, I do another double join with pentagons to get out the result.
The RAISE instructions are placed only to provide a textual representation of the expressions.
Launching the function on a very little virtual machine, busy in doing other stuff, results in:

testdb=> SELECT * FROM f_pentagons_pairs();
INFO:  P(1020) + P(2167) = P(2395) =  8602840
INFO:  P(1020) - P(2167) = P(1912) =  5482660
  n1  |  n2  |    s    |    d    |  ps  |  pd
------+------+---------+---------+------+------
 1020 | 2167 | 8602840 | 5482660 | 2395 | 1912
(1 row)

Time: 3346,886 ms (00:03,347)

A CTE Approach

Is there the need for the pentagons table? Uhm…it is possible to materialize the same set of data with a recursive CTE. And, therefore, it is possible to move the query at the outer level so that there is no need to perform a record-by-record scan. After all, SQL is a set based language!

WITH RECURSIVE pentagons( n, p )
AS
(
        SELECT 1 AS n
               , f_pentagon( 1 ) AS p

UNION
        SELECT p.n + 1
               , f_pentagon( p.n + 1 )
        FROM pentagons p
        WHERE p.n < 5000
)

SELECT format( '%s, %s', l.n, r.n ) AS pentagon_pairs
FROM pentagons l, pentagons r
WHERE EXISTS(
      SELECT *
      FROM pentagons ps
      WHERE ps.p = l.p + r.p
      )
AND EXISTS (
    SELECT *
    FROM pentagons ps
    WHERE ps.p = abs( l.p - r.p )
    )
;

The query takes a little more time than the approach using the table and the record-based function:

 pentagon_pairs
----------------
 1020, 2167
 2167, 1020
(2 rows)

Time: 5820,066 ms (00:05,820)

There are some details that are harder to tune with the CTE approach, most notably the reverse lookup of the resulting base numbers and the exclusion of the duplicated row. However, it is possible to tune it to your needs.
Why does the CTE approach require more time than the function approach? Well, even if the times are similar, the function terminates as soon as it finds a solution, while the CTE does not, and therefore scans the whole dataset.

Plans!

The difference in timing is not as large as it may seem at first glance, and in practice the two approaches are comparable with regard to performance. The execution plans are, of course, very different, since the function approach works as a black box:

testdb=> EXPLAIN ANALYZE SELECT * FROM f_pentagons_pairs();
INFO:  P(1020) + P(2167) = P(2395) =  8602840
INFO:  P(1020) - P(2167) = P(1912) =  5482660
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Function Scan on f_pentagons_pairs  (cost=0.25..10.25 rows=1000 width=48) (actual time=4754.081..4754.082 rows=1 loops=1)
 Planning Time: 0.047 ms
 Execution Time: 4856.988 ms
(3 rows)

Time: 4909,165 ms (00:04,909)
testdb=> EXPLAIN ANALYZE WITH RECURSIVE pentagons( n, p )
AS
(
        SELECT 1 AS n
               , f_pentagon( 1 ) AS p
UNION
        SELECT p.n + 1
               , f_pentagon( p.n + 1 )
        FROM pentagons p
        WHERE p.n < 5000
)
SELECT format( '%s, %s', l.n, r.n ) AS pentagon_pairs
FROM pentagons l, pentagons r
WHERE EXISTS(
      SELECT *
      FROM pentagons ps
      WHERE ps.p = l.p + r.p
      )
AND EXISTS (
    SELECT *
    FROM pentagons ps
    WHERE ps.p = abs( l.p - r.p )
    )
;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=5.57..30.98 rows=23 width=32) (actual time=2659.801..10415.225 rows=2 loops=1)
   Hash Cond: (abs((l.p - r.p)) = ps_1.p)
   CTE pentagons
     ->  Recursive Union  (cost=0.00..3.56 rows=31 width=12) (actual time=0.002..21.821 rows=5000 loops=1)
           ->  Result  (cost=0.00..0.01 rows=1 width=12) (actual time=0.001..0.001 rows=1 loops=1)
           ->  WorkTable Scan on pentagons p  (cost=0.00..0.29 rows=3 width=12) (actual time=0.000..0.000 rows=1 loops=5000)
                 Filter: (n < 5000)
                 Rows Removed by Filter: 0
   ->  Hash Semi Join  (cost=1.01..25.63 rows=149 width=24) (actual time=361.447..10156.895 rows=5341 loops=1)
         Hash Cond: ((l.p + r.p) = ps.p)
         ->  Nested Loop  (cost=0.00..20.15 rows=961 width=24) (actual time=0.005..6323.391 rows=25000000 loops=1)
               ->  CTE Scan on pentagons l  (cost=0.00..0.62 rows=31 width=12) (actual time=0.002..0.671 rows=5000 loops=1)
               ->  CTE Scan on pentagons r  (cost=0.00..0.62 rows=31 width=12) (actual time=0.000..0.546 rows=5000 loops=5000)
         ->  Hash  (cost=0.62..0.62 rows=31 width=8) (actual time=354.101..354.101 rows=5000 loops=1)
               Buckets: 8192 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 260kB
               ->  CTE Scan on pentagons ps  (cost=0.00..0.62 rows=31 width=8) (actual time=0.001..0.405 rows=5000 loops=1)
   ->  Hash  (cost=0.62..0.62 rows=31 width=8) (actual time=255.920..255.920 rows=5000 loops=1)
         Buckets: 8192 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 260kB
         ->  CTE Scan on pentagons ps_1  (cost=0.00..0.62 rows=31 width=8) (actual time=0.004..49.709 rows=5000 loops=1)
 Planning Time: 0.195 ms
 Execution Time: 10431.113 ms
(21 rows)

Time: 10509,619 ms (00:10,510)


Changing the CTE between MATERIALIZED and NOT MATERIALIZED does not produce any appreciable change, of course.
Creating an index on pentagons(p) makes the function approach a little faster, but not by much, since it is used only in the final part of the function.

A query only approach

Having the pentagons table in place, it is possible to use it as the materialization of the CTE, thus pushing the query out of the function and not within the CTE:

testdb=> SELECT format( '%s, %s', l.n, r.n ) AS pentagon_pairs
FROM pentagons l, pentagons r
WHERE EXISTS(
      SELECT *
      FROM pentagons ps
      WHERE ps.p = l.p + r.p
      )
AND EXISTS (
    SELECT *
    FROM pentagons ps
    WHERE ps.p = abs( l.p - r.p )
    )
;
 pentagon_pairs
----------------
 1020, 2167
 2167, 1020
(2 rows)

Time: 5024,468 ms (00:05,024)

It is possible to push some LIMIT 1 into the subqueries, so to force them to terminate as soon as a match is found, and this slightly improves the speed of the whole query:

testdb=> SELECT format( '%s, %s', l.n, r.n ) AS pentagon_pairs
FROM pentagons l, pentagons r
WHERE EXISTS(
      SELECT *
      FROM pentagons ps
      WHERE ps.p = l.p + r.p LIMIT 1
      )
AND EXISTS (
    SELECT *
    FROM pentagons ps
    WHERE ps.p = abs( l.p - r.p ) LIMIT 1
    )
;
 pentagon_pairs
----------------
 1020, 2167
 2167, 1020
(2 rows)

Time: 4328,603 ms (00:04,329)

The number of rows in the table is, however, too small to trigger the usage of the index, even when forcing an ORDER BY. Even a covering index, which could include all the columns, will not help in this case.

Conclusions

There is more than one way to do it!
No, sorry, this is not Perl, but PostgreSQL! However, given a specific problem, PostgreSQL provides a lot of fun and tools to solve a task.

kill that backend!

2021-12-06T00:00:00+00:00

How to kill a backend process, the right way!

`kill` that backend!

Sometimes it happens: you need, as a DBA, to be harsh and terminate a backend, that is a user connection.
There are two main ways to do that:

use the operating system kill(1) command to, well, kill such process;
use PostgreSQL administrative functions like pg_terminate_backend() or the more polite `pg_cancel_backend()`.

PostgreSQL `pg_cancel_backend()` and `pg_terminate_backend()`

What is the difference between the two functions?
Quite easy to understand: pg_cancel_backend() sends a SIGINT to the backend process, that is, it asks it politely to stop. It is the equivalent of a standard kill -INT against the process.
But what does it mean to ask politely? It means cancelling the current query: it does not terminate the user session, only the current user interaction. That is why it is mapped to SIGINT, the equivalent of CTRL-c (interrupt by keyboard).
On the other hand, pg_terminate_backend() sends a SIGTERM to the process, which is equivalent to kill -TERM and brutally forces the process to exit, ending the user session.

Now, Kill it!

Which method should you use?
If you are absolutely sure about what you are doing, you can use whatever method you want!
But when the caffeine level in your body is too low to do it right, you should use the PostgreSQL way! There are at least two good reasons to use the PostgreSQL administrative functions:

you don’t need access to the server, i.e., you don’t need an operating system shell;
you will not accidentally kill another process.

The first reason is really simple to understand, and improves security about the machine hosting PostgreSQL, at least in my opinion.
The second reason is a little less obvious, and relies on the fact that pg_cancel_backend() and pg_terminate_backend() act only against processes within the PostgreSQL space, that is, only processes spawned by the postmaster.
Let’s see this in action: imagine we select the wrong process to kill, like 174601 that is running Emacs on the server.

% ssh luca@miguel 'ps -aux | grep emacs'
luca      174601  1.6  4.6 320068 46584 pts/0    S+   08:40   0:04 emacs


% psql -h miguel -U postgres -c "SELECT pg_cancel_backend( 174601 );" testdb
WARNING:  PID 174601 is not a PostgreSQL server process
 pg_cancel_backend 
-------------------
 f
(1 row)



% psql -h miguel -U postgres -c "SELECT pg_terminate_backend( 174601 );" testdb
WARNING:  PID 174601 is not a PostgreSQL server process
 pg_terminate_backend 
----------------------
 f
(1 row)

As you can see, there is no way to misbehave against a non PostgreSQL process! The logs provide, of course, the very same warning message:

WARNING:  PID 174601 is not a PostgreSQL server process

Now, imagine what would happen if the administrator ran something like:

% ssh luca@miguel 'sudo kill 174601'

The process, in this case Emacs, would have been killed.
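
The protection offered by the administrative functions boils down to a membership check performed before the signal is sent. The following is a minimal sketch in Python of that idea only, not of how PostgreSQL implements it: `signal_backend` is a hypothetical helper and `backend_pids` is an assumed, caller-supplied set of PIDs known to be backends (for instance collected from pg_stat_activity).

```python
import os
import signal


def signal_backend(pid, backend_pids, sig=signal.SIGINT):
    """Send `sig` to `pid` only if it is listed in `backend_pids`.

    Returns True when the signal was sent, False when the PID was refused.
    """
    if pid not in backend_pids:
        # Refuse to touch anything that is not a known backend.
        print(f"WARNING:  PID {pid} is not a PostgreSQL server process")
        return False
    os.kill(pid, sig)
    return True


# The Emacs process of the example is not in the (empty) set of backends,
# so nothing is signalled and the call returns False.
print(signal_backend(174601, set()))
```

A plain kill has no such check: whatever PID is typed, right or wrong, receives the signal.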

Conclusions

While you can always use the well known Unix tools to interact with PostgreSQL processes, it is strongly suggested to use the PostgreSQL tools. This improves safety checks and requires less effort in keeping track of what is happening on the cluster.

pgdump, text and xz

2021-12-06T00:00:00+00:00

A not-scientific look at how to compress a set of SQL dumps.

pgdump, text and xz

I have a database that contains around 50 GB of data. I do continuous backups through pgBackRest, and I also take regular pg_dump backups in directory format via multiple jobs, so I’m fine with backups.
However, why not have a look at SQL backups?
First of all: the content of the database is mostly numeric, being a quite large container of sensor data. This means that the data should compress very well.
Moreover, tables are partitioned on a per-year and per-month basis, therefore I have a regular structure with one year table and twelve monthly child partitions. For instance, in the current year there is a table named y2021 with partitions named y2021m01 through y2021m12.

`pg_dump` in text mode

I did a simple for loop in my shell to produce a few backup files, separating every single file by its year:

% for y in $(echo 2018 2019 2020 2021 2022 ); do
echo "Backup year $y"
time pg_dump -h miguel -U postgres -f sensorsdb.$y.sql -t "respi.y${y}*" sensorsdb
done

This produces the following amount of data:

% ls -sh1 *.sql     
3,5G sensorsdb.2018.sql
 13G sensorsdb.2019.sql
 12G sensorsdb.2020.sql
 10G sensorsdb.2021.sql
 20K sensorsdb.2022.sql

The following is a table that summarizes the file size and the time required to create it:

year	SQL size	time
2018	3.5 GB	7 minutes
2019	13 GB	20 minutes
2020	12 GB	20 minutes
2021	10 GB	17 minutes

Compress them!

Use xz with the default settings, which on my installation means compression level 6:

% for y in $(echo 2018 2019 2020 2021 2022 ); do
echo "Compress year $y"
time xz sensorsdb.$y.sql                                                          
done

Compress year 2018
xz sensorsdb.$y.sql  2911,75s user 12,62s system 98% cpu 49:22,22 total
Compress year 2019
xz sensorsdb.$y.sql  7411,57s user 41,22s system 98% cpu 2:06:24,38 total
Compress year 2020
xz sensorsdb.$y.sql  6599,22s user 19,08s system 98% cpu 1:52:07,38 total
Compress year 2021
xz sensorsdb.$y.sql  5487,37s user 15,25s system 98% cpu 1:33:08,32 total
Compress year 2022
xz sensorsdb.$y.sql  0,01s user 0,01s system 36% cpu 0,069 total

It takes from roughly one to two hours to compress every single file, as summarized in the following table:

File size	Time	Compressed size	Compression ratio
3.5 GB	50 minutes	227 MB	92 %
13 GB	2 hours	766 MB	94 %
12 GB	2 hours	658 MB	94 %
10 GB	1.5 hours	566 MB	94 %

Therefore, xz is a great tool to compress dump data, especially if that data is textual and mostly in a numeric form. Unluckily, xz turns out to be a little slow when applied with the default compression.
How long does it take to decompress the data? Well, it takes around 4 minutes for every file, which is much faster than the compression.
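The compression ratio in the table is simply the share of space saved. As a minimal sketch, the following hypothetical helper (compression_ratio is not part of any tool) computes it with integer arithmetic, assuming both sizes are expressed in MB and that 1 GB is 1024 MB. Since the sizes reported by ls are rounded, the result can differ by a point or two from the table.

```shell
# Space saved, as an integer percentage (rounded down).
# Both sizes must use the same unit (here: MB, with 1 GB taken as 1024 MB).
compression_ratio() {
    original="$1"
    compressed="$2"
    echo $(( (original - compressed) * 100 / original ))
}

compression_ratio 13312 766   # 13 GB dump compressed to 766 MB: prints 94
```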

Just as a comparison, compressing with -2 instead of -6 requires roughly one quarter to one third of the time and produces files that are around 40 % larger, e.g., 13 GB required 35 minutes instead of 120 minutes, and 1.1 GB of disk space instead of 0.77 GB. Let's see the results using -2 as the compression level:

File size	Time	Compressed size	Compression ratio
3.5 GB	10 minutes	338 MB	90 %
13 GB	35 minutes	1.1 GB	91 %
12 GB	37 minutes	918 MB	92 %
10 GB	30 minutes	786 MB	92 %

As you can see, using compression -2 can greatly improve the speed of compression with a minimal extra disk space requirement.
What about the directory format of dumping? Well, the same backup with pg_dump -Fd, which defaults to creating compressed objects, required 4.7 GB of disk space. The xz version requires from 3.1 GB (compression -2) to 2.2 GB (compression -6).

Conclusions

xz can help you save a lot of disk storage for textual (SQL) backups, but the default compression level could require a huge amount of time, especially on not-so-powerful machines. However, a lower level of compression can make pg_dump plus xz roughly as fast as pg_dump -Fd, with some extra space saving.

Monitoring Schema Changes via Last Commit Timestamp

2021-11-26T00:00:00+00:00

An ugly way to introspect database changes.

Monitoring Schema Changes via Last Commit Timestamp

A few days ago, a colleague of mine showed me that a commercial database keeps track of the timestamp of the last DDL change against database objects.
I began to mumble… is that possible in PostgreSQL? Of course it is, but what is the smartest way to achieve it?
I asked on the mailing list, because the first idea that came into my mind was to use commit timestamps.
Clearly, it is possible to implement something that can do the job using event triggers, which in short are triggers attached not to table tuples but rather to database events like DDL commands. Great! And in fact, a very good explanation can be found here.
In this article, I present my first idea about using commit timestamps.
The system used for the test is PostgreSQL 13.4 running on Fedora Linux, with only myself connected to it (this simplifies following transactions). The idea is, in any case, general and easy enough to be used on busy systems.

Introduction to `pg_last_committed_xact()`

The special function pg_last_committed_xact() allows the database administrator (or a user) to get information about which transaction has committed last.
Let’s see this in action:

% psql -U luca -h miguel -c 'select pg_last_committed_xact();'   testdb
ERROR:  could not get commit timestamp data
HINT:  Make sure the configuration parameter "track_commit_timestamp" is set.

First of all, in order to get information about the commit timestamps of transactions, the option track_commit_timestamp must be enabled.
Turning the parameter on and off will not preserve historic data: even if you had the parameter on and then turned it off, you will not be able to access the data collected in the meantime.
Let’s turn on the parameter and see how it works. track_commit_timestamp is a parameter with the postmaster context, and therefore requires a server restart!

% psql -U postgres -h miguel \
       -c 'ALTER SYSTEM SET track_commit_timestamp to "on"; ' \
       testdb
ALTER SYSTEM
% ssh luca@miguel 'sudo systemctl restart postgresql-13'

In the above I restarted a remote system via ssh; of course you are free to configure the parameter and restart the cluster with your preferred (or available) method.
It is now time to see which information we can get with track_commit_timestamp turned on.

testdb=> SELECT txid_current();
-[ RECORD 1 ]+-------------
txid_current | 380316302458

testdb=> SELECT *  FROM  txid_status( 380316302457 ), 
                         pg_last_committed_xact();
-[ RECORD 1 ]------------------------------
txid_status | committed
xid         | 2359180410
timestamp   | 2021-11-20 04:28:50.223275-05

Let’s dissect the above example:

txid_current() simulates a new transaction in one row, because the function gets a new xid (transaction identifier) even if not used for effective work;
txid_status() accepts a xid identifier and returns a string with the status of the transaction, and as shown, the fake transaction 380316302458 results in status committed;
pg_last_committed_xact() now is able to report both the xid and the timestamp at which the last transaction has committed, that is the transaction 380316302458 committed at 2021-11-20 04:28:50.223275-05.

Wait a minute: pg_last_committed_xact() states that the last committed transaction is 2359180410, not 380316302458. What is happening?
Wrap-around is on its way!
The above system has gone through a so-called xid wraparound, which is a normal situation in a long-running PostgreSQL instance. What this means is that txid_current() is returning a 64-bit value that also includes the number of wraparounds (the epoch), so it is, somehow, an absolute value. However, PostgreSQL “reasons” in terms of values modulo 2^32, therefore we must take into account this possible difference.
The above example therefore becomes:

testdb=> SELECT txid_current() as xid_absolute, 
                mod( txid_current(), pow( 2, 32 )::bigint )  as xid;
-[ RECORD 1 ]+-------------
xid_absolute | 380316302460
xid          | 2359180412

testdb=> SELECT *  FROM  
              txid_status( 380316302460 ) as xid_abs_status, 
              txid_status( 2359180412 ) as xid_status,  
              pg_last_committed_xact();
-[ RECORD 1 ]--+------------------------------
xid_abs_status | committed
xid_status     | 
xid            | 2359180412
timestamp      | 2021-11-20 04:34:54.531106-05

The above demonstrates that transactions 380316302460 and 2359180412 are the same, according to PostgreSQL. However, txid_status() requires an “absolute” xid number (note how the short transaction number does not report any status), while pg_last_committed_xact() reasons in terms of “running” numbers, i.e., the modulo ones.
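The same arithmetic can be checked outside the database. This is just a sketch with two hypothetical shell helpers (xid_epoch and xid_short are my own names): the quotient of the division by 2^32 is the number of wraparounds, usually called the epoch, while the remainder is the “running” xid.

```shell
# Split a 64-bit value returned by txid_current() into its two parts.
# 4294967296 is 2^32.
xid_epoch() { echo $(( $1 / 4294967296 )); }
xid_short() { echo $(( $1 % 4294967296 )); }

xid_epoch 380316302460   # prints 88
xid_short 380316302460   # prints 2359180412
```

In other words, this cluster has wrapped around 88 times, and 380316302460 is 88 * 2^32 + 2359180412.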
There is another interesting function to keep in mind: pg_xact_commit_timestamp() that, given a transaction identifier, returns the known commit timestamp:

testdb=> SELECT * FROM 
                pg_xact_commit_timestamp( 2359180412::text::xid ), 
                pg_last_committed_xact();
-[ RECORD 1 ]------------+------------------------------
pg_xact_commit_timestamp | 2021-11-20 04:34:54.531106-05
xid                      | 2359180412
timestamp                | 2021-11-20 04:34:54.531106-05

As you can see, the timestamp for the same transaction is always the same. Note that a bigint requires a conversion to text before being translated into a xid.

Tracking DDL Commands

Every table in PostgreSQL has two hidden fields that track the transaction ranges: xmin indicates the transaction that created a tuple, while xmax indicates the transaction that invalidated the tuple. This is used in the MVCC (Multi Version Concurrency Control) machinery that I’m not going to discuss here, so trust that everything works just fine.
The key point here is: every table has fields that track the transaction that generated the tuple. This applies also to system catalogs, and in particular (with regard to this article) to pg_class.
Having stated that, and knowing that every time a DDL command is applied something changes in the system catalogs, it is therefore possible to track when changes happened on a particular database object or table.
Let’s see this in action:

testdb=> BEGIN;
BEGIN
testdb=> SELECT txid_current() as xid_absolute, 
                mod( txid_current(), pow( 2, 32 )::bigint ) as xid, 
                current_timestamp;
-[ RECORD 1 ]-----+------------------------------
xid_absolute      | 380316302463
xid               | 2359180415
current_timestamp | 2021-11-20 05:11:56.343542-05

testdb=> CREATE TABLE ddl_test( 
           pk int generated always as identity, 
           t text );
CREATE TABLE
testdb=> COMMIT;
COMMIT

At timestamp 2021-11-20 05:11:56 the table ddl_test has been created. Since every DDL command in PostgreSQL is transactional, it is possible to track the transaction that committed such DDL (in the above example, 380316302463 alias 2359180415).
Let’s query pg_class to get information about last DDL commands on ddl_test table:

testdb=> SELECT age( xmin ) as transaction_before
                , xmin as it_was_transaction_number
                , pg_xact_commit_timestamp( xmin ) as modified_at
                , relname as table
       FROM pg_class
       WHERE relkind = 'r'
       AND relname   = 'ddl_test';
-[ RECORD 1 ]-------------+------------------------------
transaction_before        | 1
it_was_transaction_number | 2359180415
modified_at               | 2021-11-20 05:12:21.359126-05
table                     | ddl_test

The above query tells us that 1 transaction ago, transaction number 2359180415 modified the structure of ddl_test at timestamp 2021-11-20 05:12:21.359126-05.
Everything seems fine except for the timestamp: the one shown by current_timestamp within the transaction is not the same as the one reported by pg_xact_commit_timestamp(). The reason is that current_timestamp reports the moment the transaction started, while the commit timestamp is recorded when the transaction commits, and in this interactive session about 25 seconds passed between the two. However, checking deeper we can see that the data is coherent:

testdb=> SELECT * FROM pg_last_committed_xact();
-[ RECORD 1 ]----------------------------
xid       | 2359180415
timestamp | 2021-11-20 05:12:21.359126-05

So this is a first ugly but pretty much inexpensive way to track changes to the table.

Let’s now add a column to the table, to see whether this machinery can work:

testdb=> BEGIN;
BEGIN
testdb=> SELECT txid_current() as xid_absolute, mod( txid_current(), pow( 2, 32 )::bigint ) as xid, current_timestamp;
-[ RECORD 1 ]-----+------------------------------
xid_absolute      | 380316302464
xid               | 2359180416
current_timestamp | 2021-11-20 05:21:03.089031-05

testdb=> ALTER TABLE ddl_test ADD COLUMN tt text;
ALTER TABLE
testdb=> COMMIT;
COMMIT


testdb=> SELECT * FROM pg_last_committed_xact();
-[ RECORD 1 ]----------------------------
xid       | 2359180416
timestamp | 2021-11-20 05:21:32.376468-05

Transaction 2359180416 at timestamp 2021-11-20 05:21:32.376468-05 committed the ALTER TABLE. Let’s run again our query against pg_class:

testdb=> SELECT age( xmin ) as transaction_before
                , xmin as it_was_transaction_number
                , pg_xact_commit_timestamp( xmin ) as modified_at
                , relname as table
         FROM pg_class
         WHERE relkind = 'r'
         AND relname   = 'ddl_test';
-[ RECORD 1 ]-------------+------------------------------
transaction_before        | 1
it_was_transaction_number | 2359180416
modified_at               | 2021-11-20 05:21:32.376468-05
table                     | ddl_test

Therefore we now know when the table was last touched by a DDL command.
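Since the same query keeps coming back, it can be handy to generate it for any table. The following is a minimal sketch: last_ddl_query is a hypothetical helper of mine, it interpolates the table name as-is (so the name must come from a trusted source), and it only prints the SQL, leaving the execution to psql.

```shell
# Hypothetical helper: print the pg_class query for a given table name.
# The name is interpolated as-is, so it must come from a trusted source.
last_ddl_query() {
    cat <<EOF
SELECT relname,
       xmin,
       pg_xact_commit_timestamp( xmin ) AS modified_at
FROM pg_class
WHERE relkind = 'r'
AND relname   = '$1';
EOF
}

last_ddl_query ddl_test
```

The output can then be piped into a session, for example with last_ddl_query ddl_test | psql -h miguel -U luca testdb.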

Going Deeper: Introspection Against Columns

From the above we now know when a change happened to our table, but we don’t know which attribute has been changed. It is possible to push the same logic against other parts of the system catalog, for example pg_attribute that handles information about single table columns.
Here is the example applied to our demo table:

testdb=> SELECT xmin, attname, age( xmin ), pg_xact_commit_timestamp( xmin ) 
FROM pg_attribute
WHERE attrelid = 'ddl_test'::regclass;
    xmin    | attname  | age |   pg_xact_commit_timestamp    
------------+----------+-----+-------------------------------
 2359180415 | tableoid |   2 | 2021-11-20 05:12:21.359126-05
 2359180415 | cmax     |   2 | 2021-11-20 05:12:21.359126-05
 2359180415 | xmax     |   2 | 2021-11-20 05:12:21.359126-05
 2359180415 | cmin     |   2 | 2021-11-20 05:12:21.359126-05
 2359180415 | xmin     |   2 | 2021-11-20 05:12:21.359126-05
 2359180415 | ctid     |   2 | 2021-11-20 05:12:21.359126-05
 2359180415 | pk       |   2 | 2021-11-20 05:12:21.359126-05
 2359180415 | t        |   2 | 2021-11-20 05:12:21.359126-05
 2359180416 | tt       |   1 | 2021-11-20 05:21:32.376468-05

All the columns except tt have been created by the very same transaction at the very same timestamp, while tt has been touched by another transaction about 9 minutes later.
The above is not very useful, so it is possible to slightly improve the query into the following one:

testdb=> SELECT array_agg( attname ) as columns,  
                current_timestamp - pg_xact_commit_timestamp( xmin ) as when 
         FROM pg_attribute
         WHERE attrelid = 'ddl_test'::regclass
         GROUP BY pg_xact_commit_timestamp( xmin );
                 columns                  |      when       
------------------------------------------+-----------------
 {tableoid,cmax,xmax,cmin,xmin,ctid,pk,t} | 00:19:38.202794
 {tt}                                     | 00:10:27.185452

That reports all the columns “touched” at the very same time and how much time has elapsed since the last change. For example, the column tt has been changed 10 minutes ago, while the other columns 19 minutes ago.
Let’s do more changes to our table and see what happens; please note that everything is executed in autocommit mode:

testdb=> ALTER TABLE ddl_test ADD COLUMN ttt text;
ALTER TABLE

testdb=> ALTER TABLE ddl_test 
           ALTER COLUMN tt SET DEFAULT 'FizzBuzz';
ALTER TABLE


testdb=> ALTER TABLE ddl_test DROP COLUMN t;
ALTER TABLE

testdb=> SELECT * FROM pg_last_committed_xact();
    xid     |          timestamp           
------------+------------------------------
 2359180419 | 2021-11-20 05:36:48.54285-05

If we inspect again pg_attribute we have:

testdb=> SELECT array_agg( attname ) as columns,  
                current_timestamp - pg_xact_commit_timestamp( xmin ) as time_ago, 
                pg_xact_commit_timestamp( xmin ) as when       
         FROM pg_attribute
         WHERE attrelid = 'ddl_test'::regclass
         GROUP BY pg_xact_commit_timestamp( xmin );
                columns                 |    time_ago     |             when              
----------------------------------------+-----------------+-------------------------------
 {tableoid,cmax,xmax,cmin,xmin,ctid,pk} | 00:26:25.685984 | 2021-11-20 05:12:21.359126-05
 {ttt}                                  | 00:04:08.244367 | 2021-11-20 05:34:38.800743-05
 {tt}                                   | 00:02:31.791574 | 2021-11-20 05:36:15.253536-05
 {........pg.dropped.2........}         | 00:01:58.50226  | 2021-11-20 05:36:48.54285-05


testdb=> SELECT age( xmin ) as transaction_before
                , xmin as it_was_transaction_number
                , pg_xact_commit_timestamp( xmin ) as modified_at
                , relname as table
         FROM pg_class
         WHERE relkind = 'r'
         AND relname   = 'ddl_test';
-[ RECORD 1 ]-------------+------------------------------
transaction_before        | 3
it_was_transaction_number | 2359180417
modified_at               | 2021-11-20 05:34:38.800743-05
table                     | ddl_test

There are some interesting things in the above output. First of all, pg_class reports only the changes related to new attributes, not the dropped ones or the internally changed ones. On the other hand, pg_attribute reports information about every single attribute, including those changed in a “minor” way (the SET DEFAULT for instance).
Please note how the dropped column (namely t) is no longer visible, even if there is pg.dropped.2 that clearly refers to such column. In the above example it is easy enough: only one column has been dropped in a single user instance, however in a more concurrent system it is hard to keep track of the information related to dropped attributes. For more information about the dropped columns, please see my previous article about why PostgreSQL does not reclaim disk space on column drop.
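The number in the placeholder name is the attribute number of the dropped column: t was the second column of the table, hence pg.dropped.2. As a small sketch, a hypothetical helper (dropped_attnum is my own name) can extract that number from the placeholder with sed.

```shell
# Hypothetical helper: extract N from a name like ........pg.dropped.N........
# Prints nothing when the name does not follow that pattern.
dropped_attnum() {
    printf '%s\n' "$1" | sed -n 's/^\.*pg\.dropped\.\([0-9][0-9]*\)\.*$/\1/p'
}

dropped_attnum '........pg.dropped.2........'   # prints 2
```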

What about `VACUUM`?

The VACUUM FULL command totally rewrites a table, and this means that all the information about the transactions that have “touched” the system catalogs is updated by a newer transaction. This does not mean that VACUUM is a transactional command, rather it happens to do a CREATE TABLE pretty much as we did manually.

testdb=> VACUUM FULL ddl_test;                                                                                 
VACUUM
testdb=> SELECT age( xmin ) as transaction_before
                , xmin as it_was_transaction_number
                , current_timestamp - pg_xact_commit_timestamp( xmin ) as modified_since
                , relname as table
         FROM pg_class
         WHERE relkind = 'r'
         AND relname   = 'ddl_test';
 transaction_before | it_was_transaction_number | modified_since  |  table   
--------------------+---------------------------+-----------------+----------
                  1 |                2359180423 | 00:00:02.615678 | ddl_test
(1 row)

testdb=> SELECT array_agg( attname ) as columns,  
                current_timestamp - pg_xact_commit_timestamp( xmin ) as when 
         FROM pg_attribute
         WHERE attrelid = 'ddl_test'::regclass
         GROUP BY pg_xact_commit_timestamp( xmin );
                columns                 |      when       
----------------------------------------+-----------------
 {tableoid,cmax,xmax,cmin,xmin,ctid,pk} | 00:50:03.972343
 {ttt}                                  | 00:27:46.530726
 {tt}                                   | 00:26:10.077933
 {........pg.dropped.2........}         | 00:25:36.788619

It is interesting to note an apparent inconsistency: the table has been modified 2 seconds ago while the columns have been touched between 25 and 50 minutes ago. How is that possible? Well, VACUUM FULL has rewritten the table but metadata about columns did not change.
In short, this is an indicator about VACUUM FULL execution: if the change time of a table is more recent than that of all its columns, probably VACUUM FULL ran. The statistics views like pg_stat_user_tables are the proper place to know when a plain VACUUM ran, but keep in mind that, according to the documentation, their last_vacuum column does not count VACUUM FULL. In any case, combining all this information helps in understanding what happened in the system.
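The comparison between the table and its columns can be sketched in a few lines of shell. This is only a rough heuristic under loud assumptions: probably_rewritten is a hypothetical helper, it compares xmin values as plain numbers (so it assumes no wraparound happened between them), and other commands that touch only pg_class, such as renaming the table, would produce the same pattern.

```shell
# Hypothetical check: succeeds when the table xmin (first argument)
# is greater than every column xmin (remaining arguments).
# Plain numeric comparison: assumes no xid wraparound in between.
probably_rewritten() {
    table_xmin="$1"
    shift
    for column_xmin in "$@"; do
        if [ "$table_xmin" -le "$column_xmin" ]; then
            return 1
        fi
    done
    return 0
}

# Table xmin after VACUUM FULL, followed by some of the column xmin values seen so far.
if probably_rewritten 2359180423 2359180415 2359180417 2359180419; then
    echo "pg_class row is newer than all the columns: table likely rewritten"
fi
```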

Let’s see something about VACUUM:

testdb=> SELECT age( xmin ) as transaction_before
                , xmin as it_was_transaction_number
                , current_timestamp - pg_xact_commit_timestamp( xmin ) as modified_since
                , relname as table
         FROM pg_class
         WHERE relkind = 'r'
         AND relname   = 'ddl_test';
-[ RECORD 1 ]-------------+----------------
transaction_before        | 11
it_was_transaction_number | 2359180423
modified_since            | 00:20:41.953205
table                     | ddl_test


testdb=> VACUUM ddl_test;
VACUUM

testdb=> SELECT age( xmin ) as transaction_before
                , xmin as it_was_transaction_number
                , current_timestamp - pg_xact_commit_timestamp( xmin ) as modified_since
                , relname as table
         FROM pg_class
         WHERE relkind = 'r'
         AND relname   = 'ddl_test';
-[ RECORD 1 ]-------------+----------------
transaction_before        | 11
it_was_transaction_number | 2359180423
modified_since            | 00:20:51.272209
table                     | ddl_test

The result is that pg_class is unchanged, with regard to the transaction that generated the tuple.
Why?
Since VACUUM is a command that cannot be run within a transaction, it cannot be considered in the described workflow, therefore it is like an invisible command (with regard to transactions).

What about `ANALYZE`?

Unlike VACUUM, the command ANALYZE can be run in a transaction, and this is clearly shown by the age result increasing by one:

testdb=> SELECT age( xmin ) as transaction_before
                , xmin as it_was_transaction_number
                , current_timestamp - pg_xact_commit_timestamp( xmin ) as modified_since
                , relname as table
         FROM pg_class
         WHERE relkind = 'r'
         AND relname   = 'ddl_test';
-[ RECORD 1 ]-------------+----------------
transaction_before        | 14
it_was_transaction_number | 2359180423
modified_since            | 01:37:54.483495
table                     | ddl_test


testdb=> ANALYZE ddl_test;
ANALYZE

testdb=> SELECT age( xmin ) as transaction_before
                , xmin as it_was_transaction_number
                , current_timestamp - pg_xact_commit_timestamp( xmin ) as modified_since
                , relname as table
         FROM pg_class
         WHERE relkind = 'r'
         AND relname   = 'ddl_test';
-[ RECORD 1 ]-------------+----------------
transaction_before        | 15
it_was_transaction_number | 2359180423
modified_since            | 01:38:05.267443
table                     | ddl_test

What is not changing in the above example is the transaction that generated the tuple in pg_class: it is always 2359180423, before and after the ANALYZE command (that did run in a transaction).
Why?
Well, ANALYZE hits another table: pg_statistic. Such table is where the planner statistics live (the ones exposed by the pg_stats view), and is the one updated by ANALYZE. This can be clearly inspected with a similar query:

testdb=# SELECT xmin, age( xmin ), staattnum 
         FROM pg_statistic 
         WHERE starelid = 'ddl_test'::regclass;
    xmin    | age | staattnum 
------------+-----+-----------
 2359180437 |   1 |         1
 2359180437 |   1 |         3
 2359180437 |   1 |         4

Please note that the query has been run as a superuser, because of the privileges it needs. The result set is made of three rows because there are three “active” (i.e., not dropped) columns within the table, and all of them have been modified (from a statistics point of view) by ANALYZE, that ran in transaction 2359180437, which is now one transaction away (i.e., it was the previous transaction).

Conclusions

Keeping track of commit timestamps could be useful for database introspection, at least to get a glance at when things changed.
The same trick can also be used against regular table tuples, to get an idea of when a tuple appeared in that form in the table.
However, this is not a very good approach, and something much more complete can be built, for example using the already mentioned event triggers.
But hey, this is PostgreSQL: you can extend it in pretty much any direction!

pgenv `config migrate`

2021-11-24T00:00:00+00:00

pgenv 1.2.1 introduces a different configuration setup.

`pgenv config migrate`

Just a few hours after I blogged about some new cool features in pgenv, I completed the work to keep the configuration in one place.
Now pgenv will keep all configuration files in a single directory, named config. This is useful because it allows you to back up and/or migrate all the configuration from one machine to another easily.
But that’s not all: since the configuration is now under a single directory, the name of each configuration file has changed. Before this release, a configuration file was named like .pgenv.PGVERSION.conf, with the .pgenv prefix that both made the file hidden and stated which application the file belongs to. Since the configuration files are now in a subdirectory, the prefix has been dropped, so that every configuration file is now simply named PGVERSION.conf, for example 10.4.conf.
And since we like to make things easy, there is a config migrate command that helps you move your existing configuration from the old naming scheme to the new one:

% pgenv config migrate
Migrated 3 configuration file(s) from previous versions (0 not migrated)
Your configuration file(s) are now into [~/git/misc/PostgreSQL/pgenv/config]
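To give an idea of what the renaming amounts to, here is a minimal sketch. It is not the code pgenv runs: new_config_name is a hypothetical helper that just strips the old .pgenv. prefix by means of a shell parameter expansion.

```shell
# Hypothetical helper, not pgenv's actual code: map an old-style
# configuration file name to the new naming scheme.
new_config_name() {
    old="$1"
    echo "${old#.pgenv.}"
}

new_config_name .pgenv.10.4.conf   # prints 10.4.conf
```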

Let’s have fun with pgenv!

New features in pgenv

2021-11-18T00:00:00+00:00

pgenv 1.2 introduces a few nice features.

New features in `pgenv`

pgenv is a great tool to simply manage different binary installations of PostgreSQL.
It is a shell script, specifically designed for the Bash shell, that provides a single command named pgenv that accepts sub-commands to fetch, configure, install, start and stop different PostgreSQL versions on the same machine.
It is not designed to be used in production or in an enterprise environment, even if it could, but rather it is designed to be used as a compact and simple way to switch between different versions in order to test applications and libraries.

In the last few weeks, there has been quite some work around pgenv, most notably:

support for multiple configuration flags;
consistent behavior about configuration files.

In the following, I briefly describe each of the above.

Support for multiple configuration flags

pgenv does support configuration files, where you can store shell variables that drive the PostgreSQL build and configuration. One problem pgenv had was due to a limitation of shell environment variables: since each of them represents a single value, passing multiple values separated by spaces was not possible. This made build flags, e.g., CFLAGS, hard to write if not impossible.
Since this commit, David (the original author) introduced the capability to configure options containing spaces. The trick was to switch from simple environment variables to Bash arrays, so that the configuration can be written as

PGENV_CONFIGURE_OPTIONS=(
    --with-perl
    --with-openssl
    'CFLAGS=-I/opt/local/opt/openssl/include -I/opt/local/opt/libxml2/include'
    'LDFLAGS=-L/opt/local/opt/openssl/lib -L/opt/local/opt/libxml2/lib'
)

where the CFLAGS and LDFLAGS both contain spaces.
To be coherent, this also renamed a lot of _OPT_ parameters to _OPTIONS_ to reflect the fact that they now can contain multiple values.
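The difference between a flat variable and an array is easy to see with a tiny experiment. This is a sketch of the general Bash behavior, not of the pgenv code: count_args is a hypothetical helper that prints how many arguments it receives, and the include paths are made up.

```shell
#!/usr/bin/env bash
# Requires Bash: arrays are not available in a plain POSIX sh.
count_args() { echo "$#"; }

# A flat string is split on every space when expanded unquoted,
# so the single CFLAGS value is broken into two arguments.
flat='--with-perl CFLAGS=-I/a/include -I/b/include'
count_args $flat                 # prints 3

# An array keeps each element intact, spaces included.
options=( --with-perl 'CFLAGS=-I/a/include -I/b/include' )
count_args "${options[@]}"       # prints 2
```

With the array, configure would receive CFLAGS as one argument holding both include directories.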

Consistent behavior about configuration files

pgenv exploits a default configuration file when no specific PostgreSQL configuration is found. The idea is that, if you launch PostgreSQL version x, a .pgenv.x.conf file is searched for, and if not found, the command tries to load the configuration from a default file named .pgenv.default.conf.
However, deleting a configuration used to remove the default configuration as well.
Therefore, since this commit, there is more consistency in the usage of the config subcommand.
In particular, in order to delete the default configuration you have to specify config delete default explicitly, since config delete will no longer nuke your default configuration. Moreover, the config init command has been added, so that you can initialize the configuration and then modify it by means of the config write command. Why these two commands? Well, config init will create a “default” configuration file from scratch with current default settings, while config write will modify the specified configuration.
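The lookup order described above (version-specific file first, default file as a fallback) can be sketched as follows. This is an illustration under my own assumptions, not the pgenv implementation: pick_config is a hypothetical helper and the demo runs in a scratch directory.

```shell
# Hypothetical helper, not pgenv's actual code: print the configuration file
# to load for a version, falling back to the default one.
pick_config() {
    dir="$1"
    version="$2"
    if [ -f "$dir/.pgenv.$version.conf" ]; then
        echo "$dir/.pgenv.$version.conf"
    elif [ -f "$dir/.pgenv.default.conf" ]; then
        echo "$dir/.pgenv.default.conf"
    else
        return 1
    fi
}

# Demo in a scratch directory holding only a default configuration.
demo_dir=$(mktemp -d)
touch "$demo_dir/.pgenv.default.conf"
pick_config "$demo_dir" 13.4     # prints the path of .pgenv.default.conf
touch "$demo_dir/.pgenv.13.4.conf"
pick_config "$demo_dir" 13.4     # now prints the path of .pgenv.13.4.conf
rm -r "$demo_dir"
```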

There is more…

I’m currently working on another change in the configuration subsystem, so that you can keep all the configuration files in a single directory. The idea is to ease the migration of pgenv to a different machine (e.g., a new one), keeping your own configuration.

My Perl Weekly Challenge Solutions in PostgreSQL

2021-11-09T00:00:00+00:00

Pushing PostgreSQL solutions to my own repositories.

My Perl Weekly Challenge Solutions in PostgreSQL

Starting back at Perl Weekly Challenge 136, I decided to try to implement, whenever possible (to me), the challenges not only in Raku (i.e., Perl 6), but also in PostgreSQL (either pure SQL or plpgsql).

Recently, I modified my sync script that drags solutions from the official Perl Weekly Challenge repository to my own repositories, and of course, I added a way to synchronize PostgreSQL solutions.

The solutions are now available on GitHub under the PWC directory of my PostgreSQL examples repository.

PostgreSQL USB Sticks in the Attic!

2021-11-08T00:00:00+00:00

USB sticks I found in the attic…

PostgreSQL USB Sticks in the Attic!

TLDR: this is not a technical post!

Cleaning the attic, I found a couple of old PostgreSQL USB Sticks.
It happened that, back at the Italian PostgreSQL Day (PGDay.IT) 2012, we (at the time I was a happy member of ITPUG) created PostgreSQL-branded USB sticks to give away as gadgets to participants.
The USB stick was cool, with a soft rubber envelope, a clear white and blue elephant logo on its sides, a size of 4 GB (which back then was quite common) and a necklace.
However, it had something that I didn’t like.
So, when I was the ITPUG president back in 2013, I decided to change the design of the USB stick (as well as doubling its size).
Let’s inspect the differences, and my apologies if the printing on the sticks is not clear anymore, but well, some years have gone by:

The upper stick is the 2012 edition, the lower one is the 2013 edition.
Do you spot the difference?
Yes, the 2013 edition USB stick did have the PostgreSQL logo on one side and the ITPUG logo on the other side, while the 2012 edition did not have any reference to the organizing and local user group ITPUG!

When I decided to give a new spark to ITPUG, I also decided to improve its visibility via such gadgets, which were too generic and, for this reason, also reusable in other events as PostgreSQL-related gadgets.

Therefore, the new gadget presented both PostgreSQL and the Italian users’ group, no shame at all!

pg_upgrade and OpenBSD

2021-11-05T00:00:00+00:00

OpenBSD ships pg_upgrade as a separate package.

pg_upgrade and OpenBSD

I had never noticed that, on OpenBSD, the pg_upgrade command is not shipped with the default PostgreSQL server installation. I usually install PostgreSQL from sources, so I never dug into OpenBSD packages. The choice of OpenBSD is to keep pg_upgrade separate from the rest of the binaries and executables of PostgreSQL.
Allow me to explain, and let’s start from the installed binaries on an OpenBSD 7.0 machine:

% ls -1 /usr/local/bin/pg*
/usr/local/bin/pg_archivecleanup
/usr/local/bin/pg_basebackup
/usr/local/bin/pg_checksums
/usr/local/bin/pg_config
/usr/local/bin/pg_controldata
/usr/local/bin/pg_ctl
/usr/local/bin/pg_dump
/usr/local/bin/pg_dumpall
/usr/local/bin/pg_isready
/usr/local/bin/pg_receivewal
/usr/local/bin/pg_recvlogical
/usr/local/bin/pg_resetwal
/usr/local/bin/pg_restore
/usr/local/bin/pg_rewind
/usr/local/bin/pg_standby
/usr/local/bin/pg_test_fsync
/usr/local/bin/pg_test_timing
/usr/local/bin/pg_verifybackup
/usr/local/bin/pg_waldump
/usr/local/bin/pgbench

The server is PostgreSQL 13.4, installed via pkg_add. The PostgreSQL contrib module is installed but, as you can see, there is no pg_upgrade binary in the above listing.
Let’s inspect the packages:

% pkg_info -Q postgresql

postgresql-client-13.4p0 (installed)
postgresql-contrib-13.4p0 (installed)
postgresql-docs-13.4p0 (installed)
postgresql-odbc-10.02.0000p0
postgresql-pg_upgrade-13.4p0
postgresql-pllua-2.0.7
postgresql-plpython-13.4p0
postgresql-plr-8.4.1
postgresql-server-13.4p0 (installed)

Please note the postgresql-pg_upgrade-13.4p0 package, which is the one that contains the pg_upgrade command:

% pkg_info  postgresql-pg_upgrade-13.4p0 
Information for https://cdn.openbsd.org/pub/OpenBSD/7.0/packages/amd64/postgresql-pg_upgrade-13.4p0.tgz

Comment:
Support for upgrading PostgreSQL data from previous version

Description:
Contains pg_upgrade, used for upgrading PostgreSQL database
directories to newer major versions without requiring a dump and
reload.

Maintainer: Pierre-Emmanuel Andre <[email protected]>

WWW: https://www.postgresql.org/

This choice of packaging is somewhat strange.
Let’s install pg_upgrade:

% doas pkg_add postgresql-pg_upgrade
quirks-4.53 signed on 2021-10-30T11:32:24Z
postgresql-pg_upgrade-13.4p0:postgresql-previous-12.8: ok
postgresql-pg_upgrade-13.4p0: ok

% ls -lh $(which pg_upgrade)
-rwxr-xr-x  1 root  bin   185K Sep 26 21:25 /usr/local/bin/pg_upgrade

So, the binary itself is tiny, just 185 kB, therefore placing it in its own package cannot be a matter of disk space. However, please note that installing pg_upgrade also triggered the installation of postgresql-previous-12.8, which means the system now has PostgreSQL 12.8 installed as well.
This is clearly shown by querying that package:

% pkg_info postgresql-previous-12.8   
Information for inst:postgresql-previous-12.8

Comment:
PostgreSQL RDBMS (previous version, for pg_upgrade)

Required by:
postgresql-pg_upgrade-13.4p0

Description:
PostgreSQL RDBMS server, the previous version

This is the previous version of PostgreSQL, necessary to allow for
pg_upgrade to work in the currently supported PostgreSQL version.

And in fact, the package installs the whole previous version of PostgreSQL, libraries and executables included:

% pkg_info -L postgresql-previous-12.8 | grep bin
/usr/local/bin/postgresql-12/clusterdb
/usr/local/bin/postgresql-12/createdb
/usr/local/bin/postgresql-12/createuser
/usr/local/bin/postgresql-12/dropdb
/usr/local/bin/postgresql-12/dropuser
/usr/local/bin/postgresql-12/ecpg
/usr/local/bin/postgresql-12/initdb
/usr/local/bin/postgresql-12/oid2name
/usr/local/bin/postgresql-12/pg_archivecleanup
/usr/local/bin/postgresql-12/pg_basebackup
/usr/local/bin/postgresql-12/pg_checksums
/usr/local/bin/postgresql-12/pg_config
...

Therefore, installing pg_upgrade will also install the whole previous major version of PostgreSQL.
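
The reason both sets of binaries matter is that pg_upgrade must be told where the old and the new executables live. The following is only a sketch that prints a possible command line instead of running it: the two binary directories come from the listings above, while the two data directories are assumptions of mine and must be adapted to your system.

```sh
#!/bin/sh
# Sketch only: prints a pg_upgrade command line, it does not run it.

OLD_BIN=/usr/local/bin/postgresql-12   # binaries installed by postgresql-previous
NEW_BIN=/usr/local/bin                 # binaries of the current 13.4 server
OLD_DATA=/var/postgresql/data-12       # assumption: where the old cluster lives
NEW_DATA=/var/postgresql/data          # assumption: a freshly initialized 13 cluster

pg_upgrade_cmd() {
    # --check only verifies that the two clusters are compatible
    echo "pg_upgrade --check -b $OLD_BIN -B $NEW_BIN -d $OLD_DATA -D $NEW_DATA"
}

pg_upgrade_cmd
```

The -b/-B and -d/-D options are the old/new binary and data directories respectively; dropping --check would perform the real upgrade.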

It has been a separate package for a while…

Inspecting the CVS history of the ports tree, you can see that the pg_upgrade command was split into a subpackage back in 2016:

This moves pg_upgrade to a subpackage, and has that
subpackage depend on postgresql-previous.

In fact, this is the commit that made pg_upgrade a distinct package in the build system.
The rationale can be found in the b2k16 hackathon article, where Jeremy Evans explains that, in order to get pg_upgrade to work, the previous PostgreSQL binaries are needed. Therefore, the application has been moved to a different package, so that it can also install the previous binaries on the system.

Conclusions

The choice of keeping pg_upgrade as a separate package is just that, a choice. I don’t think it is right or wrong: it simply ensures that, when you install pg_upgrade for the newer PostgreSQL, you also have the previous version to upgrade from.
Quite frankly, I don’t see the reason, because I could have a different database version on the system that I want to upgrade from, even one I did not install from ports.
Moreover, pg_upgrade can upgrade PostgreSQL even across non-sequential versions, although I personally don’t recommend this, especially if the “hole” in versioning is big. However, this means that installing the previous version of PostgreSQL might not be the right choice in every scenario. Again, this is neither a good nor a bad choice, it is just a choice, and it must be noted that, unlike other operating systems, OpenBSD does not offer old versions of PostgreSQL as packages (if we exclude the -previous package), which makes the choice coherent with the philosophy of the operating system.

Perl Weekly Challenge 136: PostgreSQL Solutions

2021-10-29T00:00:00+00:00

My personal solutions to the Perl Weekly Challenge, this time in PostgreSQL!

Perl Weekly Challenge 136: PostgreSQL Solutions

Wait a minute, what the hell is going on? A Perl challenge and PostgreSQL?
Well, it is almost two years now since I started participating regularly in the Perl Weekly Challenge, and I always solve the tasks in Raku (aka Perl 6).
Today I decided to spend a few minutes trying to solve the assigned tasks in PostgreSQL. And I tried to solve them in an SQL way: declaratively.

So here are my solutions in PostgreSQL for Challenge 136.

Task 1
Task 2

PWC 136 - Task 1

The first task asked to find out if two numbers are friends, meaning that their greatest common divisor should be a positive power of 2. This is quite easy to implement in pure SQL:

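CREATE OR REPLACE FUNCTION friendly( m int, n int )
RETURNS int
AS $CODE$
   -- g is a positive power of 2 when it is greater than 1
   -- and has a single bit set, that is g & (g - 1) is 0
   SELECT
        CASE
             WHEN g > 1 AND ( g & ( g - 1 ) ) = 0 THEN 1
             ELSE 0
        END
   FROM ( SELECT gcd( m, n ) AS g ) AS d;
$CODE$
LANGUAGE SQL;

The gcd function computes the greatest common divisor. Being even is not enough for it to be a power of 2 (6 is even, but it is not a power of 2), so the CASE checks that the gcd is greater than 1 and that g & (g - 1) is 0, which happens only when a single bit is set.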
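
As a cross-check outside the database, the “positive power of 2” test can be sketched in plain shell arithmetic. The helper names below are hypothetical and not part of my solution; the idea is to strip every factor 2 from the gcd and see whether anything else is left, so that a gcd of 6 is rejected even though it is even.

```sh
#!/bin/sh
# Hypothetical helpers, only meant to cross-check the SQL function.

# gcd A B: greatest common divisor via Euclid's algorithm.
gcd() {
    a=$1
    b=$2
    while [ "$b" -ne 0 ]; do
        t=$((a % b))
        a=$b
        b=$t
    done
    echo "$a"
}

# friendly M N: prints 1 when gcd(M, N) is 2^p with p > 0, else 0.
friendly() {
    g=$(gcd "$1" "$2")
    if [ "$g" -lt 2 ]; then
        echo 0
        return
    fi
    # Strip every factor 2: a power of 2 is reduced to exactly 1.
    while [ $((g % 2)) -eq 0 ]; do
        g=$((g / 2))
    done
    if [ "$g" -eq 1 ]; then echo 1; else echo 0; fi
}

friendly 8 24    # gcd is 8
friendly 26 39   # gcd is 13
friendly 12 18   # gcd is 6: even, but not a power of 2
```

The three calls print 1, 0 and 0 respectively.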

PWC 136 - Task 2

The second task was much more complicated to solve, and required, at least for me, a little trial and error. Given a specific value, we need to find all the unique combinations of numbers within the Fibonacci sequence that sum to that value.
I decided to solve it via a RECURSIVE Common Table Expression (CTE), because I need to produce a Fibonacci series:

CREATE OR REPLACE FUNCTION fibonacci_sum( l int DEFAULT 16 )
RETURNS bigint
AS $CODE$

WITH RECURSIVE
fibonacci( n, p ) AS
(
        SELECT 1, 1
        UNION
        SELECT p + n, n
        FROM fibonacci
        WHERE n < l
)
, permutations AS
(
        SELECT n::text AS current_value, n as total_sum
        FROM fibonacci
        UNION
        SELECT current_value || ',' || n, total_sum + n
        FROM permutations, fibonacci
        WHERE
                position( n::text in  current_value ) = 0
       AND n > ALL( string_to_array( current_value, ',' )::int[] )


)
SELECT count(*)
FROM permutations
WHERE total_sum = l
;

$CODE$
LANGUAGE SQL;

The searched value is the argument to the function, that is l.
The first part of the CTE computes the Fibonacci values only until they reach l: bigger values can be thrown away, since any sum including them would be greater than l.
The permutations CTE builds two columns: the comma-separated list of the values picked so far, and their running sum. At each step, one more Fibonacci value is appended to the list. Note the WHERE clause:

the position function checks that the number has not already been inserted in the list;
the n > ALL considers only ordered values, that is 3,5 is a good list, but 5,3 is not, because the appended 3 is not greater than 5.

Thanks to the trick of considering only ordered sequences, I can trim out all the sequences that produce the same sum, with the same numbers, in a different order. For example 3,13 and 13,3 produce the same value, but only the first one is kept.
At this point, it suffices to count the tuples in permutations whose total_sum equals l to get the final answer of the task: how many combinations of distinct Fibonacci numbers sum to l.
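
To double check the result outside the database, the same counting can be sketched as a small recursive shell script. This is not a translation of the CTE: it is a plain “take it or leave it” enumeration over the distinct Fibonacci values 1, 2, 3, 5, 8, … and all the function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical cross-check for the fibonacci_sum SQL function.

# fib_upto LIMIT: prints the distinct Fibonacci numbers not greater than LIMIT.
fib_upto() {
    limit=$1
    a=1
    b=2
    out=""
    while [ "$a" -le "$limit" ]; do
        out="$out $a"
        next=$((a + b))
        a=$b
        b=$next
    done
    echo $out
}

# count_sums REMAINING CANDIDATE...: counts the subsets of the candidates
# that add up to REMAINING. Each candidate is either taken or skipped;
# the recursive calls run in subshells, so variables are not clobbered.
count_sums() {
    remaining=$1
    shift
    if [ "$remaining" -eq 0 ]; then echo 1; return; fi
    if [ "$remaining" -lt 0 ] || [ "$#" -eq 0 ]; then echo 0; return; fi
    first=$1
    shift
    with=$(count_sums $((remaining - first)) "$@")
    without=$(count_sums "$remaining" "$@")
    echo $((with + without))
}

fibonacci_sum() {
    # unquoted on purpose: one argument per Fibonacci number
    count_sums "$1" $(fib_upto "$1")
}

fibonacci_sum 16   # 3+13, 1+2+13, 3+5+8, 1+2+5+8
```

For 16 the script prints 4, one for each combination listed in the comment.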

Conclusions

Clearly PostgreSQL provides all the features to implement program-like behaviors in a declarative way. Of course, the above solutions are neither the best nor the most efficient that can be implemented, but they demonstrate how powerful PostgreSQL (and, more in general, SQL) can be in solving tasks where a few nested loops seem the simpler approach!

pspg lands in OpenBSD

2021-10-28T00:00:00+00:00

A great pager in a great operating system.

pspg lands in OpenBSD

pspg is a great pager specifically designed for PostgreSQL, or better, for psql, the default and powerful text client for PostgreSQL databases.
But pspg is more than simply a pager for PostgreSQL: it is a general purpose pager for tabular data.

It happened that a few weeks ago I was using an OpenBSD system, and since I had to do some work with PostgreSQL, I decided to install pspg to get some advantages. Unfortunately, there was no package for OpenBSD and, most notably, no port in the ports tree.
Therefore, the only chance to install pspg was to compile it from sources, but I failed. I opened an issue to get some help, and after some assistance, I decided to dig deeper. So I asked for help on the misc OpenBSD mailing list and got much more than I was expecting: not only did I solve the problem of how to install pspg, but the application was noticed and a proposal for a new port was issued.
In fact, another Italian guy, Omar, prepared and proposed a pspg port, and after a few days the port got included into the ports tree!

What does that mean? That, at least at the moment of writing, you can get pspg installed on OpenBSD via the ports:

% cd /usr/ports/databases/pspg
% doas make install  
===> pspg-5.4.0 depends on: postgresql-client-* -> postgresql-client-13.4p0
===> pspg-5.4.0 depends on: readline-* -> readline-7.0p0
===> pspg-5.4.0 depends on: metaauto-* -> metaauto-1.0p4
===> pspg-5.4.0 depends on: autoconf-2.69 -> autoconf-2.69p3
===> pspg-5.4.0 depends on: gmake-* -> gmake-4.3
===>  Verifying specs: c curses ereadline m panel pq
===>  found c.96.1 curses.14.0 ereadline.2.0 m.10.1 panel.6.0 pq.6.12
===>  Installing pspg-5.4.0 from /usr/ports/packages/amd64/all/
pspg-5.4.0: ok

It is important to note that the ports tree that includes pspg, at the time of writing, is the -CURRENT one, and therefore there is still some time to wait to get pspg as a package and a port in the -RELEASE ports tree.

Great OpenBSD Job!

I must say that I was astonished by the great work done by Omar and the other OpenBSD volunteers to get pspg into the ports tree.

Conclusions

pspg is a very useful and interesting pager for tabular-like data, and of course this includes output from PostgreSQL’s psql command line client.
With a bit of luck, patience, and the effort of the OpenBSD community, this program will soon be available on OpenBSD as a package too!

Installing PostgreSQL on OpenBSD

2021-10-09T00:00:00+00:00

A quick look at how to get PostgreSQL up and running on OpenBSD.

Installing PostgreSQL on OpenBSD

OpenBSD is a rock solid, super secure, real Unix operating system.
PostgreSQL is a rock solid, enterprise level, fully featured relational database.
Is it possible to merge the two for a great database on such an operating system? Yes, of course!

OpenBSD has a packaging system that is somewhat different from many other operating systems; in particular the packages are deeply inspected before they are installed, so that the installation process proceeds only if it is really sure the installation can succeed. Moreover, the operating system provides a simple and flexible way to manage services like PostgreSQL.
In this short article, I will show how you can start working with PostgreSQL on OpenBSD.

Packages

OpenBSD uses the pkg_xxx tools, a set of intelligent Perl applications that handle all the packaging mechanics. While it is true that you can install an application out of ports, like on other BSDs, OpenBSD recommends installing via packages because the system can easily track what you have installed so far and, consequently, handle updates.

The first thing to do is therefore to search for PostgreSQL related packages, and this is done by means of the pkg_info command, with the particular -Q flag (for “query”):

puffy# pkg_info -Q postgresql
debug-dovecot-postgresql-2.3.15v0
debug-dovecot-postgresql-2.3.16v0
dovecot-postgresql-2.3.15v0
dovecot-postgresql-2.3.16v0
postgresql-client-13.4
postgresql-contrib-13.4
postgresql-docs-13.4
postgresql-pg_upgrade-13.4
postgresql-plpython-13.4
postgresql-previous-12.8
postgresql-server-13.4

As you can see, the currently supported version of PostgreSQL is 13.4, while version 14 has already been out for a few days (at the time of writing).
Installing packages is done by means of pkg_add, and in this case there is no particular flavour (i.e., configuration or stack) required, so it is as simple as:

puffy# pkg_add postgresql-server-13.4 postgresql-client-13.4 postgresql-contrib-13.4 postgresql-docs-13.4
quirks-3.633 signed on 2021-10-05T18:48:49Z
postgresql-server-13.4:libexecinfo-0.3p2v0: ok
postgresql-server-13.4:xz-5.2.5: ok
postgresql-server-13.4:libiconv-1.16p0: ok
postgresql-server-13.4:libxml-2.9.10p3: ok
postgresql-server-13.4:postgresql-client-13.4: ok
useradd: Warning: home directory `/var/postgresql' doesn't exist, and -m was not specified
postgresql-server-13.4: ok
postgresql-contrib-13.4: ok
postgresql-docs-13.4: ok
Running tags: ok
The following new rcscripts were installed: /etc/rc.d/postgresql
See rcctl(8) for details.
New and changed readme(s):
        /usr/local/share/doc/pkg-readmes/postgresql-server

It takes less than a minute to have all the components installed (and if you are curious, it takes much more time to install Emacs 27.2 without X Window support!).
It is important to note that a new rc-script has been installed. rc-scripts are a set of well defined Korn Shell based scripts that are used to manage daemons; they act similarly to other init systems like systemd without being, well, so bloated.
Since the installed script is named /etc/rc.d/postgresql, the service is named after the file name of the script, therefore postgresql.

Start the PostgreSQL Server (and Failing)

You will not be able to start PostgreSQL just after the installation:

puffy# rcctl start postgresql
postgresql(failed)

To understand why PostgreSQL is not starting, we need to dig a little more into the rc-scripts. First of all, ask the OpenBSD system what it knows about PostgreSQL, and this is done through the rcctl command and the get option:

puffy# rcctl get postgresql
postgresql_class=daemon
postgresql_flags=NO
postgresql_logger=
postgresql_rtable=0
postgresql_timeout=30
postgresql_user=_postgresql

There is not much output in the above command, but essentially PostgreSQL is system-wide disabled. However, this is not why the process is failing, and in order to discover what is causing the fault, we need to debug the rcctl execution:

puffy# rcctl -df start postgresql 
doing _rc_parse_conf
doing _rc_quirks
postgresql_flags empty, using default >-D /var/postgresql/data -w -l /var/postgresql/logfile<
doing rc_check
pg_ctl: directory "/var/postgresql/data" does not exist
postgresql
doing rc_start
doing _rc_wait start
doing rc_check
pg_ctl: directory "/var/postgresql/data" does not exist
pg_ctl: directory "/var/postgresql/data" does not exist
doing _rc_rm_runfile
(failed)

Essentially, the system is failing because the packages did not create the PGDATA directory, and therefore this must be done manually:

puffy# mkdir /var/postgresql/data
puffy# chown _postgresql:_postgresql /var/postgresql/data

puffy# su - _postgresql

puffy$ initdb -D /var/postgresql/data
The files belonging to this database system will be owned by user "_postgresql".
This user must also own the server process.

The database cluster will be initialized with locale "C".
The default database encoding has accordingly been set to "SQL_ASCII".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/postgresql/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 20
selecting default shared_buffers ... 128MB
selecting default time zone ... Europe/Rome
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D /var/postgresql/data -l logfile start

Now that the data directory is set up, the system can be started:

puffy# rcctl start postgresql
postgresql(ok)
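
The failure and the fix above boil down to the state of the PGDATA directory. As a minimal sketch, assuming a hypothetical helper named pgdata_state, the same check can be scripted before asking rcctl to start the service; it only reads the file system and relies on the PG_VERSION file that initdb writes into the data directory.

```sh
#!/bin/sh
# Hypothetical pre-flight check, to be run before "rcctl start postgresql".

# pgdata_state DIR: prints "missing", "uninitialized" or "initialized".
pgdata_state() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing"          # the situation right after pkg_add
    elif [ -f "$dir/PG_VERSION" ]; then
        echo "initialized"      # initdb has already been run here
    else
        echo "uninitialized"    # directory created by hand, initdb still to run
    fi
}

pgdata_state "${PGDATA:-/var/postgresql/data}"
```

On the freshly installed system of this article, the script would print missing before the mkdir and initialized after initdb.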

Of course, you can now play around with the freshly installed and fired-up PostgreSQL:

puffy# psql -U _postgresql template1
psql (13.4)
Type "help" for help.

template1=# SHOW SERVER_VERSION;
 server_version 
----------------
 13.4
(1 row)

Did you spot the little trick up there? Since the initdb has been executed by the _postgresql user, the database administrator is the _postgresql user too!

Configuring your PostgreSQL, the OpenBSD way!

The rcctl set of scripts is based on a set of variables that are combined into the flags passed to the daemon in order to configure it. For example, in the default installation, the PGDATA directory is set to /var/postgresql/data, but where is this set? Let’s inspect again what the rc-scripts know about it:

puffy# rcctl get postgresql
postgresql_class=daemon
postgresql_flags=NO
postgresql_logger=
postgresql_rtable=0
postgresql_timeout=30
postgresql_user=_postgresql

puffy# rcctl getdef postgresql
postgresql_class=daemon
postgresql_flags=-D /var/postgresql/data -w -l /var/postgresql/logfile
postgresql_logger=
postgresql_rtable=0
postgresql_timeout=30
postgresql_user=_postgresql

The get subcommand reports no flags (the daemon is disabled), while getdef (for get defaults) reports the default settings of the daemon. Clearly, what we are interested in is postgresql_flags. There are two ways to change the value of an rcctl variable:

editing the rc script by hand;
using rcctl to set the value.

The latter is the preferred way, but, hey, this is Unix, so you can also fire up your favourite editor and go change the /etc/rc.d/postgresql file, which in fact appears as:

puffy# less /etc/rc.d/postgresql

#!/bin/ksh
#
# $OpenBSD: postgresql.rc,v 1.13 2019/08/27 19:49:46 awolk Exp $

daemon="/usr/local/bin/pg_ctl"
daemon_flags="-D /var/postgresql/data -w -l /var/postgresql/logfile"
daemon_user="_postgresql"

. /etc/rc.d/rc.subr

...

Clearly, editing this file by hand can be error prone and must be done with the cluster (i.e., PostgreSQL) not running, or you may end up unable to stop the instance (e.g., after changing the PGDATA).

Using rcctl can do something better than manually editing the script file, but it has some constraints:

the daemon must be system wide enabled;
only rc variables can be edited (i.e., you cannot define your own variables);
a variable is named without the daemon prefix.

Therefore, in order to change both the PGDATA and the logging directory and file, we can edit the flags variable as follows:

puffy# rcctl enable postgresql

puffy# rcctl set postgresql flags "-D /var/postgresql/13/data -l /var/postgresql/13/data/log/postgresql.log" 

puffy# rcctl get postgresql
postgresql_class=daemon
postgresql_flags=-D /var/postgresql/13/data -l /var/postgresql/13/data/log/postgresql.log
postgresql_logger=
postgresql_rtable=0
postgresql_timeout=30
postgresql_user=_postgresql

Of course, you have to create the new PGDATA and the logging directory by hand, and assign the right ownership to _postgresql, before you can start the service.
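
To know which directories have to exist, the flags string itself can be inspected. The following is a minimal sketch with a hypothetical helper, flag_dirs, that only prints the PGDATA directory (from -D) and the directory holding the log file (from -l); it does not create or change anything.

```sh
#!/bin/sh
# Hypothetical helper: lists the directories required by a pg_ctl flags string.

flag_dirs() {
    # Unquoted on purpose: split the flags string into words.
    set -- $1
    while [ "$#" -gt 1 ]; do
        case "$1" in
            -D) echo "$2"; shift ;;      # PGDATA itself
            -l) dirname "$2"; shift ;;   # directory that holds the log file
        esac
        shift
    done
}

flag_dirs "-D /var/postgresql/13/data -l /var/postgresql/13/data/log/postgresql.log"
```

Since in this layout the log directory lives inside PGDATA, it should be created after initdb has run, because initdb refuses to work on a non-empty directory.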

Please also note that if you disable, at system-wide level, the daemon, the customized configuration will be lost:

puffy# rcctl get postgresql
postgresql_class=daemon
postgresql_flags=-D /var/postgresql/13/data -l /var/postgresql/13/data/log/postgresql.log
postgresql_logger=
postgresql_rtable=0
postgresql_timeout=30
postgresql_user=_postgresql

puffy# rcctl disable postgresql

puffy# rcctl get postgresql     
postgresql_class=daemon
postgresql_flags=NO
postgresql_logger=
postgresql_rtable=0
postgresql_timeout=30
postgresql_user=_postgresql

Make PostgreSQL start at boot

In order to let OpenBSD start PostgreSQL at boot, you have to enable the service system-wide. This can be achieved, as already shown, by means of the enable command, or by setting the status variable to on:

puffy# rcctl enable postgresql

# the same
puffy# rcctl set postgresql status on

The At-Boot Configuration

Once the service is enabled at system-wide level, it can be customized by means of the rcctl set command, as already shown. The reason is that, once a daemon is enabled at boot, its name is appended to the list of services in the file /etc/rc.conf.local, which is in turn used to determine what to start at boot.
The custom configuration goes in that file too, and once the daemon is disabled, the configuration is scrubbed out of the file, so that only the default values (in the rc-script) survive:

puffy# rcctl enable postgresql
puffy# rcctl set postgresql flags "-D /var/postgresql/13/data -l /var/postgresql/13/data/log/postgresql.log" 

puffy# cat /etc/rc.conf.local
amd_flags=
pkg_scripts=postgresql transmission_daemon
postgresql_flags=-D /var/postgresql/13/data -l /var/postgresql/13/data/log/postgresql.log

In the above, there are two services that have been installed on the system: PostgreSQL and Transmission. The system is going to start PostgreSQL first (because it is leftmost), and then Transmission. When starting PostgreSQL, it is going to use the specified flags.
If PostgreSQL is now disabled, the settings are lost too:

puffy# rcctl disable postgresql
puffy# cat /etc/rc.conf.local   
amd_flags=
pkg_scripts=transmission_daemon

Decide When to Start at Boot

It is also possible to let PostgreSQL start after (or before) specific other daemons. If we re-enable PostgreSQL, it will be appended to the list in the rc.conf.local file, and therefore it will be started after the Transmission daemon; this can also be seen with the rcctl order command:

puffy# rcctl enable postgresql
puffy# rcctl order             
transmission_daemon postgresql

Let’s say we want PostgreSQL to be started as soon as possible: the starting order can be changed by means of the rcctl order command, specifying either the daemon that must start first (the leftmost one) or the list of daemons you want to start at the beginning:

puffy# rcctl order postgresql
puffy# rcctl order            
postgresql transmission_daemon
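
Since the order is nothing more than the left-to-right content of the pkg_scripts line, it can also be read straight from the file. Here is a minimal, read-only sketch: start_order is a hypothetical helper, and the demo runs against a temporary copy of the rc.conf.local content shown earlier rather than the real file.

```sh
#!/bin/sh
# Hypothetical read-only helper: prints the package daemons in start order,
# one per line, from the pkg_scripts line of an rc.conf.local-style file.

start_order() {
    conf=${1:-/etc/rc.conf.local}
    # keep only the value of pkg_scripts, then put one daemon per line
    sed -n 's/^pkg_scripts=//p' "$conf" | tr -s ' ' '\n'
}

# Demo on a temporary file, so that the real configuration is not touched.
demo=$(mktemp)
cat > "$demo" <<'EOF'
amd_flags=
pkg_scripts=postgresql transmission_daemon
postgresql_flags=-D /var/postgresql/13/data -l /var/postgresql/13/data/log/postgresql.log
EOF
start_order "$demo"
rm -f "$demo"
```

The demo prints postgresql first and transmission_daemon second, matching what rcctl order reports.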

Two is Better Than One

What if you want another PostgreSQL instance controlled by rcctl?
You can copy the rc-script, giving it another name, and change the set of flags to let it start:

puffy# cp /etc/rc.d/postgresql /etc/rc.d/postgresql_replica                                                                                       
puffy# rcctl enable postgresql_replica
puffy# rcctl set postgresql_replica flags "-D /var/postgresql/replica/data -l /var/postgresql/replica/data/log/postgresql.log -o '-p 5433'"
puffy# mkdir -p /var/postgresql/replica/data
puffy# chown -R _postgresql:_postgresql /var/postgresql/replica/data

puffy# su - _postgresql
puffy$ initdb /var/postgresql/replica/data
...
puffy$ mkdir /var/postgresql/replica/data/log

puffy# rcctl start postgresql_replica
postgresql_replica(ok)

There is some work to perform, but it is quite simple after all.
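
If you plan to add more than one extra instance, the flags string can be generated instead of typed. This is just a sketch that follows the directory layout used above; instance_flags is a hypothetical helper, and it only prints the string.

```sh
#!/bin/sh
# Hypothetical helper: builds the flags string for an additional instance,
# following the /var/postgresql/<name>/data layout used in this article.

instance_flags() {
    name=$1
    port=$2
    base=/var/postgresql/$name/data
    echo "-D $base -l $base/log/postgresql.log -o '-p $port'"
}

instance_flags replica 5433
```

The output can then be handed over to rcctl, for example with rcctl set postgresql_replica flags "$(instance_flags replica 5433)".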

So, is PostgreSQL Running?

Besides checking for allowed connections, you can use rcctl to see if the daemon is running: the ls command accepts the status you are looking for (started, for running daemons) and returns the matching services:

puffy# rcctl ls started
...
postgresql
postgresql_replica
...

Conclusions

PostgreSQL can, of course, run well on OpenBSD systems. It can also be managed via the integrated service handler, named rcctl, and its rc-scripts, as well as manually by means of PostgreSQL utilities (e.g., pg_ctl).

GNU Guix and PostgreSQL

2021-09-30T00:00:00+00:00

Installing PostgreSQL via GNU Guix.

GNU Guix and PostgreSQL

GNU Guix is an advanced transactional package manager for the GNU operating system. It is both a complete Linux distribution and a package manager that can be installed on an existing operating system.
The idea behind GNU Guix is to provide a package manager that works in a way similar to that of binary environment managers: Guix uses user profiles and a set of self-contained directory trees to make libraries and executables available.
GNU Guix provides a guix command line tool that can be used to manage packages and all the GNU Guix dependencies and configuration.

In this article, I show how to use GNU Guix on a CentOS Linux operating system to install and manage PostgreSQL. Please note that I’m not going to show how to install guix, please refer to the official GNU Guix installation guide.

Please note that managing PostgreSQL versions via GNU Guix has nothing to do with PostgreSQL point in time recovery (PITR) or backup strategies.

Using GNU Guix

The main command to interact with GNU Guix is, guess what, guix. The command allows for subcommands, in particular the help one that can provide you interactive help about other subcommands.
In this article I’m going to use the following subcommands:

search to search for packages to install;
pull to get fresh guix package lists and update the program itself;
install and remove to install a package and delete it;
package, the main guix command: many other subcommands are aliases for it. The package command allows for various operations on packages and their history.

Searching for PostgreSQL

The subcommand search can be used to search for a package, in our case the beloved PostgreSQL database. The search command allows the specification of what to search as a regular expression, and in the following example I’ll search for only packages that start with postgresql to avoid getting information about drivers and extensions:

% guix search '^postgresql.*$'

name: postgresql
version: 9.6.21
outputs: out
systems: x86_64-linux i686-linux
dependencies: [email protected] [email protected] [email protected] [email protected]
location: gnu/packages/databases.scm:1124:2
homepage: https://www.postgresql.org/
license: X11-style
synopsis: Powerful object-relational database system  
description: PostgreSQL is a powerful object-relational database system.  It is fully ACID compliant, has full support for foreign keys, joins,
+ views, triggers, and stored procedures (in multiple languages).  It includes most SQL:2008 data types, including INTEGER, NUMERIC, BOOLEAN, CHAR,
+ VARCHAR, DATE, INTERVAL, and TIMESTAMP.  It also supports storage of binary large objects, including pictures, sounds, or video.
relevance: 30

name: postgresql
version: 13.2
outputs: out
systems: x86_64-linux i686-linux
dependencies: [email protected] [email protected] [email protected] [email protected]
location: gnu/packages/databases.scm:1085:2
homepage: https://www.postgresql.org/
license: X11-style
synopsis: Powerful object-relational database system  
description: PostgreSQL is a powerful object-relational database system.  It is fully ACID compliant, has full support for foreign keys, joins,
+ views, triggers, and stored procedures (in multiple languages).  It includes most SQL:2008 data types, including INTEGER, NUMERIC, BOOLEAN, CHAR,
+ VARCHAR, DATE, INTERVAL, and TIMESTAMP.  It also supports storage of binary large objects, including pictures, sounds, or video.
relevance: 30
...

There are other PostgreSQL versions in the command output, which I trimmed out for the sake of readability. In short, search lets you look for a package and all its available versions.
Like many guix subcommands, the search command is just a shortcut for the invocation of the package subcommand with the appropriate options. In other words, guix search foo is the same as calling guix package -s foo:

% guix package -s '^postgresql.*$'

name: postgresql
version: 9.6.21
outputs: out
systems: x86_64-linux i686-linux
dependencies: [email protected] [email protected] [email protected] [email protected]
location: gnu/packages/databases.scm:1124:2
homepage: https://www.postgresql.org/
license: X11-style
synopsis: Powerful object-relational database system  
description: PostgreSQL is a powerful object-relational database system.  It is fully ACID compliant, has full support for foreign keys, joins,
+ views, triggers, and stored procedures (in multiple languages).  It includes most SQL:2008 data types, including INTEGER, NUMERIC, BOOLEAN, CHAR,
+ VARCHAR, DATE, INTERVAL, and TIMESTAMP.  It also supports storage of binary large objects, including pictures, sounds, or video.
relevance: 30
...

Installing PostgreSQL

If we don’t specify any particular version, guix will install the latest available in its repositories, which, as I write, is PostgreSQL 13.2.
There are two ways to install stuff in guix:

compiling all software on the local machine (the default behaviour);
using binary packages where and when available.

Of course, compiling all the software on the local machine can require a lot of time and resources, depending on the power of the machine guix is running on.
Binary packages are called substitutes in guix, because they substitute source compiled software.

In both installation scenarios, guix will install every dependency required by the specific software you are going to install.
A source based installation will look like the following:

% guix install postgresql

The following package will be installed:
   postgresql 13.2
   
...
building /gnu/store/jkzin3sk1kk8ah9j066k3a03q4d99hc4-tcc-boot0-0.9.26-1103-g6e62e0e.drv...
| 'build' phase
building /gnu/store/35lsvpkqwgzmcs3gnhqkmxhivwfisidm-gzip-mesboot-1.2.4.drv...
building /gnu/store/gzlrw46slsi423qh5vcq91ki0rw4xzm4-make-mesboot0-3.80.drv...
building /gnu/store/2nvaxgs0rdxfkrwklh622ggaxg0wap6n-bash-mesboot0-2.05b.drv...
- 'unpack' phase
...

In the case of binary packages (substitutes), the installation will look like:

% guix install postgresql
...
 perl-5.30.2  13.6MiB                                                                                    1.6MiB/s 00:09 [##################] 100.0%
 pkg-config-0.29.2  201KiB                                                                               721KiB/s 00:00 [##################] 100.0%
 postgresql-13.2  5.4MiB                                                                                 543KiB/s 00:10 [##################] 100.0%
 guile-3.0.2  6.9MiB                                                                                     457KiB/s 00:15 [##################] 100.0%
 texinfo-6.7  1.2MiB                                                                                     3.5MiB/s 00:00 [##################] 100.0%
building CA certificate bundle...
building fonts directory...
building directory of Info manuals...
building database for manual pages...
building profile with 1 package...
hint: Consider setting the necessary environment variables by running:

     GUIX_PROFILE="/home/luca/.guix-profile"
     . "$GUIX_PROFILE/etc/profile"

Alternately, see `guix package --search-paths -p "/home/luca/.guix-profile"'.

The first time, guix has to bootstrap a lot of dependencies, so it will download (or build) and install libraries and tools even if they are already available on your operating system.
At the end of the installation, guix will give you a hint about setting environment variables to give you access to the installed PostgreSQL (and other installed software).

Inspecting the content of the directory pointed to by the above variable, you can see it contains the PostgreSQL executables:

% export GUIX_PROFILE="/home/luca/.guix-profile"
% source "$GUIX_PROFILE/etc/profile"

% ls $GUIX_PROFILE  
bin  etc  include  lib  manifest  share

% ls /home/luca/.guix-profile/bin
clusterdb   dropuser  pg_archivecleanup  pg_config       pg_dumpall      pg_resetwal  pg_test_fsync    pg_waldump  reindexdb
createdb    ecpg      pg_basebackup      pg_controldata  pg_isready      pg_restore   pg_test_timing   postgres    vacuumdb
createuser  initdb    pgbench            pg_ctl          pg_receivewal   pg_rewind    pg_upgrade       postmaster  vacuumlo
dropdb      oid2name  pg_checksums       pg_dump         pg_recvlogical  pg_standby   pg_verifybackup  psql

`locale` and Language problems

It is suggested to install the locales, because within the guix ecosystem the ones you already have system-wide will not be available. This could prevent your executables, including the PostgreSQL ones, from working properly.

% guix install glibc-locales

The following package will be installed:
   glibc-locales 2.31

...
 glibc-locales-2.31  10.8MiB                                                                             222KiB/s 00:50 [##################] 100.0%
 linux-libre-headers-5.4.20  1.0MiB                                                                      629KiB/s 00:02 [##################] 100.0%
building CA certificate bundle...
building fonts directory...
building directory of Info manuals...
building database for manual pages...
building profile with 3 packages...

After installing the locales, you need to export the Guix related environment variable to make them available:

%  export GUIX_LOCPATH="$HOME/.guix-profile/lib/locale"

Using the freshly installed PostgreSQL

Sourcing the profile file as suggested by guix makes the PostgreSQL executables available to your shell:

% which pg_ctl
~/.guix-profile/bin/pg_ctl

The trick is simple: the profile file manipulates the PATH environment variable to place the installed software executables in front of the already available ones:

% cat "$GUIX_PROFILE/etc/profile"

export PATH="${GUIX_PROFILE:-/gnu/store/xh9k8z9x5aspfqfcp1gycqlwksgl1m3g-profile}/bin${PATH:+:}$PATH"
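
To see what that expansion does, here is a minimal sketch that reproduces the same logic in a small shell function. The function name prepend_bin and the sample paths are only illustrative and are not part of guix. The point is that ${VAR:+:} adds the colon separator only when the existing value is not empty, while the ${GUIX_PROFILE:-...} part of the real file simply falls back to the store path when the variable is unset.

```shell
# Sketch of the PATH logic used by the profile file (illustrative only).
# prepend_bin PROFILE_DIR CURRENT_PATH prints the resulting PATH value.
prepend_bin() {
  profile="$1"
  current="$2"
  # ${current:+:} expands to ":" only when current is set and non-empty,
  # so an empty PATH does not end up with a trailing colon
  printf '%s\n' "${profile}/bin${current:+:}${current}"
}

prepend_bin /home/luca/.guix-profile /usr/bin:/bin
# prints /home/luca/.guix-profile/bin:/usr/bin:/bin

prepend_bin /home/luca/.guix-profile ""
# prints /home/luca/.guix-profile/bin
```

Since the profile directory comes first, its executables shadow the system-wide ones with the same name.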

It is now straightforward to use PostgreSQL as “usual”:

% mkdir -p pgdata/13

% initdb -k -D pgdata/13
...
Success. You can now start the database server using:

    pg_ctl -D pgdata/13 -l logfile start

There is an important thing to note here: PostgreSQL has been installed as a normal user. This is very similar to what virtual binary environment managers do, for instance pgenv, my favourite in the PostgreSQL scenario.

It is now possible to start PostgreSQL, and since I already have a system-wide PostgreSQL running, I need to specify a different port to listen on:

% pg_ctl -D pgdata/13 -o '-p 5433' start
waiting for server to start....
 LOG:  starting PostgreSQL 13.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 7.5.0, 64-bit
 LOG:  listening on IPv6 address "::1", port 5433
 LOG:  listening on IPv4 address "127.0.0.1", port 5433
 LOG:  listening on Unix socket "/tmp/.s.PGSQL.5433"
 LOG:  database system was shut down at 2021-09-30 08:41:44 EDT
 LOG:  database system is ready to accept connections
 done
server started

And it is now possible to see that two instances are running on the machine:

% psql -c 'SHOW SERVER_VERSION;' template1
 server_version 
----------------
 13.4
(1 row)

% psql -c 'SHOW SERVER_VERSION;' -p 5433 template1
 server_version 
----------------
 13.2
(1 row)

The PostgreSQL version 13.4 is the system wide one, while the version 13.2 is the one installed via guix.

Getting Newer PostgreSQL Versions

In order to get newer PostgreSQL versions, you need to “ask” guix to search for updates. This is done via the pull command, which tells guix to update the list of available software:

% guix pull
Migrating profile generations to '/var/guix/profiles/per-user/luca'...
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
Authenticating channel 'guix', commits 9edb3f6 to 7b59508 (6.374 new commits)...
Building from this channel:
  guix      https://git.savannah.gnu.org/git/guix.git   7b59508

...
 guix-7b59508ca-modules                                                                                        1.5MiB/s 00:20 | 29.2MiB transferred
 guix-module-union                                                                                                8.9MiB/s 00:00 | 3KiB transferred
 guix-command  635B                                                                                       18KiB/s 00:00 [##################] 100.0%
 guix-daemon  391B                                                                                       1.0MiB/s 00:00 [##################] 100.0%
 guix-7b59508ca                                                                                                 44.4MiB/s 00:00 | 16KiB transferred
building CA certificate bundle...
building fonts directory...
building directory of Info manuals...
building database for manual pages...
building profile with 1 package...
hint: Consider setting the necessary environment variables by running:

     GUIX_PROFILE="/home/luca/.config/guix/current"
     . "$GUIX_PROFILE/etc/profile"



% source "$GUIX_PROFILE/etc/profile"

It is really important to source the profile file again, since it has changed due to the update process.

After the pull, we can search for PostgreSQL again: the available version has been bumped to 13.3:

% guix search 'postgresql.*'
...
name: postgresql
version: 13.3
outputs: out
systems: x86_64-linux i686-linux
dependencies: [email protected] [email protected] [email protected] [email protected]
location: gnu/packages/databases.scm:1127:2
homepage: https://www.postgresql.org/
license: X11-style
synopsis: Powerful object-relational database system  
description: PostgreSQL is a powerful object-relational database system.  It is fully ACID compliant, has full support for foreign keys, joins,
+ views, triggers, and stored procedures (in multiple languages).  It includes most SQL:2008 data types, including INTEGER, NUMERIC, BOOLEAN, CHAR,
+ VARCHAR, DATE, INTERVAL, and TIMESTAMP.  It also supports storage of binary large objects, including pictures, sounds, or video.
relevance: 30
...

It is now time to upgrade the currently installed PostgreSQL (it is suggested to stop the running instance first):

% pg_ctl -D pgdata/13 stop

% guix upgrade postgresql
The following package will be upgraded:
   postgresql 13.2 → 13.3

...
 postgresql-13.3  5.4MiB                                                                                 1.7MiB/s 00:03 [##################] 100.0%
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
The following derivation will be built:
   /gnu/store/ghlc1angdx9q7gx4hm4yagam6m0gmxzw-profile.drv

0,2 MB will be downloaded
...

Is the new PostgreSQL version installed? Let’s check:

% pg_ctl -D pgdata/13 -o '-p 5433' start
...
server started

% psql -p 5433 -c 'SHOW SERVER_VERSION;' template1
 server_version 
----------------
 13.3
(1 row)

Success!

Generations (or, “How do I go back in time?”)

guix stores so-called generations, that is, points in time that record the history of installed and removed packages. The package subcommand can show you the generations available on your system, for example:

% guix package --list-generations
Generation 1   Sep 30 2021 08:16:02
  postgresql    13.2    out     /gnu/store/ivmkwkjsvbkv3g0jq9gcgwlhrhwx91gw-postgresql-13.2

Generation 2   Sep 30 2021 08:37:18
 + glibc-utf8-locales   2.31    out     /gnu/store/rgydar9dfvflqqz2irgh7njj34amaxc6-glibc-utf8-locales-2.31

Generation 3   Sep 30 2021 08:40:43
 + glibc-locales        2.31    out     /gnu/store/wnw0nwlyg92vv33f5f65jj1rd3p4fi3c-glibc-locales-2.31

Generation 4   Sep 30 2021 10:04:21   (current)
 + postgresql   13.3    out     /gnu/store/1nlzmg4hw4gga56g58dsqf9nx90z9kkn-postgresql-13.3
 - postgresql   13.2    out     /gnu/store/ivmkwkjsvbkv3g0jq9gcgwlhrhwx91gw-postgresql-13.2

In the above example, we installed PostgreSQL 13.2 first (generation 1), while the upgrade to version 13.3 happened in the fourth generation. Note that the output is somewhat similar to a diff report, where + lines are additions and - lines are removals.
Imagine we need to go back to version 13.2 of PostgreSQL. How can we achieve this? There are two ways:

do a so-called rollback, which makes the previous generation active (that is, it goes back to generation number 3);
jump to a specific generation, in this case number 1.

Depending on the history of your system, you can choose the correct approach.
Let’s jump to generation one (again, ensure your PostgreSQL server is turned off):

% pg_ctl -D pgdata/13 stop

% guix package --switch-generation=1
switched from generation 4 to 1

% pg_ctl --version
pg_ctl (PostgreSQL) 13.2

Unlike installing new software, switching to a previous generation is a very fast, almost immediate, operation, since the only thing to do is to adjust the binary environment. As you can see, the PostgreSQL executables are back to version 13.2.

What if we want to upgrade PostgreSQL again? One solution is to use switch-generation again, but it is also possible to run upgrade again, which is an almost immediate operation since everything is already on the system:

% guix upgrade postgresql

building CA certificate bundle...
listing Emacs sub-directories...
building fonts directory...
building directory of Info manuals...
building database for manual pages...
building profile with 1 package...

% pg_ctl --version
pg_ctl (PostgreSQL) 13.3

What has changed in the generations? Since we moved back to generation one and then upgraded PostgreSQL, the upgrade has been recorded as generation two, right after the one we switched to:

% guix package --list-generations

Generation 1   Sep 30 2021 08:16:02
  postgresql    13.2    out     /gnu/store/ivmkwkjsvbkv3g0jq9gcgwlhrhwx91gw-postgresql-13.2

Generation 2   Sep 30 2021 10:15:33   (current)
 + postgresql   13.3    out     /gnu/store/1nlzmg4hw4gga56g58dsqf9nx90z9kkn-postgresql-13.3
 - postgresql   13.2    out     /gnu/store/ivmkwkjsvbkv3g0jq9gcgwlhrhwx91gw-postgresql-13.2

Generation 3   Sep 30 2021 08:40:43
 + glibc-locales        2.31    out     /gnu/store/wnw0nwlyg92vv33f5f65jj1rd3p4fi3c-glibc-locales-2.31
 + glibc-utf8-locales   2.31    out     /gnu/store/rgydar9dfvflqqz2irgh7njj34amaxc6-glibc-utf8-locales-2.31
 + postgresql           13.2    out     /gnu/store/ivmkwkjsvbkv3g0jq9gcgwlhrhwx91gw-postgresql-13.2
 - postgresql           13.3    out     /gnu/store/1nlzmg4hw4gga56g58dsqf9nx90z9kkn-postgresql-13.3

Generation 4   Sep 30 2021 10:04:21
 + postgresql   13.3    out     /gnu/store/1nlzmg4hw4gga56g58dsqf9nx90z9kkn-postgresql-13.3
 - postgresql   13.2    out     /gnu/store/ivmkwkjsvbkv3g0jq9gcgwlhrhwx91gw-postgresql-13.2

The set of PostgreSQL changes is propagated from generation one to all the other entries.

Removing PostgreSQL

Imagine we want to remove the 13.3 PostgreSQL version, keeping the older one available. The remove command does pretty much what you would expect:

% guix remove postgresql@13.3

The following package will be removed:
   postgresql 13.3

The following derivation will be built:
   /gnu/store/ilxkw0i597n0qvirb11mksbyad8qmnvd-profile.drv

building profile with 0 packages...

Again, this is a very fast operation, and this should hint that nothing has been removed from the storage. Note that I specified the version to remove with the @<version> syntax after the package name.
Is the older PostgreSQL version immediately available? NO!
If you test the binaries, you will find the system-wide ones (if any) because, as guix has told you in the above command output, it has removed the package from the current profile. This means PostgreSQL is no longer available via guix:

% which pg_ctl
/usr/pgsql-13/bin/pg_ctl

% pg_ctl --version
pg_ctl (PostgreSQL) 13.4

In order to enable PostgreSQL 13.2 again, use guix package --switch-generation to jump back to the generation that has the required PostgreSQL package:

% guix package --switch-generation=1
switched from generation 3 to 1

% pg_ctl --version
pg_ctl (PostgreSQL) 13.2

But how to free disk space from unused PostgreSQL versions?
The gc subcommand will garbage collect packages that are not in use. Here, not in use could mean something different from what you think: guix is of course smarter than you (at least, smarter than me) in finding out references between generations and packages. This means that the mere fact that a package is not currently in use does not make it eligible for hard deletion. It is therefore recommended to delete all the generations that refer to a specific package in order to get it deleted by the garbage collector:

% guix package --delete-generations=2
% guix package --delete-generations=3
% guix package --delete-generations=4
% guix gc
...
note: currently hard linking saves 913.85 MiB
guix gc: freed 3,631.25278 MiBs

% du -hs /gnu/store/*postgresql-13.?    
30M     /gnu/store/ivmkwkjsvbkv3g0jq9gcgwlhrhwx91gw-postgresql-13.2

Of course, this approach brings your whole system back to the starting point, since upgrading will require a fresh installation.

Conclusions

GNU Guix is a very interesting package manager that can be used to set up a binary environment useful for testing and deploying software stacks, including our beloved database and its dependencies (e.g., tools and libraries).
You are probably not going to use guix in a PostgreSQL production environment, because you will have other package management tools to automate and keep your packages stable. However, guix can be very handy for testing and upgrading your own environment.

Restarting a sequence: how hard could it be? (PostgreSQL and Oracle)

2021-09-23T00:00:00+00:00

How hard could it be to reset a sequence?

Restarting a sequence: how hard could it be? (PostgreSQL and Oracle)

One reason I like PostgreSQL so much is that it makes me feel at home: it has a very consistent and coherent interface to its objects. An example of this is the management of sequences: ALTER SEQUENCE allows you to modify pretty much every detail about a sequence, in particular to restart it from its initial value.
Let’s see this in action:

testdb=> create sequence batch_seq 
         increment by 1 start with 1;
CREATE SEQUENCE

testdb=> do $$
declare
  i int;
begin
  for i in 1..100 loop
     perform nextval( 'batch_seq' );
  end loop;
end
$$
;
DO


testdb=> select currval( 'batch_seq' );
 currval 
---------
     100

In the above piece of code, I’ve created a batch_seq and queried it one hundred times, so that the current value of the sequence is 100.

How is it possible to make the sequence start over again?
A first possibility is to use the setval function:

testdb=> select setval( 'batch_seq', 1 );
 setval 
--------
      1


testdb=> select currval( 'batch_seq' );
 currval 
---------
       1

Another option is to use ALTER SEQUENCE, which is a command aimed at this purpose (and others):

testdb=> alter sequence batch_seq restart;
ALTER SEQUENCE

testdb=> select nextval( 'batch_seq' );   
 nextval 
---------
       1

An important thing to note here is that the only option specified has been RESTART: the sequence already knows what restarting means, that is, resetting to its original starting value.
It is also possible to specify a value to restart from:

testdb=> alter sequence batch_seq restart with 666;
ALTER SEQUENCE
        
testdb=> select nextval( 'batch_seq' );
 nextval 
---------
     666

That’s so simple!
The above behaviour is guaranteed back to the 8.1 PostgreSQL version (and probably even before): see the old documentation here.

Wait, what about `currval()`?

The careful reader has probably noticed that I used nextval() to see if the reset of a sequence worked, instead of currval(). The reason can be found in the official documentation: currval “returns the value most recently obtained by nextval for this sequence in the current session”.
It is easy to test this:

testdb=> select nextval( 'batch_seq' );
 nextval 
---------
     667


testdb=> alter sequence batch_seq restart with 999;
ALTER SEQUENCE

testdb=> select currval( 'batch_seq' );
 currval 
---------
     667


testdb=> select nextval( 'batch_seq' );
 nextval 
---------
     999

As you can see, after an ALTER SEQUENCE RESTART the currval() result remains unchanged (it is the last polled value within the current session), while nextval() (that actually queries the sequence) provides the right and expected value.

What about Oracle sequences?

Oracle provides a powerful ALTER SEQUENCE command only in recent versions. For older versions, the official documentation for the command ALTER SEQUENCE clearly states that “to restart the sequence at a different number, you must drop and re-create it”!

Err… what?

Before version 18, ALTER SEQUENCE cannot restart the sequence. What is the solution then? You need to trigger a sequence update:

change the increment of the sequence to effectively subtract values;
ask the sequence a new value, so that it applies the subtraction;
set the increment to its correct value.

This means you have to do something like the following:

SQL> select batch_seq.nextval from dual; 
SQL> alter sequence batch_seq  increment by -666;
SQL> select batch_seq.nextval from dual; 
SQL> alter sequence batch_seq  increment by 1;

I don’t like this approach very much, because it is error prone and requires you to do some computation ensuring you are not going to go outside the sequence boundaries.
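
The computation itself can be sketched with a small shell helper. This is only an illustration of the arithmetic: the function name seq_reset_increment and its arguments are hypothetical, and it assumes you already know the current value of the sequence and its MINVALUE and MAXVALUE. It prints the temporary increment to use so that the next nextval call returns the target value, and refuses targets outside the boundaries.

```shell
# Hypothetical helper: print the temporary INCREMENT BY value that moves a
# sequence from its current value to the target with a single nextval call.
# Arguments: current value, target value, MINVALUE, MAXVALUE of the sequence.
seq_reset_increment() {
  current="$1"; target="$2"; minvalue="$3"; maxvalue="$4"

  # the target must be a legal value for the sequence
  if [ "$target" -lt "$minvalue" ] || [ "$target" -gt "$maxvalue" ]; then
    echo "target $target is outside [$minvalue, $maxvalue]" >&2
    return 1
  fi

  delta=$(( target - current ))
  if [ "$delta" -eq 0 ]; then
    echo "sequence is already at $target, nothing to do" >&2
    return 1
  fi

  echo "$delta"
}

seq_reset_increment 667 1 1 999999
# prints -666
```

With a current value of 667 and a target of 1, the helper prints -666, which is the increment used in the commands above. After the nextval call that applies it, the increment must be set back to its regular value.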

In recent versions of Oracle Database (e.g., 21), the ALTER SEQUENCE command works as in PostgreSQL, i.e., as in standard SQL, and this is good, of course.
From a quick search within the Oracle documentation about ALTER SEQUENCE, the right behaviour has been introduced in Oracle 18 and later. Therefore, if you are facing a previous Oracle version, you need to run the above set of commands to manually adjust the sequences.

Conclusions

PostgreSQL has a very strict approach to the SQL standard, rooted even in old versions. Unluckily, Oracle is not the same, and older versions require some tricks to simulate the PostgreSQL behavior.
This is not meant to be a flame or a comparison, it simply indicates how counter-intuitive it can be to handle Oracle once you are used to PostgreSQL!

Using jq to get information out of pgbackrest

2021-09-10T00:00:00+00:00

pgbackrest supports the JSON output format, and this can be useful to automate some information analysis.

Using `jq` to get information out of `pgbackrest`

pgbackrest offers the output of its commands in the JSON format. I’m not a great fan of JSON, but having such an output offers a few advantages, most notably it is a stable text output format that can be inspected easily with other tools.
In other words, there is no need for regular expressions to parse the textual output and, moreover, the output is guaranteed to be stable, which means no changes will happen (or better, no fields will be removed), while a simple rephrasing in the text output could break your crafty regular expression!

Among the available tools, jq is a good shell program that allows you to parse and navigate JSON content.
Let’s see how it is possible to get some output combining jq and pgbackrest.

Get the last backup information

When your stanza has a lot of backups, you probably don’t want to monitor all of them in depth, but would rather get a quick hint about when the last backup took place.
The pgbackrest info command reports all the backups available for a given stanza, and it can then be piped into jq to get more human readable information.
Quick! Show me the snippet:

$ pgbackrest info --output json | jq '"Stanza:  " + .[].name + " (" +  .[].status.message + ") " + "Last backup completed at "  +   (.[].backup[-1].timestamp.stop | strftime("%Y-%m-%d %H:%M") )' 

"Stanza:  miguel (ok) Last backup completed at 2021-07-27 09:23"

This is what I would like to see when I’m in a rush and need to see which machines are in trouble with backups: it shows me the name of the stanza, the status of the backup (ok) and the time and date the backup ended.
Let’s analyze the command in more detail:

pgbackrest info --output json enables the output of the info command as JSON;
jq is used to parse the JSON output, concatenating strings (delimited by ") with +:
- .[].name provides the name of the stanza, that is, it reads the name property of the JSON output;
- .[].status.message provides the backup status message, that is, the ok that appears in the output;
- (.[].backup[-1].timestamp.stop | strftime("%Y-%m-%d %H:%M") ) is clearly the trickiest part: it gets the last backup (index -1 counts from the end of the array), extracts its stop timestamp (there are start and stop timestamp properties) and filters it (i.e., pipes it within jq) through strftime to display the timestamp in a more human friendly way.

Get all the backups for a stanza

It is possible to iterate over all the backup information and therefore get an overall status of all the backups:

$ pgbackrest info --stanza miguel --output json | jq -r '"Stanza:  " + .[].name + " (" +  .[].status.message + ") " + " backup completed at "  +   (.[].backup[].timestamp.stop | strftime("%Y-%m-%d") ) + " of size " + (.[].backup[].info.size/1024|tostring ) + " MB"' 
Stanza:  miguel (ok)  backup completed at 2021-01-27 of size 3578696.4814453125 MB
Stanza:  miguel (ok)  backup completed at 2021-02-27 of size 3578696.4814453125 MB
Stanza:  miguel (ok)  backup completed at 2021-03-27 of size 3578696.4814453125 MB
Stanza:  miguel (ok)  backup completed at 2021-04-27 of size 3578696.4814453125 MB
Stanza:  miguel (ok)  backup completed at 2021-05-27 of size 3582783.4150390625 MB
Stanza:  miguel (ok)  backup completed at 2021-06-27 of size 3582783.4150390625 MB
Stanza:  miguel (ok)  backup completed at 2021-07-27 of size 3582783.4150390625 MB
Stanza:  miguel (ok)  backup completed at 2021-07-27 of size 3582783.4150390625 MB
Stanza:  miguel (ok)  backup completed at 2021-09-27 of size 3585732.208984375 MB
Stanza:  miguel (ok)  backup completed at 2021-07-27 of size 3585732.208984375 MB

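Here -r makes jq print raw strings, without the surrounding JSON quotes, while the iteration over every backup comes from the .backup[] expression. Also note that it is possible to add the size of the backup, as well as other information tailored to your needs. Beware of the unit: assuming info.size is expressed in bytes, a single division by 1024 yields kilobytes rather than megabytes.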
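
One thing to watch for in a one-liner like the above: every .[].backup[] used inside the same concatenation is an independent generator, so jq emits every combination of the two and the list can grow with the square of the number of backups. A possible way around it, sketched below, is to iterate over the backups once and bind the stanza properties to variables. The function name backup_report is hypothetical, and the conversion to MiB assumes that info.size is expressed in bytes.

```shell
# Hypothetical helper: read the JSON produced by
# `pgbackrest info --output json` on stdin and print one line per backup.
backup_report() {
  # bind stanza name and status once, then iterate the backups a single time;
  # size is converted assuming info.size is in bytes (1 MiB = 1048576 bytes)
  jq -r '.[]
         | .name as $stanza
         | .status.message as $status
         | .backup[]
         | "Stanza: \($stanza) (\($status)) backup completed at \(.timestamp.stop | strftime("%Y-%m-%d")) of size \(.info.size / 1048576 | floor) MiB"'
}
```

It can be used as pgbackrest info --stanza miguel --output json | backup_report, and it prints exactly one line for every backup of the stanza.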

Get the last backup within a set of servers

It is possible to elaborate a little more on the jq extract string and loop it within a simple shell iteration to get information about all your servers. Of course, this is simpler if all your servers follow a pre-defined naming scheme, like server-01, server-02 and so on.

$ for server in {1..10}; do printf "Stanza server-%02d with last backup at %s\n" $server "$(  pgbackrest info --stanza $(printf 'server-%02d' $server) --output json |  jq ' (.[0].backup[-1].timestamp.stop | strftime("%Y-%m-%d %H:%M") )' )" ; done
Stanza server-01 with last backup at "2021-07-27 09:23"
Stanza server-02 with last backup at "2021-07-27 01:23"
Stanza server-03 with last backup at "2021-07-27 02:23"
Stanza server-04 with last backup at "2021-07-27 03:23"
Stanza server-05 with last backup at "2021-07-27 05:23"
Stanza server-06 with last backup at "2021-07-27 06:23"
Stanza server-07 with last backup at "2021-07-27 07:23"
Stanza server-08 with last backup at "2021-07-27 08:23"
Stanza server-09 with last backup at "2021-07-27 10:23"
Stanza server-10 with last backup at "2021-07-27 11:23"

Please note the usage of printf(1) to cope with numbers like 01, as well as the for loop to invoke pgbackrest info against every single stanza. Similar results can be obtained with jq iterations.
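
As a minimal sketch of such a jq iteration, the function below lets jq loop over all the stanzas found in a single JSON document, instead of invoking pgbackrest once per stanza. The function name last_backup_per_stanza is hypothetical, and the sketch assumes that pgbackrest info is run without --stanza so that every stanza is included in the output. Stanzas without any backup are skipped, since there is no timestamp to format.

```shell
# Hypothetical helper: read the JSON produced by
# `pgbackrest info --output json` on stdin and print the last backup
# of every stanza, one line per stanza.
last_backup_per_stanza() {
  # .[] iterates over the stanzas; select() drops stanzas with no backups;
  # .backup[-1] is the last element of the backup array
  jq -r '.[]
         | select(.backup | length > 0)
         | "Stanza \(.name) with last backup at \(.backup[-1].timestamp.stop | strftime("%Y-%m-%d %H:%M"))"'
}
```

It can be used as pgbackrest info --output json | last_backup_per_stanza. Note that strftime formats the timestamp in UTC.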

Conclusions

The capability to output information in JSON can greatly simplify the monitoring of the backup status. There is no need to deploy a complex monitoring stack though: it suffices to use jq to get a report about servers and backups. Of course, being able to navigate the JSON output and play with shell scripting allows you to get even better results.

A simple example of LATERAL use

2021-08-07T00:00:00+00:00

How LATERAL can help to solve problems…

A simple example of LATERAL use

A few days ago I found a question by a user on Facebook: how to select events from a table where they are no more than 10 minutes apart from one another?
My first answer was related to LATERAL, and in this post I try to show with an example how I understood and could solve the above question.

First of all, let’s build an events table, where each row has a timestamp.

testdb=> CREATE TABLE events(
  pk int generated always as identity
  , event text
  , ts timestamp default CURRENT_TIMESTAMP
  , PRIMARY KEY( pk )
);

CREATE TABLE

Now, let’s populate the table with “random” data, so that events are spaced 2 minutes apart from each other:

testdb=> insert into events( event, ts )
select 'event #' || v, current_timestamp - ( ( v * 2 ) || ' minutes' )::interval
from generate_series( 1, 100 ) v;

INSERT 0 100

Having set up the table and the data, how can we relate every tuple to the other events that fall within a ten minute window? LATERAL comes to the rescue.

testdb=> SELECT e1.pk, e1.event, e2.*  
         FROM events e1, 
         LATERAL ( SELECT pk, event, ts - e1.ts as time_elapsed 
                   FROM events 
                   WHERE pk <> e1.pk AND e1.ts - ts <= '10 minutes'::interval ) e2 
        ORDER BY e1.pk LIMIT 20;
        
 pk  |  event   | pk  |  event   | time_elapsed 
-----+----------+-----+----------+--------------
 501 | event #1 | 502 | event #2 | -00:02:00
 501 | event #1 | 503 | event #3 | -00:04:00
 501 | event #1 | 504 | event #4 | -00:06:00
 501 | event #1 | 505 | event #5 | -00:08:00
 501 | event #1 | 506 | event #6 | -00:10:00
 502 | event #2 | 501 | event #1 | 00:02:00
 502 | event #2 | 503 | event #3 | -00:02:00
 502 | event #2 | 504 | event #4 | -00:04:00
 502 | event #2 | 505 | event #5 | -00:06:00
 502 | event #2 | 506 | event #6 | -00:08:00
 502 | event #2 | 507 | event #7 | -00:10:00
 503 | event #3 | 501 | event #1 | 00:04:00
 503 | event #3 | 502 | event #2 | 00:02:00
 503 | event #3 | 504 | event #4 | -00:02:00
 503 | event #3 | 505 | event #5 | -00:04:00
 503 | event #3 | 506 | event #6 | -00:06:00
 503 | event #3 | 507 | event #7 | -00:08:00
 503 | event #3 | 508 | event #8 | -00:10:00
 504 | event #4 | 501 | event #1 | 00:06:00
 504 | event #4 | 502 | event #2 | 00:04:00
(20 rows)

Let’s disassemble the query and see how it works. The subquery selects all the tuples that are no more than 10 minutes older than the outer one and that are different from the row the system is currently evaluating (i.e., e1.pk). Note that the condition is one-sided: events newer than the outer one always match, whatever their distance. Usually a subquery is evaluated once for the outer query, but here the subquery is prefixed with LATERAL that, in simple words, means evaluate the subquery for every row of the outer result set. This means that the LATERAL subquery can access the outer query row, and can “reason” about its own result set.
An important thing to keep in mind while dealing with LATERAL is that the subquery must be referenced with an alias, in my case e2. Please note that within the LATERAL subquery I compute the time difference between the timestamp of the outer tuple and the one of the inner result set, and as you can see from the output column time_elapsed, consecutive rows differ by 2 minutes, which is how we generated the rows.

What happens if you don’t use LATERAL? Well, you cannot reference the e1 outer tuple: there is no way for a subquery to cross-reference something outside of its scope:

testdb=> SELECT e1.pk, e1.event, e2.*  
         FROM events e1,  
         ( SELECT pk, event, ts - e1.ts as time_elapsed 
           FROM events WHERE pk <> e1.pk AND e1.ts - ts <= '10 minutes'::interval ) e2 
         ORDER BY e1.pk LIMIT 20;
ERROR:  invalid reference to FROM-clause entry for table "e1"
LINE 1: ..., e2.*  FROM events e1,  ( SELECT pk, event, ts - e1.ts as t...
                                                             ^
HINT:  There is an entry for table "e1", but it cannot be referenced from this part of the query.

As you can see, PostgreSQL clearly states that you cannot refer to e1 (the outer tuple) from within the scope of the subquery.

`LATERAL` Joins

It is, of course, possible to use LATERAL in a join, and in this case the above query can be rewritten as:

testdb=> SELECT e1.pk, e1.event, e2.*  
         FROM events e1 JOIN  LATERAL 
            ( SELECT pk, event, ts - e1.ts as time_elapsed 
              FROM events WHERE pk <> e1.pk AND e1.ts - ts <= '10 minutes'::interval ) e2 
        ON true  
        ORDER BY e1.pk LIMIT 20;
 pk  |  event   | pk  |  event   | time_elapsed 
-----+----------+-----+----------+--------------
 501 | event #1 | 502 | event #2 | -00:02:00
 501 | event #1 | 503 | event #3 | -00:04:00
 501 | event #1 | 504 | event #4 | -00:06:00
 501 | event #1 | 505 | event #5 | -00:08:00
 501 | event #1 | 506 | event #6 | -00:10:00
 502 | event #2 | 501 | event #1 | 00:02:00
 502 | event #2 | 503 | event #3 | -00:02:00
 502 | event #2 | 504 | event #4 | -00:04:00
 502 | event #2 | 505 | event #5 | -00:06:00
 502 | event #2 | 506 | event #6 | -00:08:00
 502 | event #2 | 507 | event #7 | -00:10:00
 503 | event #3 | 501 | event #1 | 00:04:00
 503 | event #3 | 502 | event #2 | 00:02:00
 503 | event #3 | 504 | event #4 | -00:02:00
 503 | event #3 | 505 | event #5 | -00:04:00
 503 | event #3 | 506 | event #6 | -00:06:00
 503 | event #3 | 507 | event #7 | -00:08:00
 503 | event #3 | 508 | event #8 | -00:10:00
 504 | event #4 | 501 | event #1 | 00:06:00
 504 | event #4 | 502 | event #2 | 00:04:00
(20 rows)

Conclusions

LATERAL is a very powerful SQL operator in PostgreSQL, and can help to solve problems you would normally solve by means of cursors and iterations.

Select Distinct Bytea (or Blobs)

2021-08-04T00:00:00+00:00

A strange behaviour I found in Oracle.

Select Distinct Bytea (or Blobs)

TLDR: it seems to me that PostgreSQL has a more comfortable behaviour than Oracle when dealing with distinct and BLOB-like fields.

I’m not an avid Oracle user, at least not as much as I am with regard to PostgreSQL.
In the last few days I spotted a problem with an application of mine: after having added a BLOB column to an Oracle table, a few automated queries began to fail. It was not so simple, in the beginning, to find out what the problem was, but essentially the ORM I am using was generating a query with a distinct clause, and it seems that Oracle does not accept this kind of query when it involves a BLOB or a CLOB field.
Let’s see an example: the blobby table is made of a varchar2 description field and a bdata field of type BLOB.

SQL> select distinct bdata, description from blobby;
select distinct bdata, description from blobby
                *
ERROR at line 1:
ORA-00932: inconsistent datatypes: expected - got BLOB

The reported error is somewhat obscure to me: ORA-00932: inconsistent datatypes: expected - got BLOB does not provide me with enough information to understand what type the system was expecting. However, seeing the final BLOB part let me reason about the problem.
Moreover, in the beginning, I was not even able to reproduce the problem, because if you don’t specify an explicit column list, the same query works:

SQL> select distinct * from blobby;
...
6 rows selected.

I was unable to make the query work even by casting to different types, so I guess Oracle cannot handle the query when the columns are explicitly listed. And that was the problem: many ORMs, including the one I’m using, produce queries where all the columns are listed explicitly as output fields, and so Oracle was refusing to run the query.

What About PostgreSQL?

I was curious to see how PostgreSQL handles the same situation, assuming BLOB can be translated into a bytea field.

testdb=> create table blobby( pk int generated always as identity,
description text, bdata bytea, primary key( pk ) );
testdb=> insert into blobby( description ) select 'Record ' || v from
generate_series( 1, 5 ) v;
INSERT 0 5
testdb=> \lo_import myfile.pdf
lo_import 50626
testdb=> update blobby set bdata = lo_get( 50626 );
UPDATE 5

testdb=> \o test.csv
testdb=> \a
Output format is unaligned.
testdb=> \f ';'
Field separator is ";".
testdb=> select distinct bdata, description from blobby;
-- same as select distinct * from blobby


% ls -1hs test.csv
23M test.csv

Apart from the initial part that creates and populates the table, as you can see the SELECT works with either an explicit column list or a wildcard.

Conclusions

I don’t have any conclusions, and I don’t blame one product or the other. They just behave differently in pretty much the same context.
I like the PostgreSQL approach the most: it seems more natural. Moreover, Oracle error messages seem very obscure to me!

pgbackrest async behavior

2021-07-27T00:00:00+00:00

pgbackrest can work in an asynchronous way in order to improve the resource usage.

pgbackrest async behavior

pgbackrest is an amazing backup tool: it is rock-solid (as PostgreSQL is) and designed to work under heavy database load.
One feature it has to improve the efficiency of WAL archiving is the async mode.

In “standard” mode, pgbackrest will push WAL segments to the backup machine, using the classical archive_command provided by PostgreSQL. As you probably already know, PostgreSQL will wait for archive_command to complete and acknowledge the WAL transfer. It could happen that:

the archive_command could take a very long time, and while PostgreSQL will continue to work, not yet transferred WALs will make pg_wal grow;
the archive_command could fail, and PostgreSQL will warn you (in the logs) about this event and will try again to archive the failed WALs (forever, or better, until it succeeds).

On the other hand, when doing a restore, PostgreSQL executes the restore_command to get a new WAL segment, and this in turn results in running pgbackrest for a single WAL request.
The key concept here is probably single WAL request, both for push and get.

pgbackrest allows for an improvement on this situation by means of asynchronous archive management, both push and get. The idea is to give more control to pgbackrest so that it can optimize I/O operations.
When PostgreSQL archives a WAL segment, it executes the archive_command within a loop (allow me to simplify things): when a WAL is ready, archive_command is invoked and until it has finished, there is no chance to archive an already available WAL segment. On the other hand, when PostgreSQL needs to get a WAL in order to do a restore/recovery, it executes restore_command on every WAL segment it is expecting to replay. Therefore, if the server has to replay many WALs, it has to execute restore_command and “download” every WAL one after the other.
How does the asynchronous mode improve on the above?
When archiving, that is pushing, pgbackrest can decide to group several WALs in a single transfer, which for instance reduces the setup/tear-down operations needed to establish a network connection with the backup machine.
When restoring, that is getting, pgbackrest can perform a pre-fetch, downloading a few WALs to the local machine and making them available immediately to the PostgreSQL server when needed.

The test environment

In this post, I will demonstrate the usage of pgbackrest asynchronously using my usual two-machine setup:

miguel is the PostgreSQL server, running Fedora Linux with PostgreSQL 13.3;
carmensita is the backup machine, running Fedora Linux.

pgbackrest is at version 2.34.

Asynchronous Configuration Parameters

There are a bunch of configuration parameters that can be configured within the pgbackrest.conf file or specified on the command line, as usual.
The settings mainly regard the spool directory, the queues and the enabling of the asynchronous mode.

Enabling or disabling the asynchronous mode

There is a single configuration parameter to enable the asynchronous mode: archive-async. By default this is false, meaning pgbackrest will work “normally” as you expect. Turning it on automatically makes every archive-get and archive-push run in asynchronous mode.

The spool directory

In order to manage the async operations, pgbackrest creates a spool directory on the PostgreSQL machine, usually /var/spool/pgbackrest, where it places an archive directory and, inside it, a directory named after the server, or better, the stanza. That directory is then split into in and out, respectively for archive-get and archive-push.
The spool directory root can be defined with the spool-path configuration parameter.
For example, given the stanza named miguel, the spool directory will either be /var/spool/pgbackrest/archive/miguel/out or /var/spool/pgbackrest/archive/miguel/in.

In the out directory the system will write book-keeping stuff, mainly small text files that will be used to identify at which point the archiving has arrived.
In the in directory, the system will store incoming WALs ready to be restored from the PostgreSQL server.

Queues

There are two different settings to manage the queues of pgbackrest:

archive-push-queue-max;
archive-get-queue-max.

They configure the max size of the data enqueued for the push and get operations. When the queue is full, pgbackrest will behave differently depending on the operation that is ongoing, as explained below.

Configuration

The backup machine, named carmensita, has an /etc/pgbackrest.conf file configured as follows:

$ cat /etc/pgbackrest.conf
[global]
start-fast = y
stop-auto  = y
repo1-path = /backup/pgbackrest

repo1-retention-full=2

repo1-host-user = backup
log-level-console = info


[miguel]
pg1-host = miguel
pg1-path = /postgres/13/data

while on the PostgreSQL server machine, named miguel, the /etc/pgbackrest.conf file is:

[global]
repo1-path = /backup/pgbackrest
repo1-host-user = backup
log-level-console = info
repo1-host = carmensita



archive-async          = y
archive-push-queue-max = 500MB
spool-path             = /var/spool/pgbackrest
archive-get-queue-max  = 32MB

Last, the archive_command on the PostgreSQL machine is configured as follows:

archive_command = '/usr/bin/pgbackrest \
                    --pg1-path=/postgres/13/data \
                    --config=/etc/pgbackrest.conf \
                    --stanza=miguel \
                    archive-push %p'
archive_mode = on

Please note that the archive-async parameter is specified in the configuration, instead of setting it in the archive-push or archive-get. This simplifies, in my opinion, the usage of pgbackrest.

With all the above up and running, it is possible to see how the asynchronous mode works.

Archiving (`archive-push`)

Let’s start with the backup scenario, that is archive-push.

When things go right

Let’s see what happens when everything is fine: I launched a pgbench session in order to generate some traffic and, therefore, some WAL segment generation and archiving. On one hand, pgbench was running as follows:

% pgbench -c 8 -T 120 -h miguel -U pgbench -n -P 5 pgbench

While pgbench is running, let’s inspect what is happening on the PostgreSQL machine, with particular regard to the spooling folder:

# ls -1s /var/spool/pgbackrest/archive/miguel/out \
      && psql -h miguel -U postgres \
         -c 'select last_archived_wal from pg_stat_archiver;' postgres

0 000000070000014E00000015.ok

last_archived_wal     
--------------------------
 000000070000014E00000015


# # after a while

# ls -1s /var/spool/pgbackrest/archive/miguel/out \
        && psql -h miguel -U postgres \
           -c 'select last_archived_wal from pg_stat_archiver;' postgres

0 000000070000014E00000016.ok

last_archived_wal     
--------------------------
 000000070000014E00000016

As you can see, in the spool directory there will be an empty file named after the last archived WAL segment, that is the last segment sent to the backup machine, and the suffix .ok.
In the PostgreSQL logs, there will be a notice when the pgbackrest completes the pushing (depending on the log level you configured):

INFO: pushed WAL file '000000070000014E000000AD' to the archive asynchronously
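
The 24-character file names seen above follow the PostgreSQL WAL naming scheme: eight hexadecimal digits each for the timeline, the log number and the segment number within that log. As a minimal sketch (the function names are hypothetical, and the arithmetic assumes the default 16MB segment size, which gives 256 segments per log), the following Python snippet decodes a name and counts how many segments separate two of them:

```python
# 4GB of address space per log divided by the (assumed default) 16MB segment size.
SEGMENTS_PER_LOG = 0x100000000 // (16 * 1024 * 1024)  # 256


def parse_wal_name(name):
    """Split a 24-hex-digit WAL file name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError(f"not a WAL segment name: {name!r}")
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def segments_between(first, last):
    """Number of segments from `first` to `last` (same timeline required)."""
    tl_a, log_a, seg_a = parse_wal_name(first)
    tl_b, log_b, seg_b = parse_wal_name(last)
    if tl_a != tl_b:
        raise ValueError("segments belong to different timelines")
    return (log_b * SEGMENTS_PER_LOG + seg_b) - (log_a * SEGMENTS_PER_LOG + seg_a)


print(parse_wal_name("000000070000014E00000015"))    # (7, 334, 21)
print(segments_between("000000070000014E00000015",
                       "000000070000014E00000016"))  # 1
```

Counting segments this way is handy to estimate how much WAL has been produced between two snapshots of pg_stat_archiver.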

When things go wrong

First case: shutting down the backup machine

Assume the backup machine, carmensita, is turned off. The archiving cannot work, of course, and if you generate again some traffic on the PostgreSQL server (e.g., by using pgbench as shown above), the situation on the spool directory is:

# ls -1s /var/spool/pgbackrest/archive/miguel/out \
    && psql -h miguel -U postgres \
       -c 'select last_archived_wal, last_failed_wal from pg_stat_archiver;' postgres

4 global.error

    last_archived_wal     |     last_failed_wal      
--------------------------|--------------------------
 000000070000014E0000001A | 000000070000014E0000001B

The file global.error contains a textual description of what is happening:

# cat /var/spool/pgbackrest/archive/miguel/out/global.error 
103
unable to find a valid repository:
repo1: [UnknownError] remote-0 process on 'carmensita' terminated unexpectedly [255]: ssh: connect to host carmensita port 22: No route to hos

If you then restart the backup machine, so that the archiving starts working again, the situation on the spool directory is as follows:

# ls -1s /var/spool/pgbackrest/archive/miguel/out \
    && psql -h miguel -U postgres \
       -c 'select last_archived_wal, last_failed_wal from pg_stat_archiver;' postgres

0 000000070000014E0000001B.ok
0 000000070000014E0000001C.ok
0 000000070000014E0000001D.ok
0 000000070000014E0000001E.ok
0 000000070000014E0000001F.ok

    last_archived_wal     |     last_failed_wal      
--------------------------|--------------------------
 000000070000014E0000001F | 000000070000014E0000001B

As you can see, the .ok files are there and the archiving is working again.
Over time, there could be one or more .ok files. The idea is that the last .ok file indicates the last asynchronously archived WAL segment (in the above, the one ending with 1F).

Second case: generating more WALs

Shut down the backup machine again, so that the PostgreSQL server is not able to archive WAL segments; then generate quite an amount of traffic to increase the WAL directory size (pg_wal).
Let’s inspect the situation:

# ls -1s /var/spool/pgbackrest/archive/miguel/out \
     && psql -h miguel -U postgres \
        -c 'select last_archived_wal, last_failed_wal from pg_stat_archiver;' postgres

4 global.error

    last_archived_wal     |     last_failed_wal      
--------------------------|--------------------------
 000000070000014E000000ED | 000000070000014E000000EE
  
 # cat /var/spool/pgbackrest/archive/miguel/out/global.error 
103
unable to find a valid repository:
repo1: [UnknownError] remote-0 process on 'carmensita' terminated unexpectedly [255]: ssh: connect to host carmensita port 22: No route to host

Therefore, 000000070000014E000000ED is the last WAL segment archived on the backup machine.
Suppose now that a larger amount of data is modified on PostgreSQL, so that it keeps generating WAL segments. Clearly PostgreSQL cannot archive segments anymore, and will start accumulating them in pg_wal to keep them available for when the archive_command starts working again.
Or does it?
Inspect again the situation on disk:

# ls -1s /var/spool/pgbackrest/archive/miguel/out \
   && psql -h miguel -U postgres      \
     -c 'select last_archived_wal, last_failed_wal from pg_stat_archiver;' postgres

4 000000070000014F00000053.ok
4 000000070000014F00000054.ok
4 000000070000014F00000055.ok
4 000000070000014F00000056.ok
4 000000070000014F00000057.ok
4 000000070000014F00000058.ok
4 000000070000014F00000059.ok
4 000000070000014F0000005A.ok.pgbackrest.tmp

    last_archived_wal     |     last_failed_wal      
--------------------------|--------------------------
 000000070000014F00000059 | 000000070000014F00000053


# cat /var/spool/pgbackrest/archive/miguel/out/000000070000014F00000059.ok
0
dropped WAL file '000000070000014F00000059' because archive queue exceeded 500MB
  

First of all: last_archived_wal advanced even if the archive_command is failing (remember that the backup machine is down)! How is that possible?
The answer lies in how the pgbackrest asynchronous mode works: if the amount of WAL data waiting to be archived exceeds a configured size, pgbackrest decides to acknowledge the archiving to the PostgreSQL server, which in turn advances in archiving even if the archived WAL never hit the backup machine!
The idea is that pgbackrest prevents pg_wal from growing indefinitely, which would risk stopping PostgreSQL altogether. However, **acknowledging a fake archiving means that the WAL stream is broken, so Point In Time Recovery will not be possible anymore around this “hole” and a new backup is strongly recommended!**
pgbackrest writes a message into its .ok files, which are now non-empty and inform the administrator that the WAL segment has been dropped explicitly.
You can find the same information in the PostgreSQL logs, where pgbackrest prints a message to make it clear:

% sudo grep 000000070000014F00000059 $PGLOG

INFO: archive-push command begin 2.34: [pg_wal/000000070000014F00000059] --archive-async --archive-push-queue-max=500MB --config=/etc/pgbackrest.conf --exec-id=40124-af251b1c --log-level-console=info --pg1-path=/postgres/13/data --repo1-host=carmensita --repo1-host-user=backup --repo1-path=/backup/pgbackrest --spool-path=/var/spool/pgbackrest --stanza=miguel

WARN: dropped WAL file '000000070000014F00000059' because archive queue exceeded 500MB

INFO: pushed WAL file '000000070000014F00000059' to the archive asynchronously

There are three pieces of information:

pgbackrest tried to archive the WAL segment, failing;
there is a WARN that informs you that pgbackrest instructed PostgreSQL to drop the WAL file as if it had been archived correctly;
pgbackrest states that it has archived the file, so PostgreSQL can proceed to delete or recycle it.
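
Since a dropped segment leaves a non-empty .ok file in the spool directory, the holes can also be found by scanning that directory. The following is only a sketch: it assumes the file content shown above (a status line followed by a message containing the text "dropped WAL file"), and the function name is hypothetical, not something shipped with pgbackrest.

```python
from pathlib import Path


def find_dropped_wals(spool_out):
    """Return the sorted names of WAL segments whose .ok file reports a drop.

    Empty .ok files (regular asynchronous pushes) and temporary files
    such as *.ok.pgbackrest.tmp are ignored.
    """
    dropped = []
    for ok_file in Path(spool_out).glob("*.ok"):
        content = ok_file.read_text()
        if "dropped WAL file" in content:
            dropped.append(ok_file.name[: -len(".ok")])
    return sorted(dropped)


if __name__ == "__main__":
    # Spool location used in this article; adjust spool-path and stanza to your setup.
    spool = "/var/spool/pgbackrest/archive/miguel/out"
    if Path(spool).is_dir():
        for segment in find_dropped_wals(spool):
            print(f"hole in the WAL stream: {segment}")
```

Keep in mind that the spool directory only holds recent status files, so a check like this is useful while the problem is ongoing, not as a historical audit.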

When does pgbackrest decide to give up and start faking the archiving to PostgreSQL? The archive-push-queue-max configuration parameter establishes how much data pgbackrest can fall behind the normal WAL operations before it makes PostgreSQL delete segments.
In my configuration there is archive-push-queue-max=500MB, which means that after 500MB of failed WALs pgbackrest will start faking, and there will be a hole in the WAL stream. With the default 16MB segment size, this roughly corresponds to 32 failed WALs in a row.
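
The figure of 32 WALs is simple arithmetic. A quick sketch, assuming the default 16MB WAL segment size and assuming that the queue counts whole segments (the helper name is hypothetical):

```python
import math

WAL_SEGMENT_MB = 16  # default segment size; different if initdb used --wal-segsize


def segments_to_fill_queue(queue_max_mb, segment_mb=WAL_SEGMENT_MB):
    """Smallest number of whole WAL segments whose total size reaches queue_max_mb."""
    return math.ceil(queue_max_mb / segment_mb)


print(segments_to_fill_queue(500))  # 32
```

With a larger segment size the same limit is reached after fewer failed segments, so the queue size is better chosen in terms of how much disk space pg_wal can afford rather than in number of files.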

Parallel Processes

The configuration parameter process-max can be used to control how many push workers can be launched to serve the asynchronous system. Suppose that in the configuration there is process-max = 4, then during WAL archiving you could see something as follows in the process list:

# pstree -c  -A
systemd-|-NetworkManager-|-{NetworkManager}
       ...                                                                                        
        |-pgbackrest-|-pgbackrest---ssh
        |            |-pgbackrest---ssh
        |            |-pgbackrest---ssh
        |            |-pgbackrest---ssh
        |            `-ssh
       ...                                        
        |-postmaster-|-postmaster
        |            |-postmaster
        |            |-postmaster
        |            |-postmaster
        |            |-postmaster
        |            |-postmaster---pgbackrest
        |            |-postmaster
        |            `-postmaster
      ...

As you can see, PostgreSQL has launched pgbackrest (that is, is executing the archive_command), and there are four pgbackrest processes.
If the system is pushing archives in synchronous mode, process-max is ignored.
Every concurrent process will share an exec-id that identifies the batch to which the process belongs:

# pstree -A -c -a -l | grep pgbackrest
...
  |-pgbackrest --config=/etc/pgbackrest.conf --exec-id=46475-10e060a1
  |   |-pgbackrest --config=/etc/pgbackrest.conf --exec-id=46475-10e060a1
  |   |-pgbackrest --config=/etc/pgbackrest.conf --exec-id=46475-10e060a1
  |   |-pgbackrest --config=/etc/pgbackrest.conf --exec-id=46475-10e060a1
...

Restoring (`archive-get`)

Let’s do a restore from a recent backup:

% sudo systemctl stop postgresql-13.service
% sudo -u postgres pgbackrest --stanza miguel \
       --pg1-path /postgres/13/data --delta restore
       ...
       INFO: restore command end: completed successfully (69861ms)

During the restore, the archive directory within the spool directory of pgbackrest is cleaned, in particular the specific server directory miguel is removed, since no WAL archiving is in progress.
The postgresql.auto.conf file contains the archive-get command ready to fetch the WAL segments:

% sudo cat /postgres/13/data/postgresql.auto.conf

# Recovery settings generated by pgBackRest restore on 2021-07-27 05:26:26
restore_command = 'pgbackrest --pg1-path=/postgres/13/data --stanza=miguel archive-get %f "%p"'

During the system startup, pgbackrest will get (as usual) WAL segments from the backup machine, but this time in an asynchronous way:

INFO: archive-get command begin 2.34: [000000070000014F000000A4, pg_wal/RECOVERYXLOG] --archive-async --archive-get-queue-max=32MB --exec-id=42831-f4ada646 --log-level-console=info --pg1-path=/postgres/13/data --repo1-host=carmensita --repo1-host-user=backup --repo1-path=/backup/pgbackrest --spool-path=/var/spool/pgbackrest --stanza=miguel

INFO: found 000000070000014F000000A4 in the archive asynchronously

INFO: archive-get command end: completed successfully (713ms)

The above is an excerpt of the PostgreSQL log. In the meantime, the spool directory was populated with an in subdirectory for the server, and in that directory the incoming WALs were stored, waiting to be replayed by the PostgreSQL server:

% sudo ls -1s /var/spool/pgbackrest/archive/miguel/in
16384 000000070000014F000000A4.pgbackrest.tmp

In this scenario, the archive-get-queue-max parameter can specify the size of pre-fetched WALs: pgbackrest will fetch and store in the spooling directory no more WAL segments than the specified amount. Unlike the push configuration, setting this parameter does not imply the system will throw away WALs.

Conclusions

pgbackrest is an amazing backup tool, rock solid and with a lot of configuration parameters that can help improve the resource usage so that backup and restore work fast and reliably even under heavy loads.
The asynchronous mode can help improve performance by means of batches and pre-fetching of WAL segments. However, you need to be aware of the fact that, by design, asynchronous pushing of WALs could produce holes in the WAL stream if the archiving accumulates too much data.
This is, in my opinion, an excellent feature, because in my experience I’ve seen many times a PostgreSQL server accumulating too many WAL segments (up to consuming all the storage) due to a faulty backup machine (or networking). After all, pgbackrest is ensuring that a backup exists, and at least that your PostgreSQL server will not go read-only due to archive_command failing.

Love it or hate it!

PostgreSQL Extension Catalogs

2021-07-20T00:00:00+00:00

How to see the available and/or installed extensions?

PostgreSQL Extension Catalogs

There are three main catalogs that can be useful when dealing with extensions:

pg_extension;
pg_available_extensions;
pg_available_extension_versions.

The first one, pg_extension, provides information about which extensions are installed in the current database, while the second, pg_available_extensions, provides information about which extensions are available to the cluster.
The difference is simple: to be used, an extension must first appear in pg_available_extensions, which means it has been installed on the cluster (e.g., via pgxnclient). From this point on, the extension can be installed into the database by means of a CREATE EXTENSION statement; as a result the extension will appear in the pg_extension catalog.

As an example:

testdb=> select name, default_version from pg_available_extensions;
        name        | default_version 
--------------------|-----------------
 intagg             | 1.1
 plpgsql            | 1.0
 dict_int           | 1.0
 dict_xsyn          | 1.0
 adminpack          | 2.1
 intarray           | 1.3
 amcheck            | 1.2
 autoinc            | 1.0
 isn                | 1.2
 bloom              | 1.0
 fuzzystrmatch      | 1.1
 jsonb_plperl       | 1.0
 btree_gin          | 1.3
 jsonb_plperlu      | 1.0
 btree_gist         | 1.5
 hstore             | 1.7
 hstore_plperl      | 1.0
 hstore_plperlu     | 1.0
 citext             | 1.6
 lo                 | 1.1
 ltree              | 1.2
 cube               | 1.4
 insert_username    | 1.0
 moddatetime        | 1.0
 dblink             | 1.2
 earthdistance      | 1.1
 file_fdw           | 1.0
 pageinspect        | 1.8
 pg_buffercache     | 1.3
 pg_freespacemap    | 1.2
 pg_prewarm         | 1.2
 pg_stat_statements | 1.8
 pg_trgm            | 1.5
 pg_visibility      | 1.2
 pgcrypto           | 1.3
 pgrowlocks         | 1.2
 pgstattuple        | 1.5
 postgres_fdw       | 1.0
 refint             | 1.0
 seg                | 1.3
 bool_plperl        | 1.0
 plperlu            | 1.0
 sslinfo            | 1.2
 anon               | 0.9.0
 tablefunc          | 1.0
 tcn                | 1.0
 tsm_system_rows    | 1.0
 bool_plperlu       | 1.0
 tsm_system_time    | 1.0
 pgaudit            | 1.5
 pg_qualstats       | 2.0.2
 unaccent           | 1.1
 plperl             | 1.0
 orafce             | 3.13
 uuid-ossp          | 1.1
 xml2               | 1.1
 pg_background      | 1.0

The above list represents all the extensions installed on the cluster, that is, those I can run a CREATE EXTENSION against.

The pg_available_extensions catalog has an installed_version field that provides the version number of the extension installed in the current database, or NULL if the extension is not installed in the current database. Therefore, in order to know if an extension is installed or not in a database, you can run a query like the following:

testdb=> select name, default_version, installed_version 
         from pg_available_extensions 
         where installed_version is not null;
     name      | default_version | installed_version 
---------------|-----------------|-------------------
 plpgsql       | 1.0             | 1.0
 dblink        | 1.2             | 1.2
 orafce        | 3.13            | 3.13
 pg_background | 1.0             | 1.0

This is a little too much effort, and since an extension could have been installed with different flags in different databases, the pg_extension catalog provides more detailed and narrower information: it lists all the extensions that have been installed in the current database.

Therefore, to see what a database can use, that is which extensions it has access to, I need to use the pg_extension catalog:

testdb=> select extname, extversion from pg_extension ;
    extname    | extversion 
---------------|------------
 plpgsql       | 1.0
 orafce        | 3.13
 dblink        | 1.2
 pg_background | 1.0

The current database has a much smaller list of installed extensions.

Extension Version Numbers

As you know, an extension can come with different version numbers, and the beauty of this mechanism is that it is easy to upgrade an extension from one version to another.
The pg_available_extensions catalog provides only the last (i.e., newest) version of an available extension. Let’s try with a very popular extension: pg_stat_statements:

testdb=> select name, default_version, installed_version
         from pg_available_extensions 
         where name = 'pg_stat_statements';
        name        | default_version | installed_version 
--------------------|-----------------|-------------------
 pg_stat_statements | 1.8             | 

The extension could be installed at version 1.8 and is currently not installed in the current database.
But what about other version numbers?
The catalog pg_available_extension_versions provides a list of all the versions in which an extension is currently available:

testdb=> select name, version, installed, relocatable
         from pg_available_extension_versions 
         where name = 'pg_stat_statements'
         order by version;
        name        | version | installed | relocatable 
--------------------|---------|-----------|-------------
 pg_stat_statements | 1.4     | f         | t
 pg_stat_statements | 1.5     | f         | t
 pg_stat_statements | 1.6     | f         | t
 pg_stat_statements | 1.7     | f         | t
 pg_stat_statements | 1.8     | f         | t

As you can see, the extension is available in five different versions, and I can choose the version that best fits my requirements.
This catalog provides additional information: in particular, it tells you whether the extension can be installed only by superusers (field superuser) or also by a user with appropriate privileges (field trusted), which other extensions are required (field requires), and whether it is relocatable.
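
One caveat about the ORDER BY version used above: the version column is of type text, so the ordering is textual, and a hypothetical version 1.10 would sort before 1.4. A minimal client-side sketch in Python, assuming purely dotted numeric version strings (the helper name is hypothetical, and the version list is made up for illustration):

```python
def version_key(version):
    """Turn a dotted version string such as '1.10' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


versions = ["1.4", "1.10", "1.8", "1.5"]
print(sorted(versions))                   # ['1.10', '1.4', '1.5', '1.8']
print(sorted(versions, key=version_key))  # ['1.4', '1.5', '1.8', '1.10']
```

Extension versions are free-form strings, so a version that is not purely numeric would make this helper raise a ValueError; it is meant as an illustration, not as a general parser.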

How much data goes into the WALs? (part 2)

2021-07-15T00:00:00+00:00

I did some more experiments with WALs.

How much data goes into the WALs? (part 2)

In order to get a better idea about how WAL settings can change the situation within the WAL management, I decided to run a kind of automated test and store the results into a table, so that I can query them back later.
The idea is the same as in my previous article: produce some workload, measure the differences in the Log Sequence Numbers, and see how the size of the WALs changes depending on some settings. This is not accurate research, it’s just a quick and dirty experiment.

At the end, I decided to share my numbers so that you can have a look at them and elaborate a bit more. For example, I’m no good at all at doing graphs (I know only the very minimum about gnuplot!).

!!! WARNING !!!

WARNING: this is not a guide on how to tune WAL settings! This is not even a real and comprehensive set of experiments, it is just what I’ve played with to see how much traffic can be generated for certain amounts of workload.
Your case and situation could be, and probably is, different from the very simple test I’ve done, and I do not pretend to be right about the small and obvious conclusions I come up with at the end. In case you see or know something that can help make what I write in the following clearer, please comment or contact me!

Set up

First of all I decided to run an INSERT-only workload, so that the size of the resulting table does not include any bloat and is therefore comparable to the size of the WAL records.
No other database activity was ongoing, so that the only generated WAL traffic was about my own workload.
Each time the configuration was changed, the system was restarted, so that every workload started with the same (empty) clean situation and without any need to reason about ongoing checkpoints. Of course, checkpoints were happening as usual, but not at the beginning of the workload.

I used two tables to run the test:

wal_traffic stores the results of each run;
wal_traffic_data is used to store the data of every workload, that is, the tuples inserted in the database.

The wal_traffic_data table was dropped and re-created every time a new run was started, so as to avoid data bloat. It is interesting to note that any workload setup activity is performed before the server is restarted, so that the measured WAL traffic is as close as possible to that of the workload only.
The wal_traffic table is defined as follows:

CREATE TABLE IF NOT EXISTS wal_traffic
  (
    pk int generated always as identity
    , workload text
    , lsn_start pg_lsn
    , lsn_end   pg_lsn
    , lsn_insert_start pg_lsn
    , lsn_insert_end   pg_lsn
    , run int          default 0
    , data_size bigint default 0
    , wal_size bigint generated always as ( lsn_end - lsn_start ) stored
    , wal_data_ratio numeric generated always as ( ( lsn_end - lsn_start )::real / data_size * 100 ) stored
    , wal_insert_data_ratio numeric generated always as ( ( lsn_insert_end - lsn_insert_start )::real / data_size * 100 ) stored
    , settings jsonb
    , workload_repetitions int default 1
    , ts_start timestamp default current_timestamp
    , ts_end   timestamp default current_timestamp

    , PRIMARY KEY ( pk )
  );

The workload field stores the text string about the executed query.
The lsn_xxx fields store the location within the WAL, in particular:

lsn_start and lsn_end store the result of the pg_current_wal_lsn() function invoked at the beginning and at the end of the workload;
lsn_insert_start and lsn_insert_end store the result of the pg_current_wal_insert_lsn() function invoked at the beginning and at the end of the workload.

I decided to store both pieces of information to be able to examine differences more accurately; however, for this kind of experiment the differences between the two values are pretty much irrelevant.
The data_size column contains the result of pg_relation_size(), that is a rough estimate of the volume of data produced during the workload.
The columns wal_size, wal_data_ratio, and wal_insert_data_ratio are generated, and contain respectively the amount of generated WAL records and the ratios (as percentages) between the size of the WAL records and that of the actual data.
Last, the settings column contains a jsonb representation of the settings used to run the test, like for example the value for wal_level, wal_compression and so on.
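
To make the generated columns concrete, here is a small Python sketch of the same arithmetic done outside the database. A pg_lsn value is printed as two hexadecimal numbers separated by a slash (the high and low 32 bits of a 64-bit byte position), so subtracting two LSNs yields a number of bytes. The function names and the sample values are made up for illustration and do not come from the test runs.

```python
def lsn_to_bytes(lsn):
    """Convert a textual pg_lsn such as '14F/1000000' into a byte offset."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def wal_data_ratio(lsn_start, lsn_end, data_size):
    """WAL bytes written between two LSNs, as a percentage of data_size."""
    wal_size = lsn_to_bytes(lsn_end) - lsn_to_bytes(lsn_start)
    return wal_size / data_size * 100


# Made-up values: 32MB of WAL written for 16MB of table data.
start, end = "14E/FF000000", "14F/1000000"
print(lsn_to_bytes(end) - lsn_to_bytes(start))       # 33554432
print(wal_data_ratio(start, end, 16 * 1024 * 1024))  # 200.0
```

A ratio above 100 therefore means that more bytes went into the WAL than into the table itself.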

There is also a view to quickly get results about the workload size:

CREATE OR REPLACE VIEW vw_wal_traffic
  AS
  select pg_size_pretty( data_size ) as data_size,
		 pg_size_pretty( wal_size ) as wal_size, wal_data_ratio::numeric( 7, 2 ) || ' %' as ratio,
		 wal_insert_data_ratio::numeric( 7, 2 ) || '%' as ins_ratio,
		 ts_end - ts_start as elapsed_time,
		 settings from wal_traffic;

Details about the workloads

I’ve prepared two different workloads, both based on INSERTs.
The first workload does two transactions: the first one inserts a certain amount of tuples, while the second inserts a smaller amount of tuples. In particular, the first transaction inserts a number of tuples specified by $workload_scale, while the second transaction inserts 1/5 of the same value.

BEGIN;
INSERT INTO $WORKLOAD_TABLE SELECT v, md5( v::text )::text || random()::text
FROM generate_series( 1, $workload_scale ) v;
COMMIT;

BEGIN;
INSERT INTO $WORKLOAD_TABLE 
SELECT v + v, t || ' - ' || t || random()::text
FROM $WORKLOAD_TABLE
WHERE v % 5 = 0;
COMMIT;

The $workload_scale variable takes values ranging from 100 to 10 million, growing by a factor of ten (e.g., 100, 1000, 10000 and so on).
The second workload type is shorter, and does the following:

DO $$
DECLARE
  i int;
BEGIN
  FOR i IN 1 .. $workload_scale LOOP
    INSERT INTO $WORKLOAD_TABLE SELECT 1, md5( random()::text )::text;
  END LOOP;
END
$$;

Therefore it performs the same number of tuple insertions as the first transaction of the previous workload, but it does so by looping. The final effect is that the first workload inserts its tuples with one bulk INSERT statement per transaction, while the second workload executes one INSERT statement per tuple.

The usage of random() within the INSERT statements is to generate some more traffic on logical decoding.

The Workload Workflow

In order to do the tests, I wrote an ugly shell script with the following workflow:

truncate the wal_traffic_data table, so that its size on disk does not include previous experiments;
execute a few ALTER SYSTEM to set some configuration on WAL related parameters (wal_level, full_page_writes, wal_compression and so on);
restart PostgreSQL, so as to ensure every test starts from a clean situation;
get the current WAL position (pg_current_wal_lsn() and pg_current_wal_insert_lsn());
execute the workload with the right scale;
get the current WAL position (pg_current_wal_lsn() and pg_current_wal_insert_lsn());
insert the result tuple with WAL differences into wal_traffic;
loop with a different scaling factor.
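
The measuring part of the workflow above can be sketched in a few lines. This is not the shell script used for the tests: it is a minimal Python illustration in which every database interaction is passed in as a function. The names run_experiments, current_lsn, table_size, reset and run_workload are hypothetical, and only one of the two WAL positions is sampled to keep the sketch short.

```python
from typing import Callable, Dict, Iterable, List


def lsn_to_bytes(lsn: str) -> int:
    """Turn an LSN such as '16/70D22618' into a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def run_experiments(
    scales: Iterable[int],
    current_lsn: Callable[[], str],        # hypothetical wrapper around pg_current_wal_lsn()
    table_size: Callable[[], int],         # hypothetical wrapper around pg_relation_size()
    reset: Callable[[], None],             # hypothetical: truncate, ALTER SYSTEM, restart
    run_workload: Callable[[int], None],   # hypothetical: runs the workload at a given scale
) -> List[Dict[str, float]]:
    """Run every scale once and record WAL size, data size and their ratio (in percent)."""
    results = []
    for scale in scales:
        reset()
        before = lsn_to_bytes(current_lsn())
        run_workload(scale)
        after = lsn_to_bytes(current_lsn())
        wal_size = after - before
        data_size = table_size()
        ratio = 100.0 * wal_size / data_size if data_size else float("nan")
        results.append(
            {"scale": scale, "wal_size": wal_size, "data_size": data_size, "wal_data_ratio": ratio}
        )
    return results


# The scaling factors described above: 100, 1000, ... up to 10 million.
SCALES = [10 ** exponent for exponent in range(2, 8)]
```

Passing the database calls as functions is only a design choice that lets the loop be tried without a running server; in a real run they would execute the corresponding SQL.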

The Results

It is now time to have a look at the test results.

Let’s consider a few results:

testdb=> select * from vw_wal_traffic where settings->>'wal_level' = 'minimal' and settings->>'wal_compression' = 'on';
-[ RECORD 1 ]+-----------------------------------------------------------------------------------------------------
data_size    | 1205 MB
wal_size     | 2148 MB
ratio        | 178.27 %
ins_ratio    | 178.27%
elapsed_time | 00:02:33.282366
settings     | {"wal_level": "minimal", "wal_log_hints": "off", "wal_compression": "on", "full_page_writes": "off"}
-[ RECORD 2 ]+-----------------------------------------------------------------------------------------------------
data_size    | 1205 MB
wal_size     | 2148 MB
ratio        | 178.27 %
ins_ratio    | 178.27%
elapsed_time | 00:02:34.882126
settings     | {"wal_level": "minimal", "wal_log_hints": "on", "wal_compression": "on", "full_page_writes": "on"}

As you can see, for 1.2 GB of data the system has produced roughly 2.1 GB of WAL records. And the situation is even worse when there is no wal_compression (as you could expect):

-[ RECORD 8 ]+------------------------------------------------------------------------------------------------------
data_size    | 1205 MB
wal_size     | 2402 MB
ratio        | 199.34 %
ins_ratio    | 199.34%
elapsed_time | 00:02:30.725138
settings     | {"wal_level": "minimal", "wal_log_hints": "off", "wal_compression": "off", "full_page_writes": "on"}

This time, for the same amount of data, the WAL size is almost double that of the real data.
Changing the setting of wal_level to logical or replica does not change the situation very much.

It is possible to get the best ratio between the WAL produced and the data stored:

testdb=> select * from vw_wal_traffic v where ratio = ( select min( ratio ) from vw_wal_traffic where settings->>'wal_level' = v.settings->>'wal_level' ) and v.settings->>'wal_level' IN ( 'minimal', 'replica', 'logical' );
-[ RECORD 1 ]+-----------------------------------------------------------------------------------------------------
data_size    | 16 kB
wal_size     | 16 kB
ratio        | 101.95 %
ins_ratio    | 101.95%
elapsed_time | 00:00:00.133674
settings     | {"wal_level": "logical", "wal_log_hints": "on", "wal_compression": "on", "full_page_writes": "off"}
-[ RECORD 2 ]+-----------------------------------------------------------------------------------------------------
data_size    | 16 kB
wal_size     | 16 kB
ratio        | 101.56 %
ins_ratio    | 101.56%
elapsed_time | 00:00:00.120578
settings     | {"wal_level": "replica", "wal_log_hints": "on", "wal_compression": "on", "full_page_writes": "on"}
-[ RECORD 3 ]+-----------------------------------------------------------------------------------------------------
data_size    | 16 kB
wal_size     | 18 kB
ratio        | 111.13 %
ins_ratio    | 100.34%
elapsed_time | 00:00:00.427126
settings     | {"wal_level": "minimal", "wal_log_hints": "off", "wal_compression": "on", "full_page_writes": "off"}

and on the other side, the worst ratio:

testdb=> select * from vw_wal_traffic v where ratio = ( select max( ratio ) from vw_wal_traffic where settings->>'wal_level' = v.settings->>'wal_level' ) and v.settings->>'wal_level' IN ( 'minimal', 'replica', 'logical' );  
-[ RECORD 1 ]+-----------------------------------------------------------------------------------------------------
data_size    | 8192 bytes
wal_size     | 23 kB
ratio        | 289.16 %
ins_ratio    | 190.72%
elapsed_time | 00:00:00.266881
settings     | {"wal_level": "minimal", "wal_log_hints": "off", "wal_compression": "off", "full_page_writes": "on"}
-[ RECORD 2 ]+-----------------------------------------------------------------------------------------------------
data_size    | 8192 bytes
wal_size     | 23 kB
ratio        | 289.16 %
ins_ratio    | 190.72%
elapsed_time | 00:00:00.112946
settings     | {"wal_level": "minimal", "wal_log_hints": "off", "wal_compression": "off", "full_page_writes": "on"}
-[ RECORD 3 ]+-----------------------------------------------------------------------------------------------------
data_size    | 8192 bytes
wal_size     | 23 kB
ratio        | 284.47 %
ins_ratio    | 190.63%
elapsed_time | 00:00:00.076021
settings     | {"wal_level": "logical", "wal_log_hints": "off", "wal_compression": "off", "full_page_writes": "on"}
-[ RECORD 4 ]+-----------------------------------------------------------------------------------------------------
data_size    | 8192 bytes
wal_size     | 23 kB
ratio        | 289.65 %
ins_ratio    | 190.53%
elapsed_time | 00:00:00.113793
settings     | {"wal_level": "replica", "wal_log_hints": "off", "wal_compression": "off", "full_page_writes": "on"}

From the above, it is clear that the worst cases are those with wal_compression disabled, while the best cases are those with compression enabled.

Download the Results

The results are available by means of a CSV file, so you can load and inspect them yourself. In order to load the files, create a table wal_traffic_results as follows:

testdb=> create table wal_traffic_results ( 
   run int, workload text, wal_size bigint, 
   data_size bigint, 
   wal_data_ratio numeric( 5,2), 
   wall_clock time, wal_level text, 
   wal_log_hints text, 
   wal_compression text, 
   full_page_writes text );

and then load the CSV file with a command like the following one:

testdb=> \copy wal_traffic_results from wal_traffic.csv with ( format csv, header  );

Please note that I’ve split the jsonb field into a set of columns with a query like the following one, that produced the CSV file:

% psql -A --csv -h miguel 
-c 'select run, workload, wal_size, data_size, wal_data_ratio,  ts_end - ts_start as wall_clock, x.* from wal_traffic
cross join lateral jsonb_to_record( settings ) as x( wal_level text, wal_log_hints text, wal_compression text, full_page_writes text );' 
testdb >! wal_traffic.csv
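
The same splitting of the jsonb field can be done on the client side. The following is a minimal Python sketch, not the command used to produce the file: the function names flatten and to_csv are made up, and the sample row reuses only one ratio and one settings value shown earlier in this post.

```python
import csv
import io
import json

# Columns extracted from the settings document, in the same order as the query above.
SETTING_COLUMNS = ["wal_level", "wal_log_hints", "wal_compression", "full_page_writes"]


def flatten(row: dict) -> dict:
    """Replace the 'settings' JSON document with one column per setting."""
    settings = json.loads(row["settings"])
    flat = {key: value for key, value in row.items() if key != "settings"}
    for column in SETTING_COLUMNS:
        flat[column] = settings.get(column)
    return flat


def to_csv(rows) -> str:
    """Render the flattened rows as CSV text with a header line."""
    flat_rows = [flatten(row) for row in rows]
    if not flat_rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(flat_rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(flat_rows)
    return buffer.getvalue()


SAMPLE = {
    "wal_data_ratio": "178.27",
    "settings": '{"wal_level": "minimal", "wal_log_hints": "off", '
                '"wal_compression": "on", "full_page_writes": "off"}',
}

print(to_csv([SAMPLE]))
```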

More Results

From the de-jsonb representation of the results, it is easier to get a glance at the WAL ratio by workload type:

testdb=> select workload, min( wal_data_ratio ), max( wal_data_ratio ), max( wal_data_ratio ) - min( wal_data_ratio ) as diff
from wal_traffic_results 
group by workload order by 4 asc;
-[ RECORD 1 ]-------------------------------------------------------------------------------
workload | 'BEGIN;                                                                          +
         | INSERT INTO wal_traffic_workload SELECT v, md5( v::text )::text || random()::text+
         | FROM generate_series( 1, 1000000 ) v;                                            +
         | COMMIT;                                                                          +
         |                                                                                  +
         | BEGIN;                                                                           +
         | INSERT INTO wal_traffic_workload                                                 +
         | SELECT v + v, t || '' - '' || t || random()::text                                +
         | FROM wal_traffic_workload                                                        +
         | WHERE v % 5 = 0;                                                                 +
         | COMMIT;'
min      | 125.55
max      | 125.60
diff     | 0.05
-[ RECORD 2 ]-------------------------------------------------------------------------------
workload | 'DO $wl$ DECLARE                                                                 +
         |   i int;                                                                         +
         | BEGIN                                                                            +
         |   FOR i IN 1 .. 1000000 LOOP                                                     +
         |     INSERT INTO wal_traffic_workload SELECT 1, md5( random()::text )::text;      +
         |   END LOOP;                                                                      +
         | END $wl$;'
min      | 125.76
max      | 125.82
diff     | 0.06
-[ RECORD 3 ]-------------------------------------------------------------------------------
workload | 'DO $wl$ DECLARE                                                                 +
         |   i int;                                                                         +
         | BEGIN                                                                            +
         |   FOR i IN 1 .. 10000000 LOOP                                                    +
         |     INSERT INTO wal_traffic_workload SELECT 1, md5( random()::text )::text;      +
         |   END LOOP;                                                                      +
         | END $wl$;'
min      | 125.76
max      | 125.92
diff     | 0.16
-[ RECORD 4 ]-------------------------------------------------------------------------------
workload | 'BEGIN;                                                                          +
         | INSERT INTO wal_traffic_workload SELECT v, md5( v::text )::text || random()::text+
         | FROM generate_series( 1, 100000 ) v;                                             +
         | COMMIT;                                                                          +
         |                                                                                  +
         | BEGIN;                                                                           +
         | INSERT INTO wal_traffic_workload                                                 +
         | SELECT v + v, t || '' - '' || t || random()::text                                +
         | FROM wal_traffic_workload                                                        +
         | WHERE v % 5 = 0;                                                                 +
         | COMMIT;'
min      | 125.49
max      | 125.73
diff     | 0.24
-[ RECORD 5 ]-------------------------------------------------------------------------------
workload | 'DO $wl$ DECLARE                                                                 +
         |   i int;                                                                         +
         | BEGIN                                                                            +
         |   FOR i IN 1 .. 100000 LOOP                                                      +
         |     INSERT INTO wal_traffic_workload SELECT 1, md5( random()::text )::text;      +
         |   END LOOP;                                                                      +
         | END $wl$;'
min      | 125.72
max      | 125.97
diff     | 0.25
-[ RECORD 6 ]-------------------------------------------------------------------------------
workload | 'BEGIN;                                                                          +
         | INSERT INTO wal_traffic_workload SELECT v, md5( v::text )::text || random()::text+
         | FROM generate_series( 1, 10000 ) v;                                              +
         | COMMIT;                                                                          +
         |                                                                                  +
         | BEGIN;                                                                           +
         | INSERT INTO wal_traffic_workload                                                 +
         | SELECT v + v, t || '' - '' || t || random()::text                                +
         | FROM wal_traffic_workload                                                        +
         | WHERE v % 5 = 0;                                                                 +
         | COMMIT;'
min      | 124.99
max      | 126.55
diff     | 1.56
-[ RECORD 7 ]-------------------------------------------------------------------------------
workload | 'DO $wl$ DECLARE                                                                 +
         |   i int;                                                                         +
         | BEGIN                                                                            +
         |   FOR i IN 1 .. 10000 LOOP                                                       +
         |     INSERT INTO wal_traffic_workload SELECT 1, md5( random()::text )::text;      +
         |   END LOOP;                                                                      +
         | END $wl$;'
min      | 125.14
max      | 127.47
diff     | 2.33
-[ RECORD 8 ]-------------------------------------------------------------------------------
workload | 'BEGIN;                                                                          +
         | INSERT INTO wal_traffic_workload SELECT v, md5( v::text )::text || random()::text+
         | FROM generate_series( 1, 10000000 ) v;                                           +
         | COMMIT;                                                                          +
         |                                                                                  +
         | BEGIN;                                                                           +
         | INSERT INTO wal_traffic_workload                                                 +
         | SELECT v + v, t || '' - '' || t || random()::text                                +
         | FROM wal_traffic_workload                                                        +
         | WHERE v % 5 = 0;                                                                 +
         | COMMIT;'
min      | 178.27
max      | 199.46
diff     | 21.19
-[ RECORD 9 ]-------------------------------------------------------------------------------
workload | 'BEGIN;                                                                          +
         | INSERT INTO wal_traffic_workload SELECT v, md5( v::text )::text || random()::text+
         | FROM generate_series( 1, 1000 ) v;                                               +
         | COMMIT;                                                                          +
         |                                                                                  +
         | BEGIN;                                                                           +
         | INSERT INTO wal_traffic_workload                                                 +
         | SELECT v + v, t || '' - '' || t || random()::text                                +
         | FROM wal_traffic_workload                                                        +
         | WHERE v % 5 = 0;                                                                 +
         | COMMIT;'
min      | 121.58
max      | 152.01
diff     | 30.43
-[ RECORD 10 ]------------------------------------------------------------------------------
workload | 'DO $wl$ DECLARE                                                                 +
         |   i int;                                                                         +
         | BEGIN                                                                            +
         |   FOR i IN 1 .. 1000 LOOP                                                        +
         |     INSERT INTO wal_traffic_workload SELECT 1, md5( random()::text )::text;      +
         |   END LOOP;                                                                      +
         | END $wl$;'
min      | 118.45
max      | 167.37
diff     | 48.92
-[ RECORD 11 ]------------------------------------------------------------------------------
workload | 'BEGIN;                                                                          +
         | INSERT INTO wal_traffic_workload SELECT v, md5( v::text )::text || random()::text+
         | FROM generate_series( 1, 100 ) v;                                                +
         | COMMIT;                                                                          +
         |                                                                                  +
         | BEGIN;                                                                           +
         | INSERT INTO wal_traffic_workload                                                 +
         | SELECT v + v, t || '' - '' || t || random()::text                                +
         | FROM wal_traffic_workload                                                        +
         | WHERE v % 5 = 0;                                                                 +
         | COMMIT;'
min      | 101.56
max      | 247.46
diff     | 145.90
-[ RECORD 12 ]------------------------------------------------------------------------------
workload | 'DO $wl$ DECLARE                                                                 +
         |   i int;                                                                         +
         | BEGIN                                                                            +
         |   FOR i IN 1 .. 100 LOOP                                                         +
         |     INSERT INTO wal_traffic_workload SELECT 1, md5( random()::text )::text;      +
         |   END LOOP;                                                                      +
         | END $wl$;'
min      | 124.02
max      | 289.65
diff     | 165.63
                            

Certain workloads (by type and size) do not produce any appreciable variation in the WAL produced, while, for example, the last workload, with a small number of tuples, produces a very wide range of WAL traffic.
We could also query to search for a trend in the ratio:

testdb=> select wal_data_ratio, wal_level, wal_log_hints, wal_compression, full_page_writes from wal_traffic_results where workload like '%FOR i IN 1 .. 100 LOOP%' order by 1 desc;
-[ RECORD 1 ]----|--------
wal_data_ratio   | 289.65
wal_level        | replica
wal_log_hints    | off
wal_compression  | off
full_page_writes | on

...
-[ RECORD 6 ]----|--------
wal_data_ratio   | 256.35
wal_level        | logical
wal_log_hints    | on
wal_compression  | off
full_page_writes | on
...
-[ RECORD 19 ]---|--------
wal_data_ratio   | 150.78
wal_level        | replica
wal_log_hints    | on
wal_compression  | on
full_page_writes | off
...
-[ RECORD 36 ]---|--------
wal_data_ratio   | 124.02
wal_level        | replica
wal_log_hints    | on
wal_compression  | on
full_page_writes | on

The above confirms how much wal_compression is going to reduce the WAL traffic.
And again, the wal_level is not going to influence the WAL size too much:

testdb=> select min( wal_data_ratio ), max( wal_data_ratio ), wal_level 
         from wal_traffic_results 
         where workload like '%FOR i IN 1 .. 100 LOOP%'  
         group by wal_level order by 1 desc, 2 desc;
-[ RECORD 1 ]------
min       | 146.00
max       | 289.16
wal_level | minimal
-[ RECORD 2 ]------
min       | 124.12
max       | 284.47
wal_level | logical
-[ RECORD 3 ]------
min       | 124.02
max       | 289.65
wal_level | replica
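
Since the results are also available as a CSV file, the same kind of grouping can be done outside the database. The sketch below is only an illustration: instead of reading wal_traffic.csv, it hard-codes the four rows shown above for the 100-iteration loop workload, and the function name ratio_range_by is made up. It groups the rows by one settings column and reports the minimum, maximum and spread of wal_data_ratio.

```python
from collections import defaultdict
from decimal import Decimal

# The four rows displayed above for the 100-iteration loop workload.
ROWS = [
    {"wal_data_ratio": "289.65", "wal_level": "replica", "wal_compression": "off"},
    {"wal_data_ratio": "256.35", "wal_level": "logical", "wal_compression": "off"},
    {"wal_data_ratio": "150.78", "wal_level": "replica", "wal_compression": "on"},
    {"wal_data_ratio": "124.02", "wal_level": "replica", "wal_compression": "on"},
]


def ratio_range_by(rows, column):
    """Return {value: (min, max, max - min)} of wal_data_ratio grouped by a column."""
    groups = defaultdict(list)
    for row in rows:
        # Decimal keeps the two-digit ratios exact, as numeric does in PostgreSQL.
        groups[row[column]].append(Decimal(row["wal_data_ratio"]))
    return {
        key: (min(values), max(values), max(values) - min(values))
        for key, values in groups.items()
    }


for value, (lowest, highest, spread) in ratio_range_by(ROWS, "wal_compression").items():
    print(value, lowest, highest, spread)
```

On these four rows, every ratio with wal_compression set to off is above 250, while every ratio with it set to on is close to or below 150.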

Conclusions

Even a small amount of real data can produce quite a large amount of WAL records, and this is good because those records contain all the information PostgreSQL needs to keep our data safe, which after all is our final goal.
WAL related settings can, of course, influence the amount of generated data and the idea behind this article is not to provide an exhaustive guide to tune WALs, rather to show how you can measure your WAL traffic depending on the workload you are facing.
This should then help you to decide the right way to tune your WALs.
If you find something wrong in the approach described above, or want to add to it or share your experience, please comment or contact me.

How much data goes into the WALs?

2021-07-13T00:00:00+00:00

What is the amount of traffic generated in the Write Ahead Logs?

How much data goes into the WALs?

PostgreSQL exploits the Write Ahead Logs (WALs) to make data changes persistent: whenever you COMMIT (implicitly or explicitly) some work, the data is stored in the WALs before it physically hits the table it belongs to.
There are several advantages in this approach, most notably performance and the ability to survive a crash.
And one beautiful thing about PostgreSQL is that it provides you with all the tools to follow, study and understand what is happening under the hood. With regard to the WALs, there are a few pg_wal_xxx functions that can be used to get a clue about what is happening in the WALs.
In this post I’m going to use mainly:

pg_current_wal_lsn() that provides the current offset within the WAL stream where the next thing will happen. Such offset in the WAL stream is called Log Sequence Number or LSN for short;
pg_walfile_name() that given a Log Sequence Number (LSN) provides you the name of the WAL file, in the pg_wal directory, that contains the WAL location.

It is worth spending a little time to explain what LSNs are.
PostgreSQL organizes the WALs into files of 16 MB each (you can change this setting, but assume you will not). Every time a WAL file is full, that is, it contains 16 MB of valid WAL data, PostgreSQL produces a new file (or recycles one that is no longer needed).
The database must know exactly when things happened during the history of transactions, and this means it must be able to point to a location into the WAL files to clearly identify a transaction, or a statement, or something else. This location is expressed as a Log Sequence Number, something that points the server to an offset within the WAL stream.
Therefore, when you execute an SQL statement, the database stores the result of the statement into the WALs at the position indicated by the current log sequence number, and the next statement will happen at a different log sequence number.
Log sequence numbers have the form of AA/BBxxxxxx where AA and BB can be used to identify the WAL file on disk (knowing the current timeline). In fact, usually the WAL file that contains a log sequence number is named as 000000<timeline>000000AA000000BB. As an example, if the LSN is 16/70D22618 the corresponding file on disk is 000000070000001600000070 (given the timeline number 7). This rule of thumb is not always true, since the LSN could be near the end of the WAL file, or even at the beginning of the new one, but you get the idea. The remaining part, represented by xxxxxx, is the offset within the WAL file to find the position of the LSN.
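
To make the rule of thumb concrete, here is a minimal Python sketch that derives the WAL file name from an LSN and a timeline. It assumes the default 16 MB segment size, and the helper names parse_lsn, wal_file_name and offset_in_segment are made up for this illustration (they are not part of PostgreSQL). Like the rule of thumb itself, it does not special-case an LSN sitting exactly on a segment boundary, so pg_walfile_name() remains the authoritative answer.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB WAL segments


def parse_lsn(lsn: str) -> int:
    """Turn an LSN such as '16/70D22618' into a byte position in the WAL stream."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def wal_file_name(lsn: str, timeline: int, segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Rule-of-thumb name of the WAL file holding the LSN (boundaries are not special-cased)."""
    segment_number = parse_lsn(lsn) // segment_size
    segments_per_id = 0x100000000 // segment_size  # 256 with 16 MB segments
    log_id, segment = divmod(segment_number, segments_per_id)
    return f"{timeline:08X}{log_id:08X}{segment:08X}"


def offset_in_segment(lsn: str, segment_size: int = WAL_SEGMENT_SIZE) -> int:
    """Byte offset of the LSN inside its WAL file (the 'xxxxxx' part)."""
    return parse_lsn(lsn) % segment_size


print(wal_file_name("16/70D22618", timeline=7))  # 000000070000001600000070
print(hex(offset_in_segment("16/70D22618")))     # 0xd22618
```
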
PostgreSQL has a dedicated data type, pg_lsn, to store information about a Log Sequence Number. You can apply operators to pg_lsn, for example to do a difference between two values, and PostgreSQL will show you the result as a numeric value.
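
The subtraction between two pg_lsn values can be mimicked outside the database too. The following sketch treats an LSN as a 64 bit byte position, where the part before the slash is the high 32 bits, and subtracts two of them. The helper names lsn_to_bytes and lsn_diff are hypothetical, and the two sample values are taken from the first example session later in this post.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Turn an LSN such as '16/70D22618' into a byte position in the WAL stream."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def lsn_diff(end: str, start: str) -> int:
    """Number of WAL bytes between two LSNs, like the pg_lsn minus operator."""
    return lsn_to_bytes(end) - lsn_to_bytes(start)


written = lsn_diff("16/752611C8", "16/70D22618")
print(written)                                # 72608688
print(f"{written / (1024 * 1024):.1f} MB")    # 69.2 MB
```
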
Now that it is clear what an LSN is and how it relates to the WAL files on disk, let’s see how it is possible to get the amount of data written in the WALs with regard to the amount of data written to a table. In the following examples I’m using a PostgreSQL 13.3 server with only me running queries, so the numbers are related only to my experiments.

An example with a normal table

Let’s create a very simple table:

testdb=> create table logged_table( t text );
CREATE TABLE

Now let’s do a bulk insert, and check the current Log Sequence Number before and after the insertion of one million tuples:

testdb=> select pg_walfile_name( pg_current_wal_lsn() ), pg_current_wal_lsn(), pg_size_pretty( pg_relation_size( 'logged_table' ) );
     pg_walfile_name      | pg_current_wal_lsn | pg_size_pretty 
--------------------------|--------------------|----------------
 000000070000001600000070 | 16/70D22618        | 0 bytes
(1 row)

testdb=> insert into logged_table( t ) 
         select 'logged ' || v from generate_series(1, 1000000 ) v;
INSERT 0 1000000

testdb=> select pg_walfile_name( pg_current_wal_lsn() ), 
                pg_current_wal_lsn(), 
                pg_size_pretty( pg_relation_size( 'logged_table' ) );
     pg_walfile_name      | pg_current_wal_lsn | pg_size_pretty 
--------------------------|--------------------|----------------
 000000070000001600000075 | 16/752611C8        | 42 MB
(1 row)

testdb=> select pg_size_pretty( '16/752611C8'::pg_lsn - '16/70D22618'::pg_lsn );
 pg_size_pretty 
----------------
 69 MB
(1 row)

As you can see, generating 42 MB of real table data implied the generation of 69 MB of WAL data. Why is there more data in the WALs than in the actual table? Because the WAL records must keep links to one another, checksums and a lot of other data that PostgreSQL uses for replication and crash recovery.

Using an unlogged table

Let’s now start over, turning the table into an UNLOGGED one, so that it is not going to hit the WALs.

testdb=> truncate table logged_table ;
TRUNCATE TABLE
testdb=> alter table logged_table set unlogged;
ALTER TABLE
testdb=> alter table logged_table rename to unlogged_table;
ALTER TABLE

Repeat the same insertion of one million tuples and see what happens to the WALs:

testdb=> select pg_walfile_name( pg_current_wal_lsn() ), pg_current_wal_lsn(), pg_size_pretty( pg_relation_size( 'unlogged_table' ) );
     pg_walfile_name      | pg_current_wal_lsn | pg_size_pretty 
--------------------------|--------------------|----------------
 000000070000001600000075 | 16/75285AD0        | 0 bytes
(1 row)

testdb=> insert into unlogged_table( t ) 
         select 'logged ' || v from generate_series(1, 1000000 ) v;
INSERT 0 1000000

testdb=> select pg_walfile_name( pg_current_wal_lsn() ), 
                pg_current_wal_lsn(), 
                pg_size_pretty( pg_relation_size( 'unlogged_table' ) );
     pg_walfile_name      | pg_current_wal_lsn | pg_size_pretty 
--------------------------|--------------------|----------------
 000000070000001600000075 | 16/75285B30        | 42 MB
(1 row)

testdb=> select pg_size_pretty( '16/75285B30'::pg_lsn - '16/75285AD0'::pg_lsn );
 pg_size_pretty 
----------------
 96 bytes

As you can see, the table has grown by the same size as in the previous example, that is 42 MB of real data. This time, however, the WALs have not grown, except for a very small amount of 96 bytes of housekeeping data.

Going back to a logged table

What happens if the table comes back as LOGGED?

testdb=> alter table unlogged_table rename to logged_table;
ALTER TABLE
testdb=> select pg_walfile_name( pg_current_wal_lsn() ), 
                pg_current_wal_lsn(), 
                pg_size_pretty( pg_relation_size( 'logged_table' ) );
     pg_walfile_name      | pg_current_wal_lsn | pg_size_pretty 
--------------------------|--------------------|----------------
 000000070000001600000075 | 16/75295120        | 42 MB
(1 row)

testdb=> alter table logged_table set logged;
ALTER TABLE
testdb=> select pg_walfile_name( pg_current_wal_lsn() ), 
                pg_current_wal_lsn(), 
                pg_size_pretty( pg_relation_size( 'logged_table' ) );
     pg_walfile_name      | pg_current_wal_lsn | pg_size_pretty 
--------------------------|--------------------|----------------
 000000070000001600000079 | 16/7978EFF0        | 42 MB
(1 row)

testdb=> select pg_size_pretty( '16/7978EFF0'::pg_lsn - '16/75295120'::pg_lsn );
 pg_size_pretty 
----------------
 69 MB
(1 row)

As you can see, setting the table from UNLOGGED to LOGGED generated pretty much the same amount of WAL traffic (i.e., 69 MB) as in the original insert transaction.

Add some fields

Let’s add a couple more fields to the table, to see what happens with regard to the traffic:

testdb=> alter table logged_table add column pk serial primary key;
ALTER TABLE
testdb=> alter table logged_table add column price numeric( 5, 2 ) default 0;
ALTER TABLE
testdb=> truncate logged_table ;
TRUNCATE TABLE

and now re-run our little benchmark (note that the added fields have default values):

testdb=> select pg_walfile_name( pg_current_wal_lsn() ), 
                pg_current_wal_lsn(), 
                pg_size_pretty( pg_relation_size( 'logged_table' ) );
     pg_walfile_name      | pg_current_wal_lsn | pg_size_pretty 
--------------------------|--------------------|----------------
 00000007000000160000007D | 16/7DD16CA8        | 0 bytes
(1 row)

testdb=> insert into logged_table( t ) 
         select 'logged ' || v from generate_series(1, 1000000 ) v;
INSERT 0 1000000
testdb=> select pg_walfile_name( pg_current_wal_lsn() ), 
                pg_current_wal_lsn(), 
                pg_size_pretty( pg_relation_size( 'logged_table' ) ),
                pg_size_pretty( pg_relation_size( 'logged_table_pkey' ) );
     pg_walfile_name      | pg_current_wal_lsn | pg_size_pretty | pg_size_pretty
--------------------------|--------------------|----------------|----------------
 000000070000001600000086 | 16/86BD7110        | 50 MB          | 43 MB

(1 row)

testdb=> select pg_size_pretty( '16/86BD7110'::pg_lsn - '16/7DD16CA8'::pg_lsn );
 pg_size_pretty 
----------------
 143 MB
(1 row)

This time, as you can see, the table has grown by about 20% with respect to its previous size, that is to 50 MB of real data, but there is also the index (on the primary key column) to consider, and that is 43 MB, for an overall total of 93 MB of real data. However, the WALs roughly doubled their previous size (from 69 MB to 143 MB), and are still larger than the real data due to the structure of the records.
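
The proportions quoted above can be double checked with a few lines of arithmetic. This is just a sketch based on the rounded sizes reported by pg_size_pretty() in the sessions above, so the results are approximate, and the helper name percent_growth is made up.

```python
def percent_growth(old: float, new: float) -> float:
    """Growth of a value with respect to its previous size, in percent."""
    return (new - old) / old * 100.0


# Rounded sizes (in MB) as printed in the sessions above.
TABLE_BEFORE, TABLE_AFTER, INDEX_AFTER = 42, 50, 43
WAL_BEFORE, WAL_AFTER = 69, 143

real_data = TABLE_AFTER + INDEX_AFTER
print(f"table growth: {percent_growth(TABLE_BEFORE, TABLE_AFTER):.0f}%")
print(f"real data (table + index): {real_data} MB")
print(f"WAL now vs WAL before: {WAL_AFTER / WAL_BEFORE:.2f}x")
print(f"WAL vs real data: {WAL_AFTER / real_data:.2f}x")
```

With these numbers the WAL is a bit more than twice the previous size, and about one and a half times the real data (table plus index).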

Doing a rollback

What happens if the transaction does a rollback?
WALs are managed as an append-only storage, so there will be WAL traffic anyway. It is quite easy to experiment with this:

testdb=> select pg_walfile_name( pg_current_wal_lsn() ),                        
                pg_current_wal_lsn(), 
                pg_size_pretty( pg_relation_size( 'logged_table' ) );
     pg_walfile_name      | pg_current_wal_lsn | pg_size_pretty 
--------------------------|--------------------|----------------
 00000007000000160000009E | 16/9EC3A680        | 0 bytes
(1 row)

testdb=> begin;
BEGIN
testdb=*> select pg_walfile_name( pg_current_wal_lsn() ), 
                pg_current_wal_lsn(), 
                pg_size_pretty( pg_relation_size( 'logged_table' ) );
     pg_walfile_name      | pg_current_wal_lsn | pg_size_pretty 
--------------------------|--------------------|----------------
 00000007000000160000009E | 16/9EC3A680        | 0 bytes
(1 row)

testdb=*> insert into logged_table( t )                                          
         select 'logged ' || v from generate_series(1, 1000000 ) v;
INSERT 0 1000000
testdb=*> select pg_walfile_name( pg_current_wal_lsn() ), 
                pg_current_wal_lsn(),                              
                pg_size_pretty( pg_relation_size( 'logged_table' ) );
     pg_walfile_name      | pg_current_wal_lsn | pg_size_pretty 
--------------------------|--------------------|----------------
 0000000700000016000000A7 | 16/A7AFA000        | 50 MB
(1 row)

testdb=*> rollback;
ROLLBACK
testdb=> select pg_walfile_name( pg_current_wal_lsn() ), 
                pg_current_wal_lsn(), 
                pg_size_pretty( pg_relation_size( 'logged_table' ) );
     pg_walfile_name      | pg_current_wal_lsn | pg_size_pretty 
--------------------------|--------------------|----------------
 0000000700000016000000A7 | 16/A7AFAB50        | 50 MB
(1 row)

testdb=> select pg_walfile_name( pg_current_wal_lsn() ), 
                pg_current_wal_lsn(), 
                pg_size_pretty( pg_relation_size( 'logged_table' ) );
     pg_walfile_name      | pg_current_wal_lsn | pg_size_pretty 
--------------------------|--------------------|----------------
 000000070000001600000092 | 16/92D11AD8        | 0 bytes
(1 riga)

testdb=> begin;
BEGIN
testdb=*> select pg_walfile_name( pg_current_wal_lsn() ),                                          
                pg_current_wal_lsn(),                              
                pg_size_pretty( pg_relation_size( 'logged_table' ) );
     pg_walfile_name      | pg_current_wal_lsn | pg_size_pretty 
--------------------------|--------------------|----------------
 000000070000001600000092 | 16/92D11AD8        | 0 bytes
(1 riga)

testdb=*> insert into logged_table( t )                                          
         select 'logged ' || v from generate_series(1, 1000000 ) v;
INSERT 0 1000000
testdb=*> select pg_walfile_name( pg_current_wal_lsn() ), 
                pg_current_wal_lsn(),                              
                pg_size_pretty( pg_relation_size( 'logged_table' ) );
     pg_walfile_name      | pg_current_wal_lsn | pg_size_pretty 
--------------------------|--------------------|----------------
 00000007000000160000009B | 16/9BBD0000        | 50 MB
(1 riga)

testdb=*> select pg_size_pretty( '16/9BBD0000'::pg_lsn - '16/92D11AD8'::pg_lsn );
 pg_size_pretty 
----------------
 143 MB
(1 riga)

testdb=*> rollback;
ROLLBACK
testdb=> select pg_walfile_name( pg_current_wal_lsn() ), 
                pg_current_wal_lsn(), 
                pg_size_pretty( pg_relation_size( 'logged_table' ) );
     pg_walfile_name      | pg_current_wal_lsn | pg_size_pretty 
--------------------------|--------------------|----------------
 00000007000000160000009B | 16/9BBD1F40        | 50 MB
(1 riga)

testdb=> select pg_size_pretty( '16/9BBD1F40'::pg_lsn - '16/92D11AD8'::pg_lsn );
 pg_size_pretty 
----------------
 143 MB
(1 riga)

Before the transaction starts, the current LSN is 16/92D11AD8, and it remains unchanged until the transaction actually does some work. Right before the ROLLBACK the LSN is 16/9BBD0000, and immediately after the ROLLBACK it has moved forward to 16/9BBD1F40. Therefore, simply issuing a ROLLBACK caused the WAL to grow by about 8 kB (8000 bytes).
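The same subtraction that pg_lsn performs can be reproduced outside the database. What follows is only a minimal sketch: lsn_to_bytes and lsn_diff are hypothetical helper names (they are not PostgreSQL tools), and I assume a shell with 64 bit arithmetic.

```shell
# Convert a textual LSN (HIGH/LOW, both hexadecimal) into an absolute byte position.
# lsn_to_bytes is a hypothetical helper, not something shipped with PostgreSQL.
lsn_to_bytes() {
    high=${1%%/*}   # part before the slash
    low=${1##*/}    # part after the slash
    echo $(( (0x$high << 32) + 0x$low ))
}

# Distance in bytes between two LSNs: lsn_diff START END
lsn_diff() {
    echo $(( $(lsn_to_bytes "$2") - $(lsn_to_bytes "$1") ))
}

lsn_diff 16/9BBD0000 16/9BBD1F40   # WAL written by the ROLLBACK alone
lsn_diff 16/92D11AD8 16/9BBD1F40   # WAL written by the whole transaction
```

The first call prints 8000, the second one prints 149685352, that is the 143 MB reported by pg_size_pretty() above.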

`pg_waldump`

The pg_waldump command provides information about the WAL contents.
It needs the WAL segments to inspect: as trivial as it may sound, you will not be able to observe your transaction if the database has executed a CHECKPOINT and has recycled the WAL segments (but you can archive them if you want to inspect old transactions).

Let’s replay our rolled back transaction to get the actual LSNs:

testdb=> truncate logged_table ;
TRUNCATE TABLE
testdb=> begin;
BEGIN
testdb=*> select pg_walfile_name( pg_current_wal_lsn() ), 
                pg_current_wal_lsn(), 
                pg_size_pretty( pg_relation_size( 'logged_table' ) ),
                pg_size_pretty( pg_relation_size( 'logged_table_pkey' ) );
     pg_walfile_name      | pg_current_wal_lsn | pg_size_pretty | pg_size_pretty 
--------------------------|--------------------|----------------|----------------
 0000000700000016000000C3 | 16/C3E18BB0        | 0 bytes        | 8192 bytes
(1 riga)

testdb=*> insert into logged_table( t ) values( 'a single record' );
INSERT 0 1
testdb=*> select pg_walfile_name( pg_current_wal_lsn() ),           
                pg_current_wal_lsn(), 
                pg_size_pretty( pg_relation_size( 'logged_table' ) ),
                pg_size_pretty( pg_relation_size( 'logged_table_pkey' ) );
     pg_walfile_name      | pg_current_wal_lsn | pg_size_pretty | pg_size_pretty 
--------------------------|--------------------|----------------|----------------
 0000000700000016000000C3 | 16/C3E18BB0        | 8192 bytes     | 16 kB
(1 riga)

testdb=*> rollback;
ROLLBACK
testdb=> select pg_walfile_name( pg_current_wal_lsn() ), 
                pg_current_wal_lsn(), 
                pg_size_pretty( pg_relation_size( 'logged_table' ) ),
                pg_size_pretty( pg_relation_size( 'logged_table_pkey' ) );
     pg_walfile_name      | pg_current_wal_lsn | pg_size_pretty | pg_size_pretty 
--------------------------|--------------------|----------------|----------------
 0000000700000016000000C3 | 16/C3E18D48        | 8192 bytes     | 16 kB
(1 riga)

testdb=> select pg_size_pretty( '16/C3E18D48'::pg_lsn - '16/C3E18BB0'::pg_lsn );
 pg_size_pretty 
----------------
 408 bytes
(1 riga)

Why insert a single tuple this time? Because pg_waldump produces a very verbose output and I don’t want to mess with a ton of INSERTs.
The above generated a very small amount of WAL traffic, 408 bytes exactly. Let’s inspect what is in the WALs by means of pg_waldump:

% sudo -u postgres /usr/pgsql-13/bin/pg_waldump -p $PGDATA/pg_wal  -s 16/C3E18BB0 -e 16/C3E18D48 -t 7 

rmgr: Heap        tx:    3562600, lsn: 16/C3E18BB0, prev 16/C3E18B50, desc: INSERT+INIT off 1 flags 0x00, blkref #0: rel 1663/89735/89935 blk 0
rmgr: Btree       tx:    3562600, lsn: 16/C3E18C00, prev 16/C3E18BB0, desc: NEWROOT lev 0, blkref #0: rel 1663/89735/89937 blk 1, blkref #2: rel 1663/89735/89937 blk 0
rmgr: Btree       tx:    3562600, lsn: 16/C3E18C68, prev 16/C3E18C00, desc: INSERT_LEAF off 1, blkref #0: rel 1663/89735/89937 blk 1
rmgr: Transaction tx:    3562600, lsn: 16/C3E18CA8, prev 16/C3E18C68, desc: ABORT 2021-07-13 04:59:03.235599 EDT

I’ve removed part of the information to better fit the screen size.
The first entry at the top is the execution of the INSERT statement, followed by two entries that create the values in the index, and last there is the ABORT, that is the ROLLBACK statement.
As you can see, every record clearly indicates its own LSN by means of the lsn field, as well as a pointer to the previous record (i.e., the previous LSN). This allows PostgreSQL to read the WAL stream from the end and go back in history to find the exact boundaries of a piece of work.
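To make the lsn/prev chain more tangible, here is a small sketch that walks the four records shown above and prints the distance between each record and the next one. The records variable is a hand made reduction of the pg_waldump output to "rmgr lsn prev" triples, and walk_records and lsn_to_bytes are hypothetical helper names. I assume that nothing else was written to the WAL between the ABORT record and the LSN read after the ROLLBACK.

```shell
# The four records shown above, reduced to "rmgr lsn prev" triples.
records='Heap 16/C3E18BB0 16/C3E18B50
Btree 16/C3E18C00 16/C3E18BB0
Btree 16/C3E18C68 16/C3E18C00
Transaction 16/C3E18CA8 16/C3E18C68'
end_lsn=16/C3E18D48   # LSN observed right after the ROLLBACK

# Byte position of an LSN written as HIGH/LOW in hexadecimal.
lsn_to_bytes() {
    echo $(( (0x${1%%/*} << 32) + 0x${1##*/} ))
}

# For every record print the distance to the next record
# (or to end_lsn for the last one), checking the prev pointers on the way.
walk_records() {
    previous_lsn=""
    previous_rmgr=""
    while read -r rmgr lsn prev; do
        if [ -n "$previous_lsn" ]; then
            # each record must point back to the one before it
            [ "$prev" = "$previous_lsn" ] || { echo "broken chain at $lsn"; return 1; }
            echo "$previous_rmgr $(( $(lsn_to_bytes "$lsn") - $(lsn_to_bytes "$previous_lsn") ))"
        fi
        previous_lsn=$lsn
        previous_rmgr=$rmgr
    done <<EOF
$records
EOF
    echo "$previous_rmgr $(( $(lsn_to_bytes "$end_lsn") - $(lsn_to_bytes "$previous_lsn") ))"
}

walk_records
```

The four distances are 80, 104, 64 and 160 bytes, and they sum up to the 408 bytes computed before.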

Using ora2pg to do a kind of backup

2021-06-11T00:00:00+00:00

How I implemented a kind of Oracle-to-PostgreSQL backup.

Using ora2pg to do a kind of backup

Disclaimer: ora2pg is an amazing tool, but it is not supposed to be used as a backup tool!
In this article I’m going to show you how I decided to implement a kind of Oracle-to-PostgreSQL backup by means of ora2pg.

It all started as a simple need: migrate an Oracle database to PostgreSQL to do some experiments.
Therefore I fired up an ora2pg project, and started from there in order to do the migration.
End of the story.
But then I was asked to migrate the same database again, because in the meantime something had changed.
And then again, and again.
I’m not saying I was asked to keep the database synchronized, but to occasionally load an updated set of data (and structures) from Oracle to PostgreSQL.
As lazy as I am, after a couple of requests I produced a simple shell script to automate the job, at least the part about running ora2pg. Yes, this could be less trivial than you think, since ora2pg relies on the Oracle Instant Client being installed (with all the environment set), and on Perl being ready with DBD::Oracle, DBI and other stuff in the right place. And this is a little complicated on my machines because I tend to experiment, so I have a lot of different stuff installed, and I have to fire up the right Perl, with the right modules, and the right environment (I do use perlbrew, in case you are wondering). In other words, there was some setup work necessary before I could run ora2pg, and that was a perfect candidate for a real shell script.
Then, the number of databases to do this work on became two, and this was a call for a parametric script…you get the point!
Last but not least, I was not sure about when the migration would happen, nor about when I would be asked to load a new bunch of stuff into PostgreSQL, and since my memory is lazier than me, I do not always remember all the steps required to load the output of ora2pg into our beloved database.
And therefore I decided to write a simple shell script to allow me to:

extract data from a customizable Oracle database, assuming ora2pg project is configured;
place the data and structures in a well defined space on my storage;
create a compact and clear psql script to load the extraction into PostgreSQL (yes, I know ora2pg can do this automagically with a PostgreSQL connection, but I have to do it offline).

Let’s start from how I add new databases to my script:

export ORACLE_HOME=/opt/oracle/instantclient_18_3
source ~/perl5/perlbrew/etc/bashrc
DATE_DIR=$(date '+%Y-%m-%d')
BACKUP_ROOT=/backup
ORACLE_PG_TEMPLATE="my_oracle_template"

do_ora2pg ora-srv ORADB1 /backup/ora2pg/ORADB1
do_ora2pg ora-srv ORADB2  /backup/ora2pg/ORADB2

The initial part sets up Perl and the Oracle Instant Client.
Each do_ora2pg line defines a single extraction; the arguments to the do_ora2pg shell function are:

Oracle host name;
Oracle schema to which I need to connect;
path to the ora2pg project.

What does the do_ora2pg shell function do?
Here it is:

do_ora2pg()
{
    local SERVER_NAME=$1
    local ORACLE_SCHEMA=$2
    local ORACLE_PROJECT_FOLDER=$3

    local BACKUP_DIR="${BACKUP_ROOT}/${SERVER_NAME}/${DATE_DIR}/ora2pg/${ORACLE_SCHEMA}"
    local PG_DATABASE=$( echo $ORACLE_SCHEMA | tr '[:upper:]' '[:lower:]' )

    if [ ! -d "$BACKUP_DIR" ]; then
        mkdir -p "$BACKUP_DIR"
    else
        rm -f "$BACKUP_DIR"/*.sql
    fi

    echo -e "\n{ $ORACLE_SCHEMA }\n\t=> PostgreSQL Dump in [$BACKUP_DIR]\n"

    cd "$BACKUP_DIR" || return 1


    TYPES="TABLE VIEW MVIEW INSERT SEQUENCE FUNCTION PROCEDURE TRIGGER"
    counter=1

    cat <<EOF > all.sql
-- Automatic PostgreSQL reload from Oracle
-- $ORACLE_SCHEMA
-- $BACKUP_DIR

\set ON_ERROR_STOP 1
\set QUIET         1

\echo Reload of Oracle schema $PG_DATABASE

DROP DATABASE IF EXISTS $PG_DATABASE;
CREATE DATABASE $PG_DATABASE WITH TEMPLATE $ORACLE_PG_TEMPLATE;
\c $PG_DATABASE


CREATE EXTENSION IF NOT EXISTS orafce;


\echo Connected to $PG_DATABASE
\echo Starting the loading batch
\echo

EOF

    for t in $TYPES
    do
        output=$(printf "%02d" $counter)-$t.sql
        echo "$t => $output "
        echo -e "\n\\\echo Batch to load : $output ..." >> all.sql
        ora2pg -c "${ORACLE_PROJECT_FOLDER}/config/ora2pg.conf" -t "$t" -o "$output" >> ora2pg.log 2>&1
        if [ $? -eq 0 ]; then
            echo "OK"
            echo -e "\\\i $output" >> all.sql
        else
            echo "KO"
            echo  -e "\\\echo NOT LOADING!" >> all.sql
        fi

        counter=$(( counter + 1 ))
    done

    
}

The function first creates a BACKUP_DIR that is named after a well-defined root, the server name, and the date the backup is taken on (I assume I do no more than one per day). The idea is that the backup directory will result in something like /backup/ora-srv/2021-06-11/ora2pg/ORADB1. After a quick check about the existence of the backup directory, the script creates a file named all.sql in that directory, placing some psql directives into it.
Then the script executes ora2pg for the object types I care about, producing a different numbered file name for every invocation, for example 01-TABLE.sql for table structures (schema). If the dump of an object type succeeds, the \i inclusion of that file is placed into all.sql, otherwise an alert is inserted.
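The naming rules can be tried in isolation, without Oracle or ora2pg around. This is only a sketch of the rules used by do_ora2pg: backup_dir_for, pg_database_for and numbered_files are hypothetical helper names that I introduce here just for illustration.

```shell
# Same naming rules used by do_ora2pg, isolated in small helper functions.
# All the function names are hypothetical and exist only in this sketch.
backup_dir_for() {
    # $1 = backup root, $2 = server name, $3 = date (YYYY-MM-DD), $4 = Oracle schema
    echo "$1/$2/$3/ora2pg/$4"
}

pg_database_for() {
    # PostgreSQL database name: the Oracle schema in lowercase
    echo "$1" | tr '[:upper:]' '[:lower:]'
}

numbered_files() {
    # one two-digit, progressively numbered file name per object type
    counter=1
    for t in "$@"; do
        printf '%02d-%s.sql\n' "$counter" "$t"
        counter=$(( counter + 1 ))
    done
}

backup_dir_for /backup ora-srv 2021-06-11 ORADB1
pg_database_for ORADB1
numbered_files TABLE VIEW MVIEW INSERT
```

The numeric prefix keeps the files sorted in the same order they have to be loaded, so that tables come before the data and the data before the triggers.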
The resulting all.sql is a file like the following:

-- Automatic PostgreSQL reload from Oracle
-- ORADB1
-- /backup/DATI/ora-srv/2021-06-10/ora2pg/ORADB1

\set ON_ERROR_STOP 1
\set QUIET         1

\echo Reload of Oracle schema db1

DROP DATABASE IF EXISTS db1;
CREATE DATABASE db1 WITH TEMPLATE my_oracle_template;
\c db1


CREATE EXTENSION IF NOT EXISTS orafce;


\echo Connected to db1
\echo Starting the loading batch
\echo


\echo Batch to load : 01-TABLE.sql ...
\i 01-TABLE.sql

\echo Batch to load : 02-VIEW.sql ...
\i 02-VIEW.sql

\echo Batch to load : 03-MVIEW.sql ...
\i 03-MVIEW.sql

\echo Batch to load : 04-INSERT.sql ...
\i 04-INSERT.sql

\echo Batch to load : 05-SEQUENCE.sql ...
\i 05-SEQUENCE.sql

\echo Batch to load : 06-FUNCTION.sql ...
\i 06-FUNCTION.sql

\echo Batch to load : 07-PROCEDURE.sql ...
\i 07-PROCEDURE.sql

\echo Batch to load : 08-TRIGGER.sql ...
\i 08-TRIGGER.sql

Therefore, the only thing I have to do when I want to migrate the Oracle content into PostgreSQL, is to launch a command like:

% psql -U luca template1 < all.sql

and wait. This is easy enough for me to remember even if I have not slept well!
I’ve experimented with this for a few weeks now, and it is really useful for my use case.
Please note that I create the extension orafce in the reloaded database, because we use some functions that can be dumped and reloaded correctly thanks to this extension. For that reason, the database on the PostgreSQL side is created by means of a specific template that has the extension already installed.

Conclusions

ora2pg is an amazing tool, that can be used and abused in different ways including doing backups!
I’m sure there are smarter ways to achieve my same aim, and I will report back if I learn about them, so please let me know if you have suggestions!

Template Databases

2021-06-08T00:00:00+00:00

PostgreSQL relies on the concept of template databases to create a new one.

Template Databases

PostgreSQL relies on the concept of template as a way to create a new database. The idea is similar to /etc/skel on Unix operating systems: whenever you create a new user, their home directory is cloned from /etc/skel. In PostgreSQL it works the same way: whenever you create a new database, it is cloned from a template one.

PostgreSQL ships with two template databases, namely template1 and template0.
template1 is the default template, meaning that when you execute a CREATE DATABASE the system will clone that database as the new one. In other words:

CREATE DATABASE foo;

is the same as

CREATE DATABASE foo WITH TEMPLATE template1;

One advantage of this technique is that whatever object you put into template1, you will find in the new database(s). This can be handy when you have to deal with multiple databases with similar or identical objects, but it can be a nightmare if you screw up your template database.
Then there is template0, which is used as a backup for template1 (in case you screw up) or as a special template database for handling particular situations like a different encoding.

Working with different templates

You can create your own template database, that you can then use as a base to create other databases:

template1=# CREATE DATABASE my_template WITH
           IS_TEMPLATE = true;
CREATE DATABASE

template1=# CREATE DATABASE a_new_database
            WITH TEMPLATE my_template;
CREATE DATABASE

Having templates is handy, however it is not mandatory to use a template database to build a new one. Change the previous template so that it is no longer a template database, and then build another database from it:

template1=# ALTER DATABASE my_template
            WITH IS_TEMPLATE = false;
ALTER DATABASE

template1=# CREATE DATABASE a_new_database_from_no_template
            WITH TEMPLATE my_template;
CREATE DATABASE

As you can see, you can use a normal (i.e., not template) database to build a new database too!
This is possible only if done by a superuser or by the owner of the source database!

template1=> CREATE DATABASE db_from_user;
CREATE DATABASE

template1=> CREATE DATABASE db_from_user_and_template
            WITH TEMPLATE my_template;
ERROR:  permission denied to copy database "my_template"

As you can see, a normal user can create a new database using a template database, but not using a non-template database (unless that user owns it).
Templates are exploitable by both normal and super users, but only superusers, or the owner of the source database, can create a new database from a database that is not marked as a template.

Connections while creating a database

While CREATE DATABASE is running, there must be no other connections to the source database, no matter whether it is a template or a normal database. The reason is that, in order to clone the database, there must be no activity on it.

template1=> CREATE DATABASE db_from_user_while_template1_in_use;
ERROR:  source database "template1" is being accessed by other users
DETAIL:  There is 1 other session using the database.

Here it is: the message states clearly that there is some kind of activity on template1 and therefore it is not safe to clone such database.
The same happens with a non-template database:

template1=# CREATE DATABASE db_from_user_while_my_template_in_use
            WITH TEMPLATE my_template;
ERROR:  source database "my_template" is being accessed by other users
DETAIL:  There is 1 other session using the database.

It is interesting to note that it does not matter what kind of activity is ongoing in the database used as a template: a single connection (even an idle one) suffices to prevent CREATE DATABASE from continuing.
On the other hand, the system prevents any incoming connection from being established against the template database until CREATE DATABASE has finished and hence releases the database.
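Before cloning, it can be handy to know whether someone else is connected to the source database. The following is a minimal sketch that only builds the query against pg_stat_activity: other_sessions_sql is a hypothetical helper name, and the database name is interpolated as is, so I assume it contains no quotes.

```shell
# Print the SQL that counts the sessions connected to a given database,
# excluding the session that runs the query itself.
# other_sessions_sql is a hypothetical helper name used only in this sketch.
other_sessions_sql() {
    printf "SELECT count(*) FROM pg_stat_activity WHERE datname = '%s' AND pid <> pg_backend_pid();\n" "$1"
}

other_sessions_sql my_template

# Possible usage, assuming a running cluster reachable with the default settings:
#   psql -At -c "$(other_sessions_sql my_template)" template1
```

A result greater than zero suggests that CREATE DATABASE would fail with the "is being accessed by other users" error shown above.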

Conclusions

Template databases are used as a skeleton to be cloned when a new database is going to be created.
The cluster can survive even without template databases, but not having the default one(s) makes CREATE DATABASE less comfortable to use. You can build your own templates, and this is recommended to avoid tainting the default one(s), but you will need to specify your template name in every CREATE DATABASE.
Last, the system will not allow you to use a database as a template if there are other connections to it (besides your own), because cloning would be unsafe.

pgbackrest lands on FreeBSD!

2021-06-03T00:00:00+00:00

pgbackrest has been inserted into the FreeBSD ports!

pgbackrest lands on FreeBSD!

At last it happened: pgbackrest, my favourite backup solution for PostgreSQL, is now available in the ports tree of FreeBSD, my favourite operating system!

Thanks to the efforts of people involved in this issue it is now possible to get pgbackrest installed easily (or in a simpler way) on FreeBSD!

PostgreSQL Builtin Trigger Function to Speed Up Updates

2021-06-03T00:00:00+00:00

Did you know PostgreSQL ships with a pre-built trigger function that can speed up UPDATES?

PostgreSQL Builtin Trigger Function to Speed Up Updates

PostgreSQL ships with an internal trigger function, named suppress_redundant_updates_trigger, that can be used to avoid idempotent updates on a table.
The online documentation explains very well how to use it, including the fact that the trigger should fire last in a trigger chain, and so the trigger name should be the last one in alphabetical order.
But is it worth using such a function?
Let’s find out with a very trivial example on the well-known pgbench database. First of all, let’s consider the initial setup:

pgbench=> SELECT count(*), 
          pg_size_pretty( pg_relation_size( 'pgbench_accounts' ) ) 
          FROM pgbench_accounts;
  count   | pg_size_pretty 
----------|----------------
 10000000 | 1281 MB
(1 row)

Now, let’s execute an idempotent UPDATE, that is, something that does not change anything, and monitor the timing:

pgbench=> \timing
Timing is on.
pgbench=> UPDATE pgbench_accounts SET filler = filler;
UPDATE 10000000
Time: 307939,763 ms (05:07,940)

pgbench=> SELECT pg_size_pretty( pg_relation_size( 'pgbench_accounts' ) );
 pg_size_pretty 
----------------
 2561 MB
(1 row)

Time: 180,732 ms

Note how the table has doubled its size: this is because of bloating caused by every row being substituted by an exact copy of it.
Now, let’s create the trigger using the suppress_redundant_updates_trigger function, and let’s run the same update again, but after a server restart to clean up also the memory.

pgbench=> CREATE TRIGGER tr_avoid_idempotent_updates
BEFORE UPDATE ON pgbench_accounts
FOR EACH ROW
EXECUTE FUNCTION suppress_redundant_updates_trigger();

-- restart the server

pgbench=> \timing
Timing is on.
pgbench=> UPDATE pgbench_accounts SET filler = filler;
UPDATE 0
Time: 287588,607 ms (04:47,589)

pgbench=> SELECT pg_size_pretty( pg_relation_size( 'pgbench_accounts' ) );
 pg_size_pretty 
----------------
 2561 MB
(1 row)

The total gain was about 20 secs, that is a speed up of roughly 7%, which is not much at all.
However, note how the UPDATE reports that zero tuples have been touched: while the speed up is not really exciting, the table size stays at 2561 MB, that is, the UPDATE did not add any further bloat.

After a full vacuum, the speed up is a lot higher, but this can be a side effect of having some pages already in memory:

pgbench=> VACUUM FULL pgbench_accounts ;
VACUUM
Time: 222455,150 ms (03:42,455)
pgbench=> UPDATE pgbench_accounts SET filler = filler;
UPDATE 0
Time: 198104,981 ms (03:18,105)

However, even after a restart of the server, the time remains lower:

pgbench=> UPDATE pgbench_accounts SET filler = filler;
UPDATE 0
Time: 184217,260 ms (03:04,217)

So the gain on a non-bloated table is around 67% (about 184 secs against the initial 308 secs, that is roughly 40% less time), which is much more interesting!
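The percentages can be double checked with a bit of awk. This is just a sketch: speedup is a hypothetical helper name, and the timings are the ones reported by \timing above, written with a dot as decimal separator.

```shell
# speedup BEFORE_MS AFTER_MS: prints how much faster the second run is
# and how much run time is saved, both as rounded percentages.
speedup() {
    awk -v before="$1" -v after="$2" 'BEGIN {
        printf "%.0f%% faster, %.0f%% less time\n",
               (before / after - 1) * 100,
               (before - after) / before * 100
    }'
}

speedup 307939.763 287588.607   # bloated table, trigger in place
speedup 307939.763 184217.260   # table rebuilt by VACUUM FULL, trigger in place
```

The first call prints "7% faster, 7% less time", the second one "67% faster, 40% less time": the two ways of expressing the gain are almost the same for small improvements, but they diverge quickly as the gain increases.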

Timing the trigger execution

How long does it take to execute the trigger function against every row? It is possible to get this information with EXPLAIN ANALYZE:

pgbench=> EXPLAIN (FORMAT yaml, ANALYZE, VERBOSE, TIMING ) 
          UPDATE pgbench_accounts SET filler = filler;               
                    QUERY PLAN                     
---------------------------------------------------
 - Plan:                                          +
     Node Type: "ModifyTable"                     +
     Operation: "Update"                          +
     Parallel Aware: false                        +
     Relation Name: "pgbench_accounts"            +
     Schema: "public"                             +
     Alias: "pgbench_accounts"                    +
     Startup Cost: 0.00                           +
     Total Cost: 263935.00                        +
     Plan Rows: 10000000                          +
     Plan Width: 103                              +
     Actual Startup Time: 153053.980              +
     Actual Total Time: 153377.845                +
     Actual Rows: 0                               +
     Actual Loops: 1                              +
     Plans:                                       +
       - Node Type: "Seq Scan"                    +
         Parent Relationship: "Member"            +
         Parallel Aware: false                    +
         Relation Name: "pgbench_accounts"        +
         Schema: "public"                         +
         Alias: "pgbench_accounts"                +
         Startup Cost: 0.00                       +
         Total Cost: 263935.00                    +
         Plan Rows: 10000000                      +
         Plan Width: 103                          +
         Actual Startup Time: 8.968               +
         Actual Total Time: 44542.939             +
         Actual Rows: 10000000                    +
         Actual Loops: 1                          +
         Output:                                  +
           - "aid"                                +
           - "bid"                                +
           - "abalance"                           +
           - "filler"                             +
           - "ctid"                               +
   Planning Time: 24.475                          +
   Triggers:                                      +
     - Trigger Name: "tr_avoid_idempotent_updates"+
       Relation: "pgbench_accounts"               +
       Time: 1510.272                             +
       Calls: 10000000                            +
   Execution Time: 159552.624
(1 row)

As you can see, running the trigger requires roughly 1.5 secs for 10 million tuples.
Assuming the timing is accurate and stable enough, it means 0.00015 msecs for every tuple, which is not much overhead after all.

It is possible to use another table to experiment against, in order to see if the timing of the trigger execution depends on the data types and their content:

pgbench=> create table stuff( pk serial, t text );

pgbench=> INSERT INTO stuff( t ) SELECT repeat( 'abc', 1000 ) 
          from generate_series( 1, 2000000 );
          
          
pgbench=> CREATE TRIGGER tr_avoid_idempotent_updates
BEFORE UPDATE ON stuff 
FOR EACH ROW
EXECUTE FUNCTION suppress_redundant_updates_trigger();


pgbench=> EXPLAIN ( FORMAT yaml, ANALYZE, VERBOSE, TIMING ) 
          UPDATE stuff SET t = t;
          
...
  Triggers:                                      +
     - Trigger Name: "tr_avoid_idempotent_updates"+
       Relation: "stuff"                          +
       Time: 223.227                              +
       Calls: 2000000                             +

Again, the mean execution time of the trigger is 0.00011 msecs, and very similar (if not equal) results can be obtained with the pk column, so I would say that the execution time of the trigger does not depend on the specific type of the column(s) being updated.

The Black Magic Behind the Trigger Function

The suppress_redundant_updates_trigger is defined in the file utils/adt/trigfuncs.c, and the magic happens in the following piece of code:

	/* if the tuple payload is the same ... */
	if (newtuple->t_len == oldtuple->t_len &&
		newheader->t_hoff == oldheader->t_hoff &&
		(HeapTupleHeaderGetNatts(newheader) ==
		 HeapTupleHeaderGetNatts(oldheader)) &&
		((newheader->t_infomask & ~HEAP_XACT_MASK) ==
		 (oldheader->t_infomask & ~HEAP_XACT_MASK)) &&
		memcmp(((char *) newheader) + SizeofHeapTupleHeader,
			   ((char *) oldheader) + SizeofHeapTupleHeader,
			   newtuple->t_len - SizeofHeapTupleHeader) == 0)
	{
		/* ... then suppress the update */
		rettuple = NULL;
	}

that essentially compares the old and the new tuple to see if they have the same length and headers, the same number of attributes, and of course the same content in their memory representation (by means of memcmp(3)).

Doing it in `plpgsql`

It is possible to implement a basic function in plpgsql by means of the IS DISTINCT FROM operator:

CREATE OR REPLACE FUNCTION
  f_avoid_idempotent_updates()
  RETURNS TRIGGER
AS $CODE$
BEGIN
  IF NEW.* IS DISTINCT FROM OLD.* THEN
    RETURN NEW;
  ELSE
    RETURN NULL;
  END IF;
END
  $CODE$
  LANGUAGE plpgsql;

and the execution with this trigger in place results in:

pgbench=> drop trigger tr_avoid_idempotent_updates on pgbench_accounts;
DROP TRIGGER
                                                     
pgbench=> create trigger tr_avoid_idempotent_updates 
before update on pgbench_accounts              
for each row
execute function f_avoid_idempotent_updates();
CREATE TRIGGER


pgbench=> update pgbench_accounts set filler = filler;
UPDATE 0
Time: 167400,098 ms (02:47,400)

and if you track function executions:

pgbench=> select * from pg_stat_user_functions ;
-[ RECORD 1 ]--------------------------
funcid     | 36672
schemaname | public
funcname   | f_avoid_idempotent_updates
calls      | 10000000
total_time | 21276.741
self_time  | 21276.741

that indicates that 21 secs are spent in running the trigger function, so roughly 0.0021 msecs for each tuple. This is far more expensive than the built-in C function (that was roughly 0.00015 msecs).
Similar results are confirmed by the EXPLAIN ANALYZE output:

pgbench=> EXPLAIN (FORMAT yaml, ANALYZE, TIMING )
          UPDATE pgbench_accounts SET filler = filler;
...
  |   Triggers:                                      +
  |     - Trigger Name: "tr_avoid_idempotent_updates"+
  |       Relation: "pgbench_accounts"               +
  |       Time: 23002.383                            +
  |       Calls: 10000000                            +
  |   Execution Time: 163343.183

Here the Time is around 23000 msecs while with the C native function it was about 1500 msecs.
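To put the numbers side by side, here is a small sketch that derives the mean cost per call from the Time and Calls values reported by EXPLAIN ANALYZE in this article. per_call_ms is a hypothetical helper name, nothing more than a division wrapped in awk.

```shell
# per_call_ms TOTAL_MS CALLS: mean time spent in the trigger for a single row
per_call_ms() {
    awk -v total="$1" -v calls="$2" 'BEGIN { printf "%.5f\n", total / calls }'
}

per_call_ms 1510.272 10000000    # C function on pgbench_accounts
per_call_ms 223.227 2000000      # C function on stuff
per_call_ms 23002.383 10000000   # plpgsql function on pgbench_accounts

# how many times slower the plpgsql trigger is, according to the two EXPLAIN ANALYZE runs
awk 'BEGIN { printf "%.1f\n", 23002.383 / 1510.272 }'
```

The output is 0.00015, 0.00011 and 0.00230 msecs per call, and a ratio of 15.2: in these runs the plpgsql trigger costs roughly fifteen times the C one.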

Conclusions

The internal suppress_redundant_updates_trigger function can be useful to reduce both time and bloat for large batches of UPDATEs.
The function is implemented in C and checks whether the memory content of the tuples is the same, which makes this approach really powerful and less error prone than a custom trigger function defined by the user.

Memory inspection thru pg_buffercache

2021-05-28T00:00:00+00:00

A tiny set of functions to glance at the memory usage in the PostgreSQL system.

Memory inspection thru pg_buffercache

pg_buffercache is a very useful extension that allows for the inspection of the memory as used by a live PostgreSQL instance. The extension is available as a contrib module and lets you take a look at the memory usage, in other words the usage of the shared_buffers.
Thanks to this module it is possible to clearly understand the memory consumption and, therefore, to tune the shared_buffers parameter correctly.
A few years ago I wrote a set of example queries to interact with the module and get a glance at the memory usage. While those queries were a starting point, they had some issues, especially when a table was not consuming memory (division by zero, and so on).

I finally found the time to produce a cleaner approach to those queries, so I re-implemented all the queries by means of functions. The script is a psql script, and uses some special backslash commands, but you can extract the pure SQL part and execute it by means of another client.
The script creates a memory schema and places all the functions into such schema; the functions have a name that starts with f_memory, so that they should not clash with existing functions.
In the following I describe every function.
Please note that the idea here is to provide a background about memory inspection, there is still room for improvements and fixes!

Installing the functions

It suffices to execute the memory.sql psql script to create the memory schema and all the functions in it. The script provides some information about the objects created:

tfdb=# \i memory.sql 
Creating a schema named memory...
All objects created!
Try one of the following functions:
 - memory.f_memory() to get very basic information
 - memory.f_memory_usage() to get information about the whole memory
 - memory.f_memory_usage_by_database() to get information about single databases
 - memory.f_memory_usage_by_table() to get information about tables in the current database
 - memory.f_memory_usage_by_table_cumulative() to get cumulative information for tables

You can add the memory schema to the search path.
Try running the following query while testing the database (e.g., via pgbench):

select memory.f_memory_usage();
\watch 5

The output of the functions

All the functions accept a boolean human flag, which is set to true by default. If the flag is set, the memory sizes in the output will be formatted using pg_size_pretty(), and therefore will be in a human readable format. Otherwise the output will be a plain number of bytes.

tfdb=# select * from memory.f_memory();
 total  |  used  |  free  
--------|--------|--------
 800 MB | 101 MB | 699 MB
(1 row)

tfdb=# select * from memory.f_memory( false );
   total   |   used    |   free    
-----------|-----------|-----------
 838860800 | 106168320 | 732692480
(1 row)

Utility functions

There are a few utility functions that are used as a backbone to build the others. In particular:

memory.f_check_pg_buffercache() checks that the extension pg_buffercache is installed into the database;
memory.f_check_user() checks that the user is either an administrator or has the privileges to run pg_buffercache functions;
memory.f_check() calls the previous two functions and raises an exception if the check fails. This function is invoked by all the other memory related functions, so that before the function is run the user can get an alert about missing pieces;
memory.f_usagecounter_to_string() provides a textual description of the pg_buffercache.usagecount value;
memory.f_tablename() provides the name of a table, index or view, or anything that will appear in the output of other functions;
memory.f_print_bytes() prints the amount of bytes as text, using either pg_size_pretty() or plain text conversion. This is used in every function to support the above mentioned human flag.

Available functions

The available functions to inspect the memory usage are described in the following.

f_memory()

The function memory.f_memory() provides a glance at free and used memory in the cluster.

tfdb=# select * from memory.f_memory();
 total  |  used  |  free  
--------|--------|--------
 800 MB | 163 MB | 637 MB
(1 row)

f_memory_usage()

The function memory.f_memory_usage() provides a more detailed view about the usage of the memory. In particular it provides the amount of memory used by level of usagecount.

tfdb=# select * from memory.f_memory_usage();
 total_memory | memory  | percent | cumulative |  description   
--------------|---------|---------|------------|----------------
 800 MB       | 22 MB   | 2.71 %  | 2.71%      | VERY HIGH (5)
 800 MB       | 2536 kB | 0.31 %  | 3.02%      | HIGH (4)
 800 MB       | 1936 kB | 0.24 %  | 3.26%      | MID (3)
 800 MB       | 1888 kB | 0.23 %  | 3.49%      | LOW (2)
 800 MB       | 135 MB  | 16.85 % | 20.34%     | VERY LOW (1)
 800 MB       | 637 MB  | 79.66 % | 100.00%    | == FREE == (0)
(6 rows)

The memory column reports the amount of memory at a specific usage level, and the percent column reports that amount as a ratio of the total memory. The cumulative column reports the percentage of memory at the current usage level or a higher one.
As an example, in the above output there are 135 MB that are not used frequently (VERY LOW), and overall 20.34 % of the memory is in use across the levels from very high to very low.
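
The cumulative column is nothing more than a running sum of the percent column, starting from the highest usage level. A minimal sketch that reproduces it outside the database, assuming the percent values are fed one per line in the same order as the output above (cumulative is a name I made up for this example):

```shell
#!/bin/sh
# cumulative: hypothetical helper that reads one percentage per line and
# prints the value followed by the running sum computed so far.
cumulative() {
    LC_ALL=C awk '{ sum += $1; printf "%.2f %.2f\n", $1, sum }'
}

# percent values copied from the f_memory_usage() output above,
# from VERY HIGH (5) down to FREE (0)
printf '%s\n' 2.71 0.31 0.24 0.23 16.85 79.66 | cumulative
```

The second column printed by the sketch follows the cumulative column of the function output, ending at 100.00 once the free memory is included.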

f_memory_usage_by_database()

The function memory.f_memory_usage_by_database() provides information about the memory used by each database in the cluster, and also reports how much of every database is cached.

pgbench=# select * from memory.f_memory_usage_by_database();
 total_memory |  database   | size_in_memory | size_on_disk | percent_cached | percent_of_memory 
--------------|-------------|----------------|--------------|----------------|-------------------
 256 MB       | pgbench     | 182 MB         | 1505 MB      | 12.11%         | 71.15%
 256 MB       | ltdb        | 608 kB         | 171 MB       | 0.35%          | 0.23%
 256 MB       | postgres    | 544 kB         | 104 MB       | 0.51%          | 0.21%
 256 MB       | restore     | 544 kB         | 104 MB       | 0.51%          | 0.21%
 256 MB       | restore2    | 544 kB         | 104 MB       | 0.51%          | 0.21%
 256 MB       | restore3    | 544 kB         | 104 MB       | 0.51%          | 0.21%
 256 MB       | restore4    | 544 kB         | 8269 kB      | 6.58%          | 0.21%
 256 MB       | template1   | 544 kB         | 8245 kB      | 6.60%          | 0.21%
(8 rows)
         

f_memory_usage_by_table()

The function memory.f_memory_usage_by_table() provides information about the memory usage of all the table-like objects, in other words relations.

tfdb=# select * from memory.f_memory_usage_by_table();
...

 800 MB       | tfdb | (table) respi.y2019m12                         | 8192 bytes | 0.00 %  | VERY HIGH (5)
 800 MB       | tfdb | (table) respi.y2019m12                         | 22 MB      | 2.70 %  | VERY VERY LOW (0)
 800 MB       | tfdb | (index) respi.y2019m12_ts_idx                  | 32 kB      | 0.00 %  | VERY HIGH (5)
 800 MB       | tfdb | (index) respi.y2019m12_ts_idx1                 | 8192 bytes | 0.00 %  | VERY HIGH (5)

f_memory_usage_by_table_cumulative()

The function f_memory_usage_by_table_cumulative() provides an overview of how much memory a single table is “consuming”, without any regard to the usage level counter.

tfdb=# select * from memory.f_memory_usage_by_table_cumulative();
-[ RECORD 1 ]-----|-----------------------------------------------
total_memory      | 800 MB
database          | tfdb
relation          | (table) respi.y2019m07
memory            | 10 MB
on_disk           | 1159 MB
percent_of_memory | 1.27 %
percent_of_disk   | 0.88%
usagedescription  | any
-[ RECORD 2 ]-----|-----------------------------------------------
total_memory      | 800 MB
database          | tfdb
relation          | (table) respi.y2019m06
memory            | 10 MB
on_disk           | 1156 MB
percent_of_memory | 1.26 %
percent_of_disk   | 0.87%
usagedescription  | any
...

The function accepts the usual human argument, as well as an optional integer argument that represents the usage counter you are interested in. When specified, the function shows only the amount of memory with a usage counter greater than or equal to the given one.

tfdb=# select * from memory.f_memory_usage_by_table_cumulative( 5 );
-[ RECORD 1 ]-----|-----------------------------------------------
total_memory      | 800 MB
database          | tfdb
relation          | (table) respi.y2019m07
memory            | 8192 bytes
on_disk           | 1159 MB
percent_of_memory | 0.00 %
percent_of_disk   | 0.00%
usagedescription  | >= VERY HIGH (5)
-[ RECORD 2 ]-----|-----------------------------------------------
total_memory      | 800 MB
database          | tfdb
relation          | (table) respi.y2019m06
memory            | 8192 bytes
on_disk           | 1156 MB
percent_of_memory | 0.00 %
percent_of_disk   | 0.00%
usagedescription  | >= VERY HIGH (5)
...

Conclusions

The above set of functions can be used as a starting point to build your own set of queries to inspect the memory usage of a live PostgreSQL cluster. There is still room for improvement and for reducing the code duplication, so stay tuned for other versions!

A glance at doas & pg_ctl

2021-05-10T00:00:00+00:00

A possible system that differs from sudo.

A glance at doas & pg_ctl

doas(1) is a replacement for sudo(1), a program that allows you to execute commands as a different user. The main advantage of using sudo(1), and hence doas(1), is that you can gain different privileges without having to know the authentication tokens (e.g., a password) of the target user.
I use sudo(1) on pretty much every machine I use, both Linux and FreeBSD.
In this post I glance at doas(1) and how it can be quickly configured to run PostgreSQL commands, mainly pg_ctl.

`doas` introduction

doas(1) is a program that was born in the OpenBSD ecosystem as a replacement for sudo(1) because, in short, the latter is too big and cannot be easily integrated into the base system.
doas is now available on FreeBSD and Linux too, so it is worth spending some time to learn how it works.
doas(1) is based on a configuration file, namely doas.conf (in FreeBSD /usr/local/etc/doas.conf), that has a syntax a lot clearer than that of sudo, at least in my opinion.

Rules are pretty simple:

every line in the configuration file is a rule, and rules are read from top to bottom;
a rule can be either permit or deny, allowing a user to run a command or not;
a command is prefixed by the special keyword cmd;
a target user, that is the user you want to run the command as, is prefixed by the keyword as;
the special keyword nopass does not ask for a password (same as the NOPASSWD option for sudo);
it is possible to keep the environment or to change it.

The usage of doas(1) is pretty much the same as that of sudo(1), and mainly:

doas is the entry command;
-u specifies the user to run the command as;
the command is the remaining part of the command line.

doas has far fewer features (and thus less syntax clutter) than sudo; therefore it is a lot faster and easier to set up and, in my opinion, a lot less prone to errors.

Using `doas` to control a PostgreSQL cluster

Assuming you want to control a cluster, that is being able to run pg_ctl against a cluster, a possible configuration of doas.conf is as follows:

permit nopass setenv { PGDATA=$PGDATA } luca as postgres cmd  /usr/local/bin/pg_ctl
permit nopass setenv { PGDATA=$PGDATA } luca as postgres cmd  pg_ctl

The two lines are pretty much identical, with the exception that the second allows for a relative path pg_ctl command to run. Let’s examine the rules:

permit nopass means that the rule allows running the command without asking for the current user's password;
luca as postgres means that the user luca can become the user postgres, that is, the current user luca is allowed to execute a command with the privileges of the local user postgres;
cmd /usr/local/bin/pg_ctl specifies which command to execute (the two rules cover the absolute and the relative path respectively);
setenv { PGDATA=$PGDATA } means that the target user postgres will inherit the PGDATA variable from the current user luca.

Therefore, it is now possible to issue the following command to stop the cluster:

% doas -u postgres pg_ctl stop
waiting for server to shut down.... done
server stopped

That is equivalent to sudo -u postgres pg_ctl stop (assuming you have configured sudo to keep the environment).

Please note that using nopass and relative paths is, in general, a very bad idea. Do not use it in production!

Let’s execute some other commands:

% doas -u postgres initdb /postgres/13
doas: Operation not permitted

Since doas does not have any entry for the command initdb, it does not allow the user to execute such a command. In order to allow initdb, it is possible to add the following lines to doas.conf:

permit persist setenv { PGDATA=$PGDATA } luca as postgres cmd  /usr/local/bin/initdb
permit persist setenv { PGDATA=$PGDATA } luca as postgres cmd  initdb

and now it is possible to run it:

% doas -u postgres initdb /postgres/13
Password:

The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

...
Success. You can now start the database server using:

    pg_ctl -D /postgres/13 -l logfile start

Note how the program asked for a password; this is due to the persist authentication mode used instead of nopass. With persist, doas(1) asks for an authentication password and then lets the user execute other commands without asking for the password again for a short period of time. Essentially, this is the same as the default behaviour of sudo in most default installations.

What if the user wants to be able to execute every command related to PostgreSQL? We can configure the user to be able to execute any command as the postgres user with a configuration like the following:

permit persist setenv { PGDATA=$PGDATA } luca as postgres 

The above allows luca to become postgres and execute any command as the latter user.
It is quite simple to write a shell loop that automatically adds the configuration lines, so that all the PostgreSQL related commands can be executed. Note that the command path must be introduced by the cmd keyword, and that $PGDATA is expanded when the rules are generated, so it must be set in the shell that runs the loop:

# for cmd in /usr/local/bin/pg*; do
    echo "permit persist setenv { PGDATA=$PGDATA } luca as postgres cmd $cmd" >> /usr/local/etc/doas.conf
  done

and the above is going to produce something really verbose like:

permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pg_archivecleanup
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pg_basebackup
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pg_checksums
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pg_config
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pg_controldata
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pg_ctl
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pg_dump
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pg_dumpall
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pg_isready
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pg_receivewal
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pg_recvlogical
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pg_repack
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pg_resetwal
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pg_restore
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pg_rewind
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pg_standby
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pg_test_fsync
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pg_test_timing
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pg_upgrade
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pg_waldump
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pgbackrest
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pgbadger
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pgbench
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pgbench_helper.sh
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pgxn
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pgxn-3.7
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pgxnclient
permit persist setenv { PGDATA=/postgres/12/data } luca as postgres cmd /usr/local/bin/pgxnclient-3.7

Of course, you can tune such a generator as much as you like.
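
As an example of such tuning, the loop can be wrapped into a small function that takes the users as arguments and prints the rules on standard output, so that you can review them before touching doas.conf. This is only a sketch: gen_doas_rules is a name I made up, and here I chose to emit a literal $PGDATA, so that doas takes the value from the caller's environment when the command is run.

```shell
#!/bin/sh
# gen_doas_rules: hypothetical helper that prints one doas.conf rule for
# every executable file given on the command line.
#   $1    = user allowed to run the commands
#   $2    = target user the commands run as
#   $3... = paths of the commands
gen_doas_rules() {
    user=$1
    target=$2
    shift 2
    for cmd in "$@"; do
        # skip anything that is not an executable regular file
        [ -f "$cmd" ] && [ -x "$cmd" ] || continue
        # the format is single quoted, so $PGDATA is printed literally
        printf 'permit persist setenv { PGDATA=$PGDATA } %s as %s cmd %s\n' \
            "$user" "$target" "$cmd"
    done
}

# print the rules for review instead of appending them to doas.conf
gen_doas_rules luca postgres /usr/local/bin/pg*
```

Once the output looks right, it can be appended to the configuration file by hand.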

Using commands against a single cluster (don’t try this at home!)

In the previous examples, doas has been configured to allow only PostgreSQL related commands with a default PGDATA environment variable, but the user is still able to execute a command using a different directory:

% doas -u postgres pg_ctl -D /postgres/13/ start
waiting for server to start....
 done
server started

As with sudo, you can also tune doas to accept only a specific data directory as an option to the commands. This is, however, quite complex and prone to errors: you have to specify the environment and all the allowed arguments, such as:

permit nopass setenv { PGDATA=$PGDATA } luca as postgres cmd  /usr/local/bin/pg_ctl args start
permit nopass setenv { PGDATA=$PGDATA } luca as postgres cmd  /usr/local/bin/pg_ctl args stop
permit nopass setenv { PGDATA=$PGDATA } luca as postgres cmd  /usr/local/bin/pg_ctl args restart

The situation becomes:

 % doas -u postgres /usr/local/bin/pg_ctl start
waiting for server to start....
...
 done
server started


% doas -u postgres /usr/local/bin/pg_ctl reload
doas: Operation not permitted

Please be aware, however, that this is not a good solution: as the doas.conf file gets updated, its content can change and the rules could end up being evaluated in a way you do not expect.
A better approach is, of course, to allow the user to become postgres and have the latter able to do only its own tasks.

Being able to run as user `postgres`

This is much simpler than you may think, and it boils down to a single rule:

permit persist setenv { PGDATA=$PGDATA } luca as postgres

Without specifying any command with the special keyword cmd, the user luca will be able to run any command as postgres, and such user will be able to execute every PostgreSQL related command.

Conclusions

doas(1) is a nice piece of code that allows for a more readable, though less tunable, configuration than sudo, and this can be exploited to allow users to execute operations against PostgreSQL, among other programs.

To WAL or not to WAL? When unlogged becomes logged...

2021-05-10T00:00:00+00:00

What happens to tables that are not logged into the WALs when a physical replication is in place?

To WAL or not to WAL? When unlogged becomes logged…

Like many other databases, PostgreSQL allows a table to be unlogged, which in short means “exclude me from the WALs!”. Such tables are not crash safe, and they are not replicated either, because PostgreSQL replication relies on the WALs.
But what happens when you deal with such tables in a replication scenario? This post tries to provide some explanation of what is possible and what happens.

Creating and populating a database to test

First of all, let’s create a clean database just to keep the test environment separated from other databases:

testdb=# CREATE DATABASE rep_test WITH OWNER luca;
CREATE DATABASE

Now let’s create and populate three tables (one temporary, one unlogged and one normal):

rep_test=> CREATE TABLE t_norm( pk int GENERATED ALWAYS AS IDENTITY,
                  t text,
                  primary key( pk ) );

rep_test=> CREATE UNLOGGED TABLE 
           t_unlogged( like t_norm including all );

rep_test=> CREATE TEMPORARY TABLE 
           t_temp( like t_norm including all );
           

rep_test=> INSERT INTO t_norm( t )
               SELECT 'Row #' || v
               FROM generate_series( 1, 1000000 ) v;
INSERT 0 1000000
Time: 4712.185 ms (00:04.712)

rep_test=> INSERT INTO t_temp( t )
            SELECT 'Row #' || v
            FROM generate_series( 1, 1000000 ) v;
INSERT 0 1000000
Time: 1789.473 ms (00:01.789)

rep_test=> INSERT INTO t_unlogged( t )
               SELECT 'Unlogged #' || v
               FROM generate_series( 1, 1000000 ) v;
INSERT 0 1000000
Time: 1746.729 ms (00:01.747)

The situation now is as follows:

Table	Status	Insertion time
`t_norm`	Ordinary table	4.7 secs
`t_temp`	Temporary table	1.8 secs
`t_unlogged`	Unlogged table	1.7 secs

As you can see, the timing for temporary and unlogged tables is pretty much the same, and this is because neither of them is written into WAL records, and therefore there is no crash-recovery machinery involved. This also means that write transactions against temporary and unlogged tables are much faster than those against ordinary tables. Of course, the above is not an absolute measurement of INSERT times; it is reported here just to give you an idea of the differences.
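
To put a number on the difference, the timings can be divided one by the other. A quick sketch with the values reported by psql above (ratio is a hypothetical helper, and the result only describes this single run on my machine):

```shell
#!/bin/sh
# ratio: hypothetical helper, prints how many times the first timing
# is larger than the second one, with a single decimal digit.
ratio() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# timings in milliseconds, copied from the INSERT outputs above
echo "ordinary vs temporary: $(ratio 4712.185 1789.473)x"
echo "ordinary vs unlogged:  $(ratio 4712.185 1746.729)x"
```

In this run the ordinary table took between two and three times as long as the other two.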

Since there is a temporary table, you need to keep the session with the master node open, or you are going to lose all the data in such a table!

Doing the physical replication

Now start a physical replication. This is not a tutorial about how to do a physical replication, so I will just report the command I ran on a separate machine in order to get the replica cluster on its way:

% pg_basebackup -X stream --create-slot --slot 'carmensita_physical_replication_slot' -R -r 100M -D /postgres/12/replica -l "Test unlogged tables" -P -d "dbname=backup user=backup host=miguel" -T /wal=/postgres/12

The original cluster is on a machine named miguel, while the replica is placed on a machine named carmensita. These are the two machines I always use to do some experimental work.
Please note also that I use a backup database and role to stream the information; as you can imagine, you need to enable the replication connection in pg_hba.conf:

% tail $PGDATA/pg_hba.conf

host    replication     backup  carmensita  trust
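
Since the pg_basebackup command line shown earlier is quite long, here is a dry-run sketch that rebuilds it one option at a time, with a comment for each option as I understand it from the pg_basebackup documentation. The function basebackup_args is hypothetical and it only prints the arguments, one per line, without running anything.

```shell
#!/bin/sh
# basebackup_args: hypothetical dry-run helper, it prints the
# pg_basebackup command line (one argument per line) and runs nothing.
#   $1 = replication slot name, $2 = target directory, $3 = primary host
basebackup_args() {
    slot=$1
    target=$2
    primary=$3
    set --                                    # start with an empty list
    set -- "$@" -X stream                     # stream the WALs while the backup runs
    set -- "$@" --create-slot --slot "$slot"  # create the slot and stream through it
    set -- "$@" -R                            # write the standby configuration
    set -- "$@" -r 100M                       # limit the transfer rate
    set -- "$@" -D "$target"                  # directory where the replica will live
    set -- "$@" -l "Test unlogged tables"     # backup label
    set -- "$@" -P                            # show the progress
    set -- "$@" -d "dbname=backup user=backup host=$primary"  # connection string
    set -- "$@" -T /wal=/postgres/12          # relocate the tablespace found in /wal
    printf '%s\n' pg_basebackup "$@"
}

basebackup_args carmensita_physical_replication_slot /postgres/12/replica miguel
```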

Once the replication has completed, you can fire up the standby node:

% /usr/pgsql-12/bin/pg_ctl -D /postgres/12/replica start 
waiting for server to start....
 LOG:  starting PostgreSQL 12.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 10.2.1 20201125 (Red Hat 10.2.1-9), 64-bit
 LOG:  listening on IPv4 address "0.0.0.0", port 5432
 LOG:  listening on IPv6 address "::", port 5432
 LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
 LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
 LOG:  redirecting log output to logging collector process
 HINT:  Future log output will appear in directory "log".

Check the tables on the replication side

It is now time to check the replicated database on the replication host:

% psql -h carmensita -U luca rep_test

rep_test=> \d
                List of relations
 Schema |       Name        |   Type   | Owner 
--------|-------------------|----------|-------
 public | t_norm            | table    | luca
 public | t_norm_pk_seq     | sequence | luca
 public | t_unlogged        | table    | luca
 public | t_unlogged_pk_seq | sequence | luca
(4 rows)

rep_test=> select count(*) from t_norm;
  count  
---------
 1000000
(1 row)

rep_test=> select count(*) from t_unlogged;
ERROR:  cannot access temporary or unlogged relations during recovery

As you can see, the temporary table is missing, even if it is still available in the open connection to the master. There is no surprise here: a temporary table is usable only on a per-connection basis, and therefore it will not be replicated.
It is more interesting to see that the unlogged table t_unlogged and the related sequence have been replicated, but they are there only as a placeholder, and in fact it is not possible to act on the unlogged table.
Therefore unlogged tables are replicated in their structure but not in their data!

Switching from unlogged to logged

On the master node, it is now time to change the status of t_unlogged from unlogged to logged, and this can be done with a single ALTER TABLE command. As a consequence, the relpersistence flag in pg_class changes from u (unlogged) to p (permanent).

The interesting part to note here is that changing the unlogged status to logged required about 11 secs, which is more than the insertion time on an ordinary table. The idea here is that PostgreSQL has to insert into the WALs all the records of the table, as if the INSERT of each row had just happened.

rep_test=# alter table t_unlogged set logged;
ALTER TABLE
Time: 11485.505 ms (00:11.486)

and after that, on the replicated standby the table becomes ordinary too:

rep_test=> select count(*) from t_unlogged;
  count  
---------
 1000000

Switching from logged to unlogged

What happens now if t_unlogged becomes unlogged again?

rep_test=# alter table t_unlogged set unlogged;
ALTER TABLE
Time: 5236.165 ms (00:05.236)
rep_test=# truncate t_unlogged;
TRUNCATE TABLE
Time: 21.498 ms

The interesting part to note here is that, again, a lot of time is spent in the storage change.
On the standby, the table becomes unusable again:

rep_test=> select count(*) from t_unlogged;
ERROR:  cannot access temporary or unlogged relations during recovery
rep_test=> select relpages, reltuples from pg_class where oid = 't_unlogged'::regclass;
 relpages | reltuples 
----------|-----------
        0 |         0

Does the replica know about the unlogged tables?

Of course it does, and in fact pg_class knows how many tuples and pages the table is using.
However, the table is not consuming storage space on the replication host. In other words, the database on the replication side knows how much space the table occupies on the master node, because pg_class (and the other catalogs) are replicated too. The table data is missing on disk.

Let’s see this on the master side:

rep_test=# select relpages, reltuples, 
                  pg_size_pretty( pg_relation_size( 't_unlogged') ), 
                  pg_relation_filepath( oid ) 
                  from pg_class where oid = 't_unlogged'::regclass;
 relpages | reltuples | pg_size_pretty | pg_relation_filepath 
----------|-----------|----------------|----------------------
    12738 |     2e+06 | 100 MB         | base/41441/41555

and on disk the size of the file is

% sudo du -h $PGDATA/base/41441/41555
100M    /postgres/12/data/base/41441/41555

What about the replicating host? The information is the same, but on the disk there is nothing:

rep_test=# select relpages, reltuples, 
                  pg_size_pretty( pg_relation_size( 't_unlogged') ), 
                  pg_relation_filepath( oid ) 
                  from pg_class where oid = 't_unlogged'::regclass;
                  
 relpages | reltuples | pg_size_pretty | pg_relation_filepath 
----------|-----------|----------------|----------------------
    12738 |     2e+06 | 0 bytes        | base/41441/41555
                 

and on disk, in fact, there is no room occupied by the table:

% sudo du -h /postgres/12/replica/base/41441/4155
0       /postgres/12/replica/base/41441/4155

Unlogged but replicated ~ ordinary

An unlogged table that is replicated loses the speed advantages of being unlogged.
Why? Because the system has to provide all the machinery to synchronize the table once it becomes logged. If you “stop” the replication, removing the slots and other related stuff, the table gains speed.

Conclusions

As expected, PostgreSQL replicates only logged tables and not temporary or unlogged ones. The latter are however present on the replicating side as placeholders, and once you turn them into logged tables they are fully shipped to the replica.

pg_dump and inserts

2021-04-30T00:00:00+00:00

pg_dump supports a few useful options to export data as a list of INSERTs

pg_dump and inserts

pg_dump(1) is the default tool for doing backups of a PostgreSQL database.
I often get questions about how to produce a more portable output of the database dump, where portable truly means “loadable into another PostgreSQL version or even a different database”.
In fact, pg_dump defaults to using COPY for bulk loading data:

% pg_dump -a  -t wa -U luca testdb 
...
COPY luca.wa (pk, t) FROM stdin;
9200673 Record #1
9200674 Record #2
9200675 Record #3
9200676 Record #4
9200677 Record #5
9200678 Record #6
9200679 Record #7
9200680 Record #8
9200681 Record #9
...

As you can guess, COPY is usable only in PostgreSQL and not in other databases. So, how to produce a text dump that can be used in other databases?
No need to worry: pg_dump has a few features to handle such need.
In particular, the following options can be useful:

--inserts removes the COPY and substitutes it with INSERT statements, one per tuple;
--column-inserts is similar to the previous one, but each INSERT has the list of named columns;
--rows-per-insert sets the maximum number of tuples a single INSERT statement can handle, useful for a better bulk loading (but it could be less portable).

There are also some other useful options:

--quote-all-identifiers force the quoting of the identifiers, and this is useful when preparing data for a different database;
--use-set-session-authorization when dealing with ownership of objects, use SQL standard commands;
--no-comments, this is not a very “technical” aspect, but when you are going to load your dump into another database you probably do not want to import comments since they could be handled differently. Similarly, there are other --no options that are specific to PostgreSQL, like --no-publications to avoid replicating publications, and so on.

In the following I will use the same example table wa, with just two columns and a bunch of records, so that you can easily compare the output differences.

Defaulting to `INSERT`

In order to better understand the difference between every single option, let’s see a few examples:

% pg_dump -a  -t wa  --inserts -U luca testdb 
...
INSERT INTO luca.wa VALUES (9200673, 'Record #1');
INSERT INTO luca.wa VALUES (9200674, 'Record #2');
INSERT INTO luca.wa VALUES (9200675, 'Record #3');
INSERT INTO luca.wa VALUES (9200676, 'Record #4');
INSERT INTO luca.wa VALUES (9200677, 'Record #5');
INSERT INTO luca.wa VALUES (9200678, 'Record #6');
INSERT INTO luca.wa VALUES (9200679, 'Record #7');
...

As you can see from the above, the COPY has been translated into a set of INSERTs. This of course has the drawback of a slower bulk loading.
Just to give another example, let’s see how the output changes with identifier quoting:

% pg_dump -a  -t wa  --inserts --quote-all-identifiers -U luca testdb 
...
INSERT INTO "luca"."wa" VALUES (9200673, 'Record #1');
INSERT INTO "luca"."wa" VALUES (9200674, 'Record #2');
INSERT INTO "luca"."wa" VALUES (9200675, 'Record #3');
INSERT INTO "luca"."wa" VALUES (9200676, 'Record #4');
INSERT INTO "luca"."wa" VALUES (9200677, 'Record #5');
INSERT INTO "luca"."wa" VALUES (9200678, 'Record #6');
...

And the table and schema names have been quoted.
What if you also want the column list on every INSERT? The option --column-inserts is there to explode the list of columns:

% pg_dump -a  -t wa  --column-inserts --quote-all-identifiers -U luca testdb 
...
INSERT INTO "luca"."wa" ("pk", "t") VALUES (9200673, 'Record #1');
INSERT INTO "luca"."wa" ("pk", "t") VALUES (9200674, 'Record #2');
INSERT INTO "luca"."wa" ("pk", "t") VALUES (9200675, 'Record #3');
INSERT INTO "luca"."wa" ("pk", "t") VALUES (9200676, 'Record #4');
INSERT INTO "luca"."wa" ("pk", "t") VALUES (9200677, 'Record #5');
INSERT INTO "luca"."wa" ("pk", "t") VALUES (9200678, 'Record #6');
INSERT INTO "luca"."wa" ("pk", "t") VALUES (9200679, 'Record #7');
INSERT INTO "luca"."wa" ("pk", "t") VALUES (9200680, 'Record #8');
INSERT INTO "luca"."wa" ("pk", "t") VALUES (9200681, 'Record #9');
...

Regardless of whether --quote-all-identifiers is used or not, each INSERT has the list of the columns the values refer to.
The last case, a middle path between COPY and a single INSERT per tuple, is --rows-per-insert, which allows you to specify the maximum number of rows every INSERT will handle:

% pg_dump -a  -t wa  --rows-per-insert=3 --quote-all-identifiers -U luca testdb 
...
INSERT INTO "luca"."wa" VALUES
        (9200688, 'Record #16'),
        (9200689, 'Record #17'),
        (9200690, 'Record #18');
INSERT INTO "luca"."wa" VALUES
        (9200691, 'Record #19'),
        (9200692, 'Record #20');
...

Note how the last INSERT has only two tuples instead of the specified 3: pg_dump is smart enough not to lose a single row, so if there is not enough data left, the INSERT involves fewer rows.
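
The number of statements and the size of the last one are easy to predict: it is a ceiling division and a remainder. A small sketch, assuming the table holds 20 rows (the output above stops at record #20, but the total is my assumption) and using two hypothetical helpers:

```shell
#!/bin/sh
# insert_statements: hypothetical helper, number of INSERT statements
# needed to dump $1 rows with --rows-per-insert=$2 (ceiling division).
insert_statements() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# last_insert_rows: hypothetical helper, number of rows carried by the
# last INSERT statement (a full batch when the division is exact).
last_insert_rows() {
    rest=$(( $1 % $2 ))
    if [ "$rest" -eq 0 ]; then
        echo "$2"
    else
        echo "$rest"
    fi
}

# assumption: 20 rows in the table, dumped with --rows-per-insert=3
echo "statements: $(insert_statements 20 3)"
echo "rows in the last one: $(last_insert_rows 20 3)"
```

With these numbers there are six full statements of three rows each, plus a last one with the two remaining rows.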

Avoid `ALTER TABLE` to set ownership

If the dump includes the table data structure, pg_dump will issue appropriate commands to change the ownership. For example:

% pg_dump -C  -t wa  -U luca testdb
...
CREATE TABLE luca.wa (
    pk integer NOT NULL,
    t text
);

ALTER TABLE "luca"."wraparaound_pk_seq" OWNER TO "luca";
...

The option --use-set-session-authorization produces a more portable SQL output:

% pg_dump -C  -t wa --use-set-session-authorization -U luca testdb
...
SET SESSION AUTHORIZATION 'luca';


CREATE TABLE luca.wa (
    pk integer NOT NULL,
    t text
);

...

As you can see, the user is set at the beginning, so that all the created objects will automatically belong to such user.

pgBackRest 2.33: multiple repositories (and more)

2021-04-28T00:00:00+00:00

pgBackRest now supports multiple repositories!

pgBackRest 2.33: multiple repositories (and more)

A few weeks ago a new version of pgbackrest, 2.33, was released. This release improves a lot of things, and two of them in particular caught my attention:

multi repository support;
custom configuration path.

The former allows pgbackrest to perform a backup scattered over different repositories, in other words it allows the backup to be mirrored across different storages.
The second improvement fixes a few annoyances with non-Linux operating systems, such as FreeBSD.
In the following I give a glance at both these improvements, in no specific order.

Custom configuration path

FreeBSD and, more in general, non-Linux machines use different default configuration paths. For example, what is commonly placed in /etc on Linux is usually in /usr/local/etc. In previous releases, there was room for using the --prefix option during the configure phase, but this was tedious because there was the need to manually specify the path to non-standard files before invoking the command.
In other words:

archive_command = '/usr/local/bin/pgbackrest --pg1-path=/postgres/12/data \
                       --config=/usr/local/etc/pgbackrest.conf \
                       --stanza=miguel  archive-push %p'
archive_mode = on

The important part to note in the above snippet is that on FreeBSD, if you wanted to use the standard (from an operating system point of view) path for the configuration, pgbackrest did not have any clue about it and would try to look up the configuration file as /etc/pgbackrest.conf. The solution was, of course, to specify the --config option with the appropriate file.

Things have changed in version 2.33, since the configure command can now instruct the pgbackrest binary where to find the correct configuration file:

% ./configure --help
 ...
 --with-configdir=DIR    default configuration path
 ...

The default configuration path remains /etc/pgbackrest.conf, but it is now possible to specify a default configuration file path at compile time, so that you don’t have to repeat yourself with --config at every invocation.
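To make the difference concrete, here is a minimal Python sketch that assembles the archive_command value with and without --config. The helper name build_archive_command and its parameters are my own invention for illustration; only the pgbackrest options already shown above are used.

```python
def build_archive_command(stanza, pg_path, config=None,
                          binary="/usr/local/bin/pgbackrest"):
    """Return the archive_command value for postgresql.conf.

    `config` is needed only when the pgbackrest binary was not built with
    a default configuration path suitable for the operating system.
    """
    parts = [binary, f"--pg1-path={pg_path}"]
    if config is not None:
        parts.append(f"--config={config}")
    parts += [f"--stanza={stanza}", "archive-push", "%p"]
    return " ".join(parts)


# Default build on FreeBSD: the configuration file must be spelled out.
print(build_archive_command("miguel", "/postgres/12/data",
                            config="/usr/local/etc/pgbackrest.conf"))
# Binary built with a suitable default path: no --config needed.
print(build_archive_command("miguel", "/postgres/12/data"))
```

The second call is what you would end up with once the binary already knows where its configuration lives.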

Multi Repository Support

This is a much more important improvement, at least in my opinion. pgbackrest has been designed with this feature in mind, but until now there was no support for multiple repositories.
Thanks to multiple repositories you can now scatter or even mirror your backups across different storage systems, so for example you can have a local repository and a remote one (e.g., in one of the supported cloud storages), or you can mount different storages and have the backup mirrored across all of them.
The advantage of this solution is that it provides better redundancy in case your backup storage, otherwise a single point of failure, dies.
One thing to take into account when working with multiple repositories is that a few pgbackrest commands now require a repository specification in addition to the stanza. The rule of thumb is that whenever pgbackrest is able to find out which repository to use, it will do so, and this applies to the case when a single repository is configured. In other words, backward compatibility is safe!

In the following, there will be two configured repositories on the same backup machine. While this is a very bad idea, because it keeps a single point of failure, it allows for a quick test of a multiple repository setup. The carmensita machine will handle two different local repositories:

/backup/pgbackrest is the main repository;
/backup/pgbackrest-mirror is the secondary repository, attached to a different storage.

In the beginning there was only `repo1`

With pgbackrest prior to version 2.33, you could not configure multiple repositories: the configuration did accept a repo1 set of variables, but it was unable to handle repositories with an index other than 1. As an example, consider the following configuration:

[global]
start-fast = y
stop-auto  = y

repo1-path = /backup/pgbackrest
repo1-retention-full=2
repo1-retention-archive=5

repo2-path = /backup/pgbackrest-mirror
repo2-retention-full = 1

Such a configuration produces an error even in version 2.32:

$ pgbackrest --stanza miguel stanza-create
ERROR: [032]: only repo1 may be configured
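As a rough illustration of what such a check amounts to, the following Python sketch parses the configuration above and collects the repository indexes it finds. This is not how pgbackrest implements it; the function names are hypothetical and only the error message is taken from the output above.

```python
import configparser
import re

CONFIG = """
[global]
start-fast = y
stop-auto  = y

repo1-path = /backup/pgbackrest
repo1-retention-full=2
repo1-retention-archive=5

repo2-path = /backup/pgbackrest-mirror
repo2-retention-full = 1
"""


def configured_repos(config_text):
    """Return the sorted repository indexes found in the [global] section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    indexes = set()
    for key in parser["global"]:
        match = re.match(r"repo(\d+)-", key)
        if match:
            indexes.add(int(match.group(1)))
    return sorted(indexes)


def check_pre_233(config_text):
    """Mimic the pre-2.33 behaviour: anything other than repo1 is rejected."""
    extra = [index for index in configured_repos(config_text) if index != 1]
    if extra:
        raise ValueError("only repo1 may be configured")


print(configured_repos(CONFIG))   # [1, 2]
```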

Multiple Repositories

I have to confess that setting up pgbackrest for different repositories on the same machine was not as simple as I initially thought, but once again, thanks to the very professional community behind this great product, I was able to fix my setup:

[global]
start-fast = y
stop-auto  = y
repo1-path = /backup/pgbackrest

repo1-retention-full=2
repo1-retention-archive=5



repo2-path = /backup/pgbackrest-mirror
repo2-retention-full = 1

log-level-console = info


[miguel]
pg1-host = miguel
pg1-path = /postgres/12/data
pg1-host-user = postgres

while on the target machine the main configuration parameters are (/usr/local/etc/pgbackrest.conf):

[global]
repo1-path = /backup/pgbackrest
repo1-host-user = backup
repo1-host = carmensita


repo2-host = sheriff
repo2-host-user = backup
repo2-path = /backup/pgbackrest-mirror

Creating a stanza

As you can imagine, the stanza-create command creates the stanza in all the repositories automatically:

$ pgbackrest --stanza miguel stanza-create
P00   INFO: stanza-create for stanza 'miguel' on repo1
P00   INFO: stanza-create for stanza 'miguel' on repo2
P00   INFO: stanza-create command end: completed successfully (1017ms)

Executing a backup

It is now time to execute a backup and see what happens:

% pgbackrest --stanza miguel backup
...
INFO: repo option not specified, defaulting to repo1
...
INFO: new backup label = 20210413-105939F
INFO: backup command end: completed successfully (254377ms)
INFO: expire command begin 2.33: --exec-id=1606-12c0320b --log-level-console=info --repo1-path=/backup/pgbackrest --repo2-path=/backup/pgbackrest-mirror --repo1-retention-archive=5 --repo1-retention-full=2 --repo2-retention-full=1 --stanza=miguel
INFO: expire command end: completed successfully (59ms)

As you can see, since I did not specify any particular repository, the program automatically selects the first repository.

Mixed backups

Having a single repository active in the backup list means the backup status is mixed:

$ pgbackrest --stanza miguel info
stanza: miguel
    status: mixed
        repo1: ok
        repo2: error (no valid backups)
    cipher: none

    db (current)
        wal archive min/max (12): 0000000100000005000000F2/000000010000000600000004

        full backup: 20210413-105939F
            timestamp start/stop: 2021-04-13 10:59:39 / 2021-04-13 11:03:51
            wal start/stop: 000000010000000600000004 / 000000010000000600000004
            database size: 2.5GB, database backup size: 2.5GB
            repo1: backup set size: 142.8MB, backup size: 142.8MB

To some extent, the above is a degraded state: not all the repositories hold a valid backup.
Note that each backup entry now ends with a line that indicates the repository where the backup can be found.
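My reading of the info output is that the stanza status summarizes the per-repository states. The Python sketch below encodes that reading; it is an assumption derived from the output above, not pgbackrest's actual logic, and the function name is hypothetical.

```python
def stanza_status(repo_has_backup):
    """Combine per-repository states into a stanza-level status.

    `repo_has_backup` maps a repository name to True (valid backup found)
    or False (error, e.g. no valid backups).
    """
    if not repo_has_backup:
        return "error"
    states = set(repo_has_backup.values())
    if states == {True}:
        return "ok"
    if states == {False}:
        return "error"
    return "mixed"


# Only repo1 holds a backup, as in the output above.
print(stanza_status({"repo1": True, "repo2": False}))
```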

Specifying the repository for a backup

You can use the --repo option to tell pgbackrest which repository to store the backup in:

% pgbackrest --stanza miguel backup --repo 2
...
INFO: backup command end: completed successfully (4846ms)

The situation on the repositories

The info command can, as always, display information about repositories and their content:

% pgbackrest --stanza miguel info
stanza: miguel
    status: ok
    cipher: none

    db (current)
        wal archive min/max (12): 0000000100000005000000F2/000000010000000600000016

        full backup: 20210413-105939F
            timestamp start/stop: 2021-04-13 10:59:39 / 2021-04-13 11:03:51
            wal start/stop: 000000010000000600000004 / 000000010000000600000004
            database size: 2.5GB, database backup size: 2.5GB
            repo1: backup set size: 142.8MB, backup size: 142.8MB

        full backup: 20210413-111525F
            timestamp start/stop: 2021-04-13 11:15:25 / 2021-04-13 11:19:37
            wal start/stop: 00000001000000060000000F / 00000001000000060000000F
            database size: 2.5GB, database backup size: 2.5GB
            repo2: backup set size: 142.8MB, backup size: 142.8MB

...

One backup at a time

It is not possible, as far as I know, to instruct pgbackrest to back up to all the repositories simultaneously. This means that you are in charge of scheduling the backups on every repository yourself!
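A scheduler therefore has to issue one backup command per repository. As a minimal sketch, the following Python helper builds those command lines using only the --stanza and --repo options shown above; the helper name is my own, and actually running the commands (for instance with subprocess.run) is left out on purpose.

```python
def backup_commands(stanza, repos):
    """Build one pgbackrest backup command line per repository index."""
    return [
        ["pgbackrest", f"--stanza={stanza}", f"--repo={repo}", "backup"]
        for repo in repos
    ]


for command in backup_commands("miguel", [1, 2]):
    # Each command could be handed to subprocess.run(command, check=True),
    # one after the other, from cron or any other scheduler.
    print(" ".join(command))
```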

Archiving on all the repositories

Archiving, on the other hand, targets all the repositories. As explained here, archive-push iterates over every repository to push the same WAL segment. What this means is that, from a PostgreSQL perspective, if one repository fails to get the WAL (while the others succeed), PostgreSQL will think the archiving has failed and will retry later.
One way to mitigate the problem is to use the archive-push asynchronous mode.
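The all-or-nothing behaviour described above can be modelled in a few lines of Python. This is a deliberately simplified sketch: the repositories are plain callables that I made up for the example, and whether pgbackrest keeps pushing after a first failure is an assumption of this model.

```python
def archive_push(segment, repositories):
    """Push one WAL segment to every repository.

    `repositories` maps a repository name to a callable that stores the
    segment and returns True on success. The first returned value plays
    the role of the archive_command exit status seen by PostgreSQL.
    """
    results = {name: store(segment) for name, store in repositories.items()}
    return all(results.values()), results


repos = {
    "repo1": lambda segment: True,    # local storage, always fine
    "repo2": lambda segment: False,   # mirror currently unreachable
}
ok, details = archive_push("000000010000000600000004", repos)
print(ok, details)
```

Since the overall result is false as soon as one repository fails, PostgreSQL keeps the segment and tries again later, even though repo1 already has it.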

Conclusions

I am very enthusiastic about how pgbackrest is progressing and how it is enabling new features at every release.

Preventing FreeBSD to kill PostgreSQL (aka OOM Killer prevention)

2021-04-02T00:00:00+00:00

Something that can be useful when running PostgreSQL on FreeBSD.

Preventing FreeBSD to kill PostgreSQL (aka OOM Killer prevention)

There are a lot of interesting articles on how to prevent the Out of Memory Killer (OOM Killer in short) on Linux from ruining your day, or better your night. One particularly well done explanation of how the OOM Killer works, and how to help PostgreSQL survive, is, in my humble opinion, the one from the Percona Blog.

I tend to run PostgreSQL on FreeBSD machines, at least whenever it is possible, and quite frankly I still have a lot of things to learn. One of those little details is the FreeBSD OOM Killer.
It turns out FreeBSD has its own OOM Killer implementation, see this excellent article; I discovered it recently via the excellent FreeBSD forum and, as usual, the kindness and professionalism of the community behind this great operating system.

A difference between Linux and FreeBSD is that the former makes heavy use of the /proc filesystem to let the administrator interact with process configuration and information, while the latter does not. Thanks to the above article I discovered the protect(1) command, which is aimed at instructing the OOM Killer.

In the following I describe what I learnt so far and how to protect PostgreSQL from the OOM Killer.

`protect(1)` and FreeBSD OOM Killer

Processes in FreeBSD have a particular flag named PROC_SPROTECT that, as the man page for the procctl(2) system call states, is used to instruct the OOM Killer to skip the process when selecting a candidate to kill:

PROC_SPROTECT    Set process protection state.  This is used to mark
                 a process as protected from being killed if the
                 system exhausts the available memory and swap. 

The idea is that when the OOM Killer scans the processes to find one (or more) candidates to kill in order to immediately free memory, the protected processes are skipped.
An important thing to note is that protection is not inherited by fork(2)-ed processes by default. Luckily, it is possible to mark a protected process so that its children inherit the protection status. In fact, setting PROC_SPROTECT to:

PPROT_SET protects the current process but not its children;
PPROT_SET | PPROT_INHERIT protects the current process and any children created from then on.
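The two cases can be pictured with a toy Python model of a process table. Nothing here talks to the kernel: the Process class is hypothetical and the numeric values of the two constants are placeholders chosen for the example (the real ones live in sys/procctl.h).

```python
from dataclasses import dataclass

# Placeholder values for illustration only; see <sys/procctl.h> for the
# real definitions.
PPROT_SET = 0x01
PPROT_INHERIT = 0x20


@dataclass
class Process:
    pid: int
    protected: bool = False   # skipped by the OOM Killer
    inherit: bool = False     # future children start protected

    def set_protection(self, flags):
        if flags & PPROT_SET:
            self.protected = True
            self.inherit = bool(flags & PPROT_INHERIT)

    def fork(self, child_pid):
        # A child is protected only if the parent asked for inheritance.
        return Process(child_pid, protected=self.inherit, inherit=self.inherit)


postmaster = Process(776)
postmaster.set_protection(PPROT_SET)
print(postmaster.fork(800))          # child is not protected

postmaster.set_protection(PPROT_SET | PPROT_INHERIT)
print(postmaster.fork(801))          # child is protected
```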

Why is this detail important? Because as we all know, PostgreSQL starts with a main process (the postmaster) that forks a new process for every connection. Therefore, you are free to control the OOM Killer protection at the postmaster level or at the level of a single connection.

WARNING: marking all processes as protected can prevent the OOM Killer from working at all, with the presumable result of panicking the whole machine.

Protecting PostgreSQL from OOM Killer

There are two main ways to protect PostgreSQL from the OOM Killer:

manually use protect(1) against one or more PostgreSQL processes;
automatically use protect(1) at service startup.

Manually using protect(1) means that you are going to protect the process by means of its PID. As an example, suppose that on a machine there are the following processes:

% sudo pstree -s postgres
 \-+= 00776 postgres /usr/local/bin/postgres -D /postgres/12/data
   |--= 00777 postgres postgres: logger    (postgres)
   |--= 00779 postgres postgres: checkpointer    (postgres)
   |--= 00780 postgres postgres: background writer    (postgres)
   |--= 00781 postgres postgres: walwriter    (postgres)
   |--= 00782 postgres postgres: stats collector    (postgres)
   \--= 00783 postgres postgres: logical replication launcher    (postgres)

where the process with PID 776 is clearly the postmaster. Now, assume you want to protect the postmaster itself: you can call protect(1) specifying the PID of the process.

% sudo protect -p 776

The main flags for protect(1) are:

-p specifies the PID of the process to protect;
-d or -i to apply the protection to all the current children or to all the future children, respectively;
-c to remove the protection.

Therefore, in order to protect all new connections to the database the command to use is:

% sudo protect -i -p 776

that reads as protect process 776 and all new forked processes.
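Looking up the PID by hand gets tedious, so here is a minimal Python sketch that reads it from the postmaster.pid file in the data directory (its first line is the postmaster PID) and builds the protect(1) command line. The function names are my own, and the demo runs against a fake data directory so that it does not need a running cluster.

```python
import tempfile
from pathlib import Path


def postmaster_pid(pgdata):
    """Read the postmaster PID from the first line of postmaster.pid."""
    lines = Path(pgdata, "postmaster.pid").read_text().splitlines()
    return int(lines[0].strip())


def protect_command(pid, inherit=True):
    """Build the protect(1) command line; -i covers future children too."""
    command = ["protect"]
    if inherit:
        command.append("-i")
    return command + ["-p", str(pid)]


# Demo against a fake data directory, so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as pgdata:
    Path(pgdata, "postmaster.pid").write_text("776\n/postgres/12/data\n")
    demo_command = protect_command(postmaster_pid(pgdata))
    print(" ".join(demo_command))
```

On a real machine you would pass the actual data directory and run the resulting command with root privileges.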

Doing all the protection manually is boring, and luckily the excellent rc.d system allows for the configuration of protection at the service startup. It is possible to specify the oomprotect configuration parameter for the service (all services, not only PostgreSQL!), that in turn can assume the following values:

yes enables protection for (a single) process;
all enables protection for all processes (forked).

Unfortunately, this does not apply directly to PostgreSQL since the service(8) script /usr/local/etc/rc.d/postgresql does not fully use /etc/rc.subr that, in turn, is in charge of examining the oomprotect variable. The postgresql script uses pg_ctl(1) directly to manage the cluster, without any “protection” possible. I suspect the problem is due to the fact that pg_ctl(1) must be run as a normal user, and therefore there is the need to simultaneously run the pg_ctl(1) command without root privileges, as well as with such privileges to wrap it in protect(1).

In short, this means that even a configuration like the following will not apply protect(1) to PostgreSQL:

postgresql_enable="YES"
postgresql_data="/postgres/12/data"

# all = protect -i -p
# yes = protect -p
postgresql_oomprotect="all"

Therefore, in order to protect the postmaster or any other PostgreSQL process, you need to manually use protect(1) as already shown.
I am not sure if this is going to change in the future to allow the rc.d script to honor the oomprotect variable.

How to inspect the protection status

This has been hard for me, but again thanks to the great FreeBSD community and IRC, I discovered that ps(1) has a special output keyword, named flags, that shows the status of the single process protection. There is also the flags2 keyword that shows the status of the protection inheritance.
Both the flags and flags2 columns contain hexadecimal values that encode all the extra information tied to a process. In the case of P_PROTECTED the value is 0x100000 (found in flags), while for P2_INHERIT_PROTECTED the value is 0x00000001 (found in flags2).
Putting it all together, you can inspect your PostgreSQL processes as follows:

% sudo ps -ax -o flags,flags2,pid,command | grep postgres
10104000 00000001 3747 /usr/local/bin/postgres -D /postgres/12/data
10100000 00000001 3748 postgres: logger    (postgres)
10100000 00000001 3750 postgres: checkpointer    (postgres)
10100000 00000001 3751 postgres: background writer    (postgres)
10100000 00000001 3752 postgres: walwriter    (postgres)
10100000 00000001 3753 postgres: stats collector    (postgres)
10100000 00000001 3754 postgres: logical replication launcher    (postgres)

The first process, with PID 3747, is the already mentioned postmaster: it has a flags value of 10104000, which means it is OOM protected, and a flags2 value of 00000001, which means any process it spawns will be protected too.

You can check this with some math and Perl:

% sudo ps -ax -o flags,flags2,command \
       | grep postgres \
       | perl -lanE 'say "[OOM PROTECTED]\t@F[2 .. $#F]" if $F[0] =~ /^\d{2}1\d{5}$/; '
[OOM PROTECTED] /usr/local/bin/postgres -D /postgres/12/data
[OOM PROTECTED] postgres: logger (postgres)
[OOM PROTECTED] postgres: checkpointer (postgres)
[OOM PROTECTED] postgres: background writer (postgres)
[OOM PROTECTED] postgres: walwriter (postgres)
[OOM PROTECTED] postgres: stats collector (postgres)
[OOM PROTECTED] postgres: logical replication launcher (postgres)

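The above Perl one-liner splits every line into the internal array @F, where the first element is the flags column and the elements after flags2 make up the command line, and checks whether the third hexadecimal digit from the left is 1; in such a case the process is protected against OOM killing.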
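Matching a single digit is a shortcut. A slightly more robust variant, sketched here in Python, converts the hexadecimal columns to integers and tests the bits with the two values given above. The function and constant names are mine, and the sample lines are taken from the ps output shown earlier.

```python
PROTECTED_BIT = 0x100000      # value given above for the flags column
INHERIT_BIT = 0x00000001      # value given above for the flags2 column


def oom_status(flags_hex, flags2_hex):
    """Return (protected, inherit) for the two hexadecimal ps columns."""
    flags = int(flags_hex, 16)
    flags2 = int(flags2_hex, 16)
    return bool(flags & PROTECTED_BIT), bool(flags2 & INHERIT_BIT)


def parse_ps(output):
    """Parse `ps -o flags,flags2,pid,command` lines (header removed)."""
    for line in output.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        flags, flags2, pid, command = fields
        protected, inherit = oom_status(flags, flags2)
        yield int(pid), command, protected, inherit


SAMPLE = """\
10104000 00000001 3747 /usr/local/bin/postgres -D /postgres/12/data
10100000 00000001 3748 postgres: logger    (postgres)
"""

for pid, command, protected, inherit in parse_ps(SAMPLE):
    print(pid, protected, inherit, command)
```

Testing the bit, instead of the digit, also works when other flags sharing the same hexadecimal digit are set.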

Hey ‘ma, am I protected?

I created an example PL/pgSQL function to check if the current connection is protected against the OOM Killer. The function is defined with SECURITY DEFINER and has to be created by a superuser, because it internally uses the COPY command to execute the ps utility.

CREATE OR REPLACE FUNCTION
f_oomprotect( pid int DEFAULT NULL )
RETURNS boolean
AS
$CODE$
DECLARE
  p_protected  bit(8)  = '00100000';
  is_protected boolean = false;
  shell        text;
BEGIN
  -- if no pid supplied, use my own
  IF pid IS NULL OR pid < 0 THEN
    pid := pg_backend_pid();
  END IF;

  RAISE DEBUG 'Inspecting PostgreSQL process %', pid;

  shell :=    '/bin/ps -ax -o flags,flags2 -p '
                || pid
                || ' | /usr/bin/tail -n 1 ';
  CREATE TEMPORARY TABLE IF NOT EXISTS
            my_ps( flags bit(8), flags2 bit(8) );
  TRUNCATE my_ps;
  EXECUTE format( '  COPY my_ps( flags , flags2 ) FROM PROGRAM $$ %s $$ WITH ( DELIMITER $$ $$, FORMAT TEXT)', shell );


   SELECT ( flags & p_protected )::int > 0
   INTO is_protected
   FROM my_ps;

   RETURN is_protected;
END
$CODE$
LANGUAGE plpgsql
SECURITY DEFINER;

The idea is quite simple: the function gets a PID to check; if none is specified, it assumes we are interested in the current connection. Then the function creates (or empties) a temporary table my_ps to store the result of the ps shell command, in particular flags and flags2 (even if only the former is used). Flags are stored as bit strings, so that comparing them becomes simpler. Last, the flags field is combined through a bitwise AND with the p_protected internal variable, and the boolean result is returned. Note that ps prints the flags in hexadecimal, so reading them as bit strings works only as long as every digit is 0 or 1, as is the case for the backend processes shown above.
Therefore, if the function returns true, the selected connection/backend process is protected against the OOM Killer.

Conclusions

As usual FreeBSD reveals itself as a complex and well designed operating system. PostgreSQL can be protected against the OOM Killer in a more aggressive way than on Linux, but as usual protecting everything is like protecting nothing at all, so I recommend not abusing the protect(1) command.

A glance at Raku connectivity towards PostgreSQL

2021-03-29T00:00:00+00:00

A glance at Raku implementation for PostgreSQL database connectivity.

A glance at Raku connectivity towards PostgreSQL

Raku is a great language in my opinion, and I’m using it more and more every day. I can say it is going to replace my Perl scripting.

Raku comes with an extensive module library that includes, of course, database connectivity, which in turn includes features for connecting to PostgreSQL.
In this simple article, I’m going to quickly demonstrate how to use a piece of Raku code to do many of the trivial tasks that a database application can do.
The script is presented in an incremental way, so the Connecting to the database section must always be used as the script preamble.

The DB::Pg module is somewhat similar to Perl 5 DBD::Pg, so a lot of concepts and method names will remind you of the latter.

Installation

It is possible to use zef to install the DB::Pg module:

% zef install DB::Pg

Depending on the speed of your system and the libraries already installed, it can take a few minutes.

If you are going to use LISTEN/NOTIFY you also need to install the epoll module:

% zef install epoll

Connecting to the database

It is now possible to connect to the database using the DB::Pg module. For example, a simple script that accepts all parameters (in clear text!) on the command line can be:

#!raku

use DB::Pg;

sub MAIN( Str :$host = 'miguel',
          Str :$username = 'luca',
          Str :$password = 'secret',
          Str :$database = 'testdb' ) {

    "Connecting $username @ $host/$database".say;

    my $connection = DB::Pg.new: conninfo => "host=$host user=$username password=$password dbname=$database";

As you can see, the DB::Pg module accepts a conninfo string.
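A conninfo string is a list of keyword=value pairs, and libpq requires values that are empty or contain spaces to be wrapped in single quotes, with quotes and backslashes escaped by a backslash. The interpolation used in the script above does not do that, so a password with a space in it would break the string. The following Python sketch shows the quoting rule; the helper names are hypothetical and it is not part of DB::Pg.

```python
def conninfo_quote(value):
    """Quote a single conninfo value according to the libpq rules."""
    value = str(value)
    needs_quotes = (not value) or any(
        char.isspace() or char in "'\\" for char in value
    )
    if not needs_quotes:
        return value
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_conninfo(**params):
    """Join keyword=value pairs into a conninfo string."""
    return " ".join(
        f"{key}={conninfo_quote(value)}" for key, value in params.items()
    )


print(build_conninfo(host="miguel", user="luca",
                     password="secret", dbname="testdb"))
print(build_conninfo(host="miguel", user="luca",
                     password="my secret", dbname="testdb"))
```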

Read queries and results

The .query method allows for issuing a read query to the database. The result is a Result class object, that can be used by means of different methods, most notably with .hashes and .arrays that return a sequence of hashes or arrays, one per every row extracted from the query.
Special methods like .rows and .columns provide respectively the number of rows returned by a query and the list of column names of the result set.
As an example, here is a simple query:

my $query = 'SELECT current_role, current_time';
my $results = $connection.query: $query;

say "The query { $query } returned { $results.rows } rows with columns: { $results.columns.join( ', ' ) }";
for $results.hashes -> $row {
    for $row.kv -> $column, $value {
        say "Column $column = $value";
    }
}

The above piece of code provides an output similar to the following:

The query SELECT current_role, current_time returned 1 rows with columns: current_role, current_time
Column current_role = luca
Column current_time = 14:48:47.147983+02

Cursors

By default, the .query method fetches all the rows from the query, which is a problem with larger datasets. It is possible to use the .cursor method that accepts the optional batch size (by default 1000 tuples) and, optionally, the specifier for getting results into a sequence of hashes.
As a simple example:

for $connection.cursor( 'select * from raku', fetch => 2, :hash ) -> %row {
    say "====================";
    for %row.kv -> $column, $value {
        say "Column [ $column ] = $value";
    }
    say "====================";
}

that produces an output like:

====================
Column [ pk ] = 2
Column [ t ] = This is value 0
====================
====================
Column [ pk ] = 3
Column [ t ] = This is value 1
====================
====================
Column [ t ] = This is value 2
Column [ pk ] = 4
====================
====================
Column [ pk ] = 5
Column [ t ] = This is value 3
====================
...
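The idea of fetching a result set a few rows at a time is not specific to Raku. As a loose analogy only, the following Python sketch does the same with the standard library sqlite3 module standing in for a database; it is neither PostgreSQL nor DB::Pg, and the batching happens on the client side, but the loop has the same shape.

```python
import sqlite3


def fetch_in_batches(cursor, batch_size=1000):
    """Yield rows as dictionaries, pulling `batch_size` rows at a time."""
    columns = [description[0] for description in cursor.description]
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield dict(zip(columns, row))


# sqlite3 is only a stand-in so that the sketch is self-contained.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE raku (pk INTEGER PRIMARY KEY, t TEXT)")
connection.executemany("INSERT INTO raku (t) VALUES (?)",
                       [(f"This is value {i}",) for i in range(5)])

for row in fetch_in_batches(connection.execute("SELECT * FROM raku"),
                            batch_size=2):
    print(row)
```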

Write Statements

Write statements can be performed by means of the .execute method, such as:

$connection.execute: q< insert into raku( t ) values( 'Hello World' )>;

Transactions and Prepared Statements

In order to handle transactions, you need to access the database handler that is “masked” into the DB::Pg main object. The database object provides the methods .begin, .rollback and .commit as usual.
Moreover, it is possible to use the .prepare method to obtain a prepared statement that can be cached and used in loops and repetitive tasks. It is worth noting that the .prepare method uses the $1, $2, and so on parameter placeholders, and that when a statement accepts a single value it has to be specified without the index in .execute.
As an example:

my $database-handler = $connection.db;
my $statement = $database-handler.prepare: 'insert into raku( t ) values( $1 )';

$database-handler.begin;
$statement.execute( "This is value $_" )  for 0 .. 10;
$database-handler.commit;
$database-handler.finish;

The above loop is equivalent to an SQL transaction like:

BEGIN;
INSERT INTO raku( t ) VALUES ('This is value 0' );
INSERT INTO raku( t ) VALUES ('This is value 1' );
INSERT INTO raku( t ) VALUES ('This is value 2' );
...
INSERT INTO raku( t ) VALUES ('This is value 10' );
COMMIT;

The .finish method is required because DB::Pg handles caching. Please note that the .commit and .rollback methods are fluent, and return an object instance so that you can call .commit.finish.

Databases vs Connections

Caching is handled so that when a query is issued, a new connection is opened and used. Once the work has completed, the connection is returned to the internal pool. The DB::Pg::Database object does the same work as the DB::Pg one, with the exception that it does not automatically return the connection to the pool, so you need to call .finish yourself.

Therefore, you can use the same .query and .execute methods on both objects, but DB::Pg automatically returns the connection to the internal pool, while the database object gives you fine-grained control over when to return the connection to the pool.

Copy

PostgreSQL provides the special COPY command, which can be used to copy data out of and into a table. There is a method .copy-in that executes a COPY FROM, while COPY TO can be used within an iteration loop:

my $file = '/tmp/raku.csv'.IO.open: :w;
for $connection.query: 'COPY raku TO stdout (FORMAT CSV)'  -> $row {
    $file.print: $row;
}

The above exports the CSV result to a text file.
To read the data back, it is possible to issue the .copy-in method, but you first need to issue an SQL COPY. The workflow is:

issue a COPY FROM STDIN;
use .copy-data to slurp all the data;
use .copy-end to notify the database that the COPY is concluded.

The need for .copy-end is an advantage: it is possible to issue several .copy-data calls in a single run, for example to import data from different files.

$database-handler = $connection.db;
$database-handler.query: 'COPY raku FROM STDIN (FORMAT CSV)';
$database-handler.copy-data:  '/tmp/raku1.csv'.IO.slurp;
$database-handler.copy-data:  '/tmp/raku2.csv'.IO.slurp;
$database-handler.copy-end;

Converters

It is possible to specify converters, special roles that handle values in and out of the database; something that reminds me of the inflate and deflate options of DBIx::Class.
The first step is to add a role to the converter instance within the DB::Pg; such a role must:

add a new type conversion;
add a convert method that handles the stringified value of the type and returns the new value (any Raku instance).

As an example, the following converts a text PostgreSQL type into a Str Raku object with its content reversed and uppercased:

$connection.converter does role fluca-converter
{
    submethod BUILD { self.add-type( text => Str ) }
    multi method convert( Str:U, Str:D $value) {
        $value.flip.uc;
    }

}

.say for $connection.query( 'select * from raku' ).arrays;

that produces an output similar to:

[442 DLROW OLLEH]
[454 DLROW OLLEH]
[466 DLROW OLLEH]

where the string Hello World is flipped and uppercased.

Listen and Notify

DB::Pg can handle also LISTEN and NOTIFY, and they are able to interact with the react dynamic feature of Raku.
First of all, create a simple mechanism to notify some events:

testdb=> create or replace rule r_raku_insert 
         as on insert to raku 
         do also 
         SELECT pg_notify( 'insert_event', 'INSERTING ROW(S)' );
CREATE RULE

testdb=> create or replace rule r_raku_delete
         as on delete to raku 
         do also 
         SELECT pg_notify( 'delete_event', 'DELETING ROW(S)' );
CREATE RULE

Now it is possible to create a Raku script that waits for incoming events:

react {
    whenever $connection.listen( 'delete_event' ) { .say; }
    whenever $connection.listen( 'insert_event' ) { .say; }
}

The aim is that, every time an event is issued, the .listen passes the message payload to the react code block. Therefore, issuing some DELETE and INSERT will result in the output:

DELETING ROW(S)
INSERTING ROW(S)
INSERTING ROW(S)

It is possible to stop the listening react block with the .unlisten method. It is also possible to issue an event via .notify.

Conclusions

The DB::Pg is a great driver for PostgreSQL that allows Raku to exploit a lot of features directly into the language.

A first look at pg_repack

2021-03-25T00:00:00+00:00

An interesting extension that helps removing bloating from tables and databases.

A first look at pg_repack

I got time to have a look at pg_repack, an interesting extension that helps remove bloat from tables, indexes and databases with the promise of minimal locking.
Since this is a first look, I could be wrong on some aspects, so please forgive me.

The main idea behind pg_repack is to perform an on-line copy of a source (bloated) table, then switching the original table with the new one. In short, something like:

BEGIN;
CREATE TABLE not_bloated AS SELECT * FROM my_table;
ALTER TABLE my_table RENAME TO old_bloated;
ALTER TABLE not_bloated RENAME TO my_table;
DROP TABLE old_bloated;
COMMIT;

Of course things are a lot more complex than the above description, but I think that could be a good summary of what happens.

Why would such a workflow remove bloat?
Well, the idea is that the copy of tuples from the bloated table will, of course, copy only visible tuples (i.e., those that would be left after a VACUUM). In other words, dead tuples are not going to hit the new table and therefore the latter will not be bloated.

What about locking?
Since the copy is done on-line, the source table (the bloated one) can be used as usual, that is DML queries can be executed against such table. This of course creates a kind of race condition, since changes are not propagated automatically to the new table.
To solve the problem, pg_repack installs a trigger that fires for every DML statement and logs changes to a repack.log table, so that pg_repack is able to replay the changes at the end of the copy, that is just before switching the tables.
This is important, in my opinion, because it means that running pg_repack is not the same as running VACUUM, since the new table could have a small fraction of bloat. Why? Well, if during the copy the original table is subjected to a workload that can cause bloat (i.e., UPDATE and DELETE), such bloat will be propagated to the new table as well.

What about disk space?
Since it makes a copy of the original table, pg_repack is going to require at least double the size of the original table on disk.

Installing `pg_repack`

pg_repack is an extension, and therefore can be installed via pgxn (as well as manually, of course):

% sudo  pgxn install pg_repack

Then of course, you need to install the extension into the database you are going to use (or repack):

% psql -U postgres -c "CREATE EXTENSION pg_repack;" testdb
CREATE EXTENSION

Using `pg_repack`

pg_repack must be invoked from the command line as an external utility. The command accepts pretty much all the usual arguments from libpq:

%  pg_repack -t "luca.wa" -U postgres testdb
INFO: repacking table "luca.wa"

The above will repack a single table, but it is possible to repack all tables in a schema, all tables in a database and so on.

The `repack` schema

pg_repack installs a repack schema in the database where the extension lives. In such a schema there are different tables, mainly temporary ones used while repacking objects. An interesting table is repack.tables, which contains all the details for every table that can be repacked. Querying it you can see some of the tricks used in the workflow of pg_repack:

testdb=# select create_log, create_trigger, lock_table 
         from repack.tables 
         where relname = 'luca.wa';
...
create_log     | CREATE TABLE repack.log_16553
                 (id bigserial PRIMARY KEY, pk repack.pk_16553, row luca.wa)
create_trigger | CREATE TRIGGER repack_trigger
                 AFTER INSERT OR DELETE OR UPDATE ON luca.wa
                 FOR EACH ROW EXECUTE PROCEDURE
                 repack.repack_trigger('INSERT INTO repack.log_16553(pk, row)
                                       VALUES( CASE WHEN $1 IS NULL THEN NULL
                                       ELSE (ROW($1.pk)::repack.pk_16553) END, $2)')
lock_table     | LOCK TABLE luca.wa IN ACCESS EXCLUSIVE MODE

As you can see, there are SQL instructions to create a log_xxx table where changed tuples will be logged, as well as the definition of the trigger to attach to the table.
The repack_trigger is a C function that accepts an SQL string (as you can see) and that will execute an insert into the log_xxx table so that:

in case of an INSERT the new tuple will be inserted as (null, row);
in case of an UPDATE both the new and old tuples will be inserted as (old, new);
in case of a DELETE the old tuple only will be inserted as (old, null).
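To see how such a log can bring the copied table up to date, here is a toy replay in Python. It is a sketch of the idea only, not the pg_repack C code: the table is a dictionary keyed by primary key, every log entry is an (old, new) pair as in the list above, and the function name is hypothetical.

```python
def replay(table, log):
    """Apply logged (old, new) pairs to `table`, a dict keyed by pk."""
    for old, new in log:
        if old is None:                 # INSERT: (null, row)
            table[new["pk"]] = new
        elif new is None:               # DELETE: (old, null)
            table.pop(old["pk"], None)
        else:                           # UPDATE: (old, new)
            table.pop(old["pk"], None)
            table[new["pk"]] = new
    return table


# State of the copy when the initial scan finished.
copied = {1: {"pk": 1, "t": "a"}, 2: {"pk": 2, "t": "b"}}

# Changes captured by the trigger while the copy was running.
log = [
    (None, {"pk": 3, "t": "c"}),
    ({"pk": 1, "t": "a"}, {"pk": 1, "t": "A"}),
    ({"pk": 2, "t": "b"}, None),
]

print(replay(copied, log))
```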

The lock_table is used to lock the table during the initial and final steps, that is at the time the trigger is attached and when the tables are swapped.

Conclusions

pg_repack is surely an interesting extension to keep in the toolbox. In the future I’m going to spend some time using this extension to see how it performs, but I already know there are happy people using it, so I’m expecting positive results!

Physical Backup Privileges Check

2021-03-25T00:00:00+00:00

A simple view to see if a user can perform backups.

Physical Backup Privileges Check

In order to perform a physical backup, PostgreSQL requires a role that is allowed to perform several operations, mainly invoking the pg_start_backup() and pg_stop_backup() functions.
On old PostgreSQL versions, before version 10, only superusers could invoke the above backup functions and, therefore, do a physical backup. Since PostgreSQL 10 things have changed, and nowadays there are more fine-grained permissions. In particular, there are a few default roles that can be used to set up a backup role.
This is what I usually do: since working with a superuser role can be dangerous, I create low-profile roles and assign them only the required privileges.

Depending on the backup solution you are going to implement, these privileges can be different, so I decided to create a view that helps inspect the status of the roles available in the database.

The view works as follows:

extract a set of flags;
merge with logical AND and OR;
provide some additional flags.

Using the view results in something like the following:

backupdb=> select * from vw_role_backup_privileges
               WHERE rolname IN ( 'luca', 'backup' );
  -[ RECORD 1 ]------------|-------
  rolname                  | backup
  can_do_backup            | t
  can_monitor_backup       | t
  can_create_restore_point | t
  can_switch_wal           | t
  -[ RECORD 2 ]------------|-------
  rolname                  | luca
  can_do_backup            | f
  can_monitor_backup       | f
  can_create_restore_point | f
  can_switch_wal           | f

As you can see, the backup role, even if not a superuser, can do backups, can monitor them, can create restore points and force a WAL switch.
Of course, the above is not a one size fits all solution, since every backup solution could require different permissions, however this is a possible starting point to check the status of the users.

In the following I describe the single pieces of the view.

What is required to do a physical backup?

The minimal set of privileges required to perform a backup is:

permission to start a replication;
permission to invoke pg_start_backup(), pg_stop_backup().

This is done by the part:

f.can_start_replication
AND f.pg_start_backup
AND ( f.pg_stop_backup OR f.pg_stop_backup_exclusive )

where I check the above requirements.

Additional requirements

Being able to start and stop a backup might not suffice: the user could also be required to monitor the backup. Monitoring means being able to query the statistics data and the configuration of the cluster. The former can be used to see if the replication is working fine, the latter to check the archiving setup.
PostgreSQL provides a pg_monitor role that can run the above queries; otherwise the user needs two different roles, namely pg_read_all_settings and pg_read_all_stats. Since pg_monitor includes these two roles, assigning pg_monitor is equivalent to assigning both of them. It could also be required to invoke the pg_is_in_backup() function, which indicates if the cluster is currently in physical backup mode.
This means that I need to check:

f.pg_monitor
  OR ( f.pg_read_all_settings
       AND f.pg_read_all_stats
       AND f.pg_is_in_backup
  )

Switch WALs and create restore points

Starting a backup could also require the user to issue an immediate switch of the WALs in order to quickly start the backup.
Moreover, it could be required to create a restore point, for example to mark in the WALs that the backup has started at a specific point in time.
This means that the check is:

 f.pg_create_restore_point AS can_create_restore_point
 , f.pg_switch_wal           AS can_switch_wal

Putting everything together

Having stated the above list of requirements, the query can be split into two parts:

a CTE that extracts the flags;
a query that composes the flags.

To extract the flags, the following CTE can be used:

WITH flags AS (
    SELECT
    a.rolname
    , a.rolsuper         AS is_superuser
    , a.rolreplication AS can_start_replication
    , pg_has_role( a.rolname, 'pg_monitor', 'USAGE' ) AS pg_monitor
    , pg_has_role( a.rolname, 'pg_read_all_settings', 'USAGE' ) as pg_read_all_settings
    , pg_has_role( a.rolname, 'pg_read_all_stats', 'USAGE' ) as pg_read_all_stats
    , has_function_privilege( a.rolname, 'pg_start_backup( text, bool, bool )', 'EXECUTE' ) as pg_start_backup
    , has_function_privilege( a.rolname, 'pg_stop_backup( bool, bool )', 'EXECUTE' ) as pg_stop_backup
    , has_function_privilege( a.rolname, 'pg_stop_backup()', 'EXECUTE' ) as pg_stop_backup_exclusive
    , has_function_privilege( a.rolname, 'pg_create_restore_point( text )', 'EXECUTE' ) as pg_create_restore_point
    , has_function_privilege( a.rolname, 'pg_is_in_backup()', 'EXECUTE' ) as pg_is_in_backup
    , has_function_privilege( a.rolname, 'pg_switch_wal()', 'EXECUTE' ) as pg_switch_wal
FROM
    -- use pg_roles instead of pg_authid
    -- to allow non-superuser roles to query
    pg_roles a
)

I query pg_roles, which exposes the same role information as pg_authid but does not require superuser privileges to be queried.
Please note that I check role group membership with the USAGE privilege: this means that the role does not have to do an explicit SET ROLE to gain access to the privileges of the group it belongs to, that is, it has been created WITH INHERIT.

Then, composing the flags is as simple as:

SELECT f.rolname
    , f.is_superuser
      OR (
          f.can_start_replication
          AND f.pg_start_backup
          AND ( f.pg_stop_backup OR f.pg_stop_backup_exclusive )
      ) AS can_do_backup
    ,   f.pg_monitor
        OR ( f.pg_read_all_settings
             AND f.pg_read_all_stats
             AND f.pg_is_in_backup
        ) AS can_monitor_backup
    , f.pg_create_restore_point AS can_create_restore_point
    , f.pg_switch_wal           AS can_switch_wal
FROM flags f;
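
To experiment with the boolean composition without a database at hand, the same logic can be mirrored in a few lines of Python. This is only a sketch: the Flags class and the compose() function are hypothetical names of mine, and the flag values are set by hand instead of being read from pg_roles.

```python
from dataclasses import dataclass


@dataclass
class Flags:
    """One row of the flags CTE; every flag defaults to False."""
    is_superuser: bool = False
    can_start_replication: bool = False
    pg_monitor: bool = False
    pg_read_all_settings: bool = False
    pg_read_all_stats: bool = False
    pg_start_backup: bool = False
    pg_stop_backup: bool = False
    pg_stop_backup_exclusive: bool = False
    pg_create_restore_point: bool = False
    pg_is_in_backup: bool = False
    pg_switch_wal: bool = False


def compose(f: Flags) -> dict:
    """Combine the raw flags with the same AND/OR rules used by the view."""
    return {
        "can_do_backup": f.is_superuser or (
            f.can_start_replication
            and f.pg_start_backup
            and (f.pg_stop_backup or f.pg_stop_backup_exclusive)
        ),
        "can_monitor_backup": f.pg_monitor or (
            f.pg_read_all_settings
            and f.pg_read_all_stats
            and f.pg_is_in_backup
        ),
        "can_create_restore_point": f.pg_create_restore_point,
        "can_switch_wal": f.pg_switch_wal,
    }
```

A role with every flag unset gets all four results false, like the luca role in the example output, while a non-superuser role with replication, the backup functions and pg_monitor gets them all true.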

Managing Multiple PostgreSQL Instances on FreeBSD

2021-03-22T00:00:00+00:00

FreeBSD service(8) is a fully featured system to manage services, and allows multiple instances of PostgreSQL.

Managing Multiple PostgreSQL Instances on FreeBSD

FreeBSD allows the management of multiple instances of PostgreSQL by means of rc.conf(5).
The trick is to use profiles, which are available for the PostgreSQL rc script (/usr/local/etc/rc.d/postgresql) even if not well documented, at least in my opinion.
In order to understand how to deal with multiple PostgreSQL instances, consider a system with two clusters: test and prod.
In /etc/rc.conf you need to define the postgresql_profiles variable, where you list the clusters separated by spaces. Then, for each profile, you define the well-known postgresql_xxx variables, placing the profile name before the variable suffix. For example, to define PGDATA, which is usually set via the postgresql_data variable, you need to specify a postgresql_<profile-name>_data variable.
Therefore, in /etc/rc.conf you need to specify the following:

postgresql_profiles="test prod"

postgresql_test_data="/postgres/12/test"
postgresql_test_enable="YES"

postgresql_prod_data="/postgres/12/prod"
postgresql_prod_enable="YES"

You can now manage each instance by specifying the profile name on the service(8) call:

% sudo service postgresql start test

% sudo service postgresql status test
pg_ctl: server is running (PID: 35979)
/usr/local/bin/postgres "-D" "/postgres/12/test"

The profile name must be the last argument of the service(8) invocation.
But there is more: if you don’t specify any profile on the command line, service(8) will iterate over all the available profiles. As an example, the following two sequences are equivalent:

% sudo service postgresql stop       
===> postgresql profile: test
===> postgresql profile: prod

# equivalent to
% sudo service postgresql stop test
% sudo service postgresql stop prod

With this simple profile-based management, it is easy to handle and manage multiple PostgreSQL instances on the same FreeBSD host.
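
The naming rule for the per-profile variables can be sketched with a small Python helper that prints the rc.conf lines for a set of profiles. It is only an illustration of the postgresql_<profile-name>_<suffix> convention described above: the rc_conf_lines() function is a hypothetical name of mine and is not part of FreeBSD.

```python
def rc_conf_lines(profiles: dict) -> list:
    """Build rc.conf lines for the given profiles.

    `profiles` maps a profile name to a dict of {suffix: value},
    e.g. {"test": {"data": "/postgres/12/test", "enable": "YES"}}.
    """
    # first the list of profiles, separated by spaces
    lines = ['postgresql_profiles="{}"'.format(" ".join(profiles))]
    # then one postgresql_<profile>_<suffix> variable per setting
    for name, settings in profiles.items():
        for suffix, value in settings.items():
            lines.append('postgresql_{}_{}="{}"'.format(name, suffix, value))
    return lines


if __name__ == "__main__":
    for line in rc_conf_lines({
        "test": {"data": "/postgres/12/test", "enable": "YES"},
        "prod": {"data": "/postgres/12/prod", "enable": "YES"},
    }):
        print(line)
```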

PgTraining online webinar 2021-03-12 (Italian): video available

2021-03-19T00:00:00+00:00

An online event organized by PgTraining.

PgTraining online webinar 2021-03-12 (Italian): video available

PgTraining, the amazing Italian group of people that spreads the word about PostgreSQL and that I joined in the last years, organized a free online event (webinar) on March 12th, 2021.
There were around 45 participants, which was quite a success in our opinion for such a small event. Most notably, the participants were really interested in the topics covered and, in fact, we got a lot of live questions and were unable to close the session on time since the discussion was really active!
We made video recordings of the talks available, or better, offline recordings, because we decided not to record the live event due to privacy concerns. All the videos are in Italian. Here is my talk, pgBackRest as a backup solution:

PgTraining online event 2021-03-12: pgBackRest from Pg Training on Vimeo.

There is also the video about sharding, made by Enrico, that you can watch here:

PgTraining online session 2021-03 - Sharding from Pg Training on Vimeo.

And finally, the video about the JIT compiler made by Chris, that you can watch here:

PgTraining online event 2021-03-12: Esperimenti con il JIT compiler di Postgres from Pg Training on Vimeo.

Slides for each talk are already available via the GitLab repository of the event (in Italian).

PgTraining online webinar on 2021-03-12 (Italian)

2021-02-13T00:00:00+00:00

An online event organized by PgTraining.

PgTraining online webinar on 2021-03-12 (Italian)

PgTraining, the amazing Italian group of people that spreads the word about PostgreSQL and that I joined in the last years, is organizing an online event (webinar) on March 12th, 2021.

The event will consist of three hours of talks about sharding, backup using pgBackRest and the JIT compiler.
The webinar will be in Italian and there will be room for questions and discussion at the end of every single talk.

There are only 40 seats available for the event, which is totally free of charge, so hurry up and register.

PostgreSQL TOAST Data Corruption (ERROR: unexpected chunk number)

2021-02-08T00:00:00+00:00

I wrote a simple function to test for corrupted TOAST data.

PostgreSQL TOAST Data Corruption (ERROR: unexpected chunk number)

The Oversized-Attribute Storage Technique (TOAST) is the mechanism that allows PostgreSQL to store large attribute values within a table.
PostgreSQL stores data in data pages that have a fixed size, usually 8 kB; this means there is no room for variable content (e.g., a string) that grows larger than a single data page. To solve the problem, PostgreSQL uses TOAST: when an attribute value is too large to be stored in the table data page, PostgreSQL transparently moves the content to an external storage, namely pg_toast, where the content is split into chunks (parts) and stored as a set of chunk tuples. When you ask for your content back, PostgreSQL transparently seeks the chunks, recomposes them in the right order, and provides the result to you. It is as if the system executed a transparent join between your main table and the pg_toast one.

Unluckily, sometimes the TOAST storage can get damaged, often by accident, resulting in data corruption. The problem is that such corruption often goes unseen until the real content is required: in other words, your table looks fine unless you select exactly the content that has been stored out of line in TOAST.
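
The split-and-recompose idea can be sketched in Python. This is a toy model and not how PostgreSQL implements TOAST internally: the toast() and detoast() functions are hypothetical names of mine, the chunk size is an arbitrary illustrative value, and compression is ignored.

```python
CHUNK_SIZE = 2000  # arbitrary illustrative value, not a PostgreSQL constant


def toast(value: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split a large value into (sequence number, data) chunks."""
    return [
        (seq, value[start:start + chunk_size])
        for seq, start in enumerate(range(0, len(value), chunk_size))
    ]


def detoast(chunks: list) -> bytes:
    """Recompose the value, checking that the chunks come in sequence."""
    parts = []
    for expected, (seq, data) in enumerate(chunks):
        if seq != expected:
            # a damaged chunk shows up only when the value is read back
            raise ValueError(
                f"unexpected chunk number {seq} (expected {expected})"
            )
        parts.append(data)
    return b"".join(parts)
```

Note that in this model nothing complains while the chunks sit in the list: the problem surfaces only when detoast() walks them, which is the reason why damaged TOAST data can go unnoticed for a long time.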

In this article I introduce a couple of functions that can serve as a basis to find damaged TOAST data.
I’ve written such functions to do exactly the above job: help me identify the records that have been damaged, so that I can decide how to restore them (and here you can insert any good advice about backups you wish).

This article is divided into two parts:

the first one creates an example and damages it on purpose, so that you can try the code;
the second part shows how to use the functions and get some results.

The code of the functions can be found online on my GitLab repository. As usual, any comment or improvement is appreciated.
Inspiration for this technique comes from Josh Berkus’ excellent article.

Create an example: corrupt your TOAST data

Assume we create the following table within our database:

testdb=> create table example_toast( a int, b text, c float, d varchar(10000) );

testdb=> alter table example_toast add column pk serial primary key;

testdb=> insert into example_toast
select x, repeat( 'fluca1978', x * 8000 ), x * 1.2, 
          repeat( 'fluca1978', x * 10 )
from generate_series( 1, 210 ) x;

testdb=> select pg_size_pretty( pg_relation_size( 'example_toast' ) );
 pg_size_pretty 
----------------
 8192 bytes
(1 row)

The table has been filled with four different types of data, each initialized with a different value; in particular, the text columns have been initialized with long contents. The table itself turns out to be very small, and occupies exactly one data page.
Does the table have any TOAST-ed data? We can check that reltoastrelid has a value:

testdb=> select relname, relfilenode, reltoastrelid from pg_class where relkind = 'r' and oid = 'example_toast'::regclass;
    relname    | relfilenode | reltoastrelid 
---------------|-------------|---------------
 example_toast |       52367 |         44175
(1 row)

Therefore, the table has the TOAST table 44175 associated.

It’s time for a corruption!

In order to make the toasted data faulty, we can use an old Perl script of mine that is going to insert a crappy string into a data file. The script is really simple, as you can see:

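#!/usr/bin/env perl
use strict;
use warnings;

# usage: do_corruption.pl <data file> <byte offset past the first page>
# `or` (not `||`) so that die fires when open itself fails
open my $db_file, "+<", $ARGV[ 0 ]
     or die "Cannot open data file!\n\n";
# skip the first 8 kB page, then move $ARGV[ 1 ] bytes further
seek $db_file, ( 8 * 1024 ) + $ARGV[ 1 ], 0;

print { $db_file } "Hello Corrupted Database!";
close $db_file;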
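
For those who do not speak Perl, the same idea can be sketched in Python. This is a rough equivalent of mine and not the script I actually used: the corrupt() function name is hypothetical, and the same warning applies, run it only against files you can afford to lose.

```python
PAGE_SIZE = 8 * 1024                     # skip the first 8 kB data page
PAYLOAD = b"Hello Corrupted Database!"   # the garbage written into the file


def corrupt(path: str, offset: int) -> int:
    """Overwrite part of the file at `path` and return the position used.

    The file is opened for update ("r+b"), so it is neither truncated
    nor extended unless the position is beyond its end.
    """
    position = PAGE_SIZE + offset
    with open(path, "r+b") as data_file:
        data_file.seek(position)
        data_file.write(PAYLOAD)
    return position
```

Calling corrupt() with the data file path and 12345 as offset would write the payload starting at byte 8192 + 12345 = 20537 of the file.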

With the corruption script in place, we need to find the data file to corrupt: it is the TOAST table we are going to damage, and we can get the path of its file on disk using the PostgreSQL functions.

testdb=> select relname, relfilenode, reltoastrelid, 
                pg_relation_filepath( reltoastrelid ) 
                from pg_class where relkind = 'r' 
                and oid = 'example_toast'::regclass;
                
-[ RECORD 1 ]--------|-----------------
relname              | example_toast
relfilenode          | 52367
reltoastrelid        | 44175
pg_relation_filepath | base/24815/52368

We can now corrupt the data on the datafile base/24815/44175:

% sudo -u postgres \
        perl /usr/local/bin/do_corruption.pl \
             /postgres/12/base/24815/44175   \
             12345

WARNING: don’t try this at home, or better, do try against a test-only database!

Find out the corruption

What happens if we query the table now? Well, we asked for a data corruption and we got it!

testdb=> \o test.txt
testdb=> select b,d from example_toast;
ERROR:  unexpected chunk number 1126199148 (expected 2) for toast value 60699 in pg_toast_44175

Please note that I sent the output of the query to a file, to keep the whole result buffer from filling the blog post.

Searching for the error: find out corrupted TOAST data

The data on the TOAST storage has been damaged, and it is now required to find out which tuples have been affected by the damage so that you can decide the right strategy for recovery of that data.
I have built a couple of functions that can help you find out the damaged tuples. Let’s see the final result and then allow me to discuss the details:

testdb=> select * from f_find_bad_toast( 'example_toast', 'pk' );


-[ RECORD 1 ]----|-------------------------------------------------------------
total            | 210
ok               | 207
ko               | 3
health_ratio     | 98.57142857142857
damage_ratio     | 1.4285714285714286
description      | Table example_toast has 1.4285714285714286% toast 
                   data damaged (toast relation pg_toast.pg_toast_44175 
                   on disk file [base/24815/85136])
damage_tuple_ids | {110,111,112}

We now have a report that tells us that 3 records out of the 210 total ones have been damaged: the records with pk from 110 to 112 are the ones hit by the data corruption, and therefore their toast data is wrong. The good news is that more than 98% of our table is healthy.

The function f_find_bad_toast accepts the table name and a column that must be unique (and therefore acts as a surrogate primary key). The function inspects every single record in the table and tries to de-toast its data. The final result is that, in this example, every single tuple has been checked and three of them turned out to be corrupted.

The function f_find_bad_toast does the following: 1) performs a few sanity checks, and gets the list of TOASTable attributes of the table; 2) prepares an SQL SELECT query to read every single toastable attribute; 3) converts the toasted columns into text and performs a few operations on that data, so as to force the detoasting; 4) if an exception arises, the function stores the primary key of the tuple to indicate that there is an error on such tuple.

Internally, the function exploits another custom piece of code, f_enumerate_toastable_columns that provides a list of those columns that could have been stored on TOAST. As an example:

testdb=> select * from f_enumerate_toastable_columns( 'example_toast' );
 f_enumerate_toastable_columns 
-------------------------------
 b
 d
(2 rows)

As you can see, only variable-length columns can be stored in the TOAST area.

How `f_find_bad_toast()` works

You can get some hints on the internal working behavior by increasing the debug message level:

testdb=> set client_min_messages to debug;

testdb=> select * from f_find_bad_toast( 'example_toast', 'pk' );
...
DEBUG:  Preparing to de-toast record pk = 161
DEBUG:  Prepared query [SELECT  lower( b::text )  ||  lower( d::text )  FROM example_toast WHERE pk = '161']
DEBUG:  Succesfully executed query [SELECT  lower( b::text )  ||  lower( d::text )  FROM example_toast WHERE pk = '161']
...

As you can see, the function inspects one record at a time (i.e., it can be really slow on large tables!) and builds an appropriate query to de-toast the toastable data. Then the query is executed, the result is placed into a variable and the length of the result is computed; if this succeeds the data has been detoasted, otherwise there was a problem reading the toasted data. In short, the function does:

BEGIN
   EXECUTE query_detoast
   INTO    current_detoasted_data;

  PERFORM  length( current_detoasted_data );
  RAISE DEBUG 'Succesfully executed query [%]', query_detoast;
  ok_counter = ok_counter + 1;
EXCEPTION
  WHEN OTHERS THEN
       ko_counter = ko_counter + 1;
       wrong_tuple_ids = array_append( wrong_tuple_ids, current_pk );
       RAISE NOTICE 'Record with % = % of table % has corrupted toast data!', pk, current_pk, tablez;

END;

The query_detoast is built for every single record, as for instance SELECT lower( b::text ) || lower( d::text ) FROM example_toast WHERE pk = '161'.
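
The same try-and-count approach, including the final report, can be sketched in Python without a database. This is only a model of the loop above: find_bad_toast() and the detoast callable are hypothetical names of mine, with detoast standing in for the execution of query_detoast on a single key.

```python
def find_bad_toast(keys, detoast) -> dict:
    """Try to read every record and collect the keys that fail.

    `keys` is an iterable of primary key values; `detoast` is any
    callable that returns the record content or raises on damaged data.
    """
    ok = ko = 0
    damaged = []
    for key in keys:
        try:
            len(detoast(key))   # force a full read of the content
            ok += 1
        except Exception:
            ko += 1
            damaged.append(key)
    total = ok + ko
    return {
        "total": total,
        "ok": ok,
        "ko": ko,
        "health_ratio": ok / total * 100 if total else 0.0,
        "damage_ratio": ko / total * 100 if total else 0.0,
        "damage_tuple_ids": damaged,
    }
```

Feeding it the keys from 1 to 210 and a detoast callable that fails on keys 110, 111 and 112 gives the same counters as the report shown earlier.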

Arguments

The f_find_bad_toast function accepts four arguments:

the table name;
the surrogate primary key column name;
an optional limit clause, useful when inspecting very large tables;
an optional offset argument, useful when iterating over the same table.

A possible improvement could be to automatically find out a table’s primary key, for example by inspecting the system catalogs.

`f_enumerate_toastable_columns`

The f_enumerate_toastable_columns inspects the system catalogs to find out which attributes can be stored by TOAST. At its core, it returns every item in pg_attribute that has a storage of x (extended) or e (external), meaning that the attribute can be stored outside of the main table.
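
The filter itself is tiny, and can be sketched in Python over a hand-written list of (column name, storage) pairs. The toastable_columns() name is hypothetical, and the pairs below are what I would expect pg_attribute to report for the example_toast table with default storage settings.

```python
def toastable_columns(attributes) -> list:
    """Keep only the columns whose storage allows out-of-line values.

    `attributes` is an iterable of (column name, storage code) pairs,
    where the storage code follows pg_attribute.attstorage:
    p = plain, m = main, x = extended, e = external.
    """
    return [name for name, storage in attributes if storage in ("x", "e")]


# assumed storage codes for example_toast (defaults for each data type)
EXAMPLE_TOAST = [("a", "p"), ("b", "x"), ("c", "p"), ("d", "x"), ("pk", "p")]
```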

Conclusions

The TOAST mechanism is great, but until you detoast the content of your data you may not notice a problem in it. Periodically running tools based on the above functions can help you determine if a problem has arisen. So far I’ve only experienced human-caused damage, so don’t worry about your PostgreSQL cluster as long as nobody disturbs it!

PostgreSQL Literate Programming with GNU Emacs

2021-01-18T00:00:00+00:00

GNU Emacs is great! I can prepare my slides with PostgreSQL snippets of code and results.

PostgreSQL Literate Programming with GNU Emacs

What is literate programming? Literate Programming is a programming paradigm that makes you write a program in a more natural language, interleaving documentation and code together.
GNU Emacs allows literate programming by means of Org Mode and its module Org Babel.
I am already used to Org Mode, and I am already writing my own documentation, slides and papers with this great tool. But Org Babel can do much more for me: as you probably know, I write several articles, papers and presentations for training events, all related to PostgreSQL.
The classical workflow is:

write a slide or piece of document;
execute an SQL statement (e.g. in a terminal);
copy and paste the SQL statement into your slide or document;
copy and paste the result into your slide or document.
One huge problem with the above is that every time you change the initial statement, you have to repeat the process, copying and pasting the results again; this can lead to errors and inconsistencies, and puts on you the burden of keeping the documentation up to date. Moreover, imagine the output of a command changes from one version of PostgreSQL to another: you have to re-run every single command and repeat the copy and paste of the results.
That’s too much!

GNU Emacs being what it is, there’s a much smarter way to do it!

Org Babel to the Rescue!

Org Babel is a module that allows Org Mode to execute a single snippet of code. The code is executed launching external processes, like interpreters (in the case of Perl, Python, etc.), shells or, in the case of our beloved database, psql.
Let’s see an example: imagine writing the documentation for a PostgreSQL transaction as follows:

* An example of transaction

The following is a PostgreSQL explicit transaction:

#+begin_src sql :engine postgresql :dbhost miguel :dbuser luca :database emacsdb
BEGIN;

CREATE TABLE emacs( t text );

INSERT INTO emacs 
SELECT 'Foo' || v
FROM generate_series(1, 10) v;

COMMIT;
#+end_src

and when executed, the system replies with every command feedback:

For now, skip the discussion about the connection parameters, which after all are quite easy to guess.
If you place the cursor within the code block (i.e., at any point from #+begin_src to #+end_src) and hit C-c C-c, Emacs will launch a psql connection to the database to execute the set of SQL statements. In other words, it will be as if you had manually typed the following on a command line:

echo 'BEGIN; CREATE TABLE emacs(t text); ...' |  psql -h miguel -U luca emacsdb 

The end result will be that your document automagically changes to:

* An example of transaction

The following is a PostgreSQL explicit transaction:

#+begin_src sql :engine postgresql :dbhost miguel :dbuser luca :database emacsdb
BEGIN;

CREATE TABLE emacs( t text );

INSERT INTO emacs 
SELECT 'Foo' || v
FROM generate_series(1, 10) v;

COMMIT;
#+end_src

and when executed, the system replies with every command feedback:

#+RESULTS:
| BEGIN        |
|--------------|
| CREATE TABLE |
| INSERT 0 10  |
| COMMIT       |

that in turn, renders to something like the following

Not bad, uh?

Emacs and Org Babel Configuration

Emacs does not usually ship with Org Babel configured for SQL, so you have to place the following into your configuration file:

(org-babel-do-load-languages
 'org-babel-load-languages
 '((sql . t)))

(setq org-confirm-babel-evaluate nil)

The first three lines enable the SQL language, while the last one prevents Emacs from asking for confirmation before running every single snippet of code.

Update the Results

In case you change a snippet of code, you can simply re-issue C-c C-c to update the results accordingly.

Running All

Here comes the most fun part: imagine your documentation or slides include several snippets of code, and you want to update all the code results. Remember, you are in Emacs, and there must be a way to do it. And in fact, you can run C-c C-v b to create and/or update all the result sections.
This is particularly handy for me when I want to update results based on a different version of PostgreSQL.

Connection Parameters

As you have probably guessed, those parameters after the sql tag in the header of the code snippets tell Emacs how to reach the PostgreSQL server. In particular:

dbhost is the remote hostname, with localhost for a local connection;
dbuser is the database username;
dbpasswd is the user password, in clear text (!);
database is the name of the database to connect to.
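
To make the mapping more concrete, here is a Python sketch of how such header arguments could be translated into a psql invocation. It is only my approximation of the idea: the psql_command() function is a hypothetical name, and the command line Org Babel really builds may well differ. The -h, -U and -c options and the PGPASSWORD environment variable are the standard psql and libpq ones.

```python
def psql_command(header_args: dict, sql: str):
    """Translate Org Babel style header arguments into (argv, env).

    Hypothetical helper: it only builds the command, it does not run it.
    """
    argv = ["psql"]
    env = {}
    if "dbhost" in header_args:
        argv += ["-h", header_args["dbhost"]]
    if "dbuser" in header_args:
        argv += ["-U", header_args["dbuser"]]
    if "dbpasswd" in header_args:
        # psql has no password option, libpq reads it from the environment
        env["PGPASSWORD"] = header_args["dbpasswd"]
    argv += ["-c", sql]
    if "database" in header_args:
        argv.append(header_args["database"])
    return argv, env
```

The returned pair could then be handed to something like subprocess.run(argv, env={**os.environ, **env}) to actually execute the statement.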

Do not Repeat Yourself

You don’t have to specify the connection properties on the header of every single piece of code: you can group properties in an Org Mode tree to handle all at once.
Allow me to explain with an example document:

* My experiments

#+begin_src sql :engine postgresql :dbhost miguel :dbuser luca :database emacsdb                                                     
BEGIN;                                                                                                                                  
CREATE TABLE emacs( pk serial, t text );                                                                                                
INSERT INTO emacs(t) SELECT 'Foo' || v                                                                                                  
FROM generate_series(1,10) v;                                                                                                           
COMMIT;                                                                                                                                 
#+end_src                                                                                                                                

#+begin_src sql :engine postgresql :dbhost miguel :dbuser luca :database emacsdb                                                     
SELECT * FROM emacs                                                                                                                     
LIMIT 2;                                                                                                                                
#+end_src

The above can be replaced with a more compact version like:

* My experiments
:PROPERTIES:
:header-args: sql :engine postgresql :dbhost localhost  :dbuser luca  :database emacsdb
:END:

#+begin_src sql 
BEGIN;                                                                                                                                  
CREATE TABLE emacs( pk serial, t text );                                                                                                
INSERT INTO emacs(t) SELECT 'Foo' || v                                                                                                  
FROM generate_series(1,10) v;                                                                                                           
COMMIT;                                                                                                                                 
#+end_src                                                                                                                                

#+begin_src sql 
SELECT * FROM emacs                                                                                                                     
LIMIT 2;                                                                                                                                
#+end_src

It is now possible to change and manage the connection properties in a single place, so that if I, for example, need to change the hostname, I can change it on the header-args line and execute C-c C-v b to get all the required results.

Give me the Shell, Quick!

Org Babel can, of course, execute and evaluate different snippets of code and languages. This allows you to insert into your own documentation not only SQL statements, but also maintenance commands to run through the shell, like service postgresql restart. And you can also execute psql directly as follows:

#+begin_src shell                                                                                                                       
psql -h localhost -U luca -c 'SELECT t FROM emacs LIMIT 2' emacsdb                                                                      
#+end_src                                                                                                                               

#+RESULTS:
| t      |       |
| ------ |       |
| Foo1   |       |
| Foo2   |       |
| (2     | rows) |

Please note that, since in Org Mode a <TAB> is used in conjunction with a table, the output is rendered as a two-column table even if you selected a single column.
Remember that in order to allow Org Babel to evaluate the shell commands you need to enable the shell language in the Emacs configuration, therefore in your .emacs file you must now have something like:

(org-babel-do-load-languages
 'org-babel-load-languages
 '(
   (shell . t)
   (sql . t)
   ) )

Conclusions

Emacs is a great tool! You can improve your PostgreSQL documentation by means of Org Mode and Org Babel.
There is much more to Org Babel, and this is just a quick introduction to let you taste the power of Emacs!

pgenv special keywords: earliest and latest

2021-01-14T00:00:00+00:00

A nice addition to the pgenv PostgreSQL binary manager.

pgenv special keywords: earliest and latest

I recently added support for two different keywords in pgenv: earliest and latest.
The idea is quite simple: instead of having to specify a PostgreSQL version number each time, you can now specify one of the above keywords to jump immediately to the oldest or newest PostgreSQL version you have installed. Of course, the newest PostgreSQL version is the most recent on a version number basis (not installation time), and on the other hand the oldest is the one with the lowest version number among those installed.
Let’s understand the concept with an example:

% pgenv versions
12.1      pgsql-12.1
12.3      pgsql-12.3
12.4      pgsql-12.4
13.0      pgsql-13.0
9.6.20    pgsql-9.6.20

Among the versions installed above, we have that:

9.6.20 is the oldest one, and therefore is mapped to earliest;
13.0 is the newest one, and therefore is mapped to latest.

It is quite easy to demonstrate this by means of use:

% pgenv use earliest

PostgreSQL 9.6.20 started
Logging to /home/luca/git/misc/PostgreSQL/pgenv/pgsql/data/server.log

As you can see, earliest has been resolved to version 9.6.20; on the other hand latest is going to be resolved to 13.0:

% pgenv use latest

PostgreSQL 9.6.20 stopped
PostgreSQL 13.0 started
Logging to /home/luca/git/misc/PostgreSQL/pgenv/pgsql/data/server.log

But that is not all: you can also narrow down the scope of versions to a specific major number. For instance, in the 12 branch we have installed 12.1, 12.3 and 12.4, which means that 12.1 is the oldest version in the 12 branch, while 12.4 is the newest one. You can filter by specifying the major version number after the earliest or latest keyword:

% pgenv use latest 12

PostgreSQL 13.0 stopped
PostgreSQL 12.4 started
Logging to /home/luca/git/misc/PostgreSQL/pgenv/pgsql/data/server.log


% pgenv use earliest 12

PostgreSQL 12.4 stopped
PostgreSQL 12.1 started
Logging to /home/luca/git/misc/PostgreSQL/pgenv/pgsql/data/server.log

Thanks to the addition of earliest and latest it becomes more intuitive and easier to automate pgenv usage, since you don’t have to remember which exact version of PostgreSQL you are referring to.
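
The resolution rule can be sketched in a few lines of Python. Please note that pgenv is a Bash script and this is not its code: the resolve() and version_key() functions are hypothetical names of mine, and the sketch assumes purely numeric version strings (no beta or release candidate suffixes).

```python
def version_key(version: str) -> tuple:
    """Turn '9.6.20' into (9, 6, 20) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(keyword: str, installed, major: str = None) -> str:
    """Resolve 'earliest' or 'latest' among the installed versions.

    If `major` is given (e.g. '12'), only versions in that branch count.
    """
    candidates = list(installed)
    if major is not None:
        prefix = version_key(major)
        candidates = [
            v for v in candidates
            if version_key(v)[:len(prefix)] == prefix
        ]
    if not candidates:
        raise ValueError("no installed version matches")
    if keyword == "earliest":
        return min(candidates, key=version_key)
    if keyword == "latest":
        return max(candidates, key=version_key)
    raise ValueError(f"unknown keyword: {keyword}")
```

Comparing tuples of integers instead of strings is what makes 9.6.20 sort before 12.1, which a plain alphabetical sort would get wrong.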

What about `build`?

Thanks to this commit, it is now possible to issue a build command using the same special keywords as above.
As an example, specifying pgenv build latest 13 will install the latest available version in the 13 major release, while pgenv build latest will install the very latest available version among all. The keyword earliest works the opposite way, although I believe that building the very oldest PostgreSQL version could be a good way to have fun!

krunner and PostgreSQL Documentation Search

2021-01-10T00:00:00+00:00

How to search directly into the PostgreSQL documentation from your Plasma desktop.

krunner and PostgreSQL Documentation Search

If you, like me, are addicted to Plasma, the KDE desktop, you probably already know about krunner, an application launcher on steroids.
krunner allows you to quickly launch, kill, switch to and manage applications, as well as execute computations and, most notably, web searches. In fact, krunner exploits the Konqueror shortcuts for web searches. Konqueror is the default web browser for KDE/Plasma (since KDE version 2), and allows for a quick customization of shortcuts that enable you to redirect a string thru a search engine. As an example, by default Konqueror has the dd and the gg shortcuts: the former enables the search of the remaining part of the string thru DuckDuckGo, the latter thru Google.
So, what does it take to get krunner integrated with the PostgreSQL official documentation search engine?
There is not much work to do, after all, and in fact it suffices to: 1) create a new Konqueror shortcut; 2) no, there are no other steps involved!
The good news is that you can configure whatever you want from the krunner interface itself.

Configure krunner

First of all, launch krunner by hitting ALT + F2 or ALT + <space>, then click on the setup icon on the left of the bar.

In the dialog window, scroll to the Web Shortcuts line and click on the configure icon.

In the opened dialog, after having searched for the key sequence you want to insert, click on the New button to create a new shortcut.

Fill the dialog as you find appropriate, but with regard to the Shortcut URL place https://www.postgresql.org/search/?q= and then hit the button on the right to insert the query placeholder (\{@}), so that the end result is https://www.postgresql.org/search/?q=\{@}.
Place one or more shortcuts in the Shortcuts entry, separated by commas, for example pg, then postgres and last postgresql, so that you will be able to trigger a search with a short or common character sequence.

Apply the changes and get ready for your PostgreSQL related queries.

It is now time to test the searching shortcut:

launch krunner by hitting ALT + F2 or ALT + <space>;
enter pg: to activate the search engine;
insert a PostgreSQL string and press <enter>.

and the result will pop up in your default web browser (which does not have to be Konqueror!).
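What the web shortcut does with your input can be sketched in a few lines of Python. This is only an illustration of the substitution, not KDE code: the function name expand_shortcut is hypothetical, and the way the query gets URL-encoded (spaces turned into +) is an assumption.

```python
from urllib.parse import quote_plus

# Shortcut URL and keys as configured in the dialog described above.
SHORTCUT_URL = "https://www.postgresql.org/search/?q=\\{@}"
SHORTCUT_KEYS = ("pg", "postgres", "postgresql")


def expand_shortcut(typed):
    """Return the search URL for input like 'pg:create index', or None."""
    key, separator, query = typed.partition(":")
    if not separator or key not in SHORTCUT_KEYS:
        return None
    # Replace the \{@} placeholder with the URL-encoded query string.
    return SHORTCUT_URL.replace("\\{@}", quote_plus(query.strip()))


print(expand_shortcut("pg:create index"))
# https://www.postgresql.org/search/?q=create+index
```

Input that does not start with one of the configured keys is simply not handled by this shortcut, which is why the sketch returns None in that case.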

Konqueror and Web Shortcuts

As already written, krunner exploits the Konqueror Web Shortcuts, and in fact I wrote an article (in Italian) back in 2008 about the configuration of Konqueror to access the PostgreSQL documentation. I also asked for that article to appear on the ITPUG official web site, without any success, but this is another story.

Conclusions

krunner is an amazing piece of software that I use every day, to the extent that I no longer use a lot of icons to start applications and tasks, but simply pass a few characters to krunner and let it do the heavy work for me.
Being able to integrate the PostgreSQL documentation search into krunner represents a huge advantage for every PostgreSQL and Plasma user.

Firefox and PostgreSQL Documentation Search

2021-01-10T00:00:00+00:00

How to search directly within the PostgreSQL documentation from your Firefox web browser.

Firefox and PostgreSQL Documentation Search

The Firefox web browser supports several search engines, that is, extensions by means of which you can type a search string and have it passed to a specific site.
It is possible to customize Firefox to search for a particular string within the PostgreSQL official documentation: the idea is to instruct the web browser to redirect the search thru the PostgreSQL web site via a GET URL.
In order to achieve this, you need to install a customizable search engine, and then configure the shortcuts for enabling the web engine access.

Custom Search Engine Setup

The first step consists of installing the Custom Search Engine in your Firefox web browser.
Then, clicking on the main Firefox menu (the hamburger icon), select the Add-ons entry and then go to the extensions menu: you should see the new search engine there. Check that the engine is active, then click on the three dots button and select Preferences.

In the opened screen, edit a line to add the following details:

key, I use pg as the default prefix to indicate I’m going to specify a PostgreSQL documentation search;
Search Engine Name, set to PostgreSQL or any name that makes sense to you;
URL, you have to set it to https://www.postgresql.org/search/?q={searchTerms}, where {searchTerms} is going to be replaced by Firefox with the search keywords;
Description, whatever makes sense to you, for example PostgreSQL Official Documentation.

As you can imagine, the important parts are the key and the URL. Note that you can also target specific PostgreSQL versions by changing the URL to include a version number: do a few searches on the official web site and inspect the resulting URL for other arguments.
Once you are done, click on the Save Preferences button and then close the tab.

Searching into the documentation

With the engine in place, you can search within the PostgreSQL documentation by typing, in order:

ms to activate the custom search engine;
pg to activate the PostgreSQL documentation search engine (this is the key specified above);
any keyword you need to search for in the documentation.

As an example, imagine we want to search for the CREATE INDEX statement documentation; we need to enter:

ms pg create index

and, after pressing enter, the search will go thru the PostgreSQL documentation web site.

Conclusions

I personally don’t like very much the way Firefox allows for search customization: having to type one shortcut to activate the search engine and another one to specialize the search seems too much work to me. However, it can be useful when you live in Firefox and want to quickly search for a PostgreSQL tip!

Single User Mode and -P flag

2021-01-03T00:00:00+00:00

How to allow corrupted catalogs repair.

Single User Mode and -P flag

It could happen that you can no longer connect to a database because of an error on a catalog.
PostgreSQL is rock-solid, so this usually does not happen, but in the case of disk corruption (or sometimes because of poor human behavior), the system may be unable to connect to the database because the system catalogs for that database are damaged.

When the catalogs have been corrupted at the index level, there is a chance to get back your database (and data) by rebuilding the system catalog indexes. In fact, the REINDEX command supports the SYSTEM option that, as the name suggests, performs a reindex at the system level, that is against the database catalogs.

There is however a chicken and egg problem: you can reindex only the catalogs of the database you are connected to, and if you cannot connect to that database because of an index corruption, what can you do?
Luckily postgres (the process) allows for a -P flag that Prevents the system catalog indexes from being used when reading the catalogs:

 -P
   Ignore system indexes when reading system tables, but still update
   the indexes when modifying the tables. This is useful when
   recovering from damaged system indexes.

Therefore the recovery can be achieved following these steps:

shut down the cluster and restart it in single user mode (see my article about it);
start a backend process ignoring the system indexes, such as
```
postgres --single -P -D /your/own/pgdata your_faulty_database
```
where your_faulty_database is the damaged database;
issue a full system reindex with `REINDEX SYSTEM your_faulty_database`;
restart the cluster in multi-user mode and try to connect to the faulty database.
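The steps above can be summarized as a dry run that only prints the commands, in order. This is a sketch under some assumptions: the use of pg_ctl to stop and start the cluster and the echo pipe that feeds the statement to the single user backend are my choices for illustration, and recovery_plan is a hypothetical helper name. Adapt the data directory and the database name before running anything for real.

```python
import shlex


def recovery_plan(pgdata, database):
    """Return the shell commands for the recovery steps, in order (dry run)."""
    reindex = "REINDEX SYSTEM " + database + ";"
    single_user = ["postgres", "--single", "-P", "-D", pgdata, database]
    return [
        # Stop the running cluster, so that no postmaster holds the data directory.
        shlex.join(["pg_ctl", "stop", "-D", pgdata]),
        # Single user backend that ignores system indexes, fed with the REINDEX.
        "echo " + shlex.quote(reindex) + " | " + shlex.join(single_user),
        # Back to normal multi-user operation.
        shlex.join(["pg_ctl", "start", "-D", pgdata]),
    ]


for command in recovery_plan("/your/own/pgdata", "your_faulty_database"):
    print(command)
```

Printing the plan instead of executing it is deliberate: on a damaged cluster you want to review every command before it touches the data directory.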

Why is it important to start the cluster in single user mode, therefore tearing down any other database and process? Well, PostgreSQL is smart enough to prevent you from starting a backend directly against a data directory that is already in use, that is, any postgres process checks for the presence of a running postmaster:

% sudo -u postgres postgres -D /postgres/12 -P
FATAL:  lock file "postmaster.pid" already exists
HINT:  Is another postmaster (PID 14355) running in data directory "/postgres/12"?

The conclusion is: data corruption is a story apart and cannot always be easily fixed, but often PostgreSQL provides all the tools you need to recover.
And of course, once you have recovered, you should take all precautions to back up, verify and test your data!

ITPUG has a new board of directors

2020-11-09T00:00:00+00:00

Something is changing in ITPUG?

ITPUG has a new board of directors

I have recently been contacted by a friend of mine claiming that the ITPUG (Italian PostgreSQL Users’ Group) board of directors has changed, and that’s true as you can see from the association page.
Having been a member of ITPUG from its very conception to the end of 2016, I know pretty much every member of the fresh board of directors, and I hope to see a jump forward in the management of the association.

AFAIK, there are members that want to make ITPUG much more user-friendly.
And there are members who deserve it!

Good luck!

Learn PostgreSQL - a new book

2020-10-28T00:00:00+00:00

My latest book talking about PostgreSQL

Learn PostgreSQL - a new book

I’m really excited to introduce you to a new PostgreSQL book entitled Learn PostgreSQL, written by myself and my good friend Enrico.

The book covers PostgreSQL 12 and 13 and contains 20 chapters explaining how to install a cluster, how to set up the first connections, basic usage of SQL and data types, and the administration of a cluster including role management, statistics and performance, log analysis and tuning, with an eye on backups. The last section of the book is dedicated to replication, both physical and logical.

It has been hard work to complete the book, and I would like to thank Enrico for having written it with me. We are really proud of it and believe it delivers good content and a complete presentation of our beloved database.

A quick book outline is the following one:

Part 1

Introduction to PostgreSQL
Getting to know your cluster
Managing Users and Connections

Part 2

Basic Statements
Advanced Statements
Window Functions
Server Side Programming
Triggers and Rules
Partitioning

Part 3

Users, Roles and Database Security
Transactions, MVCC, WALs and Checkpoints
Extending the database: the Extension ecosystem
Indexes and Performance Optimization
Logging and Auditing
Backup and Restore
Configuration and Monitoring

Part 4

Physical Replication
Logical Replication

Part 5

Useful tools and useful extensions
Towards PostgreSQL 13

There is of course much more to describe about the book, but I’m still recovering from eye surgery, so I will come back in the following weeks to discuss the book.
In the meantime you can have a look at the official repository hosting the code examples.

Update and Fix Typos

I’m sorry, in the original post I misspelled the title and did not place the right image and link.
This is what happens when you try to edit a post and your eyes have not recovered yet from the last surgery!
This is also why I recorded the video in the first place!

Update 2 (2020-11-02)

Hans-Jürgen Schönig also wrote to me to point out the misspelling in the title; apparently Planet PostgreSQL is not getting the update of my post.

Hey there! I'm using PostgreSQL!

2020-09-04T00:00:00+00:00

A little contribution in spreading the PostgreSQL word!

Hey there! I’m using PostgreSQL!

A few weeks ago I changed my old mobile phone, and so I had to reinstall all my applications, including something I personally hate: WhatsApp.
While checking the configuration of the application, correctly and automatically cloned from my old phone, I came across the standard status that WhatsApp places for you.

The standard phrase is Hey there! I'm using WhatsApp!.
I hate these automatically placed sentences, so I was trying to think about something different, and then I decided that I did not want something different because, after all, I don’t think many people spend their time reading your status.
And then, I decided to let the world know I’m using PostgreSQL.

It’s not a very big contribution, but it is a quick and easy way to let the world know about PostgreSQL.
If you like PostgreSQL and the idea, please update your status too!

pgenv: get to know your logs

2020-08-28T00:00:00+00:00

I’ve added a couple of very minimalistic features to pgenv.

pgenv: get to know your logs

These days a work of mine, related to PostgreSQL, is going to be tested. One quick way to get a fully functional PostgreSQL instance is to use pgenv.
However, one user asked me how to quickly find out why pgenv was unable to start his own cluster.

Do your homework and read the logs! is the correct answer to the problem.
Then you realize that part of your aim is to help people embrace the technology, so why should pgenv not try to teach the user to do so?

And here are two very small and ridiculous features that could help some users get used to the basis of every problem solving, especially with PostgreSQL.

A quick look at the logs when things go wrong

The first problem is that when the cluster does not start, for any reason, pgenv correctly tells you to examine the logs.
End of the story.
That means that you have to dig into your logs with your own favourite tool, even if you are an experienced database and system administrator. I’m lazy, so let pgenv provide me a quick hint:

% pgenv start

PostgreSQL 12.1 NOT started, examine logs in /home/luca/git/misc/PostgreSQL/pgenv/pgsql/data/server.log

Following are the last 5 lines of the log, as a quick hint:
2020-08-28 03:29:39.343 CEST [13046] LOG:  could not bind IPv4 address "127.0.0.1": Address already in use
2020-08-28 03:29:39.343 CEST [13046] HINT:  Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
2020-08-28 03:29:39.343 CEST [13046] WARNING:  could not create listen socket for "localhost"
2020-08-28 03:29:39.343 CEST [13046] FATAL:  could not create any TCP/IP sockets
2020-08-28 03:29:39.343 CEST [13046] LOG:  database system is shut down

It is that simple: if something goes wrong, pgenv shows me the last bunch of lines of the logs. If I’m lucky, I will see the problem without having to manually type another command to dig into the logs (in the above, another cluster or process is holding the TCP/IP port 5432).
There is no black magic here: tail is used to the rescue!
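The same "last few lines" hint can be sketched in Python, for those who want to see the idea without the shell. pgenv itself relies on the tail command; the function last_lines below is a hypothetical stand-in, and the demo uses a throwaway file in place of server.log.

```python
import os
import tempfile
from collections import deque


def last_lines(path, count=5):
    """Return the last `count` lines of a text file, like `tail -n count`."""
    with open(path, encoding="utf-8", errors="replace") as log:
        # A bounded deque keeps only the most recent `count` lines in memory.
        return [line.rstrip("\n") for line in deque(log, maxlen=count)]


# Demo on a throwaway file standing in for server.log.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as demo:
    demo.write("\n".join("line %d" % n for n in range(1, 11)) + "\n")
hint = last_lines(demo.name, count=3)
os.remove(demo.name)

print(hint)  # ['line 8', 'line 9', 'line 10']
```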

Show me my logs

What if you don’t remember where pgenv is storing your logs and want to see them, to mail them or to ask for help?
Here comes the new log command, that in turn invokes tail on the logs (assuming there is one log!). The beauty of using tail is that it becomes very simple to support every other flag tail supports, therefore allowing “complex” log analysis.
So, in the case you want your logs:

% pgenv log
Dumping the content of /home/luca/git/misc/PostgreSQL/pgenv/pgsql/data/server.log 

2020-08-28 03:29:19.903 CEST [11867] LOG:  aborting any active transactions
2020-08-28 03:29:19.905 CEST [11867] LOG:  background worker "logical replication launcher" (PID 11874) exited with exit code 1
2020-08-28 03:29:19.906 CEST [11869] LOG:  shutting down
2020-08-28 03:29:19.922 CEST [11867] LOG:  database system is shut down
2020-08-28 03:29:39.342 CEST [13046] LOG:  starting PostgreSQL 12.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0, 64-bit
2020-08-28 03:29:39.343 CEST [13046] LOG:  could not bind IPv4 address "127.0.0.1": Address already in use
2020-08-28 03:29:39.343 CEST [13046] HINT:  Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
2020-08-28 03:29:39.343 CEST [13046] WARNING:  could not create listen socket for "localhost"
2020-08-28 03:29:39.343 CEST [13046] FATAL:  could not create any TCP/IP sockets
2020-08-28 03:29:39.343 CEST [13046] LOG:  database system is shut down

and in the case you want something different:

% pgenv log -n 3 -f           
Dumping the content of /home/luca/git/misc/PostgreSQL/pgenv/pgsql/data/server.log 

2020-08-28 03:29:39.343 CEST [13046] WARNING:  could not create listen socket for "localhost"
2020-08-28 03:29:39.343 CEST [13046] FATAL:  could not create any TCP/IP sockets
2020-08-28 03:29:39.343 CEST [13046] LOG:  database system is shut down

which prints the last three lines and then waits for new log entries to be displayed.

Conclusions

The new pgenv functionalities are just toys, but I hope they can help people approaching this project, which in turn can really help to get a cluster up and running.

Who needs comments?

2020-08-18T00:00:00+00:00

Who cares about comments? Because you can certainly read your database schema, right?

Who needs comments?

My friend and colleague Enrico told me about one of those hidden features of pg_dump: --no-comments.

The option allows you to dump the database (or part of it) without dumping any user defined comment, that is no comment on tables, data types, or anything else you placed with an explicit COMMENT ON statement.
This made me laugh at first: why should I not want comments in my dump? Are we still back in the nineties, when people thought that hiding information was a good strategy to secure their job?
However, there are some cases I can think of where you don’t want comments. For example, some extensions use comments on objects to perform some magic, and pgaudit comes to mind. But you do not always need to replicate the same configuration on another database, hence you may want to strip off the comments.

How `pg_dump` avoids comments

Having a quick look at the pg_dump source code, the function dumpTableComment represents a good introduction to how the comments are dumped or not. In particular, at the very beginning of the function, you can find something like:

/* do nothing, if --no-comments is supplied */
if (dopt->no_comments)
	return;

/* Comments are SCHEMA not data */
if (dopt->dataOnly)
	return;

/* Search for comments associated with relation, using table */
ncomments = findComments(fout,
						 tbinfo->dobj.catId.tableoid,
						 tbinfo->dobj.catId.oid,
						 &comments);

If the --no-comments command line option is set (i.e., dopt->no_comments is true), the function returns immediately since there is nothing to do.
Interestingly, if the user wants to dump only the data of the database, and not its schema, the comments are not dumped either. That’s quite obvious if you think about it.
The findComments function is in charge of going to the storage to retrieve the comments, and it does so in a strange way. It invokes, in turn, collectComments, that executes a query like the following one:

appendPQExpBufferStr(query, "SELECT description, classoid, objoid, objsubid "
						 "FROM pg_catalog.pg_description "
						 "ORDER BY classoid, objoid, objsubid");

Do you see something strange there? There is no WHERE clause in the query! It means that the function is going to get all the comments for all the objects in the database, as also reported by the comment preamble of the function:

/*
 * collectComments --
 *
 * Construct a table of all comments available for database objects.
 * We used to do per-object queries for the comments, but it's much faster
 * to pull them all over at once, and on most databases the memory cost
 * isn't high.
 *
 * The table is sorted by classoid/objid/objsubid for speed in lookup.
 */

The idea is that all the comments are retrieved on a single pass, and then findComments performs a kind of binary search to find out the exact range of comments that match the object it is dumping at that moment (i.e., the table).
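The lookup strategy can be illustrated with a small Python sketch. It is only an analogue of the idea, not a translation of the C code in pg_dump: the rows are made up, the function name find_comments is reused just for readability, and Python's bisect module stands in for the binary search.

```python
from bisect import bisect_left

# Hypothetical rows as (classoid, objoid, objsubid, description),
# already sorted like the ORDER BY of the query above.
comments = [
    (1247, 16400, 0, "a custom type"),
    (1259, 16384, 0, "table foo"),
    (1259, 16384, 1, "column foo.i"),
    (1259, 16390, 0, "table bar"),
]


def find_comments(comments, classoid, objoid):
    """Return the contiguous range of comments attached to one object."""
    # First row whose (classoid, objoid) is not smaller than the target.
    first = bisect_left(comments, (classoid, objoid))
    # First row that belongs to the next object.
    last = bisect_left(comments, (classoid, objoid + 1))
    return comments[first:last]


print(find_comments(comments, 1259, 16384))
print(find_comments(comments, 1259, 99999))  # no comments: empty list
```

Because the table is fetched once and kept sorted, every lookup costs a couple of binary searches in memory instead of a round trip to the server.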

PostgreSQL 13 Explain now includes WAL information

2020-07-27T00:00:00+00:00

The upcoming version of PostgreSQL now includes new information in the EXPLAIN output.

PostgreSQL 13 Explain now includes WAL information

The upcoming PostgreSQL 13 includes a lot of new features, as is a well-established habit with every release. One interesting feature among the others is that EXPLAIN now supports a new WAL option (that requires ANALYZE to be set).
This new WAL feature allows EXPLAIN to provide information about the generated amount of WAL traffic. It is quite simple to see it in action:

testdb=> CREATE TABLE foo( i int generated always as identity, t text );

testdb=> EXPLAIN ( ANALYZE, WAL, FORMAT yaml ) 
         INSERT INTO foo( t )
         SELECT  md5( v::text )
         FROM generate_series( 1, 300000 ) v;  
         
                QUERY PLAN                
------------------------------------------
 - Plan:                                 +
     Node Type: "ModifyTable"            +
     Operation: "Insert"                 +
     Parallel Aware: false               +
     Relation Name: "foo"                +
     Alias: "foo"                        +
     Startup Cost: 0.00                  +
     Total Cost: 6000.00                 +
     Plan Rows: 300000                   +
     Plan Width: 36                      +
     Actual Startup Time: 508.168        +
     Actual Total Time: 508.168          +
     Actual Rows: 0                      +
     Actual Loops: 1                     +
     WAL Records: 309091                 +
     WAL FPI: 0                          +
     WAL Bytes: 28500009                 +
     ...

As you can see, the output of EXPLAIN now includes three new entries:

WAL Records, as the name suggests, is the number of WAL records inserted into the logs;
WAL FPI is the number of Full Page Images inserted into the WALs;
WAL Bytes is the amount of traffic, in bytes, generated towards the WALs.

The number of WAL records does not necessarily match the number of tuples inserted by the query, but it is equal or greater. You can check this with a small number of inserts:

testdb=> EXPLAIN ( ANALYZE, WAL, FORMAT yaml )
         INSERT INTO foo( t )
         SELECT md5( v::text )
         FROM generate_series( 1, 3 ) v;

                QUERY PLAN                
------------------------------------------
 - Plan:                                 +
     Node Type: "ModifyTable"            +
     Operation: "Insert"                 +
     ...
     WAL Records: 3                      +
     WAL FPI: 0                          +
     WAL Bytes: 276                      +
     ...

I think this can help in understanding the amount of traffic passing thru the WALs, and therefore in tuning the checkpoint related settings properly, even in a more aggressive way.
auto_explain supports dumping WAL information too, via the special configuration parameter auto_explain.log_wal.
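The figures of the two plans above can be compared with a little arithmetic. The following Python sketch only reuses the numbers printed by EXPLAIN in this post; the helper name wal_summary is made up for illustration.

```python
def wal_summary(rows, wal_records, wal_bytes):
    """Return (WAL records per inserted row, average bytes per WAL record)."""
    return wal_records / rows, wal_bytes / wal_records


# Numbers taken from the two EXPLAIN outputs shown above.
bulk = wal_summary(rows=300000, wal_records=309091, wal_bytes=28500009)
small = wal_summary(rows=3, wal_records=3, wal_bytes=276)

print("bulk:  %.3f records/row, %.1f bytes/record" % bulk)
print("small: %.3f records/row, %.1f bytes/record" % small)
```

The small insert produces exactly one record per row, while the bulk insert produces about 3% more records than rows; in both cases a record weighs around 92 bytes on average.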

PostgreSQL 13 Beta 2: it's your time to help testing!

2020-07-13T00:00:00+00:00

We are approaching the great 13 release, help the team testing it!

PostgreSQL 13 Beta 2: it’s your time to help testing!

We are approaching very quickly (and on time) the PostgreSQL 13 release, and we all can help testing it to provide feedback and get ready for the next version.
As I’ve often written, a very easy approach to install and test the new version alongside the version you are using (but not in production!) is by means of pgenv.
The only thing you have to do is pgenv build 13beta2, or if you are more curious:

luca@miguel ~ % pgenv available 13
             Available PostgreSQL Versions
========================================================

                     PostgreSQL 13
    ------------------------------------------------
     13beta1  13beta2 

luca@miguel ~ % pgenv build 13beta2
...
PostgreSQL, contrib, and documentation installation complete.
pgenv configuration written to file /home/luca/git/pgenv/.pgenv.13beta2.conf
PostgreSQL 13beta2 built

Once the system has been compiled, you can start it and use it:

luca@miguel ~ % pgenv use 13beta2           

WARNING:
  your PATH enrvironemnt variable does not seem to include

       /home/luca/git/pgenv/pgsql/bin

  as an entry. You will not be able to use the currently
  selected PostgreSQL binaries.

HINT:
  adjust your PATH variable to include

  /home/luca/git/pgenv/pgsql/bin

  for instance

  export PATH=/home/luca/git/pgenv/pgsql/bin:$PATH

Already using PostgreSQL 13beta2
waiting for server to start.... done
server started
PostgreSQL 13beta2 started
Logging to /home/luca/git/pgenv/pgsql/data/server.log

As the pgenv output suggests, it is better to modify your PATH to get the new executables:

luca@miguel ~ % export PATH=/home/luca/git/pgenv/pgsql/bin:$PATH

and you can make this a permanent change in your shell configuration.

Now, let’s connect to the cluster:

luca@miguel ~ % psql -U postgres template1 -c 'SHOW server_version;'
 server_version 
----------------
 13beta2
(1 row)

luca@miguel ~ % psql -U postgres template1 -c 'SELECT version();'   
                                                  version                                                   
------------------------------------------------------------------------------------------------------------
 PostgreSQL 13beta2 on x86_64-unknown-freebsd12.1, compiled by gcc (FreeBSD Ports Collection) 9.2.0, 64-bit
(1 row)

Happy testing!

replace vs regexp_replace

2020-07-09T00:00:00+00:00

Some considerations about the usage of replace or regexp_replace.

replace vs regexp_replace

While trying to help Stefan Stefanov with his pg_spreadsheetml I came across something that should have been obvious, but was not so obvious to me.
The obvious thing is that replace is generally faster than regexp_replace.
The fact is that, probably due to my heavy usage of Perl and Raku, I tend to use regular expressions even where they are not really required, and that is why I tried to change a nested invocation of replace into one of regexp_replace. The pull request, and in particular the commit, transformed something like:

replace(replace(replace(s, '&', '&amp;'), '>', '&gt;'), '<', '&lt;');

into something like

regexp_replace( regexp_replace( regexp_replace( s, '&', '&amp;', 'g' )
                                , '>'
                                , '&gt;'
                                , 'g' )
                            , '<'
                            , '&lt;'
                            , 'g' );

Now, despite the newlines, the usage of regexp_replace resulted in slower code. So we decided to benchmark, and I decided in particular to test it with pgbench.

Testing with `pgbench`

I created three SQL scripts that essentially do the following:

loop from 1 to the :scale;
build a single XML piece of code with a slightly different content to avoid caching;
perform the substitution in three different ways
- with replace
- with regexp_replace
- with regexp_replace and backreferences
store the results with timing (clock_timestamp()) into a table for later analysis.

I ran the tests in a way similar to the following:

% pgbench -s 300000 -f benchmark_regexp_replace_compact.sql -U luca testdb

and at the end I queried the results, grouped by the type of test.

Results

Getting the results is quite straightforward, and on my PostgreSQL 12.2 I got:

testdb=> SELECT replacement_type, avg( ms ), min( ms ), max( ms ) 
         FROM benchmark_replace GROUP BY replacement_type;
         
    replacement_type    |          avg           | min |   max    
------------------------|------------------------|-----|----------
 regexp_replace         | 2.0656612333436503e-05 |   0 | 0.039055
 regexp_replace_compact | 0.00018001079899881362 |   0 |  0.06716
 replace                |  4.885953333294914e-06 |   0 | 0.027875
(3 rows)

that clearly shows how replace is roughly four times faster, on average, than regexp_replace, which in turn is roughly nine times faster than a regexp_replace with backreferences, as you could expect (even if I was hoping for a smaller difference due to the lower number of invocations of the function).
It is also interesting that the maximum times grow much less: each one is less than twice the maximum of the previous best case.
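The ratios between the averages can be checked directly from the table above; this small Python sketch just divides the reported values, nothing more.

```python
# Average timings as reported by the query above.
averages = {
    "replace": 4.885953333294914e-06,
    "regexp_replace": 2.0656612333436503e-05,
    "regexp_replace_compact": 0.00018001079899881362,
}

# How much slower each variant is compared with the previous, simpler one.
regexp_vs_replace = averages["regexp_replace"] / averages["replace"]
compact_vs_regexp = averages["regexp_replace_compact"] / averages["regexp_replace"]

print(round(regexp_vs_replace, 1))  # 4.2
print(round(compact_vs_regexp, 1))  # 8.7
```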

Conclusions

Even if the presented approach cannot be considered a rigorous benchmark, it does emphasize how important it is to use the simplest function available for the task, in this case replace when you don’t need any regular expression magic.

PostgreSQL 12 Coin (by PGUS)

2020-07-09T00:00:00+00:00

A very nice surprise in the mail!

PostgreSQL 12 Coin (by PGUS)

A couple of weeks ago I got a very nice surprise in the mail, but I was not able to write about it due to my eye problems and current situation (a few details can be found here).
Anyway, promoting PostgreSQL is important, so here I am to tell you about what I received.

Long story short: I received a PostgreSQL 12 commemorative coin!

The envelope

First of all, I clearly recognized the name on the envelope: Mark Wong is the treasurer of the PostgreSQL US organization, namely PGUS.

The content

The content of the envelope was an amazing PostgreSQL 12 coin and a couple of copies of the PostgreSQL 12 press kit. I took a picture of the coin near a pen, to give you an idea of its size.

The mission

PostgreSQL has been an important part of my life so far.
I’m not a developer, but I started using it and being productive with it at work.
Then I organized conferences and seminars, and co-founded the Italian PostgreSQL Users’ Group (ITPUG), from which I literally escaped in 2016 due to clashes with the management.
Today I’m a PostgreSQL consultant.
Therefore, I can say that PostgreSQL has always been a part, even if sometimes marginal, of my working career. But this is not my mission, nor is that of the community.
Our mission, as volunteers, is to improve PostgreSQL according to our capabilities and to spread the word, to let other professional and passionate people embrace this database and get inspired by it.

This coin, like other promotional material, is an important sign in pursuing our mission, and reminds me (and us) how important it is to make PostgreSQL a famous product even with non-technical stuff.

2019-10-03

It is interesting to note that in the days right after the release of PostgreSQL 12 I was in Rome, teaching a professional course on PostgreSQL (of course!). If my memory serves me well, it was October the 7th.

PostgreSQL 13

No, it’s not a typo: PostgreSQL 13 is almost here, but I’m equally glad to have this piece of art.

Conclusions

I would like to thank PGUS for sending me the coin, that I take both as a reminder of how important it is to promote PostgreSQL and as a sign that I may have done my part of the mission and got awarded for it, even if the award could be completely made up in my mind (so please let me believe I’m right!).

This being a very hard time of my life, due to my eyes, it is somehow a relief knowing, or better being reminded, that we are all part of an amazing community.

ORA-2449 and the Constraint Dependencies

2020-06-18T00:00:00+00:00

What happens if you try to drop a table that is referenced by another table?

ORA-2449 and the Constraint Dependencies

Oracle clients seem somehow a little goofy when you have to deal with dependencies.
Imagine you have two tables, with table a that references table b; you can generate the tables as follows:

SQL> CREATE TABLE a( pk int PRIMARY KEY );

Table created.

SQL> CREATE TABLE b( pk int PRIMARY KEY );

Table created.

SQL> ALTER TABLE a ADD( b_ref int REFERENCES b(pk) );

Table altered.

SQL> COMMIT;

Commit complete.

As you can see, the tables are empty: there is no actual data, but there is a clear reference made by the foreign key b_ref that connects table a to table b.
So far, so good!
Now, let’s try to drop table b, on which a depends:

SQL> DROP TABLE b;
DROP TABLE b
           *
ERROR at line 1:
ORA-02449: unique/primary keys in table referenced by foreign keys

Great! Oracle, as expected, is telling us that we cannot drop the referenced table unless we remove the dependency from the dependent object.
However, please note that Oracle is not telling us which dependency is preventing us from dropping the table!
The situation is pretty much the same if you use the SQL Developer client:

Please note how the SQL Developer warning dialog even suggests that we execute a query against the Oracle catalogs to see which constraints are making the DROP fail. Moreover, Oracle SQL Developer is so lazy that it does not even complete the query for us: instead of placing the table name in the query and presenting us with a copy-and-paste ready statement, it tells us to execute something like SELECT ... WHERE TABLE_NAME = 'tabname'.
I’m pretty sure someone has, at least once, executed a query searching for a table named tabname!

Why is Oracle not giving us a hint about the constraints?

Being used to PostgreSQL, I would say that naming the constraint should be the correct behavior. In fact, if you try this in PostgreSQL you get a clear message about which constraint is preventing you from dropping the table:

testdb=> CREATE TABLE a( pk serial primary key );
CREATE TABLE
testdb=> CREATE TABLE b( pk serial primary key );
CREATE TABLE
testdb=> alter table a add b_ref int references b(pk);
ALTER TABLE
testdb=> drop table b;
ERROR:  cannot drop table b because other objects depend on it
DETAIL:  constraint a_b_ref_fkey on table a depends on table b
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

In particular, the message constraint a_b_ref_fkey on table a depends on table b gives us a really clear explanation of what we should search for to fix the “problem”.
Moreover, PostgreSQL reminds us that, if we want to quickly get rid of the table, we can use the DROP ... CASCADE statement to force PostgreSQL to take action.

PostgreSQL 11 Server Side Programming Errata Corrige

2020-06-17T00:00:00+00:00

A reader provided feedback about a wrong listing.

PostgreSQL 11 Server Side Programming Errata Corrige

I have already written about how my first book on PostgreSQL, named PostgreSQL 11 Server Side Programming Quick Start Guide, gained more attention.

Gaining attention also means that readers can find problems and errors, and this is good (to me)!

The first problem that has been reported to me is described here, so that if you are reading the book you can better understand and deal with it.

Listing 8 on Chapter 3

The Listing 8 in chapter 3 is wrong, and in particular it is the very same listing as Listing 13 later in the chapter. The problem is that the shown listing 8 does not include a variable, namely file_type, that is referenced in the text.
Therefore, if you are dealing with that particular example, please consider that the right listing is reported on the official GitHub repository.

I’m really sorry about the misplaced listing; I hope this helps make the book more readable.

Running pgbackrest on FreeBSD

2020-06-12T00:00:00+00:00

pgbackrest is an amazing backup solution for PostgreSQL, quite frankly it is my favourite. And now it fully supports FreeBSD too!

Running pgbackrest on FreeBSD

pgbackrest is an amazing tool for backup and recovery of a PostgreSQL database. Quite frankly, it is my favourite backup solution because it is reliable, fast and supports a lot of interesting features including retention policies and encryption.
I have already written about some problems in running pgbackrest on FreeBSD, and the problems were not related to the application itself, but rather to the compilation process.
I’m really glad that now pgbackrest fully supports non-Linux platforms, including FreeBSD, thanks to the changes in the compilation approach. It is therefore a simple process to get pgbackrest installed on your FreeBSD machine!

Installing pgbackrest on FreeBSD

In order to see how simple it is now to install pgbackrest on FreeBSD, let’s download the latest stable release, the 2.27 one, and install it. The only caveat is that the project needs to be compiled with GNU make, which means you have to type gmake instead of the usual make:

% wget https://github.com/pgbackrest/pgbackrest/archive/release/2.27.tar.gz
% tar xzvf 2.27.tar.gz
% cd pgbackrest-release-2.27

% cd src
% ./configure --prefix=/usr/local/pgbackrest
% gmake
% sudo gmake install

I’ve decided to install it in a specific path, /usr/local/pgbackrest, just to avoid messing with other binaries, but you can install it in the default FreeBSD location /usr/local/. If everything was successful, you can then proceed to test the program:

% export PATH=/usr/local/pgbackrest/bin:$PATH

% pgbackrest
pgBackRest 2.27 - General help

Usage:
    pgbackrest [options] [command]

Commands:
    archive-get     Get a WAL segment from the archive.
    archive-push    Push a WAL segment to the archive.
    backup          Backup a database cluster.
    check           Check the configuration.
    expire          Expire backups that exceed retention.
    help            Get help.
    info            Retrieve information about backups.
    restore         Restore a database cluster.
    stanza-create   Create the required stanza data.
    stanza-delete   Delete a stanza.
    stanza-upgrade  Upgrade a stanza.
    start           Allow pgBackRest processes to run.
    stop            Stop pgBackRest processes from running.
    version         Get version.

Use 'pgbackrest help [command]' for more information.

Great! Installing on FreeBSD is now really simple!

Some recent history about pgbackrest

In the last few months the project has been deeply improved, and I’m not going to quote the whole release history here. However, there are two major aspects that I found really interesting.

Autoconf

As you have probably noticed in the above installation example, pgbackrest now uses autoconf to understand how to correctly configure the project for the hosting operating system. Autoconf was introduced in the previous year as a reaction to a pull request I opened to compile on FreeBSD.

Migrating to C

pgbackrest was initially developed mainly in Perl, with small parts written in C to deal with performance and with the internals of the PostgreSQL WAL file format.
As of January 2020, release 2.21, the whole codebase is in C. Well, this is not fully true, since the testing and documentation part is still written in Perl, at least to my understanding, but the whole pgbackrest production code is now in C.
The fact that the application is now written in C makes a clear distinction between pgbackrest and other similar backup solutions, which instead take advantage of existing tools and behave as “glue” between small pieces. Moreover, it means that the backup, and most notably the restore, can run at full speed.

My little messy contribution

A long time ago… I tried to contribute to a requested feature that sounded very easy to implement, and of course it was not!
Since version 2.25 there is the --dry-run flag for the expire command:

Add --dry-run option to the expire command. Use dry-run to see which backups/archive would be removed by the expire command without actually removing anything. (Contributed by Cynthia Shang, Luca Ferrari. Reviewed by David Steele. Suggested by Marc Cousin.)

Unluckily, I was unable to complete the effort because I was unable to use the testing system, and it was my fault: I underestimated the problem. But there are two very good pieces of news about this:

the project provided me with very quick, polite and constant support in trying to fix my issues;
they required me to test my changes instead of doing the testing by themselves.

Why is the above good news? First of all, other projects are not so reactive when new contributions come, and I think this is very important for the project health. Second, testing a feature means that the project will not introduce regressions, and forcing every developer to test their own changes is a very good habit.

Conclusions

I have already used pgbackrest on FreeBSD, but now that it natively supports this platform I believe that the project will attract more and more users. Moreover, now that all the code has been converted to C, the already excellent performance will be even more impressive.

pgbackrest is definitely my backup solution of choice, not only for its features, but also for the clean and rigorous way the project is maintained and improved.

Locating the PostgreSQL configuration file

2020-06-08T00:00:00+00:00

How to find the PostgreSQL configuration file on an unknown system?

Locating the PostgreSQL configuration file

Sometimes you get to manage a PostgreSQL instance on an unknown system, and this means you don’t know how to locate the PostgreSQL configuration file.
An example could be when you are running PostgreSQL in a Docker container:

root@ff20ff72ee64:/# ps -auxwww
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres     1  0.0  0.8 288596 18076 ?        Ss   16:05   0:00 postgres
postgres    22  0.0  0.1 288732  3860 ?        Ss   16:05   0:00 postgres: checkpointer  
postgres    23  0.0  0.1 288596  3096 ?        Ss   16:05   0:00 postgres: background writer  
postgres    24  0.0  0.3 288596  6260 ?        Ss   16:05   0:00 postgres: walwriter  
postgres    25  0.0  0.1 289024  3068 ?        Ss   16:05   0:00 postgres: autovacuum launcher  
postgres    26  0.0  0.1 143856  2264 ?        Ss   16:05   0:00 postgres: stats collector  
postgres    27  0.0  0.1 288884  2564 ?        Ss   16:05   0:00 postgres: logical replication launcher  
root        70  0.0  0.1  19856  2236 pts/0    Ss   16:12   0:00 /bin/bash

Assuming you have the credentials of a PostgreSQL super user, you can ask PostgreSQL itself:

root@ff20ff72ee64:/# psql -U postgres -c 'SHOW config_file'        
               config_file                
------------------------------------------
 /var/lib/postgresql/data/postgresql.conf
(1 row)

You should have the administrator user credentials, since you have been assigned to manage this PostgreSQL instance!

Of course, you can have your credentials stored into the .pgpass file, as usual.
You can also save some extra “attaching” operations by making docker do all the stuff for you:

$ docker exec -it db psql -U ckan -c 'SHOW config_file'
               config_file                
------------------------------------------
 /var/lib/postgresql/data/postgresql.conf
(1 row)

There are other methods, of course, including a search with find(1) or locate.
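As a rough illustration of the find(1) approach, the following Python sketch walks a directory tree looking for a file named postgresql.conf. The function name find_config_files and the starting directory are assumptions made for the example; on a real system SHOW config_file remains the reliable answer, because a file found on disk is not necessarily the one the running instance has loaded.

```python
import os


def find_config_files(root, name="postgresql.conf"):
    """Return every path under root whose file name is exactly `name`.

    Hypothetical helper: it only inspects the filesystem and knows
    nothing about which file a running server is actually using.
    """
    matches = []
    for directory, _subdirs, files in os.walk(root):
        if name in files:
            matches.append(os.path.join(directory, name))
    return sorted(matches)


# /var/lib/postgresql is only an example starting point.
print(find_config_files("/var/lib/postgresql"))
```

If the starting directory does not exist or cannot be read, the function simply returns an empty list.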

WAL, LSN and File Names

2020-05-28T00:00:00+00:00

Understanding the relationship between LSN and WAL file names.

WAL, LSN and File Names

PostgreSQL stores the changes it is going to apply to data in the Write Ahead Logs (WALs), which are usually 16 MB each in size, even if you can configure your cluster (starting from version 11) to use a different size.
PostgreSQL knows at which part of the 16 MB file (named segment) it is by means of an offset that is tied to the Log Sequence Number (LSN). Let’s see those in action.
First of all, let’s get some information about the current status:

testdb=> SELECT pg_current_wal_lsn(),
          pg_walfile_name( pg_current_wal_lsn() );
-[ RECORD 1 ]------|-------------------------
pg_current_wal_lsn | C/CE7BAD70
pg_walfile_name    | 000000010000000C000000CE

The server is currently using the WAL file named 000000010000000C000000CE. It is possible to see the relationship between the LSN, currently C/CE7BAD70, and the WAL file name as follows. The LSN is made up of three pieces, X/YYZZZZZZ, where:

X represents the middle part of the WAL file name, one or two symbols;
YY represents the final part of the WAL file name;
ZZZZZZ are six symbols that represent the offset within the file.
Therefore, given the LSN C/CE7BAD70 we can assume that the middle part of the WAL file name will be C and the last part will be CE, both zero padded to 8 symbols, so respectively 0000000C and 000000CE. Concatenated together, they provide us with a file name that ends with 0000000C000000CE. The initial part of the file name is still missing, and that is the timeline the server is running on, in this case 1, zero padded as the other parts, so 00000001, which gives us the final name 000000010000000C000000CE.
To summarize, the following is the correspondence between the single parts:

LSN  ->              C  /     CE      7BAD70
WAL  -> 00000001 0000000C 000000CE

Please consider that the above example is just meant to show you the concept: it is better to use the function pg_walfile_name() to get the exact WAL file name from an LSN, since a WAL switch may lead to an incorrect result when decoding the LSN manually.

The final part of the LSN is the offset within the WAL file, and it suffices to convert it to int to get an idea:

testdb=> SELECT ( x'7BAD70' )::int AS offset;
-[ RECORD 1 ]---
offset | 8105328

You can get the same information with the special function pg_walfile_name_offset(), to which you can pass the LSN, and get the current filename and the offset in a single run:

testdb=> SELECT ( x'7BAD70' )::int AS offset_computed, pg_walfile_name_offset( 'C/CE7BAD70' );
-[ RECORD 1 ]----------|-----------------------------------
offset_computed        | 8105328
pg_walfile_name_offset | (000000010000000C000000CE,8105328)

To summarize, given a specific LSN the database is (and must be) clearly aware of the WAL file segment the LSN refers to and to the exact offset, within such file, where the data can be found.
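The manual decoding described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a replacement for pg_walfile_name() or pg_walfile_name_offset(): the function name lsn_to_walfile and its parameters are hypothetical, it assumes timeline 1 and the default 16 MB segment size unless told otherwise, and it makes no attempt to handle the WAL switch corner case mentioned above.

```python
def lsn_to_walfile(lsn, timeline=1, wal_segment_size=16 * 1024 * 1024):
    """Decode an LSN like 'C/CE7BAD70' into (WAL file name, offset).

    Hypothetical helper: it only mirrors the arithmetic described in
    the text and never talks to the server.
    """
    high_text, low_text = lsn.split("/")
    high = int(high_text, 16)  # the X part: middle of the file name
    low = int(low_text, 16)    # the YYZZZZZZ part

    segment = low // wal_segment_size  # YY: final part of the file name
    offset = low % wal_segment_size    # ZZZZZZ: position inside the segment

    # Timeline, X and YY are each zero padded to 8 hexadecimal symbols.
    name = "%08X%08X%08X" % (timeline, high, segment)
    return name, offset


name, offset = lsn_to_walfile("C/CE7BAD70")
print(name, offset)
```

With the LSN used in this post, the sketch produces the same file name and offset reported by pg_walfile_name_offset().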

Inspecting Command Tags and Events in event triggers

2020-05-26T00:00:00+00:00

Event triggers are a very powerful mechanism to react to data structure changes in PostgreSQL.

Inspecting Command Tags and Events in event triggers

While preparing an example for a course of mine about event triggers, I thought I’ve never proposed a catch-all event trigger debugging use case. So, here it is.
Event Triggers are a powerful mechanism that PostgreSQL provides to react to database schema changes, like table or column addition and deletion, object creation, and so on. The official documentation already presents a couple of examples about dropping objects or rewriting tables, so my little example is about more common commands. I create the following function:

CREATE OR REPLACE FUNCTION
f_event_trigger_demo()
RETURNS EVENT_TRIGGER
AS
$code$
DECLARE
event_tuple record;
BEGIN
   RAISE INFO 'Event trigger function called ';
   FOR event_tuple IN SELECT *
                      FROM pg_event_trigger_ddl_commands()  LOOP
                      RAISE INFO 'TAG [%] COMMAND [%]', event_tuple.command_tag, event_tuple.object_type;
   END LOOP;
END
$code$
LANGUAGE plpgsql;

It is quite simple to understand what it does: every time the function is triggered, it asks for the tuples out of the special function pg_event_trigger_ddl_commands(), which provides one tuple for every single command executed. Why multiple tuples? Because you could execute one command that explodes into different sub-commands.
Then, simply, the function prints the command tag and the object type.
Usually command tags are uppercase, while object types are lowercase.
The trigger can be created as follows:

testdb=# create event trigger tr_demo on ddl_command_end execute function f_event_trigger_demo();

It is now simple enough to test the trigger:

testdb=# create table foo();
INFO:  Event trigger function called 
INFO:  TAG [CREATE TABLE] COMMAND [table]
CREATE TABLE

testdb=# alter table foo add column i int default 0;
INFO:  Event trigger function called 
INFO:  TAG [ALTER TABLE] COMMAND [table]
ALTER TABLE

testdb=# create index idx_foo on foo(i);
INFO:  Event trigger function called 
INFO:  TAG [CREATE INDEX] COMMAND [index]
CREATE INDEX

testdb=# ALTER TABLE foo RENAME TO baz;
INFO:  Event trigger function called 
INFO:  TAG [ALTER TABLE] COMMAND [table]
ALTER TABLE

You can compare the output of the trigger function with the event trigger firing matrix to get an idea of what you can “catch”.
One last note: why have I attached the trigger to ddl_command_end? Having a look at the event trigger firing matrix, it looks like you can attach the trigger to either ddl_command_start or ddl_command_end with the very same result, but the fact is that the function pg_event_trigger_ddl_commands() works only on the end side of an event. The reason, as already explained, is that only when approaching the end does the system know what a command has been exploded into.

Source Code

You can find the source code of the trigger function in my GitLab repository.

PostgreSQL 13 beta 1 on FreeBSD via pgenv

2020-05-26T00:00:00+00:00

It’s time to test the new PostgreSQL 13 release!

PostgreSQL 13 beta 1 on FreeBSD via pgenv

Five days ago PostgreSQL 13 beta 1 was released!
It’s time to test the new awesome version of our beloved database. Installing from source is quite trivial, but why not use pgenv for that?
Installing on my FreeBSD machine with pgenv is as simple as:

luca@miguel ~ % pgenv build 13beta1
...
PostgreSQL, contrib, and documentation installation complete.
pgenv configuration written to file /home/luca/git/pgenv/.pgenv.13beta1.conf
PostgreSQL 13beta1 built

Are you ready to test it?
Activate it and enjoy:

luca@miguel ~ % pgenv use 13beta1
...
server started
PostgreSQL 13beta1 started
Logging to /home/luca/git/pgenv/pgsql/data/server.log


luca@miguel ~ % psql -U postgres -c "SELECT version();" template1
                                                  version                                                   
------------------------------------------------------------------------------------------------------------
 PostgreSQL 13beta1 on x86_64-unknown-freebsd12.1, compiled by gcc (FreeBSD Ports Collection) 9.2.0, 64-bit
(1 row)

Enjoy!

PostgresWeekly Interview

2020-04-24T00:00:00+00:00

I have been interviewed by PostgresWeekly.

PostgresWeekly Interview

Peter Cooper has written a short interview with me.
Postgres Weekly is an email roundup about PostgreSQL that is sent once per week, as the name implies, and provides a summary of what is happening and has happened in the PostgreSQL ecosystem.
I think I first met Postgres Weekly one or two years ago, when I was contacted by the administrator, who asked me to publish some of the content I try to periodically write on PostgreSQL.

Sure! That’s why I have a blog after all!
And I have to reinforce the above statement by saying that I actually opened my first blog immediately after the first Italian PGDay.IT (2007) with the purpose of writing about PostgreSQL (and other technologies), so PostgreSQL has been one of the most important reasons for me to have a blog!

Being interviewed by Peter has been a pleasure, and he is a very good, nice and polite guy.

In this interview, for the first time, I also talk about a very difficult part of my life that has nothing to do with PostgreSQL: long story short, my eyes are turning off and there is no chance to rollback. I have to confess that this requires some brainpower to escape from nightmares.
That’s why my book, even if small and simple, has been a very important achievement for me. Pretty much the same feeling as when I was shooting at 90 meters with my bow without having any eye difficulty after a long period of repeated surgeries and medications.

But there is also something happier in this interview, such as the announcement that I am working on another book and that we (because I’m writing with a friend of mine) have almost done half of the journey. You guess the subject!

PL/pgSQL Trends

2020-04-17T00:00:00+00:00

A graph that shows the trends of PL/pgSQL, according to Github.com.

PL/pgSQL Trends

I discovered this excellent graphing system that shows several programming language trends, according to Github.com.

So, let’s compare PL/pgSQL with SQL and plSQL:

As you can see, the interest in PL/pgSQL has grown a lot lately, and this is due to the success PostgreSQL (and therefore the language) has achieved, at least in my opinion. I’m not sure that the comparison with plSQL is fair, because the latter is tied to a proprietary database and chances are there is less material available on Github, at least on a general basis.

The trends are confirmed also with regard to the issues, pull requests and stars, with the only exception that plSQL overtook PL/pgSQL on pull requests around the year 2014.

That’s interesting, especially if you compare the trends with real programming languages, I mean multipurpose programming languages like Perl

and everything disappears if you compare against hype languages like Python or…ehm…Javascript:

Are we at the ZeroConference point?

2020-04-09T00:00:00+00:00

I’m seeing that around all the world the conferences are being cancelled. This is also true for PostgreSQL related conferences.

Are we at the ZeroConference point?

The situation around the planet is dramatic, to the point that our lives have been deeply changed by COVID-19. One small consequence of all the measures to avoid the spread of COVID-19 is that a lot of conferences have been cancelled.

PostgreSQL conferences are no exception, and a lot of events are going to be cancelled or moved to video streaming.

Among the others, PGDay.IT has been cancelled, and I think someone should announce it also on the planet, not only on the mailing list.

I was hoping for a streamed conference, but the web site clearly states the conference is postponed to the next year.

I hope PostgreSQL conferences, as all other technical conferences, can be soon

PostgreSQL 11 Server Side Programming it's gaining attention

2020-04-08T00:00:00+00:00

My own first book on PostgreSQL is gaining more and more attention, and this is good news.

PostgreSQL 11 Server Side Programming it’s gaining attention

I’m happy to say that my very first book on PostgreSQL, PostgreSQL 11 Server Side Programming Quick Start Guide, is gaining more and more attention and its statistics are growing.

Quite frankly, the book sold one order of magnitude more than I was expecting, and sales are still rising even if PostgreSQL 12 is the latest major version and 13 is almost here.

I think the dramatic situation around Europe, and more in general the world, namely COVID-19, is also responsible for this rise: since people are forced to stay at home, one way to keep the mind occupied is to read and study other topics and subjects.
I myself have started reading technical books I never thought I would have read in normal conditions.

Anyway, back to the book, please consider that even if the title includes 11 as the version, many of the examples and guidelines can be applied to newer versions of PostgreSQL. I have to admit I’m using many of the examples in my training events and courses (I can, of course, being the author).
The book uses Perl and Java as the main foreign languages, with PL/pgSQL being of course the main native language for application development.

I hope you can enjoy the book during this particular point in time.
And if you have suggestions or errata corrige, please let me know and I will include credits and details in the book code repository.

Code Repository

The code repository with examples and other information is available on the official GitHub space and is also cloned into my GitLab repository so feel free to clone it from whatever is more comfortable to you!

PostgreSQL 12 Generated Columns: another use case

2020-03-02T00:00:00+00:00

When you start realizing how useful generated columns can be, you start using them as part of your workflow. Here is another story from my adventures in PostgreSQL-land.

PostgreSQL 12 Generated Columns: another use case

I’ve already written about the PostgreSQL 12 feature related to automatically generated columns.
A few days ago I worked on a simple table that contains a single tuple for every file on a filesystem, including the file size and hash. Having the file hash enables a lot of practical analysis, including seeing how many times a file is replicated in the filesystem.
But what if I want to store such duplication information in the table?
One solution could be to add a column, then run a long UPDATE to populate it, then add a trigger to catch every new table modification.
Or, I can use generated columns!

The table structure

The table was structured as follows, and it is quite simple to understand:

testdb=> \d my_files
                       Table "public.my_files"
  Column   |          Type           | Collation | Nullable | Default 
-----------|-------------------------|-----------|----------|---------
 filename  | character varying(200)  |           |          | 
 directory | character varying(2048) |           |          | 
 md5sum    | character varying(128)  |           |          | 
 bytes     | integer                 |           |          | 

The file can be found in the filesystem in the position directory || filename (i.e., string concatenation). Every file has its own checksum (md5sum) and the size expressed in bytes.
Please note that this is a de-normalized schema, but it is a simple use case I have to work with so far.
The size of the table is quite normal:

testdb=> SELECT reltuples, relpages, pg_size_pretty( pg_relation_size( 'my_files' ) ) FROM pg_class WHERE relname = 'my_files' AND relkind = 'r';
  reltuples   | relpages | pg_size_pretty 
--------------|----------|----------------
 1.872529e+06 |    40757 | 318 MB
(1 row)

Adding a generated column

Let’s add a new column to count the occurrences of the file, that is how many times the file appears in the filesystem.
First of all, a new IMMUTABLE function must be created:

CREATE FUNCTION f_count_occurrencies( md5sum_to_find text )
RETURNS bigint
AS
    $CODE$
      SELECT count(*)
      FROM   my_files
      WHERE  md5sum = md5sum_to_find;
    $CODE$
LANGUAGE sql
IMMUTABLE;

It is a very simple function: it does a count(*) of every tuple with a specific checksum.
There are two things to note: the function must return a bigint because that is what count() returns and, most notably, it must be marked as IMMUTABLE because that is required to use the function as the engine that computes the generated column values.
However, applying such a function did not complete within two hours!

testdb=> ALTER TABLE my_files 
         ADD COLUMN occurrencies int 
         GENERATED ALWAYS 
         AS ( f_count_occurrencies( md5sum ) ) STORED;


^CCancel request sent
ERROR:  canceling statement due to user request
CONTEXT:  SQL function "f_count_occurrencies" statement 1
Time: 8285400,145 ms (02:18:05,400)

Therefore I decided to create an index on the field md5sum and try it again:

testdb=> CREATE INDEX idx_md5sum ON my_files( md5sum );
CREATE INDEX
Time: 3016,571 ms (00:03,017)

testdb=> ALTER TABLE my_files 
         ADD COLUMN occurrencies int 
         GENERATED ALWAYS 
         AS ( f_count_occurrencies( md5sum ) ) STORED;
ALTER TABLE
Time: 120131,809 ms (02:00,132)

As you can see, this time it took two minutes to perform the update of the table structure with the automatically computed column, while before the creation of the index it had not finished within two hours.

The table does not occupy much more space than before:

testdb=> SELECT reltuples, relpages, pg_size_pretty( pg_relation_size( 'my_files' ) ) FROM pg_class WHERE relname = 'my_files' AND relkind = 'r';
  reltuples   | relpages | pg_size_pretty 
--------------|----------|----------------
 1.872529e+06 |    41492 | 324 MB
(1 row)

so with six extra megabytes we now have the information replicated on every row. The table grew by around 1.8% in size, but computing how much a file is replicated is now straightforward.

WARNING!

edit 2020-03-04
As Adam Brusselback correctly pointed out in a comment to this blog post, adding the occurrencies column to the table does the job only if the table is immutable too, that is no more repeated files are added. In case a file with an already existing md5sum is added to the table as a new entry, that last tuple will have the correct number of occurrencies, but the other tuples will still keep the last computed value.
I didn’t mention at the beginning of this post that I was doing inspection and computations on a historical table, that is a table where new tuples are not added anymore.

A generated column cannot be based on a generated column

Once you get used to generated columns, you simply want more.
Making a column that indicates, per file, how much disk space it consumes due to its replicated copies seems easy, but it is a little tricky. A generated column cannot be based on another generated column: it would be a circular dependency or, better, a dependency that PostgreSQL cannot solve (there should be a generation order, and sooner or later you could end up with a circular dependency).
This means we cannot exploit the occurrencies column in the computation of the disk space. In fact, let’s try to add a generated column based on that, so first let’s create a simple computation function:

CREATE FUNCTION f_compute_duplicated_size( bytes int, how_many_times int )
    RETURNS int
    AS $CODE$
      SELECT bytes * how_many_times;
    $CODE$
    LANGUAGE sql IMMUTABLE;

Note that the column expression passes the generated column occurrencies to the function, that is we are generating a column on the basis of another generated column, something PostgreSQL refuses to do, and in fact:

testdb=> ALTER TABLE my_files 
         ADD COLUMN duplication_bytes int 
         GENERATED ALWAYS 
         AS ( f_compute_duplicated_size( bytes, occurrencies ) ) STORED;
ERROR:  cannot use generated column "occurrencies" in column generation expression
DETAIL:  A generated column cannot reference another generated column.
Time: 17,900 ms

We need a trick to make the generated column independent from the other, already generated, one. Therefore, we can use a function like the following, which does not exploit any generated column:

CREATE FUNCTION f_compute_duplicated_size( md5sum_to_find text )
RETURNS bigint
AS $CODE$
    SELECT sum( bytes)
    FROM   my_files
    WHERE  md5sum = md5sum_to_find;
$CODE$
LANGUAGE sql IMMUTABLE;

and add the column, this time with success:

testdb=> ALTER TABLE my_files 
         ADD COLUMN duplication_bytes int 
         GENERATED ALWAYS 
         AS ( f_compute_duplicated_size( md5sum ) ) STORED;
ALTER TABLE
Time: 119696,310 ms (01:59,696)

Again, we are exploiting the md5sum index to keep the table modification at a rational speed.
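Why does summing bytes per checksum give the same result as multiplying bytes by the number of occurrences? Rows that share an md5sum describe copies of the same content, so they all have the same size (hash collisions aside). The following Python sketch shows the equivalence on a handful of made-up rows; the file names, checksums and sizes are invented for the example.

```python
from collections import defaultdict

# Made-up sample rows: (filename, md5sum, bytes).
rows = [
    ("a.txt", "aaa111", 100),
    ("copy_of_a.txt", "aaa111", 100),
    ("b.jpg", "bbb222", 2500),
    ("a_again.txt", "aaa111", 100),
]

occurrences = defaultdict(int)      # mimics f_count_occurrencies
duplicated_size = defaultdict(int)  # mimics f_compute_duplicated_size
for _filename, md5sum, size in rows:
    occurrences[md5sum] += 1
    duplicated_size[md5sum] += size

for filename, md5sum, size in rows:
    # Both routes lead to the same value for every row.
    assert duplicated_size[md5sum] == size * occurrences[md5sum]
    print(filename, occurrences[md5sum], duplicated_size[md5sum])
```

This is why the second function can ignore the occurrencies column entirely and still produce the value we were after.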

More columns…more columns quick!

You get the point: it is now possible to enhance the table with many more generated columns. Of course, the risk is to denormalize the data more and more, so using this approach depends on what your aim is. Since I’m using this table as a workbench to make some simulations on a filesystem, I don’t care too much about the normalization of the data; rather, I do care about being able to run simple queries and get the result.
So what’s next?
Let’s add a column to decide the file MIME type based on a very poor approach: the file extension! Of course, I do trust my filenames to be correct with respect to the relationship between the extension and the MIME type.
The approach is always the same: 1) create an immutable function; 2) alter the table.

The function I use exploits the regular expression engine:

CREATE FUNCTION f_compute_file_type( filename text )
RETURNS text
AS $CODE$
    SELECT  upper( ( regexp_match( trim( filename ),
                     '\.([a-zA-Z0-9]{3,4})$' ) )[ 1 ] );
$CODE$
LANGUAGE SQL IMMUTABLE;

The function returns the very last extension assuming it could be three or four characters (or digits). Then adding the column is boring:

testdb=> ALTER TABLE my_files 
         ADD COLUMN filetype text 
         GENERATED ALWAYS 
         AS ( f_compute_file_type( filename ) ) STORED;
ALTER TABLE
Time: 21452,153 ms (00:21,452)
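
The function can also be checked in isolation against a few literal names, which is cheaper than rewriting the table to find out that the regular expression does not do what you expect. This is a minimal sketch; the file names are made up for illustration.

```sql
-- Try f_compute_file_type() on hypothetical file names.
SELECT name, f_compute_file_type( name ) AS filetype
FROM ( VALUES ( 'report.pdf' ),       -- PDF
              ( 'photo.JPEG ' ),      -- JPEG (trailing space is trimmed)
              ( 'archive.tar.gz' ),   -- NULL: "gz" is only two characters
              ( 'notes.md' ),         -- NULL: same reason
              ( 'README' )            -- NULL: no extension at all
     ) AS t( name );
```

Names with an extension shorter than three or longer than four characters get a NULL filetype, which is a direct consequence of the {3,4} quantifier.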

What is the final effect on the table?

We added three generated columns to the table; what is the impact on disk space?

testdb=> SELECT reltuples, relpages, pg_size_pretty( pg_relation_size( 'my_files' ) ) FROM pg_class WHERE relname = 'my_files' AND relkind = 'r';
  reltuples   | relpages | pg_size_pretty 
--------------|----------|----------------
 1.872529e+06 |    43704 | 341 MB
(1 row)

Therefore the table size has grown from 318 MB to 341 MB, that is about 7%. We now have a bigger table, to some extent de-normalized, but with a lot more data to be used for analysis. Moreover, we could drop the index on md5sum if we don’t need it anymore.
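
The numbers above can be double checked with a bit of arithmetic done by the database itself. This sketch assumes the default 8 kB block size, so that relpages multiplied by 8192 gives the size of the relation.

```sql
-- 43704 pages of 8192 bytes, and the growth from 318 MB to 341 MB.
SELECT pg_size_pretty( 43704 * 8192::bigint )   AS size_from_pages,  -- 341 MB
       round( 100.0 * ( 341 - 318 ) / 318, 1 )  AS growth_pct;       -- 7.2
```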

Generated Columns are not fixed!

Well, this could sound trivial, but the fact is that a generated column is not a compute-once column: the value of the column is updated every time the columns it depends on are modified.
In the previous example you have seen that the filetype column depends on the filename one, so if we rename the file, for example because we don’t care anymore about the file extension, we are going to mess up the filetype column too.
The rule of thumb therefore is: if you need computed-once data use a materialized view (or a fixed column in the table), otherwise you can use generated columns.
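
A quick way to see this is to rename a file inside a transaction that is then rolled back. This is only a sketch: the file name holiday.jpg is hypothetical, so pick a name that exists in your table.

```sql
BEGIN;

-- Before: the extension drives the generated column.
SELECT filename, filetype FROM my_files WHERE filename = 'holiday.jpg';

-- Rename the file dropping the extension...
UPDATE my_files SET filename = 'holiday' WHERE filename = 'holiday.jpg';

-- ...and filetype is recomputed (to NULL, since there is no extension left).
SELECT filename, filetype FROM my_files WHERE filename = 'holiday';

ROLLBACK;  -- leave the table untouched
```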

Usage of disk space in Oracle and PostgreSQL: a simple use case

2020-02-24T00:00:00+00:00

A very non-scientific comparison about the two database engines.

Usage of disk space in Oracle and PostgreSQL

A few days ago I built a table in Oracle (11, if that matters) to store a few hundred megabytes of data. But I don’t feel at home using Oracle, so I decided to export the data and import it back in PostgreSQL 12.
Surprisingly, PostgreSQL requires more data space to store the same amount of data.

I’m not saying anything about which one is the best, and I don’t know the exact reasons why this happens; this is just what I’ve observed, hoping it can be useful to someone else!
So please don’t flame!

Table structure

The table is really simple, and holds data about files on a disk. It does not even have a key, since it is just data I must mangle and then throw away.

testdb=> \d my_schema.my_files
                        Table "my_schema.my_files"
  Column   |          Type           | Collation | Nullable | Default 
-----------|-------------------------|-----------|----------|---------
 filename  | character varying(200)  |           |          | 
 directory | character varying(2048) |           |          | 
 md5sum    | character varying(128)  |           |          | 
 bytes     | bigint                  |           |          | 

I’ve seen no difference between using text and varchar; I used the latter just to keep the definition as similar as possible to the Oracle one.
The table is populated with 1872529 tuples (around 2 million tuples).

Oracle Disk Space

Oracle requires 312 MB to store the data:

 select segment_name,sum(bytes)/1024/1024 MB
    , count(segment_name)
    , blocks * 8192 / (1024 * 1024 )
    from user_segments
    where segment_type='TABLE'
    and segment_name=upper('MY_FILES')
    group by segment_name, blocks ;

The results of the above query are:

312 MB of data;
39936 blocks, that are something similar to PostgreSQL data pages.

The table has 110 extents, but I’m not sure how they are accounted for in the space computation.

PostgreSQL Disk Space

The same data in PostgreSQL required 324 MB, so 12 MB more than Oracle, that is roughly 4% more disk space. It is therefore possible to say that the overall space is pretty much the same:

testdb=> SELECT reltuples, relpages, 
         pg_size_pretty( pg_relation_size( 'my_schema.my_files' ) ) 
         FROM pg_class WHERE relname = 'my_files' AND relkind = 'r';
         
  reltuples   | relpages | pg_size_pretty 
--------------|----------|----------------
 1.872529e+06 |    41491 | 324 MB
(1 row)

Please note that fillfactor has been set to 100% and the table has been VACUUMed.

Counting Pages

What I can see is that PostgreSQL uses 41491 data pages, while Oracle uses 39936, that is 1555 fewer data pages. Again, that is roughly the same 4% we already saw on the effective space, which leads me to think the Oracle data pages have the same size as the PostgreSQL ones.
In fact, asking for the datapage size:

SQL> show parameter db_block_size;

NAME          TYPE    VALUE 
------------- ------- ----- 
db_block_size integer 8192 

shows the same size as PostgreSQL.
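
On the PostgreSQL side the page size is exposed by the read-only block_size parameter. The following sketch reads it and redoes the page arithmetic with the figures reported above, assuming 8192-byte pages on both sides.

```sql
SELECT current_setting( 'block_size' )                     AS pg_block_size,
       41491 - 39936                                       AS extra_pages,      -- 1555
       round( 100.0 * ( 41491 - 39936 ) / 39936, 1 )       AS extra_pages_pct,  -- 3.9
       pg_size_pretty( ( 41491 - 39936 ) * 8192::bigint )  AS extra_space;      -- 12 MB
```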

From `NUMERIC` to `INT`

update of 2020-02-24
One possible difference between the two tables is the NUMERIC data type used by Oracle. After inspecting the values, I’ve seen that the bytes column can be handled by an int4 (normal integer) type, so I changed it in both Oracle and PostgreSQL. While in Oracle the size remained the same, 312 MB, in PostgreSQL the size shrank to 318 MB, which is much closer to the Oracle one:

testdb=> ALTER TABLE vace.my_files ALTER COLUMN bytes SET DATA TYPE int;
ALTER TABLE
testdb=> vacuum full vace.my_files;
VACUUM
testdb=> SELECT reltuples, relpages, pg_size_pretty( pg_relation_size( 'vace.my_files' ) ) FROM pg_class WHERE relname = 'my_files' AND relkind = 'r';
  reltuples   | relpages | pg_size_pretty 
--------------|----------|----------------
 1.872529e+06 |    40757 | 318 MB
(1 row)

Conclusions

I really don’t have any. I know too little about Oracle storage to say why there is this difference in size, and I’m sure this is neither an advantage of Oracle nor a drawback of PostgreSQL.
I don’t even know if this is the default behavior for every use-case (I doubt it), but it is interesting to know that even a simple use-case like this can require a little more space on disk.

Take advantage of pg_settings when dealing with your configuration

2020-02-13T00:00:00+00:00

The right way to get the current PostgreSQL configuration is by means of pg_settings.

Take advantage of pg_settings when dealing with your configuration

I often see messages on PostgreSQL-related mailing lists where the configuration is reported with a Unix-style approach. For example, imagine you have been asked to provide your autovacuum configuration in order to see if there’s something wrong with it; one approach I often see is the copy and paste of the following:

% sudo -u postgres grep autovacuum /postgres/12/postgresql.conf
#autovacuum_work_mem = -1               # min 1MB, or -1 to use maintenance_work_mem
#autovacuum = on                        # Enable autovacuum subprocess?  'on'
#log_autovacuum_min_duration = -1       # -1 disables, 0 logs all actions and
autovacuum_max_workers = 7             # max number of autovacuum subprocesses
autovacuum_naptime = 2min              # time between autovacuum runs
autovacuum_vacuum_threshold = 500       # min number of row updates before
autovacuum_analyze_threshold = 700      # min number of row updates before
#autovacuum_vacuum_scale_factor = 0.2   # fraction of table size before vacuum
#autovacuum_analyze_scale_factor = 0.1  # fraction of table size before analyze
#autovacuum_freeze_max_age = 200000000  # maximum XID age before forced vacuum
#autovacuum_multixact_freeze_max_age = 400000000        # maximum multixact age
#autovacuum_vacuum_cost_delay = 2ms     # default vacuum cost delay for
                                        # autovacuum, in milliseconds;
#autovacuum_vacuum_cost_limit = -1      # default vacuum cost limit for
                                        # autovacuum, -1 means use

While this could be a correct approach and makes it simple to provide a full set of configuration values, it has a few drawbacks:

it produces verbose output (e.g., there are comments on the right of each line);
it might not be the whole story about the configuration, for example because something is in postgresql.auto.conf;
it does include commented-out lines;
it might not be the configuration your cluster is running on.

Let’s examine all the drawbacks, one at a time.

Verbose Output

This is more an annoyance than a real problem, but please consider that people on the other side of the world could have a screen resolution, line wrapping, or settings that make it difficult to read verbose lines.

Could not be the whole truth about configuration

I often place my own PostgreSQL configuration into include_if_exists files, so that I leave the postgresql.conf file unchanged. Let’s call it a kind of FreeBSD configuration style!
This means that, in order to use a Unix approach to find a particular setting, I have to include in the search every single configuration file in every single location. This can be as simple as doing grep autovacuum *.conf* or much more complicated depending on your directory structure.
In any case, I could have omitted one single file, and that’s bad both for me and whoever is trying to help me.
Moreover, since ALTER SYSTEM is gaining more and more power, settings could also live in postgresql.auto.conf, and people should get used to checking that file too.
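
Instead of chasing every file by hand, you can ask the cluster where each value comes from. This is a minimal sketch; note that sourcefile and sourceline are shown only to superusers or members of pg_read_all_settings, and are NULL for everybody else.

```sql
-- Where does each autovacuum setting come from?
SELECT name, setting, source, sourcefile, sourceline
FROM   pg_settings
WHERE  name LIKE 'autovacuum%'
ORDER  BY name;
```

A value set via ALTER SYSTEM should report postgresql.auto.conf as its sourcefile, while a value coming from an included file reports the path of that file.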

It does include commented-out lines

Come on, who cares about that? After all, a commented-out line means the value is at its default.
And that could be the problem: do you remember all the default values for every single PostgreSQL version?
Therefore, don’t trust the default value, get the exact value!
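
There is no need to remember the defaults either, since pg_settings reports the built-in value next to the running one. A small sketch that lists only the autovacuum settings not coming from the built-in default:

```sql
-- boot_val is the value assumed at startup if nothing else sets the parameter.
SELECT name, setting, unit, boot_val, source
FROM   pg_settings
WHERE  name LIKE 'autovacuum%'
AND    source <> 'default'
ORDER  BY name;
```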

It could not be the configuration your cluster is using

What if you modified the configuration file but did not apply the change in the way the parameter context requires? Maybe you edited a postmaster context parameter and issued only a simple SIGHUP? What if you forgot to notify the cluster at all?
What if another administrator changed the parameters without telling you and scheduled a cluster notification at night? Yes, that really happened to me…
Again, get the real configuration!
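
The context column tells which action a parameter needs in order to pick up a new value, and pending_restart tells whether a changed value is still waiting for a restart. The following is a sketch that uses two parameters from the example above; pg_reload_conf() requires appropriate privileges (a superuser by default).

```sql
-- Which action does each parameter require (e.g., sighup vs postmaster)?
SELECT name, setting, context, pending_restart
FROM   pg_settings
WHERE  name IN ( 'autovacuum_max_workers', 'autovacuum_naptime' );

-- Ask the cluster to re-read its configuration files (same as a SIGHUP)...
SELECT pg_reload_conf();

-- ...and list what is still waiting for a restart.
SELECT name, setting, context
FROM   pg_settings
WHERE  pending_restart;
```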

How to get the real configuration

I’m glad you asked: pg_settings is there for you.
It is a matter of a single query, for example:

forumdb=> SELECT name, setting, pending_restart 
          FROM pg_settings 
          WHERE name like 'autovacuum%' 
          ORDER BY 1;
                name                 |  setting  | pending_restart 
-------------------------------------|-----------|-----------------
 autovacuum                          | on        | f
 autovacuum_analyze_scale_factor     | 0.1       | f
 autovacuum_analyze_threshold        | 50        | f
 autovacuum_freeze_max_age           | 200000000 | f
 autovacuum_max_workers              | 3         | f
 autovacuum_multixact_freeze_max_age | 400000000 | f
 autovacuum_naptime                  | 60        | f
 autovacuum_vacuum_cost_delay        | 2         | f
 autovacuum_vacuum_cost_limit        | -1        | f
 autovacuum_vacuum_scale_factor      | 0.2       | f
 autovacuum_vacuum_threshold         | 50        | f
 autovacuum_work_mem                 | -1        | f
(12 rows)

You can elaborate the query as you like, but the point is that you get exact values. In this particular example, as you can see, some values differ from what you get out of the configuration file. For example, autovacuum_max_workers has been set to 7 in the configuration file, but the database applies a value of 3.
Now you can inspect this problem too, and see if it has been caused by a cluster that has not been notified about configuration changes or by an included file that overrides your settings.
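
One possible way to investigate is the pg_file_settings view, which shows what the configuration files contain right now, side by side with what the cluster is running. This is a sketch under the assumption that you can read the view (by default only superusers can).

```sql
-- File content versus running values; compare them by eye, because the file
-- keeps the text as written (e.g., '2min') while pg_settings uses base units.
SELECT f.sourcefile, f.sourceline, f.name,
       f.setting AS file_setting, f.applied, f.error,
       s.setting AS running_setting, s.pending_restart
FROM   pg_file_settings f
JOIN   pg_settings s ON s.name = f.name
WHERE  f.name LIKE 'autovacuum%'
ORDER  BY f.name, f.seqno;
```

When the same parameter appears on more than one row, a later entry (higher seqno) overrides an earlier one; applied tells whether that entry is a candidate to be applied.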

Conclusions

The configuration file is only ever a hint about what your cluster is configured for, not the real truth. When inspecting a configuration problem, the starting point, also when reporting to others, is the output of pg_settings.

Why Dropping a Column does not Reclaim Disk Space? (or better, why is it so fast?)

2020-02-09T00:00:00+00:00

You may have noticed how dropping a column is fast in PostgreSQL, haven’t you?

Why Dropping a Column does not Reclaim Disk Space? (or better, why is it so fast?)

Simple answer: because PostgreSQL knows how to do its job at best!

Let’s create a dummy table to test this behavior against:

testdb=> CREATE TABLE foo( i int );
CREATE TABLE

testdb=> INSERT INTO foo 
         SELECT generate_series( 1, 10000000 );
INSERT 0 10000000

testdb=> SELECT pg_size_pretty( pg_relation_size( 'foo' ) );
 pg_size_pretty 
----------------
 346 MB
(1 row)

Now, let’s add a quite large column to the table and measure how much time it takes:

testdb=> \timing
Timing is on.

testdb=> ALTER TABLE foo 
         ADD COLUMN t text 
         DEFAULT md5( random()::text );
ALTER TABLE
Time: 30702,872 ms (00:30,703)

What happened? In nearly 31 seconds the table has grown, with random data on every row, to 651 MB (almost double the original size):

testdb=> SELECT pg_size_pretty( pg_relation_size( 'foo' ) );
 pg_size_pretty 
----------------
 651 MB
(1 row)

What does PostgreSQL think about the columns in this table? Let’s query the pg_attribute catalog for all the user-defined attributes (i.e., those with a positive attnum) and inspect attisdropped, which indicates whether the column has been dropped from the table:

testdb=> SELECT attnum, attname, attisdropped 
         FROM pg_attribute a 
         JOIN pg_class c ON c.oid = a.attrelid 
         WHERE c.relname = 'foo' 
               AND c.relkind = 'r' 
               AND a.attnum > 0;
               
 attnum | attname | attisdropped 
--------|---------|--------------
      1 | i       | f
      2 | t       | f
(2 rows)

As you can see, both the columns foo.i and foo.t are valid in the table, meaning they have not been dropped.

It is now time to drop the column and see the results:

testdb=> ALTER TABLE foo DROP COLUMN t;
ALTER TABLE
Time: 20,237 ms

Pretty impressive, isn’t it?
We waited almost 31 seconds to add the new data and next to nothing (20 milliseconds) to drop it?
The documentation helps to understand why:

The DROP COLUMN form does not physically remove the column, but simply makes it invisible to SQL operations. Subsequent insert and update operations in the table will store a null value for the column. Thus, dropping a column is quick but it will not immediately reduce the on-disk size of your table, as the space occupied by the dropped column is not reclaimed. The space will be reclaimed over time as existing rows are updated.

There is no reason to immediately force a table rewrite: DROP COLUMN invalidates the column so that it disappears logically but not physically. Let’s inspect the table and its attributes again:

testdb=> SELECT pg_size_pretty( pg_relation_size( 'foo' ) );
 pg_size_pretty 
----------------
 651 MB
(1 row)


testdb=> SELECT attnum, attname, attisdropped 
         FROM pg_attribute a 
         JOIN pg_class c ON c.oid = a.attrelid 
         WHERE c.relname = 'foo' 
               AND c.relkind = 'r' 
               AND a.attnum > 0;
               
 attnum |           attname            | attisdropped 
--------|------------------------------|--------------
      1 | i                            | f
      2 | ........pg.dropped.2........ | t
(2 rows)

The table size remained the same, but the t attribute has been renamed as ........pg.dropped.2........ and is now marked as dropped from the table (attisdropped = t).
Does that mean that it is possible to query the dropped column from SQL? No, this is not a recycle-bin-like mechanism:

testdb=> SELECT i, "........pg.dropped.2........" FROM foo limit 10;
ERROR:  column "........pg.dropped.2........" does not exist
LINE 1: SELECT i, "........pg.dropped.2........" FROM foo limit 10;

However, many of the properties of the column data type, such as its length, are still kept in pg_attribute to allow the system to handle that column even if the data type itself disappears.
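
Those physical properties can be inspected directly. This is a minimal sketch against the same foo table; as far as I know the reference to the type itself (atttypid) is reset for a dropped column, while length, alignment and by-value flag are kept.

```sql
-- Physical properties kept for every attribute, dropped ones included.
SELECT attnum, attname, atttypid, attlen, attalign, attbyval, attisdropped
FROM   pg_attribute
WHERE  attrelid = 'foo'::regclass
AND    attnum > 0
ORDER  BY attnum;
```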

Last, let’s fire a full table rewrite, for example with VACUUM FULL:

testdb=> VACUUM FULL foo;
VACUUM
Time: 20231,232 ms (00:20,231)

testdb=> SELECT pg_size_pretty( pg_relation_size( 'foo' ) );
 pg_size_pretty 
----------------
 346 MB
(1 row)

Time: 1,519 ms
testdb=> SELECT attnum, attname, attisdropped 
         FROM pg_attribute a 
         JOIN pg_class c ON c.oid = a.attrelid 
         WHERE c.relname = 'foo' 
               AND c.relkind = 'r' 
               AND a.attnum > 0;
               
 attnum |           attname            | attisdropped 
--------|------------------------------|--------------
      1 | i                            | f
      2 | ........pg.dropped.2........ | t
(2 rows)

According to the time spent in VACUUM FULL something good must have happened, and in fact the table size was reduced to the right (or better, the original) amount of space.
But why is the dropped column still listed in pg_attribute?
In this particular case it could have been removed quite easily from pg_attribute too, but imagine a more complex table where you drop a column in the middle of the attribute list: PostgreSQL would also have to renumber all the following attributes, a quite expensive amount of work.
However, this approach has a potential drawback: since dropped attributes are still listed in pg_attribute like normal ones, they count as table attributes and therefore lower the number of real active attributes you can have in the table.
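
You can observe that the slot of a dropped column is never reused by adding a new column after the drop. This is only a sketch, and the column name t2 is made up for the example.

```sql
-- The new column should get attnum 3: position 2 stays with the dropped one.
ALTER TABLE foo ADD COLUMN t2 text;

SELECT attnum, attname, attisdropped
FROM   pg_attribute
WHERE  attrelid = 'foo'::regclass
AND    attnum > 0
ORDER  BY attnum;
```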

Conclusions

The PostgreSQL way of dropping a column is really fast because it involves only a catalog update. But that also means disk space is not reclaimed, so in order to get it back you need to trigger a full table rewrite.

Executing VACUUM by non-owner user

2020-02-06T00:00:00+00:00

VACUUM needs to be run by the object owner!

Executing VACUUM by non-owner user

The documentation about VACUUM clearly states it:

     To vacuum a table, one must ordinarily be the table's owner or a superuser. 
     However, database owners are allowed to vacuum all tables in their databases, 
     except shared catalogs. 
     [...]
     VACUUM cannot be executed inside a transaction block.

There is no ACL flag for VACUUM, which means you cannot GRANT someone else the permission to execute VACUUM.
Period.

Therefore there is no escape: in order to run VACUUM you must be either (i) the object owner or (ii) the database owner or, as you can imagine, (iii) one of the cluster superusers.
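
To find out whether you fall into one of the three cases, the catalogs already have all the answers. A small sketch against the foo table used in the rest of this article:

```sql
-- Who owns the table, who owns the database, and am I a superuser?
SELECT c.relname,
       pg_get_userbyid( c.relowner ) AS table_owner,
       pg_get_userbyid( d.datdba )   AS database_owner,
       current_user                  AS me,
       ( SELECT rolsuper
         FROM   pg_roles
         WHERE  rolname = current_user ) AS am_i_superuser
FROM   pg_class c, pg_database d
WHERE  c.relname = 'foo'
AND    c.relkind = 'r'
AND    d.datname = current_database();
```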

Why am I insisting on this? Because some friends of mine argued that it is always possible to escape restrictions with functions and the SECURITY DEFINER option. In this particular case, one could think of defining a function that executes VACUUM, applying the SECURITY DEFINER option so that the function runs as the object owner, and then providing (i.e., GRANT) execution permission to a normal user.
WRONG!
The fact that VACUUM cannot be executed within a transaction block means you cannot use such an approach, because a function is executed within a transaction block.
And if you are now asking yourself why VACUUM cannot be wrapped in a transaction block, just explain to me how to ROLLBACK a VACUUM execution: it will be an interesting and fantasyland explanation!

So, what is going to happen if you define a VACUUM-function? Let’s quickly see what the database does:

CREATE OR REPLACE FUNCTION
do_vacuum( t text )
RETURNS VOID
AS $$
BEGIN
  EXECUTE 'VACUUM FULL VERBOSE '
  || quote_ident( t );
END
$$
LANGUAGE plpgsql;

This will not work, since VACUUM cannot be invoked by a function (have I already written this?):

testdb=> select do_vacuum( 'foo' );
ERROR:  VACUUM cannot be executed from a function
CONTEXT:  SQL statement "VACUUM FULL VERBOSE foo"
PL/pgSQL function do_vacuum(text) line 3 at EXECUTE

Changing the function into a procedure does not solve the problem, because VACUUM cannot be invoked by a function (have I already written this?):

testdb=> CREATE OR REPLACE PROCEDURE
 do_vacuum( t text )
 AS $$
 BEGIN
   EXECUTE 'VACUUM FULL VERBOSE '
   || quote_ident( t );
 END
 $$
 LANGUAGE plpgsql;
CREATE PROCEDURE

testdb=> CALL do_vacuum( 'foo' );
ERROR:  VACUUM cannot be executed from a function
CONTEXT:  SQL statement "VACUUM FULL VERBOSE foo"
PL/pgSQL function do_vacuum(text) line 3 at EXECUTE

Conclusions

VACUUM cannot be wrapped in a transaction or in a routine, therefore in order to execute it you must be a “special” user, with special simply meaning the table owner, the database owner, or a superuser.

PL/PgSQL Exception and XIDs

2020-02-05T00:00:00+00:00

A few considerations on how exceptions are handled in PL/PgSQL.

PL/PgSQL Exception and XIDs

I read the blog post The strange case of the EXCEPTION block where the author was claiming that an EXCEPTION block in a PL/PgSQL function was incrementing the transaction id (xid).
Somehow, this was not very surprising to me.
Why? That immediately reminded me of my own question on the general mailing list, when I was observing a very similar behaviour within psql. In particular, this answer was illuminating:

   something is using subtransactions there.  
   My first guess would be that
   there are triggers with EXCEPTION blocks

My Guess About How Exceptions Are Handled

I think PL/PgSQL is using subtransactions (or savepoints) to handle exceptions.
Why?
Well, if you think about it, when you catch an exception you probably want to resume your execution, which means you must have a way to roll back your unit of work and start over again.

See Transactions in Action!

It is possible to inspect the transactions in action with a simple function and a table to abuse.
There is no need to play around with VACUUM FREEZE and age() as the original author says.
Let’s see the function:

CREATE OR REPLACE FUNCTION f_loop( b int DEFAULT 0, e int DEFAULT 10 )
RETURNS VOID
AS $$
BEGIN
  RAISE DEBUG 'TXID of the function (here should not be assigned) function: % %',
              txid_current_if_assigned(),
              txid_status( txid_current_if_assigned() );

  FOR f IN b .. e LOOP
    BEGIN
      RAISE DEBUG 'Before INSERT of % TXID: %  SNAPSHOT: %',
                  f,
                  txid_current_if_assigned(),
                  txid_current_snapshot();

      INSERT INTO foo( i ) VALUES( f );

      RAISE DEBUG 'After INSERT of % TXID: %  SNAPSHOT: %',
                  f,
                  txid_current_if_assigned(),
                  txid_current_snapshot();

    EXCEPTION
      WHEN UNIQUE_VIOLATION THEN
        RAISE DEBUG 'Exception for % TXID: %  SNAPSHOT: %',
                    f,
                    txid_current_if_assigned(),
                    txid_current_snapshot();
    END;
  END LOOP;
END;
$$
LANGUAGE plpgsql;

The function accepts a begin and an end index and loops through every value between them, trying to insert the value into a table. At every step, including the exception handler, we inspect txid_current_if_assigned(), which reports the transaction ID (xid), and txid_current_snapshot(), which provides the current snapshot, roughly the minimum and maximum xid this transaction is “flying” over.
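
The snapshot is printed as xmin:xmax:xip_list, and its bounds can also be extracted one by one, which makes the numbers easier to compare. A minimal sketch you can run in any session:

```sql
-- xid is NULL until the transaction writes something;
-- snap_xmin and snap_xmax are the bounds shown as "xmin:xmax:" in the snapshot.
SELECT txid_current_if_assigned()                     AS xid,
       txid_snapshot_xmin( txid_current_snapshot() )  AS snap_xmin,
       txid_snapshot_xmax( txid_current_snapshot() )  AS snap_xmax;
```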

The definition of the table is pretty straightforward: it has a single column that is also the PRIMARY KEY, and is therefore unique. That’s the constraint the function is going to violate.

CREATE TABLE foo ( i int PRIMARY KEY );
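
The function reports everything via RAISE DEBUG, and a session does not show such messages with the default settings. Assuming you did not already change it, you can lower the threshold for the current session only:

```sql
-- Show DEBUG messages in this session (the default level is notice).
SET client_min_messages TO debug1;
SHOW client_min_messages;
```

Expect some extra chatter from the server too, since every message at this level is now sent to the client.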

First Run: No Exceptions

Since the table is empty, inserting values from 1 to 10 does not produce any exception.

testdb=> SELECT f_loop( 1, 10 );  
DEBUG:  TXID of the function (here should not be assigned) function: <NULL> <NULL>
DEBUG:  Before INSERT of 1 TXID: <NULL>  SNAPSHOT: 4748:4748:
DEBUG:  After INSERT of 1 TXID: 4748  SNAPSHOT: 4748:4748:
DEBUG:  Before INSERT of 2 TXID: 4748  SNAPSHOT: 4748:4748:
DEBUG:  After INSERT of 2 TXID: 4748  SNAPSHOT: 4748:4748:
DEBUG:  Before INSERT of 3 TXID: 4748  SNAPSHOT: 4748:4748:
DEBUG:  After INSERT of 3 TXID: 4748  SNAPSHOT: 4748:4748:
DEBUG:  Before INSERT of 4 TXID: 4748  SNAPSHOT: 4748:4748:
DEBUG:  After INSERT of 4 TXID: 4748  SNAPSHOT: 4748:4748:
DEBUG:  Before INSERT of 5 TXID: 4748  SNAPSHOT: 4748:4748:
DEBUG:  After INSERT of 5 TXID: 4748  SNAPSHOT: 4748:4748:
DEBUG:  Before INSERT of 6 TXID: 4748  SNAPSHOT: 4748:4748:
DEBUG:  After INSERT of 6 TXID: 4748  SNAPSHOT: 4748:4748:
DEBUG:  Before INSERT of 7 TXID: 4748  SNAPSHOT: 4748:4748:
DEBUG:  After INSERT of 7 TXID: 4748  SNAPSHOT: 4748:4748:
DEBUG:  Before INSERT of 8 TXID: 4748  SNAPSHOT: 4748:4748:
DEBUG:  After INSERT of 8 TXID: 4748  SNAPSHOT: 4748:4748:
DEBUG:  Before INSERT of 9 TXID: 4748  SNAPSHOT: 4748:4748:
DEBUG:  After INSERT of 9 TXID: 4748  SNAPSHOT: 4748:4748:
DEBUG:  Before INSERT of 10 TXID: 4748  SNAPSHOT: 4748:4748:
DEBUG:  After INSERT of 10 TXID: 4748  SNAPSHOT: 4748:4748:
 f_loop 
--------
 
(1 row)

At the very beginning of the run the xid is NULL because the function has not (yet) modified anything. That’s why I use txid_current_if_assigned() instead of txid_current(): to avoid wasting a number. Once the function starts modifying the data (i.e., after the very first INSERT) the transaction is promoted from virtual to concrete and so an xid is assigned.
Since no exception at all is raised, the xid of the function is fixed and so is the snapshot.

Second Run: Half of Exceptions

Let’s run it with some numbers overlapping, so that roughly half of the values throw an exception and the others are inserted successfully.

testdb=> SELECT f_loop( 5, 15 ); 
DEBUG:  TXID of the function (here should not be assigned) function: <NULL> <NULL>
DEBUG:  Before INSERT of 5 TXID: <NULL>  SNAPSHOT: 4760:4760:
DEBUG:  Exception for 5 TXID: 4760  SNAPSHOT: 4760:4762:
DEBUG:  Before INSERT of 6 TXID: 4760  SNAPSHOT: 4760:4762:
DEBUG:  Exception for 6 TXID: 4760  SNAPSHOT: 4760:4763:
DEBUG:  Before INSERT of 7 TXID: 4760  SNAPSHOT: 4760:4763:
DEBUG:  Exception for 7 TXID: 4760  SNAPSHOT: 4760:4764:
DEBUG:  Before INSERT of 8 TXID: 4760  SNAPSHOT: 4760:4764:
DEBUG:  Exception for 8 TXID: 4760  SNAPSHOT: 4760:4765:
DEBUG:  Before INSERT of 9 TXID: 4760  SNAPSHOT: 4760:4765:
DEBUG:  Exception for 9 TXID: 4760  SNAPSHOT: 4760:4766:
DEBUG:  Before INSERT of 10 TXID: 4760  SNAPSHOT: 4760:4766:
DEBUG:  Exception for 10 TXID: 4760  SNAPSHOT: 4760:4767:
DEBUG:  Before INSERT of 11 TXID: 4760  SNAPSHOT: 4760:4767:
DEBUG:  After INSERT of 11 TXID: 4760  SNAPSHOT: 4760:4767:
DEBUG:  Before INSERT of 12 TXID: 4760  SNAPSHOT: 4760:4767:
DEBUG:  After INSERT of 12 TXID: 4760  SNAPSHOT: 4760:4767:
DEBUG:  Before INSERT of 13 TXID: 4760  SNAPSHOT: 4760:4767:
DEBUG:  After INSERT of 13 TXID: 4760  SNAPSHOT: 4760:4767:
DEBUG:  Before INSERT of 14 TXID: 4760  SNAPSHOT: 4760:4767:
DEBUG:  After INSERT of 14 TXID: 4760  SNAPSHOT: 4760:4767:
DEBUG:  Before INSERT of 15 TXID: 4760  SNAPSHOT: 4760:4767:
DEBUG:  After INSERT of 15 TXID: 4760  SNAPSHOT: 4760:4767:
 f_loop 
--------
 
(1 row)

As you can see, the first six values (5 to 10) raise an exception. The xid of the function remains the same, but the upper bound of the snapshot moves from 4760 to 4767, that is seven transaction identifiers (one for the function, six for the subtransactions). After that, the remaining five values are successfully inserted and the reported snapshot does not grow anymore.

Where are these Subtransactions?

If you now inspect the MVCC values for the table, you can see that every inserted value has a different transaction id in xmin, regardless of whether the function call caught an exception or not.

testdb=> SELECT xmin,xmax, cmin, cmax, * FROM foo;
 xmin | xmax | cmin | cmax | i  
------|------|------|------|----
 4749 |    0 |    0 |    0 |  1
 4750 |    0 |    1 |    1 |  2
 4751 |    0 |    2 |    2 |  3
 4752 |    0 |    3 |    3 |  4
 4753 |    0 |    4 |    4 |  5
 4754 |    0 |    5 |    5 |  6
 4755 |    0 |    6 |    6 |  7
 4756 |    0 |    7 |    7 |  8
 4757 |    0 |    8 |    8 |  9
 4758 |    0 |    9 |    9 | 10
 4767 |    0 |    6 |    6 | 11
 4768 |    0 |    7 |    7 | 12
 4769 |    0 |    8 |    8 | 13
 4770 |    0 |    9 |    9 | 14
 4771 |    0 |   10 |   10 | 15
(15 rows)

How to Simulate the Same Behavior

Savepoints do pretty much the same! Therefore, let’s truncate the table and insert new values in it with an explicit transaction and savepoints:

testdb=> TRUNCATE foo;
TRUNCATE TABLE
testdb=> BEGIN;
BEGIN
testdb=> 
testdb=> INSERT INTO foo( i ) VALUES( 1 );
INSERT 0 1
testdb=> SAVEPOINT S1;
SAVEPOINT
testdb=> 
testdb=> INSERT INTO foo( i ) VALUES( 2 );
INSERT 0 1
testdb=> SAVEPOINT S2;
SAVEPOINT
testdb=> 
testdb=> INSERT INTO foo( i ) VALUES( 3 );
INSERT 0 1
testdb=> SAVEPOINT S3;
SAVEPOINT
testdb=> 
testdb=> COMMIT;
COMMIT
testdb=> SELECT xmin,xmax, cmin, cmax, * FROM foo;
 xmin | xmax | cmin | cmax | i 
------|------|------|------|---
 4779 |    0 |    0 |    0 | 1
 4780 |    0 |    1 |    1 | 2
 4781 |    0 |    2 |    2 | 3
(3 rows)

As you can see, the xmin is incremented by every INSERT that follows a SAVEPOINT.
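
A savepoint is also what lets a transaction survive an error, which is exactly what an EXCEPTION block needs. The following is only a sketch of the same idea done by hand; the savepoint name is made up, and the second INSERT is expected to fail with a duplicate key error.

```sql
BEGIN;
INSERT INTO foo( i ) VALUES( 10 );

SAVEPOINT before_duplicate;
INSERT INTO foo( i ) VALUES( 10 );        -- fails: unique violation
ROLLBACK TO SAVEPOINT before_duplicate;   -- the "exception handler"

INSERT INTO foo( i ) VALUES( 11 );        -- the transaction goes on
COMMIT;

SELECT xmin, i FROM foo WHERE i >= 10;
```

Without the ROLLBACK TO SAVEPOINT, every statement after the failing INSERT would be rejected until the end of the transaction.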

Conclusions

Exceptions are quite clearly implemented in PL/PgSQL (and possibly in other languages) by means of subtransactions. At least, the behavior is pretty much reproducible.

Checking catalogues for corruption with pg_catcheck

2020-01-30T00:00:00+00:00

I just discovered a new utility for checking the health of a cluster.

Checking catalogues for corruption with pg_catcheck

Today I discovered a nice tool from EnterpriseDB named pg_catcheck that aims at checking the health of the PostgreSQL catalogs.
As you know, if the catalogs are damaged, the database can quickly get confused and not allow you to use it as you wish. Luckily, this is something that does not happen very often; I think I’ve seen it happening only once during my career (and I don’t remember the cause).
While I’m not sure I would be able to fix any problem in the catalogues by myself, having a tool that helps me understand if everything is fine is a relief!

Installing pg_catcheck on FreeBSD

You need to get it from the project repository. There is at the moment one official release, but let’s use the HEAD (after all, releases are for feeble people!).

% git clone https://github.com/EnterpriseDB/pg_catcheck.git
% cd pg_catcheck
% gmake
...
% sudo gmake install                                       
...

If everything works fine, you will end up with a program named pg_catcheck under /usr/local/bin.

Using pg_catcheck

As you can imagine, you need to connect as a database administrator to perform the check. The application supports pretty much the same connection options as psql, and there’s an extra option --postgresql to indicate you are running against a vanilla PostgreSQL (on the other hand, with --enterprisedb the program will assume you are running an EnterpriseDB instance).

% pg_catcheck --postgresql -U postgres template1
progress: done (0 inconsistencies, 0 warnings, 0 errors)

That’s it: if you see 0 inconsistencies your database is fine.

You can see what the program checks with the --verbose option:

% pg_catcheck --postgresql -U postgres  -v template1         
verbose: detected server version 120001
verbose: assuming PostgreSQL server
verbose: preloading table pg_authid because it is required in order to check pg_namespace
verbose: loading table pg_namespace
verbose: checking table pg_namespace (6 rows)
verbose: loading table pg_collation
verbose: checking table pg_collation (1110 rows)
verbose: loading table pg_tablespace
verbose: checking table pg_tablespace (2 rows)
verbose: loading table pg_language
verbose: checking table pg_language (4 rows)
verbose: loading table pg_database
verbose: checking table pg_database (4 rows)
verbose: loading table pg_largeobject_metadata
verbose: checking table pg_largeobject_metadata (0 rows)
verbose: loading table pg_publication
verbose: checking table pg_publication (0 rows)
verbose: loading table pg_subscription
verbose: checking table pg_subscription (0 rows)
verbose: loading table pg_default_acl
verbose: checking table pg_default_acl (0 rows)
verbose: loading table pg_largeobject
verbose: checking table pg_largeobject (0 rows)
verbose: loading table pg_db_role_setting
verbose: checking table pg_db_role_setting (0 rows)
verbose: loading table pg_auth_members
verbose: checking table pg_auth_members (8 rows)
verbose: preloading table pg_class because it is required in order to check pg_type
verbose: loading table pg_type
verbose: checking table pg_type (406 rows)
verbose: loading table pg_proc
verbose: checking table pg_proc (2960 rows)
verbose: loading table pg_operator
verbose: checking table pg_operator (770 rows)
verbose: loading table pg_ts_parser
verbose: checking table pg_ts_parser (1 rows)
verbose: loading table pg_ts_config
verbose: checking table pg_ts_config (22 rows)
verbose: loading table pg_ts_template
verbose: checking table pg_ts_template (5 rows)
verbose: loading table pg_ts_dict
verbose: checking table pg_ts_dict (22 rows)
verbose: loading table pg_foreign_data_wrapper
verbose: checking table pg_foreign_data_wrapper (0 rows)
verbose: loading table pg_foreign_server
verbose: checking table pg_foreign_server (0 rows)
verbose: loading table pg_cast
verbose: checking table pg_cast (216 rows)
verbose: loading table pg_conversion
verbose: checking table pg_conversion (132 rows)
verbose: loading table pg_extension
verbose: checking table pg_extension (1 rows)
verbose: loading table pg_enum
verbose: checking table pg_enum (0 rows)
verbose: loading table pg_user_mapping
verbose: checking table pg_user_mapping (0 rows)
verbose: loading table pg_event_trigger
verbose: checking table pg_event_trigger (0 rows)
verbose: loading table pg_rewrite
verbose: checking table pg_rewrite (126 rows)
verbose: loading table pg_attrdef
verbose: checking table pg_attrdef (0 rows)
verbose: loading table pg_policy
verbose: checking table pg_policy (0 rows)
verbose: loading table pg_publication_rel
verbose: checking table pg_publication_rel (0 rows)
verbose: loading table pg_statistic_ext
verbose: checking table pg_statistic_ext (0 rows)
verbose: loading table pg_transform
verbose: checking table pg_transform (0 rows)
verbose: loading table pg_attribute
verbose: checking table pg_attribute (2913 rows)
verbose: loading table pg_foreign_table
verbose: checking table pg_foreign_table (0 rows)
verbose: loading table pg_inherits
verbose: checking table pg_inherits (0 rows)
verbose: loading table pg_aggregate
verbose: checking table pg_aggregate (136 rows)
verbose: loading table pg_ts_config_map
verbose: checking table pg_ts_config_map (418 rows)
verbose: loading table pg_statistic
verbose: checking table pg_statistic (422 rows)
verbose: loading table pg_init_privs
verbose: checking table pg_init_privs (171 rows)
verbose: loading table pg_sequence
verbose: checking table pg_sequence (0 rows)
verbose: loading table pg_subscription_rel
verbose: checking table pg_subscription_rel (0 rows)
verbose: preloading table pg_am because it is required in order to check pg_opfamily
verbose: loading table pg_opfamily
verbose: checking table pg_opfamily (107 rows)
verbose: checking table pg_class (395 rows)
verbose: loading table pg_opclass
verbose: checking table pg_opclass (128 rows)
verbose: loading table pg_amop
verbose: checking table pg_amop (715 rows)
verbose: loading table pg_amproc
verbose: checking table pg_amproc (447 rows)
verbose: loading table pg_index
verbose: checking table pg_index (159 rows)
verbose: loading table pg_constraint
verbose: checking table pg_constraint (2 rows)
verbose: loading table pg_trigger
verbose: checking table pg_trigger (0 rows)
verbose: loading table pg_range
verbose: checking table pg_range (6 rows)
verbose: loading table pg_depend
verbose: checking table pg_depend (7601 rows)
verbose: loading table pg_shdepend
verbose: checking table pg_shdepend (16 rows)
verbose: loading table pg_description
verbose: checking table pg_description (4744 rows)
verbose: loading table pg_shdescription
verbose: checking table pg_shdescription (3 rows)
verbose: loading table pg_seclabel
verbose: checking table pg_seclabel (0 rows)
verbose: loading table pg_shseclabel
verbose: checking table pg_shseclabel (0 rows)
verbose: loading table pg_partitioned_table
verbose: checking table pg_partitioned_table (0 rows)
progress: done (0 inconsistencies, 0 warnings, 0 errors)

Thanks to EnterpriseDB for making pg_catcheck open source!

PostgreSQL 12 EXPLAIN SETTINGS

2019-12-05T00:00:00+00:00

PostgreSQL 12 has a very interesting feature to turn on when doing an execution plan analysis.

PostgreSQL 12 EXPLAIN SETTINGS

PostgreSQL 12 has a new option that can be turned on in EXPLAIN: SETTINGS. The option reports the configuration parameters that can affect an execution plan, but only those whose value differs from the built-in default.
What does it mean in practice? Let’s start with a plain old EXPLAIN:

digikamdb=> EXPLAIN (FORMAT YAML)
            SELECT * FROM digikam_images 
            WHERE id IN ( SELECT id FROM digikam_images 
                          WHERE modificationdate = '2019-10-04' );  

the output is as follows:

 - Plan:                                                  +
     Node Type: "Nested Loop"                             +
     Parallel Aware: false                                +
     Join Type: "Inner"                                   +
     Startup Cost: 0.29                                   +
     Total Cost: 1737.95                                  +
     Plan Rows: 17                                        +
     Plan Width: 87                                       +
     Inner Unique: true                                   +
     Plans:                                               +
       - Node Type: "Seq Scan"                            +
         Parent Relationship: "Outer"                     +
         Parallel Aware: false                            +
         Relation Name: "digikam_images"                  +
         Alias: "digikam_images_1"                        +
         Startup Cost: 0.00                               +
         Total Cost: 1596.72                              +
         Plan Rows: 17                                    +
         Plan Width: 8                                    +
         Filter: "(modificationdate = '2019-10-04'::date)"+
       - Node Type: "Index Scan"                          +
         Parent Relationship: "Inner"                     +
         Parallel Aware: false                            +
         Scan Direction: "Forward"                        +
         Index Name: "idx_id"                             +
         Relation Name: "digikam_images"                  +
         Alias: "digikam_images"                          +
         Startup Cost: 0.29                               +
         Total Cost: 8.31                                 +
         Plan Rows: 1                                     +
         Plan Width: 87                                   +
         Index Cond: "(id = digikam_images_1.id)"

The output is quite long, and the query is intentionally silly, just to generate some kind of loop. Please note that I’m using YAML as the output format because it is laid out better on a web page.

Let’s see SETTINGS in action, so change the EXPLAIN as follows:

digikamdb=> EXPLAIN (FORMAT YAML, SETTINGS ON)
            SELECT * FROM digikam_images 
            WHERE id IN ( SELECT id FROM digikam_images 
                          WHERE modificationdate = '2019-10-04' );  

that produces the very same output!
Why? Because nothing has changed, so nothing must be shown!
Now, let’s change a parameter or two:

digikamdb=> SET seq_page_cost TO 4;
digikamdb=> SET random_page_cost TO 1;

and see again the EXPLAIN in action:

digikamdb=> EXPLAIN (FORMAT YAML, SETTINGS ON)
            SELECT * FROM digikam_images 
            WHERE id IN ( SELECT id FROM digikam_images 
                          WHERE modificationdate = '2019-10-04' );  
...
 - Plan:                                                  +
     Node Type: "Nested Loop"                             +
     Parallel Aware: false                                +
     Join Type: "Inner"                                   +
     Startup Cost: 0.29                                   +
     Total Cost: 4353.95                                  +
     Plan Rows: 17                                        +
     Plan Width: 87                                       +
     Inner Unique: true                                   +
     Plans:                                               +
       - Node Type: "Seq Scan"                            +
         ...
       - Node Type: "Index Scan"                          +
         ...
   Settings:                                              +
     random_page_cost: "1"                                +
     seq_page_cost: "4"

As you can see, there is another section at the end of the output, titled Settings, that reminds us which parameters have been changed and what their current values are.

In this way, it is possible to get an idea of why a plan looks the way it does, or at least to be reminded that the system is running with non-default parameters.

Are all parameters affected?

Reading the documentation about SETTINGS, one could think that only the parameters relevant to the access methods used by the plan are reported in the output of EXPLAIN:

SETTINGS

    Include information on configuration parameters. 
    Specifically, include options affecting query planning 
    with value different from the built-in default value. 
    This parameter defaults to FALSE.

However, even parameters that are not going to change the query plan are displayed. For example, when selecting all the tuples, there is no need to know that the random page cost has changed, but it is displayed anyway:

digikamdb=> RESET ALL;
digikamdb=> SET seq_page_cost TO 2;
digikamdb=> SET random_page_cost TO 1;
digikamdb=> EXPLAIN (SETTINGS ON) SELECT * FROM digikam_images;
                              QUERY PLAN                              
----------------------------------------------------------------------
 Seq Scan on digikam_images  (cost=0.00..2364.58 rows=55258 width=87)
 Settings: random_page_cost = '1', seq_page_cost = '2'
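
To see why seq_page_cost matters for this plan while random_page_cost does not, it helps to redo the planner's arithmetic by hand. The following is a simplified Python sketch of the sequential scan cost formula (pages read times seq_page_cost, plus a CPU charge per row). The function name seq_scan_cost is made up for illustration, the CPU constants are assumed to be at their documented defaults, and the figure of 906 pages is an assumption back-computed from the costs shown above, not a measured value.

```python
# Simplified sequential scan cost, following the planner cost constants
# described in the PostgreSQL documentation (defaults assumed).
CPU_TUPLE_COST = 0.01       # default cpu_tuple_cost
CPU_OPERATOR_COST = 0.0025  # default cpu_operator_cost


def seq_scan_cost(pages, rows, seq_page_cost=1.0, filter_operators=0):
    """Estimate the total cost of a sequential scan.

    pages and rows are the planner's estimates for the table;
    filter_operators is the number of operators evaluated per row
    by the WHERE clause (0 for a plain SELECT *).
    """
    io_cost = pages * seq_page_cost
    cpu_cost = rows * (CPU_TUPLE_COST + filter_operators * CPU_OPERATOR_COST)
    return io_cost + cpu_cost


# Hypothetical table size: 906 pages is back-computed, not taken from the post.
PAGES, ROWS = 906, 55258

# Full scan with seq_page_cost = 2, as in the plan above.
print(round(seq_scan_cost(PAGES, ROWS, seq_page_cost=2), 2))

# Filtered scan with default costs, as in the first plan of this post.
print(seq_scan_cost(PAGES, ROWS, seq_page_cost=1, filter_operators=1))
```

Under these assumptions the first call gives the 2364.58 reported above, and the second lands within a rounding error of the 1596.72 of the filtered Seq Scan node. Note that random_page_cost never enters the formula, which is why reporting it for this query adds no information.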

Which parameters?

There are other parameters, besides the obvious cost ones, that can be reported in the Settings section. An example is work_mem. Reading the commit ea569d64ac7174d3fe657e3e682d11053ecf1866 reveals that all the options marked in the source code with GUC_EXPLAIN are candidates to be printed.
So far, this resolves to the following long list, where I tried to mark in bold those that I usually configure (and that I’ve seen touched by others):

enable_seqscan, enable_indexscan, enable_indexonlyscan, enable_bitmapscan;
temp_buffers, work_mem;
max_parallel_workers_per_gather, max_parallel_workers, enable_gathermerge;
effective_cache_size;
min_parallel_table_scan_size, min_parallel_index_scan_size;
enable_parallel_append, enable_parallel_hash, enable_partition_pruning;
enable_nestloop, enable_mergejoin, enable_hashjoin;
enable_tidscan;
enable_sort;
enable_hashagg;
enable_material;
enable_partitionwise_join;
enable_partitionwise_aggregate;
geqo;
optimize_bounded_sort;
parallel_leader_participation;
jit;
from_collapse_limit;
join_collapse_limit;
geqo_threshold;
geqo_effort;
geqo_pool_size;
geqo_generations;
effective_io_concurrency;

What about `auto_explain`?

The new SETTINGS option also affects auto_explain: there is a new GUC named auto_explain.log_settings that provides the same functionality as above for the auto_explain module.

Conclusions

The EXPLAIN (SETTINGS ON) new feature is something really cool in my opinion that pretty much every DBA should turn on when inspecting query execution plans.

PostgreSQL ascii logo for FreeBSD boot loader

2019-11-12T00:00:00+00:00

I spent some time making an elephant logo to be used as FreeBSD boot loader logo.

PostgreSQL ascii logo for FreeBSD boot loader

I use FreeBSD as my main PostgreSQL server, and also as a virtual machine for training courses. A long time ago, I changed the message of the day (/etc/motd) to show the elephant logo in ascii-art, so why not change the bootloader logo too?
FreeBSD by default shows what is called the orb or the devil (named beastie), and the new Lua based bootloader uses some simple string concatenation to generate a logo. However, it was not so simple to make a new logo, since I have no idea how to debug the bootloader, and that forced me into a very long and repetitive try and reboot process to identify all the problems with my logos.
In the end, I made it!
Now there are two available logos for the bootloader, a black-and-white elephant and a coloured one. Below you can see a couple of screenshots:

How to use the PostgreSQL bootloader logo

In order to use one of the logos, you have to:

download the Lua script from my Github repository; within the logos directory you can find the files
- logo-postgresql.lua that is the coloured version of the logo;
- logo-postgresqlbw.lua that is the black-and-white version of the logo;
put the chosen file into the /boot/lua directory and provide read permissions;

edit your /boot/loader.conf and add the setting loader_logo depending on the chosen logo

# for the coloured version
loader_logo="postgresql"
# or for the black-and-white version
# loader_logo="postgresqlbw"

and of course, reboot!

Why `cyan` instead of `blue`?

You have probably noticed that the coloured elephant is drawn in cyan and not in the well-known blue. The reason is that the console foreground blue is too dark to make the elephant stand out. It is possible to manipulate the escape sequences in order to get a different color, but please note that, for a reason I don’t know, bright colors (e.g., escape sequences like 94) do not work in the bootloader.

How to use the `/etc/motd` logo

In the logos directory there is also the motd file, or better, an example of message-of-the-day. Place it on your machine and customize it as you wish.

What about the ascii art?

Credit for the ascii art goes to Oleg Bartunov, even if I’m no longer able to find the message thread where he proposed the elephant logo. Thanks also to Charles Clavadetscher, who provided another version.

Feel free to contribute!

As usual, having posted the logos on my Github repository, any contribution and improvement is welcome.

PostgreSQL 12 Generated Columns

2019-11-04T00:00:00+00:00

PostgreSQL 12 provides support for automatically computed columns.

PostgreSQL 12 Generated Columns

PostgreSQL 12 introduces generated columns, that is columns that are automatically computed from a generation expression.
The usage of generated columns is quite simple and can be summarized as follows:

the column must be annotated with the GENERATED ALWAYS AS (...) STORED instruction;
the expression in parentheses must use only IMMUTABLE functions and cannot use subqueries.
For more specific constraints, see the official documentation.

Please note I’ve indicated the STORED clause because at the moment PostgreSQL supports only that kind of column: a STORED generated column is saved on disk as a normal column would be, the only difference being that you cannot modify it yourself, since the database computes it for you.

You can think of a stored generated column as a trade-off between a table with a trigger and a materialized view. When VIRTUAL (as opposed to STORED) is implemented, the column will take no space at all and will be computed on each access, similarly to a view.

An example of not-generated column

Let’s see generated columns in action: consider an ordinary table with a dependency between the age column and the birthday one, since the former can be computed from the values in the latter column:

testdb=> CREATE TABLE people( 
                  name text, 
                  birthday date, 
                  age int );

testdb=> WITH year AS ( 
   SELECT ( random() * 100 )::int % 70 AS y 
)
INSERT INTO people( name, age, birthday )
SELECT 'Person ' || v, y, current_date - ( y * 365 )
FROM generate_series(1, 1000000 ) v, year;

Let’s see how much space such a table occupies once it is filled with one million rows:

testdb=> SELECT pg_relation_size( 'people' );
 pg_relation_size 
------------------
         52183040

An example with generated columns

Let’s now create a similar table where age is automatically computed.
Since the generation expression must use only IMMUTABLE functions, the first step is to wrap the computation in a function:

testdb=> CREATE OR REPLACE FUNCTION 
f_person_age( birthday date )
RETURNS int
AS $CODE$
BEGIN
    RETURN extract( year FROM CURRENT_DATE )
           - extract( year FROM birthday )
           + 1;
END
$CODE$
LANGUAGE plpgsql IMMUTABLE;

Then it is possible to create the table using the function as generation method:

testdb=> CREATE TABLE people_gc_stored ( 
      name text, 
      birthday date, 
      age int GENERATED ALWAYS AS ( f_person_age( birthday ) ) STORED
  );

If the table is filled in a similar way, the space occupied is the same:

testdb=> INSERT INTO people_gc_stored( name, birthday )
         SELECT 'Person ' || v, current_date - v 
         FROM generate_series(1, 1000000 ) v;

testdb=> SELECT pg_relation_size( 'people_gc_stored' );
 pg_relation_size 
------------------
         52183040

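Why use a function in the generated column? Because if we place the expression directly in the column definition, we get an error at creation time. Keep in mind that wrapping the expression in a function declared IMMUTABLE only gets past this check: the function still reads CURRENT_DATE, so the stored age is the one computed when the row was written and it does not change as time passes. Here is the error: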

testdb=> CREATE TABLE people_gc_stored ( 
      name text, 
      birthday date, 
      age int GENERATED ALWAYS AS ( 
              extract( year FROM CURRENT_DATE ) 
              - extract( year FROM birthday ) 
              + 1 ) STORED
  );
  
ERROR:  generation expression is not immutable
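
The expression is rejected because of CURRENT_DATE, whose value changes from one day to the next. A small Python sketch of the same arithmetic makes that hidden input visible; the name person_age and the explicit today parameter are hypothetical, introduced only for illustration:

```python
from datetime import date


def person_age(birthday, today=None):
    """Same arithmetic as f_person_age: year difference plus one.

    today plays the role of CURRENT_DATE; passing it explicitly shows
    that the result is not a function of birthday alone.
    """
    if today is None:
        today = date.today()
    return today.year - birthday.year + 1


birthday = date(1980, 5, 1)
print(person_age(birthday, today=date(2019, 11, 4)))  # → 40
print(person_age(birthday, today=date(2020, 11, 4)))  # → 41
```

The same birthday yields two different ages depending on when the computation runs, which is exactly what an immutable expression is not allowed to do.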

Writing the generated column

As already mentioned, the generated column cannot be written directly:

testdb=> UPDATE people_gc_stored SET age = 40;
ERROR:  column "age" can only be updated to DEFAULT
DETAIL:  Column "age" is a generated column.

Querying the generated column

The generated column behaves as a normal column; for instance, access to it can be granted or revoked:

testdb=# REVOKE ALL ON people_gc_stored FROM public;
testdb=# GRANT SELECT( name, age ) ON people_gc_stored TO harry;

Since user harry has access only to the columns name and age, he cannot see the column that age depends on:

testdb=> SELECT * FROM luca.people_gc_stored LIMIT 5;
ERROR:  permission denied for table people_gc_stored

testdb=> SELECT min( age ), max( age ) FROM luca.people_gc_stored;
 min | max  
-----|------
   1 | 2740
(1 row)

testdb=> SELECT min( birthday ), max( birthday ) FROM luca.people_gc_stored;
ERROR:  permission denied for table people_gc_stored

On the other hand, granting access only to the birthday column does not automatically provide access to age:

testdb=# REVOKE SELECT ON people_gc_stored FROM harry;
testdb=# GRANT SELECT( name, birthday ) ON people_gc_stored TO harry;

testdb=> SELECT min( birthday ), max( birthday ) FROM luca.people_gc_stored;
      min      |    max     
---------------|------------
 0720-12-07 BC | 2019-11-03
(1 row)

testdb=> SELECT min( age ), max( age ) FROM luca.people_gc_stored;
ERROR:  permission denied for table people_gc_stored

PostgreSQL 12 package on FreeBSD

2019-11-04T00:00:00+00:00

PostgreSQL 12 is available as binary package on FreeBSD, but not in the quarterly update.

PostgreSQL 12 package on FreeBSD

If you need to install PostgreSQL 12 on FreeBSD, please consider that it has not yet reached the quarterly pkg(1) update; therefore, if you install it via pkg(1) you will get PostgreSQL 12 rc1. In the ports tree, however, PostgreSQL is already at version 12 (release).
This is because, since FreeBSD 12, the default repository for packages is quarterly, which in short means that packages are older than the ports tree.

In order to install the official release, a new URL for the FreeBSD repository must be set up. The repository URL is placed into the file /etc/pkg/FreeBSD.conf:

FreeBSD: {
  url: "pkg+http://pkg.FreeBSD.org/${ABI}/quarterly",
  mirror_type: "srv",
  signature_type: "fingerprints",
  fingerprints: "/usr/share/keys/pkg",
  enabled: yes
}

The pkg(1) configuration allows overriding the default URL by placing a file named /usr/local/etc/pkg/repos/FreeBSD.conf that overrides the properties of the above, with the following content:

FreeBSD: {
  url: "pkg+http://pkg.FreeBSD.org/${ABI}/latest"
}

After that, the repository can be updated and new packages will be available. Therefore, run:

% sudo pkg update
% sudo pkg install postgresql12-client-12 \
                   postgresql12-contrib-12 \
                   postgresql12-docs-12 \
                   postgresql12-plperl-12 \
                   postgresql12-server-12

Installing PostgreSQL on FreeBSD via Ansible

2019-10-30T00:00:00+00:00

My very simple attempt at keeping PostgreSQL up-to-date on FreeBSD machines.

Installing PostgreSQL on FreeBSD via Ansible

I’m slowly moving to Ansible to manage my machines, and one problem I’m trying to solve in the best possible way is how to keep PostgreSQL up-to-date.
In the case of FreeBSD machines, pkgng is the module to use, and in the past I used this very simple playbook snippet:

- name: PostgreSQL 11
  become: yes
  with_items:
    - server
    - contrib
    - client
    - plperl
  pkgng:
    name: postgresql11-{{ item }}
    state: latest
    

However, there is a very scary warning message when running the above:

TASK [PostgreSQL 11] 
[DEPRECATION WARNING]: Invoking "pkgng" only once while using a loop via squash_actions is deprecated. Instead of using a loop to supply multiple 
items and specifying `name: "postgresql11-"`, please use `name: ['server', 'contrib', 'client', 'plperl']` and remove the loop. This 
feature will be removed in version 2.11. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.

That’s easy to fix, but also annoying (at least to me), because I have to change the above snippet to the following one:

- name: PostgreSQL 11
  become: yes
  pkgng:
    name:
      - postgresql11-server
      - postgresql11-contrib
      - postgresql11-client
      - postgresql11-plperl
    state: latest

So far, the best solution I’ve found to keep readability is to use a variable to hold the PostgreSQL version I want, together with the list of packages I need:

vars:
  pg_version: 11
  pg_components:
    - postgresql{{ pg_version }}-server
    - postgresql{{ pg_version }}-contrib
    - postgresql{{ pg_version }}-client
    - postgresql{{ pg_version }}-plperl


tasks:
    - name: PostgreSQL {{ pg_version }}
      become: yes
      pkgng:
        name: "{{ pg_components }}"
        state: latest
        

pgenv: adjust your PATH!

2019-10-25T00:00:00+00:00

A few days ago we added an option that suggests changes to your PATH to prevent version clashes.

pgenv: adjust your PATH!

Below you can find another quick video that demonstrates how easy it is to get, almost automatically, a PostgreSQL 12 instance up and running on your local machine using pgenv.

Please note also that, at time 5:35, you will see how pgenv suggests that you adjust your PATH environment variable in order to use the freshly installed binaries for the cluster. The idea behind this suggestion is to prevent you from using a system-wide binary, e.g., psql, that may be incompatible with the cluster in use.

PostgreSQL 12 beta 4 up and running in less than six minutes

2019-09-19T00:00:00+00:00

How hard can it be to grab a copy of PostgreSQL 12 (still in beta) and install on your computer for testing, without having to deal with your existing database?

PostgreSQL 12 beta 4 up and running in less than six minutes

I have made a very short and, to some extent, boring video to demonstrate how pgenv can simplify the installation of PostgreSQL 12 beta 4 (as well as other versions, of course).

The video shows how automated the installation of the beta version on a FreeBSD machine can be. For the very impatient, the commands are essentially:

% pgenv build 12beta4
% pgenv use 12beta4
% psql -h localhost -U postgres template1

where the last command is, of course, just the proof that everything is up and running.

As you will see, most of the time is spent in the actual compilation of the software. The value added by pgenv is that you don’t have to deal with download links and commands to initialize your database. And once you are done, you can simply nuke the pgsql-12beta4 directory, which removes both binaries and data.

Of course, pgenv can do a lot more than just downloading and compiling PostgreSQL, but the above demonstrates how it simplifies even the boring setup tasks.

New Release of PL/Proxy

2019-09-16T00:00:00+00:00

There is a new release of PL/Proxy out there!

New Release of PL/Proxy

There is a new exciting release of PL/Proxy: version 2.9 was released a few hours ago!

This is an important release because it adds support for the upcoming PostgreSQL 12. The main problem with PostgreSQL 12 has been that Oid is now a regular column, meaning that HeapTupleGetOid is no longer a valid macro. I first proposed a patch that was based on the C preprocessor to handle the differences with older PostgreSQL versions.

The solution implemented by Marko Kreen is of course much more elegant and is based on defining helper functions that are pre-processed depending on the PostgreSQL version.

Enjoy proxying!

Compute day working hours in PL/pgsql

2019-08-30T00:00:00+00:00

How many working hours are there in a range of dates?

Compute day working hours in PL/pgsql

A few days ago there was a very nice thread in the pgsql-general mailing list asking for ideas about how to compute working hours in a month.
The idea is quite simple: you must extract the working days (let’s say excluding Sundays), multiply each of them by the number of hours per day, and then get the sum.
There are a lot of nice and almost one-liner solutions in the thread, so I strongly encourage you to read it all!

I came up with my own solution, that is based on functions, and here I’m going to explain it hoping it can be useful (at least as a starting point).

You can find the code, as usual, on my GitHub repository related to PostgreSQL.

The workhorse function

One reason I decided to implement the algorithm as a function is that I want it to be configurable. There are people, like me, whose working hours differ on a day-by-day basis. So, assuming the more general problem of computing the working hours between two dates, here is a possible implementation:

CREATE OR REPLACE FUNCTION 
compute_working_hours( begin_day DATE,
                        end_day DATE,
                        _saturday boolean DEFAULT false,
                        _hour_template real[] DEFAULT ARRAY[ 8, 8, 8, 8, 8, 8, 8 ]::real[],
                        _exclude_days date[] DEFAULT NULL )
RETURNS real
AS $CODE$
DECLARE
  working_hours real := 0;
  working_days daterange;
  current_day date;
  current_day_hours real;
  skip boolean;
BEGIN
  -- check arguments
  IF begin_day IS NULL
     OR end_day IS NULL
     OR begin_day >= end_day THEN
     RAISE EXCEPTION 'Please check dates';
  END IF;


  IF _hour_template IS NULL THEN
     _hour_template := ARRAY[ 8, 8, 8, 8, 8, 8, 8 ]::real[];
  END IF;
  WHILE array_length( _hour_template, 1 ) < 7 LOOP
    _hour_template := array_append( _hour_template, 8 );
  END LOOP;

   -- create the working period date range
  working_days = daterange( begin_day, end_day, '[]');

  RAISE DEBUG 'Working days in the range %', working_days;

  current_day := lower( working_days );
  LOOP
     -- skip sundays
     skip := EXTRACT( dow FROM current_day ) = 0;
     -- skip saturdays if required
     skip := skip OR  ( NOT _saturday AND EXTRACT( dow FROM current_day ) = 6 );

     -- skip this particular day if specified
     skip := skip OR ( _exclude_days IS NOT NULL AND _exclude_days @> ARRAY[ current_day ] );

     IF NOT skip THEN
        -- dow goes from 0 (sunday) to 6 (saturday), while arrays are 1-based
        current_day_hours := _hour_template[ EXTRACT( dow FROM current_day )::int + 1 ];
     ELSE
        current_day_hours := 0;
     END IF;

     RAISE DEBUG 'Day % counting % working hours',
                 current_day,
                 current_day_hours;

     working_hours := working_hours + current_day_hours;
     current_day   := current_day + 1;
     EXIT WHEN NOT current_day <@ working_days;
  END LOOP;


  -- all done
  RETURN working_hours;


END
$CODE$
LANGUAGE plpgsql;

Let’s consider the arguments: the first two are the dates you want to inspect, then there’s a boolean that indicates if Saturday is a working day or not. The _hour_template is a template with the amount of hours within each day (Sunday first, which can be any value since Sundays are never working days, or at least I would like it to be so!). Last, there is an array of days to exclude from the computation (holidays, vacation, and so on).

The function computes a working_days date range including the begin and end date, and then uses a current_day single date to iterate within the range. In the main loop, there are checks to skip the current day in the case it is a Sunday, or a Saturday (when Saturdays are not working days), or it is included in the array of excluded days.
Then the tricky part: if the day has to be excluded, the working hours will be zero, otherwise the working hours will be extracted from the hour template. Working hours are then summed together.

Let’s see this in action:

testdb=# select compute_working_hours( current_date, 
     current_date + 3, 
     false, NULL, ARRAY[ '2019-08-28' ]::date[] );
DEBUG:  Working days in the range [2019-08-28,2019-09-01)
DEBUG:  Day 2019-08-28 counting 0 working hours
DEBUG:  Day 2019-08-29 counting 8 working hours
DEBUG:  Day 2019-08-30 counting 8 working hours
DEBUG:  Day 2019-08-31 counting 0 working hours
compute_working_hours
-----------------------
                    16

I wish not to work on my beautiful wife’s birthday, so within the three days I’m supposed to work only two and get 16 hours.

As you have probably noticed, the hour template is expressed as real values, so that it is possible to express even fractions of an hour, like 8.5 to indicate eight and a half hours. Using the time type would probably have been a better choice, at the price of a little complication in the final sum, so I’m not yet convinced about providing such an implementation.
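
For those who want to play with hour templates outside the database, the main loop can be sketched in a few lines of Python. This is only an illustration of the logic described above, not code from the repository: it indexes the template with Sunday in position 0, and the names and defaults simply mirror the PL/pgSQL arguments.

```python
from datetime import date, timedelta


def compute_working_hours(begin_day, end_day, saturday=False,
                          hour_template=None, exclude_days=None):
    """Sum the working hours between two dates, both included."""
    if begin_day >= end_day:
        raise ValueError("Please check dates")

    # pad the template to seven entries, sunday first, 8 hours by default
    template = list(hour_template or [])
    template += [8.0] * (7 - len(template))
    excluded = set(exclude_days or [])

    total = 0.0
    current = begin_day
    while current <= end_day:
        dow = current.isoweekday() % 7  # 0 = sunday ... 6 = saturday
        skip = (dow == 0
                or (dow == 6 and not saturday)
                or current in excluded)
        if not skip:
            total += template[dow]
        current += timedelta(days=1)
    return total


# same call as the example above: three days after 2019-08-28, skipping the 28th
print(compute_working_hours(date(2019, 8, 28), date(2019, 8, 31),
                            exclude_days=[date(2019, 8, 28)]))  # → 16.0
```

Changing an entry of the template, for instance 8.5 in position 5 for Fridays, is enough to try out different weekly schedules.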

Back to the real problem: computing within a month

Having the above function in place, it is now possible to overload it and provide a function that computes the working hours in a single month of the year:

CREATE OR REPLACE FUNCTION 
compute_working_hours( _year int,
                       _month int,
                       _saturday boolean DEFAULT false,
                       _hour_template real[] DEFAULT ARRAY[ 8, 8, 8, 8, 8, 8, 8 ]::real[],
                       _exclude_days int[] DEFAULT null )
RETURNS real
AS $CODE$
DECLARE
  _exclude_days_as_dates date[];
  current_index int;
BEGIN
  -- check arguments
  IF _year IS NULL THEN
     _year := extract( year FROM CURRENT_DATE );
  END IF;

  IF _month IS NULL THEN
     _month := extract( month FROM CURRENT_DATE );
  END IF;

  IF _exclude_days IS NOT NULL THEN
    FOR current_index IN 1 .. array_upper( _exclude_days, 1 ) LOOP
     _exclude_days_as_dates := array_append( _exclude_days_as_dates,
                                             make_date( _year, _month, _exclude_days[ current_index ] ) );
    END LOOP;
  END IF;

  RETURN compute_working_hours( make_date( _year, _month, 1),
                              ( make_date( _year, _month, 1) + '1 month - 1 day'::interval )::date,
                                _saturday,
                                _hour_template,
                                _exclude_days_as_dates );

END
$CODE$
LANGUAGE plpgsql;

As you can see, the function asks for the year and month (as well as other parameters like the hour template), computes the range of dates for the specified month and delegates to the former implementation the computation.

One part I’m not really proud of is the _exclude_days parameter, which in this version is an array of integers that I then have to convert into an array of dates. On one hand, I wanted the function to have coherent parameters: if I specify a single month and want to skip day 28, I already know that’s the 28th day of that month, so asking the user to input a full date is just noise. On the other hand, the loop that converts _exclude_days into an array of dates named _exclude_days_as_dates is less than elegant!
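
Just to show the two date manipulations in isolation, here is a minimal Python sketch of the month boundaries and of the conversion from day numbers to dates. The helper names month_bounds and days_to_dates are hypothetical and do not exist in the code of this post.

```python
import calendar
from datetime import date


def month_bounds(year, month):
    """Return the first and last day of the given month."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


def days_to_dates(year, month, days):
    """Turn day-of-month numbers into full dates within that month."""
    return [date(year, month, day) for day in (days or [])]


begin_day, end_day = month_bounds(2019, 8)
print(begin_day, end_day)                       # → 2019-08-01 2019-08-31
print(days_to_dates(2019, 8, [12, 15, 29]))
```

As in the PL/pgSQL version, a day number that does not exist in the month (for example 31 in a 30-day month) raises an error instead of being silently ignored.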

By the way, how is this invoked?

testdb=# SELECT compute_working_hours( NULL, 
                                       NULL, 
                                       true, 
                                       NULL,  
                                       ARRAY[12, 15 ,29] );
DEBUG:  Working days in the range [2019-08-01,2019-09-01)
DEBUG:  Day 2019-08-01 counting 8 working hours
DEBUG:  Day 2019-08-02 counting 8 working hours
DEBUG:  Day 2019-08-03 counting 8 working hours
...
 compute_working_hours
-----------------------
                   192

And yes, I love defaults, so pretty much every parameter can be omitted altogether and the function still returns a pretty decent result.
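The defaults pair nicely with named notation, which lets a call pass only what differs from them. This is a sketch that assumes the function has been created exactly as shown above; the result depends on the underlying date-range implementation, so no output is reported here:

```sql
-- only year, month and excluded days are passed;
-- _saturday and _hour_template fall back to their defaults
SELECT compute_working_hours( _year         => 2019,
                              _month        => 8,
                              _exclude_days => ARRAY[ 12, 15, 29 ] );
```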

PgBouncer gets SCRAM!

2019-08-30T00:00:00+00:00

A few days ago a new version of PgBouncer was released, with the addition of SCRAM support!

PgBouncer gets SCRAM!

Three days ago PgBouncer 1.11 was released, and one feature that immediately caught my attention was the addition of SCRAM support for passwords.

SCRAM is currently the most secure way to use passwords for PostgreSQL authentication and has been around since version 10 (so nearly two years). SCRAM support in PgBouncer has been a wanted feature for a while, since its absence prevented users of this great tool from using SCRAM on their clusters.

Luckily, this has now been implemented, and the configuration of the PgBouncer account is similar to the plain and md5 ones, so it is very simple.

I really love PgBouncer and, with this addition, I can now upgrade my servers to SCRAM! Thank you PgBouncer developers!
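On the PostgreSQL side, moving a role to SCRAM means storing its password with the new hashing scheme. A minimal sketch follows, where the role name app_user and the password are made up, and where pg_hba.conf and the PgBouncer authentication settings are assumed to be adjusted separately:

```sql
-- hash new passwords with SCRAM from now on (session level here)
SET password_encryption = 'scram-sha-256';

-- re-set the password so that it gets stored as a SCRAM verifier
ALTER ROLE app_user PASSWORD 'a-new-secret';

-- check which roles already have a SCRAM verifier (requires superuser)
SELECT rolname,
       rolpassword LIKE 'SCRAM-SHA-256$%' AS uses_scram
FROM   pg_authid
WHERE  rolpassword IS NOT NULL;
```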

PL/Proxy on PostgreSQL 12 ?

2019-08-27T00:00:00+00:00

I spent some more time on the PL/Proxy code base in order to make it compile against the upcoming PostgreSQL 12.

PL/Proxy on PostgreSQL 12 ?

In yesterday’s blog post I reported some stupid thoughts about compiling PL/Proxy against PostgreSQL 12.
I was too hasty in dealing with the removal of HeapTupleGetOid (as of commit 578b229718e8f15fa779e20f086c4b6bb3776106), and after having read the commit message more carefully, I found how to fix the code (at least I hope so!).

Essentially, wherever I found a usage of HeapTupleGetOid I placed a preprocessor conditional to extract the Form_pg_ structure and use the normal oid column instead, something like:

#if PG_VERSION_NUM < 120000
  Oid namespaceId = HeapTupleGetOid(tup);
#else
  Form_pg_namespace form = (Form_pg_namespace) GETSTRUCT(tup);
  Oid       namespaceId  = form->oid;
#endif

I strongly advise against using this in production, at least until one of the PL/Proxy authors has had a look at the code! However, the tests pass on PostgreSQL 12beta2 on Linux.

You can find the pull request, which also includes my previous pull request to make PL/Proxy work against PostgreSQL 11 and FreeBSD.
I hope it can help push a new release of this tool.

PL/Proxy on PostgreSQL 11 and FreeBSD 12

2019-08-26T00:00:00+00:00

PL/Proxy is a procedural language implementation that makes it really easy to do database proxying, and sharding as a consequence. Unluckily, getting it to run on PostgreSQL 11 and FreeBSD 12 does not come for free.

PL/Proxy on PostgreSQL 11 and FreeBSD 12

PL/Proxy is a project that allows database proxying, that is a way to connect to remote databases, and as a consequence allows for sharding implementations.
The idea behind PL/Proxy is as simple as it is elegant: define a minimalistic language to access remote (database) objects and, in particular, execute queries.

Unluckily, the latest stable release of PL/Proxy is 2.8 and dates back to October 2017, that means PostgreSQL 10! There are a couple of pull requests to make it work against PostgreSQL 11, but they have not been merged and the project seems to be on pause.

Today I created a cumulative pull request that makes a few little adjustments to allow compilation on FreeBSD 12 against PostgreSQL 11.

My pull request is inspired by, and borrows changes from, two other pull requests:

pr-31 and credits to Laurenz Albe;
pr-33 that has been merged into mine, and credits to Christoph Berg.

Then I added a compiler flag to adjust headers on FreeBSD 12, and dropped an old Bison syntax, since this should be safe enough on modern PostgreSQL (at least 9.6 and higher). A few bits here and there to make all the tests pass against PostgreSQL 11, and everything seems right now.
It is important to warn that my version is not production ready, because it should be reviewed by at least one PL/Proxy developer.

And what about PostgreSQL 12?

Well, PostgreSQL 12 drops the usage of the special column Oid in catalogs, with commit 578b229718e8f15fa779e20f086c4b6bb3776106. What this means is that the macro HeapTupleGetOid is no longer there, and PL/Proxy makes heavy use of it. I’ve tried to blindly substitute it with ->t_tableOid, but this does not seem to work, since the tests fail to look up objects. So any suggestion here is welcome!

yum upgrade postgresql11 panic!

2019-07-22T00:00:00+00:00

I have to say, I don’t use CentOS very much and I’m not a good user of systemd, that is the reason why I got five minutes of pure fear!

yum upgrade postgresql11 panic!

How hard could it be to upgrade PostgreSQL within minor versions?
Usually it is very simple, but not when you don’t know your tools!
And in this case that’s my fault.
However, I’m writing this short note to help other people avoid the same problem I had.

The current setup

The machine is a CentOS 7 running PostgreSQL 11.1 installed from the packages provided by the PostgreSQL Global Development Group.

Preparing to upgrade

Of course, I took a full backup before proceeding, just in case. The cluster I’m talking about is a low traffic cluster with roughly 12 GB of data, which means that backup and restore do not come with zero downtime (and no, I’m not in the position of having a WAL based backup, but that’s another story).
Having a backup helps keep the amount of panic at a fair level.

Performing the upgrade

I do like yum(8) and its transactional approach. Doing the upgrade was a matter of:

% sudo yum upgrade postgresql11

and all dependencies are, of course, calculated and applied. Then I confirmed, waited a couple of minutes for the upgrade to apply, and started holding my breath:

psql: could not connect to server: Connection refused
        Is the server running on host "xxx" (192.168.222.123) and accepting
        TCP/IP connections on port 5432?

Inspecting and solving the problem

Apparently PostgreSQL had not been restarted after the upgrade, but what is worse is that it was not going to start again:

33:25 lnx168 systemd[1]: Starting PostgreSQL 11 database server...
33:25 lnx168 postgresql-11-check-db-dir[10214]: "/var/lib/pgsql/11/data/" is missing or empty.
33:25 lnx168 postgresql-11-check-db-dir[10214]: Use "/usr/pgsql-11/bin/postgresql-11-setup initdb" to initialize the database cluster.
33:25 lnx168 postgresql-11-check-db-dir[10214]: See /usr/share/doc/postgresql11-11.4/README.rpm-dist for more information.
33:25 lnx168 systemd[1]: postgresql-11.service: control process exited, code=exited status=1
33:25 lnx168 systemd[1]: Failed to start PostgreSQL 11 database server.

What the hell! (I’m allowed to say it out loud because my colleague was on vacation and I was alone in my office).
First of all, do not run initdb as suggested, because chances are you will destroy all your data. But that’s a good hint about the problem: systemd was trying to launch PostgreSQL with an empty PGDATA.

Of course, the PGDATA was not empty and was still in place, but yum upgraded my systemd configuration for PostgreSQL to the CentOS default, therefore my file /usr/lib/systemd/system/postgresql-11.service was overwritten without any warning!

And in fact, to confirm the above, I was able to start the server manually using pg_ctl, and at least I had the server running.

Now that the server is running, I have more time to inspect /usr/lib/systemd/system/postgresql-11.service and adjust the PGDATA parameter to the right value:

% sudo grep PGDATA /usr/lib/systemd/system/postgresql-11.service
Environment=PGDATA=/data/pgdata

I also double checked that the systemd startup script correctly links to the edited file:

$ ls -l /etc/systemd/system/multi-user.target.wants/postgresql-11.service
lrwxrwxrwx 1 root root 45 20 dic  2018 /etc/systemd/system/multi-user.target.wants/postgresql-11.service 
                                           -> /usr/lib/systemd/system/postgresql-11.service

Seems fine, right?

Nested problems

No matter how fine the setup was, systemd still refused to restart the cluster:

$ sudo service postgresql-11 restart                      
Redirecting to /bin/systemctl restart postgresql-11.service
Job for postgresql-11.service failed because the control process exited with error code. See "systemctl status postgresql-11.service" and "journalctl -xe" for details.

For a reason I don’t really know, it seems that systemd keeps track of the fact that it did not start the service itself, and that the latter is in a failed state. The solution was to manually stop the cluster via pg_ctl and then ask systemd to start it again; this time it got running.

Fixing the problem with systemd: the right approach

updated on 2019-07-22

As pointed out by Andrew Gierth in a comment, editing the systemd unit file is not the right approach to configure services. Here is the right approach, so that my changes do not get overwritten by an upgrade: 1) run systemctl edit postgresql-11; 2) add a line with Environment=PGDATA=/data/pgdata within the Service section:

[Service]
Environment=PGDATA=/data/pgdata

3) inspect the service with systemctl status postgresql-11, that will show the following:

$ systemctl status postgresql-11
● postgresql-11.service - PostgreSQL 11 database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql-11.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/postgresql-11.service.d
           └─override.conf
   Active: active (running) since lun 2019-07-22 15:43:50 CEST; 31s ago
     Docs: https://www.postgresql.org/docs/11/static/
 Main PID: 16114 (postmaster)
   CGroup: /system.slice/postgresql-11.service
           ├─16114 /usr/pgsql-11/bin/postmaster -D /postgres/data
           ├─16116 postgres: logger   
           ├─16118 postgres: checkpointer   
           ├─16119 postgres: background writer   
           ├─16120 postgres: walwriter   
           ├─16121 postgres: autovacuum launcher   
           ├─16122 postgres: stats collector   
           ├─16123 postgres: pg_cron scheduler   
           └─16124 postgres: logical replication launcher   

The important part in the above is the Drop-In line, which points to a freshly created directory /etc/systemd/system/postgresql-11.service.d with a single file, override.conf, that contains the new PGDATA definition. In other words, systemd keeps the service units under its own control, and you have to create an override.conf file to set different variable values.
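Once the service has been restarted through systemd, a quick double check from within the database can confirm that the running instance is really using the intended directory. This is just a sketch and assumes a role allowed to read the setting (for example a superuser):

```sql
-- where is the running cluster reading its data from?
SHOW data_directory;

-- when was the postmaster started? useful to see if the restart really happened
SELECT pg_postmaster_start_time();
```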

Conclusions

Not knowing your tools, systemd in this case, can lead to panic when they do not behave as you expect. Unluckily, there are too many little details to know about every different system, and I wish systemd were a little less rude and at least warned the user that their files are going to be overwritten.
While the unit file states, at its very beginning, not to modify the file, it is not clear what the best approach to redefine variables is (include or override file?).

Checking PostgreSQL Version in Scripts

2019-07-18T00:00:00+00:00

psql(1) has bare-bones support for conditionals, which can be used to check the PostgreSQL version and act accordingly in scripts.

Checking PostgreSQL Version in Scripts

psql(1) provides a little support for conditionals, and this can be used in scripts to check, for instance, the PostgreSQL version.
This is quite trivial, however I had to adjust an example script of mine to act properly depending on the PostgreSQL version.

The problem

The problem I had was with declarative partitioning: since PostgreSQL 11, declarative partitioning supports a DEFAULT partition, that is a catch-all bucket for tuples that don’t have an explicit partition to go into. In PostgreSQL 10 you need to manually create the catch-all partition(s) by explicitly defining them.
In my use case, I had a set of tables partitioned by a time range (the year, to be precise), but I didn’t want to set up a partition for each year before the starting point of clean data: all data after year 2015 is correct, while somewhere there could be some dirty data with bogus years.
Therefore, I needed a partition to catch all bogus data before year 2015, that is, a partition that ranges from the creation of the Earth until 2015. In PostgreSQL 11 this, of course, only requires you to define a DEFAULT partition and that’s it! But how to create a different default partition on PostgreSQL 10 and 11?

I solved the problem with something like the following:

\if :pg_version_10

\echo 'PostgreSQL version is 10'
\echo 'Emulate a DEFAULT partition'

CREATE TABLE digikam.images_old
       PARTITION OF digikam.images_root
       FOR VALUES FROM ( MINVALUE )
                TO ( '2015-01-01' );

\else

\echo 'PostgreSQL version is at least 11'
\echo 'Using DEFAULT partition'

CREATE TABLE digikam.images_old
       PARTITION OF digikam.images_root
       DEFAULT;

\endif

The idea is quite simple: if (\if) PostgreSQL is at version 10, emulate a default partition; otherwise (\else) PostgreSQL is at version 11 or greater and can use a native DEFAULT partition. The partition is named the same in both cases, so that the final user does not see any difference.
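Whichever branch is executed, a quick way to check where the rows end up is to group them by the partition that physically holds them. This is only a sketch: it relies on the tableoid system column and on the table names used above:

```sql
-- count the rows stored in each partition of the root table
SELECT tableoid::regclass AS partition_name,
       count(*)           AS tuples
FROM   digikam.images_root
GROUP BY tableoid
ORDER BY tableoid::regclass::text;
```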

But what is that :pg_version_10 stuff? That’s a boolean psql(1) variable set up by another utility, included into my script:

SELECT
EXISTS ( SELECT setting
         FROM   pg_settings
         WHERE  name = 'server_version_num'
         AND    setting::int >= 120000
         AND    setting::int  < 130000
       )
       AS pg_version_12
, EXISTS ( SELECT setting
         FROM   pg_settings
         WHERE  name = 'server_version_num'
         AND    setting::int >= 110000
         AND    setting::int  < 120000
         )
         AS pg_version_11
-- and so on ...
, EXISTS ( SELECT setting
         FROM   pg_settings
         WHERE  name = 'server_version_num'
         AND    setting::int < 100000
         )
         AS pg_version_less_than_10
\gset

The script does a very simple job: it queries the server_version_num setting and dynamically creates (\gset) variables that are true or false depending on the PostgreSQL instance version number.
The only thing required is to include the script, for instance at the very top of your own script:

-- beginning of your script
\ir ../pgsql.check_postgresql_version.psql

And that’s all folks!
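As a side note, recent versions of psql(1) (10 and later, as far as I know) expose the server version in the SERVER_VERSION_NUM variable, so similar flags could also be computed without querying pg_settings. A hedged sketch, where the flag name pg_version_at_least_11 is made up for the example:

```sql
-- compute a boolean flag from the variable psql sets at connection time
SELECT :SERVER_VERSION_NUM >= 110000 AS pg_version_at_least_11
\gset

\if :pg_version_at_least_11
  \echo 'DEFAULT partitions are available'
\else
  \echo 'DEFAULT partitions must be emulated'
\endif
```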

What this allows me to do is, for instance, avoid running a declarative partitioning script at all if that is not supported on the server side:

\if :pg_version_less_than_10
\echo 'PostgreSQL version less than 10, cannot run declarative partitioning!'
\echo 'Update yourself!'
\quit
\endif

Just placing the above snippet on top of my declarative partitioning script prevents me from running commands that would generate errors if the server is not at least at version 10.

Summary

Thanks to psql(1) conditionals support it is possible to behave differently depending on the server version.
The advantage is that, clearly, you can build more robust scripts.
The drawback is that such scripts require psql(1) and are therefore less portable.

Suggesting Single-Column Primary Keys (almost) Automatically

2019-07-17T00:00:00+00:00

Is it possible to infer primary keys automatically? If it is, I’m not able to do that, but at least I can try.

Suggesting Single-Column Primary Keys (almost) Automatically

A comment on my previous blog post about generating primary keys with a procedure made me think about how to inspect a table to understand which columns can be candidates for primary keys.

Of course, this does make sense (at least to me) for single-column constraints only, because multi-column constraints require a deep knowledge of the data. Anyway, here is my first attempt:

CREATE OR REPLACE FUNCTION f_suggest_primary_keys( schemaz text DEFAULT 'public',
                                                   tablez text  DEFAULT NULL )
RETURNS SETOF text
AS $CODE$
DECLARE
  current_stats record;
  is_unique            boolean;
  is_primary_key       boolean;
  could_be_unique      boolean;
  could_be_primary_key boolean;
  current_constraint   char(1);
  current_alter_table  text;
BEGIN
  RAISE DEBUG 'Inspecting schema % (table %)', schemaz, tablez;

  FOR current_stats IN SELECT s.*, n.oid AS nspoid, c.oid AS reloid FROM pg_stats s
                       JOIN  pg_class c ON c.relname = s.tablename
                       JOIN  pg_namespace n ON n.oid = c.relnamespace
                       WHERE s.schemaname = schemaz
                       AND   c.relkind = 'r'
                       AND   n.nspname = s.schemaname
                       AND   ( tablez IS NULL OR s.tablename = tablez )

  LOOP
    is_primary_key       := false;
    is_unique            := false;
    could_be_unique      := false;
    could_be_primary_key := false;
    RAISE DEBUG 'Inspecting table [%.%] (%.%) -> %', current_stats.schemaname,
                                                     current_stats.tablename,
                                                     current_stats.nspoid,
                                                     current_stats.reloid,
                                                     current_stats.attname;
     -- search if this attribute is already included into
     -- a primary key constraint
     SELECT cn.contype
     INTO   current_constraint
     FROM   pg_constraint cn
     JOIN   pg_attribute a ON a.attnum = ANY( cn.conkey )
     WHERE  cn.conrelid     = current_stats.reloid
     AND    cn.connamespace = current_stats.nspoid
     AND    a.attrelid      = current_stats.reloid
     AND    a.attname       = current_stats.attname;


     IF current_constraint = 'p' THEN
        is_primary_key := true;
     ELSE
       is_primary_key := false;
     END IF;

     IF current_constraint = 'u' THEN
        is_unique := true;
     ELSE
       is_unique := false;
     END IF;

     -- if this is already on a constraint, skip!
     IF is_primary_key OR is_unique THEN
        CONTINUE;
     END IF;

   -- check if this could be an unique attribute
   IF current_stats.n_distinct = -1 THEN
      could_be_unique := true;
   ELSE
      could_be_unique := false;
   END IF;

   -- could it be promoted as a primary key?
   IF could_be_unique AND current_stats.null_frac = 0 THEN
      could_be_primary_key := true;
   ELSE
     could_be_primary_key := false;
   END IF;

   IF could_be_primary_key THEN
      RAISE DEBUG 'Suggested PRIMARY KEY(%) on %.%', current_stats.attname,
                                                     current_stats.schemaname,
                                                     current_stats.tablename;
      current_alter_table := format( 'ALTER TABLE %I.%I ADD PRIMARY KEY (%I)',
                                     current_stats.schemaname,
                                     current_stats.tablename,
                                     current_stats.attname );
   ELSIF could_be_unique THEN
      RAISE DEBUG 'Suggested UNIQUE(%) on %.%', current_stats.attname,
                                                current_stats.schemaname,
                                                current_stats.tablename;
      current_alter_table := format( 'ALTER TABLE %I.%I ADD UNIQUE (%I)',
                                     current_stats.schemaname,
                                     current_stats.tablename,
                                     current_stats.attname );
   ELSE
      -- nothing to suggest for this column
      CONTINUE;
   END IF;

   RETURN NEXT current_alter_table;
  END LOOP;

  RETURN;

END
$CODE$
LANGUAGE plpgsql;

The idea is to wrap all the logic into a function, so that I can pass the schema and/or the table name to inspect and have the arguments set to decent defaults.
The first look is at pg_stats because it can provide hints about good candidates:

if n_distinct is negative, and in particular is -1, the column has a different value on every tuple, so it is (as far as we know) unique;
if null_frac is 0 the column contains no null values and can be a candidate for a primary key.

Of course, this means that the statistics must be up-to-date or the whole thing will not be able to suggest constraints!

From pg_stats I get back column names, and the first thing to check is whether the column already appears in a constraint of type p (primary key) or u (unique); this prevents the function from suggesting columns that are already involved in such a constraint, that is, it avoids suggesting obvious things.

The rest is quite simple: if the column is already involved in a constraint, skip it; otherwise consider whether it can be part of a UNIQUE or PRIMARY KEY constraint. Depending on the result, the right ALTER TABLE is emitted, so that the administrator can use it judiciously.

Here is an example invocation:

testdb=# select * from f_suggest_primary_keys( 'respi', 'tipo_rensom' );
 DEBUG:  Inspecting schema respi (table tipo_rensom)
 DEBUG:  Inspecting table [respi.tipo_rensom] (151915.151952) -> pk
 DEBUG:  Inspecting table [respi.tipo_rensom] (151915.151952) -> id_tipo_rensom
 DEBUG:  Inspecting table [respi.tipo_rensom] (151915.151952) -> nome
 DEBUG:  Suggested PRIMARY KEY(nome) on respi.tipo_rensom
 DEBUG:  Inspecting table [respi.tipo_rensom] (151915.151952) -> descrizione
 DEBUG:  Suggested PRIMARY KEY(descrizione) on respi.tipo_rensom

                   f_suggest_primary_keys
 -------------------------------------------------------------
 ALTER TABLE respi.tipo_rensom ADD PRIMARY KEY (nome)
 ALTER TABLE respi.tipo_rensom ADD PRIMARY KEY (descrizione)

With no surprise, the pk column, which is a PRIMARY KEY, is inspected but skipped, while two other columns appear to be unique enough to take a role in a constraint addition.
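Since everything depends on the statistics, it can be useful to refresh them and look at the raw values the function bases its decision on. A small sketch, using the same table as in the example above:

```sql
-- refresh the statistics, then inspect the two columns of pg_stats that matter
ANALYZE respi.tipo_rensom;

SELECT attname, n_distinct, null_frac
FROM   pg_stats
WHERE  schemaname = 'respi'
AND    tablename  = 'tipo_rensom'
ORDER BY attname;
```

Columns reporting n_distinct = -1 and null_frac = 0 are the ones the function considers primary key candidates.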

As you can imagine, this is just a little attempt at automating boring stuff. There is a lot of room for improvement, both on the performance side and, more importantly, on the support the function can provide to an administrator.

Generate Primary Keys (almost) Automatically

2019-07-09T00:00:00+00:00

What if your database design is so poor that you need to refactor tables in order to add primary keys?

Generate Primary Keys (almost) Automatically

While playing with a quite large database (in terms of number of tables) with a friend of mine, we discovered that almost all tables did not have a primary key!
Gosh!
This is really baaaad!

Why is that bad? Well, you should not ask, but let’s leave the poor database design alone and focus on some more concrete problems: in particular, not having a primary key prevents a lot of smart software and middleware from working on your database. As you probably know, almost every ORM requires each table to have at least one surrogate key in order to properly identify each row and enable persistence (that is, modification of rows).

Luckily, fixing tables for such software is quite simple: just add a surrogate key and everyone will be happy again. But unluckily, while adding a primary key is a matter of issuing an ALTER TABLE, doing so for a long list of tables is boring.

Here comes the power of PostgreSQL again: thanks to its rich catalog, it is possible to automate the process.

In this post you will see how to go from a query to a whole procedure that does the trick.

A query to generate the `ALTER TABLE` commands

A first example is the following query, which searches for every table in the schema public that does not have a constraint of type p (primary key) and issues an ALTER TABLE for such a table:

testdb=# WITH
to_be_fixed AS
(
  SELECT c.relname,
  'ALTER TABLE '
  || quote_ident( n.nspname )
  || '.'
  || quote_ident( c.relname )
  || ' ADD COLUMN pk int GENERATED ALWAYS AS IDENTITY PRIMARY KEY;' AS command
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE n.nspname = 'public'
  AND   c.relkind = 'r'
  AND NOT EXISTS ( SELECT conname FROM pg_constraint WHERE contype = 'p' AND conrelid = c.oid )
  ORDER BY c.relname
)
SELECT command FROM to_be_fixed;
                                      command                                       
------------------------------------------------------------------------------------
 ALTER TABLE public.bar ADD COLUMN pk int GENERATED ALWAYS AS IDENTITY PRIMARY KEY;
 ALTER TABLE public.foo ADD COLUMN pk int GENERATED ALWAYS AS IDENTITY PRIMARY KEY;

So a first, desperate way of doing it is to adjust the above query to your schema, save it to a file named query.sql, execute it putting the output into a text file (say script.sql), and then execute the latter. In other words, something like:

% psql -At -U luca -h miguel -f query.sql -o script.sql testdb
% psql -U luca -h miguel -f script.sql testdb
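If you are using psql(1) 9.6 or later, the intermediate file can probably be avoided with \gexec, which executes every value returned by the query as a statement. A sketch under that assumption, with the same query condensed via format():

```sql
-- build one ALTER TABLE per table without a primary key and run it right away
SELECT format( 'ALTER TABLE %I.%I ADD COLUMN pk int GENERATED ALWAYS AS IDENTITY PRIMARY KEY',
               n.nspname, c.relname )
FROM   pg_class c
JOIN   pg_namespace n ON n.oid = c.relnamespace
WHERE  n.nspname = 'public'
AND    c.relkind = 'r'
AND    NOT EXISTS ( SELECT 1 FROM pg_constraint
                    WHERE contype = 'p' AND conrelid = c.oid )
ORDER BY c.relname
\gexec
```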

But let’s see a more tunable way of doing it.

A function to generate the `ALTER TABLE` commands

I’ve written a very small function to do the above ALTER TABLE commands in a way that is a little smarter and tunable. The function accepts a few parameters, all with default values:

pk_prefix (defaults to pk) the name of your primary key column, call it id, pk or whatever;
schemaz (defaults to public) the schema you want to operate on;
use_identity (defaults to true) true if you want to generate identity columns, false if you want to generate serial columns;
append_table_name (defaults to false) in order to avoid column name clashes (it could be you already have an id column somewhere), it is possible to append the table name to the column name pk_prefix so as to generate almost unique names.

The function looks like the following:

CREATE OR REPLACE FUNCTION f_generate_primary_keys( pk_prefix text DEFAULT 'pk',
                                                    schemaz text DEFAULT 'public',
                                                    use_identity boolean DEFAULT true,
                                                    append_table_name boolean DEFAULT false )
RETURNS SETOF text
AS $CODE$
DECLARE
  current_class pg_class%rowtype;
  current_alter_table text;
  current_pk_type text;
  current_pk_generation text;
  current_pk_name text;
BEGIN

  FOR current_class IN SELECT c.* FROM pg_class c
                       JOIN pg_namespace n ON n.oid = c.relnamespace
                       WHERE n.nspname = schemaz
                       AND   c.relkind = 'r'
                       AND NOT EXISTS ( SELECT conname FROM pg_constraint
                                        WHERE contype = 'p'
                                        AND conrelid = c.oid )

  LOOP
    RAISE DEBUG 'Table [%] without primary key', current_class.relname;

    current_pk_name := pk_prefix;
    IF append_table_name THEN
       current_pk_name := current_pk_name || '_' || current_class.relname;
    END IF;

    IF NOT use_identity THEN
       current_pk_type       := 'serial';
       current_pk_generation := '';
    ELSE
      current_pk_type       := 'int';
      current_pk_generation := 'GENERATED ALWAYS AS IDENTITY';
    END IF;



    current_alter_table := format( 'ALTER TABLE %I.%I ADD COLUMN %I %s NOT NULL %s PRIMARY KEY;',
                                   schemaz,
                                   current_class.relname,
                                   current_pk_name,
                                   current_pk_type,
                                   current_pk_generation );

   RAISE DEBUG ' -> %', current_alter_table;
   RETURN NEXT current_alter_table;
  END LOOP;

  RETURN;

END
$CODE$
LANGUAGE plpgsql;

Briefly, the function issues a query that is very similar to the above one, and that finds all the tuples in pg_class corresponding to a table without a primary key. For each table, the appropriate ALTER TABLE is built and returned.
Invoking the function produces the commands to be executed afterwards in the database:

testdb=# select * from f_generate_primary_keys( 'id', 'public', true, false );
                                   f_generate_primary_keys                                   
---------------------------------------------------------------------------------------------
 ALTER TABLE public.foo ADD COLUMN id int NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY;
 ALTER TABLE public.bar ADD COLUMN id int NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY;
(2 rows)

testdb=# select * from f_generate_primary_keys();
                                   f_generate_primary_keys                                   
---------------------------------------------------------------------------------------------
 ALTER TABLE public.foo ADD COLUMN pk int NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY;
 ALTER TABLE public.bar ADD COLUMN pk int NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY;
(2 rows)

testdb=# select * from f_generate_primary_keys( 'id', 'public', false, true );
                        f_generate_primary_keys                         
------------------------------------------------------------------------
 ALTER TABLE public.foo ADD COLUMN id_foo serial NOT NULL  PRIMARY KEY;
 ALTER TABLE public.bar ADD COLUMN id_bar serial NOT NULL  PRIMARY KEY;

There is of course room for improvement, for instance executing the ALTER TABLE immediately within the function.

A procedure to execute the `ALTER TABLE` commands

It is now quite straightforward to wrap the f_generate_primary_keys function into a procedure and add transaction logic. The boring stuff is just to pass the arguments through and control when to issue a commit while batch processing:

CREATE OR REPLACE PROCEDURE p_generate_primary_keys( pk_prefix text DEFAULT 'pk',
                                                     schemaz text DEFAULT 'public',
                                                     use_identity boolean DEFAULT true,
                                                     append_table_name boolean DEFAULT false,
                                                     commit_after int DEFAULT 10 )
AS $CODE$
DECLARE
  current_alter_table text;
  done int := 0;
BEGIN
  FOR current_alter_table IN SELECT f_generate_primary_keys( pk_prefix, schemaz, use_identity, append_table_name )
  LOOP
    RAISE DEBUG 'Executing [%]', current_alter_table;
    EXECUTE current_alter_table;
    done := done + 1;


    IF done % commit_after = 0 THEN
       RAISE DEBUG 'Forcing a commit';
       COMMIT;
    END IF;

  END LOOP;
  RAISE DEBUG 'Altered % tables in schema %', done, schemaz;
  COMMIT;
END
$CODE$
LANGUAGE plpgsql;

The important part here is, of course, the EXECUTE statement and the commit control. Invoking the procedure produces something like:

testdb=# call p_generate_primary_keys( 'id', 'public', false, true );
DEBUG:  Table [foo] without primary key
DEBUG:   -> ALTER TABLE public.foo ADD COLUMN id_foo serial NOT NULL  PRIMARY KEY;
DEBUG:  Table [bar] without primary key
DEBUG:   -> ALTER TABLE public.bar ADD COLUMN id_bar serial NOT NULL  PRIMARY KEY;
DEBUG:  Executing [ALTER TABLE public.foo ADD COLUMN id_foo serial NOT NULL  PRIMARY KEY;]
DEBUG:  ALTER TABLE will create implicit sequence "foo_id_foo_seq" for serial column "foo.id_foo"
DEBUG:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "foo_pkey" for table "foo"
DEBUG:  rewriting table "foo"
DEBUG:  building index "foo_pkey" on table "foo" serially
DEBUG:  Executing [ALTER TABLE public.bar ADD COLUMN id_bar serial NOT NULL  PRIMARY KEY;]
DEBUG:  ALTER TABLE will create implicit sequence "bar_id_bar_seq" for serial column "bar.id_bar"
DEBUG:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "bar_pkey" for table "bar"
DEBUG:  rewriting table "bar"
DEBUG:  building index "bar_pkey" on table "bar" serially
DEBUG:  Altered 2 tables in schema public
LOG:  duration: 16.224 ms  statement: call p_generate_primary_keys( 'id', 'public', false, true );
CALL
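The commit control can be illustrated outside the database too. What follows is only a minimal Python sketch of the same batching logic, not the procedure itself: the `execute` and `commit` callbacks and the placeholder statements are hypothetical stand-ins for a real connection.

```python
from typing import Callable, Iterable, List


def run_in_batches(statements: Iterable[str],
                   execute: Callable[[str], None],
                   commit: Callable[[], None],
                   commit_after: int = 10) -> int:
    """Execute every statement, committing every `commit_after` of them."""
    done = 0
    for statement in statements:
        execute(statement)
        done += 1
        if done % commit_after == 0:
            commit()  # intermediate commit, like the COMMIT inside the loop
    commit()  # final commit for the last, possibly partial, batch
    return done


# Hypothetical usage: record the calls instead of talking to a database.
log: List[str] = []
total = run_in_batches(
    [f"ALTER TABLE t{i} ..." for i in range(5)],  # placeholder statements
    execute=lambda sql: log.append("exec"),
    commit=lambda: log.append("commit"),
    commit_after=2,
)
print(total, log)
```

With five statements and a commit every two, the sketch commits twice inside the loop and once more at the end.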

Again, there is room for improvement, but this is just a quick demonstration of how easy it is to exploit PostgreSQL facilities to refactor your schema.

PostgreSQL & recovery.conf

2019-07-08T00:00:00+00:00

The upcoming version of PostgreSQL, 12, will lose the recovery.conf file. It will take some time to get used to it!

PostgreSQL & recovery.conf

According to the documentation for the upcoming version 12, the recovery.conf file is gone! The release notes state it clearly: the server will not start if recovery.conf is in place, and all the configuration parameters have moved to the classic postgresql.conf (or included files).

The change proposal is quite old, but it represents a deep change in the way PostgreSQL handles server startup and recovery, and it could take a while for all the software out there to catch up.

Please note that since PostgreSQL 12 is still in beta, things could change a little, even if the discussion and the implementation are nearly complete.

Two files can be created to instrument a standby node:

standby.signal: if present in the PGDATA directory, the host works as a standby, that is, it waits for incoming WAL segments and replays them for the rest of its life;
recovery.signal: if present, WAL replay stops as soon as all the WALs have been consumed or the configured recovery target has been reached.

It is interesting to note that standby.signal takes precedence over recovery.signal, meaning that if both files exist the node will act as a standby. Both files may be empty: they now act as trigger files rather than configuration files (hence the change in the suffix).
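The precedence between the files can be summarized with a small decision function. This is just a Python sketch of the rules described in this post, not the server's actual startup code; the function name and the returned labels are made up for illustration.

```python
import tempfile
from pathlib import Path


def startup_mode(pgdata: Path) -> str:
    """Mimic how a version 12 data directory is interpreted at startup."""
    if (pgdata / "recovery.conf").exists():
        return "refuse to start"      # recovery.conf is no longer supported
    if (pgdata / "standby.signal").exists():
        return "standby"              # wins over recovery.signal
    if (pgdata / "recovery.signal").exists():
        return "targeted recovery"
    return "normal startup"


# Build a throw-away directory and add the files one at a time.
with tempfile.TemporaryDirectory() as tmp:
    pgdata = Path(tmp)
    modes = [startup_mode(pgdata)]
    (pgdata / "recovery.signal").touch()
    modes.append(startup_mode(pgdata))
    (pgdata / "standby.signal").touch()
    modes.append(startup_mode(pgdata))
    (pgdata / "recovery.conf").touch()
    modes.append(startup_mode(pgdata))

print(modes)
```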

So, what is the rationale for this change? There are several reasons, including no longer needing to duplicate configuration files. But what I like the most is that having the parameters in the main configuration makes them good candidates to be changed via ALTER SYSTEM and the postgresql.auto.conf machinery (see later for an example).

While all recovery parameters have been kept the same, the trigger_file one has been renamed to promote_trigger_file to clearly emphasize its meaning.

The above is not the only big difference in recovery handling: it is no longer possible to specify multiple recovery_target_xxx variables and “hope” the server does the right thing (effectively selecting the last one). The administrator is required to do a better job in selecting precisely which target to recover to! Lastly, the recovery timeline now defaults to the latest one instead of the current one.
As you can expect, pg_basebackup has been changed accordingly, and therefore the --write-recovery-conf option (-R) now only puts a standby.signal file within the PGDATA directory, while the settings are appended to postgresql.auto.conf.

So, a lot of changes in the way the cluster manages the recovery/stand-by modes, and I hope all the automated backup software out there will respond properly.

Contexts

The contexts of the GUCs that moved into the main configuration have not changed so far:

template1=# SELECT name, context FROM pg_settings WHERE category like '% Archiv%';
          name           |  context   
-------------------------|------------
 archive_cleanup_command | sighup
 archive_command         | sighup
 archive_mode            | postmaster
 archive_timeout         | sighup
 recovery_end_command    | sighup
 restore_command         | postmaster

What happens if you keep around `recovery.conf`?

Let’s try it:

% sudo -u postgres touch /pgdata/12beta2/recovery.conf
% sudo -u postgres pg_ctl -D /pgdata/12beta2  start
...
FATAL:  using recovery command file "recovery.conf" is not supported
LOG:  startup process (PID 5837) exited with exit code 1
LOG:  aborting startup due to startup process failure
LOG:  database system is shut down

As already detailed, the database refuses to start.

What happens when you issue an `ALTER SYSTEM`?

Easy, pal: the configuration is written to postgresql.auto.conf:

% psql -U postgres template1
psql (12beta2)
Type "help" for help.

template1=# ALTER SYSTEM SET restore_command TO 'cp %p %f';
ALTER SYSTEM

that results in:

% sudo cat /pgdata/12beta2/postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
restore_command = 'cp %p %f'

PostgreSQL Administrator Account WITH NOLOGIN (recover your role)

2019-06-27T00:00:00+00:00

Today I got an email from a friend of mine who got locked out of his own database due to a little mistake.

PostgreSQL Administrator Account WITH NOLOGIN (recover your `postgres` role)

What if you get locked out of your own cluster due to a simple and, to some extent, stupid error? Let’s see it in a quick list of steps.
First of all, lock the default postgres account so that the default administrator can no longer log in to the cluster:

% psql -U postgres -c "ALTER ROLE postgres WITH NOLOGIN" testdb
ALTER ROLE

% psql -U postgres -c "SELECT version();" testdb               
psql: FATAL:  role "postgres" is not permitted to log in

What a mess!

PostgreSQL has a specific recovery mode, called single user mode, that resembles the operating system single user mode and can be used in such situations. Let’s see how.
First of all, shut down the cluster, to avoid doing more damage than you have already done!

% sudo service postgresql stop

Now, start the postgres process in single user mode. You need to know the data directory of your cluster in order for it to work:

% sudo -u postgres postgres --single -D /mnt/pg_data/pgdata/11.1

PostgreSQL stand-alone backend 11.3
backend> 

What happened? I used the operating system user postgres to launch the operating system process postgres (ok there’s a little name confusion here!) in single (--single) mode for my own data directory (-D). I got a prompt, I’m connected to the backend process directly, so this is not the same as a local or TCP/IP connection: I’m interacting with the backend process itself. Luckily, the backend process can speak SQL! Therefore, I can reset my administrator role:

backend> ALTER ROLE postgres WITH LOGIN;
backend> 

Please note that, while the backend process can speak SQL, it does not speak the same way psql does: there is no need for a semicolon and an <enter> will send the statement to the backend. Anyway, I can now release the backend process as I would do with any other operating system process, gently or not, for instance via CTRL-D (End of File).

backend>  CTRL-D
%

It is now time to restart the cluster and check if the user postgres can connect again:

% sudo service postgresql start
% psql -U postgres -c "SELECT CURRENT_DATE;" testdb
 current_date 
--------------
 2019-06-27
(1 row)

The world is a happy place again!

Importing data from MSSQL is faster than I thought

2019-06-26T00:00:00+00:00

A few months ago I set up a Foreign Data Wrapper against a Microsoft SQL Server to import historical data. I’m quite impressed by how quick the bulk import is.

Importing data from MSSQL is faster than I thought

It has not been an exciting job, but having PostgreSQL to pull data out of Microsoft SQL Server is a joy! The architecture is quite dumb:

PostgreSQL connects via Foreign Data Wrapper to the MSSQL machine;
a function is executed to extract records, crunch them and store the modified records into PostgreSQL;
every 10 minutes, iterate.

So far, it is importing 3.5k tuples every ten minutes, that is around 500000 tuples per day. Each run usually completes in less than a second, and I’m able to monitor that thanks to a status table where I store the clock_timestamp() values:

testdb=> SELECT record_count, 
             ts_end - ts_begin AS elapsed_time
         FROM pull_stat ORDER BY pk DESC LIMIT 10;
 record_count |  elapsed_time   
--------------|-----------------
         3659 | 00:00:00.440777
         3656 | 00:00:00.363089
         3694 | 00:00:00.385919
         3713 | 00:00:00.460304
         3695 | 00:00:02.209158
         3678 | 00:00:00.393815
         3699 | 00:00:00.404685
         3693 | 00:00:00.403348
         3704 | 00:00:00.358293
         3683 | 00:00:00.355856
         
testdb=> select version();
                                                 version                                                 
---------------------------------------------------------------------------------------------------------
 PostgreSQL 11.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit
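The daily figure is simple arithmetic on the numbers above. As a quick Python check (the elapsed times are copied from the status table shown above, and the median is my own addition to smooth out the single slow run):

```python
import statistics

MINUTES_PER_DAY = 24 * 60
interval_minutes = 10
tuples_per_run = 3500  # the "3.5k" quoted in the text

runs_per_day = MINUTES_PER_DAY // interval_minutes
tuples_per_day = runs_per_day * tuples_per_run

# elapsed_time values (in seconds) from the pull_stat sample
elapsed_seconds = [0.440777, 0.363089, 0.385919, 0.460304, 2.209158,
                   0.393815, 0.404685, 0.403348, 0.358293, 0.355856]
median_elapsed = statistics.median(elapsed_seconds)

print(runs_per_day, tuples_per_day, round(median_elapsed, 3))
```

That is 144 runs per day and a bit more than half a million tuples, with a typical run well below half a second.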

While this is surely not a benchmark, I’m quite impressed by the speed of pulling data from a foreign server. It’s interesting to note that the tds version is 7.2 with notice messages enabled (which I suspect adds a little overhead).

testdb=# \des+
List of foreign servers

Name                 | server_mssql
Owner                | postgres
Foreign-data wrapper | tds_fdw
Access privileges    | 
Type                 | 
Version              | 
FDW options          | (servername '10.0.0.1', database 'AXES', msg_handler 'notice', tds_version '7.2')
Description          | 

A recursive CTE to get information about partitions

2019-06-12T00:00:00+00:00

I was wondering about writing a function that provides a quick status about partitioning. But wait, PostgreSQL has recursive CTEs!

A recursive CTE to get information about partitions

I’m used to partitioning: it allows me to quickly and precisely split data across different tables. PostgreSQL 10 introduced native partitioning, and since then I have been using it over inheritance whenever possible.
But how to get a quick overview of the partition status? I mean, knowing which partition is growing the most?
In the beginning I was thinking of writing a function to do that task, and quickly found myself iterating recursively over pg_inherits, the table that links partitions to their parents. But the keyword here is recursively: PostgreSQL provides recursive Common Table Expressions, and a quick search revealed I was right: it is possible to do it with a single CTE. Taking inspiration from this mailing list message, here is a simple CTE to get the partition status (you can find it on my GitHub repository):

WITH RECURSIVE inheritance_tree AS (
     SELECT   c.oid AS table_oid
            , c.relname  AS table_name
            , NULL::name AS table_parent_name
            , c.relispartition AS is_partition
     FROM pg_class c
     JOIN pg_namespace n ON n.oid = c.relnamespace
     WHERE c.relkind = 'p'
     AND   c.relispartition = false

     UNION ALL

     SELECT inh.inhrelid AS table_oid
          , c.relname AS table_name
          , cc.relname AS table_parent_name
          , c.relispartition AS is_partition
     FROM inheritance_tree it
     JOIN pg_inherits inh ON inh.inhparent = it.table_oid
     JOIN pg_class c ON inh.inhrelid = c.oid
     JOIN pg_class cc ON it.table_oid = cc.oid

)
SELECT
          it.table_name
        , c.reltuples
        , c.relpages
        , CASE p.partstrat
               WHEN 'l' THEN 'BY LIST'
               WHEN 'r' THEN 'BY RANGE'
               ELSE 'not partitioned'
          END AS partitionin_type
        , it.table_parent_name
        , pg_get_expr( c.relpartbound, c.oid, true ) AS partitioning_values
        , pg_get_expr( p.partexprs, c.oid, true )    AS sub_partitioning_values
FROM inheritance_tree it
JOIN pg_class c ON c.oid = it.table_oid
LEFT JOIN pg_partitioned_table p ON p.partrelid = it.table_oid
ORDER BY 1,2;

The bootstrap term in the CTE selects all the partitioned tables that are not partitions themselves, that is, the roots of a partitioning scheme. The recursive term simply joins pg_inherits in order to extract the children information. The query attached to the CTE extracts information like the number of tuples and pages (that’s what I need), and a summary of the partitioning including second level partitioning. Thanks to pg_get_expr it is possible to get a human readable partitioning strategy.
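To make the evaluation order of the recursive CTE more tangible, here is a rough Python equivalent: the roots play the part of the bootstrap term, and every loop iteration plays the part of one pass of the recursive term. The `(child, parent)` pairs stand in for pg_inherits, and the table names are illustrative only (y2018m11 is invented to have a second leaf).

```python
from typing import Dict, List, Optional, Tuple


def inheritance_tree(roots: List[str],
                     inherits: List[Tuple[str, str]]
                     ) -> List[Tuple[str, Optional[str]]]:
    """Return (table, parent) rows, roots first, then level by level."""
    children_of: Dict[str, List[str]] = {}
    for child, parent in inherits:
        children_of.setdefault(parent, []).append(child)

    rows: List[Tuple[str, Optional[str]]] = [(root, None) for root in roots]
    frontier = list(roots)            # rows produced by the previous pass
    while frontier:                   # stop when a pass adds nothing new
        next_frontier: List[str] = []
        for parent in frontier:
            for child in children_of.get(parent, []):
                rows.append((child, parent))
                next_frontier.append(child)
        frontier = next_frontier
    return rows


# Hypothetical (child, parent) pairs, as pg_inherits would store them.
inherits = [("y2018", "root"), ("y2018m10", "y2018"), ("y2018m11", "y2018")]
rows = inheritance_tree(["root"], inherits)
print(rows)
```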

The output results in something like the following:

...
-[ RECORD 5 ]-----------|----------------------------------
table_name              | y2018
reltuples               | 0
relpages                | 0
partitionin_type        | BY LIST
table_parent_name       | root
partitioning_values     | FOR VALUES IN ('2018')
sub_partitioning_values | date_part('month'::text, mis_ora)
...
-[ RECORD 15 ]----------|----------------------------------
table_name              | y2018m10
reltuples               | 1.48956e+07
relpages                | 139212
partitionin_type        | not partitioned
table_parent_name       | y2018
partitioning_values     | FOR VALUES IN ('10')
sub_partitioning_values | 

That states that table y2018 is a child of table root, accepts the value '2018', is partitioned by list, and its children are partitioned by month. On the other hand, y2018m10 is not partitioned any further and is a child of y2018.
That’s a quick glance at the partitioning status in the cluster! Of course, it is possible to improve on this to get more information or restrict it depending on your needs.

UPDATE 2019-06-15

As per the discussion reported on the bugs mailing list, the query I originally proposed was tricky: while it was working on v11, it was not on the upcoming v12, and the reason was that I was erroneously casting NULL to text in the non-recursive term and then unioning it with a name in the recursive part. Thanks to the explanation by Tom Lane I was able not only to fix the query, but also to gain some more knowledge about PostgreSQL!

Checking the sequences status on a single pass

2019-06-11T00:00:00+00:00

It is quite simple to wrap a couple of queries in a function to have a glance at all the sequences and their cycling status.

Checking the sequences status on a single pass

The catalog pg_sequence keeps track of the definition of each sequence, including the increment value and boundaries. Combined with pg_class and a few other functions, it is possible to create a very simple administrative function to keep track of the overall status of the sequences.

I’ve created a seq_check() function that provides an output as follows:

testdb=# select * from seq_check() ORDER BY remaining;     
        seq_name        | current_value |    lim     | remaining  
------------------------|---------------|------------|------------
 public.persona_pk_seq  |       5000000 | 2147483647 |     214248
 public.root_pk_seq     |         50000 | 2147483647 | 2147433647
 public.students_pk_seq |             7 | 2147483647 | 2147483640
(3 rows)

As you can see, the function provides the current value of the sequence, the maximum value (limit) and how many values the sequence can still provide before it overflows or cycles. For example, persona_pk_seq has 214248 values left to provide. Combined with the current value, that is 5000000, this hints that the sequence probably has a too large increment.

The code of the function is as follows:

CREATE OR REPLACE FUNCTION seq_check()
RETURNS TABLE( seq_name text, current_value bigint, lim bigint, remaining bigint )
AS $CODE$
DECLARE
  query text;
  schemaz name;
  seqz    name;
  seqid   oid;
BEGIN

  FOR schemaz, seqz, seqid IN   SELECT n.nspname, c.relname, c.oid
                         FROM   pg_class c
                         JOIN   pg_namespace n ON n.oid = c.relnamespace
                         WHERE  c.relkind = 'S' --sequence
                    LOOP

     RAISE DEBUG 'Inspecting %.%', schemaz, seqz;

     query := format( 'SELECT ''%s.%s'', last_value, s.seqmax AS lim, (s.seqmax - last_value) / s.seqincrement AS remaining  FROM %I.%I, pg_sequence s WHERE s.seqrelid = %s',
                      quote_ident( schemaz ),
                      quote_ident( seqz ),
                      schemaz,
                      seqz,
                      seqid );

     RAISE DEBUG 'Query [%]', query;
     RETURN QUERY EXECUTE query;
  END LOOP;


END
$CODE$
LANGUAGE plpgsql
STRICT;

As you can see, the main query joins pg_sequence with the sequence relation itself (to read last_value), while the outer loop extracts the sequence names and OIDs from pg_class. The function iterates on all sequences within the system, and this means the function must run with administrator privileges.

I use this handy function to check the status on other machines, and quite frankly I have not yet seen remaining get anywhere near zero, therefore I can sleep well at night:

=# select * from seq_check() order by remaining;
         seq_name          | current_value |         lim         |      remaining      
---------------------------|---------------|---------------------|---------------------
 t.root_pk_seq             |        201338 |          2147483647 |          2147282309
 respi.rosseni_tmp_pk_seq  |         16673 |          2147483647 |          2147466974
 respi.pull_status_pk_seq  |         14603 |          2147483647 |          2147469044
 respi.tipo_rossene_pk_seq |             8 |          2147483647 |          2147483639
 respi.root_pk_seq         |     140509487 | 9223372036854775807 | 9223372036714266320
 cron.jobid_seq            |             1 | 9223372036854775807 | 9223372036854775806

Of course, it is quite easy to improve the function by adding, for instance, a percent ratio or a near-to-cycle flag.
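As an idea of what such an improvement could compute, here is a Python sketch of the arithmetic only, assuming ascending sequences. The function name, the `percent_used` definition and the one million threshold are all my own choices, and the increment of 10000 for persona_pk_seq is an assumption that happens to reproduce the 214248 shown in the first sample output.

```python
from typing import NamedTuple


class SeqStatus(NamedTuple):
    remaining: int        # values the sequence can still hand out
    percent_used: float   # share of the range already consumed
    near_to_cycle: bool   # True when few values are left


def sequence_status(last_value: int, seqmax: int, seqincrement: int,
                    warn_below: int = 1_000_000) -> SeqStatus:
    """Same formula as seq_check(), plus a ratio and a warning flag."""
    remaining = (seqmax - last_value) // seqincrement
    percent_used = 100.0 * last_value / seqmax
    return SeqStatus(remaining, percent_used, remaining < warn_below)


persona = sequence_status(5_000_000, 2_147_483_647, 10_000)  # assumed increment
root = sequence_status(50_000, 2_147_483_647, 1)
print(persona)
print(root)
```

Note how persona_pk_seq has consumed well under one percent of its range and yet is flagged, because the large increment burns through the remaining values quickly.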

FizzBuzz (in both plpgsql and SQL)

2019-06-11T00:00:00+00:00

While listening to a great talk by Benno Rice, I was pointed to the FizzBuzz algorithm. How hard could it be to implement it using PostgreSQL?

FizzBuzz (in both plpgsql and SQL)

FizzBuzz is often used as a straightforward question during job interviews: the idea is that if you cannot get the algorithm right, you are not a programmer at all!
The algorithm can be described as:

Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.

Now, how hard could it be? You can find my implementation here. Well, implementing it in plpgsql is as simple as:

CREATE OR REPLACE FUNCTION
fizzbuzz( start_number int DEFAULT 1, end_number int DEFAULT 100 )
RETURNS VOID
AS
$CODE$
DECLARE
  current_number int;
  current_value  text;
BEGIN
  -- check arguments
  IF start_number >= end_number THEN
     RAISE EXCEPTION 'The start number must be lower than the end one! From % to %', start_number, end_number;
  END IF;

  FOR current_number IN start_number .. end_number LOOP
      current_value := NULL;

      IF current_number % 3 = 0 THEN
         current_value := 'Fizz';
      END IF;
      IF current_number % 5 = 0 THEN
         current_value := coalesce( current_value, '' ) || 'Buzz';
      END IF;

      IF current_value IS NULL THEN
         current_value := current_number::text;
      END IF;

      RAISE INFO '% -> %', current_number, current_value;
  END LOOP;
END
$CODE$
LANGUAGE plpgsql;

This is a possible implementation; as you can see, there is more code to test the input than to effectively do the work. The only trick, in my opinion, in FizzBuzz is that the case that prints FizzBuzz must be handled as a different conditional from the ones that test for Fizz or Buzz.

But PostgreSQL has also recursive CTEs, and things get more interesting.

WITH RECURSIVE n AS (
     SELECT 1 AS current_number, NULL AS mod_3, NULL AS mod_5
     UNION
     SELECT current_number + 1 as current_number
            , CASE ( current_number + 1 ) % 3 WHEN 0 THEN 'Fizz'
                                              ELSE NULL
                                              END AS mod_3
           , CASE ( current_number + 1 ) % 5 WHEN 0 THEN 'Buzz'
                                              ELSE NULL
                                              END AS mod_5
     FROM n WHERE current_number < 100
)
SELECT current_number, 
       coalesce( mod_3 || mod_5, 
                 mod_3, 
                 mod_5, 
                 current_number::text )
FROM n;

The idea is pretty simple: the n recursive CTE provides a list of one hundred numbers, each with the string Fizz, the string Buzz, or both, as a set of rows. Now, such strings must be concatenated, and here comes coalesce. The coalesce function returns its first argument that is not NULL. Since concatenating anything with NULL yields NULL, mod_3 || mod_5 produces the FizzBuzz string only when both mod_3 and mod_5 are not NULL. Otherwise, if only one of mod_3 and mod_5 is not NULL, that one passes. If neither Fizz nor Buzz is set, then the regular number is printed as a last resort. As you can imagine, the output is similar to:

 current_number | coalesce 
----------------|----------
              1 | 1
              2 | 2
              3 | Fizz
              4 | 4
              5 | Buzz
              6 | Fizz
              7 | 7
              8 | 8
              9 | Fizz
             10 | Buzz
             11 | 11
             12 | Fizz
             13 | 13
             14 | 14
             15 | FizzBuzz

I’m sure there are tons of other implementations, smarter than the above ones. However, what I was interested in demonstrating here was the capability to implement such an algorithm with PostgreSQL facilities.

Normalize to save space

2019-05-31T00:00:00+00:00

It is no surprise at all: a normalized database requires less space on disk than a not-normalized one.

Normalize to save space

Sometimes you get a database that Just Works (tm) but its data is not normalized. I’m not a big fan of data normalization: I mean, it surely matters, but I don’t tend to “over-normalize” data at design time. However, one of my databases was growing more and more because of a table with some repeated extra information.
Of course a normalized database saves you some disk space at the cost of joins during query execution, but having a decent server and a small join table is enough to sleep at night!
Let’s see what we are talking about:

mydb=# select pg_size_pretty( pg_database_size( 'mydb' ) );
 pg_size_pretty
----------------
 13 GB
(1 row)

Ok, 13 GB is not something scary, let’s say it is a fair database to work on (please note the size is reported after a VACUUM FULL). In this database, I have a table root that handles a lot of data from hardware sensors; the table is of course partitioned on a time basis. One thing the table was storing was the sensor name, a text string repeated over and over on the child tables too. While this was not a problem in the beginning, it was wasting space over time.

Shame on me!

Let’s go normalize the table!
Normalizing a table is quite straightforward, and I’m not interested in sharing details here. Let’s say this was quite easy because my users were executing queries against a view and not the root table, therefore I simply:

created a join table;
populated the join table extracting data from the root table;
(within a transaction) removed the columns from the root table, modified the view (by dropping and recreating it).

How much space was I supposed to gain? Let’s see how much space the column occupied:

mydb=# select pg_column_size( 'app.root.sensor_name' );
 pg_column_size
----------------
             20
(1 row)

mydb=# select count(*) from app.root;
   count
-----------
 126224120


mydb=# select pg_size_pretty( 126224120::bigint * 20 );
 pg_size_pretty
----------------
 2408 MB
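The estimate is plain multiplication, and it can be double checked outside the database. A tiny Python sketch, taking the 20 bytes per value and the row count from the output above at face value:

```python
rows = 126_224_120        # count(*) on app.root
bytes_per_value = 20      # size reported for the column

estimated_bytes = rows * bytes_per_value
estimated_mb = round(estimated_bytes / 1024 ** 2)  # megabytes of 1024 * 1024 bytes

print(estimated_bytes, estimated_mb)
```

The rounded result matches the 2408 MB reported by pg_size_pretty.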

The text column was estimated at 20 bytes, which over 126 million tuples is around 2.4 GB of disk space. After the transaction, I did a VACUUM FULL to let PostgreSQL re-arrange the disk space and I got the expected result:

mydb=# select pg_size_pretty( pg_database_size( 'mydb' ));
 pg_size_pretty 
----------------
 9234 MB

Please note that the gained space is a lot more than the estimate because I also refactored other columns here and there. But the normalized database proved to be less space hungry. Remember that the starting size was already vacuumed, so there is no extra space gain due to dead rows lying around.
All the queries are working, the space is optimized, my users are happy, I’m happy!

PostgreSQL is almost the best (according to Stack Overflow Survey)

2019-05-29T00:00:00+00:00

Stack Overflow 2019 Survey results are available, and PostgreSQL is almost leading in the database field.

PostgreSQL is almost the best (according to Stack Overflow Survey)

According to the 2019 survey made by Stack Overflow and available here, PostgreSQL is the second most popular database, slightly ahead of Microsoft SQL Server and clearly ahead of Oracle. And this is true for both the community and the professional users who took the survey.

PostgreSQL is keeping its high position year after year, and this means that the database is growing as a professional choice. In particular, among professional users PostgreSQL is used more, while MySQL and MS SQL lose some points.

A glance at pg_cron to automatically schedule database tasks

2019-05-21T00:00:00+00:00

I tend to use cron(1) to schedule some automated tasks on the database server side, and since I discovered the pg_cron extension, I decided to try it. Here are some impressions.

A glance at `pg_cron` to automatically schedule database tasks

pg_cron is an interesting PostgreSQL extension by Citus Data: it includes a background worker (i.e., a PostgreSQL managed process) to execute database tasks on the server side. This is something I’ve done for years, I mean managing automated tasks using the operating system wide cron(1) and similar schedulers, but having the scheduler within the database sounds really cool, since I can keep it tied to the data itself.

An example scenario

I’ve one server pulling data regularly out of another server, via a foreign data wrapper. No matter how this design choice sounds to you, it works for me!
In order to constantly pull data, I have set up a cron(1) task in my user crontab to execute a function that does all the business logic I need. Therefore my crontab file looks like:

$ crontab -l
10,20,30,40,50,0 * * * * \
 /usr/bin/psql -U postgres -h 127.0.0.1 \
 -c "SELECT f_pull('crontab import');" mydb

So I’m executing the function f_pull on database mydb specifying a label crontab import. Let’s see how this can be done using pg_cron too.

Installing `pg_cron`

While there are packages for the major Linux distributions, I find it quite easy to install it from the official repository with the following short commands:

$ git clone https://github.com/citusdata/pg_cron.git
$ cd pg_cron
$ export PATH=/usr/pgsql-11/bin:$PATH
$ make
$ sudo PATH=$PATH make install

and in case it matters, I’m using CentOS 7 here.

Now, in order to make pg_cron work, it must be loaded as a shared library, so you have to adjust the PostgreSQL configuration (usually postgresql.conf) as follows:

shared_preload_libraries = 'pg_cron' 
cron.database_name = 'mydb'

Here I use mydb as the database in which to store the pg_cron data. In fact, pg_cron will create a cron schema with a job table in there that does the same as your crontab file on any Unix machine. Unfortunately, you need to restart the PostgreSQL cluster in order to apply the changes.

$ sudo service postgresql-11 restart

It is now time to decide who will execute the cron jobs, and in my case it is the postgres superuser. This may not be the optimal choice, so choose the user that fits your needs. In my case I was already using cron(1) with the postgres user, so it sounded to me like the right and fastest way to migrate from regular cron to pg_cron. Why does choosing the user in advance matter? Because pg_cron requires such a user to be able to connect to the database without providing any password, so you should either adjust pg_hba.conf properly or add a .pgpass file in the home of the user. Yes, even if a background worker is used to implement the pg_cron features, the connection happens through libpq, hence the need for the user to be allowed to connect without providing a password. Therefore I changed pg_hba.conf as follows:

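host    mydb    postgres   localhost        trust
host    all          all        127.0.0.1/32     md5

adding the specific line for postgres and leaving all other connections requiring a password. Keep in mind that line ordering does matter: pg_hba.conf is scanned from top to bottom and the first matching line wins, so the specific trust line must come before the general one. Now, you can issue a reload and test your user connectivity. Once this is done, you can configure pg_cron.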
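A word of caution about pg_hba.conf: records are evaluated from top to bottom, and the first record that matches the connection decides the authentication method. The following Python sketch is a deliberately simplified model of that rule. It only knows about database, user and an IPv4 network, and it uses 127.0.0.1/32 on both records instead of the localhost host name, which is an assumption made to keep the example short.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class HbaRule(NamedTuple):
    database: str
    user: str
    network: str
    method: str


def auth_method(rules: List[HbaRule], database: str, user: str,
                client_ip: str) -> Optional[str]:
    """Return the method of the first matching rule, or None (rejected)."""
    ip = ipaddress.ip_address(client_ip)
    for rule in rules:
        if rule.database not in ("all", database):
            continue
        if rule.user not in ("all", user):
            continue
        if ip not in ipaddress.ip_network(rule.network):
            continue
        return rule.method  # first match wins, later rules are ignored
    return None


trust_rule = HbaRule("mydb", "postgres", "127.0.0.1/32", "trust")
md5_rule = HbaRule("all", "all", "127.0.0.1/32", "md5")

specific_first = auth_method([trust_rule, md5_rule], "mydb", "postgres", "127.0.0.1")
general_first = auth_method([md5_rule, trust_rule], "mydb", "postgres", "127.0.0.1")
print(specific_first, general_first)
```

With the general record first, the trust record is never reached and the postgres user is still asked for a password.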

Configuring a job

Configuring pg_cron is really simple: all jobs are kept in the cron.job table and you can either edit such table with standard SQL or use the cron.schedule() function to get an initial entry to work on later. Since I was migrating a cron(1) entry, things were as simple as copying and pasting the cron(1) entry line with dollar quoting:

mydb=# SELECT cron.schedule('5,15,25,35,45,55 * * * *',
  $CRON$ SELECT f_pull('pg_cron import'); $CRON$
  );

pg_cron replies with the identifier of the job, in my case 1 because it is the very first job inserted in the scheduler. I can inspect it with an ordinary SELECT against the cron.job table.

mydb=# SELECT * FROM cron.job;
-[ RECORD 1 ]------------------------------------------------------------
jobid    | 1
schedule | 5,15,25,35,45,55 * * * *
command  | SELECT public.f_pull('pg_cron import');
nodename | localhost
nodeport | 5432
database | mydb
username | postgres
active   | t

All the fields are self-explanatory, and please note that the schedule field reports the string in the exact same format as cron(1); this is due to the fact that pg_cron uses the very same parser as cron(1), making migration from cron(1) to pg_cron really easy. By default cron.schedule() uses the current PostgreSQL instance parameters and the current username, but you can then adjust them to something else. While I haven’t tested it, this means you could execute cron tasks from one PostgreSQL instance on a remote one.
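If you are not fluent in the cron(1) format, the schedule above reads as "at minutes 5, 15, 25, 35, 45 and 55 of every hour". The following Python sketch shows how such a string can be interpreted; it is a toy that only understands `*` and comma-separated lists in the minute and hour fields, and it is in no way the parser used by pg_cron.

```python
from typing import Set


def expand_field(field: str, lowest: int, highest: int) -> Set[int]:
    """Expand a cron field made of '*' or a comma-separated list of numbers."""
    if field == "*":
        return set(range(lowest, highest + 1))
    values = {int(item) for item in field.split(",")}
    if not all(lowest <= value <= highest for value in values):
        raise ValueError(f"value out of range in field {field!r}")
    return values


def runs_at(schedule: str, minute: int, hour: int) -> bool:
    """Tell whether the schedule fires at the given minute and hour."""
    minute_field, hour_field, *_ = schedule.split()  # other fields ignored
    return (minute in expand_field(minute_field, 0, 59)
            and hour in expand_field(hour_field, 0, 23))


schedule = "5,15,25,35,45,55 * * * *"
runs_per_hour = sum(runs_at(schedule, minute, 3) for minute in range(60))
print(runs_at(schedule, 15, 3), runs_at(schedule, 10, 3), runs_per_hour)
```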

And that’s all!
Now you can sit down and check your cron jobs.

`pg_cron` logging

Things never work the first time! In case you need to inspect what is going on, consider that pg_cron logs at the LOG level and provides a statement for job begin and end. A successfully executed job prints log statements as

LOG:  cron job 1 starting: SELECT public.f_pull('pg_cron import');
LOG:  cron job 1 completed: 1 row

while a failing job prints lines as

LOG:  cron job 1 starting: SELECT public.f_pull('pg_cron import');
LOG:  cron job 1 connection failed

Often problems arise due to connection permissions or grants, so double check that your cron user is really able to do what you are expecting it to do. In my case it was really simple because I was already using a cron(1) job, so the user already had the permissions needed to do its job.

Conclusions

pg_cron is an awesome tool to keep in your toolbag because it makes it really easy to migrate from cron(1) to pg_cron (and back!). Moreover, being an extension, it makes all the schedule configuration available within the database, and since the extension installation script adds cron.job to the backup, this means you will get scheduler backups for free!

The role of a role within another role

2019-05-09T00:00:00+00:00

A recursive title for a kind of recursive topic: what does it really mean to have a role within another one? This article tries to lay out some basic knowledge about it.

The role of a role within another role

After reading the excellent article by Hans-Jürgen Schönig about roles, I decided to provide my own view about users, groups and the more abstract role concept.

The word role

First of all, the word role has little to do with PostgreSQL: it is a word used in the SQL standard, so don’t blame our favourite database for using the same word to express different concepts like user and group.

Roles: are they users or groups?

The wrong part of the question is or: roles are both users and groups. Period. A role is a stereotype, an abstraction for a collection of permissions to do some stuff. Now, often a collection of permissions is granted to a user, and therefore a role smells like a user account, but in my opinion this is just a coincidence. And in fact, as in the best system administration tradition, when you have to assign a collection of permissions to more than one user you need a group; roles can therefore smell like a group.
Remember: roles are collections of permissions, and what makes them smell like a group or a user is just the way you use them. If you use a role for a single user, then it is fine to think of the role as a user account. If you use the role for more than one user, then it is fine to think of the role as a group.
Now, if you think this is trivial and simple, consider that a role can smell like a user and a group at the same time. A role is a representative of a collection of permissions and therefore can be something assigned to a single user, to a group (multiple users) or both. Somehow, it is like the chief of a company: he acts at the same time as an employee and as an employer, as well as a representative of the company itself.

Enough, let’s see something!

Consider a very simple example: a school with a schoolars table that can be written only by professors and read by students. As you can imagine, both professors and students will be groups of permissions.
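The definition of the table is not shown in this post. A minimal sketch that is consistent with the output below, assuming an auto-generated pk and a text name column, could be the following. The identity column is my choice here: with a serial column the inserting role would also need privileges on the underlying sequence, which would complicate the example.

```sql
-- Assumed structure: only the column names pk and name come from the post.
CREATE TABLE schoolars (
    pk   int GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);

-- Initial content, inserted by the table owner.
INSERT INTO schoolars( name )
VALUES ( 'Harry Potter' ), ( 'Luca Ferrari' );
```

With the table in place, the two groups and their grants are set up as follows.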

testdb=# CREATE ROLE professors WITH LOGIN;
CREATE ROLE
testdb=# CREATE ROLE students WITH LOGIN;
CREATE ROLE

testdb=# REVOKE ALL ON schoolars FROM PUBLIC;
REVOKE
testdb=# GRANT ALL ON schoolars TO professors;
GRANT
testdb=# GRANT SELECT ON schoolars TO students;
GRANT

Anybody playing the professors role can do whatever they want with the table:

testdb=> SELECT CURRENT_USER;
 current_user 
--------------
 professors
(1 row)

testdb=> TABLE schoolars;
 pk |     name     
----|--------------
  1 | Harry Potter
  2 | Luca Ferrari
(2 rows)

testdb=> INSERT INTO schoolars(name) VALUES('Ron Weasly');
INSERT 0 1

but anybody playing the students role cannot:

testdb=> SELECT CURRENT_USER;
 current_user 
--------------
 students
(1 row)

testdb=> TABLE schoolars;
 pk |     name     
----|--------------
  1 | Harry Potter
  2 | Luca Ferrari
  3 | Ron Weasly
(3 rows)

testdb=> INSERT INTO schoolars(name) VALUES('Rubeus Hagrid');
ERROR:  permission denied for table schoolars

So far, so good! But our groups are not very useful yet: they act as single accounts. Let’s create a professor, add it to the professors group, and see what happens:

testdb=# CREATE ROLE severus 
         WITH LOGIN 
         IN ROLE professors;
         
CREATE ROLE

The IN ROLE professors clause makes the role severus a member of the professors group, and so we would expect it to be able to do whatever the group can do:

testdb=> SELECT CURRENT_USER;
 current_user 
--------------
 severus
(1 row)

testdb=> TABLE schoolars;
 pk |     name     
----|--------------
  1 | Harry Potter
  2 | Luca Ferrari
  3 | Ron Weasly
(3 rows)

testdb=> INSERT INTO schoolars(name) VALUES('Drako Malfoy');
INSERT 0 1

So far so good, again! However, the above example worked as expected because of the default INHERIT behavior as clearly stated in the documentation:

The INHERIT attribute is the default for reasons of backwards compatibility: 
in prior releases of PostgreSQL, users always had access to all privileges 
of groups they were members of. 
However, NOINHERIT provides a closer match to the 
semantics specified in the SQL standard.

Role inheritance

When a role is attached to another role, and is therefore a member of the latter as if it were a group, PostgreSQL by default applies the INHERIT property of CREATE ROLE. This property states that all the permissions of the group the role is a member of are passed on to the member itself. In the above example, it means that severus gets all the permissions of professors for free.
But what happens if the role has been created without inheritance?

testdb=# CREATE ROLE severus WITH LOGIN IN ROLE professors NOINHERIT;
CREATE ROLE


testdb=> SELECT CURRENT_USER;
 current_user 
--------------
 severus
(1 row)

testdb=> TABLE schoolars;
ERROR:  permission denied for table schoolars

The role still holds all the permissions, but it has to state explicitly which set of permissions must be applied, and this is done via a SET ROLE command:

testdb=> SET ROLE TO professors;
SET
testdb=> SELECT CURRENT_USER;
 current_user 
--------------
 professors
(1 row)

testdb=> TABLE schoolars;
 pk |     name     
----|--------------
  1 | Harry Potter
  2 | Luca Ferrari
  3 | Ron Weasly
  5 | Drako Malfoy
(4 rows)

It is as if the role severus were allowed to become another user, as with the system command sudo(1), but it has to explicitly become that user. With INHERIT instead (the default behavior), all the permissions are available automatically.
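To make the sudo(1) analogy a little more concrete, here is a small sketch of the round trip, assuming a session opened as the NOINHERIT role severus. After SET ROLE, session_user still reports who logged in, while current_user reports whose permissions apply; RESET ROLE goes back to the original identity.

```sql
-- Connected as severus (NOINHERIT, member of professors).
SET ROLE professors;

-- session_user is who logged in, current_user is whose permissions apply:
-- here they should be severus and professors respectively.
SELECT session_user, current_user;

-- Drop the borrowed permissions when done.
RESET ROLE;

-- Both values are severus again.
SELECT session_user, current_user;
```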

Dynamic behavior

Let’s add another professor, say albus, so that albus inherits from professors while severus does not. Before that, let’s remove the INSERT permission from the professors group:

testdb=# REVOKE INSERT 
         ON schoolars 
         FROM professors;
REVOKE


testdb=# CREATE ROLE albus 
         WITH LOGIN 
         IN ROLE professors 
         INHERIT;
CREATE ROLE

Let’s see what this means at run-time:

testdb=> SELECT CURRENT_USER;
 current_user 
--------------
 severus
(1 row)

testdb=> INSERT INTO schoolars(name) 
         VALUES('Lord Voldemort');
ERROR:  permission denied for table schoolars
testdb=> SET ROLE professors;
SET
testdb=> INSERT INTO schoolars(name) 
         VALUES('Lord Voldemort');
ERROR:  permission denied for table schoolars


testdb=> SELECT CURRENT_USER;
 current_user 
--------------
 albus
(1 row)

testdb=> INSERT INTO schoolars(name) 
         VALUES('Lord Voldemort');
ERROR:  permission denied for table schoolars

Neither albus nor severus can insert a new tuple anymore, as we would expect. Now let’s grant the INSERT permission to professors again:

testdb=# GRANT INSERT 
         ON schoolars 
         TO professors;
GRANT

Let’s see how both severus and albus can now perform an evil insert:

testdb=> SELECT CURRENT_USER;
 current_user 
--------------
 severus
(1 row)

testdb=> INSERT INTO schoolars(name) 
         VALUES('Lord Voldemort');
ERROR:  permission denied for table schoolars
testdb=> SET ROLE professors;
SET
testdb=> INSERT INTO schoolars(name) 
         VALUES('Lord Voldemort');
INSERT 0 1

testdb=> SELECT CURRENT_USER;
 current_user 
--------------
 albus
(1 row)

testdb=> INSERT INTO schoolars(name) 
         VALUES('Lord Voldemort');
INSERT 0 1

Did you spot the difference? INHERIT means that the permission is immediately available at run-time to the member role, while without inheritance the role must still become the target role to exploit the privileges.
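The same difference can also be inspected without attempting any insert. The following is a sketch that relies on pg_has_role(), where MEMBER reports membership in the group and USAGE reports whether the group privileges are available without a SET ROLE; the column aliases are mine.

```sql
-- MEMBER: the role is a (direct or indirect) member of professors.
-- USAGE:  the privileges of professors apply without doing SET ROLE.
SELECT r.rolname,
       r.rolinherit,
       pg_has_role( r.rolname, 'professors', 'MEMBER' ) AS is_member,
       pg_has_role( r.rolname, 'professors', 'USAGE' )  AS inherits_now
FROM   pg_roles r
WHERE  r.rolname IN ( 'severus', 'albus' )
ORDER  BY r.rolname;
```

With the roles defined as in this post, I would expect both to be reported as members, with only albus having the privileges immediately available.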

Summary

So what is it all about? When you create a role you can assign it to another role, that is, make it a member of a group. The permissions of such a group must be enabled explicitly with SET ROLE (NOINHERIT) or, with INHERIT, are all granted automatically to the member. Remember: a role is just a collection of privileges, and nesting one role into another combines the privileges, either flattening them (INHERIT) or keeping them separate (NOINHERIT).
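Membership does not have to be decided when the role is created: an existing role can be added to, or removed from, a group later on. A minimal sketch, where minerva is a hypothetical role introduced only for this example:

```sql
-- minerva is a hypothetical role, not part of the examples above.
CREATE ROLE minerva WITH LOGIN;

-- Add the existing role to the group...
GRANT professors TO minerva;

-- ...and take it out again.
REVOKE professors FROM minerva;
```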

An article about pgenv

2019-04-17T00:00:00+00:00

A few months ago I worked to improve the great pgenv tool by theory. Today, I try to spread the word in the hope this tool can grow a little more!

An article about pgenv

tl;dr

I proposed a talk about pgenv, a Bash tool to manage several PostgreSQL instances on the same local machine, to the Italian PGDay 2019.
My talk was rejected, and since I hate to waste what I had already prepared, I decided to turn it into an article, which was quickly accepted for the Hakin9 DevOps issue!

I should have written about this a couple of months ago, but I did not have the time.
My hope is that pgenv gets more and more users, so that it can grow and someday become a widely used tool. Quite frankly, I don’t see this happening while it is written in Bash, for reasons of both portability and flexibility, and I suspect Perl is a much better language for a more flexible implementation. However, who knows? Gathering users is also a way to gather contributors and therefore bring new ideas to this small but very useful project.

In the meantime, if you have the time and the will, try testing the build-from-git patch, which allows you to build and manage a development version of our beloved database.

Estimating row count from explain output...in Perl!

2019-04-04T00:00:00+00:00

After having read the interesting post by Laurenz Albe on how to use EXPLAIN to get a quick estimate of a query count, I decided to implement the same feature in Perl.

Estimating row count from explain output…in Perl!

At the end of his blog post, Laurenz Albe shows how to use a quick and dirty function to estimate the number of rows returned by an arbitrary query.

While I don’t believe it is often a good idea to judge the size of a query result by the optimizer’s guesses, the approach is interesting. Laurenz shows how to exploit the JSON format and query facilities to extract data from the EXPLAIN output, so why not use Perl to crunch the textual data?

So here is a simple implementation to extract the estimate within Perl:

CREATE OR REPLACE FUNCTION plperl_row_estimate( query text )
RETURNS BIGINT
AS $PERL$

   my ( $query ) = @_;
   return 0 if ( ! $query );
   $query = sprintf "EXPLAIN (FORMAT YAML) %s", $query;

   elog( DEBUG, "Estimating from [$query]" );
   my @estimated_rows = map { s/Plan Rows:\s+(\d+)$/$1/; $_ }
                        grep { $_ =~ /Plan Rows:/ }
                        split( "\n", spi_exec_query( $query )->{ rows }[ 0 ]->{ "QUERY PLAN" } );

   return 0 if ( ! @estimated_rows );
   return $estimated_rows[ 0 ];
$PERL$
LANGUAGE plperl;

Let’s see an example in action:

testdb=> select plperl_row_estimate( 'SELECT p.* FROM persona p JOIN persona k on k.pk = p.pk WHERE k.eta = 40' );

 plperl_row_estimate 
---------------------
               69500

How does the function work? The main trick is at this point in the code:

   my @estimated_rows = map { s/Plan Rows:\s+(\d+)$/$1/; $_ }
                        grep { $_ =~ /Plan Rows:/ }
                        split( "\n", spi_exec_query( $query )->{ rows }[ 0 ]->{ "QUERY PLAN" } );

where, through spi_exec_query, an EXPLAIN is executed and its output, in YAML format, is split into an array of strings, one entry per line. This array is then passed to grep to exclude all the lines that do not contain information about the row estimate. Last, map extracts the numeric value from such lines.

After that, the @estimated_rows array contains one entry per plan node, each holding the row estimate of that node, with the outermost node at the beginning of the array. That first entry is what the function returns, and all the others are thrown away.

As a final note, please consider that this function accepts an arbitrary piece of text and tries to execute it as a query, therefore it must be used carefully to avoid SQL injection and similar problems.
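For comparison, the JSON-based approach mentioned above could be sketched in PL/pgSQL roughly as follows. This is my own reconstruction of the idea, not a copy of the function from Laurenz's post, and the function name is an assumption. The same warning about executing arbitrary text applies.

```sql
CREATE OR REPLACE FUNCTION plpgsql_row_estimate( query text )
RETURNS bigint
AS $CODE$
DECLARE
   plan jsonb;
BEGIN
   -- Same guard as the Perl version: nothing to estimate.
   IF query IS NULL OR query = '' THEN
      RETURN 0;
   END IF;

   -- The whole plan comes back as a single JSON document.
   EXECUTE 'EXPLAIN (FORMAT JSON) ' || query INTO plan;

   -- The first array element holds the outermost plan node.
   RETURN ( plan -> 0 -> 'Plan' ->> 'Plan Rows' )::bigint;
END
$CODE$
LANGUAGE plpgsql;
```

Here the JSON operators do the work that split, grep and map do in the Perl version.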

psql.it Mailing List is Back!

2019-03-25T00:00:00+00:00

The historical mailing list of the Italian psql.it group has been successfully migrated!

psql.it Mailing List is Back!

Thanks to the great work of the people behind the psql.it Italian group, the first (and for many years the only) Italian-language mailing list has been migrated to a new platform and is now online again!

On this mailing list you can find a few very talented people willing to help with your PostgreSQL-related problems or curiosities, and to discuss the current status and the future of the development and anything else you would expect from a very technical mailing list. Of course, the language is Italian!

The link to the new mailing list management panel is https://www.freelists.org/list/postgresql-it.
Enjoy!

Running pgbackrest on FreeBSD

2019-03-04T00:00:00+00:00

I tend to use FreeBSD as my PostgreSQL base machine, and getting software to run on it is not always as simple as it sounds. In this post I share some advice on running pgbackrest on FreeBSD 12.

Running pgbackrest on FreeBSD

pgbackrest is an amazing tool for backup and recovery of a PostgreSQL database. However, and this is not a criticism at all, it has some Linux-isms that make it difficult to run on FreeBSD. I tried to install and run it on FreeBSD 12, and got stuck immediately at the compilation step. So I opened an issue to get some help, and then experimented a little more to see if I could at least get it to compile.

The first attempt was to cross-compile: I built the executable (pgbackrest has a single executable) on a Linux machine, then moved it to the FreeBSD machine along with all the libraries reported by ldd (placed into /compat/linux/lib64). But libpthread.so.0 prevented me from starting the command:

% ./pgbackrest 
./pgbackrest: error while loading shared libraries: libpthread.so.0: 
  cannot open shared object file: No such file or directory

So I switched back to native compilation and, as described in the issue, I made a few small changes to client.c and the Makefile. Since it compiled (using gmake, of course), I made a few more changes to the Makefile to compile and install it the FreeBSD way (i.e., under /usr/local/bin). The full diff is the following (some changes are not shown in the issue):

% git diff
diff --git a/src/Makefile b/src/Makefile
index 73672bff..0472c7f1 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -8,7 +8,7 @@
CC=gcc
# Compile using C99 and Posix 2001 standards (also _DARWIN_C_SOURCE for MacOS)
-CSTD = -std=c99 -D_POSIX_C_SOURCE=200112L -D_DARWIN_C_SOURCE
+CSTD = -std=c99 
# Compile optimizations
COPT = -O2
@@ -51,7 +51,7 @@ LDFLAGS = -lcrypto -lssl -lxml2 -lz $(LDPERL) $(LDEXTRA)
# Install options
####################################################################################################################################
# Modify destination install directory
-DESTDIR =
+DESTDIR = /usr/local/
####################################################################################################################################
# List of required source files.  main.c should always be listed last and the rest in alpha order.
@@ -175,8 +175,8 @@ pgbackrest: $(OBJS)
# Installation.  DESTDIR can be used to modify the install location.
####################################################################################################################################
install: pgbackrest
-       install -d $(DESTDIR)/usr/bin
-       install -m 755 pgbackrest $(DESTDIR)/usr/bin
+       install -d $(DESTDIR)bin
+       install -m 755 pgbackrest $(DESTDIR)/bin
####################################################################################################################################
# Compile rules
diff --git a/src/common/io/tls/client.c b/src/common/io/tls/client.c
index ddddb790..10b1d538 100644
--- a/src/common/io/tls/client.c
+++ b/src/common/io/tls/client.c
@@ -25,6 +25,7 @@ TLS Client
#include "common/type/keyValue.h"
#include "common/wait.h"
#include "crypto/crypto.h"
+#include <netinet/in.h>
/***********************************************************************************************************************************
Object type

Then, following the FreeBSD software paths, I created /usr/local/etc/pgbackrest/pgbackrest.conf and proceeded. So far everything seems to work, even if, as far as I know, FreeBSD is not a tested platform, so I’m working at my own risk (and so are you if you do the same installation)!

One little annoying detail is the configuration file: pgbackrest defaults to /etc/pgbackrest/pgbackrest.conf, and such file seems to me to be hardcoded into the config/parse.c source file:

#define PGBACKREST_CONFIG_FILE                                      PROJECT_BIN ".conf"
#define PGBACKREST_CONFIG_ORIG_PATH_FILE                            "/etc/" PGBACKREST_CONFIG_FILE
STRING_STATIC(PGBACKREST_CONFIG_ORIG_PATH_FILE_STR,             PGBACKREST_CONFIG_ORIG_PATH_FILE);

or at least I don’t see any comfortable way to change such behavior. The problem is that having to specify the FreeBSD-style configuration file /usr/local/etc/pgbackrest/pgbackrest.conf every time is not only annoying, but can also cause weird errors, most notably an apparently unrelated one like

option pg1-path must be specified when relative wal paths are used

because the archive_command I specified did not include the same configuration file, and pgbackrest was looking for its default one. In other words, ensure that the PostgreSQL instance has something like:

archive_command = '/usr/local/bin/pgbackrest 
                   --stanza=main 
                   --config=/usr/local/etc/pgbackrest/pgbackrest.conf  
                   archive-push %p'
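As an alternative to editing the configuration file by hand, the same value could be set from a superuser session, as in the following sketch. The paths and the stanza name are the ones used above; the command is written on a single line here, while the snippet above is wrapped only for readability.

```sql
-- Paths and stanza name come from this post: adjust them to your setup.
ALTER SYSTEM SET archive_command =
   '/usr/local/bin/pgbackrest --stanza=main --config=/usr/local/etc/pgbackrest/pgbackrest.conf archive-push %p';

-- archive_command is picked up on reload, no restart is needed.
SELECT pg_reload_conf();

-- Check what the server is actually going to run.
SHOW archive_command;
```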

That made me think that linking the /usr/local/etc/pgbackrest directory to /etc/pgbackrest could be, at this point, a good solution to avoid some future mess.
