Just a Theory

By David E. Wheeler

RFC: Extension Packaging & Lookup

Several weeks ago, I started a pgsql-hackers thread proposing a new extension file organization and a search path GUC for finding extensions. The discussion of Christoph Berg’s extension_destdir patch inspired this proposal. These threads cover quite a lot of territory, so I want to pull together a more unified, public proposal.

Here goes.

Challenges

A number of challenges face extension users, thanks to extension file organization in the Postgres core. The common thread among them is the need to add extensions without changing the contents of the Postgres installation itself.

Packager Testing

On Debian systems, the user account that creates extension packages lacks permission to add files to the Postgres install. But testing extensions requires installing the extension where Postgres can find it. Moreover, extensions ideally build against a clean Postgres install; adding an extension in order to run make installcheck would pollute it.

Christoph’s patch solves these problems by adding a second lookup path for extensions and dynamic modules, so that Postgres can load them directly from the package build directory.

Alas, the patch isn’t ideal, because it simply specifies a prefix and appends the full pg_config directory paths to it. For example, if --sharedir outputs /opt/share and the extension_destdir GUC is set to /tmp/build/myext, the patch will search in /tmp/build/myext/opt/share. This approach works for the packaging use case, which explicitly uses full paths with a prefix, but would be weird for other use cases.

Peter Eisentraut proposed an alternate patch with a new GUC, extension_control_path, that provides a more typical search path pattern to find extension control files, but doesn’t account for shared modules that ship with an extension, requiring that they still live in the dynamic_library_path. Installing into custom directories requires the undocumented datadir and pkglibdir variables:

make install datadir=/else/where/share pkglibdir=/else/where/lib

This pattern can probably be simplified.

OCI Immutability

OCI (née Docker) images are immutable, while a container image runs on a writeable but non-persistent file system. To install persistent extensions in a container, one must create a persistent volume, map it to SHAREDIR/extensions, and copy over all the extensions it needs (or muck with symlink magic). Then do it again for shared object libraries (PKGLIBDIR), and perhaps also for other pg_config directories, like --bindir. Once it’s all set up, one can install a new extension and its files will be distributed to the relevant persistent volumes.

This pattern makes upgrades tricky, because the core extensions are mixed in with third-party extensions. Worse, the number of directories that must be mounted into volumes depends on the features of an extension, increasing deployment configuration complexity. It would be preferable to have all the files for an extension in one place, rather than scattered across multiple persistent volumes.

Peter Eisentraut’s patch addresses much of this issue by adding a search path for extension control files and related data/share files (generally SQL files). One can create a single volume with a lib directory for shared modules and share/extension directory for control and data/share files.

OCI Extension Images

However, an additional wrinkle is the ambition of the CloudNativePg (CNPG) community to eliminate the need for a persistent volume, and rely instead on mounting images that each contain all the files for a single extension as their own volumes, perhaps using the Kubernetes image volume feature (currently in alpha).

This feature requires all the files in an extension to live in a single directory, so that a volume mounted from an extension image contains all the files required to use the extension. The search path patches proposed so far do not enable this behavior.

Postgres.app Immutability

The macOS Postgres.app supports extensions. But installing one into SHAREDIR/extensions changes the contents of the Postgres.app bundle, breaking Apple-required signature validation. The OS would no longer be able to validate that the app is legitimate, and would refuse to start it.

Peter Eisentraut’s new patch addresses this issue as well, with all the same caveats as for the packager testing challenges.

Solution

To further address these issues, this RFC proposes to change file organization and lookup patterns for PostgreSQL extensions.

Extension Directories

First, when an extension is installed, by default all of its files will live in a single directory named for the extension. The contents include:

  • The control file that describes the extension
  • Subdirectories for SQL, shared modules, docs, binaries, etc.

Subdirectories roughly correspond to the pg_config --*dir options:

  • bin: Executables
  • doc: Documentation files
  • html: HTML documentation files
  • lib: Dynamically loadable modules
  • locale: Locale support files
  • man: Manual pages
  • share: SQL and other architecture-independent support files

This layout reduces the cognitive overhead for understanding what files belong to what extension. Want to know what’s included in the widget extension? Everything is in the widget directory. It also simplifies installation of an extension: one need add only a directory named for and containing the files required by the extension.

Configuration Parameter

Add a new pg_config value that returns the directory into which extensions will by default be installed:

 --extdir   show location of extensions

Its default value would be $(pg_config --sharedir)/extension, but could be set at compile time like other configuration parameters. Its contents consist of subdirectories that each contain an extension, as described in Extension Directories. With a few extensions installed, it would look something like:

❯ ls -1 "$(pg_config --extdir)"
auto_explain
bloom
isn
pair
plperl
plpgsql
plv8
xml2
semver
vector

Extension Path

Add an extension lookup path GUC akin to dynamic_library_path, called extension_path. It lists all the directories that Postgres will search for extensions and their files. The default value for this GUC will be:

extension_path = '$extdir'

The special string $extdir corresponds to the pg_config option of the same name, and functions exactly as $libdir does for the dynamic_library_path GUC, substituting the appropriate value.
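For example, a configuration that searches a custom directory before the default installation might look like this (the custom path is purely illustrative):

```
extension_path = '/opt/postgres/extensions:$extdir'
```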

Lookup Execution

Update PostgreSQL’s CREATE EXTENSION command to search the directories in extension_path for an extension. For each directory in the list, it will look for the extension control file in a directory named for the extension:

$dir/$extension/$extension.control

The first match will be considered the canonical location for the extension. For example, if Postgres finds the control file for the pair extension at /opt/pg17/ext/pair/pair.control, it will load files only from the appropriate subdirectories, e.g.:

  • SQL files from /opt/pg17/ext/pair/share
  • Shared module files from /opt/pg17/ext/pair/lib
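The first-match-wins lookup can be sketched in shell. This is a simplified illustration of the proposed behavior, not the actual implementation; the `find_extension` name is hypothetical:

```shell
# Mimic the proposed CREATE EXTENSION lookup: scan each directory in a
# colon-separated path for $dir/$extension/$extension.control and print
# the first matching extension directory, which becomes canonical.
find_extension() {
  local ext=$1 path=$2 dir
  local IFS=':'
  for dir in $path; do
    if [ -f "$dir/$ext/$ext.control" ]; then
      printf '%s\n' "$dir/$ext"
      return 0
    fi
  done
  return 1
}
```

Directories earlier in the path thus shadow later ones, just as dynamic_library_path entries do for shared libraries.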

PGXS

Update the extension installation behavior of PGXS to install extension files into the new layout. A new variable, $EXTDIR, will define the directory into which to install extension directories, and default to $(pg_config --extdir). It can be set to any literal path, which must exist and be accessible by the PostgreSQL service.

The $EXTENSION variable will be changed to allow only one extension name. If it’s set, the installation behavior will be changed for the following variables:

  • EXTENSION: Creates $EXTDIR/$EXTENSION, installs $EXTDIR/$EXTENSION/$EXTENSION.control
  • MODULES and MODULE_big: Installed into $EXTDIR/$EXTENSION/lib
  • MODULEDIR: Removed
  • DATA and DATA_built: Installed into $EXTDIR/$EXTENSION/share
  • DATA_TSEARCH: Installed into $EXTDIR/$EXTENSION/share/tsearch_data
  • DOCS: Installed into $EXTDIR/$EXTENSION/doc
  • PROGRAM, SCRIPTS and SCRIPTS_built: Installed into $EXTDIR/$EXTENSION/bin

Each of these locations can still be overridden by setting one of the (currently undocumented) installation location options (e.g., datadir, pkglibdir, etc.).
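For illustration, a Makefile for a hypothetical “pair” extension would look much as it does with today’s PGXS; only the installation destinations change. The extension name and file lists here are invented for the example:

```make
# Sketch of a PGXS Makefile under the proposed layout.
EXTENSION = pair
DATA = sql/pair--1.0--1.1.sql sql/pair--1.1.sql
DOCS = doc/pair.md
PG_CONFIG ?= pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
```

Under this proposal, make install would create $EXTDIR/pair/ containing pair.control plus share and doc subdirectories, while make install EXTDIR=/else/where would target /else/where/pair/ instead.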

External projects that install extensions without using PGXS, like pgrx, must also be updated to either follow the same pattern or to delegate installation to PGXS.

Control File

The directory control file parameter will be deprecated and ignored.

The module_pathname parameter should only name a shared module in the lib subdirectory of an extension directory. Any existing use of a $libdir prefix will be stripped out and ignored before replacing the MODULE_PATHNAME string in SQL files. The implication for loading extension dynamic modules1 differs from the existing behavior as follows:

  1. If the name is an absolute path, the given file is loaded.
  2. If the name does not contain a directory part, the file is searched for in the lib subdirectory of the extension’s directory ($EXTDIR/$EXTENSION/lib).
  3. Otherwise (the file was not found in the path, or it contains a non-absolute directory part), the dynamic loader will try to take the name as given, which will most likely fail. (It is unreliable to depend on the current working directory.)

Use Cases

Here’s how the proposed file layout and extension_path GUC address the use cases that inspired this RFC.

Packager Testing

A packager who wants to run tests without modifying a PostgreSQL install would follow these steps:

  • Prepend a directory under the packaging install to the extension_path GUC. The resulting value would be something like $RPM_BUILD_ROOT/$(pg_config --extdir):$extdir.
  • Install the extension into that directory: make install EXTDIR=$RPM_BUILD_ROOT
  • Make sure the PostgreSQL server can access the directory, then run make installcheck

This will allow PostgreSQL to find and load the extension during the tests. The Postgres installation will not have been modified; only the extension_path will have changed.

OCI/Kubernetes

To allow extensions to be added to an OCI container and to persist beyond its lifetime, one or more volumes could be used. Some examples:

  • Mount a persistent volume for extensions and prepend the path to that directory to the extension_path GUC. Then Postgres can find any extensions installed there, and they will persist. Files for all extensions will live on a single volume.
  • Or, to meet a desire to keep some extensions separate (e.g., open-source vs company-internal extensions), two or more persistent volumes could be mounted, as long as they’re all included in extension_path, are accessible by PostgreSQL, and users take care to install extensions in the proper locations.

CNPG Extension Images

To meet the CNPG ambition to “install” an extension by mounting a single directory for each, create separate images for each extension, then use the Kubernetes image volume feature (currently in alpha) to mount each as a read-only volume in the appropriate subdirectory of a directory included in extension_path. Thereafter, any new containers would simply have to mount all the same extension image volumes to provide the same extensions to all containers.

Postgres.app

To allow extension installation without invalidating the Postgres.app bundle signature, the default configuration could prepend a well-known directory outside the app bundle, such as /Library/Application Support/Postgres, to extension_path. Users wishing to install new extensions would then need to point the EXTDIR parameter to that location, e.g.,

$ make install EXTDIR="/Library/Application Support/Postgres"

Or the app could get trickier, setting the --extdir value to that location so that users don’t need to use EXTDIR. As long as extension_path includes both the bundle’s own extension directory and this external directory, Postgres will be able to find and load all extensions.
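Either way, the shipped configuration might resemble the following, where $extdir resolves to the bundle’s own extension directory and the external path is illustrative:

```
extension_path = '/Library/Application Support/Postgres:$extdir'
```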

Extension Directory Examples

A core extension like citext would have a structure similar to:

citext
├── citext.control
├── lib
│   ├── citext.dylib
│   └── bitcode
│       ├── citext
│       │   └── citext.bc
│       └── citext.index.bc
└── share
    ├── citext--1.0--1.1.sql
    ├── citext--1.1--1.2.sql
    ├── citext--1.2--1.3.sql
    ├── citext--1.3--1.4.sql
    ├── citext--1.4--1.5.sql
    ├── citext--1.4.sql
    └── citext--1.5--1.6.sql

A pure SQL extension named “pair” would live in a directory named “pair” that looks something like this:

pair
├── LICENSE.md
├── README.md
├── pair.control
├── doc
│   ├── html
│   │   └── pair.html
│   └── pair.md
└── share
    ├── pair--1.0--1.1.sql
    └── pair--1.1.sql

A binary application like pg_top would live in the pg_top directory, structured something like:

pg_top
├── HISTORY.rst
├── INSTALL.rst
├── LICENSE
├── README.rst
├── bin
│   └── pg_top
└── doc
    └── man
        └── man3
            └── pg_top.3

And a C extension like semver would live in the semver directory and be structured something like:

semver
├── LICENSE
├── README.md
├── semver.control
├── doc
│   └── semver.md
├── lib
│   ├── semver.dylib
│   └── bitcode
│       ├── semver
│       │   └── semver.bc
│       └── semver.index.bc
└── share
    ├── semver--1.0--1.1.sql
    └── semver--1.1.sql

Phase Two: Preloading

The above-proposed solution does not allow shared modules distributed with extensions to be compatibly loaded via shared library preloading, because extension modules will no longer live in the dynamic_library_path. Users can specify full paths, however. For example, instead of:

shared_preload_libraries = 'pg_partman_bgw'

One could use the path to the lib subdirectory of the extension’s directory:

shared_preload_libraries = '/opt/postgres/extensions/pg_partman_bgw/lib/pg_partman_bgw'

But users will likely find this pattern cumbersome, especially for extensions with multiple shared modules. Perhaps some special syntax could be added to specify a single extension module, such as:

shared_preload_libraries = '$extension_path::pg_partman_bgw'

But this overloads the semantics of shared_preload_libraries and the code that processes it rather heavily, not to mention the LOAD command.

Therefore, as a follow up to the solution proposed above, this RFC proposes additional changes to PostgreSQL.

Extension Preloading

Add new GUCs that complement shared library preloading, but for extension module preloading:

  • shared_preload_extensions
  • session_preload_extensions
  • local_preload_extensions

Each takes a list of extensions for which to preload shared modules. In addition, another new GUC, local_extensions, will contain a list of administrator-approved extensions users are allowed to include in local_preload_extensions. This GUC complements local_preload_libraries’s use of a plugins directory.

Then modify the preloading code to also preload these files. For each extension in a list, it would:

  • Search each path in extension_path for the extension.
  • When found, load all the shared libraries from $extension/lib.

For example, to load all shared modules in the pg_partman extension, set:

shared_preload_extensions = 'pg_partman'

To load a single shared module from an extension, give its name after the extension name and two colons. This example will load only the pg_partman_bgw shared module from the pg_partman extension:

shared_preload_extensions = 'pg_partman::pg_partman_bgw'

This change requires a one-time change to existing preload configurations on upgrade.
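For example, an existing configuration would be updated from the bare library name to the extension-qualified form:

```
# Before: module found on dynamic_library_path
shared_preload_libraries = 'pg_partman_bgw'

# After: module found in the pg_partman extension's lib subdirectory
shared_preload_extensions = 'pg_partman::pg_partman_bgw'
```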

Future: Deprecate LOAD

For a future change, consider modifying CREATE EXTENSION to support shared module-only extensions. This would allow extensions with no SQL component, such as auto_explain, to be handled like any other extension; it would live under one of the directories in extension_path with a structure like this:

auto_explain
├── auto_explain.control
└── lib
   ├── auto_explain.dylib
   └── bitcode
       ├── auto_explain
       │   └── auto_explain.bc
       └── auto_explain.index.bc

Note the auto_explain.control file. It would need a new parameter to indicate that the extension includes no SQL files, so CREATE EXTENSION and related commands wouldn’t try to find them.

With these changes, extensions could become the primary, recommended interface for extending PostgreSQL. Perhaps the LOAD command could be deprecated, and the *_preload_libraries GUCs along with it.

Compatibility Issues

  • The module_pathname control file parameter would require the name of a shared module rather than a path. The code that replaces the MODULE_PATHNAME string in SQL files would strip out the $libdir/ prefix, if present.
  • The behavior of loading dynamic modules that ship with extensions (i.e., the value of the AS part of CREATE FUNCTION) would change to look for a library name (with no directory part) in the lib subdirectory of the extension directory.
  • The directory control file parameter and the MODULEDIR PGXS variable would be deprecated and ignored.
  • *_preload_libraries would no longer be used to find extension modules without full paths. Administrators would have to remove module names from these GUCs and add the relevant extension names to the new *_preload_extensions variables. To ease upgrades, we might consider adding a PGXS variable that, when true, would symlink shared modules into --pkglibdir.
  • LOAD would no longer be able to find shared modules included with extensions, unless we add a PGXS variable that, when true, would symlink shared modules into --pkglibdir.
  • The EXTENSION PGXS variable will no longer support multiple extension names.
  • The change in extension installation locations must also be adopted by projects that don’t use PGXS for installation, like pgrx. Or perhaps they could be modified to also use PGXS. Long term it might be useful to replace the Makefile-based PGXS with another installation system, perhaps a CLI.

Out of Scope

This RFC does not include or attempt to address the following issue:

  • How to manage third-party shared libraries. Making system dependencies consistent in an OCI/Kubernetes environment or for non-system binary packaging patterns presents its own challenges, though they’re not specific to PostgreSQL or the patterns described here. Research is ongoing into potential solutions, and will be addressed elsewhere.

Acknowledgements

A slew of PostgreSQL community members contributed feedback, asked hard questions, and suggested moderate to significant revisions to this RFC via the pgsql-hackers list, in-person discussion at PGConf.eu, and pull request comments. I’d especially like to thank:

All remaining errors and omissions are my own.


  1. But not non-extension modules; see Phase Two and Future for further details on preloading extension modules and eventually deprecating non-extension modules. ↩︎

Bye Twitter

I quit Twitter in November 2022, when it became clear that Phony Stark would take control of the company. No regrets; I’ve been active on Mastodon since then, via Ivory, where I’ve enjoyed the Twitter ca. 2010 vibe.

But I hadn’t deleted my Tweets or my accounts until this week. The change was finally triggered by the latest announcement:

On Friday, X will adopt a new terms of service policy that for the first time explicitly says all users agree to let their posts be used to train artificial intelligence, like the company’s generative AI service, Grok. Currently, X users can go to the site’s settings and opt out.

I have no interest in feeding Grok or any other large language model. So this week I deleted all my tweets and then my accounts. I tried to transfer my personal username to a new, empty account, but in the second between changing it and switching to another tab to take it, it was somehow acquired by an account created in 2011 with no posts. Not sure how that worked. Pretty scummy.

So, just to be clear, anyone using these usernames on Twitter is not me anymore:

  • @theory
  • @lunar_theory
  • @DesignScene
  • @pgxn

See you on Mastodon — or perhaps Bluesky.

Introducing RFC 9535 Go JSONPath and Playground

I’ve written and released an RFC 9535 JSONPath Go package, github.com/theory/jsonpath. Why? For a personal project, I needed a simpler JSONPath engine to complement the Go SQL/JSON Path package, and quickly found myself implementing most of the RFC. So I decided to do the whole thing.

Yes, yet another JSONPath package in Go. I really appreciate the idea of a standard — plus its support for features not included in the original design from 2007, such as object slices! But I could find no reference to the RFC on pkg.go.dev. Today the search shows one!

Example

Usage is straightforward; here’s a quick example (Go playground):

package main

import (
	"fmt"
	"log"

	"github.com/theory/jsonpath"
)

func main() {
	// Parse a jsonpath query.
	p, err := jsonpath.Parse(`$["name", "slogan"]`)
	if err != nil {
		log.Fatal(err)
	}

	// Select values from unmarshaled JSON input.
	json := map[string]any{
		"name":   "Kamala Harris",
		"title":  "Vice President of the United States",
		"home":   "California",
		"slogan": "We are not going back!",
	}
	items := p.Select(json)

	// Show the result.
	fmt.Printf("%#v\n", items)
}

And the output:

[]interface {}{"Kamala Harris", "We are not going back!"}

🛝 Playground

No need to write code to try it out, though. I’ve also written a playground webapp to encourage experimentation and exploration of the syntax and behavior of the package. The implementation follows the precedents set by the Go SQL/JSON Playground and design of the Rust JSONPath Playground. Moreover, thanks to TinyGo, the Wasm file comes in at a mere 254K!

The webapp loads sample JSON from the RFC, and randomly rotates through a few example JSONPath queries. Fill in your own and tap the “Permalink” button to share links. The Playground is a stateless JavaScript/Wasm web application: data persists only in permalink URLs.1

🛝 Try this example Playground permalink right now!

Status

The root jsonpath package is stable and ready for use. Other packages remain in flux, as I refactor and rejigger things in the coming weeks as part of the aforementioned personal project. But for actual JSONPath execution and querying, it should continue to work as-is for the foreseeable future.

I hope you find it useful.


  1. And whatever data GitHub Pages collects 😔. ↩︎

NYTimes Tech Guild on Strike

A New York Times-style election needle illustration ranging from “contract” to “strike”, with the needle all the way over to “strike”

This is a big deal. My former colleagues in the New York Times Tech Guild have declared a strike today in response to a number of unfair labor practice violations and management’s inability to bargain in good faith. From the Washington Post:

Represented by the NewsGuild of New York, the Tech Guild has been negotiating its first contract with the company since 2022. Both sides have accused the other side of bogging down contract talks, which have recently focused on three key issues: “just cause” job protections (which ensure employees can’t be fired without reason and due process), remote work and pay equity. Unit members voted overwhelmingly in September to authorize the strike, saying the timing of the authorization — in the run-up to the high-profile election period — was “no accident.”

I urge you to join me in supporting the strike fund, which will help more vulnerable members to weather the strike.

Beyond that, the Guild asks supporters to refrain from interacting with Cooking and Games, and to break your streaks. Times reporter Maggie Astor has the details:

As of this morning, the New York Times Tech Guild, which represents NYT tech workers, is on strike.

Unless it’s resolved while today’s Wordle is still live, this streak is gone.

NYT Games and Cooking are BEHIND THE PICKET LINE. Please don’t play or engage with Games or Cooking content while the strike lasts!

News coverage is NOT behind the picket line. It’s okay to read and share that, though the site and app may have problems.

Read the whole thread for more information; as she says, “the rules may not be what you think!”

PGConf & Extension Ecosystem Summit EU 2024

The PGConf 2024 logo

Last week I MCed the first Extension Ecosystem Summit EU and attended my first PGConf.eu in Athens, Greece. Despite my former career as an archaeologist — with a focus on Mediterranean cultures, no less! — this was my first visit to Greece. My favorite moment was the evening after the Summit, when I cut out of a networking shindig to walk to Pláka and then circumnavigate the Acropolis. I mean just look at this place!

Nighttime photo of the Acropolis of Athens

The Acropolis of Athens on the evening of October 22, 2024. © 2024 David E. Wheeler

Highlight of the trip for sure. But the Summit and conference were terrific, as well.

Extension Ecosystem Summit

Floor Drees kindly organized The Extension Ecosystem Summit EU, the follow-up to the PGConf.dev original. While the Vancouver Summit focused on developers, we tailored this iteration to users. I started the gathering with a condensed version of my POSETTE talk, “State of the Postgres Extension Ecosystem”, but updated with a Trunk OCI Distribution demo. Links:

We then moved into a lightning round of 10 minute introductions to a variety of extensions:

Quite the whirlwind! There followed open discussion, in which each maintainer went to a corner to talk to attendees about contributing to their extensions. Details to come in a more thorough writeup on the Tembo blog, but I personally enjoyed some fascinating discussions about extension distribution challenges.

PGConf.eu

Following the Summit, I attended several thought-provoking and provocative presentations at PGConf.eu, which took place at the same hotel, conveniently enough.

Floor Drees speaking at a podium, next to a slide reading “Why Postgres?”

Floor Drees speaking at PGConf.eu 2024. © 2024 David E. Wheeler

There were many more talks, but clearly I tend to be drawn to the most technical, core-oriented topics. And also archaeology.

Museums

Speaking of which, I made time to visit two museums while in Athens. First up was the National Archaeological Museum of Athens, where I was delighted to explore the biggest collection of Mycenaean artifacts I’ve ever seen, including massive collections from the excavations of Heinrich Schliemann. So much great Bronze Age stuff here. I mean, just look at this absolute unit:

Photo of a Mycenaean Krater featuring a horse-drawn chariot

From the museum description: “Fragment of a krater depicting a chariot with two occupants. A male figure holding a staff walks in front of the chariot. Much of the Mycenaean Pictorial Style pottery (14th-12th centuries BC) with representations of humans, chariots, horses and bulls on large kraters, was produced at Berbati in the Argolid and exported to Cyprus, where it was widely imitated. Birds, fish, wild goats or imaginary creatures (i.e. sphinxes) occur on other types of vessels, such as jugs and stirrup jars. Usually only fragments of these vases survive in mainland Greece from settlement contexts. In Cyprus, however, complete vases are preserved, placed as grave gifts in tombs.” © Photo 2024 David E. Wheeler

The animal decorations on Mycenaean and Akrotiri pottery are simply delightful. I also enjoyed the Hellenistic stuff, and seeing the famed Antikythera Mechanism filled my nerd heart with joy. A good 3 hours poking around; I’ll have to go back and spend a few days there sometime. Thanks to my pal Evan Stanton for gamely wandering around this fantastic museum with me.

Immediately after the PGConf.eu closing session, I dashed off to the Acropolis Museum, which stays open till 10 on Fridays. Built in 2009, this modern concrete-and-glass building exhibits several millennia of artifacts and sculpture exclusively excavated from the Acropolis or preserved from its building façades. No photography allowed, alas, but I snapped this photo looking out on the Acropolis from the top floor.

Photo of the Acropolis as viewed from inside the Acropolis Museum.

The Acropolis as viewed from inside the Acropolis Museum. Friezes preserved from the Parthenon inside the museum reflect in the glass, as does, yes, your humble photographer. © 2024 David E. Wheeler

I was struck by the beauty and effectiveness of the displays. It easily puts the lie to the assertion that the Elgin Marbles must remain in the British Museum to protect them. I saw quite a few references to the stolen sculptures, particularly empty spots and artfully sloppy casts from the originals, but the building itself makes the strongest case that the marbles should be returned.

But even without them there remains a ton of beautiful sculpture to see. Highly recommended!

Back to Work

Now that my sojourn in Athens has ended, I’m afraid I must return to work. I mean, the event was work, too; I talked to a slew of people about a number of projects in flight. More on those soon.

PGXN v2 Update

Speaking of PGXN news, I neglected to link to this post I wrote for the Tembo Blog last month, a fairly detailed accounting of what’s been happening on the PGXN v2 project:

Forgive me Postgres community, for it has been five months since my last PGXN v2 Update. In my defense, it has been super busy! The time went into ongoing community discussions, planning, designs, and the start of implementation. Join me below for the lowdown.

A few highlights:

There’s been quite a bit of activity since then, including the aforementioned PGXN RFC–5 — Release Certification. More soon!

More about…

PGXN Certifications RFC

A couple weeks ago, I drafted PGXN RFC–5 — Release Certification, which proposes to replace the simple inclusion of SHA-1 hash digests in PGXN release META.json files with a JWS-signed release payload. From the introduction:

This RFC therefore proposes to extend v2 distribution metadata with a single additional property, certs, that contains one or more certifications that attest to the authenticity or other characteristics of a release on PGXN.

The certs value is an object that contains at least one property, pgxn, which itself contains a PGXN-generated RFC 7515 JSON Web Signature in the JWS JSON Serialization format. The pgxn property will allow clients not only to assemble the release URL and verify the downloaded file against checksums, but also validate it against a public key provided by PGXN.

The design allows multiple signatures, certifications, or other attestations, which in the future MAY allow authors or other entities to sign releases with their own keys. The new format appends a structure such as this to the distribution META.json file:

{
  "certs": {
    "pgxn": {
      "payload": "eyJ1c2VyIjoidGhlb3J5IiwiZGF0ZSI6IjIwMjQtMDktMTNUMTc6MzI6NTVaIiwidXJpIjoiZGlzdC9wYWlyLzAuMS43L3BhaXItMC4xLjcuemlwIiwiZGlnZXN0cyI6eyJzaGE1MTIiOiJiMzUzYjVhODJiM2I1NGU5NWY0YTI4NTllN2EyYmQwNjQ4YWJjYjM1YTdjMzYxMmIxMjZjMmM3NTQzOGZjMmY4ZThlZTFmMTllNjFmMzBmYTU0ZDdiYjY0YmNmMjE3ZWQxMjY0NzIyYjQ5N2JjYjYxM2Y4MmQ3ODc1MTUxNWI2NyJ9fQ",
      "signature": "cC4hiUPoj9Eetdgtv3hF80EGrhuB__dzERat0XF9g2VtQgr9PJbu3XOiZj5RZmh7AAuHIm4Bh-rLIARNPvkSjtQBMHlb1L07Qe7K0GarZRmB_eSN9383LcOLn6_dO--xi12jzDwusC-eOkHWEsqtFZESc6BfI7noOPqvhJ1phCnvWh6IeYI2w9QOYEUipUTI8np6LbgGY9Fs98rqVt5AXLIhWkWywlVmtVrBp0igcN_IoypGlUPQGe77Rw"
    }
  }
}

Review and feedback would be very much appreciated, especially on the list of unresolved questions toward the end.

Thanks to David Christensen and Steven Miller for the early reviews!

Meanwhile, I’ve released pgxn_meta v0.4.0, which adds support for this format, as well as code to rewrite PGXN v1 release fields to the new format. It doesn’t actually do signature verification, yet, as the server back end hasn’t been updated with the pattern and PKI. But I expect to modify it in response to feedback and get it implemented in early 2025.

⛰️ Postgres Ecosystem Summit EU

Given the success of the Extension Ecosystem Summit at PGConf.dev back in May, my colleague Floor Drees has organized a sequel, the Extension Ecosystem Summit EU on Tuesday, October 22, at the Divani Caravel Hotel in Athens. That’s “Day 0” at the same hotel as PGConf.eu. Tembo, Percona, Xata, and Timescale co-sponsor.

While the May event took the form of an open-space technology (OST)-style unconference aimed at extension developers, the EU event aims to inform an audience of Postgres users about the history and some exemplary use cases for extensions. From the invite:

Join us for a gathering to explore the current state and future of Postgres extension development, packaging, and distribution. Bring your skills and your devices and start contributing to tooling underpinning many large Postgres installations.

  • Jimmy Angelakos - pg_statviz: pg_statviz is a minimalist extension and utility pair for time series analysis and visualization of PostgreSQL internal statistics.
  • Adam Hendel (Tembo) - pgmq: pgmq is a lightweight message queue. Like AWS SQS and RSMQ but on Postgres. Adam has been pgmq’s maintainer since 2023, and will present a journey from pure Rust → pgrx → pl/pgsql.
  • Alastair Turner (Percona) - pg_tde: pg_tde offers transparent encryption of table contents at rest, through a Table Access Method extension. Percona has developed pg_tde to deliver the benefits of encryption at rest without requiring intrusive changes to the Postgres core.
  • Gülçin Yıldırım Jelínek (Xata) - pgzx: pgzx is a library for developing PostgreSQL extensions written in Zig.
  • Mats Kindahl (Timescale) - TimescaleDB (C), pgvectorscale (Rust) and pgai (Python): maintaining extensions written in different languages.

I will also deliver the opening remarks, including a brief history of Postgres extensibility. Please join us if you’re in the area or planning to attend PGConf.eu. See you there!

Release: pgxn_meta v0.1.0

Following the recent spate of work drafting RFCs for a binary distribution format and Meta Spec v2, and bearing in mind the vote to implement PGXN v2 apps in Rust, over the last few weeks I’ve switched gears to write some code.

First I wrote JSON Schemas for the v1 spec and then wrote a full test suite using the boon crate. Next I wrote and tested JSON Schemas for the v2 spec. This process informed revisions to the RFC, uncovering inconsistencies and unnecessary complexities.

With META.json JSON Schema validation fully worked up, I decided to work the functionality into a proper Rust crate and CLI to eventually replace the old PGXN::Meta::Validator Perl module and its validate_pgxn_meta CLI. This turned out to be a decent Rust starter project, requiring a fairly simple crate and CLI, but it also allowed me to develop patterns to build and release binaries for a variety of platforms and architectures.

As a result, I’m happy to announce the release today of the pgxn_meta crate and pgxn_meta CLI v0.1.0, available for download on these platforms, thanks to cross and houseabsolute/actions-rust-cross:

  • darwin-amd64
  • darwin-arm64
  • freebsd-amd64
  • freebsd-i686
  • illumos-amd64
  • linux-amd64
  • linux-arm
  • linux-arm64
  • linux-i686
  • linux-powerpc
  • linux-powerpc64
  • linux-powerpc64le
  • linux-riscv64
  • linux-s390x
  • linux-sparc64
  • netbsd-amd64
  • solaris-amd64
  • solaris-sparcv9
  • windows-amd64
  • windows-arm64
  • windows-i686

Download the archive file appropriate to your platform, decompress it, and put the pgxn_meta (or pgxn_meta.exe) binary in your path. Or use the universal binary installer (ubi) to install it:

ubi --project pgxn/meta --exe pgxn_meta --in ~/bin

And of course you can use cargo to compile it from source:

cargo install pgxn_meta

Usage is simple: just run pgxn_meta in a directory containing the META.json file to validate:

❯ pgxn_meta 
META.json is OK

And optionally pass it the name of the file, as in this example parsing a test file with no version property:

❯ pgxn_meta corpus/invalid.json 
Error: "corpus/invalid.json jsonschema validation failed with https://pgxn.org/meta/v2/distribution.schema.json#\n- at '': missing properties 'version'"

That’s it!

What’s Next?

Now that I’ve implemented validation and figured out multi-platform binary support for Rust apps, my next tasks are to:

  • Implement a pattern to convert a v1 META.json to the v2 format
  • Create a pattern to merge multiple META.json files into one
  • Write code to build PGXS extension into trunk packages
  • Develop patterns to satisfy third-party dependencies for multiple platforms
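The merge task, for instance, might start from jq’s recursive object merge, sketched here under hypothetical file names:

```shell
# A first cut at merging two META.json files with jq's recursive object
# merge. File names are hypothetical; real merge logic will need smarter
# array and conflict handling than `*` provides.
cat > base.json <<'EOF'
{"name": "pair", "version": "0.1.7", "dependencies": {"postgres": {"version": ">= 12"}}}
EOF
cat > local.json <<'EOF'
{"dependencies": {"postgres": {"version": ">= 14"}}}
EOF
# `*` merges objects recursively; values from the second file win.
jq -s '.[0] * .[1]' base.json local.json
```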

Should keep me busy for a few weeks. Updates as I have them.

To Preload, or Not to Preload

The Tembo Blog published a post by yours truly last week about when to preload shared libraries and when not to:

Recently I’ve been trying to figure out when a Postgres extension’s shared libraries should be preloaded. By “shared libraries” I mean libraries provided or used by Postgres extensions, whether LOADable libraries or CREATE EXTENSION libraries written in C or pgrx. By “preloaded” I mean under what conditions should they be added to one of the Shared Library Preloading variables, especially shared_preload_libraries.

The answer, it turns out, comes very much down to the extension type.

I view this post as a kind of proto-chapter for an imagined book about developing extensions that I’d like to work on someday. I learned quite a lot researching it and responding to extensive feedback from more knowledgeable community members. It resulted in updates to the PGXN Meta preload property that I hope will inform binary distribution in the future. More on that soon.

RFC: PGXN Meta Spec v2

Two bits of news on the “PGXN v2” project.

PGXN RFCs: The Book

First, I’ve moved the RFC process (again, sorry) from PGXN Discussions, which were a bit fussy about Markdown formatting and don’t support inline comments, to the PGXN RFCs project, where the use of pull requests on CommonMark Markdown documents addresses these issues. This process borrows heavily from the Rust RFCs project, right down to publishing accepted RFCs as a “book” site.

So I’d also like to introduce rfcs.pgxn.org, a.k.a., the PGXN RFCs Book.

It currently houses only one RFC: Meta Spec v1, dating from 2010. This document defines the structure of the META.json file required in archives published on PGXN.

But I expect many more RFCs to be drafted in the coming years, starting with draft RFC–2, the binary distribution RFC I POCed a few weeks ago. There has already been some great feedback in that pull request, in addition to the previous discussion. More eyes will make it even better.

PGXN Meta Spec v2 RFC

Last week I also iterated on the PGXN Metadata Sketch several times to produce draft RFC–3: Meta Spec v2. This represents a major reworking of the original spec in an attempt to meet the following goals:

  • Allow more comprehensive dependency specification, to enable packagers to identify and install system dependencies and dependencies from other packaging systems, like PyPI and CPAN.
  • Adopt more industry-standard formats like SPDX License Expressions and purls.
  • Improve support for multiple types of Postgres extensions, including apps, LOADable modules, background workers, and TLEs.
  • Improve curation and evaluation via categories, badging, and additional download links.
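To make the first two goals concrete, here is a hypothetical fragment of metadata along these lines; the field names are illustrative assumptions, not the final spec:

```shell
# Hypothetical v2-style metadata fragment: an SPDX license expression and
# purl identifiers for external dependencies. Field names are illustrative
# assumptions, not the final RFC-3 spec.
cat > meta-sketch.json <<'EOF'
{
  "license": "PostgreSQL OR MIT",
  "dependencies": {
    "packages": {
      "build": ["pkg:generic/openssl", "pkg:cpan/Module::Build"]
    }
  }
}
EOF
jq -r .license meta-sketch.json
# → PostgreSQL OR MIT
```

The point of both formats is that generic tooling already understands them: SPDX expressions can be validated mechanically, and purls map to concrete packages in other ecosystems.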

There’s a lot here, but I hope the result can better serve the community for the next decade, and enable lots of new services and features.

The proof will be in the application, so my next task is to start building the tooling to turn PGXN distributions into binary distributions. I expect experimentation will lead to additional iterations, but feel confident that the current state of both RFC–2 and RFC–3 is on the right track.

Introducing Go SQL/JSON Path and Playground

For a personal project, I needed to parse and execute PostgreSQL-compatible jsonpath expressions.1 So I’ve spent just about every spare evening and weekend the last several months porting Postgres jsonpath to Go, and it’s finally ready to ship.

Introducing Go SQL/JSON, featuring the path package. This project provides full support for all of the PostgreSQL 17 jsonpath features2 in the Go programming language. An example:

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"

	"github.com/theory/sqljson/path"
	"github.com/theory/sqljson/path/exec"
)

func main() {
	// Parse some JSON.
	var value any
	err := json.Unmarshal([]byte(`{"a":[1,2,3,4,5]}`), &value)
	if err != nil {
		log.Fatal(err)
	}

	// Parse a path expression and execute it on the JSON.
	p := path.MustParse("$.a[*] ? (@ >= $min && @ <= $max)")
	res, err := p.Query(
		context.Background(),
		value,
		exec.WithVars(exec.Vars{"min": float64(2), "max": float64(4)}),
	)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%v\n", res)
    // Output: [2 3 4]
}

I think the API is decent, but may implement better patterns as I discover them. Overall I’m quite satisfied with how it turned out, and just how well its implementation and performance compare to the original.

🛝 Playground

But why stop there? One of the nice things about this project is that Go supports compiling applications into WebAssembly (a.k.a. Wasm) via Go WebAssembly. Borrowing from the Goldmark project, I created and published the sqljson/path playground and populated the docs with links for all of its examples.

Now anyone can experiment with SQL/JSON path expressions, and share links to demonstrate patterns and techniques. The Playground is a stateless JavaScript/Wasm web application: data persists only in permalink URLs.3

🛝 Try this example Playground permalink right now!4

The Path Ahead

I’ve enjoyed learning how to implement a lexer, a goyacc parser, an AST, and an execution engine. The Playground was a bonus bit of fun!

I’m stoked to build cool stuff on this package, but don’t know whether anyone else will find it useful. If you do — or just enjoy messing about on the Playground — let me know!


  1. “Whatever for,” you ask? Well, aside from wanting to see if I could do it, this post describes a POC. Now I’m working to create the real thing — done right and entirely from scratch. ↩︎

  2. Well, nearly full. The only missing feature is the datetime(template) method. See also the comprehensive compatibility notes. ↩︎

  3. And whatever data GitHub Pages collect 😔. ↩︎

  4. JSON borrowed from MDN. ↩︎

Patch: Postgres ABI and API Guidance

Update 2024-11-14

I forgot to update this post at the time, but on July 31, Peter Eisentraut committed the patch adding ABI and API guidance to the C language documentation. I only noticed because today’s releases contained a modified ABI that broke a number of extensions. See the hackers thread for details.

TL;DR

If you’re a Postgres extension developer interested in understanding what to expect from core API and ABI stability, please review and give feedback on this patch (or pull request) adding ABI and API Guidance to the documentation.


In my PGConf.dev report a couple days ago, I mentioned that a few actionable items came out of the Improving extensions in core unconference session. One was the need to document the heretofore unofficial policy for API and ABI stability between major and, especially, minor versions of Postgres.

A frequent topic at the Extension Summit and Mini-Summits and a number of PGConf.dev sessions has been concern regarding compatibility changes between minor releases of Postgres. At Mini Summit Five, for example, Yurii Rashkovskii presented a few examples of such changes, leading him to conclude, along with several others in the community, that extensions that use the C API can only be run against the minor release they were built with.

In the Unconference session, core committers reported that such changes are made carefully, and rarely, if ever, affect extensions compiled for different minor releases of the same major version, precisely because they take pains to avoid compatibility issues. In the case Yurii found, for example, a field was added to a struct’s padding, without affecting the ordering of other fields, thus minimizing the risk of runtime failures.

It became clear that, although the committers follow a policy — and read new committers into it via patch review — it’s not documented anywhere. The result has been a bunch of Sturm und Drang amongst extension developers unsure what level of compatibility to depend on and what changes to expect.

The week after the conference, I started a pgsql-hackers thread proposing to document the committer policy. Following some discussion and review of potential ABI breaks in minor releases, the consensus seemed to be that the committers strive to avoid such breaks, that they’re quite uncommon in minor releases, and that most of the reported issues were due to using more obscure APIs.

As a result, we started drafting a policy, and after a few iterations, Peter Eisentraut pulled things together from the perspective of a core team member, reframed as “Server API and ABI Guidance”. I converted it into a patch (and pull request) to add it to the C Language docs. A key statement on minor releases:

In general, extension code that compiles and works with a minor release should also compile and work with any other minor release of the same major version, past or future.

I hope this document clarifies things. Even if it’s not as strict as some might hope, it at least documents the project approach to compatibility, so we have a better idea what to expect when using the C APIs. If you see gaps, or you have additional questions, please respond to the pgsql-hackers thread — or the pull request (I’ll propagate comments to hackers).

PGConf.dev 2024

In addition to the afore-blogged Extension Summit, I also attended a slew of the regular PGConf.dev sessions, gave a talk on the future of the extension ecosystem, socialized with extension authors and core developers, and joined discussions in a number of unconference sessions. Some notes on selected talks and events:

Sessions

I enjoyed The road to new SQL/JSON features, where Álvaro Herrera gave a brief history of SQL/JSON in Postgres, starting with the JSON type in 9.2 (2012), JSONB in 2014, and SQL standard jsonpath in Postgres 12 (2019). Getting the SQL/JSON syntax finished turned out to be substantially more difficult, thanks to parsing issues. It took many attempts and a couple of reversions before most of the functionality was completed last year and included in Postgres 16. The forthcoming Postgres 17 finishes the work, with the standard fully supported except for “the JSON_TABLE plan param and json simplified accessor.”

It’s a great time to use Postgres for JSON object storage and management.

In Anarchy in the Database, subtitled “A Survey and Evaluation of Database Management System Extensibility”, Abigale Kim described her Master’s thesis work investigating Postgres extension incompatibilities. Installing and running tests for pairs of extensions, she found a number of conflicts and issues, such as a bug when Citus was paired with auto_explain (fixed in May). In all, 17% of pairs failed! Abi also found that 19% of extensions contain code copied from the Postgres core; pageinspect is 75% copied code!

Abi advocates for adding an extension manager into core, with well-defined hooks to manage extension load order and to streamline enabling and disabling extensions. Very interesting research, highlighting the need to think more deeply about how best to enable and empower the extension ecosystem.

Jeff Davis and Jeremy Schneider gave a thorough overview of Collations from A to Z. The problem rose to wide attention about six years ago when a libc upgrade changed a collation, leading to data loss, crashes, and duplicate primary keys. Ideally, sort orders would never change. But humans gotta human, language will evolve, and the order of things will need to be updated. In such situations, one must be aware of the changes and reindex or rebuild all indexes (and replace hot standbys, which can’t be reindexed).

I very much appreciated the context, as the ongoing issue with collations and upgrades has confused me. Should application authors choose collations or should DBAs? The new builtin collation provider in PostgreSQL 17 tries to bridge the gap by supporting unchanging Unicode code-point collation ordering that’s reasonably meaningful to humans. But I also realize that, for some projects with no need for human sort ordering, the C collation is more than sufficient.

In her keynote, When Hardware and Databases Collide, Margo Seltzer offered a provocation: Could PostgreSQL adopt something like CXL to scale to a virtually infinite pool of memory? Could one build a “complete fabric of CXL switches to turn an entire data center into a database”? I have no idea! It sure sounds like it could enable gigantic in-memory databases.

Tricks from in-memory databases by Andrey Borodin mostly went over my head, but each of the experiments sped things up a few percentage points. Together they might add up to something.

The Making PostgreSQL Hacking More Inclusive panel was terrific, and much-needed. I’m grateful that Amit Langote, Masahiko Sawada, and Melanie Plageman shared their experiences as up-and-coming non-white-male committers. I think the resulting discussion will help drive new inclusion initiatives in the PostgreSQL community, such as session moderator Robert Haas’s recently-announced Mentoring Program for Code Contributors.

Oh, and I gave a talk, The future of the extension ecosystem, in which I expanded on my mini-summit talk to suss out the needs of various members of the extension ecosystem (authors, users, DBAs, industry) and our plans to meet those needs in PGXN v2. Links:

Unconference

I also participated in the Friday Unconference. Abi, Yurii, and I led a discussion on Improving extensions in core. We discussed the need for an ABI stability policy, extension management, smoke testing (including for conflicts between extensions), a coalition to advocate for extensions in core (since launched as the Postgres Extension Developers Coalition), inline extensions, WASM-based extensions, and server installation immutability. Great discussions and a few actionable outcomes, some of which I’ve been working on. More soon in future posts.

In Increase Community Participation, we talked about the challenges for broadening the PostgreSQL contributor community, attracting and retaining contributors, recognizing contributions, and how to address issues of burnout and allow people to “retire”. I joined the discourse on how we could adopt or at least support GitHub workflows, such as pull requests, to encourage more patch review in a familiar environment. Personally, I’ve been creating pull requests in my fork for my patches for this very reason.

We also touched on training and mentoring new contributors (hence the mentoring announcement) and changes to postgresql.org, notably adding dedicated pages for each project governance committee, especially for the Contributors Committee (there’s a Contributors Committee?), as well as information for how to become a contributor and be listed on the contributor page.

Final Thoughts

I attended PGCon from 2009 to 2014, and always enjoyed the camaraderie in Ottawa every year. Most people went to the same pub after sessions every night (or for some part of each evening), where random connections and deep technical nerdery would continue into the small hours, both indoors and out. The Black Oak was a highlight of the conference for me, every year.

In the intervening years I got busy with non-Postgres work and scaled back my participation. I finally returned in 2023 (other than a virtual unconference in 2022), and found it much the same, although the Black Oak had closed, and now there were two or three pubs where people went, diluting the social pool a bit — though still a highlight.

As the new iteration of the Postgres Developer Conference, PGConf.dev is a worthy successor. Vancouver was a nice city for it, and people brought the same energy as always. I connected with far more people, and more meaningfully, than at any other conference. But other than the reception and dinner on Wednesday, there was no one (or three) place where people tended to aggregate into the wee hours. Or at least I wasn’t aware of it. The end of PGCon is bittersweet for me, but I’m happy to continue to participate in PGConf.dev.

See you next year!

POC: Distributing Trunk Binaries via OCI

A couple months ago, Álvaro Hernández suggested that Postgres extensions should be distributed as OCI (née Docker) images:

It’s all about not reinventing the wheel, and leveraging the ecosystem around OCI. Many of the problems (solutions) in building, packaging and distributing extensions are already solved by OCI: there’s a whole ecosystem of tools around OCI that provide additional benefits in terms of tooling, infrastructure and common knowledge.

As a relatively experienced Docker image builder and distributor, I found this idea intriguing. I wasn’t familiar with the OCI Image Manifest Specification, which defines how to build OCI images containing arbitrary files, or “artifacts”. But if we could adopt an existing protocol and federated registry system like OCI/Docker, it would save pretty significant development time over building our own — plus we’d be adopting and potentially contributing to a standard.

After PGConf.dev, I decided to see if I could work out how to distribute packages in the recently-proposed trunk format such that an OCI/Docker-style image URL could be used to install a version of an extension compiled for the appropriate architecture.

Thanks to the denizens of the #oras and #zot channels on the CNCF Slack, I extended the trunk format POC in pg-semver PR 69 to build the necessary JSON manifest files, push them to a registry, and then pull and install the architecturally-appropriate package. Here’s how it works.

Metadata generation

First, I extended trunk.mk, which builds a trunk package, with a few more targets that create the JSON files with metadata necessary to build OCI manifests. The files that make trunk now also generates are:

{extension}_annotations.json
OCI standard annotations describing a package, including license, vendor, and URLs. The semver_annotations.json file looks like this:
{
  "org.opencontainers.image.created": "2024-06-20T18:07:24Z",
  "org.opencontainers.image.licenses": "PostgreSQL",
  "org.opencontainers.image.title": "semver",
  "org.opencontainers.image.description": "A Postgres data type for the Semantic Version format with support for btree and hash indexing.",
  "org.opencontainers.image.source": "https://github.com/theory/pg-semver",
  "org.opencontainers.image.vendor": "PGXN",
  "org.opencontainers.image.ref.name": "0.32.1",
  "org.opencontainers.image.version": "0.32.1",
  "org.opencontainers.image.url": "https://github.com/theory/pg-semver"
}
{package_name}_config.json
An object with fields appropriate for OCI platform specification, plus the creation date. Here are the contents of semver-0.32.1+pg16-darwin-23.5.0-arm64_config.json:
{
  "os": "darwin",
  "os.version": "23.5.0",
  "architecture": "arm64",
  "created": "2024-06-20T18:07:24Z"
}
{package_name}_annotations.json
An object defining annotations to use in an image, built for a specific platform, all under the special key $manifest to be used later by the ORAS CLI to put them in the right place. semver-0.32.1+pg16-darwin-23.5.0-arm64_annotations.json example:
{
  "$manifest": {
    "org.opencontainers.image.created": "2024-06-20T18:07:24Z",
    "org.opencontainers.image.title": "semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk",
    "org.opencontainers.image.licenses": "PostgreSQL",
    "org.opencontainers.image.description": "A Postgres data type for the Semantic Version format with support for btree and hash indexing.",
    "org.opencontainers.image.source": "https://github.com/theory/pg-semver",
    "org.opencontainers.image.vendor": "PGXN",
    "org.opencontainers.image.ref.name": "0.32.1",
    "org.opencontainers.image.version": "0.32.1",
    "org.opencontainers.image.url": "https://github.com/theory/pg-semver",
    "org.pgxn.trunk.pg.version": "16.3",
    "org.pgxn.trunk.pg.major": "16",
    "org.pgxn.trunk.pg.version_num": "160003",
    "org.pgxn.trunk.version": "0.1.0"
  }
}

The org.opencontainers.image keys are the same as in semver_annotations.json, while the new org.pgxn.trunk annotations are intended for an install client to find the image appropriate for the version of Postgres, although that functionality isn’t part of this POC.
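Both the config and annotation files carry created values, which are RFC 3339 UTC timestamps; generating one in shell is a one-liner (a sketch; trunk.mk’s actual recipe may differ):

```shell
# Generate an RFC 3339 / ISO 8601 UTC timestamp like the "created" values
# above (a sketch; trunk.mk's actual recipe may differ).
created=$(date -u +%Y-%m-%dT%H:%M:%SZ)
echo "$created"
```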

The only change to the Makefile to support these annotations is the addition of a DESCRIPTION variable to populate org.opencontainers.image.description and a REPO_URL to populate org.opencontainers.image.source. trunk.mk includes a couple of other new variables, too: TITLE (defaults to EXTENSION), VENDOR (defaults to “PGXN”), and URL (defaults to REPO_URL).

Publishing Images

The new shell script push_trunk uses the ORAS CLI and jq to build the necessary manifest files and push them to an OCI registry. It currently works with only two trunk files like those built in the trunk POC. It first “pushes” the trunks to a locally-created OCI layout, then constructs manifests associating the SHA ID of each just-pushed image with annotations and platform configurations, and writes them into an image index manifest. Finally, it pushes the complete OCI layout described by the index to a remote registry.

If that sounds like a lot of steps, you’re right: it adds up. But the result, following a precedent established by Homebrew (as described in this issue), is multiple images for different platforms indexed at a single URI. Once we publish the two trunks:

./push_trunk localhost:5000/theory/semver:0-32.1 \
    semver-0.32.1+pg16-darwin-23.5.0-arm64 \
    semver-0.32.1+pg16-linux-amd64

We can fetch the manifests. The address for the image index is that first parameter, localhost:5000/theory/semver:0-32.1; we fetch the manifest with the command

oras manifest fetch localhost:5000/theory/semver:0-32.1

Which returns:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "size": 1285,
      "digest": "sha256:1a14997eb380f9641cba6193c001eb630319f345d76ef07aee37f86fafcdbe0b",
      "platform": {
        "os": "linux",
        "architecture": "amd64"
      },
      "annotations": {
        "org.pgxn.trunk.pg.version": "16.3",
        "org.pgxn.trunk.pg.major": "16",
        "org.pgxn.trunk.pg.version_num": "160003",
        "org.pgxn.trunk.version": "0.1.0"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "size": 1302,
      "digest": "sha256:385fcfe6b33c858c3f126fb4284afe23ba8c2f7c32db8a50a607dfece6dd9162",
      "platform": {
        "os": "darwin",
        "os.version": "23.5.0",
        "architecture": "arm64"
      },
      "annotations": {
        "org.pgxn.trunk.pg.version": "16.3",
        "org.pgxn.trunk.pg.major": "16",
        "org.pgxn.trunk.pg.version_num": "160003",
        "org.pgxn.trunk.version": "0.1.0"
      }
    }
  ],
  "annotations": {
    "org.opencontainers.image.created": "2024-06-21T13:55:01Z",
    "org.opencontainers.image.licenses": "PostgreSQL",
    "org.opencontainers.image.title": "semver",
    "org.opencontainers.image.description": "A Postgres data type for the Semantic Version format with support for btree and hash indexing.",
    "org.opencontainers.image.source": "https://github.com/theory/pg-semver",
    "org.opencontainers.image.vendor": "PGXN",
    "org.opencontainers.image.ref.name": "0.32.1",
    "org.opencontainers.image.version": "0.32.1",
    "org.opencontainers.image.url": "https://github.com/theory/pg-semver"
  }
}

Note the manifests array, which lists images associated with this URI. The first one is for amd64 linux and the second for arm64 darwin. They also contain the org.pgxn.trunk annotations that would allow filtering for an appropriate Postgres version. The idea is to download an index like this, find the manifest information for the appropriate platform and Postgres version, and download it. To get the darwin image, pull it by its digest:

oras pull localhost:5000/theory/semver:0-32.1@sha256:385fcfe6b33c858c3f126fb4284afe23ba8c2f7c32db8a50a607dfece6dd9162

Which downloads the file:

$ ls -1 *.trunk
semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk
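The client-side selection just described, finding the index entry for a platform and Postgres version, can be sketched with jq over an abbreviated copy of the index:

```shell
# Select the right manifest from an image index for a given platform and
# Postgres major version. The index here is an abbreviated stand-in for the
# one fetched above; digests are shortened for readability.
cat > index.json <<'EOF'
{
  "manifests": [
    {
      "digest": "sha256:1a14997e",
      "platform": {"os": "linux", "architecture": "amd64"},
      "annotations": {"org.pgxn.trunk.pg.major": "16"}
    },
    {
      "digest": "sha256:385fcfe6",
      "platform": {"os": "darwin", "architecture": "arm64"},
      "annotations": {"org.pgxn.trunk.pg.major": "16"}
    }
  ]
}
EOF
jq -r '.manifests[]
  | select(.platform.os == "linux" and .platform.architecture == "amd64")
  | select(.annotations["org.pgxn.trunk.pg.major"] == "16")
  | .digest' index.json
# → sha256:1a14997e
```

A real client would then pass the selected digest to oras pull, as shown above.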

Nice! The OCI protocol allows for filtering on platform directly, skipping the need to download and examine the image index. This is how docker pull --platform works, but is general to OCI. We can fetch a manifest with this command:

oras manifest fetch --platform linux/amd64 localhost:5000/theory/semver:0-32.1

Which returns not the image index, but the manifest for the Linux image:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.pgxn.trunk.layer.v1",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:90c8d6f2e67cba09f8178648ad95a6b31e51c0d902058bf396f9e7e5f50c8dfd",
    "size": 84
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:b52b292cabe3ca479673ab68d3ea647802a86f15059c3e19ed24d5a7688159c3",
      "size": 61983,
      "annotations": {
        "org.opencontainers.image.title": "semver-0.32.1+pg16-linux-amd64.trunk"
      }
    }
  ],
  "annotations": {
    "org.opencontainers.image.created": "2024-06-21T17:55:13Z",
    "org.opencontainers.image.description": "A Postgres data type for the Semantic Version format with support for btree and hash indexing.",
    "org.opencontainers.image.licenses": "PostgreSQL",
    "org.opencontainers.image.ref.name": "0.32.1",
    "org.opencontainers.image.source": "https://github.com/theory/pg-semver",
    "org.opencontainers.image.title": "semver-0.32.1+pg16-linux-amd64.trunk",
    "org.opencontainers.image.url": "https://github.com/theory/pg-semver",
    "org.opencontainers.image.vendor": "PGXN",
    "org.opencontainers.image.version": "0.32.1",
    "org.pgxn.trunk.pg.major": "16",
    "org.pgxn.trunk.pg.version": "16.3",
    "org.pgxn.trunk.pg.version_num": "160003",
    "org.pgxn.trunk.version": "0.1.0"
  }
}

Or we can pull the file by platform with:

rm *.trunk
oras pull --platform linux/amd64 localhost:5000/theory/semver:0-32.1

And now the Linux image has been downloaded:

$ ls -1 *.trunk
semver-0.32.1+pg16-linux-amd64.trunk

Pretty nice! These examples use zot running in a local Docker container, but could just as easily use the Docker registry (docker.io) or the GitHub registry (ghcr.io) — which is where Homebrew stores its images (e.g., sqlite 3.46.0).

Installation

With these manifests configured and pushed, I updated install_trunk to download images from the registry instead of relying on an existing file (as implemented for the trunk POC). Now we call it like so:

./install_trunk localhost:5000/theory/semver:0-32.1

First, it assembles platform information from uname, then pulls the platform-specific image with this oras command:

oras pull --no-tty --plain-http \
     --format 'go-template={{(first .files).path}}' \
     --platform "$platform" "$trunk"

As before, it downloads the image appropriate for the platform. The --format option, meanwhile, causes it to also download annotations and extract the path for the downloaded file. So in addition to downloading the file, it also emits its full path:

/tmp/pgxn/semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk

The script proceeds to unpack the image with that file name and continues with the installation process as before.
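The platform-assembly step mentioned earlier can be sketched like so; the exact mapping install_trunk uses is an assumption here, but uname reports x86_64 and aarch64 where OCI expects amd64 and arm64:

```shell
# Build an OCI-style platform string from uname output. The mapping is a
# sketch of what install_trunk needs: uname says x86_64/aarch64, OCI says
# amd64/arm64.
os=$(uname -s | tr '[:upper:]' '[:lower:]')   # e.g., linux or darwin
arch=$(uname -m)
case "$arch" in
  x86_64)  arch=amd64 ;;
  aarch64) arch=arm64 ;;
esac
platform="$os/$arch"
echo "$platform"
```

On Darwin the trunk file names also carry the OS version, so a full implementation would fold in uname -r as well.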

Demo

The last new file in the PR is docker_compose.yml, which sets up an amd64 Linux container for building an extension for Postgres 16, and a zot container to push to and pull from. I used it to build this POC and record this demo:

To use it yourself, run these commands with docker_compose.yml:

git clone https://github.com/theory/pg-semver.git
cd pg-semver
git checkout -b trunk-oci origin/trunk-oci
docker compose up -d

This clones the pg-semver repository, checks out the trunk-oci branch, and fires up the containers. Wait a couple of minutes for Postgres to start and be configured; then, assuming you can build against Postgres 16 on your local machine, you can follow the same steps. The commands in the demo are:

make trunk
docker compose exec linux bash
make clean
make trunk
exit
ls -1 *.trunk
ls -1 *.json
./push_trunk localhost:5000/theory/semver:0.32.1 \
    semver-0.32.1+pg16-darwin-23.5.0-arm64 \
    semver-0.32.1+pg16-linux-amd64
./install_trunk localhost:5000/theory/semver:0.32.1
docker compose exec linux bash
./install_trunk zot:5000/theory/semver:0.32.1
exit

You might need to adjust the first trunk image name if your local configuration is not the same as mine.

Concept Proven

Honestly, this POC far exceeded my expectations. It worked great! Not only does the trunk format seem to work well, but distributing via OCI registries is even better! It brings a bunch of benefits:

  • We could build a community registry that automatically builds images for PGXN releases for a variety of platforms. This could grow to become the default method for installing extensions, perhaps via a command such as pgxn trunk install theory/semver.
  • Anyone can use any other registry, and the tooling will work with it. Just as you can pull Docker images from docker.io, you can also pull them from ghcr.io, quay.io, or any other OCI-compliant registry. The same applies here. Extension authors can build and publish trunks to their own registries if they like.
  • Better yet, organizations can build extension registries for their own use cases, to complement the community registry. Think internal registries for private extensions, or commercial registries that offer additional features, such as security scans or curation.

Super promising! I’m just about ready to get to work building this stuff, though I anticipate a few challenges:

  • We’ll need a way to find the latest version (tag) of a release. I’m sure this is doable, since Homebrew does it. There must be some other index for tags (ghcr.io/homebrew/core/sqlite:latest doesn’t return a result, alas).
  • In addition to filtering on platform specification when pulling an image, it would be nice to filter on other attributes, such as the org.pgxn.trunk annotations defining the Postgres version. For now it will be fine for the CLI to download an image index and find the right image, but additional server-side filtering would be very nice.
  • We’ll need to support extensions that can run on any architecture, such as pure SQL extensions. I think this will be pretty easy by publishing a single tagged image instead of an image index.
  • If we build a community registry, where should it be hosted? Homebrew uses ghcr.io, presumably avoiding hosting costs, but it might be nice to have a specific community registry, perhaps at trunk.pgxn.org or perhaps oci.postgresql.org.
  • If we do host a registry, might we want to allow extension authors to publish their own trunks within their namespaces? How might that be organized?

I can imagine workable solutions to these relatively minor challenges. As long as we can encapsulate them into the commands for a single command-line client, it should work out well.

Can’t wait to get started. What do you think?

POC: PGXN Binary Distribution Format

In an effort to make discussion of PGXN and related improvements as accessible as possible, I’ve set up PGXN Discussions on GitHub. Of course GitHub created default categories for all sorts of conversation, and all is welcome, in accordance with the PostgreSQL Code of Conduct.

But I hope more people will be able to find, read, comment on, and even write their own RFCs than was possible on the Postgres Wiki or on Just a Theory. Therefore, please have a look at Proposal-2: Binary Distribution Format, which draws inspiration from the Python wheel format and Trunk to define a packaging format that allows for platform and PostgreSQL version matching to quickly install pre-compiled binary PostgreSQL extension packages.

The proposal has the details, but the basic idea is that files to be installed are stored in directories named for pg_config directory configurations. Then all an installer has to do is install the files in those subdirectories into the pg_config-specified directories.
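In shell, the core of such an installer could be as simple as this sketch; dest_flag is a hypothetical helper for illustration, not part of the proposal:

```shell
# Sketch: map a trunk subdirectory name to the pg_config option
# that reports its install destination.
dest_flag() {
    case "$1" in
        share)  echo --sharedir  ;;
        doc)    echo --docdir    ;;
        pkglib) echo --pkglibdir ;;
        *)      return 1         ;;
    esac
}

# An installer would then copy each subdirectory into place, e.g.:
#   for dir in pgsql/*/; do
#       rsync -a "$dir" "$(pg_config "$(dest_flag "$(basename "$dir")")")"/
#   done
```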

POC

I ran this idea past some colleagues, and they thought it worth exploring. But the proposal itself didn’t feel sufficient. I wanted to prove that it could work.

So I created a proof-of-concept (POC) implementation in just about the quickest way I could think of and applied it to the semver extension in PR 68. Here’s how it works.

trunk.mk

A new file, trunk.mk, dupes all of the install targets from PGXS and rejiggers them to install into the proposed package directory format. The Makefile simply imports trunk.mk:

--- a/Makefile
+++ b/Makefile
@@ -22,6 +22,7 @@ endif
 
 PGXS := $(shell $(PG_CONFIG) --pgxs)
 include $(PGXS)
+include ./trunk.mk
 
 all: sql/$(EXTENSION)--$(EXTVERSION).sql
 

And now there’s a trunk target that uses those packaging targets. Here’s its output on my amd64 Mac (after running make):

$ make trunk
gmkdir -p 'semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/extension'
gmkdir -p 'semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver'
gmkdir -p 'semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib'
gmkdir -p 'semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/doc/semver'
ginstall -c -m 644 .//semver.control 'semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/extension/'
ginstall -c -m 644 .//sql/semver--0.10.0--0.11.0.sql .//sql/semver--0.11.0--0.12.0.sql .//sql/semver--0.12.0--0.13.0.sql .//sql/semver--0.13.0--0.15.0.sql .//sql/semver--0.15.0--0.16.0.sql .//sql/semver--0.16.0--0.17.0.sql .//sql/semver--0.17.0--0.20.0.sql .//sql/semver--0.2.1--0.2.4.sql .//sql/semver--0.2.4--0.3.0.sql .//sql/semver--0.20.0--0.21.0.sql .//sql/semver--0.21.0--0.22.0.sql .//sql/semver--0.22.0--0.30.0.sql .//sql/semver--0.3.0--0.4.0.sql .//sql/semver--0.30.0--0.31.0.sql .//sql/semver--0.31.0--0.31.1.sql .//sql/semver--0.31.1--0.31.2.sql .//sql/semver--0.31.2--0.32.0.sql .//sql/semver--0.32.1.sql .//sql/semver--0.5.0--0.10.0.sql .//sql/semver--unpackaged--0.2.1.sql  'semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/'
ginstall -c -m 755  src/semver.dylib 'semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib/'
gmkdir -p '/Users/david/.pgenv/pgsql-16.3/lib/bitcode/src/semver'
gmkdir -p 'semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib/bitcode'/src/semver/src/
ginstall -c -m 644 src/semver.bc 'semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib/bitcode'/src/semver/src/
cd 'semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib/bitcode' && /opt/homebrew/Cellar/llvm/18.1.6/bin/llvm-lto -thinlto -thinlto-action=thinlink -o src/semver.index.bc src/semver/src/semver.bc
ginstall -c -m 644 .//doc/semver.mmd 'semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/doc/semver/'
ginstall -c -m 644 .//README.md .//LICENSE .//Changes 'semver-0.32.1+pg16-darwin-23.5.0-arm64/'
rm -f "semver-0.32.1+pg16-darwin-23.5.0-arm64/digests"
cd "semver-0.32.1+pg16-darwin-23.5.0-arm64/" && find * -type f | xargs shasum --tag -ba 256 > digests
tar zcvf semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk semver-0.32.1+pg16-darwin-23.5.0-arm64
a semver-0.32.1+pg16-darwin-23.5.0-arm64
a semver-0.32.1+pg16-darwin-23.5.0-arm64/LICENSE
a semver-0.32.1+pg16-darwin-23.5.0-arm64/Changes
a semver-0.32.1+pg16-darwin-23.5.0-arm64/trunk.json
a semver-0.32.1+pg16-darwin-23.5.0-arm64/README.md
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/digests
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/doc
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/extension
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.3.0--0.4.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.30.0--0.31.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.21.0--0.22.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.32.1.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.10.0--0.11.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.13.0--0.15.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.31.1--0.31.2.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.31.2--0.32.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--unpackaged--0.2.1.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.12.0--0.13.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.17.0--0.20.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.2.1--0.2.4.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.16.0--0.17.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.22.0--0.30.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.20.0--0.21.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.15.0--0.16.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.2.4--0.3.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.31.0--0.31.1.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.11.0--0.12.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/semver/semver--0.5.0--0.10.0.sql
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/share/extension/semver.control
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/doc/semver
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/doc/semver/semver.mmd
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib/bitcode
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib/semver.dylib
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib/bitcode/src
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib/bitcode/src/semver
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib/bitcode/src/semver.index.bc
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib/bitcode/src/semver/src
a semver-0.32.1+pg16-darwin-23.5.0-arm64/pgsql/pkglib/bitcode/src/semver/src/semver.bc

The trunk target compresses everything into the resulting trunk file:

$ ls -1 *.trunk
semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk

This should work the same everywhere PGXS works. Here’s the output in an amd64 Linux container¹ mounted to the same directory:

# make trunk
mkdir -p 'semver-0.32.1+pg16-linux-amd64/pgsql/share/extension'
mkdir -p 'semver-0.32.1+pg16-linux-amd64/pgsql/share/semver'
mkdir -p 'semver-0.32.1+pg16-linux-amd64/pgsql/pkglib'
mkdir -p 'semver-0.32.1+pg16-linux-amd64/pgsql/doc/semver'
install -c -m 644 .//semver.control 'semver-0.32.1+pg16-linux-amd64/pgsql/share/extension/'
install -c -m 644 .//sql/semver--0.10.0--0.11.0.sql .//sql/semver--0.11.0--0.12.0.sql .//sql/semver--0.12.0--0.13.0.sql .//sql/semver--0.13.0--0.15.0.sql .//sql/semver--0.15.0--0.16.0.sql .//sql/semver--0.16.0--0.17.0.sql .//sql/semver--0.17.0--0.20.0.sql .//sql/semver--0.2.1--0.2.4.sql .//sql/semver--0.2.4--0.3.0.sql .//sql/semver--0.20.0--0.21.0.sql .//sql/semver--0.21.0--0.22.0.sql .//sql/semver--0.22.0--0.30.0.sql .//sql/semver--0.3.0--0.4.0.sql .//sql/semver--0.30.0--0.31.0.sql .//sql/semver--0.31.0--0.31.1.sql .//sql/semver--0.31.1--0.31.2.sql .//sql/semver--0.31.2--0.32.0.sql .//sql/semver--0.32.1.sql .//sql/semver--0.5.0--0.10.0.sql .//sql/semver--unpackaged--0.2.1.sql  'semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/'
install -c -m 755  src/semver.so 'semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/'
mkdir -p '/usr/lib/postgresql/16/lib/bitcode/src/semver'
mkdir -p 'semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/bitcode'/src/semver/src/
install -c -m 644 src/semver.bc 'semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/bitcode'/src/semver/src/
cd 'semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/bitcode' && /usr/lib/llvm-16/bin/llvm-lto -thinlto -thinlto-action=thinlink -o src/semver.index.bc src/semver/src/semver.bc
install -c -m 644 .//doc/semver.mmd 'semver-0.32.1+pg16-linux-amd64/pgsql/doc/semver/'
install -c -m 644 .//README.md .//LICENSE .//Changes 'semver-0.32.1+pg16-linux-amd64/'
rm -f "semver-0.32.1+pg16-linux-amd64/digests"
cd "semver-0.32.1+pg16-linux-amd64/" && find * -type f | xargs shasum --tag -ba 256 > digests
tar zcvf semver-0.32.1+pg16-linux-amd64.trunk semver-0.32.1+pg16-linux-amd64
semver-0.32.1+pg16-linux-amd64/
semver-0.32.1+pg16-linux-amd64/LICENSE
semver-0.32.1+pg16-linux-amd64/Changes
semver-0.32.1+pg16-linux-amd64/trunk.json
semver-0.32.1+pg16-linux-amd64/README.md
semver-0.32.1+pg16-linux-amd64/pgsql/
semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/
semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/bitcode/
semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/bitcode/src/
semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/bitcode/src/semver/
semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/bitcode/src/semver/src/
semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/bitcode/src/semver/src/semver.bc
semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/bitcode/src/semver.index.bc
semver-0.32.1+pg16-linux-amd64/pgsql/pkglib/semver.so
semver-0.32.1+pg16-linux-amd64/pgsql/doc/
semver-0.32.1+pg16-linux-amd64/pgsql/doc/semver/
semver-0.32.1+pg16-linux-amd64/pgsql/doc/semver/semver.mmd
semver-0.32.1+pg16-linux-amd64/pgsql/share/
semver-0.32.1+pg16-linux-amd64/pgsql/share/extension/
semver-0.32.1+pg16-linux-amd64/pgsql/share/extension/semver.control
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.3.0--0.4.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.30.0--0.31.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.21.0--0.22.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.32.1.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.10.0--0.11.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.13.0--0.15.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.31.1--0.31.2.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.31.2--0.32.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--unpackaged--0.2.1.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.12.0--0.13.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.17.0--0.20.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.2.1--0.2.4.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.16.0--0.17.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.22.0--0.30.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.20.0--0.21.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.15.0--0.16.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.2.4--0.3.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.31.0--0.31.1.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.11.0--0.12.0.sql
semver-0.32.1+pg16-linux-amd64/pgsql/share/semver/semver--0.5.0--0.10.0.sql
semver-0.32.1+pg16-linux-amd64/digests

Pretty much the same, as expected. Now we have two trunks:

$ ls -1 *.trunk
semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk
semver-0.32.1+pg16-linux-amd64.trunk

The package name format is:

{package}-{version}+{pgversion}-{os}-{os_version}-{architecture}.trunk

Here you see the same package, version, and Postgres version; but the OSes differ, macOS includes the optional OS version, and the architectures differ. This will allow an install client to download the appropriate trunk.
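Assembling such a name in shell is nearly a one-liner; this sketch hard-codes the parts for illustration:

```shell
# Sketch: build a trunk file name from its parts, per
# {package}-{version}+{pgversion}-{os}-{os_version}-{architecture}.trunk.
# os_version is optional, and omitted here as on Linux.
package=semver version=0.32.1 pg=pg16 os=linux arch=amd64
trunk="${package}-${version}+${pg}-${os}-${arch}.trunk"
echo "$trunk"    # semver-0.32.1+pg16-linux-amd64.trunk
```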

Note that files are copied into subdirectories under a top-level directory named in that format (without the .trunk extension):

  • SHAREDIR files go into pgsql/share
  • DOCDIR files go into pgsql/doc
  • PKGLIB files go into pgsql/pkglib

What else is there?

$ ls -lah semver-0.32.1+pg16-linux-amd64
total 64
-rw-r--r--@ 1 david  staff    12K Jun 20 13:56 Changes
-rw-r--r--@ 1 david  staff   1.2K Jun 20 13:56 LICENSE
-rw-r--r--@ 1 david  staff   3.5K Jun 20 13:56 README.md
-rw-r--r--  1 david  staff   3.2K Jun 20 13:56 digests
drwxr-xr-x  5 david  staff   160B Jun 20 13:56 pgsql
-rw-r--r--  1 david  staff   1.1K Jun 20 13:56 trunk.json

Changes, LICENSE, and README.md are simply copied from the source. The digests file contains checksums in the BSD digest format for every file in the package, aside from digests itself. Here are the first three:

$ head -3 semver-0.32.1+pg16-linux-amd64/digests
SHA256 (Changes) = 98b5e87b8dc71604df4b743b1d80ef2fe40d96809a5fbad2a89ab97584bd9c01
SHA256 (LICENSE) = ff48c81463d79e2a57da57ca1af983c3067e51a8ff84c60296c6fbf0624a0531
SHA256 (README.md) = 99f7c59f796986777f873e78f47f7d44f5ce2deee645b4be3199f0a08dedc22d

This format makes it easy to validate all the files, as well as to adjust and update the hash algorithm over time.

Finally, the trunk.json file contains metadata about the extension and the system and Postgres on which the system was built:

{
  "trunk": "0.1.0",
  "package": {
    "name": "semver",
    "version": "0.32.1",
    "language": "c",
    "license": "PostgreSQL"
  },
  "postgres": {
    "version": "16.3",
    "major": "16",
    "number": 160003,
    "libs": "-lpgcommon -lpgport -lselinux -lzstd -llz4 -lxslt -lxml2 -lpam -lssl -lcrypto -lgssapi_krb5 -lz -lreadline -lm ",
    "cppflags": "-I. -I./ -I/usr/include/postgresql/16/server -I/usr/include/postgresql/internal  -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 ",
    "cflags": "-Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fno-omit-frame-pointer -fPIC -fvisibility=hidden",
    "ldflags": "-L/usr/lib/x86_64-linux-gnu -Wl,-z,relro -Wl,-z,now -L/usr/lib/llvm-16/lib  -Wl,--as-needed"
  },
  "platform": {
    "os": "linux",
    "arch": "amd64"
  }
}

The trunk proposal doesn’t specify the contents (yet), but the idea is to include information for an installing application to verify that a package is appropriate to install on a platform and Postgres version.
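For example, an installer might pull fields out with jq (already an install_trunk dependency) and compare them to the local build; the trunk.json here is a stand-in generated inline, and the checks are illustrative:

```shell
# Minimal trunk.json stand-in, following the POC's field names.
cat > trunk.json <<'EOF'
{
  "postgres": { "major": "16" },
  "platform": { "os": "linux", "arch": "amd64" }
}
EOF
pkg_major=$(jq -r .postgres.major trunk.json)
pkg_os=$(jq -r .platform.os trunk.json)
pkg_arch=$(jq -r .platform.arch trunk.json)
echo "built for PostgreSQL $pkg_major on $pkg_os/$pkg_arch"
# A real installer would compare these values to pg_config --version
# and uname output, and abort on mismatch.
```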

install_trunk

Now we have some packages in the proposed format. How do we install them? The install_trunk script is a POC installer. Let’s take it for a spin on macOS:

 1  $ ./install_trunk semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk
 2  Unpacking semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk
 3  Verifying all checksums...Changes: OK
 4  LICENSE: OK
 5  README.md: OK
 6  pgsql/pkglib/bitcode/src/semver/src/semver.bc: OK
 7  pgsql/pkglib/bitcode/src/semver.index.bc: OK
 8  pgsql/pkglib/semver.dylib: OK
 9  pgsql/doc/semver/semver.mmd: OK
10  pgsql/share/extension/semver.control: OK
11  pgsql/share/semver/semver--0.3.0--0.4.0.sql: OK
12  pgsql/share/semver/semver--0.30.0--0.31.0.sql: OK
13  pgsql/share/semver/semver--0.21.0--0.22.0.sql: OK
14  pgsql/share/semver/semver--0.32.1.sql: OK
15  pgsql/share/semver/semver--0.10.0--0.11.0.sql: OK
16  pgsql/share/semver/semver--0.13.0--0.15.0.sql: OK
17  pgsql/share/semver/semver--0.31.1--0.31.2.sql: OK
18  pgsql/share/semver/semver--0.31.2--0.32.0.sql: OK
19  pgsql/share/semver/semver--unpackaged--0.2.1.sql: OK
20  pgsql/share/semver/semver--0.12.0--0.13.0.sql: OK
21  pgsql/share/semver/semver--0.17.0--0.20.0.sql: OK
22  pgsql/share/semver/semver--0.2.1--0.2.4.sql: OK
23  pgsql/share/semver/semver--0.16.0--0.17.0.sql: OK
24  pgsql/share/semver/semver--0.22.0--0.30.0.sql: OK
25  pgsql/share/semver/semver--0.20.0--0.21.0.sql: OK
26  pgsql/share/semver/semver--0.15.0--0.16.0.sql: OK
27  pgsql/share/semver/semver--0.2.4--0.3.0.sql: OK
28  pgsql/share/semver/semver--0.31.0--0.31.1.sql: OK
29  pgsql/share/semver/semver--0.11.0--0.12.0.sql: OK
30  pgsql/share/semver/semver--0.5.0--0.10.0.sql: OK
31  trunk.json: OK
32  Done!
33  Verifying compatibility with Trunk package 0.1.0
34  Verifying compatibility with PostgreSQL 16.3
35  Verifying compatibility with darwin/arm64:23.5.0
36  Installing doc into /Users/david/.pgenv/pgsql-16.3/share/doc...Done
37  Installing pkglib into /Users/david/.pgenv/pgsql-16.3/lib...Done
38  Installing share into /Users/david/.pgenv/pgsql-16.3/share...Done

Most of the output here is verification:

  • Lines 3-32 verify the checksums of each file in the package
  • Line 33 verifies the version of the Trunk format
  • Line 34 verifies Postgres version compatibility
  • Line 35 verifies platform compatibility

And now, with all the verification complete, it installs the files. It does so by iterating over the subdirectories of the pgsql directory and installing them into the appropriate directory defined by pg_config. To wit:

  • Line 36 installs files from pgsql/doc into pg_config --docdir
  • Line 37 installs files from pgsql/pkglib into pg_config --pkglibdir
  • Line 38 installs files from pgsql/share into pg_config --sharedir
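Condensed to its essentials, that loop looks something like this sketch; the destination is stubbed with a variable here, where install_trunk queries pg_config and copies with rsync:

```shell
# Build a fake package layout and a destination to install into.
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/pgsql/share/extension" "$dst/share"
echo 'comment = demo' > "$src/pgsql/share/extension/demo.control"

sharedir="$dst/share"    # stand-in for: $(pg_config --sharedir)
for dir in "$src"/pgsql/*/; do
    case "$(basename "$dir")" in
        share) cp -R "${dir}." "$sharedir"/ ;;   # rsync -a in the real script
    esac
done
ls "$sharedir/extension"
```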

And that’s it. Here’s where it put everything:

❯ (cd ~/.pgenv/pgsql-16.3 && find . -name '*semver*')
./lib/bitcode/src/semver
./lib/bitcode/src/semver/src/semver.bc
./lib/bitcode/src/semver.index.bc
./lib/semver.dylib
./share/extension/semver.control
./share/semver
./share/semver/semver--0.3.0--0.4.0.sql
./share/semver/semver--0.30.0--0.31.0.sql
./share/semver/semver--0.21.0--0.22.0.sql
./share/semver/semver--0.32.1.sql
./share/semver/semver--0.10.0--0.11.0.sql
./share/semver/semver--0.13.0--0.15.0.sql
./share/semver/semver--0.31.1--0.31.2.sql
./share/semver/semver--0.31.2--0.32.0.sql
./share/semver/semver--unpackaged--0.2.1.sql
./share/semver/semver--0.12.0--0.13.0.sql
./share/semver/semver--0.17.0--0.20.0.sql
./share/semver/semver--0.2.1--0.2.4.sql
./share/semver/semver--0.16.0--0.17.0.sql
./share/semver/semver--0.22.0--0.30.0.sql
./share/semver/semver--0.20.0--0.21.0.sql
./share/semver/semver--0.15.0--0.16.0.sql
./share/semver/semver--0.2.4--0.3.0.sql
./share/semver/semver--0.31.0--0.31.1.sql
./share/semver/semver--0.11.0--0.12.0.sql
./share/semver/semver--0.5.0--0.10.0.sql
./share/doc/semver
./share/doc/semver/semver.mmd

Looks like everything’s installed in the right place. Does it work?

# psql -c "CREATE EXTENSION semver; SELECT '1.2.3'::semver"
CREATE EXTENSION
 semver 
--------
 1.2.3
(1 row)

Very nice. What about on Linux?

./install_trunk semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk 
Unpacking semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk
Verifying all checksums...Changes: OK
LICENSE: OK
README.md: OK
pgsql/pkglib/bitcode/src/semver/src/semver.bc: OK
pgsql/pkglib/bitcode/src/semver.index.bc: OK
pgsql/pkglib/semver.dylib: OK
pgsql/doc/semver/semver.mmd: OK
pgsql/share/extension/semver.control: OK
pgsql/share/semver/semver--0.3.0--0.4.0.sql: OK
pgsql/share/semver/semver--0.30.0--0.31.0.sql: OK
pgsql/share/semver/semver--0.21.0--0.22.0.sql: OK
pgsql/share/semver/semver--0.32.1.sql: OK
pgsql/share/semver/semver--0.10.0--0.11.0.sql: OK
pgsql/share/semver/semver--0.13.0--0.15.0.sql: OK
pgsql/share/semver/semver--0.31.1--0.31.2.sql: OK
pgsql/share/semver/semver--0.31.2--0.32.0.sql: OK
pgsql/share/semver/semver--unpackaged--0.2.1.sql: OK
pgsql/share/semver/semver--0.12.0--0.13.0.sql: OK
pgsql/share/semver/semver--0.17.0--0.20.0.sql: OK
pgsql/share/semver/semver--0.2.1--0.2.4.sql: OK
pgsql/share/semver/semver--0.16.0--0.17.0.sql: OK
pgsql/share/semver/semver--0.22.0--0.30.0.sql: OK
pgsql/share/semver/semver--0.20.0--0.21.0.sql: OK
pgsql/share/semver/semver--0.15.0--0.16.0.sql: OK
pgsql/share/semver/semver--0.2.4--0.3.0.sql: OK
pgsql/share/semver/semver--0.31.0--0.31.1.sql: OK
pgsql/share/semver/semver--0.11.0--0.12.0.sql: OK
pgsql/share/semver/semver--0.5.0--0.10.0.sql: OK
trunk.json: OK
Done!
Verifying compatibility with Trunk package 0.1.0
Verifying compatibility with PostgreSQL 16.3
Verifying compatibility with linux/amd64:6.5.11-linuxkit 
Trunk package contains darwin binaries but this host runs linux

Looks goo… oops! Look at that last line. It detected an attempt to install Darwin binaries and rejected it. That’s because I tried to install semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk 🤦🏻‍♂️.

Works with the right binary, though:

# ./install_trunk semver-0.32.1+pg16-linux-amd64.trunk 
Unpacking semver-0.32.1+pg16-linux-amd64.trunk
Verifying all checksums...Changes: OK
LICENSE: OK
README.md: OK
pgsql/pkglib/bitcode/src/semver/src/semver.bc: OK
pgsql/pkglib/bitcode/src/semver.index.bc: OK
pgsql/pkglib/semver.so: OK
pgsql/doc/semver/semver.mmd: OK
pgsql/share/extension/semver.control: OK
pgsql/share/semver/semver--0.3.0--0.4.0.sql: OK
pgsql/share/semver/semver--0.30.0--0.31.0.sql: OK
pgsql/share/semver/semver--0.21.0--0.22.0.sql: OK
pgsql/share/semver/semver--0.32.1.sql: OK
pgsql/share/semver/semver--0.10.0--0.11.0.sql: OK
pgsql/share/semver/semver--0.13.0--0.15.0.sql: OK
pgsql/share/semver/semver--0.31.1--0.31.2.sql: OK
pgsql/share/semver/semver--0.31.2--0.32.0.sql: OK
pgsql/share/semver/semver--unpackaged--0.2.1.sql: OK
pgsql/share/semver/semver--0.12.0--0.13.0.sql: OK
pgsql/share/semver/semver--0.17.0--0.20.0.sql: OK
pgsql/share/semver/semver--0.2.1--0.2.4.sql: OK
pgsql/share/semver/semver--0.16.0--0.17.0.sql: OK
pgsql/share/semver/semver--0.22.0--0.30.0.sql: OK
pgsql/share/semver/semver--0.20.0--0.21.0.sql: OK
pgsql/share/semver/semver--0.15.0--0.16.0.sql: OK
pgsql/share/semver/semver--0.2.4--0.3.0.sql: OK
pgsql/share/semver/semver--0.31.0--0.31.1.sql: OK
pgsql/share/semver/semver--0.11.0--0.12.0.sql: OK
pgsql/share/semver/semver--0.5.0--0.10.0.sql: OK
trunk.json: OK
Done!
Verifying compatibility with Trunk package 0.1.0
Verifying compatibility with PostgreSQL 16.3
Verifying compatibility with linux/amd64:6.5.11-linuxkit 
Installing doc into /usr/share/doc/postgresql-doc-16...Done
Installing pkglib into /usr/lib/postgresql/16/lib...Done
Installing share into /usr/share/postgresql/16...Done

# psql -U postgres -c "CREATE EXTENSION semver; SELECT '1.2.3'::semver"
CREATE EXTENSION
 semver 
--------
 1.2.3
(1 row)

RFC

Any PGXS project can try out the pattern; please do! Just download trunk.mk and install_trunk, import trunk.mk into your Makefile, install shasum, jq, and rsync (if you don’t already have them), and give it a try.

The intent of this POC is to prove the pattern; this is not a complete or shippable solution. Following a comment period, I expect to build a proper command-line client (and SDK) to package up artifacts generated from a few build systems, including PGXS and pgrx.

Whether you try it out or not, I welcome a review of the proposal and your comments on it. I’d like to get this right, and have surely overlooked some details. Let’s get to the best binary packaging format we can.


  1. I used the pgxn-tools image, started Postgres, and installed the necessary tools with the command pg-start 16 rsync jq.