Just a Theory

By David E. Wheeler

Sqitch 1.5.0

Sqitch Logo

Released yesterday: Sqitch v1.5.0. This version replaces the MySQL driver DBD::mysql with DBD::MariaDB, both for its better compatibility with the MySQL and MariaDB driver libraries and for its improved Unicode handling. The Docker image likewise switched to the MariaDB mysql client. I expect no compatibility issues, but you never know! Please file an issue should you find any.

V1.5.0 also features fixes for Yugabyte deployment, Oracle error handling, existing Snowflake schemas, connecting to MySQL/MariaDB without a database name, and omitting the checkit MySQL/MariaDB function when the Sqitch user lacks sufficient permission to create it. Sqitch will now also complain when deploying with --log-only if a deployment file is missing.

Find it in the usual places:

Many thanks to everyone who has enjoyed using Sqitch and let me know in person, via email, Mastodon, bug reports, and patches. It gratifies me how useful people find it.

Should URI::mysql Switch to DBD::MariaDB?

I seek the wisdom of the Perl Monks:

The Sqitch project got a request to switch from DBD::mysql to DBD::MariaDB. DBD::mysql 5’s requirement to build from the MySQL 8 client library provides the impetus for the request, but in poking around, I found a blogs.perl.org post highlighting some Unicode fixes in DBD::MariaDB, as well.

Now, Sqitch likely doesn’t have the Unicode issue (it always works with Perl Unicode strings), but it depends on URI::db to provide the DBI connection string. For MySQL URIs, the URI::mysql dbi_driver method returns mysql.

Should it be changed to return MariaDB, instead? Is there general community consensus that DBD::MariaDB provides better compatibility with both MySQL and MariaDB these days?

I’m also curious what the impact of this change would be for Sqitch. Presumably, if DBD::MariaDB can build against either the MariaDB or MySQL client library, it is the more flexible choice to continue supporting both databases going forward.

Feedback appreciated via PerlMonks or the Sqitch issue.

Update 2025-01-08

URI-db 0.23 uses DBD::MariaDB instead of DBD::mysql for both URI::mysql and URI::MariaDB.

Similarly, Sqitch v1.5.0 always uses DBD::MariaDB when connecting to MySQL or MariaDB, even when using older versions of URI::db. Thanks everyone for the feedback and suggestions!

New JSONPath Feature: SelectLocated

Happy New Year! 🎉🥳🍾🥂

The JSONPath RFC includes a section on defining normalized paths, which use a subset of JSONPath syntax to define paths to the location of a node in a JSON value. I hadn’t thought much about it, but noticed that the serde JSONPath Sandbox provides a “Located” switch that adds them to query results. For the sake of complementarity, I added the same feature to the Go JSONPath Playground.

🛝 See it in action with this example, where instead of the default output:

[
  8.95,
  12.99,
  8.99,
  22.99,
  399
]

The located result is:

[
  {
    "node": 8.95,
    "path": "$['store']['book'][0]['price']"
  },
  {
    "node": 12.99,
    "path": "$['store']['book'][1]['price']"
  },
  {
    "node": 8.99,
    "path": "$['store']['book'][2]['price']"
  },
  {
    "node": 22.99,
    "path": "$['store']['book'][3]['price']"
  },
  {
    "node": 399,
    "path": "$['store']['bicycle']['price']"
  }
]

v0.3.0 of the github.com/theory/jsonpath Go package enables this feature via its new SelectLocated method, which returns a LocatedNodeList that shows off a few of the benefits of pairing JSONPath query results with paths that uniquely identify their locations in a JSON value, including sorting and deduplication. It also takes advantage of Go v1.23 iterators, providing methods to range over all the results, just the node values, and just the paths. As a result, v0.3.0 now requires Go 1.23.
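Here’s a minimal sketch of what that looks like in code, reusing the path and jsonInput variables from the examples below; the All iterator is described above, while the Node and Path field names on each located node are assumptions for illustration:

for loc := range path.SelectLocated(jsonInput).All() {
    // Each located result pairs a selected value with its normalized path.
    fmt.Printf("%v: %v\n", loc.Path, loc.Node)
}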

The serde_json_path Rust crate inspired the use of LocatedNodeList rather than a simple slice of LocatedNode structs, but I truly embraced it once I noticed the focus on “nodelists” in the RFC’s overview, which provides this definition:

A JSONPath expression is a string that, when applied to a JSON value (the query argument), selects zero or more nodes of the argument and outputs these nodes as a nodelist.

It regularly refers to nodelists thereafter, and it seemed useful to have an object to which more features can be added in the future. github.com/theory/jsonpath v0.3.0 therefore also changes the result value of Select from []any to the new NodeList type, an alias for []any. For now it adds a single method, All, which again relies on Go v1.23 iterators to iterate over selected nodes.

While the data type has changed, usage otherwise has not. One can iterate directly over values just as before:

for _, val := range path.Select(jsonInput) {
    fmt.Printf("%v\n", val)
}

But All removes the need to alias-away the index value with _:

for val := range path.Select(jsonInput).All() {
    fmt.Printf("%v\n", val)
}

I don’t expect any further incompatible changes to the main jsonpath module, but adding these return values now allows new features to be added to the selected node lists in the future.

May you find it useful!

SQL/JSON Path Playground Update

Based on the recently-released Go JSONPath and JSONTree playgrounds, I’ve updated the design of the SQL/JSON Playground. It now comes populated with sample JSON borrowed from RFC 9535, as well as a selection of queries that randomly populate the query field on each reload. I believe this makes the playground nicer to start using, not to mention more pleasing to the eye.

The playground has also been updated to use the recently-released sqljson/path v0.2 package, which replicates a few changes included in the PostgreSQL 17 release. Notably, the .string() function no longer uses a time zone or variable format for dates and times.

Curious to see it in action? Check it out!

JSONTree Module and Playground

As a follow-up to the JSONPath module and playground I released last month, I’m happy to announce a new project: JSONTree. I’ve implemented it in the github.com/theory/jsontree Go package, and built a Wasm-powered browser playground for it.

JSONTree?

While an RFC 9535 JSONPath query selects and returns an array of values from the end of a path expression, a JSONTree compiles multiple JSONPath queries into a single query that selects values from multiple path expressions. It returns results not as an array, but as a subset of the query input, preserving the paths for each selected value.

In other words, it compiles multiple paths into a single tree of selection paths, and preserves the tree structure of the input. Hence JSONTree.

Example

Consider this JSON:

{
  "store": {
    "book": [
      {
        "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      {
        "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      },
      {
        "category": "fiction",
        "author": "Herman Melville",
        "title": "Moby Dick",
        "isbn": "0-553-21311-3",
        "price": 8.99
      },
      {
        "category": "fiction",
        "author": "J. R. R. Tolkien",
        "title": "The Lord of the Rings",
        "isbn": "0-395-19395-8",
        "price": 22.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 399
    }
  }
}

This JSONPath query:

$..price

Selects these values (playground):

[8.95, 12.99, 8.99, 22.99, 399]

While this JSONPath query:

$..author

Selects (playground):

[
  "Nigel Rees",
  "Evelyn Waugh",
  "Herman Melville",
  "J. R. R. Tolkien"
]

JSONTree compiles these two JSONPaths into a single query that merges the author and price selectors into a single segment, which stringifies to a tree-style format (playground):

$
└── ..["author","price"]

This JSONTree returns the appropriate subset of the original JSON object (playground):

{
  "store": {
    "book": [
      {
        "author": "Nigel Rees",
        "price": 8.95
      },
      {
        "author": "Evelyn Waugh",
        "price": 12.99
      },
      {
        "author": "Herman Melville",
        "price": 8.99
      },
      {
        "author": "J. R. R. Tolkien",
        "price": 22.99
      }
    ],
    "bicycle": {
      "price": 399
    }
  }
}

Note that the original data structure remains, but only for the subset of the structure selected by the JSONPath queries.

Use Cases

A couple of use cases drove the conception and design of JSONTree.

Permissions

Consider an application in which ACLs define permissions for groups of users to access specific branches or fields of JSON documents. When delivering a document, the app would:

  • Fetch the groups the user belongs to
  • Convert the permissions from each into JSONPath queries
  • Compile the JSONPath queries into a JSONTree query
  • Select and return the permitted subset of the document to the user, as sketched below
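A hedged sketch of those steps in Go, using the jsontree and jsonpath packages shown in the example below; permsForGroups, the user and document values, and the *jsonpath.Path type name are assumptions for illustration:

// Hypothetical: look up the user's groups and convert their ACL
// permissions into JSONPath query strings.
queries := permsForGroups(user.Groups)

// Compile the JSONPath queries into a single JSONTree query.
paths := make([]*jsonpath.Path, 0, len(queries))
for _, q := range queries {
    paths = append(paths, jsonpath.MustParse(q))
}
tree := jsontree.New(paths...)

// Select the permitted subset of the document to deliver to the user.
permitted := tree.Select(document)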

Selective Indexing

Consider a searchable document storage system. For large or complex documents, it may be infeasible or unnecessary to index the entire document for full-text search. To index a subset of the fields or branches, one would:

  • Define JSONPaths for the fields or branches to index
  • Compile the JSONPath queries into a JSONTree query
  • Select and submit only the specified subset of each document to the indexing system

Go Example

Use the github.com/theory/jsontree Go package together with github.com/theory/jsonpath to compile and execute JSONTree queries:

package main

import (
	"encoding/json"
	"fmt"
	"log"

	"github.com/theory/jsonpath"
	"github.com/theory/jsontree"
)

func main() {
	// JSON as unmarshaled by encoding/json.
	value := map[string]any{
		"name":  "Barrack Obama",
		"years": "2009-2017",
		"emails": []any{
			"[email protected]",
			"[email protected]",
		},
	}

	// Compile multiple JSONPaths into a JSONTree.
	tree := jsontree.New(
		jsonpath.MustParse("$.name"),
		jsonpath.MustParse("$.emails[1]"),
	)

	// Select from the input value and marshal the result as JSON.
	js, err := json.Marshal(tree.Select(value))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(js))
}

And the output:

{"emails":["[email protected]"],"name":"Barrack Obama"}

Note that the index position of the selected email was not preserved. Replace New with NewFixedModeTree to create a “fixed mode” JSONTree that preserves index positions by filling gaps with nulls. Its output for the above example would be:

{"emails":[null,"[email protected]"],"name":"Barrack Obama"}

Status

The public interface of the jsontree module is quite minimal and stable. But I suspect there may remain some flaws in the merging of JSONPath selectors. Please report bugs via GitHub issues and I’ll get them fixed up ASAP.

Otherwise, please share and enjoy!

RFC: Extension Packaging & Lookup

Several weeks ago, I started a pgsql-hackers thread proposing a new extension file organization and a search path GUC for finding extensions. The discussion of Christoph Berg’s extension_destdir patch inspired this proposal. These threads cover quite a lot of territory, so I want to pull together a more unified, public proposal.

Here goes.

Challenges

A number of challenges face extension users, thanks to extension file organization in the Postgres core. The common thread among them is the need to add extensions without changing the contents of the Postgres installation itself.

Packager Testing

On Debian systems, the user account that creates extension packages lacks permission to add files to the Postgres install. But testing extensions requires installing the extension where Postgres can find it. Moreover, extensions ideally build against a clean Postgres install; adding an extension in order to run make installcheck would pollute it.

Christoph’s patch solves these problems by adding a second lookup path for extensions and dynamic modules, so that Postgres can load them directly from the package build directory.

Alas, the patch isn’t ideal, because it simply specifies a prefix and appends the full pg_config directory paths to it. For example, if --sharedir outputs /opt/share and the extension_destdir GUC is set to /tmp/build/myext, the patch will search in /tmp/build/myext/opt/share. This approach works for the packaging use case, which explicitly uses full paths with a prefix, but would be weird for other use cases.

Peter Eisentraut proposed an alternate patch with a new GUC, extension_control_path, that provides a more typical search path pattern to find extension control files, but doesn’t account for shared modules that ship with an extension, requiring that they still live in the dynamic_library_path. Installing into custom directories requires the undocumented datadir and pkglibdir variables:

make install datadir=/else/where/share pkglibdir=/else/where/lib

This pattern can probably be simplified.

OCI Immutability

OCI (née Docker) images are immutable, while a container runs on a writeable but non-persistent file system. To install persistent extensions in a container, one must create a persistent volume, map it to SHAREDIR/extensions, and copy over all the extensions it needs (or muck with symlink magic). Then do it again for shared object libraries (PKGLIBDIR), and perhaps also for other pg_config directories, like --bindir. Once it’s all set up, one can install a new extension and its files will be distributed to the relevant persistent volumes.

This pattern makes upgrades tricky, because the core extensions are mixed in with third-party extensions. Worse, the number of directories that must be mounted into volumes depends on the features of an extension, increasing deployment configuration complexity. It would be preferable to have all the files for an extension in one place, rather than scattered across multiple persistent volumes.

Peter Eisentraut’s patch addresses much of this issue by adding a search path for extension control files and related data/share files (generally SQL files). One can create a single volume with a lib directory for shared modules and share/extension directory for control and data/share files.

OCI Extension Images

However, an additional wrinkle is the ambition from the CloudNativePG (CNPG) community to eliminate the need for a persistent volume, and rely instead on mounting images that each contain all the files for a single extension as their own volumes, perhaps using the Kubernetes image volume feature (currently in alpha).

This feature requires all the files for an extension to live in a single directory, so that a volume mounted from an extension image contains all the files required to use the extension. The search path patches proposed so far do not enable this behavior.

Postgres.app Immutability

The macOS Postgres.app supports extensions. But installing one into SHAREDIR/extensions changes the contents of the Postgres.app bundle, breaking Apple-required signature validation. The OS would no longer be able to validate that the app is legit and would refuse to start it.

Peter Eisentraut’s new patch addresses this issue as well, with all the same caveats as for the packager testing challenges.

Solution

To further address these issues, this RFC proposes to change file organization and lookup patterns for PostgreSQL extensions.

Extension Directories

First, when an extension is installed, by default all of its files will live in a single directory named for the extension. The contents include:

  • The control file that describes the extension
  • Subdirectories for SQL, shared modules, docs, binaries, etc.

Subdirectories roughly correspond to the pg_config --*dir options:

  • bin: Executables
  • doc: Documentation files
  • html: HTML documentation files
  • lib: Dynamically loadable modules
  • locale: Locale support files
  • man: Manual pages
  • share: SQL and other architecture-independent support files

This layout reduces the cognitive overhead for understanding what files belong to what extension. Want to know what’s included in the widget extension? Everything is in the widget directory. It also simplifies installation of an extension: one need add only a directory named for and containing the files required by the extension.

Configuration Parameter

Add a new pg_config value that returns the directory into which extensions will by default be installed:

 --extdir   show location of extensions

Its default value would be $(pg_config --sharedir)/extension, but could be set at compile time like other configuration parameters. Its contents consist of subdirectories that each contain an extension, as described in Extension Directories. With a few extensions installed, it would look something like:

❯ ls -1 "$(pg_config --extdir)"
auto_explain
bloom
isn
pair
plperl
plpgsql
plv8
xml2
semver
vector

Extension Path

Add an extension lookup path GUC akin to dynamic_library_path, called extension_path. It lists all the directories that Postgres will search for extensions and their files. The default value for this GUC will be:

extension_path = '$extdir'

The special string $extdir corresponds to the pg_config option of the same name, and functions exactly as $libdir does for the dynamic_library_path GUC, substituting the appropriate value.
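Additional directories can be listed ahead of the default, separated by colons as in dynamic_library_path. For example, a hypothetical site that also keeps extensions in /opt/postgres/extensions might set:

extension_path = '/opt/postgres/extensions:$extdir'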

Lookup Execution

Update PostgreSQL’s CREATE EXTENSION command to search the directories in extension_path for an extension. For each directory in the list, it will look for the extension control file in a directory named for the extension:

$dir/$extension/$extension.control

The first match will be considered the canonical location for the extension. For example, if Postgres finds the control file for the pair extension at /opt/pg17/ext/pair/pair.control, it will load files only from the appropriate subdirectories, e.g.:

  • SQL files from /opt/pg17/ext/pair/share
  • Shared module files from /opt/pg17/ext/pair/lib

PGXS

Update the extension installation behavior of PGXS to install extension files into the new layout. A new variable, $EXTDIR, will define the directory into which to install extension directories, and default to $(pg_config --extdir). It can be set to any literal path, which must exist and be accessible by the PostgreSQL service.

The $EXTENSION variable will be changed to allow only one extension name. If it’s set, the installation behavior will be changed for the following variables:

  • EXTENSION: Creates $EXTDIR/$EXTENSION, installs $EXTDIR/$EXTENSION/$EXTENSION.control
  • MODULES and MODULE_big: Installed into $EXTDIR/$EXTENSION/lib
  • MODULEDIR: Removed
  • DATA and DATA_built: Installed into $EXTDIR/$EXTENSION/share
  • DATA_TSEARCH: Installed into $EXTDIR/$EXTENSION/share/tsearch_data
  • DOCS: Installed into $EXTDIR/$EXTENSION/doc
  • PROGRAM, SCRIPTS and SCRIPTS_built: Installed into $EXTDIR/$EXTENSION/bin

Each of these locations can still be overridden by setting one of the (currently undocumented) installation location options (e.g., datadir, pkglibdir, etc.).
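To make that concrete, here is a hedged sketch of a standard PGXS Makefile for the hypothetical pair extension; under this proposal, with EXTENSION set, the DATA files would install into $EXTDIR/pair/share and the DOCS file into $EXTDIR/pair/doc:

EXTENSION = pair
DATA      = sql/pair--1.0--1.1.sql sql/pair--1.1.sql
DOCS      = doc/pair.md

PG_CONFIG ?= pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)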

External projects that install extensions without using PGXS, like pgrx, must also be updated to either follow the same pattern or to delegate installation to PGXS.

Control File

The directory control file parameter will be deprecated and ignored.

The module_pathname parameter should only name a shared module in the lib subdirectory of an extension directory. Any existing use of a $libdir prefix will be stripped out and ignored before replacing the MODULE_PATHNAME string in SQL files. The implication for loading extension dynamic modules1 differs from the existing behavior as follows:

  1. If the name is an absolute path, the given file is loaded.
  2. If the name does not contain a directory part, the file is searched for in the lib subdirectory of the extension’s directory ($EXTDIR/$EXTENSION/lib).
  3. Otherwise (the file was not found in the path, or it contains a non-absolute directory part), the dynamic loader will try to take the name as given, which will most likely fail. (It is unreliable to depend on the current working directory.)

Use Cases

Here’s how the proposed file layout and extension_path GUC address the use cases that inspired this RFC.

Packager Testing

A packager who wants to run tests without modifying a PostgreSQL install would follow these steps:

  • Prepend a directory under the packaging install to the extension_path GUC. The resulting value would be something like $RPM_BUILD_ROOT/$(pg_config --extdir):$extdir.
  • Install the extension into that directory: make install EXTDIR=$RPM_BUILD_ROOT
  • Make sure the PostgreSQL server can access the directory, then run make installcheck

This will allow PostgreSQL to find and load the extension during the tests. The Postgres installation will not have been modified; only the extension_path will have changed.

OCI/Kubernetes

To allow extensions to be added to an OCI container and to persist beyond its lifetime, one or more volumes could be used. Some examples:

  • Mount a persistent volume for extensions and prepend the path to that directory to the extension_path GUC. Then Postgres can find any extensions installed there, and they will persist. Files for all extensions will live on a single volume.
  • Or, to meet a desire to keep some extensions separate (e.g., open-source vs company-internal extensions), two or more persistent volumes could be mounted, as long as they’re all included in extension_path, are accessible by PostgreSQL, and users take care to install extensions in the proper locations.

CNPG Extension Images

To meet the CNPG ambition to “install” an extension by mounting a single directory for each, create separate images for each extension, then use the Kubernetes image volume feature (currently in alpha) to mount each as a read-only volume in the appropriate subdirectory of a directory included in extension_path. Thereafter, any new containers would simply have to mount all the same extension image volumes to provide the same extensions to all containers.

Postgres.app

To allow extension installation without invalidating the Postgres.app bundle signature, the default configuration could prepend a well-known directory outside the app bundle, such as /Library/Application Support/Postgres, to extension_path. Users wishing to install new extensions would then need to point the EXTDIR parameter to that location, e.g.,

$ make install EXTDIR="/Library/Application Support/Postgres"

Or the app could get trickier, setting the --extdir value to that location so that users don’t need to use EXTDIR. As long as extension_path includes both the bundle’s own extension directory and this external directory, Postgres will be able to find and load all extensions.

Extension Directory Examples

A core extension like citext would have a structure similar to:

citext
├── citext.control
├── lib
│   ├── citext.dylib
│   └── bitcode
│       ├── citext
│       │   └── citext.bc
│       └── citext.index.bc
└── share
    ├── citext--1.0--1.1.sql
    ├── citext--1.1--1.2.sql
    ├── citext--1.2--1.3.sql
    ├── citext--1.3--1.4.sql
    ├── citext--1.4--1.5.sql
    ├── citext--1.4.sql
    └── citext--1.5--1.6.sql

A pure SQL extension named “pair” would live in a directory named “pair” that looks something like this:

pair
├── LICENSE.md
├── README.md
├── pair.control
├── doc
│   ├── html
│   │   └── pair.html
│   └── pair.md
└── share
    ├── pair--1.0--1.1.sql
    └── pair--1.1.sql

A binary application like pg_top would live in the pg_top directory, structured something like:

pg_top
├── HISTORY.rst
├── INSTALL.rst
├── LICENSE
├── README.rst
├── bin
│   └── pg_top
└── doc
    └── man
        └── man3
            └── pg_top.3

And a C extension like semver would live in the semver directory and be structured something like:

semver
├── LICENSE
├── README.md
├── semver.control
├── doc
│   └── semver.md
├── lib
│   ├── semver.dylib
│   └── bitcode
│       ├── semver
│       │   └── semver.bc
│       └── semver.index.bc
└── share
    ├── semver--1.0--1.1.sql
    └── semver--1.1.sql

Phase Two: Preloading

The above-proposed solution does not allow shared modules distributed with extensions to compatibly be loaded via shared library preloading, because extension modules will no longer live in the dynamic_library_path. Users can specify full paths, however. For example, instead of:

shared_preload_libraries = 'pg_partman_bgw'

One could use the path to the lib subdirectory of the extension’s directory:

shared_preload_libraries = '/opt/postgres/extensions/pg_partman_bgw/lib/pg_partman_bgw'

But users will likely find this pattern cumbersome, especially for extensions with multiple shared modules. Perhaps some special syntax could be added to specify a single extension module, such as:

shared_preload_libraries = '$extension_path::pg_partman_bgw'

But this overloads the semantics of shared_preload_libraries and the code that processes it rather heavily, not to mention the LOAD command.

Therefore, as a follow up to the solution proposed above, this RFC proposes additional changes to PostgreSQL.

Extension Preloading

Add new GUCs that complement shared library preloading, but for extension module preloading:

  • shared_preload_extensions
  • session_preload_extensions
  • local_preload_extensions

Each takes a list of extensions for which to preload shared modules. In addition, another new GUC, local_extensions, will contain a list of administrator-approved extensions users are allowed to include in local_preload_extensions. This GUC complements local_preload_libraries’s use of a plugins directory.

Then modify the preloading code to also preload these files. For each extension in a list, it would:

  • Search each path in extension_path for the extension.
  • When found, load all the shared libraries from $extension/lib.

For example, to load all shared modules in the pg_partman extension, set:

shared_preload_extensions = 'pg_partman'

To load a single shared module from an extension, give its name after the extension name and two colons. This example will load only the pg_partman_bgw shared module from the pg_partman extension:

shared_preload_extensions = 'pg_partman::pg_partman_bgw'

This change requires a one-time change to existing preload configurations on upgrade.

Future: Deprecate LOAD

For a future change, consider modifying CREATE EXTENSION to support shared module-only extensions. This would allow extensions with no SQL component, such as auto_explain, to be handled like any other extension; it would live under one of the directories in extension_path with a structure like this:

auto_explain
├── auto_explain.control
└── lib
   ├── auto_explain.dylib
   └── bitcode
       ├── auto_explain
       │   └── auto_explain.bc
       └── auto_explain.index.bc

Note the auto_explain.control file. It would need a new parameter to indicate that the extension includes no SQL files, so CREATE EXTENSION and related commands wouldn’t try to find them.

With these changes, extensions could become the primary, recommended interface for extending PostgreSQL. Perhaps the LOAD command could be deprecated, and the *_preload_libraries GUCs along with it.

Compatibility Issues

  • The module_pathname control file variable would contain only the name of a shared module. The code that replaces the MODULE_PATHNAME string in SQL files would strip out the $libdir/ prefix, if present.
  • The behavior of loading dynamic modules that ship with extensions (i.e., the value of the AS part of CREATE FUNCTION) would change to look for a library name (with no directory part) in the lib subdirectory of the extension directory.
  • The directory control file parameter and the MODULEDIR PGXS variable would be deprecated and ignored.
  • *_preload_libraries would no longer be used to find extension modules without full paths. Administrators would have to remove module names from these GUCs and add the relevant extension names to the new *_preload_extensions variables. To ease upgrades, we might consider adding a PGXS variable that, when true, would symlink shared modules into --pkglibdir.
  • LOAD would no longer be able to find shared modules included with extensions, unless we add a PGXS variable that, when true, would symlink shared modules into --pkglibdir.
  • The EXTENSION PGXS variable will no longer support multiple extension names.
  • The change in extension installation locations must also be adopted by projects that don’t use PGXS for installation, like pgrx. Or perhaps they could be modified to also use PGXS. Long term it might be useful to replace the Makefile-based PGXS with another installation system, perhaps a CLI.

Out of Scope

This RFC does not include or attempt to address the following issue:

  • How to manage third-party shared libraries. Making system dependencies consistent in an OCI/Kubernetes environment or for non-system binary packaging patterns presents its own challenges, though they’re not specific to PostgreSQL or the patterns described here. Research is ongoing into potential solutions, and will be addressed elsewhere.

Acknowledgements

A slew of PostgreSQL community members contributed feedback, asked hard questions, and suggested moderate to significant revisions to this RFC via the pgsql-hackers list, in-person discussion at PGConf.eu, and pull request comments. I’d especially like to thank:

All remaining errors and omissions are my own.


  1. But not non-extension modules; see Phase Two and Future for further details on preloading extension modules and eventually deprecating non-extension modules. ↩︎

Bye Twitter

I quit Twitter in November 2022, when it became clear that Phony Stark would take control of the company. No regrets; I’ve been active on Mastodon since then, via Ivory, where I’ve enjoyed the Twitter ca. 2010 vibe.

But I hadn’t deleted my Tweets or my accounts until this week. The change was finally triggered by the latest announcement:

On Friday, X will adopt a new terms of service policy that for the first time explicitly says all users agree to let their posts be used to train artificial intelligence, like the company’s generative AI service, Grok. Currently, X users can go to the site’s settings and opt out.

I have no interest in feeding Grok or any other large language model. So this week I deleted all my tweets and then my accounts. I tried to transfer my personal username to a new, empty account, but in the second between changing it and switching to another tab to take it, it was somehow acquired by an account created in 2011 with no posts. Not sure how that worked. Pretty scummy.

So, just to be clear, anyone using these usernames on Twitter is not me anymore:

  • @theory
  • @lunar_theory
  • @DesignScene
  • @pgxn

See you on Mastodon — or perhaps Bluesky.

Introducing RFC 9535 Go JSONPath and Playground

I’ve written and released an RFC 9535 JSONPath Go package, github.com/theory/jsonpath. Why? For a personal project, I needed a simpler JSONPath engine to complement the Go SQL/JSON Path package, and quickly found myself implementing most of the RFC. So I decided to do the whole thing.

Yes, yet another JSONPath package in Go. I really appreciate the idea of a standard — plus its support for features not included in the original design from 2007, such as object slices! But I could find no reference to the RFC on pkg.go.dev. Today the search shows one!

Example

Usage is straightforward; here’s a quick example (Go playground):

package main

import (
	"fmt"
	"log"

	"github.com/theory/jsonpath"
)

func main() {
	// Parse a jsonpath query.
	p, err := jsonpath.Parse(`$["name", "slogan"]`)
	if err != nil {
		log.Fatal(err)
	}

	// Select values from unmarshaled JSON input.
	json := map[string]any{
		"name":   "Kamala Harris",
		"title":  "Vice President of the United States",
		"home":   "California",
		"slogan": "We are not going back!",
	}
	items := p.Select(json)

	// Show the result.
	fmt.Printf("%#v\n", items)
}

And the output:

[]interface {}{"Kamala Harris", "We are not going back!"}

🛝 Playground

No need to write code to try it out, though. I’ve also written a playground webapp to encourage experimentation and exploration of the syntax and behavior of the package. The implementation follows the precedents set by the Go SQL/JSON Playground and design of the Rust JSONPath Playground. Moreover, thanks to TinyGo, the Wasm file comes in at a mere 254K!

The webapp loads sample JSON from the RFC, and randomly rotates through a few example JSONPath queries. Fill in your own and tap the “Permalink” button to share links. The Playground is a stateless JavaScript/Wasm web application: data persists only in permalink URLs.1

🛝 Try this example Playground permalink right now!

Status

The root jsonpath package is stable and ready for use. Other packages remain in flux, as I refactor and rejigger things in the coming weeks as part of the aforementioned personal project. But for actual JSONPath execution and querying, it should continue to work as-is for the foreseeable future.

I hope you find it useful.


  1. And whatever data GitHub Pages collects 😔. ↩︎

NYTimes Tech Guild on Strike

A New York Times-style election needle illustration ranging from “contract” to “strike”, with the needle all the way over to “strike”

This is a big deal. My former colleagues in the New York Times Tech Guild have declared a strike today in response to a number of unfair labor practice violations and management’s inability to bargain in good faith. From the Washington Post:

Represented by the NewsGuild of New York, the Tech Guild has been negotiating its first contract with the company since 2022. Both sides have accused the other side of bogging down contract talks, which have recently focused on three key issues: “just cause” job protections (which ensure employees can’t be fired without reason and due process), remote work and pay equity. Unit members voted overwhelmingly in September to authorize the strike, saying the timing of the authorization — in the run-up to the high-profile election period — was “no accident.”

I urge you to join me in supporting the strike fund, which will help more vulnerable members to weather the strike.

Beyond that, the Guild asks supporters to refrain from interacting with Cooking and Games, and to break your streaks. Times reporter Maggie Astor has the details:

As of this morning, the New York Times Tech Guild, which represents NYT tech workers, is on strike.

Unless it’s resolved while today’s Wordle is still live, this streak is gone.

NYT Games and Cooking are BEHIND THE PICKET LINE. Please don’t play or engage with Games or Cooking content while the strike lasts!

News coverage is NOT behind the picket line. It’s okay to read and share that, though the site and app may have problems.

Read the whole thread for more information; as she says, “the rules may not be what you think!”

PGConf & Extension Ecosystem Summit EU 2024

The PGConf 2024 logo

Last week I MCed the first Extension Ecosystem Summit EU and attended my first PGConf EU in Athens, Greece. Despite my former career as an archaeologist — with a focus on Mediterranean cultures, no less! — this was my first visit to Greece. My favorite moment was the evening after the Summit, when I cut out of a networking shindig to walk to Pláka and then circumnavigate the Acropolis. I mean just look at this place!

Nighttime photo of the Acropolis of Athens

The Acropolis of Athens on the evening of October 22, 2024. © 2024 David E. Wheeler

Highlight of the trip for sure. But the Summit and conference were terrific, as well.

Extension Ecosystem Summit

Floor Drees kindly organized The Extension Ecosystem Summit EU, the follow-up to the PGConf.dev original. While the Vancouver Summit focused on developers, we tailored this iteration to users. I started the gathering with a condensed version of my POSETTE talk, “State of the Postgres Extension Ecosystem”, but updated with a Trunk OCI Distribution demo. Links:

We then moved into a lightning round of 10 minute introductions to a variety of extensions:

Quite the whirlwind! There followed open discussion, in which each maintainer went to a corner to talk to attendees about contributing to their extensions. Details to come in a more thorough writeup on the Tembo blog, but I personally enjoyed some fascinating discussions about extension distribution challenges.

PGConf.eu

Following the Summit, I attended several thought-provoking and provocative presentations at PGConf.eu, which took place at the same hotel, conveniently enough.

Floor Drees speaking at a podium, next to a slide reading “Why Postgres?”

Floor Drees speaking at PGConf.eu 2024. © 2024 David E. Wheeler

There were many more talks, but clearly I tend to be drawn to the most technical, core-oriented topics. And also archaeology.

Museums

Speaking of which, I made time to visit two museums while in Athens. First up was the National Archaeological Museum of Athens, where I was delighted to explore the biggest collection of Mycenaean artifacts I’ve ever seen, including massive collections from the excavations of Heinrich Schliemann. So much great Bronze Age stuff here. I mean, just look at this absolute unit:

Photo of a Mycenaean Krater featuring a horse-drawn chariot

From the museum description: “Fragment of a krater depicting a chariot with two occupants. A male figure holding a staff walks in front of the chariot. Much of the Mycenaean Pictorial Style pottery (14th-12th centuries BC) with representations of humans, chariots, horses and bulls on large kraters, was produced at Berbati in the Argolid and exported to Cyprus, where it was widely imitated. Birds, fish, wild goats or imaginary creatures (i.e. sphinxes) occur on other types of vessels, such as jugs and stirrup jars. Usually only fragments of these vases survive in mainland Greece from settlement contexts. In Cyprus, however, complete vases are preserved, placed as grave gifts in tombs.” © Photo 2024 David E. Wheeler

The animal decorations on Mycenaean and Akrotiri pottery are simply delightful. I also enjoyed the Hellenistic stuff, and seeing the famed Antikythera Mechanism filled my nerd heart with joy. A good 3 hours poking around; I’ll have to go back and spend a few days there sometime. Thanks to my pal Evan Stanton for gamely wandering around this fantastic museum with me.

Immediately after the PGConf.eu closing session, I dashed off to the Acropolis Museum, which stays open till 10 on Fridays. Built in 2009, this modern concrete-and-glass building exhibits several millennia of artifacts and sculpture exclusively excavated from the Acropolis or preserved from its building façades. No photography allowed, alas, but I snapped this photo looking out on the Acropolis from the top floor.

Photo of the Acropolis as viewed from inside the Acropolis Museum.

The Acropolis as viewed from inside the Acropolis Museum. Friezes preserved from the Parthenon inside the museum reflect in the glass, as does, yes, your humble photographer. © 2024 David E. Wheeler

I was struck by the beauty and effectiveness of the displays. It easily puts the lie to the assertion that the Elgin Marbles must remain in the British Museum to protect them. I saw quite a few references to the stolen sculptures, particularly empty spots and artfully sloppy casts from the originals, but the building itself makes the strongest case that the marbles should be returned.

But even without them there remains a ton of beautiful sculpture to see. Highly recommended!

Back to Work

Now that my sojourn in Athens has ended, I’m afraid I must return to work. I mean, the event was work, too; I talked to a slew of people about a number of projects in flight. More on those soon.

PGXN v2 Update

Speaking of PGXN news, I neglected to link to this post I wrote for the Tembo Blog last month, a fairly detailed accounting of what’s been happening on the PGXN v2 project:

Forgive me Postgres community, for it has been five months since my last PGXN v2 Update. In my defense, it has been super busy! The time went into ongoing community discussions, planning, designs, and the start of implementation. Join me below for the lowdown.

A few highlights:

There’s been quite a bit of activity since then, including the aforementioned PGXN RFC–5 — Release Certification. More soon!


PGXN Certifications RFC

A couple weeks ago, I drafted PGXN RFC–5 — Release Certification, which proposes to replace the simple inclusion of SHA-1 hash digests in PGXN release META.json files with a JWS-signed release payload. From the introduction:

This RFC therefore proposes to extend v2 distribution metadata with a single additional property, certs, that contains one or more certifications that attest to the authenticity or other characteristics of a release on PGXN.

The certs value is an object that contains at least one property, pgxn, which itself contains a PGXN-generated RFC 7515 JSON Web Signature in the JWS JSON Serialization format. The pgxn property will allow clients not only to assemble the release URL and verify the downloaded file against checksums, but also validate it against a public key provided by PGXN.

The design allows multiple signatures, certifications, or other attestations, which in the future MAY allow authors or other entities to sign releases with their own keys. The new format appends a structure such as this to the distribution META.json file:

{
  "certs": {
    "pgxn": {
      "payload": "eyJ1c2VyIjoidGhlb3J5IiwiZGF0ZSI6IjIwMjQtMDktMTNUMTc6MzI6NTVaIiwidXJpIjoiZGlzdC9wYWlyLzAuMS43L3BhaXItMC4xLjcuemlwIiwiZGlnZXN0cyI6eyJzaGE1MTIiOiJiMzUzYjVhODJiM2I1NGU5NWY0YTI4NTllN2EyYmQwNjQ4YWJjYjM1YTdjMzYxMmIxMjZjMmM3NTQzOGZjMmY4ZThlZTFmMTllNjFmMzBmYTU0ZDdiYjY0YmNmMjE3ZWQxMjY0NzIyYjQ5N2JjYjYxM2Y4MmQ3ODc1MTUxNWI2NyJ9fQ",
      "signature": "cC4hiUPoj9Eetdgtv3hF80EGrhuB__dzERat0XF9g2VtQgr9PJbu3XOiZj5RZmh7AAuHIm4Bh-rLIARNPvkSjtQBMHlb1L07Qe7K0GarZRmB_eSN9383LcOLn6_dO--xi12jzDwusC-eOkHWEsqtFZESc6BfI7noOPqvhJ1phCnvWh6IeYI2w9QOYEUipUTI8np6LbgGY9Fs98rqVt5AXLIhWkWywlVmtVrBp0igcN_IoypGlUPQGe77Rw"
    }
  }
}

Review and feedback would be very much appreciated, especially on the list of unresolved questions toward the end.

Thanks to David Christensen and Steven Miller for the early reviews!

Meanwhile, I’ve released pgxn_meta v0.4.0, which adds support for this format, as well as code to rewrite PGXN v1 release fields to the new format. It doesn’t actually do signature verification, yet, as the server back end hasn’t been updated with the pattern and PKI. But I expect to modify it in response to feedback and get it implemented in early 2025.

⛰️ Postgres Ecosystem Summit EU

Given the success of the Extension Ecosystem Summit at PGConf.dev back in May, my colleague Floor Drees has organized a sequel, the Extension Ecosystem Summit EU on Tuesday, October 22, at the Divani Caravel Hotel in Athens. That’s “Day 0” at the same hotel as PGConf.eu. Tembo, Percona, Xata, and Timescale co-sponsor.

While the May event took the form of an open-space technology (OST)-style unconference aimed at extension developers, the EU event aims to inform an audience of Postgres users about the history and some exemplary use cases for extensions. From the invite:

Join us for a gathering to explore the current state and future of Postgres extension development, packaging, and distribution. Bring your skills and your devices and start contributing to tooling underpinning many large Postgres installations.

  • Jimmy Angelakos - pg_statviz: pg_statviz is a minimalist extension and utility pair for time series analysis and visualization of PostgreSQL internal statistics.
  • Adam Hendel (Tembo) - pgmq: pgmq is a lightweight message queue. Like AWS SQS and RSMQ but on Postgres. Adam is pgmq’s maintainer since 2023, and will present a journey from pure Rust → pgrx → pl/pgsql.
  • Alastair Turner (Percona) - pg_tde: pg_tde offers transparent encryption of table contents at rest, through a Table Access Method extension. Percona has developed pg_tde to deliver the benefits of encryption at rest without requiring intrusive changes to the Postgres core.
  • Gülçin Yıldırım Jelínek (Xata) - pgzx: pgzx is a library for developing PostgreSQL extensions written in Zig.
  • Mats Kindahl (Timescale) - TimescaleDB (C), pgvectorscale (Rust) and pgai (Python): maintaining extensions written in different languages.

I will also deliver the opening remarks, including a brief history of Postgres extensibility. Please join us if you’re in the area or planning to attend PGConf.eu. See you there!

Release: pgxn_meta v0.1.0

Following the recent spate of work drafting RFCs for a binary distribution format and Meta Spec v2, and bearing in mind the vote to implement PGXN v2 apps in Rust, over the last few weeks I’ve switched gears to write some code.

First I wrote JSON Schemas for the v1 spec and then wrote a full test suite using the boon crate. Next I wrote and tested JSON Schemas for the v2 spec. This process informed revisions to the RFC, uncovering inconsistencies and unnecessary complexities.

With META.json file JSON Schema validation fully worked up, I decided to work the functionality into a proper Rust crate and CLI to eventually replace the old PGXN::Meta::Validator Perl module and its validate_pgxn_meta CLI. This turned out to be a decent Rust starter project, requiring a fairly simple crate and CLI, but it also allowed me to develop patterns to build and release binaries for a variety of platforms and architectures.

As a result, I’m happy to announce the release today of the pgxn_meta crate and pgxn_meta CLI v0.1.0, available for download on these platforms, thanks to cross and houseabsolute/actions-rust-cross:

  • darwin-amd64
  • darwin-arm64
  • freebsd-amd64
  • freebsd-i686
  • illumos-amd64
  • linux-amd64
  • linux-arm
  • linux-arm64
  • linux-i686
  • linux-powerpc
  • linux-powerpc64
  • linux-powerpc64le
  • linux-riscv64
  • linux-s390x
  • linux-sparc64
  • netbsd-amd64
  • solaris-amd64
  • solaris-sparcv9
  • windows-amd64
  • windows-arm64
  • windows-i686

Download the archive file appropriate to your platform, decompress it, and put the pgxn_meta (or pgxn_meta.exe) binary in your path. Or use the universal binary installer (ubi) to install it:

ubi --project pgxn/meta --exe pgxn_meta --in ~/bin

And of course you can use cargo to compile it from source:

cargo install pgxn_meta

Usage is simple: just run pgxn_meta in a directory containing the META.json file to validate:

❯ pgxn_meta 
META.json is OK

And optionally pass it the name of the file, as in this example parsing a test file with no version property:

❯ pgxn_meta corpus/invalid.json 
Error: "corpus/invalid.json jsonschema validation failed with https://pgxn.org/meta/v2/distribution.schema.json#\n- at '': missing properties 'version'"```

That’s it!

What’s Next?

Now that I’ve implemented validation and figured out multi-platform binary support for Rust apps, my next tasks are to:

  • Implement a pattern to convert a v1 META.json to the v2 format
  • Create a pattern to merge multiple META.json files into one
  • Write code to build PGXS extensions into trunk packages
  • Develop patterns to satisfy third-party dependencies for multiple platforms

Should keep me busy for a few weeks. Updates as I have them.

To Preload, or Not to Preload

The Tembo Blog published a post by yours truly last week about when to preload shared libraries and when not to:

Recently I’ve been trying to figure out when Postgres extension shared libraries should be preloaded. By “shared libraries” I mean libraries provided or used by Postgres extensions, whether LOADable libraries or CREATE EXTENSION libraries written in C or pgrx. By “preloaded” I mean under what conditions should they be added to one of the Shared Library Preloading variables, especially shared_preload_libraries.

The answer, it turns out, comes very much down to the extension type.

I view this post as a kind of proto-chapter for an imagined book about developing extensions that I’d like to work on someday. I learned quite a lot researching it and responding to extensive feedback from more knowledgeable community members. It resulted in updates to the PGXN Meta preload property that I hope will inform binary distribution in the future. More on that soon.

RFC: PGXN Meta Spec v2

Two bits of news on the “PGXN v2” project.

PGXN RFCs: The Book

First, I’ve moved the RFC process (again, sorry) from PGXN Discussions, which were a bit fussy about Markdown formatting and don’t support inline comments, to the PGXN RFCs project, where the use of pull requests on CommonMark Markdown documents addresses these issues. This process borrows heavily from the Rust RFCs project, right down to publishing accepted RFCs as a “book” site.

So I’d also like to introduce rfcs.pgxn.org, a.k.a., the PGXN RFCs Book.

It currently houses only one RFC: Meta Spec v1, dating from 2010. This document defines the structure of the META.json file required in archives published on PGXN.

But I expect many more RFCs to be drafted in the coming years, starting with draft RFC–2, the binary distribution RFC I POCed a few weeks ago. There has already been some great feedback in that pull request, in addition to the previous discussion. More eyes will make it even better.

PGXN Meta Spec v2 RFC

Last week I also iterated on the PGXN Metadata Sketch several times to produce draft RFC–3: Meta Spec v2. This represents a major reworking of the original spec in an attempt to meet the following goals:

  • Allow more comprehensive dependency specification, to enable packagers to identify and install system dependencies and dependencies from other packaging systems, like PyPI and CPAN
  • Adopt more industry-standard formats like SPDX License Expressions and purls.
  • Improve support for multiple types of Postgres extensions, including apps, LOADable modules, background workers, and TLEs.
  • Improve curation and evaluation via categories, badging, and additional download links.

There’s a lot here, but I hope the result can better serve the community for the next decade, and enable lots of new services and features.

The proof will be in the application, so my next task is to start building the tooling to turn PGXN distributions into binary distributions. I expect experimentation will lead to additional iterations, but feel confident that the current state of both RFC–2 and RFC–3 is on the right track.