POC: Distributing Trunk Binaries via OCI
A couple months ago, Álvaro Hernández suggested that Postgres extensions should be distributed as OCI (née Docker) images:
It’s all about not reinventing the wheel, and leveraging the ecosystem around OCI. Many of the problems (solutions) in building, packaging and distributing extensions are already solved by OCI: there’s a whole ecosystem of tools around OCI that provide additional benefits in terms of tooling, infrastructure and common knowledge.
As a relatively experienced Docker image builder and distributor, I found this idea intriguing. I wasn’t familiar with the OCI Image Manifest Specification, which defines how to build OCI images containing arbitrary files, or “artifacts”. But if we could adopt an existing protocol and federated registry system like OCI/Docker, it would save pretty significant development time over building our own — plus we’d be adopting and potentially contributing to a standard.
After PGConf.dev, I decided to see if I could work out how to distribute packages in the recently-proposed trunk format such that an OCI/Docker-style image URL could be used to install a version of an extension compiled for the appropriate architecture.
Thanks to the denizens of the #oras
and #zot
channels on the CNCF Slack,
I extended the trunk format POC in pg-semver PR 69 to build the
necessary JSON manifest files, push them to a registry, and then pull and
install the architecturally-appropriate package. Here’s how it works.
Metadata generation
First, I extended trunk.mk
, which builds a trunk package, with a
few more targets that create the JSON files with metadata necessary to build
OCI manifests. The files that make trunk
now also generates are:
{extension}_annotations.json
- OCI standard annotations describing a package, including license, vendor,
and URLs. The
semver_annotations.json
file looks like this:{ "org.opencontainers.image.created": "2024-06-20T18:07:24Z", "org.opencontainers.image.licenses": "PostgreSQL", "org.opencontainers.image.title": "semver", "org.opencontainers.image.description": "A Postgres data type for the Semantic Version format with support for btree and hash indexing.", "org.opencontainers.image.source": "https://github.com/theory/pg-semver", "org.opencontainers.image.vendor": "PGXN", "org.opencontainers.image.ref.name": "0.32.1", "org.opencontainers.image.version": "0.32.1", "org.opencontainers.image.url": "https://github.com/theory/pg-semver" }
{package_name}_config.json
- An object with fields appropriate for OCI platform specification, plus the
creation date. Here are the content of
semver-0.32.1+pg16-darwin-23.5.0-arm64_config.json
: -
{ "os": "darwin", "os.version": "23.5.0", "architecture": "arm64", "created": "2024-06-20T18:07:24Z" }
{package_name}_annotations.json
- An object defining annotations to use in an image, built for a specific
platform, all under the special key
$manifest
to be used later by the ORAS CLI to put them in the right place.semver-0.32.1+pg16-darwin-23.5.0-arm64_annotations.json
example: -
{ "$manifest": { "org.opencontainers.image.created": "2024-06-20T18:07:24Z", "org.opencontainers.image.title": "semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk", "org.opencontainers.image.licenses": "PostgreSQL", "org.opencontainers.image.description": "A Postgres data type for the Semantic Version format with support for btree and hash indexing.", "org.opencontainers.image.source": "https://github.com/theory/pg-semver", "org.opencontainers.image.vendor": "PGXN", "org.opencontainers.image.ref.name": "0.32.1", "org.opencontainers.image.version": "0.32.1", "org.opencontainers.image.url": "https://github.com/theory/pg-semver", "org.pgxn.trunk.pg.version": "16.3", "org.pgxn.trunk.pg.major": "16", "org.pgxn.trunk.pg.version_num": "160003", "org.pgxn.trunk.version": "0.1.0" } }
-
The
org.opencontainers.image
keys are the same as insemver_annotations.json
, while the neworg.pgxn.trunk
annotations are intended for an install client to find the image appropriate for the version of Postgres, although that functionality isn’t part of this POC.
The only change to the Makefile
to support these annotations are the
addition of a DESCRIPTION
variable to populate
org.opencontainers.image.description
and a REPO_URL
to populate
org.opencontainers.image.source
. trunk.mk
includes a couple other new
variables, too: TITLE
(defaults to EXTENSION
), VENDOR
(defaults to
“PGXN”), and URL
(defaults to REPO-URL
).
Publishing Images
The new shell script push_trunk
uses the ORAS CLI and jq
to build the
necessary manifest files and push them to an OCI registry. It currently works
only two trunk files like those built in the trunk POC. It first
“pushes” the trunks to a locally-created OCI layout, then constructs
manifests associated the SHA ID of each just-pushed image with annotations and
platform configurations and writes them into an image index manifest.
Finally, it pushes the complete OCI layout described by the index to a remote
registry.
If that sounds like a lot of steps, you’re right, it adds up. But the result, following a precedent established by Homebrew (as described in this issue) is multiple images for different platforms indexed at a single URI. Once we publish the two trunks:
./push_trunk localhost:5000/theory/semver:0-32.1 \
semver-0.32.1+pg16-darwin-23.5.0-arm64 \
semver-0.32.1+pg16-linux-amd64
We can fetch the manifests. The address for the image index is that first
parameter, localhost:5000/theory/semver:0-32.1
; we fetch the manifest with
the command
oras manifest fetch localhost:5000/theory/semver:0-32.1
Which returns:
{
"schemaVersion": 2,
"mediaType": "application/vnd.oci.image.index.v1+json",
"manifests": [
{
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"size": 1285,
"digest": "sha256:1a14997eb380f9641cba6193c001eb630319f345d76ef07aee37f86fafcdbe0b",
"platform": {
"os": "linux",
"architecture": "amd64"
},
"annotations": {
"org.pgxn.trunk.pg.version": "16.3",
"org.pgxn.trunk.pg.major": "16",
"org.pgxn.trunk.pg.version_num": "160003",
"org.pgxn.trunk.version": "0.1.0"
}
},
{
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"size": 1302,
"digest": "sha256:385fcfe6b33c858c3f126fb4284afe23ba8c2f7c32db8a50a607dfece6dd9162",
"platform": {
"os": "darwin",
"os.version": "23.5.0",
"architecture": "arm64"
},
"annotations": {
"org.pgxn.trunk.pg.version": "16.3",
"org.pgxn.trunk.pg.major": "16",
"org.pgxn.trunk.pg.version_num": "160003",
"org.pgxn.trunk.version": "0.1.0"
}
}
],
"annotations": {
"org.opencontainers.image.created": "2024-06-21T13:55:01Z",
"org.opencontainers.image.licenses": "PostgreSQL",
"org.opencontainers.image.title": "semver",
"org.opencontainers.image.description": "A Postgres data type for the Semantic Version format with support for btree and hash indexing.",
"org.opencontainers.image.source": "https://github.com/theory/pg-semver",
"org.opencontainers.image.vendor": "PGXN",
"org.opencontainers.image.ref.name": "0.32.1",
"org.opencontainers.image.version": "0.32.1",
"org.opencontainers.image.url": "https://github.com/theory/pg-semver"
}
}
Note the manifests
array, which lists images associated with this URI. The
first one is for amd64 linux and the second for arm64 darwin. They also
contain the org.pgxn.trunk
annotations that would allow filtering for an
appropriate Postgres version. The idea is to download an index like this, find
the manifest information for the appropriate platform and Postgres version,
and download it. To get the darwin image, pull it by its digest:
oras pull localhost:5000/theory/semver:0-32.1@sha256:385fcfe6b33c858c3f126fb4284afe23ba8c2f7c32db8a50a607dfece6dd9162
Which downloads the file:
$ ls -l *.trunk
semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk
Nice! The OCI protocol allows for filtering on platform directly, skipping the
need to download and examine the image index. This is how docker pull --platform
works, but is general to OCI. We can fetch a manifest with this
command:
oras manifest fetch --platform linux/amd64 localhost:5000/theory/semver:0-32.1
Which returns not the image index, but the manifest for the Linux image:
{
"schemaVersion": 2,
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"artifactType": "application/vnd.pgxn.trunk.layer.v1",
"config": {
"mediaType": "application/vnd.oci.image.config.v1+json",
"digest": "sha256:90c8d6f2e67cba09f8178648ad95a6b31e51c0d902058bf396f9e7e5f50c8dfd",
"size": 84
},
"layers": [
{
"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
"digest": "sha256:b52b292cabe3ca479673ab68d3ea647802a86f15059c3e19ed24d5a7688159c3",
"size": 61983,
"annotations": {
"org.opencontainers.image.title": "semver-0.32.1+pg16-linux-amd64.trunk"
}
}
],
"annotations": {
"org.opencontainers.image.created": "2024-06-21T17:55:13Z",
"org.opencontainers.image.description": "A Postgres data type for the Semantic Version format with support for btree and hash indexing.",
"org.opencontainers.image.licenses": "PostgreSQL",
"org.opencontainers.image.ref.name": "0.32.1",
"org.opencontainers.image.source": "https://github.com/theory/pg-semver",
"org.opencontainers.image.title": "semver-0.32.1+pg16-linux-amd64.trunk",
"org.opencontainers.image.url": "https://github.com/theory/pg-semver",
"org.opencontainers.image.vendor": "PGXN",
"org.opencontainers.image.version": "0.32.1",
"org.pgxn.trunk.pg.major": "16",
"org.pgxn.trunk.pg.version": "16.3",
"org.pgxn.trunk.pg.version_num": "160003",
"org.pgxn.trunk.version": "0.1.0"
}
}
Or we can pull the file by platform with:
rm *.trunk
oras pull --platform linux/amd64 localhost:5000/theory/semver:0-32.1
And now the Linux image has been downloaded:
$ ls -1 *.trunk
semver-0.32.1+pg16-linux-amd64.trunk
Pretty nice! These examples use zot running in a local Docker container, but
could just as easily use the Docker registry (docker.io
) or the GitHub
registry (ghcr.io
) — which is where Homebrew stores its images (e.g.,
sqlite 3.46.0).
Installation
With these manifests configured and pushed, changes to install_trunk
use
this knowledge to download from the registry instead of relying on an existing
file (as implemented for the trunk POC). Now we call it like so:
./install_trunk localhost:5000/theory/semver:0-32.1
First, it assembles platform information from uname
, then pulls the
platform-specific image with this oras
command:
oras pull --no-tty --plain-http \
--format 'go-template={{(first .files).path}}'
--platform "$platform" "$trunk"
As before, it downloads the image appropriate for the platform. The --format
option, meanwhile, causes it to also download annotations and extract the path
for the downloaded file. So in addition to downloading the file, it also emits
its full path:
/tmp/pgxn/semver-0.32.1+pg16-darwin-23.5.0-arm64.trunk
the script proceeds to unpack the image with that file name and continues with the installation process as before.
Demo
The last new file in the PR is docker_compose.yml
,
which sets up an amd64 Linux container for building an extension for Postgres
16, and a zot container to push to and pull from. I used it to build this
POC and record this demo:
To use it yourself, run these commands with docker_compose.yml
:
git clone https://github.com/theory/pg-semver.git
cd pg-semver
git checkout -b trunk-oci origin/trunk-oci
docker compose up -d
This clones the pg-semver repository, checks out the trunk-oci
branch,
fires up the containers. Wait a couple minutes for Postgres to start and be
configured, then, assuming you can build against Postgres 16 on your local
machine, you can follow the same steps. The commands in the demo are:
make trunk
docker compose exec linux bash
make clean
make trunk
exit
ls -1 *.trunk
ls -1 *.json
./push_trunk localhost:5000/theory/semver:0.32.1 \
semver-0.32.1+pg16-darwin-23.5.0-arm64 \
semver-0.32.1+pg16-linux-amd64
./install_trunk localhost:5000/theory/semver:0.32.1
docker compose exec linux bash
./install_trunk zot:5000/theory/semver:0.32.1
exit
You might need to adjust the first trunk image name if your local configuration is not the same as mine.
Concept Proven
Honestly, this POC far exceeded my expectations. It worked great! Not only does the trunk format seem to work well, but distributing via OCI registries is even better! It brings a bunch of benefits:
- We could build a community registry that automatically builds images for
PGXN releases for a variety of platforms. This could grow to become the
default method for installing extensions, perhaps via a command such as
pgxn trunk install theory/semver
. - Anyone can use any other registry, and the tooling will work with it. Just
as you can pull Docker images from
docker.io
, you can also pull them fromghcr.io
,quay.io
, or any other OCI-compliant registry. The same applies here. Extension authors can build and publish trunks to their own registries if they like. - Better yet, organizations can build extension registries for their own use cases, to complement the community registry. Think internal registries for private extensions, or commercial registries that additional features, such as security scans or curation.
Super promising! I’m just about ready to get to work building this stuff, though I anticipate a few challenges:
- We’ll need a way to find the latest version (tag) of a release. I’m sure
this is do-able, since Homebrew does it. There must be some other index
for tags (
ghcr.io/homebrew/core/sqlite:latest
doesn’t return a result, alas). - In addition to filtering on platform specification when pulling an image,
it would be nice to filter on other attributes, such as the
org.pgxn.trunk
annotations defining Postgres the version. For now it will be fine for the CLI to download an image index and find the right image, but additional server-side filtering would be very nice. - Will need to support extensions that can run on any architecture, such as pure SQL extensions. I think this will be pretty easy by publishing a single tagged image instead of an image index.
- If we build a community registry, where should it be hosted? Homebrew uses
ghcr.io
, presumably avoiding hosting costs, but it might be nice to have a specific community registry, perhaps attrunk.pgxn.org
or perhapsoci.postgresql.org
. - If we do host a registry, might we want to allow extension authors to publish their own trunks within their namespaces? How might that be organized?
I can imagine workable solutions to these relatively minor challenges. As long as we can encapsulate them into the commands for a single command-line client, it should work out well.
Can’t wait to get started. What do you think?