Just a Theory

By David E. Wheeler

Posts about API

Patch: Postgres ABI and API Guidance

Update 2024-11-14

I forgot to update this post at the time, but on July 31, Peter Eisentraut committed the patch adding ABI and API guidance to the C language documentation. I only noticed because today’s releases contained a modified ABI that broke a number of extensions. See the hackers thread for details.

TL;DR

If you’re a Postgres extension developer interested in understanding what to expect from core API and ABI stability, please review and give feedback on this patch (or pull request) adding ABI and API Guidance to the documentation.


In my PGConf.dev report a couple days ago, I mentioned that a few actionable items came out of the Improving extensions in core unconference session. One was the need to document the heretofore unofficial policy for API and ABI stability between major and, especially, minor versions of Postgres.

A frequent topic at the Extension Summit, the Mini-Summits, and a number of PGConf.dev sessions has been concern regarding compatibility changes between minor releases of Postgres. At Mini-Summit Five, for example, Yurii Rashkovskii presented a few examples of such changes, leading him to conclude, along with several others in the community, that extensions using the C API can safely be run only against the minor release they were built for.

In the Unconference session, core committers reported that such changes are made carefully, precisely to avoid compatibility issues, and that they rarely, if ever, affect extensions compiled for a different minor release of the same major version. In the case Yurii found, for example, a field was added to a struct’s padding without affecting the ordering of other fields, thus minimizing the risk of runtime failures.

It became clear that, although the committers follow a policy — and read new committers into it via patch review — it’s not documented anywhere. The result has been a bunch of sturm und drang amongst extension developers unsure what level of compatibility to depend on and what changes to expect.

The week after the conference, I started a pgsql-hackers thread proposing to document the committer policy. Following some discussion and review of potential ABI breaks in minor releases, the consensus seemed to be that the committers strive to avoid such breaks, that they’re quite uncommon in minor releases, and that most of the reported issues were due to using more obscure APIs.

As a result, we started drafting a policy, and after a few iterations, Peter Eisentraut pulled things together from the perspective of a core team member, reframed as “Server API and ABI Guidance”. I converted it into a patch (and pull request) to add it to the C Language docs. A key statement on minor releases:

In general, extension code that compiles and works with a minor release should also compile and work with any other minor release of the same major version, past or future.

I hope this document clarifies things. Even if it’s not as strict as some might hope, it at least documents the project approach to compatibility, so we have a better idea what to expect when using the C APIs. If you see gaps, or you have additional questions, please respond to the pgsql-hackers thread — or the pull request (I’ll propagate comments to hackers).

RFC: Restful Secondary Key API

I’ve been working on a simple CRUD API at work, with an eye to making a nicely-designed REST interface for managing a single type of resource. It’s not a complicated API; it follows best practices recommended by Apigee and Microsoft, and features exactly the sorts of APIs you’d expect, including:

  • POST /users: Create a new user resource
  • GET /users/{uid}: Read a user resource
  • PUT /users/{uid}: Update a user resource
  • DELETE /users/{uid}: Delete a user resource
  • GET /users?{params}: Search for user resources

If you’re familiar with REST, you get the idea.
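
To make it concrete, here’s a rough sketch of what those routes could look like in, say, a Mojolicious::Lite app. The My::UserStore class and its create, fetch, update, delete, and search methods are just stand-ins for whatever storage layer actually backs the service:

#!/usr/bin/perl
use Mojolicious::Lite -signatures;

# Hypothetical storage layer; swap in whatever actually backs the service.
my $store = My::UserStore->new;

# POST /users: create a new user resource.
post '/users' => sub ($c) {
    my $user = $store->create($c->req->json);
    $c->res->headers->location("/users/$user->{uid}");
    $c->render(json => $user, status => 201);
};

# GET /users/{uid}: read a single user resource.
get '/users/:uid' => sub ($c) {
    my $user = $store->fetch($c->param('uid'))
        or return $c->render(json => { error => 'not found' }, status => 404);
    $c->render(json => $user);
};

# PUT /users/{uid}: update a user resource.
put '/users/:uid' => sub ($c) {
    my $user = $store->update($c->param('uid'), $c->req->json)
        or return $c->render(json => { error => 'not found' }, status => 404);
    $c->render(json => $user);
};

# DELETE /users/{uid}: delete a user resource.
del '/users/:uid' => sub ($c) {
    $store->delete($c->param('uid'));
    $c->rendered(204);
};

# GET /users?{params}: search, always returning an array of resources.
get '/users' => sub ($c) {
    $c->render(json => [ $store->search($c->req->params->to_hash) ]);
};

app->start;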

There is one requirement that proved a bit of a design challenge. We will be creating a canonical ID for all resources managed by the service, which will function as the primary key. The APIs above reference that key by the {uid} path variable. However, we also need to support fetching a single resource by a number of existing identifiers, including multiple legacy IDs and, sticking to the users example, natural keys like usernames and email addresses. Unlike the search API, which returns an array of resources, we need a nice single API like GET /users/{uid} that returns a single resource, but for a secondary key. What should it look like?

None of my initial proposals were great (using username as the sample secondary key, though again, we need to support a bunch of these):

  • GET /users?username={username} — consistent with search, but does it return a collection like search or just a single entry like GET /users/{uid}? It would be weird to return an array or not depending on which parameters were used.
  • GET /users/by/username/{username} — a bit weird to put a preposition in the URL. Besides, it might conflict with a planned API to fetch subsets of info for a single resource, e.g., GET /users/{uid}/profile, which might return just the profile object.
  • GET /user?username={username} — Too subtle to have the singular rather than plural, but perhaps the most REST-ish.
  • GET /lookup?obj=user&username={username} — Uses a special verb; not very RESTful.

I asked around a coding Slack, posting a few possibilities, and friendly API designers suggested some others. We agreed it was an interesting problem, easily solved if there were just one alternate key that never conflicted with the primary key ID, such as GET /users/{uid || username}. But of course that’s not the problem we have: there are a bunch of these fields, and they may well overlap!

There was some interest in GET /users/by/username/{username} as an aesthetically-pleasing URL, plus it allows for

  • /by => list of unique fields
  • /by/username/ => list of all usernames?

But again, it runs up against the planned use of subdirectories to return sub-objects of a resource. One other option I played around with was GET /users/user?username={username}: the user sub-path indicates that we want just one user much more clearly than /by does, and it’s unlikely we’d ever use user to name an object in a user resource. But still, it overloads the path to mean one thing when it’s user and another when it’s a UID.

Looking back through the options, I realized that what we really want is an API that is identical to GET /users/{uid} in its behaviors and response, just with a different key. So what if we keep using that pattern, as a colleague originally suggested with GET /users/{uid || username}, but instead of passing just the raw value, encode the key name in the URL as well? It turns out that colons (:) are valid in paths, so I defined this route:

  • GET /users/{key}:{value}: Fetch a single resource by looking up the {key} with the {value}. Supported {key} params are legacy_id, username, email_address, and even uid. This then becomes the canonical “look up a user resource by an ID” API.

The nice thing about this API is that it’s consistent: all keys are treated the same, as long as no key name contains a colon. Best of all, we can keep the original GET /users/{uid} API around as an alias for GET /users/uid:{value}. Or, better, continue to refer to it as the canonical path, since the PUT and DELETE actions map only to it, and document the GET /users/{key}:{value} API as an alias or symlink for GET /users/{uid}. Perhaps return a Location header pointing to the canonical URL, too?
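
Sticking with the same hypothetical Mojolicious::Lite sketch from earlier, a handler for that route might look something like this; fetch_by($key, $value) is again an assumed method on the stand-in storage object $store:

# Allowed lookup keys; anything else gets a 400.
my %lookup_keys = map { $_ => 1 } qw(uid legacy_id username email_address);

# GET /users/{key}:{value}: fetch a single user by any unique key.
# The relaxed placeholder (#) also matches dots, so email addresses work.
get '/users/#key_value' => sub ($c) {
    my ($key, $value) = split /:/, $c->param('key_value'), 2;

    # A bare value with no colon is the canonical GET /users/{uid},
    # so this route can subsume the plain {uid} route entirely.
    ($key, $value) = ('uid', $key) unless defined $value;

    return $c->render(json => { error => "unknown key: $key" }, status => 400)
        unless $lookup_keys{$key};

    my $user = $store->fetch_by($key, $value)
        or return $c->render(json => { error => 'not found' }, status => 404);

    # Point the client at the canonical URL for the resource.
    $c->res->headers->location("/users/$user->{uid}");
    $c->render(json => $user);
};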

In any event, as far as I can tell this is a unique design, so maybe it’s too weird or not properly RESTful? Would love to know of any other patterns designed to solve the problem of supporting arbitrarily-named secondary unique keys. What do you think?

Update: Aristotle Pagaltzis started a discussion on this pattern in a Gist.

Custom SQLite Aggregates in Perl

About a year ago, Josh Berkus was reviewing some Bricolage SQL code, looking to optimize it for PostgreSQL. One of the things he noticed was that we were fetching a lot more rows for an object than we needed to. The reason for this is that an object might be associated with one or more groups, and to get back a list of all of the group IDs, we were getting multiple rows. For example, if I wanted to fetch a single story with the ID 10, I might get back rows like this:

SELECT s.id, s.title, grp.id
FROM   story s, member m, grp
WHERE  s.id = m.story_id
       AND m.grp_id = grp.id
       AND s.id = 10;
s.id |        s.title      | grp.id
-----+---------------------+--------
  10 | The Princess Bride  | 23
  10 | The Princess Bride  | 24
  10 | The Princess Bride  | 25
  10 | The Princess Bride  | 26
  10 | The Princess Bride  | 27

Now, that’s a lot of extra data to fetch when only a single column differs between the rows; it’s very wasteful, really. So Josh said, “Why don’t you use a custom aggregate for that?” I knew nothing about aggregates, but I did some research and figured out how to write PostgreSQL custom aggregates in SQL. I wrote a very simple one, called id_list(), that joins all of the values in a column with a single space. The aggregate code looks like this:

CREATE   FUNCTION append_id(TEXT, INTEGER)
RETURNS  TEXT AS '
    SELECT CASE WHEN $1 = '''' THEN
                CAST($2 AS TEXT)
            ELSE
                $1 || '' '' || CAST($2 AS TEXT)
            END;'
LANGUAGE 'sql'
WITH     (ISCACHABLE, ISSTRICT);

CREATE AGGREGATE id_list (
    SFUNC    = append_id,
    BASETYPE = INTEGER,
    STYPE    = TEXT,
    INITCOND = ''
);

Now I was able to vastly simplify the results returned by the query:

SELECT s.id, s.title, id_list(grp.id)
FROM   story s, member m, grp
WHERE  s.id = m.story_id
       AND m.grp_id = grp.id
       AND s.id = 10
GROUP BY s.id, s.title;
s.id |        s.title      | id_list
-----+---------------------+---------------
  10 | The Princess Bride  | 23 24 25 26 27

So then I just had to split the id_list column on the white space and I was ready to go. Cool!
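
In Perl, that last step is about one line of code; something like this, with the story ID bound as a placeholder:

# Fetch the single aggregated row, then split id_list on whitespace.
my ($id, $title, $id_list) = $dbh->selectrow_array(q{
    SELECT s.id, s.title, id_list(grp.id)
    FROM   story s, member m, grp
    WHERE  s.id = m.story_id
           AND m.grp_id = grp.id
           AND s.id = ?
    GROUP BY s.id, s.title
}, undef, 10);

my @grp_ids = split ' ', $id_list;   # qw(23 24 25 26 27)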

So recently, I was thinking about how I might do something similar in SQLite. It turns out that SQLite has a way to add custom aggregates, too, via its sqlite3_create_function() API. But I don’t know C, and I had been wondering for a while whether, even if I figured out how to write an aggregate function in C, I would have to require users to compile SQLite with my C aggregate in order to get it to work.

However, as a Perl developer, I thought it might be worthwhile to just quickly check what the DBD::SQLite docs might have to say on the matter. And it turns out that the ability to add aggregates to SQLite is supported in DBD::SQLite via the create_aggregate custom function. And what’s more, the aggregate can be written in Perl! Whoa! I couldn’t believe that it could be that easy, but a quick test script demonstrated that it is:

#!/usr/bin/perl -w

use strict;

use DBI;

my $dbfile = shift;
my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile", '', '');

END {
    $dbh->disconnect;
    unlink $dbfile;
};

$dbh->do('CREATE TABLE foo (a int)');
$dbh->do('BEGIN');
$dbh->do('INSERT INTO foo (a) VALUES (?)', undef, $_) for 1..10;
$dbh->do('COMMIT');

# Create a new aggregate.
$dbh->func('joiner', 1, 'My::Join', 'create_aggregate');

my $sel = $dbh->prepare(q{SELECT joiner(a) FROM foo});
$sel->execute;
$sel->bind_columns(\my $val);
print "$val\n" while $sel->fetch;

The first argument to create_aggregate() (itself invoked via the DBI func() method) is the name of the aggregate, the second is the number of arguments to the aggregate (use -1 for an unlimited number), and the third is the name of a Perl class that implements the aggregate. That class needs just three methods: new(), an object constructor; step(), called for each aggregate row; and finalize(), which must return the value calculated by the aggregate. My simple implementation looks like this:

package My::Join;

sub new { bless [] }
sub step {
    my $self = shift;
    push @$self, @_;
}
sub finalize {
    my $self = shift;
    return join q{ }, @$self;
}

Yep, that’s really it! When I run the script, the output looks like this:

% try foo
1 2 3 4 5 6 7 8 9 10

Keen! I mean, that is just so slick! And it really demonstrates the power of SQLite as an embeddable database, as well. Thanks, Matt, for making the SQLite API available to us mere mortal Perl developers!


bricolage.cc Has a New Server

The Bricolage project has a new server! Thanks to a generous donation from NetStumbler.com, we have a shiny new hosted FreeBSD server. I’ve started building it, and have migrated the Bricolage Web site over today. By tomorrow, DNS records should be updated, and it’ll be rarin’ to go!

While I was at it, I rolled out the new Bricolage API documentation browser. There are now browsers available for all of the major stable releases of Bricolage, listed on the old documentation page. The API browser for the current stable release will always be available here. Meanwhile, the documentation will also be generated from the Subversion trunk every morning; you can find those docs here.

We’ll also be able to keep better track of the kind of traffic the site gets thanks to the new stats site, which will also be updated daily.

Enjoy!
