Datasette 0.28—and why master should always be releasable
19th May 2019
It’s been quite a while since the last substantial release of Datasette. Datasette 0.27 came out all the way back in January.
This isn’t because development has slowed down. In fact, the project has had 131 commits since then, covering a bewildering array of new functionality and with some significant contributions from developers who aren’t me—Russ Garrett and Romain Primet deserve special recognition here.
The problem has been one of discipline. I’m a big fan of the idea of keeping master shippable at all times in my professional work, but I hadn’t quite adopted this policy for my open-source side projects. A couple of months ago I found myself in a situation where I had two major refactorings (of faceting and of Datasette’s treatment of immutable files) going on in master at the same time, and untangling them turned out to take way longer than I had expected.
So I’ve updated Datasette’s contribution guidelines to specify that master should always be releasable, almost entirely as a reminder to myself.
All of that said, I’m finally back out of the weeds and I’m excited to announce today’s release of Datasette 0.28. It features a salmagundi of new features! I’m replicating the release notes below.
Supporting databases that change
From the beginning of the project, Datasette has been designed with read-only databases in mind. If a database is guaranteed not to change it opens up all kinds of interesting opportunities—from taking advantage of SQLite immutable mode and HTTP caching to bundling static copies of the database directly in a Docker container. The interesting ideas in Datasette explores this idea in detail.
As my goals for the project have developed, I realized that read-only databases are no longer the right default. SQLite actually supports concurrent access very well provided only one thread attempts to write to a database at a time, and I keep encountering sensible use-cases for running Datasette on top of a database that is processing inserts and updates.
So, as of version 0.28, Datasette no longer assumes that a database file will not change. It is now safe to point Datasette at a SQLite database which is being updated by another process.
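The underlying SQLite behaviour is easy to demonstrate outside of Datasette. Here is a minimal sketch (my own illustration, not Datasette's internals) of one connection inserting rows while a second connection, standing in for Datasette, reads the same file:

```python
import os
import sqlite3
import tempfile

# A database file that one connection writes to while another reads
path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path)
writer.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, message TEXT)")
writer.execute("INSERT INTO logs (message) VALUES ('first')")
writer.commit()

# A second, independent connection - analogous to Datasette reading the file
reader = sqlite3.connect(path)
print(reader.execute("SELECT count(*) FROM logs").fetchone()[0])  # 1

# The writer keeps inserting; the reader sees new rows on its next query
writer.execute("INSERT INTO logs (message) VALUES ('second')")
writer.commit()
print(reader.execute("SELECT count(*) FROM logs").fetchone()[0])  # 2
```

Provided writes are serialized through a single connection at a time, readers simply see each newly committed row, which is exactly the property Datasette now relies on.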
Making this change was a lot of work—see tracking tickets #418, #419 and #420. It required new thinking around how Datasette should calculate table counts (an expensive operation against a large, changing database) and also meant reconsidering the “content hash” URLs Datasette has used in the past to optimize the performance of HTTP caches.
Datasette can still run against immutable files and gains numerous performance benefits from doing so, but this is no longer the default behaviour. Take a look at the new Performance and caching documentation section for details on how to make the most of Datasette against data that you know will be staying read-only and immutable.
Faceting improvements, and faceting plugins
Datasette Facets provide an intuitive way to quickly summarize and interact with data. Previously the only supported faceting technique was column faceting, but 0.28 introduces two powerful new capabilities: facet-by-JSON-array and the ability to define further facet types using plugins.
Facet by array (#359) is only available if your SQLite installation provides the `json1` extension. Datasette will automatically detect columns that contain JSON arrays of values and offer a faceting interface against those columns—useful for modelling things like tags without needing to break them out into a new table. See Facet by JSON array for more.
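Under the hood this style of faceting leans on the `json1` extension's `json_each()` table-valued function. Here is an illustrative query of my own (not Datasette's exact SQL) showing how a column of JSON arrays can be expanded and counted, assuming your SQLite build includes `json1`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT, tags TEXT)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?)",
    [
        ("Cafe A", '["coffee", "wifi"]'),
        ("Cafe B", '["coffee", "outdoor"]'),
    ],
)

# json_each() expands a JSON array into one row per element, which is
# the building block for counting facet values across a whole column
rows = conn.execute(
    """
    SELECT j.value AS tag, count(*) AS n
    FROM places, json_each(places.tags) AS j
    GROUP BY tag ORDER BY n DESC, tag
    """
).fetchall()
print(rows)  # [('coffee', 2), ('outdoor', 1), ('wifi', 1)]
```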
The new `register_facet_classes()` plugin hook (#445) can be used to register additional custom facet classes. Each facet class should provide two methods: `suggest()`, which suggests facet selections that might be appropriate for a provided SQL query, and `facet_results()`, which executes a facet operation and returns results. Datasette's own faceting implementations have been refactored to use the same API as these plugins.
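As a rough sketch of the shape of a custom facet class: the two method names come from the description above, but the constructor arguments and return values here are my own guesses, so consult the plugin hook documentation for the real interface.

```python
import asyncio

class ExampleFacet:
    # Skeleton only: the real base class, constructor arguments and
    # return values are defined by Datasette; these are illustrative.
    type = "example"

    def __init__(self, sql=None, params=None):
        self.sql = sql
        self.params = params

    async def suggest(self):
        # Inspect self.sql and return facet suggestions appropriate for it
        return []

    async def facet_results(self):
        # Execute the facet queries and return their results
        return [], []

# A plugin would then return the class from the hook, something like:
#
#   @hookimpl
#   def register_facet_classes():
#       return [ExampleFacet]

facet = ExampleFacet(sql="select * from mytable")
print(asyncio.run(facet.suggest()))  # []
```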
datasette publish cloudrun
Google Cloud Run is a brand new serverless hosting platform from Google, which allows you to build a Docker container which will run only when HTTP traffic is received and will shut down (and hence cost you nothing) the rest of the time. It’s similar to Zeit’s Now v1 Docker hosting platform which sadly is no longer accepting signups from new users.
The new `datasette publish cloudrun` command was contributed by Romain Primet (#434) and publishes selected databases to a new Datasette instance running on Google Cloud Run.
See Publishing to Google Cloud Run for full documentation.
register_output_renderer plugins
Russ Garrett implemented a new Datasette plugin hook called `register_output_renderer` (#441) which allows plugins to create additional output renderers in addition to Datasette’s default `.json` and `.csv`.

Russ’s in-development datasette-geo plugin includes an example of this hook being used to output `.geojson` automatically converted from SpatiaLite.
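To give a feel for the shape of such a plugin, here is an illustrative sketch of mine; the exact hook signature and the callback's return format are defined in the plugin documentation, so treat the details below as assumptions:

```python
import json

# Sketch of an output renderer plugin. The hook registers a file extension
# plus a callback; the callback describes the HTTP response to send.
def render_demo(args, data, view_name):
    # Receives the query string arguments, the page data and the view name
    return {
        "body": json.dumps({"rows": data.get("rows", []), "view": view_name}),
        "content_type": "application/json",
    }

def register_output_renderer(datasette):
    # Registering "demo" would make URLs like /mydb/mytable.demo
    # hand their data to render_demo
    return {"extension": "demo", "callback": render_demo}

response = render_demo({}, {"rows": [["a", 1]]}, "table")
print(response["content_type"])  # application/json
```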
Medium changes
- Datasette now conforms to the Black coding style (#449)—and has a unit test to enforce this in the future
- New Special table arguments:
  - `?columnname__in=value1,value2,value3` filter for executing SQL IN queries against a table, see Table arguments (#433)
  - `?columnname__date=yyyy-mm-dd` filter which returns rows where the specified datetime column falls on the specified date (583b22a)
  - `?tags__arraycontains=tag` filter which acts against a JSON array contained in a column (78e45ea)
  - `?_where=sql-fragment` filter for the table view (#429)
  - `?_fts_table=mytable` and `?_fts_pk=mycolumn` querystring options can be used to specify which FTS table to use for a search query—see Configuring full-text search for a table or view (#428)
- You can now pass the same table filter multiple times—for example, `?content__not=world&content__not=hello` will return all rows where the content column is neither `hello` nor `world` (#288)
- You can now specify `about` and `about_url` metadata (in addition to `source` and `license`) linking to further information about a project—see Source, license and about
- New `?_trace=1` parameter now adds debug information showing every SQL query that was executed while constructing the page (#435)
- `datasette inspect` now just calculates table counts, and does not introspect other database metadata (#462)
- Removed `/-/inspect` page entirely—this will be replaced by something similar in the future, see #465
- Datasette can now run against an in-memory SQLite database. You can do this by starting it without passing any files or by using the new `--memory` option to `datasette serve`. This can be useful for experimenting with SQLite queries that do not access any data, such as `SELECT 1+1` or `SELECT sqlite_version()`.
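Those data-free queries are the same ones you can try directly with Python's `sqlite3` module and an in-memory database:

```python
import sqlite3

# Same idea as "datasette serve --memory": a database with no stored data,
# still useful for evaluating SQL expressions
conn = sqlite3.connect(":memory:")

print(conn.execute("SELECT 1 + 1").fetchone()[0])  # 2
print(conn.execute("SELECT sqlite_version()").fetchone()[0])  # e.g. 3.39.4
```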
Small changes
- We now show the size of the database file next to the download link (#172)
- New `/-/databases` introspection page shows currently connected databases (#470)
- Binary data is no longer displayed on the table and row pages (#442—thanks, Russ Garrett)
- New show/hide SQL links on custom query pages (#415)
- The `extra_body_script` plugin hook now accepts an optional `view_name` argument (#443—thanks, Russ Garrett)
- Bumped Jinja2 dependency to 2.10.1 (#426)
- All table filters are now documented, and documentation is enforced via unit tests (2c19a27)
- New project guideline: master should stay shippable at all times! (31f36e1)
- Fixed a bug where `sqlite_timelimit()` occasionally failed to clean up after itself (bac4e01)
- We no longer load additional plugins when executing pytest (#438)
- Homepage now links to database views if there are fewer than five tables in a database (#373)
- The `--cors` option is now respected by error pages (#453)
- `datasette publish heroku` now uses the `--include-vcs-ignore` option, which means it works under Travis CI (#407)
- `datasette publish heroku` now publishes using Python 3.6.8 (666c374)
- Renamed `datasette publish now` to `datasette publish nowv1` (#472)
- `datasette publish nowv1` now accepts multiple `--alias` parameters (09ef305)
- Removed the `datasette skeleton` command (#476)
- The documentation on how to build the documentation now recommends `sphinx-autobuild`