Weeknotes: a staging environment, a Datasette alpha and a bunch of new LLMs
6th August 2024
My big achievement for the last two weeks was finally wrapping up work on the Datasette Cloud staging environment. I also shipped a new Datasette 1.0 alpha and added support to the LLM ecosystem for a bunch of newly released models.
- A staging environment for Datasette Cloud
- Datasette 1.0a14
- Llama 3.1 GGUFs and Mistral for LLM
- Blog entries
- Releases
- TILs
A staging environment for Datasette Cloud
I’m a big believer in investing in projects to help accelerate future work. Having a productive development environment is critical for me—it’s why most of my projects start with templates that give me unit tests, continuous integration and a deployment pipeline from the start.
Datasette Cloud runs Datasette in containers hosted on Fly.io. When I was first putting the system together I got a little lazy—while it still had minimal user activity I could get away with iterating on the production environment directly.
That’s no longer a responsible thing to do, and as a result I found my speed of iteration dropping dramatically. Deploying new user-facing Datasette features remained productive because I could test those locally, but the systems that interacted with Fly.io in order to launch and update containers were a different story.
It was time to invest in a staging environment—which turns out to be one of those things that gets harder to set up the longer you leave it. I should add it to my list of PAGNIs—Probably Are Gonna Need Its. There ended up being all sorts of assumptions baked into the system that hard-coded production domains and endpoints.
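As a rough illustration of the kind of change this involves, here is a minimal Python sketch that reads the domain from the environment instead of hard-coding it. This is not the actual Datasette Cloud code: the `BASE_DOMAIN` variable, the `EnvironmentConfig` class and the `/api/` path are all hypothetical names chosen for the example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentConfig:
    # Hypothetical settings object; field names are assumptions for this sketch
    base_domain: str
    api_url: str


def load_config(environ=None):
    """Build configuration from environment variables rather than constants."""
    environ = os.environ if environ is None else environ
    # BASE_DOMAIN is an assumed variable name, with a placeholder default
    base_domain = environ.get("BASE_DOMAIN", "example.test")
    return EnvironmentConfig(
        base_domain=base_domain,
        api_url=f"https://www.{base_domain}/api/",
    )
```

With something like this in place, staging and production can run identical code and differ only in the values they are launched with.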
It took longer than expected, but the staging environment is now in place. I’m really happy with it.
- It’s a full clone of the production environment, replicating all aspects of production in a separate Fly organization with its own domain names, API keys, S3 buckets and other configuration.
- Continuous integration and continuous deployment continue to work. Any code pushed to the main branch of either of the core repositories for Datasette Cloud will be deployed to both production and staging... unless staging is configured to deploy from a branch instead, in which case I can push experimental code to that branch and see it running in the staging environment without affecting production.
- I added a feature to help me iterate on the end-user Datasette containers as well: I can now launch a new space and configure that to deploy changes made to a specific branch. This means I can rapidly test end-user changes in a safe, isolated environment that otherwise exactly mirrors how production works.
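The branch rule described above can be sketched as a small Python function. This is only my reading of the rule, not the real deployment code: the function name and the `staging_branch` parameter are hypothetical.

```python
def environments_to_deploy(pushed_branch, staging_branch="main"):
    """Return the environments that should deploy after a push.

    Production always follows main. Staging follows main too, unless it
    has been configured (staging_branch) to follow another branch.
    """
    targets = []
    if pushed_branch == "main":
        targets.append("production")
    if pushed_branch == staging_branch:
        targets.append("staging")
    return targets
```

Under this sketch a push to `main` deploys to both environments by default, while pointing staging at an experimental branch means pushes to that branch reach staging only.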
There are three key components to how Datasette Cloud works:
- A router application, written in Go, which handles ALL traffic to *.datasette.cloud and decides which underlying container it should be routed to. Each Datasette Cloud team gets its own dedicated container under that team’s selected subdomain. Fly.io can scale containers to zero, so routed requests can cause a container to be started up if it’s not already running.
- A Django application responsible for the www.datasette.cloud site. This is the site where users sign in and manage their Datasette Cloud spaces. It also offers several different APIs that the individual Datasette containers can consult for things like permission checks and configuring additional features.
- The Datasette containers themselves. Each space (my term for a private team instance) gets their own container with their own encrypted volume, to minimize the chance of accidental leakage of data between different teams and ensure that performance problems in one space don’t affect others. These containers are launched and updated by the Django application.
The staging environment means that any of these three can now be aggressively iterated on without any fear of breaking production. I expect it to dramatically increase my velocity in iterating on improvements to how everything fits together.
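To make the router's first decision concrete, here is a minimal Python sketch of mapping a request's Host header to a team subdomain. The real router is written in Go and certainly does more than this; `team_for_host` and its behaviour for `www` are assumptions made for the example.

```python
from typing import Optional

BASE_DOMAIN = "datasette.cloud"


def team_for_host(host: str, base_domain: str = BASE_DOMAIN) -> Optional[str]:
    """Return the team subdomain for a Host header, or None if there is none."""
    # Drop any port, normalize case and strip a trailing dot
    hostname = host.split(":", 1)[0].lower().rstrip(".")
    suffix = "." + base_domain
    if not hostname.endswith(suffix):
        return None
    subdomain = hostname[: -len(suffix)]
    # Assumption: www belongs to the Django site, and nested subdomains
    # are not team containers
    if not subdomain or "." in subdomain or subdomain == "www":
        return None
    return subdomain
```

Passing `base_domain` as a parameter rather than relying on the constant is what lets the same logic serve a staging domain.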
Datasette 1.0a14
I published some annotated release notes for this yesterday. It represents several months of accumulated work, much of it by Alex Garcia. It’s already running on Datasette Cloud, which is a useful testing ground for driving improvements to Datasette itself.
Llama 3.1 GGUFs and Mistral for LLM
Llama 3.1 came out on the 23rd of July and represented a huge milestone in openly licensed LLMs: the first openly licensed model that can credibly claim to compete with GPT-4.
It came in three sizes: 405B (the GPT-4 competitor), 70B and 8B.
LLM’s llm-llama-cpp plugin has been in desperate need of an upgrade for quite a while—the plugin has some bad design decisions baked in that coupled it to the Llama 2 era.
Rather than fix that up I ended up building a brand new plugin: llm-gguf, which takes advantage of the latest features in the llama.cpp ecosystem.
Most importantly, this means LLM can run Llama 3.1 8B. The following command downloads a 4.6GB model file and uses it to run a prompt:
llm install llm-gguf
llm gguf download-model \
https://huggingface.co/lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
--alias llama-3.1-8b-instruct --alias l31i
llm -m l31i "five great names for a pet lemur"
I ran that just now and got:
Here are five great name suggestions for a pet lemur:
- Loki: A mischievous and playful name that suits a lemur’s curious and energetic nature. In Norse mythology, Loki is known for his cunning and trickster ways, which makes him a fitting name for a lemur.
- Zuzu: A whimsical and exotic name that captures the lemur’s unique charm. Zuzu is a nod to the magical and mysterious, evoking the idea of a lemur as a tiny, furry sorcerer.
- Finnley: A charming and adventurous name that suits a lemur’s bold and curious personality. Finnley is a playful name that suggests a sense of fun and mischief, perfect for a pet lemur.
- Kiko: A sweet and endearing name that means “little one” in Swahili. Kiko is a fitting name for a pet lemur, which is already small and adorable.
- Jinx: A fun and mischievous name that suits a lemur’s playful and energetic nature. Jinx is a nod to the idea of a lemur as a tiny, furry troublemaker, always getting into mischief and mayhem.
I hope these suggestions help inspire you to find the perfect name for your pet lemur!
Great job my laptop!
Mistral AI also released a new model which looks to be edging into that GPT-4 territory: Mistral Large 2. This one is available as open weights under a non-commercial “research” license, or you can access it for commercial purposes via the Mistral API.
The llm-mistral plugin talks to that API, and doesn’t need to be updated to handle new models since it can refresh the list of models directly from Mistral:
llm install llm-mistral
llm mistral refresh # if already installed
llm -m mistral/mistral-large-latest 'five names for a pet mountain lion'
It reminded me to get a license:
Here are five names suitable for a pet mountain lion. Remember that big cats like mountain lions need special care, licensing, and experience, so it’s important to be responsible and ensure their well-being.
- Shadow: This name represents the stealthy and elusive nature of mountain lions.
- Tawny: A name inspired by the beautiful tawny coat color of mountain lions.
- Koda: A Native American name meaning “friend” or “ally”, which is fitting for a beloved pet.
- Sierra: After the Sierra Nevada mountains, a habitat for mountain lions, this name has a wild and majestic feel.
- Leon: Derived from the Latin word for “lion”, this name signifies strength and power.
Blog entries
Plus dozens of links and quotations. My link descriptions are indistinguishable from regular blog posts now, especially since I’ve started including inline images and even videos for some of them. Here’s everything in July.
Releases
- datasette-remote-metadata 0.2a0—2024-08-05
  Periodically refresh Datasette metadata from a remote URL
- datasette 1.0a14—2024-08-05
  An open source multi-tool for exploring and publishing data
- fetch-github-issues 0.1.2—2024-07-29
  Fetch all GitHub issues for a repository
- datasette-extract 0.1a8—2024-07-26
  Import unstructured data (text and images) into structured tables
- llm-mistral 0.5—2024-07-24
  LLM plugin providing access to Mistral models using the Mistral API
- llm-gguf 0.1a0—2024-07-23
  Run models distributed as GGUF files using LLM
TILs
- Assistance with release notes using GitHub Issues—2024-08-05
- Back-dating Git commits based on file modification dates—2024-08-01
- HTML video with subtitles—2024-07-31