It isn’t uncommon for a Pull Request to be updated before CI/CD has finished for the previous version. Unfortunately, GitHub does not automatically cancel GitHub Actions runs for the stale version of the code, and those jobs keep running until they either fail or fully finish.
That’s the simple problem. The “simple trick” is adding a concurrency block to the GitHub workflow:
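A minimal version of that concurrency block looks like this (the `group` expression is one common choice of grouping key, not necessarily the exact one the author uses):

```yaml
# Cancel any still-running run of the same workflow on the same ref
# when a new commit is pushed, instead of letting stale runs finish.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```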
A variant of this that will not cancel any builds on the main branch after merging, and will also leave alone PRs carrying a ci:full label:
cancel-in-progress: ${{ github.ref != 'refs/heads/main' && !contains(github.event.pull_request.labels.*.name, 'ci:full') }}
Though depending on the action triggers,
cancel-in-progress: ${{ github.event_name == 'pull_request' }}
may be enough.
On this topic, you can use this GitHub Actions plugin to very accurately measure your CI’s emissions: https://github.com/green-coding-solutions/eco-ci-energy-estimation
It even does stuff like figure out what datacenter your CI job is running in to look up the carbon intensity of the local electricity grid.
This is a good tip in the post, the CI on one of the projects I work on runs across three VMs for a half an hour so I do feel wasteful pushing to an open PR then noticing I have an extra space somewhere or minor typo in a comment or something.
I guess every little bit counts. This seems to be possible one way or another with most CI products: CircleCI (disclaimer: my employer), Gitlab, and Forgejo/Gitea all appear to implement this.
[Comment removed by author]
Would it be reasonable for GitHub to make this default behavior?
No, but it should be optional. Gitlab supports this: https://docs.gitlab.com/ee/ci/pipelines/settings.html#auto-cancel-redundant-pipelines.
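Besides the project-level setting linked above, GitLab also exposes this per job via the `interruptible` keyword; a sketch (job name and script are made up):

```yaml
# .gitlab-ci.yml: mark the job as safe to auto-cancel when a newer
# pipeline starts on the same branch (takes effect together with the
# project's "auto-cancel redundant pipelines" setting).
build:
  interruptible: true
  script:
    - make build
```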
There are some cases where I do want different commits on the same branch/pull request to run all actions. For example, if I am updating workflow dependencies for a workflow that normally runs on merges to main, I will temporarily make them run on my branch to test/validate. I make two commits—one with the updates, and another that runs the workflow on my branch. Once I have validated the change, I remove the last commit.
It is optional, using exactly the process described in the linked article!
Along the same lines, does anyone know how to cache previous steps, the way Docker image layers work? I always have
npm install
-like commands that don’t change very often.
You can use the GitHub Actions cache.
I would like to know what others do as well
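As a sketch of the GitHub Actions cache approach for npm (the `path` and `key` here are typical choices, not taken from anyone’s actual workflow):

```yaml
# Restore/save the npm download cache across runs, keyed on the
# lockfile, so pushes that don't touch dependencies skip most of
# the install work.
- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
    restore-keys: npm-${{ runner.os }}-
- run: npm ci
```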
What I do is to build docker images of rarely changing stuff offline. The version strings of the docker images are in the repo, so it works across branches
This works fine; it saves several minutes on every job of every run.
But it is kinda coarse-grained, and Docker is not very expressive; it requires tons of shell scripts for even basic uses, IME.
GitHub actions has a cache feature, but I don’t use it because we also run on sourcehut (with podman)
I don’t want to be too tied to GitHub.
Also, iirc it is a clunky yaml thing, sorta like docker is a clunky dockerfile thing
I do this by simply using Docker. GitHub comes with a container registry that’s free for open-source projects. The LLVM builds, which don’t fit on the free runners, we do on Cirrus CI (which also supports FreeBSD and AArch64 nicely) on each commit and that produces an artefact. We then run another job periodically that builds a new version of our dev container. We have the metadata for it so that things like VS Code / GitHub Code Spaces can use it automatically and we use the same container in CI. Pulling it from the GitHub container registry takes around 15 seconds. I presume some of the common things (like the Ubuntu base layer) are cached very near the CI runners.
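That setup can be sketched like this (the image name and tag are hypothetical; GitHub’s `container:` key runs the whole job inside the prebuilt image, so only the short registry pull is paid per job):

```yaml
# Run the job inside a prebuilt dev container pulled from the GitHub
# container registry; the toolchain is baked into the image, so per-job
# cost is just the pull. Image name below is an assumption.
jobs:
  build:
    runs-on: ubuntu-latest
    container: ghcr.io/example-org/dev-container:v42
    steps:
      - uses: actions/checkout@v4
      - run: make test
```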
I do the same thing, and I found it to be very fast as well – 10-15 seconds pull time. (Actually, you can see here http://op.oilshell.org/uuu/github-jobs/8220/ – some are as low as 8 seconds.)
On the other hand, I run the same thing in sourcehut, and the pull times are 38 to 51 seconds. And I believe it was more like 90 seconds at one point. (http://op.oilshell.org/uuu/sourcehut-jobs/git-5eaf087a737085f643d41715eb7bb7fbfcdfbb5f/)
So I think there is basically a cloud lock-in effect – Github probably runs on AWS, and Cirrus CI probably runs on AWS too.
And I use the regular docker.io registry now, which is probably on AWS
That kinda annoys me – Docker + these CI systems are a pretty suboptimal software architecture, but it’s just brute-forced with hardware on 1 or 2 cloud providers.
It certainly works and saves time, but if I had more time, I would try to get rid of it. It is a long term goal.
I guess it’s a little like the BitKeeper debate – open source can’t really function without proprietary cloud systems (CI being a critical service), because it’s economical to use them
This is great. I know about cancel-in-progress but it’s always something I forget to set up. Thank you for this resource and for the reminder.
Another one is to squash your db migrations up to N days/weeks/months ago. I’ve worked on projects where it took multiple minutes per build to run all db migrations. You’re probably not going to migrate down to whatever the database looked like 3 years ago, so why not squash them? You can put the old migration files in a separate directory if you care about knowing how the schema evolved over time.
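As one concrete example, Django has this built in (the app label and migration name below are made up):

```shell
# Collapse all of the "accounts" app's migrations up to 0042 into a
# single squashed migration; the old files can then be archived.
python manage.py squashmigrations accounts 0042
```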
I agree with the practice, but that is not a “simple trick,” depending on your migration system.
This one and a few others are what I wrote about earlier: https://ashishb.net/tech/common-pitfalls-of-github-actions/