In large part because, at GitHub’s size, I worry much less about private equity enshittifying it.
I would argue that the little ad for Copilot that now appears on GitHub file pages is the first step of the enshittification cycle. “We need to maximize engagement with Copilot. How do we do it? Oh, I know, we can add a CTA to the content pages…” is the first step on the road to ruin. Hell, the whole homepage is just another social media feed at this point.
Hell, the whole homepage is just another social media feed at this point.
Yeah, this has driven me nuts – I was a regular user of the homepage to track relevant activity (people I follow, people doing things to my own repos), and the new feed design has more or less taken that away.
I hated the new homepage too but I realized if I change the filter (in the top-right, no need to dig in settings) to only include Sponsors, Stars, Repositories, and Follows, it becomes much more like the old homepage.
Reminds me of how the Windows widgets have a news feed you can’t disable, and they recently updated it to show fewer widgets and more of the news feed.
Self-plug: I’m maintaining a filter list to hide Microsoft GitHub’s UI garbage if you’d like it to be less shitty.
https://sr.ht/~toastal/github-less-social/
I don’t know how or why this exists, but a slightly wonky variant of the old linear feed can be viewed at https://github.com/dashboard-feed
I assume the old page was pulling that in with AJAX and it will go away once they are confident all the old opened dashboards are closed.
Hell, the whole homepage is just another social media feed at this point.
If you’re frustrated by the home feed changes (like I am), please give feedback on their discussions board, upvote relevant threads, and encourage others to do the same. In the meantime, there are userscripts and extensions you can use to revert it locally.
These half-assed programming languages in YAML are tremendously frustrating. I keep wondering how much better and simpler things would be if, instead, it were an actual programming language with a declarative bent. The thing is too weak to be fully useful on its own and has to embed shell scripts anyway; at that point, why not have an Elm-like language, except that instead of integrating an event stream into a web application, it simply maps an Event into a Job? The entire thing would be completely typed, there would be a “standard library” of utilities which you can actually look up, and you could define your own common utilities and variables.
And you can run the compiler locally so you at least know what you wrote is not complete nonsense, even if your shell scripts may be broken.
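A minimal sketch of that idea in TypeScript (standing in for the Elm-like language; all the types and helpers here are hypothetical):

// A hypothetical, fully-typed CI layer: the whole pipeline is one
// function from Event to Job, checkable by a local compiler.
type Event =
  | { kind: "push"; branch: string }
  | { kind: "pull_request"; targetBranch: string };

interface Step { name: string; command: string[] }
interface Job { runsOn: "ubuntu-latest" | "macos-latest"; steps: Step[] }

// Your own common utilities and variables live in ordinary code.
const test: Step = { name: "test", command: ["./ci/test.sh"] };

export function pipeline(event: Event): Job {
  switch (event.kind) {
    case "push":
      return { runsOn: "ubuntu-latest", steps: [test] };
    case "pull_request":
      return {
        runsOn: "ubuntu-latest",
        steps: [test, { name: "lint", command: ["./ci/lint.sh"] }],
      };
  }
}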
It’s worse than that. Because the YAML-based language is so anaemic, the extension language (the thing that you use to define new functions) is JavaScript. So you already have a dependency on V8 and Node.js to be able to actually run any actions. At that point, I don’t know why they don’t just expose the actions as modules and ask you to write TypeScript to drive the CI. I’d much rather use TypeScript with a well-defined API than YAML.
Because not all actions are in JS. There are plenty in shell, and IIRC, Go actions are also supported.
Not all are JavaScript, but a bunch of the core ones are. You can run github-act-runner without Node.js, but you have to provide your own checkout command because that’s JavaScript. This means that you basically have an unavoidable Node.js dependency. If you’re going to do that, why not make TypeScript the scripting layer? If people want to provide actions in other languages, they can do so as Node native plugins.
It’s not just about having a NodeJS dependency. It’s also about maintaining one’s own actions - and having other options than JS there is a good thing. I do not want to write any kind of JS for the actions I maintain. Not even glue code. I can do that right now, and that’s fine. I’m ok with having a NodeJS dependency, if I don’t need to touch any JS code.
If anything, I’d love to see more languages supported, perhaps even a generic way to provide pre-compiled actions, so I wouldn’t need to resort to publishing my precompiled actions as a docker container, and then use that from within action.yml (which limits the platforms I can run it on, among other things).
It’s not just about having a NodeJS dependency. It’s also about maintaining one’s own actions - and having other options than JS there is a good thing
Totally agreed.
I do not want to write any kind of JS for the actions I maintain.
Okay…
Not even glue code. I can do that right now, and that’s fine. I’m ok with having a NodeJS dependency, if I don’t need to touch any JS code.
Instead, you need to touch an absolutely awful DSL in YAML to use your action. There’s nothing preventing the actions glue code from allowing you to export some API description and generating the TypeScript exports from that so you don’t need to write TypeScript to write a new action, but if they ditched the YAML and exposed the JavaScript VM (that must be there anyway) for scripting then the interface would be a lot nicer.
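For illustration, the generated result for some hypothetical action could look like this (none of these names are a real API):

// Hypothetical output of a generator fed by an action's input/output schema.
export interface SetupToolchainInputs {
  version: string;   // required, per the schema
  cache?: boolean;   // optional, defaulted by the action
}

export interface SetupToolchainOutputs {
  toolchainPath: string;
}

// Typed in, typed out; misspelling an input is now a compile error.
export declare function setupToolchain(
  inputs: SetupToolchainInputs,
): Promise<SetupToolchainOutputs>;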
Instead, you need to touch an absolutely awful DSL in YAML to use your action.
I don’t find the DSL awful. Mind you, I use very little of it, because 99% of the logic I need for my builds is in actions. So my workflows look like:
steps:
- name: foo
  uses: my-actions/something@version
  with:
    param1: bar
Repeat something similar a few more times, done. There’s pretty much no shell steps, nor anything complicated, just a pretty flat and stupid yaml. Not sure how JS would make this better?
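For comparison, a sketch of that same flat step list as TypeScript, assuming a hypothetical defineWorkflow API and a typed wrapper per action (both imports are made up):

// Hypothetical typed equivalent of the YAML above.
import { defineWorkflow } from "ci-framework";
import { something } from "my-actions/something";

export default defineWorkflow("foo", async () => {
  // uses: my-actions/something@version with param1: bar;
  // a typo in param1 would be a compile error here.
  await something({ param1: "bar" });
});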
Something like that, I guess? I’ll stick with the yaml, thanks. I prefer my workflows to be declarative, rather than code. YAML isn’t perfect, but for declarative things, I think it’s a better fit than most programming languages.
Now, if you want the full power of a proper language, and don’t care if your workflows aren’t declarative, then yeah, yaml is horrible. I’m not in that camp, so yaml is fine for my use cases.
if they ditched the YAML and exposed the JavaScript VM (that must be there anyway) for scripting then the interface would be a lot nicer.
From what I can tell after browsing the runner source, it just shells out to nodejs like for any non-js action.
I think you are probably in a minority in being able to express your builds in purely declarative YAML. The YAML dialect that Actions supports includes a lot of imperative constructs for conditional execution, all of which are handled by consuming a string in YAML and evaluating yet another DSL.
Indeed, I probably am. Nevertheless, a CI system can provide multiple ways to specify a workflow, so those of us who can express workflows in a purely declarative manner can do so, while those who need a proper programming language can have that too.
I wouldn’t hold my breath for GitHub Actions to ever get there.
The problem I have is that you’re not really expressing a declarative CI pipeline, you’re expressing an imperative CI pipeline in custom actions and then using their weird language to provide configuration for that pipeline. With a JavaScript / TypeScript scripting language, there’s nothing stopping you from exposing a single entry point to your custom actions that consume a JSON document and keeping the declarative structure in the JavaScript. In both cases, you’re using a declarative subset of an imperative language to drive an imperative language.
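A sketch of that shape, with the declarative part kept as plain data and one small imperative entry point (dispatch and the step names are hypothetical):

// The declarative structure survives as a plain object...
const pipeline = {
  steps: [
    { action: "checkout", with: {} },
    { action: "my-actions/something", with: { param1: "bar" } },
  ],
};

// ...and a single imperative driver consumes it.
declare function dispatch(action: string, inputs: object): Promise<void>;

export async function run(p: typeof pipeline): Promise<void> {
  for (const step of p.steps) {
    await dispatch(step.action, step.with);
  }
}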
They’re best used by ignoring them as much as possible, calling out to a shell script or makefile or whatever your tool of choice is. Treat them as an entry point into something acceptable.
And then you can run that acceptable thing locally, or in some other CI, or anywhere else you want.
Thanks for writing this up. I’ve been thinking the same thing recently: we’ve got such great tooling for programming, but the DevOps and CI world is riddled with ugly, untyped YAML. It’s just sad.
Nickel & Dhall can save the day. Why these fully-typed languages aren’t the preferred native formats, with YAML read in only for users opting into the worse experience, is beyond me.
Nickel literally just reached 1.0, and Dhall’s type system is non-gradual. Just a couple of reasons why they might not be mainstream yet.
Yet still a better experience than YAML 🙂
I’ve been using Lua for a while, and I am progressively moving more and more into Lua. At this rate I don’t think I will ever go to another format as long as I live. It’s too OP
There’s a good number of vendors attempting to create their own languages for specific niches. What’s more interesting to me is novel application of existing languages. E.g. what if GitHub actions just invoked an entry point you defined, then your program (written in the same popular language you’re already familiar with) calls back to an API to define the next tasks and jobs.
Lol, perfect description of how every new workflow file gets created.
I want to mention act, a terribly-named tool for running GH Actions locally prior to push. It does not always work, but sometimes it can save time when debugging Actions.
There’s also gha-runner, which Pernosco (the rr debugger developers) have developed and use to run GA workflows on AWS.
By the way, you can just write plain-ass shell scripts and run them in your CI. They run locally, giving you PHP-like productivity where you iterate 50 times a minute, not one iteration every 5 or 50 minutes :)
Our github config just lists 11 or so jobs:
Same with sourcehut:
https://github.com/oilshell/oil/tree/master/.builds
Then we have tens of thousands of lines of tests and benchmarks, that run in both Github and sourcehut, in containers:
Our subdomain name is funny – http://travis-ci.oilshell.org/github-jobs/ – and proves that the generality was not speculative! We moved off Travis CI when it rotted after getting acquired. It was literally zero disruption, since we can always run in multiple CIs. (Obviously we should rename that domain.)
Now I understand that some people look at 20,000 lines of shell and get scared.
(1) Well first, this is a huge amount of functionality, which not every project has. All these tests, benchmarks, metrics, docs, etc. running at every commit:
https://www.oilshell.org/release/0.18.0/quality.html
e.g. Python and C++ linting, Python/C++/shell unit tests, Address sanitizer, UBSAN, Clang coverage, comparing GCC/Clang, cachegrind, uftrace, bloaty, etc. – basically every kind of code quality you can think of
And HTML results to browse
(2) The point of https://www.oilshell.org is to make 20,000 lines of shell less scary – more like a real programming language, which is necessary for CI
We even have Hay Ain’t YAML so you can interleave shell and declarative JSON-like data seamlessly
It’s all parsed statically so you get your code syntax errors and your data syntax errors all at once.
I showed that in the recent thread on YAML – there is no “else-me-no-care” language anti-pattern. And again no textual pasting of languages together.
There are a lot of things I don’t like in GitHub Actions. Most other CI systems I’ve used have a client-server architecture where the runners are dumb and just run commands. GitHub Actions is rebadged Azure Pipelines and it has a model of a fat agent on the machine that runs the entire pipeline. This is good for a cloud provider because each runner is just a pristine VM but that has a bunch of knock-on effects, most notably:
- It’s hard to bring up a new platform, because you need to port the runner (which depends on both .NET and Node.js).
- It’s inherently stateless, so caching layers need to be built on top.
The nicest CI system I’ve seen used ZFS snapshots and clones for each build. When you wanted a new build, it would find the newest snapshot in your git history, clone that, and then pull your changes on top and do an incremental build. You could then do periodic checks for clean builds (and use clean builds for releases) but if you had a small bug fix on a previous commit then it ran that almost instantly. You can build this kind of thing out of containers on GitHub Actions, but it requires building all of that infrastructure yourself, in a way that likely breaks a load of assumptions of other Actions.
Pretty great writeup, thanks!
I maintain one GitHub action, and testing it is a nightmare for the reasons outlined in “debugging like I’m 15 again”. I can’t run it locally. My action queries GitHub to get the state of various repos, and I can’t mock that either, so testing is pretty much manual, eyeballing what could be wrong. The whole ecosystem could be made so much better…
Honestly, I still don’t understand how GitHub managed to build something like GitHub Actions after seeing what GitLab CI does. GitLab CI is not perfect by any means, and it never will be, but the non-trivialities of such a CI system are almost infinite, and building something secure, performant, and widely usable requires a pretty involved thought process - little of which went into GitHub Actions.
Honestly I still don’t understand how GitHub managed to build something like GitHub Actions
They didn’t. Azure DevOps built Azure Pipelines. GitHub rebranded it and made small tweaks but the GitHub Actions Runner is a fork of the Azure Pipelines Agent. Azure Pipelines was designed to be tightly integrated with Azure and easy to add your own Azure Scale Sets instead of the hosted versions (but pay MS either way).
Azure DevOps built Azure Pipelines. GitHub rebranded it and made small tweaks but the GitHub Actions Runner is a fork of the Azure Pipelines Agent.
This is true.
Azure Pipelines was designed to be tightly integrated with Azure and easy to add your own Azure Scale Sets instead of the hosted versions (but pay MS either way).
This is false. All the Azure-attach stuff was added much later, and was not part of the core design.
Azure Pipelines, née “Build vNext”, was designed as a reaction against the “bet the farm on Windows-only tech” in the previous TFS build system(s). The explicit goal was to build a platform-agnostic, fairly-dumb agent orchestrated by a cloud service. The first version of the vNext agent was written in Node to help provide that separation from the “MS ecosystem”. (We did take a bet on .Net Core when we productized the system, leading to the downsides you mentioned in another thread.)
I’m not really objecting to your implicit critique of the result, but I did want to correct the record of facts. Source: I was the PM to own the platform pieces after the original PM who helped design it got promoted to, effectively, architect. I think you and I even corresponded when we were trying to get a BSD fork of the agent going.
Thanks. I’m a bit confused by this:
The explicit goal was to build a platform-agnostic, fairly-dumb agent orchestrated by a cloud service.
I don’t see how you get from that as a goal to the code that shipped. It took a load of dependencies on not-very-portable things, but it was also not written in a portable style. For example, there was no platform abstraction layer to centralise the things that differed between platforms. Things like ‘is the filesystem case sensitive’ are done on explicit ‘is the OS Linux’ checks scattered throughout the codebase so porting to a new platform required a load of cleanup. This is the kind of thing I’d expect in code written for one platform followed by a quick-and-dirty port to something else, not something designed to be platform agnostic.
I’m also not sure how you build something ‘fairly dumb’ that’s so large. Competing systems just expose a simple command runner (which can just be SSH on *NIX systems) and run the scriptable bits centrally. The decision to run all of the plugins on the agent on the target made things a lot more complex.
I think Azure Functions launched after this, which is a shame because it would have been nice to build a richer runner that runs as a FaaS task and just talks via fairly dumb command pipe to the clients.
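To make the case-sensitivity example concrete, the contrast is roughly this (illustrative TypeScript, not the agent’s actual code):

import os from "node:os";

// The scattered style described above: an explicit OS check at each use site.
function normalizeScattered(path: string): string {
  return os.platform() === "linux" ? path : path.toLowerCase();
}

// With a platform abstraction layer, the check lives in one place,
// and a new port only has to fill in one record.
interface Platform { caseSensitiveFilesystem: boolean }

const current: Platform = {
  // Assumption for illustration: default macOS/Windows filesystems are
  // case-insensitive, most others case-sensitive.
  caseSensitiveFilesystem: os.platform() !== "win32" && os.platform() !== "darwin",
};

function normalize(path: string, p: Platform = current): string {
  return p.caseSensitiveFilesystem ? path : path.toLowerCase();
}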
It took a load of dependencies on not-very-portable things
That’s partly a function of taking a bet on .Net Core when it was in its infancy. I wasn’t there for that part, but was told that it wasn’t a technical decision, if you catch my drift.
but it was also not written in a portable style. For example, there was no platform abstraction layer to centralise the things that differed between platforms. Things like ‘is the filesystem case sensitive’ are done on explicit ‘is the OS Linux’ checks scattered throughout the codebase so porting to a new platform required a load of cleanup.
No disagreement from me on this. I’ve always chalked it up to a misunderstanding, or perhaps internal disagreement, on what “cross platform” was supposed to mean. To be fair, though, I’m nearly certain that “portable” was not a design goal. I probably shouldn’t have said platform-agnostic, when the truth is closer to “runs on the three platforms customers asked for at the time”.
I’m also not sure how you build something ‘fairly dumb’ that’s so large. Competing systems just expose a simple command runner (which can just be SSH on *NIX systems) and run the scriptable bits centrally.
I agree the result as-implemented is complex, but disagree that the design is large. The idea was to put all the brains in the (user-selectable, marketplace-delivered) tasks. The agent only does a few things: download task packages from the orchestrator, execute them in the order told, and ship some info (both streamed and batched) back up to the orchestrator.
Compounding the early places where the implementation is complex, the design also got more complex as it accreted features. I guess that’s inevitable in a product that has both competitors to compete with and the aforementioned non-technical pressures.
I feel the same about all hosted CI. Programming in YAML, with minutes-long latency, is awful.
I often wonder how much nicer workflows would have been to write if they were written in HCL, like GitHub first set out to do before switching to YAML for the GA version.
I was too late to adopt GHA to try it out :/
It’s infuriating when I git push a workflow that silently fails because of invalid YAML, especially when I then merge that workflow’s branch under the mistaken impression that the workflow is passing, rather than not running at all.
I think GitHub should provide a way to check this on their end, but one thing that will also help on the user’s end of things, and that I generally recommend as a best practice anyway, is to set up pre-commit and turn on some of the built-in hooks. There are checkers in there which can reject syntactically-invalid files of several formats: JSON, TOML, XML, YAML.
Try the check-jsonschema pre-commit hook.
One of my favorite prototypes I built was to take Temporal and build a CI system in it: have it clone a git repo, run a build command, run tests, parse the results, and post them to GitHub as annotations. The most annoying part was that Python (which I was doing the prototype for) doesn’t expose its unit test runners as a library.
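The prototype was Python, but for flavour, roughly the same shape in Temporal’s TypeScript SDK (the activity names and module are made up):

// A CI pipeline as a Temporal workflow; each step is a retryable activity.
import { proxyActivities } from "@temporalio/workflow";
import type * as activities from "./activities"; // hypothetical activities module

const { cloneRepo, runBuild, runTests, postAnnotations } =
  proxyActivities<typeof activities>({ startToCloseTimeout: "10 minutes" });

export async function ciWorkflow(repoUrl: string, sha: string): Promise<void> {
  const workdir = await cloneRepo(repoUrl, sha); // clone the git repo
  await runBuild(workdir);                       // run the build command
  const results = await runTests(workdir);       // run the tests, parse results
  await postAnnotations(repoUrl, sha, results);  // report back to GitHub
}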
But it took two days to prototype from scratch and get a real program. I’ve spent that long futzing with YAML to try to get it working. I’m not convinced that dedicated CI systems even make sense anymore after that.
Is https://www.temporal.io/ the Temporal you’re talking about?
Yes. It’s a remarkably useful system.
I find https://github.com/rhysd/actionlint, which catches a heck of a lot of common errors, invaluable.
Use GitLab, where there’s a lot less magic involved in the CI setup. For a lot of cases you can simply run the commands in the specified Docker container.
Give us a repository setting to reject commits with obviously invalid workflows
Because GitHub fails to distinguish between fork and non-fork SHA references, forks can bypass security settings on GitHub Actions that would otherwise restrict actions to only “trusted” sources
IIUC this is typically solved with if: github.repository == 'my-user/my-repo-name'.
Forks can remove the if, so that doesn’t help much.
That’s … a very good point. I’ll have to ask my colleagues why this pattern was introduced; maybe it’s for some other reason.
It’s useful when you can trust your forks, to skip running things there that would fail on a fork. It’s useless as a security measure.
Use GitLab CI.