Changelog Interviews – Episode #596
Securing GitHub
with Jacob DePriest, VP and Deputy Chief Security Officer at GitHub
Jacob DePriest, VP and Deputy Chief Security Officer at GitHub, joins the show this week to talk about securing GitHub. We cover Artifact Attestations, profile hardening, preventing XZ-like attacks, GitHub Advanced Security, code scanning, improving Dependabot, and more.
Sponsors
Socket – Secure your supply chain and ship with confidence. Install the GitHub app, book a demo or learn more
Neon – Fleets of Postgres! Enterprises use Neon to operate hundreds of thousands of Postgres databases: Automated, instant provisioning of the world’s most popular database.
Cronitor – Cronitor helps you understand your cron jobs. Capture the status, metrics, and output from every cron job and background process. Name and organize each job, and ensure the right people are alerted when something goes wrong.
Fly.io – The home of Changelog.com — Deploy your apps and databases close to your users. In minutes you can run your Ruby, Go, Node, Deno, Python, or Elixir app (and databases!) all over the world. No ops required. Learn more at fly.io/changelog and check out the speedrun in their docs.
Chapters
Chapter Number | Chapter Start Time | Chapter Title | Chapter Duration |
1 | 00:00 | This week on The Changelog | 01:46 |
2 | 01:46 | Sponsor: Socket | 03:42 |
3 | 05:28 | Let's talk GitHub security | 02:43 |
4 | 08:11 | The responsibility of security | 05:40 |
5 | 13:51 | Securing change of ownership | 02:48 |
6 | 16:39 | Applying Attestation to XZ | 01:26 |
7 | 18:05 | XZ-like attacks are scary | 03:52 |
8 | 21:57 | The challenge of the defender | 06:43 |
9 | 28:40 | Behind code scanning | 02:54 |
10 | 31:34 | GitHub Advanced Security features | 02:05 |
11 | 33:40 | Sponsor: Neon | 05:47 |
12 | 39:27 | Dependabot signal vs noise | 01:15 |
13 | 40:42 | Attestations from a maintainer's POV | 03:11 |
14 | 43:52 | Attestation tracking the binary | 03:02 |
15 | 46:55 | Attestation goes beyond SBOM | 01:47 |
16 | 48:42 | Are SBOMs widely used? | 00:47 |
17 | 49:29 | 45-ish minutes to AI! | 05:12 |
18 | 54:41 | Proactive vs reactive security | 04:47 |
19 | 59:28 | Sponsor: Cronitor | 01:30 |
20 | 1:00:58 | AI red teams | 04:36 |
21 | 1:05:35 | Jacob's security war stories | 05:23 |
22 | 1:10:57 | Wave a magic security wand | 03:06 |
23 | 1:14:04 | GitHub as a security centerpoint | 01:43 |
24 | 1:15:46 | How to partner on security with GitHub | 08:42 |
25 | 1:24:28 | Closing thoughts from Jacob | 02:16 |
26 | 1:26:45 | Outro and what's next | 02:51 |
Transcript
Well, we’re here with the VP and Deputy Chief of Security, I should say actually Security Officer at GitHub, Jacob DePriest. Jacob, thank you for coming on the show. Definitely fans of GitHub, obviously, and securing GitHub. Can we do that, please?
Indeed. It’s great to be here. Thanks for having me.
I think it is secure though, right? Is it anti-secure? Is it fully secure?
That’s our goal on the security team, is to secure the world’s developer platform.
I think it’s a good thing. That’s why we wanted to have you on the show. We obviously had an xz attack or issue, I guess, a while back… And in a conversation on this show we talked about – we speculated, at least I did, on the role of GitHub to prevent things like that by hardening the profile of individual developer users on GitHub. I’m sure we can go deep. Where should we begin when we talk about securing GitHub? What’s a good place to begin?
I mean, starting with the developer is kind of where I always like to start. So I think that actually sounds like a great place to start. I think when we talk about open source security, we talk about the supply chain, we talk about all these things, you could start anywhere, but at GitHub we always like to start with the developer. That’s kind of our central ethos, is how to empower the developers, how to secure the developer workflow. And so that’s kind of our approach there.
Last year, about a year and a half ago we announced and started implementing what at the time was kind of a fairly controversial initiative to turn on mandatory 2FA for all contributors on github.com. And we’ve been pretty successfully rolling that out. But I think there’s other things we can do in the developer account space as well. I’m happy to dig into those things.
I think when we look at the xz attack, that’s a lot of social engineering in a way, but you also have sort of profiles on GitHub that may be nation-state based. There’s a lot of speculation around that particular attack and that scenario… Would you call it an attack, Jerod? Was it really an attack? I guess it was. It was a takeover. It was not really an attack, it was more like a social engineering takeover, and then infiltration of –
Well, you call it a supply chain attack, but it’s not via an exploit or brute force. It was via social engineering and takeover of a project, and then the ability to release new code as the owner of said project without people knowing that the ownership had changed. Yeah.
[00:08:10.10] I guess where I’m curious is how much does GitHub take the responsibility of securing profiles beyond simply “They’re secure”? As in, there seems to be nefarious actions in the profile, it doesn’t seem to be some of the social constructs and contracts of being a citizen of developer land in the world… Like, how far do you go in terms of securing proactively, and maybe your own ideas, and then how GitHub may react in the future to these kinds of things where you have profiles, and if there’s prevention at the profile level, what can be done?
Yeah, it’s an interesting question. I think that the way I kind of have been thinking about this is I think it’s a bit broader… Because when you start thinking about social engineering, you start thinking about the techniques and approaches that individuals could use in these cases, even if the profile is secure, even if there’s lots of investigation and telemetry and all those kinds of things, it’s still not necessarily obvious when something is nefarious versus when something’s a mistake. So in some ways, I think what we’re talking about – the analogy I’ve been thinking about actually is most sufficiently-sized corporations, businesses, organizations, government entities have an insider threat program. And in some ways, this is sort of like the insider threat scenario for the world, for the larger software ecosystem and how do we think about that.
And so certainly, I think developer accounts and profiles are an aspect of it, but I also think there’s an element of the broader supply chain, and things like attestations and SLSA compliance. So the framework to help secure builds, essentially. So what is going into a build, not just what’s the piece of software that was downloaded from the Python PyPI registry, or the Go registry, but what was used to build that software? Where was it built? What were the instructions that went into it, and can we know cryptographically that the software that we’re installing came from that build process, and was created by that developer?
So it wouldn’t have necessarily solved this particular challenge, I’m not sure that it would have, but it sure would have given the security researchers looking into it and trying to figure out what happened way more tools and confidence much more quickly than they would have otherwise had. And so I think this definitely has an element of as a community we’ve all got to come together and figure out what are the standards, what are the ways we want to distribute software, what are the trust signals that we can all get when we’re thinking about what software we’re going to use in our products?
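A minimal sketch of the core idea Jacob describes, tying a downloaded artifact back to a build record by its digest. It assumes a provenance statement shaped like an in-toto statement (a `subject` list with `sha256` digests); the artifact bytes, file name, and function names are hypothetical. A real verifier would also check the cryptographic signature over the statement, which this sketch leaves out.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of the artifact bytes."""
    return hashlib.sha256(data).hexdigest()


def artifact_matches_statement(artifact: bytes, statement: dict) -> bool:
    """True if any subject in the statement lists this artifact's SHA-256 digest."""
    digest = sha256_hex(artifact)
    return any(
        subject.get("digest", {}).get("sha256") == digest
        for subject in statement.get("subject", [])
    )


# Hypothetical artifact and statement, for illustration only.
artifact = b"example wheel contents"
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [
        {
            "name": "example-1.0.0-py3-none-any.whl",
            "digest": {"sha256": sha256_hex(artifact)},
        }
    ],
    "predicateType": "https://slsa.dev/provenance/v1",
    "predicate": {},  # build details (builder, source repo, workflow) elided here
}

print(artifact_matches_statement(artifact, statement))              # True
print(artifact_matches_statement(b"tampered contents", statement))  # False
```

The digest comparison is what lets the check happen anywhere: the artifact can come from any registry or mirror, and the statement still either names those exact bytes or it does not.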
I have a good idea. I think for eight bucks a month, you can just get a verified badge, and then you’re good to go. Right? You just verify your account, and we can trust you immediately.
That’s right. Nobody ever lies on the internet.
[laughs] I mean, to that point, do you think that doing GitHub’s best to reach into the real world and confirm people, “This person is potentially suspicious, this person is okay, this account is being run by somebody who didn’t previously own it…” Let’s forget about the implementation, because I understand how hairy that could all be. But do you think the idea is good? Or do you think it’s not a place where we should even be prying into these things? Because going back to the xz, it’s like “Who in the world is Jia Tan? Is it a person? Is it a nation-state? Why were they trusted?” etc, etc. And it’s like “Well, if we can know who Jia Tan is, and then track that, or know more information, then we can make wiser choices.” But do you think that’s a worthwhile effort or not?
[00:11:54.20] I certainly think it’s something that the broader software industry should continue exploring in all SaaS platforms. I also think there’s a balance here, because one of the promises - and it’s more than a promise; one of the outcomes we’ve seen from the open source movement in the last 20 years is people in developing countries, people all over the world, from different socio-economic backgrounds being able to contribute, be able to get up to speed, be able to make a meaningful difference in a piece of software… And I think we have to recognize that not everybody who is contributing is coming from a place in the world where they either have the technology or the identification system or the infrastructure to be able to do this kind of verification… Or potentially, it puts them in some sort of risk in whatever environment they’re in.
So this is why when we developed the mandatory two-factor authentication we didn’t jump straight to “We have to have a YubiKey, or FIDO, or a passkey.” We continued to leave the door open for a wide range of 2FA options, because we have users who are students in schools, and don’t have mobile phones, or can’t afford mobile phones, or are in an area where they can’t kind of perform that two-factor authentication in a way that – you know, the security industry might say is world class, but we have 100 million developers on the platform, many of which aren’t necessarily tech sector, financially affluent folks who can be able to do these things.
So I think we have to balance both of those as we think about the open source community, and then I think that’s where to me it’s “What can the rest of the community do? What are the things we can work with our partners in industry on securing builds, securing attestations, cryptographically verifying what’s going into things and attesting to those, so that the companies that are building these things have a better sense of what’s in them, and the companies with the resources who are using them can contribute back to that and help make these more secure, versus kind of putting all that on the individuals?”
What about change of ownership? It seems like that is a pretty strong signal of potential problems. And I’m not sure if GitHub has anything built into it, in the security tools, around that particular thing? Like “This repository is now owned by a new user or org.” I mean, it seems like a lot of times that’s still not a big deal, but as a downstream person, I’d still want to know about it and be like “Well, I went and checked it out. It seems legit. I’m cool.” Or “This doesn’t seem legit. Let’s take action.”
We have some protections in place now in terms of the account ownership, and particularly if for instance somebody changes their username, and somebody else grabs it real quick, and things like that… But one of the things we released recently, which will be free for kind of public use as well - it’s not just a paid feature - is attestations for our builds. And so what you can do here with GitHub Actions is let’s say you’re an open source developer, you’re working on something… Normally, you kind of do the build, however; maybe you use GitHub Actions, maybe you use something else. And then you push the artifact up to PyPI, or you push it up to Rust, or wherever. And then when a user goes to download that artifact and leverage it on their systems, they have no idea which repo it came from, what org… I mean, it says the name of the repo on it, but there’s not really a way to prove that it was you, and it was this build process, and it was this repo.
So that’s where things can get really wonky, if the repo changed ownership, or the users, or the lead contributors all rotated out over a weekend for some reason… Then what do you do as an end user? Well, you can’t really do much today, or in the past. Now with attestations what you can do is you can actually say “I want to cryptographically verify this build”, and you can even do things like “I want to make sure that it came from this repo, this org, this branch”, and you can actually attest to that before you deploy it in your environment. And I think that’s something that has been possible through partnerships with things like Sigstore and other kind of cryptographic means in the past… But the accessibility of that’s been hard. It’s not really been built in in a way that the average developer could take advantage of it. And now with attestations, it’s literally just add an action that we support and maintain to your workflow, and it produces an attestation that people can check against if they want to have that level of rigor in their deployments and security builds.
[00:16:15.20] That’s step one, right? That’s not going to solve everything. But I think that’s the path we have to go on as an industry, particularly with open source, is making these things more transparent, making them not just researchable-transparent with somebody, a human’s eyes, but machine-readable-transparent, so that we can start to make risk decisions on them in a programmatic, scalable way.
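As a rough sketch of the consumer-side check described here, the snippet below builds a GitHub CLI invocation that verifies a downloaded artifact against a specific repo or org. It assumes the `gh attestation verify` subcommand with its `--repo` and `--owner` options as found in the GitHub CLI manual; confirm with `gh attestation verify --help` on your version. The artifact path, org, and repo names are hypothetical, and constraining the branch is not shown.

```python
import shlex
from typing import List, Optional


def build_verify_command(
    artifact_path: str,
    repo: Optional[str] = None,
    owner: Optional[str] = None,
) -> List[str]:
    """Build the argv for verifying an artifact against one repo or one owner."""
    if (repo is None) == (owner is None):
        raise ValueError("pass exactly one of repo or owner")
    cmd = ["gh", "attestation", "verify", artifact_path]
    if repo is not None:
        cmd += ["--repo", repo]    # e.g. "example-org/example-repo"
    else:
        cmd += ["--owner", owner]  # e.g. "example-org"
    return cmd


# Hypothetical artifact and repository names.
cmd = build_verify_command(
    "dist/example-1.0.0.tar.gz", repo="example-org/example-repo"
)
print(shlex.join(cmd))
```

To actually run the check in a deploy script, pass the list to `subprocess.run(cmd, check=True)`, which raises if the CLI exits with a non-zero status.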
Right.
How about this idea of attestation applied to xz in particular, given that entire scenario where you had a social engineer over a long time…? This was a very patient attack. How would this apply there?
So again, I don’t necessarily think this would have been a preventative thing, but let’s fast-forward to a future where most open source packages on the internet have this built in. I think it’s a deterrent at that point. And here’s what I mean. If attestations were used in this case, then it would have been a very trivial matter for any researcher to look at these packages to be able to, within a matter of a few clicks, get to the build workflow that shows the instructions that were happening in the actual build itself. And what went into the build? Was it just the source code? Was there other things that went in? How do we actually backtrace this into the visibility of not just the code, but what went in to take that code into the artifact that ends up getting used by end users?
So I think as we see this adopted more and more, the recognition from malicious actors that this stuff is really accessible, and everybody’s expecting this transparency, and it’s going to be trivial for a researcher to go look at all the build logs and start to build analytics, and scans, and detections against not just the code, but the builds - I think it’s going to be an important step forward in deterring this as a space and an attack vector.
What’s scary about xz is that it was discovered by accident. Somebody who just happened to have just a millisecond too long on their hands, and they’ve found this thing… So how many of these things are happening – given now the zoom out of the patience to do the engineering, the social engineering, to get into place, and the multiple profiles and cat-fishing that took place to get to sort of wear on the maintainer… That person was taken advantage of in terms of what a maintainer goes through to build, run, communicate etc. in an open source community a software like xz, for example.
The scary part is that it was discovered by accident, and I think you want to have this attestation, this build process look, this sort of reproducible build aspect that’s verifiable… But then you have the other side, which is like, okay, if I’m going to become a core contributor, or a maintainer, or have write access to master or main on a given GitHub repo, at that profile level – I know that you have 100 million developers across the platform, but there’s a certain level of developer that begins to become a core contributor to a key piece of software. And that person is different and more unique than everybody else on the platform, insofar that they have a level of power and control given the prowess and usage of that software. So they kind of elevate themselves. And you were a part of the NSA, so you get security clearances… Not everybody can get a security clearance. So they’re set apart, right? And so I’m curious – I think this was Jerod’s angle… “How can we set apart certain profiles to have certain levels of awareness of the personhood, so that we can have more trusted software?”
[00:19:58.26] Yeah, I think it’s a great question. I would pivot it slightly… At least the way I think about it, it’s less about the profile and the human and more about the expectations of these critical pieces of software. And here’s what I mean. I think there’s kind of two elements to this. I think one is what is the responsibility and expectations for the organizations, corporations, companies that are using this critical software? Do they have a responsibility to look into and ensure the security of these core fundamental building blocks that really power a lot of the internet and a lot of these companies?
I think today we’ve seen this – I mean, this is a few years old now, but we saw this in log4j, there was this outcry of “Well, are we going to hold these developers accountable?” and it was like a handful of folks over in their spare time, building this stuff. They weren’t resourced to secure, build, look at these things. So in many ways, I don’t view the malicious intent from potentially the xz case as any different than an accidental or poor programming practice, or just not securing it. I mean, to a certain degree, the outcome is the same, in the sense that there’s insecure software that is being included in core functionality across a lot of platforms. So I think some of this is – I think we have to, as a community, take more responsibility for the open source software we’re using. And I think on the platform side, I think there has to grow an expectation of the security tooling and expectations of the code that we’re using.
This is where things like GitHub Advanced Security, code scanning, secret scanning - and there’s plenty of other tools out there too, but I think we have to elevate the expectation that these core pieces of software are going to have those things turned on. They’re going to have security scanning with the results made available, or at least something that’s consumable as an artifact there, so that we’re kind of hitting this from multiple angles to really level up the security.
The challenge of the defender is that you must secure the entire thing. Right? You’ve got to fortify the entire house. And the advantage of the attacker is they only have to find one way in.
That’s right.
Doesn’t that seem futile? [laughter] I don’t know, I’m just getting a little bit worn down, perhaps… because I’m just thinking about how many lines of code are in, for instance, the Debian distribution. Because xz is low-level software, and certainly widely deployed, but mostly invisible. And would we have considered it critical software? I mean, maybe some people would have, but for the most part, it’s just down there, it’s utility software… And how much of that is there? Millions upon millions upon millions of lines of code, of course…
Yeah, totally.
It just seems like we have to overhaul – I mean, you’re talking about new-ish best practices around writing and deploying secure code… But it almost requires an entire industry come to Jesus moment with regard to these practices before it’s ever going to actually help us.
Yeah, I mean I think if we were just looking at the codebases and assumed that all of it was sitting unprotected on the internet, it would feel and likely be futile, to be honest. And I think this is where the rest of mature security programs come into place, and things that we can do as an industry. So I think that that’s where like zero trust and identity as a perimeter and those kinds of concepts, secure by design come into play. So if we have strong authentication in front of access to a lot of these systems, if we have network isolation between key systems, if we have role-based access control… So we kind of assume that parts of these systems will eventually experience some sort of security issue, how do we firewall those off from other parts of the system? I think this is where the rest of that comes into play. And I also think the other element to this is I think the industry does need to level this up.
[00:23:58.09] So the Secure by Design pledge that was announced, and that many companies, including GitHub, signed at RSA a few weeks ago, talks about this. It talks about needing increased commitment from key players in the industry to implement secure by design principles as part – not just as part of their internal programs, but as part of the products they offer to the world and to users, so that the settings that make things more secure are on by default, even if it causes a little more friction, or one more click for a user. And I think that’s really an important part of this, that honestly we do have to progress as an industry here, and I think it’s critical that that’s the other element of responsibility that companies, corporations and organizations take in this space.
Yeah, well said. I think it’s tough when you get to the individual org, or individual developer, and we’re relying on them to also do their due diligence and their best practices… Because a) education is a problem. A lot of us don’t know. And then b) the constant pressure and stress to be shipping more features and code, faster, stronger, cheaper etc. with tools now helping us write code that we may not exactly vet… That just makes the problem even more massive, because we need the big players to adopt and to sign pledges and to like push out secure best practices and suites of tools and everything… And then we also need the awareness – we need to equip the everyday developer with the ability to also do these things, use these things, and really just have their wits about them, despite all the pressures pushing them away because of that push and pull between convenience and security; that relationship is just so fraught.
Yeah, I totally agree. I’ll give you a concrete example… And I’m obviously more familiar with what we’re doing at GitHub than other places, but… We have had a feature for a while for enterprise customers - and public repos - that was opt-in; it’s called Push Protection for Secrets. So we have this thing called Secret Scanning, that if we detect a secret in someone’s code, so like a structured AWS token, or Azure token or something like that, we’ll alert, it becomes a security alert… But it had to be turned on. And recently, we enabled push protection, that stops those secrets from getting to the public repo, before the commit happens.
And so you’re a developer, you’re working on your laptop, you go to push up a change to GitHub; if we detect it, we’ll stop it before it gets to the public repo, and send an error back and say “Hey, we detected a secret.” That’s push protection. We turned that on for all public repos recently. All public repos on github.com. And that increases friction, to a certain degree. There are gonna be developers out there who are just pushing test secrets up to their repos to try things out, and they’re gonna get frustrated, and they’re gonna have to go search and figure out what setting to turn off, or whatever… But it’s a secure by design principle that we believe strongly in, is that source code is not the right place to store secrets. And we continue to see issues in the news and industry where things have gone really wrong for companies, where an innocuous, probably well-intentioned secret was put in code, somehow that code gets leaked, there’s a phishing incident, and then all of a sudden, that secret is used to pivot way further into the infrastructure and cause a lot more damage.
So we believe this is a core tenet of secure software development, and so we turn it on by default for all public repos. And I think that’s an example of the types of things I think we need every company, every organization who’s shipping capabilities to developers/users, to think about: “What can I turn on by default? What can I just take away as a choice or an education opportunity for someone, and instead say ‘We’re just going to do this’?” And sure, you’ve got options to turn it off if you need to, but this is the way it’s gonna ship.
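To make the push protection idea concrete, here is a minimal sketch of a pre-push style check that scans changed text for structured secrets and blocks the push on a match. This is not GitHub’s implementation; the two patterns are simplified, commonly cited token shapes chosen for illustration, and the function names are hypothetical. The sample key is the placeholder value used in AWS documentation.

```python
import re

# Illustrative patterns only; real scanners use many provider-specific formats.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat_classic": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every line that matches a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def allow_push(changed_text: str) -> bool:
    """Block the push when any structured secret is detected."""
    return not find_secrets(changed_text)


sample = 'region = "us-east-1"\naws_key = "AKIAIOSFODNN7EXAMPLE"\n'
print(find_secrets(sample))  # [(2, 'aws_access_key_id')]
print(allow_push(sample))    # False
```

Matching on structured formats keeps false positives low, which matters when the check is on by default and sits in the path of every push.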
How was that received?
Well, so far, honestly. I think it’s one of those things where thankfully most developers aren’t doing this every day; you’re not pushing secrets to code. And I think it’s very likely that many who are, didn’t really take the time to step back and think “Oh, yeah, maybe I shouldn’t do that.” Or “Maybe there’s another way to do this.” And you can still override those alerts as they come in, but as far as I’m aware, we’ve had generally positive reception.
[00:28:18.26] Same thing with mandatory 2FA. We’ve seen a significant drop in support tickets since we’ve rolled out the requirement to make 2FA mandatory, and then we’ve seen a 95% opt-in rate across code contributors who’ve received those requirements.
It was a day of great joy when I 2FA-ed myself on GitHub, so I was happy about that.
Same.
What is behind the scenes of this scanning process? How did you have to rearchitecture git push, essentially, to GitHub? Did it have to be a sandbox of sorts, that gets pushed to, then scanned and then kicked back? What’s the process? And even what’s the cost center? Is this a cost center for GitHub, to have to pay for all of this source code to be scanned? What’s the architecture? What’s the cost? What’s all the things?
So I’m not gonna butcher the architecture by trying to explain it in detail…
Okay… Give us a high-level overview.
But in general, yes, that’s essentially the gist, is there is a sandbox space where we do the scanning. And it’s all encrypted, so it’s not like we’re punching out of that… But it hits the GitHub side of it, and before we put it into the git commit, git pull request, whatever that is, on the actual github.com platform, we scan it in a sandbox first. And if we find it, we kick back an alert that says “Hey, we’ve found this. We highly recommend you deal with this. Clean your history out, remove it from code, get everything cleaned, and then push up again.” And that’s kind of the gist of it there.
In terms of how we structured it architecturally, we actually partnered with industry partners across kind of every sector here that does structured secrets. I think we have over 350-400 partners that we have essentially the ability to scan for their secrets, and people can just register for the program. And in a couple cases, we’ve actually gone a step further where we can show enterprise customers whether it’s still valid or not. So it’s not only a secret we’ve found, but it’s an active secret in code. Slightly different than push protection; that’s after, if we’ve found it in your code. So that’s actually a huge benefit as well to developers.
From a cost center perspective, supporting the open source community has and always will be one of our kind of core spaces that we invest in and that we support. So we essentially support most of the GitHub Advanced Security features that enterprise customers pay for, for all public repos. That includes the compute behind it, that includes the scanning, that includes all those things; the things that you can get on a free account on github.com are incredible. Codespaces… So many minutes a month for Codespaces usage, which if you don’t have a developer laptop is a game-changer… And even for me personally, if I’m just going to tinker around with something on the weekend, the last thing I want to do is spend the first five hours getting my laptop patched, up to date, whatever developer tool I need installed… I don’t worry about any of that anymore. I fire up a Codespace, which is a remote development environment that we host, and I get to work. And free Actions minutes, stuff like that. And that’s part of our mission to accelerate human progress through software development. And so I think that’s – it does cost money for us to run those things for the public repos, and I think that’s okay.
Is this the first time you’re hearing about GitHub’s advanced security features, Jerod? Or is it just me the first time?
First time for me, yeah.
Can you explain that then, Jacob? Because I kind of get what it is, I googled it, landed on a page… But it seems enterprise-focused, but then as you just mentioned, some of this is public repo blessed. Can you give us a rundown?
[00:31:57.03] Sure. I mean, the gist of it is GitHub Advanced Security is our static analysis capability for software, and that’s based on CodeQL, and it also includes our dependency scanning ability. So I’ll give you an example here; if you’ve got a repository on github.com, if you turn this on, if we see a dependency in your source code that’s out of date, we’ll just send you an alert, and if you have it turned on, we’ll actually open a pull request on your behalf, on your code, and say “Hey, we’ve found a dependency that’s out of date. And also, here’s the updated version that we recommend.” And if it fits, you can just merge it and move on. I do this all the time on a handful of projects that I do personally on github.com. I just kind of like go in every month or so and I look at all the pull requests Dependabot’s opened for me, I merge them, and I’m happy and I move on.
And then the last bit is secret scanning. So those three together are GitHub Advanced Security, and then for enterprise customers we also have things like a security overview, and trending, and charts, and things like that, that will help enterprise administrators and security teams to administer this across their environments. So that is an enterprise offering that we do offer to our enterprise customers as a security suite for their source code. But then we offer – most of that’s available for free to public repos that are hosted on github.com. So a lot of the open source community can take advantage of that, those that are hosted on GitHub.
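The dependency update flow described above can be sketched as a comparison between pinned versions and the newest known versions, producing one suggested bump per outdated package. This is a simplified illustration, not how Dependabot works internally; the package names and version data are hypothetical, and versions are assumed to be plain dotted integers.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version like '1.4.0' into a comparable tuple (1, 4, 0)."""
    return tuple(int(part) for part in version.split("."))


def suggested_updates(pinned: dict[str, str], latest: dict[str, str]) -> list[str]:
    """List a 'Bump X from A to B' title for every dependency behind its latest version."""
    updates = []
    for name, current in sorted(pinned.items()):
        newest = latest.get(name)
        if newest and parse_version(newest) > parse_version(current):
            updates.append(f"Bump {name} from {current} to {newest}")
    return updates


# Hypothetical manifest and registry data.
pinned = {"examplelib": "1.4.0", "otherlib": "2.2.1"}
latest = {"examplelib": "1.5.2", "otherlib": "2.2.1"}

print(suggested_updates(pinned, latest))  # ['Bump examplelib from 1.4.0 to 1.5.2']
```

Each returned title would correspond to one pull request, which matches the "review the open pull requests once a month and merge" habit Jacob describes.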
Break: [00:33:24.27]
Has Dependabot gotten any better about not warning on latent code, and being able to detect actually used code vs. code that happens to be in a dependency that’s never executed in the run of a program, or dev dependencies only? Because it seems like in the past it’s had a lot of false positives for me, and so I jumped ship. I’m just like “Well, I’m kind of done with you, because 90% of these aren’t actually my problem, but you’re making them my problem.” And I’m assuming that that’s something that y’all work on, because I’m probably not unique in that way, and I wonder if it’s gotten better in that way lately.
We do work on that. We strive to make the alerts as meaningful as possible across the board on our Advanced Security offerings. I think some of the challenges in the dependency space are those dependencies that it’s unclear if they’re used or not, and being able to trace that through the code and figure that out in a scalable way; it’s a difficult challenge. So it’s definitely something that teams are tracking and working on and are always trying to improve.
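One crude way to approach the "is this dependency actually used?" question is to check whether the flagged module is imported anywhere in the project source. The sketch below does that for a single Python source string with the standard `ast` module. It is an assumption-heavy illustration, not Dependabot’s method: it ignores transitive use, dynamic imports, and packages whose import name differs from their distribution name, which is part of why the real problem is hard. All module names are hypothetical inputs.

```python
import ast


def imported_top_level_modules(source: str) -> set[str]:
    """Collect the top-level module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return modules


def triage(alerts: list[str], source: str) -> dict[str, list[str]]:
    """Split alerted module names into those imported by the source and those not."""
    used = imported_top_level_modules(source)
    return {
        "imported": [name for name in alerts if name in used],
        "not_imported": [name for name in alerts if name not in used],
    }


source = "import yaml\nfrom jinja2 import Template\n"
alerts = ["yaml", "jinja2", "unusedlib"]

print(triage(alerts, source))
# {'imported': ['yaml', 'jinja2'], 'not_imported': ['unusedlib']}
```

A "not imported" result only lowers priority; it does not prove the vulnerable code is unreachable, since another dependency may still call into it.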
Fair. I wouldn’t want to work on that problem. I can understand that it’s a hard problem, but I also want somebody to work on it. Can we go back to attestations? Because it seems like it could be a good step forward, in the right direction… And obviously, it’s out there and ready to use, and stuff. You gave the workflow a little bit from the end user perspective. Like, you have GitHub Actions, if you’re using them, toggle on a thing, you probably have to decide what happens if an attestation fails, and that’s roughly the workflow. But what about from the maintainer perspective? What do I have to do to have my code attested to as I’m deploying it out to people?
So the main thing today - and again, this is a very early capability that we’re shipping, so I expect this to continue to just improve as more of our partners adopt it and it becomes sort of ingrained in the developer ecosystem… But today, it’s as simple as adding a specific GitHub Action to the workflow. So a lot of open source projects do their builds on GitHub Actions, and in the workflow itself, you can specify different segments of it. And often, you would include an action for checking out the code. In fact, that’s one of the main ones pretty much every action on github.com includes. You might include an action for deploying your artifact to AWS, or to PyPI, or to Azure. And then we’ve written and released an action that will let you attest to the code. And essentially, all it does is as the build’s happening, once that artifact is produced that you’re either going to deploy somewhere, upload to PyPI, or Rust, or wherever, it will sign it using our trust or cryptographic kind of root trust, and then it will store that attestation in the same repository on which the action was run.
And so there’s essentially a repo named /attestations, the attestation’s there, and it’s available for download and use for anybody to verify against cryptographically through the GitHub command line tool.
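As a rough illustration of the signing step described above, here is a minimal Python sketch. It is not GitHub's implementation: the field names, the `make_attestation` helper, and the repo, workflow, and commit values are all hypothetical, and an HMAC with a shared demo key stands in for the certificate-based signing a real system would use. The point is only that an attestation binds the artifact's hash to facts about the build, and that the binding is signed.

```python
import hashlib
import hmac
import json

# Hypothetical stand-in for the signing root of trust. A real system would
# use asymmetric keys and certificates, not a shared secret like this.
DEMO_SIGNING_KEY = b"demo-key-not-for-production"


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the artifact bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def make_attestation(artifact: bytes, repo: str, workflow: str, commit: str) -> dict:
    """Bind an artifact digest to the build that produced it, then sign it."""
    statement = {
        "subject_sha256": sha256_hex(artifact),
        "repo": repo,
        "workflow": workflow,
        "commit": commit,
    }
    # Canonical JSON so the same statement always serializes to the same bytes.
    payload = json.dumps(statement, sort_keys=True, separators=(",", ":")).encode()
    signature = hmac.new(DEMO_SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"statement": statement, "signature": signature}


# Illustrative values only; none of these refer to a real project.
attestation = make_attestation(
    artifact=b"example build output",
    repo="example-org/example-repo",
    workflow=".github/workflows/release.yml",
    commit="0123456789abcdef0123456789abcdef01234567",
)
print(json.dumps(attestation, indent=2))
```

Because the digest is part of the signed statement, changing even one byte of the artifact produces a different statement and therefore a different signature.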
And on the receiving end, obviously you’re still in GitHub Actions, so now you’re just using whatever you guys built to go ahead and do that process during your own deployment, when you’re saying –
Well, so you don’t have to be in the GitHub ecosystem at all to use this. That’s the great part about it. So let’s say the artifact is built, and it’s uploaded to PyPI. And then a developer who’s sitting in another company, using a completely different tech stack, but still uses that PyPI repo, they can download the PyPI artifact onto their local machine, and then they can use the GitHub command line to go check the attestation to see if it’s the same one that they think it should be built on that repo, that org, that flow, that branch, whatever their criteria is. And this is where you can use things like policy enforcement software. Open Policy Agent’s a popular one. So you can write policies and say “Before I deploy, I want to make sure that everything I’m deploying came from one of these three organizations on github.com. The source code, it was created and built there.” But where it was downloaded, and whether it was from a local Artifactory instance, or a public artifact store like npm, doesn’t matter, because the attestation’s cryptographic and it could happen out of band.
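The "one of these three organizations" policy can be sketched in a few lines of Python. This is not Open Policy Agent or its policy language; it is a plain function that assumes an already verified attestation is available as a dictionary with a hypothetical `repo` field in `org/name` form. The organization names are made up.

```python
# Hypothetical allowlist of organizations whose builds may be deployed.
ALLOWED_ORGS = {"example-org", "trusted-vendor", "internal-platform"}


def source_org(attestation: dict) -> str:
    """Extract the owning organization from an 'org/repo' string."""
    return attestation["repo"].split("/", 1)[0]


def may_deploy(attestation: dict, allowed_orgs: set = ALLOWED_ORGS) -> bool:
    """Allow deployment only if the artifact was built in an allowed org."""
    return source_org(attestation) in allowed_orgs


print(may_deploy({"repo": "example-org/payments-service"}))
print(may_deploy({"repo": "unknown-org/payments-service"}))
```

Note that the check takes no argument for where the artifact was downloaded from. That mirrors the point made above: the decision rests on what the attestation says about the build, not on the mirror or registry that served the file.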
Does that allow you to actually track the binary in the case of a binary deploy to the source code commit? What do you know about the source code, the exact –
[00:44:02.15] Yeah, you can go all the way down to the commit; you can go to the commit, you can go to the actual workflow that built that binary, which is fantastic… And this is kind of why when we’re talking about xz, I think this could end up being a helpful deterrent. Again, I don’t think it’s a one-size-solves-everything situation, but you can go from essentially a binary you’ve found laying around on your computer, to knowing which repo built it, which workflow, the build instructions that went into it, which commit went into it, and it gives you this ability that is incredibly difficult to do now at scale… And that’s really how we see this going from an industry perspective, is more and more tools like this, that help us do this at scale, to give that essentially unfalsifiable paper trail.
That’s really interesting to just find a binary and attest it. Like, how does that work? Literally, how does it work?
Underneath the covers?
Yeah. How does that work?
How is the GitHub command line doing it? Yeah, tell us how it works, Jacob.
See, now you’re double-clicking past my ability to be incredibly useful here…
[laughs]
Okay…
But my understanding is - and we’ll follow up if I totally get this wrong. But generally speaking, we are looking at essentially the cryptographic hash of the binary, and then looking up the attestation. You have to kind of know the attestation that you’re going against. Knowing which org you think it came from on the internet, or knowing like “Hey, I think the GitHub Actions org built this.” So you have to know where to go ask for the attestation generally, but hopefully corporations deploying in a high-sensitive environment have that knowledge. But you don’t necessarily have to say “Oh, I have to go find the attestation file myself.” You just have to roughly know where it’s at. You can point the command line there, and then it’s going to go grab the attestation and compare the cryptographic hashes based on the signature and the attestation signing, to be able to tell you if it’s the same one or not. And then once you have that, then you can display all the information about the build that went into that binary.
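The lookup-and-compare flow described here can be sketched as follows. This is a simplified model, not the GitHub command line's actual logic: the in-memory `store` stands in for wherever attestations are fetched from, the record layout is hypothetical, and an HMAC with a demo key stands in for real signature verification.

```python
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical stand-in for the signing root of trust (see assumptions above).
DEMO_SIGNING_KEY = b"demo-key-not-for-production"


def sign(statement: dict) -> str:
    """Sign the canonical JSON form of a statement with the demo key."""
    payload = json.dumps(statement, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(DEMO_SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def verify(binary: bytes, org: str, store: dict) -> Optional[dict]:
    """Return the build statement for a binary, or None if it does not verify."""
    # Step 1: hash the binary we found.
    digest = hashlib.sha256(binary).hexdigest()
    # Step 2: ask the place we think it came from for a matching attestation.
    attestation = store.get((org, digest))
    if attestation is None:
        return None
    # Step 3: check the signature over the statement.
    expected = sign(attestation["statement"])
    if not hmac.compare_digest(expected, attestation["signature"]):
        return None
    # Step 4: check that the signed statement is about this exact binary.
    if attestation["statement"]["subject_sha256"] != digest:
        return None
    return attestation["statement"]


# Illustrative data only.
binary = b"example binary contents"
statement = {
    "subject_sha256": hashlib.sha256(binary).hexdigest(),
    "repo": "example-org/example-repo",
    "workflow": ".github/workflows/release.yml",
    "commit": "0123456789abcdef0123456789abcdef01234567",
}
store = {
    ("example-org", statement["subject_sha256"]): {
        "statement": statement,
        "signature": sign(statement),
    }
}

print(verify(binary, "example-org", store))       # the build statement
print(verify(b"tampered", "example-org", store))  # None
```

As in the description, the caller has to know roughly where to ask (here, the `org` argument), but does not need to locate the attestation file by hand.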
That all makes sense. Is that a GitHub – it’s obviously GitHub-specific in your implementation, but is this something that other platforms can also do attestations and just follow the same… Do we have a spec or something where we could just get it to be generally useful?
Well, our approach is based on the Sigstore approach. It’s essentially a scaled version of what we released last year with npm and Sigstore. So for public repos, there’s still that kind of normal flow, and then enterprise customers have the opportunity to use a private implementation that’s inside GitHub, so that their attestations don’t show up on the internet. They might not want them to.
This is like the second time I’ve heard the phrase “double-click” in like the last week. I haven’t heard it too frequently, but now I’ve heard it twice in one week, so good job, Jacob.
We want more. Everything comes in threes.
Yeah. For sure.
Well, I was gonna ask you what else you’re working on? What else is cool in this space that is burgeoning, or in development, or like the next attestation that’s going to add another layer to our defense in depth?
Yeah, this is a space I’m excited about, because I want to see more and more parts of not just the GitHub product, but software development products make use of these things… And I think that’s where we’re gonna be heading as an industry. So right now we talk about supply chain and table stakes that are necessary for secure development… I think there’s some things that are becoming common in the industry; whether everybody’s doing them or not, everybody at least acknowledges they’re necessary. That’s things like “Don’t put secrets in code. Keep your dependencies up to date.” Things like that.
I think that the next step is for us to as an industry say that including attestation and a full paper trail of what is going into the software that we’re all using… And attestations is a step beyond SBOMs, right? So SBOMs is “Here are the ingredients that are going into the recipe I made”, but attestations gives you that receipt for where those ingredients came from. So you know which grocery store they came from, you know which shelf they came from, you know which manufacturer made that and shipped it to that shelf. So it gives you that next level down.
[00:48:16.09] And so I think that’s going to be where we’re headed as an industry, and where I think we should head as an industry… Is not only making those things available, but making it just standard, as part of the build workflow. Everybody expects it. We have tools that show it, it’s very easy and built into the artifact repositories, the developer workflows, CI/CD flows, things like that. I think we had to build the scaffolding and framework first, but now that that’s coming along, I think this is where we’re headed.
How well received, I suppose, has the SBOM been, the software bill of materials? Is it widely used? Is it generally adopted? I remember talking about this and hearing about this, but I’m not in this world to even write one, build one, care about one, but… How well received have they been?
I think there’s a lot of people interested in it. I think everybody acknowledges that it’s part of the solution we need to adopt as an industry… But it’s also, I think, acknowledged that it’s not going to solve everything. It’s just one part of kind of this broader trust flow that I was talking about. It’s something we support on GitHub, it’s something that a lot of companies are making standard as part of their build and deployment practices… But I don’t think it by itself is necessarily going to be the solution to the supply chain challenges we all face.
When you talk about broader adoption of these practices and tools, what are you and your teams doing in order to get that done? Obviously, you put the features out there, you make them usable, and then you blog about them, and then you use GitHub’s channels… But are there conference talks, is there training, are there tutorials? Because really, a lot of this has to be known before it’s going to be adopted. What are you doing there?
The short answer is yes. Our teams are going out to conferences and talking about this. We’re putting together documentation and training on these things… I think there is an awareness here that is part of it, and we’re absolutely doing that. At a broader level, you asked what else we were working on, and we made it almost 45 minutes without talking about AI…
Alright, you’re allowed.
Hey, we waited. We waited a while.
[laughs]
But I think this is the other part of where I think that we can make things easier in adoption for the developer ecosystem. So I’ll give you an example here… We’ve had this capability for a while called GitHub code scanning, CodeQL, the one I mentioned earlier, from GitHub Advanced Security. And it’s great, because it’s got a really powerful engine that essentially models a piece of software, and then it will trace the sources [unintelligible 00:50:41.29] the inputs and outputs of functions, and it’ll trace the data through the source code to find out where there’s a potential vulnerability. It’s fantastic.
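The idea of tracing data from inputs through the code to a risky call can be illustrated with a toy reachability search. This is not CodeQL or its query language; it assumes the program has already been reduced to a hypothetical map of "value flows into" edges, and it simply searches that graph from untrusted sources to sensitive sinks.

```python
from collections import deque


def find_tainted_paths(flows: dict, sources: set, sinks: set) -> list:
    """Return a shortest path from each source to each sink it can reach."""
    findings = []
    for source in sorted(sources):
        # Breadth-first search, remembering how each node was reached.
        parent = {source: None}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in flows.get(node, []):
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        for sink in sorted(sinks):
            if sink in parent:
                # Walk back from the sink to the source to rebuild the path.
                path = []
                node = sink
                while node is not None:
                    path.append(node)
                    node = parent[node]
                findings.append(list(reversed(path)))
    return findings


# Hypothetical data-flow edges for a small request handler.
flows = {
    "request.args['name']": ["name"],
    "name": ["query"],
    "query": ["db.execute(query)"],
    "config.timeout": ["timeout"],
}
sources = {"request.args['name']"}  # untrusted input
sinks = {"db.execute(query)"}       # sensitive operation

for path in find_tainted_paths(flows, sources, sinks):
    print(" -> ".join(path))
```

A real analysis engine also has to build the flow graph from source code and account for sanitizers, which is where most of the difficulty lies; the sketch only shows the final search.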
In the past, what would happen is this would run on accounts that had this enabled, and it would show developers like “Hey, we think we’ve found a vulnerability here. Here’s why, here’s some documentation. You can read about it.” But then it was sort of up to the developer to go figure out what to do about it. And so they had to pivot out of their workflow, they had to go to a search engine or wherever on the internet and go figure out “Okay, well, that’s great that you’ve found this for me, but how do I fix it?” And so with AI, what we’re adding in is the ability – we’re calling it Code Scanning autofix.
And so what will happen is in the pull request, where it traditionally shows you what’s wrong, or what we believe to be a vulnerability, it will now also open a second part of the pull request with a suggested fix in it. And so we’re using Copilot AI to be able to do this, and we’ll say “Hey, we’ve found this thing. We think this is going to fix it for you. If you agree, just hit Accept and move on.” And the ability to kind of get that in front of a developer and make it part of their flow I think is really, really important. And then also, we’re early days here, but we’re seeing folks use Copilot chat in the IDE, so the interactive chat capability we have, to ask Copilot about these things. “Hey, is there anything insecure about my code? Can I make my code more secure? What would make this more resilient to an attack?” And that interaction with Copilot – it’s going to look at it and say “Well, hey, if you’ve structured your function input this way, it would be safer. Do you want to do that? Just click Yes.” And it copies it over and you’re off to the races.
[00:52:20.05] So I really think this is where we’ve kind of known as an industry that things like static analysis are important, and we’ve all kind of worked with our teams to enable it… And everybody’s on a different level of maturity on that journey, but generally, helping developers keep up with the pace by which vulnerabilities are found, and CVEs sent out, has been - I won’t say a losing battle, but it’s been challenging. It doesn’t feel like we’ve been making ground, and I think AI and things like autofix are gonna allow our developers and our security teams to make up ground in a way we haven’t been able to do before.
Well, now that we’ve broken the seal, I thought this was going to be your answer to Dependabot getting better. I thought you were gonna be like “Well, we’re throwing AI at it, and it’s getting better at detecting hot code paths.” But I wasn’t gonna bring it up at that moment, so I was waiting for you to bring it up before I loop back around to that. I think autofix sounds really cool. Didn’t you see that demoed, Adam, recently? And it worked great in the demo. I’m not sure about real life.
Exact same.
Yeah, it works exactly like that?
Demos always work the same in real life, right?
That’s right.
Yeah. Well, I wonder, is this the real world yet? Is this the promised future world? Are people using autofix today, or…?
Yeah. We’re getting great feedback. I mean, it’s showing right now that the suggestions are remediating more than two thirds of vulnerabilities with little to no editing. And we have bigger plans for this too, to be able to really – I mean, our goal is to make it easy for developers to build secure software… And so how do we think about this at scale? What are ways we can reduce that friction for developers finding and fixing vulnerabilities in code? And the interesting thing with something like autofix and CodeQL is even if it’s clean today and everything’s fixed today, it may not be tomorrow, because security researchers are finding new things every day. CVEs are getting released every day… And so how do we make this an ongoing practice that is low-friction and low-pain for developers? And I think that’s just really part of how we’re trying to design and think about integration of AI capabilities across the board, is how to accelerate developers and get them focused on the things they want to be focused on. And frankly, pretty much every organization and company wants their developers to be focused on. They don’t want them clicking through a bunch of menus and searching how to fix something that they fixed yesterday, but just can’t remember what it is. They want them moving on to the value-add work. And so do we.
What about proactive versus reactive? Because CVEs are very reactive when it comes to security… It’s like, it’s a known thing. And it’s obviously an awareness thing once it’s known. Because then it’s still burgeoning, and more people are being made aware of it. But how about proactive things? I know you mentioned scanning, obviously attesting is part of the, to some degree, proactiveness… What other plans or ideas do you personally have, or does your organization have around being proactive? Typosquatting, things like that, where it’s like “I didn’t mean to type in React spelled incorrectly”, or with a plural. That kind of thing. What are the proactive ways you’re securing things?
I’ll kind of work my way into it a bit. So at a high level, with our SAST capabilities, CodeQL… I’m actually a huge fan of CodeQL just as a security practitioner, whether I worked at GitHub or not… Because the way it works underneath the hood is about variant analysis and modeling, not about trying to pattern-match on a specific CVE.
Now, obviously, our security team and researchers are informed by the types of bugs and vulnerabilities that are being found by the research community, and we have a great research team inside of GitHub as well… But the way it works is it’s modeling and looking for patterns and known insecure patterns, versus a specific, like, “Oh, we know that function in this thing is broken or vulnerable.” So I think that’s kind of step one.
[00:56:07.08] I think the other part of this is I think this is where we’re gonna see significant advances, and we’re already seeing advances now, in editor Copilots. So we build and deliver GitHub Copilot, but being able to have that AI assistant in your code, looking for things that are typos, that are “Hey, we saw you typed it this way, but did you mean this? We actually think it would be much faster if you did it this way.” I think we’re gonna see a lot of advances in the proactive space, as those filters and as the models and as things like fine-tuning get better and better in our shipping.
Yeah, I think that’s a good place for AI, obviously, the pattern-matching, and being that buddy next to you. You don’t always have to be on pins and needles of “Am I making a secure choice?” or “This dependency, is it really secure? Has there been a maintainer swap recently? What’s the right name? Did I typosquat, or typo-butcher this, and I actually installed the wrong thing?” We’ve seen that even with ChatGPT, where people will ask for things and they’ll give it back fake – you know this better than I do, Jerod, because you run news… But ChatGPT will give back fake information, essentially.
It’ll hallucinate a package name, and then some users will register that package.
Precisely. Thank you, Jerod. Hallucinating package names. And it’s not real, but now it’s sort of in the zeitgeist of hallucinations, and then people think it’s a real thing… And now, a squatter will sit on that and do something nefarious, which I think - that’s a great place for AI to pattern-match on that, because as a human, I am generally going to be lazy, or potentially even just not as good every single day, all day, every day.
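The "React spelled incorrectly, or with a plural" case mentioned earlier is the kind of check that can be automated even without AI. Below is a minimal sketch that flags a requested package name when it is one edit away from a well-known name. The `KNOWN_PACKAGES` list and the distance threshold are illustrative assumptions, not any registry's actual rules.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


# Illustrative list of popular names; a real check would use registry data.
KNOWN_PACKAGES = {"react", "requests", "lodash", "express"}


def possible_typosquat(name: str, known: set = KNOWN_PACKAGES, max_distance: int = 1):
    """Return the known package this name is suspiciously close to, if any."""
    if name in known:
        return None  # exact match to a known package
    for package in sorted(known):
        if edit_distance(name, package) <= max_distance:
            return package
    return None


for requested in ["react", "reacts", "reqests", "numpy"]:
    print(requested, "->", possible_typosquat(requested))
```

A name that is far from every known package, such as a hallucinated one, passes this check, so a distance test alone does not cover the hallucinated-package case described above.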
Yeah. Distracted, stressed, flawed…
Yeah, I’m gonna mess up, you know?
Yeah, I agree. I think there’s a productivity gain here too that shouldn’t be overlooked, which is like - there are a lot of things that technical people sometimes just don’t go do, because they’re like “Well, that’s gonna take me forever.” And I’m a good example of this. So I grew up as a developer, I’ve been a developer most of my career, but I don’t develop every day anymore. And so the idea of getting a dev environment spun up and going and doing something productive is usually not worth the effort, and not really what I’m paid to do. But a few months ago there was like a Jupyter notebook that helped analyze statistics and heuristics for the workforce, and it was something I wanted to use as a people manager running an organization to get insight into my workforce… And it wasn’t doing what I needed to do, and I was like “Ah, I haven’t done a Jupyter notebook in like eight years. This is gonna take me forever.” And then I was like “I wonder if Copilot can help.” And so I literally pulled it up and started asking Copilot some questions and started doing some autocomplete stuff, and I had it sorted out in like 20 minutes, had it done what I needed it to do, got my answer, was able to get back to what I really needed to be doing, versus - that probably would have taken me six, seven hours without an AI system being able to do those things. I just think of that times 10, times 1000, times 100,000 for corporations and organizations… And I think that’s gonna just get more powerful as we go with things like GitHub Copilot, and they’ll get better and better.
Break: [00:59:16.07]
I think AI red teams make a ton of sense. I mean, there’s probably startups doing this; I don’t know if you all are thinking about it or doing it, but I’ve done some penetration testing, especially after I got to college, and it’s very common for an enterprise to hire a security team, an outside consultant to come in and pentest their system. And they’re very expensive, and they’re very good at what they do oftentimes… But a lot of that work is just grueling, fuzzing, and like running this against that, and doing this… They have their set of things they do. And then of course, they’re all like the expert hackers, who - the AI is never going to be as good as this guy or this gal, because they know whatever-whatever. That’s real, real rare, but it’s real… But for the 80% of orgs that like can’t afford red teaming or auditing at all, but could probably just send a bunch of computers to do non-deterministic fuzzing against their systems - that seems like it’d be a win in the security world. Is that going on? Is that something GitHub’s thinking about?
I mean, it’s definitely something that’s going on. I’m closely following several startups that are putting some time and energy into this space. I think it’s going to be another powerful tool in the tool bag very soon. I think generally, in the security space, I think there’s a lot of things that fall into that category, where the first stage of something that we are paying for a very advanced, highly educated user to go do is often repetitive, is often like “Oh, I’m gonna go query this database to like figure this out, then I’m gonna go do this Splunk query, and then I’m going to take that and export everything into an Excel spreadsheet, so I can do a pivot table with these other IP logs that came from over here”, or whatever the case may be. And I think that’s an area too that we’re gonna see AI [unintelligible 01:02:48.28] gain a lot of speed by… Because I think what we really want is we don’t want less people on our teams, we want those people doing the things that they’re trained to do, and the things that really, truly add value on top of that, of being able to use their intuition, experience and find the signals that don’t look right, that know how to go triage, and figure out how to deal with a potential security incident, or rule them out. But there’s often so much time spent before the experience can kick in… I think this is another area we’re gonna see some pretty amazing work done, and I know, there’s a lot of companies doing that in this space right now.
How do you see GitHub’s position in that world? Where do you decide what GitHub should invest in? I’m sure you’re not the only one deciding, but what’s the decision-making process of what is worthwhile for GitHub to be doing, versus “Well, that’s something that some startups can do, but we’re not going to do that”?
It’s definitely not my decision.
Okay. [laughs]
I run the day to day of our internal security team.
Gotcha.
[01:03:56.13] But you know, I can share - our focus is developers. Our focus is accelerating human progress through software development, and enabling open source. And so we tend to focus on the things that we can bring a lot of value to in that space, and that’s why we’re so excited about some of the AI capabilities… Because I think we all see the news articles every day. There’s a million new models and apps coming out every day… I would say it’s probably, if not accepted, at least talked about a lot, that a lot of those are cool technologies looking for a fit, they’re solutions looking for a problem in some cases…
I think software development is just clear how powerful this is, and that’s why we’re so excited about like incorporating that into the editor, into things like – I mean, how often do developers work on code all day long, and they finally get to the time they’re ready to do the pull requests, and they’re like “Do I really want to spend an hour writing a really great set of documentation on my pull request?” Well, what if we had AI be able to scan all the changes that they made and write 80% of it for them before they did that?
So I think it’s been clear to us for a while – I mean, we released Copilot Tech Preview in late 2021, well before the current kind of wave of things was out… It’s been clear to us for a while this is a huge win. And I think it will come in other parts of the industry, by the way. I’m not in any way saying it’s not going to be helpful in other parts of our lives… But I think software development space, given the structure, given the modeling that’s existing and given the tooling and the work that’s already gone into the industry the last 20 years, we’re just seeing huge wins and huge gains already.
So you are in charge of actually github.com operational security as well, or…?
Yes.
Yes. Do you have any cool stories you can tell us? Any long nights, any rough weekends? DDoS’es?
I mean, everybody in the security world’s had long nights and rough weekends.
Tell us some war stories. Come on, man, you’ve survived them. You survived it.
Surviving, maybe.
Yeah. So internally, real quick - so our security team is great. We basically – like, the short version is we protect the company, so we call ourselves hubbers; we protect our hubbers, the laptops and the data and the access of our internal systems, we protect the product… So operationally, on github.com and our products, but also working closely with our engineering partners to make sure that what we build and ship is secure and safe. And then we also help secure the community. So we have a research team that’s out looking for vulnerabilities in open source software. They’re helping educate the open source community and researchers on how to use things like CodeQL, and secret scanning, and how to incorporate AI into their secure development practices. So that’s kind of the three-pillar [unintelligible 01:06:39.07] that we have.
In terms of war stories - I mean, I think, interestingly, what’s been on my mind a little bit more lately has been… You know, I mentioned earlier that we offer a lot of amazing features for free to public repos. So Codespaces and Actions minutes, and free repos… And most of the capabilities that enterprises have, they’re very –
I know where you’re going with this one.
Well, you know, it turns out that threat actors also have figured that out. And when they hear things like free compute, they go right after it for – pick your abuse vector. We see campaigns that will try and escalate the number of stars that a particular repo has to increase its popularity. We see people trying to mine crypto using free compute. I mean, we see a lot. Hosting files on github.com that we’ll just say don’t have anything to do with software development. And so it’s a challenge. And it’s something that we have a fantastic team working on, we have been employing machine learning and AI for a while in this space, we’ll continue to do that… But it’s a really complex, challenging problem. And the balance we have to strike here is because we serve so much of the world’s software development ecosystem, we can’t turn the dials too strict, because then we start locking out hundreds, thousands of users that have legitimate use cases [unintelligible 01:08:03.10] And certainly, those things happen. You can’t fine-tune every filter to be perfect. But we really try and strike that balance of how we do that.
[01:08:12.15] So this is where we’re also working with our product counterparts to understand what are things we can make changes to in the actual product, or maybe a signup flow or things like that, that will decrease the likelihood of abuse, or impose costs a bit higher to actors who want to do that… And so that’s certainly something – there’s new campaigns every day that come out in this space that our teams are firefighting, and they’re doing a great job at it, and it’s something that’s just top of mind, because it’s something we don’t see getting less of, we’re definitely seeing an increase.
Right.
And AI is a tool that those actors are using as well. So they’re using AI to fake issues and pull requests, or whatever the content is, they’re using it to create fake profile pictures, all sorts of stuff.
Wow.
Never a boring day, I’m sure. So what happens in your life, in the life of Jacob, when a DDoS hits, or something? Are you on pager duty, or are you above that? Do you rush into the hospital – the hospital… [laughs]
Hopefully not the hospital.
No, rush into the office, or are you working from home? Do you get situations where “Hey, we’re getting DDoS-ed. What are we gonna do?”, and then what do you do in those circumstances?
Yeah, I think that’s always true in security teams. We’re a fully remote company, so there’s no rushing into the office. It’s usually rushing to my office, if needed… But we have a fantastic team. I kind of jokingly told the team “If you need me to log into production and do anything during an incident, we’ve probably already gotten to a state where things are pretty bad…”
“We’ve got bigger problems…” [laughs]
…and that’s not what I should be doing. So my goal is to support the teams, understand what they need, and figure out “Do we need to page more people?” and have the right people in, how can I support the great leaders that we already have… Then there’s a comms element to this as well. So depending on what the issue is – you know, we had to rotate one of our public keys last year, and we believe very deeply in transparency in what happens on the platform. We’re trusted in the community, and that trust is only maintained through transparency, I think. So a lot of it is how do we want to get these comms out as quickly as possible? How can we be as transparent as possible, sticking to the facts, and sharing as much as we can? Particularly actionable information. So that’s a lot of what we do. Thankfully, our engineering operations team is world-class, and handles a lot of the DDoS attacks, and they’re quite good at it. So on that particular one, we have a great engineering team on that.
No hospital runs for you then, I guess.
Let’s hope not.
Yeah. Keep you out of the hospital.
That’s good. You don’t want to go to the hospital, especially for a DDoS. That’s a different kind of DDoS, you know what I’m saying?
[laughs] You show up and you tell the doctor “I’ve got a DDoS”, he’s not gonna know what to do, you know?
“What? Get out of here. Go deal with that.” When you look at the open source supply chain – I really don’t even like to call it the open source supply chain, but it’s the industry accepted term, so bear with me… When you look at it and you realize that open source has obviously won, and you realize how important a role it plays in just obviously software at large, but innovation, new startups, new side projects, joy in an individual developer’s life, the freedoms that a person can have to create software and just share it simply… When you look at that entire ecosystem as a security expert, what do you wish would be there that’s not there today to secure it? If you had a magic wand and you just somehow wave it, and a couple new things appear - what would those things be? And what role could you personally play in making them possible?
[01:11:52.19] That’s a great question. I think it goes back to what we were talking about earlier, to be honest. I think that today there is a lot of variation and freedom… Which is a good thing, to be clear. I’m not suggesting we take that away. But there’s not necessarily clear, paved paths for open source developers, hobbyists, even more corporation-backed open source efforts, to know what the best practices are for building, securing, deploying, attesting, signing… It’s complicated, right? And I’ve been in this space for a really long time, and so when I rattle some of these things off, it may feel like “Oh, yeah, okay, cool. That’s easy”, but it’s not. We didn’t have SLSA frameworks 10-15 years ago. The frameworks and the thinking are there, but I don’t know that we as an ecosystem - and this is beyond GitHub; GitHub is part of that ecosystem - make it really easy for people to do the right thing. So build the right way, secure it, update it, patch it, deploy it, sign it… Like, that end-to-end flow is still complicated. People use – maybe they store their source code on GitHub, but they build it somewhere else. And then when they build it somewhere else, they’re not scanning it, or that place isn’t secure. And then when they upload it, we don’t have any way to see where it came from. And then when we download it on the other side, we don’t really have a way to automatically get a sense of the risk, because it’s difficult to tie all those things together.
And so I think if I could wave a magic wand, it would be essentially to have OS partners in industry - I think GitHub’s part of this - to make those things easier for developers to just do the right thing out of the box. And then, of course, have the freedoms if they need to do something more complex or different; that’s totally fine. But I think a lot of use cases just want to know “Okay, how do I build this thing and deploy it to this cloud provider? How do I build this thing and make it show up in PyPI, and have it trusted with a little badge on it?” And I think to do that well takes a significantly higher amount of work and expertise than would be optimal if we really want to scale this.
It sounds like that world that you just painted is a world where GitHub accepts more and more responsibility as a security center point. You’ve already accepted the responsibility of hosting the open source code, right? You’ve already accepted the responsibility of supporting, in all the ways, open source at large. So now the final layer might be more and more over time - if not just now - responsibility on the security front. Typosquatting… Like you had mentioned on Maven, if I can download something from there, or PyPI, or wherever, and I can have that attestation back… I think you’re already doing some of the proving grounds for this, but it sounds like you’re for GitHub accepting more and more responsibility from a security standpoint.
I mean, I’ll say that we as a company take that responsibility already very seriously, and we talk about it a lot internally. And I think at a broader level, I think each industry player in the space - we all have to take more responsibility for this. I don’t think it’s just GitHub, I think it’s all of the corporations that are not just investing in open source - because there’s many that do that; they pay developers to work on open source projects. And that’s great. But I think it’s also the organizations and companies that use open source prolifically need to take more responsibility in this space, too. And I think it’s all of us together. GitHub’s already taking these strides, and will continue to do that, and that’s why we’ve released things like advanced security for public repos. That doesn’t necessarily – like, that’s not a free thing for us to do, but it’s an important thing for us to do. So as far as I’m aware, that vision and direction is not going to change from us. We’re going to continue to invest in those things.
Well, let’s imagine then… Because this is probably pretty close to true. There’s a lot of people listening to this podcast with us, and they’re an hour-ish in, and they’re like “Man, this is awesome.” But they hear you say that and they say, “Wow, I would love to find a way into – I’m at one of those organizations that could partner with GitHub to bolster the security model of open source, of the open source supply chain.” In what ways can they reach into GitHub, talk to you, talk to others to create that bridge, to create that partnership? What are some of those paths and methods?
[01:16:18.23] Yeah, that’s a good question. I think from a practical perspective, without having to reach out, I think there’s some simple steps that a lot of organizations can take, which is go play with the new attestation capability and use it. Start signing artifacts and making it part of your build workflows, and then talk about it. Tell people how you’re doing it. Give us feedback on what would make it be better. Because I think those key scaffolding building blocks are so important to the industry right now. Turn on things like secret scanning and push protection. Show through example, lead through example on how to do these practices internally.
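To make the "start signing artifacts" suggestion a little more concrete, here is a minimal Python sketch of the core idea behind an attestation: a statement that ties an artifact's SHA-256 digest to where it was built from. This is only an illustration, not GitHub's attestation format or tooling. The function names, the field names and the example repository are all hypothetical, and the cryptographic signature that a real attestation carries is left out entirely.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of the artifact bytes."""
    return hashlib.sha256(data).hexdigest()


def make_statement(name: str, artifact: bytes, repo: str, commit: str) -> dict:
    """Build an unsigned provenance statement (hypothetical layout).

    A real attestation would be cryptographically signed; this sketch
    only records which digest came from which source.
    """
    return {
        "subject": [{"name": name, "digest": {"sha256": sha256_digest(artifact)}}],
        "predicate": {"source_repo": repo, "commit": commit},
    }


def matches(statement: dict, name: str, artifact: bytes) -> bool:
    """Check that the downloaded bytes are the ones the statement describes."""
    digest = sha256_digest(artifact)
    return any(
        subject["name"] == name and subject["digest"]["sha256"] == digest
        for subject in statement["subject"]
    )


# Hypothetical values, for illustration only.
statement = make_statement("app.tar.gz", b"release bytes", "example/app", "abc123")
print(matches(statement, "app.tar.gz", b"release bytes"))   # same bytes as built
print(matches(statement, "app.tar.gz", b"tampered bytes"))  # modified after build
```

The consumer side is the second half: whoever downloads the artifact recomputes the digest and compares it with the statement, which is the "see where it came from" link described earlier in the conversation.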
And I think in terms of the partnership angle, we have a fantastic OSPO, open source program office at GitHub, that does some of these partnerships. The security research team that I mentioned earlier is always out talking up to the security community about how to do these things and level this up and make it better… And then there’s other kinds of external entities. So there’s the Alpha Omega project as part of the Open SSF - the Open Source Security Software Foundation, if I got those acronyms right - that’s looking at ways that some of the bigger corporations like Microsoft and Amazon and others have invested money into, on how to level up the entire open source ecosystem security space, and what are the programs and possibilities that they can do to help do that. So I think there’s opportunities there for corporations to invest financially, if they so choose, to be able to do that.
And then at a very practical level, go sponsor your favorite open source project. I use Homebrew like crazy. Homebrew is awesome. Go sponsor it. That kind of stuff.
I dig it. How about the maintainers themselves? Give me some nuggets for specifically open source software maintainers that are either burdened, tired, excited… Pick any adjectives you want to describe a maintainer. What can they do to personally bolster their GitHub profile? What things should they do? What are specific things they could do on their repositories etc? Or even their organization, if they have an org for their repo. What are some things for them?
Yeah, I think open source maintainers are amazing, first of all, and I’m so thrilled that that’s part of the community we’re able to support every day. I do think that the adjectives you mentioned probably at some points describe every maintainer, maybe all at once, maybe four times differently a day, maybe over their journey… Because I think it can be overwhelming. Some maintainers don’t have a robust set of contributors that are helping, and it’s a one or two-person effort. Our hope is to be able to give them the security tools built into GitHub that get their level of security up to something that is significant.
And so this is where things like just go ahead and turn on code scanning if you haven’t done that yet, and experiment with it, and see if it can help secure the product. And things like attestation. Even if you don’t, as a maintainer, use or care about attestation, make it available to your developers, because it’s part of the repo to your users; turn it on and include it in there once we’ve kind of gone full GA with that. I think there’s things like that that developers can do and maintainers can do, and then I think there are other things that we are continuing to try and make more accessible and easier to maintainers as well. So things like we’ve got some scanning tools we released open source to help make GitHub Actions workflows more secure, and detect insecure or overprivileged requests in GitHub Actions. So there’s things like that as well to just kind of be aware of, and always reach out to the OSPO and other places in GitHub and the community for help on those things if folks need some additional guidance.
[01:19:59.22] I was thinking about something that is part of this conversation, and I’m going to share an idea with you. Maybe it’s a “feature request”, maybe it’s not, maybe it’s already there and I don’t know… But what if there was this idea of consensus for when you add a new maintainer to a repository? That there is a toggle that says, “Okay, every other repo out there secure by default is “I have powers, I can give power.” I don’t need consensus.” But if there’s one or more maintainers, you have to sort of have somebody give somebody access to become a maintainer. But then the other people who are part of the organization have to do it as well. And maybe some sort of like personal attestation, which is like “I, Adam Stacoviak, agree that Jerod Santo can give X, Y, Z access to maintainership and control of this repository.” Something that’s like a – because you can do that personally, right? But is there a way to bake that in with software? Just a simple thing like that. Does that add more configuration, does that add more burden to the process? I kind of feel like consensus is a natural thing to ask for, and why not bake it into the blessing of one more maintainer to the project?
Yeah. I mean, whether we should or not, I’m not going to touch that one, because there’s been many books, research papers and blog posts written on that. Cathedral and the Bazaar is still one of my favorites on the topic of open source maintainership and kind of thinking about these communities and systems… In terms of the technical side of it, that’s actually what we do internally at GitHub for entitlements. So our internal access system is done essentially the way you described it. So if somebody wants access to one of our tools, or a third party capability that we have inside, or they are a developer and they’re new, and they’re like “Oh, I need production access to do this thing”, or “I need that kind of access for this engineering system”, they actually open a pull request. And depending on the sensitivity of that entitlement, different people get tagged in to be able to approve that. And if it’s a very sensitive one, it’s gonna go all the way up to a VP. And if it’s extremely sensitive, we have to renew it every six months.
So the ability to do that, in some ways, is just baked into pull requests and the Git workflow already, which I think is really fantastic. So I think from an open source perspective, I think developers and maintainers could absolutely do something like that, have a maintainer file, and use a community-driven pull request approach to be able to do it. Whether we need to build something in addition, or on top of that, I think that’s a great question, and I would love folks smarter than me about open source maintainership and the socio kind of dynamics at play there to weigh in more than I would from a security perspective.
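As a rough illustration of the maintainer-file idea, the Python sketch below checks whether a pull request that adds someone to a maintainers list has been approved by every existing maintainer other than the pull request's author. It is an assumption-heavy sketch, not a GitHub feature: the file format (one handle per line), the function names and the handles are all hypothetical, and a real project might prefer a quorum rather than unanimity.

```python
def parse_maintainers(text: str) -> set[str]:
    """Parse a hypothetical maintainers file: one handle per line, '#' comments."""
    handles = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            handles.add(line)
    return handles


def has_consensus(before: set[str], after: set[str],
                  author: str, approvals: set[str]) -> bool:
    """Return True if adding maintainers was approved by all other maintainers.

    Only approvals from people who were already maintainers count, so a
    newcomer cannot vouch for themselves.
    """
    added = after - before
    if not added:
        return True  # nothing to vouch for
    required = before - {author}  # existing maintainers besides the author
    return required <= (approvals & before)


# Hypothetical handles, for illustration only.
before = parse_maintainers("# current maintainers\nalice\nbob\n")
after = parse_maintainers("alice\nbob\ncarol  # proposed\n")

print(has_consensus(before, after, author="alice", approvals={"bob"}))    # bob vouched
print(has_consensus(before, after, author="alice", approvals={"carol"}))  # only the newcomer approved
```

With a single maintainer, `required` is empty and the check passes, which matches the "I have powers, I can give power" default for solo projects. The approval trail itself lives in the pull request, so who vouched for whom stays on record.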
The maintainer file is a good start, I think. And you already have pull requests, so there’s no software needed to be written, really. It’s just sort of like [unintelligible 01:22:42.09] in a repository, everybody touches it… What do you think, Jerod? Is that a good start to something like that? Do you agree with that?
There are people smarter than me to answer this question… [laughter] No, I have only thought about it for a few minutes. It certainly makes sense in the context of people who agree that they want to do that. But I think there’s a lot of people that won’t want to do that, and so what do we do for them? Make them do it? Probably not. Give them tooling to do it? Maybe. But yeah, I mean, effectively, you’re vouching; you’re putting your name online for somebody else. And so at least then we have culpability in the case of bad vouch/good vouch. At least we know how it went down, and it wasn’t just like usurped authority, it was actually provided authority. So I can see some positives. As Jacob said, they do it internally at GitHub, and so it can work inside of chains of authority… But open source projects and chains of authority are often at odds with each other.
We use pull requests for everything inside GitHub. That’s how we do decision documents, that’s how we do all sorts of things, through pull requests… Which is nice, because we have the ability to kind of see the changes, and trace the approvals… And it’s even how we do security exceptions.
[01:24:03.12] Very cool.
Plus your GitHub support requests are really cheap, you know? You guys are getting cheap over there.
That’s right. Employee discount. Employee discount on pull requests.
That’s right. “This one’s free.”
Use ‘em if you’ve got ’em.
Everybody probably thinks my developer green square chart is wild. “Wow, that person develops all the time.” Actually, this is just the way we work at GitHub, so most people’s GitHub activity chart looks that way.
What’s left in terms of securing GitHub, or keeping it secure, whichever way you want to phrase it? You’d probably say “keeping it secure” versus “securing it”. But what else can we talk about that makes sense before we call this show done? What’s on your mind? What have we not asked you?
That’s a good question. I mean, we’ve covered a lot of the topics that I think are near and dear to my heart, certainly. I mean, I can probably talk about the work we do on the security team for another eight hours and not run out of things to talk about… But at a high level, we take that responsibility very seriously, that we talked about, being at the center of the developer ecosystem. It’s embedded into everything we do inside GitHub. It’s really great to see the partnership between the security team and the engineering teams and the product teams. It happens every day, all day. We’re side by side with our engineering teams, helping to build in security across the board…
And I think part of what we’re also excited about is the integration of AI into those capabilities, and what it’s going to do for not just being able to kind of have that there for the sake of it, but truly being able to make life easier for the developer, and remove some of that security toil, and just regular toil, from their plates, so they can focus on things they want to focus on, things that their teams and businesses want to focus on. And so at a high level, those are the things that we’re really focused on as a team, as a business… And I think it makes a lot of sense.
We appreciate it, Jacob. It’s been a lot of interesting conversation. I definitely am with you, I’m bullish on this attestation thing. I’m not bullish on how hard it is to say the word, but I do think it is a feature that should be highly leveraged to much success.
It’s not easy to spell either, but I agree.
Was there not a synonym? I mean, where’s the thesaurus? Can we pick something a little bit easier? [laughter]
Attest. Attestation.
This is not a test. Oh, it is?
It was a test. Yeah, you failed it, then you passed it. Jacob, thank you so much for taking time out of your day to just spit some security knowledge with us, take us through the ropes of what you’re doing there at GitHub… We obviously are massive fans of the platform and all the developers on there doing what they do… We appreciate you sharing your time. Thank you.
Thanks so much for having me. This was a great conversation.
Our transcripts are open source on GitHub. Improvements are welcome. 💚