Time for an occassional update on the shared bigquery dataset all about the Apps Script ecosystem on Github, with reports available on looker. For details of how it all works see Gassypedia – public Google Apps Script on Github.
Bear in mind that we are just looking at Apps Script repos shared publicly on Github. I have no way of kowing if this summary is representative of what people are up to with Apps Script, or if the welcome increased activity we’re starting see over the past 6 months is due to more usage or just more github usage.
I’ll continue to update this every few months so we can see how Apps Script usage is progressing.
- Are the number of repos on github featuring Apps Script and developers using it increasing?
- Who are these developers, what are they working on and which oauth scopes, advanced services are they using ?
- How many webapps, datastudio connectors, add-ons are publicly shared on github?
- Which libraries are the most popular and who is using them?
- Which devs are hireable, and what does their Apps Script Github contribution look like?
- Because the report is on looker and publicly available, it’s easy to filter and drill down and find someone to hire with the exact experience you need. Or perhaps find a project exactly like yours to take a look at the source code.
- The data is a publicly available dataset on bigquery. You can use it to do your own queries and research. Please let me know if you’d like to feature any findings from the data here.
Summary
Summary takeaways
With only 6 months of data so far it’s probably premature to draw too many conclusions (after all this is a marathon, not a sprint) but compared to 2 months ago we have a healthy growth in all areas. When looking for trends it might be worthwhile to look at %age changes compared to the organic growth in shared code since last time.
- 7 % increase in number of developers detected
- 6% increas in number of repos shared
- 6% increase in number of manifests
Addons
- 10% increase in total number of Addons shared
- Sheets pushes Gmail into second most popular add-on type. Gmail numbers have been been pretty static throughout the entire measurement period
- 19% increase in Sheets (now first place)
- 16% increase in Chat (now in third place)
- Steady increases in all the others except calendar and meet (there’s still only 1)
Here’s the current data.
Although the detail of this report is focused on the Apps Script content we can see that those that are sharing Apps Script content also share a whole lot more of other stuff. To create this latest extract I had to look at the detail from over 3000 devs who’ve published Apps Script code, and analyze 100m files across almost 200k repos.
Only 2% of those repos were Apps Script related, so it does give an indication that many prolific developers are getting into Apps Script. With 180k followers and 3.5m stars between them, we can see that these 3000+ developers do have a lot of influence. In some of the detailed reports on Looker you can see who they are and what they are doing with Apps script.
Developer profiles
Here’s the first few developer profiles
Where are the Apps Script developers located ?
Number of developers mapped (if they have publicly shared their location on github)
Which repos contain Apps Script projects ?
These are organized by developer, then by each repo they own.
Here’s the first few entries in that report
What are the profiles of Apps Script manifest?
This time we’re looking at the manifest timezone, runtimes and clasp usage. This report for example is saying that clasp usage was detected for 24.2% of manifests (up 1% from last time). Worth bearing in mind that clasp users are most likely to also use github, so it’s a little odd that we have a much lower growth than we’d expect to see simply from organic growth. It’s possible of course that clasp usage is underreported if the clasp.json files are not committed to github along with the project.
Which oauth scopes are these scripts using ?
Looking at oauth scopes, we can see for example
- Google workspace has 9 repos containing 41 manifests (up 3 on last time) using at least one oauth scope
- in total there are 813 repos and 950 manifests where auth/script_external_request is used (7% up on last time).
Which advanced services are these scripts using ?
In a similar format, here’s an analysis of which advanced services are in use. Here we can see for example
- The most popular advanced service in total is drive, with 443 manifests (11% increase) in 417 repos (12% increase) using it
- Although the overall increase in advanced service usage is greater than the organic growth in App Script projects, the most prolific users (top of the list below) appear to have stopped adding them to new projects.
Who is creating datastudio connectors ?
This is an interesting inventory of datastudio connector usage showing not only the developer, but also the company creating the connector as well as its logo where available. As an example, we can see that
- Vibhor Kumar has 2 repos containing 28 data connector manifests
- Google Data Studio Developer relations reference the Meetup data connector in 9 manifests
Who is publishing their Add-ons to github ?
In a similar format, let’s see who is publishing which type of Add-ons. In this example
- Martin Hawksey has 3 repos containing the manifests for 7 Add-ons
- Cats has 8 Add-on manifests – all of them have a Gmail section and 7 have a Drive and a Calendar section
Note that ‘chat’ isn’t strictly speaking an Add-on, but I’ve included it here. Since it’s not an Add-on there isn’t a name in the manifest so for now I just lump them together. There’s a 22% increase in Chat projects.
Who is creating webapps ?
Here are the developers an number of manifests which feature webapps, along with some characteristics. As an example we can see
- Tom Fisher has 36 manifests which contain webapps
- 53% of all webapps are public (1% increase from last time)
- 81% of all webapps are executed as the user deploying (6% decrease from last time)
- 5% of all webapps originate in Japan
Who is referencing libraries ?
Here we can see who is using which libraries. In this example
- The most popular library is oauth2, with 235 distinct users (8% increase) using it in 360 manifests (9% increase)in 282 repos (7% increase).
- There are 19 different versions of oauth2 in use
- Andrew Roberts is using 21 libraries in 24 manifests
- Some libraries appear to have been cloned a few times
Who is creating libraries
Trying to identify who is creating the libraries is a difficult problem because the scriptId (which the manifest would reference as the libraryId) is not readily available in github. Here’s the strategy I use to have a shot at associating them
- If the manifest has a clasp file in the same folder, the scriptId can be found there. However, if a repo has been cloned with a clasp.json file in it, unless the dev has cleaned up, the scriptId could point to the original script project rather than the cloned project. This can lead to false positives – ie the clone of the project can be mistaken for the source project.
- Gasgit mantains an info.json file when it commits to github so the same methodology as clasp to identify the scriptId
- Only 21% of manifests have a related clasp or gasgit file. For the others, I assume the most popular usersymbol associated with the libraryId is the project name. I search all the known manifest paths with no previously assigned scriptId, I search for the best match for the usersymbol and assign the libraryId as its scriptId.
- Finally, most libraries referenced do not have their code shared on github
Other approaches
None of these methods are 100% reliable, but it’s the best I have at the moment. I can easily apply new methods retrospectively, so if I come up with something better I’ll rework the data over time. Ideally, if the manifest would contain its own scriptID, it would make things much simpler (although we’d still have the cloning issue).
Another improvement could be to use the script API to access the project by libraryId (where the script info is public) and like that we could try to match the email of the project owner with that of the github owner, but that again requires that the required info is the same and public in both the script API and github.
In any case watch this space for algorithm improvements.
Library creators
Here’s the currently deduced state of library producers. It’s certainly not exhaustive – my own 107 figure is probably quite accurate as all my projects use either gasgit or clasp to help with identification.
Conclusion
Google probably could produce the universe of scripts, developers and so on so we could see what proportion is open source. It would also help to build github integration into the IDE instead of relying on tools like clasp – which means stepping out of the IDE, or gasgit which means running or triggering an external script to push to github.
Running against the github API, which has a number of awkward rate limits, takes about a day and half. Most of the time is wait time – every now again there’s a half hour wait before another request is possible. By contrast, a cache only run takes just a few minutes. I don’t expect to run this more than once every couple of months month or so.
If you do want to contribute interesting queries or reports you’ve made off the big query dataset, ping me so we can share them.
In particular I’m looking for Looker reports that compare relative growth across time that I haven’t had the time to create, and it would great to see some network diagramming using a tool like gephi, NodeXL, D3 etc.
The collection and loading to bigquery of this data is straightfoward enough, and you can find the code at https://github.com/brucemcpherson/gitsplit