The Situation
I was getting 403 errors on cljdoc requests to the Maven search API.
But only for our integration test when running on CircleCI.
A 403 from the Maven search API means an IP has been blacklisted.
I reached out to the Maven Central team and, TL;DR, they had deployed a change that wasn't working out and reverted it.
But...
In our conversation, they asked about our usage of their API.
I shared the kind of calls we make:
Cljdoc hits your APIs for two groupIds: org.clojure and com.turtlequeue.
I'll describe the requests we make for org.clojure, but the pattern is the same for com.turtlequeue.
In production, once per hour, we hit:
https://search.maven.org/solrsearch/select?q=g:org.clojure&core=gav&rows=0
If the returned numFound is the same as our last fetch, cljdoc sees this as an indication that no new libs/versions are available and stops (this is the best test for this we could come up with!).
As these groups change very infrequently, cljdoc normally stops here.
When the returned numFound differs from the last fetch (an hour ago, or if the server just started, and there is no previous fetch), cljdoc gets all available artifacts for the groupId via:
https://search.maven.org/solrsearch/select?q=g:org.clojure&start=0&rows=200&core=gav
It repeats until it has fetched all artifacts for the group; I just ran this locally on my dev box and see:
https://search.maven.org/solrsearch/select?q=g:org.clojure&start=200&rows=200&core=gav
https://search.maven.org/solrsearch/select?q=g:org.clojure&start=400&rows=200&core=gav
https://search.maven.org/solrsearch/select?q=g:org.clojure&start=600&rows=200&core=gav
https://search.maven.org/solrsearch/select?q=g:org.clojure&start=800&rows=200&core=gav
https://search.maven.org/solrsearch/select?q=g:org.clojure&start=1000&rows=200&core=gav
https://search.maven.org/solrsearch/select?q=g:org.clojure&start=1200&rows=200&core=gav
https://search.maven.org/solrsearch/select?q=g:org.clojure&start=1400&rows=200&core=gav
https://search.maven.org/solrsearch/select?q=g:org.clojure&start=1600&rows=200&core=gav
https://search.maven.org/solrsearch/select?q=g:org.clojure&start=1800&rows=200&core=gav
Then, cljdoc gets the pom description for the latest version of each artifact. From a local run on my dev box, I see:
https://search.maven.org/remotecontent?filepath=org/clojure/java.classpath/1.1.0/java.classpath-1.1.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.analyzer.js/0.1.0-beta5/tools.analyzer.js-0.1.0-beta5.pom
https://search.maven.org/remotecontent?filepath=org/clojure/clojurescript/1.11.132/clojurescript-1.11.132.pom
https://search.maven.org/remotecontent?filepath=org/clojure/data.priority-map/1.2.0/data.priority-map-1.2.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.logic/1.1.0/core.logic-1.1.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/data.json/2.5.0/data.json-2.5.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/algo.generic/1.0.1/algo.generic-1.0.1.pom
https://search.maven.org/remotecontent?filepath=org/clojure/google-closure-library-third-party/0.0-20230227-c7c0a541/google-closure-library-third-party-0.0-20230227-c7c0a541.pom
https://search.maven.org/remotecontent?filepath=org/clojure/pom.baseline/0.0.19/pom.baseline-0.0.19.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.logging/1.3.0/tools.logging-1.3.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.analyzer/1.2.0/tools.analyzer-1.2.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/clojure/1.12.0/clojure-1.12.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.rrb-vector/0.2.0/core.rrb-vector-0.2.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/pom.oss-deploy/0.0.19/pom.oss-deploy-0.0.19.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.deps/0.21.1449/tools.deps-0.21.1449.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.infer/0.6.0/core.typed.infer-0.6.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.unify/0.6.0/core.unify-0.6.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.annotator.jvm/0.8.0-alpha2/core.typed.annotator.jvm-0.8.0-alpha2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.nrepl/0.2.13/tools.nrepl-0.2.13.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.checker.js/0.8.0-alpha2/core.typed.checker.js-0.8.0-alpha2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.memoize/1.1.266/core.memoize-1.1.266.pom
https://search.maven.org/remotecontent?filepath=org/clojure/test.check/1.1.1/test.check-1.1.1.pom
https://search.maven.org/remotecontent?filepath=org/clojure/math.combinatorics/0.3.0/math.combinatorics-0.3.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/java.jdbc/0.7.12/java.jdbc-0.7.12.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.rt/0.6.0/core.typed.rt-0.6.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.async/1.6.681/core.async-1.6.681.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.contracts/0.0.6/core.contracts-0.0.6.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.macro/0.2.1/tools.macro-0.2.1.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.lib.core.async/0.8.0-alpha2/core.typed.lib.core.async-0.8.0-alpha2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.match/1.1.0/core.match-1.1.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/test.generative/1.1.0/test.generative-1.1.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/spec.alpha/0.5.238/spec.alpha-0.5.238.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.specs.alpha/0.4.74/core.specs.alpha-0.4.74.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.trace/0.8.0/tools.trace-0.8.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/data.zip/1.1.0/data.zip-1.1.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/java.jmx/1.1.0/java.jmx-1.1.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/jvm.tools.analyzer/0.6.2/jvm.tools.analyzer-0.6.2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.analyzer.jvm/0.8.0-alpha2/core.typed.analyzer.jvm-0.8.0-alpha2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/data.generators/1.1.0/data.generators-1.1.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/data.xml/0.2.0-alpha9/data.xml-0.2.0-alpha9.pom
https://search.maven.org/remotecontent?filepath=org/clojure/algo.monads/0.2.0/algo.monads-0.2.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed/0.6.0/core.typed-0.6.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.analyzer.common/0.8.0-alpha2/core.typed.analyzer.common-0.8.0-alpha2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.lang.jvm/0.8.0-alpha2/core.typed.lang.jvm-0.8.0-alpha2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.analyzer.jvm/1.3.0/tools.analyzer.jvm-1.3.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/math.numeric-tower/0.1.0/math.numeric-tower-0.1.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/data.fressian/1.1.0/data.fressian-1.1.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.deps.graph/1.1.90/tools.deps.graph-1.1.90.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.runtime.jvm/0.8.0-alpha2/core.typed.runtime.jvm-0.8.0-alpha2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.emitter.jvm/0.1.0-beta5/tools.emitter.jvm-0.1.0-beta5.pom
https://search.maven.org/remotecontent?filepath=org/clojure/data.int-map/1.3.0/data.int-map-1.3.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.lib.clojure/0.8.0-alpha2/core.typed.lib.clojure-0.8.0-alpha2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/data.avl/0.2.0/data.avl-0.2.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/data.csv/1.1.0/data.csv-1.1.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/google-closure-library/0.0-20230227-c7c0a541/google-closure-library-0.0-20230227-c7c0a541.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.gitlibs/2.5.197/tools.gitlibs-2.5.197.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.build/0.9.2/tools.build-0.9.2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/data.finger-tree/0.1.0/data.finger-tree-0.1.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/data.codec/0.2.0/data.codec-0.2.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/buildtest/0.2.6/buildtest-0.2.6.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.namespace/1.5.0/tools.namespace-1.5.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.cli/1.1.230/tools.cli-1.1.230.pom
https://search.maven.org/remotecontent?filepath=org/clojure/clojure-contrib/1.0.0/clojure-contrib-1.0.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.deps.cli/0.11.72/tools.deps.cli-0.11.72.pom
https://search.maven.org/remotecontent?filepath=org/clojure/clojure-install/0.1.21/clojure-install-0.1.21.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.checker.jvm/0.8.0-alpha2/core.typed.checker.jvm-0.8.0-alpha2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed-pom/0.8.0-alpha2/core.typed-pom-0.8.0-alpha2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/java.data/1.2.107/java.data-1.2.107.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.reader/1.5.0/tools.reader-1.5.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/pom.contrib/1.2.0/pom.contrib-1.2.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.cache/1.1.234/core.cache-1.1.234.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.incubator/0.1.4/core.incubator-0.1.4.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.analyzer.js/0.8.0-alpha2/core.typed.analyzer.js-0.8.0-alpha2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.deps.alpha/0.15.1254/tools.deps.alpha-0.15.1254.pom
So for org.clojure: 1 request per hour to check for any new lib/versions, then if there are changes (there rarely are), an additional 83 requests.
Not shown, but for com.turtlequeue, same idea, 1 request per hour to check, then if there are changes (there rarely are), an additional 5 requests.
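For readers who prefer code, the flow described above can be sketched roughly as follows. This is an illustrative Python sketch, not cljdoc's actual code (cljdoc is written in Clojure). The function names are hypothetical, and the numFound value of 1900 in the usage note is an assumed figure chosen only because it produces the same ten pages (start=0 through start=1800) listed above. The URL shapes are copied from the requests shown in this issue.

```python
SEARCH_URL = "https://search.maven.org/solrsearch/select"
CONTENT_URL = "https://search.maven.org/remotecontent"
PAGE_SIZE = 200  # rows per page, matching the requests listed above


def count_url(group_id):
    """URL for the cheap hourly check: rows=0 means only numFound matters."""
    return f"{SEARCH_URL}?q=g:{group_id}&core=gav&rows=0"


def needs_refresh(previous_num_found, current_num_found):
    """True when there is no previous fetch, or numFound has changed."""
    return previous_num_found is None or previous_num_found != current_num_found


def page_urls(group_id, num_found, rows=PAGE_SIZE):
    """URLs needed to page through num_found results, `rows` at a time."""
    return [
        f"{SEARCH_URL}?q=g:{group_id}&start={start}&rows={rows}&core=gav"
        for start in range(0, num_found, rows)
    ]


def pom_url(group_id, artifact_id, version):
    """URL of the pom for one artifact version (dots in the groupId become slashes)."""
    group_path = group_id.replace(".", "/")
    filepath = f"{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.pom"
    return f"{CONTENT_URL}?filepath={filepath}"
```

With an assumed numFound of 1900, `page_urls("org.clojure", 1900)` yields ten page requests, and `pom_url("org.clojure", "data.json", "2.5.0")` reproduces one of the pom URLs listed above. When `needs_refresh` is false, none of these follow-up requests are made.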
They reviewed our usage and conveyed back:
We just checked your usage of the API, and it looks nice. You should not be blocked if you do not pass around 1000 requests in a span of 5 minutes. In this case, it was the little change we made that messed up the boundaries. Anyway, any effort you can make to reduce the number of requests would be appreciated. At the end, this is a free service, and any of these improvements help the community too :)
So we are well under that usage, but we could do much better by persisting the cache.
And if we persist the cache, we really only need to fetch pom descriptions for new versions.
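To make the idea concrete, here is a minimal Python sketch of what persisting the cache could look like. It is not cljdoc's implementation (cljdoc is Clojure, and the real change is in the linked PR). The JSON file layout, the function names, and the example versions in the usage note are all assumptions made for illustration.

```python
import json
from pathlib import Path


def load_cache(path):
    """Load the persisted cache, or return an empty one on first run."""
    p = Path(path)
    if not p.exists():
        # No previous fetch: num_found of None forces a full refresh.
        return {"num_found": None, "latest": {}}
    return json.loads(p.read_text())


def save_cache(path, cache):
    """Write the cache to disk so it survives a server restart."""
    Path(path).write_text(json.dumps(cache, indent=2, sort_keys=True))


def poms_to_fetch(cached_latest, fresh_latest):
    """Artifacts whose latest version is new or differs from the cached one.

    Both arguments map artifactId -> latest version string.
    """
    return {
        artifact: version
        for artifact, version in fresh_latest.items()
        if cached_latest.get(artifact) != version
    }
```

For example, if the cache holds `{"data.json": "2.4.0", "clojure": "1.12.0"}` and a fresh listing reports `{"data.json": "2.5.0", "clojure": "1.12.0"}`, only the data.json pom needs fetching. With an in-memory cache, a restart loses everything and every pom is fetched again.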
TODO
- Raise awareness: log the number of requests we make per Maven Central download operation (done in Log maven API calls #966)
- Document Maven Central Team's response in our developer guide (done in Reduce Requests to Maven Central #967)
- Review their API to see if we can use it differently (I remember doing this a while back, but worth a revisit) (done, I think we are using their API the best way we can).
- Look into persisting the cache to further reduce hits (done in Reduce Requests to Maven Central #967)
- And, after that, hey, we should probably also be grabbing the io.github.clojure groupId, that's where tools.build lives. (update: done in Allow docs to be built for io.github.clojure/tools.build #964)
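
On the first item (logging how many requests we make), one possible shape is a small sliding-window counter compared against the rough guidance of 1000 requests per 5 minutes. This is a hedged Python sketch, not what #966 actually does. The class name and its methods are hypothetical, and the limit is the approximate figure from the Maven Central team's reply, not a documented API contract.

```python
import time
from collections import deque


class RequestWindow:
    """Counts requests made within the last `window_seconds`."""

    def __init__(self, limit=1000, window_seconds=300, clock=time.monotonic):
        self.limit = limit                  # approximate budget from the reply above
        self.window_seconds = window_seconds
        self.clock = clock                  # injectable so tests can control time
        self._stamps = deque()

    def _prune(self, now):
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()

    def record(self):
        """Record one request and return the count within the current window."""
        now = self.clock()
        self._prune(now)
        self._stamps.append(now)
        return len(self._stamps)

    def count(self):
        """Number of requests recorded within the current window."""
        self._prune(self.clock())
        return len(self._stamps)

    def within_budget(self):
        """True while the windowed count is still below the limit."""
        return self.count() < self.limit
```

Calling `record()` once per outgoing request and logging the returned count gives the per-operation number. Going by the figures above, a worst-case refresh of both groups is (1 + 83) + (1 + 5) = 90 requests, far below the limit.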