Reduce calls to Maven Search API #901

Closed
@lread

Description

The Situation

I was getting 403 errors on cljdoc requests to the Maven search API, but only for our integration test when running on CircleCI.

A 403 from the Maven search API means an IP has been blacklisted.
I reached out to the Maven Central team. TL;DR: they had deployed a change that wasn't working out, and they reverted it.

But...

In our conversation, they asked about our usage of their API.
I shared the kind of calls we make:

Cljdoc hits your APIs for two groupIds: org.clojure and com.turtlequeue.
I'll describe the requests we make for org.clojure, but the pattern is the same for com.turtlequeue.

In production, once per hour, we hit:

https://search.maven.org/solrsearch/select?q=g:org.clojure&core=gav&rows=0

If the returned numFound is the same as our last fetch, cljdoc sees this as an indication that no new libs/versions are available and stops (this is the best test for this we could come up with!).
As these groups change very infrequently, cljdoc normally stops here.
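
A minimal sketch of that hourly check, in Python for illustration only (cljdoc itself is not written in Python). The helper names `count_url`, `fetch_json`, and `group_changed` are hypothetical, and the `response.numFound` JSON shape is an assumption based on the usual Solr response layout:

```python
import json
from urllib.request import urlopen

SEARCH_URL = "https://search.maven.org/solrsearch/select"


def count_url(group_id):
    """Build the rows=0 query, which asks only for the match count."""
    return f"{SEARCH_URL}?q=g:{group_id}&core=gav&rows=0"


def fetch_json(url):
    """Fetch a URL and parse the body as JSON (hypothetical helper)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def group_changed(group_id, last_num_found, fetch=fetch_json):
    """Return (changed, num_found) for a groupId.

    last_num_found is None when there is no previous fetch, for example
    right after a server start, so the group is treated as changed.
    """
    # Assumed response shape: {"response": {"numFound": <int>, ...}}
    num_found = fetch(count_url(group_id))["response"]["numFound"]
    return num_found != last_num_found, num_found
```

Passing `fetch` as a parameter keeps the check testable without touching the real API.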

When the returned numFound differs from the last fetch (an hour ago), or when the server has just started and there is no previous fetch, cljdoc gets all available artifacts for the groupId via:

https://search.maven.org/solrsearch/select?q=g:org.clojure&start=0&rows=200&core=gav

It repeats until it has fetched all artifacts for the group; I just ran this locally on my dev box and see:

https://search.maven.org/solrsearch/select?q=g:org.clojure&start=200&rows=200&core=gav
https://search.maven.org/solrsearch/select?q=g:org.clojure&start=400&rows=200&core=gav
https://search.maven.org/solrsearch/select?q=g:org.clojure&start=600&rows=200&core=gav
https://search.maven.org/solrsearch/select?q=g:org.clojure&start=800&rows=200&core=gav
https://search.maven.org/solrsearch/select?q=g:org.clojure&start=1000&rows=200&core=gav
https://search.maven.org/solrsearch/select?q=g:org.clojure&start=1200&rows=200&core=gav
https://search.maven.org/solrsearch/select?q=g:org.clojure&start=1400&rows=200&core=gav
https://search.maven.org/solrsearch/select?q=g:org.clojure&start=1600&rows=200&core=gav
https://search.maven.org/solrsearch/select?q=g:org.clojure&start=1800&rows=200&core=gav
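
The paging pattern above can be sketched as follows. This is an illustration, not cljdoc's actual code: `page_urls` is a hypothetical helper, and the `num_found` value in the usage note is an assumed figure chosen only to be consistent with the ten pages listed.

```python
SEARCH_URL = "https://search.maven.org/solrsearch/select"


def page_urls(group_id, num_found, rows=200):
    """List the paged search URLs needed to cover num_found results.

    One URL is produced per page of `rows` results, starting at offset 0.
    """
    return [
        f"{SEARCH_URL}?q=g:{group_id}&start={start}&rows={rows}&core=gav"
        for start in range(0, num_found, rows)
    ]
```

For example, `page_urls("org.clojure", 1900)` yields ten URLs with `start` running from 0 to 1800 in steps of 200.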

Then, cljdoc gets the pom description for the latest version of each artifact. From a local run on my dev box, I see:

https://search.maven.org/remotecontent?filepath=org/clojure/java.classpath/1.1.0/java.classpath-1.1.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.analyzer.js/0.1.0-beta5/tools.analyzer.js-0.1.0-beta5.pom
https://search.maven.org/remotecontent?filepath=org/clojure/clojurescript/1.11.132/clojurescript-1.11.132.pom
https://search.maven.org/remotecontent?filepath=org/clojure/data.priority-map/1.2.0/data.priority-map-1.2.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.logic/1.1.0/core.logic-1.1.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/data.json/2.5.0/data.json-2.5.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/algo.generic/1.0.1/algo.generic-1.0.1.pom
https://search.maven.org/remotecontent?filepath=org/clojure/google-closure-library-third-party/0.0-20230227-c7c0a541/google-closure-library-third-party-0.0-20230227-c7c0a541.pom
https://search.maven.org/remotecontent?filepath=org/clojure/pom.baseline/0.0.19/pom.baseline-0.0.19.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.logging/1.3.0/tools.logging-1.3.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.analyzer/1.2.0/tools.analyzer-1.2.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/clojure/1.12.0/clojure-1.12.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.rrb-vector/0.2.0/core.rrb-vector-0.2.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/pom.oss-deploy/0.0.19/pom.oss-deploy-0.0.19.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.deps/0.21.1449/tools.deps-0.21.1449.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.infer/0.6.0/core.typed.infer-0.6.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.unify/0.6.0/core.unify-0.6.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.annotator.jvm/0.8.0-alpha2/core.typed.annotator.jvm-0.8.0-alpha2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.nrepl/0.2.13/tools.nrepl-0.2.13.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.checker.js/0.8.0-alpha2/core.typed.checker.js-0.8.0-alpha2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.memoize/1.1.266/core.memoize-1.1.266.pom
https://search.maven.org/remotecontent?filepath=org/clojure/test.check/1.1.1/test.check-1.1.1.pom
https://search.maven.org/remotecontent?filepath=org/clojure/math.combinatorics/0.3.0/math.combinatorics-0.3.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/java.jdbc/0.7.12/java.jdbc-0.7.12.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.rt/0.6.0/core.typed.rt-0.6.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.async/1.6.681/core.async-1.6.681.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.contracts/0.0.6/core.contracts-0.0.6.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.macro/0.2.1/tools.macro-0.2.1.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.lib.core.async/0.8.0-alpha2/core.typed.lib.core.async-0.8.0-alpha2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.match/1.1.0/core.match-1.1.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/test.generative/1.1.0/test.generative-1.1.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/spec.alpha/0.5.238/spec.alpha-0.5.238.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.specs.alpha/0.4.74/core.specs.alpha-0.4.74.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.trace/0.8.0/tools.trace-0.8.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/data.zip/1.1.0/data.zip-1.1.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/java.jmx/1.1.0/java.jmx-1.1.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/jvm.tools.analyzer/0.6.2/jvm.tools.analyzer-0.6.2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.analyzer.jvm/0.8.0-alpha2/core.typed.analyzer.jvm-0.8.0-alpha2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/data.generators/1.1.0/data.generators-1.1.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/data.xml/0.2.0-alpha9/data.xml-0.2.0-alpha9.pom
https://search.maven.org/remotecontent?filepath=org/clojure/algo.monads/0.2.0/algo.monads-0.2.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed/0.6.0/core.typed-0.6.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.analyzer.common/0.8.0-alpha2/core.typed.analyzer.common-0.8.0-alpha2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.lang.jvm/0.8.0-alpha2/core.typed.lang.jvm-0.8.0-alpha2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.analyzer.jvm/1.3.0/tools.analyzer.jvm-1.3.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/math.numeric-tower/0.1.0/math.numeric-tower-0.1.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/data.fressian/1.1.0/data.fressian-1.1.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.deps.graph/1.1.90/tools.deps.graph-1.1.90.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.runtime.jvm/0.8.0-alpha2/core.typed.runtime.jvm-0.8.0-alpha2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.emitter.jvm/0.1.0-beta5/tools.emitter.jvm-0.1.0-beta5.pom
https://search.maven.org/remotecontent?filepath=org/clojure/data.int-map/1.3.0/data.int-map-1.3.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.lib.clojure/0.8.0-alpha2/core.typed.lib.clojure-0.8.0-alpha2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/data.avl/0.2.0/data.avl-0.2.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/data.csv/1.1.0/data.csv-1.1.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/google-closure-library/0.0-20230227-c7c0a541/google-closure-library-0.0-20230227-c7c0a541.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.gitlibs/2.5.197/tools.gitlibs-2.5.197.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.build/0.9.2/tools.build-0.9.2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/data.finger-tree/0.1.0/data.finger-tree-0.1.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/data.codec/0.2.0/data.codec-0.2.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/buildtest/0.2.6/buildtest-0.2.6.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.namespace/1.5.0/tools.namespace-1.5.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.cli/1.1.230/tools.cli-1.1.230.pom
https://search.maven.org/remotecontent?filepath=org/clojure/clojure-contrib/1.0.0/clojure-contrib-1.0.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.deps.cli/0.11.72/tools.deps.cli-0.11.72.pom
https://search.maven.org/remotecontent?filepath=org/clojure/clojure-install/0.1.21/clojure-install-0.1.21.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.checker.jvm/0.8.0-alpha2/core.typed.checker.jvm-0.8.0-alpha2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed-pom/0.8.0-alpha2/core.typed-pom-0.8.0-alpha2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/java.data/1.2.107/java.data-1.2.107.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.reader/1.5.0/tools.reader-1.5.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/pom.contrib/1.2.0/pom.contrib-1.2.0.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.cache/1.1.234/core.cache-1.1.234.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.incubator/0.1.4/core.incubator-0.1.4.pom
https://search.maven.org/remotecontent?filepath=org/clojure/core.typed.analyzer.js/0.8.0-alpha2/core.typed.analyzer.js-0.8.0-alpha2.pom
https://search.maven.org/remotecontent?filepath=org/clojure/tools.deps.alpha/0.15.1254/tools.deps.alpha-0.15.1254.pom
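
Each pom URL in the list above follows the same layout, which a small helper can reproduce. `pom_url` is a hypothetical name used for illustration; only the URL pattern is taken from the list.

```python
REMOTE_CONTENT_URL = "https://search.maven.org/remotecontent"


def pom_url(group_id, artifact_id, version):
    """Build the remotecontent URL for an artifact's pom.

    Dots in the groupId become path separators; dots in the artifactId
    and version are kept as-is.
    """
    group_path = group_id.replace(".", "/")
    filepath = f"{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.pom"
    return f"{REMOTE_CONTENT_URL}?filepath={filepath}"
```

For example, `pom_url("org.clojure", "data.json", "2.5.0")` reproduces the data.json entry in the list.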

So for org.clojure: 1 request per hour to check for any new lib/versions, then if there are changes (there rarely are), an additional 83 requests.
Not shown, but for com.turtlequeue, same idea, 1 request per hour to check, then if there are changes (there rarely are), an additional 5 requests.

They reviewed our usage and conveyed back:

We just checked your usage of the API, and it looks nice. You should not be blocked if you do not pass around 1000 requests in a span of 5 minutes. In this case, it was the little change we made that messed up the boundaries. Anyway, any effort you can make to reduce the number of requests would be appreciated. At the end, this is a free service, and any of these improvements help the community too :)

So we are well under that usage, but we could do much better by persisting the cache.
And if we persist the cache, we really only need to fetch pom descriptions for new versions.
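
One possible shape for this, sketched in Python purely for illustration: the JSON file format, the cache layout, and the names `load_cache`, `save_cache`, and `refresh_descriptions` are all assumptions, not a description of how cljdoc stores anything.

```python
import json
from pathlib import Path


def load_cache(path):
    """Load the persisted cache, or start empty if none exists yet."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}


def save_cache(path, cache):
    """Persist the cache so it survives a server restart."""
    Path(path).write_text(json.dumps(cache, indent=2, sort_keys=True))


def refresh_descriptions(cache, latest_versions, fetch_description):
    """Fetch pom descriptions only for artifacts whose latest version changed.

    cache:             {artifact: {"version": str, "description": str}}
    latest_versions:   {artifact: latest version from the search API}
    fetch_description: callable (artifact, version) -> description

    Updates cache in place and returns the artifacts that were fetched.
    """
    fetched = []
    for artifact, version in latest_versions.items():
        entry = cache.get(artifact)
        if entry is None or entry["version"] != version:
            cache[artifact] = {
                "version": version,
                "description": fetch_description(artifact, version),
            }
            fetched.append(artifact)
    return fetched
```

With a cache like this, a restart would cost a single numFound check per group instead of a full refetch, and a new release would trigger one pom request rather than one per artifact.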

TODO
