My domain is: We have 80 Million Domains in management (e.g. puppen.de)

I ran this command: we are using the acme client

We are currently ordering ~50 certificates per second, every second of the day. This worked fine for the last 2 months. We are now at ~55 Million active certificates but have noticed that the challengeAuthorization step duration has increased to ~15 seconds (from ~1 sec before).
At this latency we are unable to keep up with certificate issuance. Is there anything we can do to get the challenge authorization time back down?

For a single domain, 15 seconds is not much, but at our scale we are running into issues.

@Bastian.TI, welcome to the community!

If the certificate request process is fully parallelized, then it should not matter how long getting one certificate takes. Are you aware of any interdependency in your process of acquiring different certificates?


Can you be more specific about which ACME client? There are a lot of options. More specifics about how exactly you're running your renewal processes might be helpful too.


It might have something to do with a more sequential approach which, I believe, was introduced a short time ago. Earlier, the authorization server would attempt the challenge at the primary location and the secondary locations simultaneously, but I believe the validation server currently tries the primary location first, and only if that one succeeds are the secondary validation locations 'triggered' to attempt their validation.

That said, I agree with Bruncsak: the duration shouldn't really matter. 1 second, 15 seconds or 30 seconds, it doesn't matter. It might need more powerful hardware though, as it would require more parallel issuances. But I don't think this is something you or Let's Encrypt have influence over. Perhaps it might be faster if LE would also increase their processing power, but as LE already issues more than 5 million certs daily, I'd say a 15-second wait is just fine.


Not even that. Each process is practically idling for 15 seconds (the system call is sleep in a loop) for each issuance. That does not require more hardware. It is just a question of reorganizing the scheduler and removing interdependencies.
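
To make that concrete, here is a rough sketch in Python with asyncio. It is not the original poster's client: `issue_one` is a hypothetical stand-in that only sleeps, and the 15 seconds is scaled down so the demo finishes quickly. The 50 orders per second and 15 seconds come from the first post. By Little's law the number of orders in flight is rate times latency, so roughly 750 tasks would be waiting at any moment, and waiting tasks cost almost no CPU.

```python
import asyncio
import time


def required_in_flight(rate_per_sec: float, latency_sec: float) -> float:
    """Little's law: average orders in flight = arrival rate * latency."""
    return rate_per_sec * latency_sec


async def issue_one(latency_sec: float) -> None:
    # Hypothetical stand-in for one order: the client just waits on the CA.
    await asyncio.sleep(latency_sec)


async def run_batch(n_orders: int, latency_sec: float) -> float:
    """Start n_orders at once and return the wall-clock time until all finish."""
    start = time.perf_counter()
    await asyncio.gather(*(issue_one(latency_sec) for _ in range(n_orders)))
    return time.perf_counter() - start


if __name__ == "__main__":
    # 50 orders/s * 15 s latency = 750 orders in flight
    print(required_in_flight(50, 15))
    # 15 s scaled down to 0.15 s so the demo is quick
    elapsed = asyncio.run(run_batch(750, 0.15))
    print(f"750 concurrent orders finished in {elapsed:.2f}s")
```

The batch should finish in roughly one latency period rather than 750 of them, which is the point: as long as nothing serializes the orders, higher per-order latency raises the number of open orders, not the time to get through the queue.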


Depending on the software (how efficiently it's coded, the language, etc.), it might need more memory.


Can you check your logs to determine when this delay first started appearing? Also can you pro-actively share any identifying info that your rate limit exemptions are tied to?

My guess is that you got hit by some of the newer throttling systems that Let's Encrypt has been testing. Knowing the time will help them pinpoint the exact code or network configuration change.

Their staff will likely chime in here about that in a bit, and they may be able to address this with an account based fix (hence the need for identifying information). If you're able to share anything identifiable, you'll save the delay of an inevitable back & forth to share these details.


Hi Bastian!

As the others have said, we're going to need a lot more information in order to be helpful. What ACME client are you using? How many different hosts/IPs are you using to make requests to our API? What is/are your account ID(s)? What are your rate limit overrides? What do your logs show happening during that 15 seconds?

It's true that we've been having to deal with some overload conditions recently. In particular, we've had to establish some global requests-per-IP-per-second rate limits, to ensure that some clients which have been flooding us with thousands of requests per second don't degrade service for everyone else. We'll need to know the answers to the questions above in order to diagnose if those rate limits are causing your latency, or if the cause is something else.


I would also suggest spreading your certificates across as many CAs as possible, so at the very least consider using Google Trust Services and load balancing your certificate orders.

I assume your ACME client is custom and your ACME responses (HTTP?) are also using a custom system. Check that you are definitely responding fast enough. In my test, timing curl -I http://puppen.de/.well-known/acme-challenge/test, the round trip takes 1.27s, and HTTP validation will require multiple requests (recently increasingly so), albeit with some level of parallelism, but the results all have to coalesce to get a final verdict.

Review your renewal batching - 10 certs per second is 77M per 90 days, unless I've worked that out wrong(?).
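
For anyone who wants to check the arithmetic, a quick Python sketch follows. The 10 and 50 certs per second and the 80 million domains are from this thread; renewing every 60 days is my own assumption for illustration, not something the original poster stated.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def certs_per_window(rate_per_sec: float, days: int = 90) -> float:
    """Certificates issued in `days` days at a steady rate."""
    return rate_per_sec * SECONDS_PER_DAY * days


def rate_needed(total_certs: float, days: int) -> float:
    """Steady rate (certs/second) needed to issue total_certs once every `days` days."""
    return total_certs / (SECONDS_PER_DAY * days)


print(certs_per_window(10))  # 77760000
print(certs_per_window(50))  # 388800000
# Assumption: one cert per domain, renewed every 60 days.
print(round(rate_needed(80_000_000, 60), 1))  # 15.4
```

So 10 per second does come to about 77.8M per 90 days, and 50 per second sustained would be several times what one certificate per domain needs, which is why the batching is worth a look.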

If you are mainly parking domains consider whether your overall issuance could be reduced using wildcards (puppen.de has many different certs).

If you control DNS for these domains consider using DNS validation if you don't already. I can suggest an architecture for that if interested.
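
As a starting point, the record side of DNS validation is small. Per RFC 8555 (section 8.4), the dns-01 challenge expects a TXT record at _acme-challenge.<domain> whose value is the unpadded base64url encoding of the SHA-256 digest of the key authorization. A minimal Python sketch, where the function names are my own and the key authorization string is assumed to come from your ACME client:

```python
import base64
import hashlib


def dns01_record_name(domain: str) -> str:
    """Name of the TXT record the CA queries for a dns-01 challenge."""
    return f"_acme-challenge.{domain}"


def dns01_txt_value(key_authorization: str) -> str:
    """TXT value: base64url(SHA-256(key authorization)) with padding stripped."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# Placeholder key authorization ("token.thumbprint"), for illustration only.
print(dns01_record_name("puppen.de"))
print(dns01_txt_value("example-token.example-thumbprint"))
```

How you publish that record (delegating _acme-challenge via CNAME to a dedicated zone, calling your DNS provider's API, and so on) is the part that depends on your setup.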
