Timeout api.colabfold.com server #606

rukibuki · 2024-04-17T07:19:30Z

Lately, when we try to submit multiple jobs (max 50 per run) to api.colabfold.com (via the alphapulldown package using mmseqs2) we are hit with:
W0416 18:39:07.143900 139828968597312 colabfold.py:86] Timeout while submitting to MSA server. Retrying...

for all of the runs and none of them are able to connect within hours (I canceled the run after 10 hours).

While such a job is running, if I run "nmap -Pn -p 80 api.colabfold.com" (or the same with -p 443), it shows that ports 80 and 443 are filtered:
PORT STATE SERVICE
80/tcp filtered http

Our IT department has informed us that they are not filtering port 443 or 80. That matches what we see when the above job is not running (shown here for 443, but the same holds for 80):

PORT STATE SERVICE
443/tcp open https

Today I tried submitting 50 jobs again, same problem, but if I instead submitted one job at a time the server did not throw the timeout error.

So is there a maximum number of jobs we can submit simultaneously? If so, what is that number?
Would it be possible to have our IP whitelisted so that we can submit larger batches than whatever the limit is?

Please let me know if you need any other information from me.

milot-mirdita · 2024-04-17T07:29:25Z

Could you share (or email me) the IP from where you are sending?

Generally it should not time out. If you are banned or temporarily banned, it should instead return a 403 or 429 HTTP error (instantly).
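
To tell these cases apart from a compute node, a minimal Python sketch along the following lines may help. It probes the /queue endpoint that appears later in this thread. The function names `classify` and `probe` are hypothetical and not part of ColabFold, and reading 403 as a ban and 429 as a temporary limit is an assumption based on the sentence above.

```python
import socket
import urllib.error
import urllib.request


def classify(status_code):
    """Map an HTTP status code (or None for no response) to a likely cause."""
    if status_code is None:
        # No HTTP answer at all: the request never completed.
        return "no response: network, firewall, or server-side problem"
    if status_code == 429:
        return "rate limited: wait and retry"
    if status_code == 403:
        # Assumption: 403 is what a banned IP would see.
        return "forbidden: the IP may be banned"
    if 200 <= status_code < 300:
        return "ok"
    return f"unexpected HTTP status {status_code}"


def probe(url="https://api.colabfold.com/queue", timeout_s=10.0):
    """Send one GET request and classify the outcome."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return classify(response.status)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403 or 429.
        return classify(err.code)
    except (urllib.error.URLError, socket.timeout, TimeoutError):
        # Connection or read timed out, DNS failed, or the port is filtered.
        return classify(None)
```

Calling `probe()` on the failing node and on a working machine in the same network shows whether the server is answering with an error status or not answering at all.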

rukibuki · 2024-04-17T08:39:20Z

Yes, certainly. Our outgoing IP should be:
130.225.18.30

milot-mirdita · 2024-04-17T08:49:52Z

I don’t think I have had to ban a Danish IP before, so I don’t think that’s the problem (I'm not in front of a computer to check right now, though).

What does dig api.colabfold.com (when executed from the failing compute node) say?

It’s most likely a DNS error; no idea why, though.

rukibuki · 2024-04-17T08:54:33Z

@vader9 ~]$ dig api.colabfold.com

; <<>> DiG 9.11.36-RedHat-9.11.36-11.el8_9 <<>> api.colabfold.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62568
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;api.colabfold.com. IN A

;; ANSWER SECTION:
api.colabfold.com. 60 IN A 147.46.145.74

;; Query time: 473 msec
;; SERVER: 10.83.252.137#53(10.83.252.137)
;; WHEN: Wed Apr 17 10:54:02 CEST 2024
;; MSG SIZE rcvd: 62

milot-mirdita · 2024-04-17T12:06:24Z

I don't see any reason why it should time out. The DNS response also looks fine.

Does curl https://api.colabfold.com/queue work?

rukibuki · 2024-04-18T06:02:23Z

[rtk@vader9 ~]$ curl https://api.colabfold.com/queue
{"queued":0}

So yes it seems to work fine.
I have now tried submitting 5 runs at a time without any problems. I might edge this upwards every time to see where the limit is.

Could it have something to do with your local IT department? Does it look like a potential DDoS attack or something like that when I submit 50 jobs at once? Or is that standard practice, or maybe even a low number of runs compared to others?

milot-mirdita · 2024-04-18T06:40:42Z

If you submit 50 jobs at once, you should start getting HTTP 429 errors, which ColabFold understands as a signal to automatically retry later.

It should never time out. That behavior is very puzzling.

I have not asked our network management team, but I would not expect this to be an issue, since there are heavier API users than this.

rukibuki · 2024-04-18T12:57:58Z

We normally saw this:
I0403 14:05:12.905370 140497882613568 objects.py:208] input is features/Q96DT5.a3m

0%| | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT: 0%| | 0/150 [elapsed: 00:00 remaining: ?]E0403 14:05:14.012090 140497882613568 colabfold.py:164] Sleeping for 8s. Reason: RATELIMIT
E0403 14:05:22.915350 140497882613568 colabfold.py:164] Sleeping for 5s. Reason: RATELIMIT

But if 50 calls does not put us among the heaviest API users, then I will try to increase the 5 runs to maybe 10 and see if that works. 10 should be more than enough for now.

milot-mirdita · 2024-04-18T17:08:36Z

Ah, that makes more sense. That's not a timeout, but a rate limit and intended behavior.

So how the system currently works is that you get 20 "tokens" for job submissions, and each submission uses one token. Tokens are replenished at a rate of 0.01111111111111 per second (1 per 90 s), so every 90 s you can submit another job. The balance doesn't replenish above 20.

Thus you can use the API for 40-60 MSAs per hour.
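
The arithmetic behind these numbers can be checked with a small simulation. This is only a sketch of a generic token bucket using the figures quoted above (capacity 20, one token per 90 s). The function name `accepted_times` and the client that tries once per second are assumptions for illustration, not the server's actual implementation.

```python
from fractions import Fraction

CAPACITY = 20                        # maximum number of tokens, as quoted above
REFILL_PER_SECOND = Fraction(1, 90)  # one token every 90 s, kept exact


def accepted_times(duration_s, attempt_every_s=1):
    """Return the times (in seconds) at which a submission is accepted.

    Simulates a client that attempts one submission every
    `attempt_every_s` seconds, starting with a full bucket at t = 0.
    """
    tokens = Fraction(CAPACITY)
    last = 0
    accepted = []
    for now in range(0, duration_s, attempt_every_s):
        # Refill for the elapsed time, never exceeding the capacity.
        tokens = min(Fraction(CAPACITY), tokens + (now - last) * REFILL_PER_SECOND)
        last = now
        if tokens >= 1:
            tokens -= 1           # each accepted submission costs one token
            accepted.append(now)
    return accepted


times = accepted_times(2 * 3600)
first_hour = sum(1 for t in times if t < 3600)
second_hour = sum(1 for t in times if 3600 <= t < 7200)
print(first_hour, second_hour)
```

With a client that tries once per second, the first 20 submissions go through immediately, the 21st at t = 90 s, and from then on one every 90 s. That gives 59 accepted submissions in the first hour and 40 in each hour after that, consistent with the 40-60 per hour stated above.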

We have the colabfold_search script for local searches to run more MSAs on your own resources. I am not sure how AlphaPulldown handles local searches, but I think they also have something to run MMseqs2 locally.

rukibuki · 2024-04-18T17:14:04Z

What I wrote in my last comment is what we normally saw when submitting 50 runs at one time. What we got recently is what I described in the original post: a timeout, with the run left idle for a long time. Sorry for the confusion!

But what you just wrote about the 20 tokens and their replenishment makes a lot of sense for what we normally see.

For now the timeout problem is not an issue, as long as we don't go too high in run numbers.

EdHuttlin · 2024-08-23T14:14:43Z

I've actually been running into a similar issue myself. When I try to run ColabFold, I get a timeout error when trying to contact the MSA server. The text I see in the log for each job is "Timeout while submitting to the MSA server. Retrying...." This problem started for me abruptly a couple of weeks ago and I've been trying to figure out what the issue is.

I've done a number of the troubleshooting steps suggested above and in other similar threads. When I try "curl https://api.colabfold.com" I also get a timeout error: "curl: (7) Failed connect to api.colabfold.com:443; connection timed out". I see this behavior when I'm on the compute node that has been running the jobs (IP 134.174.140.55). When I run this curl command from other locations on the same network, the command works properly, so it's not a general problem.

Here's the output of dig api.colabfold.com:

dig api.colabfold.com

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.16.tuxcare.els1 <<>> api.colabfold.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 65456
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1220
;; QUESTION SECTION:
;api.colabfold.com. IN A

;; ANSWER SECTION:
api.colabfold.com. 29 IN A 147.46.145.74

;; Query time: 0 msec
;; SERVER: 134.174.141.2#53(134.174.141.2)
;; WHEN: Fri Aug 23 09:51:54 EDT 2024
;; MSG SIZE rcvd: 62

I'm not seeing an obvious problem. I do note that the IP address in the SERVER field is different from the public IP I find for the compute node I'm on. I assume this has something to do with how the cluster I'm using has been configured.

Any suggestions you might have would be appreciated!

mrbatchelor · 2024-08-29T10:53:42Z

Hi. I also have this problem using colabfold_batch.
It was working until last week.

Now:

2024-08-29 11:41:26,919 Error while fetching result from MSA server. Retrying... (1/5)
2024-08-29 11:41:26,920 Error: HTTPSConnectionPool(host='api.colabfold.com', port=443): Read timed out.
2024-08-29 11:41:38,252 Timeout while fetching result from MSA server. Retrying...

; <<>> DiG 9.18.28-0ubuntu0.20.04.1-Ubuntu <<>> api.colabfold.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 8207
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;api.colabfold.com. IN A

;; ANSWER SECTION:
api.colabfold.com. 1419 IN A 147.46.145.74

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Thu Aug 29 11:43:54 BST 2024
;; MSG SIZE rcvd: 62

curl https://api.colabfold.com/queue
{"queued":0}

Any help gratefully received!

fglaser · 2024-08-29T11:14:02Z

Same here... I reinstalled and still the same.

curl https://api.colabfold.com/queue
{"queued":0}

Any suggestion will be highly appreciated,
Fabian

milot-mirdita · 2024-08-29T11:43:23Z

Something is definitely wrong on our side, I get single-digit kilobyte/s download speeds from the server currently. I will try to resolve this with our IT.

fglaser · 2024-08-29T11:45:15Z

Ok, thanks a lot for your very quick answer!!

Fabian


ctueting · 2024-11-20T07:34:25Z

Hi all,

Since yesterday, I have had the same issue. I am running localcolabfold on one of our clusters. I started 5 predictions; 4 failed and one finished as expected. Today, all 3 predictions failed with the time-out error.

This is the error log:
Could not get MSA/templates for Pex5TPR__Pcs60_NTer_PTS1: HTTPSConnectionPool(host='api.colabfold.com', port=443): Read timed out.

I tried the following suggestions to identify the issue:

cryosparc_user@pippin:~$ curl https://api.colabfold.com/queue
{"queued":0}
cryosparc_user@pippin:~$ nslookup -query=A api.colabfold.com
Server:		127.0.0.53
Address:	127.0.0.53#53

Non-authoritative answer:
Name:	api.colabfold.com
Address: 205.185.124.98

cryosparc_user@pippin:~$ nslookup -query=AAAA api.colabfold.com
Server:		127.0.0.53
Address:	127.0.0.53#53

Non-authoritative answer:
*** Can't find api.colabfold.com: No answer

cryosparc_user@pippin:~$ nslookup -query=A api.colabfold.com 1.1.1.1
Server:		1.1.1.1
Address:	1.1.1.1#53

Non-authoritative answer:
Name:	api.colabfold.com
Address: 205.185.124.98

cryosparc_user@pippin:~$ nslookup -query=AAAA api.colabfold.com 1.1.1.1
Server:		1.1.1.1
Address:	1.1.1.1#53

Non-authoritative answer:
*** Can't find api.colabfold.com: No answer

cryosparc_user@pippin:~$ traceroute api.colabfold.com
traceroute to api.colabfold.com (205.185.124.98), 64 hops max
  1   141.48.22.62  0,484ms  0,284ms  0,266ms
  2   192.168.140.251  0,817ms  0,480ms  0,326ms
  3   141.48.25.22  1,109ms  0,675ms  0,807ms
  4   188.1.35.69  7,900ms  7,810ms  7,785ms
  5   193.178.185.34  8,414ms  *  8,315ms
  6   184.104.198.118  19,452ms  19,710ms  20,098ms
  7   *  *  *
  8   *  *  184.104.198.246  25,918ms
  9   184.105.81.24  88,453ms  88,093ms  *
 10   *  *  *
 11   184.105.213.2  122,008ms  *  *
 12   *  184.104.199.41  126,645ms  *
 13   72.52.92.42  137,624ms  *  *
 14   184.104.194.82  144,997ms  145,254ms  145,154ms
 15   *  *  *
 16   205.185.124.98  145,656ms  145,284ms  145,316ms
cryosparc_user@pippin:~$ ping -c 1 api.colabfold.com
PING api.colabfold.com (205.185.124.98) 56(84) bytes of data.
64 bytes from 205.185.124.98 (205.185.124.98): icmp_seq=1 ttl=47 time=145 ms

--- api.colabfold.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 145.175/145.175/145.175/0.000 ms

But based on the information found in this thread, this looks "normal".

Is there any issue on the API side and I just have to wait?

Best
Christian

rachitk mentioned this issue Aug 29, 2024

Timing out when attempting to make a prediction? #646

Open

YoshitakaMo mentioned this issue Aug 30, 2024

Suddenly no gpu deteced, ratelimit, and no msa output... YoshitakaMo/localcolabfold#252

Open
