Suddenly no GPU detected, ratelimit, and no MSA output... #252

Open
ohjeyy93 opened this issue Aug 27, 2024 · 13 comments

Comments

@ohjeyy93

Hello localcolabfold team,

I've been using localcolabfold for a while. However, for the past few hours I've suddenly been having problems: no GPU detected, a rate limit, and no MSA output. I tried reinstalling ColabFold, but that didn't solve the problem. My GPU runs fine with other tools, so I don't think the GPU itself is at fault.

2024-08-27 13:17:29,133 Running colabfold 1.5.5 (fdf3b235b88746681c46ea12bcded76ecf8e1f76)
2024-08-27 13:17:29,198 WARNING: no GPU detected, will be using CPU
2024-08-27 13:17:30,144 Found 8 citations for tools or databases
2024-08-27 13:17:30,144 Query 1/1: fl_rd_fl_rd_2-7_1_0_3fixed.fa1 (length 306)
2024-08-27 13:17:47,379 Could not get MSA/templates for fl_rd_fl_rd_2-7_1_0_3fixed.fa1: HTTPSConnectionPool(host='api.colabfold.com', port=443): Read timed out.
Traceback (most recent call last):
File "/home/jehoon/colabfold2/localcolabfold/colabfold-conda/lib/python3.10/site-packages/urllib3/response.py", line 748, in _error_catcher
yield
File "/home/jehoon/colabfold2/localcolabfold/colabfold-conda/lib/python3.10/site-packages/urllib3/response.py", line 873, in _raw_read
data = self._fp_read(amt, read1=read1) if not fp_closed else b""
File "/home/jehoon/colabfold2/localcolabfold/colabfold-conda/lib/python3.10/site-packages/urllib3/response.py", line 856, in _fp_read
return self._fp.read(amt) if amt is not None else self._fp.read()
File "/home/jehoon/colabfold2/localcolabfold/colabfold-conda/lib/python3.10/http/client.py", line 460, in read
return self._read_chunked(amt)
File "/home/jehoon/colabfold2/localcolabfold/colabfold-conda/lib/python3.10/http/client.py", line 592, in _read_chunked
value.append(self._safe_read(chunk_left))
File "/home/jehoon/colabfold2/localcolabfold/colabfold-conda/lib/python3.10/http/client.py", line 631, in _safe_read
data = self.fp.read(amt)
File "/home/jehoon/colabfold2/localcolabfold/colabfold-conda/lib/python3.10/socket.py", line 705, in readinto
return self._sock.recv_into(b)
File "/home/jehoon/colabfold2/localcolabfold/colabfold-conda/lib/python3.10/ssl.py", line 1307, in recv_into
return self.read(nbytes, buffer)
File "/home/jehoon/colabfold2/localcolabfold/colabfold-conda/lib/python3.10/ssl.py", line 1163, in read
return self._sslobj.read(len, buffer)
TimeoutError: The read operation timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/jehoon/colabfold2/localcolabfold/colabfold-conda/lib/python3.10/site-packages/colabfold/batch.py", line 1467, in run
= get_msa_and_templates(jobname, query_sequence, a3m_lines, result_dir, msa_mode, use_templates,
File "/home/jehoon/colabfold2/localcolabfold/colabfold-conda/lib/python3.10/site-packages/colabfold/batch.py", line 778, in get_msa_and_templates
a3m_lines_mmseqs2, template_paths = run_mmseqs2(
File "/home/jehoon/colabfold2/localcolabfold/colabfold-conda/lib/python3.10/site-packages/colabfold/colabfold.py", line 295, in run_mmseqs2
tar.extractall(path=TMPL_PATH)
File "/home/jehoon/colabfold2/localcolabfold/colabfold-conda/lib/python3.10/tarfile.py", line 2264, in extractall
self._extract_one(tarinfo, path, set_attrs=not tarinfo.isdir(),
File "/home/jehoon/colabfold2/localcolabfold/colabfold-conda/lib/python3.10/tarfile.py", line 2327, in _extract_one
self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
File "/home/jehoon/colabfold2/localcolabfold/colabfold-conda/lib/python3.10/tarfile.py", line 2410, in _extract_member
self.makefile(tarinfo, targetpath)
File "/home/jehoon/colabfold2/localcolabfold/colabfold-conda/lib/python3.10/tarfile.py", line 2463, in makefile
copyfileobj(source, target, tarinfo.size, ReadError, bufsize)
File "/home/jehoon/colabfold2/localcolabfold/colabfold-conda/lib/python3.10/tarfile.py", line 252, in copyfileobj
buf = src.read(bufsize)
File "/home/jehoon/colabfold2/localcolabfold/colabfold-conda/lib/python3.10/tarfile.py", line 526, in read
buf = self._read(size)
File "/home/jehoon/colabfold2/localcolabfold/colabfold-conda/lib/python3.10/tarfile.py", line 544, in _read
buf = self.fileobj.read(self.bufsize)
File "/home/jehoon/colabfold2/localcolabfold/colabfold-conda/lib/python3.10/site-packages/urllib3/response.py", line 949, in read
data = self._raw_read(amt)
File "/home/jehoon/colabfold2/localcolabfold/colabfold-conda/lib/python3.10/site-packages/urllib3/response.py", line 872, in _raw_read
with self._error_catcher():
File "/home/jehoon/colabfold2/localcolabfold/colabfold-conda/lib/python3.10/contextlib.py", line 153, in __exit__
self.gen.throw(typ, value, traceback)
File "/home/jehoon/colabfold2/localcolabfold/colabfold-conda/lib/python3.10/site-packages/urllib3/response.py", line 753, in _error_catcher
raise ReadTimeoutError(self._pool, None, "Read timed out.") from e # type: ignore[arg-type]
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='api.colabfold.com', port=443): Read timed out.
2024-08-27 13:17:47,394 Done

@EmmaZetongZhao

Having the same issue.

@dezhi0730

I also have the same issue.

HTTPSConnectionPool(host='api.colabfold.com', port=443): Read timed out.

@YoshitakaMo
Owner

Probably this issue is related to these threads:

I hope the server issue will be fixed soon.

@milot-mirdita
Contributor

Could everyone who had issues please check whether the problem is still happening? Our IT claims to have resolved the issue, though it might take a while to propagate.

@dezhi0730

> Could everyone who had issues please check whether the problem is still happening? Our IT claims to have resolved the issue, though it might take a while to propagate.

As for me, the issue has now been resolved. Thanks for your magic🪄.

@milot-mirdita
Contributor

Thanks a lot! I’d appreciate feedback from more people in this thread if this is resolved, as it didn’t seem very deterministic when this happens.

@Drew-Thomson

I'm still seeing this error (Read timed out) as of just now. It cropped up sometime yesterday afternoon.

@milot-mirdita
Contributor

@Drew-Thomson could you please run:

tracepath api.colabfold.com
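
As a complement to tracepath, a direct TCP check against port 443 can help here. tracepath probes with UDP and depends on routers sending ICMP replies, so "no reply" lines do not by themselves show that HTTPS traffic is blocked. The following is only a minimal sketch using the Python standard library; the function name `can_connect` and the 5-second default timeout are illustrative assumptions and are not part of ColabFold.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; not part of ColabFold.
    """
    try:
        # create_connection resolves the name and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers DNS failures, refused connections, and timeouts.
        return False
```

Calling `can_connect("api.colabfold.com")` from the affected machine would then show whether a TCP handshake completes at all. A `True` result only means the connection opened; the server could still time out later while sending the response body, as in the traceback above.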

@Drew-Thomson

Thanks for looking at this!

I get:

(base) drew@obelisk:~$ tracepath api.colabfold.com
1?: [LOCALHOST] pmtu 1500
1: router-221.chem.gla.ac.uk 7.771ms asymm 2
1: router-221.chem.gla.ac.uk 1.309ms asymm 2
2: 130.209.2.1 0.373ms
3: 130.209.1.70 0.430ms asymm 4
4: no reply
5: no reply
6: no reply
7: no reply
8: no reply
9: no reply
10: no reply
11: no reply
12: no reply
13: no reply
14: no reply
15: no reply
16: no reply
17: no reply
18: no reply
19: no reply
20: no reply
21: no reply
22: no reply
23: no reply
24: no reply
25: no reply
26: no reply
27: no reply
28: no reply
29: no reply
30: no reply
Too many hops: pmtu 1500
Resume: pmtu 1500

This seems consistent with the behaviour when trying to run the MSA.
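
For comparing traces from several reporters, it can help to extract the last hop that actually answered. The sketch below assumes the tracepath output has been captured as text in the format shown above; the names `HOP_LINE` and `last_replying_hop` are hypothetical, and the parsing is a rough heuristic rather than a complete tracepath parser.

```python
import re

# Matches hop lines such as "3: 130.209.1.70 0.430ms asymm 4" or "4: no reply".
# The "1?: [LOCALHOST]" line and the summary lines are deliberately not matched.
HOP_LINE = re.compile(r"^\s*(\d+):\s+(.*\S)\s*$")


def last_replying_hop(tracepath_output: str):
    """Return (hop_number, host) for the last hop that replied, or None if none did."""
    last = None
    for line in tracepath_output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue
        hop, rest = int(match.group(1)), match.group(2)
        if rest.startswith("no reply"):
            continue
        # The first token after the hop number is the host name or address.
        last = (hop, rest.split()[0])
    return last
```

Applied to the trace above, this returns hop 3 (130.209.1.70), which suggests replies stop just beyond the local campus network. Because intermediate routers are free to ignore such probes, that is a hint about where to look, not proof of where packets are dropped.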

@rachitk

rachitk commented Aug 30, 2024

@milot-mirdita Unfortunately, I am also still having issues. My tracepath output is essentially the same as @Drew-Thomson's above, but with no intermediate hops at all (every line is "no reply"):

1?: [LOCALHOST] pmtu 1500
1: no reply
2: no reply
3: no reply
4: no reply
5: no reply
6: no reply
7: no reply
8: no reply
9: no reply
10: no reply
11: no reply
12: no reply
13: no reply
14: no reply
15: no reply
16: no reply
17: no reply
18: no reply
19: no reply
20: no reply
21: no reply
22: no reply
23: no reply
24: no reply
25: no reply
26: no reply
27: no reply
28: no reply
29: no reply
30: no reply
Too many hops: pmtu 1500
Resume: pmtu 1500

@milot-mirdita
Contributor

I have implemented a workaround until our IT can fully solve this.
Please check that ping -c 1 api.colabfold.com shows 205.185.124.98 as the IP and not 147.46.145.74.
If it's still the latter, please try again in a few minutes, once the old DNS entry has expired from the cache.

Afterwards, ColabFold's MSA server should work at a reasonable speed again.
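
The same DNS check can be scripted, for example on a cluster node where ping is unavailable. This is a rough sketch using only the Python standard library. The two addresses are copied from the comment above and should be treated as a snapshot from August 2024, since they may change again; the helper names `classify_addresses` and `resolve_ipv4` are hypothetical.

```python
import socket

# Addresses quoted in this thread; a snapshot in time, not permanent values.
EXPECTED_IP = "205.185.124.98"
STALE_IP = "147.46.145.74"


def classify_addresses(addresses):
    """Classify resolved IPv4 addresses against the two IPs from the thread."""
    if STALE_IP in addresses:
        return "stale"      # old record is still cached somewhere
    if EXPECTED_IP in addresses:
        return "updated"    # workaround address is being served
    return "unknown"        # neither address; DNS may have changed again


def resolve_ipv4(host):
    """Return the sorted IPv4 addresses the local resolver gives for `host`."""
    infos = socket.getaddrinfo(host, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})
```

Running `classify_addresses(resolve_ipv4("api.colabfold.com"))` reports what the local resolver currently returns. A "stale" result would mean waiting for the cached record to expire, as described above.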

@Drew-Thomson

Can confirm that ping gives 205.185.124.98.

I've just started a colabfold job and it has run the MSA and is producing models. Thank you so much for your help here, very much appreciated!

@rachitk

rachitk commented Aug 30, 2024

@milot-mirdita Thank you so much for all of the work! Everything seems to be working in terms of the MSA now.

To confirm, when I run the command, I get the following (indicating that the new IP has replaced the old in DNS):

PING api.colabfold.com (205.185.124.98) 56(84) bytes of data.
64 bytes from 205.185.124.98 (205.185.124.98): icmp_seq=1 ttl=40 time=77.6 ms

--- api.colabfold.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 77.638/77.638/77.638/0.000 ms

When I run colabfold using the provided Docker container, it seems to get past the stage of querying the API for MSA.

Thank you so much!


7 participants