CUDA version if Linux
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
Hi,
My job failed with an error message like "CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered".
Is this due to a shortage of GPU memory?
I ran the job on a server with two Quadro RTX 8000 GPUs. Because I was allowed to use only one of the two GPUs, I ran the command below before running colabfold_batch.
export CUDA_VISIBLE_DEVICES=0
My main command is below.
nohup colabfold_batch Hexamer.faa Hexamer.ColabFold --num-recycle 3 > nohup.log 2>&1 &
Below is the whole "log.txt" file created in the "Hexamer.ColabFold" directory.
2024-06-01 13:39:23,688 Running colabfold 1.5.5 (1648d2335943f9a483b6a803ebaea3e76162c788)
2024-06-01 13:39:23,887 Running on GPU
2024-06-01 13:39:24,307 Found 5 citations for tools or databases
2024-06-01 13:39:24,307 Query 1/1: Hexamer (length 5856)
2024-06-01 13:39:25,934 Setting max_seq=508, max_extra_seq=828
2024-06-01 13:59:21,926 Could not predict Hexamer. Not Enough GPU memory? INTERNAL: Failed to enqueue async memset operation: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2024-06-01 13:59:21,926 Done
Below is the whole "nohup.log" file.
nohup: ignoring input
WARNING: You are welcome to use the default MSA server, however keep in mind that it's a
limited shared resource only capable of processing a few thousand MSAs per day. Please
submit jobs only from a single IP address. We reserve the right to limit access to the
server case-by-case when usage exceeds fair use. If you require more MSAs: You can
precompute all MSAs with `colabfold_search` or host your own API and pass it to `--host-url`
2024-06-01 13:39:23,688 Running colabfold 1.5.5 (1648d2335943f9a483b6a803ebaea3e76162c788)
2024-06-01 13:39:23,887 Running on GPU
2024-06-01 13:39:24,307 Found 5 citations for tools or databases
2024-06-01 13:39:24,307 Query 1/1: Hexamer (length 5856)
0%| | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT: 0%| | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 0%| | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:00 remaining: 00:00]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:00 remaining: 00:00]
E0601 13:59:21.874842 93333 gpu_timer.cc:156] INTERNAL: Could not synchronize CUDA stream: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
E0601 13:59:21.874864 93333 gpu_timer.cc:162] INTERNAL: Error destroying CUDA event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
E0601 13:59:21.874866 93333 gpu_timer.cc:168] INTERNAL: Error destroying CUDA event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
E0601 13:59:21.895475 93333 se_gpu_pjrt_client.cc:634] Failed to query available memory for GPU 0
E0601 13:59:21.895972 93333 se_gpu_pjrt_client.cc:634] Failed to query available memory for GPU 1
2024-06-01 13:39:25,934 Setting max_seq=508, max_extra_seq=828
2024-06-01 13:59:21,926 Could not predict 2901385346_Hexamer. Not Enough GPU memory? INTERNAL: Failed to enqueue async memset operation: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2024-06-01 13:59:21,926 Done
The input was a homohexamer with a total length of 5,856 aa.
A job with a homopentamer of the same protein (4,880 aa) finished successfully.
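For reference, here is a rough sketch of how much larger the hexamer job is than the pentamer job. It uses only the two total lengths reported above. The quadratic scaling is an assumption on my part (that the dominant memory cost grows with the L x L pair representation), not something taken from the ColabFold logs, so treat the ratio as a ballpark figure.

```python
# Total residue counts from the two jobs described above.
hexamer_len = 5856   # homohexamer, failed
pentamer_len = 4880  # homopentamer, finished

# Sanity check: both inputs should have the same per-chain length.
chain_len_hex = hexamer_len // 6
chain_len_pent = pentamer_len // 5

# Linear ratio of total lengths.
length_ratio = hexamer_len / pentamer_len

# ASSUMPTION: memory grows roughly with the square of the total length
# (L x L pair representation). The real footprint may scale differently.
pair_ratio = length_ratio ** 2

print(f"per-chain length: {chain_len_hex} (hexamer), {chain_len_pent} (pentamer)")
print(f"length ratio: {length_ratio:.2f}")
print(f"estimated pair-representation ratio: {pair_ratio:.2f}")
```

Under that assumption, the hexamer would need noticeably more memory than the pentamer that already ran on the same GPU.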
Thanks.