Using precomputed MSA and PDB files for running massive 3d structure prediction #274

berkeucar · 2024-11-08T20:28:37Z

Hello,

I have a FASTA file containing thousands of peptide sequences, and I want to predict their 3D structures using LocalColabFold 1.5.5 installed on an HPC cluster; I also have access to GPU clusters. I was able to generate the PDB and MSA files by following sokrypton/ColabFold#563.

However, my FASTA file contains many peptides, and I would like to use my GPU access to generate the 3D structures with the colabfold_batch command, reusing the PDB and MSA files I precomputed on the HPC cluster. This was asked in the linked issue but seems to have been overlooked.

Does LocalColabFold currently support large-scale prediction of peptides with the --pdb-hit-file flag?

YoshitakaMo · 2024-11-09T07:54:51Z

Did this not work? sokrypton/ColabFold#563 (comment)

I run the prediction with:
colabfold_batch --pdb-hit-file foobar_pdb100_230517.m8 --local-pdb-path /home/database/pdb_mmcif/mmcif_files foobar.a3m <outputdir>
Here, /home/database/pdb_mmcif/mmcif_files is a flat directory (no subdirectories) containing more than 220,000 mmCIF files named by their 4-letter PDB IDs.
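To apply the command above to many peptides, one option is a small driver loop. This is a minimal sketch, not part of LocalColabFold: it assumes a hypothetical layout in which each peptide has its own NAME.a3m with a matching NAME_pdb100_230517.m8 in the same directory (modeled on the foobar example), and it passes only the --pdb-hit-file and --local-pdb-path flags shown above. The helper names build_command and run_all are illustrative.

```python
import shlex
import subprocess
from pathlib import Path

# Hypothetical naming scheme: NAME.a3m sits next to NAME_pdb100_230517.m8,
# mirroring foobar.a3m / foobar_pdb100_230517.m8 in the example command.
M8_SUFFIX = "_pdb100_230517.m8"


def build_command(a3m: Path, local_pdb_path: Path, out_dir: Path) -> list[str]:
    """Return the colabfold_batch argv for one precomputed a3m/m8 pair."""
    m8 = a3m.with_name(a3m.stem + M8_SUFFIX)
    return [
        "colabfold_batch",
        "--pdb-hit-file", str(m8),
        "--local-pdb-path", str(local_pdb_path),
        str(a3m),
        str(out_dir),
    ]


def run_all(msa_dir: Path, local_pdb_path: Path, out_dir: Path,
            dry_run: bool = True) -> list[list[str]]:
    """Build (and optionally run) one colabfold_batch call per *.a3m file."""
    commands = []
    for a3m in sorted(msa_dir.glob("*.a3m")):
        cmd = build_command(a3m, local_pdb_path, out_dir)
        commands.append(cmd)
        if dry_run:
            # Only print the shell-quoted command; nothing is executed.
            print(shlex.join(cmd))
        else:
            # check=True raises CalledProcessError on the first failed run.
            subprocess.run(cmd, check=True)
    return commands
```

Calling run_all(Path("msas"), Path("/home/database/pdb_mmcif/mmcif_files"), Path("outputdir")) first prints the commands so the file pairing can be checked before any GPU time is spent. Add flags such as --templates or --amber to build_command as your setup requires.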

berkeucar · 2024-11-10T21:55:52Z

Basically, I concatenated all my peptide sequences into a single entry, using ":" as the separator between them. Let's call that file tmp.fasta.
I obtained tmp.a3m and tmp_pdb100_230517.m8 from the colabfold_search command, and then ran the following command:
colabfold_batch \
  --amber \
  --templates \
  --num-recycle 3 \
  --use-gpu-relax \
  --pdb-hit-file tmp_pdb100_230517.m8 \
  --local-pdb-path my_local_pdb/pdb_mmcif/mmcif_files \
  --random-seed 0 \
  --zip \
  tmp.a3m \
  output_folder

and I received the following error:

Could not generate input features tmp: string index out of range
    = generate_input_feature(query_seqs_unique, query_seqs_cardinality, unpaired_msa, paired_msa,
  File "localacolabfold_env/bin/lib/python3.10/site-packages/colabfold/batch.py", line 1035, in generate_input_feature
    features_for_chain[protein.PDB_CHAIN_IDS[chain_cnt]] = feature_dict
IndexError: string index out of range
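One plausible reading of this traceback, not confirmed in the thread: ColabFold treats ":" as a chain separator, so thousands of ":"-joined peptides form one complex with thousands of chains, and chain_cnt runs past the end of the chain-ID string. The sketch below assumes ColabFold uses AlphaFold's protein.PDB_CHAIN_IDS, which in the AlphaFold source is a 62-character string. It counts the chains in a joined sequence and, as an alternative input, writes each peptide as its own FASTA record so each is predicted as a monomer. The function names and the peptide_N record names are hypothetical.

```python
# Assumption: ColabFold labels chains with AlphaFold's protein.PDB_CHAIN_IDS,
# which in the AlphaFold source is this 62-character string.
PDB_CHAIN_IDS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
PDB_MAX_CHAINS = len(PDB_CHAIN_IDS)


def split_peptides(joined_sequence: str) -> list[str]:
    """Split a ':'-joined entry into its peptides, dropping empty pieces."""
    return [s.strip() for s in joined_sequence.split(":") if s.strip()]


def count_chains(joined_sequence: str) -> int:
    """Each ':'-separated piece becomes one chain of a single complex."""
    return len(split_peptides(joined_sequence))


def split_to_fasta(joined_sequence: str, prefix: str = "peptide") -> str:
    """Write every peptide as a separate FASTA record instead of one complex."""
    records = [
        f">{prefix}_{i}\n{seq}"
        for i, seq in enumerate(split_peptides(joined_sequence))
    ]
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    joined = ":".join(["ACDEFGHIK"] * 100)
    n = count_chains(joined)
    print(f"{n} chains, limit {PDB_MAX_CHAINS}, overflow: {n > PDB_MAX_CHAINS}")
```

With one record per peptide, the MSAs and template hits would presumably need to be regenerated with colabfold_search from the split FASTA, since the existing tmp.a3m describes a single joined complex.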

YoshitakaMo · 2024-11-11T01:28:50Z

Please show me your commit hash. For example, ColabFold on my machine is at commit 1ccca5a53d20c909f3ccf8a4b81df804e6717cb1, from Jul. 23, 2024.

2024-11-11 00:18:05,900 Running colabfold 1.5.5 (1ccca5a53d20c909f3ccf8a4b81df804e6717cb1)
2024-11-11 00:18:06,190 Running on GPU
2024-11-11 00:18:06,859 Found 5 citations for tools or databases
...
...
...

If your commit is older than this, updating LocalColabFold will fix this issue.
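To compare an installation against the commit quoted above, the hash can be read from the startup log. This is a minimal sketch, assuming the version line always has the form shown in the log above (a version number followed by a full 40-character commit hash in parentheses); if an install prints no hash, the parser returns None. It only checks equality, so telling whether a different hash is older would still require the git history. The function names are hypothetical.

```python
import re

# Matches the startup line shown in the log, e.g.
# "Running colabfold 1.5.5 (1ccca5a53d20c909f3ccf8a4b81df804e6717cb1)"
VERSION_RE = re.compile(r"Running colabfold (\S+) \(([0-9a-f]{40})\)")


def parse_version_line(log_text: str):
    """Return (version, commit_hash) from the first matching line, or None."""
    match = VERSION_RE.search(log_text)
    return match.groups() if match else None


def matches_reference(log_text: str, reference_hash: str) -> bool:
    """True only if the log reports exactly the given commit hash."""
    parsed = parse_version_line(log_text)
    return parsed is not None and parsed[1] == reference_hash
```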

berkeucar · 2024-11-13T20:57:09Z

Just in case, I did a fresh install of LocalColabFold with the install_colabfold_batch_linux.sh script. Now I cannot even obtain the MSA files: the search gets stuck on the MSA of the first peptide in the batch:

k-mer similarity threshold: 110
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 238
Target db start 1 to 209335862
[>                                                                ] 1.27% 4 eta 0s

I am running this on CPUs and my gcc version is 9.4.0.
