v1.5.0
Sergey O edited this page Feb 20, 2023
One of the major updates in v1.5.0 is the integration of AlphaFold v2.3.1 into ColabFold. This introduces a new fine-tuned model from DeepMind for multimer modeling, which we enable by default.
- `--model-type=` specify which model to use.
  - If `auto`, `alphafold2_ptm` is selected for monomer inputs, and `alphafold2_multimer_v3` is selected for complex (multimer) inputs.
  - Bonus: all models can be used for either monomer or multimer prediction.
- bfloat16 is now enabled by default for both monomer and multimer models. On GPUs with bfloat16 support, this should significantly reduce VRAM usage and make the computation at least 2X faster. The other change, besides bfloat16, is fused triangle attention. Together, these changes should allow inference on much larger proteins. (Note: due to slight numerical differences in computation, this may change the results slightly for low-confidence models.)
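The `auto` rule for `--model-type` can be sketched as a tiny function. This is a minimal illustration, not ColabFold's code: `resolve_model_type` is a hypothetical helper, and treating any input with more than one chain (including copies of the same sequence) as a complex is an assumption of this sketch.

```python
def resolve_model_type(model_type: str, num_chains: int) -> str:
    """Apply the documented ``auto`` rule for --model-type.

    Hypothetical helper for illustration only; not part of ColabFold's API.
    ``num_chains`` is assumed to count every chain in the input, including
    copies of the same sequence, so any value above 1 counts as a complex.
    """
    if model_type != "auto":
        # An explicit choice is kept unchanged: any model may be used
        # for either monomer or multimer prediction.
        return model_type
    return "alphafold2_multimer_v3" if num_chains > 1 else "alphafold2_ptm"


print(resolve_model_type("auto", 1))            # alphafold2_ptm
print(resolve_model_type("auto", 2))            # alphafold2_multimer_v3
print(resolve_model_type("alphafold2_ptm", 2))  # alphafold2_ptm
```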
For multimer modeling, the AF2Complex authors have shown that increasing the number of recycles can help dramatically. For multimers, the maximum number of recycles was therefore increased from 3 to 20!
- `--num-recycle=` specify the number of recycles to run.
- `--recycle-early-stop-tolerance=` specify when to stop.
  - The tolerance is defined as the RMSD (difference in distance matrices, in ångström units) between recycles. If it drops below the specified value, recycling terminates.
  - If not specified, `num-recycles=20 recycle-early-stop-tolerance=0.5` is used for `alphafold2_multimer_v3` and `num-recycles=3 recycle-early-stop-tolerance=0.0` is used for `alphafold2_ptm`.
- `--save-recycles` save the models generated at all recycles.
  - If coupled with `--save-all`, it will also save the intermediate outputs between recycles as a pickle file.
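The early-stop rule for recycling can be sketched as follows. This is a minimal sketch under stated assumptions, not ColabFold's implementation: `predict` is a hypothetical callback returning per-residue coordinates after each recycle, and the choice of coordinates (e.g. one atom per residue) and the plain root-mean-square over all distance-matrix entries are assumptions. The defaults table is taken from the notes above.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]

# Defaults quoted in the notes: (num_recycle, early-stop tolerance in Å).
RECYCLE_DEFAULTS: Dict[str, Tuple[int, float]] = {
    "alphafold2_multimer_v3": (20, 0.5),
    "alphafold2_ptm": (3, 0.0),
}


def distance_matrix(coords: Sequence[Coord]) -> List[List[float]]:
    """All pairwise Euclidean distances between residue coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def distance_matrix_rmsd(prev: Sequence[Coord], curr: Sequence[Coord]) -> float:
    """Root-mean-square difference between two distance matrices (Å)."""
    d_prev, d_curr = distance_matrix(prev), distance_matrix(curr)
    n = len(prev)
    sq = sum((d_curr[i][j] - d_prev[i][j]) ** 2 for i in range(n) for j in range(n))
    return math.sqrt(sq / (n * n))


def run_recycles(
    predict: Callable[[int], Sequence[Coord]], num_recycle: int, tolerance: float
) -> Tuple[Sequence[Coord], int]:
    """Run up to ``num_recycle`` recycles after the initial pass (r = 0),
    stopping once the change between recycles drops below ``tolerance``.
    With tolerance 0.0 the strict ``<`` never fires, so all recycles run.
    Returns the final coordinates and the number of recycles performed."""
    prev = predict(0)
    for r in range(1, num_recycle + 1):
        curr = predict(r)
        if distance_matrix_rmsd(prev, curr) < tolerance:
            return curr, r
        prev = curr
    return prev, num_recycle


# Toy structure that shrinks toward `base` and then stops changing.
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]


def toy_predict(r: int) -> List[Coord]:
    scale = {0: 3.0, 1: 2.0}.get(r, 1.0)
    return [(x * scale, y * scale, z * scale) for x, y, z in base]


num_recycle, tol = RECYCLE_DEFAULTS["alphafold2_multimer_v3"]
_, used = run_recycles(toy_predict, num_recycle, tol)
print(used)  # 3: recycles 1 and 2 each change the matrix by ~0.94 Å, recycle 3 by 0
```

A tolerance of 0.0, as in the `alphafold2_ptm` defaults, effectively disables early stopping in this sketch.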
Although subsampling MSAs and enabling dropout have been available in the advanced notebook since day one, recent community efforts have shown these options to be useful, so we now support them in the main notebook as well. See: AFsample, Alamo et al. and Wayment-Steele et al.
- `--random-seed=` specify the random seed.
- `--num-seeds=` number of seeds to try.
  - Will iterate over `range(random_seed, random_seed+num_seeds)`.
- `--use-dropout` activate dropout during inference to sample from the uncertainty of the models.
- `--max-seq` number of sequence clusters to use.
- `--max-extra-seq` number of extra sequences to use.
  - These two options were previously set by `--max-msa="max-seq:max-extra-seq"`, but are now split up to be more user-friendly.
  - Reducing either option will make the model less certain about its prediction, and when combined with random seeds may allow sampling of alternative conformations.
- `--disable-cluster-profile` for multimers, we find that reducing the number of sequence clusters (`max-seq`) results in poor model quality due to more diverse profiles. Disabling profiles appears to fix this issue! We suggest using this flag in combination with `--max-seq` when introducing uncertainty into multimer sampling.
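The seed iteration and the split of the old `--max-msa` value can be sketched together. Both functions are hypothetical helpers for illustration, not ColabFold functions; the model labels and the `"16:32"` value are arbitrary examples, and putting seeds in the outer loop (so every model runs once per seed) is an assumption of this sketch.

```python
from typing import Iterator, Sequence, Tuple


def split_max_msa(max_msa: str) -> Tuple[int, int]:
    """Split a legacy --max-msa="max-seq:max-extra-seq" value into the two
    numbers that are now passed separately as --max-seq and --max-extra-seq.
    Raises ValueError unless there are exactly two integer fields."""
    max_seq, max_extra_seq = (int(part) for part in max_msa.split(":"))
    return max_seq, max_extra_seq


def sampling_jobs(
    random_seed: int, num_seeds: int, model_names: Sequence[str]
) -> Iterator[Tuple[str, int]]:
    """Yield (model, seed) pairs, with seeds drawn from
    range(random_seed, random_seed + num_seeds) as described above."""
    for seed in range(random_seed, random_seed + num_seeds):
        for name in model_names:
            yield name, seed


print(split_max_msa("16:32"))  # (16, 32)
print(list(sampling_jobs(0, 2, ["model_1", "model_2"])))
# [('model_1', 0), ('model_2', 0), ('model_1', 1), ('model_2', 1)]
```

The total number of predictions in this sketch is the number of models times `num_seeds`, which is why lowering `max-seq`/`max-extra-seq` and adding seeds together widen the sampled ensemble.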
- `--num-relax=` specify the number of top models to relax. By default, the `--amber` flag triggers relaxation of ALL models.
- `--recompile-padding=` now accepts an integer, which specifies how much to pad each input by, instead of a factor. This is now only used if more than one input is provided for "batch" computation.
- `--stop-at-score=[0,100]` as soon as one of the recycles, models, or random seeds reaches the specified score, the job terminates.
  - The metric used can be specified by the `--rank=[auto,plddt,multimer,ptm,iptm]` flag. For "auto", "multimer" is used for complexes and "plddt" is used for monomers. The "multimer" metric is computed as `80*iptm + 20*ptm`. Note that all metrics are now on a scale of 0 to 100.
- `--save-all` will output a pickled file of all outputs. When coupled with `--save-recycles`, it will also save the outputs after each recycle!
- `iptm` is now computed for the `alphafold2_ptm` model, allowing ranking by the `multimer` or `iptm` metric for multimer inputs.
- ipTM and pTM scores were computed incorrectly if padding was used: the padded region was included in the computation. This only affects local users, as padding was disabled in the Colab notebook. Since padding was at most a factor of 1.1, this likely did not have a big effect on the scores. Model quality and ranking are unaffected.
- If you used the monomer model (alphafold_ptm) option for modeling complexes, the first full-length sequence was not defined.
- v1.5.1
  - bugfix: the `--save-recycles`/`--save-all` options were broken.
- v1.5.2
  - bugfix: the same random seed was used across recycles, resulting in identical dropout masks (if `--use-dropout` was enabled).
  - various modifications to reduce GPU RAM usage and minimize memory leaks between recycles/models/inputs.
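The v1.5.2 dropout bug can be illustrated with Python's standard `random` module. This is a generic sketch of the failure pattern and one way to avoid it, not the actual ColabFold fix: the helper names, the mask size of 64, and the rate of 0.15 are arbitrary, hypothetical choices. In JAX-based code, the analogous remedy would be deriving a fresh key per recycle, for example with `jax.random.fold_in(key, r)`.

```python
import random
from typing import List


def dropout_mask(rng: random.Random, size: int, rate: float) -> List[bool]:
    """True marks a kept element; each element is dropped with prob ``rate``."""
    return [rng.random() >= rate for _ in range(size)]


def masks_shared_seed(seed: int, num_recycles: int, size: int = 64, rate: float = 0.15):
    # Bug pattern: a fresh generator with the SAME seed at every recycle,
    # so every recycle sees an identical dropout mask.
    return [dropout_mask(random.Random(seed), size, rate) for _ in range(num_recycles)]


def masks_per_recycle(seed: int, num_recycles: int, size: int = 64, rate: float = 0.15):
    # Fix pattern: derive a distinct but still reproducible seed per recycle.
    return [dropout_mask(random.Random(f"{seed}-{r}"), size, rate) for r in range(num_recycles)]


buggy = masks_shared_seed(0, 3)
fixed = masks_per_recycle(0, 3)
print(buggy[0] == buggy[1] == buggy[2])  # True
print(fixed[0] == fixed[1])              # False (in practice; masks are random)
```

Seeding from the `(seed, recycle)` pair keeps runs reproducible for a given `--random-seed` while still varying the dropout pattern between recycles.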