To use pLM-Repeat, first set up the Python environment.
- Create a new conda environment:
conda create -p your_path_to_new_env python=3.9
conda activate your_path_to_new_env
- Install dependencies:
pip install -r requirement.txt
cd path_to_plmrepeat
pip install -e .
The workflow of pLM-Repeat can be briefly described as follows:
- Perform self-sequence alignment using pLM-BLAST (Kaminski et al., Bioinformatics, 2023). For more details on pLM-BLAST, please check the reference and its repository.
- Add transitive traces
- Build the score matrix based on all traces
- Estimate possible repeat lengths via the cumulative score to the diagonal
- Determine the positions of the representative repeat using a sliding window with the estimated length
- Compute a weighted embedding of the representative repeat
- Conduct another round of pLM-BLAST search between the representative embedding and the full-length sequence embedding to extract repeat instances
- Output the alignment (score) matrix, the repeat ranges, and a multiple sequence alignment (experimental)
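The length-estimation step above can be illustrated with a small sketch. This is not the pLM-Repeat implementation: it assumes a square self-alignment score matrix and simply sums the scores on each off-diagonal, treating the offsets with the highest totals as candidate repeat lengths. The function names, `min_len`, and `top` are hypothetical, and the real pipeline may normalize or weight the scores differently.

```python
import numpy as np


def diagonal_score_profile(score: np.ndarray) -> np.ndarray:
    """Sum the scores on each upper off-diagonal (offset 0 .. n-1)."""
    n = score.shape[0]
    return np.array([np.trace(score, offset=k) for k in range(n)])


def candidate_repeat_lengths(score: np.ndarray, min_len: int = 5, top: int = 3) -> list:
    """Return the `top` offsets >= min_len with the highest cumulative score.

    Hypothetical helper: offsets below min_len are skipped so that the main
    diagonal and its nearest neighbours do not dominate the ranking.
    """
    profile = diagonal_score_profile(score)
    offsets = np.arange(min_len, len(profile))
    order = np.argsort(profile[min_len:])[::-1][:top]
    return [int(offsets[i]) for i in order]


# Toy matrix: position i "matches" position j whenever they are a multiple
# of 4 residues apart, i.e. a perfect repeat of length 4.
idx = np.arange(24)
toy = ((idx[:, None] - idx[None, :]) % 4 == 0).astype(float)

print(candidate_repeat_lengths(toy, min_len=3, top=3))
```

On the toy matrix the strongest offset is the repeat length itself, followed by its multiples, which is why a set of candidate lengths rather than a single value is reported.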
Run pLM-Repeat to detect repeats
cd path_to_plmrepeat
plmrepeat-run --emb path_to_embedding_file --seq path_to_sequence_file --out path_to_output_dir --transitivity --draw
e.g. plmrepeat-run --emb ./example/2QJ6_A.emb --seq ./example/2QJ6_A.fasta --out ./example/2QJ6_A/ --transitivity --draw
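To process several proteins, the command above can be wrapped in a short script. The sketch below is an assumption-laden convenience, not part of pLM-Repeat: it uses only the flags shown in this README, and it assumes each embedding file `NAME.emb` sits next to a `NAME.fasta` file, as in the example directory. `build_command` and `run_all` are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def build_command(emb: Path, seq: Path, out_dir: Path) -> list:
    """Assemble the plmrepeat-run call with the flags shown in the README."""
    return [
        "plmrepeat-run",
        "--emb", str(emb),
        "--seq", str(seq),
        # Trailing slash mirrors the README example; whether it is required
        # by the tool is not known.
        "--out", f"{out_dir}/",
        "--transitivity",
        "--draw",
    ]


def run_all(data_dir: Path) -> None:
    """Run pLM-Repeat on every NAME.emb that has a matching NAME.fasta."""
    for emb in sorted(data_dir.glob("*.emb")):
        seq = emb.with_suffix(".fasta")
        if not seq.exists():
            continue  # skip embeddings without a sequence file
        subprocess.run(build_command(emb, seq, emb.with_suffix("")), check=True)
```

Calling `run_all(Path("./example"))` would then invoke `plmrepeat-run` once per matching pair and stop at the first failing run because of `check=True`.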
See help information
plmrepeat-run --help
The Colab demo is available here: https://colab.research.google.com/drive/1ouBwciiXy7HPnddut15JAGAmREWaqaZ7. In this Colab notebook, you can run pLM-Repeat interactively, i.e. by selecting the optimal repeat length based on the score/substitution matrix plot.
Residue range of detected repeats:
(31, 46), (49, 64), (67, 82), (97, 112), (121, 136), (139, 156), (157, 172), (175, 190)
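A quick way to inspect these ranges is to compute each repeat's length and the spacing between consecutive repeats. The sketch below assumes the length is `end - start`; the index convention is not stated in this README, but that choice matches the ungapped residue counts in the alignment listed in this README (15 for repeat1, 17 for repeat6).

```python
# Ranges copied from the example output above.
ranges = [(31, 46), (49, 64), (67, 82), (97, 112),
          (121, 136), (139, 156), (157, 172), (175, 190)]

# Assumed convention: length = end - start.
lengths = [end - start for start, end in ranges]

# Number of residues between the end of one repeat and the start of the next.
gaps = [nxt[0] - cur[1] for cur, nxt in zip(ranges, ranges[1:])]

print(lengths)  # [15, 15, 15, 15, 15, 17, 15, 15]
print(gaps)     # [3, 3, 15, 9, 3, 1, 3]
```

Unusually large gaps, such as the 15 residues between the third and fourth ranges, may be worth checking by eye, since they could correspond to a divergent repeat or an insertion that was not reported.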
Repeat alignment (please note that the multiple alignment is created by directly combining pairwise alignments of the representative repeat vs. each detected repeat, so the concatenated multiple alignment may be far from optimal):
>representative repeat
ALIGNGCIVGNSTKMAGE
>repeat1
AKIGENVEIAPFVYI---
>repeat2
VVIGDNNKIMANANI---
>repeat3
SRIGNGNTIFPGAVI---
>repeat4
AEIGDNNLIRENVTI---
>repeat5
TIVGNNNLLMEGVHV---
>repeat6
ALIGNGCIVGNSTKMAG-
>repeat7
IIIDDNAIISANVLM---
>repeat8
CRVGGYVMIQGGCRF---
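Because the alignment is printed in FASTA-like form, it can be summarized with a few lines of Python. The sketch below is only an illustration: `parse_fasta` and `column_conservation` are hypothetical helpers, and the conservation measure (the share of sequences carrying the most common residue in a column, with gaps counted against it) is an arbitrary choice, not something pLM-Repeat reports.

```python
from collections import Counter

ALIGNMENT = """\
>representative repeat
ALIGNGCIVGNSTKMAGE
>repeat1
AKIGENVEIAPFVYI---
>repeat2
VVIGDNNKIMANANI---
>repeat3
SRIGNGNTIFPGAVI---
>repeat4
AEIGDNNLIRENVTI---
>repeat5
TIVGNNNLLMEGVHV---
>repeat6
ALIGNGCIVGNSTKMAG-
>repeat7
IIIDDNAIISANVLM---
>repeat8
CRVGGYVMIQGGCRF---
"""


def parse_fasta(text: str) -> dict:
    """Map each header (without '>') to its sequence, joining wrapped lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def column_conservation(records: dict) -> list:
    """For each column, return (most common residue, share of all sequences)."""
    seqs = list(records.values())
    result = []
    for column in zip(*seqs):
        residues = [r for r in column if r != "-"]
        if not residues:
            result.append(("-", 0.0))
            continue
        residue, count = Counter(residues).most_common(1)[0]
        result.append((residue, count / len(seqs)))
    return result


records = parse_fasta(ALIGNMENT)
conservation = column_conservation(records)
print(conservation[3])  # fourth column: glycine in 8 of the 9 sequences
```

The fourth column, for instance, carries G in every sequence except repeat7, which is the kind of conserved position one would expect in a genuine repeat family.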
This repository is under active construction. Please get in touch via [email protected] if you encounter any issues when using the pipeline once it is ready. Many thanks!


