🧬 BioReason-Pro
Advancing Protein Function Prediction with
Multimodal Biological Reasoning

Abstract

Protein function annotation is fundamental to understanding biological mechanisms, designing therapeutics, and advancing biomedical research. Current computational methods either rely on shallow sequence similarity or treat function prediction as a set of isolated classification tasks, failing to capture the integrative reasoning across sequence, structure, domains, and interactions that expert biologists perform to infer function. We introduce BioReason-Pro, the first multimodal reasoning large language model (LLM) for protein function prediction that integrates protein embeddings with biological context to generate structured reasoning traces. A key input to BioReason-Pro is the set of Gene Ontology (GO) term predictions made by GO-GPT, our autoregressive transformer that captures hierarchical and cross-aspect dependencies of GO terms. BioReason-Pro is trained via supervised fine-tuning on synthetic reasoning traces generated by GPT-5 for over 130K proteins and further optimized through reinforcement learning. It achieves 73.6% F_max on GO term prediction and an LLM judge score of 8/10 on functional summaries, substantially outperforming previous methods. Evaluations with human protein experts show that BioReason-Pro annotations are preferred over ground-truth UniProt annotations in 79% of cases. Remarkably, BioReason-Pro predicted experimentally confirmed binding partners de novo, with per-residue attention localizing to the exact contact residues resolved in cryo-EM structures of those complexes. Together, GO-GPT and BioReason-Pro establish a framework for protein function prediction that combines precise ontology modeling with interpretable biological reasoning.

Key Contributions

• First multimodal reasoning LLM for protein function: BioReason-Pro deeply integrates ESM3 protein embeddings, a GO graph encoder, and biological context within a unified LLM to generate structured reasoning traces and functional annotations.

• Autoregressive GO term prediction (GO-GPT): A novel autoregressive transformer that treats Gene Ontology prediction as a sequence generation task, capturing hierarchical and cross-aspect dependencies that discriminative methods miss, achieving state-of-the-art weighted F_max of 0.65–0.70.

• Expert-level functional reasoning: Human protein experts preferred BioReason-Pro annotations over curated UniProt entries in 79% of evaluated cases, with an LLM judge score of 8.03/10 across five evaluation axes.

• De novo structural predictions: BioReason-Pro predicted experimentally validated binding partners (e.g., SBP2 for eEFSec) with per-residue attention localizing to the exact contact interfaces resolved in cryo-EM structures.

• Structural reasoning beyond domain transfer: The model performs contextual architectural reasoning that overrides misleading superfamily-level annotations, as demonstrated for CFAP61's non-enzymatic scaffolding role.

• Broad-scale release: All model weights, training code, curated datasets, a web interface, and precomputed predictions for 240,000+ proteins (including those in the Human Protein Atlas) are publicly available.
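
The F_max figures quoted above are easiest to interpret against the CAFA-style protein-centric definition: at each score threshold, precision is averaged over proteins that have at least one prediction at or above the threshold, recall is averaged over all benchmark proteins, and F_max is the best harmonic mean across thresholds. The following is a minimal sketch under that assumption. It shows the unweighted metric only (the weighted variant reported for GO-GPT is not reproduced here), and the function name, term labels, and toy scores are illustrative rather than taken from the repository.

```python
from typing import Dict, Set


def f_max(
    predictions: Dict[str, Dict[str, float]],
    truth: Dict[str, Set[str]],
    steps: int = 101,
) -> float:
    """Protein-centric F_max over evenly spaced thresholds in [0, 1].

    Assumes every protein in `truth` has at least one true term.
    """
    best = 0.0
    for i in range(steps):
        tau = i / (steps - 1)
        precisions = []   # only proteins with >= 1 prediction at this threshold
        recall_sum = 0.0  # accumulated over all benchmark proteins
        for protein, true_terms in truth.items():
            scored = predictions.get(protein, {})
            predicted = {term for term, score in scored.items() if score >= tau}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue  # no protein has a prediction at this threshold
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(truth)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy data with placeholder term labels (not real GO identifiers).
toy_truth = {"P1": {"term_a", "term_b"}, "P2": {"term_c"}}
toy_predictions = {
    "P1": {"term_a": 0.9, "term_b": 0.4, "term_x": 0.6},
    "P2": {"term_c": 0.8, "term_y": 0.3},
}
toy_score = f_max(toy_predictions, toy_truth)
```

Sweeping thresholds rather than fixing one is what makes F_max comparable across models whose confidence scores are calibrated differently.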

Web Interface

Try BioReason-Pro directly through our web-based inference server:

🔗 bioreason.net

Precomputed predictions for 240,000+ proteins (including those in the Human Protein Atlas) are available at bioreason.net/atlas.

Datasets

The datasets used to train and evaluate BioReason-Pro comprise 133,492 proteins across 3,135 organisms curated from UniProt with experimental GO annotations, InterPro domains, STRING protein-protein interactions, and PDB structures. Temporal holdout follows the CAFA framework. Detailed download and usage instructions are available on our HuggingFace collection.
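
A CAFA-style temporal holdout trains on annotations available up to a cutoff date and evaluates on proteins that only gained experimental annotations afterwards. The sketch below illustrates that idea on in-memory records. The record fields, the cutoff date, and the accession strings are hypothetical and do not reflect the actual dataset schema or the split dates used for BioReason-Pro.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Annotation:
    protein: str        # accession (hypothetical values below)
    go_term: str
    annotated_on: date  # date the experimental annotation was recorded


def temporal_split(
    annotations: List[Annotation], cutoff: date
) -> Tuple[List[Annotation], List[Annotation]]:
    """Split annotations into (train, test) around a cutoff date.

    Train: annotations dated on or before the cutoff.
    Test: annotations of proteins whose first annotation is after the cutoff.
    Later annotations of already-known proteins are left out of both sets
    in this simplified version.
    """
    first_seen: Dict[str, date] = {}
    for a in annotations:
        if a.protein not in first_seen or a.annotated_on < first_seen[a.protein]:
            first_seen[a.protein] = a.annotated_on
    train = [a for a in annotations if a.annotated_on <= cutoff]
    test = [a for a in annotations if first_seen[a.protein] > cutoff]
    return train, test


records = [
    Annotation("PROT_A", "GO:0005515", date(2020, 1, 1)),
    Annotation("PROT_A", "GO:0003824", date(2024, 6, 1)),
    Annotation("PROT_B", "GO:0005488", date(2024, 3, 1)),
]
train_set, test_set = temporal_split(records, cutoff=date(2023, 1, 1))
```

Splitting by date rather than at random keeps the evaluation closer to the real use case, where a model is asked about proteins that had no experimental annotation when it was trained.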

Checkpoints

Model weights are available on our HuggingFace collection:

Model	Link
GO-GPT	HuggingFace
BioReason-Pro SFT	HuggingFace
BioReason-Pro RL	HuggingFace

Installation

Prerequisites

• Python 3.11+
• CUDA-capable GPU (recommended for best performance)

Installation Steps

# Clone the repository
git clone https://github.com/bowang-lab/BioReason-Pro.git
cd BioReason-Pro

# Install package
pip install -e .
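
After installation, a quick check can confirm that the prerequisites are met. This is a minimal sketch with two assumptions that the instructions above do not state: that PyTorch is the GPU backend, and that the package is importable as `bioreason2` (a name inferred from the repository layout). Adjust both if your setup differs.

```python
import importlib.util
import sys


def check_environment(package: str = "bioreason2") -> dict:
    """Report whether the Python version, package, and GPU look usable."""
    report = {
        "python_ok": sys.version_info >= (3, 11),
        # find_spec returns None when a top-level module cannot be located
        "package_found": importlib.util.find_spec(package) is not None,
        "cuda_available": False,
    }
    # Only import torch if it is installed, so the check never fails outright.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```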

Citation

If you find this work useful, please cite our papers:

@article{Fallahpour2026.03.19.712954,
	author = {Fallahpour, Adibvafa and Seyed-Ahmadi, Arman and Idehpour, Parsa and Ibrahim, Omar and Gupta, Purav and Naimer, Jack and Zhu, Kevin and Shah, Arnav and Ma, Shihao and Adduri, Abhinav and G{\"u}loglu, Talu and Liu, Nuo and Cui, Haotian and Jain, Arihant and de Castro, Max and Fallahpour, Amirfaham and Cembellin-Prieto, Antonio and Stiles, John S. and Nem{\v c}ko, Filip and Nevue, Alexander A. and Moon, Hyungseok C. and Sosnick, Lucas and Markham, Olivia and Duan, Haonan and Lee, Michelle Y. Y. and Salvador, Andrea F. M. and Maddison, Chris J. and Thaiss, Christoph A. and Ricci-Tam, Chiara and Plosky, Brian S. and Burke, Dave P. and Hsu, Patrick D. and Goodarzi, Hani and Wang, Bo},
	title = {BioReason-Pro: Advancing Protein Function Prediction with Multimodal Biological Reasoning},
	elocation-id = {2026.03.19.712954},
	year = {2026},
	doi = {10.64898/2026.03.19.712954},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2026/03/20/2026.03.19.712954},
	eprint = {https://www.biorxiv.org/content/early/2026/03/20/2026.03.19.712954.full.pdf},
	journal = {bioRxiv}
}

@misc{fallahpour2025bioreasonincentivizingmultimodalbiological,
      title={BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model}, 
      author={Adibvafa Fallahpour and Andrew Magnuson and Purav Gupta and Shihao Ma and Jack Naimer and Arnav Shah and Haonan Duan and Omar Ibrahim and Hani Goodarzi and Chris J. Maddison and Bo Wang},
      year={2025},
      eprint={2505.23579},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.23579}, 
}

Authors

Adibvafa Fallahpour¹²³⁴⁵ * ([email protected])
Arman Seyed-Ahmadi²⁵ *
Parsa Idehpour¹⁵⁹ *
Omar Ibrahim²⁵ *
Purav Gupta³⁴⁵ *
Jack Naimer⁵ ¹⁰
Kevin Zhu⁵⁸
Arnav Shah³⁴⁵
Shihao Ma²³⁴⁵
Abhinav Adduri¹⁵
Talu Güloglu⁴⁵ ¹¹
Nuo Liu¹
Haotian Cui¹³
Arihant Jain¹⁹
Max de Castro⁹
Amirfaham Fallahpour⁴
Antonio Cembellin-Prieto¹
John S. Stiles¹
Filip Nemčko¹
Alexander A. Nevue¹
Hyungseok C. Moon¹
Lucas Sosnick¹⁶
Olivia Markham¹²
Haonan Duan³⁴
Michelle Y. Y. Lee¹⁶
Andrea F. M. Salvador¹⁶
Chris J. Maddison³⁴
Christoph A. Thaiss¹⁶
Chiara Ricci-Tam¹
Brian S. Plosky¹
Dave P. Burke¹
Patrick D. Hsu¹⁸
Hani Goodarzi†‡¹⁷ ([email protected])
Bo Wang†‡²³⁴ ¹³ ([email protected])

¹ Arc Institute ² University Health Network ³ Vector Institute ⁴ University of Toronto ⁵ Core Contributor
⁶ Stanford University ⁷ University of California, San Francisco ⁸ University of California, Berkeley
⁹ University of Pennsylvania ¹⁰ EPFL ¹¹ ETH Zürich ¹² Cohere ¹³ Xaira Therapeutics

* Equal contribution. The order of authors is not a reflection of their relative contributions.
† These authors, listed alphabetically, jointly supervised this work.
‡ Corresponding authors

Made with ❤️ at Arc Institute, University of Toronto, Vector Institute, and University Health Network
