inoue0426/awesome-computational-biology

Awesome Computational Biology

A knowledge collection of databases, software and papers related to computational biology.

Computational biology involves the development and application of data-analytical and theoretical methods, mathematical modelling and computational simulation techniques to the study of biological, ecological, behavioural, and social systems. - Wikipedia

Contents

Databases

scRNA

Compound

Pathway

Mass Spectra

  • MassBank - Open source databases and tools for mass spectrometry reference spectra.
  • MoNA (MassBank of North America) - Meta-database of metabolite mass spectra, metadata, and associated compounds.

Protein

Genome

Disease

  • KEGG DRUG - Comprehensive drug information resource for approved drugs.
  • DrugBank - A database of drugs and drug targets maintained by the University of Alberta.

Interaction

  • Drug Gene Interaction
    • DGIdb - A database of drug-gene interactions and the druggable genome.
    • Comparative Toxicogenomics Database - A database of chemical-gene interactions, chemical-disease associations, gene-disease associations, and chemical-phenotype associations.
    • SNAP - A dataset containing drug-gene interactions.
    • Therapeutics Data Commons - A collection of datasets for many tasks, such as drug-target, drug-response, and drug-drug interaction prediction.
  • Drug (-Cell line) Response
  • Chemical Protein Interaction
    • STITCH - A database of chemical-protein interactions.
    • BindingDB - A database of compounds and targets.
    • PDBBind - Database of experimentally measured binding affinity data for biomolecular complexes.
    • CrossDocked2020 - Large-scale dataset for machine learning in structure-based virtual screening.
  • Protein-Protein Interaction
    • STRING - Protein-Protein Interaction Networks for several organisms.
    • BioGRID - Database of Protein, Genetic and Chemical Interactions.
    • HIPPIE - Human Protein-Protein Interaction database.
  • Knowledge Graph

Clinical Trial

API

Preprocess

  • Chemistry Development Kit - Software for cheminformatics and machine learning.
  • FlashDeconv - High-performance spatial transcriptomics deconvolution. Processes 1M spots in ~3 minutes.
  • RDKit - Software for cheminformatics and machine learning.
  • ChatSpatial - MCP server enabling spatial transcriptomics analysis via natural language.
  • Scanpy - scRNA analysis library in Python.
  • Seurat - scRNA analysis library in R.
  • Squidpy - Spatial single cell analysis library in Python.

Machine Learning Tasks and Models

Drug Response Prediction

  • drGAT: A model for drug response prediction that uses an attention mechanism to provide gene-level explainability.
  • MOFGCN: GCN + heterogeneous network
  • DeepDSC: Autoencoder + Fully Connected NN
  • DGDRP: Multi-view embedding NN.
  • DeepAEG: GNN Embedding + Attention

Drug Repurposing

Drug Target Interaction

  • NeoDTI - A library for drug-target interaction prediction.

Compound Protein Interaction

  • MCPINN - A library for drug discovery using compound-protein interaction and machine learning.
  • TransformerCPI - A library for compound-protein interaction prediction using a Transformer.

Pre-trained embedding

LLM for biology

  • AI4Chem/ChemLLM-7B-Chat - LLM for chemistry and molecular science.
  • BioGPT - LLM for biomedical text generation.
  • GeneGPT - LLM for biomedical information retrieval using several APIs.
  • GenePT - Foundation LLM for single-cell data.
  • scPRINT - Pretrained on 50M cells to denoise and perform zero imputation of any single-cell RNA-seq profile.

Foundation models

  • scFoundation - A large-scale pretrained foundation model for single-cell gene expression data, enabling multiple downstream analysis tasks.
  • scGPT - A transformer-based foundation model pretrained on millions of single-cell profiles to support various single-cell analysis tasks.
  • BulkFormer - A foundation model pretrained on large-scale bulk RNA-seq data to learn general transcriptomic representations for downstream analysis tasks.
