
molecularinformatics/PretrainedAL-VS

Large-Scale Pretraining Improves Sample Efficiency of Active Learning-Based Virtual Screening

This repository provides the source code and data associated with our Journal of Chemical Information and Modeling publication, "Large-Scale Pretraining Improves Sample Efficiency of Active Learning-Based Virtual Screening".

Paper: JCIM link / arXiv link

PretrainedAL-VS

PretrainedAL-VS is an active learning framework that uses a pretrained large language model (MolFormer) as the surrogate model.
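As a rough illustration of the loop such a framework runs, here is a minimal sketch of pool-based active learning with greedy acquisition. It is not the MolPal or PretrainedAL-VS implementation: the function `active_learning_loop`, the callback `train_and_predict` (standing in for training the surrogate and predicting on the unlabeled pool), and the "pick the lowest predicted scores" rule are assumptions chosen for illustration.

```python
import random
from typing import Callable, Dict, List, Sequence


def active_learning_loop(
    pool: Sequence[str],
    objective: Callable[[str], float],
    train_and_predict: Callable[[Dict[str, float], Sequence[str]], List[float]],
    init_size: int,
    batch_size: int,
    n_iter: int,
    seed: int = 0,
) -> Dict[str, float]:
    """Hypothetical sketch of greedy pool-based active learning (minimization).

    objective: the expensive oracle, e.g. a docking score (lower is better).
    train_and_predict: fits a surrogate on the labeled data and returns
        predicted scores for the given unlabeled candidates.
    """
    rng = random.Random(seed)
    # Iteration 0: label a random initial batch with the expensive oracle
    labeled = {x: objective(x) for x in rng.sample(list(pool), init_size)}
    for _ in range(n_iter):
        unlabeled = [x for x in pool if x not in labeled]
        if not unlabeled:
            break
        preds = train_and_predict(labeled, unlabeled)
        # Greedy acquisition: evaluate the candidates with the lowest predictions
        ranked = sorted(zip(preds, unlabeled))
        for _, x in ranked[:batch_size]:
            labeled[x] = objective(x)
    return labeled
```

A pretrained surrogate would plug in through `train_and_predict`; the loop itself only decides which molecules are sent to the expensive objective next.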

Installation

Our code is built on MolPal and MolFormer; please follow the installation instructions for both.

Quick Start

Model training, inference, and molecular docking are customized to run on an HPC cluster through the Slurm REST API. See the slides for details (link).
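For orientation, the sketch below shows what a Slurm REST API (slurmrestd) job submission looks like from Python using only the standard library. It assumes the `v0.0.38` endpoint layout `/slurm/v0.0.38/job/submit`, authentication through the `X-SLURM-USER-NAME` / `X-SLURM-USER-TOKEN` headers, and a JWT obtained with `scontrol token`. The helpers `parse_scontrol_token` and `build_submit_request` are hypothetical, not functions from this repository, and the fields accepted under `"job"` differ between API versions, so check your cluster's slurmrestd OpenAPI spec.

```python
import json
import urllib.request
from typing import Dict


def parse_scontrol_token(output: str) -> str:
    """Extract the JWT from `scontrol token` output ("SLURM_JWT=<token>")."""
    for line in output.splitlines():
        if line.startswith("SLURM_JWT="):
            return line.split("=", 1)[1].strip()
    raise ValueError("no SLURM_JWT found in scontrol output")


def build_submit_request(
    slurm_url: str,
    user_name: str,
    token: str,
    script: str,
    job: Dict[str, object],
    version: str = "v0.0.38",
) -> urllib.request.Request:
    """Build (but do not send) a slurmrestd job-submission request."""
    body = json.dumps({"job": job, "script": script}).encode()
    return urllib.request.Request(
        url=f"{slurm_url.rstrip('/')}/slurm/{version}/job/submit",
        data=body,
        headers={
            # slurmrestd JWT authentication headers
            "X-SLURM-USER-NAME": user_name,
            "X-SLURM-USER-TOKEN": token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

On a cluster node, a one-day token can come from `scontrol token lifespan=86400`, and the request is sent with `urllib.request.urlopen(req)`.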

Download MolFormer

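Download the pretrained MolFormer checkpoint from the link and place it in molpal/models/transformer/pretrained_ckpt/. By default the transformer model loads pretrained_ckpt/pretrained_weights.ckpt (see weight_path below).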
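Because the transformer surrogate joins `work_dir` and `weight_path` to locate the checkpoint, a quick check like the one below can confirm the download landed where the default `weight_path` expects it before any jobs are submitted. `resolve_checkpoint` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import Optional, Union

# Default weight_path from the transformer surrogate's constructor
DEFAULT_WEIGHTS = "molpal/models/transformer/pretrained_ckpt/pretrained_weights.ckpt"


def resolve_checkpoint(
    work_dir: Optional[Union[str, Path]],
    weight_path: str = DEFAULT_WEIGHTS,
) -> Path:
    """Hypothetical helper: locate the MolFormer checkpoint or fail loudly."""
    if work_dir is None:
        # os.path.join(None, ...) would raise a TypeError; give a clearer message
        raise ValueError("work_dir must point to the repository root")
    ckpt = Path(work_dir) / weight_path
    if not ckpt.is_file():
        raise FileNotFoundError(f"MolFormer checkpoint not found at {ckpt}")
    return ckpt
```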

Configure the Slurm API server

Configure the Slurm API server used for model training and inference in molpal/models/transformermodels.py:

# Module-level imports used by this constructor
import getpass
import os
from pathlib import Path
from typing import Optional, Union

def __init__(
        self,
        n_iter: int = 0,
        uncertainty: Optional[str] = 'none',
        ngpu: int = 1,
        ddp: bool = False,
        slurm_token: Optional[str] = None,
        log_dir: Optional[Union[str, Path]] = None,
        work_dir: Optional[Union[str, Path]] = None,
        weight_path: str = 'molpal/models/transformer/pretrained_ckpt/pretrained_weights.ckpt',
        seed: Optional[int] = None
    ):
        self.n_iter = n_iter
        self.CPU_PER_GPU = 8  # CPUs requested per GPU
        # work_dir must be set: the MolFormer checkpoint path is resolved relative to it
        self.seed_path = os.path.join(work_dir, weight_path)
        self.ngpu = ngpu
        self.ncpu = ngpu * self.CPU_PER_GPU
        self.n_node, self.tasks_per_node = self.get_n_node(ngpu)
        self.ddp = ddp
        self.log_dir = log_dir
        self.seed = seed
        self.work_dir = work_dir
        # Replace with your Slurm management node URL and slurmrestd port
        self.slurm_url = 'Your Slurm management node URL and port'
        self.user_name = os.environ.get("SLURM_USER", getpass.getuser())
        self.version = 'v0.0.38'  # Slurm REST API version
        self.errorTolerance = 20

        # Request a new 1-day token only if none was supplied
        if slurm_token is None:
            self.slurm_token = self.get_slurm_token(days=1)
        else:
            self.slurm_token = slurm_token

        self.uncertainty = uncertainty
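The constructor calls `self.get_n_node(ngpu)`, whose body is not shown here. One plausible way to derive the node and task counts is sketched below. It is a hypothetical implementation that assumes one Slurm task per GPU (the usual DDP layout) and a fixed number of GPUs per node; `GPUS_PER_NODE = 4` is a placeholder for your hardware.

```python
import math
from typing import Tuple

CPU_PER_GPU = 8    # matches self.CPU_PER_GPU in the constructor
GPUS_PER_NODE = 4  # assumption: set to your cluster's GPUs per node


def get_n_node(ngpu: int, gpus_per_node: int = GPUS_PER_NODE) -> Tuple[int, int]:
    """Hypothetical sketch: spread ngpu GPUs evenly over the fewest nodes.

    Returns (n_node, tasks_per_node), assuming one task per GPU.
    """
    if ngpu < 1:
        raise ValueError("ngpu must be >= 1")
    # Fewest nodes that can hold ngpu GPUs
    n_node = math.ceil(ngpu / gpus_per_node)
    if ngpu % n_node:
        raise ValueError(f"{ngpu} GPUs cannot be split evenly over {n_node} nodes")
    return n_node, ngpu // n_node
```

For example, `ngpu=8` with 4 GPUs per node gives 2 nodes with 4 tasks each and `ncpu = 8 * CPU_PER_GPU = 64`.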

If you want to use Schrödinger Glide for docking, also configure the Slurm API server in molpal/objectives/glide.py:

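# Module-level imports used by this constructor
import getpass
import os

def __init__(
        self,
        objective_config: str,
        minimize: bool = True,
        **kwargs,
    ):
        # Replace with your own Slurm management node URL and slurmrestd port
        self.slurm_url = 'http://edge-hpc-mgt-101.hpc.biogen.com:6820'
        self.user_name = os.environ.get("SLURM_USER", getpass.getuser())
        self.slurm_token = self.get_slurm_token(days=1)
        self.version = 'v0.0.38'  # Slurm REST API version
        # Read the CPU count, utilities directory, and target file name from the config
        ncpu, utils_dir, targetfname = parse_config(objective_config)
        self.utils_dir = utils_dir
        self.ncpu = ncpu
        self.targetfname = targetfname
        self.errorTolerance = 20
        super().__init__(minimize=minimize)
        # Maps the keywords UTILSPATH and TARGETFNAME to their configured values
        self.word_pair = {
            'UTILSPATH': self.utils_dir,
            'TARGETFNAME': self.targetfname
        }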
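`word_pair` maps the keywords `UTILSPATH` and `TARGETFNAME` to configured values. If, as the names suggest, these are placeholders in a docking job template, filling them could look like the sketch below. `fill_template` and the example template are hypothetical, not taken from the repository.

```python
from typing import Mapping


def fill_template(template: str, word_pair: Mapping[str, str]) -> str:
    """Hypothetical sketch: replace each keyword in the template with its value."""
    # Replace longer keys first so a key that is a prefix of another cannot clobber it
    for key in sorted(word_pair, key=len, reverse=True):
        template = template.replace(key, word_pair[key])
    return template


# Example with made-up paths
script = fill_template(
    "cd UTILSPATH && ls TARGETFNAME",
    {"UTILSPATH": "/opt/utils", "TARGETFNAME": "grid.zip"},
)
```

Plain string replacement keeps the template readable as an ordinary shell script; `string.Template` would be an alternative if the placeholders were written as `$UTILSPATH`.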

Finally, submit the MolPal run to Slurm:

sbatch run_molpal.sh

Contact

Ye Wang ([email protected])
