This repository provides an automated docking pipeline for ligands and receptor proteins using AutoDock Vina and P2Rank. It supports high-throughput docking workflows and can run either under SLURM or locally.
- Python 3.11: Core scripting language.
- AutoDock Vina v1.2.5: Molecular docking engine.
- P2Rank v2.4.2: Binding site prediction.
- Biopython, RDKit, Open Babel, PyMOL: Molecular handling, visualization, and preparation tools.
- SLURM: Workload manager for distributed computing (optional).
- Ubuntu 22.04
- Miniconda (Installed via provided scripts)
- SLURM (Optional for distributed execution)
- biopython, biopandas, pubchempy, tqdm, matplotlib, scipy, rdkit, pdbfixer, pymol-open-source
- openbabel, wget, tar
- Java Runtime Environment (JRE)
The init_docking.py script automates the docking of multiple ligands to multiple receptor proteins. It handles every step, from input preparation to generating docking results, with minimal user intervention. Its workflow is broken down below:
- The script accepts the following arguments:
  - --pdb_ids: A CSV file located in the ./receptors directory, containing the PDB IDs of the receptor proteins. Each ID corresponds to a unique protein structure available in the Protein Data Bank (PDB).
  - --ligands: An SDF file located in the ./ligands directory, containing one or more ligands for docking.
  - Optional parameters --tol, --pckt, --exhaust, and --energy_range set the docking box tolerance, pocket selection, search thoroughness, and energy range for pose scoring, respectively.
- Download Receptor Structures:
  - For each PDB ID listed in the CSV file, the script downloads the corresponding protein structure from the Protein Data Bank (PDB).
  - The downloaded file is saved as <PDB_ID>_dirty.pdb in a newly created folder named after the receptor (e.g., ./8W88/).
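The download step can be sketched as follows. This is a minimal illustration, not the script's own code: the RCSB file-server URL pattern, the upper-casing of IDs, and the helper names `receptor_paths` and `download_receptor` are assumptions made for the example.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumption: structures are fetched from the public RCSB file server.
RCSB_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def receptor_paths(pdb_id: str, workdir: Path = Path(".")) -> tuple[str, Path]:
    """Return the download URL and the <PDB_ID>/<PDB_ID>_dirty.pdb target path."""
    pdb_id = pdb_id.strip().upper()  # assumption: folders use upper-case IDs
    url = RCSB_URL.format(pdb_id=pdb_id)
    target = workdir / pdb_id / f"{pdb_id}_dirty.pdb"
    return url, target


def download_receptor(pdb_id: str, workdir: Path = Path(".")) -> Path:
    """Create the receptor folder and fetch the raw structure into it."""
    url, target = receptor_paths(pdb_id, workdir)
    target.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, str(target))  # network access happens only here
    return target


# Inspect the planned URL and path without touching the network.
url, target = receptor_paths("8w88")
print(url)
print(target.as_posix())
```

Keeping the path computation separate from the network call makes it easy to check where files will land before anything is downloaded.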
- Fixing the Receptor:
  - Using PDBFixer, the script:
    - Retains only the chain with the maximum number of residues.
    - Removes heteroatoms and water molecules.
    - Adds missing residues, atoms, and hydrogens based on a physiological pH of 7.4.
  - The fixed structure is saved as <PDB_ID>_fixed.pdb.
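To make the chain-selection rule concrete, here is a standard-library sketch that counts distinct residues per chain from fixed-width ATOM records and keeps the largest chain. It is not the PDBFixer code path the script uses; the function names and the alphabetical tie-break are assumptions for illustration only.

```python
from collections import defaultdict


def longest_chain(pdb_text: str) -> str:
    """Return the ID of the chain with the most distinct residues."""
    residues = defaultdict(set)
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) >= 27:
            chain_id = line[21]        # PDB column 22
            residue_key = line[22:27]  # columns 23-27: residue number + insertion code
            residues[chain_id].add(residue_key)
    if not residues:
        raise ValueError("no ATOM records found")
    # Ties go to the alphabetically first chain (an arbitrary choice).
    return max(sorted(residues), key=lambda c: len(residues[c]))


def keep_chain(pdb_text: str, chain_id: str) -> list[str]:
    """Keep only ATOM records of one chain; HETATM lines (ligands, waters) are dropped."""
    return [
        line for line in pdb_text.splitlines()
        if line.startswith("ATOM") and len(line) > 21 and line[21] == chain_id
    ]


def _atom_line(serial: int, chain: str, resseq: int) -> str:
    """Build a minimal fixed-width ATOM record (illustrative CA atom of ALA)."""
    return f"ATOM  {serial:>5}  CA  ALA {chain}{resseq:>4}    "


# Made-up input: chain A has 2 residues (3 atoms), chain B has 3 residues.
sample = "\n".join(
    [_atom_line(1, "A", 1), _atom_line(2, "A", 1), _atom_line(3, "A", 2)]
    + [_atom_line(4 + i, "B", r) for i, r in enumerate((1, 2, 3))]
    + ["HETATM    7  O   HOH A 101    "]
)
best = longest_chain(sample)
print(best, len(keep_chain(sample, best)))
```

Counting residues rather than atoms matters here: chain A has as many atoms as chain B in the sample, yet fewer residues.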
- Receptor Conversion:
  - The fixed PDB structure is converted to the .pdbqt format required by AutoDock Vina. The converted file is saved as <PDB_ID>.pdbqt.
- The script uses P2Rank to predict potential binding sites (pockets) on the receptor.
  - The predictions are saved in a folder named 01_p2rank_output within the receptor's directory.
  - A CSV file (<PDB_ID>_predictions.csv) lists each pocket's coordinates, size, and scores.
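Reading a pocket back from the predictions CSV might look like the sketch below. It assumes the file follows P2Rank's usual layout, with space-padded column names such as rank, center_x, center_y, and center_z; the sample rows are made up, and `read_pockets` and `pocket_center` are hypothetical helpers.

```python
import csv
import io


def read_pockets(csv_text: str) -> list[dict[str, str]]:
    """Parse a predictions CSV, stripping the padding around names and values."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader)]
    return [
        dict(zip(header, (value.strip() for value in row)))
        for row in reader
        if row  # skip blank lines
    ]


def pocket_center(pockets: list[dict[str, str]], rank: int = 1) -> tuple[float, float, float]:
    """Return (center_x, center_y, center_z) of the pocket with the given rank."""
    for pocket in pockets:
        if int(pocket["rank"]) == rank:
            return (
                float(pocket["center_x"]),
                float(pocket["center_y"]),
                float(pocket["center_z"]),
            )
    raise ValueError(f"no pocket with rank {rank}")


# Made-up sample data in the assumed column layout.
SAMPLE = """\
name     ,  rank,   score, probability,   center_x,   center_y,   center_z
pocket1  ,     1,   12.50,       0.700,    10.5000,    -3.2500,    22.0000
pocket2  ,     2,    4.10,       0.200,    -1.0000,     8.7500,    15.5000
"""

pockets = read_pockets(SAMPLE)
print(pocket_center(pockets, rank=1))
```

Stripping the header matters because the padded names would otherwise not match plain keys like "rank".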
- The selected pocket (based on the --pckt argument) is used to define the docking box. This includes the center coordinates (center_x, center_y, center_z) and sizes (size_x, size_y, size_z), with an optional tolerance (--tol).
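One plausible way to turn a pocket into a docking box is sketched below: take the bounding box of the pocket's atom coordinates and pad each side by the tolerance. The script's actual sizing rule is not documented here, so treat this rule and the `docking_box` helper as assumptions.

```python
def docking_box(coords: list[tuple[float, float, float]], tol: float = 0.0) -> dict[str, float]:
    """Bounding box of the given points, padded by `tol` on every side.

    Assumption: box size = extent of the pocket atoms + 2 * tol per axis.
    """
    if not coords:
        raise ValueError("at least one coordinate is required")
    box = {}
    for axis, values in zip("xyz", zip(*coords)):
        low, high = min(values), max(values)
        box[f"center_{axis}"] = (low + high) / 2
        box[f"size_{axis}"] = (high - low) + 2 * tol
    return box


# Two made-up pocket atoms spanning 4 x 2 x 6 angstroms, padded by 1 angstrom.
box = docking_box([(0.0, 0.0, 0.0), (4.0, 2.0, 6.0)], tol=1.0)
print(box)
```

The returned keys match the center and size names used above, so the dictionary can be passed on to the docking step unchanged.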
- For each ligand in the provided SDF file:
  - The ligand is converted to .pdb format using RDKit.
  - Hydrogen atoms are added, and a 3D conformer is generated for the ligand.
  - The .pdb file is converted to the .pdbqt format required for docking using Open Babel.
- The prepared files are stored in subdirectories within the receptor's folder (e.g., ./8W88/aspirin.pdbqt).
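The per-ligand loop starts by splitting the SDF into individual records. The sketch below does this with the standard library only, relying on the SDF convention that records end with a `$$$$` line and that the first line of each record is its title. The script itself uses RDKit for this; `split_sdf`, `safe_name`, and the `ligand_<n>` fallback are assumptions for illustration.

```python
import re


def split_sdf(sdf_text: str) -> list[tuple[str, str]]:
    """Return (title, record) pairs; records are delimited by '$$$$' lines."""
    records = []
    current: list[str] = []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            if any(part.strip() for part in current):
                title = current[0].strip()  # first line of a record is its title
                records.append((title, "\n".join(current) + "\n$$$$\n"))
            current = []
        else:
            current.append(line)
    return records


def safe_name(title: str, index: int) -> str:
    """Turn a title into a file stem; fall back to ligand_<index> when it is empty."""
    stem = re.sub(r"[^A-Za-z0-9_.-]+", "_", title).strip("_")
    return stem or f"ligand_{index}"


# Placeholder records with zero atoms; the second one has an empty title line.
SAMPLE = """\
aspirin
  placeholder

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
$$$$

  placeholder

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
$$$$
"""

names = [safe_name(title, i) for i, (title, _) in enumerate(split_sdf(SAMPLE), start=1)]
print(names)
```

Sanitizing the title before using it as a file name avoids problems with spaces or slashes in compound names.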
- The script runs AutoDock Vina for each receptor-ligand pair:
  - The docking box is defined using P2Rank predictions.
  - Parameters such as --exhaust (exhaustiveness) and --energy_range control the thoroughness and energy tolerance for pose scoring.
  - Docking results are saved in .pdbqt format, and key details (e.g., binding affinities) are extracted from the output.
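A single docking run could be assembled as in the sketch below, which builds a Vina command line from a box dictionary. The Vina flags used are standard Vina 1.2 options; the binary path is the one this document lists for AutoDock Vina, while the helper names, the example box values, and the output file name are hypothetical.

```python
import subprocess

# Path taken from the tool locations listed in this document.
VINA = "/usr/local/bin/vina_1.2.5_linux_x86_64"
BOX_KEYS = ("center_x", "center_y", "center_z", "size_x", "size_y", "size_z")


def vina_command(receptor: str, ligand: str, out: str, box: dict,
                 exhaustiveness: int = 20, energy_range: float = 2) -> list[str]:
    """Build the argument list for one receptor-ligand docking run."""
    cmd = [VINA, "--receptor", receptor, "--ligand", ligand, "--out", out]
    for key in BOX_KEYS:
        cmd += [f"--{key}", str(box[key])]
    cmd += ["--exhaustiveness", str(exhaustiveness),
            "--energy_range", str(energy_range)]
    return cmd


def run_vina(cmd: list[str]) -> str:
    """Run Vina and return its console output; raises if Vina exits with an error."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


# Made-up box values; the output file name is hypothetical.
box = {"center_x": 10.5, "center_y": -3.25, "center_z": 22.0,
       "size_x": 20.0, "size_y": 20.0, "size_z": 20.0}
cmd = vina_command("8W88/8W88.pdbqt", "8W88/aspirin.pdbqt",
                   "8W88/aspirin_docked.pdbqt", box)
print(" ".join(cmd))
```

Passing the command as a list rather than a shell string avoids quoting issues with unusual file names.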
- Visualizations:
  - PyMOL is used to generate visualizations of the best-docked ligand poses superimposed on the receptor structure. The images are saved as .png files.
- HTML Report:
  - The script creates an interactive HTML report for each receptor, summarizing:
    - Key docking metrics (binding energies, pocket scores).
    - Links to output files (e.g., .pdbqt and .txt).
    - 2D and 3D visualizations of ligand-receptor complexes.
- Each receptor has its dedicated directory containing:
  - Processed Structures:
    - <PDB_ID>_dirty.pdb: Raw receptor structure.
    - <PDB_ID>_fixed.pdb: Cleaned receptor structure.
    - <PDB_ID>.pdbqt: Receptor ready for docking.
  - Docking Results:
    - <PDB_ID>_results.txt: Detailed docking logs.
    - <ligand_name>.pdbqt: Best poses for each ligand.
    - <ligand_name>.svg: 2D ligand structure images.
  - Visualizations:
    - <PDB_ID>_<ligand_name>_docking.png: 3D visualizations of docked complexes.
  - P2Rank Predictions:
    - 01_p2rank_output/<PDB_ID>_predictions.csv: Binding site information.
This modular pipeline handles multiple receptors and ligands in a single run and collects the results for each receptor in its own directory for further analysis.
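For further analysis, binding affinities can be read back from a docked .pdbqt file. Vina writes one `REMARK VINA RESULT:` line per pose, whose first number is the affinity in kcal/mol. The sketch below parses those lines; the sample text is made up and `vina_affinities` is a hypothetical helper, not part of the script.

```python
def vina_affinities(pdbqt_text: str) -> list[float]:
    """Return the affinity (kcal/mol) of every pose, in file order."""
    scores = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # First field after the colon is the affinity; the next two are RMSD bounds.
            scores.append(float(line.split(":", 1)[1].split()[0]))
    return scores


# Made-up excerpt of a docked output file with two poses.
SAMPLE = """\
MODEL 1
REMARK VINA RESULT:    -7.512      0.000      0.000
ENDMDL
MODEL 2
REMARK VINA RESULT:    -7.100      1.845      2.934
ENDMDL
"""

scores = vina_affinities(SAMPLE)
print(min(scores))
```

Taking the minimum gives the most favorable pose, since more negative affinities indicate stronger predicted binding.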
- Clone the repository:
    git clone https://github.com/your-repository/docking-system.git
    cd docking-system
- Run the installation script:
    chmod +x install.sh
    bash install.sh
For environments where most dependencies are already configured:
chmod +x mini_install.sh
bash mini_install.sh
Ensure the following tools are available at these paths:
- AutoDock Vina: /usr/local/bin/vina_1.2.5_linux_x86_64
- P2Rank: /usr/local/bin/prank
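A quick pre-flight check for these paths can be sketched as follows. The paths are the ones listed above; the `missing_tools` helper is an assumption for illustration and is not part of the installation scripts.

```python
import os
import sys

REQUIRED_TOOLS = {
    "AutoDock Vina": "/usr/local/bin/vina_1.2.5_linux_x86_64",
    "P2Rank": "/usr/local/bin/prank",
}


def missing_tools(tools: dict[str, str]) -> list[str]:
    """Return the names of tools whose path is not an executable file."""
    return [
        name for name, path in tools.items()
        if not (os.path.isfile(path) and os.access(path, os.X_OK))
    ]


# The result depends on the machine this runs on.
missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Missing or not executable:", ", ".join(missing), file=sys.stderr)
else:
    print("All required tools found.")
```

Running a check like this before submitting a long job avoids failures partway through a batch.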
- Prepare input files:
  - Place receptor PDB IDs in a CSV file under ./receptors.
  - Place ligand structures in SDF format under ./ligands.
- Submit the job via SLURM:
    sbatch start_docking.sh
- Activate the conda environment:
    source ~/miniconda3/etc/profile.d/conda.sh
    conda activate auto_dock
- Run the Python script:
    python3 init_docking.py --pdb_ids receptors.csv --ligands ligand_file.sdf
The repository includes a sample SLURM script (start_docking.sh) configured for the docking pipeline. Key settings include:
- Single task allocation (#SBATCH --ntasks=1).
- Infinite runtime (#SBATCH --time=INFINITE).
- --pdb_ids: CSV file with receptor PDB codes.
- --ligands: SDF file containing ligands.
- --tol: Docking box tolerance (Å, default: 0).
- --pckt: Pocket number from P2Rank predictions (default: 1).
- --exhaust: Docking thoroughness (default: 20).
- --energy_range: Energy range for docking poses (default: 2 kcal/mol).
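The parameters above map naturally onto an argparse interface. The sketch below uses the flag names and defaults listed here; the argument types, the help strings, and the choice to make the two input files required are assumptions, and the real init_docking.py may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Dock ligands to receptor proteins.")
    parser.add_argument("--pdb_ids", required=True,
                        help="CSV file with receptor PDB codes")
    parser.add_argument("--ligands", required=True,
                        help="SDF file containing ligands")
    parser.add_argument("--tol", type=float, default=0.0,
                        help="docking box tolerance in angstroms")
    parser.add_argument("--pckt", type=int, default=1,
                        help="pocket number from P2Rank predictions")
    parser.add_argument("--exhaust", type=int, default=20,
                        help="docking thoroughness (Vina exhaustiveness)")
    parser.add_argument("--energy_range", type=float, default=2.0,
                        help="energy range for docking poses in kcal/mol")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(
    ["--pdb_ids", "receptors.csv", "--ligands", "ligand_file.sdf", "--pckt", "2"]
)
print(args.pdb_ids, args.ligands, args.tol, args.pckt, args.exhaust, args.energy_range)
```

Only the options that differ from the defaults need to be passed, as with `--pckt 2` in the example.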
- Results organized by receptor:
  - receptor_name_results.txt: Detailed docking results.
  - ligand_name.pdbqt: Prepared ligand.
  - ligand_name.svg: 2D ligand structure.
- HTML Report:
  - Summarized docking results.
  - Interactive visualization links.
- The system works best with SLURM for distributed execution but can run locally.
- Ensure all dependencies are correctly installed and configured.
- Follow the user manual (User_Guide_Docking_System_ENG.html) for detailed steps.
For more details, refer to the Installation Guide.