Installation of FunGAP v1.0.1

Last updated: Jan 7, 2019

FunGAP is freely available for academic use. For commercial use or licensing of FunGAP, please contact In-Geol Choi (email: igchoi (at) korea.ac.kr). Please cite the following reference:

Reference: Byoungnam Min, Igor V Grigoriev, In-Geol Choi, FunGAP: Fungal Genome Annotation Pipeline using evidence-based gene model evaluation (2017), Bioinformatics, Volume 33, Issue 18, Pages 2936–2937, https://doi.org/10.1093/bioinformatics/btx353


Because FunGAP depends on many external programs, you may encounter issues during installation. Please don't hesitate to post on *Issues* or contact me ([email protected]) for help.

These steps were tested and confirmed on a freshly installed Ubuntu 18.04 LTS.

0. FunGAP requirements

0.1. Required software (and tested versions)

  1. Hisat2 v2.1.0
  2. Trinity v2.6.6
  3. RepeatModeler v1.0.11
  4. Maker v2.31.10
  5. GeneMark-ES/ET v4.38
  6. Augustus v3.3
  7. Braker v1.9
  8. BUSCO v3.0.2
  9. Pfam_scan v1.6
  10. BLAST v2.6.0+
  11. Samtools v1.9
  12. Bamtools v2.4.1

0.2. Required databases

  1. BUSCO odb9
  2. Pfam release 32.0

1. Setup Anaconda environment

For a robust installation, we recommend using an Anaconda environment and installing as many of the dependent programs and libraries as possible inside it.

1.1. Install Anaconda2 (v4.5.12 tested)

Download and install Anaconda2 (we assume that you install it in $HOME/anaconda2).

cd $HOME
wget https://repo.continuum.io/archive/Anaconda2-2018.12-Linux-x86_64.sh
bash Anaconda2-2018.12-Linux-x86_64.sh

1.2. Set conda environment

echo ". ~/anaconda2/etc/profile.d/conda.sh" >> ~/.bashrc
source ~/.bashrc

1.3. Create and activate an environment

conda update conda
conda create -n fungap
conda activate fungap
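
Later steps install packages into whichever environment is active, so it can help to confirm that `fungap` really is the active one. The following is a minimal sketch, not part of FunGAP; `require_env` is a hypothetical helper name, and it relies on conda exporting `CONDA_DEFAULT_ENV` when an environment is activated.

```shell
# Succeed only when the named conda environment is the active one.
# Assumption: conda sets CONDA_DEFAULT_ENV on "conda activate".
require_env() {
  if [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]; then
    return 0
  fi
  echo "Active conda environment is '${CONDA_DEFAULT_ENV:-none}', expected '$1'" >&2
  return 1
}

# Usage:
#   require_env fungap || conda activate fungap
```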

1.4. Add channels

This step is important; otherwise, Maker will stop.

conda config --remove channels bioconda
conda config --remove channels conda-forge
conda config --add channels bioconda/label/cf201901
conda config --add channels conda-forge/label/cf201901
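
To confirm that the channel changes took effect, the channel list can be inspected with `conda config --show channels`. The sketch below assumes that command prints a YAML-style list with one `- name` entry per line; `has_channel` is a hypothetical helper that reads that output on standard input.

```shell
# Read "conda config --show channels" output on stdin and succeed
# if the given channel appears as a list entry.
# Assumption: entries are printed one per line as "  - <channel>".
has_channel() {
  sed -n 's/^[[:space:]]*-[[:space:]]*//p' | grep -qxF -- "$1"
}

# Usage:
#   conda config --show channels | has_channel bioconda/label/cf201901 \
#     || echo "bioconda/label/cf201901 is not configured"
```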

1.5. Install dependencies

conda install -c bioconda augustus rmblast maker trinity hisat2 braker busco blast pfam_scan
pip install biopython bcbio-gff networkx markdown2 matplotlib
cpanm Hash::Merge Logger::Simple Parallel::ForkManager YAML
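
After the installation finishes, a quick check that the expected executables are on the PATH can catch problems early. This is only a sketch: `missing_tools` is a hypothetical helper, and the executable names in the usage comment are assumptions about what the conda packages install, so adjust them to what you find in ~/anaconda2/envs/fungap/bin.

```shell
# Print the name of every given command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage (executable names are assumptions, not taken from the FunGAP docs):
#   missing_tools hisat2 Trinity maker augustus braker.pl blastp \
#     pfam_scan.pl hmmpress samtools bamtools
```

An empty output means every listed command was found.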

2. Download and install FunGAP

2.1. Download FunGAP

Download FunGAP with git clone. Here we install FunGAP in your $HOME directory, but you are free to change the location. $FUNGAP_DIR will be your FunGAP installation directory.

cd $HOME
git clone https://github.com/CompSynBioLab-KoreaUniv/FunGAP.git
cd FunGAP/
export FUNGAP_DIR=$(pwd)
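
Note that `export` only lasts for the current shell session, while the remaining steps all refer to $FUNGAP_DIR. If you expect to continue in a new terminal, one option is to record the variable in a startup file. The sketch below is an optional convenience, not something FunGAP requires; `persist_export` is a hypothetical helper that appends the line only if it is not already present.

```shell
# Append 'export NAME="VALUE"' to a startup file unless that exact
# line is already there, so repeated runs do not duplicate it.
persist_export() {
  name=$1
  value=$2
  rcfile=$3
  line="export $name=\"$value\""
  touch "$rcfile"
  grep -qxF -- "$line" "$rcfile" || printf '%s\n' "$line" >> "$rcfile"
}

# Usage:
#   persist_export FUNGAP_DIR "$FUNGAP_DIR" "$HOME/.bashrc"
```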

3. Download databases

Download the Pfam and BUSCO databases into your $FUNGAP_DIR/db directory.

3.1. Pfam DB download

ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release

cd $FUNGAP_DIR  # Change directory to FunGAP installation directory
mkdir -p db/pfam
cd db/pfam
wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz
wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz
gunzip Pfam-A.hmm.gz
gunzip Pfam-A.hmm.dat.gz
hmmpress Pfam-A.hmm  # HMMER package (would be automatically installed in the above Anaconda step)
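
`hmmpress` writes four binary index files next to Pfam-A.hmm (with the suffixes .h3f, .h3i, .h3m and .h3p). A small check like the following can confirm that the download and indexing both completed; `check_pfam_db` is a hypothetical helper, not part of FunGAP.

```shell
# Verify that a Pfam directory holds the HMM file, the .dat file,
# and the four index files produced by hmmpress.
check_pfam_db() {
  dir=$1
  status=0
  for f in Pfam-A.hmm Pfam-A.hmm.dat \
           Pfam-A.hmm.h3f Pfam-A.hmm.h3i Pfam-A.hmm.h3m Pfam-A.hmm.h3p; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   check_pfam_db "$FUNGAP_DIR/db/pfam" && echo "Pfam database looks complete"
```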

3.2. BUSCO DB download

BUSCO provides various lineage databases, so download the one that best fits your target genome. Here are example commands.

cd $FUNGAP_DIR
mkdir -p db/busco
cd db/busco
wget https://busco.ezlab.org/datasets/fungi_odb9.tar.gz
wget https://busco.ezlab.org/datasets/ascomycota_odb9.tar.gz
wget https://busco.ezlab.org/datasets/basidiomycota_odb9.tar.gz
tar -zxvf fungi_odb9.tar.gz
tar -zxvf ascomycota_odb9.tar.gz
tar -zxvf basidiomycota_odb9.tar.gz
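
If you only need one lineage, the download and extraction can be wrapped in a small function. This sketch simply follows the URL pattern of the commands above; `busco_dataset_url` and `fetch_busco_dataset` are hypothetical helper names, and whether a given lineage archive exists on the server is not checked.

```shell
# Build the odb9 download URL for a lineage name, following the
# pattern of the example commands (e.g. fungi -> fungi_odb9.tar.gz).
busco_dataset_url() {
  printf 'https://busco.ezlab.org/datasets/%s_odb9.tar.gz\n' "$1"
}

# Download one lineage archive into the current directory and unpack it.
fetch_busco_dataset() {
  url=$(busco_dataset_url "$1")
  wget "$url" && tar -zxvf "$(basename "$url")"
}

# Usage (run inside $FUNGAP_DIR/db/busco):
#   fetch_busco_dataset ascomycota
```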

4. Install GeneMark

Download GeneMark-ES/ET from http://topaz.gatech.edu/GeneMark/license_download.cgi and don't forget to download the key, too.

4.1. Uncompress downloaded files

mkdir -p $FUNGAP_DIR/external/
mv gm_et_linux_64.tar.gz gm_key_64.gz $FUNGAP_DIR/external/  # Move your downloaded files to this directory
cd $FUNGAP_DIR/external/
tar -zxvf gm_et_linux_64.tar.gz
gunzip gm_key_64.gz
cp gm_key_64 ~/.gm_key

4.2. Install required perl modules for GeneMark

(If required) You may need to install certain Perl modules. Because the GeneMark scripts are hard-coded to run with /usr/bin/perl rather than the conda-installed perl, you should install the modules for /usr/bin/perl (i.e., outside the conda environment). Alternatively, you can change the first line of the GeneMark perl scripts from #!/usr/bin/perl to #!/usr/bin/env perl.

conda deactivate
sudo apt-get update
sudo apt-get install build-essential
sudo cpan App::cpanminus  # Install cpanm if you do not have one
sudo cpanm Hash::Merge Logger::Simple Parallel::ForkManager YAML
conda activate fungap
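
For the alternative mentioned above (switching the first line to #!/usr/bin/env perl), the edit can be scripted. This is a minimal sketch assuming GNU sed, as shipped with Ubuntu; `use_env_perl` is a hypothetical helper. It only touches *.pl files directly inside the given directory, and it deliberately leaves first lines that carry extra flags (such as `#!/usr/bin/perl -w`) unchanged, because `env` on Linux would treat `perl -w` as a single program name.

```shell
# Rewrite a bare "#!/usr/bin/perl" first line to "#!/usr/bin/env perl"
# in every *.pl file directly inside the given directory.
# Assumption: GNU sed (for in-place editing with -i).
use_env_perl() {
  for script in "$1"/*.pl; do
    [ -f "$script" ] || continue
    sed -i '1s|^#!/usr/bin/perl[[:space:]]*$|#!/usr/bin/env perl|' "$script"
  done
}

# Usage:
#   use_env_perl "$FUNGAP_DIR/external/gm_et_linux_64/gmes_petap"
```

Back up the GeneMark directory before running it, since the files are modified in place.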

4.3. Check that GeneMark and its dependencies are correctly installed

cd $FUNGAP_DIR/external/gm_et_linux_64/gmes_petap
./gmes_petap.pl

5. RepeatModeler installation

Note: RepeatModeler is available in Anaconda2 (https://anaconda.org/bioconda/repeatmodeler), but the conda-installed program did not work at the time of writing. Installation seemed okay, but no result was produced. I will update this whenever a working RepeatModeler is available.

5.1. Check perl version.

perl -v

It should be > 5.8.8
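
The comparison can also be scripted rather than read off by eye. The sketch below assumes GNU `sort` with the `-V` (version sort) option, which is available on Ubuntu; `version_gt` is a hypothetical helper name.

```shell
# Succeed when version $1 is strictly greater than version $2.
# Assumption: GNU sort with -V (version sort) is available.
version_gt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Usage:
#   perl_version=$(perl -e 'printf "%vd", $^V')
#   version_gt "$perl_version" 5.8.8 && echo "perl $perl_version is new enough"
```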

5.2. Install RECON 1.08

cd $FUNGAP_DIR/external/
wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz
tar -zxvf RECON-1.08.tar.gz
cd RECON-1.08/src/
make
make install

5.3. Install RepeatScout 1.0.5

cd $FUNGAP_DIR/external/
wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz
tar -zxvf RepeatScout-1.0.5.tar.gz 
cd RepeatScout-1
make

5.4. Install NSEG

cd $FUNGAP_DIR/external/
mkdir nseg
cd nseg
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/genwin.c
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/genwin.h
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/lnfac.h
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/makefile
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/nmerge.c
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/nseg.c
wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/runnseg
make

5.5. Install RepeatMasker 4.0.8

I could not use the conda-installed RepeatMasker for the RepeatModeler installation, so I installed it manually.

cd $FUNGAP_DIR/external/
wget http://www.repeatmasker.org/RepeatMasker-open-4-0-8.tar.gz
tar -zxvf RepeatMasker-open-4-0-8.tar.gz
cd RepeatMasker
perl ./configure
  • Note: trf and rmblastn are located at ~/anaconda2/envs/fungap/bin.
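
If your Anaconda location differs from the one assumed here, the directories to give to the configure script can be looked up from the active environment. A minimal sketch, where `tool_dir` is a hypothetical helper:

```shell
# Print the directory that contains the given executable,
# or fail if it is not found on PATH.
tool_dir() {
  path=$(command -v "$1") || return 1
  dirname "$path"
}

# Usage (with the fungap environment activated):
#   tool_dir trf
#   tool_dir rmblastn
```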

5.6. Install RepeatModeler 1.0.11

cd $FUNGAP_DIR/external/
wget http://www.repeatmasker.org/RepeatModeler/RepeatModeler-open-1.0.11.tar.gz
tar -zxvf RepeatModeler-open-1.0.11.tar.gz
cd RepeatModeler-open-1.0.11/
perl ./configure
  • Note: trf and rmblastn are located at ~/anaconda2/envs/fungap/bin.

5.7. Check the installation

cd $FUNGAP_DIR/external/RepeatModeler-open-1.0.11/
./BuildDatabase --help
./RepeatModeler --help

6. Configure FunGAP

This script sets all the dependencies and tests each of them (by calling it with --help). If the script runs without any issue, you are ready to run FunGAP!

cd $FUNGAP_DIR
python set_dependencies.py \
  --pfam_db_dir db/pfam \
  --busco_db_dir db/busco/basidiomycota_odb9/ \
  --genemark_dir external/gm_et_linux_64/gmes_petap/ \
  --repeat_modeler_dir external/RepeatModeler-open-1.0.11
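
Before running the script, it may be worth confirming that the directories passed to it exist, since a mistyped path is an easy mistake after this many steps. This is only a sketch; `missing_dirs` is a hypothetical helper and not part of FunGAP.

```shell
# Print every given path that is not an existing directory.
missing_dirs() {
  for d in "$@"; do
    [ -d "$d" ] || echo "$d"
  done
}

# Usage (from $FUNGAP_DIR):
#   missing_dirs db/pfam db/busco/basidiomycota_odb9 \
#     external/gm_et_linux_64/gmes_petap external/RepeatModeler-open-1.0.11
```

An empty output means all the directories were found.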