seoulpm/DCBLAST: Divide and Conquer BLAST: using grid engines to accelerate BLAST and other sequence analysis tools

#DCBLAST

The Basic Local Alignment Search Tool (BLAST) is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences. Cluster, grid, and cloud environments have recently become more widely used and more accessible as high-performance computing systems. Divide and Conquer BLAST (DCBLAST) runs National Center for Biotechnology Information (NCBI) BLAST searches within cluster, grid, and cloud computing environments using a query sequence distribution approach: the query file is split into chunks that are searched as parallel grid jobs, and the per-chunk results are then merged. This simple, accessible, robust, and practical approach accelerates the execution of BLAST query searches.

##Requirements

- Sun Grid Engine (any version)
- A grid, cloud, or other distributed computing system

##Prerequisites

The program requires Perl to run.

The following Perl modules are required:

- Path::Tiny
- Data::Dumper

Install the prerequisites with either of the following commands:

```
$ cpan `cat requirement`
```

or

```
$ cpanm `cat requirement`
```

We strongly recommend using Perlbrew (http://perlbrew.pl/) and cpanm (https://github.com/miyagawa/cpanminus).

##Installation

The program is a single-file Perl script (dcblast.pl). Copy it into a directory on your executable path.

##Configuration

Please edit config.ini before running DCBLAST.

```ini
[dcblast]
##Name of job
job_name_prefix=dcblast

[blast]
##BLAST options

##BLAST path (your BLAST+ bin directory)
path=~/bin/blast/ncbi-blast-2.2.30+/bin/

##DB path (build your own BLAST DB)
##example
##makeblastdb -in example/test_db.fas -dbtype nucl
db=example/test_db.fas

##E-value cut-off (see BLAST manual)
evalue=1e-05

##Number of threads per job. If your CPU is AMD, set this to 1.
num_threads=2

##Maximum number of target sequences to report (see BLAST manual)
max_target_seqs=1

##Output format (see BLAST manual)
outfmt=6

[sge]
##Grid job submission options
##Check these against your site's job submission scripts
pe=SharedMem 1
M=your@email
o=log
q=common.q
j=yes
cwd=
```
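The dry-run output in the Examples section shows how the `[sge]` keys end up on the `qsub` command line: `M=your@email` becomes `-M your@email`, and the empty `cwd=` becomes the bare flag `-cwd`. The sketch below reproduces that mapping so you can preview your settings. It is an assumption inferred from that output, not DCBLAST's actual code, and the function name `ini_section_to_qsub_flags` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch (not DCBLAST's code): turn the key=value pairs of one INI section
# into qsub options. An empty value (e.g. "cwd=") becomes a bare flag
# ("-cwd"). Keys are sorted in byte order, which matches the flag order in
# the dry-run output (M, cwd, j, o, pe, q).
ini_section_to_qsub_flags() {
  local ini_file=$1 section=$2
  awk -v want="[$section]" '
    /^[ \t]*#/ || /^[ \t]*$/ { next }               # skip comments and blank lines
    /^\[/ { in_section = ($0 == want); next }        # track the current section
    in_section && index($0, "=") {
      eq = index($0, "=")
      print substr($0, 1, eq - 1) "\t" substr($0, eq + 1)
    }
  ' "$ini_file" | LC_ALL=C sort -t $'\t' -k1,1 | while IFS=$'\t' read -r key val; do
    if [ -n "$val" ]; then
      printf -- '-%s %s ' "$key" "$val"
    else
      printf -- '-%s ' "$key"
    fi
  done | sed 's/ $//'   # drop the trailing space
}
```

With the configuration above, `ini_section_to_qsub_flags config.ini sge` prints `-M your@email -cwd -j yes -o log -pe SharedMem 1 -q common.q`, the same options that precede `-N` in the dry-run commands. If you splice the result into a `qsub` call, leave it unquoted so that `-pe SharedMem 1` splits into separate arguments.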

If you need other options for your environment, please contact us.

##Usage

```
perl dcblast.pl

Usage : dcblast.pl --input input-fasta --size size-of-group --output output-filename-prefix  --blast blast-program-name

  --ini <ini filename>        ##config file, e.g. config.ini
  --input <input filename>    ##query FASTA file
  --size <output size>        ##number of chunks to split the query into; usually total cores x 2
                              ##(e.g. 320 if all nodes together have 160 cores). Please check with your administrator.
  --output <output filename>  ##output name
  --blast <blast name>        ##blastp, blastx, blastn, etc.
```
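Since `--size` is the number of chunks, the splitting step can be pictured as dealing whole FASTA records into that many files. The sketch below illustrates one way to do this; it is not DCBLAST's own implementation, whose strategy may differ, and both the function name `split_fasta` and the chunk file names (`chunk_1.fas`, `chunk_2.fas`, ...) are hypothetical.

```bash
#!/usr/bin/env bash
# Illustrative sketch, not DCBLAST's implementation: split a FASTA file into
# N chunk files by dealing whole records out round-robin, so each chunk gets
# roughly the same number of query sequences.
# Output names (<outdir>/chunk_<i>.fas, i = 1..N) are hypothetical.
split_fasta() {
  local input=$1 n=$2 outdir=$3
  mkdir -p "$outdir"
  rm -f "$outdir"/chunk_*.fas     # start clean, because records are appended
  awk -v n="$n" -v dir="$outdir" '
    /^>/ {
      if (file != "") close(file)  # keep only one output file open at a time
      file = sprintf("%s/chunk_%d.fas", dir, (count++ % n) + 1)
    }
    file != "" { print >> file }   # header and sequence lines go to the current chunk
  ' "$input"
}
```

For example, `split_fasta example/test.fas 20 test/chunks` corresponds to `--size 20`. Round-robin dealing is only one choice: cutting the file into contiguous blocks also works, but dealing records out spreads long and short sequences more evenly when similar sequences sit next to each other in the input. If the input has fewer sequences than chunks, some chunk files are simply not created.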

##Examples

###Dry run

The --dryrun option only splits the FASTA file into chunks; the grid job commands are printed instead of being submitted.

```
perl dcblast.pl --ini config.ini --input example/test.fas --output test --size 20 --blast blastn --dryrun

DRYRUN COMMAND : [qsub -M your@email -cwd -j yes -o log -pe SharedMem 1 -q common.q -N dcblast_split -t 1-20 dcblast_blastcmd.sh]
DRYRUN COMMAND : [qsub -M your@email -cwd -j yes -o log -pe SharedMem 1 -q common.q -hold_jid dcblast_split -N dcblast_merge dcblast_merge.sh test/results 20]
DRYRUN COMMAND : [qstat]
DONE
```
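The first dry-run command submits `dcblast_blastcmd.sh` as an SGE array job: `-t 1-20` starts 20 tasks, and the grid engine sets `SGE_TASK_ID` to a different value from 1 to 20 in each one. The second command uses `-hold_jid dcblast_split` so the merge job waits until every task of the `dcblast_split` job has finished. The sketch below shows what a per-task script could look like. It is hypothetical: the real `dcblast_blastcmd.sh` is generated by dcblast.pl and may differ, and the variable names, chunk paths, and result paths are assumptions (the defaults mirror the example config.ini).

```bash
#!/usr/bin/env bash
# Hypothetical per-task script for an SGE array job; the real
# dcblast_blastcmd.sh is generated by dcblast.pl and may look different.
# With `qsub -t 1-20`, SGE runs 20 tasks and sets SGE_TASK_ID to 1..20,
# so each task BLASTs exactly one chunk.

# Defaults mirror the example config.ini; all paths are assumptions.
: "${BLAST_PROGRAM:=blastn}"
: "${BLAST_DB:=example/test_db.fas}"
: "${EVALUE:=1e-05}"
: "${NUM_THREADS:=2}"
: "${MAX_TARGET_SEQS:=1}"
: "${OUTFMT:=6}"
: "${CHUNK_DIR:=test/chunks}"
: "${RESULT_DIR:=test/results}"

run_blast_task() {
  # Abort unless started by the grid engine as part of an array job.
  local task=${SGE_TASK_ID:?SGE_TASK_ID is not set; submit with qsub -t}
  mkdir -p "$RESULT_DIR"
  # Standard BLAST+ options, one chunk in, one result file out.
  "$BLAST_PROGRAM" -query "$CHUNK_DIR/chunk_${task}.fas" -db "$BLAST_DB" \
    -evalue "$EVALUE" -num_threads "$NUM_THREADS" \
    -max_target_seqs "$MAX_TARGET_SEQS" -outfmt "$OUTFMT" \
    -out "$RESULT_DIR/${task}.out"
}
```

In an actual job script, the last line would simply call `run_blast_task`. Keying the work off `SGE_TASK_ID` is what lets a single `qsub` submission fan out over all chunks.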

###Run

```
perl dcblast.pl --ini config.ini --input example/test.fas --output test --size 20 --blast blastn
```

This run splits the input file into 20 chunks, runs them on 20 cores, and writes the merged BLAST output to "test/results/merged" and the chunked input files to "test/chunks/".
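The merge job in the dry run is called as `dcblast_merge.sh test/results 20`, that is, with the results directory and the number of chunks. A merge step along those lines can be sketched as follows. The sketch is an assumption rather than the real `dcblast_merge.sh`, and the per-chunk file names (`1.out` ... `N.out`) and the function name `merge_results` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of a merge step (the real dcblast_merge.sh may differ):
# concatenate <dir>/1.out .. <dir>/N.out, in chunk order, into <dir>/merged.
# Per-chunk file names are hypothetical.
merge_results() {
  local dir=$1 n=$2 i
  # Fail early if any task did not produce its output file.
  for ((i = 1; i <= n; i++)); do
    if [ ! -f "$dir/$i.out" ]; then
      echo "missing result: $dir/$i.out" >&2
      return 1
    fi
  done
  # Concatenate in numeric order so the merged file follows the chunk order.
  for ((i = 1; i <= n; i++)); do
    cat "$dir/$i.out"
  done > "$dir/merged"
}
```

Plain concatenation works here because the tabular format in the example configuration (`outfmt=6`) has no header lines. Formats with document structure, such as BLAST XML, cannot be merged this way.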

##Citation

Won Cheol Yim and John Cushman (2015) Divide and Conquer BLAST: using grid engines to accelerate BLAST and other sequence analysis tools. Bioinformatics application note (rejected).

##Copyright

The program is copyrighted by Won Cheol Yim.
