FLASH: fast length adjustment of short reads to improve genome assemblies
- PMID: 21903629
- PMCID: PMC3198573
- DOI: 10.1093/bioinformatics/btr507
Abstract
Motivation: Next-generation sequencing technologies generate very large numbers of short reads. Even with very deep genome coverage, short read lengths cause problems in de novo assemblies. The use of paired-end libraries with a fragment size shorter than twice the read length provides an opportunity to generate much longer reads by overlapping and merging read pairs before assembling a genome.
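The overlap condition in the Motivation can be made concrete with a little arithmetic. Suppose both mates have length L and the fragment has length F. Then the two reads share 2L - F bases, and a merged read covers the whole fragment. The sketch below makes two simplifying assumptions: the mates have equal length, and there is no adapter read-through (F >= L). The function names and the 100 bp / 180 bp values are illustrative and do not come from the paper.

```python
from typing import Optional

def expected_overlap(read_len: int, fragment_len: int) -> int:
    """Number of bases shared by the two mates of a pair.

    Simplifying assumptions: both mates have length read_len and
    fragment_len >= read_len (no adapter read-through).
    """
    return max(0, 2 * read_len - fragment_len)

def merged_length(read_len: int, fragment_len: int) -> Optional[int]:
    """Length of the merged read, or None if the mates do not overlap."""
    if expected_overlap(read_len, fragment_len) > 0:
        # An overlapping pair can be merged into one read spanning the fragment.
        return fragment_len
    return None

# Illustrative values: 100 bp mates from a 180 bp fragment.
print(expected_overlap(100, 180))  # 20
print(merged_length(100, 180))     # 180
print(expected_overlap(100, 250))  # 0, so the pair cannot be merged
```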
Results: We present FLASH, a fast computational tool to extend the length of short reads by overlapping paired-end reads from fragment libraries that are sufficiently short. We tested the correctness of the tool on one million simulated read pairs, and we then applied it as a pre-processor for genome assemblies of Illumina reads from the bacterium Staphylococcus aureus and human chromosome 14. FLASH correctly extended and merged reads >99% of the time on simulated reads with an error rate of <1%. With adequately set parameters, FLASH correctly merged reads over 90% of the time even when the reads contained up to 5% errors. When FLASH was used to extend reads prior to assembly, the resulting assemblies had substantially greater N50 lengths for both contigs and scaffolds.
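The merging step summarized in the Results can be illustrated with a simplified Python sketch of overlap-based read-pair merging. It works in four steps:

1. Reverse-complement the second read.
2. Slide it against the end of the first read.
3. Keep the overlap with the lowest mismatch ratio.
4. At disagreeing positions, take the base with the higher quality score.

This is a hedged illustration, not FLASH's C implementation. The scoring rule, the tie-breaking, and the parameter names `min_overlap` and `max_mismatch_ratio` (with their default values) are assumptions made for this example. The sketch also ignores pairs where a read runs past the start of its mate, which happens for fragments shorter than the read length.

```python
from typing import List, Optional, Tuple

# Complement table for DNA bases; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(
    read1: str, qual1: List[int],
    read2: str, qual2: List[int],
    min_overlap: int = 10,             # hypothetical parameter and default
    max_mismatch_ratio: float = 0.25,  # hypothetical parameter and default
) -> Optional[Tuple[str, List[int]]]:
    """Merge a read pair by overlap; return (sequence, Phred qualities) or None."""
    # Put mate 2 on the same strand as mate 1.
    rc2 = reverse_complement(read2)
    rq2 = qual2[::-1]

    best = None  # (mismatch_ratio, overlap_length)
    for overlap in range(min_overlap, min(len(read1), len(rc2)) + 1):
        start = len(read1) - overlap
        # Compare the suffix of read1 with the prefix of reverse-complemented read2.
        mismatches = sum(read1[start + i] != rc2[i] for i in range(overlap))
        ratio = mismatches / overlap
        # "<=" lets a longer overlap win ties at the same mismatch ratio.
        if ratio <= max_mismatch_ratio and (best is None or ratio <= best[0]):
            best = (ratio, overlap)
    if best is None:
        return None  # no acceptable overlap: leave the pair unmerged

    overlap = best[1]
    start = len(read1) - overlap
    seq, qual = list(read1[:start]), list(qual1[:start])
    for i in range(overlap):
        b1, q1 = read1[start + i], qual1[start + i]
        b2, q2 = rc2[i], rq2[i]
        # On disagreement keep the higher-quality base (assumed rule; read1 wins ties).
        if b1 == b2 or q1 >= q2:
            seq.append(b1)
            qual.append(q1)
        else:
            seq.append(b2)
            qual.append(q2)
    # Append the part of read2 that extends past the end of read1.
    seq.extend(rc2[overlap:])
    qual.extend(rq2[overlap:])
    return "".join(seq), qual

# Usage: a 16 bp toy fragment read as two mates that overlap by 6 bp.
fragment = "GATTACACCGTAGCTT"
r1 = fragment[:10]
r2 = reverse_complement(fragment[4:])
merged, quals = merge_pair(r1, [30] * 10, r2, [30] * 12, min_overlap=3)
print(merged == fragment)  # True
```

The toy example lowers `min_overlap` to 3 only because the reads are tiny. With realistic read lengths, a larger minimum overlap reduces the chance of accepting a short, spurious match.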
Availability and implementation: The FLASH system is implemented in C and is freely available as open-source code at http://www.cbcb.umd.edu/software/flash.
Contact: [email protected].
Similar articles
- COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly. Bioinformatics. 2012 Nov 15;28(22):2870-4. doi: 10.1093/bioinformatics/bts563. Epub 2012 Oct 8. PMID: 23044551.
- PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics. 2014 Mar 1;30(5):614-20. doi: 10.1093/bioinformatics/btt593. Epub 2013 Oct 18. PMID: 24142950. Free PMC article.
- QuorUM: An Error Corrector for Illumina Reads. PLoS One. 2015 Jun 17;10(6):e0130821. doi: 10.1371/journal.pone.0130821. eCollection 2015. PMID: 26083032. Free PMC article.
- Chromosome-level hybrid de novo genome assemblies as an attainable option for nonmodel insects. Mol Ecol Resour. 2020 Sep;20(5):1277-1293. doi: 10.1111/1755-0998.13176. Epub 2020 Jun 7. PMID: 32329220. Review.
- De novo assembly of short sequence reads. Brief Bioinform. 2010 Sep;11(5):457-72. doi: 10.1093/bib/bbq020. Epub 2010 Aug 19. PMID: 20724458. Review.
Cited by
- DGCNN approach links metagenome-derived taxon and functional information providing insight into global soil organic carbon. NPJ Biofilms Microbiomes. 2024 Oct 26;10(1):113. doi: 10.1038/s41522-024-00583-9. PMID: 39461939. Free PMC article.
- Detecting significant expression patterns in single-cell and spatial transcriptomics with a flexible computational approach. Sci Rep. 2024 Oct 30;14(1):26121. doi: 10.1038/s41598-024-75314-3. PMID: 39478009. Free PMC article.
- Seasonal dynamics in taxonomy and function within bacterial and viral metagenomic assemblages recovered from a freshwater agricultural pond. Environ Microbiome. 2020 Oct 28;15(1):18. doi: 10.1186/s40793-020-00365-8. PMID: 33902740. Free PMC article.
- Bioaugmented Phytoremediation of Metal-Contaminated Soils and Sediments by Hemp and Giant Reed. Front Microbiol. 2021 Apr 20;12:645893. doi: 10.3389/fmicb.2021.645893. eCollection 2021. PMID: 33959108. Free PMC article.
- Apex Predators Enhance Environmental Adaptation but Reduce Community Stability of Bacterioplankton in Crustacean Aquaculture Ponds. Int J Mol Sci. 2022 Sep 15;23(18):10785. doi: 10.3390/ijms231810785. PMID: 36142697. Free PMC article.