Fast and accurate de novo genome assembly from long uncorrected reads

Mile Šikić (1,4)

  1. Department of Electronic Systems and Information Processing, University of Zagreb, Faculty of Electrical Engineering and Computing, 10000 Zagreb, Croatia
  2. Centre for Informatics and Computing, Ruđer Bošković Institute, 10000 Zagreb, Croatia
  3. Genome Institute of Singapore, Singapore 138672, Singapore
  4. Bioinformatics Institute, Singapore 138671, Singapore

Corresponding author: mile.sikic@fer.hr

(5) These authors contributed equally to this work.

Abstract

The assembly of long reads from Pacific Biosciences and Oxford Nanopore Technologies typically requires resource-intensive error-correction and consensus-generation steps to obtain high-quality assemblies. We show that the error-correction step can be omitted and that high-quality consensus sequences can be generated efficiently with a SIMD-accelerated, partial-order alignment–based, stand-alone consensus module called Racon. Based on tests with PacBio and Oxford Nanopore data sets, we show that Racon coupled with miniasm enables consensus genomes with similar or better quality than state-of-the-art methods while being an order of magnitude faster.

Footnotes

  • Received August 5, 2016.
  • Accepted January 17, 2017.

This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
