MIRA is a whole genome shotgun and EST sequence assembler for Sanger, 454, Solexa (Illumina), IonTorrent and PacBio data (for the latter, at the moment only CCS and error-corrected CLR reads). It can be seen as a Swiss army knife of sequence assembly, developed and used over the past 16 years to get assembly jobs done efficiently and, especially, accurately. That is, without putting too much manual work into finishing the assembly.
The previous version, MIRA 4.0, is on SourceForge.
The latest stable version is MIRA 5 on GitHub!
Table of contents
The most up-to-date documentation is also always available in the binary distribution packages.
Scaffolding:
* Scaffolding with Bambus: instructions for scaffolding MIRA contigs & paired-end data with BAMBUS. Written by Gregory Harhay, USDA.
MIRA started in 1997 as a PhD project at the German Cancer Research Centre in Heidelberg (Deutsches Krebsforschungszentrum Heidelberg). Binaries were always distributed publicly and over time, other labs and sequencing providers have found MIRA useful for assembly of extremely 'unfriendly' projects containing lots of repetitive sequences (as always, your mileage may vary).
In 2007 I (Bastien Chevreux) asked the DKFZ for the permission to put MIRA under an open source license ... and got it.
MIRA 4 is able to perform true hybrid de-novo assemblies using reads gathered through Sanger, 454, Solexa, IonTorrent or PacBio sequencing technologies. That is, it assembles reads instead of a mix of (possibly shredded) consensus sequence and reads. See an example of what this looks like for Sanger and 454 in the documentation introduction; it also works with any other combination of sequencing technologies. The only restriction at the moment: reads must be <= 32 kilobases and, for PacBio, MIRA must get CCS reads or error-corrected CLR data.
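The read-length restriction above can be checked before starting an assembly. The following is a minimal Python sketch, not part of MIRA: the names `parse_fastq`, `split_by_length` and `MAX_READ_LENGTH` are hypothetical, the input is assumed to be plain four-line FASTQ, and "32 kilobases" is assumed here to mean 32,000 bases.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumption: "32 kilobases" is taken as 32,000 bases; adjust if MIRA's
# actual limit is defined differently.
MAX_READ_LENGTH = 32_000

# (name, sequence, quality) of one read
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (name, sequence, quality) from plain four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header line, got: {header!r}")
        seq = next(it, "").rstrip("\n")
        plus = next(it, "")
        qual = next(it, "").rstrip("\n")
        if not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed or truncated record: {header!r}")
        yield header[1:], seq, qual


def split_by_length(
    records: Iterable[FastqRecord], max_length: int = MAX_READ_LENGTH
) -> Tuple[List[FastqRecord], List[FastqRecord]]:
    """Separate reads that fit the length limit from reads that exceed it."""
    kept: List[FastqRecord] = []
    too_long: List[FastqRecord] = []
    for record in records:
        if len(record[1]) <= max_length:
            kept.append(record)
        else:
            too_long.append(record)
    return kept, too_long
```

A typical use would be to open a FASTQ file, pass the file object to `parse_fastq`, and inspect the second list returned by `split_by_length` to see which reads would need to be removed or split before assembly.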
The combination currently preferred by the author is a mix of de-novo error-corrected PacBio reads and Illumina mapping assemblies: PacBio to get long contigs built, Illumina to get rid of the indels which even Quiver (PacBio software) cannot get rid of. Here's the recipe I use for sequencing a bacterium or lower eukaryote de-novo and almost perfectly for comparatively little money:
Granted, there may be a few more steps ... but that's basically it.
MIRA contains integrated editors for all sequencing technologies which iteratively remove many sequencing errors from the assembly project and improve the overall alignment quality.
MIRA can also be used for mapping assemblies and automatic tagging of difference sites (SNPs, insertions or deletions) of mutant strains against a reference sequence.
For organisms where annotated files in GFF3 format are available (or for GenBank files without intron/exon structures), MIRA can generate tables which are ready to use for biologists, as they show exactly which genes are hit and give a first estimate of whether the function of the protein is affected by the change.