Joana Damas, João Carneiro, António Amorim, Filipe Pereira, MitoBreak: the mitochondrial DNA breakpoints database, Nucleic Acids Research, Volume 42, Issue D1, 1 January 2014, Pages D1261–D1268, https://doi.org/10.1093/nar/gkt982
Abstract
Mitochondrial DNA (mtDNA) rearrangements are key events in the development of many diseases. Investigations of mtDNA regions affected by rearrangements (i.e. breakpoints) can lead to important discoveries about rearrangement mechanisms and can offer important clues about the causes of mitochondrial diseases. Here, we present the mitochondrial DNA breakpoints database (MitoBreak; http://mitobreak.portugene.com), a free, web-accessible comprehensive list of breakpoints from three classes of somatic mtDNA rearrangements: circular deleted (deletions), circular partially duplicated (duplications) and linear mtDNAs. Currently, MitoBreak contains >1400 mtDNA rearrangements from seven species (Homo sapiens, Mus musculus, Rattus norvegicus, Macaca mulatta, Drosophila melanogaster, Caenorhabditis elegans and Podospora anserina) and their associated phenotypic information collected from nearly 400 publications. The database allows researchers to perform multiple types of data analyses through user-friendly interfaces with full or partial datasets. It also permits the download of curated data and the submission of new mtDNA rearrangements. For each reported case, MitoBreak also documents the precise breakpoint positions, junction sequences, disease or associated symptoms and links to the related publications, providing a useful resource to study the causes and consequences of mtDNA structural alterations.
INTRODUCTION
A genomic rearrangement is a large-scale modification of the genome caused by a deletion, inversion, duplication, insertion or translocation (1,2). The genomic region where a junction between normal and rearranged DNA is detected is called a breakpoint and can be located by comparing the sequences of the rearranged genomes. Some genomic regions, known as breakpoint hotspots or fragile sites (3,4), appear to be intrinsically prone to breakage and reorganization (5). The molecular characterization of these unstable genomic regions is a prerequisite for a better comprehension of the mechanisms underlying rearrangements.
Mitochondrial DNA (mtDNA) exhibits extraordinary genetic and physical diversity across different eukaryotic lineages (6) and among different cell and tissue types (7). Despite the notable variations in structural organization, the mitochondrial genome always encodes a small number of proteins essential to the oxidative phosphorylation complex (2,8). Some mutational changes in the mitochondrial genome significantly affect cellular respiration, causing a clinically heterogeneous group of disorders related to oxidative phosphorylation dysfunction known as mitochondrial diseases. The most common types of genetic alterations in mtDNA are point mutations, partial deletions and tandem direct duplications (9,10).
A circular deleted mtDNA (also known as ‘mtDNA deletion’, Figure 1A) is a short mtDNA molecule that lacks a section of the genome but remains circular. Deleted mtDNA molecules have been found to be associated with mitochondrial diseases such as progressive external ophthalmoplegia (PEO), Kearns–Sayre syndrome (KSS), Pearson syndrome (PS) and mitochondrial neurogastrointestinal encephalomyopathy (MNGIE), among others. Furthermore, the accumulation of deleted mtDNA molecules might contribute to the aging process (11), Parkinson’s disease (12) or inclusion body myositis (13).
A circular partially duplicated mtDNA (also known as ‘mtDNA duplication’, Figure 1B) is a larger than normal mtDNA molecule because of the presence of a region that was tandemly duplicated. The pathogenic nature of duplications remains uncertain, although they have been found to be associated with KSS, PEO, PS and mitochondrial myopathies, as well as with aged tissues (14,15). Deleted and duplicated mtDNAs can coexist in the same individual, with the non-deleted mtDNA region being the tandem duplicated region in partial duplications (16,17).
A third class of breakpoints is observed in linear mtDNA molecules, which can result from one or more double-strand breaks that open the circular conformation. In this case, a single break produces a full-length linear mtDNA (Figure 1C), and two or more breaks result in short linear mtDNAs (Figure 1D). Short linear mtDNA molecules are common in the mtDNA mutator mice that express a proofreading-deficient mtDNA polymerase (18,19). It has also been suggested that linear mtDNAs caused by double-strand breaks are associated with the formation of mtDNA deletions (20).
The investigation of mtDNA rearrangements has been largely confined to the study of human diseases. The absence of a comprehensive database of breakpoints has restricted the discovery of general DNA sequence features in mtDNA rearrangements. Although some useful mtDNA databases are available [e.g. MITOMAP (21) and MitoTool (22)], a comprehensive catalogue of the different classes of mtDNA rearrangements from different species is still lacking. Therefore, we have developed the mitochondrial DNA breakpoints (MitoBreak) database, a manually curated database containing mitochondrial DNA breakpoints and their associated clinical and phenotypic information. The MitoBreak database integrates the existing information on mtDNA breakpoints from different species with user-friendly visualization and analysis tools. The goal is to collect and maintain all relevant data on mtDNA breakpoints and present it in an easily accessible format.
BASIC CONCEPTS
The MitoBreak database describes breakpoints from three types of mtDNA rearrangements: circular deleted, circular partially duplicated and linear mtDNAs (Figure 1). The mtDNA deletions and duplications are usually defined by a combination of two reference numbers that identify the location of the 5′ and 3′ breakpoints. The 5′ breakpoint is positioned to the left of the 5′ break and the 3′ breakpoint is positioned to the right of the 3′ break, meaning that both breakpoints are retained in the deleted mtDNAs. We used the terms ‘5fl’ to describe the upstream flanking regions of the 5′ breakpoints and ‘3fl’ to describe the downstream regions of the 3′ breakpoints (i.e. the proximal retained sequences). These 5fl and 3fl regions are next to each other in deleted mtDNAs (Figure 1A). The DNA segment that is removed in an mtDNA deletion is flanked by the downstream region of the 5′ breakpoint (5del) and the upstream region of the 3′ breakpoint (3del), i.e. the proximal deleted sequences. Duplications have a junction site connecting the 5fl and 3fl, which resembles the junction site of mtDNA deletions (Figure 1B). The full-length linear mtDNA is defined by a single breakpoint position (Figure 1C), while short linear mtDNAs can be defined by two breakpoints (Figure 1D). The breakpoints of linear mtDNAs are difficult to determine and have rarely been reported in the literature. Currently, MitoBreak only describes the short linear mtDNAs found in the mtDNA mutator mice (19).
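The flanking-region definitions above can be illustrated with a short Python sketch. This is not MitoBreak's own code: the function name `flanking_regions`, the window size `k` and the toy sequence are assumptions made for illustration. The sketch also assumes that the 5fl window ends at the 5′ breakpoint itself and that the 3fl window starts at the 3′ breakpoint (both positions are retained), and it ignores the wrap-around of the circular genome.

```python
def flanking_regions(seq: str, bp5: int, bp3: int, k: int = 3) -> dict:
    """Return the k-base windows around a deletion.

    Positions are 1-based. bp5 and bp3 are both retained, so the deleted
    segment is bp5+1 .. bp3-1 (assumed reading of the definitions above).
    """
    if not (k <= bp5 and bp5 + k < bp3 and bp3 + k - 1 <= len(seq)):
        raise ValueError("windows must fit inside the linearised sequence")
    return {
        "5fl": seq[bp5 - k:bp5],            # retained, ends at the 5' breakpoint
        "5del": seq[bp5:bp5 + k],           # deleted, right after the 5' breakpoint
        "3del": seq[bp3 - 1 - k:bp3 - 1],   # deleted, right before the 3' breakpoint
        "3fl": seq[bp3 - 1:bp3 - 1 + k],    # retained, starts at the 3' breakpoint
    }


# Toy 23-base sequence (not real mtDNA) with a 6-base direct repeat ACGTAC.
toy = "TTT" + "ACGTAC" + "GGGGG" + "ACGTAC" + "CCC"
regions = flanking_regions(toy, bp5=9, bp3=21, k=3)
junction = regions["5fl"] + regions["3fl"]  # 5fl and 3fl are adjacent after deletion
print(regions, junction)
```

In the deleted molecule the junction is simply 5fl followed by 3fl, which is why the same junction sequence also describes the corresponding partial duplication.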
The 5′ and 3′ breakpoints are often located within perfect direct repeats, i.e. identical DNA sequences found in different locations of the mitochondrial genome (e.g. ACCTCCCTCACC; 8470–8482/13 447–13 459) (Figure 2A and B). Therefore, the junction sequence of both deletions and duplications may retain 0, 1 or 2 copies of the direct repeats (Figure 2C). When the rearrangement retains a single copy of the direct repeat, it is impossible to identify the exact sites of the breakage events. For this reason, an mtDNA rearrangement may be described by an interval of values (e.g. 8469–8482:13 447–13 460) or by arbitrary positions inside the repeated region. To avoid including redundant data in our datasets, the location of the breakpoint, when in the presence of homology, was standardized by always placing both breakpoints downstream (on the right) of the direct repeats (in the previous example, 8482:13460; Figure 2B), as previously described (23).
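The standardization rule can be sketched as a rightward shift of both breakpoints for as long as the first deleted base matches the first retained base downstream of the 3′ break. The code below is a minimal illustration under stated assumptions, not the procedure implemented in MitoBreak: the function name `standardize_breakpoints` is hypothetical, the sequence is a toy example rather than a reference mtDNA, and rearrangements that span the arbitrary start of the circular numbering are not handled.

```python
def standardize_breakpoints(seq: str, bp5: int, bp3: int) -> tuple:
    """Shift a 1-based breakpoint pair downstream through shared homology.

    bp5 and bp3 are retained positions; bases bp5+1 .. bp3-1 are deleted.
    Shifting both by one leaves the rearranged molecule unchanged whenever
    the base after bp5 equals the base at bp3.
    """
    if not 1 <= bp5 < bp3 <= len(seq):
        raise ValueError("expected 1 <= bp5 < bp3 <= len(seq)")
    # seq[bp5] is the first deleted base, seq[bp3 - 1] the base at bp3.
    while bp3 < len(seq) and seq[bp5] == seq[bp3 - 1]:
        bp5 += 1
        bp3 += 1
    return bp5, bp3


# Toy sequence: the repeat ACGTAC occupies positions 4-9 and 15-20.
toy = "TTT" + "ACGTAC" + "GGGGG" + "ACGTAC" + "CCC"
print(standardize_breakpoints(toy, 3, 15))  # breakpoints reported upstream of the repeats
print(standardize_breakpoints(toy, 6, 18))  # breakpoints reported inside the repeats
```

Both calls describe the same deleted molecule and converge on the same pair, which is the property that allows redundant reports to be recognized.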
In the case of human mtDNA, we considered the origins of H-strand (OH) and L-strand (OL) replication according to the strand-displacement model of mtDNA replication (24–27). Although each replication origin is defined by a range of nucleotides, we used nucleotide positions 407 for OH and 5747 for OL (28–30), which include the mtDNA regions encoding the RNA fragments used as primers to initiate DNA synthesis. In this way, the minor arc (shortest mtDNA section between OL and OH) is located between positions 408 and 5746, while the major arc (largest mtDNA section between OL and OH) is defined by positions 5747–407.
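As a worked example of these arc boundaries, the sketch below assigns a human mtDNA position to the minor or major arc. The constants 407 and 5747 come from the text; the genome length of 16 569 bp is that of the human reference sequence NC_012920.1, and the function name `arc_of` is an assumption made for illustration.

```python
OH = 407              # origin of H-strand replication (position used in the text)
OL = 5747             # origin of L-strand replication (position used in the text)
GENOME_LENGTH = 16569  # length of the human reference mtDNA NC_012920.1


def arc_of(position: int) -> str:
    """Classify a 1-based human mtDNA position as 'minor' or 'major' arc."""
    if not 1 <= position <= GENOME_LENGTH:
        raise ValueError("position outside the reference sequence")
    # Minor arc: 408-5746. Major arc: 5747-16569 plus 1-407 (wraps the origin).
    return "minor" if OH < position < OL else "major"


minor_arc_length = (OL - 1) - (OH + 1) + 1          # 5339 bp
major_arc_length = GENOME_LENGTH - minor_arc_length  # 11230 bp
print(arc_of(8482), arc_of(13460), minor_arc_length, major_arc_length)
```

With these boundaries, both breakpoints of the 8482:13460 example discussed earlier fall in the major arc.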
DATA CURATION AND COLLECTION
Currently, the MitoBreak database describes 1472 mtDNA rearrangements from seven species: Homo sapiens, Mus musculus, Rattus norvegicus, Macaca mulatta, Drosophila melanogaster, Caenorhabditis elegans and Podospora anserina (Table 1). The database was constructed using information from 388 peer-reviewed papers and 2 PhD theses published from 1983 to 2013 (Table 1), as well as information gathered from the MITOMAP and MitoTool databases. The information associated with each mtDNA rearrangement described in MitoBreak was manually curated. We started by comparing and numbering the reported 5′ and 3′ breakpoints or junction sequences according to the reference mtDNA sequence (Table 1). We only considered rearrangements for which the breakpoint positions have been confirmed by sequencing analysis. The numbering of the breakpoints located within direct repeats was also corrected according to our standardization procedure (Figure 2). The datasets comprise non-redundant data, i.e. each combination of 5′ and 3′ breakpoints is represented only once. Nevertheless, all references associated with each rearranged mtDNA are described.
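The non-redundancy rule, one entry per breakpoint pair with every supporting reference kept, can be sketched as follows. The record layout, the function name `merge_reports` and the placeholder reference labels are assumptions for illustration only; they do not reflect the curated data or the database schema.

```python
from collections import defaultdict


def merge_reports(reports):
    """Collapse (bp5, bp3, reference) reports into one entry per breakpoint pair.

    Breakpoints are assumed to be already standardized, so identical
    rearrangements share the same (bp5, bp3) key.
    """
    merged = defaultdict(list)
    for bp5, bp3, reference in reports:
        if reference not in merged[(bp5, bp3)]:  # keep each reference once
            merged[(bp5, bp3)].append(reference)
    return dict(merged)


# Placeholder reports: the reference labels are not real citations.
reports = [
    (8482, 13460, "ref A"),
    (8482, 13460, "ref B"),
    (8482, 13460, "ref A"),
    (1000, 3501, "ref C"),
]
print(merge_reports(reports))
```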
Species | Rearrangement type | Rearrangements (n) | Number of publications | Reference mtDNA sequence
---|---|---|---|---
H. sapiens | Deletions | 805 | 325 | NC_012920.1
H. sapiens | Duplications | 44 | 33 | NC_012920.1
M. musculus | Deletions | 245 | 23 | NC_005089.1
M. musculus | Linear | 31 | 1 | NC_005089.1
R. norvegicus | Deletions | 216 | 13 | NC_001665.2
M. mulatta | Deletions | 58 | 4 | NC_005943.1
D. melanogaster | Deletions | 35 | 2 | NC_001709.1
P. anserina | Deletions | 32 | 1 | NC_001329.3
C. elegans | Deletions | 6 | 2 | NC_001328.1
We then retrieved the phenotypic information associated with each mtDNA rearrangement from each peer-reviewed publication. In the case of human mtDNA deletions and duplications, we also organized the reported cases into seven major categories based on the number and characteristics of the clinical or phenotypical features: single mtDNA deletions, multiple mtDNA deletions, healthy tissues, Parkinson’s disease, inclusion body myositis, tumour and other clinical features (31). The mtDNA rearrangements from M. musculus and R. norvegicus were organized according to the strain. A section with unpublished datasets is also available for those rearrangements not described in a peer-reviewed publication.
DATABASE ORGANIZATION
The MitoBreak database consists of three major components: datasets, the classifier tool and the submit tool. Users can access any of these components through the top and bottom navigation bars of all pages. The datasets section is subdivided into two major sections: general statistics and the individual rearrangement pages. The classifier tool allows users to standardize the positions of the rearrangement breakpoints when they are located in direct repeats and to verify whether the rearrangement is already present in the database. The submit tool allows users to submit a new mtDNA rearrangement to MitoBreak or to submit supplementary data to an existing case.
Datasets
The datasets section is accessible through a table describing the species, type of rearrangement, number of reported cases and the number of publications used to build the dataset. After selecting a particular rearrangement type, users are directed to an interactive table listing all cases, including the name and location of the rearrangement, relevant clinical features, references, etc. The contents of this table can be rearranged by multi-column sorting, scrolling options for the table viewport and user-defined searches. Additionally, the dataset can be filtered by categories and subcategories using the filter boxes at the top of the page. The full or filtered dataset and the flanking regions can then be copied, printed or downloaded in comma-separated values (CSV) or Excel format. The ‘All references’ button opens a full list of publications used to build the selected dataset, with links to the PubMed website (http://www.ncbi.nlm.nih.gov/pubmed). Users can also reach the individual page of each rearrangement by clicking on its name in the first column of the table. The ‘General Statistics’ button opens a series of descriptive analyses, performed dynamically, over the full or filtered dataset.
General statistics
The general statistics section provides an overview of the information present in a MitoBreak dataset (full or filtered by the user). The displayed statistics vary according to the type of rearrangement, the species and the available data. For instance, the general statistics for human mtDNA deletions include the breakpoint distributions, deletion lengths, genomic locations of the deleted regions, distribution in sub-groups and a circular visualization of the mtDNA with all deletions. The general statistics are presented using tables, interactive histograms and/or scatterplots (Figure 3). The charts are based on the Highcharts tool (http://www.highcharts.com/) where users can visualize the raw data of each chart point on mouse-over, zoom each axis, select which series will be presented by clicking on the legend and export the chart in png, jpeg, pdf or svg format.
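One of the statistics mentioned above, the distribution of deletion lengths, can be sketched in a few lines. Because both breakpoints are retained, the length of a deletion is the number of bases strictly between them. The function names, the bin width and the second and third breakpoint pairs are assumptions chosen for illustration; only 8482:13460 is taken from the text, and deletions spanning the start of the circular numbering are not handled.

```python
from collections import Counter


def deletion_length(bp5: int, bp3: int) -> int:
    """Number of deleted bases when bp5 and bp3 are both retained."""
    return bp3 - bp5 - 1


def length_histogram(pairs, bin_width: int = 1000) -> dict:
    """Count deletions per length bin, keyed by the lower edge of each bin."""
    counts = Counter(
        (deletion_length(bp5, bp3) // bin_width) * bin_width for bp5, bp3 in pairs
    )
    return dict(sorted(counts.items()))


pairs = [
    (8482, 13460),  # example from the text, 4977 bases deleted
    (1000, 3501),   # hypothetical pair, 2500 bases deleted
    (6000, 10500),  # hypothetical pair, 4499 bases deleted
]
print(length_histogram(pairs))
```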
Individual page
The individual page of each rearrangement can be accessed through the first column of the dataset table, which displays the breakpoints. In the case of deletions and duplications, users can visualize the flanking sequences of the breakpoints and the length and location of the direct repeats (if present). The locations of the 5′ and 3′ breakpoints are also shown in a scatterplot (red dot), while the rearrangement length is shown in a histogram (red bar). These two charts also display the data for all available breakpoints in the dataset in grey dots or bars; therefore, the individual rearrangement can be analyzed in the broader context of all reported cases. The rearrangement is also represented in a circular mtDNA plot. The phenotype groups that include the mtDNA rearrangement and the references are also made available on the individual page (Figure 4).
Classifier tool
The classifier tool allows users to compare any mtDNA rearrangement obtained during their research with the full dataset of published rearrangements present in MitoBreak. Before the comparison is made, the classifier tool corrects the position of the breakpoints when they are located in direct repeats (Figure 2). The breakpoints can be submitted in two formats: (i) a pair of breakpoint numbers as identified by the user in the reference mtDNA sequence or (ii) the rearrangement junction sequence, i.e. the sequence spanning the site where the normal mtDNA is disrupted by the rearrangement. If a junction sequence is provided, a BLAST (32) analysis between the given sequence and the reference mtDNA genome is performed. The classifier tool provides the same information available on the individual pages described previously, but for the newly submitted rearrangement, as well as an indication of the number of cases already present in MitoBreak with the same breakpoints.
Submit tool
The submit tool permits the user to submit new mtDNA rearrangements to MitoBreak. The procedure is similar to that used in the classifier tool. After the indication of the breakpoints or the rearrangement junction, the main characteristics of the rearrangement are shown for confirmation. The submission procedure ends with a contact form.
SUPPORT
The Documentation section provides background information on mtDNA rearrangements (numbering and location of breakpoints, definition of flanking regions, breakpoints on direct repeats, abbreviations, etc.). The Tutorial section offers detailed tutorials on how to use the MitoBreak tools, such as ‘How to analyze a set of breakpoints already present in the database’, ‘How to locate a pair of breakpoints within the existing datasets’ and ‘How to submit new breakpoints to the database’. User support can be obtained through the contact form available on the Contact & Support page or via email at [email protected].
AVAILABILITY AND DESIGN
The MitoBreak database is available at http://mitobreak.portugene.com. It uses a SQLite database for data storage and runs on an Apache web server using CGI-Perl and JavaScript to generate dynamic HTML pages. The dataset tables were generated using the jQuery plugin DataTables v1.9.4 (http://datatables.net/). The interactive graphs were created using Highcharts v3.0 (http://www.highcharts.com/), and the circular mtDNA representations were made using Circos v0.64 (33).
CONCLUSIONS AND PERSPECTIVES
Our goal is to continually add new mtDNA rearrangements, either to the existing datasets or to new ones, and to enhance the available analyses and visualization methods. We encourage the submission of new data on rearrangements to MitoBreak to be shared with the research community. For users who have large collections of rearrangements to submit or who want to add breakpoints from a species and/or rearrangement type not present in MitoBreak, a contact form and an e-mail address are available on the website.
Several online resources are available that describe the diverse features of mitochondria and mitochondrial DNA [e.g. MITOMAP (21), MitoTool (22), mtDB (34), HmtDB (35), MitoP2 (36)], but only the MITOMAP and MitoTool databases have information regarding mtDNA rearrangements. However, these two databases only describe mtDNA rearrangements detected in humans without any statistical, descriptive or visual representation of the breakpoints. The MitoBreak database is by far the largest set of mtDNA breakpoints currently available and presents diverse information for each available rearrangement. Moreover, MitoBreak allows multiple types of interactions with the datasets so users can have a fast characterization of all or subsets of mtDNA rearrangements. New rearrangements can be easily analyzed in the light of available data, including their previous description in a clinical context. For the first time, mtDNA breakpoints from different species can be analyzed using a single platform. The comparison of mtDNA rearrangements from different species facilitates the identification of common sequence features in breakpoint regions, such as direct repeats or non-B DNA conformations. This information might help to delineate new experiments in model organisms (M. musculus, D. melanogaster, C. elegans, etc.) to better understand how pathological mtDNA deletions and duplications are formed in humans.
For all these reasons, we believe that MitoBreak will be a useful tool to help researchers gain greater knowledge about mtDNA rearrangements. It will provide clinicians and molecular geneticists a useful resource to study new or previously described mtDNA deletions, which are associated with a wide variety of highly debilitating and often fatal disorders and have been implicated in aging and age-associated disease. MitoBreak will also be useful for researchers interested in designing accurate methods for the identification and screening of abnormal mtDNAs in different clinical contexts by providing detailed information on deleted/duplicated mtDNA regions. Finally, our database is an easily accessible platform for those who might want to explore the basics of mtDNA organization and the general mechanisms of genomic rearrangements across species.
FUNDING
Portuguese Foundation for Science and Technology (FCT), Fundo Social Europeu and Programa Operacional Potencial Humano [PTDC/CVT/100881/2008, SFRH/BPD/44637/2008] and Investigator FCT program. IPATIMUP is an Associate Laboratory of the Portuguese Ministry of Education and Science and is partially supported by FCT. CIIMAR is partially supported by the European Regional Development Fund (ERDF) through the COMPETE—Operational Competitiveness Programme and national funds through FCT, under the project [PEst-C/MAR/LA0015/2013]. Funding for open access charge: Portuguese Foundation for Science and Technology (FCT) [PTDC/CVT/100881/2008].
Conflict of interest statement. None declared.