B73 Representative Reference Genome Assembly Status
The current representative reference genome for maize is
Zm-B73-REFERENCE-NAM-5.0 (also known as B73 RefGen_v5).
This assembly, released in January 2020, was sequenced and assembled
along with a set of 25 inbreds known as the NAM founder lines by the
NAM Consortium using PacBio long reads and a mate-pair strategy.
Scaffolds were validated by BioNano optical mapping, then ordered and
oriented using linkage and pan-genome marker data. RNA-seq data from
multiple tissues were used to annotate each genome with a pipeline
that includes BRAKER, Mikado, and PASA.
The first three assemblies, B73 RefGen_v1, B73 RefGen_v2, and
B73 RefGen_v3, were all based on a BAC (bacterial artificial
chromosome) sequencing strategy. The B73 RefGen_v4 assembly
used a new approach that relied on PacBio Single Molecule
Real Time (SMRT) sequencing at Cold Spring Harbor to a depth
of 60X coverage, with scaffolds created with the aid of
whole-genome restriction mapping (also known as optical mapping). Error
correction of PacBio sequences was facilitated by Illumina short-read
DNA sequencing performed at Washington University.
Annotation was carried out in the Ware laboratory at Cold
Spring Harbor using the MAKER pipeline (Campbell, 2014) and
~111,000 long-read PacBio transcripts from six maize tissues.
More complete details on the B73 RefGen_v4 assembly can be found at
Gramene or by reading the paper.
See the History of Maize Genome
Assemblies and Annotations for more information.
B73 Representative Reference Genome Assembly Details
The current version is Zm-B73-REFERENCE-NAM-5.0, also known as "B73 RefGen_v5".
Chromosomes
The assembly sequence includes all 10 chromosomes.
The sequence can be downloaded here, from ENA, or from GenBank.
Gaps
Gaps within BACs are indicated by runs of 100 N's. Gaps
between contigs are indicated by runs of 1000 N's.
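The gap convention above can be checked directly on a downloaded sequence. The following is a minimal sketch, assuming the run lengths are exact (100 or 1000) and that any other run of N's should be reported separately; the function names are hypothetical and the toy sequence is made up for illustration.

```python
import re
from collections import Counter


def find_gaps(sequence):
    """Yield (start, end, length) for each run of N/n (0-based, end-exclusive)."""
    for match in re.finditer(r"[Nn]+", sequence):
        yield match.start(), match.end(), match.end() - match.start()


def classify_gap(length):
    """Label a gap by run length, following the convention described above."""
    if length == 100:
        return "within-BAC"
    if length == 1000:
        return "between-contig"
    return "other"  # assumption: any other length is left unclassified


def summarize_gaps(sequence):
    """Count gaps of each class in a single sequence string."""
    return Counter(classify_gap(length) for _, _, length in find_gaps(sequence))


if __name__ == "__main__":
    # Toy sequence, not real maize data.
    toy = "ACGT" + "N" * 100 + "GGCC" + "N" * 1000 + "TTAA" + "N" * 7 + "AC"
    print(summarize_gaps(toy))
```

For a whole-chromosome FASTA file, the same functions can be applied to each record after the sequence lines have been joined into one string.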
Zm-B73-REFERENCE-NAM-5.0/Zm00001eb.1 Stats
Gene Feature | Value |
Average protein-coding transcript size | 5376 bp |
Longest transcript | 745,091 bp (Zm00001eb334630_T004) |
Average transposable element size | 1638 bp |
Average Exon size | 290 bp |
Average Number of exons per gene | 6 exons |
Maximum exons per gene | 80 exons (Zm00001eb126710_T002) |
Average Coding region size | 1816 bp |
Previous reference genome assemblies
Information and stats for B73 RefGen_v4
(Zm-B73-REFERENCE-GRAMENE-4.0)
Zm-B73-REFERENCE-GRAMENE-4.0/Zm00001d Stats
Gene Feature | Value |
Average protein-coding transcript size | 7638 bp |
Average low confidence transcript size | 6981 bp |
Average transposable element size | unavailable |
Average Exon size | 156 bp |
Average Number of exons per gene | 4 exons |
Maximum exons per gene | 81 exons (Zm00001d040166) |
Average Intron size | 578 bp |
Average Coding region size | 207 bp |
Information and stats for B73 RefGen_v3
Assembly process:
In-depth metadata for B73 RefGen_v3 is available
here.
Detailed information about the V3 assembly process is available at
.
B73 RefGen_v3 Stats
Gene Feature | Value |
Average protein-coding transcript size | 4255 bp |
Average low confidence transcript size | 959 bp |
Average transposable element size | 1694 bp |
Average Exon size | 287 bp |
Average Number of exons per gene | 3.6 exons |
Maximum exons per gene | 35 exons (GRMZM2G068755_T01) |
Average Intron size | 630 bp |
Average Coding region size | 213 bp |
Information and stats for B73 RefGen_v2
B73 RefGen_v2 Stats
Gene Feature | Value |
Average WGS transcript size | 2646 bp |
Average FGS transcript size | 4237 bp |
Average Exon size | 287 bp |
Average Number of exons per gene | 3.6 exons |
Maximum exons per gene | 53 exons (GRMZM2G068755_T01) |
Average Intron size | 629 bp |
Average Coding region size | 210 bp |
Average 5' UTR length | 280 bp |
Average 3' UTR length | 336 bp |
Information and stats for B73 RefGen_v1
Change history
B73 RefGen_v1
First complete assembly of the B73 genome.
B73 RefGen_v2
Improvements to order and orientation of within-BAC contigs using the
minimum tiling path (MTP). Improvements to gene models.
B73 RefGen_v3
Captured missing gene space using WGS reads.
213 new gene models were introduced, 251 gene models were improved,
and 10 gene models were merged to create new models:
GRMZM2G000964, GRMZM2G103315 -> GRMZM2G000964
GRMZM2G045892, GRMZM2G452386 -> GRMZM2G045892
GRMZM2G119720, GRMZM2G518717 -> GRMZM2G119720
GRMZM2G142383, GRMZM2G020429 -> GRMZM2G142383
GRMZM2G319465, GRMZM2G439578 -> GRMZM2G319465
GRMZM2G338693, GRMZM2G117517 -> GRMZM2G338693
GRMZM5G861997, GRMZM5G864178 -> GRMZM5G861997
GRMZM5G872800, GRMZM2G143862 -> GRMZM5G872800
GRMZM5G891969, GRMZM5G823855 -> GRMZM5G891969
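When working with older v2-era identifiers, the merge list above can be turned into a lookup table. This is a minimal sketch that parses the lines exactly as printed here; the variable and function names are hypothetical, and identifiers not in the list are returned unchanged.

```python
MERGE_LINES = """\
GRMZM2G000964, GRMZM2G103315 -> GRMZM2G000964
GRMZM2G045892, GRMZM2G452386 -> GRMZM2G045892
GRMZM2G119720, GRMZM2G518717 -> GRMZM2G119720
GRMZM2G142383, GRMZM2G020429 -> GRMZM2G142383
GRMZM2G319465, GRMZM2G439578 -> GRMZM2G319465
GRMZM2G338693, GRMZM2G117517 -> GRMZM2G338693
GRMZM5G861997, GRMZM5G864178 -> GRMZM5G861997
GRMZM5G872800, GRMZM2G143862 -> GRMZM5G872800
GRMZM5G891969, GRMZM5G823855 -> GRMZM5G891969
"""


def parse_merges(text):
    """Map every source gene model ID to the merged ID it now belongs to."""
    mapping = {}
    for line in text.strip().splitlines():
        sources, target = line.split("->")
        target = target.strip()
        for source in sources.split(","):
            mapping[source.strip()] = target
    return mapping


def merged_id(gene_id, mapping):
    """Return the merged ID for gene_id, or gene_id itself if it was not merged."""
    return mapping.get(gene_id, gene_id)


if __name__ == "__main__":
    merges = parse_merges(MERGE_LINES)
    print(merged_id("GRMZM2G103315", merges))
```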
Zm-B73-REFERENCE-GRAMENE-4.0
A de novo assembly using PacBio technologies. New annotation analysis with
gene models linked to v3 gene models.
Zm-B73-REFERENCE-NAM-5.0
A de novo assembly using PacBio Sequel sequencing technology. Scaffolds
validated by improved BioNano optical mapping. New annotation
analysis.
B73 Reference Gene Models and Nomenclature
With increasing numbers of full reference genomes with structural
annotation becoming available, it has become necessary to establish naming
standards that span genomes and versions. The recommendation is available
here.
Important note: The B73 v5 and NAM founders genome assemblies were
released with a preliminary annotation, named "Zm00001e.1". The official
annotation was sufficiently different from the preliminary annotation that
it was given a new name, "Zm00001eb.1", and the gene models were given new
identifiers. Unlike previous annotations, the new identifiers are numbered
in sequential order.
Cross-reference files that translate between the preliminary and official
annotations can be found in the
MaizeGDB downloads. The
cross-reference for Zm00001e.1 to Zm00001eb.1 is
here.
The current reference gene model set is named Zm00001eb.1. Gene models
within this set are prefixed with "Zm00001eb". Associations between the v4
gene models (Zm00001d.2) and the 5b+ gene models are available
here.
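Because each annotation uses its own identifier prefix, a gene model ID can be sorted into its annotation set by prefix alone. The sketch below is illustrative only: the "Zm00001eb" and "Zm00001d" prefixes and the "_T004"-style transcript suffix appear on this page, while the "Zm00001e" prefix for the preliminary set is an assumption based on the set name. Function names are hypothetical.

```python
import re

# Order matters: "Zm00001eb" must be tested before the shorter "Zm00001e".
PREFIX_TO_SET = [
    ("Zm00001eb", "Zm00001eb.1 (official v5 annotation)"),
    ("Zm00001e", "Zm00001e.1 (preliminary v5 annotation)"),  # assumed prefix
    ("Zm00001d", "Zm00001d (v4 annotation)"),
]

# Transcript suffixes seen on this page look like _T004 or _T01.
TRANSCRIPT_SUFFIX = re.compile(r"_T\d+$")


def gene_id(identifier):
    """Strip a trailing transcript suffix, leaving the gene model ID."""
    return TRANSCRIPT_SUFFIX.sub("", identifier)


def annotation_set(identifier):
    """Return the annotation set implied by the identifier prefix."""
    for prefix, name in PREFIX_TO_SET:
        if identifier.startswith(prefix):
            return name
    return "unknown"


if __name__ == "__main__":
    for example in ("Zm00001eb334630_T004", "Zm00001d040166"):
        print(gene_id(example), "->", annotation_set(example))
```

Prefix matching only identifies the annotation set; translating an identifier between sets still requires the cross-reference files.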
Gene model sets (annotations) by reference assembly version:
Gene model set | Description | Assembly version | Gramene version | Cross reference |
Zm00001eb.1 | Official v5 annotation | Zm-B73-REFERENCE-NAM-5.0 | 64 | xref |
Zm00001e.1 | Preliminary and replaced by Zm00001eb.1 | Zm-B73-REFERENCE-NAM-5.0 | N/A | |
Zm00001d.2 | Filtered Gene Set | Zm-B73-REFERENCE-GRAMENE-4.0 | 36 | xref |
Zm00001d.1 | Filtered Gene Set | Zm-B73-REFERENCE-GRAMENE-4.0 | 32-33 | xref |
5b+ | Filtered Gene Set, mostly projections of 5b | RefGen_v3 | 18-31 | xref |
5a | Working Gene Set (WGS) | RefGen_v2 | 7-17 | |
5b | Filtered Gene Set (FGS) - subset of WGS | RefGen_v2 | 7-17 | |
4a.53 | Filtered and Working gene sets | RefGen_v1 | | |
The Zm00001eb.1 gene model set is the recommended gene model set
for Zm-B73-REFERENCE-NAM-5.0 and is the representative gene model set
for maize.
For RefGen_v3, the 5b+ gene model set is recommended. Other
gene model sets for RefGen_v3 are provided for comparison. Due to
the difficulty of determining when two gene models are the same
(or when one represents an alternative splicing of the same
genomic material), there are no plans to merge the sets.
For more information see the
Nomenclature Standards
Alternative annotations
Additional annotations for the B73 genome assemblies have been
generated by groups outside the genome sequencing project. The
outside annotations listed below are shown as tracks on the
assembly browsers.
Assembly | Name | Source | Link |
Zm-B73-REFERENCE-NAM-5.0 | NCBI 103 | NCBI | https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Zea_mays/103/ |
Zm-B73-REFERENCE-GRAMENE-4.0 | NCBI 102 | NCBI | https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Zea_mays/102/ |
B73 RefGen_v3 | NCBI 100 | NCBI | https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Zea_mays/100/ |
B73 RefGen_v3 | EvidentialGene | Don Gilbert, Indiana University | http://arthropods.eugenes.org/EvidentialGene/plants/corn/evg5corn/ |
Description of Gramene/Ensembl versions of B73 genome download files
Versions supported by MaizeGDB are in bold.
No changes were made to unmasked or masked assembly downloads unless noted.
Ensembl | Gramene | Assembly | Gene Model Set | Date | Changes |
7-17 | 25-35 | B73 RefGen_v2 | 5b | 11/30/10 - 03/10/13 | not calculated |
18 | 36 | B73 RefGen_v3 | 5b+ | 04/29/13 | Initial B73 v3 downloads |
19 | 37 | B73 RefGen_v3 | 5b+ | 07/08/13 | GFF downloads available |
20 | 38 | B73 RefGen_v3 | 5b+ | 09/10/13 | 17625 additional gene models [more] |
21 | 39 | B73 RefGen_v3 | 5b+ | 01/16/14 | 2830 gene models removed, additional repeat-masked files [more] |
22 | 40 | B73 RefGen_v3 | 5b+ | 04/09/14 | 70676 models removed (WGS) [more] |
23 | 41 | B73 RefGen_v3 | 5b+ | 09/01/14 | 166 gene models removed [more] |
24 | 42 | B73 RefGen_v3 | 5b+ | 11/24/14 | None |
25 | 43 | B73 RefGen_v3 | 5b+ | 02/03/15 | None |
26 | 44 | B73 RefGen_v3 | 5b+ | 04/06/15 | No repeat masked sequence in downloads |
27 | 45 | B73 RefGen_v3 | 5b+ | 06/18/15 | Repeat masked sequence returned to downloads |
28 | 46 | B73 RefGen_v3 | 5b+ | 08/18/15 | None |
29 | 47 | B73 RefGen_v3 | 5b+ | 10/27/15 | None |
30 | 48 | B73 RefGen_v3 | 5b+ | 01/14/16 | None |
31 | 49 | B73 RefGen_v3 | 5b+ | 07/23/17 | Repeat-masked gene model downloads |
32 | 50 | Zm-B73-REFERENCE-GRAMENE-4.0 | Zm00001d.1 | 08/04/16 | Initial B73 v4 downloads |
33 | 51 | Zm-B73-REFERENCE-GRAMENE-4.0 | | 07/14/17 | 174 additional gene models (organelle), changes in 5825 gene models [more], Mt and Pt sequence added to assembly downloads, new repeat masked assembly FASTA files (%Ns changed from 85.12% to 6.69%). |
34 | 52 | Zm-B73-REFERENCE-GRAMENE-4.0 | | 12/18/16 | Changes in 154 gene models [more], no changes to assembly FASTA files. |
35 | 53 | Zm-B73-REFERENCE-GRAMENE-4.0 | | 04/26/17 | Changes in 787 gene models [more], no changes to assembly FASTA files. |
36 | 54 | Zm-B73-REFERENCE-GRAMENE-4.0 | Zm00001d.2 | 06/27/17 | Changes in 154 gene models [more], no changes to assembly FASTA files. |
37 | 55 | Zm-B73-REFERENCE-GRAMENE-4.0 | | 10/05/17 | 1861 additional gene models including Rfam predictions, changes to 154 gene models [more], no changes to assembly FASTA files |
38 | 56 | Zm-B73-REFERENCE-GRAMENE-4.0 | | 02/07/18 | 70 gene models removed, 7 added, changes to 1945 gene models [more], new repeat masked assembly FASTA files (%Ns changed to 90.22) |
39 | 57 | Zm-B73-REFERENCE-GRAMENE-4.0 | | 05/15/18 | New repeat masked assembly FASTA files (%Ns changed to 90.01) |
40 | 58 | Zm-B73-REFERENCE-GRAMENE-4.0 | | 05/15/18 | New repeat masked assembly FASTA files (%Ns changed to 90.22) |
41 | 59 | Zm-B73-REFERENCE-GRAMENE-4.0 | | 09/20/18 | 174 GRMZM ids for organelle gene models removed, 364 organelle gene models added. |
B73 Stock Information
The seed source for both Zm-B73-REFERENCE-NAM-5.0 and
Zm-B73-REFERENCE-GRAMENE-4.0 descended from PI 550473, but was
maintained for several generations prior to being used as the
source seed. The seeds closest to those used for sequencing v4 were deposited
at the NCRPIS (accession number: PI 677128).
The B73 source for the BAC libraries (BACs with prefix "b" prepared in
Rod Wing's lab; BACs with prefix "c" prepared in Peter deJong's lab)
was PI 550473.
When requesting seed from the North Central Regional Plant Introduction Station,
ask for any lot descended from the Coe PI 550473 lines.
The stock was received directly by the North Central Regional Plant
Introduction Station from Arnel Hallauer and has been maintained by the
quality-maintenance procedures at the PI Station. Ed Coe reports that,
"The results of QC lab checks for constancy in PI 550473 have been excellent."
The same source was used for the IBM mapping population. Maps produced
at Missouri used 302 lines of this population, providing unmatched precision
(resolution is at the intra-BAC level). These maps anchor the fingerprint-based
contig assemblies to chromosome location.
High-Molecular-Weight DNA was prepared by Jack Gardiner in the lab at
Missouri and shipped to Clemson (Wing's lab at the time) and to deJong's lab
(just at the time his lab was moving to California) for BAC preparation.
NSF grant reports have documented the details, and specifics for the materials,
preparation, characterization, and final assembly of the contig framework can be found
in Coe E,
Schaeffer ML (2005) Genetic, physical, maps, and database resources for maize.
Maydica 50:285-303. Ed Coe has made a copy of that paper available
here
Chromosome - GenBank accessions reference
Chromosome | B73 RefGen_v1 | B73 RefGen_v2 | B73 RefGen_v3 | B73 RefGen_v4 | B73 RefGen_v5 | Publication |
Chromosome 1 | GK000031.1 | GK000031.2 | GK000031.3 | CM007647.1 | LR618874.1 | PubMed, MaizeGDB |
Chromosome 2 | GK000032.1 | GK000032.2 | GK000032.3 | CM007648.1 | LR618875.1 | PubMed, MaizeGDB |
Chromosome 3 | GK000033.1 | GK000033.2 | GK000033.3 | CM007649.1 | LR618876.1 | PubMed, MaizeGDB |
Chromosome 4 | CM000780.1 | CM000780.2 | CM000780.3 | CM000780.4 | LR618877.1 | PubMed, MaizeGDB |
Chromosome 5 | CM000781.1 | CM000781.2 | CM000781.4 | CM000781.4 | LR618878.1 | PubMed, MaizeGDB |
Chromosome 6 | CM000782.1 | CM000782.2 | CM000782.3 | CM000782.4 | LR618879.1 | PubMed, MaizeGDB |
Chromosome 7 | GK000034.1 | GK000034.2 | GK000034.3 | CM007650.1 | LR618880.1 | PubMed, MaizeGDB |
Chromosome 8 | CM000784.1 | CM000784.2 | CM000784.3 | CM000784.4 | LR618881.1 | PubMed, MaizeGDB |
Chromosome 9 | CM000785.1 | CM000785.2 | CM000785.3 | CM000785.4 | LR618882.1 | PubMed, MaizeGDB |
Chromosome 10 | CM000786.1 | CM000786.2 | CM000786.3 | CM000786.4 | LR618883.1 | PubMed, MaizeGDB |
WGS (Whole Genome Shotgun) records at GenBank:
Zm-B73-REFERENCE-GRAMENE-4.0
Zm-B73-REFERENCE-NAM-5.0
Publications
Hufford et al., 2021.
De novo assembly, annotation, and comparative analysis of 26
diverse maize genomes. (Preprint)
Jiao et al., 2017.
Improved maize reference genome with single-molecule technologies.
Law et al., 2015.
Automated update, revision, and quality control of the maize genome
annotations using MAKER-P improves the B73 RefGen_v3 gene models and
identifies new genes.
Wei et al., 2009.
The physical and genetic framework of the maize B73 genome.
Schnable et al., 2009.
The B73 maize genome: complexity, diversity, and dynamics.
Wei et al., 2007.
Physical and Genetic Structure of the Maize Genome Reflects Its Complex
Evolutionary History.
Bi et al., 2006.
Single Nucleotide Polymorphisms and Insertion–Deletions for Genetic Markers
and Anchoring the Maize Fingerprint Contig Physical Map.
Gardiner et al., 2004.
Anchoring 9,371 maize expressed sequence tagged unigenes to the bacterial
artificial chromosome contig map by two-dimensional overgo hybridization.
Coe et al., 2002.
Access to the Maize Genome: An Integrated Physical and Genetic Map.
Yim et al., 2002.
Characterization of Three Maize Bacterial Artificial Chromosome Libraries
toward Anchoring of the Physical Map to the Genetic Map Using High-Density
Bacterial Artificial Chromosome Filter Hybridization.
FAQs
What is a Reference Genome?
What is a Representative Genome?
What are the main changes between the v4 and v5 assemblies?
What are the main changes between the v3 and v4 assemblies?
What are the main changes between the v2 and v3 assemblies?
Why was a preliminary v5 annotation released?
How can I map positions between the v2 and v3 assemblies?
Where can I find legacy resources from MaizeSequence.Org?
How can I identify the Filtered Gene Set (FGS) in RefGen_v3?
Where can I download a GFF dump of the FGS for maize genes in v3 (5b+)?
What is a Reference Genome?
A Reference Genome is a haploid representation of a genome as
DNA sequence with a defined coordinate system, and accession
and version identification. A Reference genome is usually
assembled de novo, rather than relying on related genomes for
assembly of small DNA fragments (which would be a reference
guided assembly). A Reference Genome usually includes the
structural annotations, or gene models, derived from the
sequence assembly. A Reference Genome is almost always a work
in progress that gets better with the additional new data over
time. Data for improvement is collected continually, and at
certain times, new Reference Genome versions come out that
incorporate this data. B73 RefGen_v3 is such an updated
version.
What is a Representative Genome?
A Representative Genome is a reference-quality genome
which is considered to be representative for a species. B73 is
the representative maize genome.
What are the main changes between RefGen_v4 (Zm-B73-REFERENCE-GRAMENE-4.0)
and Zm-B73-REFERENCE-NAM-5.0?
Zm-B73-REFERENCE-NAM-5.0 is a de novo assembly built with improved PacBio long-read
technology and BioNano optical maps, using the same tissue source as
RefGen_v4 (Zm-B73-REFERENCE-GRAMENE-4.0).
What are the main changes between RefGen_v3 and RefGen_v4
(Zm-B73-REFERENCE-GRAMENE-4.0)?
Zm-B73-REFERENCE-GRAMENE-4.0 was a complete de novo assembly using
PacBio technology on DNA extracted from a descendant of the accession
used for the v1 - v3 assemblies.
What are the main changes between RefGen_v2 and RefGen_v3?
Changes to the assembly include:
- v3 captured missing gene space in v2 using WGS reads (v2
improved initial BAC assembly using MTP)
- Several contigs were moved or flipped.
Why was a preliminary v5 annotation released?
A preliminary annotation, Zm00001e.1, was released alongside the v5 genome
assembly to put tools into the hands of researchers as soon as possible,
but with warnings not to rely on any specific gene models until the formal
annotation, Zm00001eb.1, was released.
How can I map positions between the v4 and v5 assemblies?
There is no converter yet available for translating between
v4 and v5 positions, but chain files are available
here,
which can be used with
LiftOver
or CrossMap
to convert sets of coordinates. Be aware that features on the
unplaced scaffolds in the v4 assembly will not be correctly
translated to the v5 assembly.
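To show what a chain file encodes, here is a minimal sketch of chain-based coordinate conversion, assuming the standard UCSC chain layout (a "chain" header line followed by "size dt dq" alignment blocks). It is not a replacement for LiftOver or CrossMap: it takes the first chain that covers a position instead of choosing the best-scoring one, and it handles single positions only. The toy chain and the function names are made up for illustration and do not come from the v4-to-v5 files.

```python
import io

# Toy chain, NOT real maize data: header, then ungapped blocks as "size dt dq".
TOY_CHAIN = """\
chain 1000 chr1 1000 + 100 300 chr1 1200 + 150 360 1
50 10 20
140
"""


def parse_chains(handle):
    """Yield (header, blocks); blocks are (t_start, q_start, size) triples."""
    header, blocks, t_pos, q_pos = None, [], 0, 0
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "chain":
            if header is not None:
                yield header, blocks
            header = {
                "t_name": fields[2], "t_start": int(fields[5]), "t_end": int(fields[6]),
                "q_name": fields[7], "q_size": int(fields[8]), "q_strand": fields[9],
                "q_start": int(fields[10]),
            }
            blocks, t_pos, q_pos = [], header["t_start"], header["q_start"]
        else:
            size = int(fields[0])
            blocks.append((t_pos, q_pos, size))
            t_pos += size
            q_pos += size
            if len(fields) == 3:  # gaps before the next block
                t_pos += int(fields[1])
                q_pos += int(fields[2])
    if header is not None:
        yield header, blocks


def lift_position(chains, chrom, pos):
    """Map a 0-based source position; return (name, pos, strand) or None."""
    for header, blocks in chains:
        if header["t_name"] != chrom or not header["t_start"] <= pos < header["t_end"]:
            continue
        for t_start, q_start, size in blocks:
            if t_start <= pos < t_start + size:
                q_pos = q_start + (pos - t_start)
                if header["q_strand"] == "-":
                    # Reverse-strand chains count from the end of the sequence.
                    q_pos = header["q_size"] - 1 - q_pos
                return header["q_name"], q_pos, header["q_strand"]
    return None  # position falls in a gap or outside every chain


if __name__ == "__main__":
    chains = list(parse_chains(io.StringIO(TOY_CHAIN)))
    print(lift_position(chains, "chr1", 120))  # ('chr1', 170, '+')
    print(lift_position(chains, "chr1", 155))  # None: inside an alignment gap
```

Positions that return None correspond to regions with no aligned counterpart, which is the same reason features on unplaced v4 scaffolds may fail to convert.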
How can I map positions between the v2 and v3 assemblies?
Use the Ensembl assembly converter tool
at Gramene.
Where can I find legacy resources from MaizeSequence.Org?
At the Gramene
ftp archive.
How can I identify the Filtered Gene Set (FGS) in RefGen_v3?
In the 5b+ gene build, the former FGS gene models are
indicated as protein-coding.
Where can I download a GFF dump of the FGS for maize genes in v3 (5b+)?
From the Gramene 5b+
ftp folder.