How to Contribute Data to MaizeGDB
MaizeGDB accepts data that meets these criteria:
- Data adheres to the FAIR data principles
- Data and metadata have been published in a publicly available scientific journal
- Primary data is deposited in the appropriate repository (e.g., NCBI)
- Data is licensed as public data (see https://creativecommons.org/licenses)
- Comprehensive metadata (information about your dataset) is provided and meets MaizeGDB standards for the data type
- Additional criteria may be required depending on the data type
MaizeGDB will accept data directly from researchers. Some types of data
must also be deposited at standard repositories, as MaizeGDB is not a
permanent repository for primary data. The MaizeGDB team
can advise on the correct repository and help with the submission, if
needed. More detailed information on criteria by data type is below.
It is particularly important that all sequence data be submitted to NCBI's
GenBank (US), EBI's
ENA (Europe), or the
DDBJ (Japan). These repositories
share all of their data on a daily basis, so data submitted to any of
these will be visible at all three.
BECOME A COMMUNITY CURATOR
Members of the maize community are also encouraged to become community
curators. Community curators can contribute comments to most data types,
including gene loci, gene models, markers, et cetera. To create a community
annotator account, follow the link at the top of the page marked
"login/register" and check the box
labeled "I am interested in being a MaizeGDB curator" when you fill
out the form to "Create an Annotation Account". You will be contacted via
e-mail when your account is activated.
If you already have a MaizeGDB login but are not a community curator,
you can request permission to be a curator through the MaizeGDB website.
How to contribute:
NEW GENES, GENE FUNCTION, AND GENE TO GENE MODEL ASSOCIATIONS
If your research identifies new genes, provides evidence of gene
function, and/or links gene loci to gene models, please submit this
information directly to MaizeGDB rather than relying on MaizeGDB
curators to read your paper and extract it for the database. Direct
submission helps MaizeGDB and the maize community, and it increases the
visibility of your research. Far more maize papers are published each
month than the curators can process. Send your paper to a MaizeGDB
curator to have your gene function information loaded into the
MaizeGDB database.
Notes can be added directly to records at MaizeGDB by researchers.
To add a note, you will need a community curation account. Log in to
the site using the login/register link displayed at the top of any
MaizeGDB page. Once logged in, click "Add free text annotation" in
the annotation section of most data record displays.
GENOME ASSEMBLIES
When your genomic data is hosted at MaizeGDB, it increases the visibility and
usability of your data. Depending on the level of hosting, genomic datasets are
presented as downloads and BLAST targets, and via a variety of visualizations
and browsers. Selected genomic datasets are also included in pan-Zea analyses,
and linked to other data hosted by MaizeGDB.
Preparing a genome assembly for hosting at MaizeGDB is a
lengthy process. Before submitting your assembly to GenBank or another
INSDC database, start by contacting a MaizeGDB sequence curator to
receive identifiers for your assembly and annotations, along with
additional instructions. The MaizeGDB metadata template
contains the information required by GenBank (and by ENA and DDBJ) for
genome assembly submissions, along with additional information
required by MaizeGDB for hosting genomes.
Requirements for hosting a genomic dataset at MaizeGDB at any hosting level:
- Must follow the maize nomenclature rules.
- Chromosome names should be prefaced with "chr", for example, chr1, chr2, et cetera. Unplaced scaffolds should be named individually, not combined into one. That is, there should be no "chr0" or "chrUnplaced".
- Must be deposited in an INSDC database (GenBank, EBI's ENA, or DDBJ).
- Must include a completed metadata spreadsheet.
- If possible, sibling seed and pedigree should be deposited at GRIN.
- A contact person from your team/consortium should be assigned to coordinate with MaizeGDB staff.
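The chromosome naming rule above can be checked on a FASTA file before submission. The Python sketch below is illustrative only and is not a MaizeGDB tool: the function name `check_fasta_names` is hypothetical, and it assumes that a sequence named with a bare number (for example "2") was meant to be a chromosome missing its "chr" prefix. It flags pooled names such as "chr0" or "chrUnplaced" and any duplicated sequence names.

```python
import re
from typing import Iterable, List

# Pooled names that the rule above explicitly disallows (compared case-insensitively).
POOLED_NAMES = {"chr0", "chrunplaced"}

# Hypothetical heuristic: a bare number such as "1" or "10" is assumed to be a
# chromosome that is missing its "chr" prefix.
BARE_NUMBER = re.compile(r"^\d+$")


def check_fasta_names(lines: Iterable[str]) -> List[str]:
    """Return naming problems found in FASTA header lines (illustrative sketch only)."""
    problems: List[str] = []
    seen = set()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence lines are not checked
        # The sequence name is the first whitespace-delimited token after ">".
        fields = line[1:].split()
        name = fields[0] if fields else ""
        if not name:
            problems.append("header with an empty sequence name")
            continue
        if name.lower() in POOLED_NAMES:
            problems.append(f"{name}: name unplaced scaffolds individually instead of pooling them")
        elif BARE_NUMBER.match(name):
            problems.append(f"{name}: preface chromosome names with 'chr' (e.g., chr{name})")
        if name in seen:
            # Each chromosome and unplaced scaffold should have its own distinct name.
            problems.append(f"{name}: duplicate sequence name")
        seen.add(name)
    return problems


if __name__ == "__main__":
    example = [">chr1", "ACGT", ">2", ">chrUnplaced", ">scaffold_17"]
    for problem in check_fasta_names(example):
        print(problem)
```

Because the function accepts any iterable of lines, an open file handle can be passed directly (for example `check_fasta_names(open("assembly.fa"))`, where the file name is a placeholder); trailing newlines are removed by the whitespace split.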
To submit your data to EBI's ENA, see the ENA submission instructions.
Note that ENA submission requires command-line skills, as there is
no longer a web portal for submitting genome assemblies. Additional
tutorials for submitting data to ENA are also available.
To submit your data to GenBank, see NCBI's GenBank submission tutorials.
Note that NCBI screens submissions for vector/primer contamination, so you
may need to notify NCBI in advance about known chloroplast-derived sequence
in the maize nuclear genome.
Level 1:
- Web page describing the dataset (from the metadata spreadsheet).
- Downloads of the assembly FASTA, annotation GFF, annotation CDS, and protein files. Additional download files will be considered if available.
- Public BLAST targets.
- Will be tagged as a pre-release under the Toronto agreement if requested.
Level 2:
To qualify for Level 2 hosting, the assembly must be of reference quality,
the dataset must include gene model annotation and at least two additional
datasets aligned to the assembly, and the data must be published (at least
as a preprint) and therefore no longer under the Toronto agreement.
Level 2 hosting includes everything from Level 1 and:
- A genome browser.
- Inclusion in Zea pan-gene analyses.
- If part of a larger sequencing project, a page describing the project.
Level 3:
To qualify for Level 3 hosting, the dataset must meet all of the requirements
above, and the cultivar or accession must be considered to be of significant
importance by the community.
Level 3 hosting includes everything from Levels 1 and 2, and:
- Browser enhancements, e.g., gene models lifted over from other assemblies.
- Individual gene model pages that link gene models to other data held by MaizeGDB.
- Inclusion in the PanEffect tool and possibly other tools.
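The tiered requirements above can be read as a simple decision procedure. The Python sketch below is one illustrative reading of the Level 1 to 3 criteria, not a MaizeGDB tool: the `GenomeDataset` fields and the `hosting_level` function are hypothetical, and judgments such as "reference quality" or "significant importance by the community" are made by MaizeGDB staff and the community rather than computed.

```python
from dataclasses import dataclass


@dataclass
class GenomeDataset:
    """Hypothetical summary of a genome submission; field names are illustrative only."""
    meets_base_requirements: bool  # nomenclature, "chr" names, INSDC deposit, metadata, contact person
    reference_quality: bool        # assembly judged to be of reference quality
    has_gene_models: bool          # includes gene model annotation
    aligned_datasets: int          # additional datasets aligned to the assembly
    published: bool                # published, at least as a preprint (no longer under Toronto)
    community_significance: bool   # cultivar/accession considered significant by the community


def hosting_level(ds: GenomeDataset) -> int:
    """Return the highest hosting level (0 to 3) the dataset appears to qualify for."""
    if not ds.meets_base_requirements:
        return 0  # not eligible for hosting at any level
    level2 = (
        ds.reference_quality
        and ds.has_gene_models
        and ds.aligned_datasets >= 2
        and ds.published
    )
    if not level2:
        return 1
    # Level 3 requires everything for Level 2 plus community significance.
    return 3 if ds.community_significance else 2
```

Encoding the levels as nested checks mirrors the text: each level includes everything from the levels below it, so a dataset that fails a Level 2 criterion stays at Level 1 even if it would otherwise be considered significant.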
Revisions
When a genome assembly is updated, only the latest version is used in the
Zea pan-gene analysis. The exceptions are the representative maize genome,
B73, and the commonly used Mo17, for which all versions are included.
Downloads, BLAST targets, and (depending on hosting level) browsers are
maintained for earlier versions as well.
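The version rule above can be sketched as a selection step. In this illustrative Python example, the function `pangene_versions` is hypothetical, versions are assumed to be plain integers, and "AccessionX" is a placeholder name; only B73 and Mo17 keep every version, while every other accession contributes its latest version.

```python
from typing import Dict, List

# Per the policy above, all versions of these genomes are kept in pan-gene analyses.
ALL_VERSIONS_KEPT = {"B73", "Mo17"}


def pangene_versions(assemblies: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Select assembly versions for a Zea pan-gene analysis (illustrative sketch).

    `assemblies` maps an accession name to its integer version numbers (an assumption).
    """
    selected: Dict[str, List[int]] = {}
    for accession, versions in assemblies.items():
        if not versions:
            continue  # nothing to include for this accession
        if accession in ALL_VERSIONS_KEPT:
            selected[accession] = sorted(versions)
        else:
            selected[accession] = [max(versions)]  # latest version only
    return selected


if __name__ == "__main__":
    print(pangene_versions({"B73": [5, 3, 4], "Mo17": [1, 2], "AccessionX": [1, 2]}))
```

Older versions that are dropped here are still kept as downloads, BLAST targets, and (depending on hosting level) browsers; the sketch only models which versions enter the pan-gene analysis.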
New assembly by a different group
If a cultivar or accession is reassembled by another group and one assembly is
significantly improved over the other, only the improved dataset is used
in Zea pan-gene analyses. Downloads, BLAST targets, and (depending on hosting
level) browsers are maintained for both versions.
ANNOTATIONS
Genome annotations (gene models) can be hosted at MaizeGDB in the
form of browser tracks and/or downloads of GFF or FASTA files.
It is best to have both transcript and protein FASTA. Additional
data is also accepted, for example, SNP alignments, orthologs, et
cetera. Contact a MaizeGDB sequence curator
for more information.
See the
nomenclature guidelines
for naming your gene models.
OTHER NUCLEOTIDE SEQUENCES, INCLUDING INDIVIDUAL GENES
Non-genome nucleotide sequences should also be submitted to
GenBank, EBI's ENA, or DDBJ.
GenBank (US), EBI (Europe), and DDBJ (Japan) share sequence data on a
daily basis, so data can be submitted to any of the three.
Data sets that can be aligned to one or more reference genomes hosted
by MaizeGDB can also be added to genome browsers.
Contact a MaizeGDB sequence curator for more information.
PROTEIN SEQUENCES
NEXT GENERATION SEQUENCE READS
MAPPED SEQUENCE READS AND OTHER EXPRESSION DATA
MAIZE SNPS
NCBI's dbSNP no longer accepts non-human SNPs. Maize SNPs should
be submitted to EBI's EVA (European Variation Archive). It is important
to submit SNPs to EVA because EVA provides permanent identifiers for
each SNP and collapses identical SNPs into one record, maintaining both
the original submission identifier and a consensus identifier.
MaizeGDB will also host tracks of aligned SNPs on the genome browsers.
Note that MaizeGDB does not have sufficient personnel to perform the
alignments. Contact a MaizeGDB team member for more information.
GENOTYPE AND PHENOTYPE DATA
MaizeGDB has a long history of curating genotype and phenotype data
for individual genes or small sets of genes. Detailed lab-driven
data about any gene is important to the maize community, as this is
the best functional data available. We want to curate as much of
this type of data as possible, including detailed descriptions of
mutant phenotypes and of changes in the expression of mutant
phenotypes across genotypes. This type of data can be submitted by
email to the MaizeGDB curators.
MAPS
The MaizeGDB team welcomes genetic maps. Please contact a
MaizeGDB curator for more information.
METABOLOMICS, IONOMICS AND OTHER DATA TYPES
Contact a MaizeGDB team member if you have a
dataset you would like hosted at MaizeGDB which is not listed here.
We will work with you to see if and how your data can be hosted. To
learn more about data repositories for these types of data, please
see the FAIR data page.
FAQs
WHERE DOES THE DATA STORED AT MAIZEGDB COME FROM?
- The original data was inherited from the MaizeDB and ZmDB projects.
- Sequence data comes from GenBank, genome assembly and annotation groups, and other research groups that are producing genomic, transcriptomic, and proteomic sequence data for maize.
- Other types of bulk data are contributed by community members, usually in standard file formats such as GFF, VCF, and BED, and are added to the database by members of the MaizeGDB team.
- Public data from published literature are hand-curated and entered record-by-record by MaizeGDB and community curators.
HOW DO I BECOME A COMMUNITY CURATOR?
To create a community annotator account, follow the
link at the top of the page marked
"login/register" and check the box labeled "I am interested
in being a MaizeGDB curator" when you fill out the form to "Create an
Annotation Account". You will be contacted via e-mail when your
account is activated.
HOW CAN COMMUNITY MEMBERS CONTRIBUTE DATA?
MaizeGDB staff members regularly attend the Plant and Animal Genome
Conference in San Diego, California and the Annual Maize Genetics
Conference. To schedule a meeting at either of these conferences,
use the feedback form at the top of this page to contact the
MaizeGDB team, or you can contact a specific
MaizeGDB member.
Contact MaizeGDB directly with a request to host your data, using the
feedback button at the top of the page or contact
a specific MaizeGDB member.
Notes can be added directly to records at MaizeGDB by researchers.
To add a note, you will need a
community curation account. Log in
to the site using the login/register link displayed at the top of
any MaizeGDB page. Once logged in, click "Add free text annotation"
in the annotation section of most data record displays.
Although researchers are encouraged to use a standard long-term
repository that is appropriate for the type of data, large datasets
can be made available through MaizeGDB by special arrangement. Use
the feedback form at the top of this page to contact the MaizeGDB
team or contact a specific MaizeGDB member to
find out what arrangements can be made to accommodate your data.
If possible, it is best to contact the MaizeGDB team before you begin to
generate large datasets so that a standardized format can be agreed
upon and so that a customized pipeline can be created for importing
your data in a timely and efficient manner.
WHEN CAN I EXPECT DATA I GENERATED TO APPEAR AT MAIZEGDB?
The MaizeGDB database is typically updated the first Tuesday of each
month.
If you have not made specific arrangements with us to accommodate
your data, you should expect it to appear at MaizeGDB only if our
curators have curated your paper, usually because it was recommended
by the Editorial Board. Use the feedback form at the top of this page
to contact the MaizeGDB team to find out what arrangements can be
made to make your data available through MaizeGDB.
THE AGENCIES THAT FUND OUR RESEARCH HAVE ENCOURAGED ME TO CONTRIBUTE DATA TO
MAIZEGDB. WHAT CAN I DO TO ENSURE THAT MAIZEGDB WILL TAKE MY DATA?
If you wish to contribute a large dataset, you should contact the
MaizeGDB team to make special arrangements for
its inclusion at MaizeGDB.
Note that contacting MaizeGDB personnel well in advance and committing funds
from your grants to cover the cost of personnel to curate your data into
MaizeGDB are the best ways of ensuring that MaizeGDB can accommodate
your requests for data storage.
In general, if the data you are generating have historically been stored at
MaizeGDB (e.g., your project is planning to generate genetic maps using a
new set of probes), it is very easy for us to commit to including your
data in the database. However, if you are proposing to create data of a
type that is not currently stored at MaizeGDB, more work would be
required of the staff at MaizeGDB (e.g., it may be necessary to make
new tables in which to store your data and new data displays would be
needed for the website).
Unless you have contacted the MaizeGDB team, please do not assume that we can
accommodate your data. We are happy to make special arrangements to create new
tables and data displays (e.g., we collaborate with the FSU
Cytogenetic Map of Maize Project to
make their cytological images and data available).
In summary, we encourage you to contact us before telling the funding
agencies that MaizeGDB will take any and all maize data your project plans
to generate.
WHAT SHOULD I DO IF I AM PLANNING A GENOME ASSEMBLY PROJECT OR
PLANNING TO ANNOTATE THE MAIZE GENOME?