SRA Metadata Submission Guide

The document provides instructions for filling out a spreadsheet for SRA metadata submission, emphasizing the use of comments, hyperlinks, and drop-down menus. It details the requirements for saving the spreadsheet as a TSV file, including unique identifiers for libraries and file names. Additionally, it outlines various sequencing strategies, sources, selections, and platforms relevant to the submission process.

Uploaded by

wawmart

Instructions:

Each column that has a red triangle in the upper-right corner has a comment that can be displayed if you hover over the header.
Some column headers have hyperlinks to NCBI webpages.
The YELLOW columns have drop-down menus that allow you to select from a controlled vocabulary. Once specified for one row, these values can be copied and pasted down.

You must save the spreadsheet under the second tab (SRA_data) as a TSV (tab-delimited file) to upload the TSV file for the SRA metadata tab.
If you created samples previously, provide accessions in the form of SAMN# in the column sample_accession. Otherwise provide the sample name used in the BioSample attributes spreadsheet.
Each row in the template represents a sequencing library with a unique combination of sample + library + sequencing strategy + layout + instrument model. Each row should have a unique library_ID that is short and meaningful (like an ID you might use in lab).
When libraries are indeed identical (same combination of sample + library + strategy + layout + instrument model), all files should be placed in the same row. To do this, simply enter the file names consecutively in the same row by adding more columns with headers filename2, filename3, etc. PAIRED files must always be listed in the same row.
Provide exact file names (including extensions) in the filename columns.
File names must be unique.

Legend: required for ALL data types; required for aligned data; paired-end data only.
Red triangles indicate pop-up comments for that field.

Many of the columns also have data checks. If you receive a warning, please verify that you have entered a value from the drop-down menu. Example below.
[Figure: Example Drop Down Menu]

NOTE: There are data checks and autocomplete features in this spreadsheet that are not compatible with Libre- and Open-Office. If you use one of these suites, please manually consult the platform and instrument information on the last page.

Helpful Hyperlinks:
SRA submission overview: http://www.ncbi.nlm.nih.gov/sra/docs/submit/
SRA submission in Submission Portal: http://www.ncbi.nlm.nih.gov/sra/docs/submitportal/
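The row rules in the instructions (one library_ID per row, file names never repeated, output saved as tab-delimited text) can be checked before upload. The following is a minimal sketch, not an NCBI tool: the function names `check_sra_rows` and `rows_to_tsv` are hypothetical, and it assumes each row has already been read into a dictionary keyed by column header.

```python
import csv
import io
from collections import Counter


def check_sra_rows(rows):
    """Return a list of problems found in a list of row dictionaries."""
    problems = []

    # Each row should carry its own library_ID.
    id_counts = Counter((row.get("library_ID") or "").strip() for row in rows)
    for library_id, count in sorted(id_counts.items()):
        if not library_id:
            problems.append("row with empty library_ID")
        elif count > 1:
            problems.append(f"library_ID used {count} times: {library_id}")

    # File names must be unique across every filename, filename2, ... column.
    file_counts = Counter(
        value.strip()
        for row in rows
        for column, value in row.items()
        if column.startswith("filename") and value and value.strip()
    )
    for name, count in sorted(file_counts.items()):
        if count > 1:
            problems.append(f"file name used {count} times: {name}")
    return problems


def rows_to_tsv(rows, columns):
    """Serialize rows as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, delimiter="\t", lineterminator="\n", restval=""
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative rows only; the IDs and file names are made up.
bad_rows = [
    {"library_ID": "lib1", "filename": "a_R1.fastq.gz", "filename2": "a_R2.fastq.gz"},
    {"library_ID": "lib1", "filename": "a_R1.fastq.gz", "filename2": ""},
]
good_rows = [
    {"library_ID": "lib1", "filename": "a_R1.fastq.gz", "filename2": "a_R2.fastq.gz"},
    {"library_ID": "lib2", "filename": "b.fastq.gz"},
]
print(check_sra_rows(bad_rows))
print(rows_to_tsv(good_rows, ["library_ID", "filename", "filename2"]))
```

The check only covers the two uniqueness rules; it does not confirm that the listed files exist or that paired files share a row.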


sample_name library_ID title library_strategy library_source library_selection library_layout platform instrument_model design_description filetype filename filename2 filename3 filename4 assembly fasta_file
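Because the instructions allow extra file columns (filename2, filename3, and so on), the header row can be generated for however many files a row needs. This is a rough sketch under assumptions: the helper name `sra_header` is hypothetical, and it treats `assembly` and `fasta_file` as two separate trailing columns, which is an interpretation of the header row shown here.

```python
# Columns that precede the file name columns in the SRA_data header row.
BASE_COLUMNS = [
    "sample_name", "library_ID", "title", "library_strategy",
    "library_source", "library_selection", "library_layout", "platform",
    "instrument_model", "design_description", "filetype",
]
# Assumption: "assembly" and "fasta_file" are separate columns for aligned data.
TRAILING_COLUMNS = ["assembly", "fasta_file"]


def sra_header(max_files):
    """Build the header list with filename, filename2, ... up to max_files."""
    if max_files < 1:
        raise ValueError("each row needs at least one file name column")
    file_columns = ["filename"] + [f"filename{i}" for i in range(2, max_files + 1)]
    return BASE_COLUMNS + file_columns + TRAILING_COLUMNS


print("\t".join(sra_header(4)))
```

With `max_files=4` the result matches the seventeen columns in the header row of this template.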
Strategy
WGA: Random sequencing of the whole genome following non-PCR amplification
WGS: Random sequencing of the whole genome
WXS: Random sequencing of exonic regions selected from the genome
RNA-Seq: Random sequencing of whole transcriptome
miRNA-Seq: Random sequencing of small miRNAs
WCS: Random sequencing of a whole chromosome or other replicon isolated from a genome
CLONE: Genomic clone based (hierarchical) sequencing
POOLCLONE: Shotgun of pooled clones (usually BACs and Fosmids)
AMPLICON: Sequencing of overlapping or distinct PCR or RT-PCR products
CLONEEND: Clone end (5', 3', or both) sequencing
FINISHING: Sequencing intended to finish (close) gaps in existing coverage
ChIP-Seq: Direct sequencing of chromatin immunoprecipitates
MNase-Seq: Direct sequencing following MNase digestion
DNase-Hypersensitivity: Sequencing of hypersensitive sites, or segments of open chromatin that are more readily cleaved by DNaseI
Bisulfite-Seq: Sequencing following treatment of DNA with bisulfite to convert cytosine residues to uracil depending on methylation status
Tn-Seq: Sequencing from transposon insertion sites
EST: Single pass sequencing of cDNA templates
FL-cDNA: Full-length sequencing of cDNA templates
CTS: Concatenated Tag Sequencing
MRE-Seq: Methylation-Sensitive Restriction Enzyme Sequencing strategy
MeDIP-Seq: Methylated DNA Immunoprecipitation Sequencing strategy
MBD-Seq: Direct sequencing of methylated fractions sequencing strategy
Synthetic-Long-Read
ATAC-seq: Assay for Transposase-Accessible Chromatin (ATAC) strategy is used to study genome-wide chromatin accessibility. Alternative method to DNase-seq that uses an engineered Tn5 transposase to cleave DNA and to integrate primer DNA sequences into the cleaved genomic DNA
ChIA-PET: Direct sequencing of proximity-ligated chromatin immunoprecipitates.
FAIRE-seq: Formaldehyde Assisted Isolation of Regulatory Elements. Reveals regions of open chromatin
Hi-C: Chromosome Conformation Capture technique where a biotin-labeled nucleotide is incorporated at the ligation junction, enabling selective purification of chimeric DNA ligation junctions followed by deep sequencing
ncRNA-Seq: Capture of other non-coding RNA types, including post-translation modification types such as snRNA (small nuclear RNA) or snoRNA (small nucleolar RNA), or expression regulation types such as siRNA (small interfering RNA) or piRNA/piwi/RNA (piwi-interacting RNA).
RAD-Seq
RIP-Seq: Direct sequencing of RNA immunoprecipitates (includes CLIP-Seq, HITS-CLIP and PAR-CLIP).
SELEX: Systematic Evolution of Ligands by EXponential enrichment
ssRNA-seq: strand-specific RNA sequencing
Targeted-Capture
Tethered Chromatin Conformation Capture
DIP-Seq: DNA immunoprecipitation sequencing (DIP-Seq)
GBS
Inverse rRNA: depletion of ribosomal RNA by oligo hybridization
NOMe-Seq
Ribo-seq
VALIDATION: CGHub special request: Independent experiment to re-evaluate putative variants
OTHER: Library strategy not listed (please include additional info in the “design description”)

Source
GENOMIC: Genomic DNA (includes PCR products from genomic DNA)
TRANSCRIPTOMIC: Transcription products or non genomic DNA (EST, cDNA, RT-PCR, screened libraries)
METAGENOMIC: Mixed material from metagenome
METATRANSCRIPTOMIC: Transcription products from community targets
SYNTHETIC: Synthetic DNA
VIRAL RNA: Viral RNA
GENOMIC SINGLE CELL
TRANSCRIPTOMIC SINGLE CELL
OTHER: Other, unspecified, or unknown library source material (please include additional info in the “design description”)

Selection
RANDOM: Random selection by shearing or other method
PCR: Source material was selected by designed primers
RANDOM PCR: Source material was selected by randomly generated primers
RT-PCR: Source material was selected by reverse transcription PCR
HMPR: Hypo-methylated partial restriction digest
MF: Methyl Filtrated
CF-S: Cot-filtered single/low-copy genomic DNA
CF-M: Cot-filtered moderately repetitive genomic DNA
CF-H: Cot-filtered highly repetitive genomic DNA
CF-T: Cot-filtered theoretical single-copy genomic DNA
MDA: Multiple displacement amplification
MSLL: Methylation Spanning Linking Library
cDNA: complementary DNA
ChIP: Chromatin immunoprecipitation
MNase: Micrococcal Nuclease (MNase) digestion
DNAse: Deoxyribonuclease (DNase) digestion
Hybrid Selection: Selection by hybridization in array or solution
Reduced Representation: ... fragment size selection, containing a manageable number of loci to facilitate re-sampling
Restriction Digest: DNA fractionation using restriction enzymes
5-methylcytidine antibody: Selection of methylated DNA fragments using an antibody raised against 5-methylcytosine or 5-methylcytidine (m5C)
MBD2 protein methyl-CpG binding domain: Enrichment by methyl-CpG binding domain
CAGE: Cap-analysis gene expression
RACE: Rapid Amplification of cDNA Ends
size fractionation: Physical selection of size appropriate targets
Padlock probes capture method: Circularized oligonucleotide probes
other: Other library enrichment, screening, or selection process (please include additional info in the “design description”)
unspecified: Library enrichment, screening, or selection is not specified (please include additional info in the “design description”)
cDNA_oligo_dT
cDNA_randomPriming
Inverse rRNA: depletion of ribosomal RNA by oligo hybridization.
Oligo-dT: enrichment of messenger RNA (mRNA) by hybridization to Oligo-dT.
PolyA: PolyA selection or enrichment for messenger RNA (mRNA); should replace cDNA enumeration.
repeat fractionation: Selection for less repetitive (and more gene rich) sequence through Cot filtration (CF) or other fractionation techniques based on DNA kinetics.

Platforms
Each platform is followed by its instrument models.
_LS454: 454 GS, 454 GS 20, 454 GS FLX, 454 GS FLX+, 454 GS FLX Titanium, 454 GS Junior
ABI_SOLID: AB 5500 Genetic Analyzer, AB 5500xl Genetic Analyzer, AB 5500x-Wl Genetic Analyzer, AB SOLiD 3 Plus System, AB SOLiD 4 System, AB SOLiD 4hq System, AB SOLiD PI System, AB SOLiD System, AB SOLiD System 2.0, AB SOLiD System 3.0
BGISEQ: BGISEQ-50, BGISEQ-500, MGISEQ-2000RS
CAPILLARY: AB 310 Genetic Analyzer, AB 3130 Genetic Analyzer, AB 3130xL Genetic Analyzer, AB 3500 Genetic Analyzer, AB 3500xL Genetic Analyzer, AB 3730 Genetic Analyzer, AB 3730xL Genetic Analyzer
COMPLETE_GENOMICS: Complete Genomics
DNBSEQ: DNBSEQ-G400, DNBSEQ-G50, DNBSEQ-T7, DNBSEQ-G400 FAST
ELEMENT: Element AVITI
GENAPSYS: GS111
GENEMIND: FASTASeq 300, GenoCare 1600, GenoLab M
HELICOS: Helicos HeliScope
ILLUMINA: HiSeq X Five, HiSeq X Ten, Illumina Genome Analyzer, Illumina Genome Analyzer II, Illumina Genome Analyzer IIx, Illumina HiScanSQ, Illumina HiSeq 1000, Illumina HiSeq 1500, Illumina HiSeq 2000, Illumina HiSeq 2500, Illumina HiSeq 3000, Illumina HiSeq 4000, Illumina HiSeq X, Illumina MiSeq, Illumina MiniSeq, Illumina NovaSeq 6000, Illumina NovaSeq X, Illumina NovaSeq X Plus, Illumina iSeq 100, NextSeq 1000, NextSeq 2000, NextSeq 500, NextSeq 550
ION_TORRENT: Ion Torrent PGM, Ion Torrent Proton, Ion Torrent S5 XL, Ion Torrent S5, Ion Torrent Genexus, Ion GeneStudio S5, Ion GeneStudio S5 Plus, Ion GeneStudio S5 Prime
OXFORD_NANOPORE: GridION, MinION, PromethION
PACBIO_SMRT: PacBio RS, PacBio RS II, Revio, Sequel, Sequel II, Sequel IIe, Onso
TAPESTRI: Tapestri
ULTIMA: UG 100
VELA_DIAGNOSTICS: Sentosa SQ301
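For users of Libre- or Open-Office, who are told to consult the platform and instrument information manually, a small lookup can stand in for the spreadsheet's data checks. The sketch below is illustrative only: `INSTRUMENTS_BY_PLATFORM` is a partial excerpt of the lists in this document, and `check_instrument` is a hypothetical helper, not part of any NCBI software.

```python
# Partial excerpt of the platform and instrument lists in this document.
# Extend it with the remaining platforms before relying on it.
INSTRUMENTS_BY_PLATFORM = {
    "OXFORD_NANOPORE": {"GridION", "MinION", "PromethION"},
    "PACBIO_SMRT": {
        "PacBio RS", "PacBio RS II", "Revio", "Sequel",
        "Sequel II", "Sequel IIe", "Onso",
    },
    "ELEMENT": {"Element AVITI"},
    "HELICOS": {"Helicos HeliScope"},
}


def check_instrument(platform, instrument_model):
    """Return None if the pair is listed, otherwise a short problem message."""
    allowed = INSTRUMENTS_BY_PLATFORM.get(platform)
    if allowed is None:
        return f"platform not in lookup: {platform}"
    if instrument_model not in allowed:
        return f"{instrument_model!r} is not listed under {platform}"
    return None


print(check_instrument("OXFORD_NANOPORE", "MinION"))
print(check_instrument("PACBIO_SMRT", "MinION"))
```

The comparison is exact and case-sensitive, mirroring a drop-down menu where only the listed spellings are accepted.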
