GFF source methods
GFF source and feature
GFF2 description at the Sanger Institute
In the WormBase GFF files, genes are represented in several ways, each specified by a different source and feature (the second and third columns).
Gene spans
This is the largest extent of a gene's transcripts, from the beginning of the 5' UTR of the most 5' transcript to the end of the 3' UTR of the most 3' transcript. Each gene is represented as a single line.
- source = gene; feature = gene.
CHROMOSOME_III gene gene 9488630 9489091 . + . Gene "WBGene00007185" ; Position "0.567795" ; Locus "nlp-36"
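As a minimal sketch of how the source and feature columns can be read, the following Python splits a GFF2 line into its nine columns. It assumes the columns are separated by tabs or spaces, as displayed on this page; `GffRecord` and `parse_gff_line` are illustrative names, not part of any WormBase tool.

```python
from typing import NamedTuple


class GffRecord(NamedTuple):
    seqname: str
    source: str      # second column
    feature: str     # third column
    start: int
    end: int
    score: str
    strand: str
    frame: str
    attributes: str  # ninth column, left unparsed here


def parse_gff_line(line: str) -> GffRecord:
    """Split one GFF2 line into its nine columns (hypothetical helper)."""
    # Split on whitespace at most 8 times so that the attribute
    # column, which itself contains spaces, stays in one piece.
    fields = line.strip().split(None, 8)
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(fields)}")
    attributes = fields[8] if len(fields) == 9 else ""
    return GffRecord(
        fields[0], fields[1], fields[2],
        int(fields[3]), int(fields[4]),
        fields[5], fields[6], fields[7], attributes,
    )


line = ('CHROMOSOME_III gene gene 9488630 9489091 . + . '
        'Gene "WBGene00007185" ; Position "0.567795" ; Locus "nlp-36"')
record = parse_gff_line(line)
print(record.source, record.feature, record.start, record.end)
```

Selecting one representation of a gene then amounts to filtering on the `(source, feature)` pair, for example `("gene", "gene")` for gene spans.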
CDS
A CDS is the coding sequence of a gene from the start codon to the stop codon (so it does not include UTRs). A gene may have one or more CDSs. Each CDS is represented as a single line describing the start and end coordinates.
- source = curated; feature = CDS.
CHROMOSOME_III curated CDS 9488634 9488986 . + . CDS "B0464.3" ; WormPep "CE:CE00017" ; Locus "nlp-36" ; Status "Confirmed" ; Gene "WBGene00007185" ;
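The last column carries tag/value pairs separated by semicolons, and a tag such as Confirmed_EST can appear more than once on an intron line. A minimal sketch of reading that column follows; `parse_attributes` is a hypothetical helper, and it assumes that no quoted value contains a semicolon.

```python
from collections import defaultdict


def parse_attributes(column9: str) -> dict:
    """Parse a GFF2 attribute column into {tag: [value, ...]}."""
    attributes = defaultdict(list)
    for part in column9.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing ';'
        # The tag is the first word; the rest is the value,
        # which may or may not be wrapped in double quotes.
        tag, _, value = part.partition(" ")
        attributes[tag].append(value.strip().strip('"'))
    return dict(attributes)


cds_attrs = parse_attributes(
    'CDS "B0464.3" ; WormPep "CE:CE00017" ; Locus "nlp-36" ; '
    'Status "Confirmed" ; Gene "WBGene00007185" ;'
)
print(cds_attrs["Gene"], cds_attrs["Status"])
```

Values are kept in lists so that repeated tags are not overwritten.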
The individual exons and introns are described with a single line per exon or intron.
So, for the example three-exon gene:
- source = curated; feature = exon;
CHROMOSOME_III curated exon 9488634 9488695 . + . CDS "B0464.3"
CHROMOSOME_III curated exon 9488749 9488839 . + . CDS "B0464.3"
CHROMOSOME_III curated exon 9488891 9488986 . + . CDS "B0464.3"
- source = curated; feature = intron;
CHROMOSOME_III curated intron 9488696 9488748 . + . CDS "B0464.3" ; Confirmed_EST FM248941 ; Confirmed_EST OSTF051A3_1
CHROMOSOME_III curated intron 9488840 9488890 . + . CDS "B0464.3" ; Confirmed_EST yk1241c10.5 ; Confirmed_EST OSTF051A3_1
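In this example the introns exactly fill the gaps between consecutive exons. A minimal sketch of that relationship, assuming 1-based inclusive coordinates as in the lines above (`introns_from_exons` is an illustrative name, not a WormBase function):

```python
def introns_from_exons(exons):
    """Return the (start, end) gaps between consecutive exons."""
    ordered = sorted(exons)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


# Exon coordinates of CDS B0464.3, taken from the example lines.
exons = [(9488634, 9488695), (9488749, 9488839), (9488891, 9488986)]
introns = introns_from_exons(exons)
print(introns)
# [(9488696, 9488748), (9488840, 9488890)]
```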
Exons are also represented with their coding phase (eighth column).
- source = curated; feature = coding_exon; (note: the actual coordinates are the same as for the exon lines)
CHROMOSOME_III curated coding_exon 9488634 9488695 . + 0 CDS "B0464.3"
CHROMOSOME_III curated coding_exon 9488749 9488839 . + 1 CDS "B0464.3"
CHROMOSOME_III curated coding_exon 9488891 9488986 . + 0 CDS "B0464.3"
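The phase values 0, 1, 0 in the example can be reproduced from the exon lengths. The sketch below assumes the phase is the number of bases at the start of a coding exon that precede its first complete codon; this interpretation is consistent with the example lines but is an assumption here, and `coding_phases` is a hypothetical helper.

```python
def coding_phases(exons, strand="+"):
    """Return the phase of each coding exon in transcription order."""
    # On the minus strand, transcription runs from high to low coordinates.
    ordered = sorted(exons, reverse=(strand == "-"))
    phases = []
    coding_bases = 0  # coding bases seen in earlier exons
    for start, end in ordered:
        # Bases needed to complete the codon left open by earlier exons.
        phases.append((3 - coding_bases % 3) % 3)
        coding_bases += end - start + 1
    return phases


exons = [(9488634, 9488695), (9488749, 9488839), (9488891, 9488986)]
phases = coding_phases(exons)
print(phases)
# [0, 1, 0]
```

The first exon is 62 bases long, which leaves two bases of an unfinished codon, so the second exon starts with one base that completes it (phase 1). The first two exons together hold 153 bases, a multiple of three, so the third exon starts on a codon boundary (phase 0).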
Coding_transcript
Each CDS can have one or more Coding_transcripts. Where a CDS has multiple transcripts, they will vary only in the UTRs. Coding_transcripts are the best equivalent to a full-length mRNA that we can build based on available evidence. They go from the transcription start site (e.g. SL1) to the polyA site. The full extent of a Coding_transcript is defined as a single line, and this CDS has two Coding_transcripts:
- source = Coding_transcript; feature = protein_coding_primary_transcript
CHROMOSOME_III Coding_transcript protein_coding_primary_transcript 9488630 9489087 . + . Transcript "B0464.3.1"
CHROMOSOME_III Coding_transcript protein_coding_primary_transcript 9488632 9489091 . + . Transcript "B0464.3.2"
Compare these coordinates with the full gene span (above), which is larger than either transcript: it extends from 9488630 to 9489091, the outer extremities of the two Coding_transcripts.
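The comparison can be sketched as taking the smallest start and the largest end over all transcripts of the gene. This is only an illustration using the two transcripts above; `gene_span` is a hypothetical name.

```python
def gene_span(transcripts):
    """Return (min start, max end) over a gene's transcripts."""
    starts = [start for start, _ in transcripts.values()]
    ends = [end for _, end in transcripts.values()]
    return min(starts), max(ends)


# Transcript extents taken from the example lines.
transcripts = {
    "B0464.3.1": (9488630, 9489087),
    "B0464.3.2": (9488632, 9489091),
}
span = gene_span(transcripts)
print(span)
# (9488630, 9489091)
```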
Each Coding_transcript is composed of the following feature types: exons (with their coding_exon equivalents), introns, five_prime_UTR and three_prime_UTR.
Exons - source = Coding_transcript; feature = exon
CHROMOSOME_III Coding_transcript exon 9488630 9488695 . + . Transcript "B0464.3.1"
CHROMOSOME_III Coding_transcript exon 9488749 9488839 . + . Transcript "B0464.3.1"
CHROMOSOME_III Coding_transcript exon 9488891 9489087 . + . Transcript "B0464.3.1"
and, as with the CDS, each exon has a "coding_exon" equivalent
Exons - source = Coding_transcript; feature = coding_exon
CHROMOSOME_III Coding_transcript coding_exon 9488634 9488695 . + 0 Transcript "B0464.3.1" ; CDS "B0464.3"
CHROMOSOME_III Coding_transcript coding_exon 9488749 9488839 . + 1 Transcript "B0464.3.1" ; CDS "B0464.3"
CHROMOSOME_III Coding_transcript coding_exon 9488891 9488986 . + 0 Transcript "B0464.3.1" ; CDS "B0464.3"
Introns - source = Coding_transcript; feature = intron
CHROMOSOME_III Coding_transcript intron 9488696 9488748 . + . Transcript "B0464.3.1" ; Confirmed_EST FM248941 ; Confirmed_EST OSTF051A3_1
CHROMOSOME_III Coding_transcript intron 9488840 9488890 . + . Transcript "B0464.3.1" ; Confirmed_EST yk1241c10.5 ; Confirmed_EST OSTF051A3_1
UTRs
- source = Coding_transcript; feature = five_prime_UTR
- source = Coding_transcript; feature = three_prime_UTR
Each non-coding exon, or non-coding part of an exon, is represented as a single line:
CHROMOSOME_III Coding_transcript five_prime_UTR 9488630 9488633 . + . Transcript "B0464.3.1"
CHROMOSOME_III Coding_transcript three_prime_UTR 9488987 9489087 . + . Transcript "B0464.3.1"
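The UTR lines correspond to the parts of the transcript's exons that lie outside the CDS. A minimal sketch of that relationship follows, assuming 1-based inclusive coordinates; `utr_segments` is an illustrative helper and not necessarily how the WormBase files are built.

```python
def utr_segments(exons, cds_start, cds_end, strand="+"):
    """Split off the exon parts lying outside the CDS extent."""
    upstream, downstream = [], []
    for start, end in sorted(exons):
        if start < cds_start:
            # Part of the exon before the CDS in genomic order.
            upstream.append((start, min(end, cds_start - 1)))
        if end > cds_end:
            # Part of the exon after the CDS in genomic order.
            downstream.append((max(start, cds_end + 1), end))
    if strand == "-":
        # On the minus strand the 5' end has the higher coordinates.
        upstream, downstream = downstream, upstream
    return {"five_prime_UTR": upstream, "three_prime_UTR": downstream}


# Exons of transcript B0464.3.1 and the extent of CDS B0464.3.
exons = [(9488630, 9488695), (9488749, 9488839), (9488891, 9489087)]
utrs = utr_segments(exons, cds_start=9488634, cds_end=9488986)
print(utrs)
# {'five_prime_UTR': [(9488630, 9488633)], 'three_prime_UTR': [(9488987, 9489087)]}
```

An exon that lies wholly outside the CDS is returned unchanged, which matches the case of a fully non-coding exon.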