GENES Database
KEGG GENES is a collection of genes and proteins in complete genomes of cellular organisms and viruses generated from publicly available resources, mostly from NCBI RefSeq and GenBank, and annotated by KEGG in the form of KO (KEGG Orthology) assignment. The collection is supplemented with a KEGG original collection of functionally characterized proteins from published literature.
Protein sequences and RNA sequences of all GENES entries are subject to SSDB computation and KO assignment by KOALA tools (see annotation statistics).
The Addendum category is a PubMed-based collection of protein sequences whose functions are experimentally characterized. They are used to define new KOs that are not covered by complete genomes (see KO database).
The viral peptide (vp) category is a collection of mature peptides processed from genome-encoded polyproteins, which are not usually found as separate entries in the public databases such as NCBI and UniProt. Viral mature peptides appear in KEGG pathway maps and as drug targets and are given KOs.
Category | Content | Data source | Organism code | Gene identifier | Genome identifier |
KEGG organisms (Complete genomes) | Genes and proteins in cellular organisms | RefSeq or GenBank | <org> | GeneID or Locus_tag | T number |
KEGG Viruses | Genes and proteins in viruses | RefSeq | vg | GeneID | Taxonomy ID |
KEGG Viruses | Mature peptides in viruses | RefSeq | vp | GeneID-no | Taxonomy ID |
Addendum | Functionally characterized proteins | Publication | ag | ProteinID, etc. | N/A |
<org>: three- or four-letter organism code for cellular organisms
Each GENES entry is identified by the combination of organism code and gene identifier in the form of
- org:gene
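As an illustration of the org:gene form, the following minimal Python sketch splits an identifier at the first colon and labels its category using the special organism codes vg, vp and ag from the category table. The names `parse_genes_id`, `GenesId` and `SPECIAL_CODES` are hypothetical helpers introduced here, not part of any KEGG tool, and the sketch assumes that any other organism code belongs to a cellular organism.

```python
from typing import NamedTuple

# Special organism codes listed in the category table; any other code is
# assumed here to be a three- or four-letter code for a cellular organism.
SPECIAL_CODES = {
    "vg": "KEGG Viruses (genes and proteins)",
    "vp": "KEGG Viruses (mature peptides)",
    "ag": "Addendum",
}


class GenesId(NamedTuple):
    org: str       # organism code, e.g. "syn"
    gene: str      # gene identifier, e.g. "ssr3451"
    category: str  # category label derived from the organism code


def parse_genes_id(kegg_id: str) -> GenesId:
    """Split an 'org:gene' string at the first colon and label its category."""
    org, sep, gene = kegg_id.strip().partition(":")
    if not sep or not org or not gene:
        raise ValueError(f"expected 'org:gene', got {kegg_id!r}")
    category = SPECIAL_CODES.get(org, "KEGG organisms (complete genomes)")
    return GenesId(org, gene, category)
```

For example, `parse_genes_id("syn:ssr3451")` yields the organism code `syn` and the gene identifier `ssr3451`.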
SSDB Database
All genome pairs in KEGG are subject to SSDB (Sequence Similarity DataBase) computation using the SSEARCH program, for both amino acid sequences (protein-coding genes) and nucleotide sequences (RNA genes). For each gene, an organism-based list of similarity neighbors is generated and displayed in tabular form, called the GFIT table. It shows the best-hit sequence in each matching organism and can be viewed from a button on the GENES entry page. The collection of GFIT tables is the basis for both manual and automatic annotation of the GENES database.
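The idea of an organism-based list that keeps the best hit in each matching organism can be sketched in a few lines of Python. This is only an illustration of the grouping step, not the SSDB or GFIT implementation: the function name `best_hit_per_organism`, the `(target, score)` input layout, and the rule that a higher score is better are all assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple

# A hit is (target identifier in org:gene form, similarity score).
# Assumption for this sketch: a higher score means a better hit.
Hit = Tuple[str, float]


def best_hit_per_organism(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep only the highest-scoring hit for each organism code."""
    best: Dict[str, Hit] = {}
    for target, score in hits:
        org = target.split(":", 1)[0]  # organism code precedes the colon
        # On ties, the hit seen first is kept.
        if org not in best or score > best[org][1]:
            best[org] = (target, score)
    return best
```

Given placeholder hits such as `[("aaa:g1", 120.0), ("aaa:g2", 310.5), ("bbb:g9", 95.0)]`, the function keeps `aaa:g2` for organism `aaa` and `bbb:g9` for organism `bbb`. The organism codes and scores here are made up for illustration.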
Each GENES entry page also contains a button for viewing a list of best-hit genes in other organisms and a button for viewing a list of similar genes within the same organism. A further button shows similar genes along the chromosome, up to ten genes on each side. All of these are based on SSDB computations.
The following interface may also be used to query against the SSDB database.
Enter a KEGG identifier (org:gene), for example syn:ssr3451.
Gene Annotation Tools
The annotation of KEGG GENES involves assignment of KO identifiers (K numbers).
Internally, this is done using the KOALA/KoAnn and GFIT annotation tools (see: KO Database).
For outside users, the following automatic annotation servers are made available.
- BlastKOALA: automatic KO assignment by BLASTP sequence similarity search
- GhostKOALA: automatic KO assignment by GHOSTX sequence similarity search
- KofamKOALA: automatic KO assignment by HMM profile search
For a single sequence, the following standard programs may be used to find matching sequences with assigned KOs.
Gene Identifier Conversion
KEGG GENES entries can be retrieved by giving identifiers from outside databases: NCBI-ProteinID (INSDC accession), NCBI-GeneID (Entrez Gene ID) and UniProt accession numbers.
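One programmatic route to this conversion is the `conv` operation of the KEGG REST API, which this page does not describe; using it here is an assumption of the example. The sketch below builds a request URL of the form `https://rest.kegg.jp/conv/genes/<db>:<accession>` and parses the tab-separated reply into pairs of outside identifier and KEGG GENES identifier. The helper names `conv_url`, `parse_conv` and `convert` are hypothetical.

```python
from typing import List, Tuple
from urllib.parse import quote
from urllib.request import urlopen

KEGG_REST = "https://rest.kegg.jp"

# Outside database names as used by the KEGG REST "conv" operation.
OUTSIDE_DBS = {"ncbi-proteinid", "ncbi-geneid", "uniprot"}


def conv_url(outside_db: str, accession: str) -> str:
    """Build a conv URL mapping one outside identifier to KEGG GENES."""
    if outside_db not in OUTSIDE_DBS:
        raise ValueError(f"unsupported outside database: {outside_db!r}")
    return f"{KEGG_REST}/conv/genes/{outside_db}:{quote(accession)}"


def parse_conv(text: str) -> List[Tuple[str, str]]:
    """Parse tab-separated reply lines into (outside_id, kegg_id) pairs."""
    pairs: List[Tuple[str, str]] = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:  # skip blank or malformed lines
            pairs.append((fields[0], fields[1]))
    return pairs


def convert(outside_db: str, accession: str,
            timeout: float = 30.0) -> List[Tuple[str, str]]:
    """Query the REST service (network access required) and parse the reply."""
    with urlopen(conv_url(outside_db, accession), timeout=timeout) as resp:
        return parse_conv(resp.read().decode("utf-8"))
```

Keeping URL construction and parsing separate from the network call lets both be checked offline; only `convert` contacts the server.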
Last updated: September 1, 2024