KEGG Overview
1. KEGG Databases
KEGG is an integrated database resource consisting of 16 main databases, broadly categorized into systems information, genomic information, and chemical information as shown below. Genomic and chemical information represents the molecular building blocks of life in the genomic and chemical spaces, respectively, and systems information represents functional aspects of the biological systems, such as the cell and the organism, that are built from the building blocks. KEGG has been widely used as a reference knowledge base for biological interpretation of large-scale datasets generated by sequencing and other high-throughput experimental technologies.
Category | Database | Content
Systems information | KEGG PATHWAY | Pathway maps for metabolism and other cellular processes, as well as human diseases; manually created from published materials
Systems information | KEGG BRITE | Functional hierarchies (ontologies) representing our knowledge of various aspects of biological systems; manually created from published materials
Systems information | KEGG MODULE | Tighter functional units for pathways and complexes; manually defined
Systems information | KEGG DISEASE | List of disease genes and molecules; manually entered from published materials
Systems information | KEGG DRUG | Chemical structures and associated information of approved drugs in Japan, USA, and Europe; manually entered from published materials
Systems information | KEGG EDRUG | Chemical components and associated information of crude drugs and other natural products; manually entered from published materials
Genomic information | KEGG ORTHOLOGY | KEGG Orthology (KO) groups based on PATHWAY and BRITE; manually defined
Genomic information | KEGG GENOME | Genome maps and organism information; generated from RefSeq and other public resources
Genomic information | KEGG GENES | Gene catalogs of complete genomes with manual annotation; generated from RefSeq and other public resources
Genomic information | KEGG SSDB | Sequence similarity scores and best-hit relations; computationally derived from GENES by pairwise genome comparisons of all protein-coding genes
Genomic information | KEGG DGENES | Gene catalogs of draft genomes with automatic annotation; generated from web resources
Genomic information | KEGG EGENES | Gene catalogs (consensus contigs) of EST data with automatic annotation; generated from dbEST
Genomic information | KEGG MGENES | Gene catalogs of metagenomes with automatic annotation; generated from NCBI resources
Chemical information | KEGG COMPOUND | Chemical compounds; manually entered from published materials
Chemical information | KEGG GLYCAN | Glycans; manually entered from published materials
Chemical information | KEGG REACTION | Chemical reactions; manually defined from ENZYME and PATHWAY
Chemical information | KEGG RPAIR | Chemical structure transformation patterns; manually defined from REACTION
Chemical information | KEGG RCLASS | Reaction class defined by chemical structure transformation patterns of main reactant pairs; generated from RPAIR with annotation
Chemical information | KEGG ENZYME | Enzyme nomenclature; generated from ExplorEnz with annotation by KEGG
2. KEGG Objects
KEGG is a computer representation of biological systems. It is based on the concept of a graph for representing and manipulating KEGG objects from the molecular level to higher levels. Mathematically, a graph is a set of nodes (KEGG objects) and edges (biological relationships). Each KEGG object (database entry) is given a unique identifier, as shown below.
Release | Database | Object Identifier
1995 | KEGG PATHWAY | map number
1995 | KEGG GENES | locus_tag / GeneID
1995 | KEGG ENZYME | EC number
1995 | KEGG COMPOUND | C number
2000 | KEGG GENOME | organism code / T number
2001 | KEGG REACTION | R number
2002 | KEGG ORTHOLOGY | K number
2003 | KEGG GLYCAN | G number
2004 | KEGG RPAIR | RP number
2005 | KEGG BRITE | br number
2005 | KEGG DRUG | D number
2007 | KEGG MODULE | M number
2008 | KEGG DISEASE | H number
2010 | KEGG EDRUG | E number
2010 | KEGG RCLASS | RC number
KEGG objects are linked to/from major life science databases. KEGG objects are also part of the Web; they can be found by Web search engines.
Graph | Node | Edge | Search and Analysis
KEGG | KEGG object | Biological relationship | KEGG
Integrated database | Entry | Cross-reference link | DBGET, Entrez, SRS, etc.
Web | Web page | Hyperlink | Google, etc.
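The graph view described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the regular expressions assume that each identifier is a database-specific prefix followed by five digits (for example `C00031`) and that EC numbers are fully specified; the names `classify` and `KeggGraph` are hypothetical; and the example edges are placeholders, not curated KEGG relationships.

```python
import re
from collections import defaultdict

# Assumed identifier shapes (prefix + five digits). GENES entries (locus_tag /
# GeneID) and organism codes have no fixed prefix and are left out of this sketch.
ID_PATTERNS = [
    ("KEGG PATHWAY", re.compile(r"map\d{5}")),
    ("KEGG BRITE", re.compile(r"br\d{5}")),
    ("KEGG RPAIR", re.compile(r"RP\d{5}")),
    ("KEGG RCLASS", re.compile(r"RC\d{5}")),
    ("KEGG ENZYME", re.compile(r"\d+\.\d+\.\d+\.\d+")),  # fully specified EC only
    ("KEGG COMPOUND", re.compile(r"C\d{5}")),
    ("KEGG GENOME", re.compile(r"T\d{5}")),
    ("KEGG REACTION", re.compile(r"R\d{5}")),
    ("KEGG ORTHOLOGY", re.compile(r"K\d{5}")),
    ("KEGG GLYCAN", re.compile(r"G\d{5}")),
    ("KEGG DRUG", re.compile(r"D\d{5}")),
    ("KEGG MODULE", re.compile(r"M\d{5}")),
    ("KEGG DISEASE", re.compile(r"H\d{5}")),
    ("KEGG EDRUG", re.compile(r"E\d{5}")),
]


def classify(identifier):
    """Return the database name suggested by the identifier's shape, or None."""
    for database, pattern in ID_PATTERNS:
        if pattern.fullmatch(identifier):
            return database
    return None


class KeggGraph:
    """Undirected graph: nodes are KEGG objects, edges are relationships."""

    def __init__(self):
        self.nodes = {}                # identifier -> database name
        self.edges = defaultdict(set)  # identifier -> neighbouring identifiers

    def add_edge(self, a, b):
        for identifier in (a, b):
            database = classify(identifier)
            if database is None:
                raise ValueError(f"unrecognized identifier: {identifier}")
            self.nodes[identifier] = database
        self.edges[a].add(b)
        self.edges[b].add(a)

    def neighbors(self, identifier):
        return set(self.edges.get(identifier, ()))


# Placeholder edges for illustration only (not real KEGG links).
graph = KeggGraph()
graph.add_edge("K00001", "1.1.1.1")
graph.add_edge("1.1.1.1", "R00001")
graph.add_edge("R00001", "C00001")

for node in sorted(graph.nodes):
    print(node, graph.nodes[node], sorted(graph.neighbors(node)))
```

Keeping the database name on each node mirrors the table above, where every object type has its own identifier scheme, so a traversal can tell whether it is currently in the genomic or the chemical space.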
3. Network Hierarchy
The molecular interaction/reaction network is the most distinctive data object in KEGG; it is stored as a collection of pathway maps (graphical diagrams) in the PATHWAY database. Reflecting the map resolution, KEGG PATHWAY is organized as a hierarchy. The top two levels of the current hierarchy are as follows.
First Level | Second Level
Metabolism | Carbohydrate Metabolism; Energy Metabolism; Lipid Metabolism; Nucleotide Metabolism; Amino Acid Metabolism; Metabolism of Other Amino Acids; Glycan Biosynthesis and Metabolism; Metabolism of Cofactors and Vitamins; Metabolism of Terpenoids and Polyketides; Biosynthesis of Other Secondary Metabolites; Xenobiotics Biodegradation and Metabolism
Genetic Information Processing | Transcription; Translation; Folding, Sorting and Degradation; Replication and Repair
Environmental Information Processing | Membrane Transport; Signal Transduction; Signaling Molecules and Interaction
Cellular Processes | Transport and Catabolism; Cell Motility; Cell Growth and Death; Cell Communication
Organismal Systems | Immune System; Endocrine System; Circulatory System; Digestive System; Excretory System; Nervous System; Sensory System; Development; Environmental Adaptation
Human Diseases | Cancers; Immune System Diseases; Neurodegenerative Diseases; Cardiovascular Diseases; Metabolic Diseases; Infectious Diseases
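As a rough illustration, the two levels listed above can be held in a plain mapping from first-level to second-level categories. The names `PATHWAY_HIERARCHY` and `first_level_of` are hypothetical; the category labels are copied from the table, and nothing here reflects how KEGG stores the hierarchy internally.

```python
# First-level category -> second-level categories, copied from the table above.
PATHWAY_HIERARCHY = {
    "Metabolism": [
        "Carbohydrate Metabolism",
        "Energy Metabolism",
        "Lipid Metabolism",
        "Nucleotide Metabolism",
        "Amino Acid Metabolism",
        "Metabolism of Other Amino Acids",
        "Glycan Biosynthesis and Metabolism",
        "Metabolism of Cofactors and Vitamins",
        "Metabolism of Terpenoids and Polyketides",
        "Biosynthesis of Other Secondary Metabolites",
        "Xenobiotics Biodegradation and Metabolism",
    ],
    "Genetic Information Processing": [
        "Transcription",
        "Translation",
        "Folding, Sorting and Degradation",
        "Replication and Repair",
    ],
    "Environmental Information Processing": [
        "Membrane Transport",
        "Signal Transduction",
        "Signaling Molecules and Interaction",
    ],
    "Cellular Processes": [
        "Transport and Catabolism",
        "Cell Motility",
        "Cell Growth and Death",
        "Cell Communication",
    ],
    "Organismal Systems": [
        "Immune System",
        "Endocrine System",
        "Circulatory System",
        "Digestive System",
        "Excretory System",
        "Nervous System",
        "Sensory System",
        "Development",
        "Environmental Adaptation",
    ],
    "Human Diseases": [
        "Cancers",
        "Immune System Diseases",
        "Neurodegenerative Diseases",
        "Cardiovascular Diseases",
        "Metabolic Diseases",
        "Infectious Diseases",
    ],
}


def first_level_of(second_level):
    """Return the first-level category containing the given second-level one."""
    for top, children in PATHWAY_HIERARCHY.items():
        if second_level in children:
            return top
    return None


print(first_level_of("Signal Transduction"))
```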
4. Network Reconstruction
The integration of pathway information and genomic information was first achieved in KEGG through EC numbers. Once EC numbers were correctly assigned to enzyme genes in a genome, organism-specific pathways could be generated automatically by matching them against the networks of EC numbers (enzymes) in the reference metabolic pathways. However, to incorporate non-metabolic pathways and to overcome various problems inherent in the enzyme nomenclature, a new scheme based on ortholog IDs was introduced to replace the EC numbers. KO (KEGG Orthology) is a further extension of the ortholog IDs, based not only on the pathway maps but also on the BRITE functional hierarchies, most notably the classifications of protein families.
Period | Identifier | Mapping | Assignment
1995-1999 | EC number | Metabolic pathways | Domain based
2000-2002 | Ortholog ID | Metabolic and regulatory pathways | Domain based
2003- | KO | Pathways and BRITE hierarchies | Gene based
Thus, under the current KO system, the KO identifiers (K numbers) are placed at the fourth (lowest) level in the network hierarchy shown above, or at the lowest level of the BRITE hierarchy.
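The matching step behind network reconstruction can be sketched as follows. This is a minimal illustration, not KEGG's implementation: a reference pathway is assumed to be a mapping from node names to sets of K numbers, gene annotations are assumed to be a mapping from gene to a single K number, and all identifiers and the function name `reconstruct_pathway` are made up for the example.

```python
from collections import defaultdict


def reconstruct_pathway(reference_nodes, gene_to_ko):
    """Map each reference node to the organism's genes whose KO matches it.

    reference_nodes: dict of node name -> set of K numbers accepted at that node
    gene_to_ko:      dict of gene identifier -> assigned K number
    Nodes with no matching gene are omitted from the result.
    """
    # Invert the annotation so genes can be looked up by K number.
    ko_to_genes = defaultdict(list)
    for gene, ko in gene_to_ko.items():
        ko_to_genes[ko].append(gene)

    organism_pathway = {}
    for node, kos in reference_nodes.items():
        genes = sorted(g for ko in kos for g in ko_to_genes.get(ko, []))
        if genes:
            organism_pathway[node] = genes
    return organism_pathway


# Hypothetical reference pathway and annotations, for illustration only.
reference = {
    "step1": {"K00001"},
    "step2": {"K00002", "K00003"},
    "step3": {"K00004"},
}
annotations = {"geneA": "K00001", "geneB": "K00003", "geneC": "K99999"}

print(reconstruct_pathway(reference, annotations))
```

The same function works if the K numbers are swapped for EC numbers, which corresponds to the 1995-1999 scheme in the table above; the difference lies in what kinds of pathways the identifiers can cover, not in the matching itself.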
5. BRITE Functional Hierarchy
The BRITE database is a collection of hierarchical text files and binary relation files. It is intended to supplement the PATHWAY database in two ways. One is to computerize higher-level knowledge that cannot easily be represented as molecular interaction/reaction networks, using a hierarchically structured vocabulary. The other is to integrate our knowledge about the genomic space (K numbers) with different types of knowledge in the chemical space (C/D/G/R/RP/EC numbers in the LIGAND database). The BRITE collection is currently categorized as follows.
Top Category | Second Category
Genes and Proteins | Network hierarchy; Protein families
Compounds and Reactions | Compounds; Reactions; Compound interactions
Drugs and Diseases | Drugs; Diseases
Cells and Organisms | Organisms
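To make the idea of a hierarchical text file concrete, here is a small parser sketch. The file layout is an assumption made for this example, not a description of the actual BRITE file format: each entry line is taken to start with a single capital letter giving its depth (`A` for the top level, `B` for the next, and so on), followed by whitespace and a label, and levels are assumed never to skip. The sample text reuses the categories from the table above.

```python
def parse_hierarchy(lines):
    """Return one path (tuple of labels, top level first) per entry line.

    Assumed layout: '<LEVEL LETTER><whitespace><label>', where A is depth 0,
    B is depth 1, and so on. Lines that do not fit are skipped.
    """
    stack = []
    paths = []
    for raw in lines:
        line = raw.rstrip()
        if len(line) < 3 or not line[0].isupper() or not line[1].isspace():
            continue
        depth = ord(line[0]) - ord("A")
        label = line[1:].strip()
        del stack[depth:]      # discard entries at this depth or deeper
        stack.append(label)
        paths.append(tuple(stack))
    return paths


SAMPLE = """\
A Genes and Proteins
B Network hierarchy
B Protein families
A Compounds and Reactions
B Compounds
B Reactions
B Compound interactions
A Drugs and Diseases
B Drugs
B Diseases
A Cells and Organisms
B Organisms
"""

for path in parse_hierarchy(SAMPLE.splitlines()):
    print(" > ".join(path))
```

Returning full paths rather than a nested structure keeps the sketch short and makes it easy to attach identifiers (for example K numbers) to the deepest level later.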
Last updated: January 1, 2011