Introduction to the KEGG Database

Posted on 2011-4-15 09:12:13
KEGG Overview

1. KEGG Databases
KEGG is an integrated database resource consisting of 16 main databases, broadly categorized into systems information, genomic information, and chemical information, as shown below. Genomic and chemical information represent the molecular building blocks of life in the genomic and chemical spaces, respectively, and systems information represents functional aspects of biological systems, such as the cell and the organism, that are built from those building blocks. KEGG is widely used as a reference knowledge base for the biological interpretation of large-scale datasets generated by sequencing and other high-throughput experimental technologies.
Category | Database | Content
Systems information | KEGG PATHWAY | Pathway maps for metabolism and other cellular processes, as well as human diseases; manually created from published materials
Systems information | KEGG BRITE | Functional hierarchies (ontologies) representing our knowledge on various aspects of biological systems; manually created from published materials
Systems information | KEGG MODULE | Tighter functional units for pathways and complexes; manually defined
Systems information | KEGG DISEASE | List of disease genes and molecules; manually entered from published materials
Systems information | KEGG DRUG | Chemical structures and associated information of approved drugs in Japan, USA, and Europe; manually entered from published materials
Systems information | KEGG EDRUG | Chemical components and associated information of crude drugs and other natural products; manually entered from published materials
Genomic information | KEGG ORTHOLOGY | KEGG Orthology (KO) groups based on PATHWAY and BRITE; manually defined
Genomic information | KEGG GENOME | Genome maps and organism information; generated from RefSeq and other public resources
Genomic information | KEGG GENES | Gene catalogs of complete genomes with manual annotation; generated from RefSeq and other public resources
Genomic information | KEGG SSDB | Sequence similarity scores and best-hit relations; computationally derived from GENES by pairwise genome comparisons of all protein-coding genes
Genomic information | KEGG DGENES | Gene catalogs of draft genomes with automatic annotation; generated from web resources
Genomic information | KEGG EGENES | Gene catalogs (consensus contigs) of EST data with automatic annotation; generated from dbEST
Genomic information | KEGG MGENES | Gene catalogs of metagenomes with automatic annotation; generated from NCBI resources
Chemical information | KEGG COMPOUND | Chemical compounds; manually entered from published materials
Chemical information | KEGG GLYCAN | Glycans; manually entered from published materials
Chemical information | KEGG REACTION | Chemical reactions; manually defined from ENZYME and PATHWAY
Chemical information | KEGG RPAIR | Chemical structure transformation patterns; manually defined from REACTION
Chemical information | KEGG RCLASS | Reaction class defined by chemical structure transformation patterns of main reactant pairs; generated from RPAIR with annotation
Chemical information | KEGG ENZYME | Enzyme nomenclature; generated from ExplorEnz with annotation by KEGG

2. KEGG Objects
KEGG is a computer representation of biological systems. It is based on the concept of a graph for representing and manipulating various KEGG objects, from the molecular level to higher levels. Mathematically, a graph is a set of nodes (KEGG objects) and edges (biological relationships). Each KEGG object (database entry) is given a unique identifier, as shown below.
Release | Database | Object Identifier
1995 | KEGG PATHWAY | map number
1995 | KEGG GENES | locus_tag / GeneID
1995 | KEGG ENZYME | EC number
1995 | KEGG COMPOUND | C number
2000 | KEGG GENOME | organism code / T number
2001 | KEGG REACTION | R number
2002 | KEGG ORTHOLOGY | K number
2003 | KEGG GLYCAN | G number
2004 | KEGG RPAIR | RP number
2005 | KEGG BRITE | br number
2005 | KEGG DRUG | D number
2007 | KEGG MODULE | M number
2008 | KEGG DISEASE | H number
2010 | KEGG EDRUG | E number
2010 | KEGG RCLASS | RC number
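
As a rough illustration of how these identifier types can be told apart in practice, the following Python sketch maps an identifier string to the database it belongs to. The exact syntax used here (a letter prefix followed by five digits, a "map" or "br" prefix, a four-part EC number) is an assumption based on commonly seen KEGG identifiers; the overview itself only names the identifier types. Organism codes and GENES identifiers (locus_tag / GeneID) have no fixed prefix and are deliberately left out. The function name classify_identifier is hypothetical.

```python
import re

# Assumed identifier syntax; the overview names the types ("C number",
# "K number", ...) but does not spell out their exact format.
ID_PATTERNS = [
    ("KEGG PATHWAY", re.compile(r"map\d{5}")),
    ("KEGG BRITE", re.compile(r"br\d{5}")),
    ("KEGG ENZYME", re.compile(r"(?:EC )?\d+\.\d+\.\d+\.\d+")),
    ("KEGG COMPOUND", re.compile(r"C\d{5}")),
    ("KEGG GENOME", re.compile(r"T\d{5}")),
    ("KEGG REACTION", re.compile(r"R\d{5}")),
    ("KEGG ORTHOLOGY", re.compile(r"K\d{5}")),
    ("KEGG GLYCAN", re.compile(r"G\d{5}")),
    ("KEGG RPAIR", re.compile(r"RP\d{5}")),
    ("KEGG DRUG", re.compile(r"D\d{5}")),
    ("KEGG MODULE", re.compile(r"M\d{5}")),
    ("KEGG DISEASE", re.compile(r"H\d{5}")),
    ("KEGG EDRUG", re.compile(r"E\d{5}")),
    ("KEGG RCLASS", re.compile(r"RC\d{5}")),
]


def classify_identifier(identifier):
    """Return the database name for an identifier, or None if unrecognized.

    fullmatch() requires the whole string to match, so "RP00001" cannot be
    mistaken for an R number even though both start with "R".
    """
    for database, pattern in ID_PATTERNS:
        if pattern.fullmatch(identifier):
            return database
    return None


if __name__ == "__main__":
    for example in ["C00031", "RP00001", "1.1.1.1", "map00010", "unknown"]:
        print(example, "->", classify_identifier(example))
```

Keeping the patterns in a list of (database, pattern) pairs makes it easy to add or correct a rule if the assumed syntax turns out to differ.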
KEGG objects are linked to/from major life science databases. KEGG objects are also part of the Web; they can be found by Web search engines.
Graph | Node | Edge | Search and Analysis
KEGG | KEGG object | Biological relationship | KEGG
Integrated database | Entry | Cross-reference link | DBGET, Entrez, SRS, etc.
Web | Web page | Hyperlink | Google, etc.
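
The graph view described in this section (nodes are KEGG objects, edges are biological relationships) can be sketched as a small adjacency structure. This is only a minimal illustration, not how KEGG stores its data: the class name ObjectGraph, the choice of undirected edges, and the placeholder identifiers such as "K:demo1" are all assumptions made for the example.

```python
from collections import defaultdict, deque


class ObjectGraph:
    """Undirected graph: nodes are object identifiers, edges are relationships."""

    def __init__(self):
        self._adjacent = defaultdict(set)

    def add_relationship(self, a, b):
        # Store the edge in both directions (undirected, by assumption).
        self._adjacent[a].add(b)
        self._adjacent[b].add(a)

    def neighbors(self, node):
        # .get() avoids creating an empty entry for unknown nodes.
        return sorted(self._adjacent.get(node, set()))

    def reachable(self, start):
        """Breadth-first search: every node connected to start, including start."""
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in self._adjacent.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen


# Placeholder identifiers, not real KEGG entries.
graph = ObjectGraph()
graph.add_relationship("K:demo1", "R:demo1")  # ortholog group linked to a reaction
graph.add_relationship("R:demo1", "C:demo1")  # reaction linked to a compound
graph.add_relationship("R:demo1", "C:demo2")  # reaction linked to another compound

if __name__ == "__main__":
    print(graph.neighbors("R:demo1"))
    print(sorted(graph.reachable("K:demo1")))
```

The same structure applies to the other two rows of the table: database entries joined by cross-reference links, or Web pages joined by hyperlinks.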

3. Network Hierarchy
The molecular interaction/reaction network is the most distinctive data object in KEGG; it is stored as a collection of pathway maps (graphical diagrams) in the PATHWAY database. Reflecting the map resolution, KEGG PATHWAY is organized as a hierarchy. The top two levels of the current hierarchy are as follows.
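First Level | Second Level
Metabolism | Carbohydrate Metabolism
Metabolism | Energy Metabolism
Metabolism | Lipid Metabolism
Metabolism | Nucleotide Metabolism
Metabolism | Amino Acid Metabolism
Metabolism | Metabolism of Other Amino Acids
Metabolism | Glycan Biosynthesis and Metabolism
Metabolism | Metabolism of Cofactors and Vitamins
Metabolism | Metabolism of Terpenoids and Polyketides
Metabolism | Biosynthesis of Other Secondary Metabolites
Metabolism | Xenobiotics Biodegradation and Metabolism
Genetic Information Processing | Transcription
Genetic Information Processing | Translation
Genetic Information Processing | Folding, Sorting and Degradation
Genetic Information Processing | Replication and Repair
Environmental Information Processing | Membrane Transport
Environmental Information Processing | Signal Transduction
Environmental Information Processing | Signaling Molecules and Interaction
Cellular Processes | Transport and Catabolism
Cellular Processes | Cell Motility
Cellular Processes | Cell Growth and Death
Cellular Processes | Cell Communication
Organismal Systems | Immune System
Organismal Systems | Endocrine System
Organismal Systems | Circulatory System
Organismal Systems | Digestive System
Organismal Systems | Excretory System
Organismal Systems | Nervous System
Organismal Systems | Sensory System
Organismal Systems | Development
Organismal Systems | Environmental Adaptation
Human Diseases | Cancers
Human Diseases | Immune System Diseases
Human Diseases | Neurodegenerative Diseases
Human Diseases | Cardiovascular Diseases
Human Diseases | Metabolic Diseases
Human Diseases | Infectious Diseases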
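
For readers who want to work with this two-level hierarchy programmatically, one simple option is a mapping from each first-level category to its second-level categories, transcribed from the table above. This is a sketch only; the variable name PATHWAY_HIERARCHY and the helper first_level_of are hypothetical and are not part of any KEGG interface.

```python
# Top two levels of the KEGG PATHWAY hierarchy, as listed in the table above.
PATHWAY_HIERARCHY = {
    "Metabolism": [
        "Carbohydrate Metabolism",
        "Energy Metabolism",
        "Lipid Metabolism",
        "Nucleotide Metabolism",
        "Amino Acid Metabolism",
        "Metabolism of Other Amino Acids",
        "Glycan Biosynthesis and Metabolism",
        "Metabolism of Cofactors and Vitamins",
        "Metabolism of Terpenoids and Polyketides",
        "Biosynthesis of Other Secondary Metabolites",
        "Xenobiotics Biodegradation and Metabolism",
    ],
    "Genetic Information Processing": [
        "Transcription",
        "Translation",
        "Folding, Sorting and Degradation",
        "Replication and Repair",
    ],
    "Environmental Information Processing": [
        "Membrane Transport",
        "Signal Transduction",
        "Signaling Molecules and Interaction",
    ],
    "Cellular Processes": [
        "Transport and Catabolism",
        "Cell Motility",
        "Cell Growth and Death",
        "Cell Communication",
    ],
    "Organismal Systems": [
        "Immune System",
        "Endocrine System",
        "Circulatory System",
        "Digestive System",
        "Excretory System",
        "Nervous System",
        "Sensory System",
        "Development",
        "Environmental Adaptation",
    ],
    "Human Diseases": [
        "Cancers",
        "Immune System Diseases",
        "Neurodegenerative Diseases",
        "Cardiovascular Diseases",
        "Metabolic Diseases",
        "Infectious Diseases",
    ],
}


def first_level_of(second_level):
    """Return the first-level category containing second_level, or None."""
    for first, seconds in PATHWAY_HIERARCHY.items():
        if second_level in seconds:
            return first
    return None


if __name__ == "__main__":
    print(first_level_of("Signal Transduction"))
```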

4. Network Reconstruction
The integration of pathway information and genomic information was first achieved in KEGG through EC numbers. Once EC numbers were correctly assigned to the enzyme genes in a genome, organism-specific pathways could be generated automatically by matching them against the networks of EC numbers (enzymes) in the reference metabolic pathways. However, in order to incorporate non-metabolic pathways and to overcome various problems inherent in the enzyme nomenclature, a new scheme based on ortholog IDs was introduced to replace the EC numbers. KO (KEGG Orthology) is a further extension of the ortholog IDs, based not only on the pathway maps but also on the BRITE functional hierarchies, most notably the classifications of protein families.
Period | Identifier | Mapping | Assignment
1995-1999 | EC number | Metabolic pathways | Domain based
2000-2002 | Ortholog ID | Metabolic and regulatory pathways | Domain based
2003- | KO | Pathways and BRITE hierarchies | Gene based

Thus, under the current KO system, the KO identifiers (K numbers) are placed at the fourth (lowest) level in the network hierarchy shown above, or at the lowest level of the BRITE hierarchy.
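
The matching step described in this section can be sketched in a few lines of Python: given the set of identifiers (EC numbers, ortholog IDs, or K numbers) that appear as nodes in a reference pathway, and the identifiers assigned to each gene of an organism, the organism-specific pathway is the subset of reference nodes that at least one gene maps to. This is a simplified illustration under stated assumptions, not KEGG's actual procedure; the function names and the placeholder identifiers ("K_a", "gene1", and so on) are hypothetical.

```python
def reconstruct_pathway(reference_nodes, gene_assignments):
    """Map each reference node found in the organism to the genes assigned to it.

    reference_nodes:  set of identifiers appearing in the reference pathway
    gene_assignments: dict of gene -> list of identifiers assigned to that gene
    """
    present = {}
    for gene, assigned_ids in gene_assignments.items():
        for node_id in assigned_ids:
            # Identifiers that are not part of this pathway are ignored.
            if node_id in reference_nodes:
                present.setdefault(node_id, []).append(gene)
    return present


def coverage(reference_nodes, present):
    """Fraction of reference nodes that have at least one assigned gene."""
    if not reference_nodes:
        return 0.0
    return len(present) / len(reference_nodes)


# Placeholder data, not real KEGG entries.
reference_nodes = {"K_a", "K_b", "K_c"}
gene_assignments = {
    "gene1": ["K_a"],
    "gene2": ["K_a", "K_x"],  # K_x is not in this reference pathway
    "gene3": ["K_c"],
}

organism_pathway = reconstruct_pathway(reference_nodes, gene_assignments)

if __name__ == "__main__":
    print(organism_pathway)
    print(coverage(reference_nodes, organism_pathway))
```

Because the function only compares identifiers, the same sketch covers all three periods in the table above; what changes is which identifier type labels the reference nodes and the genes.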

5. BRITE Functional Hierarchy
The BRITE database is a collection of hierarchical text files and binary relation files. It is intended to supplement the PATHWAY database in two ways. One is to computerize higher-level knowledge that cannot easily be represented as molecular interaction/reaction networks, using a hierarchically structured vocabulary. The other is to integrate our knowledge about the genomic space (K numbers) with different types of knowledge in the chemical space (C/D/G/R/RP/EC numbers in the LIGAND database). The BRITE collection is currently categorized as follows.
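Top Category | Second Category
Genes and Proteins | Network hierarchy
Genes and Proteins | Protein families
Compounds and Reactions | Compounds
Compounds and Reactions | Reactions
Compounds and Reactions | Compound interactions
Drugs and Diseases | Drugs
Drugs and Diseases | Diseases
Cells and Organisms | Organisms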

Last updated: January 1, 2011