gene2pathway(gene2pathway)
gene2pathway() belongs to the R package gene2pathway
Pathway membership prediction
Translator: LoveR (bot), 生物统计家园网
----------Description----------
Predicts a gene's membership in a branch of the KEGG hierarchy from the InterPro domains the gene contains.
----------Usage----------
gene2pathway(geneIDs=NULL, flyBase=FALSE, gene2Domains=NULL, organism="hsa", useKEGG=TRUE, mc.cores=8)
----------Arguments----------
Argument: geneIDs
A character vector of Entrez gene IDs or FlyBase identifiers. Not required if the argument gene2Domains is provided. Default: NULL
Argument: flyBase
Whether geneIDs contains FlyBase identifiers. Default: FALSE
Argument: gene2Domains
By default, associations between genes and InterPro domains are retrieved from Ensembl via biomaRt. Alternatively, the user can supply their own mapping of genes to InterPro domains here, in the form of a list (see Details). Default: NULL
Argument: organism
KEGG letter code describing an organism. Default: "hsa". See <URL:http://www.genome.jp/kegg-bin/create_kegg_menu> for a complete list of the organisms (and their letter codes) supported by KEGG.
Argument: useKEGG
Whether to use pathway information taken directly from KEGG, instead of a prediction, whenever it is available. Default: TRUE
Argument: mc.cores
Number of cores to use for parallelization. Requires the package 'doMC' to be loaded. Default: 8
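A minimal sketch of choosing a value for mc.cores on a machine that may have fewer than the default 8 cores. The cap of 8 mirrors the default in the usage line; leaving one core free and the variable names are illustrative assumptions, not something the package prescribes. The gene2pathway() call is left as a comment because it needs the package and its stored model.

```r
# Ask the machine how many cores it has; detectCores() can return NA.
n_available <- parallel::detectCores()
if (is.na(n_available)) n_available <- 1L

# Leave one core free (an illustrative choice), using at least 1 and at most 8.
n_cores <- max(1L, min(8L, n_available - 1L))

# The argument description says 'doMC' must be loaded for parallel runs,
# so load it only when it is installed.
if (requireNamespace("doMC", quietly = TRUE)) {
  library(doMC)
}

# With the package installed, the value would be passed on like this:
# res <- gene2pathway(geneIDs, organism = "hsa", mc.cores = n_cores)
```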
----------Details----------
Predictions come from a hierarchical classification model built on SVMs and a ranking perceptron. The model is usually also bagged to improve prediction quality. It is stored in the package's data directory, and retraining it from time to time is recommended.
The KEGG hierarchy is taken from the package keggorthology. By default, associations between genes and InterPro domains are retrieved automatically from Ensembl via biomaRt; see <URL:http://www.ebi.ac.uk/ensembl/> for the organisms Ensembl supports. Instead of using Ensembl and biomaRt, the user can supply their own mapping of genes to InterPro domains as a list. This makes it possible to work with organisms that KEGG supports but Ensembl does not yet. The list has the form genes -> InterPro domains, and each list entry is named by the identifier of the corresponding gene. Entrez gene IDs or FlyBase identifiers must be used.
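The list form described above can be sketched in base R. This is only an illustration under stated assumptions: the gene IDs and InterPro accessions are example values, the table layout and column names are hypothetical, and the text does not say in which exact format the function expects domain identifiers (the "IPR…" accession form is assumed here).

```r
# Flat table of gene-to-domain associations (illustrative values only).
assoc <- data.frame(
  entrez   = c("5594", "5594", "1017", "1017"),
  interpro = c("IPR000719", "IPR011009", "IPR000719", "IPR011009"),
  stringsAsFactors = FALSE
)

# split() groups the domain IDs by gene and names each list entry by its
# gene ID, which gives the "genes -> InterPro domains" shape.
gene2Domains <- split(assoc$interpro, assoc$entrez)

str(gene2Domains)

# With the package installed, the list would take the place of the
# Ensembl lookup:
# res <- gene2pathway(gene2Domains = gene2Domains, organism = "hsa")
```

Building the list with split() keeps the gene identifiers as entry names, which is the one requirement the text states explicitly.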
----------Value----------
Component: gene2Path
Mapping of gene IDs to the corresponding KEGG pathway names.
Component: byKEGG
For each gene, TRUE if the mapping was obtained directly from KEGG and FALSE if it was predicted.
Component: scores
Confidence scores for the predictions (0 if no prediction was performed); see the Note for details.
Component: votes
Fraction of votes for the individual pathway predictions.
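A hedged sketch of how the components above might be combined to keep only trustworthy mappings. Everything in it is an assumption for illustration: the result is mocked as a list with named per-gene entries (the text does not specify the exact shapes), all values are invented, and the cutoff min_score is a hypothetical choice.

```r
# Mock result using the documented component names (all values invented).
res <- list(
  gene2Path = list("5594" = "Signal Transduction",
                   "1017" = "Cell Growth and Death",
                   "7157" = "Cell Growth and Death"),
  byKEGG = c("5594" = TRUE, "1017" = FALSE, "7157" = FALSE),
  scores = c("5594" = 0, "1017" = 0.93, "7157" = 0.58)
)

min_score <- 0.8  # hypothetical cutoff, not a value from the documentation

# Genes whose mapping came straight from KEGG are kept as they are;
# their score of 0 only means that no prediction was performed.
from_kegg <- names(res$byKEGG)[res$byKEGG]

# Predicted genes are kept only if their confidence score is high enough.
predicted <- names(res$byKEGG)[!res$byKEGG]
confident <- predicted[res$scores[predicted] >= min_score]

keep <- c(from_kegg, confident)
res$gene2Path[keep]
```

Checking byKEGG before scores matters because a score of 0 is ambiguous on its own: it marks genes taken from KEGG, not poor predictions.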
----------Note----------
By default a bagged model is used for prediction, i.e. each individual sub-model casts a vote for a specific output. The final output is determined by majority vote, separately for each hierarchy branch. The fraction of sub-models voting for a specific branch may be interpreted as that branch's probability. Ideally, every branch probability is close to 1 if the gene maps to that part of the KEGG hierarchy and close to 0 otherwise. Confidence is therefore measured in two ways: the average over all probabilities > 0.5, and one minus the average over all probabilities < 0.5. The reliability score reported is the average of these two measures.
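The reliability score described above can be written out as a short function. This is a sketch of the formula as stated in the text, not the package's own code: the function name is hypothetical, and the handling of edge cases (no probability on one side of 0.5, or values exactly equal to 0.5) is an assumption, since the text does not cover them.

```r
# p: vote fractions per hierarchy branch, each between 0 and 1.
reliability_score <- function(p) {
  high <- p[p > 0.5]   # branches voted "member"
  low  <- p[p < 0.5]   # branches voted "not a member"

  # Each measure is computed only if its group is non-empty (assumption).
  parts <- c(
    if (length(high) > 0) mean(high),
    if (length(low) > 0) 1 - mean(low)
  )
  if (length(parts) == 0) return(NA_real_)

  # The score is the average of the available measures.
  mean(parts)
}

votes <- c(0.9, 1.0, 0.1, 0.0, 0.2)
reliability_score(votes)  # about 0.925: mean of 0.95 and 1 - 0.1
```

Unanimous sub-models (all fractions at 0 or 1) give a score of 1, while fractions near 0.5 on either side pull the score towards 0.5.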
If the user retrains a model WITHOUT bagging, the reliability score is simply the margin between the highest-ranked and the second-highest-ranked solution. A margin larger than 2 indicates good confidence.
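For the non-bagged case, the margin can be illustrated as the gap between the two best ranking scores. The function name and the example scores are hypothetical, and the sketch assumes that each candidate solution comes with one numeric ranking score.

```r
# scores: one ranking score per candidate solution (illustrative input).
ranking_margin <- function(scores) {
  stopifnot(length(scores) >= 2)
  top_two <- sort(scores, decreasing = TRUE)[1:2]
  unname(top_two[1] - top_two[2])
}

branch_scores <- c(A = 3.1, B = 0.4, C = -1.2)
margin <- ranking_margin(branch_scores)

# Rule of thumb from the note: a margin larger than 2 means good confidence.
margin > 2
```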
----------Author(s)----------
Holger Froehlich
----------See Also----------
retrain, classificationModel
----------Examples----------
## Not run:
library(gene2pathway)
# Predict pathway membership for one Drosophila melanogaster gene given by its FlyBase ID
gene2pathway("FBgn0030327", flyBase=TRUE, organism="dme")
## End(Not run)
Please credit the source when reposting: 生物统计家园网 (http://www.biostatistic.net).