goseq(goseq)
goseq() belongs to R package: goseq
goseq Gene Ontology analyser
Translator: 生物统计家园网 robot LoveR
----------Description----------
Performs selection-unbiased testing for category enrichment amongst differentially expressed (DE) genes in RNA-seq data. By default, gene ontology (GO) categories are tested, but any categories may be tested.
----------Usage----------
goseq(pwf, genome, id, gene2cat = NULL,
test.cats=c("GO:CC", "GO:BP", "GO:MF"),
method = "Wallenius", repcnt = 2000)
----------Arguments----------
Argument: pwf
An object containing gene names, DE calls and the probability weighting function. Usually generated by nullp.
Argument: genome
A string identifying the genome that genes refer to. For a list of supported organisms, run supportedGenomes.
Argument: id
A string identifying the gene identifier used by genes. For a list of supported gene IDs, run supportedGeneIDs.
Argument: gene2cat
A data frame with two columns containing the mapping between genes and the categories of interest. Alternatively, a list where the names are genes and each entry is a vector containing the GO categories associated with that gene (this is the output produced by getgo). If set to NULL, goseq attempts to fetch GO categories automatically using getgo.
Argument: test.cats
A vector specifying which categories to test for over-representation amongst DE genes. See Details for allowed options.
Argument: method
The method used to calculate the unbiased category enrichment scores. Valid options are "Wallenius", "Sampling" and "Hypergeometric". "Hypergeometric" should almost never be used (see Details).
Argument: repcnt
Number of random samples to be calculated when random sampling is used. Ignored unless method="Sampling".
----------Details----------
The pwf argument is almost always the output of the function nullp. This is a data frame with 3 columns, named "DEgenes", "bias.data" and "pwf", with the rownames set to the gene names. Each row corresponds to a gene: the DEgenes column specifies whether the gene is DE (1 for DE, 0 for not DE), the bias.data column gives the numeric value of the DE bias being accounted for (usually the gene length or number of counts), and the pwf column gives the gene's value on the probability weighting function.
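To make the layout described above concrete, here is a minimal base R sketch of a data frame with that shape. The gene names, lengths and weights are invented for illustration only; in practice the object would normally come from nullp rather than being built by hand.

```r
# Hypothetical toy data: five made-up genes with made-up lengths and weights.
pwf_example <- data.frame(
  DEgenes   = c(1, 0, 0, 1, 0),               # 1 = DE, 0 = not DE
  bias.data = c(2500, 800, 1200, 4000, NA),   # e.g. gene length; NA if unknown
  pwf       = c(0.30, 0.10, 0.15, 0.40, NA),  # value on the weighting function
  row.names = c("geneA", "geneB", "geneC", "geneD", "geneE")
)

# Inspect the structure: three numeric columns, gene names as rownames.
str(pwf_example)
```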
goseq obtains length data from UCSC and GO mappings from the organism packages (see getgo and getlength for details). If your data is in an unsupported format, you will need to obtain the GO category mapping yourself and supply it to the goseq function using the gene2cat argument.
To use your own gene-to-category mapping with goseq, use the gene2cat argument. This argument takes a data.frame, with one column containing gene IDs and the other containing the associated categories. As the mapping from gene <-> category is in general many-to-many, there will be multiple rows containing the same gene identifier. Alternatively, gene2cat can take a list, where the names are the genes and the entries are the GO categories associated with the genes. This is the format produced by the getgo function and is more space efficient than the data.frame representation.
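The two accepted shapes of gene2cat can be sketched in base R as follows. The gene IDs and category labels are placeholders made up for this example, and the column names gene and category are likewise assumptions, since the text only requires two columns.

```r
# Hypothetical mapping in data.frame form: one row per (gene, category) pair.
gene2cat_df <- data.frame(
  gene     = c("geneA", "geneA", "geneB", "geneC", "geneC"),
  category = c("cat1", "cat2", "cat1", "cat3", "cat2"),
  stringsAsFactors = FALSE
)

# Equivalent list form: names are genes, each entry is a vector of categories.
gene2cat_list <- split(gene2cat_df$category, gene2cat_df$gene)

# Either object would then be supplied through the gene2cat argument,
# following the Usage section (not run here, requires the goseq package):
# goseq(pwf, genome, id, gene2cat = gene2cat_list)
```

The list form stores each gene name once, which is why it is the more compact of the two.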
If gene2cat is left as NULL, goseq attempts to use getgo to fetch GO category to gene identifier mappings.
The PWF is usually calculated using the nullp function to correct for length bias. However, goseq will work with any vector of weights. Any bias can be accounted for so long as a weight for each gene is supplied through the pwf argument. NAs are allowed in the "pwf" and "bias.data" columns of the PWF data frame (these usually occur as a result of missing length data for some genes). Any entry which is NA is set to the weighting of the median gene.
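As a rough illustration of the NA handling just described, the base R sketch below fills missing weights with the weight of the gene whose bias value is the median of the known bias values. This is one reading of "the weighting of the median gene" and is an assumption, not a copy of what goseq does internally; the numbers are made up.

```r
# Hypothetical weights and bias values; the last gene has no length data.
pwf_vals  <- c(0.30, 0.10, 0.15, 0.40, NA)
bias_vals <- c(2500, 800, 1200, 4000, NA)

# Find the gene with the median bias value among genes with known bias
# (for an even count this takes the lower of the two middle genes).
known       <- !is.na(bias_vals)
median_bias <- sort(bias_vals[known])[ceiling(sum(known) / 2)]

# Look up that gene's weight and use it wherever the weight is missing.
median_weight <- pwf_vals[match(median_bias, bias_vals)]
pwf_filled    <- ifelse(is.na(pwf_vals), median_weight, pwf_vals)
```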
Valid options for the test.cats argument are any combination of "GO:CC", "GO:BP", "GO:MF" and "KEGG". The three GO terms refer to the Cellular Component, Biological Process and Molecular Function ontologies respectively. "KEGG" refers to KEGG pathways.
The three methods, "Wallenius", "Sampling" and "Hypergeometric", calculate the p-values as follows.
"Wallenius" approximates the true distribution of numbers of members of a category amongst DE genes by the Wallenius non-central hypergeometric distribution. This distribution assumes that within a category all genes have the same probability of being chosen. Therefore, this approximation works best when the range in probabilities obtained by the probability weighting function is small.
"Sampling" uses random sampling to approximate the true distribution and uses it to calculate the p-values for over (and under) representation of categories. Although this is the most accurate method given a high enough value of repcnt, its use quickly becomes computationally prohibitive.
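The resampling idea can be sketched in base R as below. This is a simplified illustration under stated assumptions, not goseq's own code: the weights, the category membership and the observed count are invented, and adding 1 to numerator and denominator is a common convention for resampling p-values that the text above does not specify.

```r
set.seed(1)                                # reproducible toy example

n_genes  <- 200                            # made-up number of genes
weights  <- runif(n_genes, 0.05, 0.5)      # made-up PWF value per gene
in_cat   <- seq_len(n_genes) <= 20         # made-up category: first 20 genes
n_de     <- 30                             # number of DE genes
observed <- 6                              # DE genes observed in the category
repcnt   <- 2000                           # number of random samples

# Draw random "DE" sets of the same size, with each gene's chance of being
# picked proportional to its weight, and count category members each time.
null_counts <- replicate(repcnt, {
  picked <- sample(n_genes, n_de, replace = FALSE, prob = weights)
  sum(in_cat[picked])
})

# Empirical p-values for over- and under-representation.
p_over  <- (sum(null_counts >= observed) + 1) / (repcnt + 1)
p_under <- (sum(null_counts <= observed) + 1) / (repcnt + 1)
```

The smallest p-value such a scheme can report is about 1/repcnt, which is one reason the cost grows quickly when small p-values need to be resolved.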
CAUTION: "Hypergeometric" should NEVER be used for producing results for biological interpretation. If there is genuinely no bias in power to detect DE in your experiment, the PWF will reflect this and the other methods will produce accurate results.
"Hypergeometric" assumes there is no bias in power to detect differential expression at all and calculates the p-values using a standard hypergeometric distribution. This is useful if you wish to test the effect of selection bias on your results.
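For reference, the standard hypergeometric tail probabilities can be computed in base R with phyper. The counts below are made up, and this sketch only illustrates the unweighted test; it is not a claim about how goseq evaluates it internally.

```r
n_total  <- 1000   # made-up: all genes tested
n_de     <- 100    # made-up: DE genes
n_cat    <- 50     # made-up: genes in the category
n_cat_de <- 12     # made-up: DE genes in the category

# P(X >= n_cat_de): probability of seeing at least this many category
# members amongst the DE genes if DE genes were drawn without bias.
p_over <- phyper(n_cat_de - 1, n_cat, n_total - n_cat, n_de,
                 lower.tail = FALSE)

# P(X <= n_cat_de): probability of seeing at most this many.
p_under <- phyper(n_cat_de, n_cat, n_total - n_cat, n_de)
```

Comparing such values against the "Wallenius" or "Sampling" p-values for the same categories is one way to see how much the bias correction changes the results.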
----------Value----------
goseq returns a data frame with 3 columns. The first column gives the name of the category. The second gives the p-value for the associated category being over-represented amongst DE genes. The third and final column gives the p-value for the associated category being under-represented amongst DE genes. The p-values have not been corrected for multiple hypothesis testing.
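Since the returned p-values are uncorrected, a correction step is usually applied afterwards. The sketch below uses base R's p.adjust with the Benjamini-Hochberg method on a made-up result table; the column names category, over_p and under_p are hypothetical stand-ins for the three columns described above, and the choice of method and cut-off is an assumption.

```r
# Hypothetical stand-in for a goseq result: category, over- and
# under-representation p-values (all values invented).
res <- data.frame(
  category = c("cat1", "cat2", "cat3", "cat4"),
  over_p   = c(0.001, 0.02, 0.04, 0.60),
  under_p  = c(0.999, 0.98, 0.97, 0.45)
)

# Benjamini-Hochberg adjustment of the over-representation p-values.
res$over_p_bh <- p.adjust(res$over_p, method = "BH")

# Categories passing an (assumed) 5% false discovery rate cut-off.
enriched <- res[res$over_p_bh < 0.05, ]
```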
----------Author(s)----------
Matthew D. Young <myoung@wehi.edu.au>
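----------References----------
Young, M. D., Wakefield, M. J., Smyth, G. K., Oshlack, A. (2010) Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biology, Feb 2010, Vol. 11, Issue 2, R14.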
----------See Also----------
nullp, getgo, getlength
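----------Examples----------
library(goseq)
# Load the example data set of DE calls named genes
data(genes)
# Fit the probability weighting function for hg19 Ensembl gene IDs
pwf <- nullp(genes,'hg19','ensGene')
# Test GO categories with the default method ("Wallenius")
pvals <- goseq(pwf,'hg19','ensGene')
head(pvals)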