R language gage package: help documentation for the gage() function

loveR · posted on 2012-2-25 18:23:01

gage(gage)
gage() belongs to R package: gage

GAGE (Generally Applicable Gene-set Enrichment) analysis

Translator: LoveR, the robot of 生物统计家园网 (biostatistic.net)

----------Description----------

Run GAGE analysis to infer gene sets (or pathways, functional groups, etc.) that are significantly perturbed relative to all genes considered. GAGE is generally applicable to essentially all microarray data, independent of data attributes including sample size, experimental layout, study design, and all types of heterogeneity in data generation.

gage is the main function; gagePrep is the function for the initial data preparation, especially sample pairing; gageSum carries out the final meta-test summarization.

----------Usage----------

gage(exprs, gsets, ref = NULL, samp = NULL, set.size = c(10, 500),
same.dir = TRUE, compare = "paired", rank.test = FALSE, use.fold = TRUE,
FDR.adj = TRUE, weights = NULL, full.table = FALSE, saaPrep = gagePrep,
saaTest = gs.tTest, saaSum = gageSum, use.stouffer=TRUE, ...)

gagePrep(exprs, ref = NULL, samp = NULL, same.dir = TRUE, compare =
"paired", rank.test = FALSE, use.fold = TRUE, weights = NULL, full.table =
FALSE, ...)

gageSum(rawRes, ref = NULL, test4up = TRUE, same.dir =
TRUE, compare = "paired", use.fold = TRUE, weights = NULL, full.table =
FALSE, use.stouffer=TRUE, ...)

----------Arguments----------

Argument: exprs
an expression matrix or matrix-like data structure, with genes as rows and samples as columns.

Argument: gsets
a named list in which each element is a gene set, given as a character vector of gene IDs or symbols. For an example, type head(kegg.gs). A gene set can also be an "smc" object as defined in the PGSEA package. Make sure that the same gene ID system is used for both gsets and exprs.
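
The shape of gsets and the ID-system check can be sketched in base R. This is only an illustration: the gene IDs, the set names and the id.overlap helper variable are made up and are not part of the gage package.

```r
# Toy expression matrix: genes as rows, samples as columns (IDs are hypothetical)
set.seed(1)
exprs <- matrix(rnorm(6 * 4), nrow = 6,
                dimnames = list(paste0("g", 1:6), paste0("s", 1:4)))

# A named list of gene sets; each element is a character vector of gene IDs
gsets <- list(setA = c("g1", "g2", "g3"),
              setB = c("g4", "g5", "x9"))   # "x9" does not occur in exprs

# Fraction of each gene set's IDs that are found among the rownames of exprs
id.overlap <- sapply(gsets, function(gs) mean(gs %in% rownames(exprs)))
print(id.overlap)
```

An overlap close to zero for most sets usually suggests that gsets and exprs use different gene ID systems (for example symbols versus Entrez IDs).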

Argument: ref
a numeric vector of column numbers for the reference condition or phenotype (i.e. the control group) in the exprs data matrix. With the default ref = NULL, all columns are treated as target experiments.

Argument: samp
a numeric vector of column numbers for the target condition or phenotype (i.e. the experiment group) in the exprs data matrix. With the default samp = NULL, all columns other than ref are treated as target experiments.

Argument: set.size
the range of gene set sizes (number of genes) to be considered for the enrichment test. Tests on gene sets that are too small or too large are not statistically robust or biologically informative. Default is set.size = c(10, 500).
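
The effect of set.size can be pictured as a filter on the effective size of each gene set. The following is a minimal base R sketch, not gage's own code: the gene IDs are made up, and treating both bounds as inclusive is an assumption.

```r
# Hypothetical background gene IDs (standing in for the rownames of exprs)
genes <- paste0("g", 1:600)

# Three toy gene sets of very different sizes
gsets <- list(tiny = genes[1:3], mid = genes[1:50], huge = genes)

set.size <- c(10, 500)

# Effective size: number of distinct set members actually present in the data
eff.size <- sapply(gsets, function(gs) sum(unique(gs) %in% genes))

# Keep only the sets whose effective size falls inside the range
# (inclusive bounds are an assumption of this sketch)
keep <- eff.size >= set.size[1] & eff.size <= set.size[2]
kept.sets <- names(gsets)[keep]
print(kept.sets)
```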

Argument: same.dir
Boolean, whether to test for changes in a gene set toward a single direction (all genes up- or down-regulated) or for changes in both directions simultaneously. For experimentally derived gene sets, GO term groups, etc., coregulation is commonly the case, hence same.dir = TRUE (the default). In KEGG and BioCarta pathways, genes frequently are not coregulated, hence it can be informative to set same.dir = FALSE, although same.dir = TRUE can also be interesting for pathways.

Argument: compare
character, which comparison scheme to use: 'paired', 'unpaired', '1ongroup' or 'as.group'. 'paired' is the default: ref and samp are of equal length and paired one-on-one by the original experimental design. 'as.group': group-on-group comparison between ref and samp. 'unpaired' (formerly '1on1'): one-on-one comparison between all possible ref and samp combinations, even though the original experimental design may not be one-on-one paired. '1ongroup': comparison of one samp column at a time against the average of all ref columns. For PAGE-like analysis, the default is compare = 'as.group', which is the only option provided in the original PAGE method. All other comparison schemes are provided here for direct comparison with gage.
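
The four comparison schemes differ in which columns are set against which. A conceptual base R sketch follows, assuming log-scale expression values so that a difference acts as a log fold change. The fc.* names and the toy data are hypothetical, and the column counts shown are those of the raw comparisons, not necessarily of gage's final output.

```r
set.seed(42)
exprs <- matrix(rnorm(5 * 6), nrow = 5,
                dimnames = list(paste0("g", 1:5),
                                c("ref1", "ref2", "ref3",
                                  "samp1", "samp2", "samp3")))
ref <- 1:3
samp <- 4:6

# 'paired': samp[i] against ref[i], one column per designed pair
fc.paired <- exprs[, samp] - exprs[, ref]

# 'unpaired': every samp column against every ref column
idx <- expand.grid(ref = ref, samp = samp)
fc.unpaired <- exprs[, idx$samp] - exprs[, idx$ref]

# '1ongroup': each samp column against the mean of all ref columns
fc.1ongroup <- exprs[, samp] - rowMeans(exprs[, ref])

# 'as.group': mean of the samp group against mean of the ref group
fc.group <- rowMeans(exprs[, samp]) - rowMeans(exprs[, ref])

n.comparisons <- c(paired = ncol(fc.paired), unpaired = ncol(fc.unpaired),
                   `1ongroup` = ncol(fc.1ongroup), as.group = 1)
print(n.comparisons)
```

With 3 ref and 3 samp columns, 'unpaired' yields 3 x 3 raw comparisons while the other per-sample schemes yield 3.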

Argument: rank.test
Boolean, whether to do the optional rank-based two-sample t-test (equivalent to the non-parametric Wilcoxon Mann-Whitney test) instead of the parametric two-sample t-test. Default rank.test = FALSE. This argument should be set in coordination with the saaTest argument.

Argument: use.fold
Boolean, whether to use fold changes or t-test statistics as the per-gene statistics. Default use.fold = TRUE.

Argument: FDR.adj
Boolean, whether to adjust for multiple testing so as to control the FDR (false discovery rate). Default is TRUE.

Argument: weights
a numeric vector specifying the weights assigned to ref-samp pairs. This is needed for data with both technical and biological replicates, to account for the different contributions of the two types of replicates. This argument is also useful for manually pairing ref-samp for unpaired data, as in the pairData function. Default is NULL.

Argument: full.table
This option is obsolete. Boolean, whether to output the full table of all individual p-values from the pairwise comparisons of ref and samp. Default is FALSE.

Argument: saaPrep
function used for data preparation in single array based analysis, including sanity checks, sample pairing, per-gene statistics calculation, etc. Default is gagePrep, i.e. the default data preparation routine for gage analysis.

Argument: saaTest
function used for gene set tests in single array based analysis. Default is gs.tTest, which features a two-sample t-test for differential expression of gene sets. Other options include gs.zTest, which uses a one-sample z-test as in PAGE, and gs.KSTest, which uses the non-parametric Kolmogorov-Smirnov test as in GSEA. The two non-default options should only be used when rank.test = FALSE.

Argument: saaSum
function used to summarize the results from single array analysis (i.e. pairwise comparisons between ref and samp). This function should include a meta-test for a global p-value or summary statistic and an FDR adjustment for the multiple-testing issue. Default is gageSum, i.e. the default data summarization routine for gage analysis.

Argument: rawRes
a named list, the raw results of gene set tests. See the help pages of gene set test functions such as gs.tTest for details.

Argument: test4up
Boolean, whether to summarize the p-value results for the up-regulation test (p.results) or not (ps.results for down-regulation). This argument is only needed when same.dir = TRUE in the main gage function, i.e. when testing for one-directional changes.

Argument: use.stouffer
Boolean, whether to use Stouffer's method when summarizing individual p-values into a global p-value. Default is TRUE. This default setting is recommended to avoid "dual significance", i.e. a gene set being significant in both the up-regulation and the down-regulation test simultaneously. Dual significance sometimes occurs for data with large sample size, due to extremely small p-values in a few pair-wise comparisons. More robust p-value summarization using Stouffer's method is an important new feature added to GAGE since version 2.2.0 (Bioconductor 2.8). This argument provides the option of falling back (use.stouffer = FALSE) to the original summarization based on the Gamma distribution.
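
Why a few extreme p-values can produce dual significance may be easier to see with numbers. The sketch below uses textbook formulations of the two meta-tests in base R. The functions stouffer and neglog.gamma and the p-values are illustrative assumptions, not gage's internal code or real results.

```r
# Stouffer's method: sum of Z-scores of one-sided p-values, rescaled by sqrt(n)
stouffer <- function(p) {
  z <- sum(qnorm(p, lower.tail = FALSE)) / sqrt(length(p))
  pnorm(z, lower.tail = FALSE)
}

# Negative log sum: under the null, -sum(log(p)) ~ Gamma(shape = n, rate = 1)
neglog.gamma <- function(p) {
  pgamma(-sum(log(p)), shape = length(p), rate = 1, lower.tail = FALSE)
}

# Hypothetical up-regulation p-values from 10 pairwise comparisons:
# one extreme value, the other nine pointing the opposite way
p.up <- c(1e-12, rep(0.9, 9))
# For one-sided tests the down-regulation p-values mirror them
p.down <- 1 - p.up

global <- c(stouffer.up   = stouffer(p.up),
            stouffer.down = stouffer(p.down),
            gamma.up      = neglog.gamma(p.up),
            gamma.down    = neglog.gamma(p.down))
print(signif(global, 3))
```

In this toy case the Gamma-based summary falls below 0.05 in both directions, driven by the single extreme p-value on one side and the nine moderate ones on the other, while Stouffer's method calls neither direction significant at that level.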

Argument: ...
other arguments to be passed on to the optional functions supplied as saaPrep, saaTest and saaSum.

----------Details----------

We proposed a single array analysis (i.e. one-on-one comparison) approach with GAGE. Here we make single array analysis a general workflow for gene set analysis. Single array analysis has 4 major steps: Step 1, sample pairing; Step 2, per-gene tests; Step 3, gene set tests; and Step 4, meta-test summarization. Correspondingly, the main function, gage, is divided into 3 relatively independent modules. Module 1, input preparation, covers steps 1-2 of single array analysis. Module 2 corresponds to step 3, the gene set test, and module 3 to step 4, meta-test summarization. These 3 modules become 3 function arguments to gage: saaPrep, saaTest and saaSum. This modularization makes gage open to customization or plug-in routines at each step and fully realizes the general applicability of single array analysis. More examples will be included in a second vignette to demonstrate customization with these modules.
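
The three-module design can be mimicked with a toy pipeline in base R. Everything here (toyPrep, toyTest, toySum, toyGage and the simulated data) is a hypothetical stand-in written for illustration. It does not reproduce the behaviour of gagePrep, gs.tTest or gageSum, only the idea of passing each stage in as a function.

```r
# Module 1 (steps 1-2): pair samples and compute per-gene statistics
toyPrep <- function(exprs, ref, samp) exprs[, samp] - exprs[, ref]

# Module 2 (step 3): one-sided two-sample t-test of set genes vs all other genes,
# run separately for every column of per-gene statistics
toyTest <- function(stats, gsets) {
  t(sapply(gsets, function(gs) {
    inset <- rownames(stats) %in% gs
    apply(stats, 2, function(x)
      t.test(x[inset], x[!inset], alternative = "greater")$p.value)
  }))
}

# Module 3 (step 4): meta-test over columns (Stouffer's method)
toySum <- function(pmat) {
  apply(pmat, 1, function(p)
    pnorm(sum(qnorm(p, lower.tail = FALSE)) / sqrt(length(p)),
          lower.tail = FALSE))
}

# The main function receives the modules as arguments, so each can be swapped
toyGage <- function(exprs, gsets, ref, samp,
                    prep = toyPrep, test = toyTest, summarize = toySum) {
  summarize(test(prep(exprs, ref, samp), gsets))
}

set.seed(7)
exprs <- matrix(rnorm(40 * 6), nrow = 40,
                dimnames = list(paste0("g", 1:40), paste0("s", 1:6)))
exprs[1:10, 4:6] <- exprs[1:10, 4:6] + 2   # shift the "up" set in samp columns
gsets <- list(up = paste0("g", 1:10), null = paste0("g", 21:30))

p.glob <- toyGage(exprs, gsets, ref = 1:3, samp = 4:6)
print(p.glob)
```

Swapping in a different test is then a matter of passing another function with the same inputs and outputs, which mirrors how saaPrep, saaTest and saaSum are meant to be replaced.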

Some important updates have been made to the gage package since version 2.2.0 (Bioconductor 2.8).

First, p-value summarization is more robust, using Stouffer's method through the argument use.stouffer = TRUE. The original p-value summarization, i.e. the negative log sum following a Gamma distribution under the null hypothesis, may produce less stable global p-values for large or heterogeneous datasets. In other words, the global p-value could be heavily affected by a small subset of extremely small individual p-values from pair-wise comparisons. Such a sensitive global p-value leads to the "dual significance" phenomenon: a gene set is called significant simultaneously in both one-directional tests (up- and down-regulated). Dual significance could be informative in revealing sub-types or sub-classes in big clinical or disease studies, but may not be desirable in other cases.

Second, the output of the gage function now includes the gene set test statistics from pair-wise comparisons for all proper gene sets. The output is now always a named list, with either 3 elements ("greater", "less", "stats") for the one-directional test or 2 elements ("greater", "stats") for the two-directional test.

Third, the individual p-values (and test statistics) from dependent pair-wise comparisons, i.e. comparisons of the same experiment against different controls, are now summarized into a single value. In other words, the number of columns of individual p-values or statistics is always the same as the number of samples in the experiment (or disease) group. This change makes the argument value compare = "1ongroup" and the argument full.table less useful. It also makes it easier to check the perturbations at the gene-set level for individual samples.

Fourth, whole gene-set level changes (either p-values or statistics) can now be visualized using heatmaps, thanks to the third change above. Correspondingly, the functions sigGeneSet and gagePipe have been revised to plot heatmaps for whole gene sets.

----------Value----------

The result returned by the gage function is a named list, with either 3 elements ("greater", "less", "stats") for the one-directional test (same.dir = TRUE) or 2 elements ("greater", "stats") for the two-directional test (same.dir = FALSE). Elements "greater" and "less" are two data matrices of the same structure, mainly holding the p-values; element "stats" contains the test statistics. Each data matrix has gene sets as rows, sorted by global p- or q-values. Test significance or statistics columns include:

Column: p.geomean
geometric mean of the individual p-values from multiple single array based gene set tests

Column: stat.mean
mean of the individual statistics from multiple single array based gene set tests. Normally, its absolute value measures the magnitude of gene-set level changes, and its sign indicates the direction of the changes. When saaTest = gs.KSTest, stat.mean is always positive.

Column: p.val
global p-value or summary of the individual p-values from multiple single array based gene set tests. This is the default p-value being used.

Column: q.val
FDR q-value adjustment of the global p-value using the Benjamini & Hochberg procedure implemented in the multtest package. This is the default q-value being used.
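
How these summary columns relate to the individual p-values can be sketched in base R. The p-values below are made up, Stouffer's method is used as one possible meta-test for p.val, and base R's p.adjust stands in for the multtest implementation of the Benjamini & Hochberg procedure; none of this is gage's own code.

```r
# Hypothetical individual p-values: 4 gene sets (rows) x 3 comparisons (columns)
p.ind <- matrix(c(0.001, 0.004, 0.002,
                  0.20,  0.35,  0.10,
                  0.03,  0.01,  0.06,
                  0.80,  0.55,  0.90),
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("set", 1:4), paste0("exp", 1:3)))

# p.geomean: geometric mean of the individual p-values of each gene set
p.geomean <- exp(rowMeans(log(p.ind)))

# p.val: global p-value per gene set (Stouffer's method as the meta-test)
p.val <- apply(p.ind, 1, function(p)
  pnorm(sum(qnorm(p, lower.tail = FALSE)) / sqrt(length(p)),
        lower.tail = FALSE))

# q.val: Benjamini-Hochberg adjustment of the global p-values
q.val <- p.adjust(p.val, method = "BH")

# Gene sets as rows, sorted by q-value and then p-value
res <- cbind(p.geomean, p.val, q.val)
res <- res[order(res[, "q.val"], res[, "p.val"]), ]
print(signif(res, 3))
```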

Column: set.size
the effective gene set size, i.e. the number of genes included in the gene set test

Column: other columns
columns of the individual p-values or statistics, each measuring the gene set perturbation in a single experiment (vs its control or all controls, depending on the value of the compare argument)

The result returned by gagePrep is a data matrix derived from exprs, but ready for column-wise gene set tests. In the matrix, genes are rows, and columns are the per-gene test statistics from the ref-samp pairwise comparisons.

The result returned by gageSum is almost identical to the result of the gage function: it is also a named list but has only 2 elements, "p.glob" and "results", with one round of test results.

----------Author(s)----------

Weijun Luo <luo_weijun@yahoo.com>

----------References----------

Generally Applicable Gene Set Enrichment for Pathways Analysis. BMC Bioinformatics 2009, 10:161

----------See Also----------

gs.tTest, gs.zTest, and gs.KSTest functions used for gene set tests; gagePipe and heter.gage functions used for multiple GAGE analyses in a batch or for combined GAGE analysis on heterogeneous data

----------Examples----------

library(gage)

# load the example expression data and locate the control (HN) and
# experiment (DCIS) columns by their column names
data(gse16873)
cn <- colnames(gse16873)
hn <- grep('HN', cn, ignore.case = TRUE)
dcis <- grep('DCIS', cn, ignore.case = TRUE)
data(kegg.gs)
data(go.gs)

# kegg test for 1-directional changes
gse16873.kegg.p <- gage(gse16873, gsets = kegg.gs,
                        ref = hn, samp = dcis)
# go.gs with the first 1000 entries as a fast example
gse16873.go.p <- gage(gse16873, gsets = go.gs[1:1000],
                      ref = hn, samp = dcis)
str(gse16873.kegg.p)
head(gse16873.kegg.p$greater)
head(gse16873.kegg.p$less)
head(gse16873.kegg.p$stats)
# kegg test for 2-directional changes
gse16873.kegg.2d.p <- gage(gse16873, gsets = kegg.gs,
                           ref = hn, samp = dcis, same.dir = FALSE)
head(gse16873.kegg.2d.p$greater)
head(gse16873.kegg.2d.p$stats)

### alternative ways to do GAGE analysis ###
# with unpaired samples
gse16873.kegg.unpaired.p <- gage(gse16873, gsets = kegg.gs,
                                 ref = hn, samp = dcis, compare = "unpaired")

# other options to tweak include:
# saaTest, use.fold, rank.test, etc. Check the Arguments section above for
# details and the vignette for more examples.

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).
