R gaga package: help documentation for the forwsimDiffExpr() function

loveR · posted 2012-2-25 18:21:05

forwsimDiffExpr() belongs to R package: gaga

Forward simulation for differential expression.

Translator: LoveR, the biostatistic.net robot

----------Description----------

Forward simulation makes it possible to evaluate the expected utility of sequential designs. Here the utility is the expected number of true discoveries minus a sampling cost. The routine simulates future data either from the prior predictive distribution or from a set of pilot data and a GaGa model fit. At each future time point, it computes a summary statistic that is used to decide when to stop the experiment.

----------Usage----------

forwsimDiffExpr(gg.fit, x, groups, ngenes, maxBatch, batchSize, fdrmax = 0.05, genelimit, v0thre = 1, B = 100,
Bsummary = 100, trace = TRUE, randomSeed)

----------Arguments----------

Argument: gg.fit
GaGa or MiGaGa fit (object of type gagafit, as returned by fitGG).

Argument: x
ExpressionSet, exprSet, data frame or matrix containing the gene expression measurements used to fit the model.

Argument: groups
If x is of type ExpressionSet or exprSet, groups should be the name of the column in pData(x) that holds the groups one wishes to compare. If x is a matrix or a data frame, groups should be a vector indicating the group to which each column of x belongs.

Argument: ngenes
Number of genes to simulate data for. If x is specified, this argument is set to nrow(x) and data are simulated from the posterior predictive distribution conditional on x. If x is not specified, data are simulated from the prior predictive distribution.

Argument: maxBatch
Maximum number of batches, i.e. the routine simulates batchSize*maxBatch samples per group.

Argument: batchSize
Batch size, i.e. number of observations per group to simulate at each time point. Defaults to ncol(x)/length(unique(groups)).
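
As a rough illustration of how the two batch arguments relate, the base R sketch below computes the documented default for batchSize and the resulting number of simulated samples per group. The objects x_toy, groups_toy and max_batch are made-up names and values for illustration only; no gaga function is called.

```r
# Toy expression matrix: 5 genes (rows), 6 samples (columns); values are arbitrary
x_toy <- matrix(rnorm(30), nrow = 5, ncol = 6)
groups_toy <- c(1, 1, 1, 2, 2, 2)   # group label for each column of x_toy

# Documented default for batchSize: ncol(x) / length(unique(groups))
batch_size <- ncol(x_toy) / length(unique(groups_toy))

# With maxBatch batches, batchSize * maxBatch samples are simulated per group
max_batch <- 4                      # assumed value, for illustration
samples_per_group <- batch_size * max_batch
samples_per_group
```

With 6 columns split into 2 groups the default batch size is 3, so 4 batches correspond to 12 simulated samples per group.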

Argument: fdrmax
Upper bound on FDR.

Argument: genelimit
Only the genelimit genes with the lowest probability of being equally expressed across all groups will be simulated. Setting this limit can significantly increase the computational speed.

Argument: v0thre
Only genes with posterior probability of being equally expressed < v0thre will be simulated. Setting this limit can significantly increase the computational speed.
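
The selection rules described for genelimit and v0thre can be sketched in base R as follows. This only illustrates the rules as worded above; the probabilities in p_ee are hypothetical, and applying the threshold first and the limit second is an assumption, not a statement about how the package combines them internally.

```r
# Hypothetical posterior probabilities of equal expression, one per gene
p_ee <- c(g1 = 0.02, g2 = 0.95, g3 = 0.40, g4 = 0.10, g5 = 0.99)

v0thre <- 0.5      # keep genes whose probability is below this threshold
genelimit <- 2     # then keep at most this many genes

# Apply the v0thre filter
candidates <- p_ee[p_ee < v0thre]

# Keep the genelimit genes with the lowest probability of equal expression
n_keep <- min(genelimit, length(candidates))
selected <- names(sort(candidates))[seq_len(n_keep)]
selected
```

Fewer selected genes means fewer genes to simulate at every time point, which is why both limits reduce run time.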

Argument: B
Number of forward simulations.

Argument: Bsummary
Number of simulations for estimating the summary statistic.

Argument: trace
If trace==TRUE, iteration progress is displayed.

Argument: randomSeed
Integer value used to set the random number generator seed. Defaults to as.numeric(Sys.time()) modulo 10^6.
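
For reference, the documented default can be written in base R as below, followed by a small demonstration of why fixing a seed gives reproducible draws. This is a sketch only: how forwsimDiffExpr applies the seed internally is not described here, and the variable names are made up.

```r
# Documented default: current time in seconds, modulo 10^6
default_seed <- as.numeric(Sys.time()) %% 10^6

# Fixing the seed makes repeated random draws identical
set.seed(1); a <- rnorm(3)
set.seed(1); b <- rnorm(3)
identical(a, b)
```

Because the default depends on the clock, passing an explicit randomSeed, as the Examples do, is the natural choice when results need to be repeatable.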

----------Details----------

To improve computational speed, hyper-parameters are not re-estimated as new data are simulated.

----------Value----------

A data.frame with the following columns:

Column: simid
Simulation number.

Column: j
Time (sample size).

Column: u
Expected number of true positives if we were to stop experimentation at this time.

Column: fdr
Expected FDR if we were to stop experimentation at this time.

Column: fnr
Expected FNR if we were to stop experimentation at this time.

Column: power
Expected power (as estimated by E(TP)/E(positives)) if we were to stop experimentation at this time.

Column: summary
Summary statistic: increase in expected true positives if we were to obtain one more data batch.
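
To make the utility calculation concrete, here is a minimal base R sketch on a toy data.frame that mimics a few of the columns listed above. All values are invented, and sampling_cost is an assumed per-batch cost. Note that the time column is listed here as j, whereas the Examples section accesses it as fs1$time; checking names() on the real output is the safest way to see which name applies.

```r
# Toy stand-in for the returned data.frame: 3 simulations, times 0 to 2
fs_toy <- data.frame(
  simid = rep(1:3, each = 3),
  j     = rep(0:2, times = 3),
  u     = c(10, 14, 16,  9, 13, 15,  11, 15, 17)
)

# Expected number of true positives at each time, averaged over simulations
mean_u <- tapply(fs_toy$u, fs_toy$j, mean)

# Expected utility: true positives minus sampling cost times number of batches
sampling_cost <- 0.5   # assumed cost per batch, for illustration
utility <- mean_u - sampling_cost * as.numeric(names(mean_u))
utility
```

In this toy case the utility still increases at the last time point, so a fixed design would use every batch; a larger sampling cost would move the optimum earlier.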

----------Author(s)----------

David Rossell.

----------References----------

high-throughput hypothesis testing experiments. http://sites.google.com/site/rosselldavid/home.
data analysis. Annals of Applied Statistics, 2009, 3, 1035-1051.

----------See Also----------

fitGG for fitting a GaGa model, seqBoundariesGrid for finding the optimal design based on the forward simulation output.

----------Examples----------

#Simulate data and fit GaGa model
set.seed(1)
x <- simGG(n=20,m=2,p.de=.5,a0=3,nu=.5,balpha=.5,nualpha=25)
gg1 <- fitGG(x,groups=1:2,method='EM')
gg1 <- parest(gg1,x=x,groups=1:2)

#Run forward simulation
fs1 <- forwsimDiffExpr(gg1, x=x, groups=1:2,
maxBatch=2,batchSize=1,fdrmax=0.05, B=100, Bsummary=100, randomSeed=1)

#Expected number of true positives for each sample size
tapply(fs1$u,fs1$time,'mean')

#Expected utility for each sample size
samplingCost <- 0.01
tapply(fs1$u,fs1$time,'mean') - samplingCost*(0:2)

#Optimal sequential design
b0seq <- seq(0,20,length=200); b1seq <- seq(0,40,length=200)
bopt <-seqBoundariesGrid(b0=b0seq,b1=b1seq,forwsim=fs1,samplingCost=samplingCost,powmin=0)
bopt <- bopt$opt

plot(fs1$time,fs1$u,xlab='Additional batches',ylab='E(newly discovered DE genes)')
abline(bopt['b0'],bopt['b1'])
text(.2,bopt['b0'],'Continue',pos=3)
text(.2,bopt['b0'],'Stop',pos=1)

When reposting, please credit the source: biostatistic.net (http://www.biostatistic.net).

