R language gaga package: help documentation for the forwsimDiffExpr() function

Posted on 2012-2-25 18:21:05
forwsimDiffExpr(gaga)
forwsimDiffExpr() belongs to the R package: gaga

Forward simulation for differential expression.

Translator: 生物统计家园网 robot LoveR

----------Description----------

Forward simulation makes it possible to evaluate the expected utility of sequential designs. Here the utility is the expected number of true discoveries minus a sampling cost. The routine simulates future data either from the prior predictive distribution or from a set of pilot data and a GaGa model fit. At each future time point, it computes a summary statistic that is used to decide when to stop the experiment.
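The utility described above (expected true discoveries minus a sampling cost) can be illustrated with a small base-R sketch. The data frame `sims`, its values and the cost of 0.5 per batch are hypothetical stand-ins chosen for illustration; they are not output of forwsimDiffExpr.

```r
# Hypothetical forward-simulation summary: 3 simulations, time points 0..2.
# 'u' stands for the expected number of true discoveries at each time point.
sims <- data.frame(
  simid = rep(1:3, each = 3),
  time  = rep(0:2, times = 3),
  u     = c(10, 14, 15,  9, 13, 16,  11, 15, 17)
)
samplingCost <- 0.5  # assumed cost per additional batch

# Average over simulations at each time point, then subtract the sampling cost.
meanU   <- tapply(sims$u, sims$time, mean)
utility <- meanU - samplingCost * as.numeric(names(meanU))

# Time point with the highest expected utility in this toy example.
bestTime <- as.numeric(names(utility)[which.max(utility)])
```

This only evaluates fixed stopping times; a sequential design instead decides at each time point whether to continue, based on the summary statistic.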


----------Usage----------


forwsimDiffExpr(gg.fit, x, groups, ngenes, maxBatch, batchSize, fdrmax = 0.05,
  genelimit, v0thre = 1, B = 100, Bsummary = 100, trace = TRUE, randomSeed)



----------Arguments----------

Argument: gg.fit
GaGa or MiGaGa fit (object of type gagafit, as returned by fitGG).


Argument: x
ExpressionSet, exprSet, data frame or matrix containing the gene expression measurements used to fit the model.


Argument: groups
If x is of type ExpressionSet or exprSet, groups should be the name of the column in pData(x) containing the groups to compare. If x is a matrix or a data frame, groups should be a vector indicating the group to which each column of x belongs.


Argument: ngenes
Number of genes to simulate data for. If x is specified, this argument is set to nrow(x) and data are simulated from the posterior predictive distribution conditional on x. If x is not specified, data are simulated from the prior predictive distribution.


Argument: maxBatch
Maximum number of batches, i.e. the routine simulates batchSize*maxBatch samples per group.


Argument: batchSize
Batch size, i.e. the number of observations per group to simulate at each time point. Defaults to ncol(x)/length(unique(groups)).
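As a rough illustration of how batchSize and maxBatch relate, the following base-R sketch evaluates the documented default expression on a made-up matrix. The matrix `x`, the `groups` vector and the variable names `batchSizeDefault` and `samplesPerGroup` are assumptions for this example only.

```r
# Hypothetical expression matrix: 5 genes (rows) by 6 samples (columns).
set.seed(1)
x <- matrix(rgamma(30, shape = 2, rate = 1), nrow = 5, ncol = 6)
groups <- c(1, 1, 1, 2, 2, 2)  # group label for each column of x

# Documented default for batchSize: samples per group in the pilot data.
batchSizeDefault <- ncol(x) / length(unique(groups))

# With maxBatch batches, batchSize*maxBatch samples per group are simulated.
maxBatch <- 2
samplesPerGroup <- batchSizeDefault * maxBatch
```

With three pilot samples per group, the default batch size is 3, so two batches correspond to six simulated samples per group.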


Argument: fdrmax
Upper bound on the false discovery rate (FDR).


Argument: genelimit
Only the genelimit genes with the lowest probability of being equally expressed across all groups are simulated. Setting this limit can significantly reduce computation time.


Argument: v0thre
Only genes with posterior probability of being equally expressed < v0thre are simulated. Setting this limit can significantly reduce computation time.
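The effect of genelimit and v0thre can be pictured with a base-R sketch on made-up probabilities. The vector `ppEqual` is hypothetical, and the order in which the two filters are combined here is an assumption for illustration; the package may apply them differently internally.

```r
# Hypothetical posterior probabilities of equal expression, one per gene.
ppEqual <- c(g1 = 0.95, g2 = 0.10, g3 = 0.40, g4 = 0.02, g5 = 0.99, g6 = 0.30)

v0thre    <- 0.5  # keep genes with probability of equal expression below this
genelimit <- 3    # then keep at most this many genes

# Threshold filter: genes with probability of equal expression < v0thre.
keep <- ppEqual[ppEqual < v0thre]

# Limit filter: the genelimit genes with the lowest such probability.
keep <- head(sort(keep), genelimit)
selected <- names(keep)
```

Fewer retained genes means fewer genes to simulate in each forward simulation, which is where the speed gain comes from.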


Argument: B
Number of forward simulations.


Argument: Bsummary
Number of simulations for estimating the summary statistic.


Argument: trace
If trace==TRUE, iteration progress is displayed.


Argument: randomSeed
Integer value used to set the random number generator seed. Defaults to as.numeric(Sys.time()) modulo 10^6.
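The documented default seed expression can be evaluated directly in base R. The sketch below also shows, using plain set.seed() rather than the package itself, why passing a fixed seed makes simulation results reproducible. The variable names are illustrative.

```r
# Documented default: current time in seconds, modulo 10^6.
seedDefault <- as.numeric(Sys.time()) %% 10^6

# A fixed seed gives identical random draws on repeated runs.
set.seed(1)
a <- runif(3)
set.seed(1)
b <- runif(3)
sameDraws <- identical(a, b)
```

Because the default depends on the clock, two runs started at different times will generally differ unless randomSeed is set explicitly.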


----------Details----------

To improve computational speed, hyper-parameters are not re-estimated as new data are simulated.


----------Value----------

A data.frame with the following columns:


Column: simid
Simulation number.


Column: j
Time (sample size).


Column: u
Expected number of true positives if we were to stop experimentation at this time.


Column: fdr
Expected FDR if we were to stop experimentation at this time.


Column: fnr
Expected false negative rate (FNR) if we were to stop experimentation at this time.


Column: power
Expected power (as estimated by E(TP)/E(positives)) if we were to stop experimentation at this time.


Column: summary
Summary statistic: increase in expected true positives if we were to obtain one more data batch.
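One way to inspect a result of this shape is to average the per-simulation columns at each time point. The sketch below uses a hand-made data frame `fs` with made-up values, not real output. It names the time column `j` as listed above, whereas the code in the Examples section indexes it as `time`, so checking names() on an actual result first is advisable.

```r
# Hypothetical result with the documented layout: 2 simulations, times 0..2.
fs <- data.frame(
  simid = rep(1:2, each = 3),
  j     = rep(0:2, times = 2),
  u     = c(5, 8, 9, 7, 10, 11),
  fdr   = c(0.04, 0.03, 0.02, 0.05, 0.04, 0.03),
  power = c(0.2, 0.4, 0.5, 0.3, 0.5, 0.6)
)

# Average expected true positives, FDR and power across simulations per time.
avg <- aggregate(cbind(u, fdr, power) ~ j, data = fs, FUN = mean)
```

Each row of `avg` then describes what stopping at that time point would yield on average over the simulations.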


----------Author(s)----------


David Rossell.



----------References----------

high-throughput hypothesis testing experiments. http://sites.google.com/site/rosselldavid/home.
data analysis. Annals of Applied Statistics, 2009, 3, 1035-1051.

----------See Also----------

fitGG for fitting a GaGa model; seqBoundariesGrid for finding the optimal design based on the forward simulation output.


----------Examples----------


# Simulate data and fit GaGa model
set.seed(1)
x <- simGG(n=20,m=2,p.de=.5,a0=3,nu=.5,balpha=.5,nualpha=25)
gg1 <- fitGG(x,groups=1:2,method='EM')
gg1 <- parest(gg1,x=x,groups=1:2)

# Run forward simulation
fs1 <- forwsimDiffExpr(gg1, x=x, groups=1:2,
maxBatch=2,batchSize=1,fdrmax=0.05, B=100, Bsummary=100, randomSeed=1)

# Expected number of true positives for each sample size
tapply(fs1$u,fs1$time,'mean')

# Expected utility for each sample size
samplingCost <- 0.01
tapply(fs1$u,fs1$time,'mean') - samplingCost*(0:2)

# Optimal sequential design
b0seq <- seq(0,20,length=200); b1seq <- seq(0,40,length=200)
bopt <-seqBoundariesGrid(b0=b0seq,b1=b1seq,forwsim=fs1,samplingCost=samplingCost,powmin=0)
bopt <- bopt$opt

plot(fs1$time,fs1$u,xlab='Additional batches',ylab='E(newly discovered DE genes)')
abline(bopt['b0'],bopt['b1'])
text(.2,bopt['b0'],'Continue',pos=3)
text(.2,bopt['b0'],'Stop',pos=1)

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).


Note: This document was machine-translated by the 生物统计家园网 robot LoveR and is intended for personal R study reference only; 生物统计家园 retains copyright. Because the translation was automatic, some inaccuracies are unavoidable.