forwsimDiffExpr(gaga)
forwsimDiffExpr() is part of the R package gaga.
Forward simulation for differential expression.
----------Description----------
Forward simulation makes it possible to evaluate the expected utility of sequential designs. Here the utility is the expected number of true discoveries minus a sampling cost. The routine simulates future data either from the prior predictive distribution or from the posterior predictive given a set of pilot data and a GaGa model fit. At each future time point, it computes a summary statistic that is used to determine when to stop the experiment.
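As a rough illustration of the utility described above, the following base-R sketch averages a number of true discoveries over simulated paths at each time point and subtracts a linear sampling cost. The data frame `sims`, its values, and the value of `samplingCost` are made up for illustration; they are not output of forwsimDiffExpr.

```r
# Toy stand-in for forward-simulation output (hypothetical values)
sims <- data.frame(
  simid = rep(1:3, each = 3),               # simulation number
  time  = rep(0:2, times = 3),              # number of additional batches
  u     = c(5, 8, 9,  4, 7, 9,  6, 9, 10)   # expected true discoveries
)
samplingCost <- 0.5   # assumed cost per additional batch

# Average expected true discoveries at each time point
meanU <- tapply(sims$u, sims$time, mean)

# Expected utility = expected true discoveries - sampling cost
utility <- meanU - samplingCost * as.numeric(names(meanU))
print(utility)

# Fixed sample size with the highest expected utility
bestTime <- as.numeric(names(utility)[which.max(utility)])
print(bestTime)
```

This only compares fixed stopping times; a sequential design instead decides at each time point whether to continue, based on the summary statistic.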
----------Usage----------
forwsimDiffExpr(gg.fit, x, groups, ngenes, maxBatch, batchSize, fdrmax = 0.05, genelimit, v0thre = 1, B = 100,
Bsummary = 100, trace = TRUE, randomSeed)
----------Arguments----------
Argument: gg.fit
GaGa or MiGaGa fit (object of type gagafit, as returned by fitGG).
Argument: x
ExpressionSet, exprSet, data frame or matrix containing the gene expression measurements used to fit the model.
Argument: groups
If x is of type ExpressionSet or exprSet, groups should be the name of the column in pData(x) containing the groups that one wishes to compare. If x is a matrix or a data frame, groups should be a vector indicating the group to which each column of x corresponds.
Argument: ngenes
Number of genes to simulate data for. If x is specified, this argument is set to nrow(x) and data are simulated from the posterior predictive conditional on x. If x is not specified, simulation is from the prior predictive.
Argument: maxBatch
Maximum number of batches, i.e. the routine simulates batchSize*maxBatch samples per group.
Argument: batchSize
Batch size, i.e. number of observations per group to simulate at each time point. Defaults to ncol(x)/length(unique(groups)).
Argument: fdrmax
Upper bound on the false discovery rate (FDR).
Argument: genelimit
Only the genelimit genes with the lowest probability of being equally expressed across all groups will be simulated. Setting this limit can significantly increase the computational speed.
Argument: v0thre
Only genes with posterior probability of being equally expressed < v0thre will be simulated. Setting this limit can significantly increase the computational speed.
Argument: B
Number of forward simulations.
Argument: Bsummary
Number of simulations for estimating the summary statistic.
Argument: trace
If trace==TRUE, iteration progress is displayed.
Argument: randomSeed
Integer value used to set the random number generator seed. Defaults to as.numeric(Sys.time()) modulo 10^6.
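The documented defaults and limits above can be sketched in base R without the gaga package. The matrix `x`, the vector `pp0` of posterior probabilities of equal expression, and the way the v0thre and genelimit filters are combined are assumptions made for illustration, not the package's internal code.

```r
set.seed(1)
# Hypothetical pilot data: 100 genes, 6 arrays in 2 groups
x <- matrix(rgamma(600, shape = 2), nrow = 100, ncol = 6)
groups <- c(1, 1, 1, 2, 2, 2)

# Default batchSize as documented: ncol(x)/length(unique(groups))
batchSize <- ncol(x) / length(unique(groups))

# Samples per group simulated over the whole forward simulation
maxBatch <- 2
totalPerGroup <- batchSize * maxBatch

# Default seed as documented: current time modulo 10^6
randomSeed <- as.numeric(Sys.time()) %% 10^6

# Assumed gene pre-filter; pp0 is a made-up vector holding one
# posterior probability of equal expression per gene
pp0 <- runif(nrow(x))
v0thre <- 0.9
genelimit <- 20
keep <- which(pp0 < v0thre)        # genes passing the v0thre filter
keep <- keep[order(pp0[keep])]     # lowest probability first
keep <- head(keep, genelimit)      # at most genelimit genes
```

With these toy inputs each batch adds 3 arrays per group, so 6 arrays per group are simulated in total, and at most 20 genes are carried into the simulation.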
----------Details----------
To improve computational speed, hyper-parameters are not re-estimated as new data are simulated.
----------Value----------
A data.frame with the following columns:
Column: simid
Simulation number.
Column: j
Time (sample size).
Column: u
Expected number of true positives if we were to stop experimentation at this time.
Column: fdr
Expected FDR if we were to stop experimentation at this time.
Column: fnr
Expected FNR if we were to stop experimentation at this time.
Column: power
Expected power (as estimated by E(TP)/E(positives)) if we were to stop experimentation at this time.
Column: summary
Summary statistic: increase in expected true positives if we were to obtain one more data batch.
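To make the quantities in these columns more concrete, here is a minimal sketch of one common way to obtain an expected number of true positives, an expected FDR and an expected FNR from posterior probabilities of differential expression. The vector `ppDE` is made up, and the selection rule (declare the largest gene list whose expected FDR stays within fdrmax) is an assumption for illustration; it is not claimed to be the exact computation inside forwsimDiffExpr.

```r
# Made-up posterior probabilities of differential expression (DE)
ppDE <- c(0.99, 0.97, 0.95, 0.90, 0.60, 0.30, 0.10, 0.02)
fdrmax <- 0.05

# Rank genes from most to least likely DE
ord <- order(ppDE, decreasing = TRUE)

# Expected FDR if the top k genes are declared DE, for k = 1, 2, ...
efdr <- cumsum(1 - ppDE[ord]) / seq_along(ord)

# Largest list whose expected FDR stays within fdrmax (0 if none)
k <- max(c(0, which(efdr <= fdrmax)))
called <- ord[seq_len(k)]
notCalled <- setdiff(seq_along(ppDE), called)

# Expected true positives among the declared genes
u <- sum(ppDE[called])
# Expected proportion of false positives among the declared genes
fdr <- if (k > 0) mean(1 - ppDE[called]) else 0
# Expected proportion of DE genes among those not declared
fnr <- if (length(notCalled) > 0) mean(ppDE[notCalled]) else 0

print(c(k = k, u = u, fdr = fdr, fnr = fnr))
```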
----------Author(s)----------
David Rossell.
----------References----------
high-throughput hypothesis testing experiments. http://sites.google.com/site/rosselldavid/home.
data analysis. Annals of Applied Statistics, 2009, 3, 1035-1051.
----------See Also----------
fitGG for fitting a GaGa model, seqBoundariesGrid for finding the optimal design based on the forward simulation output.
----------Examples----------
#Simulate data and fit GaGa model
set.seed(1)
x <- simGG(n=20,m=2,p.de=.5,a0=3,nu=.5,balpha=.5,nualpha=25)
gg1 <- fitGG(x,groups=1:2,method='EM')
gg1 <- parest(gg1,x=x,groups=1:2)
#Run forward simulation
fs1 <- forwsimDiffExpr(gg1, x=x, groups=1:2,
maxBatch=2,batchSize=1,fdrmax=0.05, B=100, Bsummary=100, randomSeed=1)
#Expected number of true positives for each sample size
tapply(fs1$u,fs1$time,'mean')
#Expected utility for each sample size
samplingCost <- 0.01
tapply(fs1$u,fs1$time,'mean') - samplingCost*(0:2)
#Optimal sequential design
b0seq <- seq(0,20,length=200); b1seq <- seq(0,40,length=200)
bopt <-seqBoundariesGrid(b0=b0seq,b1=b1seq,forwsim=fs1,samplingCost=samplingCost,powmin=0)
bopt <- bopt$opt
plot(fs1$time,fs1$u,xlab='Additional batches',ylab='E(newly discovered DE genes)')
abline(bopt['b0'],bopt['b1'])
text(.2,bopt['b0'],'Continue',pos=3)
text(.2,bopt['b0'],'Stop',pos=1)
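The plot above draws a line with intercept b0 and slope b1 and labels the region above it 'Continue' and the region below it 'Stop'. The following base-R sketch applies such a linear boundary to toy simulation paths. The data frame `sims`, the boundary values, and the exact rule (stop at the first time point where the summary statistic is at or below b0 + b1*time) are assumptions for illustration, not the internals of seqBoundariesGrid.

```r
# Toy forward-simulation paths (hypothetical values)
sims <- data.frame(
  simid   = rep(1:2, each = 3),
  time    = rep(0:2, times = 2),
  summary = c(6, 3, 1,   2, 1, 0.5)   # expected gain from one more batch
)
b0 <- 1; b1 <- 1.5   # assumed boundary intercept and slope

# For each simulated path, stop at the first time point where the
# summary statistic falls to or below the line b0 + b1*time;
# if that never happens, stop at the last simulated time point
stopTime <- sapply(split(sims, sims$simid), function(d) {
  d <- d[order(d$time), ]
  below <- d$summary <= b0 + b1 * d$time
  if (any(below)) d$time[which(below)[1]] else max(d$time)
})
print(stopTime)
```

Averaging the utility at each path's stopping time over many simulated paths would give an estimate of the expected utility of that boundary, which is the kind of quantity a grid search over (b0, b1) can compare.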