R CNVtools package: help documentation for the CNVtest.binary() function

loveR · posted 2012-2-25 15:40:13

CNVtest.binary(CNVtools)
R package containing CNVtest.binary(): CNVtools

                                    Fits a mixture of Gaussians to CNV data

                                       Translator: LoveR, the 生物统计家园网 robot

----------Description----------

This function fits a mixture of Gaussians to Copy Number Variant (CNV) data, both under the null hypothesis of no association and under the alternate hypothesis that the CNV frequencies differ between cases and controls.

----------Usage----------

CNVtest.binary(signal, batch, sample = NULL, disease.status = NULL, ncomp,
               n.H0 = 5, n.H1 = 0,
               output = 'compact',
               model.mean = "~ strata(batch, cn)",
               model.var = "~ strata(batch, cn)",
               model.disease = "~ cn",
               association.test.strata = NULL,
               beta.estimated = NULL,
               start.mean = NULL,
               start.var = NULL,
               control = list(tol = 1e-05, max.iter = 3000, min.freq = 4))

----------Arguments----------

Argument: signal
The vector of intensity values, meant to be a proxy for the number of copies.

Argument: batch
Factor that describes how the data points should be separated into batches, corresponding to different technologies used to measure the number of DNA copies, or to different cohorts in a case-control framework.

Argument: sample
Optional (but recommended). A character vector containing a name for each data point, typically the name of the individual.

Argument: disease.status
In the case-control situation, a vector of 0 and 1 values indicating which individuals are controls or cases.

Argument: ncomp
Number of components to fit to the data.

Argument: n.H0
Number of times the EM algorithm should be run to maximize the likelihood under the null hypothesis of no association, each time with a different random starting point. The run that maximizes the likelihood is stored.

Argument: n.H1
Number of times the EM algorithm should be run to maximize the likelihood under the alternate hypothesis that an association is present, each time with a different random starting point. The run that maximizes the likelihood is stored.

Argument: output
The default value, 'compact', returns a data frame with one line per sample. Any other setting will return a much bigger data frame with one line per individual and copy number. This long format is the one used by the underlying fitting algorithm and is only useful if one attempts to use CNVtools in a nonstandard manner.

Argument: model.mean
Formula that describes the linear model for the location of the mean signal intensity. The default is "~ strata(batch, cn)", which means that the mean intensity can take any value for any combination of the variables "cn" (for copy number) and "batch". More traditional model descriptions, such as "~ as.factor(cn)", are also possible, but are likely to be slower to fit and less numerically stable than the "strata" notation, which should be preferred.

Argument: model.var
A formula as above, but to model the variances. Whenever possible, and to maximise speed and stability, the model should be specified using the strata command, for example "~ strata(batch, cn)" (the default), meaning that variances are free to take any value for each combination of the variables "batch" and "cn" (copy number). Alternatives such as "~ cn", i.e. variance proportional to the number of copies, are allowed but are slower to fit and less stable numerically.
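
The formula strings quoted for model.mean and model.var can be checked before fitting. The sketch below only uses base R to confirm that each string parses as a one-sided formula and to list the variables it refers to; the object names (mean_default, var_trend, and so on) are hypothetical, and nothing here shows how CNVtools itself processes the formulas.

```r
# Formula strings taken from the argument descriptions; the object names
# are illustrative only and are not part of CNVtools.
mean_default <- "~ strata(batch, cn)"  # documented default for model.mean and model.var
mean_factor  <- "~ as.factor(cn)"      # "traditional" alternative for the mean
var_trend    <- "~ cn"                 # variance proportional to copy number
var_constant <- "~ 1"                  # constant variance

specs <- c(mean_default, mean_factor, var_trend, var_constant)

# as.formula() only parses the text; strata() does not need to exist here.
parsed <- lapply(specs, as.formula)

# Variables each specification depends on.
vars_used <- lapply(parsed, all.vars)
names(vars_used) <- specs
print(vars_used)
```

Any of these strings could then be supplied as model.mean or model.var, bearing in mind the note above that the strata form is expected to be faster and more stable.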

Argument: model.disease
A formula that links the number of copies with the case/control status. The default is a logit linear trend model, "~ cn". Note that this formula only matters under the alternate hypothesis and has no effect under the null (model descriptions using the "strata" command are not allowed for this model).

Argument: association.test.strata
Optional factor providing the strata when using a stratified test of association (typically, but not always, these are the geographic regions of origin of the samples).

Argument: beta.estimated
Optional. It is used if one wants to fit the model for a particular value of the log odds parameter beta (essentially if one is interested in the profile likelihood). In this case the disease model should be set to "~ 1" and the model to 'H1'. The function will then provide the best model assuming the value of beta (the log odds ratio parameter) provided by the user.

Argument: start.mean
Optional. A set of starting values for the means. Must be numeric and the size must match ncomp. This argument can also be a matrix if one wants to specify multiple starting points. When passing a matrix as argument, the number of columns should equal the number of components, and the number of rows must be greater than max(n.H0, n.H1). When some numbers in a row are missing, CNVtools will pick the starting points randomly (the default).
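
A minimal sketch of a start.mean matrix that satisfies the shape constraints described above, written in base R only. The starting values in the first row and the object names are hypothetical; rows left as NA are meant to correspond to the "missing numbers" case in which CNVtools chooses random starting points.

```r
# Settings mirroring the example call further down (assumed values).
ncomp <- 3
n.H0  <- 3
n.H1  <- 3

# The description asks for MORE rows than max(n.H0, n.H1).
n_rows <- max(n.H0, n.H1) + 1

# One column per component; NA means "no starting value supplied".
start_mean <- matrix(NA_real_, nrow = n_rows, ncol = ncomp)

# Hypothetical user-chosen starting means for the first run only.
start_mean[1, ] <- c(-1, 0, 1)

print(dim(start_mean))
```

The same layout applies to start.var, with variances in place of means.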

Argument: start.var
Optional. A set of starting values for the variances. Must be numeric and the size must match ncomp. Can also be a matrix (see start.mean for details).

Argument: control
A list of parameters that control the behavior of the fitting. min.freq is the minimum number of data points in a copy number class before the algorithm sets the frequency of this class to zero. In the presence of a very rare genotype group it might be useful to lower this threshold. Note, however, that estimating the variance may not be possible if there are very few individuals in a class, so setting options such as constant variances (i.e. model.var = "~ 1") might be sensible.
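
As a sketch of lowering min.freq for a rare copy number class, the base R snippet below starts from the default control list shown in the usage section and changes a single entry. The value 2 and the names default_control and rare_control are hypothetical. A complete list is built on purpose, since this page does not say whether a partial control list would be merged with the defaults.

```r
# Default control values copied from the usage section.
default_control <- list(tol = 1e-05, max.iter = 3000, min.freq = 4)

# Lower only min.freq; 2 is an illustrative value, not a recommendation.
rare_control <- utils::modifyList(default_control, list(min.freq = 2))

str(rare_control)
```

Such a list would be passed as control = rare_control, possibly together with model.var = "~ 1" as suggested above when a class contains very few individuals.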

----------Value----------

Value: model.H0
The parameters for the best fit under H0.

Value: posterior.H0
The output data frame with the estimated posterior distribution under H0 as well as the most likely call.

Value: status.H0
A character that describes the status of the fit under H0. The possible values are 'C' (converged), 'M' (maximum iterations reached), 'P' (posterior distribution problem). Fits that don't return 'C' should be excluded.

Value: model.H1
The parameters for the best fit under H1.

Value: posterior.H1
The output data frame with the estimated posterior distribution under H1.

Value: status.H1
A character that describes the status of the fit under H1. The possible values are 'C' (converged), 'M' (maximum iterations reached), 'P' (posterior distribution problem). Fits that don't return 'C' should be excluded.
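
The rule that fits not returning 'C' should be excluded can be wrapped in a small helper. This is a sketch under assumptions: fit_converged is a hypothetical name, the mock lists stand in for real CNVtest.binary() output, and the check presumes that both H0 and H1 were actually fitted.

```r
# Hypothetical helper: TRUE only when both fits report 'C' (converged).
fit_converged <- function(fit) {
  identical(fit[["status.H0"]], "C") && identical(fit[["status.H1"]], "C")
}

# Mock results standing in for the list returned by CNVtest.binary().
mock_ok  <- list(status.H0 = "C", status.H1 = "C")
mock_bad <- list(status.H0 = "C", status.H1 = "M")  # H1 hit max.iter

print(fit_converged(mock_ok))
print(fit_converged(mock_bad))
```

Using identical() rather than == keeps the check from returning NA or failing when a status element is missing.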

----------Author(s)----------

Vincent Plagnol <vincent.plagnol@cimr.cam.ac.uk> and Chris Barnes <christopher.barnes@imperial.ac.uk>

----------See Also----------

apply.pca

----------Examples----------

# Load the package, then the CNV data for two control cohorts
library(CNVtools)
data(A112)
raw.signal <- as.matrix(A112[, -c(1,2)])
dimnames(raw.signal)[[1]] <- A112$subject

# Extract the CNV signal using principal components
pca.signal <- apply.pca(raw.signal)

# Extract batch, sample and trait information
batches <- factor(A112$cohort)
sample <- factor(A112$subject)
trait <- ifelse(A112$cohort == '58C', 0, 1)

# Fit the CNV with a three-component model
fit.pca <- CNVtest.binary(signal = pca.signal, sample = sample, batch = batches,
                          disease.status = trait, ncomp = 3, n.H0 = 3, n.H1 = 3,
                          model.disease = "~ cn")

# Only use the fits if both converged
if (fit.pca[['status.H0']] == 'C' && fit.pca[['status.H1']] == 'C') {
  # Calculate the likelihood ratio statistic
  LR <- -2 * (fit.pca$model.H0$lnL - fit.pca$model.H1$lnL)

  # Calculate the p-value. It has 1 degree of freedom since we fit a trend model
  pvalue <- 1 - pchisq(LR, 1)
}
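
The likelihood ratio step at the end of the example can be tried without CNVtools. In this sketch the two log-likelihoods are made-up numbers standing in for fit.pca$model.H0$lnL and fit.pca$model.H1$lnL; the upper-tail call to pchisq() is an equivalent way to write 1 - pchisq(LR, 1).

```r
# Hypothetical log-likelihoods; real values would come from the fitted models.
lnL_H0 <- -1250.4  # null: no association
lnL_H1 <- -1246.1  # alternate: trend model "~ cn"

# Likelihood ratio statistic, same formula as in the example.
LR <- -2 * (lnL_H0 - lnL_H1)

# One extra parameter (the trend) gives 1 degree of freedom.
pvalue <- pchisq(LR, df = 1, lower.tail = FALSE)

print(LR)
print(pvalue)
```

Asking pchisq() for the upper tail directly avoids the subtraction from 1, which can lose precision when the p-value is very small.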

Please credit the source when reposting: 生物统计家园网 (http://www.biostatistic.net).

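Note:
Note 1: For the convenience of learners, the Chinese version of this document was translated by LoveR, the 生物统计家园网 robot. It is intended only as a personal reference for learning R; 生物统计家园 retains the copyright.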
