R WGCNA package: help documentation for the TrueTrait() function

loveR · posted on 2012-10-1 22:25:09

TrueTrait(WGCNA)
TrueTrait() belongs to the R package WGCNA

Estimate the true trait underlying a list of surrogate markers.

Translator: LoveR, the 生物统计家园网 robot

----------Description----------

Assume an imprecisely measured trait y that is related to the true, unobserved trait yTRUE as follows: yTRUE = y + noise, where noise is assumed to have mean zero and constant variance. Assume you have one or more surrogate markers for yTRUE, corresponding to the columns of datX. The function implements several approaches for estimating yTRUE based on the inputs y and/or datX.

----------Usage----------

TrueTrait(datX, y, datXtest=NULL,
      corFnc = "cor", corOptions = "use = 'pairwise.complete.obs'",
      LeaveOneOut.CV=FALSE, skipMissingVariables=TRUE,
      addLinearModel=FALSE, Strata=NULL)

----------Arguments----------

Argument: datX
is a vector or data frame whose columns correspond to the surrogate markers (variables) for the true underlying trait. The number of rows of datX equals the number of observations, i.e. it should equal the length of y.

Argument: y
is a numeric vector which specifies the observed trait.

Argument: datXtest
can be set to a matrix or data frame holding a second, independent test data set. Its columns should correspond to those of datX, i.e. the two data sets should have the same number of columns, but the number of rows (test set observations) can be different.

Argument: corFnc
Character string specifying the correlation function to be used in the calculations. Recommended values are the default Pearson correlation "cor" or the biweight midcorrelation "bicor". Additional arguments to the correlation function can be specified using corOptions.

Argument: corOptions
Character string giving additional arguments to the function specified in corFnc.
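
A minimal sketch of how a function name and an options string of this kind can be combined into a single call. The helper `corFromStrings` is hypothetical and not part of WGCNA, and whether TrueTrait() builds its call in exactly this way is an assumption; the sketch only illustrates why both arguments are passed as character strings.

```r
# Hypothetical helper (not part of WGCNA): paste the function name and the
# options string into one call, then parse and evaluate it.
corFromStrings <- function(x, y, corFnc = "cor",
                           corOptions = "use = 'pairwise.complete.obs'") {
  callText <- paste0(corFnc, "(x, y, ", corOptions, ")")
  eval(parse(text = callText))  # evaluated inside this function, so x and y resolve
}

set.seed(1)
x <- matrix(rnorm(60), nrow = 20, ncol = 3)  # three toy markers
y <- rnorm(20)                               # toy observed trait
x[1, 2] <- NA                                # one missing value

r <- corFromStrings(x, y)
print(r)
```

With the default options string, the missing entry only removes one observation from the correlation of the second marker instead of turning the whole result into NA.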

Argument: LeaveOneOut.CV
logical. If TRUE then leave-one-out cross validation estimates will be calculated for y.true1 and y.true2 based on datX.
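
The idea of a leave-one-out estimate can be sketched in base R for a single marker. This is a generic illustration under simulated data, not the WGCNA implementation: for each observation, the regression of the marker on y is fitted without that observation, and the held-out marker value is then rescaled with the resulting intercept and slope.

```r
set.seed(4)
n <- 30
y <- rnorm(n, mean = 50, sd = 20)          # simulated observed trait
marker <- 4 + 1.5 * y + rnorm(n, sd = 6)   # one simulated surrogate marker

looEstimate <- sapply(seq_len(n), function(i) {
  cf <- coef(lm(marker[-i] ~ y[-i]))   # fit without observation i
  (marker[i] - cf[[1]]) / cf[[2]]      # rescale the held-out marker value
})

print(head(looEstimate))
```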

Argument: skipMissingVariables
logical. If TRUE then variables whose values are missing for a given observation will be skipped when estimating the true trait of that particular observation. Thus, the estimate for a particular observation is determined by all the variables whose values are non-missing for it.
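
The following base R sketch shows one way a per-observation estimate can skip missing markers: a weighted average in which the weights of the missing markers are dropped and the remaining weights are renormalized. The helper `rowWeightedMeanSkipNA` and the toy numbers are hypothetical; the actual computation inside TrueTrait() may differ.

```r
# Hypothetical helper: row-wise weighted mean of already rescaled markers,
# using only the markers that are observed in each row.
rowWeightedMeanSkipNA <- function(z, w) {
  z <- as.matrix(z)
  present <- !is.na(z)                    # TRUE where a marker is observed
  zFilled <- ifelse(present, z, 0)        # missing entries contribute nothing
  num <- as.vector(zFilled %*% w)         # weighted sum over observed markers
  den <- as.vector((present * 1) %*% w)   # sum of the weights actually used
  num / den
}

z <- rbind(c(1, 2, 3),
           c(1, NA, 3),
           c(NA, NA, 5))
w <- c(0.5, 0.3, 0.2)

est <- rowWeightedMeanSkipNA(z, w)
print(est)
```

The third row has a single observed marker, so its estimate is simply that marker's value.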

Argument: addLinearModel
logical. If TRUE then the function also estimates the true trait based on the predictions of the linear model lm(y~., data=datX).
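
For orientation, the linear model named above can be fitted directly in base R. The data below are simulated and the marker names `m1` and `m2` are made up for the illustration; only the formula `lm(y~., data=datX)` comes from the text.

```r
set.seed(2)
n <- 50
y <- rnorm(n, mean = 50, sd = 20)   # simulated observed trait
datX <- data.frame(m1 = 2 + 0.5 * y + rnorm(n, sd = 5),
                   m2 = -1 + 1.5 * y + rnorm(n, sd = 5))

# Regress y on all columns of datX and collect the fitted predictions
fit <- lm(y ~ ., data = datX)
y.lm <- as.vector(predict(fit, newdata = datX))

print(head(cbind(y, y.lm)))
```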

Argument: Strata
vector whose length should equal the number of rows of datX. This vector allows one to specify subsets of observations (called strata, batches, or independent data sets) to which the function should be applied separately. For example, if the observations were measured in different batches then one can apply the function to each individual batch of data. No estimates will be calculated for Strata components that equal NA or "exclude".
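
A small base R sketch of the stratum-wise pattern described above, assuming that observations labelled NA or "exclude" simply receive no estimate. The helper `applyByStratum` is hypothetical, and within-stratum centering stands in for the real estimator.

```r
# Hypothetical helper: apply an estimator separately within each stratum and
# leave NA for observations whose stratum is NA or "exclude".
applyByStratum <- function(x, strata, estimator) {
  out <- rep(NA_real_, length(x))
  keep <- !is.na(strata) & strata != "exclude"
  for (s in unique(strata[keep])) {
    idx <- which(keep & strata == s)   # observations in stratum s
    out[idx] <- estimator(x[idx])
  }
  out
}

x <- c(1, 2, 3, 10, 20, 30, 100, 200)
strata <- c("a", "a", "a", "b", "b", "b", "exclude", NA)

centered <- applyByStratum(x, strata, function(v) v - mean(v))
print(centered)
```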


----------Details----------

This R function implements formulas described in Klemera and Doubal (2006), where the assumptions underlying these formulas are also described. Briefly, the function provides several estimates of the true underlying trait under the following assumptions:
1) There is a true underlying trait that affects y and a list of surrogate markers corresponding to the columns of datX.
2) There is a linear relationship between the true underlying trait and y and the surrogate markers.
3) yTRUE = y + Noise, where the Noise term has a mean of zero and a fixed variance.
4) Weighted least squares estimation is used to relate the surrogate markers to the underlying trait, where the weights are proportional to 1/ssq.j and ssq.j is the noise variance of the j-th marker.
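
The weighting idea in assumption 4 can be illustrated with a short base R simulation. This is one reading of that assumption, not the WGCNA code and not necessarily identical to the Klemera and Doubal formulas: each marker is rescaled with the intercept and slope of lm(marker~y), its noise variance is taken as the residual variance on the rescaled scale, and the rescaled markers are averaged with weights proportional to 1/ssq.j. All simulated values and marker names are made up.

```r
set.seed(3)
n <- 500
y <- rnorm(n, mean = 50, sd = 20)    # observed trait
yTRUE <- y + rnorm(n, sd = 10)       # unobserved trait, as in assumption 3
markers <- cbind(m1 = 5 + 1.0 * yTRUE + rnorm(n, sd = 5),
                 m2 = -3 + 0.5 * yTRUE + rnorm(n, sd = 15),
                 m3 = 10 + 2.0 * yTRUE + rnorm(n, sd = 20))

# Regress each marker on y and keep intercept, slope, and residual variance
fits <- lapply(seq_len(ncol(markers)), function(j) lm(markers[, j] ~ y))
intercept <- sapply(fits, function(f) coef(f)[[1]])
slope <- sapply(fits, function(f) coef(f)[[2]])
residVar <- sapply(fits, function(f) summary(f)$sigma^2)

# Rescale the markers to the trait scale
z <- sweep(sweep(markers, 2, intercept, "-"), 2, slope, "/")

ssq <- residVar / slope^2        # noise variance of each rescaled marker
w <- (1 / ssq) / sum(1 / ssq)    # weights proportional to 1/ssq.j, summing to 1

weighted <- as.vector(z %*% w)
unweighted <- rowMeans(z)

print(round(w, 3))
print(c(weighted = mean(abs(weighted - yTRUE)),
        unweighted = mean(abs(unweighted - yTRUE))))
```

In this simulation the noisier markers receive smaller weights, which is why the weighted average tracks yTRUE more closely than the plain average.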

Specifically, output y.true1 corresponds to formula 31, y.true2 corresponds to formula 25, and y.true3 corresponds to formula 34.

Although the true underlying trait yTRUE is not known, one can estimate the standard deviation of the difference between the estimate y.true2 and yTRUE using formula 33. Similarly, one can estimate the SD for the estimate y.true3 using formula 42. These estimated SDs correspond to output components 2 and 3 (SD.ytrue2 and SD.ytrue3), respectively. They are valuable because they indicate how accurate each estimate is.

To estimate the correlations between y and the surrogate markers, one can specify different correlation measures. The default method is based on the Pearson correlation, but one can also specify the biweight midcorrelation by choosing "bicor"; see help(bicor) to learn more.

When datX consists of observations measured in different strata (e.g. different batches or independent data sets), one can obtain stratum-specific estimates by specifying the strata using the argument Strata. In this case, the estimation focuses on one stratum at a time.

----------Value----------

A list with the following components.

Component: datEstimates
is a data frame whose columns correspond to estimates of the true underlying trait. The number of rows equals the number of observations, i.e. the length of y.
The first column y.true1 is the average value of the standardized columns of datX, where standardization subtracts the intercept term and divides by the slope of the linear regression model lm(marker~y). Since this estimate ignores the fact that the surrogate markers have different correlations with y, it is typically inferior to y.true2.
The second column y.true2 equals the weighted average value of the standardized columns of datX. The standardization is described in section 2.4 of Klemera et al. The weights are proportional to r^2/(1+r^2), where r denotes the correlation between the surrogate marker and y. Since this estimate does not include y as an additional surrogate marker, it may be slightly inferior to y.true3. Having said this, the difference between y.true2 and y.true3 is often negligible.
An additional column called y.lm is added if addLinearModel=TRUE. In this case, y.lm reports the linear model predictions.
Finally, the column y.true3 is very similar to y.true2 but it includes y as an additional surrogate marker. It is expected to be the best estimate of the underlying true trait (see Klemera et al 2006).
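
The standardization step behind y.true1 can be sketched in base R with a toy, noise-free data set. The helper `standardizeMarker` and the marker names are hypothetical, and this is an illustration of the description above rather than the WGCNA code. Because the toy markers are exact linear functions of y, subtracting the intercept and dividing by the slope of lm(marker~y) recovers y itself.

```r
y <- c(10, 20, 30, 40, 50)
datX <- data.frame(m1 = 3 + 2 * y,      # toy markers, exact linear functions of y
                   m2 = -5 + 0.5 * y)

# Hypothetical helper: subtract the intercept and divide by the slope
# of the regression of the marker on y.
standardizeMarker <- function(marker, y) {
  cf <- coef(lm(marker ~ y))
  (marker - cf[[1]]) / cf[[2]]
}

z <- sapply(datX, standardizeMarker, y = y)   # one standardized column per marker
y.true1.sketch <- rowMeans(z)                 # unweighted average across markers

print(y.true1.sketch)
```

With noisy markers the standardized columns would scatter around y, and the unweighted average treats all of them alike regardless of how noisy each one is.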

Component: datEstimatestest
is output only if a test data set has been specified in the argument datXtest. In this case, it contains a data frame with columns ytrue1 and ytrue2. The number of rows equals the number of test set observations, i.e. the number of rows of datXtest. Since the value of y is not known in the case of a test data set, one cannot calculate y.true3. An additional column with linear model predictions y.lm is added if addLinearModel=TRUE.

Component: datEstimates.LeaveOneOut.CV
is output only if the argument LeaveOneOut.CV has been set to TRUE. In this case, it contains a data frame with leave-one-out cross validation estimates of ytrue1 and ytrue2. The number of rows equals the length of y. Since the value of y is not known in the case of a test data set, one cannot calculate y.true3.

Component: SD.ytrue2
is a scalar. This is an estimate of the standard deviation of the difference between the estimate y.true2 and the true (unobserved) yTRUE. It corresponds to formula 33.

Component: SD.ytrue3
is a scalar. This is an estimate of the standard deviation of the difference between y.true3 and the true (unobserved) yTRUE. It corresponds to formula 42.

Component: datVariableInfo
is a data frame that reports, for each variable (column of datX), the information used in the definition of y.true2. The number of rows equals the number of variables. The columns report the variable name, the center (the intercept that is subtracted when scaling each variable), the scale (i.e. the slope that is used in the denominator), and finally the weights used in the weighted sum of the scaled variables.

Component: datEstimatesByStratum
a data frame that will only be output if Strata is different from NULL. In this case, it has the same dimensions as datEstimates but the estimates were calculated separately for each level of Strata.

Component: SD.ytrue2ByStratum
a vector whose length equals the number of different levels of Strata. Each component reports the estimate of SD.ytrue2 for observations in the stratum specified by unique(Strata).

Component: datVariableInfoByStratum
a list whose components are matrices with variable information. Each list component reports the variable information in the stratum specified by unique(Strata).

----------Author(s)----------

Steve Horvath

----------References----------

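Klemera P, Doubal S (2006) A new approach to the concept and computation of biological age. Mechanisms of Ageing and Development 127 (2006) 240-248
Cho IH, Park KS, Lim CJ (2010) An empirical comparative study on biological age estimation algorithms with an application of Work Ability Index (WAI). Mechanisms of Ageing and Development, Volume 131, Issue 2, February 2010, Pages 69-78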

----------Examples----------

library(WGCNA)
# observed trait
y = rnorm(1000, mean = 50, sd = 20)
# unobserved, true trait (the noise vector must have the same length as y)
yTRUE = y + rnorm(1000, sd = 10)
# now we simulate surrogate markers around the true trait
datX = simulateModule(yTRUE, nGenes = 20, minCor = .4, maxCor = .9,
                      geneMeans = rnorm(20, 50, 30))
True1 = TrueTrait(datX = datX, y = y)
datTrue = True1$datEstimates
par(mfrow = c(2, 2))
# plot each estimate against the true trait, with the identity line
for (i in seq_len(ncol(datTrue))) {
  meanAbsDev = mean(abs(yTRUE - datTrue[, i]))
  verboseScatterplot(datTrue[, i], yTRUE, xlab = names(datTrue)[i],
                     main = paste(i, "MeanAbsDev=", signif(meanAbsDev, 3)))
  abline(0, 1)
}
# compare the estimated standard deviation of y.true2
True1[[2]]
# with the true SD
sqrt(var(yTRUE - datTrue$y.true2))
# compare the estimated standard deviation of y.true3
True1[[3]]
# with the true SD
sqrt(var(yTRUE - datTrue$y.true3))

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

