R snpStats package: help documentation for the xxt() function

loveR · posted 2012-2-26 14:49:54

xxt(snpStats)
xxt() belongs to R package: snpStats

X.X-transpose for a standardized SnpMatrix

Translator: LoveR, the Biostatistic.net (生物统计家园网) robot

----------Description----------

The input SnpMatrix is first standardized by subtracting the mean (or stratum mean) from each call and dividing by the expected standard deviation under Hardy-Weinberg equilibrium. It is then post-multiplied by its transpose. This is a preliminary step in the computation of principal components.

----------Usage----------

xxt(snps, strata = NULL, correct.for.missing = FALSE, lower.only = FALSE,
  uncertain = FALSE)

----------Arguments----------

Argument: snps
The input matrix, of type "SnpMatrix".

Argument: strata
A factor (or an object that can be coerced into a factor), with length equal to the number of rows of snps, defining stratum membership.

Argument: correct.for.missing
If TRUE, an attempt is made to correct for the effect of missing data by use of inverse probability weights. Otherwise, missing observations are scored zero in the standardized matrix.

Argument: lower.only
If TRUE, only the lower triangle of the result is returned and the upper triangle is filled with zeros. Otherwise, the complete symmetric matrix is returned.

Argument: uncertain
If TRUE, uncertain genotypes are replaced by posterior expectations. Otherwise these are treated as missing values.
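When lower.only = TRUE is used, the full symmetric matrix can be rebuilt from the returned lower triangle. The following is a minimal base-R sketch; the small matrix `L` is invented for illustration and merely has the shape described above (lower triangle filled, upper triangle zero).

```r
# Hypothetical 3 x 3 result in the lower.only layout: upper triangle is zero.
L <- matrix(c(4, 0, 0,
              1, 5, 0,
              2, 3, 6), nrow = 3, byrow = TRUE)

# Mirror the lower triangle; subtract the diagonal once so that it is
# not counted twice.
full <- L + t(L) - diag(diag(L))
```

Since eigen(..., symmetric = TRUE) reads only the lower triangle of its input, this step may be unnecessary when the result is passed straight to eigen().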

----------Details----------

This computation forms the first step of the calculation of principal components for genome-wide SNP data. As pointed out by Price et al. (2006), when the data matrix has more rows than columns it is most efficient to calculate the eigenvectors of X.X-transpose, where X is a SnpMatrix whose columns have been standardized to zero mean and unit variance. For autosomes, the genotypes are given codes 0, 1 or 2 and, after subtraction of the mean, 2p, are divided by the standard deviation sqrt(2p(1-p)), where p is the estimated allele frequency. For SNPs on the X chromosome in male subjects, genotypes are coded 0 or 2. The mean is then still 2p, but the standard deviation is 2sqrt(p(1-p)). If the strata argument is supplied, a stratum-specific estimate of p is used for standardization.
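The autosomal standardization described above can be sketched in base R. This is only an illustration of the formulas, not the package's implementation: the genotype matrix `geno` is simulated, has no missing calls, and all object names are invented here.

```r
# Simulate a hypothetical subjects x SNPs matrix of 0/1/2 genotype codes.
set.seed(1)
n_subj <- 20
n_snp <- 50
true_p <- runif(n_snp, 0.2, 0.8)
geno <- sapply(true_p, function(p) rbinom(n_subj, size = 2, prob = p))

# Estimated allele frequency per SNP; drop monomorphic SNPs, whose
# expected standard deviation would be zero.
p_hat <- colMeans(geno) / 2
keep <- p_hat > 0 & p_hat < 1
geno <- geno[, keep, drop = FALSE]
p_hat <- p_hat[keep]

# Subtract the mean 2p and divide by the Hardy-Weinberg SD sqrt(2p(1-p)).
sd_hw <- sqrt(2 * p_hat * (1 - p_hat))
X <- sweep(sweep(geno, 2, 2 * p_hat, "-"), 2, sd_hw, "/")

# Post-multiply by the transpose: an n_subj x n_subj matrix.
xxt_sketch <- tcrossprod(X)
```

The result has one row and column per subject, which is why this route is cheaper than working with the SNP-by-SNP matrix when one dimension of the data is much smaller than the other.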

Missing observations present some difficulty. Price et al. (2006) recommended replacing missing observations by their means, which is equivalent to replacement by zeros in the standardized matrix. However, this results in a biased estimate of the complete-data result. Optionally, this bias can be corrected by inverse probability weighting. We assume that the probability that any one call is missing is small, and can be predicted by a multiplicative model with row (subject) and column (locus) effects. The estimated probability of a missing value in a given row and column is then given by m = RC/T, where R is the row total number of no-calls, C is the column total number of no-calls, and T is the overall total number of no-calls. Non-missing contributions to X.X-transpose are then weighted by w = 1/(1-m) for contributions to the diagonal elements, and by products of the relevant pairs of weights for contributions to off-diagonal elements.
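The weighting scheme can be illustrated with a small base-R sketch. It is one reading of the paragraph above rather than the package's actual code; the matrices `X` (standing in for an already standardized genotype matrix) and `miss` (marking no-calls) are invented for the example.

```r
# Hypothetical 6 x 8 example with five no-calls at fixed positions.
set.seed(2)
X <- matrix(rnorm(6 * 8), nrow = 6)
miss <- matrix(FALSE, nrow = 6, ncol = 8)
miss[cbind(c(1, 2, 3, 4, 6), c(2, 2, 5, 1, 7))] <- TRUE

R <- rowSums(miss)            # no-calls per row (subject)
C <- colSums(miss)            # no-calls per column (locus)
T_total <- sum(miss)          # overall number of no-calls
m <- outer(R, C) / T_total    # estimated P(missing) per cell, m = RC/T
w <- 1 / (1 - m)              # inverse probability weights

X0 <- X
X0[miss] <- 0                 # missing calls contribute nothing

# Off-diagonal elements: each term carries the product of two weights.
xxt_w <- tcrossprod(X0 * w)
# Diagonal elements: each term carries a single weight.
diag(xxt_w) <- rowSums(w * X0^2)
```

A property of the multiplicative model is that the estimated probabilities sum to the observed number of no-calls, so sum(m) equals T_total. The sketch also relies on every m being below 1, in line with the assumption that missingness is rare.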

----------Value----------

A square matrix containing either the complete X.X-transpose matrix, or just its lower triangle.

----------Warning----------

The correction for missing observations can result in an output matrix which is not positive semi-definite. This should not matter in the application for which it is intended.

----------Note----------

In genome-wide studies, the SNP data will usually be held as a series of objects (of class "SnpMatrix" or "XSnpMatrix"), one per chromosome. Note that the X.X-transpose matrices produced by applying the xxt function to each object in turn can be added to yield the genome-wide result.
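The additivity follows from the fact that X.X-transpose is a sum over columns (SNPs), so splitting the columns by chromosome splits the sum. A small base-R check, using random matrices as stand-ins for two standardized per-chromosome matrices (the names are invented for this sketch):

```r
set.seed(3)
# Same 10 subjects in both blocks; 30 and 40 SNPs respectively.
X_chr1 <- matrix(rnorm(10 * 30), nrow = 10)
X_chr2 <- matrix(rnorm(10 * 40), nrow = 10)

# Sum of the per-chromosome matrices ...
sum_by_chr <- tcrossprod(X_chr1) + tcrossprod(X_chr2)
# ... versus the matrix computed from all SNPs at once.
genome_wide <- tcrossprod(cbind(X_chr1, X_chr2))

same <- isTRUE(all.equal(sum_by_chr, genome_wide))
```

With snpStats itself, and assuming a hypothetical list `snp_list` holding one SnpMatrix per chromosome with identical rows, the same idea would presumably be written as Reduce(`+`, lapply(snp_list, xxt)).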

If the matrix is converted to a correlation matrix by pre- and post-multiplying by the diagonal matrix formed from the inverse square roots of its diagonal elements, then this is an unbiased estimate of twice the kinship matrix.
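The conversion to a correlation matrix can be sketched in base R as follows. The matrix `xx` here is built from random numbers purely as a stand-in for an xxt() result; only the scaling step is the point of the example.

```r
set.seed(4)
X <- matrix(rnorm(8 * 100), nrow = 8)
xx <- tcrossprod(X)              # stand-in for the output of xxt()

# Pre- and post-multiply by diag(1 / sqrt(diagonal)).
d <- 1 / sqrt(diag(xx))
xx_cor <- diag(d) %*% xx %*% diag(d)

# Base R's cov2cor() performs the same scaling.
xx_cor2 <- cov2cor(xx)
```

After scaling, every diagonal element is 1 and the off-diagonal elements lie between -1 and 1.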

----------Author(s)----------

David Clayton <david.clayton@cimr.cam.ac.uk>

----------References----------

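Price, A.L., Patterson, N.J., Plenge, R.M., Weinblatt, M.E., Shadick, N.A. and Reich, D. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics, 38:904-9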

----------Examples----------

library(snpStats)
# Make a SnpMatrix with a small number of rows
data(testdata)
small <- Autosomes[1:100, ]
# Calculate the X.X-transpose matrix
xx <- xxt(small, correct.for.missing = TRUE)
# Calculate the principal components
pc <- eigen(xx, symmetric = TRUE)$vectors
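To see why the eigenvectors of X.X-transpose serve as principal components, the following base-R sketch compares them with the scores from prcomp(). It does not use snpStats; `X` is a random matrix with standardized columns, invented here as a stand-in for a standardized genotype matrix.

```r
set.seed(5)
# 15 subjects, 200 variables; scale() gives zero-mean, unit-variance columns.
X <- scale(matrix(rnorm(15 * 200), nrow = 15))

# Eigenvectors of the small 15 x 15 matrix X.X-transpose.
e <- eigen(tcrossprod(X), symmetric = TRUE)

# Conventional principal component scores from the full matrix.
pc_scores <- prcomp(X, center = FALSE)$x

# The leading eigenvector and the first PC score vector agree up to
# sign and scale, so their absolute correlation is 1.
agreement <- abs(cor(e$vectors[, 1], pc_scores[, 1]))
```

The eigenvectors are defined only up to sign, which is why the comparison uses the absolute correlation.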

When reposting, please credit the source: Biostatistic.net (http://www.biostatistic.net).

