snpgdsPairIBD(SNPRelate)
snpgdsPairIBD() belongs to the R package SNPRelate
Calculate Identity-By-Descent (IBD) Coefficients
Translator: 生物统计家园网 robot LoveR
----------Description----------
Calculate the three IBD coefficients (k0, k1, k2) for pairs of non-inbred individuals by maximum likelihood estimation (MLE) or the PLINK method of moments (MoM).
----------Usage----------
snpgdsPairIBD(geno1, geno2, allele.freq, method=c("EM", "downhill.simplex", "MoM"),
kinship.constraint=FALSE, max.niter=1000, reltol=sqrt(.Machine$double.eps),
coeff.correct=TRUE, out.num.iter=TRUE, verbose=TRUE)
----------Arguments----------
Argument: geno1
the SNP genotypes of the first individual: 0 = BB, 1 = AB, 2 = AA, any other value = missing
Argument: geno2
the SNP genotypes of the second individual: 0 = BB, 1 = AB, 2 = AA, any other value = missing
Argument: allele.freq
the allele frequencies
Argument: method
"EM", "downhill.simplex", or "MoM"; see Details
Argument: kinship.constraint
if TRUE, constrain the IBD coefficients ($k_0, k_1, k_2$) to the genealogical region ($2 k_0 k_1 \ge k_2^2$)
Argument: max.niter
the maximum number of iterations
Argument: reltol
relative convergence tolerance; the algorithm stops when a step fails to change the log likelihood by at least $reltol * (abs(log likelihood with the initial parameters) + reltol)$.
Argument: coeff.correct
TRUE by default; see Details
Argument: out.num.iter
if TRUE, output the number of iterations
Argument: verbose
if TRUE, show information
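The genotype coding used by geno1 and geno2 can be illustrated in base R. This is only a minimal sketch: recode_genotype and usable_snps are hypothetical helper names introduced here, not SNPRelate functions, and the input vectors are made-up values.

```r
# Hypothetical helper: keep 0 (BB), 1 (AB) and 2 (AA);
# every other value, including NA, is treated as missing.
recode_genotype <- function(g) {
  g[!(g %in% c(0, 1, 2))] <- NA
  g
}

# Hypothetical helper: count the SNPs at which both individuals
# have a non-missing genotype.
usable_snps <- function(g1, g2) {
  sum(!is.na(recode_genotype(g1)) & !is.na(recode_genotype(g2)))
}

# made-up genotype vectors; 3 and 9 stand for missing calls
geno1 <- c(0, 1, 2, 3, NA, 2)
geno2 <- c(2, 1, 9, 0, 1, 2)

print(recode_genotype(geno1))
print(usable_snps(geno1, geno2))
```

Recoding before the call makes it explicit which SNPs contribute to the estimate for a given pair.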
----------Details----------
If method = "MoM", the PLINK method of moments is applied without an allele-count-based correction factor. Otherwise, one of two numerical approaches to maximum likelihood estimation is used: the Expectation-Maximization (EM) algorithm, or the Nelder-Mead method (also called the downhill simplex method). In general, the EM algorithm is more robust than the downhill simplex method.
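To make the EM approach concrete, the following base-R sketch estimates (k0, k1, k2) for one pair by treating the three IBD states as mixture components. It is an illustration under stated assumptions, not the SNPRelate implementation: pair_lik and em_ibd are hypothetical names, SNPs are assumed independent and in Hardy-Weinberg equilibrium, p is assumed to be the frequency of allele A with 0 < p < 1, and the stopping rule mirrors the reltol description in the Arguments section.

```r
# P(g1, g2 | Z = z) for z = 0, 1, 2 alleles shared IBD.
# Genotypes count copies of allele A (0 = BB, 1 = AB, 2 = AA).
pair_lik <- function(g1, g2, p) {
  q <- 1 - p
  hw <- function(g) ifelse(g == 2, p^2, ifelse(g == 1, 2 * p * q, q^2))
  s <- g1 + g2
  l0 <- hw(g1) * hw(g2)                       # no allele shared IBD
  l1 <- ifelse(abs(g1 - g2) == 2, 0,          # AA vs BB is impossible
        ifelse(s == 4, p^3,
        ifelse(s == 3, p^2 * q,
        ifelse(s == 2, p * q,
        ifelse(s == 1, p * q^2, q^3)))))
  l2 <- ifelse(g1 == g2, hw(g1), 0)           # both alleles shared IBD
  cbind(l0, l1, l2)
}

em_ibd <- function(g1, g2, p, max.niter = 1000,
                   reltol = sqrt(.Machine$double.eps)) {
  ok <- !is.na(g1) & !is.na(g2) & !is.na(p)   # drop missing SNPs
  L <- pair_lik(g1[ok], g2[ok], p[ok])
  k <- rep(1 / 3, 3)                          # starting point
  loglik <- sum(log(L %*% k))
  ll_init <- loglik
  trace <- loglik
  for (iter in seq_len(max.niter)) {
    W <- sweep(L, 2, k, `*`)                  # E-step: posterior state weights
    W <- W / rowSums(W)
    k <- unname(colMeans(W))                  # M-step: update mixing proportions
    new_ll <- sum(log(L %*% k))
    trace <- c(trace, new_ll)
    done <- abs(new_ll - loglik) < reltol * (abs(ll_init) + reltol)
    loglik <- new_ll
    if (done) break
  }
  list(k0 = k[1], k1 = k[2], k2 = k[3],
       loglik = loglik, niter = iter, trace = trace)
}

# simulated, unrelated pair (made-up data)
set.seed(1)
p  <- runif(200, 0.1, 0.9)
g1 <- rbinom(200, 2, p)
g2 <- rbinom(200, 2, p)
res <- em_ibd(g1, g2, p)
print(unlist(res[c("k0", "k1", "k2", "loglik", "niter")]))
```

Because each EM step cannot decrease the log likelihood, the values stored in trace can be used to check that a run behaved as expected.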
If coeff.correct is TRUE, the final point found by the search algorithm (EM or downhill simplex) is compared with the six points (fullsib, offspring, halfsib, cousin, unrelated), since a numerical approach might not reach the maximum after a finite number of steps. If any of these six points has a higher log likelihood, the final point is replaced by the best one.
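The comparison step can be sketched as follows. This is a hedged illustration only: correct_coeff is a hypothetical helper, the (k0, k1) values are the standard expectations for the five relationships named above (an assumption here, not values read from SNPRelate), and loglik_fn stands for any function that returns the log likelihood of a pair at a given (k0, k1).

```r
# Hypothetical helper: replace the search result by a reference point
# if that point has a strictly higher log likelihood.
correct_coeff <- function(k0, k1, loglik_fn) {
  ref <- data.frame(
    relatedness = c("fullsib", "offspring", "halfsib", "cousin", "unrelated"),
    k0 = c(0.25, 0, 0.5, 0.75, 1),
    k1 = c(0.5, 1, 0.5, 0.25, 0)
  )
  ref$loglik <- mapply(loglik_fn, ref$k0, ref$k1)
  best <- which.max(ref$loglik)
  current <- loglik_fn(k0, k1)
  if (ref$loglik[best] > current) {
    list(k0 = ref$k0[best], k1 = ref$k1[best],
         loglik = ref$loglik[best], replaced = TRUE)
  } else {
    list(k0 = k0, k1 = k1, loglik = current, replaced = FALSE)
  }
}

# toy log likelihood with its maximum at (0.5, 0.5), for illustration only
toy_loglik <- function(k0, k1) -((k0 - 0.5)^2 + (k1 - 0.5)^2)

res <- correct_coeff(0.4, 0.4, toy_loglik)
print(res)
```

With the toy function, the search result (0.4, 0.4) is replaced by the half-sib point (0.5, 0.5) because that point has the higher value.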
----------Value----------
Return a data.frame with the following columns:
Value: k0
IBD coefficient, the probability of sharing zero alleles IBD
Value: k1
IBD coefficient, the probability of sharing one allele IBD
Value: loglik
the value of the log likelihood
Value: niter
the number of iterations
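Since only k0 and k1 are returned, k2 follows from k0 + k1 + k2 = 1, and the kinship coefficient can be derived as k1/4 + k2/2. The sketch below uses a made-up data.frame named ibd in place of real output; the column names k0 and k1 match the Value section, everything else is illustrative.

```r
# made-up rows standing in for output: unrelated, full sibs, parent-offspring
ibd <- data.frame(k0 = c(1, 0.25, 0), k1 = c(0, 0.5, 1))

# the three coefficients sum to one
ibd$k2 <- 1 - ibd$k0 - ibd$k1

# kinship coefficient from the IBD coefficients
ibd$kinship <- ibd$k1 / 4 + ibd$k2 / 2

print(ibd)
```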
----------Author(s)----------
Xiuwen Zheng <zhengx@u.washington.edu>
----------References----------
Weir BS, Anderson AD, Hepler AB. 2006. Genetic relatedness analysis: modern data and new challenges. Nat Rev Genet. 7(10):771-80.
Choi Y, Wijsman EM, Weir BS. 2009. Case-control association testing in the presence of unknown relationships. Genet Epidemiol 33(8):668-78.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ & Sham PC. 2007. PLINK: a toolset for whole-genome association and population-based linkage analysis. American Journal of Human Genetics, 81.
----------See Also----------
snpgdsPairIBDMLELogLik, snpgdsIBDMLE, snpgdsIBDMLELogLik, snpgdsIBDMoM
----------Examples----------
library(SNPRelate)

# open an example dataset (HapMap)
genofile <- openfn.gds(snpgdsExampleFileName())
# keep the sample IDs that belong to the YRI population group
YRI.id <- read.gdsn(index.gdsn(genofile, "sample.id"))[
    read.gdsn(index.gdsn(genofile, c("sample.annot", "pop.group"))) == "YRI"]

# SNP pruning
set.seed(1000)
snpset <- snpgdsLDpruning(genofile, sample.id=YRI.id, maf=0.05, missing.rate=0.05)
snpset <- sample(unlist(snpset), 1000)

# the number of samples
n <- 25

# specify the allele frequencies
afreq <- snpgdsSNPRateFreq(genofile, sample.id=YRI.id, snp.id=snpset)$AlleleFreq

# IBD matrices for the first n samples, used below as a reference
subMLE <- snpgdsIBDMLE(genofile, sample.id=YRI.id[1:n], snp.id=snpset,
    num.thread=2, allele.freq=afreq)
subMoM <- snpgdsIBDMoM(genofile, sample.id=YRI.id[1:n], snp.id=snpset,
    num.thread=2, allele.freq=afreq)

# genotype matrix; snpfirstdim=TRUE requests SNPs in rows and samples in
# columns, which is the layout that mat[, i] below relies on
mat <- snpgdsGetGeno(genofile, sample.id=YRI.id[1:n], snp.id=snpset,
    snpfirstdim=TRUE)
mat[!is.element(mat, c(0,1,2))] <- NA

# MLE (EM) for the first sample against each of the other samples
rv <- NULL
for (i in 2:n)
{
    rv <- rbind(rv, snpgdsPairIBD(mat[,1], mat[,i], afreq, "EM"))
    print(snpgdsPairIBDMLELogLik(mat[,1], mat[,i], afreq,
        relatedness="unrelated", verbose=TRUE))
}
rv
# differences from the matrix-based MLE results
summary(rv$k0 - subMLE$k0[1, 2:n])
summary(rv$k1 - subMLE$k1[1, 2:n])

# the same comparison for the method of moments
rv <- NULL
for (i in 2:n)
    rv <- rbind(rv, snpgdsPairIBD(mat[,1], mat[,i], afreq, "MoM"))
rv
summary(rv$k0 - subMoM$k0[1, 2:n])
summary(rv$k1 - subMoM$k1[1, 2:n])

# close the genotype file
closefn.gds(genofile)