R SNPRelate package: help documentation for the snpgdsIBDMLE() function

Posted on 2012-9-30 11:13:44
snpgdsIBDMLE(SNPRelate)
snpgdsIBDMLE() belongs to the R package SNPRelate

Maximum likelihood estimation (MLE) for the Identity-By-Descent (IBD) Analysis

Translator: 生物统计家园网 robot LoveR

----------Description----------

Calculate the three IBD coefficients (k0, k1, k2) for pairs of non-inbred individuals by maximum likelihood estimation.


----------Usage----------


snpgdsIBDMLE(gdsobj, sample.id=NULL, snp.id=NULL, autosome.only=TRUE,
        remove.monosnp=TRUE, maf=NaN, missing.rate=NaN, kinship=FALSE,
        kinship.constraint=FALSE, allele.freq=NULL, method=c("EM", "downhill.simplex"),
        max.niter=1000, reltol=sqrt(.Machine$double.eps), coeff.correct=TRUE,
        out.num.iter=TRUE, num.thread=1, verbose=TRUE)



----------Arguments----------

Argument: gdsobj
a gdsclass object from the gdsfmt package

Argument: sample.id
a vector of sample ids specifying the selected samples; if NULL, all samples are used

Argument: snp.id
a vector of SNP ids specifying the selected SNPs; if NULL, all SNPs are used

Argument: autosome.only
if TRUE, use autosomal SNPs only

Argument: remove.monosnp
if TRUE, remove monomorphic SNPs

Argument: maf
use only the SNPs with MAF >= maf; if NaN, no MAF threshold is applied

Argument: missing.rate
use only the SNPs with missing rate <= missing.rate; if NaN, no missing-rate threshold is applied

Argument: kinship
if TRUE, output the estimated kinship coefficients

Argument: kinship.constraint
if TRUE, constrain the IBD coefficients ($k_0, k_1, k_2$) to the genealogical region ($2 k_0 k_1 >= k_2^2$)

Argument: allele.freq
the allele frequencies to use; if NULL, the allele frequencies are determined from gdsobj using the specified samples

Argument: method
"EM" or "downhill.simplex"; see Details

Argument: max.niter
the maximum number of iterations

Argument: reltol
relative convergence tolerance; the algorithm stops if it is unable to improve the log likelihood by at least $reltol * (abs(log likelihood with the initial parameters) + reltol)$ in a single step.
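
The reltol rule can be read as a simple stopping test. The sketch below is one interpretation of the sentence above, not code taken from SNPRelate; the helper name `has_converged` and its arguments are hypothetical.

```r
# Hypothetical helper (not part of SNPRelate): one reading of the reltol rule.
# ll_old / ll_new: log likelihood before and after the current step
# ll_init: log likelihood at the initial parameter values
has_converged <- function(ll_old, ll_new, ll_init,
                          reltol = sqrt(.Machine$double.eps)) {
  threshold <- reltol * (abs(ll_init) + reltol)
  abs(ll_new - ll_old) < threshold
}

# A tiny improvement counts as convergence, a large one does not.
tiny_step  <- has_converged(ll_old = -1234.5678, ll_new = -1234.5678 + 1e-7,
                            ll_init = -1300)
large_step <- has_converged(ll_old = -1250, ll_new = -1234, ll_init = -1300)
```

With the default reltol (about 1.5e-8), the threshold scales with the size of the initial log likelihood, so the same tolerance behaves sensibly for small and large SNP sets.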


Argument: coeff.correct
TRUE by default; see Details

Argument: out.num.iter
if TRUE, output the numbers of iterations

Argument: num.thread
the number of CPU cores to use

Argument: verbose
if TRUE, show information


----------Details----------

The minor allele frequency and missing rate for each SNP passed in snp.id are calculated over all the samples in sample.id.
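
To make the filtering concrete, here is a small base-R sketch that computes the minor allele frequency and missing rate per SNP over a chosen set of samples and then applies maf and missing.rate style thresholds. The genotype matrix, the helper `snp_stats`, and the threshold values are illustrative assumptions; SNPRelate does this internally on the GDS file.

```r
# Toy genotype matrix (assumed layout): rows are SNPs, columns are samples,
# entries count copies of one allele (0, 1, 2), NA marks a missing genotype.
geno <- rbind(
  snp1 = c(0, 1, 2, 1, NA, 0),
  snp2 = c(2, 2, 2, 2, 2, 2),
  snp3 = c(0, 0, 1, NA, NA, 0)
)
colnames(geno) <- paste0("s", 1:6)

# Hypothetical helper: per-SNP statistics over the selected samples only.
snp_stats <- function(geno, sample.id = colnames(geno)) {
  g <- geno[, sample.id, drop = FALSE]
  afreq <- rowMeans(g, na.rm = TRUE) / 2      # frequency of the counted allele
  data.frame(
    maf = pmin(afreq, 1 - afreq),             # minor allele frequency
    missing.rate = rowMeans(is.na(g))         # fraction of missing genotypes
  )
}

stats <- snp_stats(geno, sample.id = c("s1", "s2", "s3", "s4", "s5"))

# Keep SNPs with MAF >= 0.05 and missing rate <= 0.25 (illustrative thresholds).
keep <- stats$maf >= 0.05 & stats$missing.rate <= 0.25
```

In this toy data snp2 is monomorphic (MAF of zero) and snp3 has too many missing genotypes among the selected samples, so only snp1 passes.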

The PLINK method-of-moments estimates are used as the initial values when searching for the maximum of the log likelihood function. Two numerical approaches are available: the Expectation-Maximization (EM) algorithm, and the Nelder-Mead method, also known as the downhill simplex method. Generally, the EM algorithm is more robust than the downhill simplex method.
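
As a rough illustration of the EM approach, the sketch below treats (k0, k1, k2) as mixture weights over the three IBD states and updates them from per-SNP likelihoods. It is not SNPRelate's internal code: the function `em_ibd` is hypothetical, the matrix `lik` of per-SNP probabilities P(genotype pair | IBD state) is simulated rather than derived from genotypes and allele frequencies, and the starting point is uniform instead of the PLINK moment estimates.

```r
# Hypothetical EM sketch for one pair of individuals.
# lik: matrix with one row per SNP and three columns, holding
#      P(observed genotype pair | sharing 0, 1 or 2 alleles IBD).
em_ibd <- function(lik, k_init = c(1, 1, 1) / 3, max.niter = 1000,
                   reltol = sqrt(.Machine$double.eps)) {
  loglik <- function(k) sum(log(lik %*% k))
  k <- k_init
  ll_init <- ll <- loglik(k)
  for (iter in seq_len(max.niter)) {
    # E-step: posterior probability of each IBD state at each SNP
    w <- sweep(lik, 2, k, `*`)
    w <- w / rowSums(w)
    # M-step: new coefficients are the average posterior probabilities
    k <- colMeans(w)
    ll_new <- loglik(k)
    done <- abs(ll_new - ll) < reltol * (abs(ll_init) + reltol)
    ll <- ll_new
    if (done) break
  }
  list(k = k, loglik = ll, niter = iter)
}

# Simulated stand-in for the per-SNP likelihoods (assumption, 500 SNPs).
set.seed(1)
lik <- matrix(runif(500 * 3, min = 0.05, max = 1), ncol = 3,
              dimnames = list(NULL, c("k0", "k1", "k2")))
fit <- em_ibd(lik)
```

Each EM step keeps the coefficients non-negative and summing to one, and never decreases the log likelihood, which is one reason EM tends to be more robust than an unconstrained simplex search.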

If coeff.correct is TRUE, the final point found by the search algorithm (EM or downhill simplex) is compared with the six points (fullsib, offspring, halfsib, cousin, unrelated), since a numerical approach might not reach the maximum after a finite number of steps. If any of these six points has a higher log likelihood, the final point is replaced by the best one.
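
The correction step can be sketched as follows, using the five relationships the text names with their standard expected (k0, k1) values for non-inbred pairs. Whether SNPRelate uses exactly these values is an assumption, and `correct_estimate` and the toy log likelihood are hypothetical.

```r
# Standard expected (k0, k1) for the relationships named in the text
# (assumed values; k2 = 1 - k0 - k1).
ref_points <- rbind(
  fullsib   = c(k0 = 0.25, k1 = 0.50),
  offspring = c(k0 = 0.00, k1 = 1.00),
  halfsib   = c(k0 = 0.50, k1 = 0.50),
  cousin    = c(k0 = 0.75, k1 = 0.25),
  unrelated = c(k0 = 1.00, k1 = 0.00)
)

# Hypothetical helper: keep the search result unless a reference point
# has a strictly higher log likelihood.
correct_estimate <- function(k_hat, loglik_fn, ref = ref_points) {
  candidates <- rbind(estimate = k_hat, ref)
  ll <- apply(candidates, 1, loglik_fn)
  best <- which.max(ll)   # ties resolve to the first row, i.e. the estimate
  list(k = candidates[best, ], source = rownames(candidates)[best])
}

# Toy log likelihood peaking exactly at the half-sib point (illustration only).
toy_loglik <- function(k) -sum((k - c(0.5, 0.5))^2)
res <- correct_estimate(c(k0 = 0.45, k1 = 0.48), toy_loglik)
```

Here the search stopped slightly short of the peak, so the half-sib reference point wins and replaces the estimate.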

Although MLE estimates are more reliable than method-of-moments (MoM) estimates, MLE is much more computationally intensive and might not be feasible for estimating pairwise relatedness in a large dataset.


----------Value----------

Returns a snpgdsIBDClass object, which is a list with the following components:

Component: sample.id
the sample ids used in the analysis

Component: snp.id
the SNP ids used in the analysis

Component: afreq
the allele frequencies used in the analysis

Component: k0
IBD coefficient, the probability of sharing zero alleles IBD

Component: k1
IBD coefficient, the probability of sharing one allele IBD

Component: kinship
the estimated kinship coefficients, if the argument kinship=TRUE
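
Since only k0 and k1 are listed, k2 follows from k0 + k1 + k2 = 1, and for non-inbred pairs the kinship coefficient has the standard form k1/4 + k2/2. The helper below is a hypothetical sketch of that relationship; whether the package computes its kinship component in exactly this way is an assumption.

```r
# Hypothetical helper: kinship coefficient from k0 and k1 for non-inbred pairs.
kinship_from_ibd <- function(k0, k1) {
  k2 <- 1 - k0 - k1          # probability of sharing two alleles IBD
  k1 / 4 + k2 / 2
}

# full sibs, parent-offspring, half sibs, unrelated
kin <- kinship_from_ibd(k0 = c(0.25, 0, 0.5, 1), k1 = c(0.5, 1, 0.5, 0))
# kin is 0.25, 0.25, 0.125, 0
```

The function is vectorised, so it can be applied directly to the k0 and k1 matrices of a result object.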


----------Author(s)----------


Xiuwen Zheng (zhengx@u.washington.edu)



----------See Also----------

snpgdsIBDMLELogLik, snpgdsIBDMoM


----------Examples----------


library(SNPRelate)

# open an example dataset (HapMap)
genofile <- openfn.gds(snpgdsExampleFileName())

# select the first 30 samples of the YRI population
YRI.id <- read.gdsn(index.gdsn(genofile, "sample.id"))[
        read.gdsn(index.gdsn(genofile, c("sample.annot", "pop.group")))=="YRI"]
YRI.id <- YRI.id[1:30]

# SNP pruning, then a random subset of 500 SNPs
set.seed(1000)
snpset <- snpgdsLDpruning(genofile, sample.id=YRI.id, maf=0.05, missing.rate=0.05)
snpset <- sample(unlist(snpset), 500)

# MLE of the IBD coefficients, with kinship coefficients in the output
mibd <- snpgdsIBDMLE(genofile, sample.id=YRI.id, snp.id=snpset, num.thread=2, kinship=TRUE)
names(mibd)

# log likelihoods at the estimates and under "unrelated"
loglik <- snpgdsIBDMLELogLik(genofile, mibd)
loglik0 <- snpgdsIBDMLELogLik(genofile, mibd, relatedness="unrelated")

# likelihood ratio test
p.value <- pchisq(loglik - loglik0, 1, lower.tail=FALSE)


# plot k1 against k0 for each pair; the dotted red line is k0 + k1 = 1
flag <- lower.tri(mibd$k0)
plot(NaN, xlim=c(0,1), ylim=c(0,1), xlab="k0", ylab="k1")
lines(c(0,1), c(1,0), col="red", lty=3)
points(mibd$k0[flag], mibd$k1[flag])

# specify the allele frequencies
afreq <- snpgdsSNPRateFreq(genofile, sample.id=YRI.id, snp.id=snpset)$AlleleFreq
subibd <- snpgdsIBDMLE(genofile, sample.id=YRI.id[1:25], snp.id=snpset,
        num.thread=2, allele.freq=afreq)
summary(c(subibd$k0 - mibd$k0[1:25, 1:25]))
# ZERO
summary(c(subibd$k1 - mibd$k1[1:25, 1:25]))
# ZERO


# close the genotype file
closefn.gds(genofile)

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

