
R snpStats package: snp.imputation() function help documentation (Chinese-English bilingual)

Posted on 2012-2-26 14:47:50
snp.imputation(snpStats)
snp.imputation() belongs to the R package: snpStats

                                        Calculate imputation rules

                                         Translator: 生物统计家园网 bot LoveR

----------Description----------

Given two sets of SNPs typed in the same subjects, this function calculates rules that can be used to impute one set from the other in a subsequent sample. The function can also calculate rules for imputing each SNP in a single dataset from other SNPs in the same dataset.


----------Usage----------


snp.imputation(X, Y, pos.X, pos.Y, phase=FALSE, try=50, stopping=c(0.95, 4, 0.05),
               use.hap=c(1.0, 0.0), em.cntrl=c(50,0.01,10,0.01), minA=5)



----------Arguments----------

Argument: X
An object of class "SnpMatrix" or "XSnpMatrix" containing observations of the SNPs to be used for imputation ("predictor SNPs").


Argument: Y
An object of the same class as X containing observations of the SNPs to be imputed in a future sample ("target SNPs"). If this argument is missing, the target SNPs are also drawn from X.


Argument: pos.X
The positions of the predictor SNPs. Can be missing if there is no Y argument and the columns of X are in genome position order.


Argument: pos.Y
The positions of the target SNPs. Only required when a Y argument is present.


Argument: phase
See "Details" below.


Argument: try
The number of potential predictor SNPs to be considered in the stepwise regression procedure around each target SNP. The nearest try predictor SNPs to each target SNP will be considered.


Argument: stopping
Parameters of the stopping rule for the stepwise regression (see below).


Argument: use.hap
Parameters to control use of the haplotype imputation method (see below).


Argument: em.cntrl
Parameters to control the test for convergence of the EM algorithm for fitting phased haplotypes (see below).


Argument: minA
A minimum data quantity measure for estimating pairwise linkage disequilibrium (see below).


----------Details----------

The routine first carries out a series of stepwise least-squares regression analyses in which each Y SNP is regressed on the nearest try predictor (X) SNPs. If phase is TRUE, the regressions will be calculated at the chromosome (haplotype) level, with variances being simply p(1-p) and covariances estimated from the estimated two-locus haplotypes (this option is not yet implemented). Otherwise, the analysis is carried out at the genotype level, based on conventional variance and covariance estimates using the "pairwise.complete.obs" missing value treatment (see cov). New SNPs are added to the regression until (a) the value of R^2 exceeds the first parameter of stopping, (b) the number of "tag" SNPs has reached the maximum set in the second parameter of stopping, or (c) the change in R^2 does not achieve the target set by the third parameter of stopping. If the third parameter of stopping is NA, this last test is replaced by a test for improvement in the Akaike information criterion (AIC).
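The stepwise procedure described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's compiled implementation: the helper name stepwise_tags and the toy data are hypothetical, genotypes are coded 0/1/2, and the AIC variant (third stopping parameter NA) is not covered.

```r
# Hypothetical helper (not part of snpStats): forward stepwise choice of
# "tag" SNPs for a single target SNP, working from covariances only.
stepwise_tags <- function(x, y, pos.x, pos.y, try = 50,
                          stopping = c(0.95, 4, 0.05)) {
  # Keep only the `try` predictors nearest to the target position
  near <- order(abs(pos.x - pos.y))[seq_len(min(try, ncol(x)))]
  # Pairwise-complete variances and covariances, as in the text
  v <- cov(cbind(y, x[, near, drop = FALSE]), use = "pairwise.complete.obs")
  vyy <- v[1, 1]
  vxy <- v[-1, 1]
  vxx <- v[-1, -1, drop = FALSE]
  tags <- integer(0)
  r2 <- 0
  # Rules (a) and (b): stop once R^2 is high enough or enough tags are chosen
  while (r2 <= stopping[1] && length(tags) < stopping[2]) {
    cand <- setdiff(seq_along(near), tags)
    if (length(cand) == 0) break
    # R^2 obtained if each candidate were added to the current tag set
    r2.new <- sapply(cand, function(j) {
      s <- c(tags, j)
      tryCatch(drop(vxy[s] %*% solve(vxx[s, s, drop = FALSE], vxy[s])) / vyy,
               error = function(e) NA_real_)  # singular covariance matrix
    })
    best <- which.max(r2.new)
    # Rule (c): stop when the gain in R^2 is below the third parameter
    if (length(best) == 0 || r2.new[best] - r2 < stopping[3]) break
    tags <- c(tags, cand[best])
    r2 <- r2.new[best]
  }
  list(tags = near[tags], r2 = r2)
}

# Toy data: 10 simulated predictor SNPs; the target duplicates column 3
set.seed(1)
x <- matrix(rbinom(200 * 10, 2, 0.3), nrow = 200, ncol = 10)
y <- x[, 3]
res <- stepwise_tags(x, y, pos.x = (1:10) * 1000, pos.y = 3500, try = 5)
```

Here res$tags holds the column indices of the selected tag SNPs in x and res$r2 the R^2 reached when the search stopped.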

After choosing the set of "tag" SNPs in this way, a prediction rule is generated by calculating phased haplotype frequencies, either (a) under a log-linear model for linkage disequilibrium with only first-order association terms fitted, or (b) under the "saturated" model. These methods do not differ if there is only one tag SNP; otherwise, the choice between methods is controlled by the use.hap parameters. If the prediction, as measured by R^2, achieved with the log-linear smoothing model exceeds a threshold (the first parameter of use.hap), then this method is used. Otherwise, if the gain in R^2 achieved by using the second method exceeds the second parameter of use.hap, then the second method is used. Current experience is that the log-linear method is rarely preferred with reasonable choices for use.hap, and imputation is much faster when only the second method is considered. The current default ensures that this second method is used, but the other possibility might be considered if imputing from very small samples; however, this code is not extensively tested and should be regarded as experimental.
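The choice between the two methods can be written as a small decision function. This is a sketch of one reading of the paragraph above: choose_method is a hypothetical name, and falling back to the log-linear model when neither condition holds is an assumption, since the text does not spell out that case.

```r
# Hypothetical helper (not part of snpStats): pick the haplotype model from
# the R^2 achieved by each method and the two use.hap parameters.
choose_method <- function(r2.loglin, r2.saturated, use.hap = c(1.0, 0.0)) {
  if (r2.loglin > use.hap[1]) {
    "log-linear"   # log-linear model already predicts well enough
  } else if (r2.saturated - r2.loglin > use.hap[2]) {
    "saturated"    # gain from the saturated model is large enough
  } else {
    "log-linear"   # assumed fallback, not stated in the text
  }
}

# With the defaults, R^2 cannot exceed 1.0, so any gain selects "saturated"
default.choice <- choose_method(0.90, 0.93)
# With looser, purely illustrative thresholds the log-linear model can win
loose.choice <- choose_method(0.90, 0.93, use.hap = c(0.85, 0.05))
```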

The argument em.cntrl controls convergence testing for the EM algorithm for fitting haplotype frequencies and the IPF algorithm for fitting the log-linear model. The first parameter is the maximum number of EM iterations, and the second parameter is the threshold for the change in log likelihood below which the iteration is judged to have converged. The third and fourth parameters give the maximum number of IPF iterations and the convergence tolerance. There should be no need to change the default values.
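To illustrate how the first two em.cntrl parameters act, here is a textbook EM for two-locus haplotype frequencies, using the default values 50 and 0.01 as iteration cap and log likelihood threshold. It is a minimal sketch, not the package's code: em_two_locus and the genotype counts are hypothetical, the input is assumed to be a complete 3x3 table of genotype counts, and the use of an absolute change in log likelihood is an assumption.

```r
# Hypothetical helper (not part of snpStats). g[i + 1, j + 1] is the number
# of subjects carrying i copies of allele 1 at SNP 1 and j copies at SNP 2.
em_two_locus <- function(g, max.it = 50, tol = 0.01) {
  stopifnot(all(dim(g) == c(3, 3)))
  n2 <- 2 * sum(g)   # number of chromosomes
  dh <- g[2, 2]      # double heterozygotes: the only phase-ambiguous cell
  # Haplotype counts that can be read off without knowing phase
  k <- c(h00 = 2 * g[1, 1] + g[1, 2] + g[2, 1],
         h01 = 2 * g[1, 3] + g[1, 2] + g[2, 3],
         h10 = 2 * g[3, 1] + g[2, 1] + g[3, 2],
         h11 = 2 * g[3, 3] + g[2, 3] + g[3, 2])
  xlogy <- function(x, y) ifelse(x > 0, x * log(y), 0)  # treats 0 * log(0) as 0
  p <- c(h00 = 0.25, h01 = 0.25, h10 = 0.25, h11 = 0.25)
  ll.old <- -Inf
  converged <- FALSE
  for (it in seq_len(max.it)) {
    # E-step: probability that a double heterozygote is h00/h11, not h01/h10
    cis <- p[["h00"]] * p[["h11"]]
    trans <- p[["h01"]] * p[["h10"]]
    w <- if (cis + trans > 0) cis / (cis + trans) else 0.5
    # M-step: update frequencies from the expected haplotype counts
    p <- (k + dh * c(w, 1 - w, 1 - w, w)) / n2
    # Observed-data log likelihood, up to an additive constant
    ll <- sum(xlogy(k, p)) +
      xlogy(dh, p[["h00"]] * p[["h11"]] + p[["h01"]] * p[["h10"]])
    if (abs(ll - ll.old) < tol) {
      converged <- TRUE
      break
    }
    ll.old <- ll
  }
  list(p = p, iterations = it, converged = converged)
}

# Made-up genotype counts for 100 subjects
g <- matrix(c(30, 10, 2,
              12, 25, 6,
               1,  7, 7), nrow = 3, byrow = TRUE)
em.cntrl <- c(50, 0.01, 10, 0.01)
fit <- em_two_locus(g, max.it = em.cntrl[1], tol = em.cntrl[2])
```

The third and fourth parameters would play the same roles for the IPF fit of the log-linear model, which is not sketched here.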

All SNPs selected for imputation must have sufficient data for estimating pairwise linkage disequilibrium with each other and with the target SNP. The statistic chosen is based on the four-fold tables of two-locus haplotype frequencies. If the frequencies in such a table are labelled a, b, c and d, then t = min(a,d) if ad>bc and t = min(b,c) otherwise. The cell frequency t must exceed minA for all pairwise comparisons.
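The statistic t is simple enough to compute by hand. The sketch below assumes the table is laid out with a and d on one diagonal and b and c on the other, and that "exceed" means strictly greater than; ld_min_cell and the example tables are hypothetical.

```r
# Hypothetical helper (not part of snpStats): the statistic t for one
# four-fold table of two-locus haplotype frequencies.
ld_min_cell <- function(a, b, c, d) {
  if (a * d > b * c) min(a, d) else min(b, c)
}

minA <- 5  # default value of the minA argument
# Made-up tables, each given as c(a, b, c, d)
tables <- list(strong   = c(60, 5, 8, 27),
               sparse   = c(96, 1, 1, 2),
               negative = c(3, 40, 50, 7))
t.values <- sapply(tables, function(x) ld_min_cell(x[1], x[2], x[3], x[4]))
usable <- t.values > minA  # TRUE where the pair carries enough data
```

In this toy input the "sparse" table fails the check because its smaller diagonal cell holds only 2 haplotypes.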


----------Value----------

An object of class "ImputationRules".


----------Note----------

The phase=TRUE option is not yet implemented.


----------Author(s)----------


David Clayton <david.clayton@cimr.cam.ac.uk>



----------References----------

Human Heredity, 56:18-31.


----------See Also----------

ImputationRules-class


----------Examples----------


# Remove 5 SNPs from a dataset and derive imputation rules for them
library(snpStats)
data(for.exercise)
# Column indices of the SNPs to be treated as targets
sel <- c(20, 1000, 2000, 3000, 5000)
to.impute <- snps.10[,sel]
impute.from <- snps.10[,-sel]
# Genome positions of the target and predictor SNPs
pos.to <- snp.support$position[sel]
pos.fr <- snp.support$position[-sel]
imp <- snp.imputation(impute.from, to.impute, pos.fr, pos.to)
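
Once rules have been derived, a natural next step is to inspect them and apply them. The following sketch repeats the example and then uses imputation.r2() and impute.snps(), both of which are provided by snpStats as far as I am aware; the guard on package availability and the variable names r2 and dosage are additions for illustration, so check the package manual before relying on the details.

```r
# Runs only if snpStats is installed; otherwise the block is skipped
if (requireNamespace("snpStats", quietly = TRUE)) {
  library(snpStats)
  data(for.exercise)
  sel <- c(20, 1000, 2000, 3000, 5000)
  to.impute <- snps.10[, sel]
  impute.from <- snps.10[, -sel]
  pos.to <- snp.support$position[sel]
  pos.fr <- snp.support$position[-sel]
  imp <- snp.imputation(impute.from, to.impute, pos.fr, pos.to)

  # Predicted R^2 of each imputation rule
  r2 <- imputation.r2(imp)
  # Apply the rules to the predictor SNPs, returning expected allele counts
  dosage <- impute.snps(imp, impute.from, as.numeric = TRUE)
}
```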

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

