R language SNPRelate package: help documentation for the snpgdsLDpruning() function

loveR · posted 2012-9-30 11:14:48

snpgdsLDpruning(SNPRelate)
R package containing snpgdsLDpruning(): SNPRelate

Linkage Disequilibrium (LD) based SNP pruning

Translator: LoveR, the robot of 生物统计家园网 (biostatistic.net)

----------Description----------

Recursively removes SNPs within a sliding window, based on pairwise linkage disequilibrium.

----------Usage----------

snpgdsLDpruning(gdsobj, sample.id = NULL, snp.id = NULL, autosome.only = TRUE,
remove.monosnp = TRUE, maf = NaN, missing.rate = NaN,
method = c("composite", "r", "dprime", "corr"), slide.max.bp = 500000,
slide.max.n = NA, ld.threshold = 0.2, num.thread = 1, verbose = TRUE)

----------Arguments----------

Argument: gdsobj
a gdsclass object (from the gdsfmt package)

Argument: sample.id
a vector of sample IDs specifying the selected samples; if NULL, all samples are used

Argument: snp.id
a vector of SNP IDs specifying the selected SNPs; if NULL, all SNPs are used

Argument: autosome.only
if TRUE, use autosomal SNPs only

Argument: remove.monosnp
if TRUE, remove monomorphic SNPs

Argument: maf
use only SNPs with minor allele frequency >= maf; if NaN, no MAF threshold is applied

Argument: missing.rate
use only SNPs with missing rate <= missing.rate; if NaN, no missing-rate threshold is applied

Argument: method
one of "composite", "r", "dprime", "corr"; see Details

Argument: slide.max.bp
the maximum number of base pairs in the sliding window

Argument: slide.max.n
the maximum number of SNPs in the sliding window

Argument: ld.threshold
the LD threshold

Argument: num.thread
the number of CPU cores used

Argument: verbose
if TRUE, show information

----------Details----------

The minor allele frequency and missing rate of each SNP passed in snp.id are calculated over all the samples in sample.id.
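
To make the two per-SNP statistics concrete, here is a minimal base-R sketch on a toy genotype matrix. It is not how SNPRelate computes them internally; the matrix, the variable names, and the thresholds 0.05 and 0.25 are assumptions made up for illustration.

```r
# Toy genotype matrix: rows = samples, columns = SNPs.
# Each entry counts copies of allele A (0, 1, or 2); NA marks a missing call.
geno <- matrix(
  c(0,  1, 2, NA,
    1,  1, 2, 0,
    2,  0, 2, 0,
    1, NA, 2, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("s", 1:4), paste0("snp", 1:4))
)

# Missing rate: fraction of samples with no genotype call at each SNP.
missing_rate <- colMeans(is.na(geno))

# Frequency of allele A among non-missing calls (each sample carries 2 alleles).
freq_a <- colMeans(geno, na.rm = TRUE) / 2

# Minor allele frequency: the smaller of the two allele frequencies.
maf <- pmin(freq_a, 1 - freq_a)

# SNPs kept under the hypothetical thresholds maf >= 0.05 and missing rate <= 0.25.
keep <- maf >= 0.05 & missing_rate <= 0.25
```

In this toy data snp3 and snp4 are monomorphic (MAF of 0), so they fail the MAF filter.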

Four methods can be used to calculate linkage disequilibrium values: "composite" for the LD composite measure, "r" for r square, "dprime" for D', and "corr" for the correlation coefficient. The method "corr" is equivalent to "composite" when SNP genotypes are coded as 0 = BB, 1 = AB, 2 = AA. The LD value is taken as the absolute value of the measurement.
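
As a rough illustration of the "corr" idea only (not SNPRelate's internal code), the absolute Pearson correlation between two SNPs coded 0/1/2 can be computed in base R. The genotype vectors and the 0.2 threshold below are invented for the example.

```r
# Genotypes of two SNPs across six samples, coded as copies of allele A:
# 0 = BB, 1 = AB, 2 = AA.
snp_a <- c(0, 1, 2, 2, 1, 0)
snp_b <- c(0, 1, 2, 1, 1, 0)

# Pearson correlation of the genotype codes; the sign depends on which
# allele is labelled A, so the absolute value is used as the LD measure.
ld_corr <- abs(cor(snp_a, snp_b))

# Flipping the allele coding of one SNP (A <-> B) negates the correlation
# but leaves the absolute value unchanged.
ld_flipped <- abs(cor(2 - snp_a, snp_b))

# With a hypothetical threshold of 0.2, this pair counts as being in LD.
in_ld <- ld_corr > 0.2
```

Taking the absolute value is what makes the measure independent of the arbitrary choice of reference allele.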

It is useful to generate a pruned subset of SNPs that are in approximate linkage equilibrium with each other. The function snpgdsLDpruning recursively removes SNPs within a sliding window based on the pairwise genotypic correlation. SNP pruning is conducted chromosome by chromosome, since SNPs on one chromosome can be considered independent of SNPs on other chromosomes.

The pruning algorithm on a chromosome is described as follows (n is the total number of SNPs on that chromosome):

1) Randomly select a starting position i, and let the current SNP set S = { i };

2) For each position j to the right, from i+1 to n: if the LD between j and any k in S that lies in the same sliding window as j is greater than ld.threshold, skip j; otherwise, let S = S + { j };

3) For each position j to the left, from i-1 down to 1: if the LD between j and any k in S that lies in the same sliding window as j is greater than ld.threshold, skip j; otherwise, let S = S + { j };

4) Output S, the final selection of SNPs.
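
The four steps above can be sketched in base R as follows. This is an illustrative sketch under stated assumptions, not the SNPRelate implementation: the function name ld_prune_sketch and its start argument are hypothetical, LD is taken as the absolute genotype correlation, the window is limited by base-pair distance only, and the toy data contain no missing genotypes.

```r
# geno:  samples x SNPs matrix of 0/1/2 codes for one chromosome (no NAs)
# pos:   base-pair position of each SNP, sorted ascending
# start: starting index i (hypothetical argument; random when NULL)
ld_prune_sketch <- function(geno, pos, ld.threshold = 0.2,
                            slide.max.bp = 500000, start = NULL) {
  n <- ncol(geno)
  if (is.null(start)) start <- sample.int(n, 1)   # step 1: pick i
  selected <- start                                # S = { i }

  # TRUE if SNP j exceeds the LD threshold with any selected SNP
  # located within slide.max.bp base pairs of j.
  in_ld_with_selected <- function(j) {
    near <- selected[abs(pos[selected] - pos[j]) <= slide.max.bp]
    if (length(near) == 0) return(FALSE)
    ld <- abs(cor(geno[, j], geno[, near, drop = FALSE]))
    any(ld > ld.threshold, na.rm = TRUE)
  }

  # step 2: scan right of i; step 3: scan left of i
  right <- if (start < n) (start + 1):n else integer(0)
  left  <- if (start > 1) (start - 1):1 else integer(0)
  for (j in c(right, left)) {
    if (!in_ld_with_selected(j)) selected <- c(selected, j)
  }
  sort(selected)                                   # step 4: output S
}

# Toy data: snp2 duplicates snp1 nearby, snp3 is uncorrelated with snp1,
# snp4 duplicates snp1 but lies outside the 500 kb window.
geno <- cbind(
  snp1 = c(0, 1, 2, 0, 1, 2),
  snp2 = c(0, 1, 2, 0, 1, 2),
  snp3 = c(0, 2, 1, 1, 2, 0),
  snp4 = c(0, 1, 2, 0, 1, 2)
)
pos <- c(1000, 2000, 3000, 900000)

kept <- ld_prune_sketch(geno, pos, start = 1)   # keeps SNPs 1, 3 and 4
```

Fixing start makes the run reproducible. Because the starting position is random by default, different runs can return different SNP sets.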

----------Value----------

Returns a list of SNP IDs, stratified by chromosome.
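
A small base-R sketch of working with a list of this shape. The snpset object below is a mock with invented IDs, not real output from snpgdsLDpruning.

```r
# Mock return value: a named list with one vector of SNP IDs per chromosome.
# The IDs here are invented for illustration.
snpset <- list(chr1 = c(1, 2, 5), chr2 = c(8, 9), chr3 = c(12, 15, 16, 20))

# Number of SNPs retained on each chromosome.
n_per_chr <- lengths(snpset)

# Flatten to a single vector of SNP IDs; use.names = FALSE drops the
# "chr11", "chr12", ... element names that unlist() would otherwise create.
snp.id <- unlist(snpset, use.names = FALSE)
```

The flattened vector is the form expected by the snp.id argument described above.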

----------Author(s)----------

Xiuwen Zheng <zhengx@u.washington.edu>

----------See Also----------

snpgdsLDMat, snpgdsLDpair

----------Examples----------

# load the packages: gdsfmt provides openfn.gds() and closefn.gds()
library(gdsfmt)
library(SNPRelate)

# open an example dataset (HapMap)
genofile <- openfn.gds(snpgdsExampleFileName())

# prune SNPs with the default settings
snpset <- snpgdsLDpruning(genofile)
names(snpset)
#  [1] "chr1"  "chr2"  "chr3"  "chr4"  "chr5"  "chr6"  "chr7"  "chr8"  "chr9"
# [10] "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18"
# ......
head(snpset$chr1)
# [1] 1 2 3 4 5 6

# get SNP ids as a single vector
snp.id <- unlist(snpset)

# close the genotype file
closefn.gds(genofile)

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).
