R package htSeqTools: help documentation for the filterDuplReads() function

loveR · posted 2012-2-25 22:02:57

filterDuplReads(htSeqTools)
filterDuplReads() belongs to the R package htSeqTools

Detect and filter duplicated reads/sequences.

Translator: LoveR, the biostatistic.net robot

----------Description----------

filterDuplReads filters highly repeated sequences, i.e. reads with the same chromosome, start and end positions. As many such sequences are likely due to over-amplification artifacts, this can be a useful pre-processing step for ultra high-throughput sequencing data. For each number of repeats, a false discovery rate of that number being unusually high is computed. The reads with a higher false discovery rate will be removed. For more information on the false discovery rate calculation, please read the fdrEnrichment manual.

tabDuplReads counts the number of reads with no duplications, duplicated once, twice, etc.
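The kind of table described above can be sketched in base R. This is only an illustration on a toy data frame and does not use htSeqTools; the column names and the `key` helper are assumptions made for the example, not the package's internals.

```r
# Toy read table; column names are illustrative, not htSeqTools internals.
reads <- data.frame(
  space  = c("chr1", "chr1", "chr1", "chr2", "chr2", "chr1"),
  start  = c(100, 100, 100, 250, 250, 400),
  end    = c(138, 138, 138, 288, 288, 438),
  strand = c("+", "+", "+", "-", "-", "+"),
  stringsAsFactors = FALSE
)

# One key per distinct read location (chromosome, start, end, strand).
key <- paste(reads$space, reads$start, reads$end, reads$strand, sep = ":")

# Number of times each location occurs.
perLocation <- table(key)

# Number of locations that occur once, twice, three times, ...
dupTable <- table(as.vector(perLocation))
print(dupTable)
```

Here one location appears three times, one twice and one once, so each entry of `dupTable` is 1.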

----------Usage----------

filterDuplReads(x, maxRepeats, fdrOverAmp=0.01, negBinomUse=.999, components=0, mc.cores=1)

tabDuplReads(x, minRepeats=1, mc.cores=1)

----------Arguments----------

Argument: x
Object containing read locations. Methods are currently available for RangedData and RangedDataList. Duplication is assessed based only on the space, start, end and x[['strand']]; even if reads differ in other variables stored in values(x), they are considered duplicated and only the first appearance is returned.
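The "first appearance" rule can be illustrated with base R's `duplicated()`. This is a sketch on a plain data frame, assuming a hypothetical extra column `score` to stand in for the other variables in values(x); it is not the package's own code.

```r
# Toy reads: rows 1 and 2 share a location but differ in 'score'
# ('score' is a hypothetical extra variable used only for illustration).
reads <- data.frame(
  space  = c("chr1", "chr1", "chr2"),
  start  = c(100, 100, 100),
  end    = c(138, 138, 138),
  strand = c("+", "+", "+"),
  score  = c(0.9, 0.1, 0.5),
  stringsAsFactors = FALSE
)

# Only these columns take part in the duplication check.
keyCols <- c("space", "start", "end", "strand")

# duplicated() flags every occurrence after the first one.
isRepeat <- duplicated(reads[, keyCols])

# Keep the first appearance of each location, ignoring 'score'.
firstOnly <- reads[!isRepeat, ]
print(firstOnly)
```

Row 2 is dropped even though its `score` differs from row 1, because `score` is not part of the key.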

Argument: maxRepeats
Reads appearing maxRepeats or more times will be excluded. If not specified, this is set automatically based on fdrOverAmp.
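A minimal base R sketch of a threshold of this kind follows. It assumes that every copy of an over-repeated read is dropped, which this page does not state explicitly; the `key` vector stands in for the chromosome/start/end/strand key and is hypothetical.

```r
# Location keys for six toy reads ("a" appears 3 times, "b" twice, "c" once).
key <- c("a", "a", "a", "b", "b", "c")
maxRepeats <- 3

# For every read, the number of times its location appears.
nAppear <- ave(seq_along(key), key, FUN = length)

# Assumption: all copies of locations seen maxRepeats or more times are dropped.
kept <- key[nAppear < maxRepeats]
print(kept)
```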

Argument: fdrOverAmp
Reads whose false discovery rate of being over-amplified is greater than fdrOverAmp are excluded.

Argument: negBinomUse
Proportion of the counts that will be used to compute the null distribution. Using 1 - 1/1000 means that 99.9% of the reads will be used. The reads with the highest numbers of repetitions are the ones excluded.
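The idea of fitting the null distribution on all but the most repeated reads can be sketched with a quantile cutoff. How htSeqTools trims the counts internally is not described here, so the use of `quantile()` below is an assumption, and the counts are toy values.

```r
# Toy repeat counts: mostly 1-3 repeats, plus one extreme value.
counts <- c(rep(1, 900), rep(2, 90), rep(3, 9), 50)
negBinomUse <- 1 - 1/1000

# Assumption: trim at the negBinomUse quantile of the repeat counts.
cutoff <- quantile(counts, probs = negBinomUse)

# Counts retained for estimating the null distribution.
used <- counts[counts <= cutoff]
print(length(used))
print(max(used))
```

With these toy values the single count of 50 is left out and the remaining 999 counts are kept.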

Argument: components
Number of negative binomials that will be used to fit the null distribution. This description gives the default value as 1, whereas the Usage section shows components=0. The value has to be between 0 and 4. If 0 is given, the optimal number of negative binomials is chosen using the Bayesian information criterion (BIC).
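As a rough illustration of the BIC, the sketch below fits a single negative binomial to simulated counts by maximum likelihood and computes BIC = k*log(n) - 2*logLik. It does not reproduce the package's mixture fitting; the simulated data, the helper `negLogLik` and the use of `optim()` are all assumptions made for the example. Selecting the number of components would mean repeating the calculation for each candidate model and keeping the one with the smallest BIC.

```r
set.seed(1)
# Simulated counts standing in for per-location repeat counts (illustrative only).
counts <- rnbinom(500, size = 2, mu = 3)

# Negative log-likelihood of one negative binomial; parameters are
# optimised on the log scale so that size and mu stay positive.
negLogLik <- function(par, x) {
  -sum(dnbinom(x, size = exp(par[1]), mu = exp(par[2]), log = TRUE))
}

fit <- optim(c(0, 0), negLogLik, x = counts)

k <- 2                  # free parameters: size and mu
n <- length(counts)     # number of observations

# fit$value is the minimised negative log-likelihood, so
# BIC = k*log(n) - 2*logLik = k*log(n) + 2*fit$value.
bic <- k * log(n) + 2 * fit$value
print(bic)
```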

Argument: mc.cores
Number of cores to be used in parallel computing (passed on to mclapply).
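For readers unfamiliar with mclapply, here is a small standalone example of how its mc.cores argument is used. The per-chromosome split and the `chunks` list are hypothetical and only show the calling pattern, not how htSeqTools divides its work.

```r
library(parallel)

# Hypothetical per-chromosome start positions.
chunks <- list(chr1 = c(100, 100, 250), chr2 = c(400, 400, 400, 900))

# Count repeated start positions in each chunk. mc.cores = 1 runs serially;
# larger values fork worker processes (not supported on Windows).
dupPerChunk <- mclapply(chunks, function(starts) sum(duplicated(starts)),
                        mc.cores = 1)
print(unlist(dupPerChunk))
```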

Argument: minRepeats
The table is only produced for reads with at least minRepeats repeats.

----------Value----------

filterDuplReads returns x without highly repetitive sequences, as determined by maxRepeats or fdrOverAmp.

tabDuplReads returns a table counting the number of sequences repeated 1 time, 2 times, 3 times, etc.

----------Methods----------

Methods for filterDuplReads and tabDuplReads

signature(x = "RangedData")  Two reads are duplicated if

signature(x = "RangedDataList")  The method is applied

----------Author(s)----------

Evarist Planet, David Rossell, Oscar Flores

----------See Also----------

fdrEnrichedCounts to compute the posterior probability

----------Examples----------

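# Load the package that provides filterDuplReads
library(htSeqTools)

# Simulate 1000 reads of width 39 on two chromosomes
set.seed(1)
st <- round(rnorm(1000,500,100))
strand <- rep(c('+','-'),each=500)
space <- sample(c('chr1','chr2'),size=length(st),replace=TRUE)
sample1 <- RangedData(IRanges(st,st+38),strand=strand,space=space)

# Add artificial repeats: 20 identical reads at chr1, 400-438, + strand
st <- rep(400,20)
repeats <- RangedData(IRanges(st,st+38),strand='+',space='chr1')
sample1 <- rbind(sample1,repeats)

# Filter duplicated reads; maxRepeats is not given, so it is set from fdrOverAmp
filterDuplReads(sample1)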

When reposting, please credit biostatistic.net (http://www.biostatistic.net).

