
R htSeqTools package: filterDuplReads() function help documentation

Posted on 2012-2-25 22:02:57
filterDuplReads(htSeqTools)
filterDuplReads() belongs to R package: htSeqTools

Detect and filter duplicated reads/sequences.

Translator: LoveR, the translation robot of biostatistic.net (生物统计家园网)

----------Description----------

filterDuplReads filters highly repeated sequences, i.e. reads with the same chromosome, start and end positions. As many such sequences are likely to be over-amplification artifacts, this can be a useful pre-processing step for ultra high-throughput sequencing data. For each number of repeats, a false discovery rate (FDR) is computed for that number of repeats being unusually high, and reads whose number of repeats is declared unusually high at the chosen FDR level (fdrOverAmp) are removed. For more information on the FDR calculation, see the fdrEnrichedCounts manual.
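As a rough illustration of the duplicate rule, here is a minimal base-R sketch that works on a plain data.frame instead of a RangedData object. The function filter_dupl_reads_sketch and the column names space, start, end and strand are hypothetical; the threshold is passed in directly (the FDR-based choice is not reproduced), and keeping the first copy of an over-repeated read is an assumed reading of the x and maxRepeats descriptions, not the package's code.

```r
# Hypothetical helper (not part of htSeqTools). Reads are rows of a
# data.frame with columns space, start, end and strand.
filter_dupl_reads_sketch <- function(reads, max_repeats) {
  # Duplicate key: chromosome (space), start, end and strand only;
  # any other columns are ignored when deciding what is a duplicate
  key <- paste(reads$space, reads$start, reads$end, reads$strand, sep = ":")
  # For every read, the number of reads sharing its key
  n_rep <- ave(seq_along(key), key, FUN = length)
  # Keep reads seen fewer than max_repeats times; of the over-repeated ones
  # keep only the first occurrence (assumed behaviour)
  keep <- n_rep < max_repeats | !duplicated(key)
  reads[keep, , drop = FALSE]
}

reads <- data.frame(
  space  = c("chr1", "chr1", "chr1", "chr1", "chr2"),
  start  = c(400, 400, 400, 400, 120),
  end    = c(438, 438, 438, 438, 158),
  strand = c("+", "+", "+", "-", "+")
)
filtered <- filter_dupl_reads_sketch(reads, max_repeats = 3)
nrow(filtered)  # 3: one "+" copy at chr1:400-438, the "-" read, the chr2 read
```

The "-" read at the same coordinates is kept because strand is part of the duplicate key.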

tabDuplReads counts the number of reads with no duplications, duplicated once, duplicated twice, etc.
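A similar base-R sketch of what tabDuplReads reports: how many distinct reads occur once, twice, three times and so on. tab_dupl_reads_sketch and its data.frame input are hypothetical stand-ins for the package method, and counting distinct reads (rather than all copies) is an assumption based on the Value section.

```r
# Hypothetical helper (not part of htSeqTools): tabulate how many distinct
# reads occur 1, 2, 3, ... times, optionally only for min_repeats or more
tab_dupl_reads_sketch <- function(reads, min_repeats = 1) {
  key <- paste(reads$space, reads$start, reads$end, reads$strand, sep = ":")
  copies <- table(key)                        # copies of each distinct read
  copies <- copies[copies >= min_repeats]     # mimic the minRepeats argument
  table(as.vector(copies), dnn = "repeats")   # distinct reads per repeat count
}

reads <- data.frame(
  space  = c("chr1", "chr1", "chr1", "chr1", "chr2"),
  start  = c(400, 400, 400, 250, 120),
  end    = c(438, 438, 438, 288, 158),
  strand = c("+", "+", "+", "+", "-")
)
tab_dupl_reads_sketch(reads)  # repeats 1: 2 distinct reads; repeats 3: 1
```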


----------Usage----------


filterDuplReads(x, maxRepeats, fdrOverAmp=0.01, negBinomUse=.999, components=0, mc.cores=1)

tabDuplReads(x, minRepeats=1, mc.cores=1)



----------Arguments----------

Argument: x
Object containing read locations. Methods are currently available for RangedData and RangedDataList. Duplication is assessed based only on space, start, end and x[['strand']]; i.e. reads that differ only in other variables stored in values(x) are still considered duplicated, and only the first occurrence is returned.


Argument: maxRepeats
Reads appearing maxRepeats or more times are excluded. If not specified, maxRepeats is set automatically based on fdrOverAmp.


Argument: fdrOverAmp
False discovery rate threshold for over-amplification: reads whose number of repeats is declared unusually high at this false discovery rate are excluded. Only used when maxRepeats is not specified.
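When maxRepeats is left unspecified it is derived from fdrOverAmp. The sketch below assumes the rule "smallest number of repeats whose FDR is at or below fdrOverAmp"; choose_max_repeats_sketch is a hypothetical helper, and the FDR values are made up for illustration, not output of fdrEnrichedCounts.

```r
# Hypothetical helper (not part of htSeqTools). Assumed rule: maxRepeats is
# the smallest number of repeats whose over-amplification FDR <= fdr_over_amp.
choose_max_repeats_sketch <- function(n_repeats, fdr, fdr_over_amp = 0.01) {
  flagged <- n_repeats[fdr <= fdr_over_amp]
  if (length(flagged) == 0) return(Inf)  # nothing flagged: keep every read
  min(flagged)
}

# Illustrative FDR values only, one per number of repeats
n_repeats <- 1:6
fdr <- c(1, 1, 0.4, 0.05, 0.004, 0.001)
choose_max_repeats_sketch(n_repeats, fdr)  # 5
```

Loosening the threshold to 0.05 would lower the cut-off to 4 repeats, so more reads are removed.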


Argument: negBinomUse
Proportion of counts used to compute the null distribution. For example, 1 - 1/1000 means that 99.9% of the reads are used; the excluded 0.1% are those with the highest numbers of repetitions.
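To make the role of negBinomUse concrete, this sketch keeps the given fraction of repeat counts with the fewest repetitions and drops the rest before any null model would be fitted. null_counts_sketch is hypothetical, and treating the input as a vector of per-read repeat counts is an assumption.

```r
# Hypothetical helper (not part of htSeqTools): keep the fraction
# neg_binom_use of counts with the fewest repetitions, dropping the
# most-repeated ones before fitting a null distribution
null_counts_sketch <- function(counts, neg_binom_use = 0.999) {
  n_keep <- ceiling(length(counts) * neg_binom_use)
  sort(counts)[seq_len(n_keep)]
}

counts <- c(1, 1, 2, 1, 3, 1, 1, 2, 1, 40)       # one read repeated 40 times
null_counts_sketch(counts, neg_binom_use = 0.9)  # the 40 is dropped
```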


Argument: components
Number of negative binomial components used to fit the null distribution. This value must be between 0 and 4; the default is 0 (see Usage). If 0 is given, the optimal number of negative binomials is chosen using the Bayesian information criterion (BIC).
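For components = 0, the BIC selection can be written as below. This is the standard BIC definition; the parameter count p_K assumes each negative binomial component has its own mean and dispersion plus K - 1 free mixing weights, and the candidate range 1 to 4 is inferred from the allowed values, so the package's internal details may differ.

```latex
% Assumed parametrization: K negative binomial components, each with its own
% mean and dispersion, plus K - 1 free mixing weights
\mathrm{BIC}(K) = -2 \log \hat{L}_K + p_K \log n, \qquad p_K = 3K - 1,
\qquad
\hat{K} = \operatorname*{arg\,min}_{K \in \{1, 2, 3, 4\}} \mathrm{BIC}(K)
```

Here \hat{L}_K is the maximized likelihood of the K-component mixture fitted to the n counts; the extra parameters of a larger mixture must buy enough likelihood to offset the p_K log n penalty.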


Argument: mc.cores
Number of cores to be used in parallel computing (passed on to mclapply).
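mc.cores is handed to mclapply from the parallel package. One plausible way to use it, sketched below, is one task per chromosome; count_repeats_by_chrom_sketch is hypothetical and the per-chromosome split is an assumption about how the work could be divided. mclapply relies on forking, so mc.cores > 1 is not supported on Windows.

```r
library(parallel)

# Hypothetical helper (not part of htSeqTools): count copies of each read
# separately for every chromosome, one chromosome per mclapply task
count_repeats_by_chrom_sketch <- function(reads, mc.cores = 1) {
  by_chrom <- split(reads, reads$space)
  mclapply(by_chrom, function(r) {
    table(paste(r$start, r$end, r$strand, sep = ":"))
  }, mc.cores = mc.cores)
}

reads <- data.frame(
  space  = c("chr1", "chr1", "chr2"),
  start  = c(400, 400, 120),
  end    = c(438, 438, 158),
  strand = c("+", "+", "-")
)
res <- count_repeats_by_chrom_sketch(reads, mc.cores = 1)
res$chr1  # "400:438:+" appears 2 times
```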


Argument: minRepeats
The table is only produced for reads with at least minRepeats repeats.


----------Value----------

filterDuplReads returns x without the highly repetitive sequences, as determined by maxRepeats or fdrOverAmp.

tabDuplReads returns a table counting the number of sequences repeated 1 time, 2 times, 3 times, etc.


----------Methods----------

Methods for filterDuplReads and tabDuplReads

signature(x = "RangedData")  Two reads are duplicated if they share the same space (chromosome), start, end and strand.

signature(x = "RangedDataList")  The method is applied


----------Author(s)----------


Evarist Planet, David Rossell, Oscar Flores



----------See Also----------

fdrEnrichedCounts to compute the posterior probability


----------Examples----------


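library(IRanges)     # provides IRanges() and RangedData()
library(htSeqTools)  # provides filterDuplReads()

# Simulate 1000 reads of length 39 centred around position 500
set.seed(1)
st <- round(rnorm(1000,500,100))
strand <- rep(c('+','-'),each=500)
space <- sample(c('chr1','chr2'),size=length(st),replace=TRUE)
sample1 <- RangedData(IRanges(st,st+38),strand=strand,space=space)

# Add artificial repeats: 20 identical reads at chr1:400-438 on the '+' strand
st <- rep(400,20)
repeats <- RangedData(IRanges(st,st+38),strand='+',space='chr1')
sample1 <- rbind(sample1,repeats)

filterDuplReads(sample1)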
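The same simulation can be rebuilt as a plain data.frame, which may help readers whose IRanges installation no longer provides RangedData. This base-R sketch only counts duplicates so the artificial repeat becomes visible; it does not call filterDuplReads or reproduce its FDR step, and the names reads and copies are illustrative.

```r
# Base-R rebuild of the simulated reads (illustration only; no htSeqTools or
# IRanges objects are involved, and no filtering is performed)
set.seed(1)
st <- round(rnorm(1000, 500, 100))
reads <- data.frame(
  space  = sample(c("chr1", "chr2"), size = length(st), replace = TRUE),
  start  = st,
  end    = st + 38,
  strand = rep(c("+", "-"), each = 500)
)
# Append the 20 artificial copies of chr1:400-438 on the "+" strand
repeats <- data.frame(space = "chr1", start = rep(400, 20), end = 438, strand = "+")
reads <- rbind(reads, repeats)

# Copies of each distinct (space, start, end, strand) combination
copies <- table(paste(reads$space, reads$start, reads$end, reads$strand, sep = ":"))
head(sort(copies, decreasing = TRUE), 3)  # the artificial repeat tops the list
```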

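When reposting, please credit the source: biostatistic.net (生物统计家园网, http://www.biostatistic.net).

Notes:
Note 1: To make learning easier, this document was translated by LoveR, the translation robot of biostatistic.net. It is provided only for personal reference while learning R; biostatistic.net reserves the copyright.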