srFilter(ShortRead)
srFilter() belongs to the R package ShortRead
Functions for user-created and built-in ShortRead filters
Description
These functions create user-defined (srFilter) or built-in instances of SRFilter objects. Filters can be applied to objects from ShortRead, returning a logical vector that can be used to subset the object so that it includes only those components satisfying the filter.
Usage
srFilter(fun, name = NA_character_, ...)
## S4 method for signature 'missing'
srFilter(fun, name=NA_character_, ...)
## S4 method for signature 'function'
srFilter(fun, name=NA_character_, ...)
compose(filt, ..., .name)
idFilter(regex=character(0), fixed=FALSE, exclude=FALSE,
         .name="idFilter")
chromosomeFilter(regex=character(0), fixed=FALSE, exclude=FALSE,
                 .name="ChromosomeFilter")
positionFilter(min=-Inf, max=Inf, .name="PositionFilter")
strandFilter(strandLevels=character(0), .name="StrandFilter")
occurrenceFilter(min=1L, max=1L,
                 withSread=c(TRUE, FALSE, NA),
                 duplicates=c("head", "tail", "sample", "none"),
                 .name=.occurrenceName(min, max, withSread,
                                       duplicates))
nFilter(threshold=0L, .name="CleanNFilter")
polynFilter(threshold=0L, nuc=c("A", "C", "T", "G", "other"),
            .name="PolyNFilter")
dustyFilter(threshold=Inf, batchSize=NA, .name="DustyFilter")
srdistanceFilter(subject=character(0), threshold=0L,
                 .name="SRDistanceFilter")
alignQualityFilter(threshold=0L, .name="AlignQualityFilter")
alignDataFilter(expr=expression(), .name="AlignDataFilter")
Arguments
Argument: fun
An object of class function to be used as a filter. fun must accept a single named argument x and is expected to return a logical vector such that x[fun(x)] selects only those elements of x satisfying the conditions of fun.
Argument: name
A character(1) object to be used as the name of the filter. The name is useful for debugging and reference.
Argument: filt
An SRFilter object, to be used with additional arguments to create a composite filter.
Argument: .name
An optional character(1) object used to override the name applied to default filters.
Argument: regex
Either character(0) or a character(1) regular expression, used as grep(regex, chromosome(x)) to filter based on chromosome (chromosomeFilter) or as grep(regex, id(x)) to filter based on identifier (idFilter). The default (character(0)) performs no filtering.
Argument: fixed
logical(1) passed to grep, influencing how pattern matching occurs.
Argument: exclude
logical(1) which, when TRUE, uses regex to exclude, rather than include, reads.
Argument: min
numeric(1)
Argument: max
numeric(1). For positionFilter, min and max define the closed interval in which position must fall, i.e., min <= position <= max. For occurrenceFilter, min and max define the minimum and maximum number of times a read occurs after the filter.
Argument: strandLevels
Either character(0) or character(1) containing strand levels to be selected. ShortRead objects have standard strand levels NA, "+", "-", "*", with NA meaning strand information not available and "*" meaning strand information not relevant.
Argument: withSread
A logical(1) indicating whether uniqueness is based on the read sequence together with chromosome, position, and strand (withSread=TRUE), only on chromosome, position, and strand (withSread=FALSE), or only on the read sequence (withSread=NA), as described for occurrenceFilter below.
Argument: duplicates
Either character(1), a function name, or a function taking a single argument. Influences how duplicates are handled, as described for occurrenceFilter below.
Argument: threshold
A numeric(1) value representing a minimum (srdistanceFilter, alignQualityFilter) or maximum (nFilter, polynFilter, dustyFilter) criterion for the filter. The minima and maxima are closed-interval (i.e., x >= threshold, x <= threshold for some property x of the object being filtered).
Argument: nuc
A character vector containing IUPAC symbols for nucleotides or the value "other" corresponding to all non-nucleotide symbols, e.g., N.
Argument: batchSize
NA or an integer(1) vector indicating the number of DNA sequences to be processed simultaneously by dustyFilter. By default, all reads are processed simultaneously. Smaller values use less memory but are computationally less efficient.
Argument: subject
A character() of any length, to be used as the corresponding argument to srdistance.
Argument: expr
An expression to be evaluated with pData(alignData(x)).
Argument: ...
Additional arguments for subsequent methods; these arguments are not currently used.
Details
srFilter allows users to construct their own filters. The fun argument to srFilter must be a function accepting a single argument x and returning a logical vector that can be used to select elements of x satisfying the filter with x[fun(x)].
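The contract on fun can be illustrated without ShortRead itself. The following is a minimal base-R sketch, not ShortRead code: reads and long_enough are hypothetical names, and a plain character vector stands in for a ShortRead object.

```r
# Hypothetical stand-in for a ShortRead object: a plain character vector.
reads <- c("ACGT", "AC", "ACGTACGT", "A")

# A filter function takes a single argument x and returns a logical
# vector with one element per element of x.
long_enough <- function(x) nchar(x) >= 4

keep <- long_enough(reads)   # logical vector of length(reads)
selected <- reads[keep]      # same as reads[long_enough(reads)]
```

With ShortRead attached, a function of this shape (written against a ShortRead object rather than a character vector) is what would be passed as the fun argument of srFilter.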
The signature(fun="missing") method creates a default filter that returns a vector of TRUE values with length equal to length(x).
compose constructs a new filter from one or more existing filters. The result is a filter that returns a logical vector with TRUE at the indices of components of x that pass all filters. If .name is not provided, the name of the composite filter consists of the names of all component filters, each separated by " o ".
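The behaviour described for compose (all component filters must pass, names joined by " o ") can be emulated for plain predicate functions. This is a sketch under stated assumptions: compose_sketch, is_long, and starts_a are hypothetical helpers written for illustration, and this is not how ShortRead implements compose.

```r
# Hypothetical helper mimicking the described behaviour of compose()
# for a named list of plain predicate functions.
compose_sketch <- function(filters) {
  stopifnot(length(filters) >= 1, !is.null(names(filters)))
  fun <- function(x) {
    # Start from all TRUE, then AND in the result of each component.
    Reduce(function(acc, f) acc & f(x), filters, rep(TRUE, length(x)))
  }
  list(fun = fun, name = paste(names(filters), collapse = " o "))
}

is_long  <- function(x) nchar(x) >= 4
starts_a <- function(x) startsWith(x, "A")

both   <- compose_sketch(list(IsLong = is_long, StartsA = starts_a))
reads  <- c("ACGT", "AC", "CCGTACGT", "AAAAA")
passed <- both$fun(reads)   # TRUE only where every component is TRUE
```

Combining with logical AND means a single failing component is enough to drop an element, which matches the "pass all filters" wording above.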
The remaining functions documented on this page are built-in filters that accept an argument x and return a logical vector of length(x) indicating which components of x satisfy the filter.
idFilter selects elements satisfying grep(regex, id(x), fixed=fixed).
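Because grep returns indices while a filter returns a logical vector, it may help to see the conversion and the effect of fixed and exclude on a toy input. The sketch below uses base R only; ids is a hypothetical character vector standing in for id(x) or chromosome(x), and the conversion shown is an illustration, not ShortRead's internal code.

```r
# Hypothetical identifiers standing in for id(x) or chromosome(x).
ids <- c("chr5.fa", "chr15.fa", "chr5_fa", "chrX.fa")

# fixed = TRUE: the pattern is matched literally, "." is just a dot.
idx <- grep("chr5.fa", ids, fixed = TRUE)

# fixed = FALSE: the pattern is a regular expression, "." matches any
# single character, so "chr5_fa" matches as well.
idx_regex <- grep("chr5.fa", ids, fixed = FALSE)

# Convert the matching indices to a logical vector of length(ids).
keep <- seq_along(ids) %in% idx

# exclude = TRUE corresponds to inverting the selection.
keep_excluded <- !keep
```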
chromosomeFilter selects elements satisfying grep(regex, chromosome(x), fixed=fixed).
positionFilter selects elements satisfying min <= position(x) <= max.
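The closed-interval condition can be checked on a small example. In this base-R sketch, pos, lo, and hi are hypothetical names standing in for position(x), min, and max (renamed to avoid masking the base functions min and max).

```r
# Hypothetical alignment positions standing in for position(x).
pos <- c(10, 50, 75, 100, 150)

lo <- 50    # plays the role of the min argument
hi <- 100   # plays the role of the max argument

# Closed interval: both end points are included.
keep <- lo <= pos & pos <= hi

# The documented defaults (-Inf, Inf) keep every position.
keep_default <- -Inf <= pos & pos <= Inf
```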
strandFilter selects elements satisfying match(strand(x), strandLevels, nomatch=0) > 0.
occurrenceFilter selects elements that occur >=min and <=max times. withSread determines how reads are treated when determining occurrence: TRUE uses the sread, chromosome, strand, and position; FALSE uses chromosome, strand, and position; NA uses only sread. The default is withSread=TRUE. duplicates determines how reads that occur more than max times are treated: "head" selects the first max reads of each set of duplicates, "tail" the last max reads, and "sample" a random sample of max reads; "none" removes all reads represented more than max times. The user can also provide a function (as used by tapply) of a single argument to select amongst reads.
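The difference between duplicates="none" and duplicates="head" can be emulated in base R. The sketch below is one reading of the description above, not ShortRead's implementation: reads is a hypothetical character vector (the withSread=NA case, where only the sequence defines a duplicate), and min_occ and max_occ stand in for min and max.

```r
# Hypothetical read sequences; duplicates are defined by sequence only.
reads <- c("AAA", "CCC", "AAA", "GGG", "AAA", "CCC")

# Number of times each element's value occurs, aligned with reads.
n_occ <- ave(seq_along(reads), reads, FUN = length)

# Rank of each element within its set of duplicates (1 = first seen).
rank_in_set <- ave(seq_along(reads), reads, FUN = seq_along)

min_occ <- 1L
max_occ <- 1L

# "none": drop every read that is represented more than max_occ times.
keep_none <- n_occ >= min_occ & n_occ <= max_occ

# "head": keep the first max_occ reads of each set of duplicates.
keep_head <- n_occ >= min_occ & rank_in_set <= max_occ
```

With these inputs, "none" keeps only the single "GGG" read, while "head" keeps one representative of each distinct sequence.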
nFilter selects elements with no more than threshold 'N' symbols in each element of sread(x), following the closed-interval maximum described for the threshold argument.
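Counting 'N' symbols per read and comparing against a maximum can be sketched in base R. The comparison uses <=, following the closed-interval convention stated for the threshold argument. reads is a hypothetical character vector standing in for sread(x), and the counting approach is an illustration rather than ShortRead's own method.

```r
# Hypothetical read sequences standing in for sread(x).
reads <- c("ACGT", "ACNT", "NNGT", "ACGTN")

# Count 'N' per read: delete every non-N character, measure what is left.
n_count <- nchar(gsub("[^N]", "", reads))

# Closed-interval maximum: keep reads with at most `threshold` N symbols.
threshold <- 1L
keep <- n_count <= threshold

# With the documented default threshold of 0, only N-free reads remain.
keep_default <- n_count <= 0L
```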
polynFilter selects elements with no more than threshold copies of any nucleotide indicated by nuc.
dustyFilter selects elements with high sequence complexity, as characterized by their dustyScore. This emulates the dust command from WindowMasker software. Calculations can be memory intensive; use batchSize to process the argument to dustyFilter in batches of the specified size.
srdistanceFilter selects elements at an edit distance of at least threshold from all sequences in subject.
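The "distance from all sequences in subject" condition can be illustrated with base R's adist (from the utils package), which computes a generalized Levenshtein distance. This is a stand-in chosen for illustration: it is an assumption that it behaves like srdistance on these inputs, and reads and subject are hypothetical. The comparison uses >=, following the closed-interval minimum described for the threshold argument.

```r
# Hypothetical reads and subject sequences.
reads   <- c("ACGTACGT", "ACGTACGA", "TTTTTTTT")
subject <- c("ACGTACGT", "ACGTTCGT")

# Matrix of edit distances: one row per read, one column per subject.
d <- utils::adist(reads, subject)

# Keep a read only if it is at least `threshold` edits away from
# every subject sequence.
threshold <- 2L
keep <- apply(d >= threshold, 1, all)
```

Requiring the condition for every column means a read is dropped as soon as it is close to any one subject sequence.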
alignQualityFilter selects elements with alignQuality(x) of at least threshold.
alignDataFilter selects elements with pData(alignData(x)) satisfying expr. expr should be formulated as though it were to be evaluated as eval(expr, pData(alignData(x))).
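How expr is evaluated can be seen by calling eval with a data frame as the environment. In this base-R sketch, aln_data is a hypothetical data frame standing in for pData(alignData(x)); its x and y columns are assumptions made for the illustration.

```r
# Hypothetical stand-in for pData(alignData(x)): one row per read.
aln_data <- data.frame(x = c(100, 500, 900, 650),
                       y = c(900, 500, 100, 800))

# Column names are used directly inside the expression.
expr <- expression(abs(x - 500) > 200 & abs(y - 500) > 200)

# Evaluate the expression with the data frame columns in scope;
# the result is a logical vector with one element per row.
keep <- eval(expr, aln_data)
```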
Value
srFilter returns an object of class SRFilter.
Built-in filters return a logical vector of length(x), with TRUE indicating components that pass the filter.
Author(s)
Martin Morgan <mtmorgan@fhcrc.org>
See Also
SRFilter.
Examples
library(ShortRead)

sp <- SolexaPath(system.file("extdata", package="ShortRead"))
aln <- readAligned(sp, "s_2_export.txt") # Solexa export file, as example
# a 'chromosome 5' filter
filt <- chromosomeFilter("chr5.fa")
aln[filt(aln)]
# filter during input
readAligned(sp, "s_2_export.txt", filter=filt)
# x- and y- coordinates stored in alignData, when source is SolexaExport
xy <- alignDataFilter(expression(abs(x-500) > 200 & abs(y-500) > 200))
aln[xy(aln)]
# both filters as a single filter
chr5xy <- compose(filt, xy)
aln[chr5xy(aln)]
# both filters as a collection
filters <- c(filt, xy)
subsetByFilter(aln, filters)
summary(filters, aln)
# read, chromosome, strand, position tuples occurring exactly once
aln[occurrenceFilter(withSread=TRUE, duplicates="none")(aln)]
# reads occurring exactly once
aln[occurrenceFilter(withSread=NA, duplicates="none")(aln)]
# chromosome, strand, position tuples occurring exactly once
aln[occurrenceFilter(withSread=FALSE, duplicates="none")(aln)]
# custom filter: minimum calibrated base call quality >20
goodq <- srFilter(function(x) {
    apply(as(quality(x), "matrix"), 1, min) > 20
}, name="GoodQualityBases")
goodq
aln[goodq(aln)]