R IRanges package: help documentation for IntervalTree-class()

loveR · posted 2012-2-25 22:27:58

IntervalTree-class(IRanges)
IntervalTree-class() belongs to the R package: IRanges

                                    Interval Search Trees

                                       Translator: LoveR, the translation robot of 生物统计家园网

----------Description----------

Efficiently perform overlap queries with an interval tree.

----------Details----------

A common type of query that arises when working with intervals is finding which intervals in one set overlap those in another. An efficient family of algorithms for answering such queries is known as the Interval Tree. This implementation makes use of the augmented tree algorithm from the reference below, but heavily adapts it for the use case of large, sorted query sets.

The simplest approach is to call the findOverlaps function on a Ranges or other object with range information, as described in the following section.

An IntervalTree object derives from Ranges and stores its ranges as a tree that is optimized for overlap queries. Thus, for repeated queries against the same subject, it is more efficient to create an IntervalTree once for the subject, using the constructor described below, and then perform the queries against the IntervalTree instance.

----------Finding Overlaps----------

The main purpose of the interval tree is to optimize the search for ranges overlapping those in a query set. The interface for this operation is the findOverlaps function.

findOverlaps(query, subject = query, maxgap = 0L, minoverlap = 1L, type = c("any", "start", "end", "within", "equal"), select = c("all", "first", "last", "arbitrary"), drop = FALSE, ignoreSelf = FALSE, ignoreRedundant = FALSE):

Find the intervals in query, a Ranges, RangesList, RangedData, or integer vector (to be converted to length-one ranges), that overlap with the intervals in subject, a Ranges, RangesList, or RangedData. If query is unsorted, it is sorted first, so it is usually better to sort up front and avoid a sort with each findOverlaps call.

If subject is omitted, query is queried against itself. In this case, and only in this case, the ignoreSelf and ignoreRedundant arguments are allowed. By default, the result contains a hit for each range against itself, and if there is a hit from A to B, there is also a hit from B to A. If ignoreSelf is TRUE, all self matches are dropped. If ignoreRedundant is TRUE, only one of A->B and B->A is returned.
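The effect of ignoreSelf and ignoreRedundant can be pictured with a naive all-pairs computation in base R. This is only an illustrative sketch, not how IRanges computes the result: it uses plain integer vectors instead of IRanges objects, and keeping the A->B direction with A before B is an assumption, since the text does not say which of the two hits is retained.

```r
# Ranges from the Examples section: [1,5], [4,7], [9,10]
starts <- c(1L, 4L, 9L)
ends   <- c(5L, 7L, 10L)

# Ranges i and j overlap when each one starts no later than the other ends.
le  <- outer(starts, ends, "<=")      # le[i, j] is starts[i] <= ends[j]
hit <- le & t(le)

# One row per hit, as (query index, subject index) pairs.
pairs <- which(hit, arr.ind = TRUE)
colnames(pairs) <- c("query", "subject")

# ignoreSelf = TRUE: drop hits of a range against itself.
no_self <- pairs[pairs[, "query"] != pairs[, "subject"], , drop = FALSE]

# ignoreRedundant = TRUE: keep one direction of each symmetric pair
# (assumption: the one with the smaller query index).
no_redundant <- pairs[pairs[, "query"] <= pairs[, "subject"], , drop = FALSE]

# Both options together leave only the single cross pair (1, 2).
both <- pairs[pairs[, "query"] < pairs[, "subject"], , drop = FALSE]
```

With these three ranges the default result has five hits (three self hits plus the pair 1/2 in both directions), which shrinks to two, four, and one hit under the respective options.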

Intervals with a separation of maxgap or less and a minimum of minoverlap overlapping positions, allowing for maxgap, are considered to be overlapping. maxgap should be a scalar, non-negative integer. minoverlap should be a scalar, positive integer.

When select is "all", the results are returned as a RangesMatching object. When select is "first", "last", or "arbitrary", the results are returned as an integer vector with one element per range in query, containing the index of the first, last, or an arbitrary overlapping interval in subject, with NA marking query intervals that did not overlap any interval in subject.
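To make the shape of the select = "first" and select = "last" results concrete, here is a small base R sketch. It is a hypothetical helper (select_hit is not part of IRanges) that works on a logical hit matrix built naively from plain start and end vectors, using the query and subject ranges from the Examples section.

```r
qs <- c(1L, 4L, 9L);  qe <- c(5L, 7L, 10L)   # query:   [1,5] [4,7] [9,10]
ss <- c(2L, 2L, 10L); se <- c(2L, 3L, 12L)   # subject: [2,2] [2,3] [10,12]

# hit[i, j] is TRUE when query i and subject j share at least one position.
hit <- outer(qs, se, "<=") & outer(qe, ss, ">=")

# Hypothetical helper: one subject index per query, NA when nothing overlaps.
select_hit <- function(hit, select = c("first", "last")) {
  select <- match.arg(select)
  pick <- if (select == "first") min else max
  vapply(seq_len(nrow(hit)), function(i) {
    idx <- which(hit[i, ])
    if (length(idx) == 0L) NA_integer_ else pick(idx)
  }, integer(1))
}

first_hit <- select_hit(hit, "first")   # 1 NA 3
last_hit  <- select_hit(hit, "last")    # 2 NA 3
```

The second query range, [4,7], overlaps no subject range, so its entry is NA in both vectors.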

If query is a RangesList or RangedData, subject must be a RangesList or RangedData. If both lists have names, each element from the subject is paired with the element from the query with the matching name, if any. Otherwise, elements are paired by position. The overlap is then computed between the pairs as described above. If select is "all", a RangesMatchingList is returned. For all other values of select, the return value depends on the drop argument. When select != "all" && !drop, an IntegerList is returned, where each element of the result corresponds to a space in query. When select != "all" && drop, an integer vector is returned containing indices that are offset to align with the unlisted query.

By default, any overlap is accepted. By specifying the type parameter, one can select for specific types of overlap. The types correspond to operations in Allen's Interval Algebra (see references). If type is "start" or "end", the intervals are required to have matching starts or ends, respectively. While this operation seems trivial, the naive implementation using outer would be much less efficient. Specifying "equal" as the type returns the intersection of the start and end matches. If type is "within", the query interval must be wholly contained within the subject interval. Note that all matches must additionally satisfy the minoverlap constraint described above.
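The naive outer-based formulation mentioned above is easy to write down, and it shows what each overlap type means. The following base R sketch uses plain start and end vectors for the ranges from the last part of the Examples section. It is the slow quadratic approach the text warns about, shown only to pin down the definitions, and it does not apply the minoverlap constraint.

```r
# Query:   [1,2] [5,6] [3,6] [4,9]
qs <- c(1L, 5L, 3L, 4L); qe <- qs + c(2L, 2L, 4L, 6L) - 1L
# Subject: [1,4] [3,6] [5,9] [6,9]
ss <- c(1L, 3L, 5L, 6L); se <- ss + c(4L, 4L, 5L, 4L) - 1L

# Each matrix has one row per query range and one column per subject range.
same_start <- outer(qs, ss, "==")                  # type = "start"
same_end   <- outer(qe, se, "==")                  # type = "end"
equal      <- same_start & same_end                # type = "equal"
within     <- outer(qs, ss, ">=") & outer(qe, se, "<=")  # type = "within"

# Hits as (query, subject) index pairs for the "within" case.
within_pairs <- which(within, arr.ind = TRUE)
```

Every matrix here needs length(query) * length(subject) comparisons, which is the cost the tree-based search is designed to avoid.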

The maxgap parameter has special meaning with the special overlap types. For "start", "end", and "equal", it specifies the maximum difference in the starts, ends, or both, respectively. For "within", it is the maximum amount by which the query may be wider than the subject.
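For the "start", "end", and "equal" types, the role of maxgap can be sketched as a tolerance on the difference between coordinates. The base R snippet below is an assumption-laden illustration of that reading (near is a hypothetical helper, not an IRanges function), and it leaves out "within" as well as the minoverlap constraint.

```r
# Query:   [1,2] [5,6] [3,6] [4,9]
qs <- c(1L, 5L, 3L, 4L); qe <- c(2L, 6L, 6L, 9L)
# Subject: [1,4] [3,6] [5,9] [6,9]
ss <- c(1L, 3L, 5L, 6L); se <- c(4L, 6L, 9L, 9L)

# Hypothetical helper: TRUE where two coordinates differ by at most maxgap.
near <- function(a, b, maxgap) abs(outer(a, b, "-")) <= maxgap

maxgap <- 1L
start_hits <- near(qs, ss, maxgap)        # starts differ by at most 1
end_hits   <- near(qe, se, maxgap)        # ends differ by at most 1
equal_hits <- start_hits & end_hits       # both conditions at once
```

With maxgap = 0L the three matrices reduce to exact matches on starts, ends, or both.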

countOverlaps(query, subject, maxgap = 0L, minoverlap = 1L, type = c("any", "start", "end", "within", "equal")): Returns the overlap hit count for each range in query using the specified findOverlaps parameters. Both query and subject should be Ranges, RangesList or RangedData objects.

subsetByOverlaps(query, subject, maxgap = 0L, minoverlap = 1L, type = c("any", "start", "end", "within", "equal")): Returns the subset of query that has an overlap hit with a range in subject using the specified findOverlaps parameters. Both query and subject should be Ranges, RangesList or RangedData objects.

x %in% table: Shortcut for finding the ranges in x that overlap any of the ranges in table. Both x and table should be Ranges, RangesList or RangedData objects. For Ranges objects, the result is a logical vector of length equal to the number of ranges in x. For RangesList and RangedData objects, the result is a LogicalList object, where each element of the result corresponds to a space in x.

match(x, table, nomatch = NA_integer_, incomparables = NULL): Returns an integer vector of length length(x), containing the index of the first overlapping range in table for each range in x. If a range in x does not overlap any ranges in table, its value is nomatch. The x and table arguments should either be both Ranges objects or both RangesList objects, in which case the indices are into the unlisted table. The incomparables argument is currently ignored.
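The four convenience functions above can all be read as summaries of the same hit relation. The base R sketch below shows that relationship on a naive logical hit matrix built from plain start and end vectors. It illustrates the described return values only and makes no claim about how IRanges computes them.

```r
qs <- c(1L, 4L, 9L);  qe <- c(5L, 7L, 10L)   # x / query: [1,5] [4,7] [9,10]
ss <- c(2L, 2L, 10L); se <- c(2L, 3L, 12L)   # table:     [2,2] [2,3] [10,12]

# hit[i, j] is TRUE when range i of x overlaps range j of table.
hit <- outer(qs, se, "<=") & outer(qe, ss, ">=")

n_hits <- rowSums(hit)                             # like countOverlaps(x, table)
is_in  <- n_hits > 0                               # like x %in% table
first  <- apply(hit, 1, function(r) which(r)[1])   # like match(x, table)
kept   <- which(is_in)                             # rows kept by subsetByOverlaps

# A nomatch value other than NA simply replaces the NA entries.
nomatch <- 0L
first_nomatch <- ifelse(is.na(first), nomatch, first)
```

Here the second range of x has no overlap, so it gets a count of 0, FALSE for membership, and NA (or nomatch) as its match index.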

----------Constructor----------

IntervalTree(ranges): Creates an IntervalTree from the ranges in ranges, an object coercible to IntervalTree, such as an IRanges object.

----------Coercion----------

as(from, "IRanges"): Imports the ranges in from, an IntervalTree, to an IRanges object.

as(from, "IntervalTree"): Constructs an IntervalTree representing from, a Ranges object that is coercible to IRanges.

----------Accessors----------

length(x): Gets the number of ranges stored in the tree. This is a fast operation that does not bring the ranges into R.

start(x): Get the starts of the ranges.

end(x): Get the ends of the ranges.

----------Notes on Time Complexity----------

The cost of constructing an instance of the interval tree is O(n*lg(n)), which makes it about as fast as other types of overlap query algorithms based on sorting. The good news is that the tree need only be built once per subject; this is useful in situations of frequent querying. Also, in this implementation the data is stored outside of R, avoiding needless copying. Of course, external storage is not always convenient, so it is possible to coerce the tree to an instance of IRanges (see the Coercion section).

For the query operation, the running time is based on the query size m and the average number of hits per query k. The output size is then max(mk,m), but we abbreviate this as mk. Note that when at most one hit per query is requested (the text's "multiple parameter set to FALSE", which corresponds to a select value other than "all" in the signature above), k is fixed to 1 and drops out of this analysis. We also assume here that the query is sorted by start position (the findOverlaps function sorts the query if it is unsorted).

An upper bound for finding overlaps is O(min(mk*lg(n),n+mk)). The fastest interval tree algorithm known is bounded by O(min(m*lg(n),n)+mk) but is a lot more complicated and involves two auxiliary trees. The lower bound is Omega(lg(n)+mk), which is almost the same as for returning the answer, Omega(mk). The average is of course somewhere in between.

This analysis informs the choice of which set of ranges to process into a tree, i.e. assigning one to be the subject and the other to be the query. Note that if m > n, then the running time is O(m), and the total operation of complexity O(n*lg(n) + m) is better than if m and n were exchanged. Thus, for one-off operations, it is often most efficient to choose the smaller set to become the tree (but k also affects this). This is reinforced by the realization that if mk is about the same in either direction, the running time depends only on n, which should be minimized. Even in cases where a tree has already been constructed for one of the sets, it can be more efficient to build a new tree when the existing tree of size n is much larger than the query set of size m, roughly when n > m*lg(n).
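The trade-off described above can be made tangible with a back-of-the-envelope cost model in base R. This is only a rough sketch: both function names are hypothetical, constant factors and the hit rate k are ignored, and the numbers are operation counts from the stated bounds, not measured timings.

```r
# Rough cost of building a tree on n ranges and querying it with m ranges
# in the m > n regime, following the O(n*lg(n) + m) bound in the text.
build_and_query_cost <- function(n, m) n * log2(n) + m

# Rule of thumb from the text: an existing tree of size n is large enough,
# relative to a query set of size m, that building a new tree may pay off.
new_tree_may_pay_off <- function(n, m) n > m * log2(n)

small <- 1e3
large <- 1e6

tree_on_small <- build_and_query_cost(n = small, m = large)  # about 1.0e6
tree_on_large <- build_and_query_cost(n = large, m = small)  # about 2.0e7

swap_large <- new_tree_may_pay_off(n = 1e6, m = 1e3)   # TRUE
swap_close <- new_tree_may_pay_off(n = 1e4, m = 1e3)   # FALSE
```

Under this model, building the tree on the smaller set is roughly twenty times cheaper for these sizes, which matches the advice to let the smaller set become the subject for one-off operations.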

----------Author(s)----------

Michael Lawrence

----------References----------

Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford. Introduction to Algorithms, second edition. MIT Press and McGraw-Hill. ISBN 0-262-53196-8
Allen, James F. Maintaining knowledge about temporal intervals. Communications of the ACM 26(11), 1983, pp. 832-843. ACM Press. ISSN 0001-0782

----------See Also----------

Ranges, the parent of this class; RangesMatching, the result of an overlap query.

----------Examples----------

  query <- IRanges(c(1, 4, 9), c(5, 7, 10))
  subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
  tree <- IntervalTree(subject)

  ## at most one hit per query
  findOverlaps(query, tree, select = "first")
  findOverlaps(query, tree, select = "last")
  findOverlaps(query, tree, select = "arbitrary")

  ## allow multiple hits
  findOverlaps(query, tree)

  ## overlap as long as distance <= 1
  findOverlaps(query, tree, maxgap = 1L)

  ## shortcut
  findOverlaps(query, subject)

  ## query and subject are easily interchangeable
  query <- IRanges(c(1, 4, 9), c(5, 7, 10))
  subject <- IRanges(c(2, 2), c(5, 4))
  tree <- IntervalTree(subject)
  t(findOverlaps(query, tree))
  # the same as:
  findOverlaps(subject, query)

  ## one Ranges with itself
  findOverlaps(query)

  ## single points as query
  subject <- IRanges(c(1, 6, 13), c(4, 9, 14))
  findOverlaps(c(3L, 7L, 10L), subject, select = "first")

  ## alternative overlap types
  query <- IRanges(c(1, 5, 3, 4), width=c(2, 2, 4, 6))
  subject <- IRanges(c(1, 3, 5, 6), width=c(4, 4, 5, 4))
  ## rebuild the tree so that it reflects the new subject
  tree <- IntervalTree(subject)

  findOverlaps(query, tree, type = "start")
  findOverlaps(query, tree, type = "start", maxgap = 1L)
  findOverlaps(query, tree, type = "end", select = "first")
  findOverlaps(query, tree, type = "within", maxgap = 1L)

Please credit the source when reposting: 生物统计家园网 (http://www.biostatistic.net).

