regionOverlap(Ringo)
regionOverlap() is part of the R package Ringo
Function to compute overlap of genomic regions
Translator: 生物统计家园网 robot LoveR
----------Description----------
Given two data frames of genomic regions, this function computes the base-pair overlap, if any, between every pair of regions from the two lists.
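The idea of "overlap between every pair of regions" can be sketched in base R. This is a minimal illustration, not Ringo's implementation: `pairwiseOverlap` is a hypothetical helper, the column names are the documented defaults, and it assumes that start and end positions are both inclusive. Ringo's own counting convention may differ by one base.

```r
# Hypothetical helper, not part of Ringo: brute-force pairwise overlap.
# Assumption: start and end are both inclusive positions.
pairwiseOverlap <- function(xdf, ydf) {
  # In every intermediate matrix, row i refers to xdf and column j to ydf.
  sameChr <- outer(as.character(xdf$chr), as.character(ydf$chr), "==")
  left    <- outer(xdf$start, ydf$start, pmax)  # later of the two starts
  right   <- outer(xdf$end,   ydf$end,   pmin)  # earlier of the two ends
  len     <- pmax(right - left + 1, 0)          # negative means no overlap
  len * sameChr                                 # regions on different chromosomes get 0
}

x <- data.frame(chr = c("chr1", "chr1", "chr8"),
                start = c(100, 510, 60), end = c(200, 520, 80))
y <- data.frame(chr = c("chr1", "chr8"),
                start = c(500, 80), end = c(700, 120))
ov <- pairwiseOverlap(x, y)
ov
```

Under the inclusive assumption, the second region of `x` lies entirely inside the first region of `y`, so `ov[2, 1]` is its full width of 11 bases. The dense `outer()` approach is only practical for small inputs, which is presumably why the package returns a sparse matrix instead.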
----------Usage----------
regionOverlap(xdf, ydf, chrColumn = "chr", startColumn = "start",
endColumn = "end", mem.limit=1e8)
----------Arguments----------
Argument: xdf
data.frame that holds the first set of genomic regions
Argument: ydf
data.frame that holds the second set of genomic regions
Argument: chrColumn
character; name of the column that holds the chromosome name of the regions in xdf and ydf
Argument: startColumn
character; name of the column that holds the start position of the regions in xdf and ydf
Argument: endColumn
character; name of the column that holds the end position of the regions in xdf and ydf
Argument: mem.limit
integer value; the maximal allowed size of matrices during the computation
----------Value----------
A matrix with nrow(xdf) rows and nrow(ydf) columns, in which entry X[i,j] specifies the length of the overlap between region i of the first list (xdf) and region j of the second list (ydf). Because this matrix is very sparse, it is returned in the dgCMatrix representation from the Matrix package.
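For readers unfamiliar with the dgCMatrix class, the following sketch builds a small sparse matrix by hand and inspects it with functions from the Matrix package. The values are illustrative only and are not output of regionOverlap().

```r
library(Matrix)  # recommended package, shipped with standard R installations

# Illustrative values only: a 3 x 2 overlap matrix with two non-zero entries.
m <- sparseMatrix(i = c(2, 3), j = c(1, 2), x = c(11, 1), dims = c(3, 2))
class(m)               # a dgCMatrix, the class named in the Value section

nz <- summary(m)       # triplet view: one row (i, j, x) per stored entry
nz

dense <- as.matrix(m)  # ordinary matrix; only sensible for small inputs
dense
```

The triplet view from `summary()` is often the more convenient form, since it lists only the region pairs that actually overlap.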
----------Note----------
The function only returns the absolute length of the overlap in base pairs. It does not return the position of the overlap or the fraction of region 1 and/or region 2 that overlaps the other region.
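If the overlapping fraction is needed, it can be derived from the overlap lengths and the region widths. The sketch below uses a small hand-made overlap matrix rather than real regionOverlap() output, and assumes inclusive coordinates, so a region's width is taken as end - start + 1. All variable names are illustrative.

```r
# Illustrative overlap lengths: rows are regions of list 1, columns regions of list 2.
ov <- matrix(c(0, 11, 0,    # column 1
               0,  0, 1),   # column 2
             nrow = 3)

start1 <- c(100, 510, 60); end1 <- c(200, 520, 80)
start2 <- c(500, 80);      end2 <- c(700, 120)

# Assumption: inclusive coordinates.
width1 <- end1 - start1 + 1
width2 <- end2 - start2 + 1

# Fraction of each list-1 region covered: row i is divided by width1[i].
fracOf1 <- ov / width1
# Fraction of each list-2 region covered: column j is divided by width2[j].
fracOf2 <- sweep(ov, 2, width2, "/")
```

A sparse result can be converted with `as.matrix()` before this step when it is small enough to hold in dense form.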
The argument mem.limit is not really a limit on the RAM used, but rather the maximal size of matrices that should be allowed during the computation. If larger matrices would arise, the second regions list is split into parts and the overlap with the first list is computed for each part. During computation, matrices of size nrow(xdf) times nrow(ydf) are created.
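The splitting described above might look roughly like the following. This is a sketch under stated assumptions, not Ringo's code: `chunkSecondList` is a hypothetical helper, and "size" is interpreted here as the number of matrix cells, which the documentation does not state explicitly.

```r
# Hypothetical helper, not Ringo's code: split the row indices of the second
# list so that nX times the size of each part stays within mem.limit cells.
chunkSecondList <- function(nX, nY, mem.limit = 1e8) {
  perChunk <- max(1, floor(mem.limit / nX))  # rows of ydf allowed per part
  split(seq_len(nY), ceiling(seq_len(nY) / perChunk))
}

# 1000 regions in the first list, 250 in the second, at most 1e5 cells at once.
chunks <- chunkSecondList(nX = 1000, nY = 250, mem.limit = 1e5)
lengths(chunks)
```

Here each part may hold at most 100 regions of the second list, so the 250 regions fall into parts of 100, 100, and 50. The per-part overlaps would then be combined column-wise to give the full result.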
----------Author(s)----------
Joern Toedling
----------See Also----------
dgCMatrix-class
----------Examples----------
library(Ringo)  # provides regionOverlap() and nonzero()
## toy example:
regionsH3ac <- data.frame(chr=c("chr1","chr7","chr8","chr1","chrX","chr8"),
                          start=c(100,100,100,510,100,60), end=c(200,200,200,520,200,80))
regionsH4ac <- data.frame(chr=c("chr1","chr2","chr7","chr8","chr9"),
                          start=c(500,100,50,80,100), end=c(700,200,250,120,200))
## compare the regions first by eye:
## which ones overlap, and by what amount?
regionsH3ac
regionsH4ac
## compare it to the result:
as.matrix(regionOverlap(regionsH3ac, regionsH4ac))
nonzero(regionOverlap(regionsH3ac, regionsH4ac))
Please credit the source when reposting: 生物统计家园网 (http://www.biostatistic.net).