findChunks(segmentSeq)
R package: segmentSeq
Identifies ‘chunks’ of data within a set of aligned reads.
----------Description----------
This function identifies chunks of data within a set of aligned reads by looking for gaps in the alignments, that is, regions where no reads align. If we assume that a locus cannot contain a gap of at least a given length, then we can split the analysis of the data into chunks delimited by these gaps, reducing the complexity of the segmentation problem.
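The gap-based splitting described above can be sketched in base R for a single chromosome. This is only an illustration of the idea, not the segmentSeq implementation: the helper name chunkByGap is hypothetical, and the choice to start a new chunk when the uncovered stretch is at least 'gap' bases long is an assumption based on the description of the gap argument.

```r
# Hypothetical helper, not part of segmentSeq: assign a chunk id to each read
# on one chromosome. ASSUMPTION: a new chunk starts when at least `gap`
# uncovered bases separate a read from all reads that precede it.
chunkByGap <- function(start, end, gap) {
  ord <- order(start, end)
  s <- start[ord]
  e <- end[ord]
  n <- length(s)
  # Furthest end position among the reads preceding each read (sorted order)
  prevMaxEnd <- c(-Inf, cummax(e)[-n])
  # Number of bases with no coverage immediately before each read
  uncovered <- s - prevMaxEnd - 1
  newChunk <- uncovered >= gap
  # Return the chunk ids in the original order of the reads
  chunk <- integer(n)
  chunk[ord] <- cumsum(newChunk)
  chunk
}

starts <- c(1, 20, 50, 400, 420, 1000)
ends   <- c(30, 45, 80, 430, 450, 1020)
chunks <- chunkByGap(starts, ends, gap = 100)
print(chunks)
```

With these made-up coordinates, the first three reads overlap or lie close together and share a chunk, while the reads starting at 400 and at 1000 each follow an uncovered stretch longer than 100 bases and so open new chunks.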
----------Usage----------
findChunks(alignments, gap, checkDuplication = TRUE)
----------Arguments----------
Argument: alignments
A GRanges object defining a set of aligned reads.
Argument: gap
The minimum length of a gap across which it is assumed that no locus can exist.
Argument: checkDuplication
Should we check whether or not reads are duplicated within a chunk? Defaults to TRUE.
----------Details----------
This function is called by the readGeneric and readBAM functions, but may usefully be called again if filtering of an alignmentData object has altered the data present, or to reduce the computational effort required for subsequent analysis. The lower the 'gap' parameter used to define the chunks, the faster (though potentially less accurate) any subsequent analyses will be.
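To illustrate why a lower gap makes later analyses cheaper, the following base R sketch counts how many chunks a toy set of reads falls into for several gap values. The helper countChunks and the coordinates are hypothetical, and the splitting rule (a new chunk after at least 'gap' uncovered bases) is an assumption rather than the package's exact logic.

```r
# Hypothetical helper (not the segmentSeq implementation): count the chunks
# produced on one chromosome for a given gap threshold.
countChunks <- function(start, end, gap) {
  ord <- order(start, end)
  s <- start[ord]
  e <- end[ord]
  # Furthest end position among the reads preceding each read
  prevMaxEnd <- c(-Inf, cummax(e)[-length(e)])
  # A read opens a new chunk if at least `gap` uncovered bases precede it
  sum(s - prevMaxEnd - 1 >= gap)
}

starts <- c(1, 60, 200, 520, 1500)
ends   <- c(40, 90, 260, 560, 1540)
gaps   <- c(10, 100, 500)
nChunks <- sapply(gaps, function(g) countChunks(starts, ends, g))
print(data.frame(gap = gaps, chunks = nChunks))
```

A smaller gap yields more, smaller chunks, so each chunk can be segmented with less work; the cost is that a true locus spanning a short gap may be split across two chunks.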
----------Value----------
A modified GRanges object, now containing the columns "chunk" and, if 'checkDuplication' is TRUE, "chunkDup". These identify the chunk to which each alignment belongs and whether the alignment of the tag is duplicated within that chunk, respectively.
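As a rough picture of what a per-chunk duplication flag could look like, the base R sketch below marks alignments whose tag has already been seen in the same chunk. The vectors are made up, and flagging only the second and later occurrences (the behaviour of base R's duplicated()) is an assumption; the package may define "chunkDup" differently.

```r
# Toy data: the tag sequence of each alignment and its chunk id.
tag   <- c("AAC", "AAC", "GGT", "AAC", "GGT")
chunk <- c(1, 1, 1, 2, 2)

# ASSUMPTION: an alignment counts as duplicated if the same tag has already
# appeared earlier in the same chunk. duplicated() on a data frame compares
# whole rows, i.e. (tag, chunk) pairs.
chunkDup <- duplicated(data.frame(tag = tag, chunk = chunk))
print(data.frame(tag, chunk, chunkDup))
```

Here only the second "AAC" in chunk 1 is flagged; the "AAC" in chunk 2 is not, because duplication is assessed within each chunk separately.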
----------Author(s)----------
Thomas J. Hardcastle
----------Examples----------
# Define the chromosome lengths for the genome of interest.
chrlens <- c(2e6, 1e6)
# Define the files containing sample information.
datadir <- system.file("extdata", package = "segmentSeq")
libfiles <- c("SL9.txt", "SL10.txt", "SL26.txt", "SL32.txt")
# Establish the library names and replicate structure.
libnames <- c("SL9", "SL10", "SL26", "SL32")
replicates <- c(1,1,2,2)
# Read the files to produce an 'alignmentData' object.
alignData <- readGeneric(file = libfiles, dir = datadir,
                         replicates = replicates, libnames = libnames,
                         chrs = c(">Chr1", ">Chr2"), chrlens = chrlens,
                         gap = 100)
# Filter the data on the number of matches of each tag to the genome.
alignData <- alignData[values(alignData@alignments)$matches < 5,]
# Redefine the chunking structure of the data.
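# findChunks returns a GRanges, so store it back in the alignments slot
# rather than overwriting the whole alignmentData object.
alignData@alignments <- findChunks(alignData@alignments, gap = 100)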