
R language GenomicRanges package: help documentation for GappedAlignments-class()

Posted on 2012-2-25 19:28:56
GappedAlignments-class(GenomicRanges)
R package providing GappedAlignments-class(): GenomicRanges

GappedAlignments objects

Translator: 生物统计家园网 robot LoveR

----------Description----------

The GappedAlignments class is a simple container whose purpose is to store a set of alignments, holding just enough information to support the operations described below.


----------Details----------

A GappedAlignments object is a vector-like object where each element describes an alignment i.e. how a given sequence (called "query" or "read", typically short) aligns to a reference sequence (typically long).

Most of the time, a GappedAlignments object will be created by loading records from a BAM (or SAM) file and each element in the resulting object will correspond to a record. BAM/SAM records generally contain a lot of information but only part of that information is loaded in the GappedAlignments object. In particular, we discard the query sequences (SEQ field), the query qualities (QUAL), the mapping qualities (MAPQ) and any other information that is not needed in order to support the operations or methods described below.

This means that multi-reads (i.e. reads with multiple hits in the reference) receive no special treatment: the SAM/BAM records corresponding to a multi-read show up in the GappedAlignments object as if they came from different, unrelated queries. Likewise, paired-end reads are treated as single-end reads and the pairing information is lost.

Each element of a GappedAlignments object consists of:

The name of the reference sequence. (This is the RNAME field in a SAM/BAM record.)

The strand in the reference sequence to which the query is aligned. (This information is stored in the FLAG field in a SAM/BAM record.)

The CIGAR string in the "Extended CIGAR format" (see the SAM Format Specification for details).

The 1-based leftmost position/coordinate of the clipped query relative to the reference sequence. We will refer to it as the "start" of the query. (This is the POS field in a SAM/BAM record.)

The 1-based rightmost position/coordinate of the clipped query relative to the reference sequence. We will refer to it as the "end" of the query. (This is NOT explicitly stored in a SAM/BAM record but can be inferred from the POS and CIGAR fields.) Note that all positions/coordinates are always relative to the first base at the 5' end of the plus strand of the reference sequence, even when the query is aligned to the minus strand.

The genomic intervals between the "start" and "end" of the query that are "covered" by the alignment. If the full [start,end] interval is covered, the alignment has no gap (no N in the CIGAR) and is considered a simple alignment. Note that a simple alignment can still have mismatches or deletions (in the reference). In other words, a deletion, encoded with a D, is NOT considered a gap.

Note that the last 2 items are not explicitly stored in the GappedAlignments object: they are inferred on-the-fly from the CIGAR and the "start".
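
The inference of the "end" from the "start" and the CIGAR can be sketched in base R, without GenomicRanges. The helpers parse_cigar, ref_end and is_simple below are hypothetical names introduced for illustration only (they are not part of the package), they assume a well-formed extended CIGAR string, and they handle one alignment at a time.

```r
# Hypothetical helper (not part of GenomicRanges): split one CIGAR string
# into a table of operation letters and lengths.
parse_cigar <- function(cigar) {
  m <- regmatches(cigar, gregexpr("[0-9]+[MIDNSHP=X]", cigar))[[1]]
  data.frame(op  = substring(m, nchar(m)),
             len = as.integer(substring(m, 1, nchar(m) - 1)),
             stringsAsFactors = FALSE)
}

# "end" = "start" + number of reference positions spanned - 1.
# Only M, D, N, = and X advance along the reference.
ref_end <- function(pos, cigar) {
  ops <- parse_cigar(cigar)
  pos + sum(ops$len[ops$op %in% c("M", "D", "N", "=", "X")]) - 1L
}

# A simple alignment has no gap, i.e. no N in its CIGAR.
is_simple <- function(cigar) !any(parse_cigar(cigar)$op == "N")

ref_end(100L, "35M")          # 134
ref_end(100L, "10M500N20M")   # 629
is_simple("20M1D14M")         # TRUE: a deletion (D) is not a gap
is_simple("10M500N20M")       # FALSE
```

Insertions and clipping operations (I, S, H, P) do not consume reference positions, which is why they are left out of the sum.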

Optionally, a GappedAlignments object can have names (accessed through the names generic function), which come from the QNAME field of the SAM/BAM records.

The rest of this man page will focus on describing how to:

Access the information stored in a GappedAlignments object in a way that is independent from how the data are actually stored internally.

Create and manipulate a GappedAlignments object.


----------Constructors----------

readGappedAlignments(file, format="BAM", use.names=FALSE, ...): Read a file as a GappedAlignments object. By default (i.e. use.names=FALSE), the resulting object has no names. If use.names is TRUE, then the names are constructed from the query template names (QNAME field in a SAM/BAM file).

Note that this function is just a front-end that delegates to the format-specific back-end function specified via the format argument. The use.names argument and any extra argument are passed to the back-end function. Only the BAM format is supported for now. Its back-end is the readBamGappedAlignments function defined in the Rsamtools package. See ?readBamGappedAlignments for more information (you might need to install and load the Rsamtools package first).

GappedAlignments(rname = Rle(factor()), pos = integer(0), cigar = character(0), strand = NULL, names = NULL, seqlengths = NULL, ...): Create a GappedAlignments object. Named arguments in ... are used as elementMetadata.


----------Accessors----------

In the code snippets below, x is a GappedAlignments object.

length(x): Returns the number of alignments in x.

names(x), names(x) <- value: Gets or sets the names of x. See readGappedAlignments above for how to automatically extract and set the names from the file to read.

rname(x), rname(x) <- value: Gets or sets the name of the reference sequence for each alignment in x (see Details section above for more information about the RNAME field of a SAM/BAM file). value can be a factor, or a 'factor' Rle, or a character vector.

seqnames(x), seqnames(x) <- value: Same as rname(x) and rname(x) <- value.

strand(x), strand(x) <- value: Gets or sets the strand for each alignment in x (see Details section above for more information about the strand of an alignment). value can be a factor (with levels +, - and *), or a 'factor' Rle, or a character vector.

cigar(x): Returns a character vector of length length(x) containing the CIGAR string for each alignment.

qwidth(x): Returns an integer vector of length length(x) containing the length of the query *after* hard clipping (i.e. the length of the query sequence that is stored in the corresponding SAM/BAM record).

start(x), end(x): Returns an integer vector of length length(x) containing the "start" and "end" (respectively) of the query for each alignment. See Details section above for the exact definitions of the "start" and "end" of a query. Note that start(x) and end(x) are equivalent to start(granges(x)) and end(granges(x)), respectively (or, alternatively, to min(rglist(x)) and max(rglist(x)), respectively).

width(x): Equivalent to width(granges(x)) (or, alternatively, to end(x) - start(x) + 1L). Note that this is generally different from qwidth(x) except for alignments with a trivial CIGAR string (i.e. a string of the form "<n>M" where <n> is a number).

ngap(x): Returns an integer vector of length length(x) containing the number of gaps for each alignment. Equivalent to elementLengths(rglist(x)) - 1L.

seqinfo(x), seqinfo(x) <- value: Gets or sets the information about the underlying sequences. value must be a Seqinfo object.

seqlevels(x), seqlevels(x) <- value: Gets or sets the sequence levels. seqlevels(x) is equivalent to seqlevels(seqinfo(x)) or to levels(rname(x)), those 2 expressions being guaranteed to return identical character vectors on a GappedAlignments object. value must be a character vector with no NAs. See ?seqlevels for more information.

seqlengths(x), seqlengths(x) <- value: Gets or sets the sequence lengths. seqlengths(x) is equivalent to seqlengths(seqinfo(x)). value can be a named non-negative integer or numeric vector, possibly with NAs.

isCircular(x), isCircular(x) <- value: Gets or sets the circularity flags. isCircular(x) is equivalent to isCircular(seqinfo(x)). value must be a named logical vector, possibly with NAs.

genome(x), genome(x) <- value: Gets or sets the genome identifier or assembly name for each sequence. genome(x) is equivalent to genome(seqinfo(x)). value must be a named character vector, possibly with NAs.
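
To make the difference between qwidth(x), width(x) and ngap(x) concrete, here is a minimal base R sketch that derives the three quantities from a CIGAR string alone. The helper names (parse_cigar, cigar_qwidth, cigar_width, cigar_ngap) are hypothetical and are not the package's implementation; the sketch assumes well-formed CIGAR strings and counts each N operation as one gap.

```r
# Hypothetical helper (not part of GenomicRanges): split one CIGAR string
# into a table of operation letters and lengths.
parse_cigar <- function(cigar) {
  m <- regmatches(cigar, gregexpr("[0-9]+[MIDNSHP=X]", cigar))[[1]]
  data.frame(op  = substring(m, nchar(m)),
             len = as.integer(substring(m, 1, nchar(m) - 1)),
             stringsAsFactors = FALSE)
}

# Query length after hard clipping: operations that consume query bases.
cigar_qwidth <- function(cigar) {
  ops <- parse_cigar(cigar)
  sum(ops$len[ops$op %in% c("M", "I", "S", "=", "X")])
}

# Width on the reference (end - start + 1): operations that consume
# reference positions.
cigar_width <- function(cigar) {
  ops <- parse_cigar(cigar)
  sum(ops$len[ops$op %in% c("M", "D", "N", "=", "X")])
}

# Number of gaps: number of N operations.
cigar_ngap <- function(cigar) sum(parse_cigar(cigar)$op == "N")

cigars <- c("35M", "3S30M2I5M", "5H20M1D10M", "10M500N20M")
tbl <- data.frame(cigar  = cigars,
                  qwidth = sapply(cigars, cigar_qwidth),
                  width  = sapply(cigars, cigar_width),
                  ngap   = sapply(cigars, cigar_ngap),
                  row.names = NULL)
tbl
```

Only the trivial CIGAR "35M" gives equal qwidth and width here; soft clipping and insertions lengthen the query side, while deletions and gaps lengthen the reference side.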


----------Coercion----------

In the code snippets below, x is a GappedAlignments object.

grglist(x, drop.D.ranges=FALSE), granges(x), rglist(x, drop.D.ranges=FALSE), ranges(x): Returns either a GRangesList object, or a GRanges object, or a RangesList object, or a Ranges object of length length(x) where each element represents the regions in the reference to which a query is aligned. If drop.D.ranges is TRUE for either grglist or rglist, the ranges corresponding to deletions in the CIGAR string are dropped, i.e., they are not considered part of the alignment but are treated like the N (intron) CIGAR element. See Details section above for more information. More precisely, the RangesList object returned by rglist(x) is a CompressedNormalIRangesList object, and the Ranges object returned by ranges(x) is an IRanges object.

as(x, "GRangesList"), as(x, "GRanges"), as(x, "RangesList"), as(x, "Ranges"): An alternate way of doing grglist(x), granges(x), rglist(x), ranges(x), respectively.
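
The effect of drop.D.ranges on the returned ranges can be illustrated with a small base R sketch. The functions parse_cigar and covered_ranges and the argument drop_D are hypothetical stand-ins written for this illustration, not the package code; the sketch handles a single alignment and returns a plain data frame rather than an IRanges object.

```r
# Hypothetical helper (not part of GenomicRanges): split one CIGAR string
# into a table of operation letters and lengths.
parse_cigar <- function(cigar) {
  m <- regmatches(cigar, gregexpr("[0-9]+[MIDNSHP=X]", cigar))[[1]]
  data.frame(op  = substring(m, nchar(m)),
             len = as.integer(substring(m, 1, nchar(m) - 1)),
             stringsAsFactors = FALSE)
}

# Reference ranges covered by one alignment. N always splits the ranges;
# D splits them only when drop_D is TRUE.
covered_ranges <- function(pos, cigar, drop_D = FALSE) {
  ops <- parse_cigar(cigar)
  skip_ops  <- if (drop_D) c("N", "D") else "N"
  cover_ops <- setdiff(c("M", "D", "=", "X"), skip_ops)
  starts <- integer(0)
  ends   <- integer(0)
  block_start <- pos   # first position of the range being built
  cur <- pos           # next reference position not yet consumed
  for (i in seq_len(nrow(ops))) {
    if (ops$op[i] %in% cover_ops) {
      cur <- cur + ops$len[i]
    } else if (ops$op[i] %in% skip_ops) {
      if (cur > block_start) {          # close the current range
        starts <- c(starts, block_start)
        ends   <- c(ends, cur - 1L)
      }
      cur <- cur + ops$len[i]           # jump over the skipped region
      block_start <- cur
    }
    # I, S, H and P do not consume reference positions and are ignored.
  }
  if (cur > block_start) {
    starts <- c(starts, block_start)
    ends   <- c(ends, cur - 1L)
  }
  data.frame(start = starts, end = ends)
}

covered_ranges(1L, "5M2D5M")                  # one range: 1-12
covered_ranges(1L, "5M2D5M", drop_D = TRUE)   # two ranges: 1-5 and 8-12
covered_ranges(100L, "10M500N20M")            # two ranges: 100-109 and 610-629
```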


----------Subsetting and related operations----------

In the code snippets below, x is a GappedAlignments object.

x[i]: Returns a new GappedAlignments object made of the selected alignments. i can be a numeric or logical vector.


----------Combining----------

c(...): Concatenates the GappedAlignments objects in ....


----------Other methods----------

qnarrow(x, start=NA, end=NA, width=NA): x is a GappedAlignments object. Returns a new GappedAlignments object of the same length as x describing how the narrowed query sequences align to the reference. The start/end/width arguments describe how to narrow the query sequences. They must be vectors of integers. NAs and negative values are accepted and "solved" according to the rules of the SEW (Start/End/Width) interface (see ?solveUserSEW for the details).

narrow(x, start=NA, end=NA, width=NA): x is a GappedAlignments object. Returns a new GappedAlignments object of the same length as x describing the narrowed alignments. Unlike with qnarrow, the start/end/width arguments now describe the narrowing on the reference side, not the query side. As with qnarrow, they must be vectors of integers. NAs and negative values are accepted and "solved" according to the rules of the SEW (Start/End/Width) interface (see ?solveUserSEW for the details).
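
How NAs and negative values might be "solved" can be sketched for a single sequence in base R. The function solve_sew below is a hypothetical, simplified stand-in for the SEW rules, not the solveUserSEW implementation: it takes scalars only, and it assumes that a negative start or end counts from the right end (-1 being the last position) and that a missing start or end defaults to the first or last position. For qnarrow the width passed in would be the query width, for narrow the width on the reference.

```r
# Hypothetical, simplified scalar version of the SEW rules (illustration only).
solve_sew <- function(refwidth, start = NA, end = NA, width = NA) {
  # Negative start/end are relative to the right end: -1 is the last position.
  if (!is.na(start) && start < 0) start <- refwidth + start + 1
  if (!is.na(end) && end < 0) end <- refwidth + end + 1
  if (is.na(width)) {
    # Missing bounds default to the full extent of the sequence.
    if (is.na(start)) start <- 1
    if (is.na(end)) end <- refwidth
  } else if (is.na(start) && !is.na(end)) {
    start <- end - width + 1
  } else if (!is.na(start) && is.na(end)) {
    end <- start + width - 1
  } else {
    stop("this sketch needs exactly one of 'start'/'end' when 'width' is given")
  }
  # The solved range must stay inside the sequence.
  stopifnot(start >= 1, end <= refwidth, end >= start - 1)
  c(start = start, end = end, width = end - start + 1)
}

solve_sew(35, start = 4, end = -6)    # keeps positions 4..30 (width 27)
solve_sew(35, start = 21)             # keeps positions 21..35 (width 15)
solve_sew(35, end = -6, width = 10)   # keeps positions 21..30
```

The first call corresponds to trimming 3 positions on the left and 5 on the right of a 35-long sequence: the arguments say what to keep, not what to remove.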


----------Author(s)----------



H. Pages and P. Aboyoun




----------References----------



----------See Also----------

readBamGappedAlignments, GRangesList-class, GRanges-class, seqinfo, CompressedNormalIRangesList-class, IRanges-class, coverage-methods, setops-methods, findOverlaps-methods


----------Examples----------


library(Rsamtools)  # for ScanBamParam() and the ex1.bam file
galn_file <- system.file("extdata", "ex1.bam", package="Rsamtools")
galn <- readGappedAlignments(galn_file, param=ScanBamParam(what="flag"))
galn

## ---------------------------------------------------------------------
## A. BASIC MANIPULATION
## ---------------------------------------------------------------------
length(galn)
head(galn)
names(galn)  # no names by default
head(rname(galn))
seqlevels(galn)

## Rename the reference sequences:
seqlevels(galn) <- sub("seq", "chr", seqlevels(galn))
seqlevels(galn)

head(strand(galn))
head(cigar(galn))
head(qwidth(galn))
table(qwidth(galn))

grglist(galn)  # a GRangesList object
granges(galn)  # a GRanges object
rglist(galn)   # a CompressedNormalIRangesList object
ranges(galn)   # an IRanges object
stopifnot(identical(elementLengths(grglist(galn)), elementLengths(rglist(galn))))

head(start(galn))
head(end(galn))
head(width(galn))
head(ngap(galn))

## ---------------------------------------------------------------------
## B. SUBSETTING
## ---------------------------------------------------------------------
galn[strand(galn) == "-"]
galn[grep("I", cigar(galn), fixed=TRUE)]
galn[grep("N", cigar(galn), fixed=TRUE)]  # no gaps

## A confirmation that all the queries map to the reference with no
## gaps:
stopifnot(all(ngap(galn) == 0))

## Different ways to subset:
galn[6]             # a GappedAlignments object of length 1
grglist(galn)[[6]]  # a GRanges object of length 1
rglist(galn)[[6]]   # a NormalIRanges object of length 1

## Ds are NOT gaps:
ii <- grep("D", cigar(galn), fixed=TRUE)
galn[ii]
ngap(galn[ii])
grglist(galn[ii])

## qwidth() vs width():
galn[qwidth(galn) != width(galn)]

## This MUST return an empty object:
galn[cigar(galn) == "35M" & qwidth(galn) != 35]
## but this doesn't have to:
galn[cigar(galn) != "35M" & qwidth(galn) == 35]

## ---------------------------------------------------------------------
## C. qnarrow()/narrow()
## ---------------------------------------------------------------------
## Note that there is no difference between qnarrow() and narrow() when
## all the alignments are simple and with no indels.

## This trims 3 nucleotides on the left and 5 nucleotides on the right
## of each alignment:
qnarrow(galn, start=4, end=-6)
## Note that the 'start' and 'end' arguments specify what part of each
## query sequence should be kept (negative values being relative to the
## right end of the query sequence), not what part should be trimmed.

## Trimming on the left doesn't change the "end" of the queries.
qnarrow(galn, start=21)
stopifnot(identical(end(qnarrow(galn, start=21)), end(galn)))

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

