R GenomicRanges package: help documentation for the summarizeOverlaps() function

loveR · posted 2012-2-25 19:30:23

summarizeOverlaps(GenomicRanges)
summarizeOverlaps() belongs to the R package GenomicRanges

                                    Count reads that map to genomic features

                                       Translator: LoveR, the robot of 生物统计家园网 (biostatistic.net)

----------Description----------

Count reads that map to genomic features, with options to resolve reads that overlap multiple features.

----------Usage----------

  ## S4 method for signature 'GRanges,GappedAlignments'
summarizeOverlaps(
features, reads, mode, ignore.strand = FALSE, ..., param = ScanBamParam())

----------Arguments----------

Argument: features
A GRanges or a GRangesList object of genomic regions of interest. When a GRanges is supplied, each row is considered a different feature. When a GRangesList is supplied, each top-level list element is considered a feature, and the ranges within an element are considered portions of the same feature. See the examples or the vignette for details.

Argument: reads
A GappedAlignments, BamFileList or BamViews object.

Argument: mode
Character name of a function that defines the counting method to be used. Available counting modes are "Union", "IntersectionStrict", and "IntersectionNotEmpty"; they are modeled on the counting modes available in the HTSeq package by Simon Anders (see references). A user-provided count function can be used as the mode with the BamFileList method for summarizeOverlaps.

"Union" : (Default) Reads that overlap any portion of exactly one feature are counted. Reads that overlap multiple features are discarded. In this mode gapped reads are handled the same as simple reads: if any portion of a gapped read hits more than one feature, the read is discarded.

"IntersectionStrict" : The read must fall completely within a single feature to be counted. A read can overlap multiple features but must fall within only one. For gapped reads, all fragments of the read must fall within the same feature for the read to be counted. The fragments can overlap multiple features, but collectively they must fall within only one.

"IntersectionNotEmpty" : For this counting mode, the features are partitioned into unique disjoint regions. This is done by disjoining the feature ranges and then removing ranges shared by more than one feature. The result is a set of non-overlapping regions, each of which belongs to a single feature. Simple and gapped reads are counted if either:

  the read, or exactly one of the read fragments, overlaps a unique disjoint region

  the read, or more than one read fragment, overlaps more than one unique disjoint region from the same feature
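
The three mode rules above can be illustrated with a small base-R sketch. It is a toy model on a single chromosome and strand, not the GenomicRanges implementation; the data and every name in it (features, reads, hit_union, hit_strict, hit_not_empty, count_by) are assumptions made up for illustration.

```r
# Toy model of the three counting modes on one chromosome and one strand.
# Illustrative only: not the GenomicRanges implementation; all names are made up.
features <- data.frame(id    = c("A", "B", "C"),
                       start = c(100, 300, 350),
                       end   = c(200, 400, 500))

# A read is a data.frame of fragments; a gapped read has more than one row.
reads <- list(
  r1 = data.frame(start = 120, end = 180),
  r2 = data.frame(start = 380, end = 420),
  r3 = data.frame(start = 360, end = 390),
  r4 = data.frame(start = 190, end = 250),
  r5 = data.frame(start = c(150, 450), end = c(190, 480))
)

# "Union": features touched by any fragment of the read.
hit_union <- function(read) {
  which(vapply(seq_len(nrow(features)), function(i) {
    any(read$start <= features$end[i] & read$end >= features$start[i])
  }, logical(1)))
}

# "IntersectionStrict": features that contain every fragment of the read.
hit_strict <- function(read) {
  which(vapply(seq_len(nrow(features)), function(i) {
    all(read$start >= features$start[i] & read$end <= features$end[i])
  }, logical(1)))
}

# "IntersectionNotEmpty": owner[p] is the feature covering position p when
# exactly one feature covers it, and NA otherwise (uncovered or shared).
n_cover <- integer(max(features$end))
owner   <- rep(NA_integer_, length(n_cover))
for (i in seq_len(nrow(features))) {
  p <- features$start[i]:features$end[i]
  n_cover[p] <- n_cover[p] + 1L
  owner[p]   <- i
}
owner[n_cover != 1L] <- NA_integer_

# Features owning the unique positions that the read's fragments touch.
hit_not_empty <- function(read) {
  p <- unlist(Map(seq, read$start, read$end))
  h <- unique(owner[p])
  h[!is.na(h)]
}

# Shared rule: a read is counted only when exactly one feature remains.
count_by <- function(hit_fun) {
  counts <- setNames(integer(nrow(features)), features$id)
  for (read in reads) {
    h <- hit_fun(read)
    if (length(h) == 1L) counts[h] <- counts[h] + 1L
  }
  counts
}

mode_counts <- data.frame(union       = count_by(hit_union),
                          intStrict   = count_by(hit_strict),
                          intNotEmpty = count_by(hit_not_empty))
print(mode_counts)
```

In this toy data, read r2 straddles the end of B and lies inside C, so "Union" discards it while the two intersection modes assign it to C. Read r3 lies entirely in the region shared by B and C and is not counted in any mode.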

Argument: param
An optional ScanBamParam instance to further influence scanning, counting, or filtering of the BAM file.

Argument: ignore.strand
A logical value indicating whether strand should be ignored when matching. With the default, FALSE, strand is considered.

Argument: ...
Additional arguments for other methods. If using multiple cores, arguments passed here are used by mclapply, for example to indicate the number of cores to use.
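
As a rough sketch of how arguments given in ... can reach mclapply, the base-R example below forwards them through a wrapper. The wrapper count_per_file, the stand-in counter toy_counter, and the file names are hypothetical and are not part of GenomicRanges; mc.cores is a real mclapply argument, and values above 1 are not supported on Windows.

```r
library(parallel)

# Hypothetical wrapper: anything supplied in '...' is handed to mclapply().
count_per_file <- function(files, count_fun, ...) {
  mclapply(files, count_fun, ...)
}

# Stand-in "files" and a stand-in counting function, for illustration only.
toy_files   <- c(sample1 = "s1.bam", sample2 = "s2.bam")
toy_counter <- function(path) nchar(path)

# mc.cores travels through '...' to mclapply(); 1 core works on every platform.
per_file <- count_per_file(toy_files, toy_counter, mc.cores = 1L)
print(per_file)
```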

----------Details----------

In the context of summarizeOverlaps a "feature" can be any portion of a genomic region, such as a gene, transcript, or exon. When the features argument is a GRanges, the rows define the features to be overlapped. When features is a GRangesList, the top-level list elements define the features.

summarizeOverlaps offers three mode functions to handle reads that overlap multiple features: "Union", "IntersectionStrict", and "IntersectionNotEmpty". These functions are patterned after the counting methods in the HTSeq package (see references). Each mode has a set of rules that dictate how a read is assigned. A read is counted at most once. Alternatively, users can provide their own counting function as the mode argument and take advantage of the infrastructure in summarizeOverlaps to count across multiple files and parse the results into a SummarizedExperiment object.

Currently reads must be input as either a BAM file or a GappedAlignments object. The information in the CIGAR field is used to determine whether gapped reads are present.
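
To make the role of the CIGAR field concrete, the base-R sketch below splits a CIGAR string into the reference ranges of its aligned fragments. It is an illustration only, not how GenomicRanges parses CIGAR strings: the helper cigar_fragments is hypothetical, and it assumes the string contains only M (match) and N (skipped region) operations, as in the examples on this page.

```r
# Hypothetical helper: reference ranges of the aligned (M) fragments of a read.
# Assumes the CIGAR string contains only M and N operations.
cigar_fragments <- function(pos, cigar) {
  ops  <- regmatches(cigar, gregexpr("[0-9]+[MN]", cigar))[[1]]
  len  <- as.integer(sub("[MN]$", "", ops))   # length of each operation
  type <- sub("^[0-9]+", "", ops)             # operation letter
  # Reference start of each operation: pos plus the lengths of earlier ones.
  op_start <- pos + c(0L, cumsum(len)[-length(len)])
  keep <- type == "M"
  data.frame(start = op_start[keep], end = op_start[keep] + len[keep] - 1L)
}

# A read is treated as gapped when its CIGAR contains an N operation.
is_gapped <- grepl("N", "50M200N50M")

# Read "f" from the examples: pos 3100, cigar "50M200N50M".
frag <- cigar_fragments(3100L, "50M200N50M")
print(frag)
```

The two rows of frag are the fragments that a counting mode would test against the features; the 200-base N operation between them contributes no coverage.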

NOTE : summarizeOverlaps does not currently handle paired-end reads.

----------Value----------

A SummarizedExperiment object. The assays slot holds the counts, rowData holds the features, and colData is either NULL or holds any metadata that was present in the reads.

----------Author(s)----------

Valerie Obenchain <vobencha@fhcrc.org>

----------References----------

home page for HTSeq
counting with htseq-count

----------See Also----------

DESeq, DEXSeq and edgeR packages; BamFileList; BamViews

----------Examples----------

  library(GenomicRanges)  # GRanges, GappedAlignments, summarizeOverlaps

  group_id <- c("A", "B", "C", "C", "D", "D", "E", "F", "G", "H", "H")
  features <- GRanges(
   seqnames = Rle(c("chr1", "chr2", "chr1", "chr1", "chr2", "chr2",
      "chr1", "chr1", "chr2", "chr1", "chr1")),
   strand = strand(rep("+", length(group_id))),
   ranges = IRanges(
      start=c(1000, 2000, 3000, 3600, 7000, 7500, 4000, 4000, 3000,
            5000, 5400),
      width=c(500, 900, 500, 300, 600, 300, 500, 900, 500, 500, 500)),
   DataFrame(group_id)
  )

  reads <- GappedAlignments(
   names = c("a","b","c","d","e","f","g"),
   rname = Rle(c(rep(c("chr1", "chr2"), 3), "chr1")),
   pos = as.integer(c(1400, 2700, 3400, 7100, 4000, 3100, 5200)),
   cigar = c("500M", "100M", "300M", "500M", "300M",
      "50M200N50M", "50M150N50M"),
   strand = strand(rep.int("+", 7L)))

  ## Results from countOverlaps are included to highlight how the
  ## modes in summarizeOverlaps count a read a maximum of once.

  ## When the 'features' argument is a GRanges, each row
  ## is treated as a different feature.
  rowsAsFeatures <-
   data.frame(union = assays(summarizeOverlaps(features, reads))$counts,
               intStrict = assays(summarizeOverlaps(features, reads,
                  mode="IntersectionStrict"))$counts,
               intNotEmpty = assays(summarizeOverlaps(features, reads,
                  mode="IntersectionNotEmpty"))$counts,
               countOverlaps = countOverlaps(features, reads))

  ## When the 'features' argument is a GRangesList, each
  ## top-level list element is a different feature.
  lst <- split(features, values(features)[["group_id"]])
  listAsFeatures <-
   data.frame(union = assays(summarizeOverlaps(lst, reads))$counts,
               intStrict = assays(summarizeOverlaps(lst, reads,
                  mode="IntersectionStrict"))$counts,
               intNotEmpty = assays(summarizeOverlaps(lst, reads,
                  mode="IntersectionNotEmpty"))$counts,
               countOverlaps = countOverlaps(lst, reads))

  ## Read across BAM files and package output for DESeq or edgeR analysis
  library(Rsamtools)
  library(DESeq)
  library(edgeR)

  ## Match file names ending in ".bam" and return full paths
  fls <- list.files(system.file("extdata", package="GenomicRanges"),
   recursive=TRUE, pattern="\\.bam$", full.names=TRUE)
  bfl <- BamFileList(fls)
  features <- GRanges(
   seqnames = Rle(c("chr2L", "chr2R", "chr2L", "chr2R", "chr2L", "chr2R",
      "chr2L", "chr2R", "chr2R", "chr3L", "chr3L")),
   strand = strand(rep("+", 11)),
   ranges = IRanges(start=c(1000, 2000, 3000, 3600, 7000, 7500, 4000, 4000,
      3000, 5000, 5400), width=c(500, 900, 500, 300, 600, 300, 500, 900,
      500, 500, 500))
  )

  solap <- summarizeOverlaps(features, bfl)

  deseq <- newCountDataSet(countData=assays(solap)$counts,
                        conditions=rownames(colData(solap)))

  edger <- DGEList(counts=assays(solap)$counts, group=rownames(colData(solap)))

Please credit the source when reposting: 生物统计家园网 (http://www.biostatistic.net).
