R Biostrings package: letterFrequency() function help documentation

loveR · posted 2012-2-25 13:45:54

letterFrequency(Biostrings)
R package for letterFrequency(): Biostrings

Calculate the frequency of letters in a biological

Translator: LoveR, the biostatistic.net robot

----------Description----------

Given a biological sequence (or a set of biological sequences), the alphabetFrequency function computes the frequency of each letter of the relevant alphabet.

letterFrequency is similar, but more compact when only certain letters are of interest. It can also tabulate several letters together as a single combined count (letters "in common").

letterFrequencyInSlidingView is a more specialized version of letterFrequency for (non-masked) XString objects. It tallies the requested letter frequencies for a fixed-width view, or window, that is conceptually slid along the entire input sequence.

The consensusMatrix function computes the consensus matrix of a set of sequences, and the consensusString function creates the consensus sequence from the consensus matrix according to the specified criteria.

In this man page, "DNA input" (or "RNA input") refers to an XString, XStringSet, XStringViews or MaskedXString object of base type DNA (or RNA).

----------Usage----------

alphabetFrequency(x, as.prob=FALSE, ...)
hasOnlyBaseLetters(x)
uniqueLetters(x)

letterFrequency(x, letters, OR="|", as.prob=FALSE, ...)
letterFrequencyInSlidingView(x, view.width, letters, OR="|", as.prob=FALSE)

consensusMatrix(x, as.prob=FALSE, shift=0L, width=NULL, ...)

## S4 method for signature 'matrix'
consensusString(x, ambiguityMap="?", threshold=0.5)
## S4 method for signature 'DNAStringSet'
consensusString(x, ambiguityMap=IUPAC_CODE_MAP,
         threshold=0.25, shift=0L, width=NULL)
## S4 method for signature 'RNAStringSet'
consensusString(x,
         ambiguityMap=
         structure(as.character(RNAStringSet(DNAStringSet(IUPAC_CODE_MAP))),
                     names=
                     as.character(RNAStringSet(DNAStringSet(names(IUPAC_CODE_MAP))))),
         threshold=0.25, shift=0L, width=NULL)

----------Arguments----------

Argument: x
An XString, XStringSet, XStringViews or MaskedXString object for alphabetFrequency, letterFrequency, or uniqueLetters. DNA or RNA input for hasOnlyBaseLetters. An XString object for letterFrequencyInSlidingView. A character vector, or an XStringSet or XStringViews object for consensusMatrix. A consensus matrix (as returned by consensusMatrix), or an XStringSet or XStringViews object for consensusString.

Argument: as.prob
If TRUE, probabilities are reported; otherwise counts are reported (the default).

Argument: view.width
For letterFrequencyInSlidingView, the constant size (e.g. 35, 48, 1000) of the "window" to slide along x. The specified letters are tabulated in each window of length view.width. The rows of the result (see Value) correspond to the successive windows.

Argument: letters
For letterFrequency or letterFrequencyInSlidingView, a character vector (e.g. "C", "CG", c("C", "G")) giving the letters to tabulate. When x is DNA or RNA input, letters must come from alphabet(x). Except with OR=0, multi-character elements of letters (nchar > 1) are treated as groups of letters to be tabulated together ("or"'d), as if their individual alphabetFrequency counts were added. The columns of the result (see Value) correspond to the individual letters and letter groups, each counted separately. Unrelated (and, with some post-processing, related) counts can of course be obtained in separate calls.

Argument: OR
For letterFrequency or letterFrequencyInSlidingView, the string (default "|") used as a separator when forming names for the "grouped" columns, e.g. "C|G". The special value 0 (zero) disables or'ing and is provided for convenience: it allows a single multi-character string (or several strings) of letters that should each be counted separately. If some but not all letters are to be counted separately, they must reside in separate elements of letters (with nchar 1 unless they are to be grouped with other letters), and OR cannot be 0.
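
As a small illustration of how letters and OR interact, the sketch below counts letters in a short made-up sequence. The sequence and the variable names are assumptions chosen for this example; they are not part of the original page.

```r
library(Biostrings)

# Toy sequence invented for this illustration: 1 A, 3 C, 3 G, 1 T
x <- DNAString("ACGTCCGG")

# "CG" is a multi-character element, so C and G are counted together
# and the combined column is named with the OR separator: "C|G"
grouped <- letterFrequency(x, letters = c("A", "CG"))
grouped

# OR = 0 disables grouping: each letter of "ACG" gets its own count
separate <- letterFrequency(x, letters = "ACG", OR = 0)
separate
```

With the default separator the grouped result has two entries ("A" and "C|G"), whereas OR=0 yields one entry per letter.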

Argument: ambiguityMap
Either a single character to use when agreement is not reached, or a named character vector where the names are the ambiguity characters and the values are the combinations of letters that comprise the ambiguity (e.g. IUPAC_CODE_MAP). When ambiguityMap is a named character vector, occurrences of ambiguous letters in x are replaced with their base alphabet letters, equally weighted so that they sum to 1. (See Details for some examples.)

Argument: threshold
The minimum probability threshold for an agreement to be declared. When ambiguityMap is a single character, threshold is a single number in (0, 1]. When ambiguityMap is a named character vector (e.g. IUPAC_CODE_MAP), threshold is a single number in (0, 1/sum(nchar(ambiguityMap) == 1)].

Argument: ...
Further arguments to be passed to or from other methods. For the XStringViews and XStringSet methods, the collapse argument is accepted. Except for letterFrequency or letterFrequencyInSlidingView, and with DNA or RNA input, the baseOnly argument is accepted. If baseOnly is TRUE, the returned vector (or matrix) only contains the frequencies of the letters that belong to the "base" alphabet of x, i.e. to the alphabet returned by alphabet(x, baseOnly=TRUE).

Argument: shift
An integer vector (recycled to the length of x) specifying how each sequence in x should be (horizontally) shifted with respect to the first column of the consensus matrix to be returned. By default (shift=0), each sequence in x has its first letter aligned with the first column of the matrix. A positive shift value means that the corresponding sequence must be shifted to the right, and a negative shift value means that it must be shifted to the left. For example, a shift of 5 means that it must be shifted 5 positions to the right (i.e. the first letter in the sequence must be aligned with the 6th column of the matrix), and a shift of -3 means that it must be shifted 3 positions to the left (i.e. the 4th letter in the sequence must be aligned with the first column of the matrix).
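
The effect of shift can be checked on a tiny input. The following is a minimal sketch using two made-up sequences (an assumption for illustration only); it shifts the second sequence two positions to the right and inspects the resulting matrix.

```r
library(Biostrings)

# Two toy sequences invented for this illustration
x <- DNAStringSet(c("ACGT", "ACGT"))

# The second sequence is shifted 2 positions to the right, so its first
# letter lines up with column 3 of the consensus matrix
cm <- consensusMatrix(x, baseOnly = TRUE, shift = c(0L, 2L))
cm

ncol(cm)     # just enough columns to hold both shifted sequences
colSums(cm)  # number of sequences covering each column
```

Columns 3 and 4 are the only ones covered by both sequences, which is where the two alignments overlap.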

Argument: width
The number of columns of the returned matrix for the consensusMatrix method for XStringSet objects. When width=NULL (the default), this method returns a matrix that has just enough columns to have its last column aligned with the rightmost letter of all the sequences in x after those sequences have been shifted (see the shift argument above). This ensures that any wider consensus matrix would be a "padded with zeros" version of the matrix returned when width=NULL. For the consensusString method for XStringSet objects, width is the length of the returned sequence.

----------Details----------

alphabetFrequency, letterFrequency, and letterFrequencyInSlidingView are generic functions defined in the Biostrings package.

letterFrequency is similar to alphabetFrequency but restricted to the letters of interest, hence more compact, especially when OR is non-zero.

letterFrequencyInSlidingView yields the same result, on the sequence x, that letterFrequency would if applied to the hypothetical (and possibly huge) XStringViews object consisting of all the intervals of length view.width on x. Because successive "views" are nearly identical for letter-counting purposes, it can exploit that overlap and is both lighter and faster.
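
The equivalence described above can be verified on a short input. This is only a sketch: the 10-letter sequence and the window width of 4 are assumptions picked so that the windows are easy to count by hand.

```r
library(Biostrings)

# Toy sequence invented for this illustration
x <- DNAString("ACGTACGGCC")
w <- 4

# One row per window: length(x) - w + 1 rows
sliding <- letterFrequencyInSlidingView(x, view.width = w, letters = "CG")
sliding

# The same counts via an explicit Views object holding every window
v <- Views(x, start = seq_len(length(x) - w + 1), width = w)
via_views <- letterFrequency(v, letters = "CG")

stopifnot(all(sliding == via_views))
```

On real chromosomes the explicit Views object holds one view per window, which is why the sliding version is preferable there.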

For letterFrequencyInSlidingView, a masked (MaskedXString) object x is supported only through a cast to an ordinary XString, for example with unmasked (the result then includes the masked regions).

When consensusString is called with a named character vector as the ambiguityMap argument, it weights each input string equally and assigns an equal probability to each of the base letters represented by an ambiguity letter. So for DNA and a threshold of 0.25, a "G" and an "R" would result in an "R" since 1/2 "G" + 1/2 "R" = 3/4 "G" + 1/4 "A" => "R"; two "G"'s and one "R" would result in a "G" since 2/3 "G" + 1/3 "R" = 5/6 "G" + 1/6 "A" => "G"; and one "A" and one "N" would result in an "N" since 1/2 "A" + 1/2 "N" = 5/8 "A" + 1/8 "C" + 1/8 "G" + 1/8 "T" => "N".
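
The weighting arithmetic in the paragraph above can be reproduced in a few lines of base R. This is a sketch of the weighting step only, not the Biostrings implementation: code_map is a hand-written subset of the IUPAC codes and column_probs is a hypothetical helper introduced for this example. It does not attempt to pick the final consensus letter.

```r
# Hand-written subset of the IUPAC ambiguity codes (assumption for this sketch)
code_map <- c(A = "A", C = "C", G = "G", T = "T", R = "AG", N = "ACGT")

# Hypothetical helper: base-letter probabilities for one alignment column.
# Each input letter carries weight 1/n, split equally among the bases it
# stands for.
column_probs <- function(column_letters) {
  probs <- c(A = 0, C = 0, G = 0, T = 0)
  for (letter in column_letters) {
    bases <- strsplit(code_map[[letter]], "")[[1]]
    share <- 1 / (length(column_letters) * length(bases))
    probs[bases] <- probs[bases] + share
  }
  probs
}

column_probs(c("G", "R"))       # 3/4 G + 1/4 A
column_probs(c("G", "G", "R"))  # 5/6 G + 1/6 A
column_probs(c("A", "N"))       # 5/8 A + 1/8 each of C, G, T
```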

----------Value----------

alphabetFrequency returns an integer vector when x is an XString or MaskedXString object. When x is an XStringSet or XStringViews object, it returns an integer matrix with length(x) rows where the i-th row contains the frequencies for x[[i]]. If x is a DNA or RNA input, the returned vector is named with the letters in the alphabet. If the baseOnly argument is TRUE, the returned vector has only 5 elements: 4 elements corresponding to the 4 nucleotides plus the 'other' element.
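
To make the two return shapes concrete, here is a minimal sketch on made-up inputs (the sequences are assumptions for illustration): a single sequence gives a named vector, and a set of sequences gives a matrix with one row per sequence.

```r
library(Biostrings)

# Single toy sequence containing one ambiguity letter (N)
x <- DNAString("ACGTN")
single <- alphabetFrequency(x, baseOnly = TRUE)
single       # 5 named elements: A, C, G, T, other

# Set of two toy sequences: one row per sequence
xs <- DNAStringSet(c("AAC", "GGT"))
per_seq <- alphabetFrequency(xs, baseOnly = TRUE)
dim(per_seq)
per_seq
```

Here the N in the first input is counted under 'other' because it is not one of the four base letters.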

letterFrequency similarly returns an integer vector or matrix, but restricted and/or collated according to letters and OR.

letterFrequencyInSlidingView returns, for an XString object x of length (nchar) L, an integer matrix with L-view.width+1 rows, the i-th of which holds the letter frequencies of substring(x, i, i+view.width-1).

hasOnlyBaseLetters returns TRUE or FALSE indicating whether or not x contains only base letters (i.e. As, Cs, Gs and Ts for DNA input and As, Cs, Gs and Us for RNA input).

uniqueLetters returns a vector of 1-letter or empty strings. The empty string is used to represent the nul character if x happens to contain any. Note that this can only happen if the base class of x is BString.

consensusMatrix returns an integer matrix with letters as row names.

consensusString returns a standard character string.

----------Author(s)----------

H. Pages and P. Aboyoun; H. Jaffee for letterFrequency and
letterFrequencyInSlidingView

----------See Also----------

alphabet, coverage, oligonucleotideFrequency, countPDict, XString-class, XStringSet-class, XStringViews-class, MaskedXString-class, strsplit

----------Examples----------

  ## ---------------------------------------------------------------------
  ## alphabetFrequency()
  ## ---------------------------------------------------------------------
  data(yeastSEQCHR1)
  yeast1 <- DNAString(yeastSEQCHR1)

  alphabetFrequency(yeast1)
  alphabetFrequency(yeast1, baseOnly=TRUE)

  hasOnlyBaseLetters(yeast1)
  uniqueLetters(yeast1)

  ## With input made of multiple sequences:
  library(drosophila2probe)
  probes <- DNAStringSet(drosophila2probe)
  alphabetFrequency(probes[1:50], baseOnly=TRUE)
  alphabetFrequency(probes, baseOnly=TRUE, collapse=TRUE)

  ## ---------------------------------------------------------------------
  ## letterFrequency()
  ## ---------------------------------------------------------------------
  letterFrequency(probes[[1]], letters="ACGT", OR=0)
  base_letters <- alphabet(probes, baseOnly=TRUE)
  base_letters
  letterFrequency(probes[[1]], letters=base_letters, OR=0)
  base_letter_freqs <- letterFrequency(probes, letters=base_letters, OR=0)
  head(base_letter_freqs)
  GC_content <- letterFrequency(probes, letters="CG")
  head(GC_content)
  letterFrequency(probes, letters="CG", collapse=TRUE)

  ## ---------------------------------------------------------------------
  ## letterFrequencyInSlidingView()
  ## ---------------------------------------------------------------------
  data(yeastSEQCHR1)
  x <- DNAString(yeastSEQCHR1)
  view.width <- 48
  letters <- c("A", "CG")
  two_columns <- letterFrequencyInSlidingView(x, view.width, letters)
  head(two_columns)
  tail(two_columns)
  three_columns <- letterFrequencyInSlidingView(x, view.width, letters, OR=0)
  head(three_columns)
  tail(three_columns)
  stopifnot(identical(two_columns[ , "C|G"],
                  three_columns[ , "C"] + three_columns[ , "G"]))

  ## Note that, alternatively, 'three_columns' can also be obtained by
  ## creating the views on 'x' (as a Views object) and by calling
  ## alphabetFrequency() on it. But, of course, that is *much* less
  ## efficient (both in terms of memory and speed) than using
  ## letterFrequencyInSlidingView():
  v <- Views(x, start=seq_len(length(x) - view.width + 1), width=view.width)
  v
  three_columns2 <- alphabetFrequency(v, baseOnly=TRUE)[ , c("A", "C", "G")]
  stopifnot(identical(three_columns2, three_columns))

  ## Set the width of the view to length(x) to get the global frequencies:
  letterFrequencyInSlidingView(x, letters="ACGTN", view.width=length(x), OR=0)

  ## ---------------------------------------------------------------------
  ## consensus*()
  ## ---------------------------------------------------------------------
  ## Read in ORF data (readDNAStringSet() replaces the older
  ## read.DNAStringSet() in current Biostrings releases):
  file <- system.file("extdata", "someORF.fa", package="Biostrings")
  orf <- readDNAStringSet(file)

  ## To illustrate, the following example assumes the ORF data
  ## to be aligned for the first 10 positions (patently false):
  orf10 <- DNAStringSet(orf, end=10)
  consensusMatrix(orf10, baseOnly=TRUE)

  ## The following example assumes the first 10 positions to be aligned
  ## after some incremental shifting to the right (patently false):
  consensusMatrix(orf10, baseOnly=TRUE, shift=0:6)
  consensusMatrix(orf10, baseOnly=TRUE, shift=0:6, width=10)

  ## For the character matrix containing the "exploded" representation
  ## of the strings, do:
  as.matrix(orf10, use.names=FALSE)

  ## consensusMatrix() can be used to just compute the alphabet frequency
  ## for each position in the input sequences:
  consensusMatrix(probes, baseOnly=TRUE)

  ## After sorting, the first 5 probes might look similar (at least on
  ## their first bases):
  consensusString(sort(probes)[1:5])
  consensusString(sort(probes)[1:5], ambiguityMap = "N", threshold = 0.5)

  ## Consensus involving ambiguity letters in the input strings
  consensusString(DNAStringSet(c("NNNN","ACTG")))
  consensusString(DNAStringSet(c("AANN","ACTG")))
  consensusString(DNAStringSet(c("ACAG","ACAR")))
  consensusString(DNAStringSet(c("ACAG","ACAR", "ACAG")))

  ## ---------------------------------------------------------------------
  ## C. RELATIONSHIP BETWEEN consensusMatrix() AND coverage()
  ## ---------------------------------------------------------------------
  ## Applying colSums() on a consensus matrix gives the coverage that
  ## would be obtained by piling up (after shifting) the input sequences
  ## on top of an (imaginary) reference sequence:
  cm <- consensusMatrix(orf10, shift=0:6, width=10)
  colSums(cm)

  ## Note that this coverage can also be obtained with:
  as.integer(coverage(IRanges(rep(1, length(orf)), width(orf)), shift=0:6, width=10))

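When reposting, please credit the source: biostatistic.net (http://www.biostatistic.net).

Notes:
Note 1: For the convenience of learners, this document was translated by LoveR, the biostatistic.net robot. It is intended only as a personal reference for learning R; biostatistic.net retains the copyright.
Note 2: Because the translation was produced automatically by a robot, inaccuracies are inevitable. Carefully comparing the Chinese and English text can help when learning R.
Note 3: If you find inaccuracies, please reply at the end of this thread, and we will revise them over time.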
