R language, Biostrings package: help documentation for nucleotideFrequency()

loveR · posted on 2012-2-25 13:48:15

nucleotideFrequency(Biostrings)
R package providing nucleotideFrequency(): Biostrings

Calculate the frequency of oligonucleotides in a DNA or RNA sequence

Translator: LoveR, the biostatistic.net robot

----------Description----------

Given a DNA or RNA sequence (or a set of DNA or RNA sequences), the oligonucleotideFrequency function computes the frequency of all possible oligonucleotides of a given length (called the "width" in this particular context).
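
To make "all possible oligonucleotides of a given width" concrete, here is a minimal base-R sketch of the counting idea. It is not the Biostrings implementation: count_kmers is a hypothetical helper, it takes a plain character string rather than an XString, and it assumes the sequence contains only A, C, G and T.

```r
# Hypothetical base-R helper, for illustration only (not part of Biostrings).
# Counts every overlapping k-mer and reports all 4^width possible k-mers,
# including those that never occur.
count_kmers <- function(dna, width, as.prob = FALSE) {
  bases <- c("A", "C", "G", "T")
  # expand.grid varies its first column fastest; reversing the columns
  # makes the rightmost letter of each k-mer vary fastest.
  cols <- unname(as.list(expand.grid(rep(list(bases), width),
                                     stringsAsFactors = FALSE)))
  all_kmers <- do.call(paste0, rev(cols))
  n <- nchar(dna) - width + 1
  observed <- if (n > 0) {
    substring(dna, seq_len(n), seq_len(n) + width - 1)
  } else {
    character(0)
  }
  counts <- setNames(as.vector(table(factor(observed, levels = all_kmers))),
                     all_kmers)
  if (as.prob) counts / sum(counts) else counts
}

di <- count_kmers("ACGTACGT", 2)
di[di > 0]   # AC = 2, CG = 2, GT = 2, TA = 1
```

The result has 4^width entries regardless of the input, which is why long widths quickly produce very long vectors.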

The dinucleotideFrequency and trinucleotideFrequency functions are convenient wrappers for calling oligonucleotideFrequency with width=2 and width=3, respectively.

The nucleotideFrequencyAt function computes the frequency of the short sequences formed by extracting the nucleotides found at some fixed positions from each sequence of a set of DNA or RNA sequences.
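
The idea behind nucleotideFrequencyAt can be sketched in base R as a cross-tabulation of the letters found at the chosen positions. This is only an illustration under stated assumptions: count_at is a hypothetical helper, the sequences are a plain character vector, and every sequence is long enough to contain each requested position.

```r
# Hypothetical base-R helper, for illustration only (not part of Biostrings).
count_at <- function(seqs, at) {
  bases <- c("A", "C", "G", "T")
  # One factor per requested position, each with the full DNA alphabet as levels.
  letters_at <- lapply(at, function(p) factor(substr(seqs, p, p), levels = bases))
  # Cross-tabulate: the result has one dimension per entry of 'at'.
  tab <- do.call(table, unname(letters_at))
  array(as.vector(tab), dim = dim(tab), dimnames = unname(dimnames(tab)))
}

toy <- c("ACGT", "ACGA", "TCGA", "AGGT")
nf <- count_at(toy, c(1, 4))
nf["A", "T"]                    # 2 sequences have A at position 1 and T at position 4
nf["A", "T"] / sum(nf["A", ])   # share of T at position 4 among sequences starting with A
```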

In this man page we call "DNA input" (or "RNA input") an XString, XStringSet, XStringViews or MaskedXString object of base type DNA (or RNA).

----------Usage----------

oligonucleotideFrequency(x, width, as.prob=FALSE, as.array=FALSE,
                     fast.moving.side="right", with.labels=TRUE, ...)

## S4 method for signature 'XStringSet'
oligonucleotideFrequency(x, width, as.prob=FALSE, as.array=FALSE,
                     fast.moving.side="right", with.labels=TRUE,
                     simplify.as="matrix")

dinucleotideFrequency(x, as.prob=FALSE, as.matrix=FALSE,
                  fast.moving.side="right", with.labels=TRUE, ...)
trinucleotideFrequency(x, as.prob=FALSE, as.array=FALSE,
                     fast.moving.side="right", with.labels=TRUE, ...)

nucleotideFrequencyAt(x, at, as.prob=FALSE, as.array=TRUE,
                  fast.moving.side="right", with.labels=TRUE, ...)

## Some related functions:
oligonucleotideTransitions(x, left=1, right=1, as.prob=FALSE)
mkAllStrings(alphabet, width, fast.moving.side="right")

----------Arguments----------

Argument: x
Any DNA or RNA input for the *Frequency and oligonucleotideTransitions functions. An XStringSet or XStringViews object of base type DNA or RNA for nucleotideFrequencyAt.

Argument: width
The number of nucleotides per oligonucleotide for oligonucleotideFrequency. The number of letters per string for mkAllStrings.

Argument: at
An integer vector containing the positions to look at in each element of x.

Argument: as.prob
If TRUE then probabilities are reported, otherwise counts (the default).

Argument: as.array, as.matrix
Controls the "shape" of the returned object. If TRUE (the default for nucleotideFrequencyAt) then it's a numeric matrix (or array), otherwise it's just a "flat" numeric vector, i.e. a vector with no dim attribute (the default for the *Frequency functions).

Argument: fast.moving.side
Which side of the strings should move fastest? Note that when as.array is TRUE, the supplied value is ignored and the effective value is "left".
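
The difference between the two orderings, and why an array result implies "left", can be seen with a small base-R sketch. make_all_strings is a hypothetical stand-in written for this illustration (it is not the package's mkAllStrings); the point it shows is that R stores arrays with the first index varying fastest.

```r
# Hypothetical base-R helper, for illustration only (not part of Biostrings).
make_all_strings <- function(alphabet, width,
                             fast.moving.side = c("right", "left")) {
  fast.moving.side <- match.arg(fast.moving.side)
  # expand.grid varies its first column fastest.
  cols <- unname(as.list(expand.grid(rep(list(alphabet), width),
                                     stringsAsFactors = FALSE)))
  if (fast.moving.side == "right") cols <- rev(cols)
  do.call(paste0, cols)
}

bases <- c("A", "C", "G", "T")
right <- make_all_strings(bases, 2)
left  <- make_all_strings(bases, 2, fast.moving.side = "left")
head(right, 5)   # "AA" "AC" "AG" "AT" "CA"
head(left, 5)    # "AA" "CA" "GA" "TA" "AC"

# Flattening a matrix indexed by [first base, second base] walks the first
# index fastest, which is exactly the "left" ordering.
identical(as.vector(outer(bases, bases, paste0)), left)   # TRUE

# Reversing each string of one ordering yields the other.
reversed <- vapply(strsplit(right, ""),
                   function(ch) paste(rev(ch), collapse = ""), character(1))
identical(reversed, left)   # TRUE
```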

Argument: with.labels
If TRUE then the returned object is named.

Argument: ...
Further arguments to be passed to or from other methods.

Argument: simplify.as
Together with the as.array and as.matrix arguments, controls the "shape" of the returned object when the input x is an XStringSet or XStringViews object. Supported simplify.as values are "matrix" (the default), "list" and "collapsed". If simplify.as is "matrix", the returned object is a matrix with length(x) rows where the i-th row contains the frequencies for x[[i]]. If simplify.as is "list", the returned object is a list of the same length as x where the i-th element contains the frequencies for x[[i]]. If simplify.as is "collapsed", then the frequencies are computed for the entire object x as a whole (i.e. frequencies cumulated across all sequences in x).
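
The three shapes can be mimicked in base R to show how they relate to each other. This is a sketch only: count_dimers is a hypothetical helper operating on plain character strings, and the variable names simply echo the simplify.as values described above.

```r
# Hypothetical base-R helper, for illustration only (not part of Biostrings).
bases  <- c("A", "C", "G", "T")
dimers <- as.vector(t(outer(bases, bases, paste0)))   # AA AC AG AT CA ...

count_dimers <- function(dna) {
  n <- nchar(dna) - 1
  observed <- if (n > 0) substring(dna, seq_len(n), seq_len(n) + 1) else character(0)
  setNames(as.vector(table(factor(observed, levels = dimers))), dimers)
}

seqs <- c("ACGT", "AACC", "GT")
as_list   <- lapply(seqs, count_dimers)   # like "list": one vector per sequence
as_matrix <- do.call(rbind, as_list)      # like "matrix": length(seqs) rows
collapsed <- colSums(as_matrix)           # like "collapsed": summed over sequences

dim(as_matrix)               # 3 16
collapsed[collapsed > 0]     # AA = 1, AC = 2, CC = 1, CG = 1, GT = 2
```

Note that the collapsed counts are sums of per-sequence counts, so no dinucleotide is counted across the boundary between two sequences.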

Argument: left, right
The number of nucleotides per oligonucleotide for the rows and columns respectively in the transition matrix created by oligonucleotideTransitions.
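
For the simplest case, left=1 and right=1, the transition matrix can be pictured with the base-R sketch below. It is an assumption-laden illustration, not the package's code: transition_counts is a hypothetical helper, it takes a plain character string of A, C, G and T, and its as.prob branch normalizes each row to sum to 1, which is one natural reading of "probabilities" for a transition matrix.

```r
# Hypothetical base-R helper, for illustration only (not part of Biostrings).
transition_counts <- function(dna, as.prob = FALSE) {
  bases <- c("A", "C", "G", "T")
  chars <- strsplit(dna, "")[[1]]
  from <- factor(head(chars, -1), levels = bases)   # row: current nucleotide
  to   <- factor(tail(chars, -1), levels = bases)   # column: next nucleotide
  m <- matrix(as.vector(table(from, to)), nrow = 4,
              dimnames = list(bases, bases))
  # Rows with no observations would give NaN after normalization.
  if (as.prob) m <- m / rowSums(m)
  m
}

m <- transition_counts("ACGTACGA")
m["A", "C"]   # 2: A is followed by C twice
p <- transition_counts("ACGTACGA", as.prob = TRUE)
p["G", ]      # G is followed by A half the time and by T half the time
```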

Argument: alphabet
The alphabet to use to make the strings.

----------Value----------

If x is an XString or MaskedXString object, the *Frequency functions return a numeric vector of length 4^width. If as.array (or as.matrix) is TRUE, then this vector is formatted as an array (or matrix). If x is an XStringSet or XStringViews object, the returned object has the shape specified by the simplify.as argument.

----------Author(s)----------

H. Pages and P. Aboyoun

----------See Also----------

alphabetFrequency, alphabet, hasLetterAt, XString-class, XStringSet-class, XStringViews-class, MaskedXString-class, GENETIC_CODE, AMINO_ACID_CODE, reverseComplement, rev

----------Examples----------

  ## ---------------------------------------------------------------------
  ## A. BASIC *Frequency() EXAMPLES
  ## ---------------------------------------------------------------------
  library(Biostrings)
  data(yeastSEQCHR1)
  yeast1 <- DNAString(yeastSEQCHR1)

  dinucleotideFrequency(yeast1)
  trinucleotideFrequency(yeast1)
  oligonucleotideFrequency(yeast1, 4)

  ## Get the least and most represented 6-mers:
  f6 <- oligonucleotideFrequency(yeast1, 6)
  f6[f6 == min(f6)]
  f6[f6 == max(f6)]

  ## Get the result as an array:
  tri <- trinucleotideFrequency(yeast1, as.array=TRUE)
  tri["A", "A", "C"] # == trinucleotideFrequency(yeast1)["AAC"]
  tri["T", , ] # frequencies of trinucleotides starting with a "T"

  ## With input made of multiple sequences:
  library(drosophila2probe)
  probes <- DNAStringSet(drosophila2probe)
  dfmat <- dinucleotideFrequency(probes)  # a big matrix
  dinucleotideFrequency(probes, simplify.as="collapsed")
  dinucleotideFrequency(probes, simplify.as="collapsed", as.matrix=TRUE)

  ## ---------------------------------------------------------------------
  ## B. OBSERVED DINUCLEOTIDE FREQUENCY VERSUS EXPECTED DINUCLEOTIDE
  ## FREQUENCY
  ## ---------------------------------------------------------------------
  ## The expected frequency of dinucleotide "ab" based on the frequencies
  ## of its individual letters "a" and "b" is:
  ## exp_Fab = Fa * Fb / N if the 2 letters are different (e.g. CG)
  ## exp_Faa = Fa * (Fa-1) / N if the 2 letters are the same (e.g. TT)
  ## where Fa and Fb are the frequencies of "a" and "b" (respectively) and
  ## N the length of the sequence.

  ## Here is a simple function that implements the above formula for a
  ## DNAString object 'x'. The expected frequencies are returned in a 4x4
  ## matrix where the rownames and colnames correspond to the 1st and 2nd
  ## base in the dinucleotide:
  expectedDinucleotideFrequency <- function(x)
  {
   # Individual base frequencies.
   bf <- alphabetFrequency(x, baseOnly=TRUE)[DNA_BASES]
   (as.matrix(bf) %*% t(bf) - diag(bf)) / length(x)
  }

  ## On Celegans chrI:
  library(BSgenome.Celegans.UCSC.ce2)
  chrI <- Celegans$chrI
  obs_df <- dinucleotideFrequency(chrI, as.matrix=TRUE)
  obs_df  # CG has the lowest frequency
  exp_df <- expectedDinucleotideFrequency(chrI)
  ## A sanity check:
  stopifnot(as.integer(sum(exp_df)) == sum(obs_df))

  ## Ratio of observed frequency to expected frequency:
  obs_df / exp_df  # TA has the lowest ratio, not CG!

  ## ---------------------------------------------------------------------
  ## C. nucleotideFrequencyAt()
  ## ---------------------------------------------------------------------
  nucleotideFrequencyAt(probes, 13)
  nucleotideFrequencyAt(probes, c(13, 20))
  nucleotideFrequencyAt(probes, c(13, 20), as.array=FALSE)

  ## nucleotideFrequencyAt() can be used to answer questions like: "how
  ## many probes in the drosophila2 chip have T, G, T, A at position
  ## 2, 4, 13 and 20, respectively?"
  nucleotideFrequencyAt(probes, c(2, 4, 13, 20))["T", "G", "T", "A"]
  ## or "what's the probability to have an A at position 25 if there is
  ## one at position 13?"
  nf <- nucleotideFrequencyAt(probes, c(13, 25))
  sum(nf["A", "A"]) / sum(nf["A", ])
  ## Probabilities to have other bases at position 25 if there is an A
  ## at position 13:
  sum(nf["A", "C"]) / sum(nf["A", ])  # C
  sum(nf["A", "G"]) / sum(nf["A", ])  # G
  sum(nf["A", "T"]) / sum(nf["A", ])  # T

  ## See ?hasLetterAt for another way to get those results.

  ## ---------------------------------------------------------------------
  ## D. oligonucleotideTransitions()
  ## ---------------------------------------------------------------------
  ## Get nucleotide transition matrices for yeast1
  oligonucleotideTransitions(yeast1)
  oligonucleotideTransitions(yeast1, 2, as.prob=TRUE)

  ## ---------------------------------------------------------------------
  ## E. ADVANCED *Frequency() EXAMPLES
  ## ---------------------------------------------------------------------
  ## Note that when dropping the dimensions of the 'tri' array, elements
  ## in the resulting vector are ordered as if they were obtained with
  ## 'fast.moving.side="left"':
  triL <- trinucleotideFrequency(yeast1, fast.moving.side="left")
  all(as.vector(tri) == triL) # TRUE

  ## Convert the trinucleotide frequency into the amino acid frequency
  ## based on translation:
  tri1 <- trinucleotideFrequency(yeast1)
  names(tri1) <- GENETIC_CODE[names(tri1)]
  sapply(split(tri1, names(tri1)), sum) # 12512 occurrences of the stop codon

  ## When the returned vector is very long (e.g. width >= 10), using
  ## 'with.labels=FALSE' can improve performance significantly.
  ## Here for example, the observed speed up is between 25x and 500x:
  f12 <- oligonucleotideFrequency(yeast1, 12, with.labels=FALSE) # very fast!

  ## Some related functions:
  dict1 <- mkAllStrings(LETTERS[1:3], 4)
  dict2 <- mkAllStrings(LETTERS[1:3], 4, fast.moving.side="left")
  stopifnot(identical(reverse(dict1), dict2))

When reposting, please credit the source: biostatistic.net (http://www.biostatistic.net).

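Notes:
Note 1: To make learning easier, this document was translated by LoveR, the biostatistic.net robot. It is intended only as a personal reference for learning R; biostatistic.net reserves the copyright.
Note 2: The Chinese translation in the original post was produced automatically and inevitably contains inaccuracies, so check it carefully against the English text.
Note 3: If you find an inaccuracy, please reply in this thread and we will revise the document over time.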
