R Biostrings package: nucleotideFrequency() help documentation

Calculate the frequency of oligonucleotides in a DNA or RNA sequence

----------Description----------

Given a DNA or RNA sequence (or a set of DNA or RNA sequences), the oligonucleotideFrequency function computes the frequency of all possible oligonucleotides of a given length (called the "width" in this particular context).
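
As a rough illustration of what is being counted, the base-R sketch below slides a window of the given width along a plain character string and tabulates the overlapping oligonucleotides. The helper name count_kmers is hypothetical, and this is not how Biostrings implements oligonucleotideFrequency. It assumes the four-letter alphabet A, C, G, T and simply ignores any window that contains another letter.

```r
# Hypothetical helper (not part of Biostrings): count overlapping k-mers
# in a plain character string using base R only.
count_kmers <- function(seq, width, alphabet = c("A", "C", "G", "T")) {
  # Enumerate all strings of length `width`, rightmost letter varying fastest.
  kmers <- ""
  for (i in seq_len(width)) {
    kmers <- as.vector(t(outer(kmers, alphabet, paste0)))
  }
  # Slide a window of size `width` along the sequence, one letter at a time.
  n_windows <- max(0, nchar(seq) - width + 1)
  starts <- seq_len(n_windows)
  windows <- substring(seq, starts, starts + width - 1)
  # Windows with letters outside `alphabet` match no factor level and are dropped.
  counts <- table(factor(windows, levels = kmers))
  setNames(as.integer(counts), kmers)
}

di <- count_kmers("ACGTACGT", 2)
di[di > 0]  # AC, CG, GT, TA with counts 2, 2, 2, 1
```

The result always has 4^width entries, including zeros, which mirrors the fixed-length vector described for the real function.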

The dinucleotideFrequency and trinucleotideFrequency functions are convenient wrappers for calling oligonucleotideFrequency with width=2 and width=3, respectively.

The nucleotideFrequencyAt function computes the frequency of the short sequences formed by extracting the nucleotides found at some fixed positions from each sequence of a set of DNA or RNA sequences.
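
The idea behind nucleotideFrequencyAt can be sketched in base R: pick the letter found at each requested position in every sequence, then cross-tabulate those letters. The helper freq_at and the toy sequences are hypothetical, and the sketch works on plain character vectors rather than XStringSet objects.

```r
# Hypothetical helper (not the Biostrings implementation): cross-tabulate
# the letters found at fixed positions across a set of sequences.
freq_at <- function(seqs, at, bases = c("A", "C", "G", "T")) {
  # One factor per requested position, holding the letter each sequence has there.
  letters_at <- lapply(at, function(pos) {
    factor(substring(seqs, pos, pos), levels = bases)
  })
  # One table dimension per position, in the order given by `at`.
  do.call(table, c(letters_at, list(dnn = paste0("pos", at))))
}

toy <- c("ACGT", "AGGT", "ACTT", "TCGA")
tab <- freq_at(toy, c(1, 2))
tab["A", "C"]                     # sequences with A at position 1 and C at position 2
tab["A", "C"] / sum(tab["A", ])   # share of C at position 2 among those with A at position 1
```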

In this man page we call "DNA input" (or "RNA input") an XString, XStringSet, XStringViews or MaskedXString object of base type DNA (or RNA).

----------Usage----------


oligonucleotideFrequency(x, width, as.prob=FALSE, as.array=FALSE,
                         fast.moving.side="right", with.labels=TRUE, ...)

## S4 method for signature 'XStringSet'
oligonucleotideFrequency(x,
    width, as.prob=FALSE, as.array=FALSE,
    fast.moving.side="right", with.labels=TRUE, simplify.as="matrix")

dinucleotideFrequency(x, as.prob=FALSE, as.matrix=FALSE,
                      fast.moving.side="right", with.labels=TRUE, ...)
trinucleotideFrequency(x, as.prob=FALSE, as.array=FALSE,
                       fast.moving.side="right", with.labels=TRUE, ...)

nucleotideFrequencyAt(x, at, as.prob=FALSE, as.array=TRUE,
                      fast.moving.side="right", with.labels=TRUE, ...)

## Some related functions:
oligonucleotideTransitions(x, left=1, right=1, as.prob=FALSE)
mkAllStrings(alphabet, width, fast.moving.side="right")

----------Arguments----------


Argument: x
Any DNA or RNA input for the *Frequency and oligonucleotideTransitions functions.  An XStringSet or XStringViews object of base type DNA or RNA for nucleotideFrequencyAt.

Argument: width
The number of nucleotides per oligonucleotide for oligonucleotideFrequency.  The number of letters per string for mkAllStrings.

Argument: at
An integer vector containing the positions to look at in each element of x.

Argument: as.prob
If TRUE, probabilities are reported; otherwise counts are reported (the default).

Argument: as.array, as.matrix
Controls the "shape" of the returned object. If TRUE (the default for nucleotideFrequencyAt), it is a numeric matrix (or array). Otherwise it is a "flat" numeric vector, i.e. a vector with no dim attribute (the default for the *Frequency functions).

Argument: fast.moving.side
Which side of the strings should move fastest? Note that when as.array is TRUE, the supplied value is ignored and the effective value is "left".
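
To make "fastest moving side" concrete, here is a small base-R sketch that enumerates all strings over an alphabet in both orders. The helper all_strings is hypothetical and only imitates the ordering idea behind mkAllStrings; it is not the Biostrings code.

```r
# Hypothetical helper (not Biostrings code): enumerate all strings of a
# given width, with either the right or the left letter cycling fastest.
all_strings <- function(alphabet, width, fast.moving.side = c("right", "left")) {
  side <- match.arg(fast.moving.side)
  out <- ""
  for (i in seq_len(width)) {
    out <- if (side == "right") {
      as.vector(t(outer(out, alphabet, paste0)))  # append: the new last letter cycles fastest
    } else {
      as.vector(outer(alphabet, out, paste0))     # prepend: the new first letter cycles fastest
    }
  }
  out
}

bases <- c("A", "C", "G", "T")
right <- all_strings(bases, 2, "right")
left  <- all_strings(bases, 2, "left")
right[1:5]  # "AA" "AC" "AG" "AT" "CA"
left[1:5]   # "AA" "CA" "GA" "TA" "AC"

# Reversing every string of the "right" ordering gives the "left" ordering.
rev_string <- function(s) {
  vapply(strsplit(s, "", fixed = TRUE),
         function(ch) paste(rev(ch), collapse = ""),
         character(1), USE.NAMES = FALSE)
}
identical(rev_string(right), left)
```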

Argument: with.labels
If TRUE, the returned object is named.

Argument: ...
Further arguments to be passed to or from other methods.

Argument: simplify.as
Together with the as.array and as.matrix arguments, controls the "shape" of the returned object when the input x is an XStringSet or XStringViews object. Supported simplify.as values are "matrix" (the default), "list" and "collapsed". If simplify.as is "matrix", the returned object is a matrix with length(x) rows where the i-th row contains the frequencies for x[[i]]. If simplify.as is "list", the returned object is a list of the same length as x where the i-th element contains the frequencies for x[[i]]. If simplify.as is "collapsed", the frequencies are computed for the entire object x as a whole (i.e. frequencies cumulated across all sequences in x).
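
The three shapes can be pictured with a base-R sketch that starts from per-sequence dinucleotide counts. The helper count_di and the toy sequences are hypothetical. The sketch only mimics the shapes described above and does not call Biostrings.

```r
# Hypothetical helper: dinucleotide counts for one plain string (base R only).
count_di <- function(seq) {
  bases <- c("A", "C", "G", "T")
  kmers <- as.vector(t(outer(bases, bases, paste0)))  # AA, AC, AG, AT, CA, ...
  starts <- seq_len(max(0, nchar(seq) - 1))
  windows <- substring(seq, starts, starts + 1)
  setNames(as.integer(table(factor(windows, levels = kmers))), kmers)
}

seqs <- c("ACGT", "AACC", "GTGT")
per_seq <- lapply(seqs, count_di)

as_list      <- per_seq                  # shape of simplify.as = "list": one element per sequence
as_matrix    <- do.call(rbind, per_seq)  # shape of simplify.as = "matrix": one row per sequence
as_collapsed <- colSums(as_matrix)       # shape of simplify.as = "collapsed": totals over all sequences

dim(as_matrix)  # 3 rows (sequences) by 16 columns (dinucleotides)
```

In this sketch the collapsed counts are sums of per-sequence counts, so no dinucleotide spans the boundary between two sequences.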
Argument: left, right
The number of nucleotides per oligonucleotide for the rows and columns, respectively, of the transition matrix created by oligonucleotideTransitions.
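
For the default case left=1 and right=1, a transition matrix can be sketched in base R by pairing each letter with the one that follows it. The helper transitions is hypothetical. Normalising each row to sum to 1 when as.prob is TRUE is an assumption made for this sketch, not something the text above specifies.

```r
# Hypothetical helper (not Biostrings code): single-nucleotide transition
# counts for a plain character string.
transitions <- function(seq, as.prob = FALSE, bases = c("A", "C", "G", "T")) {
  chars <- strsplit(seq, "", fixed = TRUE)[[1]]
  n <- length(chars)
  from <- factor(chars[-n], levels = bases)  # letter at position i (rows)
  to   <- factor(chars[-1], levels = bases)  # letter at position i + 1 (columns)
  counts <- table(from, to)
  # Assumption: probabilities are taken per row, i.e. conditional on `from`.
  if (as.prob) counts <- counts / rowSums(counts)
  counts
}

m <- transitions("ACGTACGA")
m["A", "C"]  # how often A is followed by C
p <- transitions("ACGTACGA", as.prob = TRUE)
p["G", ]     # distribution of the letter that follows G
```

A row for a letter that never appears in the "from" position would contain NaN after the division, which a more careful version would need to handle.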

Argument: alphabet
The alphabet to use to make the strings.

----------Value----------


If x is an XString or MaskedXString object, the *Frequency functions return a numeric vector of length 4^width. If as.array (or as.matrix) is TRUE, then this vector is formatted as an array (or matrix). If x is an XStringSet or XStringViews object, the returned object has the shape specified by the simplify.as argument.
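
The relationship between the flat vector and the array form can be sketched in base R. The counts below are placeholder values (1 to 64), not real frequencies, and the variable names are hypothetical. The sketch assumes a flat vector labelled with the rightmost letter varying fastest.

```r
bases <- c("A", "C", "G", "T")
width <- 3

# Placeholder flat result: one value per trinucleotide, labelled
# AAA, AAC, AAG, ... (rightmost letter varying fastest).
labels <- ""
for (i in seq_len(width)) labels <- as.vector(t(outer(labels, bases, paste0)))
flat <- setNames(seq_along(labels), labels)

# R fills arrays with the first index varying fastest, so the flat vector
# is reordered to "leftmost letter fastest" before it is reshaped.
left_labels <- ""
for (i in seq_len(width)) left_labels <- as.vector(outer(bases, left_labels, paste0))
arr <- array(flat[left_labels], dim = rep(4, width),
             dimnames = rep(list(bases), width))

arr["A", "A", "C"] == flat[["AAC"]]  # TRUE: the same cell, addressed two ways
dim(arr["T", , ])                    # 4 4: all trinucleotides starting with T
```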


----------Author(s)----------

H. Pagès and P. Aboyoun

----------See Also----------

alphabetFrequency, alphabet, hasLetterAt, XString-class, XStringSet-class, XStringViews-class, MaskedXString-class, GENETIC_CODE, AMINO_ACID_CODE, reverseComplement, rev

----------Examples----------

  ## ---------------------------------------------------------------------
  ## A. BASIC *Frequency() EXAMPLES
  ## ---------------------------------------------------------------------
  library(Biostrings)
  data(yeastSEQCHR1)  # load the yeast chromosome I sequence used below
  yeast1 <- DNAString(yeastSEQCHR1)

  oligonucleotideFrequency(yeast1, 4)

  ## Get the least and most represented 6-mers:
  f6 <- oligonucleotideFrequency(yeast1, 6)
  f6[f6 == min(f6)]
  f6[f6 == max(f6)]

  ## Get the result as an array:
  tri <- trinucleotideFrequency(yeast1, as.array=TRUE)
  tri["A", "A", "C"] # == trinucleotideFrequency(yeast1)["AAC"]
  tri["T", , ] # frequencies of trinucleotides starting with a "T"

  ## With input made of multiple sequences:
  library(drosophila2probe)  # probe package providing 'drosophila2probe'
  probes <- DNAStringSet(drosophila2probe)
  dfmat <- dinucleotideFrequency(probes)  # a big matrix
  dinucleotideFrequency(probes, simplify.as="collapsed")
  dinucleotideFrequency(probes, simplify.as="collapsed", as.matrix=TRUE)

  ## ---------------------------------------------------------------------
  ## B. OBSERVED DINUCLEOTIDE FREQUENCY VERSUS EXPECTED DINUCLEOTIDE
  ##    FREQUENCY
  ## ---------------------------------------------------------------------
  ## The expected frequency of dinucleotide "ab" based on the frequencies
  ## of its individual letters "a" and "b" is:
  ##    exp_Fab = Fa * Fb / N if the 2 letters are different (e.g. CG)
  ##    exp_Faa = Fa * (Fa-1) / N if the 2 letters are the same (e.g. TT)
  ## where Fa and Fb are the frequencies of "a" and "b" (respectively) and
  ## N the length of the sequence.
  ## Here is a simple function that implements the above formula for a
  ## DNAString object 'x'. The expected frequencies are returned in a 4x4
  ## matrix where the rownames and colnames correspond to the 1st and 2nd
  ## base in the dinucleotide:
  expectedDinucleotideFrequency <- function(x)
  {
      # Individual base frequencies.
      bf <- alphabetFrequency(x, baseOnly=TRUE)[DNA_BASES]
      # Outer product Fa * Fb, minus Fa on the diagonal, divided by N.
      (as.matrix(bf) %*% t(bf) - diag(bf)) / length(x)
  }

  ## On Celegans chrI:
  library(BSgenome.Celegans.UCSC.ce2)  # provides the 'Celegans' genome object
  chrI <- Celegans$chrI
  obs_df <- dinucleotideFrequency(chrI, as.matrix=TRUE)
  obs_df  # CG has the lowest frequency
  exp_df <- expectedDinucleotideFrequency(chrI)
  ## A sanity check:
  stopifnot(as.integer(sum(exp_df)) == sum(obs_df))

  ## Ratio of observed frequency to expected frequency:
  obs_df / exp_df  # TA has the lowest ratio, not CG!

  ## ---------------------------------------------------------------------
  ## C. nucleotideFrequencyAt()
  ## ---------------------------------------------------------------------
  nucleotideFrequencyAt(probes, 13)
  nucleotideFrequencyAt(probes, c(13, 20))
  nucleotideFrequencyAt(probes, c(13, 20), as.array=FALSE)

  ## nucleotideFrequencyAt() can be used to answer questions like: "how
  ## many probes in the drosophila2 chip have T, G, T, A at position
  ## 2, 4, 13 and 20, respectively?"
  nucleotideFrequencyAt(probes, c(2, 4, 13, 20))["T", "G", "T", "A"]
  ## or "what's the probability to have an A at position 25 if there is
  ## one at position 13?"
  nf <- nucleotideFrequencyAt(probes, c(13, 25))
  sum(nf["A", "A"]) / sum(nf["A", ])
  ## Probabilities to have other bases at position 25 if there is an A
  ## at position 13:
  sum(nf["A", "C"]) / sum(nf["A", ])  # C
  sum(nf["A", "G"]) / sum(nf["A", ])  # G
  sum(nf["A", "T"]) / sum(nf["A", ])  # T

  ## See ?hasLetterAt for another way to get those results.

  ## ---------------------------------------------------------------------
  ## D. oligonucleotideTransitions()
  ## ---------------------------------------------------------------------
  ## Get nucleotide transition matrices for yeast1
  oligonucleotideTransitions(yeast1, 2, as.prob=TRUE)

  ## ---------------------------------------------------------------------
  ## E. ADVANCED *Frequency() EXAMPLES
  ## ---------------------------------------------------------------------
  ## Note that when dropping the dimensions of the 'tri' array, elements
  ## in the resulting vector are ordered as if they were obtained with
  ## 'fast.moving.side="left"':
  triL <- trinucleotideFrequency(yeast1, fast.moving.side="left")
  all(as.vector(tri) == triL) # TRUE

  ## Convert the trinucleotide frequency into the amino acid frequency
  ## based on translation:
  tri1 <- trinucleotideFrequency(yeast1)
  names(tri1) <- GENETIC_CODE[names(tri1)]
  sapply(split(tri1, names(tri1)), sum) # 12512 occurrences of the stop codon

  ## When the returned vector is very long (e.g. width >= 10), using
  ## 'with.labels=FALSE' can improve performance significantly.
  ## Here for example, the observed speed up is between 25x and 500x:
  f12 <- oligonucleotideFrequency(yeast1, 12, with.labels=FALSE) # very fast!

  ## Some related functions:
  dict1 <- mkAllStrings(LETTERS[1:3], 4)
  dict2 <- mkAllStrings(LETTERS[1:3], 4, fast.moving.side="left")
  stopifnot(identical(reverse(dict1), dict2))



