
R package Biostrings: nucleotideFrequency() help page

Posted on 2012-02-25 13:48:15
nucleotideFrequency(Biostrings)
nucleotideFrequency() belongs to R package: Biostrings

Calculate the frequency of oligonucleotides in a DNA or RNA sequence

Translator: biostatistic.net robot LoveR

----------Description----------

Given a DNA or RNA sequence (or a set of DNA or RNA sequences), the oligonucleotideFrequency function computes the frequency of all possible oligonucleotides of a given length (called the "width" in this particular context).
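The counting rule can be shown with a minimal base-R sketch for a single plain character string. No Biostrings objects are involved.

The sketch makes these assumptions:

- The sequence contains only A, C, G and T.
- Every overlapping window of length `width` is counted.
- All 4^width possible oligonucleotides are reported, including those with a zero count.
- The rightmost letter moves fastest, as with the default `fast.moving.side="right"`.

The helpers `all_oligos()` and `count_oligos()` are hypothetical names for this illustration. They are not Biostrings functions.

```r
# Hypothetical helper: all 4^width DNA oligos, rightmost letter changing fastest.
all_oligos <- function(width, alphabet = c("A", "C", "G", "T")) {
  grid <- expand.grid(rep(list(alphabet), width), stringsAsFactors = FALSE)
  # expand.grid varies its first column fastest, so read the columns in
  # reverse to make the LAST letter of each oligo the fast-moving one.
  unname(apply(grid[, rev(seq_len(width)), drop = FALSE], 1, paste, collapse = ""))
}

# Hypothetical helper: counts of every possible oligo in overlapping windows.
count_oligos <- function(seq, width) {
  n <- nchar(seq)
  windows <- if (n >= width) {
    starts <- seq_len(n - width + 1)            # one window per start position
    substring(seq, starts, starts + width - 1)
  } else character(0)
  # Tabulate against the full set of oligos so absent ones are reported as 0.
  counts <- table(factor(windows, levels = all_oligos(width)))
  setNames(as.integer(counts), names(counts))
}

f2 <- count_oligos("ACGTAC", 2)
```

Here `f2` has 16 entries. AC is counted twice, and CG, GT and TA are counted once each. The counts always add up to `nchar(seq) - width + 1`. The sketch only shows what is counted, not how Biostrings computes it.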

The dinucleotideFrequency and trinucleotideFrequency functions are convenient wrappers for calling oligonucleotideFrequency with width=2 and width=3, respectively.

The nucleotideFrequencyAt function computes the frequency of the short sequences formed by extracting the nucleotides found at some fixed positions from each sequence of a set of DNA or RNA sequences.
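What nucleotideFrequencyAt() tabulates can be sketched in base R for a plain character vector. The sketch takes the letter at each position in `at` from every sequence and cross-tabulates them, giving one array dimension per position.

The function name `freq_at()`, the `pos` dimension labels and the toy sequences are hypothetical. The sketch assumes every sequence is at least `max(at)` letters long and uses only A, C, G and T.

```r
# Hypothetical base-R sketch: letters found at positions `at`, cross-tabulated.
freq_at <- function(seqs, at, alphabet = c("A", "C", "G", "T")) {
  # One factor per position; fixed levels keep all 4 letters in every dimension.
  cols <- lapply(at, function(p) factor(substr(seqs, p, p), levels = alphabet))
  names(cols) <- paste0("pos", at)   # hypothetical dimension labels
  do.call(table, cols)
}

seqs <- c("ACGT", "AGGT", "TCGA")    # toy input
tab <- freq_at(seqs, c(1, 2))        # 4 x 4 array: [letter at 1, letter at 2]

# P(G at position 2 | A at position 1): the same kind of ratio computed with nf
# in the Examples section.
p_G_given_A <- tab["A", "G"] / sum(tab["A", ])
```

With these three sequences, two start with A, and one of those has G at position 2, so `p_G_given_A` is 0.5.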

In this man page we call "DNA input" (or "RNA input") an XString, XStringSet, XStringViews or MaskedXString object of base type DNA (or RNA).


----------Usage----------


oligonucleotideFrequency(x, width, as.prob=FALSE, as.array=FALSE,
                         fast.moving.side="right", with.labels=TRUE, ...)

## S4 method for signature 'XStringSet'
oligonucleotideFrequency(x,
    width, as.prob=FALSE, as.array=FALSE,
    fast.moving.side="right", with.labels=TRUE, simplify.as="matrix")

dinucleotideFrequency(x, as.prob=FALSE, as.matrix=FALSE,
                      fast.moving.side="right", with.labels=TRUE, ...)
trinucleotideFrequency(x, as.prob=FALSE, as.array=FALSE,
                       fast.moving.side="right", with.labels=TRUE, ...)

nucleotideFrequencyAt(x, at, as.prob=FALSE, as.array=TRUE,
                      fast.moving.side="right", with.labels=TRUE, ...)

## Some related functions:
oligonucleotideTransitions(x, left=1, right=1, as.prob=FALSE)
mkAllStrings(alphabet, width, fast.moving.side="right")



----------Arguments----------

Argument: x
Any DNA or RNA input for the *Frequency and oligonucleotideTransitions functions.  An XStringSet or XStringViews object of base type DNA or RNA for nucleotideFrequencyAt.  


Argument: width
The number of nucleotides per oligonucleotide for oligonucleotideFrequency.  The number of letters per string for mkAllStrings.  


Argument: at
An integer vector containing the positions to look at in each element of x.  


Argument: as.prob
If TRUE then probabilities are reported, otherwise counts (the default).  


Argument: as.array, as.matrix
Controls the "shape" of the returned object. If TRUE (the default for nucleotideFrequencyAt) then it's a numeric matrix (or array), otherwise it's just a "flat" numeric vector i.e. a vector with no dim attribute (the default for the *Frequency functions).  


Argument: fast.moving.side
Which side of the strings should move fastest? Note that when as.array is TRUE, the supplied value is ignored and the effective value is "left".
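Plain base R can check both orderings. It also shows why as.array=TRUE implies "left": R stores arrays in column-major order, so flattening a matrix indexed [first letter, second letter] varies the first index fastest.

The sketch uses width 2. The 4x4 matrix `m` is hypothetical and stands in for a dinucleotide count matrix indexed like `tri["A", "A", "C"]` in the Examples section. The sketch does not call Biostrings.

```r
alphabet <- c("A", "C", "G", "T")

# fast.moving.side = "right": the LAST letter changes fastest (AA, AC, AG, AT, CA, ...)
right_order <- as.vector(outer(alphabet, alphabet,
                               function(last, first) paste0(first, last)))

# fast.moving.side = "left": the FIRST letter changes fastest (AA, CA, GA, TA, AC, ...)
left_order <- as.vector(outer(alphabet, alphabet, paste0))

# Hypothetical 4x4 matrix indexed [first letter, second letter].
m <- matrix(seq_len(16), nrow = 4, dimnames = list(alphabet, alphabet))

# Labels of the cells of m in the order as.vector(m) visits them
# (column-major: row index, i.e. the first letter, moves fastest).
flat_labels <- as.vector(outer(rownames(m), colnames(m), paste0))
```

`flat_labels` matches `left_order`, not `right_order`. This is the same effect the Examples section shows with `as.vector(tri)` versus `triL`.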


Argument: with.labels
If TRUE then the returned object is named.  


Argument: ...
Further arguments to be passed to or from other methods.  


Argument: simplify.as
Together with the as.array and as.matrix arguments, controls the "shape" of the returned object when the input x is an XStringSet or XStringViews object. Supported simplify.as values are "matrix" (the default), "list" and "collapsed". If simplify.as is "matrix", the returned object is a matrix with length(x) rows where the i-th row contains the frequencies for x[[i]]. If simplify.as is "list", the returned object is a list of length length(x) where the i-th element contains the frequencies for x[[i]]. If simplify.as is "collapsed", the frequencies are computed for the entire object x as a whole (i.e. frequencies cumulated across all sequences in x).
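The three simplify.as shapes can be mimicked in base R, using a named character vector in place of an XStringSet. To keep the sketch short it counts single letters (width 1) rather than oligonucleotides. The toy sequences and the helper `count1()` are hypothetical.

```r
seqs <- c(s1 = "ACGT", s2 = "AAGT", s3 = "TTTT")   # toy stand-in for an XStringSet

# Hypothetical helper: counts of each letter in one sequence.
count1 <- function(s, alphabet = c("A", "C", "G", "T")) {
  counts <- table(factor(strsplit(s, "")[[1]], levels = alphabet))
  setNames(as.integer(counts), alphabet)
}

as_list   <- lapply(seqs, count1)      # like "list": element i holds counts for x[[i]]
as_matrix <- do.call(rbind, as_list)   # like "matrix": row i holds counts for x[[i]]
collapsed <- colSums(as_matrix)        # like "collapsed": counts pooled over all of x
```

Here `as_matrix` has 3 rows and 4 columns. `collapsed` counts 3 A's and 6 T's across all three sequences.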


Argument: left, right
The number of nucleotides per oligonucleotide for the rows and columns respectively in the transition matrix created by oligonucleotideTransitions.  
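A transition matrix can be sketched in base R. The sketch counts overlapping (left + right)-mers and splits each one into a row label (its first `left` letters) and a column label (its last `right` letters). This is an illustration, not the Biostrings implementation.

The sketch makes these assumptions:

- `transition_counts()` is a hypothetical helper.
- The input is a plain string of A, C, G and T at least left + right letters long.
- Row and column labels are ordered with the rightmost letter moving fastest.

```r
# Hypothetical sketch: rows = first `left` letters, columns = next `right` letters.
transition_counts <- function(seq, left = 1, right = 1,
                              alphabet = c("A", "C", "G", "T")) {
  width <- left + right
  starts <- seq_len(nchar(seq) - width + 1)     # assumes nchar(seq) >= width
  kmers <- substring(seq, starts, starts + width - 1)
  # All possible labels of length k, rightmost letter moving fastest.
  all_labels <- function(k) {
    grid <- expand.grid(rep(list(alphabet), k), stringsAsFactors = FALSE)
    unname(apply(grid[, rev(seq_len(k)), drop = FALSE], 1, paste, collapse = ""))
  }
  rows <- factor(substr(kmers, 1, left), levels = all_labels(left))
  cols <- factor(substr(kmers, left + 1, width), levels = all_labels(right))
  unclass(table(rows, cols))                    # plain integer count matrix
}

tm  <- transition_counts("ACGTAC")              # 4 x 4 (left = 1, right = 1)
tm2 <- transition_counts("ACGTAC", left = 2)    # 16 x 4
# Row-normalized counts: estimated P(next letters | preceding letters).
# Rows for prefixes that never occur would become NaN.
tm_prob <- tm / rowSums(tm)
```

This sketch assumes that as.prob=TRUE normalizes row by row, as `tm_prob` does. That assumption is worth confirming against the Biostrings source.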


Argument: alphabet
The alphabet to use to make the strings.  


----------Value----------

If x is an XString or MaskedXString object, the *Frequency functions return a numeric vector of length 4^width. If as.array (or as.matrix) is TRUE, then this vector is formatted as an array (or matrix). If x is an XStringSet or XStringViews object, the returned object has the shape specified by the simplify.as argument.


----------Author(s)----------


H. Pages and P. Aboyoun



----------See Also----------

alphabetFrequency, alphabet, hasLetterAt, XString-class, XStringSet-class, XStringViews-class, MaskedXString-class, GENETIC_CODE, AMINO_ACID_CODE, reverseComplement, rev


----------Examples----------


  ## ---------------------------------------------------------------------
  ## A. BASIC *Frequency() EXAMPLES
  ## ---------------------------------------------------------------------
  library(Biostrings)
  data(yeastSEQCHR1)
  yeast1 <- DNAString(yeastSEQCHR1)

  dinucleotideFrequency(yeast1)
  trinucleotideFrequency(yeast1)
  oligonucleotideFrequency(yeast1, 4)

  ## Get the least and most represented 6-mers:
  f6 <- oligonucleotideFrequency(yeast1, 6)
  f6[f6 == min(f6)]
  f6[f6 == max(f6)]

  ## Get the result as an array:
  tri <- trinucleotideFrequency(yeast1, as.array=TRUE)
  tri["A", "A", "C"] # == trinucleotideFrequency(yeast1)["AAC"]
  tri["T", , ] # frequencies of trinucleotides starting with a "T"

  ## With input made of multiple sequences:
  library(drosophila2probe)
  probes <- DNAStringSet(drosophila2probe)
  dfmat <- dinucleotideFrequency(probes)  # a big matrix
  dinucleotideFrequency(probes, simplify.as="collapsed")
  dinucleotideFrequency(probes, simplify.as="collapsed", as.matrix=TRUE)

  ## ---------------------------------------------------------------------
  ## B. OBSERVED DINUCLEOTIDE FREQUENCY VERSUS EXPECTED DINUCLEOTIDE
  ##    FREQUENCY
  ## ---------------------------------------------------------------------
  ## The expected frequency of dinucleotide "ab" based on the frequencies
  ## of its individual letters "a" and "b" is:
  ##    exp_Fab = Fa * Fb / N if the 2 letters are different (e.g. CG)
  ##    exp_Faa = Fa * (Fa-1) / N if the 2 letters are the same (e.g. TT)
  ## where Fa and Fb are the frequencies of "a" and "b" (respectively) and
  ## N the length of the sequence.

  ## Here is a simple function that implements the above formula for a
  ## DNAString object 'x'. The expected frequencies are returned in a 4x4
  ## matrix where the rownames and colnames correspond to the 1st and 2nd
  ## base in the dinucleotide:
  expectedDinucleotideFrequency <- function(x)
  {
      # Individual base frequencies.
      bf <- alphabetFrequency(x, baseOnly=TRUE)[DNA_BASES]
      (as.matrix(bf) %*% t(bf) - diag(bf)) / length(x)
  }

  ## On Celegans chrI:
  library(BSgenome.Celegans.UCSC.ce2)
  chrI <- Celegans$chrI
  obs_df <- dinucleotideFrequency(chrI, as.matrix=TRUE)
  obs_df  # CG has the lowest frequency
  exp_df <- expectedDinucleotideFrequency(chrI)
  ## A sanity check:
  stopifnot(as.integer(sum(exp_df)) == sum(obs_df))

  ## Ratio of observed frequency to expected frequency:
  obs_df / exp_df  # TA has the lowest ratio, not CG!

  ## ---------------------------------------------------------------------
  ## C. nucleotideFrequencyAt()
  ## ---------------------------------------------------------------------
  nucleotideFrequencyAt(probes, 13)
  nucleotideFrequencyAt(probes, c(13, 20))
  nucleotideFrequencyAt(probes, c(13, 20), as.array=FALSE)

  ## nucleotideFrequencyAt() can be used to answer questions like: "how
  ## many probes in the drosophila2 chip have T, G, T, A at position
  ## 2, 4, 13 and 20, respectively?"
  nucleotideFrequencyAt(probes, c(2, 4, 13, 20))["T", "G", "T", "A"]
  ## or "what's the probability of having an A at position 25 if there is
  ## one at position 13?"
  nf <- nucleotideFrequencyAt(probes, c(13, 25))
  sum(nf["A", "A"]) / sum(nf["A", ])
  ## Probabilities of having other bases at position 25 if there is an A
  ## at position 13:
  sum(nf["A", "C"]) / sum(nf["A", ])  # C
  sum(nf["A", "G"]) / sum(nf["A", ])  # G
  sum(nf["A", "T"]) / sum(nf["A", ])  # T

  ## See ?hasLetterAt for another way to get those results.

  ## ---------------------------------------------------------------------
  ## D. oligonucleotideTransitions()
  ## ---------------------------------------------------------------------
  ## Get nucleotide transition matrices for yeast1
  oligonucleotideTransitions(yeast1)
  oligonucleotideTransitions(yeast1, 2, as.prob=TRUE)

  ## ---------------------------------------------------------------------
  ## E. ADVANCED *Frequency() EXAMPLES
  ## ---------------------------------------------------------------------
  ## Note that when dropping the dimensions of the 'tri' array, elements
  ## in the resulting vector are ordered as if they were obtained with
  ## 'fast.moving.side="left"':
  triL <- trinucleotideFrequency(yeast1, fast.moving.side="left")
  all(as.vector(tri) == triL) # TRUE

  ## Convert the trinucleotide frequency into the amino acid frequency
  ## based on translation:
  tri1 <- trinucleotideFrequency(yeast1)
  names(tri1) <- GENETIC_CODE[names(tri1)]
  sapply(split(tri1, names(tri1)), sum) # 12512 occurrences of the stop codon

  ## When the returned vector is very long (e.g. width >= 10), using
  ## 'with.labels=FALSE' can improve performance significantly.
  ## Here for example, the observed speed up is between 25x and 500x:
  f12 <- oligonucleotideFrequency(yeast1, 12, with.labels=FALSE) # very fast!

  ## Some related functions:
  dict1 <- mkAllStrings(LETTERS[1:3], 4)
  dict2 <- mkAllStrings(LETTERS[1:3], 4, fast.moving.side="left")
  stopifnot(identical(reverse(dict1), dict2))
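The sanity check in section B works because, for a sequence made only of A, C, G and T, the expected counts add up to the number of overlapping dinucleotides:

- Summing Fa * Fb over all pairs gives N^2.
- The diagonal correction subtracts the sum of all Fa, which is N.
- Dividing by N leaves N - 1.

The base-R sketch below checks this on a hypothetical toy string, without Biostrings or the BSgenome package. It uses the same matrix expression as expectedDinucleotideFrequency() in section B.

```r
x <- "ACGCGTTAACGT"                  # hypothetical toy sequence, only A/C/G/T
bases <- c("A", "C", "G", "T")
N <- nchar(x)

# Individual base counts Fa, named by base.
bf <- as.vector(table(factor(strsplit(x, "")[[1]], levels = bases)))
names(bf) <- bases

# Expected counts: Fa * Fb / N off the diagonal, Fa * (Fa - 1) / N on it.
exp_df <- (outer(bf, bf) - diag(bf)) / N

# Observed overlapping dinucleotide counts: rows = 1st base, columns = 2nd base.
starts <- seq_len(N - 1)
obs_df <- unclass(table(factor(substring(x, starts, starts), levels = bases),
                        factor(substring(x, starts + 1, starts + 1), levels = bases)))
```

In this toy string every base occurs 3 times and CG occurs 3 times. The expected CG count is 3 * 3 / 12 = 0.75. Both matrices sum to N - 1 = 11.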

Please credit the source when reposting: biostatistic.net (http://www.biostatistic.net).


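Notes:
Note 1: For learning convenience, this document was translated by the biostatistic.net robot LoveR. It is intended only as a personal reference for learning R; biostatistic.net retains the copyright.
Note 2: Because the translation is automatic, some inaccuracies are unavoidable. Compare the Chinese and English text carefully when using it; reading them side by side can help in learning R.
Note 3: If you find an inaccuracy, please reply to this post and we will gradually revise it.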