nucleotideFrequency(Biostrings)
nucleotideFrequency() is part of the R package Biostrings
Calculate the frequency of oligonucleotides in a DNA or RNA sequence
Translator: LoveR, the biostatistic.net translation robot
----------Description----------
Given a DNA or RNA sequence (or a set of DNA or RNA sequences), the oligonucleotideFrequency function computes the frequency of all possible oligonucleotides of a given length (called the "width" in this particular context).
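The counting rule can be illustrated with a small base-R sketch for a single sequence held as a character string. This is not the Biostrings implementation. The helpers `all_kmers()` and `count_kmers()` are hypothetical, and this sketch simply skips any window that contains a letter other than A, C, G or T.

```r
## Hypothetical base-R sketch of oligonucleotide counting (not Biostrings code).
all_kmers <- function(width, alphabet = c("A", "C", "G", "T")) {
  ## expand.grid() varies its first column fastest; reversing the columns
  ## makes the rightmost letter move fastest (AA, AC, AG, AT, CA, ...).
  grid <- expand.grid(rep(list(alphabet), width), stringsAsFactors = FALSE)
  grid <- grid[, rev(seq_len(width)), drop = FALSE]
  do.call(paste0, grid)
}

count_kmers <- function(s, width) {
  kmers <- all_kmers(width)
  n <- nchar(s) - width + 1L
  ## All overlapping windows of length 'width', one per start position.
  words <- if (n > 0) substring(s, seq_len(n), seq_len(n) + width - 1L) else character(0)
  ## Windows not in 'kmers' (e.g. containing N) become NA and are not counted.
  counts <- table(factor(words, levels = kmers))
  setNames(as.vector(counts), kmers)
}

f2 <- count_kmers("ACGTAC", 2)
f2[f2 > 0]  # AC 2, CG 1, GT 1, TA 1
```

The result always has 4^width entries, including the zero counts. This matches the "frequency of all possible oligonucleotides" wording above.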
The dinucleotideFrequency and trinucleotideFrequency functions are convenient wrappers for calling oligonucleotideFrequency with width=2 and width=3, respectively.
The nucleotideFrequencyAt function computes the frequency of the short sequences formed by extracting the nucleotides found at some fixed positions from each sequence of a set of DNA or RNA sequences.
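The same idea can be sketched in base R, assuming the sequences all have the same length and are stored in a plain character vector. The letters found at each requested position are cross-tabulated with table(). This gives an array with one dimension per position, in the order given by `at`. The helper `freq_at()` and the toy sequences are hypothetical, and this is not the Biostrings code.

```r
## Hypothetical base-R sketch of nucleotideFrequencyAt() (not Biostrings code).
freq_at <- function(x, at, bases = c("A", "C", "G", "T")) {
  ## One factor per requested position, holding the letter found there in each sequence.
  cols <- lapply(at, function(p) factor(substr(x, p, p), levels = bases))
  counts <- do.call(table, cols)
  names(dimnames(counts)) <- NULL
  unclass(counts)  # plain integer array with dimnames
}

seqs <- c("ACGT", "AAGT", "ACCT", "TCGA")
nf <- freq_at(seqs, c(1, 2))
nf
## Conditional probabilities of each base at position 2 given an A at position 1.
## The Examples compute the same ratio one base at a time with
## sum(nf["A", "C"]) / sum(nf["A", ]).
prop.table(nf, 1)["A", ]  # A 1/3, C 2/3, G 0, T 0
```

Each row of the 2-D result conditions on the base at the first position. That is why prop.table(nf, 1) gives all the conditional probabilities in one call.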
In this man page we call "DNA input" (or "RNA input") an XString, XStringSet, XStringViews or MaskedXString object of base type DNA (or RNA).
----------Usage----------
oligonucleotideFrequency(x, width, as.prob=FALSE, as.array=FALSE,
fast.moving.side="right", with.labels=TRUE, ...)
## S4 method for signature 'XStringSet'
oligonucleotideFrequency(x,
width, as.prob=FALSE, as.array=FALSE,
fast.moving.side="right", with.labels=TRUE, simplify.as="matrix")
dinucleotideFrequency(x, as.prob=FALSE, as.matrix=FALSE,
fast.moving.side="right", with.labels=TRUE, ...)
trinucleotideFrequency(x, as.prob=FALSE, as.array=FALSE,
fast.moving.side="right", with.labels=TRUE, ...)
nucleotideFrequencyAt(x, at, as.prob=FALSE, as.array=TRUE,
fast.moving.side="right", with.labels=TRUE, ...)
## Some related functions:
oligonucleotideTransitions(x, left=1, right=1, as.prob=FALSE)
mkAllStrings(alphabet, width, fast.moving.side="right")
----------Arguments----------
x
Any DNA or RNA input for the *Frequency and oligonucleotideTransitions functions. An XStringSet or XStringViews object of base type DNA or RNA for nucleotideFrequencyAt.
width
The number of nucleotides per oligonucleotide for oligonucleotideFrequency. The number of letters per string for mkAllStrings.
at
An integer vector containing the positions to look at in each element of x.
as.prob
If TRUE then probabilities are reported, otherwise counts (the default).
as.array, as.matrix
Controls the "shape" of the returned object. If TRUE (the default for nucleotideFrequencyAt), the result is a numeric matrix (or array); otherwise it is a "flat" numeric vector, i.e. a vector with no dim attribute (the default for the *Frequency functions).
fast.moving.side
Which side of the strings should move fastest? Note that when as.array is TRUE, the supplied value is ignored and the effective value is "left".
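Both orderings can be reproduced in base R with expand.grid(), which varies its first column fastest. The sketch below uses a hypothetical helper, `all_strings()`, that loosely mirrors mkAllStrings() and shows both orderings. It is an illustration only, not the Biostrings code.

```r
## Hypothetical base-R sketch of the two orderings (compare mkAllStrings()).
all_strings <- function(alphabet, width, fast.moving.side = c("right", "left")) {
  fast.moving.side <- match.arg(fast.moving.side)
  grid <- expand.grid(rep(list(alphabet), width), stringsAsFactors = FALSE)
  ## expand.grid() varies its first column (the leftmost letter) fastest;
  ## reversing the columns makes the rightmost letter vary fastest instead.
  if (fast.moving.side == "right")
    grid <- grid[, rev(seq_len(width)), drop = FALSE]
  do.call(paste0, grid)
}

all_strings(c("A", "C", "G", "T"), 2)                             # "AA" "AC" "AG" "AT" "CA" ...
all_strings(c("A", "C", "G", "T"), 2, fast.moving.side = "left")  # "AA" "CA" "GA" "TA" "AC" ...
```

With fast.moving.side="left", element i lines up with cell i of an array of dimensions rep(length(alphabet), width), because R stores arrays in column-major order. This is consistent with the note that as.array=TRUE forces the effective value "left".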
with.labels
If TRUE then the returned object is named.
...
Further arguments to be passed to or from other methods.
simplify.as
Together with the as.array and as.matrix arguments, controls the "shape" of the returned object when the input x is an XStringSet or XStringViews object. Supported simplify.as values are "matrix" (the default), "list" and "collapsed". If simplify.as is "matrix", the returned object is a matrix with length(x) rows where the i-th row contains the frequencies for x[[i]]. If simplify.as is "list", the returned object is a list of the same length as x where the i-th element contains the frequencies for x[[i]]. If simplify.as is "collapsed", the frequencies are computed for the entire object x as a whole (i.e. frequencies accumulated across all sequences in x).
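The relationship between the three shapes can be sketched in base R for dinucleotides. "list" holds one count vector per sequence, "matrix" stacks those vectors as rows, and "collapsed" sums the rows. The counter `count_dinucs()` and the toy sequences are hypothetical, and this is not the Biostrings implementation.

```r
## Hypothetical base-R sketch of the three simplify.as shapes (not Biostrings code).
bases <- c("A", "C", "G", "T")
dinucs <- c(t(outer(bases, bases, paste0)))  # "AA" "AC" "AG" "AT" "CA" ...

count_dinucs <- function(s) {
  n <- nchar(s) - 1L
  words <- if (n > 0) substring(s, seq_len(n), seq_len(n) + 1L) else character(0)
  setNames(as.vector(table(factor(words, levels = dinucs))), dinucs)
}

seqs <- c(s1 = "ACGTAC", s2 = "TTTT")
as_list   <- lapply(seqs, count_dinucs)  # like simplify.as="list"
as_matrix <- do.call(rbind, as_list)     # like simplify.as="matrix": one row per sequence
collapsed <- colSums(as_matrix)          # like simplify.as="collapsed": counts summed over sequences
```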
left, right
The number of nucleotides per oligonucleotide for the rows and columns respectively in the transition matrix created by oligonucleotideTransitions.
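One way to read the transition matrix is as follows. Entry (u, v) counts the positions where an oligonucleotide u of length left is immediately followed by an oligonucleotide v of length right. That is the count of the (left+right)-mer paste0(u, v). The base-R sketch below builds the matrix this way. Two points are assumptions that this page does not confirm: that the matrix is simply the rearranged (left+right)-mer counts, and that as.prob=TRUE divides each row by its row sum. The helpers `kmers_right()` and `transitions()` are hypothetical.

```r
## Hypothetical base-R sketch of a transition matrix (not Biostrings code).
kmers_right <- function(width, bases = c("A", "C", "G", "T")) {
  grid <- expand.grid(rep(list(bases), width), stringsAsFactors = FALSE)
  do.call(paste0, grid[, rev(seq_len(width)), drop = FALSE])  # rightmost letter fastest
}

transitions <- function(s, left = 1, right = 1, as.prob = FALSE) {
  w <- left + right
  n <- nchar(s) - w + 1L
  words <- if (n > 0) substring(s, seq_len(n), seq_len(n) + w - 1L) else character(0)
  counts <- as.vector(table(factor(words, levels = kmers_right(w))))
  ## With the rightmost letter moving fastest, consecutive blocks of 4^right
  ## counts share the same left part, so filling the matrix by row works.
  ans <- matrix(counts, nrow = 4^left, ncol = 4^right, byrow = TRUE,
                dimnames = list(kmers_right(left), kmers_right(right)))
  if (as.prob) ans <- ans / rowSums(ans)  # rows with no occurrences become NaN
  ans
}

transitions("ACGTAC")
transitions("ACGTAC", as.prob = TRUE)["A", ]
```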
alphabet
The alphabet to use to make the strings.
----------Value----------
If x is an XString or MaskedXString object, the *Frequency functions return a numeric vector of length 4^width. If as.array (or as.matrix) is TRUE, then this vector is formatted as an array (or matrix). If x is an XStringSet or XStringViews object, the returned object has the shape specified by the simplify.as argument.
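The flat vector and the array form hold the same numbers. An array with dim c(4, 4, ...) is stored in R's column-major order, so its first index (the leftmost letter) varies fastest. That is why the array form goes with fast.moving.side="left". The base-R sketch below uses the placeholder counts 1:16 in place of real frequencies.

```r
## Illustration only: placeholder dinucleotide counts 1:16 in "left" order
## (leftmost letter fastest: AA, CA, GA, TA, AC, ...).
bases <- c("A", "C", "G", "T")
labels_left <- c(outer(bases, bases, paste0))
v <- setNames(1:16, labels_left)

arr <- array(v, dim = c(4, 4), dimnames = list(bases, bases))
arr["A", "C"] == v[["AC"]]             # TRUE: row = 1st base, column = 2nd base
identical(as.vector(arr), unname(v))   # TRUE: dropping the dims restores the vector

## The flat vector grows as 4^width, e.g. 4^12 = 16777216 entries for width=12.
4^(1:12)
```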
----------Author(s)----------
H. Pages and P. Aboyoun
----------See Also----------
alphabetFrequency, alphabet, hasLetterAt, XString-class, XStringSet-class, XStringViews-class, MaskedXString-class, GENETIC_CODE, AMINO_ACID_CODE, reverseComplement, rev
----------Examples----------
## ---------------------------------------------------------------------
## A. BASIC *Frequency() EXAMPLES
## ---------------------------------------------------------------------
library(Biostrings)
data(yeastSEQCHR1)
yeast1 <- DNAString(yeastSEQCHR1)
dinucleotideFrequency(yeast1)
trinucleotideFrequency(yeast1)
oligonucleotideFrequency(yeast1, 4)
## Get the least and most represented 6-mers:
f6 <- oligonucleotideFrequency(yeast1, 6)
f6[f6 == min(f6)]
f6[f6 == max(f6)]
## Get the result as an array:
tri <- trinucleotideFrequency(yeast1, as.array=TRUE)
tri["A", "A", "C"] # == trinucleotideFrequency(yeast1)["AAC"]
tri["T", , ] # frequencies of trinucleotides starting with a "T"
## With input made of multiple sequences:
library(drosophila2probe)
probes <- DNAStringSet(drosophila2probe)
dfmat <- dinucleotideFrequency(probes) # a big matrix
dinucleotideFrequency(probes, simplify.as="collapsed")
dinucleotideFrequency(probes, simplify.as="collapsed", as.matrix=TRUE)
## ---------------------------------------------------------------------
## B. OBSERVED DINUCLEOTIDE FREQUENCY VERSUS EXPECTED DINUCLEOTIDE
## FREQUENCY
## ---------------------------------------------------------------------
## The expected frequency of dinucleotide "ab" based on the frequencies
## of its individual letters "a" and "b" is:
## exp_Fab = Fa * Fb / N if the 2 letters are different (e.g. CG)
## exp_Faa = Fa * (Fa-1) / N if the 2 letters are the same (e.g. TT)
## where Fa and Fb are the frequencies of "a" and "b" (respectively) and
## N the length of the sequence.
## Here is a simple function that implements the above formula for a
## DNAString object 'x'. The expected frequencies are returned in a 4x4
## matrix where the rownames and colnames correspond to the 1st and 2nd
## base in the dinucleotide:
expectedDinucleotideFrequency <- function(x)
{
# Individual base frequencies.
bf <- alphabetFrequency(x, baseOnly=TRUE)[DNA_BASES]
(as.matrix(bf) %*% t(bf) - diag(bf)) / length(x)
}
## On Celegans chrI:
library(BSgenome.Celegans.UCSC.ce2)
chrI <- Celegans$chrI
obs_df <- dinucleotideFrequency(chrI, as.matrix=TRUE)
obs_df # CG has the lowest frequency
exp_df <- expectedDinucleotideFrequency(chrI)
## A sanity check:
stopifnot(as.integer(sum(exp_df)) == sum(obs_df))
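## Why the sanity check holds (a toy base-R illustration; the 'toy' objects
## below are hypothetical): summing the expected frequencies over all 16
## pairs gives (N^2 - N) / N = N - 1 when every letter is A, C, G or T,
## which is also the number of overlapping dinucleotides in a sequence of
## length N.
toy <- strsplit("ACGTTGCAAC", "")[[1]]
toy_bf <- as.vector(table(factor(toy, levels = c("A", "C", "G", "T"))))
toy_exp <- (outer(toy_bf, toy_bf) - diag(toy_bf)) / length(toy)
stopifnot(isTRUE(all.equal(sum(toy_exp), length(toy) - 1)))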
## Ratio of observed frequency to expected frequency:
obs_df / exp_df # TA has the lowest ratio, not CG!
## ---------------------------------------------------------------------
## C. nucleotideFrequencyAt()
## ---------------------------------------------------------------------
nucleotideFrequencyAt(probes, 13)
nucleotideFrequencyAt(probes, c(13, 20))
nucleotideFrequencyAt(probes, c(13, 20), as.array=FALSE)
## nucleotideFrequencyAt() can be used to answer questions like: "how
## many probes in the drosophila2 chip have T, G, T, A at positions
## 2, 4, 13 and 20, respectively?"
nucleotideFrequencyAt(probes, c(2, 4, 13, 20))["T", "G", "T", "A"]
## or "what's the probability of having an A at position 25 if there is
## one at position 13?"
nf <- nucleotideFrequencyAt(probes, c(13, 25))
sum(nf["A", "A"]) / sum(nf["A", ])
## Probabilities of having other bases at position 25 if there is an A
## at position 13:
sum(nf["A", "C"]) / sum(nf["A", ]) # C
sum(nf["A", "G"]) / sum(nf["A", ]) # G
sum(nf["A", "T"]) / sum(nf["A", ]) # T
## See ?hasLetterAt for another way to get those results.
## ---------------------------------------------------------------------
## D. oligonucleotideTransitions()
## ---------------------------------------------------------------------
## Get nucleotide transition matrices for yeast1
oligonucleotideTransitions(yeast1)
oligonucleotideTransitions(yeast1, 2, as.prob=TRUE)
## ---------------------------------------------------------------------
## E. ADVANCED *Frequency() EXAMPLES
## ---------------------------------------------------------------------
## Note that when dropping the dimensions of the 'tri' array, elements
## in the resulting vector are ordered as if they were obtained with
## 'fast.moving.side="left"':
triL <- trinucleotideFrequency(yeast1, fast.moving.side="left")
all(as.vector(tri) == triL) # TRUE
## Convert the trinucleotide frequency into the amino acid frequency
## based on translation:
tri1 <- trinucleotideFrequency(yeast1)
names(tri1) <- GENETIC_CODE[names(tri1)]
sapply(split(tri1, names(tri1)), sum) # 12512 occurrences of stop codons ("*")
## When the returned vector is very long (e.g. width >= 10), using
## 'with.labels=FALSE' can improve performance significantly.
## Here for example, the observed speed-up is between 25x and 500x:
f12 <- oligonucleotideFrequency(yeast1, 12, with.labels=FALSE) # very fast!
## Some related functions:
dict1 <- mkAllStrings(LETTERS[1:3], 4)
dict2 <- mkAllStrings(LETTERS[1:3], 4, fast.moving.side="left")
stopifnot(identical(reverse(dict1), dict2))
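Please credit the source when reposting: biostatistic.net (http://www.biostatistic.net).
Note: This document was translated by LoveR, the biostatistic.net translation robot, for personal reference in learning R only; biostatistic.net retains the copyright.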