R language Biostrings package: PDict-class() help documentation

Posted on 2012-2-25 13:48:45
PDict-class(Biostrings)
PDict-class() belongs to R package: Biostrings

PDict objects

Translator: 生物统计家园网, robot LoveR

----------Description----------

The PDict class is a container for storing a preprocessed dictionary of DNA patterns that can later be passed to the matchPDict function for fast matching against a reference sequence (the subject).

PDict is the constructor function for creating new PDict objects.
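
The workflow described above (preprocess once, then match against a subject) can be sketched as follows. This is only a minimal illustration, assuming the Biostrings package is installed; the three 4-mers and the short subject are made-up toy data, not part of the original help page.

```r
library(Biostrings)

# Toy constant-width dictionary (made-up data for illustration).
dict <- DNAStringSet(c("ACGT", "GTAC", "TTTT"))

# Preprocess the dictionary once.
pdict <- PDict(dict)

# Toy reference sequence (the subject).
subject <- DNAString("ACGTACGTTTTT")

# Find all exact matches of every pattern in the subject.
hits <- matchPDict(pdict, subject)

# Number of matches per pattern, in dictionary order.
counts <- countPDict(pdict, subject)
print(counts)

# Match ranges of the first pattern.
print(hits[[1]])
```

The preprocessing cost is paid once in PDict(), so the same pdict object can be reused against many subjects.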


----------Usage----------


PDict(x, max.mismatch=NA, tb.start=NA, tb.end=NA, tb.width=NA,
         algorithm="ACtree2", skip.invalid.patterns=FALSE)



----------Arguments----------

Argument: x
A character vector, a DNAStringSet object or an XStringViews object with a DNAString subject.


Argument: max.mismatch
A single non-negative integer or NA. See the "Allowing a small number of mismatching letters" section below.


Argument: tb.start, tb.end, tb.width
A single integer or NA. See the "Trusted Band" section below.


Argument: algorithm
"ACtree2" (the default) or "Twobit".


Argument: skip.invalid.patterns
This argument is not supported yet (and might in fact be replaced by the filter argument very soon).


----------Details----------

THIS IS STILL WORK IN PROGRESS!

If the original dictionary x is a character vector or an XStringViews object with a DNAString subject, then the PDict constructor will first try to turn it into a DNAStringSet object.

By default (i.e. if PDict is called with max.mismatch=NA, tb.start=NA, tb.end=NA and tb.width=NA) the following limitations apply: (1) the original dictionary can only contain base letters (i.e. only As, Cs, Gs and Ts), therefore IUPAC ambiguity codes are not allowed; (2) all the patterns in the dictionary must have the same length ("constant width" dictionary); and (3) later matchPDict can only be used with max.mismatch=0.
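
Limitations (1) and (2) above can be checked on a plain character vector before calling the constructor. The sketch below uses base R only; check_default_dict is a hypothetical helper written for this illustration and is not part of Biostrings.

```r
# Hypothetical helper (not a Biostrings function): reports whether a
# character-vector dictionary satisfies the default constraints, i.e.
# base letters only and a constant pattern width.
check_default_dict <- function(dict) {
  stopifnot(is.character(dict), length(dict) >= 1L)
  only_base_letters <- all(grepl("^[ACGT]+$", dict))
  constant_width <- length(unique(nchar(dict))) == 1L
  c(only_base_letters = only_base_letters, constant_width = constant_width)
}

# A dictionary that respects both constraints.
ok <- check_default_dict(c("ACGT", "GGCA", "TTAC"))
print(ok)

# A dictionary with an ambiguity code (N) and variable widths.
bad <- check_default_dict(c("ACNG", "GT", "CGT", "AC"))
print(bad)
```

A dictionary that fails either check would need a Trusted Band to be preprocessed.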

A Trusted Band can be used in order to relax these limitations (see the "Trusted Band" section below).

If you are planning to use the resulting PDict object in order to do inexact matching where valid hits are allowed to have a small number of mismatching letters, then see the "Allowing a small number of mismatching letters" section below.

Two preprocessing algorithms are currently supported: algorithm="ACtree2" (the default) and algorithm="Twobit". With the "ACtree2" algorithm, all the oligonucleotides in the Trusted Band are stored in a 4-ary Aho-Corasick tree. With the "Twobit" algorithm, the 2-bit-per-letter signatures of all the oligonucleotides in the Trusted Band are computed and the mapping from these signatures to the 1-based position of the corresponding oligonucleotide in the Trusted Band is stored in a way that allows very fast lookup.

Only PDict objects preprocessed with the "ACtree2" algorithm can then be used with matchPDict (and family) and with fixed="pattern" (instead of fixed=TRUE, the default), so that IUPAC ambiguity codes in the subject are treated as ambiguities. PDict objects obtained with the "Twobit" algorithm don't allow this. See ?`matchPDict-inexact` for more information about support of IUPAC ambiguity codes in the subject.
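
To make the idea of a 2-bit-per-letter signature concrete, here is a rough base R sketch. It is not the Biostrings implementation: the encoding A=0, C=1, G=2, T=3, the helper names twobit_signature and find_oligo, and the toy band are all assumptions made for illustration.

```r
# Hypothetical sketch: pack an oligonucleotide into one number using
# 2 bits per letter (assumed encoding: A=0, C=1, G=2, T=3).
twobit_signature <- function(oligo) {
  codes <- c(A = 0, C = 1, G = 2, T = 3)
  bases <- strsplit(oligo, "")[[1]]
  stopifnot(all(bases %in% names(codes)))
  sig <- 0
  for (b in bases) sig <- sig * 4 + codes[[b]]
  sig
}

# Toy Trusted Band of constant width 3 (made-up data).
band <- c("ACG", "GTA", "TTT")
sigs <- vapply(band, twobit_signature, numeric(1))

# Lookup table with one slot per possible 3-mer; a slot holds the 1-based
# position of the oligonucleotide in the band, or NA if it is absent.
lookup <- rep(NA_integer_, 4^3)
lookup[sigs + 1] <- seq_along(band)

# Constant-time lookup of a subject window.
find_oligo <- function(window) lookup[twobit_signature(window) + 1]

print(find_oligo("GTA"))
print(find_oligo("CCC"))
```

A direct-indexed table like this grows as 4^width, which is one reason such a scheme only suits short Trusted Bands.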


----------Trusted Band----------

What's a Trusted Band?

A Trusted Band is a region defined in the original dictionary where the limitations described above will apply.

Why use a Trusted Band?

Because the limitations described above will apply to the Trusted Band only! For example the Trusted Band cannot contain IUPAC ambiguity codes but the "head" and the "tail" can (see below for what those are). Also with a Trusted Band, if matchPDict is called with a nonzero max.mismatch value then mismatching letters will be allowed in the head and the tail. Or, if matchPDict is called with fixed="subject", then IUPAC ambiguity codes in the head and the tail will be treated as ambiguities.
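
As a rough illustration of mismatches being tolerated outside the Trusted Band, the sketch below builds a PDict whose Trusted Band is the first 3 letters of each pattern and compares exact and inexact counts. It assumes Biostrings is installed; the patterns and the subject are made-up toy data.

```r
library(Biostrings)

# Toy dictionary (made-up data): two 6-mers.
dict <- c("ACGTAC", "GGATTT")

# Trusted Band = letters 1 to 3 of each pattern; letters 4 to 6 form the tail.
pdict <- PDict(dict, tb.end = 3)

# Toy subject: contains "ACGTAG" (one tail mismatch with the first pattern)
# and an exact copy of the second pattern.
subject <- DNAString("TTACGTAGTTGGATTT")

# Exact matching over the full patterns.
exact <- countPDict(pdict, subject)

# Up to 1 mismatching letter allowed (in the tail here, since there is no head).
relaxed <- countPDict(pdict, subject, max.mismatch = 1)

print(exact)
print(relaxed)
```

Because the Trusted Band itself must still match exactly, a longer band means fewer candidate positions to verify, at the cost of less tolerance near the start of each pattern.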

How to specify a Trusted Band?

Use the tb.start, tb.end and tb.width arguments of the PDict constructor in order to specify a Trusted Band. This will divide each pattern in the original dictionary into three parts: a left part, a middle part and a right part. The middle part is defined by its starting and ending nucleotide positions given relative to each pattern through the tb.start, tb.end and tb.width arguments. It must have the same length for all patterns (this common length is called the width of the Trusted Band). The left and right parts are defined implicitly: they are the parts that remain before (prefix) and after (suffix) the middle part, respectively. Therefore three DNAStringSet objects result from this division: the first one is made of all the left parts and forms the head of the PDict object, the second one is made of all the middle parts and forms the Trusted Band of the PDict object, and the third one is made of all the right parts and forms the tail of the PDict object.

In other words you can think of the process of specifying a Trusted Band as drawing 2 vertical lines on the original dictionary (note that these 2 lines are not necessarily straight lines but the horizontal space between them must be constant). When doing this, you are dividing the dictionary into three regions (from left to right): the head, the Trusted Band and the tail. Each of them is a DNAStringSet object with the same number of elements as the original dictionary and the original dictionary could easily be reconstructed from those three regions.
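
The three-way division described above can be mimicked on plain character vectors. The sketch below uses base R only; split_dict is a hypothetical helper, and it covers only the simple case where the Trusted Band starts at the same left-anchored position in every pattern.

```r
# Hypothetical helper (base R only, not a Biostrings function): split each
# pattern into head, Trusted Band and tail, given a left-anchored start
# position and a constant band width.
split_dict <- function(dict, tb.start = 1L, tb.width) {
  tb.end <- tb.start + tb.width - 1L
  stopifnot(tb.width >= 1L, all(nchar(dict) >= tb.end))
  list(
    head = substring(dict, 1L, tb.start - 1L),
    tb   = substring(dict, tb.start, tb.end),
    tail = substring(dict, tb.end + 1L, nchar(dict))
  )
}

dict <- c("ACNG", "GT", "CGT", "AC")
parts <- split_dict(dict, tb.start = 1L, tb.width = 2L)

print(parts$head)  # empty strings: no head
print(parts$tb)    # constant-width middle parts
print(parts$tail)  # variable-width remainders, may hold ambiguity codes

# The original dictionary can be rebuilt from the three regions.
rebuilt <- paste0(parts$head, parts$tb, parts$tail)
print(identical(rebuilt, dict))
```

Note how the ambiguity code N and the variable pattern lengths both end up in the tail, leaving a Trusted Band that satisfies the default limitations.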

The width of the Trusted Band must be >= 1 because Trusted Bands of width 0 are not supported.

Finally note that calling PDict with tb.start=NA, tb.end=NA and tb.width=NA (the default) is equivalent to calling it with tb.start=1, tb.end=-1 and tb.width=NA, which results in a full-width Trusted Band i.e. a Trusted Band that covers the entire dictionary (no head and no tail).


----------Allowing a small number of mismatching letters----------

[TODO]


----------Accessor methods----------

In the code snippets below, x is a PDict object.

length(x): The number of patterns in x.

width(x): A vector of non-negative integers containing the number of letters for each pattern in x.

names(x): The names of the patterns in x.

head(x): The head of x or NULL if x has no head.

tb(x): The Trusted Band defined on x.

tb.width(x): The width of the Trusted Band defined on x. Note that, unlike width(tb(x)), this is a single integer. And because the Trusted Band has a constant width, tb.width(x) is in fact equivalent to unique(width(tb(x))), or to width(tb(x))[1].

tail(x): The tail of x or NULL if x has no tail.
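
The accessors listed above can be tried on a small PDict object that has a head, a Trusted Band and a tail. This is a sketch only, assuming Biostrings is installed; the two 5-mers are made-up toy data.

```r
library(Biostrings)

# Toy dictionary (made-up data): letter 1 is the head, letters 2 to 4 form
# the Trusted Band, letter 5 is the tail.
x <- PDict(c("NACGT", "GGTAA"), tb.start = 2, tb.width = 3)

print(length(x))    # number of patterns
print(width(x))     # number of letters in each pattern
print(head(x))      # left parts
print(tb(x))        # Trusted Band
print(tail(x))      # right parts

# tb.width(x) is a single integer, unlike width(tb(x)).
w <- tb.width(x)
print(w)
print(width(tb(x)))
```

The ambiguity code N is accepted here only because it falls in the head rather than in the Trusted Band.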


----------Subsetting methods----------

In the code snippets below, x is a PDict object.

x[[i]]: Extract the i-th pattern from x as a DNAString object.


----------Other methods----------

In the code snippets below, x is a PDict object.

duplicated(x): [TODO]

patternFrequency(x): [TODO]


----------Author(s)----------


H. Pagès



----------References----------

Aho, Alfred V.; Margaret J. Corasick (June 1975). "Efficient string matching: An aid to bibliographic search". Communications of the ACM 18 (6): 333-340.

----------See Also----------

matchPDict, DNA_ALPHABET, IUPAC_CODE_MAP, DNAStringSet-class, XStringViews-class


----------Examples----------


  ## ---------------------------------------------------------------------
  ## A. NO HEAD AND NO TAIL (THE DEFAULT)
  ## ---------------------------------------------------------------------
  library(Biostrings)
  library(drosophila2probe)
  dict0 <- DNAStringSet(drosophila2probe)
  dict0                                # The original dictionary.
  length(dict0)                        # Hundreds of thousands of patterns.
  unique(nchar(dict0))                 # Patterns are 25-mers.

  pdict0 <- PDict(dict0)               # Store the original dictionary in
                                       # a PDict object (preprocessing).
  pdict0
  class(pdict0)
  length(pdict0)                       # Same as length(dict0).
  tb.width(pdict0)                     # The width of the (implicit)
                                       # Trusted Band.
  sum(duplicated(pdict0))
  table(patternFrequency(pdict0))      # 9 patterns are repeated 3 times.
  pdict0[[1]]
  pdict0[[5]]

  ## ---------------------------------------------------------------------
  ## B. NO HEAD AND A TAIL
  ## ---------------------------------------------------------------------
  dict1 <- c("ACNG", "GT", "CGT", "AC")
  pdict1 <- PDict(dict1, tb.end=2)
  pdict1
  class(pdict1)
  length(pdict1)
  width(pdict1)
  head(pdict1)
  tb(pdict1)
  tb.width(pdict1)
  width(tb(pdict1))
  tail(pdict1)
  pdict1[[3]]

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

