XStringSet-class(Biostrings)
XStringSet-class() belongs to R package: Biostrings
XStringSet objects
Translator: 生物统计家园网 robot LoveR
----------Description----------
The BStringSet class is a container for storing a set of BString objects and for making their manipulation easy and efficient.
Similarly, the DNAStringSet (or RNAStringSet, or AAStringSet) class is a container for storing a set of DNAString (or RNAString, or AAString) objects.
All those containers derive directly (and with no additional slots) from the XStringSet virtual class.
----------Usage----------
## Constructors:
BStringSet(x=character(), start=NA, end=NA, width=NA, use.names=TRUE)
DNAStringSet(x=character(), start=NA, end=NA, width=NA, use.names=TRUE)
RNAStringSet(x=character(), start=NA, end=NA, width=NA, use.names=TRUE)
AAStringSet(x=character(), start=NA, end=NA, width=NA, use.names=TRUE)
## Accessor-like methods:
## S4 method for signature 'character'
width(x)
## S4 method for signature 'XStringSet'
nchar(x, type="chars", allowNA=FALSE)
## ... and more (see below)
----------Arguments----------
Argument: x
Either a character vector (with no NAs), or an XString, XStringSet or XStringViews object.
Argument: start, end, width
Either NA, a single integer, or an integer vector of the same length as x specifying how x should be "narrowed" (see ?narrow for the details).
Argument: use.names
TRUE or FALSE. Should names be preserved?
Argument: type, allowNA
Ignored.
----------Details----------
The BStringSet, DNAStringSet, RNAStringSet and AAStringSet functions are constructors that can be used to turn input x into an XStringSet object of the desired base type.
They also allow the user to "narrow" the sequences contained in x via proper use of the start, end and/or width arguments. In this context, "narrowing" means dropping a prefix and/or a suffix of each sequence in x. The "narrowing" capabilities of these constructors can be illustrated by the following property: if x is a character vector (with no NAs), or an XStringSet (or XStringViews) object, then the following 3 transformations are equivalent:
BStringSet(x, start=mystart, end=myend, width=mywidth)
subseq(BStringSet(x), start=mystart, end=myend, width=mywidth)
BStringSet(subseq(x, start=mystart, end=myend, width=mywidth))
Note that, besides being more convenient, the first form is also more efficient on character vectors.
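The meaning of start, end and width in the constructors can be sketched with a single toy string. This is only an illustration and assumes the Biostrings package is installed; the variable names (x, drop_prefix, and so on) are made up for the example.

```r
library(Biostrings)

x <- "ABCDEFGHIJ"

## A positive start drops a prefix: keep letters 3 to the end.
drop_prefix <- BStringSet(x, start = 3)

## A negative end counts from the right: -1 is the last letter,
## so end = -3 drops the last 2 letters.
drop_suffix <- BStringSet(x, end = -3)

## Supplying two of start/end/width determines the third:
## the last 4 letters of the string.
last_four <- BStringSet(x, end = -1, width = 4)

as.character(drop_prefix)
as.character(drop_suffix)
as.character(last_four)
```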
----------Accessor-like methods----------
In the code snippets below, x is an XStringSet object.
length(x): The number of sequences in x.
width(x): A vector of non-negative integers containing the number of letters for each element in x. Note that width(x) is also defined for a character vector with no NAs and is equivalent to nchar(x, type="bytes").
names(x): NULL or a character vector of the same length as x containing a short user-provided description or comment for each element in x. These are the only data in an XStringSet object that can safely be changed by the user. All the other data are immutable! As a general recommendation, the user should never try to modify an object by accessing its slots directly.
alphabet(x): Return NULL, DNA_ALPHABET, RNA_ALPHABET or AA_ALPHABET depending on whether x is a BStringSet, DNAStringSet, RNAStringSet or AAStringSet object.
nchar(x): The same as width(x).
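As a small sketch of the accessors listed above (assuming Biostrings is installed; the sequences and names are invented for illustration):

```r
library(Biostrings)

dna <- DNAStringSet(c(seqA = "ACGT", seqB = "GGATCC", seqC = "TT"))

length(dna)    # number of sequences
width(dna)     # number of letters in each sequence
nchar(dna)     # same information as width(dna)
names(dna)     # user-provided names, or NULL if there are none

## Names are the one piece of data that is meant to be changed in place.
names(dna)[2] <- "BamHI_site"

## For a DNAStringSet, alphabet() returns DNA_ALPHABET.
alphabet(dna)

## width() also works on a plain character vector.
width(c("ACGT", "GG"))
```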
----------Subsequence extraction and related transformations----------
In the code snippets below, x is a character vector (with no NAs), or an XStringSet (or XStringViews) object.
subseq(x, start=NA, end=NA, width=NA): Applies subseq on each element in x. See ?subseq for the details.
Note that this is similar to what substr does on a character vector. However, there are some noticeable differences:
(1) the arguments are start and stop for substr;
(2) the SEW interface (start/end/width) of subseq is richer (e.g. support for negative start or end values); and
(3) subseq checks that the specified start/end/width values are valid, i.e., unlike substr, it throws an error if they define "out of limits" subsequences or subsequences with a negative width.
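The difference in validity checking between substr and subseq can be sketched as follows. This is a minimal illustration assuming Biostrings is installed; the tryCatch wrapper and the variable names are only there to make the error visible without stopping the script.

```r
library(Biostrings)

s <- "ACGTACGT"

## substr() silently truncates a stop position that is past the end.
truncated <- substr(s, 6, 20)

## subseq() refuses an out-of-limits range and signals an error instead.
x <- BStringSet(s)
out_of_limits <- tryCatch(
    subseq(x, start = 6, end = 20),
    error = function(e) "error"
)

## Negative positions are accepted by subseq(): start = -3 means
## "start at the third letter from the right".
last_three <- subseq(x, start = -3)

truncated
out_of_limits
as.character(last_three)
```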
narrow(x, start=NA, end=NA, width=NA, use.names=TRUE): Same as subseq. The only differences are: (1) narrow has a use.names argument; and (2) narrow and subseq do not work on the same kinds of objects (IRanges, XStringSet or XStringViews objects for narrow, XVector or XStringSet objects for subseq). But they both work and do the same thing on an XStringSet object.
threebands(x, start=NA, end=NA, width=NA): Like the method for IRanges objects, the threebands methods for character vectors and XStringSet objects extend the capability of narrow by returning the 3 sets of subsequences (the left, middle and right subsequences) associated with the narrowing operation. See ?threebands in the IRanges package for the details.
subseq(x, start=NA, end=NA, width=NA) <- value: A vectorized version of the subseq<- method for XVector objects. See ?`subseq<-` for the details.
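A short sketch of narrow and of the replacement form of subseq on an XStringSet object, assuming Biostrings is installed. The toy sequences and the names n1, n2 and y are illustrative only.

```r
library(Biostrings)

x <- DNAStringSet(c(a = "ACGTACGT", b = "TTGACCAA"))

## narrow() drops the first and the last letter of every sequence.
n1 <- narrow(x, start = 2, end = -2)

## Same narrowing, but asking narrow() not to carry the names over.
n2 <- narrow(x, start = 2, end = -2, use.names = FALSE)

## Replacement form: overwrite the first 2 letters of every sequence.
## Work on a copy so that x itself is left untouched.
y <- x
subseq(y, start = 1, end = 2) <- "NN"

n1
n2
y
```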
----------Subsetting and appending----------
In the code snippets below, x and values are XStringSet objects, and i should be an index specifying the elements to extract.
x[i]: Return a new XStringSet object made of the selected elements.
x[[i]]: Extract the i-th XString object from x.
append(x, values, after=length(x)): Add sequences in values to x.
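The following sketch illustrates the difference between single-bracket and double-bracket subsetting, and the after argument of append. It assumes Biostrings is installed and uses made-up sequences.

```r
library(Biostrings)

x <- DNAStringSet(c(a = "ACGT", b = "GGATCC", c = "TT"))
values <- DNAStringSet(c(d = "AAAA"))

## Single brackets keep the container: the result is a DNAStringSet.
subset13 <- x[c(1, 3)]

## Double brackets extract one element as a DNAString.
second <- x[[2]]

## By default the new sequences go at the end ...
at_end <- append(x, values)

## ... but 'after' lets them be placed after a given position.
after_first <- append(x, values, after = 1)

names(at_end)
names(after_first)
```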
----------Ordering and related methods----------
In the code snippets below, x is an XStringSet object.
is.unsorted(x, strictly=FALSE): Return a logical value specifying whether x is unsorted. The strictly argument takes a logical value indicating whether the check should be for _strictly_ increasing values.
order(x): Return a permutation which rearranges x into ascending or descending order.
sort(x): Sort x into ascending order (equivalent to x[order(x)]).
rank(x): Rank x in ascending order.
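A minimal sketch of the ordering methods on a small DNAStringSet, assuming Biostrings is installed. The example avoids ties so that the ranks do not depend on how ties are broken.

```r
library(Biostrings)

x <- DNAStringSet(c("TTGA", "ACGT", "GGCC", "ACGA"))

is.unsorted(x)       # the input above is not in ascending order

o <- order(x)        # permutation that sorts x
sorted <- sort(x)    # equivalent to x[order(x)]
r <- rank(x)         # position of each element in the sorted set

o
sorted
r
```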
----------Duplicated and unique methods----------
In the code snippets below, x is an XStringSet object.
duplicated(x): Return a logical vector whose elements denote duplicates in x.
unique(x): Return an XStringSet containing the unique values in x.
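As a quick illustration (assuming Biostrings is installed; the sequences are invented), duplicated flags the second and later occurrences of a sequence, and unique keeps the first occurrence of each:

```r
library(Biostrings)

x <- DNAStringSet(c("ACGT", "TTGA", "ACGT", "GG", "TTGA"))

dup <- duplicated(x)   # TRUE for repeats of an earlier sequence
uniq <- unique(x)      # one copy of each distinct sequence

dup
uniq
```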
----------Set operations----------
In the code snippets below, x and y are XStringSet objects.
union(x, y, ...): Union of x and y.
intersect(x, y, ...): Intersection of x and y.
setdiff(x, y, ...): Asymmetric set difference of x and y.
setequal(x, y): Set equality of x to y.
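The set operations can be sketched on two small DNAStringSet objects. This assumes Biostrings is installed; the element order of the results is not relied upon here, only their contents.

```r
library(Biostrings)

x <- DNAStringSet(c("ACGT", "TTGA", "GG"))
y <- DNAStringSet(c("GG", "CCCC"))

u <- union(x, y)        # every distinct sequence found in x or y
i <- intersect(x, y)    # sequences found in both
d <- setdiff(x, y)      # sequences in x that are not in y

## Set equality ignores the order of the elements.
same <- setequal(x, x[3:1])

u
i
d
same
```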
----------Identical value matching----------
In the code snippets below, x is a character vector, XString, or XStringSet object and table is an XStringSet object.
x %in% table: Returns a logical vector indicating which elements in x match identically with an element in table.
match(x, table, nomatch = NA_integer_, incomparables = NULL): Returns an integer vector containing the first positions of an identical match in table for the elements in x.
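A small sketch of exact matching between two XStringSet objects, assuming Biostrings is installed. Both arguments are DNAStringSet objects here to keep the example simple; note that "ACGT" occurs twice in table and match reports only its first position.

```r
library(Biostrings)

x <- DNAStringSet(c("GG", "ACGT", "AAAA"))
table <- DNAStringSet(c("ACGT", "TTGA", "GG", "ACGT"))

found <- x %in% table     # is each element of x present in table?
pos <- match(x, table)    # first matching position in table, or NA

found
pos
```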
----------Other methods----------
In the code snippets below, x is an XStringSet object.
unlist(x): Turns x into an XString object by combining the sequences in x together. Fast equivalent to do.call(c, as.list(x)).
as.character(x, use.names): Convert x to a character vector of the same length as x. use.names controls whether or not names(x) should be used to set the names of the returned vector (default is TRUE).
as.matrix(x, use.names): Return a character matrix containing the "exploded" representation of the strings. This can only be used on an XStringSet object with equal-width strings. use.names controls whether or not names(x) should be used to set the row names of the returned matrix (default is TRUE).
toString(x): Equivalent to toString(as.character(x)).
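The conversions above can be sketched on two equal-width sequences (as.matrix requires equal widths). This assumes Biostrings is installed; the sequences and names are illustrative.

```r
library(Biostrings)

x <- DNAStringSet(c(a = "ACGT", b = "TTGA"))

## Concatenate all sequences into a single DNAString.
joined <- unlist(x)

## Character vector, with or without the names of x.
chr_named <- as.character(x)
chr_plain <- as.character(x, use.names = FALSE)

## One row per sequence, one column per letter.
m <- as.matrix(x)

## Comma-separated single string.
s <- toString(x)

joined
chr_named
m
s
```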
----------Author(s)----------
H. Pages
----------See Also----------
XString-class, XStringViews-class, XStringSetList-class, subseq, narrow, substr, compact, XVectorList-class
----------Examples----------
## ---------------------------------------------------------------------
## A. USING THE XStringSet CONSTRUCTORS ON A CHARACTER VECTOR OR FACTOR
## ---------------------------------------------------------------------
## Note that there is no XStringSet() constructor, but an XStringSet
## family of constructors: BStringSet(), DNAStringSet(), RNAStringSet(),
## etc...
x0 <- c("#CTC-NACCAGTAT", "#TTGA", "TACCTAGAG")
width(x0)
x1 <- BStringSet(x0)
x1
## 3 equivalent ways to obtain the same BStringSet object:
BStringSet(x0, start=4, end=-3)
subseq(x1, start=4, end=-3)
BStringSet(subseq(x0, start=4, end=-3))
dna0 <- DNAStringSet(x0, start=4, end=-3)
dna0
names(dna0)
names(dna0)[2] <- "seqB"
dna0
## When the input vector contains a lot of duplicates, turning it into
## a factor first before passing it to the constructor will produce an
## XStringSet object that is more compact in memory:
library(hgu95av2probe)
x2 <- sample(hgu95av2probe$sequence, 999000, replace=TRUE)
dna2a <- DNAStringSet(x2)
dna2b <- DNAStringSet(factor(x2))  # slower but result is more compact
object.size(dna2a)
object.size(dna2b)
## ---------------------------------------------------------------------
## B. USING THE XStringSet CONSTRUCTORS ON A SINGLE SEQUENCE (XString
##    OBJECT OR CHARACTER STRING)
## ---------------------------------------------------------------------
x3 <- "abcdefghij"
BStringSet(x3, start=2, end=6:2)  # behaves like 'substring(x3, 2, 6:2)'
BStringSet(x3, start=-(1:6))
x4 <- BString(x3)
BStringSet(x4, end=-(1:6), width=3)
## Randomly extract 1 million 40-mers from C. elegans chrI:
extractRandomReads <- function(subject, nread, readlength)
{
    if (!is.integer(readlength))
        readlength <- as.integer(readlength)
    start <- sample(length(subject) - readlength + 1L, nread,
                    replace=TRUE)
    DNAStringSet(subject, start=start, width=readlength)
}
library(BSgenome.Celegans.UCSC.ce2)
rndreads <- extractRandomReads(Celegans$chrI, 1000000, 40)
## Notes:
## - This takes only 2 or 3 seconds versus several hours for a solution
##   using substring() on a standard character string.
## - The short sequences in 'rndreads' can be seen as the result of a
##   simulated high-throughput sequencing experiment. A non-realistic
##   one though because:
##     (a) It assumes that the underlying technology is perfect (the
##         generated reads have no technology induced errors).
##     (b) It assumes that the sequenced genome is exactly the same as the
##         reference genome.
##     (c) The simulated reads can contain IUPAC ambiguity letters only
##         because the reference genome contains them. In a real
##         high-throughput sequencing experiment, the sequenced genome
##         of course doesn't contain those letters, but the sequencer
##         can introduce them in the generated reads to indicate ambiguous
##         base-calling.
##     (d) The simulated reads come from the plus strand only of a single
##         chromosome.
## - See the getSeq() function in the BSgenome package for how to
##   circumvent (d) i.e. how to generate reads that come from the whole
##   genome (plus and minus strands of all chromosomes).
## ---------------------------------------------------------------------
## C. USING THE XStringSet CONSTRUCTORS ON AN XStringSet OBJECT
## ---------------------------------------------------------------------
library(drosophila2probe)
probes <- DNAStringSet(drosophila2probe)
probes
RNAStringSet(probes, start=2, end=-5)  # does NOT copy the sequence data!
## ---------------------------------------------------------------------
## D. USING subseq() ON AN XStringSet OBJECT
## ---------------------------------------------------------------------
subseq(probes, start=2, end=-5)
subseq(probes, start=13, end=13) <- "N"
probes
## Add/remove a prefix:
subseq(probes, start=1, end=0) <- "--"
probes
subseq(probes, end=2) <- ""
probes
## Do more complicated things:
subseq(probes, start=4:7, end=7) <- c("YYYY", "YYY", "YY", "Y")
subseq(probes, start=4, end=6) <- subseq(probes, start=-2:-5)
probes
## ---------------------------------------------------------------------
## E. UNLISTING AN XStringSet OBJECT
## ---------------------------------------------------------------------
library(drosophila2probe)
probes <- DNAStringSet(drosophila2probe)
unlist(probes)
## ---------------------------------------------------------------------
## F. COMPACTING AN XStringSet OBJECT
## ---------------------------------------------------------------------
## As a particular type of XVectorList objects, XStringSet objects can
## be compacted if needed. Compacting is typically done before
## serialization. See ?compact for more information.
library(drosophila2probe)
probes <- DNAStringSet(drosophila2probe)
y <- subseq(probes[1:12], start=5)
probes@pool
y@pool
object.size(probes)
object.size(y)
y0 <- compact(y)
y0@pool
object.size(y0)
Source: 生物统计家园网 (http://www.biostatistic.net).