R Biostrings package: help documentation for XStringSet-io()

Posted on 2012-2-25 13:51:31
XStringSet-io(Biostrings)
XStringSet-io() belongs to the R package: Biostrings

Read/write an XStringSet object from/to a file

Translator: 生物统计家园网 robot LoveR

----------Description----------

Functions to read an XStringSet object from a file or write one to a file.


----------Usage----------


## Read FASTA (or FASTQ) files in an XStringSet object:
read.BStringSet(filepath, format="fasta",
                nrec=-1L, skip=0L, use.names=TRUE)
read.DNAStringSet(filepath, format="fasta",
                  nrec=-1L, skip=0L, use.names=TRUE)
read.RNAStringSet(filepath, format="fasta",
                  nrec=-1L, skip=0L, use.names=TRUE)
read.AAStringSet(filepath, format="fasta",
                 nrec=-1L, skip=0L, use.names=TRUE)

## Extract basic information about FASTA (or FASTQ) files
## without loading them:
fasta.info(filepath, nrec=-1L, skip=0L, use.names=TRUE)
fastq.geometry(filepath, nrec=-1L, skip=0L)

## Write an XStringSet object to a FASTA (or FASTQ) file:
write.XStringSet(x, filepath, append=FALSE, format="fasta", ...)

## Serialize an XStringSet object:
save.XStringSet(x, objname, dirpath=".", save.dups=FALSE, verbose=TRUE)



----------Arguments----------

Argument: filepath
A character vector containing the path(s) to the file(s) to read or write. It can be of arbitrary length when reading but must be of length 1 when writing. Special values like "" or "|cmd" (typically supported by other I/O functions in R) are not supported here, and filepath cannot be a connection.


Argument: format
Either "fasta" (the default) or "fastq".


Argument: nrec
Single integer. The maximum number of records to read in. Negative values are ignored, so the default of -1L places no limit on the number of records.


Argument: skip
Single non-negative integer. The number of records of the data file(s) to skip before beginning to read in records.


Argument: use.names
Should the returned vector be named? For FASTA, the names are taken from the record description lines. For FASTQ, they are taken from the record sequence ids. Dropping the names can help reduce the memory footprint, e.g. for a FASTQ file containing millions of reads.


Argument: x
For write.XStringSet, the object to write to file. For save.XStringSet, the object to serialize.


Argument: append
TRUE or FALSE. If TRUE, output will be appended to the file; otherwise, it will overwrite the contents of the file. See ?cat for the details.


Argument: ...
Further format-specific arguments. If format="fasta", the width argument (single integer) can be used to specify the maximum number of letters per line of sequence. If format="fastq", the qualities argument (BStringSet object) can be used to specify the qualities. If the qualities are omitted, the fake quality ';' is assigned to each letter in x and written to the file.


Argument: objname
The name of the serialized object.


Argument: dirpath
The path to the directory in which to save the serialized object.


Argument: save.dups
TRUE or FALSE. If TRUE, the Dups object describing how duplicated elements in x are related to each other is saved too. For advanced users only.


Argument: verbose
TRUE or FALSE.
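
To make the format-specific arguments more concrete, here is a minimal base-R sketch of what a width-limited FASTA record and a FASTQ record with fake qualities look like as lines of text. The helper names (wrap_sequence, fasta_record, fastq_record) and the default width of 80 are assumptions made for illustration; they are not part of Biostrings and do not show how write.XStringSet is implemented.

```r
# Base-R sketch only; hypothetical helpers, not Biostrings functions.

# Break a sequence into lines of at most `width` letters.
wrap_sequence <- function(s, width = 80L) {
  n <- nchar(s)
  if (n == 0L) return(character(0))
  starts <- seq.int(1L, n, by = width)
  substring(s, starts, pmin(starts + width - 1L, n))
}

# Lines of one FASTA record: a ">" description line, then the wrapped sequence.
fasta_record <- function(name, s, width = 80L) {
  c(paste0(">", name), wrap_sequence(s, width))
}

# Lines of one 4-line FASTQ record. When no qualities are supplied, every
# letter gets the fake quality ';', mirroring the behaviour described above.
fastq_record <- function(id, s, qualities = NULL) {
  if (is.null(qualities)) {
    qualities <- strrep(";", nchar(s))
  }
  stopifnot(nchar(qualities) == nchar(s))
  c(paste0("@", id), s, "+", qualities)
}

fasta_record("seq1", "ACGTACGTAC", width = 4L)
fastq_record("ID000001", "ACGTACGTAC")
```

The qualities string must have exactly one character per letter of the sequence, which is why the sketch checks the two lengths before building the record.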


----------Details----------

Only FASTA and FASTQ files are supported for now. When reading, the qualities stored in the FASTQ records are ignored.

The reading functions read.BStringSet, read.DNAStringSet, read.RNAStringSet and read.AAStringSet load sequences from an input file (or set of input files) into an XStringSet object. When multiple input files are specified, they are read in the order given and their data are stored in the returned object in that order. Note that when multiple input FASTQ files are specified, all must have the same "width" (i.e. all their sequences must have the same length).

The fasta.info utility returns an integer vector with one element per FASTA record in the input files. Each element is the length of the sequence found in the corresponding record.
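
As a rough illustration of the result described above, the following base-R sketch computes one sequence length per FASTA record from a character vector of lines. The function name fasta_lengths is hypothetical, the handling of skip and nrec follows the argument descriptions on this page, and none of this reflects the actual Biostrings implementation (which works on files, not on in-memory lines).

```r
# Base-R sketch only; `fasta_lengths` is a hypothetical helper.
# `lines` is a character vector holding the lines of a FASTA file.
fasta_lengths <- function(lines, nrec = -1L, skip = 0L, use.names = TRUE) {
  lines <- lines[nzchar(lines)]            # drop empty lines
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)              # record index of every line
  keep <- !is_header & record > 0L
  # Sum the letters of all sequence lines belonging to each record.
  lens <- tapply(nchar(lines[keep]),
                 factor(record[keep], levels = seq_len(sum(is_header))),
                 sum)
  lens <- as.integer(lens)
  lens[is.na(lens)] <- 0L                  # records without sequence lines
  if (use.names) names(lens) <- sub("^>", "", lines[is_header])
  # Skip the first `skip` records, then keep at most `nrec` of the rest.
  if (skip > 0L) lens <- lens[-seq_len(min(skip, length(lens)))]
  if (nrec >= 0L) lens <- head(lens, nrec)
  lens
}

example_lines <- c(">seq1 first record", "ACGTAC", "GT",
                   ">seq2", "GGG",
                   ">seq3", "TTTTT")
fasta_lengths(example_lines)
fasta_lengths(example_lines, nrec = 1L, skip = 1L)
```

The second call skips the first record and then reads a single record, so it reports only the length of seq2.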

The fastq.geometry utility returns an integer vector of length 2 describing the "geometry" of the FASTQ files: the first element is the total number of FASTQ records in the files, and the second element is the common "width" of these files. This width is NA if the files contain no FASTQ records or contain records with different widths.
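
The "geometry" can be pictured with a small base-R sketch. It assumes the simple layout of exactly four lines per FASTQ record (id, sequence, "+" line, qualities) held in a character vector; the function name fastq_geometry_sketch is hypothetical and the code is not the Biostrings implementation.

```r
# Base-R sketch only; assumes exactly 4 lines per FASTQ record.
fastq_geometry_sketch <- function(lines) {
  stopifnot(length(lines) %% 4L == 0L)
  n <- length(lines) %/% 4L                # total number of records
  if (n == 0L) return(c(0L, NA_integer_))  # no records: width is NA
  # The sequence is the 2nd line of every record.
  widths <- nchar(lines[seq.int(2L, length(lines), by = 4L)])
  common <- if (all(widths == widths[1L])) widths[1L] else NA_integer_
  c(n, common)
}

same_width <- c("@r1", "ACGT", "+", "IIII",
                "@r2", "TTGA", "+", "IIII")
mixed_width <- c("@r1", "ACGT", "+", "IIII",
                 "@r2", "TT", "+", "II")
fastq_geometry_sketch(same_width)   # 2 records, common width 4
fastq_geometry_sketch(mixed_width)  # 2 records, width NA
```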

write.XStringSet writes an XStringSet object to a file. WARNING: Using write.XStringSet on a BStringSet object that contains the '\n' (LF) or '\r' (CR) characters, or the FASTA markup characters '>' or ';', is almost guaranteed to produce a broken FASTA file!
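
One way to guard against this is to check the strings for the four problematic characters before writing. The sketch below works on a plain character vector; the helper name has_fasta_breaking_chars is hypothetical and is not provided by Biostrings. For a BStringSet, one would presumably convert it with as.character() first, which is an assumption about the workflow rather than something this page states.

```r
# Base-R sketch only; hypothetical helper, not a Biostrings function.
# Flags strings containing LF, CR, '>' or ';', the characters named
# in the warning above.
has_fasta_breaking_chars <- function(x) {
  grepl("[\n\r>;]", x)
}

candidates <- c("ACGT", "AC>GT", "AC\nGT", "A;C", "AC\rGT")
has_fasta_breaking_chars(candidates)
```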

Serializing an XStringSet object with save.XStringSet is equivalent to using the standard save mechanism, except that it first tries to reduce the size of x in memory before calling save. Most of the time this leads to a much smaller size on disk.
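
For readers unfamiliar with the standard save mechanism mentioned here, the following base-R sketch shows a save/load round trip under an object name and directory, using a plain named character vector in place of an XStringSet. The ".rda" file name is an assumption made for this sketch, and the size-reduction step that save.XStringSet performs is not reproduced.

```r
# Base-R sketch only; a character vector stands in for an XStringSet.
my_seqs <- c(ID000001 = "ACGTACGTAC", ID000002 = "TTGACCATGA")

objname <- "my_seqs"
dirpath <- tempdir()
# Assumed file naming for this sketch: <dirpath>/<objname>.rda
filepath <- file.path(dirpath, paste0(objname, ".rda"))

# Save the object under its name, looking it up in the current environment.
save(list = objname, file = filepath, envir = environment())

# Load it back into a fresh environment to avoid overwriting anything.
restored <- new.env()
load(filepath, envir = restored)
identical(restored[[objname]], my_seqs)
```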


----------See Also----------

readFASTA, writeFASTA, XStringSet-class, BString-class, DNAString-class, RNAString-class, AAString-class


----------Examples----------


  library(Biostrings)

  ## ---------------------------------------------------------------------
  ## A. READ/WRITE FASTA FILES
  ## ---------------------------------------------------------------------
  filepath <- system.file("extdata", "someORF.fa", package="Biostrings")
  fasta.info(filepath)
  x <- read.DNAStringSet(filepath)
  x
  out1 <- tempfile()
  write.XStringSet(x, out1)

  ## ---------------------------------------------------------------------
  ## B. READ/WRITE FASTQ FILES
  ## ---------------------------------------------------------------------
  filepath <- system.file("extdata", "s_1_sequence.txt",
                          package="Biostrings")
  fastq.geometry(filepath)
  read.DNAStringSet(filepath, format="fastq")

  library(BSgenome.Celegans.UCSC.ce2)
  ## Create a "sliding window" on chr I:
  sw_start <- seq.int(1, length(Celegans$chrI)-50, by=50)
  sw <- Views(Celegans$chrI, start=sw_start, width=10)
  my_fake_shortreads <- as(sw, "XStringSet")
  my_fake_ids <- sprintf("ID%06d",  seq_len(length(my_fake_shortreads)))
  names(my_fake_shortreads) <- my_fake_ids
  my_fake_shortreads

  ## Fake quality ';' will be assigned to each base in 'my_fake_shortreads':
  out2 <- tempfile()
  write.XStringSet(my_fake_shortreads, out2, format="fastq")

  ## Passing qualities through the 'qualities' argument:
  my_fake_quals <- rep.int(BStringSet("DCBA@?>=<;"),
                           length(my_fake_shortreads))
  my_fake_quals
  out3 <- tempfile()
  write.XStringSet(my_fake_shortreads, out3, format="fastq",
                   qualities=my_fake_quals)

  ## ---------------------------------------------------------------------
  ## C. SERIALIZATION
  ## ---------------------------------------------------------------------
  save.XStringSet(my_fake_shortreads, "my_fake_shortreads", dirpath=tempdir())

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).


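Notes:
Note 1: To make learning easier, this document was translated by the 生物统计家园网 robot LoveR. It is intended only as a personal reference for learning R; 生物统计家园 retains the copyright.
Note 2: Because the translation was produced automatically, inaccuracies are inevitable; check the translation against the original English text when in doubt.
Note 3: If you find an inaccuracy, please reply to this post and we will gradually revise it.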