R seqinr package: read.fasta() function help documentation

loveR · posted 2012-9-30 01:23:01

read.fasta(seqinr)
read.fasta() belongs to the R package: seqinr

                                       Read FASTA formatted files

                                       Translator: LoveR, the 生物统计家园网 (Biostatistics Home) robot

----------Description----------

Read nucleic or amino-acid sequences from a file in FASTA format.

----------Usage----------

read.fasta(file = system.file("sequences/ct.fasta", package = "seqinr"),
  seqtype = c("DNA", "AA"), as.string = FALSE, forceDNAtolower = TRUE,
  set.attributes = TRUE, legacy.mode = TRUE, seqonly = FALSE, strip.desc = FALSE,
  bfa = FALSE, sizeof.longlong = .Machine$sizeof.longlong,
  endian = .Platform$endian, apply.mask = TRUE)

----------Arguments----------

Argument: file
The name of the file from which the sequences in FASTA format are to be read. If it does not contain an absolute or relative path, the file name is relative to the current working directory, getwd(). The default here is to read the ct.fasta file, which is present in the sequences folder of the seqinR package.

Argument: seqtype
the nature of the sequence: DNA or AA, defaulting to DNA

Argument: as.string
if TRUE, sequences are returned as a string instead of a vector of single characters

Argument: forceDNAtolower
whether sequences with seqtype == "DNA" should be returned as lower case letters

Argument: set.attributes
whether sequence attributes should be set

Argument: legacy.mode
if TRUE, lines starting with a semicolon ';' are ignored

Argument: seqonly
if TRUE, only the sequences are returned, without any attempt to modify them or to get their names and annotations (execution time is divided approximately by a factor of 3)

Argument: strip.desc
if TRUE, the '>' at the beginning of the description lines is removed in the annotations of the sequences

Argument: bfa
logical. If TRUE, the fasta file is in MAQ binary format (see Details). Only for DNA sequences.

Argument: sizeof.longlong
the number of bytes in a C long long type. Only relevant for bfa = TRUE. See .Machine.

Argument: endian
character string, "big" or "little", giving the endianness of the processor in use. Only relevant for bfa = TRUE. See .Platform.

Argument: apply.mask
logical, defaulting to TRUE. Only relevant for bfa = TRUE. When this flag is TRUE, the mask in the MAQ binary format is used to replace non-acgt characters in the sequence by the n character. For pure acgt sequences (without gaps or ambiguous bases), turning this to FALSE will save time.
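
As a rough illustration of two arguments that the examples further down do not exercise, strip.desc and seqonly, the sketch below writes a tiny FASTA file to a temporary path and reads it back. The file contents and the record names demo1 and demo2 are made up for illustration, and the seqinr package is assumed to be installed.

```r
library(seqinr)

# Hypothetical two-record FASTA file, written to a temporary path.
tmp <- tempfile(fileext = ".fasta")
writeLines(c("; made-up legacy comment line",
             ">demo1 first made-up record",
             "ACGTAC",
             "GTAA",
             ">demo2 second made-up record",
             "TTGACA"), tmp)

# strip.desc = TRUE drops the leading '>' from the stored annotation.
with_gt <- read.fasta(file = tmp)
without_gt <- read.fasta(file = tmp, strip.desc = TRUE)
attr(with_gt[["demo1"]], "Annot")
attr(without_gt[["demo1"]], "Annot")

# seqonly = TRUE returns only the sequence strings, without names or attributes.
seq_only <- read.fasta(file = tmp, seqonly = TRUE)
seq_only[[1]]

unlink(tmp)
```

Since seqonly skips name and annotation handling, it is probably most useful when only the raw sequence content matters.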

----------Details----------

FASTA is a widely used format in biology; some FASTA files are distributed with the seqinr package (see the Examples section below). A sequence in FASTA format begins with a single-line description (distinguished by a greater-than '>' symbol), followed by sequence data on the next lines. Lines starting with a semicolon ';' are ignored, as in the original FASTA program (Pearson and Lipman 1988). The sequence name runs from just after the '>' up to the next space ' ' character; trailing information is ignored for the name but saved in the annotations.
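
The parsing rules just described can be sketched in base R. The helper parse_fasta_lines and the sample lines are hypothetical and only mimic the rules stated above (comment lines dropped, name taken up to the first space, full description kept as annotation); this is not the code seqinr itself uses.

```r
# Hypothetical helper, not part of seqinr: applies the FASTA rules described above.
parse_fasta_lines <- function(lines) {
  # Drop legacy comment lines starting with ';'.
  lines <- lines[!grepl("^;", lines)]
  # Description lines start with '>'.
  header_idx <- which(substr(lines, 1L, 1L) == ">")
  if (length(header_idx) == 0L) stop("no line starting with '>' found")
  # A record runs from the line after its header to the line before the next header.
  first <- header_idx + 1L
  last <- c(header_idx[-1L] - 1L, length(lines))
  records <- lapply(seq_along(header_idx), function(i) {
    header <- lines[header_idx[i]]
    body <- if (first[i] <= last[i]) lines[first[i]:last[i]] else character(0)
    list(
      name = sub("^>([^ ]*).*$", "\\1", header),  # text after '>' up to the first space
      annot = header,                              # whole description line
      sequence = paste(body, collapse = "")        # sequence lines joined together
    )
  })
  names(records) <- vapply(records, function(r) r$name, character(1))
  records
}

example_lines <- c(
  "; a legacy comment line",
  ">seq1 made-up description",
  "ACGT",
  "TTGA",
  ">seq2",
  "GGCC"
)
parsed <- parse_fasta_lines(example_lines)
parsed$seq1$name
parsed$seq1$sequence
```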

The MAQ fasta binary format was introduced in seqinR 1.1-7 and has not been extensively tested. This format is used in the MAQ (Mapping and Assembly with Qualities) software (http://maq.sourceforge.net/). In this format the four nucleotides are coded with two bits and the sequence is stored as a vector of C unsigned long long. In addition, there is a mask to locate non-acgt characters.
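
To make the two-bits-per-nucleotide idea concrete, here is a minimal base R sketch. The code table (a = 0, c = 1, g = 2, t = 3), the packing of four bases per byte, and the helper names pack_2bit and unpack_2bit are all assumptions chosen for illustration; the actual MAQ .bfa layout packs into unsigned long long words and may order bits differently.

```r
# Illustrative 2-bit code table; an assumption, not the documented MAQ layout.
codes <- c(a = 0L, c = 1L, g = 2L, t = 3L)

# Pack a sequence into bytes (four bases per byte) plus a mask of known bases.
pack_2bit <- function(seq_string) {
  bases <- strsplit(tolower(seq_string), "")[[1]]
  known <- bases %in% names(codes)
  values <- unname(codes[bases])
  values[!known] <- 0L  # non-acgt bases are stored as 0 and flagged in the mask
  # Pad to a multiple of four so each byte holds exactly four 2-bit codes.
  padded <- c(values, rep(0L, (-length(values)) %% 4L))
  m <- matrix(padded, nrow = 4L)
  bytes <- as.integer(colSums(m * c(64L, 16L, 4L, 1L)))
  list(bytes = bytes, mask = known, n = length(bases))
}

# Unpack; with apply_mask = TRUE, masked positions come back as "n".
unpack_2bit <- function(packed, apply_mask = TRUE) {
  values <- unlist(lapply(packed$bytes, function(b) {
    (b %/% c(64L, 16L, 4L, 1L)) %% 4L
  }))
  values <- values[seq_len(packed$n)]
  bases <- names(codes)[values + 1L]
  if (apply_mask) bases[!packed$mask] <- "n"
  paste(bases, collapse = "")
}

packed <- pack_2bit("acgtnacg")
unpack_2bit(packed)
unpack_2bit(packed, apply_mask = FALSE)
```

Skipping the mask step is what makes apply.mask = FALSE cheaper for pure acgt sequences: the two-bit codes alone cannot represent an n, so the mask is only needed when such characters are present.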

----------Value----------

By default read.fasta returns a list of vectors of chars. Each element is a sequence object of the class SeqFastadna or SeqFastaAA.
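
A short sketch of inspecting the returned value, assuming seqinr is installed and using the smallAA.fasta file that ships with the package. The accessor functions getName(), getAnnot() and getSequence() are seqinr generics; the variable names are arbitrary.

```r
library(seqinr)

aa_file <- system.file("sequences/smallAA.fasta", package = "seqinr")
seqs <- read.fasta(file = aa_file, seqtype = "AA")

class(seqs)            # the container is a plain list
first <- seqs[[1]]
class(first)           # each element carries a SeqFasta* class

getName(first)         # sequence name, also stored in attr(first, "name")
getAnnot(first)        # full description line
head(getSequence(first))
```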

----------Note----------

The old argument File, deprecated since seqinR 1.1-3, is no longer valid as of seqinR 2.0-6. Just use file instead.

----------Author(s)----------

D. Charif, J.R. Lobry

----------References----------

Pearson, W.R. and Lipman, D.J. (1988) Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences of the United States of America, 85:2444-2448

----------See Also----------

write.fasta to write sequences in a FASTA file, gb2fasta to convert a GenBank file into a FASTA file, read.alignment to read aligned sequences, reverse.align to get an alignment at the nucleic level from the

----------Examples----------

#
# Simple sanity check with a small FASTA file:
#
  smallFastaFile <- system.file("sequences/smallAA.fasta", package = "seqinr")
  mySmallProtein <- read.fasta(file = smallFastaFile, as.string = TRUE, seqtype = "AA")[[1]]
  stopifnot(mySmallProtein == "SEQINRSEQINRSEQINRSEQINR*")
#
# Example of a DNA file in FASTA format:
#
  dnafile <- system.file("sequences/malM.fasta", package = "seqinr")
#
# Read with default arguments, looks like:
#
# $XYLEECOM.MALM
# [1] "a" "t" "g" "a" "a" "a" "a" "t" "g" "a" "a" "t" "a" "a" "a" "a" "g" "t"
# ...
  read.fasta(file = dnafile)
#
# The same, but do not turn the sequence into a vector of single characters, looks like:
#
# $XYLEECOM.MALM
# [1] "atgaaaatgaataaaagtctcatcgtcctctgtttatcagcagggttactggcaagcgc
# ...
  read.fasta(file = dnafile, as.string = TRUE)
#
# The same, but do not force lower case letters, looks like:
#
# $XYLEECOM.MALM
# [1] "ATGAAAATGAATAAAAGTCTCATCGTCCTCTGTTTATCAGCAGGGTTACTGGCAAGC
# ...
  read.fasta(file = dnafile, as.string = TRUE, forceDNAtolower = FALSE)
#
# Example of a protein file in FASTA format:
#
  aafile <- system.file("sequences/seqAA.fasta", package = "seqinr")
#
# Read the protein sequence file, looks like:
#
# $A06852
# [1] "M" "P" "R" "L" "F" "S" "Y" "L" "L" "G" "V" "W" "L" "L" "L" "S" "Q" "L"
# ...
  read.fasta(aafile, seqtype = "AA")
#
# The same, but as string and without attributes, looks like:
#
# $A06852
# [1] "MPRLFSYLLGVWLLLSQLPREIPGQSTNDFIKACGRELVRLWVEICGSVSWGRTALSLEEP
# QLETGPPAETMPSSITKDAEILKMMLEFVPNLPQELKATLSERQPSLRELQQSASKDSNLNFEEFK
# KIILNRQNEAEDKSLLELKNLGLDKHSRKKRLFRMTLSEKCCQVGCIRKDIARLC*"
#
  read.fasta(aafile, seqtype = "AA", as.string = TRUE, set.attributes = FALSE)
#
# Example with a FASTA file that contains comment lines starting with
# a semicolon character ';'
#
  legacyfile <- system.file("sequences/legacy.fasta", package = "seqinr")
  legacyseq <- read.fasta(file = legacyfile, as.string = TRUE)
  stopifnot( nchar(legacyseq) == 921 )
#
# Example of a MAQ binary fasta file produced with maq fasta2bfa ct.fasta ct.bfa
# on a platform where .Platform$endian == "little" and .Machine$sizeof.longlong == 8
#
  fastafile <- system.file("sequences/ct.fasta", package = "seqinr")
  bfafile <- system.file("sequences/ct.bfa", package = "seqinr")

  original <- read.fasta(fastafile, as.string = TRUE, set.attributes = FALSE)
  bfavers <- read.fasta(bfafile, as.string = TRUE, set.attributes = FALSE, bfa = TRUE,
    endian = "little", sizeof.longlong = 8)
  if(!identical(original, bfavers)){
    warning(paste("trouble reading bfa file with endian =", .Platform$endian,
      "and sizeof.longlong =", .Machine$sizeof.longlong))
  }

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

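Notes:
Note 1: For the convenience of learners, this document was translated by LoveR, the 生物统计家园网 robot. It is intended only as a personal reference for learning R; 生物统计家园 retains the copyright.
Note 2: Because the translation is produced automatically by a robot, inaccuracies are unavoidable; reading it carefully against the English original can help with learning R.
Note 3: If you find an inaccuracy, please reply below this post and we will revise it over time.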
