read.msa(rphast)
read.msa() belongs to R package: rphast
Reading an MSA Object
Translator: 生物统计家园网 robot LoveR
----------Description----------
Reads an MSA from a file.
----------Usage----------
alphabet=NULL, features=NULL, do.4d=FALSE, ordered=ifelse(do.4d ||
!is.null(features), FALSE, TRUE), tuple.size=(if (do.4d) 3 else
NULL), do.cats=NULL, refseq=NULL, offset=0, seqnames=NULL,
discard.seqnames=NULL, pointer.only=FALSE)
----------Arguments----------
Argument: filename
The name of the input file containing an alignment.
Argument: format
Input file format: one of "FASTA", "MAF", "SS", "PHYLIP", or "MPM". Must be correctly specified.
Argument: alphabet
The alphabet of non-missing-data characters in the alignment. Determined automatically from the alignment if not given.
Argument: features
An object of type feat. If provided, the return value will contain only the portions of the alignment that fall within a feature. The alignment will not be ordered. The loaded regions can be further constrained with the do.4d or do.cats options. Note that if this object is passed as a pointer to a structure stored in C, its values will be altered by this function!
Argument: do.4d
Logical. If TRUE, the return value will contain only the columns corresponding to four-fold degenerate sites. Requires features to be specified.
Argument: ordered
Logical. If FALSE, the MSA object may not retain the original column order.
Argument: tuple.size
Integer. If given, and if pointer.only is TRUE, the MSA will be stored in sufficient statistics format, where each tuple contains tuple.size consecutive columns of the alignment.
Argument: do.cats
Character vector if features is provided; integer vector if cats.cycle is provided. If given, only the types of features named here will be represented in the (unordered) return alignment.
Argument: refseq
Character string specifying a FASTA format file with a reference sequence. If given, the reference sequence will be "filled in" wherever it is missing from the alignment.
Argument: offset
An integer giving the offset of the reference sequence from the beginning of the chromosome. Not used for MAF format.
Argument: seqnames
A character vector. If provided, discard any sequence in the MSA that is not named here. This is only implemented efficiently for MAF input files, but in this case, the reference sequence must be named.
Argument: discard.seqnames
A character vector. If provided, discard the sequences named here. This is only implemented efficiently for MAF input files, but in this case, the reference sequence must NOT be discarded.
Argument: pointer.only
If TRUE, the MSA will be stored by reference as an external pointer to an object created by C code, rather than directly in R memory. This improves performance and may be necessary for large alignments, but reduces functionality. See msa for more details on MSA object storage options.
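The two storage modes described under pointer.only can be compared with a short sketch like the one below. It assumes the rphast package is installed and reuses the ENr334.maf file from the package's examples.zip archive (the same file used in the Examples section). The object names m.r and m.ptr are illustrative only.

```r
library(rphast)

# Extract the example MAF file shipped with rphast
exampleArchive <- system.file("extdata", "examples.zip", package="rphast")
unzip(exampleArchive, "ENr334.maf")

# Default: the alignment is stored directly in R memory
m.r <- read.msa("ENr334.maf")

# pointer.only=TRUE: the alignment is stored as an external pointer
# to an object created by C code
m.ptr <- read.msa("ENr334.maf", pointer.only=TRUE)

# Both objects describe the same alignment, so the column counts
# should agree
ncol.msa(m.r)
ncol.msa(m.ptr)

unlink("ENr334.maf")  # clean up
```

Whether the pointer form is worth its reduced functionality depends on alignment size; for a small example file like this one the difference is unlikely to matter.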
----------Value----------
An MSA object.
----------Note----------
If the input is in "MAF" format and features is specified, the resulting alignment will be stripped of gaps in the reference (1st) sequence.
----------Author(s)----------
Melissa J. Hubisz and Adam Siepel
----------See Also----------
msa, read.feat
----------Examples----------
exampleArchive <- system.file("extdata", "examples.zip", package="rphast")
files <- c("ENr334.maf", "ENr334.fa", "gencode.ENr334.gff")
unzip(exampleArchive, files)
# Read a FASTA file, ENr334.fa
# this file represents a 4-way alignment of the ENCODE region
# ENr334 starting from hg18 chr6 position 41405894
idx.offset <- 41405894
m1 <- read.msa("ENr334.fa", offset=idx.offset)
m1
# Now read in only the subset represented in a feature file
f <- read.feat("gencode.ENr334.gff")
f$seqname <- "hg18" # need to tweak source name to match name in alignment
m1 <- read.msa("ENr334.fa", features=f, offset=idx.offset)
# Can also subset on certain feature types
do.cats <- c("CDS", "5'flank", "3'flank")
m1 <- read.msa("ENr334.fa", features=f, offset=idx.offset,
               do.cats=do.cats)
# Can read MAFs similarly, but don't need offset because
# the MAF file is annotated with coordinates
m2 <- read.msa("ENr334.maf", features=f, do.cats=do.cats)
# Also, note that when features is given and the file is
# in MAF format, the first sequence is automatically
# stripped of gaps
ncol.msa(m1)
ncol.msa(m2)
ncol.msa(m1, "hg18")
unlink(files) # clean up
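The seqnames argument is not covered by the examples above. The following is a minimal sketch of how it might be used, assuming the rphast package is installed, that ENr334.maf contains at least two sequences, and that the first sequence name returned by names.msa is the reference (the arguments list requires the reference to be kept for MAF input). The variable names full, keep, and sub are illustrative only.

```r
library(rphast)

# Extract the example MAF file shipped with rphast
exampleArchive <- system.file("extdata", "examples.zip", package="rphast")
unzip(exampleArchive, "ENr334.maf")

# Read the full alignment once to discover the sequence names
full <- read.msa("ENr334.maf")
all.names <- names.msa(full)

# Keep the reference (assumed to be the first name) plus one other sequence
keep <- all.names[1:2]
sub <- read.msa("ENr334.maf", seqnames=keep)

# Only the requested sequences should remain
names.msa(sub)
nrow.msa(sub)

unlink("ENr334.maf")  # clean up
```

discard.seqnames works in the opposite direction: it names the sequences to drop, and for MAF input the reference must not be among them.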