
R ShortRead package: readAligned() function help documentation

Posted on 2012-2-26 14:05:23
readAligned(ShortRead)
readAligned() belongs to the R package: ShortRead

Read aligned reads and their quality scores into R representations

Translator: 生物统计家园网, robot LoveR

----------Description----------

Import files containing aligned reads into an internal representation of the alignments, sequences, and quality scores. Most methods (see "Details" for exceptions) read all files into a single R object.


----------Usage----------



readAligned(dirPath, pattern=character(0), ...)




----------Arguments----------

Argument: dirPath
A character vector (or other object; see the methods defined on this generic) giving the directory path (relative or absolute) of the aligned read files to be input. Some methods also accept a character vector of file names.


Argument: pattern
The (grep-style) pattern describing the file names to be read. The default (character(0)) results in (attempted) input of all files in the directory.
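
A minimal sketch of how a grep-style pattern selects file names, using base R's list.files() on a temporary directory rather than readAligned() itself. The file names are made up for illustration, and it is an assumption here that readAligned() applies pattern the same way list.files() does.

```r
# Create a scratch directory holding hypothetical alignment file names.
tmp <- tempfile("aligned_demo")
dir.create(tmp)
invisible(file.create(file.path(
  tmp, c("s_1_export.txt", "s_2_export.txt", "s_2_qseq.txt"))))

# pattern is a regular expression, not a shell glob: ".*_export.txt"
# matches any name containing "_export", one arbitrary character, then "txt".
export_files <- list.files(tmp, pattern = ".*_export.txt")

# Without a pattern, every file in the directory is returned.
all_files <- list.files(tmp)

print(export_files)
print(all_files)
```

Because "." matches any character, a stricter pattern such as "_export\\.txt$" may be preferable when similarly named files share a directory.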


Argument: ...
Additional arguments, used by methods. When dirPath is a character vector, the argument type must be provided. Possible values for type and their meanings are described below. Most methods implement filter=srFilter(), allowing objects of class SRFilter to selectively return aligned reads.

----------Details----------

There is no standard aligned read file format; methods parse particular file types.

The readAligned,character-method interprets file types based on an additional type argument. Supported types are:

type="SolexaExport": This type parses .*_export.txt files following the documentation in the Solexa Genome Alignment software manual, version 0.3.0. These files consist of the following columns; consult Solexa documentation for precise descriptions. If parsed, values can be retrieved from AlignedRead as follows:




Machine: see below
Run number: stored in alignData
Lane: stored in alignData
Tile: stored in alignData
X: stored in alignData
Y: stored in alignData
Multiplex index: see below
Paired read number: see below
Read: sread
Quality: quality
Match chromosome: chromosome
Match contig: alignData
Match position: position
Match strand: strand
Match description: Ignored
Single-read alignment score: alignQuality
Paired-read alignment score: Ignored
Partner chromosome: Ignored
Partner contig: Ignored
Partner offset: Ignored
Partner strand: Ignored
Filtering: alignData

The following optional arguments, set to FALSE by default, influence data input:

withMultiplexIndex: When TRUE, include the multiplex index as a column multiplexIndex in
withPairedReadNumber: When TRUE, include the paired read number as a column pairedReadNumber in
withId: When TRUE, construct an identifier string as "Machine_Run:Lane:Tile:X:Y#multiplexIndex/pairedReadNumber". The substrings "#multiplexIndex" and "/pairedReadNumber" are not present if withMultiplexIndex=FALSE or
withAll: A convenience which, when TRUE, sets all

Note that not all paired read columns are interpreted. Different interfaces to reading alignment files are described in SolexaPath and SolexaSet.
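
The identifier layout used by withId can be sketched in base R. This illustrates only the string format; make_read_id is a hypothetical helper, not a ShortRead function, and the machine name and coordinates are made-up values.

```r
# Build "Machine_Run:Lane:Tile:X:Y#multiplexIndex/pairedReadNumber".
# The "#..." and "/..." parts are appended only when a value is supplied,
# mirroring the withMultiplexIndex / withPairedReadNumber switches.
make_read_id <- function(machine, run, lane, tile, x, y,
                         multiplexIndex = NULL, pairedReadNumber = NULL) {
  id <- sprintf("%s_%s:%s:%s:%s:%s", machine, run, lane, tile, x, y)
  if (!is.null(multiplexIndex)) id <- paste0(id, "#", multiplexIndex)
  if (!is.null(pairedReadNumber)) id <- paste0(id, "/", pairedReadNumber)
  id
}

full_id <- make_read_id("HWI-EAS88", 3, 2, 1, 451, 945,
                        multiplexIndex = 0, pairedReadNumber = 1)
short_id <- make_read_id("HWI-EAS88", 3, 2, 1, 451, 945)

print(full_id)
print(short_id)
```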




type="SolexaPrealign": See SolexaRealign

type="SolexaAlign": See SolexaRealign

type="SolexaRealign": These types parse s_L_TTTT_prealign.txt, s_L_TTTT_align.txt or s_L_TTTT_realign.txt files produced by default and eland analyses. From the Solexa documentation, align corresponds to unfiltered first-pass alignments, prealign adjusts alignments for error rates (when available), and realign filters alignments to exclude clusters failing to pass quality criteria.

Because base quality scores are not stored with alignments, the object returned by readAligned scores all base qualities as -32.

If parsed, values can be retrieved from AlignedRead as follows:

Sequence: stored in sread
Best score: stored in alignQuality
Number of hits: stored in alignData
Target position: stored in position
Strand: stored in strand
Target sequence: Ignored; parse using
Next best score: stored in alignData

This parses s_L_eland_results.txt files, an intermediate format that does not contain read or alignment quality scores.

Because base quality scores are not stored with alignments, the object returned by readAligned scores all base qualities as -32.

Columns of this file type can be retrieved from AlignedRead as follows (description of columns is from Table 19, Genome Analyzer Pipeline Software User Guide, Revision A, January 2008):

Id: Not parsed
Sequence: stored in sread
Type of match code: Stored in alignData as matchCode. Codes are (from the Eland manual): NM (no match); QC (no match due to quality control failure); RM (no match due to repeat masking); U0 (best match was unique and exact); U1 (best match was unique, with 1 mismatch); U2 (best match was unique, with 2 mismatches); R0 (multiple exact matches found); R1 (multiple 1 mismatch matches found, no exact matches); R2 (multiple 2 mismatch matches found, no
Number of exact matches: stored in alignData as
Number of 1-error mismatches: stored in alignData
Number of 2-error mismatches: stored in alignData
Genome file of match: stored in chromosome
Position: stored in position
Strand (direction of match): stored in strand
"N" treatment: stored in alignData, as NCharacterTreatment. "." indicates treatment of "N" was not applicable; "D" indicates treatment
Substitution error: stored in alignData as mismatchDetailOne and mismatchDetailTwo. Present only for unique inexact matches at one or two positions. Position and type of first substitution error, e.g., 11A represents 11 matches with 12th base an A in reference but not read. The reference manual cited below lists only one field (mismatchDetailOne), but two are present
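
The match codes and the substitution-error notation can be decoded with a few lines of base R. This is a sketch for reading values taken from alignData, not part of ShortRead; parse_mismatch is a hypothetical helper, and the R2 code is omitted because its description is cut off in the list above.

```r
# Meanings of the Eland match codes, as listed above.
match_codes <- c(
  NM = "no match",
  QC = "no match due to quality control failure",
  RM = "no match due to repeat masking",
  U0 = "best match was unique and exact",
  U1 = "best match was unique, with 1 mismatch",
  U2 = "best match was unique, with 2 mismatches",
  R0 = "multiple exact matches found",
  R1 = "multiple 1 mismatch matches found, no exact matches"
)

# "11A" means 11 matching bases, then an A in the reference at base 12.
parse_mismatch <- function(detail) {
  n_match <- as.integer(sub("[A-Za-z]+$", "", detail))
  list(position = n_match + 1L,
       referenceBase = sub("^[0-9]+", "", detail))
}

codes <- c("U0", "U1", "NM")
print(unname(match_codes[codes]))

detail <- parse_mismatch("11A")
print(detail)
```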




type="MAQMap", records=-1L: Parse binary map files produced by MAQ. See details in the next section. The records option determines how many lines are read; -1L (the default) means that all records are input. For type="MAQMap", dir and pattern must match a

type="MAQMapShort", records=-1L: The same as type="MAQMap" but for map files made with Maq prior to version 0.7.0. (These files use a different maximum read length [64 instead of 128], and are hence incompatible with newer Maq map files.) For type="MAQMapShort", dir and

type="MAQMapview": Parse alignment files created by MAQ's "mapview" command. Interpretation of columns is based on the description in the MAQ manual, specifically

Parse alignment files created with the Bowtie alignment algorithm. Parsed columns can be retrieved from AlignedRead as follows:

Identifier: id
Strand: strand
Chromosome: chromosome
Position: position; see comment below
Read: sread; see comment below
Read quality: quality; see comments below
Similar alignments: alignData, "similar" column; Bowtie v. 0.9.9.3 (12 May, 2009) documents this as the number of other instances where the same read aligns against the same reference characters as were aligned against in this
Alignment mismatch locations: alignData

NOTE: the default quality encoding changes to FastqQuality with ShortRead version 1.3.24.

This method includes the argument qualityType to specify how quality scores are encoded. Bowtie quality scores are "Phred"-like by default, with qualityType='FastqQuality', but can be specified as "Solexa"-like, with qualityType='SFastqQuality'.
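
The two qualityType values differ in how a quality character maps to a number. The base R sketch below assumes the conventional ASCII offsets of 33 for the Phred-like encoding and 64 for the Solexa-like encoding; these offsets are not stated on this help page, and decode_quality is a hypothetical helper rather than a ShortRead function.

```r
# Convert a quality string to integer scores under an assumed ASCII offset.
decode_quality <- function(qual,
                           qualityType = c("FastqQuality", "SFastqQuality")) {
  qualityType <- match.arg(qualityType)
  offset <- if (qualityType == "FastqQuality") 33L else 64L
  utf8ToInt(qual) - offset
}

# The same scores written in each encoding.
phred_scores <- decode_quality("II5+", "FastqQuality")
solexa_scores <- decode_quality("hhTJ", "SFastqQuality")

print(phred_scores)
print(solexa_scores)
```

Choosing the wrong qualityType shifts every score by 31 under these assumptions, so it is worth checking which encoding the aligner actually wrote.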

Bowtie outputs positions that are 0-offset from the left-most end of the + strand. ShortRead parses position information to be 1-offset from the left-most end of the + strand.

Bowtie outputs reads aligned to the - strand as their reverse complement, and reverses the quality score string of these reads. ShortRead parses these to their original sequence and orientation.
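
The two conversions just described can be sketched in base R for a single record. This is an illustration of the stated conventions, not ShortRead's implementation; reverse_string and from_bowtie are hypothetical helpers and the record values are made up.

```r
# Reverse the characters of a single string.
reverse_string <- function(x) paste(rev(strsplit(x, "")[[1]]), collapse = "")

# Convert one Bowtie-style record to the representation described above.
from_bowtie <- function(read, qual, position0, strand) {
  if (strand == "-") {
    # Undo Bowtie's reverse complement and its reversal of the qualities.
    read <- chartr("ACGTacgt", "TGCAtgca", reverse_string(read))
    qual <- reverse_string(qual)
  }
  list(sread = read,
       quality = qual,
       position = position0 + 1L,  # 0-offset becomes 1-offset
       strand = strand)
}

rec <- from_bowtie("AACG", "IIH5", position0 = 99L, strand = "-")
str(rec)
```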

Parse alignment files created with the SOAP alignment algorithm. Parsed columns can be retrieved from AlignedRead as follows:

id: id
seq: sread; see comment below
qual: quality; see comment below
number of hits: alignData
a/b: alignData (pairedEnd)
length: alignData (alignedLength)
+/-: strand
chr: chromosome
location: position; see comment below
types: alignData (typeOfHit: integer

This method includes the argument qualityType to specify how quality scores are encoded. It is unclear from SOAP documentation what the quality score is; the default is "Solexa"-like, with qualityType='SFastqQuality', but can be specified as "Phred"-like, with qualityType='FastqQuality'.

SOAP outputs positions that are 1-offset from the left-most end of the + strand. ShortRead preserves this representation.

SOAP reads aligned to the - strand are reported by SOAP as their reverse complement, with the quality string of these reads reversed. ShortRead parses these to their original sequence and orientation.

Parse BAM files produced by samtools and other third party programs. This method includes the argument param=ScanBamParam(). The ScanBamParam object is recycled for all files. The which and flag arguments to ScanBamParam() can be used to influence which reads in the BAM file are parsed; see ScanBamParam. The following values override user settings (issuing a warning if contradictory values are provided):

simpleCigar=TRUE: Reads aligned with indels are ignored; this is required for representation in
reverseComplement=TRUE: By default, BAM stores reads as they are aligned to the reference genome, whereas AlignedRead stores them as they are prior to alignment; this flag converts reads from the BAM to AlignedRead
"mapq", "seq", "qual"): These BAM fields are mapped to

BAM fields are mapped to AlignedRead as:

qname: id
seq: sread
qual: quality
strand: strand
rname: chromosome
pos: position
mapq: alignQuality
flag: alignData
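
The flag kept in alignData is a bit field. As a base R sketch (not how ShortRead or Rsamtools implement it), the strand can be recovered from bit 0x10, which the SAM format specification defines as "read reverse strand". strand_from_flag is a hypothetical helper and the flag values are illustrative.

```r
# Return "-" when the reverse-strand bit (0x10) is set, "+" otherwise.
strand_from_flag <- function(flag) {
  ifelse(bitwAnd(flag, 0x10L) != 0L, "-", "+")
}

flags <- c(0L, 16L, 99L, 147L)
strands <- strand_from_flag(flags)
print(data.frame(flag = flags, strand = strands))
```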


----------Value----------

A single R object (e.g., AlignedRead) containing alignments, sequences and qualities of all files in dirPath matching pattern. There is no guarantee of the order in which files are read.


----------Author(s)----------



Martin Morgan <mtmorgan@fhcrc.org>,
Simon Anders <anders@ebi.ac.uk> (MAQ map)



----------See Also----------

The AlignedRead class.

Genome Analyzer Pipeline Software User Guide, Revision A, January 2008.

The MAQ reference manual, http://maq.sourceforge.net/maq-manpage.shtml#5, 3 May, 2008.

The Bowtie reference manual, http://bowtie-bio.sourceforge.net, 28 October, 2008.

The SOAP reference manual, http://soap.genomics.org.cn/soap1, 16 December, 2008.

The BAM file format specification, http://samtools.sourceforge.net.


----------Examples----------


library(ShortRead)

sp <- SolexaPath(system.file("extdata", package="ShortRead"))
ap <- analysisPath(sp)
## ELAND_EXTENDED
(aln0 <- readAligned(ap, "s_2_export.txt", "SolexaExport"))
## PhageAlign
(aln1 <- readAligned(ap, "s_5_.*_realign.txt", "SolexaRealign"))

## MAQ
dirPath <- system.file('extdata', 'maq', package='ShortRead')
list.files(dirPath)
## First line
readLines(list.files(dirPath, full.names=TRUE)[[1]], 1)
countLines(dirPath)
## two files collapse into one
(aln2 <- readAligned(dirPath, type="MAQMapview"))

## select only chr1-5.fa, '+' strand
filt <- compose(chromosomeFilter("chr[1-5].fa"),
                strandFilter("+"))
(aln3 <- readAligned(sp, "s_2_export.txt", filter=filt))
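
To make the selection in the last example concrete, here is a base R sketch of the predicate that the composed filter expresses, applied to made-up chromosome and strand vectors. It assumes that chromosomeFilter() treats its argument as a regular expression and that compose() keeps only reads passing every filter; it does not call ShortRead.

```r
# Hypothetical per-read values, standing in for chromosome(aln) and strand(aln).
chrom <- c("chr1.fa", "chr5.fa", "chr6.fa", "chr2.fa", "chrX.fa")
strand <- c("+", "-", "+", "+", "+")

# Keep reads on chr1 to chr5 that also lie on the + strand.
keep <- grepl("chr[1-5].fa", chrom) & strand == "+"
selected <- chrom[keep]

print(selected)
```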

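When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).


Notes:
Note 1: For the convenience of learners, this document was translated by the 生物统计家园网 robot LoveR. It is intended for personal reference when learning R only; 生物统计家园 reserves the copyright.
Note 2: Because the translation was produced automatically, inaccuracies are unavoidable. Comparing the Chinese and English text carefully can help when learning R.
Note 3: If you find an inaccuracy, please reply to this thread and it will be revised over time.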