ReQON(ReQON)
ReQON() belongs to R package: ReQON
Recalibrating Quality Of Nucleotides
Translator: 生物统计家园网 robot LoveR
----------Description----------
Recalibrate the nucleotide quality scores of aligned single-end or paired-end next-generation sequencing data.
----------Usage----------
ReQON(in_bam, out_bam, region, max_train = -1, SNP = "",
RefSeq = "", plotname = "", temp_files = 0)
----------Arguments----------
Argument: in_bam
file name of a sorted BAM file of single-end or paired-end aligned sequencing data. The corresponding index file (.bai file) must be located in the same directory.
Argument: out_bam
file name for the output BAM file, in which the original quality scores are replaced with recalibrated quality scores.
Argument: region
training region for recalibration, given as "chromosome:start-end". Cannot span more than one chromosome. See note. Example: "chr1:1-10000".
Argument: max_train
maximum number of nucleotides to include in the training region. Useful if you want to train on, e.g., the first 5 million bases of chromosome 10. Default = -1 (use all nucleotides from the training region).
Argument: SNP
file of SNP locations to remove from the training set before recalibration. Text or Rdata file (with variable name "snp") with no header and two columns: [1] chromosome, [2] position. See note. Default: do not remove any nucleotides from the training set.
Argument: RefSeq
file of reference sequence for the training set, used to identify sequencing errors (i.e., a nucleotide is an error if it does not match RefSeq). Text or Rdata file (with variable name "ref") with no header and three columns: [1] chromosome, [2] position, [3] reference nucleotide (A,C,G,T). See note. Default: errors are nucleotides not matching the major allele(s) at positions with coverage > 2; all nucleotides at positions with coverage of 2 or less are removed.
Argument: plotname
file name for saving recalibration plots as a pdf. If not specified, plots will not be produced.
Argument: temp_files
option for keeping temporary files. 0: (default) remove all temporary files. 1: keep temporary files in the working directory.
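The default error definition described for RefSeq (used when no reference file is given) can be illustrated on a toy pileup. This is only a sketch of the stated rule, not ReQON's internal code: the count matrix is made up, and treating every allele tied for the highest count as "major" is an assumed reading of "major allele(s)".

```r
# Toy pileup: one row per position, one column per base count.
# All numbers are made up for illustration.
pileup <- matrix(
  c(10, 0, 1, 0,   # pos1: major allele A, one G mismatch
     0, 2, 0, 0,   # pos2: coverage 2, dropped
     4, 4, 0, 1,   # pos3: A and C tie as major alleles, one T mismatch
     0, 0, 0, 3),  # pos4: all reads agree
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("pos", 1:4), c("A", "C", "G", "T"))
)

coverage <- rowSums(pileup)
keep <- coverage > 2                     # positions with coverage <= 2 are removed
kept <- pileup[keep, , drop = FALSE]

# Assumption: reads matching any allele tied for the highest count are correct.
major_count <- apply(kept, 1, max)
n_major <- rowSums(kept == major_count)  # number of tied major alleles per position
errors <- rowSums(kept) - major_count * n_major
```

Here `errors` holds the number of nucleotides counted as sequencing errors at each retained position.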
----------Details----------
ReQON uses logistic regression to recalibrate the nucleotide quality scores of a sorted BAM file. The BAM file contains either single-end or paired-end next-generation sequencing data that has been aligned using any alignment tool. For help with sorting and indexing BAM files in R, see Rsamtools.
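The general idea of recalibrating quality scores with logistic regression can be sketched in base R. This is a minimal illustration on simulated data, not ReQON's actual model: the covariates (reported quality and read position), the simulation, and all variable names are assumptions made for the example.

```r
set.seed(1)
n <- 20000
reported_q <- sample(2:40, n, replace = TRUE)  # reported Phred scores (synthetic)
read_pos   <- sample(1:75, n, replace = TRUE)  # position within the read (synthetic)

# Simulate errors whose true rate grows toward the end of the read,
# so the reported scores are miscalibrated.
true_p   <- pmin(1, 10^(-reported_q / 10) * (1 + read_pos / 25))
is_error <- rbinom(n, 1, true_p)

# Logistic regression of the error indicator on the assumed covariates.
fit   <- glm(is_error ~ reported_q + read_pos, family = binomial)
p_hat <- fitted(fit)

# Convert the fitted error probabilities back to the Phred scale.
recal_q <- round(-10 * log10(p_hat))
```

In this sketch `is_error` plays the role of the training-set error calls, which in ReQON come from comparison against RefSeq or the major allele(s).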
ReQON can also output diagnostic plots that show the effectiveness of the recalibration on the training set.
For a detailed description of usage, output and images, see the vignette: browseVignettes("ReQON").
ReQON utilizes various Java tools provided by Picard. For more information on Picard, see http://picard.sourceforge.net
----------Value----------
ReQON returns a BAM file, replacing the original quality scores with the recalibrated quality scores in the QUAL field.
ReQON also outputs a data object containing the training-set diagnostic data shown in the diagnostic plots. The object variables are:
Component: $ReadPosErrors
vector of error counts by read position.
Component: $QualFreqBefore
relative frequency of quality scores before recalibration. The first element in the vector corresponds to a quality score of zero.
Component: $QualFreqAfter
relative frequency of quality scores after recalibration. The first element in the vector corresponds to a quality score of zero.
Component: $ErrRatesBefore
vector of empirical error rates before recalibration, reported on the Phred scale. The first element in the vector corresponds to a quality score of zero.
Component: $ErrRatesAfter
vector of empirical error rates after recalibration, reported on the Phred scale. The first element in the vector corresponds to a quality score of zero.
Component: $FWSE
vector of Frequency-Weighted Squared Error (FWSE) values. The first element is FWSE before recalibration and the second element is FWSE after recalibration.
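The indexing convention and the Phred scale used by these components can be illustrated with a short sketch. The vectors below are made-up stand-ins for the diagnostic output, and the FWSE formula is an assumed reading of the name (this page does not state the formula), so treat it as illustrative only.

```r
# Hypothetical diagnostics-like vectors for quality scores 0..4 (made-up numbers).
qual_freq      <- c(0.00, 0.10, 0.20, 0.30, 0.40)  # relative frequencies, sum to 1
err_rate_phred <- c(0, 2, 2, 4, 3)                 # empirical error rates, Phred scale

# Element i of each vector corresponds to quality score i - 1.
quality <- seq_along(qual_freq) - 1

# Phred scale: Q = -10 * log10(p), so p = 10^(-Q / 10).
err_prob <- 10^(-err_rate_phred / 10)

# Assumed FWSE definition: frequency-weighted squared gap between the
# reported score and the empirical Phred-scaled error rate.
fwse <- sum(qual_freq * (quality - err_rate_phred)^2)
```

Under this reading, a smaller FWSE after recalibration than before would indicate that reported scores track the empirical error rates more closely.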
----------Note----------
Be aware of how the chromosomes are referenced when specifying the training region. For example, one BAM file may require specifying "10:1-2000" while another may need "chr10:1-2000".
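A region string in the "chromosome:start-end" form can be assembled with a small helper. The function name `make_region` is hypothetical and not part of ReQON; the sketch mainly guards against large coordinates being printed in scientific notation.

```r
# Hypothetical helper (not part of ReQON) that builds "chromosome:start-end".
make_region <- function(chrom, start, end) {
  stopifnot(length(chrom) == 1, start >= 1, end >= start)
  # format() avoids scientific notation for large coordinates such as 5e6
  paste0(chrom, ":",
         format(start, scientific = FALSE, trim = TRUE), "-",
         format(end, scientific = FALSE, trim = TRUE))
}

# The chromosome name must match the naming used in the BAM header, which can
# be listed with: names(Rsamtools::scanBamHeader(bam_file)[[1]]$targets)
reg_a <- make_region("chr10", 1, 2000)
reg_b <- make_region("10", 1, 5e6)
```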
If you provide SNP or RefSeq files, computations will be faster if the files cover only the positions in the training region. For example, if you set region = "chr10:1-2000", then we recommend keeping only the rows corresponding to chr10:1-2000 in the RefSeq/SNP file.
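Restricting a reference table to the training region before writing it out might look like the following sketch. The data frame, its column names, and the temporary file are made up for illustration; only the three-column, no-header layout comes from the argument description.

```r
# Hypothetical reference table covering two chromosomes (made-up bases).
set.seed(1)
ref <- data.frame(
  chrom = rep(c("chr9", "chr10"), each = 3000),
  pos   = rep(1:3000, times = 2),
  base  = sample(c("A", "C", "G", "T"), 6000, replace = TRUE),
  stringsAsFactors = FALSE
)

# Keep only rows inside the training region chr10:1-2000.
in_region <- ref$chrom == "chr10" & ref$pos >= 1 & ref$pos <= 2000
ref_sub <- ref[in_region, ]

# Write a tab-separated text file with no header, row names, or quotes.
ref_file <- tempfile(fileext = ".txt")
write.table(ref_sub, file = ref_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

The same filter applied to a two-column table (chromosome, position) would produce a region-restricted SNP file.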
----------Author(s)----------
Christopher Cabanski <cabanski@email.unc.edu>
----------Examples----------
## Read in sample data from seqbias package
library( ReQON )
library( seqbias )
library( Rsamtools )
ref_fn <- system.file( "extra/example.fa", package = "seqbias" )
ref_f <- FaFile( ref_fn )
open.FaFile( ref_f )
reads_fn <- system.file( "extra/example.bam", package = "seqbias" )
## Set up file of reference sequence
## (three columns: chromosome, position, reference nucleotide)
seqs <- scanFa( ref_f )
len <- length( seqs[[1]] )
ref <- matrix( nrow = len, ncol = 3 )
ref[,1] <- rep( "seq1", len )
ref[,2] <- c( 1:len )
str <- toString( subseq( seqs[[1]], 1, len ) )
s <- strsplit( str, NULL )
ref[,3] <- s[[1]]
write.table( ref, file = "ref_seq.txt", sep = "\t", quote = FALSE,
    row.names = FALSE, col.names = FALSE )
## Recalibrate file (the BAM file must be sorted and indexed first)
sorted <- sortBam( reads_fn, tempfile() )
indexBam( sorted )
reg <- paste( "seq1:1-", len, sep = "" )
diagnostics <- ReQON( sorted, "Recalibrated_example.bam", reg,
    RefSeq = "ref_seq.txt", plotname = "Recalibrated_example_plots.pdf" )
## Remove temporary file
unlink( "ref_seq.txt" )
Source: 生物统计家园网 (http://www.biostatistic.net). Please credit the source when reposting.