substitution.matrices(Biostrings)
substitution.matrices() is part of the R package Biostrings
Scoring matrices
Translator: 生物统计家园网, robot LoveR
Description
Predefined substitution matrices for nucleotide and amino acid alignments.
Usage
data(BLOSUM45)
data(BLOSUM50)
data(BLOSUM62)
data(BLOSUM80)
data(BLOSUM100)
data(PAM30)
data(PAM40)
data(PAM70)
data(PAM120)
data(PAM250)
nucleotideSubstitutionMatrix(match = 1, mismatch = 0, baseOnly = FALSE, type = "DNA")
qualitySubstitutionMatrices(fuzzyMatch = c(0, 1), alphabetLength = 4L, qualityClass = "PhredQuality", bitScale = 1)
errorSubstitutionMatrices(errorProbability, fuzzyMatch = c(0, 1), alphabetLength = 4L, bitScale = 1)
Arguments
Argument: match
the score for a nucleotide match.
Argument: mismatch
the score for a nucleotide mismatch.
Argument: baseOnly
TRUE or FALSE. If TRUE, only the letters of the "base" alphabet are used, i.e. "A", "C", "G", "T".
Argument: type
either "DNA" or "RNA".
Argument: fuzzyMatch
a named or unnamed numeric vector representing the base match probability.
Argument: errorProbability
a named or unnamed numeric vector representing the error probability.
Argument: alphabetLength
an integer representing the number of letters in the underlying string alphabet. For DNA and RNA, this would be 4L. For amino acids, this could be 20L.
Argument: qualityClass
a character string, either "PhredQuality" or "SolexaQuality".
Argument: bitScale
a numeric value used to scale the quality-based substitution matrices. By default, this is 1, representing bit-scale scoring.
Format
The BLOSUM and PAM matrices are square symmetric matrices with integer coefficients, whose row and column names are identical and unique: each name is a single letter representing a nucleotide or an amino acid.
nucleotideSubstitutionMatrix produces a substitution matrix for all IUPAC nucleic acid codes based upon match and mismatch parameters.
errorSubstitutionMatrices produces a two-element list of numeric square symmetric matrices, one for matches and one for mismatches.
qualitySubstitutionMatrices produces the substitution matrices for Phred or Solexa quality-based reads.
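To make the match/mismatch structure concrete, here is a minimal base R sketch of the kind of matrix one would expect for the four plain bases only (the baseOnly = TRUE case). The helper name simple_nucleotide_matrix is hypothetical and not part of Biostrings. The sketch does not attempt to reproduce how the real function scores IUPAC ambiguity codes.

```r
# Hypothetical helper (not part of Biostrings): build a square, symmetric
# match/mismatch matrix restricted to the four plain bases.
simple_nucleotide_matrix <- function(match = 1, mismatch = 0, type = "DNA") {
  # RNA uses "U" where DNA uses "T"
  bases <- if (type == "RNA") c("A", "C", "G", "U") else c("A", "C", "G", "T")
  # Start with the mismatch score everywhere ...
  m <- matrix(mismatch, nrow = 4, ncol = 4, dimnames = list(bases, bases))
  # ... then put the match score on the diagonal
  diag(m) <- match
  m
}

# Edit-distance style scoring: 0 for a match, -1 for a mismatch
edit_mat <- simple_nucleotide_matrix(match = 0, mismatch = -1)
edit_mat
```

Because the row and column names are the base letters, individual scores can be looked up by name, for example edit_mat["A", "C"].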
Details
The BLOSUM and PAM matrices are not unique. For example, the definition of the widely used BLOSUM62 matrix varies depending on the source, and even a given source can provide different versions of "BLOSUM62" without keeping track of the changes over time. NCBI provides many matrices at ftp://ftp.ncbi.nih.gov/blast/matrices/, but their definitions do not match those of the matrices bundled with the stand-alone BLAST software available at ftp://ftp.ncbi.nih.gov/blast/
The BLOSUM45, BLOSUM62, BLOSUM80, PAM30 and PAM70 matrices were taken from the NCBI stand-alone BLAST software.
The BLOSUM50, BLOSUM100, PAM40, PAM120 and PAM250 matrices were taken from ftp://ftp.ncbi.nih.gov/blast/matrices/
The quality matrices computed in qualitySubstitutionMatrices are based on the paper by Ketil Malde. Let ε_i be the probability of an error in base read i. For "Phred" quality measures Q in [0, 99], these error probabilities are given by ε_i = 10^{-Q/10}. For "Solexa" quality measures Q in [-5, 99], they are given by ε_i = 1 - 1/(1 + 10^{-Q/10}). Assuming independence within and between base reads, the combined error probability of a mismatch when the underlying bases do match is ε_c = ε_1 + ε_2 - (n/(n-1)) * ε_1 * ε_2, where n is the number of letters in the underlying alphabet. Using ε_c, the substitution score is given by b * \log_2(γ_{x,y} * (1 - ε_c) * n + (1 - γ_{x,y}) * ε_c * (n/(n-1))), where b is the bit-scaling for the scoring and γ_{x,y} is the probability that characters x and y represent the same underlying information (e.g. using IUPAC, γ_{A,A} = 1 and γ_{A,N} = 1/4). In the arguments listed above, fuzzyMatch represents γ_{x,y} and errorProbability represents ε_i.
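The formulas above can be evaluated directly in base R. The following is a sketch under stated assumptions, not the package's implementation: the function names phred_error, solexa_error, combined_error and substitution_score are hypothetical, and whether the results agree exactly with the values returned by qualitySubstitutionMatrices has not been verified here.

```r
# Error probability of a single base read, from its quality value Q
phred_error  <- function(Q) 10^(-Q / 10)
solexa_error <- function(Q) 1 - 1 / (1 + 10^(-Q / 10))

# Combined error probability epsilon_c for two reads with error
# probabilities e1 and e2, over an alphabet of n letters
combined_error <- function(e1, e2, n = 4) {
  e1 + e2 - (n / (n - 1)) * e1 * e2
}

# Substitution score; gamma is the probability that the two characters
# represent the same underlying information (1 for an exact match,
# 0 for a definite mismatch), and b is the bit scale
substitution_score <- function(e1, e2, gamma, n = 4, b = 1) {
  ec <- combined_error(e1, e2, n)
  b * log2(gamma * (1 - ec) * n + (1 - gamma) * ec * (n / (n - 1)))
}

# Two reads of Phred quality 22 (the default quality mentioned in the examples)
e <- phred_error(22)
match_score    <- substitution_score(e, e, gamma = 1)
mismatch_score <- substitution_score(e, e, gamma = 0)
c(match = match_score, mismatch = mismatch_score)
```

With error-free reads (e1 = e2 = 0) and gamma = 1 the expression reduces to b * log2(n), which is 2 bits for a four-letter alphabet. As the error probabilities grow, the match score shrinks towards zero.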
Author(s)
H. Pages and P. Aboyoun
See Also
pairwiseAlignment, PairwiseAlignedXStringSet-class, DNAString-class, AAString-class, PhredQuality-class, SolexaQuality-class
Examples
library(Biostrings)
s1 <-
  DNAString("ACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCAAG")
s2 <-
  DNAString("GTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC")
## Fit a global pairwise alignment using edit distance scoring
pairwiseAlignment(s1, s2,
                  substitutionMatrix = nucleotideSubstitutionMatrix(0, -1, TRUE),
                  gapOpening = 0, gapExtension = -1)
## Examine quality-based match and mismatch bit scores for DNA/RNA
## strings in pairwiseAlignment.
## By default patternQuality and subjectQuality are PhredQuality(22L).
qualityMatrices <- qualitySubstitutionMatrices()
qualityMatrices["22", "22", "1"]
qualityMatrices["22", "22", "0"]
pairwiseAlignment(s1, s2)
## Get the substitution scores when the error probability is 0.1
subscores <- errorSubstitutionMatrices(errorProbability = 0.1)
submat <- matrix(subscores[,,"0"], 4, 4)
diag(submat) <- subscores[,,"1"]
dimnames(submat) <- list(DNA_ALPHABET[1:4], DNA_ALPHABET[1:4])
submat
pairwiseAlignment(s1, s2, substitutionMatrix = submat)
## Align two amino acid sequences with the BLOSUM62 matrix
aa1 <- AAString("HXBLVYMGCHFDCXVBEHIKQZ")
aa2 <- AAString("QRNYMYCFQCISGNEYKQN")
pairwiseAlignment(aa1, aa2, substitutionMatrix = "BLOSUM62", gapOpening = -3, gapExtension = -1)
## See how the gap penalty influences the alignment
pairwiseAlignment(aa1, aa2, substitutionMatrix = "BLOSUM62", gapOpening = -6, gapExtension = -2)
## See how the substitution matrix influences the alignment
pairwiseAlignment(aa1, aa2, substitutionMatrix = "BLOSUM50", gapOpening = -3, gapExtension = -1)
if (interactive()) {
  ## Compare our BLOSUM62 with BLOSUM62 from ftp://ftp.ncbi.nih.gov/blast/matrices/
  data(BLOSUM62)
  BLOSUM62["Q", "Z"]
  file <- "ftp://ftp.ncbi.nih.gov/blast/matrices/BLOSUM62"
  b62 <- as.matrix(read.table(file, check.names=FALSE))
  b62["Q", "Z"]
}
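As a cross-check on the first example, the edit distance scoring (match 0, mismatch -1, no gap opening penalty, gap extension -1) can be compared against the Levenshtein distance computed by base R. The sketch below uses utils::adist on the same two sequences as plain character strings. The variable names are illustrative. It assumes that pairwiseAlignment returns the optimal global alignment, in which case its score should equal the negated distance.

```r
# Same sequences as s1 and s2 above, kept as plain character strings
seq1 <- "ACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCAAG"
seq2 <- "GTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC"

# adist() returns a 1 x 1 matrix of Levenshtein distances with unit
# costs for insertions, deletions and substitutions
lev <- drop(utils::adist(seq1, seq2))

# Under edit distance scoring, the expected global alignment score is
# the negated distance
expected_score <- -lev
expected_score
```

If Biostrings is loaded, this value can be compared with score() applied to the result of the first pairwiseAlignment call.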