DistanceMatrix(DECIPHER)
DistanceMatrix() belongs to the R package DECIPHER
Calculate the Distance Between DNA Sequences
Translator: 生物统计家园网 robot LoveR
Description
Calculates a distance matrix for a DNAStringSet. Each element of the distance matrix corresponds to the dissimilarity between two sequences in the DNAStringSet.
Usage
DistanceMatrix(myDNAStringSet,
               includeTerminalGaps = FALSE,
               penalizeGapLetterMatches = TRUE,
               penalizeGapGapMatches = FALSE,
               removeDuplicates = FALSE,
               correction = "none",
               verbose = TRUE)
Arguments
Argument: myDNAStringSet
A DNAStringSet object of aligned sequences.
Argument: includeTerminalGaps
Logical specifying whether or not to include terminal gaps ("-" characters on each end of the sequence) in the calculation of distance.
Argument: penalizeGapLetterMatches
Logical specifying whether or not to consider gap-to-letter matches as mismatches.
Argument: penalizeGapGapMatches
Logical specifying whether or not to consider gap-to-gap matches as mismatches.
Argument: removeDuplicates
Logical specifying whether to remove any identical sequences from the DNAStringSet before calculating distance. If FALSE (the default) then the distance matrix is calculated with the entire DNAStringSet provided as input.
Argument: correction
The substitution model used for distance correction. This should be (an unambiguous abbreviation of) one of "none" or "Jukes-Cantor".
Argument: verbose
Logical indicating whether to display progress.
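The "Jukes-Cantor" option of the correction argument names a substitution model. As a rough illustration, the standard Jukes-Cantor formula converts an uncorrected proportion of differences p into a corrected distance d = -(3/4) * log(1 - (4/3) * p). The sketch below is plain base R; jc_correct is a hypothetical helper name, and it is an assumption (not confirmed by this page) that DECIPHER applies exactly this form.

```r
# Standard Jukes-Cantor correction (textbook form).
# jc_correct is a hypothetical helper, not a DECIPHER function.
jc_correct <- function(p) {
  # p: uncorrected proportion of differing positions
  out <- rep(NA_real_, length(p))          # NA input stays NA
  ok <- !is.na(p) & p < 0.75               # formula is defined only below 0.75
  out[ok] <- -0.75 * log(1 - (4 / 3) * p[ok])
  out[!is.na(p) & p >= 0.75] <- Inf        # sketch choice: treat saturation as Inf
  out
}

jc_correct(c(0, 0.25, NA))   # 0.25 maps to roughly 0.304
```

The corrected value is always at least as large as the uncorrected one, because the model accounts for multiple substitutions at the same position.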
Details
The uncorrected distance matrix gives the proportion of differing positions between each pair of sequences in the DNAStringSet (for example, 0.25 means the sequences differ at 25% of the compared positions). Ambiguity can be represented using the characters of the IUPAC_CODE_MAP. For example, the distance between an 'N' and any other base is zero.
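To make the ambiguity rule concrete, here is a minimal base-R sketch of an uncorrected pairwise distance. It is not DECIPHER's implementation: the iupac list is a hand-written subset of the IUPAC codes, compatible and pair_distance are hypothetical helper names, and every column containing a gap is simply skipped, which corresponds to the documented default only when the gaps are terminal.

```r
# Hypothetical subset of the IUPAC ambiguity codes (only what the example needs).
iupac <- list(A = "A", C = "C", G = "G", T = "T",
              N = c("A", "C", "G", "T"), R = c("A", "G"), Y = c("C", "T"))

# TRUE when two letters share at least one possible base.
compatible <- function(x, y) {
  length(intersect(iupac[[x]], iupac[[y]])) > 0
}

# Proportion of mismatching columns among columns where both
# aligned sequences carry a letter; NA when no such column exists.
pair_distance <- function(s1, s2) {
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  keep <- a != "-" & b != "-"
  if (!any(keep)) return(NA_real_)
  mismatch <- !mapply(compatible, a[keep], b[keep])
  sum(mismatch) / sum(keep)
}

pair_distance("ANGCT-", "-ACCT-")   # N/A counts as a match; only G/C differs
```

For the two sequences used in the Examples section this gives 1 mismatch in 4 compared columns, the same 0.25 reported there for the default settings.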
If includeTerminalGaps = FALSE then terminal gaps are not included in sequence length. This can be faster since only the positions common to both sequences in each pair are compared. If removeDuplicates = TRUE then the distance matrix will only represent unique sequences in the DNAStringSet. This can be faster because fewer sequences need to be compared. For example, if only two sequences in the set are exact duplicates then one is removed and the distance is calculated on the remaining set. Note that the distance matrix can still contain distances of zero (100% identity) after removing duplicates, because only exact duplicates are removed, without taking into account ambiguous matches represented by the IUPAC_CODE_MAP or the treatment of gaps.
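The point about exact duplicates can be illustrated in base R without the package. In this sketch the sequences are plain character strings and the names s1 to s3 are made up for illustration. Exact-string de-duplication keeps "ANGT" alongside "ACGT", even though the two do not differ once 'N' is treated as matching any base.

```r
# Three toy sequences: s2 is an exact copy of s1, s3 differs only by an 'N'.
seqs <- c(s1 = "ACGT", s2 = "ACGT", s3 = "ANGT")

# Exact-duplicate removal, analogous in spirit to removeDuplicates = TRUE.
uniq <- seqs[!duplicated(seqs)]
names(uniq)   # "s1" "s3"

# Compare the survivors position by position, letting 'N' match anything.
a <- strsplit(uniq[["s1"]], "")[[1]]
b <- strsplit(uniq[["s3"]], "")[[1]]
differs <- a != b & a != "N" & b != "N"
sum(differs) / length(a)   # 0
```

Two distinct strings therefore survive de-duplication yet sit at distance zero from each other.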
The elements of the distance matrix can be referenced by dimnames corresponding to the names of the DNAStringSet. Additionally, an attribute named "correction" specifying the method of correction used can be accessed using the function attr.
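The access pattern described above can be tried on a hand-built stand-in. The matrix below is not produced by DistanceMatrix; it is constructed by hand to have the documented shape (dimnames taken from sequence names, plus a "correction" attribute), and the names seqA and seqB are hypothetical.

```r
# Hand-built stand-in with the documented structure of the return value.
d <- matrix(c(0, 0.25, 0.25, 0), nrow = 2,
            dimnames = list(c("seqA", "seqB"), c("seqA", "seqB")))
attr(d, "correction") <- "none"

d["seqA", "seqB"]       # look up a distance by sequence names
attr(d, "correction")   # read the recorded correction method
```

With DECIPHER installed, the same two accessors should apply to the result of DistanceMatrix() once names() has been set on the input DNAStringSet.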
Value
A symmetric matrix where each element is the distance between the sequences referenced by the respective row and column. The dimnames of the matrix correspond to the names of the DNAStringSet. Sequences with no overlapping positions in the alignment are given a value of NA.
Author(s)
Erik Wright <DECIPHER@cae.wisc.edu>
See Also
IdClusters
Examples
library(DECIPHER)

# defaults compare intersection of internal ranges:
dna <- DNAStringSet(c("ANGCT-","-ACCT-"))
d <- DistanceMatrix(dna)
# d[1,2] is 1 base in 4 = 0.25

# compare union of internal ranges:
dna <- DNAStringSet(c("ANGCT-","-ACCT-"))
d <- DistanceMatrix(dna, includeTerminalGaps=TRUE)
# d[1,2] is now 2 bases in 5 = 0.40

# compare the entire sequence ranges:
dna <- DNAStringSet(c("ANGCT-","-ACCT-"))
d <- DistanceMatrix(dna, includeTerminalGaps=TRUE,
                    penalizeGapGapMatches=TRUE)
# d[1,2] is now 3 bases in 6 = 0.50
When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).