adist(utils)
adist() belongs to the R package: utils
Approximate String Distances
Translator: LoveR, the translation bot of biostatistic.net (生物统计家园网)
----------Description----------
Compute the approximate string distance between character vectors. The distance is a generalized Levenshtein (edit) distance: the minimal, possibly weighted, number of insertions, deletions and substitutions needed to transform one string into another.
----------Usage----------
adist(x, y = NULL, costs = NULL, counts = FALSE, fixed = TRUE,
partial = !fixed, ignore.case = FALSE, useBytes = FALSE)
----------Arguments----------
Argument: x
a character vector.
Argument: y
a character vector, or NULL (default), in which case x is also used as y.
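As a quick illustration of the y = NULL default, the sketch below compares a small, made-up vector of words with itself. With unit costs the result should be a square, symmetric matrix with zeros on the diagonal.

```r
## With y = NULL (the default), x is compared with itself.
words <- c("kitten", "sitting", "mitten")   # illustrative strings
d <- adist(words)
d                              # 3 x 3 matrix of pairwise distances
all(d == adist(words, words))  # same values as passing x twice
```
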
Argument: costs
a numeric vector or list with names partially matching "insertions", "deletions" and "substitutions", giving the respective costs for computing the Levenshtein distance, or NULL (default), indicating unit cost for all three possible transformations.
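The sketch below raises the substitution cost to 2 for the kitten/sitting example used later on this page, once with full names and once with abbreviated names (relying on the partial name matching described above). A substitution then costs as much as a deletion plus an insertion, so each of the two substitutions costs 2 and the distance should rise from 3 to 5.

```r
## Default unit costs: two substitutions and one insertion.
adist("kitten", "sitting")   # distance 3
## Substitutions now cost 2; insertions and deletions still cost 1.
adist("kitten", "sitting",
      costs = c(insertions = 1, deletions = 1, substitutions = 2))
## Names are matched partially, so abbreviations should work as well.
adist("kitten", "sitting", costs = c(ins = 1, del = 1, sub = 2))
```
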
Argument: counts
a logical indicating whether to also return the transformation counts (numbers of insertions, deletions and substitutions) as the "counts" attribute of the return value.
Argument: fixed
a logical. If TRUE (default), the x elements are used as string literals. Otherwise, they are taken as regular expressions and partial = TRUE is implied (corresponding to the approximate string distance used by agrep with fixed = FALSE).
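A minimal sketch of fixed = FALSE, reusing the target string from the agrep examples below. The pattern "la.y" is a made-up regular expression in which "." matches any single character. Because partial = TRUE is implied, the pattern only needs to match a substring of y.

```r
## Literal pattern (default): "." is an ordinary character, so matching the
## substring "lazy" needs one substitution ("." -> "z").
adist("la.y", "1 lazy 2", partial = TRUE)
## Regular expression: "." matches any character, so "lazy" matches exactly.
adist("la.y", "1 lazy 2", fixed = FALSE)
```
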
Argument: partial
a logical indicating whether the transformed x elements must exactly match the complete y elements, or only substrings of these. The latter corresponds to the approximate string distance used by agrep (by default).
Argument: ignore.case
a logical. If TRUE, case is ignored when computing the distances.
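A short illustration with made-up strings that differ only in the case of their first letter:

```r
adist("Kitten", "kitten")                      # "K" vs "k" is a substitution: 1
adist("Kitten", "kitten", ignore.case = TRUE)  # case ignored: 0
```
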
Argument: useBytes
a logical. If TRUE, distance computations are done byte-by-byte rather than character-by-character.
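The sketch below contrasts character-wise and byte-wise distances for a string containing a non-ASCII character. It assumes the string is UTF-8 encoded, which the \u escape guarantees. In UTF-8, "é" occupies two bytes, so the byte-wise distance should be larger.

```r
x <- "caf\u00e9"            # "café"; the last character is 2 bytes in UTF-8
nchar(x, type = "chars")    # 4 characters
nchar(x, type = "bytes")    # 5 bytes
adist(x, "cafe")                   # character-wise: 1 substitution
adist(x, "cafe", useBytes = TRUE)  # byte-wise: 1 substitution + 1 deletion = 2
```
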
----------Details----------
The (generalized) Levenshtein (or edit) distance between two strings s and t is the minimal, possibly weighted, number of insertions, deletions and substitutions needed to transform s into t (so that the transformation exactly matches t). This distance is computed for partial = FALSE, currently using a dynamic programming algorithm (see, e.g., http://en.wikipedia.org/wiki/Levenshtein_distance) with space and time complexity O(mn), where m and n are the lengths of s and t, respectively. Additionally computing the transformation sequence and counts is O(max(m, n)).
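To make the O(mn) recurrence concrete, here is a plain-R sketch of the standard dynamic-programming table for the weighted Levenshtein distance. lev_dp() is a hypothetical helper written for illustration only; it is not the C code that adist() runs, but for full (non-partial) matching it should give the same values.

```r
## Sketch of the O(mn) dynamic-programming recurrence for the weighted
## Levenshtein distance. lev_dp() is a hypothetical illustration, not the
## implementation used by adist().
lev_dp <- function(s, t, ins = 1, del = 1, sub = 1) {
  a <- utf8ToInt(enc2utf8(s))  # code points of s
  b <- utf8ToInt(enc2utf8(t))  # code points of t
  m <- length(a)
  n <- length(b)
  ## D[i + 1, j + 1]: cost of turning the first i characters of s
  ## into the first j characters of t.
  D <- matrix(0, nrow = m + 1, ncol = n + 1)
  D[, 1] <- (0:m) * del  # delete the first i characters of s
  D[1, ] <- (0:n) * ins  # insert the first j characters of t
  for (i in seq_len(m)) {
    for (j in seq_len(n)) {
      D[i + 1, j + 1] <- min(
        D[i, j + 1] + del,                      # delete a[i]
        D[i + 1, j] + ins,                      # insert b[j]
        D[i, j] + if (a[i] == b[j]) 0 else sub  # match or substitute
      )
    }
  }
  D[m + 1, n + 1]
}

lev_dp("kitten", "sitting")           # 3, as adist("kitten", "sitting")
lev_dp("kitten", "sitting", ins = 2)  # 4: the single insertion now costs 2
```

Filling the whole (m + 1) x (n + 1) table is what gives the O(mn) time and space cost mentioned above.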
The generalized Levenshtein distance can also be used for approximate (fuzzy) string matching, in which case one finds the substring of t with minimal distance to the pattern s (which could be taken as a regular expression, in which case the principle of using the leftmost and longest match applies); see, e.g., http://en.wikipedia.org/wiki/Approximate_string_matching. This distance is computed for partial = TRUE using tre by Ville Laurikari (http://laurikari.net/tre/) and corresponds to the distance used by agrep. In this case, the given cost values are coerced to integer.
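The sketch below contrasts the full and partial distances for the strings from the agrep examples. In the partial case only the best-matching substring of y ("lazy") has to be produced. It also requests counts = TRUE, which, per the Value section, should add an "offsets" attribute giving the first and last positions of the matched substring.

```r
x <- "lasy"
y <- "1 lazy 2"
## Full distance: all of y must be produced (e.g. 4 insertions + 1 substitution).
adist(x, y)                    # 5
## Partial distance: best-matching substring of y is "lazy" (1 substitution).
adist(x, y, partial = TRUE)    # 1
## Positions (first, last) of the matched substring within y.
r <- adist(x, y, partial = TRUE, counts = TRUE)
attr(r, "offsets")
```
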
Note that the costs for insertions and deletions can be different, in which case the distance between s and t can differ from the distance between t and s.
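A small sketch of this asymmetry, using illustrative strings and costs where an insertion costs twice as much as a deletion:

```r
## Insertions are twice as expensive as deletions here.
w <- c(insertions = 2, deletions = 1, substitutions = 1)
m <- adist(c("ab", "abc"), costs = w)
m
## m[1, 2]: "ab" -> "abc" needs one insertion (cost 2);
## m[2, 1]: "abc" -> "ab" needs one deletion (cost 1).
m[1, 2] == m[2, 1]   # FALSE: the distance matrix is not symmetric
```
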
----------Value----------
A matrix with the approximate string distances of the elements of x and y, with rows and columns corresponding to x and y, respectively.
If counts is TRUE, the transformation counts are returned as the "counts" attribute of this matrix, as a 3-dimensional array with dimensions corresponding to the elements of x, the elements of y, and the type of transformation (insertions, deletions and substitutions), respectively. Additionally, if partial = FALSE, the transformation sequences are returned as the "trafos" attribute of the return value, as character strings with elements M, I, D and S indicating a match, insertion, deletion and substitution, respectively. If partial = TRUE, the offsets (positions of the first and last element) of the matched substrings are returned as the "offsets" attribute of the return value (with both offsets -1 in case of no match).
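The sketch below reads the "counts" and "trafos" attributes for a single pair of strings. With unit costs, the distance should equal the total number of operations.

```r
d <- adist("kitten", "sitting", counts = TRUE)
cnt <- attr(d, "counts")    # 1 x 1 x 3 array: x element, y element, type
cnt[1, 1, ]                 # ins = 1, del = 0, sub = 2
attr(d, "trafos")[1, 1]     # "SMMMSMI": S(k->s) M M M S(e->i) M I(g)
## With unit costs the distance equals the total number of operations.
d[1, 1] == sum(cnt[1, 1, ])
```
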
----------See Also----------
agrep for approximate string matching (fuzzy matching) using the generalized Levenshtein distance.
----------Examples----------
## Cf. http://en.wikipedia.org/wiki/Levenshtein_distance
adist("kitten", "sitting")
## To see the transformation counts for the Levenshtein distance:
drop(attr(adist("kitten", "sitting", counts = TRUE), "counts"))
## To see the transformation sequences:
attr(adist(c("kitten", "sitting"), counts = TRUE), "trafos")
## Cf. the examples for agrep:
adist("lasy", "1 lazy 2")
## For a "partial approximate match" (as used for agrep):
adist("lasy", "1 lazy 2", partial = TRUE)
Reposted from biostatistic.net (http://www.biostatistic.net); please credit the source when reposting.
Note: This document was translated by LoveR, the translation bot of biostatistic.net, for personal reference while learning R; biostatistic.net reserves the copyright.