R language: agrep() function help documentation

Posted on 2012-2-17 09:53:35
agrep(base)
R package containing agrep(): base

                                        Approximate String Matching (Fuzzy Matching)

                                        Translated by: LoveR, the 生物统计家园网 robot

----------Description----------

Searches for approximate matches to pattern (the first argument) within each element of the character vector x (the second argument) using the generalized Levenshtein edit distance (the minimal, possibly weighted, number of insertions, deletions and substitutions needed to transform one string into another).


----------Usage----------


agrep(pattern, x, max.distance = 0.1, costs = NULL,
      ignore.case = FALSE, value = FALSE, fixed = TRUE,
      useBytes = FALSE)



----------Arguments----------

Argument: pattern
A non-empty character string, or a character string containing a regular expression (for fixed = FALSE), to be matched. Coerced by as.character to a string if possible.


Argument: x
Character vector where matches are sought. Coerced by as.character to a character vector if possible.


Argument: max.distance
Maximum distance allowed for a match. Expressed either as an integer, or as a fraction of the pattern length times the maximal transformation cost (the fraction is replaced by the smallest integer not less than it), or as a list with possible components:

cost: maximum number/fraction of match cost (generalized Levenshtein distance)

all: maximum number/fraction of all transformations (insertions, deletions and substitutions)

insertions: maximum number/fraction of insertions

deletions: maximum number/fraction of deletions

substitutions: maximum number/fraction of substitutions

If cost is not given, all defaults to 10%, and the other transformation number bounds default to all. The component names can be abbreviated.
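The three ways of writing max.distance can be compared on the strings used in this page's examples. This is a minimal sketch, assuming the default unit costs; the variable names are illustrative only.

```r
x <- c("1 lazy", "1", "1 LAZY")

# Default bound: 10% of the 5-character pattern, rounded up to 1 edit.
# "laysy" needs 2 edits to reach "lazy", so nothing should match.
by_default <- agrep("laysy", x)

# Integer bound: allow up to 2 edits.
by_count <- agrep("laysy", x, max.distance = 2)

# Fractional bound: 0.4 * 5 characters * unit cost = 2 edits as well.
by_fraction <- agrep("laysy", x, max.distance = 0.4)

# List bound: forbid substitutions; "all" keeps its 10% default.
no_subst <- agrep("lasy", c(" 1 lazy 2", "1 lasy 2"),
                  max.distance = list(substitutions = 0))
```

Writing the bound as a list is the only way to restrict one kind of edit separately from the others.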


Argument: costs
A numeric vector or list with names partially matching insertions, deletions and substitutions, giving the respective costs for computing the generalized Levenshtein distance, or NULL (default), which means unit cost for all three possible transformations. Coerced to integer via as.integer if possible.
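The effect of costs shows up when the same edit is priced differently. The sketch below assumes a hypothetical cost vector named weighted in which a substitution costs 2; the strings come from this page's examples.

```r
# With unit costs, one substitution (s -> z) fits a cost bound of 1.
unit_cost <- agrep("lasy", "1 lazy 2", max.distance = 1)

# Price substitutions at 2: the same edit now exceeds a bound of 1 ...
weighted <- c(insertions = 1, deletions = 1, substitutions = 2)
too_costly <- agrep("lasy", "1 lazy 2", max.distance = 1, costs = weighted)

# ... but fits again once the bound is raised to 2.
affordable <- agrep("lasy", "1 lazy 2", max.distance = 2, costs = weighted)
```

Here max.distance is a plain number, so it bounds the total match cost, not the number of edits.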


Argument: ignore.case
If FALSE, the pattern matching is case sensitive; if TRUE, case is ignored during matching.


Argument: value
If FALSE, a vector containing the (integer) indices of the matching elements is returned; if TRUE, a vector containing the matching elements themselves is returned.


Argument: fixed
Logical. If TRUE (default), the pattern is matched literally (as is). Otherwise, it is matched as a regular expression.
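The difference between literal and regular-expression matching can be sketched as follows. The pattern "la[sz]y" and the vector x are made up for illustration, and max.distance = 0 is used to switch off fuzziness so that only the effect of fixed is visible.

```r
x <- c("1 lazy 2", "1 lasy 2", "1 laxy 2")

# Literal matching: no element contains the text "la[sz]y" itself.
literal <- agrep("la[sz]y", x, max.distance = 0)

# Regular expression, no edits allowed: an exact regex search.
regex_exact <- agrep("la[sz]y", x, max.distance = 0, fixed = FALSE)

# Regular expression with one edit allowed: "laxy" is now close enough.
regex_fuzzy <- agrep("la[sz]y", x, max.distance = 1, fixed = FALSE)
```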


Argument: useBytes
Logical. In a multibyte locale, should the comparison be character-by-character (the default) or byte-by-byte?


----------Details----------

The Levenshtein edit distance is used as the measure of approximateness: it is the (possibly cost-weighted) total number of insertions, deletions and substitutions required to transform one string into another.
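The distance itself can be inspected with adist() from package utils, which this page's Note also points to. The following is a small sketch; the cost vector pricing a substitution at 3 is an assumption chosen for illustration.

```r
# One substitution (s -> z) turns "lasy" into "lazy".
d_one <- drop(utils::adist("lasy", "lazy"))

# "laysy" -> "lazy" takes two edits: substitute y -> z, delete s.
d_two <- drop(utils::adist("laysy", "lazy"))

# Cost-weighted: with substitutions priced at 3, deleting "s" and
# inserting "z" (total cost 2) is cheaper than substituting.
d_weighted <- drop(utils::adist(
  "lasy", "lazy",
  costs = c(insertions = 1, deletions = 1, substitutions = 3)
))
```

The weighted case shows why the distance is defined as a minimum: the cheapest sequence of edits wins, whatever kinds of edits it uses.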

As of R 2.10.0 this uses tre by Ville Laurikari (http://laurikari.net/tre/), which supports MBCS character matching much better than the previous version.

The main effect of useBytes is to avoid errors/warnings about invalid inputs and spurious matches in multibyte locales. It inhibits the conversion of inputs with marked encodings, and is forced if any input is found which is marked as "bytes".


----------Value----------

Either a vector giving the indices of the elements that yielded a match, or, if value is TRUE, the matched elements (after coercion, preserving names but no other attributes).
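Both return forms can be seen side by side on a named vector. This is a minimal sketch; the element names first, second and third are invented for the illustration.

```r
x <- c(first = "1 lazy", second = "1", third = "1 LAZY")

# Default: positions of the matching elements.
idx <- agrep("lazy", x)

# value = TRUE: the matching elements themselves, keeping their names.
hits <- agrep("lazy", x, value = TRUE)
```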


----------Note----------

Since someone who read the description carelessly even filed a bug report on it, do note that this matches substrings of each element of x (just as grep does) and not whole elements. See adist in package utils, which optionally returns the offsets of the matched substrings.
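The substring behaviour can be contrasted with whole-element matching as follows. This is one possible sketch, not the only approach: it filters on adist() distances to get whole-element matches, and the names whole_dist and whole_hits are illustrative.

```r
x <- c("1 lazy 2", "lazy")

# agrep() accepts both: each contains a substring within 1 edit of "lasy".
substring_hits <- agrep("lasy", x, max.distance = 1)

# Distance from the pattern to each whole element.
whole_dist <- drop(utils::adist("lasy", x))

# Keep only elements that are within 1 edit as a whole.
whole_hits <- which(whole_dist <= 1)

# partial = TRUE measures the best substring match instead, as agrep() does.
partial_dist <- drop(utils::adist("lasy", x, partial = TRUE))
```

Filtering on the full distance is useful when the elements of x are short keys, such as names or codes, and a match buried inside a longer string would be a false positive.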


----------Author(s)----------



Original version by David Meyer.
Current version by Brian Ripley and Kurt Hornik.




----------See Also----------

grep


----------Examples----------


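# One substitution (s -> z) is within the default bound.
agrep("lasy", "1 lazy 2")
# Forbid substitutions: only the element containing "lasy" itself matches.
agrep("lasy", c(" 1 lazy 2", "1 lasy 2"), max.distance = list(sub = 0))
# Allow up to 2 edits; matching is case sensitive by default.
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max.distance = 2)
# Return the matching elements instead of their indices.
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max.distance = 2, value = TRUE)
# Ignore case, so "1 LAZY" can match as well.
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max.distance = 2, ignore.case = TRUE)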

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).


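Notes:
Note 1: For the convenience of learners, this document was translated by LoveR, the 生物统计家园网 robot. It is for personal R study and reference only; 生物统计家园 reserves the copyright.
Note 2: Because the translation was produced automatically, it inevitably contains inaccuracies; read it carefully against the original English text.
Note 3: If you find an inaccuracy, please reply to this thread and it will be revised over time.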