R RTextTools package: wordStem() function help documentation

loveR · posted 2012-9-28 22:43:57

wordStem(RTextTools)
R package containing wordStem(): RTextTools

Get the common root/stem of words

Translator: LoveR, the Biostatistic.net robot

Description

This function computes the stem of each word in the given character vector. Stemming reduces a word to its base component, making it easier to compare related words such as win, winning, and winner. See http://snowball.tartarus.org/ for more information about the concept and algorithms for stemming.
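
To make the comparison point concrete, here is a minimal sketch of grouping words by their stem. The `toy_stem` helper is a hypothetical stand-in that only strips a few suffixes; it is not the Porter algorithm and not part of RTextTools. With the package installed, `wordStem` could presumably be passed as `stem_fun` instead.

```r
# Hypothetical stand-in stemmer for illustration only: strips a few English
# suffixes with a regular expression. This is NOT the Porter algorithm.
toy_stem <- function(words) {
  sub("(ning|ing|s)$", "", words)
}

# Group words that share a stem. `stem_fun` is any function that maps a
# character vector to a character vector of the same length.
group_by_stem <- function(words, stem_fun = toy_stem) {
  split(words, stem_fun(words))
}

groups <- group_by_stem(c("win", "winning", "wins", "run", "running"))
print(groups)
```

Each list element is named after a stem and holds the original words that reduce to it, which is the kind of comparison stemming is meant to simplify.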

Usage

wordStem(words, language = character(), warnTested = FALSE)

Arguments

Argument: words
A character vector of words whose stems are to be computed.

Argument: language
The name of a language recognized by the package. This should be either a single string that is an element of the vector returned by getStemLanguages, or a character vector of length 3 giving the names of the routines for creating a Snowball SN_env environment, closing it, and performing the stemming (in that order). See the example below.

Argument: warnTested
An option controlling whether a warning is issued about languages that have not been explicitly tested as part of the unit tests of the code. For the most part these warnings can be ignored, so they are turned off by default. In the future this might be controlled with a global option, but for now the warnings are suppressed by default.
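
The accepted shapes of the `language` argument can be sketched as a small classifier. This is only an illustration of the rules stated above: `classify_language` is a hypothetical helper, not part of RTextTools, and `known` is an assumed subset standing in for whatever getStemLanguages actually returns. What the package does with the empty default from the Usage line is not specified here, so the sketch merely labels it.

```r
# Hypothetical helper (not part of RTextTools): classify a `language`
# argument according to the forms described in the Arguments section.
#   known_languages: stand-in for the vector returned by getStemLanguages
classify_language <- function(language, known_languages) {
  stopifnot(is.character(language))
  if (length(language) == 0) {
    "default"            # character(), the default in the Usage line
  } else if (length(language) == 1) {
    if (!language %in% known_languages) {
      stop("unrecognized language: ", language)
    }
    "built-in"           # a single recognized language name
  } else if (length(language) == 3) {
    "custom routines"    # create, close, stem routines (in that order)
  } else {
    stop("`language` must have length 0, 1, or 3")
  }
}

known <- c("english", "french", "german")  # assumed, illustrative subset
kind_builtin <- classify_language("english", known)
kind_custom  <- classify_language(
  c("testDynCreate", "testDynClose", "testDynStem"), known)
kind_default <- classify_language(character(), known)
```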

Details

This uses Dr. Martin Porter's stemming algorithm and the interface generated by Snowball (http://snowball.tartarus.org/).
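
For a feel of how a Porter-style stemmer works, the sketch below implements only step 1a of Porter's original 1980 algorithm (plural handling) in plain R. It is not the package's code, which relies on routines generated by Snowball, and the Snowball English stemmer may differ in detail; the function name `porter_step_1a` is made up for this illustration.

```r
# Step 1a of Porter's original (1980) algorithm, for lowercase words:
#   SSES -> SS, IES -> I, SS -> SS, S -> (removed)
# Only the first matching rule (the longest suffix) is applied per word.
porter_step_1a <- function(words) {
  vapply(words, function(w) {
    if (grepl("sses$", w)) {
      sub("sses$", "ss", w)
    } else if (grepl("ies$", w)) {
      sub("ies$", "i", w)
    } else if (grepl("ss$", w)) {
      w                      # SS -> SS: leave the word unchanged
    } else if (grepl("s$", w)) {
      sub("s$", "", w)
    } else {
      w
    }
  }, character(1), USE.NAMES = FALSE)
}

step1a <- porter_step_1a(c("caresses", "ponies", "ties", "caress", "cats"))
print(step1a)
```

The full algorithm chains several such suffix-rewriting steps, which is why intermediate forms like "poni" are not dictionary words.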

Value

A character vector with as many elements as the input vector, where each element is the stem of the corresponding input word.

Author(s)

Duncan Temple Lang <duncan@wald.ucdavis.edu>

Examples

# Simple example
# "win" "win" "winner"
wordStem(c("win", "winning", "winner"))

# Test against the supplied vocabulary.
testWords = readLines(system.file("words", "english", "voc.txt", package = "RTextTools"))
validate = readLines(system.file("words", "english", "output.txt", package = "RTextTools"))

## Not run:
# Read the test words directly from the Snowball site over the Web
testWords = readLines(url("http://snowball.tartarus.org/english/voc.txt"))

## End(Not run)

testOut = wordStem(testWords)
all(validate == testOut)

# Specify one of the built-in languages explicitly.
testOut = wordStem(testWords, "english")
all(validate == testOut)

# Illustrate the dynamic lookup of symbols, which makes it easy to add new
# languages or to supply custom create and close environment routines
# (for example, to manage pools if efficiency were an issue).
testOut = wordStem(testWords, c("testDynCreate", "testDynClose", "testDynStem"))
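
The `all(validate == testOut)` checks above collapse the comparison to a single TRUE or FALSE. As a possible complement, here is a minimal sketch of a helper that lists which words disagree. `stem_mismatches` is a hypothetical name, and the small vectors are made up so the block runs without the package; the third "actual" stem is deliberately wrong.

```r
# Hypothetical helper: report the positions where computed stems differ
# from the expected ones, instead of a single TRUE/FALSE.
stem_mismatches <- function(words, expected, actual) {
  stopifnot(length(words) == length(expected),
            length(words) == length(actual))
  bad <- which(expected != actual)
  data.frame(word = words[bad], expected = expected[bad],
             actual = actual[bad], stringsAsFactors = FALSE)
}

# Made-up vectors for illustration only.
words    <- c("win", "winning", "winner")
expected <- c("win", "win", "winner")
actual   <- c("win", "win", "winn")   # deliberately wrong last element
report <- stem_mismatches(words, expected, actual)
print(report)
```

With the vectors from the examples above, the call would be `stem_mismatches(testWords, validate, testOut)`; an empty data frame means every stem matched.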

When reposting, please credit the source: Biostatistic.net (http://www.biostatistic.net).

