create_matrix(sentiment)
create_matrix() belongs to R package: sentiment
Creates a document-term matrix.
Translator: 生物统计家园网, robot LoveR
----------Description----------
Creates an object of class DocumentTermMatrix from tm.
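For readers new to the term, the following base R sketch shows what a document-term matrix holds: one row per document, one column per term, and a count in each cell. It is an illustration only and does not reproduce how create_matrix() or tm build their objects; the helper `tokenize` and the two sample sentences are made up for this sketch, and the length filter merely mimics the idea behind minWordLength=3.

```r
# Illustrative only: a hand-rolled document-term matrix in base R.
# This is NOT the implementation used by create_matrix() or tm.
docs <- c("I am very happy.", "I am very scared, very annoyed.")

# Hypothetical helper: lowercase, turn punctuation into spaces,
# split on whitespace, keep words of at least 3 letters.
tokenize <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]", " ", x)
  words <- strsplit(x, "[[:space:]]+")[[1]]
  words[nchar(words) >= 3]
}

tokens <- lapply(docs, tokenize)
terms  <- sort(unique(unlist(tokens)))

# Count each term per document; rows are documents, columns are terms.
dtm <- t(vapply(tokens,
                function(tk) as.integer(table(factor(tk, levels = terms))),
                integer(length(terms))))
dimnames(dtm) <- list(Docs = as.character(seq_along(docs)), Terms = terms)
print(dtm)
```

The object returned by create_matrix() carries the same kind of information, stored in tm's DocumentTermMatrix class rather than in a plain matrix.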
----------Usage----------
create_matrix(textColumns, language="english", minDocFreq=1, minWordLength=3,
removeNumbers=TRUE, removePunctuation=TRUE, removeSparseTerms=0, removeStopwords=TRUE,
stemWords=FALSE, stripWhitespace=TRUE, toLower=TRUE, weighting=weightTf)
----------Arguments----------
Argument: textColumns
Either a character vector (e.g. data$Title) or a cbind() of columns to use for training the algorithms (e.g. cbind(data$Title,data$Subject)).
Argument: language
The language to be used for stemming the text data.
Argument: minDocFreq
The minimum number of times a word should appear in a document for it to be included in the matrix. See package tm for more details.
Argument: minWordLength
The minimum number of letters a word should contain to be included in the matrix. See package tm for more details.
Argument: removeNumbers
A logical parameter to specify whether to remove numbers.
Argument: removePunctuation
A logical parameter to specify whether to remove punctuation.
Argument: removeSparseTerms
See package tm for more details.
Argument: removeStopwords
A logical parameter to specify whether to remove stopwords using the language specified in language.
Argument: stemWords
A logical parameter to specify whether to stem words using the language specified in language.
Argument: stripWhitespace
A logical parameter to specify whether to strip whitespace.
Argument: toLower
A logical parameter to specify whether to make all text lowercase.
Argument: weighting
Either weightTf or weightTfIdf. See package tm for more details.
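To make the difference between the two weighting choices concrete, here is a small base R sketch that computes raw term frequencies and one common tf-idf formulation (normalized term frequency multiplied by a log2 inverse document frequency) on a hand-written count matrix. The matrix `counts` is made up for illustration, and the exact formula tm applies in weightTfIdf may differ in details such as normalization, so treat this as an assumption-laden sketch rather than a description of tm.

```r
# Made-up counts: rows are documents, columns are terms.
counts <- matrix(c(0, 1, 0, 1,
                   1, 0, 1, 2),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Docs = c("1", "2"),
                                 Terms = c("annoyed", "happy", "scared", "very")))

# Term frequency, normalized by document length.
tf <- counts / rowSums(counts)

# Inverse document frequency: log2(number of documents / documents containing the term).
doc_freq <- colSums(counts > 0)
idf <- log2(nrow(counts) / doc_freq)

# tf-idf: scale each term column by its idf.
tfidf <- sweep(tf, 2, idf, `*`)
print(round(tfidf, 3))
```

In this sketch a term that occurs in every document, such as "very", receives a weight of zero, while terms confined to a single document keep a positive weight. That down-weighting of ubiquitous terms is the usual reason to prefer tf-idf over raw counts.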
----------Author(s)----------
Timothy P. Jurka <tpjurka@ucdavis.edu>
----------Examples----------
library(sentiment)
# DEFINE THE DOCUMENTS
documents <- c("I am very happy, excited, and optimistic.",
"I am very scared, annoyed, and irritated.",
"Iraq's political crisis entered its second week one step closer to the potential
dissolution of the government, with a call for elections by a vital coalition partner
and a suicide attack that extended the spate of violence that has followed the withdrawal
of U.S. troops.",
"With nightfall approaching, Los Angeles authorities are urging residents to keep their
outdoor lights on as police and fire officials try to catch the person or people responsible
for nearly 40 arson fires in the last three days.")
matrix <- create_matrix(documents, language="english", removeNumbers=TRUE,
stemWords=FALSE, weighting=weightTfIdf)
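
The example above passes a plain character vector. The textColumns argument also accepts a cbind() of columns, and the base R sketch below shows what such an input looks like. The data frame `data` and its contents are hypothetical. The row-wise collapse at the end rests on the assumption that each row is treated as one document; it is not taken from the package source.

```r
# Hypothetical data; the column names follow the textColumns description.
data <- data.frame(Title   = c("Budget vote delayed", "Storm warning issued"),
                   Subject = c("politics", "weather"),
                   stringsAsFactors = FALSE)

# cbind() of two character columns gives a character matrix, one row per record.
text_columns <- cbind(data$Title, data$Subject)
print(dim(text_columns))

# Assumption: each row becomes one document. Collapsing the columns row by row
# shows the text each document would then contain.
combined <- apply(text_columns, 1, paste, collapse = " ")
print(combined)
```

An object such as `text_columns` is what the cbind(data$Title,data$Subject) form in the argument description refers to.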
When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).