TermDocumentMatrix(tm)
TermDocumentMatrix() belongs to R package: tm
Term-Document Matrix
----------Description----------
Constructs or coerces to a term-document matrix or a document-term matrix.
----------Usage----------
TermDocumentMatrix(x, control = list())
DocumentTermMatrix(x, control = list())
as.TermDocumentMatrix(x, ...)
as.DocumentTermMatrix(x, ...)
----------Arguments----------
Argument: x
For the constructors, a corpus. For the coercing functions, a term-document matrix, a document-term matrix, a simple triplet matrix (package slam), or a term frequency vector.
Argument: control
a named list of control options. There are local options which are evaluated for each document and global options which are evaluated once for the constructed matrix. Available local options are documented in termFreq and are internally delegated to a termFreq call. Available global options are:
bounds: A list with a tag global whose value must be an integer vector of length 2. Terms that appear in fewer documents than the lower bound bounds$global[1] or in more documents than the upper bound bounds$global[2] are discarded. Defaults to list(global = c(1, Inf)) (i.e., every term will be used).
weighting: A weighting function capable of handling a TermDocumentMatrix. It defaults to weightTf for term frequency weighting. Available weighting functions shipped with the tm package are weightTf, weightTfIdf, weightBin, and weightSMART.
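The two global options can be combined with local termFreq options in a single control list. The following is a minimal sketch, assuming the tm package and its bundled crude corpus are installed; the bounds of 2 and 10 documents are arbitrary values chosen for illustration.

```r
library(tm)
data("crude")  # sample corpus of Reuters articles shipped with tm

# Local options (removePunctuation, stopwords) are passed on to termFreq;
# global options (bounds, weighting) apply to the whole matrix.
tdm <- TermDocumentMatrix(crude,
                          control = list(removePunctuation = TRUE,
                                         stopwords = TRUE,
                                         bounds = list(global = c(2, 10)),
                                         weighting = weightBin))

# Number of documents in which each retained term occurs
doc_freq <- rowSums(as.matrix(tdm) > 0)
range(doc_freq)
```

Converting to a dense matrix with as.matrix is only reasonable here because the corpus is small; for larger corpora it is usually better to work on the sparse representation.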
Argument: ...
the additional argument weighting (typically a WeightFunction) is allowed when coercing a simple triplet matrix to a term-document or document-term matrix.
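As an illustration of the coercing functions, a simple triplet matrix can be turned into a term-document matrix by supplying the weighting argument. This is a minimal sketch; the toy counts, term names, and document names below are hypothetical and are not taken from the tm documentation.

```r
library(tm)

# Hypothetical toy counts: 3 terms (rows) by 2 documents (columns)
counts <- matrix(c(2, 0, 1,
                   0, 3, 1),
                 nrow = 3,
                 dimnames = list(Terms = c("oil", "price", "market"),
                                 Docs  = c("d1", "d2")))

# Convert to the sparse format from package slam, then coerce
stm <- slam::as.simple_triplet_matrix(counts)
tdm <- as.TermDocumentMatrix(stm, weighting = weightTf)

# Coercing a term-document matrix yields its document-term transpose
dtm <- as.DocumentTermMatrix(tdm)

inspect(tdm)
dim(dtm)
```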
----------Value----------
An object of class TermDocumentMatrix or class DocumentTermMatrix (both inheriting from a simple triplet matrix in package slam) containing a sparse term-document matrix or document-term matrix. The attribute Weighting contains the weighting applied to the matrix.
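The returned object can be checked along these lines. This is a sketch that assumes the crude corpus bundled with tm; because the capitalisation of the weighting attribute name may differ between tm versions (an assumption worth verifying on your installation), the lookup below is case-insensitive.

```r
library(tm)
data("crude")

dtm <- DocumentTermMatrix(crude)

# Both matrix classes inherit from slam's simple triplet matrix
class(dtm)
inherits(dtm, "simple_triplet_matrix")

# Documents are rows and terms are columns in a document-term matrix
dim(dtm)

# Look up the weighting attribute without relying on its capitalisation
a <- attributes(dtm)
a[tolower(names(a)) == "weighting"]
```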
----------Author(s)----------
Ingo Feinerer
----------Examples----------
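library(tm)
data("crude")

# Term-document matrix with punctuation and stopwords removed
tdm <- TermDocumentMatrix(crude,
                          control = list(removePunctuation = TRUE,
                                         stopwords = TRUE))

# Document-term matrix weighted by unnormalized tf-idf
dtm <- DocumentTermMatrix(crude,
                          control = list(weighting =
                                           function(x)
                                             weightTfIdf(x, normalize = FALSE),
                                         stopwords = TRUE))

# Subset by index or by term and document name
inspect(tdm[155:160, 1:5])
inspect(tdm[c("price", "texas"), c("127", "144", "191", "194")])
inspect(dtm[1:5, 155:160])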