weightTfIdf(tm)
weightTfIdf() is part of the R package tm
Weight by Term Frequency - Inverse Document Frequency
Translator: 生物统计家园网, robot LoveR
Description
Weight a term-document matrix by term frequency - inverse document frequency.
Usage
weightTfIdf(m, normalize = TRUE)
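As a minimal sketch of how this call might be used, the example below builds a small term-document matrix and weights it. The three toy documents are made up for illustration, and `VCorpus`, `VectorSource`, `TermDocumentMatrix` and `inspect` are other tm functions that this page does not describe.

```r
library(tm)

# Three toy documents (illustrative data, not taken from the tm documentation).
docs <- c("apple banana apple",
          "banana cherry",
          "apple cherry cherry durian")
corpus <- VCorpus(VectorSource(docs))

# Build a term-document matrix in plain term frequency format.
tdm <- TermDocumentMatrix(corpus)

# Apply tf-idf weighting with normalized term frequencies.
tdm_tfidf <- weightTfIdf(tdm, normalize = TRUE)
inspect(tdm_tfidf)
```

Passing `normalize = FALSE` instead would keep the raw counts as term frequencies before they are multiplied by the inverse document frequency.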
Arguments
Argument: m
A TermDocumentMatrix in term frequency format.
Argument: normalize
A Boolean value indicating whether the term frequencies should be normalized.
Details
Formally this function is of class WeightingFunction with the additional attributes Name and Acronym.
Term frequency \mathit{tf}_{i,j} counts the number of occurrences n_{i,j} of a term t_i in a document d_j. With normalization, the term frequency \mathit{tf}_{i,j} is divided by \sum_k n_{k,j}, the total number of term occurrences in document d_j.
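The normalization step can be illustrated in base R without tm. This is only a sketch under stated assumptions: the count matrix `n` holds made-up values, with terms as rows and documents as columns, and it is not how tm stores its matrices internally.

```r
# Rows are terms, columns are documents (toy counts, chosen for illustration).
n <- matrix(c(2, 1, 0,
              0, 1, 2,
              0, 0, 2),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("apple", "banana", "cherry"),
                            c("d1", "d2", "d3")))

# Normalized term frequency: divide each column by that document's
# total number of term occurrences, sum_k n_{k,j}.
tf <- sweep(n, 2, colSums(n), "/")
print(tf)
```

After normalization every column of `tf` sums to 1, so long and short documents become comparable.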
Inverse document frequency for a term t_i is defined as
\mathit{idf}_i = \log_2 \frac{|D|}{|\{d \mid t_i \in d\}|}
where |D| denotes the total number of documents and |\{d \mid t_i \in d\}| is the number of documents in which the term t_i appears.
Term frequency - inverse document frequency is now defined as \mathit{tf}_{i,j} \cdot \mathit{idf}_i.
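The definitions above can be traced by hand in base R. The sketch below assumes a base-2 logarithm for the inverse document frequency and uses a made-up count matrix `n` (terms as rows, documents as columns); it illustrates the formulas and is not the package's own implementation.

```r
# Toy term counts: 3 terms (rows) across 4 documents (columns).
n <- matrix(c(2, 1, 0, 1,
              0, 1, 2, 0,
              0, 0, 2, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("apple", "banana", "cherry"),
                            c("d1", "d2", "d3", "d4")))

# Normalized term frequency tf_{i,j} = n_{i,j} / sum_k n_{k,j}.
tf <- sweep(n, 2, colSums(n), "/")

# Document frequency: number of documents in which each term appears.
df <- rowSums(n > 0)

# Inverse document frequency idf_i = log2(|D| / df_i).
idf <- log2(ncol(n) / df)

# tf-idf: each row i of tf is scaled by idf_i (idf is recycled down columns).
tfidf <- tf * idf
print(round(tfidf, 3))
```

Under this definition a term that occurs in every document has an inverse document frequency of log2(1) = 0, so its weight is zero in every document.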
Value
The weighted matrix.
Author(s)
Ingo Feinerer
References
Gerard Salton and Christopher Buckley (1988). Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24/5, 513–523.
When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).