R tm package: weightTfIdf() function help documentation

loveR · posted 2012-10-1 10:56:32

weightTfIdf(tm)
weightTfIdf() belongs to the R package: tm

Weight by Term Frequency - Inverse Document Frequency

Translator: LoveR, the 生物统计家园网 robot

----------Description----------

Weight a term-document matrix by term frequency - inverse document frequency.

----------Usage----------

weightTfIdf(m, normalize = TRUE)
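
A minimal usage sketch, assuming the tm package is installed. The toy documents and the variable names (`docs`, `corpus`, `tdm`, `tdm_tfidf`) are illustrative and not part of the original help page. It shows the two usual ways to obtain a tf-idf weighted matrix: weighting an existing count matrix, or passing the weighting function through `control` when the matrix is built.

```r
# Guarded so the script does nothing when tm is not installed.
if (requireNamespace("tm", quietly = TRUE)) {
  library(tm)

  # Three made-up documents, for illustration only.
  docs <- c("the cat sat on the mat",
            "the dog sat on the log",
            "cats and dogs")
  corpus <- VCorpus(VectorSource(docs))

  # Term-document matrix of raw counts (term frequency format).
  tdm <- TermDocumentMatrix(corpus)

  # Option 1: weight an existing count matrix.
  tdm_tfidf <- weightTfIdf(tdm, normalize = TRUE)

  # Option 2: request tf-idf weighting while building the matrix.
  tdm_tfidf2 <- TermDocumentMatrix(corpus,
                                   control = list(weighting = weightTfIdf))

  inspect(tdm_tfidf)
}
```

Both routes should give a matrix with the same dimensions as the unweighted one; only the cell values change.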

----------Arguments----------

Argument: m
A TermDocumentMatrix in term frequency format.

Argument: normalize
A Boolean value indicating whether the term frequencies should be normalized.

----------Details----------

Formally this function is of class WeightingFunction with the additional attributes Name and Acronym.

Term frequency $\mathit{tf}_{i,j}$ counts the number of occurrences $n_{i,j}$ of a term $t_i$ in a document $d_j$. With normalization, the term frequency $\mathit{tf}_{i,j}$ is divided by $\sum_k n_{k,j}$, the total number of term occurrences in document $d_j$.
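
The normalization step can be sketched in base R without tm. The count matrix `n` below is invented for illustration (rows are terms, columns are documents), and the names `n` and `tf` are assumptions of this sketch rather than anything defined by the package.

```r
# Toy term-document count matrix: rows are terms t_i, columns are documents d_j.
n <- matrix(c(2, 0, 1,
              1, 1, 1,
              0, 3, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(Terms = c("cat", "dog", "mat"),
                            Docs  = c("d1", "d2", "d3")))

# Normalized term frequency: divide each column j by its total sum_k n_{k,j}.
tf <- sweep(n, 2, colSums(n), "/")

print(tf)
```

After normalization every column sums to 1, so long and short documents become comparable.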

Inverse document frequency for a term $t_i$ is defined as

$$\mathit{idf}_i = \log_2 \frac{|D|}{|\{d \mid t_i \in d\}|}$$

where $|D|$ denotes the total number of documents and $|\{d \mid t_i \in d\}|$ is the number of documents in which the term $t_i$ appears.
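
As a rough base R illustration of the inverse document frequency, assuming the base-2 logarithm used in the tm documentation: the count matrix and the names `n_docs`, `doc_freq`, and `idf` are made up for this sketch.

```r
# Toy term-document count matrix: rows are terms, columns are documents.
n <- matrix(c(2, 0, 1,
              1, 1, 1,
              0, 3, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(Terms = c("cat", "dog", "mat"),
                            Docs  = c("d1", "d2", "d3")))

n_docs   <- ncol(n)          # |D|: total number of documents
doc_freq <- rowSums(n > 0)   # number of documents containing each term

# Inverse document frequency, one value per term.
idf <- log2(n_docs / doc_freq)

print(idf)
```

Here "dog" occurs in every document, so its idf is 0, while "mat" occurs in only one document and gets the largest value.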

Term frequency - inverse document frequency is now defined as $\mathit{tf}_{i,j} \cdot \mathit{idf}_i$.
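
Putting the pieces together, the product can be computed by hand in base R. This is only a sketch of the definition under the assumptions already stated (normalized term frequency, base-2 logarithm), not the package's own code; the matrix and variable names are illustrative.

```r
# Toy term-document count matrix: rows are terms, columns are documents.
n <- matrix(c(2, 0, 1,
              1, 1, 1,
              0, 3, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(Terms = c("cat", "dog", "mat"),
                            Docs  = c("d1", "d2", "d3")))

tf  <- sweep(n, 2, colSums(n), "/")       # normalized term frequency
idf <- log2(ncol(n) / rowSums(n > 0))     # inverse document frequency per term

# A vector with one entry per row is recycled down the columns,
# so row i of tf is multiplied by idf[i].
tfidf <- tf * idf

print(round(tfidf, 3))
```

A term that appears in every document ("dog" here) ends up with weight 0 everywhere, which is the intended effect of the idf factor.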

----------Value----------

The weighted matrix.

----------Author(s)----------

Ingo Feinerer

----------References----------

Gerard Salton and Christopher Buckley (1988). Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24/5, 513–523.
When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

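Notes:
Note 1: For ease of study, this document was translated by LoveR, the 生物统计家园网 robot. It is intended only as a personal reference for learning R; 生物统计家园 retains the copyright.
Note 2: Because the translation is automatic, some inaccuracies are unavoidable. Carefully comparing the Chinese and English text while reading can help with learning R.
Note 3: If you find an inaccuracy, please reply below this post; we will revise the document over time.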
