R discretization package: chi2() function help documentation

loveR · posted 2012-9-16 20:40:43

chi2(discretization)
chi2() belongs to R package: discretization

                                       Discretization using the Chi2 algorithm

                                       Translator: LoveR, the robot of 生物统计家园网 (biostatistic.net)

----------Description----------

This function performs the Chi2 discretization algorithm. The Chi2 algorithm automatically determines a proper chi-square (χ^2) threshold that preserves the fidelity of the original numeric dataset.

----------Usage----------

chi2(data, alp = 0.5, del = 0.05)

----------Arguments----------

Argument: data
the dataset to be discretized

Argument: alp
significance level; α

Argument: del
inconsistency threshold δ; the discretized data must satisfy Inconsistency(data) < δ (Liu and Setiono (1995))
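The inconsistency measure that del bounds can be illustrated with a minimal base-R sketch. It assumes the usual definition from Liu and Setiono (1995): rows that share the same pattern of attribute values but do not belong to that pattern's majority class count as inconsistent. The function name inconsistency_rate and the toy data are hypothetical, and this is not the package's own incon() implementation.

```r
# Sketch of an inconsistency rate (hypothetical helper, not the package's incon()).
# x: data frame of discretized attributes; y: vector of class labels.
inconsistency_rate <- function(x, y) {
  # One key per row, identifying its pattern of attribute values.
  key <- do.call(paste, c(as.data.frame(x), sep = "\r"))
  # Contingency table: patterns in rows, classes in columns.
  tab <- table(key, y)
  # Per pattern, rows outside the majority class are counted as inconsistent.
  sum(rowSums(tab) - apply(tab, 1, max)) / length(y)
}

# Toy data: the pattern (a = 1, b = 1) occurs three times with classes u, u, v.
toy <- data.frame(a = c(1, 1, 1, 2, 2), b = c(1, 1, 1, 1, 2))
cls <- c("u", "u", "v", "u", "v")
rate <- inconsistency_rate(toy, cls)
rate
```

Here one of the five rows disagrees with the majority class of its pattern, so the rate is 1/5, which would violate del = 0.05.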

----------Details----------

The Chi2 algorithm is based on the χ^2 statistic and consists of two phases. In the first phase, it begins with a high significance level (sigLevel) for all numeric attributes to be discretized. Each attribute is sorted according to its values. Then the following steps are performed:
step 1. calculate the χ^2 value for every pair of adjacent intervals (at the beginning, each pattern is put into its own interval that contains only one value of an attribute);
step 2. merge the pair of adjacent intervals with the lowest χ^2 value.
Merging continues until all pairs of intervals have χ^2 values exceeding the threshold determined by sigLevel. The above process is repeated with a decreased sigLevel until the inconsistency rate of the discretized data, as computed by incon(), exceeds δ (Liu and Setiono (1995)).
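The merging loop for a single attribute at one fixed sigLevel can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the helper names chi_sq_pair and merge_intervals are hypothetical, the χ^2 threshold is taken as qchisq(1 - sigLevel, df = number of classes - 1), zero expected counts are replaced by 0.1 to avoid division by zero, and cut-points are reported as the lower bound of each interval after the first.

```r
# chi-square statistic for two adjacent intervals (sketch).
# counts: 2 x k matrix of class counts, one row per interval.
chi_sq_pair <- function(counts) {
  expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
  # Assumption: a zero expected count is replaced by 0.1.
  expected[expected == 0] <- 0.1
  sum((counts - expected)^2 / expected)
}

# Merge adjacent intervals of one attribute until every adjacent pair
# has a chi-square value above the threshold implied by sig_level.
merge_intervals <- function(x, y, sig_level) {
  y <- factor(y)
  threshold <- qchisq(1 - sig_level, df = nlevels(y) - 1)
  counts <- unclass(table(x, y))  # one row per distinct value, in sorted order
  lower <- sort(unique(x))        # lower bound of each current interval
  while (nrow(counts) > 1) {
    # Step 1: chi-square for every pair of adjacent intervals.
    chis <- vapply(seq_len(nrow(counts) - 1),
                   function(i) chi_sq_pair(counts[i:(i + 1), , drop = FALSE]),
                   numeric(1))
    best <- which.min(chis)
    if (chis[best] > threshold) break
    # Step 2: merge the pair with the lowest chi-square value.
    counts[best, ] <- counts[best, ] + counts[best + 1, ]
    counts <- counts[-(best + 1), , drop = FALSE]
    lower <- lower[-(best + 1)]
  }
  lower[-1]  # simplification: cut-points are lower bounds of later intervals
}

# Toy attribute whose class changes between the values 3 and 4.
cutpoints <- merge_intervals(1:6, c("a", "a", "a", "b", "b", "b"), sig_level = 0.05)
cutpoints
```

In the full algorithm described above, this loop would be rerun with progressively smaller sigLevel values, for all attributes, while the inconsistency rate stays below δ; that outer loop is omitted here.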

----------Value----------

cutp: list of cut-points for each variable
Disc.data: discretized data matrix
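To show how a vector of cut-points relates to discretized codes, here is a small base-R sketch using findInterval(). The cut-point values are hypothetical, and the assumption that a value equal to a cut-point falls into the upper interval is made only for this illustration; how chi2() itself handles boundary values is not stated here.

```r
# Hypothetical cut-points for one numeric variable (illustrative values only).
cuts <- c(5.5, 6.2)
x <- c(4.9, 5.5, 6.0, 6.2, 7.1)

# findInterval() returns 0 for values below the first cut-point,
# so adding 1 yields interval codes 1..(length(cuts) + 1).
codes <- findInterval(x, cuts) + 1
codes
```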

----------Author(s)----------

HyunJi Kim <polaris7867@gmail.com>

----------References----------

Liu, H. and Setiono, R. (1995). Chi2: Feature selection and discretization of numeric attributes, Tools with Artificial Intelligence, 388–391.
Liu, H. and Setiono, R. (1997). Feature selection and discretization, IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 4, 642–645.

----------See Also----------

value, incon and chiM.

----------Examples----------

library(discretization)
data(iris)
#---cut-points
chi2(iris,0.5,0.05)$cutp

#--discretized dataset using Chi2 algorithm
chi2(iris,0.5,0.05)$Disc.data

When reposting, please credit 生物统计家园网 (http://www.biostatistic.net).

