R rockchalk package: help documentation for the summarizeFactors() function

loveR · posted 2012-9-27 22:45:16

summarizeFactors(rockchalk)
summarizeFactors() belongs to R package: rockchalk

Extracts non-numeric variables, calculates summary information,

Translator: LoveR, the robot of 生物统计家园网 (biostatistic.net)

----------Description----------

This function finds the non-numeric variables and ignores the others. (See summarizeNumerics for a function that handles numeric variables.) It then treats all non-numeric variables as if they were factors and summarizes each. Compared with R's default summary, the main benefits are 1) more summary information is returned for each variable (entropy estimates of dispersion), and 2) the columns in the output are alphabetized. To prevent alphabetization, use alphaSort = FALSE.
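
The selection step described above can be sketched in base R. This is only an illustration of the idea, not rockchalk's actual implementation; the data frame and the variable names (isNumeric, nonNumeric, asFactors) are made up for the example.

```r
# Hypothetical data frame, for illustration only.
dat <- data.frame(
  grp   = c("a", "b", "a", "c"),
  flag  = c(TRUE, FALSE, TRUE, TRUE),
  score = c(1.5, 2.0, 3.5, 4.0),
  stringsAsFactors = FALSE
)

# Keep only the columns that are not numeric.
isNumeric  <- vapply(dat, is.numeric, logical(1))
nonNumeric <- dat[, !isNumeric, drop = FALSE]

# Treat every remaining column as a factor.
asFactors <- lapply(nonNumeric, factor)

# Put the columns in alphabetical order by name
# (an analogue of alphaSort = TRUE, assumed here).
asFactors <- asFactors[order(names(asFactors))]

# Base R counts per level for each factor.
lapply(asFactors, table)
```

The numeric column score is dropped, and the remaining columns come out in the order flag, grp.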

----------Usage----------

summarizeFactors(dat = NULL, maxLevels = 5,
    alphaSort = TRUE, sumstat = TRUE,
    digits = max(3, getOption("digits") - 3))

----------Arguments----------

Argument: dat
A data frame.

Argument: maxLevels
The maximum number of levels that will be reported.

Argument: alphaSort
If TRUE (default), the columns are re-organized in alphabetical order. If FALSE, they are presented in the original order.

Argument: sumstat
If TRUE (default), report indicators of dispersion and the number of missing cases (NAs).

Argument: digits
Integer, used for formatting numbers in the output.

----------Details----------

Entropy is one possible measure of diversity. If all outcomes are equally likely, entropy is maximized, while if all outcomes fall into a single category, entropy is at its lowest value. The lowest possible value for entropy is 0, while the maximum value depends on the number of categories. Entropy is also called Shannon's information index in some fields of study (Balch, 2000; Shannon, 1949).

Concerning the use of entropy as a diversity index, the user might consult Balch (2000). For each possible outcome category, let p represent the observed proportion of cases. The diversity contribution of each category is -p * log2(p). Note that if p is either 0 or 1, the diversity contribution is 0 (for p = 0, the term 0 * log2(0) is taken to be 0). The sum of those diversity contributions across possible outcomes is the entropy estimate. Entropy has a lower bound of 0, but there is no upper bound that is independent of the number of possible categories. If m is the number of categories, the maximum possible value of entropy is -log2(1/m), which equals log2(m).
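
The entropy estimate described above can be reproduced by hand in base R. The helper below is a minimal sketch; the name entropySketch is hypothetical and is not claimed to be a rockchalk function.

```r
# Entropy of a categorical variable, in bits, from observed proportions.
# entropySketch is a hypothetical helper name used only for this example.
entropySketch <- function(x) {
  p <- prop.table(table(x))  # observed proportion of cases per category
  p <- p[p > 0]              # a category with p = 0 contributes 0
  sum(-p * log2(p))          # sum of the diversity contributions
}

x1 <- factor(c("A", "A", "B", "B"))  # two equally likely categories
x2 <- factor(c("A", "A", "A", "A"))  # all cases in one category

entropySketch(x1)  # 1, the maximum for m = 2, since log2(2) = 1
entropySketch(x2)  # 0, the lower bound
```

Note that table() drops missing values by default, so NAs do not count as a category in this sketch.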

Because the maximum value of entropy depends on the number of possible categories, some scholars prefer to re-scale entropy so that values fall on a common numeric scale. The normed entropy is calculated as the observed entropy divided by the maximum possible entropy. Normed entropy takes on values between 0 and 1, so in a sense its values are more easily comparable. However, the comparison is something of an illusion, since variables with the same number of categories will always be comparable by their entropy, whether it is normed or not.
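
As a rough illustration of the re-scaling, the following base R sketch divides the observed entropy by log2(m). The function name normedEntropySketch is hypothetical, and treating a variable with fewer than two categories as undefined (NA) is an assumption of this example, not something the help page states.

```r
# Normed entropy: observed entropy divided by the maximum possible entropy.
# normedEntropySketch is a hypothetical helper name used only for this example.
normedEntropySketch <- function(x) {
  counts <- table(x)
  m <- length(counts)          # number of possible categories
  if (m < 2) return(NA_real_)  # maximum entropy is 0, so the ratio is undefined
  p <- counts / sum(counts)    # observed proportions
  p <- p[p > 0]                # empty categories contribute 0
  observed <- sum(-p * log2(p))
  observed / log2(m)           # -log2(1/m) is the same as log2(m)
}

even   <- factor(c("A", "B", "C", "D"))  # four equally likely categories
skewed <- factor(c("A", "A", "A", "B"))  # two categories, unevenly filled

normedEntropySketch(even)    # 1
normedEntropySketch(skewed)  # strictly between 0 and 1
```

Unused factor levels still count toward m here, because table() reports every declared level.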

----------Value----------

A list of factor summaries.

----------Author(s)----------

Paul E. Johnson <pauljohn@ku.edu>

----------References----------

Balch, T. (2000). Hierarchic Social Entropy: An Information Theoretic Measure of Robot Group Diversity. Auton. Robots, 8(3), 209-238.
Shannon, C. E. (1949). The Mathematical Theory of Communication. Urbana: University of Illinois Press.

----------See Also----------

summarizeFactors and summarizeNumerics

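----------Examples----------

library(rockchalk)

set.seed(21234)
x <- runif(1000)
## recode the uniform draws into three categories
xn <- ifelse(x < 0.2, 0, ifelse(x < 0.6, 1, 2))
## labels must be passed as a named argument holding a character vector
xf <- factor(xn, levels = c(0, 1, 2), labels = c("A", "B", "C"))
dat <- data.frame(xf, xn, x)
## only xf is non-numeric, so only xf is summarized
summarizeFactors(dat)
## see help for summarize for more examples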

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).
