R language: help documentation for the cut() function

loveR · posted 2012-2-16 18:50:49

cut(base)
cut() belongs to the R package: base

                                    Convert Numeric to Factor

                                       Translator: LoveR, the 生物统计家园网 robot

----------Description----------

cut divides the range of x into intervals and codes the values in x according to the interval into which they fall. The leftmost interval corresponds to level one, the next leftmost to level two, and so on.
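
As a minimal sketch of the coding described above (the vector x and the cut points are made up for illustration), values are mapped to intervals, and the leftmost interval becomes level one:

```r
# Hypothetical data chosen for illustration only.
x <- c(1, 4, 6, 9)

# Two intervals: (0,5] and (5,10].
f <- cut(x, breaks = c(0, 5, 10))

levels(f)      # interval labels, leftmost first
as.integer(f)  # level codes: 1 for the leftmost interval, 2 for the next
```

Here 1 and 4 fall in the first interval and 6 and 9 in the second, so the integer codes are 1, 1, 2, 2.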

----------Usage----------

cut(x, ...)

## Default S3 method:
cut(x, breaks, labels = NULL,
    include.lowest = FALSE, right = TRUE, dig.lab = 3,
    ordered_result = FALSE, ...)

----------Arguments----------

Argument: x
a numeric vector which is to be converted to a factor by cutting.

Argument: breaks
either a numeric vector of two or more cut points or a single number (greater than or equal to 2) giving the number of intervals into which x is to be cut.

Argument: labels
labels for the levels of the resulting category. By default, labels are constructed using "(a,b]" interval notation. If labels = FALSE, simple integer codes are returned instead of a factor.
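
The two uses of labels can be sketched as follows. The data and the level names "low" and "high" are assumptions made for illustration:

```r
# Hypothetical data and level names, for illustration only.
x <- c(1, 4, 6, 9)
b <- c(0, 5, 10)

# Custom names for the two levels instead of "(0,5]" and "(5,10]".
named <- cut(x, breaks = b, labels = c("low", "high"))

# labels = FALSE returns plain integer codes rather than a factor.
codes <- cut(x, breaks = b, labels = FALSE)

named
codes
```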

Argument: include.lowest
logical, indicating if an x[i] equal to the lowest (or highest, for right = FALSE) breaks value should be included.

Argument: right
logical, indicating if the intervals should be closed on the right (and open on the left) or vice versa.
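
The interaction of right and include.lowest is easiest to see when the data sit exactly on the cut points. A small sketch with made-up values:

```r
# Hypothetical values placed exactly on the cut points.
x <- c(0, 5, 10)
b <- c(0, 5, 10)

r_default <- cut(x, b)                          # (0,5], (5,10]: 0 is excluded
r_lowest  <- cut(x, b, include.lowest = TRUE)   # first interval becomes [0,5]
l_default <- cut(x, b, right = FALSE)           # [0,5), [5,10): 10 is excluded
l_highest <- cut(x, b, right = FALSE,
                 include.lowest = TRUE)         # last interval becomes [5,10]

as.character(r_default)  # NA      "(0,5]"  "(5,10]"
as.character(r_lowest)   # "[0,5]" "[0,5]"  "(5,10]"
as.character(l_default)  # "[0,5)" "[5,10)" NA
as.character(l_highest)  # "[0,5)" "[5,10]" "[5,10]"
```

Values that land on an excluded endpoint are coded as NA rather than raising an error.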

Argument: dig.lab
integer which is used when labels are not given. It determines the number of digits used in formatting the break numbers.

Argument: ordered_result
logical: should the result be an ordered factor?
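
A short sketch of what ordered_result changes, using made-up data: the result is an ordered factor, so its levels can be compared.

```r
# Hypothetical data, for illustration only.
x <- c(1, 4, 6, 9)

f <- cut(x, breaks = c(0, 5, 10), ordered_result = TRUE)

is.ordered(f)          # TRUE
first_lt_last <- f[1] < f[4]   # comparison is meaningful for ordered factors
first_lt_last
```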

Argument: ...
further arguments passed to or from other methods.

----------Details----------

When breaks is specified as a single number, the range of the data is divided into breaks pieces of equal length, and then the outer limits are moved away by 0.1% of the range to ensure that the extreme values both fall within the break intervals. (If x is a constant vector, equal-length intervals are created that cover the single value.)
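
The effect of widening the outer limits can be checked with a small sketch (the data are made up): even though include.lowest is left at FALSE, the minimum and maximum of x are both assigned to an interval.

```r
# Hypothetical data whose minimum (0) and maximum (10) sit on the range limits.
x <- c(0, 2, 5, 10)

# A single number asks for that many equal-length intervals.
f <- cut(x, breaks = 4)

nlevels(f)     # 4
anyNA(f)       # FALSE: the extremes are covered by the widened outer limits
as.integer(f)  # interval index of each value
```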

If a labels parameter is specified, its values are used to name the factor levels. If none is specified, the factor level labels are constructed as "(b1, b2]", "(b2, b3]" etc. for right = TRUE and as "[b1, b2)", ... if right = FALSE. In this case, dig.lab indicates the minimum number of digits to be used in formatting the numbers b1, b2, .... A larger value (up to 12) will be used if needed to distinguish between any pair of endpoints; if this fails, labels such as "Range3" will be used.
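
How dig.lab shapes the generated labels can be sketched with cut points that need more than three significant digits (the break values are made up for illustration):

```r
# Hypothetical cut points; 1.2345 needs 5 significant digits to print exactly.
b <- c(0, 1.2345, 2.5)

lab3 <- levels(cut(1, b))               # default dig.lab = 3
lab5 <- levels(cut(1, b, dig.lab = 5))  # more digits in the labels

lab3
lab5
```

With the default, the middle endpoint is shown as 1.23; with dig.lab = 5 it is shown as 1.2345. Only the labels change, not the intervals used for coding.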

----------Value----------

A factor is returned, unless labels = FALSE which results in the mere integer level codes.

----------Note----------

Instead of table(cut(x, br)), hist(x, br, plot = FALSE) is more efficient and less memory hungry. Instead of cut(*, labels = FALSE), findInterval() is more efficient.
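
The alternatives named in the note can be compared on simulated data. This is a sketch under two assumptions: the seed and sample are arbitrary, and no value lies exactly on a cut point, so the differing endpoint defaults of hist() and cut() do not matter. Note that findInterval() uses intervals closed on the left, so it corresponds to cut() with right = FALSE.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
x  <- stats::runif(200, min = 0, max = 10)
br <- 0:10

# Counts per interval: table(cut()) versus hist().
counts_cut  <- as.vector(table(cut(x, br)))
counts_hist <- graphics::hist(x, br, plot = FALSE)$counts

# Integer codes: cut(labels = FALSE) versus findInterval().
codes_cut <- cut(x, br, labels = FALSE, right = FALSE)
codes_fi  <- findInterval(x, br)

all(counts_cut == counts_hist)
all(codes_cut == codes_fi)
```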

----------References----------

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

----------See Also----------

split for splitting a variable according to a group factor; factor, tabulate, table, findInterval().

quantile for ways of choosing breaks of roughly equal content (rather than length).
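
One possible way to combine quantile with cut for groups of roughly equal content is sketched below. The simulated data, the seed, and the choice of quartiles are assumptions made for illustration:

```r
set.seed(42)  # arbitrary seed, for reproducibility only
x <- stats::rexp(1000)

# Quartiles as cut points: 0%, 25%, 50%, 75%, 100%.
q <- stats::quantile(x, probs = seq(0, 1, by = 0.25))

# include.lowest = TRUE keeps the minimum, which equals the first cut point.
g <- cut(x, breaks = q, include.lowest = TRUE)

table(g)  # each group holds about a quarter of the observations
```

The intervals have very different lengths here because the data are skewed, but the counts per group are nearly identical. With heavily tied data the quantiles may coincide, in which case cut() will reject the duplicated breaks.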

----------Examples----------

Z <- stats::rnorm(10000)
table(cut(Z, breaks = -6:6))
sum(table(cut(Z, breaks = -6:6, labels=FALSE)))
sum(graphics::hist(Z, breaks = -6:6, plot=FALSE)$counts)

cut(rep(1,5), 4) #-- dummy
tx0 <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)
x <- rep(0:8, tx0)
stopifnot(table(x) == tx0)

table( cut(x, breaks = 8))
table( cut(x, breaks = 3*(-2:5)))
table( cut(x, breaks = 3*(-2:5), right = FALSE))

##--- some values OUTSIDE the breaks :
table(cx  <- cut(x, breaks = 2*(0:4)))
table(cxl <- cut(x, breaks = 2*(0:4), right = FALSE))
which(is.na(cx));  x[is.na(cx)]  #-- the first 9  values  0
which(is.na(cxl)); x[is.na(cxl)] #-- the last  5  values  8

## Label construction:
y <- stats::rnorm(100)
table(cut(y, breaks = pi/3*(-3:3)))
table(cut(y, breaks = pi/3*(-3:3), dig.lab=4))

table(cut(y, breaks =  1*(-3:3), dig.lab=4))
# extra digits don't "harm" here
table(cut(y, breaks =  1*(-3:3), right = FALSE))
#- the same, since no exact INT!

## sometimes the default dig.lab is not enough to avoid confusion:
aaa <- c(1,2,3,4,5,2,3,4,5,6,7)
cut(aaa, 3)
cut(aaa, 3, dig.lab = 4, ordered_result = TRUE)

## one way to extract the breakpoints
labs <- levels(cut(aaa, 3))
cbind(lower = as.numeric( sub("\\((.+),.*", "\\1", labs) ),
   upper = as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", labs) ))

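When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

Notes:
Note 1: For the convenience of learners, this document was translated by LoveR, the 生物统计家园网 robot. It is intended only as a personal reference for learning R; 生物统计家园 retains the copyright.
Note 2: Because the translation was produced automatically by a robot, inaccuracies are inevitable. Carefully comparing the Chinese and English text while reading can help in learning R.
Note 3: If you find an inaccuracy, please reply at the end of this thread and we will revise the text over time.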
