cut(base)
cut() belongs to R package: base
Convert Numeric to Factor
Translator: 生物统计家园网 robot LoveR
Description
cut divides the range of x into intervals and codes the values in x according to the interval in which they fall. The leftmost interval corresponds to level one, the next leftmost to level two, and so on.
Usage
cut(x, ...)
## Default S3 method:
cut(x, breaks, labels = NULL,
    include.lowest = FALSE, right = TRUE, dig.lab = 3,
    ordered_result = FALSE, ...)
Arguments
Argument: x
a numeric vector which is to be converted to a factor by cutting.
Argument: breaks
either a numeric vector of two or more cut points, or a single number (greater than or equal to 2) giving the number of intervals into which x is to be cut.
Argument: labels
labels for the levels of the resulting category. By default, labels are constructed using "(a,b]" interval notation. If labels = FALSE, simple integer codes are returned instead of a factor.
Argument: include.lowest
logical, indicating if an x[i] equal to the lowest (or highest, for right = FALSE) breaks value should be included.
Argument: right
logical, indicating if the intervals should be closed on the right (and open on the left) or vice versa.
Argument: dig.lab
integer which is used when labels are not given. It determines the number of digits used in formatting the break numbers.
Argument: ordered_result
logical: should the result be an ordered factor?
Argument: ...
further arguments passed to or from other methods.
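As an illustrative sketch of how these arguments interact, the following applies several settings to the same break points. The vectors x and brk and the labels "low" and "high" are made-up data for illustration, not part of the original help page.

```r
x   <- c(0, 1, 2, 3, 4)   # hypothetical data
brk <- c(0, 2, 4)         # hypothetical cut points

# Default (right = TRUE): intervals (0,2] and (2,4]; the value 0 falls outside and becomes NA.
default_cut <- cut(x, breaks = brk)

# include.lowest = TRUE closes the first interval on the left: [0,2] and (2,4].
closed_cut <- cut(x, breaks = brk, include.lowest = TRUE)

# right = FALSE gives [0,2) and [2,4); now the value 4 falls outside and becomes NA.
left_cut <- cut(x, breaks = brk, right = FALSE)

# Custom labels, returned as an ordered factor.
named_cut <- cut(x, breaks = brk, labels = c("low", "high"),
                 include.lowest = TRUE, ordered_result = TRUE)

print(default_cut)
print(closed_cut)
print(left_cut)
print(named_cut)
```

Values that fall outside every interval are coded as NA rather than raising an error, so it can be worth checking anyNA() on the result.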
Details
When breaks is specified as a single number, the range of the data is divided into breaks pieces of equal length, and then the outer limits are moved away by 0.1% of the range to ensure that the extreme values both fall within the break intervals. (If x is a constant vector, equal-length intervals are created that cover the single value.)
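A minimal sketch of this behaviour, using made-up data: with a single number for breaks, the minimum and maximum of x are still assigned to a level because the outer limits are widened slightly. The exact label text depends on the R version, so no expected labels are shown here.

```r
x <- c(0, 2.5, 5, 7.5, 10)   # hypothetical data spanning the range 0 to 10

f <- cut(x, breaks = 4)      # four intervals of equal length
print(levels(f))             # the outer limits extend slightly beyond 0 and 10
print(table(f))

# A constant vector still yields the requested number of intervals,
# and the single value is covered by one of them.
g <- cut(rep(1, 5), 4)
print(table(g))
```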
If a labels parameter is specified, its values are used to name the factor levels. If none is specified, the factor level labels are constructed as "(b1, b2]", "(b2, b3]" etc. for right = TRUE and as "[b1, b2)", ... if right = FALSE. In this case, dig.lab indicates the minimum number of digits that should be used in formatting the numbers b1, b2, .... A larger value (up to 12) will be used if needed to distinguish between any pair of endpoints; if this fails, labels such as "Range3" will be used.
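The effect of dig.lab and of explicit labels can be sketched as follows. The data vector y and the letter labels are hypothetical; the break points reuse the pi/3 multiples from the Examples section.

```r
y   <- c(-2.5, -0.3, 0.1, 1.7, 2.9)   # hypothetical data
brk <- pi / 3 * (-3:3)                # irrational break points

lev3 <- levels(cut(y, breaks = brk))                # default dig.lab = 3
lev5 <- levels(cut(y, breaks = brk, dig.lab = 5))   # more digits in the labels
levL <- levels(cut(y, breaks = brk, labels = letters[1:6]))  # user-supplied names

print(lev3)
print(lev5)
print(levL)
```

Only the labels change with dig.lab; the break points themselves, and therefore the assignment of values to levels, stay the same.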
Value
A factor is returned, unless labels = FALSE, which results in the mere integer level codes.
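As a small illustration with made-up values, the two kinds of return value can be compared directly:

```r
x   <- c(1, 5, 9)          # hypothetical data
brk <- c(0, 3, 6, 10)      # hypothetical cut points

f     <- cut(x, breaks = brk)                  # a factor with interval labels
codes <- cut(x, breaks = brk, labels = FALSE)  # plain integer level codes

print(f)
print(codes)
print(identical(as.integer(f), codes))  # the codes match the factor's internal codes
```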
Note
Instead of table(cut(x, br)), hist(x, br, plot = FALSE) is more efficient and less memory hungry. Instead of cut(*, labels = FALSE), findInterval() is more efficient.
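The equivalences in this note can be checked with a short sketch. It assumes simulated data lying strictly inside the break range, and a version of R in which findInterval() has the left.open argument, which is used here so that the intervals are closed on the right as in cut().

```r
set.seed(1)
x  <- stats::rnorm(1000)   # simulated data, an assumption for illustration
br <- -6:6

# Counts per interval: table(cut()) versus hist()
counts_cut  <- as.vector(table(cut(x, br)))
counts_hist <- graphics::hist(x, br, plot = FALSE)$counts
print(all(counts_cut == counts_hist))

# Integer codes: cut(labels = FALSE) versus findInterval()
codes_cut <- cut(x, br, labels = FALSE)
codes_fi  <- findInterval(x, br, left.open = TRUE)
print(identical(codes_cut, codes_fi))
```

The two approaches differ for values outside the breaks: cut() returns NA, whereas findInterval() returns 0 or length(br), so they are interchangeable only when every value falls inside the break range.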
References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
See Also
split for splitting a variable according to a group factor; factor, tabulate, table, findInterval().
quantile for ways of choosing breaks of roughly equal content (rather than length).
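One way to obtain intervals of roughly equal content is to pass quantiles as the break points. This is a sketch under stated assumptions: the data are simulated, and quartiles are an arbitrary choice of probabilities.

```r
set.seed(42)
x <- stats::rexp(200)   # simulated skewed data, an assumption for illustration

# Quartiles as break points; include.lowest = TRUE keeps the minimum in the first interval.
q_breaks <- stats::quantile(x, probs = seq(0, 1, by = 0.25))
groups   <- cut(x, breaks = q_breaks, include.lowest = TRUE)

print(table(groups))    # counts per group are roughly equal
```

If the data contain many ties, some quantiles may coincide, in which case cut() will complain that the breaks are not unique.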
Examples
Z <- stats::rnorm(10000)
table(cut(Z, breaks = -6:6))
sum(table(cut(Z, breaks = -6:6, labels=FALSE)))
sum(graphics::hist(Z, breaks = -6:6, plot=FALSE)$counts)
cut(rep(1,5), 4) #-- dummy
tx0 <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)
x <- rep(0:8, tx0)
stopifnot(table(x) == tx0)
table( cut(x, b = 8))
table( cut(x, breaks = 3*(-2:5)))
table( cut(x, breaks = 3*(-2:5), right = FALSE))
##--- some values OUTSIDE the breaks :
table(cx <- cut(x, breaks = 2*(0:4)))
table(cxl <- cut(x, breaks = 2*(0:4), right = FALSE))
which(is.na(cx)); x[is.na(cx)] #-- the first 9 values 0
which(is.na(cxl)); x[is.na(cxl)] #-- the last 5 values 8
## Label construction:
y <- stats::rnorm(100)
table(cut(y, breaks = pi/3*(-3:3)))
table(cut(y, breaks = pi/3*(-3:3), dig.lab=4))
table(cut(y, breaks = 1*(-3:3), dig.lab=4))
# extra digits don't "harm" here
table(cut(y, breaks = 1*(-3:3), right = FALSE))
#- the same, since no exact INT!
## sometimes the default dig.lab is not enough to avoid confusion:
aaa <- c(1,2,3,4,5,2,3,4,5,6,7)
cut(aaa, 3)
cut(aaa, 3, dig.lab=4, ordered = TRUE)
## one way to extract the breakpoints
labs <- levels(cut(aaa, 3))
cbind(lower = as.numeric( sub("\\((.+),.*", "\\1", labs) ),
      upper = as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", labs) ))