R: Help documentation for the cut() function

Posted on 2012-2-16 18:50:49
cut(base)
R package containing cut(): base

                                        Convert Numeric to Factor

                                         Translator: 生物统计家园网 robot LoveR

----------Description----------

cut divides the range of x into intervals and codes the values in x according to the interval into which they fall. The leftmost interval corresponds to level one, the next leftmost to level two, and so on.
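As a minimal sketch of the coding described above (the `ages` vector and the break points are made-up illustration data, not part of the original help page), each value is assigned the level of the interval that contains it:

```r
# Hypothetical data: five ages, cut at 0, 18, 65 and 120.
ages <- c(5, 18, 30, 65, 90)
groups <- cut(ages, breaks = c(0, 18, 65, 120))

# Intervals are right-closed by default, so 18 lands in (0,18] and 65 in (18,65].
levels(groups)       # "(0,18]" "(18,65]" "(65,120]"
as.integer(groups)   # 1 1 2 2 3  (leftmost interval is level one)
```

The integer codes follow the left-to-right order of the intervals, which is what makes the result usable as a grouping factor.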


----------Usage----------


cut(x, ...)

## Default S3 method:
cut(x, breaks, labels = NULL,
    include.lowest = FALSE, right = TRUE, dig.lab = 3,
    ordered_result = FALSE, ...)



----------Arguments----------

Argument: x
a numeric vector which is to be converted to a factor by cutting.

Argument: breaks
either a numeric vector of two or more cut points or a single number (greater than or equal to 2) giving the number of intervals into which x is to be cut.

Argument: labels
labels for the levels of the resulting category. By default, labels are constructed using "(a,b]" interval notation. If labels = FALSE, simple integer codes are returned instead of a factor.

Argument: include.lowest
logical, indicating if an x[i] equal to the lowest (or highest, for right = FALSE) breaks value should be included.

Argument: right
logical, indicating if the intervals should be closed on the right (and open on the left) or vice versa.

Argument: dig.lab
integer which is used when labels are not given. It determines the number of digits used in formatting the break numbers.

Argument: ordered_result
logical: should the result be an ordered factor?

Argument: ...
further arguments passed to or from other methods.
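The interaction of include.lowest, right, labels and ordered_result is easier to see on a small case. The following is a sketch on made-up data (`x`, `br` and the label names "low"/"high" are assumptions chosen for illustration):

```r
x  <- c(0, 1, 2, 3, 4)   # hypothetical data
br <- c(0, 2, 4)         # hypothetical break points

# Default: right-closed intervals (0,2] and (2,4]; the lowest break 0 is excluded.
d <- cut(x, br)
is.na(d)                 # TRUE FALSE FALSE FALSE FALSE

# include.lowest = TRUE closes the first interval, giving [0,2] and (2,4].
inc <- cut(x, br, include.lowest = TRUE)
levels(inc)              # "[0,2]" "(2,4]"

# right = FALSE gives left-closed intervals [0,2) and [2,4); now 4 is excluded.
lc <- cut(x, br, right = FALSE)
levels(lc)               # "[0,2)" "[2,4)"

# labels = FALSE returns plain integer codes instead of a factor.
codes <- cut(x, br, labels = FALSE)
codes                    # NA 1 1 2 2

# Custom labels combined with an ordered result allow comparisons between levels.
o <- cut(x, br, labels = c("low", "high"),
         include.lowest = TRUE, ordered_result = TRUE)
o[1] < o[5]              # TRUE
```

Values that fall outside every interval become NA rather than raising an error, so it is worth checking `anyNA()` on the result when the breaks may not span the data.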


----------Details----------

When breaks is specified as a single number, the range of the data is divided into breaks pieces of equal length, and then the outer limits are moved away by 0.1% of the range to ensure that the extreme values both fall within the break intervals. (If x is a constant vector, equal-length intervals are created that cover the single value.)
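To make the 0.1% adjustment concrete, here is a sketch that compares `cut(x, breaks = 4)` with break points rebuilt by hand from the description above. The data and the helper names `rng`, `dx` and `manual` are assumptions for illustration; the manual construction follows the paragraph's wording and is not taken from the function's source.

```r
x <- c(0, 2.5, 5, 7.5, 10)   # hypothetical data with range 0..10

# Let cut() choose four equal-length intervals.
f <- cut(x, breaks = 4)
nlevels(f)                   # 4
anyNA(f)                     # FALSE: both extremes fall inside an interval

# Rebuild the breaks by hand: equal pieces, outer limits pushed out by 0.1% of the range.
rng <- range(x)
dx  <- diff(rng)
manual <- seq(rng[1], rng[2], length.out = 5)
manual[c(1, 5)] <- c(rng[1] - dx / 1000, rng[2] + dx / 1000)

g <- cut(x, breaks = manual)
all(as.integer(f) == as.integer(g))   # TRUE: same interval assignment
```

Without the outward shift, the minimum value would sit exactly on the first break and, with right-closed intervals, would be coded as NA.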

If a labels parameter is specified, its values are used to name the factor levels. If none is specified, the factor level labels are constructed as "(b1, b2]", "(b2, b3]" etc. for right = TRUE and as "[b1, b2)", ... if right = FALSE. In this case, dig.lab indicates the minimum number of digits that should be used in formatting the numbers b1, b2, .... A larger value (up to 12) will be used if needed to distinguish between any pair of endpoints: if this fails, labels such as "Range3" will be used.
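The effect of dig.lab on the generated labels can be sketched as follows (the break points `br` and `close_br` are made-up values chosen so that three digits are, respectively, sufficient and insufficient):

```r
br <- c(0, 1.2345, 2.5)      # hypothetical break points
v  <- c(0.5, 2)              # hypothetical data

# Default dig.lab = 3 rounds the endpoints shown in the labels.
levels(cut(v, br))                 # "(0,1.23]" "(1.23,2.5]"

# A larger dig.lab shows more digits.
levels(cut(v, br, dig.lab = 5))    # "(0,1.2345]" "(1.2345,2.5]"

# Endpoints that look identical at 3 digits get more digits automatically.
close_br <- c(1.0001, 1.0002, 1.0003)
lv <- levels(cut(c(1.00015, 1.00025), close_br))
lv                                 # "(1.0001,1.0002]" "(1.0002,1.0003]"

# Explicit labels bypass the interval notation entirely.
levels(cut(v, br, labels = c("first", "second")))   # "first" "second"
```

Only the labels are affected by dig.lab; the interval each value is assigned to depends on the unrounded break points.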


----------Value----------

A factor is returned, unless labels = FALSE, which results in the mere integer level codes.


----------Note----------

Instead of table(cut(x, br)), hist(x, br, plot = FALSE) is more efficient and less memory hungry. Instead of cut(*, labels = FALSE), findInterval() is more efficient.
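The equivalences mentioned in this note can be checked on simulated data. This is a sketch under two assumptions: every value lies strictly inside the range of the breaks, and no value coincides with a break point (both hold here in practice because the data are continuous draws well within -6..6).

```r
set.seed(1)                  # arbitrary seed, only for reproducibility
x  <- rnorm(1000)            # simulated data
br <- -6:6                   # breaks assumed to cover all values

# Counts per interval: table(cut()) versus hist(..., plot = FALSE).
counts_cut  <- as.vector(table(cut(x, br)))
counts_hist <- hist(x, br, plot = FALSE)$counts
all(counts_cut == counts_hist)     # TRUE

# Integer codes: cut(labels = FALSE) versus findInterval().
# findInterval() uses left-closed intervals, so it is compared with right = FALSE.
codes <- cut(x, br, labels = FALSE, right = FALSE)
fi    <- findInterval(x, br)
all(codes == fi)                   # TRUE
```

The two approaches differ at the edges: values outside the breaks give NA from cut(), while findInterval() returns 0 or the number of break points, and hist() stops with an error.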


----------See Also----------

split for splitting a variable according to a group factor; factor, tabulate, table, findInterval().

quantile for ways of choosing breaks of roughly equal content (rather than length).
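As a sketch of the quantile approach (the data `x` and the choice of quartiles are assumptions for illustration), break points taken from quantile() give intervals with roughly equal counts rather than equal length:

```r
x <- 1:100                                    # hypothetical data

# Quartile break points: 0%, 25%, 50%, 75%, 100%.
q <- quantile(x, probs = seq(0, 1, by = 0.25))

# include.lowest = TRUE keeps the minimum, which sits exactly on the first break.
f <- cut(x, breaks = q, include.lowest = TRUE)
as.vector(table(f))                           # 25 25 25 25
```

With heavily tied data, two quantiles can coincide, and cut() rejects break points that are not unique, so the breaks may need to be de-duplicated first.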


----------Examples----------


Z <- stats::rnorm(10000)
table(cut(Z, breaks = -6:6))
sum(table(cut(Z, breaks = -6:6, labels=FALSE)))
sum(graphics::hist(Z, breaks = -6:6, plot=FALSE)$counts)

cut(rep(1,5), 4)  #-- dummy
tx0 <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)
x <- rep(0:8, tx0)
stopifnot(table(x) == tx0)

table( cut(x, breaks = 8))
table( cut(x, breaks = 3*(-2:5)))
table( cut(x, breaks = 3*(-2:5), right = FALSE))

##--- some values OUTSIDE the breaks:
table(cx  <- cut(x, breaks = 2*(0:4)))
table(cxl <- cut(x, breaks = 2*(0:4), right = FALSE))
which(is.na(cx));  x[is.na(cx)]  #-- the first 9  values  0
which(is.na(cxl)); x[is.na(cxl)] #-- the last  5  values  8


## Label construction:
y <- stats::rnorm(100)
table(cut(y, breaks = pi/3*(-3:3)))
table(cut(y, breaks = pi/3*(-3:3), dig.lab=4))

table(cut(y, breaks =  1*(-3:3), dig.lab=4))
# extra digits don't "harm" here
table(cut(y, breaks =  1*(-3:3), right = FALSE))
#- the same, since no exact INT!

## sometimes the default dig.lab is not enough to avoid confusion:
aaa <- c(1,2,3,4,5,2,3,4,5,6,7)
cut(aaa, 3)
cut(aaa, 3, dig.lab=4, ordered_result = TRUE)

## one way to extract the breakpoints
labs <- levels(cut(aaa, 3))
cbind(lower = as.numeric( sub("\\((.+),.*", "\\1", labs) ),
      upper = as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", labs) ))

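When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).


Notes:
Note 1: To make learning easier, this document was translated by LoveR, the 生物统计家园网 robot. It is intended only as a personal reference for learning R; 生物统计家园 retains the copyright.
Note 2: Because the translation was produced automatically by a robot, inaccuracies are unavoidable. Carefully comparing the Chinese and English text while reading can help with learning R.
Note 3: If you find an inaccuracy, please reply in this thread and it will be revised over time.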