factor(base)
factor() belongs to R package: base
Factors
Translator: LoveR robot, 生物统计家园网
----------Description----------
The function factor is used to encode a vector as a factor (the terms "category" and "enumerated type" are also used for factors). If argument ordered is TRUE, the factor levels are assumed to be ordered. For compatibility with S there is also a function ordered.
is.factor, is.ordered, as.factor and as.ordered are the membership and coercion functions for these classes.
----------Usage----------
factor(x = character(), levels, labels = levels,
exclude = NA, ordered = is.ordered(x))
ordered(x, ...)
is.factor(x)
is.ordered(x)
as.factor(x)
as.ordered(x)
addNA(x, ifany=FALSE)
----------Arguments----------
Argument: x
a vector of data, usually taking a small number of distinct values.
Argument: levels
an optional vector of the values that x might have taken. The default is the unique set of values taken by as.character(x), sorted into increasing order of x. Note that this set can be smaller than sort(unique(x)).
Argument: labels
either an optional vector of labels for the levels (in the same order as levels after removing those in exclude), or a character string of length 1.
Argument: exclude
a vector of values to be excluded when forming the set of levels. This should be of the same type as x, and will be coerced if necessary.
Argument: ordered
logical flag to determine if the levels should be regarded as ordered (in the order given).
Argument: ...
(in ordered(.)): any of the above, apart from ordered itself.
Argument: ifany
(in addNA): only add an NA level if it is used, i.e. if any(is.na(x)).
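As a rough illustration of how levels, labels and exclude interact, the sketch below uses made-up survey responses (the vector resp and its values are assumptions for the example, not part of the help page).

```r
# Hypothetical survey responses, made up for illustration.
resp <- c("low", "high", "medium", "low", "n/a")

# levels fixes the set and order of categories; exclude removes "n/a"
# from that set, so labels needs one entry per remaining level.
f <- factor(resp,
            levels  = c("low", "medium", "high", "n/a"),
            labels  = c("L", "M", "H"),
            exclude = "n/a")
levels(f)       # the three labels, in the order given by levels
as.integer(f)   # the excluded value is coded as NA

# ordered() accepts the same arguments apart from ordered itself.
o <- ordered(resp, levels = c("low", "medium", "high"))
is.ordered(o)
```

Values that are not among the levels (here "n/a" in the call to ordered) become NA rather than raising an error.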
----------Details----------
The type of the vector x is not restricted; it need only have an as.character method and be sortable (by sort.list).
Ordered factors differ from factors only in their class, but methods and the model-fitting functions treat the two classes quite differently.
The encoding of the vector happens as follows. First all the values in exclude are removed from levels. If x[i] equals levels[j], then the i-th element of the result is j. If no match is found for x[i] in levels, then the i-th element of the result is set to NA.
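The encoding rule above can be mimicked by hand with match(). This is only a sketch of the described behaviour, not the actual source of factor(); the names x, lev, excl and lev_kept are chosen for the example.

```r
x    <- c("b", "a", "c", "b", "z")
lev  <- c("a", "b", "c", "z")
excl <- "z"

# Step 1: remove the excluded values from the candidate levels.
lev_kept <- lev[!lev %in% excl]

# Step 2: element i gets code j when x[i] equals lev_kept[j], else NA.
codes <- match(x, lev_kept)

# The same inputs passed to factor() give the same integer codes.
f <- factor(x, levels = lev, exclude = excl)
identical(codes, as.integer(f))
```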
Normally the "levels" used as an attribute of the result are the reduced set of levels after removing those in exclude, but this can be altered by supplying labels. This should either be a set of new labels for the levels, or a character string, in which case the levels are that character string with a sequence number appended.
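A small sketch of the two forms of labels described above; the input vector and the label strings are arbitrary choices for the example.

```r
v <- c("x", "y", "x", "z")

# One label per level: the levels are renamed in order.
g <- factor(v, labels = c("first", "second", "third"))
levels(g)

# A single character string: a sequence number is appended to it.
h <- factor(v, labels = "grp")
levels(h)
```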
factor(x, exclude=NULL) applied to a factor is a no-operation unless there are unused levels: in that case, a factor with the reduced level set is returned. If exclude is used it should also be a factor with the same level set as x or a set of codes for the levels to be excluded.
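To make the effect on unused levels concrete, here is a minimal sketch with a factor that declares a level "c" it never uses (the data are invented for the example).

```r
ff <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))
nlevels(ff)               # "c" is declared but unused

# Re-applying factor() returns a factor with the reduced level set.
f2 <- factor(ff)
levels(f2)

# Subsetting with drop = TRUE has the same effect.
f3 <- ff[, drop = TRUE]
levels(f3)
```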
The codes of a factor may contain NA. For a numeric x, set exclude=NULL to make NA an extra level (prints as <NA>); by default, this is the last level.
If NA is a level, the way to set a code to be missing (as opposed to the code of the missing level) is to use is.na on the left-hand-side of an assignment (as in is.na(f)[i] <- TRUE; indexing inside is.na does not work). Under those circumstances missing values are currently printed as <NA>, i.e., identical to entries of level NA.
is.factor is generic: you can write methods to handle specific classes of objects, see InternalMethods.
----------Value----------
factor returns an object of class "factor" which has a set of integer codes the length of x with a "levels" attribute of mode character and unique (!anyDuplicated(.)) entries. If argument ordered is true (or ordered() is used) the result has class c("ordered", "factor").
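The structure of the returned object can be inspected directly. The following sketch uses a tiny invented vector and only standard base functions.

```r
f <- factor(c("lo", "hi", "lo"))
class(f)                   # "factor"
typeof(f)                  # the codes are stored as integers
length(f)                  # same length as the input
is.character(levels(f))    # the levels attribute is a character vector
anyDuplicated(levels(f))   # 0: the levels are unique

# With ordered = TRUE the class vector gains "ordered".
o <- factor(c("lo", "hi", "lo"), ordered = TRUE)
class(o)
```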
Applying factor to an ordered or unordered factor returns a factor (of the same type) with just the levels which occur: see also [.factor for a more transparent way to achieve this.
is.factor returns TRUE or FALSE depending on whether its argument is of type factor or not. Correspondingly, is.ordered returns TRUE when its argument is an ordered factor and FALSE otherwise.
as.factor coerces its argument to a factor. It is an abbreviated form of factor.
as.ordered(x) returns x if this is ordered, and ordered(x) otherwise.
addNA modifies a factor by turning NA into an extra level (so that NA values are counted in tables, for instance).
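A short sketch of what addNA changes in a tabulation, using a small invented factor; it relies on the default behaviour of table(), which keeps an NA level that is already present in a factor.

```r
v <- factor(c("a", NA, "b", "a"))

t_plain <- table(v)          # the missing value is not counted
t_na    <- table(addNA(v))   # NA is now a level, so it is counted

sum(t_plain)
sum(t_na)

# With ifany = TRUE, a factor without missing values is left unchanged.
w <- factor(c("a", "b"))
nlevels(addNA(w))                 # an (unused) NA level is added
nlevels(addNA(w, ifany = TRUE))   # no NA level is added
```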
----------Warning----------
The interpretation of a factor depends on both the codes and the "levels" attribute. Be careful only to compare factors with the same set of levels (in the same order). In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
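The pitfall with as.numeric can be seen on a small numeric vector (the values are arbitrary). This sketch contrasts the internal codes with the two recovery idioms mentioned above.

```r
num <- c(10, 2.5, 10, 7)
f <- factor(num)

# as.numeric() returns the internal codes, not the original values.
codes <- as.numeric(f)

# Recommended idiom: convert the levels once, then index by the factor.
back1 <- as.numeric(levels(f))[f]

# Equivalent here, but converts every element individually.
back2 <- as.numeric(as.character(f))
```

The word "approximately" in the text matters: the levels are character representations, so values with many significant digits may not round-trip exactly.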
The levels of a factor are by default sorted, but the sort order may well depend on the locale at the time of creation, and should not be assumed to be ASCII.
There are some anomalies associated with factors that have NA as a level. It is suggested to use them sparingly, e.g., only for tabulation purposes.
----------Comparison operators and group generic methods----------
There are "factor" and "ordered" methods for the group generic Ops, which provide methods for the Comparison operators, and "ordered" methods for the min, max, and range generics in Summary. (The remaining members of these groups, and the Math group, generate an error as they are not meaningful for factors.)
Only == and != can be used for factors: a factor can only be compared to another factor with an identical set of levels (not necessarily in the same ordering) or to a character vector. Ordered factors are compared in the same way, but the general dispatch mechanism precludes comparing ordered and unordered factors.
All the comparison operators are available for ordered factors. Collation is done by the levels of the operands: if both operands are ordered factors they must have the same level set.
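The comparison rules above can be sketched as follows; the factors a, b and sizes are invented for the example.

```r
a <- factor(c("x", "y", "x"))

# An unordered factor can be compared with a character vector ...
eq_chr <- a == "x"

# ... or with a factor having the same level set, in any order.
b <- factor(c("x", "x", "y"), levels = c("y", "x"))
eq_fac <- a == b

# Ordered factors support all comparison operators; collation
# follows the order of the levels, not alphabetical order.
sizes <- ordered(c("S", "L", "M"), levels = c("S", "M", "L"))
lt    <- sizes < "L"
top   <- max(sizes)
```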
----------Note----------
In earlier versions of R, storing character data as a factor was more space efficient if there was even a small proportion of repeats. However, identical character strings now share storage, so the difference is small in most cases. (Integer values are stored in 4 bytes whereas each reference to a character string needs a pointer of 4 or 8 bytes.)
----------References----------
Statistical Models in S. Wadsworth & Brooks/Cole.
----------See Also----------
[.factor for subsetting of factors.
gl for construction of balanced factors and C for factors with specified contrasts. levels and nlevels for accessing the levels, and unclass to get integer codes.
----------Examples----------
(ff <- factor(substring("statistics", 1:10, 1:10), levels=letters))
as.integer(ff) # the internal codes
(f. <- factor(ff)) # drops the levels that do not occur
ff[, drop=TRUE] # the same, more transparently
factor(letters[1:20], labels="letter")
class(ordered(4:1)) # "ordered", inheriting from "factor"
z <- factor(LETTERS[3:1], ordered = TRUE)
## and "relational" methods work:
stopifnot(sort(z)[c(1,3)] == range(z), min(z) < max(z))
## suppose you want "NA" as a level, and to allow missing values.
(x <- factor(c(1, 2, NA), exclude = NULL))
is.na(x)[2] <- TRUE
x # [1] 1 <NA> <NA>
is.na(x)
# [1] FALSE TRUE FALSE
## Using addNA()
Month <- airquality$Month
table(addNA(Month))
table(addNA(Month, ifany=TRUE))