aggregate(stats)
aggregate() belongs to the R package: stats
Compute Summary Statistics of Data Subsets
Translator: 生物统计家园网 robot LoveR
Description
Splits the data into subsets, computes summary statistics for each, and returns the result in a convenient form.
Usage
aggregate(x, ...)
## Default S3 method:
aggregate(x, ...)
## S3 method for class 'data.frame'
aggregate(x, by, FUN, ..., simplify = TRUE)
## S3 method for class 'formula'
aggregate(formula, data, FUN, ...,
          subset, na.action = na.omit)
## S3 method for class 'ts'
aggregate(x, nfrequency = 1, FUN = sum, ndeltat = 1,
          ts.eps = getOption("ts.eps"), ...)
Arguments
Argument: x
an R object.
Argument: by
a list of grouping elements, each as long as the variables in x.
Argument: FUN
a function to compute the summary statistics which can be applied to all data subsets.
Argument: simplify
a logical indicating whether results should be simplified to a vector or matrix if possible.
Argument: formula
a formula, such as y ~ x or cbind(y1, y2) ~ x1 + x2, where the y variables are numeric data to be split into groups according to the grouping x variables (usually factors).
Argument: data
a data frame (or list) from which the variables in formula should be taken.
Argument: subset
an optional vector specifying a subset of observations to be used.
Argument: na.action
a function which indicates what should happen when the data contain NA values. The default is to ignore missing values in the given variables.
Argument: nfrequency
new number of observations per unit of time; must be a divisor of the frequency of x.
Argument: ndeltat
new fraction of the sampling period between successive observations; must be a divisor of the sampling interval of x.
Argument: ts.eps
tolerance used to decide if nfrequency is a sub-multiple of the original frequency.
Argument: ...
further arguments passed to or used by methods.
Details
aggregate is a generic function with methods for data frames and time series.
The default method aggregate.default uses the time series method if x is a time series, and otherwise coerces x to a data frame and calls the data frame method.
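The dispatch rule described above can be sketched with a small matrix and a short time series. The object names (m, g, x and the res_* results) are made up for illustration; this is a minimal sketch of the documented behaviour, not an excerpt from the original help page.

```r
# A plain matrix is not a time series, so the default method coerces it
# to a data frame and calls the data frame method.
m <- cbind(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))
g <- c("u", "v", "u", "v")
res_matrix <- aggregate(m, by = list(grp = g), FUN = sum)

# Doing the coercion by hand should give the same result.
res_df <- aggregate(as.data.frame(m), by = list(grp = g), FUN = sum)

# A "ts" object is handled by the time series method instead.
x <- ts(1:8, frequency = 4)
res_ts <- aggregate(x, nfrequency = 1, FUN = sum)

print(res_matrix)
print(res_ts)
```

Here res_matrix and res_df should agree, while res_ts is itself a time series rather than a data frame.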
aggregate.data.frame is the data frame method. If x is not a data frame, it is coerced to one, which must have a non-zero number of rows. Then, each of the variables (columns) in x is split into subsets of cases (rows) of identical combinations of the components of by, and FUN is applied to each such subset with further arguments in ... passed to it. The result is reformatted into a data frame containing the variables in by and x. The ones arising from by contain the unique combinations of grouping values used for determining the subsets, and the ones arising from x the corresponding summaries for the subset of the respective variables in x. If simplify is true, summaries are simplified to vectors or matrices if they have a common length of one or greater than one, respectively; otherwise, lists of summary results according to subsets are obtained. Rows with missing values in any of the by variables will be omitted from the result. (Note that versions of R prior to 2.11.0 required FUN to be a scalar function.)
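The behaviour of the data frame method described above (rows with NA in by are dropped, and simplify controls the shape of multi-value summaries) can be illustrated with a toy data set. The names df, grp, means, ranges and ranges_list are hypothetical and chosen only for this sketch.

```r
df <- data.frame(val = c(1, 2, 3, 4, 5, 6))
grp <- c("a", "a", "b", "b", NA, "b")

# Scalar summary: one row per group; the row whose group is NA is omitted.
means <- aggregate(df, by = list(grp = grp), FUN = mean)

# range() returns two values per subset, so with simplify = TRUE
# the 'val' column of the result is a two-column matrix.
ranges <- aggregate(df, by = list(grp = grp), FUN = range)

# With simplify = FALSE the summaries are kept as a list instead.
ranges_list <- aggregate(df, by = list(grp = grp), FUN = range,
                         simplify = FALSE)

print(means)
print(ranges)
str(ranges_list)
```

Inspecting ranges$val with is.matrix() and ranges_list$val with is.list() is a quick way to see which form a given call produced.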
aggregate.formula is a standard formula interface to aggregate.data.frame.
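As a rough illustration of how the formula interface relates to the data frame method, the sketch below computes the same monthly means both ways using the built-in airquality data. The variable names via_formula and via_df are made up here. It assumes the default na.action = na.omit shown in the usage section, which is why the data frame call needs na.rm = TRUE to match.

```r
# Formula method: rows with NA in Ozone or Month are removed by
# na.action = na.omit before FUN is applied.
via_formula <- aggregate(Ozone ~ Month, data = airquality, FUN = mean)

# Data frame method: only rows with NA in 'by' are dropped, so missing
# Ozone values reach FUN; na.rm = TRUE is passed on to mean() via '...'.
via_df <- aggregate(airquality["Ozone"],
                    by = list(Month = airquality$Month),
                    FUN = mean, na.rm = TRUE)

print(via_formula)
print(via_df)
```

Without na.rm = TRUE, the data frame call would return NA for any month that contains a missing Ozone value.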
aggregate.ts is the time series method, and requires FUN to be a scalar function. If x is not a time series, it is coerced to one. Then, the variables in x are split into appropriate blocks of length frequency(x) / nfrequency, and FUN is applied to each such block, with further (named) arguments in ... passed to it. The result returned is a time series with frequency nfrequency holding the aggregated values. Note that this makes most sense for a quarterly or yearly result when the original series covers a whole number of quarters or years: in particular, aggregating a monthly series to quarters starting in February does not give a conventional quarterly series.
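The block-splitting rule for time series can be checked by hand on a small synthetic series. The names monthly, quarterly and by_hand are illustrative only; the series is invented data, not part of the original examples.

```r
# Two years of monthly data starting in January, so the series covers
# a whole number of quarters.
monthly <- ts(1:24, start = c(2000, 1), frequency = 12)

# Block length is frequency(monthly) / nfrequency = 12 / 4 = 3.
quarterly <- aggregate(monthly, nfrequency = 4, FUN = sum)

# The same sums computed manually: each matrix column is one block of 3.
by_hand <- colSums(matrix(as.numeric(monthly), nrow = 3))

print(quarterly)
print(by_hand)
```

The result keeps its time series attributes, so frequency(quarterly) reports the new frequency of 4.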
FUN is passed to match.fun, and hence it can be a function or a symbol or character string naming a function.
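Because FUN goes through match.fun, the following calls on the built-in ToothGrowth data should be interchangeable. The result names r_fun, r_chr and r_anon are hypothetical; this is a small sketch rather than an exhaustive list of accepted forms.

```r
groups <- list(supp = ToothGrowth$supp)

# FUN given as a function object.
r_fun <- aggregate(ToothGrowth["len"], by = groups, FUN = mean)

# FUN given as a character string naming the function.
r_chr <- aggregate(ToothGrowth["len"], by = groups, FUN = "mean")

# FUN given as an anonymous function.
r_anon <- aggregate(ToothGrowth["len"], by = groups,
                    FUN = function(v) mean(v))

print(r_fun)
```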
Value
For the time series method, a time series of class "ts" or class c("mts", "ts").
For the data frame method, a data frame with columns corresponding to the grouping variables in by followed by aggregated columns from x. If by has names, the non-empty names are used to label the columns in the results, with unnamed grouping variables being named Group.i for by[[i]].
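The column naming rule for the data frame method can be seen with a by list in which only one element is named. The objects vals and res are invented for this sketch.

```r
vals <- data.frame(v = 1:4)

# First grouping element is unnamed, second is named 'size'.
res <- aggregate(vals,
                 by = list(c("a", "b", "a", "b"),
                           size = c("s", "s", "t", "t")),
                 FUN = sum)

# Grouping columns come first, then the aggregated column from 'vals'.
print(names(res))  # "Group.1" "size" "v"
print(res)
```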
Author(s)
Kurt Hornik, with contributions by Arni Magnusson.
References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
See Also
apply, lapply, tapply.
Examples
## Compute the averages for the variables in 'state.x77', grouped
## according to the region (Northeast, South, North Central, West) that
## each state belongs to.
aggregate(state.x77, list(Region = state.region), mean)
## Compute the averages according to region and the occurrence of more
## than 130 days of frost.
aggregate(state.x77,
          list(Region = state.region,
               Cold = state.x77[,"Frost"] > 130),
          mean)
## (Note that no state in 'South' is THAT cold.)
## example with character variables and NAs
testDF <- data.frame(v1 = c(1,3,5,7,8,3,5,NA,4,5,7,9),
                     v2 = c(11,33,55,77,88,33,55,NA,44,55,77,99) )
by1 <- c("red","blue",1,2,NA,"big",1,2,"red",1,NA,12)
by2 <- c("wet","dry",99,95,NA,"damp",95,99,"red",99,NA,NA)
aggregate(x = testDF, by = list(by1, by2), FUN = "mean")
# and if you want to treat NAs as a group
fby1 <- factor(by1, exclude = "")
fby2 <- factor(by2, exclude = "")
aggregate(x = testDF, by = list(fby1, fby2), FUN = "mean")
## Formulas, one ~ one, one ~ many, many ~ one, and many ~ many:
aggregate(weight ~ feed, data = chickwts, mean)
aggregate(breaks ~ wool + tension, data = warpbreaks, mean)
aggregate(cbind(Ozone, Temp) ~ Month, data = airquality, mean)
aggregate(cbind(ncases, ncontrols) ~ alcgp + tobgp, data = esoph, sum)
## Dot notation:
aggregate(. ~ Species, data = iris, mean)
aggregate(len ~ ., data = ToothGrowth, mean)
## Often followed by xtabs():
ag <- aggregate(len ~ ., data = ToothGrowth, mean)
xtabs(len ~ ., data = ag)
## Compute the average annual approval ratings for American presidents.
aggregate(presidents, nfrequency = 1, FUN = mean)
## Give the summer less weight.
aggregate(presidents, nfrequency = 1,
          FUN = weighted.mean, w = c(1, 1, 0.5, 1))
When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).