winsorize(robustHD)
winsorize() belongs to R package: robustHD
Data cleaning by winsorization
Translator: 生物统计家园网 robot LoveR
----------Description----------
Clean data by means of winsorization, i.e., by shrinking outlying observations to the border of the main part of the data.
----------Usage----------
winsorize(x, ...)
## Default S3 method:
winsorize(x, standardized = FALSE,
          centerFun = median, scaleFun = mad, const = 2,
          return = c("data", "weights"), ...)
## S3 method for class 'matrix'
winsorize(x, standardized = FALSE,
          centerFun = median, scaleFun = mad, const = 2,
          prob = 0.95, tol = .Machine$double.eps^0.5,
          return = c("data", "weights"), ...)
## S3 method for class 'data.frame'
winsorize(x, ...)
----------Arguments----------
Argument: x
a numeric vector, matrix or data frame to be cleaned.
Argument: standardized
a logical indicating whether the data are already robustly standardized.
Argument: centerFun
a function to compute a robust estimate for the center to be used for robust standardization (defaults to median). Ignored if standardized is TRUE.
Argument: scaleFun
a function to compute a robust estimate for the scale to be used for robust standardization (defaults to mad). Ignored if standardized is TRUE.
Argument: const
numeric; tuning constant to be used in univariate winsorization (defaults to 2).
Argument: prob
numeric; probability for the quantile of the chi-squared distribution to be used in multivariate winsorization (defaults to 0.95).
Argument: tol
a small positive numeric value used to determine singularity issues in the computation of correlation estimates based on bivariate winsorization (see corHuber).
Argument: return
character string; if standardized is TRUE, this specifies the type of return value. Possible values are "data" for returning the cleaned data, or "weights" for returning data cleaning weights.
Argument: ...
for the generic function, additional arguments to be passed down to methods. For the "data.frame" method, additional arguments to be passed down to the "matrix" method. For the other methods, additional arguments to be passed down to robStandardize.
----------Details----------
The borders of the main part of the data are defined on the scale of the robustly standardized data. In the univariate case, the borders are given by +/-const, thus a symmetric distribution is assumed. In the multivariate case, a normal distribution is assumed and the data are shrunken towards the boundary of a tolerance ellipse with coverage probability prob. The boundary of this ellipse is thereby given by all points that have a squared Mahalanobis distance equal to the quantile of the chi-squared distribution given by prob.
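The univariate case described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the robustHD source: the helper name winsorize_sketch is hypothetical, and the defaults (median, mad, const = 2) are taken from the Usage section.

```r
# Hypothetical helper: univariate winsorization written out by hand.
winsorize_sketch <- function(x, const = 2) {
  center <- median(x)                  # robust estimate of the center
  scale <- mad(x)                      # robust estimate of the scale
  z <- (x - center) / scale            # robustly standardized data
  z <- pmin(pmax(z, -const), const)    # shrink values outside [-const, const] to the border
  z * scale + center                   # back-transform to the original scale
}

set.seed(1234)                         # for reproducibility
x <- rnorm(10)                         # standard normal
x[1] <- x[1] * 10                      # introduce outlier
cleaned <- winsorize_sketch(x)
```

Only observations whose standardized value lies outside +/-const are changed; the remaining observations are returned as they were, up to floating-point rounding.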
----------Value----------
If standardized is TRUE and return is "weights", a set of data cleaning weights. Multiplying each observation of the standardized data by the corresponding weight yields the cleaned standardized data.
Otherwise an object of the same type as the original data x containing the cleaned data is returned.
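To illustrate how data cleaning weights act on multivariate standardized data, the following base R sketch computes one weight per observation from the squared Mahalanobis distance and the chi-squared quantile given by prob. It is an illustration under assumptions, not the robustHD implementation: the helper mv_weights_sketch is hypothetical, and a Spearman correlation matrix stands in for the correlation estimate based on bivariate winsorization (see corHuber).

```r
# Hypothetical helper: weights that shrink points onto the tolerance ellipse.
# z: robustly standardized data matrix; R: an estimate of its correlation matrix.
mv_weights_sketch <- function(z, R, prob = 0.95) {
  d2 <- mahalanobis(z, center = rep(0, ncol(z)), cov = R)  # squared Mahalanobis distances
  q <- qchisq(prob, df = ncol(z))                          # squared distance at the ellipse boundary
  pmin(sqrt(q / d2), 1)                                    # weight < 1 only outside the ellipse
}

set.seed(1234)
x <- matrix(rnorm(40), ncol = 2)
x[1, ] <- c(8, -8)                                         # introduce an outlier
# robust standardization, column by column, with median and mad
z <- sweep(sweep(x, 2, apply(x, 2, median)), 2, apply(x, 2, mad), "/")
R <- cor(z, method = "spearman")                           # assumption: stand-in for a robust estimate
w <- mv_weights_sketch(z, R)
cleaned <- z * w                                           # row i is multiplied by w[i]
```

Multiplying a row by its weight scales the squared Mahalanobis distance by the squared weight, so every shrunken observation ends up on the boundary of the ellipse.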
----------Note----------
Data cleaning weights are only meaningful for standardized data. In the general case, the data need to be standardized first, then the data cleaning weights can be computed and applied to the standardized data, after which the cleaned standardized data need to be backtransformed to the original scale.
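The four steps in this note (standardize, compute weights, apply them, back-transform) can be written out as follows for a numeric vector. This is a sketch in base R; the weight formula pmin(const / abs(z), 1) is an assumption chosen so that weighted values land on the border +/-const, and it is not taken from the package source.

```r
set.seed(1234)
x <- rnorm(10)
x[1] <- x[1] * 10                     # introduce outlier

const <- 2                            # tuning constant (default from the Usage section)
center <- median(x)
scale <- mad(x)

z <- (x - center) / scale             # 1. standardize robustly
w <- pmin(const / abs(z), 1)          # 2. data cleaning weights (assumed form)
z_clean <- w * z                      # 3. apply weights to the standardized data
x_clean <- z_clean * scale + center   # 4. back-transform to the original scale
```

Applying the weights directly to x instead of z would shrink observations towards zero rather than towards the robust center, which is why the weights are only meaningful on the standardized scale.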
----------Author(s)----------
Andreas Alfons, based on code by Jafar A. Khan, Stefan Van Aelst and Ruben H. Zamar
----------References----------
Khan, J.A., Van Aelst, S. and Zamar, R.H. (2007) Robust linear model selection based on least angle regression. Journal of the American Statistical Association, 102(480), 1289–1299.
----------See Also----------
corHuber
----------Examples----------
library(robustHD)  # provides winsorize()

## generate data
set.seed(1234)     # for reproducibility
x <- rnorm(10)     # standard normal
x[1] <- x[1] * 10  # introduce outlier

## winsorize data
x                  # original data, including the outlier
winsorize(x)       # cleaned data