calout.detect(parody)
calout.detect() is part of the R package parody
Interface to a modular calibrated outlier detection system
----------Description----------
Provides classical and resistant outlier detection procedures in which the outlier misclassification rate for Gaussian samples is held fixed over a range of sample sizes.
----------Usage----------
calout.detect(x, alpha = 0.05, method = c("GESD", "boxplot", "medmad",
"shorth", "hybrid"), k = ((length(x)%%2) * floor(length(x)/2) +
(1 - (length(x)%%2)) * (length(x)/2 - 1)), scaling, ftype,
location, scale, gen.region = function(x, location, scale,
scaling, alpha) {
g <- scaling(length(x), alpha)
location(x) + c(-1, 1) * g * scale(x)
})
----------Arguments----------
Argument: x
data vector, NAs not allowed
Argument: alpha
outlier mislabeling rate for Gaussian samples
Argument: method
one of "GESD", "boxplot", "medmad" or "shorth" (the usage signature also lists "hybrid", which is not described here). "GESD" selects the generalized extreme studentized deviate procedure (Rosner, 1983); "boxplot" selects calibrated boxplot rules; "medmad" selects Hampel's method, which uses the sample median as the location estimate and the median absolute deviation (MAD) as the scale estimate; and "shorth" selects Rousseeuw's rule, which uses the midpoint of the shortest half sample as the location estimate and the length of that shortest half sample as the scale estimate. An important characteristic of the GESD procedure is that its critical values for outlier labeling are calibrated to preserve the overall Type I error rate of the procedure given that k tests will be performed, whether or not any outliers are present in the data.
Argument: k
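To make the GESD option concrete, here is a minimal base-R sketch of Rosner's (1983) generalized ESD procedure as it is usually stated in textbooks; it is not the parody implementation. At step i the most extreme studentized deviation R_i among the remaining points is compared with the critical value lambda_i = (n - i) t / sqrt((n - i - 1 + t^2)(n - i + 1)), where t is the Student t quantile at p = 1 - alpha/(2(n - i + 1)) with n - i - 1 degrees of freedom; the number of outliers is the largest i with R_i > lambda_i. The function name gesd_sketch is hypothetical, and its ind/val names simply mirror the components described under Value.

```r
# Minimal sketch of the generalized ESD procedure (Rosner, 1983) in base R.
# Hypothetical helper, not parody's implementation.
gesd_sketch <- function(x, k, alpha = 0.05) {
  stopifnot(!anyNA(x), k >= 1, k <= length(x) - 2)
  n <- length(x)
  idx <- seq_along(x)            # original indices of points not yet removed
  R <- lambda <- numeric(k)
  removed <- integer(k)
  for (i in seq_len(k)) {
    # most extreme studentized deviation among the remaining points
    dev <- abs(x[idx] - mean(x[idx])) / sd(x[idx])
    j <- which.max(dev)
    R[i] <- dev[j]
    removed[i] <- idx[j]
    idx <- idx[-j]
    # Rosner's critical value for step i
    p <- 1 - alpha / (2 * (n - i + 1))
    tq <- qt(p, df = n - i - 1)
    lambda[i] <- (n - i) * tq / sqrt((n - i - 1 + tq^2) * (n - i + 1))
  }
  # number of outliers = largest i with R_i > lambda_i (0 if none)
  n_out <- if (any(R > lambda)) max(which(R > lambda)) else 0
  keep <- removed[seq_len(n_out)]
  list(ind = keep, val = x[keep])
}

x <- c(1:10, 100)
res <- gesd_sketch(x, k = 2)
res$ind   # 11
res$val   # 100
```

Note that all k steps are run before any decision is made, which is what lets the procedure control the overall error rate regardless of how many outliers are actually present.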
for GESD, the prespecified upper limit on the number of outliers suspected in the data; defaults to roughly half the sample size n (floor(n/2) when n is odd, n/2 - 1 when n is even).
Argument: scaling
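The default for k in the usage signature is compact but hard to read. The hypothetical helper default_k below spells it out as an if/else and checks it against the original expression over a range of sample sizes.

```r
# Hypothetical helper: the default k from the usage signature, written out.
default_k <- function(n) {
  if (n %% 2 == 1) floor(n / 2) else n / 2 - 1
}

# The expression exactly as it appears in the usage signature.
usage_k <- function(n) {
  (n %% 2) * floor(n / 2) + (1 - (n %% 2)) * (n / 2 - 1)
}

# Both forms agree for every sample size from 3 to 100.
stopifnot(all(sapply(3:100, default_k) == sapply(3:100, usage_k)))

sapply(c(10, 11, 35), default_k)   # 4 5 17
```

For the 35-value lead vector used in the examples, the default is therefore k = 17.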
for resistant methods, scaling is a sample-size-dependent function giving the number of multiples of the scale estimate to lay off on each side of the location estimate to demarcate the inlier region; see Davies and Gather (1993) for the general formulation. The main contribution of this program is the development of scaling functions that “calibrate” outlier detection in Gaussian samples. The scaling function must take two arguments, n and alpha, and return a real number. If method=="boxplot", the default scaling=box.scale confines to alpha the probability of erroneously detecting one or more outliers in a pure Gaussian sample; scaling=function(n,alpha) 1.5 gives the standard boxplot outlier labeling rule. If method=="medmad", scaling=hamp.scale.4 confines the outlier mislabeling rate to alpha, whereas scaling=function(n,alpha) 5.2 gives Hampel's rule (Davies and Gather, 1993, p. 790). If method=="shorth", the default scaling=shorth.scale confines the outlier mislabeling rate to alpha.
Argument: ftype
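The sketch below shows, in base R only, how a scaling function plugs into the default gen.region from the usage section. It applies Hampel's fixed multiple 5.2 around the median. The choice of the unnormalized MAD, mad(x, constant = 1), as the scale estimate is an assumption (5.2 raw MADs is roughly 3.5 standard deviations for Gaussian data), and the helper names are hypothetical. A fixed constant like this does not calibrate the mislabeling rate; that is the role of hamp.scale.4.

```r
# Default gen.region, copied from the usage section.
gen_region <- function(x, location, scale, scaling, alpha) {
  g <- scaling(length(x), alpha)
  location(x) + c(-1, 1) * g * scale(x)
}

# Assumption: scale is the unnormalized MAD (no 1.4826 consistency factor).
raw_mad <- function(x) mad(x, constant = 1)

# Hampel's fixed rule: 5.2 scale units, independent of n and alpha.
hampel_rule <- function(n, alpha) 5.2

x <- c(1:10, 100)
r <- gen_region(x, median, raw_mad, hampel_rule, alpha = 0.05)
r                            # -9.6 21.6
which(x < r[1] | x > r[2])   # 11
```

Here the median is 6 and the raw MAD is 3, so the inlier region is 6 ± 15.6 and only the value 100 falls outside it.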
the type of “fourth” calculation. The standard definition uses 0.5 * floor((n + 3)/2) as the depth (sortile) of the fourth, counted from each end of the sorted sample; Hoaglin and Iglewicz (1987) give an “ideal” definition of the fourth that reduces the dependence of boxplot-based outlier detection performance on n mod 4 in small samples.
Argument: location
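The standard fourths can be computed directly from the depth formula above. This base-R sketch (hypothetical name fourths_std) averages the order statistics at floor(d) and ceiling(d) from each end; its output agrees with the hinges returned by R's fivenum(). The Hoaglin and Iglewicz “ideal” fourths are not reproduced here.

```r
# Hypothetical helper: standard fourths at depth d = 0.5 * floor((n + 3) / 2).
fourths_std <- function(x) {
  xs <- sort(x)
  n <- length(xs)
  d <- 0.5 * floor((n + 3) / 2)
  # a fractional depth (e.g. 3.5) averages the two neighbouring order statistics
  at <- c(floor(d), ceiling(d))
  c(lower = mean(xs[at]), upper = mean(xs[n + 1 - at]))
}

fourths_std(1:10)                 # lower 3, upper 8 (n = 10, d = 3)
fourths_std(c(1:10, 100))         # lower 3.5, upper 8.5 (n = 11, d = 3.5)
fivenum(c(1:10, 100))[c(2, 4)]    # same hinges: 3.5 8.5
```

Because d jumps by 0.5 only every other increment of n, the fourths move unevenly as n changes, which is the n mod 4 effect that the “ideal” definition is meant to reduce.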
a function on a vector returning a location estimate
Argument: scale
a function on a vector returning a scale estimate
Argument: gen.region
a function of x, location, scale, scaling, alpha that returns the inlier region as a 2-vector
----------Value----------
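The default gen.region returns location(x) ± g * scale(x) with g = scaling(n, alpha). As an illustration, the sketch below supplies shorth-style location and scale functions (midpoint and length of the shortest half sample, as described under method). Assumptions: the shortest half contains h = floor(n/2) + 1 order statistics, ties go to the leftmost window, and the constant multiple 2 is an arbitrary stand-in for the calibrated shorth.scale. All helper names are hypothetical; parody's own shorth code may differ.

```r
# Hypothetical helpers; assumption: the shortest half holds h = floor(n/2) + 1
# order statistics, and ties go to the leftmost window.
shortest_half <- function(x) {
  xs <- sort(x)
  n <- length(xs)
  h <- floor(n / 2) + 1
  widths <- xs[h:n] - xs[1:(n - h + 1)]
  i <- which.min(widths)
  c(xs[i], xs[i + h - 1])
}
shorth_location <- function(x) mean(shortest_half(x))   # midpoint
shorth_scale <- function(x) diff(shortest_half(x))      # length

# Default gen.region, copied from the usage section.
gen_region <- function(x, location, scale, scaling, alpha) {
  g <- scaling(length(x), alpha)
  location(x) + c(-1, 1) * g * scale(x)
}

x <- c(1, 2, 2.5, 3, 5, 9, 40)
# The constant 2 is an arbitrary stand-in for the calibrated shorth.scale.
r <- gen_region(x, shorth_location, shorth_scale,
                scaling = function(n, alpha) 2, alpha = 0.05)
r                            # -2 6
which(x < r[1] | x > r[2])   # 6 7
```

Here the shortest half of x is [1, 3], so the region is 2 ± 4 = [-2, 6] and the values 9 and 40 fall outside it. Swapping in a different gen.region would allow, for example, asymmetric regions without touching the location and scale functions.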
a list with components ind (indices of the outliers in the input vector), val (values of those outliers) and outlier.region, which is defined only for the resistant methods.
----------References----------
Rosner (1983), Technometrics; Hoaglin and Iglewicz (1987), Journal of the American Statistical Association; Carey, Walters, Wager and Rosner (1997), Technometrics.
----------Examples----------
library(parody)  # provides calout.detect and the hamp.scale.* scaling functions
lead <- c(83, 70, 62, 55, 56, 57, 57, 58, 59, 50, 51, 52, 52, 52, 54, 54, 45, 46, 48,
48, 49, 40, 40, 41, 42, 42, 44, 44, 35, 37, 38, 38, 34, 13, 14)
calout.detect(lead,alpha=.05,method="boxplot",ftype="ideal")
calout.detect(lead,alpha=.05,method="GESD",k=5)
calout.detect(lead,alpha=.05,method="medmad",scaling=hamp.scale.3)
calout.detect(lead,alpha=.05,method="shorth")
When reposting, please credit biostatistic.net (http://www.biostatistic.net).