OutlierPCDist(rrcovHD)
OutlierPCDist() belongs to the R package rrcovHD
Outlier identification in high dimensions using the PCDIST algorithm
Translator: 生物统计家园网 (biostatistic.net), robot LoveR
----------Description----------
The function implements a simple, automatic outlier detection method suitable for high-dimensional data. It treats each class independently and uses a statistically principled threshold for outliers. The algorithm can detect both mislabeled and abnormal samples without reference to other classes.
----------Usage----------
OutlierPCDist(x, ...)
## Default S3 method:
OutlierPCDist(x, grouping, control, k, explvar, trace=FALSE, ...)
## S3 method for class 'formula'
OutlierPCDist(formula, data, ..., subset, na.action)
----------Arguments----------
Argument: formula
a formula with no response variable, referring only to numeric variables.
Argument: data
an optional data frame (or similar: see model.frame) containing the variables in the formula formula.
Argument: subset
an optional vector used to select rows (observations) of the data matrix x.
Argument: na.action
a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. In a standard R session that option is na.omit.
Argument: ...
arguments passed to or from other methods.
Argument: x
a matrix or data frame.
Argument: grouping
grouping variable: a factor specifying the class for each observation.
Argument: control
a control object (S4) of one of the available control classes, e.g. CovControlMcd-class, CovControlOgk-class, CovControlSest-class, etc., containing estimation options. The class of this object determines which estimator will be used. Alternatively, a character string naming the estimator can be given: one of "auto", "sde", "mcd", "ogk", "m", "mve", "sfast", "surreal", "bisquare", "rocke". If "auto" is specified or the argument is missing, the function selects the estimator (see below for details).
Argument: k
Number of components to select for PCA. If missing, the number of components is calculated automatically.
Argument: explvar
Minimal explained variance used to calculate the number of components in PCA. If explvar is not provided, automatic dimensionality selection using profile likelihood, as proposed by Zhu and Ghodsi, is used.
Argument: trace
whether to print intermediate results. Default is trace = FALSE.
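As a rough illustration of the default method's arguments, the sketch below calls OutlierPCDist() on a matrix and a grouping factor, once with a fixed k and once with an estimator named through control. The iris data, the value k = 2, and the choice of "mcd" are assumptions made for this example only. The calls are guarded so that nothing runs if rrcovHD is not installed.

```r
# Example inputs (assumption: iris stands in for a grouped numeric data set)
x <- as.matrix(iris[, 1:4])
g <- iris$Species

if (requireNamespace("rrcovHD", quietly = TRUE)) {
  library(rrcovHD)

  # Fixed number of principal components; estimator left to the function
  obj_k <- OutlierPCDist(x, grouping = g, k = 2)

  # Same data, estimator named explicitly by a character string
  obj_mcd <- OutlierPCDist(x, grouping = g, control = "mcd", k = 2)

  # Per-observation 0/1 flags from the first fit
  print(getFlag(obj_k))
}
```

Passing a control object instead of a string would allow estimator options to be set as well, at the cost of a slightly longer call.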
----------Details----------
If the data set consists of two or more classes (specified by the grouping variable grouping), the method iterates through the classes present in the data, separates each class from the rest, and identifies the outliers relative to that class. Both types of outliers, the mislabeled and the abnormal samples, are thus treated in a homogeneous way.
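The class-by-class idea can be sketched in base R as follows. This is not the rrcovHD implementation: the helper name flag_by_class is hypothetical, classical mean and covariance are swapped in for the robust estimator, and the chi-square 0.975 quantile as cutoff and the convention that 0 marks an outlier are assumptions made for illustration.

```r
# Hypothetical helper, not part of rrcovHD: flag outliers one class at a time.
flag_by_class <- function(x, grouping, k = 2, prob = 0.975) {
  x <- as.matrix(x)
  grouping <- as.factor(grouping)
  flag <- rep(1L, nrow(x))               # assumption: 1 = regular, 0 = outlier

  for (cl in levels(grouping)) {
    idx <- which(grouping == cl)         # observations of the current class only

    # Classical PCA within the class, keeping the first k score columns
    scores <- prcomp(x[idx, , drop = FALSE])$x[, seq_len(k), drop = FALSE]

    # Squared Mahalanobis distances in the reduced space
    # (classical estimates here; a robust estimator would replace them)
    d2 <- mahalanobis(scores, colMeans(scores), cov(scores))

    # Assumed cutoff: chi-square quantile with k degrees of freedom
    flag[idx] <- as.integer(d2 <= qchisq(prob, df = k))
  }
  flag
}

flags <- flag_by_class(iris[, 1:4], iris$Species, k = 2)
table(iris$Species, flags)
```

Because each class is processed on its own, a sample that sits far from the bulk of its labeled class is flagged regardless of whether it is mislabeled or simply abnormal.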
The first step of the algorithm is dimensionality reduction using (classical) PCA. The number of components can be provided by the user. If it is missing, it is calculated either from the provided minimal explained variance or by automatic dimensionality selection using profile likelihood, as proposed by Zhu and Ghodsi.
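The two automatic ways of choosing the number of components might look roughly like this in base R. Both helpers (select_k_explvar, select_k_profile) are hypothetical names, not rrcovHD functions, and the profile-likelihood version is a simplified reading of the Zhu and Ghodsi idea: split the ordered eigenvalues into two groups, model each group as Gaussian with its own mean and a common variance, and keep the split with the highest log-likelihood.

```r
# Hypothetical helper: smallest k whose cumulative explained variance reaches explvar
select_k_explvar <- function(x, explvar = 0.9) {
  pc <- prcomp(x, center = TRUE, scale. = FALSE)   # classical PCA
  cumvar <- cumsum(pc$sdev^2) / sum(pc$sdev^2)     # cumulative share of variance
  which(cumvar >= explvar)[1]
}

# Hypothetical helper: profile-likelihood choice from a vector of eigenvalues
# (needs at least 3 eigenvalues that are not all equal within both groups)
select_k_profile <- function(ev) {
  ev <- sort(ev, decreasing = TRUE)
  p <- length(ev)
  loglik <- sapply(seq_len(p - 1), function(q) {
    g1 <- ev[1:q]
    g2 <- ev[(q + 1):p]
    # Common variance pooled over the two groups
    s2 <- (sum((g1 - mean(g1))^2) + sum((g2 - mean(g2))^2)) / (p - 2)
    sum(dnorm(g1, mean(g1), sqrt(s2), log = TRUE)) +
      sum(dnorm(g2, mean(g2), sqrt(s2), log = TRUE))
  })
  which.max(loglik)                                # best split point = chosen k
}

k_ev <- select_k_explvar(iris[, 1:4], explvar = 0.95)
k_pl <- select_k_profile(prcomp(iris[, 1:4])$sdev^2)
```

The explained-variance rule needs a user-chosen threshold, while the profile-likelihood rule looks for the gap in the scree plot and needs no tuning value.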
----------Value----------
An S4 object of class OutlierPCDist, which is a subclass of the virtual class Outlier.
----------Author(s)----------
Valentin Todorov <valentin.todorov@chello.at>
----------References----------
Detecting Outlier Samples in Microarray Data, Statistical Applications in Genetics and Molecular Biology Vol. 8.
plot via the use of profile likelihood. Computational Statistics & Data Analysis, Vol. 51, 918-930.
Robust tools for the imperfect world, To appear.
----------See Also----------
OutlierPCDist, Outlier
----------Examples----------
library(rrcovHD)            # also makes the hemophilia data available
data(hemophilia)
obj <- OutlierPCDist(gr~., data=hemophilia)
obj
getDistance(obj)            # returns an array of distances
getClassLabels(obj, 1)      # returns an array of indices for a given class
getCutoff(obj)              # returns an array of cutoff values (one per class, usually equal)
getFlag(obj)                # returns a 0/1 array of flags
plot(obj, class=2)          # standard plot function