PcaHubert(rrcov)
PcaHubert() belongs to the R package: rrcov
ROBPCA - ROBust method for Principal Components Analysis
Translator: 生物统计家园网 (biostatistic.net) robot LoveR
----------Description----------
The ROBPCA algorithm was proposed by Hubert et al. (2005); the name stands for 'ROBust method for Principal Components Analysis'. It is resistant to outliers in the data. The robust loadings are computed using projection-pursuit techniques and the MCD method, so ROBPCA can be applied to both low- and high-dimensional data sets. In low dimensions, the MCD method is applied.
----------Usage----------
PcaHubert(x, ...)
## Default S3 method:
PcaHubert(x, k = 0, kmax = 10, alpha = 0.75, mcd = TRUE, maxdir = 250, scale = FALSE, signflip = TRUE, trace = FALSE, ...)
## S3 method for class 'formula'
PcaHubert(formula, data = NULL, subset, na.action, ...)
----------Arguments----------
Argument: formula
a formula with no response variable, referring only to numeric variables.
Argument: data
an optional data frame (or similar: see model.frame) containing the variables referenced in formula.
Argument: subset
an optional vector used to select rows (observations) of the data matrix x.
Argument: na.action
a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The factory-fresh default is na.omit.
Argument: ...
arguments passed to or from other methods.
Argument: x
a numeric matrix (or data frame) which provides the data for the principal components analysis.
Argument: k
number of principal components to compute. If k is missing or k = 0, the algorithm itself determines the number of components by finding a k such that l_k/l_1 >= 10.E-3 and Σ_{j=1}^k l_j/Σ_{j=1}^r l_j >= 0.8. It is preferable to inspect the scree plot in order to choose the number of components and then run the function again. Default is k=0.
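The selection rule for k can be read as a short function. The following is a minimal sketch in base R, not the package's internal code: the helper name choose_k and its arguments are hypothetical, the threshold is taken literally from the text (10.E-3), and combining the two conditions by taking the smaller candidate is an assumption.

```r
# Hypothetical helper: one possible reading of the rule for choosing k.
# l is a vector of eigenvalues; min_ratio mirrors the 10.E-3 in the text.
choose_k <- function(l, min_ratio = 10e-3, min_cum = 0.8, kmax = 10) {
  l <- sort(l, decreasing = TRUE)
  # Largest index whose eigenvalue is not negligible relative to l_1
  k_ratio <- max(which(l / l[1] >= min_ratio))
  # Smallest index at which the cumulative share of variance reaches min_cum
  k_cum <- which(cumsum(l) / sum(l) >= min_cum)[1]
  min(k_cum, k_ratio, kmax)
}

choose_k(c(5, 3, 1, 0.5, 0.001))
choose_k(c(4, 3, 2, 1))
```

Inspecting a scree plot remains the recommended way to pick k; a function like this only makes the automatic rule easier to reason about.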
Argument: kmax
maximal number of principal components to compute. Default is kmax=10. If k is provided, kmax does not need to be specified, unless k is larger than 10.
Argument: alpha
this parameter controls the fraction of outliers the algorithm should resist. In MCD, alpha controls the size of the subsets over which the determinant is minimized, i.e. alpha*n observations are used for computing the determinant. Allowed values are between 0.5 and 1 and the default is 0.75.
Argument: mcd
logical. When the number of variables is sufficiently small, the loadings are computed as the eigenvectors of the MCD covariance matrix, so the function CovMcd() is called automatically. The number of principal components is then taken as k = rank(x). Default is mcd=TRUE. If mcd=FALSE, the ROBPCA algorithm is always applied.
Argument: maxdir
maximal number of random directions to use for computing the outlyingness of the data points. Default is maxdir=250. If the number n of observations is small, all possible n*(n-1)/2 pairs of observations are used to generate the directions.
Argument: scale
a logical value indicating whether the variables should be scaled to have unit variance (only possible if there are no constant variables). The function mad is used as the scale function; alternatively, a vector of length equal to the number of columns of x can be supplied. The value is passed to scale and the result of the scaling is stored in the scale slot. Default is scale = FALSE.
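To make the effect of MAD-based scaling concrete, here is a small base R sketch. It only illustrates what scaling columns by mad via scale() looks like; it is not the internal code of PcaHubert, and the simulated data and variable names are made up for the illustration.

```r
set.seed(1)
# Two columns on very different scales (illustrative data only)
x <- cbind(a = rnorm(50, sd = 1), b = rnorm(50, sd = 100))

col_mad <- apply(x, 2, mad)                        # robust scale of each column
x_scaled <- scale(x, center = FALSE, scale = col_mad)

apply(x_scaled, 2, mad)                            # MAD of each scaled column
```

Because mad is scale equivariant, each scaled column has a MAD of 1, so no single variable dominates the analysis merely because of its units.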
Argument: signflip
a logical value indicating whether to try to solve the sign indeterminacy of the loadings, using an ad hoc approach that sets the maximum element in a singular vector to be positive. Default is signflip = TRUE.
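The sign convention can be sketched in a few lines of base R. This is an illustration under an assumption, namely that "maximum element" means the element with the largest absolute value; the helper name flip_signs is hypothetical and is not a function of rrcov.

```r
# Hypothetical helper: flip each column of a loading matrix so that its
# element with the largest absolute value becomes positive.
flip_signs <- function(P) {
  P <- as.matrix(P)
  idx <- apply(abs(P), 2, which.max)               # row of the dominant element
  s <- sign(P[cbind(idx, seq_len(ncol(P)))])       # its sign, per column
  s[s == 0] <- 1                                   # leave all-zero columns alone
  sweep(P, 2, s, `*`)
}

P <- cbind(c(0.6, -0.8), c(0.8, 0.6))
flip_signs(P)
```

Flipping the sign of a loading vector does not change the subspace it spans, so a convention like this only makes results comparable between runs.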
Argument: trace
whether to print intermediate results. Default is trace = FALSE.
----------Details----------
PcaHubert, serving as a constructor for objects of class PcaHubert-class, is a generic function with "formula" and "default" methods. The calculation uses the ROBPCA method of Hubert et al. (2005), which can be described briefly as follows. For details, see the relevant references.
Let n denote the number of observations, and p the number of original variables in the input data matrix X. The ROBPCA algorithm finds a robust center M (p x 1) of the data and a loading matrix P which is (p x k) dimensional. Its columns are orthogonal and define a new coordinate system. The scores T, an (n x k) matrix, are the coordinates of the centered observations with respect to the loadings:
T = (X - 1_n M^T) P
where 1_n is the column vector of n ones.
The ROBPCA algorithm also yields a robust covariance matrix (often singular) which can be computed as
S = P L P^T
where L is the diagonal matrix with the eigenvalues l_1, …, l_k.
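The relation between center, loadings, scores and covariance can be checked numerically. The sketch below takes the scores as (X - 1_n M^T) P and the covariance as P L P^T, and uses classical estimates (column means and the eigen-decomposition of the sample covariance) purely as stand-ins for the robust quantities; the data set and variable names are chosen for illustration only.

```r
X <- as.matrix(iris[, 1:4])            # n = 150 observations, p = 4 variables
k <- 2

M <- colMeans(X)                       # stand-in for the robust center
e <- eigen(cov(X), symmetric = TRUE)
P <- e$vectors[, 1:k]                  # p x k loading matrix, orthonormal columns
L <- diag(e$values[1:k])               # k x k diagonal matrix of eigenvalues

scores <- sweep(X, 2, M) %*% P         # (X - 1_n M^T) P, an n x k matrix
S <- P %*% L %*% t(P)                  # P L P^T, a p x p matrix of rank k

dim(scores)
qr(S)$rank
```

With k smaller than p the reconstructed matrix S is singular, which matches the remark that the robust covariance matrix is often singular.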
The computation proceeds in the following three main steps:
Step 1: The data are preprocessed by reducing their data space to the subspace spanned by the n observations. This is done by a singular value decomposition of the input data matrix. As a result, the data are represented using at most n-1 = rank(X) variables without loss of information.
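The dimension reduction of Step 1 can be illustrated with base R's svd(). This is a sketch under assumptions: the data are centered at the column means before the decomposition, and the tolerance used to decide which singular values count as zero is arbitrary.

```r
set.seed(1)
X <- matrix(rnorm(5 * 20), nrow = 5)     # n = 5 observations, p = 20 variables

Xc <- sweep(X, 2, colMeans(X))           # center the columns (assumed here)
sv <- svd(Xc)
r <- sum(sv$d > 1e-8 * sv$d[1])          # numerical rank, arbitrary tolerance

# Coordinates of the observations in the r-dimensional subspace
Z <- sv$u[, seq_len(r), drop = FALSE] %*% diag(sv$d[seq_len(r)], nrow = r)
V <- sv$v[, seq_len(r), drop = FALSE]    # basis of that subspace (p x r)

r
max(abs(Z %*% t(V) - Xc))                # reconstruction error
```

The reduced matrix Z has only r columns, yet multiplying by the basis V recovers the centered data up to rounding, which is what "without loss of information" means here.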
Step 2: In this step a measure of outlyingness is computed for each data point. For this purpose the high-dimensional data points are projected on many univariate directions; for each direction the univariate MCD estimator of location and scale is computed and the standardized distance to the center is measured. The largest of these distances (over all considered directions) is the outlyingness measure of the data point. The h data points with the smallest outlyingness measure are used to compute the covariance matrix Σ_h and to select the number k of principal components to retain. This is done by finding a k such that l_k/l_1 >= 10.E-3 and Σ_{j=1}^k l_j/Σ_{j=1}^r l_j >= 0.8. Alternatively, the number of principal components k can be specified by the user after inspecting the scree plot.
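The outlyingness computation of Step 2 can be sketched in base R. Note the simplifications: the median and MAD replace the univariate MCD estimator of location and scale, directions are taken through pairs of observations, and h is obtained by rounding 0.75 * n upwards. The function name outlyingness and the simulated data are hypothetical, and this is not the code used by rrcov.

```r
# Hypothetical sketch: projection-based outlyingness with median/MAD
# standing in for the univariate MCD location and scale.
outlyingness <- function(X, maxdir = 250) {
  X <- as.matrix(X)
  n <- nrow(X)
  pairs <- t(combn(n, 2))                        # all n*(n-1)/2 pairs
  if (nrow(pairs) > maxdir)                      # keep at most maxdir directions
    pairs <- pairs[sample(nrow(pairs), maxdir), , drop = FALSE]
  out <- numeric(n)
  for (i in seq_len(nrow(pairs))) {
    v <- X[pairs[i, 1], ] - X[pairs[i, 2], ]     # direction through two points
    nv <- sqrt(sum(v^2))
    if (nv < 1e-12) next                         # skip coincident points
    z <- drop(X %*% (v / nv))                    # univariate projection
    s <- mad(z)
    if (s < 1e-12) next                          # skip degenerate projections
    out <- pmax(out, abs(z - median(z)) / s)     # largest standardized distance
  }
  out
}

set.seed(42)
X <- rbind(matrix(rnorm(56), ncol = 2), c(10, 10), c(12, -11))  # rows 29, 30 are outliers
out <- outlyingness(X)
h <- ceiling(0.75 * nrow(X))
subset_h <- order(out)[seq_len(h)]               # h least outlying points
Sigma_h <- cov(X[subset_h, ])                    # covariance of the clean subset
```

The two planted points receive by far the largest outlyingness values and are therefore excluded from the subset used to compute Σ_h.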
Step 3: The data points are projected on the k-dimensional subspace spanned by the k eigenvectors corresponding to the k largest eigenvalues of the matrix Σ_h. The location and scatter of the projected data are computed using the reweighted MCD estimator, and the eigenvectors of this scatter matrix yield the robust principal components.
----------Value----------
An S4 object of class PcaHubert-class, which is a subclass of the virtual class PcaRobust-class.
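The returned object can be inspected with the accessor functions that rrcov provides for its PCA classes. The snippet below is a brief sketch that assumes the rrcov package is installed; the choice of k = 2 is only for illustration.

```r
library(rrcov)
data(hbk)
pca <- PcaHubert(hbk, k = 2)

is(pca, "PcaRobust")        # checks the class hierarchy described above
getCenter(pca)              # robust center
getLoadings(pca)            # loading matrix
getEigenvalues(pca)         # eigenvalues
head(getScores(pca))        # first rows of the score matrix
```

Using the accessors rather than the slots directly keeps user code independent of the internal layout of the S4 object.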
----------Note----------
The ROBPCA algorithm is implemented on the basis of the Matlab implementation, available as part of LIBRA, a Matlab Library for Robust Analysis, which can be found at www.wis.kuleuven.ac.be/stat/robust.html
----------Author(s)----------
Valentin Todorov, valentin.todorov@chello.at
----------References----------
approach to robust principal components analysis, Technometrics, 47, 64–79.
An Object Oriented Framework for Robust Multivariate Analysis. Journal of Statistical Software, 32(3), 1–47. URL http://www.jstatsoft.org/v32/i03/.
----------Examples----------
library(rrcov)
## PCA of the Hawkins, Bradu and Kass artificial data
## using all 4 variables
data(hbk)
pca <- PcaHubert(hbk)
pca
## Compare with the classical PCA
prcomp(hbk)
## or
PcaClassic(hbk)
## If you want to print the scores too, use
print(pca, print.x=TRUE)
## Using the formula interface
PcaHubert(~., data=hbk)
## To plot the results:
plot(pca)                    # distance plot
pca2 <- PcaHubert(hbk, k=2)
plot(pca2)                   # PCA diagnostic plot (or outlier map)
## Use the standard plots available for prcomp and princomp
screeplot(pca)
biplot(pca)
## Restore the covariance matrix from the loadings and eigenvalues
py <- PcaHubert(hbk)
cov.1 <- py@loadings %*% diag(py@eigenvalues) %*% t(py@loadings)
cov.1