RSimca(rrcovHD)
RSimca() belongs to the R package rrcovHD
Robust classification in high dimensions based on the SIMCA method
Description
RSimca performs a robust version of the SIMCA method. This method classifies a data matrix x with a known group structure. To reduce the dimension, a robust PCA is performed on each group. Afterwards a classification rule is developed to determine the assignment of new observations.
Usage
RSimca(x, ...)
## Default S3 method:
RSimca(x, grouping, prior=proportions, k, kmax = ncol(x),
control="hubert", alpha, tol = 1.0e-4, trace=FALSE, ...)
## S3 method for class 'formula'
RSimca(formula, data = NULL, ..., subset, na.action)
Arguments
Argument: formula
a formula of the form y~x that describes the response and the predictors. The formula can be more complicated, such as y~log(x)+z (see formula for more details). The response should be a factor representing the response variable, or any vector that can be coerced to a factor (such as a logical variable).
Argument: data
an optional data frame (or similar: see model.frame) containing the variables in the formula formula.
Argument: subset
an optional vector used to select rows (observations) of the data matrix x.
Argument: na.action
a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The factory-fresh default is na.omit.
Argument: x
a matrix or data frame containing the explanatory variables (training set).
Argument: grouping
grouping variable: a factor specifying the class for each observation.
Argument: prior
prior probabilities; defaults to the class proportions for the training set.
Argument: tol
tolerance. Default is tol = 1.0e-4.
Argument: control
a control object (S4) specifying one of the available PCA estimation methods and containing estimation options. The class of this object defines which estimator will be used. Alternatively, a character string naming the estimator can be given: one of auto, hubert, locantore, grid, proj. If 'auto' is specified or the argument is missing, the function will select the estimator (see below for details).
Argument: alpha
this parameter measures the fraction of outliers the algorithm should resist. In MCD, alpha controls the size of the subsets over which the determinant is minimized, i.e. alpha*n observations are used for computing the determinant. Allowed values are between 0.5 and 1 and the default is 0.5.
Argument: k
number of principal components to compute. If k is missing, or k = 0, the algorithm itself determines the number of components by finding k such that l_k/l_1 >= 10.E-3 and Σ_{j=1}^k l_j/Σ_{j=1}^r l_j >= 0.8. It is preferable to inspect the scree plot to choose the number of components and then run the function again. Default is k=0.
Argument: kmax
maximal number of principal components to compute. Default is kmax=10. If k is provided, kmax does not need to be specified, unless k is larger than 10.
Argument: trace
whether to print intermediate results. Default is trace = FALSE.
Argument: ...
arguments passed to or from other methods.
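The automatic choice of k described under the k argument can be illustrated with a small base R sketch. This is only one reading of the two conditions (keep components whose eigenvalue ratio l_k/l_1 is at least the threshold, then take the smallest k that explains at least 80% of the variance); it is not the package's internal code. The function name select_k and its argument names are hypothetical, and the threshold is copied literally from the text above.

```r
# Sketch only: one possible reading of the rule, not the rrcovHD implementation.
# select_k, ratio_tol and var_tol are hypothetical names used for illustration.
select_k <- function(eig, ratio_tol = 10.E-3, var_tol = 0.8) {
  eig <- sort(eig, decreasing = TRUE)
  # Components whose eigenvalue is not negligible relative to the first one
  keep <- which(eig / eig[1] >= ratio_tol)
  kmax <- max(keep)
  # Smallest k whose cumulative share of variance reaches var_tol
  cum <- cumsum(eig) / sum(eig)
  k <- which(cum >= var_tol)[1]
  # If the two conditions conflict, fall back to the last non-negligible component
  min(k, kmax)
}

eig <- c(6, 3, 0.5, 0.3, 0.15, 0.05)   # made-up eigenvalues
k_auto <- select_k(eig)
k_auto
```

A scree plot of the same eigenvalues, for example plot(eig, type = "b"), remains the better basis for the final choice, as the argument description recommends.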
Details
RSimca, serving as a constructor for objects of class RSimca-class, is a generic function with "formula" and "default" methods.
SIMCA is a two-phase procedure: PCA is performed on each group separately for dimension reduction, and classification rules are then built in the lower-dimensional space (note that the dimension can differ between groups). Instead of classical PCA, robust alternatives are used. Any of the available robust PCA methods (see Pca-class) can be selected through the argument control. In the original SIMCA, new observations are classified by means of their deviations from the different PCA models. Here the classification rules are obtained using two popular distances arising from PCA: orthogonal distances (OD) and score distances (SD). For the definition of these distances, the definition of the cutoff values and the standardization of the distances, see Vanden Branden and Hubert (2005) and Todorov and Filzmoser (2009).
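To make the two distances concrete, the following base R sketch fits one PCA model per group and computes score and orthogonal distances of every observation to each group model. It makes several assumptions: classical PCA (prcomp) stands in for the robust PCA that RSimca uses, the iris data replace a real high-dimensional set, k = 2 is fixed by hand, and the cutoff values and standardization from the cited papers are left out. The helper name pca_distances is hypothetical.

```r
# Sketch only: classical PCA is swapped in for the robust PCA used by RSimca,
# and no cutoffs or standardization are applied. pca_distances is a hypothetical helper.
pca_distances <- function(train, newdata, k) {
  pc <- prcomp(train, center = TRUE, scale. = FALSE)
  P <- pc$rotation[, seq_len(k), drop = FALSE]      # loadings of the first k components
  l <- pc$sdev[seq_len(k)]^2                        # corresponding eigenvalues
  centered <- sweep(as.matrix(newdata), 2, pc$center)
  scores <- centered %*% P
  # Score distance: distance within the PCA subspace, scaled by the eigenvalues
  sd <- sqrt(rowSums(sweep(scores^2, 2, l, "/")))
  # Orthogonal distance: distance from the observation to the PCA subspace
  od <- sqrt(rowSums((centered - scores %*% t(P))^2))
  list(sd = sd, od = od)
}

x <- as.matrix(iris[, 1:4])
groups <- split(as.data.frame(x), iris$Species)

# One PCA model per group; distances of all observations to each model
d <- lapply(groups, pca_distances, newdata = x, k = 2)

# Median orthogonal distance to each group model (columns), by true class (rows)
sapply(d, function(g) tapply(g$od, iris$Species, median))
```

A classification rule would then combine the two distances per group and assign each observation to the group with the smallest combined value; how they are combined and standardized is described in the references cited above.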
Value
An S4 object of class RSimca-class, which is a subclass of the virtual class Simca-class.
Author(s)
Valentin Todorov, valentin.todorov@chello.at
References
Vanden Branden K, Hubert M (2005) Robust classification in high dimensions based on the SIMCA method. Chemometrics and Intelligent Laboratory Systems 79:10–21
Examples
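library(rrcovHD)   # attach the package that provides RSimca()
data(pottery)
# Fit the robust SIMCA model with origin as the grouping variable
rs <- RSimca(origin~., data=pottery)
rs
summary(rs)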
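The same model can presumably also be fitted through the default method shown under Usage, by passing the explanatory variables and the grouping factor separately. The sketch below assumes that rrcovHD is installed and that origin is the only non-predictor column of pottery, which is what the formula origin~. implies. The object names x, rs_default and rs_formula are illustrative.

```r
# Sketch only: assumes rrcovHD and its pottery example data are available.
if (requireNamespace("rrcovHD", quietly = TRUE)) {
  library(rrcovHD)
  data(pottery)

  # Default interface: predictors and grouping factor passed separately
  x <- pottery[, names(pottery) != "origin"]
  rs_default <- RSimca(x, grouping = pottery$origin)

  # Formula interface, as in the example above
  rs_formula <- RSimca(origin ~ ., data = pottery)

  print(rs_default)
}
```

The formula interface is usually more convenient when the data already sit in a data frame, while the default interface suits a numeric matrix with a separate vector of class labels.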