sca(sca)
sca() belongs to R package: sca
Simple Component Analysis – Interactively
----------Description----------
A system of simple components calculated from a correlation (or variance-covariance) matrix is built (interactively if interactive = TRUE) following the methodology of Rousson and Gasser (2003).
----------Usage----------
sca(S, b = if(interactive) 5, d = 0, qmin = if(interactive) 0 else 5,
    corblocks = if(interactive) 0 else 0.3,
    criterion = c("csv", "blp"), cluster = c("median","single","complete"),
    withinblock = TRUE, invertsigns = FALSE,
    interactive = dev.interactive())
## S3 method for class 'simpcomp'
print(x, ndec = 2, ...)
----------Arguments----------
Argument: S
the correlation (or variance-covariance) matrix to be analyzed.
Argument: b
the number of block-components initially proposed.
Argument: d
the number of difference-components initially proposed.
Argument: qmin
if larger than zero, the number of difference-components is chosen such that the system contains at least qmin components (overriding argument d!).
Argument: corblocks
if larger than zero, the number of block-components is chosen such that correlations among them are all smaller than corblocks (overriding argument b).
Argument: criterion
character string specifying the optimality criterion to be used for evaluating a system of simple components. One of "csv" (corrected sum of variances) or "blp" (best linear predictor); can be abbreviated.
Argument: cluster
character string specifying the clustering method to be used in the definition of the block-components. One of "single" (single linkage), "median" (median linkage) or "complete" (complete linkage); can be abbreviated.
Argument: withinblock
a logical indicating whether any given difference-component should only involve variables belonging to the same block-component.
Argument: invertsigns
a logical indicating whether the sign of some variables should be inverted initially in order to avoid negative correlations.
Argument: interactive
a logical indicating whether the system of simple components should be built interactively. If interactive=FALSE, an optimal system of simple components is automatically calculated without any intervention of the user (according to b or corblocks, and to d or qmin). By default, interactive = dev.interactive() (which is true if interactive() and .Device is an interactive graphics device).
Argument: x
an object of class simpcomp, typically the result of sca(..).
Argument: ndec
number of decimals after the dot, for the percentages printed.
Argument: ...
further arguments, passed to and from methods.
----------Details----------
When confronted with a large number p of variables measuring different aspects of the same theme, the practitioner may like to summarize the information into a limited number q of components. A component is a linear combination of the original variables, and the weights in this linear combination are called the loadings. Thus, a system of components is defined by a p by q matrix of loadings.
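To make the notation concrete, here is a minimal base-R sketch of a system of q = 2 components for p = 4 variables. The data values and the two loading vectors are made up for illustration; they do not come from the sca package.

```r
# Toy data: n = 6 subjects (rows), p = 4 variables (columns); values are invented.
X <- matrix(c(1, 2, 3, 4,
              2, 1, 4, 3,
              3, 4, 1, 2,
              4, 3, 2, 1,
              5, 6, 7, 8,
              6, 5, 8, 7), ncol = 4, byrow = TRUE)

# A p x q matrix of loadings (p = 4, q = 2): each column defines one component.
A <- cbind(comp1 = c(1, 1,  1,  1) / 2,   # all weights share the same sign
           comp2 = c(1, 1, -1, -1) / 2)   # weights of both signs

# Scores: each component is a linear combination of the centred variables.
Z <- scale(X, scale = FALSE) %*% A
dim(Z)   # 6 2
```

Each row of Z holds the two component scores of one subject.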
Among all systems of components, principal components (PCs) are optimal in many ways. In particular, the first few PCs extract a maximum of the variability of the original variables and they are uncorrelated, such that the extracted information is organized in an optimal way: we may look at one PC after the other, separately, without taking into account the rest.
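The two properties mentioned above (maximal extracted variability and uncorrelatedness) can be checked with base R on a small correlation matrix. The matrix below is an illustrative equicorrelated example, not data from the package.

```r
# Illustrative 3 x 3 correlation matrix with all correlations equal to 0.5.
S <- matrix(0.5, 3, 3)
diag(S) <- 1

e <- eigen(S, symmetric = TRUE)
A <- e$vectors                        # loadings of the principal components
V <- t(A) %*% S %*% A                 # variance-covariance matrix of the PCs

round(V, 10)                          # diagonal: the PCs are uncorrelated
pct <- 100 * e$values / sum(diag(S))  # percentage of variability per PC
pct[1]                                # the first PC accounts for 2/3 of the total
```

The off-diagonal entries of V vanish, which is why each PC can be examined separately.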
Unfortunately PCs are often difficult to interpret. The goal of Simple Component Analysis is to replace (or to supplement) the optimal but non-interpretable PCs by suboptimal but interpretable simple components. The proposal of Rousson and Gasser (2003) is to look for an optimal system of components, but only among the simple ones, according to some definition of optimality and simplicity. The outcome of their method is a simple matrix of loadings calculated from the correlation matrix S of the original variables.
Simplicity is not a guarantee of interpretability (but it helps in this regard). Thus, the user may wish to partly modify an optimal system of simple components in order to enhance interpretability. While PCs are by definition 100% optimal, the optimal system of simple components proposed by the procedure sca may be, say, 95% optimal, whereas the simple system altered by the user may be, say, 93% optimal. It is ultimately up to the user to decide if the gain in interpretability is worth the loss of optimality.
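The notion of a system being "x% optimal" can be illustrated as a ratio between the variability extracted by a simple system and that extracted by the same number of PCs. The sketch below assumes one plausible reading of the "best linear predictor" idea: the variability of the original variables recovered when they are linearly predicted from the components. The helper explained() is hypothetical and is not claimed to be what sca computes internally.

```r
# Hypothetical helper: variability recovered by the best linear prediction of
# the original variables from the components with loadings A (assumed definition).
explained <- function(S, A) {
  SA <- S %*% A
  sum(diag(SA %*% solve(t(A) %*% SA) %*% t(SA)))
}

# Illustrative correlation matrix with two blocks of variables.
S <- matrix(c(1.0, 0.6, 0.3, 0.0,
              0.6, 1.0, 0.0, 0.1,
              0.3, 0.0, 1.0, 0.6,
              0.0, 0.1, 0.6, 1.0), 4, 4)

A_simple <- cbind(c(1, 1, 0, 0), c(0, 0, 1, 1))      # two simple block-components
A_pc     <- eigen(S, symmetric = TRUE)$vectors[, 1:2] # first two PCs

# Ratio below 1: the simple system is slightly suboptimal compared with the PCs.
opt <- explained(S, A_simple) / explained(S, A_pc)
opt
```

For the PCs, explained() reduces to the sum of the leading eigenvalues, so the ratio never exceeds 1 under this definition.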
The interactive procedure sca is intended to assist the user in his/her choice of an interpretable system of simple components. The algorithm consists of three distinct stages and proceeds in an interactive way. At each step of the procedure, a simple matrix of loadings is displayed in a window. The user may alter this matrix by clicking on its entries, following the instructions given there. If all the loadings of a component share the same sign, it is a “block-component”. If some loadings are positive and some loadings are negative, it is a “difference-component”. Block-components are arguably easier to interpret than difference-components. Unfortunately, PCs almost always contain only one block-component. In the procedure sca, the user may choose the number of block-components in the system, the rationale being to have as many block-components as possible such that correlations among them are below some cut-off value (typically 0.3 or 0.4).
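The distinction between block-components and difference-components depends only on the signs of the non-zero loadings, which the following small sketch makes explicit. The function name component_type() and the example matrix are hypothetical and not part of the sca package.

```r
# Hypothetical helper: label each column of a loading matrix according to
# the signs of its non-zero entries.
component_type <- function(A) {
  apply(A, 2, function(a) {
    nz <- a[a != 0]
    if (all(nz > 0) || all(nz < 0)) "block" else "difference"
  })
}

# Illustrative simple matrix: 5 variables (rows), 3 components (columns).
simplemat <- cbind(c(1, 1,  1, 0, 0),
                   c(0, 0,  0, 1, 1),
                   c(1, 1, -2, 0, 0))
component_type(simplemat)   # "block" "block" "difference"
```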
Simple block-components should define a partition of the original variables. This is done in the first stage of the procedure sca. An agglomerative hierarchical clustering procedure is used there.
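As a rough illustration of how a clustering of the variables yields a partition into blocks, the sketch below uses hclust() from base R on the dissimilarity 1 - correlation. This is an assumption made for illustration only; the clustering actually performed inside sca may differ in its dissimilarity and merging rules.

```r
# Illustrative correlation matrix with two visible groups of variables.
S <- matrix(c(1.0, 0.8, 0.7, 0.1, 0.2,
              0.8, 1.0, 0.9, 0.1, 0.1,
              0.7, 0.9, 1.0, 0.2, 0.1,
              0.1, 0.1, 0.2, 1.0, 0.8,
              0.2, 0.1, 0.1, 0.8, 1.0), 5, 5,
            dimnames = list(paste0("V", 1:5), paste0("V", 1:5)))

# Agglomerative clustering on 1 - correlation (assumed dissimilarity);
# "single" and "median" are also accepted by hclust().
hc <- hclust(as.dist(1 - S), method = "complete")
blocks <- cutree(hc, k = 2)          # partition of the variables into 2 blocks

# One 0/1 column per block: every variable belongs to exactly one block.
B <- sapply(sort(unique(blocks)), function(g) as.integer(blocks == g))
rownames(B) <- names(blocks)
B
```

Here V1 to V3 end up in one block and V4, V5 in the other, so the two columns of B define two block-components.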
The second stage of the procedure sca consists in the definition of simple difference-components. Those are obtained as simplified versions of some appropriate “residual components”. The idea is to retain the large loadings (in absolute value) of these residual components and to shrink to zero the small ones. For each difference-component, the interactive procedure sca displays the loadings of the corresponding residual component (at the right side of the window), such that the user may know which variables are especially important for the definition of this component.
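The idea of keeping the large loadings and shrinking the small ones to zero can be sketched with a simple threshold rule. Both the function simplify_loadings() and the cutoff of half the largest absolute loading are hypothetical choices for illustration; the rule used by sca is not reproduced here.

```r
# Hypothetical rule: keep the sign of loadings whose absolute value is at least
# `cutoff` times the largest absolute loading, and set the others to zero.
simplify_loadings <- function(a, cutoff = 0.5) {
  keep <- abs(a) >= cutoff * max(abs(a))
  ifelse(keep, sign(a), 0)
}

# Invented loadings of a residual component on 5 variables.
resid_comp <- c(0.62, 0.55, -0.05, -0.54, 0.10)
simplify_loadings(resid_comp)   # 1 1 0 -1 0
```

The result contrasts variables 1 and 2 with variable 4 and ignores the two variables with small loadings.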
At the third stage of the interactive procedure sca, it is possible to remove some of the difference-components from the system.
For many examples, it is possible to find a simple system which is 90% or 95% optimal, and where correlations between components are below 0.3 or 0.4. When the structure in the correlation matrix is complicated, it might be advantageous to invert the sign of some of the variables in order to avoid negative correlations as much as possible. This can be done using the option invertsigns = TRUE.
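Inverting the sign of a variable simply changes the sign of the corresponding row and column of the correlation matrix. The sketch below shows this effect on an invented matrix; which variables to flip is chosen by hand here, whereas invertsigns = TRUE lets sca decide.

```r
# Illustrative correlation matrix: variable 3 is negatively correlated
# with variables 1 and 2.
S <- matrix(c( 1.0,  0.7, -0.6,
               0.7,  1.0, -0.5,
              -0.6, -0.5,  1.0), 3, 3)

s <- c(1, 1, -1)               # hand-picked choice: flip the sign of variable 3
S_flipped <- S * outer(s, s)   # same as diag(s) %*% S %*% diag(s)
S_flipped                      # all correlations are now non-negative
```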
In principle, simple components can be calculated from a correlation matrix or from a variance-covariance matrix. However, the definition of simplicity used is not well adapted to the latter case, such that it will result in systems which are far from being 100% optimal. Thus, it is advised to define simple components from a correlation matrix, not from a variance-covariance matrix.
----------Value----------
An object of class simpcomp which is basically a list with the following components:
Component: simplemat
an integer matrix defining a system of simple components. The rows correspond to variables and the columns correspond to components.
Component: loadings
loadings of simple components. This is a normalized (by normmatrix) version of simplemat.
Component: allcrit
a list containing the following components:
  varpc: a vector containing the percentage of total variability accounted for by each of the first nblock + ndiff principal components of S.
  varsc: a vector containing the percentage of total variability accounted for by each of the simple components defined by simplemat.
  cumpc: the sum of varpc, indicating the percentage of total variability accounted for by the first nblock + ndiff principal components of S.
  cumsc: a score indicating the percentage of total variability accounted for by the system of simple components. cumsc is calculated according to criterion.
  opt: indicates the optimality of the system of simple components and is computed as cumsc/cumpc.
  corsc: correlation matrix of the simple components defined by simplemat.
  maxcor: a list with the following components:
    row: label of the row of the maximum value in corsc.
    col: label of the column of the maximum value in corsc.
    val: maximum value in corsc (in absolute value).
Component: nblock
number of block-components in simplemat.
Component: ndiff
number of difference-components in simplemat.
Component: criterion
as above.
Component: cluster
as above.
Component: withinblock
as above.
Component: invertsigns
as above.
Component: vardata
the correlation (or variance-covariance) matrix which was analyzed. In principle it should be equal to argument S above, except if it has been transformed in order to avoid negative correlations.
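The relationship between simplemat, loadings, corsc and maxcor can be retraced by hand with base R. The sketch below assumes that normalizing means scaling each column of simplemat to unit Euclidean length; this is an assumption about normmatrix, and the matrix S and the labels are invented for illustration.

```r
# Illustrative correlation matrix and simple matrix (2 block-components, 1 difference-component).
S <- matrix(c(1.0, 0.6, 0.2, 0.1,
              0.6, 1.0, 0.1, 0.1,
              0.2, 0.1, 1.0, 0.6,
              0.1, 0.1, 0.6, 1.0), 4, 4,
            dimnames = list(paste0("V", 1:4), paste0("V", 1:4)))
simplemat <- cbind(B1 = c(1, 1, 0, 0), B2 = c(0, 0, 1, 1), D1 = c(1, -1, 0, 0))
rownames(simplemat) <- rownames(S)

# Assumed normalization: scale every column to unit Euclidean length.
loadings <- sweep(simplemat, 2, sqrt(colSums(simplemat^2)), "/")

# Variance-covariance and correlation matrix of the simple components.
V <- t(loadings) %*% S %*% loadings
corsc <- cov2cor(V)

# Largest off-diagonal correlation in absolute value, with its labels.
off <- abs(corsc)
diag(off) <- 0
idx <- which(off == max(off), arr.ind = TRUE)[1, ]
maxcor <- list(row = rownames(corsc)[idx[1]],
               col = colnames(corsc)[idx[2]],
               val = max(off))
maxcor
```

In this example the largest correlation is between the two block-components, which is the quantity the corblocks argument is meant to keep under control.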
----------Author(s)----------
Valentin Rousson <rousson@ifspm.unizh.ch> and
Martin Maechler <maechler@stat.math.ethz.ch>.
----------References----------
Simple Component Analysis. Submitted.
Some Case Studies of Simple Component Analysis. Manuscript.
Some Proposals for Evaluating Systems of Components in Dimension Reduction Problems. Submitted.
----------See Also----------
prcomp (for PCA), etc.
----------Examples----------
library(sca)
data(pitpropC)
sc.pitp <- sca(pitpropC, interactive=FALSE)
sc.pitp
## to see its low-level components:
str(sc.pitp)
## Let `X' be a matrix containing some data set whose rows correspond to
## subjects and whose columns correspond to variables. For example:
library(MASS)
Sig <- function(p, rho) { r <- diag(p); r[col(r) != row(r)] <- rho; r}
rmvN <- function(n,p, rho)
    mvrnorm(n, mu=rep(0,p), Sigma= Sig(p, rho))
X <- cbind(rmvN(100, 3, 0.7),
           rmvN(100, 2, 0.9),
           rmvN(100, 4, 0.8))
## An optimal simple system with at least 5 components for the data in `X',
## where the number of block-components is such that correlations among
## them are all smaller than 0.4, can be automatically obtained as:
(r <- sca(cor(X), qmin=5, corblocks=0.4, interactive=FALSE))
## On the other hand, an optimal simple system with two block-components
## and two difference-components for the data in `X' can be automatically
## obtained as:
(r <- sca(cor(X), b=2, d=2, qmin=0, corblocks=0, interactive=FALSE))
## The resulting simple matrix is contained in `r$simplemat'.
## A matrix of scores for such simple components can then be obtained as:
(Z <- scale(X) %*% r$loadings)
## On the other hand, scores of simple components calculated from the
## variance-covariance matrix of `X' can be obtained as:
r <- sca(var(X), b=2, d=2, qmin=0, corblocks=0, interactive=FALSE)
Z <- scale(X, scale=FALSE) %*% r$loadings
## One can also use the program interactively as follows:
if(interactive()) {
  r <- sca(cor(X), corblocks=0.4, qmin=5, interactive = TRUE)
  ## Since the interactive part of the program is active here, the proposed
  ## system can then be modified according to the user's wishes. The
  ## result of the procedure will be contained in `r'.
}