
Help documentation for the RSKC() function in the R package RSKC (Chinese-English)

Posted 2012-9-28 22:02:39
RSKC(RSKC)
RSKC() belongs to the R package: RSKC

                                        Robust Sparse K-means

                                         Translator: biostatistic.net robot LoveR

Description----------

The robust sparse K-means clustering method of Kondo (2011). In this algorithm, sparse K-means (Witten and Tibshirani (2010)) is robustified by iteratively trimming a prespecified proportion of cases in the weighted squared Euclidean distances and in the squared Euclidean distances.


Usage----------


RSKC(d, ncl, alpha, L1 = 12, nstart = 20, silent=TRUE, scaling = FALSE, correlation = FALSE)



Arguments----------

Argument: d
A numeric matrix of data, N by p, where N is the number of cases and p is the number of features. Cases are partitioned into ncl clusters. Missing values are accepted.


Argument: ncl
The prespecified number of clusters.


Argument: alpha
0 <= alpha <= 1, the proportion of the cases to be trimmed in robust sparse K-means.

If alpha > 0 and L1 >= 1, then RSKC performs robust sparse K-means.

If alpha > 0 and L1 = NULL, then RSKC performs trimmed K-means.

If alpha = 0 and L1 >= 1, then RSKC performs sparse K-means (with the algorithm of Lloyd (1982)).

If alpha = 0 and L1 = NULL, then RSKC performs K-means (with the algorithm of Lloyd).

For more details on trimmed K-means, see Gordaliza (1991a), Gordaliza (1991b).


Argument: L1
A single L1 bound on the weights (the feature weights). If L1 is small, then few features will have non-zero weights. If L1 is large, then all features will have non-zero weights. If L1 = NULL, then RSKC performs nonsparse clustering (see alpha).
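The effect of this bound can be seen in the weight-update step of sparse K-means (Witten and Tibshirani (2010)): the weights solve max w'a subject to ||w||_2 <= 1, ||w||_1 <= L1, w >= 0, where a_j is the between-cluster contribution of feature j, and the solution soft-thresholds a. Below is a minimal sketch of that update, written in Python with NumPy rather than R so it stands alone; `sparse_weights` is a hypothetical helper name, not part of the RSKC package.

```python
import numpy as np

def sparse_weights(a, l1_bound, tol=1e-8):
    """Solve max w.a  s.t. ||w||_2 <= 1, ||w||_1 <= l1_bound, w >= 0,
    where a[j] >= 0 is feature j's between-cluster contribution.
    The solution soft-thresholds a; the threshold is found by bisection."""
    def w_of(t):
        s = np.maximum(a - t, 0.0)       # soft-threshold at t
        nrm = np.linalg.norm(s)
        return s / nrm if nrm > 0 else s
    if w_of(0.0).sum() <= l1_bound:      # L1 constraint not active
        return w_of(0.0)
    lo, hi = 0.0, float(a.max())
    while hi - lo > tol:                 # bisection on the threshold
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if w_of(mid).sum() > l1_bound else (lo, mid)
    return w_of(hi)

a = np.array([5.0, 4.0, 0.5, 0.2, 0.1])
print(np.count_nonzero(sparse_weights(a, 1.1) > 1e-6))  # small L1: few features
print(np.count_nonzero(sparse_weights(a, 5.0) > 1e-6))  # large L1: all features
```

With the small bound only the two strongest features keep non-zero weight; with the large bound the constraint is inactive and all five do.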


Argument: nstart
The number of random initial sets of cluster centers at every Step (a), which performs K-means or trimmed K-means.


Argument: silent
If TRUE, the processing steps are not printed.


Argument: scaling
If TRUE, RSKC subtracts each entry of the data matrix by the corresponding column mean and divides it by the corresponding column SD; see scale.


Argument: correlation
If TRUE, RSKC centers and scales the rows of data before the clustering is performed, i.e., trans.d = t(scale(t(d))). The squared Euclidean distance between cases in the transformed dataset trans.d is proportional to the dissimilarity measure based on the correlation between the cases in the dataset d.
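As a quick numerical check of this proportionality (a standalone sketch in Python with NumPy, not part of the package): row-standardizing with the sample SD (denominator p-1, as R's scale() uses) makes the squared Euclidean distance between two cases equal to 2*(p-1)*(1-r), where r is their Pearson correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=500), rng.normal(size=500)  # two cases, p = 500 features

def standardize(v):
    # what scale() does to each row of t(d): subtract the mean, divide by the sample SD
    return (v - v.mean()) / v.std(ddof=1)

sq_dist = np.sum((standardize(x) - standardize(y)) ** 2)
r = np.corrcoef(x, y)[0, 1]
# squared Euclidean distance after row-standardization equals 2*(p-1)*(1-r)
print(np.isclose(sq_dist, 2 * (len(x) - 1) * (1 - r)))  # True
```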


Details----------

Robust sparse K-means is a clustering method that extends the sparse K-means clustering of Witten and Tibshirani to make it resistant to outliers by trimming a fixed proportion of observations in each iteration. These outliers are flagged in terms of both their weighted and unweighted distances, to eliminate the effects of outliers on the selection of feature weights and the selection of a partition. In Step (a) of sparse K-means, given fixed weights, the algorithm aims to maximize the objective function over a partition, i.e., it performs K-means on a weighted dataset. Robust sparse K-means robustifies Step (a) of sparse K-means by performing trimmed K-means on the weighted dataset: it trims cases in weighted squared Euclidean distances. Before Step (b), where, given a partition, the algorithm aims to maximize the objective function over weights, robust sparse K-means has an intermediate robustifying step, Step (a-2). At this step, it trims cases in squared Euclidean distances. Given a partition and the cases trimmed in Step (a) and Step (a-2), the objective function is maximized over weights at Step (b). The objective function is calculated without the cases trimmed in Step (a) and Step (a-2). The robust sparse K-means algorithm repeats Step (a), Step (a-2) and Step (b) until a stopping criterion is satisfied. For the calculation of cluster centers in the weighted distances, see revisedsil.
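The three steps above can be sketched as follows. This is a conceptual re-implementation in Python with NumPy under simplifying assumptions (Lloyd-style updates from a single random start, a fixed number of outer iterations, equal initial weights); `rskc_sketch` and `weight_update` are hypothetical names, and this is not the package's code.

```python
import numpy as np

def weight_update(a, l1, tol=1e-8):
    # Step (b): maximize w.a  s.t. ||w||_2 <= 1, ||w||_1 <= l1, w >= 0,
    # by soft-thresholding a with a threshold found by bisection.
    def w_of(t):
        s = np.maximum(a - t, 0.0)
        n = np.linalg.norm(s)
        return s / n if n > 0 else s
    if w_of(0.0).sum() <= l1:
        return w_of(0.0)
    lo, hi = 0.0, float(a.max())
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if w_of(mid).sum() > l1 else (lo, mid)
    return w_of(hi)

def rskc_sketch(d, ncl, alpha, l1, n_outer=5, seed=1):
    rng = np.random.default_rng(seed)
    n, p = d.shape
    q = int(np.floor(alpha * n))                  # number of cases to trim
    w = np.ones(p) / np.sqrt(p)                   # start from equal weights
    labels = rng.integers(0, ncl, size=n)
    for _ in range(n_outer):
        # Step (a): trimmed K-means on the weighted data
        dw = d * np.sqrt(w)
        for _ in range(25):                       # Lloyd-style updates
            centers = np.stack([dw[labels == j].mean(0) if (labels == j).any()
                                else dw[rng.integers(n)] for j in range(ncl)])
            dist2 = ((dw[:, None, :] - centers[None]) ** 2).sum(-1)
            labels = dist2.argmin(1)
        oW = np.argsort(dist2.min(1))[n - q:]     # trimmed in weighted distances
        # Step (a-2): trim in unweighted squared Euclidean distances
        keep = np.setdiff1d(np.arange(n), oW)
        cu = np.stack([d[keep][labels[keep] == j].mean(0)
                       if (labels[keep] == j).any() else d[rng.integers(n)]
                       for j in range(ncl)])
        oE = np.argsort(((d - cu[labels]) ** 2).sum(-1))[n - q:]
        # Step (b): update weights using only the untrimmed cases
        use = np.setdiff1d(np.arange(n), np.union1d(oW, oE))
        X, lab = d[use], labels[use]
        total = ((X - X.mean(0)) ** 2).sum(0)     # total SS per feature
        within = sum((((X[lab == j] - X[lab == j].mean(0)) ** 2).sum(0)
                      if (lab == j).any() else 0.0 for j in range(ncl)))
        w = weight_update(total - within, l1)     # between-cluster SS drives weights
    return labels, w
```

Trimmed cases still receive labels (as in the package's labels output); they are only excluded when updating centers and weights.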


Value----------

N: The number of cases.
p: The number of features.
ncl: See ncl above.
L1: See L1 above.
nstart: See nstart above.
alpha: See alpha above.
scaling: See scaling above.
correlation: See correlation above.
missing: TRUE if at least one entry of the data matrix d is missing.

labels: An integer vector of length N, the cluster label of each case. Note that trimmed cases also receive cluster labels.

weights: A positive real vector of length p, containing the weight of each feature.

WBSS: A real vector containing the weighted between-cluster sum of squares at each Step (b). The weighted between-cluster sum of squares is the objective function to maximize, calculated excluding the prespecified proportion of cases. The length of this vector is the number of times the algorithm iterates Steps (a), (a-2) and (b) before the stopping criterion is satisfied. Returned only if L1 is numeric and > 1.

WWSS: A real number, the within-cluster sum of squares at a local minimum. This is the objective function to minimize in the nonsparse methods. For the robust clustering methods, this quantity is calculated without the prespecified proportion of cases. Returned only if L1 = NULL.

oE: Indices of the cases trimmed in the squared Euclidean distances.

oW: Indices of the cases trimmed in the weighted squared Euclidean distances. If L1 = NULL, oW contains the cases trimmed in the Euclidean distance, because all features have the same weight, i.e., 1.


Author(s)----------



Yumi Kondo <y.kondo@stat.ubc.ca>




References----------

A. Gordaliza. Best approximations to random variables based on trimming procedures. Journal of Approximation Theory, 64, 1991a.
A. Gordaliza. On the breakdown point of multivariate location estimators based on trimming procedures. Statistics & Probability Letters, 11, 1991b.
Y. Kondo (2011), Robustification of the sparse K-means clustering algorithm, MSc. Thesis, University of British Columbia http://hdl.handle.net/2429/37093
D. M. Witten and R. Tibshirani. A framework for feature selection in clustering. Journal of the American Statistical Association, 105(490): 713-726, 2010.
S. P. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2): 129-136, 1982.

Examples----------


# little simulation function
sim <-
function(mu,f){
   D<-matrix(rnorm(60*f),60,f)
   D[1:20,1:50]<-D[1:20,1:50]+mu
   D[21:40,1:50]<-D[21:40,1:50]-mu  
   return(D)
   }

set.seed(1); d0 <- sim(1,500)  # generate a dataset
true <- rep(1:3, each=20)      # vector of true cluster labels
d<-d0
ncl<-3
for ( i in 1 : 10){
   d[sample(1:60,1),sample(1:500,1)]<-rnorm(1,mean=0,sd=15)
}

# The generated dataset looks like this...
pairs(
      d[,c(1,2,3,200)],col=true,
      labels=c("clustering feature 1","clustering feature 2","clustering feature 3","noise feature1"),
      main="The sampling distribution of 60 cases colored by true cluster labels",
      lower.panel=NULL)


# Compare the performance of four algorithms
### 3-means
r0<-kmeans(d,ncl,nstart=100)
CER(r0$cluster,true)

### Sparse 3-means
# This example requires the sparcl package
# library(sparcl)
# r1 <- KMeansSparseCluster(d, ncl, wbounds=6)
# Partition result
# CER(r1$Cs, true)
# The number of nonzero weights
# sum(!r1$ws < 1e-3)

### Trimmed 3-means

r2<-RSKC(d,ncl,alpha=10/60,L1=NULL,nstart=200)
CER(r2$labels,true)

### Robust Sparse 3-means
r3<-RSKC(d,ncl,alpha=10/60,L1=6,nstart=200)
# Partition result
CER(r3$labels,true)
r3

### RSKC works with datasets containing missing values...
# add missing values to the dataset
set.seed(1)
for ( i in 1 : 100)
{   
d[sample(1:60,1),sample(1:500,1)]<-NA
}
r4 <- RSKC(d,ncl,alpha=10/60,L1=6,nstart=200)


When reprinting, please credit biostatistic.net (http://www.biostatistic.net).

Notes:
Note 1: To aid learning, this document was produced by the biostatistic.net translation robot LoveR; it is for personal reference while learning R, and biostatistic.net retains the copyright.
Note 2: As a machine translation, inaccuracies are unavoidable; compare the Chinese and English carefully when using it, which can also help with learning R.
Note 3: If you find an inaccuracy, please reply below this post and we will revise it over time.