R: help documentation for the silhouette() function

loveR · posted 2012-2-16 21:21:36

silhouette(cluster)
silhouette() belongs to the R package: cluster

Compute or Extract Silhouette Information from Clustering

Translator: LoveR, the biostatistic.net robot

----------Description----------

Compute silhouette information according to a given clustering in k clusters.

----------Usage----------

silhouette(x, ...)
## Default S3 method:
silhouette(x, dist, dmatrix, ...)
## S3 method for class 'partition'
silhouette(x, ...)
## S3 method for class 'clara'
silhouette(x, full = FALSE, ...)

sortSilhouette(object, ...)
## S3 method for class 'silhouette'
summary(object, FUN = mean, ...)
## S3 method for class 'silhouette'
plot(x, nmax.lab = 40, max.strlen = 5,
   main = NULL, sub = NULL, xlab = expression("Silhouette width "* s[i]),
   col = "gray",  do.col.sort = length(col) > 1, border = 0,
   cex.names = par("cex.axis"), do.n.k = TRUE, do.clus.stat = TRUE, ...)

----------Arguments----------

Argument: x
an object of appropriate class; for the default method, an integer vector with k different integer cluster codes, or a list with such an x$clustering component. Note that silhouette statistics are only defined if 2 <= k <= n-1.

Argument: dist
a dissimilarity object inheriting from class dist or coercible to one. If not specified, dmatrix must be.

Argument: dmatrix
a symmetric dissimilarity matrix (n * n), specified instead of dist, which can be more efficient.

Argument: full
logical specifying if a full silhouette should be computed for a clara object. Note that this requires O(n^2) memory, since the full dissimilarity (see daisy) is needed internally.

Argument: object
an object of class silhouette.

Argument: ...
further arguments passed to and from methods.

Argument: FUN
function used to summarize silhouette widths.

Argument: nmax.lab
integer indicating the number of labels which is considered too large for single-name labeling of the silhouette plot.

Argument: max.strlen
positive integer giving the length to which strings are truncated in silhouette plot labeling.

Argument: main, sub, xlab
arguments to title(); each has a sensible non-NULL default here.

Argument: col, border, cex.names
arguments passed to barplot(); note that the default used to be col = heat.colors(n), border = par("fg") instead.
col can also be a color vector of length k for clusterwise coloring; see also do.col.sort.

Argument: do.col.sort
logical indicating if the colors col should be sorted “along” the silhouette; this is useful for casewise or clusterwise coloring.

Argument: do.n.k
logical indicating if n and k “title text” should be written.

Argument: do.clus.stat
logical indicating if cluster size and averages should be written to the right of the silhouettes.
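
As a small illustration of the dist, dmatrix and FUN arguments, the sketch below calls the default method twice on the same clustering, once with a dist object and once with the equivalent full matrix, and then summarizes with medians instead of means. The variable names (d, cl, s_dist, s_mat) are only illustrative, and the ruspini data set from the cluster package is assumed to be available.

```r
library(cluster)   # recommended package, normally installed with R

data(ruspini)
d  <- dist(ruspini)                 # Euclidean dissimilarities, class "dist"
cl <- pam(ruspini, 4)$clustering    # integer vector of cluster codes

# Default method with a 'dist' object ...
s_dist <- silhouette(cl, dist = d)
# ... and with the equivalent symmetric n x n matrix instead.
s_mat  <- silhouette(cl, dmatrix = as.matrix(d))

# Both calls should yield the same silhouette widths.
same_widths <- all.equal(s_dist[, "sil_width"], s_mat[, "sil_width"],
                         check.attributes = FALSE)
same_widths

# FUN controls how the widths are summarized, e.g. cluster-wise medians.
med_widths <- summary(s_dist, FUN = median)$clus.avg.widths
med_widths
```

Passing dmatrix is mainly worthwhile when the full matrix already exists, since it avoids converting a dist object internally.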

----------Details----------

For each observation i, the silhouette width s(i) is defined as follows:
Put a(i) = average dissimilarity between i and all other points of the cluster to which i belongs (if i is the only observation in its cluster, s(i) := 0 without further calculations). For all other clusters C, put d(i,C) = average dissimilarity of i to all observations of C. The smallest of these d(i,C) is b(i) := min_C d(i,C), and can be seen as the dissimilarity between i and its “neighbor” cluster, i.e., the nearest one to which it does not belong. Finally, s(i) := ( b(i) - a(i) ) / max( a(i), b(i) ).

silhouette.default() is now based on C code donated by Romain Francois (the R version being still available as cluster:::silhouette.default.R).

Observations with a large s(i) (almost 1) are very well clustered; a small s(i) (around 0) means that the observation lies between two clusters; and observations with a negative s(i) are probably placed in the wrong cluster.
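
To make the definition concrete, here is a minimal sketch that recomputes the widths by hand, using the usual formula s(i) = (b(i) - a(i)) / max(a(i), b(i)), and compares them with what silhouette() returns. The helper sil_width_manual is hypothetical (it is not part of the cluster package) and is written for readability, not speed.

```r
library(cluster)

data(ruspini)
cl <- pam(ruspini, 4)$clustering
D  <- as.matrix(dist(ruspini))      # full n x n dissimilarity matrix

# Hand-rolled s(i) for one observation i (illustrative helper only).
sil_width_manual <- function(i, cl, D) {
  own <- cl == cl[i]
  if (sum(own) == 1) return(0)                  # singleton cluster: s(i) := 0
  a <- mean(D[i, own & seq_along(cl) != i])     # a(i): own cluster, excluding i
  others <- setdiff(unique(cl), cl[i])
  b <- min(sapply(others, function(C) mean(D[i, cl == C])))  # b(i) = min_C d(i,C)
  (b - a) / max(a, b)
}

manual <- sapply(seq_along(cl), sil_width_manual, cl = cl, D = D)
sil    <- silhouette(cl, dmatrix = D)

# The two sets of widths should agree up to floating-point error.
agree <- all.equal(manual, unname(sil[, "sil_width"]))
agree
```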

----------Value----------

silhouette() returns an object, sil, of class silhouette, which is an [n x 3] matrix with attributes. For each observation i, sil[i,] contains the cluster to which i belongs as well as the neighbor cluster of i (the cluster, not containing i, for which the average dissimilarity between its observations and i is minimal), and the silhouette width s(i) of the observation. The colnames correspondingly are c("cluster", "neighbor", "sil_width").

summary(sil) returns an object of class summary.silhouette, a list with components:

Component: si.summary
numerical summary of the individual silhouette widths s(i).

Component: clus.avg.widths
numeric (rank 1) array of clusterwise means of silhouette widths where mean = FUN is used.

Component: avg.width
the total mean FUN(s) where s are the individual silhouette widths.

Component: clus.sizes
table of the k cluster sizes.

Component: call
if available, the call creating sil.

Component: Ordered
logical identical to attr(sil, "Ordered"), see below.

sortSilhouette(sil) orders the rows of sil as in the silhouette plot, by cluster (increasingly) and decreasing silhouette width s(i).
attr(sil, "Ordered") is a logical indicating if sil is ordered as by sortSilhouette(). In that case, rownames(sil) will contain case labels or numbers, and attr(sil, "iOrd") the ordering index vector.
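
The following sketch shows one way to inspect these return values. It uses the default method (clustering vector plus dist object) so that the rows start out unsorted; the variable names are illustrative only.

```r
library(cluster)

data(ruspini)
sil <- silhouette(pam(ruspini, 4)$clustering, dist(ruspini))

colnames(sil)            # "cluster" "neighbor" "sil_width"
dim(sil)                 # n rows, 3 columns

ssil <- summary(sil)
ssil$clus.sizes          # table of the k cluster sizes
ssil$clus.avg.widths     # cluster-wise mean widths
ssil$avg.width           # overall mean width

srt <- sortSilhouette(sil)
attr(srt, "Ordered")     # logical flag set by sortSilhouette()
head(attr(srt, "iOrd"))  # ordering index into the original rows
```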

----------Note----------

While silhouette() is intrinsic to the partition clusterings, and hence has a (trivial) method for these, it is straightforward to get silhouettes from hierarchical clusterings from silhouette.default() with cutree() and distance as input.

By default, for clara() partitions, the silhouette is just for the best random subset used. Use full = TRUE to compute (and later possibly plot) the full silhouette.
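
As a rough sketch of the hierarchical case, the code below cuts a tree at several values of k and records the average silhouette width for each. It uses stats::hclust with average linkage rather than agnes(), purely as an assumption for illustration; the names ks and avg are likewise made up here.

```r
library(cluster)

data(ruspini)
d  <- dist(ruspini)
hc <- hclust(d, method = "average")   # stats::hclust, used here instead of agnes()

# Average silhouette width for each candidate number of clusters k.
ks  <- 2:6
avg <- sapply(ks, function(k) {
  sil <- silhouette(cutree(hc, k = k), d)
  summary(sil)$avg.width
})
names(avg) <- ks
avg
ks[which.max(avg)]   # k with the largest average width
```

A larger average width suggests a better-separated partition, though it is only one of several criteria for choosing k.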

----------References----------

Rousseeuw, P.J. (1987) Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math., 20, 53–65.
See also the references in plot.agnes.

----------See Also----------

partition.object, plot.partition.

----------Examples----------

library(cluster)
data(ruspini)
pr4 <- pam(ruspini, 4)
str(si <- silhouette(pr4))
(ssi <- summary(si))
plot(si) # silhouette plot
plot(si, col = c("red", "green", "blue", "purple")) # with cluster-wise coloring

si2 <- silhouette(pr4$clustering, dist(ruspini, "canberra"))
summary(si2) # has small values: "canberra"'s fault
plot(si2, nmax.lab = 80, cex.names = 0.6)

op <- par(mfrow = c(3,2), oma = c(0,0, 3, 0),
          mgp = c(1.6,.8,0), mar = .1+c(4,2,2,2))
for(k in 2:6)
    plot(silhouette(pam(ruspini, k=k)), main = paste("k = ",k), do.n.k=FALSE)
mtext("PAM(Ruspini) as in Kaufman & Rousseeuw, p.101",
      outer = TRUE, font = par("font.main"), cex = par("cex.main")); frame()

## the same with cluster-wise colours:
c6 <- c("tomato", "forest green", "dark blue", "purple2", "goldenrod4", "gray20")
for(k in 2:6)
    plot(silhouette(pam(ruspini, k=k)), main = paste("k = ",k), do.n.k=FALSE,
         col = c6[1:k])
par(op)

## clara(): standard silhouette is just for the best random subset
data(xclara)
set.seed(7)
str(xc1k <- xclara[sample(nrow(xclara), size = 1000) ,])
cl3 <- clara(xc1k, 3)
plot(silhouette(cl3)) # only of the "best" subset of 46
## The full silhouette: internally needs large (36 MB) dist object:
sf <- silhouette(cl3, full = TRUE) ## this is the same as
s.full <- silhouette(cl3$clustering, daisy(xc1k))
## compare versions numerically, not as strings
if(getRversion() >= "2.3.0")
    stopifnot(all.equal(sf, s.full, check.attributes = FALSE, tolerance = 0))
## color dependent on original "3 groups of each 1000":
plot(sf, col = 2 + as.integer(names(cl3$clustering)) %/% 1000,
     main = "plot(silhouette(clara(.), full = TRUE))")

## Silhouette for a hierarchical clustering:
ar <- agnes(ruspini)
si3 <- silhouette(cutree(ar, k = 5), # k = 4 gave the same as pam() above
                  daisy(ruspini))
plot(si3, nmax.lab = 80, cex.names = 0.5)
## 2 groups: Agnes() wasn't too good:
si4 <- silhouette(cutree(ar, k = 2), daisy(ruspini))
plot(si4, nmax.lab = 80, cex.names = 0.5)

When reposting, please credit the source: biostatistic.net (http://www.biostatistic.net).
