silhouette(cluster)
silhouette() belongs to R package: cluster
Compute or Extract Silhouette Information from Clustering
Translator: 生物统计家园网, robot LoveR
----------Description----------
Compute silhouette information according to a given clustering in k clusters.
----------Usage----------
silhouette(x, ...)
## Default S3 method:
silhouette(x, dist, dmatrix, ...)
## S3 method for class 'partition'
silhouette(x, ...)
## S3 method for class 'clara'
silhouette(x, full = FALSE, ...)
sortSilhouette(object, ...)
## S3 method for class 'silhouette'
summary(object, FUN = mean, ...)
## S3 method for class 'silhouette'
plot(x, nmax.lab = 40, max.strlen = 5,
     main = NULL, sub = NULL, xlab = expression("Silhouette width "* s[i]),
     col = "gray", do.col.sort = length(col) > 1, border = 0,
     cex.names = par("cex.axis"), do.n.k = TRUE, do.clus.stat = TRUE, ...)
----------Arguments----------
Argument: x
an object of appropriate class; for the default method an integer vector with k different integer cluster codes or a list with such an x$clustering component. Note that silhouette statistics are only defined if 2 <= k <= n-1.
Argument: dist
a dissimilarity object inheriting from class dist or coercible to one. If not specified, dmatrix must be.
Argument: dmatrix
a symmetric dissimilarity matrix (n * n), specified instead of dist, which can be more efficient.
Argument: full
logical specifying if a full silhouette should be computed for a clara object. Note that this requires O(n^2) memory, since the full dissimilarity (see daisy) is needed internally.
Argument: object
an object of class silhouette.
Argument: ...
further arguments passed to and from methods.
Argument: FUN
function used to summarize silhouette widths.
Argument: nmax.lab
integer indicating the number of labels which is considered too large for single-name labeling of the silhouette plot.
Argument: max.strlen
positive integer giving the length to which strings are truncated in silhouette plot labeling.
Argument: main, sub, xlab
arguments to title; have a sensible non-NULL default here.
Argument: col, border, cex.names
arguments passed to barplot(); note that the default used to be col = heat.colors(n), border = par("fg") instead. col can also be a color vector of length k for clusterwise coloring; see also do.col.sort.
Argument: do.col.sort
logical indicating if the colors col should be sorted “along” the silhouette; this is useful for casewise or clusterwise coloring.
Argument: do.n.k
logical indicating if n and k “title text” should be written.
Argument: do.clus.stat
logical indicating if cluster sizes and averages should be written to the right of the silhouettes.
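The dist and dmatrix arguments described above are two ways of handing the same dissimilarities to the default method. The following is a minimal sketch, assuming the ruspini data shipped with the cluster package and a pam() clustering with 4 clusters; the variable names are illustrative only.

```r
library(cluster)

data(ruspini)
cl <- pam(ruspini, 4)$clustering   # integer cluster codes, one per observation
d  <- dist(ruspini)                # "dist" object (Euclidean by default)

# Route 1: pass the dissimilarity object via 'dist'
s_dist <- silhouette(cl, dist = d)

# Route 2: pass a symmetric n x n matrix via 'dmatrix' instead
s_dmatrix <- silhouette(cl, dmatrix = as.matrix(d))

# Both routes are expected to give the same silhouette widths
all.equal(s_dist[, "sil_width"], s_dmatrix[, "sil_width"])
```

Passing dmatrix is mainly of interest when the full matrix already exists, since the default method then does not need to convert a dist object itself.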
----------Details----------
For each observation i, the silhouette width s(i) is defined as follows:
Put a(i) = average dissimilarity between i and all other points of the cluster to which i belongs (if i is the only observation in its cluster, s(i) := 0 without further calculations). For all other clusters C, put d(i,C) = average dissimilarity of i to all observations of C. The smallest of these d(i,C) is b(i) := min_C d(i,C), and can be seen as the dissimilarity between i and its “neighbor” cluster, i.e., the nearest one to which it does not belong. Finally, s(i) := ( b(i) - a(i) ) / max( a(i), b(i) ).
silhouette.default() is now based on C code donated by Romain Francois (the R version still being available as cluster:::silhouette.default.R).
Observations with a large s(i) (almost 1) are very well clustered; a small s(i) (around 0) means that the observation lies between two clusters; and observations with a negative s(i) are probably placed in the wrong cluster.
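The definition can be checked by hand. The sketch below computes a(i), b(i) and then the standard silhouette width s(i) = ( b(i) - a(i) ) / max( a(i), b(i) ) directly from a dissimilarity matrix, and compares the result with silhouette(). The helper sil_width_by_hand() is hypothetical (it is not part of the cluster package), and the ruspini data with a 4-cluster pam() fit are assumed for illustration.

```r
library(cluster)

data(ruspini)
cl <- pam(ruspini, 4)$clustering
dm <- as.matrix(dist(ruspini))     # full n x n dissimilarity matrix

# Hypothetical helper: silhouette width of observation i, from the definition
sil_width_by_hand <- function(i, cl, dm) {
  own <- cl == cl[i]
  if (sum(own) == 1) return(0)            # singleton cluster: s(i) := 0
  # a(i): mean dissimilarity to the *other* members of i's cluster
  # (dm[i, i] is 0, so dividing the row sum by size - 1 excludes i itself)
  a <- sum(dm[i, own]) / (sum(own) - 1)
  # b(i): smallest average dissimilarity to any other cluster
  others <- setdiff(unique(cl), cl[i])
  b <- min(sapply(others, function(C) mean(dm[i, cl == C])))
  (b - a) / max(a, b)
}

by_hand <- sapply(seq_along(cl), sil_width_by_hand, cl = cl, dm = dm)
sil     <- silhouette(cl, dmatrix = dm)

# The hand computation is expected to agree with the package result
all.equal(unname(by_hand), unname(sil[, "sil_width"]))
```

This is a plain R loop meant for reading, not speed; the package's own implementation should be preferred for real use.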
----------Value----------
silhouette() returns an object, sil, of class silhouette which is an [n x 3] matrix with attributes. For each observation i, sil[i,] contains the cluster to which i belongs as well as the neighbor cluster of i (the cluster, not containing i, for which the average dissimilarity between its observations and i is minimal), and the silhouette width s(i) of the observation. The colnames correspondingly are c("cluster", "neighbor", "sil_width").
summary(sil) returns an object of class summary.silhouette, a list with components:
Component: si.summary
numerical summary of the individual silhouette widths s(i).
Component: clus.avg.widths
numeric (rank 1) array of clusterwise means of silhouette widths where mean = FUN is used.
Component: avg.width
the total mean FUN(s) where s are the individual silhouette widths.
Component: clus.sizes
table of the k cluster sizes.
Component: call
if available, the call creating sil.
Component: Ordered
logical identical to attr(sil, "Ordered"), see below.
sortSilhouette(sil) orders the rows of sil as in the silhouette plot, by cluster (increasingly) and decreasing silhouette width s(i).
attr(sil, "Ordered") is a logical indicating if sil is ordered as by sortSilhouette(). In that case, rownames(sil) will contain case labels or numbers, and attr(sil, "iOrd") the ordering index vector.
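The returned structures can be inspected as in the following sketch. It assumes the ruspini data and uses the default method (cluster codes plus a dist object) so that the result is not yet sorted; the variable names are illustrative.

```r
library(cluster)

data(ruspini)
cl  <- pam(ruspini, 4)$clustering
sil <- silhouette(cl, dist(ruspini))   # default method, rows in data order

# The three documented columns of the n x 3 matrix
head(sil[, c("cluster", "neighbor", "sil_width")])

# Components of the summary object
ssil <- summary(sil)
ssil$clus.sizes        # table of the k cluster sizes
ssil$clus.avg.widths   # clusterwise FUN (here: mean) of the widths
ssil$avg.width         # FUN applied to all widths

# A different summary function, e.g. the median
summary(sil, FUN = median)$avg.width

# Sorting as in the silhouette plot
sorted <- sortSilhouette(sil)
attr(sorted, "Ordered")       # logical flag described above
head(attr(sorted, "iOrd"))    # ordering index vector
```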
----------Note----------
While silhouette() is intrinsic to the partition clusterings, and hence has a (trivial) method for these, it is straightforward to get silhouettes for hierarchical clusterings from silhouette.default(), with the result of cutree() and a distance as input.
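As a minimal sketch of this route, the following uses stats::hclust() (an assumption made here for illustration; agnes() from the cluster package can be cut with cutree() in the same way) and compares the average silhouette width over several cuts of one tree.

```r
library(cluster)

data(ruspini)
d  <- dist(ruspini)
hc <- hclust(d, method = "average")   # hierarchical clustering from package stats

# Average silhouette width for several cuts of the same tree
ks <- 2:6
avg_width <- sapply(ks, function(k) {
  sil_k <- silhouette(cutree(hc, k = k), d)   # default method: codes + dist
  summary(sil_k)$avg.width
})
names(avg_width) <- ks
avg_width
```

A larger average width suggests a better-separated cut, but it is only one diagnostic; the silhouette plots themselves remain worth looking at.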
By default, for clara() partitions, the silhouette is just for the best random subset used. Use full = TRUE to compute (and later possibly plot) the full silhouette.
----------References----------
Rousseeuw, P.J. (1987) Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math., 20, 53–65.
Chapter 2 of Kaufman and Rousseeuw (1990); see the references in plot.agnes.
----------See Also----------
partition.object, plot.partition.
----------Examples----------
library(cluster)
data(ruspini)
pr4 <- pam(ruspini, 4)
str(si <- silhouette(pr4))
(ssi <- summary(si))
plot(si) # silhouette plot
plot(si, col = c("red", "green", "blue", "purple")) # with cluster-wise coloring
si2 <- silhouette(pr4$clustering, dist(ruspini, "canberra"))
summary(si2) # has small values: "canberra"'s fault
plot(si2, nmax.lab = 80, cex.names = 0.6)
op <- par(mfrow = c(3,2), oma = c(0,0, 3, 0),
          mgp = c(1.6,.8,0), mar = .1+c(4,2,2,2))
for(k in 2:6)
    plot(silhouette(pam(ruspini, k=k)), main = paste("k = ",k), do.n.k=FALSE)
mtext("PAM(Ruspini) as in Kaufman & Rousseeuw, p.101",
      outer = TRUE, font = par("font.main"), cex = par("cex.main")); frame()
## the same with cluster-wise colours:
c6 <- c("tomato", "forest green", "dark blue", "purple2", "goldenrod4", "gray20")
for(k in 2:6)
    plot(silhouette(pam(ruspini, k=k)), main = paste("k = ",k), do.n.k=FALSE,
         col = c6[1:k])
par(op)
## clara(): standard silhouette is just for the best random subset
data(xclara)
set.seed(7)
str(xc1k <- xclara[sample(nrow(xclara), size = 1000) ,])
cl3 <- clara(xc1k, 3)
plot(silhouette(cl3)) # only of the "best" subset of 46
## The full silhouette: internally needs a dist object of n(n-1)/2 values
## (about 4 MB for n = 1000):
sf <- silhouette(cl3, full = TRUE) ## this is the same as
s.full <- silhouette(cl3$clustering, daisy(xc1k))
if(getRversion() >= "2.3.0") # compare versions properly, not as strings
    stopifnot(all.equal(sf, s.full, check.attributes = FALSE, tolerance = 0))
## color dependent on original "3 groups of each 1000":
plot(sf, col = 2+ as.integer(names(cl3$clustering) ) %/% 1000,
     main ="plot(silhouette(clara(.), full = TRUE))")
## Silhouette for a hierarchical clustering:
ar <- agnes(ruspini)
si3 <- silhouette(cutree(ar, k = 5), # k = 4 gave the same as pam() above
                  daisy(ruspini))
plot(si3, nmax.lab = 80, cex.names = 0.5)
## 2 groups: agnes() wasn't too good:
si4 <- silhouette(cutree(ar, k = 2), daisy(ruspini))
plot(si4, nmax.lab = 80, cex.names = 0.5)
When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).