R: help documentation for the silhouette() function

Posted on 2012-2-16 21:21:36
silhouette(cluster)
silhouette() belongs to the R package: cluster

Compute or Extract Silhouette Information from Clustering

Translator: 生物统计家园网 robot LoveR

Description

Compute silhouette information according to a given clustering in k clusters.


Usage


silhouette(x, ...)
## Default S3 method:
silhouette(x, dist, dmatrix, ...)
## S3 method for class 'partition'
silhouette(x, ...)
## S3 method for class 'clara'
silhouette(x, full = FALSE, ...)

sortSilhouette(object, ...)
## S3 method for class 'silhouette'
summary(object, FUN = mean, ...)
## S3 method for class 'silhouette'
plot(x, nmax.lab = 40, max.strlen = 5,
     main = NULL, sub = NULL, xlab = expression("Silhouette width "* s[i]),
     col = "gray",  do.col.sort = length(col) > 1, border = 0,
     cex.names = par("cex.axis"), do.n.k = TRUE, do.clus.stat = TRUE, ...)



Arguments

Argument: x
an object of appropriate class; for the default method, an integer vector with k different integer cluster codes, or a list with such an x$clustering component. Note that silhouette statistics are only defined if 2 <= k <= n-1.


Argument: dist
a dissimilarity object inheriting from class dist or coercible to one. If not specified, dmatrix must be.


Argument: dmatrix
a symmetric dissimilarity matrix (n * n), specified instead of dist, which can be more efficient.


Argument: full
logical specifying whether a full silhouette should be computed for a clara object. Note that this requires O(n^2) memory, since the full dissimilarity (see daisy) is needed internally.


Argument: object
an object of class silhouette.


Argument: ...
further arguments passed to and from methods.


Argument: FUN
function used to summarize silhouette widths.


Argument: nmax.lab
integer indicating the number of labels which is considered too large for single-name labeling of the silhouette plot.


Argument: max.strlen
positive integer giving the length to which strings are truncated in silhouette plot labeling.


Argument: main, sub, xlab
arguments to title(); each has a sensible non-NULL default here.


Argument: col, border, cex.names
arguments passed to barplot(); note that the default used to be col = heat.colors(n), border = par("fg") instead.
col can also be a color vector of length k for clusterwise coloring; see also do.col.sort.


Argument: do.col.sort
logical indicating whether the colors col should be sorted "along" the silhouette; this is useful for casewise or clusterwise coloring.


Argument: do.n.k
logical indicating whether n and k "title text" should be written.


Argument: do.clus.stat
logical indicating whether cluster sizes and averages should be written to the right of the silhouettes.
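
As a minimal sketch of the dist and dmatrix arguments, the following computes the same silhouette twice: once from a dist object and once from the full symmetric matrix. The ruspini data and the pam() clustering are borrowed from the Examples section; the variable names (cl, d, s_dist, s_dmat) are illustrative only.

```r
library(cluster)
data(ruspini)

cl <- pam(ruspini, 4)$clustering   # integer cluster codes, k = 4
d  <- dist(ruspini)                # dissimilarity object of class "dist"

# Same clustering, dissimilarities supplied in two different forms
s_dist <- silhouette(cl, dist = d)
s_dmat <- silhouette(cl, dmatrix = as.matrix(d))

# The silhouette widths should agree
all.equal(unname(s_dist[, "sil_width"]), unname(s_dmat[, "sil_width"]))
```

Passing dmatrix is mainly of interest when the full n * n matrix already exists in memory, so that no conversion to a dist object is needed.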


Details

For each observation i, the silhouette width s(i) is defined as follows:
Put a(i) = average dissimilarity between i and all other points of the cluster to which i belongs (if i is the only observation in its cluster, s(i) := 0 without further calculations). For all other clusters C, put d(i,C) = average dissimilarity of i to all observations of C. The smallest of these d(i,C) is b(i) := \min_C d(i,C), and can be seen as the dissimilarity between i and its "neighbor" cluster, i.e., the nearest one to which it does not belong. Finally, s(i) := (b(i) - a(i)) / max(a(i), b(i)).
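
The definition can be checked by hand. The sketch below recomputes a(i), b(i) and s(i) = (b(i) - a(i)) / max(a(i), b(i)) from a plain dissimilarity matrix and compares the result with silhouette(). The helper sil_width_one() is hypothetical (it is not part of the cluster package) and is written for clarity rather than speed.

```r
library(cluster)
data(ruspini)

cl <- pam(ruspini, 4)$clustering
dm <- as.matrix(dist(ruspini))     # full n x n dissimilarity matrix

# Hypothetical helper: silhouette width of a single observation i
sil_width_one <- function(i, cl, dm) {
  own <- cl == cl[i]
  if (sum(own) == 1) return(0)     # i is alone in its cluster: s(i) := 0
  # a(i): average dissimilarity to the *other* members of the own cluster
  # (dm[i, i] is 0, so the sum over the cluster is divided by size - 1)
  a <- sum(dm[i, own]) / (sum(own) - 1)
  # d(i, C) for every other cluster C; b(i) is the smallest of them
  others <- setdiff(unique(cl), cl[i])
  b <- min(vapply(others, function(C) mean(dm[i, cl == C]), numeric(1)))
  (b - a) / max(a, b)
}

s_manual <- vapply(seq_along(cl), sil_width_one, numeric(1), cl = cl, dm = dm)
s_pkg    <- silhouette(cl, dist(ruspini))[, "sil_width"]

all.equal(unname(s_pkg), s_manual)
```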

silhouette.default() is now based on C code donated by Romain Francois (the R version is still available as cluster:::silhouette.default.R).

Observations with a large s(i) (almost 1) are very well clustered, a small s(i) (around 0) means that the observation lies between two clusters, and observations with a negative s(i) are probably placed in the wrong cluster.
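
One way to put this reading into practice is to look at the observations with the smallest widths. The sketch below is only an illustration: the choice k = 6 is an assumption made here to split the ruspini data into more clusters than the Examples section suggests is natural, and the cutoff of five rows is arbitrary.

```r
library(cluster)
data(ruspini)

si <- silhouette(pam(ruspini, k = 6))   # k = 6 is an illustrative choice
w  <- si[, "sil_width"]

sum(w < 0)                    # number of observations with negative s(i)

# The five observations with the smallest silhouette widths,
# together with their own cluster and their neighbor cluster
lowest <- order(w)[1:5]
si[lowest, ]

# Average width per cluster: low values point at weakly separated clusters
tapply(w, si[, "cluster"], mean)
```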


Value

silhouette() returns an object, sil, of class silhouette, which is an n x 3 matrix with attributes. For each observation i, sil[i,] contains the cluster to which i belongs, the neighbor cluster of i (the cluster, not containing i, for which the average dissimilarity between its observations and i is minimal), and the silhouette width s(i) of the observation. The colnames correspondingly are c("cluster", "neighbor", "sil_width").
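
A short sketch of how the returned matrix can be inspected, again using the ruspini data and pam() from the Examples section:

```r
library(cluster)
data(ruspini)

sil <- silhouette(pam(ruspini, 4))

class(sil)        # "silhouette"
dim(sil)          # one row per observation, three columns
colnames(sil)     # "cluster", "neighbor", "sil_width"

# First rows as a plain numeric matrix
head(sil[, c("cluster", "neighbor", "sil_width")])

# The neighbor cluster is by definition never the observation's own cluster
all(sil[, "cluster"] != sil[, "neighbor"])
```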

summary(sil) returns an object of class summary.silhouette, a list with components


Component: si.summary
numerical summary of the individual silhouette widths s(i).


Component: clus.avg.widths
numeric (rank 1) array of clusterwise means of silhouette widths, where mean = FUN is used.


Component: avg.width
the total mean FUN(s), where s are the individual silhouette widths.


Component: clus.sizes
table of the k cluster sizes.


Component: call
if available, the call creating sil.


Component: Ordered
logical identical to attr(sil, "Ordered"), see below.

sortSilhouette(sil) orders the rows of sil as in the silhouette plot, by cluster (increasingly) and decreasing silhouette width s(i).
attr(sil, "Ordered") is a logical indicating whether sil is ordered as by sortSilhouette(). In that case, rownames(sil) will contain case labels or numbers, and attr(sil, "iOrd") the ordering index vector.
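
The summary components and the sorting attributes can be explored as in the following sketch. It goes through the default method (cluster codes plus a dist object) so that the rows start out in data order; the variable names are illustrative.

```r
library(cluster)
data(ruspini)

sil <- silhouette(pam(ruspini, 4)$clustering, dist(ruspini))

ssi <- summary(sil)                  # FUN = mean by default
ssi$clus.sizes                       # table of the k cluster sizes
ssi$clus.avg.widths                  # clusterwise FUN of the widths
ssi$avg.width                        # FUN over all individual widths

# A different summary function, e.g. the median
summary(sil, FUN = median)$avg.width

# Sort as in the silhouette plot: by cluster, then by decreasing s(i)
ssil <- sortSilhouette(sil)
attr(ssil, "Ordered")                # TRUE after sorting
iord <- attr(ssil, "iOrd")           # ordering index vector
head(rownames(ssil))                 # case labels of the sorted rows
```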


Note

While silhouette() is intrinsic to the partition clusterings, and hence has a (trivial) method for these, it is straightforward to get silhouettes from hierarchical clusterings by calling silhouette.default() with cutree() and the distance as input.
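
As a sketch of this route with base R's hclust(), the tree is cut into k groups and the resulting integer codes are passed to silhouette() together with the distances. Average linkage and k = 4 are assumptions chosen for illustration only.

```r
library(cluster)
data(ruspini)

d  <- dist(ruspini)
hc <- hclust(d, method = "average")   # linkage method chosen for illustration

# cutree() returns integer cluster codes, which the default method accepts
si_h <- silhouette(cutree(hc, k = 4), d)

summary(si_h)$avg.width               # overall average silhouette width
plot(si_h, nmax.lab = 80, cex.names = 0.5)
```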

By default, for clara() partitions, the silhouette is just for the best random subset used. Use full = TRUE to compute (and later possibly plot) the full silhouette.


References

Rousseeuw, P.J. (1987) Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math., 20, 53-65.
See also the references in plot.agnes.

See Also

partition.object, plot.partition.


Examples


library(cluster)
data(ruspini)
pr4 <- pam(ruspini, 4)
str(si <- silhouette(pr4))
(ssi <- summary(si))
plot(si) # silhouette plot
plot(si, col = c("red", "green", "blue", "purple"))# with cluster-wise coloring

si2 <- silhouette(pr4$clustering, dist(ruspini, "canberra"))
summary(si2) # has small values: "canberra"'s fault
plot(si2, nmax.lab = 80, cex.names = 0.6)

op <- par(mfrow= c(3,2), oma= c(0,0, 3, 0),
          mgp= c(1.6,.8,0), mar= .1+c(4,2,2,2))
for(k in 2:6)
   plot(silhouette(pam(ruspini, k=k)), main = paste("k = ",k), do.n.k=FALSE)
mtext("PAM(Ruspini) as in Kaufman & Rousseeuw, p.101",
      outer = TRUE, font = par("font.main"), cex = par("cex.main")); frame()

## the same with cluster-wise colours:
c6 <- c("tomato", "forest green", "dark blue", "purple2", "goldenrod4", "gray20")
for(k in 2:6)
   plot(silhouette(pam(ruspini, k=k)), main = paste("k = ",k), do.n.k=FALSE,
        col = c6[1:k])
par(op)

## clara(): standard silhouette is just for the best random subset
data(xclara)
set.seed(7)
str(xc1k <- xclara[sample(nrow(xclara), size = 1000) ,])
cl3 <- clara(xc1k, 3)
plot(silhouette(cl3))# only of the "best" subset of 46
## The full silhouette: internally needs large (36 MB) dist object:
sf <- silhouette(cl3, full = TRUE) ## this is the same as
s.full <- silhouette(cl3$clustering, daisy(xc1k))
if(getRversion() >= "2.3.0")
   stopifnot(all.equal(sf, s.full, check.attributes = FALSE, tolerance = 0))
## color dependent on original "3 groups of each 1000":
plot(sf, col = 2+ as.integer(names(cl3$clustering) ) %/% 1000,
     main ="plot(silhouette(clara(.), full = TRUE))")

## Silhouette for a hierarchical clustering:
ar <- agnes(ruspini)
si3 <- silhouette(cutree(ar, k = 5), # k = 4 gave the same as pam() above
                       daisy(ruspini))
plot(si3, nmax.lab = 80, cex.names = 0.5)
## 2 groups: Agnes() wasn't too good:
si4 <- silhouette(cutree(ar, k = 2), daisy(ruspini))
plot(si4, nmax.lab = 80, cex.names = 0.5)

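When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).


Notes:
Note 1: For the convenience of learners, this document was translated by LoveR, the 生物统计家园网 robot. It is intended only as a personal reference for learning R; 生物统计家园 retains the copyright.
Note 2: Because the translation is automatic, inaccuracies are inevitable. Carefully comparing the Chinese and English text while reading can help with learning R.
Note 3: If you find an inaccuracy, please reply in this thread; we will revise the text over time.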