

Posted on 2012-2-16 18:24:09

                                        Fuzzy Analysis Clustering

                                         Translator: 生物统计家园网 机器人LoveR


----------Description----------

Computes a fuzzy clustering of the data into k clusters.

----------Usage----------


fanny(x, k, diss = inherits(x, "dist"), memb.exp = 2,
      metric = c("euclidean", "manhattan", "SqEuclidean"),
      stand = FALSE, iniMem.p = NULL, cluster.only = FALSE,
      keep.diss = !diss && !cluster.only && n < 100,
      keep.data = !diss && !cluster.only,
      maxit = 500, tol = 1e-15, trace.lev = 0)


----------Arguments----------

Argument: x
Data matrix or data frame, or dissimilarity matrix, depending on the value of the diss argument.
In the case of a matrix or data frame, each row corresponds to an observation and each column to a variable. All variables must be numeric. Missing values (NAs) are allowed.
In the case of a dissimilarity matrix, x is typically the output of daisy or dist. A vector of length n*(n-1)/2 is also allowed (where n is the number of observations) and is interpreted in the same way as the output of those functions. Missing values (NAs) are not allowed.
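The two input forms can be compared side by side. The following is a minimal sketch, assuming the cluster package is installed; the simulated data, the seed, and the names fit_data and fit_diss are illustrative and not part of the original help page.

```r
library(cluster)  # fanny() lives in the recommended 'cluster' package

set.seed(1)  # illustrative seed, only to make the sketch reproducible
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.5), ncol = 2))

fit_data <- fanny(x, 2)        # x is a data matrix, so diss = FALSE
fit_diss <- fanny(dist(x), 2)  # x is a "dist" object, so diss = TRUE by default

# Both calls work on the same Euclidean dissimilarities, so the
# membership matrices should agree up to numerical noise.
max(abs(unname(fit_data$membership) - unname(fit_diss$membership)))
```

Passing the dissimilarities directly is the route to take when they come from daisy, for example with mixed variable types.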

Argument: k
Integer giving the desired number of clusters. It is required that 0 < k < n/2, where n is the number of observations.

Argument: diss
Logical flag: if TRUE (default for dist or dissimilarity objects), then x is assumed to be a dissimilarity matrix. If FALSE, then x is treated as a matrix of observations by variables.

Argument: memb.exp
Number r strictly larger than 1 specifying the membership exponent used in the fit criterion; see the "Details" below. The default, 2, used to be hardwired inside FANNY.

Argument: metric
Character string specifying the metric to be used for calculating dissimilarities between observations. Options are "euclidean" (default), "manhattan", and "SqEuclidean". Euclidean distances are the root sum-of-squares of differences, manhattan distances are the sum of absolute differences, and "SqEuclidean" (squared Euclidean) distances are the sum-of-squares of differences. Using this last option is equivalent to (but somewhat slower than) computing so-called "fuzzy C-means".
If x is already a dissimilarity matrix, this argument is ignored.
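For two concrete points the three metrics work out as follows. This is a small base R sketch; the points a and b and the variable names are illustrative.

```r
a <- c(1, 2)
b <- c(4, 6)

euclidean    <- sqrt(sum((a - b)^2))  # root sum-of-squares of differences: 5
manhattan    <- sum(abs(a - b))       # sum of absolute differences: 7
sq_euclidean <- sum((a - b)^2)        # sum-of-squares of differences: 25

# The first two match what dist() reports for the same pair of rows.
as.numeric(dist(rbind(a, b), method = "euclidean"))
as.numeric(dist(rbind(a, b), method = "manhattan"))
```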

Argument: stand
Logical; if true, the measurements in x are standardized before calculating the dissimilarities. Measurements are standardized for each variable (column) by subtracting the variable's mean value and dividing by the variable's mean absolute deviation. If x is already a dissimilarity matrix, this argument is ignored.
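The standardization described above can be sketched in base R. The helper name standardize_mad and the toy matrix m are hypothetical; this only illustrates the arithmetic for data without NAs and is not the code fanny runs internally.

```r
# Hypothetical helper: center each column on its mean, then divide by the
# column's mean absolute deviation (not the standard deviation).
standardize_mad <- function(x) {
  x <- as.matrix(x)
  centered <- sweep(x, 2, colMeans(x))      # subtract column means
  mean_abs_dev <- colMeans(abs(centered))   # mean absolute deviation per column
  sweep(centered, 2, mean_abs_dev, "/")
}

m <- cbind(height = c(150, 160, 170, 180),
           weight = c(50, 60, 80, 90))
z <- standardize_mad(m)
z[, "height"]  # -1.5 -0.5 0.5 1.5
```

After this transformation every column has mean 0 and mean absolute deviation 1, so variables measured on different scales contribute comparably to the dissimilarities.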

Argument: iniMem.p
Numeric n * k matrix or NULL (by default); can be used to specify a starting membership matrix, i.e., a matrix of non-negative numbers, each row summing to one.
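One way to build a valid starting matrix is to draw non-negative numbers and normalize each row. This is a sketch under the assumption that the cluster package is installed; the random start, the seed, and the names raw and ini are illustrative choices, not a recommended initialization.

```r
library(cluster)

set.seed(2)  # illustrative seed
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.5), ncol = 2))
n <- nrow(x)
k <- 2

raw <- matrix(runif(n * k), nrow = n, ncol = k)  # non-negative entries
ini <- raw / rowSums(raw)                        # each row now sums to one

fit <- fanny(x, k, iniMem.p = ini)
head(fit$membership)
```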

Argument: cluster.only
Logical; if true, no silhouette information will be computed and returned, see details.

Arguments: keep.diss, keep.data
Logicals indicating whether the dissimilarities and/or the input data x should be kept in the result. Setting these to FALSE gives smaller results and hence also saves memory allocation time.
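The effect of these two flags can be inspected on the returned object. This sketch assumes the cluster package is installed and that the dropped pieces are simply absent (NULL) from the result; the data and object names are illustrative.

```r
library(cluster)

set.seed(3)  # illustrative seed
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.5), ncol = 2))

fit_full <- fanny(x, 2)  # defaults keep data and, for n < 100, dissimilarities
fit_slim <- fanny(x, 2, keep.diss = FALSE, keep.data = FALSE)

is.null(fit_slim$diss)   # dissimilarities were not stored
is.null(fit_slim$data)   # input data were not stored
c(full = object.size(fit_full), slim = object.size(fit_slim))  # sizes in bytes
```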

Arguments: maxit, tol
Maximal number of iterations and default tolerance for convergence (relative convergence of the fit criterion) for the FANNY algorithm. The defaults maxit = 500 and tol = 1e-15 used to be hardwired inside the algorithm.

Argument: trace.lev
Integer specifying a trace level for printing diagnostics during the C-internal algorithm. The default, 0, does not print anything; higher values print increasingly more.



----------Details----------

In a fuzzy clustering, each observation is "spread out" over the various clusters. Denote by u(i,v) the membership of observation i to cluster v.

The memberships are nonnegative, and for a fixed observation i they sum to 1. The particular method fanny stems from chapter 4 of Kaufman and Rousseeuw (1990) (see the references in daisy) and has been extended by Martin Maechler to allow user-specified memb.exp, iniMem.p, maxit, tol, etc.
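These two properties can be checked on a fitted object. The sketch below assumes the cluster package and its ruspini data set are available; the names u and hard are illustrative.

```r
library(cluster)

fit <- fanny(ruspini, 4)
u <- fit$membership      # n x k matrix of memberships u(i,v)

all(u >= 0)              # memberships are nonnegative
range(rowSums(u))        # each row should sum to 1 (up to rounding)

# The crisp clustering is expected to assign each observation to the
# cluster in which it has its largest membership.
hard <- apply(u, 1, which.max)
table(hard, fit$clustering)
```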

Fanny aims to minimize the objective function

    SUM_[v=1..k] (SUM_(i,j) u(i,v)^r u(j,v)^r d(i,j)) / (2 SUM_j u(j,v)^r)

where n is the number of observations, k is the number of clusters, r is the membership exponent memb.exp and d(i,j) is the dissimilarity between observations i and j.

Note that r -> 1 gives increasingly crisper clusterings whereas r -> Inf leads to complete fuzziness. K&R (1990), p.191 note that values too close to 1 can lead to slow convergence. Further note that even the default, r = 2, can lead to complete fuzziness, i.e., memberships u(i,v) == 1/k. In that case a warning is signalled and the user is advised to choose a smaller memb.exp (= r).
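To make the criterion concrete, the following base R sketch evaluates it for a given membership matrix. It assumes the usual form of the FANNY criterion, with the inner sum running over all ordered pairs (i, j); the function name fanny_objective and the toy data are hypothetical, and this is not the routine that fanny itself runs.

```r
# Hypothetical helper: value of the criterion for memberships u (n x k),
# dissimilarities d (a "dist" object or n x n matrix) and exponent r.
fanny_objective <- function(u, d, r = 2) {
  d <- as.matrix(d)   # full symmetric matrix with a zero diagonal
  w <- u^r            # u(i,v)^r
  per_cluster <- vapply(seq_len(ncol(w)), function(v) {
    wv <- w[, v]
    # sum over all ordered pairs (i, j), divided by 2 * sum_j u(j,v)^r
    sum(outer(wv, wv) * d) / (2 * sum(wv))
  }, numeric(1))
  sum(per_cluster)
}

pts <- c(0, 1, 10, 11)                        # two tight pairs on a line
d <- dist(pts)
crisp <- cbind(c(1, 1, 0, 0), c(0, 0, 1, 1))  # each pair in its own cluster
fuzzy <- matrix(0.5, nrow = 4, ncol = 2)      # complete fuzziness, u = 1/k

fanny_objective(crisp, d)  # 1
fanny_objective(fuzzy, d)  # 5.25
```

On this toy example the crisp assignment gives the smaller value, which is the sense in which well-separated data pull the memberships away from 1/k.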

Compared to other fuzzy clustering methods, fanny has the following features: (a) it also accepts a dissimilarity matrix; (b) it is more robust to the spherical cluster assumption; (c) it provides a novel graphical display, the silhouette plot (see plot.partition).
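The silhouette information mentioned in (c) can be pulled out of a fit as follows. This is a sketch assuming the cluster package and its ruspini data set; the names fit and sil are illustrative.

```r
library(cluster)

fit <- fanny(ruspini, 4)

sil <- silhouette(fit)   # silhouette widths of the crisp clustering
fit$silinfo$avg.width    # average silhouette width over all observations
summary(sil)             # per-cluster sizes and average widths

plot(sil)                # silhouette plot
```

Values of the average width close to 1 suggest well-separated clusters, while values near 0 suggest observations lying between clusters.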


----------Value----------

An object of class "fanny" representing the clustering. See fanny.object for details.

----------See Also----------

agnes for background and references; fanny.object, partition.object, plot.partition, daisy, dist.


----------Examples----------

library(cluster)  # provides fanny() and the ruspini data

## Generate 10+15 objects in two clusters, plus 3 objects lying
## between those clusters.
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)),
           cbind(rnorm( 3,3.2,0.5), rnorm( 3,3.2,0.5)))
fannyx <- fanny(x, 2)
## Note that observations 26:28 are "fuzzy" (closer to # 2):
fannyx

(fan.x.15 <- fanny(x, 2, memb.exp = 1.5)) # 'crispier' for obs. 26:28
(fanny(x, 2, memb.exp = 3))               # more fuzzy in general

f4 <- fanny(ruspini, 4)
stopifnot(rle(f4$clustering)$lengths == c(20,23,17,15))
plot(f4, which = 1)
## Plot similar to Figure 6 in Struyf et al (1996)
plot(fanny(ruspini, 5))

Please credit the source when reposting: 生物统计家园网 (http://www.biostatistic.net).

