Partitioning Around Medoids
Translator: 生物统计家园网 机器人LoveR
Partitioning (clustering) of the data into k clusters “around medoids”, a more robust version of K-means.
pam(x, k, diss = inherits(x, "dist"), metric = "euclidean",
medoids = NULL, stand = FALSE, cluster.only = FALSE,
do.swap = TRUE,
keep.diss = !diss && !cluster.only && n < 100, keep.data = !diss && !cluster.only, trace.lev = 0)
x: data matrix or data frame, or dissimilarity matrix or object, depending on the value of the diss argument. In case of a matrix or data frame, each row corresponds to an observation, and each column corresponds to a variable. All variables must be numeric. Missing values (NAs) are allowed, as long as every pair of observations has at least one case not missing. In case of a dissimilarity matrix, x is typically the output of daisy or dist. Also a vector of length n*(n-1)/2 is allowed (where n is the number of observations), and will be interpreted in the same way as the output of the above-mentioned functions. In this case, missing values (NAs) are not allowed.
k: positive integer specifying the number of clusters, less than the number of observations.
diss: logical flag: if TRUE (default for dist or dissimilarity objects), then x will be considered as a dissimilarity matrix. If FALSE, then x will be considered as a matrix of observations by variables.
metric: character string specifying the metric to be used for calculating dissimilarities between observations. The currently available options are "euclidean" and "manhattan". Euclidean distances are root sum-of-squares of differences, and Manhattan distances are the sum of absolute differences. If x is already a dissimilarity matrix, then this argument will be ignored.
medoids: NULL (default) or length-k vector of integer indices (in 1:n) specifying initial medoids instead of using the "build" algorithm.
stand: logical; if true, the measurements in x are standardized before calculating the dissimilarities. Measurements are standardized for each variable (column), by subtracting the variable's mean value and dividing by the variable's mean absolute deviation. If x is already a dissimilarity matrix, then this argument will be ignored.
cluster.only: logical; if true, only the clustering will be computed and returned, see details.
do.swap: logical indicating if the swap phase should happen. The default, TRUE, corresponds to the original algorithm. On the other hand, the swap phase is much more computationally intensive than the build phase for large n, so it can be skipped with do.swap = FALSE.
keep.diss, keep.data: logicals indicating if the dissimilarities and/or input data x should be kept in the result. Setting these to FALSE can give much smaller results and hence even save memory allocation time.
trace.lev: integer specifying a trace level for printing diagnostics during the build and swap phase of the algorithm. Default 0 does not print anything; higher values print increasingly more.
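The metric and stand arguments can be illustrated with a small sketch. It assumes the standardization is exactly the one described above (subtract the column mean, divide by the column's mean absolute deviation) and rebuilds the same dissimilarities by hand with dist. The helper standardize and the simulated data are illustrative, not part of the package.

```r
library(cluster)

set.seed(1)
## two well-separated groups of 10 and 15 points, as in the examples section
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))

## illustrative helper: centre a column and scale it by its mean absolute deviation
standardize <- function(v) {
  centered <- v - mean(v)
  centered / mean(abs(centered))
}

## let pam() standardize and compute Manhattan dissimilarities itself
fit_args <- pam(x, 2, metric = "manhattan", stand = TRUE)

## do the same two steps by hand and pass the dissimilarities in
xs <- apply(x, 2, standardize)
d_manual <- dist(xs, method = "manhattan")
fit_diss <- pam(d_manual, 2, diss = TRUE)

## both routes should give the same partition on data like these
same_partition <- all(unname(fit_args$clustering) == unname(fit_diss$clustering))
```

When x is already a dissimilarity object, as in the second call, metric and stand are ignored, so any standardization has to happen before the dissimilarities are computed.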
pam is fully described in chapter 2 of Kaufman and Rousseeuw (1990). Compared to the k-means approach in kmeans, the function pam has the following features: (a) it also accepts a dissimilarity matrix; (b) it is more robust because it minimizes a sum of dissimilarities instead of a sum of squared Euclidean distances; (c) it provides a novel graphical display, the silhouette plot (see plot.partition); (d) it allows the number of clusters to be selected using mean(silhouette(pr)[, "sil_width"]) on the result pr <- pam(..), or directly its component pr$silinfo$avg.width; see also pam.object.
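Feature (d) can be sketched as a loop over candidate values of k that keeps the one with the largest average silhouette width. The candidate range 2:6 and the simulated data are arbitrary choices for illustration.

```r
library(cluster)

set.seed(42)
## two well-separated groups, so k = 2 is the natural choice here
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))

candidate_k <- 2:6

## average silhouette width of the pam() solution for each candidate k
avg_width <- sapply(candidate_k, function(k) pam(x, k)$silinfo$avg.width)

## pick the k with the widest average silhouette
best_k <- candidate_k[which.max(avg_width)]

## the same quantity computed from the silhouette object of one fit
pr <- pam(x, best_k)
width_from_silhouette <- mean(silhouette(pr)[, "sil_width"])
```

The average silhouette width is a heuristic; on real data it is usually worth inspecting the silhouette plot as well as the single summary number.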
When cluster.only is true, the result is simply a (possibly named) integer vector specifying the clustering, i.e., pam(x,k, cluster.only=TRUE) is the same as pam(x,k)$clustering but computed more efficiently.
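A minimal check of this equivalence, on simulated data chosen only for illustration:

```r
library(cluster)

set.seed(7)
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))

## full "pam" object, from which the clustering vector is extracted
full_fit <- pam(x, 2)

## only the cluster labels, skipping the rest of the result
labels_only <- pam(x, 2, cluster.only = TRUE)

same_labels <- all(full_fit$clustering == labels_only)
```

This form is convenient when pam is called many times, for example inside a resampling loop, and only the labels are needed.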
The pam-algorithm is based on the search for k representative objects or medoids among the observations of the dataset. These observations should represent the structure of the data. After finding a set of k medoids, k clusters are constructed by assigning each observation to the nearest medoid. The goal is to find k representative objects which minimize the sum of the dissimilarities of the observations to their closest representative object.
By default, when medoids are not specified, the algorithm first looks for a good initial set of medoids (this is called the build phase). Then it finds a local minimum for the objective function, that is, a solution such that there is no single switch of an observation with a medoid that will decrease the objective (this is called the swap phase).
When the medoids are specified, their order does not matter; in general, the algorithms have been designed to not depend on the order of the observations.
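The objective and the local-minimum property can be checked by brute force on a small dataset. This is a sketch, not the package's implementation: total_cost is an illustrative helper that sums each observation's dissimilarity to its nearest medoid, and the loop tries every single medoid/non-medoid switch.

```r
library(cluster)

set.seed(3)
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))
n <- nrow(x)
d <- as.matrix(dist(x))   # full n x n Euclidean dissimilarity matrix

## illustrative helper: sum of dissimilarities to the closest medoid
total_cost <- function(medoid_ids) {
  sum(apply(d[, medoid_ids, drop = FALSE], 1, min))
}

fit <- pam(x, 2)
fitted_cost <- total_cost(fit$id.med)

## try every single switch of a medoid with a non-medoid observation
swap_costs <- numeric(0)
for (i in seq_along(fit$id.med)) {
  for (h in setdiff(seq_len(n), fit$id.med)) {
    trial <- fit$id.med
    trial[i] <- h
    swap_costs <- c(swap_costs, total_cost(trial))
  }
}

## no switch should lower the cost (small tolerance for rounding)
no_improving_swap <- all(swap_costs >= fitted_cost - 1e-8)

## the order of user-supplied initial medoids should not matter
fit_a <- pam(x, 2, medoids = c(1, 16))
fit_b <- pam(x, 2, medoids = c(16, 1))
same_medoids <- identical(sort(fit_a$id.med), sort(fit_b$id.med))
```

The brute-force check costs on the order of k * (n - k) evaluations of the objective, which is fine for 25 points but is meant only to make the definition concrete.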
an object of class "pam" representing the clustering. See ?pam.object for details.
For large datasets, pam may need too much memory or too much computation time since both are O(n^2). Then, clara() is preferable, see its documentation.
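As a rough sketch of that alternative, clara applies the same medoid idea to subsamples, so it never stores all n*(n-1)/2 dissimilarities. The dataset size and samples = 20 below are arbitrary illustrative choices.

```r
library(cluster)

set.seed(11)
## 2500 observations in two well-separated groups
big <- rbind(cbind(rnorm(1000, 0, 0.5), rnorm(1000, 0, 0.5)),
             cbind(rnorm(1500, 5, 0.5), rnorm(1500, 5, 0.5)))

## cluster using 20 random subsamples instead of the full dissimilarity matrix
fit_clara <- clara(big, 2, samples = 20)

## cluster sizes found by clara
cluster_sizes <- sort(as.vector(table(fit_clara$clustering)))
```

Because clara works on subsamples, its medoids are approximate; more samples or a larger sampsize trade computation time for a better fit.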
----------See Also----------
agnes for background and references; pam.object, clara, daisy, partition.object, plot.partition, dist.
library(cluster)
## generate 25 objects, divided into 2 clusters.
x <- rbind(cbind(rnorm(10,0,0.5), rnorm(10,0,0.5)),
           cbind(rnorm(15,5,0.5), rnorm(15,5,0.5)))
pamx <- pam(x, 2)
## use obs. 1 & 16 as starting medoids -- same result (typically)
(p2m <- pam(x, 2, medoids = c(1,16)))
## trace.lev = 2 prints diagnostics from the build and swap phases
p3m <- pam(x, 3, trace.lev = 2)
## rather stupid initial medoids:
(p3m. <- pam(x, 3, medoids = 3:1, trace.lev = 1))
## cluster a precomputed Manhattan dissimilarity object
pam(daisy(x, metric = "manhattan"), 2, diss = TRUE)
## Plot similar to Figure 4 in Struyf et al (1996)
## Not run: plot(pam(ruspini, 4), ask = TRUE)