R: help documentation for the kmeans() function

loveR · posted 2012-2-16 20:25:18

kmeans(stats)
R package containing kmeans(): stats

                                       K-Means Clustering

                                       Translator: LoveR, the translation robot of biostatistic.net (生物统计家园网)

----------Description----------

Perform k-means clustering on a data matrix.

----------Usage----------

kmeans(x, centers, iter.max = 10, nstart = 1,
   algorithm = c("Hartigan-Wong", "Lloyd", "Forgy",
                  "MacQueen"))

----------Arguments----------

Argument: x
numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).

Argument: centers
either the number of clusters, say k, or a set of initial (distinct) cluster centres given as a matrix with one centre per row. If a number, a random set of (distinct) rows in x is chosen as the initial centres.

Argument: iter.max
the maximum number of iterations allowed.

Argument: nstart
if centers is a number, how many random sets of initial centres should be chosen? The run with the smallest total within-cluster sum of squares is returned.

Argument: algorithm
character string naming the algorithm: one of "Hartigan-Wong" (the default), "Lloyd", "Forgy" or "MacQueen"; may be abbreviated.
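A minimal sketch of these arguments in use. It assumes the built-in `iris` data set, whose four numeric columns serve as a data frame `x`; the choices of 3 clusters, 20 iterations, 5 starts and the abbreviation `"Mac"` (for `"MacQueen"`) are illustrative only.

```r
# x: a data frame whose columns are all numeric is coerced to a matrix
x <- iris[, 1:4]

set.seed(1)  # random starting centres are drawn from the rows of x
fit <- kmeans(x,
              centers   = 3,      # a number: k = 3 clusters, random starts
              iter.max  = 20,     # allow up to 20 iterations per start
              nstart    = 5,      # try 5 random sets of starting centres
              algorithm = "Mac")  # partial match for "MacQueen"

fit$size           # number of points in each cluster
dim(fit$centers)   # 3 centres x 4 variables
colnames(fit$centers)
```

The column names of `fit$centers` are taken from the columns of `x`.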

----------Details----------

The data given by x are clustered by the k-means method, which aims to partition the points into k groups such that the sum of squares from points to their assigned cluster centres is minimized. At the minimum, every cluster centre is the mean of its Voronoi set (the set of data points that are nearest to that centre).
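The sketch below checks these statements on simulated data. The data-generating step mirrors the Examples section and the seed is arbitrary. It verifies three things: each returned centre is the mean of its cluster, `withinss` is the per-cluster sum of squared distances to those centres, and every point lies in the Voronoi set of its own centre.

```r
set.seed(123)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
fit <- kmeans(x, centers = 2)   # default Hartigan-Wong algorithm
k <- nrow(fit$centers)

# Cluster means computed directly: rowsum() adds up the rows of each cluster
# (ordered by cluster number 1..k), then divide by the cluster sizes
means <- rowsum(x, fit$cluster) / fit$size

# Within-cluster sum of squares: squared distances to the assigned centre
wss <- sapply(seq_len(k), function(j) {
  pts <- x[fit$cluster == j, , drop = FALSE]
  sum(sweep(pts, 2, fit$centers[j, ])^2)
})

# Squared distance from every point to every centre (n x k matrix);
# the nearest centre defines the point's Voronoi set
d2 <- sapply(seq_len(k), function(j) colSums((t(x) - fit$centers[j, ])^2))
nearest <- max.col(-d2)

all.equal(unname(fit$centers), unname(means))  # centres are cluster means
all.equal(fit$withinss, wss)                   # withinss as defined above
all(nearest == fit$cluster)                    # each point nearest its own centre
```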

The algorithm of Hartigan and Wong (1979) is used by default. Note that some authors use k-means to refer to a specific algorithm rather than the general method: most commonly the algorithm given by MacQueen (1967), but sometimes that given by Lloyd (1957) and Forgy (1965). The Hartigan–Wong algorithm generally does a better job than either of those, but trying several random starts (nstart > 1) is often recommended. For ease of programmatic exploration, k = 1 is allowed, notably returning the center and withinss.
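A minimal sketch of the two practical points in this paragraph, using simulated data like that of the Examples section. The seeds and k = 5 are arbitrary choices. Several random starts usually reach a lower total within-cluster sum of squares than a single start. With k = 1, the only centre is the column mean of `x`, so `withinss` equals the total sum of squares.

```r
set.seed(42)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

# One random start versus 25; kmeans() keeps the start with the smallest
# total within-cluster sum of squares
set.seed(7); single <- kmeans(x, centers = 5, nstart = 1)
set.seed(7); multi  <- kmeans(x, centers = 5, nstart = 25)
c(single = single$tot.withinss, multi = multi$tot.withinss)

# k = 1 is allowed: a single centre at the column means of x
one <- kmeans(x, centers = 1)
one$centers
c(withinss = one$withinss, totss = one$totss)   # equal up to rounding
```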

Except for the Lloyd–Forgy method, k clusters will always be returned if a number is specified. If an initial matrix of centres is supplied, it is possible that no point will be closest to one or more centres; this is currently an error for the Hartigan–Wong method.
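The sketch below supplies the initial centres as a matrix with one row per centre, so k is the number of rows. The starting points (0, 0) and (1, 1) and the far-away (100, 100) are hypothetical values chosen to suit the simulated data. No point is closest to (100, 100), and the default Hartigan–Wong method reports this as an error.

```r
set.seed(2)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

# Two sensible starting centres, one near each group of points
init <- rbind(c(0, 0), c(1, 1))
fit <- kmeans(x, centers = init)
fit$size                     # k = nrow(init) = 2 clusters returned

# A third centre far from all the data: no point is closest to it
bad <- rbind(c(0, 0), c(1, 1), c(100, 100))
res <- tryCatch(kmeans(x, centers = bad), error = function(e) e)
inherits(res, "error")       # TRUE with the default Hartigan-Wong method
conditionMessage(res)
```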

----------Value----------

An object of class "kmeans", which has a print method and is a list with components:

Component: cluster
A vector of integers (from 1:k) indicating the cluster to which each point is allocated.

Component: centers
A matrix of cluster centres, one row per cluster.

Component: withinss
Vector of within-cluster sums of squares, one component per cluster.

Component: totss
The total sum of squares of the data about their overall (column) means.

Component: tot.withinss
Total within-cluster sum of squares, i.e. sum(withinss).

Component: betweenss
The between-cluster sum of squares, i.e. totss - tot.withinss.

Component: size
The number of points in each cluster.
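A short sketch relating these components, using simulated data with an arbitrary seed and k = 3. `withinss` sums to `tot.withinss`. `betweenss` is the size-weighted squared distance of the centres from the overall mean. `totss` splits into `tot.withinss + betweenss`, and `size` sums to the number of rows of `x`. The ratio `betweenss / totss` is the share the print method reports as `between_SS / total_SS`.

```r
set.seed(10)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
fit <- kmeans(x, centers = 3, nstart = 10)

grand <- colMeans(x)                          # overall mean of the data
totss <- sum(sweep(x, 2, grand)^2)            # total sum of squares
# between-cluster SS: squared distance of each centre from the overall mean,
# weighted by the number of points in that cluster
bss <- sum(fit$size * colSums((t(fit$centers) - grand)^2))

all.equal(fit$totss, totss)
all.equal(fit$tot.withinss, sum(fit$withinss))
all.equal(fit$betweenss, bss)
all.equal(fit$totss, fit$tot.withinss + fit$betweenss)
sum(fit$size) == nrow(x)
fit$betweenss / fit$totss                     # share of variation between clusters
```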

----------References----------

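Forgy, E. W. (1965). Cluster analysis of multivariate data: efficiency vs interpretability of classifications. Biometrics 21, 768–769.
Hartigan, J. A. and Wong, M. A. (1979). A K-means clustering algorithm. Applied Statistics 28, 100–108.
Lloyd, S. P. (1957, 1982). Least squares quantization in PCM. Technical Note, Bell Laboratories. Published in 1982 in IEEE Transactions on Information Theory 28, 128–137.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, eds L. M. Le Cam & J. Neyman, 1, pp. 281–297. Berkeley, CA: University of California Press.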

----------Examples----------

require(graphics)

# a 2-dimensional example: two groups of points around (0, 0) and (1, 1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
(cl <- kmeans(x, 2))
plot(x, col = cl$cluster)                         # colour each point by its cluster
points(cl$centers, col = 1:2, pch = 8, cex = 2)   # mark the two cluster centres

kmeans(x, 1)$withinss # k = 1: the total sum of squares, if you are interested in that

## random starts do help here with too many clusters
(cl <- kmeans(x, 5, nstart = 25))
plot(x, col = cl$cluster)
points(cl$centers, col = 1:5, pch = 8)

When reposting, please credit the source: biostatistic.net (http://www.biostatistic.net).

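Notes:
Note 1: To make learning easier, this document was translated by LoveR, the translation robot of biostatistic.net (生物统计家园网). It is provided only as a personal reference for learning R; biostatistic.net retains the copyright.
Note 2: Because this is an automatic machine translation, some inaccuracies are unavoidable. When using it, check it carefully and repeatedly against the original English documentation; doing so can also help you learn R.
Note 3: If you find an inaccuracy, please post a reply at the end of this thread and we will revise it over time.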
