R vegan package: help documentation for the cascadeKM() function (Chinese/English parallel text)

loveR · posted on 2012-10-1 15:04:22

cascadeKM(vegan)
cascadeKM() belongs to R package: vegan

K-means partitioning using a range of values of K

Translator: LoveR, the robot of the Biostatistics Home website (biostatistic.net)

----------Description----------

This function is a wrapper for the kmeans function. It creates several partitions that form a cascade, from a small to a large number of groups.
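To make the "wrapper" idea concrete, here is a minimal base-R sketch of a cascade, assuming only stats::kmeans. The helper cascade_kmeans() and the list it returns are hypothetical illustrations, not vegan's code: for each K from inf.gr to sup.gr it runs kmeans() with iter random starts (nstart), stores the group assignments as one column of a partition table, and records the Calinski-Harabasz value.

```r
# Hypothetical sketch of a K-means cascade (not vegan's implementation).
cascade_kmeans <- function(data, inf.gr, sup.gr, iter = 100) {
  stopifnot(inf.gr >= 2, sup.gr >= inf.gr)  # the index needs K >= 2
  data <- as.matrix(data)
  ks <- inf.gr:sup.gr
  n <- nrow(data)
  # one column of group labels per value of K
  partition <- matrix(NA_integer_, n, length(ks),
                      dimnames = list(rownames(data), paste(ks, "groups")))
  calinski <- numeric(length(ks))
  for (j in seq_along(ks)) {
    k <- ks[j]
    # 'iter' random starting configurations; kmeans keeps the best one
    fit <- kmeans(data, centers = k, nstart = iter)
    partition[, j] <- fit$cluster
    # Calinski-Harabasz: (SSB/(K-1)) / (SSW/(n-K))
    calinski[j] <- (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
  }
  names(calinski) <- colnames(partition)
  list(partition = partition, results = calinski,
       best = ks[which.max(calinski)])
}
```

On two well-separated clouds of points, cascade_kmeans(x, 2, 5, iter = 10) should typically report best = 2; vegan's own cascadeKM() additionally offers "ssi" and a plot method.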

----------Usage----------

cascadeKM(data, inf.gr, sup.gr, iter = 100, criterion = "calinski")

cIndexKM(y, x, index = "all")

## S3 method for class 'cascadeKM'
plot(x, min.g, max.g, grpmts.plot = TRUE,
 sortg = FALSE, gridcol = NA, ...)

----------Arguments----------

Argument: data
The data matrix. The objects (samples) are the rows.

Argument: inf.gr
The number of groups in the partition with the smallest number of groups in the cascade (the minimum K).

Argument: sup.gr
The number of groups in the partition with the largest number of groups in the cascade (the maximum K).

Argument: iter
The number of random starting configurations for each value of K.

Argument: criterion
The criterion used to select the best partition. The default, "calinski", refers to the Calinski-Harabasz (1974) criterion. The simple structure index ("ssi") is also available. Other indices are available in function clustIndex (package cclust). In our experience, the two indices that work best, and that are most likely to reach their maximum value at or near the optimal number of clusters, are "calinski" and "ssi".

Argument: y
Object of class "kmeans" returned by a clustering algorithm such as kmeans.

Argument: x
Data matrix in which columns correspond to variables and rows to observations; in plot, the object to be plotted.

Argument: index
The available indices are "calinski" and "ssi". Use "all" to obtain both. Abbreviations of these names are also accepted.
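Abbreviated names such as "cal" are typically resolved in R by partial matching. The hypothetical resolve_index() below is a sketch using base R's match.arg(); vegan's cIndexKM() may implement the matching differently (for instance with pmatch()).

```r
# Hypothetical helper: resolve an abbreviated index name by partial matching.
# With no argument, match.arg() returns the first choice ("all").
resolve_index <- function(index = c("all", "calinski", "ssi")) {
  match.arg(index)
}

resolve_index("cal")  # "calinski"
resolve_index("s")    # "ssi"
```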

Argument: min.g, max.g
The minimum and maximum numbers of groups to be displayed.

Argument: grpmts.plot
Whether to show the plot (TRUE or FALSE).

Argument: sortg
Sort the objects by their group membership to produce a graph that is easier to interpret; see Details. The original object names are kept: they are used as labels in the output table x, but not in the graph. If the data have no row names, sequential row numbers are used to keep track of the original order of the objects.

Argument: gridcol
The colour of the grid lines in the plots. NA, the default, removes the grid lines.

Argument: ...
Other arguments to the functions (ignored).

----------Details----------

The function uses kmeans to create several partitions that form a cascade from a small to a large number of groups. Most of the work is performed by function cIndexKM, which is based on the clustIndex function (package cclust). Some of the criteria available in clustIndex were removed from this version because they produced computation errors when a group contained only one object.

The default criterion is "calinski", the well-known Calinski-Harabasz (1974) criterion. The other available index is the simple structure index "ssi" (Dolnicar et al. 1999). When the groups are of equal sizes, "calinski" is generally a good criterion for indicating the correct number of groups; when they are not, users should not take its indications literally. Use "all" to obtain both indices. The indices are defined as follows:

calinski: (SSB/(K-1))/(SSW/(n-K)), where n is the number of data points and K is the number of clusters. SSW is the sum of squares within the clusters, and SSB is the sum of squares among the clusters. This index is simply an F (ANOVA) statistic.

ssi: the "Simple Structure Index" multiplicatively combines several elements that influence the interpretability of a partitioning solution. The best partition is indicated by the highest SSI value.
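The "calinski" formula can be checked numerically. This base-R sketch computes the index two ways: from the betweenss and tot.withinss components of a kmeans() fit, and from scratch with SSB = total sum of squares minus SSW. Both helper names are hypothetical; in vegan, cIndexKM() is the function that returns this value.

```r
# Hypothetical helpers: Calinski-Harabasz index (SSB/(K-1)) / (SSW/(n-K)).
ch_from_fit <- function(fit) {
  n <- sum(fit$size)     # number of data points
  k <- length(fit$size)  # number of clusters
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

ch_from_scratch <- function(x, cluster) {
  x <- as.matrix(x)
  n <- nrow(x)
  k <- length(unique(cluster))
  ssw <- 0
  for (g in unique(cluster)) {
    xg <- x[cluster == g, , drop = FALSE]
    # squared deviations from the group centroid
    ssw <- ssw + sum(sweep(xg, 2, colMeans(xg))^2)
  }
  sst <- sum(sweep(x, 2, colMeans(x))^2)  # total sum of squares
  ssb <- sst - ssw                        # sum of squares among clusters
  (ssb / (k - 1)) / (ssw / (n - k))
}
```

Because the index is an F ratio, larger values mean groups that are tight relative to their separation, which is why the maximum is read as the best partition.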

In a simulation study, Milligan and Cooper (1985) found that the Calinski-Harabasz criterion recovered the correct number of groups most often. We recommend this criterion because, if the groups are of equal sizes, the maximum value of "calinski" usually indicates the correct number of groups. Another available index is the simple structure index "ssi". When the groups are not equal in size, users should not take the indications of these indices literally; they should also explore the groups corresponding to other values of K.

Function cascadeKM has a plot method, which produces two plots. The graph on the left shows the objects on the abscissa and the number of groups on the ordinate; the groups are represented by colours. The graph on the right shows the values of the criterion ("calinski" or "ssi") used to determine the best partition. The highest value of the criterion is marked in red. Points marked in orange, if any, indicate partitions whose criterion value increases as the number of groups increases; they may represent other interesting partitions.
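One reading of the colour rules for the right-hand graph is sketched below. It is an assumption based on the description, not vegan's plotting code: red marks the maximum of the criterion, and orange marks each other partition whose criterion value is higher than at the preceding K. The helper mark_partitions() is hypothetical.

```r
# Hypothetical helper: which partitions would be marked red or orange.
# 'crit' is a named vector of criterion values ordered by increasing K.
mark_partitions <- function(crit) {
  red <- which.max(crit)
  # positions where the criterion rose relative to the previous K
  up <- which(diff(crit) > 0) + 1
  orange <- setdiff(up, red)
  list(red = names(crit)[red], orange = names(crit)[orange])
}

crit <- c("2 groups" = 10, "3 groups" = 8, "4 groups" = 9, "5 groups" = 7)
mark_partitions(crit)  # red: "2 groups"; orange: "4 groups"
```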

If sortg = TRUE, the objects are reordered by the following procedure: (1) a simple matching distance matrix is computed among the objects, based on the table of K-means group assignments from K = min.g to K = max.g; (2) a principal coordinate analysis (PCoA, Gower 1966) is computed on the centred distance matrix; (3) the first principal coordinate is used as the new order of the objects in the graph. A simplified algorithm computes only the first principal coordinate, using the iterative algorithm described in Legendre & Legendre (1998, Table 9.10). The full distance matrix among objects is never computed, which avoids having to store it when the number of objects is large; distance values are computed as the algorithm needs them.
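The sortg steps can be sketched as follows, assuming a partition table whose rows are objects and whose columns hold group labels for K = min.g to K = max.g. Unlike vegan, this hypothetical order_by_pcoa1() builds the full simple matching distance matrix and uses stats::cmdscale() for the PCoA, so it suits only small numbers of objects; it illustrates the three steps, not the memory-saving iterative algorithm.

```r
# Hypothetical helper: order objects by the first principal coordinate
# of a simple matching distance on their group assignments.
order_by_pcoa1 <- function(partition) {
  partition <- as.matrix(partition)
  n <- nrow(partition)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # share of partitions (columns) in which objects i and j
      # fall in different groups
      d[i, j] <- mean(partition[i, ] != partition[j, ])
    }
  }
  # classical scaling (PCoA) on the distances; keep the first axis only
  pc1 <- cmdscale(as.dist(d), k = 1)[, 1]
  order(pc1)
}
```

Objects that always share a group get identical coordinates, so they end up adjacent in the reordered graph.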

----------Value----------

Function cascadeKM returns an object of class cascadeKM with the following items:

Item: partition
Table of the partitions found for different numbers of groups K, from K = inf.gr to K = sup.gr.

Item: results
Values of the criterion used to select the best partition.

Item: criterion
The name of the criterion used.

Item: size
The number of objects found in each group, for all partitions (columns).
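A size table of this kind can be derived from a partition table of integer group labels. The sketch below, a hypothetical group_sizes() helper in base R, counts the objects per group in each column with tabulate(); groups absent from a partition get a count of 0, whereas vegan's own size table may lay this out differently.

```r
# Hypothetical helper: objects per group for every partition (column).
group_sizes <- function(partition) {
  partition <- as.matrix(partition)
  kmax <- max(partition)
  # one column of counts per partition, rows = groups 1..kmax
  sizes <- apply(partition, 2, tabulate, nbins = kmax)
  rownames(sizes) <- paste("Group", seq_len(kmax))
  sizes
}

tab <- cbind("2 groups" = c(1, 1, 2, 2, 2), "3 groups" = c(1, 3, 2, 2, 3))
group_sizes(tab)
```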

Function cIndexKM returns a vector of index values. The maximum value of these indices is supposed to indicate the best partition. These indices work best with groups of equal sizes; when the groups are not of equal sizes, one should not put too much faith in the maximum of these indices, and should also explore the groups corresponding to other values of K.

----------Author(s)----------

Marie-Helene Ouellette (Marie-Helene.Ouellette@UMontreal.ca), Sebastien Durand (Sebastien.Durand@UMontreal.ca) and Pierre Legendre (Pierre.Legendre@UMontreal.ca). Edited for vegan by Jari Oksanen.

----------References----------

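Calinski, T. and J. Harabasz. 1974. A dendrite method for cluster analysis. Commun. Stat. 3: 1-27.
Dolnicar, S., K. Grabler and J. A. Mazanec. 1999. A tale of three cities: perceptual charting for analyzing destination images. Pp. 39-62 in: Woodside, A. et al. [eds.] Consumer psychology of tourism, hospitality and leisure. CAB International, New York.
Gower, J. C. 1966. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53: 325-338.
Legendre, P. and L. Legendre. 1998. Numerical ecology, 2nd English edition. Elsevier Science BV, Amsterdam.
Milligan, G. W. and M. C. Cooper. 1985. An examination of procedures for determining the number of clusters in a data set. Psychometrika 50: 159-179.
… Of Indexes For Determining The Number Of Clusters In Binary Data Sets. http://www.wu-wien.ac.at/am/wp99.htm#29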

----------See Also----------

kmeans, clustIndex.

----------Examples----------

library(vegan)
set.seed(1)  # make the random examples reproducible

# Partitioning a (10 x 10) data matrix of random numbers
mat <- matrix(runif(100), 10, 10)
res <- cascadeKM(mat, 2, 5, iter = 25, criterion = 'calinski')
toto <- plot(res)

# Partitioning an autocorrelated time series
vec <- sort(matrix(runif(30), 30, 1))
res <- cascadeKM(vec, 2, 5, iter = 25, criterion = 'calinski')
toto <- plot(res)

# Partitioning a large autocorrelated time series
# Note that we remove the grid lines (gridcol = NA)
vec <- sort(matrix(runif(1000), 1000, 1))
res <- cascadeKM(vec, 2, 7, iter = 10, criterion = 'calinski')
toto <- plot(res, gridcol = NA)

When reposting, please cite the source: biostatistic.net (http://www.biostatistic.net).

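Notes:
Note 1: For learners' convenience, this document was translated by LoveR, the robot of biostatistic.net. It is provided for personal R-learning reference only; biostatistic.net retains the copyright.
Note 2: Because this is an automatic machine translation, some inaccuracies are unavoidable. When using it, compare the Chinese and English content carefully; working through both repeatedly can help with learning R.
Note 3: If you find an inaccuracy, please reply below this post, and we will revise it over time.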
