
R: Chinese help documentation for the hclust() function (Chinese-English bilingual)

Posted on 2012-2-16 19:38:18
hclust(stats)
hclust() belongs to R package stats

                                        Hierarchical Clustering

                                         Translator: biostatistic.net robot LoveR

Description

Hierarchical cluster analysis on a set of dissimilarities and methods for analyzing it.


Usage


hclust(d, method = "complete", members=NULL)

## S3 method for class 'hclust'
plot(x, labels = NULL, hang = 0.1,
     axes = TRUE, frame.plot = FALSE, ann = TRUE,
     main = "Cluster Dendrogram",
     sub = NULL, xlab = NULL, ylab = "Height", ...)

plclust(tree, hang = 0.1, unit = FALSE, level = FALSE, hmin = 0,
        square = TRUE, labels = NULL, plot. = TRUE,
        axes = TRUE, frame.plot = FALSE, ann = TRUE,
        main = "", sub = NULL, xlab = NULL, ylab = "Height")



Arguments

d: a dissimilarity structure as produced by dist.


method: the agglomeration method to be used. This should be (an unambiguous abbreviation of) one of "ward", "single", "complete", "average", "mcquitty", "median" or "centroid".


members: NULL or a vector with length size of d. See the "Details" section.


x, tree: an object of the type produced by hclust.


hang: the fraction of the plot height by which labels should hang below the rest of the plot. A negative value will cause the labels to hang down from 0.


labels: a character vector of labels for the leaves of the tree. By default the row names or row numbers of the original data are used. If labels = FALSE no labels at all are plotted.


axes, frame.plot, ann: logical flags as in plot.default.


main, sub, xlab, ylab: character strings for title. sub and xlab have a non-NULL default when there's a tree$call.


...: further graphical arguments.


unit: logical. If true, the splits are plotted at equally-spaced heights rather than at the height in the object.


hmin: numeric. All heights less than hmin are regarded as being hmin: this can be used to suppress detail at the bottom of the tree.


level, square, plot.: as yet unimplemented arguments of plclust for S-PLUS compatibility.


Details

This function performs a hierarchical cluster analysis using a set of dissimilarities for the n objects being clustered.  Initially, each object is assigned to its own cluster and then the algorithm proceeds iteratively, at each stage joining the two most similar clusters, continuing until there is just a single cluster. At each stage distances between clusters are recomputed by the Lance–Williams dissimilarity update formula according to the particular clustering method being used.

A number of different clustering methods are provided.  Ward's minimum variance method aims at finding compact, spherical clusters. The complete linkage method finds similar clusters. The single linkage method (which is closely related to the minimal spanning tree) adopts a "friends of friends" clustering strategy.  The other methods can be regarded as aiming for clusters with characteristics somewhere between the single and complete link methods.  Note however, that methods "median" and "centroid" are not leading to a monotone distance measure, or equivalently the resulting dendrograms can have so called inversions (which are hard to interpret).
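To make the monotonicity point concrete, here is a minimal sketch (illustrative, not part of the original help page; it assumes the built-in USArrests data set also used in the Examples below). Both "single" and "complete" are monotone methods, so their merge heights never decrease, whereas "median" and "centroid" carry no such guarantee:

```r
## Illustrative sketch using the built-in USArrests data set.
d <- dist(USArrests)                          # Euclidean dissimilarities
hc_single   <- hclust(d, method = "single")   # "friends of friends"
hc_complete <- hclust(d, method = "complete") # compact clusters
## Both methods are monotone, so the n-1 merge heights never decrease:
stopifnot(!is.unsorted(hc_single$height),
          !is.unsorted(hc_complete$height))
```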

If members!=NULL, then d is taken to be a dissimilarity matrix between clusters instead of dissimilarities between singletons and members gives the number of observations per cluster.  This way the hierarchical cluster algorithm can be "started in the middle of the dendrogram", e.g., in order to reconstruct the part of the tree above a cut (see examples). Dissimilarities between clusters can be efficiently computed (i.e., without hclust itself) only for a limited number of distance/linkage combinations, the simplest one being squared Euclidean distance and centroid linkage.  In this case the dissimilarities between the clusters are the squared Euclidean distances between cluster means.

In hierarchical cluster displays, a decision is needed at each merge to specify which subtree should go on the left and which on the right. Since, for n observations there are n-1 merges, there are 2^{(n-1)} possible orderings for the leaves in a cluster tree, or dendrogram. The algorithm used in hclust is to order the subtree so that the tighter cluster is on the left (the last, i.e., most recent, merge of the left subtree is at a lower value than the last merge of the right subtree). Single observations are the tightest clusters possible, and merges involving two observations place them in order by their observation sequence number.
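A small illustrative sketch of this (again assuming USArrests): the order component described under Value holds the resulting left-to-right leaf permutation.

```r
## Illustrative: hc$order is the left-to-right leaf ordering chosen above.
hc <- hclust(dist(USArrests))
n  <- nrow(USArrests)                         # 50 observations, n - 1 merges
stopifnot(length(hc$order) == n)              # one slot per leaf
stopifnot(all(sort(hc$order) == seq_len(n)))  # a permutation of 1..n
```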


Value

An object of class hclust which describes the tree produced by the clustering process. The object is a list with components:


merge: an n-1 by 2 matrix. Row i of merge describes the merging of clusters at step i of the clustering. If an element j in the row is negative, then observation -j was merged at this stage. If j is positive then the merge was with the cluster formed at the (earlier) stage j of the algorithm. Thus negative entries in merge indicate agglomerations of singletons, and positive entries indicate agglomerations of non-singletons.
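A hedged sketch of decoding merge (assuming USArrests): step 1 can only join two singletons, so row 1 must contain two negative entries, and its height is the smallest pairwise dissimilarity.

```r
## Illustrative: decoding the first row of the merge matrix.
d  <- dist(USArrests)
hc <- hclust(d)
stopifnot(all(hc$merge[1, ] < 0))  # two singletons: both entries negative
## The first merge joins the overall closest pair of observations:
stopifnot(isTRUE(all.equal(hc$height[1], min(d))))
```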


height: a set of n-1 real values (non-decreasing for ultrametric trees). The clustering height: that is, the value of the criterion associated with the clustering method for the particular agglomeration.


order: a vector giving the permutation of the original observations suitable for plotting, in the sense that a cluster plot using this ordering and matrix merge will not have crossings of the branches.


labels: labels for each of the objects being clustered.


call: the call which produced the result.


method: the cluster method that has been used.


dist.method: the distance that has been used to create d (only returned if the distance object has a "method" attribute).

There are print, plot and identify (see identify.hclust) methods and the rect.hclust() function for hclust objects. The plclust() function is basically the same as the plot method, plot.hclust, primarily for back compatibility with S-PLUS. Its extra arguments are not yet implemented.
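As an illustrative sketch of these helpers (assuming USArrests; not part of the original page), cutree splits the tree into k groups and rect.hclust outlines them on a plotted dendrogram:

```r
## Illustrative: cut the tree into 4 groups and outline them on the plot.
hc <- hclust(dist(USArrests))
plot(hc)                                # dendrogram must be plotted first
rect.hclust(hc, k = 4, border = "red")  # boxes around the 4 clusters
groups <- cutree(hc, k = 4)
stopifnot(length(groups) == nrow(USArrests),
          length(unique(groups)) == 4)
```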


Author(s)

The hclust function is based on Fortran code contributed to STATLIB by F. Murtagh.

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole. (S version.)
Everitt, B. (1974) Cluster Analysis. London: Heinemann Educ. Books.
Hartigan, J. A. (1975) Clustering Algorithms. New York: Wiley.
Sneath, P. H. A. and Sokal, R. R. (1973) Numerical Taxonomy. San Francisco: Freeman.
Anderberg, M. R. (1973) Cluster Analysis for Applications. Academic Press: New York.
Gordon, A. D. (1999) Classification. Second Edition. London: Chapman and Hall / CRC.
Murtagh, F. (1985) "Multidimensional Clustering Algorithms", in COMPSTAT Lectures 4. Wuerzburg: Physica-Verlag (for algorithmic details of algorithms used).
McQuitty, L. L. (1966) Similarity Analysis by Reciprocal Pairs for Discrete and Continuous Data. Educational and Psychological Measurement, 26, 825–831.

See Also

identify.hclust, rect.hclust, cutree, dendrogram, kmeans.

For the Lance–Williams formula and methods that apply it generally, see agnes from package cluster.


Examples


require(graphics)

hc <- hclust(dist(USArrests), "ave")
plot(hc)
plot(hc, hang = -1)

## Do the same with centroid clustering and squared Euclidean distance,
## cut the tree into ten clusters and reconstruct the upper part of the
## tree from the cluster centers.
hc <- hclust(dist(USArrests)^2, "cen")
memb <- cutree(hc, k = 10)
cent <- NULL
for(k in 1:10){
  cent <- rbind(cent, colMeans(USArrests[memb == k, , drop = FALSE]))
}
hc1 <- hclust(dist(cent)^2, method = "cen", members = table(memb))
opar <- par(mfrow = c(1, 2))
plot(hc,  labels = FALSE, hang = -1, main = "Original Tree")
plot(hc1, labels = FALSE, hang = -1, main = "Re-start from 10 clusters")
par(opar)

Please credit biostatistic.net (http://www.biostatistic.net) when reposting.


Notes:
Note 1: This document was machine-translated by biostatistic.net's robot LoveR as a study aid; it is for personal R-language learning reference only, and biostatistic.net retains the copyright.
Note 2: As a machine translation, inaccuracies are inevitable; comparing the Chinese and English text carefully when using it can also help with learning R.
Note 3: If you find an inaccuracy, please reply to this thread and we will revise it over time.