R language: help documentation for the diana() function

loveR · posted 2012-2-16 17:33:31

diana(cluster)
diana() belongs to the R package: cluster

                                    DIvisive ANAlysis Clustering

                                       Translator: LoveR, the robot of 生物统计家园网

----------Description----------

Computes a divisive hierarchical clustering of the dataset, returning an object of class diana.

----------Usage----------

diana(x, diss = inherits(x, "dist"), metric = "euclidean", stand = FALSE,
   keep.diss = n < 100, keep.data = !diss)

----------Arguments----------

Argument: x
data matrix or data frame, or dissimilarity matrix or object, depending on the value of the diss argument.
In case of a matrix or data frame, each row corresponds to an observation and each column to a variable. All variables must be numeric. Missing values (NAs) are allowed.
In case of a dissimilarity matrix, x is typically the output of daisy or dist. A vector of length n*(n-1)/2 is also allowed (where n is the number of observations) and is interpreted in the same way as the output of those functions. In this case missing values (NAs) are not allowed.
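
The two input routes described for x can be tried side by side. The following is a minimal sketch, assuming the cluster package is installed and using its bundled agriculture data set; the object names d_raw and d_dist are illustrative only.

```r
library(cluster)
data(agriculture)

# Route 1: raw observations (rows = observations, columns = numeric variables).
d_raw <- diana(agriculture)

# Route 2: a precomputed dissimilarity; diss defaults to TRUE for a "dist" object.
d_dist <- diana(dist(agriculture))

# Both routes use Euclidean distances here, so the split heights should agree.
print(all.equal(d_raw$height, d_dist$height))
```

Passing dissimilarities is the natural choice when they come from daisy or from some other measure that diana does not compute itself.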

Argument: diss
logical flag: if TRUE (default for dist or dissimilarity objects), then x will be considered as a dissimilarity matrix. If FALSE, then x will be considered as a matrix of observations by variables.

Argument: metric
character string specifying the metric to be used for calculating dissimilarities between observations. The currently available options are "euclidean" and "manhattan". Euclidean distances are root sum-of-squares of differences, and manhattan distances are the sum of absolute differences. If x is already a dissimilarity matrix, then this argument will be ignored.
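
The two metrics can be worked out by hand for a pair of observations. This is a small base-R illustration with made-up vectors a and b, not code taken from the cluster package.

```r
# Two hypothetical observations measured on three variables.
a <- c(1, 4, 2)
b <- c(3, 1, 2)

# Euclidean: root of the sum of squared differences, sqrt(4 + 9 + 0).
euclid <- sqrt(sum((a - b)^2))

# Manhattan: sum of absolute differences, 2 + 3 + 0.
manhat <- sum(abs(a - b))

# The same values as computed by stats::dist.
euclid_dist <- as.numeric(dist(rbind(a, b), method = "euclidean"))
manhat_dist <- as.numeric(dist(rbind(a, b), method = "manhattan"))
print(c(euclid = euclid, manhat = manhat))
```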

Argument: stand
logical; if TRUE, the measurements in x are standardized before calculating the dissimilarities. Measurements are standardized for each variable (column), by subtracting the variable's mean value and dividing by the variable's mean absolute deviation. If x is already a dissimilarity matrix, then this argument will be ignored.
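
The standardization described for stand can be reproduced by hand. The sketch below assumes the cluster package and its agriculture data; standardize is a hypothetical helper written for this illustration.

```r
library(cluster)
data(agriculture)

# Hypothetical helper: centre a column on its mean, then divide by
# its mean absolute deviation (not the standard deviation).
standardize <- function(col) {
  centered <- col - mean(col, na.rm = TRUE)
  centered / mean(abs(centered), na.rm = TRUE)
}

agri_std <- apply(as.matrix(agriculture), 2, standardize)

# Clustering the hand-standardized data should match stand = TRUE.
d_manual <- diana(agri_std)
d_stand  <- diana(agriculture, stand = TRUE)
print(all.equal(d_manual$height, d_stand$height))
```

Standardizing matters when the variables are on very different scales, because otherwise the variable with the largest spread dominates the dissimilarities.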

Argument: keep.diss, keep.data
logicals indicating if the dissimilarities and/or input data x should be kept in the result. Setting these to FALSE can give much smaller results and hence even save memory allocation time.
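
The effect of the two flags can be seen by comparing the components of the returned objects. A minimal sketch, assuming the cluster package and its agriculture data; full and slim are illustrative names.

```r
library(cluster)
data(agriculture)

full <- diana(agriculture, keep.diss = TRUE,  keep.data = TRUE)
slim <- diana(agriculture, keep.diss = FALSE, keep.data = FALSE)

# Components present only in the full result.
print(setdiff(names(full), names(slim)))

# The slim object takes less memory.
print(object.size(full))
print(object.size(slim))
```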

----------Details----------

diana is fully described in chapter 6 of Kaufman and Rousseeuw (1990). It is probably unique in computing a divisive hierarchy, whereas most other software for hierarchical clustering is agglomerative. Moreover, diana provides (a) the divisive coefficient (see diana.object), which measures the amount of clustering structure found; and (b) the banner, a novel graphical display (see plot.diana).

The diana-algorithm constructs a hierarchy of clusterings, starting with one large cluster containing all n observations. Clusters are divided until each cluster contains only a single observation.
At each stage, the cluster with the largest diameter is selected. (The diameter of a cluster is the largest dissimilarity between any two of its observations.)
To divide the selected cluster, the algorithm first looks for its most disparate observation (i.e., the one with the largest average dissimilarity to the other observations of the selected cluster). This observation initiates the "splinter group". In subsequent steps, the algorithm reassigns observations that are closer to the "splinter group" than to the "old party". The result is a division of the selected cluster into two new clusters.
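
The splitting step described above can be sketched in a few lines of base R. This is an illustration of the description only, not the implementation used inside the cluster package; split_once is a hypothetical helper, and "closer" is taken here to mean a smaller average dissimilarity.

```r
# Hypothetical helper: split one cluster into a splinter group and an old party.
# d is a full symmetric dissimilarity matrix; members are the row indices
# of the cluster to split (at least two observations).
split_once <- function(d, members) {
  # Average dissimilarity from observation i to the other members of grp.
  avg_to <- function(i, grp) {
    others <- setdiff(grp, i)
    if (length(others) == 0) 0 else mean(d[i, others])
  }
  # The most disparate observation starts the splinter group.
  avg <- sapply(members, avg_to, grp = members)
  splinter <- members[which.max(avg)]
  old_party <- setdiff(members, splinter)
  repeat {
    if (length(old_party) <= 1) break
    # Positive gain: on average closer to the splinter group than to the old party.
    gain <- sapply(old_party, function(i) {
      avg_to(i, old_party) - mean(d[i, splinter])
    })
    if (max(gain) <= 0) break
    mover <- old_party[which.max(gain)]
    splinter <- c(splinter, mover)
    old_party <- setdiff(old_party, mover)
  }
  list(splinter = splinter, old_party = old_party)
}

# Five points on a line: three near 0 and two near 10.
x <- c(0, 1, 2, 10, 11)
d <- as.matrix(dist(x))
res <- split_once(d, members = 1:5)
print(res)
```

In this toy example observation 5 starts the splinter group, observation 4 joins it, and observations 1 to 3 stay in the old party. A full divisive run would repeat the step on whichever remaining cluster has the largest diameter.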

----------Value----------

an object of class "diana" representing the clustering; this class has methods for the following generic functions: print, summary, plot.

Further, the class "diana" inherits from "twins". Therefore, the generic function pltree can be used on a diana object, and an as.hclust method is available.
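
The inheritance from "twins" can be checked directly, and the pltree and as.hclust routes tried on a small example. A minimal sketch, assuming the cluster package and its agriculture data.

```r
library(cluster)
data(agriculture)

dv <- diana(agriculture)
print(class(dv))

# Convert to an hclust object to reuse tools written for hclust, e.g. cutree.
hc <- as.hclust(dv)
groups <- cutree(hc, k = 2)
print(table(groups))

# Draw the clustering tree without the banner plot.
pltree(dv, main = "diana(agriculture)")
```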

A legitimate diana object is a list with the following components:

Component: order
a vector giving a permutation of the original observations to allow for plotting, in the sense that the branches of a clustering tree will not cross.

Component: order.lab
a vector similar to order, but containing observation labels instead of observation numbers. This component is only available if the original observations were labelled.

Component: height
a vector with the diameters of the clusters prior to splitting.

Component: dc
the divisive coefficient, measuring the clustering structure of the dataset. For each observation i, denote by d(i) the diameter of the last cluster to which it belongs (before being split off as a single observation), divided by the diameter of the whole dataset. The dc is the average of all 1 - d(i). It can also be seen as the average width (or the percentage filled) of the banner plot. Because dc grows with the number of observations, this measure should not be used to compare datasets of very different sizes.
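
The definition of dc can be retraced from the order and height components. The sketch below assumes that height[k] is the diameter of the cluster that was split between positions k and k+1 of order, so that the last cluster containing an observation has the smaller of its two neighbouring heights; under that assumption the hand-computed value is expected to match dv$dc.

```r
library(cluster)
data(agriculture)

dv <- diana(agriculture)
h <- dv$height
n <- length(h) + 1

# Heights of the splits to the left and right of each position in dv$order;
# the two end positions have only one neighbouring split.
left  <- c(h[1], h)
right <- c(h, h[n - 1])

# d(i): diameter of the last cluster containing i, relative to the whole dataset.
d_i <- pmin(left, right) / max(h)
dc_manual <- mean(1 - d_i)

print(c(manual = dc_manual, reported = dv$dc))
```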

Component: merge
an (n-1) by 2 matrix, where n is the number of observations. Row i of merge describes the split at step n-i of the clustering. If a number j in row r is negative, then the single observation |j| is split off at stage n-r. If j is positive, then the cluster that will be split at stage n-j (described by row j) is split off at stage n-r.

Component: diss
an object of class "dissimilarity", representing the total dissimilarity matrix of the dataset.

Component: data
a matrix containing the original or standardized measurements, depending on the stand option of the function diana. If a dissimilarity matrix was given as input structure, then this component is not available.

----------See Also----------

agnes also for background and references; cutree (and as.hclust) for grouping extraction; daisy, dist, plot.diana, twins.object.

----------Examples----------

library(cluster)  # provides diana() and the votes.repub and agriculture data sets
data(votes.repub)
dv <- diana(votes.repub, metric = "manhattan", stand = TRUE)
print(dv)
plot(dv)

## Cut into 2 groups:
dv2 <- cutree(as.hclust(dv), k = 2)
table(dv2) # 8 and 42 group members
rownames(votes.repub)[dv2 == 1]

## For two groups, does the metric matter ?
dv0 <- diana(votes.repub, stand = TRUE) # default: Euclidean
dv.2 <- cutree(as.hclust(dv0), k = 2)
table(dv2 == dv.2) ## identical group assignments

data(agriculture)
## Plot similar to Figure 8 in ref
## Not run: plot(diana(agriculture), ask = TRUE)

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

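Notes:
Note 1: For the convenience of learners, this document was translated by LoveR, the robot of 生物统计家园网. It is intended only as a personal reference for learning R; 生物统计家园 retains the copyright.
Note 2: Because the translation was produced automatically by a robot, inaccuracies are inevitable. Carefully comparing the Chinese and English text while using it can help with learning R.
Note 3: If you find inaccuracies, please reply in this thread and we will revise them over time.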
