mona(cluster)
mona() belongs to R package: cluster
MONothetic Analysis Clustering of Binary Variables
Translator: biostatistic.net robot LoveR
----------Description----------
Returns a list representing a divisive hierarchical clustering of a dataset with binary variables only.
----------Usage----------
mona(x)
----------Arguments----------
x
data matrix or data frame in which each row corresponds to an observation and each column corresponds to a variable. All variables must be binary. A limited number of missing values (NAs) is allowed. Every observation must have at least one value different from NA. No variable should have half of its values missing. There must be at least one variable with no missing values. A variable whose non-missing values are all identical is not allowed.
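The rules for x can be checked before calling mona(). The helper below is a hypothetical sketch, not part of the cluster package (mona() performs its own checks). It assumes "binary" means exactly two distinct non-missing values per column, and it reads "no variable should have half of its values missing" as "fewer than 50% NA".

```r
# Hypothetical helper (not part of the cluster package): report which of the
# input rules for mona() a candidate x would break. Returns character(0) if none.
check_mona_input <- function(x) {
  x <- as.matrix(x)
  miss <- is.na(x)
  # number of distinct non-missing values in each column
  n_levels <- apply(x, 2, function(v) length(unique(v[!is.na(v)])))
  problems <- character(0)
  if (any(n_levels > 2))
    problems <- c(problems, "a variable with more than two distinct values")
  if (any(n_levels < 2))
    problems <- c(problems, "a variable whose non-missing values are all identical")
  if (any(rowSums(!miss) == 0))
    problems <- c(problems, "an observation whose values are all NA")
  # assumption: "should not have half of its values missing" means < 50% NA
  if (any(colMeans(miss) >= 0.5))
    problems <- c(problems, "a variable with half or more of its values missing")
  if (!any(colSums(miss) == 0))
    problems <- c(problems, "no variable without missing values")
  problems
}
```

For example, check_mona_input(cbind(a = c(1, 1, 1, 1), b = c(0, 1, 2, 0))) reports both a constant variable and a non-binary one.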
----------Details----------
mona is fully described in chapter 7 of Kaufman and Rousseeuw (1990). It is "monothetic" in the sense that each division is based on a single (well-chosen) variable, whereas most other hierarchical methods (including agnes and diana) are "polythetic", i.e. they use all variables together.
The mona-algorithm constructs a hierarchy of clusterings, starting with one large cluster. Clusters are divided until all observations in the same cluster have identical values for all variables.
At each stage, all clusters are divided according to the values of one variable. A cluster is divided into one cluster with all observations having value 1 for that variable, and another cluster with all observations having value 0 for that variable.
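One division step and the stopping rule can be sketched in base R with two hypothetical helpers. The sketch assumes complete data coded as 0/1 and represents a cluster as a vector of row indices. It covers only the split itself; which variable to split on is decided by the association rule described in the following paragraph.

```r
# Hypothetical helper: split a cluster (row indices `rows`) on one variable
# into the rows where it equals 1 and the rows where it equals 0.
split_cluster <- function(x, rows, var) {
  v <- x[rows, var]
  list(ones = rows[v == 1], zeros = rows[v == 0])
}

# Hypothetical helper: the stopping rule. A cluster is not divided further
# once all of its observations have identical values on every variable.
is_homogeneous <- function(x, rows) {
  nrow(unique(x[rows, , drop = FALSE])) == 1
}

x <- cbind(v1 = c(1, 0, 1, 1), v2 = c(1, 1, 0, 1))
s <- split_cluster(x, 1:4, 1)   # split the full data set on v1
```

Applying split_cluster() again to each resulting cluster that is not yet homogeneous yields the divisive hierarchy; in the data above, rows 1 and 4 are identical, so a cluster containing only them would stop.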
The variable used for splitting a cluster is the variable with the maximal total association to the other variables, computed from the observations in the cluster to be split. The association between variables f and g is given by a(f,g)*d(f,g) - b(f,g)*c(f,g), where a(f,g), b(f,g), c(f,g), and d(f,g) are the counts in the 2x2 contingency table of f and g. [That is, a(f,g) (resp. d(f,g)) is the number of observations for which f and g both have value 0 (resp. value 1); b(f,g) (resp. c(f,g)) is the number of observations for which f has value 0 (resp. 1) and g has value 1 (resp. 0).] The total association of a variable f is the sum of its associations to all variables.
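A minimal base-R sketch of the association measure and of choosing the splitting variable, assuming complete 0/1 data. Two details are assumptions rather than statements from this page: the total association sums over the other variables g != f (excluding f itself), and it adds the signed associations literally as written above. The function names are hypothetical.

```r
# Association a(f,g)*d(f,g) - b(f,g)*c(f,g) between two 0/1 vectors, where
# a = #(f=0, g=0), b = #(f=0, g=1), c = #(f=1, g=0), d = #(f=1, g=1).
assoc <- function(f, g) {
  n00 <- sum(f == 0 & g == 0)
  n01 <- sum(f == 0 & g == 1)
  n10 <- sum(f == 1 & g == 0)
  n11 <- sum(f == 1 & g == 1)
  n00 * n11 - n01 * n10
}

# Total association of each column of x (assumption: summed over g != f).
total_assoc <- function(x) {
  p <- ncol(x)
  sapply(seq_len(p), function(f) {
    others <- setdiff(seq_len(p), f)
    sum(vapply(others, function(g) assoc(x[, f], x[, g]), numeric(1)))
  })
}

# Column index of the splitting variable: maximal total association
# (which.max returns the first column on ties).
choose_split_var <- function(x) which.max(total_assoc(x))

x <- cbind(v1 = c(0, 0, 1, 1), v2 = c(0, 0, 1, 1), v3 = c(0, 1, 0, 1))
```

Here v1 and v2 agree on every observation (association 2*2 - 0*0 = 4), while v3 is unrelated to both (1*1 - 1*1 = 0), so the totals are 4, 4 and 0, and v1 is chosen.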
This algorithm does not work with missing values, so the data are revised first, i.e. all missing values are filled in. To do this, the same measure of association between variables is used as in the algorithm. When variable f has missing values, the variable g with the largest absolute association to f is found. If the association between f and g is positive, any missing value of f is replaced by the value of g for the same observation. If the association between f and g is negative, any missing value of f is replaced by the value of 1-g for the same observation.
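The fill-in rule can be sketched as a hypothetical base-R function for 0/1 data. Several choices here are assumptions, not statements from this page: the partner g is searched only among variables with no missing values (the input rules guarantee at least one exists), the association is computed on the rows where f is observed, and a zero association is treated like a positive one.

```r
# Hypothetical helper: fill NAs in a 0/1 matrix following the rule above.
impute_mona_style <- function(x) {
  assoc <- function(f, g) {
    sum(f == 0 & g == 0) * sum(f == 1 & g == 1) -
      sum(f == 0 & g == 1) * sum(f == 1 & g == 0)
  }
  n_na <- colSums(is.na(x))
  complete <- which(n_na == 0)           # assumption: partners come from here
  stopifnot(length(complete) > 0)
  for (f in which(n_na > 0)) {
    obs <- !is.na(x[, f])                # rows where f is observed
    a <- vapply(complete, function(g) assoc(x[obs, f], x[obs, g]), numeric(1))
    k <- which.max(abs(a))               # largest absolute association
    g <- complete[k]
    # positive (or, by assumption, zero) association: copy g; negative: use 1 - g
    x[!obs, f] <- if (a[k] >= 0) x[!obs, g] else 1 - x[!obs, g]
  }
  x
}

pos <- impute_mona_style(cbind(f = c(0, 0, 1, NA), g = c(0, 0, 1, 0)))
neg <- impute_mona_style(cbind(f = c(0, 1, NA), h = c(1, 0, 1)))
```

In the first call f agrees with g on the observed rows, so the missing value is copied from g (giving 0); in the second f is the opposite of h, so the missing value becomes 1 - 1 = 0.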
----------Value----------
an object of class "mona" representing the clustering. See mona.object for details.
----------See Also----------
agnes for background and references; mona.object, plot.mona.
----------Examples----------
library(cluster)  # provides mona() and the animals data set
data(animals)
ma <- mona(animals)
ma
## Plot similar to Figure 10 in Struyf et al (1996)
plot(ma)
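Reprint notice: please credit the source, biostatistic.net (http://www.biostatistic.net).
Notes:
Note 1: To make learning easier, this document was translated by LoveR, the biostatistic.net robot. It is provided for personal R learning and reference only; biostatistic.net retains the copyright.
Note 2: Because the translation is automatic, it contains inaccuracies; compare the Chinese and English text carefully when using it.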