R hopach package: labelstomss() function help documentation

Posted on 2012-2-25 21:46:51
labelstomss(hopach)
labelstomss() belongs to R package: hopach

                                        Functions to compute silhouettes and split silhouettes

                                         Chinese translation by: 生物统计家园网, robot LoveR

----------Description----------

Silhouettes measure how well an element belongs to its cluster, and the average silhouette measures the strength of cluster membership overall. The Median (or Mean) Split Silhouette (MSS) is a measure of cluster heterogeneity. Given a partitioning of elements into groups, the MSS algorithm considers each group separately and computes the split silhouette for that group, which evaluates evidence in favor of further splitting the group. If the median (or mean) split silhouette over all groups in the partition is low, the groups are homogeneous.
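
To make the silhouette idea concrete, the following is a minimal base-R sketch (not hopach's own implementation) that computes silhouette widths from a distance matrix and a label vector, using the usual definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from element i to the rest of its cluster and b(i) is the smallest mean distance to any other cluster. The helper name silhouette_widths and the toy data are assumptions made for illustration; the sketch assumes at least two clusters and assigns 0 to singleton clusters by convention.

```r
# Hypothetical helper: silhouette width of every element.
# d      : distance matrix (or dist object) between all elements
# labels : vector of cluster labels, one per element (at least two clusters)
silhouette_widths <- function(d, labels) {
  d <- as.matrix(d)
  clusters <- unique(labels)
  vapply(seq_len(nrow(d)), function(i) {
    own <- labels == labels[i]
    if (sum(own) == 1) return(0)  # convention: singletons get width 0
    # mean distance to the other members of the own cluster (d[i, i] is 0)
    a <- sum(d[i, own]) / (sum(own) - 1)
    # smallest mean distance to the members of any other cluster
    others <- setdiff(clusters, labels[i])
    b <- min(vapply(others, function(cl) mean(d[i, labels == cl]), numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
}

# Toy data: two tight, well-separated groups on a line
x <- c(0, 1, 2, 3, 100, 101, 102, 103)
labels <- rep(1:2, each = 4)
s <- silhouette_widths(dist(x), labels)
mean(s)  # average silhouette: overall strength of cluster membership
```

Widths close to 1 indicate elements that sit firmly in their cluster; widths near 0 or below indicate elements that fit a neighboring cluster about as well or better.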


----------Usage----------


labelstomss(labels, dist, khigh = 9, within = "med", between = "med",
    hierarchical = TRUE)

msscheck(dist, kmax = 9, khigh = 9, within = "med", between = "med",
    force = FALSE, echo = FALSE, graph = FALSE)

silcheck(data, kmax = 9, diss = FALSE, echo = FALSE, graph = FALSE)



----------Arguments----------

Argument: labels
vector of cluster labels for each element in the set.


Argument: dist
numeric distance matrix containing the pairwise distances between all elements. All values must be numeric and missing values are not allowed.


Argument: data
a data matrix. Each column corresponds to an observation, and each row corresponds to a variable. In the gene expression context, observations are arrays and variables are genes. All values must be numeric. Missing values are ignored. In silcheck, data may also be a distance matrix or dissimilarity object if the argument diss=TRUE.


Argument: khigh
integer between 1 and 9 specifying the maximum number of children for each cluster when computing MSS.


Argument: kmax
integer between 1 and 9 specifying the maximum number of clusters to consider. Can be different from khigh, though typically these are the same value.


Argument: within
character string indicating how to compute the split silhouette for each cluster. The available options are "med" (median over all elements in the cluster) or "mean" (mean over all elements in the cluster).


Argument: between
character string indicating how to compute the MSS over all clusters. The available options are "med" (median over all clusters) or "mean" (mean over all clusters). Using the same value as within is recommended.


Argument: hierarchical
logical indicating whether 'labels' should be treated as encoding a hierarchical tree, e.g. from HOPACH.


Argument: force
indicator of whether to require at least 2 clusters; if FALSE (default), a single cluster is also considered.


Argument: echo
indicator of whether to print the selected number of clusters and corresponding MSS.


Argument: graph
indicator of whether to generate a plot of MSS (or average silhouette in silcheck) versus number of clusters.


Argument: diss
indicator of whether data is a dissimilarity matrix (or dissimilarity object), as in the pam function of the cluster package. If TRUE then data will be considered as a dissimilarity matrix. If FALSE, then data will be considered as a data matrix (observations by variables).


----------Details----------

The Median (and Mean) Split Silhouette (MSS) criterion is defined in paper107 listed in the references (below). This criterion is based on the criterion function 'silhouette', proposed by Kaufman and Rousseeuw (1990). While average silhouette is a good global measure of cluster strength, MSS was developed to be more "aggressive" for finding small, homogeneous clusters in large data sets. MSS summarizes average cluster homogeneity: lower values indicate more homogeneous clusters. The Median version is more robust than the Mean.
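
The MSS idea can be sketched with pam from the cluster package. This is a simplified illustration under stated assumptions, not hopach's implementation: it assumes the split silhouette of a cluster is the best silhouette summary obtained when that cluster is split by pam into 2 to khigh children, and that clusters with fewer than 3 elements are skipped. The helper names split_silhouette and mss_sketch and the arguments within_fun and between_fun are hypothetical.

```r
library(cluster)  # provides pam()

# Hypothetical helper: split silhouette of a single cluster.
# d_sub is the distance sub-matrix of the elements in that cluster.
split_silhouette <- function(d_sub, khigh = 9, within_fun = median) {
  n <- nrow(d_sub)
  if (n < 3) return(NA_real_)   # assumption: too few elements to split
  ks <- 2:min(khigh, n - 1)     # pam() requires k < n
  per_k <- vapply(ks, function(k) {
    fit <- pam(as.dist(d_sub), k = k, diss = TRUE)
    within_fun(fit$silinfo$widths[, "sil_width"])
  }, numeric(1))
  max(per_k)                    # strongest evidence for a further split
}

# Hypothetical helper: summarize the split silhouettes over all clusters.
mss_sketch <- function(labels, d, khigh = 9,
                       within_fun = median, between_fun = median) {
  d <- as.matrix(d)
  ss <- vapply(unique(labels), function(cl) {
    idx <- labels == cl
    split_silhouette(d[idx, idx, drop = FALSE], khigh, within_fun)
  }, numeric(1))
  between_fun(ss, na.rm = TRUE)
}

# Toy data: two tight groups on a line
x <- c(0, 1, 2, 3, 100, 101, 102, 103)
d <- dist(x)
mss_good   <- mss_sketch(rep(1:2, each = 4), d)  # each group is its own cluster
mss_merged <- mss_sketch(rep(1, 8), d)           # one cluster hiding two groups
```

In this sketch the merged labeling scores higher than the correct one, because the single cluster splits cleanly into two children. That matches the reading above that a low MSS indicates homogeneous clusters.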


----------Value----------

For labelstomss, the median (or mean or combination) split silhouette, depending on the values of within and between.

For msscheck, a vector whose first component is the chosen number of clusters (minimizing MSS) and whose second component is the corresponding MSS.

For silcheck, a vector whose first component is the chosen number of clusters (maximizing average silhouette) and whose second component is the corresponding average silhouette.
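
The return values described above can be unpacked as in the following sketch. It assumes the hopach package is installed, uses toy data similar to the package example, and picks kmax = 5 and an arbitrary seed purely for illustration; the variable names mss_choice and sil_choice are hypothetical.

```r
library(hopach)  # assumed to be installed (Bioconductor)

set.seed(1)  # arbitrary seed, only to make the toy data reproducible
mydata <- rbind(matrix(rnorm(30, 0, 0.5), ncol = 3),   # 10 elements near 0
                matrix(rnorm(45, 5, 0.5), ncol = 3))   # 15 elements near 5
mydist <- distancematrix(mydata, d = "cosangle")

mss_choice <- msscheck(mydist, kmax = 5)  # works on the distance matrix
sil_choice <- silcheck(mydata, kmax = 5)  # works on the data matrix (diss = FALSE)

mss_choice[1]  # chosen number of clusters (minimizing MSS)
mss_choice[2]  # corresponding MSS
sil_choice[1]  # chosen number of clusters (maximizing average silhouette)
sil_choice[2]  # corresponding average silhouette
```

The two criteria need not agree on the number of clusters, since one minimizes MSS and the other maximizes average silhouette.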


----------Author(s)----------


Katherine S. Pollard <kpollard@gladstone.ucsf.edu> and Mark J. van der Laan <laan@stat.berkeley.edu>



----------References----------





----------See Also----------

pam, hopach, distancematrix


----------Examples----------



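library(hopach)  # hopach depends on the cluster package, which provides pam()

# Two groups of elements: 10 rows centered at 0 and 15 rows centered at 5
mydata <- rbind(
    cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
    cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))
mydist <- distancematrix(mydata, d = "cosangle")  # compute the distance matrix

# pam: flat partitions, so hierarchical = FALSE
result1 <- pam(mydata, k = 2)
result2 <- pam(mydata, k = 5)
labelstomss(result1$clustering, mydist, hierarchical = FALSE)
labelstomss(result2$clustering, mydist, hierarchical = FALSE)

# hopach: labels encode a hierarchical tree (the default)
result3 <- hopach(mydata, dmat = mydist)
labelstomss(result3$clustering$labels, mydist)
labelstomss(result3$clustering$labels, mydist, within = "mean", between = "mean")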


When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

