benhur(clusterStab)
benhur() belongs to R package: clusterStab
A Function to Estimate the Number of Clusters in Microarray Data
----------Description----------
This function estimates the number of clusters in data such as microarray data, using an iterative process proposed by Asa Ben-Hur.
----------Usage----------
## S4 method for signature 'ExpressionSet'
benhur(object, freq, upper, seednum = NULL,
linkmeth = "average", distmeth = "euclidean", iterations = 100)
## S4 method for signature 'matrix'
benhur(object, freq, upper, seednum = NULL, linkmeth
= "average", distmeth = "euclidean", iterations = 100)
----------Arguments----------
Argument: object
Either a matrix or an ExpressionSet.
Argument: freq
The proportion of samples to use. For best results this should be between 0.6 and 0.8.
Argument: upper
The upper limit for the number of clusters.
Argument: seednum
A value to pass to set.seed, which allows the results to be reproduced exactly at a later date.
Argument: linkmeth
The linkage method to pass to hclust. Valid values include "average", "centroid", "ward", "single", "mcquitty", and "median".
Argument: distmeth
The distance method to use. Valid values include "euclidean" and "pearson", where "pearson" means 1 minus the Pearson correlation.
Argument: iterations
The number of iterations to use. The default of 100 is a reasonable number.
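To make the roles of seednum, freq, distmeth, linkmeth and upper more concrete, here is a minimal base-R sketch of one subsampling-and-clustering step. It is not the package's implementation: the toy matrix is simulated, samples are assumed to be in columns, and rounding the subsample size down with floor() is an assumption.

```r
# Simulated toy expression matrix: 20 features (rows) x 10 samples (columns)
set.seed(123)  # plays the role of seednum: fixes the random draws below
x <- matrix(rnorm(20 * 10), nrow = 20,
            dimnames = list(NULL, paste0("s", 1:10)))

# freq: keep a proportion of the samples (rounding down is an assumption)
freq <- 0.7
keep <- sample(ncol(x), size = floor(freq * ncol(x)))
sub <- x[, keep]

# distmeth = "euclidean": Euclidean distance between samples (dist works on rows)
d_euc <- dist(t(sub), method = "euclidean")

# distmeth = "pearson": 1 - Pearson correlation between samples (cor works on columns)
d_cor <- as.dist(1 - cor(sub, method = "pearson"))

# linkmeth: linkage method handed to hclust
hc_euc <- hclust(d_euc, method = "average")
hc_cor <- hclust(d_cor, method = "average")

# upper: cut the tree into 2, 3, ..., upper clusters
upper <- 4
memberships <- cutree(hc_cor, k = 2:upper)
```

Here memberships has one row per retained sample and one column per candidate cluster number, which is the kind of labeling that a stability procedure compares across subsamples.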
----------Details----------
This function may be used to estimate the number of true clusters that exist in a set of microarray data. This estimate can then be used as input to clusterComp to estimate the stability of the clusters.
The primary output from this function is a set of histograms that show, for each number of clusters, how often similar clusters are formed from subsets of the data. As the number of clusters increases, the pairwise similarity of cluster membership will decrease. The basic idea is to choose the histogram corresponding to the largest number of clusters for which the majority of the data in the histogram is concentrated at or near 1.
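The idea behind these histograms can be sketched in base R as follows. This is an illustration under stated assumptions, not the code used by benhur(): the helpers pair_jaccard and stability_scores are hypothetical, the Jaccard coefficient over pairs of samples is an assumed choice of similarity measure, the data are simulated, and samples are assumed to be in columns.

```r
# Jaccard similarity between two labelings of the same samples, computed on
# pairs of samples: |pairs together in both| / |pairs together in either|
pair_jaccard <- function(l1, l2) {
  c1 <- outer(l1, l1, "==")
  c2 <- outer(l2, l2, "==")
  up <- upper.tri(c1)
  both <- sum(c1[up] & c2[up])
  either <- sum(c1[up] | c2[up])
  if (either == 0) 1 else both / either
}

# Hypothetical helper: similarity scores for k clusters over repeated subsamples
stability_scores <- function(x, k, freq = 0.7, iterations = 100) {
  n <- ncol(x)
  m <- floor(freq * n)
  vapply(seq_len(iterations), function(i) {
    s1 <- sample(n, m)
    s2 <- sample(n, m)
    shared <- intersect(s1, s2)
    cl1 <- cutree(hclust(dist(t(x[, s1])), method = "average"), k)
    cl2 <- cutree(hclust(dist(t(x[, s2])), method = "average"), k)
    # Compare labels only on samples present in both subsamples
    pair_jaccard(cl1[match(shared, s1)], cl2[match(shared, s2)])
  }, numeric(1))
}

# Simulated data: 50 features, two well-separated groups of 15 samples each
set.seed(42)
x <- cbind(matrix(rnorm(50 * 15, mean = 0), nrow = 50),
           matrix(rnorm(50 * 15, mean = 3), nrow = 50))

scores <- lapply(2:4, function(k) stability_scores(x, k, iterations = 50))
names(scores) <- paste0("k", 2:4)

# One histogram per candidate number of clusters
op <- par(mfrow = c(1, 3))
for (nm in names(scores)) {
  hist(scores[[nm]], breaks = seq(0, 1, by = 0.05),
       main = nm, xlab = "similarity")
}
par(op)
```

With two clearly separated groups, the scores for two clusters should pile up at 1, while larger cluster numbers split the groups arbitrarily and tend to give lower, more spread-out scores.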
If overlay is set to TRUE, an additional CDF plot will be produced. This can be used in conjunction with the histograms to determine at which cluster number the data are no longer concentrated at or near 1.
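As a rough illustration of reading such plots, the sketch below applies the "largest number of clusters with most scores near 1" rule to made-up similarity scores and overlays their empirical CDFs using stats::ecdf. The scores are simulated for illustration only, and the 0.9 cutoff for "near 1" and the 50% majority rule are assumptions, not values taken from the package.

```r
# Illustrative similarity scores per cluster number (simulated, not real output)
set.seed(7)
sims <- list(
  `2` = rbeta(100, 50, 1),  # tightly concentrated near 1
  `3` = rbeta(100, 20, 1),  # still mostly near 1
  `4` = rbeta(100, 3, 2)    # spread out over (0, 1)
)

near_one <- 0.9  # assumed cutoff for "at or near 1"
frac_near_one <- vapply(sims, function(s) mean(s >= near_one), numeric(1))

# Largest cluster number for which the majority of scores are near 1
ok <- as.integer(names(frac_near_one)[frac_near_one > 0.5])
k_hat <- if (length(ok)) max(ok) else NA_integer_

# Overlaid empirical CDFs: a curve that stays low until x approaches 1
# indicates scores concentrated at or near 1
plot(ecdf(sims[[1]]), xlim = c(0, 1), main = "Empirical CDFs",
     xlab = "similarity")
for (i in 2:3) plot(ecdf(sims[[i]]), add = TRUE, col = i)
```

In practice the choice is usually made by eye from the histograms and CDF curves; a fixed cutoff like the one above is only one way to make the rule explicit.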
----------Value----------
The output from this function is an object of class benhur. See the benhur-class man page for more information.
----------Author(s)----------
Originally written by Mark Smolkin <marksmolkin@hotmail.com>,
with further modifications by James W. MacDonald <jmacdon@med.umich.edu>
----------References----------
method for discovering structure in clustered data. Pacific Symposium on Biocomputing, 2002.
Smolkin, M. and Ghosh, D. (2003). Cluster stability scores for microarray data in cancer studies. BMC Bioinformatics 4, 36 - 42.
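----------Examples----------
library(Biobase)      # provides sample.ExpressionSet
library(clusterStab)  # provides benhur()
data(sample.ExpressionSet)
# freq = 0.7 (proportion of samples to use), upper = 5 (maximum number of clusters)
tmp <- benhur(sample.ExpressionSet, 0.7, 5)
hist(tmp)  # histograms for each number of clusters
ecdf(tmp)  # CDF plot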