bhc(BHC)
bhc() is part of the R package BHC
Function to perform Bayesian Hierarchical Clustering on a 2D array of data
----------Description----------
The method performs bottom-up (agglomerative) hierarchical clustering, using a Dirichlet process (infinite mixture) model to represent uncertainty in the data and Bayesian model selection to decide, at each step, which pair of clusters to merge. This avoids several limitations of traditional methods, such as having to decide how many clusters there should be and how to choose a principled distance metric. This implementation accepts multinomial (i.e. discrete, with 2 or more categories) or time-series data.
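The merge decision can be illustrated with the criterion from the original Bayesian Hierarchical Clustering paper (Heller and Ghahramani, 2005). For a candidate merge of subtrees T_i and T_j into T_k, the posterior probability that all of D_k belongs to a single cluster is r_k = π_k p(D_k|H1) / p(D_k|T_k), and the pair with the highest r_k is merged. The base R sketch below computes r_k in log space from log-likelihoods supplied by the caller. The helper names `mergeScore` and `logSumExp` are hypothetical and not part of the BHC package, whose internal code may differ.

```r
# Hypothetical helpers: not part of the BHC package.
# Numerically stable log(exp(a) + exp(b))
logSumExp <- function(a, b) {
  m <- max(a, b)
  m + log(exp(a - m) + exp(b - m))
}

# Merge score r_k from Heller & Ghahramani (2005), computed in log space.
#   alpha          : Dirichlet process concentration parameter
#   n_i, n_j       : number of data items in each subtree
#   logd_i, logd_j : log of the d value stored for each subtree (log(alpha) for a leaf)
#   logH1          : log p(D_k | H1), marginal likelihood of all items in one cluster
#   logT_i, logT_j : log p(D_i | T_i) and log p(D_j | T_j) for the two subtrees
mergeScore <- function(alpha, n_i, n_j, logd_i, logd_j, logH1, logT_i, logT_j) {
  n_k <- n_i + n_j
  logPrior <- log(alpha) + lgamma(n_k)            # log(alpha * Gamma(n_k))
  logd_k <- logSumExp(logPrior, logd_i + logd_j)  # d_k = alpha*Gamma(n_k) + d_i*d_j
  logPi_k <- logPrior - logd_k                    # pi_k = alpha*Gamma(n_k) / d_k
  log1mPi_k <- logd_i + logd_j - logd_k           # 1 - pi_k = d_i*d_j / d_k
  # p(D_k|T_k) = pi_k p(D_k|H1) + (1 - pi_k) p(D_i|T_i) p(D_j|T_j)
  logT_k <- logSumExp(logPi_k + logH1, log1mPi_k + logT_i + logT_j)
  list(r = exp(logPi_k + logH1 - logT_k), logd = logd_k, logT = logT_k)
}

# Two single-item leaves with alpha = 1 (so log d = log(1) = 0 for each leaf)
res <- mergeScore(alpha = 1, n_i = 1, n_j = 1, logd_i = 0, logd_j = 0,
                  logH1 = log(0.2), logT_i = log(0.5), logT_j = log(0.5))
res$r   # 0.1 / 0.225 = 4/9
```

The returned `logd` and `logT` are what a merged node would carry into later merges. In the paper, r_k > 0.5 means the merge is favoured, and the tree can be cut wherever r_k < 0.5.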
----------Usage----------
bhc(data, itemLabels, nFeatureValues, timePoints, dataType,
noise, numReps, noiseMode, robust, numThreads, verbose)
----------Arguments----------
Argument: data
A 2D array containing discretised data. The dimensions of data should be nDataItems * nFeatures, and the algorithm will cluster the data items.
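Bayesian model selection on discretised data needs the marginal likelihood of a candidate cluster with its category probabilities integrated out. A standard choice, assumed here rather than taken from this page, is a Dirichlet-multinomial model in which features are independent and each feature has a symmetric Dirichlet(beta) prior. The sketch assumes categories are coded 1..nCategories and uses a fixed beta; the function name `logMarginal` and the fixed hyperparameter are hypothetical and not the BHC package's implementation.

```r
# Hypothetical helper: log marginal likelihood of a group of data items
# (rows of clusterData) under an independent Dirichlet-multinomial model per feature.
#   clusterData : matrix, rows = data items in the cluster, columns = features,
#                 entries are category codes 1..nCategories
#   beta        : symmetric Dirichlet hyperparameter (assumed fixed)
logMarginal <- function(clusterData, nCategories, beta = 1) {
  total <- 0
  for (f in seq_len(ncol(clusterData))) {
    counts <- tabulate(clusterData[, f], nbins = nCategories)  # counts per category
    N <- sum(counts)
    total <- total +
      lgamma(nCategories * beta) - lgamma(N + nCategories * beta) +
      sum(lgamma(counts + beta) - lgamma(beta))
  }
  total
}

# Example: the first five rows of a matrix coded 1..3
x <- matrix(c(1, 1, 1, 2, 1,
              3, 3, 3, 3, 2), ncol = 2)
logMarginal(x, nCategories = 3)
```

Tight clusters (most items sharing a category in each feature) get a higher marginal likelihood than mixed ones, which is what drives the merge decisions.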
Argument: itemLabels
A character vector containing nDataItems entries, one for each data item in the analysis. The leaf nodes of the output dendrogram will be labelled with these labels.
Argument: nFeatureValues
Deprecated. This is a legacy argument, retained for backwards compatibility. Any value passed to it will have no effect.
Argument: timePoints
A numeric vector of length nFeatures containing the time points of the measurements.
Argument: dataType
A string specifying the data type: one of "multinomial", "time-course", or "cubicspline".
Argument: noise
Noise term for each gene, required only if noiseMode=1. The noise term for each gene is calculated as
$$\mathrm{noise} = \frac{\sum(\mathrm{residuals}^2)}{(\mathrm{number\ of\ observations\ for\ gene} - 1)\,(\mathrm{number\ of\ replicates})},$$
where (number of observations for gene) is typically (number of time points * number of replicates).
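As a worked instance of the noise formula, the sketch below computes the noise term for one gene measured at three time points with two replicates each. It assumes the residuals are deviations from the mean of the replicates at each time point; this page does not say which fitted values the residuals are taken from, so treat that choice, and the helper name `geneNoise`, as hypothetical.

```r
# Hypothetical helper: per-gene noise term from the formula in the noise argument.
# y: matrix for one gene, rows = time points, columns = replicates
geneNoise <- function(y) {
  numReps <- ncol(y)
  nObs <- nrow(y) * numReps          # number of observations for the gene
  # Assumption: residuals are taken about the mean of each time point's replicates
  residuals <- y - rowMeans(y)
  sum(residuals^2) / ((nObs - 1) * numReps)
}

y <- matrix(c(1.0, 1.2,
              2.0, 2.2,
              3.0, 2.6), ncol = 2, byrow = TRUE)
noise <- geneNoise(y)   # 0.12 / (5 * 2) = 0.012
```

For several genes, the same helper can be applied to a list of per-gene matrices, e.g. `vapply(geneList, geneNoise, numeric(1))`, giving one noise value per gene.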
Argument: numReps
Number of replicates per observation.
Argument: noiseMode
Noise mode: 0 for fitted noise, 1 for fixed noise, 2 for noise estimated from replicates.
Argument: robust
0 to use a single Gaussian likelihood, 1 to use a mixture likelihood.
Argument: numThreads
The BHC library is parallelised with OpenMP (currently on Unix-like systems only). This argument sets the number of threads to use (default 1).
Argument: verbose
If TRUE, the algorithm prints progress information to the screen as it runs.
----------Details----------
Typical usage for the multinomial case is bhc(data, itemLabels), as in the first call in the Examples below.
----------Value----------
An object of class dendrogram (see the dendrogram documentation in the R stats package for details).
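Because bhc() returns a dendrogram, the usual stats tools for that class apply. So that it runs without the BHC package, the sketch below uses a dendrogram built from stats::hclust on made-up data as a stand-in, and shows how to read off the leaf labels, the number of leaves, and the two subtrees at the root. It assumes a bhc() result can be inspected the same way because it has the same class.

```r
# Made-up data: six items in two well-separated groups
x <- rbind(matrix(0, 3, 3), matrix(5, 3, 3))
rownames(x) <- c("a1", "a2", "a3", "b1", "b2", "b3")

# Stand-in for a bhc() result: a dendrogram built with stats::hclust
dend <- as.dendrogram(hclust(dist(x)))

inherits(dend, "dendrogram")   # TRUE
attr(dend, "members")          # number of leaves below the root
labels(dend)                   # leaf labels in plotting order

# A binary tree has two subtrees at the root; list the leaves in each
subtrees <- lapply(seq_along(dend), function(i) labels(dend[[i]]))
```

The same indexing (`dend[[1]]`, `dend[[2]]`, and so on) walks further down the tree, and `plot(dend)` draws it as in the Examples.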
----------Author(s)----------
Rich Savage, Emma Cooke, and Robert Darkins (binomial case originally written by Yang Xu)
----------See Also----------
hclust
----------Examples----------
##BUILD SAMPLE DATA AND LABELS
data <- matrix(0,15,10)
itemLabels <- vector("character",15)
data[1:5,] <- 1 ; itemLabels[1:5] <- "a"
data[6:10,] <- 2 ; itemLabels[6:10] <- "b"
data[11:15,] <- 3 ; itemLabels[11:15] <- "c"
timePoints <- 1:10 # for the time-course case
##DATA DIMENSIONS
nDataItems <- nrow(data)
nFeatures <- ncol(data)
##RUN MULTINOMIAL CLUSTERING
hc1 <- bhc(data, itemLabels, verbose=TRUE)
plot(hc1, axes=FALSE)
##RUN TIME-COURSE CLUSTERING
hc2 <- bhc(data, itemLabels, 0, timePoints, "time-course",
numReps=1, noiseMode=0, numThreads=2, verbose=TRUE)
plot(hc2, axes=FALSE)
##OUTPUT CLUSTER LABELS TO FILE
WriteOutClusterLabels(hc1, "labels.txt", verbose=TRUE)
##FOR THE MULTINOMIAL CASE, THE DATA CAN BE DISCRETISED
## (add small Gaussian noise to make the data continuous, then re-discretise it)
set.seed(1)
newData <- data + rnorm(length(data), 0, 0.1)
percentiles <- FindOptimalBinning(newData, itemLabels, transposeData=TRUE, verbose=TRUE)
discreteData <- DiscretiseData(t(newData), percentiles=percentiles)
discreteData <- t(discreteData)
hc3 <- bhc(discreteData, itemLabels, verbose=TRUE)
plot(hc3, axes=FALSE)
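In the package, FindOptimalBinning() and DiscretiseData() handle discretisation. To show only the underlying idea, the sketch below bins noisy continuous values into three categories at empirical percentiles using base R's quantile() and findInterval(). The fixed 1/3 and 2/3 cut points are an assumption for illustration, not the percentiles FindOptimalBinning() would choose.

```r
set.seed(1)
# Noisy continuous values around three levels, as in the example above
x <- c(rnorm(5, 1, 0.1), rnorm(5, 2, 0.1), rnorm(5, 3, 0.1))

# Assumed cut points: the 1/3 and 2/3 empirical percentiles of the data
cuts <- quantile(x, probs = c(1/3, 2/3))

# findInterval() returns 0, 1 or 2 depending on which interval each value falls in;
# adding 1 gives categories 1, 2, 3
discrete <- findInterval(x, cuts) + 1L
discrete
```

Percentile-based cut points put roughly equal numbers of values in each category, which is one simple way to choose bins. Choosing how many bins to use, and where to put them, is the part that FindOptimalBinning() automates.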