projectiveKMeans(WGCNA)
projectiveKMeans() belongs to R package: WGCNA
Projective K-means (pre-)clustering of expression data
----------Description----------
Implementation of a variant of K-means clustering for expression data.
----------Usage----------
projectiveKMeans(
  datExpr,
  preferredSize = 5000,
  nCenters = as.integer(min(ncol(datExpr)/20, preferredSize^2/ncol(datExpr))),
  sizePenaltyPower = 4,
  networkType = "unsigned",
  randomSeed = 54321,
  checkData = TRUE,
  maxIterations = 1000,
  verbose = 0, indent = 0)
----------Arguments----------
Argument: datExpr
expression data. A data frame in which columns are genes and rows are samples. NAs are allowed, but not too many.
Argument: preferredSize
preferred maximum size of clusters.
Argument: nCenters
number of initial clusters. Empirical evidence suggests that more centers will give a better preclustering; the default is an attempt to arrive at a reasonable number.
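To see what the default works out to, the expression from the usage signature can be evaluated for a few gene counts. This is only a sketch: defaultNCenters is a hypothetical helper (not part of WGCNA), and nGenes stands in for ncol(datExpr).

```r
# Hypothetical helper that reproduces the default nCenters expression
# from the usage signature; nGenes plays the role of ncol(datExpr).
defaultNCenters <- function(nGenes, preferredSize = 5000) {
  as.integer(min(nGenes / 20, preferredSize^2 / nGenes))
}

defaultNCenters(20000)    # min(1000, 1250)  -> 1000
defaultNCenters(100000)   # min(5000, 250)   -> 250
defaultNCenters(200000)   # min(10000, 125)  -> 125
```

For moderate gene counts the first term (one center per 20 genes) is the smaller one; for very large data sets the second term takes over and the number of centers decreases as the gene count grows.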
Argument: sizePenaltyPower
parameter specifying how severe the penalty is for clusters that exceed preferredSize.
Argument: networkType
network type. Allowed values are (unique abbreviations of) "unsigned", "signed", "signed hybrid". See adjacency.
Argument: randomSeed
integer to be used as seed for the random number generator before the function starts. If a current seed exists, it is saved and restored upon exit.
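The save-and-restore behaviour described for randomSeed follows a common base R pattern, sketched below. withSeed is a hypothetical helper written for illustration; it is not taken from the WGCNA source and may differ from what the package actually does.

```r
# Hypothetical helper: evaluate expr under a fixed seed, then put the
# caller's random number generator state back the way it was.
withSeed <- function(seed, expr) {
  hadSeed <- exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
  if (hadSeed) savedSeed <- get(".Random.seed", envir = .GlobalEnv)
  on.exit({
    if (hadSeed) {
      # Restore the seed that existed before the call
      assign(".Random.seed", savedSeed, envir = .GlobalEnv)
    } else if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) {
      # No seed existed before, so remove the one created here
      rm(".Random.seed", envir = .GlobalEnv)
    }
  })
  set.seed(seed)
  expr  # lazily evaluated here, i.e. after set.seed()
}

set.seed(1)
before <- get(".Random.seed", envir = .GlobalEnv)
a <- withSeed(54321, sample(10, 3))
b <- withSeed(54321, sample(10, 3))
sameDraw <- identical(a, b)
stateRestored <- identical(before, get(".Random.seed", envir = .GlobalEnv))
```

Both sameDraw and stateRestored should be TRUE: the same seed yields the same draw, and the caller's generator state is unchanged afterwards.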
Argument: checkData
logical: should data be checked for genes with zero variance and for genes and samples with excessive numbers of missing values? Bad samples are ignored; returned cluster assignment for bad genes will be NA.
Argument: maxIterations
maximum number of iterations to be attempted.
Argument: verbose
integer level of verbosity. Zero means silent; higher values make the output progressively more verbose.
Argument: indent
indentation for diagnostic messages. Zero means no indentation; each unit adds two spaces.
----------Details----------
The principal aim of this function within WGCNA is to pre-cluster a large number of genes into smaller blocks that can be handled using standard WGCNA techniques.
This function implements a variant of K-means clustering that is suitable for co-expression analysis. Cluster centers are defined by the first principal component, and distances by correlation (more precisely, 1-correlation). The distance between a gene and a cluster is multiplied by a factor of max(clusterSize/preferredSize, 1)^sizePenaltyPower, thus penalizing clusters whose size exceeds preferredSize. The function starts with a randomly generated cluster assignment (hence the need to set the random seed for repeatability) and executes iterations of calculating new centers and reassigning genes to the nearest center until the clustering becomes stable. Before returning, nearby clusters are iteratively combined if their combined size is below preferredSize.
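The size penalty and the reassignment step can be illustrated with a small base R sketch. It is not the WGCNA implementation: the toy data, the two centers and the cluster sizes are invented for illustration, the distance is taken literally as 1-correlation, and any handling specific to networkType is left out.

```r
# Toy data: 20 samples (rows) x 6 genes (columns), laid out like datExpr.
set.seed(1)
expr <- matrix(rnorm(20 * 6), nrow = 20)
centers <- matrix(rnorm(20 * 2), nrow = 20)   # two hypothetical cluster centers

# Hypothetical current cluster sizes and settings
clusterSize <- c(6000, 3000)
preferredSize <- 5000
sizePenaltyPower <- 4

# Gene-to-center distance, 1 - correlation (genes x clusters matrix)
dst <- 1 - cor(expr, centers)

# Penalty factor per cluster: max(clusterSize/preferredSize, 1)^sizePenaltyPower
penalty <- pmax(clusterSize / preferredSize, 1)^sizePenaltyPower

# Multiply each cluster's column of distances by its penalty,
# then assign every gene to the center with the smallest penalized distance.
penalized <- sweep(dst, 2, penalty, `*`)
assignment <- apply(penalized, 1, which.min)
```

Here the first cluster is 20% over preferredSize, so its distances are multiplied by 1.2^4 = 2.0736, while the second cluster, being below preferredSize, gets a factor of 1.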
The standard principal component calculation via the function svd fails from time to time (likely a convergence problem of the underlying LAPACK functions). Such errors are trapped and the principal component is approximated by a weighted average of expression profiles in the cluster. If verbose is set above 2, an informational message is printed whenever this approximation is used.
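The trap-and-approximate idea can be sketched with tryCatch. firstPC below is a hypothetical helper, not the package's code. In particular, the weights (absolute correlation of each gene with the mean profile) are an assumption, since the text above does not say how the weighted average is formed, and the decompose argument exists only so that a failure can be simulated.

```r
# Hypothetical helper: first principal component of a samples x genes
# matrix x (no missing values assumed), with a fallback if SVD fails.
firstPC <- function(x, decompose = svd, verbose = 0) {
  xs <- scale(x)  # center and scale each gene
  tryCatch(
    decompose(xs, nu = 1, nv = 0)$u[, 1],
    error = function(e) {
      if (verbose > 2) message("SVD failed; using weighted-average approximation")
      # Assumed weighting: absolute correlation with the mean profile
      w <- abs(as.vector(cor(xs, rowMeans(xs))))
      as.vector(xs %*% w) / sum(w)
    }
  )
}

set.seed(2)
x <- matrix(rnorm(20 * 5), nrow = 20)
pc <- firstPC(x)
# Simulate a decomposition failure to exercise the fallback branch
approxPC <- firstPC(x, decompose = function(...) stop("simulated LAPACK failure"))
```

Both calls return one value per sample; the second comes from the weighted-average branch because the injected decomposition always raises an error.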
----------Value----------
A list with the following components:
Component: clusters
a numerical vector with one component per input gene, giving the number of the cluster to which the gene is assigned.
Component: centers
cluster centers, that is, their first principal components.
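A minimal usage sketch on simulated data follows. The data are random noise generated only for illustration, the values chosen for preferredSize and nCenters are arbitrary, and the call is guarded so that the snippet still runs when WGCNA is not installed.

```r
# Simulated expression data: 30 samples (rows) x 400 genes (columns)
set.seed(3)
nSamples <- 30
nGenes <- 400
datExpr <- as.data.frame(matrix(rnorm(nSamples * nGenes), nrow = nSamples))

if (requireNamespace("WGCNA", quietly = TRUE)) {
  pkm <- WGCNA::projectiveKMeans(
    datExpr,
    preferredSize = 100,   # arbitrary small value for this toy example
    nCenters = 8,          # arbitrary
    randomSeed = 54321,
    verbose = 0)
  # Number of genes per cluster (NA marks genes flagged as bad)
  print(table(pkm$clusters, useNA = "ifany"))
  # Cluster centers (first principal components)
  print(dim(pkm$centers))
}
```

The clusters component can then be used to split the columns of datExpr into blocks for subsequent analysis.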
----------Author(s)----------
Peter Langfelder