R language, WGCNA package: help documentation for the consensusProjectiveKMeans() function

loveR · posted 2012-10-1 21:10:57

consensusProjectiveKMeans(WGCNA)
consensusProjectiveKMeans() belongs to R package: WGCNA

Consensus projective K-means (pre-)clustering of expression data

Translator: LoveR, the bot of 生物统计家园网

----------Description----------

Implementation of a consensus variant of K-means clustering for expression data across multiple data sets.

----------Usage----------

consensusProjectiveKMeans(
  multiExpr,
  preferredSize = 5000,
  nCenters = NULL,
  sizePenaltyPower = 4,
  networkType = "unsigned",
  randomSeed = 54321,
  checkData = TRUE,
  useMean = (length(multiExpr) > 3),
  maxIterations = 1000,
  verbose = 0, indent = 0)
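
A minimal sketch of a call, assuming the WGCNA package is installed. The data are simulated noise, and the helper makeSet, the set sizes, the gene count and the small preferredSize are illustrative choices rather than values from this help page.

```r
# Simulated data only: two sets with the same 200 genes and different sample counts.
set.seed(1)
nGenes <- 200
makeSet <- function(nSamples) {           # hypothetical helper, not part of WGCNA
  x <- matrix(rnorm(nSamples * nGenes), nrow = nSamples, ncol = nGenes)
  colnames(x) <- paste0("gene", seq_len(nGenes))
  list(data = x)                          # each set is a list with a component `data`
}
multiExpr <- list(makeSet(20), makeSet(25))

# Only attempt the call when WGCNA is available.
if (requireNamespace("WGCNA", quietly = TRUE)) {
  res <- WGCNA::consensusProjectiveKMeans(
    multiExpr,
    preferredSize = 50,       # deliberately small so that several blocks can form
    networkType = "unsigned",
    randomSeed = 54321,
    verbose = 0)
  print(table(res$clusters, useNA = "ifany"))
}
```

On real data preferredSize would normally stay near its default; it is reduced here only because the toy data set is tiny.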

----------Arguments----------

Argument: multiExpr
expression data in the multi-set format (see checkSets). A vector of lists, one per set. Each set must contain a component named data that holds the expression data, with rows corresponding to samples and columns to genes or probes.
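
As an illustration of the multi-set format described above, the following base R sketch builds a two-set multiExpr object by hand. The set names, sample names and values are invented, and the check that every set has the same number of genes reflects my understanding of what checkSets expects rather than a statement from this page.

```r
set.seed(2)
genes <- paste0("gene", 1:6)

# Rows are samples, columns are genes (or probes).
exprA <- matrix(rnorm(4 * 6), nrow = 4, dimnames = list(paste0("A", 1:4), genes))
exprB <- matrix(rnorm(5 * 6), nrow = 5, dimnames = list(paste0("B", 1:5), genes))

# One list per set, each with a component called `data`.
multiExpr <- list(setA = list(data = exprA), setB = list(data = exprB))

nGenesPerSet   <- sapply(multiExpr, function(s) ncol(s$data))
nSamplesPerSet <- sapply(multiExpr, function(s) nrow(s$data))

# The number of samples may differ between sets; the genes should match.
stopifnot(length(unique(nGenesPerSet)) == 1)
```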

Argument: preferredSize
preferred maximum size of clusters.

Argument: nCenters
number of initial clusters. Empirical evidence suggests that more centers will give a better preclustering; the default is as.integer(min(nGenes/20, preferredSize^2/nGenes)) and is an attempt to arrive at a reasonable number given the resources available.
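
To make the default concrete, the formula quoted above can be evaluated directly. The function name defaultNCenters is hypothetical and exists only for this illustration.

```r
# Default number of initial centers, as given in the argument description.
defaultNCenters <- function(nGenes, preferredSize = 5000) {
  as.integer(min(nGenes / 20, preferredSize^2 / nGenes))
}

defaultNCenters(20000)   # min(1000, 1250) gives 1000
defaultNCenters(50000)   # min(2500, 500) gives 500
```

For moderate gene counts the nGenes/20 term is the smaller one; for very large inputs the preferredSize^2/nGenes term takes over and the number of centers shrinks.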

Argument: sizePenaltyPower
parameter specifying how severe the penalty is for clusters whose size exceeds preferredSize.

Argument: networkType
network type. Allowed values are (unique abbreviations of) "unsigned", "signed", "signed hybrid". See adjacency.

Argument: randomSeed
integer to be used as seed for the random number generator before the function starts. If a current seed exists, it is saved and restored upon exit.
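
The save-and-restore behaviour described for randomSeed can be sketched in base R as follows. The helper withSeed is hypothetical and is not the package's internal code; it only illustrates the pattern of setting a seed without disturbing the caller's random number stream.

```r
# Evaluate `expr` under a fixed seed, then put the caller's RNG state back.
withSeed <- function(seed, expr) {
  hadSeed <- exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
  if (hadSeed) oldSeed <- get(".Random.seed", envir = .GlobalEnv)
  on.exit({
    if (hadSeed) {
      assign(".Random.seed", oldSeed, envir = .GlobalEnv)
    } else if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) {
      rm(".Random.seed", envir = .GlobalEnv)
    }
  })
  set.seed(seed)
  force(expr)   # lazy evaluation: `expr` is evaluated here, after set.seed()
}

set.seed(1)
first  <- withSeed(54321, runif(3))
second <- withSeed(54321, runif(3))   # same seed, same draws
```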

Argument: checkData
logical: should data be checked for genes with zero variance and for genes and samples with excessive numbers of missing values? Bad samples are ignored; the returned cluster assignment for bad genes will be NA.
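
The kind of check that checkData refers to can be approximated in base R as below. This is only a sketch under assumptions: the threshold maxMissingFraction is a hypothetical value, the toy matrix and cluster labels are invented, and the package's own checking routine may apply different rules.

```r
# Toy expression matrix: 4 samples (rows) by 4 genes (columns).
x <- matrix(c(1, 2, 3, 4,      # gene 1: fine
              5, 5, 5, 5,      # gene 2: zero variance
              1, NA, NA, NA,   # gene 3: mostly missing
              2, 4, 1, 3),     # gene 4: fine
            nrow = 4)

maxMissingFraction <- 0.5                      # ASSUMPTION: illustrative threshold
fracMissing <- colMeans(is.na(x))
variance    <- apply(x, 2, var, na.rm = TRUE)
goodGene    <- fracMissing <= maxMissingFraction & !is.na(variance) & variance > 0

# Bad genes receive NA as their cluster assignment.
clusters <- rep(NA_integer_, ncol(x))
clusters[goodGene] <- c(1L, 2L)                # invented labels for the good genes
```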

Argument: useMean
logical: should the mean distance across sets be used instead of the maximum? See details.

Argument: maxIterations
maximum number of iterations to be attempted.

Argument: verbose
integer level of verbosity. Zero means silent; higher values make the output progressively more verbose.

Argument: indent
indentation for diagnostic messages. Zero means no indentation; each unit adds two spaces.

----------Details----------

The principal aim of this function within WGCNA is to pre-cluster a large number of genes into smaller blocks that can be handled using standard WGCNA techniques.

This function implements a variant of K-means clustering that is suitable for co-expression analysis. Cluster centers are defined by the first principal component, and distances by correlation. Consensus distance across several sets is defined as the maximum of the corresponding distances in individual sets; however, if useMean is set, the mean distance is used instead of the maximum. The distance between a gene and the center of a cluster is multiplied by a factor of max(clusterSize/preferredSize, 1)^sizePenaltyPower, thus penalizing clusters whose size exceeds preferredSize. The function starts with a randomly generated cluster assignment (hence the need to set the random seed for repeatability) and executes iterations of calculating new centers and reassigning genes to the nearest (in the consensus sense) center until the clustering becomes stable. Before returning, nearby clusters are iteratively combined if their combined size is below preferredSize.
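
The building blocks of one iteration can be sketched for a single data set in base R. This is not the package's implementation. The distance 1 - |cor| is my assumption for the "unsigned" case, the helper names clusterCenter and sizePenalty are hypothetical, and the data are simulated.

```r
set.seed(3)
nSamples <- 12
expr <- matrix(rnorm(nSamples * 30), nrow = nSamples)   # 12 samples x 30 genes

# Initial assignment with unequal cluster sizes (14, 10 and 6 genes), shuffled.
membership <- sample(rep(1:3, times = c(14, 10, 6)))

# Cluster center: first principal component of the standardized profiles.
clusterCenter <- function(x) svd(scale(x), nu = 1, nv = 0)$u[, 1]
centers <- sapply(1:3, function(k) clusterCenter(expr[, membership == k, drop = FALSE]))

# ASSUMPTION: correlation-based distance for an unsigned network.
dst <- 1 - abs(cor(expr, centers))                       # 30 genes x 3 centers

# Size penalty factor from the text: max(clusterSize/preferredSize, 1)^sizePenaltyPower.
sizePenalty <- function(clusterSize, preferredSize, sizePenaltyPower = 4) {
  pmax(clusterSize / preferredSize, 1)^sizePenaltyPower
}
clusterSizes <- tabulate(membership, nbins = 3)          # 14, 10, 6
penalty <- sizePenalty(clusterSizes, preferredSize = 10) # only cluster 1 is penalized

# Reassign each gene to the center with the smallest penalized distance.
penalized <- sweep(dst, 2, penalty, `*`)
newMembership <- max.col(-penalized, ties.method = "first")
```

With the default power of 4, a cluster that is 20% over preferredSize already has its distances roughly doubled (1.2^4 = 2.0736), so oversized clusters lose genes quickly.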

Consensus distance defined as the maximum of distances in all sets is consistent with the approach taken in blockwiseConsensusModules, but the procedure may not converge. Hence it is advisable to use the mean as consensus in cases where there are multiple data sets (4 or more, say) and/or if the input data sets are very different.
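
The difference between the two consensus definitions is easy to see on a small table of distances. The numbers below are invented for illustration; useMeanDefault is a hypothetical name that mirrors the default useMean = (length(multiExpr) > 3) shown in the usage.

```r
# Hypothetical distances of 4 genes to one cluster center in 3 data sets.
distBySet <- cbind(set1 = c(0.10, 0.50, 0.30, 0.90),
                   set2 = c(0.20, 0.40, 0.80, 0.10),
                   set3 = c(0.15, 0.45, 0.10, 0.20))

consensusMax  <- apply(distBySet, 1, max)   # 0.20 0.50 0.80 0.90
consensusMean <- rowMeans(distBySet)        # 0.15 0.45 0.40 0.40

# Default rule for useMean: the mean is used only with more than 3 sets.
useMeanDefault <- function(nSets) nSets > 3
```

A single discordant set dominates the maximum (genes 3 and 4 above), whereas the mean damps it. This may be part of why the mean is suggested when there are many sets or when the sets are very different.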

The standard principal component calculation via the function svd fails from time to time (likely a convergence problem in the underlying LAPACK functions). Such errors are trapped and the principal component is approximated by a weighted average of expression profiles in the cluster. If verbose is set above 2, an informational message is printed whenever this approximation is used.
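
The trap-and-approximate pattern described above might look roughly like this. The weights used in the fallback (absolute correlation with the plain mean profile) are a guess for illustration only, since the help text does not say how the weighted average is formed. robustCenter, svdFun and failingSvd are hypothetical names.

```r
robustCenter <- function(x, svdFun = svd, verbose = 0) {
  xs <- scale(x)                                   # standardize each gene profile
  tryCatch(
    svdFun(xs, nu = 1, nv = 0)$u[, 1],             # first principal component
    error = function(e) {
      if (verbose > 2) message("svd failed; using a weighted average instead")
      meanProfile <- rowMeans(xs)
      w <- abs(as.vector(cor(xs, meanProfile)))    # ASSUMPTION: illustrative weights
      ctr <- as.vector(xs %*% w) / sum(w)          # weighted average of profiles
      ctr / sqrt(sum(ctr^2))                       # unit length, like an svd vector
    })
}

set.seed(4)
x <- matrix(rnorm(10 * 8), nrow = 10)              # 10 samples x 8 genes

# Simulate an svd failure to exercise the fallback branch.
failingSvd <- function(...) stop("simulated LAPACK failure")
viaSvd      <- robustCenter(x)
viaFallback <- robustCenter(x, svdFun = failingSvd)
```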

----------Value----------

A list with the following components:

Component: clusters
a numerical vector with one component per input gene, giving the number of the cluster to which the gene is assigned.

Component: centers
a vector of lists, one list per set. Each list contains a component named data that holds a matrix whose columns are the cluster centers in the corresponding set.

Component: unmergedClusters
a numerical vector with one component per input gene, giving the number of the cluster to which the gene was assigned before the final merging step.

Component: unmergedCenters
a vector of lists, one list per set. Each list contains a component named data that holds a matrix whose columns are the cluster centers before merging in the corresponding set.
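
Since the stated aim is to split genes into blocks for further analysis, a typical next step is to group gene names by the clusters component. The sketch below uses a mock result with invented labels rather than real function output.

```r
# Mock of the `clusters` component (values invented; NA marks a bad gene).
res <- list(clusters = c(1, 1, 2, NA, 2, 3, 1, 3))
geneNames <- paste0("gene", seq_along(res$clusters))

blocks     <- split(geneNames, res$clusters)         # split() drops the NA gene
blockSizes <- table(res$clusters, useNA = "ifany")   # sizes, including the NA count
badGenes   <- geneNames[is.na(res$clusters)]
```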

----------Author(s)----------

Peter Langfelder

----------See Also----------

projectiveKMeans

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

