R package WGCNA: help documentation for the blockwiseModules() function

loveR · posted 2012-10-1 21:07:45

blockwiseModules(WGCNA)
blockwiseModules() belongs to the R package: WGCNA

                                       Automatic network construction and module detection

                                       Translator: LoveR, the 生物统计家园网 bot

----------Description----------

This function performs automatic network construction and module detection on large expression datasets in a block-wise manner.

----------Usage----------

blockwiseModules(
  datExpr,
  blocks = NULL,
  maxBlockSize = 5000,
  randomSeed = 12345,
  corType = "pearson",
  power = 6,
  networkType = "unsigned",
  TOMType = "signed",
  TOMDenom = "min",
  deepSplit = 2,
  detectCutHeight = 0.995, minModuleSize = min(20, ncol(datExpr)/2 ),
  maxCoreScatter = NULL, minGap = NULL,
  maxAbsCoreScatter = NULL, minAbsGap = NULL,
  pamStage = TRUE, pamRespectsDendro = TRUE,
  minCoreKME = 0.5, minCoreKMESize = minModuleSize/3,
  minKMEtoStay = 0.3,
  reassignThreshold = 1e-6,
  mergeCutHeight = 0.15, impute = TRUE,
  getTOMs = NULL,
  saveTOMs = FALSE,
  saveTOMFileBase = "blockwiseTOM",
  trapErrors = FALSE, numericLabels = FALSE,
  checkMissingData = TRUE,
  maxPOutliers = 1,
  quickCor = 0,
  pearsonFallback = "individual",
  cosineCorrelation = FALSE,
  nThreads = 0,
  verbose = 0, indent = 0)
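
As a rough illustration of the call signature above, the sketch below runs blockwiseModules() on simulated data. The data set, the gene names, and the chosen argument values are assumptions made for this example only; the WGCNA part is skipped when the package is not installed.

```r
# Simulated data (hypothetical): 40 samples x 120 genes, with two groups of
# 40 genes that each follow a shared latent signal, plus 40 pure-noise genes.
set.seed(1)
nSamples <- 40
signal1 <- rnorm(nSamples)
signal2 <- rnorm(nSamples)
module1 <- sapply(1:40, function(i) signal1 + rnorm(nSamples, sd = 0.5))
module2 <- sapply(1:40, function(i) signal2 + rnorm(nSamples, sd = 0.5))
noise   <- matrix(rnorm(nSamples * 40), nrow = nSamples)
datExpr <- as.data.frame(cbind(module1, module2, noise))  # rows = samples
names(datExpr) <- paste0("gene", seq_len(ncol(datExpr)))

# Only run the WGCNA part when the package is installed.
if (requireNamespace("WGCNA", quietly = TRUE)) {
  library(WGCNA)
  net <- blockwiseModules(datExpr, power = 6, minModuleSize = 20,
                          mergeCutHeight = 0.15, numericLabels = TRUE,
                          verbose = 0)
  print(table(net$colors))   # module sizes; label 0 marks unassigned genes
  print(dim(net$MEs))        # samples x module eigengenes
}
```

With real data the soft-thresholding power would normally be chosen from the data rather than fixed at 6; the value here simply mirrors the default in the usage block.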

----------Arguments----------

Argument: datExpr
expression data. A data frame in which columns are genes and rows are samples. NAs are allowed, but not too many.

Argument: blocks
optional specification of blocks in which hierarchical clustering and module detection should be performed. If given, must be a numeric vector with one entry per column (gene) of datExpr giving the number of the block to which the corresponding gene belongs.

Argument: maxBlockSize
integer giving maximum block size for module detection. Ignored if blocks above is non-NULL. Otherwise, if the number of genes in datExpr exceeds maxBlockSize, genes will be pre-clustered into blocks whose size should not exceed maxBlockSize.
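
To make the expected format of blocks concrete, here is a minimal sketch that builds a block vector by hand. The gene count is hypothetical, and the sequential split is only an illustration of the format: when blocks is not supplied, the function pre-clusters genes itself rather than cutting them in column order.

```r
nGenes <- 12000          # hypothetical number of columns (genes) in datExpr
maxBlockSize <- 5000

# Sequential split: genes 1..5000 -> block 1, 5001..10000 -> block 2, ...
blocks <- ceiling(seq_len(nGenes) / maxBlockSize)

# One entry per gene, and no block larger than maxBlockSize.
table(blocks)
```

A vector like this could be passed as the blocks argument, in which case maxBlockSize is ignored.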

Argument: randomSeed
integer to be used as seed for the random number generator before the function starts. If a current seed exists, it is saved and restored upon exit. If NULL is given, the function will not save and restore the seed.

Argument: corType
character string specifying the correlation to be used. Allowed values are (unique abbreviations of) "pearson" and "bicor", corresponding to Pearson and biweight midcorrelation, respectively. Missing values are handled using the pairwise.complete.obs option.

Argument: power
soft-thresholding power for network construction.

Argument: networkType
network type. Allowed values are (unique abbreviations of) "unsigned", "signed", "signed hybrid". See adjacency.
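
The interplay of power and networkType can be sketched in base R. The helper name softThreshold is hypothetical; the three formulas follow the definitions documented for adjacency, and the example only transforms a handful of correlation values rather than a full matrix.

```r
# Hypothetical helper: turn correlations into adjacencies by soft thresholding.
softThreshold <- function(r, power = 6,
                          type = c("unsigned", "signed", "signed hybrid")) {
  type <- match.arg(type)
  switch(type,
         "unsigned"      = abs(r)^power,              # sign is discarded
         "signed"        = ((1 + r) / 2)^power,       # negative r maps near 0
         "signed hybrid" = ifelse(r > 0, r^power, 0)) # negative r maps to 0
}

r <- c(-0.9, -0.5, 0, 0.5, 0.9)
sapply(c("unsigned", "signed", "signed hybrid"),
       function(tp) softThreshold(r, power = 6, type = tp))
```

Raising to a power above 1 shrinks weak correlations much more than strong ones, which is why the choice of power controls how sparse the resulting network looks.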

Argument: TOMType
one of "none", "unsigned", "signed". If "none", adjacency will be used for clustering. If "unsigned", the standard TOM will be used (more generally, TOM function will receive the adjacency as input). If "signed", TOM will keep track of the sign of correlations between neighbors.

Argument: TOMDenom
a character string specifying the TOM variant to be used. Recognized values are "min" giving the standard TOM described in Zhang and Horvath (2005), and "mean" in which the min function in the denominator is replaced by mean. The "mean" variant may produce better results but at this time should be considered experimental.
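
The difference between the "min" and "mean" denominators can be illustrated with a small base-R version of the unsigned topological overlap. This is a sketch under the usual Zhang and Horvath (2005) definition, not the package's own implementation; tomFromAdjacency and the simulated matrix are hypothetical.

```r
# Hypothetical helper: unsigned topological overlap from an adjacency matrix.
tomFromAdjacency <- function(adj, denom = c("min", "mean")) {
  denom <- match.arg(denom)
  diag(adj) <- 0                        # ignore self-connections
  k <- rowSums(adj)                     # connectivity of each node
  shared <- adj %*% adj                 # sum over u of a_iu * a_uj
  kPair <- if (denom == "min") {
    outer(k, k, pmin)                   # standard TOM
  } else {
    outer(k, k, function(a, b) (a + b) / 2)   # "mean" variant
  }
  tom <- (shared + adj) / (kPair + 1 - adj)
  diag(tom) <- 1
  tom
}

set.seed(2)
x <- matrix(rnorm(30 * 6), nrow = 30)   # 30 samples, 6 genes
adj <- abs(cor(x))^6                    # unsigned adjacency with power 6
tomMin  <- tomFromAdjacency(adj, "min")
tomMean <- tomFromAdjacency(adj, "mean")
round(tomMin[1:3, 1:3], 4)
```

Because the mean of two connectivities is never smaller than their minimum, the "mean" variant can only lower the off-diagonal overlaps relative to "min".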

Argument: deepSplit
integer value between 0 and 4. Provides a simplified control over how sensitive module detection should be to module splitting, with 0 least and 4 most sensitive. See cutreeDynamic for more details.

Argument: detectCutHeight
dendrogram cut height for module detection. See cutreeDynamic for more details.

Argument: minModuleSize
minimum module size for module detection. See cutreeDynamic for more details.

Argument: maxCoreScatter
maximum scatter of the core for a branch to be a cluster, given as the fraction of cutHeight relative to the 5th percentile of joining heights. See cutreeDynamic for more details.

Argument: minGap
minimum cluster gap given as the fraction of the difference between cutHeight and the 5th percentile of joining heights. See cutreeDynamic for more details.

Argument: maxAbsCoreScatter
maximum scatter of the core for a branch to be a cluster given as absolute heights. If given, overrides maxCoreScatter. See cutreeDynamic for more details.

Argument: minAbsGap
minimum cluster gap given as absolute height difference. If given, overrides minGap. See cutreeDynamic for more details.

Argument: pamStage
logical. If TRUE, the second (PAM-like) stage of module detection will be performed. See cutreeDynamic for more details.

Argument: pamRespectsDendro
logical, only used when pamStage is TRUE. If TRUE, the PAM stage will respect the dendrogram in the sense that an object can be PAM-assigned only to clusters that lie below it on the branch that the object is merged into. See cutreeDynamic for more details.

Argument: minCoreKME
a number between 0 and 1. If a detected module does not have at least minCoreKMESize genes with eigengene connectivity at least minCoreKME, the module is disbanded (its genes are unlabeled and returned to the pool of genes waiting for module detection).

Argument: minCoreKMESize
see minCoreKME above.

Argument: minKMEtoStay
genes whose eigengene connectivity to their module eigengene is lower than minKMEtoStay are removed from the module.

Argument: reassignThreshold
p-value ratio threshold for reassigning genes between modules. See Details.

Argument: mergeCutHeight
dendrogram cut height for module merging.

Argument: impute
logical: should imputation be used for module eigengene calculation? See moduleEigengenes for more details.

Argument: getTOMs
deprecated, please use saveTOMs below.

Argument: saveTOMs
logical: should the topological overlap matrices for each block be saved and returned?

Argument: saveTOMFileBase
character string containing the file name base for files containing the topological overlaps. The full file names have "block.1.RData", "block.2.RData" etc. appended. These files are standard R data files and can be loaded using the load function.

Argument: trapErrors
logical: should errors in calculations be trapped?

Argument: numericLabels
logical: should the returned modules be labeled by colors (FALSE), or by numbers (TRUE)?

Argument: checkMissingData
logical: should data be checked for excessive numbers of missing entries in genes and samples, and for genes with zero variance? See Details.

Argument: maxPOutliers
only used for corType=="bicor". Specifies the maximum percentile of data that can be considered outliers on either side of the median separately. For each side of the median, if a higher percentile than maxPOutliers is considered an outlier by the weight function based on 9*mad(x), the width of the weight function is increased such that the percentile of outliers on that side of the median equals maxPOutliers. Using maxPOutliers=1 will effectively disable all weight function broadening; using maxPOutliers=0 will give results that are quite similar (but not equal) to Pearson correlation.
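
To show what the weight function based on 9*mad(x) does, here is a simplified base-R sketch of a biweight midcorrelation. It is not the package's bicor: the helper bicorSketch is hypothetical, it assumes the raw median absolute deviation (constant = 1), and it implements neither the maxPOutliers broadening nor any fallback for zero mad.

```r
# Hypothetical, simplified biweight midcorrelation for two numeric vectors.
bicorSketch <- function(x, y) {
  weighted <- function(v) {
    u <- (v - median(v)) / (9 * mad(v, constant = 1))  # raw MAD: an assumption
    w <- ifelse(abs(u) < 1, (1 - u^2)^2, 0)            # weight 0 beyond 9*MAD
    (v - median(v)) * w
  }
  a <- weighted(x)
  b <- weighted(y)
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

x <- 1:20
y <- x
y[20] <- 200                  # a single gross outlier
c(pearson = cor(x, y), bicor = bicorSketch(x, y))
```

The outlier receives zero weight in the robust calculation, so the robust estimate stays much closer to 1 than the Pearson correlation does.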

Argument: quickCor
real number between 0 and 1 that controls the handling of missing data in the calculation of correlations. See Details.

Argument: pearsonFallback
Specifies whether the bicor calculation, if used, should revert to Pearson when median absolute deviation (mad) is zero. Recognized values are (abbreviations of) "none", "individual", "all". If set to "none", zero mad will result in NA for the corresponding correlation. If set to "individual", Pearson calculation will be used only for columns that have zero mad. If set to "all", the presence of a single zero mad will cause the whole variable to be treated in Pearson correlation manner (as if the corresponding robust option was set to FALSE). Has no effect for Pearson correlation. See bicor.

Argument: cosineCorrelation
logical: should the cosine version of the correlation calculation be used? The cosine calculation differs from the standard one in that it does not subtract the mean.

Argument: nThreads
non-negative integer specifying the number of parallel threads to be used by certain parts of correlation calculations. This option only has an effect on systems on which a POSIX thread library is available (which currently includes Linux and Mac OSX, but excludes Windows). If zero, the number of online processors will be used if it can be determined dynamically, otherwise correlation calculations will use 2 threads.

Argument: verbose
integer level of verbosity. Zero means silent, higher values make the output progressively more and more verbose.

Argument: indent
indentation for diagnostic messages. Zero means no indentation, each unit adds two spaces.

----------Details----------

Before module detection starts, genes and samples are optionally checked for the presence of NAs. Genes and/or samples that have too many NAs are flagged as bad and removed from the analysis; bad genes will be automatically labeled as unassigned, while the returned eigengenes will have NA entries for all bad samples.

If blocks is not given and the number of genes exceeds maxBlockSize, genes are pre-clustered into blocks using the function projectiveKMeans; otherwise all genes are treated in a single block.

For each block of genes, the network is constructed and (if requested) topological overlap is calculated. If requested, the topological overlaps are returned as part of the return value list. Genes are then clustered using average linkage hierarchical clustering and modules are identified in the resulting dendrogram by the Dynamic Hybrid tree cut. Found modules are trimmed of genes whose correlation with module eigengene (KME) is less than minKMEtoStay. Modules in which fewer than minCoreKMESize genes have KME higher than minCoreKME are disbanded, i.e., their constituent genes are declared unassigned.
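
The KME-based trimming and disbanding rules described above can be sketched in base R for a single candidate module. Everything here is an illustrative assumption: the simulated genes, the use of the first singular vector as the module eigengene, and the threshold values, which simply copy the defaults from the usage block with minModuleSize = 20.

```r
set.seed(3)
nSamples <- 50
signal <- rnorm(nSamples)
core <- sapply(1:10, function(i) signal + rnorm(nSamples, sd = 0.3))
colnames(core) <- paste0("core", 1:10)
module <- cbind(core, anti = -signal)   # "anti" opposes the module trend

# Module eigengene: first left singular vector of the scaled data,
# oriented to correlate positively with the average expression.
scaled <- scale(module)
me <- svd(scaled)$u[, 1]
if (cor(me, rowMeans(scaled)) < 0) me <- -me

kme <- cor(module, me)[, 1]             # correlation of each gene with the eigengene

minKMEtoStay   <- 0.3
minCoreKME     <- 0.5
minCoreKMESize <- 20 / 3                # minModuleSize / 3

keep    <- kme >= minKMEtoStay                     # genes that stay in the module
disband <- sum(kme > minCoreKME) < minCoreKMESize  # too few core genes?
round(kme, 2)
```

In this toy module the gene that runs against the shared signal is trimmed, while the module itself has enough high-KME genes to survive.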

After all blocks have been processed, the function checks whether there are genes whose KME in the module they are assigned to is lower than their KME to another module. If p-values of the higher correlations are smaller than those of the native module by the factor reassignThreshold, the gene is re-assigned to the closer module.

In the last step, modules whose eigengenes are highly correlated are merged. This is achieved by clustering module eigengenes using the dissimilarity given by one minus their correlation, cutting the dendrogram at the height mergeCutHeight and merging all modules on each branch. The process is iterated until no modules are merged. See mergeCloseModules for more details on module merging.
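
A single pass of the merging step can be sketched with base R clustering functions. The eigengenes below are simulated, average linkage is an assumption of this sketch, and the real procedure repeats the step with recomputed eigengenes until nothing more merges.

```r
set.seed(4)
nSamples <- 50
me1 <- rnorm(nSamples)
MEs <- data.frame(ME1 = me1,
                  ME2 = me1 + rnorm(nSamples, sd = 0.1),  # near-duplicate of ME1
                  ME3 = rnorm(nSamples))                  # unrelated eigengene

mergeCutHeight <- 0.15
meDiss <- as.dist(1 - cor(MEs))               # dissimilarity = 1 - correlation
meTree <- hclust(meDiss, method = "average")  # linkage choice is an assumption
mergeGroups <- cutree(meTree, h = mergeCutHeight)
mergeGroups   # modules sharing a group number would be merged
```

A mergeCutHeight of 0.15 therefore corresponds to merging modules whose eigengenes correlate at roughly 0.85 or higher.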

The argument quickCor specifies the precision of handling of missing data in the correlation calculations. Zero will cause all calculations to be executed precisely, which may be significantly slower than calculations without missing data. Progressively higher values will speed up the calculations but introduce progressively larger errors. Without missing data, all column means and variances can be pre-calculated before the covariances are calculated. When missing data are present, exact calculations require the column means and variances to be calculated for each covariance. The approximate calculation uses the pre-calculated mean and variance and simply ignores missing data in the covariance calculation. If the number of missing data is high, the pre-calculated means and variances may be very different from the actual ones, thus potentially introducing large errors. The quickCor value times the number of rows specifies the maximum difference in the number of missing entries for mean and variance calculations on the one hand and covariance on the other hand that will be tolerated before a recalculation is triggered. The hope is that if only a few missing data are treated approximately, the error introduced will be small but the potential speedup can be significant.

----------Value----------

A list with the following components:

Component: colors
a vector of color or numeric module labels for all genes.

Component: unmergedColors
a vector of color or numeric module labels for all genes before module merging.

Component: MEs
a data frame containing module eigengenes of the found modules (given by colors).

Component: goodSamples
numeric vector giving indices of good samples, that is, samples that do not have too many missing entries.

Component: goodGenes
numeric vector giving indices of good genes, that is, genes that do not have too many missing entries.

Component: dendrograms
a list whose components contain hierarchical clustering dendrograms of genes in each block.

Component: TOMFiles
if saveTOMs==TRUE, a vector of character strings, one string per block, giving the file names of files (relative to current directory) in which blockwise topological overlaps were saved.

Component: blockGenes
a list whose components give the indices of genes in each block.

Component: blocks
if input blocks was given, its copy; otherwise a vector of length equal to the number of genes giving the block label for each gene. Note that block labels are not necessarily sorted in the order in which the blocks were processed (since we do not require this for the input blocks). See blockOrder below.

Component: blockOrder
a vector giving the order in which blocks were processed and in which blockGenes above is returned. For example, blockOrder[1] contains the label of the first-processed block.
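
The relationship between blocks, blockOrder and blockGenes is easy to misread, so here is a small hand-built illustration. The list below is a hypothetical fragment of a return value, not actual output of the function.

```r
# Hypothetical fragment of a return value for 6 genes split into two blocks,
# where the block labeled 2 happened to be processed first.
net <- list(
  blocks     = c(2, 2, 1, 1, 2, 1),          # block label of each gene
  blockOrder = c(2, 1),                      # labels in processing order
  blockGenes = list(c(1, 2, 5), c(3, 4, 6))  # gene indices, same order
)

for (i in seq_along(net$blockOrder)) {
  label <- net$blockOrder[i]
  cat("processed", i, ": block", label, "-> genes", net$blockGenes[[i]], "\n")
  # blockGenes[[i]] lists exactly the genes whose label is blockOrder[i]
  stopifnot(identical(which(net$blocks == label),
                      as.integer(net$blockGenes[[i]])))
}
```

In other words, blockGenes is indexed by processing position, and blockOrder translates that position back into a block label.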

Component: MEsOK
logical indicating whether the module eigengenes were calculated without errors.

----------Note----------

If the input dataset has a large number of genes, consider carefully the maxBlockSize as it significantly affects the memory footprint (and whether the function will fail with a memory allocation error). From a theoretical point of view it is advantageous to use blocks as large as possible; on the other hand, using smaller blocks is substantially faster and often the only way to work with large numbers of genes. As a rough guide, it is unlikely that a standard desktop computer with 4GB memory or less will be able to work with blocks larger than 8000 genes.
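
A back-of-the-envelope calculation helps explain the memory guidance. The helper below is hypothetical and only counts one dense gene-by-gene matrix of doubles; the assumption that several such matrices may be needed at the same time during a run is mine, not a statement from the documentation.

```r
# Size in GiB of one dense nGenes x nGenes matrix of doubles (8 bytes each).
matrixGiB <- function(nGenes) nGenes^2 * 8 / 2^30

sizes <- sapply(c(5000, 8000, 20000), matrixGiB)
names(sizes) <- c("5000", "8000", "20000")
round(sizes, 2)
```

Since the size grows with the square of the block size, halving maxBlockSize cuts the size of each such matrix to a quarter.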

----------Author(s)----------

Peter Langfelder

----------References----------

----------See Also----------

goodSamplesGenes for basic quality control and filtering;

adjacency, TOMsimilarity for network construction;

hclust for hierarchical clustering;

cutreeDynamic for adaptive branch cutting in hierarchical clustering dendrograms;

mergeCloseModules for merging of close modules.

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

