modulePreservation(WGCNA)
modulePreservation() belongs to the R package WGCNA
Calculation of module preservation statistics
Translator: LoveR, the translation robot of 生物统计家园网
Description
Calculations of module preservation statistics between independent data sets.
Usage
modulePreservation(
   multiData,
   multiColor,
   dataIsExpr = TRUE,
   networkType = "unsigned",
   corFnc = "cor",
   corOptions = "use = 'p'",
   referenceNetworks = 1,
   nPermutations = 100,
   includekMEallInSummary = FALSE,
   restrictSummaryForGeneralNetworks = TRUE,
   calculateQvalue = FALSE,
   randomSeed = 12345,
   maxGoldModuleSize = 1000,
   maxModuleSize = 1000,
   quickCor = 1,
   ccTupletSize = 2,
   calculateCor.kIMall = FALSE,
   calculateClusterCoeff = FALSE,
   useInterpolation = FALSE,
   checkData = TRUE,
   greyName = NULL,
   savePermutedStatistics = TRUE,
   loadPermutedStatistics = FALSE,
   permutedStatisticsFile = if (useInterpolation) "permutedStats-intrModules.RData"
                            else "permutedStats-actualModules.RData",
   plotInterpolation = TRUE,
   interpolationPlotFile = "modulePreservationInterpolationPlots.pdf",
   discardInvalidOutput = TRUE,
   verbose = 1, indent = 0)
Arguments
Argument: multiData
expression data or adjacency data in the multi-set format (see checkSets): a vector of lists, one per set. Each set must contain a component data that holds the expression or adjacency data. If expression data are used, rows correspond to samples and columns to genes or probes. For adjacencies, each data matrix should be a symmetric matrix with entries between 0 and 1 and a unit diagonal. Each component of the outermost list should be named.
Argument: multiColor
a list in which every component is a vector giving the module labels of genes in multiData. The components must be named using the same names that are used in multiData; these names are used to match labels to expression data sets. See Details.
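The multi-set layout described for multiData and multiColor can be sketched in base R as follows. The set names ("ref", "test"), the gene names, the module labels, and the simulated values are all hypothetical; only the shape (a named list of sets, each with a data component, plus a list of label vectors under matching names) follows the descriptions above.

```r
set.seed(1)
n_genes <- 20
genes <- paste0("gene", seq_len(n_genes))

# Hypothetical helper: simulate a samples-by-genes expression matrix
# whose columns carry gene names.
sim_expr <- function(n_samples) {
  m <- matrix(rnorm(n_samples * n_genes), nrow = n_samples, ncol = n_genes)
  colnames(m) <- genes
  m
}

# One named component per set; each holds its matrix in a `data` component.
multiData <- list(
  ref  = list(data = sim_expr(10)),
  test = list(data = sim_expr(12))
)

# Module labels for the reference set only, stored under the same name
# as the corresponding expression set. "grey" marks unassigned genes.
multiColor <- list(
  ref = rep(c("blue", "brown", "turquoise", "grey"), each = 5)
)
```

Labels for the test set are left out here because the Details section describes them as optional.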
Argument: dataIsExpr
logical: if TRUE, multiData will be interpreted as expression data; if FALSE, multiData will be interpreted as adjacencies.
Argument: networkType
network type. Allowed values are (unique abbreviations of) "unsigned", "signed", "signed hybrid". See adjacency.
Argument: corFnc
character string specifying the function to be used to calculate co-expression similarity. Defaults to Pearson correlation. Another useful choice is bicor. More generally, any function returning values between -1 and 1 can be used.
Argument: corOptions
character string specifying additional arguments to be passed to the function given by corFnc. Use "use = 'p', method = 'spearman'" to obtain Spearman correlation.
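To make the corOptions string concrete, the sketch below shows what the Spearman option string amounts to when applied to base R's stats::cor. Pasting the string into a call and evaluating it is only an illustration; how modulePreservation handles the string internally is an assumption here, and the matrix x is made up.

```r
set.seed(2)
x <- matrix(rnorm(50), nrow = 10)
x[1, 1] <- NA  # one missing entry, so that the `use` setting matters

# Options written the way corOptions expects them: a single character string.
corOptions <- "use = 'p', method = 'spearman'"

# Illustration only: build the call text from the string and evaluate it.
call_text <- paste0("stats::cor(x, ", corOptions, ")")
via_string <- eval(parse(text = call_text))

# The same computation written out directly; 'p' is a unique abbreviation
# of "pairwise.complete.obs".
direct <- stats::cor(x, use = "pairwise.complete.obs", method = "spearman")
```

Both results are 5-by-5 correlation matrices and should agree.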
Argument: referenceNetworks
a vector giving the indices of expression data to be used as reference networks. Reference networks must have their module labels given in multiColor.
Argument: nPermutations
specifies the number of permutations that will be calculated in the permutation test.
Argument: includekMEallInSummary
logical: should cor.kMEall be included in the calculated summary statistics? Because kMEall takes into account all genes in the network, this statistic measures preservation of the full network with respect to the eigengene of the module. This may be undesirable, hence the default is FALSE.
Argument: restrictSummaryForGeneralNetworks
logical: should the summary statistics for general (not correlation) networks be restricted (density to meanAdj, connectivity to cor.kIM and cor.Adj)? The default TRUE corresponds to published work.
Argument: calculateQvalue
logical: should q-values (local FDR estimates) be calculated? Package qvalue must be installed for this calculation. Note that q-values may not be meaningful when the number of modules is small and/or most modules are preserved.
Argument: randomSeed
seed for the random number generator. If NULL, the seed will not be set. If non-NULL and the random generator has been initialized prior to the function call, the latter's state is saved and restored upon exit.
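The save-and-restore behaviour described for randomSeed can be imitated in base R. This is a minimal sketch of the idea, not the package's own code; the wrapper name with_seed is hypothetical.

```r
# Hypothetical wrapper: run `fun` under a fixed seed, then put the caller's
# random generator state back the way it was.
with_seed <- function(seed, fun) {
  if (!is.null(seed)) {
    initialized <- exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
    if (initialized) {
      saved <- get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
      # Restore the saved state when this function exits.
      on.exit(assign(".Random.seed", saved, envir = .GlobalEnv), add = TRUE)
    }
    set.seed(seed)
  }
  fun()
}

# Reference stream: two draws without any interruption.
set.seed(42)
expected <- runif(2)

# Same stream, interrupted by a seeded computation between the two draws.
set.seed(42)
first  <- runif(1)
inside <- with_seed(12345, function() runif(3))
second <- runif(1)
```

Because the generator state is restored on exit, first and second reproduce the uninterrupted stream, while inside is reproducible on its own.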
Argument: maxGoldModuleSize
maximum size of the "gold" module, i.e., the random sample of all network genes.
Argument: maxModuleSize
maximum module size used for calculations. Modules larger than maxModuleSize will be reduced by randomly sampling maxModuleSize genes.
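The size cap described for maxModuleSize amounts to random down-sampling of oversized modules. A minimal base R sketch follows; the function name reduce_module and the labels are hypothetical, and the package may implement the step differently.

```r
# Hypothetical helper: return the gene indices of one module, down-sampled
# at random if the module has more than maxModuleSize genes.
reduce_module <- function(labels, module, maxModuleSize) {
  idx <- which(labels == module)
  if (length(idx) > maxModuleSize) {
    idx <- sort(sample(idx, maxModuleSize))
  }
  idx
}

labels <- rep(c("blue", "brown"), times = c(15, 5))

set.seed(3)
kept_blue  <- reduce_module(labels, "blue",  maxModuleSize = 10)  # 15 genes cut to 10
kept_brown <- reduce_module(labels, "brown", maxModuleSize = 10)  # 5 genes, unchanged
```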
Argument: quickCor
number between 0 and 1 specifying the handling of missing data in calculation of correlation. Zero means exact but potentially slower calculations; one means potentially faster calculations, but with potentially inaccurate results if the proportion of missing data is large. See cor for more details.
Argument: ccTupletSize
tuplet size for co-clustering calculations.
Argument: calculateCor.kIMall
logical: should cor.kIMall be calculated? This option is only valid for adjacency input. If FALSE, cor.kIMall will not be calculated, potentially saving a significant amount of time if the input adjacencies are large and contain many modules.
Argument: calculateClusterCoeff
logical: should statistics based on the clustering coefficient be calculated? While these statistics may be interesting, the calculations are also computationally expensive.
Argument: checkData
logical: should data be checked for an excessive number of missing entries? See goodSamplesGenesMS for details.
Argument: greyName
label used for unassigned genes. Traditionally such genes are labeled by grey color or numeric label 0. These values are the default when multiColor contains character or numeric vectors, respectively.
Argument: savePermutedStatistics
logical: should calculated permutation statistics be saved? Saved statistics may be re-used if the calculation needs to be repeated.
Argument: permutedStatisticsFile
file name to save the permutation statistics into.
Argument: loadPermutedStatistics
logical: should permutation statistics be loaded? If a previously executed calculation needs to be repeated, loading permutation study results can cut the calculation time many-fold.
Argument: useInterpolation
logical: should permutation statistics be calculated by interpolating an artificial set of evenly spaced modules? This option may potentially speed up the calculations, but it restricts calculations to density measures.
Argument: plotInterpolation
logical: should interpolation plots be saved? If interpolation is used (see useInterpolation above), the function can optionally generate diagnostic plots that can be used to assess whether the interpolation makes sense.
Argument: interpolationPlotFile
file name to save the interpolation plots into.
Argument: discardInvalidOutput
logical: should output columns containing no valid data be discarded? This option may be useful when input dataIsExpr is FALSE and some of the output statistics cannot be calculated. This option causes such statistics to be dropped from output.
Argument: verbose
integer level of verbosity. Zero means silent, higher values make the output progressively more and more verbose.
Argument: indent
indentation for diagnostic messages. Zero means no indentation, each unit adds two spaces.
Details
This function calculates module preservation statistics pair-wise between given reference sets and all other sets in multiData. Reference sets must have their corresponding module assignment specified in multiColor; module assignment is optional for test sets. Individual expression sets and their module labels are matched using names of the corresponding components in multiData and multiColor.
For each reference-test pair, the function calculates module preservation statistics that measure how well the modules of the reference set are preserved in the test set. If multiColor also contains module assignment for the test set, the calculated statistics also include cross-tabulation statistics that make use of the test module assignment.
For each reference-test pair, the function only uses genes (columns of the data component of each component of multiData) that are in common between the reference and test set. Columns are matched by column names, so column names must be valid.
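Matching by column names, as described in the last paragraph, can be sketched in base R. The gene names and matrix sizes are made up, and the function's own matching code may differ in detail.

```r
# Two hypothetical sets with partly overlapping genes in different column order.
ref  <- matrix(0, nrow = 3, ncol = 4,
               dimnames = list(NULL, c("g1", "g2", "g3", "g4")))
test <- matrix(0, nrow = 5, ncol = 3,
               dimnames = list(NULL, c("g3", "g1", "g5")))

# Genes present in both sets, in the order in which they appear in the reference.
common <- intersect(colnames(ref), colnames(test))

# Restrict both sets to the common genes, in the same column order.
ref_common  <- ref[, common, drop = FALSE]
test_common <- test[, common, drop = FALSE]
```

Without column names there is nothing to intersect on, which is why valid column names are required.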
In addition to preservation statistics, the function also calculates several statistics of module quality, that is, measures of how well defined the modules are in the reference set. The quality statistics are calculated with respect to genes in common with a test set; thus the function calculates a set of quality statistics for each reference-test pair. This may be somewhat counter-intuitive, but it allows a direct comparison of corresponding quality and preservation statistics.
The calculated p-values are determined from the Z scores of individual measures under the assumption of normality. No p-value is calculated for the Zsummary measures. The Bonferroni correction is made with respect to the number of tested modules. Because the p-values for strongly preserved modules are often extremely low, the function reports logarithms of the p-values rather than the p-values themselves (base 10, according to the Value section below). However, q-values are reported untransformed since they are calculated that way in package qvalue.
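The reason for reporting log p-values can be seen in a short base R sketch. It assumes a one-sided upper-tail normal p-value and the base-10 scale stated in the Value section; the Z scores are made up, and the package's exact computation may differ.

```r
Z <- c(2, 5, 12, 40)      # hypothetical Z scores, one per module
n_modules <- length(Z)

# Upper-tail normal p-values computed directly on the log scale,
# then converted from natural log to base 10.
log10_p <- pnorm(Z, lower.tail = FALSE, log.p = TRUE) / log(10)

# Bonferroni correction: multiplying p by the number of modules is adding
# log10(n_modules) on the log scale; the result is capped at p = 1.
log10_p_bonf <- pmin(log10_p + log10(n_modules), 0)

# For comparison: taking the logarithm after computing p loses large Z scores,
# because the p-value underflows to zero first.
naive <- log10(pnorm(Z, lower.tail = FALSE))
```

For Z = 40 the naive route yields -Inf, while the log-scale computation stays finite.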
Missing data are removed (but see quickCor above).
Value
The function returns a nested list of preservation statistics. At the top level, the list components are:
Component: quality
observed values, Z scores, log p-values, Bonferroni-corrected log p-values, and (optionally) q-values of quality statistics. All logarithms are in base 10.
Component: preservation
observed values, Z scores, log p-values, Bonferroni-corrected log p-values, and (optionally) q-values of density and connectivity preservation statistics. All logarithms are in base 10.
Component: accuracy
observed values, Z scores, log p-values, Bonferroni-corrected log p-values, and (optionally) q-values of cross-tabulation statistics. All logarithms are in base 10.
Component: referenceSeparability
observed values, Z scores, log p-values, Bonferroni-corrected log p-values, and (optionally) q-values of module separability in the reference network. All logarithms are in base 10.
Component: testSeparability
observed values, Z scores, log p-values, Bonferroni-corrected log p-values, and (optionally) q-values of module separability in the test network. All logarithms are in base 10.
Component: permutationDetails
results of individual permutations, useful for diagnostics.
All of the above are lists. The lists quality, preservation, referenceSeparability, and testSeparability each contain 4 or 5 components: observed contains observed values, Z contains the corresponding Z scores, log.p contains base 10 logarithms of the p-values, log.pBonf contains base 10 logarithms of the Bonferroni-corrected p-values, and optionally q contains the associated q-values. The list accuracy contains observed, Z, log.p, log.pBonf, optionally q, and additional components observedOverlapCounts and observedFisherPvalues that contain the observed matrices of overlap counts and Fisher test p-values.
Each of the lists observed, Z, log.p, log.pBonf, optionally q, observedOverlapCounts and observedFisherPvalues is structured as a 2-level list where the outer components correspond to reference sets and the inner components to test sets. As an example, preservation$observed[[1]][[2]] contains the density and connectivity preservation statistics for the preservation of set 1 modules in set 2, that is, set 1 is the reference set and set 2 is the test set. preservation$observed[[1]][[2]] is a data frame in which each row corresponds to a module in the reference network 1, plus one row for the unassigned objects and one row for a "module" that contains randomly sampled objects and that represents a whole-network average. Each column corresponds to a statistic as indicated by the column name.
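The two-level indexing can be tried on a mock object. Everything in this sketch is a placeholder built by hand: the set names, row names, column names, and numbers are hypothetical and are not output of modulePreservation; only the nesting (outer level for reference sets, inner level for test sets) follows the description above.

```r
# Placeholder data frame standing in for one table of observed statistics:
# one row per reference module, one for unassigned genes ("grey"), and one
# for the random-sample "gold" module.
mock_stats <- data.frame(
  moduleSize       = c(50, 30, 20, 100),
  exampleStatistic = c(0.9, 0.7, 0.1, 0.2),   # made-up numbers
  row.names        = c("blue", "brown", "grey", "gold")
)

# Outer list: reference sets. Inner list: test sets.
preservation <- list(
  observed = list(
    set1 = list(set1 = NA, set2 = mock_stats),
    set2 = list(set1 = NA, set2 = NA)
  )
)

# Statistics for the preservation of set 1 modules (reference) in set 2 (test).
obs_1_in_2 <- preservation$observed[[1]][[2]]

# One module's row, and one statistic across all modules.
blue_row <- obs_1_in_2["blue", ]
one_stat <- obs_1_in_2[, "exampleStatistic"]
```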
Note
For large data sets, the permutation study may take a while (typically on the order of several hours). Use verbose = 3 to get a detailed progress report as the calculations advance.
Author(s)
Rui Luo and Peter Langfelder
See Also
Network construction and module detection functions in the WGCNA package such as adjacency, blockwiseModules; rudimentary cleaning in goodSamplesGenesMS; the WGCNA implementation of correlation in cor.
When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).