blockwiseIndividualTOMs(WGCNA)
blockwiseIndividualTOMs() is part of the R package WGCNA
Calculation of block-wise topological overlaps
----------Description----------
Calculates topological overlaps in the given (expression) data. If the number of variables (columns) in the input data is too large, the data is first split using pre-clustering, then topological overlaps are calculated in each block.
----------Usage----------
blockwiseIndividualTOMs(
  multiExpr,

  # Data checking options
  checkMissingData = TRUE,

  # Blocking options
  blocks = NULL,
  maxBlockSize = 5000,
  randomSeed = 12345,

  # Network construction arguments: correlation options
  corType = "pearson",
  maxPOutliers = 1,
  quickCor = 0,
  pearsonFallback = "individual",
  cosineCorrelation = FALSE,

  # Adjacency function options
  power = 6,
  networkType = "unsigned",
  checkPower = TRUE,

  # Topological overlap options
  TOMType = "unsigned",
  TOMDenom = "min",

  # Save individual TOMs? If not, they will be returned in the session.
  saveTOMs = TRUE,
  individualTOMFileNames = "individualTOM-Set%s-Block%b.RData",

  # General options
  nThreads = 0,
  verbose = 2, indent = 0)
----------Arguments----------
Argument: multiExpr
expression data in the multi-set format (see checkSets). A vector of lists, one per set. Each set must contain a component named data that holds the expression data, with rows corresponding to samples and columns to genes or probes.
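The multi-set format described above can be built from plain matrices in base R. The following is a minimal sketch with simulated values; the set names setA and setB, the gene names, and the sample counts are made up for illustration.

```r
set.seed(1)
nGenes <- 50
geneNames <- paste0("gene", seq_len(nGenes))

# Two hypothetical data sets: samples in rows, the same genes in columns.
exprA <- matrix(rnorm(20 * nGenes), nrow = 20, ncol = nGenes,
                dimnames = list(paste0("A_sample", 1:20), geneNames))
exprB <- matrix(rnorm(30 * nGenes), nrow = 30, ncol = nGenes,
                dimnames = list(paste0("B_sample", 1:30), geneNames))

# One list per set, each with a component called "data".
multiExpr <- list(setA = list(data = exprA),
                  setB = list(data = exprB))

# Samples (first row) and genes (second row) per set.
print(sapply(multiExpr, function(set) dim(set$data)))
```

The sets may differ in the number of samples, but the columns (genes) are assumed to match across sets.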
Argument: checkMissingData
logical: should data be checked for excessive numbers of missing entries in genes and samples, and for genes with zero variance? See Details.
Argument: blocks
optional specification of blocks in which hierarchical clustering and module detection should be performed. If given, must be a numeric vector with one entry per gene of multiExpr giving the number of the block to which the corresponding gene belongs.
Argument: maxBlockSize
integer giving the maximum block size for module detection. Ignored if blocks above is non-NULL. Otherwise, if the number of genes in multiExpr exceeds maxBlockSize, genes will be pre-clustered into blocks whose size should not exceed maxBlockSize.
Argument: randomSeed
integer to be used as seed for the random number generator before the function starts. If a current seed exists, it is saved and restored upon exit. If NULL is given, the function will not save and restore the seed.
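The save-and-restore behaviour described for randomSeed can be sketched in base R as follows. This is an illustration of the idea only, not the package's code; withLocalSeed is a hypothetical helper name.

```r
# Hypothetical helper: evaluate `expr` under a fixed seed, then restore the
# caller's random number generator state if one existed beforehand.
withLocalSeed <- function(seed, expr) {
  if (!is.null(seed)) {
    if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      saved <- get(".Random.seed", envir = globalenv())
      on.exit(assign(".Random.seed", saved, envir = globalenv()), add = TRUE)
    }
    set.seed(seed)
  }
  expr  # lazily evaluated here, i.e. after the seed has been set
}

set.seed(42)
x <- withLocalSeed(12345, runif(3))  # reproducible draw
afterCall <- runif(1)                # continues the seed-42 stream

set.seed(42)
print(identical(afterCall, runif(1)))
```

With seed = NULL the helper leaves the generator state untouched, mirroring the NULL case described above.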
Argument: corType
character string specifying the correlation to be used. Allowed values are (unique abbreviations of) "pearson" and "bicor", corresponding to Pearson and biweight midcorrelation, respectively. Missing values are handled using the pairwise.complete.obs option.
Argument: maxPOutliers
only used for corType=="bicor". Specifies the maximum percentile of data that can be considered outliers on either side of the median separately. For each side of the median, if a higher percentile than maxPOutliers is considered an outlier by the weight function based on 9*mad(x), the width of the weight function is increased such that the percentile of outliers on that side of the median equals maxPOutliers. Using maxPOutliers=1 will effectively disable all weight function broadening; using maxPOutliers=0 will give results that are quite similar (but not equal) to Pearson correlation.
Argument: quickCor
real number between 0 and 1 that controls the handling of missing data in the calculation of correlations. See Details.
Argument: pearsonFallback
Specifies whether the bicor calculation, if used, should revert to Pearson when the median absolute deviation (mad) is zero. Recognized values are (abbreviations of) "none", "individual", "all". If set to "none", zero mad will result in NA for the corresponding correlation. If set to "individual", Pearson calculation will be used only for columns that have zero mad. If set to "all", the presence of a single zero mad will cause the whole variable to be treated in Pearson correlation manner (as if the corresponding robust option was set to FALSE). Has no effect for Pearson correlation. See bicor.
Argument: cosineCorrelation
logical: should the cosine version of the correlation calculation be used? The cosine calculation differs from the standard one in that it does not subtract the mean.
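The difference between the two calculations can be seen in a few lines of base R. This is a sketch written from the description above (Pearson subtracts the mean, the cosine version does not); the helper names and the toy vectors are made up.

```r
# Standard Pearson correlation: center both vectors first.
pearsonCor <- function(x, y) {
  xc <- x - mean(x)
  yc <- y - mean(y)
  sum(xc * yc) / sqrt(sum(xc^2) * sum(yc^2))
}

# Cosine version: same formula, but the mean is not subtracted.
cosineCor <- function(x, y) {
  sum(x * y) / sqrt(sum(x^2) * sum(y^2))
}

x <- c(1, 2, 3, 4, 5)
y <- c(15, 13, 14, 11, 12)

print(pearsonCor(x, y))  # negative: y tends to fall as x rises
print(cosineCor(x, y))   # positive: both vectors have large positive means
```

The two measures agree only when both vectors already have mean zero.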
Argument: power
soft-thresholding power for network construction.
Argument: networkType
network type. Allowed values are (unique abbreviations of) "unsigned", "signed", "signed hybrid". See adjacency.
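To make the interaction of power and networkType concrete, the sketch below turns a correlation matrix into an adjacency matrix in base R. It follows the usual WGCNA definitions (unsigned: |cor|^power; signed: ((1 + cor)/2)^power; signed hybrid: cor^power for positive correlations and 0 otherwise). The function name adjacencyFromCor is hypothetical and this is not the package's implementation.

```r
# Hypothetical helper: soft-thresholded adjacency from a correlation matrix.
adjacencyFromCor <- function(corMat, power = 6,
                             networkType = c("unsigned", "signed", "signed hybrid")) {
  networkType <- match.arg(networkType)
  switch(networkType,
         "unsigned"      = abs(corMat)^power,
         "signed"        = ((1 + corMat) / 2)^power,
         "signed hybrid" = pmax(corMat, 0)^power)
}

corMat <- matrix(c( 1.0, -0.5,  0.8,
                   -0.5,  1.0,  0.1,
                    0.8,  0.1,  1.0), nrow = 3, byrow = TRUE)

print(adjacencyFromCor(corMat, power = 6, networkType = "unsigned"))
print(adjacencyFromCor(corMat, power = 6, networkType = "signed hybrid"))
```

In the unsigned network the correlation of -0.5 contributes as strongly as +0.5 would, whereas the signed hybrid network sets it to zero.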
Argument: checkPower
logical: should a basic sanity check be performed on the supplied power? If you would like to experiment with unusual powers, set the argument to FALSE and proceed with caution.
Argument: TOMType
one of "none", "unsigned", "signed". If "none", adjacency will be used for clustering. If "unsigned", the standard TOM will be used (more generally, the TOM function will receive the adjacency as input). If "signed", TOM will keep track of the sign of correlations between neighbors. Note that the "unsigned" vs. "signed" distinction is only relevant when networkType is "unsigned". When networkType is "signed" or "signed hybrid", there is no difference between TOMType="signed" and TOMType="unsigned".
Argument: TOMDenom
a character string specifying the TOM variant to be used. Recognized values are "min", giving the standard TOM described in Zhang and Horvath (2005), and "mean", in which the min function in the denominator is replaced by mean. The "mean" variant may produce better results in certain special situations but at this time should be considered experimental.
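The role of the denominator can be illustrated with a small base R version of the unsigned topological overlap, TOM_ij = (sum_u a_iu a_uj + a_ij) / (f(k_i, k_j) + 1 - a_ij), where k_i is the connectivity of node i and f is min for the standard variant. The "mean" branch below assumes that f becomes the arithmetic mean of k_i and k_j, which is one reading of the description above; the function name tomFromAdjacency is hypothetical.

```r
# Hypothetical helper: unsigned TOM from a symmetric adjacency matrix in [0, 1].
tomFromAdjacency <- function(adj, TOMDenom = c("min", "mean")) {
  TOMDenom <- match.arg(TOMDenom)
  diag(adj) <- 0                      # ignore self-connections
  shared <- adj %*% adj               # sum over u of a_iu * a_uj
  k <- rowSums(adj)                   # connectivity of each node
  denomK <- if (TOMDenom == "min") {
    outer(k, k, pmin)
  } else {
    outer(k, k, function(a, b) (a + b) / 2)  # assumption: arithmetic mean
  }
  tom <- (shared + adj) / (denomK + 1 - adj)
  diag(tom) <- 1
  tom
}

adj <- matrix(c(0.0, 0.5, 0.2,
                0.5, 0.0, 0.4,
                0.2, 0.4, 0.0), nrow = 3, byrow = TRUE)

print(tomFromAdjacency(adj, "min"))
print(tomFromAdjacency(adj, "mean"))
```

Because the mean of two connectivities is never smaller than their minimum, the "mean" variant in this sketch never exceeds the "min" variant.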
Argument: saveTOMs
logical: should calculated TOMs be saved to disk (TRUE) or returned in the return value (FALSE)? Returning calculated TOMs via the return value may be more convenient, but is not always feasible if the matrices are too big to fit in memory all at the same time.
Argument: individualTOMFileNames
character string giving the file names to save individual TOMs into. The following tags should be used to make the file names unique for each set and block: %s will be replaced by the set number; %N will be replaced by the set name (taken from names(multiExpr)) if it exists, otherwise by the set number; %b will be replaced by the block number. If the file names turn out to be non-unique, an error will be generated.
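The tag substitution can be mimicked in base R to preview which files a given template would produce. The helper expandTOMFileName below is hypothetical and only imitates the rules stated above; it is not the package's code.

```r
# Hypothetical helper: expand %s, %N and %b in a file name template.
expandTOMFileName <- function(template, setNumber, blockNumber, setNames = NULL) {
  setName <- if (!is.null(setNames)) setNames[setNumber] else as.character(setNumber)
  out <- gsub("%s", as.character(setNumber), template, fixed = TRUE)
  out <- gsub("%N", setName, out, fixed = TRUE)
  gsub("%b", as.character(blockNumber), out, fixed = TRUE)
}

template <- "individualTOM-Set%s-Block%b.RData"

# Rows correspond to sets, columns to blocks (here: 2 sets, 3 blocks).
fileNames <- sapply(1:3, function(b)
  sapply(1:2, function(s) expandTOMFileName(template, s, b)))
print(fileNames)

# A template without %b cannot distinguish blocks, so names repeat.
badNames <- sapply(1:3, function(b)
  sapply(1:2, function(s) expandTOMFileName("TOM-Set%s.RData", s, b)))
print(anyDuplicated(as.vector(badNames)) > 0)
```

Checking a template this way before a long run is a cheap way to avoid the non-unique file name error.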
Argument: nThreads
non-negative integer specifying the number of parallel threads to be used by certain parts of correlation calculations. This option only has an effect on systems on which a POSIX thread library is available (which currently includes Linux and Mac OS X, but excludes Windows). If zero, the number of online processors will be used if it can be determined dynamically, otherwise correlation calculations will use 2 threads.
Argument: verbose
integer level of verbosity. Zero means silent, higher values make the output progressively more verbose.
Argument: indent
indentation for diagnostic messages. Zero means no indentation; each unit adds two spaces.
----------Details----------
The function starts by optionally filtering out samples that have too many missing entries and genes that have either too many missing entries or zero variance in at least one set. Genes that are filtered out are excluded from the TOM calculations.
If blocks is not given and the number of genes exceeds maxBlockSize, genes are pre-clustered into blocks using the function consensusProjectiveKMeans; otherwise all genes are treated in a single block.
For each block of genes, the network is constructed and (if requested) topological overlap is calculated in each set. The topological overlaps can be saved to disk as RData files, or returned directly within the return value (see below). Note that the matrices can be big and returning them within the return value can quickly exhaust the system's memory. In particular, if the block-wise calculation is necessary, it is nearly certain that returning all matrices via the return value will be impossible.
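A minimal call that returns the overlaps in the session rather than saving them might look like the sketch below. The data are simulated, the set and gene names are made up, and the call is guarded so that nothing happens when the WGCNA package is not installed. The exact components of the result may differ between package versions, so the sketch only prints what it finds.

```r
# Toy multi-set data: 2 sets, 100 shared genes (all values are simulated).
set.seed(1)
genes <- paste0("gene", 1:100)
multiExpr <- list(
  setA = list(data = matrix(rnorm(20 * 100), 20, 100,
                            dimnames = list(paste0("a", 1:20), genes))),
  setB = list(data = matrix(rnorm(25 * 100), 25, 100,
                            dimnames = list(paste0("b", 1:25), genes)))
)

if (requireNamespace("WGCNA", quietly = TRUE)) {
  # saveTOMs = FALSE keeps the (small) matrices in memory instead of on disk.
  res <- WGCNA::blockwiseIndividualTOMs(multiExpr, power = 6,
                                        saveTOMs = FALSE, verbose = 0)
  print(names(res))
  print(res$TOMSavedInFiles)
} else {
  message("WGCNA is not installed; skipping the call.")
}
```

With only 100 genes a single block suffices; for data sets that need block-wise processing, saving to disk (saveTOMs = TRUE) is the safer choice for the memory reasons given above.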
----------Value----------
A list with the following components:
Component: actualTOMFileNames
Only returned if input saveTOMs is TRUE. A matrix of character strings giving the file names in which each block TOM is saved. Rows correspond to data sets and columns to blocks.
Component: TOMSimilarities
Only returned if input saveTOMs is FALSE. A list in which each component corresponds to one block. Each component is a matrix of dimensions (number of sets) x N, where N is the length of a distance structure corresponding to the block. That is, if the block contains n genes, N = n*(n-1)/2. Each row of the matrix contains the topological overlap of variables in the corresponding set (and the corresponding block), arranged as a distance structure. Note, however, that the topological overlap is a similarity (not a distance).
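The distance-structure layout can be reproduced in base R. The sketch below packs two made-up symmetric similarity matrices for a block of n = 4 genes into a (number of sets) x N matrix and then unpacks one row again; rowToMatrix is a hypothetical helper, and the ordering assumed is that of R's dist objects (lower triangle, column by column).

```r
n <- 4
set.seed(2)

# Hypothetical helper: random symmetric similarity matrix with unit diagonal.
randomSimilarity <- function(n) {
  a <- matrix(runif(n * n), n, n)
  s <- (a + t(a)) / 2
  diag(s) <- 1
  s
}
sim1 <- randomSimilarity(n)
sim2 <- randomSimilarity(n)

# One row per set, N = n*(n-1)/2 columns, in dist (lower-triangle) order.
tomRows <- rbind(as.vector(as.dist(sim1)), as.vector(as.dist(sim2)))
print(dim(tomRows))

# Hypothetical helper: rebuild the full n x n similarity matrix from one row.
rowToMatrix <- function(row, n) {
  m <- matrix(0, n, n)
  m[lower.tri(m)] <- row
  m <- m + t(m)
  diag(m) <- 1
  m
}
print(all.equal(rowToMatrix(tomRows[2, ], n), sim2))
```

Storing only the lower triangle roughly halves the memory needed per set, which matters because N grows quadratically with the block size.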
Component: blocks
if input blocks was given, its copy; otherwise a vector of length equal to the number of genes, giving the block label for each gene. Note that block labels are not necessarily sorted in the order in which the blocks were processed (since we do not require this for the input blocks). See blockOrder below.
Component: blockGenes
a list with one component for each block of genes. Each component is a vector giving the indices (relative to the input multiExpr) of genes in the corresponding block.
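The relationship between a per-gene block label vector and a per-block list of gene indices can be sketched with base R's split. The labels below are made up, and the sketch assumes that no genes were removed by the missing-data filters.

```r
# Hypothetical block labels for 8 genes (one entry per gene).
blocks <- c(2, 1, 2, 3, 1, 2, 3, 1)

# One list component per block, holding gene indices relative to the input.
blockGenes <- split(seq_along(blocks), blocks)
print(blockGenes)

# Going back from the list to the label vector.
rebuilt <- integer(length(blocks))
for (label in names(blockGenes)) {
  rebuilt[blockGenes[[label]]] <- as.integer(label)
}
print(identical(as.numeric(rebuilt), blocks))
```

Note that split orders the components by label, which need not be the order in which the function processes the blocks.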
Component: goodSamplesAndGenes
if input checkMissingData is TRUE, the output of the function goodSamplesGenesMS: a list with components goodGenes (a logical vector indicating which genes passed the missing data filters), goodSamples (a list of logical vectors indicating which samples passed the missing data filters in each set), and allOK (a logical indicating whether all genes and all samples passed the filters). See goodSamplesGenesMS for more details. If checkMissingData is FALSE, goodSamplesAndGenes contains a list of the same type, but indicating that all genes and all samples passed the missing data filters.
The following components are present mostly to streamline the interaction of this function with blockwiseConsensusModules.
Component: nGGenes
Number of genes that passed the missing data filters (if input checkMissingData is TRUE), or the number of all genes (if checkMissingData is FALSE).
Component: gBlocks
the vector blocks (above), restricted to good genes only.
Component: nThreads
number of threads used to calculate correlation and TOM matrices.
Component: TOMSavedInFiles
logical: were calculated matrices saved in files (TRUE) or returned in the return value (FALSE)?
Component: intNetworkType, intCorType
integer codes for network and correlation type.
----------Author(s)----------
Peter Langfelder
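----------References----------
Zhang, B. and Horvath, S. (2005). A General Framework for Weighted Gene Co-Expression Network Analysis. Statistical Applications in Genetics and Molecular Biology: Vol. 4: No. 1, Article 17
Langfelder, P. and Horvath, S. (2008). WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008, 9:559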
----------See Also----------
blockwiseConsensusModules