
R WGCNA package: help documentation for the collapseRows() function

Posted 2012-10-1 21:09:22
collapseRows(WGCNA)
collapseRows() belongs to the R package: WGCNA

                                        Select one representative row per group

                                         Translator: 生物统计家园网, robot LoveR

----------Description----------

Abstractly speaking, the function allows one to collapse the rows of a numeric matrix, e.g. by forming an average or selecting one representative row for each group of rows specified by a grouping variable (referred to as rowGroup). The word "collapse" reflects the fact that the method yields a new matrix whose rows correspond to other rows of the original input data. The function implements several network-based and biostatistical methods for finding a representative row for each group specified in rowGroup. Optionally, the function identifies the representative row according to the fewest missing data, the highest sample mean, the highest sample variance, or the highest connectivity. One advantage of this function is that its default settings have worked well in numerous applications. Below, we describe these default settings in more detail.


----------Usage----------


collapseRows(datET, rowGroup, rowID,
             method="MaxMean", connectivityBasedCollapsing=FALSE,
             methodFunction=NULL, connectivityPower=1,
             selectFewestMissing=TRUE, thresholdCombine=NA)



----------Arguments----------

Argument: datET
matrix or data frame containing numeric values where rows correspond to variables (e.g. microarray probes) and columns correspond to observations (e.g. microarrays). Each row of datET must have a unique row identifier (specified in the vector rowID). The group label of each row is encoded in the vector rowGroup. While rowID should have non-missing, unique values (identifiers), the values of the vector rowGroup will typically not be unique since the function aims to pick a representative row for each group.


Argument: rowGroup
character vector whose components contain the group label (e.g. a character string) for each row of datET. This vector needs to have the same length as the vector rowID. In gene expression applications, this vector could contain the gene symbol (or a co-expression module label).


Argument: rowID
character vector of row identifiers. This should include all the rows from rownames(datET), but can include other rows. Its entries should be unique (no duplicates) and no missing values are permitted. If the row identifier is missing for a given row, we suggest you remove this row from datET before applying the function.


Argument: method
character string for determining which method is used to choose a probe among exactly 2 corresponding rows or when connectivityBasedCollapsing=FALSE. These are the options:
"MaxMean" (default) or "MinMean" = choose the row with the highest or lowest mean value, respectively.
"maxRowVariance" = choose the row with the highest variance (across the columns of datET).
"absMaxMean" or "absMinMean" = choose the row with the highest or lowest mean absolute value.
"ME" = choose the eigenrow (first principal component of the rows in each group). Note that with this method option, connectivityBasedCollapsing is automatically set to FALSE.
"Average" = for each column, take the average value of the rows in each group.
"function" = use this method for a user-input function (see the description of the argument "methodFunction").
Note: if method="ME", "Average" or "function", the output parameters "group2row" and "selectedRow" are not informative.
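The row-selecting options above can be illustrated with a minimal base R sketch on a made-up group of three rows. It does not call collapseRows; the toy matrix `grp` and the helper names are hypothetical, and reading "mean absolute value" as the mean of the absolute values is an assumption, not something the text above confirms.

```r
# Hypothetical toy group of three rows (probes) across four samples.
grp <- rbind(p1 = c(-3, -2, -4, -3),
             p2 = c( 1,  2,  1,  2),
             p3 = c( 0,  5, -5,  0))

rowMean    <- rowMeans(grp, na.rm = TRUE)
rowAbsMean <- rowMeans(abs(grp), na.rm = TRUE)  # assumed reading of "mean absolute value"
rowVar     <- apply(grp, 1, var, na.rm = TRUE)  # variance across the columns

# Row that each selecting option would pick under these assumptions.
picks <- c(MaxMean        = names(which.max(rowMean)),
           MinMean        = names(which.min(rowMean)),
           absMaxMean     = names(which.max(rowAbsMean)),
           absMinMean     = names(which.min(rowAbsMean)),
           maxRowVariance = names(which.max(rowVar)))
print(picks)

# "Average" does not pick a row; it averages the rows column by column.
averageRow <- colMeans(grp, na.rm = TRUE)
print(averageRow)
```

Because "Average" (like "ME" and "function") builds a new row rather than choosing an existing one, there is no selected row identifier to report, which is why group2row and selectedRow are not informative for those options.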


Argument: connectivityBasedCollapsing
logical value. If TRUE, groups with 3 or more corresponding rows will be represented by the row with the highest connectivity according to a signed weighted correlation network adjacency matrix among the corresponding rows. Recall that the connectivity is defined as the row sum of the adjacency matrix. The signed weighted adjacency matrix is defined as A=(0.5+0.5*COR)^power where power is determined by the argument connectivityPower and COR denotes the matrix of pairwise Pearson correlation coefficients among the corresponding rows.
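The adjacency and connectivity definitions above can be written out directly in base R. The sketch below is only an illustration on simulated data: the matrix `grp` and its row names are hypothetical, and the internal implementation in WGCNA may differ in details such as the handling of missing values.

```r
# Hypothetical toy group: three correlated rows and one unrelated row, six samples.
set.seed(1)
base <- rnorm(6)
grp <- rbind(p1 = base + rnorm(6, sd = 0.1),
             p2 = base + rnorm(6, sd = 0.1),
             p3 = base + rnorm(6, sd = 0.1),
             p4 = rnorm(6))
connectivityPower <- 1

# Pairwise Pearson correlations among rows (cor() correlates columns, so transpose).
COR <- cor(t(grp), use = "pairwise.complete.obs")

# Signed weighted adjacency, A = (0.5 + 0.5 * COR)^power, as defined in the text.
A <- (0.5 + 0.5 * COR)^connectivityPower

# Connectivity is the row sum of the adjacency matrix.
k <- rowSums(A)

# The representative is the most highly connected row.
representative <- names(k)[which.max(k)]
print(round(k, 3))
print(representative)
```

Since every diagonal entry of A equals 1, including or excluding the diagonal in the row sum does not change which row has the highest connectivity.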


Argument: methodFunction
character string. It only needs to be specified if method="function"; otherwise its input is ignored. Must be a function that takes a Nr x Nc matrix of numbers as input and outputs a vector with the length Nc (e.g., colMeans). This will then be the method used for collapsing values for multiple rows into a single value for the row.
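As an illustration of the required shape (an Nr x Nc matrix in, a vector of length Nc out), here is a small base R sketch of a user-defined collapsing function. The name `colMedians` and the toy matrix `x` are hypothetical and chosen only for this example.

```r
# Hypothetical user-supplied collapsing function: column-wise median.
# Takes an Nr x Nc numeric matrix and returns a vector of length Nc.
colMedians <- function(x) apply(x, 2, median, na.rm = TRUE)

# Toy group with Nr = 3 rows and Nc = 2 columns.
x <- cbind(s1 = c(1, 2, 9),
           s2 = c(4, 5, 6))

out <- colMedians(x)
print(out)            # one collapsed value per column
print(colMeans(x))    # the built-in alternative named in the text
```

A function of this shape would be passed as methodFunction=colMedians together with method="function", in the same way that Example 2 passes colMeans.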


Argument: connectivityPower
Positive number (typically integer) for specifying the threshold (power) used to construct the signed weighted adjacency matrix, see the description of connectivityBasedCollapsing. This option is only used if connectivityBasedCollapsing=TRUE.


Argument: selectFewestMissing
logical value. If TRUE (default), the input expression matrix is trimmed such that for each group only the rows with the fewest missing values are retained. In situations where an equal number of values are missing (or where there is no missing data), all rows for a given group are retained. Whether this value is set to TRUE or FALSE, all rows with >90% missing data are omitted from the analysis.
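The trimming step described above can be sketched in base R as follows. This is a hedged illustration rather than the package's code: the toy `datET` and `rowGroup` are made up, and the cutoff is implemented as "more than 90% missing", following the wording of this argument.

```r
# Hypothetical toy data: five rows in two groups, four samples.
datET <- rbind(a1 = c(1, NA, 3, 4),
               a2 = c(2, 2, 3, 4),
               a3 = c(5, 6, 7, 8),
               b1 = c(NA, NA, NA, NA),
               b2 = c(1, NA, 2, 3))
rowGroup <- c("A", "A", "A", "B", "B")

# Number of missing values per row.
nMissing <- rowSums(is.na(datET))

# Rows with more than 90% missing values are dropped in any case.
keep <- nMissing / ncol(datET) <= 0.9

# Within each group, find the smallest missing count among the kept rows.
minPerGroup <- ave(ifelse(keep, nMissing, Inf), rowGroup, FUN = min)

# Retain every kept row that ties for the fewest missing values in its group.
retained <- keep & nMissing == minPerGroup
print(rownames(datET)[retained])
```

In group A the rows a2 and a3 tie with no missing values, so both are retained and the choice between them is left to the method argument.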


Argument: thresholdCombine
Number between -1 and 1, or NA. If NA (default), this input is ignored. If a number between -1 and 1 is input, this value is taken as a threshold value, and collapseRows proceeds following the "MaxMean" method, but ONLY for ids with correlations of R>thresholdCombine. Specifically:
1) If there is one id/group, keep the id.
2) If there are 2 ids/group, take the maximum mean expression if their correlation is > thresholdCombine.
3) If there are 3+ ids/group, iteratively repeat (2) for the 2 ids with the highest correlation until all ids remaining have correlation < thresholdCombine for each group.
Note that this option usually results in more than one id per group; therefore, take care when using this option in comparisons between multiple matrices / data frames.


----------Details----------

The function is robust to missing data. Also, if rowIDs are missing, they are inferred from the rownames of datET when possible. When a group corresponds to only 1 row, it is represented by this row since there is no other choice. That said, the row may be removed if it contains an excessive amount of missing data (90 percent or more missing values); see the description of the argument selectFewestMissing for more details.

A group is represented by a corresponding row with the fewest missing data if selectFewestMissing has been set to TRUE. Often several rows have the same minimum number of missing values (or no missing values) and a representative must be chosen among those rows. In this case we distinguish 2 situations:
(1) If a group corresponds to exactly 2 rows, then the corresponding row with the highest average is selected if method="MaxMean". Alternative methods can be chosen as described in method.
(2) If a group corresponds to more than 2 rows and connectivityBasedCollapsing=TRUE, then the function calculates a signed weighted correlation network (with power specified in connectivityPower) among the corresponding rows. Next the function calculates the network connectivity of each row (closely related to the sum of correlations with the other matching rows) and chooses the most highly connected row as representative. If connectivityBasedCollapsing=FALSE, then method is used.
For both situations, if more than one row has the same value, the first such row is chosen.
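The two situations above, together with the single-row case and the tie rule, can be sketched as one small base R helper. The function `pickRow` and the toy data are hypothetical; the sketch assumes method="MaxMean" and skips the selectFewestMissing pre-filter, so it is an outline of the decision flow rather than the package's implementation.

```r
# Hypothetical helper: choose one representative row name for a single group.
pickRow <- function(x, connectivityBasedCollapsing = FALSE, connectivityPower = 1) {
  if (nrow(x) == 1) return(rownames(x))            # only one row: no choice
  if (nrow(x) >= 3 && connectivityBasedCollapsing) {
    # Situation (2): signed weighted adjacency and its row sums (connectivity).
    A <- (0.5 + 0.5 * cor(t(x), use = "pairwise.complete.obs"))^connectivityPower
    score <- rowSums(A)
  } else {
    # Situation (1), or connectivity-based collapsing switched off: "MaxMean".
    score <- rowMeans(x, na.rm = TRUE)
  }
  rownames(x)[which.max(score)]                    # which.max() returns the first maximum
}

# Toy data: group A has 2 rows, B has 1 row, C has 3 rows with equal means.
datET <- rbind(a1 = c(1, 2, 3), a2 = c(2, 3, 4),
               b1 = c(5, 5, 6),
               c1 = c(1, 2, 3), c2 = c(3, 2, 1), c3 = c(3, 1, 2))
rowGroup <- c("A", "A", "B", "C", "C", "C")
groups <- split(seq_len(nrow(datET)), rowGroup)

byMean <- sapply(groups, function(i) pickRow(datET[i, , drop = FALSE]))
byConn <- sapply(groups, function(i)
  pickRow(datET[i, , drop = FALSE], connectivityBasedCollapsing = TRUE))
print(byMean)
print(byConn)
```

In group C all three rows have the same mean, so the "MaxMean" rule falls back to the first row (c1), whereas the connectivity rule picks c3, the row with the largest adjacency row sum.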

Setting thresholdCombine is a special case of this function, as not all ids for a single group are necessarily collapsed; only those with similar expression patterns are collapsed. We suggest using this option when the goal is to decrease the number of ids for computational reasons, but when ALL ids for a single group should not be combined (for example, if two probes could represent different splice variants for the same gene for many genes on a microarray).

Example application: when dealing with microarray gene expression data, the rows of datET may correspond to unique probe identifiers and rowGroup may contain the corresponding gene symbols. Recall that multiple probes (specified using rowID=ProbeID) may correspond to the same gene symbol (specified using rowGroup=GeneSymbol). In this case, datET contains the input expression data with rows labeled by rowID, and the output contains expression data with rows labeled by gene symbol, collapsing all probes for a given gene symbol into one representative.


----------Value----------

The output is a list with the following components.


Component: datETcollapsed
is a numeric matrix with the same columns as the input matrix datET, but with rows corresponding to the different row groups rather than individual row identifiers. (If thresholdCombine is set, then rows still correspond to individual row identifiers.)


Component: group2row
is a matrix whose rows correspond to the unique group labels and whose 2 columns report which group label (first column called group) is represented by what row label (second column called selectedRowID). Set to NULL if method="ME" or "function".



Component: selectedRow
is a logical vector whose components are TRUE for probes selected as representatives and FALSE otherwise. It has the same length as the vector rowID. Set to NULL if method="ME" or "function".


----------Author(s)----------



Jeremy A. Miller, Steve Horvath, Peter Langfelder, Chaochao Cai




----------References----------

The collapseRows R function. Technical Report.

----------Examples----------


    ########################################################################
    # EXAMPLE 1:
    # The code simulates a data frame (called dat1) of correlated rows.
    # You can skip this part and start at the line called Typical Input Data
    # The first column of the data frame will contain row identifiers
    library(WGCNA)  # provides simulateModule(), collapseRows() and moduleEigengenes()
    # number of columns (e.g. observations or microarrays)
    m=60
    # number of rows (e.g. variables or probes on a microarray)
    n=500
    # seed module eigenvector for the simulateModule function
    MEtrue=rnorm(m)
    # numeric data frame of n rows and m columns
    datNumeric=data.frame(t(simulateModule(MEtrue,n)))
    RowIdentifier=paste("Probe", 1:n, sep="")
    ColumnName=paste("Sample",1:m, sep="")
    dimnames(datNumeric)[[2]]=ColumnName
    # Let us now generate a data frame whose first column contains the rowID
    dat1=data.frame(RowIdentifier, datNumeric)
    # we simulate a vector with n/5 group labels, i.e. each row group corresponds to 5 rows
    rowGroup=rep(paste("Group", 1:(n/5), sep=""), 5)

    # Typical Input Data
    # Since the first column of dat1 contains the RowIdentifier, we use the following code
    datET=dat1[,-1]
    rowID=dat1[,1]

    # assign row names according to the RowIdentifier
    dimnames(datET)[[1]]=rowID
    # run the function and save the result in an object

    collapse.object=collapseRows(datET=datET, rowGroup=rowGroup, rowID=rowID)

    # this creates the collapsed data where
    # the first column contains the group name
    # the second column reports the corresponding selected row name (the representative)
    # and the remaining columns report the values of the representative row
    dat1Collapsed=data.frame( collapse.object$group2row, collapse.object$datETcollapsed)
    dat1Collapsed[1:5,1:5]

    ########################################################################
    # EXAMPLE 2:
    # Using the same data frame as above, run collapseRows with a user-inputted function.
    # In this case we will use the mean.  Note that since we are choosing some combination
    #   of the probe values for each gene, the group2row and selectedRow output
    #   parameters are not meaningful.

    collapse.object.mean=collapseRows(datET=datET, rowGroup=rowGroup, rowID=rowID,
          method="function", methodFunction=colMeans)[[1]]

    # Note that in this situation, running the following code produces the identical results:

    collapse.object.mean.2=collapseRows(datET=datET, rowGroup=rowGroup, rowID=rowID,
          method="Average")[[1]]

    ########################################################################
    # EXAMPLE 3:
    # Using collapseRows to calculate the module eigengene.
    # First we create some sample data as in example 1 (or use your own!)
    m=60
    n=500
    MEtrue=rnorm(m)
    datNumeric=data.frame(t(simulateModule(MEtrue,n)))

    # In this example, rows are genes, and groups are modules.
    RowIdentifier=paste("Gene", 1:n, sep="")
    ColumnName=paste("Sample",1:m, sep="")
    dimnames(datNumeric)[[2]]=ColumnName
    dat1=data.frame(RowIdentifier, datNumeric)
    # We simulate a vector with n/100 modules, i.e. each row group corresponds to 100 rows
    rowGroup=rep(paste("Module", 1:(n/100), sep=""), 100)
    datET=dat1[,-1]
    rowID=dat1[,1]
    dimnames(datET)[[1]]=rowID

    # run the function and save the result in an object
    collapse.object.ME=collapseRows(datET=datET, rowGroup=rowGroup, rowID=rowID, method="ME")[[1]]

    # Note that in this situation, running the following code produces the identical results:
    collapse.object.ME.2 = t(moduleEigengenes(expr=t(datET),colors=rowGroup)$eigengenes)
    colnames(collapse.object.ME.2) = ColumnName
    rownames(collapse.object.ME.2) = sort(unique(rowGroup))

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

