mutualInfoAdjacency(WGCNA)
mutualInfoAdjacency() belongs to the R package WGCNA
Calculate weighted adjacency matrices based on mutual information
Description----------
The function calculates different types of weighted adjacency matrices based on the mutual information between vectors (corresponding to the columns of the input data frame datE). The mutual information between pairs of vectors is divided by an upper bound so that the resulting normalized measure lies between 0 and 1.
Usage----------
mutualInfoAdjacency(datE, discretizeColumns = TRUE, entropyEstimationMethod = "MM", numberBins = NULL)
Arguments----------
datE
datE is a data frame or matrix whose columns correspond to variables and whose rows correspond to measurements. For example, the columns may correspond to genes while the rows correspond to microarrays. The number of nodes in the mutual information network equals the number of columns of datE.
discretizeColumns
is a logical variable. If it is set to TRUE then the columns of datE will be discretized into a user-defined number of bins (see numberBins).
entropyEstimationMethod
takes a text string for specifying the entropy and mutual information estimation method. If entropyEstimationMethod="MM" then the Miller-Madow asymptotic bias corrected empirical estimator is used. If entropyEstimationMethod="ML" the maximum likelihood estimator (also known as plug-in or empirical estimator) is used. If entropyEstimationMethod="shrink", the shrinkage estimator of a Dirichlet probability distribution is used. If entropyEstimationMethod="SG", the Schurmann-Grassberger estimator of the entropy of a Dirichlet probability distribution is used.
numberBins
is an integer larger than 0 which specifies how many bins are used for the discretization step. This argument is only relevant if discretizeColumns has been set to TRUE. By default numberBins is set to sqrt(m), where m is the number of samples, i.e. the number of rows of datE. Thus the default is numberBins=sqrt(nrow(datE)). (A short call sketch follows this argument list.)
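A minimal call sketch, using only the argument names from the Usage section above; datE stands for a numeric data frame (e.g. the simulated one in the Examples section below), and the bin count of 10 is an arbitrary illustrative choice, not the default:

# Illustrative call: discretize each column into 10 bins and use the
# maximum likelihood ("ML") entropy estimator.
mi <- mutualInfoAdjacency(datE,
                          discretizeColumns = TRUE,
                          entropyEstimationMethod = "ML",
                          numberBins = 10)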
Details----------
The function inputs a data frame datE and outputs a list whose components correspond to different weighted network adjacency measures defined between the columns of datE. Make sure to install the R packages entropy, minet, and infotheo, since mutualInfoAdjacency makes use of the entropy function from the entropy package (Hausser and Strimmer 2008) and of functions from the minet and infotheo packages (Meyer et al 2008). A weighted network adjacency matrix is a symmetric matrix whose entries take on values between 0 and 1. Each weighted adjacency matrix contains scaled versions of the mutual information between the columns of the input data frame datE. We assume that datE contains numeric values, which will be discretized unless the user chooses the option discretizeColumns=FALSE. The raw (unscaled) mutual information and entropy measures have units "nat", i.e. natural logarithms (base e = 2.718...) are used in their definition. Several mutual information estimation methods have been proposed in the literature (reviewed in Hausser and Strimmer 2008 and Meyer et al 2008). While mutual information networks allow one to detect non-linear relationships between the columns of datE, they may overfit the data if relatively few observations are available. Thus, if the number of rows of datE is smaller than, say, 200, it may be better to fit a correlation network using the function adjacency.
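A minimal installation sketch for the packages named above; it assumes entropy and infotheo are distributed on CRAN and minet on Bioconductor (installed via BiocManager), which is an assumption about current distribution channels rather than something stated in this help page:

# entropy and infotheo are assumed to be CRAN packages.
install.packages(c("entropy", "infotheo"))
# minet is assumed to be a Bioconductor package.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("minet")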
Value----------
The function outputs a list with the following components:
Entropy
is a vector whose components report entropy estimates of each column of datE. The natural logarithm (base e) is used in the definition. Using the notation from the Wikipedia entry (http://en.wikipedia.org/wiki/Mutual_information), this vector contains the values H(X), where X corresponds to a column of datE.
MutualInformation
is a symmetric matrix whose entries contain the pairwise mutual information measures between the columns of datE. The diagonal of the matrix MutualInformation equals Entropy. In general, the entries of this matrix can be larger than 1, i.e. this is not an adjacency matrix. Using the notation from the Wikipedia entry, this matrix contains the mutual information estimates I(X;Y).
AdjacencySymmetricUncertainty
is a weighted adjacency matrix whose entries are based on the mutual information. Using the notation from the Wikipedia entry, this matrix contains the mutual information estimates AdjacencySymmetricUncertainty=2*I(X;Y)/(H(X)+H(Y)). Since I(X;X)=H(X), the diagonal elements of AdjacencySymmetricUncertainty equal 1. In general the entries of this symmetric matrix AdjacencySymmetricUncertainty lie between 0 and 1.
AdjacencyUniversalVersion1
is a weighted adjacency matrix that is a simple function of the AdjacencySymmetricUncertainty. Specifically, AdjacencyUniversalVersion1= AdjacencySymmetricUncertainty/(2- AdjacencySymmetricUncertainty). Note that f(x)= x/(2-x) is a monotonically increasing function on the unit interval [0,1] whose values lie between 0 and 1. The reason why we call it the universal adjacency is that dissUA=1-AdjacencyUniversalVersion1 turns out to be a universal distance function, i.e. it satisfies the properties of a distance (including the triangle inequality) and it takes on a small value if any other distance measure takes on a small value (Kraskov et al 2003).
AdjacencyUniversalVersion2
is a weighted adjacency matrix for which dissUAversion2=1-AdjacencyUniversalVersion2 is also a universal distance measure. Using the notation from the Wikipedia entry, the entries of the symmetric matrix AdjacencyUniversalVersion2 are defined as AdjacencyUniversalVersion2=I(X;Y)/max(H(X),H(Y)). A short sketch after this list shows how these adjacency components can be reconstructed from Entropy and MutualInformation.
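The formulas above can be checked directly from the returned components. A minimal sketch, assuming MIadj is the output of mutualInfoAdjacency as in the Examples section below (component names are taken from the Value section; exact equality up to numerical rounding is an assumption about the implementation):

# Reconstruct the adjacency components from Entropy and MutualInformation.
H   <- MIadj$Entropy                # per-column entropies H(X)
MI  <- MIadj$MutualInformation      # pairwise I(X;Y); diagonal equals H(X)
ASU <- 2 * MI / outer(H, H, "+")    # 2*I(X;Y) / (H(X)+H(Y))
AU1 <- ASU / (2 - ASU)              # f(x) = x/(2-x) applied entrywise
AU2 <- MI / outer(H, H, pmax)       # I(X;Y) / max(H(X), H(Y))
all.equal(ASU, MIadj$AdjacencySymmetricUncertainty, check.attributes = FALSE)
all.equal(AU1, MIadj$AdjacencyUniversalVersion1, check.attributes = FALSE)
all.equal(AU2, MIadj$AdjacencyUniversalVersion2, check.attributes = FALSE)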
Author(s)----------
Steve Horvath, Lin Song, Peter Langfelder
References----------
See Also----------
adjacency
Examples----------
# Load requisite packages. These packages are considered "optional", so WGCNA does not load them
# automatically.
if (require(infotheo, quietly = TRUE) && require(minet, quietly = TRUE) && require(entropy, quietly = TRUE))
{
# Example can be executed.
# Simulate a data frame datE which contains 5 columns and 50 observations
m=50
x1=rnorm(m)
r=.5; x2=r*x1+sqrt(1-r^2)*rnorm(m)
r=.3; x3=r*(x1-.5)^2+sqrt(1-r^2)*rnorm(m)
x4=rnorm(m)
r=.3; x5=r*x4+sqrt(1-r^2)*rnorm(m)
datE=data.frame(x1,x2,x3,x4,x5)
# Calculate entropy, the mutual information matrix, and weighted adjacency matrices based on mutual information.
MIadj=mutualInfoAdjacency(datE=datE)
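# The components of the returned list (see the Value section above) can
# then be inspected, e.g.:
# MIadj$Entropy
# MIadj$AdjacencySymmetricUncertainty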
} else
printFlush("Please install packages infotheo, minet and entropy before running this example.");