R package RTools4TB: help documentation for the DBFMCL() function

loveR · posted on 2012-2-26 13:33:25

DBFMCL(RTools4TB)
DBFMCL() belongs to the R package: RTools4TB

The "Density Based Filtering and Markov CLustering" algorithm (DBF-MCL).

Translated by: LoveR, the Biostatistic Home (biostatistic.net) bot

----------Description----------

DBF-MCL is a three-step adaptive algorithm (http://tagc.univ-mrs.fr/tbrowser/) that (i) finds elements located in dense areas (DBF), (ii) uses the selected items to construct a graph, and (iii) partitions the graph using the Markov CLustering algorithm (MCL).

This function requires installation of the mcl program (http://www.micans.org/mcl). See the "Warnings" section for more information.

----------Usage----------

DBFMCL(data = NULL, filename = NULL, path = ".", name = NULL, distance.method = c("pearson", "spearman", "euclidean", "spm", "spgm"),
clustering = TRUE, silent = FALSE, verbose = TRUE, k = 150, random = 3, memory.used = 1024, fdr = 10, inflation = 2.0, set.seed = 123, returnRank = FALSE)

----------Arguments----------

Argument: data
a matrix, data.frame or ExpressionSet object.

Argument: filename
a character string representing the file name.

Argument: name
a prefix for the names of the intermediary files created by DBF and MCL.

Argument: path
a character string giving the directory where intermediary files are to be stored. Defaults to the current working directory.

Argument: distance.method
the method used to compute the distance to the k-th nearest neighbor: one of "pearson" (distance based on Pearson's correlation coefficient), "spearman" (distance based on Spearman's rho), "euclidean", "spm" or "spgm". The "spm" distance is the arithmetic mean ("pearson" + "spearman")/2, whereas "spgm" is the geometric mean sqrt("pearson" * "spearman").

Argument: clustering
indicates whether the partitioning step (MCL) should be applied to the data. If clustering = FALSE, the function returns a DBFMCLresult object in which the informative elements (as detected by the DBF step) are coerced into a single cluster.

Argument: silent
if set to TRUE, the progress of the distance matrix calculation is not displayed.

Argument: verbose
if set to TRUE, the function runs verbosely.

Argument: k
the neighborhood size, i.e. the rank of the nearest neighbor used to compute DKNN (see the Details section).

Argument: random
the number of simulated distributions S to compute. By default, random = 3.

Argument: memory.used
the size of the memory used to store part of the distance matrix. This sub-matrix is used to compute simulated distances to the k-th nearest neighbor (see the Details section).

Argument: fdr
an integer giving the false discovery rate in percent (range: 0 to 100).

Argument: inflation
the main control parameter of MCL. Inflation affects cluster granularity and is usually chosen in the range [1.2, 5.0]: inflation = 5.0 tends to produce fine-grained clusterings, whereas inflation = 1.2 tends to produce very coarse-grained clusterings. By default, inflation = 2.0, which gives very good results for microarray data when k is set between 70 and 250.
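Inflation is easiest to understand from the MCL iteration itself. The base-R sketch below alternates expansion (squaring a column-stochastic matrix) and inflation (raising entries to the power `inflation`, then re-normalising columns); larger inflation values sharpen each column's distribution and tend to split the graph into more, smaller clusters. It is a toy illustration, not the mcl program that DBFMCL() runs; mcl_sketch(), the added self-loops and the attractor-based cluster read-out are choices made for this sketch.

```r
# Toy MCL iteration in base R (NOT the mcl binary called by DBFMCL()).
# adj: symmetric non-negative weight matrix; returns one cluster label per node.
mcl_sketch <- function(adj, inflation = 2.0, max_iter = 100, tol = 1e-8) {
  m <- adj + diag(nrow(adj))           # add self-loops (a common MCL practice)
  m <- sweep(m, 2, colSums(m), "/")    # make every column sum to 1
  for (i in seq_len(max_iter)) {
    prev <- m
    m <- m %*% m                       # expansion: two-step random walk
    m <- m^inflation                   # inflation: element-wise power
    m <- sweep(m, 2, colSums(m), "/")  # re-normalise columns
    if (max(abs(m - prev)) < tol) break
  }
  # assign each node to the attractor (row) holding most of its column mass
  attractor <- apply(m, 2, which.max)
  as.integer(factor(attractor))
}
```

On a graph made of two disconnected triangles, mcl_sketch() returns one cluster label per triangle.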

Argument: set.seed
specifies the seed for the random number generator.

Argument: returnRank
if TRUE, a rank matrix is included in the returned DBFMCLresult object. The output files are processed using the normalization argument.

----------Details----------

When analyzing a noisy dataset, one is interested in isolating dense regions, as they are populated with genes/elements that display small distances to their nearest neighbors (i.e. strong profile similarities). To isolate these regions, DBF-MCL computes, for each gene/element, the distance to its k-th nearest neighbor (DKNN). To define a dataset-dependent critical DKNN value, below which a gene/element is considered to fall in a dense area, DBF-MCL computes simulated DKNN values using an empirical randomization procedure. Given a dataset containing n genes and p samples, a simulated DKNN value is obtained by sampling n distance values from the gene-gene distance matrix D and extracting the k-th smallest value. This procedure is repeated n times to obtain a set of simulated DKNN values S. The distributions of simulated DKNN values are used to compute an FDR value for each observed DKNN value. The critical DKNN value is the one at which a user-defined FDR value (typically 10%) is observed. Genes with a DKNN value below this threshold are selected and used to construct a graph. In this graph, an edge is created between two genes (nodes) if one of them belongs to the k nearest neighbors of the other. Edges are weighted by the respective correlation coefficient (i.e. similarity), and the resulting graph is partitioned using the Markov CLustering algorithm (MCL).
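The filtering and graph-construction steps described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the RTools4TB implementation: dbf_sketch() is a hypothetical name, the distance is assumed to be 1 - Pearson r, the FDR at a threshold t is assumed to be estimated as the mean number of simulated DKNN values at or below t divided by the number of observed DKNN values at or below t, and the k nearest neighbours used for the graph are computed within the selected genes only.

```r
# Hypothetical base-R sketch of the DBF step (NOT the RTools4TB code).
# ASSUMPTIONS: distance = 1 - Pearson r; FDR(t) = mean simulated count <= t /
# observed count <= t; graph neighbours are computed within the selected genes.
dbf_sketch <- function(x, k = 10, random = 3, fdr = 10, seed = 123) {
  set.seed(seed)
  n <- nrow(x)                                 # genes are rows, samples are columns
  d <- 1 - cor(t(x))                           # gene-gene distance matrix D
  diag(d) <- NA                                # a gene is not its own neighbour
  # observed DKNN: distance from each gene to its k-th nearest neighbour
  dknn <- apply(d, 1, function(r) sort(r)[k])  # sort() drops the NA diagonal
  # simulated DKNN: sample n distances from D and keep the k-th smallest,
  # repeated n times for each of the `random` distributions (n x random matrix)
  pool <- d[upper.tri(d)]
  sim <- replicate(random, replicate(n, sort(sample(pool, n))[k]))
  # empirical FDR for each observed DKNN value used as a threshold
  est_fdr <- sapply(dknn, function(thr) mean(colSums(sim <= thr)) / sum(dknn <= thr))
  ok <- est_fdr <= fdr / 100                   # fdr is given in percent
  critical <- if (any(ok)) max(dknn[ok]) else -Inf
  selected <- which(dknn <= critical)
  # graph: edge if either gene is among the other's k nearest neighbours
  dsel <- d[selected, selected, drop = FALSE]
  m <- length(selected)
  knn <- matrix(FALSE, m, m)
  for (i in seq_len(m)) {
    knn[i, ] <- rank(dsel[i, ], na.last = TRUE, ties.method = "first") <= k
  }
  w <- 1 - dsel                                # edge weight = correlation (similarity)
  w[is.na(w) | w < 0] <- 0                     # no self-loops, no negative weights
  adjacency <- (knn | t(knn)) * w
  list(dknn = dknn, critical = critical, selected = selected, adjacency = adjacency)
}
```

The returned weighted adjacency matrix is the kind of input an MCL implementation would then partition; the real function delegates that step to the external mcl program.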

----------Value----------

an object of class DBFMCLresult.

----------Warnings----------

With the current implementation, this function only works on UNIX-like platforms.

mcl must be installed. One can use the following command lines in a terminal:

# Download the latest version of mcl (the script has been tested successfully with the 06-058 version).
wget http://micans.org/mcl/src/mcl-latest.tar.gz

# Uncompress and install mcl (replace mcl-xx-xxx with the name of the extracted directory)
tar xvfz mcl-latest.tar.gz
cd mcl-xx-xxx
./configure
make
sudo make install

# mcl should now be in your PATH
mcl -h
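After installation, one way to confirm from an R session that the mcl binary is visible is base R's Sys.which(), which returns an empty string when a program is not found on the PATH. This check is a suggestion, not something documented for DBFMCL() itself.

```r
# Look up the mcl executable on the PATH; Sys.which() returns "" if it is absent.
mcl_path <- Sys.which("mcl")
if (nzchar(mcl_path)) {
  message("mcl found at: ", mcl_path)
} else {
  message("mcl not found on the PATH; DBFMCL() requires it for the MCL step")
}
```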

----------Author(s)----------

Bergon A., Lopez F., Textoris J., Granjeaud S. and Puthier D.

----------References----------

flexible toolbox to explore productively the transcriptional landscape of the Gene Expression Omnibus database. PLoS ONE, 2008;3(12):e4001.

----------See Also----------

createSignatures4TB

----------Examples----------

## Not run:
## with an artificial dataset
library(RTools4TB)

m <- matrix(rnorm(80000), ncol = 20)
m[1:100, 1:10] <- m[1:100, 1:10] + 4
m[101:200, 11:20] <- m[101:200, 11:20] + 3
m[201:300, 5:15] <- m[201:300, 5:15] - 2

res <- DBFMCL(data = m, distance.method = "pearson", clustering = TRUE, k = 25)
plotGeneExpProfiles(res)
plotGeneExpProfiles(res, signatures = 1)

## with a real dataset
library(Biobase)   # provides exprs()
library(ALL)
data(ALL)
sub <- exprs(ALL)[1:3000, ]

# First, normalize the data set using the doNormalScore function.
subNorm <- doNormalScore(sub)
res <- DBFMCL(subNorm, distance.method = "pearson", memory.used = 512)

# The results are stored in an instance of class DBFMCLresult.
class(res)
res

# The expression matrix is stored in the data slot.
# This matrix contains only the genes detected as informative (that is, falling into a cluster).
head(res@data[, 1:2])

# The partitioning results are stored in the cluster slot.
slotNames(res)

# Here, 3 TS (transcriptional signatures) were found.
res@size

# The following instruction returns the expression matrix corresponding to the first TS.
res@data[res@cluster == 1, ]

# The high-level function plotGeneExpProfiles can be used to visualize,
# for instance, the gene expression profiles corresponding to the first signature.
plotGeneExpProfiles(res, signatures = 1)

# To store the partitioning results on disk (as a tab-delimited file),
# use the writeDBFMCLresult function as shown below.
writeDBFMCLresult(res, filename.out = "ALL.sign.txt")

## End(Not run)

When reposting, please credit the source: Biostatistic Home (http://www.biostatistic.net).
