
R language: RTools4TB package, DBFMCL() function help documentation

Posted on 2012-2-26 13:33:25
DBFMCL(RTools4TB)
DBFMCL() belongs to the R package: RTools4TB

The "Density Based Filtering and Markov CLustering" algorithm (DBF-MCL).

Translator: 生物统计家园网 robot LoveR

----------Description----------

DBF-MCL is a three-step adaptive algorithm (http://tagc.univ-mrs.fr/tbrowser/) that (i) finds elements located in dense areas (DBF), (ii) uses the selected items to construct a graph, and (iii) performs graph partitioning using the Markov CLustering Algorithm (MCL).

This function requires installation of the mcl program (http://www.micans.org/mcl). See the "Warnings" section for more information.


----------Usage----------


DBFMCL(data = NULL, filename = NULL, path = ".", name = NULL, distance.method = c("pearson", "spearman", "euclidean", "spm", "spgm"),
clustering = TRUE, silent = FALSE, verbose = TRUE, k = 150, random = 3, memory.used = 1024, fdr = 10, inflation = 2.0, set.seed = 123, returnRank = FALSE)



----------Arguments----------

Argument: data
a matrix, data.frame or ExpressionSet object.


Argument: filename
a character string representing the file name.


Argument: name
a prefix for the names of the intermediary files created by DBF and MCL.


Argument: path
a character string representing the data directory where intermediary files are to be stored. Defaults to the current working directory.


Argument: distance.method
the method used to compute the distance to the k-th nearest neighbor. One of "pearson" (Pearson's correlation coefficient-based distance), "spearman" (Spearman's rho-based distance), "euclidean", "spm" or "spgm". The "spm" distance corresponds to the arithmetic mean ("pearson" + "spearman")/2, whereas "spgm" is the geometric mean sqrt("pearson" * "spearman").


Argument: clustering
indicates whether the partitioning step (MCL) should be applied to the data. If clustering = FALSE, the function returns a DBFMCLresult object that contains the informative elements (as detected by the DBF step) grouped into a single cluster.


Argument: silent
if set to TRUE, the progress of the distance matrix calculation is not displayed.


Argument: verbose
if set to TRUE, the function runs verbosely.


Argument: k
the neighborhood size.


Argument: random
the number of simulated distributions S to compute. By default, random = 3.


Argument: memory.used
size of the memory used to store part of the distance matrix. The resulting sub-matrix is used to compute simulated distances to the k-th nearest neighbor (see the Details section).


Argument: fdr
an integer value corresponding to the false discovery rate, expressed as a percentage (range: 0 to 100).


Argument: inflation
the main control of MCL. Inflation affects cluster granularity. It is usually chosen somewhere in the range [1.2-5.0]. inflation = 5.0 will tend to result in fine-grained clusterings, whereas inflation = 1.2 will tend to result in very coarse-grained clusterings. By default, inflation = 2.0. The default setting gives very good results for microarray data when k is set between 70 and 250.


Argument: set.seed
specifies the seed for the random number generator.


Argument: returnRank
if set to TRUE, the returned DBFMCLresult object also contains a rank matrix. The output files are processed using the normalization argument.
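
The distance.method options can be illustrated with a minimal base-R sketch. It assumes that a correlation-based distance is defined as 1 - correlation, which is a common convention but is not stated in this help page. The matrix m and all variable names are illustrative only, and the package's internal implementation may differ.

```r
set.seed(123)
# Toy expression matrix: 6 genes (rows) x 8 samples (columns).
m <- matrix(rnorm(48), nrow = 6)

# Assumption: distance = 1 - correlation between gene profiles (rows).
d_pearson  <- 1 - cor(t(m), method = "pearson")
d_spearman <- 1 - cor(t(m), method = "spearman")

# Euclidean distance between gene profiles.
d_euclid <- as.matrix(dist(m, method = "euclidean"))

# "spm": arithmetic mean of the two correlation-based distances.
d_spm <- (d_pearson + d_spearman) / 2

# "spgm": geometric mean of the two correlation-based distances.
# pmax() guards against tiny negative products caused by rounding.
d_spgm <- sqrt(pmax(d_pearson * d_spearman, 0))
```

Because the geometric mean never exceeds the arithmetic mean for non-negative values, d_spgm is at most d_spm entry by entry in this sketch.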


----------Details----------

When analyzing a noisy dataset, one is interested in isolating dense regions, as they are populated with genes/elements that lie at small distances from their nearest neighbors (i.e. strong profile similarities). To isolate these regions, DBF-MCL computes, for each gene/element, the distance to its k-th nearest neighbor (DKNN).

In order to define a critical DKNN value that depends on the dataset, and below which a gene/element is considered to fall in a dense area, DBF-MCL computes simulated DKNN values using an empirical randomization procedure. Given a dataset containing n genes and p samples, a simulated DKNN value is obtained by sampling n distance values from the gene-gene distance matrix D and extracting the k-th smallest value. This procedure is repeated n times to obtain a set of simulated DKNN values S. The computed distributions of simulated DKNN values are used to compute an FDR value for each observed DKNN value. The critical DKNN value is the one at which the user-defined FDR value (typically 10%) is observed.

Genes with a DKNN value below this threshold are selected and used to construct a graph. In this graph, an edge is created between two genes (nodes) if one of them belongs to the k nearest neighbors of the other. Edges are weighted by the corresponding correlation coefficient (i.e., similarity), and the resulting graph is partitioned using the Markov CLustering Algorithm (MCL).
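
The filtering step described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the distance is taken to be 1 - Pearson correlation, a single simulated set S is used, and the FDR at a threshold t is estimated as the ratio of simulated to observed DKNN values at or below t. The toy data and all names (kth_smallest, fdr_at, and so on) are hypothetical.

```r
set.seed(123)
n <- 200; p <- 20; k <- 10; fdr <- 10

# Toy data: n genes x p samples; the first 40 genes share a common profile.
m <- matrix(rnorm(n * p), nrow = n)
m[1:40, 1:10] <- m[1:40, 1:10] + 4

# Gene-gene distance matrix D (assumption: 1 - Pearson correlation).
d <- 1 - cor(t(m))
diag(d) <- NA                      # a gene is not its own neighbor

# k-th smallest value of a vector (sort() drops NA by default).
kth_smallest <- function(x, k) sort(x)[k]

# Observed DKNN: distance from each gene to its k-th nearest neighbor.
dknn_obs <- apply(d, 1, kth_smallest, k = k)

# Simulated DKNN: sample n distances from D, keep the k-th smallest,
# and repeat n times to obtain the set S.
d_pool <- d[upper.tri(d)]
dknn_sim <- replicate(n, kth_smallest(sample(d_pool, n), k))

# Assumed FDR estimate (in percent) at threshold t:
# 100 * (# simulated DKNN <= t) / (# observed DKNN <= t).
fdr_at <- function(t) 100 * sum(dknn_sim <= t) / sum(dknn_obs <= t)
fdr_obs <- vapply(dknn_obs, fdr_at, numeric(1))

# Critical DKNN: largest observed value whose estimated FDR is within the limit.
passing <- dknn_obs[fdr_obs <= fdr]
critical <- if (length(passing) > 0) max(passing) else -Inf

# Genes considered to fall in a dense area.
selected <- which(dknn_obs <= critical)
```

With this toy data one would expect selected to consist mostly of the first 40 genes, although the exact result depends on the random draw.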
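
The graph construction and partitioning steps can be sketched in the same spirit. The code below is a small base-R illustration of a k-nearest-neighbor graph followed by the basic MCL iteration (expansion, then inflation); it is not the mcl program that DBFMCL() calls. The function names knn_graph and mcl_sketch, the self-loop convention, and the way clusters are read off the converged matrix are all assumptions made for this example, and similarities are assumed to be non-negative.

```r
# Build a k-nearest-neighbor graph from a (non-negative) similarity matrix.
knn_graph <- function(sim, k) {
  n <- nrow(sim)
  adj <- matrix(0, n, n)
  for (i in seq_len(n)) {
    s <- sim[i, ]
    s[i] <- -Inf                                   # exclude the gene itself
    nn <- order(s, decreasing = TRUE)[seq_len(k)]  # k most similar genes
    adj[i, nn] <- sim[i, nn]                       # edge weight = similarity
  }
  pmax(adj, t(adj))   # keep an edge if either gene lists the other
}

# Basic MCL iteration on a weighted adjacency matrix.
mcl_sketch <- function(adj, inflation = 2, max_iter = 100, tol = 1e-8) {
  a <- adj
  diag(a) <- 1                                     # add self-loops
  m <- sweep(a, 2, colSums(a), "/")                # make columns sum to 1
  for (i in seq_len(max_iter)) {
    expanded <- m %*% m                            # expansion
    inflated <- expanded ^ inflation               # inflation (element-wise power)
    m_new <- sweep(inflated, 2, colSums(inflated), "/")
    converged <- max(abs(m_new - m)) < tol
    m <- m_new
    if (converged) break
  }
  # Nodes whose columns keep mass on the same rows share a cluster.
  key <- apply(m > 1e-6, 2, function(col) paste(which(col), collapse = ","))
  as.integer(factor(key, levels = unique(key)))
}

# Toy similarity matrix: two groups of three genes.
sim <- matrix(0.1, 6, 6)
sim[1:3, 1:3] <- 0.9
sim[4:6, 4:6] <- 0.9
diag(sim) <- 1

adj <- knn_graph(sim, k = 2)
labels <- mcl_sketch(adj, inflation = 2)
print(labels)
```

In this toy case the two groups are disconnected in the k-nearest-neighbor graph, so genes 1 to 3 receive one label and genes 4 to 6 another. On real graphs, the inflation value controls how finely connected regions are split.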


----------Value----------

an object of class DBFMCLresult.


----------Warnings----------

With the current implementation, this function works only on UNIX-like platforms.

MCL must be installed. One can use the following command lines in a terminal:

# Download the latest version of mcl (the script has been tested successfully with the 06-058 version).
wget http://micans.org/mcl/src/mcl-latest.tar.gz

# Uncompress and install mcl
tar xvfz mcl-latest.tar.gz
cd mcl-xx-xxx
./configure
make
sudo make install

# You should get mcl in your path
mcl -h
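
Both prerequisites can also be checked from within R before calling DBFMCL(). The snippet below is a small sketch that uses only base R (.Platform and Sys.which); the variable names and messages are illustrative and are not part of RTools4TB.

```r
# Is this a UNIX-like platform?
is_unix <- .Platform$OS.type == "unix"

# Sys.which() returns "" when the program is not found on the PATH.
mcl_path <- unname(Sys.which("mcl"))
mcl_found <- nzchar(mcl_path)

if (!is_unix) {
  message("DBFMCL() is documented to work only on UNIX-like platforms.")
} else if (!mcl_found) {
  message("mcl was not found on the PATH; install it before calling DBFMCL().")
} else {
  message("mcl found at: ", mcl_path)
}
```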


----------Author(s)----------


Bergon A., Lopez F., Textoris J., Granjeaud S. and Puthier D.



----------References----------

flexible toolbox to explore productively the transcriptional landscape of the Gene Expression Omnibus database. PLoS ONE, 2008;3(12):e4001.


----------See Also----------

createSignatures4TB


----------Examples----------


## Not run:
## with an artificial dataset
library(RTools4TB)

m <- matrix(rnorm(80000), ncol = 20)
m[1:100, 1:10] <- m[1:100, 1:10] + 4
m[101:200, 11:20] <- m[101:200, 11:20] + 3
m[201:300, 5:15] <- m[201:300, 5:15] - 2

res <- DBFMCL(data = m, distance.method = "pearson", clustering = TRUE, k = 25)
plotGeneExpProfiles(res)
plotGeneExpProfiles(res, signatures = 1)

## with a real dataset
library(ALL)
data(ALL)
sub <- exprs(ALL)[1:3000, ]

# First, we normalize the data set using the doNormalScore function.
subNorm <- doNormalScore(sub)
res <- DBFMCL(subNorm, distance.method = "pearson", memory.used = 512)

# The results are stored in an instance of class DBFMCLresult.
class(res)
res

# The expression matrix is stored in the data slot.
# This matrix contains only genes detected as informative (that is, falling into a cluster).
head(res@data[, 1:2])

# The partitioning results are stored in the cluster slot.
slotNames(res)

# Here, 3 TS were found.
res@size

# The following instruction can be used to get the expression matrix corresponding to the first TS.
res@data[res@cluster == 1, ]

# The high-level function plotGeneExpProfiles can be used to visualize,
# for instance, the gene expression profiles corresponding to the first signature.
plotGeneExpProfiles(res, signatures = 1)

# To store the partitioning results on disk (as a tab-delimited file),
# use the writeDBFMCLresult function as shown below.
writeDBFMCLresult(res, filename.out = "ALL.sign.txt")


## End(Not run)

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

