cmp.cluster(ChemmineR)
cmp.cluster() belongs to the R package ChemmineR
Cluster compounds using a descriptor database
----------Description----------
'cmp.cluster' takes the compound descriptors in a database and clusters the compounds based on their pairwise distances. It uses single linkage to measure the distance between clusters when merging them. 'cmp.cluster' accepts either a single cutoff or a vector of cutoffs. With a cutoff vector, it can generate the same result as hierarchical clustering.
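To make the single-linkage idea concrete, here is a minimal sketch in base R that does not use ChemmineR at all. It assumes toy one-dimensional data and a hypothetical helper, single_linkage_cut(). Whether 'cmp.cluster' treats a distance exactly equal to the cutoff as "within" is not stated above, so the `<=` below is an assumption. The sketch shows that clustering at a single cutoff gives the same groups as cutting a single-linkage hierarchical tree at that height.

```r
# Toy data: five points on a line; distances are absolute differences.
x <- c(a = 0, b = 1, c = 2.2, d = 6, e = 6.5)
d <- as.matrix(dist(x))

# Hypothetical helper (not part of ChemmineR): single-linkage clustering at
# one distance cutoff. Two items end up in the same cluster when a chain of
# pairs, each no farther apart than the cutoff, links them.
single_linkage_cut <- function(d, cutoff) {
  n <- nrow(d)
  label <- seq_len(n)                        # every item starts alone
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      if (d[i, j] <= cutoff && label[i] != label[j]) {
        label[label == label[j]] <- label[i] # merge the two clusters
      }
    }
  }
  match(label, unique(label))                # renumber clusters 1, 2, ...
}

cl <- single_linkage_cut(d, cutoff = 1.5)
names(cl) <- names(x)

# Reference: base R hierarchical clustering with single linkage,
# cut at the same height.
ref <- cutree(hclust(dist(x), method = "single"), h = 1.5)

print(cl)
print(ref)
```

Both vectors place a, b and c in one cluster and d and e in another, because c is within 1.5 of b even though it is farther than 1.5 from a.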
----------Usage----------
cmp.cluster(db, cutoff, is.similarity = TRUE, save.distances = FALSE,
use.distances = NULL, quiet = FALSE, ...)
----------Arguments----------
Argument: db
The descriptor database, in the format returned by 'cmp.parse'.
Argument: cutoff
The clustering cutoff. Can be a single value or a vector. The cutoff gives the maximum distance between two compounds for them to be grouped in the same cluster.
Argument: is.similarity
Set to TRUE when the cutoff supplied is a similarity cutoff. In that case the cutoff is the minimum similarity between two compounds for them to be grouped in the same cluster.
Argument: save.distances
Whether to save the distances for future clustering. See Details below.
Argument: use.distances
Supply a pre-computed distance matrix.
Argument: quiet
Whether to suppress the progress information.
Argument: ...
Further arguments passed to 'cmp.similarity' to calculate similarities if necessary.
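The relationship between a similarity cutoff ('is.similarity = TRUE') and a distance cutoff can be sketched as follows. This assumes a Tanimoto-style similarity on descriptor sets and the convention distance = 1 - similarity; neither is stated in the text above, and the descriptor values and the tanimoto() helper are made up for illustration.

```r
# Hypothetical helper: Tanimoto similarity between two descriptor sets,
# i.e. number of shared descriptors divided by number of distinct descriptors.
tanimoto <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Made-up descriptor sets for two compounds.
cmp1 <- c(11, 12, 13, 14)
cmp2 <- c(12, 13, 14, 15, 16)

sim <- tanimoto(cmp1, cmp2)   # 3 shared / 6 distinct = 0.5
dst <- 1 - sim                # assumed distance convention

# Under that convention, a similarity cutoff s corresponds to the
# distance cutoff 1 - s, and both tests agree.
sim_cutoff <- 0.4
same_cluster_by_similarity <- sim >= sim_cutoff
same_cluster_by_distance   <- dst <= 1 - sim_cutoff
```

With these values both tests are TRUE, so the two compounds would be grouped together under either way of expressing the cutoff.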
----------Details----------
'cmp.cluster' computes distances on the fly if 'use.distances' is not set. Furthermore, if 'save.distances' is not set, distances are never stored, and the distance between any two compounds is guaranteed not to be computed twice. This lets 'cmp.cluster' handle large databases for which an in-memory distance matrix is not feasible. Clustering is expected to be slower in this mode because the distance values are transient.
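The on-the-fly mode described above can be pictured with a small base R sketch. It is not the ChemmineR implementation: cluster_on_the_fly() and its dist_fun argument are hypothetical, and the merge rule is a plain single-linkage relabeling. The point is only that each pair's distance is evaluated once, used, and discarded, so no distance matrix is ever held in memory.

```r
# Hypothetical sketch: cluster without storing a distance matrix.
# 'dist_fun' is any function returning the distance between two items.
cluster_on_the_fly <- function(items, dist_fun, cutoff) {
  n <- length(items)
  label <- seq_len(n)
  evaluations <- 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      evaluations <- evaluations + 1
      # The distance is computed, compared with the cutoff, then dropped.
      if (dist_fun(items[[i]], items[[j]]) <= cutoff &&
          label[i] != label[j]) {
        label[label == label[j]] <- label[i]
      }
    }
  }
  list(cluster = match(label, unique(label)), evaluations = evaluations)
}

res <- cluster_on_the_fly(list(0, 1, 2.2, 6, 6.5),
                          function(a, b) abs(a - b),
                          cutoff = 1.5)
print(res$cluster)
print(res$evaluations)   # one evaluation per unordered pair
```

For five items the sketch evaluates the distance function 10 times, once for each of the 5 * 4 / 2 pairs.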
When 'save.distances' is set, 'cmp.cluster' is forced to compute the full distance matrix and keep it in memory before clustering. This is useful when you plan to do further clustering later and do not want the distances to be recomputed. Set 'save.distances' to TRUE if you only want to force the clustering to use this two-step approach; otherwise, set it to the filename under which the distance matrix should be saved. To reuse a saved distance matrix, 'load' it and supply it to 'cmp.cluster' via the 'use.distances' argument.
'cmp.cluster' supports a vector of cutoffs. With multiple cutoffs, 'cmp.cluster' still guarantees that pairwise distances are never recomputed and that no copy of the distances is kept in memory. It is guaranteed to be as fast as calling 'cmp.cluster' with the single cutoff that takes the longest to process, plus a small overhead linear in that processing time.
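As a rough analogy for what a vector of cutoffs produces, base R can cut one single-linkage tree at several heights in a single call. The sketch below uses toy data and base R's hclust() and cutree() rather than ChemmineR, and it says nothing about how 'cmp.cluster' achieves its speed guarantee; it only illustrates that several cutoffs yield one cluster assignment per cutoff.

```r
# Toy data: five points on a line.
x <- c(a = 0, b = 1, c = 2.2, d = 6, e = 6.5)
tree <- hclust(dist(x), method = "single")

# One column of cluster IDs per cutoff (distance thresholds here).
cuts <- cutree(tree, h = c(0.7, 1.5, 4))
print(cuts)

# Number of clusters under each cutoff: looser cutoffs give fewer clusters.
n_clusters <- apply(cuts, 2, max)
print(n_clusters)
```

Here the three cutoffs give 4, 2 and 1 clusters respectively.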
----------Value----------
Returns a data frame. Besides one variable giving the compound ID, each remaining variable gives either the cluster IDs of the compounds under one clustering cutoff or the sizes of the clusters the compounds belong to. When N cutoffs are given, 2*N+1 variables are generated in total: N give the cluster ID of each compound under each of the N cutoffs, and the other N give the corresponding cluster sizes. The rows are sorted by cluster size.
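The shape of such a result can be mocked up in base R. Everything in this sketch is illustrative: the cluster IDs are invented, the column names (ids, CLSZ_<cutoff>, CLID_<cutoff>) are placeholders rather than names confirmed by the text above, and sorting largest-first on the first cutoff is an assumption, since the text only says the rows are sorted by cluster size.

```r
ids <- c("a", "b", "c", "d", "e")

# Invented cluster IDs under two cutoffs (N = 2).
clid <- list("0.7" = c(1, 2, 3, 4, 4),
             "1.5" = c(1, 1, 1, 2, 2))

result <- data.frame(ids = ids, stringsAsFactors = FALSE)
for (cutoff in names(clid)) {
  id <- clid[[cutoff]]
  # Size of the cluster each compound belongs to.
  size <- as.vector(table(id)[as.character(id)])
  result[[paste0("CLSZ_", cutoff)]] <- size   # placeholder column name
  result[[paste0("CLID_", cutoff)]] <- id     # placeholder column name
}

# Assumed ordering: largest clusters first, judged by the first cutoff.
result <- result[order(-result[[2]]), ]
print(result)
print(ncol(result))   # 2 * N + 1 = 5 columns for N = 2 cutoffs
```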
----------Author(s)----------
Y. Eddie Cao, Li-Chang Cheng
----------See Also----------
cmp.parse1, cmp.parse, cmp.search, cmp.similarity
----------Examples----------
library(ChemmineR)
## Load sample SD file
# data(sdfsample); sdfset <- sdfsample
## Generate atom pair descriptor database for searching
# apset <- sdf2ap(sdfset)
## Load the same atom pair sample data set provided by the library
data(apset)
db <- apset
## Cluster using multiple cutoffs
clusters <- cmp.cluster(db, cutoff=c(0.5, 0.85))
## Or save the distances before clustering:
clusters <- cmp.cluster(db, cutoff=0.65, save.distances="distmat.rda")
# Later, you can load the matrix and pass it in for clustering. load() restores
# the variable 'distmat' that contains the distance matrix
load("distmat.rda")
clusters <- cmp.cluster(db, cutoff=0.60, use.distances=distmat)