similarity(sets)
similarity() belongs to the R package: sets
Similarity and Dissimilarity Functions
Description
Similarities and dissimilarities for (generalized) sets.
Usage
set_similarity(x, y, method = "Jaccard")
gset_similarity(x, y, method = "Jaccard")
cset_similarity(x, y, method = "Jaccard")
set_dissimilarity(x, y, method = c("Jaccard", "Manhattan", "Euclidean", "L1", "L2"))
gset_dissimilarity(x, y, method = c("Jaccard", "Manhattan", "Euclidean", "L1", "L2"))
cset_dissimilarity(x, y, method = c("Jaccard", "Manhattan", "Euclidean", "L1", "L2"))
Arguments
Argument: x, y
Two (generalized/customizable) sets.
Argument: method
Character string specifying the proximity method (see below).
Details
For two generalized sets X and Y, the Jaccard similarity is |X intersect Y| / |X U Y| where |.| denotes the cardinality for generalized sets (sum of memberships). The Jaccard dissimilarity is 1 minus the similarity.
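To make the Jaccard definition concrete, the following is a minimal base R sketch that does not call the sets package. It assumes fuzzy sets are stored as named numeric vectors of memberships, and that intersection and union are the element-wise minimum and maximum (the usual fuzzy-set convention, assumed here rather than taken from the package). The helper name `jaccard_fuzzy` is hypothetical.

```r
# Hypothetical helper: Jaccard similarity for fuzzy sets stored as
# named numeric vectors (names = elements, values = memberships).
jaccard_fuzzy <- function(a, b) {
  universe <- union(names(a), names(b))

  # Membership is 0 for elements that do not occur in a set.
  ma <- setNames(numeric(length(universe)), universe)
  mb <- ma
  ma[names(a)] <- a
  mb[names(b)] <- b

  # Assumption: intersection = element-wise min, union = element-wise max.
  # Cardinality of a generalized set = sum of its memberships.
  sum(pmin(ma, mb)) / sum(pmax(ma, mb))
}

A <- c(a = 0.3, b = 0.7, c = 0.9)
B <- c(c = 0.2, d = 0.4, e = 0.5)

sim    <- jaccard_fuzzy(A, B)  # |X intersect Y| / |X U Y|
dissim <- 1 - sim              # Jaccard dissimilarity
```

For ordinary (crisp) sets every membership is 1, so the same helper reduces to the number of shared elements divided by the number of distinct elements.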
The L1 (or Manhattan) and L2 (or Euclidean) dissimilarities are defined as follows. For two fuzzy multisets A and B on a given universe X with elements x, let M_A(x) and M_B(x) be functions returning the memberships of an element x in sets A and B, respectively. The memberships are returned in standard form, i.e., as an infinite vector of decreasing membership values, e.g. (1, 0.3, 0, 0, ...). Let M_A(x)_i and M_B(x)_i denote the ith components of these membership vectors. Then the L1 distance is defined as:
d_1(A, B) = \sum_{x \in X} \sum_{i=1}^{\infty} |M_A(x)_i - M_B(x)_i|
and the L2 distance as:
d_2(A, B) = \sqrt{\sum_{x \in X} \sum_{i=1}^{\infty} |M_A(x)_i - M_B(x)_i|^2}
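The L1 and L2 dissimilarities can be sketched in base R as follows, assuming the standard definitions: L1 sums the absolute component-wise differences of the membership vectors over all elements of the universe, and L2 takes the square root of the summed squared differences. This is an illustration only, not the package's implementation. Fuzzy multisets are assumed to be stored as named lists of membership values, and the helper names `membership_vector` and `lp_dissimilarity` are hypothetical.

```r
# Hypothetical helper: memberships of element x in multiset m, in
# standard form (decreasing order), zero-padded to length len as a
# finite stand-in for the infinite vector.
membership_vector <- function(m, x, len) {
  v <- if (x %in% names(m)) sort(m[[x]], decreasing = TRUE) else numeric(0)
  c(v, numeric(len - length(v)))
}

# Hypothetical helper: p = 1 gives the L1 (Manhattan) dissimilarity,
# p = 2 the L2 (Euclidean) dissimilarity.
lp_dissimilarity <- function(a, b, p = 1) {
  universe <- union(names(a), names(b))
  len <- max(lengths(a), lengths(b))  # longest membership vector
  total <- 0
  for (x in universe) {
    d <- abs(membership_vector(a, x, len) - membership_vector(b, x, len))
    total <- total + sum(d^p)
  }
  total^(1 / p)
}

A <- list(a = c(0.3, 0.7), b = 0.1, c = 0.9)
B <- list(c = 0.2, d = c(0.4, 0.5), e = 0.8)

d1 <- lp_dissimilarity(A, B, p = 1)
d2 <- lp_dissimilarity(A, B, p = 2)
```

Zero-padding to the longest observed vector is enough here, because every component beyond that length is zero in both sets and contributes nothing to either sum.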
Value
A numeric value (similarity or dissimilarity, as specified).
Source
T. Matthe, R. De Caluwe, G. de Tre, A. Hallez, J. Verstraete, M. Leman, O. Cornelis, D. Moelants, and J. Gansemans (2006), Similarity Between Multi-valued Thesaurus Attributes: Theory and Application in Multimedia Systems, Flexible Query Answering Systems, Lecture Notes in Computer Science, Springer, 331–342.
K. Mizutani, R. Inokuchi, and S. Miyamoto (2008), Algorithms of Nonlinear Document Clustering Based on Fuzzy Multiset Model, International Journal of Intelligent Systems, 23, 176–198.
See Also
set.
Examples
library(sets)  # provides set(), gset() and the (dis)similarity functions

## Crisp sets: one shared element ("c") out of five distinct elements
A <- set("a", "b", "c")
B <- set("c", "d", "e")
set_similarity(A, B)
set_dissimilarity(A, B)

## Fuzzy sets: one membership value per element
A <- gset(c("a", "b", "c"), c(0.3, 0.7, 0.9))
B <- gset(c("c", "d", "e"), c(0.2, 0.4, 0.5))
gset_similarity(A, B, "Jaccard")
gset_dissimilarity(A, B, "Jaccard")
gset_dissimilarity(A, B, "L1")
gset_dissimilarity(A, B, "L2")

## Fuzzy multisets: an element may carry several membership values
A <- gset(c("a", "b", "c"), list(c(0.3, 0.7), 0.1, 0.9))
B <- gset(c("c", "d", "e"), list(0.2, c(0.4, 0.5), 0.8))
gset_similarity(A, B, "Jaccard")
gset_dissimilarity(A, B, "Jaccard")
gset_dissimilarity(A, B, "L1")
gset_dissimilarity(A, B, "L2")