R language: dist() function help documentation in Chinese (Chinese-English parallel text)

loveR · posted on 2012-2-16 19:51:13

dist(stats)
R package containing dist(): stats

                                    Distance Matrix Computation

                                       Translator: LoveR, the translation robot of biostatistic.net

----------Description----------

This function computes and returns the distance matrix obtained by applying the specified distance measure to every pair of rows of a data matrix.
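A minimal sketch of the row-wise behaviour, using a made-up 3 x 2 matrix x (the data and the names d_rows and d_cols are illustrative only). To get distances between variables (columns) instead, transpose the matrix first:

```r
# Hypothetical toy data: 3 observations (rows) on 2 variables (columns)
x <- rbind(a = c(0, 0),
           b = c(3, 4),
           c = c(6, 8))

d_rows <- dist(x)      # 3 rows    -> 3 pairwise distances (a-b, a-c, b-c)
d_cols <- dist(t(x))   # transpose -> one distance between the 2 variables
d_rows
length(d_rows)         # 3
length(d_cols)         # 1
```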

----------Usage----------

dist(x, method = "euclidean", diag = FALSE, upper = FALSE, p = 2)

as.dist(m, diag = FALSE, upper = FALSE)
## Default S3 method:
as.dist(m, diag = FALSE, upper = FALSE)

## S3 method for class 'dist'
print(x, diag = NULL, upper = NULL,
   digits = getOption("digits"), justify = "none",
   right = TRUE, ...)

## S3 method for class 'dist'
as.matrix(x, ...)

----------Arguments----------

Argument: x
a numeric matrix, data frame or "dist" object.

Argument: method
the distance measure to be used. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". Any unambiguous abbreviation can be given (e.g. "man" for "manhattan").

Argument: diag
logical value indicating whether the diagonal of the distance matrix should be printed by print.dist.

Argument: upper
logical value indicating whether the upper triangle of the distance matrix should be printed by print.dist.

Argument: p
The power of the Minkowski distance (used only when method = "minkowski").

Argument: m
An object with distance information to be converted to a "dist" object. For the default method, a "dist" object, a matrix (of distances), or an object which can be coerced to such a matrix using as.matrix(). (Only the lower triangle of the matrix is used; the rest is ignored.)

Argument: digits, justify
passed to format() inside of print().

Argument: right, ...
further arguments, passed to other methods.
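The sketch below (toy data, assumed for illustration) checks two of the arguments above: an abbreviated method name, and the effect of p, which only matters for "minkowski". With p = 1 and p = 2 the Minkowski distance should reduce to the Manhattan and Euclidean distances respectively.

```r
# Two hypothetical observations in 2 dimensions
x <- rbind(c(0, 0), c(3, 4))

# An unambiguous abbreviation is accepted: "man" is matched to "manhattan"
d_man <- dist(x, method = "man")                # |0 - 3| + |0 - 4| = 7

# p is used only by method = "minkowski"
d_p1 <- dist(x, method = "minkowski", p = 1)    # same as manhattan: 7
d_p2 <- dist(x, method = "minkowski", p = 2)    # same as euclidean: 5
d_p3 <- dist(x, method = "minkowski", p = 3)    # (3^3 + 4^3)^(1/3) = 91^(1/3)

# "ma" could mean "maximum" or "manhattan", so dist() stops with an error
ambiguous <- tryCatch(dist(x, method = "ma"), error = function(e) "error")
ambiguous
```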

----------Details----------

Available distance measures are (written for two vectors x and y):

euclidean: Usual distance between the two vectors (2 norm), sqrt(sum((x_i - y_i)^2)).

maximum: Maximum distance between two components of x and y (supremum norm), max(|x_i - y_i|).

manhattan: Absolute distance between the two vectors (1 norm), sum(|x_i - y_i|).

canberra: sum(|x_i - y_i| / |x_i + y_i|). Terms with zero numerator and denominator are omitted from the sum and treated as if the values were missing.

This is intended for non-negative values (e.g. counts): taking the absolute value of the denominator is a 1998 R modification to avoid negative distances.

binary: (aka asymmetric binary): The vectors are regarded as binary bits, so non-zero elements are "on" and zero elements are "off". The distance is the proportion of bits in which only one is on amongst those in which at least one is on.

minkowski: The p norm, the pth root of the sum of the pth powers of the differences of the components, (sum(|x_i - y_i|^p))^(1/p).
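To make the formulas above concrete, the sketch below recomputes each measure by hand for two made-up non-negative vectors and compares the results with dist(). The vectors, p = 3, and the names by_hand and from_dist are assumptions for illustration; non-negative data are used so that the Canberra denominator |x_i + y_i| is unambiguous. Both vectors are 0 in the third component, so Canberra omits that 0/0 term and scales the remaining sum by 4/3.

```r
# Two hypothetical non-negative vectors (made up for illustration)
x <- c(1, 2, 0, 4)
y <- c(3, 0, 0, 1)
m <- rbind(x, y)   # dist() works on rows, so stack the vectors as rows
p <- 3

by_hand <- c(
  euclidean = sqrt(sum((x - y)^2)),
  maximum   = max(abs(x - y)),
  manhattan = sum(abs(x - y)),
  # canberra: drop the 0/0 term, then scale by (all components / used components)
  canberra  = {
    keep <- !(x == 0 & y == 0)
    sum(abs(x - y)[keep] / abs(x + y)[keep]) * length(x) / sum(keep)
  },
  # binary: among positions where at least one value is non-zero,
  # the proportion where exactly one is non-zero
  binary    = {
    on_x <- x != 0
    on_y <- y != 0
    sum(xor(on_x, on_y)) / sum(on_x | on_y)
  },
  minkowski = sum(abs(x - y)^p)^(1 / p)
)

# The same measures as computed by dist() (p is ignored except for minkowski)
from_dist <- sapply(names(by_hand), function(meth)
  as.vector(dist(m, method = meth, p = p)))

all.equal(by_hand, from_dist)   # TRUE
```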

Missing values are allowed, and are excluded from all computations involving the rows within which they occur. Further, when Inf values are involved, all pairs of values are excluded when their contribution to the distance gives NaN or NA. If some columns are excluded in calculating a Euclidean, Manhattan, Canberra or Minkowski distance, the sum is scaled up by the ratio of the total number of columns to the number of columns used. If all pairs are excluded when calculating a particular distance, the value is NA.
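A small sketch of the rescaling rule, on made-up data where one value is missing. With the Manhattan method, the three usable columns contribute 5, and that sum is scaled by 4/3 because only 3 of the 4 columns were used:

```r
# Hypothetical data: row 2 has a missing value in column 2
m <- rbind(c(1, 5, 2, 8),
           c(4, NA, 2, 6))
d <- dist(m, method = "manhattan")

# Only columns 1, 3 and 4 are usable: |1 - 4| + |2 - 2| + |8 - 6| = 5.
# That sum is scaled by 4 columns / 3 used columns, giving 5 * 4/3 = 20/3.
as.vector(d)

# When every column pair is excluded, the distance is NA
d_na <- dist(rbind(c(1, NA), c(NA, 2)))
as.vector(d_na)
```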

The "dist" method of as.matrix() and as.dist() can be used for conversion between objects of class "dist" and conventional distance matrices.

as.dist() is a generic function. Its default method handles objects inheriting from class "dist", or coercible to matrices using as.matrix(). Support for classes representing distances (also known as dissimilarities) can be added by providing an as.matrix() or, more directly, an as.dist method for such a class.
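A minimal sketch of both conversion routes described above. The first part round-trips a "dist" object through as.matrix() and shows that as.dist() reads only the lower triangle. The second part adds as.dist() support for a hypothetical class "simtab" (a similarity table invented for this example) by defining only an as.matrix() method, which the default as.dist() method then calls.

```r
## 1) "dist" <-> ordinary matrix
x <- rbind(a = c(0, 0), b = c(3, 4), c = c(6, 8))   # toy data
d <- dist(x)

m <- as.matrix(d)      # full symmetric 3 x 3 matrix, zero diagonal, labelled a, b, c
m["a", "c"]            # 10

m2 <- m
m2["a", "c"] <- 999    # edit the upper triangle only ...
d2 <- as.dist(m2)      # ... as.dist() ignores it and reads the lower triangle

## 2) Support for a new class via an as.matrix() method
# Hypothetical class: a list holding a similarity matrix with values in [0, 1]
simtab <- function(s) structure(list(sim = s), class = "simtab")

# Convert similarities to dissimilarities; the default as.dist() uses this
as.matrix.simtab <- function(x, ...) 1 - x$sim

s <- matrix(c(1.0, 0.8, 0.1,
              0.8, 1.0, 0.3,
              0.1, 0.3, 1.0), nrow = 3,
            dimnames = list(c("u", "v", "w"), c("u", "v", "w")))
d_sim <- as.dist(simtab(s))
d_sim                  # lower triangle of 1 - s: 0.2, 0.9, 0.7
```

Defining as.dist.simtab() directly would be the more direct route mentioned in the text; going through as.matrix() has the side benefit that other code calling as.matrix() on the object also works.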

----------Value----------

dist returns an object of class "dist".

The lower triangle of the distance matrix is stored by columns in a vector, say do. If n is the number of observations, i.e., n <- attr(do, "Size"), then for i < j ≤ n, the dissimilarity between (row) i and j is do[n*(i-1) - i*(i-1)/2 + j-i]. The length of the vector is n*(n-1)/2, i.e., of order n^2.
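The index formula can be checked directly. This sketch uses a hypothetical helper dist_index() (not part of stats) and random data, and compares every pair i < j against the full matrix from as.matrix(), whose lower-triangle entry [j, i] holds the same value:

```r
# Hypothetical helper (not part of stats): position of pair (i, j), i < j,
# in the "dist" vector, following the formula above
dist_index <- function(n, i, j) n * (i - 1) - i * (i - 1) / 2 + j - i

set.seed(1)
x  <- matrix(rnorm(20), nrow = 5)   # n = 5 observations, 4 variables
do <- dist(x)
n  <- attr(do, "Size")
full <- as.matrix(do)

ok <- TRUE
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    # compare the stored value with the corresponding full-matrix entry
    ok <- ok && do[dist_index(n, i, j)] == full[j, i]
  }
}
ok                                 # TRUE
length(do) == n * (n - 1) / 2      # TRUE: 10 stored values for n = 5
```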

The object has the following attributes (besides "class" equal to "dist"):

Attribute: Size
integer, the number of observations in the dataset.

Attribute: Labels
optionally, contains the labels, if any, of the observations of the dataset.

Attribute: Diag, Upper
logicals corresponding to the arguments diag and upper above, specifying how the object should be printed.

Attribute: call
optionally, the call used to create the object.

Attribute: method
optionally, the distance method used; resulting from dist(), the (match.arg()ed) method argument.
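A quick way to inspect these attributes on a small, made-up example. The abbreviation "max" is used deliberately to show that the stored method attribute holds the full, matched name:

```r
x <- matrix(c(1, 4, 2, 8, 5, 1), nrow = 3,
            dimnames = list(c("p", "q", "r"), NULL))   # toy data
d <- dist(x, method = "max", upper = TRUE)

attr(d, "Size")     # 3
attr(d, "Labels")   # "p" "q" "r"
attr(d, "Diag")     # FALSE (the default)
attr(d, "Upper")    # TRUE, as requested
attr(d, "method")   # "maximum"
attr(d, "call")     # the matched call that created d
class(d)            # "dist"
```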


----------See Also----------

daisy in the cluster package, with more possibilities in the case of mixed (continuous / categorical) variables; hclust.

----------Examples----------

require(graphics)

x <- matrix(rnorm(100), nrow=5)
dist(x)
dist(x, diag = TRUE)
dist(x, upper = TRUE)
m <- as.matrix(dist(x))
d <- as.dist(m)
stopifnot(d == dist(x))

## Use correlations between variables "as distance"
dd <- as.dist((1 - cor(USJudgeRatings))/2)
round(1000 * dd) # (prints more nicely)
plot(hclust(dd)) # to see a dendrogram of clustered variables

## example of binary and canberra distances.
x <- c(0, 0, 1, 1, 1, 1)
y <- c(1, 0, 1, 1, 0, 1)
dist(rbind(x,y), method= "binary")
## answer 0.4 = 2/5
dist(rbind(x,y), method= "canberra")
## answer 2 * (6/5) = 2.4 (the 0/0 term is omitted, so the sum 2 is scaled by 6/5)

## To find the names
labels(eurodist)

## Examples involving "Inf" :
## 1)
x[6] <- Inf
(m2 <- rbind(x,y))
dist(m2, method="binary") # warning, answer 0.5 = 2/4
## These all give "Inf":
stopifnot(Inf == dist(m2, method= "euclidean"),
          Inf == dist(m2, method= "maximum"),
          Inf == dist(m2, method= "manhattan"))
##  "Inf" is same as very large number:
x1 <- x; x1[6] <- 1e100
stopifnot(dist(cbind(x ,y), method="canberra") ==
          print(dist(cbind(x1,y), method="canberra")))

## 2)
y[6] <- Inf # -> the 6th pair is excluded
dist(rbind(x,y), method="binary")    # warning; 0.5
dist(rbind(x,y), method="canberra")  # 3
dist(rbind(x,y), method="maximum")   # 1
dist(rbind(x,y), method="manhattan") # 2.4

Please credit the source when reposting: biostatistic.net (http://www.biostatistic.net).

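Notes:
Note 1: For the convenience of learners, this document was translated by LoveR, the biostatistic.net robot. It is intended only for personal reference while learning R; biostatistic.net retains the copyright.
Note 2: Because the translation was produced automatically by a robot, some inaccuracies are unavoidable. When using it, compare the Chinese and English text carefully; doing so can also help you learn R.
Note 3: If you find an inaccuracy, please reply at the end of this post and we will gradually revise it.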
