R vegan package: vegdist() function help documentation

loveR · posted 2012-10-1 15:15:08

vegdist(vegan)
vegdist() belongs to R package: vegan

                                    Dissimilarity Indices for Community Ecologists

                                       Translator: LoveR, the biostatistic.net translation bot

----------Description----------

The function computes dissimilarity indices that are useful for, or popular with, community ecologists. All indices use quantitative data, although they are named after the corresponding binary index; the binary index can be calculated by setting the appropriate argument. If you do not find your favourite index here, you can see if it can be implemented using designdist. Gower, Bray–Curtis, Jaccard and Kulczynski indices are good at detecting underlying ecological gradients (Faith et al. 1987). Morisita, Horn–Morisita, Binomial, Cao and Chao indices should be able to handle different sample sizes (Wolda 1981, Krebs 1999, Anderson & Millar 2004), and the Mountford (1962) and Raup–Crick indices for presence–absence data should be able to handle unknown (and variable) sample sizes.

----------Usage----------

vegdist(x, method = "bray", binary = FALSE, diag = FALSE, upper = FALSE, na.rm = FALSE, ...)

----------Arguments----------

Argument: x
Community data matrix.

Argument: method
Dissimilarity index, partial match to "manhattan", "euclidean", "canberra", "bray", "kulczynski", "jaccard", "gower", "altGower", "morisita", "horn", "mountford", "raup", "binomial", "chao" or "cao".

Argument: binary
Perform presence/absence standardization before analysis using decostand.

Argument: diag
Compute diagonals.

Argument: upper
Return only the upper diagonal.

Argument: na.rm
Pairwise deletion of missing observations when computing dissimilarities.

Argument: ...
Other parameters. These are ignored, except with method = "gower", which accepts the range.global parameter of decostand.

----------Details----------

Jaccard ("jaccard"), Mountford ("mountford"), Raup–Crick ("raup"), Binomial and Chao indices are discussed later in this section. The function also finds indices for presence/absence data when binary = TRUE is set. The following overview gives the quantitative version first, where x[ij] and x[ik] refer to the quantity of species (column) i at sites (rows) j and k. In the binary versions, A and B are the numbers of species on the compared sites, and J is the number of species that occur on both compared sites, as in designdist (many indices produce identical binary versions):
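As an illustration of the notation, the sketch below computes the Bray–Curtis dissimilarity in its quantitative and binary forms for two made-up sites using base R only. The formulas used, sum|x[ij] - x[ik]| / sum(x[ij] + x[ik]) and (A + B - 2J)/(A + B), are the standard textbook definitions and are an assumption here, since the text above does not spell them out; the data matrix is hypothetical.

```r
# Hypothetical community matrix: two sites (rows) by four species (columns)
x <- rbind(site1 = c(5, 0, 3, 2),
           site2 = c(1, 4, 0, 2))

# Quantitative Bray-Curtis (assumed standard definition):
# sum of absolute differences divided by the sum of all abundances
bray_quant <- sum(abs(x[1, ] - x[2, ])) / sum(x[1, ] + x[2, ])

# Binary version: A and B are species counts per site, J is the shared count
pa <- x > 0
A <- sum(pa[1, ])
B <- sum(pa[2, ])
J <- sum(pa[1, ] & pa[2, ])
bray_bin <- (A + B - 2 * J) / (A + B)

# The binary form is the quantitative formula applied to 0/1 data
bray_bin_check <- sum(abs(pa[1, ] - pa[2, ])) / sum(pa[1, ] + pa[2, ])
```

With vegan installed, the corresponding calls would be vegdist(x, "bray") and vegdist(x, "bray", binary = TRUE).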

The Jaccard index is computed as 2B/(1+B), where B is the Bray–Curtis dissimilarity.
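A minimal base R sketch of this relationship, using hypothetical abundances: the value obtained from 2B/(1+B) is compared with the quantitative Jaccard computed directly as 1 - sum(min)/sum(max). The direct formula is the usual definition and is an assumption here, not something stated above.

```r
# Hypothetical abundances for two sites
a <- c(5, 0, 3, 2)
b <- c(1, 4, 0, 2)

# Bray-Curtis dissimilarity (assumed standard definition)
B <- sum(abs(a - b)) / sum(a + b)

# Jaccard from Bray-Curtis, as described in the text
jac_from_bray <- 2 * B / (1 + B)

# Quantitative Jaccard computed directly (assumed definition)
jac_direct <- 1 - sum(pmin(a, b)) / sum(pmax(a, b))
```

Because 2B/(1+B) increases monotonically with B, the two indices always rank site pairs in the same order.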

The Binomial index is derived from the Binomial deviance under the null hypothesis that the two compared communities are equal. It should be able to handle variable sample sizes. The index does not have a fixed upper limit, but can vary among sites with no shared species. For further discussion, see Anderson & Millar (2004).

The Cao index or CYd index (Cao et al. 1997) was suggested as a minimally biased index for high beta diversity and variable sampling intensity. The Cao index does not have a fixed upper limit, but can vary among sites with no shared species. The index is intended for count (integer) data, and it is undefined for zero abundances; these are replaced with the arbitrary value 0.1, following Cao et al. (1997). Cao et al. (1997) used log10, but the current function uses natural logarithms, so the values are approximately 2.30 times higher than with base-10 logarithms. Anderson & Thompson (2004) give an alternative formulation of the Cao index to highlight its relationship with the Binomial index (above).
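The factor of about 2.30 is log(10), and the sketch below checks it numerically. The helper cao_pair() is hypothetical, and the per-species formula it uses is recalled from the vegan documentation rather than taken from the text above, so treat it as an assumption. The replacement of zeros by 0.1 follows the description above.

```r
# Hypothetical helper: Cao-type index for one pair of sites.
# Assumed formula: mean over species present in at least one site of
#   log(n/2) - (a*log(b) + b*log(a))/n, with n = a + b.
cao_pair <- function(a, b, logfun = log) {
  keep <- (a > 0) | (b > 0)      # ignore species absent from both sites
  a <- a[keep]
  b <- b[keep]
  a[a == 0] <- 0.1               # zero abundances replaced by 0.1
  b[b == 0] <- 0.1
  n <- a + b
  mean(logfun(n / 2) - (a * logfun(b) + b * logfun(a)) / n)
}

# Hypothetical count data
a <- c(5, 0, 3, 2, 0)
b <- c(1, 4, 0, 2, 0)

cao_ln    <- cao_pair(a, b, log)     # natural logarithms
cao_log10 <- cao_pair(a, b, log10)   # base-10 logarithms
ratio <- cao_ln / cao_log10          # should equal log(10)
```

The ratio is constant because every term of the index is linear in the logarithm, so changing the base rescales the whole index.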

The Mountford index is defined as M = 1/α, where α is the parameter of Fisher's logseries, assuming that the compared communities are samples from the same community (cf. fisherfit, fisher.alpha). The index M is found as the positive root of the equation exp(a*M) + exp(b*M) = 1 + exp((a+b-j)*M), where j is the number of species occurring in both communities, and a and b are the numbers of species in each separate community (so the index uses presence–absence information). The Mountford index is usually misrepresented in the literature: Mountford (1962) did suggest an approximation, but only as a starting value for iterations, and the proper index is defined as the root of the equation above. The function vegdist solves for M with the Newton method. Please note that if either a or b is equal to j, one of the communities could be a subset of the other, and the dissimilarity is 0. This means that non-identical objects may be regarded as similar, so the index is non-metric. The Mountford index is in the range 0 … log(2), but the dissimilarities are divided by log(2) so that the results are in the conventional range 0 … 1.
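The root-finding step can be sketched in base R. This sketch uses uniroot (a bracketing solver) rather than the Newton method that vegdist is said to use, and the function name mountford_M is hypothetical. It covers only the generic case 0 < j < min(a, b), where the positive root lies between 0 and log(2).

```r
# Hypothetical helper: positive root M of
#   exp(a*M) + exp(b*M) = 1 + exp((a + b - j)*M)
# Uses uniroot() instead of Newton iterations (a deliberate substitution).
mountford_M <- function(a, b, j) {
  stopifnot(j > 0, j < min(a, b))   # generic case only
  f <- function(M) exp(a * M) + exp(b * M) - 1 - exp((a + b - j) * M)
  # f is positive just above 0 and negative at log(2), so the root is bracketed
  uniroot(f, lower = 1e-8, upper = log(2), tol = 1e-12)$root
}

# Example: two communities with 2 species each, 1 of them shared
M <- mountford_M(a = 2, b = 2, j = 1)

# M rescaled by log(2) to the conventional 0 ... 1 range
M_scaled <- M / log(2)
```

For a = b = 2 and j = 1 the equation reduces to y^3 - 2y^2 + 1 = 0 with y = exp(M), whose relevant root is the golden ratio, which gives a convenient check on the solver.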

Raup–Crick dissimilarity (method = "raup") is a probabilistic index based on presence/absence data. It is defined as 1 - prob(j), that is, it is based on the probability of observing at least j shared species in the compared communities. Legendre & Legendre (1998) suggest using simulations to assess the probability, but the current function uses the analytic result from the hypergeometric distribution (phyper) instead. This probability (and the index) depends on the number of species missing from both sites, so adding all-zero species to the data or removing missing species from the data will influence the index. The probability (and the index) may be almost zero or almost one for a wide range of parameter values. The index is nonmetric: two communities with no shared species may have a dissimilarity slightly below one, and two identical communities may have a dissimilarity slightly above zero. The index uses equal occurrence probabilities for all species, but Raup and Crick originally suggested that sampling probabilities should be proportional to species frequencies (Chase et al. 2011). A simulation approach with unequal species sampling probabilities is implemented in the raupcrick function, following Chase et al. (2011).
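The hypergeometric probability mentioned above can be sketched with base R's phyper. The helper name p_at_least and the species counts are hypothetical. The sketch only computes the probability of at least j shared species; how exactly vegdist turns that probability into the reported dissimilarity (the tail convention) is not assumed here. It also shows that the probability depends on the total number of species N in the data, including species absent from both sites.

```r
# Hypothetical helper: probability of at least j shared species when one
# site has A species and the other B species, drawn from a pool of N species
# with equal occurrence probabilities.
p_at_least <- function(j, A, B, N) {
  # phyper(q, m, n, k): q successes, m "white", n "black", k draws
  phyper(j - 1, A, N - A, B, lower.tail = FALSE)
}

A <- 6   # species at the first site
B <- 5   # species at the second site
j <- 3   # shared species

# The same pair of sites embedded in data sets of different size
p_small_pool <- p_at_least(j, A, B, N = 12)
p_large_pool <- p_at_least(j, A, B, N = 24)
```

Adding all-zero species enlarges N and lowers the chance of random overlap, which is why p_large_pool is smaller than p_small_pool for the same two sites.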

The Chao index tries to take into account the number of unseen species pairs, similarly to method = "chao" in specpool. Function vegdist implements a Jaccard type index defined as d[jk] = 1 - U[j]*U[k]/(U[j] + U[k] - U[j]*U[k]), where U[j] = C[j]/N[j] + (N[k] - 1)/N[k] * a1/(2*a2) * S[j]/N[j], and similarly for U[k]. Here C[j] is the total number of individuals in the species of site j that are shared with site k, N[j] is the total number of individuals at site j, a1 (and a2) are the numbers of species occurring in site j that have only one (or two) individuals in site k, and S[j] is the total number of individuals in the species present at site j that occur with only one individual in site k (Chao et al. 2005).
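The formula above can be written out as a short base R sketch. The function names chao_U and chao_jaccard are hypothetical, and two edge-case choices are assumptions that the text does not specify: a2 is set to 1 when no shared species has exactly two individuals, and U is capped at 1. The real vegdist implementation may handle these cases differently.

```r
# Hypothetical helper: U[j] for site j relative to site k
chao_U <- function(xj, xk) {
  shared <- xj > 0 & xk > 0
  Nj <- sum(xj)                        # individuals at site j
  Nk <- sum(xk)                        # individuals at site k
  Cj <- sum(xj[shared])                # individuals of shared species at j
  a1 <- sum(shared & xk == 1)          # shared species that are singletons at k
  a2 <- sum(shared & xk == 2)          # shared species that are doubletons at k
  Sj <- sum(xj[shared & xk == 1])      # individuals at j of species singleton at k
  if (a2 == 0) a2 <- 1                 # ASSUMPTION: avoid division by zero
  U <- Cj / Nj + (Nk - 1) / Nk * a1 / (2 * a2) * Sj / Nj
  min(U, 1)                            # ASSUMPTION: cap U at 1
}

# Jaccard-type Chao dissimilarity d[jk] as defined in the text
chao_jaccard <- function(xj, xk) {
  Uj <- chao_U(xj, xk)
  Uk <- chao_U(xk, xj)
  if (Uj == 0 || Uk == 0) return(1)    # no shared species
  1 - Uj * Uk / (Uj + Uk - Uj * Uk)
}

# Hypothetical count data for two sites
site_j <- c(10, 5, 1, 0)
site_k <- c(1, 2, 0, 7)
d_jk <- chao_jaccard(site_j, site_k)
```

The function is symmetric in its two arguments because U[j] and U[k] enter the final expression symmetrically.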

The Morisita index can be used with genuine count data (integers) only. Its Horn–Morisita variant is able to handle any abundance data.

Euclidean and Manhattan dissimilarities are not good at gradient separation without proper standardization, but they are still included for comparison and special needs.

Bray–Curtis and Jaccard indices are rank-order similar, and some other indices become identical or rank-order similar after some standardizations, especially with presence/absence transformation or equalizing site totals with decostand. The Jaccard index is metric, and should probably be preferred over the default Bray–Curtis, which is semimetric.
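The difference between metric and semimetric can be seen in a small base R example with three hypothetical sites, where Bray–Curtis violates the triangle inequality and Jaccard does not. The Bray–Curtis formula used is the standard definition and is an assumption here; Jaccard is obtained from it as 2B/(1+B).

```r
# Assumed standard Bray-Curtis definition, and Jaccard derived from it
bray <- function(a, b) sum(abs(a - b)) / sum(a + b)
jaccard <- function(a, b) {
  B <- bray(a, b)
  2 * B / (1 + B)
}

# Three hypothetical sites with two species
s1 <- c(1, 0)
s2 <- c(0, 1)
s3 <- c(1, 1)

# Direct dissimilarity s1-s2 versus the detour through s3
bray_direct <- bray(s1, s2)
bray_detour <- bray(s1, s3) + bray(s3, s2)
jac_direct  <- jaccard(s1, s2)
jac_detour  <- jaccard(s1, s3) + jaccard(s3, s2)

# TRUE means the triangle inequality holds for this triple
bray_ok <- bray_direct <= bray_detour + 1e-12
jac_ok  <- jac_direct  <= jac_detour  + 1e-12
```

Here the Bray–Curtis detour through s3 is shorter than the direct path, which a metric would not allow. A single triple can show a violation but cannot prove metricity in general.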

The naming conventions vary. The one adopted here is traditional rather than truthful to priority. The function finds either quantitative or binary variants of the indices under the same name, which strictly speaking may refer to only one of these alternatives. For instance, the Bray index is also known as the Steinhaus, Czekanowski and Sørensen index. The quantitative version of Jaccard should probably be called the Ružička index. The abbreviation "horn" for the Horn–Morisita index is misleading, since there is a separate Horn index. The abbreviation will be changed if that index is implemented in vegan.

----------Value----------

The function should provide a drop-in replacement for dist and return a distance object of the same type.

----------Note----------

The function is an alternative to dist, adding some ecologically meaningful indices. Both methods should produce similar types of objects, which can be interchanged in any method accepting either. Manhattan and Euclidean dissimilarities should be identical in both methods. The Canberra index is divided by the number of variables in vegdist, but not in dist, so the two differ by a constant multiplier, and the vegdist alternative is in the range (0,1). Function daisy (package cluster) provides an alternative implementation of the Gower index that can also handle mixed data of numeric and class variables. There are two versions of the Gower distance ("gower", "altGower"), which differ in scaling: "gower" divides all distances by the number of observations (rows) and scales each column to unit range, whereas "altGower" omits double zeros, divides by the number of pairs with at least one above-zero value, and does not scale columns (Anderson et al. 2006). You can use decostand to add range standardization to "altGower" (see Examples). Gower (1971) suggested omitting double zeros for presences, but this is often taken as a general feature of the Gower distances. See Examples for implementing the Anderson et al. (2006) variant of the Gower index.
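The Canberra scaling can be illustrated with base R's dist alone. The sketch below uses a hypothetical matrix with strictly positive entries, so there are no double zeros and "the number of variables" is simply ncol(x). Dividing the dist result by that count is assumed to reproduce the vegdist scaling described above for such data; it is not verified against vegan here.

```r
# Hypothetical community matrix without zeros: three sites, four species
x <- rbind(site1 = c(5, 1, 3, 2),
           site2 = c(1, 4, 6, 2),
           site3 = c(2, 2, 1, 8))

# Canberra distance as returned by base R (unscaled sum over variables)
canb_dist <- dist(x, method = "canberra")

# Divide by the number of variables to bring values into the 0 ... 1 range
canb_scaled <- canb_dist / ncol(x)

# Value for the first pair, extracted from the full matrix form
d12 <- as.matrix(canb_scaled)["site1", "site2"]
```

Since the multiplier is constant, the two versions rank site pairs identically and differ only in scale.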

Most dissimilarity indices in vegdist are designed for community data, and they will give misleading values if there are negative data entries. The results may also be misleading, NA or NaN if there are empty sites. In principle, you cannot study species composition without species, and you should remove empty sites from community data.
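A minimal base R sketch of that clean-up step, using a hypothetical matrix: check for negative entries, then drop sites (rows) with no species before computing dissimilarities.

```r
# Hypothetical community matrix with one empty site
x <- rbind(site1 = c(5, 0, 3),
           site2 = c(0, 0, 0),
           site3 = c(1, 4, 0))

# Negative entries would make most indices misleading
has_negative <- any(x < 0)

# Keep only sites with at least one individual; drop = FALSE keeps a matrix
x_clean <- x[rowSums(x) > 0, , drop = FALSE]
```

The cleaned matrix x_clean can then be passed to vegdist in place of x.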

----------Author(s)----------

Jari Oksanen, with contributions from Tyler Smith (Gower index)
and Michael Bedward (Raup–Crick index).

----------References----------

of habitat on temperate reef fish assemblages in northeastern New Zealand.  Journal of Experimental Marine Biology and Ecology 305, 191–221.
dispersion as a measure of beta diversity. Ecology Letters  9, 683–693.
ecological and environmental monitoring. Ecological Applications 14, 1921–1935.
in river benthic Auswuchs community analysis. Water Environment Research 69, 95–106.
statistical approach for assessing similarity of species composition with incidence and abundance data. Ecology Letters 8, 148–159.
B.D. (2011). Using null models to disentangle variation in community dissimilarity from variation in α-diversity. Ecosphere 2:art24 [doi:10.1890/ES10-00117.1]
Compositional dissimilarity as a robust measure of ecological distance. Vegetatio 69, 57–68.
of its properties. Biometrics 27, 623–637.

Edition. Elsevier.
classification problems. In: P.W.Murphy (ed.), Progress in Soil Zoology, 43–50. Butterworths.
diversity. Oecologia 50, 296–302.

----------See Also----------

Function designdist can be used for defining your own dissimilarity index. Alternative dissimilarity functions include dist in base R, daisy (package cluster), and dsvdis (package labdsv).  Function betadiver provides indices intended for the analysis of

----------Examples----------

library(vegan)
data(varespec)
# Default method: Bray-Curtis dissimilarity
vare.dist <- vegdist(varespec)
# Orlóci's Chord distance: range 0 .. sqrt(2)
vare.dist <- vegdist(decostand(varespec, "norm"), "euclidean")
# Anderson et al. (2006) version of Gower
vare.dist <- vegdist(decostand(varespec, "log"), "altGower")
# Range standardization with "altGower" (that excludes double-zeros)
vare.dist <- vegdist(decostand(varespec, "range"), "altGower")

When reposting, please credit the source: biostatistic.net (http://www.biostatistic.net).

