R language simba package: help documentation for the sim() function

loveR · posted 2012-9-30 02:30:24

sim(simba)
R package containing sim(): simba

Calculate similarities for binary vegetation data

Translator: LoveR, the robot of Biostatistic Home (生物统计家园网)

----------Description----------

One of 56 (dis)similarity measures for binary data can be selected to calculate (dis)similarities. The vegetation data can be in either database (list) or matrix format; the same holds for the output. If a data.frame with coordinates is given, the geographical distances between plots and the virtual positions of the calculated similarity values between the parental units are calculated at the same time.

----------Usage----------

sim(x, coord=NULL, method = "soer", dn=NULL, normalize = FALSE,
listin = FALSE, listout = FALSE, ...)

----------Arguments----------

Argument: x
Vegetation data, either as a matrix with rows = plots and columns = species (similarities are calculated between rows!), or as a data.frame whose first three columns represent plots, species and occurrence information, respectively. All further columns are dropped before calculation. Occurrence is only considered as binary. If your list or matrix contains abundances or frequencies, they are transformed automatically.
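
As a rough illustration of the two input formats, the following base R sketch binarizes a small abundance matrix and reshapes it into database (list) format. The object names (abund, pa, pa_list) and the rule that any value above zero counts as a presence are assumptions made for this example. It does not call simba and may differ from what sim() does internally.

```r
# Hypothetical abundance matrix: rows = plots, columns = species.
abund <- matrix(c(3, 0, 1,
                  0, 2, 5),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("p1", "p2"), c("sp1", "sp2", "sp3")))

# Assumption: any abundance above zero counts as a presence.
pa <- (abund > 0) * 1

# Database (list) format: one row per plot/species occurrence,
# with the columns ordered plot, species, occurrence.
idx <- which(pa == 1, arr.ind = TRUE)
pa_list <- data.frame(plot    = rownames(pa)[idx[, 1]],
                      species = colnames(pa)[idx[, 2]],
                      occ     = 1,
                      stringsAsFactors = FALSE)
print(pa_list)
```

A list like pa_list would correspond to the case where listin = TRUE has to be set, since the format is not detected automatically.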

Argument: coord
A data.frame with two columns containing the coordinate values of the sampling units. If given, it triggers the simultaneous calculation of the geographical distances between the sampling units, the coordinates of virtual centre-points between all possible pairs of plots, and the geographical distances along the x- and y-directions. If coord is given, output is always in database format (no matrix).

Argument: method
Binary similarity index (see Details for references and formulae); partially matched against "soerensen", "jaccard", "ochiai", "mountford", "whittaker", "lande", "wilsonshmida", "cocogaston", "magurran", "harrison", "cody", "williams", "williams2", "harte", "simpson", "lennon", "weiher", "ruggiero", "lennon2", "rout1ledge", "rout2ledge", "rout3ledge", "sokal1", "dice", "kulcz1insky", "kulcz2insky", "mcconnagh", "manhattan", "simplematching", "margaleff", "pearson", "roger", "baroni", "dennis", "fossum", "gower", "legendre", "sokal2", "sokal3", "sokal4", "stiles", "yule", "michael", "hamann", "forbes", "chisquare", "peirce", "eyraud", "simpson2", "legendre2", "fager", "maarel", "lamont", "johnson", "sorgenfrei", "johnson2".

Argument: dn
Neighbor definition. A geographic distance, given either as a single numeric value or as a two-value vector defining a ring around each plot. Only takes effect when coord != NULL. If specified, the output contains only similarities between neighboring plots. A plot is a neighbor of a given plot if it lies within the range of the neighbor definition. See Details.
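
The neighbor definition can be pictured with a small base R sketch that filters plot pairs by distance. Everything here is hypothetical: the coordinates, the names pair_df, neigh_single and neigh_ring, and in particular the assumption that the limits are inclusive. It is not taken from the simba source.

```r
# Hypothetical coordinates of four plots on a line, 100 units apart.
coords <- data.frame(x = c(0, 100, 200, 300), y = c(0, 0, 0, 0))
rownames(coords) <- paste0("p", 1:4)

# All pairwise geographic distances.
dmat <- as.matrix(dist(coords))

# One row per plot pair (lower triangle only).
idx <- which(lower.tri(dmat), arr.ind = TRUE)
pair_df <- data.frame(NBX = rownames(dmat)[idx[, 2]],
                      NBY = rownames(dmat)[idx[, 1]],
                      distance = dmat[idx])

# Single value: assume neighbors are pairs no further apart than dn.
dn <- 100
neigh_single <- pair_df[pair_df$distance <= dn, ]

# Two values: assume a ring, i.e. distances between dn[1] and dn[2].
dn_ring <- c(150, 250)
neigh_ring <- pair_df[pair_df$distance >= dn_ring[1] &
                      pair_df$distance <= dn_ring[2], ]
```

With these made-up coordinates, the single value keeps only directly adjacent plots, while the ring keeps only pairs that are two steps apart.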

Argument: normalize
Logical value indicating whether the values for a, b and c, which are calculated in the process, should be normalized to 100% (per row, which means per plot comparison). If normalize = TRUE, an asymmetric index must be chosen (see Details).
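
A minimal sketch of what normalizing a, b and c to 100% per plot comparison could look like in base R. The values and the names abc and abc_norm are invented for illustration, and the exact procedure used inside sim() is not shown in this document.

```r
# Hypothetical matching components; each row is one plot comparison.
abc <- cbind(a = c(10, 2),
             b = c(5, 6),
             c = c(5, 12))

# Scale each row so that a + b + c = 100.
abc_norm <- abc / rowSums(abc) * 100
print(abc_norm)
```

After this step every comparison carries the same total, which is the form needed to place comparisons in a ternary plot.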

Argument: listin
If x is given in database (list) format, this must be set to TRUE (the format is not detected automatically).

Argument: listout
If output is wanted in database format rather than as a dist object, set this to TRUE. Output is automatically given in database format when coord is specified.
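
To make the difference between the two output formats concrete, this base R sketch builds a dist object and then unfolds it into one row per plot pair. It uses base R's dist(method = "binary"), which gives the Jaccard dissimilarity, as a stand-in. The data and the object names are hypothetical, and the result only mimics the first three columns of the database format described under Value.

```r
# Hypothetical presence/absence matrix: rows = plots, columns = species.
pa <- matrix(c(1, 1, 0,
               1, 0, 1,
               1, 1, 0),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("p1", "p2", "p3"), c("sp1", "sp2", "sp3")))

# dist object: compact lower-triangle storage of all pairwise values.
jacc_dist <- dist(pa, method = "binary")

# Database-like format: one row per pair, similarity = 1 - dissimilarity.
m <- as.matrix(jacc_dist)
idx <- which(lower.tri(m), arr.ind = TRUE)
jacc_list <- data.frame(NBX = rownames(m)[idx[, 2]],
                        NBY = rownames(m)[idx[, 1]],
                        jaccard = 1 - m[idx])
print(jacc_list)
```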

Argument: ...
Arguments passed to other functions.

----------Details----------

All binary similarity indices are based on the variables a, b and c (or can be expressed as such). Some of them also use d. Here a is the number of species shared by the two compared plots, b is the number of species found only in one of the compared plots, and c is the number of species found only in the other. d refers to species that are absent from both compared plots but present in the whole dataset. Indices incorporating d are discussed critically by Legendre & Legendre (1998) and elsewhere. They are called symmetric and suffer from the "double zero" problem because they take into account species that are absent from both compared units. The absence of a species from a sampling site might be due to various factors; it does not necessarily reflect differences in the environment. Hence, it is preferable to avoid drawing ecological conclusions from the absence of species at two sites (Legendre & Legendre 1998). The indices presented here come from various sources as indicated. Comparative reviews can be found in e.g. Huhta (1979), Wolda (1981), Janson & Vegelius (1981), Shi (1993), Koleff et al. (2003) and Albatineh (2006).
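
The matching components can be counted directly from a presence/absence matrix. The following base R sketch does this for one pair of plots. The helper match_components() and the example data are hypothetical and only follow the definitions given above; they are not simba's own code.

```r
# Hypothetical presence/absence matrix: rows = plots, columns = species.
pa <- matrix(c(1, 1, 0, 1, 0,
               1, 0, 1, 1, 0,
               0, 0, 0, 0, 1),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("p1", "p2", "p3"), paste0("sp", 1:5)))

# Count a, b, c and d for plots i and j of matrix m.
match_components <- function(m, i, j) {
  x <- m[i, ] > 0
  y <- m[j, ] > 0
  in_dataset <- colSums(m > 0) > 0   # species present somewhere in the data
  c(a = sum(x & y),                  # shared by both plots
    b = sum(x & !y),                 # only in plot i
    c = sum(!x & y),                 # only in plot j
    d = sum(!x & !y & in_dataset))   # absent from both, present elsewhere
}

mc <- match_components(pa, "p1", "p2")
print(mc)
```

In this toy dataset the species sp5 occurs only in p3, so it is the one species counted in d when p1 and p2 are compared.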

The indices differ considerably in their behaviour. For classification purposes and in ecology, Jaccard and Sørensen have been found to give robust and meaningful results (e.g. Janson & Vegelius 1981, Shi 1993). For other purposes, other indices might be better suited. In any case, you are encouraged to use ternary plots (at least with the asymmetric indices), as suggested by Koleff et al. (2003). The matching components a, b, and c can be displayed in a ternary.plot to evaluate the position of the plots in similarity space. When output is in database format, the matching components are always given and triax.plot can be used to plot them into a triangle plot. Koleff et al. (2003) used an artificial set of matching components, including all combinations of values that a, b, and c can take from 0 to 100, to display the mathematical behaviour of indices. An artificial dataset with these properties, together with the values for the asymmetric indices included here, is part of this package (ads.ternaries) and can be used to study the behaviour of the indices prior to analysis. See details and examples there.

If coord is given, the geographic distances between plots/sampling units are calculated automatically, which may be of value when the display or further analysis of distance decay (sensu Tobler 1970, Nekola & White 1999) is in focus. For convenience, the dn argument can be used to tell the function to return only similarities calculated between neighboring plots. Similarities between neighboring plots in an equidistant array are not subject to the problem of autocorrelation because all compared plots share the same distance (Jurasinski & Beierkuhnlein 2006). Therefore, any variation occurring in the data is most likely caused by environmental differences alone.

In the following formulae...

a = number of shared species

b = number of species found only on one of the compared units

c = number of species found only on the other of the compared units

d = number of species not found on the compared plots but present in the dataset

N = a+b+c+d

with (n1 <= n2)...

n1 = number of species of the plot with fewer species, (a+b) or (a+c)

n2 = number of species of the plot with more species, (a+b) or (a+c)
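
Since the formula tables did not survive in this copy, here is a small base R sketch of how the quantities defined above combine into three widely used indices. It uses the standard textbook forms of Sørensen, Jaccard and simple matching, and the numbers are made up. Whether sim() uses exactly these expressions for the corresponding method names is an assumption.

```r
# Hypothetical matching components for one plot comparison.
mc <- list(a = 4, b = 2, c = 1, d = 3)

N  <- mc$a + mc$b + mc$c + mc$d          # all species in the dataset
n1 <- min(mc$a + mc$b, mc$a + mc$c)      # richness of the poorer plot
n2 <- max(mc$a + mc$b, mc$a + mc$c)      # richness of the richer plot

# Asymmetric indices (ignore d), textbook definitions.
soerensen <- 2 * mc$a / (2 * mc$a + mc$b + mc$c)
jaccard   <- mc$a / (mc$a + mc$b + mc$c)

# Symmetric index (uses d), textbook definition.
simplematching <- (mc$a + mc$d) / N

print(c(soerensen = soerensen, jaccard = jaccard,
        simplematching = simplematching))
```

The symmetric index rises with every species that is absent from both plots, which is the "double zero" effect discussed above.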

Computable asymmetric indices:

Computable symmetric indices (including unshared species):

rout2ledge formula (Routledge, 1977; Koleff et al. 2003):

----------Value----------

If listout = FALSE, a distance matrix of class dist is returned. If listout = TRUE, a data.frame with 7 columns is returned, giving the names of the compared plots in the first two columns and the calculated similarity measure in the third. The remaining columns give the values for a, b, c, and d (in this order). The names of the first three columns can be changed but default to NBX (one of the compared plots), NBY (the other one), and the name of the index used (holding the values of the calculated index). If coord != NULL, the following columns are given in addition, and the columns a:d shift to the end of the data.frame.

Column: distance
Geographical distance between compared plots

Column: X
For plotting purposes, the x-coordinate of the virtual position of the calculated similarity value in the center between the two compared plots

Column: Y
For plotting purposes, the y-coordinate of the virtual position of the calculated similarity value in the center between the two compared plots

Column: xdist
Geographical distance between compared plots, on the x-axis only

Column: ydist
Geographical distance between compared plots, on the y-axis only
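
For a single pair of plots, the five additional columns can be sketched in base R as follows. The coordinates are hypothetical, and two points are assumptions rather than documented behaviour: that the distance is Euclidean, and that xdist and ydist are reported as absolute values.

```r
# Hypothetical coordinates (x, y) of two compared plots.
p <- c(x = 0,  y = 0)
q <- c(x = 30, y = 40)

distance <- sqrt(sum((q - p)^2))     # straight-line distance between plots
X <- (p[["x"]] + q[["x"]]) / 2       # x of the centre point between them
Y <- (p[["y"]] + q[["y"]]) / 2       # y of the centre point between them
xdist <- abs(q[["x"]] - p[["x"]])    # separation along the x-axis only
ydist <- abs(q[["y"]] - p[["y"]])    # separation along the y-axis only

print(c(distance = distance, X = X, Y = Y, xdist = xdist, ydist = ydist))
```

Plotting the similarity value at (X, Y) places it halfway between the two plots it describes.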

----------Note----------

In general, the concepts of data handling are taken from vegdist, and the calculation of a, b, c and d is taken from dist.binary. Thanks to Jari Oksanen for his vegan package. The indices were collected from the literature and are applicable in different fields of research.

----------Author(s)----------

Gerald Jurasinski <gerald.jurasinski@uni-rostock.de>

----------References----------

----------See Also----------

vegdist, dist.binary, dsvdis, dist

----------Examples----------

library(simba)  # provides sim(), liste() and the abis data
data(abis)
## calculate Jaccard similarity and output as dist object
jacc.dist <- sim(abis.spec, method="jaccard")

## calculate Whittaker similarity (with prior normalisation) and
## output as data.frame
whitt.list <- sim(abis.spec, method="whittaker", normalize=TRUE,
    listout=TRUE)

## calculate similarity from a database list after Harte & Kinzig (1997)
## and output as dist object
abis.spec.ls <- liste(abis.spec, splist=TRUE)
hart.dist <- sim(abis.spec.ls, method="harte", listin=TRUE)

## calculate the geographic distances between sites simultaneously
## and return only similarities calculated between neighboring plots
abis.soer <- sim(abis.spec, coord=abis.env[,1:2], dn=100)

## in an equidistant array
## you can plot this nicely between the original positions of the
## sites (with the size of the dots expressing number of species
## for the sites, and value of the Sørensen coefficient in between)
library(geoR)
points.geodata(coord=abis.env[,1:2], data=abis.env$n.spec,
    cex.min=1, cex.max=5)
points.geodata(coord=abis.soer[,5:6], data=abis.soer$soerensen,
    cex.min=1, cex.max=5, col="grey50", add=TRUE)

When reposting, please credit the source: Biostatistic Home (http://www.biostatistic.net).

Note: This document was machine-translated by LoveR, the robot of Biostatistic Home, and is intended for personal R study and reference only. Biostatistic Home reserves the copyright.
