R language TraMineR package: help documentation for the seqdist() function

loveR · posted on 2012-10-1 11:36:58

seqdist(TraMineR)
R package containing seqdist(): TraMineR

Distances (dissimilarities) between sequences

Translator: LoveR, the robot of 生物统计家园网

----------Description----------

Computes pairwise dissimilarities between sequences, or dissimilarities between each sequence and a reference sequence. Several dissimilarity measures or metrics are available: optimal matching (OM), distances based on the longest common prefix (LCP), the longest common suffix (RLCP) and the longest common subsequence (LCS), the Hamming distance (HAM) and the Dynamic Hamming Distance (DHD).
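
To make the prefix-based and suffix-based measures concrete, here is a minimal base R sketch. It assumes the usual unnormalized definition (sum of the two lengths minus twice the length of the common prefix) and is only an illustration, not TraMineR's implementation; the helper names lcp_length, lcp_distance and rlcp_distance are hypothetical.

```r
# Length of the longest common prefix of two state vectors.
lcp_length <- function(x, y) {
  n <- min(length(x), length(y))
  if (n == 0) return(0)
  same <- x[seq_len(n)] == y[seq_len(n)]
  # Position of the first mismatch, or n + 1 if the shorter vector is a prefix.
  first_diff <- match(FALSE, same, nomatch = n + 1)
  first_diff - 1
}

# Assumed definition: positions not covered by the common prefix.
lcp_distance <- function(x, y) {
  length(x) + length(y) - 2 * lcp_length(x, y)
}

# Suffix variant: the same computation on the reversed sequences.
rlcp_distance <- function(x, y) lcp_distance(rev(x), rev(y))

x <- c("A", "A", "B", "C")
y <- c("A", "A", "C", "C")
lcp_distance(x, y)   # common prefix is "A", "A"
rlcp_distance(x, y)  # common suffix is "C"
```

Two sequences that share a long beginning get a small LCP distance even when they diverge completely later on, which is why the choice between LCP and RLCP depends on which end of the trajectory matters.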

----------Usage----------

seqdist(seqdata, method, refseq=NULL, norm=FALSE,
   indel=1, sm, with.missing = FALSE, full.matrix = TRUE)

----------Arguments----------

Argument: seqdata
a state sequence object defined with the seqdef function.

Argument: method
a character string indicating the metric to be used. One of "OM" (Optimal Matching), "LCP" (Longest Common Prefix), "RLCP" (reversed LCP, i.e. Longest Common Suffix), "LCS" (Longest Common Subsequence), "HAM" (Hamming distance), "DHD" (Dynamic Hamming distance).

Argument: refseq
Optional baseline sequence to compute the distances from. Can be the index of a sequence in the state sequence object, 0 for the most frequent sequence, or an external sequence passed as a sequence object with 1 row.

Argument: norm
if TRUE, the computed OM, LCP, RLCP or LCS distances are normalized to account for differences in sequence lengths, and the normalization method is selected automatically. Default is FALSE. Can also be one of "none", "maxlength", "gmean", "maxdist", "YujianBo". See details.

Argument: indel
the insertion/deletion cost (OM method). Default is 1. Ignored with non-OM metrics.

Argument: sm
substitution-cost matrix (OM, HAM and DHD methods). Can also be one of the seqsubm build methods "TRATE" or "CONSTANT". Default is NA. Ignored with LCP, RLCP and LCS metrics.

Argument: with.missing
must be set to TRUE when sequences contain non-deleted gaps (missing values). See details.

Argument: full.matrix
If TRUE (default), the full distance matrix is returned. This is for compatibility with earlier versions of the seqdist function. If FALSE, an object of class dist is returned, that is, a vector containing only values from the upper triangle of the distance matrix. Since the distance matrix is symmetrical, no information is lost with this representation, while the size is roughly halved. Objects of class dist can be passed directly as arguments to most clustering functions. Ignored when refseq is set.
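
The trade-off between the two return types can be seen with base R alone. The sketch below uses a made-up 4 x 4 symmetric matrix as a stand-in for a seqdist result (the values are hypothetical) and converts between the full matrix and the compact dist form.

```r
# Toy symmetric distance matrix for four sequences (values are made up).
m <- matrix(c(0, 2, 4, 6,
              2, 0, 2, 4,
              4, 2, 0, 2,
              6, 4, 2, 0), nrow = 4, byrow = TRUE)

d <- as.dist(m)        # compact "dist" representation
length(d)              # n * (n - 1) / 2 stored values instead of n * n
full <- as.matrix(d)   # back to the full symmetric matrix
hc <- hclust(d)        # clustering functions accept "dist" objects directly
```

When a function insists on a dist object, a full matrix can be wrapped with as.dist() as above, so either return type can be fed to clustering routines.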

----------Details----------

The seqdist function returns a matrix of distances between sequences or a vector of distances to a reference sequence. The available metrics (see the method argument) are optimal matching ("OM"), longest common prefix ("LCP"), longest common suffix ("RLCP"), longest common subsequence ("LCS"), Hamming distance ("HAM") and Dynamic Hamming Distance ("DHD"). The Hamming distance is OM without indels, and the Dynamic Hamming Distance is HAM with specific substitution costs at each position, as proposed by Lesnard (2006). Note that HAM and DHD apply only to sequences of equal length.
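
The statement that the Hamming distance is OM without indels can be checked on a small case. The following base R sketch implements optimal matching as a textbook edit-distance recursion and Hamming as a position-by-position sum. It is an illustration only, not TraMineR's code; the function names and the constant cost matrix are assumptions, and the matrix is assumed to have a zero diagonal and state names as dimnames.

```r
# Optimal matching by dynamic programming (edit distance with costs).
om_distance <- function(x, y, indel, sm) {
  n <- length(x)
  m <- length(y)
  D <- matrix(0, n + 1, m + 1)
  D[, 1] <- (0:n) * indel   # delete all of x
  D[1, ] <- (0:m) * indel   # insert all of y
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      D[i + 1, j + 1] <- min(D[i, j] + sm[x[i], y[j]],  # substitute or match
                             D[i, j + 1] + indel,       # delete x[i]
                             D[i + 1, j] + indel)       # insert y[j]
    }
  }
  D[n + 1, m + 1]
}

# Hamming: substitutions only, so the sequences must have equal length.
hamming_distance <- function(x, y, sm) {
  stopifnot(length(x) == length(y))
  sum(sm[cbind(x, y)])
}

states <- c("A", "B", "C")
sm <- matrix(2, 3, 3, dimnames = list(states, states))
diag(sm) <- 0

x <- c("A", "B", "B", "C")
y <- c("A", "C", "B", "B")
om_distance(x, y, indel = 1, sm = sm)     # cheap indels: one delete plus one insert
om_distance(x, y, indel = 100, sm = sm)   # indels priced out: substitutions only
hamming_distance(x, y, sm)                # same value as the line above
```

With a low indel cost OM can shift part of a sequence instead of substituting, which is why it can fall below the Hamming distance on the same pair.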

For OM, HAM and DHD, a user-specified substitution cost matrix can be provided with the sm argument. For DHD, this should be a series of matrices grouped in a 3-dimensional array, with the third index referring to the position in the sequence. When sm is not specified, a constant substitution cost of 1 is used with HAM, and the costs proposed by Lesnard (2006) are used with DHD.
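
The 3-dimensional layout expected for DHD can be sketched in base R as follows. The costs used here simply grow with the position and are hypothetical; they are not the transition-rate-based costs of Lesnard (2006), and dhd_distance is an illustrative helper rather than a TraMineR function.

```r
# Position-specific substitution costs: states x states x positions.
states <- c("A", "B")
n_pos <- 3
sm <- array(0, dim = c(2, 2, n_pos), dimnames = list(states, states, NULL))
for (t in seq_len(n_pos)) {
  sm[, , t] <- t * (1 - diag(2))   # a mismatch at position t costs t
}

# Hamming-style sum, looking up the cost slice of each position.
dhd_distance <- function(x, y, sm) {
  stopifnot(length(x) == length(y), dim(sm)[3] == length(x))
  idx <- cbind(match(x, dimnames(sm)[[1]]),
               match(y, dimnames(sm)[[2]]),
               seq_along(x))
  sum(sm[idx])
}

x <- c("A", "A", "B")
y <- c("B", "A", "A")
dhd_distance(x, y, sm)   # mismatches at positions 1 and 3
```

Because the third index selects the cost matrix for each position, the same substitution can be cheap at one point of the trajectory and expensive at another.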

Distances can optionally be normalized by means of the norm argument. If set to TRUE, Elzinga's normalization (similarity divided by the geometric mean of the two sequence lengths) is applied to LCP, RLCP and LCS distances, while Abbott's normalization (distance divided by the length of the longer sequence) is used for OM, HAM and DHD. Elzinga's method can be forced with "gmean" and Abbott's rule with "maxlength". With "maxdist" the distance is normalized by its maximal possible value. For more details, see Elzinga (2008) and Gabadinho et al. (2009).
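
As a rough numerical illustration of the two rules named above, the sketch below applies them to hand-picked values. It assumes that the "gmean" rule turns the normalized similarity into a distance by subtracting it from 1; the exact formulas used inside TraMineR may differ in detail, and both helper names are hypothetical.

```r
# Abbott's rule: distance divided by the length of the longer sequence.
norm_maxlength <- function(d, len_x, len_y) {
  d / max(len_x, len_y)
}

# Elzinga's rule (assumed form): 1 minus similarity over the geometric mean
# of the two lengths.
norm_gmean <- function(similarity, len_x, len_y) {
  1 - similarity / sqrt(len_x * len_y)
}

# An OM distance of 3 between sequences of lengths 4 and 6.
norm_maxlength(3, 4, 6)

# A common prefix of length 3 shared by sequences of lengths 4 and 9.
norm_gmean(3, 4, 9)
```

Both rules bring the result onto a scale that no longer grows just because the sequences are long, which is the point of normalizing when sequence lengths differ.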

When sequences contain gaps and the gaps=NA option was passed to seqdef, i.e. when there are non-deleted missing values, the with.missing argument should be set to TRUE. If left at FALSE, the function stops when it encounters a gap. This is to make the user aware that there are gaps in the sequences. If the OM method is selected, seqdist expects a substitution cost matrix with a row and a column entry for the missing state (the symbol defined with the nr option of seqdef). This will be the case for substitution cost matrices returned by seqsubm. More details on how to compute distances with sequences containing gaps are given in Gabadinho et al. (2009).

----------Value----------

When refseq is specified, a vector with distances between the sequences in the data sequence object and the reference sequence is returned. When refseq is NULL (default), the whole matrix of pairwise distances between sequences is returned.

----------References----------

series. Technical Report, Department of Social Science Research Methods, Vrije Universiteit, Amsterdam.

----------See Also----------

seqsubm, seqdef, and, for multichannel distances, seqdistmc.

----------Examples----------

library(TraMineR)

## optimal matching distances with substitution cost matrix
## derived from transition rates
data(biofam)
biofam.seq <- seqdef(biofam, 10:25)
costs <- seqsubm(biofam.seq, method="TRATE")
biofam.om <- seqdist(biofam.seq, method="OM", indel=3, sm=costs)

## normalized LCP distances
biofam.lcp <- seqdist(biofam.seq, method="LCP", norm=TRUE)

## normalized LCS distances to the most frequent sequence in the data set
biofam.lcs <- seqdist(biofam.seq, method="LCS", refseq=0, norm=TRUE)

## histogram of the normalized LCS distances
hist(biofam.lcs)

## =====================
## Example with missings
## =====================
data(ex1)
ex1.seq <- seqdef(ex1,1:13)

subm <- seqsubm(ex1.seq, method="TRATE", with.missing=TRUE)
ex1.om <- seqdist(ex1.seq, method="OM", sm=subm, with.missing=TRUE)

If reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

