R seriation package: seriate() function help documentation

Posted on 2012-9-30 01:30:31
seriate(seriation)
seriate() belongs to the R package: seriation

                                        Seriate objects in dissimilarity matrices, matrices or arrays

                                         Translated by: 生物统计家园网 robot LoveR

----------Description----------

Tries to find a linear order for objects using data in the form of a dissimilarity matrix (two-way one-mode data), a data matrix (two-way two-mode data), or a data array (k-way k-mode data).


----------Usage----------


## S3 method for class 'dist'
seriate(x, method = NULL, control = NULL, ...)
## S3 method for class 'matrix'
seriate(x, method = NULL, control = NULL,
    margin = c(1,2), ...)
## S3 method for class 'array'
seriate(x, method = NULL, control = NULL,
    margin = seq(length(dim(x))), ...)



----------Arguments----------

Argument: x
the data (a dist object, a matrix, or an array).


Argument: method
a character string with the name of the seriation method (the default varies by data type).


Argument: control
a list of control options passed on to the seriation algorithm.


Argument: margin
a vector giving the margins to be seriated. For matrix, 1 indicates rows, 2 indicates columns, and c(1,2) indicates rows and columns. For array, margin is a vector with the dimensions to seriate.


Argument: ...
further arguments (unused).


----------Details----------

Two-way one-mode data has to be provided as a dist object (not as a symmetric matrix). Similarities have to be transformed in a suitable way into dissimilarities. Currently the following methods are implemented for dist:
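
As a minimal sketch of the conversion described above, the following base R snippet turns a similarity matrix into a dist object. The choice of correlations as the similarity and of 1 - similarity as the transformation are assumptions made for illustration; other transformations may suit other data better.

```r
# Hypothetical similarity matrix: correlations between the four iris measurements.
s <- cor(as.matrix(iris[-5]))

# Assumed transformation for this sketch: dissimilarity = 1 - similarity.
d_mat <- 1 - s

# A symmetric matrix is not enough; convert it to a dist object so that
# the method for class 'dist' is the one that gets dispatched.
d <- as.dist(d_mat)

class(d)
attr(d, "Size")
```

The resulting object d can then be passed as x to the method for class 'dist'.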

  


"ARSA" Anti-Robinson seriation by simulated annealing.

A heuristic developed by Brusco et al. (2007).

"BBURCG" Anti-Robinson seriation (unweighted).

A branch-and-bound implementation by Brusco and Stahl (2005).

"BBWRCG" Anti-Robinson seriation (weighted).

A branch-and-bound implementation by Brusco and Stahl (2005).




"TSP" Traveling salesperson problem solver.

A solver from package TSP can be used (see solve_TSP in package TSP). The solver method can be passed on via the control argument, e.g. control = list(method = "insertion").

Since a tour returned by a TSP solver is a closed circle and we are looking for a path representing a linear order, we need to find the best cutting point. Climer and Zhang (2006) suggest adding a dummy city with equal distance to every other city before generating the tour. The place of this dummy city in an optimal tour of minimal length is the best cutting point (it lies between the most distant cities).
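
The cutting step can be illustrated in base R. This is only a sketch for a tour that is already given, not the package's implementation: inserting a dummy city at constant distance c between two neighbours i and j changes the tour length by 2c - d(i, j), so for a fixed tour the best place for the dummy is the longest edge. The function name cut_tour_at_longest_edge and the toy data are hypothetical.

```r
# Cut a closed tour at its longest edge to obtain a linear order.
cut_tour_at_longest_edge <- function(d, tour) {
  m <- as.matrix(d)
  n <- length(tour)
  nxt <- c(tour[-1], tour[1])        # successor of each city on the closed tour
  edge_len <- m[cbind(tour, nxt)]    # length of every edge of the tour
  k <- which.max(edge_len)           # longest edge runs from tour[k] to its successor
  if (k == n) tour else c(tour[(k + 1):n], tour[1:k])
}

# Five points on a line and a closed tour through them.
x <- c(1, 2, 4, 7, 11)
d <- dist(x)
tour <- c(3, 4, 5, 1, 2)             # the last city connects back to the first

linear <- cut_tour_at_longest_edge(d, tour)
linear
```

Here the longest edge joins the points at 11 and 1, so the cut separates the two most distant cities and the path runs along the line.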




"Chen" Rank-two ellipse seriation (Chen 2002).

This method starts with generating a sequence of correlation matrices R^1, R^2, …. R^1 is the correlation matrix of the original distance matrix D (supplied to the function as x), and R^{n+1} = phi(R^n), where phi computes the correlation matrix of its argument.

The rank of the matrix R^n falls with increasing n. The first R^n in the sequence that has a rank of 2 is found. When all points in this matrix are projected on the first two eigenvectors, they fall on an ellipse. The order of the points on this ellipse is the resulting order.

The ellipse can be cut at either of the two points (top or bottom) where the vertical axis intersects the ellipse. In this implementation the topmost cutting point is used.
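
The iteration can be sketched in base R as follows. This is a simplified illustration, not the package's code: the function name rank2_ellipse_order, the numerical rank test, and the iteration cap are assumptions, and the ellipse is cut wherever the angle wraps around rather than at the topmost point.

```r
rank2_ellipse_order <- function(d, max_iter = 1000, tol = 1e-8) {
  r <- cor(as.matrix(d))                       # R^1: correlation matrix of D
  for (i in seq_len(max_iter)) {
    ev <- eigen(r, symmetric = TRUE, only.values = TRUE)$values
    # numerical rank: eigenvalues that are not negligible relative to the largest
    if (sum(abs(ev) > tol * max(abs(ev))) <= 2) break
    r <- cor(r)                                # R^{n+1}: correlation matrix of R^n
  }
  e <- eigen(r, symmetric = TRUE)$vectors[, 1:2]
  # Position on the ellipse is given by the angle in the plane of the
  # first two eigenvectors; sorting by angle gives a linear order.
  order(atan2(e[, 2], e[, 1]))
}

set.seed(1)
d <- dist(matrix(rnorm(40), ncol = 2))         # 20 random points in the plane
ord <- rank2_ellipse_order(d)
ord
```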




"MDS" Multidimensional scaling (MDS).

Uses multidimensional scaling techniques to find a linear order. Note that unidimensional scaling would be more appropriate but is very hard to solve. Generally, MDS provides good results.

By default, metric MDS (cmdscale in stats) is used. In case of general dissimilarities, non-metric MDS can be used. The choices are isoMDS and sammon from MASS. The method can be specified as the element method ("cmdscale", "isoMDS" or "sammon") in control.
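
The idea behind the metric variant can be sketched with cmdscale from stats alone: scale the objects to one dimension and sort them by their coordinate. This is an illustration on toy data, not the package's code path.

```r
# Five objects whose true positions lie on a line, given in shuffled order.
x <- c(3, 1, 5, 2, 4)
d <- dist(x)

# Metric MDS to a single dimension; the sign of the axis is arbitrary.
coord <- cmdscale(d, k = 1)[, 1]

# Sorting by the one-dimensional coordinate gives the linear order.
o <- order(coord)
x[o]
```

Because the orientation of the axis is arbitrary, x[o] comes out either ascending or descending; both represent the same seriation.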

"HC" Hierarchical clustering.

The order of the leaf nodes in a dendrogram obtained by hierarchical clustering can be used as a very simple seriation technique. This method applies hierarchical clustering (hclust) to x. The clustering method can be given using a "method" element in the control list. If omitted, the default "complete" is used.
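
A base R sketch of what this amounts to: run hclust and read off the leaf order. The subset of iris used here is an arbitrary choice for illustration.

```r
# Dissimilarities between the first 20 iris flowers (scaled measurements).
d <- dist(scale(as.matrix(iris[1:20, -5])))

# Complete linkage is the default named in the text above.
hc <- hclust(d, method = "complete")

# The left-to-right order of the leaves in the dendrogram is the seriation.
hc$order

# A different linkage generally gives a different dendrogram and order.
hc_avg <- hclust(d, method = "average")
hc_avg$order
```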




"GW", "OLO" Hierarchical clustering with optional reordering.

These methods also use the order of the leaf nodes in a dendrogram (see method "HC"); however, the leaf nodes are reordered.

A dendrogram (binary tree) with n leaves has n-1 internal nodes (subtrees) and therefore 2^{n-1} leaf orderings. That is, at each internal node the left and right subtree (or leaves) can be swapped, or, in terms of a dendrogram, be flipped.

Method "GW" uses an algorithm developed by Gruvaeus and Wainer (1972) and implemented in package gclus (Hurley 2004). The clusters are ordered at each level so that the objects at the edge of each cluster are adjacent to the object outside the cluster to which they are nearest. The method produces a unique order.

Method "OLO" (optimal leaf ordering, Bar-Joseph et al., 2001) produces an optimal leaf ordering with respect to minimizing the sum of the distances along the (Hamiltonian) path connecting the leaves in the given order. The time complexity of the algorithm is O(n^3). Note that non-finite distance values are not allowed.

Both methods start with a dendrogram created by hclust. A clustering method (default "complete") can be specified as the "method" element in the control list. Alternatively, an hclust object can be supplied using an element named "hclust".
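
The quantities involved can be made concrete in base R. The sketch below flips every internal node of a dendrogram with rev() and computes the length of the path through the leaves, which is the sum that "OLO" minimizes over all orderings compatible with the tree. The helper path_length is hypothetical and the snippet does not perform the optimization itself.

```r
# Sum of the distances between neighbouring objects along an order.
path_length <- function(d, o) {
  m <- as.matrix(d)
  sum(m[cbind(o[-length(o)], o[-1])])
}

x <- c(1, 2, 4, 7, 11)
d <- dist(x)
dend <- as.dendrogram(hclust(d, method = "complete"))

o1 <- order.dendrogram(dend)        # leaf order as drawn
o2 <- order.dendrogram(rev(dend))   # every internal node flipped

path_length(d, o1)
path_length(d, o2)                  # a full reversal has the same path length

# Number of leaf orderings compatible with a binary tree on n leaves
2^(length(x) - 1)
```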

Two-way two-mode data are general positive matrices. Currently the following methods are implemented for matrix:

"BEA" Bond Energy Algorithm (BEA; McCormick 1972).

The algorithm tries to maximize the measure of effectiveness (see criterion) of a non-negative matrix. Due to the definition of this measure, the tasks of ordering rows and columns are separable and can be solved independently.

A row is arbitrarily placed; then rows are positioned one by one. When this is completed, the columns are treated similarly. The overall procedure amounts to two approximate traveling salesperson problems (TSP), one on the rows and one on the columns. The so-called 'best insertion strategy' is used: rows (or columns) are inserted into the current permuted list of rows (or columns). Several consecutive runs of the algorithm might improve the energy.

Note that Arabie and Hubert (1990) question its use with non-binary data if the objective is to find a seriation or one-dimensional ordering of rows and columns.

The BEA code used in this package was implemented by Fionn Murtagh.

The number of runs can be specified as the element "rep" in control. The results of the best run will be returned.
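
To see why rows and columns can be handled separately, the measure of effectiveness can be written as the sum of products of vertically adjacent entries plus the sum of products of horizontally adjacent entries. The base R sketch below uses that form; the function name measure_of_effectiveness is hypothetical, and the package's own value is available through criterion.

```r
measure_of_effectiveness <- function(x) {
  n <- nrow(x)
  m <- ncol(x)
  # bonds between vertically adjacent entries: depend on the row order only
  rows <- if (n > 1) sum(x[-1, , drop = FALSE] * x[-n, , drop = FALSE]) else 0
  # bonds between horizontally adjacent entries: depend on the column order only
  cols <- if (m > 1) sum(x[, -1, drop = FALSE] * x[, -m, drop = FALSE]) else 0
  c(rows = rows, cols = cols, ME = rows + cols)
}

x <- rbind(c(1, 1, 0),
           c(0, 0, 1),
           c(1, 1, 0))

me_before <- measure_of_effectiveness(x)
me_after  <- measure_of_effectiveness(x[c(1, 3, 2), ])  # put the two equal rows together

me_before
me_after
```

Reordering the rows raises the row part of the measure and leaves the column part untouched, which is what makes the two ordering tasks independent.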




"BEA_TSP" Use a TSP to optimize the measure of effectiveness (Lenstra 1974).

Uses a TSP solver to optimize the measure of effectiveness (ME).

A TSP solver method can be specified as the element "method" in control (see package TSP).




"PCA" Principal component analysis.

Uses the projection of the data on its first principal component to determine the order.

Note that for a distance matrix calculated from x with Euclidean distance, this method minimizes the least squares criterion.
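
A base R sketch of the idea, using prcomp from stats. The package's own centering and scaling choices may differ, so the orders below are illustrative only.

```r
x <- scale(as.matrix(iris[-5]), center = FALSE)

# Rows: sort the objects by their score on the first principal component.
row_order <- order(prcomp(x)$x[, 1])

# Columns: the same idea applied to the transposed matrix.
col_order <- order(prcomp(t(x))$x[, 1])

# Apply both permutations to obtain the reordered matrix.
x_reordered <- x[row_order, col_order]
dim(x_reordered)
```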

For array, no built-in methods are currently available.


----------Value----------

Returns an object of class ser_permutation.


----------References----------

P. Arabie and L.J. Hubert (1990): The bond energy algorithm revisited,  IEEE Transactions on Systems, Man, and Cybernetics, 20(1), 268–274.
Z. Bar-Joseph, E. D. Demaine, D. K. Gifford, and T. Jaakkola. (2001): Fast Optimal Leaf Ordering for Hierarchical Clustering. Bioinformatics, 17(1), 22–29.
Brusco, M., Koehn, H.F., and Stahl, S. (2007): Heuristic Implementation of Dynamic Programming for Matrix Permutation Problems in Combinatorial Data Analysis. Psychometrika, conditionally accepted.
Brusco, M., and Stahl, S. (2005): Branch-and-Bound Applications in Combinatorial Data Analysis. New York: Springer.
Chen, C. H. (2002):  Generalized Association Plots: Information Visualization via Iteratively Generated Correlation Matrices. Statistica Sinica, 12(1), 7–29.
Sharlee Climer, Weixiong Zhang (2006): Rearrangement Clustering: Pitfalls, Remedies, and Applications, Journal of Machine Learning Research, 7(Jun), 919–943.
Gruvaeus, G. and Wainer, H. (1972): Two Additions to Hierarchical Cluster Analysis, British Journal of Mathematical and Statistical Psychology, 25, 200–206.
Hurley, Catherine B. (2004): Clustering Visualizations of Multidimensional Data. Journal of Computational and Graphical Statistics, 13(4), 788–806.
J.K. Lenstra (1974): Clustering a Data Array and the Traveling-Salesman Problem, Operations Research, 22(2), 413–414.
W.T. McCormick, P.J. Schweitzer and T.W. White (1972): Problem decomposition and data reorganization by a clustering technique,  Operations Research,  20(5), 993–1009.


----------See Also----------

criterion, solve_TSP in TSP, hclust in stats


----------Examples----------


## load the package that provides seriate(), pimage() and criterion()
library(seriation)

## seriate dist
data("iris")
x <- as.matrix(iris[-5])
## shuffle the rows so that the input order carries no structure
x <- x[sample(seq_len(nrow(x))),]
d <- dist(x)

## default seriation
order <- seriate(d)
order

## plot the dissimilarities before and after reordering
def.par <- par(no.readonly = TRUE)
layout(cbind(1,2), respect = TRUE)

pimage(d, main = "Random")
pimage(d, order, main = "Reordered")

par(def.par)

## compare quality
rbind(
        random = criterion(d),
        reordered = criterion(d, order)
     )


## seriate matrix
data("iris")
x <- as.matrix(iris[-5])

## to make the variables comparable, we scale the data
x <- scale(x, center = FALSE)

## try some methods
def.par <- par(no.readonly = TRUE)
layout(matrix(1:4, ncol = 2, byrow = TRUE), respect=TRUE)

pimage(x, main = "original data")
criterion(x)

order <- seriate(x, method = "BEA_TSP")
pimage(x, order, main = "TSP to optimize ME")
criterion(x, order)

order <- seriate(x, method="PCA")
pimage(x, order, main = "first principal component")
criterion(x, order)

## 2 TSPs: one on the rows, one on the columns
order <- c(
    seriate(dist(x), method = "TSP"),
    seriate(dist(t(x)), method = "TSP")
)
pimage(x, order, main = "2 TSPs")
criterion(x, order)

par(def.par)

Please credit the source when reposting: 生物统计家园网 (http://www.biostatistic.net).

