dissrep(TraMineR)
dissrep() belongs to the R package TraMineR
Extracting sets of representative objects using a dissimilarity matrix
Translator: 生物统计家园网, robot LoveR
Description
The function extracts a set of representative objects that exhibits the key features of the whole data set, the goal being an easy and sound interpretation of the latter. The user can set either the desired coverage level (the proportion of objects having a representative in their neighborhood) or the desired number of representatives.
Usage
dissrep(diss, criterion="density",
score=NULL, decreasing=TRUE,
trep=0.25, nrep=NULL, tsim=0.1, dmax=NULL, weights=NULL)
Arguments
Argument: diss
A dissimilarity matrix or a dist object (see dist).
Argument: criterion
The representativeness criterion used for sorting the candidate list. One of "freq" (frequency), "density" (neighborhood density) or "dist" (centrality). An optional vector containing the scores for sorting the candidate objects may also be provided through the score argument. See below and Details.
Argument: score
An optional vector containing the representativeness scores used for sorting the objects in the candidate list. The length of the vector must be equal to the number of rows/columns in the distance matrix, i.e. the number of objects.
Argument: decreasing
If a score vector is provided, indicates whether the objects in the candidate list must be sorted in increasing or decreasing order of this score. The first object in the sorted candidate list is taken to be the most representative.
Argument: trep
Controls the size of the representative set by setting the desired coverage level, i.e. the proportion of objects having a representative in their neighborhood. The neighborhood radius is defined by tsim.
Argument: nrep
Number of representatives. If NULL (default), the trep argument is used to control the size of the representative set.
Argument: tsim
Threshold for setting the redundancy and neighborhood radius, defined as a proportion of the maximum (theoretical) distance. Defaults to 0.1 (10%). Object y is considered redundant to object x, and in the neighborhood of x, if the distance from y to x is less than tsim*dmax. The neighborhood diameter is thus twice this threshold.
Argument: dmax
Maximum theoretical distance. Redundancy and neighborhood diameters are defined as a proportion of this maximum theoretical distance. If NULL, it is derived from the distance matrix.
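To make the roles of tsim and dmax concrete, here is a minimal base R sketch on a toy one-dimensional data set. It assumes that, when dmax is NULL, the maximum is taken as the largest observed dissimilarity; the variable names (x, d, radius) are illustrative only and are not TraMineR objects.

```r
# Toy data: six objects located on a line (illustrative, not from the package)
x <- c(0, 0, 0.5, 3, 3.2, 10)
d <- as.matrix(dist(x))

tsim <- 0.1
dmax <- max(d)          # assumed fallback when no theoretical maximum is supplied
radius <- tsim * dmax   # redundancy and neighborhood radius

# Objects lying in the neighborhood of object 1 (object 1 itself included)
in_nbhd <- which(d[, 1] < radius)
in_nbhd
```

With a known theoretical maximum, dmax would be set to that value instead of max(d), which changes the radius but not the logic.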
Argument: weights
Vector of weights of length equal to the number of rows of the dissimilarity matrix. If NULL, equal weights are assigned.
Details
The representative set is obtained by a heuristic that first builds a sorted list of candidates using a representativeness score and then eliminates redundancy. The available criteria for sorting the candidate list are: sequence frequency, neighborhood density and centrality. Other user-defined sorting criteria can be provided using the score argument.
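The two-step heuristic described above can be sketched in base R as follows. This is a simplified illustration, not TraMineR's implementation: pick_representatives is a hypothetical helper, weights are ignored, and coverage is computed as the unweighted share of objects having a selected representative within the radius.

```r
# Hypothetical helper (not part of TraMineR): simplified greedy selection.
pick_representatives <- function(d, score, radius, trep = 0.25, nrep = NULL) {
  # Step 1: sorted candidate list, most representative first
  candidates <- order(score, decreasing = TRUE)
  reps <- integer(0)
  for (i in candidates) {
    # Step 2: skip candidates redundant with an already selected representative
    if (length(reps) > 0 && any(d[i, reps] < radius)) next
    reps <- c(reps, i)
    if (!is.null(nrep)) {
      if (length(reps) >= nrep) break
    } else {
      # Coverage: share of objects with at least one representative nearby
      covered <- rowSums(d[, reps, drop = FALSE] < radius) > 0
      if (mean(covered) >= trep) break
    }
  }
  reps
}

# Toy data (illustrative): six objects on a line
x <- c(0, 0, 0.5, 3, 3.2, 10)
d <- as.matrix(dist(x))
radius <- 0.1 * max(d)
density <- rowSums(d < radius)   # neighborhood density used as the score
reps <- pick_representatives(d, density, radius, trep = 0.8)
reps
```

In this toy run the first representative comes from the dense group near 0, the other members of that group are skipped as redundant, and a second representative from the group near 3 brings the coverage above 0.8.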
The frequency criterion uses the frequencies as representativeness scores. The frequency of an object in the data is computed as the number of other objects whose dissimilarity to it is equal to 0. The more frequent an object, the more representative it is assumed to be. Hence, objects are sorted in decreasing frequency order. In effect, this criterion is the neighborhood density criterion (see below) with the neighborhood diameter set to 0.
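As an illustration of the frequency score defined above, the following base R sketch counts, for each object, the other objects at dissimilarity 0. The data and variable names are made up for the example, and weights are not taken into account.

```r
# Toy data (illustrative): three copies of 2, two copies of 7, one 9
x <- c(2, 2, 2, 7, 7, 9)
d <- as.matrix(dist(x))

# Number of *other* objects at dissimilarity 0 (subtract 1 for the object itself)
freq <- rowSums(d == 0) - 1

# Candidate list sorted by decreasing frequency
candidates <- order(freq, decreasing = TRUE)
freq
candidates
```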
The neighborhood density criterion uses the number of objects (the density) in the neighborhood of each candidate. This requires setting the neighborhood diameter. We suggest setting it as a given proportion of the maximal (theoretical) distance between two objects. Candidates are sorted in decreasing density order.
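A minimal base R sketch of the density score, assuming a radius of 10% of the largest observed distance and counting each object in its own neighborhood (whether the package counts the candidate itself is not stated here, so treat that as an assumption). Variable names are illustrative.

```r
# Toy data (illustrative): six objects on a line
x <- c(0, 0, 0.5, 3, 3.2, 10)
d <- as.matrix(dist(x))
radius <- 0.1 * max(d)   # assumed: 10% of the largest observed distance

# Number of objects within the radius of each candidate (itself included)
density <- rowSums(d < radius)

# Candidates sorted in decreasing density order
candidates <- order(density, decreasing = TRUE)
density
```

The isolated object at 10 gets the lowest density and therefore ends up last in the candidate list.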
The centrality criterion uses the sum of distances to all other objects, i.e. the centrality, as the representativeness criterion. The smaller the sum, the more representative the candidate.
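The centrality score can be sketched in base R as a row sum of the dissimilarity matrix, with candidates sorted in increasing order. The data and names below are illustrative only, and weights are ignored.

```r
# Toy data (illustrative): five objects on a line
x <- c(0, 1, 2, 3, 10)
d <- as.matrix(dist(x))

# Sum of distances from each object to all others (the diagonal contributes 0)
centrality <- rowSums(d)

# Smaller sums come first: the most central object heads the candidate list
candidates <- order(centrality)
most_central <- which.min(centrality)
centrality
```

Here the object at 2 has the smallest sum of distances, so it is ranked first.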
For more details, see Gabadinho et al. (2011).
Value
An object of class diss.rep. This is a vector containing the indexes of the representative objects, with the following additional attributes:
Attribute: Scores
A vector with the representativeness score of each object given the chosen criterion.
Attribute: Distances
A matrix with the distance of each object to its nearest representative.
Attribute: Statistics
Contains several quality measures for each representative in the set: number of objects attributed to the representative, number of objects in the representative's neighborhood, and mean distance to the representative.
Attribute: Quality
Overall quality measure.
Print and summary methods are available.
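As a hedged usage sketch, the attributes listed above can be read with base R's attr(). The example assumes that TraMineR is installed and that the attribute names match those documented here; the toy matrix is illustrative, and the block does nothing if the package is unavailable.

```r
# Runs only if TraMineR is installed (assumption: the interface documented above)
if (requireNamespace("TraMineR", quietly = TRUE)) {
  # Toy dissimilarity matrix (illustrative): six objects on a line
  x <- c(0, 0, 0.5, 3, 3.2, 10)
  d <- as.matrix(dist(x))

  # Ask for two representatives using the default density criterion
  rep_idx <- TraMineR::dissrep(d, criterion = "density", nrep = 2)

  print(rep_idx)               # print method for diss.rep objects
  attr(rep_idx, "Scores")      # representativeness score of each object
  attr(rep_idx, "Statistics")  # quality measures per representative
  attr(rep_idx, "Quality")     # overall quality measure
}
```

attr() returns NULL for a name that does not exist, so a NULL result here would suggest that the installed version uses different attribute names.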
References
See Also
seqrep, plot.stslist.rep
Examples
## Load the package that provides dissrep and the biofam data
library(TraMineR)
## Defining a sequence object with the data in columns 10 to 25
## (family status from age 15 to 30) in the biofam data set
data(biofam)
biofam.lab <- c("Parent", "Left", "Married", "Left+Marr",
"Child", "Left+Child", "Left+Marr+Child", "Divorced")
biofam.seq <- seqdef(biofam, 10:25, labels=biofam.lab)
## Computing the distance matrix
costs <- seqsubm(biofam.seq, method="TRATE")
biofam.om <- seqdist(biofam.seq, method="OM", sm=costs)
## Representative set using the neighborhood density criterion
biofam.rep <- dissrep(biofam.om)
biofam.rep
summary(biofam.rep)
When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).