microaggregation(sdcMicro)
microaggregation() belongs to R package: sdcMicro
Microaggregation
----------Description----------
Performs microaggregation using one of several methods.
----------Usage----------
microaggregation(x, method = "pca", aggr = 3, weights = NULL, nc = 8,
                 clustermethod = "clara", opt = FALSE, measure = "mean",
                 trim = 0, varsort = 1, transf = "log")
----------Arguments----------
Argument: x
data frame or matrix
Argument: method
microaggregation method, one of: pca, rmd, onedims, single, simple, clustpca, pppca, clustpppca, mdav, clustmcdpca, influence, mcdpca
Argument: aggr
aggregation level, i.e. the group size (default = 3)
Argument: nc
number of clusters, used if the chosen method performs cluster analysis
Argument: weights
sampling weights. If supplied, a weighted version of the aggregation measure is chosen automatically, e.g. the weighted median or the weighted mean.
Argument: clustermethod
clustering method, used if the chosen method performs cluster analysis
Argument: opt
experimental
Argument: measure
aggregation statistic: mean, median, trim or onestep (default = mean)
Argument: trim
trimming percentage, used if measure = "trim"
Argument: varsort
variable used for sorting, if method = "single"
Argument: transf
transformation applied to the data x
----------Details----------
The "official" definition of microaggregation can be found at http://neon.vb.cbs.nl/casc/Glossary.htm:
Records are grouped based on a proximity measure of variables of interest, and the same small groups of records are used in calculating aggregates for those variables. The aggregates are released instead of the individual record values.
The recommended method is "rmd", which measures proximity using multivariate distances based on robust methods. It is an extension of the well-known method "mdav". When computational speed is important, however, "mdav" is the preferable choice.
Although very different concepts can be used for the proximity measure, the aggregation itself is naturally done with the arithmetic mean. Other measures of location can nevertheless be used for aggregation, especially when the group size is larger than 3. Because the median, being highly robust, seems unsuitable for microaggregation, one of the other included measures (trim, onestep) can be chosen instead. If data from a complex sample survey are microaggregated, the corresponding sampling weights should be supplied so that the values are aggregated by the weighted arithmetic mean or the weighted median.
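The effect of the aggregation statistic can be illustrated with a few lines of base R. This is a minimal sketch, not the sdcMicro implementation: the helper `aggregate_by_group`, the toy data and the weights `w` are assumptions made up for this illustration.

```r
# Sketch only: base R, independent of sdcMicro.
# Hypothetical helper: replace each value by a summary of its group.
aggregate_by_group <- function(x, group, fun) {
  ave(x, group, FUN = fun)
}

x     <- c(1, 2, 3, 10, 11, 30)   # already sorted, 30 is an outlier
group <- rep(1:2, each = 3)       # two groups of size 3

by_mean   <- aggregate_by_group(x, group, mean)
by_median <- aggregate_by_group(x, group, median)

# Weighted mean per group, using illustrative sampling weights
w <- c(1, 1, 2, 1, 1, 2)
by_wmean <- unsplit(
  Map(function(v, wt) rep(weighted.mean(v, wt), length(v)),
      split(x, group), split(w, group)),
  group
)

rbind(by_mean, by_median, by_wmean)
sum(x) == sum(by_mean)            # the arithmetic mean keeps the total
```

In this toy example the group mean moves towards the outlier while the group median ignores it, and only the mean keeps the total of the variable unchanged.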
This function also includes a method that first clusters the data, using one of a variety of clustering algorithms. Clustering the observations before applying microaggregation might be useful. Note that the data are automatically standardised before clustering.
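The idea of standardising, clustering and then aggregating within clusters can be sketched in base R as follows. This is an illustration under stated assumptions, not what microaggregation() does internally: `kmeans()` stands in for the clustering step, sorting by the first variable stands in for the proximity measure, and `group_ids` is a hypothetical helper.

```r
# Sketch only: kmeans() and the sort-by-first-variable rule are stand-ins.
# Hypothetical helper: group ids for n sorted records and group size k;
# a short remainder group is merged into the previous group.
group_ids <- function(n, k) {
  g <- (seq_len(n) - 1) %/% k
  if (n >= k && n %% k != 0) g[g == max(g)] <- max(g) - 1
  g
}

set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 5), ncol = 2))
x[, 2] <- x[, 2] * 100            # second variable on a much larger scale

z  <- scale(x)                    # centre to mean 0, scale to sd 1
cl <- kmeans(z, centers = 2)$cluster

mx <- x
for (k_cl in unique(cl)) {
  idx <- which(cl == k_cl)
  idx <- idx[order(x[idx, 1])]    # sort within the cluster by variable 1
  g   <- group_ids(length(idx), 3)
  # replace every column by its group means within this cluster
  mx[idx, ] <- apply(x[idx, , drop = FALSE], 2, function(col) ave(col, g))
}

colMeans(x) - colMeans(mx)        # zero up to rounding
```

Without `scale()`, the variable with the larger scale would dominate the distances used for clustering.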
Using clustering method "Mclust" requires package mclust02, which must be loaded first. The package is not loaded automatically, because it is not under the GPL but comes with a different licence.
Some projection methods for microaggregation are also included. The robust versions "pppca" and "clustpppca" (which clusters first) are fast implementations and almost always provide the best results.
Univariate statistics are best preserved by the individual ranking method (called "onedims" here, although it is often named "individual ranking"), but multivariate statistics are strongly affected.
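A minimal base R sketch of individual ranking shows why this happens. It is an illustration only, not the code behind method "onedims"; the function `individual_ranking` and the simulated data are assumptions made for this example.

```r
# Sketch only: each column is sorted and aggregated on its own.
individual_ranking <- function(x, k = 3) {
  x <- as.matrix(x)
  n <- nrow(x)
  g <- (seq_len(n) - 1) %/% k
  # merge a short remainder group into the previous group
  if (n >= k && n %% k != 0) g[g == max(g)] <- max(g) - 1
  apply(x, 2, function(col) {
    ord <- order(col)
    out <- col
    out[ord] <- ave(col[ord], g)  # group means, written back in original order
    out
  })
}

set.seed(42)
a  <- rnorm(30)
x  <- cbind(a = a, b = a + rnorm(30, sd = 0.5))
mx <- individual_ranking(x, k = 3)

colMeans(x) - colMeans(mx)        # column means are unchanged up to rounding
c(cor(x)[1, 2], cor(mx)[1, 2])    # the correlation generally differs
```

Because the groups are formed separately for each variable, a record is averaged with different neighbours in each column, which is what disturbs the multivariate structure.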
With method "simple", microaggregation is applied directly to the (unsorted) data. This is useful as a benchmark for comparison with other methods, i.e. it answers the question of how much sorting the data before aggregation improves the result.
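The benchmark idea can be sketched for a single variable in base R. The function `microaggregate_1d` and the squared-error loss `sse` are hypothetical helpers chosen for this illustration; they are not part of sdcMicro.

```r
# Sketch only: compare aggregation in the given order with aggregation after sorting.
microaggregate_1d <- function(x, k = 3, sort_first = TRUE) {
  n <- length(x)
  g <- (seq_len(n) - 1) %/% k
  # merge a short remainder group into the previous group
  if (n >= k && n %% k != 0) g[g == max(g)] <- max(g) - 1
  ord <- if (sort_first) order(x) else seq_len(n)
  out <- x
  out[ord] <- ave(x[ord], g)      # group means, written back in original order
  out
}

# Illustrative loss: sum of squared differences to the original values
sse <- function(orig, masked) sum((orig - masked)^2)

set.seed(123)
x <- rnorm(30)
loss_simple <- sse(x, microaggregate_1d(x, sort_first = FALSE))
loss_sorted <- sse(x, microaggregate_1d(x, sort_first = TRUE))
c(simple = loss_simple, sorted = loss_sorted)
```

The gap between the two losses indicates how much information the sorting step saves for these data.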
----------Value----------
Component: mx
aggregated data set
Component: x
original data
Component: method
method
Component: aggr
aggregation level
Component: measure
proximity measure for aggregation
Component: fot
correction factor, necessary if totals are calculated and n divided by aggr is not an integer.
----------Author(s)----------
Matthias Templ
For method “mdav”: This work is being supported by the International Household
Survey Network (IHSN) and funded by a DGF Grant provided by the World Bank to the
PARIS21 Secretariat at the Organisation for Economic Co-operation and Development (OECD).
This work builds on previous work which is elsewhere acknowledged.
Author for the integration of the code for mdav in R: Alexander Kowarik.
----------References----------
Robust Statistics Meets SDC: New Disclosure Risk Measures for Continuous Microdata Masking, Lecture Notes in Computer Science, Privacy in Statistical Databases, vol. 5262, pp. 113-126, 2008.
Statistical Disclosure Control for Microdata Using the R-Package sdcMicro, Transactions on Data Privacy, vol. 1, number 2, pp. 67-85, 2008. http://www.tdp.cat/issues/abs.a004a08.php
New Developments in Statistical Disclosure Control and Imputation: Robust Statistics Applied to Official Statistics, Suedwestdeutscher Verlag fuer Hochschulschriften, 2009, ISBN: 3838108280, 264 pages.
Practical Applications in Statistical Disclosure Control Using R, Privacy and Anonymity in Information Management Systems New Techniques for New Practical Problems, Springer, 31-62, 2010, ISBN: 978-1-84996-237-7.
----------See Also----------
summary.micro, plotMicro, valTable
----------Examples----------
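library(sdcMicro)
# Individual ranking on the Tarragona data, groups of 3
data(Tarragona)
m1 <- microaggregation(Tarragona, method = "onedims", aggr = 3)
## summary(m1)
# MDAV on three variables of the first 100 rows of testdata, groups of 4
data(testdata)
m2 <- microaggregation(testdata[1:100, c("expend", "income", "savings")],
                       method = "mdav", aggr = 4)
summary(m2)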