sdcMicro-package(sdcMicro)
R package: sdcMicro
Statistical Disclosure Control (SDC) for the generation of protected microdata for researchers and for public use.
----------Description----------
This package includes all methods of the popular software mu-Argus plus several new methods. Compared with mu-Argus, the advantages of this package are that the results are fully reproducible, even when the included GUI is used; that the package can be used in batch mode from other software; that the functions can be used in a very flexible way; that everybody can look at the source code; and that no time-consuming metadata management is necessary. However, users should have a detailed knowledge of SDC when applying these methods to data.
The implemented graphical user interface (GUI) for microdata protection serves as an easy-to-handle tool for users who want to use the sdcMicro package for statistical disclosure control but are not used to the native R command-line interface. The GUI also manages the interactions between the objects that result from the anonymization process, so that frequency counts, individual risk, information loss and data utility are automatically recalculated and displayed after each anonymization step. In addition, the code for every anonymization step carried out within the GUI is saved in a script, which can then easily be modified and reloaded.
Please note that the methods “shuffling”, “robShuffle” (robust shuffling), “gadp” and “robgadp” are not included in the package, because the method “shuffling” is covered by a US patent held by other authors, even though shuffling consists of only 8 lines of code.
----------Details----------
----------Author(s)----------
Matthias Templ
Maintainer: Matthias Templ <templ@statistik.tuwien.ac.at>
----------References----------
Statistical Disclosure Control for Microdata Using the R-Package sdcMicro, Transactions on Data Privacy, vol. 1, number 2, pp. 67-85, 2008. http://www.tdp.cat/issues/abs.a004a08.php
New Developments in Statistical Disclosure Control and Imputation: Robust Statistics Applied to Official Statistics, Suedwestdeutscher Verlag fuer Hochschulschriften, 2009, ISBN: 3838108280, 264 pages.
----------Examples----------
## example from Capobianchi, Polettini and Lucarelli:
data(francdat)
f <- freqCalc(francdat, keyVars=c(2,4,5,6),w=8)
f
f$fk
f$Fk
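As a rough illustration of what `f$fk` and `f$Fk` contain, the sketch below computes them in base R for a toy data set: `fk` is the number of sample records sharing a record's combination of key-variable values, and `Fk` is the sum of the sampling weights over those records, an estimate of the corresponding population frequency. The helper `freq_counts()` and the toy data are hypothetical, and the sketch treats `NA` as an ordinary category, which is probably not how `freqCalc()` handles missing values.

```r
## Hypothetical base-R sketch of sample (fk) and estimated population (Fk)
## frequency counts; not the sdcMicro implementation.
freq_counts <- function(dat, keyVars, w) {
  # one string per record encoding its combination of key-variable values
  key <- do.call(paste, c(dat[keyVars], sep = "|"))
  # fk: number of records in the sample sharing the same key
  fk <- as.vector(table(key)[key])
  # Fk: sum of the sampling weights over the records sharing the key
  Fk <- as.vector(tapply(dat[[w]], key, sum)[key])
  list(fk = fk, Fk = Fk)
}

toy <- data.frame(sex = c("m", "m", "f", "f", "m"),
                  age = c(30, 30, 40, 40, 50),
                  wt  = c(10, 20, 5, 15, 8))
res <- freq_counts(toy, keyVars = c("sex", "age"), w = "wt")
res$fk  # 2 2 2 2 1
res$Fk  # 30 30 20 20 8
```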
## with missing values:
x <- francdat
x[3,5] <- NA
x[4,2] <- x[4,4] <- NA
x[5,6] <- NA
x[6,2] <- NA
f2 <- freqCalc(x, keyVars=c(2,4,5,6),w=8)
f2$Fk
## individual risk calculation:
indivf <- indivRisk(f)
indivf$rk
## Local suppression:
localS <- localSupp(f, keyVar=2, indivRisk=indivf$rk, threshold=0.25)
f2 <- freqCalc(localS$freqCalc, keyVars=c(2,4,5,6), w=8)
indivf2 <- indivRisk(f2)
indivf2$rk
## if you think the table is not yet fully protected, select another keyVar and run localSupp once again
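Conceptually, the `localSupp()` call above replaces the value of the chosen key variable by a missing value in every record whose individual risk exceeds `threshold`. A minimal base-R sketch of that idea, using a hypothetical helper `suppress_risky()` and made-up risk values; whether sdcMicro compares with `>` or `>=` is an assumption here.

```r
## Hypothetical sketch of threshold-based local suppression.
suppress_risky <- function(dat, keyVar, risk, threshold) {
  risky <- !is.na(risk) & risk > threshold  # records considered too risky
  dat[risky, keyVar] <- NA                  # blank out the key variable there
  dat
}

toy <- data.frame(sex = c("m", "f", "m"), age = c(30, 40, 50))
out <- suppress_risky(toy, keyVar = "age", risk = c(0.10, 0.30, 0.50),
                      threshold = 0.25)
out$age  # 30 NA NA
```

Suppressing a single key variable keeps the rest of the record usable, which is why the comment above suggests trying another key variable rather than deleting records.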
data(free1)
f <- freqCalc(free1, keyVars=1:3, w=30)
ind <- indivRisk(f)
## and now you can use the interactive plot for individual risk objects:
## plot(ind)
## Local suppression with localSupp2 and localSupp2Wrapper is more effective:
## example from Capobianchi, Polettini and Lucarelli:
data(francdat)
l1 <- localSupp2(francdat, keyVars=c(2,4,5,6), w=8)
l1
l1$x
l2 <- localSupp2(francdat, keyVars=c(2,4,5,6), w=8, k=2)
l3 <- localSupp2(francdat, keyVars=c(2,4,5,6), w=8, k=4)
## long computation time:
## l <- localSupp2(free1, keyVars=1:3, w=30, k=2, importance=c(0.1,1,0.8))
## we want to avoid missing values in column 5:
l1 <- localSupp2Wrapper(francdat, keyVars=c(2,4,5,6), importance=c(1,1,0,1), w=8, kAnon=1)
l1$x
## we want to avoid missing values in column 5 and allow missing values in the first
## key variable (column 2) only if really necessary:
l1 <- localSupp2Wrapper(francdat, keyVars=c(2,4,5,6), importance=c(0.1,1,0,1), w=8, kAnon=1)
l1$x
plot(l1)
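Local suppression with `localSupp2()` aims at k-anonymity (its `k` argument appears to set the level): a data set is k-anonymous with respect to its key variables if every combination of key-variable values is shared by at least k records. A hypothetical base-R check of that property; it treats `NA` as an ordinary category, whereas sdcMicro may count missing values differently.

```r
## Hypothetical check: is every key combination shared by >= k records?
is_k_anonymous <- function(dat, keyVars, k) {
  key <- do.call(paste, c(dat[keyVars], sep = "|"))
  all(table(key) >= k)
}

toy <- data.frame(sex = c("m", "m", "f", "f", "m"),
                  age = c(30, 30, 40, 40, 50))
is_k_anonymous(toy, c("sex", "age"), k = 2)        # FALSE: ("m", 50) is unique
is_k_anonymous(toy[1:4, ], c("sex", "age"), k = 2) # TRUE
```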
## Data from mu-Argus:
## Global recoding:
data(free1)
free1[, "AGE"] <- globalRecode(free1[,"AGE"], c(1,9,19,29,39,49,59,69,100), labels=1:8)
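Global recoding maps each value of a continuous key variable to the interval it falls in, so that the same coarse categories are used across the whole file. The `globalRecode()` call above can be mimicked in base R with `cut()`; which interval ends are closed is an assumption here (`cut()`'s default right-closed intervals plus `include.lowest = TRUE`), and `globalRecode()` may differ.

```r
## Base-R sketch of global recoding of age into 8 classes
ages   <- c(5, 9, 10, 35, 70, 100)
breaks <- c(1, 9, 19, 29, 39, 49, 59, 69, 100)
# intervals [1,9], (9,19], ..., (69,100] labelled 1..8
age_class <- cut(ages, breaks = breaks, labels = 1:8, include.lowest = TRUE)
as.integer(as.character(age_class))  # 1 1 2 4 8 8
```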
## Top coding:
topBotCoding(free1[,"DEBTS"], value=9000, replacement=9100, kind="top")
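Top coding replaces every value above a threshold by a single replacement value, so that extreme (and therefore identifying) values are hidden. A base-R sketch of what the `topBotCoding()` call above does with `value = 9000` and `replacement = 9100`; the helper `top_code()` is hypothetical, and leaving values exactly equal to the threshold unchanged is an assumption.

```r
## Hypothetical top-coding helper: values above `value` become `replacement`
top_code <- function(x, value, replacement) {
  x[!is.na(x) & x > value] <- replacement
  x
}

debts <- c(100, 9000, 9500, 20000, NA)
top_code(debts, value = 9000, replacement = 9100)  # 100 9000 9100 9100 NA
```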
## Numerical rank swapping:
## do not use the mu-Argus test data set (free1) here, since its numerical variables are (probably) artificial.
data(Tarragona)
Tarragona1 <- rankSwap(Tarragona, P=10)
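Rank swapping sorts a numerical variable and exchanges each value with a randomly chosen value whose rank differs by at most P percent of the number of records, so values move only to records with similar ranks. A simplified single-variable sketch of that idea; the helper `rank_swap_1d()` is hypothetical, and `rankSwap()` works on several variables and may use a different swapping rule.

```r
## Hypothetical sketch of rank swapping for one numeric variable
rank_swap_1d <- function(x, P) {
  n <- length(x)
  ord <- order(x)                    # record positions in ascending order
  sorted <- x[ord]
  maxdist <- floor(P / 100 * n)      # largest allowed rank difference
  swapped <- rep(FALSE, n)
  idx <- seq_len(n)
  for (i in idx) {
    if (swapped[i]) next
    # unswapped partners with a higher rank, at most maxdist ranks away
    cand <- idx[!swapped & idx > i & idx <= i + maxdist]
    if (length(cand) == 0) next
    j <- cand[sample.int(length(cand), 1)]
    sorted[c(i, j)] <- sorted[c(j, i)]
    swapped[c(i, j)] <- TRUE
  }
  out <- x
  out[ord] <- sorted                 # put swapped values back in record order
  out
}

set.seed(1)
x <- c(50, 10, 40, 20, 30, 60, 90, 70, 80, 100)
y <- rank_swap_1d(x, P = 20)         # ranks move by at most 2 positions
```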
## Microaggregation:
m1 <- microaggregation(Tarragona, method="onedims", aggr=3)
m2 <- microaggregation(Tarragona, method="pca", aggr=3)
# summary(m1)
# valTable(Tarragona, method=c("simple","onedims","pca")) ## approx. 1 minute computation time
data(microData)
m1 <- microaggregation(microData, method="mdav")
x <- m1$x ### fix me
summary(m1)
plotMicro(m1, 0.1, which.plot=1) # too few observations...
data(free1)
plotMicro(microaggregation(free1[,31:34], method="onedims"), 0.1, which.plot=1)
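Univariate microaggregation (the `"onedims"` method, which treats one variable at a time) sorts the values, partitions them into groups of at least `aggr` consecutive values, and replaces each value by its group mean. A hypothetical base-R sketch for a single variable; folding a leftover group smaller than `aggr` into the preceding group is one common choice and not necessarily what `microaggregation()` does.

```r
## Hypothetical sketch of univariate (fixed group size) microaggregation
microagg_1d <- function(x, aggr = 3) {
  n <- length(x)
  ord <- order(x)
  grp <- ceiling(seq_len(n) / aggr)       # consecutive groups of size aggr
  last <- max(grp)
  # fold an undersized final group into the previous one
  if (last > 1 && sum(grp == last) < aggr) grp[grp == last] <- last - 1
  out <- numeric(n)
  out[ord] <- ave(x[ord], grp)            # replace values by group means
  out
}

microagg_1d(c(1, 2, 3, 10, 11, 12, 100), aggr = 3)
# 2.00 2.00 2.00 33.25 33.25 33.25 33.25
```

Because every value is replaced by a mean over its own group, the overall mean of the variable is preserved while each released value is shared by at least `aggr` records.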
## disclosure risk (interval) and data utility:
data(Tarragona)
m1 <- microaggregation(Tarragona, method="onedims", aggr=3)
dRisk(x=Tarragona, xm=m1$mx)
dRisk(x=Tarragona, xm=m2$mx)
dUtility(x=Tarragona, xm=m1$mx)
dUtility(x=Tarragona, xm=m2$mx)
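The idea behind an interval-based disclosure risk is that a masked record is risky if its original values lie inside a small interval around the masked ones, so that an intruder could link the two. A deliberately simplified sketch of such a measure, assuming an interval of half-width `k` times the standard deviation of each variable; the helper `interval_risk()` and this exact definition are assumptions, not the formula used by `dRisk()`.

```r
## Hypothetical, simplified interval disclosure risk
interval_risk <- function(x, xm, k = 0.01) {
  x  <- as.matrix(x)
  xm <- as.matrix(xm)
  # half-width of the interval for each variable: k * sd of the original
  half <- matrix(apply(x, 2, sd) * k, nrow(x), ncol(x), byrow = TRUE)
  inside <- abs(x - xm) <= half       # original value inside the interval?
  mean(apply(inside, 1, all))         # share of records matched on all variables
}

set.seed(2)
orig <- matrix(rnorm(200), ncol = 2)
interval_risk(orig, orig)             # 1: unmasked data, maximal risk
interval_risk(orig, orig + 10)        # 0: values moved far away
```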
## S4 class code for adding-noise methods will be included in the next version of sdcMicro.
## Fast generation of synthetic data with approximately the same covariance matrix as the original data.
data(mtcars)
cov(mtcars[,4:6])
cov(dataGen(mtcars[,4:6],n=200))
pairs(mtcars[,4:6])
pairs(dataGen(mtcars[,4:6],n=200))
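One standard way to generate synthetic data whose covariance matrix approximately matches that of the original data is to draw multivariate normal values via the Cholesky factor of the sample covariance. The sketch below does this in base R; it is an assumed, generic technique and not necessarily how `dataGen()` works (for instance, it does not preserve non-normal marginal distributions). The helper `gen_synthetic()` is hypothetical.

```r
## Hypothetical sketch: multivariate normal synthetic data with the
## sample mean and covariance of x
gen_synthetic <- function(x, n) {
  x <- as.matrix(x)
  R <- chol(cov(x))                             # upper triangular, t(R) %*% R == cov(x)
  z <- matrix(rnorm(n * ncol(x)), n, ncol(x))   # independent standard normals
  syn <- sweep(z %*% R, 2, colMeans(x), "+")    # impose covariance, then means
  colnames(syn) <- colnames(x)
  syn
}

set.seed(123)
syn <- gen_synthetic(mtcars[, 4:6], n = 200)
round(cov(syn), 2)
round(cov(mtcars[, 4:6]), 2)
```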
## PRAM
set.seed(123)
x <- sample(1:4, 250, replace=TRUE)
pr1 <- pram(x)
length(which(pr1$x == x))
x2 <- sample(1:4, 250, replace=TRUE)
length(which(pram(x2)$x == x2))
data(free1)
marstatPramed <- pram(free1[,"MARSTAT"])
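PRAM (post-randomization) perturbs a categorical variable by replacing each category i with a category j drawn with probability P[i, j] from a transition matrix P whose rows sum to one. A base-R sketch with a hypothetical transition matrix that keeps a category with probability `pd` and otherwise moves it uniformly to one of the other categories; this matrix and the helpers `make_transition()` and `pram_sketch()` are assumptions for illustration, not what `pram()` constructs.

```r
## Hypothetical transition matrix: keep category with prob. pd,
## otherwise switch uniformly to one of the other k - 1 categories
make_transition <- function(levels, pd = 0.8) {
  k <- length(levels)
  P <- matrix((1 - pd) / (k - 1), k, k, dimnames = list(levels, levels))
  diag(P) <- pd
  P
}

## Post-randomize: draw each new category from the row of P for the old one
pram_sketch <- function(x, P) {
  lev <- colnames(P)
  vapply(as.character(x),
         function(v) sample(lev, 1, prob = P[v, ]),
         character(1), USE.NAMES = FALSE)
}

set.seed(123)
x  <- sample(1:4, 250, replace = TRUE)
P  <- make_transition(as.character(1:4), pd = 0.8)
xp <- pram_sketch(x, P)
sum(xp == x)   # number of unchanged values, about 0.8 * 250 on average
```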