gendata.ep(RxCEcolInf)
gendata.ep() belongs to the R package RxCEcolInf
Function To Simulate Ecological and Survey Data For Use in
----------Description----------
This function generates simulated ecological data, i.e., data in the form of contingency tables in which the row and column totals, but none of the internal cell counts, are observed. At the user's option, data from simulated surveys of some of the "units" (in voting parlance, "precincts") that gave rise to the contingency tables are also produced.
----------Usage----------
gendata.ep(nprecincts = 175,
           nrowcat = 3,
           ncolcat = 3,
           colcatnames = c("Dem", "Rep", "Abs"),
           mu0 = c(-.6, -2.05, -1.7, -.2, -1.45, -1.45),
           rowcatnames = c("bla", "whi", "his", "asi"),
           alpha = c(.35, .45, .2, .1),
           housing.seg = 1,
           nprecincts.ep = 40,
           samplefrac.ep = 1/14,
           K0 = NULL,
           nu0 = 12,
           Psi0 = NULL,
           lambda = 1000,
           dispersion.low.lim = 1,
           dispersion.up.lim = 1,
           outfile = NULL,
           his.agg.bias.vec = c(0, 0),
           HerfInvexp = 3.5,
           HerfNoInvexp = 3.5,
           HerfReasexp = 2)
----------Arguments----------
Argument: nprecincts
positive integer: The number of contingency tables (precincts) in the simulated dataset.
Argument: nrowcat
integer > 1: The number of rows in each of the contingency tables.
Argument: ncolcat
integer > 1: The number of columns in each of the contingency tables.
Argument: rowcatnames
character vector of length nrowcat: Names of rows in each contingency table.
Argument: colcatnames
character vector of length ncolcat: Names of columns in each contingency table.
Argument: alpha
vector of length nrowcat: initial parameters of the Dirichlet distribution used to generate each contingency table's row fractions.
Argument: housing.seg
scalar > 0: multiplied by alpha to produce the final parameters of the Dirichlet distribution used to generate each contingency table's row fractions.
Argument: mu0
vector of length (nrowcat * (ncolcat - 1)): The mean of the multivariate normal hyperprior at the top level of the hierarchical model from which the data are simulated. See Details.
Argument: K0
square matrix of dimension (nrowcat * (ncolcat - 1)): the covariance matrix of the multivariate normal hyperprior at the top level of the hierarchical model from which the data are simulated. See Details.
Argument: nu0
scalar > 0: the degrees of freedom for the Inv-Wishart hyperprior from which the Σ matrix will be drawn.
Argument: Psi0
square matrix of dimension (nrowcat * (ncolcat - 1)): scale matrix for the Inv-Wishart hyperprior from which the Σ matrix will be drawn.
Argument: lambda
scalar > 0: initial parameter of the Poisson distribution from which the number of voters in each precinct will be drawn.
Argument: dispersion.low.lim
scalar > 0 and no greater than dispersion.up.lim (the defaults set both limits to 1): lower limit of a draw from runif(). The draw is multiplied by lambda, so this sets a lower limit on the parameter of the Poisson distribution that determines the number of voters in each precinct.
Argument: dispersion.up.lim
scalar no smaller than dispersion.low.lim: upper limit of a draw from runif(). The draw is multiplied by lambda, so this sets an upper limit on the parameter of the Poisson distribution that determines the number of voters in each precinct.
Argument: outfile
string ending in ".Rdata": filepath and name of object; if non-NULL, the object returned by this function will be saved to the location specified by outfile.
Argument: his.agg.bias.vec
vector of length 2: only implemented for nrowcat = 3 and ncolcat = 3: if non-null, induces aggregation bias into the simulated data. See Details.
Argument: nprecincts.ep
integer > -1 and less than nprecincts: number of contingency tables (precincts) to be included in simulated survey sample (ep for "exit poll").
Argument: samplefrac.ep
fraction (real number between 0 and 1): fraction of individual units (voters) within each contingency table (precinct) included in the survey sample.
Argument: HerfInvexp
scalar: exponent used to generate inverted quasi-Herfindahl weights used to sample contingency tables (precincts) for inclusion in a sample survey. See Details.
Argument: HerfNoInvexp
scalar: same as HerfInvexp except the quasi-Herfindahl weights are not inverted. See Details.
Argument: HerfReasexp
scalar: same as HerfInvexp, for a separate sample survey. See Details.
----------Details----------
This function simulates data from the ecological inference model outlined in Greiner & Quinn (2009). At the user's option (by setting nprecincts.ep to an integer greater than 0), the function generates three survey samples from the simulated dataset. The specifics of the function's operation are as follows.
First, the function simulates the total number of individual units (voters) in each contingency table (precinct) from a Poisson distribution with parameter lambda * runif(1, dispersion.low.lim, dispersion.up.lim). Next, for each table (precinct), the function simulates the vector of fractions of units (voters) in each row. The fractions are simulated from a Dirichlet distribution with parameter vector housing.seg * alpha. The row fractions are multiplied by the total number of units (voters), and the resulting vector is rounded to produce contingency table row counts for each table.
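The first step can be sketched in base R as follows. This is only an illustration of the description above, not the package's actual code: the helper rdirichlet_sketch is hypothetical, nprecincts is reduced to 5, alpha is cut to its first three default entries, and drawing one runif() multiplier per precinct is an assumption.

```r
set.seed(1)                      # reproducibility of this sketch only

nprecincts  <- 5                 # small value chosen for illustration
alpha       <- c(.35, .45, .2)   # first nrowcat = 3 entries of the default alpha
housing.seg <- 1
lambda      <- 1000
dispersion.low.lim <- 1
dispersion.up.lim  <- 1

# Total units (voters) per table: Poisson with a rescaled lambda.
# Assumption: one runif() multiplier is drawn per precinct.
totals <- rpois(nprecincts,
                lambda * runif(nprecincts, dispersion.low.lim, dispersion.up.lim))

# Dirichlet draws via normalized gamma variates (hypothetical helper;
# base R has no Dirichlet sampler).
rdirichlet_sketch <- function(n, a) {
  g <- matrix(rgamma(n * length(a), shape = a), nrow = n, byrow = TRUE)
  g / rowSums(g)
}
row_fracs <- rdirichlet_sketch(nprecincts, housing.seg * alpha)

# Row counts: fractions times totals, rounded. Because of rounding, a row of
# counts may differ from the simulated total by one unit.
row_counts <- round(row_fracs * totals)
```

Larger values of housing.seg concentrate the Dirichlet draws around alpha / sum(alpha), while values below 1 push more of the mass of each precinct toward a single row.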
Next, a vector μ is simulated from a multivariate normal with mean mu0 and covariance matrix K0. A covariance matrix Σ (returned as Sigma) is simulated from an Inv-Wishart with nu0 degrees of freedom and scale matrix Psi0.
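A minimal base R sketch of these two draws is shown here. The identity matrices used for K0 and Psi0 are hypothetical stand-ins, since the Usage section gives NULL as the default for both and this page does not say what NULL resolves to. The Inv-Wishart draw is obtained by inverting a Wishart draw, which is a standard identity rather than a statement about how the package does it.

```r
set.seed(2)

nrowcat <- 3
ncolcat <- 3
d    <- nrowcat * (ncolcat - 1)                 # dimension of mu and Sigma
mu0  <- c(-.6, -2.05, -1.7, -.2, -1.45, -1.45)  # default from Usage
nu0  <- 12                                      # default from Usage
K0   <- diag(d)   # hypothetical: the documented default is NULL
Psi0 <- diag(d)   # hypothetical: the documented default is NULL

# mu ~ N(mu0, K0). chol() returns the upper factor R with t(R) %*% R = K0,
# so t(R) %*% z has covariance K0 when z is standard normal.
mu <- as.vector(mu0 + t(chol(K0)) %*% rnorm(d))

# Sigma ~ Inv-Wishart(nu0, Psi0): if W ~ Wishart(nu0, solve(Psi0)),
# then solve(W) ~ Inv-Wishart(nu0, Psi0).
W     <- rWishart(1, df = nu0, Sigma = solve(Psi0))[, , 1]
Sigma <- solve(W)
```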
Next, nprecincts vectors are drawn from N(μ, Σ). Each of these draws undergoes an inverse-stacked multidimensional logistic transformation to produce a set of nrowcat probability vectors (each of which sums to one) for nrowcat multinomial distributions, one for each row in that contingency table. Next, the nrowcat multinomial values, which represent the true (and in real life, unobserved) internal cell counts, are drawn from the relevant row counts and these probability vectors. The column totals are calculated via summation.
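The transformation and the multinomial draws for a single contingency table might look like the sketch below. Two assumptions are made and should be checked against Greiner & Quinn (2009): the draw is stacked row by row (consistent with the aggregation bias example, where entries 3 and 4 belong to the second row), and the last column serves as the reference category. The function name inv_stacked_logit and the row counts are hypothetical.

```r
set.seed(3)

nrowcat <- 3
ncolcat <- 3

# Inverse stacked logistic transform.
# Assumptions: omega is stacked row by row; the last column is the reference.
inv_stacked_logit <- function(omega, nrowcat, ncolcat) {
  om <- matrix(omega, nrow = nrowcat, ncol = ncolcat - 1, byrow = TRUE)
  e  <- cbind(exp(om), 1)
  e / rowSums(e)            # each row is a probability vector
}

omega      <- c(-.6, -2.05, -1.7, -.2, -1.45, -1.45)  # one draw; here simply mu0
theta      <- inv_stacked_logit(omega, nrowcat, ncolcat)
row_counts <- c(350, 450, 200)                         # hypothetical row totals

# One multinomial draw per row gives the (unobserved) interior cell counts.
interior <- t(sapply(seq_len(nrowcat), function(r)
  rmultinom(1, size = row_counts[r], prob = theta[r, ])[, 1]))

col_totals <- colSums(interior)   # observed column totals
```

Only row_counts and col_totals would be visible in ecological data; interior plays the role of one element of interior.tables.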
If nprecincts.ep is greater than 0, three simulated surveys (exit polls) are drawn. All three select contingency tables (precincts) using weights that are a function of the composition of the row totals. Specifically, the row fractions are raised to a power q and then summed (when q = 2, this calculation is known in antitrust law as a Herfindahl index). For one of the three surveys, denoted EPNoInv, these quasi-Herfindahl indices are the weights, with q given by HerfNoInvexp. For the other two, denoted EPInv and EPReas, the sample weights are the reciprocals of these quasi-Herfindahl indices, with q given by HerfInvexp and HerfReasexp respectively. The non-inverted weights (EPNoInv) favor contingency tables (precincts) in which one row dominates over contingency tables in which the row fractions are nearly equal; in voting parlance, precincts in which one racial group dominates are more likely to be sampled than racially mixed precincts. The reciprocal weights (EPInv and EPReas) favor contingency tables in which the row fractions are similar; in voting parlance, racially mixed precincts are more likely to be sampled.
For example, suppose nrowcat = 3, HerfInvexp = 3.5, HerfReasexp = 2, and HerfNoInvexp = 3.5. Consider contingency table P1 with row counts (300, 300, 300) and contingency table P2 with row counts (950, 25, 25). Then:
Row fractions: The corresponding row fractions are (300/900, 300/900, 300/900) = (.33, .33, .33) and (950/1000, 25/1000, 25/1000) = (.95, .025, .025).
EPInv weights: EPInv would assign P1 and P2 weights as follows: 1/sum(.33^3.5, .33^3.5, .33^3.5) = 16.1 and 1/sum(.95^3.5, .025^3.5, .025^3.5) = 1.2.
EPReas weights: EPReas would assign weights as follows: 1/sum(.33^2, .33^2, .33^2) = 3.1 and 1/sum(.95^2, .025^2, .025^2) = 1.1.
EPNoInv weights: EPNoInv would assign weights as follows: sum(.33^3.5, .33^3.5, .33^3.5) = .062 and sum(.95^3.5, .025^3.5, .025^3.5) = .84.
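The weights in this example can be reproduced with a few lines of R. The helper herf_weight is a hypothetical name introduced for illustration, and the final sample() call only assumes that tables are drawn with probability proportional to the weights.

```r
# Quasi-Herfindahl weight for one table: sum of row fractions raised to q,
# optionally inverted (hypothetical helper, not part of RxCEcolInf).
herf_weight <- function(row_counts, q, invert = FALSE) {
  h <- sum((row_counts / sum(row_counts))^q)
  if (invert) 1 / h else h
}

P1 <- c(300, 300, 300)
P2 <- c(950, 25, 25)

w_inv   <- c(P1 = herf_weight(P1, 3.5, invert = TRUE),   # HerfInvexp = 3.5
             P2 = herf_weight(P2, 3.5, invert = TRUE))
w_reas  <- c(P1 = herf_weight(P1, 2, invert = TRUE),     # HerfReasexp = 2
             P2 = herf_weight(P2, 2, invert = TRUE))
w_noinv <- c(P1 = herf_weight(P1, 3.5),                  # HerfNoInvexp = 3.5
             P2 = herf_weight(P2, 3.5))

# Assumption: a table is selected with probability proportional to its weight.
set.seed(4)
picked <- sample(c("P1", "P2"), size = 1, prob = w_inv)
```

With exact thirds rather than the rounded .33 used in the text, the P1 weights come out slightly lower: 3^2.5 (about 15.6) for EPInv and exactly 3 for EPReas. The P2 weights agree with the text after rounding.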
For each of the three simulated surveys (EPInv, EPReas, and EPNoInv), gendata.ep returns a list of length three. The first element of the list, returnmat.ep, is a matrix of dimension nprecincts by (nrowcat * ncolcat) suitable for passing to TuneWithExitPoll and AnalyzeWithExitPoll. Its rows line up with those of GQdata: the first row of returnmat.ep and the first row of GQdata contain information from the same contingency table, the second rows likewise, and so on. Each row of returnmat.ep holds the counts from the sample of that contingency table in vectorized row-major format, as required by TuneWithExitPoll and AnalyzeWithExitPoll.
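To make "vectorized row-major format" concrete, the sketch below flattens one sampled 3 x 3 table into a single row of nrowcat * ncolcat counts. The counts and the element names such as bla_Dem are invented for illustration; the column names actually used by returnmat.ep are not stated on this page.

```r
# A hypothetical sampled table: rows are row categories, columns are column categories.
tab <- matrix(c(12,  3, 5,
                 4, 20, 6,
                 2,  1, 7),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("bla", "whi", "his"), c("Dem", "Rep", "Abs")))

# Row-major vectorization: transpose first, because as.vector() reads column by column.
vec <- as.vector(t(tab))

# Illustrative names only, ordered the same way as the counts.
names(vec) <- as.vector(t(outer(rownames(tab), colnames(tab), paste, sep = "_")))
```

The first ncolcat entries of vec are therefore the first row of the table, the next ncolcat entries the second row, and so on.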
If nrowcat = ncolcat = 3, then the user may set his.agg.bias.vec to be nonzero. This will introduce aggregation bias into the data by making the probability vector of the second row of each contingency table a function of the fractional composition of the third row. In voting parlance, if the rows are black, white, and Hispanic, the white voting behavior will be a function of the percent Hispanic in each precinct. For example, if his.agg.bias.vec = c(1.7, -3), and if the fraction Hispanic in precinct i is X_{h_i}, then in the ith precinct μ_i[3] is set to mu0[3] + X_{h_i} * 1.7, while μ_i[4] is set to mu0[4] + X_{h_i} * (-3). This feature allows testing of the ecological inference model with aggregation bias.
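The shift described in this example can be written out directly. The sketch follows the formula exactly as stated here (offsets added to mu0[3] and mu0[4]); the Hispanic fractions in X_h are hypothetical values for three precincts.

```r
mu0 <- c(-.6, -2.05, -1.7, -.2, -1.45, -1.45)  # default from Usage
his.agg.bias.vec <- c(1.7, -3)
X_h <- c(0.10, 0.25, 0.60)   # hypothetical Hispanic fractions, one per precinct

# One mean vector per precinct; only entries 3 and 4 (the second row of the
# table, i.e. white voters) depend on the Hispanic fraction.
mu_i <- matrix(mu0, nrow = length(X_h), ncol = length(mu0), byrow = TRUE)
mu_i[, 3] <- mu0[3] + X_h * his.agg.bias.vec[1]
mu_i[, 4] <- mu0[4] + X_h * his.agg.bias.vec[2]
```

With these values, a precinct that is 60% Hispanic has its third entry raised by 1.02 and its fourth lowered by 1.8 relative to mu0, so white voting behavior varies systematically with precinct composition.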
----------Value----------
A list with the following elements.
Element: GQdata
Matrix of dimension nprecincts by (nrowcat + ncolcat): The simulated (observed) ecological data, meaning the row and column totals in the contingency tables. May be passed as the data argument in Tune, Analyze, TuneWithExitPoll, and AnalyzeWithExitPoll.
Element: EPInv
List of length 3: returnmat.ep, the first element in the list, is a matrix that may be passed as the exitpoll argument in TuneWithExitPoll and AnalyzeWithExitPoll. See Details. ObsData is a dataframe that may be used as the data argument in the survey package. sampprecincts.ep is a vector detailing the row numbers of GQdata (meaning the contingency tables) that were included in the EPInv survey (exit poll). See Details for an explanation of the weights used to select the contingency tables for inclusion in the EPInv survey (exit poll).
Element: EPNoInv
List of length 3: Contains the same elements as EPInv. See Details for an explanation of weights used to select the contingency tables for inclusion in the EPNoInv survey (exit poll).
Element: EPReas
List of length 3: Contains the same elements as EPInv. See Details for an explanation of weights used to select the contingency tables for inclusion in the EPReas survey (exit poll).
Element: omega.matrix
Matrix of dimension nprecincts by (nrowcat * (ncolcat-1)): The matrix of draws from the multivariate normal distribution at the second level of the hierarchical model giving rise to GQdata. These values undergo an inverse-stacked multidimensional logistic transformation to produce contingency table row probability vectors.
Element: interior.tables
List of length nprecincts: Each element of the list is a full (meaning all interior cells are filled in) contingency table.
Element: mu
vector of length nrowcat * (ncolcat-1): the μ vector drawn at the top level of the hierarchical model giving rise to GQdata. See Details.
Element: Sigma
square matrix of dimension nrowcat * (ncolcat-1): the covariance matrix drawn at the top level of the hierarchical model giving rise to GQdata. See Details.
Element: Sigma.diag
the output of diag(Sigma).
Element: Sigma.corr
the output of cov2cor(Sigma).
Element: sim.check.vec
vector: the true values of the parameters for which Analyze and AnalyzeWithExitPoll produce posterior draws, in the same order in which those two functions output them. This vector is useful in assessing the coverage of intervals computed from the posterior draws of Analyze and AnalyzeWithExitPoll.
----------Author(s)----------
D. James Greiner & Kevin M. Quinn
----------References----------
Inference: Bounds, Correlations, Flexibility, and Transparency of
----------Examples----------
## Not run:
library(RxCEcolInf)
library(coda)  # provides mcmc.list()

SimData <- gendata.ep()  # simulated data
FormulaString <- "Dem, Rep, Abs ~ bla, whi, his"

# Tuning run: its rhos are passed as rho.vec to the chains below
EPInvTune <- TuneWithExitPoll(fstring = FormulaString,
                              data = SimData$GQdata,
                              exitpoll = SimData$EPInv$returnmat.ep,
                              num.iters = 10000,
                              num.runs = 15)

# Three chains run with identical settings
EPInvChain1 <- AnalyzeWithExitPoll(fstring = FormulaString,
                                   data = SimData$GQdata,
                                   exitpoll = SimData$EPInv$returnmat.ep,
                                   num.iters = 2000000,
                                   burnin = 200000,
                                   save.every = 2000,
                                   rho.vec = EPInvTune$rhos,
                                   print.every = 20000,
                                   debug = 1,
                                   keepTHETAS = 0,
                                   keepNNinternals = 0)
EPInvChain2 <- AnalyzeWithExitPoll(fstring = FormulaString,
                                   data = SimData$GQdata,
                                   exitpoll = SimData$EPInv$returnmat.ep,
                                   num.iters = 2000000,
                                   burnin = 200000,
                                   save.every = 2000,
                                   rho.vec = EPInvTune$rhos,
                                   print.every = 20000,
                                   debug = 1,
                                   keepTHETAS = 0,
                                   keepNNinternals = 0)
EPInvChain3 <- AnalyzeWithExitPoll(fstring = FormulaString,
                                   data = SimData$GQdata,
                                   exitpoll = SimData$EPInv$returnmat.ep,
                                   num.iters = 2000000,
                                   burnin = 200000,
                                   save.every = 2000,
                                   rho.vec = EPInvTune$rhos,
                                   print.every = 20000,
                                   debug = 1,
                                   keepTHETAS = 0,
                                   keepNNinternals = 0)

# Combine the three chains into a single mcmc.list
EPInv <- mcmc.list(EPInvChain1, EPInvChain2, EPInvChain3)
## End(Not run)