AnalyzeWithExitPoll(RxCEcolInf)
AnalyzeWithExitPoll() belongs to the R package RxCEcolInf
Workhorse Function for Ecological Inference for Sets of R x C
Translator: 生物统计家园网, robot LoveR
----------Description----------
This function (using the tuned parameters from TuneWithExitPoll) fits a hierarchical model to data from two sources: (i) ecological data in which the underlying contingency tables can have any number of rows or columns, and (ii) data from a survey sample of some of the contingency tables. The user supplies the data and may specify hyperprior values. Samples from the posterior distribution are returned as an mcmc object, which can be analyzed with functions in the coda package.
----------Usage----------
AnalyzeWithExitPoll(fstring, rho.vec, exitpoll, data = NULL,
    num.iters = 1e+06, save.every = 1000, burnin = 10000,
    mu.vec.0 = rep(log((0.45/(mu.dim - 1))/0.55), mu.dim),
    kappa = 10, nu = (mu.dim + 6), psi = mu.dim,
    mu.vec.cu = runif(mu.dim, -3, 0), NNs.start = NULL,
    MMs.start = NULL, THETAS.start = NULL, sr.probs = NULL,
    sr.reps = NULL, keep.restart.info = FALSE,
    keepNNinternals = 0, keepTHETAS = 0, nolocalmode = 50,
    numscans = 1, Diri = 100, dof = 4, eschew = FALSE,
    print.every = 10000, debug = 1)
----------Arguments----------
Argument: fstring
String: model formula of contingency tables' column totals versus row totals. Must be in the specified format (an R character string and NOT a true R formula). See Details and Examples.
Argument: rho.vec
Vector of dimension I = number of contingency tables = number of rows in data: multipliers (usually in (0,1)) applied to the covariance matrix of the proposal distribution for the draws of the intermediate level parameters. Typically set to the vector output from TuneWithExitPoll.
Argument: exitpoll
Matrix of dimensions I (= number of contingency tables = number of rows in data) by R * C (= number of cells in each contingency table): The results of a survey sample of some of the contingency tables. Must be in the specified format. See Details.
Argument: data
Data frame.
Argument: num.iters
Positive integer: The number of MCMC iterations for the sampler.
Argument: save.every
Positive integer: The interval at which the draws will be saved. num.iters must be divisible by this value. Akin to thin in some packages. For example, num.iters = 1000 and save.every = 10 saves every 10th draw, for a total of 100 saved draws.
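The divisibility rule and the resulting number of saved draws can be checked before launching a long run. The following is a minimal sketch in base R; `saved_draws` is a hypothetical helper written for illustration and is not part of RxCEcolInf.

```r
# Hypothetical helper (not part of RxCEcolInf): number of saved draws
# implied by num.iters and save.every, following the rule stated above.
saved_draws <- function(num.iters, save.every) {
  stopifnot(num.iters > 0, save.every > 0)
  if (num.iters %% save.every != 0) {
    stop("num.iters must be divisible by save.every")
  }
  num.iters / save.every
}

saved_draws(1000, 10)     # the example from the save.every description
saved_draws(2e6, 2000)    # the settings used in the Examples section
```

Running such a check first avoids discovering an invalid combination only after the sampler has been started.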
Argument: burnin
Positive integer: The number of burn-in iterations for the sampler.
Argument: mu.vec.0
Vector: mean of the (normal) hyperprior distribution for the mu parameter.
Argument: kappa
Scalar: The diagonal of the covariance matrix for the (normal) hyperprior distribution for the mu parameter.
Argument: nu
Scalar: The degrees of freedom for the (Inverse-Wishart) hyperprior distribution for the SIGMA parameter.
Argument: psi
Scalar: The diagonal of the matrix parameter of the (Inverse-Wishart) hyperprior distribution for the SIGMA parameter.
Argument: mu.vec.cu
Vector of dimension R*(C-1), where R (C) is the number of rows (columns) in each contingency table: Optional starting values for the mu parameter.
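To see what the default hyperprior values in the Usage section evaluate to, they can be computed by hand for a small table. This sketch assumes that mu.dim equals R*(C-1), which is inferred from the documented length of mu.vec.cu rather than stated explicitly in this page.

```r
# Assumption: mu.dim = R * (C - 1), matching the documented length of mu.vec.cu.
R <- 2   # rows per contingency table (e.g. bla, whi)
C <- 3   # columns per contingency table (e.g. Dem, Rep, Abs)
mu.dim <- R * (C - 1)

# Defaults copied from the Usage section
mu.vec.0 <- rep(log((0.45 / (mu.dim - 1)) / 0.55), mu.dim)  # hyperprior mean
kappa    <- 10            # diagonal of the hyperprior covariance for mu
nu       <- mu.dim + 6    # Inverse-Wishart degrees of freedom
psi      <- mu.dim        # diagonal of the Inverse-Wishart matrix parameter

set.seed(1)
mu.vec.cu <- runif(mu.dim, -3, 0)   # random starting values for mu

print(list(mu.dim = mu.dim, mu.vec.0 = mu.vec.0, nu = nu, psi = psi))
```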
Argument: NNs.start
Matrix of dimension I x (R*C), where I is the number of contingency tables = number of rows in data: Optional starting values for the internal cell counts, which must total to the contingency table row and column totals contained in data. Use of the default (randomly generated internally) recommended.
Argument: MMs.start
Matrix of dimension I x (R*C), where I is the number of contingency tables = number of rows in data: Optional starting values for the missing internal cell counts, which must total to the contingency table row and column totals contained in data. By missing internal cell counts we mean the counts in the contingency tables not observed in the survey sample or exit poll. Use of the default (randomly generated internally) recommended.
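If custom starting counts are supplied despite the recommendation, the totals constraint can be verified row by row. The sketch below uses a hypothetical helper, `check_cell_counts`, and assumes the cells are stored in vectorized row-major order (the format documented for exitpoll; that NNs.start uses the same ordering is an assumption).

```r
# Hypothetical helper (not part of RxCEcolInf): checks that one row of a
# candidate start matrix reproduces the row and column totals of an
# R x C table. Assumes vectorized row-major cell order.
check_cell_counts <- function(cells, row.totals, col.totals) {
  R <- length(row.totals)
  C <- length(col.totals)
  stopifnot(length(cells) == R * C)
  tab <- matrix(cells, nrow = R, ncol = C, byrow = TRUE)
  all(tab >= 0) &&
    all(rowSums(tab) == row.totals) &&
    all(colSums(tab) == col.totals)
}

# A 2 x 3 table with row totals (50, 70) and column totals (50, 45, 25)
check_cell_counts(c(30, 5, 15, 20, 40, 10), c(50, 70), c(50, 45, 25))
```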
Argument: THETAS.start
Matrix of dimension I x (R*C), where I is the number of contingency tables = number of rows in data: Optional starting values for the contingency table row probability vectors. The elements in each row of THETAS.start must meet R sum-to-one constraints. Use of the default (randomly generated internally) recommended.
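The R sum-to-one constraints mean that each row of THETAS.start holds R probability vectors of length C. A minimal sketch of building and checking such a matrix in base R follows; it assumes a row-major layout in which each contingency table row's C probabilities are stored consecutively, and `make_theta_row` is a hypothetical helper.

```r
set.seed(42)
I <- 5; R <- 2; C <- 3

# Hypothetical helper: one row of THETAS.start, built by normalizing
# gamma variates within each block of C cells.
make_theta_row <- function(R, C) {
  g <- matrix(rgamma(R * C, shape = 1), nrow = R, ncol = C)
  as.vector(t(g / rowSums(g)))   # R consecutive blocks of length C
}

THETAS.start <- t(replicate(I, make_theta_row(R, C)))   # I x (R*C)

# Each row should contain R blocks that each sum to one
block_sums <- t(apply(THETAS.start, 1, function(th) {
  rowSums(matrix(th, nrow = R, byrow = TRUE))
}))
all(abs(block_sums - 1) < 1e-12)
```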
Argument: sr.probs
Matrix of dimension I x R: Each value represents the probability of selecting a particular contingency table's row as the row to be calculated deterministically in (product multinomial) proposals for Metropolis draws of the internal cell counts. For example, if R = 3 and row 3 of sr.probs = c(.1, .5, .4), then in the third contingency table (corresponding to the third row of data), the proposal algorithm for the interior cell counts will calculate the third contingency table's first row deterministically with probability .1, the second row with probability .5, and the third row with probability .4. Use of default (generated internally) recommended.
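The role of one row of sr.probs can be illustrated by simulating the row selection it describes. This is only a sketch of the selection step using base R's `sample()`; it is not the package's internal code, and the variable names are hypothetical.

```r
set.seed(7)
# One row of a hypothetical sr.probs matrix for a table with R = 3 rows
sr.probs.row <- c(0.1, 0.5, 0.4)

# Simulate which table row is chosen to be computed deterministically
picks <- sample(seq_along(sr.probs.row), size = 10000,
                replace = TRUE, prob = sr.probs.row)

# Empirical selection frequencies should be close to sr.probs.row
round(table(picks) / length(picks), 2)
```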
Argument: sr.reps
Matrix of dimension I x R: Each value represents the number of times the (product multinomial proposal) Metropolis algorithm will be attempted when, in drawing the internal cell counts, the proposal for the corresponding contingency table row is to be calculated deterministically. sr.reps has the same structure as sr.probs, i.e., position [3,1] of sr.reps corresponds to the third contingency table's first row. Use of default (generated internally) recommended.
Argument: keep.restart.info
Logical: Whether the last state of the chain should be saved to allow restart in the same state. Restart function not currently implemented.
Argument: keepNNinternals
Positive integer: The number of draws of the internal cell counts in the contingency tables to be outputted. Must divide evenly into num.iters. Use with caution: results in large RAM use even in modest-sized datasets.
Argument: keepTHETAS
Positive integer: The number of draws of the contingency table row probability vectors in the contingency tables to be outputted. Must divide evenly into num.iters. Use with caution: results in large RAM use even in modest-sized datasets.
Argument: nolocalmode
Positive integer: How often an alternative drawing method for the contingency table internal cell counts will be used. Use of default value recommended.
Argument: numscans
Positive integer: How often the algorithm to draw the contingency table internal cell counts will be implemented before new values of the other parameters are drawn. Use of default value recommended.
Argument: Diri
Positive integer: How often a product Dirichlet proposal distribution will be used to draw the contingency table row probability vectors (the THETAS).
Argument: dof
Positive integer: The degrees of freedom of the multivariate t proposal distribution used in drawing the contingency table row probability vectors (the THETAS).
Argument: eschew
Logical: If true, calculation of certain functions of the internal cell counts omits the two right-most columns instead of only the single right-most column. Not yet implemented.
Argument: print.every
Positive integer: If debug == 1, the iteration number will be written to the screen every print.every iterations. Must divide evenly into num.iters.
Argument: debug
Integer: Akin to verbose in some packages. If set to 1, certain status information (including rough notification regarding the number of iterations completed) will be written to the screen.
----------Details----------
AnalyzeWithExitPoll is the workhorse function in fitting the R x C ecological inference model described in Greiner & Quinn (2009) with the addition of information from a survey sample from some of the contingency tables. Details and terminology of the basic (i.e., without a survey sample) data structure and ecological inference model are discussed in the documentation accompanying the Analyze function.
In the present implementation, AnalyzeWithExitPoll presumes that the survey consisted of a simple random sample from the in-sample contingency tables. Future implementations will allow incorporation of more complicated survey sampling schemes.
The arguments to AnalyzeWithExitPoll are essentially identical to those of Analyze, with the major exception of exitpoll. exitpoll feeds the results of the survey sample to the function, and a particular format is required. Specifically, exitpoll must have the same number of rows as data, meaning one row for each contingency table in the dataset. It must have R * C columns, meaning one column for each cell in one of the ecological data's contingency tables. The first row of exitpoll must correspond to the first row of data, meaning that the two rows must contain information from the same contingency table. The second row of exitpoll must contain information from the contingency table represented in the second row of data. And so on. Finally, exitpoll must have counts from the sample of the contingency table in vectorized row-major format.
To illustrate with a voting example: Suppose the contingency tables have two rows, labeled bla and whi, and three columns, denoted Dem, Rep, and Abs. In other words, the fstring argument would be "Dem, Rep, Abs ~ bla, whi". Suppose there are 100 contingency tables. The data will be of dimension 100 x 5, with each row consisting of the row and column totals from that particular contingency table. exitpoll will be of dimension 100 x 6. Row 11 of exitpoll will consist of the following: in position 1, the number of blacks voting Democrat observed in the sample of contingency table 11; in position 2, the number of blacks voting Republican observed in the sample of contingency table 11; in position 3, the number of blacks abstaining from voting observed in the sample of contingency table 11; in position 4, the number of whites voting Democrat observed in the sample of contingency table 11; etc.
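The row-major layout in the voting example can be reproduced with `as.vector(t(...))` in base R. In this sketch the sample counts and the column labels on exitpoll are hypothetical and serve only to make the cell order visible.

```r
# Hypothetical sample counts for contingency table 11 (rows bla, whi)
tab11 <- matrix(c(12, 1, 3,
                   4, 9, 6),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("bla", "whi"), c("Dem", "Rep", "Abs")))

# 100 contingency tables, R * C = 6 cells each
exitpoll <- matrix(0, nrow = 100, ncol = 6)

# Illustrative labels only: bla.Dem, bla.Rep, bla.Abs, whi.Dem, ...
colnames(exitpoll) <- as.vector(t(outer(rownames(tab11), colnames(tab11),
                                        paste, sep = ".")))

# t() followed by as.vector() yields vectorized row-major order
exitpoll[11, ] <- as.vector(t(tab11))
exitpoll[11, ]
```

Position 1 of row 11 then holds the bla/Dem count and position 4 the whi/Dem count, matching the description above.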
For tables in which no sample was taken (i.e., out-of-sample tables), the corresponding row of exitpoll should be a vector of 0s.
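A quick way to set up and inspect the zero rows is sketched below in base R. The simulated counts and the choice of 20 surveyed tables are hypothetical. Note that a surveyed table whose sample happened to contain all zeros would look identical to an out-of-sample table under this check.

```r
set.seed(3)
I <- 100; R <- 2; C <- 3

# Start from all zeros, so out-of-sample tables are already in the right form
exitpoll <- matrix(0, nrow = I, ncol = R * C)

# Hypothetical: 20 of the 100 tables were surveyed
sampled <- sort(sample(I, 20))
exitpoll[sampled, ] <- rpois(length(sampled) * R * C, lambda = 5) + 1

# Rows with any positive count are the in-sample tables
in.sample <- rowSums(exitpoll) > 0
sum(in.sample)
```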
Model fitting proceeds as in Analyze, and the output is similar. See the documentation accompanying Analyze for further information.
----------Value----------
An object of class mcmc suitable for use in functions in the coda package. Additional items, listed below, may be retrieved from this object, as detailed in the examples section.
Component: dim
Vector (integers) of length 2: number of saved simulations and number of automatically outputted parameters.
Component: dimnames
List: the first element is NULL (currently not used), and the second element is a vector of the names of the automatically outputted parameters.
Component: acc.t
Vector of length I = number of contingency tables: The fraction of multivariate t proposals accepted in the Metropolis algorithm used to draw the THETAs (meaning the intermediate parameters in the hierarchy).
Component: acc.Diri
Vector of length I = number of contingency tables: The fraction of Dirichlet-based proposals accepted in the Metropolis algorithm used to draw the THETAs (meaning the intermediate parameters in the hierarchy).
Component: vld.multinom
Matrix: To draw from the conditional posterior of the internal cell counts of a contingency table, the Analyze function draws R-1 vectors of length C from multinomial distributions. It then calculates the counts in the additional row (denote this row as r') deterministically. This procedure can result in negative values in row r', in which case the overall proposal for the interior cell counts is outside the parameter space (and thus invalid). vld.multinom keeps track of the percentage of proposals drawn in this manner that are valid (i.e., not invalid). Each row of vld.multinom corresponds to a contingency table. Each column in vld.multinom corresponds to a row in a contingency table. Each entry specifies the percentage of multinomial proposals that are valid when the specified contingency table row serves as the r' row. For instance, position [5,2] of vld.multinom is the fraction of valid proposals for the 5th contingency table when the second contingency table row is the r' row. A value of "NaN" means that Analyze chose to use a different (slower) method of drawing the internal cell counts because it suspected that the multinomial method would behave badly.
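The way a deterministically computed row can go negative is easy to reproduce outside the package. The following base R sketch draws R-1 rows from multinomials, fills in the remaining row from the column totals, and estimates the fraction of valid proposals. The totals, probabilities, and the `propose` helper are hypothetical, and this is an illustration of the idea rather than the package's internal algorithm.

```r
set.seed(11)
row.totals <- c(40, 60, 50)   # R = 3 rows
col.totals <- c(70, 50, 30)   # C = 3 columns, same grand total (150)
theta <- matrix(c(0.6, 0.3, 0.1,
                  0.4, 0.4, 0.2,
                  0.3, 0.3, 0.4), nrow = 3, byrow = TRUE)

# Hypothetical helper: one proposal with row r.det computed deterministically
propose <- function(r.det) {
  R <- length(row.totals)
  tab <- matrix(0, nrow = R, ncol = length(col.totals))
  for (r in setdiff(seq_len(R), r.det)) {
    tab[r, ] <- rmultinom(1, size = row.totals[r], prob = theta[r, ])
  }
  tab[r.det, ] <- col.totals - colSums(tab)   # may contain negative values
  tab
}

# Fraction of proposals that stay inside the parameter space
valid <- replicate(2000, all(propose(r.det = 2) >= 0))
mean(valid)
```

By construction the deterministic row always matches its row total; only non-negativity can fail, which is what an entry of vld.multinom summarizes.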
Component: acc.multinom
Matrix: Same as vld.multinom, except the entries represent the fraction of proposals accepted (instead of the fraction that are in the permissible parameter space).
Component: numrows.pt
Integer: Number of rows in each contingency table.
Component: numcols.pt
Integer: Number of columns in each contingency table.
Component: THETA
mcmc: Draws of the THETAs. See Details and Examples.
Component: NN.internals
mcmc: Draws of the internal cell counts. See Details and Examples.
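Because dim and dimnames appear in the list above, the additional items are presumably stored as attributes of the returned object. That is an assumption; this page does not show the retrieval syntax. Under that assumption, retrieval would look like the sketch below, which uses a stand-in matrix with hypothetical values instead of a real fitted chain.

```r
set.seed(9)
# Stand-in for a fitted chain: a plain matrix carrying extra attributes.
# Assumption: the real mcmc object stores the listed items the same way.
chain <- structure(
  matrix(rnorm(20), nrow = 5,
         dimnames = list(NULL, paste0("par", 1:4))),  # hypothetical names
  acc.t      = c(0.31, 0.28, 0.35),                   # hypothetical values
  numrows.pt = 2L,
  numcols.pt = 3L
)

dim(chain)                 # saved simulations x outputted parameters
dimnames(chain)[[2]]       # parameter names
attr(chain, "acc.t")       # per-table acceptance fractions
attr(chain, "numrows.pt")  # rows per contingency table
```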
----------Warnings----------
Computer time: At present, using this function (and the others in this package) requires substantial computer time. The lack of information in ecological data results in slow mixing chains, and the number of parameters that must be drawn in each Gibbs sampler iteration is large. Chain length should be adjusted to achieve adequate convergence. In general, the more segregated the housing patterns in the jurisdiction (meaning the greater the percentage of contingency tables in which one row's counts make up a large portion of that table's total), the smaller the number of iterations needed. We are exploring more efficient sampling algorithms that we anticipate will result in better mixing and faster drawing. At present, however, users should anticipate that analysis of a dataset will take several hours.
Large datasets: At present, use of this function (and thus this package) is not recommended for large (i.e., more than 1000 contingency tables) datasets. See immediately above.
RAM requirements: Do not select large values of keepNNinternals or keepTHETAS without adequate RAM.
Gelman-Rubin diagnostic in the coda package: Using the Gelman-Rubin convergence diagnostic as presently implemented in the coda package (called by gelman.diag()) on multiple chains produced by Analyze will cause an error. The reason is that some of the NN.internals and functions of them (Λ's, TURNOUTs, Γ's, and β's) are linearly dependent, and the current coda implementation of gelman.diag()
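What linear dependence among such quantities looks like can be shown with simulated counts. In the base R sketch below, the cells of one table row always sum to the row total, so the centered draws have deficient rank. The data are hypothetical, and the sketch only illustrates the dependence itself; it makes no claim about how coda handles it.

```r
set.seed(5)
# Hypothetical draws of the three cell counts in one table row,
# constrained to sum to a row total of 50 in every draw
n <- 200
cells <- t(rmultinom(n, size = 50, prob = c(0.5, 0.3, 0.2)))
colnames(cells) <- c("Dem", "Rep", "Abs")

# After centering, the columns sum to zero in every draw,
# so one column is a linear combination of the others
centered <- scale(cells, center = TRUE, scale = FALSE)
qr(centered)$rank   # rank of the centered draws
ncol(cells)         # number of columns
```

A rank smaller than the number of columns signals that the sample covariance matrix of these draws is singular.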
----------Author(s)----------
D. James Greiner, Paul D. Baines, & Kevin M. Quinn
----------References----------
Greiner, D. James, and Kevin M. Quinn (2009). “R x C Ecological Inference: Bounds, Correlations, Flexibility, and Transparency of Assumptions.” J.R. Statist. Soc. A 172:67-81.
Output Analysis and Diagnostics for MCMC (CODA). http://www-fis.iarc.fr/coda/
----------Examples----------
## Not run:
SimData <- gendata.ep()  # simulated data
FormulaString <- "Dem, Rep, Abs ~ bla, whi, his"

## Tune the proposal multipliers (rho.vec) before the main runs
EPInvTune <- TuneWithExitPoll(fstring = FormulaString,
                              data = SimData$GQdata,
                              exitpoll = SimData$EPInv$returnmat.ep,
                              num.iters = 10000,
                              num.runs = 15)

## Run three chains with identical settings
EPInvChain1 <- AnalyzeWithExitPoll(fstring = FormulaString,
                                   data = SimData$GQdata,
                                   exitpoll = SimData$EPInv$returnmat.ep,
                                   num.iters = 2000000,
                                   burnin = 200000,
                                   save.every = 2000,
                                   rho.vec = EPInvTune$rhos,
                                   print.every = 20000,
                                   debug = 1,
                                   keepTHETAS = 0,
                                   keepNNinternals = 0)
EPInvChain2 <- AnalyzeWithExitPoll(fstring = FormulaString,
                                   data = SimData$GQdata,
                                   exitpoll = SimData$EPInv$returnmat.ep,
                                   num.iters = 2000000,
                                   burnin = 200000,
                                   save.every = 2000,
                                   rho.vec = EPInvTune$rhos,
                                   print.every = 20000,
                                   debug = 1,
                                   keepTHETAS = 0,
                                   keepNNinternals = 0)
EPInvChain3 <- AnalyzeWithExitPoll(fstring = FormulaString,
                                   data = SimData$GQdata,
                                   exitpoll = SimData$EPInv$returnmat.ep,
                                   num.iters = 2000000,
                                   burnin = 200000,
                                   save.every = 2000,
                                   rho.vec = EPInvTune$rhos,
                                   print.every = 20000,
                                   debug = 1,
                                   keepTHETAS = 0,
                                   keepNNinternals = 0)

## Combine the chains for analysis with coda
EPInv <- mcmc.list(EPInvChain1, EPInvChain2, EPInvChain3)
## End(Not run)