simContinuous(simPopulation)
simContinuous() belongs to R package: simPopulation
Simulate continuous variables of population data
Translator: 生物统计家园网, robot LoveR
----------Description----------
Simulate continuous variables of population data using either multinomial log-linear models combined with random draws from the resulting categories, or (two-step) regression models combined with random error terms. The household structure of the population data and any other categorical predictors need to be simulated beforehand.
----------Usage----------
simContinuous(dataS, dataP, w = "rb050", strata = "db040",
    basic = c("age", "rb090", "hsize"),
    additional = "netIncome",
    method = c("multinom", "lm"), zeros = TRUE,
    breaks = NULL, lower = NULL, upper = NULL,
    equidist = TRUE, probs = NULL, gpd = TRUE,
    threshold = NULL, est = "moments", limit = NULL,
    censor = NULL, log = TRUE, const = NULL,
    alpha = 0.01, residuals = TRUE, keep = TRUE,
    maxit = 500, MaxNWts = 1500,
    tol = .Machine$double.eps^0.5,
    eps = NULL, seed)
----------Arguments----------
Argument: dataS
a data.frame containing household survey data.
Argument: dataP
a data.frame containing the simulated population data. Household structure and any other categorical predictors need to be simulated beforehand.
Argument: w
a character string specifying the column of dataS that contains the (personal) sample weights.
Argument: strata
a character string specifying the columns of dataS and dataP, respectively, that define strata. The regression models are computed for each stratum separately. Note that this is currently a required argument and only one stratification variable is supported.
Argument: basic
a character vector specifying the columns of dataS and dataP, respectively, that define the household structure and any other categorical predictors, such as age, gender and household size.
Argument: additional
a character string specifying the additional continuous variable of dataS that should be simulated for the population data. Currently, only one additional variable can be simulated at a time.
Argument: method
a character string specifying the method to be used for simulating the continuous variable. Accepted values are "multinom", for using multinomial log-linear models combined with random draws from the resulting categories, and "lm", for using (two-step) regression models combined with random error terms.
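The "random draws from the resulting categories" step of method "multinom" can be pictured with a small base-R sketch. It is not the package's code: the break points are made up, and the categories are sampled with fixed illustrative probabilities instead of being predicted by a multinomial model.

```r
# Hypothetical break points; category i covers (breaks[i], breaks[i + 1]].
set.seed(1)
breaks <- c(0, 1000, 5000, 20000)

# Stand-in for the model step: sample a category for each of 10 individuals
# with fixed, made-up probabilities.
cats <- sample(seq_len(length(breaks) - 1), size = 10, replace = TRUE,
               prob = c(0.5, 0.3, 0.2))

# Draw one value per individual, uniformly within the interval of its category.
values <- runif(length(cats), min = breaks[cats], max = breaks[cats + 1])
```

Because runif() is vectorized over min and max, each individual gets a draw from its own interval in a single call.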
Argument: zeros
a logical indicating whether the variable specified by additional is semi-continuous, i.e., contains a considerable number of zeros. If TRUE and method is "multinom", a separate factor level for zeros in the response is used. If TRUE and method is "lm", a two-step model is applied. The first step thereby uses a log-linear or multinomial log-linear model (see “Details”).
Argument: breaks
an optional numeric vector; if multinomial models are computed, this can be used to supply two or more break points for categorizing the variable specified by additional. If NULL, break points are computed using weighted quantiles.
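To illustrate what "weighted quantiles" means for default break points, here is a minimal sketch based on the weighted empirical distribution function. The helper weighted_quantile() and the toy data are hypothetical; the package may use a different quantile definition.

```r
# Hypothetical helper: weighted quantiles via the weighted empirical CDF.
weighted_quantile <- function(x, w, probs) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum <- cumsum(w) / sum(w)   # share of total weight up to each sorted value
  # Smallest value whose cumulative weight share reaches each probability.
  sapply(probs, function(p) x[which(cum >= p)[1]])
}

income  <- c(100, 200, 300, 400, 1000)   # toy values of the continuous variable
weights <- c(1, 1, 1, 1, 4)              # toy sample weights
breaks <- weighted_quantile(income, weights, probs = c(0.25, 0.5, 0.75))
```

The heavily weighted observation pulls the upper break point toward it, which an unweighted quantile would not do.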
Argument: lower, upper
optional numeric values; if multinomial models are computed and breaks is NULL, these can be used to specify lower and upper bounds other than minimum and maximum, respectively. Note that if method is "multinom" and gpd is TRUE (see below), upper defaults to Inf.
Argument: equidist
logical; if method is "multinom" and breaks is NULL, this indicates whether the (positive) default break points should be equidistant or whether there should be refinements in the lower and upper tail (see getBreaks).
Argument: probs
numeric vector with values in [0, 1]; if method is "multinom" and breaks is NULL, this gives probabilities for quantiles to be used as (positive) break points. If supplied, this is preferred over equidist.
Argument: gpd
logical; if method is "multinom", this indicates whether the upper tail of the variable specified by additional should be simulated by random draws from a (truncated) generalized Pareto distribution rather than a uniform distribution.
Argument: threshold
a numeric value; if method is "multinom", values for categories above threshold are drawn from a (truncated) generalized Pareto distribution.
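Draws from a truncated generalized Pareto distribution above a threshold can be sketched with the inverse distribution function. The function rgpd_truncated() and all parameter values below are assumptions for illustration; in the package the parameters are estimated from the sample (see est), which this sketch does not attempt.

```r
# Hypothetical sampler: inverse-CDF draws from a GPD truncated to
# [threshold, upper]. Scale and shape are taken as known; shape must be non-zero.
rgpd_truncated <- function(n, threshold, scale, shape, upper = Inf) {
  # GPD distribution function with location at the threshold.
  pgpd <- function(x) 1 - (1 + shape * (x - threshold) / scale)^(-1 / shape)
  p_upper <- if (is.finite(upper)) pgpd(upper) else 1
  u <- runif(n, min = 0, max = p_upper)   # keep only the mass below `upper`
  threshold + scale / shape * ((1 - u)^(-shape) - 1)
}

set.seed(42)
tail_values <- rgpd_truncated(1000, threshold = 50000, scale = 20000,
                              shape = 0.2, upper = 200000)
range(tail_values)
```

Restricting the uniform draws to [0, F(upper)] is what produces the truncation, so no rejection step is needed.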
Argument: est
a character string; if method is "multinom", the estimator to be used to fit the generalized Pareto distribution (see fitgpd).
Argument: limit
an optional named list of lists; if multinomial models are computed, this can be used to account for structural zeros. The names of the list components specify the predictor variables for which to limit the possible outcomes of the response. For each predictor, a list containing the possible outcomes of the response for each category of the predictor can be supplied. The probabilities of other outcomes conditional on combinations that contain the specified categories of the supplied predictors are set to 0. Currently, this is only implemented for more than two categories in the response.
Argument: censor
an optional named list of lists or data.frames; if multinomial models are computed, this can be used to account for structural zeros. The names of the list components specify the categories that should be censored. For each of these categories, a list or data.frame containing levels of the predictor variables can be supplied. The probability of the specified categories is set to 0 for the respective predictor levels. Currently, this is only implemented for more than two categories in the response.
Argument: log
logical; if method is "lm", this indicates whether the linear model should be fitted to the logarithms of the variable specified by additional. The predicted values are then back-transformed with the exponential function. See “Details” for more information.
Argument: const
numeric; if method is "lm" and log is TRUE, this gives a constant to be added before log transformation.
Argument: alpha
numeric; if method is "lm", this gives trimming parameters for the sample data. Trimming is done with respect to the variable specified by additional. If a numeric vector of length two is supplied, the first element gives the trimming proportion for the lower part and the second element the trimming proportion for the upper part. If a single numeric is supplied, it is used for both. With NULL, trimming is suppressed.
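The trimming rule can be sketched as follows. The helper trim_sample() is hypothetical, and it uses plain unweighted quantiles as a simplifying assumption; the package may well take sample weights into account.

```r
# Hypothetical helper mirroring the description of `alpha`.
trim_sample <- function(y, alpha = 0.01) {
  if (is.null(alpha)) return(rep(TRUE, length(y)))   # NULL: no trimming
  alpha <- rep(alpha, length.out = 2)                # single value: both tails
  bounds <- quantile(y, probs = c(alpha[1], 1 - alpha[2]), names = FALSE)
  y >= bounds[1] & y <= bounds[2]                    # TRUE for observations kept
}

y <- c(1:98, 1000, -500)                        # toy data with two outliers
keep <- trim_sample(y, alpha = c(0.01, 0.02))   # 1% lower, 2% upper trimming
y_trimmed <- y[keep]
```

The logical vector makes it easy to drop the same rows from the predictors before fitting the linear model.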
Argument: residuals
logical; if method is "lm", this indicates whether the random error terms should be obtained by draws from the residuals. If FALSE, they are drawn from a normal distribution (median and MAD of the residuals are used as parameters).
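The two ways of generating random error terms can be compared in a short base-R sketch. The toy regression and the population size n_pop are assumptions made for illustration only.

```r
set.seed(7)
x <- runif(200)
y <- 2 + 3 * x + rnorm(200, sd = 0.5)   # toy sample data
fit <- lm(y ~ x)
res <- residuals(fit)
n_pop <- 1000                           # hypothetical population size

# residuals = TRUE: resample the observed residuals with replacement.
err_resampled <- sample(res, size = n_pop, replace = TRUE)

# residuals = FALSE: normal errors with robust location (median) and scale (MAD).
err_normal <- rnorm(n_pop, mean = median(res), sd = mad(res))
```

Resampling preserves the empirical shape of the residuals, while the normal variant relies on median and MAD so that a few extreme residuals do not inflate the error scale.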
Argument: keep
logical; if multinomial models are computed, this indicates whether the simulated categories should be stored as a variable in the resulting population data. If TRUE, the corresponding column name is given by additional with postfix "Cat".
Argument: maxit, MaxNWts
control parameters to be passed to multinom and nnet. See the help file for nnet.
Argument: tol
if method is "lm" and zeros is TRUE, a small positive numeric value or NULL. When fitting a log-linear model within a stratum, factor levels may not exist in the sample but are likely to exist in the population. However, the coefficient for such factor levels will be 0. Therefore, coefficients smaller than tol in absolute value are replaced by coefficients from an auxiliary model that is fit to the whole sample. If NULL, no auxiliary log-linear model is computed and no coefficients are replaced.
Argument: eps
a small positive numeric value, or NULL (the default). In the former case and if (multinomial) log-linear models are computed, estimated probabilities smaller than this are assumed to result from structural zeros and are set to exactly 0.
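The effect of eps on estimated probabilities can be pictured with a toy probability matrix. The matrix and the value of eps are made up, and rescaling the rows afterwards is an assumption of this sketch rather than something the description above states.

```r
# Hypothetical estimated category probabilities
# (rows: predictor combinations, columns: response categories).
probs <- rbind(c(0.6, 0.4 - 1e-9, 1e-9),
               c(0.2, 0.3, 0.5))
eps <- 1e-6

# Treat tiny probabilities as structural zeros.
probs[probs < eps] <- 0

# Assumption of this sketch: rescale each row so it sums to one again.
probs <- probs / rowSums(probs)
```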
Argument: seed
optional; an integer value to be used as the seed of the random number generator, or an integer vector containing the state of the random number generator to be restored.
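The two forms of seed correspond to two standard base-R mechanisms: an integer passed to set.seed(), and a saved copy of .Random.seed. The sketch below shows both outside the package; how simContinuous handles them internally is not shown here.

```r
set.seed(1234)          # integer seed, as in the first form of `seed`
state <- .Random.seed   # integer vector holding the generator state
first <- runif(3)

# Restoring the saved state reproduces the same draws.
assign(".Random.seed", state, envir = globalenv())
second <- runif(3)
identical(first, second)
```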
----------Details----------
If method is "lm", the behavior for two-step models is described in the following.
If zeros is TRUE and either log is not TRUE or the variable specified by additional does not contain negative values, a log-linear model is used to predict whether an observation is zero or not. Then a linear model is used to predict the non-zero values.
If zeros is TRUE, log is TRUE and const is specified, again a log-linear model is used to predict whether an observation is zero or not. In the linear model to predict the non-zero values, const is added to the variable specified by additional before the logarithms are taken.
If zeros is TRUE, log is TRUE, const is NULL and there are negative values, a multinomial log-linear model is used to predict negative, zero and positive observations. Categories for the negative values are thereby defined by breaks. In the second step, a linear model is used to predict the positive values, and negative values are drawn from uniform distributions in the respective classes.
If zeros is FALSE, log is TRUE and const is NULL, a two-step model is used if there are non-positive values in the variable specified by additional. Whether a log-linear or a multinomial log-linear model is used depends on the number of categories to be used for the non-positive values, as defined by breaks. Again, positive values are then predicted with a linear model and non-positive values are drawn from uniform distributions.
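The simplest two-step case (zeros is TRUE, log is TRUE, no negative values) can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the data are synthetic, a logistic regression via glm() stands in for the log-linear model of the first step, and sample weights and strata are ignored.

```r
set.seed(123)
n <- 500
lev <- c("young", "mid", "old")

# Synthetic sample: one categorical predictor and a semi-continuous income.
sample_data <- data.frame(age = factor(sample(lev, n, replace = TRUE), levels = lev))
p_nonzero <- c(young = 0.5, mid = 0.9, old = 0.7)[as.character(sample_data$age)]
is_pos <- rbinom(n, 1, p_nonzero) == 1
sample_data$income <- ifelse(is_pos, exp(rnorm(n, mean = 9, sd = 0.5)), 0)

# Step 1: model whether an observation is non-zero.
step1 <- glm(I(income > 0) ~ age, family = binomial, data = sample_data)

# Step 2: linear model for the logarithm of the positive values.
step2 <- lm(log(income) ~ age, data = subset(sample_data, income > 0))

# Synthetic population with the same predictor, simulated beforehand.
pop <- data.frame(age = factor(sample(lev, 2000, replace = TRUE), levels = lev))

# Simulate zero vs. non-zero, then predict, add resampled residuals and
# back-transform with the exponential function.
p_hat <- predict(step1, newdata = pop, type = "response")
nonzero <- rbinom(nrow(pop), 1, p_hat) == 1
err <- sample(residuals(step2), nrow(pop), replace = TRUE)
pop$income <- 0
pop$income[nonzero] <- exp(predict(step2, newdata = pop[nonzero, , drop = FALSE]) +
                             err[nonzero])
```

Adding a random error term before back-transforming keeps the simulated incomes from collapsing onto one fitted value per predictor category.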
----------Value----------
A data.frame containing the simulated population data including the continuous variable specified by additional.
----------Note----------
The basic household structure and any other categorical predictors need to be simulated beforehand with the functions simStructure and simCategorical, respectively.
Parts of the function were re-implemented with package version 0.3. The function is now much more memory-efficient and faster if there is a large number of possible combinations in the categorical predictor variables. Nevertheless, results may be different from previous versions of the package.
----------Author(s)----------
Original code by Stefan Kraft, redesign and generalizations by Andreas Alfons.
----------See Also----------
simStructure, simCategorical, simComponents, simEUSILC
----------Examples----------
## Not run:
## these take some time and are not run automatically
## copy & paste to the R command line
library(simPopulation)  # package providing the functions and data used below
set.seed(1234)  # for reproducibility
data(eusilcS)   # load sample data
eusilcP <- simStructure(eusilcS)
eusilcP <- simCategorical(eusilcS, eusilcP)
basic <- c("age", "rb090", "hsize", "pl030", "pb220a")
# multinomial model with random draws
eusilcM <- simContinuous(eusilcS, eusilcP,
    basic = basic, upper = 200000, equidist = FALSE)
summary(eusilcM)
# two-step regression
eusilcT <- simContinuous(eusilcS, eusilcP,
    basic = basic, method = "lm")
summary(eusilcT)
## End(Not run)
Please credit the source when reposting: 生物统计家园网 (http://www.biostatistic.net).