R sampleSelection package: selection() function help documentation

loveR · posted 2012-9-29 21:37:22

selection(sampleSelection)
R package containing selection(): sampleSelection

                                    Heckman-style selection models

                                       Translated by LoveR, the 生物统计家园网 robot

----------Description----------

This is the front end for estimating Heckman-style selection models (also known as generalized tobit models) with either one or two outcomes.  It supports binary outcomes in the single-outcome case.

For the model specification and more details, see Toomet and Henningsen (2008) and the included vignette “Sample Selection Models”.

----------Usage----------

selection(selection, outcome, data = sys.frame(sys.parent()),
subset, method = "ml", start = NULL,
ys = FALSE, xs = FALSE, yo = FALSE, xo = FALSE,
mfs = FALSE, mfo = FALSE, print.level = 0, ...)

heckit( selection, outcome, data = sys.frame(sys.parent()),
method = "2step", ... )

----------Arguments----------

Argument: selection
formula, the selection equation.

Argument: outcome
the outcome equation(s).  Either a single equation (for tobit-2 models), or a list of two equations (for tobit-5 models).

Argument: data
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model.  If not found in data, the variables are taken from environment(formula), typically the environment from which selection is called.

Argument: subset
an optional index vector specifying a subset of observations to be used in the fitting process.

Argument: method
how to estimate the model.  Either "ml" for Maximum Likelihood, "2step" for 2-step estimation, or "model.frame" to return only the model frame.

Argument: start
vector, initial values for the ML estimation.  If start does not have names, names are constructed based on the model frame.

Argument: ys, yo, xs, xo, mfs, mfo
logicals.  If true, the response (y), the model matrix (x) or the model frame (mf) of the selection (s) or outcome (o) equation(s) is returned.

Argument: print.level
integer.  Controls the amount of debugging information printed; higher values give more information.

Argument: ...
additional parameters for the corresponding fitting functions tobit2fit, tobit5fit, heckit2fit, and heckit5fit.

----------Details----------

The endogenous variable of the argument 'selection' must have exactly two levels (e.g. 'FALSE' and 'TRUE', or '0' and '1'). By default the levels are sorted in increasing order ('FALSE' is before 'TRUE', and '0' is before '1').  This also applies to the binary outcome equation.  For continuous-outcome cases, the dependent variable(s) should be numeric.
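The level ordering can be checked before fitting. The sketch below uses only base R and assumes that the package orders levels the same way factor() does; the vectors s_logical, s_numeric and s_text are made up for illustration.

```r
# Hypothetical selection indicators in three common encodings
s_logical <- c(TRUE, FALSE, TRUE)
s_numeric <- c(1, 0, 1)
s_text    <- c("yes", "no", "yes")

# Levels are sorted in increasing order; the second level marks the
# observations whose outcome is observed
levels(factor(s_logical))  # "FALSE" "TRUE"
levels(factor(s_numeric))  # "0" "1"
levels(factor(s_text))     # "no" "yes" (alphabetical order)

# If the default order is not the intended one, set it explicitly
s_recoded <- factor(s_text, levels = c("no", "yes"))
nlevels(s_recoded)         # 2, as the selection equation requires
```

A variable with more than two levels has to be recoded before it can serve as the response of the selection equation.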

For tobit-2 (sample selection) models, the second step estimation (argument 'outcome') includes only those observations for which the endogenous variable of the selection equation equals the second element of its levels (e.g. 'TRUE' or '1').
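To make the two steps concrete, here is a minimal hand-rolled sketch of the textbook Heckman two-step procedure in base R. It is not the package's implementation; the simulated data, the variable names and the true parameter values are all assumptions chosen for illustration.

```r
set.seed(123)
n  <- 2000
x1 <- rnorm(n)
x2 <- rnorm(n)   # exclusion restriction: enters the selection equation only

# Error terms with correlation -0.7, built without extra packages
u_s <- rnorm(n)
u_o <- -0.7 * u_s + sqrt(1 - 0.7^2) * rnorm(n)

s <- (2 * x1 + x2 + u_s - 0.2) > 0   # selection indicator
y <- ifelse(s, 2 + x1 + u_o, NA)     # outcome observed only if s is TRUE
d <- data.frame(y, s, x1, x2)

# Step 1: probit for the selection equation, using all observations
probit <- glm(s ~ x1 + x2, family = binomial(link = "probit"), data = d)
z      <- predict(probit, type = "link")
d$imr  <- dnorm(z) / pnorm(z)        # inverse Mills ratio

# Step 2: OLS for the outcome equation on the selected observations only,
# with the inverse Mills ratio as an additional regressor
step2 <- lm(y ~ x1 + imr, data = d, subset = s)
coef(step2)   # the coefficient on imr estimates rho * sigma
```

The standard errors reported by lm() in step 2 ignore the first-step estimation, which is why heckit() and selection() should be preferred for inference.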

For tobit-5 (switching regression) models, in the second step the first outcome equation (first element of argument 'outcome') is estimated only for those observations for which the endogenous variable of the selection equation equals the first element of its levels (e.g. 'FALSE' or '0'). The second outcome equation is estimated only for those observations for which this variable equals the second element of its levels (e.g. 'TRUE' or '1').

NAs are allowed in the data.  They are ignored if the corresponding outcome is unobserved; otherwise, observations that contain NA (either in the selection or in the outcome equation) are removed.

These selection models assume a known (multivariate normal) distribution of the error terms.  Because of this, instruments (exclusion restrictions) are not strictly necessary.  However, if no instruments are supplied, the results are based solely on the assumption of multivariate normality.  This may or may not be an appropriate assumption for a particular problem.  Note also that standard errors tend to be large without a strong exclusion restriction.

The (generic) function 'coef' ('coef.selection') can be used to extract the estimated coefficients. The (generic) function 'vcov' ('vcov.selection') can be used to extract the estimated variance covariance matrix of the coefficients. The (generic) function 'print' ('print.selection') can be used to print a few results. The (generic) function 'summary' ('summary.selection') can be used to obtain and print detailed results.
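As a brief illustration of these extractor functions, the sketch below fits a tobit-2 model by ML on simulated data and then calls coef, vcov and summary. It assumes the sampleSelection package is installed; the data and variable names are hypothetical.

```r
library(sampleSelection)

set.seed(42)
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
u_s <- rnorm(n)
u_o <- -0.7 * u_s + sqrt(1 - 0.7^2) * rnorm(n)  # errors correlated at -0.7

d <- data.frame(x1 = x1, x2 = x2)
d$s <- (2 * x1 + x2 + u_s - 0.2) > 0
d$y <- ifelse(d$s, 2 + x1 + u_o, NA)   # outcome unobserved when s is FALSE

m <- selection(s ~ x1 + x2, y ~ x1, data = d)   # method = "ml" by default

coef(m)      # estimated coefficients of both equations
vcov(m)      # their variance covariance matrix
summary(m)   # detailed results
```

Because vcov returns the matrix for the same parameter vector that coef returns, sqrt(diag(vcov(m))) gives standard errors aligned with coef(m).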

----------Value----------

'selection' returns an object of class "selection". If the model is estimated by Maximum Likelihood (argument method = "ml"), this object is a list, which has all the components of 'maxLik', and in addition the elements 'twoStep', 'start', 'param', 'termsS', 'termsO', and if requested 'ys', 'xs', 'yo', 'xo', 'mfs', and 'mfo'. If a tobit-2 (sample selection) model is estimated by the two-step method (argument method = "2step"), the returned object is a list with components 'probit', 'coefficients', 'param', 'vcov', 'lm', 'sigma', 'rho', 'invMillsRatio', and 'imrDelta'. If a tobit-5 (switching regression) model is estimated by the two-step method (argument method = "2step"), the returned object is a list with components 'coefficients', 'vcov', 'probit', 'lm1', 'lm2', 'rho1', 'rho2', 'sigma1', 'sigma2', 'termsS', 'termsO', 'param', and if requested 'ys', 'xs', 'yo', 'xo', 'mfs', and 'mfo'.

Component: probit
object of class 'probit' that contains the results of the 1st step (probit estimation) (only for two-step estimations).

Component: twoStep
(only if initial values are not given) results of the 2-step estimation, used as initial values.

Component: start
initial values for the ML estimation.

Component: termsS, termsO
terms for the selection and outcome equations.

Component: ys, xs, yo, xo, mfs, mfo
response, model matrix and model frame of the selection and outcome equations (as a list of two for the latter). NULL if not requested.  The selection response is represented internally as a 0/1 integer vector, with 0 denoting either the unobservable outcome (tobit 2) or the first selection (tobit 5).

Component: coefficients
estimated coefficients of the complete model. The coefficient of the inverse Mills ratio is treated as a parameter (= rho * sigma).

Component: vcov
variance covariance matrix of the estimated coefficients.

Component: param
a list with the following components:
index: a list of numeric vectors giving the positions of the components in the coefficient vector;
oIntercept: logical, whether the outcome equation includes an intercept;
N0, N1: integers, the numbers of observations with unobserved and observed outcomes;
nObs: integer, the number of valid observations;
nParam: integer, the number of parameters in the model (not all are independent);
df: integer, degrees of freedom.  Note that this is not equal to nObs - nParam, because the parameters are not independent in all cases;
levels: the levels of the response of the selection equation. levels[1] corresponds to outcome 1, levels[2] to outcome 2.

Component: lm, lm1, lm2
objects of class 'lm' that contain the results of the 2nd step estimation(s) of the outcome equation(s). Note: the standard errors of this estimation are biased, because they do not account for the estimation of γ in the 1st step (the correct standard errors are returned by summary and are contained in the vcov component).

Component: sigma, sigma1, sigma2
the standard error(s) of the error terms of the outcome equation(s).

Component: rho, rho1, rho2
the estimated correlation coefficient(s) between the error term of the selection equation and the error term(s) of the outcome equation(s).

Component: invMillsRatio
the inverse Mills ratios calculated from the results of the 1st step probit estimation.

Component: imrDelta
the δs calculated from the inverse Mills ratios and the results of the 1st step probit estimation.

----------Note----------

The 2-step estimate of 'rho' may lie outside the [-1,1] interval.  In that case the standard errors of invMillsRatio may be meaningless.
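Why nothing constrains the 2-step 'rho' becomes visible when it is computed by hand. The base R sketch below uses the textbook two-step formulas (delta = imr * (imr + z), sigma^2 = mean squared residual + mean(delta) * beta_imr^2, rho = beta_imr / sigma); that the package computes its 'sigma', 'rho' and 'imrDelta' components in exactly this way is an assumption, and the simulated data are hypothetical.

```r
set.seed(1)
n  <- 2000
x1 <- rnorm(n)
x2 <- rnorm(n)
u_s <- rnorm(n)
u_o <- -0.7 * u_s + sqrt(1 - 0.7^2) * rnorm(n)  # true correlation: -0.7
s <- (2 * x1 + x2 + u_s - 0.2) > 0
y <- ifelse(s, 2 + x1 + u_o, NA)

# Step 1: probit, then inverse Mills ratio and delta for selected observations
probit <- glm(s ~ x1 + x2, family = binomial(link = "probit"))
z      <- predict(probit, type = "link")[s]
imr    <- dnorm(z) / pnorm(z)
delta  <- imr * (imr + z)

# Step 2: OLS with the inverse Mills ratio as an extra regressor
step2    <- lm(y[s] ~ x1[s] + imr)
beta_imr <- unname(coef(step2)["imr"])   # estimates rho * sigma

# Textbook estimates of sigma and rho (assumed, not taken from the package)
sigma_hat <- sqrt(mean(residuals(step2)^2) + mean(delta) * beta_imr^2)
rho_hat   <- beta_imr / sigma_hat
c(sigma = sigma_hat, rho = rho_hat)
```

Since rho_hat is a ratio of two separately estimated quantities, nothing forces it into [-1,1]; the ML estimator parameterises rho directly and does not have this problem.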

----------Author(s)----------

Arne Henningsen,
Ott Toomet <otoomet@ut.ee>

----------References----------

Cameron, A. C. and Trivedi, P. K. (2005) Microeconometrics: Methods and Applications, Cambridge University Press.
Greene, W. H. (2003) Econometric Analysis, Fifth Edition, Prentice Hall.
Heckman, J. (1976) The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models, Annals of Economic and Social Measurement, 5(4), p. 475-492.
Johnston, J. and DiNardo, J. (1997) Econometric Methods, Fourth Edition, McGraw-Hill.
Lee, L., Maddala, G. S. and Trost, R. P. (1980) Asymptotic covariance matrices of two-stage probit and two-stage tobit methods for simultaneous equations models with selectivity. Econometrica, 48, p. 491-503.
Toomet, O. and Henningsen, A. (2008) Sample Selection Models in R: Package sampleSelection. Journal of Statistical Software 27(7), http://www.jstatsoft.org/v27/i07/
Wooldridge, J. M. (2003) Introductory Econometrics: A Modern Approach, 2e, Thomson South-Western.

----------See Also----------

lm, glm, binomial

----------Examples----------

## Greene( 2003 ): example 22.8, page 786
data( Mroz87 )
Mroz87$kids  <- ( Mroz87$kids5 + Mroz87$kids618 > 0 )
# Two-step estimation
summary( heckit( lfp ~ age + I( age^2 ) + faminc + kids + educ,
wage ~ exper + I( exper^2 ) + educ + city, Mroz87 ) )
# ML estimation
summary( selection( lfp ~ age + I( age^2 ) + faminc + kids + educ,
wage ~ exper + I( exper^2 ) + educ + city, Mroz87 ) )

## Wooldridge( 2003 ): example 17.5, page 590
data( Mroz87 )
# Two-step estimation
summary( heckit( lfp ~ nwifeinc + educ + exper + I( exper^2 ) + age +
kids5 + kids618, log( wage ) ~ educ + exper + I( exper^2 ), Mroz87,
method = "2step" ) )

## Example using binary outcome for selection model.
## We estimate the effect of women's education on their
## chances to get high wage (> $5/hr in 1975 USD), using PSID data
## We use education, age, experience as explanatory variables
## and add kids, other family income, marginal tax rate, and parents'
## education as exclusion restrictions.
## Note: this is slow
data(Mroz87)
m <- selection(lfp~educ + poly(age,3) + kids5 + kids618
            + huseduc + mtr + motheduc + fatheduc
            + poly(exper,2) + nwifeinc,
            wage >= 5 ~ educ + poly(age,2)
            + poly(exper,2),
            data=Mroz87)
print(summary(m))

## Cameron and Trivedi (2005): Section 16.6, page 553ff
data( RandHIE )
subsample <- RandHIE$year == 2 & !is.na( RandHIE$educdec )
selectEq <- binexp ~ logc + idp + lpi + fmde + physlm + disea +
hlthg + hlthf + hlthp + linc + lfam + educdec + xage + female +
child + fchild + black
outcomeEq <- lnmeddol ~ logc + idp + lpi + fmde + physlm + disea +
hlthg + hlthf + hlthp + linc + lfam + educdec + xage + female +
child + fchild + black
# ML estimation
cameron <- selection( selectEq, outcomeEq, data = RandHIE[ subsample, ] )
summary( cameron )

## example using random numbers
library( MASS )
nObs <- 1000
sigma <- matrix( c( 1, -0.7, -0.7, 1 ), ncol = 2 )
errorTerms <- mvrnorm( nObs, c( 0, 0 ), sigma )
myData <- data.frame( no = c( 1:nObs ), x1 = rnorm( nObs ), x2 = rnorm( nObs ),
u1 = errorTerms[ , 1 ], u2 =  errorTerms[ , 2 ] )
myData$y <- 2 + myData$x1 + myData$u1
myData$s <- ( 2 * myData$x1 + myData$x2 + myData$u2 - 0.2 ) > 0
myData$y[ !myData$s ] <- NA
myOls <- lm( y ~ x1, data = myData)
summary( myOls )
myHeckit <- heckit( s ~ x1 + x2, y ~ x1, myData, print.level = 1 )
summary( myHeckit )

## example using random numbers with IV/2SLS estimation
library( MASS )
nObs <- 1000
sigma <- matrix( c( 1, 0.5, 0.1, 0.5, 1, -0.3, 0.1, -0.3, 1 ), ncol = 3 )
errorTerms <- mvrnorm( nObs, c( 0, 0, 0 ), sigma )
myData <- data.frame( no = c( 1:nObs ), x1 = rnorm( nObs ), x2 = rnorm( nObs ),
u1 = errorTerms[ , 1 ], u2 = errorTerms[ , 2 ], u3 = errorTerms[ , 3 ] )
myData$w <- 1 + myData$x1 + myData$u1
myData$y <- 2 + myData$w + myData$u2
myData$s <- ( 2 * myData$x1 + myData$x2 + myData$u3 - 0.2 ) > 0
myData$y[ !myData$s ] <- NA
myHeckit <- heckit( s ~ x1 + x2, y ~ w, data = myData )
summary( myHeckit )  # biased!
myHeckitIv <- heckit( s ~ x1 + x2, y ~ w, data = myData, inst = ~ x1 )
summary( myHeckitIv ) # unbiased

## tobit-5 example
N <- 500
library(mvtnorm)
vc <- diag(3)
vc[lower.tri(vc)] <- c(0.9, 0.5, 0.6)
vc[upper.tri(vc)] <- vc[lower.tri(vc)]
eps <- rmvnorm(N, rep(0, 3), vc)
xs <- runif(N)
ys <- xs + eps[,1] > 0
xo1 <- runif(N)
yo1 <- xo1 + eps[,2]
xo2 <- runif(N)
yo2 <- xo2 + eps[,3]
a <- selection(ys~xs, list(yo1 ~ xo1, yo2 ~ xo2))
summary(a)

## tobit2 example
vc <- diag(2)
vc[2,1] <- vc[1,2] <- -0.7
eps <- rmvnorm(N, rep(0, 2), vc)
xs <- runif(N)
ys <- xs + eps[,1] > 0
xo <- runif(N)
yo <- (xo + eps[,2])*(ys > 0)
a <- selection(ys~xs, yo ~xo)
summary(a)

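When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

Notes:
Note 1: For the convenience of learners, this document was translated by LoveR, the 生物统计家园网 robot. It is intended only as a personal reference for learning R; 生物统计家园 reserves the copyright.
Note 2: Because the translation was produced automatically by a robot, some inaccuracies are unavoidable. Carefully comparing the Chinese and English text and rereading both can help in learning R.
Note 3: If you find inaccuracies, please reply below this post and we will gradually revise them.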
