whatif(WhatIf)
whatif() is part of the R package WhatIf
Counterfactual Evaluation
----------Description----------
Implements the methods described in King and Zeng (2006a, 2006b) for evaluating counterfactuals.
----------Usage----------
whatif(formula = NULL, data, cfact, range = NULL, freq = NULL, nearby = 1,
distance = "gower", miss = "list", choice = "both", return.inputs = FALSE,
return.distance = FALSE, ...)
----------Arguments----------
Argument: formula
An optional formula without a dependent variable that is of class "formula" and that follows standard R conventions for formulas, e.g. ~ x1 + x2. Allows you to transform or otherwise re-specify combinations of the variables in both data and cfact. To use this parameter, both data and cfact must be coercible to data frames; the variables of both data and cfact must be labeled; and all variables appearing in formula must also appear in both data and cfact. Otherwise, errors are returned. The intercept is automatically dropped. Default is NULL.
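The requirements above can be illustrated with a small base R sketch. It does not call whatif; the data frames obs and cf, their columns, and the formula are hypothetical, and the expansion via model.matrix is only an assumption about what such a formula produces.

```r
# Hypothetical observed data and counterfactuals; names are illustrative only.
obs <- data.frame(x1 = c(1, 2, 3), x2 = c(0.5, 1.5, 2.5), x3 = c(9, 8, 7))
cf  <- data.frame(x1 = c(1.5, 2.5), x2 = c(1.0, 2.0))

# One-sided formula: no dependent variable on the left-hand side.
f <- ~ x1 + I(x2^2)

# Every variable used in the formula must be present in both data sets.
needed <- all.vars(f)
vars_ok <- all(needed %in% names(obs)) && all(needed %in% names(cf))

# Expand the formula on each data set and drop the intercept column.
drop_intercept <- function(M) M[, colnames(M) != "(Intercept)", drop = FALSE]
X_obs <- drop_intercept(model.matrix(f, data = obs))
X_cf  <- drop_intercept(model.matrix(f, data = cf))
print(colnames(X_obs))
```

Here x3 is simply ignored because it does not appear in the formula, which mirrors the subsetting behaviour described for formula.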
Argument: data
May take one of the following forms:
1. An R model output object, such as the output from calls to lm, glm, and zelig. Such an output object must be a list. It must additionally have either a formula or terms component and either a data or model component; if it does not, an error is returned. Of the latter, whatif first looks for data, which should contain either the original data set supplied as part of the model call (as in glm) or the name of this data set (as in zelig), which is assumed to reside in the global environment. If data does not exist, whatif then looks for model, which should contain the model frame (as in lm). The intercept is automatically dropped from the extracted observed covariate data set if the original model included one.
2. An n-by-k non-character (logical or numeric) matrix or data frame of observed covariate data with n data points or units and k covariates. All desired variable transformations and interaction terms should be included in this set of k covariates unless formula is alternatively used to produce them. However, an intercept should not be. Such a matrix may be obtained by passing model output (e.g., output from a call to lm) to model.matrix and excluding the intercept from the resulting matrix if one was fit. Note that whatif will attempt to coerce data frames to their internal numeric values. Hence, data frames should only contain logical, numeric, and factor columns; character columns will lead to an error being returned.
3. A string. Either the complete path (including file name) of the file containing the data or the path relative to your working directory. This file should be a white space delimited text file. If it contains a header, you must include a column of row names as discussed in the help file for the R function read.table. The data in the file should otherwise be as described in (2).
Missing data is allowed and will be dealt with via the argument miss. It should be flagged using R's standard representation for missing data, NA.
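The route described in (2), from a fitted model to an intercept-free covariate matrix, might look like the following sketch. It uses the built-in mtcars data and an arbitrary linear model purely for illustration; neither comes from the WhatIf documentation.

```r
# Fit an arbitrary model on a built-in data set (illustration only).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# model.matrix() returns the design matrix, including "(Intercept)".
X <- model.matrix(fit)

# Exclude the intercept so that only the k covariates remain.
X <- X[, colnames(X) != "(Intercept)", drop = FALSE]
print(dim(X))
```

The resulting matrix X has one row per observation and one column per covariate, which is the n-by-k shape the data argument expects.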
Argument: cfact
An R object or a string. If an R object, an m-by-k non-character matrix or data frame of counterfactuals with m counterfactuals and the same k covariates (in the same order) as in data. However, if formula is used to select a subset of the k covariates, then cfact may contain either only these j <= k covariates or the complete set of k covariates. An intercept should not be included as one of the covariates. It will be automatically dropped from the counterfactuals generated by Zelig if the original model contained one. Data frames will again be coerced to their internal numeric values if possible. If a string, either the complete path (including file name) of the file containing the counterfactuals or the path relative to your working directory. This file should be a white space delimited text file. See the discussion under data for instructions on dealing with a header. All counterfactuals should be fully observed: if you supply counterfactuals with missing data, they will be list-wise deleted and a warning message will be printed to the screen.
Argument: range
An optional numeric vector of length k, where k is the number of covariates. Each element represents the range of the corresponding covariate for use in calculating Gower distances. Use this argument when covariate data do not represent the population of interest, such as selection by stratification or experimental manipulation. By default, the range of each covariate is calculated from the data (the difference of its maximum and minimum values in the sample), which is appropriate when a simple random sampling design was used. To supply your own range for the kth covariate, set the kth element of the vector equal to the desired range and all other elements equal to NA. Default is NULL.
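To make the role of the range vector concrete, here is a minimal base R sketch of a range-normalized Gower distance. The function gower_dist and its argument rng are hypothetical names, and the formula (the average over covariates of the absolute difference divided by the covariate's range) is an assumption about the usual form of the measure, not code taken from WhatIf.

```r
# Gower-type distance from one counterfactual `cf` to every row of `X`.
# `rng` plays the role of the range argument: NA entries fall back to the
# sample range (max minus min) of that covariate.
# Note: a covariate with zero range would produce a division by zero.
gower_dist <- function(cf, X, rng = NULL) {
  r <- apply(X, 2, function(col) max(col) - min(col))
  if (!is.null(rng)) {
    supplied <- !is.na(rng)
    r[supplied] <- rng[supplied]
  }
  abs_diff <- abs(sweep(X, 2, cf))        # |x_ik - cf_k| for each row i
  rowMeans(sweep(abs_diff, 2, r, "/"))    # average of range-scaled differences
}

X  <- cbind(a = c(0, 5, 10), b = c(1, 2, 3))
cf <- c(5, 2)

d_default  <- gower_dist(cf, X)                  # sample ranges: 10 and 2
d_override <- gower_dist(cf, X, rng = c(NA, 4))  # own range for 2nd covariate
print(d_default)
print(d_override)
```

Supplying a wider range for the second covariate shrinks its contribution, so the distances in d_override are smaller than those in d_default.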
Argument: freq
An optional numeric vector of any positive length, the elements of which comprise a set of distances. Used in calculating cumulative frequency distributions for the distances of the data points from each counterfactual. For each such distance and counterfactual, the cumulative frequency is the fraction of observed covariate data points with distance to the counterfactual less than or equal to the supplied distance value. The default varies with the distance measure used. When the Gower distance measure is employed, frequencies are calculated for the sequence of Gower distances from 0 to 1 in increments of 0.05. When the Euclidean distance measure is employed, frequencies are calculated for the sequence of Euclidean distances from the minimum to the maximum observed distances in twenty equal increments, all rounded to two decimal places. Default is NULL.
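The cumulative frequency described above can be sketched in a few lines of base R. The distance vector d is made up and stands in for the distances from one counterfactual to six data points; the grid reproduces the default Gower sequence mentioned in the text.

```r
# Hypothetical distances from one counterfactual to six data points.
d <- c(0.02, 0.11, 0.12, 0.31, 0.46, 0.83)

# Default-style grid for Gower distances: 0 to 1 in steps of 0.05 (21 values).
freq <- seq(0, 1, by = 0.05)

# For each grid value, the fraction of data points at that distance or closer.
cum_freq <- sapply(freq, function(threshold) mean(d <= threshold))
names(cum_freq) <- freq
print(cum_freq)
```

With several counterfactuals, repeating this for each one and stacking the results by row would give a matrix with one row per counterfactual and one column per grid value.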
Argument: nearby
An optional scalar indicating which observed data points are considered to be nearby (i.e., within nearby geometric variances of) the counterfactuals. Used to calculate the summary statistic returned by the function: the fraction of the observed data nearby each counterfactual. By default, one geometric variance of the covariate data is used. For example, setting nearby to 2 will identify the proportion of data points within two geometric variances of a counterfactual. Default is 1 (see Usage).
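A small sketch of how nearby scales the cutoff for the summary statistic. The distances d and the geometric variability gv are invented numbers; how the geometric variability itself is computed is not shown here, and the strict "less than" comparison follows the wording in the Details section.

```r
# Hypothetical distances from one counterfactual to six data points.
d <- c(0.02, 0.11, 0.12, 0.31, 0.46, 0.83)

# Hypothetical geometric variability of the observed covariate data.
gv <- 0.2

# Fraction of data points closer than `nearby` geometric variances.
frac_nearby <- function(d, gv, nearby = 1) mean(d < nearby * gv)

stat_1 <- frac_nearby(d, gv, nearby = 1)  # cutoff 0.2
stat_2 <- frac_nearby(d, gv, nearby = 2)  # cutoff 0.4
print(c(stat_1, stat_2))
```

Doubling nearby widens the neighbourhood, so the second fraction can never be smaller than the first.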
Argument: distance
An optional string indicating which of two distance measures to employ. The choices are either "gower", Gower's non-parametric distance measure (G^2), which is suitable for both qualitative and quantitative data; or "euclidian", squared Euclidean distance, which is only suitable for quantitative data. The default is "gower".
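For comparison with the Gower measure, squared Euclidean distance from a counterfactual to each data point can be sketched as follows. The matrix X and the counterfactual cf are made up, and the one-line computation is an illustration of the standard definition rather than the package's own code.

```r
# Hypothetical observed data (3 points, 2 covariates) and one counterfactual.
X  <- cbind(a = c(0, 5, 10), b = c(1, 2, 3))
cf <- c(5, 2)

# Squared Euclidean distance: sum over covariates of squared differences.
d_euclid_sq <- rowSums(sweep(X, 2, cf)^2)
print(d_euclid_sq)
```

Because the differences are not scaled by each covariate's range, covariates measured on larger scales dominate this measure, which is one reason it suits only quantitative data.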
Argument: miss
An optional string indicating the strategy for dealing with missing data in the observed covariate data set. whatif supports two possible missing data strategies: "list", list-wise deletion of missing cases; and "case", ignoring missing data case-by-case. Note that if "case" is selected, cases with missing values are deleted list-wise for the convex hull test and for computing Euclidean distances, but pairwise deletion is used in computing the Gower distances to maximally use available information. The user is strongly encouraged to treat missing data using specialized tools such as Amelia prior to feeding the data to whatif. Default is "list".
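The difference between the two strategies can be sketched in base R. The data are invented, and treating pairwise deletion as "average the range-scaled differences over whichever covariates are observed" is an assumption made for this illustration.

```r
# Hypothetical observed data with missing values, and one counterfactual.
X  <- cbind(a = c(0, 5, NA, 10), b = c(1, NA, 2, 3))
cf <- c(5, 2)

# "list": drop every row that has at least one missing value.
X_list <- X[complete.cases(X), , drop = FALSE]

# "case"-style Gower distance: keep all rows and average over the
# covariates that are actually observed in each row.
r <- apply(X, 2, function(col) diff(range(col, na.rm = TRUE)))
scaled <- sweep(abs(sweep(X, 2, cf)), 2, r, "/")
d_case <- rowMeans(scaled, na.rm = TRUE)

print(nrow(X_list))
print(d_case)
```

List-wise deletion keeps only two of the four rows here, while the pairwise version still yields a distance for every row.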
Argument: choice
An optional string indicating which analyses to undertake. The options are either "hull", only perform the convex hull membership test; "distance", do not perform the convex hull test but do everything else, such as calculating the distance between each counterfactual and data point; or "both", undertake both the convex hull test and the distance calculations (i.e., do everything). Default is "both".
Argument: return.inputs
A Boolean; should the processed observed covariate and counterfactual data matrices on which all whatif computations are performed be returned? Processing refers to internal whatif operations such as the subsetting of covariates via formula, the deletion of cases with missing values, and the coercion of data frames to numeric matrices. Primarily intended for diagnostic purposes. If TRUE, these matrices are returned as a list. Default is FALSE.
Argument: return.distance
A Boolean; should the matrix of distances between each counterfactual and data point be returned? If TRUE, this matrix is returned as part of the output; if FALSE, it is not. Default is FALSE due to the large size that this matrix may attain.
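A back-of-the-envelope calculation shows why the distance matrix is not returned by default. The sizes m and n below are hypothetical; the only assumption is that each entry is stored as an 8-byte double, as in an ordinary R numeric matrix.

```r
# Hypothetical problem size: 1,000 counterfactuals and 100,000 data points.
m <- 1000
n <- 100000

# An m-by-n numeric matrix stores m * n doubles at 8 bytes each.
bytes <- m * n * 8
mebibytes <- bytes / 1024^2
print(mebibytes)
```

At this size the matrix alone would occupy several hundred mebibytes, before counting any copies made while it is built or returned.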
Argument: ...
Further arguments passed to and from other methods.
----------Details----------
This function is the primary tool for evaluating your counterfactuals. Specifically, it:
1. Determines whether or not your counterfactuals are in the convex hull of the observed covariate data.
2. Computes the distance of your counterfactuals from each of the n observed covariate data points. The default distance function used is Gower's non-parametric measure.
3. Computes a summary statistic for each counterfactual based on the distances in (2): the fraction of observed covariate data points with distances to your counterfactual less than a value you supply. By default, this value is taken to be the geometric variability of the observed data.
4. Computes the cumulative frequency distribution of each counterfactual for the distances in (2) using values that you supply. By default, Gower distances from 0 to 1 in increments of 0.05 are used.
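Step (1) can be phrased as a linear programming feasibility problem: a counterfactual lies in the convex hull if it can be written as a weighted average of the observed points with non-negative weights that sum to one. The sketch below uses lp from the lpSolve package for this; the helper in_hull and the test points are hypothetical, and the formulation is an illustration under that assumption rather than a copy of the package's internals.

```r
# Convex hull membership as an LP feasibility check.
# Find weights lambda >= 0 with sum(lambda) = 1 and t(X) %*% lambda = cf.
in_hull <- function(cf, X) {
  n <- nrow(X)
  const_mat <- rbind(t(X), rep(1, n))   # k equality rows plus the sum-to-one row
  sol <- lpSolve::lp(direction    = "min",
                     objective.in = rep(0, n),  # nothing to optimize; feasibility only
                     const.mat    = const_mat,
                     const.dir    = rep("=", nrow(const_mat)),
                     const.rhs    = c(cf, 1))
  sol$status == 0                       # status 0 means a feasible solution exists
}

# Observed data: the four corners of the unit square.
X <- rbind(c(0, 0), c(1, 0), c(0, 1), c(1, 1))

if (requireNamespace("lpSolve", quietly = TRUE)) {
  print(in_hull(c(0.5, 0.5), X))  # centre of the square
  print(in_hull(c(2, 2), X))      # well outside the square
}
```

The non-negativity of the weights does not need its own constraint because lp treats all decision variables as non-negative by default.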
----------Value----------
An object of class "whatif", a list consisting of the following six or seven elements:
call: The original call to whatif.
inputs: A list with two elements, data and cfact. Only present if return.inputs was set equal to TRUE in the call to whatif. The first element is the processed observed covariate data matrix on which all whatif computations were performed. The second element is the processed counterfactual data matrix.
in.hull: A logical vector of length m, where m is the number of counterfactuals. Each element of the vector is TRUE if the corresponding counterfactual is in the convex hull and FALSE otherwise.
dist: An m-by-n numeric matrix, where m is the number of counterfactuals and n is the number of data points (units). Only present if return.distance was set equal to TRUE in the call to whatif. The [i, j]th entry of the matrix contains the distance between the ith counterfactual and the jth data point.
geom.var: A scalar. The geometric variability of the observed covariate data.
sum.stat: A numeric vector of length m, where m is the number of counterfactuals. The ith element contains the summary statistic for the ith counterfactual. This summary statistic is the fraction of data points with distances to the counterfactual less than the argument nearby, which by default is the geometric variability of the covariates.
cum.freq: A numeric matrix. By default, the matrix has dimension m-by-21, where m is the number of counterfactuals; however, if you supplied your own frequencies via the argument freq, the matrix has dimension m-by-f, where f is the length of freq. Each row of the matrix contains the cumulative frequency distribution for the corresponding counterfactual calculated using either the distance measure-specific default set of distance values or the set that you supplied (see the discussion under the argument freq). Hence, the [i, j]th entry of the matrix is the fraction of data points with distances to the ith counterfactual less than or equal to the value represented by the jth column. The column names contain these values.
----------Note----------
This function requires the lpSolve package.
----------Author(s)----------
Stoll, Heather <hstoll@polsci.ucsb.edu>, King, Gary <king@harvard.edu> and Zeng, Langche <zeng@ucsd.edu>
----------References----------
Extreme Counterfactuals." Political Analysis 14 (2). Available from http://gking.harvard.edu.
King, Gary and Langche Zeng. 2007. "When Can History Be Our Guide? The Pitfalls of Counterfactual Inference." International Studies Quarterly
----------See Also----------
plot.whatif, summary.whatif, print.whatif, print.summary.whatif
----------Examples----------
## Load the package that provides whatif()
library(WhatIf)
## Create example data sets and counterfactuals
my.cfact <- matrix(rnorm(3*5), ncol = 5)
my.data <- matrix(rnorm(100*5), ncol = 5)
## Evaluate counterfactuals
my.result <- whatif(data = my.data, cfact = my.cfact)
## Evaluate counterfactuals and supply own Gower distances for
## cumulative frequency distributions
my.result <- whatif(cfact = my.cfact, data = my.data, freq = c(0, .25, .5, 1, 1.25, 1.5))