predab.resample(rms)
predab.resample() belongs to the R package rms
Predictive Ability using Resampling
Translator: 生物统计家园网, robot LoveR
Description
predab.resample is a general-purpose function that is used by functions for specific models. For a model with a specified design matrix, it computes estimates of optimism in, and bias-corrected estimates of, a vector of indexes of predictive accuracy, with or without fast backward step-down of predictors. If bw=TRUE, the design matrix x must have been created by ols, lrm, or cph, and predab.resample stores, as the kept attribute, a logical matrix encoding which factors were selected at each repetition.
Usage
predab.resample(fit.orig, fit, measure,
                method=c("boot","crossvalidation",".632","randomization"),
                bw=FALSE, B=50, pr=FALSE,
                rule="aic", type="residual", sls=.05, aics=0,
                tol=1e-12, force=NULL, non.slopes.in.x=TRUE, kint=1,
                cluster, subset, group=NULL, debug=FALSE, ...)
Arguments
Argument: fit.orig
Object containing the original full-sample fit, with the x=TRUE and y=TRUE options specified to the model fitting function. This model should be the FULL model, including every candidate variable, even those that were at some point excluded because of poor association with the response.
Argument: fit
A function to fit the model, either to the original sample or to a resample. fit takes the arguments x, y, iter, penalty, penalty.matrix, xcol, and any other arguments passed to predab.resample. iter tells fit the sampling repetition number (0 = original sample); if you do not want iter as a named argument in the definition of fit, add ... to the end of its argument list. If bw=TRUE, fit should allow for the possibility that no predictors are selected, i.e., it should fit an intercept-only model if the model has intercept(s). fit must return an object containing coef and fail (fail=TRUE if the fit failed due to singularity or non-convergence; such cases are excluded from summary statistics). If bw=TRUE, fit must also add design attributes to the returned object. The penalty.matrix argument is not used if penalty=0. xcol is a vector of the columns of X to be used in the current model fit; for ols and psm it includes a 1 for the intercept position. xcol is not defined if iter=0 unless the initial fit came from a backward step-down. It is used, for example, to select the rows and columns of penalty.matrix that correspond to the currently selected variables.
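As a rough illustration of the contract described for fit, here is a minimal sketch in base R. The name fit_ls is hypothetical, and the sketch assumes that x does not already contain an intercept column. It ignores penalty, penalty.matrix, and xcol, and it does not add the design attributes that bw=TRUE requires, so it is not a drop-in replacement for the fitters that rms uses internally. The result stores the estimates in an element named coefficients, which R's partial matching with $ also reaches as coef.

```r
# Hypothetical least-squares fitter following the argument list described above.
# Assumption: x holds predictors only; the intercept column is added here.
fit_ls <- function(x, y, iter = 0, penalty = 0, penalty.matrix = NULL,
                   xcol = NULL, tol = 1e-7, ...) {
  y <- as.vector(y)
  # No predictors selected (possible with backward step-down):
  # fall back to an intercept-only model
  if (!length(x)) {
    return(list(coefficients = c(Intercept = mean(y)), fail = FALSE))
  }
  X <- cbind(Intercept = 1, as.matrix(x))
  f <- lm.fit(X, y, tol = tol)
  # Flag rank-deficient (singular) fits so the caller can exclude them
  fail <- f$rank < ncol(X)
  list(coefficients = f$coefficients, fail = fail)
}

set.seed(1)
x <- cbind(x1 = rnorm(50), x2 = rnorm(50))
y <- 1 + 2 * x[, "x1"] + rnorm(50)
f0 <- fit_ls(x, y, iter = 0)   # iter = 0 denotes the original sample
f0$coefficients
f0$fail
```

The trailing ... lets the function accept extra arguments it does not use.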
Argument: measure
A function to compute a vector of indexes of predictive accuracy for a given fit. For method=".632" or method="crossval", it makes the most sense for measure to compute only indexes that are independent of sample size. The measure function should take the following arguments or use ...: xbeta (X beta for the current fit), y, evalfit, fit, iter, and fit.orig. iter is as in fit. predab.resample sets evalfit to TRUE if the fit is being evaluated on the sample used to make the fit, and to FALSE otherwise. Using evalfit will sometimes save computations. For example, in bootstrapping the area under an ROC curve for a logistic regression model, lrm already computes the area if the fit is on the training sample. fit.orig is the fit object returned by the original fit on the whole sample; it is used to pass computed configuration parameters from the original fit, such as quantiles of predicted probabilities that are used as cut points in other samples. The vector created by measure should have names() associated with it.
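The following sketch shows one possible shape for a measure function. The name measure_ls and the two indexes chosen (squared correlation and mean squared error) are illustrative assumptions, not what any rms validate method actually computes. Both are averages over observations, so they do not grow with sample size, and the returned vector is named as the text requires.

```r
# Hypothetical measure function: returns a named vector of accuracy indexes.
# Unused arguments (evalfit, fit, iter, fit.orig) are accepted for compatibility.
measure_ls <- function(xbeta, y, evalfit = FALSE, fit = NULL, iter = 0,
                       fit.orig = NULL, ...) {
  y <- as.vector(y)
  mse <- mean((y - xbeta)^2)
  # Squared correlation between linear predictor and response;
  # defined as 0 when the linear predictor is constant
  r2 <- if (isTRUE(sd(xbeta) > 0)) cor(xbeta, y)^2 else 0
  c(R2 = r2, MSE = mse)
}

y     <- c(1, 2, 3, 4, 5)
xbeta <- c(1.1, 1.9, 3.2, 3.8, 5.0)
m <- measure_ls(xbeta, y)
m
```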
Argument: method
The default is "boot" for ordinary bootstrapping (Efron, 1983, Eq. 2.10). Use ".632" for Efron's .632 method (Efron, 1983, Section 6 and Eq. 6.10), "crossvalidation" for grouped cross-validation, and "randomization" for the randomization method. May be abbreviated down to any level, e.g. "b", ".", "cross", "rand".
Argument: bw
Set to TRUE to do fast backward step-down for each training sample. Default is FALSE.
Argument: B
Number of repetitions, default=50. For method="crossvalidation", this is also the number of groups the original sample is split into.
Argument: pr
Set to TRUE to print results for each sample. Default is FALSE.
Argument: rule
Stopping rule for fastbw, "aic" or "p". Default is "aic" to use Akaike's information criterion.
Argument: type
Type of statistic to use in the stopping rule for fastbw, "residual" (the default) or "individual".
Argument: sls
Significance level for stopping in fastbw if rule="p". Default is .05.
Argument: aics
Stopping criterion for rule="aic". Factor deletion stops when chi-square minus 2 times d.f. falls below aics. Default is 0.
Argument: tol
Tolerance for singularity checking. It is passed to fit and fastbw.
Argument: force
See fastbw.
Argument: non.slopes.in.x
Set to FALSE if the design matrix x does not have columns for intercepts and these columns are needed.
Argument: kint
For multiple-intercept models such as the ordinal logistic model, you may specify which intercept to use as kint. This affects the linear predictor that is passed to measure.
Argument: cluster
Vector containing cluster identifiers. This can be specified only if method="boot". If it is present, the bootstrap samples clusters with replacement rather than the original records. If this vector does not have the same length as the number of rows in the data matrix used in the fit, an attempt will be made to use naresid on fit.orig to conform cluster to the data. See bootcov for more about this.
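To make the idea of sampling clusters rather than records concrete, here is a small base R sketch. The cluster labels are made up, and the code only illustrates the general technique; it is not taken from the predab.resample source and may differ from it in detail.

```r
set.seed(2)
# Hypothetical cluster identifiers, one per record
cluster <- c("a", "a", "b", "b", "b", "c", "d", "d")
ids     <- unique(cluster)

# Draw whole clusters with replacement ...
drawn <- sample(ids, length(ids), replace = TRUE)
# ... then take every record belonging to each drawn cluster
rows  <- unlist(lapply(drawn, function(id) which(cluster == id)))

drawn
rows
```

Because whole clusters are drawn, the number of records in a resample can differ from the original sample size when cluster sizes are unequal.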
Argument: subset
Specify a vector of positive or negative integers, or a logical vector, when you want the measure function to compute measures of accuracy on a subset of the data. The whole dataset is still used for all model development. For example, you may want to validate or calibrate a model by assessing the predictions on females when the fit was based on males and females. When you use cr.setup to build extra observations for fitting the continuation ratio ordinal logistic model, you can use subset to specify which cohort or observations to use for deriving indexes of predictive accuracy. For example, specify subset=cohort=="all" to validate the model for the first layer of the continuation ratio model (Prob(Y=0)).
Argument: group
A grouping variable used to stratify the sample upon bootstrapping. This allows one to handle k-sample problems, i.e., each bootstrap sample will be forced to select the same number of observations from each level of group as the number appearing in the original dataset.
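The stratified resampling described for group can be sketched in base R as follows. The group labels are invented for the example, and this is only an illustration of the technique, not the code that predab.resample runs.

```r
set.seed(3)
# Hypothetical two-sample problem: 3 treated and 4 control records
group <- factor(c("trt", "trt", "trt", "ctl", "ctl", "ctl", "ctl"))

# Resample row indexes with replacement separately within each level of group
rows <- unlist(lapply(split(seq_along(group), group), function(i) {
  i[sample.int(length(i), length(i), replace = TRUE)]
}))

table(group[rows])   # same counts per level as table(group)
```

Using sample.int on the stratum size, rather than sample on the index vector, avoids the surprising behaviour of sample when a stratum contains a single record.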
Argument: debug
Set to TRUE to print subscripts of all training and test samples.
Argument: ...
The user may add other arguments here that are passed to fit and measure.
Details
For method=".632", the program stops with an error unless every observation is omitted from at least one bootstrap sample. Efron's ".632" method was developed for measures that are formulated in terms of per-observation contributions. In general, error measures (e.g., ROC areas) cannot be written in this way, so this function uses a heuristic extension to Efron's formulation in which it is assumed that the average error measure omitting the ith observation is the same as the average error measure omitting any other observation. Weights are then derived for each bootstrap repetition, and weighted averages over the B repetitions can easily be computed.
Value
A matrix of class "validate" with rows corresponding to the indexes computed by measure, and the following columns:
Column: index.orig
indexes in the original overall fit
Column: training
average indexes in training samples
Column: test
average indexes in test samples
Column: optimism
average of training minus test; for method=".632" it is instead .632 times (index.orig - test)
Column: index.corrected
index.orig - optimism
Column: n
number of successful repetitions with the given index non-missing
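The relationship between these columns for the ordinary bootstrap can be sketched in base R. Everything below (the simulated data, the helper names r2, ls_coef, and predict_ls, and the choice of squared correlation as the index) is an assumption made for illustration; the sketch mimics the arithmetic described above rather than reproducing the predab.resample code. Each model is fitted on a bootstrap sample (training) and then evaluated on the original sample (test).

```r
set.seed(4)
n <- 100
x <- cbind(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- x[, "x1"] + rnorm(n)

# Index of accuracy: squared correlation between predictions and response
r2 <- function(xbeta, y) cor(xbeta, y)^2
# Least-squares fit with an intercept, returning the coefficient vector
ls_coef    <- function(x, y) lm.fit(cbind(1, x), y)$coefficients
predict_ls <- function(beta, x) drop(cbind(1, x) %*% beta)

beta.orig  <- ls_coef(x, y)
index.orig <- r2(predict_ls(beta.orig, x), y)

B <- 50
training <- test <- numeric(B)
for (b in seq_len(B)) {
  i    <- sample.int(n, n, replace = TRUE)      # bootstrap training sample
  beta <- ls_coef(x[i, , drop = FALSE], y[i])
  training[b] <- r2(predict_ls(beta, x[i, , drop = FALSE]), y[i])
  test[b]     <- r2(predict_ls(beta, x), y)     # evaluate on original sample
}

optimism        <- mean(training) - mean(test)
index.corrected <- index.orig - optimism
round(c(index.orig = index.orig, training = mean(training),
        test = mean(test), optimism = optimism,
        index.corrected = index.corrected, n = B), 3)
```

For method=".632" the optimism definition given above implies index.corrected = index.orig - .632 (index.orig - test) = .368 index.orig + .632 test.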
Author(s)
Frank Harrell
Department of Biostatistics, Vanderbilt University
f.harrell@vanderbilt.edu
References
See Also
rms, validate, fastbw, lrm, ols, cph, bootcov
Examples
# See the code for validate.ols for an example of the use of
# predab.resample
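Since validate.ols is named as the reference use of predab.resample, a short call to validate on an ols fit shows the resulting matrix without having to write fit and measure by hand. The simulated data frame d is an assumption for the example, and the block is skipped if the rms package is not installed.

```r
# Assumes the rms package is available; does nothing otherwise.
if (requireNamespace("rms", quietly = TRUE)) {
  library(rms)
  set.seed(5)
  d   <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
  d$y <- d$x1 + rnorm(200)
  # x=TRUE and y=TRUE keep the design matrix and response in the fit
  f <- ols(y ~ x1 + x2, data = d, x = TRUE, y = TRUE)
  v <- validate(f, B = 50)   # ordinary bootstrap with 50 repetitions
  print(v)
}
```

Typing validate.ols at the R prompt after loading rms prints its source, including the fit and measure functions it defines.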
When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).