R rms package: help documentation for the val.prob() function

loveR · posted 2012-9-27 19:17:07

val.prob(rms)
R package containing val.prob(): rms

Validate Predicted Probabilities

Translator: LoveR, the 生物统计家园网 robot

Description

The val.prob function is useful for validating predicted probabilities against binary events.

Given a set of predicted probabilities p or predicted log odds logit, and a vector of binary outcomes y that were not used in developing the predictions p or logit, val.prob computes the following indexes and statistics: Somers' D_{xy} rank correlation between p and y [2(C-.5), C=ROC area]; the Nagelkerke-Cox-Snell-Maddala-Magee R-squared index; the discrimination index D [(logistic model L.R. chi-square - 1)/n]; the L.R. chi-square and its P-value; the unreliability index U; the chi-square with 2 d.f. for testing unreliability (H0: intercept=0, slope=1) and its P-value; the quality index Q; the Brier score (average squared difference between p and y); the Intercept and Slope of the logistic calibration curve; E_{max}, the maximum absolute difference between predicted and calibrated probabilities; and the Spiegelhalter Z-test for calibration accuracy with its two-tailed P-value.  If pl=TRUE, val.prob plots the fitted logistic calibration curve and, optionally, a smooth nonparametric fit using lowess(p,y,iter=0) and grouped proportions vs. mean predicted probability in each group.  If the predicted probabilities or logits are constant, the statistics are returned and no plot is made.
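
As a rough illustration of several of the indexes listed above, the following base-R sketch computes C, D_{xy}, the Brier score, the logistic calibration intercept and slope, a version of E_{max}, and the Spiegelhalter z statistic. The simulated p and y and all variable names are assumptions made for this example; it is not the code that val.prob itself runs, and its numbers may differ in detail from what val.prob reports.

```r
# Illustrative only: simulated predictions and outcomes (not from the help page)
set.seed(42)
n <- 500
p <- runif(n, 0.05, 0.95)   # hypothetical predicted probabilities
y <- rbinom(n, 1, p)        # outcomes generated so that p is well calibrated

# C (ROC area) via the Wilcoxon rank-sum identity, then Dxy = 2(C - 0.5)
r  <- rank(p)
n1 <- sum(y == 1)
n0 <- sum(y == 0)
C   <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
Dxy <- 2 * (C - 0.5)

# Brier score: average squared difference between p and y
brier <- mean((p - y)^2)

# Linear logistic calibration: regress y on logit(p); ideal is intercept 0, slope 1
lp  <- qlogis(p)
fit <- glm(y ~ lp, family = binomial)
intercept <- unname(coef(fit)[1])
slope     <- unname(coef(fit)[2])

# Maximum absolute difference between predicted and calibrated probabilities,
# evaluated here only at the observed predictions (an assumption of this sketch)
emax <- max(abs(p - fitted(fit)))

# Spiegelhalter z statistic and its two-tailed P-value
z  <- sum((y - p) * (1 - 2 * p)) / sqrt(sum((1 - 2 * p)^2 * p * (1 - p)))
pz <- 2 * pnorm(-abs(z))

round(c(C = C, Dxy = Dxy, Brier = brier, Intercept = intercept,
        Slope = slope, Emax = emax, z = z, P = pz), 3)
```

Because y is drawn from p here, the intercept should land near 0 and the slope near 1; predictions from an overfitted model would typically show a slope below 1.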

When group is present, different statistics are computed, different graphs are made, and the object returned by val.prob is different.  group specifies a stratification variable, and validations are done separately by levels of group and overall.  A print method prints summary statistics and several quantiles of predicted probabilities, and a plot method plots calibration curves with summary statistics superimposed, along with selected quantiles of the predicted probabilities (shown as tick marks on the calibration curves).  Only the lowess calibration curve is estimated.

The statistics computed are: the average predicted probability; the observed proportion of events; a 1 d.f. chi-square statistic for testing overall mis-calibration (i.e., a test of the observed vs. the overall average predicted probability of the event) (ChiSq); a 2 d.f. chi-square statistic for testing simultaneously that the intercept of a linear logistic calibration curve is zero and the slope is one (ChiSq2); the average absolute calibration error (average absolute difference between the lowess-estimated calibration curve and the line of identity, labeled Eavg); Eavg divided by the difference between the 0.95 and 0.05 quantiles of predicted probabilities (Eavg/P90); a "median odds ratio", i.e., the anti-log of the median absolute difference between predicted and calibrated predicted log odds of the event (Med OR); the C-index (ROC area); the Brier quadratic error score (B); a chi-square test of goodness of fit based on the Brier score (B ChiSq); and the Brier score computed on calibrated rather than raw predicted probabilities (B cal).  The first chi-square test is a test of overall calibration accuracy ("calibration in the large"), and the second will also detect errors such as slope shrinkage caused by overfitting or regression to the mean.  See Cox (1970) for both of these score tests.  The goodness of fit test based on the (uncalibrated) Brier score is due to Hilden, Habbema, and Bjerregaard (1978) and is discussed in Spiegelhalter (1986).  When group is present you can also specify sampling weights (usually frequencies) to obtain weighted calibration curves.
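
The definitions of Eavg, Eavg/P90, and Med OR given above can be sketched in base R as follows. The simulated data and the variable names (cal, eavg, med_or, and so on) are assumptions for illustration; this is a reading of the definitions in the text, not the package's own implementation.

```r
# Illustrative only: simulated data in which predictions are too extreme
set.seed(7)
n <- 400
p <- runif(n, 0.05, 0.95)                     # hypothetical predicted probabilities
y <- rbinom(n, 1, plogis(0.7 * qlogis(p)))    # true calibration slope is 0.7

# Nonparametric calibration curve, as in lowess(p, y, iter = 0)
sm <- lowess(p, y, iter = 0)

# lowess returns points sorted by p; put calibrated values back in data order
o   <- order(p)
cal <- numeric(n)
cal[o] <- sm$y

# Eavg: average absolute difference between the curve and the line of identity
eavg <- mean(abs(cal - p))

# Eavg / P90: scale by the distance between the 0.05 and 0.95 quantiles of p
p90      <- unname(diff(quantile(p, c(0.05, 0.95))))
eavg_p90 <- eavg / p90

# Med OR: anti-log of the median absolute difference in log odds, after
# dropping predicted or calibrated probabilities that are <= 0 or >= 1
ok     <- p > 0 & p < 1 & cal > 0 & cal < 1
med_or <- exp(median(abs(qlogis(p[ok]) - qlogis(cal[ok]))))

round(c(Eavg = eavg, "Eavg/P90" = eavg_p90, "Med OR" = med_or), 3)
```

A Med OR of 1 would mean the predicted and calibrated log odds agree exactly; values further above 1 indicate larger typical miscalibration on the odds scale.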

To get the behavior that results from a grouping variable being present without having a grouping variable, use group=TRUE.  In the plot method, calibration curves are drawn and labeled by default where they are maximally separated using the labcurve function. The following parameters do not apply when group is present: pl, smooth, logistic.cal, m, g, cuts, emax.lim, legendloc, riskdist, mkh, connect.group, connect.smooth.  The following parameters apply to the plot method but not to val.prob: xlab, ylab, lim, statloc, cex.
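
A minimal sketch of the group=TRUE form described above, using simulated data. The data and variable names are assumptions for illustration, and the block does nothing if the rms package is not installed.

```r
# Requires the rms package; skipped quietly if it is not installed
if (requireNamespace("rms", quietly = TRUE)) {
  set.seed(3)
  p <- runif(300, 0.05, 0.95)   # hypothetical predicted probabilities
  y <- rbinom(300, 1, p)        # simulated binary outcomes

  # group = TRUE: the grouped algorithm with a single stratum
  v <- rms::val.prob(p, y, group = TRUE)
  print(v)                      # summary statistics and quantiles of p
}
```

The returned object can then be passed to plot() to draw the lowess calibration curve with the summary statistics superimposed.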

Usage

val.prob(p, y, logit, group, weights=rep(1,length(y)), normwt=FALSE,
      pl=TRUE, smooth=TRUE, logistic.cal=TRUE,
      xlab="Predicted Probability", ylab="Actual Probability",
      lim=c(0, 1), m, g, cuts, emax.lim=c(0,1),
      legendloc=lim[1] + c(0.55 * diff(lim), 0.27 * diff(lim)),
      statloc=c(0,0.99), riskdist="calibrated", cex=.7, mkh=.02,
      connect.group=FALSE, connect.smooth=TRUE, g.group=4,
      evaluate=100, nmin=0)

## S3 method for class 'val.prob'
print(x, ...)

## S3 method for class 'val.prob'
plot(x, xlab="Predicted Probability",
   ylab="Actual Probability",
   lim=c(0,1), statloc=lim, stats=1:12, cex=.5,
   lwd.overall=4, quantiles=c(.05,.95), flag, ...)

Arguments

Argument: p
predicted probability

Argument: y
vector of binary outcomes

Argument: logit
predicted log odds of outcome.  Specify either p or logit.

Argument: group
a grouping variable.  If numeric, this variable is grouped into g.group quantile groups (default is quartiles).  Set group=TRUE to use the group algorithm but with a single stratum for val.prob.

Argument: weights
an optional numeric vector of per-observation weights (usually frequencies), used only if group is given.

Argument: normwt
set to TRUE to make weights sum to the number of non-missing observations.
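
The rescaling that normwt=TRUE describes can be sketched as below. The weight vector is hypothetical, and for simplicity "non-missing" is judged on the weights alone, which is an assumption of this example.

```r
w  <- c(2, 1, NA, 3, 4)        # hypothetical frequency weights, one missing
ok <- !is.na(w)

# Rescale so the weights sum to the number of non-missing observations
w_norm <- w[ok] * sum(ok) / sum(w[ok])

w_norm        # 0.8 0.4 1.2 1.6
sum(w_norm)   # 4, the number of non-missing observations
```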

Argument: pl
TRUE to plot calibration curves and optionally statistics

Argument: smooth
plot smooth fit to (p,y) using lowess(p,y,iter=0)

Argument: logistic.cal
plot linear logistic calibration fit to (p,y)

Argument: xlab
x-axis label, default is "Predicted Probability" for val.prob.

Argument: ylab
y-axis label, default is "Actual Probability" for val.prob.

Argument: lim
limits for both x and y axes

Argument: m
If grouped proportions are desired, average number of observations per group

Argument: g
If grouped proportions are desired, number of quantile groups

Argument: cuts
If grouped proportions are desired, actual cut points for constructing intervals, e.g. c(0,.1,.8,.9,1) or seq(0,1,by=.2)
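
To make the idea of grouped proportions concrete, the sketch below forms groups either from g quantile groups or from explicit cut points, then tabulates the mean predicted probability and the observed event proportion in each group. It uses base R cut() on simulated data; the data, the helper name grouped(), and the use of cut() are assumptions for illustration and may not match how val.prob forms its groups internally.

```r
set.seed(11)
p <- runif(200)            # hypothetical predicted probabilities
y <- rbinom(200, 1, p)     # simulated binary outcomes

# Option 1: g quantile groups (here g = 5)
g     <- 5
br_q  <- quantile(p, probs = seq(0, 1, length.out = g + 1))
grp_q <- cut(p, breaks = br_q, include.lowest = TRUE)

# Option 2: explicit cut points, as with cuts = seq(0, 1, by = .2)
grp_c <- cut(p, breaks = seq(0, 1, by = 0.2), include.lowest = TRUE)

# For each group: size, mean predicted probability, observed proportion
grouped <- function(grp) data.frame(
  n         = as.vector(table(grp)),
  mean_pred = as.vector(tapply(p, grp, mean)),
  obs_prop  = as.vector(tapply(y, grp, mean))
)

grouped(grp_q)
grouped(grp_c)
```

Plotting obs_prop against mean_pred gives the grouped points of a calibration plot; points near the line of identity indicate good calibration within that group.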

Argument: emax.lim
Vector containing lowest and highest predicted probability over which to compute Emax.

Argument: legendloc
If pl=TRUE, list with components x,y or vector c(x,y) for upper left corner of legend for curves and points.  Default is c(.55, .27) scaled to lim.  Use locator(1) to use the mouse, FALSE to suppress legend.

Argument: statloc
D_{xy}, C, R^2, D, U, Q, Brier score, Intercept, Slope, and E_{max} will be added to plot, using statloc as the upper left corner of a box (default is c(0,0.99)). You can specify a list or a vector.  Use locator(1) for the mouse, FALSE to suppress statistics. This is plotted after the curve legends.

Argument: riskdist
Defaults to "calibrated" to plot the relative frequency distribution of calibrated probabilities after dividing into 101 bins from lim[1] to lim[2]. Set to "predicted" to use raw assigned risk, FALSE to omit risk distribution. Values are scaled so that highest bar is 0.15*(lim[2]-lim[1]).
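
The binning and scaling described for riskdist can be sketched as follows. The sketch takes "101 bins" literally and uses simulated probabilities; both the data and the variable names are assumptions for illustration, and the package's own binning may differ in detail.

```r
set.seed(5)
lim <- c(0, 1)
p   <- rbeta(1000, 2, 5)   # hypothetical probabilities to display

# 101 bins spanning lim[1] to lim[2] (102 edges give 101 bins)
breaks <- seq(lim[1], lim[2], length.out = 102)
counts <- as.vector(table(cut(p, breaks, include.lowest = TRUE)))

# Rescale so that the tallest bar has height 0.15 * (lim[2] - lim[1])
height <- counts / max(counts) * 0.15 * diff(lim)

range(height)
```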

Argument: cex
Character size for legend or for table of statistics when group is given

Argument: mkh
Size of symbols for legend. Default is 0.02 (see par()).

Argument: connect.group
Defaults to FALSE to only represent group fractions as triangles. Set to TRUE to also connect with a solid line.

Argument: connect.smooth
Defaults to TRUE to draw smoothed estimates using a dashed line. Set to FALSE to instead use dots at individual estimates.

Argument: g.group
number of quantile groups to use when group is given and variable is numeric.

Argument: evaluate
number of points at which to store the lowess-calibration curve. Default is 100.  If there are more than evaluate unique predicted probabilities, evaluate equally-spaced quantiles of the unique predicted probabilities, with linearly interpolated calibrated values, are retained for plotting (and stored in the object returned by val.prob).
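
One way to picture the thinning that evaluate describes is the base-R sketch below: the full lowess curve is reduced to equally spaced quantiles of the unique predictions, with calibrated values obtained by linear interpolation. The data and the names (sm, xs, stored) are assumptions for illustration, not the package's internal code.

```r
set.seed(9)
p  <- runif(1000)                 # hypothetical predicted probabilities
y  <- rbinom(1000, 1, p)          # simulated binary outcomes
sm <- lowess(p, y, iter = 0)      # full-resolution calibration curve

evaluate <- 100
ux <- unique(sm$x)

if (length(ux) > evaluate) {
  # equally spaced quantiles of the unique predicted probabilities ...
  xs <- quantile(ux, probs = seq(0, 1, length.out = evaluate), names = FALSE)
  # ... with calibrated values linearly interpolated at those points
  stored <- approx(sm$x, sm$y, xout = xs, ties = mean)
} else {
  stored <- list(x = sm$x, y = sm$y)
}

length(stored$x)   # number of retained points
```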

Argument: nmin
applies when group is given.  When nmin > 0, val.prob will not store coordinates of smoothed calibration curves in the outer tails, where there are fewer than nmin raw observations represented in those tails.  If for example nmin=50, the plot function will only plot the estimated calibration curve from a to b, where there are 50 subjects with predicted probabilities < a and 50 with predicted probabilities > b. nmin is ignored when computing accuracy statistics.
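
The cutoffs a and b in the nmin=50 example can be found from the sorted predictions, as in the sketch below. This is one reading of the description, applied to simulated data with no ties; the variable names are hypothetical and the package may handle ties or boundaries differently.

```r
set.seed(13)
p    <- runif(500)    # hypothetical predicted probabilities
nmin <- 50

ps <- sort(p)
a  <- ps[nmin + 1]              # exactly nmin predictions lie below a
b  <- ps[length(ps) - nmin]     # exactly nmin predictions lie above b

keep <- p >= a & p <= b         # range over which the curve would be drawn
c(below = sum(p < a), above = sum(p > b), kept = sum(keep))
```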

Argument: x
result of val.prob (with group in effect)

Argument: ...
optional arguments for labcurve (through plot).  Commonly used options are col (vector of colors for the strata plus overall) and lty.  Ignored for print.

Argument: stats
vector of column numbers of statistical indexes to write on plot

Argument: lwd.overall
line width for plotting the overall calibration curve

Argument: quantiles
a vector listing which quantiles should be indicated on each calibration curve using tick marks.  The values in quantiles can be any number of values from the following: .01, .025, .05, .1, .25, .5, .75, .9, .95, .975, .99. By default the 0.05 and 0.95 quantiles are indicated.

Argument: flag
a function of the matrix of statistics (rows representing groups) returning a vector of character strings (one value for each group, including "Overall").  plot.val.prob will print this vector of character values to the left of the statistics.  The flag function can refer to columns of the matrix used as input to the function by their names given in the description above.  The default function returns "*" if either ChiSq2 or B ChiSq is significant at the 0.01 level and " " otherwise.
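
To show what a flag function receives and returns, the sketch below applies one to a small hand-made statistics matrix. The matrix values and the function name flag01 are hypothetical; the rule mirrors the documented default ("*" when either ChiSq2, on 2 d.f., or B ChiSq, on 1 d.f., is significant at the 0.01 level).

```r
# Hypothetical statistics matrix: one row per group plus "Overall"
stats <- rbind(
  low     = c(ChiSq2 = 1.2,  "B ChiSq" = 0.4),
  high    = c(ChiSq2 = 11.5, "B ChiSq" = 2.0),
  Overall = c(ChiSq2 = 4.0,  "B ChiSq" = 7.1)
)

# "*" when either test is significant at the 0.01 level, " " otherwise
flag01 <- function(stats) ifelse(
  stats[, "ChiSq2"]  > qchisq(0.99, 2) |
  stats[, "B ChiSq"] > qchisq(0.99, 1), "*", " ")

flag01(stats)
```

A function of this shape can be passed as the flag argument of the plot method, which prints the returned strings to the left of the corresponding rows of statistics.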

Details

The 2 d.f. chi-square test and Med OR exclude predicted or calibrated predicted probabilities ≤ 0 or ≥ 1, adjusting the sample size as needed.

Value

val.prob without group returns a vector with the following named elements: Dxy, R2, D, D:Chi-sq, D:p, U, U:Chi-sq, U:p, Q, Brier, Intercept, Slope, S:z, S:p, Emax. When group is present, val.prob returns an object of class val.prob containing a list with summary statistics and calibration curves for all the strata plus "Overall".

Author(s)

Frank Harrell
Department of Biostatistics, Vanderbilt University
f.harrell@vanderbilt.edu

References

Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors.  Stat in Med 15:361–387.
accuracy of probability predictions (Technical Report).
logistic regression models.  Stat in Med 10:1213–1226.
prediction scores.  Stat in Med 28:377–388.
of discriminant analysis and logistic regression under multivariate normality.  In Biostatistics: Statistics in Biomedical, Public Health, and Environmental Sciences.  The Bernard G. Greenberg Volume, ed. PK Sen. New York: North-Holland, p. 333–343.
London: Methuen.
Stat in Med 5:421–433.
Clin Epi 63:938-939
models-A new proposal:The coefficient of discrimination.  Am Statist 63:366–372.

See Also

validate.lrm, lrm.fit, lrm, labcurve, wtd.stats, scat1d

Examples

# Fit logistic model on 100 observations simulated from the actual
# model given by Prob(Y=1 given X1, X2, X3) = 1/(1+exp[-(-1 + 2X1)]),
# where X1 is a random uniform [0,1] variable.  Hence X2 and X3 are
# irrelevant.  After fitting a linear additive model in X1, X2,
# and X3, the coefficients are used to predict Prob(Y=1) on a
# separate sample of 100 observations.  Note that data splitting is
# an inefficient validation method unless n > 20,000.

library(rms)  # provides lrm() and val.prob()
set.seed(1)
n <- 200
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
logit <- 2*(x1-.5)
P <- 1/(1+exp(-logit))
y <- ifelse(runif(n)<=P, 1, 0)
d <- data.frame(x1,x2,x3,y)
f <- lrm(y ~ x1 + x2 + x3, subset=1:100)
pred.logit <- predict(f, d[101:200,])
phat <- 1/(1+exp(-pred.logit))
val.prob(phat, y[101:200], m=20, cex=.5)  # subgroups of 20 obs.

# Validate predictions more stringently by stratifying on whether
# x1 is above or below the median

v <- val.prob(phat, y[101:200], group=x1[101:200], g.group=2)
v
plot(v)
plot(v, flag=function(stats) ifelse(
  stats[,'ChiSq2'] > qchisq(.95,2) |
  stats[,'B ChiSq'] > qchisq(.95,1), '*', ' ') )
# Stars rows of statistics in plot corresponding to significant
# mis-calibration at the 0.05 level instead of the default, 0.01

plot(val.prob(phat, y[101:200], group=x1[101:200], g.group=2),
            col=1:3) # 3 colors (1 for overall)

# Weighted calibration curves
# plot(val.prob(pred, y, group=age, weights=freqs))

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

