val.surv(rms)
val.surv() is part of the R package rms
Validate Predicted Probabilities Against Observed Survival Times
----------Description----------
The val.surv function is useful for validating predicted survival probabilities against right-censored failure times. If u is specified, the hazard regression function hare in the polspline package is used to relate predicted survival probability at time u to observed survival times (and censoring indicators) to estimate the actual survival probability at time u as a function of the estimated survival probability at that time, est.surv. If est.surv is not given, fit must be specified and the survest function is used to obtain the predicted values (using newdata if it is given, or using the stored linear predictor values if not). hare is given the sole predictor fun(est.surv) where fun is given by the user or is inferred from fit. fun is the function of predicted survival probabilities that one expects to create a linear relationship with the linear predictors.
hare uses an adaptive procedure to find a linear spline of fun(est.surv) in a model where the log hazard is a linear spline in time t, and cross-products between the two splines are allowed so as to not assume proportional hazards. Thus hare assumes that the covariate and time functions are smooth but not much else, if the number of events in the dataset is large enough for obtaining a reliable flexible fit. There are special print and plot methods when u is given. In this case, val.surv returns an object of class "val.survh", otherwise it returns an object of class "val.surv".
If u is not specified, val.surv uses Cox-Snell (1968) residuals on the cumulative probability scale to check on the calibration of a survival model against right-censored failure time data. If the predicted survival probability at time t for a subject having predictors X is S(t|X), this method is based on the fact that the predicted probability of failure before time t, 1 - S(t|X), when evaluated at the subject's actual survival time T, has a uniform (0,1) distribution. The quantity 1 - S(T|X) is right-censored when T is. By getting one minus the Kaplan-Meier estimate of the distribution of 1 - S(T|X) and plotting against the 45 degree line we can check for calibration accuracy. A more stringent assessment can be obtained by stratifying this analysis by an important predictor variable. The theoretical uniform distribution is only an approximation when the survival probabilities are estimates and not population values.
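The idea behind this check can be sketched with simulated data and the survival package. This is a minimal illustration under stated assumptions, not the internals of val.surv: the predictor x, the hazard h, and the variable names are hypothetical, and the "predicted" probabilities come from the true data-generating model, so the curve should track the 45 degree line.

```r
library(survival)  # Surv() and survfit()

set.seed(1)
n    <- 2000
x    <- rnorm(n)               # hypothetical single predictor
h    <- 0.1 * exp(0.5 * x)     # true exponential hazard per subject
T0   <- rexp(n, rate = h)      # true failure times
C    <- runif(n, 0, 15)        # censoring times
time <- pmin(T0, C)            # observed follow-up time
ev   <- as.integer(T0 <= C)    # 1 = failure observed, 0 = censored

# Predicted probability of failure by the observed time, 1 - S(T|X).
# It is right-censored exactly when the time itself is censored.
Fhat <- 1 - exp(-h * time)

# One minus the Kaplan-Meier estimate of the distribution of Fhat
km  <- survfit(Surv(Fhat, ev) ~ 1)
cdf <- 1 - km$surv

plot(km$time, cdf, type = "s",
     xlab = "Predicted Pr(failure by T | X)",
     ylab = "Estimated fraction at or below")
abline(0, 1, lty = 2)          # 45 degree line = perfect calibration

# Largest departure from the 45 degree line, ignoring the noisy tail
# where fewer than 50 subjects remain at risk
ok      <- km$n.risk >= 50
max_dev <- max(abs(cdf[ok] - km$time[ok]))
max_dev
```

A miscalibrated model (for example, replacing h by 2 * h when computing Fhat) would pull the step curve visibly away from the dashed line.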
When censor is specified to val.surv, a different validation is done that is more stringent but that only uses the uncensored failure times. This method is used for type I censoring when the theoretical censoring times are known for subjects having uncensored failure times. Let T, C, and F denote respectively the failure time, censoring time, and cumulative failure time distribution (1 - S). The expected value of F(T | X) is 0.5 when T represents the subject's actual failure time. The expected value for an uncensored time is the expected value of F(T | T ≤ C, X) = 0.5 F(C | X). A smooth plot of F(T|X) - 0.5 F(C|X) for uncensored T should be a flat line through y=0 if the model is well calibrated. A smooth plot of 2F(T|X)/F(C|X) for uncensored T should be a flat line through y=1.0. The smooth plot is obtained by smoothing the (linear predictor, difference or ratio) pairs.
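The expectation used here follows because F(T | X) is uniform on (0,1), so conditional on T being no larger than C it is uniform on (0, F(C | X)), with mean 0.5 F(C | X). The base R simulation below is a sketch of that reasoning only. The predictor x, hazard h, and the use of lowess as the smoother are assumptions made for illustration and are not taken from the val.surv source.

```r
set.seed(2)
n  <- 5000
x  <- rnorm(n)                 # hypothetical predictor
lp <- 0.5 * x                  # linear predictor
h  <- 0.1 * exp(lp)            # true exponential hazard
T0 <- rexp(n, rate = h)        # failure times
C  <- runif(n, 0, 15)          # censoring times, known for every subject

unc <- T0 <= C                 # keep only uncensored failure times
FT  <- 1 - exp(-h[unc] * T0[unc])   # F(T | X) at the observed failure time
FC  <- 1 - exp(-h[unc] * C[unc])    # F(C | X) at the censoring time

difference <- FT - 0.5 * FC    # should average about 0
ratio      <- 2 * FT / FC      # should average about 1

mean_diff  <- mean(difference)
mean_ratio <- mean(ratio)
c(difference = mean_diff, ratio = mean_ratio)

# Smooth the (linear predictor, ratio) pairs; a flat line near 1 is expected
sm <- lowess(lp[unc], ratio)
plot(sm, type = "l", ylim = c(0, 2),
     xlab = "Linear predictor", ylab = "2 F(T|X) / F(C|X)")
abline(h = 1, lty = 2)
```

Because the true model is used, the smoothed ratio should hover around 1 across the whole range of the linear predictor; a trend in the smooth would indicate miscalibration that depends on the predictors.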
----------Usage----------
val.surv(fit, newdata, S, est.surv, censor,
u, fun, lim, evaluate=100, pred, maxdim=5, ...)
## S3 method for class 'val.survh'
print(x, ...)
## S3 method for class 'val.survh'
plot(x, lim, xlab, ylab,
riskdist=TRUE, add=FALSE,
scat1d.opts=list(nhistSpike=200), ...)
## S3 method for class 'val.surv'
plot(x, group, g.group=4,
what=c('difference','ratio'),
type=c('l','b','p'),
xlab, ylab, xlim, ylim, datadensity=TRUE, ...)
----------Arguments----------
Argument: fit
a fit object created by cph or psm
Argument: newdata
a data frame for which val.surv should obtain predicted survival probabilities. If omitted, survival estimates are made for all of the subjects used in fit.
Argument: S
a Surv object
Argument: est.surv
a vector of estimated survival probabilities corresponding to times in the first column of S.
Argument: censor
a vector of censoring times. Only the censoring times for uncensored observations are used.
Argument: u
a single numeric follow-up time
Argument: fun
a function that transforms survival probabilities into the scale of the linear predictor. If fit is given, and represents either a Cox, Weibull, or exponential fit, fun is automatically set to log(-log(p)).
Argument: lim
a 2-vector specifying limits of predicted survival probabilities for obtaining estimated actual probabilities at time u. Default for val.surv is the limits for predictions from datadist, which for large n is the 10th smallest and 10th largest predicted survival probability. For plot.val.survh, the default for lim is the range of the combination of predicted probabilities and calibrated actual probabilities. lim is used for both axes of the calibration plot.
Argument: evaluate
the number of evenly spaced points over the range of predicted probabilities. This defines the points at which calibrated predictions are obtained for plotting.
Argument: pred
a vector of points at which to evaluate predicted probabilities, overriding lim
Argument: maxdim
see hare
Argument: x
result of val.surv
Argument: xlab
x-axis label. For plot.val.survh, defaults for xlab and ylab come from u and the units of measurement for the raw survival times.
Argument: ylab
y-axis label
Argument: riskdist
set to FALSE to not call scat1d to draw the distribution of predicted (uncalibrated) probabilities
Argument: add
set to TRUE if adding to an existing plot
Argument: scat1d.opts
a list of options to pass to scat1d. By default, the option nhistSpike=200 is passed so that a spike histogram is used if the sample size exceeds 200.
Argument: ...
When u is given to val.surv, ... represents optional arguments to hare. It can represent arguments to pass to plot or lines for plot.val.survh. Otherwise, ... contains optional arguments for plsmo or plot. For print.val.survh, ... is ignored.
Argument: group
a grouping variable. If numeric this variable is grouped into g.group quantile groups (default is quartiles). group, g.group, what, and type apply when u is not given.
Argument: g.group
number of quantile groups to use when group is given and variable is numeric.
Argument: what
the quantity to plot when censor was in effect. The default is to show the difference between cumulative probabilities and their expectation given the censoring time. Set what="ratio" to show the ratio instead.
Argument: type
Set to the default ("l") to plot the trend line only, "b" to plot both individual subjects' ratios and trend lines, or "p" to plot only points.
Argument: xlim
Argument: ylim
axis limits for plot.val.surv when the censor variable was used.
Argument: datadensity
By default, plot.val.surv will show the data density on each curve that is created as a result of censor being present. Set datadensity=FALSE to suppress these tick marks drawn by scat1d.
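Three of the arguments above can be illustrated in base R. The sketch below is illustrative only and does not reproduce the rms source: the baseline hazard h0, the linear predictor values, and the simulated age variable are hypothetical, and rms may build its evaluation grid and quantile groups with different helpers (for example cut2 from Hmisc) that place boundaries slightly differently.

```r
# 1. fun: log(-log(p)) maps survival probabilities onto the linear predictor
#    scale for a proportional hazards model S(u | lp) = exp(-h0 * u * exp(lp))
fun    <- function(p) log(-log(p))
u      <- 1                          # follow-up time
h0     <- 0.02                       # hypothetical baseline hazard
lp     <- seq(-2, 2, by = 0.5)       # hypothetical linear predictor values
S_u    <- exp(-h0 * u * exp(lp))     # survival probability at time u
offset <- fun(S_u) - lp              # constant: fun(S_u) is linear in lp
offset                               # every element equals log(h0 * u)

# 2. lim and evaluate: an evenly spaced grid of predicted probabilities
#    at which calibrated estimates would be computed (pred would override it)
lim      <- c(0.85, 1)
evaluate <- 100
grid     <- seq(lim[1], lim[2], length.out = evaluate)

# 3. group and g.group: a numeric grouping variable cut into quantile groups
set.seed(3)
age       <- 50 + 12 * rnorm(200)
g.group   <- 4                       # quartiles, the documented default
breaks    <- quantile(age, probs = seq(0, 1, length.out = g.group + 1))
age_group <- cut(age, breaks = breaks, include.lowest = TRUE)
table(age_group)
```

The first part shows why log(-log(p)) is the natural choice of fun for Cox, Weibull, and exponential fits: on that scale the predicted survival probability differs from the linear predictor only by a constant.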
----------Value----------
a list of class "val.surv" or "val.survh"
----------Author(s)----------
Frank Harrell
Department of Biostatistics, Vanderbilt University
f.harrell@vanderbilt.edu
----------References----------
discussion). JRSSB 30:248–275.
validation of a prognostic model for survival time data: application to prognosis of HIV positive patients treated with antiretroviral therapy. Stat in Med 23:2375–2398.
prediction scores. Stat in Med 28:377–388.
----------See Also----------
validate, calibrate, hare, scat1d, cph, psm, groupkm
----------Examples----------
library(rms)       # val.surv, psm, groupkm; also attaches Hmisc (label, units)
library(survival)  # Surv
# Generate failure times from an exponential distribution
set.seed(123)      # so can reproduce results
n <- 1000
age <- 50 + 12*rnorm(n)
sex <- factor(sample(c('Male','Female'), n, replace=TRUE, prob=c(.6, .4)))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
t <- -log(runif(n))/h
units(t) <- 'Year'
label(t) <- 'Time to Event'
ev <- ifelse(t <= cens, 1, 0)
t <- pmin(t, cens)
S <- Surv(t, ev)
# First validate true model used to generate data
# If hare is available, make a smooth calibration plot for 1-year
# survival probability where we predict 1-year survival using the
# known true population survival probability
# In addition, use groupkm to show that grouping predictions into
# intervals and computing Kaplan-Meier estimates is not as accurate.
if(requireNamespace('polspline', quietly=TRUE)) {
  s1 <- exp(-h*1)
  w <- val.surv(est.surv=s1, S=S, u=1,
                fun=function(p) log(-log(p)))
  plot(w, lim=c(.85,1), scat1d.opts=list(nhistSpike=200, side=1))
  groupkm(s1, S, m=100, u=1, pl=TRUE, add=TRUE)
}
# Now validate the true model using residuals
w <- val.surv(est.surv=exp(-h*t), S=S)
plot(w)
plot(w, group=sex)  # stratify by sex
# Now fit an exponential model and validate
# Note this is not really a validation as we're using the
# training data here
f <- psm(S ~ age + sex, dist='exponential', y=TRUE)
w <- val.surv(f)
plot(w, group=sex)
# We know the censoring time on every subject, so we can
# compare the predicted Pr[T <= observed T | T <= C, X] to
# its expectation 0.5 Pr[T <= C | X] where C = censoring time
# By default we plot a difference that should equal zero;
# use what='ratio' to plot a ratio that should equal one
w <- val.surv(f, censor=cens)
plot(w)
plot(w, group=age, g.group=3)  # stratify by tertile of age