rmsOverview(rms)
R package: rms
Overview of rms Package
Description
rms is the package that goes along with the book Regression Modeling Strategies. rms does regression modeling, testing, estimation, validation, graphics, prediction, and typesetting by storing enhanced model design attributes in the fit. rms is a re-written version of the Design package that has improved graphics and duplicates very little code in the survival package.
The package is a collection of about 180 functions that assist and streamline modeling, especially for biostatistical and epidemiologic applications. It also contains functions for binary and ordinal logistic regression models and the Buckley-James multiple regression model for right-censored responses, and it implements penalized maximum likelihood estimation for logistic and ordinary linear models. rms works with almost any regression model, but it was especially written to work with logistic regression, Cox regression, accelerated failure time models, ordinary linear models, the Buckley-James model, generalized least squares for longitudinal data (using the nlme package), generalized linear models, and quantile regression (using the quantreg package). rms requires the Hmisc package to be installed. Note that Hmisc has several functions useful for data analysis (especially data reduction and imputation).
The older references below, which pertain to the Design package, are also relevant to rms.
Details
To make use of the automatic typesetting features you must have LaTeX or one of its variants installed.
Some aspects of rms (e.g., latex) will not work correctly if contrasts other than c("contr.treatment", "contr.poly") are set via options(contrasts=).
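A minimal base-R sketch of checking the contrasts option and temporarily restoring the treatment/polynomial defaults before working with rms. The object names (current, old_opts, g, mm) are illustrative only; options() returns the previous settings, so they can be put back afterwards.

```r
# Inspect the global contrasts setting that rms (e.g., latex) expects
current <- getOption("contrasts")
print(current)

# Force treatment contrasts for unordered and polynomial for ordered factors;
# options() returns the old values so they can be restored later
old_opts <- options(contrasts = c("contr.treatment", "contr.poly"))

# Under contr.treatment a 3-level factor yields two dummy columns
g  <- factor(c("a", "b", "c"))
mm <- model.matrix(~ g)
print(colnames(mm))

# ... fit and typeset rms models here ...

# Restore whatever was in effect before
options(old_opts)
```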
rms relies on a wealth of survival analysis functions written by Terry Therneau of Mayo Clinic. Front-ends have been written for several of Therneau's functions, and other functions have been slightly modified.
Statistical Methods Implemented
Ordinary linear regression models
Binary and ordinal logistic models (proportional odds and continuation ratio models)
Cox model
Parametric survival models in the accelerated failure time class
Buckley-James least-squares linear regression model with possibly right-censored responses
Generalized linear model
Quantile regression
Generalized least squares
Bootstrap model validation to obtain unbiased estimates of model performance without requiring a separate validation sample
Automatic Wald tests of all effects in the model that are not parameterization-dependent (e.g., tests of nonlinearity of main effects when the variable does not interact with other variables, tests of nonlinearity of interaction effects, tests for whether a predictor is important, either as a main effect or as an effect modifier)
Graphical depictions of model estimates (effect plots, odds/hazard ratio plots, nomograms that allow model predictions to be obtained manually even when there are nonlinear effects and interactions in the model)
Various smoothed residual plots, including some new residual plots for verifying ordinal logistic model assumptions
Composing S functions to evaluate the linear predictor (X*beta hat), hazard function, survival function, and quantile functions analytically from the fitted model
Typesetting of fitted model using LaTeX
Robust covariance matrix estimation (Huber or bootstrap)
Cubic regression splines with linear tail restrictions (natural splines)
Tensor splines
Interactions restricted to not be doubly nonlinear
Penalized maximum likelihood estimation for ordinary linear regression and logistic regression models. Different parts of the model may be penalized by different amounts; e.g., you may want to penalize interaction or nonlinear effects more than main effects or linear effects
Estimation of hazard or odds ratios in the presence of nonlinearity and interaction
Sensitivity analysis for an unmeasured binary confounder in a binary logistic model
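The "cubic regression splines with linear tail restrictions" item can be made concrete with a small base-R sketch of the restricted cubic spline basis in its unnormalized textbook form. The helper name rcs_basis is hypothetical; rms::rcs also picks knots from quantiles and, by default, rescales the nonlinear terms, so its columns would differ from these by a constant factor.

```r
# Unnormalized restricted cubic spline basis (hypothetical helper, not rms::rcs).
# For k knots it returns k - 1 columns: x itself plus k - 2 nonlinear terms that
# are zero left of the first knot and force linearity beyond the last knot.
rcs_basis <- function(x, knots) {
  k <- length(knots)
  stopifnot(k >= 3, !is.unsorted(knots))
  pos3 <- function(u) pmax(u, 0)^3          # truncated power (u)_+^3
  tk  <- knots[k]
  tk1 <- knots[k - 1]
  nonlin <- vapply(seq_len(k - 2), function(j) {
    tj <- knots[j]
    pos3(x - tj) -
      pos3(x - tk1) * (tk - tj) / (tk - tk1) +
      pos3(x - tk)  * (tk1 - tj) / (tk - tk1)
  }, numeric(length(x)))
  basis <- cbind(x, matrix(nonlin, nrow = length(x)))
  colnames(basis) <- c("x", paste0("x", seq_len(k - 2), "'"))
  basis
}

knots  <- c(1, 2, 3, 4)
B_tail <- rcs_basis(5:8, knots)       # all points lie beyond the last knot
print(B_tail)
# Second differences are zero in every column: the basis is linear there
print(apply(B_tail, 2, diff, differences = 2))
```

The linear-tail constraint is what keeps the fitted curve from bending wildly outside the outer knots, where data are sparse.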
Motivation
rms was motivated by the following needs:
need to automatically print interesting Wald tests that can be constructed from the design
tests of linearity with respect to each predictor
tests of linearity of interactions
pooled interaction tests (e.g., all interactions involving race)
pooled tests of effects with higher order effects
test of main effect not meaningful when effect in interaction
pooled test of main effect + interaction effect is meaningful
test of 2nd-order interaction + any 3rd-order interaction containing those factors is meaningful
need to store transformation parameters with the fit
example: knot locations for spline functions
these are "remembered" when getting predictions, unlike standard S or R
for categorical predictors, save levels so that the same dummy variables will be generated for predictions; check that all levels in out-of-data predictions were present when the model was fitted
need for uniform re-insertion of observations deleted because of NAs when using predict without newdata or when using resid
need to easily plot the regression effect of any predictor
example: age is represented by a linear spline with knots at 40 and 60 years; plot the effect of age on the log odds of disease, adjusting interacting factors to easily specified constants
vary 2 predictors: plot x1 on the x-axis with separate curves for discrete x2, or a 3-D perspective plot for continuous x2
if a predictor is represented as a function in the model, plots should be with respect to the original variable:
    f <- lrm(y ~ log(cholesterol) + age)
    plot(Predict(f, cholesterol))   # cholesterol on x-axis, default range
need to store a summary of the distribution of predictors with the fit
plotting limits (default: 10th smallest, 10th largest values or %-tiles)
effect limits (default: .25 and .75 quantiles for continuous vars.)
adjustment values for other predictors (default: median for continuous predictors, most frequent level for categorical ones)
discrete numeric predictors: list of possible values; example: x=0,1,2,3,5 -> by default don't plot prediction at x=4
values are on the inner-most variable, e.g. cholesterol, not log(chol.)
allows estimation/plotting long after the original dataset has been deleted
for Cox models, the underlying survival is also stored with the fit, so the original data are not needed to obtain predicted survival curves
need to automatically print estimates of effects in the presence of nonlinearity and interaction
example: age is quadratic and interacts with sex; the default effect is the inter-quartile-range hazard ratio (for a Cox model) for sex = reference level
user-controlled effects: summary(fit, age=c(30,50), sex="female") -> odds ratios for logistic model, relative survival time for accelerated failure time survival models
effects for all variables (e.g. odds ratios) may be plotted with multiple-confidence-level bars
need for prettier and more concise effect names in printouts, especially for expanded nonlinear terms and interaction terms
use inner-most variable name to identify predictors
e.g. for pmin(x^2-3,10) refer to factor with legal S-name x
need to recognize that an intercept is not always a simple concept
some models (e.g., Cox) have no intercept
some models (e.g., ordinal logistic) have multiple intercepts
need for automatic high-quality printing of the fitted mathematical model (with dummy variables defined, regression spline terms simplified, interactions "factored"). Focus is on regression splines instead of nonparametric smoothers or smoothing splines, so that explicit formulas for the fit may be obtained for use outside S. rms can also compose S functions to evaluate X*Beta from the fitted model analytically, as well as compose SAS code to do this.
need for automatic drawing of a nomogram to represent the fitted model
need for automatic bootstrap validation of a fitted model, with only one S command (with respect to calibration and discrimination)
need for a robust (Huber sandwich) estimator of the covariance matrix, and to be able to do all other analyses (e.g., plots, C.L.) using the adjusted covariances
need for a robust (bootstrap) estimator of the covariance matrix, easily used in other analyses without change
need for Huber sandwich and bootstrap covariance matrices adjusted for cluster sampling
need for routine reporting of how many observations were deleted by missing values on each predictor (see na.delete in Hmisc)
need for optional reporting of descriptive statistics for Y stratified by missing status of each X (see na.detail.response)
need for pretty, annotated survival curves, using the same commands for parametric and Cox models
need for ordinal logistic model (proportional odds model, continuation ratio model)
need for estimating and testing general contrasts without having to be conscious of variable coding or parameter order
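To illustrate the need for effect estimates in the presence of nonlinearity and interaction, here is a base-R sketch (using glm, not rms) of an inter-quartile-range odds ratio for age when age is quadratic and interacts with sex, evaluated at the reference level of sex. The simulated data and coefficients are made up for illustration; in rms, summary() automates this and also supplies confidence limits.

```r
set.seed(1)
n   <- 500
age <- runif(n, 20, 80)
sex <- factor(sample(c("female", "male"), n, replace = TRUE))
# Hypothetical true model: quadratic in age, with an age x sex interaction
lp  <- -1 + 0.001 * (age - 50)^2 + 0.02 * (age - 50) * (sex == "male")
y   <- rbinom(n, 1, plogis(lp))

# poly(raw = TRUE) keeps age and age^2 together as a single term
fit <- glm(y ~ poly(age, 2, raw = TRUE) * sex, family = binomial)

# Effect limits: lower and upper quartiles of age
age_q <- quantile(age, c(0.25, 0.75), names = FALSE)
nd    <- data.frame(age = age_q, sex = factor("female", levels = levels(sex)))
eta   <- predict(fit, newdata = nd, type = "link")  # log odds at Q1 and Q3
iqr_or <- unname(exp(eta[2] - eta[1]))               # odds ratio Q3 vs Q1
print(iqr_or)
```

Because age enters nonlinearly and interacts with sex, no single coefficient is "the" age effect; contrasting predictions at two chosen age values with the other predictors held fixed is what makes the estimate interpretable.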
Fitting Functions Compatible with rms
rms will work with a wide variety of fitting functions, but it is meant especially for the following:
Purpose
Ordinary least squares linear model
Binary and ordinal logistic regression model
Accelerated failure time parametric survival model
Cox proportional hazards regression
Buckley-James censored least squares linear model
Version of glm for use with rms
Version of gls for use with rms
Version of rq for use with rms
Methods in rms
The following generic functions work with fits with rms in effect:
Purpose
Print parameters and statistics of fit
Fitted regression coefficients
Formula used in the fit
Detailed specifications of fit
Robust covariance matrix estimates
Bootstrap covariance matrix estimates
Summary of effects of predictors
Plot continuously shaded confidence bars for results of summary
Wald tests of most meaningful hypotheses
General contrasts, C.L., tests
Depict results of anova graphically
Partial predictor effects
Plot predictor effects using lattice graphics
3-D plot of effects of varying two continuous predictors
Generate data frame with predictor combinations (optionally interactively)
Obtain predicted values or design matrix
Fast backward step-down variable selection
Residuals, influence statistics from fit
Which observations are overly influential
Sensitivity of one binary predictor in lrm and cph models to an unmeasured binary confounder
LaTeX representation of fitted model or anova or summary table
S function analytic representation of a fitted regression model (X*Beta)
S function analytic representation of a fitted hazard function (for psm)
S function analytic representation of fitted survival function (for psm, cph)
S function analytic representation of fitted function for quantiles of survival time (for psm, cph)
Draws a nomogram for the fitted model
Estimate survival probabilities (for psm, cph)
Plot survival curves (psm, cph)
Validate indexes of model fit using resampling
Estimate calibration curve for model using resampling
Variance inflation factors for a fit
Bring elements corresponding to missing data back into predictions and residuals
Print summary of missing values
Find optimum penalty for penalized MLE
Print effective d.f. for each type of variable in model, for penalized fit or pentrace result
Impute repeated measures data with non-random dropout (experimental, non-functional)
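As an illustration of what a robust (Huber sandwich) covariance estimate computes, here is a base-R sketch for an ordinary least squares fit using the HC0 form (X'X)^-1 X' diag(e^2) X (X'X)^-1. This is not the rms implementation (the rms method listed in the table above also handles cluster sampling and works on rms fit objects), and the heteroscedastic simulated data are hypothetical.

```r
set.seed(2)
n <- 200
x <- runif(n, 0, 10)
y <- 1 + 0.5 * x + rnorm(n, sd = 0.2 + 0.3 * x)   # error SD grows with x

fit <- lm(y ~ x)
X   <- model.matrix(fit)
e   <- residuals(fit)

bread <- solve(crossprod(X))            # (X'X)^{-1}
meat  <- crossprod(X * e)               # X' diag(e^2) X, row i scaled by e_i
V_sandwich <- bread %*% meat %*% bread  # HC0 sandwich covariance

# Model-based versus robust standard errors
se <- cbind(model  = sqrt(diag(vcov(fit))),
            robust = sqrt(diag(V_sandwich)))
print(se)
```

The design choice here is to avoid forming the n x n matrix diag(e^2): scaling the rows of X by the residuals gives the same "meat" with far less memory.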
Background for Examples
The following programs demonstrate how the pieces of the rms package work together. A (usually) one-time call to the function datadist requires a pass at the entire data frame to store distribution summaries for potential predictor variables. These summaries contain (by default) the .25 and .75 quantiles of continuous variables (for estimating effects such as odds ratios), the 10th smallest and 10th largest values (or .1 and .9 quantiles for small n) for plotting ranges for estimated curves, and the total range. For discrete numeric variables (those having <=10 unique values), the list of unique values is also stored. Such summaries are used by the summary.rms, Predict, and nomogram.rms functions. You may save time and defer running datadist. In that case, the distribution summary is not stored with the fit object, but it can be gathered before running summary or plot.
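The per-variable summaries described above can be sketched in base R. The helper summarize_predictor is hypothetical and only mimics the defaults listed here (quartiles as effect limits, 10th smallest and largest values as plot limits, the median as adjustment value, and the list of unique values for variables with at most 10 distinct values). The n >= 20 cutoff for falling back to the .1 and .9 quantiles is an assumption; datadist itself also handles factors and other details.

```r
summarize_predictor <- function(x) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  u <- unique(x)
  list(
    effect_limits = unname(quantile(x, c(0.25, 0.75))),
    # 10th smallest / 10th largest; assumed fallback to quantiles for small n
    plot_limits   = if (n >= 20) c(x[10], x[n - 9])
                    else unname(quantile(x, c(0.1, 0.9))),
    adjust_to     = median(x),
    range         = range(x),
    values        = if (length(u) <= 10) u else NULL   # discrete numeric
  )
}

set.seed(3)
age  <- round(rnorm(300, 50, 10))
ndis <- sample(c(0, 1, 2, 3, 5), 300, replace = TRUE)
s_age  <- summarize_predictor(age)
s_ndis <- summarize_predictor(ndis)
str(s_age)
str(s_ndis)   # values 0,1,2,3,5: so no prediction would be plotted at 4
```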
    library(rms)                          # also attaches Hmisc
    d <- datadist(my.data.frame)          # or datadist(x1, x2)
    options(datadist = "d")               # omit this or use options(datadist=NULL)
                                          # if datadist has not been run yet
    f <- ols(y ~ x1 * x2, data = my.data.frame)
    anova(f)
    fastbw(f)
    Predict(f, x2)
    predict(f, newdata)                   # newdata: a data frame of new x1, x2 values
In the Examples section there are three detailed examples using lrm (logistic regression model), a fitting function designed to be used with rms. In Detailed Example 1 we create 3 predictor variables and two binary responses on 500 subjects. For the first binary response, dz, the true model involves only sex and age, and there is a nonlinear interaction between the two because the log odds is a truncated linear relationship in age for females and a quadratic function for males. For the second binary outcome, dz.bp, the true population model also involves systolic blood pressure (sys.bp) through a truncated linear relationship. First, nonparametric estimation of relationships is done using the Hmisc package's plsmo function, which uses lowess with outlier detection turned off for binary responses. Then parametric modeling is done using restricted cubic splines. This modeling does not assume that we know the true transformations for age or sys.bp, only that these transformations are smooth (which is not actually the case in the population).
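The data-generating setup described for Detailed Example 1 might be simulated along the following lines. The coefficients, the age and blood-pressure distributions, and the truncation points are assumptions chosen for illustration; they are not the values used in the package's own example.

```r
set.seed(17)
n      <- 500
age    <- rnorm(n, 50, 10)
sys.bp <- rnorm(n, 120, 15)
sex    <- factor(sample(c("female", "male"), n, replace = TRUE))

# dz: log odds truncated-linear in age for females, quadratic for males,
# i.e. a nonlinear age x sex interaction; sys.bp plays no role
L_dz <- ifelse(sex == "female",
               0.05 * (pmin(age, 60) - 50),     # flat above age 60
               -0.5 + 0.001 * (age - 45)^2)
dz <- rbinom(n, 1, plogis(L_dz))

# dz.bp: same age/sex structure plus a truncated-linear sys.bp effect
L_dz.bp <- L_dz + 0.04 * (pmax(sys.bp, 110) - 120)   # flat below 110
dz.bp   <- rbinom(n, 1, plogis(L_dz.bp))

table(sex, dz)
```

Since neither truncated-linear piece is smooth, a restricted cubic spline fit to these data can only approximate the true shape, which is the point the example makes.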
For Detailed Example 2, suppose that a categorical variable treat has values "a", "b", and "c", an ordinal variable num.diseases has values 0,1,2,3,4, and that there are two continuous variables, age and cholesterol. age is fitted with a restricted cubic spline, while cholesterol is transformed using the transformation log(cholesterol - 10). Cholesterol is missing on three subjects, and we impute these using the overall median cholesterol. We wish to allow for interaction between treat and cholesterol. The following S program will fit a logistic model, test all effects in the design, estimate effects, and plot estimated transformations. The fit for num.diseases really considers the variable to be a 5-level categorical variable. The only difference is that a 3 d.f. test of linearity is done to assess whether the variable can be re-modeled "asis". Here we also show statements to attach the rms package and store predictor characteristics from datadist.
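The data preparation described for Detailed Example 2 (a three-level factor, an ordinal count treated as a 5-level categorical variable, and median imputation of three missing cholesterol values before the log(cholesterol - 10) transformation) can be sketched in base R. All simulated values are hypothetical; in an actual rms fit the transformation would normally be written inside the model formula.

```r
set.seed(5)
n <- 200
treat        <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
num.diseases <- sample(0:4, n, replace = TRUE)
age          <- rnorm(n, 50, 10)
cholesterol  <- rnorm(n, 200, 25)
cholesterol[c(1, 2, 3)] <- NA             # three subjects missing

# Impute the missing values with the overall median
chol_med <- median(cholesterol, na.rm = TRUE)
cholesterol[is.na(cholesterol)] <- chol_med

# num.diseases as a 5-level categorical predictor
num.diseases.f <- factor(num.diseases, levels = 0:4)

# The transformation used in the example
log_chol <- log(cholesterol - 10)
summary(log_chol)
```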
Detailed Example 3 shows some of the survival analysis capabilities of rms related to the Cox proportional hazards model. We simulate data for 2000 subjects with 2 predictors, age and sex. In the true population model, the log hazard function is linear in age and there is no age x sex interaction. In the analysis below we do not make use of the linearity in age. rms makes use of many of Terry Therneau's survival functions that are built into S.
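A base-R sketch of the kind of data described for Detailed Example 3: 2000 subjects, a log hazard linear in age, a sex effect, and no age x sex interaction, with exponential event times and independent uniform censoring. The baseline hazard, coefficients, and censoring scheme are assumptions made for illustration.

```r
set.seed(3)
n   <- 2000
age <- rnorm(n, 50, 10)
sex <- factor(sample(c("Male", "Female"), n, replace = TRUE, prob = c(0.6, 0.4)))

# Log hazard linear in age, shifted for females, no age x sex interaction
log_h <- log(0.02) + 0.04 * (age - 50) + 0.8 * (sex == "Female")
event_time  <- rexp(n, rate = exp(log_h))   # constant hazard given covariates
censor_time <- runif(n, 0, 15)              # independent censoring

d.time <- pmin(event_time, censor_time)     # observed follow-up time
death  <- as.integer(event_time <= censor_time)
mean(death)                                 # proportion of observed deaths
```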
The following is a typical sequence of steps that would be used with rms in conjunction with the Hmisc transcan function to do single imputation of all NAs in the predictors (multiple imputation would be better but would be harder to do in the context of bootstrap model validation), fit a model, do backward stepdown to reduce the number of predictors in the model (with all the severe problems this can entail), and use the bootstrap to validate this stepwise model, repeating the variable selection for each re-sample. Here we take a short cut as the imputation is not repeated within the bootstrap.
In what follows we (atypically) have only 3 candidate predictors. In practice be sure to have the validate and calibrate functions operate on a model fit that contains all predictors that were involved in previous analyses that used the response variable. Here the imputation is necessary because backward stepdown would otherwise delete observations missing on any candidate variable.
Note that you would have to define x1, x2, x3, y to run the following code.
    library(rms)                          # attaches Hmisc (transcan, impute)
    xt <- transcan(~ x1 + x2 + x3, imputed=TRUE)
    impute(xt)                            # imputes any NAs in x1, x2, x3
    # Now fit original full model on filled-in data
    f <- lrm(y ~ x1 + rcs(x2,4) + x3, x=TRUE, y=TRUE)   # x,y allow boot.
    fastbw(f)                             # derives stepdown model (default stopping rule)
    validate(f, B=100, bw=TRUE)           # repeats fastbw 100 times
    cal <- calibrate(f, B=100, bw=TRUE)   # also repeats fastbw
    plot(cal)
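The idea behind bootstrap validation (estimating the optimism of an apparent performance index by refitting on bootstrap samples and subtracting it) can be sketched in base R with glm and the concordance (C) index. This simplified version does not repeat variable selection or imputation inside the loop, unlike validate with bw=TRUE, and the helper names c_index and optimism_corrected_c are hypothetical.

```r
# Concordance (C) index for a binary outcome via the rank-sum identity;
# mid-ranks give tied predictions half credit
c_index <- function(score, y) {
  r  <- rank(score)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

optimism_corrected_c <- function(data, formula, B = 100) {
  fit      <- glm(formula, family = binomial, data = data)
  apparent <- c_index(predict(fit), data$y)
  optimism <- numeric(B)
  for (b in seq_len(B)) {
    boot  <- data[sample(nrow(data), replace = TRUE), ]
    fit_b <- glm(formula, family = binomial, data = boot)
    # performance on the bootstrap sample minus performance on original data
    optimism[b] <- c_index(predict(fit_b), boot$y) -
                   c_index(predict(fit_b, newdata = data), data$y)
  }
  c(apparent = apparent, optimism = mean(optimism),
    corrected = apparent - mean(optimism))
}

set.seed(4)
n   <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.8 * dat$x1))   # only x1 truly matters
res <- optimism_corrected_c(dat, y ~ x1 + x2 + x3, B = 50)
print(res)
```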
Common Problems to Avoid
Don't have a formula like y ~ age + age^2. In S you need to connect related variables using a function which produces a matrix, such as pol or rcs. This allows effect estimates (e.g., hazard ratios) to be computed as well as multiple d.f. tests of association.
Don't use poly or strata inside formulas used in rms. Use pol and strat instead.
Almost never code your own dummy variables or interaction variables in S. Let S do this automatically. Otherwise, anova can't do its job.
Almost never transform predictors outside of the model formula, as then plots of predicted values vs. predictor values, and other displays, would not be made on the original scale. Use instead something like y ~ log(cell.count+1), which will allow cell.count to appear on x-axes. You can get fancier, e.g., y ~ rcs(log(cell.count+1),4) to fit a restricted cubic spline with 4 knots in log(cell.count+1). For more complex transformations do something like:
    f <- function(x) {
      # ... various 'if' statements, etc.
      log(pmin(x, 50000) + 1)
    }
    fit1 <- lrm(death ~ f(cell.count))
    fit2 <- lrm(death ~ rcs(f(cell.count), 4))
Don't put $ inside variable names used in formulas. Either attach data frames or use data=.
Don't forget to use datadist. Try to use it at the top of your program so that all model fits can automatically take advantage of its distributional summaries for the predictors.
Don't validate or calibrate models which were reduced by dropping "insignificant" predictors. Proper bootstrap or cross-validation must repeat any variable selection steps for each re-sample. Therefore, validate or calibrate models which contain all candidate predictors, and if you must reduce models, specify the option bw=TRUE to validate or calibrate.
Dropping of "insignificant" predictors ruins much of the usual statistical inference for regression models (confidence limits, standard errors, P-values, chi-squares, ordinary indexes of model performance) and it also results in models which will have worse predictive discrimination.
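Two of the points above can be checked directly in base R. In an R formula, ^ is a formula operator (age^2 means age crossed with itself), so y ~ age + age^2 silently contains only a linear age term; and a transformation written inside the formula lets predict() take the raw variable. This sketch uses lm and poly with simulated data rather than an rms fitter; with rms one would use pol(age, 2) or rcs() instead.

```r
d <- data.frame(age = c(30, 40, 50, 60, 70))
# No squared column is created: only "(Intercept)" and "age"
print(colnames(model.matrix(~ age + age^2, data = d)))
# Keeping both columns together as one term (base R analogue of pol)
print(colnames(model.matrix(~ poly(age, 2, raw = TRUE), data = d)))

# Transformation inside the formula: predictions take the original scale
set.seed(6)
cell.count <- rpois(100, 200)
y   <- 2 + 0.5 * log(cell.count + 1) + rnorm(100, sd = 0.1)
fit <- lm(y ~ log(cell.count + 1))
pred <- predict(fit, newdata = data.frame(cell.count = c(100, 400)))
print(pred)
```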
Accessing the Package
Use require(rms).
发布应用程序的有效值和样条回归----------Published Applications of rms and Regression Splines----------
Spline fits
样条拟合
Spanos A, Harrell FE, Durack DT (1989): Differential diagnosis of acute meningitis: An analysis of the predictive value of initial observations. JAMA 2700-2707.
斯帕诺斯A,哈勒尔FE,:迪拉克DT(1989)的急性脑膜炎的鉴别诊断:初步观察分析的预测值。 JAMA 2700年至2707年。
Ohman EM, Armstrong PW, Christenson RH, et al. (1996): Cardiac troponin T levels for risk stratification in acute myocardial ischemia. New Eng J Med 335:1333-1341.
奥曼EM,阿姆斯特朗PW,克里斯坦森RH等。 (1996):心肌肌钙蛋白T水平在急性心肌缺血的危险分层。新英格兰医学杂志335:1333-1341。
Bootstrap calibration curve for a parametric survival model:
引导校准曲线的参数生存模型:
Knaus WA, Harrell FE, Fisher CJ, Wagner DP, et al. (1993): The clinical evaluation of new drugs for sepsis: A prospective study design based on survival analysis. JAMA 270:1233-1241.
诺斯WA,哈勒尔FE,费舍尔CJ,瓦格纳DP,等。 (1993):败血症:生存分析的基础上设计的前瞻性研究新的药物的临床评价。 JAMA 270:1233-1241。
Splines, interactions with splines, algebraic form of fitted model from latex.rms
样条曲线,互动与样条曲线的拟合模型,代数形式从latex.rms
Knaus WA, Harrell FE, Lynn J, et al. (1995): The SUPPORT prognostic model: Objective estimates of survival for seriously ill hospitalized adults. Annals of Internal Medicine 122:191-203.
诺斯市,哈勒尔FE,熊黛林J,等。 (1995):支持重病住院成人的生存预后模型:客观估计。通志内科122:191-203。
Splines, odds ratio chart from fitted model with nonlinear and interaction terms, use of transcan for imputation
样条曲线,比值比图拟合模型与非线性和交互,使用transcan的归集
Lee KL, Woodlief LH, Topol EJ, Weaver WD, Betriu A. Col J, Simoons M, Aylward P, Van de Werf F, Califf RM. Predictors of 30-day mortality in the era of reperfusion for acute myocardial infarction: results from an international trial of 41,021 patients. Circulation 1995;91:1659-1668.
利KL,Woodlief LH,白杨EJ,韦弗WD艾尔沃德Simoons中号,Betriu A.上校J,P,范·de WERF F·卡利夫RM。从国际审判的41,021例患者的30天死亡率在急性心肌梗死再灌注的时代:结果的预测。流通1995,91:1659-1668。
Splines, external validation of logistic models, prediction rules using point tables
样条曲线,Logistic模型的外部验证,用点表的预测规则
Steyerberg EW, Hargrove YV, et al (2001): Residual mass histology in testicular cancer: development and validation of a clinical prediction rule. Stat in Med 2001;20:3847-3859.
Steyerberg,哈格罗夫YV,EW等人(2001):睾丸癌的组织学:开发和验证的临床预测规则的残余质量。 2001年在医学统计; 20:3847-3859。
van Gorp MJ, Steyerberg EW, et al (2003): Clinical prediction rule for 30-day mortality in Bjork-Shiley convexo-concave valve replacement. J Clinical Epidemiology 2003;56:1006-1012.
面包车GORP MJ,Steyerberg EW等人(2003年):30天的死亡率比约克 - 希利的凹凸瓣膜置换的临床预测规则。 Ĵ临床流行病学2003,56:1006-1012。
Model fitting, bootstrap validation, missing value imputation
模型拟合,引导验证,遗漏值归集
Krijnen P, van Jaarsveld BC, Steyerberg EW, Man in 't Veld AJ, Schalekamp, MADH, Habbema JDF (1998): A clinical prediction rule for renal artery stenosis. Annals of Internal Medicine 129:705-711.
,2001 Krijnen P,BC,Steyerberg EW,在“T草原AJ,Schalekamp,MADH,Habbema JDF(1998年):肾动脉狭窄的临床预测规则的人。纪事内科129:705-711。
Model fitting, splines, bootstrap validation, nomograms
模型拟合,样条曲线,引导验证,诺模图
Kattan MW, Eastham JA, Stapleton AMF, Wheeler TM, Scardino PT. A preoperative nomogram for disease recurrence following radical prostatectomy for prostate cancer. J Natl Ca Inst 1998; 90(10):766-771.
Kattan MW,埃斯罕JA,斯台普顿AMF,惠勒TM,斯卡迪诺PT。后疾病复发的前列腺癌根治性前列腺切除术术前的诺模图。 ĴNATL钙研究所1998,90(10):766-771。
Kattan, MW, Wheeler TM, Scardino PT. A postoperative nomogram for disease recurrence following radical prostatectomy for prostate cancer. J Clin Oncol 1999; 17(5):1499-1507
Kattan,MW,惠勒TM,斯卡迪诺PT。后疾病复发的前列腺癌根治性前列腺切除术术后的诺模图。临床肿瘤学杂志1999,17(5):1499-1507
Kattan MW, Zelefsky MJ, Kupelian PA, Scardino PT, Fuks Z, Leibel SA. A pretreatment nomogram for predicting the outcome of three-dimensional conformal radiotherapy in prostate cancer. J Clin Oncol 2000; 18(19):3252-3259.
斯卡迪诺Kupelian PA,Kattan MW,Zelefsky MJ,PT,FuksŽ,:利贝尔SA。一个预处理诺模图的预测前列腺癌的三维适形放疗的结果。临床肿瘤学杂志,2000,18(19):3252-3259。
Eastham JA, May R, Robertson JL, Sartor O, Kattan MW. Development of a nomogram which predicts the probability of a positive prostate biopsy in men with an abnormal digital rectal examination and a prostate specific antigen between 0 and 4 ng/ml. Urology. (In press).
埃斯罕JA,可以读,罗伯逊JL,萨特Ø,Kattan MW。发展的诺模图的异常数字直肠检查和前列腺特异性抗原介于0和4纳克/毫升的前列腺穿刺活检阳性的男性预测的概率。中华泌尿外科杂志。 (记者)。
Kattan MW, Heller G, Brennan MF. A competing-risk nomogram fir sarcoma-specific death following local recurrence. Stat in Med 2003; 22; 3515-3525.
Kattan MW,海G,布伦南MF。竞争的风险的列线图杉木肉瘤局部复发后死亡。医学统计在2003,22,3515-3525。
Penalized maximum likelihood estimation, regression splines, web site to get predicted values
惩罚最大似然估计,回归样条曲线,网站,得到的预测值
Smits M, Dippel DWJ, Steyerberg EW, et al. Predicting intracranial traumatic findings on computed tomography in patients with minor head injury: The CHIP prediction rule. Ann Int Med 2007; 146:397-405.
史密茨M,Dippel DWJ,Steyerberg EW等。预测脑外伤CT扫描的患者有轻微的头部外伤:该芯片的预测规则的结果。安诠释医学,2007,146:397-405。
Nomogram with 2- and 5-year survival probability and median survival time (but watch out for the use of univariable screening)
诺模图2 - 5年的生存概率和中位生存时间(但要小心使用单变量筛选)
Clark TG, Stewart ME, Altman DG, Smyth JF. A prognostic model for ovarian cancer. Br J Cancer 2001; 85:944-52.
克拉克(TG),斯图尔特ME,奥特曼DG,史密斯JF。卵巢癌的预后模型。消化杂志2001; 85:944-52。
Comprehensive example of parametric survival modeling with an extensive nomogram, time ratio chart, anova chart, survival curves generated using survplot, bootstrap calibration curve
综合例子,具有广泛的诺模图,时间比图,方差分析图表,使用survplot的生存曲线,举校准曲线的参数生存模型
Teno JM, Harrell FE, Knaus WA, et al. Prediction of survival for older hospitalized patients: The HELP survival model. J Am Geriatrics Soc 2000; 48: S16-S24.
特诺,哈勒尔FE,诺斯WA,JM等。老年住院患者生存:HELP生存,模型的预测。 Ĵ上午老年病志2000; 48:S16-S24。
Model fitting, imputation, and several nomograms expressed in tabular form
模型拟合,估算,并以表格的形式表示的数列线图
Hasdai D, Holmes DR, et al. Cardiogenic shock complicating acute myocardial infarction: Predictors of death. Am Heart J 1999; 138:21-31.
哈斯代D,福尔摩斯DR等。心源性休克合并急性心肌梗死患者死亡的预测因子。我心1999; 138:21-31。
Ordinal logistic model with bootstrap calibration plot
有序的MF模式,引导校正曲线
Wu AW, Yasui U, Alzola CF et al. Predicting functional status outcomes in hospitalized patients aged 80 years and older. J Am Geriatric Society 2000; 48:S6-S15.
吴AW Alzola,安井U,CF等。在住院的患者年龄在80岁及以上的预测功能状态的结果。研究老年协会2000; 48:S6-S15。
Propensity modeling in evaluating medical diagnosis, anova dot chart
在,评估医疗诊断,方差分析,点图的倾向建模
Weiss JP, Gruver C, et al. Ordering an echocardiogram for evaluation of left ventricular function: Level of expertise necessary for efficient use. J Am Soc Echocardiography 2000; 13:124-130.
魏斯,Gruver C,JP等。订购超声心动图评价左室功能的有效利用所需的专业知识水平。 Ĵ上午SOC超声心动图,2000,13:124-130。
Simulations using rms to study the properties of various modeling strategies
使用RMS的模拟研究不同的建模策略的属性
Steyerberg EW, Eijkemans MJC, Habbema JDF. Stepwise selection in small data sets: A simulation study of bias in logistic regression analysis. J Clin Epi 1999; 52:935-942.
Steyerberg EW,Eijkemans澳门赛马会,Habbema JDF。逐步选择在小数据集的模拟研究偏向于logistic回归分析。临床盈1999; 52:935-942。
Steyerberg EW, Eijkemans MJC, Harrell FE, Habbema JDF. Prognostic modeling with logistic regression analysis: In search of a sensible strategy in small data sets. Med Decision Making 2001; 21:45-56.
Statistical methods and references related to rms, along with case studies that include the rms code used to produce the analyses
Harrell FE, Lee KL, Mark DB (1996): Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat in Med 15:361-387.
Harrell FE, Margolis PA, Gove S, Mason KE, Mulholland EK et al. (1998): Development of a clinical prediction model for an ordinal outcome: The World Health Organization ARI Multicentre Study of clinical signs and etiologic agents of pneumonia, sepsis, and meningitis in young infants. Stat in Med 17:909-944.
Bender R, Benner A (2000): Calculating ordinal regression models in SAS and S-Plus. Biometrical J 42:677-699.
----------Bug Reports----------
The author is willing to help with problems. Send E-mail to f.harrell@vanderbilt.edu. To report bugs, please do the following:
If the bug occurs when running a function on a fit object (e.g., anova), attach a text version of the fit object created with dump(), or a binary copy created with save(), to your note. If you used datadist but not until after the fit was created, also send the object created by datadist. Example: save(myfit, file="/tmp/myfit.rda") will create an R binary save file that can be attached to the e-mail.
If the bug occurs during a model fit (e.g., with lrm, ols, psm, or cph), send the statement causing the error along with a copy of the data frame used in the fit, saved with save(). If this data frame is very large, reduce it to a small subset that still causes the error.
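A minimal sketch of preparing these attachments in base R. The names myfit, mydata, and mysubset are hypothetical. A base lm() fit on the built-in mtcars data stands in for an rms fit:

```r
# Hypothetical stand-ins: a base-R fit and the data frame it used
mydata <- mtcars
myfit  <- lm(mpg ~ wt + hp, data = mydata)

# Save the fit object. Pass the path as file =; a second unnamed
# argument is treated as another object name, not as the file name
fit_file <- file.path(tempdir(), "myfit.rda")
save(myfit, file = fit_file)

# Save a small subset of the data that still reproduces the problem
mysubset  <- head(mydata, 10)
data_file <- file.path(tempdir(), "mysubset.rda")
save(mysubset, file = data_file)

# Confirm that both files load back into a fresh environment
chk <- new.env()
load(fit_file, envir = chk)
load(data_file, envir = chk)
ls(chk)   # "myfit" "mysubset"
```

The temporary-directory paths are placeholders. Attach whichever .rda files you actually create.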
----------Copyright Notice----------
GENERAL DISCLAIMER: This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. In short: you may use this code any way you like, as long as you don't charge money for it, remove this notice, or hold anyone liable for its results. Also, please acknowledge the source and communicate changes to the author.
If this software is used in work presented for publication, kindly reference it using, for example: Harrell FE (2009): rms: S functions for biostatistical/epidemiologic modeling, testing, estimation, validation, graphics, and prediction. Programs available from biostat.mc.vanderbilt.edu/rms. Be sure to reference other packages used as well as R itself.
----------Author(s)----------
Frank E Harrell Jr
Professor of Biostatistics
Chair, Department of Biostatistics
Vanderbilt University School of Medicine
Nashville, Tennessee
f.harrell@vanderbilt.edu
----------References----------
Regression Modeling Strategies by FE Harrell (Springer-Verlag, 2001) and the web page http://biostat.mc.vanderbilt.edu/rms. See also the Statistics in Medicine articles by Harrell et al. listed above for case studies of modeling and model validation using rms. Also see the free book by Alzola and Harrell at http://biostat.mc.vanderbilt.edu.
rms are found at http://biostat.mc.vanderbilt.edu/DataSets.
----------Examples----------
######################
# Detailed Example 1 #
######################
# May want to first invoke the Hmisc store function
# so that new variables will go into a temporary directory
set.seed(17) # so the random number sequence can be repeated
n <- 500
sex <- factor(sample(c('female','male'), n, replace=TRUE))
age <- rnorm(n, 50, 10)
sys.bp <- rnorm(n, 120, 7)
# Use two population models, one with a systolic
# blood pressure effect and one without
L <- ifelse(sex=='female', .1*(pmin(age,50)-50), .005*(age-50)^2)
L.bp <- L + .4*(pmax(sys.bp,120)-120)
dz <- ifelse(runif(n) <= plogis(L), 1, 0)
dz.bp <- ifelse(runif(n) <= plogis(L.bp), 1, 0)
# Use summary.formula in the Hmisc package to summarize the
# data one predictor at a time
s <- summary(dz.bp ~ age + sex + sys.bp)
options(digits=3)
print(s)
plot(s)
plsmo(age, dz, group=sex, fun=qlogis, ylim=c(-3,3))
plsmo(age, L, group=sex, method='raw', add=TRUE, prefix='True', trim=0)
title('Lowess-smoothed Estimates with True Regression Functions')
dd <- datadist(age, sex, sys.bp)
options(datadist='dd')
# can also do: dd <- datadist(dd, newvar)
f <- lrm(dz ~ rcs(age,5)*sex, x=TRUE, y=TRUE)
f
# x=TRUE, y=TRUE store the design matrix and response (needed by pentrace, validate, calibrate)
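# Sketch (base R, not rms code): the restricted cubic spline basis
# behind rcs(), built from truncated cubic terms. The knots and names
# here are hypothetical. By default rcs() also places knots at quantiles
# and rescales the nonlinear columns, which is not reproduced here.
rcs_basis_sketch <- function(x, knots) {
  k <- length(knots)
  pos3 <- function(u) pmax(u, 0)^3        # truncated cubic (u)_+^3
  tk <- knots[k]; tk1 <- knots[k - 1]
  nonlin <- sapply(seq_len(k - 2), function(j) {
    tj <- knots[j]
    # these weights cancel the cubic and quadratic terms beyond tk,
    # so each column is linear beyond the last knot
    pos3(x - tj) -
      pos3(x - tk1) * (tk - tj) / (tk - tk1) +
      pos3(x - tk) * (tk1 - tj) / (tk - tk1)
  })
  cbind(x, matrix(nonlin, nrow = length(x)))
}
sk_knots <- c(30, 40, 50, 60, 70)
sk_basis <- rcs_basis_sketch(seq(20, 80, by = 1), sk_knots)
dim(sk_basis)   # 61 4: one linear term plus k - 2 = 3 nonlinear terms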
fpred <- Function(f)
fpred
fpred(age=30, sex=levels(sex))
anova(f)
p <- Predict(f, age, sex, conf.int=FALSE)
plot(p, ylim=c(-3,3), data=data.frame(age,sex))
# Specifying data to plot.Predict results in sex-specific
# rug plots for age using the Hmisc scat1d function
plsmo(age, L, group=sex, method='raw', add=TRUE, prefix='True', trim=0)
title('Spline Fits with True Regression Functions')
f.bp <- lrm(dz.bp ~ rcs(age,5)*sex + rcs(sys.bp,5))
p <- Predict(f.bp, age, sys.bp, np=75)
for(method in c('contour','persp','image')) {
bplot(p, method=method)
#if(method=='image') iLegend(p, c(34,40),c(115, 120))
}
cat('Doing 25 bootstrap repetitions to validate model\n')
validate(f, B=25) # in practice try to use 150
cat('Doing 25 bootstrap reps to check model calibration\n')
cal <- calibrate(f, B=25) # use 150 in practice
plot(cal)
title('Calibration of Unpenalized Model')
p <- pentrace(f, penalty=c(.009,.009903,.02,.2,.5,1))
f <- update(f, penalty=p$penalty)
f
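# Sketch (base R, not rms code) of the penalized maximum likelihood
# idea behind pentrace(): maximize log L(beta) - (lambda/2) * slope^2,
# leaving the intercept unpenalized. The toy data, lambda = 5, and the
# function name are hypothetical. rms builds its own penalty matrix.
pl_x <- seq(-3, 3, length.out = 50)
pl_y <- as.integer(pl_x + sin(7 * pl_x) > 0)     # deterministic, not separable
pen_logistic_sketch <- function(x, y, lambda) {
  X <- cbind(1, x)
  negpll <- function(b) {                        # penalized -log L
    eta <- drop(X %*% b)
    # log(1 + exp(eta)) written in a numerically stable form
    sum(log1p(exp(-abs(eta))) + pmax(eta, 0) - y * eta) + lambda / 2 * b[2]^2
  }
  grad <- function(b) {
    drop(crossprod(X, plogis(drop(X %*% b)) - y)) + c(0, lambda * b[2])
  }
  optim(c(0, 0), negpll, grad, method = "BFGS",
        control = list(reltol = 1e-12, maxit = 1000))$par
}
pl_mle <- pen_logistic_sketch(pl_x, pl_y, lambda = 0)   # ordinary MLE
pl_pen <- pen_logistic_sketch(pl_x, pl_y, lambda = 5)
rbind(mle = pl_mle, penalized = pl_pen)   # penalized slope is shrunk toward 0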
specs(f,long=TRUE)
edf <- effective.df(f)
p <- Predict(f, age, sex, conf.int=FALSE)
plot(p, ylim=c(-3,3), data=llist(age, sex))
plsmo(age, L, group=sex, method='raw', add=TRUE, prefix='True', trim=0)
title('Penalized Spline Fits with True Regression Functions')
options(digits=3)
s <- summary(f)
s
plot(s)
s <- summary(f, sex='male')
plot(s)
fpred <- Function(f)
fpred
fpred(age=30, sex=levels(sex))
sascode(fpred)
cat('Doing 40 bootstrap reps to validate penalized model\n')
validate(f, B=40)
cat('Doing 40 bootstrap reps to check penalized model calibration\n')
cal <- calibrate(f, B=40)
plot(cal)
title('Calibration of Penalized Model')
nom <- nomogram(f.bp, fun=plogis,
funlabel='Prob(dz)',
fun.at=c(.15,.2,.3,.4,.5,.6,.7,.8,.9,.95,.975))
plot(nom, fun.side=c(1,3,1,3,1,3,1,3,1,3,1))
options(datadist=NULL)
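# Sketch (base R, not the rms implementation): the bootstrap
# "optimism" correction that validate() is based on, applied to the
# C index of a glm() logistic fit. The simulated data, the number of
# repetitions (50), and all names below are hypothetical.
c_index_sketch <- function(lp, y) {
  # Probability of concordance via the Mann-Whitney rank formula
  r <- rank(lp)
  n1 <- sum(y == 1); n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
set.seed(1)
bv_n <- 200
bv_d <- data.frame(x1 = rnorm(bv_n), x2 = rnorm(bv_n), x3 = rnorm(bv_n))
bv_d$y <- rbinom(bv_n, 1, plogis(0.8 * bv_d$x1))   # only x1 matters
bv_fit <- glm(y ~ x1 + x2 + x3, family = binomial, data = bv_d)
bv_apparent <- c_index_sketch(predict(bv_fit), bv_d$y)
bv_optimism <- replicate(50, {
  boot <- bv_d[sample(bv_n, replace = TRUE), ]
  bf <- glm(y ~ x1 + x2 + x3, family = binomial, data = boot)
  # C on the bootstrap sample minus C of the same fit on the original data
  c_index_sketch(predict(bf), boot$y) -
    c_index_sketch(predict(bf, newdata = bv_d), bv_d$y)
})
bv_corrected <- bv_apparent - mean(bv_optimism)
round(c(apparent = bv_apparent, optimism = mean(bv_optimism),
        corrected = bv_corrected), 3)
# Somers' Dxy, which validate() reports, equals 2 * (C - 0.5)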
#####################
#Detailed Example 2 #
#####################
# Simulate the data.
n <- 1000 # define sample size
set.seed(17) # so the results can be reproduced
treat <- factor(sample(c('a','b','c'), n, TRUE))
num.diseases <- sample(0:4, n, TRUE)
age <- rnorm(n, 50, 10)
cholesterol <- rnorm(n, 200, 25)
weight <- rnorm(n, 150, 20)
sex <- factor(sample(c('female','male'), n, TRUE))
label(age) <- 'Age' # label is in Hmisc
label(num.diseases) <- 'Number of Comorbid Diseases'
label(cholesterol) <- 'Total Cholesterol'
label(weight) <- 'Weight, lbs.'
label(sex) <- 'Sex'
units(cholesterol) <- 'mg/dl' # uses units.default in Hmisc
# Specify population model for log odds that Y=1
L <- .1*(num.diseases-2) + .045*(age-50) +
(log(cholesterol - 10)-5.2)*(-2*(treat=='a') +
3.5*(treat=='b')+2*(treat=='c'))
# Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)
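# Check (base R): plogis() is the logistic function 1/(1+exp(-L))
# used above to set Prob(y=1); the three test values are arbitrary
all.equal(plogis(c(-2, 0, 3)), 1 / (1 + exp(-c(-2, 0, 3))))   # TRUE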
cholesterol[1:3] <- NA # 3 missing values
ddist <- datadist(cholesterol, treat, num.diseases,
age, weight, sex)
# Could have used ddist <- datadist(data.frame.name)
options(datadist="ddist") # defines data distribution summaries for rms
cholesterol <- impute(cholesterol) # see impute in Hmisc package
# impute, describe, and several other basic functions are
# distributed as part of the Hmisc package
fit <- lrm(y ~ treat*log(cholesterol - 10) +
scored(num.diseases) + rcs(age))
describe(y ~ treat + scored(num.diseases) + rcs(age))
# or use describe(formula(fit)) for all variables used in fit
# describe function (in Hmisc) gets simple statistics on variables
#fit <- robcov(fit) # Would make all statistics which follow
# use a robust covariance matrix
# would need x=TRUE, y=TRUE in lrm
specs(fit) # Describe the design characteristics
a <- anova(fit)
print(a, which='subscripts') # print which parameters are being tested
plot(anova(fit)) # Depict Wald statistics graphically
anova(fit, treat, cholesterol) # Test these 2 by themselves
summary(fit) # Estimate effects using default ranges
plot(summary(fit)) # Graphical display of effects with confidence limits
summary(fit, treat="b", age=60)
# Specify reference cell and adjustment value
summary(fit, age=c(50,70)) # Estimate effect of increasing age from
# 50 to 70
summary(fit, age=c(50,60,70)) # Increase age from 50 to 70,
# adjust to 60 when estimating
# effects of other factors
# If datadist had not been defined, would have to define
# ranges for all variables
# Estimate and test treatment (b-a) effect averaged
# over 3 cholesterol values
contrast(fit, list(treat='b',cholesterol=c(150,200,250)),
list(treat='a',cholesterol=c(150,200,250)),
type='average')
# Remove type='average' to get 3 separate contrasts for b-a
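# Sketch (base R lm() on hypothetical toy data, not rms code) of what
# an averaged contrast amounts to: each b-a contrast is a difference of
# design-matrix rows, and averaging those rows gives one contrast vector
# ct_c, with estimate ct_c'beta and standard error sqrt(ct_c' V ct_c).
ct_d <- expand.grid(treat = factor(c("a", "b", "c")),
                    chol = seq(120, 280, by = 20))
ct_d$y <- with(ct_d, 0.01 * chol * (treat == "b") + sin(chol))  # deterministic
ct_fit <- lm(y ~ treat * log(chol - 10), data = ct_d)
ct_tt <- delete.response(terms(ct_fit))
ct_X <- function(trt, ch) {
  # design-matrix rows for new predictor settings
  nd <- data.frame(treat = factor(trt, levels = levels(ct_d$treat)), chol = ch)
  model.matrix(ct_tt, model.frame(ct_tt, nd, xlev = ct_fit$xlevels))
}
ct_diff <- ct_X("b", c(150, 200, 250)) - ct_X("a", c(150, 200, 250))
ct_c <- colMeans(ct_diff)                        # averaged contrast vector
ct_est <- sum(ct_c * coef(ct_fit))
ct_se <- sqrt(drop(t(ct_c) %*% vcov(ct_fit) %*% ct_c))
c(estimate = ct_est, se = ct_se)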
# Plot effects. plot(Predict(fit)) plots effects of all predictors,
# showing values used for interacting factors as subtitles
# The ref.zero parameter is helpful for showing effects of
# predictors on a common scale for comparison of strength
plot(Predict(fit, ref.zero=TRUE), ylim=c(-2,2))
plot(Predict(fit, age=seq(20,80,length=100), treat, conf.int=FALSE))
# Plots relationship between age and log
# odds, separate curve for each treat, no C.I.
bplot(Predict(fit, age, cholesterol, np=70))
# 3-dimensional perspective plot for age, cholesterol, and
# log odds using default ranges for both variables
p <- Predict(fit, num.diseases, fun=function(x) 1/(1+exp(-x)),
conf.int=.9) #or fun=plogis
plot(p, ylab="Prob", conf.int=.9, nlevels=5)
# Treat as categorical variable even though numeric
# Plot estimated probabilities instead of log odds
# Again, if no datadist were defined, would have to
# tell plot all limits
logit <- predict(fit, expand.grid(treat="b",num.diseases=1:3,
age=c(20,40,60),
cholesterol=seq(100,300,length=10)))
# Also see Predict
#logit <- predict(fit, gendata(fit, nobs=12))
# Interactively specify 12 predictor combinations using UNIX
# For UNIX or Windows, generate 9 combinations with other variables
# set to defaults, get predicted values
logit <- predict(fit, gendata(fit, age=c(20,40,60),
treat=c('a','b','c')))
# Since age doesn't interact with anything, we can quickly and
# interactively try various transformations of age,
# taking the spline function of age as the gold standard. We are
# seeking a linearizing transformation. Here age is linear in the
# population so this is not very productive. Also, if we simplify the
# model the total degrees of freedom will be too small and
# confidence limits too narrow
ag <- 10:80
logit <- predict(fit, expand.grid(treat="a",
num.diseases=0, age=ag,
cholesterol=median(cholesterol)),
type="terms")[,"age"]
# Also see Predict
# Note: if age interacted with anything, this would be the age
# "main effect" ignoring interaction terms
# Could also use
# logit <- plot(f, age=ag, ...)$x.xbeta[,2]
# which allows evaluation of the shape for any level
# of interacting factors. When age does not interact with
# anything, the result from
# predict(f, ..., type="terms") would equal the result from
# plot if all other terms were ignored
# Could also use
# logit <- predict(fit, gendata(fit, age=ag, cholesterol=median...))
plot(ag^.5, logit) # try square root vs. spline transform.
plot(ag^1.5, logit) # try 1.5 power
# w <- latex(fit) # invokes latex.lrm, creates fit.tex
# print(w) # display or print model on screen
# Draw a nomogram for the model fit
plot(nomogram(fit, fun=plogis, funlabel="Prob[Y=1]"))
# Compose an S function to evaluate linear predictors from fit
g <- Function(fit)
g(treat='b', cholesterol=260, age=50)
# Leave num.diseases at reference value
# Use the Hmisc dataRep function to summarize sample
# sizes for subjects as cross-classified on 2 key
# predictors
drep <- dataRep(~ roundN(age,10) + num.diseases)
print(drep, long=TRUE)
# Some approaches to making a plot showing how
# predicted values vary with a continuous predictor
# on the x-axis, with two other predictors varying
fit <- lrm(y ~ log(cholesterol - 10) +
num.diseases + rcs(age) + rcs(weight) + sex)
combos <- gendata(fit, age=10:100,
cholesterol=c(170,200,230),
weight=c(150,200,250))
# num.diseases, sex not specified -> set to mode
# can also use expand.grid or Predict
combos$pred <- predict(fit, combos)
require(lattice)
xyplot(pred ~ age | cholesterol*weight, data=combos, type='l')
xYplot(pred ~ age | cholesterol, groups=weight,
data=combos, type='l') # in Hmisc
xYplot(pred ~ age, groups=interaction(cholesterol,weight),
data=combos, type='l')
# Can also do this with plot.Predict but a single
# plot may be busy:
ch <- c(170, 200, 230)
p <- Predict(fit, age, cholesterol=ch, weight=150,
conf.int=FALSE)
plot(p, ~age | cholesterol)
# Here we use plot.Predict to make 9 separate plots, with confidence limits
p <- Predict(fit, age, cholesterol=c(170,200,230), weight=c(150,200,250))
plot(p, ~age | cholesterol*weight)
options(datadist=NULL)
######################
# Detailed Example 3 #
######################
n <- 2000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- "Age"
sex <- factor(sample(c('Male','Female'), n,
replace=TRUE, prob=c(.6, .4)))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
t <- -log(runif(n))/h
label(t) <- 'Follow-up Time'
e <- ifelse(t<=cens,1,0)
t <- pmin(t, cens)
units(t) <- "Year"
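# Why t <- -log(runif(n))/h gives exponential times with hazard h
# (inverse-CDF method): if S(t) = exp(-h*t), setting S(t) = U with
# U ~ Uniform(0,1) and solving gives t = -log(U)/h. Deterministic
# check with a hypothetical hazard and three values of U:
ic_h <- 0.02
ic_u <- c(0.1, 0.5, 0.9)
ic_t <- -log(ic_u) / ic_h
all.equal(exp(-ic_h * ic_t), ic_u)   # TRUE: S(t) recovers U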
age.dec <- cut2(age, g=10, levels.mean=TRUE)
dd <- datadist(age, sex, age.dec)
options(datadist='dd')
Srv <- Surv(t,e)
# Fit a model that doesn't assume anything except
# that deciles are adequate representations of age
f <- cph(Srv ~ strat(age.dec)+strat(sex), surv=TRUE)
# surv=TRUE speeds up computations, and confidence limits when
# there are no covariables are still accurate.
# Plot log(-log 3-year survival probability) vs. mean age
# within age deciles and vs. sex
p <- Predict(f, age.dec, sex, time=3, loglog=TRUE)
plot(p)
plot(p, ~ as.numeric(as.character(age.dec)) | sex, ylim=c(-5,-1))
# Show confidence bars instead. Note some limits are not present (infinite)
agen <- as.numeric(as.character(p$age.dec))
xYplot(Cbind(yhat, lower, upper) ~ agen | sex, data=p)
# Fit a model assuming proportional hazards for age and
# absence of age x sex interaction
f <- cph(Srv ~ rcs(age,4)+strat(sex), surv=TRUE)
survplot(f, sex, n.risk=TRUE)
# Add ,age=60 after sex to tell survplot to use age=60
# Validate measures of model performance using the bootstrap
# First must add data (design matrix and Srv) to fit object
f <- update(f, x=TRUE, y=TRUE)
validate(f, B=10, dxy=TRUE, u=5) # use u=5 for Dxy (only)
# Use B=150 in practice
# Validate model for accuracy of predicting survival at t=1
# Get Kaplan-Meier estimates by dividing subjects into groups
# of size 200 (for other values of u must put time.inc=u in
# call to cph)
cal <- calibrate(f, B=10, u=1, m=200) # B=150 in practice
plot(cal)
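# Sketch (base R, not rms code): the product-limit (Kaplan-Meier)
# estimator behind the Kaplan-Meier estimates mentioned above,
# S(t) = product over event times t_j <= t of (1 - d_j / n_j).
# The toy data are hypothetical. A censored time tied with an event
# time is counted as still at risk at that time.
km_sketch <- function(time, event) {
  ut <- sort(unique(time[event == 1]))                            # distinct event times
  n_risk <- sapply(ut, function(s) sum(time >= s))                # n_j: at risk at t_j
  n_event <- sapply(ut, function(s) sum(time == s & event == 1))  # d_j: events at t_j
  data.frame(time = ut, n.risk = n_risk, n.event = n_event,
             surv = cumprod(1 - n_event / n_risk))
}
km_sketch(time = c(1, 2, 2, 3, 4), event = c(1, 1, 0, 1, 0))
# surv is 0.8, 0.6, 0.3 at times 1, 2, 3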
# Check proportional hazards assumption for age terms
z <- cox.zph(f, 'identity')
print(z); plot(z)
# Re-fit this model without storing underlying survival
# curves for reference groups, but storing raw data with
# the fit (could also use f <- update(f, surv=FALSE, x=TRUE, y=TRUE))
f <- cph(Srv ~ rcs(age,4)+strat(sex), x=TRUE, y=TRUE)
# Get accurate confidence limits for any age
# Note: for evaluating shape of regression, we would not ordinarily
# bother to get 3-year survival probabilities; we would just use X * beta
# We do so here to use same scale as nonparametric estimates
f
anova(f)
ages <- seq(20, 80, by=4) # Evaluate at fewer points. Default is 100
# For exact confidence limits, the default of 100 points would use much memory
plot(Predict(f, age=ages, sex, time=3, loglog=TRUE), ylim=c(-5,-1))
# Fit a model assuming proportional hazards for age but
# allowing for general interaction between age and sex
f <- cph(Srv ~ rcs(age,4)*strat(sex), x=TRUE, y=TRUE)
anova(f)
ages <- seq(20, 80, by=6)
# Still fewer points; more parameters in model
# Plot 3-year survival probability (log-log and untransformed)
# vs. age and sex, obtaining accurate confidence limits
plot(Predict(f, age=ages, sex, time=3, loglog=TRUE), ylim=c(-5,-1))
plot(Predict(f, age=ages, sex, time=3))
# Having x=TRUE, y=TRUE in fit also allows computation of influence stats
r <- resid(f, "dfbetas")
which.influence(f)
# Use survest to estimate survival probabilities at 2, 4, and 6 years and
# confidence limits for selected subjects
survest(f, expand.grid(age=c(20,40,60), sex=c('Female','Male')),
times=c(2,4,6), conf.int=.95)
# Create an S function srv that computes fitted
# survival probabilities on demand, for non-interaction model
f <- cph(Srv ~ rcs(age,4)+strat(sex), surv=TRUE)
srv <- Survival(f)
# Define functions to compute 3-year estimates as a function of
# the linear predictors (X*Beta)
surv.f <- function(lp) srv(3, lp, stratum="sex=Female")
surv.m <- function(lp) srv(3, lp, stratum="sex=Male")
# Create a function that computes quantiles of survival time
# on demand
quant <- Quantile(f)
# Define functions to compute median survival time
med.f <- function(lp) quant(.5, lp, stratum="sex=Female")
med.m <- function(lp) quant(.5, lp, stratum="sex=Male")
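# Sketch (base R, not rms code): under proportional hazards,
# S(t | lp) = S0(t)^exp(lp), which is conceptually what surv.f and
# med.f above evaluate. rms uses the estimated baseline survival of
# each stratum. The exponential baseline S0(t) = exp(-lambda0 * t) with
# lambda0 = 0.1 is an assumption for illustration. The median then
# solves S(t | lp) = 0.5, giving t = log(2) / (lambda0 * exp(lp)).
ph_lambda0 <- 0.1
ph_surv <- function(t, lp) exp(-ph_lambda0 * t)^exp(lp)
ph_median <- function(lp) log(2) / (ph_lambda0 * exp(lp))
ph_lp <- c(-1, 0, 1)
ph_surv(3, ph_lp)                   # decreases as lp increases
ph_surv(ph_median(ph_lp), ph_lp)    # 0.5 0.5 0.5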
# Draw a nomogram to compute several types of predicted values
plot(nomogram(f, fun=list(surv.m, surv.f, med.m, med.f),
funlabel=c("S(3 | Male)","S(3 | Female)",
"Median (Male)","Median (Female)"),
fun.at=list(c(.8,.9,.95,.98,.99),c(.1,.3,.5,.7,.8,.9,.95,.98),
c(8,12),c(1,2,4,8,12))))
options(datadist=NULL)
########################################################
# Simple examples using small datasets for checking #
# calculations across different systems in which random#
# number generators cannot be synchronized. #
########################################################
x1 <- 1:20
x2 <- abs(x1-10)
x3 <- factor(rep(0:2,length.out=20))
y <- c(rep(0:1,8),1,1,1,1)
dd <- datadist(x1,x2,x3)
options(datadist='dd')
f <- lrm(y ~ rcs(x1,3) + x2 + x3)
f
specs(f, TRUE)
anova(f)
anova(f, x1, x2)
plot(anova(f))
s <- summary(f)
s
plot(s, log=TRUE)
par(mfrow=c(2,2))
plot(Predict(f))
par(mfrow=c(1,1))
plot(nomogram(f))
g <- Function(f)
g(11,7,'1')
contrast(f, list(x1=11,x2=7,x3='1'), list(x1=10,x2=6,x3='2'))
fastbw(f)
gendata(f, x1=1:5)
# w <- latex(f)
f <- update(f, x=TRUE,y=TRUE)
which.influence(f)
residuals(f,'gof')
robcov(f)$var
validate(f, B=10)
cal <- calibrate(f, B=10)
plot(cal)
f <- ols(y ~ rcs(x1,3) + x2 + x3, x=TRUE, y=TRUE)
anova(f)
anova(f, x1, x2)
plot(anova(f))
s <- summary(f)
s
plot(s, log=TRUE)
plot(Predict(f))
plot(nomogram(f))
g <- Function(f)
g(11,7,'1')
contrast(f, list(x1=11,x2=7,x3='1'), list(x1=10,x2=6,x3='2'))
fastbw(f)
gendata(f, x1=1:5)
# w <- latex(f)
f <- update(f, x=TRUE,y=TRUE)
which.influence(f)
residuals(f,'dfbetas')
robcov(f)$var
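# Sketch (base R, not rms code): the Huber-White "sandwich" variance
# that robcov() is based on, shown for least squares:
#   V = (X'X)^-1 X' diag(e^2) X (X'X)^-1
# This is the plain (HC0) form on hypothetical deterministic data.
# robcov()'s cluster and method arguments are not reproduced.
sw_x <- 1:20
sw_y <- sqrt(sw_x) + sin(sw_x)
sw_fit <- lm(sw_y ~ sw_x)
sw_X <- model.matrix(sw_fit)
sw_e <- residuals(sw_fit)
sw_bread <- solve(crossprod(sw_X))     # (X'X)^-1
sw_meat <- crossprod(sw_X * sw_e)      # X' diag(e^2) X (row i scaled by e_i)
sw_V <- sw_bread %*% sw_meat %*% sw_bread
sw_V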
validate(f, B=10)
cal <- calibrate(f, B=10)
plot(cal)
S <- Surv(c(1,4,2,3,5,8,6,7,20,18,19,9,12,10,11,13,16,14,15,17))
survplot(survfit(S ~ x3))
f <- psm(S ~ rcs(x1,3)+x2+x3, x=TRUE,y=TRUE)
f
# NOTE: LR chi-sq of 39.67 disagrees with that from old survreg
# and old psm (77.65); suspect they were also testing sigma=1
for(w in c('survival','hazard'))
print(survest(f, data.frame(x1=7,x2=3,x3='1'),
times=c(5,7), conf.int=.95, what=w))
# S-Plus 2000 using old survival package:
# S(t):.925 .684 SE:0.729 0.556 Hazard:0.0734 0.255
plot(Predict(f, x1, time=5))
f$var
set.seed(3)
# robcov(f)$var when score residuals implemented
bootcov(f, B=30)$var
validate(f, B=10)
cal <- calibrate(f, cmethod='KM', u=5, B=10, m=10)
plot(cal)
r <- resid(f)
survplot(r)
f <- cph(S ~ rcs(x1,3)+x2+x3, x=TRUE,y=TRUE,surv=TRUE,time.inc=5)
f
plot(Predict(f, x1, time=5))
robcov(f)$var
bootcov(f, B=10)
validate(f, B=10)
cal <- calibrate(f, cmethod='KM', u=5, B=10, m=10)
survplot(f, x1=c(2,19))
options(datadist=NULL)
When reposting, please credit the source: biostatistic.net (http://www.biostatistic.net).