lrm(rms)
lrm()所属R语言包:rms
Logistic Regression Model
Logistic回归模型
译者:生物统计家园网 机器人LoveR
描述----------Description----------
Fit binary and proportional odds ordinal logistic regression models using maximum likelihood estimation or penalized maximum likelihood estimation. See cr.setup for how to fit forward continuation ratio models with lrm.
适合二进制和比例的赔率有序Logistic回归模型采用最大似然估计或惩罚最大似然估计。见cr.setup如何适应向前延续的比例模型lrm。
用法----------Usage----------
lrm(formula, data, subset, na.action=na.delete, method="lrm.fit",
model=FALSE, x=FALSE, y=FALSE, linear.predictors=TRUE, se.fit=FALSE,
penalty=0, penalty.matrix, tol=1e-7,
strata.penalty=0, var.penalty=c('simple','sandwich'),
weights, normwt, ...)
## S3 method for class 'lrm'
print(x, digits=4, strata.coefs=FALSE,
coefs=TRUE, latex=FALSE, title='Logistic Regression Model', ...)
参数----------Arguments----------
参数:formula
a formula object. An offset term can be included. The offset causes fitting of a model such as logit(Y=1) = Xβ + W, where W is the offset variable having no estimated coefficient. The response variable can be any data type; lrm converts it in alphabetic or numeric order to an S factor variable and recodes it 0,1,2,... internally.
一个公式对象。一个offset一词可以包括在内。偏移的原因拟合的模型,如logit(Y=1) = Xβ + W,其中W是没有估计系数的偏移量。响应变量可以是任何数据类型;lrm将其转换的S因子变量的字母或数字顺序和重新编码0,1,2,...内部。
参数:data
data frame to use. Default is the current frame.
数据框使用。默认为当前帧。
参数:subset
logical expression or vector of subscripts defining a subset of observations to analyze
逻辑表达式或向量的下标定义的一个子集的观测分析
参数:na.action
function to handle NAs in the data. Default is na.delete, which deletes any observation having response or predictor missing, while preserving the attributes of the predictors and maintaining frequencies of deletions due to each variable in the model. This is usually specified using options(na.action="na.delete").
函数在数据处理NA的。默认值是na.delete,删除任何观察响应或预测缺少的属性的预测,同时保留和维护由于每个变量在模型中缺失的频率。这通常是指定使用options(na.action="na.delete")。
参数:method
name of fitting function. Only allowable choice at present is lrm.fit.
拟合函数的名称。仅允许选择目前是lrm.fit的。
参数:model
causes the model frame to be returned in the fit object
在合适的对象模型框架返回
参数:x
causes the expanded design matrix (with missings excluded) to be returned under the name x. For print, an object created by lrm.
扩展的设计矩阵(missings除外)的名称x返回。 print,创建的对象的lrm。
参数:y
causes the response variable (with missings excluded) to be returned under the name y.
响应变量(missings除外)的名称y返回。
参数:linear.predictors
causes the predicted X beta (with missings excluded) to be returned under the name linear.predictors. When the response variable has more than two levels, the first intercept is used.
导致预测的Xβ(missings除外),要返回的名称linear.predictors。当响应变量具有两个以上的水平,第一截距使用。
参数:se.fit
causes the standard errors of the fitted values to be returned under the name se.fit.
导致的名义下se.fit返回的拟合值的标准误差。
参数:penalty
The penalty factor subtracted from the log likelihood is 0.5 β' P β, where β is the vector of regression coefficients other than intercept(s), and P is penalty factors * penalty.matrix and penalty.matrix is defined below. The default is penalty=0 implying that ordinary unpenalized maximum likelihood estimation is used. If penalty is a scalar, it is assumed to be a penalty factor that applies to all non-intercept parameters in the model. Alternatively, specify a list to penalize different types of model terms by differing amounts. The elements in this list are named simple, nonlinear, interaction and nonlinear.interaction. If you omit elements on the right of this series, values are inherited from elements on the left. Examples: penalty=list(simple=5, nonlinear=10) uses a penalty factor of 10 for nonlinear or interaction terms. penalty=list(simple=0, nonlinear=2, nonlinear.interaction=4) does not penalize linear main effects, uses a penalty factor of 2 for nonlinear or interaction effects (that are not both), and 4 for nonlinear interaction effects.
惩罚因子对数似然减去0.5 β' P β,其中β是向量回归系数截距(S)以外,和P是penalty factors * penalty.matrix和<X >的定义如下。默认为penalty.matrix意味着普通unpenalized的最大似然估计。如果penalty=0是一个标量,它被认为是一个惩罚因子,适用于所有非截距模型中的参数。此外,也可以指定一列由不同数量,不同类型的模型项惩罚。在此列表中的元素被命名为penalty和simple, nonlinear, interaction。如果省略的权利,这一系列的元素,元素的左值都继承自。例子:nonlinear.interaction采用了惩罚因子为10的非线性相互作用。 penalty=list(simple=5, nonlinear=10)不惩罚线性主效应的非线性相互作用的影响(是不是两个)和4非线性相互作用的影响,采用的是惩罚因子为2。
参数:penalty.matrix
specifies the symmetric penalty matrix for non-intercept terms. The default matrix for continuous predictors has the variance of the columns of the design matrix in its diagonal elements so that the penalty to the log likelhood is unitless. For main effects for categorical predictors with c categories, the rows and columns of the matrix contain a c-1 \times c-1 sub-matrix that is used to compute the sum of squares about the mean of the c parameter values (setting the parameter to zero for the reference cell) as the penalty component for that predictor. This makes the penalty independent of the choice of the reference cell. If you specify penalty.matrix, you may set the rows and columns for certain parameters to zero so as to not penalize those parameters. Depending on penalty, some elements of penalty.matrix may be overridden automatically by setting them to zero. The penalty matrix that is used in the actual fit is penalty \times diag(pf) \times penalty.matrix \times diag(pf), where pf is the vector of square roots of penalty factors computed from penalty by Penalty.setup in rmsMisc. If you specify penalty.matrix you must specify a nonzero value of penalty or no penalization will be done.
指定对称罚矩阵非截距项。连续预测变量的默认矩阵有设计在其对角线上的元素的矩阵的各列的方差,这样的惩罚到loglikelhood是无量纲的。 c类别,行和列的矩阵的分类预测的主要影响包含一个c-1 \times c-1子矩阵,用于计算的平方的平均值的c参数值(设置单元格的引用参数设置为零),预测的刑罚组件。这使得刑罚独立的参考单元的选择。如果你指定了penalty.matrix,你可以设置某些参数为零,因此不会惩罚那些参数的行和列。根据penalty,一些元素的penalty.matrix可能会被改写自动将它们设置为0。处罚矩阵,用于在实际的配合是penalty \times diag(pf) \times penalty.matrix \times diag(pf),pf是向量的惩罚因子的平方根计算从penaltyPenalty.setuprmsMisc的。如果你指定了penalty.matrix,“你必须指定一个非零值penalty或没有处罚会做。
参数:tol
singularity criterion (see lrm.fit)
奇异标准(见lrm.fit)
参数:strata.penalty
scalar penalty factor for the stratification factor, for the experimental strat variable
标惩罚因子的分层因素,实验strat变量
参数:var.penalty
the type of variance-covariance matrix to be stored in the var component of the fit when penalization is used. The default is the inverse of the penalized information matrix. Specify var.penalty="sandwich" to use the sandwich estimator (see below under var), which limited simulation studies have shown yields variances estimates that are too low.
方差 - 协方差矩阵的类型的要被存储在var组件使用处罚时的拟合。默认情况下是受到处罚的信息矩阵的逆。指定了var.penalty="sandwich",使用夹心估计(见下文var),其中有限模拟研究表明,收益率的方差估计太低。
参数:weights
a vector (same length as y) of possibly fractional case weights
一个向量(y)可能是部分的情况下,权重相同的长度,
参数:normwt
set to TRUE to scale weights so they sum to the length of y; useful for sample surveys as opposed to the default of frequency weighting
设置为TRUE规模weights所以他们总结的长度y;有用的抽样调查,而不是默认的频率加权
参数:...
arguments that are passed to lrm.fit
参数传递给lrm.fit
参数:digits
number of significant digits to use
数量巨大的数字使用
参数:strata.coefs
set to TRUE to print the (experimental) strata coefficients
打印(实验)地层系数设置为TRUE
参数:coefs
specify coefs=FALSE to suppress printing the table of model coefficients, standard errors, etc. Specify coefs=n to print only the first n regression coefficients in the model.
指定coefs=FALSE抑制打印表格模型系数,标准误差等指定coefs=n要打印只有第一个n回归系数的模型。
参数:latex
a logical value indicating whether information should be formatted as plain text or as LaTeX markup
一逻辑值,表示信息是否应该被格式化为纯文本或乳胶标记
参数:title
a character string title to be passed to prModFit
一个字符串标题要传递给prModFit
值----------Value----------
The returned fit object of lrm contains the following components in addition to the ones mentioned under the optional arguments.
返回合适的对象lrm包含以下组件的可选参数下,除了提到的那些。
参数:call
calling expression
调用表达式
参数:freq
table of frequencies for Y in order of increasing Y
表的频率为Y为了增加Y
参数:stats
vector with the following elements: number of observations used in the fit, maximum absolute value of first derivative of log likelihood, model likelihood ratio chi-square, d.f., P-value, c index (area under ROC curve), Somers' D_{xy}, Goodman-Kruskal gamma, Kendall's tau-a rank correlations between predicted probabilities and observed response, the Nagelkerke R^2 index, the Brier score computed with respect to Y > its lowest level, the g-index, gr (the g-index on the odds ratio scale), and gp (the g-index on the probability scale using the same cutoff used for the Brier score). Probabilities are rounded to the nearest 0.002 in the computations or rank correlation indexes. In the case of penalized estimation, the "Model L.R." is computed without the penalty factor, and "d.f." is the effective d.f. from Gray's (1992) Equation 2.9. The P-value uses this corrected model L.R. chi-square and corrected d.f. The score chi-square statistic uses first derivatives which contain penalty components.
包含下列元素:矢量观测的配合,最大绝对值的一阶导数的对数似然模型似然比chi-square,DF,“P值,c指数( ROC曲线下面积),萨默斯D_{xy},古德曼-克鲁斯卡gamma,Kendall的tau-a排名之间的相关性预测的概率和观测到的响应,Nagelkerke R^2指数,野蔷薇得分就Y >的最低水平,g指数,gr(指数g的几率比规模),和gp(计算与g指数的概率规模使用相同的截止使用的石南木得分)。概率四舍五入至最接近的0.002的计算或等级的相关指标。惩罚估计的情况下,"Model L.R."没有惩罚因子计算,和"d.f."是有效的DF从灰色的(1992)公式2.9。 P值使用此修正模型L.R. chi-square和纠正D.F.评分卡方统计使用一阶导数,其中包含惩罚组件。
参数:fail
set to TRUE if convergence failed (and maxiter>1)
如果融合失败(TRUE设置为maxiter>1)
参数:coefficients
estimated parameters
估计参数
参数:var
estimated variance-covariance matrix (inverse of information matrix). If penalty>0, var is either the inverse of the penalized information matrix (the default, if var.penalty="simple") or the sandwich-type variance - covariance matrix estimate (Gray Eq. 2.6) if var.penalty="sandwich". For the latter case the simple information-matrix - based variance matrix is returned under the name var.from.info.matrix.
估计的方差 - 协方差矩阵的(信息矩阵逆)。如果penalty>0,var是受到处罚的信息矩阵的逆(默认情况下,如果var.penalty="simple")或夹心型方差 - 协方差矩阵估计(灰色公式2.6)如果 var.penalty="sandwich"。对于后者的情况下,简单的基于信息矩阵 - 的方差矩阵的名称下var.from.info.matrix返回。
参数:effective.df.diagonal
is returned if penalty>0. It is the vector whose sum is the effective d.f. of the model (counting intercept terms).
将返回如果penalty>0。它是矢量,其总和是有效D.F.模型的(计数截距项)。
参数:u
vector of first derivatives of log-likelihood
矢量的一阶导数的对数似然
参数:deviance
-2 log likelihoods (counting penalty components) When an offset variable is present, three deviances are computed: for intercept(s) only, for intercepts+offset, and for intercepts+offset+predictors. When there is no offset variable, the vector contains deviances for the intercept(s)-only model and the model with intercept(s) and predictors.
-2对数似然(计数处罚组件)的偏移量是三个deviances计算:截距(S),拦截+偏移量,拦截+偏移+的预测。当不存在偏移变量,该向量包含deviances的截距()的唯一模式和模型的截距(s)和预测因子。
参数:est
vector of column numbers of X fitted (intercepts are not counted)
矢量X装的列数(不计入拦截)
参数:non.slopes
number of intercepts in model
数截距模型
参数:penalty
see above
看到以上
参数:penalty.matrix
the penalty matrix actually used in the estimation
实际使用的处罚矩阵的估计
(作者)----------Author(s)----------
Frank Harrell<br>
Department of Biostatistics, Vanderbilt University<br>
f.harrell@vanderbilt.edu
参考文献----------References----------
Applied Statistics 41:191–201, 1992.
Stat in Med 13:2427–2436, 1994.
with applications to breast cancer prognosis. JASA 87:942–951, 1992.
Stat in Med 12:2305–2314, 1993.
Presentation on UVa Web page, 1998.
参见----------See Also----------
lrm.fit, predict.lrm, rms.trans, rms, glm, latex.lrm, residuals.lrm, na.delete, na.detail.response, pentrace, rmsMisc, vif, cr.setup, predab.resample, validate.lrm, calibrate, Mean.lrm, gIndex, prModFit
lrm.fit,predict.lrm,rms.trans,rms,glm,latex.lrm,residuals.lrm,na.delete,na.detail.response,pentrace,rmsMisc,vif,cr.setup,predab.resample,validate.lrm,calibrate,Mean.lrm ,gIndex,prModFit
实例----------Examples----------
#Fit a logistic model containing predictors age, blood.pressure, sex[适合的MF模型,其中包含的预测的年龄,blood.pressure,性]
#and cholesterol, with age fitted with a smooth 5-knot restricted cubic [和胆固醇,随着年龄的装有顺利5结限制立方]
#spline function and a different shape of the age relationship for males [样条函数与一个不同的形状的年龄关系的男]
#and females. As an intermediate step, predict mean cholesterol from[和女性。作为一个中间步骤,预测平均胆固醇从]
#age using a proportional odds ordinal logistic model[年龄使用比例赔率有序模型]
#[]
n <- 1000 # define sample size[确定样本量]
set.seed(17) # so can reproduce the results[所以可以重现的结果]
age <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol <- rnorm(n, 200, 25)
sex <- factor(sample(c('female','male'), n,TRUE))
label(age) <- 'Age' # label is in Hmisc[标签是在Hmisc]
label(cholesterol) <- 'Total Cholesterol'
label(blood.pressure) <- 'Systolic Blood Pressure'
label(sex) <- 'Sex'
units(cholesterol) <- 'mg/dl' # uses units.default in Hmisc[使用units.default在Hmisc]
units(blood.pressure) <- 'mmHg'
#To use prop. odds model, avoid using a huge number of intercepts by[要使用的道具。赔率模型,避免使用大量的拦截]
#grouping cholesterol into 40-tiles[分组的胆固醇转变为40砖]
ch <- cut2(cholesterol, g=40, levels.mean=TRUE) # use mean values in intervals[使用的时间间隔的平均值]
table(ch)
f <- lrm(ch ~ age)
print(f, latex=TRUE, coefs=4)
m <- Mean(f) # see help file for Mean.lrm[请参阅为Mean.lrm的帮助文件]
d <- data.frame(age=seq(0,90,by=10))
m(predict(f, d))
# Repeat using ols[重复使用OLS]
f <- ols(cholesterol ~ age)
predict(f, d)
# Specify population model for log odds that Y=1[指定的log几率的人口模型Y = 1]
L <- .4*(sex=='male') + .045*(age-50) +
(log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male'))
# Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)][模拟二进制y以有PROB(y = 1时)= 1 / [1 +(-L)]]
y <- ifelse(runif(n) < plogis(L), 1, 0)
cholesterol[1:3] <- NA # 3 missings, at random[3 missings,在随机]
ddist <- datadist(age, blood.pressure, cholesterol, sex)
options(datadist='ddist')
fit <- lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol,4)),
x=TRUE, y=TRUE)
# x=TRUE, y=TRUE allows use of resid(), which.influence below[X = Y = TRUE,TRUE,允许使用的重油(),which.influence以下]
# could define d <- datadist(fit) after lrm(), but data distribution[可以定义D < - datadist(FIT)后的LRM的(),但数据分布]
# summary would not be stored with fit, so later uses of Predict[摘要存储相契合,所以后来使用的预测]
# or summary.rms would require access to the original dataset or[summary.rms将需要访问原始数据集或]
# d or specifying all variable values to summary, Predict, nomogram[d或指定的所有变量的值来总结,预测,诺模图]
anova(fit)
p <- Predict(fit, age, sex)
plot(p)
plot(Predict(fit, age=20:70, sex="male")) # need if datadist not used[需要如果不使用datadist]
print(cbind(resid(fit,"dfbetas"), resid(fit,"dffits"))[1:20,])
which.influence(fit, .3)
# latex(fit) #print nice statement of fitted model[乳胶(FIT)#不错的声明拟合模型]
#[]
#Repeat this fit using penalized MLE, penalizing complex terms[重复此配合使用惩罚最大似然估计,惩罚复杂的术语]
#(for nonlinear or interaction effects)[(非线性相互作用的影响)]
#[]
fitp <- update(fit, penalty=list(simple=0,nonlinear=10), x=TRUE, y=TRUE)
effective.df(fitp)
# or lrm(y ~ \dots, penalty=\dots)[或LRM(Y~\点,罚款= \点)]
#Get fits for a variety of penalties and assess predictive accuracy [获取适合各种处罚,并评估预测的准确性]
#in a new data set. Program efficiently so that complex design [在一个新的数据集。计划有效地使复杂的设计]
#matrices are only created once.[矩阵只创建一次。]
set.seed(201)
x1 <- rnorm(500)
x2 <- rnorm(500)
x3 <- sample(0:1,500,rep=TRUE)
L <- x1+abs(x2)+x3
y <- ifelse(runif(500)<=plogis(L), 1, 0)
new.data <- data.frame(x1,x2,x3,y)[301:500,]
#[]
for(penlty in seq(0,.15,by=.005)) {
if(penlty==0) {
f <- lrm(y ~ rcs(x1,4)+rcs(x2,6)*x3, subset=1:300, x=TRUE, y=TRUE)
# True model is linear in x1 and has no interaction[x1和真正的模型是线性的,没有任何交互,]
X <- f$x # saves time for future runs - don't have to use rcs etc.[节省时间,为将来运行 - 没有使用RCS等。]
Y <- f$y # this also deletes rows with NAs (if there were any)[这也删除行与NAS(如果有)]
penalty.matrix <- diag(diag(var(X)))
Xnew <- predict(f, new.data, type="x", incl.non.slopes=FALSE)
# expand design matrix for new data[扩大设计新的数据矩阵]
Ynew <- new.data$y
} else f <- lrm.fit(X,Y, penalty.matrix=penlty*penalty.matrix)
#[]
cat("\nPenalty :",penlty,"\n")
pred.logit <- f$coef[1] + (Xnew %*% f$coef[-1])
pred <- plogis(pred.logit)
C.index <- somers2(pred, Ynew)["C"]
Brier <- mean((pred-Ynew)^2)
Deviance<- -2*sum( Ynew*log(pred) + (1-Ynew)*log(1-pred) )
cat("ROC area:",format(C.index)," Brier score:",format(Brier),
" -2 Log L:",format(Deviance),"\n")
}
#penalty=0.045 gave lowest -2 Log L, Brier, ROC in test sample for S+[试样中的S处罚了最低-2登录L,野蔷薇,ROC = 0.045 +]
#[]
#Use bootstrap validation to estimate predictive accuracy of[使用引导验证,估计预测的准确性]
#logistic models with various penalties[Logistic模型的各种处罚]
#To see how noisy cross-validation estimates can be, change the[交叉验证估计可如何嘈杂,改变]
#validate(f, \dots) to validate(f, method="cross", B=10) for example.[验证,\(F点),以验证(F =“叉乘”,B = 10)。]
#You will see tremendous variation in accuracy with minute changes in[你会看到巨大变化的准确度的微小变化]
#the penalty. This comes from the error inherent in using 10-fold[的违约金。这来自固有的误差,在使用10倍的]
#cross validation but also because we are not fixing the splits. [交叉验证,但也因为我们没有固定的分裂。]
#20-fold cross validation was even worse for some[20倍交叉验证,甚至对一些差]
#indexes because of the small test sample size. Stability would be[索引,因为小试样品的大小。稳定性将]
#obtained by using the same sample splits for all penalty values [通过使用相同的样品获得的分裂可用于所有惩罚值]
#(see above), but then we wouldn't be sure that the choice of the [(见上文),但后来我们不会那么有把握的选择]
#best penalty is not specific to how the sample was split. This[最好的惩罚是不具体是怎么样被分裂了。这]
#problem is addressed in the last example.[在最后一个例子中的问题的解决方法。]
#[]
penalties <- seq(0,.7,by=.1) # really use by=.02[真正使用= 0.02]
index <- matrix(NA, nrow=length(penalties), ncol=11,
dimnames=list(format(penalties),
c("Dxy","R2","Intercept","Slope","Emax","D","U","Q","B","g","gp")))
i <- 0
for(penlty in penalties)
{
cat(penlty, "")
i <- i+1
if(penlty==0)
{
f <- lrm(y ~ rcs(x1,4)+rcs(x2,6)*x3, x=TRUE, y=TRUE) # fit whole sample[适合整个样本]
X <- f$x
Y <- f$y
penalty.matrix <- diag(diag(var(X))) # save time - only do once[节省时间 - 只能做一次]
}
else
f <- lrm(Y ~ X, penalty=penlty,
penalty.matrix=penalty.matrix, x=TRUE,y=TRUE)
val <- validate(f, method="boot", B=20) # use larger B in practice[在实践中,使用大一些的B]
index[i,] <- val[,"index.corrected"]
}
par(mfrow=c(3,3))
for(i in 1:9)
{
plot(penalties, index[,i],
xlab="Penalty", ylab=dimnames(index)[[2]][i])
lines(lowess(penalties, index[,i]))
}
options(datadist=NULL)
# Example of weighted analysis[加权分析的实例]
x <- 1:5
y <- c(0,1,0,1,0)
reps <- c(1,2,3,2,1)
lrm(y ~ x, weights=reps)
x <- rep(x, reps)
y <- rep(y, reps)
lrm(y ~ x) # same as above[与上述相同]
#[]
#Study performance of a modified AIC which uses the effective d.f.[的一种变型的AIC使用有效df的研究表现]
#See Verweij and Van Houwelingen (1994) Eq. (6). Here AIC=chisq-2*df.[VERWEIJ和:凡Houwelingen(1994年)式。 (6)。这里AIC = chisq-2 * DF。]
#Also try as effective d.f. equation (4) of the previous reference.[也可以尝试有效D.F.等式(4)的先前的参考。]
#Also study performance of Shao's cross-validation technique (which was[同样学习表演的邵的交叉验证技术(这是]
#designed to pick the "right" set of variables, and uses a much smaller[设计来接的“右”的变量集,并使用一个更小的]
#training sample than most methods). Compare cross-validated deviance[比大多数方法的训练样本)。交叉验证偏差比较]
#vs. penalty to the gold standard accuracy on a 7500 observation dataset.[与刑罚的黄金标准在7500观测数据集的准确性。]
#Note that if you only want to get AIC or Schwarz Bayesian information[需要注意的是,如果你只想要得到的AIC或施瓦茨贝叶斯信息]
#criterion, all you need is to invoke the pentrace function.[标准,所有你需要的是调用的pentrace功能。]
#NOTE: the effective.df( ) function is used in practice[注:effective.df()函数用于在实践中]
#[]
## Not run: [#不运行:]
for(seed in c(339,777,22,111,3)){
# study performance for several datasets[几个数据集性能研究]
set.seed(seed)
n <- 175; p <- 8
X <- matrix(rnorm(n*p), ncol=p) # p normal(0,1) predictors[p正常(0,1)的预测因子]
Coef <- c(-.1,.2,-.3,.4,-.5,.6,-.65,.7) # true population coefficients[真正的人口系数]
L <- X %*% Coef # intercept is zero[截距是零]
Y <- ifelse(runif(n)<=plogis(L), 1, 0)
pm <- diag(diag(var(X)))
#Generate a large validation sample to use as a gold standard[产生大量的验证作为金标准样品使用]
n.val <- 7500
X.val <- matrix(rnorm(n.val*p), ncol=p)
L.val <- X.val %*% Coef
Y.val <- ifelse(runif(n.val)<=plogis(L.val), 1, 0)
#[]
Penalty <- seq(0,30,by=1)
reps <- length(Penalty)
effective.df <- effective.df2 <- aic <- aic2 <- deviance.val <-
Lpenalty <- single(reps)
n.t <- round(n^.75)
ncv <- c(10,20,30,40) # try various no. of reps in cross-val.[尝试各种没有。的跨值代表。]
deviance <- matrix(NA,nrow=reps,ncol=length(ncv))
#If model were complex, could have started things off by getting X, Y[如果是复杂的模型,已经开始的事情了,获得X,Y]
#penalty.matrix from an initial lrm fit to save time[penalty.matrix从最初的LRM适应,以节省时间]
#[]
for(i in 1:reps) {
pen <- Penalty[i]
cat(format(pen),"")
f.full <- lrm.fit(X, Y, penalty.matrix=pen*pm)
Lpenalty[i] <- pen* t(f.full$coef[-1]) %*% pm %*% f.full$coef[-1]
f.full.nopenalty <- lrm.fit(X, Y, initial=f.full$coef, maxit=1)
info.matrix.unpenalized <- solve(f.full.nopenalty$var)
effective.df[i] <- sum(diag(info.matrix.unpenalized %*% f.full$var)) - 1
lrchisq <- f.full.nopenalty$stats["Model L.R."]
# lrm does all this penalty adjustment automatically (for var, d.f.,[LRM这一切刑罚自动调整(VAR,DF,]
# chi-square)[卡方)]
aic[i] <- lrchisq - 2*effective.df[i]
#[]
pred <- plogis(f.full$linear.predictors)
score.matrix <- cbind(1,X) * (Y - pred)
sum.u.uprime <- t(score.matrix) %*% score.matrix
effective.df2[i] <- sum(diag(f.full$var %*% sum.u.uprime))
aic2[i] <- lrchisq - 2*effective.df2[i]
#[]
#Shao suggested averaging 2*n cross-validations, but let's do only 40[邵平均为2 n * n的交叉验证,但是让我们做的只有40]
#and stop along the way to see if fewer is OK[停止前进的道路上,如果少是确定]
dev <- 0
for(j in 1:max(ncv)) {
s <- sample(1:n, n.t)
cof <- lrm.fit(X[s,],Y[s],
penalty.matrix=pen*pm)$coef
pred <- cof[1] + (X[-s,] %*% cof[-1])
dev <- dev -2*sum(Y[-s]*pred + log(1-plogis(pred)))
for(k in 1:length(ncv)) if(j==ncv[k]) deviance[i,k] <- dev/j
}
#[]
pred.val <- f.full$coef[1] + (X.val %*% f.full$coef[-1])
prob.val <- plogis(pred.val)
deviance.val[i] <- -2*sum(Y.val*pred.val + log(1-prob.val))
}
postscript(hor=TRUE) # along with graphics.off() below, allow plots[与graphics.off()以下,让图]
par(mfrow=c(2,4)) # to be printed as they are finished[要打印的,因为它们是完成]
plot(Penalty, effective.df, type="l")
lines(Penalty, effective.df2, lty=2)
plot(Penalty, Lpenalty, type="l")
title("Penalty on -2 log L")
plot(Penalty, aic, type="l")
lines(Penalty, aic2, lty=2)
for(k in 1:length(ncv)) {
plot(Penalty, deviance[,k], ylab="deviance")
title(paste(ncv[k],"reps"))
lines(supsmu(Penalty, deviance[,k]))
}
plot(Penalty, deviance.val, type="l")
title("Gold Standard (n=7500)")
title(sub=format(seed),adj=1,cex=.5)
graphics.off()
}
## End(Not run)[#(不执行)]
#The results showed that to obtain a clear picture of the penalty-[结果表明,要得到清晰的图像的罚款]
#accuracy relationship one needs 30 or 40 reps in the cross-validation.[精度关系需要30或40个代表中的交叉验证。]
#For 4 of 5 samples, though, the super smoother was able to detect[虽然,对于5个样品中的4,超平滑能够检测到]
#an accurate penalty giving the best (lowest) deviance using 10-fold[一个准确的罚款给予最佳(最低)越轨使用10倍]
#cross-validation. Cross-validation would have worked better had[交叉验证。交叉验证工作更好地]
#the same splits been used for all penalties.[的分裂被用于所有的惩罚。]
#The AIC methods worked just as well and are much quicker to compute.[AIC方法一样可以正常工作和更快的计算。]
#The first AIC based on the effective d.f. in Gray's Eq. 2.9[基于有效D.F.第一AIC在灰色的EQ。 2.9]
#(Verweij and Van Houwelingen (1994) Eq. 5 (note typo)) worked best.[(VERWEIJ和范Houwelingen的(1994)(5)式(附注错字))的效果最好。]
转载请注明:出自 生物统计家园网(http://www.biostatistic.net)。
注:
注1:为了方便大家学习,本文档为生物统计家园网机器人LoveR翻译而成,仅供个人R语言学习参考使用,生物统计家园保留版权。
注2:由于是机器人自动翻译,难免有不准确之处,使用时仔细对照中、英文内容进行反复理解,可以帮助R语言的学习。
注3:如遇到不准确之处,请在本贴的后面进行回帖,我们会逐渐进行修订。
|