
R rms package: help documentation for the residuals.lrm() function

Posted on 2012-9-27 19:14:37
residuals.lrm(rms)
residuals.lrm() belongs to R package: rms

                                        Residuals from a Logistic Regression Model Fit

                                         Translator: 生物统计家园网 robot LoveR

Description

For a binary logistic model fit, computes the following residuals, letting P denote the predicted probability of the higher category of Y, X denote the design matrix (with a column of 1s for the intercept), and L denote the logit or linear predictor: ordinary ($Y-P$), score ($X(Y-P)$), pearson ($(Y-P)/\sqrt{P(1-P)}$), deviance ($-\sqrt{2|\log(1-P)|}$ for $Y=0$, $\sqrt{2|\log(P)|}$ for $Y=1$), pseudo dependent variable used in influence statistics ($L + (Y-P)/(P(1-P))$), and partial ($X_{i}\beta_{i} + (Y-P)/(P(1-P))$).
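The residual definitions above can be sketched in base R. This is only an illustration under stated assumptions: it uses glm() as a stand-in fitter rather than lrm(), and the simulated data and the variable names (fit, deviance_res, pseudo_dep, partial_x1) are made up for the example.

```r
# Minimal sketch: the binary-model residuals written out by hand.
# glm() is a stand-in for lrm(); data and names are hypothetical.
set.seed(1)
n  <- 200
x1 <- runif(n, -1, 1)
y  <- rbinom(n, 1, plogis(0.5 + x1))
fit <- glm(y ~ x1, family = binomial)

P <- fitted(fit)                  # predicted probability that Y = 1
L <- predict(fit, type = "link")  # linear predictor (logit)
X <- model.matrix(fit)            # design matrix, first column is all 1s

ordinary     <- y - P
score        <- X * (y - P)       # each column of X times (Y - P)
pearson      <- (y - P) / sqrt(P * (1 - P))
deviance_res <- ifelse(y == 1,
                       sqrt(2 * abs(log(P))),
                       -sqrt(2 * abs(log(1 - P))))
pseudo_dep   <- L + (y - P) / (P * (1 - P))
partial_x1   <- coef(fit)[["x1"]] * x1 + (y - P) / (P * (1 - P))
```

At the maximum likelihood estimate the columns of the score matrix each sum to approximately zero, which is a quick sanity check on the hand computation.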

All of these residuals can also be computed for an ordinal logistic model, using as temporary binary responses dichotomizations of Y, along with the corresponding P, the probability that Y ≥ cutoff. For type="partial", all possible dichotomizations are used, and for type="score", the actual components of the first derivative of the log likelihood are used for an ordinal model. Alternatively, specify type="score.binary" to use binary model score residuals but for all cutpoints of Y (plotted only, not returned). The score.binary, partial, and perhaps score residuals are useful for checking the proportional odds assumption. If the option pl=TRUE is used to plot the score or score.binary residuals, a score residual plot is made for each column of the design (predictor) matrix, with Y cutoffs on the x-axis and the mean ± 1.96 standard errors of the score residuals on the y-axis. You can instead use a box plot to display these residuals, for both score.binary and score. Proportional odds dictates a horizontal score.binary plot. Partial residual plots use smooth nonparametric estimates, separately for each cutoff of Y. One examines that plot for parallelism of the curves to check the proportional odds assumption, as well as to see if the predictor behaves linearly.
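As a rough sketch of the quantities behind a score.binary plot, the code below dichotomizes Y at each cutoff, forms the binary score residuals x(Y ≥ cutoff − P), and summarizes them as mean ± 1.96 standard errors. It is not the rms implementation: MASS::polr() is used as a stand-in proportional odds fitter, and the simulated data and the name score_summary are assumptions made for the example.

```r
# Sketch only: mean +/- 1.96 SE of binary score residuals per cutoff of Y.
# MASS::polr() stands in for lrm(); data and names are hypothetical.
library(MASS)
set.seed(2)
n <- 400
x <- rnorm(n)
ystar <- 0.8 * x + rlogis(n)                             # latent logistic variable
y <- cut(ystar, c(-Inf, -1, 1, Inf), labels = FALSE)     # ordinal Y coded 1, 2, 3

fit <- polr(factor(y, ordered = TRUE) ~ x)
eta <- x * coef(fit)[["x"]]                              # linear predictor without intercept

cutoffs <- 2:3
score_summary <- t(sapply(seq_along(cutoffs), function(j) {
  # polr models logit P(Y <= j) = zeta_j - eta, so P(Y >= j + 1) = plogis(eta - zeta_j)
  p_j <- plogis(eta - fit$zeta[[j]])
  r   <- x * ((y >= cutoffs[j]) - p_j)                   # binary score residual for x
  m   <- mean(r)
  se  <- sd(r) / sqrt(n)
  c(cutoff = cutoffs[j], mean = m, lower = m - 1.96 * se, upper = m + 1.96 * se)
}))
print(score_summary)
```

Because the data here are simulated under proportional odds, the means should show no systematic trend across cutoffs, which is the horizontal pattern the text describes.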

Also computes a variety of influence statistics and the le Cessie - van Houwelingen - Copas - Hosmer unweighted sum of squares test for global goodness of fit, done separately for each cutoff of Y in the case of an ordinal model.

The plot.lrm.partial function computes partial residuals for a series of binary logistic model fits that all used the same predictors and that specified x=TRUE, y=TRUE. It then computes smoothed partial residual relationships (using lowess with iter=0) and plots them separately for each predictor, with residual plots from all model fits shown on the same plot for that predictor.
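The idea of overlaying smoothed partial residuals from several binary fits can be sketched in base R as follows. This is an illustration, not the plot.lrm.partial code: glm() replaces lrm(), the helper partial_smooth() and the simulated data are hypothetical, and only the lowess(iter=0) smoothing is taken from the description above.

```r
# Sketch: smoothed partial residuals for two binary fits sharing one predictor.
# partial_smooth() is a hypothetical helper; glm() stands in for lrm().
set.seed(4)
n   <- 400
age <- rnorm(n, 50, 10)
ystar <- 0.05 * (age - 50) + rlogis(n)
y <- cut(ystar, c(-Inf, -0.5, 0.5, Inf), labels = FALSE) - 1   # ordinal Y coded 0, 1, 2

partial_smooth <- function(ybin) {
  fit <- glm(ybin ~ age, family = binomial)
  P   <- fitted(fit)
  r   <- coef(fit)[["age"]] * age + (ybin - P) / (P * (1 - P))  # partial residual
  lowess(age, r, iter = 0)                                      # no robustness iterations
}

s1 <- partial_smooth(as.integer(y >= 1))
s2 <- partial_smooth(as.integer(y == 2))

plot(s1, type = "l", ylim = range(s1$y, s2$y),
     xlab = "age", ylab = "smoothed partial residual")
lines(s2, lty = 2)
```

Roughly parallel curves are what one would hope to see when proportional odds holds, as it does in this simulation.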


Usage


## S3 method for class 'lrm'
residuals(object, type=c("ordinary", "score", "score.binary",
                  "pearson", "deviance", "pseudo.dep", "partial",
                  "dfbeta","dfbetas","dffit","dffits","hat","gof","lp1"),
           pl=FALSE, xlim, ylim, kint, label.curves=TRUE, which, ...)

## S3 method for class 'lrm.partial'
plot(..., labels, center=FALSE, ylim)



Arguments

Argument: object
object created by lrm


Argument: ...
for residuals, applies to type="partial" when pl is not FALSE. These are extra arguments passed to the smoothing function. Can also be used to pass extra arguments to boxplot for type="score" or "score.binary". For plot.lrm.partial this specifies a series of binary model fit objects.


Argument: type
type of residual desired. Use type="lp1" to get approximate leave-out-1 linear predictors, derived by subtracting the dffit from the original linear predictor values.
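To make concrete what type="lp1" approximates, the sketch below computes exact leave-out-1 linear predictors by brute-force refitting and then the leave-out-1 deviance for a binary response. It does not use the dffit shortcut described above; glm() stands in for lrm(), and the data and names (lp_loo, dev_loo) are assumptions for the example.

```r
# Sketch: exact leave-out-1 linear predictors by refitting n times.
# This is the quantity that type="lp1" approximates; glm() stands in for lrm().
set.seed(1)
n <- 60
x <- runif(n, -1, 1)
y <- rbinom(n, 1, plogis(x))
d <- data.frame(x = x, y = y)

full <- glm(y ~ x, family = binomial, data = d)
lp   <- predict(full, type = "link")

lp_loo <- sapply(seq_len(n), function(i) {
  fit_i <- glm(y ~ x, family = binomial, data = d[-i, ])       # drop observation i
  unname(predict(fit_i, newdata = d[i, , drop = FALSE], type = "link"))
})

# Binary log likelihood is sum(y * L + log(1 - plogis(L))); deviance is -2 times that
dev_full <- -2 * sum(y * lp     + log(1 - plogis(lp)))
dev_loo  <- -2 * sum(y * lp_loo + log(1 - plogis(lp_loo)))
```

The leave-out-1 deviance is never smaller than the in-sample deviance, since each held-out observation is predicted by a model that did not see it.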


Argument: pl
applies only to type="partial", "score", and "score.binary". For score residuals in an ordinal model, set pl=TRUE to get means and approximate 0.95 confidence bars vs. Y, separately for each X. Alternatively, specify pl="boxplot" to use boxplot to draw the plot, with notches and with width proportional to the square root of the cell sizes. For partial residuals, set pl=TRUE (which uses lowess) or pl="supsmu" to get smoothed partial residual plots for all columns of X using supsmu. Use pl="loess" to use loess and get confidence bands ("loess" is not implemented for ordinal responses). Under R, pl="loess" uses lowess and does not provide confidence bands. If there is more than one X, you should probably use par(mfrow=c( , )) before calling resid. Note that pl="loess" results in plot.loess being called, which requires a large memory allocation.


Argument: xlim
plotting range for x-axis (default = whole range of predictor)


Argument: ylim
plotting range for y-axis (default = whole range of residuals, range of all confidence intervals for score or score.binary or range of all smoothed curves for partial if pl=TRUE, or 0.1 and 0.9 quantiles of the residuals for pl="boxplot".)


Argument: kint
for an ordinal model for residuals other than partial, score, or score.binary, specifies the intercept (and the cutoff of Y) to use for the calculations. Specifying kint=2, for example, means to use Y ≥ the 3rd level.
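The mapping from kint to a dichotomization of Y can be illustrated with a few lines of base R. The toy response and the name y_bin are hypothetical; the only thing taken from the text is that kint=2 corresponds to Y ≥ the 3rd level.

```r
# Sketch: which temporary binary response corresponds to kint = 2.
# The toy data and the name y_bin are made up for illustration.
y <- factor(c("low", "mid", "high", "mid", "high", "low"),
            levels = c("low", "mid", "high"), ordered = TRUE)
kint <- 2

# Intercept number kint goes with the event Y >= level (kint + 1)
y_bin <- as.integer(as.integer(y) >= kint + 1)
print(y_bin)
```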


Argument: label.curves
set to FALSE to suppress curve labels when type="partial". The default, TRUE, causes labcurve to be invoked to label curves where they are most separated. label.curves can be a list containing the opts parameter for labcurve, to send options to labcurve, such as tilt. The default for tilt here is TRUE.


Argument: which
a vector of integers specifying column numbers of the design matrix for which to compute or plot residuals, for type="partial","score","score.binary".


Argument: labels
for plot.lrm.partial this specifies a vector of character strings providing labels for the list of binary fits. By default, the names of the fit objects are used as labels. The labcurve function is used to label the curve with the labels.



Argument: center
for plot.lrm.partial this causes partial residuals for every model to have a mean of zero before smoothing and plotting


Details


For the goodness-of-fit test, the le Cessie-van Houwelingen normal test statistic for the unweighted sum of squared errors (Brier score times n) is used. For an ordinal response variable, the test for predicting the probability that Y ≥ j is done separately for all j (except the first). Note that the test statistic can have strange behavior (i.e., it is far too large) if the model has no predictive value.
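A hedged base-R sketch of the ingredients of this test follows. The sum of squared errors and its relation to the Brier score come straight from the text; the expected value sum(P(1-P)) and the standard deviation from a weighted regression of 1-2P on X follow the formulation in the Hosmer et al. paper cited in the references, and are my assumption about how the standardization is done, not a copy of the rms code. glm() stands in for lrm(), and the data and names are hypothetical.

```r
# Sketch: unweighted sum of squares goodness-of-fit statistic for a binary fit.
# The standardization below is an assumption based on Hosmer et al. (1997),
# not the rms source; glm() stands in for lrm().
set.seed(3)
n <- 300
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + x))
fit <- glm(y ~ x, family = binomial)

P <- fitted(fit)
X <- model.matrix(fit)

sse   <- sum((y - P)^2)     # unweighted sum of squared errors
brier <- mean((y - P)^2)    # Brier score, so sse equals n * brier

wt <- P * (1 - P)
expected_h0 <- sum(wt)      # expected value of sse under H0

# Variance: weighted residual sum of squares from regressing (1 - 2P) on X
res   <- lm.wfit(X, 1 - 2 * P, wt)$residuals
sd_h0 <- sqrt(sum(wt * res^2))

z <- (sse - expected_h0) / sd_h0
p_value <- 2 * pnorm(-abs(z))
print(c(sse = sse, expected = expected_h0, sd = sd_h0, z = z, p = p_value))
```

For an ordinal response the same calculation would be repeated with y replaced by each dichotomization Y ≥ j and P by the corresponding model probability.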

For most of the values of type, you must have specified x=TRUE, y=TRUE to lrm.

There is as yet no literature on interpreting score residual plots for the ordinal model. Simulations in which proportional odds is satisfied have still shown a U-shaped residual plot. The series of binary model score residuals for all cutoffs of Y seems to check the assumptions better. See the last example.


Value

a matrix (type="partial","dfbeta","dfbetas","score"), test statistic (type="gof"), or a vector otherwise. For partial residuals from an ordinal model, the returned object is a 3-way array (rows of X by columns of X by cutoffs of Y), and NAs deleted during the fit are not re-inserted into the residuals. For score.binary, nothing is returned.


Author(s)



Frank Harrell
Department of Biostatistics
Vanderbilt University
f.harrell@vanderbilt.edu




References



comparison of goodness-of-fit tests for the logistic regression model. Stat in Med 16:965–980, 1997.


See Also

lrm, naresid, which.influence, loess, supsmu, lowess, boxplot, labcurve


Examples


library(rms)        #provides lrm() and the residuals method used below
set.seed(1)
x1 <- runif(200, -1, 1)
x2 <- runif(200, -1, 1)
L  <- x1^2 - .5 + x2
y  <- ifelse(runif(200) <= plogis(L), 1, 0)
f <- lrm(y ~ x1 + x2, x=TRUE, y=TRUE)
resid(f)            #add rows for NAs back to data
resid(f, "score")   #also adds back rows
r <- resid(f, "partial")  #for checking transformations of X's
par(mfrow=c(1,2))
for(i in 1:2) {
  xx <- if(i==1)x1 else x2
  plot(xx, r[,i], xlab=c('x1','x2')[i])
  lines(lowess(xx,r[,i]))
}
resid(f, "partial", pl="loess")  #same as the plotting loop above
resid(f, "partial", pl=TRUE) #plots for all columns of X (pl=TRUE uses lowess; use pl="supsmu" for supsmu)
resid(f, "gof")           #global test of goodness of fit
lp1 <- resid(f, "lp1")    #approx. leave-out-1 linear predictors
-2*sum(y*lp1 + log(1-plogis(lp1)))  #approx leave-out-1 deviance
                                    #formula assumes y is binary


# Simulate data from a population proportional odds model
set.seed(1)
n   <- 400
age <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
L <- .05*(age-50) + .03*(blood.pressure-120)
p12 <- plogis(L)    # Pr(Y>=1)
p2  <- plogis(L-1)  # Pr(Y=2)
p   <- cbind(1-p12, p12-p2, p2)   # individual class probabilities
# Cumulative probabilities:
cp  <- matrix(cumsum(t(p)) - rep(0:(n-1), rep(3,n)), byrow=TRUE, ncol=3)
# simulate multinomial with varying probs:
y <- (cp < runif(n)) %*% rep(1,3)
y <- as.vector(y)
# Thanks to Dave Krantz for this trick
f <- lrm(y ~ age + blood.pressure, x=TRUE, y=TRUE)
par(mfrow=c(2,2))
resid(f, 'score.binary',   pl=TRUE)              #plot score residuals
resid(f, 'partial', pl=TRUE)                     #plot partial residuals
resid(f, 'gof')           #test GOF for each level separately


# Make a series of binary fits and draw 2 partial residual plots
#
f1 <- lrm(y>=1 ~ age + blood.pressure, x=TRUE, y=TRUE)
f2  <- update(f1, y==2 ~.)
par(mfrow=c(2,1))
plot.lrm.partial(f1, f2)


# Simulate data from both a proportional odds and a non-proportional
# odds population model.  Check how 3 kinds of residuals detect
# non-prop. odds
set.seed(71)
n <- 400
x <- rnorm(n)

par(mfrow=c(2,3))
for(j in 1:2) {     # 1: prop.odds   2: non-prop. odds
  if(j==1)
    L <- matrix(c(1.4,.4,-.1,-.5,-.9),nrow=n,ncol=5,byrow=TRUE) + x/2 else {
          # Slopes and intercepts for cutoffs of 1:5 :
          slopes <- c(.7,.5,.3,.3,0)
          ints   <- c(2.5,1.2,0,-1.2,-2.5)
      L <- matrix(ints,nrow=n,ncol=5,byrow=TRUE)+
           matrix(slopes,nrow=n,ncol=5,byrow=TRUE)*x
        }
  p <- plogis(L)
  # Cell probabilities
  p <- cbind(1-p[,1],p[,1]-p[,2],p[,2]-p[,3],p[,3]-p[,4],p[,4]-p[,5],p[,5])
  # Cumulative probabilities from left to right
  cp  <- matrix(cumsum(t(p)) - rep(0:(n-1), rep(6,n)), byrow=TRUE, ncol=6)
  y   <- (cp < runif(n)) %*% rep(1,6)


  f <- lrm(y ~ x, x=TRUE, y=TRUE)
  for(cutoff in 1:5)print(lrm(y>=cutoff ~ x)$coef)


  print(resid(f,'gof'))
  resid(f, 'score', pl=TRUE)
  # Note that full ordinal model score residuals exhibit a
  # U-shaped pattern even under prop. odds
  ti <- if(j==2) 'Non-Proportional Odds\nSlopes=.7 .5 .3 .3 0' else
    'True Proportional Odds\nOrdinal Model Score Residuals'
  title(ti)
  resid(f, 'score.binary', pl=TRUE)
  if(j==1) ti <- 'True Proportional Odds\nBinary Score Residuals'
  title(ti)
  resid(f, 'partial', pl=TRUE)
  if(j==1) ti <- 'True Proportional Odds\nPartial Residuals'
  title(ti)
}
par(mfrow=c(1,1))


## Not run:
# Get data used in Hosmer et al. paper and reproduce their calculations
v <- Cs(id, low, age, lwt, race, smoke, ptl, ht, ui, ftv, bwt)
d <- read.table("http://www.umass.edu/statdata/statdata/data/lowbwt.dat",
                skip=6, col.names=v)
d <- upData(d, race=factor(race,1:3,c('white','black','other')))
f <- lrm(low ~ age + lwt + race + smoke, data=d, x=TRUE,y=TRUE)
f
resid(f, 'gof')
# Their Table 7 Line 2 found sum of squared errors=36.91, expected
# value under H0=36.45, variance=.065, P=.071
# We got 36.90, 36.45, SD=.26055 (var=.068), P=.085
# Note that two logistic regression coefficients differed a bit
# from their Table 1

## End(Not run)

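When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).


Notes:
Note 1: For the convenience of learners, this document was translated by LoveR, the 生物统计家园网 robot. It is intended only as a personal reference for learning R; 生物统计家园 retains the copyright.
Note 2: Because the translation was produced automatically, some inaccuracies are unavoidable. Carefully comparing the Chinese and English text while reading can help with learning R.
Note 3: If you find an inaccuracy, please reply to this post; corrections will be made over time.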