ols(rms)
ols() belongs to R package: rms
Linear Model Estimation Using Ordinary Least Squares
Translated by: 生物统计家园网 robot LoveR
----------Description----------
Fits the usual weighted or unweighted linear regression model using the same fitting routines used by lm, but also stores the variance-covariance matrix var and uses traditional dummy-variable coding for categorical factors. Also fits unweighted models using penalized least squares, with the same penalization options as in the lrm function. For penalized estimation, there is a fitter function called lm.pfit.
----------Usage----------
ols(formula, data, weights, subset, na.action=na.delete,
method="qr", model=FALSE,
x=FALSE, y=FALSE, se.fit=FALSE, linear.predictors=TRUE,
penalty=0, penalty.matrix, tol=1e-7, sigma,
var.penalty=c('simple','sandwich'), ...)
----------Arguments----------
Argument: formula
an S formula object, e.g. Y ~ rcs(x1,5)*lsp(x2,c(10,20))
Argument: data
name of an S data frame containing all needed variables. Omit this to use a data frame already in the S “search list”.
Argument: weights
an optional vector of weights to be used in the fitting process. If specified, weighted least squares is used with the weights in weights (that is, minimizing sum(w*e^2), where w are the weights and e the residuals); otherwise ordinary least squares is used.
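The weighted criterion sum(w*e^2) can be illustrated with base R alone. The following is a minimal sketch, not the internals of ols: the simulated data and the names w, X, b_closed, b_wfit and wss are assumptions made for illustration, and lm.wfit is the base R weighted fitter.

```r
# Sketch: weighted least squares minimizes sum(w * e^2).
set.seed(1)
n <- 50
x <- runif(n)
w <- runif(n, 0.5, 2)              # hypothetical case weights
y <- 1 + 2 * x + rnorm(n)

X <- cbind(Intercept = 1, x = x)   # design matrix with intercept column

# Closed form: solve the weighted normal equations (X' W X) b = X' W y
b_closed <- drop(solve(crossprod(X, w * X), crossprod(X, w * y)))

# Same fit through the base R weighted fitter
b_wfit <- lm.wfit(X, y, w)$coefficients

# Weighted sum of squared residuals for a candidate coefficient vector
wss <- function(b) sum(w * (y - X %*% b)^2)

print(rbind(closed_form = b_closed, lm.wfit = b_wfit))
```

Both rows of the printed matrix should agree up to numerical tolerance, and wss evaluated at any other coefficient vector should be no smaller than at b_closed.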
Argument: subset
an expression defining a subset of the observations to use in the fit. The default is to use all observations. Specify for example age>50 & sex=="male" or c(1:100,200:300) respectively to use the observations satisfying a logical expression or those having row numbers in the given vector.
Argument: na.action
specifies an S function to handle missing data. The default is the function na.delete, which causes observations with any variable missing to be deleted. The main difference between na.delete and the S-supplied function na.omit is that na.delete makes a list of the number of observations that are missing on each variable in the model. The na.action is usually specified by e.g. options(na.action="na.delete").
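The bookkeeping described for na.delete can be mimicked in base R. This sketch does not call na.delete; the toy data frame d and the names n_missing and complete are assumptions used only to show the two pieces of information involved (per-variable missing counts, and the rows that survive deletion).

```r
# Sketch: per-variable missing counts plus listwise deletion, base R only.
d <- data.frame(
  y  = c(1, NA, 3, 4, 5),
  x1 = c(NA, 2, 3, NA, 5),
  x2 = c(1, 2, 3, 4, 5)
)

# Number of missing observations on each variable
# (the kind of summary na.delete is described as recording)
n_missing <- colSums(is.na(d))

# Listwise deletion, as done by na.omit: drop any row with a missing value
complete <- na.omit(d)

print(n_missing)
print(complete)
```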
Argument: method
specifies a particular fitting method, or "model.frame" to instead return the model frame of the predictor and response variables satisfying any subset or missing value checks.
Argument: model
default is FALSE. Set to TRUE to return the model frame as element model of the fit object.
Argument: x
default is FALSE. Set to TRUE to return the expanded design matrix as element x (without intercept indicators) of the returned fit object. Set x=TRUE if you are going to use the residuals function later to return anything other than ordinary residuals.
Argument: y
default is FALSE. Set to TRUE to return the vector of response values as element y of the fit.
Argument: se.fit
default is FALSE. Set to TRUE to compute the estimated standard errors of the estimate of X beta and store them in element se.fit of the fit.
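The standard errors of X beta are the square roots of the diagonal of X V X', where V is the variance-covariance matrix of the coefficients. A minimal sketch with base R's lm (used here as a stand-in for ols, on simulated data; the names se_manual and se_lm are assumptions):

```r
# Sketch: standard errors of the linear predictor X beta.
set.seed(2)
n <- 40
x <- runif(n)
y <- 1 + 2 * x + rnorm(n)

fit <- lm(y ~ x)
X <- model.matrix(fit)   # design matrix, including the intercept column
V <- vcov(fit)           # variance-covariance matrix of the coefficients

# diag(X V X') computed row by row, without forming the n x n matrix
se_manual <- sqrt(rowSums((X %*% V) * X))

# Reference values from predict.lm
se_lm <- predict(fit, se.fit = TRUE)$se.fit

print(head(cbind(se_manual, se_lm)))
```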
Argument: linear.predictors
set to FALSE to cause predicted values not to be stored
Argument: penalty
Argument: penalty.matrix
see lrm
Argument: tol
tolerance for information matrix singularity
Argument: sigma
If sigma is given, it is taken as the actual root mean squared error parameter for the model. Otherwise sigma is estimated from the data using the usual formulas (except for penalized models). It is often convenient to specify sigma=1 for models with no error, when using fastbw to find an approximate model that predicts predicted values from the full model with a given accuracy.
Argument: var.penalty
the type of variance-covariance matrix to be stored in the var component of the fit when penalization is used. The default is the inverse of the penalized information matrix. Specify var.penalty="sandwich" to use the sandwich estimator (see below under var), which limited simulation studies have shown yields variance estimates that are too low.
Argument: ...
arguments to pass to lm.wfit or lm.fit
----------Details----------
For penalized estimation, the penalty factor on the log likelihood is -0.5 β' P β / σ^2, where P is the penalty matrix defined under Value. The penalized maximum likelihood estimate (penalized least squares or ridge estimate) of β is (X'X + P)^{-1} X'Y. The maximum likelihood estimate of σ^2 is (sse + β' P β) / n, where sse is the sum of squared errors (residuals). The effective.df.diagonal vector is the diagonal of the matrix X'X/(sse/n) σ^{2} (X'X + P)^{-1}.
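The formulas in this section can be traced numerically in base R. This is a sketch under stated assumptions, not the code of lm.pfit: the data are simulated, the penalty P (identity on the slopes, scaled by a hypothetical lambda, with a zero row and column for the intercept) is chosen only for illustration, and the effective d.f. expression is read as diag([X'X/(sse/n)] [σ^2 (X'X + P)^{-1}]).

```r
# Sketch: penalized least squares estimate, sigma^2, and effective d.f.
set.seed(3)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n)
X  <- cbind(Intercept = 1, x1 = x1, x2 = x2)

lambda <- 5                        # hypothetical penalty factor
P <- lambda * diag(c(0, 1, 1))     # no penalty on the intercept

XtX  <- crossprod(X)
Z    <- solve(XtX + P)                     # (X'X + P)^{-1}
beta <- drop(Z %*% crossprod(X, y))        # (X'X + P)^{-1} X'Y

sse    <- sum((y - X %*% beta)^2)          # sum of squared residuals
sigma2 <- (sse + drop(t(beta) %*% P %*% beta)) / n

# Effective d.f. diagonal under the reading stated above
edf_diag <- diag((XtX / (sse / n)) %*% (sigma2 * Z))
edf      <- sum(edf_diag) - 1              # minus one for the intercept

# Unpenalized estimate for comparison
beta_ols <- drop(solve(XtX, crossprod(X, y)))

print(rbind(penalized = beta, unpenalized = beta_ols))
print(c(sigma2 = sigma2, edf = edf))
```

With P set to zero, Z becomes the inverse of X'X and sigma2 equals sse/n, so every element of edf_diag is 1 and edf reduces to the number of non-intercept columns.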
----------Value----------
The same objects returned from lm (unless penalty or penalty.matrix are given, in which case an abbreviated list is returned since lm.pfit is used as the fitter), plus the design attributes (see rms). Predicted values are always returned, in the element linear.predictors. The vectors or matrix stored if y=TRUE or x=TRUE have rows deleted according to subset and to missing data, and have names or row names that come from the data frame used as input data.
If penalty or penalty.matrix is given, the var matrix returned is an improved variance-covariance matrix for the penalized regression coefficient estimates. If var.penalty="sandwich" (not the default, as limited simulation studies have found it provides variance estimates that are too low) it is defined as σ^{2} (X'X + P)^{-1} X'X (X'X + P)^{-1}, where P is penalty factors * penalty.matrix, with a column and row of zeros added for the intercept. When var.penalty="simple" (the default), var is σ^{2} (X'X + P)^{-1}.
The returned list has a vector stats with named elements n, Model L.R., d.f., R2, g, Sigma. Model L.R. is the model likelihood ratio chi-square statistic, and R2 is R^2. For penalized estimation, d.f. is the effective degrees of freedom, which is the sum of the elements of another vector returned, effective.df.diagonal, minus one for the intercept. g is the g-index. Sigma is the penalized maximum likelihood estimate (see Details).
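The two definitions of var can be compared directly in base R. This is a sketch on simulated data with a hypothetical penalty matrix P (zero row and column for the intercept); it does not call ols or lm.pfit, and the names var_simple and var_sandwich are assumptions.

```r
# Sketch: "simple" versus "sandwich" variance-covariance matrices.
set.seed(4)
n  <- 80
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + x1 + 0.5 * x2 + rnorm(n)
X  <- cbind(Intercept = 1, x1 = x1, x2 = x2)

P <- 3 * diag(c(0, 1, 1))          # hypothetical penalty, none on intercept

XtX  <- crossprod(X)
Z    <- solve(XtX + P)             # (X'X + P)^{-1}
beta <- drop(Z %*% crossprod(X, y))

sse    <- sum((y - X %*% beta)^2)
sigma2 <- (sse + drop(t(beta) %*% P %*% beta)) / n

var_simple   <- sigma2 * Z                  # sigma^2 (X'X + P)^{-1}
var_sandwich <- sigma2 * Z %*% XtX %*% Z    # sigma^2 (X'X+P)^{-1} X'X (X'X+P)^{-1}

print(cbind(simple = diag(var_simple), sandwich = diag(var_sandwich)))
```

Because X'X = (X'X + P) - P, the sandwich matrix equals the simple matrix minus σ^{2} Z P Z. For a positive semidefinite P its diagonal therefore cannot exceed that of the simple estimator, which is consistent with the remark that the sandwich form tends to give smaller variance estimates.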
----------Author(s)----------
Frank Harrell
Department of Biostatistics, Vanderbilt University
f.harrell@vanderbilt.edu
----------See Also----------
rms, rms.trans, anova.rms, summary.rms, predict.rms, fastbw, validate, calibrate, Predict, specs.rms, cph, lrm, which.influence, lm, summary.lm, print.ols, residuals.ols, latex.ols, na.delete, na.detail.response, datadist, pentrace, vif, abs.error.pred
----------Examples----------
library(rms)  # provides ols, datadist, rcs, scored, fastbw
set.seed(1)
x1 <- runif(200)
x2 <- sample(0:3, 200, TRUE)
distance <- (x1 + x2/3 + rnorm(200))^2
d <- datadist(x1,x2)
options(datadist="d") # No d -> no summary, plot without giving all details
f <- ols(sqrt(distance) ~ rcs(x1,4) + scored(x2), x=TRUE)
# could use d <- datadist(f); options(datadist="d") at this point,
# but predictor summaries would not be stored in the fit object for
# use with plot.Design, summary.Design. In that case, the original
# dataset or d would need to be accessed later, or all variable values
# would have to be specified to summary, plot
anova(f)
which.influence(f)
summary(f)
summary.lm(f) # will only work if penalty and penalty.matrix not used
# Fit a complex model and approximate it with a simple one
x1 <- runif(200)
x2 <- runif(200)
x3 <- runif(200)
x4 <- runif(200)
y <- x1 + x2 + rnorm(200)
f <- ols(y ~ rcs(x1,4) + x2 + x3 + x4)
pred <- fitted(f) # or predict(f) or f$linear.predictors
f2 <- ols(pred ~ rcs(x1,4) + x2 + x3 + x4, sigma=1)
# sigma=1 prevents numerical problems resulting from R2=1
fastbw(f2, aics=100000)
# This will find the best 1-variable model, best 2-variable model, etc.
# in predicting the predicted values from the original model
options(datadist=NULL)