R: cv.glm() function Chinese help documentation (Chinese-English)

Posted on 2012-2-17 10:03:55
cv.glm(boot)
cv.glm() is in the R package: boot

Cross-validation for Generalized Linear Models

Translator: biostatistic.net robot LoveR

Description

This function calculates the estimated K-fold cross-validation prediction error for generalized linear models.


Usage


cv.glm(data, glmfit, cost, K)



Arguments

Argument: data
A matrix or data frame containing the data. The rows should be cases and the columns correspond to variables, one of which is the response.


Argument: glmfit
An object of class "glm" containing the results of a generalized linear model fitted to data.


Argument: cost
A function of two vector arguments specifying the cost function for the cross-validation. The first argument to cost should correspond to the observed responses and the second argument should correspond to the predicted or fitted responses from the generalized linear model. cost must return a non-negative scalar value. The default is the average squared error function.
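For illustration, here is a minimal sketch of two cost functions of the required form; the names sq.err and mis.class are purely illustrative. The first mirrors the default average squared error, the second is a misclassification-rate cost of the kind used for the binary response in the Examples below.

# Average squared error between observed and fitted responses (the default behaviour)
sq.err <- function(y, yhat) mean((y - yhat)^2)

# Misclassification rate for a binary response, where yhat is the fitted probability;
# a prediction counts as wrong when it falls on the other side of 0.5
mis.class <- function(y, yhat) mean(abs(y - yhat) > 0.5)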


Argument: K
The number of groups into which the data should be split to estimate the cross-validation prediction error. The value of K must be such that all groups are of approximately equal size. If the supplied value of K does not satisfy this criterion then it will be set to the closest integer which does, and a warning is generated specifying the value of K used. The default is to set K equal to the number of observations in data, which gives the usual leave-one-out cross-validation.
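For example (assuming a hypothetical data frame df and a glm object fit fitted to it), leaving K at its default gives leave-one-out cross-validation, while an explicit K requests K-fold cross-validation; an infeasible K is adjusted as described above and the value actually used is reported in a warning.

# df and fit are hypothetical: the data frame and the glm fitted to it
cv.loo <- cv.glm(df, fit)          # K defaults to the number of observations: leave-one-out
cv.10  <- cv.glm(df, fit, K = 10)  # 10-fold cross-validation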


Details

The data is divided randomly into K groups. For each group the generalized linear model is fitted to the data omitting that group, then the function cost is applied to the observed responses in the group that was omitted from the fit and the predictions made by the fitted model for those observations.

When K is the number of observations, leave-one-out cross-validation is used and all the possible splits of the data are used. When K is less than the number of observations, the K splits to be used are found by randomly partitioning the data into K groups of approximately equal size. In this latter case a certain amount of bias is introduced. This can be reduced by using a simple adjustment (see equation 6.48 in Davison and Hinkley, 1997). The second value returned in delta is the estimate adjusted by this method.
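To make the procedure concrete, the following is a minimal hand-rolled sketch of the basic K-fold loop described above; it is not the boot implementation itself. It assumes a data frame df, a model formula form, a cost function cost as described under Arguments, a chosen K and, for generality, a family argument; the bias adjustment of equation 6.48 is omitted.

# Minimal sketch of K-fold cross-validation for a glm (not the boot::cv.glm code itself);
# assumes df (data frame), form (model formula), cost(y, yhat) and K
kfold.cv <- function(df, form, cost, K, family = gaussian) {
  n <- nrow(df)
  folds <- sample(rep(seq_len(K), length.out = n))    # random split into K groups of roughly equal size
  err <- 0
  for (k in seq_len(K)) {
    out <- folds == k
    fit.k <- glm(form, family = family, data = df[!out, , drop = FALSE])   # fit with group k omitted
    pred.k <- predict(fit.k, newdata = df[out, , drop = FALSE], type = "response")
    y.out <- model.response(model.frame(form, df[out, , drop = FALSE]))    # observed responses in group k
    err <- err + (sum(out) / n) * cost(y.out, pred.k)                      # weight the group's cost by its size
  }
  err   # comparable to the first (unadjusted) component of cv.glm()$delta
}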


Value

The returned value is a list with the following components.


Component: call
The original call to cv.glm.


Component: K
The value of K used for the K-fold cross-validation.


Component: delta
A vector of length two. The first component is the raw cross-validation estimate of prediction error. The second component is the adjusted cross-validation estimate. The adjustment is designed to compensate for the bias introduced by not using leave-one-out cross-validation.
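For instance (again assuming the hypothetical df and fit objects from above), both estimates are read directly from the returned list.

# Raw and adjusted K-fold estimates of prediction error
res <- cv.glm(df, fit, K = 10)
res$delta[1]   # raw 10-fold cross-validation estimate
res$delta[2]   # adjusted estimate, compensating for not using leave-one-out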


Component: seed
The value of .Random.seed when cv.glm was called.


Side Effects

The value of .Random.seed is updated.
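A practical consequence, sketched with the same hypothetical df and fit: because the K-fold split is random, setting the seed before the call makes the folds, and therefore the estimate, reproducible.

set.seed(123)                          # fix the random fold assignment
cv.a <- cv.glm(df, fit, K = 10)$delta
set.seed(123)                          # same seed, same folds
cv.b <- cv.glm(df, fit, K = 10)$delta
identical(cv.a, cv.b)                  # TRUE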


References

Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984) Classification and Regression Trees. Wadsworth.
Burman, P. (1989) A comparative study of ordinary cross-validation, v-fold cross-validation and repeated learning-testing methods. Biometrika, 76, 503–514.
Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Application. Cambridge University Press.
Efron, B. (1986) How biased is the apparent error rate of a prediction rule? Journal of the American Statistical Association, 81, 461–470.
Stone, M. (1974) Cross-validatory choice and assessment of statistical predictions (with Discussion). Journal of the Royal Statistical Society, B, 36, 111–147.

See Also

glm, glm.diag, predict


Examples


library(boot)

# leave-one-out and 6-fold cross-validation prediction error for
# the mammals data set.
data(mammals, package = "MASS")
mammals.glm <- glm(log(brain) ~ log(body), data = mammals)
(cv.err <- cv.glm(mammals, mammals.glm)$delta)
(cv.err.6 <- cv.glm(mammals, mammals.glm, K = 6)$delta)

# As this is a linear model we could calculate the leave-one-out
# cross-validation estimate without any extra model-fitting.
muhat <- fitted(mammals.glm)
mammals.diag <- glm.diag(mammals.glm)
(cv.err <- mean((mammals.glm$y - muhat)^2 / (1 - mammals.diag$h)^2))

# leave-one-out and 11-fold cross-validation prediction error for
# the nodal data set. Since the response is a binary variable an
# appropriate cost function is
cost <- function(r, pi = 0) mean(abs(r - pi) > 0.5)

nodal.glm <- glm(r ~ stage + xray + acid, binomial, data = nodal)
(cv.err <- cv.glm(nodal, nodal.glm, cost, K = nrow(nodal))$delta)
(cv.11.err <- cv.glm(nodal, nodal.glm, cost, K = 11)$delta)

Please credit the source when reposting: biostatistic.net (http://www.biostatistic.net).


Notes:
Note 1: To aid learning, this document was translated by the biostatistic.net machine translator LoveR and is provided for personal study of R only; biostatistic.net retains the copyright.
Note 2: Because it is a machine translation, inaccuracies are unavoidable; compare the Chinese and English carefully when using it, which can also help with learning R.
Note 3: If you find inaccuracies, please reply to this thread and we will revise them over time.