R language: help documentation for the choose.k() function

loveR · posted 2012-2-16 19:36:44

choose.k(mgcv)
R package containing choose.k(): mgcv

                                    Basis dimension choice for smooths

                                       Translator: LoveR, the 生物统计家园网 robot

----------Description----------

Choosing the basis dimension, and checking the choice, when using penalized regression smoothers.

Penalized regression smoothers gain computational efficiency by virtue of being defined using a basis of relatively modest size, k. When setting up models in the mgcv package, using s or te terms in a model formula, k must be chosen: the defaults are essentially arbitrary.

In practice k-1 (or k) sets the upper limit on the degrees of freedom associated with an s smooth (1 degree of freedom is usually lost to the identifiability constraint on the smooth). For te smooths the upper limit on the degrees of freedom is given by the product of the k values provided for each marginal smooth, less one for the constraint. However, the actual effective degrees of freedom are controlled by the degree of penalization selected during fitting, by GCV, AIC, REML or whichever criterion is specified. The exception is a smooth specified with the fx=TRUE option, in which case it is unpenalized.
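
As a rough illustration of these upper limits, the sketch below counts the coefficients that mgcv keeps for each smooth once the identifiability constraint has been absorbed, and then fits one unpenalized term. The simulated data, the particular k values and the object names (n_coef, bf) are choices made for this illustration only and do not come from the original help text.

```r
library(mgcv)
set.seed(1)
dat <- gamSim(1, n = 200, scale = 2, verbose = FALSE)  # illustrative data only

## one s() term with k = 10 and one te() term with k = 5 per margin
b <- gam(y ~ s(x0, k = 10) + te(x1, x2, k = 5), data = dat)

## coefficients per smooth = upper limit on that smooth's degrees of freedom
n_coef <- sapply(b$smooth, function(sm) sm$last.para - sm$first.para + 1)
names(n_coef) <- sapply(b$smooth, function(sm) sm$label)
print(n_coef)   # s(x0): 10 - 1;  te(x1,x2): 5 * 5 - 1

## with fx = TRUE the term is unpenalized, so its EDF sits at the limit
bf <- gam(y ~ s(x0, k = 10, fx = TRUE), data = dat)
print(summary(bf)$edf)
```

For the penalized fit b, the estimated EDF of each term (summary(b)$edf) will normally lie below these limits, because the smoothing parameter selection decides how much of the basis is actually used.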

So, exact choice of k is not generally critical: it should be chosen to be large enough that you are reasonably sure of having enough degrees of freedom to represent the underlying "truth" reasonably well, but small enough to maintain reasonable computational efficiency. Clearly "large" and "small" are dependent on the particular problem being addressed.

As with all model assumptions, it is useful to be able to check the choice of k informally. If the effective degrees of freedom for a model term are estimated to be much less than k-1 then checking is unlikely to be very worthwhile, but as the EDF approaches k-1, checking can be important. A useful general purpose approach goes as follows: (i) fit your model and extract the deviance residuals; (ii) for each smooth term in your model, fit an equivalent, single, smooth to the residuals, using a substantially increased k, to see if there is a pattern in the residuals that could potentially be explained by increasing k. Examples are provided below.
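
Before running the residual check, it can help to see which terms have an EDF close to k-1. The following is a minimal sketch of one way to tabulate that, assuming the same simulated data as in the Examples section; the names k_max and ratio are hypothetical helpers introduced here, not part of mgcv.

```r
library(mgcv)
set.seed(0)
dat <- gamSim(1, n = 400, scale = 2, verbose = FALSE)

## deliberately low k for every term
b <- gam(y ~ s(x0, k = 6) + s(x1, k = 6) + s(x2, k = 6) + s(x3, k = 6),
         data = dat)

## estimated EDF per smooth, and the maximum each smooth could reach
edf   <- summary(b)$edf
k_max <- sapply(b$smooth, function(sm) sm$last.para - sm$first.para + 1)

## ratio near 1 means the term is close to its limit and worth checking
ratio <- setNames(edf / k_max, sapply(b$smooth, function(sm) sm$label))
print(round(ratio, 2))
```

Terms whose ratio is well below 1 are unlikely to need a larger k; terms with a ratio near 1 are the natural candidates for the residual-based check described above.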

The obvious, but more costly, alternative is simply to increase the suspect k and refit the original model. If there are no statistically important changes as a result of doing this, then k was large enough. (The change in the smoothness selection criterion, and/or in the effective degrees of freedom, when k is increased provides the obvious numerical measure of whether the fit has changed substantially.)
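
The refit-and-compare tactic can be sketched as a small loop that records, for each candidate k, the EDF of the suspect term and the REML score. This is only an illustration under stated assumptions: the suspect term is taken to be s(x2), the grid of k values is arbitrary, and the table name res is made up for this example.

```r
library(mgcv)
set.seed(0)
dat <- gamSim(1, n = 400, scale = 2, verbose = FALSE)

ks <- c(6, 12, 24)   # arbitrary grid of candidate basis dimensions

## refit with each k for s(x2); keep the EDF of that term and the REML score
res <- t(sapply(ks, function(k) {
  m <- gam(y ~ s(x0, k = 6) + s(x1, k = 6) + s(x2, k = k) + s(x3, k = 6),
           data = dat, method = "REML")
  c(k = k,
    edf_x2 = unname(summary(m)$edf[3]),   # third smooth is s(x2)
    reml   = unname(m$gcv.ubre))          # minimized REML criterion
}))
print(res)
```

Once both the EDF and the REML score stop changing appreciably from one row to the next, increasing k further is unlikely to alter the fit.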

One scenario that can cause confusion is this: a model is fitted with k=10 for a smooth term, and the EDF for the term is estimated as 7.6, some way below the maximum of 9. The model is then refitted with k=20 and the EDF increases to 8.7. What is happening? Why was the EDF not 8.7 the first time around? The explanation is that the function space with k=20 contains a larger subspace of functions with EDF 8.7 than did the function space with k=10: one of the functions in this larger subspace fits the data a little better than did any function in the smaller subspace. These subtleties seldom have much impact on the statistical conclusions to be drawn from a model fit, however.

----------Author(s)----------

Simon N. Wood simon.wood@r-project.org

----------Examples----------

## Simulate some data ....
library(mgcv)
set.seed(0)
dat <- gamSim(1,n=400,scale=2)

## fit a GAM with quite low `k'
b<-gam(y~s(x0,k=6)+s(x1,k=6)+s(x2,k=6)+s(x3,k=6),data=dat)
plot(b,pages=1)

## Economical tactic (see below for more obvious approach)....
## check for residual pattern, removable by increasing `k'
## typically `k', below, should be substantially larger than
## the original `k', but certainly less than n/2.
## Note use of cheap "cs" shrinkage smoothers, and gamma=1.4
## to reduce chance of overfitting...
rsd <- residuals(b)
gam(rsd~s(x0,k=40,bs="cs"),gamma=1.4,data=dat) ## fine
gam(rsd~s(x1,k=40,bs="cs"),gamma=1.4,data=dat) ## fine
gam(rsd~s(x2,k=40,bs="cs"),gamma=1.4,data=dat) ## `k' too low
gam(rsd~s(x3,k=40,bs="cs"),gamma=1.4,data=dat) ## fine

## similar example with multi-dimensional smooth
b1 <- gam(y~s(x0)+s(x1,x2,k=15)+s(x3),data=dat)
rsd <- residuals(b1)
gam(rsd~s(x0,k=40,bs="cs"),gamma=1.4,data=dat) ## fine
gam(rsd~s(x1,x2,k=100,bs="ts"),gamma=1.4,data=dat) ## `k' too low
gam(rsd~s(x3,k=40,bs="cs"),gamma=1.4,data=dat) ## fine

## and a `te' example
b2 <- gam(y~s(x0)+te(x1,x2,k=4)+s(x3),data=dat)
rsd <- residuals(b2)
gam(rsd~s(x0,k=40,bs="cs"),gamma=1.4,data=dat) ## fine
gam(rsd~te(x1,x2,k=10,bs="cs"),gamma=1.4,data=dat) ## `k' too low
gam(rsd~s(x3,k=40,bs="cs"),gamma=1.4,data=dat) ## fine

## same approach works with other families in the original model
dat <- gamSim(1,n=400,scale=.25,dist="poisson")
bp<-gam(y~s(x0,k=6)+s(x1,k=6)+s(x2,k=6)+s(x3,k=6),
      family=poisson,data=dat)
rsd <- residuals(bp)
gam(rsd~s(x0,k=40,bs="cs"),gamma=1.4,data=dat) ## fine
gam(rsd~s(x1,k=40,bs="cs"),gamma=1.4,data=dat) ## fine
gam(rsd~s(x2,k=40,bs="cs"),gamma=1.4,data=dat) ## `k' too low
gam(rsd~s(x3,k=40,bs="cs"),gamma=1.4,data=dat) ## fine

rm(dat)

## More obvious, but more expensive tactic... Just increase
## suspicious k until fit is stable.

set.seed(0)
dat <- gamSim(1,n=400,scale=2)
## fit a GAM with quite low `k'
b <- gam(y~s(x0,k=6)+s(x1,k=6)+s(x2,k=6)+s(x3,k=6),
      data=dat,method="REML")
b
## edf for 3rd smooth is highest as proportion of k -- increase k
b <- gam(y~s(x0,k=6)+s(x1,k=6)+s(x2,k=12)+s(x3,k=6),
      data=dat,method="REML")
b
## edf substantially up, -ve REML substantially down
b <- gam(y~s(x0,k=6)+s(x1,k=6)+s(x2,k=24)+s(x3,k=6),
      data=dat,method="REML")
b
## slight edf increase and -ve REML change
b <- gam(y~s(x0,k=6)+s(x1,k=6)+s(x2,k=40)+s(x3,k=6),
      data=dat,method="REML")
b
## definitely stabilized (but really k around 20 would have been fine)

Please credit the source when reposting: 生物统计家园网 (http://www.biostatistic.net).
