mgcv-FAQ(mgcv)
mgcv-FAQ() belongs to R package: mgcv
Frequently Asked Questions for package mgcv
----------Description----------
This page provides answers to some of the questions that get asked most often about mgcv.
----------FAQ list----------
How can I compare gamm models? In the identity link, normal errors case, AIC and hypothesis testing based methods are fine. Otherwise it is best to work out a strategy based on summary.gam. Alternatively, simple random effects can be fitted with gam, which makes comparison straightforward. Package gamm4 is an alternative, which allows AIC-type model selection for generalized models.
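For illustration, a minimal sketch of the gam-with-random-effects suggestion, using simulated data from gamSim(6) (the grouping factor fac, the particular smooth terms and the use of ML smoothness selection are my own choices, not part of the original answer): fitting the random effect directly with gam makes AIC comparison straightforward.
library(mgcv)
set.seed(0)
dat <- gamSim(6, n = 200, scale = 2)        ## simulated data containing a grouping factor 'fac'
b1 <- gam(y ~ s(x0) + s(fac, bs = "re"), data = dat, method = "ML")         ## random intercept as a "re" smooth
b2 <- gam(y ~ s(x0) + s(x1) + s(fac, bs = "re"), data = dat, method = "ML")
AIC(b1, b2)                                 ## straightforward AIC comparison of the two fits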
How do I get the equation of an estimated smooth? This slightly misses the point of semi-parametric modelling: the idea is that we estimate the form of the function from data without assuming that it has a particular simple functional form. Of course for practical computation the functions do have underlying mathematical representations, but they are not very helpful when written down. If you do need the functional forms then see chapter 4 of Wood (2006). However, for most purposes it is better to use predict.gam to evaluate the function for whatever argument values you need. If derivatives are required then the simplest approach is to use finite differencing (which also allows SEs etc. to be calculated).
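As a hedged illustration of that advice (simulated data from gamSim; the grid, the covariate values held fixed and the step size eps are arbitrary choices of mine): evaluate a smooth with predict.gam and approximate its derivative by finite differencing.
library(mgcv)
set.seed(1)
dat <- gamSim(1, n = 400, scale = 2)
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")
## evaluate the smooth of x2 on a grid, holding the other covariates fixed
newd <- data.frame(x0 = 0.5, x1 = 0.5, x2 = seq(0, 1, length.out = 200), x3 = 0.5)
fv <- predict(b, newd, type = "terms", se.fit = TRUE)    ## term-wise fitted values and standard errors
## finite-difference approximation to the derivative of the x2 smooth
eps <- 1e-4
newd2 <- newd
newd2$x2 <- newd2$x2 + eps
dfdx2 <- (predict(b, newd2, type = "terms")[, "s(x2)"] -
          predict(b, newd,  type = "terms")[, "s(x2)"]) / eps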
Some code from Wood (2006) causes an error: why? The book was written using mgcv version 1.3. To allow for REML estimation of smoothing parameters in versions 1.5 and later, some changes had to be made to the syntax. In particular, the function gam.method no longer exists. The smoothness selection method (GCV, REML etc.) is now controlled by the method argument to gam, while the optimizer is selected using the optimizer argument. See gam and http://www.maths.bath.ac.uk/~sw283/igam/index.html for details.
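For example (the model below is my own illustration, not taken from the book): in current versions the smoothness selection criterion and the optimizer are passed directly to gam, where book code would have used gam.method.
library(mgcv)
set.seed(2)
dat <- gamSim(1, n = 400, scale = 2)
## old (mgcv 1.3) style such as gam(..., method = gam.method(...)) no longer runs;
## method and optimizer are now ordinary arguments of gam:
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat,
         method = "REML", optimizer = c("outer", "newton"))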
Why is a model object saved under a previous mgcv version not usable with the current mgcv version? I'm sorry about this issue, I know it's really annoying. Here's my defence. Each mgcv version is run through an extensive test suite before release, to ensure that it gives the same results as before, unless there are good statistical reasons why not (e.g. improvements to p-value approximation, fixing of an error). However it is sometimes necessary to modify the internal structure of model objects in a way that makes an old style object unusable with a newer version. For example, bug fixes or new R features sometimes require changes in the way that things are computed which in turn require modification of the object structure. Similarly improvements, such as the ability to compute smoothing parameters by RE/ML require object level changes. The only fix to this problem is to access the old object using the original mgcv version (available on CRAN), or to recompute the fit using the current mgcv version.
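A minimal sketch of the second option, assuming the hypothetical file old_fit.rds holds a gam fit saved by an earlier release and that the data referenced by its original call are still available:
library(mgcv)
b_old <- readRDS("old_fit.rds")   ## object saved under an older mgcv version (hypothetical file)
b_new <- update(b_old)            ## re-runs the stored fitting call under the current mgcv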
When using gamm or gamm4, the reported AIC is different for the gam object and the lme or lmer object. Why is this? There are several reasons for this. The most important is that the models being used are actually different in the two representations. When treating the GAM as a mixed model, you are implicitly assuming that if you gathered a replicate dataset, the smooths in your model would look completely different to the smooths from the original model, except for having the same degree of smoothness. Technically, you would expect the smooths to be drawn afresh from their distribution under the random effects model. When viewing the gam from the usual penalized regression perspective, you would expect smooths to look broadly similar under replication of the data. That is, you are really using a Bayesian model for the smooths, rather than a random effects model (it's just that the frequentist random effects and Bayesian computations happen to coincide for computing the estimates). As a result of the different assumptions about the data generating process, AIC model comparisons can give rather different answers depending on the model adopted. Which you use should depend on which model you really think is appropriate. In addition, the computations of the AICs are different. The mixed model AIC uses the marginal likelihood and the corresponding number of model parameters. The gam model uses the penalized likelihood and the effective degrees of freedom.
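To see the two computations side by side (simulated data from gamSim(6); the particular model is just an illustration), compare the marginal likelihood AIC reported by the lme representation with the penalized likelihood AIC from an equivalent gam fit; the two will generally differ.
library(mgcv)
set.seed(3)
dat <- gamSim(6, n = 200, scale = 2)
m1 <- gamm(y ~ s(x0) + s(x1), data = dat, random = list(fac = ~1))            ## mixed model representation
m2 <- gam(y ~ s(x0) + s(x1) + s(fac, bs = "re"), data = dat, method = "REML") ## penalized regression view
AIC(m1$lme)   ## marginal likelihood plus number of model parameters
AIC(m2)       ## penalized likelihood plus effective degrees of freedom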
My new method is failing to beat mgcv, what can I do? If speed is the problem, then make sure that you use the slowest basis possible ("tp") with a large sample size, and experiment with different optimizers to find one that is slow for your problem. For prediction error/MSE, leaving the smoothing basis dimensions at their arbitrary defaults, when these are inappropriate for the problem setting, is a good way of reducing performance. Similarly, using p-splines in place of derivative penalty based splines will often shave a little more from the performance here. Unlike REML/ML, prediction error based smoothness selection criteria such as Mallows' Cp and GCV often produce a small proportion of severe overfits, so careful choice of smoothness selection method can help further. In particular, GCV etc. usually result in worse confidence interval and p-value performance than ML or REML. If all this fails, try using a really odd simulation setup for which mgcv is clearly not suited: for example, poor performance is almost guaranteed for small noisy datasets with large numbers of predictors.
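Read in reverse, the advice above amounts to the following sketch of settings that give mgcv a fair run (simulated data; the choice of k = 20 is arbitrary): prefer REML over prediction error criteria, and check whether the default basis dimensions are adequate.
library(mgcv)
set.seed(4)
dat <- gamSim(1, n = 400, scale = 2)
b_gcv  <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "GCV.Cp")        ## prediction error criterion
b_reml <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3, k = 20), data = dat, method = "REML")  ## REML, larger basis where needed
gam.check(b_reml)   ## residual checks plus a check of whether k is large enough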
----------Author(s)----------
Simon N. Wood <simon.wood@r-project.org>
----------References----------
Wood S.N. (2006) Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC Press.