R mgcv package: help documentation for the gam() function

loveR · posted 2012-9-23 10:25:31

gam(mgcv)
R package containing gam(): mgcv

                                    Generalized additive models with integrated smoothness estimation

                                       Translator: LoveR, the robot of the 生物统计家园网 site

----------Description----------

Fits a generalized additive model (GAM) to data, where "GAM" is taken to include any quadratically penalized GLM. The degree of smoothness of model terms is estimated as part of fitting. gam can also fit any GLM subject to multiple quadratic penalties (including estimation of the degree of penalization). Isotropic or scale invariant smooths of any number of variables are available as model terms, as are linear functionals of such smooths. Confidence/credible intervals are readily available for any quantity predicted using a fitted model. gam is extendable: users can add smooths.
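As a rough illustration of the interval claim above, the following sketch builds approximate 95% pointwise intervals from predict with se.fit=TRUE. The simulated data (from gamSim), the grid newd and the "estimate plus or minus 2 standard errors" rule are assumptions made for the example, not part of the help page.

```r
library(mgcv)
set.seed(1)
## simulated data; any data frame with a response y and covariate x2 would do
dat <- gamSim(1, n = 200, dist = "normal", scale = 2, verbose = FALSE)
b <- gam(y ~ s(x2), data = dat)

## predict on a grid, asking for standard errors as well
newd <- data.frame(x2 = seq(0, 1, length.out = 50))
p <- predict(b, newdata = newd, se.fit = TRUE)

## approximate 95% pointwise interval: estimate +/- 2 standard errors
ci <- data.frame(x2    = newd$x2,
                 fit   = as.numeric(p$fit),
                 lower = as.numeric(p$fit - 2 * p$se.fit),
                 upper = as.numeric(p$fit + 2 * p$se.fit))
head(ci)
```

The same pattern applies to any quantity that predict can return on the link scale; for non-Gaussian families the limits would usually be transformed through the inverse link afterwards.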

Smooth terms are represented using penalized regression splines (or similar smoothers) with smoothing parameters selected by GCV/UBRE/AIC/REML, or by regression splines with fixed degrees of freedom (mixtures of the two are permitted). Multi-dimensional smooths are available using penalized thin plate regression splines (isotropic) or tensor product splines (when an isotropic smooth is inappropriate). For an overview of the smooths available see smooth.terms. For more on specifying models see gam.models, random.effects and linear.functional.terms. For more on model selection see gam.selection. Do read gam.check and choose.k.

See gam from package gam for GAMs via the original Hastie and Tibshirani approach (see Details for the differences from this implementation).

For very large datasets see bam; for mixed GAMs see gamm and random.effects.

----------Usage----------

gam(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,
na.action,offset=NULL,method="GCV.Cp",
optimizer=c("outer","newton"),control=list(),scale=0,
select=FALSE,knots=NULL,sp=NULL,min.sp=NULL,H=NULL,gamma=1,
fit=TRUE,paraPen=NULL,G=NULL,in.out,...)

----------Arguments----------

Argument: formula
A GAM formula (see formula.gam and also gam.models). This is exactly like the formula for a GLM except that smooth terms, s and te, can be added to the right hand side to specify that the linear predictor depends on smooth functions of predictors (or linear functionals of these).
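A minimal sketch of such a formula, mixing a strictly parametric term with s and te terms. The data come from gamSim and the particular choice of which covariate enters in which way is arbitrary, made only for illustration.

```r
library(mgcv)
set.seed(1)
dat <- gamSim(1, n = 300, dist = "normal", scale = 2, verbose = FALSE)

## x0 enters linearly, x1 through a univariate smooth,
## and x2, x3 jointly through a tensor product smooth
b <- gam(y ~ x0 + s(x1) + te(x2, x3), data = dat)

length(b$smooth)     ## number of smooth terms in the fitted model
names(coef(b))[1:3]  ## the parametric coefficients come first
```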

Argument: family
This is a family object specifying the distribution and link to use in fitting etc. See glm and family for more details. A negative binomial family is provided: see negbin. quasi families actually result in the use of extended quasi-likelihood if method is set to a RE/ML method (McCullagh and Nelder, 1989, 9.6).

Argument: data
A data frame or list containing the model response variable and covariates required by the formula. By default the variables are taken from environment(formula): typically the environment from which gam is called.

Argument: weights
Prior weights on the data.

Argument: subset
An optional vector specifying a subset of observations to be used in the fitting process.

Argument: na.action
A function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The "factory-fresh" default is na.omit.

Argument: offset
Can be used to supply a model offset for use in fitting. Note that this offset will always be completely ignored when predicting, unlike an offset included in formula: this conforms to the behaviour of lm and glm.
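The difference between the two ways of supplying an offset can be seen in a small sketch like the one below. The variable names (off, m_arg, m_form, newd) and the simulated data are hypothetical; the expectation that the two predictions differ by the offset follows from the description above rather than from anything guaranteed here.

```r
library(mgcv)
set.seed(3)
n <- 200
dat <- data.frame(x = runif(n), off = runif(n, 1, 2))
dat$y <- dat$off + sin(2 * pi * dat$x) + rnorm(n, sd = 0.3)

m_arg  <- gam(y ~ s(x), offset = dat$off, data = dat)  ## offset as an argument
m_form <- gam(y ~ s(x) + offset(off), data = dat)      ## offset inside the formula

newd <- data.frame(x = c(0.25, 0.5, 0.75), off = c(1, 1.5, 2))
p_arg  <- predict(m_arg,  newdata = newd)  ## offset ignored
p_form <- predict(m_form, newdata = newd)  ## offset added to the prediction

## the difference should be close to newd$off
cbind(difference = as.numeric(p_form - p_arg), offset = newd$off)
```

If predictions are needed on the scale of the original response, putting the offset in the formula is usually the less surprising choice.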

Argument: control
A list of fit control parameters to replace defaults returned by gam.control. Values not set assume default values.

Argument: method
The smoothing parameter estimation method. "GCV.Cp" to use GCV for unknown scale parameter and Mallows' Cp/UBRE/AIC for known scale. "GACV.Cp" is equivalent, but using GACV in place of GCV. "REML" for REML estimation, including of unknown scale; "P-REML" for REML estimation, but using a Pearson estimate of the scale. "ML" and "P-ML" are similar, but using maximum likelihood in place of REML.
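A short sketch comparing a few of these settings on the same simulated data. The object names are arbitrary, and the size of any difference between criteria depends on the data, so no particular ordering should be read into the output.

```r
library(mgcv)
set.seed(2)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)

b_gcv  <- gam(y ~ s(x0) + s(x1), data = dat, method = "GCV.Cp")
b_reml <- gam(y ~ s(x0) + s(x1), data = dat, method = "REML")
b_ml   <- gam(y ~ s(x0) + s(x1), data = dat, method = "ML")

## estimated smoothing parameters, one column per fit
cbind(GCV = b_gcv$sp, REML = b_reml$sp, ML = b_ml$sp)

## total effective degrees of freedom under each criterion
sapply(list(GCV = b_gcv, REML = b_reml, ML = b_ml), function(m) sum(m$edf))
```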

Argument: optimizer
An array specifying the numerical optimization method to use to optimize the smoothing parameter estimation criterion (given by method). "perf" for performance iteration. "outer" for the more stable direct approach. "outer" can use several alternative optimizers, specified in the second element of optimizer: "newton" (default), "bfgs", "optim", "nlm" and "nlm.fd" (the latter is based entirely on finite differenced derivatives and is very slow).

Argument: scale
If this is positive then it is taken as the known scale parameter. A negative value signals that the scale parameter is unknown. 0 signals that the scale parameter is 1 for Poisson and binomial and unknown otherwise. Note that (RE)ML methods can only work with scale parameter 1 for the Poisson and binomial cases.

Argument: select
If this is TRUE then gam can add an extra penalty to each term so that it can be penalized to zero. This means that the smoothing parameter estimation that is part of fitting can completely remove terms from the model. If the corresponding smoothing parameter is estimated as zero then the extra penalty has no effect.

Argument: knots
This is an optional list containing user specified knot values to be used for basis construction. For most bases the user simply supplies the knots to be used, which must match up with the k value supplied (note that the number of knots is not always just k). See tprs for what happens in the "tp"/"ts" case. Different terms can use different numbers of knots, unless they share a covariate.

Argument: sp
A vector of smoothing parameters can be provided here. Smoothing parameters must be supplied in the order that the smooth terms appear in the model formula. Negative elements indicate that the parameter should be estimated, and hence a mixture of fixed and estimated parameters is possible. If smooths share smoothing parameters then length(sp) must correspond to the number of underlying smoothing parameters.

Argument: min.sp
Lower bounds can be supplied for the smoothing parameters. Note that if this option is used then the smoothing parameters full.sp, in the returned object, will need to be added to what is supplied here to get the smoothing parameters actually multiplying the penalties. length(min.sp) should always be the same as the total number of penalties (so it may be longer than sp, if smooths share smoothing parameters).

Argument: H
A user supplied fixed quadratic penalty on the parameters of the GAM can be supplied, with this as its coefficient matrix. A common use of this term is to add a ridge penalty to the parameters of the GAM in circumstances in which the model is close to un-identifiable on the scale of the linear predictor, but perfectly well defined on the response scale.

Argument: gamma
It is sometimes useful to inflate the model degrees of freedom in the GCV or UBRE/AIC score by a constant multiplier. This allows such a multiplier to be supplied.
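A small sketch of supplying such a multiplier. The value 1.4 is only an illustrative choice, not something prescribed by the text above; a larger gamma will typically (though not necessarily in every data set) lead to a smoother fit with fewer effective degrees of freedom.

```r
library(mgcv)
set.seed(4)
dat <- gamSim(1, n = 200, dist = "normal", scale = 2, verbose = FALSE)

## default: each effective degree of freedom counts once in the GCV score
b1  <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)
## gamma = 1.4: each effective degree of freedom is charged 1.4 times
b14 <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, gamma = 1.4)

## compare the total effective degrees of freedom of the two fits
c(gamma_1 = sum(b1$edf), gamma_1.4 = sum(b14$edf))
```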

Argument: fit
If this argument is TRUE then gam sets up the model and fits it, but if it is FALSE then the model is set up and an object G containing what would be required to fit it is returned. See argument G.

Argument: paraPen
Optional list specifying any penalties to be applied to parametric model terms. gam.models explains more.
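As a hedged sketch of how paraPen might be used, the example below puts a ridge (identity) penalty on the coefficients of a matrix term X. The design matrix, the coefficient vector beta and the choice of an identity penalty are all assumptions made for illustration; gam.models remains the authoritative description of the list structure.

```r
library(mgcv)
set.seed(5)
n <- 100; p <- 10
X <- matrix(rnorm(n * p), n, p)      ## hypothetical design matrix
beta <- c(2, -1, rep(0, p - 2))      ## hypothetical true coefficients
y <- drop(X %*% beta) + rnorm(n)

## one penalty matrix for the term X: the identity, i.e. a ridge penalty;
## its smoothing parameter is estimated along with everything else
b_ridge <- gam(y ~ X - 1, paraPen = list(X = list(diag(p))), method = "REML")
b_plain <- gam(y ~ X - 1)            ## same model without the penalty

cbind(ridge = coef(b_ridge), unpenalized = coef(b_plain))
```

The penalized coefficients are shrunk towards zero relative to the unpenalized fit, by an amount controlled by the estimated smoothing parameter.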

Argument: G
Usually NULL, but may contain the object returned by a previous call to gam with fit=FALSE, in which case all other arguments are ignored except for gamma, in.out, scale, control, method, optimizer and fit.

Argument: in.out
Optional list for initializing outer iteration. If supplied then this must contain two elements: sp should be an array of initialization values for all smoothing parameters (there must be a value for all smoothing parameters, whether fixed or to be estimated, but those for fixed s.p.s are not used); scale is the typical scale of the GCV/UBRE function, for passing to the outer optimizer, or the initial value of the scale parameter, if this is to be estimated by RE/ML.

Argument: ...
Further arguments for passing on, e.g. to gam.fit (such as mustart).

----------Details----------

A generalized additive model (GAM) is a generalized linear model (GLM) in which the linear predictor is given by a user specified sum of smooth functions of the covariates plus a conventional parametric component of the linear predictor. A simple example is:

log(E(y_i)) = f_1(x_1i) + f_2(x_2i)

where the (independent) response variables y_i~Poi, and f_1 and f_2 are smooth functions of covariates x_1 and x_2. The log is an example of a link function.
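A model of this form can be simulated and fitted in a few lines. In the sketch below the "true" functions f1 and f2 are arbitrary choices made for illustration, and gam adds an intercept to the linear predictor by default.

```r
library(mgcv)
set.seed(6)
n <- 400
x1 <- runif(n); x2 <- runif(n)
f1 <- function(x) sin(2 * pi * x)    ## assumed "true" smooth effects,
f2 <- function(x) 2 * (x - 0.5)^2    ## chosen only for this illustration
y <- rpois(n, exp(f1(x1) + f2(x2)))  ## Poisson response, log link

b <- gam(y ~ s(x1) + s(x2), family = poisson(link = "log"))
summary(b)$s.table                   ## effective degrees of freedom per smooth
```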

If absolutely any smooth functions were allowed in model fitting then maximum likelihood estimation of such models would invariably result in complex overfitting estimates of f_1 and f_2. For this reason the models are usually fit by penalized likelihood maximization, in which the model (negative log) likelihood is modified by the addition of a penalty for each smooth function, penalizing its "wiggliness". To control the tradeoff between penalizing wiggliness and penalizing badness of fit each penalty is multiplied by an associated smoothing parameter: how to estimate these parameters, and how to practically represent the smooth functions, are the main statistical questions introduced by moving from GLMs to GAMs.
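The role of the smoothing parameter in this tradeoff can be illustrated by fixing it at extreme values through the sp argument. The values 1e-6 and 1e6 are arbitrary, picked only to make the contrast obvious.

```r
library(mgcv)
set.seed(7)
n <- 200
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

b_wiggly <- gam(y ~ s(x), sp = 1e-6)  ## almost no wiggliness penalty
b_smooth <- gam(y ~ s(x), sp = 1e6)   ## very heavy wiggliness penalty
b_auto   <- gam(y ~ s(x))             ## smoothing parameter estimated

## total effective degrees of freedom for each fit
c(wiggly = sum(b_wiggly$edf), auto = sum(b_auto$edf), smooth = sum(b_smooth$edf))
```

With a tiny smoothing parameter the fit uses nearly all of the available basis functions; with a huge one it collapses towards the unpenalized (here linear) part of the smooth.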

The mgcv implementation of gam represents the smooth functions using penalized regression splines, and by default uses basis functions for these splines that are designed to be optimal, given the number of basis functions used. The smooth terms can be functions of any number of covariates and the user has some control over how smoothness of the functions is measured.

gam in mgcv solves the smoothing parameter estimation problem by using the Generalized Cross Validation (GCV) criterion

n D / (n - DoF)^2

or an Un-Biased Risk Estimator (UBRE) criterion

D/n + 2 s DoF / n - s

where D is the deviance, n the number of data, s the scale parameter and DoF the effective degrees of freedom of the model. Notice that UBRE is effectively just AIC rescaled, but is only used when s is known.
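As a rough check of the GCV expression n D / (n - DoF)^2, the score can be recomputed by hand from a fitted Gaussian model and compared with the value stored in the fitted object. This sketch assumes that sum(b$edf) is the DoF used in the score and that gcv.ubre holds the minimized criterion; both match the gamObject description as far as I know, but treat the comparison as illustrative.

```r
library(mgcv)
set.seed(8)
dat <- gamSim(1, n = 300, dist = "normal", scale = 2, verbose = FALSE)
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "GCV.Cp")

n   <- nrow(dat)
D   <- deviance(b)   ## residual sum of squares in the Gaussian case
DoF <- sum(b$edf)    ## effective degrees of freedom, intercept included

gcv_by_hand <- n * D / (n - DoF)^2
## the two values should agree closely
c(by_hand = gcv_by_hand, reported = unname(b$gcv.ubre))
```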

Alternatives are GACV, or a Laplace approximation to REML. There is some evidence that the latter may actually be the most effective choice.

Smoothing parameters are chosen to minimize the GCV, UBRE/AIC, GACV or REML scores for the model, and the main computational challenge solved by the mgcv package is to do this efficiently and reliably. Various alternative numerical methods are provided, which can be set by argument optimizer.

Broadly gam works by first constructing basis functions and one or more quadratic penalty coefficient matrices for each smooth term in the model formula, obtaining a model matrix for the strictly parametric part of the model formula, and combining these to obtain a complete model matrix (design matrix) and a set of penalty matrices for the smooth terms. Some linear identifiability constraints are also obtained at this point. The model is fit using gam.fit, a modification of glm.fit. The GAM penalized likelihood maximization problem is solved by Penalized Iteratively Reweighted Least Squares (P-IRLS) (see e.g. Wood 2000). Smoothing parameter selection is integrated in one of two ways. (i) "Performance iteration" uses the fact that at each P-IRLS iteration a penalized weighted least squares problem is solved, and the smoothing parameters of that problem can be estimated by GCV or UBRE. Eventually, in most cases, both model parameter estimates and smoothing parameter estimates converge. (ii) Alternatively the P-IRLS scheme is iterated to convergence for each trial set of smoothing parameters, and GCV, UBRE or REML scores are only evaluated on convergence - optimization is then "outer" to the P-IRLS loop: in this case the P-IRLS iteration has to be differentiated, to facilitate optimization, and gam.fit3 is used in place of gam.fit. The default is the second method, outer iteration.
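The P-IRLS idea can be sketched in base R for a Poisson model with a log link and a single, fixed smoothing parameter. This is not mgcv's gam.fit: the B-spline basis from splines::bs, the second-difference penalty S and the value lambda = 1 are stand-ins chosen to keep the example short, and no smoothing parameter selection is attempted.

```r
library(splines)
set.seed(9)
n <- 200
x <- sort(runif(n))
y <- rpois(n, exp(1 + sin(2 * pi * x)))

X <- unclass(bs(x, df = 15, intercept = TRUE))  ## B-spline model matrix
D <- diff(diag(ncol(X)), differences = 2)
S <- crossprod(D)                               ## quadratic penalty matrix
lambda <- 1                                     ## fixed smoothing parameter

eta <- log(y + 0.5)                             ## starting linear predictor
for (iter in 1:100) {
  mu <- exp(eta)
  z  <- eta + (y - mu) / mu                     ## working response (log link)
  w  <- mu                                      ## working weights (Poisson, log link)
  ## penalized weighted least squares step
  beta <- solve(crossprod(X, w * X) + lambda * S, crossprod(X, w * z))
  eta_new <- drop(X %*% beta)
  converged <- max(abs(eta_new - eta)) < 1e-8
  eta <- eta_new
  if (converged) break
}

## at convergence the penalized score X'(y - mu) - lambda * S beta should vanish
score <- crossprod(X, y - exp(eta)) - lambda * S %*% beta
max(abs(score))
```

Performance iteration would re-estimate lambda inside each pass of the loop, while outer iteration would run the whole loop to convergence for each trial lambda.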

Several alternative basis-penalty types are built in for representing model smooths, but alternatives can easily be added (see smooth.terms for an overview and smooth.construct for how to add smooth classes). In practice the default basis is usually the best choice, but the choice of the basis dimension (k in the s and te terms) is something that should be considered carefully (the exact value is not critical, but it is important not to make it restrictively small, nor very large and computationally costly). The basis should be chosen to be larger than is believed to be necessary to approximate the smooth function concerned. The effective degrees of freedom for the smooth will then be controlled by the smoothing penalty on the term, and (usually) selected automatically (with an upper limit set by k-1 or occasionally k). Of course k should not be made too large, or computation will be slow (or in extreme cases there will be more coefficients to estimate than there are data).
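The effect of the basis dimension can be seen by fitting the same smooth with a deliberately small and a generous k. The values 5 and 20 are arbitrary; the point is only that the effective degrees of freedom are capped at k-1 for these default smooths.

```r
library(mgcv)
set.seed(10)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)

b_k5  <- gam(y ~ s(x2, k = 5),  data = dat)  ## possibly too restrictive
b_k20 <- gam(y ~ s(x2, k = 20), data = dat)  ## room to spare

## effective degrees of freedom of the smooth: at most k - 1 in each case
c(k5 = summary(b_k5)$edf, k20 = summary(b_k20)$edf)

## gam.check(b_k5)  ## prints a basis dimension check and draws residual plots
```

An effective degrees of freedom value sitting right at its upper limit is a hint that k may be too small and is worth checking with gam.check.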

Note that gam assumes a very inclusive definition of what counts as a GAM: basically any penalized GLM can be used. To this end gam allows the non-smooth model components to be penalized via argument paraPen, and allows the linear predictor to depend on general linear functionals of smooths, via the summation convention mechanism described in linear.functional.terms.

Details of the default underlying fitting methods are given in Wood (2011 and 2004). Some alternative methods are discussed in Wood (2000 and 2006).

gam() is not a clone of Trevor Hastie's original (as supplied in S-PLUS or package gam). The major differences are (i) that by default estimation of the degree of smoothness of model terms is part of model fitting, (ii) a Bayesian approach to variance estimation is employed that makes for easier confidence interval calculation (with good coverage probabilities), (iii) that the model can depend on any (bounded) linear functional of smooth terms, (iv) the parametric part of the model can be penalized, (v) simple random effects can be incorporated, and (vi) the facilities for incorporating smooths of more than one variable are different: specifically there are no lo smooths, but instead (a) s terms can have more than one argument, implying an isotropic smooth and (b) te or t2 smooths are provided as an effective means for modelling smooth interactions of any number of variables via scale invariant tensor product smooths. Splines on the sphere, Duchon splines and Gaussian Markov Random Fields are also available. See gam from package gam for GAMs via the original Hastie and Tibshirani approach.

----------Value----------

If fit=FALSE the function returns a list G of items needed to fit a GAM, but doesn't actually fit it.

Otherwise the function returns an object of class "gam" as described in gamObject.

----------WARNINGS----------

The default basis dimensions used for smooth terms are essentially arbitrary, and it should be checked that they are not too small. See choose.k and gam.check.

You must have more unique combinations of covariates than the model has total parameters. (Total parameters is the sum of basis dimensions plus the sum of non-spline terms less the number of spline terms.)
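The counting rule can be checked on a fitted model. In the sketch below, four default smooths (basis dimension 10 each) plus an intercept give 4 x 10 + 1 - 4 = 37 parameters; the simulated data are only a stand-in.

```r
library(mgcv)
set.seed(11)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)

## total parameters: sum of basis dimensions + non-spline terms - spline terms
n_par  <- length(coef(b))
## number of distinct covariate combinations in the data
n_uniq <- nrow(unique(dat[, c("x0", "x1", "x2", "x3")]))

c(parameters = n_par, unique_covariate_combinations = n_uniq)
```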

Automatic smoothing parameter selection is not likely to work well when fitting models to very few response data.

For data with many zeroes clustered together in the covariate space it is quite easy to set up GAMs which suffer from identifiability problems, particularly when using Poisson or binomial families. The problem is that with e.g. log or logit links, mean value zero corresponds to an infinite range on the linear predictor scale.

----------Author(s)----------

Simon N. Wood simon.wood@r-project.org

Front end design inspired by the S function of the same name based on the work
of Hastie and Tibshirani (1990). Underlying methods owe much to the work of
Wahba (e.g. 1990) and Gu (e.g. 2002).

----------References----------

and marginal likelihood estimation of semiparametric generalized linear  models. Journal of the Royal Statistical Society (B) 73(1):3-36
generalized additive models. J. Amer. Statist. Ass. 99:673-686. [Default method for additive case by GCV (but no longer for generalized)]

generalized additive mixed models. Biometrics 62(4):1025-1036
and Hall/CRC Press.
in mixed models. Statistical Computing.
Model Components. Scandinavian Journal of Statistics, 39(1), 53-74.

and Hall.

with Multiple Quadratic Penalties. J.R.Statist.Soc.B 62(2):413-428 [The original mgcv paper, but no longer the default methods.]

the Newton method. SIAM J. Sci. Statist. Comput. 12:383-398

functions in generalized linear models. J. Am. Statist.Ass. 81:96-103

to environmental modelling. Ecological Modelling 157:157-177

----------See Also----------

mgcv-package, gamObject, gam.models, smooth.terms, linear.functional.terms, s, te, predict.gam, plot.gam, summary.gam, gam.side, gam.selection, gam.control, gam.check, negbin, magic, vis.gam

----------Examples----------

library(mgcv)
set.seed(2) ## simulate some data...
dat <- gamSim(1,n=400,dist="normal",scale=2)
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)
summary(b)
plot(b,pages=1,residuals=TRUE)  ## show partial residuals
plot(b,pages=1,seWithMean=TRUE) ## `with intercept' CIs
## run some basic model checks, including checking
## smoothing basis dimensions...
gam.check(b)

## same fit in two parts .....
G <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),fit=FALSE,data=dat)
b <- gam(G=G)
print(b)

## change the smoothness selection method to REML
b0 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat,method="REML")
plot(b0,pages=1,scheme=1)

## Would a smooth interaction of x0 and x1 be better?
## Use tensor product smooth of x0 and x1, basis
## dimension 49 (see ?te for details, also ?t2).
bt <- gam(y~te(x0,x1,k=7)+s(x2)+s(x3),data=dat,
      method="REML")
plot(bt,pages=1)
plot(bt,pages=1,scheme=2) ## alternative visualization
AIC(b0,bt) ## interaction worse than additive

## If it is believed that x0 and x1 are naturally on
## the same scale, and should be treated isotropically
## then could try...
bs <- gam(y~s(x0,x1,k=50)+s(x2)+s(x3),data=dat,
      method="REML")
plot(bs,pages=1)
AIC(b0,bt,bs) ## additive still better.

## Now do automatic terms selection as well
b1 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat,
   method="REML",select=TRUE)
plot(b1,pages=1)

## set the smoothing parameter for the first term, estimate rest ...
bp <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),sp=c(0.01,-1,-1,-1),data=dat)
plot(bp,pages=1,scheme=1)
## alternatively...
bp <- gam(y~s(x0,sp=.01)+s(x1)+s(x2)+s(x3),data=dat)

## set lower bounds on smoothing parameters ....
bp<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),
      min.sp=c(0.001,0.01,0,10),data=dat)
print(b);print(bp)

## same with REML
bp<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),
      min.sp=c(0.1,0.1,0,10),data=dat,method="REML")
print(b0);print(bp)

## now a GAM with 3df regression spline term & 2 penalized terms

b0<-gam(y~s(x0,k=4,fx=TRUE,bs="tp")+s(x1,k=12)+s(x2,k=15),data=dat)
plot(b0,pages=1)

## now simulate poisson data...
dat <- gamSim(1,n=4000,dist="poisson",scale=.1)

## use "cr" basis to save time, with 4000 data...
b2<-gam(y~s(x0,bs="cr")+s(x1,bs="cr")+s(x2,bs="cr")+
      s(x3,bs="cr"),family=poisson,data=dat,method="REML")
plot(b2,pages=1)

## drop x3, but initialize sp's from previous fit, to
## save more time...

b2a<-gam(y~s(x0,bs="cr")+s(x1,bs="cr")+s(x2,bs="cr"),
      family=poisson,data=dat,method="REML",
      in.out=list(sp=b2$sp[1:3],scale=1))
par(mfrow=c(2,2))
plot(b2a)

par(mfrow=c(1,1))
## similar example using performance iteration
dat <- gamSim(1,n=400,dist="poisson",scale=.25)

b3<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson,
      data=dat,optimizer="perf")
plot(b3,pages=1)

## repeat using GACV as in Wood 2008...

b4<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson,
      data=dat,method="GACV.Cp",scale=-1)
plot(b4,pages=1)

## repeat using REML as in Wood 2011...

b5<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson,
      data=dat,method="REML")
plot(b5,pages=1)

## a binary example (see later for large dataset version)...

dat <- gamSim(1,n=400,dist="binary",scale=.33)

lr.fit <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=binomial,
            data=dat,method="REML")

## plot model components with truth overlaid in red
op <- par(mfrow=c(2,2))
fn <- c("f0","f1","f2","f3");xn <- c("x0","x1","x2","x3")
for (k in 1:4) {
  plot(lr.fit,residuals=TRUE,select=k)
  ff <- dat[[fn[k]]];xx <- dat[[xn[k]]]
  ind <- sort.int(xx,index.return=TRUE)$ix
  lines(xx[ind],(ff-mean(ff))[ind]*.33,col=2)
}
par(op)
anova(lr.fit)
lr.fit1 <- gam(y~s(x0)+s(x1)+s(x2),family=binomial,
            data=dat,method="REML")
lr.fit2 <- gam(y~s(x1)+s(x2),family=binomial,
            data=dat,method="REML")
AIC(lr.fit,lr.fit1,lr.fit2)

## A Gamma example, by modifying `gamSim' output...

dat <- gamSim(1,n=400,dist="normal",scale=1)
dat$f <- dat$f/4 ## true linear predictor
Ey <- exp(dat$f);scale <- .5 ## mean and GLM scale parameter
## Note that `shape' and `scale' in `rgamma' are almost
## opposite terminology to that used with GLM/GAM...
dat$y <- rgamma(Ey*0,shape=1/scale,scale=Ey*scale)
bg <- gam(y~ s(x0)+ s(x1)+s(x2)+s(x3),family=Gamma(link=log),
      data=dat,method="REML")
plot(bg,pages=1,scheme=1)

## For inverse Gaussian, see ?rig

## now a 2D smoothing example...

eg <- gamSim(2,n=500,scale=.1)
attach(eg)

op <- par(mfrow=c(2,2),mar=c(4,4,1,1))

contour(truth$x,truth$z,truth$f) ## contour truth
b4 <- gam(y~s(x,z),data=data) ## fit model
fit1 <- matrix(predict.gam(b4,pr,se=FALSE),40,40)
contour(truth$x,truth$z,fit1) ## contour fit
persp(truth$x,truth$z,truth$f) ## persp truth
vis.gam(b4)                   ## persp fit
detach(eg)
par(op)

##################################################
## largish dataset example with user defined knots
##################################################

par(mfrow=c(2,2))
eg <- gamSim(2,n=10000,scale=.5)
attach(eg)

ind<-sample(1:10000,1000,replace=FALSE)
b5<-gam(y~s(x,z,k=50),data=data,
      knots=list(x=data$x[ind],z=data$z[ind]))
## various visualizations
vis.gam(b5,theta=30,phi=30)
plot(b5)
plot(b5,scheme=1,theta=50,phi=20)
plot(b5,scheme=2)

par(mfrow=c(1,1))
## and a pure "knot based" spline of the same data
b6<-gam(y~s(x,z,k=100),data=data,knots=list(x= rep((1:10-0.5)/10,10),
      z=rep((1:10-0.5)/10,rep(10,10))))
vis.gam(b6,color="heat",theta=30,phi=30)

## varying the default large dataset behaviour via `xt'
b7 <- gam(y~s(x,z,k=50,xt=list(max.knots=1000,seed=2)),data=data)
vis.gam(b7,theta=30,phi=30)
detach(eg)

################################################################
## Approximate large dataset logistic regression for rare events
## based on subsampling the zeroes, and adding an offset to
## approximately allow for this.
## Doing the same thing, but upweighting the sampled zeroes
## leads to problems with smoothness selection, and CIs.
################################################################
n <- 100000  ## simulate n data
dat <- gamSim(1,n=n,dist="binary",scale=.33)
p <- binomial()$linkinv(dat$f-6) ## make 1's rare
dat$y <- rbinom(p,1,p)    ## re-simulate rare response

## Now sample all the 1's but only proportion S of the 0's
S <- 0.02                ## sampling fraction of zeroes
dat <- dat[dat$y==1 | runif(n) < S,] ## sampling

## Create offset based on total sampling fraction
dat$s <- rep(log(nrow(dat)/n),nrow(dat))

lr.fit <- gam(y~s(x0,bs="cr")+s(x1,bs="cr")+s(x2,bs="cr")+s(x3,bs="cr")+
            offset(s),family=binomial,data=dat,method="REML")

## plot model components with truth overlaid in red
op <- par(mfrow=c(2,2))
fn <- c("f0","f1","f2","f3");xn <- c("x0","x1","x2","x3")
for (k in 1:4) {
   plot(lr.fit,select=k,scale=0)
   ff <- dat[[fn[k]]];xx <- dat[[xn[k]]]
   ind <- sort.int(xx,index.return=TRUE)$ix
   lines(xx[ind],(ff-mean(ff))[ind]*.33,col=2)
}
par(op)
rm(dat)

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

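Notes:
Note 1: For the convenience of learners, this document was translated by LoveR, the robot of 生物统计家园网, and is intended only as a personal reference for learning R. 生物统计家园 reserves the copyright.
Note 2: Because the translation was produced automatically by a robot, some inaccuracies are unavoidable; carefully comparing the Chinese and English text while reading can help when learning R.
Note 3: If you find an inaccuracy, please reply below this post and we will revise it over time.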

zyy199212 · posted 2015-10-12 10:19:58

Impressive!
