R language: help documentation for the gam() function

Posted on 2012-2-16 21:36:20
gam(mgcv)
R package containing gam(): mgcv

Generalized additive models with integrated smoothness estimation

Translator: 生物统计家园网 robot LoveR

----------Description----------

Fits a generalized additive model (GAM) to data, the term "GAM" being taken to include any quadratically penalized GLM.   The degree of smoothness of model terms is estimated as part of fitting. gam can also fit any GLM subject to multiple quadratic penalties (including  estimation of degree of penalization). Isotropic or scale invariant smooths of any number of variables are available as model terms, as are linear functionals of such smooths; confidence/credible intervals are readily available for any quantity predicted using a fitted model; gam is extendable: users can add smooths.

Smooth terms are represented using penalized regression splines (or similar smoothers) with smoothing parameters selected by GCV/UBRE/AIC/REML or by regression splines with fixed degrees of freedom (mixtures of the two are permitted). Multi-dimensional smooths are  available using penalized thin plate regression splines (isotropic) or tensor product splines  (when an isotropic smooth is inappropriate). For an overview of the smooths available see smooth.terms.  For more on specifying models see gam.models, random.effects and linear.functional.terms.  For more on model selection see gam.selection.

gam() is not a clone of what S-PLUS provides: the major differences are (i) that by default estimation of the degree of smoothness of model terms is part of model fitting, (ii) a Bayesian approach to variance estimation is employed that makes for easier confidence interval calculation (with good coverage probabilities), (iii) that the model can depend on any (bounded) linear functional of smooth terms, and, (iv) the parametric part of the model can be penalized, (v) simple random effects can be incorporated, and  (vi) the facilities for incorporating smooths of more than one variable are different: specifically there are no lo smooths, but instead (a) s terms can have more than one argument, implying an isotropic smooth and (b) te smooths are provided as an effective means for modelling smooth interactions of any number of variables via scale invariant tensor product smooths. If you want a clone of what S-PLUS provides use gam from package gam.

For very large datasets see bam, for mixed GAM see gamm and random.effects.


----------Usage----------



gam(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,
    na.action,offset=NULL,method="GCV.Cp",
    optimizer=c("outer","newton"),control=list(),scale=0,
    select=FALSE,knots=NULL,sp=NULL,min.sp=NULL,H=NULL,gamma=1,
    fit=TRUE,paraPen=NULL,G=NULL,in.out,...)



----------Arguments----------

Argument: formula
A GAM formula (see formula.gam and also gam.models). This is exactly like the formula for a GLM except that smooth terms, s and te, can be added to the right hand side to specify that the linear predictor depends on smooth functions of predictors (or linear functionals of these).


Argument: family
This is a family object specifying the distribution and link to use in fitting etc. See glm and family for more details. A negative binomial family is provided: see negbin. quasi families actually result in the use of extended quasi-likelihood if method is set to a RE/ML method (McCullagh and Nelder, 1989, 9.6).


Argument: data
A data frame or list containing the model response variable and covariates required by the formula. By default the variables are taken from environment(formula): typically the environment from which gam is called.


Argument: weights
Prior weights on the data.


Argument: subset
An optional vector specifying a subset of observations to be used in the fitting process.


Argument: na.action
A function which indicates what should happen when the data contain "NA"s. The default is set by the "na.action" setting of "options", and is "na.fail" if that is unset. The "factory-fresh" default is "na.omit".


Argument: offset
Can be used to supply a model offset for use in fitting. Note that this offset will always be completely ignored when predicting, unlike an offset included in formula: this conforms to the behaviour of lm and glm.
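The difference between the two ways of supplying an offset can be seen in a small experiment. The following is a minimal sketch on simulated data; the names d, off, fit_formula and fit_arg are made up for illustration and are not part of mgcv.

```r
library(mgcv)

set.seed(1)
n <- 200
d <- data.frame(x = runif(n), off = runif(n, 1, 2))
d$y <- d$off + sin(2 * pi * d$x) + rnorm(n, sd = 0.3)

## offset as a formula term: used in fitting and in prediction
fit_formula <- gam(y ~ s(x) + offset(off), data = d)
## offset via the argument: used in fitting, ignored by predict
fit_arg <- gam(y ~ s(x), offset = off, data = d)

p_formula <- as.numeric(predict(fit_formula, newdata = d))
p_arg <- as.numeric(predict(fit_arg, newdata = d))

## the two sets of predictions should differ by the offset itself
offset_gap <- max(abs(p_formula - p_arg - d$off))
print(offset_gap)
```

If predictions on the scale of the original response are needed from the second fit, the offset has to be added back by hand.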


Argument: control
A list of fit control parameters to replace defaults returned by gam.control. Values not set assume default values.


Argument: method
The smoothing parameter estimation method. "GCV.Cp" to use GCV for unknown scale parameter and Mallows' Cp/UBRE/AIC for known scale. "GACV.Cp" is equivalent, but using GACV in place of GCV. "REML" for REML estimation, including of unknown scale, "P-REML" for REML estimation, but using a Pearson estimate of the scale. "ML" and "P-ML" are similar, but using maximum likelihood in place of REML.


Argument: optimizer
An array specifying the numerical optimization method to use to optimize the smoothing parameter estimation criterion (given by method). "perf" for performance iteration. "outer" for the more stable direct approach. "outer" can use several alternative optimizers, specified in the second element of optimizer: "newton" (default), "bfgs", "optim", "nlm" and "nlm.fd" (the latter is based entirely on finite differenced derivatives and is very slow).
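As a small illustration of the two-element form of optimizer, the sketch below fits the same Poisson model with the "newton" and "bfgs" outer optimizers and prints the estimated smoothing parameters side by side. The object names b_newton and b_bfgs are chosen here for illustration only.

```r
library(mgcv)

set.seed(2)
dat <- gamSim(1, n = 200, dist = "poisson", scale = 0.25, verbose = FALSE)

## outer iteration with the default Newton optimizer
b_newton <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), family = poisson,
                data = dat, method = "REML",
                optimizer = c("outer", "newton"))

## outer iteration with the quasi-Newton (BFGS) alternative
b_bfgs <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), family = poisson,
              data = dat, method = "REML",
              optimizer = c("outer", "bfgs"))

## smoothing parameters from the two optimizers, one column each
print(cbind(newton = b_newton$sp, bfgs = b_bfgs$sp))
```

The two optimizers target the same criterion, so large disagreements in the result would usually suggest a convergence problem worth investigating.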


Argument: scale
If this is positive then it is taken as the known scale parameter. Negative signals that the scale parameter is unknown. 0 signals that the scale parameter is 1 for Poisson and binomial and unknown otherwise. Note that (RE)ML methods can only work with scale parameter 1 for the Poisson and binomial cases.


Argument: select
If this is TRUE then gam can add an extra penalty to each term so that it can be penalized to zero. This means that the smoothing parameter estimation that is part of fitting can completely remove terms from the model. If the corresponding smoothing parameter is estimated as zero then the extra penalty has no effect.


Argument: knots
This is an optional list containing user specified knot values to be used for basis construction. For most bases the user simply supplies the knots to be used, which must match up with the k value supplied (note that the number of knots is not always just k). See tprs for what happens in the "tp"/"ts" case. Different terms can use different numbers of knots, unless they share a covariate.


Argument: sp
A vector of smoothing parameters can be provided here. Smoothing parameters must be supplied in the order that the smooth terms appear in the model formula. Negative elements indicate that the parameter should be estimated, and hence a mixture of fixed and estimated parameters is possible. If smooths share smoothing parameters then length(sp) must correspond to the number of underlying smoothing parameters.


Argument: min.sp
Lower bounds can be supplied for the smoothing parameters. Note that if this option is used then the smoothing parameters full.sp, in the returned object, will need to be added to what is supplied here to get the smoothing parameters actually multiplying the penalties. length(min.sp) should always be the same as the total number of penalties (so it may be longer than sp, if smooths share smoothing parameters).


Argument: H
A user supplied fixed quadratic penalty on the parameters of the GAM can be supplied, with this as its coefficient matrix. A common use of this term is to add a ridge penalty to the parameters of the GAM in circumstances in which the model is close to un-identifiable on the scale of the linear predictor, but perfectly well defined on the response scale.


Argument: gamma
It is sometimes useful to inflate the model degrees of freedom in the GCV or UBRE/AIC score by a constant multiplier. This allows such a multiplier to be supplied.
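A quick way to see what gamma does is to fit the same model with and without inflation and compare the total effective degrees of freedom. This is only a sketch: the value 1.4 is an arbitrary illustrative choice, and the names b_g1 and b_g14 are invented here.

```r
library(mgcv)

set.seed(0)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)

## default: no inflation of the degrees of freedom in the GCV score
b_g1 <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat,
            method = "GCV.Cp", gamma = 1)

## each degree of freedom is counted as 1.4 in the GCV score
b_g14 <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat,
             method = "GCV.Cp", gamma = 1.4)

## total effective degrees of freedom of the two fits
edf_total <- c(gamma_1 = sum(b_g1$edf), gamma_1.4 = sum(b_g14$edf))
print(edf_total)
```

Because a larger gamma makes model complexity more expensive in the score, the second fit would normally be expected to be at least as smooth as the first.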


Argument: fit
If this argument is TRUE then gam sets up the model and fits it, but if it is FALSE then the model is set up and an object G containing what would be required to fit it is returned. See argument G.


Argument: paraPen
Optional list specifying any penalties to be applied to parametric model terms. gam.models explains more.


Argument: G
Usually NULL, but may contain the object returned by a previous call to gam with fit=FALSE, in which case all other arguments are ignored except for gamma, in.out, scale, control, method, optimizer and fit.


Argument: in.out
Optional list for initializing outer iteration. If supplied then this must contain two elements: sp should be an array of initialization values for all smoothing parameters (there must be a value for all smoothing parameters, whether fixed or to be estimated, but those for fixed s.p.s are not used); scale is the typical scale of the GCV/UBRE function, for passing to the outer optimizer, or the initial value of the scale parameter, if this is to be estimated by RE/ML.


Argument: ...
Further arguments for passing on e.g. to gam.fit (such as mustart).


----------Details----------

A generalized additive model (GAM) is a generalized linear model (GLM) in which the linear predictor is given by a user specified sum of smooth functions of the covariates plus a conventional parametric component of the linear predictor. A simple example is:

    log(E(y_i)) = f_1(x_1i) + f_2(x_2i)

where the (independent) response variables y_i~Poi, and f_1 and f_2 are smooth functions of covariates x_1 and x_2. The log is an example of a link function.

If absolutely any smooth functions were allowed in model fitting then maximum likelihood  estimation of such models would invariably result in complex overfitting estimates of  f_1  and f_2. For this reason the models are usually fit by  penalized likelihood  maximization, in which the model (negative log) likelihood is modified by the addition of  a penalty for each smooth function, penalizing its "wiggliness". To control the tradeoff  between penalizing wiggliness and penalizing badness of fit each penalty is multiplied by  an associated smoothing parameter: how to estimate these parameters, and  how to practically represent the smooth functions are the main statistical questions  introduced by moving from GLMs to GAMs.
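For the two-smooth example, the penalized criterion described above can be written out as a sketch. It assumes the familiar integrated squared second derivative as the measure of "wiggliness"; the symbols l (the log likelihood) and lambda_1, lambda_2 (the smoothing parameters) are notation introduced here, and other penalties are possible.

```latex
\min_{f_1,\,f_2}\;\; -\,l(f_1, f_2)
  \;+\; \lambda_1 \int f_1''(x)^2 \, dx
  \;+\; \lambda_2 \int f_2''(x)^2 \, dx
```

Under this form, letting a smoothing parameter tend to infinity forces the corresponding function toward a straight line, while setting it to zero leaves that function unpenalized.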

The mgcv implementation of gam represents the smooth functions using penalized regression splines, and by default uses basis functions for these splines that are designed to be optimal, given the number of basis functions used. The smooth terms can be functions of any number of covariates and the user has some control over how smoothness of the functions is measured.

gam in mgcv solves the smoothing parameter estimation problem by using the Generalized Cross Validation (GCV) criterion

    n D / (n - DoF)^2

or an Un-Biased Risk Estimator (UBRE) criterion

    D/n + 2 s DoF / n - s

where D is the deviance, n the number of data, s the scale parameter and DoF the effective degrees of freedom of the model. Notice that UBRE is effectively just AIC rescaled, but is only used when s is known.
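The GCV score can be checked by hand against a fitted model. The sketch below assumes a Gaussian model with unknown scale fitted by method="GCV.Cp" and gamma left at 1, so that the score is n D / (n - DoF)^2 with D the deviance and DoF the sum of the per-coefficient effective degrees of freedom; the name gcv_by_hand is introduced here for illustration.

```r
library(mgcv)

set.seed(0)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "GCV.Cp")

n <- nrow(dat)       # number of data
D <- b$deviance      # model deviance
DoF <- sum(b$edf)    # effective degrees of freedom of the whole model

gcv_by_hand <- n * D / (n - DoF)^2
gcv_stored <- as.numeric(b$gcv.ubre)  # score recorded in the fitted object

print(c(by_hand = gcv_by_hand, stored = gcv_stored))
```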

Alternatives are GACV, or a Laplace approximation to REML. There is some evidence that the latter may actually be the most effective choice.

Smoothing parameters are chosen to  minimize the GCV, UBRE/AIC, GACV or REML scores for the model, and the main computational challenge solved  by the mgcv package is to do this efficiently and reliably. Various alternative numerical methods are provided which can be set by argument optimizer.

Broadly gam works by first constructing basis functions and one or more quadratic penalty coefficient matrices for each smooth term in the model formula, obtaining a model matrix for the strictly parametric part of the model formula, and combining these to obtain a complete model matrix (design matrix) and a set of penalty matrices for the smooth terms. Some linear identifiability constraints are also obtained at this point. The model is fit using gam.fit, a modification of glm.fit. The GAM penalized likelihood maximization problem is solved by Penalized Iteratively Reweighted Least Squares (P-IRLS) (see e.g. Wood 2000). Smoothing parameter selection is integrated in one of two ways. (i) "Performance iteration" uses the fact that at each P-IRLS iteration a penalized weighted least squares problem is solved, and the smoothing parameters of that problem can be estimated by GCV or UBRE. Eventually, in most cases, both model parameter estimates and smoothing parameter estimates converge. (ii) Alternatively the P-IRLS scheme is iterated to convergence for each trial set of smoothing parameters, and GCV, UBRE or REML scores are only evaluated on convergence - optimization is then "outer" to the P-IRLS loop: in this case the P-IRLS iteration has to be differentiated, to facilitate optimization, and gam.fit3 is used in place of gam.fit. The default is the second method, outer iteration.
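To make the P-IRLS idea concrete, here is a minimal stand-alone sketch for a Poisson model with log link and one smooth. It is not mgcv's implementation: it assumes a B-spline basis from splines::bs with a second-order difference penalty (a P-spline style construction) in place of mgcv's own bases, and a fixed smoothing parameter lambda rather than an estimated one. All object names are illustrative.

```r
library(splines)

set.seed(1)
n <- 200
x <- sort(runif(n))
y <- rpois(n, exp(1 + sin(2 * pi * x)))

## model matrix: 10 B-spline basis functions (assumed basis, not mgcv's)
X <- matrix(as.numeric(bs(x, df = 10, intercept = TRUE)), nrow = n)
## penalty matrix: squared second differences of adjacent coefficients
Dm <- diff(diag(ncol(X)), differences = 2)
S <- crossprod(Dm)
lambda <- 1            # fixed smoothing parameter (assumption)

mu <- y + 0.1          # starting values that keep log(mu) finite
eta <- log(mu)
converged <- FALSE
for (iter in 1:100) {
  w <- mu                    # IRLS weights for Poisson with log link
  z <- eta + (y - mu) / mu   # working response (pseudodata)
  ## penalized weighted least squares step
  beta <- solve(crossprod(X, w * X) + lambda * S, crossprod(X, w * z))
  eta_new <- drop(X %*% beta)
  converged <- max(abs(eta_new - eta)) < 1e-10
  eta <- eta_new
  mu <- exp(eta)
  if (converged) break
}

## at convergence the penalized score equations should be close to zero
score <- crossprod(X, y - mu) - lambda * S %*% beta
print(max(abs(score)))
```

In the terminology above, performance iteration would re-estimate lambda inside each pass of this loop, whereas outer iteration would run the loop to convergence for each trial value of lambda.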

Several alternative basis-penalty types  are built in for representing model smooths, but alternatives can easily be added (see smooth.terms  for an overview and smooth.construct for how to add smooth classes). In practice the  default basis is usually the best choice, but the choice of the basis dimension (k in the  s and te terms) is something that should be considered carefully (the exact value is not critical, but it is important not to make it restrictively small, nor very large and computationally costly). The basis should  be chosen to be larger than is believed to be necessary to approximate the smooth function concerned.  The effective degrees of freedom for the smooth will then be controlled by the smoothing penalty on  the term, and (usually) selected automatically (with an upper limit set by k-1 or occasionally k). Of course  the k should not be made too large, or computation will be slow (or in extreme cases there will be more  coefficients to estimate than there are data).
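The relation between the basis dimension k and the effective degrees of freedom can be inspected directly. In this sketch the third smooth is given k = 20 (an arbitrary illustrative value), and its estimated effective degrees of freedom are compared with the upper limit of k - 1; the names b_k and edf_x2 are invented here.

```r
library(mgcv)

set.seed(0)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)

## give s(x2) a more generous basis than the default
b_k <- gam(y ~ s(x0) + s(x1) + s(x2, k = 20) + s(x3),
           data = dat, method = "REML")

## effective degrees of freedom per smooth term, in formula order
edf_terms <- summary(b_k)$edf
edf_x2 <- edf_terms[3]

## the penalty keeps the edf at or below the upper limit k - 1
print(c(edf = edf_x2, upper_limit = 20 - 1))
```

If the estimated effective degrees of freedom sit very close to the upper limit, that is usually a hint that k may be too small; gam.check offers a more formal diagnostic.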

Note that gam assumes a very inclusive definition of what counts as a GAM: basically any penalized GLM can be used. To this end gam allows the non-smooth model components to be penalized via argument paraPen and allows the linear predictor to depend on general linear functionals of smooths, via the summation convention mechanism described in linear.functional.terms.

Details of the default underlying fitting methods are given in Wood (2011 and 2004). Some alternative methods are discussed in Wood (2000 and 2006).


----------Value----------

If fit=FALSE the function returns a list G of items needed to fit a GAM, but doesn't actually fit it.

Otherwise the function returns an object of class "gam" as described in gamObject.
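The two kinds of return value can be seen in a short sketch. The model here is a cut-down version of the one used in the examples, with two smooths and default basis dimensions; the variable names are illustrative.

```r
library(mgcv)

set.seed(0)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)

## set up only: returns the list of items needed for fitting
G <- gam(y ~ s(x0) + s(x1), data = dat, fit = FALSE)
print(is.list(G))

## fit from the pre-built object: returns an object of class "gam"
b <- gam(G = G)
print(class(b))

## a few components of the fitted object
print(b$sp)            # estimated smoothing parameters
print(length(coef(b))) # intercept plus the coefficients of both smooths
```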


----------WARNINGS----------

You must have more unique combinations of covariates than the model has total parameters. (Total parameters is sum of basis dimensions plus sum of non-spline  terms less the number of spline terms).

Automatic smoothing parameter selection is not likely to work well when  fitting models to very few response data.

For data with many  zeroes clustered together in the covariate space it is quite easy to set up  GAMs which suffer from identifiability problems, particularly when using Poisson or binomial families. The problem is that with e.g. log or logit links, mean value zero corresponds to an infinite range on the linear predictor scale.


----------Author(s)----------


Simon N. Wood simon.wood@r-project.org

Front end design inspired by the S function of the same name based on the work
of Hastie and Tibshirani (1990). Underlying methods owe much to the work of
Wahba (e.g. 1990) and Gu (e.g. 2002).




----------References----------


and marginal likelihood estimation of semiparametric generalized linear  models. Journal of the Royal Statistical Society (B) 73(1):3-36
generalized additive models. J. Amer. Statist. Ass. 99:673-686. [Default method for additive case by GCV (but no longer for generalized)]

generalized additive mixed models. Biometrics 62(4):1025-1036
and Hall/CRC Press.


and Hall.


with Multiple Quadratic Penalties. J.R.Statist.Soc.B 62(2):413-428 [The original mgcv paper, but no longer the default methods.]


the Newton method. SIAM J. Sci. Statist. Comput. 12:383-398


functions in generalized linear models. J. Am. Statist.Ass. 81:96-103

to environmental modelling. Ecological Modelling 157:157-177


----------See Also----------

mgcv-package, gamObject, gam.models, smooth.terms, linear.functional.terms, s, te, predict.gam, plot.gam, summary.gam, gam.side, gam.selection, mgcv, gam.control, gam.check, negbin, magic, vis.gam


----------Examples----------


library(mgcv)
set.seed(0) ## simulate some data...
dat <- gamSim(1,n=400,dist="normal",scale=2)
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)
summary(b)
plot(b,pages=1,residuals=TRUE)  ## show partial residuals
plot(b,pages=1,seWithMean=TRUE) ## `with intercept' CIs

## same fit in two parts .....
G <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),fit=FALSE,data=dat)
b <- gam(G=G)
print(b)

## change the smoothness selection method to REML
b0 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat,method="REML")
plot(b0,pages=1,scheme=1)

## Would a smooth interaction of x0 and x1 be better?
## Use tensor product smooth of x0 and x1, basis
## dimension 49 (see ?te for details, also ?t2).
bt <- gam(y~te(x0,x1,k=7)+s(x2)+s(x3),data=dat,
          method="REML")
plot(bt,pages=1)
plot(bt,pages=1,scheme=2) ## alternative visualization
AIC(b0,bt) ## interaction worse than additive

## If it is believed that x0 and x1 are naturally on
## the same scale, and should be treated isotropically
## then could try...
bs <- gam(y~s(x0,x1,k=50)+s(x2)+s(x3),data=dat,
          method="REML")
plot(bs,pages=1)
AIC(b0,bt,bs) ## additive still better.

## Now do automatic terms selection as well
b1 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat,
       method="REML",select=TRUE)
plot(b1,pages=1)


## set the smoothing parameter for the first term, estimate rest ...
bp <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),sp=c(0.01,-1,-1,-1),data=dat)
plot(bp,pages=1,scheme=1)
## alternatively...
bp <- gam(y~s(x0,sp=.01)+s(x1)+s(x2)+s(x3),data=dat)


# set lower bounds on smoothing parameters ....
bp<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),
        min.sp=c(0.001,0.01,0,10),data=dat)
print(b);print(bp)

# same with REML
bp<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),
        min.sp=c(0.1,0.1,0,10),data=dat,method="REML")
print(b0);print(bp)


## now a GAM with 3df regression spline term & 2 penalized terms

b0<-gam(y~s(x0,k=4,fx=TRUE,bs="tp")+s(x1,k=12)+s(x2,k=15),data=dat)
plot(b0,pages=1)

## now simulate poisson data...
dat <- gamSim(1,n=4000,dist="poisson",scale=.1)

## use "cr" basis to save time, with 4000 data...
b2<-gam(y~s(x0,bs="cr")+s(x1,bs="cr")+s(x2,bs="cr")+
        s(x3,bs="cr"),family=poisson,data=dat,method="REML")
plot(b2,pages=1)

## drop x3, but initialize sp's from previous fit, to
## save more time...

b2a<-gam(y~s(x0,bs="cr")+s(x1,bs="cr")+s(x2,bs="cr"),
         family=poisson,data=dat,method="REML",
         in.out=list(sp=b2$sp[1:3],scale=1))
par(mfrow=c(2,2))
plot(b2a)

par(mfrow=c(1,1))
## similar example using performance iteration
dat <- gamSim(1,n=400,dist="poisson",scale=.25)

b3<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson,
        data=dat,optimizer="perf")
plot(b3,pages=1)

## repeat using GACV as in Wood 2008...

b4<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson,
        data=dat,method="GACV.Cp",scale=-1)
plot(b4,pages=1)

## repeat using REML as in Wood 2011...

b5<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson,
        data=dat,method="REML")
plot(b5,pages=1)


## a binary example (see later for large dataset version)...

dat <- gamSim(1,n=400,dist="binary",scale=.33)

lr.fit <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=binomial,
              data=dat,method="REML")

## plot model components with truth overlaid in red
op <- par(mfrow=c(2,2))
fn <- c("f0","f1","f2","f3");xn <- c("x0","x1","x2","x3")
for (k in 1:4) {
  plot(lr.fit,residuals=TRUE,select=k)
  ff <- dat[[fn[k]]];xx <- dat[[xn[k]]]
  ind <- sort.int(xx,index.return=TRUE)$ix
  lines(xx[ind],(ff-mean(ff))[ind]*.33,col=2)
}
par(op)
anova(lr.fit)
lr.fit1 <- gam(y~s(x0)+s(x1)+s(x2),family=binomial,
               data=dat,method="REML")
lr.fit2 <- gam(y~s(x1)+s(x2),family=binomial,
               data=dat,method="REML")
AIC(lr.fit,lr.fit1,lr.fit2)

## A Gamma example, by modifying `gamSim' output...

dat <- gamSim(1,n=400,dist="normal",scale=1)
dat$f <- dat$f/4 ## true linear predictor
Ey <- exp(dat$f);scale <- .5 ## mean and GLM scale parameter
## Note that `shape' and `scale' in `rgamma' are almost
## opposite terminology to that used with GLM/GAM...
dat$y <- rgamma(Ey*0,shape=1/scale,scale=Ey*scale)
bg <- gam(y~ s(x0)+ s(x1)+s(x2)+s(x3),family=Gamma(link=log),
          data=dat,method="REML")
plot(bg,pages=1,scheme=1)

## For inverse Gaussian, see ?rig

## now a 2D smoothing example...

eg <- gamSim(2,n=500,scale=.1)
attach(eg)

op <- par(mfrow=c(2,2),mar=c(4,4,1,1))

contour(truth$x,truth$z,truth$f) ## contour truth
b4 <- gam(y~s(x,z),data=data) ## fit model
fit1 <- matrix(predict.gam(b4,pr,se=FALSE),40,40)
contour(truth$x,truth$z,fit1)   ## contour fit
persp(truth$x,truth$z,truth$f)    ## persp truth
vis.gam(b4)                     ## persp fit
detach(eg)
par(op)

##################################################
## largish dataset example with user defined knots
##################################################

par(mfrow=c(2,2))
eg <- gamSim(2,n=10000,scale=.5)
attach(eg)

ind<-sample(1:10000,1000,replace=FALSE)
b5<-gam(y~s(x,z,k=50),data=data,
        knots=list(x=data$x[ind],z=data$z[ind]))
## various visualizations
vis.gam(b5,theta=30,phi=30)
plot(b5)
plot(b5,scheme=1,theta=50,phi=20)
plot(b5,scheme=2)

par(mfrow=c(1,1))
## and a pure "knot based" spline of the same data
b6<-gam(y~s(x,z,k=100),data=data,knots=list(x= rep((1:10-0.5)/10,10),
        z=rep((1:10-0.5)/10,rep(10,10))))
vis.gam(b6,color="heat",theta=30,phi=30)

## varying the default large dataset behaviour via `xt'
b7 <- gam(y~s(x,z,k=50,xt=list(max.knots=1000,seed=2)),data=data)
vis.gam(b7,theta=30,phi=30)
detach(eg)

################################################################
## Approximate large dataset logistic regression for rare events
## based on subsampling the zeroes, and adding an offset to
## approximately allow for this.
## Doing the same thing, but upweighting the sampled zeroes
## leads to problems with smoothness selection, and CIs.
################################################################
n <- 100000  ## simulate n data
dat <- gamSim(1,n=n,dist="binary",scale=.33)
p <- binomial()$linkinv(dat$f-6) ## make 1's rare
dat$y <- rbinom(p,1,p)      ## re-simulate rare response

## Now sample all the 1's but only proportion S of the 0's
S <- 0.02                   ## sampling fraction of zeroes
dat <- dat[dat$y==1 | runif(n) < S,] ## sampling

## Create offset based on total sampling fraction
dat$s <- rep(log(nrow(dat)/n),nrow(dat))

lr.fit <- gam(y~s(x0,bs="cr")+s(x1,bs="cr")+s(x2,bs="cr")+s(x3,bs="cr")+
              offset(s),family=binomial,data=dat,method="REML")

## plot model components with truth overlaid in red
op <- par(mfrow=c(2,2))
fn <- c("f0","f1","f2","f3");xn <- c("x0","x1","x2","x3")
for (k in 1:4) {
       plot(lr.fit,select=k,scale=0)
       ff <- dat[[fn[k]]];xx <- dat[[xn[k]]]
       ind <- sort.int(xx,index.return=TRUE)$ix
       lines(xx[ind],(ff-mean(ff))[ind]*.33,col=2)
}
par(op)
rm(dat)


When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).


Note: this document was produced by the 生物统计家园网 machine translator LoveR for personal R study only; 生物统计家园 retains copyright. Because it was generated automatically, it may contain inaccuracies.