R language: mgcv() function help documentation

loveR · posted 2012-2-17 10:14:09

mgcv(mgcv)
mgcv() belongs to R package: mgcv

                                       Multiple Smoothing Parameter Estimation by GCV or UBRE

                                       Translator: LoveR, the robot of 生物统计家园网 (biostatistic.net)

----------Description----------

Function to efficiently estimate smoothing parameters in Generalized Ridge Regression Problems with multiple (quadratic) penalties, by GCV or UBRE. The function uses Newton's method in multiple dimensions, backed up by steepest descent, to iteratively adjust a set of relative smoothing parameters for each penalty. To ensure that the overall level of smoothing is optimal, and to guard against trapping by local minima, a highly efficient global minimisation with respect to one overall smoothing parameter is also made at each iteration. This is the original Wood (2000) method. It has now been superseded by the methods in magic (Wood, 2004) and gam.fit3 (Wood, 2011).

For a listing of all routines in the mgcv package type: library(help="mgcv"). For an overview of the mgcv package see mgcv-package.

----------Usage----------

mgcv(y,X,sp,S,off,C=NULL,w=rep(1,length(y)),H=NULL,
   scale=1,gcv=TRUE,control=mgcv.control())

----------Arguments----------

Argument: y
The response data vector.

Argument: X
The design matrix for the problem. Note that ncol(X) must give the number of model parameters, while nrow(X) should give the number of data.

Argument: sp
An array of smoothing parameters. If control$fixed==TRUE then these are taken as being the smoothing parameters. Otherwise any positive values are assumed to be initial estimates, and negative values signal auto-initialization.

Argument: S
A list of penalty matrices. Only the smallest square block containing all non-zero matrix elements is actually stored, and off[i] indicates the element of the parameter vector that S[[i]][1,1] relates to.

Argument: off
Offset values indicating where in the overall parameter vector a particular stored penalty starts operating. For example, if p is the model parameter vector and k=nrow(S[[i]])-1, then the ith penalty is given by
t(p[off[i]:(off[i]+k)])%*%S[[i]]%*%p[off[i]:(off[i]+k)].
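The storage convention for S and off can be illustrated with a small base-R sketch. The helper name penalty_value and the toy matrices are assumptions made for illustration; they are not part of the mgcv package.

```r
# Hypothetical helper: evaluate the i-th penalty p' S_i p using the
# compact storage described above. S[[i]] holds only the non-zero block,
# and off[i] is the index of the first coefficient that block applies to.
penalty_value <- function(p, S, off, i) {
  k <- nrow(S[[i]]) - 1
  idx <- off[i]:(off[i] + k)
  drop(t(p[idx]) %*% S[[i]] %*% p[idx])
}

# Toy inputs (illustrative only)
p <- c(1, 2, 3, 4, 5)
S <- list(diag(2), matrix(c(2, 1, 1, 2), 2, 2))
off <- c(2, 4)

pen1 <- penalty_value(p, S, off, 1)  # acts on p[2:3]
pen2 <- penalty_value(p, S, off, 2)  # acts on p[4:5]
```

Storing only the non-zero block keeps each penalty small when a smooth term touches just a few of the model coefficients.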

Argument: C
Matrix containing any linear equality constraints on the problem (i.e. C in Cp=0).

Argument: w
A vector of weights for the data (often proportional to the reciprocal of the standard deviation of y).

Argument: H
A single fixed penalty matrix to be used in place of the multiple penalty matrices in S. mgcv cannot mix fixed and estimated penalties.

Argument: scale
This is the known scale parameter/error variance to use with UBRE. Note that it is assumed that the variance of y_i is given by scale/w_i.

Argument: gcv
If gcv is TRUE then smoothing parameters are estimated by GCV, otherwise UBRE is used.

Argument: control
A list of control options returned by mgcv.control.

----------Details----------

This is documentation for the code implementing the method described in section 4 of Wood (2000). The method is a computationally efficient means of applying GCV to the problem of smoothing parameter selection in generalized ridge regression problems of the form:

    minimise  || W^0.5 (Xp - y) ||^2 rho + lambda_1 p'S_1 p + lambda_2 p'S_2 p + ...

possibly subject to constraints Cp=0. X is a design matrix, p a parameter vector, y a data vector, W a diagonal weight matrix, S_i a positive semi-definite matrix of coefficients defining the ith penalty, and C a matrix of coefficients defining any linear equality constraints on the problem. The smoothing parameters are the lambda_i, but there is an overall smoothing parameter rho as well. Note that X must be of full column rank, at least when projected into the null space of any equality constraints.
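For fixed smoothing parameters the problem above is an ordinary penalized least squares fit, which the following base-R sketch solves directly. It is an illustration under stated assumptions, not the package's C code: the function name fit_fixed_sp is hypothetical, the penalties are passed as full ncol(X) by ncol(X) matrices rather than in the compact S/off storage, constraints are ignored, sp[i] plays the role of lambda_i/rho, and the GCV score uses the standard form n*RSS/(n - tr(A))^2.

```r
# Sketch: penalized least squares for FIXED smoothing parameters.
# S_full is a list of full-size penalty matrices (an assumption of this
# sketch); sp[i] stands for lambda_i / rho.
fit_fixed_sp <- function(y, X, S_full, sp, w = rep(1, length(y))) {
  n <- length(y)
  # Total penalty matrix: sum_i sp[i] * S_i
  St <- Reduce(`+`, Map(function(s, Si) s * Si, sp, S_full))
  XtW <- t(X * w)                    # X'W for a diagonal weight matrix W
  M <- XtW %*% X + St
  b <- solve(M, XtW %*% y)           # penalized estimate of p
  # Diagonal of the influence matrix A = X M^{-1} X'W
  hat <- rowSums((X %*% solve(M)) * t(XtW))
  rss <- sum(w * (y - X %*% b)^2)
  list(b = drop(b), hat = hat, edf = sum(hat),
       gcv = n * rss / (n - sum(hat))^2)
}

# Toy data: cubic polynomial basis with a penalty on the two
# highest-order coefficients (illustrative only)
set.seed(1)
n <- 50
x <- seq(0, 1, length.out = n)
X <- cbind(1, x, x^2, x^3)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.2)
S_full <- list(diag(c(0, 0, 1, 1)))

fit0 <- fit_fixed_sp(y, X, S_full, sp = 0)    # unpenalized fit
fit1 <- fit_fixed_sp(y, X, S_full, sp = 10)   # heavier smoothing
```

Increasing sp lowers the effective degrees of freedom (the trace of the influence matrix). Smoothing parameter selection then amounts to searching for the sp values that minimise the GCV or UBRE score.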

The method operates by alternating very efficient direct searches for rho with Newton or steepest descent updates of the logs of the lambda_i. Because the GCV/UBRE scores are flat w.r.t. very large or very small lambda_i, it is important to get good starting parameters, and to be careful not to step into a flat region of the smoothing parameter space. For this reason the algorithm rescales any Newton step that would result in a log(lambda_i) change of more than 5. Newton steps are only used if the Hessian of the GCV/UBRE score is positive definite, otherwise steepest descent is used. Similarly, steepest descent is used if the Newton step has to be contracted too far (indicating that the quadratic model underlying Newton is poor). All initial steepest descent steps are scaled so that their largest component is 1. However a step is calculated, it is never expanded if it is successful (to avoid flat portions of the objective), but steps are successively halved if they do not decrease the GCV/UBRE score, until they do, or the direction is deemed to have failed. M$conv provides some convergence diagnostics.
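The step-selection rules in this paragraph can be sketched in base R for a generic score function of the log smoothing parameters. The function names propose_step and halve_until_better, and the limit of 20 halvings, are assumptions made for illustration. The sketch omits the direct search for rho and the fallback to steepest descent after excessive contraction of a Newton step.

```r
# Sketch of the step rules described above (hypothetical names; this is
# not the mgcv implementation). g and H are the gradient and Hessian of
# the score w.r.t. the log smoothing parameters.
propose_step <- function(g, H, max_change = 5) {
  ev <- eigen(H, symmetric = TRUE, only.values = TRUE)$values
  if (all(ev > 0)) {
    step <- -solve(H, g)                                # Newton step
    m <- max(abs(step))
    if (m > max_change) step <- step * max_change / m   # cap log(lambda_i) change
    list(step = step, type = "newton")
  } else {
    # Steepest descent, scaled so the largest component is 1
    list(step = -g / max(abs(g)), type = "steepest")
  }
}

# Never expand a step; halve it until the score decreases or give up.
halve_until_better <- function(score, theta, step, max_halvings = 20) {
  s0 <- score(theta)
  for (i in seq_len(max_halvings)) {
    if (score(theta + step) < s0) {
      return(list(theta = theta + step, ok = TRUE))
    }
    step <- step / 2
  }
  list(theta = theta, ok = FALSE)   # direction deemed to have failed
}

# Toy quadratic score with minimum at c(1, -2) (illustrative only)
target <- c(1, -2)
score <- function(th) sum((th - target)^2)
theta0 <- c(0, 0)
prop <- propose_step(g = 2 * (theta0 - target), H = 2 * diag(2))
res <- halve_until_better(score, theta0, prop$step)

# An indefinite Hessian triggers the steepest descent branch
prop2 <- propose_step(g = c(4, -2), H = diag(c(1, -1)))
```

On the toy quadratic a single Newton step reaches the minimum. On the real GCV/UBRE surface the cap and the halving loop are what keep the iteration out of the flat regions mentioned above.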

The method is coded in C and is intended to be portable. It should be noted that seriously ill-conditioned problems (i.e. with close to column rank deficiency in the design matrix) may cause problems, especially if weights vary wildly between observations.
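A rough pre-check for this kind of ill-conditioning can be made in base R before fitting. The helper name check_design is hypothetical, and what counts as "too large" a condition number is a judgment call that this sketch does not settle.

```r
# Sketch: inspect the weighted design matrix for near column rank
# deficiency (hypothetical helper, not part of mgcv).
check_design <- function(X, w = rep(1, nrow(X))) {
  Xw <- X * sqrt(w)                 # scale each row by sqrt of its weight
  d <- svd(Xw, nu = 0, nv = 0)$d    # singular values
  list(rank = qr(Xw)$rank, ncol = ncol(X), cond = max(d) / min(d))
}

x <- seq(0, 1, length.out = 20)
ok  <- check_design(cbind(1, x, x^2))
bad <- check_design(cbind(1, x, 2 * x))   # third column duplicates the second
```

A numerical rank below ncol(X), or a very large ratio of largest to smallest singular value, suggests that the problem may be poorly conditioned for this method.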

----------Value----------

An object is returned with the following elements:

Element: b
The best fit parameters given the estimated smoothing parameters.

Element: scale
The estimated or supplied scale parameter/error variance.

Element: score
The UBRE or GCV score.

Element: sp
The estimated (or supplied) smoothing parameters (lambda_i/rho).

Element: Vb
Estimated covariance matrix of model parameters.

Element: hat
Diagonal of the hat/influence matrix.

Element: edf
Array of estimated degrees of freedom for each parameter.

Element: info
A list of convergence diagnostics, with the following elements:

edf: Array of whole model estimated degrees of freedom.

score: Array of ubre/gcv scores at the edfs for the final set of relative smoothing parameters.

g: the gradient of the GCV/UBRE score w.r.t. the smoothing parameters at termination.

h: the second derivatives corresponding to g above, i.e. the leading diagonal of the Hessian.

e: the eigenvalues of the Hessian. These should all be non-negative!

iter: the number of iterations taken.

in.ok: TRUE if the second smoothing parameter guess improved the GCV/UBRE score. (Please report examples where this is FALSE.)

step.fail: TRUE if the algorithm terminated by failing to improve the GCV/UBRE score rather than by "converging". Not necessarily a problem, but check the above derivative information quite carefully.
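The diagnostics listed above can be checked with a few lines of base R. The helper check_convergence, its tolerances, and the mock info list are all assumptions made for illustration; a real info list would come from the object returned by the fit.

```r
# Hypothetical helper summarising the convergence diagnostics. `info` is
# assumed to be a list with elements g, e, in.ok and step.fail as
# described above. Both tolerances are arbitrary choices of this sketch.
check_convergence <- function(info, grad_tol = 1e-3, eig_tol = 1e-8) {
  c(hessian_ok     = all(info$e >= -eig_tol),        # eigenvalues non-negative
    gradient_small = all(abs(info$g) < grad_tol),    # near-zero gradient
    guess_improved = isTRUE(info$in.ok),
    step_failed    = isTRUE(info$step.fail))
}

# Mock diagnostics (illustrative values only, not real output)
info <- list(g = c(1e-5, -2e-6), e = c(0.8, 0.1),
             in.ok = TRUE, step.fail = FALSE)
status <- check_convergence(info)
```

If step_failed is TRUE while hessian_ok and gradient_small both hold, the termination is probably harmless; otherwise the derivative information deserves a closer look.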

----------WARNING----------

The method may not behave well with near column rank deficient X.

----------Author(s)----------

Simon N. Wood simon.wood@r-project.org

----------References----------

the Newton method. SIAM J. Sci. Statist. Comput. 12:383-398
with Multiple  Quadratic Penalties. J.R.Statist.Soc.B 62(2):413-428
generalized additive models. J. Amer. Statist. Ass. 99:673-686
and marginal likelihood estimation of semiparametric generalized linear  models. Journal of the Royal Statistical Society (B) 73(1):3-36

----------See Also----------

gam, magic

----------Examples----------

## Not run:
library(mgcv)            # provides gam() and mgcv()
library(help="mgcv")     # listing of all routines

set.seed(1); n <- 400; sig2 <- 4
x0 <- runif(n, 0, 1); x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1); x3 <- runif(n, 0, 1)
f <- 2 * sin(pi * x0)
f <- f + exp(2 * x1) - 3.75887
f <- f + 0.2*x2^11*(10*(1-x2))^6 + 10*(10*x2)^3*(1-x2)^10 - 1.396
e <- rnorm(n, 0, sqrt(sig2))
y <- f + e
# set up additive model (fit=FALSE returns the model setup without fitting)
G <- gam(y~s(x0)+s(x1)+s(x2)+s(x3), fit=FALSE)
# fit using mgcv
mgfit <- mgcv(G$y, G$X, G$sp, G$S, G$off, C=G$C)

## End(Not run)

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

