R language: help documentation for the smooth.spline() function

loveR · posted 2012-2-16 18:53:02

smooth.spline(stats)
smooth.spline() belongs to the R package: stats

                                    Fit a Smoothing Spline

                                       Translator: LoveR, the robot of 生物统计家园网 (biostatistic.net)

----------Description----------

Fits a cubic smoothing spline to the supplied data.

----------Usage----------

smooth.spline(x, y = NULL, w = NULL, df, spar = NULL,
            cv = FALSE, all.knots = FALSE, nknots = NULL,
            keep.data = TRUE, df.offset = 0, penalty = 1,
            control.spar = list(), tol = 1e-6 * IQR(x))

----------Arguments----------

Argument: x
a vector giving the values of the predictor variable, or a list or a two-column matrix specifying x and y.

Argument: y
responses. If y is missing or NULL, the responses are assumed to be specified by x, with the index vector used as the predictor when x is a plain vector.
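The different ways of supplying x and y can be sketched as follows. This is only an illustration: it uses the built-in cars data and the y18 vector from the Examples section, and the object names (fit_xy, fit_list, fit_mat, fit_idx) are made up for this sketch.

```r
# Three ways to pass the same data (built-in 'cars' data set).
fit_xy   <- smooth.spline(cars$speed, cars$dist)                 # x and y as vectors
fit_list <- smooth.spline(list(x = cars$speed, y = cars$dist))   # list with x and y
fit_mat  <- smooth.spline(cbind(cars$speed, cars$dist))          # two-column matrix

# With y omitted and x a plain vector, the values are treated as responses
# and the index 1, 2, ..., n serves as the predictor.
y18     <- c(1:3, 5, 4, 7:3, 2 * (2:5), rep(10, 4))
fit_idx <- smooth.spline(y18)
fit_idx$x
```

All three cars fits should give the same fitted values, since they describe the same (x, y) pairs.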

Argument: w
optional vector of weights of the same length as x; defaults to all 1.

Argument: df
the desired equivalent number of degrees of freedom (trace of the smoother matrix).
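As a rough sketch of how df steers the fit, the code below requests two different values on the built-in cars data and reads back the df component of the result. The object names are illustrative only, and the achieved value is expected to match the request only approximately, because spar is found by a numerical search.

```r
# Request a smoother (df = 5) and a wigglier (df = 10) fit of the same data.
fit_df5  <- smooth.spline(cars$speed, cars$dist, df = 5)
fit_df10 <- smooth.spline(cars$speed, cars$dist, df = 10)

# Compare the requested and the achieved equivalent degrees of freedom.
rbind(requested = c(5, 10),
      achieved  = c(fit_df5$df, fit_df10$df))
```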

Argument: spar
smoothing parameter, typically (but not necessarily) in (0,1]. The coefficient λ of the integral of the squared second derivative in the fit (penalized log likelihood) criterion is a monotone function of spar; see the details below.

Argument: cv
ordinary leave-one-out cross-validation when TRUE, or "generalized" cross-validation (GCV) when FALSE; setting it to NA skips the evaluation of leverages and any score.
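The three settings of cv might be used as in this sketch, which reuses the y18 vector from the Examples section (18 responses at x = 1, ..., 18, so there are no tied x values). The object names are illustrative.

```r
y18 <- c(1:3, 5, 4, 7:3, 2 * (2:5), rep(10, 4))

fit_gcv <- smooth.spline(y18)                        # cv = FALSE: GCV (the default)
fit_cv  <- smooth.spline(y18, cv = TRUE)             # ordinary leave-one-out CV
fit_na  <- smooth.spline(y18, spar = 0.2, cv = NA)   # fixed spar, no leverages or score

# The score that was minimised, and the resulting degrees of freedom.
c(gcv_score = fit_gcv$cv.crit, cv_score = fit_cv$cv.crit)
c(gcv_df = fit_gcv$df, cv_df = fit_cv$df)
```

Note that cv = NA is combined with an explicit spar here, since nothing is left to select the smoothing parameter once the scores are skipped.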

Argument: all.knots
if TRUE, all distinct points in x are used as knots. If FALSE (default), a subset of x[] is used, specifically x[j] where the nknots indices j are evenly spaced in 1:n; see also the next argument nknots.

Argument: nknots
integer giving the number of knots to use when all.knots=FALSE. By default, this is less than n, the number of unique x values, when n > 49.
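The effect of all.knots and nknots can be seen by inspecting the nk component of the fit, which (per the Value section) is the number of proper knots plus 2. The sketch below uses simulated data with 200 distinct x values; the data and object names are assumptions made for the illustration.

```r
set.seed(1)                                   # reproducible noise
x <- seq(0, 1, length.out = 200)              # 200 distinct x values (n > 49)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.2)

fit_default <- smooth.spline(x, y)                     # subset of x used as knots
fit_all     <- smooth.spline(x, y, all.knots = TRUE)   # every distinct x is a knot
fit_30      <- smooth.spline(x, y, nknots = 30)        # explicit number of knots

# Number of spline coefficients in each fit.
c(default = fit_default$fit$nk, all = fit_all$fit$nk, n30 = fit_30$fit$nk)
```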

Argument: keep.data
logical specifying whether the input data should be kept in the result. If TRUE (the default), fitted values and residuals are available from the result.

Argument: df.offset
allows the degrees of freedom to be increased by df.offset in the GCV criterion.

Argument: penalty
the coefficient of the penalty for degrees of freedom in the GCV criterion.

Argument: control.spar
optional list with named components controlling the root finding when the smoothing parameter spar is computed, i.e., when spar is missing or NULL; see below. Note that this is partly experimental and may change with general spar computation improvements!

low: lower bound for spar; defaults to -1.5 (used to implicitly default to 0 in R versions earlier than 1.4).

high: upper bound for spar; defaults to +1.5.

tol: the absolute precision (tolerance) used; defaults to 1e-4 (formerly 1e-3).

eps: the relative precision used; defaults to 2e-8 (formerly 0.00244).

trace: logical indicating if iterations should be traced.

maxit: integer giving the maximal number of iterations; defaults to 500.

Note that spar is only searched for in the interval [low, high].
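A minimal sketch of passing control.spar, assuming one wants to restrict the search to spar values between 0 and 1 with a tighter tolerance. The particular bounds are chosen for illustration only and are not a recommendation.

```r
y18 <- c(1:3, 5, 4, 7:3, 2 * (2:5), rep(10, 4))

# Restrict the spar search to [0, 1] and tighten the absolute tolerance.
fit_ctl <- smooth.spline(y18,
                         control.spar = list(low = 0, high = 1,
                                             tol = 1e-6, maxit = 200))

fit_ctl$spar   # the selected value lies inside [low, high]
```

Components that are left out of the list (here eps and trace) keep their defaults.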

Argument: tol
A tolerance for same-ness of the x values. The values are binned into bins of size tol, and values which fall into the same bin are regarded as the same.
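To illustrate the binning, the sketch below uses made-up data in which two x values (1 and 1.001) are nearly identical. With the default tol they should count as distinct, while a deliberately coarse tol = 0.1 should merge them. The data and the choice of 0.1 are assumptions for this example.

```r
set.seed(42)
x <- c(1, 1.001, 2:10)                 # 11 values, two of them almost equal
y <- sqrt(x) + rnorm(length(x), sd = 0.1)

fit_fine   <- smooth.spline(x, y)              # default tol = 1e-6 * IQR(x)
fit_coarse <- smooth.spline(x, y, tol = 0.1)   # 1 and 1.001 fall into one bin

# Number of x values regarded as distinct under each tolerance.
c(fine = length(fit_fine$x), coarse = length(fit_coarse$x))
```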

----------Details----------

Neither x nor y is allowed to contain missing or infinite values.

The x vector should contain at least four distinct values. "Distinct" here is controlled by tol: values which are regarded as the same are replaced by the first of their values, and the corresponding y and w are pooled accordingly.
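The pooling can be checked on the built-in cars data, where several speeds occur more than once. The sketch assumes the default unit weights, in which case the pooled response at each distinct x is expected to be the plain mean and the pooled weight the number of tied observations. The names pooled_mean and n_per_x are illustrative.

```r
fit <- smooth.spline(cars$speed, cars$dist)

# What pooling should produce with unit weights: one mean response and
# one count per distinct speed.
pooled_mean <- tapply(cars$dist, cars$speed, mean)
n_per_x     <- table(cars$speed)

head(cbind(x = fit$x, yin = fit$yin, w = fit$w,
           mean_dist = as.numeric(pooled_mean),
           count     = as.numeric(n_per_x)))
```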

The computational λ used (as a function of spar) is λ = r * 256^(3*spar - 1), where r = tr(X' W X) / tr(Σ), Σ is the matrix given by Σ[i,j] = Integral B''[i](t) B''[j](t) dt, X is given by X[i,j] = B[j](x[i]), W is the diagonal matrix of weights (scaled such that its trace is n, the original number of observations) and B[k](.) is the k-th B-spline.

Note that with these definitions, f_i = f(x_i), and the B-spline basis representation f = X c (i.e., c is the vector of spline coefficients), the penalized log likelihood is L = (y - f)' W (y - f) + λ c' Σ c, and hence c is the solution of the (ridge regression) system (X' W X + λ Σ) c = X' W y.
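For readers who want the intermediate step, the ridge system follows from setting the gradient of L with respect to c to zero. The derivation below is a sketch that relies on W and Σ being symmetric; the symbol A(λ) for the smoother matrix is notation introduced here, not taken from the help page.

```latex
\begin{aligned}
L(c) &= (y - Xc)^\top W (y - Xc) + \lambda\, c^\top \Sigma\, c, \\
\nabla_c L &= -2\, X^\top W (y - Xc) + 2 \lambda\, \Sigma c = 0
  \;\Longrightarrow\; (X^\top W X + \lambda \Sigma)\, c = X^\top W y, \\
\hat f &= X c = A(\lambda)\, y, \qquad
  A(\lambda) = X \,(X^\top W X + \lambda \Sigma)^{-1} X^\top W, \\
\mathrm{df} &= \operatorname{tr} A(\lambda).
\end{aligned}
```

The last line matches the description of the df argument as the trace of the smoother matrix.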

If spar is missing or NULL, the value of df is used to determine the degree of smoothing. If both are missing, leave-one-out cross-validation (ordinary or "generalized" as determined by cv) is used to determine λ. Note that from the above relation, spar = s0 + 0.0601 * log(λ), which is intentionally different from the S-PLUS implementation of smooth.spline (where spar is proportional to λ). In R's (log λ) scale, it makes more sense to vary spar linearly.
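The log-linear relation between spar and λ can be checked numerically. The sketch below fits the built-in cars data on an evenly spaced grid of spar values (the grid itself is an arbitrary choice) and reads back the lambda and df components. Because λ = r * 256^(3*spar - 1) with r fixed for a given data set, the slope of spar against log(λ) should be 1 / (3 * log(256)), which is about 0.0601.

```r
spar_grid <- seq(0.2, 1.0, by = 0.2)

# One fit per spar value on the same data.
fits <- lapply(spar_grid,
               function(s) smooth.spline(cars$speed, cars$dist, spar = s))
lambda_grid <- sapply(fits, function(f) f$lambda)
df_grid     <- sapply(fits, function(f) f$df)

data.frame(spar = spar_grid, lambda = lambda_grid, df = df_grid)

# Slope of spar against log(lambda) over the whole grid.
slope <- diff(range(spar_grid)) / diff(range(log(lambda_grid)))
c(slope = slope, expected = 1 / (3 * log(256)))
```

Equal steps in spar multiply λ by a constant factor, and the equivalent degrees of freedom shrink as spar grows.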

Note however that currently the results may become very unreliable for spar values smaller than about -1 or -2. The same may happen for values larger than 2 or so. Don't think of setting spar or the controls low and high outside such a safe range, unless you know what you are doing!

The "generalized" cross-validation method will work correctly when there are duplicated points in x. However, it is ambiguous what leave-one-out cross-validation means with duplicated points, and the internal code uses an approximation that involves leaving out groups of duplicated points. cv=TRUE is best avoided in that case.
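One way to follow this advice is to check for tied x values before choosing cv. The sketch below is a simplification: it tests for exact duplicates with anyDuplicated(), whereas smooth.spline() itself decides sameness through tol. The name has_ties is made up for the example.

```r
# 'cars' contains several repeated speeds, so there are ties in x.
has_ties <- anyDuplicated(cars$speed) > 0

# Use ordinary leave-one-out CV only when there are no ties; otherwise GCV.
fit <- smooth.spline(cars$speed, cars$dist, cv = !has_ties)

c(has_ties = has_ties, score = fit$cv.crit, df = fit$df)
```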

----------Value----------

An object of class "smooth.spline" with components:

Component: x
the distinct x values in increasing order; see the "Details" above.

Component: y
the fitted values corresponding to x.

Component: w
the weights used at the unique values of x.

Component: yin
the y values used at the unique x values.

Component: data
only if keep.data = TRUE: itself a list with components x, y and w of the same length. These are the original (x_i, y_i, w_i), i = 1, …, n, values, where data$x may have repeated values and hence be longer than the above x component; see details.

Component: lev
(when cv was not NA) leverages, the diagonal values of the smoother matrix.

Component: cv.crit
cross-validation score, "generalized" or true, depending on cv.

Component: pen.crit
penalized criterion.

Component: crit
the criterion value minimized in the underlying .Fortran routine "sslvrg".

Component: df
equivalent degrees of freedom used. Note that (currently) this value may become quite imprecise when the true df is between 1 and 2.

Component: spar
the value of spar computed or given.

Component: lambda
the value of λ corresponding to spar; see the details above.

Component: iparms
named integer(3) vector where ..$iparms["iter"] gives the number of spar computing iterations used.

Component: fit
list for use by predict.smooth.spline, with components

knot: the knot sequence (including the repeated boundary knots).

nk: number of coefficients or number of "proper" knots plus 2.

coef: coefficients for the spline basis used.

min, range: numbers giving the corresponding quantities of x.

Component: call
the matched call.
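A quick way to get a feel for these components is to inspect a fitted object. The sketch below uses the built-in cars data. It relies on the description of lev as the diagonal of the smoother matrix, so its sum is expected to agree with df.

```r
fit <- smooth.spline(cars$speed, cars$dist)

class(fit)    # "smooth.spline"
names(fit)    # the components described above

# Distinct x values versus the original observations kept in 'data'.
c(distinct_x = length(fit$x), observations = length(fit$data$x))

# The leverages sum to the equivalent degrees of freedom.
c(sum_lev = sum(fit$lev), df = fit$df)
```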

----------Note----------

The default all.knots = FALSE and nknots = NULL entails using only O(n^{0.2}) knots instead of n for n > 49. This reduces the time and memory required, though no longer drastically since R version 1.5.1, where the cost is only O(nk) + O(n), with nk the number of knots. In this case, where not all unique x values are used as knots, the result is not a smoothing spline in the strict sense, but it is very close unless a small smoothing parameter (or large df) is used.

----------Author(s)----------

R implementation by B. D. Ripley and Martin Maechler
(spar/lambda, etc).

This function is based on code in the GAMFIT Fortran program by
T. Hastie and R. Tibshirani (http://lib.stat.cmu.edu/general/),
which makes use of spline code by Finbarr O'Sullivan.  Its design
parallels the smooth.spline function of Chambers & Hastie (1992).

----------References----------

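Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S. Wadsworth & Brooks/Cole.
Green, P. J. and Silverman, B. W. (1994) Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman and Hall.
Hastie, T. J. and Tibshirani, R. J. (1990) Generalized Additive Models. Chapman and Hall.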

----------See Also----------

predict.smooth.spline for evaluating the spline and its derivatives.
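As a brief sketch of predict() on a fitted object, the code below evaluates the spline and its first derivative on a grid of new speeds for the built-in cars data, then compares the derivative with a central finite difference. The grid, the step h and the object names are assumptions made for the illustration.

```r
fit <- smooth.spline(cars$speed, cars$dist)

new_speed <- seq(5, 24, by = 0.5)
pred  <- predict(fit, new_speed)               # list with components x and y
slope <- predict(fit, new_speed, deriv = 1)    # first derivative of the curve

# Central finite difference as an independent check of the derivative.
h  <- 1e-4
fd <- (predict(fit, new_speed + h)$y - predict(fit, new_speed - h)$y) / (2 * h)
max(abs(fd - slope$y))                         # should be very small

# Predicting at the distinct x values reproduces the stored fitted values.
all.equal(as.numeric(predict(fit, fit$x)$y), as.numeric(fit$y))
```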

----------Examples----------

require(graphics)

attach(cars)
plot(speed, dist, main = "data(cars)  &  smoothing splines")
cars.spl <- smooth.spline(speed, dist)
(cars.spl)
## This example has duplicate points, so avoid cv=TRUE

lines(cars.spl, col = "blue")
lines(smooth.spline(speed, dist, df=10), lty=2, col = "red")
legend(5,120,c(paste("default [C.V.] => df =",round(cars.spl$df,1)),
            "s( * , df = 10)"), col = c("blue","red"), lty = 1:2,
   bg='bisque')
detach()

## Residual (Tukey Anscombe) plot:
plot(residuals(cars.spl) ~ fitted(cars.spl))
abline(h = 0, col="gray")

## consistency check:
stopifnot(all.equal(cars$dist,
                  fitted(cars.spl) + residuals(cars.spl)))

##-- artificial example
y18 <- c(1:3,5,4,7:3,2*(2:5),rep(10,4))
xx  <- seq(1,length(y18), length.out=201)
(s2  <- smooth.spline(y18)) # GCV
(s02  <- smooth.spline(y18, spar = 0.2))
(s02. <- smooth.spline(y18, spar = 0.2, cv=NA))
plot(y18, main=deparse(s2$call), col.main=2)
lines(s2, col = "gray"); lines(predict(s2, xx), col = 2)
lines(predict(s02, xx), col = 3); mtext(deparse(s02$call), col = 3)

## The following shows the problematic behavior of 'spar' searching:
(s2  <- smooth.spline(y18, control.spar =
                  list(trace = TRUE, tol = 1e-6, low = -1.5)))
(s2m <- smooth.spline(y18, cv = TRUE, control.spar =
                  list(trace = TRUE, tol = 1e-6, low = -1.5)))
## both above do quite similarly (Df = 8.5 +- 0.2)

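When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).

Notes:
Note 1: For the convenience of learners, this document was translated by LoveR, the robot of 生物统计家园网. It is intended only as a personal reference for learning R, and 生物统计家园 retains the copyright.
Note 2: Because the translation was produced automatically, some inaccuracies are unavoidable; checking the translated text carefully against the English original helps when learning R.
Note 3: If you find an inaccuracy, please reply to this thread and we will gradually revise it.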
