                                        General-purpose Optimization

                                         Translator: 生物统计家园网, robot LoveR


General-purpose optimization based on Nelder–Mead, quasi-Newton and conjugate-gradient algorithms. It includes an option for box-constrained optimization and simulated annealing.


optim(par, fn, gr = NULL, ...,
      method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN", "Brent"),
      lower = -Inf, upper = Inf,
      control = list(), hessian = FALSE)


Argument: par
Initial values for the parameters to be optimized over.

Argument: fn
A function to be minimized (or maximized), with first argument the vector of parameters over which minimization is to take place.  It should return a scalar result.

Argument: gr
A function to return the gradient for the "BFGS", "CG" and "L-BFGS-B" methods.  If it is NULL, a finite-difference approximation will be used.  For the "SANN" method it specifies a function to generate a new candidate point.  If it is NULL a default Gaussian Markov kernel is used.

Argument: ...
Further arguments to be passed to fn and gr.

Argument: method
The method to be used. See "Details".

Arguments: lower, upper
Bounds on the variables for the "L-BFGS-B" method, or bounds in which to search for method "Brent".

Argument: control
A list of control parameters. See "Details".

Argument: hessian
Logical. Should a numerically differentiated Hessian matrix be returned?



Note that arguments after ... must be matched exactly.

By default this function performs minimization, but it will maximize if control$fnscale is negative.
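
As a minimal sketch of the maximization case, the following uses a toy concave objective f (a hypothetical function chosen for illustration, not part of this page) whose peak of 5 lies at (1, -2). Setting fnscale to -1 is one way to make the scale negative.

```r
# Toy concave objective (illustrative assumption): maximum value 5 at (1, -2).
f <- function(p) 5 - (p[1] - 1)^2 - (p[2] + 2)^2

# A negative fnscale turns the default minimization into a maximization.
res_max <- optim(c(0, 0), f, method = "BFGS", control = list(fnscale = -1))

res_max$par    # close to c(1, -2)
res_max$value  # close to 5; reported on the original scale of f
```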

The default method is an implementation of that of Nelder and Mead (1965), that uses only function values and is robust but relatively slow.  It will work reasonably well for non-differentiable functions.

Method "BFGS" is a quasi-Newton method (also known as a variable metric algorithm), specifically that published simultaneously in 1970 by Broyden, Fletcher, Goldfarb and Shanno.  This uses function values and gradients to build up a picture of the surface to be optimized.

Method "CG" is a conjugate gradients method based on that by Fletcher and Reeves (1964) (but with the option of Polak–Ribiere or Beale–Sorenson updates).  Conjugate gradient methods will generally be more fragile than the BFGS method, but as they do not store a matrix they may be successful in much larger optimization problems.

Method "L-BFGS-B" is that of Byrd et al. (1995), which allows box constraints: each variable can be given a lower and/or upper bound. The initial value must satisfy the constraints. This uses a limited-memory modification of the BFGS quasi-Newton method. If non-trivial bounds are supplied, this method will be selected, with a warning.
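
A small sketch of box-constrained use, assuming a toy quadratic f_box (hypothetical) whose unconstrained minimum at (3, -1) lies outside the box [0, 2] x [0, 2]. The solution should then sit on the boundary, and the starting point is chosen inside the box as required.

```r
# Toy objective (illustrative assumption): unconstrained minimum at (3, -1).
f_box <- function(p) (p[1] - 3)^2 + (p[2] + 1)^2

# The initial value c(1, 1) satisfies the bounds.
res_box <- optim(c(1, 1), f_box, method = "L-BFGS-B",
                 lower = c(0, 0), upper = c(2, 2))

res_box$par    # close to c(2, 0): both constraints are active
res_box$value  # close to 2
```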

Nocedal and Wright (1999) is a comprehensive reference for the previous three methods.

Method "SANN" is by default a variant of simulated annealing given in Belisle (1992). Simulated annealing belongs to the class of stochastic global optimization methods. It uses only function values but is relatively slow. It will also work for non-differentiable functions. This implementation uses the Metropolis function for the acceptance probability. By default the next candidate point is generated from a Gaussian Markov kernel with scale proportional to the actual temperature. If a function to generate a new candidate point is given, method "SANN" can also be used to solve combinatorial optimization problems. Temperatures are decreased according to the logarithmic cooling schedule given in Belisle (1992, p. 890); specifically, the temperature is set to temp / log(((t-1) %/% tmax)*tmax + exp(1)), where t is the current iteration step, and temp and tmax can be specified via control (see below).  Note that the "SANN" method depends critically on the settings of the control parameters. It is not a general-purpose method but can be very useful in getting to a good value on a very rough surface.
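
The cooling schedule quoted above can be evaluated directly. This sketch wraps the formula in a helper named sann_temperature (a hypothetical name, not part of R), and the values temp = 10 and tmax = 10 are arbitrary choices for illustration.

```r
# Temperature at iteration t, using the formula from the text:
#   temp / log(((t-1) %/% tmax)*tmax + exp(1))
sann_temperature <- function(t, temp, tmax) {
  temp / log(((t - 1) %/% tmax) * tmax + exp(1))
}

# The temperature is constant within each block of tmax iterations
# and drops from one block to the next.
temps <- sann_temperature(1:40, temp = 10, tmax = 10)
unique(round(temps, 4))
```

Because of the integer division, the temperature only changes every tmax steps, so tmax controls how long the search stays at each temperature.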

Method "Brent" is for one-dimensional problems only, using optimize().  It can be useful in cases where optim() is used inside other functions where only method can be specified, such as in mle from package stats4.
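
A minimal one-dimensional sketch, assuming a toy objective f1 (hypothetical) with its minimum at x = 2. Finite lower and upper bounds are supplied, since "Brent" searches within them.

```r
# Toy one-dimensional objective (illustrative assumption): minimum 1 at x = 2.
f1 <- function(x) (x - 2)^2 + 1

res_brent <- optim(par = 0, fn = f1, method = "Brent",
                   lower = -10, upper = 10)

res_brent$par    # close to 2
res_brent$value  # close to 1
```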

Function fn can return NA or Inf if the function cannot be evaluated at the supplied value, but the initial value must have a computable finite value of fn. (Except for method "L-BFGS-B" where the values should always be finite.)
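
One way this is commonly used is to guard an invalid region of the parameter space. The sketch below assumes a small made-up data vector x_obs and a normal negative log-likelihood nll (both hypothetical); nll returns Inf when the standard deviation is not positive, and the starting point has a finite value.

```r
# Made-up data for illustration only.
x_obs <- c(0.5, 1.2, 0.3, 2.0, 0.8)

# Negative log-likelihood of a normal model; p = c(mean, sd).
nll <- function(p) {
  if (p[2] <= 0) return(Inf)  # cannot be evaluated for a non-positive sd
  -sum(dnorm(x_obs, mean = p[1], sd = p[2], log = TRUE))
}

# The initial value c(0, 1) gives a finite value of nll.
res_nll <- optim(c(0, 1), nll)  # default method: Nelder-Mead

res_nll$par  # close to the sample mean and the (population) standard deviation
```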

optim can be used recursively, and for a single parameter as well as many.  It also accepts a zero-length par, and just evaluates the function with that argument.

The control argument is a list that can supply any of the following components:

trace Non-negative integer. If positive, tracing information on the progress of the optimization is produced. Higher values may produce more tracing information: for method "L-BFGS-B" there are six levels of tracing.  (To understand exactly what

fnscale An overall scaling to be applied to the value of fn and gr during optimization. If negative, turns the problem into a maximization problem. Optimization is

parscale A vector of scaling values for the parameters. Optimization is performed on par/parscale and these should be comparable in the sense that a unit change in any element produces about a unit change in the scaled value.  Not used (nor needed)

ndeps A vector of step sizes for the finite-difference approximation to the gradient, on par/parscale

maxit The maximum number of iterations. Defaults to 100 for the derivative-based methods, and 500 for "Nelder-Mead".

For "SANN" maxit gives the total number of function evaluations: there is no other stopping criterion. Defaults to 10000.

abstol The absolute convergence tolerance. Only

reltol Relative convergence tolerance.  The algorithm stops if it is unable to reduce the value by a factor of reltol * (abs(val) + reltol) at a step.  Defaults to

alpha, beta, gamma Scaling parameters for the "Nelder-Mead" method. alpha is the reflection factor (default 1.0), beta the contraction factor (0.5) and

REPORT The frequency of reports for the "BFGS", "L-BFGS-B" and "SANN" methods if control$trace is positive. Defaults to every 10 iterations for "BFGS" and

type for the conjugate-gradients method. Takes value 1 for the Fletcher–Reeves update, 2 for

lmm is an integer giving the number of BFGS updates

factr controls the convergence of the "L-BFGS-B" method. Convergence occurs when the reduction in the objective is within this factor of the machine tolerance. Default is 1e7,

pgtol helps control the convergence of the "L-BFGS-B" method. It is a tolerance on the projected gradient in the current search direction. This defaults to zero, when the check is

temp controls the "SANN" method. It is the starting temperature for the cooling schedule. Defaults to

tmax is the number of function evaluations at each

Any names given to par will be copied to the vectors passed to fn and gr.  Note that no other attributes of par are copied over.
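
A short sketch of how the copied names can be used inside fn. The objective fn_named and the variable seen_names are hypothetical names introduced here; the function records the names it receives and indexes the parameters by name.

```r
seen_names <- NULL

# Toy objective (illustrative assumption): minimum at a = 3, b = -1.
fn_named <- function(p) {
  seen_names <<- names(p)            # record the names passed in by optim()
  (p[["a"]] - 3)^2 + (p[["b"]] + 1)^2
}

res_named <- optim(c(a = 0, b = 0), fn_named, method = "BFGS")

seen_names     # the names given to par
res_named$par  # close to a = 3, b = -1
```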


A list with components:

Component: par
The best set of parameters found.

Component: value
The value of fn corresponding to par.

Component: counts
A two-element integer vector giving the number of calls to fn and gr respectively. This excludes those calls needed to compute the Hessian, if requested, and any calls to fn to compute a finite-difference approximation to the gradient.

Component: convergence
An integer code. 0 indicates successful completion (which is always the case for "SANN" and "Brent").  Possible error codes are

1 indicates that the iteration limit maxit had been reached.

10 indicates degeneracy of the Nelder–Mead simplex.

51 indicates a warning from the "L-BFGS-B" method; see component message for further details.

52 indicates an error from the "L-BFGS-B" method; see component message for further details.

Component: message
A character string giving any additional information returned by the optimizer, or NULL.

Component: hessian
Only if argument hessian is true. A symmetric matrix giving an estimate of the Hessian at the solution found.  Note that this is the Hessian of the unconstrained problem even if the box constraints are active.
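
The returned list can be inspected as in the following sketch, which assumes a toy quadratic objective quad (hypothetical) whose exact Hessian is diag(2, 4). Checking the convergence code before using the result is a suggested habit rather than something optim() requires.

```r
# Toy objective (illustrative assumption): minimum at (1, 3), Hessian diag(2, 4).
quad <- function(p) (p[1] - 1)^2 + 2 * (p[2] - 3)^2

fit <- optim(c(0, 0), quad, method = "BFGS", hessian = TRUE)

# A non-zero code means the result should not be trusted blindly.
if (fit$convergence != 0) {
  warning("optim() did not converge: code ", fit$convergence)
}

fit$par      # best parameters found
fit$value    # value of quad at fit$par
fit$counts   # calls to fn and gr
fit$hessian  # numerically differentiated Hessian at the solution
```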


optim will work with one-dimensional pars, but the default method does not work well (and will warn).  Method "Brent" uses optimize provided bounds are available; "BFGS" often works well enough if not.


The code for methods "Nelder-Mead", "BFGS" and "CG" was based originally on Pascal code in Nash (1990) that was translated by p2c and then hand-optimized.  Dr Nash has agreed that the code can be made freely available.

The code for method "L-BFGS-B" is based on Fortran code by Zhu, Byrd, Lu-Chen and Nocedal obtained from Netlib (file "opt/lbfgs_bcm.shar": another version is in "toms/778").

The code for method "SANN" was contributed by A. Trapletti.


annealing algorithms on Rd. J. Applied Probability, 29, 885–895.
memory algorithm for bound constrained optimization. SIAM J. Scientific Computing, 16, 1190–1208.
conjugate gradients. Computer Journal 7, 148–154.
Computers. Linear Algebra and Function Minimisation. Adam Hilger.
minimization. Computer Journal 7, 308–313.

----------See Also----------

nlm, nlminb.

optimize for one-dimensional minimization and constrOptim for constrained optimization.



fr <- function(x) {   ## Rosenbrock Banana function
    x1 <- x[1]
    x2 <- x[2]
    100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}
grr <- function(x) { ## Gradient of 'fr'
    x1 <- x[1]
    x2 <- x[2]
    c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
       200 *      (x2 - x1 * x1))
}
optim(c(-1.2,1), fr)
optim(c(-1.2,1), fr, grr, method = "BFGS")
optim(c(-1.2,1), fr, NULL, method = "BFGS", hessian = TRUE)
## These do not converge in the default number of steps
optim(c(-1.2,1), fr, grr, method = "CG")
optim(c(-1.2,1), fr, grr, method = "CG", control=list(type=2))
optim(c(-1.2,1), fr, grr, method = "L-BFGS-B")

flb <- function(x)
    { p <- length(x); sum(c(1, rep(4, p-1)) * (x - c(1, x[-p])^2)^2) }
## 25-dimensional box constrained
optim(rep(3, 25), flb, NULL, method = "L-BFGS-B",
      lower=rep(2, 25), upper=rep(4, 25)) # par[24] is *not* at boundary

## "wild" function, global minimum at about -15.81515
fw <- function (x)
    10*sin(0.3*x)*sin(1.3*x^2) + 0.00001*x^4 + 0.2*x+80
plot(fw, -50, 50, n=1000, main = "optim() minimising 'wild function'")

res <- optim(50, fw, method="SANN",
             control=list(maxit=20000, temp=20, parscale=20))
## Now improve locally {typically only by a small bit}:
(r2 <- optim(res$par, fw, method="BFGS"))
points(r2$par, r2$value, pch = 8, col = "red", cex = 2)

## Combinatorial optimization: Traveling salesman problem
library(stats) # normally loaded

eurodistmat <- as.matrix(eurodist)

distance <- function(sq) {  # Target function: total length of the tour
    sq2 <- embed(sq, 2)  # each row holds a pair of consecutive cities
    sum(eurodistmat[cbind(sq2[,2], sq2[,1])])
}

genseq <- function(sq) {  # Generate new candidate sequence
    idx <- seq(2, NROW(eurodistmat)-1)
    changepoints <- sample(idx, size=2, replace=FALSE)
    tmp <- sq[changepoints[1]]
    sq[changepoints[1]] <- sq[changepoints[2]]
    sq[changepoints[2]] <- tmp
    sq  # return the sequence with two interior cities swapped
}

sq <- c(1:nrow(eurodistmat), 1)  # Initial sequence: alphabetic
# rotate for conventional orientation
loc <- -cmdscale(eurodist, add=TRUE)$points
x <- loc[,1]; y <- loc[,2]
s <- seq_len(nrow(eurodistmat))
tspinit <- loc[sq,]

plot(x, y, type="n", asp=1, xlab="", ylab="",
     main="initial solution of traveling salesman problem", axes = FALSE)
arrows(tspinit[s,1], tspinit[s,2], tspinit[s+1,1], tspinit[s+1,2],
       angle=10, col="green")
text(x, y, labels(eurodist), cex=0.8)

set.seed(123) # chosen to get a good soln relatively quickly
res <- optim(sq, distance, genseq, method = "SANN",
             control = list(maxit = 30000, temp = 2000, trace = TRUE,
                            REPORT = 500))
res  # Near optimum distance around 12842

tspres <- loc[res$par,]
plot(x, y, type="n", asp=1, xlab="", ylab="",
     main="optim() 'solving' traveling salesman problem", axes = FALSE)
arrows(tspres[s,1], tspres[s,2], tspres[s+1,1], tspres[s+1,2],
       angle=10, col="red")
text(x, y, labels(eurodist), cex=0.8)

