
Optimisation problems: Choosing a method

In this morning’s Code Coffee meeting, we discussed optimisation methods (for doing things like least-squares minimisation, not for optimising code). This is an extremely common problem in astrophysics – you have some data, and you want to find the model that best fits it. It’s also a common problem in general, so there are lots of methods out there for handling optimisation. Of course, depending on the type of problem you have, some methods will be better than others. So how do you choose between them? And how do you implement them? I’ve summarised some of the discussion we had on these topics below. There’s also a great website with a detailed optimisation guide here.

N.B. I should note that most problems that astrophysicists are interested in involve the optimisation of real functions given data (so-called continuous optimisation). There are vast branches of computer science research concerned with combinatorial optimisation, which I’m not so interested in here.

Choosing which method to use

As I’ve already mentioned, optimisation is a very common problem, so there are a great many optimisation algorithms, most of them with a number of different implementations. Over the course of the meeting, we came up with a basic step-by-step guide to help you choose which method to use:

  1. Is the problem linear? Use a linear method, since these are guaranteed to converge to the right answer (within some tolerance). The Conjugate Gradient method is a good first port of call for general linear problems. If you have a particularly simple problem, you could even directly invert the matrix for your linear operator (see the first sketch after this list).
  2. Can the problem be rewritten to be linear? If you can change variables or otherwise re-cast the problem in such a way that it becomes linear, do that and use a linear solver!
  3. Is simple brute-force good enough? Some of the simpler (or at least better-known) optimisation methods are very robust, and some are guaranteed to converge, even for pretty general non-linear problems. If you have a small number of parameters, or not too many data points, it’s often quicker and easier to use one of these. Ryan Houghton recommended four, in order of increasing complexity: Amoeba; Levenberg-Marquardt; NNLS (non-negative least-squares); and BVLS (bounded-variable least-squares). The first two are discussed in Numerical Recipes (and appear in the second sketch after this list), and the latter two have pretty widely-used implementations.
  4. Can you analytically calculate gradients? If you can analytically/cheaply calculate the Jacobian and/or Hessian matrix of your problem (i.e. the derivatives of the function you’re minimising with respect to the model parameters), you can use a gradient-based method. Providing this extra information can substantially improve the speed and convergence of the optimiser, so it’s worth doing, especially for larger problems (the second sketch after this list shows how to supply an analytic Jacobian).
  5. Is your problem computationally intensive/complex? Some problems are just too large or too messy to use any of the methods advocated so far, so you’ll have to look at more sophisticated methods with more complex implementations. Fortunately, there are some nice, general libraries that implement lots of fancy algorithms (e.g. NLopt, or some of the codes here).
  6. Do you have access to a supercomputer? If more sophisticated algorithms are still unsuitable, you might have to bite the bullet and just use a brute-force method and a lot of computer time. This can be nasty, especially if you don’t know how long it will take for the optimiser to converge. In this case, you might want to spend some more time thinking about clever ways to simplify your problem.
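
As a concrete illustration of step 1, here’s a minimal sketch in Python (using NumPy and SciPy, with a made-up design matrix and data vector) that solves a linear least-squares problem both by directly solving the normal equations and with the Conjugate Gradient method:

```python
import numpy as np
from scipy.sparse.linalg import cg

# Made-up linear problem: data = A @ x_true + noise, with A the design matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3))    # 100 data points, 3 model parameters
x_true = np.array([1.0, -2.0, 0.5])
data = A @ x_true + 0.01 * rng.standard_normal(100)

# Direct solution of the normal equations (fine for small, well-conditioned problems).
x_direct = np.linalg.solve(A.T @ A, A.T @ data)

# Conjugate Gradient on the normal equations (A.T @ A is symmetric and positive-definite).
x_cg, info = cg(A.T @ A, A.T @ data)

print(x_direct, x_cg, info)          # info == 0 means CG converged
```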
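
For steps 3 and 4, SciPy’s optimize module wraps several of the methods mentioned above. The sketch below (with a toy exponential-decay model that I’ve made up) fits the same data three ways: with the amoeba (Nelder-Mead) simplex, with Levenberg-Marquardt, and with Levenberg-Marquardt supplied with an analytic Jacobian:

```python
import numpy as np
from scipy.optimize import minimize, least_squares

# Toy model: y = a * exp(-k * t); fit for (a, k) from noisy, made-up data.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 5.0, 50)
y = 2.0 * np.exp(-0.7 * t) + 0.02 * rng.standard_normal(t.size)

def residuals(p):
    a, k = p
    return a * np.exp(-k * t) - y

def chi2(p):
    return np.sum(residuals(p) ** 2)

def jac(p):
    # Analytic Jacobian of the residuals with respect to (a, k).
    a, k = p
    e = np.exp(-k * t)
    return np.column_stack([e, -a * t * e])

p0 = [1.0, 1.0]

# Step 3: the amoeba (Nelder-Mead) simplex: simple, robust, derivative-free.
res_amoeba = minimize(chi2, p0, method="Nelder-Mead")

# Step 3: Levenberg-Marquardt, with the Jacobian estimated by finite differences.
res_lm = least_squares(residuals, p0, method="lm")

# Step 4: the same fit, but supplying the analytic Jacobian.
res_lm_jac = least_squares(residuals, p0, jac=jac, method="lm")

print(res_amoeba.x, res_lm.x, res_lm_jac.x)
```

Supplying the analytic Jacobian usually cuts down the number of function evaluations, which matters most when the model is expensive to evaluate.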

Interesting implementations

Over the course of the discussion, quite a few nice-looking optimisation libraries were mentioned. I’ve listed a few of them below.

  • NLopt has interfaces for lots of programming languages, and implements a wide range of optimisation algorithms. You can switch between algorithms by simply changing a single line of code.
  • L-BFGS is a limited-memory code for solving constrained and unconstrained large-scale optimisation problems. Consider using it if your problem has lots of parameters, or you have quite a lot of data. There’s also KNITRO from the same author, which looks more advanced.
  • levmar is a nice code for doing Levenberg-Marquardt optimisation, with support for a number of programming languages.
  • NNLS is a non-negative least-squares implementation, and is available for a number of programming languages commonly used in astrophysics. It’s supposed to have pretty good convergence properties (see the sketch below for a minimal example).
  • More algorithms are listed here, and particular implementations of them are either listed here, or are only a Google search away.
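
To give a flavour of the sort of interface these implementations expose, here’s a minimal sketch using SciPy’s wrapper of the classic Lawson-Hanson NNLS routine (the templates and data are entirely made up):

```python
import numpy as np
from scipy.optimize import nnls

# Made-up problem: decompose a spectrum into a non-negative sum of templates.
rng = np.random.default_rng(2)
templates = np.abs(rng.standard_normal((200, 5)))    # (n_pixels x n_templates)
weights_true = np.array([0.5, 0.0, 1.2, 0.0, 0.3])   # some templates contribute nothing
spectrum = templates @ weights_true + 0.01 * rng.standard_normal(200)

# nnls solves min ||templates @ w - spectrum||^2 subject to w >= 0.
weights, residual_norm = nnls(templates, spectrum)
print(weights, residual_norm)
```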

Properties of optimisation algorithms

  • Convergence: Some algorithms are guaranteed to converge, but that doesn’t mean that they will converge to the correct answer!
  • Constrained optimisation: Sometimes it’s necessary to put constraints into the optimisation problem (e.g. you don’t want the mass of your black hole to go negative). You might be able to avoid doing this if you have a good guess for the starting point of the optimisation, so it’s less likely to wander off into a disallowed region of parameter space. If not, there are a number of methods for performing constrained optimisation, but they tend to be reliable only for linear constraints (the simple bound-constrained case is sketched below). Word on the street is that codes capable of handling more complicated non-linear constraints are jealously-guarded Wall Street secrets…
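
To illustrate the simple bound-constrained case, here’s a minimal sketch using SciPy’s L-BFGS-B method with a one-parameter objective that I’ve made up; the bound is what stops the mass from going negative:

```python
import numpy as np
from scipy.optimize import minimize

# Made-up objective: a chi-squared-like function of a black-hole mass (arbitrary units).
def chi2(params):
    mass = params[0]
    return (np.log10(mass) - 0.3) ** 2 + 0.1 * mass

# Keep the mass strictly positive with a simple bound constraint.
result = minimize(chi2, x0=[1.0], method="L-BFGS-B", bounds=[(1e-6, None)])
print(result.x, result.fun)
```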

Thanks go to Mark Walker, Joe Zuntz, Min-Su Shin, Sigurd Naess and Ryan Houghton for contributing greatly to this discussion.