# PDE-constrained optimisation

## Problem statement

Let \(m\) be a vector of some parameters. For example, \(m\) might be the values of an initial condition, or of a source term, or of a boundary condition.

Let \(F(u, m) \equiv 0\) be a (system of) partial differential
equations that describe the physics of the problem of
interest. \(F\) is a vector expression (one entry for each
equation), with all terms in the equation gathered onto the left-hand
side. The idea is that, for any (feasible) choice of \(m \in \mathbb{R}^M\), the
PDE \(F\) can be solved to yield the solution \(u \in \mathbb{R}^U\). In other
words, *the solution* \(u\) *can be thought of as an implicit
function* \(u(m)\) *of the parameters* \(m\), related through
the PDE \(F(u, m) \equiv 0\). We never have an explicit expression
for \(u\) in terms of \(m\), but as we shall see, we can still
discuss its derivative \({\mathrm{d}u}/{\mathrm{d}m}\).
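A toy example may make this implicit relationship concrete. Below, the "PDE" is a discrete 1D Poisson problem in which the parameter vector \(m\) is the source term; the specific equation and the name `solve_pde` are illustrative assumptions, not part of the abstract setting above:

```python
import numpy as np

def solve_pde(m):
    """Solve a toy discrete PDE F(u, m) = A u - m = 0 for u.

    A is the standard 1D finite-difference Laplacian (homogeneous
    Dirichlet boundary conditions); the parameter vector m plays
    the role of a source term.
    """
    M = len(m)
    h = 1.0 / (M + 1)
    # Tridiagonal matrix discretising -u'' on a uniform interior grid.
    A = (np.diag(2.0 * np.ones(M))
         - np.diag(np.ones(M - 1), 1)
         - np.diag(np.ones(M - 1), -1)) / h**2
    # The solver is what makes the implicit function u(m) computable.
    return np.linalg.solve(A, m)

m = np.ones(9)        # one (feasible) choice of parameters
u = solve_pde(m)      # the corresponding solution u(m)
```

There is no closed-form expression for `u` in terms of `m` in general; the solver simply produces the value of the implicit function \(u(m)\) at the requested point.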

If the problem \(F(u, m)\) is time-dependent, this abstraction still holds. In this case, think of \(u\) as a vector containing all time values of all prognostic variables. In the discrete case, \(u\) is a vector with the value of the solution at the first timestep, then the value at the second timestep, and so on, for however many timesteps are required.
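A minimal sketch of this stacking, assuming a toy heat equation stepped with explicit Euler (the equation, step sizes, and function names are all illustrative choices):

```python
import numpy as np

def solve_heat(m, steps=4, dt=0.001):
    """March a toy discrete heat equation forward from the initial
    condition m, returning ALL timesteps stacked into one vector u."""
    M = len(m)
    h = 1.0 / (M + 1)
    u_k = m.copy()                     # the solution at timestep 0
    history = []
    for _ in range(steps):
        lap = np.zeros(M)
        lap[1:-1] = (u_k[:-2] - 2 * u_k[1:-1] + u_k[2:]) / h**2
        lap[0] = (-2 * u_k[0] + u_k[1]) / h**2     # Dirichlet u = 0 outside
        lap[-1] = (u_k[-2] - 2 * u_k[-1]) / h**2
        u_k = u_k + dt * lap           # one explicit Euler step
        history.append(u_k.copy())
    # One long vector: the solution at timestep 1, then timestep 2, ...
    return np.concatenate(history)

m = np.sin(np.pi * np.linspace(0.1, 0.9, 9))   # initial condition as parameter
u = solve_heat(m)                               # len(u) == steps * len(m)
```

Here the parameter \(m\) is the initial condition, and the single vector `u` contains every prognostic value at every timestep, exactly as described above.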

Finally, let \(J(u, m)\) be a *functional* of interest. \(J\)
represents the quantity to be optimised: for example, the quality of a
design is to be maximised, or the misfit between observations and
computations is to be minimised.

A general statement of the PDE-constrained optimisation problem is then given as follows: find the \(m\) that minimises \(J(u, m)\), subject to the constraint that \(F(u, m) = 0\). For simplicity, we suppose that there are no further constraints on the choice of \(m\); there are well-known techniques for handling such situations [2M-Kar39, 2M-KT51]. If \(J\) is to be maximised instead of minimised, simply consider minimising the functional \(-J\).

Throughout this introduction, we shall implicitly consider the case
where *the dimension of the parameter space is very large*. This means
that we shall seek out algorithms that scale well with the dimension
of the parameter space, and discard those that do not. We shall also
generally assume that *solving the PDE is very expensive*: therefore,
we will seek out algorithms which attempt to minimise the number of
PDE solutions required. This combination – a large parameter space
and an expensive PDE – is the most interesting, most common, and most
difficult practical situation, and therefore it is the one we shall
tackle head-on.

## Solution approaches

There are many ways to approach solving this problem. The approach
that we shall take here is to apply a *gradient-based optimisation
algorithm*, as these techniques scale to large numbers of parameters
and to complex, nonlinear, time-dependent PDE constraints.

To apply an optimisation algorithm, we will convert the
PDE-constrained optimisation problem into an unconstrained
optimisation problem. Let \(\widehat{J}(m) \equiv J(u(m), m)\) be
the functional *considered as a pure function of the parameters*
\(m\): that is, to compute \(\widehat{J}(m)\), solve the PDE
\(F(u, m) = 0\) for \(u\), and then evaluate \(J(u,
m)\). The functional \(\widehat{J}\) has the PDE constraint “built
in”: by considering \(\widehat{J}\) instead of \(J\), we
convert the constrained optimisation problem to a simpler,
unconstrained one. The problem is now posed as: find the \(m\)
that minimises \(\widehat{J}(m)\).
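In code, the reduced functional is simply function composition. The toy Poisson solver and the misfit functional below are illustrative assumptions standing in for a real PDE solver and a real objective:

```python
import numpy as np

def solve_pde(m):
    """Toy PDE solve: F(u, m) = A u - m = 0 (an illustrative stand-in)."""
    M = len(m)
    h = 1.0 / (M + 1)
    A = (np.diag(2.0 * np.ones(M))
         - np.diag(np.ones(M - 1), 1)
         - np.diag(np.ones(M - 1), -1)) / h**2
    return np.linalg.solve(A, m)

def J(u, m, u_obs):
    """Misfit between the computed solution and observations."""
    return 0.5 * np.sum((u - u_obs) ** 2)

def Jhat(m, u_obs):
    """Reduced functional: solve the PDE, then evaluate J.

    The PDE constraint is 'built in', so Jhat is a pure function of m."""
    u = solve_pde(m)
    return J(u, m, u_obs)
```

If the observations were generated by some true parameter, evaluating `Jhat` at that parameter recovers a misfit of zero, and any other parameter gives a positive misfit.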

Given some software that solves the PDE \(F(u, m) = 0\), we have a black box for computing the value of the functional \(\widehat{J}\), given some argument \(m\). If we can only evaluate the functional, and have no information about its derivatives, then we are forced to use a gradient-free optimisation algorithm such as a genetic algorithm. The drawback of such methods is that they typically scale very poorly with the dimension of the parameter space: even for a moderately sized parameter space, a gradient-free algorithm will typically take hundreds or thousands of functional evaluations before terminating. Since each functional evaluation involves a costly PDE solve, such an approach quickly becomes impractical.

By contrast, optimisation algorithms that can exploit information about the derivatives of \(\widehat{J}\) can typically converge to a local minimum with one or two orders of magnitude fewer iterations, as the gradient provides information about where to step next in parameter space. Therefore, if evaluating the PDE solution is expensive (and it usually is), then computing derivative information of \(\widehat{J}\) becomes very important for the practical solution of such PDE-constrained optimisation problems.
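This difference in cost can be seen even on a cheap stand-in for \(\widehat{J}\): below, a convex quadratic replaces the PDE solve (an illustrative assumption), and we compare the evaluation counts of SciPy's gradient-free Nelder–Mead method against the gradient-based BFGS method:

```python
import numpy as np
from scipy.optimize import minimize

# A cheap stand-in for Jhat: a strictly convex quadratic in ten parameters.
# In practice each evaluation of Jhat would cost one full PDE solve.
target = np.linspace(0.0, 1.0, 10)

def Jhat(m):
    return 0.5 * np.sum((m - target) ** 2)

def dJhat(m):
    return m - target              # exact gradient of the quadratic

m0 = np.zeros(10)

# Gradient-free: Nelder-Mead only ever calls Jhat.
free = minimize(Jhat, m0, method="Nelder-Mead",
                options={"xatol": 1e-8, "fatol": 1e-8, "maxfev": 100000})

# Gradient-based: BFGS also uses the gradient dJhat.
grad = minimize(Jhat, m0, jac=dJhat, method="BFGS")

print("gradient-free evaluations: ", free.nfev)
print("gradient-based evaluations:", grad.nfev)
```

On this small problem the gradient-based method needs far fewer functional evaluations; with each evaluation priced at one PDE solve, that gap is what makes derivative information so valuable.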

So, how should the gradient \({\mathrm{d}\widehat{J}}/{\mathrm{d}m}\) be computed? There are three main approaches, each with its own advantages and disadvantages. Discussing these strategies is the topic of the next section.

## References

[2M-Kar39] W. Karush. *Minima of Functions of Several Variables with Inequalities as Side Constraints*. Master's thesis, University of Chicago, Chicago, IL, USA, 1939.

[2M-KT51] H. W. Kuhn and A. W. Tucker. Nonlinear programming. In *Proceedings of the 2nd Berkeley Symposium*, 481–492. University of California Press, 1951.