# Properties of the adjoint equations

The adjoint equations have a reputation for being counterintuitive:
the claim that they run backwards in time can strike the novice as
bizarre. It is therefore worth taking some time to explore these
properties, until it is *obvious* that the adjoint system should run
backwards in time and, more generally, reverse the propagation of
information. In fact, these supposedly confusing properties are
induced by nothing more exotic than simple transposition.

## The adjoint reverses the propagation of information

### A simple advection example

Suppose we are solving a one-dimensional advection-type equation on a mesh with three nodes, at \(x_0=0\), \(x_1=0.5\), and \(x_2=1\). The velocity goes from left to right, and so we impose an inflow boundary condition at the left-most node \(x_0\). A simple sketch of the linear system that might describe this configuration could look as follows:

\[
\begin{pmatrix}
1 & 0 & 0 \\
a & b & 0 \\
c & d & e
\end{pmatrix}
\begin{pmatrix}
u_0 \\ u_1 \\ u_2
\end{pmatrix}
=
\begin{pmatrix}
g \\ 0 \\ 0
\end{pmatrix}
\]

where \(a, b, c, d\) and \(e\) are some coefficients of the
matrix arising from a discretisation of the equation, and \(g\) is
the prescribed inflow value. The equation for
\(u_1\) does not depend on \(u_2\), as information is flowing
from left to right. The structure of the matrix dictates the
propagation of information of the system: first \(u_0\) is set to
the boundary condition value, then \(u_1\) may be computed, and
then finally \(u_2\). The *lower-triangular nature of the matrix
reflects the rightward propagation of information.*

Notice that \(u_0\) is *prescribed*: that is, the value of
\(u_0\) does not depend on the values at any other nodes; all
off-diagonal entries on the row for \(u_0\) are zero. Notice
further that the value \(u_2\) is *diagnostic*: no other nodes
depend on its value; all off-diagonal entries on its column are zero.
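This propagation order can be sketched numerically. In the snippet below, the coefficients \(a,\dots,e\) and the inflow value are arbitrary illustrative choices, not values from the text:

```python
import numpy as np

# Lower-triangular forward system: the equation for each node depends
# only on nodes to its left. a..e are arbitrary illustrative coefficients.
a, b, c, d, e = -1.0, 2.0, 0.5, -1.5, 3.0
A = np.array([[1.0, 0.0, 0.0],
              [a,   b,   0.0],
              [c,   d,   e]])
g = 1.0  # prescribed inflow value at x_0

# Forward substitution: information propagates left to right.
u = np.zeros(3)
u[0] = g                                 # u_0 is prescribed
u[1] = (0.0 - a * u[0]) / b              # u_1 follows from u_0
u[2] = (0.0 - c * u[0] - d * u[1]) / e   # u_2 follows from u_0 and u_1
```

The three assignments are exactly a forward-substitution solve of the lower-triangular system, so `u` agrees with `np.linalg.solve(A, [g, 0.0, 0.0])`.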

Now suppose that we take the adjoint of this system with respect to some functional \(J(u)\). The operator is linear (no entry in the matrix depends on \(u\)), and so the adjoint of this system is just its transpose:

\[
\begin{pmatrix}
1 & a & c \\
0 & b & d \\
0 & 0 & e
\end{pmatrix}
\begin{pmatrix}
\lambda_0 \\ \lambda_1 \\ \lambda_2
\end{pmatrix}
=
\begin{pmatrix}
\partial J/\partial u_0 \\ \partial J/\partial u_1 \\ \partial J/\partial u_2
\end{pmatrix}
\]

where \(\lambda\) is the adjoint variable corresponding to
\(u\). Observe that transposing the forward system yields an
upper-triangular adjoint system: *the adjoint propagates information
from right to left, in the opposite sense to the propagation of the
forward system*. To solve this system, one would first solve for
\(\lambda_2\), then compute \(\lambda_1\), and finally
\(\lambda_0\).

Further notice that \(\lambda_2\) is now prescribed: it can be computed directly from the data, with no dependencies on the values of other adjoint variables; all of the off-diagonal entries in its row are zero. \(\lambda_0\) is now diagnostic: no other variables depend on its value; all off-diagonal entries in its column are zero.
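A small numerical sketch (again with arbitrary illustrative coefficients) makes the transposition concrete: the adjoint solve is an upper-triangular back substitution, and the adjoint solution satisfies the duality identity \(\lambda^T f = (\partial J/\partial u)^T u\):

```python
import numpy as np

# Forward system A u = f (lower-triangular) and adjoint A^T lam = dJ/du
# (upper-triangular). a..e are arbitrary illustrative coefficients.
a, b, c, d, e = -1.0, 2.0, 0.5, -1.5, 3.0
A = np.array([[1.0, 0.0, 0.0],
              [a,   b,   0.0],
              [c,   d,   e]])
f = np.array([1.0, 0.0, 0.0])      # the inflow value enters the RHS
dJdu = np.array([0.0, 0.0, 1.0])   # e.g. J(u) = u_2

u = np.linalg.solve(A, f)
lam = np.linalg.solve(A.T, dJdu)   # back substitution: lam_2, then lam_1, lam_0

# Duality: lam^T f equals (dJ/du)^T u, so lam measures the sensitivity
# of J to perturbations of the right-hand side (here, the boundary data).
assert np.isclose(lam @ f, dJdu @ u)
```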

### A time-dependent example

Now consider a time-dependent system. For convenience, we assume the
system is linear, but the result holds true in exactly the same way
for nonlinear systems. We start with an initial condition \(f_0\)
for \(u_0\) (where the subscript denotes the timestep, rather than
the node). We then use this information to compute the value at the
next timestep, \(u_1\). This information is then used to compute
\(u_2\), and so on. This temporal structure can be represented as
a *block-structured* matrix:

\[
\begin{pmatrix}
I & 0 & 0 \\
A & B & 0 \\
C & D & E
\end{pmatrix}
\begin{pmatrix}
u_0 \\ u_1 \\ u_2
\end{pmatrix}
=
\begin{pmatrix}
f_0 \\ f_1 \\ f_2
\end{pmatrix}
\]

where \(I\) is the identity operator, \(A, B, C, D\) and \(E\) are some operators arising from the discretisation of the time-dependent system, \(f_0\) is the initial condition for \(u_0\), and \(f_n\) is the source term for the equation for \(u_n\).

Again, the *temporal propagation of information forward in time is
reflected in the lower-triangular structure of the matrix*. This
reflects the fact that it is possible to timestep the system, and
solve for parts of the solution \(u\) at a time. If the discrete
operator were not lower-triangular, all timesteps of the solution
\(u\) would be coupled, and would have to be solved for together.

Notice again that the value at the initial time \(u_0\) is prescribed, and the value at the final time \(u_2\) is diagnostic.

Now let us take the adjoint of this system. Since the operator has been assumed to be linear, the adjoint of this system is given by the block-structured matrix

\[
\begin{pmatrix}
I & A^* & C^* \\
0 & B^* & D^* \\
0 & 0 & E^*
\end{pmatrix}
\begin{pmatrix}
\lambda_0 \\ \lambda_1 \\ \lambda_2
\end{pmatrix}
=
\begin{pmatrix}
\partial J/\partial u_0 \\ \partial J/\partial u_1 \\ \partial J/\partial u_2
\end{pmatrix}
\]

where \(A^*\) denotes the adjoint of the operator \(A\) (and
similarly for the other blocks), and \(\lambda\) is the adjoint
variable corresponding to
\(u\). Observe that the adjoint system is now upper-triangular:
*the adjoint propagates information from later times to earlier times,
in the opposite sense to the propagation of the forward system*. To
solve this system, one would first solve for \(\lambda_2\), then
compute \(\lambda_1\), and finally \(\lambda_0\).

Notice once more that the prescribed-diagnostic relationship
applies. In the forward model, the initial condition is prescribed,
and the solution at the final time is diagnostic. In the adjoint
model, the solution at the final time is prescribed (a so-called
*terminal condition*, rather than an initial condition), and the
solution at the beginning of time is diagnostic. This is why when the
adjoint of a continuous system is derived, the formulation always
includes the specification of a terminal condition on the adjoint
system.
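The forward and backward sweeps can be demonstrated on a toy linear timestepping scheme; the one-step operator \(M\), the functional, and all numbers below are invented for illustration. The adjoint starts from the terminal condition and applies the transposed operator backwards in time, producing the gradient of the functional with respect to the initial condition:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 0.5 * rng.standard_normal((3, 3))  # invented one-step operator
u0 = rng.standard_normal(3)            # initial condition
N = 4                                  # number of timesteps

# Forward sweep, earliest to latest: u_{n+1} = M u_n.
u = u0.copy()
for n in range(N):
    u = M @ u

# Functional J = g . u_N; its derivative with respect to u_N is g.
g = np.array([1.0, 0.0, 0.0])

# Adjoint sweep, latest to earliest: terminal condition lam_N = g,
# then lam_n = M^T lam_{n+1}.
lam = g.copy()
for n in range(N):
    lam = M.T @ lam

# lam is now dJ/du_0: check it against a directional derivative.
def J(v):
    w = v.copy()
    for n in range(N):
        w = M @ w
    return g @ w

delta = rng.standard_normal(3)
eps = 1e-6
fd = (J(u0 + eps * delta) - J(u0 - eps * delta)) / (2 * eps)
assert np.isclose(lam @ delta, fd)
```

Note that the adjoint loop touches the timesteps in the opposite order to the forward loop, exactly as the upper-triangular block structure dictates.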

## The adjoint equation is linear

As noted in the previous section, the operator of the tangent linear system is the linearisation of the forward operator about the solution \(u\), and the adjoint operator is its transpose; therefore, the adjoint system is always linear in \(\lambda\).

This has two major effects. The first is a beneficial effect on the
computation time of the adjoint run: while the forward model may be
nonlinear, *the adjoint is always linear, and so it can be much
cheaper to solve than the forward model*. For example, if the forward
model employs a Newton solver for the nonlinear problem that uses on
average \(5\) linear solves to converge to machine precision, then
a rough estimate for the adjoint computation is that it will take
\(1/5\) the runtime of the forward model.
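This cost asymmetry can be illustrated with a small invented nonlinear system: the forward solve needs one linear solve per Newton iteration, while the adjoint needs a single linear solve with the transposed Jacobian at the converged state. The problem, functional, and tolerance below are all invented for the example:

```python
import numpy as np

f = np.array([1.0, 2.0])

def F(u):
    """Residual of the invented nonlinear problem u + 0.1 u^3 = f."""
    return u + 0.1 * u**3 - f

def dF(u):
    """Jacobian of F at u (diagonal for this particular problem)."""
    return np.diag(1.0 + 0.3 * u**2)

# Forward solve by Newton's method: one linear solve per iteration.
u = np.zeros(2)
linear_solves = 0
while np.linalg.norm(F(u)) > 1e-12:
    u = u - np.linalg.solve(dF(u), F(u))
    linear_solves += 1

# Adjoint solve: a single linear solve with the transposed Jacobian,
# however many Newton iterations the forward solve required.
dJdu = np.array([1.0, 1.0])  # e.g. J(u) = u_0 + u_1
lam = np.linalg.solve(dF(u).T, dJdu)
```

For this toy problem Newton's method takes a handful of iterations (and hence linear solves), while the adjoint costs exactly one.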

The second major effect is on the storage requirements of the adjoint
run; unfortunately, this effect is not beneficial. The adjoint
operator is a linearisation of the nonlinear operator about the
solution \(u\); therefore, *if the forward model is nonlinear, the
forward solution must be available to assemble the adjoint system*. If
the forward model is steady, this is not a significant difficulty;
however, *if the forward model is time-dependent, the entire solution
trajectory through time must be available*.

The obvious approach to making the entire solution trajectory available is to store the value of every variable solved for. This is the simplest approach, and it is the most efficient option if the machine has enough storage to hold the entire solution at once. However, for long simulations with many degrees of freedom, storing the entire trajectory is usually impractical, and some alternative approach, such as a checkpointing scheme that stores selected states and recomputes the rest, must be implemented.
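A minimal sketch with an invented scalar model shows why the trajectory is needed: each step of the adjoint sweep linearises the forward operator about the stored forward state from the corresponding timestep:

```python
dt, N = 0.1, 20
u = 1.0
trajectory = [u]                 # store every forward state
for n in range(N):               # invented nonlinear model: u' = -u^2
    u = u - dt * u**2
    trajectory.append(u)

# Functional J = u_N, so the terminal condition is dJ/du_N = 1.
lam = 1.0

# Adjoint sweep: the local derivative d u_{n+1} / d u_n = 1 - 2 dt u_n
# must be evaluated at the STORED forward state u_n.
for n in reversed(range(N)):
    lam = (1.0 - 2.0 * dt * trajectory[n]) * lam

# lam is dJ/du_0; verify against a finite difference.
def run(u0):
    v = u0
    for n in range(N):
        v = v - dt * v**2
    return v

eps = 1e-6
fd = (run(1.0 + eps) - run(1.0 - eps)) / (2 * eps)
assert abs(lam - fd) < 1e-6
```

If the forward model were linear, the factor `1 - 2 * dt * trajectory[n]` would not depend on the state, and no storage would be required.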

### Summary

Now that the adjoint and tangent linear equations have been introduced, and some of their properties discussed, let us see in more detail the applications of these concepts. This is discussed in the next section.