## A Simple Example¶

Suppose we have a module we want to use with our FEniCS model, in this example the module will be named normalise and consist of only one function: normalise(func). The module looks like this:

from fenics import *

def normalise(func):
vec = func.vector()
vec /= vec.norm('l2')
return Function(func.function_space(), vec)


The function normalise(func) normalises the vector form of a FEniCS function, then returns the FEniCS function form of that normalised vector. A simple fenics program that uses this function might look like this:

from fenics import *
from normalise import normalise

mesh = UnitSquareMesh(10, 10)
V = FunctionSpace(mesh, 'CG', 1)

f = project(Expression('x[0]*x[1]', degree=1), V)

g = normalise(f)

J = assemble(g*dx)


Here we define a function on a space, normalise it with our function and integrate it over the space. Now we want to know the gradient of $$J$$ with respect to the initial conditions, we could try simply adding

from fenics_adjoint import *


and

dJdf = compute_gradient(J, Control(f))


but that won’t work, because pyadjoint does not know that it should record the normalisation and it does not know what the derivative of the normalisation is. We should create a new module that overloads normalise(func), telling pyadjoint how to deal with it.

Let us now create a module overloading the normalise(func) function. We need to start by importing the FEniCS and dolfin-adjoint modules, along with some specific functions needed for overloading and of course the function we want to overload.

from fenics import *

from normalise import normalise


Since we are overloading normalise(func) we need to change it’s name to keep access to it:

backend_normalise = normalise


### The Block class¶

The pyadjoint tape consists of instances of Block subclasses. These subclasses implement methods that can compute partial derivatives of their respective function. Thus, to properly overload normalise we must implement a Block subclass, which we call NormaliseBlock. In addition to writing a constructor we have to override the methods evaluate_adj_component() and recompute_component(), we will also add a __str__() method. In our example the constructor may look like this

class NormaliseBlock(Block):
def __init__(self, func, **kwargs):
super(NormaliseBlock, self).__init__()
self.kwargs = kwargs


We call the superclass constructor and save the key word arguments. Then we tell pyadjoint that the operation this block represents depends on the function func. As func should be an overloaded object it has a block_variable() attribute.

Next we can define a __str__() method. This gives a name to the block, so the output of this is for example how the block is represented in graphs made with visualise as explained in the section on debugging.

def __str__(self):
return "NormaliseBlock"


We need a recompute method that can recompute the function with new inputs.

def recompute_component(self, inputs, block_variable, idx, prepared):
return backend_normalise(inputs[0])


We get a list of new inputs which is of length 1 because we only have one input variable. Or more precisely, we only added one dependency in the constructor.

The method evaluate_adj_component() should evaluate the one component of the vector-Jacobian product. In the mathematical background we discussed the tangent linear model and the adjoint on the level of the whole model. Here we consider more concretely how dolfin-adjoint treats each block. pyadjoint treats a forward model as a series of equation solves. Some of these equations are complicated PDEs that are solved by the FEniCS function solve, but others are of the straightforward form

$y = f(x_1,\ldots,x_n),$

where $$y$$ is the only unknown. Our normalise function may be represented by this kind of equation. When differentiating a functional pyadjoint works by considering each block as a link in chain formed by the chain rule. If a functional is the result of a series of straightforward transformations on an initial condition:

$J(u_n(u_{n-1}(\ldots(u_0)\ldots))),$

then by the chain rule

$\frac{\mathrm{d}J}{\mathrm{d}u_0} = \frac{\partial J}{\partial u_n}\frac{\partial u_n}{\partial u_{n-1}}\ldots\frac{\partial u_1}{\partial u_0}.$

If we consider instead the adjoint model we will find the transpose of $$\frac{\mathrm{d}J}{\mathrm{d}u_0}$$:

$\frac{\mathrm{d}J}{\mathrm{d}u_0}^* = \frac{\partial u_1}{\partial u_0}^*\frac{\partial u_2}{\partial u_{1}}^*\ldots\frac{\partial J}{\partial u_n}^*.$

Calculating from the right we find that for each link

$y_i = \frac{\partial u_i}{\partial u_{i-1}}^*y_{i+1},$

where

$y_{n+1} = \frac{\partial J}{\partial u_n}^*.$

and

$y_1 = \frac{\mathrm{d} J}{\mathrm{d} u_0}^*$

Each block only needs to find the transpose of its own gradient! This is implemented in evaluate_adj().

### Back to our example¶

Mathematically our normalisation block may be represented in index notation as

$f(x_i) = \frac{x_i}{||x||}.$

The Jacobian matrix consists of the entries

$\frac{\partial f(x_i)}{\partial x_j} = \frac{1}{||x||} \delta_{ij} - \frac{x_i x_j}{||x||^3}$
evaluate_adj() takes a vector as input and returns the transpose of the Jacobian matrix
multiplied with that vector:
$\nabla f^* \cdot y = \sum_j \frac{\partial f(x_j)}{\partial x_i} y_j = \sum_{j} \frac{1}{||x||} \delta_{ij} y_j - \frac{x_i x_j}{||x||^3} y_j = \frac{y_i}{||x||} - \frac{x_i}{||x||^3} \sum_j x_j y_j$

evaluate_adj_component() works as evaluate_adj(), but computes only the component that corresponds to a single dependency (input). In other words, given an index $$i$$ evaluate_adj_component() computes the component $$\left(\nabla f^* \cdot y\right)_i$$.

By default, evaluate_adj() calls evaluate_adj_component() for each of the relevant components.

Now let us look at the implementation:

def evaluate_adj_component(self, inputs, adj_inputs, block_variable, idx, prepared=None):
x = inputs[idx].vector()
inv_xnorm = 1.0 / x.norm('l2')
return inv_xnorm * adj_input - inv_xnorm ** 3 * x.inner(adj_input) * x


evaluate_adj_component() takes 5 arguments:

• inputs is a list of the inputs where we compute the derivative, i.e $$x$$ in the above derivations. This list has the same length as the list of dependencies.
• adj_inputs is a list of the adjoint inputs, i.e $$y_{i+1}$$ above with this method representing the computation of $$y_i$$. This list has the same length as the list of outputs.
• block_variable is the block variable of the dependency (input) that we differentiate with respect to.
• idx is the index of the dependency, that we differentiate with respect to, in the list of dependencies. Given a function output $$z = f(x, y)$$, where the dependency list is [x, y], then $$(\partial z/\partial x)^*$$ for idx == 0 and $$(\partial z/\partial y)^*$$ for idx == 1.
• prepared can be anything. It is the return value of prepare_evaluate_adj(), which is run before evaluate_adj_component() is called for each relevant dependency and the default return value is None. If your implementation would benefit from doing some computations independent of the relevant dependencies, you should consider implementing prepare_evaluate_adj(). For example, for solve() the adjoint equation is solved in prepare_evaluate_adj(), and the adjoint solution is provided in the prepared parameter.

For more in-depth documentation on Blocks in pyadjoint, see

Now we are ready to define our overloaded function. For simple functions, where the function return value is the output, pyadjoint offers a convenience function for overloading. For this example, we utilize this convenience function:

from pyadjoint.overloaded_function import overload_function


That’s it! Now we are ready to use our function normalise with dolfin-adjoint. Let us perform a taylor test to see if it works:

from fenics import *

from numpy.random import rand

mesh = UnitSquareMesh(10, 10)
V = FunctionSpace(mesh, 'CG', 1)

f = project(Expression('x[0]*x[1]', degree=1), V)

g = normalise(f)

J = assemble(g*dx)

h = Function(V)
h.vector()[:] = rand(h.vector().local_size())

taylor_test(ReducedFunctional(J, Control(f)), f, h)


This gives the output:

Running Taylor test
Computed residuals: [5.719808547999933e-06, 1.4356712128880207e-06, 3.596346874345e-07, 8.999840626988876e-08]
Computed convergence rates: [1.99424146695553, 1.9971213080328687, 1.9985608192605893]


It works.