derivative evaluation by automatic differentiation of programs · outline 1 introduction 2...

93
Derivative Evaluation by Automatic Differentiation of Programs Laurent Hasco¨ et [email protected] http://www-sop.inria.fr/tropics Ecole d’´ et´ e CEA-EDF-INRIA, Juillet 2005 Laurent Hasco¨ et () Automatic Differentiation CEA-EDF-INRIA 2005 1 / 88

Upload: others

Post on 23-Sep-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Derivative Evaluation by

Automatic Differentiation of Programs

Laurent [email protected]://www-sop.inria.fr/tropics

Ecole d’ete CEA-EDF-INRIA,Juillet 2005

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 1 / 88

Page 2: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Outline

1 Introduction

2 Formalization

3 Reverse AD

4 Alternative formalizations

5 Memory issues in Reverse AD: Checkpointing

6 Multi-directional

7 Reverse AD for Optimization

8 AD for Sensitivity to Uncertainties

9 Some AD Tools

10 Static Analyses in AD tools

11 The TAPENADE AD tool

12 Validation of AD results

13 Expert-level AD

14 ConclusionLaurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 2 / 88

Page 3: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

So you need derivatives ?...

Given a program P computing a function F

F : IRm → IRn

X 7→ Y

we want to build a program that computes the derivativesof F .

Specifically, we want the derivatives of the dependent,i.e. some variables in Y ,with respect to the independent,i.e. some variables in X .

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 3 / 88

Page 4: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Which derivatives do you want?

Derivatives come in various shapes and flavors:

Jacobian Matrices: J =(

∂yj

∂xi

)Directional or tangent derivatives, differentials:dY = Y = J × dX = J × XGradients:

When n = 1 output : gradient = J =(

∂y∂xi

)When n > 1 outputs: gradient = Y

t × J

Higher-order derivative tensors

Taylor coefficients

Intervals ?

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 4 / 88

Page 5: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Divided Differences

Given X , run P twice, and compute Y

Y =P(X + εX )− P(X )

ε

Pros: immediate; no thinking required !

Cons: approximation; what ε ?⇒ Not so cheap after all !

Most applications require inexpensive and accuratederivatives.

⇒ Let’s go for exact, analytic derivatives !

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 5 / 88

Page 6: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Automatic Differentiation

Augment program P to make it compute the analyticderivatives

P: a = b*T(10) + c

The differentiated program must somehow compute:P’: da = db*T(10) + b*dT(10) + dc

How can we achieve this?

AD by Overloading

AD by Program transformation

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 6 / 88

Page 7: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

AD by overloading

Tools: adol-c, adtageo,...Few manipulations required:

DOUBLE → ADOUBLE ;

link with provided overloaded +,-,*,. . .

Easy extension to higher-order, Taylor series, intervals,. . . but not so easy for gradients.

Anecdote?:

real → complex

x = a*b →(x , dx) = (a*b-da*db , a*db+da*b)

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 7 / 88

Page 8: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

AD by Program transformation

Tools: adifor, taf, tapenade,...

Complex transformation required:

Build a new program that computes the analyticderivatives explicitly.Requires a compiler-like, sophisticated tool

1 PARSING,2 ANALYSIS,3 DIFFERENTIATION,4 REGENERATION

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 8 / 88

Page 9: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Overloading vs Transformation

Overloading is versatile,

Transformed programs are efficient:

Global program analyses are possible and mostwelcome !

The compiler can optimize the generated program.

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 9 / 88

Page 10: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Example: Tangent differentiation

by Program transformation

SUBROUTINE FOO(v1, v2, v4, p1)

REAL v1,v2,v3,v4,p1

v3 = 2.0*v1 + 5.0

v4 = v3 + p1*v2/v3

END

v3d = 2.0*v1d

v4d = v3d + p1*(v2d*v3-v2*v3d)/(v3*v3)

REAL v1d,v2d,v3d,v4d

v1d, v2d, v4d,•

Just inserts “differentiated instructions” into FOO

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 10 / 88

Page 11: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Example: Tangent differentiation

by Program transformation

SUBROUTINE FOO(v1, v2, v4, p1)

REAL v1,v2,v3,v4,p1

v3 = 2.0*v1 + 5.0

v4 = v3 + p1*v2/v3

END

v3d = 2.0*v1d

v4d = v3d + p1*(v2d*v3-v2*v3d)/(v3*v3)

REAL v1d,v2d,v3d,v4d

v1d, v2d, v4d,•

Just inserts “differentiated instructions” into FOO

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 10 / 88

Page 12: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Example: Tangent differentiation

by Program transformation

SUBROUTINE FOO(v1, v2, v4, p1)

REAL v1,v2,v3,v4,p1

v3 = 2.0*v1 + 5.0

v4 = v3 + p1*v2/v3

END

v3d = 2.0*v1d

v4d = v3d + p1*(v2d*v3-v2*v3d)/(v3*v3)

REAL v1d,v2d,v3d,v4d

v1d, v2d, v4d,•

Just inserts “differentiated instructions” into FOOLaurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 10 / 88

Page 13: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Outline

1 Introduction

2 Formalization

3 Reverse AD

4 Alternative formalizations

5 Memory issues in Reverse AD: Checkpointing

6 Multi-directional

7 Reverse AD for Optimization

8 AD for Sensitivity to Uncertainties

9 Some AD Tools

10 Static Analyses in AD tools

11 The TAPENADE AD tool

12 Validation of AD results

13 Expert-level AD

14 ConclusionLaurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 11 / 88

Page 14: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Dealing with the Programs’ Control

Programs contain control:discrete ⇒ non-differentiable.

if (x <= 1.0) thenprintf("x too small");

else {y = 1.0;while (y <= 10.0) {

y = y*x;x = x+0.5;

}}

Not differentiable for x=1.0Not differentiable for x=2.9221444

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 12 / 88

Page 15: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Take control away!

We differentiate programs. But control ⇒ non-differentiability!

Freeze the current control:For one given control, the program becomes a simple list ofinstructions ⇒ differentiable:

printf("x too small");

y = 1.0; y = y*x; x = x+0.5;

AD differentiates these lists of instructions:

Program

CodeList 1

CodeList 2

CodeList N

Diff(CodeList 1)

Diff(CodeList 2)

Diff(CodeList N)

Diff(Program)

Control 1:

Control N:

Control 1

Control N

Caution: the program is only piecewise differentiable !Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 13 / 88

Page 16: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Computer Programs as Functions

Identify sequences of instructions

{I1; I2; . . . Ip−1; Ip; }

with composition of functions.

Each simple instruction

Ik : v4 = v3 + v2/v3

is a function fk : IRq → IRq whereThe output v4 is built from the input v2 and v3All other variable are passed unchanged

Thus we see P : {I1; I2; . . . Ip−1; Ip; } as

f = fp ◦ fp−1 ◦ · · · ◦ f1

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 14 / 88

Page 17: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Using the Chain Rule

We see program P as:

f = fp ◦ fp−1 ◦ · · · ◦ f1

We define for short:

W0 = X and Wk = fk(Wk−1)

The chain rule yields:

f ′(X ) = f ′p(Wp−1).f′p−1(Wp−2). . . . .f

′1(W0)

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 15 / 88

Page 18: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

The Jacobian Program

f ′(X ) = f ′p(Wp−1).f′p−1(Wp−2). . . . .f

′1(W0)

translates immediately into a program that computes theJacobian J:

I1 ; /* W = f1(W ) */

I2 ; /* W = f2(W ) */...

Ip ; /* W = fp(W ) */

W = X ;J = f ′1(W ) ;

J = f ′2(W ) ∗ J ;

J = f ′p(W ) ∗ J ;

Y = W ;

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 16 / 88

Page 19: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

The Jacobian Program

f ′(X ) = f ′p(Wp−1).f′p−1(Wp−2). . . . .f

′1(W0)

translates immediately into a program that computes theJacobian J:

I1 ; /* W = f1(W ) */

I2 ; /* W = f2(W ) */...

Ip ; /* W = fp(W ) */

W = X ;J = f ′1(W ) ;

J = f ′2(W ) ∗ J ;

J = f ′p(W ) ∗ J ;

Y = W ;

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 16 / 88

Page 20: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Tangent mode and Reverse mode

Full J is expensive and often useless.We’d better compute useful projections of J.

tangent AD :

Y = f ′(X ).X = f ′p(Wp−1).f′p−1(Wp−2) . . . f ′1(W0).X

reverse AD :

X = f ′t(X ).Y = f ′t1 (W0). . . . f′tp−1(Wp−2).f

′tp (Wp−1).Y

Evaluate both from right to left:⇒ always matrix × vector

Theoretical cost is about 4 times the cost of P

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 17 / 88

Page 21: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Costs of Tangent and Reverse AD

F : IRm → IRn

( )[]m inputs

n outputs

Gradient

Tangent

J costs m ∗ 4 ∗ P using the tangent modeGood if m <= nJ costs n ∗ 4 ∗ P using the reverse modeGood if m >> n (e.g n = 1 in optimization)Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 18 / 88

Page 22: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Back to the Tangent Mode example

v3 = 2.0*v1 + 5.0v4 = v3 + p1*v2/v3

Elementary Jacobian matrices:

f ′(X ) = ...

1

11

0 p1

v31− p1∗v2

v23

0

11

2 01

...

v3 = 2 ∗ v1

v4 = v3 ∗ (1− p1 ∗ v2/v23 ) + v2 ∗ p1/v3

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 19 / 88

Page 23: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Tangent Mode example continued

Tangent AD keeps the structure of P :...

v3d = 2.0*v1dv3 = 2.0*v1 + 5.0v4d = v3d*(1-p1*v2/(v3*v3)) + v2d*p1/v3v4 = v3 + p1*v2/v3

...Differentiated instructions insertedinto P’s original control flow.

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 20 / 88

Page 24: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Outline

1 Introduction

2 Formalization

3 Reverse AD

4 Alternative formalizations

5 Memory issues in Reverse AD: Checkpointing

6 Multi-directional

7 Reverse AD for Optimization

8 AD for Sensitivity to Uncertainties

9 Some AD Tools

10 Static Analyses in AD tools

11 The TAPENADE AD tool

12 Validation of AD results

13 Expert-level AD

14 ConclusionLaurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 21 / 88

Page 25: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Focus on the Reverse mode

X = f ′t(X ).Y = f ′t1 (W0) . . . f ′tp (Wp−1).Y

Ip−1 ;W = Y ;W = f ′tp (Wp−1) * W ;

Ip−2 ;

Restore Wp−2 before Ip−2 ;W = f ′tp−1(Wp−2) * W ;

I1 ;...

...Restore W0 before I1 ;W = f ′t1 (W0) * W ;X = W ;

Instructions differentiated in the reverse order !

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 22 / 88

Page 26: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Focus on the Reverse mode

X = f ′t(X ).Y = f ′t1 (W0) . . . f ′tp (Wp−1).Y

Ip−1 ;W = Y ;W = f ′tp (Wp−1) * W ;

Ip−2 ;

Restore Wp−2 before Ip−2 ;W = f ′tp−1(Wp−2) * W ;

I1 ;...

...Restore W0 before I1 ;W = f ′t1 (W0) * W ;X = W ;

Instructions differentiated in the reverse order !

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 22 / 88

Page 27: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Focus on the Reverse mode

X = f ′t(X ).Y = f ′t1 (W0) . . . f ′tp (Wp−1).Y

Ip−1 ;W = Y ;W = f ′tp (Wp−1) * W ;

Ip−2 ;

Restore Wp−2 before Ip−2 ;W = f ′tp−1(Wp−2) * W ;

I1 ;...

...Restore W0 before I1 ;W = f ′t1 (W0) * W ;X = W ;

Instructions differentiated in the reverse order !Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 22 / 88

Page 28: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Reverse mode: graphical interpretation

time

I I I I I

II

II

1 2 3 p-2 p-1

pp-1

21

Bottleneck: memory usage (“Tape”).

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 23 / 88

Page 29: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Back to the example

v3 = 2.0*v1 + 5.0v4 = v3 + p1*v2/v3

Transposed Jacobian matrices:

f ′t(X ) = ...

1 2

10

1

1 01 p1

v3

1 1− p1∗v2

v23

0

...

v 2 = v 2 + v 4 ∗ p1/v3

...v 1 = v 1 + 2 ∗ v 3

v 3 = 0Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 24 / 88

Page 30: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Reverse Mode example continued

Reverse AD inverses the structure of P :

...v3 = 2.0*v1 + 5.0v4 = v3 + p1*v2/v3

...

.........................../*restore previous state*/v2b = v2b + p1*v4b/v3v3b = v3b + (1-p1*v2/(v3*v3))*v4bv4b = 0.0......................../*restore previous state*/v1b = v1b + 2.0*v3bv3b = 0.0......................../*restore previous state*/

...

Differentiated instructions insertedinto the inverse of P’s original control flow.

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 25 / 88

Page 31: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Control Flow Inversion : conditionals

The control flow of the forward sweepis mirrored in the backward sweep.

...

if (T(i).lt.0.0) then

T(i) = S(i)*T(i)

endif

...

if (...) then

Sb(i) = Sb(i) + T(i)*Tb(i)

Tb(i) = S(i)*Tb(i)

endif

...

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 26 / 88

Page 32: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Control Flow Inversion : loops

Reversed loops run in the inverse order

...

Do i = 1,N

T(i) = 2.5*T(i-1) + 3.5

Enddo

...

Do i = N,1,-1

Tb(i-1) = Tb(i-1) + 2.5*Tb(i)

Tb(i) = 0.0

EnddoLaurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 27 / 88

Page 33: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Control Flow Inversion : spaghetti

Remember original Control Flow when it mergesB1t1

B2 B3

B4

B5

PUSH(0) PUSH(1)

PUSH(0) PUSH(1)

B5

B4

B2 B3

B1

POP(test)

POP(test)

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 28 / 88

Page 34: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Outline

1 Introduction

2 Formalization

3 Reverse AD

4 Alternative formalizations

5 Memory issues in Reverse AD: Checkpointing

6 Multi-directional

7 Reverse AD for Optimization

8 AD for Sensitivity to Uncertainties

9 Some AD Tools

10 Static Analyses in AD tools

11 The TAPENADE AD tool

12 Validation of AD results

13 Expert-level AD

14 ConclusionLaurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 29 / 88

Page 35: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Yet another formalization using computation

graphs

A sequence of instructions corresponds to a computationgraph

DO i=1,n

IF (B(i).gt.0.0) THEN

r = A(i)*B(i) + y

X(i) = 3*r - B(i)*X(i-3)

y = SIN(X(i)*r)

ENDIF

ENDDO

y A(i) B(i) X(i-3)

*

+ *3

*

-

*

SIN

r y X(i)

Source program Computation Graph

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 30 / 88

Page 36: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Jacobians by Vertex Elimination

1B(i)

A(i) X(i-3)B(i)

1

3

1-1

X(i)r

1

1COS(X(i)*r)

1

y A(i) B(i) X(i-3)

r y X(i)

COS(X(i)*r) *(X(i)*A(i) + r*(3*A(i) - X(i-3)))

y A(i) B(i) X(i-3)

r y X(i)

Jacobian Computation Graph Bipartite Jacobian Graph

Forward vertex elimination ⇒ tangent AD.

Reverse vertex elimination ⇒ reverse AD.

Other orders (“cross-country”) may be optimal.

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 31 / 88

Page 37: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Yet another formalization: Lagrange multipliers

v3 = 2.0*v1 + 5.0v4 = v3 + p1*v2/v3

Can be viewed as constrains. We know that theLagrangian L(v1, v2, v3, v4, v3, v4) =v4 + v3.(−v3 + 2.v1 + 5) + v4.(−v4 + v3 + p1 ∗ v2/v3)is such that:

v1 =∂v4

∂v1=

∂L∂v1

and v2 =∂v4

∂v2=

∂L∂v2

provided∂L∂v3

=∂L∂v4

=∂L∂v3

=∂L∂v4

= 0

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 32 / 88

Page 38: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

The vi are the Lagrange multipliers associated to theinstruction that sets vi .

For instance, equation ∂L∂v3

= 0 gives us:

v4.(1− p1.v2/(v3.v3))− v3 = 0

To be compared with instructionv3b = v3b + (1-p1*v2/(v3*v3))*v4b(initial v3b is set to 0.0)

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 33 / 88

Page 39: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Outline

1 Introduction

2 Formalization

3 Reverse AD

4 Alternative formalizations

5 Memory issues in Reverse AD: Checkpointing

6 Multi-directional

7 Reverse AD for Optimization

8 AD for Sensitivity to Uncertainties

9 Some AD Tools

10 Static Analyses in AD tools

11 The TAPENADE AD tool

12 Validation of AD results

13 Expert-level AD

14 ConclusionLaurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 34 / 88

Page 40: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Time/Memory tradeoffs for reverse AD

From the definition of the gradient X

X = f ′t(X ).Y = f ′t1 (W0) . . . f ′tp (Wp−1).Y

we get the general shape of reverse AD program:

time

I I I I I

II

II

1 2 3 p-2 p-1

pp-1

21

⇒ How can we restore previous values?

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 35 / 88

Page 41: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Restoration by recomputation

(RA: Recompute-All)

Restart execution from a stored initial state:

time

I I I I I

I

I

I

I

I

1 2 3 p-2 p-1

p

p-1

2

1

1

Memory use low, CPU use high ⇒ trade-off needed !

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 36 / 88

Page 42: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Checkpointing (RA strategy)

On selected pieces of the program, possibly nested,remember the output state to avoid recomputation.

time

p{time

Memory and CPU grow like log(size(P))Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 37 / 88

Page 43: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Restoration by storage

(SA: Store-All)

Progressively undo the assignments made by the forwardsweep

time

I I I I I

IIIIII

1 2 3 p-2 p-1

pp-1p-2321

Memory use high, CPU use low ⇒ trade-off needed !

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 38 / 88

Page 44: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Checkpointing (SA strategy)

On selected pieces of the program, possibly nested, don’tstore intermediate values and re-execute the piece whenvalues are required.

time

C{time

Memory and CPU grow like log(size(P))

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 39 / 88

Page 45: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Checkpointing on calls (SA)

A classical choice: checkpoint procedure calls !

A

B

C

D

A A

B

C

D D D B B

C C C

x : original form of x

x : forward sweep for x

x : backward sweep for x

: take snapshot

: use snapshot

Memory and CPU grow like log(size(P)) when call tree iswell balanced.

Ill-balanced call trees require not checkpointing some calls

Careful analysis keeps the snapshots small.

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 40 / 88

Page 46: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Outline

1 Introduction

2 Formalization

3 Reverse AD

4 Alternative formalizations

5 Memory issues in Reverse AD: Checkpointing

6 Multi-directional

7 Reverse AD for Optimization

8 AD for Sensitivity to Uncertainties

9 Some AD Tools

10 Static Analyses in AD tools

11 The TAPENADE AD tool

12 Validation of AD results

13 Expert-level AD

14 ConclusionLaurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 41 / 88

Page 47: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Multi-directional mode and Jacobians

If you want Y = f ′(X ).X for the same X and several X

either run the tangent differentiated program severaltimes, evaluating f several times.

or run a “Multi-directional” tangent once, evaluatingf once.

Same for X = f ′t(X ).Y for several Y .

In particular, multi-directional tangent or reverse is goodto get the full Jacobian.

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 42 / 88

Page 48: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Sparse Jacobians with seed matrices

When Jacobian is sparse,use “seed matrices” to propagate fewer X or Y

Multi-directional tangent mode:a b

cd

e f g

×

111

1

=

a b

cd

e f g

Multi-directional reverse mode:

(1 1

1 1

a b

cd

e f g

=

(a c be f d g

)

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 43 / 88

Page 49: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Outline

1 Introduction

2 Formalization

3 Reverse AD

4 Alternative formalizations

5 Memory issues in Reverse AD: Checkpointing

6 Multi-directional

7 Reverse AD for Optimization

8 AD for Sensitivity to Uncertainties

9 Some AD Tools

10 Static Analyses in AD tools

11 The TAPENADE AD tool

12 Validation of AD results

13 Expert-level AD

14 ConclusionLaurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 44 / 88

Page 50: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Applications to Optimization

From a simulation program P :

P :(design parameters)γ 7→ (cost function)J(γ)

it takes a gradient J ′(γ) to obtain an optimizationprogram.

Reverse mode AD builds program P that computes J ′(γ)

Optimization algorithms (Gradient descent, SQP, . . . )may also use 2nd derivatives. AD can provide them too.

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 45 / 88

Page 51: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Special case: steady-state

If J is defined on a state W , and W results from animplicit steady state equation

Ψ(W , γ) = 0

which is solved iteratively: W0, W1, W2, ..., W∞

then pure reverse AD of P may prove too expensive(memory...)

Solutions exist:

reverse AD on the final steady state only.Andreas Griewank’s“Piggy-backing”reverse AD on Ψ alone + hand-codingLaurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 46 / 88

Page 52: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

A color picture (at last !...)

AD-computed gradient of a scalar cost (sonic boom)with respect to skin geometry:

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 47 / 88

Page 53: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

... and after a few optimization steps

Improvement of the sonic boom under the plane after 8optimization cycles:

(Plane geometry provided by Dassault Aviation)

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 48 / 88

Page 54: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Outline

1 Introduction

2 Formalization

3 Reverse AD

4 Alternative formalizations

5 Memory issues in Reverse AD: Checkpointing

6 Multi-directional

7 Reverse AD for Optimization

8 AD for Sensitivity to Uncertainties

9 Some AD Tools

10 Static Analyses in AD tools

11 The TAPENADE AD tool

12 Validation of AD results

13 Expert-level AD

14 ConclusionLaurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 49 / 88

Page 55: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Studying Uncertainties

Assume a state W is defined as a function W (c)of uncertain parameters c .Assume a scalar cost function J(W ) is defined on W .

To model the influence of c on J(W (c)), numericianswant

dJ

dcand also

d2J

dc2

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 50 / 88

Page 56: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Repeated application of AD, Tangent-on-Reverse

Given the program W that computes (solves?) W (c)and the program J that computes the cost j = J(W )we may very well apply AD to Q(c) = J(W(c)) = j !

Q : c 7→ j time :t

Q : c , ( j + 1) 7→ c +(

∂j∂ci

)∀i

time :4t

Q : c , c + ek 7→ ck +(

∂2j∂ci∂ck

)∀i

time :16t

Q∗

: c , (c) + (ek)∀k 7→ (ck)∀k +(

∂2j∂ci∂ck

)∀i ,k

time :16mt

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 51 / 88

Page 57: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

The problem of Implicit Formulations

The cost function J(W ) is explicit and relatively simplebut the state W is often defined implicitely by

Ψ(W , c) = 0

Program W includes an iterative solver !

⇒ Do we really want to differentiate this? (No!. . . )

⇒ Let’s go back up to the math level !

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 52 / 88

Page 58: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

First derivative

Differentiating the implicit state equation wrt c , we get:

∂Ψ

∂W· ∂W

∂c+

∂Ψ

∂c= 0 ⇒ ∂W

∂c= −

[∂Ψ

∂W

]−1

· ∂Ψ

∂c

So we can write the gradient:

dJ

dc=

∂J

∂W· ∂W

∂c= − ∂J

∂W·[

∂Ψ

∂W

]−1

· ∂Ψ

∂c

For efficiency reasons, it’s best to solve for Π first:

∂Ψ

∂W

∗· Π =

∂J

∂W

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 53 / 88

Page 59: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

How to solve an adjoint equation

Π is often called an adjoint state. Its adjoint equation isof the general shape:

∂Ψ

∂W

∗· Π = Y

We can solve it iteratively (“matrix-free resolution”),provided repeated computations, for various X ’s, of

∂Ψ

∂W

∗· X

Calling Psi the procedure that computes Ψ(W , c),PsiW , reverse AD of Psi wrt W , computes just that !

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 54 / 88

Page 60: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Second derivatives

Differentiating dJdc again, we get

d2J

dc2= −dΠ

dc· ∂Ψ

∂c− Π · d

dc

(∂Ψ

∂W

)AD can help computing every term of this formula.Let’s focus for example on dΠ

dc :⇒ we can play the adjoint trick again!

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 55 / 88

Page 61: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Solving for dΠdc

Again we go back to an implicit equation, now for Π:

∂Ψ

∂W

∗· Π =

∂J

∂W

Differentiating it wrt c , we get:[d

dc(∂Ψ

∂W

∗)

]· Π +

∂Ψ

∂W

∗· dΠ

dc=

d

dc(

∂J

∂W

∗)

which rewrites as

∂Ψ

∂W

∗· dΠ

dc=

d

dc

(∂J

∂W

)− d

dc

(∂Ψ

∂W

∗· Πc0

)Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 56 / 88

Page 62: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Solving for ∂Π∂c using AD

∂Ψ∂W

∗ · dΠdc = d

dc

(∂J∂W

)− d

dc

(∂Ψ∂W

∗ · Πc0

)PsiW

on Ψ + Πc0

˙PsiW

matrix-freeresolution,

using PsiW

JW

JW

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 57 / 88

Page 63: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Outline

1 Introduction

2 Formalization

3 Reverse AD

4 Alternative formalizations

5 Memory issues in Reverse AD: Checkpointing

6 Multi-directional

7 Reverse AD for Optimization

8 AD for Sensitivity to Uncertainties

9 Some AD Tools

10 Static Analyses in AD tools

11 The TAPENADE AD tool

12 Validation of AD results

13 Expert-level AD

14 ConclusionLaurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 58 / 88

Page 64: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Tools for overloading-based AD

If language supports overloading (f95, c++)Tool provides:

help for “re-typing” diff variables

a library of overloaded operations

The reverse mode, or cross-country elimination, cannotbe done on the fly. Tools use

a tape recording of partial derivatives and executiontrace

a special program to compute the derivatives fromthe tape.

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 59 / 88

Page 65: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Tools for source-transformation AD

Source transformation requires complex tools, but offersmore room for optimization.

parsing →analysis →differentiationf77 type-checking tangentf9x use/kill reversec dependencies multi-directionalmatlab activity . . .. . . . . .

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 60 / 88

Page 66: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Some AD tools

nagware f95 Compiler: Overloading, tangent,reverse

adol-c : Overloading+Tape; tangent, reverse,higher-order

adifor : Regeneration ; tangent, reverse?, Store-All+ Checkpointing

tapenade : Regeneration ; tangent, reverse,Store-All + Checkpointing

taf : Regeneration ; tangent, reverse, Recompute-All+ Checkpointing

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 61 / 88

Page 67: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Some Limitations of AD tools

Fundamental problems:

Piecewise differentiability

Convergence of derivatives

Reverse AD of very large codes

Technical Difficulties:

Pointers and memory allocation

Objects

Inversion or Duplication of random control(communications, random,...)

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 62 / 88

Page 68: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Outline

1 Introduction

2 Formalization

3 Reverse AD

4 Alternative formalizations

5 Memory issues in Reverse AD: Checkpointing

6 Multi-directional

7 Reverse AD for Optimization

8 AD for Sensitivity to Uncertainties

9 Some AD Tools

10 Static Analyses in AD tools

11 The TAPENADE AD tool

12 Validation of AD results

13 Expert-level AD

14 ConclusionLaurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 63 / 88

Page 69: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Activity analysis

Finds out the variables that, at some locationdo not depend on any independent,or have no dependent depending on them.

Derivative either null or useless ⇒ simplifications

orig. prog tangent mode w/activity analysis

c = a*b

a = 5.0

d = a*c

e = a/c

e=floor(e)

cd = a*bd + ad*bc = a*bad = 0.0a = 5.0dd = a*cd + ad*cd = a*ced=ad/c-a*cd/c**2e = a/ced = 0.0e = floor(e)

cd = a*bd + ad*bc = a*b

a = 5.0dd = a*cdd = a*c

e = a/ced = 0.0e = floor(e)

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 64 / 88

Page 70: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Adjoint Liveness

The important result of the reverse mode is in X . The original resultY is of no interest.

The last instruction of the program P can be removed from P.

Recursively, other instructions of P can be removed too.

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 65 / 88

Page 71: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

orig. program reverse mode Adjoint Live code

IF(a.GT.0.)THEN

a = LOG(a)

ELSEa = LOG(c)CALL SUB(a)

ENDIFEND

IF(a.GT.0.)THENCALL PUSH(a)a = LOG(a)CALL POP(a)ab = ab/aELSEa = LOG(c)CALL PUSH(a)CALL SUB(a)CALL POP(a)CALL SUB_B(a,ab)cb = cb + ab/cab = 0.0END IF

IF (a.GT.0.) THEN

ab = ab/aELSEa = LOG(c)

CALL SUB_B(a,ab)cb = cb + ab/cab = 0.0END IF

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 66 / 88

Page 72: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

“To Be Restored” analysis

In reverse AD, not all values must be restored during the backwardsweep.

Variables occurring only in linear expressions do not appear in thedifferentiated instructions.⇒ not To Be Restored.

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 67 / 88

Page 73: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

x = x + EXP(a)y = x + a**2a = 3*z

reverse mode: reverse mode:naive backward sweep backward sweep with TBR

CALL POP(a)zb = zb + 3*abab = 0.0CALL POP(y)ab = ab + 2*a*ybxb = xb + ybyb = 0.0CALL POP(x)ab = ab + EXP(a)*xb

CALL POP(a)zb = zb + 3*abab = 0.0ab = ab + 2*a*ybxb = xb + ybyb = 0.0ab = ab + EXP(a)*xb

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 68 / 88

Page 74: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Aliasing

In reverse AD, it is important to know whether two variables in aninstruction are the same.

a[i] = 3*a[i+1] a[i] = 3*a[i] a[i] = 3*a[j]

variablescertainlydifferent

variablescertainly equal

? ⇒tmp = 3*a[j]

a[i] = tmp

ab[i+1]= ab[i+1]+ 3*ab[i]

ab[i] = 0.0

ab[i] = 3* ab[i] tmpb = ab[i]ab[i] = 0.0ab[j] = ab[j]

+ 3*tmpb

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 69 / 88

Page 75: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Snapshots

Taking small snapshots saves a lot of memory:

time

C{ D{Snapshot(C) = Use(C) ∩ (Write(C) ∪Write(D))

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 70 / 88

Page 76: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Undecidability

Analyses are static: operate on source, don’t knowrun-time data.

Undecidability: no static analysis can answer yes orno for every possible program : there will always beprograms on which the analysis will answer “I can’ttell”

⇒ all tools must be ready to take conservativedecisions when the analysis is in doubt.

In practice, tool “laziness” is a far more commoncause for undecided analyses and conservativetransformations.

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 71 / 88

Page 77: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Outline

1 Introduction

2 Formalization

3 Reverse AD

4 Alternative formalizations

5 Memory issues in Reverse AD: Checkpointing

6 Multi-directional

7 Reverse AD for Optimization

8 AD for Sensitivity to Uncertainties

9 Some AD Tools

10 Static Analyses in AD tools

11 The TAPENADE AD tool

12 Validation of AD results

13 Expert-level AD

14 ConclusionLaurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 72 / 88

Page 78: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

A word on TAPENADE

Automatic Differentiation Tool

Name: tapenade version 2.1Date of birth: January 2002Ancestors: Odyssee 1.7Address: www.inria.fr/tropics/

tapenade.html

Specialties: AD Reverse, Tangent, Vector Tangent, RestructurationReverse mode Strategy: Store-All, Checkpointing on callsApplicable on: fortran95, fortran77, and olderImplementation Languages: 90% java, 10% c

Availability: Java classes for Linux and Windows, or Web server

Internal features: Type-Checking, Read-Written Analysis,Fwd and Bwd Activity, Adjoint Liveness analysis, TBR, . . .

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 73 / 88

Page 79: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

TAPENADE on the web

http://www-sop.inria.fr/tropics

applied to industrial and academic codes:Aeronautics, Hydrology, Chemistry, Biology, Agronomy...

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 74 / 88

Page 80: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

TAPENADE Architecture

Use a general abstract Imperative Language (IL)

Represent programs as Call Graphs of Flow Graphs

trees (IL) trees (IL)

XXX parser

C parser (C)

Fortran95 parser (C)

Fortran77 parser (C)Black-box signatures

XXX printer

C printer

Fortran95 printer (Java)

Fortran77 printer (Java)

other tool

Imperative Language Analyzer (Java)

Differentiation Engine (Java)

User Interface (Java / XHTML)

API

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 75 / 88

Page 81: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

TAPENADE Program Internal Representation

using Calls-Graphs and Flow-Graphs:

trees (IL) trees (IL)

XXX parser

C parser (C)

Fortran95 parser (C)

Fortran77 parser (C)

Signaturesof externals

XXX printer

C printer

Fortran95 printer (Java)

Fortran77 printer (Java)

other tool

Imperative Language Analyzer (Java)

Differentiation Engine (Java)

API

Top

A B

C

push

pop

pop

cycle

do exit

do loop

if true if false

Entry

small(A,B)

n = 01,100

Header

DO 100 i=1,100

IF (A(i).ge.0)

A(i) = nn = n + 1B(i) = 0

A(i) = B(i)

print *,n

Exit

end

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 76 / 88

Page 82: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Outline

1 Introduction

2 Formalization

3 Reverse AD

4 Alternative formalizations

5 Memory issues in Reverse AD: Checkpointing

6 Multi-directional

7 Reverse AD for Optimization

8 AD for Sensitivity to Uncertainties

9 Some AD Tools

10 Static Analyses in AD tools

11 The TAPENADE AD tool

12 Validation of AD results

13 Expert-level AD

14 ConclusionLaurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 77 / 88

Page 83: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Validation methods

From a program P that evaluates

F : IRm → IRn

X 7→ Y

tangent AD creates

P : X , X 7→ Y , Y

and reverse AD creates

P : X , Y 7→ X

Wow can we validate these programs ?

Tangent wrt Divided Differences

Reverse wrt TangentLaurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 78 / 88

Page 84: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Validation of Tangent wrt Divided Differences

For a given X , set g(h ∈ IR) = F (X + h.Xd):

g ′(0) = limε→0

F (X + ε×X )− F (X )

ε

Also, from the chain rule:

g ′(0) = F ′(X )× X = Y

So we can approximate Y by running P twice, at points Xand X + ε× X

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 79 / 88

Page 85: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Validation of Reverse wrt Tangent

For a given X , tangent code returned Y

Initialize Y = Y and run the reverse code, yielding X .We have :

(X · X ) = (F ′t(X )× Y · X )

= Y t × F ′(X )× X

= Y t × Y

= (Y · Y )

Often called the “dot-product test”

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 80 / 88

Page 86: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Outline

1 Introduction

2 Formalization

3 Reverse AD

4 Alternative formalizations

5 Memory issues in Reverse AD: Checkpointing

6 Multi-directional

7 Reverse AD for Optimization

8 AD for Sensitivity to Uncertainties

9 Some AD Tools

10 Static Analyses in AD tools

11 The TAPENADE AD tool

12 Validation of AD results

13 Expert-level AD

14 ConclusionLaurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 81 / 88

Page 87: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Black-box routines

If the tool permits, give dependency signature (sparsitypattern) of all external procedures ⇒ better activityanalysis ⇒ better diff program.

FOO: ( )Inputs

Outputs

Id

Id

After AD, provide required hand-coded derivative (FOO Dor FOO B)

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 82 / 88

Page 88: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Linear or auto-adjoint procedures

Make linear or auto-adjoint procedures “black-box”.

Since they are their own tangent or reverse derivatives,provide their original form as hand-coded derivative.

In many cases, this is more efficient than pure AD ofthese procedures

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 83 / 88

Page 89: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Independent loops

If a loop has independent iterations, possibly terminatedby a sum-reduction, then

Standard: Improved:

doi = 1,N

body(i)end

doi = N,1←−−−−body(i)

end

⇐⇒doi = 1,N

body(i)←−−−−body(i)

end

In the Recompute-All context, this reduces the memoryconsumption by a factor N

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 84 / 88

Page 90: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

Outline

1 Introduction

2 Formalization

3 Reverse AD

4 Alternative formalizations

5 Memory issues in Reverse AD: Checkpointing

6 Multi-directional

7 Reverse AD for Optimization

8 AD for Sensitivity to Uncertainties

9 Some AD Tools

10 Static Analyses in AD tools

11 The TAPENADE AD tool

12 Validation of AD results

13 Expert-level AD

14 ConclusionLaurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 85 / 88

Page 91: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

AD: Context

DERIVATIVES

Div. Diff Analytic Diff

Maths AD

Overloading Source Transfo

Multi-dir Tangent Reverse

inaccuracy

programming

control

flexibility

efficiency

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 86 / 88

Page 92: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

AD: To Bring Home

If you want the derivatives of an implemented mathfunction, you should seriously consider AD.

Divided Differences aren’t good for you (nor forothers...)

Especially think of AD when you need higher order(taylor coefficients) for simulation or gradients(reverse mode) for sensitivity analysis or optimization.

Reverse AD is a discrete equivalent of the adjointmethods from control theory: gives a gradient atremarkably low cost.

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 87 / 88

Page 93: Derivative Evaluation by Automatic Differentiation of Programs · Outline 1 Introduction 2 Formalization 3 Reverse AD 4 Alternative formalizations 5 Memory issues in Reverse AD: Checkpointing

AD tools: To Bring Home

AD tools provide you with highly optimized derivativeprograms in a matter of minutes.

AD tools are making progress steadily, but the bestAD will always require end-user intervention.

Laurent Hascoet () Automatic Differentiation CEA-EDF-INRIA 2005 88 / 88