Low-Memory Tour Reversal in Directed Graphs

Uwe Naumann, Viktor Mosenkis, Elmar Peise
LuFG Informatik 12, RWTH Aachen University, Germany

EuroAD 9, INRIA, Nov. 2009


Contents

◮ Motivation: Derivative Code Compilers (dcc)

◮ Motivating Example

◮ Control Flow Reversal Problem

◮ Tour Reversal Problem

◮ Loop Compression

◮ Offline Algorithm

◮ Online Algorithm

◮ Implementation


◮ Test Results

◮ Conclusion and Outlook


Motivation: Derivative Code Compilers (dcc)

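$F : \mathbb{R}^n \to \mathbb{R}^m$ ⇒ Tokens ⇒ AST ⇒ AAST ⇒ $F'$

tangent-linear:

$\dot{F}(x, \dot{x}) \equiv \nabla F(x) \cdot \dot{x}, \quad \dot{x} \in \mathbb{R}^n$

adjoint:

$\bar{F}(x, \bar{y}) \equiv \nabla F(x)^T \cdot \bar{y}, \quad \bar{y} \in \mathbb{R}^m$

second-order tangent-linear:

$\ddot{F}(x, \dot{x}, \tilde{x}) \equiv \langle \nabla^2 F(x), \dot{x}, \tilde{x} \rangle, \quad \dot{x}, \tilde{x} \in \mathbb{R}^n$

second-order adjoint:

$\dot{\bar{F}}(x, \dot{x}, \bar{y}) \equiv \langle \nabla^2 F(x), \dot{x}, \bar{y} \rangle, \quad \dot{x} \in \mathbb{R}^n, \ \bar{y} \in \mathbb{R}^m$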

... and higher-order tangent-linear and adjoint codes.
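To make the tangent-linear and adjoint models concrete, here is a minimal hand-written sketch for a small function $F : \mathbb{R}^2 \to \mathbb{R}^2$. The function and all names (`F`, `F_tangent`, `F_adjoint`, `dot_product_test`) are illustrative assumptions and not output of dcc. The tangent-linear version computes the Jacobian-vector product, the adjoint version computes the transposed product, and the two are tied together by the usual dot product identity.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec2 = std::array<double, 2>;

// Hypothetical example function F(x) = (x0 * x1, sin(x0)); not from the talk.
Vec2 F(const Vec2& x) { return {x[0] * x[1], std::sin(x[0])}; }

// Tangent-linear model: returns the Jacobian-vector product grad F(x) * xd.
Vec2 F_tangent(const Vec2& x, const Vec2& xd) {
  return {xd[0] * x[1] + x[0] * xd[1], std::cos(x[0]) * xd[0]};
}

// Adjoint model: returns the transposed product grad F(x)^T * ya.
Vec2 F_adjoint(const Vec2& x, const Vec2& ya) {
  return {x[1] * ya[0] + std::cos(x[0]) * ya[1], x[0] * ya[0]};
}

// Checks <ya, grad F(x) * xd> == <grad F(x)^T * ya, xd> up to rounding.
bool dot_product_test(const Vec2& x, const Vec2& xd, const Vec2& ya) {
  const Vec2 yd = F_tangent(x, xd);
  const Vec2 xa = F_adjoint(x, ya);
  const double lhs = ya[0] * yd[0] + ya[1] * yd[1];
  const double rhs = xa[0] * xd[0] + xa[1] * xd[1];
  return std::fabs(lhs - rhs) < 1e-12;
}
```

Calling `dot_product_test` with arbitrary inputs is a cheap consistency check between a tangent-linear and an adjoint code for the same function.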


dcc

◮ AD by source code transformation for subset of C

◮ started off as teaching tool

◮ higher-order derivative codes by reapplication

◮ joint call tree reversal

◮ additional overloading mode

◮ main component of AC-SAMMM (see outlook)

◮ there are others (Tapenade, TAC, ADIC) ... :-(


Case Study (Perhaps Later ...)

Use dcc to generate

◮ TLC

◮ ADC

◮ SOTLC

◮ SOADC
  ◮ FoR
  ◮ RoF
  ◮ RoR

◮ TOTLC (FoFoR)

◮ TOADC (FoFoR)

for $y = f(x) = \prod_{i=0}^{n-1} x_i$ (Speelpenning).


Motivating Example

$F : \mathbb{R}^n \to \mathbb{R}, \quad y = F(x) = \prod_{i=0}^{n-1} x_i$

    void F(int n, float *x, float &y) {
      int i = 0;
      while (i < n) {
        if (i == 0)
          y = x[i];
        else
          y = y * x[i];
        i = i + 1;
      }
    }

e.g. n = 3

forward section:

    y = x[0];
    y = y * x[1];
    y = y * x[2];

reverse section (y holds the value it had before the corresponding forward assignment):

    xa[2] = y * ya; ya = x[2] * ya;
    xa[1] = y * ya; ya = x[1] * ya;
    xa[0] = ya;


Motivating Example (cont’d)

    void Fa(int n, float *x, float *xa, float &y, float ya) {
      // augmented forward section: record basic blocks and overwritten values
      int i = 0;
      while (i < n) {
        if (i == 0) {
          push(cs, 1);
          push(fds, y);
          y = x[i];
        }
        else {
          push(cs, 2);
          push(fds, y);
          y = y * x[i];
        }
        push(cs, 3);
        push(ids, i);
        i = i + 1;
      }

      // reverse section: replay the recorded basic blocks in reverse order
      int bb;
      while (pop(cs, bb))
        switch (bb) {
          case 1: { y = pop(fds); xa[i] = ya; break; }
          case 2: { y = pop(fds); xa[i] = y * ya; ya = x[i] * ya; break; }
          case 3: { i = pop(ids); break; }
        }
    }
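The adjoint routine can be turned into a self-contained sketch by using `std::vector` objects as the control stack and the two data stacks. This is only an illustration under assumptions: the vector-based signature, the local stacks, and the `result` variable that preserves the function value are not part of the code shown in the talk.

```cpp
#include <cassert>
#include <vector>

// Adjoint of y = x[0] * ... * x[n-1]. On return, xa holds ya times the
// gradient and y holds the function value. Illustrative sketch only.
void Fa(int n, const std::vector<float>& x, std::vector<float>& xa,
        float& y, float ya) {
  std::vector<int> cs, ids;  // control stack, integer data stack
  std::vector<float> fds;    // float data stack
  int i = 0;
  // Augmented forward section: push block numbers and overwritten values.
  while (i < n) {
    if (i == 0) { cs.push_back(1); fds.push_back(y); y = x[i]; }
    else        { cs.push_back(2); fds.push_back(y); y = y * x[i]; }
    cs.push_back(3); ids.push_back(i); i = i + 1;
  }
  const float result = y;  // keep the function value; y is restored below
  // Reverse section: pop the control stack to replay blocks in reverse order.
  while (!cs.empty()) {
    const int bb = cs.back();
    cs.pop_back();
    switch (bb) {
      case 1:
        y = fds.back(); fds.pop_back();
        xa[i] = ya;
        break;
      case 2:
        y = fds.back(); fds.pop_back();
        xa[i] = y * ya;
        ya = x[i] * ya;
        break;
      case 3:
        i = ids.back(); ids.pop_back();
        break;
    }
  }
  y = result;
}
```

For `x = (2, 3, 4)` and `ya = 1` the routine yields the gradient `(12, 8, 6)`, i.e. the products of the other two factors. The control stack grows by two entries per loop iteration, which is the memory that the rest of the talk tries to compress.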


Motivating Example (cont’d)

    int i = 0;
    while (i < n) {
      if (i == 0) {
        push(cs, 1);
        push(fds, y);
        y = x[i];
      }
      else {
        push(cs, 2);
        push(fds, y);
        y = y * x[i];
      }
      push(cs, 3);
      push(ids, i);
      i = i + 1;
    }

control stack (cs)

1, 3, 2, 3, 2, 3, . . . , 2, 3

float data stack (fds)

y0, y1, y2, y3, y4, . . .

integer data stack (ids)

0, 1, 2, 3, 4, . . .

or 0, 1, 1, 1, 1, . . . if increments are stored


Control Flow Reversal (CFR)

Given: $y = F(x)$ as conditional SAC:

    for (j = n; j < n + p + m; j++)
      if ($v_{j-1} > 0$) $v_j = \varphi_{j-n}(v_i)_{i \prec j}$

Example:

    $v_{-1} = x_0$; $v_0 = x_1$
    if ($v_0 > 0$) $v_1 = v_{-1} \cdot v_0$ else ...
    if ($v_1 > 0$) $v_2 = \sin(v_1)$ else ...
    if ($v_2 > 0$) $v_3 = v_{-1} \cdot v_2$ else ...
    if ($v_3 > 0$) $v_4 = v_3 / v_0$ else ...
    if ($v_4 > 0$) $v_5 = \cos(v_3)$ else ...
    $x_0 = v_4$; $x_1 = v_5$

[Figure: DAG G = (V, E) with vertices −1, 0, 1, 2, 3, 4, 5]

Wanted: predecessors of all vertices in reverse order, requiring evaluation of the conditions in reverse order, i.e. the values of $v_4, v_3, v_2, v_1, v_0$


CFR is NP-complete

... by reduction from DAG Reversal

... from Fixed Cost DAG Reversal

◮ unit cost for storage and recomputation

◮ fixed total cost |V|

◮ minimize storage

... from Vertex Cover enabling recomputation at unit cost from stored arguments.

[Figure: DAG with vertices −1, 0, 1, 2, 3, 4, 5]

◮ U. Naumann: DAG Reversal is NP-complete, Journal of Discrete Algorithms, Volume 7, Issue 4, December 2009, Pages 402-410.


Tour Reversal

    1: ...
       for (i = 0; i < 2*n; i++) {
    2:   if (i % 2) { ... }
    3:   else { ... }
       }
    4: ...

[Figure: control flow graph with basic blocks 1, 2, 3, 4 and edge labels 0 and 1]

... by enumeration of basic blocks

1, 2, 3, . . . , 2, 3, 4

... by counting loop traversals and flagging branches

0, 1, . . . , 0, 1, 2n (reducible control flow only!)

Tour Reversal Problem

Use minimal memory to reverse a tour $(i_1, i_2, \ldots, i_l)$ in a directed graph G = (V, E).


Loop Compression

Enumerate Basic Blocks

◮ uncompressed tour: 1, 2, 3, . . . , 2, 3, 4 (2n + 2 integers)

◮ compressed tour: 1, 2, 3, [n, 2], 4 (6 integers)

Count loops and flag branches

◮ uncompressed tour: 0, 1, 0, 1, 0, 1, . . . , 0, 1, 2n (2n + 1 integers)

◮ compressed tour: 0, 1, [n, 2], 2n (5 integers)

    1: ...
       for (i = 0; i < 2*n; i++) {
    2:   if (i % 2) { ... }
    3:   else { ... }
       }
    4: ...
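The compressed notation can be read as follows, judging from the push/pop trace of the online algorithm: a marker [c, l] states that the l entries preceding it occur c times in total. The sketch below expands such a tour and counts the stored integers. The `Item` type and the helper names are hypothetical, and nested loop markers are deliberately not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One entry of a compressed tour: a vertex, or a loop marker [count, len].
struct Item {
  bool is_loop;
  int value;  // vertex id, or total repetition count for a loop marker
  int len;    // loop body length (unused for vertices)
};

Item vertex(int v) { return {false, v, 0}; }
Item loop(int count, int len) { return {true, count, len}; }

// Expands every marker: the len vertices before [count, len] occur count
// times in total. Assumes non-nested loops and well-formed input.
std::vector<int> expand(const std::vector<Item>& compressed) {
  std::vector<int> tour;
  for (const Item& it : compressed) {
    if (!it.is_loop) {
      tour.push_back(it.value);
      continue;
    }
    const std::size_t len = static_cast<std::size_t>(it.len);
    const std::size_t start = tour.size() - len;
    for (int rep = 1; rep < it.value; ++rep)
      for (std::size_t k = 0; k < len; ++k) {
        const int v = tour[start + k];  // copy before push_back reallocates
        tour.push_back(v);
      }
  }
  return tour;
}

// Number of integers needed to store the compressed tour:
// one per vertex, two per loop marker.
std::size_t stored_integers(const std::vector<Item>& compressed) {
  std::size_t count = 0;
  for (const Item& it : compressed) count += it.is_loop ? 2 : 1;
  return count;
}
```

With n = 3, the compressed tour 1, 2, 3, [3, 2], 4 expands to the 2n + 2 = 8 entries 1, 2, 3, 2, 3, 2, 3, 4, while the compressed form always needs 6 integers regardless of n.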


Offline Algorithm

Dynamic programming compresses according to

$$f(d_{i,j}) = \begin{cases} 1 & i = j \\ \min_{i \le s < j} f(d_{i,s} \circ d_{s+1,j}) & \text{otherwise,} \end{cases}$$

where $d_{i,j}$ is the optimally compressed subtour from the $i$-th to the $j$-th index and

$$f(d_{i,j}) := |N \cap V|$$ (loop elements not counted).

Problems

◮ need to see whole tour in order to find an optimal solution

◮ slow ($O(n^3)$ for tour of length n)
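One plausible reading of this recurrence is the classic interval dynamic program for folding repeated substrings: a subtour is either split into two compressed halves, or folded into a loop when it consists of repetitions of a shorter period. The sketch below follows that reading and only computes the cost, i.e. the number of stored vertices with loop markers not counted. It is an assumption about the slide's intent, not the authors' implementation, and the names `has_period` and `min_stored_vertices` are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// True if tour[i..j] consists of repetitions of its first p entries.
bool has_period(const std::vector<int>& tour, std::size_t i, std::size_t j,
                std::size_t p) {
  for (std::size_t k = i + p; k <= j; ++k)
    if (tour[k] != tour[k - p]) return false;
  return true;
}

// Minimal number of stored vertices for the tour (loop markers not counted).
std::size_t min_stored_vertices(const std::vector<int>& tour) {
  const std::size_t n = tour.size();
  if (n == 0) return 0;
  std::vector<std::vector<std::size_t>> f(n, std::vector<std::size_t>(n, 0));
  for (std::size_t len = 1; len <= n; ++len) {
    for (std::size_t i = 0; i + len <= n; ++i) {
      const std::size_t j = i + len - 1;
      if (len == 1) { f[i][j] = 1; continue; }
      std::size_t best = len;  // fallback: store every vertex
      // Concatenate two optimally compressed subtours.
      for (std::size_t s = i; s < j; ++s)
        best = std::min(best, f[i][s] + f[s + 1][j]);
      // Fold the subtour into a loop over a period of length p.
      for (std::size_t p = 1; p < len; ++p)
        if (len % p == 0 && has_period(tour, i, j, p))
          best = std::min(best, f[i][i + p - 1]);
      f[i][j] = best;
    }
  }
  return f[0][n - 1];
}
```

The table has O(n^2) entries and each entry scans O(n) split points, which illustrates why the offline approach is slow and needs the complete tour up front.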


Online Algorithm

1, 2, 3, . . . , 2, 3, 4 becomes

push(1): 1

push(2): 1, 2

push(3): 1, 2, 3

push(2): 1, 2, 3, 2

push(3): 1, 2, 3, [2, 2]

push(2): 1, 2, 3, [2, 2], 2

push(3): 1, 2, 3, [3, 2]

...

last push: 1, 2, 3, [n, 2], 4

pop: 1, 2, 3, [n, 2]

pop: 1, 2, 3, [n − 1, 2], 2

◮ push
  ◮ pushes element to top of stack
  ◮ compresses the loops on window of predefined size

◮ pop
  ◮ unrolls one instance of last loop if needed
  ◮ returns element from top of stack and removes it from stack
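The push and pop behaviour traced above can be sketched with a small class. This is a simplified illustration and not the `cistack` code from the distribution: the class name `CompressedStack`, its members, and the restriction to non-nested loops whose body length is at most the window size are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class CompressedStack {
 public:
  explicit CompressedStack(std::size_t window) : window_(window) {}

  // Pushes v, then tries to compress a loop of body length 1..window.
  void push(int v) {
    s_.push_back(Entry{false, v, 0});
    for (std::size_t len = 1; len <= window_; ++len)
      if (extend_loop(len) || create_loop(len)) return;
  }

  // Returns and removes the top element; precondition: !empty().
  int pop() {
    if (s_.back().is_loop) unroll();
    const int v = s_.back().value;
    s_.pop_back();
    return v;
  }

  bool empty() const { return s_.empty(); }
  std::size_t entries() const { return s_.size(); }

 private:
  // A vertex, or a loop marker where value is the count and len the body size.
  struct Entry { bool is_loop; int value; std::size_t len; };

  // True if the len-entry blocks at a and b are marker-free and equal.
  bool same_plain(std::size_t a, std::size_t b, std::size_t len) const {
    for (std::size_t k = 0; k < len; ++k)
      if (s_[a + k].is_loop || s_[b + k].is_loop ||
          s_[a + k].value != s_[b + k].value) return false;
    return true;
  }

  // body, [c, len], body  ->  body, [c + 1, len]
  bool extend_loop(std::size_t len) {
    const std::size_t n = s_.size();
    if (n < 2 * len + 1) return false;
    Entry& m = s_[n - len - 1];
    if (!m.is_loop || m.len != len) return false;
    if (!same_plain(n - 2 * len - 1, n - len, len)) return false;
    ++m.value;
    s_.resize(n - len);
    return true;
  }

  // body, body  ->  body, [2, len]
  bool create_loop(std::size_t len) {
    const std::size_t n = s_.size();
    if (n < 2 * len || !same_plain(n - 2 * len, n - len, len)) return false;
    s_.resize(n - len);
    s_.push_back(Entry{true, 2, len});
    return true;
  }

  // Replaces the top marker [c, len] by [c - 1, len] followed by one body copy.
  void unroll() {
    const Entry m = s_.back();
    const std::size_t start = s_.size() - 1 - m.len;
    if (m.value > 2) --s_.back().value; else s_.pop_back();
    for (std::size_t k = 0; k < m.len; ++k) {
      const Entry e = s_[start + k];  // copy before push_back reallocates
      s_.push_back(e);
    }
  }

  std::size_t window_;
  std::vector<Entry> s_;
};
```

Pushing the tour 1, 2, 3, ..., 2, 3, 4 with n loop iterations leaves five entries (1, 2, 3, [n, 2], 4) on the stack for any n, and popping returns the tour in reverse order. The window size bounds the work per push, which is the trade-off behind the runtime figures for different window sizes.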


Test Results

Problem   uncompr. stack size   compr. stack size   compr. rate
burger    3432 MB               144 Byte            25013889
bratu     4576 MB               168 Byte            28572380
ljc       3692 MB               704 KByte           5500

Problem   Window size   Runtime rate
burger    51            ≈ 2.5
bratu     51            ≈ 2.8
ljc       51            ≈ 5.5
ljc       11            ≈ 3.3


Implementation

◮ C++ front-end
  ◮ distribution: ccistack.{hpp,cpp}
  ◮ class ccistack
  ◮ void ccistack::push(int) and int ccistack::pop()

◮ C front-end
  ◮ distribution: cistack.{h,c}
  ◮ void init (struct cistack *k, int MaxStackSize, int WindowSize) to allocate
  ◮ void finalize (struct cistack *k) to free
  ◮ void push (struct cistack *k, int i)
  ◮ int pop (struct cistack *k) and int empty (struct cistack *k)

◮ Fortran front-end
  ◮ uses C front-end


C++ Front-End

    #include <fstream>
    #include "ccistack.hpp"
    using namespace std;

    int main(int argc, char* argv[]) {
      ifstream infile(argv[1]);
      ofstream outfile(argv[2]);
      // arguments presumably MaxStackSize and WindowSize, as in the C front-end
      ccistack cs(100, 21);
      int v;
      while (infile >> v)  // stops at end of file or on malformed input
        cs.push(v);
      while (!cs.empty())
        outfile << cs.pop() << endl;
      return 0;
    }


C Front-End

    #include <stdio.h>
    #include "cistack.h"

    int main(int argc, char* argv[]) {
      struct cistack cs;
      FILE* infile = fopen(argv[1], "r");
      FILE* outfile = fopen(argv[2], "w");
      init(&cs, 100, 4);  /* MaxStackSize = 100, WindowSize = 4 */
      int v;
      while (fscanf(infile, "%i", &v) > 0) push(&cs, v);
      while (!empty(&cs)) fprintf(outfile, "%i\n", pop(&cs));
      finalize(&cs);
      fclose(outfile);
      fclose(infile);
      return 0;
    }


Conclusion (I)

◮ powerful control flow reversal tool ready to use for developers of adjoint compiler technology and/or debugging/profiling tools

◮ paper (almost) ready for submission

Outlook

◮ compression of integer data stack

◮ compression of float data stack


Conclusion (II)

You need an (adjoint) derivative code compiler if

◮ finite differences cannot be trusted

◮ finite differences or exact forward sensitivities are too expensive

◮ you are unable to build and solve the adjoint system manually

You need to invest 3, 6, 18, 36 (wo)man months for a sustained ratio

(runtime of adjoint) / (runtime of original simulation)

of 50, 20, < 10, < 4, respectively.


Further Ongoing Activities at STCE

◮ CompAD-III (J. Riehme and D. Gendler, with UHerts and NAG)

◮ Adjoint error correction in ICON (J. Riehme and K. Leppkes, with MPI-M)

◮ Model reliability analysis in Sisyphe/Telemac (J. Riehme, with BAW)

◮ Data assimilation in JURASSIC (E. Varnik and M. Forster, with ICG-I at FZJ)

◮ dcc and the AaChen platform for Structured Automatic Manipulation of Mathematical Models (AC-SAMMM) (M. Forster and B. Gendler, with AVT)

◮ Toward adjoint MPI (M. Schanen, with ANL and INRIA)

◮ Pushing TBR (with ANL and INRIA)

◮ Call tree reversal (H. Lakhdar and X. Jin)


Further Ongoing Activities at STCE (cont.)

◮ Elimination techniques in linearized DAGs (V. Mosenkis)

◮ Higher-order adjoints for uncertainty quantification (M. Beckers, with UHerts)

◮ Adjoint subgradients for McCormick relaxations (M. Beckers, V. Mosenkis, and M. Maier, with MechEng at MIT)

◮ What color is the non-constant part of your Jacobian? (E. Varnik and L. Razik)

◮ Shared-memory parallelism in tape-based adjoints (K. Leppkes and J. Riehme)

◮ Submitted: Hybrid AD for C/C++ (ADOL-C + dcc, D. Gendler, with UPaderborn)


The AaChen platform for Structured Automatic Manipulation of Mathematical Models

[Figure: AC-SAMMM architecture diagram. Recoverable labels: Diversity (The World): mathematical model in Modelica, gPROMS, ...; To C−+; Standard (AC-SAMMM): mathematical model in C−+ (descriptive), mathematical submodels in C− (imperative); model refinement; symbolic manipulation; tailored AD; dcc; dcc configuration; custom derivative models; interfaces I1, I2, I3; DyOS, Mexa, ...; Problem; Solution]


Global Optimization using McCormick Relaxations

[Figure: plot of the original function together with its natural interval extension underestimator, convex underestimator, affine underestimator, and global upper bound]

(adjoint) subgradients by NAG Fortran compiler based on

◮ A. Mitsos, B. Chachuat, and P. I. Barton: McCormick-Based Relaxations of Algorithms, SIAM Journal on Optimization, 2009.

(C. Corbett, M. Beckers, V. Mosenkis, M. Maier)


Uncertainty Quantification / Robust Optimization

e.g. $\min_{\mu_x} (\mu_y + \sigma_y^2)$ where $y = F(x)$ and w.l.o.g. $F : \mathbb{R} \to \mathbb{R}$, using

◮ approximate mean

$$\mu_y = F(\mu_x) + \frac{F''(\mu_x)}{2} \cdot \sigma_x^2$$

◮ approximate variance

$$\sigma_y^2 = F'(\mu_x)^2 \sigma_x^2 + F'(\mu_x) F''(\mu_x) S_x \sigma_x^3 + \frac{1}{4} \left(F''(\mu_x)\right)^2 (K_x - 1) \sigma_x^4$$

for given initial mean $\mu_x$, variance $\sigma_x^2$, skewness $S_x$, and kurtosis $K_x$ of $x \in \mathbb{R}$. Second-order method (e.g. Newton) requires derivatives up to fourth order. (M. Beckers)
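The approximate mean and variance can be evaluated directly once the first and second derivatives of F at the mean are available, for instance from tangent-linear and second-order derivative codes. The sketch below assumes the derivative values are passed in by the caller. The names `Moments`, `propagate`, and `propagate_square`, and the test function F(x) = x^2, are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Moments {
  double mean;
  double variance;
};

// F_mu = F(mu_x), dF = F'(mu_x), d2F = F''(mu_x);
// sigma is the standard deviation of x.
Moments propagate(double F_mu, double dF, double d2F, double sigma,
                  double skewness, double kurtosis) {
  const double s2 = sigma * sigma;
  Moments m;
  m.mean = F_mu + 0.5 * d2F * s2;
  m.variance = dF * dF * s2
             + dF * d2F * skewness * s2 * sigma
             + 0.25 * d2F * d2F * (kurtosis - 1.0) * s2 * s2;
  return m;
}

// Illustrative use with F(x) = x^2, so F' = 2 mu and F'' = 2.
Moments propagate_square(double mu, double sigma, double skewness,
                         double kurtosis) {
  return propagate(mu * mu, 2.0 * mu, 2.0, sigma, skewness, kurtosis);
}
```

For a normally distributed x (skewness 0, kurtosis 3) and F(x) = x^2 the two formulas reproduce the exact moments, mean $\mu_x^2 + \sigma_x^2$ and variance $4\mu_x^2\sigma_x^2 + 2\sigma_x^4$, because F has no derivatives beyond second order. With $\mu_x = 1$ and $\sigma_x = 0.5$ this gives mean 1.25 and variance 1.125.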


Adjoint (Total Model) Error Correction

... in ICON GCM: ICOsahedral Non-hydrostatic General Circulation Model developed by Max Planck Institute for Meteorology and Deutscher Wetterdienst.

(Discrete) Adjoint by NAG Fortran Compiler. (J. Riehme)

(References to) theoretical foundations e.g. in

◮ R. Rannacher, F.-T. Suttmeier: A posteriori error control in finite element methods via duality techniques: Application to perfect plasticity. Computational Mechanics, 1998.

◮ M.B. Giles, N.A. Pierce and E. Süli: Progress in adjoint error correction for integral functionals, Computing and Visualization in Science, 2004.


Adjoint Error Correction (Linear Case)

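Given: $x_f = F(A, x_s)$ after solution of $A \cdot x_f = x_s$ and value of linear objective $J = J(g, x_f) = \langle g, x_f \rangle$.

Wanted: corrected objective $J = \langle g, x_f^c \rangle - \langle \bar{x}_s^c, A \cdot x_f^c - x_s \rangle$.

Solution: solve $A^T \cdot \bar{x}_s = \bar{x}_f = g$, NOT using the discrete adjoint code

$x_f = A^{-1} \cdot x_s$

$J = \langle g, x_f \rangle$

$\bar{x}_f = g \cdot \bar{J} = g$

$\bar{x}_s = A^{-T} \cdot \bar{x}_f$

produced by derivative code compiler.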
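A small numerical sketch may help to see why the correction works in the linear case: if the adjoint system $A^T \bar{x}_s = g$ is solved exactly, the corrected objective recovers the exact value of J even when the primal solution $x_f^c$ is inaccurate, since $\langle A^{-T} g, A x_f^c - x_s \rangle = \langle g, x_f^c - A^{-1} x_s \rangle$. The 2x2 setting, the Cramer's-rule solver, and all helper names below are assumptions made for illustration only.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec = std::array<double, 2>;
using Mat = std::array<Vec, 2>;  // row-major 2x2 matrix

double dot(const Vec& a, const Vec& b) { return a[0] * b[0] + a[1] * b[1]; }

Vec mul(const Mat& A, const Vec& x) { return {dot(A[0], x), dot(A[1], x)}; }

Mat transpose(const Mat& A) {
  Mat T;
  T[0] = {A[0][0], A[1][0]};
  T[1] = {A[0][1], A[1][1]};
  return T;
}

// Solves A * x = b by Cramer's rule; assumes A is nonsingular.
Vec solve(const Mat& A, const Vec& b) {
  const double det = A[0][0] * A[1][1] - A[0][1] * A[1][0];
  return {(b[0] * A[1][1] - A[0][1] * b[1]) / det,
          (A[0][0] * b[1] - b[0] * A[1][0]) / det};
}

// Corrected objective <g, xfc> - <xsa, A * xfc - xs> with A^T * xsa = g,
// where xfc is a (possibly inaccurate) solution of A * xf = xs.
double corrected_objective(const Mat& A, const Vec& xs, const Vec& g,
                           const Vec& xfc) {
  const Vec xsa = solve(transpose(A), g);  // adjoint solution
  const Vec Ax = mul(A, xfc);
  const Vec residual = {Ax[0] - xs[0], Ax[1] - xs[1]};
  return dot(g, xfc) - dot(xsa, residual);
}
```

For A = [[4, 1], [2, 3]], x_s = (1, 2) and g = (1, 1) the exact solution is x_f = (0.1, 0.6) with J = 0.7. A perturbed solution (0.15, 0.5) gives the uncorrected value 0.65, while the corrected objective returns 0.7 up to rounding.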


Adjoint Error Correction (Nonlinear Case)

Given: $x_f = F(A, x_s)$ after solution of $N(x) = 0$ and nonlinear objective $J = J(x_f)$ with gradient $g \equiv \frac{\partial J}{\partial x_f}$.

Wanted: corrected objective $J = J(x_f^c) - \langle \bar{x}_s^c, N(x_f^c) \rangle$.

Solution to the adjoint system preferably using discrete adjoint code

$x_f = F(x_s)$

$J = J(x_f)$

$\bar{x}_f = g \cdot \bar{J} = g$

$\bar{x}_s = F'(x_s)^T \cdot \bar{x}_f$

produced by derivative code compiler.


Objective: Potential Energy wrt. Height Field

rotating earth; all water except at poles; objective: potential energy somewhere in middle of the Sahara ... (F. Rauser, P. Korn)

Test Case 3: Unsteady Solid Body Rotation from

◮ Laeuter: Unsteady analytical solutions of the spherical shallow water equations. Journal of Computational Physics, 2005