my postdoctoral research


Upload: po-ting-wu

Post on 12-Aug-2015


Category:

Technology



TRANSCRIPT

Page 1: My Postdoctoral Research

Where Do We Need Derivatives?

Numerical Methods:

Solution of ODE, DAE, Optimization, Nonlinear equations.

Sensitivity Analysis:

How does a computer model react to perturbations in input parameters or model "constants"?

Design Optimization:

Choose parameters such that the model computes a "better" design.

Data Assimilation & Inverse Problems:

Find values for model parameters such that the model reproduces experimentally obtained results.

Derivatives play a central role because the Taylor series allows us to predict the effect of changes in input parameters, e.g.:

$$f(x + \Delta x) \approx f(x) + \frac{\partial f}{\partial x}\,\Delta x^{T} + O(\|\Delta x\|^{2})$$
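As a rough illustration of the first-order prediction above, the sketch below compares the Taylor estimate with the actual function value for shrinking perturbations. The function `f` and its derivative `df` are hypothetical stand-ins chosen for this example, not part of the original slides.

```python
import math

def f(x):
    """Hypothetical scalar model: f(x) = sin(x) * exp(x)."""
    return math.sin(x) * math.exp(x)

def df(x):
    """Analytic derivative of the hypothetical model."""
    return (math.sin(x) + math.cos(x)) * math.exp(x)

x0 = 1.0
errors = []
for dx in (1e-1, 1e-2, 1e-3):
    predicted = f(x0) + df(x0) * dx   # first-order Taylor prediction
    actual = f(x0 + dx)
    errors.append(abs(actual - predicted))
    print(f"dx={dx:g}  predicted={predicted:.8f}  actual={actual:.8f}  error={errors[-1]:.2e}")
```

Shrinking the perturbation by a factor of 10 should shrink the prediction error by roughly a factor of 100, which is the behavior the second-order remainder term describes.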

Page 2: My Postdoctoral Research

Approaches to Computing Derivatives

By Hand:

Tedious and Error-Prone

Divided Differences:

Reliability cannot be assessed. It is difficult to assess numerical accuracy (e.g., truncation and cancellation error), and the approach is expensive when computing derivatives w.r.t. many independent variables.

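one-sided diffs:

$$\left.\frac{\partial f(x)}{\partial x_i}\right|_{x=x_o} \approx \frac{f(x_o \pm h \cdot e_i) - f(x_o)}{\pm h}$$

central diffs:

$$\left.\frac{\partial f(x)}{\partial x_i}\right|_{x=x_o} \approx \frac{f(x_o + h \cdot e_i) - f(x_o - h \cdot e_i)}{2h}$$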
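The trade-off between truncation and cancellation error can be seen in a small experiment. The sketch below applies both difference formulas to `math.sin` (an arbitrary test function chosen for this example) with a large, a moderate, and a very small step size.

```python
import math

def forward_diff(f, x, h):
    """One-sided difference: truncation error of order h."""
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
    """Central difference: truncation error of order h**2, two evaluations."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

x0 = 1.0
exact = math.cos(x0)   # known derivative of sin at x0
results = {}
for h in (1e-1, 1e-5, 1e-13):
    fwd_err = abs(forward_diff(math.sin, x0, h) - exact)
    ctr_err = abs(central_diff(math.sin, x0, h) - exact)
    results[h] = (fwd_err, ctr_err)
    print(f"h={h:g}  forward error={fwd_err:.2e}  central error={ctr_err:.2e}")
```

With the large step the truncation error dominates; with the tiny step the subtraction of nearly equal numbers loses most significant digits. Without knowing the exact derivative, there is no simple way to tell which step size is the right one.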

Symbolic:

Infeasible for large codes. Not directly applicable to larger programs

with loops and branches. (e.g., Maple, Mathematica)

Automatic Differentiation:

• Requires little human time
• Incurs no truncation error
• Attractive computational complexity
• Applicable to codes of arbitrary size
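To make the idea concrete, here is a minimal sketch of forward-mode automatic differentiation using operator overloading. This is not how ADIFOR works (ADIFOR is a source transformation tool for Fortran); the `Dual` class and the `model` function are hypothetical and only illustrate that derivatives are propagated by the chain rule, statement by statement, with no step size involved.

```python
import math

class Dual:
    """A value paired with its derivative w.r.t. one chosen input."""
    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__

    def __mul__(self, other):
        o = self._lift(other)
        # Product rule
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)
    __rmul__ = __mul__

def sin(u):
    """Elementary function with its derivative rule attached."""
    return Dual(math.sin(u.val), math.cos(u.val) * u.der)

def model(x, y):
    """Hypothetical model containing a loop."""
    acc = x
    for _ in range(3):
        acc = acc * y + sin(acc)
    return acc

x = Dual(0.5, 1.0)   # seed 1.0: differentiate with respect to x
y = Dual(2.0, 0.0)
out = model(x, y)
print("value:", out.val, "d(model)/dx:", out.der)
```

The derivative is exact up to floating-point rounding, and the loop needs no special treatment because each executed operation carries its own derivative rule.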

Page 3: My Postdoctoral Research

Hierarchical Structure of ADIFOR

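[Figure: levels of program structure at which differentiation can be applied, from coarsest to finest: Program, Procedure, Loop Nest, Loop Body, Basic Block, Statement, Expression. Annotations: "Lots of Alternatives", "ADIFOR Approach".]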

Page 4: My Postdoctoral Research

The ADIFOR System

Computational Differentiation at Argonne National Laboratory

[Figure: ADIFOR processing diagram. Labels: Fortran Code; ADIFOR Preprocessor; Analysis; AD Intrinsics Template Expander; Derivative Fortran Code; User's Derivative Driver; ADIFOR Library; AD Intrinsics Library; SparsLinC Library; Compile and Link; Derivative Computing Code.]

Page 5: My Postdoctoral Research

The Big Picture of AD Tools

[Figure: overview diagram with four groups of labels.]

Numerical Methods: Iterative Solvers; ODEs, DAEs; Optimization

New Languages: Fortran (77, 90, M, HPF); Little Languages; C, C++; MPI, PVM

Chain Rule Associativity: Pseudo-Adjoints, Interface Contraction, Breaking Dependencies

New Capabilities: Non-smooth functions; Hessians

Page 6: My Postdoctoral Research

A Modular Approach to Building AD Tools

[Figure: modular tool architecture. Labels, from input to output: Input Program; Parsing and Canonicalization; Program Analysis; Annotated Intermediate Representation; Differentiation Executive; Derivative Augmentation; Unparsing; Parallel Output Program; Parallel Derivative Run-time System.]

Page 7: My Postdoctoral Research

Time-Parallel Scheme for Derivative Computing (FORTRAN-M Implementation)

Chain rule associativity breaks dependencies and generates new task parallelism (in addition to the existing one!).
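The following sketch shows one way to read this statement. Once the states of a time-stepping code are known, the local Jacobian of each time step depends only on its own state, so the Jacobians can be computed as independent tasks and multiplied together afterwards in any grouping. The `step` function, its Jacobian, and the use of a thread pool are assumptions made for this example; the slides use FORTRAN-M processes and channels instead.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def step(z):
    """Hypothetical time-stepping operator on a 2-component state."""
    a, b = z
    return (a + 0.1 * math.sin(b), b + 0.1 * a * a)

def local_jacobian(z):
    """Jacobian of step() at state z; needs only z, not other Jacobians."""
    a, b = z
    return [[1.0, 0.1 * math.cos(b)],
            [0.2 * a, 1.0]]

def matmul(p, q):
    """Plain dense matrix-matrix product."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

# Serial top level: advance the state with function evaluations only.
states = [(1.0, 0.5)]
for _ in range(4):
    states.append(step(states[-1]))

# Each local Jacobian is an independent task and can be computed concurrently.
with ThreadPoolExecutor(max_workers=3) as pool:
    jacs = list(pool.map(local_jacobian, states[:-1]))

# d z_4 / d z_0 = J3 * J2 * J1 * J0; associativity permits any grouping.
left_to_right = matmul(matmul(matmul(jacs[3], jacs[2]), jacs[1]), jacs[0])
pairwise = matmul(matmul(jacs[3], jacs[2]), matmul(jacs[1], jacs[0]))
print(left_to_right)
print(pairwise)
```

Both groupings give the same matrix up to rounding. The pairwise grouping exposes further parallelism, since the two inner products do not depend on each other.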

[Figure: process structure. Labels: Serial top-level; Master Wrapper; Manager; Gradient Process 1 ... Gradient Process N; Matrix-matrix Multiplier. Channels: serial_to_manager, manager_to_parallel, parallel_to_MM, idle.]

[Figure: time-stepping dependency diagram. Labels: x, y, z, w; H_t, H_t+1; dH/dx, dH/dy, dH/dz at times t, t+1, t+2; proc. 0, proc. 1, proc. 2; dw/dx.]

[Figure: execution timeline for processes 0 to 8. Legend: Compute_Der, Compute_Fun, Compute_Mat, Receive, Send. Time-axis ticks: 7, 22, 36, 50, 65, 79, 94.]

Page 8: My Postdoctoral Research

Time-Parallel Scheme for Derivative Computing (MPI Implementation)

Chain rule associativity breaks dependencies and generates new task parallelism (in addition to the existing one!).

[Figure: time-stepping dependency diagram. Labels: x, y, z, w; H_t, H_t+1; dH/dx, dH/dy, dH/dz at times t, t+1, t+2; proc. 0, proc. 1, proc. 2; dw/dx.]

[Figure: process structure. Labels: Master Wrapper; Manager (option); Gradient Process 1 ... Gradient Process N; Matrix-matrix Multiplier. Channels: manager_to_parallel, parallel_to_MM, idle.]

[Figure: execution timeline for processes 0 to 9. Legend: Compute_Der, Compute_Fun, Compute_Mat, Receive, Send. Time-axis ticks: 3.0, 9.1, 15.1, 21.2, 27.2, 33.3, 39.3.]

Page 9: My Postdoctoral Research

Parallel System Design with Task Manager

The parallel-task manager process keeps track of which processes are active, selects an inactive process, and sends an activation message to that process. This allows for a heterogeneous compute situation, where we might have a slower processor.

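[Figure, top: execution timeline for processes 0 to 4, "System Design without Task Manager". Legend: Compute_Der, Compute_Fun, Compute_Mat, Receive, Send. Time-axis ticks: 4.9, 14.6, 24.3, 34.0, 43.7, 53.4, 63.1.]

[Figure, bottom: execution timeline for processes 0 to 5, "System Design with Task Manager". Legend: Compute_Der, Compute_Fun, Compute_Mat, Receive, Send. Time-axis ticks: 5.0, 15.0, 25.0, 35.0, 45.0, 55.0, 65.0.]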

To utilize the parallel resources, the parallel gradient computations can be spawned either statically by a round-robin scheme (top) or dynamically by introducing a task manager (bottom).
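The difference between the two spawning strategies can be sketched with a small simulation. The model below is a simplification made for this example: it ignores communication cost and the processor occupied by the manager itself, and the per-task times in `task_times` are invented. Static round-robin assigns task i to worker i mod P; the task manager hands the next task to whichever worker becomes idle first.

```python
import heapq

def round_robin_makespan(n_tasks, task_times):
    """Static assignment: task i always goes to worker i mod P."""
    busy = [0.0] * len(task_times)
    for i in range(n_tasks):
        w = i % len(task_times)
        busy[w] += task_times[w]
    return max(busy)

def task_manager_makespan(n_tasks, task_times):
    """Dynamic assignment: the next task goes to the first idle worker."""
    idle_at = [(0.0, w) for w in range(len(task_times))]
    heapq.heapify(idle_at)
    finish = 0.0
    for _ in range(n_tasks):
        t, w = heapq.heappop(idle_at)     # worker that becomes idle earliest
        done = t + task_times[w]
        finish = max(finish, done)
        heapq.heappush(idle_at, (done, w))
    return finish

# Hypothetical per-task times: the last worker is three times slower.
task_times = [1.0, 1.0, 1.0, 3.0]
static_time = round_robin_makespan(40, task_times)
dynamic_time = task_manager_makespan(40, task_times)
print("round-robin:", static_time, "task manager:", dynamic_time)
```

In this heterogeneous setting the static scheme is held back by the slow worker, which receives as many tasks as the fast ones, while the dynamic scheme gives it proportionally fewer.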

Page 10: My Postdoctoral Research

Parallel System Design with Task Manager

The parallel-task manager process keeps track of which processes are active, selects an inactive process, and sends an activation message to that process. This allows for a heterogeneous compute situation, where we might have a slower processor.

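[Figure, top: execution timeline for processes 0 to 4, "System Design without Task Manager". Legend: Compute_Der, Compute_Fun, Compute_Mat, Receive, Send. Time-axis ticks: 4.2, 12.5, 20.8, 29.1, 37.4, 45.7, 54.0.]

[Figure, bottom: execution timeline for processes 0 to 5, "System Design with Task Manager". Legend: Compute_Der, Compute_Fun, Compute_Mat, Receive, Send. Time-axis ticks: 4.2, 12.6, 21.0, 29.4, 37.8, 46.2, 54.6.]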

To utilize the parallel resources, the parallel gradient computations can be spawned either statically by a round-robin scheme (top) or dynamically by introducing a task manager (bottom).

Page 11: My Postdoctoral Research

Upshot: Parallel Performance Analysis

[Figures: five execution timelines for processes 0 to 4, one per ADIFOR variant. Legend in each: Compute_Der, Compute_Fun, Compute_Mat, Receive, Send. Time-axis ticks:]

ADIFOR Dense: 64, 191, 319, 446, 573, 701, 828

ADIFOR Color: 65, 196, 326, 457, 587, 717, 848

ADIFOR Sparse: 76, 228, 380, 533, 685, 837, 989

ADIFOR Mixed-1: 76, 227, 378, 529, 680, 831, 982

ADIFOR Mixed-2: 94, 283, 471, 659, 848, 1036, 1224

Page 12: My Postdoctoral Research

Speedup for ADIFOR Application: Shallow Water Equations model (SWE)

Serial and parallel speedup for the Shallow Water Equations model (SWE), which utilizes a time-dependent leapfrog scheme.

[Figure: speedup plot, "Shallow Water Equations model (SWE)". Grid size = 21x21, n = 3*21*21 = 1323, p = 4, s = n + p = 1327; machine: IBM SP; time-loop: 40. Vertical axis: Speedup (0.00 to 160.00). Horizontal axis: ADIFOR serial, then parallel with 1, 2, 4, 8, 16, 32 derivative slaves. Series: Dense, Color, Sparse, Mixed-1, Mixed-2.]

The serial speedup was achieved by exploiting the chain rule and the sparsity patterns. Chain rule associativity breaks dependencies and generates new task parallelism.

Page 13: My Postdoctoral Research

ADIFOR Application: Shallow Water Equations model (SWE)

The Shallow Water Equations model (SWE) utilizes a time-dependent leapfrog scheme.

We let $Z(t)$, $Z(t-1)$ denote the current and previous state of the time-dependent system. The next state is obtained by

$$Z(t+1) = G(Z(t), Z(t-1), W, B(t+1), \mathrm{Obs}(t+1))$$

where $G$ is the time-stepping operator, $W$ are the time-independent parameters, $B(t+1)$ are the next boundary conditions, and $\mathrm{Obs}(t+1)$ are observations of the next state.
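A leapfrog recurrence of this form carries derivatives forward in the same two-level pattern as the state itself. The sketch below uses a deliberately tiny stand-in for G (the scalar equation dz/dt = -w*z, with no boundary conditions or observations), so every name in it is an assumption made for illustration and not the SWE code. It propagates the sensitivity dZ/dW alongside Z by applying the chain rule to each time step.

```python
def leapfrog_with_sensitivity(z0, w, dt, n_steps):
    """Integrate dz/dt = -w*z with leapfrog and carry dz/dw alongside."""
    # Start-up step (forward Euler) to obtain the second state.
    z_prev, dz_prev = z0, 0.0
    z_curr = z0 - dt * w * z0
    dz_curr = -dt * z0
    for _ in range(n_steps - 1):
        # Toy operator: G(Z(t), Z(t-1), W) = Z(t-1) - 2*dt*W*Z(t)
        z_next = z_prev - 2.0 * dt * w * z_curr
        # Chain rule on G: dG/dZ(t-1) * dZ(t-1)/dW + dG/dZ(t) * dZ(t)/dW + dG/dW
        dz_next = dz_prev - 2.0 * dt * (w * dz_curr + z_curr)
        z_prev, dz_prev = z_curr, dz_curr
        z_curr, dz_curr = z_next, dz_next
    return z_curr, dz_curr

z_end, dz_dw = leapfrog_with_sensitivity(z0=1.0, w=2.0, dt=0.01, n_steps=40)
print("final state:", z_end, "sensitivity dZ/dW:", dz_dw)
```

Because each step depends on the two preceding states, the derivative update needs the two preceding sensitivities as well, which is why both are kept in the loop.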

[Figure, left: surface plot, "Shallow Water Equations model (SWE)". Horizontal axes: 0 to 25; vertical axis: -50 to 20.]

[Figure, right: surface plot, "Shallow Water Equations model (SWE) AD-Sensitivity". Horizontal axes: 0 to 25; vertical axis: -10 to 4, scaled by 10^6.]

4-D variational data assimilation with shallow water equations (SWE) when controlling both boundary and initial conditions (left) and its sensitivity to a uniform relative change in the observations and weights (right).

Page 14: My Postdoctoral Research

ADIFOR Application: MM5 PSU/NCAR Mesoscale Weather Model

The Fifth-Generation Penn State/NCAR Mesoscale Weather Model (MM5) is a regional forecasting model. See "A Description of the Fifth-Generation Penn State/NCAR Mesoscale Weather Model (MM5)", G. A. Grell, J. Dudhia, and D. R. Stauffer, NCAR/TN-398+STR, 1994.

Water vapor mass fraction (left) and its sensitivity to a uniform relative change in the surface pressure field (right).

Page 15: My Postdoctoral Research

MM5's Sensitivity to Initial Temperature

Grid size: 63 × 63 × 23. Median distance of grid points: 101 km. Radius of perturbation: 4.6 grid points.

Sensitivity of temperature in deg/deg at time t = 0h 30min (6th time step) on the 519 mb sigma-level.

Page 16: My Postdoctoral Research

ADIFOR Application: High-Speed Civil Transport

MARSEN: 3-D marching Euler code. Vamshi Mohan Korivi and Art Taylor, Old Dominion University; Perry Newman, NASA Langley.

Aerodyn. Opt. Studies using a 3-D Supersonic Euler Code with Efficient Calculation of Sensitivity Derivatives, V. M. Korivi, P. Newman, A. Taylor, AIAA-94-4270-CP, 1994.