Page 1: Parallelising Dynamic Programming

Parallelising Dynamic Programming

Raphael Reitzig

University of Kaiserslautern, Department of Computer Science, Algorithms and Complexity Group

September 27th, 2012

Page 2: Parallelising Dynamic Programming

Vision

Compile dynamic programming recurrences into efficient parallel code.

Page 3: Parallelising Dynamic Programming

Goal 1
Understand what efficiency means in parallel algorithms.

Goal 2
Characterise dynamic programming recurrences in a suitable way.

Goal 3
Find and implement efficient parallel algorithms for DP.

Page 6: Parallelising Dynamic Programming

Analysing Parallelism

Page 7: Parallelising Dynamic Programming

Complexity theory

Classifies problems

Focuses on inherent parallelism

Answers: How many processors do you need to be really fast on inputs of a given size?

But... p grows with n – no statement about constant p and growing n!

Page 11: Parallelising Dynamic Programming

Amdahl’s law

Parallel speedup ≤ 1 / ((1 − γ) + γ/p).

Answers: How many processors can you utilise on given inputs?

But... does not capture growth of n!
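
An illustrative instance of Amdahl's bound, reading γ as the parallelisable fraction of the work (the values γ = 0.9 and p = 4 are invented for illustration, not taken from the talk):

\[
  S_4 \;\le\; \frac{1}{(1 - 0.9) + 0.9/4} \;=\; \frac{1}{0.325} \;\approx\; 3.08
\]

So even a 90 % parallelisable program cannot exceed a speedup of about 3.08 on four processors, and no matter how many processors are added the bound never exceeds 1/(1 − γ) = 10.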

Page 14: Parallelising Dynamic Programming

Work and depth

Work W = T_1^A and depth D = T_∞^A.

Brent's Law: an A with W/p ≤ T_p^A < W/p + D is possible in a certain setting.

But... has limited applicability and D can be slippery!
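
As an illustrative instance (not from the slides): summing n numbers with a balanced reduction tree has W = Θ(n) and D = Θ(log n), so Brent's bound promises, up to constants,

\[
  T_p \;\le\; \frac{W}{p} + D \;=\; \mathcal{O}\!\left(\frac{n}{p} + \log n\right),
\]

i.e. near-linear speedup as long as p stays well below n / log n.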

Page 17: Parallelising Dynamic Programming

Relative runtimes

Speedup S_p^A := T_1^A / T_p^A

Efficiency E_p^A := T^B / (p · T_p^A)   (with B a sequential reference algorithm)

But... what are good values?

Clear: S_p^A ∈ [0, p] and E_p^A ∈ [0, 1] – but we can certainly not always hit the optima!
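
A small numeric illustration (the timings are invented): if the sequential reference needs T^B = 10 s and the parallel algorithm needs T_1^A = 10 s and T_4^A = 3 s on four processors, then

\[
  S_4^A = \frac{10}{3} \approx 3.3, \qquad E_4^A = \frac{10}{4 \cdot 3} \approx 0.83 .
\]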

Page 22: Parallelising Dynamic Programming

Proposal: Asymptotic relative runtimes

Definition

S_p^A(∞) := lim inf_{n→∞} S_p^A(n)   ?=   p

E_p^A(∞) := lim inf_{n→∞} E_p^A(n)   ?=   1

Goal
Find parallel algorithms that are asymptotically as scalable and efficient as possible for all p.
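
For instance (a hypothetical algorithm, not one from the talk), suppose T_p^A(n) = n/p + p and the best sequential algorithm B runs in T^B(n) = n. Then

\[
  S_p^A(n) = \frac{n + 1}{n/p + p} \xrightarrow{\;n \to \infty\;} p,
  \qquad
  E_p^A(n) = \frac{n}{n + p^2} \xrightarrow{\;n \to \infty\;} 1,
\]

so A is asymptotically optimally scalable and efficient for every fixed p, even though for small n the additive p term hurts.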

Page 24: Parallelising Dynamic Programming

Disclaimer

This means: A good parallel algorithm can utilise any number of processors if the inputs are large enough.

Not: More processors are always better.

Just as in sequential algorithmics.

Page 27: Parallelising Dynamic Programming

Afterthoughts

Machine model
Keep it simple: (P)RAM with p processors and spawn/join.

Which quantities to analyse?

Elementary operations, memory accesses, inter-thread communication, ...

Implicit interaction – blocking, communication via memory, ... – is invisible in code!

Page 30: Parallelising Dynamic Programming

Attacking Dynamic Programming

Page 31: Parallelising Dynamic Programming

Disclaimer

Only two dimensions

Only finite domains

Only rectangular domains

Memoisation-table point-of-view

Page 32: Parallelising Dynamic Programming

Reducing to dependencies

e(i, j) :=
    0                      if i = j = 0
    j                      if i = 0 ∧ j > 0
    i                      if i > 0 ∧ j = 0
    min { e(i − 1, j) + 1,
          e(i, j − 1) + 1,
          e(i − 1, j − 1) + [ v_i ≠ w_j ] }   else
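
This is the edit-distance recurrence; a direct sequential rendering of it as a memoisation table, given as a minimal sketch (the function and variable names are mine, not from the talk):

def edit_distance(v, w):
    """Fill the memoisation table e(i, j) bottom-up, row by row."""
    n, m = len(v), len(w)
    e = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0:
                e[i][j] = j                      # first row
            elif j == 0:
                e[i][j] = i                      # first column
            else:
                e[i][j] = min(
                    e[i - 1][j] + 1,                          # cell above
                    e[i][j - 1] + 1,                          # cell to the left
                    e[i - 1][j - 1] + (v[i - 1] != w[j - 1])  # diagonal cell; v, w are 0-indexed here
                )
    return e[n][m]

Every cell depends only on its upper, left, and upper-left neighbours – the kind of dependency structure analysed next.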

Page 34: Parallelising Dynamic Programming

Gold standard

[Pages 35–47: figure-only slides; their content is not recoverable from this transcript.]
Page 48: Parallelising Dynamic Programming

Simplification

[Figure: a table cell and its neighbouring directions, labelled DL, D, DR, UL, U, UR, L, R.]

Page 49: Parallelising Dynamic Programming

Three cases

[Figure: example dependency patterns, labelled “Impossible” and “Possible”.]

Page 51: Parallelising Dynamic Programming

Three cases

Assuming dependencies are area-complete and uniform, there are only three cases up to symmetry:
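
One standard way to exploit this kind of dependency structure – offered here only as an illustration, not necessarily the scheme developed in this work – is anti-diagonal (wavefront) evaluation: all cells with i + j = d depend only on earlier anti-diagonals, so they can be computed in parallel. A minimal sketch for the edit-distance table above:

def edit_distance_wavefront(v, w):
    """Evaluate the table anti-diagonal by anti-diagonal.
    Cells on one anti-diagonal are mutually independent, so the inner
    loop is the part that could be spawned across p processors."""
    n, m = len(v), len(w)
    e = [[0] * (m + 1) for _ in range(n + 1)]
    for d in range(n + m + 1):                         # anti-diagonal index: i + j = d
        for i in range(max(0, d - m), min(n, d) + 1):  # independent cells on this diagonal
            j = d - i
            if i == 0:
                e[i][j] = j
            elif j == 0:
                e[i][j] = i
            else:
                e[i][j] = min(e[i - 1][j] + 1,
                              e[i][j - 1] + 1,
                              e[i - 1][j - 1] + (v[i - 1] != w[j - 1]))
    return e[n][m]

With each anti-diagonal processed in parallel, the depth of this scheme is Θ(n + m) while the work stays Θ(n · m).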

Page 52: Parallelising Dynamic Programming

Facing Reality

Page 53: Parallelising Dynamic Programming

Challenges

Contention

Method of synchronisation

Metal issues (moving threads, cache sync)

Page 56: Parallelising Dynamic Programming

Performance Examples

Edit distance on two-core shared memory machine:

[Two plots over input sizes up to 1.4·10^5, vertical axes from 0 to 2.5; axis labels are not recoverable from the transcript.]

Page 57: Parallelising Dynamic Programming

Performance Examples

Edit distance on four-core NUMA machine:

[Two plots over input sizes up to 4·10^5, vertical axes from 0 to 4; axis labels are not recoverable from the transcript.]

Page 58: Parallelising Dynamic Programming

Performance Examples

Pseudo-Bellman-Ford on two-core shared memory machine:

[Two plots over input sizes up to 1.4·10^5, vertical axes from 0 to 2.5 and 0 to 4; axis labels are not recoverable from the transcript.]

Page 59: Parallelising Dynamic Programming

Performance Examples

Pseudo-Bellman-Ford on four-core NUMA machine:

[Two plots over input sizes up to 4·10^5, vertical axes from 0 to 4 and 0 to 8; axis labels are not recoverable from the transcript.]

Page 60: Parallelising Dynamic Programming

Future Work

Fill gaps in theory (caching and communication).

Generalise theory to more dimensions and interleaved DPs.

Improve and extend implementations.

More experiments (different problems, more diverse machines).

Improve compiler integration (detection, backtracing, result functions).

Integrate with other tools.
