work–stealing.by.stealing.state.from.....

38
Work–Stealing by Stealing State from Live Stack Frames of a Running Applica<on Generously supported by IBM & the Australian Research Council 1 Stephen M. Blackburn Australian Na?onal University Vivek Kumar Australian Na<onal University Daniel Frampton Australian Na?onal University Olivier Tardieu IBM T.J. Watson Research Center David Grove IBM T.J. Watson Research Center

Upload: others

Post on 24-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

Work–Stealing  by  Stealing  State  from    Live  Stack  Frames  of  a  Running  Applica<on    

Generously  supported  by  IBM  &  the  Australian  Research  Council   1  

Stephen  M.  Blackburn  

Australian  Na?onal  University  

Vivek  Kumar  

Australian  Na<onal  University  

Daniel  Frampton  

Australian  Na?onal  University  

Olivier  Tardieu  IBM  T.J.  Watson  Research  Center  

David  Grove  IBM  T.J.  Watson  Research  Center  

Page 2: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

Motivation

•  Multicore era – Dynamic task parallelism

2 Kumar  et  al.  

– Work–sharing •  Central task queue •  Scalability bottleneck with increase in threads

– Work–stealing •  Fixed number of threads •  One task queue per thread

•  Load balancing

Page 3: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

3 Kumar  et  al.   3

OUT IN

Work–Stealing

A

Page 4: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

3 Kumar  et  al.  

Work–Stealing

A

Page 5: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

IN OUT

3 Kumar  et  al.  

Work–Stealing

A B

Page 6: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

VICTIM

OUT IN IN OUT

3 Kumar  et  al.  

Work–Stealing

A B

Page 7: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

A B C D E F G H I J K L M N O

IN OUT

4 Kumar  et  al.  

A B C D E F G H I J K L M N O

Overheads

A B C D E F G H I J K L M N O

A B C D E F G

H I J K L M N O

VICTIM THIEF

Coordination

Enough context provided ??

Control Flow

A B

A

Page 8: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

5

X10 (2.0) Fib (40) – Single Thread Work-Stealing Execution Time Normalized with Sequential Execution Time

Tim

e

Kumar  et  al.  

Overheads

Overheads

Page 9: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

5

X10 (2.0) Fib (40) – Single Thread Work-Stealing Execution Time Normalized with Sequential Execution Time

Tim

e

Kumar  et  al.  

Overheads

Control Flow

Page 10: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

public static def main() { val n = 20; val a:int; val b:int; finish { async{ a = A(n);} b = B(n-1); } val result = a + b; }

Overhead – Control Flow final static class _$main extends MainFrame { public def fast(worker:Worker):void { this.n = 20; this._pc = 1; val tmp:_$mainF0 = new _$mainF0(this); tmp.fast(worker); this.result = this.a + this.b } public def resume(worker:Worker):void { switch (this._pc) { case 1: this.result = this.a + this.b; } } public def back(workerWorker,frame:Frame):void {} }

final static class _$mainF0 extends FinishFrame { public def fast(worker:Worker):void { val tmp = new _$mainF0A0(ff,ff); tmp.fast(worker); } public def resume(worker:Worker):void { public def back(worker:Worker,frame:Frame):void {} }

final static class _$mainF0A0 extends RegularFrame { public def fast(worker:Worker):void { this._pc = 1; push(worker); val tmp:_$mainF0A0B0 = new _$mainF0A0B0(ff); tmp.fast(worker); _$main.b = B(_$main.n - 1); } public def resume(worker:Worker):void { switch (this._pc) { case 1: _$main.b = B(_$main.n - 1); } } public def back(workerWorker,frame:Frame):void {} }

final static class _$mainF0B0A0 extends AsyncFrame { public def fast(worker:Worker):void { _$main.a = A(_$main.n); poll(worker); } public def resume(worker:Worker):void { public def back(workerWorker,frame:Frame):void {} }

Steal Point

Page 11: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

5

X10 (2.0) Fib (40) – Single Thread Work-Stealing Execution Time Normalized with Sequential Execution Time

Tim

e

Kumar  et  al.  

Overheads

Control Flow

Providing Contexts

Page 12: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

public static def main() { val n = 20; val a:int; val b:int; finish { async{ a = A(n);} b = B(n-1); } val result = a + b; }

Overhead – Providing Contexts final static class _$main extends MainFrame { public def fast(worker:Worker):void {

this.n = 20; this._pc = 1;

val tmp:_$mainF0 = new _$mainF0(this); tmp.fast(worker); this.result = this.a + this.b } public def resume(worker:Worker):void { switch (this._pc) { case 1: this.result = this.a + this.b; } } public def back(workerWorker,frame:Frame):void {} }

final static class _$mainF0 extends FinishFrame { public def fast(worker:Worker):void {

val tmp = new _$mainF0A0(ff,ff); tmp.fast(worker); } public def resume(worker:Worker):void { public def back(worker:Worker,frame:Frame):void {} }

final static class _$mainF0A0 extends RegularFrame { public def fast(worker:Worker):void { this._pc = 1; push(worker);

val tmp:_$mainF0A0B0 = new _$mainF0A0B0(ff); tmp.fast(worker); _$main.b = B(_$main.n - 1); } public def resume(worker:Worker):void { switch (this._pc) { case 1: _$main.b = B(_$main.n - 1); } } public def back(workerWorker,frame:Frame):void {} }

final static class _$mainF0B0A0 extends AsyncFrame { public def fast(worker:Worker):void { _$main.a = A(_$main.n); poll(worker); } public def resume(worker:Worker):void { public def back(workerWorker,frame:Frame):void {} }

Steal Point

Page 13: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

5

X10 (2.0) Fib (40) – Single Thread Work-Stealing Execution Time Normalized with Sequential Execution Time

Tim

e

Kumar  et  al.  

Overheads

Control Flow

Providing Contexts

Coordination Effort

Page 14: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

public static def main() { val n = 20; val a:int; val b:int; finish { async{ a = A(n);} b = B(n-1); } val result = a + b; }

Overhead – Coordination Effort final static class _$main extends MainFrame { public def fast(worker:Worker):void { this.n = 20; this._pc = 1; val tmp:_$mainF0 = new _$mainF0(this); tmp.fast(worker); this.result = this.a + this.b } public def resume(worker:Worker):void { switch (this._pc) { case 1: this.result = this.a + this.b; } } public def back(workerWorker,frame:Frame):void {} }

final static class _$mainF0 extends FinishFrame { public def fast(worker:Worker):void { val tmp = new _$mainF0A0(ff,ff); tmp.fast(worker); } public def resume(worker:Worker):void { public def back(worker:Worker,frame:Frame):void {} }

final static class _$mainF0A0 extends RegularFrame { public def fast(worker:Worker):void { this._pc = 1;

push(worker); val tmp:_$mainF0A0B0 = new _$mainF0A0B0(ff); tmp.fast(worker); _$main.b = B(_$main.n - 1); } public def resume(worker:Worker):void { switch (this._pc) { case 1: _$main.b = B(_$main.n - 1); } } public def back(workerWorker,frame:Frame):void {} }

final static class _$mainF0B0A0 extends AsyncFrame { public def fast(worker:Worker):void { _$main.a = A(_$main.n);

poll(worker); } public def resume(worker:Worker):void { public def back(workerWorker,frame:Frame):void {} }

Steal Point

Page 15: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

6 Kumar  et  al.  

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

1

2 3 4 5 6 7 8

30 35 40

Ratio of Total Continuations Stolen to Total Continuations Produced in X10 (2.0) Fibonacci Benchmark

Ste

al R

atio

Steal Ratio

Page 16: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

6 Kumar  et  al.  

Ratio of Total Continuations Stolen to Total Continuations Produced in X10 (2.0) Fibonacci Benchmark

Ste

al R

atio

Steal Ratio

Log Scale

Page 17: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

•  Our philosophy: – Small steal ratio

  Saving contexts for every continuation inefficient

–  Provide contexts only when steal occurs

7 Kumar  et  al.  

VM Supported Work–Stealing

•  Our approach: – Thief steals victim’s Java stack frame – Thief forces the victim to yield to start

the steal

Page 18: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

•  Stealing the Stack Frames

Fast_A() Fast_B() Fast_C() Fast_D() Fast_E() Fast_G()

run()

Fast_H() Fast_I()

run() FP_D FP_G FP_I

Fast_A() Fast_B() Fast_C()

Thread-0 Thread-1 Deque

Java

Sta

ck G

row

th

VM Assistance

8 Kumar  et  al.  

Page 19: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

•  Stealing the Stack Frames

Fast_A() Fast_B() Fast_C() Fast_D() Fast_E() Fast_G()

run()

Fast_H() Fast_I()

run() Initiate()

FP_D FP_G FP_I

Steal_FP() Fast_A() Fast_B() Fast_C()

Thread-0 Thread-1 Deque

Java

Sta

ck G

row

th

VM Assistance

Kumar  et  al.   8

Page 20: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

•  Stealing the Stack Frames

Fast_A() Fast_B() Fast_C() Fast_D() Fast_E() Fast_G()

run()

Fast_H() Fast_I()

run() Initiate() Steal()

FP_G FP_I

Steal_FP()

run() Initiate() Steal()

Fast_A() Fast_B() Fast_C()

Thread-0 Thread-1 Deque

Java

Sta

ck G

row

th

VM Assistance

Kumar  et  al.   8

Page 21: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

•  Stealing the Stack Frames

Fast_A() Fast_B() Fast_C() Fast_D() Fast_E() Fast_G()

run()

Fast_H() Fast_I()

run() Initiate()

Steal()

Fast_A Fast_ B Fast_C Fast_D

run() Initiate() Steal()

FP_G FP_I

Steal_FP()

run() Initiate() Steal()

Fast_A() Fast_B() Fast_C()

Thread-0 Thread-1 Thread-1 Deque

Java

Sta

ck G

row

th

VM Assistance

Kumar  et  al.  

New Stack

8

Page 22: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

•  Stealing the Stack Frames

Fast_A() Fast_B() Fast_C() Fast_D() Fast_E() Fast_G()

run()

Fast_H() Fast_I()

run() Initiate()

Steal()

Fast_A Fast_ B Fast_C Fast_D

FP_G FP_I

Fast_A() Fast_B() Fast_C()

Thread-0 Thread-1 Deque

Java

Sta

ck G

row

th

VM Assistance

Kumar  et  al.   8

Page 23: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

•  Stealing the Stack Frames

Fast_A() Fast_B() Fast_C() Fast_D() Fast_E() Fast_G()

run()

Fast_H() Fast_I()

run() Initiate() FP_G

FP_I Fast_A() Fast_B() Fast_C()

Thread-0 Thread-1 Deque

Java

Sta

ck G

row

th

VM Assistance

Kumar  et  al.  

States_A States_D States_C States_B

8

Page 24: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

•  Stealing the Stack Frames

Fast_A() Fast_B() Fast_C() Fast_D() Fast_E() Fast_G()

run()

Fast_H() Fast_I()

run() Initiate() Fast_A Fast_ B Fast_C Fast_D

Fast_D Fast_J Fast_K Fast_L Fast_M

States_A States_D States_C States_B

FP_G FP_I

Fast_A() Fast_B() Fast_C()

Thread-0 Thread-1 Deque

Java

Sta

ck G

row

th

VM Assistance

Kumar  et  al.   8

Page 25: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

•  Stealing the Stack Frames

Fast_A() Fast_B() Fast_C() Fast_D() Fast_E() Fast_G()

run()

Fast_H() Fast_I()

run() Initiate() Fast_D

States_A States_D States_C States_B

FP_G FP_I

Fast_A() Fast_B() Fast_C()

Thread-0 Thread-1 Deque

Java

Sta

ck G

row

th

VM Assistance

Kumar  et  al.   8

Page 26: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

•  Stealing the Stack Frames

Fast_A() Fast_B() Fast_C() Fast_D()

run() run() Initiate() Fast_D

States_A States_D States_C States_B

Fast_A() Fast_B() Fast_C()

Thread-0 Thread-1

Java

Sta

ck G

row

th

VM Assistance

Kumar  et  al.   8

Page 27: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

•  Stealing the Stack Frames

run() run() Initiate() Fast_D

States_A States_D States_C States_B

Thread-0 Thread-1

Java

Sta

ck G

row

th

VM Assistance

Kumar  et  al.   8

Page 28: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

•  Stealing the Stack Frames

run() run() Initiate() Fast_A Fast_ B Fast_C Fast_D

Fast_D Fast_J Fast_K Fast_L Fast_M

States_A States_D States_C States_B

Thread-0 Thread-1

Java

Sta

ck G

row

th

VM Assistance

Kumar  et  al.   8

Page 29: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

•  Stealing the Stack Frames

run() run() Initiate() Fast_D

States_A States_D States_C States_B

Thread-0 Thread-1

Java

Sta

ck G

row

th

VM Assistance

Kumar  et  al.   8

Page 30: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

•  Stealing the Stack Frames

run() run() Initiate() Fast_D Slow_D

States_A States_C States_B

Thread-0 Thread-1

Java

Sta

ck G

row

th

VM Assistance

Kumar  et  al.   8

Page 31: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

•  Stealing the Stack Frames

run() run() Initiate() Fast_A Fast_D Fast_J Slow_D Slow_C

States_A States_B

Thread-0 Thread-1

Java

Sta

ck G

row

th

VM Assistance

Kumar  et  al.   8

Page 32: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

•  Stealing the Stack Frames

run() run() Initiate() Fast_A Fast_ B

Fast_D Fast_J Fast_K

Slow_D Slow_C Slow_B

States_A

Thread-0 Thread-1

Java

Sta

ck G

row

th

VM Assistance

Kumar  et  al.   8

Page 33: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

•  Stealing the Stack Frames

run() run() Initiate() Fast_A Fast_ B Fast_C

Fast_D Fast_J Fast_K Fast_L

Slow_D Slow_C Slow_B Slow_A

Thread-0 Thread-1

Java

Sta

ck G

row

th

VM Assistance

Kumar  et  al.  

Calculation Finished

8

Page 34: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

9

a)  Integrate (1000) b)  Fib (40)

Tim

e

Kumar  et  al.  

Experimental Results – Execution Time

Total Threads Total Threads

Default WS

VM Supported WS

Execution Time Normalized With Single Thread Default Work-Stealing Time

Page 35: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

10

Spe

edup

Kumar  et  al.  

Experimental Results – Speedup Default WS

VM Supported WS

a)  Integrate (1000) b)  Fib (40) Total Threads Total Threads

Speedup Relative to Single Thread Default Work-Stealing Time

Page 36: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

11 Kumar  et  al.  

•  Multicore era –  Dynamic task parallelism.

•  Load balancing –  Work–stealing schedulers

Summary

Overheads = Control Flow

Providing Contexts

Coordination Effort + +

Page 37: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

12 Kumar  et  al.  

•  Multicore era –  Dynamic task parallelism.

•  Load balancing –  Work–stealing schedulers

Overheads = Control Flow

Providing Contexts

Coordination Effort + +

Future Work

  Test with high steal ratio benchmarks

  Research new VM extensions to make X10 run faster.

Page 38: Work–Stealing.by.Stealing.State.from.. …x10.sourceforge.net/documentation/papers/X10Workshop2011/...Motivation • Multicore era – Dynamic task parallelism Kumar*etal.* 2 –

13 Kumar  et  al.  

Questions …..?