
University of Michigan, Electrical Engineering and Computer Science

Embracing Heterogeneity with Dynamic Core Boosting

Hyoun Kyu Cho and Scott Mahlke

University of Michigan

May 20, 2014

Parallel Programming

[Figure: a workload divided among Core1, Core2, Core3, and Core4]

Workload Imbalance Among Threads

• Asymmetric S/W
  – Control flow divergence
  – Non-deterministic memory latencies
  – Synchronization operations
• Asymmetric H/W
  – Heterogeneous multicores
  – Core-to-core process variation

Performance Impact of Asymmetric H/W

• Symmetric 8 Cores vs. 8 Cores w/ variations

CPU Time Wasted for Synchronization

[Figure: CPU time wasted for synchronization, homogeneous vs. heterogeneous cores]

Thread Criticality due to Workload Imbalance

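[Figure: timelines of threads T1 to T5 ending at a barrier; threads that arrive early sit idle at the barrier until the slowest (critical) thread arrives]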
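The cost of imbalance at a barrier can be made concrete with a small calculation: the barrier releases when the slowest (critical) thread arrives, and every other thread idles for the difference. The C sketch below is illustrative only; the helper name `barrier_idle`, the assumption that all threads start at time 0, and the example arrival times are hypothetical, not taken from the slides.

```c
#include <stddef.h>

/* Hypothetical helper: given each thread's arrival time at a barrier
   (all threads assumed to start at time 0, at least one arrival > 0),
   fill idle[i] with how long thread i waits and return the index of the
   critical (last-arriving) thread. *wasted_fraction receives the share of
   total CPU time (n threads x barrier time) spent idling at the barrier. */
static size_t barrier_idle(const double arrive[], size_t n,
                           double idle[], double *wasted_fraction)
{
    size_t crit = 0;
    for (size_t i = 1; i < n; i++)
        if (arrive[i] > arrive[crit])
            crit = i;

    double total_idle = 0.0;
    for (size_t i = 0; i < n; i++) {
        idle[i] = arrive[crit] - arrive[i];   /* wait for the critical thread */
        total_idle += idle[i];
    }
    *wasted_fraction = total_idle / ((double)n * arrive[crit]);
    return crit;
}
```

For arrival times {6, 10, 7, 9, 8}, index 1 is the critical thread, the others idle for 4, 3, 1, and 2 time units, and 20% of the CPU time is wasted waiting.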

Accelerating Critical Path w/ Core Boosting

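[Figure: timelines of threads T1 to T5 ending at a barrier, before and after boosting the core running the critical thread; boosting shortens the critical path and the idle time of the other threads]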
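Boosting pays off only when it targets the critical thread, and the gain is capped by the next-slowest thread. A minimal sketch, assuming each thread's time is its work divided by its core's speed and that a single core can be boosted by a fixed rate (the evaluation later uses 1.5x); `barrier_time` is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical model: thread i needs work[i] time units on a nominal core.
   The thread at index `boosted` runs `rate` times faster (rate = 1.0 means
   no boosting). The barrier releases when the last thread finishes. */
static double barrier_time(const double work[], size_t n,
                           size_t boosted, double rate)
{
    double t = 0.0;
    for (size_t i = 0; i < n; i++) {
        double ti = (i == boosted) ? work[i] / rate : work[i];
        if (ti > t)
            t = ti;
    }
    return t;
}
```

With work {6, 10, 7, 9, 8}, the barrier releases at 10 without boosting. Boosting thread 1 by 1.5x cuts its time to about 6.67, but thread 3 then becomes critical and the barrier releases at 9; boosting thread 0 instead leaves it at 10. This suggests why a single static boost assignment is not enough once criticality shifts between threads.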

Modeling Workload Imbalance & Boosting

Boosting Assignment

• Data parallel programs
• Pipeline parallel programs

[Figure: worker threads (Worker) and pipeline stages Stage1, Stage2, Stage3, and Stage4]

Boosting Data Parallel Programs

• Greedy scheduling
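One plausible reading of greedy scheduling for data parallel programs: at every scheduling interval, hand the single boosted core to the unfinished thread with the most work left, i.e. the least progress. The discrete-time simulation below is a sketch under that assumption, with a nominal rate of 1 work unit per time unit; `simulate_greedy` and its parameters are hypothetical and not the authors' implementation.

```c
#include <stddef.h>

/* Hypothetical discrete-time simulation of greedy boosting.
   remaining[i]: work left for thread i (modified in place).
   boost_rate:   speed of the boosted core relative to a nominal core.
   dt:           scheduling interval.
   Returns the time at which the last thread finishes (barrier release). */
static double simulate_greedy(double remaining[], size_t n,
                              double boost_rate, double dt)
{
    double t = 0.0;
    for (;;) {
        /* Greedy choice: the unfinished thread with the most remaining work. */
        size_t crit = n;
        for (size_t i = 0; i < n; i++)
            if (remaining[i] > 0.0 &&
                (crit == n || remaining[i] > remaining[crit]))
                crit = i;
        if (crit == n)
            return t;   /* every thread has reached the barrier */

        /* Advance one interval: the chosen thread runs on the boosted core. */
        for (size_t i = 0; i < n; i++)
            if (remaining[i] > 0.0)
                remaining[i] -= (i == crit ? boost_rate : 1.0) * dt;
        t += dt;
    }
}
```

For remaining work {10, 8, 6, 4}, a boost rate of 1.5, and dt = 0.01, the barrier releases at roughly 7.2 instead of 10: thread 0 is boosted until it catches up with thread 1, after which the boost alternates between the two.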

Boosting Pipeline Parallel Programs

• Epoch-based scheduling
  – Monitors CPU utilization with H/W performance counters
  – Assigns boosting budget at the end of each epoch
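The epoch-based policy can be sketched as follows, under two assumptions not stated on the slide: each stage's utilization is its busy cycles divided by elapsed cycles over the epoch (counter deltas that would come from hardware performance counters, passed in here as plain numbers), and the boosting budget for the next epoch goes entirely to the most utilized stage as the likely pipeline bottleneck. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Utilization of one stage over an epoch from two counter deltas
   (hypothetical: busy cycles and elapsed cycles sampled at epoch boundaries). */
static double stage_utilization(uint64_t busy_delta, uint64_t cycle_delta)
{
    return cycle_delta ? (double)busy_delta / (double)cycle_delta : 0.0;
}

/* At the end of an epoch, pick the stage that receives the boosting budget
   for the next epoch: the most utilized one, assumed to be the bottleneck. */
static size_t assign_boost(const uint64_t busy[], const uint64_t cycles[],
                           size_t nstages)
{
    size_t best = 0;
    double best_util = stage_utilization(busy[0], cycles[0]);
    for (size_t s = 1; s < nstages; s++) {
        double u = stage_utilization(busy[s], cycles[s]);
        if (u > best_util) {
            best_util = u;
            best = s;
        }
    }
    return best;
}
```

Splitting the budget across stages in proportion to utilization would be an equally plausible reading of "assigns boosting budget"; picking the single busiest stage matches the one-boosted-core configuration used in the evaluation.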

Dynamic Core Boosting

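Progress Monitoring Example

```c
/* … */
pthread_barrier_wait(barrier);
/* Instrumentation (presumably compiler-inserted): iterations between progress reports */
period = calc_period_LID_007(start, end);
for (i = start; i < end; i++) {
    /* … */
    compute(/* … */);
    if (side_exit) {
        /* Leaving the loop early: report this thread's work as complete */
        SET_PROGRESS_TO(MAX_PROGRESS_007);
        break;
    }
    /* Advance the progress counter once every `period` iterations */
    if (((end - i) % period) == 0)
        PROGRESS_STEP_FORWARD;
}
pthread_barrier_wait(barrier);
/* … */
```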
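To show how a runtime might consume these progress reports, here is a self-contained, single-threaded C sketch. Everything in it is an assumption for illustration: the per-thread `progress` array, the `tid` argument added to the macros, `MAX_PROGRESS` = 100, `calc_period` (which spreads about MAX_PROGRESS steps over the iteration range), the `exit_at` side-exit parameter, and the rule of boosting the thread with the least progress.

```c
#define NTHREADS     4
#define MAX_PROGRESS 100   /* hypothetical: progress steps per full loop */

/* Hypothetical per-thread progress counters that a boosting runtime reads. */
static int progress[NTHREADS];

/* Hypothetical stand-ins for the macros on the slide; here they take an
   explicit thread id instead of relying on thread-local state. */
#define SET_PROGRESS_TO(tid, v)    (progress[(tid)] = (v))
#define PROGRESS_STEP_FORWARD(tid) (progress[(tid)]++)

/* Iterations between progress steps: roughly MAX_PROGRESS steps per full
   loop (exactly that many when the trip count is a multiple of MAX_PROGRESS). */
static long calc_period(long start, long end)
{
    long p = (end - start) / MAX_PROGRESS;
    return p > 0 ? p : 1;
}

/* Runs thread tid's share [start, end) up to iteration `upto`, taking a
   side exit at iteration `exit_at` (pass -1 for no side exit). */
static void run_iterations(int tid, long start, long end, long upto, long exit_at)
{
    long period = calc_period(start, end);
    for (long i = start; i < end && i < upto; i++) {
        /* compute(...) would run here */
        if (i == exit_at) {
            SET_PROGRESS_TO(tid, MAX_PROGRESS);   /* leaving early: treat as done */
            break;
        }
        if (((end - i) % period) == 0)
            PROGRESS_STEP_FORWARD(tid);
    }
}

/* Greedy choice: boost the thread that has made the least progress. */
static int pick_boost_target(void)
{
    int best = 0;
    for (int t = 1; t < NTHREADS; t++)
        if (progress[t] < progress[best])
            best = t;
    return best;
}
```

For example, if threads 0, 1, and 3 execute 500, 800, and 300 of 1000 iterations and thread 2 side-exits at iteration 200, their progress values are 50, 80, 30, and 100, so thread 3 is chosen for boosting.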

Evaluation Methodology

• Asymmetry emulation with Dynamic Binary Translation
  – Slow down proportionally instead of accelerating
• 8 cores with frequency variation
• 1 core boosted, boosting rate = 1.5x
• Compares
  – Heterogeneous
  – Reactive
  – DCB
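Emulating a boosted core by slowing the other cores down can be expressed as a per-core dilation factor. A sketch, assuming each core's target speed is its relative frequency (including variation) times any boost, and ignoring effects such as memory latency that do not scale with frequency; `dilation_factors` is a hypothetical name, and how the DBT actually injects the slowdown is not shown.

```c
#include <stddef.h>

/* rate[i]: target speed of core i relative to a nominal core
   (frequency variation x boost). The emulator cannot speed any core up,
   so every core is slowed by fastest / rate[i]; the fastest core runs at
   native speed. The emulated machine is then the target machine uniformly
   slowed by `fastest`. */
static void dilation_factors(const double rate[], size_t n, double dilation[])
{
    double fastest = 0.0;
    for (size_t i = 0; i < n; i++)
        if (rate[i] > fastest)
            fastest = rate[i];
    for (size_t i = 0; i < n; i++)
        dilation[i] = fastest / rate[i];
}
```

Rates {1.0, 0.75, 1.5} (a nominal core, a slow core from variation, and a core boosted by 1.5x) give dilation factors {1.5, 2.0, 1.0}; dividing times measured under emulation by the fastest rate, 1.5, recovers the times the target machine would see.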

Performance Improvement

[Figure: normalized execution time (y-axis, 0.5 to 1.0) of Heterogeneous, Reactive, and DCB for blackscholes, bodytrack, canneal, dedup, facesim, ferret, fluidanimate, raytrace, streamcluster, swaptions, x264, and the geometric mean (g.mean)]

Synchronization Overheads

[Figure: relative CPU time (y-axis, 0% to 80%) of Heterogeneous, Reactive, and DCB for blackscholes, bodytrack, canneal, dedup, facesim, ferret, fluidanimate, raytrace, streamcluster, swaptions, x264, and the geometric mean (g.mean)]

Thread Arrival Time

Conclusion

• DCB mitigates workload imbalance in performance-asymmetric CMPs
  – Accelerating critical threads
  – Coordinating compiler, runtime, and architecture for near-optimal assignment
• Overall, DCB improves performance by 33%, outperforming a reactive boosting scheme by 10%

Thank you!

Core Boosting with Frequency Scaling

Transition time < 10 ns [Dreslinski'12]

Asymmetry Emulation with DBT

Evaluation Platform Accuracy

[Figure: relative error (y-axis, 0% to 12%) of the evaluation platform for blackscholes, bodytrack, canneal, dedup, facesim, ferret, fluidanimate, raytrace, streamcluster, swaptions, x264, and the mean]
