Asymmetry-Aware Work-Stealing Runtimes Christopher Torng, Moyang Wang, and Christopher Batten School of Electrical and Computer Engineering Cornell University 43rd Int’l Symp. on Computer Architecture, June 2016


Asymmetry-Aware Work-Stealing Runtimes

Christopher Torng, Moyang Wang, and Christopher Batten

School of Electrical and Computer Engineering, Cornell University

43rd Int’l Symp. on Computer Architecture, June 2016


• Motivation • First-Order Modeling Asymmetry-Aware Work-Stealing Runtimes Evaluation

[Figure: diagram relating work-stealing runtimes to static asymmetry (single-ISA heterogeneous architectures) and dynamic asymmetry (dynamic voltage and frequency scaling).]

How can we use asymmetry awareness to improve the performance and energy efficiency of a work-stealing runtime?

Cornell University Christopher Torng 2 / 21


Work-Stealing Runtimes

[Figure: animation of four cores (Core 0 to Core 3), each with its own task queue and work in progress. The steps are labeled, in order: Spawn Task B (while Task A is in progress); Dequeue Task B; Spawn Task C; Spawn Task D; Steal Task D and Steal Task C; Spawn Task E and Spawn Task F; Steal Task E and Steal Task F.]

• Work stealing has good performance, space requirements, and communication overheads in both theory and practice

• Supported in many popular concurrency platforms, including Intel's Cilk Plus, Intel's C++ TBB, Microsoft's .NET Task Parallel Library, Java's Fork/Join Framework, and OpenMP
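The spawn, dequeue, and steal steps above can be sketched as a small sequential simulation. This is a minimal illustration, not the runtime from the talk: the names `run_work_stealing` and `make_sum_task`, the round-robin stepping of workers, and the random victim choice are all assumptions made for the example. Each worker dequeues from one end of its own queue and steals from the opposite end of a victim's queue.

```python
import random
from collections import deque


def run_work_stealing(root_task, num_workers=4, seed=0):
    """Run tasks to completion; return (leaf results, number of steals)."""
    rng = random.Random(seed)
    queues = [deque() for _ in range(num_workers)]
    queues[0].append(root_task)
    results, steals = [], 0
    while any(queues):
        for wid in range(num_workers):
            if queues[wid]:
                # Owner dequeues the most recently spawned task.
                task = queues[wid].pop()
            else:
                # Idle worker: pick a victim with work (random choice is an assumption).
                victims = [v for v in range(num_workers) if queues[v]]
                if not victims:
                    continue
                # Thief takes the oldest task from the other end of the victim's queue.
                task = queues[rng.choice(victims)].popleft()
                steals += 1
            children, value = task()
            queues[wid].extend(children)  # spawned tasks go on the worker's own queue
            if value is not None:
                results.append(value)
    return results, steals


def make_sum_task(lo, hi, grain=4):
    """Hypothetical task: sum range(lo, hi), splitting in two above the grain size."""
    def task():
        if hi - lo <= grain:
            return [], sum(range(lo, hi))
        mid = (lo + hi) // 2
        return [make_sum_task(lo, mid, grain), make_sum_task(mid, hi, grain)], None
    return task


results, steals = run_work_stealing(make_sum_task(0, 64))
print(sum(results), steals)
```

Taking from opposite ends means the owner keeps working on recently spawned tasks while thieves take the oldest ones, which in a divide-and-conquer program tend to be the largest pieces of work.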


Static Asymmetry vs. Dynamic Asymmetry

[Figure: static asymmetry, illustrated by the Samsung Exynos Octa Mobile Processor with big ARM cores (A15) and little ARM cores (A7) and their L2 caches.]

[Figure: dynamic asymmetry, illustrated by integrated voltage regulation on a test chip with four integrated voltage regulators (labeled blocks: cell control, power mux, load). The plot shows voltage (V), from 0.7 to 1.4, against time (ns), from 100 to 400, with transitions annotated 120 ns and 150 ns.]

From W. Godycki, C. Torng, I. Bukreyev, A. Apsel, C. Batten. “Enabling Realistic Fine-Grain Voltage Scaling with Reconfigurable Power Distribution Networks.” MICRO, 2014


[Figure: diagram of work-stealing runtimes, static asymmetry (single-ISA heterogeneous architectures), and dynamic asymmetry (dynamic voltage and frequency scaling), annotated with related work.]

• Bender et al. "Online Scheduling of Parallel Programs on Heterogeneous Sys ..." SPAA 2002

• Ribic et al. "Energy-Efficient Work-Stealing Language Runtimes" ASPLOS 2014

• Azizi et al. "Energy-performance Tradeoffs in Processor Architecture and Circuit Design: A Marginal Cost Analysis" ISCA 2010

How can we use asymmetry awareness to improve the performance and energy efficiency of a work-stealing runtime?


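Talk Outline

• Motivation
• First-Order Modeling
• Asymmetry-Aware Work-Stealing Runtimes
• Evaluation

[Figure: work-stealing runtimes combined with static asymmetry and dynamic asymmetry on a system of big (B) and little (L) cores, labeled with three techniques: work-pacing, work-sprinting, and work-mugging.]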


Building Intuition by Exploring a 1 Big 1 Little System

[Figure: normalized power (1 to 8) against normalized IPS (0.5 to 3.0). The little core (L) sits at (1.0, 1.0) and the four-way big core (B) at (2.0, 6.0), so the system with 1 big and 1 little core has an IPS of 3.0 at a power of 7.0. Later animation steps move the B and L operating points and mark a second configuration: same power, 10% performance increase.]


The Law of Equi-Marginal Utility

"Other things being equal, a consumer gets maximum satisfaction when he allocates his limited income to the purchase of different goods in such a way that the Marginal Utility derived from the last unit of money spent on each item of expenditure tend to be equal."

Alfred Marshall (1842 - 1924), British Economist

Balance the ratio of utility (IPS) to cost (power)

[Figure: normalized power (1 to 8) against normalized IPS (0.5 to 3.0) for the two cores. With both cores at 1.0 V, the slopes of the two curves (dy, cost, over dx, utility) differ. Arbitrage ("Buy Low, Sell High") moves the operating points to 0.9 V and 1.3 V.]
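One way to read the balancing rule, assuming each core's throughput is a differentiable function of the power it is given (an assumption for this sketch; the symbols $P_B$, $P_L$, $P_{\text{target}}$, and $\lambda$ are introduced here and do not appear on the slides), is as the first-order condition of a constrained maximization:

```latex
\max_{P_B,\,P_L}\; \mathrm{IPS}_B(P_B) + \mathrm{IPS}_L(P_L)
\quad \text{subject to} \quad P_B + P_L = P_{\text{target}}

\mathcal{L} = \mathrm{IPS}_B(P_B) + \mathrm{IPS}_L(P_L)
            - \lambda \left( P_B + P_L - P_{\text{target}} \right)

\frac{\partial \mathcal{L}}{\partial P_B} = 0,\quad
\frac{\partial \mathcal{L}}{\partial P_L} = 0
\;\Longrightarrow\;
\frac{d\,\mathrm{IPS}_B}{dP_B} = \frac{d\,\mathrm{IPS}_L}{dP_L} = \lambda
```

Under this reading, when the two marginal values differ, shifting a little power from the core with the lower marginal IPS per watt to the core with the higher one raises total throughput at the same total power, which is the arbitrage step.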


Systematic Approach for Balancing Marginal Utility

[Figure: normalized energy efficiency (0.7 to 1.4) against normalized IPS (0.6 to 1.4) for a 1 big 1 little system. Each point is an individual (VB, VL) pair, plotted relative to the system at nominal voltage. The plot marks the isopower line and the Pareto-optimal frontier, with regions labeled: performance at expense of energy efficiency; energy efficiency at expense of performance; improve both performance and energy efficiency.]

Assumptions
• Perfectly parallel application
• Ideal load balancing

Marginal Utility-Based Optimization Problem
• Constraint: isopower line
• Objective: maximize performance
• Solved numerically
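The optimization problem above can be sketched numerically as a grid search. This is an illustration under assumed models, not the model used in the talk: frequency is taken to grow linearly with voltage above a hypothetical threshold `V_T`, power is taken to scale with V² · f with leakage ignored, and the 0.70 V to 1.30 V search range is taken from the voltages that appear elsewhere in the slides. Only the nominal (IPS, power) points for the big and little cores come from the 1 big 1 little slide.

```python
V_NOM = 1.0   # nominal voltage, normalized
V_T = 0.3     # hypothetical threshold voltage for the linear frequency model

# Nominal (IPS, power) points taken from the 1 big 1 little slide.
BIG = {"ips": 2.0, "power": 6.0}
LITTLE = {"ips": 1.0, "power": 1.0}


def freq_scale(v):
    """Assumed model: frequency grows linearly with voltage above V_T."""
    return (v - V_T) / (V_NOM - V_T)


def core_ips(core, v):
    return core["ips"] * freq_scale(v)


def core_power(core, v):
    """Assumed model: power scales with V^2 * f; leakage is ignored."""
    return core["power"] * (v / V_NOM) ** 2 * freq_scale(v)


def best_voltage_pair(budget, v_min_cv=70, v_max_cv=130):
    """Grid-search (VB, VL) in 0.01 V steps; maximize IPS with power <= budget."""
    best = None
    for vb_cv in range(v_min_cv, v_max_cv + 1):
        for vl_cv in range(v_min_cv, v_max_cv + 1):
            vb, vl = vb_cv / 100, vl_cv / 100
            power = core_power(BIG, vb) + core_power(LITTLE, vl)
            if power > budget + 1e-9:
                continue  # violates the isopower constraint
            ips = core_ips(BIG, vb) + core_ips(LITTLE, vl)
            if best is None or ips > best[0]:
                best = (ips, vb, vl, power)
    return best


# Power budget: both cores at nominal voltage.
nominal_power = core_power(BIG, V_NOM) + core_power(LITTLE, V_NOM)
ips, vb, vl, power = best_voltage_pair(nominal_power)
print(f"VB={vb:.2f} V, VL={vl:.2f} V, IPS={ips:.3f}, power={power:.3f}")
```

Under these assumed curves the search lowers the big core's voltage below nominal and raises the little core's voltage above it, which is the same direction as the arbitrage shown on the slides. The exact numbers depend entirely on the assumed models.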


Work-Pacing: Building Intuition

Balance performance/power across cores in the high-parallel (HP) region

[Figure: activity timeline of a 2 big, 2 little system, distinguishing busy time from time in the steal loop; the system is shown with both big cores active and both little cores active.]

[Figure: normalized power (0 to 7) against normalized IPS (0.0 to 3.0) for the big (B) and little (L) cores.]

[Figure: two plots over paired voltage settings (VL = 0.70, 1.00, 1.30, 1.60, 1.90 with VB = 1.04, 1.00, 0.92, 0.76, 0.24): marginal IPS/W of the B and L cores, and aggregate throughput as aggregate system IPS (0.0 to 1.2).]


Work-Pacing, Work-Sprinting, and Work-Mugging

[Figure: activity timeline of big (B) and little (L) cores, distinguishing busy time from time in the steal loop.]

Work-Pacing
• Balance performance/power across cores in the high-parallel (HP) region

Work-Sprinting
• Rest cores in the steal loop to the lowest voltage
• With additional power slack, balance performance/power across busy cores in the low-parallel (LP) region

Work-Mugging
• Move work from slow little cores to fast big cores in the low-parallel (LP) region
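The way the three techniques divide up the HP and LP regions can be sketched as a small decision function. This is an illustrative reading, not the runtime's actual policy: the function name `aaws_plan`, the tuple encoding of cores, and the rule that the HP region means "no core is stealing" are assumptions for the example. The returned `rest` list reflects activity before any mug takes place.

```python
def aaws_plan(cores):
    """cores: list of (kind, state); kind is "big" or "little",
    state is "busy" or "stealing". Returns a plan dictionary."""
    stealing = [i for i, (_, state) in enumerate(cores) if state == "stealing"]
    if not stealing:
        # High-parallel region: every core is busy, so pace all of them.
        return {"region": "HP", "techniques": ["work-pacing"], "rest": [], "mugs": []}

    idle_big = [i for i in stealing if cores[i][0] == "big"]
    busy_little = [i for i, (kind, state) in enumerate(cores)
                   if kind == "little" and state == "busy"]
    # Pair each stealing big core with a busy little core whose work it could take.
    mugs = list(zip(idle_big, busy_little))
    techniques = ["work-sprinting"] + (["work-mugging"] if mugs else [])
    # Low-parallel region: rest the stealing cores and sprint the busy ones.
    return {"region": "LP", "techniques": techniques, "rest": stealing, "mugs": mugs}


print(aaws_plan([("big", "stealing"), ("big", "busy"),
                 ("little", "busy"), ("little", "stealing")]))
```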


Work-Pacing and Work-Sprinting Mechanisms

[Figure: system block diagram with two big (B) and two little (L) cores, each with an L1 cache, an on-chip interconnect, shared L2 cache banks, a DRAM memory controller, voltage regulators, and a DVFS controller.]

[Figure: the runtime's task queues and work in progress on the Big, Big, Little, Little cores (Task A, Task B, Task C, and stealing cores). The runtime passes hints to the DVFS controller (Which cores are stealing? Big or little?), and the controller's decision sets Vdd.]

[Figure: DVFS controller lookup table from activity pattern (A = active, s = stealing) to voltages (VB, VL), with entries between 0.70 V and 1.30 V. For example, with all four cores active, VB = 0.91 V and VL = 1.30 V.]


Work-Mugging Mechanisms

[Figure: system block diagram (two big and two little cores with L1 caches, on-chip interconnect, shared L2 cache banks, DRAM memory controller, voltage regulators, DVFS controller) with a user-level interrupt network. Labeled steps: initiate mug, mug interrupt, thread context swap.]

Mug Instruction
• Thread ID to mug
• Address of thread-swapping handler

User-Level Interrupt Network
• Simple, low-bandwidth inter-core network
• Latency on order of 20 cycles

Thread Context Swap
• Threads store architectural state to separate locations in shared memory
• Both threads sync
• Threads load architectural state from other location
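The three thread context swap steps can be sketched with two software threads standing in for the mugging and mugged cores. This is only an analogy in Python, not the hardware mechanism: the function name `swap_contexts`, the dictionary standing in for shared memory, and the example state fields are assumptions for illustration.

```python
import threading


def swap_contexts(big_state, little_state):
    """Swap two threads' 'architectural state' through shared memory."""
    shared = {}                     # stands in for shared memory
    barrier = threading.Barrier(2)  # the "both threads sync" step
    loaded = {}

    def participant(me, other, state):
        shared[me] = dict(state)    # 1. store own state to a separate location
        barrier.wait()              # 2. sync: both stores are complete
        loaded[me] = shared[other]  # 3. load state from the other location

    threads = [
        threading.Thread(target=participant, args=("big", "little", big_state)),
        threading.Thread(target=participant, args=("little", "big", little_state)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return loaded["big"], loaded["little"]


big_ctx, little_ctx = swap_contexts({"pc": 0x100, "task": "A"},
                                    {"pc": 0x200, "task": "B"})
print(big_ctx, little_ctx)
```

The barrier between the store and the load is what keeps either side from reading the other's location before it has been written.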


Evaluation Methodology: Modeling

Work-Stealing Runtime
• State-of-the-art Intel TBB-inspired work-stealing scheduler
• Chase-Lev task queues with occupancy-based victim selection
• Instrumented with activity hints

Cycle-Level Modeling
• Heterogeneous system modeled in gem5 cycle-approximate simulator
• Support for scaling per-core frequencies + central DVFS Controller

Energy Modeling
• Event-based energy modeling based on detailed RTL/gate-level sims (Synopsys ASIC toolflow, TSMC LP, 65 nm 1.0 V)
• Carefully selected subset of McPAT results tuned to our µarchitecture
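As a rough illustration of occupancy-based victim selection, the sketch below has a thief pick the queue that currently holds the most tasks. The slides do not spell out the rule, so reading "occupancy-based" as "steal from the fullest queue" is an assumption, as are the function name `pick_victim` and the tie-break toward the lowest index.

```python
def pick_victim(queue_lengths, thief):
    """Return the index of the fullest queue other than the thief's, or None."""
    best = None
    for idx, length in enumerate(queue_lengths):
        if idx == thief or length == 0:
            continue  # never steal from yourself or from an empty queue
        if best is None or length > queue_lengths[best]:
            best = idx  # strictly greater, so ties keep the lowest index
    return best


print(pick_victim([0, 3, 5, 1], thief=0))
```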


Work-Pacing in cilk-sort

[Figure: execution timelines for the big and little cores, with no AAWS techniques and with work-pacing. Each timeline has an activity bar (busy or steal loop) and the DVFS controller decision (legend voltages: 0.70 V, 0.90 V, 1.00 V, 1.04 V, 1.24 V, 1.30 V). Work-pacing gives a speedup of 1.11x.]

[Figure: normalized energy efficiency (0.9 to 1.5) against performance (1.0 to 1.5), with the isopower line marked.]

• Speedup: 1.11x
• Energy efficiency: 1.11x


Work-Sprinting in quicksort

[Figure: execution timelines for the big and little cores, with no AAWS techniques and with work-sprinting. Each timeline has an activity bar (busy or steal loop) and the DVFS controller decision (legend voltages: 0.70 V, 1.00 V, 1.04 V, 1.10 V, 1.24 V, 1.30 V). Work-sprinting gives a speedup of 1.34x.]

[Figure: normalized energy efficiency (0.9 to 1.5) against performance (1.0 to 1.5), with the isopower line marked.]

• Speedup: 1.34x
• Energy efficiency: 1.16x


Work-Mugging in radix sort

[Figure: execution timelines for the big and little cores, with no AAWS techniques and with work-mugging. Each timeline has an activity bar (busy or steal loop) and the DVFS controller decision (legend voltages: 0.70 V, 0.90 V, 1.00 V, 1.04 V, 1.24 V, 1.30 V). Work-mugging gives a speedup of 1.17x.]

[Figure: normalized energy efficiency (0.9 to 1.5) against performance (1.0 to 1.5), with the isopower line marked.]

• Speedup: 1.17x
• Energy efficiency: 1.40x


Evaluation of Complete AAWS Runtime

[Figure: normalized energy efficiency (0.9 to 1.5) against performance (1.0 to 1.5) for each application kernel, with the isopower line marked.]

Application Kernels
• pbbs-bfs
• pbbs-quicksort
• pbbs-samplesort
• pbbs-dictionary
• pbbs-convex-hull
• pbbs-radix-sort
• pbbs-knn
• pbbs-max-independent-set
• pbbs-nbody
• pbbs-remove-duplicates
• pbbs-suffix-array
• pbbs-spanning-tree
• cilk-cholesky
• cilk-cilksort
• cilk-heat
• cilk-knapsack
• cilk-matrix-multiply
• parsec-blackscholes
• unbalanced-tree-search

Performance: median 1.10x, max 1.32x
Energy efficiency: median 1.11x, max 1.53x


Take-Away Point

[Figure: diagram of work-stealing runtimes, static asymmetry, and dynamic asymmetry.]

Holistically combining

• work-stealing runtimes
• static asymmetry
• dynamic asymmetry

through the use of

• work-pacing
• work-sprinting
• work-mugging

can improve both performance and energy efficiency in future multicore systems

This work was partially supported by the NSF, AFOSR, and donations from Intel and Synopsys
