the bubblewrap many-core

129
http://iacoma.cs.uiuc.edu / University of Illinois The BubbleWrap Many-Core: Popping Cores for Sequential Acceleration Ulya Karpuzcu, Brian Greskamp, Josep Torrellas University of Illinois

Upload: others

Post on 03-Feb-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

http://iacoma.cs.uiuc.edu/

Universityof Illinois

The BubbleWrap Many-Core:Popping Cores for Sequential Acceleration

Ulya Karpuzcu, Brian Greskamp, Josep TorrellasUniversity of Illinois

Ulya Karpuzcu The BubbleWrap Many-Core

Problem: The Power Wall

2

Ulya Karpuzcu The BubbleWrap Many-Core

Problem: The Power Wall

2

Ideal Scaling

Ulya Karpuzcu The BubbleWrap Many-Core

Problem: The Power Wall

• Dynamic (switching) power density remains constant

2

Ideal Scaling

Ulya Karpuzcu The BubbleWrap Many-Core

Problem: The Power Wall

• Dynamic (switching) power density remains constant

2

Ideal Scaling

PDYN = #Devices x

Ulya Karpuzcu The BubbleWrap Many-Core

Problem: The Power Wall

• Dynamic (switching) power density remains constant

2

Ideal Scaling

Frequency of switchingPDYN = #Devices x x

Ulya Karpuzcu The BubbleWrap Many-Core

Problem: The Power Wall

• Dynamic (switching) power density remains constant

2

Ideal Scaling

Frequency of switchingPDYN = #Devices Energy per switchingx x

Ulya Karpuzcu The BubbleWrap Many-Core

Problem: The Power Wall

• Dynamic (switching) power density remains constant

2

Ideal Scaling

Frequency of switchingPDYN = #Devices Energy per switchingx x

Ulya Karpuzcu The BubbleWrap Many-Core

Problem: The Power Wall

• Dynamic (switching) power density remains constant

2

Ideal Scaling

Frequency of switchingPDYN = #Devices Energy per switchingx x

Ulya Karpuzcu The BubbleWrap Many-Core

∝Vdd2

Problem: The Power Wall

• Dynamic (switching) power density remains constant

2

Ideal Scaling

Frequency of switchingPDYN = #Devices Energy per switchingx x

Ulya Karpuzcu The BubbleWrap Many-Core

∝Vdd2

Problem: The Power Wall

• Dynamic (switching) power density remains constant

2

Ideal Scaling

Practical Scaling• Vdd has been scaling down slower than ideally

Frequency of switchingPDYN = #Devices Energy per switchingx x

Ulya Karpuzcu The BubbleWrap Many-Core

∝Vdd2

Problem: The Power Wall

• Dynamic (switching) power density remains constant

2

Ideal Scaling

Practical Scaling• Vdd has been scaling down slower than ideally

• Historically: Higher Vdd ≡ Higher performance

Frequency of switchingPDYN = #Devices Energy per switchingx x

Ulya Karpuzcu The BubbleWrap Many-Core

∝Vdd2

Problem: The Power Wall

• Dynamic (switching) power density remains constant

2

Ideal Scaling

Practical Scaling• Vdd has been scaling down slower than ideally

• Historically: Higher Vdd ≡ Higher performance

• Recently: Practically stagnated Vth scaling to control static power

Frequency of switchingPDYN = #Devices Energy per switchingx x

Ulya Karpuzcu The BubbleWrap Many-Core

∝Vdd2

Problem: The Power Wall

• Dynamic (switching) power density remains constant

2

Ideal Scaling

Practical Scaling• Vdd has been scaling down slower than ideally

• Historically: Higher Vdd ≡ Higher performance

• Recently: Practically stagnated Vth scaling to control static power

Dynamic Power Density is increasing

Frequency of switchingPDYN = #Devices Energy per switchingx x

Ulya Karpuzcu The BubbleWrap Many-Core

The Power Wall

3

Ulya Karpuzcu The BubbleWrap Many-Core

Num

ber o

f Cor

es p

er C

hip

Nor

mal

ized

to 2

008

2008 2013 2018 20220

5

10

15

20

25

The Power Wall

3

ITRS 2008

Ulya Karpuzcu The BubbleWrap Many-Core

Num

ber o

f Cor

es p

er C

hip

Nor

mal

ized

to 2

008

2008 2013 2018 20220

5

10

15

20

25

The Power Wall

3

Total

ITRS 2008

Ulya Karpuzcu The BubbleWrap Many-Core

Num

ber o

f Cor

es p

er C

hip

Nor

mal

ized

to 2

008

2008 2013 2018 20220

5

10

15

20

25

The Power Wall

3

Total

ITRS 2008

Ulya Karpuzcu The BubbleWrap Many-Core

Num

ber o

f Cor

es p

er C

hip

Nor

mal

ized

to 2

008

2008 2013 2018 20220

5

10

15

20

25

The Power Wall

3

TotalPowered-on for 100W

ITRS 2008

Ulya Karpuzcu The BubbleWrap Many-Core

Num

ber o

f Cor

es p

er C

hip

Nor

mal

ized

to 2

008

2008 2013 2018 20220

5

10

15

20

25

The Power Wall

3

TotalPowered-on for 100W

ITRS 2008

Ulya Karpuzcu The BubbleWrap Many-Core

Num

ber o

f Cor

es p

er C

hip

Nor

mal

ized

to 2

008

The Power Wall

4

TotalPowered on for 100W

ITRS 2008

0

5

10

15

20

25

2008 2013 2018 2022

Active CoresDormant Cores

Ulya Karpuzcu The BubbleWrap Many-Core

Num

ber o

f Cor

es p

er C

hip

Nor

mal

ized

to 2

008

The Power Wall

4

TotalPowered on for 100W

ITRS 2008

0

5

10

15

20

25

2008 2013 2018 2022

Active CoresDormant Cores

In 9 years

Ulya Karpuzcu The BubbleWrap Many-Core

Num

ber o

f Cor

es p

er C

hip

Nor

mal

ized

to 2

008

The Power Wall

4

TotalPowered on for 100W

ITRS 2008

0

5

10

15

20

25

2008 2013 2018 2022

Active CoresDormant Cores

Only ~25% of coresactive at any time

In 9 years

Ulya Karpuzcu The BubbleWrap Many-Core

The BubbleWrap Many-Core

5

Ulya Karpuzcu The BubbleWrap Many-Core

The BubbleWrap Many-Core

5

• Exploit dormant cores to accelerate sequential sections at the cost of a shorter per-core service life

Ulya Karpuzcu The BubbleWrap Many-Core

The BubbleWrap Many-Core

5

• Base: A homogeneous many-core

• Exploit dormant cores to accelerate sequential sections at the cost of a shorter per-core service life

Ulya Karpuzcu The BubbleWrap Many-Core

The BubbleWrap Many-Core

5

• Base: A homogeneous many-core

• Throughput Cores

• Exploit dormant cores to accelerate sequential sections at the cost of a shorter per-core service life

Ulya Karpuzcu The BubbleWrap Many-Core

The BubbleWrap Many-Core

5

• Base: A homogeneous many-core

• Throughput Cores

• Most energy-efficient cores

• Exploit dormant cores to accelerate sequential sections at the cost of a shorter per-core service life

Ulya Karpuzcu The BubbleWrap Many-Core

The BubbleWrap Many-Core

5

• Base: A homogeneous many-core

• Throughput Cores

• Most energy-efficient cores

• Run parallel sections

• Exploit dormant cores to accelerate sequential sections at the cost of a shorter per-core service life

Ulya Karpuzcu The BubbleWrap Many-Core

The BubbleWrap Many-Core

5

• Base: A homogeneous many-core

• Throughput Cores

• Most energy-efficient cores

• Run parallel sections

• Operate at nominal V/f

• Exploit dormant cores to accelerate sequential sections at the cost of a shorter per-core service life

Nom

inal

V/f

Ulya Karpuzcu The BubbleWrap Many-Core

The BubbleWrap Many-Core

5

• Base: A homogeneous many-core

• Throughput Cores

• Most energy-efficient cores

• Run parallel sections

• Operate at nominal V/f

• Expendable Cores

• Exploit dormant cores to accelerate sequential sections at the cost of a shorter per-core service life

Nom

inal

V/f

Ulya Karpuzcu The BubbleWrap Many-Core

The BubbleWrap Many-Core

5

• Base: A homogeneous many-core

• Throughput Cores

• Most energy-efficient cores

• Run parallel sections

• Operate at nominal V/f

• Expendable Cores

• Run sequential sections

• Exploit dormant cores to accelerate sequential sections at the cost of a shorter per-core service life

Nom

inal

V/f

Ulya Karpuzcu The BubbleWrap Many-Core

The BubbleWrap Many-Core

5

• Base: A homogeneous many-core

• Throughput Cores

• Most energy-efficient cores

• Run parallel sections

• Operate at nominal V/f

• Expendable Cores

• Run sequential sections

• Operate at elevated V/f

• Exploit dormant cores to accelerate sequential sections at the cost of a shorter per-core service life

Nom

inal

V/f

Hig

h V/

f

Ulya Karpuzcu The BubbleWrap Many-Core

The BubbleWrap Many-Core

5

• Base: A homogeneous many-core

• Throughput Cores

• Most energy-efficient cores

• Run parallel sections

• Operate at nominal V/f

• Expendable Cores

• Run sequential sections

• Operate at elevated V/f

• Discarded early due to shorter service life (Popped like BubbleWrap)

• Exploit dormant cores to accelerate sequential sections at the cost of a shorter per-core service life

Nom

inal

V/f

Hig

h V/

f

Ulya Karpuzcu The BubbleWrap Many-Core

Core Aging

6

Ulya Karpuzcu The BubbleWrap Many-Core

Core Aging

• Manifestation: Progressive slow-down in logic as the core is being used

6

Ulya Karpuzcu The BubbleWrap Many-Core

Core Aging

• Manifestation: Progressive slow-down in logic as the core is being used

• Main contributor: Bias Temperature Instability (BTI)

6

Ulya Karpuzcu The BubbleWrap Many-Core

Core Aging

• Manifestation: Progressive slow-down in logic as the core is being used

• Main contributor: Bias Temperature Instability (BTI)

• Induces increase in critical path delays ∝ timeconst.<1

6

Ulya Karpuzcu The BubbleWrap Many-Core

Core Aging

• Manifestation: Progressive slow-down in logic as the core is being used

• Main contributor: Bias Temperature Instability (BTI)

• Induces increase in critical path delays ∝ timeconst.<1

• Aging rate: Exponential dependence on Vdd and T

6

Ulya Karpuzcu The BubbleWrap Many-Core

τ0

Aging-induced Degradation

7

Crit

ical

Pat

h De

lay

VddNOM

SNOM0time

Vdd

0 SNOM time

Ulya Karpuzcu The BubbleWrap Many-Core

τ0

Aging-induced Degradation

7

Crit

ical

Pat

h De

lay

VddNOM

SNOM0time

Vdd

0 SNOM time

Ulya Karpuzcu The BubbleWrap Many-Core

τ0

Aging-induced Degradation

7

Crit

ical

Pat

h De

lay

VddNOM

SNOM0time

Vdd

0 SNOM time

Ulya Karpuzcu The BubbleWrap Many-Core

τ0

Aging-induced Degradation

7

Crit

ical

Pat

h De

lay

VddNOM

SNOM0time

Vdd

0 SNOM time

Ulya Karpuzcu The BubbleWrap Many-Core

τ0

Aging-induced Degradation

7

Crit

ical

Pat

h De

lay

VddNOM

SNOM0time

Vdd

0 SNOM time

Ulya Karpuzcu The BubbleWrap Many-Core

τ0

Aging-induced Degradation

7

Crit

ical

Pat

h De

lay

VddNOM

SNOM0time

Vdd

0 SNOM

• BTI-induced increase in critical path delays ∝ timeconst.<1

time

Ulya Karpuzcu The BubbleWrap Many-Core

τ0

Aging-induced Degradation

7

Crit

ical

Pat

h De

lay

VddNOM

SNOM0time

Vdd

0 SNOM

• BTI-induced increase in critical path delays ∝ timeconst.<1

• fNOM set by the delay at the end of the service-life (SNOM)

fNOM = 1/τD

time

τD

fNOM

Ulya Karpuzcu The BubbleWrap Many-Core

τ0

Aging-induced Degradation

7

Crit

ical

Pat

h De

lay

VddNOM

SNOM0time

Vdd

0 SNOM

• BTI-induced increase in critical path delays ∝ timeconst.<1

• fNOM set by the delay at the end of the service-life (SNOM)

fNOM = 1/τD

time

τD

Wasted Opportunity: Guardband

fNOM

Ulya Karpuzcu The BubbleWrap Many-Core

Impact of Operation at Higher V/f on Aging

8

time

Crit

ical

Pat

h De

lay

SNOM0time

Vdd

0 SNOM

VddNOM

Ulya Karpuzcu The BubbleWrap Many-Core

Impact of Operation at Higher V/f on Aging

8

time

Crit

ical

Pat

h De

lay

SNOM0time

Vdd

0 SNOM SSHORTSSHORT

VddNOM

Ulya Karpuzcu The BubbleWrap Many-Core

Impact of Operation at Higher V/f on Aging

8

time

Crit

ical

Pat

h De

lay

SNOM0time

Vdd

0 SNOM SSHORTSSHORT

VddOP

• Higher Vdd: VddOP >> VddNOM

VddNOM

Ulya Karpuzcu The BubbleWrap Many-Core

Impact of Operation at Higher V/f on Aging

8

time

Crit

ical

Pat

h De

lay

SNOM0time

Vdd

0 SNOM SSHORTSSHORT

VddOP

• Higher Vdd: VddOP >> VddNOM

VddNOM

Ulya Karpuzcu The BubbleWrap Many-Core

Impact of Operation at Higher V/f on Aging

8

time

Crit

ical

Pat

h De

lay

SNOM0time

Vdd

0 SNOM SSHORTSSHORT

VddOP

• Higher Vdd: VddOP >> VddNOM

• Result: Lower critical path delay; higher aging rate

VddNOM

Ulya Karpuzcu The BubbleWrap Many-Core

Impact of Operation at Higher V/f on Aging

8

time

Crit

ical

Pat

h De

lay

SNOM0time

Vdd

0 SNOM SSHORTSSHORT

VddOP τOP

• Higher Vdd: VddOP >> VddNOM

• Result: Lower critical path delay; higher aging rate

• Run at constant fOP = 1/τOP until SSHORT; then discard

VddNOM fOP

Ulya Karpuzcu The BubbleWrap Many-Core

Impact of Operation at Higher V/f on Aging

8

time

Crit

ical

Pat

h De

lay

SNOM0time

Vdd

0 SNOM SSHORTSSHORT

VddOP τOP

• Higher Vdd: VddOP >> VddNOM

• Result: Lower critical path delay; higher aging rate

• Run at constant fOP = 1/τOP until SSHORT; then discard

VddNOM fOP

Wasted Opportunity

Ulya Karpuzcu The BubbleWrap Many-Core

How to Manage Aging Optimally?

9

Wasted Opportunity

τ0

time

Crit

ical

Pat

h De

lay

SNOM0

τD

fNOM=1/τD

Ulya Karpuzcu The BubbleWrap Many-Core

How to Manage Aging Optimally?

9

Wasted Opportunity

Contribution: DVS for Aging Management (DVSAM)

τ0

time

Crit

ical

Pat

h De

lay

SNOM0

τD

fNOM=1/τD

Ulya Karpuzcu The BubbleWrap Many-Core

How to Manage Aging Optimally?

9

Wasted Opportunity

• Change Vdd with time to compensate for critical path degradation

Contribution: DVS for Aging Management (DVSAM)

τ0

time

Crit

ical

Pat

h De

lay

SNOM0

τD

fNOM=1/τD

Ulya Karpuzcu The BubbleWrap Many-Core

How to Manage Aging Optimally?

9

Wasted Opportunity

• Change Vdd with time to compensate for critical path degradation

• Enforce minimum Vdd needed for any f-target

Contribution: DVS for Aging Management (DVSAM)

τ0

time

Crit

ical

Pat

h De

lay

SNOM0

τD

fNOM=1/τD

Ulya Karpuzcu The BubbleWrap Many-Core

How to Manage Aging Optimally?

9

Wasted Opportunity

• DVSAM-Pow: Turn wasted opportunity to power efficiency

• Change Vdd with time to compensate for critical path degradation

• Enforce minimum Vdd needed for any f-target

Contribution: DVS for Aging Management (DVSAM)

τ0

time

Crit

ical

Pat

h De

lay

SNOM0

τD

fNOM=1/τD

Ulya Karpuzcu The BubbleWrap Many-Core

How to Manage Aging Optimally?

9

Wasted Opportunity

• DVSAM-Pow: Turn wasted opportunity to power efficiency

• DVSAM-Perf: Turn wasted opportunity to higher frequency

• Change Vdd with time to compensate for critical path degradation

• Enforce minimum Vdd needed for any f-target

Contribution: DVS for Aging Management (DVSAM)

τ0

time

Crit

ical

Pat

h De

lay

SNOM0

τD

fNOM=1/τD

Ulya Karpuzcu The BubbleWrap Many-Core 10

DVSAM-Pow

Ulya Karpuzcu The BubbleWrap Many-Core 10

DVSAM-PowIdea: Minimize power consumption at fNOM = 1/τD

Ulya Karpuzcu The BubbleWrap Many-Core 10

DVSAM-PowIdea: Minimize power consumption at fNOM = 1/τD

τ0

time

Crit

ical

Pat

h De

lay

SNOM0

VddNOM

time

Vdd

0 SNOM

Ulya Karpuzcu The BubbleWrap Many-Core 10

DVSAM-PowIdea: Minimize power consumption at fNOM = 1/τD

τ0

time

Crit

ical

Pat

h De

lay

SNOM0

VddNOM

time

Vdd

0 SNOM

• Critical path delays are kept at τD until SNOM: Run at fNOM

Ulya Karpuzcu The BubbleWrap Many-Core 10

DVSAM-PowIdea: Minimize power consumption at fNOM = 1/τD

τ0

time

Crit

ical

Pat

h De

lay

SNOM0

τDVddNOM

time

Vdd

0 SNOM

• Critical path delays are kept at τD until SNOM: Run at fNOM

Ulya Karpuzcu The BubbleWrap Many-Core 10

DVSAM-PowIdea: Minimize power consumption at fNOM = 1/τD

τ0

time

Crit

ical

Pat

h De

lay

SNOM0

τDVddNOM

time

Vdd

0 SNOM

VddOP

• Critical path delays are kept at τD until SNOM: Run at fNOM

• Start with low Vdd and increase slowly

Ulya Karpuzcu The BubbleWrap Many-Core 10

DVSAM-PowIdea: Minimize power consumption at fNOM = 1/τD

τ0

time

Crit

ical

Pat

h De

lay

SNOM0

τDVddNOM

time

Vdd

0 SNOM

VddOP

• Critical path delays are kept at τD until SNOM: Run at fNOM

• Start with low Vdd and increase slowly

Ulya Karpuzcu The BubbleWrap Many-Core 10

DVSAM-PowIdea: Minimize power consumption at fNOM = 1/τD

τ0

time

Crit

ical

Pat

h De

lay

SNOM0

τDVddNOM

time

Vdd

0 SNOM

VddOP

• Critical path delays are kept at τD until SNOM: Run at fNOM

• Start with low Vdd and increase slowly

Ulya Karpuzcu The BubbleWrap Many-Core 10

DVSAM-PowIdea: Minimize power consumption at fNOM = 1/τD

τ0

time

Crit

ical

Pat

h De

lay

SNOM0

τDVddNOM

time

Vdd

0 SNOM

VddOP

• Critical path delays are kept at τD until SNOM: Run at fNOM

• Start with low Vdd and increase slowly

Ulya Karpuzcu The BubbleWrap Many-Core 11

DVSAM-Pow

Ulya Karpuzcu The BubbleWrap Many-Core 11

• Vdd < VddNOM and f = fNOM throughout SNOM

DVSAM-Pow

Ulya Karpuzcu The BubbleWrap Many-Core 11

• Vdd < VddNOM and f = fNOM throughout SNOM

• Power savings due to Vdd < VddNOM

DVSAM-Pow

Ulya Karpuzcu The BubbleWrap Many-Core 11

• Vdd < VddNOM and f = fNOM throughout SNOM

• Power savings due to Vdd < VddNOM

➡ More cores active for the same P-budget

DVSAM-Pow

Ulya Karpuzcu The BubbleWrap Many-Core 11

• Vdd < VddNOM and f = fNOM throughout SNOM

• Power savings due to Vdd < VddNOM

➡ More cores active for the same P-budget➡ Increased throughput

DVSAM-Pow

Ulya Karpuzcu The BubbleWrap Many-Core 12

DVSAM-Perf

Ulya Karpuzcu The BubbleWrap Many-Core 12

DVSAM-PerfIdea: Maximize frequency for the same service life

Ulya Karpuzcu The BubbleWrap Many-Core 12

DVSAM-PerfIdea: Maximize frequency for the same service life

time

Crit

ical

Pat

h De

lay

SNOM0

τDVddNOM

time

Vdd

0 SNOM

τ0

Ulya Karpuzcu The BubbleWrap Many-Core 12

DVSAM-PerfIdea: Maximize frequency for the same service life

τOP

time

Crit

ical

Pat

h De

lay

SNOM0

τDVddNOM

time

Vdd

0 SNOM

• Shorter critical path delay τOP until SNOM: Run at higher f = 1 / τOP

τ0

Ulya Karpuzcu The BubbleWrap Many-Core 12

DVSAM-PerfIdea: Maximize frequency for the same service life

τOP

time

Crit

ical

Pat

h De

lay

SNOM0

τDVddOP

VddNOM

time

Vdd

0 SNOM

• Shorter critical path delay τOP until SNOM: Run at higher f = 1 / τOP

• Start with low Vdd and increase rapidly

τ0

Ulya Karpuzcu The BubbleWrap Many-Core 12

DVSAM-PerfIdea: Maximize frequency for the same service life

τOP

time

Crit

ical

Pat

h De

lay

SNOM0

τDVddOP

VddNOM

time

Vdd

0 SNOM

• Shorter critical path delay τOP until SNOM: Run at higher f = 1 / τOP

• Start with low Vdd and increase rapidly

τ0

Ulya Karpuzcu The BubbleWrap Many-Core 12

DVSAM-PerfIdea: Maximize frequency for the same service life

τOP

time

Crit

ical

Pat

h De

lay

SNOM0

τDVddOP

VddNOM

time

Vdd

0 SNOM

• Shorter critical path delay τOP until SNOM: Run at higher f = 1 / τOP

• Start with low Vdd and increase rapidly

τ0

Ulya Karpuzcu The BubbleWrap Many-Core 12

DVSAM-PerfIdea: Maximize frequency for the same service life

τOP

time

Crit

ical

Pat

h De

lay

SNOM0

τDVddOP

VddNOM

time

Vdd

0 SNOM

• Shorter critical path delay τOP until SNOM: Run at higher f = 1 / τOP

• Start with low Vdd and increase rapidly

τ0

Ulya Karpuzcu The BubbleWrap Many-Core 13

DVSAM-Perf for SSHORT

Ulya Karpuzcu The BubbleWrap Many-Core 13

DVSAM-Perf for SSHORT

Idea: Aggressive DVSAM-Perf for a short service life to get even higher performance

Ulya Karpuzcu The BubbleWrap Many-Core 13

τOP

time

Crit

ical

Pat

h De

lay

SNOM0

VddOP

timeVd

d0 SNOM

DVSAM-Perf for SSHORT

Idea: Aggressive DVSAM-Perf for a short service life to get even higher performance

Ulya Karpuzcu The BubbleWrap Many-Core 13

τOP

time

Crit

ical

Pat

h De

lay

SNOM0

VddOP

timeVd

d0 SNOM

DVSAM-Perf for SSHORT

Idea: Aggressive DVSAM-Perf for a short service life to get even higher performance

SSHORT SSHORT

Ulya Karpuzcu The BubbleWrap Many-Core 13

τOP

time

Crit

ical

Pat

h De

lay

SNOM0

VddOP

timeVd

d0 SNOM

DVSAM-Perf for SSHORT

Idea: Aggressive DVSAM-Perf for a short service life to get even higher performance

SSHORT SSHORT

τSH

Ulya Karpuzcu The BubbleWrap Many-Core 13

τOP

time

Crit

ical

Pat

h De

lay

SNOM0

VddOP

timeVd

d0 SNOM

DVSAM-Perf for SSHORT

Idea: Aggressive DVSAM-Perf for a short service life to get even higher performance

SSHORT SSHORT

τSH

VddSH

Ulya Karpuzcu The BubbleWrap Many-Core 13

τOP

time

Crit

ical

Pat

h De

lay

SNOM0

VddOP

timeVd

d0 SNOM

DVSAM-Perf for SSHORT

Idea: Aggressive DVSAM-Perf for a short service life to get even higher performance

SSHORT SSHORT

τSH

VddSH

• Even higher frequency than DVSAM-Perf for short service life

fSSHORT

fSNOM

Ulya Karpuzcu The BubbleWrap Many-Core 13

τOP

time

Crit

ical

Pat

h De

lay

SNOM0

VddOP

timeVd

d0 SNOM

DVSAM-Perf for SSHORT

Idea: Aggressive DVSAM-Perf for a short service life to get even higher performance

SSHORT SSHORT

τSH

VddSH

• Even higher frequency than DVSAM-Perf for short service life

fSSHORT

fSNOM

Ulya Karpuzcu The BubbleWrap Many-Core 13

τOP

time

Crit

ical

Pat

h De

lay

SNOM0

VddOP

timeVd

d0 SNOM

DVSAM-Perf for SSHORT

Idea: Aggressive DVSAM-Perf for a short service life to get even higher performance

SSHORT SSHORT

τSH

VddSH

• Even higher frequency than DVSAM-Perf for short service life

fSSHORT

fSNOM

Ulya Karpuzcu The BubbleWrap Many-Core

BubbleWrap Environments

14

Throughput Cores

Expendable Cores

Ulya Karpuzcu The BubbleWrap Many-Core

BubbleWrap Environments

14

• Two choices for Throughput CoresThroughput Cores

Expendable Cores

Ulya Karpuzcu The BubbleWrap Many-Core

BubbleWrap Environments

14

• Two choices for Throughput Cores

• Nominal operationThroughput Cores

Expendable Cores

Ulya Karpuzcu The BubbleWrap Many-Core

BubbleWrap Environments

14

• Two choices for Throughput Cores

• Nominal operation

• Use DVSAM-Pow and expand the set of throughout cores for the same power budget

Throughput Cores

Expendable Cores

Ulya Karpuzcu The BubbleWrap Many-Core

BubbleWrap Environments

14

• Two choices for Throughput Cores

• Nominal operation

• Use DVSAM-Pow and expand the set of throughout cores for the same power budget

Throughput Cores

Expendable Cores

• Two choices for Expendable Cores

Ulya Karpuzcu The BubbleWrap Many-Core

BubbleWrap Environments

14

• Two choices for Throughput Cores

• Nominal operation

• Use DVSAM-Pow and expand the set of throughout cores for the same power budget

Throughput Cores

Expendable Cores

• Two choices for Expendable Cores

• Higher, constant Vdd until SSHORT; then discard

Ulya Karpuzcu The BubbleWrap Many-Core

BubbleWrap Environments

14

• Two choices for Throughput Cores

• Nominal operation

• Use DVSAM-Pow and expand the set of throughout cores for the same power budget

Throughput Cores

Expendable Cores

• Two choices for Expendable Cores

• Higher, constant Vdd until SSHORT; then discard

• DVSAM-Perf until SSHORT; then discard

Ulya Karpuzcu The BubbleWrap Many-Core

Hardware Support?

15

Ulya Karpuzcu The BubbleWrap Many-Core

Hardware Support?• No change in the core architecture

15

Ulya Karpuzcu The BubbleWrap Many-Core

Hardware Support?• No change in the core architecture

• Need circuits to measure aging

15

Ulya Karpuzcu The BubbleWrap Many-Core

Hardware Support?• No change in the core architecture

• Need circuits to measure aging

• Need high-precision DVS

15

Ulya Karpuzcu The BubbleWrap Many-Core

Hardware Support?• No change in the core architecture

• Need circuits to measure aging

• Need high-precision DVS

• Clock and power distribution

15

Ulya Karpuzcu The BubbleWrap Many-Core

Hardware Support?• No change in the core architecture

• Need circuits to measure aging

• Need high-precision DVS

• Clock and power distribution• Two separate V/f domains: One for Expendable and one for

Throughput Cores

15

Ulya Karpuzcu The BubbleWrap Many-Core

BubbleWrap Evaluation

16

Ulya Karpuzcu The BubbleWrap Many-Core

BubbleWrap Evaluation• 32 core chip: NT = 16 Throughput and NE = 16 Expendable cores

16

Ulya Karpuzcu The BubbleWrap Many-Core

BubbleWrap Evaluation• 32 core chip: NT = 16 Throughput and NE = 16 Expendable cores

• 22nm high-k metal-gate process

16

Ulya Karpuzcu The BubbleWrap Many-Core

BubbleWrap Evaluation• 32 core chip: NT = 16 Throughput and NE = 16 Expendable cores

• 22nm high-k metal-gate process

• Multiprogrammed workload synthesized from SPEC2000

16

Ulya Karpuzcu The BubbleWrap Many-Core

BubbleWrap Evaluation• 32 core chip: NT = 16 Throughput and NE = 16 Expendable cores

• 22nm high-k metal-gate process

• Multiprogrammed workload synthesized from SPEC2000

• SESC enhanced by a power & thermal model

16

Ulya Karpuzcu The BubbleWrap Many-Core

Frequency Gains of Sequential Section

17

Ulya Karpuzcu The BubbleWrap Many-Core

Frequency Gains of Sequential Section

17

1 0.8 0.6 0.4 0.20.90

0.95

1.00

1.05

1.10

1.15

1.20

Nor

mal

ized

Freq

uenc

y G

ain

Sequential Fraction (LSEQ)

Ulya Karpuzcu The BubbleWrap Many-Core

Frequency Gains of Sequential Section

17

Base

1 0.8 0.6 0.4 0.20.90

0.95

1.00

1.05

1.10

1.15

1.20

Nor

mal

ized

Freq

uenc

y G

ain

Sequential Fraction (LSEQ)

Ulya Karpuzcu The BubbleWrap Many-Core

Frequency Gains of Sequential Section

18

Base DVSAM-Perf for SSHORT

0.90

0.95

1.00

1.05

1.10

1.15

1.20

1 0.8 0.6 0.4 0.2Nor

mal

ized

Freq

uenc

y G

ain

Sequential Fraction (LSEQ)

Ulya Karpuzcu The BubbleWrap Many-Core

Frequency Gains of Sequential Section

18

• Large f gains are feasible

Base DVSAM-Perf for SSHORT

0.90

0.95

1.00

1.05

1.10

1.15

1.20

1 0.8 0.6 0.4 0.2Nor

mal

ized

Freq

uenc

y G

ain

Sequential Fraction (LSEQ)

Ulya Karpuzcu The BubbleWrap Many-Core

Frequency Gains of Sequential Section

18

• Large f gains are feasible• f increases with smaller sequential section

Base DVSAM-Perf for SSHORT

0.90

0.95

1.00

1.05

1.10

1.15

1.20

1 0.8 0.6 0.4 0.2Nor

mal

ized

Freq

uenc

y G

ain

Sequential Fraction (LSEQ)

Ulya Karpuzcu The BubbleWrap Many-Core

Frequency Gains of Sequential Section

18

• Large f gains are feasible• f increases with smaller sequential section• For DVSAM-Perf, each expendable core runs for LSEQ/NE x SNOM

Base DVSAM-Perf for SSHORT

0.90

0.95

1.00

1.05

1.10

1.15

1.20

1 0.8 0.6 0.4 0.2Nor

mal

ized

Freq

uenc

y G

ain

Sequential Fraction (LSEQ)

Ulya Karpuzcu The BubbleWrap Many-Core

Power Consumption of Sequential Section

19

Ulya Karpuzcu The BubbleWrap Many-Core

Power Consumption of Sequential Section

19

• Each Expendable core has max P budget of two cores

Ulya Karpuzcu The BubbleWrap Many-Core

1 0.8 0.6 0.4 0.2 00

0.5

1.0

1.5

2.0

Nor

mal

ized

Pow

er C

onsu

mpt

ion

Sequential Fraction (LSEQ)

Power Consumption of Sequential Section

19

• Each Expendable core has max P budget of two cores

Ulya Karpuzcu The BubbleWrap Many-Core

1 0.8 0.6 0.4 0.2 00

0.5

1.0

1.5

2.0

Nor

mal

ized

Pow

er C

onsu

mpt

ion

Sequential Fraction (LSEQ)

Power Consumption of Sequential Section

19

• Each Expendable core has max P budget of two cores

Base

Ulya Karpuzcu The BubbleWrap Many-Core

0

0.5

1.0

1.5

2.0

1 0.8 0.6 0.4 0.2 0

Nor

mal

ized

Pow

er C

onsu

mpt

ion

Sequential Fraction (LSEQ)

Power Consumption of Sequential Section

20

• Each Expendable core has max P budget of two cores

Base DVSAM-Perf for SSHORT

Ulya Karpuzcu The BubbleWrap Many-Core

0

0.5

1.0

1.5

2.0

1 0.8 0.6 0.4 0.2 0

Nor

mal

ized

Pow

er C

onsu

mpt

ion

Sequential Fraction (LSEQ)

Power Consumption of Sequential Section

20

• Each Expendable core has max P budget of two cores

• Tolerable power cost for the frequency gains

Base DVSAM-Perf for SSHORT

Ulya Karpuzcu The BubbleWrap Many-Core

Conclusion

21

The BubbleWrap Many-Core: Exploiting dormant cores for sequential acceleration

Ulya Karpuzcu The BubbleWrap Many-Core

Conclusion

• Simple homogeneous design

21

The BubbleWrap Many-Core: Exploiting dormant cores for sequential acceleration

Ulya Karpuzcu The BubbleWrap Many-Core

Conclusion

• Simple homogeneous design

• No architectural or software changes

21

The BubbleWrap Many-Core: Exploiting dormant cores for sequential acceleration

Ulya Karpuzcu The BubbleWrap Many-Core

Conclusion

• Simple homogeneous design

• No architectural or software changes

• Improves sequential and parallel performance

21

The BubbleWrap Many-Core: Exploiting dormant cores for sequential acceleration

Ulya Karpuzcu The BubbleWrap Many-Core

Conclusion

• Simple homogeneous design

• No architectural or software changes

• Improves sequential and parallel performance

• Fully sequential applications at 16% higher f

21

The BubbleWrap Many-Core: Exploiting dormant cores for sequential acceleration

Ulya Karpuzcu The BubbleWrap Many-Core

Conclusion

• Simple homogeneous design

• No architectural or software changes

• Improves sequential and parallel performance

• Fully sequential applications at 16% higher f

• Fully parallel applications at 30% higher throughput

21

The BubbleWrap Many-Core: Exploiting dormant cores for sequential acceleration

http://iacoma.cs.uiuc.edu/

Universityof Illinois

The BubbleWrap Many-Core:Popping Cores for Sequential Acceleration

Ulya Karpuzcu, Brian Greskamp, Josep TorrellasUniversity of Illinois