the bubblewrap many-core
TRANSCRIPT
http://iacoma.cs.uiuc.edu/
Universityof Illinois
The BubbleWrap Many-Core:Popping Cores for Sequential Acceleration
Ulya Karpuzcu, Brian Greskamp, Josep TorrellasUniversity of Illinois
Ulya Karpuzcu The BubbleWrap Many-Core
Problem: The Power Wall
• Dynamic (switching) power density remains constant
2
Ideal Scaling
Ulya Karpuzcu The BubbleWrap Many-Core
Problem: The Power Wall
• Dynamic (switching) power density remains constant
2
Ideal Scaling
PDYN = #Devices x
Ulya Karpuzcu The BubbleWrap Many-Core
Problem: The Power Wall
• Dynamic (switching) power density remains constant
2
Ideal Scaling
Frequency of switchingPDYN = #Devices x x
Ulya Karpuzcu The BubbleWrap Many-Core
Problem: The Power Wall
• Dynamic (switching) power density remains constant
2
Ideal Scaling
Frequency of switchingPDYN = #Devices Energy per switchingx x
Ulya Karpuzcu The BubbleWrap Many-Core
Problem: The Power Wall
• Dynamic (switching) power density remains constant
2
Ideal Scaling
Frequency of switchingPDYN = #Devices Energy per switchingx x
Ulya Karpuzcu The BubbleWrap Many-Core
Problem: The Power Wall
• Dynamic (switching) power density remains constant
2
Ideal Scaling
Frequency of switchingPDYN = #Devices Energy per switchingx x
Ulya Karpuzcu The BubbleWrap Many-Core
∝Vdd2
Problem: The Power Wall
• Dynamic (switching) power density remains constant
2
Ideal Scaling
Frequency of switchingPDYN = #Devices Energy per switchingx x
Ulya Karpuzcu The BubbleWrap Many-Core
∝Vdd2
Problem: The Power Wall
• Dynamic (switching) power density remains constant
2
Ideal Scaling
Practical Scaling• Vdd has been scaling down slower than ideally
Frequency of switchingPDYN = #Devices Energy per switchingx x
Ulya Karpuzcu The BubbleWrap Many-Core
∝Vdd2
Problem: The Power Wall
• Dynamic (switching) power density remains constant
2
Ideal Scaling
Practical Scaling• Vdd has been scaling down slower than ideally
• Historically: Higher Vdd ≡ Higher performance
Frequency of switchingPDYN = #Devices Energy per switchingx x
Ulya Karpuzcu The BubbleWrap Many-Core
∝Vdd2
Problem: The Power Wall
• Dynamic (switching) power density remains constant
2
Ideal Scaling
Practical Scaling• Vdd has been scaling down slower than ideally
• Historically: Higher Vdd ≡ Higher performance
• Recently: Practically stagnated Vth scaling to control static power
Frequency of switchingPDYN = #Devices Energy per switchingx x
Ulya Karpuzcu The BubbleWrap Many-Core
∝Vdd2
Problem: The Power Wall
• Dynamic (switching) power density remains constant
2
Ideal Scaling
Practical Scaling• Vdd has been scaling down slower than ideally
• Historically: Higher Vdd ≡ Higher performance
• Recently: Practically stagnated Vth scaling to control static power
Dynamic Power Density is increasing
Frequency of switchingPDYN = #Devices Energy per switchingx x
Ulya Karpuzcu The BubbleWrap Many-Core
Num
ber o
f Cor
es p
er C
hip
Nor
mal
ized
to 2
008
2008 2013 2018 20220
5
10
15
20
25
The Power Wall
3
ITRS 2008
Ulya Karpuzcu The BubbleWrap Many-Core
Num
ber o
f Cor
es p
er C
hip
Nor
mal
ized
to 2
008
2008 2013 2018 20220
5
10
15
20
25
The Power Wall
3
Total
ITRS 2008
Ulya Karpuzcu The BubbleWrap Many-Core
Num
ber o
f Cor
es p
er C
hip
Nor
mal
ized
to 2
008
2008 2013 2018 20220
5
10
15
20
25
The Power Wall
3
Total
ITRS 2008
Ulya Karpuzcu The BubbleWrap Many-Core
Num
ber o
f Cor
es p
er C
hip
Nor
mal
ized
to 2
008
2008 2013 2018 20220
5
10
15
20
25
The Power Wall
3
TotalPowered-on for 100W
ITRS 2008
Ulya Karpuzcu The BubbleWrap Many-Core
Num
ber o
f Cor
es p
er C
hip
Nor
mal
ized
to 2
008
2008 2013 2018 20220
5
10
15
20
25
The Power Wall
3
TotalPowered-on for 100W
ITRS 2008
Ulya Karpuzcu The BubbleWrap Many-Core
Num
ber o
f Cor
es p
er C
hip
Nor
mal
ized
to 2
008
The Power Wall
4
TotalPowered on for 100W
ITRS 2008
0
5
10
15
20
25
2008 2013 2018 2022
Active CoresDormant Cores
Ulya Karpuzcu The BubbleWrap Many-Core
Num
ber o
f Cor
es p
er C
hip
Nor
mal
ized
to 2
008
The Power Wall
4
TotalPowered on for 100W
ITRS 2008
0
5
10
15
20
25
2008 2013 2018 2022
Active CoresDormant Cores
In 9 years
Ulya Karpuzcu The BubbleWrap Many-Core
Num
ber o
f Cor
es p
er C
hip
Nor
mal
ized
to 2
008
The Power Wall
4
TotalPowered on for 100W
ITRS 2008
0
5
10
15
20
25
2008 2013 2018 2022
Active CoresDormant Cores
Only ~25% of coresactive at any time
In 9 years
Ulya Karpuzcu The BubbleWrap Many-Core
The BubbleWrap Many-Core
5
• Exploit dormant cores to accelerate sequential sections at the cost of a shorter per-core service life
Ulya Karpuzcu The BubbleWrap Many-Core
The BubbleWrap Many-Core
5
• Base: A homogeneous many-core
• Exploit dormant cores to accelerate sequential sections at the cost of a shorter per-core service life
Ulya Karpuzcu The BubbleWrap Many-Core
The BubbleWrap Many-Core
5
• Base: A homogeneous many-core
• Throughput Cores
• Exploit dormant cores to accelerate sequential sections at the cost of a shorter per-core service life
Ulya Karpuzcu The BubbleWrap Many-Core
The BubbleWrap Many-Core
5
• Base: A homogeneous many-core
• Throughput Cores
• Most energy-efficient cores
• Exploit dormant cores to accelerate sequential sections at the cost of a shorter per-core service life
Ulya Karpuzcu The BubbleWrap Many-Core
The BubbleWrap Many-Core
5
• Base: A homogeneous many-core
• Throughput Cores
• Most energy-efficient cores
• Run parallel sections
• Exploit dormant cores to accelerate sequential sections at the cost of a shorter per-core service life
Ulya Karpuzcu The BubbleWrap Many-Core
The BubbleWrap Many-Core
5
• Base: A homogeneous many-core
• Throughput Cores
• Most energy-efficient cores
• Run parallel sections
• Operate at nominal V/f
• Exploit dormant cores to accelerate sequential sections at the cost of a shorter per-core service life
Nom
inal
V/f
Ulya Karpuzcu The BubbleWrap Many-Core
The BubbleWrap Many-Core
5
• Base: A homogeneous many-core
• Throughput Cores
• Most energy-efficient cores
• Run parallel sections
• Operate at nominal V/f
• Expendable Cores
• Exploit dormant cores to accelerate sequential sections at the cost of a shorter per-core service life
Nom
inal
V/f
Ulya Karpuzcu The BubbleWrap Many-Core
The BubbleWrap Many-Core
5
• Base: A homogeneous many-core
• Throughput Cores
• Most energy-efficient cores
• Run parallel sections
• Operate at nominal V/f
• Expendable Cores
• Run sequential sections
• Exploit dormant cores to accelerate sequential sections at the cost of a shorter per-core service life
Nom
inal
V/f
Ulya Karpuzcu The BubbleWrap Many-Core
The BubbleWrap Many-Core
5
• Base: A homogeneous many-core
• Throughput Cores
• Most energy-efficient cores
• Run parallel sections
• Operate at nominal V/f
• Expendable Cores
• Run sequential sections
• Operate at elevated V/f
• Exploit dormant cores to accelerate sequential sections at the cost of a shorter per-core service life
Nom
inal
V/f
Hig
h V/
f
Ulya Karpuzcu The BubbleWrap Many-Core
The BubbleWrap Many-Core
5
• Base: A homogeneous many-core
• Throughput Cores
• Most energy-efficient cores
• Run parallel sections
• Operate at nominal V/f
• Expendable Cores
• Run sequential sections
• Operate at elevated V/f
• Discarded early due to shorter service life (Popped like BubbleWrap)
• Exploit dormant cores to accelerate sequential sections at the cost of a shorter per-core service life
Nom
inal
V/f
Hig
h V/
f
Ulya Karpuzcu The BubbleWrap Many-Core
Core Aging
• Manifestation: Progressive slow-down in logic as the core is being used
6
Ulya Karpuzcu The BubbleWrap Many-Core
Core Aging
• Manifestation: Progressive slow-down in logic as the core is being used
• Main contributor: Bias Temperature Instability (BTI)
6
Ulya Karpuzcu The BubbleWrap Many-Core
Core Aging
• Manifestation: Progressive slow-down in logic as the core is being used
• Main contributor: Bias Temperature Instability (BTI)
• Induces increase in critical path delays ∝ timeconst.<1
6
Ulya Karpuzcu The BubbleWrap Many-Core
Core Aging
• Manifestation: Progressive slow-down in logic as the core is being used
• Main contributor: Bias Temperature Instability (BTI)
• Induces increase in critical path delays ∝ timeconst.<1
• Aging rate: Exponential dependence on Vdd and T
6
Ulya Karpuzcu The BubbleWrap Many-Core
τ0
Aging-induced Degradation
7
Crit
ical
Pat
h De
lay
VddNOM
SNOM0time
Vdd
0 SNOM time
Ulya Karpuzcu The BubbleWrap Many-Core
τ0
Aging-induced Degradation
7
Crit
ical
Pat
h De
lay
VddNOM
SNOM0time
Vdd
0 SNOM time
Ulya Karpuzcu The BubbleWrap Many-Core
τ0
Aging-induced Degradation
7
Crit
ical
Pat
h De
lay
VddNOM
SNOM0time
Vdd
0 SNOM time
Ulya Karpuzcu The BubbleWrap Many-Core
τ0
Aging-induced Degradation
7
Crit
ical
Pat
h De
lay
VddNOM
SNOM0time
Vdd
0 SNOM time
Ulya Karpuzcu The BubbleWrap Many-Core
τ0
Aging-induced Degradation
7
Crit
ical
Pat
h De
lay
VddNOM
SNOM0time
Vdd
0 SNOM time
Ulya Karpuzcu The BubbleWrap Many-Core
τ0
Aging-induced Degradation
7
Crit
ical
Pat
h De
lay
VddNOM
SNOM0time
Vdd
0 SNOM
• BTI-induced increase in critical path delays ∝ timeconst.<1
time
Ulya Karpuzcu The BubbleWrap Many-Core
τ0
Aging-induced Degradation
7
Crit
ical
Pat
h De
lay
VddNOM
SNOM0time
Vdd
0 SNOM
• BTI-induced increase in critical path delays ∝ timeconst.<1
• fNOM set by the delay at the end of the service-life (SNOM)
fNOM = 1/τD
time
τD
fNOM
Ulya Karpuzcu The BubbleWrap Many-Core
τ0
Aging-induced Degradation
7
Crit
ical
Pat
h De
lay
VddNOM
SNOM0time
Vdd
0 SNOM
• BTI-induced increase in critical path delays ∝ timeconst.<1
• fNOM set by the delay at the end of the service-life (SNOM)
fNOM = 1/τD
time
τD
Wasted Opportunity: Guardband
fNOM
Ulya Karpuzcu The BubbleWrap Many-Core
Impact of Operation at Higher V/f on Aging
8
time
Crit
ical
Pat
h De
lay
SNOM0time
Vdd
0 SNOM
VddNOM
Ulya Karpuzcu The BubbleWrap Many-Core
Impact of Operation at Higher V/f on Aging
8
time
Crit
ical
Pat
h De
lay
SNOM0time
Vdd
0 SNOM SSHORTSSHORT
VddNOM
Ulya Karpuzcu The BubbleWrap Many-Core
Impact of Operation at Higher V/f on Aging
8
time
Crit
ical
Pat
h De
lay
SNOM0time
Vdd
0 SNOM SSHORTSSHORT
VddOP
• Higher Vdd: VddOP >> VddNOM
VddNOM
Ulya Karpuzcu The BubbleWrap Many-Core
Impact of Operation at Higher V/f on Aging
8
time
Crit
ical
Pat
h De
lay
SNOM0time
Vdd
0 SNOM SSHORTSSHORT
VddOP
• Higher Vdd: VddOP >> VddNOM
VddNOM
Ulya Karpuzcu The BubbleWrap Many-Core
Impact of Operation at Higher V/f on Aging
8
time
Crit
ical
Pat
h De
lay
SNOM0time
Vdd
0 SNOM SSHORTSSHORT
VddOP
• Higher Vdd: VddOP >> VddNOM
• Result: Lower critical path delay; higher aging rate
VddNOM
Ulya Karpuzcu The BubbleWrap Many-Core
Impact of Operation at Higher V/f on Aging
8
time
Crit
ical
Pat
h De
lay
SNOM0time
Vdd
0 SNOM SSHORTSSHORT
VddOP τOP
• Higher Vdd: VddOP >> VddNOM
• Result: Lower critical path delay; higher aging rate
• Run at constant fOP = 1/τOP until SSHORT; then discard
VddNOM fOP
Ulya Karpuzcu The BubbleWrap Many-Core
Impact of Operation at Higher V/f on Aging
8
time
Crit
ical
Pat
h De
lay
SNOM0time
Vdd
0 SNOM SSHORTSSHORT
VddOP τOP
• Higher Vdd: VddOP >> VddNOM
• Result: Lower critical path delay; higher aging rate
• Run at constant fOP = 1/τOP until SSHORT; then discard
VddNOM fOP
Wasted Opportunity
Ulya Karpuzcu The BubbleWrap Many-Core
How to Manage Aging Optimally?
9
Wasted Opportunity
τ0
time
Crit
ical
Pat
h De
lay
SNOM0
τD
fNOM=1/τD
Ulya Karpuzcu The BubbleWrap Many-Core
How to Manage Aging Optimally?
9
Wasted Opportunity
Contribution: DVS for Aging Management (DVSAM)
τ0
time
Crit
ical
Pat
h De
lay
SNOM0
τD
fNOM=1/τD
Ulya Karpuzcu The BubbleWrap Many-Core
How to Manage Aging Optimally?
9
Wasted Opportunity
• Change Vdd with time to compensate for critical path degradation
Contribution: DVS for Aging Management (DVSAM)
τ0
time
Crit
ical
Pat
h De
lay
SNOM0
τD
fNOM=1/τD
Ulya Karpuzcu The BubbleWrap Many-Core
How to Manage Aging Optimally?
9
Wasted Opportunity
• Change Vdd with time to compensate for critical path degradation
• Enforce minimum Vdd needed for any f-target
Contribution: DVS for Aging Management (DVSAM)
τ0
time
Crit
ical
Pat
h De
lay
SNOM0
τD
fNOM=1/τD
Ulya Karpuzcu The BubbleWrap Many-Core
How to Manage Aging Optimally?
9
Wasted Opportunity
• DVSAM-Pow: Turn wasted opportunity to power efficiency
• Change Vdd with time to compensate for critical path degradation
• Enforce minimum Vdd needed for any f-target
Contribution: DVS for Aging Management (DVSAM)
τ0
time
Crit
ical
Pat
h De
lay
SNOM0
τD
fNOM=1/τD
Ulya Karpuzcu The BubbleWrap Many-Core
How to Manage Aging Optimally?
9
Wasted Opportunity
• DVSAM-Pow: Turn wasted opportunity to power efficiency
• DVSAM-Perf: Turn wasted opportunity to higher frequency
• Change Vdd with time to compensate for critical path degradation
• Enforce minimum Vdd needed for any f-target
Contribution: DVS for Aging Management (DVSAM)
τ0
time
Crit
ical
Pat
h De
lay
SNOM0
τD
fNOM=1/τD
Ulya Karpuzcu The BubbleWrap Many-Core 10
DVSAM-PowIdea: Minimize power consumption at fNOM = 1/τD
τ0
time
Crit
ical
Pat
h De
lay
SNOM0
VddNOM
time
Vdd
0 SNOM
Ulya Karpuzcu The BubbleWrap Many-Core 10
DVSAM-PowIdea: Minimize power consumption at fNOM = 1/τD
τ0
time
Crit
ical
Pat
h De
lay
SNOM0
VddNOM
time
Vdd
0 SNOM
• Critical path delays are kept at τD until SNOM: Run at fNOM
Ulya Karpuzcu The BubbleWrap Many-Core 10
DVSAM-PowIdea: Minimize power consumption at fNOM = 1/τD
τ0
time
Crit
ical
Pat
h De
lay
SNOM0
τDVddNOM
time
Vdd
0 SNOM
• Critical path delays are kept at τD until SNOM: Run at fNOM
Ulya Karpuzcu The BubbleWrap Many-Core 10
DVSAM-PowIdea: Minimize power consumption at fNOM = 1/τD
τ0
time
Crit
ical
Pat
h De
lay
SNOM0
τDVddNOM
time
Vdd
0 SNOM
VddOP
• Critical path delays are kept at τD until SNOM: Run at fNOM
• Start with low Vdd and increase slowly
Ulya Karpuzcu The BubbleWrap Many-Core 10
DVSAM-PowIdea: Minimize power consumption at fNOM = 1/τD
τ0
time
Crit
ical
Pat
h De
lay
SNOM0
τDVddNOM
time
Vdd
0 SNOM
VddOP
• Critical path delays are kept at τD until SNOM: Run at fNOM
• Start with low Vdd and increase slowly
Ulya Karpuzcu The BubbleWrap Many-Core 10
DVSAM-PowIdea: Minimize power consumption at fNOM = 1/τD
τ0
time
Crit
ical
Pat
h De
lay
SNOM0
τDVddNOM
time
Vdd
0 SNOM
VddOP
• Critical path delays are kept at τD until SNOM: Run at fNOM
• Start with low Vdd and increase slowly
Ulya Karpuzcu The BubbleWrap Many-Core 10
DVSAM-PowIdea: Minimize power consumption at fNOM = 1/τD
τ0
time
Crit
ical
Pat
h De
lay
SNOM0
τDVddNOM
time
Vdd
0 SNOM
VddOP
• Critical path delays are kept at τD until SNOM: Run at fNOM
• Start with low Vdd and increase slowly
Ulya Karpuzcu The BubbleWrap Many-Core 11
• Vdd < VddNOM and f = fNOM throughout SNOM
• Power savings due to Vdd < VddNOM
DVSAM-Pow
Ulya Karpuzcu The BubbleWrap Many-Core 11
• Vdd < VddNOM and f = fNOM throughout SNOM
• Power savings due to Vdd < VddNOM
➡ More cores active for the same P-budget
DVSAM-Pow
Ulya Karpuzcu The BubbleWrap Many-Core 11
• Vdd < VddNOM and f = fNOM throughout SNOM
• Power savings due to Vdd < VddNOM
➡ More cores active for the same P-budget➡ Increased throughput
DVSAM-Pow
Ulya Karpuzcu The BubbleWrap Many-Core 12
DVSAM-PerfIdea: Maximize frequency for the same service life
Ulya Karpuzcu The BubbleWrap Many-Core 12
DVSAM-PerfIdea: Maximize frequency for the same service life
time
Crit
ical
Pat
h De
lay
SNOM0
τDVddNOM
time
Vdd
0 SNOM
τ0
Ulya Karpuzcu The BubbleWrap Many-Core 12
DVSAM-PerfIdea: Maximize frequency for the same service life
τOP
time
Crit
ical
Pat
h De
lay
SNOM0
τDVddNOM
time
Vdd
0 SNOM
• Shorter critical path delay τOP until SNOM: Run at higher f = 1 / τOP
τ0
Ulya Karpuzcu The BubbleWrap Many-Core 12
DVSAM-PerfIdea: Maximize frequency for the same service life
τOP
time
Crit
ical
Pat
h De
lay
SNOM0
τDVddOP
VddNOM
time
Vdd
0 SNOM
• Shorter critical path delay τOP until SNOM: Run at higher f = 1 / τOP
• Start with low Vdd and increase rapidly
τ0
Ulya Karpuzcu The BubbleWrap Many-Core 12
DVSAM-PerfIdea: Maximize frequency for the same service life
τOP
time
Crit
ical
Pat
h De
lay
SNOM0
τDVddOP
VddNOM
time
Vdd
0 SNOM
• Shorter critical path delay τOP until SNOM: Run at higher f = 1 / τOP
• Start with low Vdd and increase rapidly
τ0
Ulya Karpuzcu The BubbleWrap Many-Core 12
DVSAM-PerfIdea: Maximize frequency for the same service life
τOP
time
Crit
ical
Pat
h De
lay
SNOM0
τDVddOP
VddNOM
time
Vdd
0 SNOM
• Shorter critical path delay τOP until SNOM: Run at higher f = 1 / τOP
• Start with low Vdd and increase rapidly
τ0
Ulya Karpuzcu The BubbleWrap Many-Core 12
DVSAM-PerfIdea: Maximize frequency for the same service life
τOP
time
Crit
ical
Pat
h De
lay
SNOM0
τDVddOP
VddNOM
time
Vdd
0 SNOM
• Shorter critical path delay τOP until SNOM: Run at higher f = 1 / τOP
• Start with low Vdd and increase rapidly
τ0
Ulya Karpuzcu The BubbleWrap Many-Core 13
DVSAM-Perf for SSHORT
Idea: Aggressive DVSAM-Perf for a short service life to get even higher performance
Ulya Karpuzcu The BubbleWrap Many-Core 13
τOP
time
Crit
ical
Pat
h De
lay
SNOM0
VddOP
timeVd
d0 SNOM
DVSAM-Perf for SSHORT
Idea: Aggressive DVSAM-Perf for a short service life to get even higher performance
Ulya Karpuzcu The BubbleWrap Many-Core 13
τOP
time
Crit
ical
Pat
h De
lay
SNOM0
VddOP
timeVd
d0 SNOM
DVSAM-Perf for SSHORT
Idea: Aggressive DVSAM-Perf for a short service life to get even higher performance
SSHORT SSHORT
Ulya Karpuzcu The BubbleWrap Many-Core 13
τOP
time
Crit
ical
Pat
h De
lay
SNOM0
VddOP
timeVd
d0 SNOM
DVSAM-Perf for SSHORT
Idea: Aggressive DVSAM-Perf for a short service life to get even higher performance
SSHORT SSHORT
τSH
Ulya Karpuzcu The BubbleWrap Many-Core 13
τOP
time
Crit
ical
Pat
h De
lay
SNOM0
VddOP
timeVd
d0 SNOM
DVSAM-Perf for SSHORT
Idea: Aggressive DVSAM-Perf for a short service life to get even higher performance
SSHORT SSHORT
τSH
VddSH
Ulya Karpuzcu The BubbleWrap Many-Core 13
τOP
time
Crit
ical
Pat
h De
lay
SNOM0
VddOP
timeVd
d0 SNOM
DVSAM-Perf for SSHORT
Idea: Aggressive DVSAM-Perf for a short service life to get even higher performance
SSHORT SSHORT
τSH
VddSH
• Even higher frequency than DVSAM-Perf for short service life
fSSHORT
fSNOM
Ulya Karpuzcu The BubbleWrap Many-Core 13
τOP
time
Crit
ical
Pat
h De
lay
SNOM0
VddOP
timeVd
d0 SNOM
DVSAM-Perf for SSHORT
Idea: Aggressive DVSAM-Perf for a short service life to get even higher performance
SSHORT SSHORT
τSH
VddSH
• Even higher frequency than DVSAM-Perf for short service life
fSSHORT
fSNOM
Ulya Karpuzcu The BubbleWrap Many-Core 13
τOP
time
Crit
ical
Pat
h De
lay
SNOM0
VddOP
timeVd
d0 SNOM
DVSAM-Perf for SSHORT
Idea: Aggressive DVSAM-Perf for a short service life to get even higher performance
SSHORT SSHORT
τSH
VddSH
• Even higher frequency than DVSAM-Perf for short service life
fSSHORT
fSNOM
Ulya Karpuzcu The BubbleWrap Many-Core
BubbleWrap Environments
14
• Two choices for Throughput CoresThroughput Cores
Expendable Cores
Ulya Karpuzcu The BubbleWrap Many-Core
BubbleWrap Environments
14
• Two choices for Throughput Cores
• Nominal operationThroughput Cores
Expendable Cores
Ulya Karpuzcu The BubbleWrap Many-Core
BubbleWrap Environments
14
• Two choices for Throughput Cores
• Nominal operation
• Use DVSAM-Pow and expand the set of throughout cores for the same power budget
Throughput Cores
Expendable Cores
Ulya Karpuzcu The BubbleWrap Many-Core
BubbleWrap Environments
14
• Two choices for Throughput Cores
• Nominal operation
• Use DVSAM-Pow and expand the set of throughout cores for the same power budget
Throughput Cores
Expendable Cores
• Two choices for Expendable Cores
Ulya Karpuzcu The BubbleWrap Many-Core
BubbleWrap Environments
14
• Two choices for Throughput Cores
• Nominal operation
• Use DVSAM-Pow and expand the set of throughout cores for the same power budget
Throughput Cores
Expendable Cores
• Two choices for Expendable Cores
• Higher, constant Vdd until SSHORT; then discard
Ulya Karpuzcu The BubbleWrap Many-Core
BubbleWrap Environments
14
• Two choices for Throughput Cores
• Nominal operation
• Use DVSAM-Pow and expand the set of throughout cores for the same power budget
Throughput Cores
Expendable Cores
• Two choices for Expendable Cores
• Higher, constant Vdd until SSHORT; then discard
• DVSAM-Perf until SSHORT; then discard
Ulya Karpuzcu The BubbleWrap Many-Core
Hardware Support?• No change in the core architecture
• Need circuits to measure aging
15
Ulya Karpuzcu The BubbleWrap Many-Core
Hardware Support?• No change in the core architecture
• Need circuits to measure aging
• Need high-precision DVS
15
Ulya Karpuzcu The BubbleWrap Many-Core
Hardware Support?• No change in the core architecture
• Need circuits to measure aging
• Need high-precision DVS
• Clock and power distribution
15
Ulya Karpuzcu The BubbleWrap Many-Core
Hardware Support?• No change in the core architecture
• Need circuits to measure aging
• Need high-precision DVS
• Clock and power distribution• Two separate V/f domains: One for Expendable and one for
Throughput Cores
15
Ulya Karpuzcu The BubbleWrap Many-Core
BubbleWrap Evaluation• 32 core chip: NT = 16 Throughput and NE = 16 Expendable cores
16
Ulya Karpuzcu The BubbleWrap Many-Core
BubbleWrap Evaluation• 32 core chip: NT = 16 Throughput and NE = 16 Expendable cores
• 22nm high-k metal-gate process
16
Ulya Karpuzcu The BubbleWrap Many-Core
BubbleWrap Evaluation• 32 core chip: NT = 16 Throughput and NE = 16 Expendable cores
• 22nm high-k metal-gate process
• Multiprogrammed workload synthesized from SPEC2000
16
Ulya Karpuzcu The BubbleWrap Many-Core
BubbleWrap Evaluation• 32 core chip: NT = 16 Throughput and NE = 16 Expendable cores
• 22nm high-k metal-gate process
• Multiprogrammed workload synthesized from SPEC2000
• SESC enhanced by a power & thermal model
16
Ulya Karpuzcu The BubbleWrap Many-Core
Frequency Gains of Sequential Section
17
1 0.8 0.6 0.4 0.20.90
0.95
1.00
1.05
1.10
1.15
1.20
Nor
mal
ized
Freq
uenc
y G
ain
Sequential Fraction (LSEQ)
Ulya Karpuzcu The BubbleWrap Many-Core
Frequency Gains of Sequential Section
17
Base
1 0.8 0.6 0.4 0.20.90
0.95
1.00
1.05
1.10
1.15
1.20
Nor
mal
ized
Freq
uenc
y G
ain
Sequential Fraction (LSEQ)
Ulya Karpuzcu The BubbleWrap Many-Core
Frequency Gains of Sequential Section
18
Base DVSAM-Perf for SSHORT
0.90
0.95
1.00
1.05
1.10
1.15
1.20
1 0.8 0.6 0.4 0.2Nor
mal
ized
Freq
uenc
y G
ain
Sequential Fraction (LSEQ)
Ulya Karpuzcu The BubbleWrap Many-Core
Frequency Gains of Sequential Section
18
• Large f gains are feasible
Base DVSAM-Perf for SSHORT
0.90
0.95
1.00
1.05
1.10
1.15
1.20
1 0.8 0.6 0.4 0.2Nor
mal
ized
Freq
uenc
y G
ain
Sequential Fraction (LSEQ)
Ulya Karpuzcu The BubbleWrap Many-Core
Frequency Gains of Sequential Section
18
• Large f gains are feasible• f increases with smaller sequential section
Base DVSAM-Perf for SSHORT
0.90
0.95
1.00
1.05
1.10
1.15
1.20
1 0.8 0.6 0.4 0.2Nor
mal
ized
Freq
uenc
y G
ain
Sequential Fraction (LSEQ)
Ulya Karpuzcu The BubbleWrap Many-Core
Frequency Gains of Sequential Section
18
• Large f gains are feasible• f increases with smaller sequential section• For DVSAM-Perf, each expendable core runs for LSEQ/NE x SNOM
Base DVSAM-Perf for SSHORT
0.90
0.95
1.00
1.05
1.10
1.15
1.20
1 0.8 0.6 0.4 0.2Nor
mal
ized
Freq
uenc
y G
ain
Sequential Fraction (LSEQ)
Ulya Karpuzcu The BubbleWrap Many-Core
Power Consumption of Sequential Section
19
• Each Expendable core has max P budget of two cores
Ulya Karpuzcu The BubbleWrap Many-Core
1 0.8 0.6 0.4 0.2 00
0.5
1.0
1.5
2.0
Nor
mal
ized
Pow
er C
onsu
mpt
ion
Sequential Fraction (LSEQ)
Power Consumption of Sequential Section
19
• Each Expendable core has max P budget of two cores
Ulya Karpuzcu The BubbleWrap Many-Core
1 0.8 0.6 0.4 0.2 00
0.5
1.0
1.5
2.0
Nor
mal
ized
Pow
er C
onsu
mpt
ion
Sequential Fraction (LSEQ)
Power Consumption of Sequential Section
19
• Each Expendable core has max P budget of two cores
Base
Ulya Karpuzcu The BubbleWrap Many-Core
0
0.5
1.0
1.5
2.0
1 0.8 0.6 0.4 0.2 0
Nor
mal
ized
Pow
er C
onsu
mpt
ion
Sequential Fraction (LSEQ)
Power Consumption of Sequential Section
20
• Each Expendable core has max P budget of two cores
Base DVSAM-Perf for SSHORT
Ulya Karpuzcu The BubbleWrap Many-Core
0
0.5
1.0
1.5
2.0
1 0.8 0.6 0.4 0.2 0
Nor
mal
ized
Pow
er C
onsu
mpt
ion
Sequential Fraction (LSEQ)
Power Consumption of Sequential Section
20
• Each Expendable core has max P budget of two cores
• Tolerable power cost for the frequency gains
Base DVSAM-Perf for SSHORT
Ulya Karpuzcu The BubbleWrap Many-Core
Conclusion
21
The BubbleWrap Many-Core: Exploiting dormant cores for sequential acceleration
Ulya Karpuzcu The BubbleWrap Many-Core
Conclusion
• Simple homogeneous design
21
The BubbleWrap Many-Core: Exploiting dormant cores for sequential acceleration
Ulya Karpuzcu The BubbleWrap Many-Core
Conclusion
• Simple homogeneous design
• No architectural or software changes
21
The BubbleWrap Many-Core: Exploiting dormant cores for sequential acceleration
Ulya Karpuzcu The BubbleWrap Many-Core
Conclusion
• Simple homogeneous design
• No architectural or software changes
• Improves sequential and parallel performance
21
The BubbleWrap Many-Core: Exploiting dormant cores for sequential acceleration
Ulya Karpuzcu The BubbleWrap Many-Core
Conclusion
• Simple homogeneous design
• No architectural or software changes
• Improves sequential and parallel performance
• Fully sequential applications at 16% higher f
21
The BubbleWrap Many-Core: Exploiting dormant cores for sequential acceleration
Ulya Karpuzcu The BubbleWrap Many-Core
Conclusion
• Simple homogeneous design
• No architectural or software changes
• Improves sequential and parallel performance
• Fully sequential applications at 16% higher f
• Fully parallel applications at 30% higher throughput
21
The BubbleWrap Many-Core: Exploiting dormant cores for sequential acceleration
http://iacoma.cs.uiuc.edu/
Universityof Illinois
The BubbleWrap Many-Core:Popping Cores for Sequential Acceleration
Ulya Karpuzcu, Brian Greskamp, Josep TorrellasUniversity of Illinois