Leveraging the Core-Level Complementary Effects of PVT Variations to Reduce Timing Emergencies in Multi-Core Processors

Guihai Yan 1, Xiaoyao Liang 2, Yinhe Han 1, and Xiaowei Li 1

1. Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS)
2. NVIDIA Corporation

Jun. 23, 2010



Key Laboratory of Computer System and Architecture, ICT CAS

Outline

Introduction to PVT variations

Analyzing “complementary effect”

• Timing domain

• Frequency domain

Implementation challenges & solutions

Experimental results

Introduction to variations

Variation sources

• Process variation
– Random dopant fluctuation
– Sub-wavelength lithography

• Voltage variation
– Parasitic power delivery networks
– Application variability
– Inductive noise, IR-drop

• Temperature variation
– Imbalanced activity
– Hotspot

We focus on the primary manifestation

• Performance variation

Process variation

Sub-wavelength lithography

• “What you get is not what you want”

• Systematic

Random dopant fluctuations

• Vth variation

• Random

[Figure: Sub-wavelength lithography. Lithography wavelength (365nm, 248nm, 193nm, 13nm EUV) and technology generation (180nm, 130nm, 90nm, 65nm, 45nm, 32nm) versus year, 1980-2020, on a 10nm-1µm scale, with the gap between the two curves marked.]

[Borkar, DAC’09]

[Aitken, ATS’07]

Max frequency can differ by 20%!

[Teodorescu, ISCA’08]

P variation is time-independent, “DC component”

Temperature variation

Application-specific

Slow-varying

• Milliseconds

• Typical thermal constant: 2ms

[Donald, ISCA’06]

[Figure: Measured Pentium M processor temperatures]


T variation is slow-varying, “Low-frequency components”

Voltage variation

Fast-changing

• Inductive noise
– a.k.a. the L(di/dt) problem

• IR-drop

Hierarchical PDN

Why is it harder to keep a constant voltage level?

V variation is fast-changing, “High-frequency components”

Example

• Power budget: 100W

• Working voltage: 1V

• Current: 100A

• To keep voltage fluctuation within ±5%, R_PDN < 0.5 mOhm
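The arithmetic behind the example can be checked with a short sketch. It assumes a purely resistive (IR-drop) budget and ignores inductive noise; the function name `max_pdn_resistance` is illustrative and does not come from the talk.

```python
def max_pdn_resistance(power_w, vdd_v, tolerance):
    """Largest PDN resistance (ohms) that keeps the IR drop within
    tolerance * vdd. Assumption: DC (resistive) drop only."""
    current_a = power_w / vdd_v          # I = P / V
    allowed_drop_v = tolerance * vdd_v   # e.g. 5% of 1 V = 50 mV
    return allowed_drop_v / current_a    # R = dV / I


r_ohm = max_pdn_resistance(power_w=100.0, vdd_v=1.0, tolerance=0.05)
print(f"R_PDN must stay below {r_ohm * 1e3:.2f} mOhm")
```

With the slide's numbers (100W at 1V, so 100A, and a 50mV budget) this gives 0.5 mOhm, matching the bound stated above.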

[Figure: Hierarchical power delivery network. (a) Power delivery path for Intel Xeon 5500 series processors: VRM, motherboard, socket and package, cavity caps, and on-chip power grid, with component values ranging over 0.15-2 mOhm, 12-423 pH, and 264-1222 uF. (b) On-chip core-level power grid model: Vcc/Vss grids, Vcc/Vss bumps, and decap. (c) Inter-core power grid model: Core1-Core4 (voltages Vc1-Vc4) linked by Rcc, with bump Rb and Lb, Cdecap, and GND.]

Resultant impact of PVT variations

• Fast cores vs. slow cores

• Mild apps. vs. violent apps.

• Low temp. vs. high temp.

Timing (delay) variation

Prior solutions

Strive to compensate P, V, and T variation individually

• Mitigate P variation
– ReCycle [ISCA’06], Body Bias [Micro’07], ReVIVal [ISCA’08] et al.

• Stabilize V variation
– Pipeline damping [ISCA’03], DeCoR [HPCA’08] et al.

• Balance T variation
– Hotspot [ISCA’03], DVFS + Activity Migration [ISCA’03, HPCA’01, TODAES’07] et al.

Other timing-oriented solutions

• Razor [JSSC’06], EVAL [Micro’08], Tribeca [Micro’09] et al.

Our perspective

Focus on the essential timing issue

• Process variation, voltage variation, and temp. variation all show up as delay variation

The three effects do not necessarily add up; in some cases they can cancel each other out. Hence, “complementary”

Design goal: minimize delay variation

[Figure: Process, voltage, and temp. variation feeding into delay]

Some terms

Timing emergency (TE)

Emergency level (EL)

• “Density” of TE

• Define: EL = # of TE per 100 million cycles

Violent vs. Mild

• Voltage
– Large fluctuation = Violent
– Small fluctuation = Mild

• Temperature
– “Hot” = Violent
– “Cool” = Mild

• Process
– Slow corner = Violent
– Fast corner = Mild

[Figure: Delay versus time for mild and violent voltage traces, with the threshold and a timing emergency marked]
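The EL definition can be made concrete with a minimal sketch. It assumes one delay sample per cycle and counts every sample above the threshold as one timing emergency; both choices, and the name `emergency_level`, are illustrative assumptions rather than details from the talk.

```python
def emergency_level(delays, threshold, cycles_per_sample=1):
    """EL = number of timing emergencies per 100 million cycles.

    Assumption: each delay sample above `threshold` is one emergency.
    """
    if not delays:
        raise ValueError("need at least one delay sample")
    emergencies = sum(1 for d in delays if d > threshold)
    total_cycles = len(delays) * cycles_per_sample
    return emergencies * 100_000_000 / total_cycles


# Normalized delays (1.0 = the timing threshold); two samples exceed it.
trace = [0.90, 1.02, 0.95, 1.10]
print(emergency_level(trace, threshold=1.0))
```

A real trace would span far more cycles; the four-sample trace only shows the normalization.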

How do PVT variations complement each other?

Observation in time domain

• Core1: T mild, V mild. Large margin, low EL

• Core2: T violent, V violent. Little margin, high EL

What if we exchange the threads on Core1 and Core2?

[Figure: Delay versus time against the threshold. Before the exchange: “T Violent, V Violent” (emergency) and “T Mild, V Mild” (excessive headroom). After the exchange: “T Mild, V Violent” and “T Violent, V Mild” (mild + violent).]

Frequency domain analysis

Y(f) = FFT(D(t))

• Sample interval: 5ns

• Span of analysis: 1ms

• DC component: “P”

• Low-frequency component: “T”

• High-frequency component: “V”
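A minimal sketch of this decomposition, under stated assumptions: the delay trace is synthetic, it uses 4096 samples instead of the full 1ms span (200,000 samples at 5ns) so that a small pure-Python radix-2 FFT suffices, and the 1MHz split between the T and V bands is taken from the band limits quoted later in the talk. The function names are illustrative.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def pvt_strengths(delay, dt, split_hz=1e6):
    """Return (P, T, V): the DC level, and the root-sum-square amplitude
    of the bins below / at-or-above `split_hz`."""
    n = len(delay)
    spectrum = fft(delay)
    p = abs(spectrum[0]) / n
    t_sq = v_sq = 0.0
    for k in range(1, n // 2):
        amp = 2 * abs(spectrum[k]) / n   # single-sided amplitude
        if k / (n * dt) < split_hz:
            t_sq += amp ** 2
        else:
            v_sq += amp ** 2
    return p, math.sqrt(t_sq), math.sqrt(v_sq)


N, DT = 4096, 5e-9  # 5 ns sample interval, as on the slide
# Synthetic normalized delay: DC offset + slow (about 195 kHz) drift
# + fast (about 19.5 MHz) ripple, both placed exactly on FFT bins.
trace = [1.0
         + 0.05 * math.sin(2 * math.pi * 4 * i / N)
         + 0.02 * math.sin(2 * math.pi * 400 * i / N)
         for i in range(N)]
p, t, v = pvt_strengths(trace, DT)
print(f"P={p:.3f}  T={t:.3f}  V={v:.3f}")
```

The sketch recovers the three injected amplitudes (1.0, 0.05, 0.02), which is the sense in which P, T, and V can be read off as the DC, low-frequency, and high-frequency parts of the delay spectrum.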

The strength of each component of PVT variations

Migrate threads = “Graft” V component

[Figure: Delay versus time, with threshold D_TH, for Core1 and Core2 before and after thread migration (TM), alongside the corresponding spectrum deviation versus frequency plots with the P, T, and V components marked]

Frequency domain analysis (cont.)

Relative frequency spectrum deviations on a 2GHz quad-core processor

• P: 0-100Hz, T: 100Hz-1MHz, V: 1MHz-250MHz

Potential

• Core3 and Core4 are mild

Strategy

• Exchange threads on Core1 and Core4, and on Core2 and Core3

How to exploit such “complementary effect”?

Straightforward approach: one sensor per component

• P component: product test

• V component: voltage sensor

• T component: temp. sensor

• Aging sensor, “xyz” sensor

• Pros
– Conceptually simple

• Cons
– Slow: V and T sensors are slow
– Not comprehensive: e.g., what about aging?

Our approach: delay sensor-based scheme

• Delay sensor: V component and (P+T) component

• Pros
– Fast
– Comprehensive (timing)

• Cons
– Needs a little trick

Implementation (cont.)

What we know

• Delay variation
– From delay sensors

What we need to know

• The strength of the PT and V components

How to bridge the gap?

Three challenges

• Infer PVT components from delay values

• On-the-fly thread migration decision-making

• On-the-fly variation prediction

Top view of architecture

[Figure: Top view of the architecture. Labeled blocks: Core1-Core4, delay sensor, four EL Synthesizers, a TM Agent connected to an Inter-Cluster TM Agent, caches ($), Cluster1, Cluster2, Cluster4, interconnect network, and I/O interfaces.]

Timing Emergency Aware + Thread Migration = TEA-TM

Infer PVT component from delay values

Use mean delay to infer the PT component (< 1MHz)

This simplification greatly facilitates a cost-efficient implementation of TEA-TM. Then, how about the “V component”?

[Figure: Mean delay versus PT component]
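The mean-delay idea can be sketched as a windowed average. This is only an illustration: the window length and the name `pt_component` are assumptions, chosen so that a fast, zero-mean V ripple averages out and the slow P+T level remains.

```python
def pt_component(delays, window):
    """Estimate the PT component as the mean of the last `window`
    delay samples (hypothetical helper, not the authors' circuit)."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = delays[-window:]
    if not recent:
        raise ValueError("need at least one delay sample")
    return sum(recent) / len(recent)


# A fast ripple of +/-0.05 around a slow level of 1.0: the ripple
# cancels in the mean, leaving the PT level.
trace = [1.0 + 0.05 * (-1) ** i for i in range(1000)]
print(round(pt_component(trace, window=200), 6))
```

At a 5ns sample interval, a 200-sample window spans 1µs, so content above roughly 1MHz is largely averaged away; a hardware implementation would likely use a cheaper running accumulator.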

On-the-fly TEA-TM decision making

Urgent First Policy (UFP)

• Does NOT directly rely on an accurate V component

Basic idea:

• Migrate the thread running on the highest-EL core to the core with the smallest PT component. Always right, but may not be optimal!

EL = PT “+” V

[Figure: Emergency level and PT component of Core1 and Core2, with thread migration (TM) between them]

Refer to our paper for the more sophisticated “DUFP” heuristic
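The basic UFP step can be sketched as follows. The dictionaries, the core and thread names, and the choice to swap the two threads (rather than move one) are illustrative assumptions; the DUFP heuristic from the paper is not reproduced here.

```python
def urgent_first_swap(el, pt):
    """Pick (src, dst): the core with the highest emergency level and
    the core with the smallest PT component. Returns None when they
    are the same core, since no migration would help."""
    src = max(el, key=el.get)
    dst = min(pt, key=pt.get)
    return None if src == dst else (src, dst)


def apply_swap(assignment, pair):
    """Return a new core-to-thread mapping with the pair's threads swapped."""
    if pair is None:
        return dict(assignment)
    src, dst = pair
    swapped = dict(assignment)
    swapped[src], swapped[dst] = assignment[dst], assignment[src]
    return swapped


# Hypothetical per-core measurements for one TM interval.
el = {"Core1": 120, "Core2": 900, "Core3": 40, "Core4": 60}
pt = {"Core1": 0.97, "Core2": 1.05, "Core3": 0.92, "Core4": 0.95}
threads = {"Core1": "A", "Core2": "B", "Core3": "C", "Core4": "D"}

pair = urgent_first_swap(el, pt)
print(pair, apply_swap(threads, pair))
```

Only EL and the PT estimate are consulted, which is why the policy does not need an accurate V component: the violent thread simply lands on the core with the most PT headroom.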

On-the-fly variation prediction

Objective: reducing the emergency level in the future

• Emergency level

• PT component

Linear prediction mechanism

[Figure: EL prediction result]
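The slide does not spell out the predictor, so the following is only one plausible form of linear prediction: fit a least-squares line to the last few per-interval values (EL or PT component) and extrapolate one interval ahead. The window size and the name `predict_next` are assumptions.

```python
def predict_next(history, order=4):
    """Extrapolate the next value from a least-squares line fitted to
    the last `order` samples (illustrative, not the authors' predictor)."""
    window = history[-order:] if order > 0 else []
    n = len(window)
    if n == 0:
        raise ValueError("need at least one sample and order > 0")
    if n == 1:
        return float(window[0])
    mean_x = (n - 1) / 2
    mean_y = sum(window) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(window))
    slope = sxy / sxx
    # Evaluate the fitted line one step past the window (x = n).
    return mean_y + slope * (n - mean_x)


el_history = [10, 20, 30, 40]  # hypothetical EL per TM interval
print(predict_next(el_history))
```

A predictor of this kind can be run once per TM interval for each core, feeding the predicted EL and PT values into the migration decision.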

Experiments

Methodology

• Trace-based evaluation

Modeled processor

• Quad-core

• Superscalar

• 2GHz

PDN

• Similar to Intel Xeon 5500 quad-core microprocessor

• 130W (peak 150W)

Workload

[Figure: Trace-generation flow with labeled blocks: Applications, Wattch, Power Traces, Floorplan Info., HotSpot, Thermal Traces, Current Traces, PDN Model Info., Hspice, Voltage Traces]

Metrics

Relative throughput loss

Relative Fairness

Where,

Where,

Impact of TM interval on average EL reduction

• No migration overhead accounted

• 1ms at 2GHz: migration overhead is negligible

• 0.3ms at 2GHz: migration overhead < 15%

Perf. overhead & EL reduction

• When taking the migration penalty into account

[Figure: Overall throughput versus TM interval, with regions labeled “Large Migration Penalty” and “Large Emergency Rate” and the minimal TM interval marked]

Reduction in relative throughput loss

• TM interval: 0.2ms, accuracy: 90%

• Developing more sophisticated heuristics


Fairness Improvement

80% fairness improvement

Conclusion

Analyzing the complementary effect

• from both the time and frequency domains

Presenting a delay sensor-based scheme (TEA-TM) to exploit the complementary effect

• Simple, cost-efficient

The experimental results show

• Improved throughput

• Improved fairness
