energy efficient designs with wide dynamic range vivek de intel fellow director of circuit...

24
Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

Upload: edith-brunson

Post on 01-Apr-2015

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

Energy Efficient Designs with Wide Dynamic Range

Vivek DeIntel Fellow

Director of Circuit Technology ResearchCircuits Research Lab

Intel Labs

Page 2: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

2

2

Platform segment characteristics

Few designs must serve all segments

Serv

er

Deskto

pM

ob

ile

MID

Power Workload ThermalV-F Cores AmbientLoad

High

Med

Low

Verylow

ThroughputParallelHPC-RAS

BurstSerial-ParallelStreamGraphics

Med/HighWide

Med/HighWide

Med/HighWide

Low/MedWide

SOC

Active

Fan (-less)

Fan (-less)

PassiveFan-less

HighWide

Med-HighWide

Med-HighWide

Low-MedWide

ControlledCool

ControlledVarying

UncontrolledExtreme

UncontrolledExtreme

BurstSerial-ParallelStreamGraphics

BurstSerial-ParallelStreamGraphics

Page 3: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

3

3

Dynamic platform control

Deliver best user experience under constraints

Scalable On-die I nterconnect FabricScalable On-die I nterconnect Fabric

Graphics

Video

SpecialPurposeEngines

IntegratedMemory

Controllers

Off Die interconnect

Cache Cache Cache

Last LevelCache

Last LevelCache

Last LevelCache

Scalable On-die I nterconnect FabricScalable On-die I nterconnect Fabric

Graphics

Video

SpecialPurposeEngines

IntegratedMemory

Controllers

Off Die interconnect

Cache Cache Cache

Last LevelCache

Last LevelCache

Last LevelCache

Scalable On-die I nterconnect FabricScalable On-die I nterconnect Fabric

Graphics

Video

SpecialPurposeEngines

Graphics

Video

SpecialPurposeEngines

IntegratedMemory

Controllers

Off Die interconnect

Cache Cache Cache

Last LevelCache

Last LevelCache

Last LevelCache

Cache Cache CacheCache Cache Cache

Last LevelCache

Last LevelCache

Last LevelCache

DynamicV/F control

IndependentV/F control

regions

Workload-basedcore activation

& shutdown

Scenario-basedpower allocation

Maximizeperformance& efficiency

Page 4: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

4

4

Dynamic adaptation & reconfiguration

Adapt & reconfigure for best power-performance

Dyn

am

ic C

on

trol U

nit

Processor

AgingSensor

ThermalSensor

VoltageSensor

CurrentSensor

Sensors

VoltageControl

FrequencyControl

ConfigurationControl

Change V

Change F

Reconfigure

Page 5: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

5

5

Resilient platforms

Resiliency for performance, efficiency & reliability

Circuit & Design

Microarchitecture

Microcode

Firmware

VM

OS

Programming System

Applications

Low

er

err

or

rate

Less re

covery

overh

ead

Less

sili

con

overh

ead

Resiliency framework

Systemadaptation

Systemrecovery

Errorcorrection

Faultconfinement

Faultdiagnosis

Errordetection

Systemreconfiguration

Resilie

nt

pla

tform

featu

res

Page 6: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

6

6

TFLOPS at 62 watts

Time (msec)

0.0 1.0 1.50.5

user-demanddelivered

Ene

rgy

Time (msec)

0.0 1.0 1.50.5

user-demanddelivered

Ene

rgyCoarse-grain

management

Slow voltage &frequency

change

Fine-grain management

FUTURE

Time (msec)

0.0 1.0 1.50.5

user-demanddelivered

Ene

rgy

Time (msec)

0.0 1.0 1.50.5

user-demanddelivered

Ene

rgy

Fine-grain power managementTemporal domain

Fast voltage & frequency change

TODAY

• V/F change latency• Active-sleep transition latency • Wake-up latency• Wake-up prediction• State transition energy• Supply noise

Page 7: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

7

7

Fine-grain power managementSpatial domain

• Same voltage to all cores• Same frequency for all cores

Coarse-grain management

TODAY

• Each core/cluster at optimum voltage• Each core/cluster at optimum frequency

Fine-grain management

FUTURE

• V/F domain interfaces• Synchronization overhead• Clock generation/distribution• Power grid routing• Optimum V/F for non-cores• Sub-core clock/leakage gating

Page 8: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

8

8

ηηPSPS

IILOADLOAD

Conventional Conventional Power SourcePower Source

Efficiency Efficiency

Load AdaptiveLoad AdaptivePower Source Power Source

ηηPSPS

IILOADLOAD

Conventional Conventional Power SourcePower Source

Efficiency Efficiency

Load AdaptiveLoad AdaptivePower Source Power Source

Advanced voltage regulators

• Area efficient• Scalable • Persistent rail

Conversion

Efficient

TOP

BOTTOM

Converter chip

Load chip

Inductors

InputCapacitors

OutputCapacitors

5mmRF

Launch

70

75

80

85

90

0 5 10 15 20

Load Current [A]

Eff

icie

ncy

[%

]

L = 1.9nH

L = 0.8nH2.4V to 1.5V

2.4V to 1.2V

60MHz

100MHz80MHz

• Lower loss• Higher fidelity • Simpler

Distribution

Low loss

• Fast & efficient• Load adaptive• Independent rails

Control

Fine-grain

VR innovations for fine-grain power management

Page 9: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

9

9

2KB Data memory (DMEM)

3K

B In

st.

me

mo

ry (

IME

M) 6-read, 4-write 32 entry RF

32

64 64

32

64

RIB

96

9696

Mesochronous Interface

Processing Engine (PE)

Crossbar Router

MS

INT39

39

20 GB/s

FPMAC0

x

+

Normalize

32

32

FPMAC1

x

+32

32M

SIN

TM

SIN

T

MSINT

MSINT

Normalize

Tile

Research testchip

I/O Area

I/O Area

PLL

single tile

1.5mm

2.0mm

TAP

21

.72

mm

I/O Area

PLL TAP

12.64mm

Page 10: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

10

10

Many-core power management

Dynamic sleep

STANDBY: • Memory retains data• 50% less power/tileFULL SLEEP: • Memories fully off• 80% less power/tile

21 sleep regions per tile (not all shown)

FP Engine 1

FP Engine 2

Router

Data Memory

Instruction Memory

FP Engine 1

Sleeping:90% less

power

FP Engine 2

Sleeping:90% less

power

RouterSleeping:

10% less power(stays on to pass traffic)

Data MemorySleeping:

57% less power

Instruction MemorySleeping:

56% less power

Energy efficiency of 19.4 Gflops/Watt

Page 11: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

11

11

Memory integration

Polaris

DRAM

Package

denser than C4 pitch

C4 pitch

Processor with Cu bump

Package

Processor

Memory

256 KB SRAM per core4X C4 bump density8490 thru-silicon vias

Thru-SiliconVia

3D stacking with thru-silicon vias

Page 12: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

12

12

Voltage-frequency range limiters

Reliability & functional failures limit range

Volt

ag

e

Frequency

Vmax

Vmin

Fmax

• Reliability• Thermals• Power delivery

Vmax/Fmax limiters

• Circuit functional failures• Soft errors• Steep frequency roll-off• Aging

Vmin limiters

Page 13: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

13

13

Voltage-frequency marginsVolt

age

Frequency

Volt

age

Frequency

V m

arg

inIR drop

Inductivedroops

Load linevariations

V variation T variation

F margin

Nominal T

Worst T

V m

arg

in

F margin

Volt

age

Frequency

Volt

age

Frequency

Aging Path activity

BOL

EOL

V m

arg

in

F margin

Nominal

Worst

F margin

V m

arg

in

MISSignal

coupling

Criticalpath

Page 14: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

14

14

Array voltage

Cum

ula

tive f

ail

rate

Nominal array VminWorst die Vmin

SER, erratic bits

Array Vmin

Density

Vm

in

8T+ cell

Multi-V6T SRAM

6T SRAM

LLC density

Multi-voltage cache

Embedded level Embedded level shifters for wordline & shifters for wordline & write drivers minimize write drivers minimize area & power overheadarea & power overhead

Push active Vmin limit to VmaxPush active Vmin limit to Vmax

Page 15: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

15

15

Read WriteRetentio

n

NX WeakStron

g-

NPDStron

gWeak Balanced

PPUStron

gWeak Balanced

Dynamic multi-voltage cache6T SRAM cell

WL

BL

BL

#VCC

NX

NPD

PPU

VCC

R1

R2

VWL_

+

VOPAMP

0

TM

Magnitude

Duration

C

VCC

VSS

TD

WL

TD

M = VCC(1-e-T /RC)M

R

128K

b

min

-ce

lls

IO pad drivers

PB

IST

blo

ck

128K

b

min

-ce

lls

128K

b

min

-ce

lls

WL

UD

&

du

mm

y W

Ls

128K

b

p-c

ells

128K

b

p-c

ells

128K

b

p-c

ells

107K

b

p-c

ells

Transistor count 6.2M

Chip Area 0.91mm2

Vcc 1.1V-0.7V

Testing interfacemembrane probe card

Wordline underdrive for read Dynamic voltage collapse for write

45nm dynamic multi-V testchip

1.E+00

1.E+02

1.E+04

1.E+06

1.E+08

1.E+10

0.7 0.8 0.9 1 1.1

VWL(V)

Re

lati

ve

Sin

gle

Bit

Fa

ils

Strongest DVC

Weakest DVC

DVC off

Nominal DVC

MIN cell

26X less fails

Array to WL differential supply noise tracking

Pulse width control

Source: Intel

Page 16: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

16

16

Reduce cache size @ low V/F by eliminating failing words/bits

00101000

Word disable

Failing words

Bitmap of failing words

Bit fix

1.0E-11

1.0E-10

1.0E-09

1.0E-08

1.0E-07

1.0E-06

1.0E-05

1.0E-04

1.0E-03

1.0E-02

1.0E-01

1.0E+00

0.4 0.45 0.5 0.55 0.6 0.65 0.7Supply Voltage

Fa

ilu

re P

rob

.

1-bit ECC

Word disable

Bit fix10-bit ECC

Vmin(mV)

Density

Capacity

Latency

(cycles)

IPC EPI

Conventional 660 1* 1*(8-way)

L1: 3L2: 20

1* 1*

32KB L1Word disable

500 0.92 0.5(4-way)

4 0.95 0.5

2MB L2Bit fix

500 1 0.75(6-way)

23 0.95 0.5* Normalized reference value

Cache reconfiguration

Source: Intel

Source: Intel

Page 17: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

17

17

Low-voltage logic design

4:1 Mux

“1”

“1”

“1”

“0”

“0”

“0”

“1”

“1”

“1”

“0”

“0”

“0”

Narrow muxes No stack height > 2

input

output

vcch vcch

vcch

vccl

vcch

input

output

vcch vcch

vcch

vccl

vcch

Robust level converters

Effi

cien

cy: M

IPS/W

att

Performance

Improve range

Impact max performance

Robust flip-flops

Effi

cien

cy: M

IPS/W

att

Performance

Improve range& efficiency

Design & technology optimizations to balancerange, performance & efficiency

Page 18: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

18

18

Low-voltage motion estimation engine

I/O Control

Output FIFO Input FIFOClock Generator

I/O ControlMotion Estimation Engine

Clock Generator

I/O Control

Output FIFO Input FIFOClock Generator

I/O ControlMotion Estimation Engine

Clock Generator

65nm CMOS70K transistors

Die area ~1mm2

0

50

100

150

200

250

300

350

400

450

0.2 0.4 0.6 0.8 1 1.2 1.4

Supply Voltage (V)

GO

PS

/Wat

t

P1264, 50°C

9.6X

0

50

100

150

200

250

300

350

400

450

0.2 0.4 0.6 0.8 1 1.2 1.4

Supply Voltage (V)

GO

PS

/Wat

t

P1264, 50°C

9.6X

412 Gops/Watt

@ 320 mV!

1

10

100

1000

10000

0.2 0.4 0.6 0.8 1 1.2 1.4

0.01

0.1

1

10

100

Supply Voltage (V)

Fm

ax (

MH

z)

Wide V-F range

0

50

100

150

200

250

300

350

400

450

0.2 0.4 0.6 0.8 1 1.2 1.4

Supply Voltage (V)

GO

PS

/Wat

t

P1264, 50°C

9.6X

0

50

100

150

200

250

300

350

400

450

0.2 0.4 0.6 0.8 1 1.2 1.4

Supply Voltage (V)

GO

PS

/Wat

t

P1264, 50°C

9.6X

0

50

100

150

200

250

300

350

400

450

0.2 0.4 0.6 0.8 1 1.2 1.4

Supply Voltage (V)

GO

PS

/Wat

t

P1264, 50°C

9.6X

0

50

100

150

200

250

300

350

400

450

0.2 0.4 0.6 0.8 1 1.2 1.4

Supply Voltage (V)

GO

PS

/Wat

t

P1264, 50°C

9.6X

412 Gops/Watt

@ 320 mV!

1

10

100

1000

10000

0.2 0.4 0.6 0.8 1 1.2 1.4

0.01

0.1

1

10

100

Supply Voltage (V)

Fm

ax (

MH

z)

Wide V-F range

Source: Intel Source: Intel

Page 19: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

19

19

Dynamic V & F adaptation

Clocking

Input Buffer

Sensors& Analog

DAB

TCP/IP Processor Core

JTAG

Noi

se g

en

Noi

se g

en

TCP/IP processor

PLL0

PLL1

DAB Control

Thermal sensor

Div

PMOS CBG

NMOS CBG

core clk

gate

Droop sensor

Time

Tem

p

Time

Vc

c

PLL2

NMOS body bias

PMOS body bias

I/O clk

Noise injector

CL

OC

KIN

GC

ON

TR

OL

F0

Inp

ut b

uffe

r

Ou

tpu

t po

rt

F1

F2

1st droop

2nd droop 3rd droop

ctrl

PL

L c

om

man

d

VR

M

• Adapt F/V to V/T change reduce V/T margin• Adapt F/V to aging reduce aging margin

Environment-aware dynamic adaptation

Prototype chip in

90nm

Source: Intel

Source: Intel

Page 20: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

20

20

Resilient circuits

MSFF MSFF

clkclk

Voltage droop

Error!

Errordet. • Detect errors in critical path FFs

• Propagate error signals• Correct errors by re-execution• Feedback to adaptive V/F

Error Detection Sequential (EDS)

65nm resilient circuits testchip

Page 21: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

21

21

Resiliency experimentsResponse to voltage droops

0

50

100

150

200

1.0 1.5 2.0 2.5 3.0

Throughput (BIPS)

Po

wer

(m

W)

Conventional Design

Resilient Design

21% Throughput GainOR

37% Power Reduction

0.0

1.0

2.0

3.0

4.0

5.0

2100 2400 2700 3000 3300 3600

Clock Frequency (MHz)

Th

rou

gh

pu

t (B

IPS

)

1.0E-13

1.0E-10

1.0E-07

1.0E-04

1.0E-01

1.0E+02

Erro

r Ra

te (%

)Conventional Design

Max TP

Resilient DesignMax TP

Nominal: VCC=1.2V & Temp=60CWorst-Case: 10% VCC Droop & Temp=110C

VCC & TemperatureFCLK Guardband

0.0

1.0

2.0

3.0

4.0

5.0

2100 2400 2700 3000 3300 3600

Clock Frequency (MHz)

Th

rou

gh

pu

t (B

IPS

)

1.0E-13

1.0E-10

1.0E-07

1.0E-04

1.0E-01

1.0E+02

Erro

r Ra

te (%

)Conventional Design

Max TP

Resilient DesignMax TP

Nominal: VCC=1.2V & Temp=60CWorst-Case: 10% VCC Droop & Temp=110C

VCC & TemperatureFCLK Guardband

Source: Intel

Source: Intel

Page 22: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

22

22

Summary

• Energy efficiency and wide dynamic operating range are critical for all platforms

• Integration, fine-grain power management, advanced voltage regulators & 3D memory stacking are key for energy efficiency

• Reliability, functionality, margins & efficiency limit dynamic operating range

• Multi-voltage design, dynamic adaptation, reconfiguration & resiliency are key enablers

Page 23: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

23

23

AcknowledgementCRL prototype Bangalore Design LTD Design Murli TirumalaTomm Aldridge Mark Anders Paolo Aseron Ravi MahajanJerry Bautista Nitin Borkar Shekhar Borkar Ravi PrasherKeith Bowman Saurabh Dighe Zeshan Chishti Venkat NatarajanLev Finkelstein Shih-Lien Lu Varghese George Anand DeshpandeMarci Glenn George Goodman Steve Gunther Ali FarhangMatthew Haycock Chris Wilkerson Ming Zhang Amit AgarwalAndrew Henroid Yatin Hoskote Jason Howard Robert ChauSteven Hsu Tanay Karnik Himanshu Kaul Rajesh KumarAlaa Alameldeen M. Khellah V. Erraguntla Ketan ParanjapeSean Koehl R. Krishnamurthy Partha Kundu Greg TaylorSanu Mathew Randy Mooney A. Raychowdhury P. VishakantaiahAlon Naveh Trang Nguyen Fabrice Paillet R. KuppuswamyClark Roberts Ronny Ronen Erfaim Rotem Rohit VidwansMark Rowland Greg Ruhl Gerhard Schrom Gautam DoshiJoe Schutz D. Somasekhar James Tschanz Sunit TyagiSriram Vangal Manny Vara Howard Wilson B. ChatterjeeBibiche Geuskens Steven Hsu Kevin Zhang Sati BanerjeeClair Webb Mark Bohr Gunjan Pandya

Page 24: Energy Efficient Designs with Wide Dynamic Range Vivek De Intel Fellow Director of Circuit Technology Research Circuits Research Lab Intel Labs

24

24

Q&A