
CACTI-IO: CACTI With Off-Chip Power-Area-Timing Models

Norman P. Jouppi¥, Andrew B. Kahng†‡, Naveen Muralimanohar¥, Vaishnav Srinivas†

November 6th, 2012

ECE† and CSE‡ Departments, University of California, San Diego

Hewlett-Packard Laboratories¥, Palo Alto


Agenda

• Introduction
• Need for off-chip power-area-timing models
• CACTI-IO models
• Case studies using CACTI-IO:
  • High-capacity DDR3 configurations
  • 3-D stacking
  • LPDDRx for servers
• Summary


Memory Subsystem Performance

• Latency/access times: the Memory Wall
  • Modern architectures try to hide the latency impact
• Capacity: need for large server main memory
• Bandwidth: the memory bandwidth limit
  • Latency-hiding techniques do not help
  • The off-chip interface limits bandwidth

Source: Rogers et al., "Scaling the Bandwidth Wall: Challenges in and Avenues for CMP Scaling"


Memory Subsystem Power

• Memory subsystem power is a significant portion of full-system power
  • Contributors: DRAM, buffers, caches, interconnect/IO/PHY
• Off-chip IO power is a key component

Source: Economou et al., "Full-System Power Analysis and Modeling for Server Environments"


Off-chip Performance

• Memory bandwidth is limited by the off-chip interface
• Source-synchronous signaling
• Signal and power integrity: ISI (inter-symbol interference), crosstalk, supply noise
• Pin count


Off-chip Power

• Off-chip power is a significant portion of memory subsystem power
  • Higher off-chip capacitance and voltages
  • Terminations and Vref-biased receivers
  • Clocking elements


Off-chip PAT Models For Architects

• Off-chip models for full-system simulators
  • Simulators today do not account for IO/PHY power
  • Accurate off-chip power and performance numbers
  • Co-optimize off-chip and on-chip power/performance
  • Explore new off-chip topologies and technologies

[Diagram blocks: Full System Simulator; On-Chip Power/Area/Timing Models; Off-Chip Power/Area/Timing Models; Accurate Off-Chip Power/Performance; Optimal On-Chip and Off-Chip Configuration]


CACTI-IO

• CACTI is well known to memory architects
• CACTI-IO includes off-chip PAT models
• CACTI-IO config file includes off-chip parameters
• CACTI-IO Tech Report available

# Memory State (R=Read, W=Write, I=Idle or S=Sleep)

//-iostate "R"
-iostate "W"
//-iostate "I"
//-iostate "S"

# Is ECC Enabled (Y=Yes, N=No)

-dram_ecc "N"

#Address bus timing

//-addr_timing 0.5 //DDR, for LPDDR2 and LPDDR3
-addr_timing 1.0 //SDR for DDR3, Wide-IO
//-addr_timing 2.0 //2T timing
//-addr_timing 3.0 //3T timing

# Bandwidth (Gbytes per second, this is the effective bandwidth)

-bus_bw 12.8 GBps

# Memory Density (Gbit per memory/DRAM die)

-mem_density 2 Gb

# IO frequency (MHz) (frequency of the external memory interface).

-bus_freq 800 MHz

# Duty Cycle (fraction of time in the Memory State defined above)

-duty_cycle 1.0

# Activity factor for Data (0->1 transitions) per cycle (for DDR, need to account for the higher activity in this parameter. E.g. max. activity factor for DDR is 1.0, for SDR is 0.5)

-activity_dq 1.0

# Activity factor for Control/Address (0->1 transitions) per cycle (for DDR, need to account for the higher activity in this parameter. E.g. max. activity factor for DDR is 1.0, for SDR is 0.5)

-activity_ca 0

# Number of DQ pins

-num_dq 1

# Number of DQS pins

-num_dqs 0 //8 differential pairs

# Number of CA pins

-num_ca 0

# Number of CLK pins

-num_clk 2 //1 differential pair

# Number of Physical Ranks

-num_mem_dq 2 //Number of ranks (loads on DQ and DQS) per DIMM or buffer chip

# Width of the Memory Data Bus

-mem_data_width 1 //x4 or x8 or x16 or x32 memories
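The config fragment above follows a simple pattern: `#` lines are descriptions, lines starting with `//` are disabled alternatives, and active options have the form `-name value [unit] //comment`. The sketch below reads such a fragment into a dictionary. It is only an illustration of the file layout as it appears on this slide; the function name `parse_cacti_io_config` is hypothetical and the parsing rules are inferred from the slide, not taken from the CACTI-IO source.

```python
def parse_cacti_io_config(text):
    """Parse '-name value [unit] //comment' lines into {name: (value, unit)}.

    Assumption: '#' starts a description line and a leading '//' disables
    an option, as the fragment on the slide suggests.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, descriptions, and commented-out alternatives.
        if not line or line.startswith("#") or line.startswith("//"):
            continue
        if not line.startswith("-"):
            continue
        # Drop a trailing '//' comment, then split into tokens.
        tokens = line.split("//", 1)[0].strip()[1:].split()
        if len(tokens) < 2:
            continue
        name = tokens[0]
        value = tokens[1].strip('"')
        unit = tokens[2] if len(tokens) > 2 else None
        params[name] = (value, unit)
    return params


SAMPLE = '''
# Memory State (R=Read, W=Write, I=Idle or S=Sleep)
//-iostate "R"
-iostate "W"
# Bandwidth (Gbytes per second, this is the effective bandwidth)
-bus_bw 12.8 GBps
# IO frequency (MHz)
-bus_freq 800 MHz
# Number of CLK pins
-num_clk 2 //1 differential pair
'''

if __name__ == "__main__":
    for name, (value, unit) in parse_cacti_io_config(SAMPLE).items():
        print(name, value, unit or "")
```

Values are kept as strings so that a caller can decide how to convert each one.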


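Dynamic Power

• Dynamic power (switching lumped capacitances):

$$P_{dyn} = N_{pins}\, D_c\, \alpha \left(\sum_i C_i\, V_{SW_i}\right) V_{dd}\, f$$

• Interconnect power:

$$P_{int} = N_{pins}\, D_c\, \alpha\, E_{int}\, f$$

$$E_{int} = \begin{cases} t_L\, V_{SW}\, V_{dd} / Z_0 & \text{if } 2t_L \le t_b \\ t_b\, V_{SW}\, V_{dd} / Z_0 & \text{if } 2t_L > t_b \end{cases}$$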
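As a rough illustration of how the dynamic and interconnect power terms on this slide combine, the following sketch evaluates them directly. The symbol meanings are assumptions read off the slide and the config file (N_pins as pin count, D_c as duty cycle, alpha as activity factor, t_L as line delay, t_b as bit period, Z_0 as line impedance), and the function names are hypothetical, not CACTI-IO code.

```python
def dynamic_power(n_pins, duty_cycle, alpha, loads, v_dd, f):
    """P_dyn = N_pins * D_c * alpha * sum(C_i * V_SW_i) * V_dd * f.

    loads: iterable of (capacitance in F, swing in V) pairs, one per lumped cap.
    """
    switched = sum(c * v_sw for c, v_sw in loads)
    return n_pins * duty_cycle * alpha * switched * v_dd * f


def interconnect_energy(t_l, t_b, v_sw, v_dd, z0):
    """E_int uses t_L when 2*t_L <= t_b, and t_b otherwise."""
    t = t_l if 2 * t_l <= t_b else t_b
    return t * v_sw * v_dd / z0


def interconnect_power(n_pins, duty_cycle, alpha, e_int, f):
    """P_int = N_pins * D_c * alpha * E_int * f."""
    return n_pins * duty_cycle * alpha * e_int * f


if __name__ == "__main__":
    # All numbers below are placeholder inputs chosen only to exercise the formulas.
    p_dyn = dynamic_power(64, 1.0, 0.5, [(2e-12, 1.5), (1e-12, 1.5)], 1.5, 800e6)
    e_int = interconnect_energy(0.5e-9, 0.625e-9, 1.5, 1.5, 50.0)
    p_int = interconnect_power(64, 1.0, 0.5, e_int, 800e6)
    print(f"P_dyn = {p_dyn:.4f} W, P_int = {p_int:.4f} W")
```

Keeping `E_int` as a separate function makes the two-case condition on the line delay easy to check on its own.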


Termination Power

• DQ:
  • Multi-rank
  • Few termination types
  • READ and WRITE
  • Assumes 50% 0s and 50% 1s
  • Includes Rx and Tx
• CA:
  • Fly-by
  • VDD/2 termination


PHY Power

• Reference generators
• Vref-biased receivers
• Clock distribution
• DLL/PLL
• Phase rotators


Performance: Eye Compliance

• Timing budget: Tx, channel, and Rx (setup/hold)
• Voltage budget: Tx (VOL/VOH), channel, Rx (VIL/VIH)


Channel Jitter

• DOE (design of experiments) for topology parameters
• Ron, Rtt and Cdram are some of the key parameters
• Linear interpolation of the Taguchi array results


Timing Budget

$$T_{jitter} = \sum_i DJ_i + \sqrt{\sum_i RJ_i^2}$$

$$T_{jitter}(F_0) = T_{jitter\_avg}$$

$$T_{jitter}(F_i = F_{i0}) = T_{jitter\_avg} \quad \forall i$$

$$\frac{T_{ck}}{4} - T_{error} - T_{jitter}^{setup} - T_{skew}^{setup} > T_{DS}$$

$$\frac{T_{ck}}{4} - T_{error} - T_{jitter}^{hold} - T_{skew}^{hold} > T_{DH}$$
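The timing budget on this slide can be checked numerically with a few lines of code. The sketch below assumes that deterministic jitter terms (DJ) add linearly, that random jitter terms (RJ) add as root-sum-square, and that a quarter of the clock period must cover error, jitter, skew and the receiver setup or hold requirement. Function names and input values are hypothetical.

```python
import math


def total_jitter(dj_terms, rj_terms):
    """T_jitter = sum(DJ_i) + sqrt(sum(RJ_i^2))."""
    return sum(dj_terms) + math.sqrt(sum(rj * rj for rj in rj_terms))


def timing_slack(t_ck, t_error, t_jitter, t_skew, t_required):
    """Slack of the budget T_ck/4 - T_error - T_jitter - T_skew > T_required.

    t_required is T_DS for the setup check and T_DH for the hold check.
    A positive result means the budget closes.
    """
    return t_ck / 4 - t_error - t_jitter - t_skew - t_required


if __name__ == "__main__":
    # Placeholder numbers in picoseconds; 1250 ps is the period of an 800 MHz clock.
    jitter = total_jitter([30.0, 20.0], [3.0, 4.0])
    slack = timing_slack(1250.0, 50.0, jitter, 40.0, 100.0)
    print(f"jitter = {jitter} ps, setup slack = {slack} ps")
```

The same `timing_slack` function serves both checks; only the jitter, skew and requirement inputs change between setup and hold.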


Voltage Budget

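$$V_N = K_N\, V_{SW} + V_{NI}$$

$$K_N = K_{xtalk} + K_{ISI} + K_{SSO} \quad \text{(for DOE)}$$

$$V_M = \left| V_{ref} - V_{IL/H} \right|$$

$$V_M < \frac{V_{SW}}{2} - V_N$$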
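A small numeric check can make the voltage budget concrete. The sketch below assumes the noise is the sum of a swing-proportional part (crosstalk, ISI and SSO coefficients) and a swing-independent part V_NI, and that half the swing minus the noise must exceed the distance between V_ref and the receiver input threshold (V_IL or V_IH). This reading of the slide, the function names and all input values are assumptions.

```python
def noise_voltage(k_xtalk, k_isi, k_sso, v_sw, v_ni):
    """V_N = K_N * V_SW + V_NI with K_N = K_xtalk + K_ISI + K_SSO."""
    k_n = k_xtalk + k_isi + k_sso
    return k_n * v_sw + v_ni


def voltage_slack(v_sw, v_n, v_ref, v_threshold):
    """Slack of V_SW/2 - V_N over the receiver requirement |V_ref - V_threshold|.

    v_threshold is V_IL or V_IH. A positive result means the budget closes.
    """
    return v_sw / 2 - v_n - abs(v_ref - v_threshold)


if __name__ == "__main__":
    # Placeholder values in volts, chosen only to exercise the formulas.
    v_n = noise_voltage(0.05, 0.10, 0.05, 0.5, 0.02)
    print(f"V_N = {v_n:.3f} V, slack = {voltage_slack(0.5, v_n, 0.75, 0.85):.3f} V")
```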


Area

$$Area_{IO} = N_{IO}\left(A_0 + \frac{k_0}{\min(R_{ON},\, 2R_{TT1})}\right) + N_{IO}\,\frac{1}{R_{ON}}\left(k_1 f + k_2 f^2 + k_3 f^3\right)$$

• Driver area depends on RON and RTT
• Predriver stages fan out to the driver
• Fixed area for ESD and controls


Validation

• CACTI-IO models account for off-chip power, area and timing
• Validation against SPICE
  • Within 15% error across all the simulations
  • Lookup tables validated by construction


Power for LPDDR2 DQ Single-Lane

[Plot: Total IO Power]


Power for DDR3 DQ Single-Lane

[Plots: Termination Power; Total IO Power]


Case Studies Using CACTI-IO

• We present three case studies:
  • High-capacity DDR3 configurations
  • 3-D configurations
  • BOOM (Buffered Output On Module): LPDDRx for servers
• Compare the configurations for:
  • Capacity
  • Bandwidth
  • IO power efficiency
• BOOM case study with IO+DRAM power


Case Study 1: High-capacity DDR3

• RDIMM, LRDIMM, BoB (Buffer on Board)
• BoB uses a serial bus to the host
• LRDIMM offers the highest capacity
• BoB offers the best bandwidth and power efficiency per GB of capacity


Case Study 2: 3-D Stacking

• TSS (through-silicon stacking) based
• Peak bandwidth of 176 GB/s for Micron's Hybrid Memory Cube (HMC)
• Power efficiency varies by around 2X across configurations

Source: Micron


BOOM: LPDDRx for servers

• BOOM (Buffered Output On Module) architecture from Hewlett-Packard:
  • Buffer chip on the board
  • LPDDRx memories (lower speed, lower power)
  • Wider bus from the buffer to the DRAMs
• Achieves better power efficiency using LPDDRx memories
• Still meets performance targets by using the buffer


BOOM Topology


Case Study 3: BOOM

• 50% increase in IO efficiency with LPDDRx
• No terminations with wider, slower buses
• Serial bus from the buffer offers more savings


BOOM: IO+DRAM Power

• IO power is a significant portion of the combined power (DRAM+IO): 50-60%
• IO idle power is a very significant contributor
• LPDDR2 unterminated signaling reduces idle power
• BOOM-N4-L-400 with a serial bus to the host provides a 3.4X energy savings (DRAM+IO) over BOOM-N2-D-800
• Combining IO+DRAM allows for correct optimizations


Optimizing Fanout

• IO power vs. number of ranks while capacity and bandwidth are held constant
• Slower and wider provides better power
• Die area and clock distribution go up as the bus gets wider, so 200-400 MHz seems like a sweet spot

$$BW = 2\, f\, W_B$$

$$Capacity \propto N_R \left(\frac{W_B}{W_M}\right)$$
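The trade-off on this slide, slower and wider buses at constant bandwidth and capacity, can be tabulated with a short sweep. The sketch below assumes BW = 2·f·W_B with W_B in bits (two transfers per clock), and counts capacity in DRAM dies so that the die count equals N_R·(W_B/W_M). The function names, the 32-die total and the x8 memory width are hypothetical inputs.

```python
def bus_width_bits(bw_bytes_per_s, f_hz):
    """Solve BW = 2 * f * W_B for W_B, with BW converted from bytes/s to bits/s."""
    return bw_bytes_per_s * 8 / (2 * f_hz)


def ranks_needed(n_dies, w_m, w_b):
    """Solve n_dies = N_R * (W_B / W_M) for N_R (capacity counted in dies)."""
    return n_dies * w_m / w_b


def sweep(bw_bytes_per_s, n_dies, w_m, freqs_hz):
    """Return (frequency, bus width, ranks) for each candidate bus frequency."""
    rows = []
    for f in freqs_hz:
        w_b = bus_width_bits(bw_bytes_per_s, f)
        rows.append((f, w_b, ranks_needed(n_dies, w_m, w_b)))
    return rows


if __name__ == "__main__":
    # 12.8 GB/s matches the -bus_bw value in the config slide; the rest is illustrative.
    for f, w_b, n_r in sweep(12.8e9, 32, 8, [800e6, 400e6, 200e6]):
        print(f"{f / 1e6:.0f} MHz: bus width {w_b:.0f} bits, {n_r:.0f} ranks")
```

Halving the frequency doubles the bus width and halves the number of ranks loading each data line, which is the fanout reduction the slide refers to.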


Summary

• Introduced CACTI-IO with off-chip models
• CACTI-IO models include:
  • IO/interconnect dynamic and termination power
  • PHY power
  • Voltage/timing budgets for eye compliance
  • IO area
• 3 case studies show the capabilities of CACTI-IO:
  • Calculate off-chip power/area/timing
  • Combine on-chip and off-chip power
  • Identify key configuration choices and optimizations
• Ongoing work:
  • Extend the models to other types of off-chip memory and off-chip configurations, including PCRAM


Thank You!