High Speed Interface Techniques for Computer Peripherals

Dr.-Ing. Peter Gregorius

Workshop High Speed Interconnects, Universität Stuttgart INT, 7 November 2008

Confidential · Qimonda · Dr. Peter Gregorius · PD · November 2008 · Page 2

High Speed Interface Techniques

► Introduction, Trends and Limitations
► Review on Cascading Techniques (ISSCC 2007)
► Clocking
► Outlook & Summary


Computer Peripherals I/O Trend

SDRAM data rate per pin

[Figure: four memory/controller link diagrams]

High Performance Graphic Memory
GDDR → 1 Gbit/s
GDDR3 → 2.4 Gbit/s
GDDR4 → 2.8 Gbit/s
GDDR5 → >5 Gbit/s

High Performance PC Memory
DDR → 0.4 Gbit/s
DDR2 → 0.8 Gbit/s
DDR3 → 1.6 Gbit/s
DDR4 → 3.2 Gbit/s


Memory Roadmap – Signaling

Example: GDDR5 / 2D planar chip-to-chip
2 Gbit/s/pin: TUI = 500 ps, Teye = 340 ps
8 Gbit/s/pin: TUI = 125 ps, Teye = 45 ps
Note: SE signaling; channel data contains worst-case crosstalk and is based on an actual board design.

Example: DDR4 (under discussion) / dual slot
2 Gbit/s/pin: TUI = 500 ps, Teye = 180 ps
3.2 Gbit/s/pin: TUI = 312.5 ps, Teye = -

Conditions:
1. SE signaling, at nominal supply voltage (1.5 V)
2. No power supply noise
3. Physical BER = 10^-12
4. 4-layer PCB
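The TUI values on this slide follow directly from the per-pin data rate, and dividing Teye by TUI shows how much of each bit time remains open. A minimal Python sketch of that arithmetic; the function names and the list layout are illustrative choices, only the numbers come from the slide:

```python
def unit_interval_ps(rate_gbit_s: float) -> float:
    """Unit interval (one bit time) in picoseconds for a per-pin rate in Gbit/s."""
    return 1000.0 / rate_gbit_s


def eye_fraction(rate_gbit_s: float, t_eye_ps: float) -> float:
    """Eye width expressed as a fraction of the unit interval."""
    return t_eye_ps / unit_interval_ps(rate_gbit_s)


# (label, data rate in Gbit/s/pin, Teye in ps), values as quoted on the slide
SLIDE_POINTS = [
    ("GDDR5 planar, 2 Gbit/s", 2.0, 340.0),
    ("GDDR5 planar, 8 Gbit/s", 8.0, 45.0),
    ("DDR4 dual slot, 2 Gbit/s", 2.0, 180.0),
]

for label, rate, t_eye in SLIDE_POINTS:
    ui = unit_interval_ps(rate)
    print(f"{label}: TUI = {ui:.1f} ps, eye = {eye_fraction(rate, t_eye):.2f} UI")
```

For the GDDR5 example the open eye shrinks from 0.68 UI at 2 Gbit/s to 0.36 UI at 8 Gbit/s, which is the same relative opening the DDR4 dual-slot channel already shows at 2 Gbit/s.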


Pin count, channel bandwidth and data rate

[Figure: Memory I/O Evolution, plotted per memory standard. Annotations: SE off-package limit ~2.6 Gbit/s (BER = 10^-12); DS off-package limit ~12 Gbit/s (BER = 10^-12); Northbridge ~150 signals.]

► For the last decade the effective (DQ) data pin count has remained constant. The growth in data rate and bandwidth came from increasing the frequency.

► Following the bandwidth roadmap, single-ended signaling will hit a wall in 2012/13 (signal and power integrity).

► Differential signaling (DS) may give room for one or two more generations, but it means the end of the (G)DDRx evolution.
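With a constant pin count, channel bandwidth scales linearly with the per-pin data rate. The sketch below works this through for the PC memory rates quoted earlier in the deck and checks each rate against the ~2.6 Gbit/s single-ended off-package limit. The 64-pin data bus is an assumption made for illustration and is not stated on the slide; the function and constant names are likewise hypothetical.

```python
SE_OFF_PACKAGE_LIMIT_GBIT_S = 2.6  # from the slide (BER = 10^-12)
DQ_PINS = 64                       # assumption: 64 data pins per channel

# Per-pin data rates in Gbit/s, as listed on the I/O trend slide
PER_PIN_RATE_GBIT_S = {"DDR": 0.4, "DDR2": 0.8, "DDR3": 1.6, "DDR4": 3.2}


def channel_bandwidth_gbyte_s(dq_pins: int, rate_gbit_s: float) -> float:
    """Peak channel bandwidth in GByte/s for a given pin count and per-pin rate."""
    return dq_pins * rate_gbit_s / 8.0


def exceeds_se_limit(rate_gbit_s: float) -> bool:
    """True if the per-pin rate is above the single-ended off-package limit."""
    return rate_gbit_s > SE_OFF_PACKAGE_LIMIT_GBIT_S


for name, rate in PER_PIN_RATE_GBIT_S.items():
    bw = channel_bandwidth_gbyte_s(DQ_PINS, rate)
    flag = "above SE limit" if exceeds_se_limit(rate) else "within SE limit"
    print(f"{name}: {rate} Gbit/s/pin -> {bw:.1f} GByte/s ({flag})")
```

Under these assumptions only the DDR4 rate of 3.2 Gbit/s/pin lies above the single-ended limit, which is the "wall" the second bullet refers to.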

Key Requirements for Memory Subsystem
• BW, Speed, Latency (GByte/s, Mbit/s, ns)
• Density (Gbit)
• Power (mW)
• Cost ($$$)


Solution for high density - FBDIMM

AMB IO [Gb/s]    DDR2 IO [Mb/s]
4.8              800
4.0              667
3.2              533
2.4              400

[Figure: FBDIMM with AMB and DDR2 connector]

AMB: Advanced Memory Buffer

FBDIMM: Fully Buffered Dual Inline Memory Module

Source: N. Da Dalt, P. Gregorius, E. Thaller, L. Gazsi; "A Compact Triple-Band Low-Jitter Digital LC PLL with Programmable Coil in 130 nm CMOS," IEEE Journal of Solid-State Circuits, June 2005
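Reading the AMB and DDR2 figures on this slide as row-by-row pairs (an inference from the order in which the values appear), each AMB lane runs at about six times the DDR2 per-pin rate. A short Python check of that ratio; the names are illustrative:

```python
# (AMB lane rate in Gbit/s, DDR2 rate in Mbit/s), paired row by row from the slide
SPEED_GRADES = [(4.8, 800), (4.0, 667), (3.2, 533), (2.4, 400)]


def amb_to_ddr2_ratio(amb_gbit_s: float, ddr2_mbit_s: float) -> float:
    """Ratio of the AMB lane rate to the DDR2 per-pin rate (both converted to Mbit/s)."""
    return amb_gbit_s * 1000.0 / ddr2_mbit_s


ratios = [amb_to_ddr2_ratio(amb, ddr2) for amb, ddr2 in SPEED_GRADES]

for (amb, ddr2), ratio in zip(SPEED_GRADES, ratios):
    print(f"AMB {amb} Gbit/s / DDR2 {ddr2} Mbit/s = {ratio:.3f}")
```

The small deviations from exactly 6 for the 667 and 533 Mbit/s grades come from the rounded DDR2 rates.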


Alternative Approach: repeater DRAM

DIMM with multi-drop connections
• high pin count (DDR2 SO DIMM = 204 pins)
• parallel interface
• bidirectional single-ended DQ (data bus)
• unidirectional single-ended CA bus

DIMM with repeater DRAMs
• lower pin count
• serial interface (frame based)
• differential signaling
• simultaneous READ / WRITE


Repeater Topologies

Loop Back Topology (FBDIMM) versus Loop Forward Topology

• Differential P2P connections
• Embedded Command/Address WRITE bus
• Coded READ bus with seamless data insertion

Loop Back: READ / WRITE latency depends on the DRAM position within the daisy chain → needs HS FIFO

Loop Forward: READ + WRITE latency = const. → easier for the MCH to control
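The latency behaviour of the two topologies can be illustrated with a simple hop-count model. This is a sketch under stated assumptions, not the timing model of the test chip: every link is assumed to add the same delay, the loop-forward read data is assumed to travel on through the remaining DRAMs plus one return link, and the chain length and hop delay are hypothetical values.

```python
from typing import NamedTuple


class Latency(NamedTuple):
    write_ns: float  # controller -> target DRAM
    read_ns: float   # target DRAM -> controller

    @property
    def total_ns(self) -> float:
        return self.write_ns + self.read_ns


def loop_back(position: int, hop_ns: float) -> Latency:
    """Read data returns over the same `position` hops the command took."""
    return Latency(position * hop_ns, position * hop_ns)


def loop_forward(position: int, chain_length: int, hop_ns: float) -> Latency:
    """Read data continues through the remaining DRAMs and one return link."""
    return Latency(position * hop_ns, (chain_length - position + 1) * hop_ns)


CHAIN_LENGTH = 4  # hypothetical number of DRAMs in the daisy chain
HOP_NS = 1.0      # hypothetical delay per link, in ns

for k in range(1, CHAIN_LENGTH + 1):
    back = loop_back(k, HOP_NS)
    fwd = loop_forward(k, CHAIN_LENGTH, HOP_NS)
    print(f"DRAM {k}: loop back total = {back.total_ns} ns, "
          f"loop forward total = {fwd.total_ns} ns")
```

In this model the loop-back total grows with the DRAM position, while the loop-forward total is the same for every position, matching the "READ + WRITE latency = const." statement.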


Repeater DRAM Test Chip – HS IO Section

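[Block diagram: high-speed I/O section of the repeater DRAM test chip. Recoverable labels: data input RxP/RxN and clock input CrP/CrN with termination resistors (RT) and differential receivers (RX); sampler unit with samplers S1 to S4 and a 4x phase interpolator (PI); parallel-to-serial converter (P2S) with 4 data and 4 CLK lines; data output TxP/TxN and clock output CtP/CtN with termination resistors (RT) and differential transmitters (TX); IQ, DIV and clock tree blocks; internal nodes DxP/DxN, DzP/DzN, CiP/CiN; control signals CTL_P2S, CTL_PI and SEL; signal paths labeled "transparent repeat" and "re-sample path".]

[Timing sketch: waveforms Vo1 to Vo4 versus t, with sampling instants S1 to S4 and Bit0 to Bit3.]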


Sampler Eye Characterization

Data Rate = 4.8Gbit/s/lane (UI=208ps) / Accumulated Eye over 6 Lanes – Transparent Mode

[Figure: accumulated eye shmoo. X-axis: phase interpolator setting, 0 to 360° (60 steps). Y-axis: receiver offset, -200 mV to +200 mV. Four eyes, Bit0 to Bit3; annotated eye widths ~83 ps, ~108 ps, ~105 ps and ~66 ps.]

[Figure: layout showing DATA lanes L1 to L6 and CLK, repeated four times; marked dimension 6.5 mm.]

Source: Z. Gu, P. Gregorius, D. Kehrer, L. Neumann, T. Rickes, H. Ruckerbauer, R. Schledz, M. Streibl, et al.; "Cascading Techniques for a High-Speed Memory Interface," IEEE International Solid-State Circuits Conference (ISSCC), Session 12.7, February 2007
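An eye width can be read from one row of such a shmoo by counting the longest run of passing phase-interpolator steps and multiplying by the step size. The Python sketch below shows the idea on synthetic data. Two assumptions are not stated on the slide: that the 0 to 360° sweep spans four unit intervals (inferred from the four bit eyes appearing within one sweep), and the pass/fail pattern itself, which is made up. The sketch also ignores eyes that wrap around the 0°/360° boundary.

```python
def longest_pass_run(row):
    """Length of the longest run of consecutive passing phase steps."""
    best = run = 0
    for passed in row:
        run = run + 1 if passed else 0
        best = max(best, run)
    return best


def eye_width_ps(row, sweep_ps):
    """Eye width in ps, given the time span covered by the full phase sweep."""
    step_ps = sweep_ps / len(row)
    return longest_pass_run(row) * step_ps


UI_PS = 1000.0 / 4.8   # 4.8 Gbit/s/lane -> about 208 ps, as on the slide
SWEEP_PS = 4 * UI_PS   # assumption: 360 degrees of PI phase covers 4 UI

# Synthetic shmoo row with 60 steps (hypothetical data, not measured)
row = ([False] * 5 + [True] * 6 + [False] * 4) * 4

print(f"step = {SWEEP_PS / len(row):.2f} ps, eye = {eye_width_ps(row, SWEEP_PS):.1f} ps")
```

Under the four-UI assumption one phase step corresponds to roughly 13.9 ps, so the annotated widths of ~66 ps to ~108 ps would span only about five to eight steps.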


Intermediate Summary

► The proposed technique solves the density, pin count and power requirements of future architectures.

► Limitations in the physical link remain.

[Figure annotations, 4-layer PCB signaling limit: SE 2D planar ~2.6 Gbit/s/pin and DS 2D planar ~12 Gbit/s/pin (marked ►); SE 2D planar ~10 Gbit/s/pin and DS 2D planar ~18 Gbit/s/pin (marked ◄).]

► The DRAM roadmap doesn't follow the CPU (nCore) BW requirement.

► Clock and data distribution needs complex link training.

► Power consumption, testability etc.


GDDR5 Power Integrity ◄► Signal Integrity

Power Supply Sensitivity Model

[Diagram labels: $\sum_{i=1}^{n} v_n(t)$, $\Delta t_{BUF}$]

[Figure: two contour plots of eye closure at 6 Gbit/s, one with PLL off and one with PLL on. X-axis: peak-to-peak power supply noise amplitude [% VDDxnom], 0 to 9. Y-axis: power supply noise frequency [MHz], 200 to 1600. Contour levels 0.2 to 0.5, with the contour "Eye closure = 0.5 UI" marked.]


Impact of Clock Path Device Mismatch

standard deviation = 9%·UI (7%·UI from the clock generator)
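If the clock generator and the rest of the clock path are treated as independent contributions that add in quadrature (an assumption; the slide does not say how the two figures combine), the part not attributed to the clock generator can be estimated as follows. The function name is illustrative.

```python
import math


def remaining_sigma(total_sigma: float, known_sigma: float) -> float:
    """Root-sum-square remainder, assuming independent contributions."""
    return math.sqrt(total_sigma ** 2 - known_sigma ** 2)


TOTAL_SIGMA_PCT_UI = 9.0      # from the slide
CLOCK_GEN_SIGMA_PCT_UI = 7.0  # from the slide

rest = remaining_sigma(TOTAL_SIGMA_PCT_UI, CLOCK_GEN_SIGMA_PCT_UI)
print(f"Remaining clock path contribution: {rest:.2f} % UI")
```

Under that assumption roughly 5.7 % UI would come from the clock path outside the clock generator.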

[Figure: histogram of eye width. X-axis: eye width, 70% to 130% of UI. Y-axis: number of samples, 0 to 10. Series: Eye1_meas (measured) and Eye1_sim (simulated).]


Ghost Signaling Motivation

[Figure: block diagrams of both schemes, each with I/O 1 to I/O 4 (RX / TX), RX1 to RX4 and TX1 to TX4; further labels: CLK, CSU, >1 mm.]

Standard On-Chip Clocking
• phase matching problem for the on-chip QR sampling clock
• needs complex bit-wise data training
• distributed timing recovery (MCH ↔ DRAM)
• signal integrity is difficult; poor scalability

Ghost Signaling
• clock distributed together with data
• source synchronous
• no distributed timing recovery needed
• no internal clock tree
• saves power, less complex
• good scalability
• pin count reduced
• improved signal integrity (no sampling phase problems)


Ghost Signaling Repeater High Speed I/O


Summary

High Density Solutions (e.g. Server)
→ FBDIMM / DDR2; Advanced Memory Buffer, differential signaling, 2.4 Gbit/s – 4.8 Gbit/s
→ RDIMM / DDR3; single-ended point-to-point or point-to-2-point, ~2 Gbit/s max
→ Buffer-on-Board / NG DRAM with repeater; differential signaling, 4.8 Gbit/s – 9.6 Gbit/s

Ultra High Bandwidth (Graphic)
→ GDDR5; single-ended point-to-point, up to ~10 Gbit/s
→ GDDR6; 3D integration, single-ended ~10 Gbit/s but x2 pin count (x64 DRAM)

→ Buffer-on-Board / DDR3 / DDR4(?); differential signaling, 4.8 Gbit/s – 9.6 Gbit/s

High Bandwidth with 'moderate' Power (Notebook)
→ DDR4; single-ended dual slot up to 2.4 Gbit/s, or point-to-point ~3 Gbit/s
→ Ultra-parallel single-ended 3D integration, up to 2 Gbit/s

3D-System on Interposer


Thank you

The World’s Leading Creative Memory Company