Circuit Design for a 2.2 GByte/s Memory Interface

Stefanos Sidiropoulos

Work done at Rambus Inc. with A. Abhyankar, C. Chen, K. Chang, TJ Chin, N. Hays, J. Kim, Y. Li, G. Tsang, A. Wong, D. Stark

Increasing Chip I/O Bandwidth

Computers:

Main memory: SDRAM100 (100 Mbps) → RDRAM (0.8-1.1 Gbps)

Peripherals: PCI (66 Mbps) → Infiniband (2.5 Gbps)

Networks:

Physical front end, LAN: Fast-Eth (100 Mbps) → Gigabit-Eth (1 Gbps)

Physical front end, WAN: OC-12 (625 Mbps) → OC-48 (2.4 Gbps)

Switch fabric: 625 Mbps → 2.5 Gbps

Outline

Overview: Timing Methods, Signaling Methods

Timing Circuits

Signaling Circuits

Results

Main Issues

Drive and capture signals at the correct time: bit times are as small as 2-3 gate delays

Send and receive signals robustly: noise is a large fraction of the signal

[Figure: Tx drives Rx over a channel (PCB, coax, fiber); the received bit stream 1 0 0 1 0 1 is annotated < 400-mV and < 1-ns]

Timing Architectures

Synchronous:

Same frequency and phase

Conventional busses, conventional memories

Mesochronous:

Same frequency, unknown phase

Fast memories/busses, MP networks, interconnection networks

Plesiochronous:

Almost the same frequency

Network front-end, router core

[Figure: timing diagrams. Synchronous: both ends at F0 with the same phase t. Mesochronous: both ends at F0 with tA ≠ tB. Plesiochronous: F1 ≈ F2]

Synchronous Systems

On-chip clock is a multiple of system clock:

Synthesize on-chip clock frequency

On-chip clock phase varies:

Cancel clock buffer delay

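[Figure: a PLL/DLL generates the on-chip clock CKC from the external clock CKX for the on-chip logic; the timing diagram shows CKX, data input DI, and CKC]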

Mesochronous Systems

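Position the on-chip sampling clock at the optimal point, i.e., maximize “timing” margin

[Figure: a PLL/DLL takes the reference clock CKSRC and generates the receive clock CKRCV for the receiver (rcvr) and logic; the timing diagram shows CKSRC, data D0 D1 D2 D3, and CKRCV]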
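One way to read "maximize timing margin" is as a search for the center of the widest passing window in a sampling-phase sweep. The sketch below assumes a hypothetical pass/fail sweep result (`sweep`) and a hypothetical helper name; it is an illustration of the idea, not the circuit described in the slides.

```python
def best_sampling_phase(passes):
    """Return (center_index, width) of the widest run of passing phases."""
    best_start, best_len = 0, 0
    start = None
    # A trailing False acts as a sentinel that closes the last run.
    for i, ok in enumerate(list(passes) + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    if best_len == 0:
        raise ValueError("no passing phase found")
    return best_start + (best_len - 1) // 2, best_len


# Hypothetical sweep: True means data was captured correctly at that phase step.
sweep = [False, False, True, True, True, True, True, False, False, True]
center, width = best_sampling_phase(sweep)
print(center, width)
```

Sampling at the center leaves equal margin to both edges of the data-valid window, which is the sense in which the margin is maximized.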

Plesiochronous Systems

Recover incoming data fundamental frequency

Position sampling clock at the “optimal” point

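[Figure: a CRC block derives the sampling clock CKR from DIN for the receiver (rcvr) and logic; the timing diagram shows DIN bits D0 D1 and CKR]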
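To make "almost the same frequency" concrete, the following sketch estimates how quickly the sampling point drifts when transmitter and receiver clocks differ by a small offset. The 100-ppm offset and 2.5-Gbps bit time are assumed values chosen for illustration only.

```python
def bits_per_slip(ppm_offset):
    """Number of bits after which the phase has drifted by one full bit time."""
    return 1e6 / abs(ppm_offset)


def drift_ps_per_bit(bit_time_ps, ppm_offset):
    """Phase drift accumulated during a single bit time, in picoseconds."""
    return bit_time_ps * abs(ppm_offset) / 1e6


# Assumed example: 2.5 Gbps link (400-ps bit time) with a 100-ppm offset.
print(bits_per_slip(100))          # bits until a full bit of drift
print(drift_ps_per_bit(400, 100))  # ps of drift per bit
```

Because the drift never stops, the receiver has to track the incoming phase continuously instead of locking once.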

Signaling

Send and receive the data impaired by noise:

Independent noise sources: thermal and uncorrelated system noise

Proportional noise sources: reflections, cross-talk, signal-return noise

[Figure: matrix of signaling options: differential vs. single-ended receivers and high-impedance vs. low-impedance drivers; labels include VS, VS/2, and a shared reference]

Outline

Background

Timing Circuits

Signaling Circuits

Results

Rambus Memory Channel

1.6-GB/s (800 Mbps/pin):

Current mode signaling

Source synchronous clocking

[Figure: controller and memories M1, M2 ... M16 on a shared channel; labels: ClkGen, CTM, CFM, data D0 D1 D2, and 24]

Increasing System Performance

Increase transfer rate:

System clock: 400 → 533 MHz (800 → 1066 Mbps/pin)

Peak bandwidth: 1.6 → 2.2 GB/s

Challenges:

Timing margin: device variations, channel imperfections

Voltage errors: bus hand-off
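The arithmetic behind these figures can be sketched as follows. The sketch assumes two transfers per clock cycle and a 16-pin (2-byte) data path, both inferred from the 800 Mbps/pin and 1.6 GB/s numbers; the function name is illustrative.

```python
def peak_bandwidth_gbytes(clock_mhz, data_pins=16, transfers_per_clock=2):
    """Peak bandwidth in GB/s for a bus clocked at clock_mhz."""
    mbps_per_pin = clock_mhz * transfers_per_clock
    return mbps_per_pin * data_pins / 8 / 1000


for clock in (400, 533):
    print(clock, "MHz ->", peak_bandwidth_gbytes(clock), "GB/s")
```

Under these assumptions 533 MHz gives about 2.13 GB/s; the 2.2 GB/s headline corresponds to roughly 1.1 Gbps/pin.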

Prototype DRAM Interface Chip

Technology: 0.25-µm, 2.5-V CMOS

Supply: 1.8-V

Active Area: 11.2 x 1.3 mm²

Package: LGA, µBGA

Chip Includes:

T/R DLL

2-Data bytes, 1-Address byte

Packet Protocol Logic

18 KB SRAM

Outline

Background

Timing Circuits: Requirements, Architecture, Timing Error Sources

Signaling Circuits

Results

RDRAM Timing Circuit Requirements

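[Figure: receive timing: CFM, DQ/RQ bits D0 D1 D2 D3, and RCLK; transmit timing: CTM, DQ bits D0 D1 D2 D3, and TCLK. Block diagram: a DLL fed by CTM and CFM distributes TCLK and RCLK to the 8-bit DQA, DQB, and RQ groups]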

PLLs vs DLLs

PLL (second/third order loop):

Stability is an issue

Frequency synthesis easy

Ref. Clk jitter gets filtered

Phase error accumulates

DLL (first order loop):

Stability guaranteed

Frequency synthesis problematic

Ref. Clk jitter propagates

Phase error does not accumulate

[Figure: PLL: refclk → PD → Filter → VCO → clk, with ÷N in the feedback path. DLL: refclk → PD → Filter → VCDL → clk]

Supply Noise: DLL vs PLL

No need for clock multiplication → use a DLL

* Supply sensitivity: 0.1%-delay/%-supply/element

[Figure: phase error (deg., 0 to -50) vs. time (0 to 1500 ns) for a 6-stage DLL vs. a 6-stage PLL; curves: DLL, PLL with 20-MHz bandwidth, PLL with 5-MHz bandwidth; the DLL and PLL peaks are marked]
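The difference between accumulating and non-accumulating phase error can be illustrated with a toy discrete-time model. This is a minimal sketch under assumed loop gains and arbitrary units, not a model of the measured circuits: a supply step adds a fixed delay error, which a delay line sees once per pass while an oscillator integrates it every cycle until the loop reacts.

```python
def dll_error(disturbance, gain=0.1, steps=1000):
    """First-order loop: the delay error appears once and is then nulled."""
    correction, errors = 0.0, []
    for _ in range(steps):
        err = disturbance - correction
        errors.append(err)
        correction += gain * err
    return errors


def pll_error(disturbance, kp=0.1, ki=0.005, steps=1000):
    """Second-order loop: the oscillator integrates the period error."""
    phase, integ, errors = 0.0, 0.0, []
    for _ in range(steps):
        errors.append(phase)
        # Both updates use the phase value from before this cycle.
        phase, integ = phase + disturbance - kp * phase - integ, integ + ki * phase
    return errors


dll_peak = max(abs(e) for e in dll_error(1.0))
pll_peak = max(abs(e) for e in pll_error(1.0))
print(dll_peak, pll_peak)
```

With these assumed gains the PLL peak is several times the DLL peak, and lowering the PLL loop gains makes it larger still, which matches the qualitative trend of the bandwidth comparison in the plot.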

Conventional DLL

Limited phase acquisition range

Generate delay by using phase interpolation

[Figure: conventional DLL loop; labels: PD, refclk, clk]

Variable Phase Interpolation

If φ, ψ selectively span 2π:

Can generate any Θ

φ, ψ can be generated by a DLL

$\Theta = \frac{(N - w)\cdot\phi + w\cdot\psi}{N}$, w = 0..N

[Figure: phase diagram with taps φ0..φ3 and ψ0..ψ3; the selected phases φ and ψ (also labeled φ', ψ') are interpolated to produce Θ]
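The interpolation rule can be sketched numerically. `ideal_phase` implements the weighted average from the slide; `phasor_phase` is an added assumption that models the interpolator as a weighted sum of two sinusoids, to show how a real mixer can deviate slightly from the ideal linear law. The 45-degree tap spacing and N = 16 are illustrative values.

```python
import math


def ideal_phase(phi_deg, psi_deg, w, n):
    """Theta = ((N - w) * phi + w * psi) / N, in degrees."""
    return ((n - w) * phi_deg + w * psi_deg) / n


def phasor_phase(phi_deg, psi_deg, w, n):
    """Phase of the weighted sum of two unit sinusoids (assumed mixer model)."""
    phi, psi = math.radians(phi_deg), math.radians(psi_deg)
    x = (n - w) * math.cos(phi) + w * math.cos(psi)
    y = (n - w) * math.sin(phi) + w * math.sin(psi)
    return math.degrees(math.atan2(y, x))


for w in (0, 4, 8, 16):
    print(w, ideal_phase(0, 45, w, 16), round(phasor_phase(0, 45, w, 16), 3))
```

The two models agree at the endpoints and at mid-scale; between those points the sinusoidal model departs from the linear one by a fraction of a degree for closely spaced taps.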

RDRAM Delay Buffers

Use differential elements with replica biasing:

Increased noise immunity

Not easily portable

Require larger supply head-room but ok for 1.8-V

[Figure: differential delay element with a bias circuit; labels: VCTL, VCN, VCP]

[Hu’92]

[Maneatis’93]

Interpolator Design

Interpolator bias and input/output time constant scales

TDC remains linear over large frequency range

[Figure: differential interpolator schematic; labels: DAC (5), VCN, VCP]

Dual DLL Block Diagram

[Figure: block diagram with a CORE loop (Input Clock, Amp, PD/CP/Bias) and a PERIPHERAL loop (Ref Clock, Amp, PD, up/dn FSM)]

Device Timing Variations

100 parts: µ ≅ 30-ps, σ ≅ 20-ps

[Figure: Receive Window Distribution; histogram of # parts (0 to 25) vs. receive-valid window center (-50 to 100 ps)]
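To put the measured spread in context, the sketch below compares a worst-case window offset with the bit time. The 3-sigma bound and the 1066 Mbps/pin rate used for the bit time are assumptions made for illustration.

```python
def bit_time_ps(mbps_per_pin):
    """Bit time in picoseconds for a given per-pin data rate."""
    return 1e6 / mbps_per_pin


def worst_case_offset_ps(mean_ps, sigma_ps, k=3.0):
    """Mean offset plus k standard deviations (k = 3 is an assumed bound)."""
    return abs(mean_ps) + k * sigma_ps


offset = worst_case_offset_ps(30.0, 20.0)
fraction = offset / bit_time_ps(1066)
print(offset, round(fraction, 3))
```

Under these assumptions, part-to-part variation alone can consume close to a tenth of the bit time.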

Propagation Delay Mismatch

Clock and data channels different

Clock and data spectral components different

Propagation delays can differ by ~ 100-ps

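Regain margin: every DRAM transmit/receive timing must be offset from its lock point

$v(t) = A\cdot[\sin(\omega t) + r\cdot\sin(\omega t - 2\phi)] \Rightarrow v(t) = A'\cdot\sin(\omega t + \theta)$

[Figure: a discontinuity near the DRAM on the module reflects a fraction r of the clock with phase delay 2φ; the phasor diagram shows A and rA (at angle 2φ) summing to A' at angle θ]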
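The phase shift caused by a reflection can be computed by treating the two sinusoids as phasors, so that A' and θ are the magnitude and angle of 1 + r·exp(−j·2φ). The reflection coefficient, the phase φ, and the helper names below are illustrative assumptions.

```python
import math


def reflected_sum(r, phi_deg, amplitude=1.0):
    """Return (A', theta_deg) for A*[sin(wt) + r*sin(wt - 2*phi)]."""
    two_phi = math.radians(2 * phi_deg)
    re = 1 + r * math.cos(two_phi)
    im = -r * math.sin(two_phi)
    return amplitude * math.hypot(re, im), math.degrees(math.atan2(im, re))


def phase_to_ps(theta_deg, clock_mhz):
    """Convert a clock phase shift in degrees to picoseconds."""
    return theta_deg / 360.0 * 1e6 / clock_mhz


# Assumed example: 10% reflection arriving a quarter cycle late (2*phi = 90 deg).
a_prime, theta = reflected_sum(0.1, 45)
print(round(a_prime, 4), round(theta, 2), round(phase_to_ps(theta, 533), 1))
```

With these assumed values the clock edge moves by about 30 ps at 533 MHz, which shows how a modest reflection contributes to the mismatch quoted above.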

Original Dual-DLL

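[Figure: original dual-DLL. Core loop: Input Clock, Amp, PD/CP/Bias. Peripheral loop: a Mux+Interpolator produces the Main Clock to I/O; a PD compares FB Clock with Ref Clock; the up/dn FSM, 8-bit Counter, and Decoder select the phase]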

DLL for “in-system” Calibration

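[Figure: dual-DLL with calibration. The feedback path (Mux+Interpolator, PD, up/dn Counter, Decoder, FB Clock, Ref Clock) locks as before; an Adder sums the 8-bit counter value with Offset[7:0] (set at boot time) and drives a second Decoder and Mux+Interpolator that produce the Main Clock to I/O]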
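The offset path can be sketched as modular arithmetic on the 8-bit phase code. This sketch assumes the code spans a full clock cycle uniformly and that the adder wraps modulo 256; the function names are illustrative.

```python
CODE_BITS = 8
CODES = 1 << CODE_BITS  # 256 phase steps per clock cycle (assumed uniform)


def calibrated_code(lock_code, offset):
    """Phase code sent to the I/O clock path: lock point plus boot-time offset."""
    return (lock_code + offset) % CODES


def code_to_degrees(code):
    """Nominal phase for a given code."""
    return code * 360.0 / CODES


def lsb_ps(clock_mhz):
    """Nominal time step of one code at the given clock frequency."""
    return 1e6 / clock_mhz / CODES


print(calibrated_code(250, 10), code_to_degrees(64), lsb_ps(400), round(lsb_ps(533), 2))
```

Because the offset is added to the locked code, the main clock keeps tracking the reference while staying shifted by a fixed amount.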

Outline

Background

Timing Circuits

Signaling Circuits: Bus Environment Challenges, Output Subsystem Design

Results

“Back-to-Back” Reads

Compliance voltage for M2 as low as 0.5-V

[Figure: back-to-back reads from Mem1 and Mem2 to the controller, with flight times Δt1, Δt2, Δt1+Δt2, and 2Δt2; bus voltage levels Vterm, Vterm−Vsw, and Vterm−1.5Vsw]

Output Driver Subsystem

[Figure: output drivers Q0/DQ0, Q1/DQ1 ... Q8/DQ8, each 7 wide, controlled by CC[6:0] and EN; the Driver Bias Voltage Generator produces VG[6:0]; labels: VGREF, VGATE]

Driver Bias Voltage Generator

Constant gate overdrive:

Increase noise immunity

Constant saturation margin over PVT

[Figure: bias generator schematic; labels: IR, IC, VT, R, VGREF]

Driver IV Characteristics

[Figure: Iout (mA, 0 to 35) vs. Vpad (V, 0 to 1.8) for the TT, SS, and FF corners]

Output Driver Model

Negative resistance compensates for finite ro

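$i_{out} = g_m\cdot v_g + v_o/r_o - A\cdot g_{m2}\cdot v_o$

[Figure: small-signal model with transconductance gm driven by vG, output resistance ro at vO, and a feedback path of gain -A through gm2]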
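The cancellation can be checked numerically with the small-signal expression i_out = gm·vg + vo/ro − A·gm2·vo. The component values below are hypothetical and chosen so that A·gm2 equals 1/ro.

```python
def output_current(vg, vo, gm, ro, a=0.0, gm2=0.0):
    """Small-signal output current with optional negative-resistance feedback."""
    return gm * vg + vo / ro - a * gm2 * vo


def output_conductance(ro, a, gm2):
    """d(i_out)/d(vo); zero when the feedback exactly cancels 1/ro."""
    return 1.0 / ro - a * gm2


# Hypothetical values: gm = 50 mS, ro = 200 ohm, A = 1, gm2 = 5 mS.
gm, ro, a, gm2 = 0.05, 200.0, 1.0, 0.005
without_fb = [output_current(0.5, vo, gm, ro) for vo in (0.2, 1.0)]
with_fb = [output_current(0.5, vo, gm, ro, a, gm2) for vo in (0.2, 1.0)]
print(without_fb, with_fb, output_conductance(ro, a, gm2))
```

Without feedback the current varies with the pad voltage; with A·gm2 = 1/ro the two terms cancel and the driver behaves like an ideal current source in this model.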

Output Driver Schematic

M6-M7 control maximum feedback current

M3/M4 ratio constrained to minimize time constant

[Figure: driver schematic; labels: VG[6:0], Q, DQ, M1[6:0], M2[6:0], M3, M4, M5, M6[1:0], M7[1:0], SL[1:0]]

Driver IV Characteristics

[Figure: Iout (mA, 0 to 35) vs. Vpad (V, 0 to 1.8) for the TT, SS, and FF corners]

Outline

Introduction

Timing

Signaling

Results

Operating Range

[Figure: TBIT (nsec, 0.75 to 2.75) vs. VDD (Volts, 1.0 to 2.5); marked operating point: 1.8-V, 1.1 Gbps/pin]

Measured DLL Jitter

< 100-ps peak-peak with interface and core active

Uncalibrated Output Data-valid Window

TBIT = 900-ps, TOFFS = default → TQ offset ~ 150-ps

[Figure: VDD (Volts, 1.5 to 2.5) vs. Δt (ns, -1.0 to 1.0); data-valid window 760-ps wide and 1-V tall]

Calibrated Output Data-valid Window

TBIT = 900-ps, calibrated TOFFS → TQ offset < 20-ps

[Figure: VDD (Volts, 1.5 to 2.5) vs. Δt (ns, -1.0 to 1.0); data-valid window 780-ps wide and 1-V tall]
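As a quick check on the two measurements, the window widths can be expressed as a fraction of the 900-ps bit time. The helper name is illustrative.

```python
def window_fraction(window_ps, bit_time_ps=900.0):
    """Data-valid window as a fraction of the bit time."""
    return window_ps / bit_time_ps


uncalibrated = window_fraction(760.0)
calibrated = window_fraction(780.0)
print(round(uncalibrated, 3), round(calibrated, 3))
```

The window width grows by 20 ps, while the TQ offset drops from about 150 ps to under 20 ps, so the main benefit of calibration is re-centering the window.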

Measured Calibration Accuracy

DNL, INL < 2-LSB

[Figure: offset (degrees, 0 to 350) vs. code # (0 to 250), measured at 400 MHz and 533 MHz]
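DNL and INL figures of this kind are typically derived from a phase-versus-code sweep. The sketch below uses one common endpoint-free definition (deviation from an ideal step of one LSB, referenced to the first code) and made-up sample data; it is not the procedure used for the measurement above.

```python
def dnl_inl(phases_deg, lsb_deg):
    """Return (dnl, inl) lists in LSB for a monotonic phase-vs-code sweep."""
    dnl = [(b - a) / lsb_deg - 1.0 for a, b in zip(phases_deg, phases_deg[1:])]
    inl = [(p - phases_deg[0]) / lsb_deg - i for i, p in enumerate(phases_deg)]
    return dnl, inl


# Hypothetical sweep with a 1-degree LSB and one misplaced code.
dnl, inl = dnl_inl([0.0, 1.0, 2.5, 3.0, 4.0], lsb_deg=1.0)
print(dnl, inl)
print(max(abs(x) for x in dnl + inl) < 2.0)  # within a 2-LSB bound
```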

RDRAM Power Modes

DLL must go into low-power “nap” mode

IVDD < 4-mA

Restore clock phase within 80-ns

Digital peripheral loop logic naturally holds state

Hold state of core loop on 25-pF charge-pump capacitor

Measured Driver I-V Characteristics

[Figure: measured Iout (mA, 0 to 35) vs. Vpad (V, 0 to 1.8) with feedback (FB) off and FB on]

Summary

Increasing memory interface bandwidth: Minimize both voltage and timing errors:

Voltage errors are systematic

Compensated with new driver design

Timing Errors are unpredictable

Compensated with “in-system” calibration

Expect to see more digital “calibration” in high speed links:

Challenge is to minimize overhead:

Area, power, yield

System bring-up and ease of use