Dr.-Ing. Peter Gregorius
High Speed Interface Techniques for Computer Peripherals
Workshop High Speed Interconnects, Universität Stuttgart INT, 07 November 2008
Confidential · Qimonda · Dr. Peter Gregorius · PD · November 2008 · Page 2
High Speed Interface Techniques
Introduction, Trends and Limitations
Review on Cascading Techniques (ISSCC 2007)
Clocking
Outlook & Summary
Computer Peripherals I/O-Trend
SDRAM data rate per pin
[Figure: memory/controller link diagrams]
High Performance Graphic Memory
GDDR: 1Gbit/s
GDDR3: 2.4Gbit/s
GDDR4: 2.8Gbit/s
GDDR5: >5Gbit/s
High Performance PC Memory
DDR: 0.4Gbit/s
DDR2: 0.8Gbit/s
DDR3: 1.6Gbit/s
DDR4: 3.2Gbit/s
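The PC roadmap above doubles the per-pin data rate with every DDR generation, while the graphics steps are uneven. The short sketch below computes the generation-to-generation ratio from the listed values. Treating GDDR5's ">5Gbit/s" as exactly 5 Gbit/s is an assumption made only so the ratio can be computed.

```python
# Per-pin data rates (Gbit/s) as listed on the slide.
# GDDR5 is listed as ">5Gbit/s"; 5.0 is used here as a lower bound (assumption).
GRAPHICS = [("GDDR", 1.0), ("GDDR3", 2.4), ("GDDR4", 2.8), ("GDDR5", 5.0)]
PC = [("DDR", 0.4), ("DDR2", 0.8), ("DDR3", 1.6), ("DDR4", 3.2)]


def generation_ratios(series):
    """Return (name, rate / previous generation's rate) for each roadmap step."""
    return [
        (name, rate / prev_rate)
        for (_, prev_rate), (name, rate) in zip(series, series[1:])
    ]


if __name__ == "__main__":
    for label, series in (("Graphics", GRAPHICS), ("PC", PC)):
        for name, ratio in generation_ratios(series):
            print(f"{label}: {name} = {ratio:.2f}x previous generation")
```

For the PC series every step evaluates to 2.00x, which is the "increase in frequency" that the next slides identify as the only source of bandwidth growth.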
Memory Roadmap – Signaling
Example: GDDR5 / 2D planar chip-to-chip
2Gbit/s/pin: T_UI = 500ps, T_eye = 340ps
8Gbit/s/pin: T_UI = 125ps, T_eye = 45ps
Note: SE signaling; channel data contains worst-case cross-talk and is based on an actual board design.
Example: DDR4 (under discussion) / dual slot
2Gbit/s/pin: T_UI = 500ps, T_eye = 180ps
3.2Gbit/s/pin: T_UI = 312.5ps, T_eye = - (no open eye)
Conditions:
1. SE signaling at nominal supply voltage (1.5V)
2. No power supply noise
3. Physical BER = 10^-12
4. 4-layer PCB
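The unit interval follows directly from the per-pin data rate (T_UI = 1 / data rate), so the values above can be checked and the remaining eye expressed as a fraction of one UI. A minimal sketch using only the numbers from the slide:

```python
def unit_interval_ps(rate_gbps: float) -> float:
    """Unit interval in picoseconds: T_UI = 1 / data rate (Gbit/s -> ps)."""
    return 1000.0 / rate_gbps


def eye_fraction(t_eye_ps: float, rate_gbps: float) -> float:
    """Remaining eye opening T_eye as a fraction of one unit interval."""
    return t_eye_ps / unit_interval_ps(rate_gbps)


# (case, data rate in Gbit/s per pin, T_eye in ps) as given on the slide.
CASES = [
    ("GDDR5 chip-to-chip", 2.0, 340.0),
    ("GDDR5 chip-to-chip", 8.0, 45.0),
    ("DDR4 dual slot", 2.0, 180.0),
]

if __name__ == "__main__":
    for case, rate, t_eye in CASES:
        print(f"{case} @ {rate} Gbit/s: T_UI = {unit_interval_ps(rate):.1f} ps, "
              f"eye = {eye_fraction(t_eye, rate):.0%} of UI")
    # The 3.2 Gbit/s DDR4 case has no T_eye value on the slide.
    print(f"DDR4 dual slot @ 3.2 Gbit/s: T_UI = {unit_interval_ps(3.2):.1f} ps")
```

The GDDR5 eye shrinks from 68% of the UI at 2 Gbit/s to 36% at 8 Gbit/s, and the dual-slot DDR4 channel already retains only 36% at 2 Gbit/s.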
Pin count, channel bandwidth and data rate
[Figure: Memory I/O Evolution, x axis: Standard]
SE off-package limit ~2.6Gbit/s (BER = 10^-12)
DS off-package limit ~12Gbit/s (BER = 10^-12)
Northbridge ~150 signals
► Over the last decade the effective data (DQ) pin count has remained constant. The evolution in data rate and bandwidth was driven entirely by the increase in frequency!
► Following the bandwidth roadmap, single-ended signaling will hit a wall in 2012/13 (signal + power integrity!)
► Differential signaling (DS) may give room for one or two generations, but would mean the end of the (G)DDRx evolution.
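With a constant DQ pin count, peak bandwidth can only grow through the per-pin rate: bandwidth = pins × rate / 8. The sketch below applies this to the DDR roadmap listed earlier. The 64-bit data bus is an assumption (a common DIMM width) used purely for illustration; the slide does not state a pin count.

```python
def bandwidth_gbyte_s(dq_pins: int, rate_gbps_per_pin: float) -> float:
    """Peak data bandwidth in GByte/s: pins x per-pin rate / 8 bits per byte."""
    return dq_pins * rate_gbps_per_pin / 8.0


# ASSUMPTION: fixed 64-bit data bus, not taken from the slide.
DQ_PINS = 64
PC_RATES = {"DDR": 0.4, "DDR2": 0.8, "DDR3": 1.6, "DDR4": 3.2}  # Gbit/s per pin

if __name__ == "__main__":
    for name, rate in PC_RATES.items():
        print(f"{name}: {bandwidth_gbyte_s(DQ_PINS, rate):.1f} GByte/s "
              f"with {DQ_PINS} DQ pins")
```

Because the pin count is held fixed, each doubling of bandwidth in this model (3.2, 6.4, 12.8, 25.6 GByte/s) comes entirely from doubling the per-pin rate, which is exactly what pushes single-ended signaling towards its limit.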
Key Requirements for Memory Subsystem
BW, Speed, Latency (GByte/s, Mbit/s, ns)
Density (Gbit)
Power (mW)
Cost ($$$)
Solution for high density - FBDIMM
[Figure: FBDIMM with AMB and DDR2 connector]
AMB IO [Gb/s] vs. DDR2 IO [Mb/s]:
4.8Gb/s ↔ 800Mb/s
4.0Gb/s ↔ 667Mb/s
3.2Gb/s ↔ 533Mb/s
2.4Gb/s ↔ 400Mb/s
AMB: Advanced Memory Buffer
FBDIMM: Fully Buffered Dual Inline Memory Module
Source: N. Da Dalt, P. Gregorius, E. Thaller, L. Gazsi; "A Compact Triple-Band Low Jitter Digital LC PLL with Programmable Coil in 130nm CMOS," IEEE Journal of Solid-State Circuits, June 2005
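Dividing each AMB lane rate by the DDR2 data rate it is paired with shows a fixed relationship: in every row the serial link runs at about six times the DDR2 per-pin rate. The small deviations come from 667 and 533 being rounded speed-grade numbers. A sketch using the table values:

```python
# (AMB IO in Gb/s, DDR2 IO in Mb/s) pairs as listed on the slide.
FBDIMM_RATES = [(4.8, 800), (4.0, 667), (3.2, 533), (2.4, 400)]


def link_to_dram_ratio(amb_gbps: float, ddr2_mbps: float) -> float:
    """Ratio of the AMB serial lane rate to the DDR2 per-pin data rate."""
    return amb_gbps * 1000.0 / ddr2_mbps


if __name__ == "__main__":
    for amb, ddr2 in FBDIMM_RATES:
        print(f"{amb} Gb/s / {ddr2} Mb/s -> {link_to_dram_ratio(amb, ddr2):.2f}")
```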
Alternative Approach: Repeater DRAM
DIMM with multi-drop connections:
• high pin count (DDR2 SO-DIMM = 204 pins)
• parallel interface
• bidirectional single-ended DQ (data bus)
• unidirectional single-ended CA bus
DIMM with repeater DRAMs:
• lower pin count
• serial interface (frame based)
• differential signaling
• simultaneous READ / WRITE
Repeater Topologies
Loop Back Topology (FBDIMM) / Loop Forward Topology
Differential P2P connections
Embedded Command/Address WRITE bus
Coded READ bus with seamless data insertion
READ / WRITE latency depends on DRAM position within daisy chain → needs HS FIFO
READ + WRITE latency = const. → easier for the MCH to control
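A simple hop-count model makes the two latency statements concrete. It assumes that the first statement describes the loop-back topology and the second the loop-forward topology, and that every DRAM in the chain adds one hop; it is not the test chip's actual timing. In loop back, read data returns along the same chain, so both paths grow with the position k. In loop forward (a ring), read data from DRAM k must pass the remaining n − k DRAMs plus one final hop to the MCH, so the sum stays at n + 1 hops.

```python
from dataclasses import dataclass


@dataclass
class HopLatency:
    write_hops: int  # MCH -> DRAM k (command / write data path)
    read_hops: int   # DRAM k -> MCH (read data path)


def loop_back(k: int, n: int) -> HopLatency:
    """Loop back (FBDIMM-like): read data returns along the same chain."""
    if not 1 <= k <= n:
        raise ValueError("k must be in 1..n")
    return HopLatency(write_hops=k, read_hops=k)


def loop_forward(k: int, n: int) -> HopLatency:
    """Loop forward (ring): read data continues through DRAM k+1..n to the MCH."""
    if not 1 <= k <= n:
        raise ValueError("k must be in 1..n")
    return HopLatency(write_hops=k, read_hops=n - k + 1)


if __name__ == "__main__":
    n = 4  # hypothetical chain length
    for k in range(1, n + 1):
        lb, lf = loop_back(k, n), loop_forward(k, n)
        print(f"DRAM {k}: loop back W={lb.write_hops} R={lb.read_hops} | "
              f"loop forward W={lf.write_hops} R={lf.read_hops} "
              f"sum={lf.write_hops + lf.read_hops}")
```

In this model the loop-back read latency varies from 1 to n hops, which is why position-dependent data would need a FIFO to be realigned, whereas the loop-forward sum is the same for every DRAM.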
Repeater DRAM Test Chip – HS IO Section
[Figure: HS IO section block diagram. Elements: terminated (RT) differential RX and TX buffers; pins RxP/RxN, CrP/CrN, CiP/CiN, TxP/TxN, CtP/CtN, DxP/DxN, DzP/DzN; 4x sampler unit (S1 to S4) with phase interpolator (PI); IQ divider tree; parallel-to-serial converter (P2S, 4 data / 4 CLK) with controls CTL_P2S and CTL_PI; SEL multiplexers selecting the transparent repeat path or the re-sample path. Timing inset: sampler outputs Vo1 to Vo4 capture Bit0 to Bit3 at S1 to S4.]
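The timing inset (four samplers S1 to S4 capturing Bit0 to Bit3) corresponds to a 1:4 deserializer. The sketch below models ideal samplers clocked by four phases spaced one UI apart (a quarter-rate clock), followed by the P2S step that re-serializes the data on the re-sample path. The ideal-sampler and equal-phase-spacing assumptions are mine; the circuit details are not modeled.

```python
from typing import List, Sequence

LANES = 4  # four samplers S1..S4, one per bit position


def sample_quarter_rate(bits: Sequence[int]) -> List[List[int]]:
    """Deserialize: sampler S(i+1) captures every 4th bit starting at offset i.

    Models ideal samplers clocked by four clock phases spaced one UI apart
    (0, 90, 180 and 270 degrees of a quarter-rate clock).
    """
    if len(bits) % LANES:
        raise ValueError("bit stream length must be a multiple of 4")
    return [list(bits[i::LANES]) for i in range(LANES)]


def parallel_to_serial(lanes: Sequence[Sequence[int]]) -> List[int]:
    """Serialize (P2S): interleave Bit0..Bit3 of each group into one stream."""
    return [bit for group in zip(*lanes) for bit in group]


def quarter_rate_clock_ghz(data_rate_gbps: float) -> float:
    """Each sampler only needs a clock at one quarter of the line rate."""
    return data_rate_gbps / LANES


if __name__ == "__main__":
    stream = [1, 0, 1, 1, 0, 0, 1, 0]
    lanes = sample_quarter_rate(stream)
    print(lanes)                                # one list per sampler S1..S4
    print(parallel_to_serial(lanes) == stream)  # re-sample path restores the stream
    print(quarter_rate_clock_ghz(4.8), "GHz sampling clock at 4.8 Gbit/s")
```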
Sampler Eye Characterization
Data Rate = 4.8Gbit/s/lane (UI = 208ps), accumulated eye over 6 lanes, transparent mode
Phase interpolator setting shmoo: 0 to 360° (60 steps)
Receiver offset shmoo: -200mV to +200mV
Eye width per bit: Bit0 ~83ps, Bit1 ~108ps, Bit2 ~105ps, Bit3 ~66ps
[Figure: test chip layout (6.5mm) with DATA lanes L1 to L6, CL and CLK]
Source: Z. Gu, P. Gregorius, D. Kehrer, L. Neumann, T. Rickes, H. Ruckerbauer, R. Schledz, M. Streibl, et al.; "Cascading Techniques for a High-Speed Memory Interface," IEEE International Solid-State Circuits Conference (ISSCC), Session 12.7, February 2007
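Expressing the measured per-bit eye widths in UI makes the spread between the four sampler positions easier to compare. The sketch uses only the slide's numbers, except for the PI step in picoseconds: it assumes that the 0 to 360° shmoo spans one quarter-rate clock period (4 UI), which the slide does not state.

```python
DATA_RATE_GBPS = 4.8
UI_PS = 1000.0 / DATA_RATE_GBPS  # ~208.3 ps, consistent with UI = 208ps on the slide

# Approximate per-bit eye widths read from the slide (ps).
EYE_PS = {"Bit0": 83.0, "Bit1": 108.0, "Bit2": 105.0, "Bit3": 66.0}

PI_STEPS = 60  # phase interpolator shmoo: 0 to 360 degrees in 60 steps
SPAN_UI = 4    # ASSUMPTION: 360 degrees of PI phase = one quarter-rate period = 4 UI


def eye_in_ui(eye_ps: float) -> float:
    """Eye width expressed as a fraction of one unit interval."""
    return eye_ps / UI_PS


def pi_step_ps() -> float:
    """Time resolution of one PI step under the SPAN_UI assumption."""
    return SPAN_UI * UI_PS / PI_STEPS


if __name__ == "__main__":
    for bit, eye in EYE_PS.items():
        print(f"{bit}: {eye:.0f} ps = {eye_in_ui(eye):.2f} UI")
    worst = min(EYE_PS, key=EYE_PS.get)
    print(f"smallest eye: {worst}; PI step ~{pi_step_ps():.1f} ps (assumed span)")
```

The eyes range from about 0.32 UI (Bit3) to 0.52 UI (Bit1); such differences between sampler positions are what the clock path mismatch analysis later in the talk addresses.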
Intermediate Summary
► The proposed technique meets the density, pin count and power requirements of future architectures.
► Limitations in the physical link remain
4 Layer PCB signaling limit
◄ SE 2D Planar ~10Gbit/s/pin ◄ DS 2D Planar ~18Gbit/s/pin
SE 2D Planar ~2.6Gbit/s/pin ►
DS 2D Planar ~12Gbit/s/pin ►
► DRAM roadmap doesn't follow CPU (nCore) BW requirements
► Clock and data distribution need complex link training
► Power consumption, testability etc.
GDDR5 Power Integrity ◄► Signal Integrity
$t_v(n) = \sum_{i=1}^{n} \Delta t_{BUF,i}$
[Figure: two contour plots versus peak-to-peak power supply noise amplitude [% VDDx nom] (0 to 9) and power supply noise frequency [MHz] (200 to 1600); contour levels 0.2 to 0.5, with the eye closure = 0.5UI contour marked. Panels: 6Gbit/s PLL-off, 6Gbit/s PLL-on]
Power Supply Sensitivity Model
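The sensitivity model accumulates the supply-induced delay change of each buffer in the clock path over n stages. The sketch below evaluates that sum and expresses it in UI at 6 Gbit/s. The per-buffer sensitivity, the noise amplitude and the worst-case assumption that all buffers see the same in-phase noise are hypothetical illustration values; noise frequency and PLL filtering, which the plots show to matter, are not modeled.

```python
from typing import List, Sequence

DATA_RATE_GBPS = 6.0
UI_PS = 1000.0 / DATA_RATE_GBPS  # ~166.7 ps at 6 Gbit/s


def accumulated_delay_ps(buffer_deltas_ps: Sequence[float]) -> float:
    """t_v(n): sum of the supply-induced delay changes of the n clock-path buffers."""
    return sum(buffer_deltas_ps)


def worst_case_deltas(n: int, sensitivity_ps_per_pct: float, noise_pct: float) -> List[float]:
    """HYPOTHETICAL worst case: every buffer sees the same peak noise (in % of
    nominal VDD) and its delay shifts linearly with it."""
    return [sensitivity_ps_per_pct * noise_pct] * n


if __name__ == "__main__":
    # Illustrative numbers only: 10 buffers, 1 ps per %VDD, 5 %VDD peak noise.
    t_v = accumulated_delay_ps(worst_case_deltas(10, 1.0, 5.0))
    print(f"t_v = {t_v:.1f} ps = {t_v / UI_PS:.2f} UI at {DATA_RATE_GBPS} Gbit/s")
```

Because t_v grows linearly with both the number of buffers and the noise amplitude, a long clock path can consume a large share of a 166.7 ps UI even at moderate supply noise.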
Impact of Clock Path Device Mismatch
standard deviation = 9%·UI (7%·UI from the clock generator)
[Figure: histogram of eye width (70% to 130% of UI) versus number of samples (0 to 10), measured (Eye1_meas) and simulated (Eye1_sim)]
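If the clock generator and the remaining clock-path devices are independent Gaussian contributors, their variances add, so the slide's total (9% UI) and clock-generator share (7% UI) imply the remaining device-mismatch contribution. The independence assumption is mine, not the slide's.

```python
import math


def remaining_sigma(total: float, known: float) -> float:
    """Std. dev. of the remaining contributors, assuming independent Gaussian
    sources whose variances add: total^2 = known^2 + rest^2."""
    if known > total:
        raise ValueError("known contribution cannot exceed the total")
    return math.sqrt(total**2 - known**2)


if __name__ == "__main__":
    total_ui, clock_gen_ui = 0.09, 0.07  # slide: 9% UI total, 7% UI from the clock generator
    rest = remaining_sigma(total_ui, clock_gen_ui)
    print(f"remaining clock-path mismatch: {rest:.1%} UI")
```

Under this assumption the rest of the clock path contributes about sqrt(9² − 7²) ≈ 5.7% UI, so the clock generator is the dominant term.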
Ghost Signaling Motivation
[Figure: four I/O cells (I/O 1 to I/O 4, RX / TX) with lanes RX1 to RX4 and TX1 to TX4; left: standard on-chip clocking with CLK distributed over >1mm; right: ghost signaling with CSU]
Standard On-Chip Clocking:
• phase matching problem for the on-chip QR sampling clock
• needs complex bit-wise data training
• distributed timing recovery (MCH ↔ DRAM)
• signal integrity is difficult
• no good scalability
Ghost Signaling:
• clock distributed together with data
• source synchronous
• no distributed timing recovery needed
• no internal clock tree
• saves power, less complex
• good scalability
• reduced pin count
• improved signal integrity (no sampling phase problems)
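The contrast between the two clocking schemes reduces to a per-lane subtraction: the sampling phase error of a lane is its clock arrival time minus its data arrival time. With a central clock tree spanning more than 1mm, each I/O sees a different clock delay, so every lane needs its own training; when the clock travels with the data, the shared delay cancels. All delays below are hypothetical values chosen only to show the effect.

```python
from typing import Dict

# HYPOTHETICAL per-lane delays in ps, for illustration only.
DATA_DELAY = {"I/O 1": 40.0, "I/O 2": 40.0, "I/O 3": 40.0, "I/O 4": 40.0}
# Central clock tree: delay grows with distance from the clock source.
CENTRAL_CLOCK_DELAY = {"I/O 1": 40.0, "I/O 2": 55.0, "I/O 3": 70.0, "I/O 4": 85.0}
# Clock forwarded with the data: differs only by a small local mismatch.
FORWARDED_CLOCK_DELAY = {lane: d + 2.0 for lane, d in DATA_DELAY.items()}


def sampling_skew(clock: Dict[str, float], data: Dict[str, float]) -> Dict[str, float]:
    """Per-lane sampling phase error: clock arrival minus data arrival (ps)."""
    return {lane: clock[lane] - data[lane] for lane in data}


def worst_skew(skews: Dict[str, float]) -> float:
    """Largest absolute phase error over all lanes (ps)."""
    return max(abs(s) for s in skews.values())


if __name__ == "__main__":
    central = sampling_skew(CENTRAL_CLOCK_DELAY, DATA_DELAY)
    forwarded = sampling_skew(FORWARDED_CLOCK_DELAY, DATA_DELAY)
    print("central clock tree:", central, "worst", worst_skew(central), "ps")
    print("clock with data:", forwarded, "worst", worst_skew(forwarded), "ps")
```

In the central case the error differs per lane (0 to 45 ps here), which is the bit-wise training burden; in the forwarded case every lane sees the same small residual offset.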
Ghost Signaling Repeater High Speed I/O
Summary
→ FBDIMM / DDR2; Advanced Memory Buffer, Differential Signaling 2.4Gbit/s – 4.8Gbit/s
High Density Solutions (e.g. Server)
Ultra High Bandwidth (Graphic)
→ RDIMM / DDR3; Single Ended Point-to-Point or Point-to-2-Point ~2Gbit/s max
→ Buffer-on-Board / NG DRAM with Repeater, Differential Signaling 4.8Gbit/s – 9.6Gbit/s
→ GDDR5; Single Ended Point-to-Point up to ~10Gbit/s
→ GDDR6; 3D Integration Single Ended ~10Gbit/s but x2 Pin count (x64 DRAM)
→ Buffer-on-Board / DDR3 / DDR4(?); Differential Signaling 4.8Gbit/s – 9.6Gbit/s
High Bandwidth with ‘moderate’ Power (Notebook)
→ DDR4, Single Ended Dual Slot up to 2.4Gbit/s or Point-to-Point ~3Gbit/s
→ Ultra Parallel Single Ended 3D Integration up to 2Gbit/s
3D-System on Interposer