6.4: A 64Gb/s PAM-4 Transmitter with 4-Tap FFE and 2.26pJ/b Energy Efficiency in 28nm CMOS FDSOI© 2017 IEEE International Solid-State Circuits Conference 1 of 31
A 64Gb/s PAM-4 Transmitter with 4-Tap
FFE and 2.26pJ/b Energy Efficiency in
28nm CMOS FDSOI
G. Steffan1, E. Depaoli1, E. Monaco1,
N. Sabatino1, W. Audoglio1, A. A. Rossi1,
S. Erba1, M. Bassi2, A. Mazzanti2
1 STMicroelectronics, Pavia, Italy
2 Università degli Studi di Pavia, Pavia, Italy
Outline
• Motivation
• Proposed TX Architecture
• Reconfigurable FFE
• Output Driver
• High-Speed Serializer
• Clock Generation
• Measurement Results
• Conclusions
Network Traffic: Growth and Challenges
[Figure: network traffic in exabytes per month. 2015: 72.5, 2016: 88.7, 2017: 108.5, 2018: 132.1, 2019: 160.6, 2020: 194.4]
Challenges
• Gate count increases faster than I/O speed
• Power dissipation, rather than technology and routing, mostly limits max I/O density
• Increasing data rate at > 25Gb/s increases link losses and power consumption
PAM-4 Modulation
• Helps maintain loss budget by decreasing Nyquist frequency
• SNR degradation can be recovered by using FEC
[Chart annotation: 3x]
[Cisco, The Zettabyte Era: Trends and Analysis][OIF-FD-Client-400G/1T-01.0 White Paper]
High-Speed PAM-4/NRZ TX Design
Challenges
• High output amplitude and linearity: to preserve SNR and H/V opening
• Very high bandwidth: to speed up non-adjacent level transitions
• Reconfigurable FFE: to be compliant with several standards
• PAM-4/NRZ high/low speed modes: for auto-negotiation and substitution of legacy components
• Precise and reliable serialization with low power
[Slide labels: PAM-4, PAM-4, PAM-4/NRZ, PAM-4/NRZ, PAM-4/NRZ]
TX Block Diagram
[Figure: TX block diagram. 40b LSB and MSB inputs, 40:8 serializers, shift registers producing 5x8b streams C-2, C-1, C0, C1, C2, selection MUXs MM and ML (4x8b), 8:4 serializers (4x4b), 4:1 serializers, FFEL and FFEM, supply Vdd,CMOS]
• Shift registers delay 8-bit bundles and generate five C[-2:2] FFE data streams
• MUXs MM and ML enable C[-2:2] selection
• In PAM-4 mode, up to 4 FFE taps
• In NRZ mode, 40b LSB/MSB data is merged, but MM and ML can still be operated independently to provide up to 5 FFE taps
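The delay-and-select idea in the bullets above can be sketched behaviorally. This is a minimal model, not the circuit: the function names are hypothetical, the streams are modeled per symbol rather than as 8-bit bundles, and the example weights reuse the FIR setting quoted on the measurement slides.

```python
from fractions import Fraction

def delayed_streams(symbols, taps=(-2, -1, 0, 1, 2)):
    """Return {k: stream} where stream[n] = symbols[n - k], zero outside the data.

    k < 0 is a pre-cursor (looks ahead), k > 0 is a post-cursor (looks back).
    """
    n_sym = len(symbols)
    streams = {}
    for k in taps:
        streams[k] = [symbols[n - k] if 0 <= n - k < n_sym else 0
                      for n in range(n_sym)]
    return streams

def ffe_output(symbols, weights):
    """Weighted sum of the selected streams; weights maps tap index -> coefficient."""
    streams = delayed_streams(symbols, taps=tuple(weights))
    return [sum(weights[k] * streams[k][n] for k in weights)
            for n in range(len(symbols))]

# Illustrative input: an isolated +1 symbol through a 3-tap setting.
pulse = [0, 0, 1, 0, 0]
weights = {-1: Fraction(-1, 24), 0: Fraction(18, 24), 1: Fraction(-3, 24)}
response = ffe_output(pulse, weights)
```

In this model the pre-cursor weight shows up one symbol before the main cursor and the post-cursor weight one symbol after it, which is the behavior the five delayed streams are meant to provide.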
TX Block Diagram
[Figure: TX block diagram with the output driver highlighted. FFEL slices of 3, 6, 12 and 3 elements (24 in total) and FFEM slices of 6, 12, 24 and 6 elements (48 in total) drive the output network from supply Vdd,DR]
• Output driver is composed of 72 elements
• 24 driver elements are driven by LSB data, 48 by MSB data
• Dedicated voltage supply Vdd,DR=1.2V
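A short sketch shows why a 24/48 split of the 72 elements yields equally spaced PAM-4 levels. It assumes, for illustration only, that every element contributes one unit with the sign of its data bit, and it ignores FFE and any bit coding; `pam4_level` is a hypothetical helper.

```python
from fractions import Fraction

LSB_ELEMENTS = 24   # from the slide
MSB_ELEMENTS = 48   # from the slide
TOTAL = LSB_ELEMENTS + MSB_ELEMENTS  # 72

def pam4_level(msb, lsb):
    """Normalized output for one symbol, assuming each element adds +1 unit
    when its bit is 1 and -1 unit when its bit is 0 (an idealization)."""
    def sign(bit):
        return 1 if bit else -1
    return Fraction(sign(msb) * MSB_ELEMENTS + sign(lsb) * LSB_ELEMENTS, TOTAL)

levels = sorted(pam4_level(m, l) for m in (0, 1) for l in (0, 1))
spacings = [b - a for a, b in zip(levels, levels[1:])]
```

With a 2:1 MSB-to-LSB weighting the four levels land at -1, -1/3, +1/3 and +1 of full scale, so all three eyes have the same nominal height.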
TX Block Diagram
[Figure: TX block diagram with the clock path highlighted. REF CK, PLL (2-8GHz), ÷2 and ÷4/5 dividers, I/Q generation, CK4-I and CK4-Q]
• The PLL generates a 2-8GHz clock signal
• A high-precision I/Q signal generator feeds the 40:1 serializer
Reconfigurable TX FFE
• At Full-Speed, it provides up to 4 FFE taps in PAM-4 mode and 5 taps in NRZ mode, meeting the OIF CEI 56Gb/s MR and 28Gb/s KP4 standards
• At Half-Speed, data is oversampled and [C-2, C2] are mapped as the 1-Pre and 1-Post cursors, respectively, meeting 10Gb/s KR10 and 8.5Gb/s PCI Exp-3
• At Quarter-Speed, C2 is mapped as the 1-Post cursor while C[-2:1] are all set to the Main cursor. This configuration is compliant with 2.5Gb/s PCI-Exp1
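The Half-Speed mapping can be checked with a small FIR model: when every symbol is repeated twice, taps placed two full-rate UI before and after the main cursor act as one-symbol pre and post cursors. This is only a sketch. The weights are hypothetical, the helper names are invented for the example, and leaving C-1 and C1 unused is an assumption that the slide does not state.

```python
def fir(samples, taps):
    """y[n] = sum_k taps[k] * samples[n - k], with zeros outside the data."""
    n_s = len(samples)
    return [sum(c * samples[n - k] for k, c in taps.items() if 0 <= n - k < n_s)
            for n in range(n_s)]

def oversample(symbols, factor):
    """Repeat each symbol `factor` times (full-rate samples per symbol)."""
    return [s for s in symbols for _ in range(factor)]

symbols = [1, -1, -1, 1, 1, -1, 1, -1]
pre, main, post = -0.1, 0.7, -0.2   # hypothetical weights

# Half-speed: 2 samples per symbol, taps at -2, 0 and +2 full-rate UI.
half = fir(oversample(symbols, 2), {-2: pre, 0: main, 2: post})
# Reference: the same weights applied directly at the symbol rate.
ref = fir(symbols, {-1: pre, 0: main, 1: post})
```

Both samples of each oversampled symbol carry the value that the symbol-rate filter would produce, so the slower standard sees an ordinary 3-tap equalizer.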
[Figure: FFEL (slices of 3, 6, 12 and 3 elements) and FFEM (slices of 6, 12, 24 and 6 elements) driven by LSB and MSB data, each slice selecting among the C-2, C-1, C0, C1 and C2 streams, feeding the output network]
[Table: minimum normalized coefficient amplitude for the 2-PRE, 1-PRE, MAIN, 1-POST and 2-POST taps in the PAM-4 FS, NRZ FS, NRZ HS and NRZ QS modes. Extracted values, in source order: -21/72, -3/24, -21/72, -21/72, 12/24, 36/72, 36/72, -9/24, -36/72, -36/72, -9/24, -36/72, 45/72, -27/72]
State-of-the-art PAM-4 Output Drivers
• Hybrid voltage/current driver
• Very good linearity and high output amplitude with 1V supply
• Bandwidth limited by increased load
• Low FFE programmability
• Pure current mode driver
• Simple implementation, high bandwidth
• Two supply domains and the need for a level shifter operating at the output symbol rate
• High FFE programmability
[Bassi et al., ISSCC 2016, JSSC 2017] [Nazemi et al., ISSCC 2016]
Proposed Current Mode Driver
• InN and InP are CMOS-level input data streams from the serializer
• The gate voltages of the MC1,2 current sources are constant, set by a replica bias according to the desired output swing Vref
• When the output node is high, the MC1,2 source is pulled to Vdd,CMOS, which relaxes reliability constraints and allows the use of thin-oxide devices
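[Figure: driver schematic with 72 elements <1> to <72>. Inputs InN and InP, outputs OutP and OutN, current sources MC1 and MC2 biased by Vbias from a replica bias referenced to Vref, supplies Vdd,CMOS and Vdd,DR]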
High output swing with good linearity and large bandwidth
Output Network
• ESD protection: 200V MM, 500V CDM, >>2kV HBM
• Driver capacitance is comparable with ESD capacitance
• A double T-coil network enhances bandwidth by a factor of 1.5 and improves impedance matching at high frequency
[Figure: output network schematic. Driver, resistor load bank, ESD and a double T-coil (Coil #1, Coil #2) on OutP and OutN, with capacitances CDRIVER, CLOAD, CESD and CBUMP. Plots: transfer function [dB] and return loss [dB] versus frequency [GHz] from 0 to 30GHz, with and without coils]
High-Speed Serializer Architectures
Half-rate architecture
• tBIT > tSetup + tMUX + tDIV - tD
• Low CPAR load of the half-rate architecture leads to very fast commutations
Quarter-rate architecture
• tBIT > tSetup + tMUX - tPULSE
• Higher CPAR load of the quarter-rate architecture leads to increased ISI
Propagating the clock forward relaxes serializer timing constraints
A low load is highly desirable to limit ISI
[Figure: half-rate serializer (4:2 stage clocked by CK4-I and CK4-Q, /2 divider, 2:1 output MUX clocked by CK2 with load CPAR, delays tMUX, tDIV and tD) and quarter-rate serializer (direct 4:1 MUX with pulse-generated SEL<3:0>, load 2xCPAR, delays tMUX and tPULSE), both with inputs D0 to D3 and output OUT]
Proposed MUX Architecture
[Figure: timing diagram of CK4-I, CK4-Q, B0 (b0, b2), B1 (b1, b3), CK2 and OUT (b0, b1, b2, b3), annotated with tMUX, tMULT and tSetup]
• Quarter-rate architecture to enhance speed and lower ISI
• Local X2 clock multiplier to save power
• The forward-propagated delay implemented with the X2 multiplier relaxes the timing constraint: tBIT > tSetup + tMUX - tMULT
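The three timing inequalities quoted for the half-rate, quarter-rate and proposed serializers can be compared numerically. The sketch below assumes tBIT is one symbol period and uses made-up delay values purely for illustration; none of them come from the slides.

```python
def timing_slack_ps(symbol_rate_gsym, t_setup, t_mux, t_relief, t_extra=0.0):
    """Slack of tBIT > tSetup + tMUX + t_extra - t_relief, all times in ps."""
    t_bit = 1e3 / symbol_rate_gsym          # one symbol period in ps (assumption)
    return t_bit - (t_setup + t_mux + t_extra - t_relief)

# Hypothetical delays in ps, chosen only to exercise the formulas.
T_SETUP, T_MUX, T_DIV, T_D, T_PULSE, T_MULT = 10.0, 15.0, 12.0, 5.0, 8.0, 10.0

rate = 32.0  # Gsym/s, i.e. 64Gb/s in PAM-4
slack = {
    "half-rate":    timing_slack_ps(rate, T_SETUP, T_MUX, T_D, t_extra=T_DIV),
    "quarter-rate": timing_slack_ps(rate, T_SETUP, T_MUX, T_PULSE),
    "proposed":     timing_slack_ps(rate, T_SETUP, T_MUX, T_MULT),
}
```

A positive slack means the inequality holds. The structure of the formulas shows the point of the slide: any delay that travels forward with the clock (tD, tPULSE or tMULT) is subtracted from the budget, while the divider delay tDIV adds to it.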
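[Figure: proposed serializer. A 4:2 stage clocked by CK4-I and CK4-Q (inputs D0, D2 and D1, D3, outputs B0 and B1, delay tMUX) followed by a 2:1 MUX clocked by CK2 from the X2 multiplier (delay tMULT), output OUT]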
Proposed MUX Architecture
[Figure: peak-to-peak jitter [ps] versus symbol rate [Gsym/s] from 10 to 50 Gsym/s, traditional 4:1 MUX versus proposed 4:1 MUX]
• The 4:2 MUX is based on pass gates to save power and to guarantee tMUX > tMULT, so that hold-time constraints are respected
• A NAND-based frequency doubler generates the half-rate clock for the last 2:1 MUX
• At 32 Gsym/s, the Pk-Pk jitter on the output node is reduced by a factor of 1.3 compared with a traditional direct 4:1 MUX
[Figure: circuit details. Pass-gate 4:2 stage with latch (inputs D0/D1 and D2/D3, clocks CK4-I/CK4-Q, outputs B0/B1), X2 doubler combining CK4-IP/N and CK4-QP/N into CK2P/N, and 2:1 output MUX selecting B0 on CK2P and B1 on CK2N]
Effects of I/Q Mismatches
• I/Q mismatches on the quarter-rate clocks create DCD on the half-rate clock
• I/Q phase difference must be lower than 1.4°
[Figure: CK4-I, CK4-Q and CK2 waveforms. With an I/Q phase error Δ, consecutive CK2 intervals alternate between 1UI-Δ and 1UI+Δ within each 2UI. Examples: Δ=5.6º gives 0.88UI and 1.12UI; Δ<1.4º gives 1UI and 1UI]
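Effects of I/Q Duty-Cycle Distortion
[Figure: CK4-I, CK4-Q and CK2 waveforms. A quarter-rate clock with high and low times of 2UI+Δ and 2UI-Δ produces CK2 intervals of 1UI, 1UI-Δ, 1UI+Δ and 1UI]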
• DCD on quarter-rate I/Q clocks translates to DCD on half-rate clocks with period of 4UI
[Figure: examples. Δ<0.01UI gives intervals of 1UI, 1UI, 1UI and 1UI; Δ=0.11UI gives 1UI, 1UI, 1.11UI and 0.89UI]
Generation of precise I/Q quarter-rate clocks is key, especially at high-speed
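The interval patterns described for I/Q mismatch and for duty-cycle distortion follow from a simple edge-combining model. The sketch below assumes an ideal doubler that toggles CK2 on every edge of CK4-I and CK4-Q, which is a simplification of the NAND-based circuit; errors are expressed directly in UI, and the function name is hypothetical.

```python
def ck2_intervals(iq_skew_ui=0.0, i_dcd_ui=0.0):
    """Edge-to-edge intervals (in UI) of CK2 over one 4UI quarter-rate period.

    Assumes an ideal doubler that toggles CK2 on every CK4-I and CK4-Q edge.
    iq_skew_ui delays both CK4-Q edges; i_dcd_ui delays only the CK4-I falling edge.
    """
    i_edges = [0.0, 2.0 + i_dcd_ui]                  # CK4-I rise, fall
    q_edges = [1.0 + iq_skew_ui, 3.0 + iq_skew_ui]   # CK4-Q rise, fall
    edges = sorted(i_edges + q_edges) + [4.0]        # next CK4-I rise closes the period
    return [round(b - a, 6) for a, b in zip(edges, edges[1:])]

ideal = ck2_intervals()
skewed = ck2_intervals(iq_skew_ui=0.12)
distorted = ck2_intervals(i_dcd_ui=0.11)
```

In this model a quadrature skew repeats every 2UI (long, short, long, short), whereas duty-cycle distortion on one clock disturbs only two of the four intervals and therefore repeats every 4UI, matching the qualitative behavior on the two slides.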
Clock Generation Tree
• An integer-N PLL with two VCOs and an output divider generates the 2-8GHz master clock
• An injection-locked ring oscillator (ILRO) provides 8 phases that stay accurate across PVT variations
• Phase rotators interpolate the 8 π/4-spaced phases to improve DNL and INL
• Quarter-rate clocks are fed to the serializer after the DCC circuit
[Figure: clock generation tree. Integer-N PLL (REF CK, PFD, CP, LPF, /N divider, 4-6GHz and 6-8GHz VCOs, bandgap and regulator), /1 and /2 output divider, injection-locked ring oscillator (locking signal, Vtune), CML-to-CMOS converters, phase rotator, and duty-cycle correction (DCC-I, DCC-Q) producing CK4-I and CK4-Q]
Injection Locked Ring Oscillator
[Figure: (a) quadrature phase error [°] versus supply voltage from 0.8V to 1.2V and (b) quadrature phase error [°] versus temperature from -40°C to 120°C, each with no calibration, analog calibration on, and analog + digital calibration]
• A phase detector based on passive mixers measures the quadrature error and continuously tunes the oscillator Vtune for fine phase correction
• Concurrently, a window comparator monitors Vtune and drives a digital coarse calibration in the background
• The quadrature phase error is kept below 1.5º for supply voltages from 0.9V to 1.2V and temperatures from -40ºC to 120ºC
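The background coarse loop can be pictured as a window comparator that nudges a frequency code whenever Vtune leaves its window. The sketch below is a behavioral illustration only: the threshold voltages, the code range, and the choice that a high Vtune steps the code up are all assumptions, and the function names are hypothetical.

```python
def coarse_step(vtune, v_tl, v_th):
    """Window comparator decision: +1 (up), -1 (down) or 0 (hold)."""
    if vtune > v_th:
        return +1
    if vtune < v_tl:
        return -1
    return 0

def update_code(code, vtune, v_tl=0.4, v_th=0.6, code_min=0, code_max=15):
    """Apply one background step to the frequency code, clamped to its range.

    Thresholds and code range are placeholder values, not taken from the slides.
    """
    return max(code_min, min(code_max, code + coarse_step(vtune, v_tl, v_th)))

# Example: Vtune drifts above the window, so the coarse code moves by one step.
new_code = update_code(code=7, vtune=0.65)
```

While Vtune stays inside the window the code is held, so only the analog loop acts and the digital loop does not disturb the output phase.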
[Figure: ILRO calibration loops. Analog loop (Vtune, buffer) and digital loop (window comparator with thresholds vTH and vTL, up/down logic, register with preset clocked by clkLF, frequency code), locking signal with fIN=8GHz]
[Anzalone et al., ESSCIRC 2016]
Phase Rotator
• The phase rotator consists of four slices driven by the ILRO outputs
• Each slice consists of 32 thermometer-weighted differential pairs to reduce switching glitches and guarantee the monotonicity of the output phase
• At 11GHz, the maximum DNL and INL are 0.5 and 1 LSB, respectively
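An idealized model of the rotator helps relate code, slice and output phase. The sketch assumes four slices that each interpolate linearly between two phases 45º apart in 32 steps, and it models the inputs as unit phasors; real waveforms and device mismatch are ignored, and the function names are hypothetical.

```python
import cmath
import math

STEPS_PER_SLICE = 32                       # 32 differential pairs per slice
SLICE_PHASES_DEG = [0, 45, 90, 135, 180]   # slice k interpolates entries k and k+1

def rotator_phase_deg(code):
    """Output phase for code 0..127, assuming ideal linear current weighting."""
    k, step = divmod(code, STEPS_PER_SLICE)
    a = step / STEPS_PER_SLICE
    p1 = cmath.exp(1j * math.radians(SLICE_PHASES_DEG[k]))
    p2 = cmath.exp(1j * math.radians(SLICE_PHASES_DEG[k + 1]))
    return math.degrees(cmath.phase((1 - a) * p1 + a * p2)) % 360.0

def inl_lsb(codes=range(128)):
    """INL against a straight line of 180/128 degrees per code, in LSB."""
    lsb = 180.0 / 128
    return [(rotator_phase_deg(c) - c * lsb) / lsb for c in codes]

phases = [rotator_phase_deg(c) for c in range(128)]
worst_inl = max(abs(x) for x in inl_lsb())
```

Because neighboring phases are only 45º apart, even plain linear weighting stays close to the ideal straight line in this model, which is one reason to interpolate between 8 closely spaced phases rather than 4.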
[Figure: four phase-rotator slices interpolating between φ1 and φ2 of 0º/45º, 45º/90º, 90º/135º and 135º/180º, with inputs φ1P, φ1N, φ2P, φ2N. Plots: measured DNL [LSB] and INL [LSB] versus code from 0 to 128, at 2GHz with AQC, 11GHz with AQC and 11GHz without AQC; fIN=11GHz from ext]
DCD Correction Circuit
• PMOS and NMOS switches operate independently
• Two 7-bit thermometric codes avoid glitches and guarantee the monotonicity of the correction
• The DCD correction range is ±1.5% at 8GHz
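The glitch argument for thermometric control can be illustrated in a few lines: stepping a thermometer code by one changes exactly one control bit, so no intermediate setting can appear during a transition. The helpers below are hypothetical and independent of the actual SELP/SELN wiring.

```python
def thermometer(value, width=7):
    """Encode 0..width as a thermometer code: `value` ones followed by zeros."""
    if not 0 <= value <= width:
        raise ValueError("value out of range")
    return [1] * value + [0] * (width - value)

def bits_changed(a, b):
    """Number of control bits that differ between two codes."""
    return sum(x != y for x, y in zip(a, b))

codes = [thermometer(v) for v in range(8)]
steps = [bits_changed(a, b) for a, b in zip(codes, codes[1:])]

# For contrast, a 3-bit binary step from 3 (011) to 4 (100) flips three bits.
binary_3_to_4 = bits_changed([0, 1, 1], [1, 0, 0])
```

Since each step only adds or removes one unit switch, the correction can only move in one direction per step, which is what makes it monotonic by construction.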
[Figure: DCD correction cell with inputs INP,N, outputs OUTN,P and controls SELP<6:0> and SELN<6:0>. Plot: DCD at 8GHz [%] versus code from 0 to 14, vertical axis from -2 to 2]
Chip Photo and Power Break-Down
[Figure: chip photo showing the PLL and the transmitter]
• 10-metal-layer 28nm FDSOI CMOS from STMicroelectronics
• Chips encapsulated in flip-chip BGA packages
• Power breakdown: HF serializer 45%, LF serializer 8%, output driver 16%, clock generation 31%
• Power consumption: 145mW @ 64Gb/s
• Vdd,CMOS=1V
• Vdd,DR=1.2V
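The headline energy efficiency follows directly from the power and data rate on this slide, and the percentages convert to per-block power the same way. The sketch below only restates that arithmetic; the variable names are illustrative.

```python
POWER_MW = 145.0          # total power at 64Gb/s, from the slide
DATA_RATE_GBPS = 64.0
BREAKDOWN_PCT = {
    "HF serializer": 45,
    "LF serializer": 8,
    "Output driver": 16,
    "Clock generation": 31,
}

# mW divided by Gb/s gives pJ per bit.
energy_pj_per_bit = POWER_MW / DATA_RATE_GBPS

# Per-block power implied by the percentages.
block_power_mw = {name: POWER_MW * pct / 100 for name, pct in BREAKDOWN_PCT.items()}
```

The ratio comes out just under 2.27pJ/b, consistent with the 2.26pJ/b in the title, and the split implies that the two serializers together take a little over half of the total power.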
Measurement Setup
[Figure: test board with reference clock connected to an Agilent DCA-X 86100D. Plot: package and board insertion gain [dB] versus frequency [GHz] from 0 to 20GHz]
• Package and board trace loss at 16GHz is 2.5dB
• Connectors and cable add about 2dB more loss
• Total loss at 16GHz is 4.5dB
Output Eyes at 28/56 Gb/s
PRBS-9 @ 28Gb/s
• FIR setting: [C-1 C0 C1]=[-1/24 18/24 -3/24]
• Vertical opening: 0.73V
• Horizontal opening: 0.84UI
QPRBS-13 @ 56Gb/s
• FIR setting: [C-1 C0 C1]=[-1/24 18/24 -3/24]
• Vertical opening: 0.18V
• Horizontal opening: 0.48UI
Output Eyes at 32/64 Gb/s
PRBS-9 @ 32Gb/s
• FIR setting: [C-1 C0 C1]=[-1/24 18/24 -3/24]
• Vertical opening: 0.6V
• Horizontal opening: 0.75UI
QPRBS-13 @ 64Gb/s
• FIR setting: [C-1 C0 C1]=[-1/24 18/24 -3/24]
• Vertical opening: 0.14V
• Horizontal opening: 0.36UI
S22 and PLL Phase Noise
• Return loss is better than the mask limit, with margin
• Clock jitter is estimated by integrating the phase noise starting from a 500kHz offset
• The random jitter integrated up to 8GHz is 290fs
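The jitter estimate described above relies on the standard conversion from integrated single-sideband phase noise to RMS time jitter. The sketch below shows that conversion with a trapezoidal integral. The flat -150dBc/Hz profile and the 8GHz carrier are hypothetical inputs used only to exercise the function; they are not the measured data.

```python
import math

def rms_jitter_s(offsets_hz, l_dbc_hz, f_carrier_hz):
    """RMS jitter from single-sideband phase noise L(f) in dBc/Hz.

    sigma_t = sqrt(2 * integral of 10**(L/10) df) / (2*pi*f_carrier),
    with the integral taken by the trapezoidal rule over the given offsets.
    """
    lin = [10 ** (l / 10) for l in l_dbc_hz]
    area = sum(0.5 * (lin[i] + lin[i + 1]) * (offsets_hz[i + 1] - offsets_hz[i])
               for i in range(len(lin) - 1))
    return math.sqrt(2 * area) / (2 * math.pi * f_carrier_hz)

# Hypothetical flat profile from 500kHz to 8GHz (not measured data).
offsets = [500e3, 1e6, 10e6, 100e6, 1e9, 8e9]
noise = [-150.0] * len(offsets)
sigma = rms_jitter_s(offsets, noise, f_carrier_hz=8e9)
```

With a real measurement, `offsets` and `noise` would be the exported phase-noise trace, and a finer frequency grid (or integration on a log axis) would be needed where the profile changes quickly.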
Comparison with State of the Art
Conclusions
• Delivering high TX amplitude while preserving linearity and large bandwidth is key for high-speed PAM-4 transmitters
• A new output driver allows high swing and good linearity with an increased supply while still employing thin-oxide devices operated reliably
• A smart FFE structure is proposed for backward compatibility with legacy standards
• Measurements on test chips realized in 28nm CMOS FDSOI technology by STMicroelectronics prove the effectiveness of the proposed TX
Acknowledgement
• The authors thank Dr. Guido Albasini, Daniele Baldi, Dr. Davide Sanzogni and the layout team for their contributions