Device Technology for Peta Computing
June 21, 2002
Masao Fukuma
NEC Laboratories, NEC Corporation
The World's First CMOS Supercomputer LSI
SX-4 @1996
• 290MFLOPS
• 0.25-um CMOS
• Supply Voltage: 2.5V
• Clock: 125MHz
• No. of Transistors: 3M
Earth Simulator
Japanese ‘Computenik’!? (New York Times, 2002)
Performance: 40.0-TFLOPS (Theoretical), 35.7-TFLOPS (Linpack)
Foot Print: 65m x 50m
Performance:
= 8-GFLOPS / LSI * 5120
= 64-GFLOPS / node * 640
= 128-GFLOPS / BOX * 320
System Memory: 10-TByte
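The three performance breakdowns above can be cross-checked with a few lines of arithmetic. This is a minimal sketch using only the figures quoted on the slide; the variable names are illustrative, and the reading of 40.0 TFLOPS as 40960 GFLOPS / 1024 is an assumption about the prefix convention, not something the slide states.

```python
# Cross-check of the Earth Simulator peak-performance breakdowns.
# All numbers are taken from the slide; names are illustrative only.
GFLOPS_PER_LSI, LSI_COUNT = 8, 5120
GFLOPS_PER_NODE, NODE_COUNT = 64, 640
GFLOPS_PER_BOX, BOX_COUNT = 128, 320

peak_by_lsi = GFLOPS_PER_LSI * LSI_COUNT      # GFLOPS summed over LSIs
peak_by_node = GFLOPS_PER_NODE * NODE_COUNT   # GFLOPS summed over nodes
peak_by_box = GFLOPS_PER_BOX * BOX_COUNT      # GFLOPS summed over boxes

lsi_per_node = LSI_COUNT // NODE_COUNT        # LSIs in one node
nodes_per_box = NODE_COUNT // BOX_COUNT       # nodes in one box

# Assumption: the quoted 40.0 TFLOPS uses 1 TFLOPS = 1024 GFLOPS.
peak_tflops_binary = peak_by_lsi / 1024

# Linpack result relative to the quoted theoretical peak.
linpack_efficiency = 35.7 / 40.0

print(peak_by_lsi, peak_by_node, peak_by_box)
print(lsi_per_node, nodes_per_box, peak_tflops_binary)
print(f"Linpack efficiency: {linpack_efficiency:.1%}")
```

All three breakdowns give the same total, which corresponds to 8 LSIs per node and 2 nodes per box.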
Process : 0.15um CMOS 8-layer Copper
Frequency : 500MHz(System)
Performance: 8-GFLOPS
Single-Chip Vector Supercomputer
Multi-GFLOPS Processor
[Figure: Floating Point Execution (GFLOPS, log scale 0.1 to 10.0) vs. year, 1995 to 2001]
SX-4: 0.35µm CMOS, 4M Tr, 125 MHz x 2 = 0.25GFLOPS
SX-5: 0.25µm CMOS, 15M Tr, 250 MHz x 4 = 1GFLOPS
Earth Sim.: 0.15µm CMOS, 8GFLOPS
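The "clock x multiplier" figures in the chart can be reproduced directly. The sketch below treats the multiplier as floating-point results per cycle, which is an interpretation rather than something the slide spells out. The Earth Simulator value is derived under the further assumption that the 500 MHz system frequency quoted earlier is the relevant clock.

```python
# Peak GFLOPS as clock frequency times results per cycle.
# Clock and multiplier values are from the slide; the interpretation of the
# multiplier as "floating-point results per cycle" is an assumption.
chips = {
    "SX-4": (125e6, 2),
    "SX-5": (250e6, 4),
}

peak_gflops = {
    name: clock_hz * per_cycle / 1e9
    for name, (clock_hz, per_cycle) in chips.items()
}

# Assumption: the Earth Simulator LSI is clocked at the 500 MHz system
# frequency, so 8 GFLOPS implies this many results per cycle.
earth_sim_per_cycle = 8e9 / 500e6

for name, value in peak_gflops.items():
    print(f"{name}: {value} GFLOPS")
print(f"Earth Simulator LSI: {earth_sim_per_cycle:.0f} results per cycle (assumed clock)")
```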
High-speed Design with Lower Power
MPU Performance Trend
[Figure: MPU frequency (Hz, 10M to 10G) and number of transistors (1/chip, 100K to 1G) vs. year, 1985 to 2010; series: MPU for Supercomputer, MPU for PC]
Technology/Power Trends for Peta-FLOPS (Scientific computing)
[Figure: performance (FLOPS, 1G to 10 Peta) vs. system power (VA, 1K to 100M), with curves for 0.25µm, 0.15µm, 0.08µm, 0.05µm and 0.03µm technology]
NEC CMOS (UX Series) Roadmap
[Figure: process roadmap, CY1998 to 2007, with rows labeled ASIC and MPU. Entries recoverable from the chart:]
UC3: 180nm, 1.8V
UR3: 150nm, 1.8V
UR3H: 150nm, 1.8V
UR4
UC4: 0.12um, 1.2V
UX4: 150nm (Lg: 130nm), 1.5V
UX4 (Cu)
UX5 (Cu): 130nm (Lg: 95nm), 1.2V
UX6 (Cu): 90nm (Lg: 65nm), 1.0V
UX7 (Cu): 65nm (Lg: 45nm), 0.9V
UX8 (Cu, preliminary): 45nm (Lg: 30nm), 0.8V
UX5A: 130nm (Lg: 77-60nm), 1.2V
UX6A: 90nm (Lg: 48-38nm), 1.0V
UX7A, UX8A (preliminary): 65nm (Lg: 30-24nm), 0.9V; 65nm (Lg: 20nm), 0.9V
ITRS Technology Roadmap 2001
Year | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2010 | 2013 | 2016
MPU Physical Gate Length (nm) | 65 | 53 | 45 | 37 | 32 | 28 | 25 | 18 | 13 | 9
Physical Gate Length, low-standby power (LSTP) (nm) | 90 | 75 | 65 | 53 | 45 | 37 | 32 | 22 | 16 | 11
Nominal Ion at 25 C (µA/µm) [NMOS] high-performance | 900 | 900 | 900 | 900 | 900 | 900 | 900 | 1200 | 1500 | 1500
Nominal sub-threshold current at 25 C (pA/µm) [NMOS] LSTP | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 3 | 7 | 10
Equivalent physical oxide thickness Tox (nm), high-performance | 1.3-1.6 | 1.2-1.5 | 1.1-1.6 | 0.9-1.4 | 0.8-1.3 | 0.7-1.2 | 0.6-1.1 | 0.5-0.8 | 0.4-0.6 | 0.4-0.5
Lgate 3σ variation (nm) | 6.31 | 5.3 | 4.46 | 3.75 | 3.15 | 2.81 | 2.5 | 1.77 | 1.25 | 0.88
Gate electrode sheet Rs (Ω/sq) | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 6 | 7
Silicide thickness (nm) | 35.8 | 29.2 | 24.8 | 20.4 | 17.6 | 15.4 | 13.8 | 9.9 | 7.2 | 5
Contact silicide sheet Rs (Ω/sq) | 4.2 | 5.1 | 6.1 | 7.4 | 8.5 | 9.7 | 10.9 | 15.2 | 21.0 | 30.3
Drain extension Xj (nm) | 27-45 | 22-36 | 19-31 | 15-25 | 13-22 | 12-19 | 10-17 | 7-12 | 5-9 | 4-6
Number of metal levels | 8 | 8 | 8 | 9 | 10 | 10 | 10 | 10 | 11 | 11
Local wiring pitch (nm) | 350 | 295 | 245 | 210 | 185 | 170 | 150 | 105 | 75 | 50
Intermediate wiring pitch (nm) | 450 | 380 | 320 | 265 | 240 | 215 | 195 | 135 | 95 | 65
Minimum global wiring pitch (nm) | 670 | 565 | 475 | 460 | 360 | 320 | 290 | 205 | 140 | 100
Conductor effective resistivity (µΩ-cm), Cu intermediate wiring | 2.2 | 2.2 | 2.2 | 2.2 | 2.2 | 2.2 | 2.2 | 2.2 | 2.2 | 2.2
Barrier/cladding thickness for Cu intermediate wiring (nm) | 16 | 14 | 12 | 10 | 9 | 8 | 7 | 5 | 3.5 | 2.5
Interlevel metal insulator, effective dielectric constant (k) | 3.0-3.6 | 3.0-3.6 | 3.0-3.6 | 2.6-3.1 | 2.6-3.1 | 2.6-3.1 | 2.3-2.7 | 2.1 | 1.9 | 1.8
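One way to read the roadmap figures is as an average shrink per year. The sketch below computes a compound annual scaling factor from the first and last columns (2001 and 2016) of the MPU gate length and local wiring pitch rows; the helper name `annual_factor` is hypothetical, and a single compound rate is a simplification, since the roadmap does not shrink uniformly every year.

```python
# Compound annual scaling factor between the first and last roadmap years.
# Data rows are copied from the ITRS 2001 slide; the helper is illustrative.
years = [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2010, 2013, 2016]
mpu_gate_nm = [65, 53, 45, 37, 32, 28, 25, 18, 13, 9]
local_pitch_nm = [350, 295, 245, 210, 185, 170, 150, 105, 75, 50]


def annual_factor(values, years):
    """Return the constant yearly multiplier that maps values[0] to values[-1]."""
    span_years = years[-1] - years[0]
    return (values[-1] / values[0]) ** (1.0 / span_years)


gate_factor = annual_factor(mpu_gate_nm, years)
pitch_factor = annual_factor(local_pitch_nm, years)

print(f"MPU gate length: x{gate_factor:.3f} per year")
print(f"Local wiring pitch: x{pitch_factor:.3f} per year")
```

Under this simplification both quantities shrink by roughly 12% per year over the 15-year span.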
Changes in VLSI R&D
Scaling has been driving VLSI.
However, conventional scaling is less effective for sub-100nm SOC.
Why?
Application oriented
Mobile: Low Vdd, Thin Ox, Low leakage current
High end: Minimum Lg, Low Vt
Physical/practical limits
Thin Ox, Subthreshold Leak, Lithography
CMOS Development
[Figure: SOC performance vs. gate length (nm), 400 down to 10; curves: Conv. Scaling, Equivalent Scaling, Flexible Parameter CMOS]
Technology Challenge for sub-0.1µm CMOS
Gate Electrode (Drivability, Parasitic RC): Low depletion, Low R, High drivability
Gate Insulator (Drivability, SCE): Thin Tox, High reliability, Low leakage, High mobility
Channel Design (Drivability, SCE): Halo, Sharp profile
Junction Depth (SCE, Parasitic RC): Shallow xj, High doping, Low defect
[Figure: MOSFET cross-section with labels Gate, Source, Drain, Gate Electrode, Insulator, Gate Depletion, Sharp halo, Si substrate; inset labels: 1nm, Si-Si Distance ~0.3nm, 4~6 layers; inset of concentration vs. depth (Shallow, High)]
Tr. Characteristics of 25nm(nFET)/33nm(pFET)
[Figure: drain current ID [A/µm] (log scale, 10^-12 to 10^-2) vs. gate voltage VG [V] (-1.5 to 1.5) for 33-nm pMOS by C-S/D and 24-nm nMOS by R-S/D, |VD| = 0.1, 1.2V]
[Figure: drain current ID (0 to 1200) vs. drain voltage VD [V] (-1.5 to 1.5) for 33-nm pMOS by C-S/D and 24-nm nMOS by R-S/D, |VG| = 0, 0.3, ..., 1.5V]
24nm n-FET w/ 800 µA/µm, 300 nA/µm by R-SD, Tox=1.6nm
33nm p-FET w/ 400 µA/µm, 300 nA/µm by C-SD, Tox=1.6nm
Driving Capability
[Figure: ID/VD (A/V/m), 0 to 1200, vs. Lgate (nm), 500 down to 10]
Gate leakage current vs. Tox
[Figure: gate leakage current density (A/cm2, 10^-6 to 10^2) and gate leakage current (A, 1f to 1n, at L=0.1µm, W=1µm) vs. physical Tox (nm), 1.2 to 2.8. Gate dielectric: SiO2, nMOS inversion. Reference levels: Ioff for Low power, Ioff for Standard, Ioff criteria for High speed. Marked points: UX4, UX5, UX6 (0.10µm node)]
High-k Materials for Low Gate Leakage Current
High-k material selection guideline:
Barrier height higher than 1eV
Medium K value (10-100) (due to lowering φB)
Al2O3 ZrO2 HfO2 × La2O3 Ta2O5
[Figure: gate leakage JG (A/cm2, 10^-3 to 10^3) vs. dielectric constant (0 to 50) at TEQ=1nm, VOX=1V, m*=0.5m0; curves for φB = 3.1eV (SiO2), 2.0eV, 1.5eV, 1.0eV, 0.5eV]
[Figure: JG (A/cm2, 10^-12 to 10^3) vs. TEQ (nm), 0 to 5, with SiO2 as reference]
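The selection guideline trades dielectric constant against barrier height, and the standard relations behind that trade-off can be written out. This is a simplified sketch, not the model used for the slide's curves: it assumes a relative permittivity of about 3.9 for SiO2 and a rectangular-barrier WKB estimate for direct tunneling, with $T_{\mathrm{phys}}$, $k$ and $\hbar$ introduced here for illustration.

```latex
% Physical thickness of a high-k film with equivalent oxide thickness T_EQ
% (assumes k_SiO2 ~ 3.9):
T_{\mathrm{phys}} = T_{\mathrm{EQ}} \,\frac{k}{k_{\mathrm{SiO_2}}},
\qquad k_{\mathrm{SiO_2}} \approx 3.9

% Rectangular-barrier WKB estimate of the direct-tunneling current density
% (simplified; m^* is the tunneling effective mass, \phi_B the barrier height):
J_G \;\propto\; \exp\!\left( -\,\frac{2\,T_{\mathrm{phys}}}{\hbar}\sqrt{2\,m^{*} q\,\phi_B} \right)

% Example: T_EQ = 1 nm with k = 20 gives
T_{\mathrm{phys}} = 1\,\mathrm{nm} \times \frac{20}{3.9} \approx 5.1\,\mathrm{nm}
```

A larger $k$ allows a physically thicker film at the same $T_{\mathrm{EQ}}$, which suppresses tunneling, while a smaller $\phi_B$ weakens the exponent. That is consistent with the guideline of a medium K value combined with a barrier height above 1eV.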
100nm SOI-CMOS
[Figure: propagation delay τpd [psec/stage] (0 to 30) vs. supply voltage [V] (0.0 to 2.0) for Bulk and SOI, 100-nm CMOS]
[Figure: cross-section image, 100 nm scale; labels: CoSi2, SOI, Box]
SOI-CMOS: ~20% faster than bulk-CMOS
SOI-CMOS: smaller junction capacitance
Wiring Issues
[Figure: interconnect cross-sections for UX4 (150nm), UX5 (130nm), UX6 (100nm) and UX7 (70nm); metal levels M1 to M9 (M10?); materials: W, Cu; dielectric labels: SiO2, L-ox, low-k, k=4.2, k=2.9, k=2.4, k=2.0]
Triple-Layered p-BCB/Cu SDI on 80nm CMOS
SDI: Single damascene interconnect
[Figure: cross-section images with 1.0µm and 0.5µm scales; labels: SiO2, W-plug, Cu wiring, BCB, Cu-plug, 80nm p/n-gate, tox=1.9nm (SiON)]
BMF (Barrier Metal-Free) Cu-DDI
Full p-BCB Structure
Cu Dual-Damascene Interconnect
[Figure: cross-section images; labels: p-BCB, With Ta/TaN Barrier, BMF, MOCVD-Cu, Keff = 3.1]
50% reduction in via resistance
Cu-DDI module with the full low-k structure reducing the interconnect delay.
Performance of Cu-DDI with BMF structure
[Figure: Medium via resistance (Ω/unit, log scale 0.1 to 10) vs. designed via diameter (µm), 0.1 to 0.5; curves: Cu-DDI (BMF with Cu-epi-contact), Al/W-plug, Cu-DDI with TaN-barrier]
[Figure: Medium Tpd (ns), 2 to 4.5, vs. space width (µm), 0.2 to 0.6; curves: Full p-BCB (BMF, Keff=3.1), p-BCB/SiO2 (Hybrid, Keff=3.6), SiO2 (Keff=5)]
0.1µm CMOS 5GHz Clocking
On-Chip Transmission Line
2.5mm 200ps 20ps
Local Clock Distribution
5GHz Clocking for 10mm x 10mm
Multiple System Configurations
e.g. Communication between boards inside routers
OIF: Optical Interface
SW: Switch
MCSL: Multi-Channel Serial Link
[Figure: Case 1 and Case 2. Line interface boards (OIF, MCSL) connect to switch (SW) boards through multi-channel serial links over a backplane or cable; ports are labeled 10G and 40G; the OIF carries the optical signal for WAN/LAN]
Microphotograph of 100-Gb/s Test Chip
[Figure: chip microphotograph showing 4-Channel Receiver IP cores (4-CH RX), 4-Channel Transmitter IP cores (4-CH TX) and a Noise Source (20,000 Flip Flops); dimension labels: 1 mm, 2.25 mm]
100 Gb/s Bandwidth = 20-Channel 5-Gb/s Serial Links
0.15-µm CMOS, 6 AL
Supply Voltage: 1.5 V
Package: 1,296-pin FC-BGA
Power Dissipation: 142 mW/Ch. (TX), 236 mW/Ch. (RX)
Bit Error Rate: Under 10^-13 @ 5Gb/s with a 5-meter AWG28 Cable
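The test-chip figures above lend themselves to a short budget calculation. This is a rough sketch from the quoted per-channel numbers only: it assumes a link consists of one TX and one RX channel, and that all 20 channels run both directions at once, neither of which the slide states. Variable names are illustrative.

```python
# Bandwidth, power and error-interval budget for the 100-Gb/s test chip.
# Inputs are the per-channel figures quoted on the slide.
CHANNELS = 20
GBPS_PER_CHANNEL = 5
TX_MW_PER_CHANNEL = 142
RX_MW_PER_CHANNEL = 236
CHANNELS_PER_IP_CORE = 4
BER_UPPER_BOUND = 1e-13

aggregate_gbps = CHANNELS * GBPS_PER_CHANNEL

# Assumption: one link = one TX channel plus one RX channel.
link_mw = TX_MW_PER_CHANNEL + RX_MW_PER_CHANNEL
total_w = CHANNELS * link_mw / 1000

# mW divided by Gb/s is numerically pJ per bit.
pj_per_bit = link_mw / GBPS_PER_CHANNEL

# 4-channel IP cores needed per direction for 20 channels.
ip_cores_per_direction = CHANNELS // CHANNELS_PER_IP_CORE

# With BER below 1e-13, the mean time between bit errors on one channel
# is at least this many seconds.
min_seconds_between_errors = 1 / (BER_UPPER_BOUND * GBPS_PER_CHANNEL * 1e9)

print(f"{aggregate_gbps} Gb/s aggregate, {total_w:.2f} W, {pj_per_bit:.1f} pJ/bit")
print(f"{ip_cores_per_direction} IP cores per direction")
print(f"at least {min_seconds_between_errors:.0f} s between errors per channel")
```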
Chip Photo
0.18µm CMOS
TOX=3.8nm
2 metal layers
Waveforms
Input Data (10Gb/s)
Input Clock (2.5GHz)
Output Data (2.5Gb/s)
VDD=1.3V
High-Speed Interface
[Figure: bandwidth (Gb/s, 1 to 1T) vs. power (W, 1 to 10) for CMOS, Bipolar and Parallel SiGe; arrows: High speed, Low power]
[Figure: block diagram of a 5Gbps x 4 Parallel I/O LSI; blocks: PLL, 10:1 MUX, 8B10B & 32:8 MUX, CDR, 1:10 DEMUX & WA, 10B8B, 8:32 DEMUX & EB; dimension labels: 1.1 mm, 1.3 mm]
CMOS Breaks Power Walls for Speed
Dynamically Reconfigurable Logic
[Figure: number of application programs (1 to 10000) vs. power efficiency (MIPS/W, 10^1 to 10^5); points: MPU, MCU, DSP, ASIC, DRL; label: Pentium]
Realtime Reconfiguration: 5ns
8 Configuration Data on a Chip
0.25um CMOS, 5M Tr
Reconfigurable Computing
[Figure: test chip and steps (1) to (5); labels: Reconfiguration, Execution of Program; logic blocks with inputs A and B, and FA]
Dynamically Reconfigurable Logic
Customize hardware dynamically
[Figure: External control style: a µP and Memory Controller load Task 1 to Task N onto the Reconfigurable Hardware over time. Internal control style: the Reconfigurable Hardware with a Memory Controller switches between Task 1 (blocks A, B, C, D, E) and Task N (blocks H, I, J, W, X, Y, Z) over time]
MP98: Single-Chip Multiprocessor
Novel multi-thread architecture
Software for parallel architecture
Sophisticated power management
0.15um, 14 Million Trs.
125MHz x 4 PEs x 2-way Superscalar
1GIPS @ 1W, Vdd=1.3V
Dynamic power control of PEs
On-chip power switch
Idling: 25mW, Sleep: 0.2mW
[Figure: chip photo, 10.5mm x 10.5mm]
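The MP98 headline numbers can be tied together with a small calculation. The sketch below checks the peak instruction rate and estimates average power for a duty cycle. The duty-cycle fractions are hypothetical, and treating 1 W as the normal-mode power is an assumption based on the "1GIPS @ 1W" figure.

```python
# Peak throughput and a duty-cycle power estimate for the MP98 figures.
CLOCK_HZ = 125e6
PE_COUNT = 4
ISSUE_WIDTH = 2              # 2-way superscalar
ACTIVE_W = 1.0               # assumption: normal-mode power, from "1GIPS @ 1W"
IDLE_W = 0.025               # Idling: 25mW
SLEEP_W = 0.0002             # Sleep: 0.2mW

peak_ips = CLOCK_HZ * PE_COUNT * ISSUE_WIDTH
mips_per_watt = peak_ips / 1e6 / ACTIVE_W


def average_power_w(active, idle, sleep):
    """Time-weighted average power; the three fractions must sum to 1."""
    if abs(active + idle + sleep - 1.0) > 1e-9:
        raise ValueError("duty-cycle fractions must sum to 1")
    return active * ACTIVE_W + idle * IDLE_W + sleep * SLEEP_W


# Hypothetical workload: 10% active, 20% idle, 70% asleep.
example_w = average_power_w(0.10, 0.20, 0.70)

print(f"peak: {peak_ips / 1e9:.0f} GIPS, {mips_per_watt:.0f} MIPS/W")
print(f"average power for the example duty cycle: {example_w * 1000:.2f} mW")
```

With these assumed fractions the active interval dominates the average, which is why lowering both idle and sleep power only pays off when the chip spends most of its time out of normal mode.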
Power Management in MP98
Multi-oxide/-Vt for high-speed and low-power circuits
Software controlled power management circuit
[Figure: power switch layout, 333µm x 38.8µm; blocks: Power SW, Cap. SW block, De-coupling Capacitors (Cd), Control Circuits (Ctrl), Internal Circuits]
[Figure: Power (W) vs. time t across Normal mode, Idle and Sleep; axis levels: 10, 1.0, 0.01, 0.0002; labels: Cache data, Leak Current, 80us, 25ms]
Ultimately Scaled-down MOSFET
1988 Symp. VLSI Tech.
Thin SOI structure: Ts<10nm : ~Tinv
Minimum Lg predicted: Lg~10nm
Limiting factor: Tunneling current
[Figure conditions: L=15nm, Tox=1.5nm, Ts=10nm, Vd=0.5V, Vg=0.3V, 77K]
Top View of an EJ-MOSFET
[Figure: SEM image; labels: lower gate, n+ region, field, LLG; scale labels: 200 nm, 2mm]
8nm < LLG < 100 nm
Tox=5nm
Subthreshold Characteristics
[Figure: drain current (A), 10^-12 to 10^-4, vs. lower-gate voltage (V) for LLG = 8 nm (-1 to 1 V) and LLG = 52 nm (0 to 1.5 V); curves labeled T = 300 K, 200, 150, 100, 50, 25]
No temperature dependency of slope for L=8nm at <100K
Direct tunneling between S and D
New Direction for the 21st Century
Equivalent System Scaling: New Architecture/Circuit
Equivalent Device Scaling: New Material/Process
Synergetic Device/Ckt. Co-design: Flexible Parameter CMOS
[Figure: chip block diagram; blocks: P, Logic1, Logic2, Logic 3, Memory1, Memory2, Interconnect]