Computing Performance:
The N3XT 1,000X
Department of Electrical Engineering
Stanford University
H.-S. Philip Wong
Collaborator: Subhasish Mitra
World Relies on Electronics
Abundant-data
Internet of Everything
Genomics
Smart Cities
Military Science
Finance
Security
Health Care
Government
Computational demands exceed processing capability
[Figure: processor clock frequency (Freq.) vs. year, 1980 to 2013; frequency axis from 10 MHz to 5 GHz]
Source: cpudb.stanford.edu
Many Walls Simultaneously
Power wall (vs. year)
Complexity wall: hardware bugs
Memory wall: 96% of execution time is memory access (vs. compute)
Also: resilience wall, interconnect wall, cooling wall, …
Sources: Intel; US National Academy of Sciences (2011)
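The memory-wall number can be turned into a rough bound with Amdahl's law. This is a minimal sketch, assuming the 96% on the slide is the share of execution time spent in memory access, that the remaining 4% is compute, and that the two phases do not overlap. The function name `amdahl_speedup` is illustrative and not from the talk.

```python
def amdahl_speedup(fraction_improved: float, factor: float) -> float:
    """Overall speedup when `fraction_improved` of the runtime gets `factor` times faster."""
    if not 0.0 <= fraction_improved <= 1.0:
        raise ValueError("fraction_improved must be in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / factor)


MEMORY_FRACTION = 0.96                      # assumed share of time in memory access
COMPUTE_FRACTION = 1.0 - MEMORY_FRACTION    # assumed share of time in compute

# Compute becomes 1,000X faster, memory access unchanged.
compute_only = amdahl_speedup(COMPUTE_FRACTION, 1000.0)
# Upper bound if compute took no time at all.
compute_limit = 1.0 / MEMORY_FRACTION
# Memory access becomes 1,000X faster, compute unchanged.
memory_only = amdahl_speedup(MEMORY_FRACTION, 1000.0)

print(f"compute 1,000X faster:   {compute_only:.3f}X overall")
print(f"compute infinitely fast: {compute_limit:.3f}X overall")
print(f"memory 1,000X faster:    {memory_only:.1f}X overall")
```

Under these assumptions, faster compute alone caps out near 1.04X, and a 1,000X faster memory path alone gives roughly 24X, so approaching 1,000X needs both sides to improve together.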
Improve Computing Performance
[Figure axes: system integration, device performance]
Option 1: Better Transistors
Few experimental demos
Transistors ≠ system
Option 2: Design Tricks
Limited “tricks”
Complexity → design bugs
Multi-cores
Power management
Improve Computing Performance
Target: 1,000X performance
Business as usual insufficient
Solution: Nanosystems
Translate new nanotech into new systems, enable new applications
New devices
New fabrication
New sensors
New architectures: global ILVs, local ILVs
Imperfections? Large-scale fabrication?
Abundant-Data Applications
Huge memory wall: 96% of application execution time is memory access (vs. compute)
Computer Chips Today: Limited to 2-Dimensional Circuits
N3XT Nanosystems: Computation immersed in memory
Memory
Computing logic
Ultra-dense vertical connections
Impossible with today’s technologies
Nano-Engineered Computing Systems Technology
Unique N3XT Technology
Existing efforts (isolated improvements inadequate): chip stacking, new apps, memories
N3XT (end-to-end): nanoscale cooling; abundant-data apps; 1D / 2D FETs, RRAM, MRAM; architecture & software; yield, reliability; new 3D fabrication
Carbon Nanotube FET (CNFET)
CNT: d = 1.2 nm, sub-litho
[Figure: CNFET micrographs with gates labeled; scale bars 2 µm]
Energy Delay Product: ~10X benefit (IBM Power 7 model)
CNFET Inverter
[Figure: inverter with P+ doped and N+ doped regions and INPUT labeled]
Big Promise, Major Obstacles
Mis-positioned CNTs, metallic CNTs
Process advances alone inadequate
[Zhang IEEE TCAD 12]
Solution: Imperfection-immune design
CNT Growth circa 2005
Highly mis-positioned (scale bar: 10 µm)
First Wafer-Scale Aligned CNT Growth
Quartz wafer with catalyst → aligned CNT growth → quartz wafer with CNTs (scale bar: 20 mm)
99.5% aligned CNTs
Stanford Nanofabrication Facility
[Patil VLSI Tech. 08, IEEE TNANO 09]
Wafer-Scale CNT Transfer
[Patil VLSI Tech. 08, IEEE TNANO 09]
High-temperature CNT growth (900 °C) → CNT transfer (120 °C) → low-temperature circuit fabrication
[Figure: CNTs before transfer (quartz) and after transfer (SiO2/Si); scale bars 2 µm]
Mis-Positioned CNT-Immune NAND
[Figure: NAND layout with inputs A and B, output Out, and Vdd / Gnd rails]
1. Grow CNTs
2. Extended gate, contacts
3. Etch gate & CNTs
4. Dope P & N regions
Etched region essential
Arbitrary logic functions
Graph algorithms
VLSI Metallic CNT Removal
Chip-scale electrical breakdown
Universally effective
[Patil IEDM 09, IEEE TNANO 10, Shulaker ACS Nano 14]
[Figure: source and drain electrodes; scale bars 2 µm and 5 µm]
99.99% m-CNT removal, 4% s-CNT removal
New VMR
Arbitrary technology nodes: 10nm & beyond
[Shulaker IEDM 15]
Relaxed node → m-CNTs erased → scaled circuits
Record selectivity
99.99% m-CNTs erased, 1% s-CNTs erased
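The two removal results can be compared with a little arithmetic. This is a sketch only: it assumes that about one third of as-grown CNTs are metallic (a commonly quoted figure that is not stated on the slides) and that removal acts independently on each tube. The helper `surviving_mix` is illustrative.

```python
def surviving_mix(metallic_fraction, m_removed, s_removed):
    """Return (fraction of all CNTs that survive, metallic share among the survivors)."""
    m_left = metallic_fraction * (1.0 - m_removed)
    s_left = (1.0 - metallic_fraction) * (1.0 - s_removed)
    total_left = m_left + s_left
    return total_left, m_left / total_left


METALLIC_FRACTION = 1.0 / 3.0  # assumption: about one third of as-grown CNTs are metallic

# Earlier result: 99.99% m-CNT removal, 4% s-CNT removal.
old_total, old_metallic = surviving_mix(METALLIC_FRACTION, 0.9999, 0.04)
# New VMR result: 99.99% m-CNTs erased, 1% s-CNTs erased.
new_total, new_metallic = surviving_mix(METALLIC_FRACTION, 0.9999, 0.01)

print(f"earlier: {old_total:.1%} of CNTs kept, {old_metallic * 1e6:.0f} ppm metallic")
print(f"new VMR: {new_total:.1%} of CNTs kept, {new_metallic * 1e6:.0f} ppm metallic")
```

With the same m-CNT removal rate, the gain from the new result is in how many semiconducting CNTs are kept, which is what drives current.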
Most Importantly
VLSI processing
No per-unit customization
VLSI design
Immune CNT library
First Sub-system: ISSCC Demo
[Shulaker ISSCC 13, IEEE JSSC 14] Collaborator: Prof. G. Gielen, KU Leuven
Wafer with CNFET circuits
Robot
ISSCC Jack Raper Outstanding Technology Directions Paper
Sacha: CNT Controlled Hand-shaking Robot
CNT Computer
[Shulaker Nature 13]
Turing-complete processor: entirely CNFETs
Instruction Fetch, Data Fetch, ALU, Write-back
System Demos
Video: https://www.youtube.com/watch?v=7lmK4iNrIGo&feature=youtu.be
Reproducible Results
80 ALUs: ~1,800 CNFETs
200 D-Latches: ~1,600 CNFETs
Waveforms overlaid
High-Performance CNFETs
Doping
Current drive
Contact resistance
Scaling
Dielectric interactions
High-Performance CNFETs
Major challenge: > 100 CNTs/µm
New result: > 100 CNTs/µm
Record ION density
Controlled variations
[Figure: ION (µA/µm), CNFET (Stanford lab) vs. Si FET (foundries); high-density CNTs]
[Shulaker IEDM 14]
High Performance Obstacles
Doping
Current drive
Contact resistance
Scaling
Dielectric interactions
Complementary CNFET Logic
SS = 97 mV/dec
[Figure: inverter built from an n-CNFET and a p-CNFET (VDD, GND, In, Out); VOUT (V) vs. VIN (V)]
• VDD: 1.0 V, 0.8 V, 0.6 V, 0.4 V, 0.2 V
Recent Progress
Top-contact Edge-contact
[Cao, Science 15 (IBM)]
Many Nano-scale Innovations
Memory & logic devices
30 µm thick
Vertical metal nanowire arrays
Phase change: hotspots suppressed
Embedded cooling
3D Resistive RAM (RRAM)
2D FETs: large-area monolayer MoS2 (< 1 nm)
New Memories
STT-MRAM: spin torque transfer magnetic random access memory (soft magnet, tunnel barrier (oxide), pinned magnet; current)
PCM: phase change memory (top electrode, phase change material, switching region, oxide isolation, bottom electrode)
RRAM: resistive switching random access memory (top electrode, metal oxide with oxygen-vacancy filament and oxygen ions, bottom electrode)
CBRAM: conductive bridge random access memory (active top electrode, solid electrolyte with metal-atom filament, bottom electrode)
Random access, non-volatile, no erase before write
Scalable Embedded Memory
Scalable: 12 nm. Bi-layer TiOx (2.5 nm) / HfOx (1.5 nm). Y. Wu, H. Yi, Z. Zhang, Z. Jiang, J. Sohn, S. Wong, H.-S. P. Wong, IEDM 2013 (Stanford)
Scalable: 10 nm. B. Govoreanu et al., IEDM 2011 (IMEC)
High Density 3D Memory
Stanford: IEDM ’12, ’13, VLSI ’13, ’14, DATE ’15, Nature Comm ’15
RRAM memory cells
G-RRAM (graphene thickness: 3 Å)
Pt-RRAM (Pt thickness: 5 nm)
Pt-RRAM (Pt thickness: 25 nm)
[Figure: two-layer (1st layer, 2nd layer) RRAM cross-sections; layer labels include Al2O3, graphene, SiO2, TiN, HfOx, Pt; scale bars 40 nm and 5 nm]
High Density 3D Memory
< 1 μA
1 – 2 V
5 ns
> 1G cycles
F = 5 nm
128 layers
64 Tb per chip
Stanford: IEDM ’12, ’13, VLSI ’13, ’14, DATE ’15, Nature Comm ’15
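The 64 Tb per chip figure can be sanity-checked from the feature size and layer count. This is a back-of-the-envelope sketch, assuming a 4F² cell footprint per layer (typical of cross-point arrays but not stated on the slide), one bit per cell, decimal terabits, and no area spent on peripheral circuits.

```python
F_NM = 5.0             # feature size F from the slide
LAYERS = 128           # number of stacked layers from the slide
TARGET_BITS = 64e12    # 64 Tb, read as decimal terabits (assumption)
CELL_AREA_F2 = 4.0     # assumption: each cell occupies 4F^2 per layer

NM2_PER_MM2 = 1e12     # 1 mm = 1e6 nm, so 1 mm^2 = 1e12 nm^2


def bits_per_mm2(f_nm, layers, cell_area_f2=CELL_AREA_F2):
    """Bit density of a stacked array with one bit per cell."""
    cell_area_nm2 = cell_area_f2 * f_nm ** 2
    return layers * NM2_PER_MM2 / cell_area_nm2


density = bits_per_mm2(F_NM, LAYERS)
area_mm2 = TARGET_BITS / density

print(f"density: {density:.3g} bits per mm^2")
print(f"array area for 64 Tb: {area_mm2:.0f} mm^2")
```

Under these assumptions the array area works out to about 50 mm², so the quoted capacity is consistent with F = 5 nm and 128 layers on a single chip.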
First Logic + Memory Monolithic 3D
[Shulaker IEDM 14]
The “High-rise” chip
[Figure: stacked layers labeled Si-FET, RRAM, CNFET (logic, RAM, RAM, logic)]
Circuit demos
[Figure: routing elements with word lines WL[0] to WL[3], bit lines BL_1 and BL_2, inputs in1 and in2, outputs out1 and out2; scale bar 200 µm]
Interwoven Compute + Memory + Sensing
Millions of sensors
Memory
CNT computing logic
Ultra-dense vertical connections
Terabytes / second
Abundant sensor data: extensive, accurate classification
[M. Shulaker, Stanford. Unpublished]
To be published. Please keep in confidence
Complement with Software Solutions
Co-optimized s/w + h/w
DSL compiler (DSL = Domain-Specific Language)
Runtime optimization
Learning: key architectural concept
Yield, reliability
Cross-Layer Resilience
Quantifying N3XT System Benefits
Heterogeneous nanotechnologies
Architecture design space
Physical design
Integrated thermal analysis
Yield, reliability
Sweet Spot: Abundant-Data Apps.
IBM graph analytics, data-intensive computing
PageRank app.: 851X benefits
[Figure: two bar charts, 2D vs. N3XT, on a 0% to 100% scale; legend: processor active, processor idle, memory access; N3XT bars at 2.7% and 4.3% of 2D, with zoomed insets]
Energy: 37X, Exec. Time: 23X
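The 851X figure matches the product of the energy and execution-time gains, which is what an energy-delay product (EDP) benefit would give. The sketch below checks that arithmetic and the relative bar heights. Reading 851X as an EDP benefit is an inference from the numbers, not something the slide states.

```python
ENERGY_GAIN = 37.0      # energy benefit from the slide
EXEC_TIME_GAIN = 23.0   # execution-time benefit from the slide


def edp_gain(energy_gain, time_gain):
    """EDP is energy times delay, so the two gains multiply."""
    return energy_gain * time_gain


gain = edp_gain(ENERGY_GAIN, EXEC_TIME_GAIN)
n3xt_energy_pct = 100.0 / ENERGY_GAIN      # N3XT energy as a share of the 2D baseline
n3xt_time_pct = 100.0 / EXEC_TIME_GAIN     # N3XT execution time as a share of 2D

print(f"EDP gain: {gain:.0f}X")
print(f"N3XT energy: {n3xt_energy_pct:.1f}% of 2D")
print(f"N3XT execution time: {n3xt_time_pct:.1f}% of 2D")
```

The two percentages reproduce the 2.7% and 4.3% labels on the charts, which supports reading them as the N3XT bars relative to the 2D baseline.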
Massive Benefits Require
- Not a logic device
- Not a memory device
- Not 3D integration
- Not thermal management
- Not new architectures
- Not yield and reliability management
N3XT
Sponsors
Non-Volatile Memory Technology Research Initiative
Conclusion
Nanosystems today
Game ON, to the era
N3XT 1,000X
Compute + memory densely interwoven