a 68 parallel row access neuromorphic core with 22k multi...
TRANSCRIPT
A 68 Parallel Row Access Neuromorphic Core with 22K Multi-Level Synapses Based on Logic-
Compatible Embedded Flash Memory TechnologyM. Kim1, J. Kim1, G. Park1, L. Everson1, H. Kim1,
S. Song1,2, S. Lee2 and C. H. Kim1
1Dept. of ECE, University of Minnesota2Anaflash Inc.
Outline• Background
• Logic-Compatible eFlash based Synapse
• Neuromorphic Core Design
• 65nm Test Chip Results
• Conclusions
2
3
• MLP: input/hidden/output layers• Unit perceptron: multiply-accumulate operation + activation function
W0
W1
W2
Wi
Weights
X0
X1
X2
Xi
Inputs
Ʃ Integrator
Activationfunction
Output
Artificial Neural Network (ANN)Unit perceptronMulti-layer perceptron (MLP)
X0
X1
X2
Xi
Input layer Output layerHidden layer
Output
ANN Digital vs Analog Implementation
4
• Pros : Digital CMOS• Issues : Large area, large power
consumption
• Pros : Current summation replaces complex digital blocks
• Issues: Sensitive to PVT variation, requires good synaptic device
Digital implementation Analog Implementation
W0
WiISum
IRef
X0
Xi
BL
Sense Amp.
Output
Adder
Accumulator Comparator
Threshold
Output
Multiplier WeightsX0 Xi ••• Wi W0 •••
Device SRAM MRAM RRAM PCRAM eFlash
CellConfiguration
Nonvolatile? No Yes Yes Yes Yes
Tunable? No No No Yes Yes
Logic Compatible? Yes Not yet No No Yes
Multi levelWeights? No No No Yes Yes
Area/bit 150F2 40F2 60F2 40F2 ~500F2
Memory Options for Synapse Circuit
5
This work
M3M1
M2
FG
S
DNW
NW
Dual-poly vs. Single-poly eFlash
6
• Needs additional masks to form floating gate (FG)
CG
FG
FG
S D S D DS S D
Dual-Poly eFlash Single-Poly eFlash
PW NW NWP-SUB
M3M1
M2
M1 M2M3
FG
S
DNW
NWFG : Floating GateCG : Control Gate
(WM1 >> WM2,WM3)
Proposed Synapse: Logic Compatible eFlash
7
CG
FG
FG
S D S D DS S D
Dual-Poly eFlash Single-Poly eFlash
PW NW NWP-SUB
M3M1
M2
M1 M2M3
FG
S
DNW
NWFG : Floating GateCG : Control Gate
(WM1 >> WM2,WM3)
• Floating gate: Back-to-back connected gate• Logic-compatible nonvolatile memory solution• Program verify allows precise weight programming
Proposed Synapse: Logic Compatible eFlash
8
• Cell current proportional to X∙W (=0µA, 5µA, or 10µA) • BL voltage pinned at 0.6V during read operation
0.0 0.2 0.4 0.6 0.8 1.002468
101214161820
Cell
Curr
ent (
µA)
BL Voltage (V)
VFG -0.600VVFG -0.066VVFG 0.046V
X=1, W=0 or X=0, W=0,1,2
X=1, W=1
X=1, W=2
65nm,1.0V,25°C, X(VRD)=0.8V
X
0V
X
2.0V
2.0V
FG: Floating Gate
BL : 0.6V
ICELLW
X W X∙W ICELL
0 0 0 0µA0 1 0 0µA0 2 0 0µA1 0 0 0µA1 1 1 5µA1 2 2 10µA
Proposed Synapse: Logic Compatible eFlash
9
• Even and odd bitline pair realizes 5 level weights
X
0V
X
2.0V
2.0V
FG: Floating Gate
Even BL : 0.6V
ICELL0W0
X W0 W1 X∙W0 X∙W1 ICELL0 ICELL1 ∆I1 0 2 0 2 0µA 10µA -10µA1 0 1 0 1 0µA 5µA -5µA1 0 0 0 0 0µA 0µA 0µA1 1 0 1 0 5µA 0µA 5µA1 2 0 2 0 10µA 0µA 10µA
Odd BL : 0.6V
ICELL1W1
NegativeWeights
PositiveWeights
Positive Weight Negative Weight
Detailed Erase and Program Operations
10
• FN tunneling utilized for erase and program• Program inhibition of unselected cells via self-boosting
~HV~HV
BLA(0V)
PWL(HV)
WWL(HV)
CSL(VDD)
RWL(VDD)
EWL(0V)
BLB(VDD)
Programmed Program-Inhibited via self-boosting
e- e-
Cel
l A
Cel
l B
OFF OFF
ON OFF
Program Operation
[8] S. Song, JSSC 2013 (UMN)
BL(0V)
PWL(0V)
WWL(HV)
CSL(0V)
RWL(0V)
EWL(0V)
M1
M2e-
OFF
OFF
Erase Operation
~0VWM1=8xWM2,3
Proposed Bitline Pair Implementation
• Spiking criteria∑WP -(-ThP) > ∑WN +(+ThN)
• Output generated in a single current summation and thresholding cycle
11
WP
WP
WP
ThP
WN
WN
WN
ThN
WL<0>
WL<1>
WL<2>
WL<67>
Even BL Odd BL
. . .. . .
Sensing neuron
Output
IEXC IINH
WP
WN
Th
Postive weight
Negative weight
Threshold
‘1’
‘0’
‘1’
‘1’
68 Row x 160 Column Core Architecture
12
• 5T eFlash array, high voltage switches, BL sensing circuits• Input data loaded on to 64 read wordlines, 4 rows for threshold
High Voltage Switch
Pixe
l-IN
SCAN
& Le
vel S
hifte
r
VPP1
~4
Charge Pump
Cont
rol
SPIKE-OUT SCAN
Ref. Gen.
HVS
HVS
HVS
5T eFlash Array
HVS
Verify/Sensing Neuron Circuit(Core Devices)
BL0 BL159
PWL0
WWL0
PWL67
WWL67
WRITE SCAN & BL DRIVER
PWL
WWL
CSL
RWL
EWL
BLEV BLOD
FGEV FGOD
WRITE SCAN
VREG -
+
VREG -
+
- +
SPIKE-OUT SCAN
IREF
V EV
V OD
VREF
BL
CSL
RWL
PWL
WWL
EWL
S1
M1
M2
M3
S2
FGIN
M1
M2 S2M3
S1
5T Eflash Unit Cell
Inference and Verify Operations
13
• Inference mode: Compares BL pair currents• Verify mode: Compare BL currents with a common ref. current
PWL
WWL
CSL
RWL
EWL
BLEV BLOD
FGEV FGOD
WRITE SCAN
VREG -
+
VREG -
+
- +
SPIKE-OUT SCAN
IREF
VEV VOD
VREF
On
On
Off
Off
PWL
WWL
CSL
RWL
EWL
BLEV BLOD
FGEV FGOD
WRITE SCAN
VREG -
+
VREG -
+
- +
SPIKE-OUT SCAN
IREF
VEV VOD
VREF
On
On
Off
Off
Inference mode 2-cycle current verify mode
PWL
WWL
CSL
RWL
EWL
BLEV BLOD
FGEV FGOD
WRITE SCAN
VREG -
+
VREG-
+
- +
SPIKE-OUT SCAN
IREF
VEV VOD
VREF
IEV IOD
Off Off
On OnPass-TR
SA(SenseAmp.)
65nm Die Photo and Feature Summary
14
WL Drivers w/ HVS
WL Drivers w/ HVS
EFlash Based Synapse Array
BL Sensing
BL Driver
Technology
Circuit Area 1100 X 600 μm2
65nm CMOS
600µm
1100µm
VDD (Core, IO) 1.0V / 2.5V
# of Neurons 320
# of Synapses
22K(=68x320)
Throughput
15.9µW (per neuron)Power
1.28G pixels/s per core
(tREAD : 50ns)
Program and Program Inhibition Characteristics
15
Cell
Curr
ent (
µA)
0 5 10 15 200
1020304050
0 5 10 15 200
1020304050
PGM Bias 7à8.8V
(0.1V Step)
PGM Bias 8.8à7V(0.1V Step)
Avg. current of 100 cells, 25°C, VRD=0.8V, VBL=0.6V
Pulse Count (20µs width)Pulse Count (20µs width)
Cell
Curr
ent (
µA)
Program inhibition modeProgram mode
Multi-level Program Sequence
16
• Program bias: 8.8V à 7.4V à 7.1V• Target cell current: 0µA, 5µA , and 10µA
Weight 0 PGM & 0 Verify Weights 1,2 PGM & Verify Weight 1 PGM & Verify8.8V
7.4V 7.1V 7.1V 7.1V
20µs 40µs 5µs 5µs
7.1V
Cell
Curr
ent (
µA) 50
40 30 20 10
0 Weight 0
Weight 1,2 Weight 2 Weight 1
Programmed Cell Current Variation
17
• Cell current variation less than 0.8µA after program-verify operations• Data shows that a higher number of levels is possible
Cell
Curr
ent (
µA) Weight 2 Cells
Weight 1 Cells
Weight 0 Cells
BL #
10
8
6
4
2
00 50 100 150 200 250 0
Cell
Curr
ent (
µA)
Weight 2 cells
Weight 1 cells
Weight 0 cells
WL #
Weight 0,1,2 Cell Current Variation, 25°C, VRD=0.8V, VBL=0.6V
0 10 20 30 40 50 60 700
2
4
6
8
10
Number of Program Pulses
18
• Initial cell current variation is 8µA
• Weight 2 and weight 1 cells completely programmed after 20 and 30 pulses, respectively.
25°C, VRD=0.8V, VBL=0.6V Ce
ll Cu
rren
t (µA
)
Weight 2 target
Weight 1 target
10 15 20 25 30
20
15
10
5
0
Program Pulse Count
MNIST Digit Recognition Accuracy Results
•Recognition accuracy is 91.8% for 10K MNIST test images, which is close to the software model’s 93.8% accuracy
19
MNIST dataset:Handwritten Digit images
Learning algorithm (RBM)
Training
Neuromorphic core(Test chip)
Synaptic weights
Inference
Linear classifier(Spike comparison
w/ trained ref.)
Hidden units: Spike set
Match
Digitrecognition
256 axon signals
Final results
16X16 Pixel data
eFlash-based core
0 1 2 3 4 5 6 7 8 9
10 1 2 3 4 5 6 7 8 9
# of RecognitionsImage 0
Image 1
Image 2
Image 3
Image 4 Image 5
Image 9
Image 8
Image 7
Image 6
Output Neuron Index
# of
Rec
ogni
tions
10
1001000
110
1001000
110
1001000
110
1001000
110
1001000
110
1001000
110
1001000110
1001000110
1001000110
1001000
Retention Test Results
20
• Excellent retention characteristics
‒ Reprogramming is not necessary
‒ A higher number of levels is possible
1K Pre-cycles, Tbake=150°C
100 101 102 103
Bake Time (hrs)Before baking
Cell
Curr
ent (
µA)
0
5
10
15
Weight 2 (mean)
±3σWeight 1 (mean)
Weight 0 (mean)
Comparison Table
[2] W. Chen, et al., IEDM, 2017.[3] X.Guo, et al., IEDM, 2017.[4] W.Khwa, et al., ISSCC, 2018.
[5] S. Gonugondia, et al., ISSCC, 2018.[6] W.Chen, et al., ISSCC, 2018.
21
ISSCC’18 [6]
ApplicationHandwritten
digit recognition
Technology 65nm
ISSCC’18 [5]
65nm
ISSCC’18 [4]Handwritten
digit recognition
65nm
# of Currents Summed Up 14 Cells 4 Cells30 Cells
Non volatile? YES (ReRAM) NO (SRAM) NO (SRAM)
Machine learningclassifier
Voltage 1.0V 1.0V1.0V
Logic Compatible?
IEDM’17 [2]
Weight Resolution
YES (ReRAM)
YES YES NONO
2 Bits 3 Bits 1 Bit 1 Bit
Program-verify? NO NO NO NO
2 Cells
150nm
Computing in memory
1.8V
IEDM’17 [3]
YES (eFlash)
NO
N/A
YES
N/A
180nm
Handwritten digit
recognition
2.7V
65nm
This work
Handwritten digit
recognition
YES (Eflash)
68 Cells
1.0V
YES
2.3 Bits (5 levels)
YES
Conclusions
• A logic-compatible 5T eFlash based neuromorphic core demonstrated in 65nm CMOS
• Key features‒ Non-volatile weight storage in a logic CMOS process‒ Precise multi-level weights enabled by program verify‒ 68 row parallel access
• Test chip results show 91.8% digit recognition accuracy
22