a 68 parallel row access neuromorphic core with 22k multi...

22
A 68 Parallel Row Access Neuromorphic Core with 22K Multi-Level Synapses Based on Logic- Compatible Embedded Flash Memory Technology M. Kim 1 , J. Kim 1 , G. Park 1 , L. Everson 1 , H. Kim 1 , S. Song 1,2 , S. Lee 2 and C. H. Kim 1 1 Dept. of ECE, University of Minnesota 2 Anaflash Inc.

Upload: others

Post on 18-Jan-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A 68 Parallel Row Access Neuromorphic Core with 22K Multi ...people.ece.umn.edu/groups/VLSIresearch/papers/2018/IEDM18_Eflash_slides.pdf · IEDM’17 [2] Weight Resolution YES (ReRAM)

A 68 Parallel Row Access Neuromorphic Core with 22K Multi-Level Synapses Based on Logic-

Compatible Embedded Flash Memory TechnologyM. Kim1, J. Kim1, G. Park1, L. Everson1, H. Kim1,

S. Song1,2, S. Lee2 and C. H. Kim1

1Dept. of ECE, University of Minnesota2Anaflash Inc.

Page 2: A 68 Parallel Row Access Neuromorphic Core with 22K Multi ...people.ece.umn.edu/groups/VLSIresearch/papers/2018/IEDM18_Eflash_slides.pdf · IEDM’17 [2] Weight Resolution YES (ReRAM)

Outline• Background

• Logic-Compatible eFlash based Synapse

• Neuromorphic Core Design

• 65nm Test Chip Results

• Conclusions

2

Page 3: A 68 Parallel Row Access Neuromorphic Core with 22K Multi ...people.ece.umn.edu/groups/VLSIresearch/papers/2018/IEDM18_Eflash_slides.pdf · IEDM’17 [2] Weight Resolution YES (ReRAM)

3

• MLP: input/hidden/output layers• Unit perceptron: multiply-accumulate operation + activation function

W0

W1

W2

Wi

Weights

X0

X1

X2

Xi

Inputs

Ʃ Integrator

Activationfunction

Output

Artificial Neural Network (ANN)Unit perceptronMulti-layer perceptron (MLP)

X0

X1

X2

Xi

Input layer Output layerHidden layer

Output

Page 4: A 68 Parallel Row Access Neuromorphic Core with 22K Multi ...people.ece.umn.edu/groups/VLSIresearch/papers/2018/IEDM18_Eflash_slides.pdf · IEDM’17 [2] Weight Resolution YES (ReRAM)

ANN Digital vs Analog Implementation

4

• Pros : Digital CMOS• Issues : Large area, large power

consumption

• Pros : Current summation replaces complex digital blocks

• Issues: Sensitive to PVT variation, requires good synaptic device

Digital implementation Analog Implementation

W0

WiISum

IRef

X0

Xi

BL

Sense Amp.

Output

Adder

Accumulator Comparator

Threshold

Output

Multiplier WeightsX0 Xi ••• Wi W0 •••

Page 5: A 68 Parallel Row Access Neuromorphic Core with 22K Multi ...people.ece.umn.edu/groups/VLSIresearch/papers/2018/IEDM18_Eflash_slides.pdf · IEDM’17 [2] Weight Resolution YES (ReRAM)

Device SRAM MRAM RRAM PCRAM eFlash

CellConfiguration

Nonvolatile? No Yes Yes Yes Yes

Tunable? No No No Yes Yes

Logic Compatible? Yes Not yet No No Yes

Multi levelWeights? No No No Yes Yes

Area/bit 150F2 40F2 60F2 40F2 ~500F2

Memory Options for Synapse Circuit

5

This work

M3M1

M2

FG

S

DNW

NW

Page 6: A 68 Parallel Row Access Neuromorphic Core with 22K Multi ...people.ece.umn.edu/groups/VLSIresearch/papers/2018/IEDM18_Eflash_slides.pdf · IEDM’17 [2] Weight Resolution YES (ReRAM)

Dual-poly vs. Single-poly eFlash

6

• Needs additional masks to form floating gate (FG)

CG

FG

FG

S D S D DS S D

Dual-Poly eFlash Single-Poly eFlash

PW NW NWP-SUB

M3M1

M2

M1 M2M3

FG

S

DNW

NWFG : Floating GateCG : Control Gate

(WM1 >> WM2,WM3)

Page 7: A 68 Parallel Row Access Neuromorphic Core with 22K Multi ...people.ece.umn.edu/groups/VLSIresearch/papers/2018/IEDM18_Eflash_slides.pdf · IEDM’17 [2] Weight Resolution YES (ReRAM)

Proposed Synapse: Logic Compatible eFlash

7

CG

FG

FG

S D S D DS S D

Dual-Poly eFlash Single-Poly eFlash

PW NW NWP-SUB

M3M1

M2

M1 M2M3

FG

S

DNW

NWFG : Floating GateCG : Control Gate

(WM1 >> WM2,WM3)

• Floating gate: Back-to-back connected gate• Logic-compatible nonvolatile memory solution• Program verify allows precise weight programming

Page 8: A 68 Parallel Row Access Neuromorphic Core with 22K Multi ...people.ece.umn.edu/groups/VLSIresearch/papers/2018/IEDM18_Eflash_slides.pdf · IEDM’17 [2] Weight Resolution YES (ReRAM)

Proposed Synapse: Logic Compatible eFlash

8

• Cell current proportional to X∙W (=0µA, 5µA, or 10µA) • BL voltage pinned at 0.6V during read operation

0.0 0.2 0.4 0.6 0.8 1.002468

101214161820

Cell

Curr

ent (

µA)

BL Voltage (V)

VFG -0.600VVFG -0.066VVFG 0.046V

X=1, W=0 or X=0, W=0,1,2

X=1, W=1

X=1, W=2

65nm,1.0V,25°C, X(VRD)=0.8V

X

0V

X

2.0V

2.0V

FG: Floating Gate

BL : 0.6V

ICELLW

X W X∙W ICELL

0 0 0 0µA0 1 0 0µA0 2 0 0µA1 0 0 0µA1 1 1 5µA1 2 2 10µA

Page 9: A 68 Parallel Row Access Neuromorphic Core with 22K Multi ...people.ece.umn.edu/groups/VLSIresearch/papers/2018/IEDM18_Eflash_slides.pdf · IEDM’17 [2] Weight Resolution YES (ReRAM)

Proposed Synapse: Logic Compatible eFlash

9

• Even and odd bitline pair realizes 5 level weights

X

0V

X

2.0V

2.0V

FG: Floating Gate

Even BL : 0.6V

ICELL0W0

X W0 W1 X∙W0 X∙W1 ICELL0 ICELL1 ∆I1 0 2 0 2 0µA 10µA -10µA1 0 1 0 1 0µA 5µA -5µA1 0 0 0 0 0µA 0µA 0µA1 1 0 1 0 5µA 0µA 5µA1 2 0 2 0 10µA 0µA 10µA

Odd BL : 0.6V

ICELL1W1

NegativeWeights

PositiveWeights

Positive Weight Negative Weight

Page 10: A 68 Parallel Row Access Neuromorphic Core with 22K Multi ...people.ece.umn.edu/groups/VLSIresearch/papers/2018/IEDM18_Eflash_slides.pdf · IEDM’17 [2] Weight Resolution YES (ReRAM)

Detailed Erase and Program Operations

10

• FN tunneling utilized for erase and program• Program inhibition of unselected cells via self-boosting

~HV~HV

BLA(0V)

PWL(HV)

WWL(HV)

CSL(VDD)

RWL(VDD)

EWL(0V)

BLB(VDD)

Programmed Program-Inhibited via self-boosting

e- e-

Cel

l A

Cel

l B

OFF OFF

ON OFF

Program Operation

[8] S. Song, JSSC 2013 (UMN)

BL(0V)

PWL(0V)

WWL(HV)

CSL(0V)

RWL(0V)

EWL(0V)

M1

M2e-

OFF

OFF

Erase Operation

~0VWM1=8xWM2,3

Page 11: A 68 Parallel Row Access Neuromorphic Core with 22K Multi ...people.ece.umn.edu/groups/VLSIresearch/papers/2018/IEDM18_Eflash_slides.pdf · IEDM’17 [2] Weight Resolution YES (ReRAM)

Proposed Bitline Pair Implementation

• Spiking criteria∑WP -(-ThP) > ∑WN +(+ThN)

• Output generated in a single current summation and thresholding cycle

11

WP

WP

WP

ThP

WN

WN

WN

ThN

WL<0>

WL<1>

WL<2>

WL<67>

Even BL Odd BL

. . .. . .

Sensing neuron

Output

IEXC IINH

WP

WN

Th

Postive weight

Negative weight

Threshold

‘1’

‘0’

‘1’

‘1’

Page 12: A 68 Parallel Row Access Neuromorphic Core with 22K Multi ...people.ece.umn.edu/groups/VLSIresearch/papers/2018/IEDM18_Eflash_slides.pdf · IEDM’17 [2] Weight Resolution YES (ReRAM)

68 Row x 160 Column Core Architecture

12

• 5T eFlash array, high voltage switches, BL sensing circuits• Input data loaded on to 64 read wordlines, 4 rows for threshold

High Voltage Switch

Pixe

l-IN

SCAN

& Le

vel S

hifte

r

VPP1

~4

Charge Pump

Cont

rol

SPIKE-OUT SCAN

Ref. Gen.

HVS

HVS

HVS

5T eFlash Array

HVS

Verify/Sensing Neuron Circuit(Core Devices)

BL0 BL159

PWL0

WWL0

PWL67

WWL67

WRITE SCAN & BL DRIVER

PWL

WWL

CSL

RWL

EWL

BLEV BLOD

FGEV FGOD

WRITE SCAN

VREG -

+

VREG -

+

- +

SPIKE-OUT SCAN

IREF

V EV

V OD

VREF

BL

CSL

RWL

PWL

WWL

EWL

S1

M1

M2

M3

S2

FGIN

M1

M2 S2M3

S1

5T Eflash Unit Cell

Page 13: A 68 Parallel Row Access Neuromorphic Core with 22K Multi ...people.ece.umn.edu/groups/VLSIresearch/papers/2018/IEDM18_Eflash_slides.pdf · IEDM’17 [2] Weight Resolution YES (ReRAM)

Inference and Verify Operations

13

• Inference mode: Compares BL pair currents• Verify mode: Compare BL currents with a common ref. current

PWL

WWL

CSL

RWL

EWL

BLEV BLOD

FGEV FGOD

WRITE SCAN

VREG -

+

VREG -

+

- +

SPIKE-OUT SCAN

IREF

VEV VOD

VREF

On

On

Off

Off

PWL

WWL

CSL

RWL

EWL

BLEV BLOD

FGEV FGOD

WRITE SCAN

VREG -

+

VREG -

+

- +

SPIKE-OUT SCAN

IREF

VEV VOD

VREF

On

On

Off

Off

Inference mode 2-cycle current verify mode

PWL

WWL

CSL

RWL

EWL

BLEV BLOD

FGEV FGOD

WRITE SCAN

VREG -

+

VREG-

+

- +

SPIKE-OUT SCAN

IREF

VEV VOD

VREF

IEV IOD

Off Off

On OnPass-TR

SA(SenseAmp.)

Page 14: A 68 Parallel Row Access Neuromorphic Core with 22K Multi ...people.ece.umn.edu/groups/VLSIresearch/papers/2018/IEDM18_Eflash_slides.pdf · IEDM’17 [2] Weight Resolution YES (ReRAM)

65nm Die Photo and Feature Summary

14

WL Drivers w/ HVS

WL Drivers w/ HVS

EFlash Based Synapse Array

BL Sensing

BL Driver

Technology

Circuit Area 1100 X 600 μm2

65nm CMOS

600µm

1100µm

VDD (Core, IO) 1.0V / 2.5V

# of Neurons 320

# of Synapses

22K(=68x320)

Throughput

15.9µW (per neuron)Power

1.28G pixels/s per core

(tREAD : 50ns)

Page 15: A 68 Parallel Row Access Neuromorphic Core with 22K Multi ...people.ece.umn.edu/groups/VLSIresearch/papers/2018/IEDM18_Eflash_slides.pdf · IEDM’17 [2] Weight Resolution YES (ReRAM)

Program and Program Inhibition Characteristics

15

Cell

Curr

ent (

µA)

0 5 10 15 200

1020304050

0 5 10 15 200

1020304050

PGM Bias 7à8.8V

(0.1V Step)

PGM Bias 8.8à7V(0.1V Step)

Avg. current of 100 cells, 25°C, VRD=0.8V, VBL=0.6V

Pulse Count (20µs width)Pulse Count (20µs width)

Cell

Curr

ent (

µA)

Program inhibition modeProgram mode

Page 16: A 68 Parallel Row Access Neuromorphic Core with 22K Multi ...people.ece.umn.edu/groups/VLSIresearch/papers/2018/IEDM18_Eflash_slides.pdf · IEDM’17 [2] Weight Resolution YES (ReRAM)

Multi-level Program Sequence

16

• Program bias: 8.8V à 7.4V à 7.1V• Target cell current: 0µA, 5µA , and 10µA

Weight 0 PGM & 0 Verify Weights 1,2 PGM & Verify Weight 1 PGM & Verify8.8V

7.4V 7.1V 7.1V 7.1V

20µs 40µs 5µs 5µs

7.1V

Cell

Curr

ent (

µA) 50

40 30 20 10

0 Weight 0

Weight 1,2 Weight 2 Weight 1

Page 17: A 68 Parallel Row Access Neuromorphic Core with 22K Multi ...people.ece.umn.edu/groups/VLSIresearch/papers/2018/IEDM18_Eflash_slides.pdf · IEDM’17 [2] Weight Resolution YES (ReRAM)

Programmed Cell Current Variation

17

• Cell current variation less than 0.8µA after program-verify operations• Data shows that a higher number of levels is possible

Cell

Curr

ent (

µA) Weight 2 Cells

Weight 1 Cells

Weight 0 Cells

BL #

10

8

6

4

2

00 50 100 150 200 250 0

Cell

Curr

ent (

µA)

Weight 2 cells

Weight 1 cells

Weight 0 cells

WL #

Weight 0,1,2 Cell Current Variation, 25°C, VRD=0.8V, VBL=0.6V

0 10 20 30 40 50 60 700

2

4

6

8

10

Page 18: A 68 Parallel Row Access Neuromorphic Core with 22K Multi ...people.ece.umn.edu/groups/VLSIresearch/papers/2018/IEDM18_Eflash_slides.pdf · IEDM’17 [2] Weight Resolution YES (ReRAM)

Number of Program Pulses

18

• Initial cell current variation is 8µA

• Weight 2 and weight 1 cells completely programmed after 20 and 30 pulses, respectively.

25°C, VRD=0.8V, VBL=0.6V Ce

ll Cu

rren

t (µA

)

Weight 2 target

Weight 1 target

10 15 20 25 30

20

15

10

5

0

Program Pulse Count

Page 19: A 68 Parallel Row Access Neuromorphic Core with 22K Multi ...people.ece.umn.edu/groups/VLSIresearch/papers/2018/IEDM18_Eflash_slides.pdf · IEDM’17 [2] Weight Resolution YES (ReRAM)

MNIST Digit Recognition Accuracy Results

•Recognition accuracy is 91.8% for 10K MNIST test images, which is close to the software model’s 93.8% accuracy

19

MNIST dataset:Handwritten Digit images

Learning algorithm (RBM)

Training

Neuromorphic core(Test chip)

Synaptic weights

Inference

Linear classifier(Spike comparison

w/ trained ref.)

Hidden units: Spike set

Match

Digitrecognition

256 axon signals

Final results

16X16 Pixel data

eFlash-based core

0 1 2 3 4 5 6 7 8 9

10 1 2 3 4 5 6 7 8 9

# of RecognitionsImage 0

Image 1

Image 2

Image 3

Image 4 Image 5

Image 9

Image 8

Image 7

Image 6

Output Neuron Index

# of

Rec

ogni

tions

10

1001000

110

1001000

110

1001000

110

1001000

110

1001000

110

1001000

110

1001000110

1001000110

1001000110

1001000

Page 20: A 68 Parallel Row Access Neuromorphic Core with 22K Multi ...people.ece.umn.edu/groups/VLSIresearch/papers/2018/IEDM18_Eflash_slides.pdf · IEDM’17 [2] Weight Resolution YES (ReRAM)

Retention Test Results

20

• Excellent retention characteristics

‒ Reprogramming is not necessary

‒ A higher number of levels is possible

1K Pre-cycles, Tbake=150°C

100 101 102 103

Bake Time (hrs)Before baking

Cell

Curr

ent (

µA)

0

5

10

15

Weight 2 (mean)

±3σWeight 1 (mean)

Weight 0 (mean)

Page 21: A 68 Parallel Row Access Neuromorphic Core with 22K Multi ...people.ece.umn.edu/groups/VLSIresearch/papers/2018/IEDM18_Eflash_slides.pdf · IEDM’17 [2] Weight Resolution YES (ReRAM)

Comparison Table

[2] W. Chen, et al., IEDM, 2017.[3] X.Guo, et al., IEDM, 2017.[4] W.Khwa, et al., ISSCC, 2018.

[5] S. Gonugondia, et al., ISSCC, 2018.[6] W.Chen, et al., ISSCC, 2018.

21

ISSCC’18 [6]

ApplicationHandwritten

digit recognition

Technology 65nm

ISSCC’18 [5]

65nm

ISSCC’18 [4]Handwritten

digit recognition

65nm

# of Currents Summed Up 14 Cells 4 Cells30 Cells

Non volatile? YES (ReRAM) NO (SRAM) NO (SRAM)

Machine learningclassifier

Voltage 1.0V 1.0V1.0V

Logic Compatible?

IEDM’17 [2]

Weight Resolution

YES (ReRAM)

YES YES NONO

2 Bits 3 Bits 1 Bit 1 Bit

Program-verify? NO NO NO NO

2 Cells

150nm

Computing in memory

1.8V

IEDM’17 [3]

YES (eFlash)

NO

N/A

YES

N/A

180nm

Handwritten digit

recognition

2.7V

65nm

This work

Handwritten digit

recognition

YES (Eflash)

68 Cells

1.0V

YES

2.3 Bits (5 levels)

YES

Page 22: A 68 Parallel Row Access Neuromorphic Core with 22K Multi ...people.ece.umn.edu/groups/VLSIresearch/papers/2018/IEDM18_Eflash_slides.pdf · IEDM’17 [2] Weight Resolution YES (ReRAM)

Conclusions

• A logic-compatible 5T eFlash based neuromorphic core demonstrated in 65nm CMOS

• Key features‒ Non-volatile weight storage in a logic CMOS process‒ Precise multi-level weights enabled by program verify‒ 68 row parallel access

• Test chip results show 91.8% digit recognition accuracy

22