nonvolatile erasable flash nand memories - ieee of a 16gb nand by 240% • with hierarchical column...

1

Raul Adrian CerneaSanDisk Corporation, Milpitas, California

Nonvolatile Erasable Flash NAND Memories

2

• NAND Flash and memory cards• Basic operations and multi-level• Performance trends and ABL• 3X MLC program-throughput• Die size challenge and solutions• Accelerated data access rate• 3 bits per cell• Power and energy reduction • Summary and conclusions

Outline

3

Mobile/CE/PC Driving NAND Future

Music Phone

Camera PhonePDA Phone

Game Phone

Video Phone

GPS Phone

MobileTV

PC, C

E M

arke

tsM

obile

Mar

ket

Email Imaging Music GameVideo Map TV / STB

Application

4

Memory cards

CONTROLLER

MEMORY

MEMORY

MEMORY

MEMORY

Every card:One controllerAt least one memory chip

5

- - -- - -++++++

++++++

++++++

++++++

Erase and program

WORD LINE

WORD LINE

P WELL

P WELL

THIN OXIDE

GROUNDED BIT LINEFLOATING BIT LINE

FLOATING GATES++++++

++++++

++++++

++++++

++++++

E

P

6

- - -- - -

pEE E E

WORD LINE

Cross-section along the Word Line

NON-CONDUCTIVENAND CHAIN

CONDUCTIVENAND CHAIN

FLOATING GATES ++++++

++++++

++++++

++++++

7

Cross-section along the Bit Line

---

--- ---

---------

NON-CONDUCTIVENAND CHAIN

CONDUCTIVENAND CHAIN

WL FG FG WL

+++

+++

+++

+++

+++

+++

8

Program and verifyWORD LINEVOLTAGE

t

SLC

MLC

PROGRAM

WORD LINEVOLTAGE

t

PROGRAM

VERIFY

VERIFY

VTH

VTH

9

SLC and MLC

SINGLE LEVEL CELL

MULTI LEVEL CELLSTWO BITS PER CELL

THREE BITS PER CELL

FOUR BITS PER CELLVTH

VTH

VTH

VTH

NUMBER OF CELLS

10

Performance trends and All Bit Line Architecture

11

Program-throughput trends for NAND memories

MLC2003 2004 2005 2006 2007 2008

10MB/s

20MB/s

30MB/s

40MB/s

50MB/s

60MB/s

SLC

MLC

ABL MLCFULL

SEQUENCE

ABL MLC

CURRENT 16Gb ABLISSCC 2008

ISSCC 2006; 7.7

ISSCC 2005; 2.1ISSCC 2003; 16.7

MLCSLC

ISSCC 2004; 2.7

11%VLSI 2007; 18.4

MLCCONVENTIONAL

WITHOUTHIGH SPEED

I/O INTERFACE

ABLSLC

12

RO

W D

ECO

DER

SENSE AMPS + DATA LATCHES + COLUMN LOGIC

8Gb MEMORY ARRAY 8Gb MEMORY ARRAY

CONTROL SIGNALS AND I/O PADS

PERIPHERAL CIRCUITS PERIPHERAL CIRCUITS


RO

W D

ECO

DER



The 16Gb chip

BIT LINE

WORD LINE

13

Cell access

SENSEAMPLIFIER

SENSEAMPLIFIER

X DECODER:SELECTED BLOCK

X DECODER:SELECTED WORD LINE

X DECODER:UNSELECTED BLOCK(S)

SENSEAMPLIFIER

SENSEAMPLIFIER

32 NAND CELLS

Y DECODER:ACCESS TO ALL BIT LINES

14

All Bit Line memory versus “Conventional”

RO

W D

ECO

DER

RO

W D

ECO

DER

SENSE AMPLIFIERS + DATA LATCHESSENSE AMPLIFIERS + DATA LATCHES

PER

IPH

ERA

L C

IRC

UIT

S

I/O P

AD

S

4Gb MEMORY ARRAYPLANE 0

PLANE14Gb MEMORY ARRAY

8Gb CONVENTIONAL MEMORY

RO

W D

ECO

DER

SENSE AMPS + DATA LATCHES + LOGIC






RO

W D

ECO

DER

CONTROL SIGNALS AND I/O PADS

PERIPHERAL CIRCUITS PERIPHERAL CIRCUITS

16Gb ABL MEMORY

WL WL WL

WL

• Simultaneous access of two word lines of the same size

• ABL: double Column Logic, two sided

ISSCC 2006; 7.7

15

Features Conventional All Bit LineFull-Sequence

Density 8Gb 16Gb

Number of Blocks per Plane 1K 2K

Column Logic One Sided Two Sided

Number of Bit Lines per Sense Amplifier 2 1

4KB Pages Programmed in Parallel 2 8

Maximum MLC Program-Throughput 10MB/s 34MB/s

Differences between the two architectures

16

Maximum parallelism

• Conventional is an Even / Odd architecture– One Sense Amplifier handling two Bit Lines– Every other Bit Line shielded during sensing

• Full READ and PROGRAM capabilities for ABL– No shielding necessary (as in current mainstream NAND architecture)

EVEN

SENSEAMPLIFIER

ODD

MULTIPLEXER

CONVENTIONAL

ODD

TOP

BOTTOM

MEM

OR

Y A

RR

AY

ABLSENSE

AMPLIFIER

SENSEAMPLIFIER

EVEN

ODD

17

Conventional column architecture

SENSEAMPLIFIER

COLUMNSELECTION

DATALATCHES

INPUTOUTPUT

BIT LINEMULTIPLEX

MEMORY ARRAY

EVENBIT LINE

ODDBIT LINE

• Two Bit Lines for every read-program unit

• All similar column blocks

18

ABL: three level hierarchical architecture

• One Bit Line connected to one Sense Amplifier• Sense Amplifiers, Data Latches, one Sequential Logic Unit

and one Control Logic, all connected to the same bus• Column Block Selection, Pointer, I/O and Local Drivers

CONTROL LOGIC CONTROL LOGICCONTROL LOGICCONTROL LOGIC CONTROL LOGIC CONTROL LOGIC

INPUT / OUTPUTPOINTER

LOCAL DRIVERS

SEQUENTIALLOGIC

UNIT(SLU)

MEMORY ARRAY

COMMON BUS

N

BLOCK SELECTION

DATA LATCHESMEMORY ARRAY

MEMORY ARRAY

SENSE AMPLIFIERS

SA

19

3X MLC Program-Throughput

20

MLC programming time

• There is a step up in programming time when one compares the lower page to the upper page (one state versus two states)

• The lower and upper pages together (three states) are the slowest to be programmed

PROGRAMMING TIME

LOWER PAGETLOWER

UPPER PAGETUPPER

LOWER + UPPER PAGET(LOWER + UPPER)

LOWER PAGE

UPPER PAGE

LOWER AND UPPER

21

Conventional, ABL and Full-Sequence

• In ABL architecture, programming four pages in parallel is twice as fast as Conventional

• MLC programming in Full-Sequence is about three times faster than Conventional

CONVENTIONAL4 PAGE PGM

FOUR PAGE PROGRAMMING TIME

LOWER E,O + UPPER E,OT(LOWER + UPPER)

LOWER EVENTLOWER

LOWER ODDTLOWER

UPPER EVENTUPPER

UPPER ODDTUPPER

UPPER E,OTUPPER

LOWER E,OTLOWER

ALL BIT LINES, FULL-SEQUENCE4 PAGE PGM

ALL BIT LINES4 PAGE PGM

22

Yupin Effect

WL

++++++++

++++++ ++++++ ++++++++

- - -- - -

pEE E E

EVEN ODD EVEN ODD EVEN

• Additional negative voltage in a two-step sequence for Floating Gates programmed in step TWO (Yupin Effect*)

• Minimized effect in one-step programming (Full-Sequence)

PpE p E

++++++++

++++++++


- - -- - - - - -- - -- - -- - -

WL

ppE p E

++++++++

++++++++


- - -- - - - - -- - -- - -- - -

Ref.: SanDisk US Patent 5,867,429 (Feb. 2 1999)

23

ABL: faster verify with current sensing

• Pre-charge (by Bit Line driver): non-linear, fast (both)• Discharge (by NAND cell): linear, slow (Conventional)• Faster verify operation in ABL mode (pre-charge only)

TIME

BIT LINE DRIVER

BL

NA

ND

CH

AIN

SENSEAMPLIFIER

DISCHARGEPRECHARGE

CONVENTIONAL

ABLPRECHARGE

VOLTAGE SENSINGCURRENT SENSING

BIT

LIN

E VO

LTA

GE

24

Random Data Distribution

1

10

100

1000

10000

Threshold Voltage

Bit

Cou

nt

• A full Word Line distribution of random data, programmed in ABL, Full-Sequence at a 34MB/s program-throughput

25

Die Size Challenge and Solutions

26

Hierarchical architecture for area saving

• Hierarchical architecture: less repeating circuits– Only one Sequential Logic Unit and Control Logic per group– Only one Block Selection, Pointer and I/O per block

• Saved area allows for local drivers

CONTROL LOGIC CONTROL LOGICCONTROL LOGICCONTROL LOGIC CONTROL LOGIC CONTROL LOGIC

INPUT / OUTPUTPOINTER

LOCAL DRIVERS

SEQUENTIALLOGIC

UNIT(SLU)

N

BLOCK SELECTION

SA• One SA per BL meansdouble column circuitry– Area reduction is a must

27

Efficient pumps for area saving

• Every stage doubles the (regulated) input voltage• Cross coupled transistors as “Dynamical Diodes” to

prevent the loss of one diode per stage

CLK/CLK

CONVENTIONAL (DICKSON PUMP)

1st STAGE

2nd STAGE

CLOCK

SUPPLY

CLK CLK

28

V02

V04

V01

Each stage doubles the input voltage

Volta

ges

(lin)

-500m

0

500m

1

1.5

2

2.5

3

3.5

4

4.5

5

5.5

6

6.5

7

7.5

8

Time (lin) (TIME)0 200n 400n 600n 800n 1u

V01, V02, V04

01

23

45

67

8

Volta

ge (V

)

0 200n 400n 600n 800n 1μTime (s)

V01

V02

V04

SUPPLYCLOCK

2X SUPPLY

4X SUPPLY

29

Accelerated Input / Output Access Rate

30

Fast internal data I/O stream

1. Quad-split loading2. Four-point pick up3. Top / Bottom

12

3

POINTERSELECT 63CLK

DATA 63

POINTERSELECT 1

DATA 1DATA 0

POINTERSELECT 0

•Pointer selection•Three stage pipeline

31

Two step I/O redundancy access

• Regular clock for redundancy bytes at Data In (or Data Out)STEP 1: Full data stream

into the regular bufferSTEP 2: Data transfer

between regular buffer and redundancy zone

• Reversed process for Data Out

• Minimal time penalty for the hidden data transfer

DATA IN (OR DATA OUT) FLOW

PADS

RO

W D

ECO

DER

RED

UN

DA

NC

Y

RO

W D

ECO

DER

RED

UN

DA

NC

Y

DEF

ECT

CO

LUM

NS

DEF

ECT

CO

LUM

NS

1 1

1 12

22

2

32

Three bits per cell

33

Memory Density Trend

0

20

40

60

80

100

120

2000 2002 2004 2006 2008 2010

Year

Den

sity

(Mb/

mm

2)

This work (NAND X3)

Samsung (NAND)

Toshiba/SanDisk (NAND)

SanDisk/Toshiba D2

Hitachi (AND)

37% Mb/mm2

34

Chip architecture comparison

56nm 8G D2(ISSCC2006)

Chip size 99mm2

ISSCC2006182mm2

ISSCC20088Gb (80.8Mb/mm2) 16Gb (82Mb/mm2)

8Gb / plane

Bits/Cell 2bits/cell 2bits/cell 3bits/cell

Program/Sense Even or Odd ABL ABL

4Gb / plane

142.5mm2 (this work)(22% saving)

Density 16Gb (112Mb/mm2)

Architecture 8Gb / plane



(This Work)

35

3 bit per cell programming

• 3 pages (Lower/Middle/Upper) on each Word Line• Each page can be treated as an independent page• The pages have to be programmed sequentially

WLN+2

WLN+1

WLN 012

012

012

012

345

345

345

345

678

678

678

678

36

3 bit per cell programming and verify

• 7 verify operations after each programming pulse• Conversion from “page by page” to three page

programming for speed improvement• Program-throughput comparable to conventional

(non-ABL 2 bit per cell)

WORD LINEVOLTAGE

t

PROGRAM

VERIFY

37

1.E+00

1.E+01

1.E+02

1.E+03

1.E+04

1.E+05

7 program states

Bits

#

7 state distribution

38

Power and Energy Reduction

39

SELECTION

POINTER

SELECTION

DL

POINTER

OUTPUT

INPUT

POINTER POINTER POINTER POINTERPOINTER

SELECTION

CLK

POINTER 0

POINTER 1

POINTER 2

POINTER 3

t

Power reduction during Data Stream

• 2V internally down-regulated VDD (from 2.7~3.6V)• Pointer selection to avoid global address bus

switching

40

ABL for lower energy

• Even / Odd sensing in Conventional architecture– Double sensing operations per Word Line– Strong coupling between adjacent Bit Lines plus parasitic to Ground

• All Bit Line sensing in one operation– Bit Line coupling to ground only (20% of total parasitic)– Lower Bit Line voltages

• At least 10 times less charge wasted when using ABL

CONVENTIONAL, ODD BIT LINESVOLTAGE SENSING

CONVENTIONAL, EVEN BIT LINESVOLTAGE SENSING

ALL BIT LINESCURRENT SENSING

FULL WORD LINE READ TIME

41

Summary and Conclusions

42

ABL SLCPROGRAMMING

(WITHOUT HIGH SPEED I/O INTERFACE)

60MB/s

ABL MLCPROGRAMMING

NO FULL-SEQUENCE24MB/s

ABL MLCPROGRAMMINGFULL-SEQUENCE

34MB/s

16 Gb ABL DIE SIZE 182mm2

READ THROUGHPUTBY 8 I/O 50MB/s

Summary, 2 bit per cell

43

ABL MLCPROGRAMMINGFULL-SEQUENCE

8MB/s

16 Gb D3 DIE SIZE 142.5mm2

READ THROUGHPUTBY 8 I/O 40MB/s

Summary, 3 bit per cell

44

ABL is the architecture of the future

• The ABL architecture provides the fastest MLC program-throughput reported yet in 56nm and closes the gap between SLC and MLC

• By means of Full-Sequence programming the high voltage exposure is halved and the Bit Line interaction is minimized for maximum endurance

• The technology challenges of 43nm and beyond are well served by this architecture

• ABL is the architecture of choice for the next generations and 3 bit per cell encoding

45

Conclusions

• A new All Bit Line architecture boosts the MLC program-throughput of a 16Gb NAND by 240%

• With hierarchical column block architecture and a two-step redundancy access, a 50% faster internal I/O rate is achieved, even for lower power supply

• ABL provides an energy efficient system with an improved overall performance

• Lower die size when compared to a four plane “Conventional” chip of close performance

46

Acknowledgements• The author wants to specially thank George Samachisa and Yan Li for

their support• The circuits presented here were made possible by the work of many

engineers from SanDisk and 1)Toshiba:Long Pham, Farookh Moogat, Siu Chan, Binh Le, Shouchang Tsao, Tai-Yuan Tseng, Khanh Nguyen, Jason Li, Jayson Hu, Cynthia Hsu, Fanglin Zhang, Teruhiko Kamei, Hiroaki Nasu, Phil Kliza, Khin Htoo, Jeffrey Lutze, Yingda Dong, Masaaki Higashitani, Junnhui Yang, Hung-Szu Lin, VamshiSakhamuri, Jong Hak Yuh, Alan Li, Feng Pan, Sridhar Yadala, SubodhTaigor, Kishan Pradhan, James Lan, James Chan, Takumi Abe1, YasuyukiFukuda1, Hideo Mukai1, Koichi Kawakami1, Connie Liang, Tommy Ip, Shu-Fen Chang, Jaggi Lakshmipathi, Sharon Huynh, Dimitris Pantelakis, Seungpil Lee, Yupin Fong, Tien-Chien Kuo, Jong Park, Tapan Samaddar, Hao Nguyen, Man Mui, Emilio Yero, Gyuwan Kwon, Jun Wan, Tetsuya Kaneko1, Hiroshi Maejima1, Hitoshi Shiga1, Makoto Hamada1, NorihiroFujita1, Kazunori Kanebako1, Eugene Tam, Anne Koh, Iris Lu, Calvin Kuo, Trung Pham, Jonathan Huynh, Qui Nguyen, Hardwell Chibvongodze, Mitsuyuki Watanabe, Ken Oowada, Grishma Shah, Byungki Woo, Ray Gao, Patrick Hong, Liping Peng, Debi Das, Dhritiman Ghosh, Vivek Kalluru, Sanjay Kulkarni, Mehrdad Mofidi, Khandker Quader, Chi-Ming Wang

nonvolatile erasable flash nand memories - ieee of a 16gb nand by 240% • with hierarchical column...

Documents