nonvolatile erasable flash nand memories - ieee of a 16gb nand by 240% • with hierarchical column...
TRANSCRIPT
1
Raul Adrian CerneaSanDisk Corporation, Milpitas, California
Nonvolatile Erasable Flash NAND Memories
2
• NAND Flash and memory cards• Basic operations and multi-level• Performance trends and ABL• 3X MLC program-throughput• Die size challenge and solutions• Accelerated data access rate• 3 bits per cell• Power and energy reduction • Summary and conclusions
Outline
3
Mobile/CE/PC Driving NAND Future
Music Phone
Camera PhonePDA Phone
Game Phone
Video Phone
GPS Phone
MobileTV
PC, C
E M
arke
tsM
obile
Mar
ket
Email Imaging Music GameVideo Map TV / STB
Application
4
Memory cards
CONTROLLER
MEMORY
MEMORY
MEMORY
MEMORY
Every card:One controllerAt least one memory chip
5
- - -- - -++++++
++++++
++++++
++++++
Erase and program
WORD LINE
WORD LINE
P WELL
P WELL
THIN OXIDE
GROUNDED BIT LINEFLOATING BIT LINE
FLOATING GATES++++++
++++++
++++++
++++++
++++++
E
P
6
- - -- - -
pEE E E
WORD LINE
Cross-section along the Word Line
NON-CONDUCTIVENAND CHAIN
CONDUCTIVENAND CHAIN
FLOATING GATES ++++++
++++++
++++++
++++++
7
Cross-section along the Bit Line
---
--- ---
---------
NON-CONDUCTIVENAND CHAIN
CONDUCTIVENAND CHAIN
WL FG FG WL
+++
+++
+++
+++
+++
+++
8
Program and verifyWORD LINEVOLTAGE
t
SLC
MLC
PROGRAM
WORD LINEVOLTAGE
t
PROGRAM
VERIFY
VERIFY
VTH
VTH
9
SLC and MLC
SINGLE LEVEL CELL
MULTI LEVEL CELLSTWO BITS PER CELL
THREE BITS PER CELL
FOUR BITS PER CELLVTH
VTH
VTH
VTH
NUMBER OF CELLS
10
Performance trends and All Bit Line Architecture
11
Program-throughput trends for NAND memories
MLC2003 2004 2005 2006 2007 2008
10MB/s
20MB/s
30MB/s
40MB/s
50MB/s
60MB/s
SLC
MLC
ABL MLCFULL
SEQUENCE
ABL MLC
CURRENT 16Gb ABLISSCC 2008
ISSCC 2006; 7.7
ISSCC 2005; 2.1ISSCC 2003; 16.7
MLCSLC
ISSCC 2004; 2.7
11%VLSI 2007; 18.4
MLCCONVENTIONAL
WITHOUTHIGH SPEED
I/O INTERFACE
ABLSLC
12
RO
W D
ECO
DER
SENSE AMPS + DATA LATCHES + COLUMN LOGIC
8Gb MEMORY ARRAY 8Gb MEMORY ARRAY
CONTROL SIGNALS AND I/O PADS
PERIPHERAL CIRCUITS PERIPHERAL CIRCUITS
SENSE AMPS + DATA LATCHES + COLUMN LOGIC
RO
W D
ECO
DER
SENSE AMPS + DATA LATCHES + COLUMN LOGIC
SENSE AMPS + DATA LATCHES + COLUMN LOGIC
The 16Gb chip
BIT LINE
WORD LINE
13
Cell access
SENSEAMPLIFIER
SENSEAMPLIFIER
X DECODER:SELECTED BLOCK
X DECODER:SELECTED WORD LINE
X DECODER:UNSELECTED BLOCK(S)
SENSEAMPLIFIER
SENSEAMPLIFIER
32 NAND CELLS
Y DECODER:ACCESS TO ALL BIT LINES
14
All Bit Line memory versus “Conventional”
RO
W D
ECO
DER
RO
W D
ECO
DER
SENSE AMPLIFIERS + DATA LATCHESSENSE AMPLIFIERS + DATA LATCHES
PER
IPH
ERA
L C
IRC
UIT
S
I/O P
AD
S
4Gb MEMORY ARRAYPLANE 0
PLANE14Gb MEMORY ARRAY
8Gb CONVENTIONAL MEMORY
RO
W D
ECO
DER
SENSE AMPS + DATA LATCHES + LOGIC
SENSE AMPS + DATA LATCHES + LOGIC
8Gb MEMORY ARRAYPLANE 0
SENSE AMPS + DATA LATCHES + LOGIC
SENSE AMPS + DATA LATCHES + LOGIC
8Gb MEMORY ARRAYPLANE 1
RO
W D
ECO
DER
CONTROL SIGNALS AND I/O PADS
PERIPHERAL CIRCUITS PERIPHERAL CIRCUITS
16Gb ABL MEMORY
WL WL WL
WL
• Simultaneous access of two word lines of the same size
• ABL: double Column Logic, two sided
ISSCC 2006; 7.7
15
Features Conventional All Bit LineFull-Sequence
Density 8Gb 16Gb
Number of Blocks per Plane 1K 2K
Column Logic One Sided Two Sided
Number of Bit Lines per Sense Amplifier 2 1
4KB Pages Programmed in Parallel 2 8
Maximum MLC Program-Throughput 10MB/s 34MB/s
Differences between the two architectures
16
Maximum parallelism
• Conventional is an Even / Odd architecture– One Sense Amplifier handling two Bit Lines– Every other Bit Line shielded during sensing
• Full READ and PROGRAM capabilities for ABL– No shielding necessary (as in current mainstream NAND architecture)
EVEN
SENSEAMPLIFIER
ODD
MULTIPLEXER
CONVENTIONAL
ODD
TOP
BOTTOM
MEM
OR
Y A
RR
AY
ABLSENSE
AMPLIFIER
SENSEAMPLIFIER
EVEN
ODD
17
Conventional column architecture
SENSEAMPLIFIER
COLUMNSELECTION
DATALATCHES
INPUTOUTPUT
BIT LINEMULTIPLEX
MEMORY ARRAY
EVENBIT LINE
ODDBIT LINE
• Two Bit Lines for every read-program unit
• All similar column blocks
18
ABL: three level hierarchical architecture
• One Bit Line connected to one Sense Amplifier• Sense Amplifiers, Data Latches, one Sequential Logic Unit
and one Control Logic, all connected to the same bus• Column Block Selection, Pointer, I/O and Local Drivers
CONTROL LOGIC CONTROL LOGICCONTROL LOGICCONTROL LOGIC CONTROL LOGIC CONTROL LOGIC
INPUT / OUTPUTPOINTER
LOCAL DRIVERS
SEQUENTIALLOGIC
UNIT(SLU)
MEMORY ARRAY
COMMON BUS
N
BLOCK SELECTION
DATA LATCHESMEMORY ARRAY
MEMORY ARRAY
SENSE AMPLIFIERS
SA
19
3X MLC Program-Throughput
20
MLC programming time
• There is a step up in programming time when one compares the lower page to the upper page (one state versus two states)
• The lower and upper pages together (three states) are the slowest to be programmed
PROGRAMMING TIME
LOWER PAGETLOWER
UPPER PAGETUPPER
LOWER + UPPER PAGET(LOWER + UPPER)
LOWER PAGE
UPPER PAGE
LOWER AND UPPER
21
Conventional, ABL and Full-Sequence
• In ABL architecture, programming four pages in parallel is twice as fast as Conventional
• MLC programming in Full-Sequence is about three times faster than Conventional
CONVENTIONAL4 PAGE PGM
FOUR PAGE PROGRAMMING TIME
LOWER E,O + UPPER E,OT(LOWER + UPPER)
LOWER EVENTLOWER
LOWER ODDTLOWER
UPPER EVENTUPPER
UPPER ODDTUPPER
UPPER E,OTUPPER
LOWER E,OTLOWER
ALL BIT LINES, FULL-SEQUENCE4 PAGE PGM
ALL BIT LINES4 PAGE PGM
22
Yupin Effect
WL
++++++++
++++++ ++++++ ++++++++
- - -- - -
pEE E E
EVEN ODD EVEN ODD EVEN
• Additional negative voltage in a two-step sequence for Floating Gates programmed in step TWO (Yupin Effect*)
• Minimized effect in one-step programming (Full-Sequence)
PpE p E
++++++++
++++++++
EVEN ODD EVEN ODD EVEN
- - -- - - - - -- - -- - -- - -
WL
ppE p E
++++++++
++++++++
EVEN ODD EVEN ODD EVEN
- - -- - - - - -- - -- - -- - -
Ref.: SanDisk US Patent 5,867,429 (Feb. 2 1999)
23
ABL: faster verify with current sensing
• Pre-charge (by Bit Line driver): non-linear, fast (both)• Discharge (by NAND cell): linear, slow (Conventional)• Faster verify operation in ABL mode (pre-charge only)
TIME
BIT LINE DRIVER
BL
NA
ND
CH
AIN
SENSEAMPLIFIER
DISCHARGEPRECHARGE
CONVENTIONAL
ABLPRECHARGE
VOLTAGE SENSINGCURRENT SENSING
BIT
LIN
E VO
LTA
GE
24
Random Data Distribution
1
10
100
1000
10000
Threshold Voltage
Bit
Cou
nt
• A full Word Line distribution of random data, programmed in ABL, Full-Sequence at a 34MB/s program-throughput
25
Die Size Challenge and Solutions
26
Hierarchical architecture for area saving
• Hierarchical architecture: less repeating circuits– Only one Sequential Logic Unit and Control Logic per group– Only one Block Selection, Pointer and I/O per block
• Saved area allows for local drivers
CONTROL LOGIC CONTROL LOGICCONTROL LOGICCONTROL LOGIC CONTROL LOGIC CONTROL LOGIC
INPUT / OUTPUTPOINTER
LOCAL DRIVERS
SEQUENTIALLOGIC
UNIT(SLU)
N
BLOCK SELECTION
SA• One SA per BL meansdouble column circuitry– Area reduction is a must
27
Efficient pumps for area saving
• Every stage doubles the (regulated) input voltage• Cross coupled transistors as “Dynamical Diodes” to
prevent the loss of one diode per stage
CLK/CLK
CONVENTIONAL (DICKSON PUMP)
1st STAGE
2nd STAGE
CLOCK
SUPPLY
CLK CLK
28
V02
V04
V01
Each stage doubles the input voltage
Volta
ges
(lin)
-500m
0
500m
1
1.5
2
2.5
3
3.5
4
4.5
5
5.5
6
6.5
7
7.5
8
Time (lin) (TIME)0 200n 400n 600n 800n 1u
V01, V02, V04
01
23
45
67
8
Volta
ge (V
)
0 200n 400n 600n 800n 1μTime (s)
V01
V02
V04
SUPPLYCLOCK
2X SUPPLY
4X SUPPLY
29
Accelerated Input / Output Access Rate
30
Fast internal data I/O stream
1. Quad-split loading2. Four-point pick up3. Top / Bottom
12
3
POINTERSELECT 63CLK
DATA 63
POINTERSELECT 1
DATA 1DATA 0
POINTERSELECT 0
•Pointer selection•Three stage pipeline
31
Two step I/O redundancy access
• Regular clock for redundancy bytes at Data In (or Data Out)STEP 1: Full data stream
into the regular bufferSTEP 2: Data transfer
between regular buffer and redundancy zone
• Reversed process for Data Out
• Minimal time penalty for the hidden data transfer
DATA IN (OR DATA OUT) FLOW
PADS
RO
W D
ECO
DER
RED
UN
DA
NC
Y
RO
W D
ECO
DER
RED
UN
DA
NC
Y
DEF
ECT
CO
LUM
NS
DEF
ECT
CO
LUM
NS
1 1
1 12
22
2
32
Three bits per cell
33
Memory Density Trend
0
20
40
60
80
100
120
2000 2002 2004 2006 2008 2010
Year
Den
sity
(Mb/
mm
2)
This work (NAND X3)
Samsung (NAND)
Toshiba/SanDisk (NAND)
SanDisk/Toshiba D2
Hitachi (AND)
37% Mb/mm2
34
Chip architecture comparison
56nm 8G D2(ISSCC2006)
Chip size 99mm2
ISSCC2006182mm2
ISSCC20088Gb (80.8Mb/mm2) 16Gb (82Mb/mm2)
8Gb / plane
Bits/Cell 2bits/cell 2bits/cell 3bits/cell
Program/Sense Even or Odd ABL ABL
4Gb / plane
142.5mm2 (this work)(22% saving)
Density 16Gb (112Mb/mm2)
Architecture 8Gb / plane
56nm 16G D2(ISSCC2008)
56nm 16G D3(ISSCC2008)
(This Work)
35
3 bit per cell programming
• 3 pages (Lower/Middle/Upper) on each Word Line• Each page can be treated as an independent page• The pages have to be programmed sequentially
WLN+2
WLN+1
WLN 012
012
012
012
345
345
345
345
678
678
678
678
36
3 bit per cell programming and verify
• 7 verify operations after each programming pulse• Conversion from “page by page” to three page
programming for speed improvement• Program-throughput comparable to conventional
(non-ABL 2 bit per cell)
WORD LINEVOLTAGE
t
PROGRAM
VERIFY
37
1.E+00
1.E+01
1.E+02
1.E+03
1.E+04
1.E+05
7 program states
Bits
#
7 state distribution
38
Power and Energy Reduction
39
SELECTION
POINTER
SELECTION
DL
POINTER
OUTPUT
INPUT
POINTER POINTER POINTER POINTERPOINTER
SELECTION
CLK
POINTER 0
POINTER 1
POINTER 2
POINTER 3
t
Power reduction during Data Stream
• 2V internally down-regulated VDD (from 2.7~3.6V)• Pointer selection to avoid global address bus
switching
40
ABL for lower energy
• Even / Odd sensing in Conventional architecture– Double sensing operations per Word Line– Strong coupling between adjacent Bit Lines plus parasitic to Ground
• All Bit Line sensing in one operation– Bit Line coupling to ground only (20% of total parasitic)– Lower Bit Line voltages
• At least 10 times less charge wasted when using ABL
CONVENTIONAL, ODD BIT LINESVOLTAGE SENSING
CONVENTIONAL, EVEN BIT LINESVOLTAGE SENSING
ALL BIT LINESCURRENT SENSING
FULL WORD LINE READ TIME
41
Summary and Conclusions
42
ABL SLCPROGRAMMING
(WITHOUT HIGH SPEED I/O INTERFACE)
60MB/s
ABL MLCPROGRAMMING
NO FULL-SEQUENCE24MB/s
ABL MLCPROGRAMMINGFULL-SEQUENCE
34MB/s
16 Gb ABL DIE SIZE 182mm2
READ THROUGHPUTBY 8 I/O 50MB/s
Summary, 2 bit per cell
43
ABL MLCPROGRAMMINGFULL-SEQUENCE
8MB/s
16 Gb D3 DIE SIZE 142.5mm2
READ THROUGHPUTBY 8 I/O 40MB/s
Summary, 3 bit per cell
44
ABL is the architecture of the future
• The ABL architecture provides the fastest MLC program-throughput reported yet in 56nm and closes the gap between SLC and MLC
• By means of Full-Sequence programming the high voltage exposure is halved and the Bit Line interaction is minimized for maximum endurance
• The technology challenges of 43nm and beyond are well served by this architecture
• ABL is the architecture of choice for the next generations and 3 bit per cell encoding
45
Conclusions
• A new All Bit Line architecture boosts the MLC program-throughput of a 16Gb NAND by 240%
• With hierarchical column block architecture and a two-step redundancy access, a 50% faster internal I/O rate is achieved, even for lower power supply
• ABL provides an energy efficient system with an improved overall performance
• Lower die size when compared to a four plane “Conventional” chip of close performance
46
Acknowledgements• The author wants to specially thank George Samachisa and Yan Li for
their support• The circuits presented here were made possible by the work of many
engineers from SanDisk and 1)Toshiba:Long Pham, Farookh Moogat, Siu Chan, Binh Le, Shouchang Tsao, Tai-Yuan Tseng, Khanh Nguyen, Jason Li, Jayson Hu, Cynthia Hsu, Fanglin Zhang, Teruhiko Kamei, Hiroaki Nasu, Phil Kliza, Khin Htoo, Jeffrey Lutze, Yingda Dong, Masaaki Higashitani, Junnhui Yang, Hung-Szu Lin, VamshiSakhamuri, Jong Hak Yuh, Alan Li, Feng Pan, Sridhar Yadala, SubodhTaigor, Kishan Pradhan, James Lan, James Chan, Takumi Abe1, YasuyukiFukuda1, Hideo Mukai1, Koichi Kawakami1, Connie Liang, Tommy Ip, Shu-Fen Chang, Jaggi Lakshmipathi, Sharon Huynh, Dimitris Pantelakis, Seungpil Lee, Yupin Fong, Tien-Chien Kuo, Jong Park, Tapan Samaddar, Hao Nguyen, Man Mui, Emilio Yero, Gyuwan Kwon, Jun Wan, Tetsuya Kaneko1, Hiroshi Maejima1, Hitoshi Shiga1, Makoto Hamada1, NorihiroFujita1, Kazunori Kanebako1, Eugene Tam, Anne Koh, Iris Lu, Calvin Kuo, Trung Pham, Jonathan Huynh, Qui Nguyen, Hardwell Chibvongodze, Mitsuyuki Watanabe, Ken Oowada, Grishma Shah, Byungki Woo, Ray Gao, Patrick Hong, Liping Peng, Debi Das, Dhritiman Ghosh, Vivek Kalluru, Sanjay Kulkarni, Mehrdad Mofidi, Khandker Quader, Chi-Ming Wang