eecs598 non-volatile storage jerry kao


Post on 26-May-2015


Page 1: EECS598 Non-Volatile Storage Jerry Kao


EECS598 Non-Volatile Storage

Jerry Kao [email protected]

Electrical Engineering & Computer Science Department, The University of Michigan, Ann Arbor

Page 2: EECS598 Non-Volatile Storage Jerry Kao


A SURVEY OF CIRCUIT INNOVATIONS IN FERROELECTRIC RANDOM-ACCESS MEMORIES

Ali Sheikholeslami and P. Glenn Gulak

Page 3: EECS598 Non-Volatile Storage Jerry Kao


FRAM Structure

Motives for FRAM: short programming time and low power consumption.

Easy integration into an SoC.

Research is done in three areas: material processing, modeling, and circuit design.

Page 4: EECS598 Non-Volatile Storage Jerry Kao


FRAM Comparison

FRAM is superior in terms of write-access time and overall power consumption.

Target applications: contactless smart cards and digital cameras.

FRAM is also aimed at the mobile device market.

This paper focuses on six innovative circuit techniques.

Page 5: EECS598 Non-Volatile Storage Jerry Kao


Ferromagnetic Cores Background

Main technology prior to the 1950’s.

A current through the x-access and y-access wires magnetizes the core in the “0” or “1” direction.

A read access consists of a write access followed by sensing. Writing data opposite to the stored value induces a large current.

Because the read is destructive, the data held in the sense amp is written back to the cell after the write access.

Page 6: EECS598 Non-Volatile Storage Jerry Kao


Ferroelectric Capacitors Background

The name was adopted to convey the similarity of the hysteresis loop to that of ferromagnetic materials.

Key concept: spontaneous polarization, a displacement that is inherent to the crystal structure and does not disappear in the absence of an electric field.

A popular material is lead zirconate titanate (PZT), a perovskite.

At 0V, the cell has two possible states.

Page 7: EECS598 Non-Volatile Storage Jerry Kao


Techniques to Reduce Voltage Disturbance

Novel material processing to make the hysteresis loop more square.

Add an access transistor to each cell (1T-1C).

Access transistor OFF: the FE cap is disconnected from the bit line (BL).

Access transistor ON: the FE cap is connected to the BL and can be read or written from the plate line (PL).

A boosted VDD is applied to the word line (WL).

Page 8: EECS598 Non-Volatile Storage Jerry Kao


Step-Sensing Approach Timing Diagram

Step PL before sensing.

Precharge BL to 0 V.

Turn on WL, which forms a capacitor divider of CFE and CBL between PL and ground.

Raise PL to VDD.

Sense the voltage on BL, Vx.

The sense amp restores the original data in the cell.
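The capacitor divider in the steps above can be sketched numerically. This is a minimal linearized model, assuming the ferroelectric capacitor acts as a fixed effective capacitance that is larger when the stored state switches, and that the cell storing “1” is the one that switches. The function name and all component values are illustrative assumptions, not figures from the paper.

```python
def bitline_voltage(vdd, c_fe, c_bl):
    """Voltage Vx on a BL precharged to 0 V once PL is raised to vdd.

    C_FE (plate line to bitline) and C_BL (bitline to ground) form a
    series capacitive divider, so Vx = vdd * C_FE / (C_FE + C_BL).
    """
    return vdd * c_fe / (c_fe + c_bl)


# Hypothetical values: a switching cell looks like a larger capacitor (C1)
# than a non-switching cell (C0).
VDD = 3.0
C_BL = 1.0e-12
C0, C1 = 0.1e-12, 0.3e-12

v0 = bitline_voltage(VDD, C0, C_BL)
v1 = bitline_voltage(VDD, C1, C_BL)
print(f"V0 = {v0:.3f} V, V1 = {v1:.3f} V")
```

Under these assumptions V1 is larger than V0, and the gap between them is what the sense amp has to resolve.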

Page 9: EECS598 Non-Volatile Storage Jerry Kao


Pulse-Sensing Approach

Pulse PL before the sense amp is activated.

Has a smaller common-mode voltage.

The step-sensing approach is preferred due to its higher common-mode (CM) voltage.

Page 10: EECS598 Non-Volatile Storage Jerry Kao


Reference Voltage Generation

A reference voltage between V0 and V1 is needed for the comparison.

V0 and V1 are not exact; they are process and time dependent.

Two types of ferroelectric imperfections:

Relaxation: a partial loss of remanent charge on a µs time scale if the cap is not accessed for a period of time. → V1↓ or V0↑

Imprint: the tendency of a cell to prefer one state over the other if it stays in that state for a long period of time. → shift in V1, V0, and VREF.

A variable reference is needed to track process variation.

Page 11: EECS598 Non-Volatile Storage Jerry Kao


One Oversized Reference Capacitor per Column

Two additional cells in each column (1C’/BL).

CREF is sized larger than CFE so that VREF is midway between V0 and V1.

WL0 and RWL0, or WL1 and RWL1, are turned on at the same time, and the sense amp amplifies the difference between BL and /BL.

Reset transistors are added to reduce voltage build-up on CREF.

VREF tuning is achieved with an adjustable CREF, an adjustable RPL, or an adjustable voltage reference generator.

Page 12: EECS598 Non-Volatile Storage Jerry Kao


Two Half-Sized Reference Caps per Column

Also called 2×0.5C/BL.

Generates VREF = (V0+V1)/2.

CREF1 and CREF0 are half the size of CFE.

In this case, VREF is going to be slightly larger than (V0+V1)/2.

CREF1 and CREF0 fatigue faster than CFE.

Page 13: EECS598 Non-Volatile Storage Jerry Kao


Two Full-Sized Reference Caps per Two Columns

Also called 2C/2BL.

CREF1 = CREF0 = CFE

BL1 has V1 and BL2 has V0 before EQ turns ON.

After EQ turns ON, VBL1 = VBL2 = (V0+V1)/2.

At the end, a “0” and a “1” must be restored in CREF0 and CREF1 by pulsing RPL through the transistor driven by RP.
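The equalization step can be checked with a charge-conservation sketch. It assumes the two bitlines are ideal capacitors that are shorted together when EQ turns ON; the function name, the default capacitances, and the voltages are hypothetical.

```python
def equalize(v_bl1, v_bl2, c_bl1=1.0, c_bl2=1.0):
    """Common voltage after shorting two bitlines (total charge is conserved)."""
    return (c_bl1 * v_bl1 + c_bl2 * v_bl2) / (c_bl1 + c_bl2)


V0, V1 = 0.27, 0.69  # hypothetical bitline voltages for "0" and "1"

v_ref = equalize(V1, V0)                     # matched bitlines
v_ref_skewed = equalize(V1, V0, c_bl1=1.1)   # 10% heavier BL1

print(f"matched: {v_ref:.3f} V, mismatched: {v_ref_skewed:.3f} V")
```

With matched bitline capacitances the result is exactly the midpoint (V0+V1)/2. Any mismatch pulls the reference toward the voltage on the heavier bitline.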

Page 14: EECS598 Non-Volatile Storage Jerry Kao


Adding Reference Cells to Rows

Also called 2C/WL.

Fatigues the reference voltage circuit less.

The reference is generated by shorting RBL and /RBL.

Cext must be added to balance the capacitance due to RBL.

Page 15: EECS598 Non-Volatile Storage Jerry Kao


A Self-Referenced Fully Differential Architecture

Also called 2T-2C.

Two CFEs store opposite values.

Twice the voltage difference between BL and /BL.

Only used in lower-density memories.
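The factor of two can be seen by comparing the differential input the sense amp receives in each scheme. This is a small sketch assuming an ideal midpoint reference for 1T-1C; the function names and voltages are hypothetical.

```python
def sense_signal_1t1c(v0, v1):
    """Worst-case difference between the cell voltage and a midpoint VREF."""
    v_ref = (v0 + v1) / 2
    return min(v1 - v_ref, v_ref - v0)


def sense_signal_2t2c(v0, v1):
    """BL and /BL carry opposite data, so the sense amp sees V1 - V0."""
    return v1 - v0


V0, V1 = 0.27, 0.69  # hypothetical bitline voltages for "0" and "1"
print(sense_signal_1t1c(V0, V1), sense_signal_2t2c(V0, V1))
```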

Page 16: EECS598 Non-Volatile Storage Jerry Kao


Summary

2T-2C is the most robust, but has a density issue.

Among the 1T-1C schemes, 2C/2BL and 2C/WL are superior in sensing complexity and fatigue immunity, respectively.

Page 17: EECS598 Non-Volatile Storage Jerry Kao


Ferroelectric Memory Architecture

Adopted the folded-bitline architecture to reduce bitline mismatch.

A constant-PL architecture is desired since PL is slow to move.

Two disadvantages:

A refresh is required.

The voltage range across CFE is smaller.

Page 18: EECS598 Non-Volatile Storage Jerry Kao


Wordline-Parallel Plateline

Also called WL//PL.

PL is parallel to WL.

A row of cells is accessed at the same time.

If PL is shared between two rows, the un-accessed row can be disturbed.

When disturbed, a “0” is reinforced, and a “1” might be flipped.

Page 19: EECS598 Non-Volatile Storage Jerry Kao


Bitline-Parallel Plateline

Also called BL//PL.

Only a single cell can be selected.

Absorbs the y-decoder and reduces power significantly.

PL activation can disturb all the cells in the column.

Page 20: EECS598 Non-Volatile Storage Jerry Kao


Segmented Plateline

Also called Segmented PL.

Break the PL into local segments.

Faster PL than WL//PL.

No disturbance to non-selected cells, unlike BL//PL.

Page 21: EECS598 Non-Volatile Storage Jerry Kao


Merged Wordline/Plateline (ML) Architecture

Since WL and PL are parallel, people thought of ways to merge them.

Either two 1T-1C cells or one 2T-2C cell.

Write “0” into C1 and “1” into C2.

Four-phase operation:

BLn = 0 V and BLn+1 = VDD.

ML1 and ML2 are set to VDD, forcing “0” into C1.

ML1 is pulled down to ground, leaving “0” in C1 and forcing “1” into C2.

ML1 is pulled to VDD and ML2 is pulled to ground, forcing “1” into C1 if BLn were at VDD.

Faster read access time.

Same read/write time.

Higher density.

[Figure panels: read access; write access]

Page 22: EECS598 Non-Volatile Storage Jerry Kao


Nondriven Plateline Architecture

Also called NDP.

A constant voltage on PL reduces read/write access time.

PL = VDD/2

Read operation:

BL1 = BL2 = 0 V

Activate WL.

VDD/2 is used to switch the cap storing “1”. Good for SrBi2Ta2O9.

The sense amp restores the value by holding BL1=BL2.

A write operation is similar to a read operation, except that BL is held at VDD or 0 V.
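The reduced voltage range across CFE follows directly from holding PL at VDD/2. A minimal sketch, assuming the capacitor voltage is taken as bitline side minus plate-line side (the sign convention, function name, and supply value are assumptions made here):

```python
def cap_voltage(v_bl, v_pl):
    """Voltage across C_FE, measured from the bitline side to the plate line."""
    return v_bl - v_pl


VDD = 3.0  # hypothetical supply

# Nondriven plate line: PL sits at VDD/2 while BL is held at 0 V or VDD.
ndp_low = cap_voltage(0.0, VDD / 2)
ndp_high = cap_voltage(VDD, VDD / 2)

# Driven plate line for comparison: PL is raised to VDD with BL near 0 V.
driven = cap_voltage(0.0, VDD)

print(ndp_low, ndp_high, driven)
```

In this sketch the nondriven scheme can apply at most VDD/2 of either polarity, while a driven plate line can apply the full VDD.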

Page 23: EECS598 Non-Volatile Storage Jerry Kao


Bitline-Driven Architecture

PL = 0 V

Full VDD is available on a read, and no refresh is needed, unlike the VDD/2 scheme.

The shaded circuit precharges BL and /BL to VDD or 0 V before activating the WL.

PL is only pulsed after sensing.

This reduces the read access time, but not the read cycle time.

Performance can be improved if combined with a segmented PL.

Page 24: EECS598 Non-Volatile Storage Jerry Kao


Dual-Mode Ferroelectric Memories

Limit the switching of CFE to the power-down and power-up modes to reduce the fatigue problem.

During power shutdown:

STO is turned on.

PL is pulsed, writing data to CFE.

STO is pulled to ground, ready for power off.

During power-on sequence:

Page 25: EECS598 Non-Volatile Storage Jerry Kao


Transpolarizer-Based Architectures

Two CFEs connected in opposite directions.

Simpler reference voltage, since (V1+V0)/2 always equals VDD/2.

Although it is a 1T-2C structure, the C is smaller than in 1T-1C, to get a small signal level on BL.

Read operation, with t4 and t5 performing the write-back.

Page 26: EECS598 Non-Volatile Storage Jerry Kao


Cross-Point Array of Ferroelectric Gain Cells

Memory architecture without a PL and without destructive reads.

Consists of an array of gain cells.

Two caps form a capacitor divider, and the transistor amplifies the result.

In standby, WL = BL = VDD.

In a read, precharge BL to VDD and lower WL slightly. A BL whose cell stores “0” carries a larger current than a BL whose cell stores “1”.

Page 27: EECS598 Non-Volatile Storage Jerry Kao


Chain FRAM (NAND Architecture)

Similar to NAND flash.

Organized in units of cell blocks.

A cell block is terminated by a BL and a PL, one on each end.

In standby, all WL = VDD.

In active operation, WLx = 0 V and Block-Select (BS) is raised. The other WLs remain high, allowing the BL and PL voltages to reach the selected cell.

Increasing the number of cells in a cell block increases density but lengthens the readout delay.

1024 cells per bit line and 16 cells per cell block reduce area by 63%.
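The block arithmetic behind the quoted configuration can be sketched as follows. The helper name is hypothetical, and the count of series devices assumes, as a simplification made here, that every unselected cell in the block lies on the path between BL and PL. No area model is attempted, so the 63% figure is not reproduced.

```python
def chain_fram_layout(cells_per_bitline, cells_per_block):
    """Return (blocks per bitline, unselected cells in series with the selected cell)."""
    if cells_per_block <= 0 or cells_per_bitline % cells_per_block != 0:
        raise ValueError("cells_per_bitline must be a positive multiple of cells_per_block")
    blocks = cells_per_bitline // cells_per_block
    series_cells = cells_per_block - 1  # all other cells in the chain stay ON
    return blocks, series_cells


blocks, series_cells = chain_fram_layout(1024, 16)
print(blocks, series_cells)
```

A longer chain means fewer blocks (and fewer block-select devices and bitline contacts) per bitline, but more ON transistors in series with the selected cell, which is the density versus delay trade-off.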

Page 28: EECS598 Non-Volatile Storage Jerry Kao


Architecture Summary

Page 29: EECS598 Non-Volatile Storage Jerry Kao


Future Trends

Progress in density, access time, and SoC integration can be assumed.

62 kb and 256 kb have been achieved, with 1 Mb expected.

Access time has not improved, but can through circuit innovation.

It is easier to integrate FRAM into an SoC than EEPROM.

Page 30: EECS598 Non-Volatile Storage Jerry Kao


ULTRA-LOW POWER DATA STORAGE FOR SENSOR NETWORKS

Gaurav Mathur, Peter Desnoyers, Deepak Ganesan, Prashant Shenoy

Page 31: EECS598 Non-Volatile Storage Jerry Kao


Motivation

What is the most energy-efficient storage platform for sensor networks, and what are the implications for sensor network design?

Results

Parallel NAND flash is 100X more energy-efficient than other flash memories and the radio on MicaZ.

Page 32: EECS598 Non-Volatile Storage Jerry Kao


Background

NOR flash is less dense than NAND and uses more energy for erase and programming, but provides random read access times of less than 100 ns.

NAND flash has significantly higher starting latency, but can stream subsequent bytes at high speed since it is always page-oriented.

Writes are “one-way”: an erase is needed before the next write.

A microcontroller translates disk-like operations to the NAND interface, which also increases power consumption. It takes care of erasure, page remapping, ECC, and wear leveling.
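The “one-way” write behaviour can be illustrated with a toy one-byte cell. This sketch assumes the usual flash convention that erased bits read as 1 and programming can only clear bits; the names and data values are illustrative, and real devices erase whole blocks rather than single bytes.

```python
ERASED = 0xFF  # assumed convention: an erased byte reads as all ones


def program(cell, data):
    """Programming can only clear bits (1 -> 0); it never sets them."""
    return cell & data


def erase():
    """Erasing is the only way to return bits to 1."""
    return ERASED


cell = program(erase(), 0b1010_1010)   # first write after erase stores the data
stale = program(cell, 0b0101_0101)     # overwrite without erase corrupts it
fresh = program(erase(), 0b0101_0101)  # erase first, then write

print(hex(cell), hex(stale), hex(fresh))
```

This is why the controller has to remap pages and schedule erasures instead of updating data in place.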

Page 33: EECS598 Non-Volatile Storage Jerry Kao


Flash Energy Consumption

Measured on a Mica mote using a 10 Ω resistor and a 3.3 V supply.

Toshiba NAND is 21X more efficient than Telos NOR.
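A measurement of this kind can be turned into energy as sketched below. The sketch assumes the 10 Ω resistor sits in series with the flash supply and that the voltage across it is sampled at a fixed interval; the function name and the sample values are made up for illustration and are not the paper's data.

```python
def flash_energy(sense_voltages, dt, r_sense=10.0, v_supply=3.3):
    """Energy in joules delivered to the device over the sampled interval.

    For each sample, current I = V_sense / R_sense and the device sees
    (V_supply - V_sense), so the device power is (V_supply - V_sense) * I.
    """
    total = 0.0
    for v_sense in sense_voltages:
        current = v_sense / r_sense
        total += (v_supply - v_sense) * current * dt
    return total


# Two hypothetical samples of 0.1 V across the resistor, 1 ms apart.
energy = flash_energy([0.1, 0.1], dt=1e-3)
print(f"{energy * 1e6:.1f} uJ")
```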

Page 34: EECS598 Non-Volatile Storage Jerry Kao


Effect of Data Size on Energy Consumption

A read operation has a smaller energy overhead than a write operation.

A write buffer amortizes the fixed cost over a larger number of data bytes.
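The amortization argument can be written as a one-line cost model. It assumes each operation costs a fixed overhead plus a per-byte term; the function name and the numbers (in arbitrary energy units) are hypothetical.

```python
def energy_per_byte(n_bytes, e_fixed, e_byte):
    """Average energy per byte when one operation moves n_bytes at once."""
    if n_bytes <= 0:
        raise ValueError("n_bytes must be positive")
    return (e_fixed + n_bytes * e_byte) / n_bytes


# Hypothetical costs: fixed overhead of 10 units, 1 unit per byte.
for n in (1, 10, 100):
    print(n, energy_per_byte(n, e_fixed=10.0, e_byte=1.0))
```

As the buffered batch grows, the average cost approaches the per-byte term alone.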

Page 35: EECS598 Non-Volatile Storage Jerry Kao


Idle Current

NOR and NAND devices draw between 2 µA and 5 µA, which is less than the mote CPU’s 5 µA to 15 µA or the 10 µA self-discharge current of an AA battery.

NOR and NAND devices have an idle current 17X smaller than MMC.

Page 36: EECS598 Non-Volatile Storage Jerry Kao


Summary

Parallel NAND flash is the most energy-efficient storage for sensor networks.

A desirable device would have the performance of a parallel NAND and the pin count of a serial NAND flash.

ECC is better handled by the microcontroller during idle cycles.

Page 37: EECS598 Non-Volatile Storage Jerry Kao


Implications for Sensor Systems

Compare the energy consumption of flash to that of the CPU and radio.

Writing a byte to flash is 11X more expensive than computation.

Radio transmission of a byte costs 200X a write access and 500X a read access.

Suggests that storage energy should be part of the trade-off.

Applications that benefit:

In-network query processing

Use of history

Network-level compression

Custody transfer
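The ratios on this slide can be combined into a rough store-versus-send comparison. The sketch assumes the ratios are per byte and takes one unit of computation as the energy unit; the strategy functions and the example workload are hypothetical and ignore packet overheads.

```python
COMPUTE = 1.0                   # energy unit (assumption)
FLASH_WRITE = 11 * COMPUTE      # writing a byte is 11X computation
RADIO_TX = 200 * FLASH_WRITE    # radio is 200X a flash write
FLASH_READ = RADIO_TX / 500     # radio is 500X a flash read


def send_everything(n_bytes):
    """Transmit every byte over the radio as it is produced."""
    return n_bytes * RADIO_TX


def store_then_send(n_bytes, fraction_sent):
    """Archive all bytes in flash, later read back and send only a fraction."""
    sent = fraction_sent * n_bytes
    return n_bytes * FLASH_WRITE + sent * (FLASH_READ + RADIO_TX)


# Fraction of the data below which local storage wins.
break_even = (RADIO_TX - FLASH_WRITE) / (FLASH_READ + RADIO_TX)

print(send_everything(1000), store_then_send(1000, 0.1), round(break_even, 3))
```

Under these assumptions, storing locally pays off whenever slightly less than all of the data ends up being transmitted, which is the situation the listed applications create.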

Page 38: EECS598 Non-Volatile Storage Jerry Kao


Re-thinking Sensor Net Design

A sensor network service involves three operations: computation, storage, and communication.

Characterize those operations by two parameters: frequency and magnitude.

Model using a sensor service emulator.

Page 39: EECS598 Non-Volatile Storage Jerry Kao


Impact on Communication Service

NAND flash provides a significant energy gain for batch sizes greater than 128 bytes.

At a 1% duty cycle, it achieves 3.8 times less energy/byte with a batch size of 512 bytes, and a 58-times improvement for a batch size of 65 kbytes.

The 7.5% duty cycle has a smaller preamble, resulting in a lower fixed energy cost per packet.

Page 40: EECS598 Non-Volatile Storage Jerry Kao


Impact on Data Aggregation

Effect of compression on energy consumption.

Three types of compression: lossless encoding, lossy encoding, and feature extraction.

Uses a benchmark wavelet compression scheme optimized for operation without floating point, with a computational complexity of 60N.

Concludes that data aggregation yields a 10X saving in energy consumption.

Page 41: EECS598 Non-Volatile Storage Jerry Kao


Conclusion

Parallel NAND flash is 100-fold more energy efficient than serial NOR flash.

This observation has implications for sensor network design.

Data show that communication and data aggregation achieve at least an order of magnitude energy reduction.

Page 42: EECS598 Non-Volatile Storage Jerry Kao


THE MISSING MEMRISTOR FOUND

Dmitri B. Strukov, Gregory S. Snider, Duncan R. Stewart & R. Stanley Williams

Page 43: EECS598 Non-Volatile Storage Jerry Kao


The four fundamental two-terminal circuit elements
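The slide's figure is not part of this transcript. For reference, the defining relations between voltage v, current i, charge q, and flux φ are commonly written as below; the notation (R, C, L, M) is the conventional one and is an assumption about how the slide labeled them.

```latex
\begin{align}
  dv       &= R\,di && \text{(resistor)}  \\
  dq       &= C\,dv && \text{(capacitor)} \\
  d\varphi &= L\,di && \text{(inductor)}  \\
  d\varphi &= M\,dq && \text{(memristor)}
\end{align}
```

The memristor is the element that links flux and charge, the one pairing of the four variables not covered by the other three.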

Page 44: EECS598 Non-Volatile Storage Jerry Kao


Operation