Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors
ISCA 2014
Yoongu Kim¹, Ross Daly¹, Jeremie Kim¹, Chris Fallin¹, Ji Hye Lee¹, Donghyuk Lee¹, Chris Wilkerson², Konrad Lai, Onur Mutlu¹
¹Carnegie Mellon University ²Intel Labs
Presented by Sam Schiferl and Pedram Zamirai
Outline
1. Motivation
2. DRAM Structure
3. Disturbance Errors
4. Test System Setup
5. Results
6. Proposed Solution
7. Conclusion
8. Discussion
Motivation
● As DRAM process technology continues to downscale, memory reliability suffers due to:
○ Smaller cells hold less charge
○ Cells are closer together, which can lead to electromagnetic coupling
○ Higher variation in process technology
● These issues can lead to the violation of memory isolation
○ An access to one memory address should not have unintended side effects on data stored in other addresses
● The authors investigate how vulnerable commodity DRAM from three major manufacturers is to targeted disturbance error attacks
DRAM Structure
[Figures from paper: a single memory cell; rows of cells]
● Charge stored in capacitor to represent 0/1
● Access transistor used to read/write data to specific cell
DRAM Access
[Figures from paper: a single memory cell; rows of cells]
1. Row’s wordline is raised to high
2. Row-buffer reads/writes the desired columns
3. Row’s wordline is lowered (the row is closed)
DRAM Refresh
● The charge of a memory cell constantly leaks, eventually leading to a loss of data
● Data must be refreshed periodically by raising the wordline
● DRAM specifications guarantee a retention time before the cell loses data
○ 64 ms retention time for DDR3
Disturbance Errors
● Unwanted interaction between two isolated circuit components
● Repeatedly toggling the voltage of a wordline can cause cells in nearby rows to leak charge faster, so they can lose their entire charge before the next refresh
● Causes:
○ Noise injection
○ Bridges
○ Hot-carrier injection
[Figure: an aggressor row with victim rows on either side]
Disturbance Error Attack
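● Repeatedly read data from the same row in DRAM and track bit flips in other DRAM rows
● Flush the line from the cache after each read, so the next read goes to DRAM

Induces errors (X & Y map to the same bank, but different rows):

code1a:
  mov (X), %eax     # read X: opens X's row
  mov (Y), %ebx     # read Y: closes X's row, opens Y's row
  clflush (X)       # evict both lines so the next reads reach DRAM
  clflush (Y)
  mfence            # wait for the flushes to complete
  jmp code1a

Does not induce errors (only X is read, so the reads are served from the row-buffer and X's row is not reopened):

code1b:
  mov (X), %eax     # read X: its row stays open
  clflush (X)
  mfence
  jmp code1b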
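The difference between the loop that reads both X and Y and the loop that reads only X can be sketched with a toy row-buffer model. This is only an illustration, not the authors' test code: it assumes a single bank that keeps the last accessed row open, and the function name and row numbers are hypothetical.

```python
def count_activations(accesses, open_row=None):
    """Count how often a wordline is raised for a sequence of row accesses
    to one bank, assuming the last accessed row stays open (toy model)."""
    activations = 0
    for row in accesses:
        if row != open_row:
            # Row-buffer miss: the open row is closed and a new row is opened.
            activations += 1
            open_row = row
        # Row-buffer hit: served from the row buffer, no wordline toggle.
    return activations


N = 1000
ROW_X, ROW_Y = 10, 20  # hypothetical rows in the same bank

both = count_activations([ROW_X, ROW_Y] * N)  # alternate between X and Y
only_x = count_activations([ROW_X] * N)       # read only X

print(both, only_x)  # prints "2000 1"
```

In this model, alternating between two rows toggles a wordline on every access, while reading a single row opens it once and never toggles it again.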
Experimental Methodology
● Testing platform
○ 8 Xilinx FPGA boards
○ DDR3-800 memory controller
○ Run at 50 °C
● DRAM modules
○ 129 DDR3 DRAM modules
○ 972 DRAM chips
● Test Parameters
○ Activation Interval (AI)
○ Refresh Interval (RI)
○ Data Pattern (DP)
Types of Tests
1. Toggle all rows in the module repeatedly and locate all disturbed cells
○ Quickly identifies all disturbed cells throughout an entire module
2. Toggle a single row repeatedly and identify the specific cells it disturbs
○ Correlates victim cells with aggressor rows
Manufacturing Date
● No errors in the 19 oldest modules
● Disturbance errors are a relatively recent phenomenon
Effective Parameters
● Access patterns
○ Repeated toggling of the wordline
○ Opening and closing the row causes the problem
● Refresh interval (RI)
● Activation interval (AI)
● Data Patterns

Access Pattern          Disturbance Errors?
(open-read-close)^N     Yes
(open-write-close)^N    Yes
open-read^N-close       No
open-write^N-close      No
Effective Parameters
● Access patterns
● Refresh interval (RI)
○ RI ↓ ⇒ Errors ↓
■ Less leakage between refreshes
■ Fewer row openings per refresh interval
● Activation interval (AI)
● Data Patterns
Effective Parameters
● Access patterns
● Refresh interval (RI)
● Activation interval (AI)
○ AI ↑ ⇒ Errors ↓
■ Fewer row openings in each RI
● Data Patterns
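The effect of RI and AI on the number of row openings can be checked with simple arithmetic: at most RI / AI activations fit between two refreshes of a victim row. A minimal sketch follows; the 64 ms refresh interval is the DDR3 figure from the refresh slide, while the 55 ns activation interval and the function name are illustrative assumptions.

```python
def max_activations(refresh_interval_ns: int, activation_interval_ns: int) -> int:
    """Upper bound on row activations that fit in one refresh interval."""
    return refresh_interval_ns // activation_interval_ns


RI_NS = 64_000_000  # 64 ms refresh interval (DDR3)
AI_NS = 55          # assumed activation interval, for illustration only

baseline = max_activations(RI_NS, AI_NS)
shorter_ri = max_activations(RI_NS // 2, AI_NS)  # RI halved
longer_ai = max_activations(RI_NS, AI_NS * 2)    # AI doubled

print(baseline, shorter_ri, longer_ai)
```

Halving RI or doubling AI both roughly halve the number of times an aggressor row can be opened before its neighbors are refreshed.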
Effective Parameters
● Access patterns
● Refresh interval (RI)
● Activation interval (AI)
● Data Patterns
○ Victim cells lose charge when they are disturbed
○ True-cell: High voltage = 1
○ Anti-cell: High voltage = 0
○ True-cells are dominant
○ Errors are therefore mostly 1 → 0
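The direction of a bit flip follows from the cell type, since a disturbed cell can only lose charge. The sketch below assumes a fully disturbed cell always ends up discharged; the function name is hypothetical.

```python
def value_after_disturbance(bit: int, is_true_cell: bool) -> int:
    """Value read back from a cell after it has lost its charge (toy model).

    True-cell: charged = 1. Anti-cell: charged = 0.
    A cell that is already discharged has nothing to lose and keeps its value.
    """
    charged = (bit == 1) if is_true_cell else (bit == 0)
    if not charged:
        return bit
    # A discharged true-cell reads as 0; a discharged anti-cell reads as 1.
    return 0 if is_true_cell else 1


for is_true_cell in (True, False):
    for bit in (0, 1):
        kind = "true-cell" if is_true_cell else "anti-cell"
        print(kind, bit, "->", value_after_disturbance(bit, is_true_cell))
```

Only true-cells storing 1 and anti-cells storing 0 can flip, which is why a module made mostly of true-cells shows mostly 1 → 0 errors.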
Address Correlation
● No errors in the aggressor row itself
● Strong peaks at ±1
○ Greatest effect on the two immediate neighbors
○ Logical and physical adjacency are highly correlated
● Errors in non-adjacent rows
○ Physically adjacent ⇎ logically adjacent
Sensitivity Results
● Errors are mostly repeatable
○ Ten iterations of testing
○ Relatively constant average number of errors (±0.25%)
● Victim cells ≠ Weak cells
○ Weak cells = cells with the shortest retention time
● Not strongly affected by temperature
○ ±20 °C from ambient temperature → No effect
Probabilistic Adjacent Row Activation (PARA)
● After closing a row, the memory controller refreshes one of the adjacent rows with probability P (a small constant)
○ Stateless solution
● It picks one of the two neighbors at random
● Number of accesses ↑ ⇒ Refresh Probability ↑
● Cannot prevent disturbance errors with absolute certainty
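The behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `p` stands for the probability P, each neighbor is assumed to be picked with equal chance (so a given victim row is refreshed with probability p/2 per row closing), and the function names and the 139,000-activation figure are hypothetical example values.

```python
import random


def para_miss_probability(p: float, n: int) -> float:
    """Chance that one particular neighbor is never refreshed in n closings,
    assuming it is refreshed with probability p/2 on each closing."""
    return (1.0 - p / 2.0) ** n


def simulate_para(p: float, n: int, seed: int = 0) -> dict:
    """Stateless PARA sketch: no counters are kept per row, only a coin flip
    after every row closing. Returns how often each neighbor was refreshed."""
    rng = random.Random(seed)
    refreshes = {"left": 0, "right": 0}
    for _ in range(n):
        if rng.random() < p:
            refreshes[rng.choice(("left", "right"))] += 1
    return refreshes


# Illustrative values only: p = 0.001 and 139,000 closings of one aggressor row.
print(para_miss_probability(0.001, 139_000))
print(simulate_para(0.001, 139_000))
```

The miss probability shrinks geometrically with the number of closings, so a heavily hammered row is increasingly likely to have had its neighbors refreshed, but the probability never reaches exactly zero for P < 1.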
Conclusion
● Demonstrated, characterized, and analyzed disturbance errors
● Repeated accesses to the same row corrupt data in other rows
● Emerging problem (affects current and future computing systems)
● Proposed several solutions
Discussion Points
● Does the type of processor (ARM vs x86) have an effect on the feasibility of the attack?
● How practical is their PARA solution that relies on probabilistically refreshing candidate victim rows?
● Should this attack be mitigated with a software or a hardware solution?
Potential Solutions
Solution                                          Drawback
Make better chips                                 Future cells will be even smaller
Correct errors                                    High cost; unable to correct multi-bit errors
Refresh all rows frequently                       Degrades performance and energy efficiency
Map faulty cells to spare cells (manufacturer)    Not enough spare cells
Retire cells (end-user)
  1. Disable/remap faulty addresses               Every row in the module is a victim row
  2. Refresh faulty addresses more frequently     Refreshes victim rows more frequently even when there is no access to the module
Identify “hot” rows and refresh neighbors         High hardware overhead to identify hot rows