© 2007 IBM Corporation International Symposium on Computer Architecture (ISCA-2009) Jul-28-09
Scalable High Performance Main Memory System Using PCM Technology
Moinuddin K. Qureshi, Viji Srinivasan, and Jude Rivers
IBM T. J. Watson Research Center, Yorktown Heights, NY
Main Memory Capacity Wall
More cores in system → more concurrency → larger working set
Demand for main memory capacity continues to increase
Main memory systems consisting of DRAM are hitting:
1. Cost wall: a major % of the cost of large servers is main memory
2. Scaling wall: DRAM scaling to small technology nodes is a challenge
3. Power wall:

| IBM P670 Server | Processor | Memory |
|---|---|---|
| Small (4 proc, 16GB) | 384 Watts | 314 Watts |
| Large (16 proc, 128GB) | 840 Watts | 1223 Watts |

Source: Lefurgy et al. IEEE Computer 2003
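To make the power wall concrete, the memory share of combined processor and memory power can be computed from the two P670 rows above. This is only a sketch: the function name is illustrative, and it ignores every other power consumer in the server.

```python
def memory_power_share(processor_watts: float, memory_watts: float) -> float:
    """Fraction of (processor + memory) power that is drawn by memory."""
    return memory_watts / (processor_watts + memory_watts)

small = memory_power_share(384, 314)    # Small: 4 proc, 16GB
large = memory_power_share(840, 1223)   # Large: 16 proc, 128GB
print(f"small: {small:.1%}, large: {large:.1%}")
```

Under these numbers memory goes from roughly 45% of the combined power in the small configuration to roughly 59% in the large one.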
Need a practical solution to increase main-memory capacity
The Technology Hierarchy
More capacity by cheaper, denser, (slower) technology
[Figure: typical access latency in processor cycles (@ 4 GHz), on a log scale with ticks from 2^1 to 2^23. Technologies shown: L1 (SRAM), EDRAM, DRAM, PCM, Flash, HDD. Region labels: "High-Performance Memory System" and "Disk".]
Phase Change Memory (PCM) is a promising candidate for large-capacity main memory
Outline
• Introduction
• What is PCM?
• Hybrid Memory System
• Evaluation
• Lifetime Analysis
• Summary
What is Phase Change Memory?
Phase change material (chalcogenide glass) exists in two states:
1. Amorphous: high resistivity
2. Crystalline: low resistivity
The material can be switched between states reliably, quickly, and a large number of times
PCM stores data in terms of resistance:
• Low resistance (SET state) = 1
• High resistance (RESET state) = 0
[Figure: PCM cell array showing the bit line and word lines.]
How does PCM work?
Switching by heating using electrical pulses:
• SET: sustained current heats the cell above Tcryst
• RESET: cell is heated above Tmelt and quenched
SET state: low resistance (10^3 to 10^4 Ω), large current
RESET state: high resistance (10^6 to 10^7 Ω), small current
[Figure: temperature vs. time [ns] for the RESET and SET pulses, with Tmelt and Tcryst marked; cell photo labeled with the access device and memory element.]
Photo courtesy: Bipin Rajendran, IBM
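Because the SET and RESET resistance ranges on this slide are about two orders of magnitude apart, a stored bit can in principle be recovered by comparing the cell resistance against a single threshold. The sketch below assumes a threshold at the geometric midpoint of the gap between the two ranges; that threshold and the names are illustrative assumptions, and a real sense circuit would compare currents rather than call a function.

```python
SET_RANGE_OHMS = (1e3, 1e4)     # low resistance, stores 1 (ranges from the slide)
RESET_RANGE_OHMS = (1e6, 1e7)   # high resistance, stores 0 (ranges from the slide)

# Assumed read threshold: geometric midpoint of the gap between the two ranges.
READ_THRESHOLD_OHMS = (SET_RANGE_OHMS[1] * RESET_RANGE_OHMS[0]) ** 0.5

def read_bit(resistance_ohms: float) -> int:
    """Return 1 for a low-resistance (SET) cell and 0 for a high-resistance (RESET) cell."""
    return 1 if resistance_ohms < READ_THRESHOLD_OHMS else 0

print(read_bit(5e3), read_bit(5e6))
```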
Key Characteristics of PCM
+ Scales better than DRAM, small cell size
  Prototypes as small as 3nm x 20nm fabricated and tested [Raoux+ IBMJRD’08]
+ Can store multiple bits/cell
  More density in the same area. Prototypes with 2 bits/cell in ISSCC’08; >2 bits/cell expected soon.
+ Non-volatile memory technology
  Data retention of 10 years. Power implications, system implications.
Challenges:
- Higher latency than DRAM
- Limited endurance (~10 million writes per cell)
- Write bandwidth is constrained, so it is better to write less often
Hybrid Memory System
Hybrid Memory System:
1. DRAM as a cache to tolerate PCM read/write latency and write bandwidth
2. PCM as main memory to provide large capacity at good cost/power
[Figure: block diagram of the hybrid memory system: processor, DRAM buffer (DATA with tag store T), PCM write queue, PCM main memory (DATA with W field), and Flash or HDD. T = tag store.]
Lazy Write Architecture
Problem: Double PCM writes to dirty pages on install
[Figure: processor, DRAM buffer, write queue (WRQ), PCM, and Flash/disk.]
For example, the DAXPY kernel: Y[i] = Y[i] + X[i]
• Baseline has 2 writes for Y[i] and 1 for X[i]
• Lazy write has 1 write for Y[i] and 1 for X[i]
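The DAXPY counts above can be reproduced with a small model of one page's life from page fault to eviction from the DRAM buffer. The sketch assumes that the baseline installs a faulted page into PCM immediately, while lazy write installs it only in DRAM and defers the PCM write until eviction; this is one reading of the slide, and the function and variable names are hypothetical.

```python
def pcm_page_writes(dirty: bool, lazy_write: bool) -> int:
    """Count PCM writes for one page between its page fault and its DRAM eviction."""
    writes = 0
    present_in_pcm = False

    # Page fault: the page arrives from Flash/disk.
    if not lazy_write:
        writes += 1              # baseline: install into PCM right away
        present_in_pcm = True

    # Eviction from the DRAM buffer: PCM needs a write if the page was
    # modified, or if PCM never received a copy in the first place.
    if dirty or not present_in_pcm:
        writes += 1
    return writes

for scheme, lazy in (("baseline", False), ("lazy write", True)):
    y = pcm_page_writes(dirty=True, lazy_write=lazy)    # Y[i] is read and written
    x = pcm_page_writes(dirty=False, lazy_write=lazy)   # X[i] is only read
    print(f"{scheme}: Y pages {y} write(s), X pages {x} write(s)")
```

In this model the saving comes entirely from dirty pages, which is why the X[i] count is the same under both schemes.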
Line Level Write Back
Problem: Not all lines in a dirty page are dirty
Solution: Keep dirty bits per line in the DRAM buffer and write back only the dirty lines from DRAM to PCM
Problem: With LLWB, the lines in dirty pages are not written uniformly
[Figure: number of writes to each line (millions) vs. Line_id (0 to 15) for db1 and db2, with the average marked.]
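Line-level write back can be sketched as a page object that tracks one dirty bit per line and reports only the dirty lines at eviction time. The 16 lines per page is an assumption taken from the Line_id axis (0 to 15) of the figure, and the class and method names are illustrative.

```python
LINES_PER_PAGE = 16  # assumed; matches Line_id 0..15 in the figure

class BufferedPage:
    """A page in the DRAM buffer with one dirty bit per line."""

    def __init__(self) -> None:
        self.dirty = [False] * LINES_PER_PAGE

    def write_line(self, line_id: int) -> None:
        """Record a processor write to one line of the page."""
        self.dirty[line_id] = True

    def lines_to_write_back(self) -> list[int]:
        """Lines that must be written to PCM when the page is evicted."""
        return [i for i, is_dirty in enumerate(self.dirty) if is_dirty]

page = BufferedPage()
for line_id in (0, 1, 5):
    page.write_line(line_id)

to_pcm = page.lines_to_write_back()
print(f"{len(to_pcm)} of {LINES_PER_PAGE} lines written to PCM: {to_pcm}")
```

Without per-line dirty bits, the same eviction would write all 16 lines.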
Fine Grained Wear Leveling
Solution: Fine Grained Wear Leveling (FGWL)
- When a page gets allocated, the page is rotated by a random shift value
- The rotate value remains constant while the page remains in memory
- On replacement of a page, a new random value is assigned for the new page
- Over time, the write traffic per line becomes uniform
FGWL makes writes across lines in a dirty page uniform
[Figure: number of writes to each line (millions) vs. Line_id (0 to 15) for db1 and db2, with the average marked.]
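The rotation idea can be illustrated with a toy simulation in which a workload always writes the same logical line and the page is reallocated many times. The sketch assumes rotation at line granularity with 16 lines per page (taken from the Line_id axis); the function names, the seed, and the single-hot-line workload are all illustrative assumptions.

```python
import random

LINES_PER_PAGE = 16  # assumed; matches Line_id 0..15 in the figure

def physical_line(logical_line: int, shift: int) -> int:
    """Rotate a logical line index by the page's shift value."""
    return (logical_line + shift) % LINES_PER_PAGE

def simulate(allocations: int, rotate: bool, seed: int = 0) -> list[int]:
    """Write logical line 0 once per allocation; count writes per physical line."""
    rng = random.Random(seed)
    writes = [0] * LINES_PER_PAGE
    for _ in range(allocations):
        # A new random shift is drawn each time the page is (re)allocated
        # and stays fixed for as long as the page remains in memory.
        shift = rng.randrange(LINES_PER_PAGE) if rotate else 0
        writes[physical_line(0, shift)] += 1
    return writes

print(simulate(1600, rotate=False))  # every write lands on physical line 0
print(simulate(1600, rotate=True))   # writes are spread across the physical lines
```

The rotation is a bijection on line indices, so reads can find a line again by applying the same shift.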
Evaluation Framework
Trace-driven simulator:
• 16-core system (simple cores)
• 8GB DRAM main memory at 320 cycles
• HDD (2 ms) with Flash (32 us), with a Flash hit rate of 99%
Workloads: database workloads and data-parallel kernels
1. Database workloads: db1 and db2
2. Unix utilities: qsort and binary search
3. Data mining: K-means and Gauss-Seidel
4. Streaming: DAXPY and vector dot product
Assumption: PCM is 4X denser and 4X slower than DRAM, i.e. 32GB at 1280-cycle read latency
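The configuration numbers above follow from simple arithmetic, sketched here. Two things are assumptions rather than statements from the slide: that Flash acts as a cache in front of the HDD (so the 99% hit rate weights the two latencies), and that the 4 GHz clock used elsewhere in the talk applies when converting to cycles.

```python
CLOCK_HZ = 4e9                      # assumed: the 4 GHz clock used elsewhere in the talk
DRAM_GB, DRAM_CYCLES = 8, 320
DENSITY_FACTOR = SLOWDOWN_FACTOR = 4

pcm_gb = DRAM_GB * DENSITY_FACTOR            # PCM capacity in the same area
pcm_cycles = DRAM_CYCLES * SLOWDOWN_FACTOR   # PCM read latency in cycles

FLASH_S, HDD_S, FLASH_HIT_RATE = 32e-6, 2e-3, 0.99
# Assumed model: Flash is a cache in front of the HDD.
avg_fault_s = FLASH_HIT_RATE * FLASH_S + (1 - FLASH_HIT_RATE) * HDD_S
avg_fault_cycles = avg_fault_s * CLOCK_HZ

print(f"PCM: {pcm_gb}GB at {pcm_cycles} cycles")
print(f"average page-fault service time: {avg_fault_s * 1e6:.2f} us "
      f"(about {avg_fault_cycles:,.0f} cycles)")
```

Under this model a page fault costs on the order of 200,000 cycles, more than two orders of magnitude above the assumed PCM read latency.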
Reduction in Page Faults
[Figure: reduction in page faults per workload; annotations group the workloads as "Benefit from capacity", "Need >16GB", and "Streaming".]
Impact on Execution Time
PCM with a DRAM buffer performs similarly to equal-capacity DRAM storage
Impact of PCM Latency
Hybrid memory system is relatively insensitive to PCM Latency
[Figure legend: DRAM-8GB, DRAM-32GB, PCM-32GB, HYBRID (1+32)GB]
Power Evaluations
Significant Power and Energy savings with PCM based hybrid memory system
Impact of Write Endurance
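Let $F$ = frequency of the system (4 GHz) and $Y$ = number of years (lifetime).
There are approximately $2^{25}$ seconds in a year, so:
$$\text{Num. cycles in } Y \text{ years} = Y \cdot F \cdot 2^{25}$$
Let $B$ = bytes/cycle written to PCM, $S$ = PCM capacity in bytes, and $W_{max}$ = max writes per PCM cell. Assuming uniform writes to PCM:
$$\text{Endurance (in cycles)} = \frac{S}{B} \cdot W_{max}$$
Equating the two gives:
$$Y = \frac{S \cdot W_{max}}{B \cdot F \cdot 2^{25}}$$
For a 4 GHz system with a 32GB PCM written at 1 byte per cycle:
$$Y = \frac{W_{max}}{4 \text{ million}}$$
If $W_{max}$ = 10 million, PCM will last for 2.5 years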
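The lifetime formula can be checked numerically. This sketch plugs in the slide's parameters (32GB, 1 byte per cycle, 4 GHz, Wmax of 10 million) and keeps the talk's power-of-two approximation for seconds per year; the function and parameter names are illustrative.

```python
SECONDS_PER_YEAR = 2 ** 25   # the talk's power-of-two approximation

def pcm_lifetime_years(capacity_bytes: float, bytes_per_cycle: float,
                       w_max: float, freq_hz: float) -> float:
    """Years until uniformly written PCM reaches w_max writes per cell."""
    endurance_cycles = (capacity_bytes / bytes_per_cycle) * w_max
    return endurance_cycles / (freq_hz * SECONDS_PER_YEAR)

years = pcm_lifetime_years(
    capacity_bytes=32 * 2**30,   # S: 32GB
    bytes_per_cycle=1.0,         # B
    w_max=10_000_000,            # Wmax: 10 million writes per cell
    freq_hz=4e9,                 # F: 4 GHz
)
print(f"{years:.2f} years")
```

With these inputs the exact value is about 2.56 years; the slide's 2.5 comes from rounding the divisor to 4 million. Lifetime scales inversely with the write rate, so halving the bytes written per cycle doubles it.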
Lifetime Results
| Configuration | Avg. Bytes/Cycle | Avg. Lifetime |
|---|---|---|
| 1GB DRAM + 32GB PCM | 0.807 | 3.0 yrs |
| + Lazy Write | 0.725 | 3.4 yrs |
| + Line Level Write Back | 0.316 | 7.6 yrs |
| + Bypass Streaming Apps | 0.247 | 9.7 yrs |

The table shows the average bytes per cycle written to PCM and the average lifetime of PCM, assuming Wmax = 10 million
Proposed filtering techniques reduce write traffic to PCM by 3.2X, increasing its lifetime from 3 to 9.7 years
Summary
Need more main memory capacity: DRAM hitting power, cost, scaling wall
PCM is an emerging technology – 4x denser than DRAM but with slower access time and limited write endurance
We propose a Hybrid Memory System (DRAM+PCM) that provides significant power and performance benefits
Proposed write filtering techniques reduce writes by about 3x and increase PCM lifetime from 3 years to 9.7 years
Not touched in this talk but important: Exploiting non-volatile memories for system enhancement & related OS issues.
Thanks!