safely overclocking flash i/o in ssds
TRANSCRIPT
Safely Overclocking Flash I/O in SSDs
Kai Zhao, SanDisk
Rino Micheloni, PMC-Sierra
Tong Zhang, Rensselaer Polytechnic Institute
Flash Memory Summit 2015Santa Clara, CA 1
Outline
• Motivation – ECC capability may be under utilized
• Benefit – improve system performance, especially for read operations
• Reliability – leverage LDPC codes to mitigate the impact of overlocking
Flash Memory Summit 2015Santa Clara, CA 2
Page Reliability Variation
• Larger reliability variations for chips even from the same batch
• Raw BER varies dramatically under different P/E cycles and with different retention time
Flash Memory Summit 2015Santa Clara, CA 3
0 5 10 15 20 25 3010
-5
10-4
10-3
10-2
Retention Time (day)
Ra
w B
ER
1000 P/E cycles5000 P/E cycles
C
D
B
A
0 5 10 150.00
0.05
0.10
0.15
Pa
ge
Co
un
t F
req
ue
ncy
Number of page errors
1k P/E cycles, 2 days retention
0 200 4000.00
0.05
0.10
0.15
Point DPoint CPoint B
5k P/E cycles, 30 days retention
Point A
Data Link Overclocking
• Worst-case based ECC is under utilization in most of the time• We could trade reliability (when raw BER is low) for performance• One possible way – Overclocking on data link
Flash Memory Summit 2015Santa Clara, CA 4
Potential Benefits
• Total latency = Operation (read/program) latency +
Data transfer latency + ECC decoding latency
• ECC decoding latency is relatively small
• Transfer latency is comparable to read latency
• Smaller transfer latency could significantly improve read performance
Flash Memory Summit 2015Santa Clara, CA 5
𝜏 h𝑝 𝑦−𝑎𝑐𝑐=𝜏𝑜𝑝+𝜏 𝑥𝑓𝑒𝑟+𝜏 𝐸𝐶𝐶
Practical Considerations
• Characterize errors caused by overclocking
• Errors in the overclocked data link
• Errors in the NAND flash cell array
• Overclocking for read and/or write path
• Evaluate read/write path separately
• Permanent errors on write path
• Adaptation of overclocking in late lifetime of the drive
• Read retry
• Multiple data transfer rate
Flash Memory Summit 2015Santa Clara, CA 6
Feasibility of Overclocking
• ONFI 2.2 compliant chips supporting data transfer rate up to 166 MBPS according to the datasheet
• Data link overclocked up to 275 MBPS
Flash Memory Summit 2015Santa Clara, CA 7
200 212.5 225 237.5 250 262.5 27510
-6
10-5
10-4
10-3
10-2
Transfer Data Rate (MHz)
RB
ER
Read Overclocking: Plane 0Read Overclocking: Plane 1Write Overclocking: Plane 0Write Overclocking: Plane 1
<
Different overclocking capability for the 2 planes
System Performance for Read Overclocking
Flash Memory Summit 2015Santa Clara, CA 8
WebSearch prj mds hm Financial0.0
0.5
1.0
1.5
Nor
mal
ized
Rea
d R
espo
nse
Tim
e 166 MBps 200 MBps 250 MBps 275 MBps
WebSearch prj mds hm Financial0.00
0.05
0.10
0.15
0.20
0.25
Rea
d R
espo
nse
Tim
e R
educ
tion 4 kB 8 kB 16 kB
•Read response time, page size is 8 kB
•Data link overclocked up to 275 MBPS
•Comparison of average read response time reduction with respect to page size
•Data transfer rate is 275 MBPS
Simulator: DiskSim with SSD addonTraces: MSR Cambridge trace, UMASS trace repository
WebSearch prj mds hm Financial0.85
0.90
0.95
1.00
1.05
Nor
mal
ized
Rea
d R
espo
nse
Tim
e
250 MBps Read-Retry, 10% retry probability Read-Retry, 5% retry probability) Multi-Transfer-Rate
Late Lifetime Performance
• Average read response time under different adaption strategies• Read retry vs. multi-transfer-rate for different planes
• Read retry doesn’t sense the flash array, just re-transfer data• Direct lower transfer speed for the worse plane
Flash Memory Summit 2015Santa Clara, CA 9
2500 3000 3500 4000 4500 5000
0.9
0.95
1
1.05
1.1
Number of P/E CyclesN
orm
aliz
ed
Re
ad
Re
spo
nse
Tim
e
WebSearchFinancialhmprj
Read retry only
Late Lifetime Performance
• Average read response time when combining read retry and multi-transfer-rate
• After 4000 P/E cycles, only lower the transfer rate for Plane 0
Flash Memory Summit 2015Santa Clara, CA 10
2500 3000 3500 4000 4500 5000
0.9
0.95
1
1.05
Number of P/E Cycles
No
rma
lize
d R
ea
d R
esp
on
se T
ime
WebSearchFinancialhmprj
Link Error Mitigation using LDPC
• Effectiveness of the proposed overclocking design strategy depends on how well the ECC can handle the extra errors
• Leveraging LDPC codes and data link error characteristics to• Improving the tolerance to overclocking induced errors, which can lead to
lower data re-transfer rate
• Reducing the average number of LDPC decoding iterations, which can lead to lower power consumption and higher decoding throughput
• Two design techniques• Overclocking-aware LLR Calculation
• Inter-plane interleaving
Flash Memory Summit 2015Santa Clara, CA 11
Inter-plane Interleaving
• Each plane has its own page register and I/O circuitry
• Performance after overclocking may be varying
• In our measurement, Plane 0 is much more vulnerable to overclocking-induced I/O errors
Flash Memory Summit 2015Santa Clara, CA 12
0 100 2000.0
0.1
0.2
0.3
Plane 0Plane 1
Pag
e C
ount
Fre
quen
cy
Number of Errors in One Page
Read Overclocking 275 MBps
• The input of LDPC code decoding is the estimated log-likelihood ratio (LLR)
• Binary Asymmetric Channel (BAC). Different error characteristics for each wire and plane
• In the overclocked SSD read channel, LLR can be expressed as
Overclocking-aware LLR Calculation
Flash Memory Summit 2015Santa Clara, CA 13
LLR (𝑥𝑘)=𝑝 (𝑥𝑘=0∨𝑦 𝑘)𝑝 (𝑥𝑘=1∨𝑦𝑘 )
LLR (𝑥𝑘)=𝛴𝑖=0
𝑙 𝑝𝑚(𝑙 ) ( 𝑦𝑘∨𝑣𝑘=𝑖 )𝑝 (𝑣𝑘=𝑖∨𝑥𝑘=0 , 𝑐 )
𝛴 𝑖=0𝑙 𝑝𝑚
( 𝑙 ) (𝑦 𝑘∨𝑣𝑘=𝑖 )𝑝 (𝑣𝑘=𝑖∨𝑥𝑘=1 , 𝑐 )
Error Pattern Characterization
• Plane 0 suffers from more overclocking induced errors, moreover, these errors non-uniformly distributed across the 8-bit bus.
• The pattern ‘0’=>’1’ dominates the errors from Plane 0, while on Plane 1, patterns ‘0’=>’1’ and ‘1’=>’0’ evenly distributed.
Flash Memory Summit 2015Santa Clara, CA 14
0
0.2
0.4
0.6
0.8
1
1.2
Rel
ativ
e Fr
eque
ncy Plane 0
'0' => '1'
Plane 0'1' => '0'
Plane 1'1' => '0'
Plane 1'0' => '1'
Error Pattern
Simulation Results
• LDPC decoding power reduction• The decoding power is determined by the number of iterations• Data re-transfer rate reduction• Data link overclocked to 275 MBPS
Flash Memory Summit 2015Santa Clara, CA 15
0 1000 2000 3000 40000
20
40
60
80
100
120
LDP
C d
ecod
ing
pow
er r
educ
tion
(%)
P/E cycles
Overclocking-aware LDPC decodingInter-plane interleavingCombined
0 1000 2000 3000 4000 500010
-6
10-4
10-2
100
Re-
tran
sfer
pro
babi
lity
P/E cycles
Conventional LDPC decoding
Overclocking-aware LDPC decoding
Inter-plane interleaving
Combined
Conclusion
• Overclock data link to fully utilize the ECC capability
• Read performance can be significantly improved
• Mitigate the impact of overclocking by leveraging LDPC codes and the error characterization.
Flash Memory Summit 2015Santa Clara, CA 16
Flash Memory Summit 2015Santa Clara, CA 17
Thank you