![Page 1: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/1.jpg)
Presented by: Swapnil Bhosale
Advisor: Dr. Sudeep Pasricha
Committee members: Dr. Sourajeet Roy, Dr. Wim Bohm
SLAM: High performance and energy efficient shared hybrid last level cache architecture in multicore systems
1
![Page 2: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/2.jpg)
• Introduction
• Related work
• Analysis of prior works
• Motivation
• Proposed SLAM framework
• Experimental setup
• Results
• Conclusion and future work
Overview
2
![Page 3: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/3.jpg)
• Most modern computing systems are multicore with a multi-level cache hierarchy
• The last level cache (LLC) is generally shared among the cores, below their private caches
Introduction
Source: http://www.eedailynews.com/2012/02/freescale-claims-highest-performance.html
Freescale semiconductor’s multi-core processor (B4860)
Source: http://www.cse.wustl.edu/~jain/cse567-11/ftp/multcore/index.html
3
![Page 4: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/4.jpg)
Introduction
Source: http://csillustrated.berkeley.edu/PDFs/handouts/cache-3-associativity-handout.pdf
4
![Page 5: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/5.jpg)
Introduction
Source: http://csillustrated.berkeley.edu/PDFs/handouts/cache-3-associativity-handout.pdf
5
![Page 6: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/6.jpg)
• Processor-memory performance gap continues to increase
• Traditional SRAM based caches cannot keep up with the increasing gap
• Need for an alternate memory that can
o Provide high capacity
o Consume less energy
o Be closer to the processor
Introduction
Source: https://mzh.io/%E5%A6%82%E4%BD%95%E8%AE%A9Go%E7%A8%8B%E5%BA%8F%E6%9B%B4%E5%BF%AB
6
![Page 7: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/7.jpg)
• Researchers proposed Spin Transfer Torque Random Access Memory (STTRAM)
• Attributes
o High density
o Low static power consumption
o Non-volatility
o Future scalability
o High endurance
o Small read latency
Introduction
Potential replacement for SRAM in cache memory hierarchy
Source: https://www.mram-info.com/stt-mram
7
![Page 8: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/8.jpg)
• Basic storage element is MTJ (Magnetic Tunnel Junction)
• Data is stored as relative magnetic orientation of two ferromagnetic layers
Source: https://www.mram-info.com/stt-mram
Source: https://www.embedded.com/design/real-time-and-performance/4026000/The-future-of-scalable-STT-RAM-as-a-universal-embedded-memory
Introduction
8
![Page 9: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/9.jpg)
| | SRAM | STTRAM |
|---|---|---|
| Cell structure | (figure) | (figure) |
| Leakage power (for 1MB, 45nm tech node) | 14.63mW | 2.32mW (5-6x less than SRAM) |
| Area (for 1MB, 45nm tech node) | 3.77sqmm | 0.95sqmm (4-5x denser than SRAM) |
| Write latency | 3.18ns | 12.01ns (approx. 4x of SRAM) |
| Write energy | 0.08nJ | 0.64nJ (approx. 8x of SRAM) |
Source: J. Ahn, S. Yoo and K. Choi, "DASCA: Dead Write Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA), Orlando, FL, 2014, pp. 25-36.
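The comparison above can be checked with a few divisions. The sketch below only restates the quoted 1MB, 45nm figures; the dictionary keys and the `ratio` helper are illustrative names, not part of the cited work. The computed ratios land close to the rounded ranges quoted on the slide, though not always inside them.

```python
# Figures for a 1MB cache at the 45nm node, as quoted on the slide (from DASCA).
sram = {"leakage_mW": 14.63, "area_mm2": 3.77,
        "write_latency_ns": 3.18, "write_energy_nJ": 0.08}
sttram = {"leakage_mW": 2.32, "area_mm2": 0.95,
          "write_latency_ns": 12.01, "write_energy_nJ": 0.64}


def ratio(metric, numerator, denominator):
    """Ratio of one technology's figure to the other's for a given metric."""
    return numerator[metric] / denominator[metric]


# STTRAM advantages: SRAM figure divided by STTRAM figure.
leakage_saving = ratio("leakage_mW", sram, sttram)        # about 6.3x
density_gain = ratio("area_mm2", sram, sttram)            # about 4.0x

# STTRAM drawbacks: STTRAM figure divided by SRAM figure.
write_latency_penalty = ratio("write_latency_ns", sttram, sram)  # about 3.8x
write_energy_penalty = ratio("write_energy_nJ", sttram, sram)    # about 8x
```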
Introduction
Need for techniques to overcome the drawbacks of STTRAM
9
![Page 10: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/10.jpg)
Introduction
• Related work
• Analysis of prior works
• Motivation
• Proposed SLAM framework
• Experimental setup
• Results
• Conclusion and future work
Overview
10
![Page 11: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/11.jpg)
• Some works focus on reducing write energy by tuning MTJ device properties
o “Relaxing non-volatility for fast and energy-efficient STT-RAM caches”, [C. Smullen, et al, IEEE HPCA, 2011]
o “Delivering on the promise of universal memory for spin-transfer torque RAM (STT-RAM)”, [C. Smullen, et al, IEEE ISLPED, 2011]
Related work
MTJ thickness ∝ MTJ write time
MTJ thickness ∝ MTJ retention time
Tough compromise between MTJ write time and MTJ retention time
11
![Page 12: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/12.jpg)
• Other works focus on reducing write energy at cell level
• Basic idea is to update only the bits whose values differ
o “Energy reduction for STT-RAM using early write termination”, [P. Zhou, et al, IEEE ICCAD, 2009]
o “Coding last level STT-RAM cache for high endurance and low power”, [S. Yazdanshenas, et al, IEEE Computer Architecture Letters, 2013]
These works do not consider non-uniformity of writes across the cache
Related work
[Figure: a comparator checks the incoming bit pattern against the cache DATA field so that only differing bits are written]
12
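A minimal sketch of the cell-level idea, assuming the stored word and the incoming word are compared bit by bit and only mismatching cells are rewritten. The function name and the two 8-bit patterns are illustrative and are not taken from the cited papers.

```python
def bits_to_flip(stored: int, incoming: int) -> int:
    """Count the cell positions whose value differs between the two words."""
    # XOR leaves a 1 exactly where the stored and incoming bits disagree.
    return bin(stored ^ incoming).count("1")


stored = 0b10101111      # current contents of the cache DATA field
incoming = 0b10101100    # incoming bit pattern
flips = bits_to_flip(stored, incoming)   # only the two low-order bits differ
```

With these patterns a conventional write would drive all 8 cells, while a compare-then-write scheme would drive only the 2 cells that change.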
![Page 13: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/13.jpg)
• Some works focus on reducing write energy using a hybrid last level cache (LLC) architecture
• Basic idea is to migrate write intensive cache lines to the SRAM region
o “Exploiting non-uniformity of write accesses for designing a high-endurance hybrid last level cache”, [P. Safayenikoo, et al, IEEE CCECE, 2017]
o “High-endurance and performance-efficient design of hybrid cache architectures through adaptive line replacement”, [A. Jadidi, et al, IEEE ISLPED, 2011]
Provides better energy savings by taking advantage of both SRAM and STTRAM
8-way set in hybrid cache: way-0 and way-1 are SRAM; way-2 to way-7 are STTRAM
Related work
13
![Page 14: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/14.jpg)
Introduction
Related work
• Analysis of prior works
• Motivation
• Proposed SLAM framework
• Experimental setup
• Results
• Conclusion and future work
Overview
14
![Page 15: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/15.jpg)
Analysis of prior work PTHCM
[Figure: a set in the hybrid LLC (4-way SRAM, 12-way STTRAM; TAG and DATA fields), the prediction table, and main memory]
• PTHCM (Prediction table hybrid cache management) uses a hybrid last level cache (LLC) composed of SRAM and STTRAM
• Uses a prediction table to predict write intensive cache lines
• Migrates write intensive cache lines from the STTRAM region to the SRAM region
Citation: Baixing Quan, Tiefei Zhang, Tianzhou Chen and Jianzhong Wu, "Prediction table based management policy for STT-RAM and SRAM hybrid cache," 2012 7th International Conference on Computing and Convergence Technology (ICCCT), Seoul, 2012, pp. 1092-1097
What does PTHCM do?
15
![Page 16: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/16.jpg)
Analysis of prior work PTHCM
[Figure: a set in the hybrid LLC (4-way SRAM, 12-way STTRAM) whose lines carry TAG, WC, AC, WLC, ALC and DATA fields; a prediction table with TAG, WC and AC fields; main memory]
• Counters to keep access history of each cache line
o AC – actual access count of a cache line (read/write)
o WC – actual write count of a cache line
o ALC – prediction of access count of a cache line
o WLC – prediction of write count of a cache line
• Migration will happen on
o Miss
o Write hit
• Prediction table is populated on
o Eviction
How does it work?
16
![Page 17: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/17.jpg)
Analysis of prior work PTHCM
[Figure: miss at line ‘x’ in a set of the hybrid LLC (4-way SRAM, 12-way STTRAM), with the prediction table (TAG, WC, AC) and main memory; lines ‘y’ and ‘z’ are labeled in the set]
• Miss at line ‘x’
• LRU line evicted
• Line with minimum WLC is replaced
• Line ‘x’ is inserted in SRAM (WC=0, AC=0) to avoid a write operation to STTRAM
• The prediction table is searched for x.TAG; if entry not found, WLC and ALC are initialized to user-set thresholds
17
![Page 18: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/18.jpg)
Analysis of prior work PTHCM
[Figure: write hit in a set of the hybrid LLC (4-way SRAM, 12-way STTRAM) holding lines ‘x’ and ‘y’, with the prediction table (TAG, WC, AC) and main memory]
• Write hit: WC++, AC++, WLC--, ALC--
• If WC > threshold: swap
• Line with minimum WLC is replaced
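The write-hit handling can be sketched as follows. This is an interpretation of the slide rather than the PTHCM authors' implementation: the `Line` class, the saturating 3-bit counters, the `WC_THRESHOLD` value and the rule that the hot STTRAM line swaps places with the SRAM line holding the minimum WLC are all assumptions made for illustration.

```python
from dataclasses import dataclass

COUNTER_MAX = 7    # assumes 3-bit saturating counters
WC_THRESHOLD = 2   # hypothetical; the slide only says "threshold"


@dataclass
class Line:
    tag: int
    region: str    # "SRAM" or "STTRAM"
    wc: int = 0    # actual write count
    ac: int = 0    # actual access count
    wlc: int = 0   # predicted write count
    alc: int = 0   # predicted access count


def on_write_hit(line, cache_set):
    """Update counters; return the SRAM line displaced by a swap, or None."""
    line.wc = min(line.wc + 1, COUNTER_MAX)
    line.ac = min(line.ac + 1, COUNTER_MAX)
    line.wlc = max(line.wlc - 1, 0)
    line.alc = max(line.alc - 1, 0)
    if line.region == "STTRAM" and line.wc > WC_THRESHOLD:
        sram_lines = [l for l in cache_set if l.region == "SRAM"]
        if sram_lines:
            # The SRAM line with the fewest predicted writes is swapped out.
            victim = min(sram_lines, key=lambda l: l.wlc)
            line.region, victim.region = "SRAM", "STTRAM"
            return victim
    return None
```

With the threshold set to 2, the third write hit to an STTRAM line triggers the swap.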
18
![Page 19: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/19.jpg)
Analysis of prior work PTHCM
[Figure: eviction of line ‘y’ from a set of the hybrid LLC (4-way SRAM, 12-way STTRAM) to main memory; y.TAG, y.WC and y.AC are copied into the prediction table (TAG, WC, AC)]
• Eviction: the prediction table is populated with y.TAG, y.WC and y.AC
• If entry not found, make new entry in empty slot
• If no empty slot, delete entry with minimum AC
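A minimal sketch of how the prediction table might be populated on eviction and consulted on a miss. The `PredictionTable` class and its method names are hypothetical. The use of a stored (WC, AC) pair as the initial (WLC, ALC) prediction is also an assumption, since the slides only state what happens when no entry is found.

```python
class PredictionTable:
    """Tag-indexed history of evicted lines: tag -> (wc, ac)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}

    def record_eviction(self, tag, wc, ac):
        # New tag and no empty slot: delete the entry with minimum AC.
        if tag not in self.entries and len(self.entries) >= self.capacity:
            coldest = min(self.entries, key=lambda t: self.entries[t][1])
            del self.entries[coldest]
        self.entries[tag] = (wc, ac)

    def predict(self, tag, default_wlc, default_alc):
        """Return (WLC, ALC) for a line being filled after a miss."""
        # If the entry is not found, fall back to the user-set thresholds.
        return self.entries.get(tag, (default_wlc, default_alc))
```

A small table makes the replacement rule visible: with capacity 2, inserting a third tag removes whichever existing entry has the lowest AC.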
19
![Page 20: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/20.jpg)
• Hardware overhead
o 3 bits to represent each of WC, AC, WLC and ALC
o 12 extra bits added to every cache line in LLC
o 65536 cache lines in 4MB hybrid LLC with 64B blocksize
o 12*65536 bits ~ 98kB additional space in LLC
o Considering 14 bits to represent TAG
o Each entry in prediction table is 20 bits in size
o 65536 entries in prediction table
o Size of prediction table is 20*65536 bits ~ 163kB
o Size of swap/migration buffer ~ 68B
o Total hardware overhead = 262kB ~ 6.39% of LLC
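The overhead arithmetic can be reproduced directly from the bit widths listed above. The variable names are illustrative. One caveat, stated as an observation rather than a correction: with bytes used consistently the ratio comes out near 6.25%, and the quoted 6.39% appears to result from dividing 262 (decimal kB) by 4096.

```python
LLC_BYTES = 4 * 1024 * 1024          # 4MB hybrid LLC
BLOCK_BYTES = 64

lines = LLC_BYTES // BLOCK_BYTES                   # 65536 cache lines
per_line_bits = 4 * 3                              # WC, AC, WLC, ALC at 3 bits each
line_fields_bytes = lines * per_line_bits // 8     # 98304 B, about 98kB

entry_bits = 14 + 3 + 3                            # TAG + WC + AC
table_bytes = lines * entry_bits // 8              # 163840 B, about 163kB

buffer_bytes = 68                                  # swap/migration buffer
total_bytes = line_fields_bytes + table_bytes + buffer_bytes   # 262212 B
overhead_pct = 100 * total_bytes / LLC_BYTES       # about 6.25
```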
WC AC WLC ALC
TAG DATA
Prediction table
TAG WC AC
Analysis of prior work PTHCM
163kB prediction table
Notable hardware overhead
20
TAG DATA
68B swap/migration buffer
4MB hybrid LLC
12 bits of extra fields
per cache line
![Page 21: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/21.jpg)
[Figure: processors P1, P2 and P3 with private L1 caches above shared memory; the L1 copies of x and shared memory all hold x=5]
Background: Cache coherence
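[Figure: P1 writes x (Wr->x), so its L1 holds x=6 and P2's copy is invalidated (x=INV); shared memory still holds x=5 until the modified line is written back, either on eviction or on a read request (Rd->x) from a peer. Panel captions: coherent view of memory; non-coherent view of memory]
• Uniformity of shared resource data
• Achieved by writing back modified data to shared memory when
o Evicted by owner
o Requested by peer processor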
21
![Page 22: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/22.jpg)
Background: Cache coherence
[Figure: after the write-back, P1's L1 and shared memory both hold x=6 and P2's copy stays invalid (x=INV). Panel captions: coherent view of memory; coherent view of memory]
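The two write-back triggers can be illustrated with a toy write-invalidate model. This is a sketch under simplifying assumptions (two cores, one address, no directory, no MESI state encoding); the `read`, `write` and `evict` helpers are hypothetical names.

```python
shared = {"x": 5}                 # shared memory (or shared LLC)
l1 = {"P1": {}, "P2": {}}         # per-core L1: addr -> [value, dirty]


def read(core, addr):
    # A peer holding a modified copy writes it back before the read is served.
    for peer, cache in l1.items():
        if peer != core and addr in cache and cache[addr][1]:
            shared[addr] = cache[addr][0]
            cache[addr][1] = False
    if addr not in l1[core]:
        l1[core][addr] = [shared[addr], False]
    return l1[core][addr][0]


def write(core, addr, value):
    # Write-invalidate: every other copy is dropped before the local update.
    for peer, cache in l1.items():
        if peer != core:
            cache.pop(addr, None)
    l1[core][addr] = [value, True]


def evict(core, addr):
    value, dirty = l1[core].pop(addr)
    if dirty:
        shared[addr] = value      # write-back on eviction by the owner


read("P1", "x"); read("P2", "x")  # both cores cache x=5
write("P1", "x", 6)               # P2's copy is invalidated
stale = shared["x"]               # shared memory still holds 5
evict("P1", "x")                  # dirty eviction triggers the write-back
fresh = shared["x"]               # shared memory now holds 6
```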
22
![Page 23: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/23.jpg)
Analysis of prior work RWEEHC
Citation: S. Agarwal and H. K. Kapoor, "Restricting writes for energy-efficient hybrid cache in multi-core architectures," 2016 IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC), Tallinn, 2016, pp. 1-6
[Figure: a set in the hybrid LLC (4-way SRAM, 12-way STTRAM; TAG and DATA fields) and main memory]
• RWEEHC (Restricting writes for energy-efficient hybrid cache) uses a hybrid last level cache (LLC) composed of SRAM and STTRAM
• Exploits cache coherency to predict write intensive cache lines
• Migrates write intensive cache lines from the STTRAM region to the SRAM region
What does RWEEHC do?
23
![Page 24: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/24.jpg)
Analysis of prior work RWEEHC
• Adds extra states (STT_STATE) to predict write intensive cache blocks
• STT_STATES
o P: Dataless entry into STTRAM region
o ST-D: Possible candidate for migration to SRAM
o SR-C: Block migrated to SRAM region
• Migration is done on
o Writeback to a block in ST-D state in STTRAM region
[Figure: a set in the hybrid LLC (4-way SRAM, 12-way STTRAM) with TAG, STT_STATE and DATA fields, and main memory]
How does it work?
24
![Page 25: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/25.jpg)
Analysis of prior work RWEEHC
[Figure: miss at line ‘x’ in L1 for block ‘x’; the hybrid LLC set (4-way SRAM, 12-way STTRAM; TAG, STT_STATE and DATA fields) sits between L1 and main memory]
25
![Page 26: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/26.jpg)
Analysis of prior work RWEEHC
[Figure: miss at line ‘x’. Block ‘x’ is filled into Core 0's L1 (TAG 10100110, DATA 101111001010); the STTRAM region of the LLC set gets a dataless entry for ‘x’ (TAG 10100110, STT_STATE P, no DATA)]
26
![Page 27: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/27.jpg)
Analysis of prior work RWEEHC
[Figure: Core 1 issues Rd ‘x’ while Core 0's L1 holds a modified copy (TAG 10100110, DATA 101111011111), which causes a writeback to the LLC entry in state P]
Transition to ST-D state on writeback in P state
27
![Page 28: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/28.jpg)
Analysis of prior work RWEEHC
[Figure: a dirty eviction of block ‘x’ from Core 0's L1 (TAG 10100110, DATA 101111011111) also causes a writeback to the LLC entry in state P]
Transition to ST-D state on writeback in P state
28
![Page 29: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/29.jpg)
Analysis of prior work RWEEHC
[Figure: after the writeback, the LLC entry for ‘x’ in the STTRAM region holds TAG 10100110, STT_STATE ST-D, DATA 101111011111]
Possible candidate for migration
29
![Page 30: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/30.jpg)
Analysis of prior work RWEEHC
[Figure: a writeback to ‘x’ (DATA 101111011100) arrives while the LLC entry is in state ST-D (DATA 101111011111). The writeback is paused and the block is migrated to the SRAM region]
30
![Page 31: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/31.jpg)
Analysis of prior work RWEEHC
[Figure: block ‘x’ now resides in the SRAM region of the LLC set (TAG 10100110, STT_STATE SR-C, DATA 101111011111)]
31
![Page 32: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/32.jpg)
Analysis of prior work RWEEHC
[Figure: the writeback is resumed and updates the SRAM copy of block ‘x’ (TAG 10100110, STT_STATE SR-C, DATA 101111011100)]
SR-C is stable state
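The STT_STATE walkthrough can be summarized as a small state machine. This is a sketch of the transitions as described on the slides, not the authors' controller: the `SttState` enum and the `on_writeback` helper are illustrative names, and treating SR-C as absorbing follows the "SR-C is stable state" note.

```python
from enum import Enum


class SttState(Enum):
    P = "P"        # dataless entry in the STTRAM region
    ST_D = "ST-D"  # possible candidate for migration to SRAM
    SR_C = "SR-C"  # block migrated to the SRAM region


def on_writeback(state):
    """Return (next_state, migrate) for a writeback reaching an LLC entry."""
    if state is SttState.P:
        # First writeback fills the dataless entry: P -> ST-D.
        return SttState.ST_D, False
    if state is SttState.ST_D:
        # Second writeback: pause, migrate to SRAM, then resume the writeback.
        return SttState.SR_C, True
    # SR-C is stable: later writebacks land in SRAM without another migration.
    return SttState.SR_C, False
```

Under this reading, a block costs at most one STTRAM data write (the P to ST-D transition) before further writebacks are absorbed by SRAM.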
32
![Page 33: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/33.jpg)
Analysis of prior work RWEEHC
• Hardware overhead
o 2 bits to represent STT_STATE
o 65536 cache lines in 4MB hybrid LLC with 64B blocksize
o 66B for swap/migration buffer
o 2*65536 + 528 bits ~ 16kB additional space in LLC
o Total hardware overhead = 16kB ~ 0.39% of LLC
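The same arithmetic, spelled out in bits and bytes. Variable names are illustrative; the 528 in the slide corresponds to the 66B buffer expressed in bits.

```python
LLC_BYTES = 4 * 1024 * 1024
BLOCK_BYTES = 64

lines = LLC_BYTES // BLOCK_BYTES        # 65536 cache lines
state_bits = 2 * lines                  # 2-bit STT_STATE per line
buffer_bits = 66 * 8                    # 66B swap/migration buffer = 528 bits

total_bytes = (state_bits + buffer_bits) // 8    # 16450 B, about 16kB
overhead_pct = 100 * total_bytes / LLC_BYTES     # about 0.39
```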
Negligible hardware overhead
33
TAG STT_STATE DATA
66B swap/migration buffer
4MB hybrid LLC
16kB space for STT_STATE
![Page 34: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/34.jpg)
Analysis of prior work RWEEHC
• Performance overhead
o Dataless entries cause a high number of writebacks to LLC
o Writeback buffer gets full more often
o Hence system stalls more often
[Figure: on a miss at line ‘x’ the LLC (shared memory) gets a dataless entry, so the eviction of ‘x’ from P1's L1 (x=6) requires a write-back whether the block is clean or dirty; P2's copy is invalid (x=INV)]
34
![Page 35: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/35.jpg)
Analysis of prior work RWEEHC
Performance affected due to stalling
[Figure: after the write-back of ‘x’ (clean or dirty) on eviction from P1's L1, shared memory holds x=6]
35
![Page 36: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/36.jpg)
Introduction
Related work
Analysis of prior works
• Motivation
• Proposed SLAM framework
• Experimental setup
• Results
• Conclusion and future work
Overview
36
![Page 37: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/37.jpg)
• Use hybrid last level cache (LLC) comprised of SRAM and STTRAM
• Use existing cache block state to track eviction of dirty block from L1
• Avoid writebacks to STTRAM region of LLC due to eviction of dirty block from L1
Motivation
What does SLAM do?
37
![Page 38: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/38.jpg)
System configuration
| Parameter | Configuration |
|---|---|
| CPU | x86, 2.66GHz, 4 cores, out-of-order execution |
| L1 cache | 32kB SRAM split I/D caches; 8-way, 64B blocksize; 4-cycle read and write latency; LRU replacement policy; write-invalidate, write-back; directory-based MESI |
| L2 cache/LLC | 4MB 16-way inclusive hybrid (1MB SRAM + 3MB STTRAM); 4-way SRAM and 12-way STTRAM, 64B blocksize; 8-cycle SRAM read and write latency; 8-cycle STTRAM read latency; 32-cycle STTRAM write latency; LRU replacement policy; write-back cache |
| Simulator used | SNIPER v6.1 (multi-core, parallel, trace-driven, high-speed and accurate x86 simulator) |
| Benchmarks used | PARSEC-2.1 and SPLASH-2 |
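The cache geometry implied by this configuration can be derived as a quick consistency check. The variable names are illustrative; all inputs are the values listed in the table.

```python
BLOCK = 64                               # bytes per cache block

# L2/LLC: 4MB, 16-way (4 SRAM ways + 12 STTRAM ways)
llc_bytes = 4 * 1024 * 1024
llc_lines = llc_bytes // BLOCK           # 65536
llc_sets = llc_lines // 16               # 4096
sram_bytes = llc_sets * 4 * BLOCK        # 1MB SRAM region
sttram_bytes = llc_sets * 12 * BLOCK     # 3MB STTRAM region

# L1: 32kB, 8-way
l1_lines = 32 * 1024 // BLOCK            # 512
l1_sets = l1_lines // 8                  # 64

# STTRAM write latency relative to SRAM write latency, in cycles
write_latency_ratio = 32 / 8             # 4.0
```

The 1MB + 3MB split in the table follows directly from the 4-way and 12-way partition of each 16-way set.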
Motivation
38
![Page 39: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/39.jpg)
Sources of writes to LLC
Coherency writes constitute 60% of all the writes
Motivation
39
[Figure legend: coherency, core, prefetch]
![Page 40: SLAM: High performance and energy efficient shared hybrid ...€¦ · Prediction Assisted STT-RAM Cache Architecture," 2014 IEEE 20th International Symposium on High Performance Computer](https://reader035.vdocument.in/reader035/viewer/2022071009/5fc7364215712c58cf6087b9/html5/thumbnails/40.jpg)
Motivation
Writebacks due to eviction of dirty blocks constitute 88% of all coherency writes
40
[Figure: coherency writes, split into writebacks due to dirty eviction and writebacks due to request from another core]
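Combining the 88% figure with the 60% figure from the previous slide gives a rough share of all LLC writes that come from dirty evictions. The sketch below assumes both percentages are averages over the same set of workloads, which the slides do not state explicitly; the variable names are illustrative.

```python
# Figures quoted on the slides (assumed to be averages over the same workloads).
coherency_share_of_writes = 0.60          # coherency writes / all LLC writes
dirty_eviction_share_of_coherency = 0.88  # dirty-eviction writebacks / coherency writes

# Share of all LLC writes caused by dirty evictions from L1.
dirty_eviction_share_of_writes = (
    coherency_share_of_writes * dirty_eviction_share_of_coherency
)
print(f"{dirty_eviction_share_of_writes:.1%}")  # prints 52.8%
```

Under that assumption, roughly half of all LLC writes are candidates for avoidance.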
Can we avoid coherency writes to LLC?
• Writeback due to request from peer processor: copy is requested by another core (priority writeback)
• Writeback due to dirty eviction: copy is NOT requested by another core (NOT a priority writeback)
Motivation
41
Introduction
Related work
Analysis of prior works
Motivation
• Proposed SLAM framework
• Experimental setup
• Results
• Conclusion and future work
Overview
42
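SLAM framework
[Figure: cores C0 to C3, each with a private L1, above a shared hybrid L2/LLC; each 16-way set in the hybrid L2/LLC has a 4-way SRAM region and a 12-way STTRAM region (TAG, DATA)]
A set in L1 cache (8-way); Tag and Data columns not shown:

| Line index | State | LRU bits count |
|---|---|---|
| 0 | M | 5 |
| 1 | M | 3 |
| 2 | M | 4 |
| 3 | M | 2 |
| 4 | M | 7 |
| 5 | E | 6 |
| 6 | M | 1 |
| 7 | S | 0 |

Line 4 is LRU
• Check if writeback in STTRAM region
• Search for clean block
• Drop clean block silently
SLAM
43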
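The eviction flow on this slide (check whether the writeback lands in the STTRAM region, search for a clean block, drop it silently) can be sketched as follows. This is a minimal illustration, not the author's implementation. It assumes that a larger LRU count means less recently used (inferred from "Line 4 is LRU"), that clean candidates are tried from least to most recently used, and that the original dirty block is written back when no clean block exists. `L1Line`, `choose_victim` and the `writeback_hits_sttram` callback are hypothetical names.

```python
from dataclasses import dataclass

DIRTY_STATES = {"M"}       # Modified blocks need a writeback on eviction
CLEAN_STATES = {"E", "S"}  # Exclusive/Shared blocks can be dropped silently


@dataclass
class L1Line:
    index: int      # way index within the L1 set
    state: str      # one of "M", "E", "S", "I"
    lru_count: int  # assumed: larger value = less recently used


def choose_victim(l1_set, writeback_hits_sttram):
    """Return (victim, needs_writeback) for one L1 set.

    writeback_hits_sttram is a hypothetical callback reporting whether the
    LLC copy of a given L1 line sits in the STTRAM ways.
    """
    by_age = sorted(l1_set, key=lambda line: line.lru_count, reverse=True)
    lru = by_age[0]
    # Clean LRU block: ordinary silent eviction, nothing to avoid.
    if lru.state not in DIRTY_STATES:
        return lru, False
    # Dirty LRU block whose writeback goes to the SRAM region: write back as usual.
    if not writeback_hits_sttram(lru):
        return lru, True
    # Dirty LRU block headed for STTRAM: look for a clean block to drop instead.
    for line in by_age[1:]:
        if line.state in CLEAN_STATES:
            return line, False
    # Assumed fallback: no clean block exists, so the writeback is unavoidable.
    return lru, True


# The L1 set drawn on the slide (state and LRU count per line index).
slide_set = [L1Line(i, s, c) for i, (s, c) in enumerate(
    [("M", 5), ("M", 3), ("M", 4), ("M", 2), ("M", 7), ("E", 6), ("M", 1), ("S", 0)])]

victim, needs_writeback = choose_victim(slide_set, lambda line: True)
print(victim.index, needs_writeback)  # prints: 5 False
```

For the set shown on the slide, the dirty LRU block at line 4 is kept and the clean block at line 5 (state E) is dropped without a writeback.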
SLAM
• Hardware overhead
o 32-bit buffer for each L1 to hold address of actual LRU dirty block selected for eviction from L1
o Two 2-bit registers for each L1 to represent one cache block state from {M,E,S,I}
o Total hardware overhead = 4*32 + 4*2*2 = 144 bits = 18B
o Negligible compared to 4MB LLC
Negligible hardware overhead
44
[Figure: 4MB hybrid LLC (TAG, DATA) plus, for each of the 4 L1 caches (x4), one 32-bit address buffer and two 2-bit registers]
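The overhead arithmetic on this slide can be reproduced with a few lines. The constant names are illustrative, and treating 4MB as 4 * 1024 * 1024 bytes is an assumption.

```python
NUM_L1_CACHES = 4           # one L1 per core in the 4-core configuration
ADDRESS_BUFFER_BITS = 32    # one address buffer per L1
STATE_REGISTER_BITS = 2     # 2 bits encode one state from {M, E, S, I}
STATE_REGISTERS_PER_L1 = 2

overhead_bits = NUM_L1_CACHES * (
    ADDRESS_BUFFER_BITS + STATE_REGISTERS_PER_L1 * STATE_REGISTER_BITS
)
overhead_bytes = overhead_bits // 8
llc_bytes = 4 * 1024 * 1024  # 4MB LLC, assuming binary megabytes

print(overhead_bits, overhead_bytes)        # prints: 144 18
print(f"{overhead_bytes / llc_bytes:.6%}")  # overhead as a fraction of LLC capacity
```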
SLAM
45
• Performance overhead
o Extra access to 4MB LLC costs 8 cycles
o 1 cycle to load cache block states from L1 into buffer for comparison
o 1 cycle for performing comparison
o Clean block is searched iteratively in entire 8-way set of L1

| Case | Extra LLC access cycles | Extra execution cycles | Extra total cycles |
|---|---|---|---|
| Best case | 8 | 2 | 10 |
| Worst case | 8 | 14 | 22 |

o Best case: clean block is found in first iteration; number of cycles = 2 + 8 = 10 cycles
o Worst case: clean block is found in last iteration; number of cycles = 2*7 + 8 = 22 cycles
o Each writeback to STTRAM region needs 32 cycles (write latency of STTRAM)
o Even the worst case (22 cycles) costs less than the 32-cycle STTRAM write it avoids, hence performance of overall system is maintained
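The best-case and worst-case figures follow from a simple linear cost model, sketched below. It assumes one candidate block is examined per iteration (2 cycles: load state, compare) and that at most 7 iterations are needed because the dirty LRU block itself is excluded; `extra_cycles` is a hypothetical helper name.

```python
LLC_ACCESS_CYCLES = 8      # extra access to the LLC
CYCLES_PER_ITERATION = 2   # 1 cycle to load block state + 1 cycle to compare
STTRAM_WRITE_CYCLES = 32   # latency of the writeback being avoided
L1_WAYS = 8


def extra_cycles(iteration):
    """Extra cycles when the clean block is found on the given iteration (1-based)."""
    if not 1 <= iteration <= L1_WAYS - 1:
        raise ValueError("a clean block can only be found among the other 7 ways")
    return CYCLES_PER_ITERATION * iteration + LLC_ACCESS_CYCLES


for k in range(1, L1_WAYS):
    cost = extra_cycles(k)
    # iteration, extra cycles, cycles saved versus one STTRAM write
    print(k, cost, STTRAM_WRITE_CYCLES - cost)
```

The first row gives the 10-cycle best case and the last row the 22-cycle worst case; every row stays below the 32-cycle STTRAM write latency.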
Hardware overhead comparison
• PTHCM: 4MB hybrid LLC with 12 bits for extra fields per cache line (fields labeled WC, AC, WLC, ALC), a 163kB prediction table, and a 68B swap/migration buffer
• RWEEHC: 4MB hybrid LLC with 16kB space for STT_STATE (stored alongside TAG and DATA) and a 66B swap/migration buffer
• SLAM: 4MB hybrid LLC (TAG, DATA only) with a 32-bit address buffer and 2-bit registers per L1 (x4)
46
| LLC architecture | Capacity | Area |
|---|---|---|
| SRAM LLC | 2MB | 44.7305 mm² |
| Hybrid LLC | 4MB (1MB SRAM + 3MB STTRAM) | 44.7305 mm² |
| STTRAM LLC | 8MB | 44.7305 mm² |
SLAM
47
SRAM-STTRAM partition in hybrid LLC
• Total energy is least for the 4-12 partition (4-way SRAM and 12-way STTRAM)
• The 4-12 combination is the best fit for the selected LLC on-chip area
• Results are shown for only 4 workloads for brevity; conclusions are the same across other workloads
SLAM
48
Introduction
Related work
Analysis of prior works
Motivation
Proposed SLAM framework
• Experimental setup
• Results
• Conclusion and future work
Overview
49
Benchmark selection
• PARSEC-2.1 and SPLASH-2
• Parallel and multi-threaded
• Diverse application domain
• Large usage and exchange of shared data
| Workload | Application domain | % coherency writes |
|---|---|---|
| swaptions | Financial analysis | 68% |
| freqmine | Data mining | 68% |
| fluidanimate | Animation | 30% |
| raytrace | Graphics | 32% |
| cholesky | Sparse matrix factorization kernel | 66% |
| barnes | N-body problem (3D) | 65% |
| fmm | N-body problem (2D) | 39% |
| lu.cont | Dense matrix factorization kernel | 89% |
| fft | Blocked matrix transpose kernel | 36% |
| ocean.cont | Large-scale ocean movements | 94% |
| radix | Integer radix sort kernel | 75% |
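As a quick cross-check, the unweighted mean of the per-workload percentages above can be computed directly. Whether the 60% figure quoted in the Motivation slides was derived as a simple mean is an assumption; the dictionary name is illustrative.

```python
from statistics import mean

# Percentage of LLC writes that are coherency writes, per workload (from the table).
coherency_write_pct = {
    "swaptions": 68, "freqmine": 68, "fluidanimate": 30, "raytrace": 32,
    "cholesky": 66, "barnes": 65, "fmm": 39, "lu.cont": 89,
    "fft": 36, "ocean.cont": 94, "radix": 75,
}

average_pct = mean(list(coherency_write_pct.values()))
print(f"{average_pct:.1f}%")  # prints 60.2%
```

The result is consistent with the statement that coherency writes constitute about 60% of all writes.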
Experimental setup
50
Power and energy parameters
• Extracted from CACTI and NVSim for STTRAM
• Scaled for 45nm technology from various previous works
• Used to evaluate total LLC energy consumption

| Parameter | 2MB SRAM LLC | 8MB STTRAM LLC | 4MB Hybrid LLC (SRAM/STTRAM) |
|---|---|---|---|
| Read energy (nJ/access) | 0.3072 | 0.1484 | 0.3072/0.1484 |
| Write energy (nJ/access) | 0.3072 | 2.78 | 0.3072/2.78 |
| Static power (mW) | 3825 | 1040 | 2302.5 |
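One plausible way to turn these parameters into total LLC energy is dynamic energy per access plus static power over the runtime. The slides do not spell out the exact formula, so the sketch below is an assumption; all function and variable names are hypothetical, and no measured access counts are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionParams:
    read_nj: float   # read energy per access, nJ
    write_nj: float  # write energy per access, nJ


SRAM = RegionParams(read_nj=0.3072, write_nj=0.3072)
STTRAM = RegionParams(read_nj=0.1484, write_nj=2.78)
STATIC_MW = {"sram_2mb": 3825.0, "sttram_8mb": 1040.0, "hybrid_4mb": 2302.5}


def dynamic_energy_nj(params, reads, writes):
    """Dynamic energy of one region for the given access counts."""
    return reads * params.read_nj + writes * params.write_nj


def static_energy_nj(static_mw, runtime_s):
    """Static energy: 1 mW for 1 s is 1 mJ, i.e. 1e6 nJ."""
    return static_mw * runtime_s * 1e6


def hybrid_llc_energy_nj(sram_reads, sram_writes, stt_reads, stt_writes, runtime_s):
    """Assumed total energy model for the 4MB hybrid LLC."""
    return (dynamic_energy_nj(SRAM, sram_reads, sram_writes)
            + dynamic_energy_nj(STTRAM, stt_reads, stt_writes)
            + static_energy_nj(STATIC_MW["hybrid_4mb"], runtime_s))


# Observation: the hybrid static power equals the SRAM and STTRAM figures
# scaled linearly by capacity (1MB of 2MB SRAM, 3MB of 8MB STTRAM).
scaled_static_mw = STATIC_MW["sram_2mb"] * (1 / 2) + STATIC_MW["sttram_8mb"] * (3 / 8)
print(scaled_static_mw)  # prints 2302.5
```

In this model each avoided STTRAM write saves 2.78 nJ of dynamic energy, about 9 times the cost of an SRAM write.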
Experimental setup
51
Experimental setup
52
• Applications were run to completion on all four cores while exploiting cache coherence with detailed models of cores, caches and interconnection networks
• The number of LLC accesses was collected over the entire application runtime to evaluate:
o Minimized writes to STTRAM
o Decreased total energy consumption of LLC
• Performance is measured in terms of IPC (Instructions Per Cycle)
• Comparison of SLAM’s energy and performance with
o PTHCM [B. Quan, et al, IEEE ICCCT, 2012]
o RWEEHC [S. Agarwal, et al, IEEE VLSI-SoC, 2016]
Simulator setup
• Simulator used – SNIPER v6.1 (multi-core, parallel, trace-driven, high-speed and accurate x86 simulator)
• Used two metrics for evaluation: total LLC energy and overall system performance
Introduction
Related work
Analysis of prior works
Motivation
Proposed SLAM framework
Experimental setup
• Results
• Conclusion and future work
Overview
53
Energy evaluation for SLAM
• The use of hybrid LLC architecture saved energy compared to SRAM-only and STTRAM-only LLC architectures
• Negligible use of external hardware led to significant energy savings compared to PTHCM and RWEEHC
| Comparison architecture | Average LLC energy savings |
|---|---|
| SRAM | 18.94% |
| STTRAM | 32.31% |
| PTHCM | 38.79% |
| RWEEHC | 8.97% |
Results
54
Performance evaluation for SLAM
• SLAM outperforms SRAM-only and STTRAM-only LLC architectures by avoiding writeback operations, thus avoiding saturating the writeback buffer
• SLAM outperforms PTHCM and RWEEHC by eliminating migration/swapping between SRAM and STTRAM regions

| Comparison architecture | Average IPC improvement |
|---|---|
| SRAM | 4.631% |
| STTRAM | 0.607% |
| PTHCM | 6.863% |
| RWEEHC | 0.407% |
Results
55
Introduction
Related work
Analysis of prior works
Motivation
Proposed SLAM framework
Experimental setup
Results
• Conclusion and future work
Overview
56
Conclusion
• Designed a framework that
o Tracks writeback operations to LLC
o Avoids writeback operations to STTRAM region of LLC due to dirty eviction from L1
• Performed a comprehensive energy and performance comparison, under the same area constraint, with
o Baseline SRAM based LLC architecture
o Baseline STTRAM based LLC architecture
o PTHCM based hybrid LLC architecture
o RWEEHC based hybrid LLC architecture
• Compared to SRAM, STTRAM, PTHCM and RWEEHC
o Achieved 18.94%, 32.31%, 38.79% and 8.97% total LLC energy savings respectively
o Achieved 4.631%, 0.607%, 6.863% and 0.407% improvement in performance respectively
57
Future work
There are several potential extensions to our work, for example, consideration of:
• Three and higher level cache hierarchy where writeback operations to LLC may vary with levels of cache
• Exclusive LLC as it is populated only through writebacks due to eviction from L1
• Write-through LLC wherein writebacks due to conflict miss at L1 are part of non-idle CPU
• Lower nanometer technologies wherein writes to STTRAM are unstable because of smaller MTJ thickness
58
Thank you
59