exploring the performance implications of memory safety ... · computer architecture group...
COMPUTER ARCHITECTURE GROUP
Exploring the Performance Implications of Memory Safety
Primitives in Many-core Processors Executing Multi-threaded Workloads
Masab Ahmad, Syed Kamran Haider, Farrukh Hijaz, Marten van Dijk, Omer Khan
University of Connecticut
Agenda
• Motivation
• Characterization Methodology
• Characterization Results
• Insights and Possible Improvements
Security Vulnerabilities in Processors
[Figure: system diagram of applications, OS, CPU, DRAM, I/O, disk, and other devices, annotated with threats: vulnerable applications (buffer overflow attacks, code injection), compromised OS, privacy leakage, malicious I/Os, DoS attacks, data tampering, and side channels.]
[The diagram is repeated with two added labels: Performance and Security.]
Key Questions
• What are the performance implications of these security schemes?
• How do they affect performance in the context of multicores?
• IEEE Computer Vision 2022 predicts Multicores as a key enabling technology by 2022
• To get started, we look at Buffer Overflow Protection Schemes
• How do they affect tradeoffs in Multicores?
A Primer on Buffer Overflow Attacks
• The stack return address is located after the program contents
• An attacker enters a modified string that overwrites the return address
• The return address now points to the malicious code
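[Figure: stack layout before and after the attack. The buffer occupies addresses a … a+n and is followed by the other variables and the return address. The overflow attack overwrites the return address so that it points to the malicious code. Panels: "Attacker Comes In" and "Attacker takes Control".]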
Security Solutions and Prior Works
• Dynamic Information Flow Tracking (DIFT):
  • Flags and Tags all data from I/O
  • Raises exceptions if tagged data propagates into program control flow
  • High False positive rates, Zero false negatives (Hence not used in applications)
• Context based Checking:
  • Creates a “context”, an identifier for each variable/data structure in the program
  • Checks on each variable read/write
  • Large data structure required, 1 element for each array element. Pointer aliasing problems (Hence also not used in applications)
• Bounds Checking:
  • Creates a base and bounds Metadata data structure for a program
  • Reads and writes are checked and updated as metadata
  • Close to Zero False positives and negatives (Used a lot in Safe languages etc.)
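As a rough illustration of the base-and-bounds idea in the last bullet, the sketch below keeps one metadata record per buffer and checks every write against it. The names `bounds_meta`, `make_meta`, and `checked_store` are invented for this example; they are not SoftBound's actual interface, which works through compiler instrumentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-pointer metadata: [base, bound) is the valid range. */
typedef struct {
    uintptr_t base;   /* address of the first valid byte */
    uintptr_t bound;  /* address one past the last valid byte */
} bounds_meta;

/* Record the valid range of an object of `size` bytes starting at `obj`. */
bounds_meta make_meta(const void *obj, size_t size) {
    assert(obj != NULL);
    bounds_meta m = { (uintptr_t)obj, (uintptr_t)obj + size };
    return m;
}

/* Store one byte only if the target address lies inside [base, bound).
 * Returns 0 on success and -1 if the write would be out of bounds.
 * The address is computed on integers so that an out-of-range index
 * never forms an invalid pointer before the check. */
int checked_store(char *buf, ptrdiff_t index, char value, bounds_meta m) {
    uintptr_t addr = (uintptr_t)buf + (uintptr_t)index;
    if (addr < m.base || addr >= m.bound)
        return -1;            /* overflow attempt rejected */
    buf[index] = value;
    return 0;
}
```

For an 8-byte buffer, a store at index 7 succeeds while stores at index 8 or -1 are rejected, which is the behavior that stops an overflow from reaching a return address. Each check costs an extra metadata read and two comparisons, which is where the performance question of this talk comes from.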
Bound Checking Schemes
• Hardware based Schemes:
  • CHERI
  • HardBound
  • WatchdogLite
• Software based Schemes:
  • SoftBound
  • Cyclone
  • SafeCode
  • CCured
  • And many others
  • Safe Languages such as Python and Java
[Figure: the buffer (addresses a … a+n) is delimited by a Base and a Bound, so an overflow attack cannot write beyond the Bound into the other variables or the return address. Bounds checks in the compute pipeline cause extra memory accesses to bounds metadata held in the caches and in DRAM.]
Performance Implications in Multicores
[Figure: multiple cores, each with a compute pipeline performing bounds checks, share a cache that holds bounds metadata; extra memory accesses go to bounds metadata in DRAM. Annotated effects: compute stalls for bound checks, cache pollution and interference, increased coherency bottlenecks, synchronization bottlenecks as more time is spent in critical code sections, and reduced scalability.]
• Previous works have not analyzed memory safety schemes in the context of multicores
• All prior works use sequential benchmarks such as SPEC
Performance Implications in Multicores : Critical Code
• Critical code sections, such as atomically locked program functions, are most affected.
• Due to bounds checking stalls, a thread holds a locked section for a longer time, depriving other threads of the chance to exploit scalability.
Code Without SoftBound
int a = 1;
pthread_mutex_lock(&lock);
do_parallel_work();
pthread_mutex_unlock(&lock);
return 0;
Code With SoftBound
int a = 1;
pthread_mutex_lock(&lock);
do_parallel_work();
get_security_metadata();   /* metadata lookup added by SoftBound (illustrative name) */
softbound_checks();        /* bounds checks added by SoftBound (illustrative name) */
pthread_mutex_unlock(&lock);
return 0;
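To see why checks inside a locked region matter more as threads are added, consider a deliberately simplified timing model. It is an assumption made for illustration and is not taken from the slides: work outside the lock divides evenly across threads, while work inside the lock is fully serialized. The function names and the work units are hypothetical.

```c
#include <assert.h>

/* Simplified model (an assumption for illustration only):
 * - parallel_work divides evenly across threads,
 * - critical_work runs under a lock and is therefore serialized. */
double modeled_time(double parallel_work, double critical_work, int threads) {
    assert(threads > 0);
    return parallel_work / threads + critical_work;
}

/* Ratio of instrumented time to baseline time when the checks add
 * check_cost units of work inside the critical section. */
double modeled_slowdown(double parallel_work, double critical_work,
                        double check_cost, int threads) {
    double base = modeled_time(parallel_work, critical_work, threads);
    double with_checks = modeled_time(parallel_work,
                                      critical_work + check_cost, threads);
    return with_checks / base;
}
```

With hypothetical values of 100 units of parallel work, 10 units of locked work, and 10 units of checks, the modeled slowdown is 120/110 (about 1.09) at one thread and 30/20 = 1.5 at ten threads. Under this model the serialized checks weigh more heavily as the parallel part shrinks; real workloads also have cache and coherence effects that the model ignores.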
System Model
[Figure: executables compiled with SoftBound run on a tiled multicore processor modeled in the Graphite simulator; DRAM holds both the application data and the SoftBound metadata.]
System Parameters
• The Graphite Simulator is used to analyze Parallel Benchmarks
  • 256 Cores @ 1 GHz
  • Results for both in-order and Out-of-Order cores
  • L1-I Cache : 32 KB per core
  • L1-D Cache : 32 KB per core
  • L2 Cache : 256 KB per core
  • OOO ROB Buffer Size: 168
• An 8 Core Intel machine used as well (Trends similar to Graphite results) (Results in Paper)
• POSIX Threading Model is used
• Evaluation Includes results for 1-256 Threads
Evaluated Benchmarks
• Prefix Scan – 16M elements
  • Each Thread gets a chunk to scan, then the Master Thread reduces the scans of other threads to get the final solution
  • Barriers used to do explicit Synchronization
• Matrix Multiply – 1K x 1K
  • Each Thread Tiles a chunk of the matrix, and multiplies to get the final matrix
  • Barriers used to do explicit Synchronization
• Breadth First Search (BFS) – 1M vertices, 16M edges
  • Each Thread gets a chunk of the graph to search on
  • Atomic locks used on all graph vertices to ensure that no race conditions occur in shared vertices
• Dijkstra – 16K vertices, 134M edges
  • Checking neighboring nodes for shortest path distances parallelized among threads
  • Explicit Barriers to progress each node check
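The chunked structure of the Prefix Scan benchmark can be sketched as follows. This is a guess at the general shape rather than the benchmark's actual source: the loops over `t` stand in for threads and run one after another here, the barrier positions are marked only by comments, and the third phase (adding each chunk's offset back) is an assumption needed to make the scan complete. The function name and signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* In-place inclusive prefix sum over `n` elements using a chunked scheme.
 * `chunk_total` is scratch space with room for `threads` entries. */
void chunked_prefix_scan(long *data, size_t n, size_t threads, long *chunk_total) {
    assert(threads > 0);
    size_t chunk = (n + threads - 1) / threads;  /* ceiling division */

    /* Phase 1: each "thread" scans its own chunk independently. */
    for (size_t t = 0; t < threads; t++) {
        size_t lo = t * chunk;
        size_t hi = lo + chunk < n ? lo + chunk : n;
        long sum = 0;
        for (size_t i = lo; i < hi; i++) {
            sum += data[i];
            data[i] = sum;
        }
        chunk_total[t] = sum;
    }
    /* (a barrier would go here) */

    /* Phase 2: the master turns the chunk totals into starting offsets. */
    long running = 0;
    for (size_t t = 0; t < threads; t++) {
        long total = chunk_total[t];
        chunk_total[t] = running;
        running += total;
    }
    /* (a barrier would go here) */

    /* Phase 3: each "thread" adds its offset to every element of its chunk. */
    for (size_t t = 0; t < threads; t++) {
        size_t lo = t * chunk;
        size_t hi = lo + chunk < n ? lo + chunk : n;
        for (size_t i = lo; i < hi; i++)
            data[i] += chunk_total[t];
    }
}
```

Phases 1 and 3 touch only a thread's own chunk, so they are mostly compute plus streaming memory accesses, while phase 2 is the short serial step that the barriers protect.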
Benchmark Characterization : Prefix Scan
[Figure: normalized completion time (y-axis, 0 to 1.2) vs. thread count (1, 2, 4, 8, 16, 32, 64, 96, 128, 160, 192, 224, 256), broken down into Compute, L1Cache-L2Home, L2Home-OffChip, and Synchronization.]
• Shows Weak Scaling
Benchmark Characterization : Prefix Scan
[Figure: normalized completion time (y-axis, 0 to 1.6) vs. thread count (1 to 256), with Baseline and SoftBound bars side by side, broken down into Compute, L1Cache-L2Home, L2Home-OffChip, and Synchronization. ‘S’ marks results with SoftBound. Callout: "Concurrency Improvement!"]
• SoftBound has higher overheads due to extra compute and memory accesses
• Concurrency hides SoftBound’s overhead at high thread counts
• More Compute than Communication in this Workload
Benchmark Characterization : Matrix Multiply
[Figure: normalized completion time (y-axis, 0 to 2.5) vs. thread count (1 to 256), with Baseline and SoftBound bars side by side, broken down into Compute, L1Cache-L2Home, L2Home-OffChip, and Synchronization. An inset zooms in on 160 to 256 threads (0 to 0.1). ‘S’ marks results with SoftBound. Callout: "SoftBound Equal to Baseline".]
• Significant Overhead with SoftBound
• Memory-bound applications lead to more Metadata and security checks
• Concurrency does reduce SoftBound’s overhead at high thread counts
• However, overhead still quite significant
• One can get the Efficiency of the baseline by using additional Threads for SoftBound
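The idea of matching baseline efficiency with extra threads can be phrased as a small lookup over measured completion times. The sketch below is a hypothetical helper, not part of the study's tooling; the caller supplies the measurements, and no numbers from the charts are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Return the smallest thread count whose measured time is at or below
 * `target_time`, or -1 if no entry qualifies. `threads` and `times` are
 * parallel arrays of length `n`, sorted by increasing thread count. */
int threads_to_match(const int *threads, const double *times,
                     size_t n, double target_time) {
    assert(threads != NULL && times != NULL);
    for (size_t i = 0; i < n; i++) {
        if (times[i] <= target_time)
            return threads[i];
    }
    return -1;
}
```

Calling it with the SoftBound times and the baseline's time at some thread count gives the SoftBound thread count needed to reach the same completion time, if any measured configuration does.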
Benchmark Characterization : Breadth First Search (BFS)
[Figure: normalized completion time (y-axis, 0 to 1.6) vs. thread count (1 to 256), with Baseline and SoftBound bars side by side, broken down into Compute, L1Cache-L2Home, L2Home-OffChip, and Synchronization. An inset zooms in on 160 to 256 threads (0 to 0.2). ‘S’ marks results with SoftBound.]
• Another Graph Workload, however with higher scalability and Locality
• Concurrency helps reduce SoftBound’s overhead at high thread counts
• However, overhead still quite significant because of fine-grained synchronization in this workload
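Fine-grained synchronization in a parallel BFS typically means that every vertex visit goes through an atomic operation. The benchmark is described as using atomic locks on vertices; the sketch below substitutes a C11 compare-and-swap as a compact stand-in, so it should be read as an illustration of the pattern and not as the benchmark's code. `UNVISITED` and `try_claim` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define UNVISITED (-1)

/* Try to claim vertex `v` at BFS depth `level`.
 * Returns 1 if this caller claimed the vertex and 0 if it was already
 * claimed. Exactly one caller succeeds per vertex, even when several
 * threads race on the same one. */
int try_claim(atomic_int *depth, size_t v, int level) {
    assert(level >= 0);
    int expected = UNVISITED;
    return atomic_compare_exchange_strong(&depth[v], &expected, level);
}
```

With bounds checking enabled, each access to `depth[v]` would also carry a metadata lookup and a check, so the extra work sits directly on the synchronized path for every vertex.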
Benchmark Characterization : Dijkstra
[Figure: normalized completion time (y-axis, 0 to 2.5) vs. thread count (1 to 256), with Baseline and SoftBound bars side by side, broken down into Compute, L1Cache-L2Home, L2Home-OffChip, and Synchronization. An inset zooms in on 16 to 96 threads (0 to 0.2). ‘S’ marks results with SoftBound.]
• SoftBound results in higher on-chip and off-chip data accesses, due to large working set
• Scalability is limited due to fine-grained communication
• SoftBound checks within critical sections hurt performance
Benchmark Characterization : Summary of Slowdowns
[Figures: slowdown due to SoftBound (SoftBound’s normalized overhead, y-axis 1 to 4) vs. thread count (0 to 250), one plot per benchmark.]
PREFIX (callouts: "Baseline Scales Faster Initially", "Concurrency Improvement!")
• Concurrency helps hide SoftBound Overheads
MATMUL (callouts: "SoftBound Scales Initially", "Larger Metadata Size Inhibits Scalability")
• Concurrency helps hide SoftBound Overheads initially at ~32-64 threads
• Then worsens since metadata impacts the available local cache size
BFS (callouts: "SoftBound Scales Initially", "Larger Metadata Size Inhibits Scalability")
• Concurrency helps hide SoftBound Overheads initially at ~2-8 threads
• Then worsens since metadata impacts critical sections
DIJK (callouts: "SoftBound Scales a bit", "Cache Effects", "Larger Metadata Size Inhibits Scalability")
• Concurrency helps hide SoftBound Overheads initially at ~2-8 threads
• Then worsens since metadata impacts critical sections
Benchmark Characterization : Slowdowns in Out-of-Order Cores
[Figure: slowdown due to SoftBound (y-axis, 1 to 4) vs. thread count (0 to 250) for DIJK, PREFIX, BFS, and MATMUL.]
• All Prior Results had In-Order Cores
• Reduction in performance degradation observed
• OOOs improve critical code sections
• Latency hiding helps SoftBound scale better
Benchmark Characterization : In-Order vs. Out-of-Order
[Figure: SoftBound overhead (y-axis, 0 to 2.5) on in-order (IN) and out-of-order (OOC) cores for Prefix, BFS, Matrix Multiply, and DIJK, broken down into Compute, L1Cache-L2Home, L2Home-OffChip, and Synchronization. Callout: "OOO Improvements in Compute Bound Workloads".]
• Parallel SoftBound results for both In-Order and OOO core types
• At thread counts showing highest speedups
• OOOs cannot improve parallel performance much
• Alternative architectural improvements needed
Insights from Characterization
• Most bottlenecks in Multi-threaded SoftBound workloads stem from Memory Accesses and Synchronization
• Latency Hiding with OoO Cores improves SoftBound slightly
• An increase in thread count allows SoftBound to perform as well as the baseline does at a lower thread count
• However, additional software/architectural mechanisms are needed to further improve Performance
Potential Future Improvements
• Improve SoftBound’s Metadata structures
  • Compression
  • Better Layout (e.g. a better tree type of structure)
• Use Parallelization Strategies (e.g. Blocking) that are aware of Security Metadata’s presence
• Devise a Prefetcher for Security Metadata
  • Prefetch SoftBound metadata from DRAM
  • Hides SoftBound’s latency
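The prefetching idea can be approximated in software to show the intent, although the slide may well have a hardware prefetcher in mind. The sketch below assumes metadata lives in a separate shadow array and uses the GCC/Clang builtin `__builtin_prefetch` to request the entry needed a few iterations ahead. The `elem_meta` layout, the function name, and the look-ahead distance are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PREFETCH_DISTANCE 16  /* assumed look-ahead; would need tuning */

/* Hypothetical per-element metadata kept in a separate shadow array. */
typedef struct {
    size_t base;   /* first valid index */
    size_t bound;  /* one past the last valid index */
} elem_meta;

/* Sum the elements whose index passes its bounds check, prefetching the
 * metadata that will be needed PREFETCH_DISTANCE iterations later. */
long checked_sum(const long *data, const elem_meta *meta, size_t n) {
    assert(data != NULL && meta != NULL);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&meta[i + PREFETCH_DISTANCE], 0, 1);  /* read, low reuse */
        if (i >= meta[i].base && i < meta[i].bound)
            sum += data[i];
    }
    return sum;
}
```

The prefetch does not change the result; it only tries to have the metadata in cache by the time the check runs, so that the check's memory latency overlaps with useful work.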