Architectures for Secure Processing
Matt DeVuyst
Research Exam - Matt DeVuyst 2
Introduction
[Figure: CPU containing the pipeline and functional units, L1 instruction and data caches, L2, and L3, connected over the memory bus to main memory. An Encryption/Decryption Unit (EDU) with its keys sits at the chip boundary, which forms the line of trust; the points of attack lie outside it.]
Introduction
What kind of security? Protection of what? For whom? From whom/what?
This work focuses on:
  Protection of execution (process data and control flow)
  Protection for users, copyright holders, and software companies
  Protection from all other processes (including the OS) and from physical attack
This work focuses on general-purpose security mechanisms for general-purpose computers.
Introduction
This research takes an architecture-centric approach.
Cryptographic algorithms may be utilized, but their security will not be proven here.
Focus is given to hardware support; software and the OS reap the benefits.
Goals
Execution Privacy
  Process control flow and data are exposed only to the CPU
Execution Integrity
  Process control flow and data cannot be tampered with without detection
Outline
Execution Privacy
Execution Integrity
Proposed Architectures
Conclusions and Open Questions
Outline
Execution Privacy
  Naïve Encryption
  One Time Pad (OTP) Encryption
  Improved OTP Encryption
Execution Integrity
Proposed Architectures
Conclusions and Open Questions
Naïve Encryption
[Figure: CPU and memory connected by the memory bus, with the Encryption/Decryption Unit in between. Plaintext data on the CPU side crosses the unit as ciphertext data on the memory side, and vice versa.]
A Closer Look At the Encryption/Decryption Unit
AES in Cipher Block Chaining (CBC) Mode
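The chaining structure of CBC can be sketched as follows. This is an illustrative toy, not part of the presented design: a hash-based stand-in (not invertible, not secure) replaces AES so the example stays self-contained, but the CBC recurrence, in which each plaintext block is XORed with the previous ciphertext block before encryption, is the real one.

```python
import hashlib

BLOCK = 16  # AES block size in bytes

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Stand-in for AES: NOT a real block cipher (it is not even
    # invertible); it only serves to illustrate the chaining.
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    assert len(plaintext) % BLOCK == 0
    prev, out = iv, b""
    for i in range(0, len(plaintext), BLOCK):
        # XOR with the previous ciphertext block, then encrypt:
        # identical plaintext blocks within one message now differ.
        prev = toy_block_encrypt(key, xor(plaintext[i:i + BLOCK], prev))
        out += prev
    return out
```

Note that CBC is deterministic for a fixed key and IV: re-encrypting the same data reproduces the same ciphertext, which is exactly the patterning weakness examined on the next slides.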
Research Exam - Matt DeVuyst 10
Issues With Naïve Encryption
On the critical path → performance suffers
Not secure against all attacks
Why Naïve Encryption Is Not Secure
[Figure: plaintext and ciphertext write streams plotted over time when only the data is encrypted. The pattern of repeated values is identical in both streams.]
Why Naïve Encryption Is Not Secure
[Figure: plaintext and ciphertext streams over time when both data and address are encrypted. For repeated writes to the same address, the pattern is still identical in the ciphertext.]
Why Naïve Encryption Has Poor Performance
Stores are effectively immune to encryption latency (the store buffer hides it)
Loads that miss in the cache cost:
  Time to bring in data from memory
  Time to decrypt that data
[Figure: load instruction timeline, with memory latency followed serially by decryption latency.]
Outline
Execution Privacy
  Naïve Encryption
  One Time Pad (OTP) Encryption*
  Improved OTP Encryption
Execution Integrity
Proposed Architectures
Conclusions and Open Questions
* Suh, et al. “Efficient Memory Integrity Verification and Encryption for Secure Processors” – MIT; and Yang, et al. “Fast Secure Processor for Inhibiting Software Piracy and Tampering” – UC Riverside
How OTP Encryption/Decryption Works
[Figure: encryption and decryption datapaths. A one-time pad generated from the key, the address, and a sequence number is XORed with the plaintext to encrypt, and with the ciphertext to decrypt.]
Why OTP Encryption is Secure
[Figure: plaintext and ciphertext streams over time when the address and a sequence number are encrypted to form the pad. Even for repeated writes to the same address, no pattern is expressed in the ciphertext.]
How OTP Encryption Solves the Performance Problem
Decryption is done in parallel with the load, taking it off the critical path.
The key to how it works: decryption cannot depend on the ciphertext.
[Figure: load instruction timeline. Pad generation (the decryption latency) overlaps the memory latency; only a final XOR remains once the data arrives.]
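The scheme can be sketched in a few lines (hypothetical names; a real implementation would generate the pad with a block cipher such as AES rather than the HMAC used here for self-containment). The pad depends only on the key, the address, and the block's sequence number, never on the ciphertext, so it can be generated while the fetch is still in flight.

```python
import hmac, hashlib

BLOCK = 16

def pad_for(key: bytes, addr: int, seq: int) -> bytes:
    # Pad = keyed function of (address, sequence number) only.
    msg = addr.to_bytes(8, "big") + seq.to_bytes(8, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

class OTPMemory:
    def __init__(self, key: bytes):
        self.key, self.mem, self.seq = key, {}, {}

    def store(self, addr: int, data: bytes) -> None:
        # Each write uses a fresh sequence number, so writing the same
        # data to the same address yields different ciphertext.
        s = self.seq.get(addr, 0) + 1
        self.seq[addr] = s
        self.mem[addr] = xor(data, pad_for(self.key, addr, s))

    def load(self, addr: int) -> bytes:
        # In hardware the pad is computed in parallel with the fetch;
        # only this final XOR remains on the critical path.
        return xor(self.mem[addr], pad_for(self.key, addr, self.seq[addr]))
```

The per-write sequence number is what removes the patterns that defeated naïve encryption.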
The Achilles’ Heel of OTP Encryption
The sequence number must be available long before the memory access completes.
[Figure: load instruction timeline. Pad generation can begin only once the sequence number is available; if the sequence number arrives late, the decryption latency is exposed again before the final XOR.]
A sequence number is associated with every cache-block-sized chunk of memory → not all sequence numbers can be kept on chip.
One solution: a sequence number cache.
Outline
Execution Privacy
  Naïve Encryption
  One Time Pad (OTP) Encryption
  Improved OTP Encryption*
Execution Integrity
Proposed Architectures
Conclusions and Open Questions
* Shi, et al. “High Efficiency Counter Mode Security Architecture Via Prediction and Precomputation” – Georgia Tech
Solutions to the OTP Problem: Prediction and Precomputation
Predict the sequence number
Precompute the pad
When the memory access completes, compare the real sequence number with the predicted one:
  If they match, use the precomputed pad
  If they don’t match, compute the real pad
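The steps above can be sketched as follows (a toy model with hypothetical names; `pad_for` stands in for the hardware pad generator, and `fetch` for the memory access). Pads are precomputed for a window of sequence numbers starting at the page's root; the real sequence number that returns with the data either hits in that window or forces the pad to be computed after the fact.

```python
import hmac, hashlib

def pad_for(key: bytes, addr: int, seq: int) -> bytes:
    msg = addr.to_bytes(8, "big") + seq.to_bytes(8, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[:16]

def load_with_prediction(key, addr, root_seq, depth, fetch):
    # Precompute pads for root_seq .. root_seq + depth - 1 while the
    # memory access is (conceptually) in flight.
    precomputed = {root_seq + i: pad_for(key, addr, root_seq + i)
                   for i in range(depth)}
    ciphertext, real_seq = fetch(addr)  # access completes
    pad = precomputed.get(real_seq)
    hit = pad is not None
    if not hit:
        # Misprediction: pay the full pad-generation latency.
        pad = pad_for(key, addr, real_seq)
    return bytes(c ^ p for c, p in zip(ciphertext, pad)), hit
```

With prediction depth d, any block written fewer than d times since its root was recorded decrypts with a precomputed pad; blocks written more often fall back to the slow path, which motivates the improved predictors on the next slides.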
Research Exam - Matt DeVuyst 21
Prediction and Precomputation
[Figure: each TLB/page table entry carries a root sequence number for its page, while each cache-block-sized chunk of the page has its own real sequence number stored in memory.]
Prediction and Precomputation
[Figure: example page table entries with root sequence numbers 129145, 637432, 179966, and 343923. Initially, all of a page’s block sequence numbers are set to the page’s root sequence number (here, 343923).]
Prediction and Precomputation
[Figure: the same page after some stores. Writes increment the sequence numbers, so the blocks now hold values such as 343924, 343925, 343933, and 343935 while the page’s root remains 343923.]
Prediction and Precomputation
[Figure: on a load, prediction starts from the page’s root sequence number, 343923. While the memory access is in flight, pads are generated for sequence numbers 343923, 343924, 343925, and so on.]
Better Prediction and Precomputation
Problem: frequently updated data will have a sequence number beyond the prediction depth.
One solution:
  Reset the root sequence number
  Use a prediction history for each page
This is called “adaptive prediction”.
[Figure: each TLB/page table entry now carries a prediction history alongside its root sequence number.]
Better Prediction and Precomputation
Problem: frequently updated data will have a sequence number beyond the prediction depth.
Another solution:
  Record the past difference (diff) between the root sequence number and the real sequence number
  On a subsequent load, make predictions around root sequence number + diff
This is called “context-based” prediction.
[Figure: TLB/page table entries with root sequence numbers, plus a register holding the diff.]
Prediction and Precomputation Accuracy
“Adaptive prediction” is reported to be about 80% accurate*; “context-based prediction” is reported to be close to 100% accurate* (though this has not yet been verified by other researchers).
Cost
  A larger TLB
  A slightly larger memory footprint and bandwidth requirement
Conclusion
  Using OTP with these optimizations, decryption latency is almost completely hidden.
* Shi, et al. “High Efficiency Counter Mode Security Architecture Via Prediction and Precomputation” – Georgia Tech
Outline
Execution Privacy
Execution Integrity
  Basic Execution Integrity
  Cached Hash Trees
  Log Hashing
Proposed Architectures
Conclusions and Open Questions
Execution Integrity – Basic Idea
On a write…
  A keyed hash is taken over the data and address
  The data and the hash are stored in memory
On a read…
  The data and hash are returned from memory
  The hash is recomputed
  The computed hash is compared with the returned hash
[Figure: CPU and memory exchanging (Data, Hash(Key, Data, Address)) pairs.]
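A minimal software sketch of this slide, assuming HMAC-SHA256 as the keyed hash (the talk does not commit to a particular function): every stored block carries a tag over (key, address, data), and every load recomputes and compares it.

```python
import hmac, hashlib

class HashedMemory:
    def __init__(self, key: bytes):
        self.key, self.mem = key, {}

    def _tag(self, addr: int, data: bytes) -> bytes:
        # Keyed hash over (data, address): binds a value to its location.
        return hmac.new(self.key, addr.to_bytes(8, "big") + data,
                        hashlib.sha256).digest()

    def store(self, addr: int, data: bytes) -> None:
        self.mem[addr] = (data, self._tag(addr, data))

    def load(self, addr: int) -> bytes:
        data, tag = self.mem[addr]
        if not hmac.compare_digest(tag, self._tag(addr, data)):
            raise ValueError("integrity violation")
        return data
```

Tampered data fails the check, but restoring a stale (data, tag) pair at the same address passes it: the replay weakness discussed on the next slide.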
Security Analysis of Basic Execution Integrity
Arbitrary data cannot be introduced because:
  The hash is keyed, and
  An attacker does not know the key
Data stored at one address cannot be substituted for data stored at another address because:
  Hashing the data along with the address binds the two
But a replay attack is possible because:
  An attacker may replay stale data previously stored at the given address
Outline
Execution Privacy
Execution Integrity
  Basic Execution Integrity
  Cached Hash Trees*
  Log Hashing
Proposed Architectures
Conclusions and Open Questions
* Blum, et al. “Checking the Correctness of Memories” – UC Berkeley; Gassend, et al. “Caches and Hash Trees for Efficient Memory Integrity Verification” – MIT; Merkle, “Protocols for Public Key Cryptosystems”
Cached Hash Trees
Fundamental problem with basic hashing: hashes verified data integrity, but nothing verified the integrity of the hashes.
A solution: cached hash trees
  Keyed hashes are taken over the data
  Keyed hashes are taken over those hashes, etc.
Problem: the memory requirement of the hashes
Solution: hashes are stored in memory and cached on-chip along with data
Cached Hash Trees
How it works
  A tree is built: leaf nodes contain data, intermediate nodes are hashes
  The root hash is kept in a special register on-chip
  Hashes are only updated when necessary
[Figure: a hash tree with data blocks at the leaves, hash nodes at each intermediate level, and a single root hash at the top.]
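The tree can be sketched as follows (illustrative, with hypothetical names: unkeyed SHA-256 stands in for the keyed hashes, and the on-chip caching of nodes is omitted for brevity). Verifying a block recomputes its path up to the root, which plays the role of the on-chip register.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def build_tree(blocks):
    # Leaves are hashes of data blocks; each level hashes pairs of
    # children, ending in a single root (block count a power of two).
    levels = [[h(b) for b in blocks]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i], prev[i + 1])
                       for i in range(0, len(prev), 2)])
    return levels  # levels[-1][0] is the root, kept on-chip

def verify(blocks, index, levels):
    # Recompute the path from one block up to the trusted root,
    # using the stored sibling hashes along the way.
    node = h(blocks[index])
    for level in levels[:-1]:
        sibling = level[index ^ 1]
        node = h(node, sibling) if index % 2 == 0 else h(sibling, node)
        index //= 2
    return node == levels[-1][0]
```

Because only the root must be trusted, an attacker who modifies a data block in memory would also have to forge every hash on its path to the root.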
Cached Hash Tree Consistency
Invariant: if a node is in memory → then its parent hash is consistent with it (whether the parent hash is in the cache or in memory)
Cached Hash Tree Consistency
[Figure: a data block, its parent hash, and its grandparent hash split across cache and memory, with up-to-date and outdated hashes marked. If data is written, the hashes are not updated.]
Cached Hash Tree Consistency
[Figure: the same blocks across cache and memory. If dirty data is evicted, the parent hash in the cache is updated.]
Cached Hash Tree Consistency
[Figure: the same blocks across cache and memory. If a hash block is evicted, its parent hash in the cache is updated.]
Cached Hash Tree Consistency
[Figure: the same blocks across cache and memory. If data is loaded and its parent hash is not in the cache: 1. the parent is loaded and verified against the grandparent; 2. then the data is verified against its parent.]
Performance Analysis of Cached Hash Trees
Common case: hash nodes are in the cache
  Data evictions only require an update to a cached node
  Data loads only require one hash check against a cached node
Uncommon case: hash nodes are not in the cache
  Data evictions require hash node loads
  Data loads require hash node loads
  Passing hash nodes across the memory bus cuts into the bandwidth available for data
  Hash nodes occupy space in the cache
Outline
Execution Privacy
Execution Integrity
  Basic Execution Integrity
  Cached Hash Trees
  Log Hashing*
Proposed Architectures
Conclusions and Open Questions
* Suh, et al. “Efficient Memory Integrity Verification and Encryption for Secure Processors” – MIT
Log Hashing
Key insight
  Verification is not necessary at every load
  Verification is necessary before application results are produced
Implication
  Relax the constraint of constant, vigilant verification
Log Hashing – Incremental Multiset Hashes*
Incremental
  The keyed hash is not computed over all the data, just the additional data
Multiset
  Duplicate items are allowed
  Multiplicity of items is significant
  Order of items is not
[Figure: a hash engine maps Set 1 and Set 2 to digests that are compared for equality.]
* Clarke, et al. “Incremental Multiset Hash Functions and Their Application to Memory Integrity Checking” – MIT
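One simple construction with all three properties (an additive sketch in the spirit of the cited paper; the names are mine): hash each element with a keyed hash and keep a running sum modulo 2^256. Addition commutes, so order does not matter; adding an element twice changes the sum, so multiplicity does; and each element is folded in without touching the others, so the hash is incremental.

```python
import hashlib

MOD = 2 ** 256

def elem_hash(key: bytes, item: bytes) -> int:
    return int.from_bytes(hashlib.sha256(key + item).digest(), "big")

class MultisetHash:
    def __init__(self, key: bytes):
        self.key, self.value = key, 0

    def add(self, item: bytes) -> None:
        # Fold one more element into the running digest.
        self.value = (self.value + elem_hash(self.key, item)) % MOD
```

Two multiset hashes built this way compare equal exactly when the same multiset of elements was added to each, regardless of order.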
Research Exam - Matt DeVuyst 43
Log Hashing
Two incremental multiset hashes
  WriteHash: hashes everything evicted from the cache (written to memory)
  ReadHash: hashes everything fetched from memory
Counters are associated with memory operations, and keyed hashes are taken over (data, counter, address)
Log Hashing
Three phases of operation
Initialization
  All program data is written out to memory (hashed into WriteHash)
Run-time
  The hash of every eviction is added to WriteHash
  The hash of every fetch is added to ReadHash
Verification
  All data not in the cache is brought in (hashed into ReadHash)
  ReadHash is compared to WriteHash. If they are equal, integrity was maintained; otherwise, integrity was violated.
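The three phases can be simulated end to end (a toy model with hypothetical names, using the additive multiset-hash idea; the counter here is a global write counter stored alongside each block). WriteHash accumulates evictions, ReadHash accumulates fetches, and verification drains memory into ReadHash before comparing the two.

```python
import hashlib

MOD = 2 ** 256

def mset_hash(key: bytes, data: bytes, addr: int, counter: int) -> int:
    msg = addr.to_bytes(8, "big") + counter.to_bytes(8, "big") + data
    return int.from_bytes(hashlib.sha256(key + msg).digest(), "big")

class LogHashMemory:
    def __init__(self, key: bytes):
        self.key, self.mem = key, {}
        self.write_hash = self.read_hash = 0
        self.counter = 0

    def write_back(self, addr: int, data: bytes) -> None:
        # Eviction: store with a fresh counter, fold into WriteHash.
        self.counter += 1
        self.mem[addr] = (data, self.counter)
        self.write_hash = (self.write_hash +
                           mset_hash(self.key, data, addr, self.counter)) % MOD

    def fetch(self, addr: int) -> bytes:
        # Fetch: fold the stored (data, counter, address) into ReadHash.
        # The block now lives in the cache until written back again.
        data, ctr = self.mem.pop(addr)
        self.read_hash = (self.read_hash +
                          mset_hash(self.key, data, addr, ctr)) % MOD
        return data

    def verify(self) -> bool:
        # Bring in everything still in memory, then compare the hashes.
        for addr in list(self.mem):
            self.fetch(addr)
        return self.read_hash == self.write_hash
```

Every write is eventually matched by exactly one read of the same (data, address, counter) triple, so the two multiset hashes agree; tampering, replay, or reordering in memory breaks the equality.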
Research Exam - Matt DeVuyst 45
Log Hashing - Initialization
[Figure: initialization. Blocks written from the cache out to memory are hashed into WriteHash; ReadHash is empty.]
Log Hashing – Run-time
[Figure: run-time. Evictions from the cache are hashed into WriteHash as they are written to memory; fetches from memory are hashed into ReadHash.]
Log Hashing – Verification
[Figure: verification. The remaining contents of memory are fetched and hashed into ReadHash, and then WriteHash is compared with ReadHash for equality.]
Log Hashing – Performance Analysis
Initialization and verification are very costly
We assume initialization and verification are rare occurrences
Run-time hashing has no overhead
Loading and storing sequence numbers in memory incurs a small performance overhead and a small memory overhead
Log Hashing – Security Analysis
If data is tampered with in memory:
  ReadHash will be different from WriteHash
If data was returned from memory more times than it was written (as in a replay attack):
  The multiplicity of hashed items will not match → the hashes will not match
If data is returned from memory out of order:
  The hashes won’t match, because different counter values would have been hashed in with the data
Outline
Execution Privacy
Execution Integrity
Proposed Architectures
  XOM
  SP
  AEGIS
  SENSS
Conclusions and Open Questions
Proposed Architectures
XOM*
  First of its kind
  Uses naïve privacy and integrity mechanisms
  Slow and vulnerable to attack
  Keys for encryption and hashing are burned on chip
* Lie, et al. “Architectural Support for Copy and Tamper Resistant Software” – Stanford
Proposed Architectures
Secret-Protected (SP)*
  Based on XOM
  Uses naïve privacy and integrity mechanisms
  Decouples the secret from the device
    Keys are stored on chip only during a user session
    User keys are separate from the device secret (the hardware key) and are transferable
* Lee, et al. “Architecture for Protecting Critical Secrets in Microprocessors” – Princeton
Proposed Architectures
AEGIS*
  Uses OTP encryption for privacy, without performance optimizations like prediction and precomputation
  Uses cached hash trees for integrity
  Hides device keys using Physical Random Functions (PUFs)
    The circuit timing characteristics of a particular chip are unique and effectively impossible to measure; PUFs exploit this to create device secrets
* Suh, et al. “Design and Implementation of the AEGIS Single-Chip Secure Processor Using Physical Random Functions” – MIT
Proposed Architectures
SENSS*
  Uses a simple OTP encryption scheme, like AEGIS
  Uses a cached hash tree scheme, like AEGIS
  Adds support for multiprocessor systems
    Each device has its own key
    A combination of Cipher Block Chaining and One Time Pad mode encryption is used for cache-to-cache transfers
* Zhang, et al. “SENSS: Security Enhancement to Symmetric Shared Memory Multiprocessors” – UTD
Outline
Execution Privacy
Execution Integrity
Proposed Architectures
Conclusions and Open Questions
Conclusions – OTP
Execution privacy is solved by OTP encryption (with optimizations)
  Secure against all system-level attacks and physical attacks (outside the processor)
  Almost no performance cost
Conclusions – Cached Hash Trees
Cached hash trees are secure against all known attacks
But they have potentially poor performance
  No research has been done to stress test them
  Performance is bad when the hash tree is not in the cache → a large working set or a pathological access pattern may result in poor performance
Conclusions – Log Hashing
Log hashing is secure as long as verification is done before results are used
  How do you ensure that results are not consumed by users or other applications? (e.g. disk writes, network writes, shared memory, screen refresh, OS interrupts)
Log hashing has good performance if verification is infrequent
  But what if it’s not? How many applications require frequent verification?
Conclusions – Keys
Execution privacy and integrity require keys
  Keys must be protected, even if the OS is compromised or the device is under physical attack
How should keys be protected?
  Are Physical Random Functions really resistant to physical attack?
How should device public keys be used?
  Should the manufacturer publish them?
  How should revocation work?
  What happens if ownership of the device is transferred?
Cached Hash Tree Consistency
[Figure: the same blocks across cache and memory as in the earlier consistency slides. If dirty data is evicted and its parent hash is not in the cache: 1. the parent is loaded and verified against the grandparent; 2. then the parent is updated.]