Architecture Support for Secure Computing
Mikel Bezdek, Chun Yee Yu
CprE 585 Survey Project
12/10/04
Presentation Outline
Motivation
Assumptions
Attacks
Proposed Solutions
Pending questions and future research
Motivation
Currently, piracy of software and digital media is a huge problem
Software-only attempts to solve it have proven easy to foil
Adding support at the hardware level is a promising solution
Assumptions
All solutions assume the processor and on-chip storage are secure
The operating system and all peripherals, including off-chip memory, are untrusted
[Figure: Processor, OS, Memory, I/O Devices]
Points of Attack
Because memory is untrusted, attacks can occur on any transfer to or from external memory
Because the OS is untrusted, attacks can occur at context switches, when the OS takes control of operation
Memory Attacks
Adversaries may try to gain information from unprotected off-chip memory by:
Modifying data
Spoofing, Splicing, and Replay Attacks
Monitoring data access pattern (address bus)
Solutions
Basic XOM architecture
XOM using One Time Pad Encryption
Hash Trees
Aegis Processor
HIDE Architecture
XOM (Execute-Only Memory) Tamper Resistant Software
Software is encrypted using symmetric encryption; its key is encrypted using asymmetric encryption
Asymmetric Encryption - public key used by vendor, private key used by XOM chip
Symmetric Encryption - the symmetric key is unique to each program, also called the XOM ID
Secured Computing
Enforces access restrictions using tagged and encrypted storage
Encrypted code execution using on-chip decryption
XOM Internal Security
L2 cache lines are tagged with a XOM ID, with valid bits for each word in the cache line
L1 cache lines are tagged with a XOM ID
Registers are tagged with a XOM ID
XOM ID is kept in a table in the XOM chip
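The tagging rule above can be illustrated with a minimal Python sketch. It models only registers, and the class names, the `None` value for an untagged register, and the use of `PermissionError` are assumptions made for illustration, not details of the XOM hardware.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaggedRegister:
    value: int = 0
    owner: Optional[int] = None  # XOM ID of the last writer; None = untagged (assumed)


class TaggedRegisterFile:
    """Toy register file in which every register carries a XOM ID tag."""

    def __init__(self, count=8):
        self.regs = [TaggedRegister() for _ in range(count)]

    def write(self, active_id, index, value):
        # A write re-tags the register with the XOM ID of the running program.
        self.regs[index] = TaggedRegister(value, active_id)

    def read(self, active_id, index):
        reg = self.regs[index]
        # A read succeeds only when the tag matches the active XOM ID.
        if reg.owner != active_id:
            raise PermissionError("register is tagged with a different XOM ID")
        return reg.value
```

In this sketch a program running under XOM ID 7 can read back what it wrote, while a read of the same register under any other XOM ID raises an error.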
XOM Context Switches
Involves 4 special registers:
Data register - Data is packaged into movable (by the interrupting application), read-write protected data. A mutating key and the XOM ID are used for packaging.
Hash registers (2) - a 128-bit hash is made from the package and stored in two 64-bit registers
XOM ID register - stores the XOM tag
XOM and External Memory
Encrypts data with XOM ID and creates a hash (MAC)
Message Authentication Code – a keyed one-way hash, protects against spoofing and splicing attacks
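A minimal Python sketch of why a keyed hash helps, assuming HMAC-SHA256 as a stand-in for the MAC and assuming the address is included in the hashed message (the slides do not specify either). The function names `seal` and `verify` are hypothetical.

```python
import hashlib
import hmac


def seal(key, address, data):
    """Return a MAC over (address, data), binding the data to its location."""
    message = address.to_bytes(8, "big") + data
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key, address, data, tag):
    """Check a block read back from untrusted memory against its stored MAC."""
    return hmac.compare_digest(seal(key, address, data), tag)


key = b"per-program-key!"
tag = seal(key, 0x40, b"data")
spoofed_ok = verify(key, 0x40, b"evil", tag)   # attacker substitutes other data
spliced_ok = verify(key, 0x80, b"data", tag)   # attacker moves a valid block elsewhere
```

Both checks fail in this sketch: spoofed data does not match the tag, and a spliced block fails because the address is part of the message. An old (data, tag) pair replayed at the same address would still pass, which is the gap that hash trees address.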
XOM Performance Issues
Optimizations:
Use a reversible CRC instead of MAC
Dedicated, pipelined DES encryption/decryption hardware
Max of 50% slowdown assuming a 48-cycle Triple DES implementation and 100-cycle memory access latency
XOM with One-Time Pad
Average XOM slowdown is 16.7% on SPEC 2000 benchmarks
Around 30% slowdown on memory intensive programs
One-Time Pad encryption can be used to remove encryption/decryption from critical path
XOM with OTP
Proposed OTP solution
Cipher = plain ⊕ encrypted_key(address + seq)
Plain = cipher ⊕ encrypted_key(address + seq)
key = XOM ID
address = virtual address of data/instruction
seq = mutating sequence number
Computing encrypted_key(address + seq) is concurrent with memory access
Encryption/decryption requires a one cycle XOR operation
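The pad scheme can be sketched in Python as follows. This is an illustration under stated assumptions: HMAC-SHA256 stands in for the hardware block cipher, "address + seq" is treated as concatenation, and the names `make_pad`, `encrypt`, and `decrypt` are hypothetical.

```python
import hashlib
import hmac

PAD_BYTES = 32  # size of one pad in this sketch (one SHA-256 output)


def make_pad(key, address, seq):
    """Stand-in for encrypted_key(address + seq); depends only on key, address, seq."""
    message = address.to_bytes(8, "big") + seq.to_bytes(8, "big")
    return hmac.new(key, message, hashlib.sha256).digest()


def encrypt(key, address, seq, plain):
    if len(plain) > PAD_BYTES:
        raise ValueError("block is longer than one pad")
    pad = make_pad(key, address, seq)
    # The only step that touches the data is a XOR with the precomputed pad.
    return bytes(p ^ k for p, k in zip(plain, pad))


# XOR is its own inverse, so decryption is the same operation.
decrypt = encrypt
```

Because the pad does not depend on the data, it can be computed while the memory access is still in flight, leaving only the XOR on the critical path. Changing `seq` on each write-back gives a fresh pad for the same address.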
XOM with OTP
Sequence Number Cache (SNC)
Stores sequence numbers for each cache line
Accessed by virtual address of cache line
Limited size
Use replacement – store parts of SNC in unsecured memory
No replacement – OTP on some data, can’t use OTP on rest of data
XOM with OTP
Sequence Number Cache operation
Hits – sequence number is accessed and passed on to the encryption unit
Misses
No replacement – default back to original XOM, where encryption is performed after memory access. Costs 100 + 50 cycles
With replacement – fetch sequence number from memory, then perform encryption
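A rough Python model of the no-replacement policy, offered as a sketch only. The 100-cycle memory access, 50-cycle encryption, and one-cycle XOR come from the slides; the class, the `try_insert` method, and the hit-latency formula (pad generation fully overlapped with the memory access) are assumptions.

```python
MEM_CYCLES = 100  # memory access latency (from the slides)
ENC_CYCLES = 50   # encryption latency (from the slides)
XOR_CYCLES = 1    # one-cycle XOR with the pad (from the slides)


class SequenceNumberCache:
    """No-replacement SNC: once full, further cache lines are simply not tracked."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # virtual line address -> sequence number

    def try_insert(self, line_addr, seq=0):
        if line_addr in self.entries or len(self.entries) < self.capacity:
            self.entries[line_addr] = seq
            return True
        return False  # no room: this line falls back to original XOM encryption

    def read_latency(self, line_addr):
        if line_addr in self.entries:
            # Hit: the pad is generated while memory is accessed, then XORed.
            return max(MEM_CYCLES, ENC_CYCLES) + XOR_CYCLES
        # Miss: decrypt only after the data arrives, as in original XOM.
        return MEM_CYCLES + ENC_CYCLES
```

Under these assumptions a hit costs 101 cycles and a miss costs 150, which is why the hit rate of the SNC dominates the overall slowdown.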
XOM with OTP
SNC and Context Switching
Dump to memory with encryption
Tag SNC entries with XOM ID
XOM with OTP
Performance
16.7% XOM average slowdown
4.59% XOM w/ OTP – No Replacement
1.28% XOM w/ OTP – With Replacement
1.035% max additional memory traffic
Hash Trees
Memory Integrity Verification
Allows the secure processor to ensure that the data it reads from memory matches the data most recently written
Protection
Spoofing
Splicing
Replay
Hash Tree - Details
Works by calculating a hash of data
Hash is easy to compute given data, but hard to find data which will result in an equivalent hash
[Figure: hash tree. Data blocks are hashed (H) in layers up to a single root hash kept in secure storage]
Hash Tree - Details
Calculated when accessing memory
No need to calculate a hash for a cache hit
Data can be given speculatively to the processor while the hash is generated and checked
Speculative commits
Allowed using fetched but unverified data
An exception raised by the hash checker does not need to be recovered from
Stalls on the hash checker when using the processor’s secret key
Simulations show that with caching of hashes an average overhead of less than 20% can be achieved
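A minimal Python sketch of a hash tree check, assuming SHA-256 as the hash, a binary tree, and a power-of-two number of blocks (none of which the slides specify). The function names are hypothetical. Only the root is assumed to be held on chip; every other hash may live in untrusted memory.

```python
import hashlib


def h(data):
    return hashlib.sha256(data).digest()


def build_tree(blocks):
    """Return the tree as a list of levels, leaf hashes first, root level last."""
    if not blocks or len(blocks) & (len(blocks) - 1):
        raise ValueError("this sketch needs a power-of-two number of blocks")
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def verify_block(block, index, levels, trusted_root):
    """Recompute the path from one block to the root using untrusted sibling hashes."""
    node = h(block)
    for level in levels[:-1]:
        sibling = level[index ^ 1]
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == trusted_root


old_levels = build_tree([b"a", b"b", b"c", b"d"])
new_levels = build_tree([b"a", b"b", b"new", b"d"])   # block 2 is rewritten
trusted_root = new_levels[-1][0]                       # on-chip root is updated
replay_accepted = verify_block(b"c", 2, old_levels, trusted_root)
```

Here `replay_accepted` is `False`: an attacker who restores the old block together with all of its old hashes still cannot reproduce the current on-chip root, which is how the tree covers replay in addition to spoofing and splicing.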
Aegis Architecture
Uses concepts from XOM and hash trees to create a “private and authenticated tamper-resistant environment” for the processor to run in
This means that data is private from any observers and that any tampering will be detected
Aegis Architecture
Allows a user to trust the results from a program
System Authentication
Program Authentication
Message Authentication
This is accomplished by the sign_msg instruction, which encrypts a message and a hash of the program with the processor’s secret key before sending it back to the user
Aegis Architecture
To provide this environment, 3 key things must be done:
Memory Integrity Verification
Encryption/Decryption of off-chip memory
Context Switches managed securely
Aegis – Memory Integrity Verification
Accomplished using hash trees
Introduces new twist on hash trees, log hash
In log hash, only memory accesses leading up to a sign_msg instruction are verified
Greatly reduces cost of verification while not sacrificing much security
Aegis – Off chip memory
Data stored in the off-chip memory is encrypted and decrypted using the one-time pad XOM scheme to hide latency
Pads are generated using the address of the data combined with a time stamp, incremented at every write-back
Time stamps are needed before calculation of pad can begin, so caching of timestamps is a good idea
Aegis – secure context switches
Uses a Secure Context Manager
Maintains a table of all processes
Table entry contains: secure process ID (SPID), program hash, register values, and hash for off-chip memory verification
Table stored in memory, but can be cached for recent processes
In addition, cache entries are tagged with SPID to ensure a process cannot gain access to another process’s data
Aegis - Overhead
Overhead of SCM is negligible; the main slowdown comes from integrity verification and encryption of memory
Using l-hashes and OTP encryption, the authors saw an average overhead of < 25%, with a worst case of 55% across the tested benchmarks
HIDE - Motivation
Addresses the problem of secure information leaking due to monitoring of the address bus
Access patterns reveal information about branching
Can be compared with known branching patterns to identify IP reused in secure process
HIDE – Critical Idea
Addresses from the processor are remapped before being sent to memory
Mapping is done using a permutation function to ensure a random mapping
Current mapping (permutation vector) must be stored on chip
HIDE - Implementation
To ensure that attackers cannot see patterns in memory accesses, each address may be accessed in memory at most once under the current pv
Implemented with locking cache blocks
HIDE – Hide Cache
Modified L2 cache
Cache hits (R and W) unmodified
When a block is loaded on a cache miss, it is locked
A locked block cannot be replaced
When all blocks are locked, permutation must be done, which unlocks all blocks
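The locking rule can be sketched in Python as follows. This is a simplified, fully associative model; the class name, the victim choice (first unlocked block), and the omission of write-backs are assumptions for illustration.

```python
class HideCache:
    """Toy model of the locking rule in a fully associative cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = {}       # resident block address -> locked flag
        self.permutations = 0  # number of permutations triggered so far

    def access(self, addr):
        if addr in self.blocks:
            return "hit"                      # hits behave as in a normal cache
        if len(self.blocks) == self.capacity:
            if all(self.blocks.values()):
                self._permute()               # every block is locked
            victim = next(a for a, locked in self.blocks.items() if not locked)
            del self.blocks[victim]           # only unlocked blocks are replaced
        self.blocks[addr] = True              # a block loaded on a miss is locked
        return "miss"

    def _permute(self):
        # Memory would be re-permuted here; afterwards all blocks are unlocked.
        self.permutations += 1
        self.blocks = dict.fromkeys(self.blocks, False)
```

Since a block fetched under the current pv stays locked in the cache until the next permutation, it cannot be evicted and fetched again, so no address appears twice on the bus between permutations in this model.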
HIDE – Permutation Steps
A new pv is created mapping set of all current memory addresses to new addresses
Blocks are loaded sequentially from memory and stored in their new location (pv[i]) in an on-chip buffer
Buffer is written back sequentially to memory
If on-chip buffer size S is less than memory size M, process must be repeated M/S times
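One way to read the steps above is sketched below in Python. It is an interpretation, not the authors' implementation: it assumes M is a multiple of S, that each pass scans memory sequentially and keeps only the blocks whose new location falls in the current S-sized window, and that the permuted image is written to a separate destination. `random.Random` is used only to keep the sketch reproducible; it is not a secure source of randomness.

```python
import random


def permute_memory(memory, buffer_size, rng):
    """Return (pv, new_memory) such that new_memory[pv[i]] == memory[i]."""
    m = len(memory)
    if buffer_size <= 0 or m % buffer_size:
        raise ValueError("this sketch assumes M is a positive multiple of S")
    pv = list(range(m))
    rng.shuffle(pv)                            # new permutation vector
    new_memory = [None] * m
    for start in range(0, m, buffer_size):     # M/S passes in total
        buffer = [None] * buffer_size          # on-chip buffer of S blocks
        for i, block in enumerate(memory):     # sequential read of memory
            dest = pv[i]
            if start <= dest < start + buffer_size:
                buffer[dest - start] = block   # place block at its new location
        new_memory[start:start + buffer_size] = buffer  # sequential write-back
    return pv, new_memory


memory = ["blk%d" % i for i in range(8)]
pv, permuted = permute_memory(memory, buffer_size=4, rng=random.Random(0))
```

Both the reads and the write-backs are sequential, so the bus traffic during permutation does not depend on pv. With 8 blocks and a 4-block buffer, the sketch makes M/S = 2 passes.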
HIDE - Improvements
Since permutation is a lengthy operation, don’t want to wait until all cache blocks are locked
Idea of pre-permutation – start permutation when half of cache blocks are locked
HIDE - Improvements
Instead of permuting entire memory at once, permute chunks at a time
Chunk size is one or more pages
Memory accesses within a chunk preserve security; only accesses across chunks leak information. Reduce by:
Larger chunk size
Store code to minimize inter-chunk access
Requires maintaining info about each page
HIDE - Results
Simulated using a superscalar processor model on SPEC2K benchmarks
Average slowdown was only 1.3%
Memory bandwidth used was on average 9% of total
HIDE - Conclusions
Provides a high level of security without imposing much loss in performance
Requires slight modification to L2 cache, addition of permutation hardware
Will not work for multiprocessor systems, since the pv and locking info must be communicated on unsecured bus
In Summary
Supporting software security with hardware is a developing field
Assumes a basic model of a secure processor holding the private half of a public-private key pair
XOM with OTP keeps memory private, hashes ensure memory is tamper free, and permutation scheme can be used to secure address bus
When combined, allows users to trust results from a secure processor and software developers to create copy-proof software
Pending Questions
Will users accept performance losses in order to gain security?
Will vendors support secure processing?
Problems relating to secret (private) key stored on processor