
Page 1: Architectures for Secure Processing Matt DeVuyst

Architectures for Secure Processing

Matt DeVuyst

Page 2: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 2

Introduction

[Figure: system block diagram showing the memory hierarchy (L1 instruction and data caches, L2 and L3 caches, main memory), the CPU pipeline and functional units, the memory bus, and an encryption/decryption unit (EDU) with its keys. The line of trust is drawn around the processor; the points of attack lie outside it, on the memory bus and in main memory.]

Page 3: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 3

Introduction

What kind of security? Protection of what? For whom? From whom/what?

This work focuses on:
- Protection of execution (process data and control flow)
- Protection for users, copyright holders, and software companies
- Protection from all other processes (including the OS) and from physical attack

This work focuses on general-purpose security mechanisms for general-purpose computers.

Page 4: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 4

Introduction

This research takes an architecture-centric approach:
- Cryptographic algorithms may be utilized, but they will not be proven
- Focus is given to hardware support; software and the OS reap the benefits

Page 5: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 5

Goals

- Execution privacy: process control flow and data are exposed only to the CPU
- Execution integrity: process control flow and data cannot be tampered with without detection

Page 6: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 6

Outline

- Execution Privacy
- Execution Integrity
- Proposed Architectures
- Conclusions and Open Questions

Page 7: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 7

Outline

- Execution Privacy
  - Naïve Encryption
  - One Time Pad (OTP) Encryption
  - Improved OTP Encryption
- Execution Integrity
- Proposed Architectures
- Conclusions and Open Questions

Page 8: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 8

Naïve Encryption

[Figure: the encryption/decryption unit sits between the CPU and the memory bus; data is plaintext on the CPU side and ciphertext on the memory side.]

Page 9: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 9

A Closer Look At the Encryption/Decryption Unit

AES in Cipher Block Chaining (CBC) Mode
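As a hedged illustration of what such an encryption/decryption unit computes, here is a minimal software sketch of AES in CBC mode over one cache line, using the Python `cryptography` package. The 128-bit key, random IV, and 64-byte line size are illustrative assumptions, not details from the talk.

```python
# Minimal sketch of AES-CBC over one cache line. Key, IV, and the 64-byte
# line size are illustrative assumptions.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(16)          # AES-128 key held on-chip
iv = os.urandom(16)           # CBC initialization vector
cache_line = os.urandom(64)   # one 64-byte cache line of plaintext

def encrypt_line(plaintext: bytes) -> bytes:
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return enc.update(plaintext) + enc.finalize()

def decrypt_line(ciphertext: bytes) -> bytes:
    dec = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    return dec.update(ciphertext) + dec.finalize()

ciphertext = encrypt_line(cache_line)          # written to memory
assert decrypt_line(ciphertext) == cache_line  # recovered on a load
```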

Page 10: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 10

Issues With Naïve Encryption

- On the critical path → performance suffers
- Not secure against all attacks

Page 11: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 11

Why Naïve Encryption Is Not Secure

[Figure: a timeline of plaintext writes next to the corresponding ciphertext writes when only the data is encrypted; the write pattern in the ciphertext is identical to the pattern in the plaintext.]

Page 12: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 12

Why Naïve Encryption Is Not Secure

[Figure: the same timeline when both data and address are encrypted and the same value is repeatedly written to the same address; the pattern is still identical, because the ciphertext is a deterministic function of the data and the address.]
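The leak can be demonstrated in a few lines: if the ciphertext is a deterministic function of the key, the address, and the data, then two writes of the same value to the same address look identical on the bus. This is a hedged sketch; the address-derived IV below is an assumption standing in for "encrypt data/address", not the scheme from the talk.

```python
# Repeated writes of the same value to the same address produce identical
# ciphertext when encryption is deterministic. Address-derived IV is an
# illustrative assumption.
import os, hashlib
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(16)

def store(address: int, line: bytes) -> bytes:
    iv = hashlib.sha256(address.to_bytes(8, "little")).digest()[:16]
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return enc.update(line) + enc.finalize()

line = b"secret counter=7".ljust(64, b"\0")
first = store(0x1000, line)
second = store(0x1000, line)   # same address, same data, some time later
print(first == second)         # True: an observer on the bus sees the repetition
```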

Page 13: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 13

Why Naïve Encryption Has Poor Performance

- Stores are effectively immune to encryption latency (the store buffer hides it)
- Loads that miss in the cache cost:
  - Time to bring in the data from memory
  - Time to decrypt that data

[Figure: timeline of a load instruction in which the memory latency is followed serially by the decryption latency.]
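For a feel of the cost, here is a back-of-the-envelope comparison with assumed cycle counts (the talk gives no numbers); it contrasts the serialized naïve case with the overlapped OTP case described on the next slides.

```python
# Illustrative miss latencies only; all cycle counts below are assumptions.
T_MEM = 100   # cycles to fetch a cache block from memory
T_DEC = 80    # cycles of AES decryption latency
T_XOR = 1     # cycles to XOR a precomputed pad into the returned block

naive_miss = T_MEM + T_DEC              # decryption serialized after the fetch
otp_miss = max(T_MEM, T_DEC) + T_XOR    # pad generated while the fetch is in flight
print(naive_miss, otp_miss)             # 180 vs. 101 cycles in this example
```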

Page 14: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 14

Outline

- Execution Privacy
  - Naïve Encryption
  - One Time Pad (OTP) Encryption*
  - Improved OTP Encryption
- Execution Integrity
- Proposed Architectures
- Conclusions and Open Questions

* Suh, et al. “Efficient Memory Integrity Verification and Encryption for Secure Processors” – MIT; and Yang, et al. “Fast Secure Processor for Inhibiting Software Piracy and Tampering” – UC Riverside

Page 15: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 15

How OTP Encryption/Decryption Works

[Figure: the OTP encryption and decryption data paths.]
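A minimal sketch of the OTP idea in Python (counter-mode style, using the `cryptography` package): the pad depends only on the on-chip key, the block address, and the block's sequence number, never on the ciphertext, so it can be generated while the data is still in flight. The field widths and the exact CTR construction are illustrative assumptions.

```python
# OTP-style encryption: pad = AES(key, address || sequence number),
# ciphertext = plaintext XOR pad. Field widths are assumptions.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(16)

def pad(address: int, seq: int, nbytes: int = 64) -> bytes:
    # AES-CTR over a nonce built from (address, sequence number) produces a
    # pad that never depends on the ciphertext itself.
    nonce = address.to_bytes(8, "little") + seq.to_bytes(8, "little")
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    return enc.update(bytes(nbytes)) + enc.finalize()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

plaintext = os.urandom(64)
seq = 343923                                    # incremented on every write
ciphertext = xor(plaintext, pad(0x2000, seq))   # store path
recovered = xor(ciphertext, pad(0x2000, seq))   # load path: pad can be ready early
assert recovered == plaintext
```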

Page 16: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 16

Why OTP Encryption is Secure

[Figure: the same write timeline when the pad is generated from the address and a sequence number; repeated writes to the same address produce different ciphertext, so no pattern is expressed.]

Page 17: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 17

How OTP Encryption Solves the Performance Problem

- Decryption is done in parallel with the load, taking it off the critical path
- The key to how it works: the pad computation cannot depend on the ciphertext

[Figure: timeline of a load instruction in which the decryption (pad generation) latency overlaps the memory latency, leaving only a final XOR after the data returns.]

Page 18: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 18

The Achilles’ Heel of OTP Encryption

- The sequence number must be available long before the memory access completes

[Figure: load timeline showing that pad generation must start early, so the sequence number must be available near the start of the memory access.]

- A sequence number is associated with every cache-block-sized chunk of memory → cannot keep all sequence numbers on chip
- One solution: a sequence number cache (sketched below)
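A hedged sketch of what such a sequence-number cache might look like: a small, bounded LRU map from block address to sequence number, backed by the full table in memory. The capacity, widths, and write-back policy are assumptions, not details of any particular proposal.

```python
# Toy sequence-number cache; organization and policies are assumed.
from collections import OrderedDict

class SeqNumCache:
    def __init__(self, capacity: int, backing_store: dict):
        self.capacity = capacity
        self.entries = OrderedDict()      # block address -> sequence number
        self.backing = backing_store      # "in memory": the full table

    def lookup(self, block_addr: int) -> int:
        if block_addr in self.entries:            # hit: pad generation can
            self.entries.move_to_end(block_addr)  # start before the data returns
            return self.entries[block_addr]
        seq = self.backing.get(block_addr, 0)     # miss: extra memory access
        self._insert(block_addr, seq)
        return seq

    def _insert(self, block_addr: int, seq: int):
        self.entries[block_addr] = seq
        self.entries.move_to_end(block_addr)
        if len(self.entries) > self.capacity:
            victim, vseq = self.entries.popitem(last=False)
            self.backing[victim] = vseq           # write back on eviction

cache = SeqNumCache(capacity=4, backing_store={0x2000: 343923})
print(cache.lookup(0x2000))   # 343923 (miss, then cached for later loads)
```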

Page 19: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 19

Outline

- Execution Privacy
  - Naïve Encryption
  - One Time Pad (OTP) Encryption
  - Improved OTP Encryption*
- Execution Integrity
- Proposed Architectures
- Conclusions and Open Questions

* Shi, et al. “High Efficiency Counter Mode Security Architecture Via Prediction and Precomputation” – Georgia Tech

Page 20: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 20

Solutions to the OTP Problem: Prediction and Precomputation

- Predict the sequence number
- Precompute the pad
- When the memory access completes, compare the real sequence number with the predicted one
  - If they match, use the precomputed pad
  - If they don't match, compute the real pad
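A sketch of this load path, purely for illustration: pads are precomputed for a few candidate sequence numbers while the fetch is outstanding. The callbacks `fetch_from_memory`, `gen_pad`, and `xor`, and the prediction depth, are hypothetical placeholders.

```python
# Prediction + precomputation, modeled with placeholder callbacks.
PREDICTION_DEPTH = 4   # assumed number of pads generated per load

def load_block(address, root_seq, fetch_from_memory, gen_pad, xor):
    # Precompute pads for root_seq, root_seq+1, ... while the fetch is in flight.
    precomputed = {root_seq + i: gen_pad(address, root_seq + i)
                   for i in range(PREDICTION_DEPTH)}
    ciphertext, real_seq = fetch_from_memory(address)   # data plus its real seq #
    if real_seq in precomputed:                  # prediction hit: pad already ready
        return xor(ciphertext, precomputed[real_seq])
    return xor(ciphertext, gen_pad(address, real_seq))  # miss: pay full pad latency
```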

Page 21: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 21

Prediction and Precomputation

[Figure: the TLB/page table entry for each page holds a root sequence number; every cache-block-sized chunk of the page has its own real sequence number in memory.]

Page 22: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 22

Prediction and Precomputation

[Figure: worked example with root sequence number 343923. Initially, all of the page's block sequence numbers are set to the page's root sequence number (343923); the TLB also holds the root sequence numbers of other pages (e.g., 129145, 637432, 179966).]

Page 23: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 23

Prediction and Precomputation

[Figure: after some writes, the blocks of that page hold sequence numbers such as 343924, 343925, 343933, and 343935; writes increment a block's sequence number.]

Page 24: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 24

Prediction and Precomputation

[Figure: on a load, prediction starts from the page's root sequence number (343923); while the memory access is in flight, pads are generated for sequence numbers 343923, 343924, 343925, and so on, so a block whose real sequence number falls in this window can be decrypted as soon as the data arrives.]

Page 25: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 25

Better Prediction and Precomputation

- Problem: frequently updated data will have sequence numbers beyond the prediction depth
- One solution:
  - Reset the root sequence number
  - Use a prediction history for each page
  - This is called "adaptive prediction"

[Figure: each TLB/page table entry is extended with a per-page prediction history alongside the root sequence number.]

Page 26: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 26

Better Prediction and Precomputation

- Problem: frequently updated data will have sequence numbers beyond the prediction depth
- Another solution:
  - Record the past difference (diff) between the root sequence number and the real sequence number
  - On a subsequent load, make predictions around root sequence number + diff
  - This is called "context-based" prediction (sketched below)

[Figure: the TLB/page table entry holds the root sequence number; the diff is kept in a register.]
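A sketch of the context-based variant, again purely illustrative: a single "diff register" remembers how far the last real sequence number was from the page's root, and the next prediction window is centered there.

```python
# Illustrative only: one global "diff register"; real hardware would track
# this per context, and the talk does not specify the exact organization.
last_diff = 0

def predict_candidates(root_seq: int, depth: int = 4):
    base = root_seq + last_diff          # center the window on root + diff
    return [base + i for i in range(depth)]

def record_outcome(root_seq: int, real_seq: int):
    global last_diff
    last_diff = real_seq - root_seq      # remembered for the next load
```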

Page 27: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 27

Prediction and Precomputation Accuracy

- Accuracy
  - "Adaptive prediction" is reported to be about 80% accurate*
  - "Context-based prediction" is reported to be close to 100% accurate* (though this has not yet been verified by other researchers)
- Cost
  - Larger TLB
  - Slightly larger memory footprint and bandwidth requirement
- Conclusion
  - Using OTP with these optimizations, decryption latency is almost completely hidden

* Shi, et al. “High Efficiency Counter Mode Security Architecture Via Prediction and Precomputation” – Georgia Tech

Page 28: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 28

Outline

- Execution Privacy
- Execution Integrity
  - Basic Execution Integrity
  - Cached Hash Trees
  - Log Hashing
- Proposed Architectures
- Conclusions and Open Questions

Page 29: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 29

Execution Integrity – Basic Idea

- On a write…
  - A keyed hash is taken over the data and address
  - The data and hash are stored in memory
- On a read…
  - The data and hash are returned from memory
  - The hash is recomputed
  - The computed hash is compared with the returned hash

[Figure: on a write the CPU sends Data and Hash(Key, Data, Address) to memory; on a read both come back and the hash is recomputed and compared.]
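A minimal sketch of this write/read check using HMAC-SHA-256 as the keyed hash (the talk does not name a specific hash function, so that choice and the 8-byte address width are assumptions):

```python
# Basic integrity: store a keyed hash of (data, address) with the data,
# recompute and compare on every load.
import hmac, hashlib, os

key = os.urandom(32)
memory = {}   # address -> (data, tag), standing in for external DRAM

def tag(address: int, data: bytes) -> bytes:
    return hmac.new(key, address.to_bytes(8, "little") + data,
                    hashlib.sha256).digest()

def secure_store(address: int, data: bytes):
    memory[address] = (data, tag(address, data))

def secure_load(address: int) -> bytes:
    data, stored_tag = memory[address]
    if not hmac.compare_digest(stored_tag, tag(address, data)):
        raise RuntimeError("integrity violation")   # tampering detected
    return data

# Note: an attacker who replays an old (data, tag) pair for this address
# still passes the check -- the replay attack discussed on the next slide.
```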

Page 30: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 30

Security Analysis of Basic Execution Integrity

- Arbitrary data cannot be introduced because:
  - The hash is keyed, and
  - An attacker does not know the key
- Data stored at one address cannot be substituted for data stored at another address because:
  - Hashing the data along with the address binds the two
- But a replay attack is possible because:
  - An attacker may replay stale data (and its hash) previously stored at the given address

Page 31: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 31

Outline

- Execution Privacy
- Execution Integrity
  - Basic Execution Integrity
  - Cached Hash Trees*
  - Log Hashing
- Proposed Architectures
- Conclusions and Open Questions

* Blum, et al. “Checking the Correctness of Memories” – UC Berkeley; Gassend, et al. “Caches and Hash Trees for Efficient Memory Integrity Verification” – MIT; Merkle, “Protocols for Public Key Cryptosystems”

Page 32: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 32

Cached Hash Trees

- Fundamental problem with basic hashing: hashes verify data integrity, but nothing verifies the integrity of the hashes
- A solution: cached hash trees
  - Keyed hashes are taken over the data
  - Keyed hashes are taken over those hashes, and so on up to a root
- Problem: the memory required to store the hashes
  - Solution: hashes are stored in memory and cached on-chip along with data

Page 33: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 33

Cached Hash Trees

- How it works:
  - A tree is built
  - Leaf nodes contain data
  - Intermediate nodes are hashes of their children
  - The root hash is kept in a special register on-chip
  - Hashes are only updated when necessary

[Figure: a tree with data blocks at the leaves and hash nodes at every level above them, up to a single root hash.]
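A sketch of the tree structure itself (a plain keyed Merkle tree over data blocks; the on-chip caching of hash nodes described on the following slides is not modeled, and the block count and sizes are assumptions):

```python
# Keyed Merkle tree over data blocks; any change to a leaf changes the root.
import hmac, hashlib, os

key = os.urandom(32)
H = lambda payload: hmac.new(key, payload, hashlib.sha256).digest()

def build_tree(blocks):
    """Return the list of levels, leaves first; the last level is the root hash."""
    level = [H(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

blocks = [os.urandom(64) for _ in range(8)]     # 8 data blocks (power of two)
levels = build_tree(blocks)
root = levels[-1][0]                            # kept in an on-chip register

blocks[3] = os.urandom(64)                      # tamper with one block in "memory"
assert build_tree(blocks)[-1][0] != root        # the change propagates to the root
```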

Page 34: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 34

Cached Hash Tree Consistency

- Invariant: if a node is in memory → then its parent hash is consistent with it (whether the parent hash is in the cache or in memory)

Page 35: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 35

Cached Hash Tree Consistency

[Figure: cache and memory, with data, parent hash, and grandparent hash nodes marked as up-to-date or outdated. If data is written in the cache, the hashes are not updated.]

Page 36: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 36

Cached Hash Tree Consistency

[Figure: same legend. If dirty data is evicted, the parent hash in the cache is updated.]

Page 37: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 37

Cached Hash Tree Consistency

[Figure: same legend. If a hash block is evicted, its parent hash in the cache is updated.]

Page 38: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 38

Cached Hash Tree Consistency

[Figure: same legend. If data is loaded and its parent hash is not in the cache: (1) the parent is loaded and verified against the grandparent; (2) then the data is verified against its parent.]

Page 39: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 39

Performance Analysis of Cached Hash Trees

- Common case: hash nodes are in the cache
  - Data evictions only require an update to a cached node
  - Data loads only require one hash check against a cached node
- Uncommon case: hash nodes are not in the cache
  - Data evictions require hash node loads
  - Data loads require hash node loads
- Passing hash nodes across the memory bus cuts into the bandwidth available for data
- Hash nodes occupy space in the cache

Page 40: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 40

Outline

- Execution Privacy
- Execution Integrity
  - Basic Execution Integrity
  - Cached Hash Trees
  - Log Hashing*
- Proposed Architectures
- Conclusions and Open Questions

* Suh, et al. “Efficient Memory Integrity Verification and Encryption for Secure Processors” – MIT

Page 41: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 41

Log Hashing

- Key insight
  - Verification is not necessary at every load
  - Verification is necessary before application results are produced
- Implication
  - Relax the constraint of constant, vigilant verification

Page 42: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 42

Log Hashing – Incremental Multiset Hashes*

- Incremental
  - The keyed hash is not recomputed over all the data, just over the newly added data
- Multiset
  - Duplicate items are allowed
  - The multiplicity of items is significant
  - The order of items is not

[Figure: a hash engine maps Set 1 and Set 2 to equal hashes when they contain the same items with the same multiplicities, regardless of order.]

* Clarke, et al. “Incremental Multiset Hash Functions and Their Application to Memory Integrity Checking” – MIT
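A hedged sketch of the idea: represent the running hash as a sum of keyed per-element hashes modulo 2**256, which can be updated one element at a time, ignores order, and is sensitive to multiplicity. This is an illustration in the spirit of the MSet-Add-Hash construction, not the exact published function.

```python
# Additive multiset hash: order-independent, multiplicity-sensitive, incremental.
import hmac, hashlib, os

key = os.urandom(32)
MOD = 2 ** 256

def elem_hash(element: bytes) -> int:
    return int.from_bytes(hmac.new(key, element, hashlib.sha256).digest(), "big")

class MultisetHash:
    def __init__(self):
        self.value = 0
    def add(self, element: bytes):
        self.value = (self.value + elem_hash(element)) % MOD   # incremental update

a, b = MultisetHash(), MultisetHash()
for x in [b"p", b"q", b"q"]: a.add(x)
for x in [b"q", b"p", b"q"]: b.add(x)      # same multiset, different order
assert a.value == b.value
b.add(b"q")                                # extra copy changes the multiplicity
assert a.value != b.value
```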

Page 43: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 43

Log Hashing

- Two incremental multiset hashes:
  - WriteHash: hashes everything evicted from the cache (written to memory)
  - ReadHash: hashes everything fetched from memory
- Counters are associated with memory operations, and keyed hashes are taken over (data, counter, address)

Page 44: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 44

Log Hashing

- Three phases of operation (sketched below):
  - Initialization: all program data is written out to memory (hashed into WriteHash)
  - Run-time: the hash of every eviction is added to WriteHash; the hash of every fetch is added to ReadHash
  - Verification: all data not in the cache is brought in (hashing into ReadHash); ReadHash is compared to WriteHash. If they are equal, integrity was maintained; otherwise, integrity was violated.
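The three phases can be modeled in a few lines using an additive multiset hash like the one sketched earlier (repeated here so the example is self-contained). This is a simplified illustration with assumed field widths; it also assumes the cache is empty at verification time and that every fetched block is written back before verification.

```python
# Simplified model of log hashing; widths and cache behaviour are assumptions.
import hmac, hashlib, os

key = os.urandom(32)
MOD = 2 ** 256
write_hash, read_hash = 0, 0     # the two incremental multiset hashes
counters = {}                    # per-address write counter kept on-chip
memory = {}                      # address -> (data, counter): untrusted DRAM

def mhash(data: bytes, ctr: int, address: int) -> int:
    record = data + ctr.to_bytes(8, "little") + address.to_bytes(8, "little")
    return int.from_bytes(hmac.new(key, record, hashlib.sha256).digest(), "big")

def evict(address: int, data: bytes):            # initialization and run-time writes
    global write_hash
    counters[address] = counters.get(address, 0) + 1
    write_hash = (write_hash + mhash(data, counters[address], address)) % MOD
    memory[address] = (data, counters[address])

def fetch(address: int) -> bytes:                # run-time reads
    global read_hash
    data, ctr = memory[address]
    read_hash = (read_hash + mhash(data, ctr, address)) % MOD
    return data

def verify() -> bool:                            # verification phase
    for address in list(memory):
        fetch(address)                           # bring in all data not in the cache
    return read_hash == write_hash               # equal -> integrity maintained

evict(0x0, b"hello")                             # initialization
data = fetch(0x0)                                # run-time load
evict(0x0, data + b"!")                          # run-time write-back
print(verify())                                  # True if memory behaved honestly
```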

Page 45: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 45

Log Hashing - Initialization

[Figure: during initialization, blocks flow from the cache to memory and are hashed into WriteHash.]

Page 46: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 46

Log Hashing – Run-time

[Figure: run-time traffic between the cache and memory; evictions are hashed into WriteHash, fetches into ReadHash.]

Page 47: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 47

Log Hashing – Run-time

[Figure: continued run-time traffic between the cache and memory, updating WriteHash and ReadHash.]

Page 48: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 48

Log Hashing – Verification

[Figure: at verification, the remaining data in memory is fetched (hashed into ReadHash), and WriteHash is compared with ReadHash for equality.]

Page 49: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 49

Log Hashing – Performance Analysis

- Initialization and verification are very costly; we assume they are rare occurrences
- Run-time hashing has no overhead
- Loading and storing sequence numbers in memory incurs a small performance overhead and a small memory overhead

Page 50: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 50

Log Hashing – Security Analysis

- If data is tampered with in memory: ReadHash will differ from WriteHash
- If data was returned from memory more times than it was written (as in a replay attack): the multiplicity of hashed items will not match → the hashes will not match
- If data is returned from memory out of order: the hashes won't match, because different counter values would have been hashed in with the data

Page 51: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 51

Outline

- Execution Privacy
- Execution Integrity
- Proposed Architectures
  - XOM
  - SP
  - AEGIS
  - SENSS
- Conclusions and Open Questions

Page 52: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 52

Proposed Architectures

- XOM*
  - First of its kind
  - Uses naïve privacy and integrity mechanisms
  - Slow and vulnerable to attack
  - Keys for encryption and hashing are burned on chip

* Lie, et al. “Architectural Support for Copy and Tamper Resistant Software” – Stanford

Page 53: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 53

Proposed Architectures

- Secret-Protected (SP)*
  - Based on XOM
  - Uses naïve privacy and integrity mechanisms
  - Decouples the secret from the device
    - The key is stored on chip only during a user session
    - User keys are separate from the device secret (hardware key) and are transferable

* Lee, et al. “Architecture for Protecting Critical Secrets in Microprocessors” – Princeton

Page 54: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 54

Proposed Architectures

- AEGIS*
  - Uses OTP encryption for privacy, without performance optimizations like prediction and precomputation
  - Uses cached hash trees for integrity
  - Hides device keys using Physical Random Functions (PUFs)
    - The circuit timing characteristics of a particular chip are unique and impossible to measure; PUFs exploit this to create device secrets

* Suh, et al. “Design and Implementation of the AEGIS Single-Chip Secure Processor Using Physical Random Functions” – MIT

Page 55: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 55

Proposed Architectures

- SENSS*
  - Uses a simple OTP encryption scheme, like AEGIS
  - Uses a cached hash tree scheme, like AEGIS
  - Adds support for multiprocessor systems
    - Each device has its own key
    - A combination of Cipher Block Chaining and One Time Pad mode encryption is used for cache-to-cache transfers

* Zhang, et al. “SENSS: Security Enhancement to Symmetric Shared Memory Multiprocessors” - UTD

Page 56: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 56

Outline

- Execution Privacy
- Execution Integrity
- Proposed Architectures
- Conclusions and Open Questions

Page 57: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 57

Conclusions – OTP

- Execution privacy is solved by OTP encryption (with optimizations)
  - Secure against all system-level attacks and physical attacks (outside the processor)
  - Almost no performance cost

Page 58: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 58

Conclusions – Cached Hash Trees

- Cached hash trees are secure against all known attacks
- But they have potentially poor performance
  - No research has been done to stress-test them
  - Performance is bad when the hash tree is not in the cache → a large working set or a pathological access pattern may result in poor performance

Page 59: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 59

Conclusions – Log Hashing

- Log hashing is secure as long as verification is done before results are used
  - How do you ensure that results are not consumed by users or other applications before verification? (e.g., disk writes, network writes, shared memory, screen refresh, OS interrupts)
- Log hashing has good performance if verification is infrequent
  - But what if it's not? How many applications require frequent verification?

Page 60: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 60

Conclusions – Keys

- Execution privacy and integrity require keys
  - Keys must be protected, even if the OS is compromised or the device is under physical attack
- How should keys be protected?
  - Are Physical Random Functions (PUFs) really resistant to physical attack?
- How should device public keys be used?
  - Should the manufacturer publish them?
  - How should revocation work?
  - What happens if ownership of the device is transferred?

Page 61: Architectures for Secure Processing Matt DeVuyst

Architectures for Secure Processing

Matt DeVuyst

Page 62: Architectures for Secure Processing Matt DeVuyst

Research Exam - Matt DeVuyst 62

Cached Hash Tree Consistency

[Figure: same legend as the consistency slides. If dirty data is evicted and its parent hash is not in the cache: (1) the parent is loaded and verified against the grandparent; (2) then the parent is updated.]