cryptanalysis through cache address leakage eran tromer adi shamir dag arne osvik

Cryptanalysis through Cache Address Leakage

Cryptanalysis through Cache Address Leakage

Eran TromerAdi Shamir

Dag Arne Osvik

Eran TromerAdi Shamir

Dag Arne Osvik

2

Cache attacks

• Pure software

• No special privileges

• No interaction with the cryptographic code (some variants)

• Very efficient (e.g., full AES key extraction from Linux encrypted partition in 65 milliseconds)

• Compromise otherwise well-secured systems(e.g., see NIST AES process)

• “Commoditize” side-channel attacks: easily deployed software breaks many common systems

3

CPU core60% (until recently)

Main memory7-9%

Why cache analysis?

cacheAnnual speedincrease:

Typicallatency:

50-150ns0.3ns → timing gap

4

Address leakage

• The cache is a shared resource:cache state affects, and is affected by, all processes, leading to crosstalk between processes.

• The cached data is subject to memory protection…

• But the metadata leaks information about memory access patterns: which addresses are being accessed.

5

Related works on cache attacks

• [Hu ‘92] – Convert channels

• [Page ’02] – Theoretical attacks via power trace

• [Tsunoo Tsujihara Minematsu Miyuachi ’02],[Tsunoo Saito Suzaki Shigeri Miyauchi ’03] – Timing attacks via internal collisions

• [Bernstein ’04] – Model-less timing attacks

• [Percival ’05] – RSA

Different setting, further assumptions, less practical, and/or attack a different ciphers.

This the first work to fully demonstrate an efficient cache attack on a modern block cipher in real-life settings.

6

1.Eavesdropping

2.Cryptanalysis

3.Experimental results

4.Implications

7

Associative memory cacheD

RA

Mca

che

memory block(64 bytes)

cache line

(64 bytes)

cache set

(4 cache lines)

8

S-box tables in memoryD

RA

Mca

che

S-box

table

9

Detecting access to AES tables (basic idea)

DR

AM

cach

e

Attacker

memory

S-box

table

10

Measurement technique

Inter-process crosstalk can be exploited in two ways:

• Effect of the cache on the encryption (timing)

• Effect of the encryption on the cache

11

DR

AM

cach

e

T0Attacker

memory

1. Make sure the tables are cached

2. Evict one cache set

3. Time an encryption and see if it’s slow

Measuring effect of cache on encryption

12

Measurement technique

Inter-process crosstalk can be exploited in two ways:

• Effect of the cache on the encryption (timing)

• Effect of the encryption on the cache

13

Measuring effect of encryption on cacheD

RA

Mca

che

Attacker

memory 1. Completely

evict tables from cache

S-box

table

14


RA

Mca

che

Attacker



2. Trigger a single encryption

S-box

table

15


RA

Mca

che

Attacker



2. Trigger a single encryption

3. Access attacker memory again and see which cache sets are slow

S-box

table

16

Advantages of second method

• Yields more information (64) from a single encryption

• Not a timing attack!Insensitive to timing variance in enryption code path (crucial for effective attacks on complicated systems).

• No real need to trigger the encryption – can wait until it happens by itself…

17

1.Eavesdropping

2.Cryptanalysis


4.Implications

18

char p[16], k[16]; // plaintext and keyint32 T0[256],T1[256],T2[256],T3[256]; // lookup tablesint32 Col[4]; // intermediate state

...

/* Round 1 */

Col[0] T0[p[ 0]©k[ 0]] T1[p[ 5]©k[ 5]] T2[p[10]©k[10]] T3[p[15]©k[15]];

Col[1] T0[p[ 4]©k[ 4]] T1[p[ 9]©k[ 9]] T2[p[14]©k[14]] T3[p[ 3]©k[ 3]];

Col[2] T0[p[ 8]©k[ 8]] T1[p[13]©k[13]] T2[p[ 2]©k[ 2]] T3[p[ 7]©k[ 7]];

Col[3] T0[p[12]©k[12]] T1[p[ 1]©k[ 1]] T2[p[ 6]©k[ 6]] T3[p[11]©k[11]];

A typical software implementation of AES

lookup index = plaintext key

19

Variant 1: Synchronous attack

• A software service performs AES encryption using a secret key.

• An attacker process runs on the same CPU.

• The attacker process can somehow invoke the service on known plaintext.

• Examples:

— Encrypted disk partition + filesystem

— IP/Sec, VPN

→ the attacker can discover the secret key.

or decryption, or MAC

or ciphertexts

or trigger and

time memory

accesses

remotely

20

Synchronous attack on AES: Overview

• Measure (possibly noisy) cache usage of many encryptions of known plaintexts.

• Guess the first key byte. For each hypothesis:

— For each sampled plaintext, predict which cache line is accessed by “T0[p[ 0]©k[ 0]]”

• Identify the hypothesis which yields maximal correlation between predictions and measurements.

• Proceed for the rest of the key bytes.

• Practically, a few hundred samples suffice.

Got 64 bits of the key (high nibble of each byte)!

• Use these partial results to mount attack further AES rounds, exploiting S-box nonlinearity.

A few thousand samples for complete key recovery.

21

Synchronous 1R attack (idealized)

Suppose we could answer queries of the form:

“Does encryption of p under the unknown key k access the memory block of index y in table Tl?”

Denote this predicate Q(p,l,y){0,1}.

The approach:

Obtain many samples of the predicate

”Guess” part of the key

Deduce constraints on value of predicateCheck if fulfilled in all samples

22

Synchronous 1R attack (cont.)

• Consider one of the memory accesses in the first round: “T0[p[0]©k[0]]”

• Given a candidate value k’[0] and samples of Q(p,l,y):

—The useful samples are those that fulfill p[0]©k’[0]y

—If k’[0]k[0] then for all useful samples: p[0]©k[0] p[0]©k’[0] y so T0[p[0]©k[0]] accesses y hence Q(p,l,y)=1

—Otherwise: p[0]©k[0] p[0]©k’[0] y hence Q(p,l,y)=0

(but there are 35 more “random” accesses to T0…)with probability (1-1/16)350.104

• Hence, a few hundred random samples suffice to eliminate all bad candidates find high nibble of all key bytes

23

Full key extraction

• We got half the key (64 bits)

• Too much for brute force

• We extracted all possible information from 1st-round accesses

• Move to 2nd round, taking advantage of the non-linearity of the S-box

24

Synchronous 2R attack (idealized)

• After expansion of the AES key schedule and 1st round, we see that the 2nd round does the following lookup:

“T2[ S[p[0]©k[0]] 2S[p[5]©k[5]] 3S[p[10]©k[10]] S[p[15]©k[15]] S[k[15]] k[2] ]”

• The only relevant unknowns are the low nibbles of k[0], k[5], k[10], k[15] (216 candidates).

• Can test a candidate as before:—Predict this lookup according to guess

—Identify useful samples, i.e., those where y is in the same memory block as the prediction

—Check whether Q(p,l,y)=1 for all useful samples.

• There are 3 more accesses of this special form, with disjoint sets of relevant low nibbles.

• Hence: full key recovery using ~2000 random samples.

25

Scenario 2: asynchronous attack

• Someone runs encryptions computations using a secret key.

• An attacker process runs on the same CPU at (roughly) the same time.

• The plaintext/ciphertext has a non-uniform (conditional) distribution:

— English

— Formatted data

— Headers

— Ciphertext gleaned from wire

• Examples: just about any use of crypto on a multi-user system

→ attacker can (partially?) discover the secret key

26

Asynchronous attack (basic idea)

Compare two distributions:

• Measured memory accesses statistics.

• Predicted memory accesses statistics,under the given plaintext distribution and the key hypothesis.

Find key that yields best correlation.

27

“Hyper Attack”

• Obtaining parallelism:

— HyperThreading (simultaneous multithreading)

— Multi-core, shared caches, cache coherence

— Interrupts

• Attack model:

— Encryption process is not communicating with anyone (no I/O, no IPC).

— No special measurement equipment

— No knowledge of either plaintext of ciphertext

28

1.Eavesdropping

2.Cryptanalysis


4.Implications

29

Experimental results

• Synchronous attack on OpenSLL AES encryption library call:

Full key recovery by analyzing 300 encryptions (13ms)

• Synchronous attack on an AES encrypted filesystem (Linux dm-crypt):

Full key recovery by analyzing 800 write operations (65ms)

• Asynchronous attack on AES(independent process doing batch encryption of text):

Recovery of 45.7 key bits in one minute.

30

Experimental example: synchronous attack I

Measuring a “black box” OpenSSL encryption on Athlon 64, using 10,000 samples.Horizontal axis: evicted cache setVertical axis: p[0] (left), p[5] (right)Brightness: encryption time (normalized)

k[0]=0x00 k[5]=0x50

31

Experimental example: synchronous attack II

Measuring a Linux 2.6.11 dm-crypt encrypted filesystem with ECB AES on Athlon 64, using 30,000 samples.Horizontal axis: evicted cache setVertical axis: p[0]Brightness: cache probing time (normalized)

Left: raw. Right: after subtracting cache set average.

32

Finding the tables

Measuring a Linux 2.6.11 dm-crypt encrypted filesystem with ECB AES on Athlon 64, using 30,000 samples.Horizontal axis: candidate table offsetCandidate key byte value: k’[0] (left), k’[5] (right)Brightness: score

k[0]=0x00 k[5]=0x50

33

Experimental example: asynchronous attack

34

1.Eavesdropping

2.Cryptanalysis


4.Implications

35

Implications and Extensions

• Multiuser systems

• Virtual machines— Examples: jail(), Xen, UML, VMware, Virtual PC

— Breaks NSA US Patent 6,922,774

— No guarantees in AMD ”Secure Virtual Machine” orIntel “Virtualization Technology” (VMX) specs either

• DRMThe trusted path is leaky (even if verified by TPM attestation, NGSCB, etc.).

• Untrusted code (e.g., ActiveX, Java applets, managed .NET, JavaScript)

• Remote network attacks

36

Countermeasures

Note: With sufficient temporal resolution, the attacker gets a fair approximation of the full memory access transcript.

Secure

Efficient Generic?

[GO95]no cache

[ZZLP04]

[MR04]

bitslice

OS mode

ignore

[Intel05]

37

Conclusions

New side channel attacks that are

• cheap

• easy to exploit

• hard to detect

• expensive to avoid

• applicable to many deployed systems

cryptanalysis through cache address leakage eran tromer adi shamir dag arne osvik

Documents

measuring effect of

encryption timing effect

cache state

cache attackshu

cache linesconfidential

cache analysis

cache sets

cache set3