
Accelerating Memory Decryption and Authentication With Frequent Value Prediction

Weidong Shi (Motorola Labs), Hsien-Hsin Sean Lee (Georgia Tech)


Shi and Lee, Accelerating Memory Decryption and Authentication (CF’07)

Security Frontier

[Figure labels]

• Scale: Transistor, Leaf Cell, Register/Unit, Processor, SoC

• Protection targets: Embedded Secrets, Counterfeit Detection, Authentication/Secure Token, Isolation, Content Confidentiality

• Defenses
– Circuit Camouflage/Obfuscation/Private Circuit (Eurocrypt 02/06)
– Secure MMU/Buses/Memory (CASES-04, ASPLOS-04, PACT-06)
– Secure Processor (e.g., IBM 06, MICRO-36/37/39, ASPLOS 02/04, ISCA-32/33)
– Secure SoC

• Attacks: Chip De-lidding, Die Analysis, Probing PCB, Side-channel, Clocking-Timing, Backdoor


Secure Processor Architecture

[Figure: a trusted secure processor (processor core, L2, and a memory enc/dec and integrity verification engine) attached to encrypted memory.]

[MICRO-36, 37, 39; ASPLOS-02, 04; ISCA-32, 33; IBM SecureBlue]


Agenda

• Counter Mode Cipher

• “Direct Memory” Block Ciphers

• Frequent Value Speculation

• Performance Analysis

• Conclusion

Counter Mode Encryption

[Figure: for each block 0 … N, the block cipher (AES), keyed with the secret key, encrypts the Nonce/IV together with Counter, Counter+1, … Counter+N to produce a one-time pad; Plaintxt0 … PlaintxtN are XORed with their pads to give Ciphertxt0 … CiphertxtN.]

• Use Counter to generate a secret keystream that encrypts a memory block with a simple XOR

• Turn a block cipher into a stream cipher
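The counter-mode bullets above can be sketched in a few lines. This is only an illustration under stated assumptions: the Python standard library has no AES, so a keyed function built from HMAC-SHA256 stands in for the block cipher, and the names `prf_block`, `ctr_pad` and `ctr_xor` are hypothetical, not taken from the talk.

```python
import hashlib
import hmac

BLOCK_BYTES = 16  # 128-bit blocks, as with AES


def prf_block(key: bytes, data: bytes) -> bytes:
    """Stand-in for the AES block cipher (assumption): HMAC-SHA256, truncated."""
    return hmac.new(key, data, hashlib.sha256).digest()[:BLOCK_BYTES]


def ctr_pad(key: bytes, nonce: bytes, counter: int) -> bytes:
    """One-time pad for a single block, derived from nonce || counter."""
    return prf_block(key, nonce + counter.to_bytes(8, "big"))


def ctr_xor(key: bytes, nonce: bytes, counter: int, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR block i with the pad for counter + i."""
    out = bytearray()
    for i in range(0, len(data), BLOCK_BYTES):
        pad = ctr_pad(key, nonce, counter + i // BLOCK_BYTES)
        chunk = data[i:i + BLOCK_BYTES]
        out += bytes(a ^ b for a, b in zip(chunk, pad))
    return bytes(out)


key = b"k" * 16
nonce = b"n" * 8
line = bytes(range(64))                   # one 64-byte cache line
cipher = ctr_xor(key, nonce, 7, line)     # encryption
plain = ctr_xor(key, nonce, 7, cipher)    # decryption is the same XOR
```

Note that `ctr_pad` never looks at the data, which is why the pad can be computed before the ciphertext arrives from memory.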


Parallelization for Counter Mode Secure Arch

• OTP generation and data fetch are done in parallel

• How to obtain counter values
– Counter Cache [MICRO36]
– Prediction & Precomputation [ISCA32]

[Figure: inside the secure processor, the block cipher (AES) turns the Nonce and Counter into a one-time pad while ciphertext cache line X is fetched from memory; XORing the two gives plaintext cache line X. The source of the counter value is marked "?".]


Block Cipher (ECB)

• “Direct” Memory Encryption

• Electronic Code Book

[Figure: each plaintext block (Plaintxt0 … PlaintxtN) is encrypted independently by the block cipher (AES) under the same secret key, giving Ciphertxt0 … CiphertxtN.]
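A small sketch may help show why direct (ECB) encryption leaves values predictable: equal plaintext blocks always map to equal ciphertext blocks. The cipher below is a toy 4-round Feistel network built on HMAC-SHA256, used only as an invertible stand-in for AES (an assumption made so the example runs with the standard library); all function names are hypothetical.

```python
import hashlib
import hmac

HALF = 8  # two 8-byte halves form one 128-bit block


def _round(key: bytes, rnd: int, half: bytes) -> bytes:
    """Round function of the toy Feistel cipher (HMAC-SHA256, truncated)."""
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:HALF]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, _xor(left, _round(key, rnd, right))
    return left + right


def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _round(key, rnd, left)), left
    return left + right


def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    """Encrypt every 16-byte block independently with the same key."""
    return b"".join(encrypt_block(key, data[i:i + 16])
                    for i in range(0, len(data), 16))


key = b"k" * 16
zero = bytes(16)
line = zero + b"A" * 16 + zero + b"B" * 16   # blocks 0 and 2 are equal
cipher = ecb_encrypt(key, line)
blocks = [cipher[i:i + 16] for i in range(0, len(cipher), 16)]
```

Here `blocks[0]` and `blocks[2]` are identical because both encrypt the all-zero block, which is the property that frequent value speculation relies on.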


Block Cipher (CBC)

• Cipher-Block Chaining

• Decrypting a target block depends on the neighboring ciphertext block

[Figure: Plaintxt0 is XORed with the Init. Vector and encrypted by the block cipher (AES) under the secret key to give Ciphertxt0; Plaintxt1 and Plaintxt2 are each XORed with the previous ciphertext block before encryption.]


Authenticated Encryption

• The same cipher protects
– Confidentiality (tamper-resistance)
– Message Integrity (tamper-evidence)

• Offset Codebook (OCB)
– One of the authenticated encryption methods
– Non-malleable under chosen-ciphertext attack, to which counter mode is vulnerable
– 802.11i currently specifies AES-OCB as an alternative to CCM for confidentiality and integrity


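Authenticated Encryption: OCB Encryption

[Figure: the block cipher (AES), keyed with the secret key, takes Nonce || mem addr XORed with the pseudo-random number L and produces R. PlaintxtN is XORed with aL+R, encrypted by the block cipher (AES) under the secret key, and XORed with aL+R again to give CiphertxtN.]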


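Authenticated Encryption: OCB Authentication

[Figure: Plaintxt0, Plaintxt1, Plaintxt2 and Plaintxt3 are combined by a hash; the result is XORed with 5L+R and encrypted by the block cipher (AES) under the secret key to produce the Message Authentication Code (MAC).]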
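Read against the published OCB construction, the two OCB diagrams correspond roughly to the equations below. This is a sketch under assumptions, since the slides do not spell out the formulas: $E_K$ and $D_K$ denote block cipher (AES) encryption and decryption under the secret key, $a$ is the block index, $Z_a$ is the slide's "aL+R" (in OCB the "+" is an XOR and the product is taken in a finite field), and Hash is the combining step shown in the authentication diagram (an XOR checksum in OCB).

```latex
\begin{aligned}
R   &= E_K\big((\text{Nonce} \,\|\, \text{addr}) \oplus L\big) \\
Z_a &= a \cdot L \oplus R \\
C_a &= E_K(P_a \oplus Z_a) \oplus Z_a \\
P_a &= D_K(C_a \oplus Z_a) \oplus Z_a \\
\text{MAC} &= E_K\big(\mathrm{Hash}(P_0, P_1, P_2, P_3) \oplus Z_5\big)
\end{aligned}
```

Under this reading, the ciphertext of a block depends only on the key, the offset and the plaintext value, so a guessed plaintext can be encrypted ahead of time and compared with what memory returns.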


OCB ─ Decryption and Integrity Verification

• Decryption can start after encrypted memory blocks are fetched.

• Decrypted blocks cannot be issued until their integrity is verified.

• MAC verification can take longer than decryption.

[Figure: timeline. Memory fetch returns E(B0), E(B1), E(B2), E(B3) and the MAC; decryption produces B0, B1, B2, B3; the blocks are issued only after MAC verification completes.]


Speculations in Secure Processor

Examples of Prediction | Applicable Cipher Scenario | What Can Be Predicted | Why Predictable?
Counter Prediction [ISCA-32] | Counter Mode Encryption | Counter Values | Coherence of Counter Values
Value Prediction [CF-07] | “Direct” Encryption Mode | Encrypted Value | Existence of Frequent Values

• Improve performance by taking advantage of
– The nature of the data, or
– Statistical properties of the data.

• Do not compromise security, as speculation is performed only within the secure boundary.


Analysis of Frequent Values

[Figure: two bar charts, "Frequent value - 256K L2" and "Frequent value - 1M L2", with bars for 8, 16 and 32 frequent values per benchmark (applu, apsi, art, bzip2, crafty, facerec, galgel, gcc, gzip, mcf, mesa, mgrid, parser, sixtrack, swim, twolf, vortex, vpr, wupwise) plus the average.]

• 40 to 60% of encrypted memory data are frequent values

• 8 to 32 frequent values account for over 40% of encrypted data
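The kind of measurement behind these bullets can be sketched as a coverage count: pick the k most common values in a trace and ask what share of all accesses they represent. The function name and the tiny trace below are hypothetical and only illustrate the bookkeeping; they are not the benchmark data from the talk.

```python
from collections import Counter


def frequent_value_coverage(words, k):
    """Return the k most common values and the fraction of the trace they cover."""
    counts = Counter(words)
    top = counts.most_common(k)
    covered = sum(n for _, n in top)
    return [value for value, _ in top], covered / len(words)


# Hypothetical trace of 10 memory words.
trace = [0, 0, 0, 0, 1, 1, 1, 0xFFFFFFFF, 42, 7]
values, share = frequent_value_coverage(trace, 2)
```

With this trace the two most common values are 0 and 1, and together they cover 7 of the 10 words.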


Speculation Using Idle Pipelined Crypto Engine

• Generate “encrypted” frequent values using otherwise idle crypto engines

• Integrity verification can also be speculated.

• Generate MAC for speculated frequent values

[Figure: time line. While the memory pipeline retrieves the encrypted cache line Ek(X), the encryption pipeline produces the encrypted frequent values Ek(A), Ek(B), Ek(C), Ek(D), Ek(E), Ek(F), Ek(G) at times T1 to T7; each is compared (=?) with Ek(X), and Ek(E) matches.]
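The comparison on the time line can be sketched as follows: encrypt each candidate frequent value and check it against the fetched ciphertext; a match reveals the plaintext without running the decryption engine. This is a minimal sketch under assumptions: `toy_encrypt` is an HMAC-based, non-invertible stand-in for Ek, the address input is a hypothetical tweak, and a miss simply returns `None` to signal that normal decryption is needed.

```python
import hashlib
import hmac
from typing import Iterable, Optional


def toy_encrypt(key: bytes, addr: int, value: bytes) -> bytes:
    """Stand-in for Ek (assumption): deterministic keyed function of (addr, value)."""
    msg = addr.to_bytes(8, "big") + value
    return hmac.new(key, msg, hashlib.sha256).digest()[:16]


def speculate(fetched: bytes, addr: int,
              frequent_values: Iterable[bytes], key: bytes) -> Optional[bytes]:
    """Return the frequent value whose encryption equals the fetched block, if any."""
    for value in frequent_values:
        if hmac.compare_digest(toy_encrypt(key, addr, value), fetched):
            return value          # hit: plaintext known without decrypting
    return None                   # miss: fall back to the decryption engine


key = b"k" * 16
table = [bytes(16), b"\xff" * 16, (1).to_bytes(16, "big")]

hit = speculate(toy_encrypt(key, 0x1000, b"\xff" * 16), 0x1000, table, key)
miss = speculate(toy_encrypt(key, 0x1000, b"rare value 12345"), 0x1000, table, key)
```

In hardware the candidate encryptions would overlap with the memory fetch; the sequential loop here only models the comparison, not the timing.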


Value Prediction Based Decryption

[Figure: secure processor block diagram with the cache, WB buffer, scheduler, two pipelined encryption engines, a pipelined decryption engine, and a frequent value table (CAM). Frequent values X, Y, Z, W are encrypted to E(X), E(Y), E(Z), E(W) alongside the returned encrypted data.]


Handle Large Block Size

[Figure: a row of 64-bit blocks, each marked Freq Value or Non-Freq Value, grouped in pairs under 128-bit ciphers; caption "Four 64-bit frequent value blocks".]

• Under a 128-bit cipher, a pair of 64-bit blocks that both hold frequent values is predictable; a pair that includes a non-frequent value is not.

[Figure: bar chart "Predictable Blocks of Freq Value Blocks (%), L2=256KB", one bar per benchmark (applu, apsi, art, bzip2, crafty, facerec, galgel, gap, gcc, gzip, mcf, mesa, mgrid, parser, sixtrack, swim, twolf, vortex, vpr, wupwise) plus the average.]
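The pairing effect on this slide can be illustrated with a short sketch. It assumes that a 128-bit cipher block is formed from two adjacent 64-bit blocks and that it is predictable only when both halves hold frequent values; the function name and the example cache line are hypothetical.

```python
def predictable_pairs(is_frequent):
    """One flag per 128-bit cipher block: True only if both 64-bit halves are frequent."""
    return [is_frequent[i] and is_frequent[i + 1]
            for i in range(0, len(is_frequent) - 1, 2)]


# Hypothetical 64-byte cache line: eight 64-bit blocks, five hold frequent values.
line = [True, False, True, True, False, True, True, False]
flags = predictable_pairs(line)

frequent_blocks = sum(line)              # 5 frequent 64-bit blocks
predictable_blocks = 2 * sum(flags)      # only 2 of them sit in a predictable pair
```

In this example only one of the four cipher blocks is predictable, so three of the five frequent-value blocks are wasted for speculation.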


Block Re-ordering

[Figure: the 64-bit blocks of a line, each marked Freq Value or Non-Freq Value, are re-ordered so that frequent value blocks sit next to each other and form predictable freq value pairs.]

[Figure: bar chart "Predictable Blocks of Frequent Values Blocks (%), L2=256KB", with series without_reorder and with_reorder, per benchmark (applu, apsi, art, bzip2, crafty, facerec, galgel, gap, gcc, gzip, mcf, mesa, mgrid, parser, sixtrack, swim, twolf, vortex, vpr, wupwise) plus the average.]
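One way to picture block re-ordering is a stable partition that moves frequent-value blocks to the front of the line so they pair up under the 128-bit cipher. The sketch below is an assumption about the mechanics, not the authors' hardware: the slides do not say how the permutation is chosen or stored, and the function names are hypothetical.

```python
def reorder(is_frequent):
    """Permutation that places frequent blocks first, keeping relative order."""
    return sorted(range(len(is_frequent)), key=lambda i: not is_frequent[i])


def count_predictable(flags):
    """Number of adjacent 64-bit pairs in which both blocks are frequent."""
    return sum(flags[i] and flags[i + 1] for i in range(0, len(flags) - 1, 2))


line = [True, False, True, True, False, True, True, False]
order = reorder(line)
reordered = [line[i] for i in order]

before = count_predictable(line)
after = count_predictable(reordered)

# Undo the permutation to recover the original block order.
restored = [None] * len(order)
for position, source in enumerate(order):
    restored[source] = reordered[position]
```

For this line the number of predictable pairs rises from one to two, and applying the inverse permutation restores the original layout.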


Frequent Value Map

• Speculation targeted only for frequent value blocks

• Overhead
– 1 frequent value map bit per encrypted block (128 bits)
– 8 bits per cache line (64B cache line size)
– 512 bits per page
– Total 64K bits for 128-entry TLB

• Can be shared for many other purposes
– frequent value based cache compression
– power saving cache

[Figure: each cache line has an FV bit map; the bit maps of the lines in a page form the page's map, and the frequent value map covers all pages in the TLB.]
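The overhead figures can be checked with a little arithmetic. The 4 KB page size below is an assumption inferred from the slide (512 bits per page divided by 8 bits per line gives 64 lines of 64 bytes); the other constants come from the bullets. Note that 8 map bits per 64-byte line works out to one bit per 64 bits of data; with one bit per 128-bit block, the same line would need only 4 bits.

```python
LINE_BYTES = 64           # from the slide
MAP_BITS_PER_LINE = 8     # from the slide
TLB_ENTRIES = 128         # from the slide
PAGE_BYTES = 4096         # assumption: 4 KB pages

data_bits_per_map_bit = LINE_BYTES * 8 // MAP_BITS_PER_LINE
lines_per_page = PAGE_BYTES // LINE_BYTES
map_bits_per_page = lines_per_page * MAP_BITS_PER_LINE
total_map_bits = map_bits_per_page * TLB_ENTRIES

print(data_bits_per_map_bit, map_bits_per_page, total_map_bits)
```

Under these assumptions the totals agree with the slide: 512 map bits per page and 64K bits for all pages covered by the TLB.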


MAC Speculation

[Figure: during the memory fetch, each speculated encrypted block goes through MAC speculation; every speculated result is then compared with the fetched data.]

• Compute MAC for speculated frequent value blocks

• Compare
– fetched encrypted block with speculated encrypted block
– fetched MAC with speculated MAC

• If both match, issue the fetched instruction/data
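The two comparisons in the bullets above can be sketched as a single check. This is an illustration under assumptions: `toy_encrypt` and `toy_mac` are HMAC-based stand-ins for the cipher and the MAC (not OCB), the MAC is computed over the guessed plaintext blocks, and all names are hypothetical.

```python
import hashlib
import hmac
from typing import List, Optional


def toy_encrypt(key: bytes, value: bytes) -> bytes:
    """Stand-in for block encryption (assumption): keyed, deterministic."""
    return hmac.new(key, b"enc" + value, hashlib.sha256).digest()[:16]


def toy_mac(key: bytes, values: List[bytes]) -> bytes:
    """Stand-in for the MAC over the plaintext blocks of one line (assumption)."""
    return hmac.new(key, b"mac" + b"".join(values), hashlib.sha256).digest()[:16]


def speculative_check(fetched_blocks: List[bytes], fetched_mac: bytes,
                      guesses: List[bytes], enc_key: bytes,
                      mac_key: bytes) -> Optional[List[bytes]]:
    """Return the guessed plaintext only if both ciphertext and MAC match."""
    if len(guesses) != len(fetched_blocks):
        return None
    blocks_ok = all(hmac.compare_digest(toy_encrypt(enc_key, g), f)
                    for g, f in zip(guesses, fetched_blocks))
    mac_ok = hmac.compare_digest(toy_mac(mac_key, guesses), fetched_mac)
    return guesses if blocks_ok and mac_ok else None


enc_key, mac_key = b"e" * 16, b"m" * 16
line = [bytes(16), b"\xff" * 16, bytes(16), bytes(16)]
stored_blocks = [toy_encrypt(enc_key, v) for v in line]
stored_mac = toy_mac(mac_key, line)

ok = speculative_check(stored_blocks, stored_mac, line, enc_key, mac_key)
tampered = speculative_check(stored_blocks, bytes(16), line, enc_key, mac_key)
```

When either comparison fails, the sketch returns `None`, which corresponds to falling back to normal decryption and MAC verification.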


Experimental Setup

Parameter | Value
L1 I/D Cache | DM, 16KB
L2 Cache | 4-way, unified, 256KB and 1MB
Memory Bus | 8B wide, 1:4, 1:5, 1:6 Ratio
CPU Clock | 1GHz
L1 Latency | 1 cycle
L2 Latency | 8 cycles (1MB), 4 cycles (256KB)
TDES Decryption Latency | 96ns
AES Decryption Latency | 65ns
Block Size | 64-bit (Triple DES), 128-bit (AES)


Results – Value Prediction

[Figure: two bar charts, "IPC Speedup L2=256KB" and "IPC Speedup L2=1MB", one bar per benchmark (applu, apsi, art, bzip2, crafty, facerec, galgel, gap, gcc, gzip, mcf, mesa, mgrid, parser, sixtrack, swim, twolf, vortex, vpr, wupwise) plus the average; vertical axis from 1 to 1.35.]


Performance ― Number of Frequent Values

• 64-bit block size

[Figure: bar chart "IPC Speedup L2=256KB", with series 8_freq_values, 16_freq_values and 32_freq_values per benchmark (applu, apsi, art, bzip2, crafty, facerec, galgel, gap, gcc, gzip, mcf, mesa, mgrid, parser, sixtrack, swim, twolf, vortex, vpr, wupwise) plus the average; vertical axis from 1 to 1.45.]


Sensitivity to Memory Speed

[Figure: two bar charts, "IPC Speedup, L2=256KB, Ratio=1:4" and "IPC Speedup, L2=256KB, Ratio=1:6", one bar per benchmark (applu, apsi, art, bzip2, crafty, facerec, galgel, gap, gcc, gzip, mcf, mesa, mgrid, parser, sixtrack, swim, twolf, vortex, vpr, wupwise) plus the average; vertical axis from 1 to 1.4.]


Conclusion

• Frequent value speculation can hide both
– Decryption latency
– Integrity verification latency
– For direct memory block ciphers

• Encrypted values demonstrate predictability.

• We propose block re-ordering to consolidate the predictability.

• Memory-bound benchmark programs show 10% to 30% performance improvement.


Thank You!

Georgia Tech ECE MARS Labs
http://arch.ece.gatech.edu