Process-Shared and Persistent Code Caches Derek Bruening and Vladimir Kiriansky VEE 2008

Process-Shared and Persistent Code Caches

Derek Bruening and Vladimir Kiriansky

VEE 2008

Copyright © 2008 VMware, Inc. All rights reserved.

Software Code Caches

Performance boost for runtime systems

Virtual machines, interpreters, dynamic translators

Dynamic compilers (JIT, etc.) and optimizers

Simulators and emulators

Indirection of runtime code manipulation

Avoid transparency and granularity limitations of directly modifying application code

Dynamic tools: profiling, security, optimization, auditing, introspection, analysis, ...

Performance Limitations

As code caches mature, their uses are moving beyond single-application research instances

Deploy on production systems

Apply to many processes simultaneously

Problem: memory usage!

Scalability noticeably more limited than native

Problem: cold code performance!

Desktop application start-up feels sluggish

Contributions

Code cache design that supports both inter-process sharing and inter-execution persistence

Adaptive-level-of-granularity code cache

Evaluation in DynamoRIO industrial-strength system

Base for Determina’s Memory Firewall host intrusion prevention technology

Focus on security:

Scheme that avoids privilege escalation while allowing high-to-low and peer-to-peer sharing

Read-only code caches and data structures in steady state

Outline

Introduction

Sharing

Security

Consistency

Implementation

Evaluation

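Shared Libraries Undone

[Figure: X.exe, Y.exe, and a second X.exe instance load the shared libraries A.dll, B.dll, C.dll, and D.dll. Each process keeps a private code cache: each X.exe caches the executed parts of A, B, and C; Y.exe caches the executed parts of A, C, and D.]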

Granularity of Sharing

Mirror native code organization

Code caches contain native code translations

Align code cache shareability, removal, and versioning with the units of code that the application loads and unloads, and that are updated

Larger units have more limited shareability

Other instances of the same application

Do not share dynamically-generated code

Unlikely to be identical in every process

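Process-Shared Code Caches

[Figure: X.exe, Y.exe, and a second X.exe instance load A.dll, B.dll, C.dll, and D.dll. Instead of one private cache per process, there is one code cache per library (A, B, C, and D code caches), shared among the processes.]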

Mechanism of Sharing

Live versus frozen code caches

Frozen are much simpler, especially for security

File-based versus memory-only

File-based have more security concerns but enable inter-execution sharing (persistence)

Inter-process and inter-execution sharing share many challenges

Code Cache Security

Avoid opening up new vulnerability vectors that do not exist natively:

Privilege escalation

Any input from low to high is a potential vector

Code modifiability

Application executable and library files

An FTP server should not let a user write the ftp.exe code cache

In-memory caches

Prevent Privilege Escalation

Privilege escalation unacceptable

Cannot rely on building bulletproof verifier

No sharing from low to high!

Identify Trusted Computing Base (TCB)

Share only from TCB to everyone, or among peers
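
The sharing rule on this slide can be sketched as a small predicate. This is an illustrative sketch only, not DynamoRIO code: the `TCB_SIDS` set, the `may_share` function, and the choice to treat "peers" as processes running under the same user SID are assumptions made for the example.

```python
# Hypothetical TCB membership for illustration; the slides do not spell out
# exactly which accounts the real system places in the TCB.
TCB_SIDS = frozenset({"S-1-5-18"})  # NT AUTHORITY\System


def may_share(producer_sid: str, consumer_sid: str) -> bool:
    """Return True if a cache produced under producer_sid may be used by consumer_sid."""
    if producer_sid in TCB_SIDS:
        return True   # high-to-low (or within the TCB): everyone may use TCB caches
    if consumer_sid in TCB_SIDS:
        return False  # low-to-high: never allowed
    return producer_sid == consumer_sid  # peer-to-peer: same user only (assumption)
```

Under these assumptions, a cache published by the System account is usable by any user, while a regular user's cache is usable only by that same user and never by a TCB process.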

Two-Level Hierarchy

NT AUTHORITY\System (S-1-5-18)

NT AUTHORITY\LocalService (S-1-5-19)

NT AUTHORITY\NetworkService (S-1-5-20)

Regular users (S-1-5-21-RID)

Trusted Computing Base

All other users, isolated from each other

Two-Level Hierarchy

Trusted Computing Base

services.exe

lsass.exe

RpcSs

SvcHost.exe NetSvcs

NetworkService

explorer.exe

firefox.exe

excel.exe

explorer.exe

iexplore.exe

winword.exe

All Other Users

Limit Code Modifiability

Use protected directories

All code cache files kept in directories writable only by the TCB

Users create and merge new caches in user-writable directories

Limited-privilege TCB-launched process verifies user-written files and publishes official files

TCB service watches for new user-written files

Agent that verifies and publishes is not full TCB: only input is new user file, only output is inherited file handle for published file target
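
The verify-then-publish step can be sketched roughly as follows. This is a simplified illustration, not the actual agent: the `publish` function, its `verify` callback, and the temporary-file rename are hypothetical, and the sketch writes to a directory path, whereas the slide describes the agent receiving only an inherited file handle for the published target.

```python
from pathlib import Path
from typing import Callable


def publish(candidate: Path, published_dir: Path,
            verify: Callable[[bytes], bool]) -> bool:
    """Copy a user-written cache file into the protected directory if it verifies."""
    data = candidate.read_bytes()   # only input: the new user-written file
    if not verify(data):
        return False                # rejected files are never published
    target = published_dir / candidate.name
    tmp = published_dir / (candidate.name + ".tmp")
    tmp.write_bytes(data)           # write the exact bytes that were verified
    tmp.replace(target)             # rename into place so readers never see a partial file
    return True
```

Reading the candidate once and publishing those same bytes avoids a window in which the user could modify the file between verification and publication.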

Code Cache Consistency

Original libraries change over time

Application updates

Local tools (rebasing, etc.)

Code cache file must be invalidated if its source application file has changed

Cache filename based on module version for initial check, and to support multiple simultaneous versions
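
A version-based cache filename might look like the sketch below. The naming format and the `cache_filename` and `find_cache` helpers are hypothetical; the slides say only that the filename is derived from the module version.

```python
from typing import Optional, Set


def cache_filename(module_name: str, module_version: str) -> str:
    """Build a cache filename from the module name and version (hypothetical format)."""
    return f"{module_name.lower()}-{module_version}.cache"


def find_cache(published: Set[str], module_name: str,
               module_version: str) -> Optional[str]:
    """Initial check: a cache is a candidate only if its name matches this version."""
    name = cache_filename(module_name, module_version)
    return name if name in published else None
```

Because the version is part of the name, caches for two versions of the same library can sit side by side, and a process that loads an updated library simply fails the initial lookup instead of picking up a stale cache.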

Consistency Checks

Offline byte-by-byte prior to publishing

Avoid code modifiability vectors

Online checksum comparisons

Support legitimate application changes

Detect disk corruption

In our threat model, an attacker with write access to TCB-owned files is a far greater concern than modification of code cache files
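
The online check can be illustrated with a checksum comparison at load time. This is a sketch under assumptions: the slides do not name the checksum algorithm or say which bytes are covered, so SHA-256 over the whole module and the helper names here are placeholders.

```python
import hashlib


def module_checksum(module_bytes: bytes) -> str:
    """Checksum of the application module (SHA-256 is an assumption, not the source's choice)."""
    return hashlib.sha256(module_bytes).hexdigest()


def cache_is_current(stored_checksum: str, module_bytes: bytes) -> bool:
    """Compare the checksum recorded in the cache file against the module as loaded now."""
    return stored_checksum == module_checksum(module_bytes)
```

A mismatch covers both cases on the slide: a legitimate application update and on-disk corruption. In either case the persisted cache is not used.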

Checksum Costs

Re-Design Code Cache

Read-only cache

For file-based sharing and security

Position-independence of cache and data structures

Eliminate and/or combine data structures to remove pointers

Platform and execution dependencies

Micro-architectural dependencies: cache line, etc.

TLS offsets
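
One way to picture position-independent data structures is a table that stores offsets instead of pointers. The sketch below is illustrative only: it uses a flat table rather than a hashtable, and the record layout and the `pack_entries` and `lookup` names are assumptions.

```python
import struct


def pack_entries(entries):
    """Serialize (app_offset, cache_offset) pairs; offsets are relative, so no pointers."""
    blob = struct.pack("<I", len(entries))
    for app_off, cache_off in sorted(entries):
        blob += struct.pack("<II", app_off, cache_off)
    return blob


def lookup(blob, module_base, cache_base, app_address):
    """Map an application address to its cache address for the bases of this process."""
    (count,) = struct.unpack_from("<I", blob, 0)
    wanted = app_address - module_base
    for i in range(count):
        app_off, cache_off = struct.unpack_from("<II", blob, 4 + 8 * i)
        if app_off == wanted:
            return cache_base + cache_off
    return None
```

The same serialized bytes work whether the module and cache are mapped at one pair of base addresses or another, which is what lets the structure stay read-only and be mapped from a file.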

Data Structures

Existing code cache has fine-grained control

Individual code fragment unlink and removal

Separate data structure per code fragment and each of its exits, memory regions spanned, and incoming links; plus, backpointer from its cache slot

Many separate, writable, variable-sized, inter-linked structures: complex to persist!

Reduce Code Cache Granularity

Switch to coarse-grain scheme

Give up individual code fragment control

Permanent intra-cache links

No per-fragment data structures at all

Treat entire cache as a unit for consistency

Side benefit: reduce single-application memory usage

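Persisted File Layout

[Figure: layout of a persisted cache file. Components: header, checksums, hashtable of entry points, relocation data, inter-module link stubs, code cache, and exit/lookup indirection pads. Page protections shown for the sections: R, RX, and R(W)X.]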

Support Dynamism

Relocation

Use application library relocation tables

Add reloc entries for our own code changes that are not easily made position-independent

Becomes more important with Windows Vista ASLR

Application code modifications

Invalidate persisted cache; switch to incremental coarse-grain + fine-grain combination
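
The relocation step above amounts to adjusting each recorded absolute address by the difference between the base the cache was built for and the base the module has now. A minimal sketch, assuming 32-bit little-endian address slots and a hypothetical `apply_relocations` helper:

```python
import struct


def apply_relocations(image: bytearray, reloc_offsets, old_base: int, new_base: int) -> None:
    """Rebase absolute 32-bit addresses stored at the given offsets within image."""
    delta = new_base - old_base
    for off in reloc_offsets:
        (value,) = struct.unpack_from("<I", image, off)
        # Wrap to 32 bits so a negative delta behaves like machine arithmetic.
        struct.pack_into("<I", image, off, (value + delta) & 0xFFFFFFFF)
```

Only the slots named in the relocation data are touched; everything else in the cache is position-independent and stays byte-for-byte identical across processes.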

Adaptive Level of Granularity

Start with coarse-grain caches + sharing/persistence

Switch to fine-grain for individual modules or sub-regions of modules after significant consistency events, to avoid expensive entire-module flushes

Support simultaneous fine-grain fragments within coarse-grain regions for corner cases

Match amount of bookkeeping to amount of code change

Majority of application code does not need fine-grain
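
The adaptive policy can be sketched as a per-module state that starts coarse and switches to fine-grain after enough consistency events. The `ModuleCache` class and its `threshold` parameter are hypothetical; the slides do not define what counts as a "significant" event, and the sketch tracks whole modules only, not sub-regions.

```python
class ModuleCache:
    """Tracks the cache granularity used for one module (illustrative only)."""

    def __init__(self, name: str, threshold: int = 1):
        self.name = name
        self.mode = "coarse"        # start coarse: shareable and persistable
        self.events = 0
        self.threshold = threshold  # hypothetical cutoff for "significant"

    def on_consistency_event(self) -> str:
        """Record a code change in this module and return the mode now in effect."""
        self.events += 1
        if self.mode == "coarse" and self.events >= self.threshold:
            self.mode = "fine"      # avoid repeated entire-module flushes
        return self.mode
```

Modules whose code never changes stay in the cheap coarse-grain mode, so the bookkeeping cost is paid only where the application actually modifies code.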

Support Instrumentation

Preserve instrumentation when persisting

Tools provide relocation info, or produce PIC

Dynamically-varying tools specify do-not-persist

Add tool name to file header and namespace

Only load file that matches current tool

Typical tool deployment is the same tool system-wide, rather than a disparate set of simultaneous tools
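
The tool-matching rule can be illustrated with a header check at load time. The `CacheHeader` fields and the `should_load` and `should_persist` helpers are assumptions for the example, not the real file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheHeader:
    module: str
    tool: str  # name of the instrumentation tool the cache was built with ("" = none)


def should_load(header: CacheHeader, current_tool: str) -> bool:
    """Load a persisted cache only if it was produced under the same tool."""
    return header.tool == current_tool


def should_persist(tool_marks_do_not_persist: bool) -> bool:
    """Tools whose instrumentation varies dynamically opt out of persistence."""
    return not tool_marks_do_not_persist
```

A cache built under one tool carries that tool's instrumentation, so loading it under a different tool, or with no tool, would execute the wrong code; the name check rules that out.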

System-wide Deployment

Windows XP desktop: boot + auto-logon

Peak committed memory usage once idle

27 processes executed under 4 different users

Desktop Startup: Memory

Desktop Startup: Time

Desktop Startup: Time Breakdown

Related Work: One or the Other

Process Sharing

Czajkowski 02: Inter-JVM sharing

Transitive: translations not persisted due to security concerns

Bungale 07 (PinOS): below OS, so can share at machine page level, but virtual address differences require expensive checks

Persistence

Static instrumentation tools (ATOM, Etch, EEL, etc.)

Hazelwood 03: persistence study

Li 05: persistence across module unloads

Reddi 05, 07: inter-execution persistence in Pin

Related Work: Both

FX!32: per-module persistent translations

Central service translates offline using profile info

.NET NGEN pre-compiler

Shares only cryptographically signed code; if not installed centrally, performs expensive runtime verification

Background service that tracks dependencies and re-compiles as needed, to support inlining

Summary: Improved Scalability

Design for inter-process sharing of code caches that also supports inter-execution persistence

Scheme for sharing without risk of privilege escalation and with read-only code caches and data structures

Evaluation in DynamoRIO where we achieved a two-thirds reduction in both memory usage (scalability) and startup time