process-shared and persistent code caches derek bruening and vladimir kiriansky vee 2008

Process-Shared and Persistent Code Caches

Derek Bruening and Vladimir Kiriansky

VEE 2008

Copyright © 2008 VMware, Inc. All rights reserved.

Software Code Caches

Performance boost for runtime systems

Virtual machines, interpreters, dynamic translators

Dynamic compilers (JIT, etc.) and optimizers

Simulators and emulators

Indirection of runtime code manipulation

Avoid transparency and granularity limitations of directly modifying application code

Dynamic tools: profiling, security, optimization, auditing, introspection, analysis, ...


Performance Limitations

As code caches mature, their uses are moving beyond single-application research instances

Deploy on production systems

Apply to many processes simultaneously

Problem: memory usage!

Scalability noticeably more limited than native

Problem: cold code performance!

Desktop application start-up feels sluggish


Contributions

Code cache design that supports both inter-process sharing and inter-execution persistence

Adaptive-level-of-granularity code cache

Evaluation in DynamoRIO industrial-strength system

Base for Determina’s Memory Firewall host intrusion prevention technology

Focus on security:

Scheme that avoids privilege escalation while allowing high-to-low and peer-to-peer sharing

Read-only code caches and data structures in steady state


Outline

Introduction

Sharing

Security

Consistency

Implementation

Evaluation


Shared Libraries Undone

D.dll

A.dll

B.dll

C.dll

X.exe

code cache: executed parts of A, B, C

Y.exe X.exe

code cache: executed parts of A, C, D

code cache: executed parts of A, B, C


Granularity of Sharing

Mirror native code organization

Code caches contain native code translations

Align code cache shareability, removal, and versioning with the units of code that the application loads and unloads, and that are updated

Larger units have more limited shareability

Other instances of the same application

Do not share dynamically-generated code

Unlikely to be identical in every process


Process-Shared Code Caches

D.dll

A.dll

B.dll

C.dll

X.exe Y.exe X.exe

D code cache

A code cache

B code cache

C code cache


Mechanism of Sharing

Live versus frozen code caches

Frozen are much simpler, especially for security

File-based versus memory-only

File-based have more security concerns but enable inter-execution sharing (persistence)

Inter-process and inter-execution sharing share many challenges


Code Cache Security

Avoid opening up new vulnerability vectors that do not exist natively:

Privilege escalation

Any input from low to high is a potential vector

Code modifiability

Application executable and library files

An FTP server should not let a user write the ftp.exe code cache

In-memory caches


Prevent Privilege Escalation

Privilege escalation unacceptable

Cannot rely on building bulletproof verifier

No sharing from low to high!

Identify Trusted Computing Base (TCB)

Share only from TCB to everyone, or among peers


Two-Level Hierarchy

NT AUTHORITY\System (S-1-5-18)

NT AUTHORITY\LocalService (S-1-5-19)

NT AUTHORITY\NetworkService (S-1-5-20)

Regular users (S-1-5-21-RID)

Trusted Computing Base

All other users, isolated from each other
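
The sharing rule from these slides (no sharing from low to high, only from the TCB to everyone or among peers) can be sketched as a small predicate. This is an illustration only: the function name `may_load_cache` and the comparison of raw SID strings are assumptions, not DynamoRIO's actual interface.

```python
# SIDs of the Trusted Computing Base accounts listed on the slide.
TCB_SIDS = {
    "S-1-5-18",  # NT AUTHORITY System
    "S-1-5-19",  # NT AUTHORITY LocalService
    "S-1-5-20",  # NT AUTHORITY NetworkService
}


def may_load_cache(producer_sid: str, consumer_sid: str) -> bool:
    """Hypothetical check: may a process running as consumer_sid use a
    code cache that was produced by producer_sid?"""
    if producer_sid in TCB_SIDS:
        # High-to-low sharing: TCB output is usable by everyone.
        return True
    # Peer-to-peer sharing: a regular user only trusts its own caches.
    return producer_sid == consumer_sid
```

A cache produced by a regular user is never accepted by a TCB process or by a different regular user, which is the property that rules out privilege escalation in this sketch.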


Two-Level Hierarchy

Trusted Computing Base

services.exe

lsass.exe

RpcSs

SvcHost.exe NetSvcs

NetworkService

explorer.exe

firefox.exe

excel.exe

explorer.exe

iexplore.exe

winword.exe

All Other Users


Limit Code Modifiability

Use protected directories

All code cache files kept in directories writable only by the TCB

Users create and merge new caches in user-writable directories

Limited-privilege TCB-launched process verifies user-written files and publishes official files

TCB service watches for new user-written files

Agent that verifies and publishes is not full TCB: only input is new user file, only output is inherited file handle for published file target
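
A minimal sketch of the verify-and-publish agent described above, assuming it is handed one readable stream (the new user file) and one writable stream (the inherited handle for the published target). The name `verify_and_publish` and the `verify` callback are hypothetical; the actual verification logic is not shown on the slides.

```python
import io
from typing import BinaryIO, Callable


def verify_and_publish(user_file: BinaryIO, published: BinaryIO,
                       verify: Callable[[bytes], bool]) -> bool:
    """Write the user-written cache to the published handle only if the
    (hypothetical) verification callback accepts its contents."""
    data = user_file.read()
    if not verify(data):
        return False  # Nothing is written for a rejected file.
    published.write(data)
    return True


# In-memory streams stand in for the two file handles.
src = io.BytesIO(b"cache-bytes")
dst = io.BytesIO()
ok = verify_and_publish(src, dst, lambda data: data.startswith(b"cache"))
```

Restricting the agent to these two handles mirrors the slide's point that it needs no further privileges: it cannot choose where the published file lands.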


Code Cache Consistency

Original libraries are not unchanging

Application updates

Local tools (rebasing, etc.)

Code cache file must be invalidated if its source application file has changed

Cache filename based on module version for initial check, and to support multiple simultaneous versions
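
One way to picture a version-based cache filename is the sketch below. The naming scheme and the function `cache_filename` are assumptions made for illustration; the slides do not give the actual format.

```python
def cache_filename(module_name: str, module_version: str) -> str:
    """Hypothetical naming scheme: one cache file per module version."""
    return f"{module_name.lower()}-{module_version}.cache"
```

Because the version is part of the name, caches for two versions of the same library can coexist, and a mismatched version is simply a file that is not found.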


Consistency Checks

Offline byte-by-byte prior to publishing

Avoid code modifiability vectors

Online checksum comparisons

Support legitimate application changes

Detect disk corruption

In our threat model, attackers with write access to TCB-owned files are cause for far more worry than modification of code cache files
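
The online checksum comparison can be sketched as follows. The choice of SHA-256 and the function names are assumptions; the slides do not state which checksum is used or which bytes it covers.

```python
import hashlib


def module_checksum(module_bytes: bytes) -> str:
    """Checksum of the source module (SHA-256 is an assumed choice)."""
    return hashlib.sha256(module_bytes).hexdigest()


def cache_is_consistent(stored_checksum: str, module_bytes: bytes) -> bool:
    """Compare the checksum stored with the cache against the module as
    it exists now; a mismatch means the cache must be invalidated."""
    return stored_checksum == module_checksum(module_bytes)
```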


Checksum Costs


Re-Design Code Cache

Read-only cache

For file-based sharing and security

Position-independence of cache and data structures

Eliminate and/or combine data structures to remove pointers

Platform and execution dependencies

Micro-architectural dependencies: cache line, etc.

TLS offsets
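
To illustrate what removing pointers buys, the sketch below stores entry points as offsets from the cache base instead of absolute addresses, so the same table stays valid wherever the cache is mapped. The table shape (module-relative application offset to cache offset) and both function names are assumptions.

```python
from typing import Dict


def to_offsets(entry_points: Dict[int, int], cache_base: int) -> Dict[int, int]:
    """Turn absolute cache addresses into offsets from the cache base."""
    return {rva: addr - cache_base for rva, addr in entry_points.items()}


def lookup(offsets: Dict[int, int], module_rva: int, cache_base: int) -> int:
    """Resolve an entry point for a cache mapped at cache_base."""
    return cache_base + offsets[module_rva]
```

A table built while the cache lived at one base address can be used unchanged by another process that maps the file elsewhere, which is what allows it to stay read-only.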


Data Structures

Existing code cache has fine-grained control

Individual code fragment unlink and removal

Separate data structure per code fragment and each of its exits, memory regions spanned, and incoming links; plus, backpointer from its cache slot

Many separate, writable, variable-sized, inter-linked structures: complex to persist!


Reduce Code Cache Granularity

Switch to coarse-grain scheme

Give up individual code fragment control

Permanent intra-cache links

No per-fragment data structures at all

Treat entire cache as a unit for consistency

Side benefit: reduce single-application memory usage


Persisted File Layout

code cache

exit/lookup indirection pads

R(W)X

relocation data

inter-module link stubs

R

RX

hashtable of entry points

checksums

header

R(W)X


Support Dynamism

Relocation

Use application library relocation tables

Add reloc entries for our own code changes that are not easily made position-independent

Becomes more important with Windows Vista ASLR

Application code modifications

Invalidate persisted cache; switch to incremental coarse-grain + fine-grain combination
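
The relocation bullets above amount to patching stored absolute addresses by the difference between the old and new load address. A minimal sketch, assuming 32-bit little-endian absolute addresses and a flat list of relocation offsets (both are simplifications, and `apply_relocations` is a hypothetical name):

```python
import struct
from typing import Iterable


def apply_relocations(image: bytearray, reloc_offsets: Iterable[int],
                      old_base: int, new_base: int) -> None:
    """Add the base-address delta to every relocated 32-bit field."""
    delta = new_base - old_base
    for off in reloc_offsets:
        (value,) = struct.unpack_from("<I", image, off)
        # Mask to 32 bits so the patched value always fits the field.
        struct.pack_into("<I", image, off, (value + delta) & 0xFFFFFFFF)
```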


Adaptive Level of Granularity

Start with coarse-grain caches + sharing/persistence

Switch to fine-grain for individual modules or sub-regions of modules after significant consistency events, to avoid expensive entire-module flushes

Support simultaneous fine-grain fragments within coarse-grain regions for corner cases

Match amount of bookkeeping to amount of code change

Majority of application code does not need fine-grain
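
The adaptive policy can be pictured as a per-module state that starts coarse and switches to fine only after consistency events. The class below is a sketch under assumptions: the name `ModuleCache`, the event counter, and the threshold parameter are all hypothetical, since the slides say only "significant consistency events".

```python
from dataclasses import dataclass


@dataclass
class ModuleCache:
    name: str
    granularity: str = "coarse"  # coarse-grain caches are shareable/persistable
    consistency_events: int = 0

    def on_consistency_event(self, threshold: int = 1) -> None:
        """Record a consistency event; once the (assumed) threshold is
        reached, fall back to fine-grain management for this module."""
        self.consistency_events += 1
        if self.consistency_events >= threshold:
            self.granularity = "fine"
```

Modules that never see a consistency event stay coarse and keep the low bookkeeping cost, which matches the observation that most application code does not need fine-grain control.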


Support Instrumentation

Preserve instrumentation when persisting

Tools provide relocation info, or produce PIC

Dynamically-varying tools specify do-not-persist

Add tool name to file header and namespace

Only load file that matches current tool

Typical tool deployment is the same tool system-wide, rather than a disparate set of simultaneous tools
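
A sketch of the tool-name check described above, assuming the header is available as a simple mapping with a `tool` field and that the namespace is a per-tool subdirectory. Both the field name and the path layout are assumptions.

```python
from typing import Mapping


def cache_path(cache_dir: str, tool_name: str, cache_file: str) -> str:
    """Hypothetical namespace: each tool gets its own subdirectory."""
    return "/".join([cache_dir, tool_name, cache_file])


def may_load(header: Mapping[str, str], current_tool: str) -> bool:
    """Load a persisted file only if it was produced under the same tool."""
    return header.get("tool") == current_tool
```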


System-wide Deployment

Windows XP desktop: boot + auto-logon

Peak committed memory usage once idle

27 processes executed under 4 different users


Desktop Startup: Memory


Desktop Startup: Time


Desktop Startup: Time Breakdown


Related Work: One or the Other

Process Sharing

Czajkowski 02: Inter-JVM sharing

Transitive: translations not persisted due to security concerns

Bungale 07 (PinOS): below OS, so can share at machine page level, but virtual address differences require expensive checks

Persistence

Static instrumentation tools (ATOM, Etch, EEL, etc.)

Hazelwood 03: persistence study

Li 05: persistence across module unloads

Reddi 05, 07: inter-execution persistence in Pin


Related Work: Both

FX!32: per-module persistent translations

Central service translates offline using profile info

.NET NGEN pre-compiler

Shares only cryptographically signed code; if not installed centrally, performs expensive runtime verification

Background service that tracks dependencies and re-compiles as needed, to support inlining


Summary: Improved Scalability

Design for inter-process sharing of code caches that also supports inter-execution persistence

Scheme for sharing without risk of privilege escalation and with read-only code caches and data structures

Evaluation in DynamoRIO where we achieved a two-thirds reduction in both memory usage (scalability) and startup time
