An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University Lecture 1 October 29, 2001 ConCert Meeting

Page 1: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

An Introduction to Proof-Carrying Code

Peter Lee
Carnegie Mellon University

Lecture 1

October 29, 2001

ConCert Meeting

Page 2: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Plan

Today: Show and tell.
Cartoons
Some history
Special J compiler
Demo

Next time: Technical details.
LFi and Oracle-based checking
Safety policies
Compiler strategy and annotations
Engineering considerations
Ideas for ConCert-related projects

Page 3: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University
Page 4: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Ariane 5

On June 4, 1996, the Ariane 5 took off on its maiden flight.

40 seconds into its flight it veered off course and exploded.

It was later found to be an error in reuse of a software component.

For the next two years, virtually every research presentation used this picture.

Page 5: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

“Better, Faster, Cheaper”

In 1999, NASA lost both the Mars Polar Lander and the Climate Orbiter.

Later investigations determined software errors were to blame.

Orbiter: Component reuse error.

Lander: Precondition violation.

Page 6: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

USS Yorktown

“After a crew member mistakenly entered a zero into the data field of an application, the computer system proceeded to divide another quantity by that zero. The operation caused a buffer overflow, in which data leaked from a temporary storage space in memory, and the error eventually brought down the ship's propulsion system. The result: the Yorktown was dead in the water for more than two hours.”

Page 7: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Programmable mobile devices

By 2003, one in five people will own a mobile communications device.

Nokia expects to sell 500M Java-enabled phones in 2003.

Most of these devices will be power and memory limited.

Page 8: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Security Attacks

According to CERT, the majority of security attacks exploit

input validation failure

buffer overflow

VBS

http://www.cert.org/summaries/CS-2000-04.html

Page 9: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

BSOD embarrassments

Page 10: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Observations

Failures often due to simple problems “in the details.”

Reuse is critical but perilous.

Performance still matters a lot.

Page 11: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Safety Engineering

Small theorems about large programs would be useful.

Need clearly specified interfaces and checking of interface compliance.

Must not sacrifice performance.

Page 12: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

The Code Safety Problem

Please install and execute this.

Page 13: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Code Safety

CPU

Code

Trusted Host

Is this safe to execute?

Page 14: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Approach 4: Formal Verification

Theorem Prover

CPU

Code

Trusted Host

Flexible and powerful.

But really really really hard and must be correct.

Page 15: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

A Key Idea: Explicit Proofs

Certifying Prover

CPU

Proof Checker

Code

Proof

Trusted Host

Page 16: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

A Key Idea: Explicit Proofs

Certifying Prover

CPU

Code

Proof

No longer need to trust this component.

Proof Checker

Page 17: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Proof-Carrying Code [Necula & Lee, OSDI ’96]

A

B

Formal proof or “explanation” of safety

Typically native or VM code

rlrrllrrllrlrlrllrlrrllrrll…

Page 18: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Proof-Carrying Code

Certifying Prover

CPU

Code

Proof

Simple, small (<52KB), and fast.

No longer need to trust this component.

Proof Checker

Reasonable in size (0-10%).

Page 19: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Automation via Certifying Compilation

Certifying Compiler

CPU

Looks and smells like a compiler.

% spjc foo.java bar.class baz.c -ljdk1.2.2

Source code

Proof

Object code

Certifying Prover

Proof Checker

Page 20: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

The Role of Programming Languages

Civilized programming languages can provide “safety for free”.

Well-formed/well-typed ⇒ safe.

Idea: Arrange for the compiler to “explain” why the target code it generates preserves the safety properties of the source program.

Page 21: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

The Role of Java in this Short Course

In recent years, Java has been the main focus of my work.

Java is just barely a civilized programming language.

We routinely do better than this.

Page 22: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Java

Java is probably a worthwhile subject of research.

However, it contains many outrageous and mostly inexcusable design errors.

As researchers, we should not forget that we have already done much better, and must continue to do better in the future.

Page 23: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Note

Our current approach seems to work for many problems.

But it is the only one we have tried — there are many others.

PCC is a general concept and we have just barely scratched the surface.

Page 24: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Overview of Our Approach

Please install and execute this.

OK, but let me quickly look over the instructions first.

Code producer Host

Page 25: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Overview of Our Approach

Code producer Host

Page 26: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Overview of Our Approach

This store instruction is dangerous!

Code producer Host

Page 27: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Overview of Our Approach

Can you prove that it is always safe?

Code producer Host

Page 28: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Overview of Our Approach

Can you prove that it is always safe?

Yes! Here’s the proof I got from my certifying Java compiler!

Code producer Host

Page 29: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Overview of Our Approach

Your proof checks out. I believe you because I believe in logic.

Code producer Host

Page 30: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Some History

Page 31: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

History: early 90’s

Fox project starts building the FoxNet

Need to control memory layout of data
Words, bytes, etc. (endianness? alignment?)
Boxed vs unboxed data (efficiency? control?)
Packet headers (how to write packet filters?)

ML not expressive enough, and compiler technology is inadequate

Harper invents intensional polymorphism, typed intermediate languages, and type-directed compiling

Biagioni, et al., extend SML design

Page 32: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

History: mid 90’s

Question: Can these ideas be used in a “production-quality” compiler for a big language like ML?

Morrisett and Tarditi build TIL
General hints on IL design
Encouraging signs that optimizations are OK

Stone and Harper design the MIL

Lots of work, world-wide, on type-directed compiling

Work begins on TILT

Page 33: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

History: mid 90’s

An easy observation in 1995:
Types in TIL are not carried all the way down to the final target code
The idea of enclosing LF encodings of proofs with code is “floating around”

Lee and Necula work on this, but get nowhere
Many problems, such as optimizations

Necula goes to DEC SRC to intern with Detlefs and Nelson

Works on extending ESC to catch memory leaks in Modula-3 programs

The next Fall, takes Frank’s Constructive Logic course

Page 34: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

History: 1996

Necula and Lee write several standard BPF packet filters in hand-optimized Alpha assembly code.

Simple operational semantics for a core “safe Alpha”

– Checks safety conditions for each instruction execution

Proof system for “real Alpha”

– Encoded in LF
– Proofs generated and checked using Elf

Results in “self-certified code”, later “proof-carrying code”

Plus proof representations, certifying compilation, safety policies (incl. resource bounds)

Inspires significant follow-on and new work at Cornell, Princeton, INRIA, and many other places

Page 35: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

History: 1999

CMU releases PCC to Cedilla Systems Incorporated.

Patent 6,128,774. Oct. 2000, Safe to execute verification of software (Necula and Lee)

Patent 6,253,370. June 2001, Method and apparatus for annotating a computer program to facilitate subsequent processing of the program (Abadi, Ghemawat, and Stata)

In less than 26 months, a complete optimizing “ahead-of-time” PCC compiler for Java.

Page 36: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

“Applets, Not Craplets”

Page 37: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

History: Today

Strong similarities in TILT, PCC, TAL, …

Compiler design is changing

Some day, all compilers will be certifying

Page 38: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

History: Today

Are proofs really necessary?

Probably not

And they are messy, compared to types

But as a verification mechanism, proofchecking seems to have some possibly significant engineering advantages over typechecking

Page 39: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

The primary contribution

“Proof engineering”.

PCC more clearly defined the proof-engineering problem

How to do checking
with minimal overhead and restriction on programs,
with minimal time and space overhead in checking,
with minimal size and complexity of the checker, and
with minimal need for changes when the proof system changes

Page 40: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

K Virtual Machine

Designed to support the CLDC.

Must fit into <128KB.

Must have fast bytecode verification.

kJava class files must be Java-compatible.

Divides bytecode verification into two stages.

Page 41: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

kJava and KVM

kJava Compiler

CPU

Sourcecode

Annot

Bytecodes

kJava Preverifier

Verifier

Page 42: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

KVM Verification

“Preverification” is performed by the code producer.

Uses global (iterative) analysis to compute the types of stack slots and local vars at every join point.

Second stage is performed by class loader.

Simple linear scan verifies correctness of join-point annotations.

Page 43: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

KVM Example [from Frank Yellin]

static void test(Long x) {
    Number y = x;
    while (y.intValue() != 0) {
        y = nextValue(y);
    }
}

 0. aload_0
 1. astore_1
 2. goto 10
    Long Number | <>
 5. aload_1
 6. invokeStatic nextValue(Number)
 9. astore_1
    Long Number | <>
10. aload_1
11. invokeVirtual intValue()
14. ifne 5
17. return

Join-point typing annotations
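The two-stage scheme can be sketched in a few lines of Python. This is a toy model and not the KVM verifier: the opcode names mirror the example above, but the tuple encoding of instructions, the `verify` function, the `Top` marker for an uninitialized local, and the `SUBTYPES` and `METHODS` tables are all assumptions made for illustration. The point is only that, once the join-point types are supplied, a single forward scan suffices.

```python
# Toy second-stage verifier: one linear pass, join-point types are supplied.
SUBTYPES = {("Long", "Number")}           # assumed class hierarchy
METHODS = {                               # assumed signatures: (operand types, result)
    "nextValue": (("Number",), "Number"),
    "intValue": (("Number",), "int"),     # receiver is treated as the operand
}

def assignable(src, dst):
    return src == dst or (src, dst) in SUBTYPES

def state_fits(state, annotation):
    (locs, stack), (alocs, astack) = state, annotation
    return (len(locs) == len(alocs) and len(stack) == len(astack)
            and all(assignable(s, d) for s, d in zip(locs, alocs))
            and all(assignable(s, d) for s, d in zip(stack, astack)))

def verify(code, annotations, entry_locals):
    """Return True if every instruction type-checks in a single scan."""
    state = (tuple(entry_locals), ())
    for offset, op, arg in code:
        if offset in annotations:
            # Falling through into a join point must agree with its annotation.
            if state is not None and not state_fits(state, annotations[offset]):
                return False
            state = annotations[offset]
        if state is None:
            return False                  # code after a jump needs an annotation
        locs, stack = list(state[0]), list(state[1])
        if op == "aload":
            stack.append(locs[arg])
        elif op == "astore":
            if not stack:
                return False
            locs[arg] = stack.pop()
        elif op in ("invokestatic", "invokevirtual"):
            params, result = METHODS[arg]
            for p in reversed(params):
                if not stack or not assignable(stack.pop(), p):
                    return False
            stack.append(result)
        elif op in ("goto", "ifne"):
            if op == "ifne" and (not stack or stack.pop() != "int"):
                return False
            target = annotations.get(arg)  # every branch target is a join point
            if target is None or not state_fits((tuple(locs), tuple(stack)), target):
                return False
        state = None if op in ("goto", "return") else (tuple(locs), tuple(stack))
    return True

CODE = [
    (0, "aload", 0), (1, "astore", 1), (2, "goto", 10),
    (5, "aload", 1), (6, "invokestatic", "nextValue"), (9, "astore", 1),
    (10, "aload", 1), (11, "invokevirtual", "intValue"), (14, "ifne", 5),
    (17, "return", None),
]
JOIN = {5: (("Long", "Number"), ()), 10: (("Long", "Number"), ())}
BAD_JOIN = {5: (("Long", "Long"), ()), 10: (("Long", "Long"), ())}
```

With `JOIN` the scan accepts the method; with `BAD_JOIN`, which claims local 1 is still a `Long` after the call to `nextValue`, the fall-through into offset 10 no longer matches and the scan rejects it. No fixed-point iteration is needed in either case.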

Page 44: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

KVM Verification

The second stage verifier is a 10KB program that requires

a single scan of the code, and

<100 bytes of run-time storage.

Impressive!

This is Java verification done right.

Page 45: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Join-Point Annotations

All of these approaches to certified code make use of join-point typing annotations to reduce code verification to a simple problem.

They are essentially the classical loop invariants of the Dijkstra/Hoare program verification approach.

Page 46: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Overheads

In TAL and PCC we observe relatively large annotation sizes (~10-20%), sometimes much more.

Unknown for kJava.

Research question:

Can we reduce this size?

Checking speed and storage space are also problems.

Page 47: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

The Special J Compiler

Page 48: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

High-Level Architecture

Explanation

Code

Verification condition generator

Checker

Safety policy

Agent

Host

Page 50: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

The VCGen

The verification condition generator (VCGen) examines each instruction.

It is a symbolic evaluator that essentially implements the operational semantics of a “safe” version of the machine language.

It checks some simple properties directly. E.g., direct jumps go to legal addrs.

Informally, it invokes the Checker when “dangerous” instructions are encountered.

Page 51: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

The VCGen, cont’d

Examples of dangerous instructions:

memory operations

procedure calls

procedure returns

For each such instruction, VCGen creates a verification condition (VC).
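A minimal sketch of this idea in Python, assuming a made-up straight-line instruction set (`mov`, `addi`, `load`, `store`) rather than real x86 or Alpha. Registers hold symbolic expressions instead of values, and each memory operation emits one verification condition. The predicate names `saferd4` and `safewr4` and the operators `sel4` and `upd` are borrowed from later slides; the function `vcgen` and the tuple encoding are hypothetical.

```python
def vcgen(code, initial_regs, mem="rm_1"):
    """Symbolically evaluate straight-line code; return one VC per memory op."""
    regs = dict(initial_regs)            # register name -> symbolic expression
    vcs = []
    for instr in code:
        op = instr[0]
        if op == "mov":                  # mov dst, src
            _, dst, src = instr
            regs[dst] = regs[src]
        elif op == "addi":               # addi dst, src, constant
            _, dst, src, k = instr
            regs[dst] = ("add", regs[src], k)
        elif op == "load":               # load dst, [addr]  (dangerous)
            _, dst, addr = instr
            vcs.append(("saferd4", regs[addr]))
            regs[dst] = ("sel4", mem, regs[addr])
        elif op == "store":              # store [addr], src  (dangerous)
            _, addr, src = instr
            vcs.append(("safewr4", regs[addr], regs[src]))
            mem = ("upd", mem, regs[addr], regs[src])
        else:
            raise ValueError(f"instruction not in the safe subset: {op}")
    return vcs

EXAMPLE = [("addi", "eax", "ebx", 4), ("load", "ecx", "eax"), ("store", "eax", "ecx")]
EXAMPLE_VCS = vcgen(EXAMPLE, {"ebx": "src_1"})
```

Here `EXAMPLE_VCS` contains two conditions, one for the read and one for the write, each phrased over the symbolic input `src_1` so that it covers every concrete execution. Branches, calls, and loop invariants are left out of this sketch.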

Page 53: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

The Checker

When given a VC, the Checker attempts to determine its validity.

Sometimes, it consults the “explanation” for help with this.

If successful, it allows VCGen to proceed.

The set of allowable VCs and their valid proofs is defined by the safety policy.

Page 55: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

The Safety Policy

The safety policy is defined by an inference system that defines

the language of predicates (for VCs)
the axioms and inference rules for writing valid proofs of VCs.
specifications (pre/post-conditions) for each required entry point in the code.

Page 56: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Operational Semantics

The VCGen is derived (by hand) directly from the operational semantics of a “safe machine”.

The calls to the checker establish that the code always makes progress (or halts normally) in the operational semantics.

This leads to a standard notion of soundness.

Page 57: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

What Can’t Be Enforced?

Liveness properties currently cannot be enforced by this architecture.

In practice, however, safety properties are often “good enough”.

Page 58: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Architecture

Code producer Host

Ginseng

Native code

Proof

Special J

Java binary

~52KB, written in C

Written in OCaml

Page 59: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Annotations

Architecture

Code producer Host

Proof checker

VCGen

Axioms

Native code

Proof

VC

Special J

Java binary

Page 60: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Annotations

Architecture

Code producer Host

Java binary

Proof generator

Proof checker

VCGen

Axioms

Axioms

Certifying compiler

VCGen

VC

Native code

Proof

VC

Page 61: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Java Virtual Machine

JVM

Java Verifier

JNI

Class file Class file

Native code

Proof-carrying code

Checker

Page 62: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Show either the Mandelbrot or NBody3D demo.

Page 63: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Crypto Test Suite Results [Cedilla Systems]

[Bar chart: running time in seconds, on a scale of 0 to 0.9, for Cedilla, Java, and JIT.]

On average, 72.8% faster than Java, 37.5% faster than Java with a JIT.

Page 64: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Java Grande Suite v2.0 [Cedilla Systems]

[Bar chart: running time in seconds, on a scale of 0 to 700, for Cedilla, Java, and JIT.]

Page 65: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Java Grande Bench Suite [Cedilla Systems]

[Bar chart: ops, on a scale of 0 to 5,000,000, on the arith, assign, and method benchmarks for Cedilla, Java, and JIT.]

Page 66: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Ginseng

VCGen

Checker

Safety Policy

Dynamic loading

Cross-platform support

~15KB, roughly similar to a KVM verifier (but with floating-point).

~4KB, generic.

~19KB, declarative and machine-generated.

~22KB, some optional.

Page 67: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Example: Source Code

public class Bcopy {
    public static void bcopy(int[] src, int[] dst) {
        int l = src.length;
        int i = 0;

        for (i = 0; i < l; i++) {
            dst[i] = src[i];
        }
    }
}

Page 68: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Example: Target Code

ANN_LOCALS(_bcopy__6arrays5BcopyAIAI, 3)
.text
.align 4
.globl _bcopy__6arrays5BcopyAIAI
_bcopy__6arrays5BcopyAIAI:
    cmpl $0, 4(%esp)
    je L6
    movl 4(%esp), %ebx
    movl 4(%ebx), %ecx
    testl %ecx, %ecx
    jg L22
    ret
L22:
    xorl %edx, %edx
    cmpl $0, 8(%esp)
    je L6
    movl 8(%esp), %eax
    movl 4(%eax), %esi
L7:
    ANN_LOOP(INV = {
        (csubneq ebx 0),
        (csubneq eax 0),
        (csubb edx ecx),
        (of rm mem)},
      MODREG = (EDI,EDX,EFLAGS,FFLAGS,RM))
    cmpl %esi, %edx
    jae L13
    movl 8(%ebx, %edx, 4), %edi
    movl %edi, 8(%eax, %edx, 4)
    incl %edx
    cmpl %ecx, %edx
    jl L7
    ret
L13:
    call __Jv_ThrowBadArrayIndex
    ANN_UNREACHABLE
    nop
L6:
    call __Jv_ThrowNullPointer
    ANN_UNREACHABLE
    nop

Page 69: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Cut Points

Each loop entry must be annotated as a cut point.

VCGen requires this so that checking can be performed in a single scan of the code.

As a convenience, the modified registers are also declared in the cut annotations.

Page 72: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

A Note about Memory

We define a type for valid heap memory states:

mem : exp

and operators for reading and writing heap memory:

(sel M A)

(upd M A E)
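The intended reading of the two operators can be sketched with a concrete model in Python. In the checker, memory states are symbolic terms; the dictionary-based model below is only an assumption used to illustrate the usual read-over-write laws, and the function names simply reuse `sel` and `upd`.

```python
def sel(m, a):
    """Contents of memory state m at address a."""
    return m[a]

def upd(m, a, e):
    """New memory state equal to m except that address a now holds e."""
    new = dict(m)        # the old state is left untouched
    new[a] = e
    return new

M0 = {0x10: 7, 0x14: 9}
M1 = upd(M0, 0x10, 42)
```

Reading back the updated address gives the new value, reading any other address gives what the old state held, and `M0` itself is unchanged. That last point is why the VCGen can keep referring to earlier memory states such as `rm_1` after a store.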

Page 73: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

The VCGen Process (1)

_bcopy__6arrays5BcopyAIAI:
    cmpl $0, src
    je L6
    movl src, %ebx
    movl 4(%ebx), %ecx
    testl %ecx, %ecx
    jg L22
    ret
L22:
    xorl %edx, %edx
    cmpl $0, dst
    je L6
    movl dst, %eax
    movl 4(%eax), %esi
L7: ANN_LOOP(INV = …

A0 = (type src_1 (jarray jint))
A1 = (type dst_1 (jarray jint))
A2 = (type rm_1 mem)
A3 = (csubneq src_1 0)
ebx := src_1
ecx := (sel4 rm_1 (add src_1 4))
A4 = (csubgt (sel4 rm_1 (add src_1 4)) 0)
edx := 0
A5 = (csubneq dst_1 0)
eax := dst_1
esi := (sel4 rm_1 (add dst_1 4))

Page 74: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

The VCGen Process (2)

L7: ANN_LOOP(INV = {
        (csubneq ebx 0),
        (csubneq eax 0),
        (csubb edx ecx),
        (of rm mem)},
      MODREG = (EDI, EDX, EFLAGS, FFLAGS, RM))
    cmpl %esi, %edx
    jae L13
    movl 8(%ebx,%edx,4), %edi
    movl %edi, 8(%eax,%edx,4)
    …

A3
A5
A6 = (csubb 0 (sel4 rm_1 (add src_1 4)))
edi := edi_1
edx := edx_1
rm := rm_2
A7 = (csubb edx_1 (sel4 rm_2 (add dst_1 4)))
!!Verify!! (saferd4 (add src_1 (add (imul edx_1 4) 8)))

Page 75: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

The Checker (1)

The checker is asked to verify that

(saferd4 (add src_1 (add (imul edx_1 4) 8)))

under assumptions

A0 = (type src_1 (jarray jint))
A1 = (type dst_1 (jarray jint))
A2 = (type rm_1 mem)
A3 = (csubneq src_1 0)
A4 = (csubgt (sel4 rm_1 (add src_1 4)) 0)
A5 = (csubneq dst_1 0)
A6 = (csubb 0 (sel4 rm_1 (add src_1 4)))
A7 = (csubb edx_1 (sel4 rm_2 (add dst_1 4)))

The checker looks in the PCC for a proof of this VC.

Page 76: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

The Checker (2)

In addition to the assumptions, the proof may use axioms and proof rules defined by the host, such as

szint : pf (size jint 4)

rdArray4: {M:exp} {A:exp} {T:exp} {OFF:exp}
          pf (type A (jarray T)) ->
          pf (type M mem) ->
          pf (nonnull A) ->
          pf (size T 4) ->
          pf (arridx OFF 4 (sel4 M (add A 4))) ->
          pf (saferd4 (add A OFF)).

Page 77: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Checker (3)

A proof for

(saferd4 (add src_1 (add (imul edx_1 4) 8)))

in the Java specification looks like this (excerpt):

(rdArray4 A0 A2 (sub0chk A3) szint (aidxi 4 (below1 A7)))

This proof can be easily validated via LF type checking.
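To give a feel for what validating such a proof term involves, here is a small Python sketch that checks a proof of the same shape bottom-up. It is not LF type checking: there is no unification of implicit arguments, and the readings given to `sub0chk` (non-null from `csubneq E 0`) and `below1` (`below` from `csubb`) are assumptions, since the slides do not show those rules. The hypothesis names `H_arr`, `H_mem`, `H_nn`, `H_idx` and the variables `a`, `m`, `i` are likewise hypothetical and chosen so that the premises line up exactly.

```python
def check(proof, hyps):
    """Return the formula that `proof` establishes, or raise ValueError."""
    if isinstance(proof, str):
        if proof == "szint":                       # axiom: szint : pf (size jint 4)
            return ("size", "jint", 4)
        if proof in hyps:
            return hyps[proof]
        raise ValueError(f"unknown hypothesis or axiom: {proof}")
    rule, *args = proof
    if rule == "sub0chk":                          # assumed: csubneq E 0 gives nonnull E
        (sub,) = args
        tag, e, zero = check(sub, hyps)
        if tag != "csubneq" or zero != 0:
            raise ValueError("sub0chk premise does not match")
        return ("nonnull", e)
    if rule == "below1":                           # assumed: csubb I LEN gives below I LEN
        (sub,) = args
        tag, i, length = check(sub, hyps)
        if tag != "csubb":
            raise ValueError("below1 premise does not match")
        return ("below", i, length)
    if rule == "aidxi":                            # array index rule, explicit SIZE argument
        size, sub = args
        tag, i, length = check(sub, hyps)
        if tag != "below":
            raise ValueError("aidxi premise does not match")
        return ("arridx", ("add", ("imul", i, size), 8), size, length)
    if rule == "rdArray4":
        p_arr, p_mem, p_nn, p_size, p_idx = (check(a, hyps) for a in args)
        t1, a, arr_ty = p_arr
        t2, m, mem_ty = p_mem
        t3, a2 = p_nn
        t4, elem, size = p_size
        t5, off, width, length = p_idx
        ok = (t1 == "type" and arr_ty == ("jarray", elem)
              and t2 == "type" and mem_ty == "mem"
              and t3 == "nonnull" and a2 == a
              and t4 == "size" and size == 4
              and t5 == "arridx" and width == 4
              and length == ("sel4", m, ("add", a, 4)))
        if not ok:
            raise ValueError("rdArray4 premises do not match")
        return ("saferd4", ("add", a, off))
    raise ValueError(f"unknown rule: {rule}")

def validates(proof, hyps, goal):
    try:
        return check(proof, hyps) == goal
    except ValueError:
        return False

HYPS = {
    "H_arr": ("type", "a", ("jarray", "jint")),
    "H_mem": ("type", "m", "mem"),
    "H_nn": ("csubneq", "a", 0),
    "H_idx": ("csubb", "i", ("sel4", "m", ("add", "a", 4))),
}
GOAL = ("saferd4", ("add", "a", ("add", ("imul", "i", 4), 8)))
PROOF = ("rdArray4", "H_arr", "H_mem", ("sub0chk", "H_nn"), "szint",
         ("aidxi", 4, ("below1", "H_idx")))
```

The checker never searches for anything: it follows the structure of the proof term and confirms at each node that the premises have the form the rule demands. A proof that skips the non-null step, for example by passing `H_nn` directly where `rdArray4` expects a `nonnull` fact, is rejected.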

Page 78: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

VCGen Summary

VCGen is a symbolic evaluator for the object language.

It essentially implements a reference interpreter, except:

it uses symbolic values in order to model all possible executions, and

instead of performing run-time checks, it asks a Checker to verify the safety of “dangerous” instructions.

Page 79: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Safety Policies

More formally, we begin by defining the small-step operational semantics of a machine (called the s86).

, , pc instr ’, pc’

We define the machine so that only safe executions are defined.

program

register state

program counter

Page 80: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Safety Policies, cont’d

For convenience we choose the s86 to be a restriction of the x86.

Hence all s86 programs will execute faithfully on a real x86.

Except that on some programs for which the s86 has no defined execution, the x86 might do something weird.

The goal then is to prove that any given program always makes progress (or returns) in the s86.

With such a proof, the x86 is then just as good as an s86.
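The notion of a machine on which only safe executions are defined can be sketched as a partial step function. The Python below uses a hypothetical three-instruction machine (`load`, `store`, `addi`, plus `ret`), not the s86; `step` returns `None` exactly when no safe transition exists, and `run` reports whether a program returns or gets stuck.

```python
def step(code, regs, mem, pc):
    """One safe-machine step: the next (regs, mem, pc), or None if undefined."""
    op, *args = code[pc]
    if op == "load":                       # load dst, [addr]
        dst, addr = args
        if regs[addr] not in mem:          # reading outside valid memory: no rule applies
            return None
        regs = {**regs, dst: mem[regs[addr]]}
    elif op == "store":                    # store [addr], src
        addr, src = args
        if regs[addr] not in mem:          # writing outside valid memory: no rule applies
            return None
        mem = {**mem, regs[addr]: regs[src]}
    elif op == "addi":                     # addi dst, src, constant
        dst, src, k = args
        regs = {**regs, dst: regs[src] + k}
    else:
        return None
    return regs, mem, pc + 1

def run(code, regs, mem):
    """Return 'returned' if the program reaches ret, 'stuck' otherwise."""
    pc = 0
    while pc < len(code):
        if code[pc][0] == "ret":
            return "returned"
        nxt = step(code, regs, mem, pc)
        if nxt is None:
            return "stuck"
        regs, mem, pc = nxt
    return "stuck"                         # fell off the end of the code

MEM = {100: 1, 104: 2}
SAFE = [("load", "eax", "ebx"), ("addi", "ebx", "ebx", 4),
        ("store", "ebx", "eax"), ("ret",)]
UNSAFE = [("addi", "ebx", "ebx", 8), ("load", "eax", "ebx"), ("ret",)]
```

On `SAFE` the machine always has a next step until it returns. On `UNSAFE` it gets stuck at the out-of-range load, which is the point at which a real processor would go on to do something unspecified. A progress proof rules out the second kind of program before it ever runs.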

Page 81: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Verification Conditions

The point of the verification conditions, then, is to provide such progress theorems for each instruction in the program.

In other words, a VC’s validity says that the corresponding instruction has a defined execution in the s86 operational semantics.

Page 82: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Symbolic Evaluator

We can define the verification condition generator (VCGen) via a symbolic evaluator

SE,,0,Post(i, , L)

The result of symbolic evaluation is a conjunction of VCs, so the overall progress theorem is then

Pre SE,,0,Post(i, , L)

LF signature

postcondition

entry point

annotations

Page 83: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Soundness

For particular operational semantics (a safe x86 and a safe Alpha), we have presented theorems that say, essentially:

Thm: If Pre SE,,0,Post(i, , L), then execution of , given Pre and 0, and starting from entry point i, will always make progress (or return).

Page 84: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Getting from Concept to Implementation

In an actual implementation, it is also handy to have a bit more than just a VC generator.

Precise syntax for VCs.

Pre/post-conditions for each entry point expected by the host in any downloaded code.

Precisely specified logical system for proving the VCs.

Verifier for “meta-data.”

Page 85: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Safety Policy Implementations

Safety policies are thus given in four parts:

A verification-condition generator (VCGen).
A specification of the pre & post conditions for all required procedures.
A specification of the inference rules for constructing valid proofs.
Plug-ins for performing meta-data verification.

LF (Elf syntax) is used for the rule and pre/post specifications, C for the VCGen and plug-ins.

Page 86: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

C?!@$#@!

The use of C to define and implement the VCGen is, at best, expedient and at worst dubious.

However, since any code-inspection system must parse object files (not trivial!) and understand the instruction set, this seems to have practical benefits.

Clearly, a more formal approach would be desirable.

Page 87: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

How Do We Know That It’s Right?

Page 88: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

How Do We Know That It’s Right?

Although the papers and dissertation follow a rigorous development leading to a soundness result, in practice it is tempting to hack in new things in the LF signature…

Page 89: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Example Java Type-Safety Specification

Our largest example of a safety-policy specification is for the “SpecialJ” Java native-code compiler.

It contains about 140 inference rules.

Roughly speaking, these rules can be separated into 5 classes.

Page 90: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Safety Policy Rule Excerpts

1. Standard syntax and rules for first-order logic.

Syntax of predicates.

/\  : pred -> pred -> pred.
\/  : pred -> pred -> pred.
=>  : pred -> pred -> pred.
all : (exp -> pred) -> pred.

Type of valid proofs, indexed by predicate.

pf : pred -> type.

Inference rules.

truei : pf true.
andi  : {P:pred} {Q:pred} pf P -> pf Q -> pf (/\ P Q).
andel : {P:pred} {Q:pred} pf (/\ P Q) -> pf P.
ander : {P:pred} {Q:pred} pf (/\ P Q) -> pf Q.

Page 91: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Safety Policy Rule Excerpts

2. Syntax and rules for arithmetic and equality.

=  : exp -> exp -> pred.
<> : exp -> exp -> pred.

eq_le : {E:exp} {E':exp} pf (csubeq E E') -> pf (csuble E E').

moddist+: {E:exp} {E':exp} {D:exp}
          pf (= (mod (+ E E') D) (mod (+ (mod E D) E') D)).

=sym  : {E:exp} {E':exp} pf (= E E') -> pf (= E' E).
<>sym : {E:exp} {E':exp} pf (<> E E') -> pf (<> E' E).

=tr : {E:exp} {E':exp} {E'':exp}
      pf (= E E') -> pf (= E' E'') -> pf (= E E'').

“csuble” means ≤ in the x86 machine.

Page 92: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Safety Policy Rule Excerpts

3. Syntax and rules for the Java type system.

jint    : exp.
jfloat  : exp.
jarray  : exp -> exp.
jinstof : exp -> exp.

of : exp -> exp -> pred.

faddf : {E:exp} {E':exp}
        pf (of E jfloat) -> pf (of E' jfloat) -> pf (of (fadd E E') jfloat).

ext : {E:exp} {C:exp} {D:exp}
      pf (jextends C D) -> pf (of E (jinstof C)) -> pf (of E (jinstof D)).

Page 93: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Safety Policy Sample Rules

4. Rules describing the layout of data structures.

aidxi : {I:exp} {LEN:exp} {SIZE:exp}
        pf (below I LEN) ->
        pf (arridx (add (imul I SIZE) 8) SIZE LEN).

wrArray4: {M:exp} {A:exp} {T:exp} {OFF:exp} {E:exp}
          pf (of A (jarray T)) ->
          pf (of M mem) ->
          pf (nonnull A) ->
          pf (size T 4) ->
          pf (arridx OFF 4 (sel4 M (add A 4))) ->
          pf (of E T) ->
          pf (safewr4 (add A OFF) E).

This “sel4” means the result of reading 4 bytes from heap M at address A+4.

Page 94: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Safety Policy Sample Rules

5. Quick hacks.

nlt0_0 : pf (csubnlt 0 0).
nlt1_0 : pf (csubnlt 1 0).
nlt2_0 : pf (csubnlt 2 0).
nlt3_0 : pf (csubnlt 3 0).
nlt4_0 : pf (csubnlt 4 0).

Sometimes “unclean” things are put into the specification...

Page 95: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

The Basic Trick

Recall the bcopy program:

public class Bcopy {
    public static void bcopy(int[] src, int[] dst) {
        int l = src.length;
        int i = 0;

        for (i = 0; i < l; i++) {
            dst[i] = src[i];
        }
    }
}

Page 96: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Unoptimized Loop Body

L11:
    movl 4(%ebx), %eax
    cmpl %eax, %edx
    jae L24
L17:
    cmpl $0, 12(%ebp)
    movl 8(%ebx, %edx, 4), %esi
    je L21
L20:
    movl 12(%ebp), %edi
    movl 4(%edi), %eax
    cmpl %eax, %edx
    jae L24
L23:
    movl %esi, 8(%edi, %edx, 4)
    movl %edi, 12(%ebp)
    incl %edx
L9:
    ANN_INV(ANN_DOM_LOOP,
        %LF_(/\ (of rm mem ) (of loc1 (jarray jint) ))%_LF,
        RB(EBP,EBX,ECX,ESP,FTOP,LOC4,LOC3))
    cmpl %ecx, %edx
    jl L11

Bounds check on src (at L11).

Bounds check on dst (at L20).

Note: L24 raises the ArrayIndex exception.

Page 97: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Unoptimized Code is Easy

In the absence of optimizations, proving the safety of array accesses is relatively easy.

Indeed, in this case it is reasonable for VCGen to verify the safety of the array accesses.

As the optimizer becomes more successful, verification gets harder.

Page 98: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Role of Loop Invariants

It is for this reason that the optimizer’s knowledge must be conveyed to the theorem prover.

Essentially, any facts about program values that were used to perform optimizations such as code motion must be declared in an invariant.

Page 99: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Optimized Loop Body

L7:
    ANN_LOOP(INV = {
        (csubneq ebx 0),
        (csubneq eax 0),
        (csubb edx ecx),
        (of rm mem)},
      MODREG = (EDI,EDX,EFLAGS,FFLAGS,RM))
    cmpl %esi, %edx
    jae L13
    movl 8(%ebx, %edx, 4), %edi
    movl %edi, 8(%eax, %edx, 4)
    incl %edx
    cmpl %ecx, %edx

Essential facts about live variables, used by the compiler to eliminate bounds-checks in the loop body.


Certifying Compiling and Proving

Intuitively, we will arrange for the Prover to be at least as powerful as the Compiler’s optimizer.

Hence, we will expect the Prover to be able to “reverse engineer” the reasoning process that led to the given machine code.

An informal concept, needing a formal understanding! (Type theory is essential here…)


What is Safety, Anyway?

If the compiler fails to optimize away a bounds-check, it will insert code to perform the check.

This means that programs may still abort at run-time, albeit with a well-defined exception.

Is this safe behavior?


Compiler Development

The PCC infrastructure catches many (probably most) compiler bugs early.

Our standard regression test does not execute the object code!

Principle: Most compiler bugs show up as safety violations.


Example Bug

    …
L42:
    movl 4(%eax), %edx
    testl %edx, %edx
    jle L47
L46:
    … set up for loop …
L44:
    … enter main loop code …
    …
    jl L44
    jmp L32
L47:
    fldz
    fldz
L32:
    … return sequence …
    ret


Example Bug

    …
L42:
    movl 4(%eax), %edx
    testl %edx, %edx
    jle L47
L46:
    … set up for loop …
L44:
    … enter main loop code …
    …
    jl L44
    jmp L32
L47:
    fldz
L32:
    … return sequence …
    ret

Error in rarely executed compensation code is caught by the Proof Generator.


Another Example Bug

Suppose bcopy’s inner loop is changed:

L7:
    ANN_LOOP( … )
    cmpl %esi, %edx
    jae L13
    movl 8(%ebx, %edx, 4), %edi
    movl %edi, 8(%eax, %edx, 4)
    incl %edx
    cmpl %ecx, %edx
    jl L7
    ret


Another Example Bug

Suppose bcopy’s inner loop is changed:

L7:
    ANN_LOOP( … )
    cmpl %esi, %edx
    jae L13
    movl 8(%ebx, %edx, 4), %edi
    movl %edi, 8(%eax, %edx, 4)
    addl $2, %edx
    cmpl %ecx, %edx
    jl L7
    ret

Again, PCC spots the danger.


Yet Another

class Floatexc extends Exception {
  public static int f(int x) throws Floatexc { return x; }
  public static int g(int x) { return x; }

  public static float handleit(int x, int y) {
    float fl = 0;
    try {
      x = f(x);
      fl = 1;
      y = f(y);
    } catch (Floatexc b) {
      fl += fl;
    }
    return fl;
  }
}


Yet Another

    … Install handler …
    pushl $_6except8Floatexc_C
    call __Jv_InitClass
    addl $4, %esp
    … Enter try block …
L17:
    movl $0, -4(%ebp)
    pushl 8(%ebp)
    call _6except8Floatexc_MfI
    addl $4, %esp
    movl %eax, %ecx
    …
    … A handler …
L22:
    flds -4(%ebp)
    fadds -4(%ebp)
    jmp L18


Another Example [by George Necula]

void fir(int *data, int dlen, int *filter, int flen) {
  int i, j;
  for (i = 0; i <= dlen - flen; i++) {
    int s = 0;
    for (j = 0; j < flen; j++)
      s += filter[j] * data[i + j];
    data[i] = s;
  }
}


Compiled Example

    ri = 0
    sub t1 = rdl, rfl
L0: CUT(ri,rj,rs,t2,t3,t4,rm)
    le t2 = ri, t1
    jeq t2, L3
    rs = 0
    rj = 0
L1: CUT(rj,rs,t2,t3,t4)
    lt t2 = rj, rfl
    jeq t2, L2
    ult t2 = rj, rfl
    jeq t2, Labort
    ld t3 = [rf + 4*rj]
    add t2 = ri, rj
    ult t4 = t2, rdl
    jeq t4, Labort
    ld t2 = [rd + 4*t2]
    mul t2 = t3, t2
    add rs = rs, t2
    add rj = rj, 1
    jmp L1
L2: ult t2 = ri, rdl
    jeq t2, Labort
    st [rd + 4*ri] = rs
    add ri = ri, 1
    jmp L0
L3: ret
Labort: call abort

/* rd=data, rdl=dlen, rf=filter, rfl=flen */


The Safety Policy

The safety policy defines verification conditions of the form:

true, E = E
saferd(M, E), safewr(M, E, E)
array(EA, ES, EL), vector(EA, ES, EL)

Pre_fir = array(rd,4,rdl), vector(rf,4,rfl)
Post_fir = true


VCGen Example

    ri = 0
    sub t1 = rdl, rfl
L0: CUT(ri,rj,rs,t2,t3,t4,rm)
    le t2 = ri, t1
    jeq t2, L3
    …
L3: ret

Assume precondition: array(cd,4,cdl) vector(cf,4,cfl)

Set ri = 0

Set t1 = sub(cdl,cfl)

Set rd=cd; rdl=cdl; rf=cf; rfl=cfl; rm=cm

Set ri=ci; rj=cj; rs=cs; t2=c2; t3=c3; t4=c4; rm=cm’

Set t2 = le(ci, sub(cdl,cfl))

Assume not(le(ci, sub(cdl,cfl)))

Check postcondition;

Check rd,rdl,rf,rfl have initial values


VCGen Example

    ri = 0
    sub t1 = rdl, rfl
L0: CUT(ri,rj,rs,t2,t3,t4,rm)
    le t2 = ri, t1
    jeq t2, L3
    rs = 0
    rj = 0
L1: CUT(rj,rs,t2,t3,t4)
    lt t2 = rj, rfl
    jeq t2, L2
    …
L2: ult t2 = ri, rdl
    jeq t2, Labort
    st [rd + 4*ri] = rs

Set ri = 0

Set t1 = sub(cdl,cfl)

Set ri=ci; rj=cj; rs=cs; t2=c2; t3=c3; t4=c4; rm=cm’

Set t2 = le(ci, sub(cdl,cfl))

Assume le(ci, sub(cdl,cfl))

Set rs = 0

Set rj = 0

Set rj=cj’; rs=cs’; t2=c2’; t3=c3’; t4=c4’

Set t2 = lt(cj’, cfl)

Assume not(lt(cj’, cfl))

Set t2 = ult(ci, cdl)

Assume ult(ci, cdl)

Check safewr(cm’, add(cd,mul(4,ci)),cs’)
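
The Set / Assume / Check steps above amount to symbolic evaluation. The following is only a minimal sketch of that idea, not the actual VCGen: the tuple encoding of instructions, the "fallthrough" pseudo-instruction (a conditional jump that is not taken), and the function name vcgen are all assumptions made for illustration.

```python
# Hypothetical mini-VCGen for straight-line code in the style of the fir example.
# Register contents are symbolic expressions, kept here as plain strings.

def vcgen(code, regs):
    """Symbolically evaluate `code`; return (registers, assumptions, checks)."""
    regs = dict(regs)
    assumptions, checks = [], []
    for instr in code:
        op = instr[0]
        if op in ("sub", "le", "lt", "ult"):      # dst = op(src1, src2)
            _, dst, a, b = instr
            regs[dst] = f"{op}({regs[a]},{regs[b]})"
        elif op == "fallthrough":                 # jump not taken: condition held
            _, cond = instr
            assumptions.append(regs[cond])
        elif op == "st":                          # st [base + 4*index] = src
            _, base, index, src = instr
            addr = f"add({regs[base]},mul(4,{regs[index]}))"
            checks.append(f"safewr({regs['rm']},{addr},{regs[src]})")
    return regs, assumptions, checks


# State at L2 of the fir example, after the cut points renamed the registers.
regs = {"ri": "ci", "rdl": "cdl", "rd": "cd", "rs": "cs'", "rm": "cm'"}
code = [
    ("ult", "t2", "ri", "rdl"),   # ult t2 = ri, rdl
    ("fallthrough", "t2"),        # jeq t2, Labort  (not taken)
    ("st", "rd", "ri", "rs"),     # st [rd + 4*ri] = rs
]
_, assumptions, checks = vcgen(code, regs)
print(assumptions)
print(checks)
```

Under these assumptions the sketch collects the assumption ult(ci,cdl) and emits the same safewr check as the slide; loops, cut points, and branches are deliberately left out.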


More on the Safety Policy

Some of the inference rules in the LF signature:

rdarray : saferd(M,add(A,mul(S,I))) <- array(A,S,L), ult(I,L).

rdvector : saferd(M,add(A,mul(S,I))) <- vector(A,S,L), ult(I,L).

wrarray : safewr(M,add(A,mul(S,I)),V) <- array(A,S,L), ult(I,L).


The Checker

When the Checker is invoked on safewr(cm’, add(cd,mul(4,ci)), cs’)

There are assumptions: assume0 : ult(ci,cdl). assume1 : not(lt(cj’,cfl)). assume2 : le(ci, sub(cdl,cfl)). assume3 : vector(cf,4,cfl). assume4 : array(cd,4,cdl).


The Checker, cont’d

The VC safewr(cm’, add(cd,mul(4,ci)), cs’)

can be verified by using the rule wrarray : safewr(M,add(A,mul(S,I)),V) <- array(A,S,L), ult(I,L).

and assumptions assume0 : ult(ci,cdl). assume4 : array(cd,4,cdl).


Proof Representation

A simple (but somewhat naïve) representation of the proof is simply the sequence of proof rules:

wrarray, assume4, assume0
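
To make the "sequence of proof rules" idea concrete, here is a minimal sketch of a checker for this one proof. It is not the real Checker: the tuple encoding of terms and the names ASSUMPTIONS, check_wrarray, and check_proof are assumptions of the sketch, and only the wrarray rule is supported.

```python
# Terms are nested tuples, e.g. ("ult", "ci", "cdl"). This encoding is assumed.
ASSUMPTIONS = {
    "assume0": ("ult", "ci", "cdl"),
    "assume1": ("not", ("lt", "cj'", "cfl")),
    "assume2": ("le", "ci", ("sub", "cdl", "cfl")),
    "assume3": ("vector", "cf", 4, "cfl"),
    "assume4": ("array", "cd", 4, "cdl"),
}

VC = ("safewr", "cm'", ("add", "cd", ("mul", 4, "ci")), "cs'")


def check_wrarray(goal, array_fact, ult_fact):
    """Check one use of
    wrarray : safewr(M,add(A,mul(S,I)),V) <- array(A,S,L), ult(I,L)."""
    try:
        tag, _mem, (add, a, (mul, s, i)), _val = goal
        arr, a2, s2, l = array_fact
        ult, i2, l2 = ult_fact
    except (TypeError, ValueError):
        return False          # some term does not have the required shape
    shapes_ok = (tag, add, mul, arr, ult) == ("safewr", "add", "mul", "array", "ult")
    return shapes_ok and (a, s, i, l) == (a2, s2, i2, l2)


def check_proof(goal, proof, assumptions):
    """Check a proof given as [rule, premise, premise]."""
    if not proof:
        return False
    rule, *premises = proof
    if rule != "wrarray" or len(premises) != 2:
        return False
    if any(p not in assumptions for p in premises):
        return False
    return check_wrarray(goal, *(assumptions[p] for p in premises))


print(check_proof(VC, ["wrarray", "assume4", "assume0"], ASSUMPTIONS))
```

The checker only matches terms against the rule; it never searches. That is the point of shipping the proof with the code.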


Optimized Code

The previous example was somewhat simplified.

More realistic code is optimized, usually based on inferences about integer values.

Such optimizations require that arithmetic invariants be placed in the cut points.


Optimized Example

    ri = 0
    sub t1 = rdl, rfl
L0: CUT(ri>=0,{ri,rj,…})
    le t2 = ri, t1
    jeq t2, L3
    rs = 0
    rj = 0
L1: CUT(rj>=0,{rj,rs,…})
    lt t2 = rj, rfl
    jeq t2, L2
    ld t3 = [rf + 4*rj]
    add t2 = ri, rj
    ld t2 = [rd + 4*t2]
    mul t2 = t3, t2
    add rs = rs, t2
    add rj = rj, 1
    jmp L1
L2: st [rd + 4*ri] = rs
    add ri = ri, 1
    jmp L0
L3: ret

/* rd=data, rdl=dlen, rf=filter, rfl=flen */


VCGen Example

    ri = 0
    sub t1 = rdl, rfl
L0: CUT(ri>=0, {ri,rj,rs,t2,t3,t4,rm})
    le t2 = ri, t1
    jeq t2, L3
    rs = 0
    rj = 0

Set ri = 0

Set t1 = sub(cdl,cfl)

Set ri=ci; rj=cj; rs=cs; t2=c2; t3=c3; t4=c4; rm=cm’

Assume >=(ci,0)

Set t2 = le(ci, sub(cdl,cfl))

Assume le(ci, sub(cdl,cfl))


Practical Considerations


Trusted Computing Base

The trusted computing base is the software infrastructure that is responsible for ensuring that only safe execution is possible.

Obviously, any bugs in the TCB can lead to unsafe execution.

Thus, we want the TCB to be simple, as well as fast and small.


VCGen’s Complexity

Fortunately, proofs can be quite small, and proofchecking can be quite simple, small, and fast.

VCGen, at core, is also simple and fast.

But in practice it gets to be quite complicated.


VCGen’s Complexity

Some complications:

If dealing with machine code, then VCGen must parse machine code.

Maintaining the assumptions and current context in a memory-efficient manner is not easy.

Note that Sun’s kVM does verification in a single pass, using only 8KB of RAM!


VC Explosion

[Flowchart: test a == b (true: a := x; false: c := x); then test a == c (true: a := y; false: c := y); then call f(a,c).]

Precondition: safef(i,j)

VC:
  (a=b => ((x=c => safef(y,c)) /\ (x<>c => safef(x,y)))) /\
  (a<>b => ((a=x => safef(y,x)) /\ (a<>x => safef(a,y))))

Exponential growth in size of the VC is possible.
And it actually happens in practice!
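
The doubling can be reproduced with a small weakest-precondition calculation. This is only a sketch: the tuple encodings of statements and formulas and the names subst, wp, and size are assumptions made for illustration, and the real VCGen works on machine code rather than on a structured program.

```python
def subst(f, var, expr):
    """Replace every occurrence of variable `var` in formula `f` by `expr`."""
    if isinstance(f, tuple):
        return tuple(subst(g, var, expr) for g in f)
    return expr if f == var else f


def wp(stmts, post):
    """Weakest precondition of a statement list with respect to `post`."""
    for s in reversed(stmts):
        if s[0] == "assign":
            _, var, expr = s
            post = subst(post, var, expr)
        elif s[0] == "if":
            _, cond, then_branch, else_branch = s
            # The postcondition is copied into both branches: this is the blow-up.
            post = ("and",
                    ("=>", cond, wp(then_branch, post)),
                    ("=>", ("not", cond), wp(else_branch, post)))
    return post


def size(f):
    """Number of operators and leaves in a formula."""
    if isinstance(f, tuple):
        return 1 + sum(size(g) for g in f[1:])
    return 1


# The two diamonds from the flowchart, followed by the call f(a,c).
prog = [
    ("if", ("=", "a", "b"), [("assign", "a", "x")], [("assign", "c", "x")]),
    ("if", ("=", "a", "c"), [("assign", "a", "y")], [("assign", "c", "y")]),
]
post = ("safef", "a", "c")

vc = wp(prog, post)
sizes = [size(wp(prog * k, post)) for k in (1, 2, 3, 4)]
print(sizes)
```

Each additional pair of diamonds roughly quadruples the formula size in this sketch, since every conditional duplicates the condition that follows it.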


VC Explosion

[Flowchart: test a == b (true: a := x; false: c := x); join point annotated INV: P(a,b,c,x); then test a == c (true: a := y; false: c := y); then call f(a,c).]

VC:
  ((a=b => P(x,b,c,x)) /\
   (a<>b => P(a,b,x,x))) /\
  (forall a’,c’. P(a’,b,c’,x) =>
     ((a’=c’ => safef(y,c’)) /\ (a’<>c’ => safef(a’,y))))

Growth can usually be controlled by careful placement of just the right “join-point” invariants.


Stack Slots

Each procedure will want to use the stack for local storage.

This raises a serious problem because a lot of information is lost by VCGen (such as the value) when data is stored into memory.


Stack Slots

We avoid this problem by assuming that procedures use up to 256 words of stack as registers.

Main restriction:

No indirect addressing of stack slots.


Callee-save Registers

Standard calling conventions dictate that the contents of some registers be preserved.

These callee-save registers are specified along with the pre/post-conditions for each procedure.

The preservation of their values must be verified at every return instruction.


Function specifications

ANN_FUNCTION(__Jv_instanceof,
    %LF_(/\ (of loc3 (jinstof _4java4lang6Object_C))
         (/\ (of loc2 jint)
             (/\ (jelemtype loc1)
                 (of rm mem))))%_LF,          Precondition
    %LF_(/\ (of eax jbool)
         (of rm mem))%_LF,                    Postcondition
    RB(ESP,EBP,FTOP),                         Callee-save registers
    3,4)                                      Stack spec


Annotations used by Special J

ANN_CLASS
ANN_FUNCTION
ANN_LOCALS
ANN_INV
ANN_DOM_LOOP
ANN_DOMINATOR
ANN_SYMBOLADDR
ANN_CALLJAVAVIRTUAL
ANN_CALLJAVAINTERFACE
ANN_JUMPTHROUGHTABLE
ANN_INSTALLEDJAVAHANDLER
ANN_UNINSTALLEDJAVAHANDLER
ANN_UNREACHABLE


ANN_CLASS and ANN_FUNCTION

Normally, ANN_FUNCTION is not used. Instead, ANN_CLASS declares that an object file implements a Java class.

public final class Factor1 { … }

ANN_CLASS(_7Factor1_vt)…


ANN_LOCALS

As a convenience for VCGen, the number of stack slots is declared for each method.

public static void combineTags(Node n, int i) {

}

ANN_LOCALS(__7Factor1_McombineTagsL4NodeXI, 8)
.text
.align 4
.globl __7Factor1_McombineTagsL4NodeXI
__7Factor1_McombineTagsL4NodeXI :
…


ANN_INV / ANN_DOM_LOOP

Loop invariants.

ANN_INV(ANN_DOM_LOOP,
    %LF_(/\ (nonnull loc2)
         (/\ (of rm mem)
             (of eax (jinstof _4java4util12ListIterator_vt))))%_LF,
    RB(EBP,ESP,FTOP,LOC4,LOC3,LOC2))

Signifies loop invariant

Invariants

Modified registers


ANN_DOMINATOR

Dominating join points are marked.

ANN_DOMINATOR
.L536_dom:
    jle .L237
.L237 :
    ANN_INV(.L536_dom,
        %LF_(/\ (nonnull loc3)
             (/\ (of rm mem)
                 (of loc3 (jinstof _4Node_vt))))%_LF,
        RB(EBP,ESP,FTOP,LOC5,LOC4))


Invariants

Special J currently emits the following kinds of invariants:

true, false
x = y, x <> y (x, y regs or consts)
x < y (signed and unsigned)
x : t
    jint, jbool, …
    Jclassdesc
    jinstof(C)
    implSpecIntf(x,y,z)
    …


Virtual method invocation

public static void combineTags(Node n, int i) {
  if (i > 0) {
    if (!n.isString()) {
      Iterator iter = n.getSubtrees();
      while (iter.hasNext()) {
        combineTags((Node)(iter.next()), i-1);
      }


Virtual method invocation, cont’d

For the loop body:

    pushl $1                            # vmethod
    ANN_SYMBOLADDR(0)
    pushl $_4java4util8Iterator_vt      # class
    pushl -4(%ebp)                      # object
    call __Jv_LookupInterfaceMethod
    addl $12, %esp
    pushl -4(%ebp)
    ANN_CALLJAVAVIRTUAL(_4java4util8Iterator_vt, 1)    # next method
    call *%eax
    addl $4, %esp
    ANN_SYMBOLADDR(0)
    pushl $_4Node_vt
    pushl $0
    pushl %eax
    call __Jv_checkCast


Jump tables

public static final void closeToString (int t) throws IOException {
  if (!isEmpty(t)) {
    switch (getColor(t)) {
      case -1 : break; // no color
      case 0 : singleTagString('r', noSecond, false); break;
      case 1 : singleTagString('g', noSecond, false); break;
      case 2 : singleTagString('b', noSecond, false); break;
      case 3 : singleTagString('c', noSecond, false); break;
      case 4 : singleTagString('m', noSecond, false); break;
      case 5 : singleTagString('y', noSecond, false); break;
      case 6 : singleTagString('k', noSecond, false); break;
      case 7 : singleTagString('w', noSecond, false); break;
    }
    …


Jump tables, cont’d

ANN_DOMINATOR
.L181_dom:
    jae .L23
.L33 :
    ANN_JUMPTHROUGHTABLE(.L32, 9)
    ANN_SYMBOLADDR(0)
    jmp *.L32(, %ebx, 4)
.L24 :
    pushl $0
    pushl $0
    pushl $119
    call __3Tag_MsingleTagStringCCZ
    addl $12, %esp
    jmp .L23
.L25
    …
    …
.L32:
    .long .L23
    .long .L31
    .long .L30
    .long .L29
    .long .L28
    .long .L27
    .long .L26
    .long .L25
    .long .L24


Exception handlers

public Object clone() {
  try {
    return super.clone();
  } catch (CloneNotSupportedException e) {
    return null;
  }
}


Exception handlers, cont’d

__7Context_Mclone :
    pushl %ebp
    movl %esp, %ebp
    call __Jv_GetExcHandler
    ANN_SYMBOLADDR(0)
    pushl $.L11
    ANN_SYMBOLADDR(0)
    pushl $_4java4lang26CloneNotSupportedException_vt
    pushl %ebp
    pushl $1
    pushl (%eax)
    ANN_INSTALLJAVAHANDLER(.L11)
    movl %esp, (%eax)
    pushl 8(%ebp)
ANN_DOMINATOR
.L14_dom:
    call __4java4lang6Object_Mclone
    addl $4, %esp
.L9 :
    movl %eax, 8(%ebp)
    call __Jv_GetExcHandler
    movl (%esp), %ebx
    ANN_UNINSTALLJAVAHANDLER(1)
    …
.L11 :
    ANN_INV(.L14_dom,
        %LF_(of rm mem)%_LF,
        RB(EBP,ESP,FTOP,LOC3,LOC2))
    nop
.L12 :
    xorl %eax, %eax
    movl %ebp, %esp
    popl %ebp
    ret


Efficient Representation and Validation of Proofs


Goals

We would like a representation for proofs that is

compact,
fast to check,
requires very little memory to check, and
is “canonical,” in the sense of accommodating many different logics without requiring a reimplementation of the checker.


Three Approaches

1. Direct representation of a logic.

2. Use of a Logical Framework.

3. Oracle strings.

We will reject (1). We consider only (2) and (3).


Logical Framework

For representation of proofs we use the Edinburgh Logical Framework (LF).


LFi



LF Example in Elf Syntax

exp : type
pred : type
pf : pred -> type

true : pred
/\ : pred -> pred -> pred
=> : pred -> pred -> pred
all : (exp -> pred) -> pred

truei : pf true
andi : {P:pred} {R:pred} pf P -> pf R -> pf (/\ P R)
andel : {P:pred} {R:pred} pf (/\ P R) -> pf P
impi : {P:pred} {R:pred} (pf P -> pf R) -> pf (=> P R)
alli : {P:exp -> pred} ({X:exp} pf (P X)) -> pf (all P)
alle : {P:exp -> pred} {E:exp} pf (all P) -> pf (P E)


LF as a Proof Representation

LF is canonical, in that a single typechecker for LF can serve as a proofchecker for many different logics specified in LF. [See Avron, et al. ‘92]

But the efficiency of the representation is poor.


Size of LF Representation

Proofs in LF are extremely large, due to large amounts of repetition.

Consider the representation of P => (P /\ P) for some predicate P:

(=> P (/\ P P))

The proof of this predicate has the following LF representation:

(impi P (/\ P P) ([X:pf P] andi P P X X))


Checking LF

The nice thing is that typechecking is enough for proofchecking. [The theorem is in the LF paper.]

But the proofs are extremely large.

(impi P (/\ P P) ([X:pf P] andi P P X X)) : pf (=> P (/\ P P))


Implicit LF

A dramatic improvement can be achieved by using a variant of LF, called Implicit LF, or LFi.

In LFi, parts of the proof can be replaced by placeholders.

(impi * * ([X:*] andi * * X X)) : pf (=> P (/\ P P))
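
For readers who know a modern proof assistant, the same idea can be sketched in Lean 4. This is an analogy only, not LF or LFi: Lean's underscore plays roughly the role of the * placeholder, and And.intro stands in for andi.

```lean
-- Fully explicit proof term, analogous to the LF representation.
example (P : Prop) : P → P ∧ P :=
  fun (x : P) => @And.intro P P x x

-- The same proof with the redundant arguments left for the checker to
-- reconstruct, analogous to the LFi representation.
example (P : Prop) : P → P ∧ P :=
  fun x => @And.intro _ _ x x
```

In both cases, checking that the term has the stated type is what establishes the proposition.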


Soundness of LFi

The soundness of the LFi type system is given by a theorem that states:

If, in context Γ, a term M has type A in LFi (and Γ and A are placeholder-free), then there is a term M’ such that M’ has type A in LF.


Typechecking LFi

The typechecking algorithm for LFi is given in [Necula & Lee, LICS98].

A key aspect of the algorithm is that it avoids repeated typechecking of reconstructed terms.

Hence, the placeholders save not only space, but also time.


Effectiveness of LFi

In experiments with PCC, LFi leads to substantial reductions in proof size and checking time.

Improvements increase nonlinearly with proof size.

Experiment   Proof size (bytes)        Checking time (ms)
             LF            LFi         LF        LFi
unpack       >10 x 10^6    23728       8256      42
simplex      >2 x 10^6     23888       1656      42
sharpen      183444        4816        136       7
qsort        92412         3098        74        6
kmp          77246         2092        60        3
bcopy        12466         796         11        1
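
As a quick illustration of how the savings scale, the following sketch computes the LF-to-LFi size ratio for the four experiments whose LF sizes are given exactly in the table. The dictionary name proof_sizes is an assumption of the sketch; unpack and simplex are omitted because the table gives only lower bounds for them.

```python
# Proof sizes in bytes, copied from the table: (LF, LFi).
proof_sizes = {
    "sharpen": (183444, 4816),
    "qsort": (92412, 3098),
    "kmp": (77246, 2092),
    "bcopy": (12466, 796),
}

# Compression factor achieved by LFi for each experiment.
ratios = {name: lf / lfi for name, (lf, lfi) in proof_sizes.items()}

for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:8s} LF/LFi = {ratio:.1f}")
```

The smallest proof (bcopy) shrinks by a factor of about 16 and the largest of these four (sharpen) by about 38; the lower bounds for unpack and simplex imply still larger factors.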


The Need for Improvement

Despite the great improvement of LFi, in our experiments we observe that, in practice, LFi proofs are 10%-200% the size of the code.


How Big is a Proof?

A basic question: how much essential information is in a proof?

In this proof,

(impi * * ([X:*] andi * * X X)) : pf (=> P (/\ P P))

there are only 2 uses of rules, and in each case the rule used was the only one that could have been used.


Improving the Representation

We will now improve on the compactness of proof representation by making use of the observation that large parts of proofs are deterministically generated from the inference rules.


Additional References

For LF:

Harper, Honsell, & Plotkin. A framework for defining logics. Journal of the ACM, 40(1), 143-184, Jan. 1993.

Avron, Honsell, Mason, & Pollack. Using typed lambda calculus to implement formal systems on a machine. Journal of Automated Reasoning, 9(3), 309-354, 1992.


Additional References

For Elf:

Pfenning. Logic programming in the LF logical framework. Logical Frameworks, Huet & Plotkin (Eds.), 149-181, Cambridge Univ. Press, 1991.

Pfenning. Elf: A meta-language for deductive systems (system description). 12th International Conference on Automated Deduction, LNAI 814, 811-815, 1994.


Oracle-Based Checking


Necula’s Example: Syntax of Girard’s System F

ty : type
int : ty
arr : ty -> ty -> ty
all : (ty -> ty) -> ty

exp : type
z : exp
s : exp -> exp
lam : (exp -> exp) -> exp
app : exp -> exp -> exp

of : exp -> ty -> type


Necula’s Example: Typing Rules for System F

tz : of z int

ts : {E:exp} of E int -> of (s E) int

tlam : {E:exp->exp} {T1:ty} {T2:ty} ({X:exp} of X T1 -> of (E X) T2) -> of (lam E) (arr T1 T2)

tapp : {E1:exp} {E2:exp} {T:ty} {T2:ty} of E1 (arr T2 T) -> of E2 T2 -> of (app E1 E2) T

tgen : {E:exp} {T:ty->ty} ({T1:ty} of E (T T1)) -> of E (all T)

tins : {E:exp} {T:ty->ty} {T1:ty} of E (all T) -> of E (T T1)


LF Representation

Consider the lambda expression

(λf.(f λx.x) (f 0)) λy.y

It is represented in LF as follows:

app (lam [F:exp] app (app F (lam [X:exp] X)) (app F 0)) (lam [Y:exp] Y)


Necula’s Example

Now suppose that this term is an applet, with the safety policy that all applets must be well-typed in System F.

One way to make a PCC is to attach a typing derivation to the term.


Typing Derivation in LF

(tapp (lam [F:exp] (app (app F (lam [X:exp] X)) (app F 0)))
      (lam ([X:exp] X))
      (all ([T:ty] arr T T))
      int
      (tlam (all ([T:ty] arr T T)) int
            ([F:exp] (app (app F (lam [X:exp] X)) (app F 0)))
            ([F:exp][FT:of F (all ([T:ty] arr T T))]
              (tapp (app F (lam [X:exp] X)) (app F 0) int int
                    (tapp F (lam [X:exp] X) (arr int int) (arr int int)
                          (tins F ([T:ty] arr T T) (arr int int) FT)
                          (tlam int int ([X:exp] X) ([X:exp][XT:of X int] XT)))
                    (tapp F 0 int int
                          (tins F ([T:ty] arr T T) int FT)
                          t0))))
      (tgen (lam [Y:exp] Y) ([T:ty] arr T T)
            ([T:ty] (tlam T T ([Y:exp] Y) ([Y:exp] [YT:of Y T] YT)))))


Typing Derivation in LFi

(tapp * * (all ([T:*] arr T T)) int
      (tlam * * *
            ([F:*][FT:of F (all ([T:ty] arr T T))]
              (tapp * * int
                    (tapp * * (arr int int) (arr int int)
                          (tins * * * FT)
                          (tlam * * * ([X:*][XT:*] XT)))
                    (tapp * * int int
                          (tins * * * FT)
                          t0))))
      (tgen * * ([T:*] (tlam * * * ([Y:*] [YT:*] YT)))))

I think. I did this by hand!


LF Representation

Using 16 bits per token, the LF representation of the typing derivation requires over 2,200 bits.

The LFi representation requires about 700 bits.

(The term itself requires only about 360 bits.)



A Bit More about LFi

To convert an LF term into an LFi term, a representation algorithm is used. [Necula&Lee, LICS98]

Intuition: When typechecking a term: c M1 M2 … Mn : A (in a context Γ)

we know, if A has no placeholders, that some of the M1…Mn may appear in A.


A Bit More about LFi, cont’d

For example, when the rule

tapp : {E1:exp} {E2:exp} {T:ty} {T2:ty} of E1 (arr T2 T) -> of E2 T2 -> of (app E1 E2) T

is applied at top level, the first two arguments are present in the term

app (lam [F:exp] app (app F (lam [X:exp] X)) (app F 0)) (lam [Y:exp] Y)

and thus can be elided.


A Bit More about LFi, cont’d

A similar trick works at lower levels by relying on the fact that typing constraints are solved in a certain order (e.g., right-to-left).

See the paper for complete details.


Can We Do Better?

tz : of z int

ts : {E:exp} of E int -> of (s E) int

tlam : {E:exp->exp} {T1:ty} {T2:ty} ({X:exp} of X T1 -> of (E X) T2) -> of (lam E) (arr T1 T2)

tapp : {E1:exp} {E2:exp} {T:ty} {T2:ty} of E1 (arr T2 T) -> of E2 T2 -> of (app E1 E2) T

tgen : {E:exp} {T:ty->ty} ({T1:ty} of E (T T1)) -> of E (all T)

tins : {E:exp} {T:ty->ty} {T1:ty} of E (all T) -> of E (T T1)


Determinism

Looking carefully at the typing rules, we observe:

For any typing goal where the term is known but the type is not:

3 possibilities: tgen, tins, other.

If type structure is known, only 2 choices, tapp or other.


How Much Essential Information?

(tapp (lam [F:exp] (app (app F (lam [X:exp] X)) (app F 0)))
      (lam ([X:exp] X))
      (all ([T:ty] arr T T))
      int
      (tlam (all ([T:ty] arr T T)) int
            ([F:exp] (app (app F (lam [X:exp] X)) (app F 0)))
            ([F:exp][FT:of F (all ([T:ty] arr T T))]
              (tapp (app F (lam [X:exp] X)) (app F 0) int int
                    (tapp F (lam [X:exp] X) (arr int int) (arr int int)
                          (tins F ([T:ty] arr T T) (arr int int) FT)
                          (tlam int int ([X:exp] X) ([X:exp][XT:of X int] XT)))
                    (tapp F 0 int int
                          (tins F ([T:ty] arr T T) int FT)
                          t0))))
      (tgen (lam [Y:exp] Y) ([T:ty] arr T T)
            ([T:ty] (tlam T T ([Y:exp] Y) ([Y:exp] [YT:of Y T] YT)))))


How Much Essential Information?

There are 15 applications of rules in this derivation.

So, conservatively: $\lceil \log_2 3 \rceil \times 15 = 2 \times 15 = 30$ bits

In other words, 30 bits should be enough to encode the choices made by a type inference engine for this term.
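
The estimate is easy to reproduce. The helper below is a sketch only; the name oracle_bits is an assumption, and it charges every rule application the full cost of a 3-way choice, which is the conservative bound used above.

```python
def oracle_bits(num_choices, num_steps):
    """Bits needed if each of `num_steps` steps picks one of `num_choices` rules."""
    bits_per_choice = (num_choices - 1).bit_length()   # ceil(log2(num_choices))
    return bits_per_choice * num_steps


# 15 rule applications, at most 3 candidate rules at each step.
print(oracle_bits(3, 15))   # 30
```

Steps with only one applicable rule cost nothing, so the real oracle can be shorter than this bound.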


Oracle-based Checking

Idea: Implement the proofchecker as a nondeterministic logic interpreter whose

program consists of the derivation rules, and
initial goal is the judgment to be verified.

We will avoid backtracking by relying on the oracle string.
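
A minimal sketch of the idea, under strong simplifying assumptions: goals are ground atoms (so no unification is needed), rules are (head, body) pairs, and the oracle is a string of '0'/'1' characters. The names Oracle and solve and the toy rule set are hypothetical; the real checker works on LF signatures.

```python
class Oracle:
    """Hands out choices encoded in a bit string, left to right."""

    def __init__(self, bits):
        self.bits = bits
        self.pos = 0

    def choose(self, n):
        """Read just enough bits to select one of n alternatives."""
        width = (n - 1).bit_length()          # 0 bits when there is no choice
        chunk = self.bits[self.pos:self.pos + width]
        if len(chunk) < width:
            raise ValueError("oracle string exhausted")
        self.pos += width
        index = int(chunk, 2) if width else 0
        if index >= n:
            raise ValueError("oracle selects a nonexistent rule")
        return index


def solve(goal, rules, oracle, depth=50):
    """Prove `goal` by backchaining; the oracle replaces backtracking."""
    if depth == 0:
        return False                          # guard against cyclic rule sets
    candidates = [body for head, body in rules if head == goal]
    if not candidates:
        return False
    body = candidates[oracle.choose(len(candidates))]
    return all(solve(sub, rules, oracle, depth - 1) for sub in body)


# Toy program: three rules conclude "safe", so each use costs 2 oracle bits.
rules = [
    ("safe", ["a"]),
    ("safe", ["b", "c"]),
    ("safe", ["d"]),
    ("b", []),
    ("c", []),
]
print(solve("safe", rules, Oracle("01")))   # second rule chosen, subgoals b and c
```

A wrong oracle string simply makes the check fail; it cannot make an unprovable goal succeed, so the oracle itself need not be trusted.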



Why Higher-Order?

The syntax of VCs for the Java type-safety policy is as follows:

E ::= x | c E1 … En

F ::= true | F1 ∧ F2 | ∀x.F | E | E ⇒ F

The LF encodings are simple Horn clauses (requiring only first-order unification). Higher-order features are needed only for implication and universal quantification.


Why Higher-Order?

Perhaps first-order Horn logic (or perhaps first-order hereditary Harrop formulas) is enough.

Indeed, first-order expressions and formulas seem to be enough for the VCs in type-safety policies.

However, higher-order and modal logics would require higher-order features.


A Simplification: A Fragment of LF

Level-0 types. A ::= a | A1 → A2

Level-1 types (normal form). B ::= a M1 … Mn | B1 → B2 | Πx:A.B

Level-0 kinds. K ::= Type | A → K

Level-0 terms (normal form). M ::= λx:A.M | c M1 … Mn | x M1 … Mn


LF Fragment

This fragment simplifies matters considerably, without restricting the application to PCC.

Level-0 types to encode syntax.

Level-1 types to encode derivations.

No level-1 terms since we never reconstruct a derivation, only verify that one exists.


LF Fragment, cont’d

Level-0 types.

ty : type
exp : type

Level-1 type family.

of : exp -> ty -> type

Disallowing level-2 and higher type families seems not to have any practical impact.


Logic Interpreter: Goals

G ::= B | M = M’ | ∀x:B.G | ∃x:A.G | ⊤ | G1 ∧ G2

For Necula’s example, the interpreter will be started with the goal

∃t:ty. of E t


Naïve Interpreter

solve(B1 → B2) = ∀x:B1. solve(B2)

solve(Πx:A.B) = ∀x:A. solve(B)

solve(a M1 … Mn) = subgoals(B, a M1 … Mn) where B is the type of a level-1 constant or a level-1 quantified variable (in scope), as selected by the oracle.

subgoals(B1 B2, B) = x:B1. solve(B2)

subgoals(x:A.B’, B) = x:A. solve(B)

subgoals(a M1’ … Mn’, a M1 … Mn) = M1 = M1’ ∧ … ∧ Mn = Mn’


Back to the example

Consider

solve(of E t)

This consults the oracle.

Since there are 3 level-1 constants that could be used at this point, 2 bits are fetched from the oracle string (to select tapp).


Higher-Order Unification

The unification goals that remain after solve are higher-order and thus only semi-decidable.

A nondeterministic unification procedure (also driven by the oracle string) is used.

Some standard LP optimizations are also used.


Certifying Theorem Proving


Certifying Theorem Proving

Time does not allow a description here.

See: Necula and Lee. Proof generation in the Touchstone theorem prover. CADE’00.

Of particular interest: Proof-generating congruence-closure and simplex algorithms.


Resource Constraints

Bounds on certain resources can be enforced via counting.

In a Reference Interpreter:

Maintain a global counter.
Increment the count for each instruction executed.
Verify for each instruction that the limit is not exceeded.

Use the compiler to optimize away the counting operations.
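
The counting scheme can be sketched as follows. This is an illustration only: modelling instructions as Python callables, and the names run and LimitExceeded, are assumptions of the sketch rather than part of any PCC implementation.

```python
class LimitExceeded(Exception):
    """Raised when the program would execute more instructions than allowed."""


def run(program, limit):
    """Reference interpreter: execute instructions under an instruction budget."""
    count = 0                               # the global counter
    for instruction in program:
        if count >= limit:                  # verify the limit before each step
            raise LimitExceeded(f"limit of {limit} instructions exceeded")
        count += 1                          # count the instruction
        instruction()                       # then execute it
    return count


trace = []
program = [lambda: trace.append("step")] * 3
print(run(program, limit=5))                # 3
```

In the PCC setting the point is that the check need not run at all: if a proof shows the counter can never reach the limit, the compiler may drop the counting code.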


Ten Good Things About PCC

1. Someone else does all the really hard work.

2. The host system changes very little.

...


Logic as a lingua franca

[Diagram labels: Certifying Prover, CPU, Code, Proof, Proof Engine]


Logic as a lingua franca

[Diagram labels: Certifying Prover, CPU, Proof, Proof Checker, Policy, VC, Code]

Language/compiler/machine dependences isolated from the proof checker.

Expressed as predicates and derivations in a formal logic.

Page 192: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Logic as a lingua franca

[Diagram: Certifying Prover, CPU, Proof, Proof Checker, Policy, VC; the Code box shows JVM bytecode: … iadd iaload ...]

Code can be in any language once a Safety Policy is supplied.

Page 193: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Logic as a lingua franca

[Diagram: Certifying Prover, CPU, Proof, Proof Checker, Policy, VC; the Code box shows x86 assembly:]

…
addl %eax,%ebx
testl %ecx,%ecx
jz NULLPTR
movl 4(%ecx),%edx
cmpl %edx,%ebx
jae ARRAYBNDS
movl 8(%ecx,%ebx,4),%edx
...

Adequacy of dynamic checks and “wrappers” can be verified.
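The instruction sequence on this slide performs a null test and an unsigned bounds test before the array load. The sketch below models those checks in Python. The memory layout (length word at offset 4, 4-byte elements starting at offset 8) is read off the addressing modes in the assembly; the dict-based memory and the function name checked_load are assumptions made for illustration.

```python
def checked_load(memory, array_ptr, index):
    """Model of the checked array access; memory maps byte addresses to words."""
    # testl %ecx,%ecx ; jz NULLPTR
    if array_ptr == 0:
        raise RuntimeError("NULLPTR")
    # movl 4(%ecx),%edx        load the array length
    length = memory[array_ptr + 4]
    # cmpl %edx,%ebx ; jae ARRAYBNDS
    # jae is an unsigned comparison, so a negative index looks huge and fails too
    if (index & 0xFFFFFFFF) >= length:
        raise RuntimeError("ARRAYBNDS")
    # movl 8(%ecx,%ebx,4),%edx  load element number index
    return memory[array_ptr + 8 + 4 * index]


# A three-element array object at the made-up address 0x1000.
memory = {0x1004: 3, 0x1008: 10, 0x100C: 20, 0x1010: 30}
value = checked_load(memory, 0x1000, 1)
```

Verifying the adequacy of the checks amounts to showing that every path reaching the final load has passed both tests, so the load address always falls inside the array.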

Page 194: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Logic as a lingua franca

[Diagram: Certifying Prover, CPU, Proof, Proof Checker, Policy, VC; the Code box shows x86 assembly:]

…
add %eax,%ebx
movl 8(%ecx,%ebx,4)
...

Safety of optimized code can be verified.

Page 195: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Ten Good Things About PCC

3. You choose the language.

4. Optimized (“unsafe”) code is OK.

5. Verifies that your optimizer and dynamic checks are OK.

Page 196: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

The Role of Programming Languages

Civilized programming languages can provide “safety for free”.

Well-formed/well-typed ⇒ safe.

Idea: Arrange for the compiler to “explain” why the target code it generates preserves the safety properties of the source program.

Page 197: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Certifying Compilers [Necula & Lee, PLDI’98]

Intuition:

Compiler “knows” why each translation step is semantics-preserving.

So, have it generate a proof that safety is preserved.

“Small theorems about big programs.”

Don’t try to verify the whole compiler, but only each output it generates.
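A toy illustration of checking each output rather than the compiler: the sketch below compiles nested additions to stack code and emits, as a stand-in for a proof, the claimed stack depth before each instruction. A separate checker validates that certificate against a made-up safety policy (no stack underflow, bounded depth). The functions compile_expr and check, the tuple encoding, and the policy are all hypothetical. In this toy the checker could recompute the depths itself; real PCC proofs carry facts that are much harder to reconstruct.

```python
def compile_expr(expr):
    """Compile ints and ("add", left, right) tuples to stack code.

    Returns (code, certificate); certificate[i] is the stack depth the
    compiler claims holds just before instruction i.
    """
    code, cert = [], []

    def emit(e, depth):
        if isinstance(e, int):
            code.append(("push", e))
            cert.append(depth)
            return depth + 1
        _, left, right = e
        depth = emit(left, depth)
        depth = emit(right, depth)
        code.append(("add", None))
        cert.append(depth)
        return depth - 1

    emit(expr, 0)
    return code, cert


def check(code, cert, max_depth):
    """Checker for one compiler output; it knows nothing about the compiler."""
    if len(code) != len(cert):
        return False
    depth = 0
    for (op, _), claimed in zip(code, cert):
        if claimed != depth:
            return False  # the certificate does not match the code
        if op == "push":
            depth += 1
        elif op == "add":
            if depth < 2:
                return False  # would underflow the stack
            depth -= 1
        else:
            return False  # instruction not covered by the policy
        if depth > max_depth:
            return False
    return True


code, cert = compile_expr(("add", 1, ("add", 2, 3)))
accepted = check(code, cert, max_depth=3)
```

A bug in compile_expr can at worst produce an output that check rejects, so only the small checker has to be trusted.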

Page 198: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Automation via Certifying Compilation

[Diagram: Source code, Certifying Compiler, Object code, Proof, Proof Checker, Policy, VC, CPU]

Looks and smells like a compiler.

% spjc foo.java bar.class baz.c -ljdk1.2.2

Page 199: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Ten Good Things About PCC

6. Can sometimes be easy to use.

7. You can still be a “hero theorem hacker” if you want.

...

Page 200: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Ten Good Things About PCC

8. Proofs are a “semantic checksum”.

9. Possibility for richer safety policies.

10. Co-exists peacefully with crypto.

Page 201: An Introduction to Proof-Carrying Code Peter Lee Carnegie Mellon University

Acknowledgments

George Necula.

Robert Harper and Frank Pfenning.

Mark Plesko, Michael Donohue, and Guy Bialostocki.