This One Time, at PL Camp ...
Summer School on Language-Based Techniques for Integrating with the External World
University of Oregon
Eugene, Oregon
July 2007
Checking Type Safety of Foreign Function Calls
Jeff Foster
University of Maryland
Ensure type safety across languages
• OCaml/JNI – C
• Representational types
• SAFFIRE, a multi-lingual type inference system
Dangers of FFIs
In most FFIs, programmers write “glue code”
• Translates data between host and foreign languages
• Typically written in one of the languages
Unfortunately, FFIs are often easy to misuse
• Little or no checking done at language boundary
• Mistakes can silently corrupt memory
• One solution: interface generators
Example: “Pattern Matching”
type t = A of int | B | C of int * int | D

if (Is_long(x)) {
  if (Int_val(x) == 0) /* B */ ...
  if (Int_val(x) == 1) /* D */ ...
} else {
  if (Tag_val(x) == 0) /* A */ Field(x, 0) = Val_int(0);
  if (Tag_val(x) == 1) /* C */ Field(x, 1) = Val_int(0);
}
Garbage Collection
C FFI functions need to play nice with the GC
• Pointers from C to the OCaml heap must be registered

value bar(value list) {
  CAMLparam1(list);
  CAMLlocal1(temp);
  temp = alloc_tuple(2);
  CAMLreturn(Val_unit);
}

Easy to forget; difficult to find this error with testing
Multi-Lingual Types
Representational Types• Embed OCaml types in C types and vice versa
SAFFIRE
Static Analysis of Foreign Function InteRfacEs
Programming Models for Distributed Computing
Yannis Smaragdakis
University of Oregon
NRMI: Natural programming model for distributed computing.
J-Orchestra: Execute unsuspecting programs over a network, using program rewriting.
Morphing: High-level language facility for safe program transformation.
NRMI
Identify all reachable objects
[Figure: client side: tree t (nodes 4, 9, 7, 1, 3) with aliases alias1 and alias2 pointing into it; across the network, the server side will hold a copy named tree]
NRMI
Execute remote procedure
[Figure: the server mutates its copy (tree): node values change (4, 0, 9, 1, 8) and a new node (2, named tmp) is created, while the client’s t is untouched]
NRMI
Send back all reachable objects
[Figure: everything reachable from tree on the server, including the new node, is sent back across the network]
NRMI
Match reachable maps
[Figure: nodes of the original structure t are matched one-to-one against the corresponding nodes sent back over the network]
NRMI
Update original objects
[Figure: matched original nodes take on the server’s new values (4, 0, 9, 1, 8)]
NRMI
Adjust links out of original objects
[Figure: pointers out of the original nodes are redirected to mirror the server-side structure]
NRMI
Adjust links out of new objects
[Figure: pointers out of newly created nodes (such as node 2) are redirected to point at the matching original nodes]
NRMI
Garbage collect
[Figure: temporary copies are no longer reachable and are collected; the client is left with the updated tree t, with alias1 and alias2 still valid]
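The copy, execute, match, and restore steps above can be sketched in Python. Everything here (Node, reachable, nrmi_call) is an illustrative name, not NRMI's actual API, and the link-adjustment steps for server-created and re-pointed nodes are elided.

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def reachable(root):
    """All nodes reachable from root (identity-based, cycle-safe)."""
    seen, stack = [], [root]
    while stack:
        n = stack.pop()
        if n is not None and not any(n is s for s in seen):
            seen.append(n)
            stack.extend([n.left, n.right])
    return seen

def nrmi_call(proc, root):
    # 1. Identify all reachable objects and copy them (the "network
    #    send"), keeping an original -> copy map.
    originals = reachable(root)
    copies = {id(n): Node(n.val) for n in originals}
    for n in originals:
        copies[id(n)].left = copies[id(n.left)] if n.left else None
        copies[id(n)].right = copies[id(n.right)] if n.right else None
    # 2. Execute the "remote" procedure on the copy.
    proc(copies[id(root)])
    # 3. Match reachable maps and update the original objects in
    #    place, so client-side aliases into the structure see the
    #    server's updates. (A full NRMI also adjusts links out of
    #    original and new objects; elided in this sketch.)
    for n in originals:
        n.val = copies[id(n)].val

t = Node(4, Node(9, Node(1), Node(3)), Node(7))
alias1 = t.left                      # client-side alias into the tree
nrmi_call(lambda tree: setattr(tree.left, 'val', 0), t)
print(alias1.val)  # 0: the alias observes the remote update
```

The point of the identity map is exactly the "match reachable maps" slide: updates flow back to the original objects, so aliases held by the client behave as they would for a local call.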
J-Orchestra
Automatic partitioning system; works as a bytecode compiler
• lots of indirection using proxies, interfaces, local and remote objects
Partitioned program is equivalent to the original
Morphing
Ensure program generators are safe
Statically check the generator to determine the safety of any generated program, under all inputs
• ensure that generated programs compile
Early approach – SafeGen
• Using theorem provers
MJ
• Using types
Fault Tolerant Computing
David August and David Walker Princeton University
Processors are becoming more susceptible to intermittent faults.
• Moore’s Law, radiation
• Faults alter computation or state, resulting in incorrect program execution.
Goal: Build reliable systems from unreliable components.
Topics
Transient faults and mechanisms designed to protect against them (HW).
The role languages and compilers may play in creating radiation-hardened programs.
New opportunities made possible by languages which embrace potentially incorrect behavior.
Causes
Software/Compiler
Duplicate instructions and check at important locations (e.g. stores) [SWIFT, EDDI]
λzap
λ calculus with fault tolerance
• Intermediate language for compilers
• Models a single fault
• Based on replication
Semantics model the type of faults

let x = 2 in
let y = x + x in
out y

let x1 = 2 in let x2 = 2 in let x3 = 2 in
let y1 = x1 + x1 in let y2 = x2 + x2 in let y3 = x3 + x3 in
out [y1, y2, y3]
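The replicate-and-vote idea behind this transformation can be sketched in Python (λzap itself is a typed λ calculus, and voting happens at effectful operations like out; vote and faulty here are illustrative names, not λzap constructs):

```python
import random
from collections import Counter

def vote(a, b, c):
    """Majority vote over three replicas: with at most one transient
    fault, at least two replicas still agree on the correct value."""
    (value, _count), = Counter([a, b, c]).most_common(1)
    return value

def faulty(x, fault_prob=0.0):
    """Model a transient fault: occasionally corrupt a value."""
    return x + 1 if random.random() < fault_prob else x

# Replicated version of:  let x = 2 in let y = x + x in out y
x1, x2, x3 = faulty(2), faulty(2), faulty(2)
y1, y2, y3 = x1 + x1, x2 + x2, x3 + x3
print(vote(y1, y2, y3))  # 4 whenever at most one replica was corrupted
```

Under the single-fault model, at most one of the three copies can be corrupted, so the majority is always the correct value.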
Testing
Typing Ad Hoc Data
Kathleen Fisher
AT&T Labs
PADS project*
• Data Description Language (DDL)• Data Description Calculus (DDC)• Automatic inference of PADS descriptions
*http://padsproj.org
PADS
Declarative description of data source:
• Physical format information
• Semantic constraints

type responseCode = { x : Int | 99 < x < 600 }

Pstruct webRecord {
  Pip ip;  " - - [";
  Pdate(’:’) date;  ":";
  Ptime(’]’) time;  "]";
  httpMeth meth;  " ";
  Puint8 code;  " ";
  Puint8 size;  " ";
};
Parray webLog { webRecord[] };
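PADS generates parsers (and more) from such descriptions automatically. As a rough, hand-written Python analogue of what a generated parser checks, both the physical format and the responseCode semantic constraint, with a simplified regex standing in for the field-by-field description:

```python
import re

# Illustrative analogue of the webRecord description above; PADS
# would generate this from the declarative spec.
WEB_RECORD = re.compile(
    r'(?P<ip>\d+\.\d+\.\d+\.\d+) - - '
    r'\[(?P<date>[^:]+):(?P<time>[^\]]+)\] '
    r'(?P<meth>\w+) (?P<code>\d+) (?P<size>\d+)$'
)

def parse_web_record(line):
    m = WEB_RECORD.match(line)
    if m is None:                          # physical format error
        raise ValueError('format mismatch: ' + line)
    code = int(m.group('code'))
    if not (99 < code < 600):              # the responseCode constraint
        raise ValueError('semantic constraint violated: code=%d' % code)
    return m.groupdict()

rec = parse_web_record('127.0.0.1 - - [15/Oct/2006:18:46:51] GET 404 23')
print(rec['meth'], rec['code'])  # GET 404
```

The payoff of the declarative description is that this code, plus printers, statistics, and format converters, come for free from one spec.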
Raw Data
[Figure: raw data (ASCII log files, binary traces, C structs) flows into a data description, from which generated end-user tools produce XML, CSV, standard formats & schemas, and visual information]
Learning
Problem: Producing useful tools for ad hoc data takes a lot of time.
Solution: A learning system to generate data descriptions and tools automatically.
Format Inference Engine
Input File(s) → Chunked Data → Tokenization → Structure Discovery → Scoring Function → Format Refinement → IR-to-PADS Printer → PADS Description
Multi-Staged Programming
Walid Taha
Rice University
Writing generic programs that do not pay a runtime overhead
• Use program generators
• Ensure generated code is syntactically well-formed and well-typed
MetaOCaml
The Abstract View
[Figure: in a batch stage, generator P1 consumes input I1 and produces specialized program P2; P2 is then run repeatedly on inputs I2]
MetaOCaml
Brackets (.< >.)
• delay execution of an expression
Escape (.~)
• combine smaller delayed values to construct larger ones
Run (.!)
• compile and execute the dynamically generated code
Power Example
let rec power (n, x) = match n with
    0 → 1
  | n → x * (power (n-1, x));;

let power2 (x) = power (2, x);;
let power2 = fun x → power (2, x);;
let power2 (x) = 1*x*x;;

let rec power (n, x) = match n with
    0 → .<1>.
  | n → .< .~x * .~(power (n-1, x)) >.;;
let power2 = .! .<fun x → .~(power (2, .<x>.))>.;;
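The same specialization can be mimicked in Python with strings standing in for MetaOCaml's brackets and eval for run. Unlike .< >., strings give no well-formedness or typing guarantees, which is precisely what MetaOCaml adds:

```python
def power_code(n, x):
    """Staged power: build source text (a crude stand-in for
    MetaOCaml's .< >. brackets) instead of computing a number."""
    return "1" if n == 0 else "%s * %s" % (x, power_code(n - 1, x))

# "Run" (MetaOCaml's .!): compile the specialized body once, then the
# residual program carries no recursion or match at runtime.
power2 = eval("lambda x: " + power_code(2, "x"))
print(power_code(2, "x"))  # x * x * 1
print(power2(3))           # 9
```

The specialized power2 does only multiplications, matching the unfolded let power2 (x) = 1*x*x above.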
Scalable Defect Detection
Manuvir Das, Daniel Wang,
Zhe Yang, Microsoft Research
Program analysis at Microsoft scale
• scalability, accuracy
Combination of a weak global analysis and a slow local one (for some regions of code)
Programmers are required to add interface annotations
• some automatic inference is available
Web and Database Application Security
Zhendong Su
University of California-Davis
Static analyses for enforcing correctness of dynamically generated database queries.
Runtime checking mechanisms for detecting SQL injection attacks.
Static analyses for detecting SQL injection and cross-site scripting vulnerabilities.
XML and Web Application Programming
Anders Møller
University of Aarhus
Formal models of XML schemas
• Expressiveness of DTD, XML Schema, Relax NG
Type checking XML transformation languages
• “Assuming that x is valid according to Sin, is T(x) valid according to Sout?”
Web application frameworks
• Java Servlets and JSP, JWIG, GWT
Types for Safe C-Level Programming
Dan Grossman
University of Washington
Cyclone, a safe dialect of C
• Designed to prevent safety violations (buffer overflows, memory management, …)
Mostly underlying theory
• Types, expressions, memory regions
Analyzing and Debugging Software
Understanding Multilingual Software [Foster]• Parlez vous OCaml?
Statistical Debugging [Liblit]• you are my beta tester, and there’s lots of you
Scalable Defect Detection [Das, Wang, Yang]• Microsoft programs have no bugs
Programming Models
Types for Safe C-Level Programming [Grossman]• C without the ick factor
Staged Programming [Taha]• Programs that produce programs that produce programs...
Prog. Models for Dist. Comp. [Smaragdakis]• We’ve secretly replaced your centralized program with a distributed application. Can you tell the difference?
The Web
Web and Database Application Security [Su]• How not to be pwn3d by 1337 haxxors
XML and Web Application Programming [Møller]• X is worth 8 points in scrabble...let’s use it a lot
Other Really Important Stuff
Fault Tolerant Computing [August, Walker]• Help, I’ve been hit by a cosmic ray!
Typing Ad Hoc Data [Fisher]• Data, data, everywhere, but what does it mean?
Statistical Debugging
Ben Liblit
University Of Wisconsin-Madison
Statistical Debugging & Cooperative Bug Isolation
• Observe deployed software in the hands of real end users
• Build statistical models of success & failure
• Guide programmers to the root causes of bugs
• Make software suck less
What’s This All About?
Motivation
“There are no significant bugs in our released software that any significant number of users want fixed.”
Bill Gates, quoted in FOCUS Magazine
Software Releases in the Real World
[Disclaimer: this may be a caricature.]
Software Releases in the Real World
1. Coders & testers in tight feedback loop• Detailed monitoring, high repeatability• Testing approximates reality
2. Testers & management declare “Ship it!”• Perfection is not an option• Developers don’t decide when to ship
Software Releases in the Real World
3. Everyone goes on vacation• Congratulate yourselves on a job well done!• What could possibly go wrong?
4. Upon return, hide from tech support• Much can go wrong, and you know it• Users define reality, and it’s not pretty
– Where “not pretty” means “badly approximated by testing”
Testing as Approximation of Reality
Microsoft’s Watson error reporting system
• Crash reports from 500,000 separate programs
• x% of software errors cause 50% of user crashes
• Care to guess what x is?
1% of software errors causes 50% of user crashes
Small mismatch ➙ big problems (sometimes)
Big mismatch ➙ small problem? (sometimes!)
• Perfection is not an economically viable option
Real Engineers Measure Things;
Are Software Engineers Real Engineers?

“The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong, it usually turns out to be impossible to get at or repair.”
Douglas Adams, Mostly Harmless

Instrumentation Framework
Bug Isolation Architecture
Program Source → Compiler → Shipping Application (with Sampler) → Predicates → Counts & ✓/✗ outcome → Statistical Debugging → Top bugs with likely causes
Each behavior is expressed as a predicate P on program state at a particular program point.
Count how often “P observed true” and “P observed” using sparse but fair random samples of complete behavior.
Model of Behavior
Predicate Injection:Guessing What’s Interesting
Branch Predicates Are Interesting
if (p) … else …

if (p)
  // p was true (nonzero)
else
  // p was false (zero)
Syntax yields instrumentation site Site yields predicates on program behavior Exactly one predicate true per visit to site
Branch Predicate Counts
Returned Values Are Interesting
n = fprintf(…);

Did you know that fprintf() returns a value? Do you know what the return value means? Do you remember to check it?

n = fprintf(…);
// return value < 0 ?
// return value == 0 ?
// return value > 0 ?
Syntax yields instrumentation site Site yields predicates on program behavior Exactly one predicate true per visit to site
Returned Value Predicate Counts
Pair Relationships Are Interesting
int i, j, k; … i = …;

Pair Relationship Predicate Counts

int i, j, k; … i = …;
// compare new value of i with…
// other vars: j, k, …
// old value of i
// “important” constants
Many Other Behaviors of Interest
Assert statements• Perhaps automatically introduced, e.g. by CCured
Unusual floating point values• Did you know there are nine kinds?
Coverage of modules, functions, basic blocks, …
Reference counts: negative, zero, positive, invalid
Kinds of pointer: stack, heap, null, …
Temporal relationships: x before/after y
More ideas? Toss them all into the mix!
Observation stream ⇒ observation count
• How often is each predicate observed true?
• Removes time dimension, for good or ill
Bump exactly one counter per observation
• Infer additional predicates (e.g. ≤, ≠, ≥) offline
Feedback report is:
1. Vector of predicate counters
2. Success/failure outcome label
Still quite a lot to measure
• What about performance?
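A feedback report of the kind described above can be sketched concretely. The site name and report layout are illustrative, not CBI's actual wire format:

```python
from collections import Counter

counts = Counter()        # (site, predicate) -> times observed true

def observe_return(site, value):
    """Returned-value instrumentation: exactly one of the three
    predicates is observed true per visit to the site."""
    if value < 0:
        counts[(site, '< 0')] += 1
    elif value == 0:
        counts[(site, '== 0')] += 1
    else:
        counts[(site, '> 0')] += 1
    return value          # pass the return value through unchanged

# One run's feedback report: the counter vector plus an outcome label.
n = observe_return('fprintf@line42', -1)
report = (dict(counts), 'failure' if n < 0 else 'success')
print(report)
```

Wrapping the original expression so its value passes through unchanged is what lets instrumentation be injected without altering program behavior.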
Summarization and Reporting
Fair Sampling Transformation
Sampling the Bernoulli Way
Decide to examine or ignore each site…• Randomly• Independently• Dynamically
Cannot be periodic: unfair temporal aliasing
Cannot toss a coin at each site: too slow
Amortized Coin Tossing
Randomized global countdown
• Small countdown ⇒ upcoming sample
Selected from geometric distribution• Inter-arrival time for biased coin toss• How many tails before next head?• Mean sampling rate is tunable parameter
Geometric Distribution

D = mean of distribution = expected sample density

countdown = ⌊ log(rand(0,1)) / log(1 − 1/D) ⌋ + 1
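The countdown scheme above can be sketched in Python (next_countdown and site are illustrative names; the real system compiles the decrement into the instrumented binary):

```python
import math
import random

D = 100   # mean sampling density: about 1 in 100 observations sampled

def next_countdown():
    """Geometric inter-arrival time of a biased coin: how many tails
    before the next head, for head probability 1/D. Uses
    1 - random() so the argument to log is in (0, 1]."""
    return int(math.log(1.0 - random.random())
               / math.log(1.0 - 1.0 / D)) + 1

countdown = next_countdown()

def site(record, observation):
    """One instrumentation site: a decrement on the common path, and
    the expensive sampling path only when the countdown expires."""
    global countdown
    countdown -= 1
    if countdown == 0:
        record(observation)        # rare, slow path
        countdown = next_countdown()

samples = []
for i in range(100000):
    site(samples.append, i)
print(len(samples))  # roughly one sample per D observations
```

Because the countdown is drawn from a geometric distribution, the scheme is statistically identical to tossing an independent biased coin at every site, but pays only a decrement on the common path.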
Weighing Acyclic Regions
Break CFG into acyclic regions

Each region has:
• Finite number of paths
• Finite max number of instrumentation sites

Compute max weight in bottom-up pass
[Figure: acyclic CFG region with nodes weighted 1–4 by the maximum number of instrumentation sites on any path below them]
Weighing Acyclic Regions
Clone acyclic regions
• “Fast” variant
• “Slow” variant
Choose at run time
Retain decrements on fast path for now
• Stay tuned…
[Figure: region entry compares the countdown against the region weight (> 4?) to choose the fast or slow clone]
Path Balancing Optimization
Decrements on fast path are a bummer
• Goal: batch them up
• But some paths are shorter than others
Idea: add extra “ghost” instrumentation sites
• Pad out shorter paths
• All paths now equal
[Figure: the same region with ghost sites padding shorter paths so every path crosses the same number of sites]
Path Balancing Optimization
Fast path is faster
• One bulk counter decrement on entry
• Instrumentation sites have no code at all
Slow path is slower
• More decrements
• Consumes more randomness
[Figure: the balanced region; the fast clone performs a single bulk decrement of 4 on entry]
Optimizations
Identify and ignore “weightless” functions / cycles
Cache global countdown in local variable
Avoid cloning
Static branch prediction at region heads
Partition sites among several binaries
Many additional possibilities…
What Does This Give Us?
Absolutely certain of what we do see• Subset of dynamic behavior• Success/failure label for entire run
Uncertain of what we don’t see
Given enough runs, samples ≈ reality• Common events seen most often• Rare events seen at proportionate rate
Playing the Numbers Game
Isolating a Deterministic Bug
Hunt for crashing bug in ccrypt-1.2
Sample function return values
• Triple of counters per call site: < 0, == 0, > 0
Use process of elimination
• Look for predicates true on some bad runs, but never true on any good run
Elimination Strategies
Universal falsehood
• Disregard P if |P| = 0 for all runs
• Likely a predicate that can never be true
Lack of failing coverage
• Disregard all predicates at site S if |S| = 0 for all failed runs
• Site not reached in failing executions
Lack of failing example
• Disregard P if |P| = 0 for all failed executions
• P need not be true for a failure to occur
Successful counterexample
• Disregard P if |P| > 0 on at least one successful run
• P can be true without causing failure
Winnowing Down the Culprits
1710 counters
• 3 × 570 call sites
1569 zero on all runs
• 141 remain
139 nonzero on at least one successful run
• 2 remain
Not much left!
• file_exists() > 0
• xreadline() == 0
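The elimination strategies amount to a small filter over feedback reports. A sketch, using the (counts, outcome) report layout assumed throughout this sketch's examples:

```python
def winnow(runs):
    """Keep predicates true on some failing run but never on a good
    run (the process-of-elimination criterion above).

    runs: list of (counts, outcome); counts maps predicate -> times
    observed true, outcome is 'success' or 'failure'."""
    preds = set()
    for counts, _ in runs:
        preds |= set(counts)
    return {p for p in preds
            if any(c.get(p, 0) > 0 for c, o in runs if o == 'failure')
            and not any(c.get(p, 0) > 0 for c, o in runs if o == 'success')}

runs = [({'file_exists() > 0': 2, 'n > 0': 1}, 'failure'),
        ({'xreadline() == 0': 1}, 'failure'),
        ({'n > 0': 5}, 'success')]
print(sorted(winnow(runs)))  # ['file_exists() > 0', 'xreadline() == 0']
```

The run data here is made up to mirror the ccrypt result; the point is only that a handful of predicates survive the filter.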
[Plot: number of “good” features left (0–140) versus number of successful trials used (0–3000)]
Multiple, Non-Deterministic Bugs
Strict process of elimination won’t work
• Can’t assume program will crash when it should
• No single common characteristic of all failures
Look for general correlation, not perfect prediction
Warning! Statistics ahead!
Ranked Predicate Selection
Consider each predicate P one at a time• Include inferred predicates (e.g. ≤, ≠, ≥)
How likely is failure when P is true?• (technically, when P is observed to be true)
Multiple bugs yield multiple bad predicates
Some Definitions
F(P) = # failing runs with P > 0
S(P) = # successful runs with P > 0

Bad(P) = F(P) / (S(P) + F(P))
Are We Done? Not Exactly!

Bad(f = NULL) = 1.0
Bad(x = 0) = 1.0

Predicate (x = 0) is an innocent bystander
• Program is already doomed
Crash Probability
Identify unlucky sites on the doomed path
Background risk of failure for reaching this site, regardless of predicate truth/falsehood
Context(P) = F(P observed) / (S(P observed) + F(P observed))
Isolate the Predictive Value of P
Does P being true increase the chance of failure over the background rate?
Formal correspondence to likelihood ratio testing
Increase(P) = Bad(P) − Context(P)
Increase Isolates the Predictor
Increase(f = NULL) = 1.0
Increase(x = 0) = 0.0
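The three quantities can be computed directly from feedback reports. A sketch, where a predicate counts as "observed" in a run iff its site appears in that run's report at all (an assumption of this sketch):

```python
def scores(runs, p):
    """Bad(P), Context(P), and Increase(P) from feedback reports.

    runs: list of (counts, outcome); counts maps predicate -> times
    observed true, and a predicate's presence in the report (even
    with count 0) means its site was reached."""
    f_true = sum(1 for c, o in runs if o == 'failure' and c.get(p, 0) > 0)
    s_true = sum(1 for c, o in runs if o == 'success' and c.get(p, 0) > 0)
    f_obs = sum(1 for c, o in runs if o == 'failure' and p in c)
    s_obs = sum(1 for c, o in runs if o == 'success' and p in c)
    bad = f_true / (s_true + f_true)
    context = f_obs / (s_obs + f_obs)
    return bad, context, bad - context

# f = NULL is a real predictor; x = 0 is the innocent bystander.
runs = [({'f = NULL': 1, 'x = 0': 1}, 'failure'),
        ({'f = NULL': 0, 'x = 0': 1}, 'success')]
print(scores(runs, 'f = NULL'))  # (1.0, 0.5, 0.5)
print(scores(runs, 'x = 0'))    # (0.5, 0.5, 0.0)
```

As on the slides, the bystander's Bad score is washed out by Context, leaving Increase = 0, while the real predictor keeps a positive Increase.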
It Works!
…for programs with just one bug.
Need to deal with multiple bugs
• How many? Nobody knows!
Redundant predictors remain a major problem
Goal: isolate a single “best” predictor for each bug, with no prior knowledge
of the number of bugs.
Multiple Bugs: Some Issues
A bug may have many redundant predictors• Only need one, provided it is a good one
Bugs occur on vastly different scales• Predictors for common bugs may dominate, hiding
predictors of less common problems
Bad Idea #1: Rank by Increase(P)
High Increase but very few failing runs
These are all sub-bug predictors
• Each covers one special case of a larger bug
Redundancy is clearly a problem
Bad Idea #2: Rank by F(P)
Many failing runs but low Increase
These tend to be super-bug predictors
• Each covers several bugs, plus lots of junk
A Helpful Analogy
In the language of information retrieval• Increase(P) has high precision, low recall• F(P) has high recall, low precision
Standard solution:• Take the harmonic mean of both• Rewards high scores in both dimensions
Rank by Harmonic Mean
Definite improvement
• Large increase, many failures, few or no successes
But redundancy is still a problem
Redundancy Elimination
One predictor for a bug is interesting
• Additional predictors are a distraction
• Want to explain each failure once
Similar to minimum set-cover problem
• Cover all failed runs with a subset of predicates
• Greedy selection using harmonic ranking
Simulated Iterative Bug Fixing
1. Rank all predicates under consideration
2. Select the top-ranked predicate P
3. Add P to bug predictor list
4. Discard P and all runs where P was true
• Simulates fixing the bug predicted by P
• Reduces rank of similar predicates
5. Repeat until out of failures or predicates
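The rank-select-discard loop can be sketched as follows, using the harmonic mean of Increase(P) and a normalized F(P) as the ranking; the data layout (predicate to (increase, set of failing-run ids)) is illustrative:

```python
def harmonic(increase, f, total_f):
    """Harmonic mean of Increase (precision) and F/totalF (recall):
    high only when both are high."""
    if increase <= 0 or f == 0:
        return 0.0
    return 2.0 / (1.0 / increase + total_f / f)

def iterative_fixing(predicates, failing_runs):
    """Greedy, set-cover-style selection of one predictor per bug.

    predicates: dict P -> (increase, set of failing-run ids where P
    was true)."""
    predictors, remaining = [], set(failing_runs)
    while remaining and predicates:
        total = len(remaining)
        best = max(predicates, key=lambda p: harmonic(
            predicates[p][0], len(predicates[p][1] & remaining), total))
        inc, fails = predicates.pop(best)
        covered = fails & remaining
        if harmonic(inc, len(covered), total) == 0.0:
            break                       # nothing predictive is left
        predictors.append(best)
        remaining -= covered            # simulate fixing best's bug
    return predictors

preds = {'f = NULL': (1.0, {1, 2, 3}),   # predicts the common bug
         'x = 0':    (0.0, {1, 2, 3}),   # bystander: no Increase
         'p < 0':    (0.9, {4})}         # predicts a rarer bug
print(iterative_fixing(preds, {1, 2, 3, 4}))  # ['f = NULL', 'p < 0']
```

Discarding the covered runs is what lets the rarer bug's predictor rise to the top of the next round, instead of being dominated by the common bug.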
Not Covered Today
Visualization of Bug Predictors
• Simple visualization may help reveal trends
[Plot: candidate predictors with Increase(P) and its error bound on one axis and log(F(P) + S(P)) on the other; S(P) and Context(P) annotate the points]
Not Covered Today
Reconstruction of failing paths
• Bug predictor is often the smoking gun, but not always
• Want a short, feasible path that exhibits the bug
– “Just because it’s undecidable doesn’t mean we don’t need an answer.”