Merlin: Inferring Specifications for Explicit Information Flow Problems
Ben Livshits, Aditya Nori
Sriram Rajamani, Anindya Banerjee
Web application vulnerabilities are a serious threat addressed by static analysis tools.
Microsoft CAT.NET
MOTIVATION & PROJECT GOALS
When it comes to static analysis tools, specification quality affects result quality:
• More specification: more bugs found
• Better specification: fewer false positives
A typical specification includes dozens of sources, sinks, and sanitizers.

Type        Count  Revisions
Sources        27         11
Sanitizers      7          2
Sinks          77         10
Total         111         23

• Specification
Sources: start taint
Sinks: taint not allowed
Sanitizers: untaint data
Tools are only as good as their specification, and a good specification is hard to come by.
• This example
ReadData1, ReadData2 – source?
Cleanse – sanitizer?
WriteData – sink?
• Large scale
Libraries with their own APIs
Specification particular to application
void ProcessRequest()
{
    string s1 = ReadData1("name");
    string s2 = ReadData2("encoding");
    string s3 = Cleanse(s1);
    WriteData("Parameter " + s1);
    WriteData("Header " + s2);
}
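To make the dependence on the specification concrete, here is a minimal Python sketch of explicit taint tracking over the routine above. It is not CAT.NET's analysis. The names `Spec`, `PROCESS_REQUEST`, and `find_violations` are hypothetical, and the routine is hand-encoded as a list of call steps purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """A candidate specification: which methods play which role (hypothetical type)."""
    sources: frozenset = frozenset()
    sanitizers: frozenset = frozenset()
    sinks: frozenset = frozenset()


# The example routine as (result_variable, callee, argument_variables) steps.
# String literals are omitted because they carry no taint.
PROCESS_REQUEST = [
    ("s1", "ReadData1", []),
    ("s2", "ReadData2", []),
    ("s3", "Cleanse", ["s1"]),
    (None, "WriteData", ["s1"]),
    (None, "WriteData", ["s2"]),
]


def find_violations(steps, spec):
    """Return (step_index, sink, tainted_args) for each tainted flow into a sink."""
    tainted = set()
    violations = []
    for index, (result, callee, args) in enumerate(steps):
        tainted_args = [a for a in args if a in tainted]
        if callee in spec.sinks and tainted_args:
            violations.append((index, callee, tainted_args))
        if result is None:
            continue
        if callee in spec.sources:
            tainted.add(result)        # sources start taint
        elif callee in spec.sanitizers:
            tainted.discard(result)    # sanitizers untaint their result
        elif tainted_args:
            tainted.add(result)        # other calls propagate taint
        else:
            tainted.discard(result)
    return violations
```

Under these assumptions, a specification that marks only ReadData1 as a source and WriteData as a sink reports one flow; adding ReadData2 as a source reports a second; an empty specification reports nothing. The analysis itself never changes, only the specification does.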
ALGORITHMS
[Figure: Merlin processing. Inputs: program and initial specification. Merlin inference: (1) propagation graph construction, (2) factor graph construction, (3) probabilistic inference. Output: final specification, which feeds static analysis to report vulnerabilities.]
We convert the propagation graph we get from CAT.NET to a reduced propagation graph.
[Figure: propagation graph with nodes ReadData1, ReadData2, Prop1, Prop2, Cleanse, WriteData. Node groups highlighted on successive slides: {ReadData1, Cleanse, WriteData} and {ReadData2}; {ReadData1, ReadData2, WriteData} and {Cleanse}; {ReadData1, ReadData2, Cleanse} and {WriteData}.]
• Avoid source wrappers: Prop1 is not a source
• Avoid sink wrappers: Cleanse is not a sink
• Avoid double sanitizers: Prop1 is not a sanitizer
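One way to read the three "avoid" rules is as pairwise constraints over nodes that lie on a common path. The Python sketch below generates such pairs. The edge list is an assumption (the figure's edges are not part of this transcript), and `EDGES`, `reachable`, and `wrapper_constraints` are hypothetical names, not Merlin's code.

```python
# Assumed edges, chosen for illustration only.
EDGES = {
    "ReadData1": ["Prop1"],
    "ReadData2": ["Prop2"],
    "Prop1": ["Cleanse"],
    "Prop2": ["Cleanse"],
    "Cleanse": ["WriteData"],
    "WriteData": [],
}


def reachable(graph, start):
    """Nodes reachable from `start` by one or more edges."""
    seen, stack = set(), list(graph[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


def wrapper_constraints(graph):
    """Emit (rule, upstream, downstream) for every pair on a common path."""
    constraints = []
    for upstream in graph:
        for downstream in sorted(reachable(graph, upstream)):
            # If upstream is a source, downstream is unlikely to be one too.
            constraints.append(("no_source_wrapper", upstream, downstream))
            # If downstream is a sink, upstream is unlikely to be one too.
            constraints.append(("no_sink_wrapper", upstream, downstream))
            # Both nodes being sanitizers is unlikely.
            constraints.append(("no_double_sanitizer", upstream, downstream))
    return constraints
```

With the assumed edges, the output contains the pairs behind the three slide examples: (ReadData1, Prop1) for source wrappers, (Cleanse, WriteData) for sink wrappers, and (Prop1, Cleanse) for double sanitizers.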
We derive probabilistic constraints from the reduced propagation graph.
We approximate path constraints with triple constraints.
B1: For every acyclic path m_1, m_2, …, m_{k-1}, m_k, where m_1 is a potential source and m_k is a potential sink, the joint probability of classifying m_1 as a source, m_k as a sink, and all of m_2, …, m_{k-1} as regular nodes is low.
Number of constraints for a graph with N nodes: on the order of 2^N.

C1: For every triple of nodes ‹m_1, m_2, m_3›, where m_1 is a potential source, m_3 is a potential sink, and m_1 and m_3 are connected by a path through m_2 in the propagation graph, the joint probability that m_1 is a source, m_2 is not a sanitizer, and m_3 is a sink is low.
Number of constraints for a graph with N nodes: on the order of N^3.
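The gap between path constraints and triple constraints can be seen on a small synthetic graph. The Python sketch below is an illustration under stated assumptions, not Merlin's implementation. The layered graph and the names `build_layered_graph`, `count_paths`, and `triple_constraints` are hypothetical.

```python
from functools import lru_cache


def build_layered_graph(layers):
    """Source 's', sink 't', and `layers` stages of two parallel nodes each."""
    graph = {"s": [], "t": []}
    previous = ["s"]
    for i in range(layers):
        current = [f"a{i}", f"b{i}"]
        for node in current:
            graph[node] = []
        for node in previous:
            graph[node] = list(current)
        previous = current
    for node in previous:
        graph[node] = ["t"]
    return graph


def reachable(graph, start):
    """Nodes reachable from `start` by one or more edges."""
    seen, stack = set(), list(graph[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


def count_paths(graph, source, sink):
    """Count source-to-sink paths in an acyclic graph (one B1 constraint each)."""
    @lru_cache(maxsize=None)
    def paths_from(node):
        if node == sink:
            return 1
        return sum(paths_from(successor) for successor in graph[node])
    return paths_from(source)


def triple_constraints(graph, potential_sources, potential_sinks):
    """Enumerate (m1, m2, m3) with m2 on some path from m1 to m3 (one C1 each)."""
    reach = {node: reachable(graph, node) for node in graph}
    triples = []
    for m1 in potential_sources:
        for m2 in reach[m1]:
            for m3 in potential_sinks:
                if m3 != m2 and m3 in reach[m2]:
                    triples.append((m1, m2, m3))
    return sorted(triples)
```

For ten layers, `count_paths` finds 1024 source-to-sink paths while `triple_constraints` yields 20 triples for the single source and sink. Each added layer doubles the path count but adds only two triples.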
Probabilistic inference

Before inference:
            Source  Sanitizer  Sink
ReadData1   .95     .001       .001
ReadData2   .5      .5         .5
Cleanse     .5      .5         .5
WriteData   .5      .5         .85
…

After inference:
            Source  Sanitizer  Sink
ReadData1   .95     .001       .001
ReadData2   .5      .5         .5
Cleanse     .01     .997       .03
WriteData   .5      .5         .85
…
Direct constraint representation is too big. Factor graphs to the rescue.
[Figure: factor graph.
Variable nodes: x_ReadData1, x_ReadData2, x_Prop1, x_Prop2, x_Cleanse, x_WriteData
Factor nodes: f_C1(x_ReadData1, x_Prop1, x_WriteData), f_C1(x_ReadData2, x_Prop2, x_WriteData); f_C2(x_Prop1, x_Prop2), f_C2(x_Prop2, x_Cleanse); f_C3(x_Prop1, x_Prop2), f_C3(x_Prop2, x_Cleanse); f_C4(x_Prop1), f_C4(x_Prop2), f_C4(x_Cleanse), f_C4(x_WriteData)]
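To show how a triple constraint can shift a probability, the Python sketch below computes marginals over a three-variable factor graph by brute-force enumeration. It is only an illustration: a real factor graph solver would use message passing rather than enumeration. The priors are taken from the table (.95 for ReadData1 as a source, .5 for Cleanse as a sanitizer, .85 for WriteData as a sink). The factor values `LOW` and `HIGH` and all function names are assumptions.

```python
from itertools import product

SRC = "ReadData1_is_source"
SAN = "Cleanse_is_sanitizer"
SINK = "WriteData_is_sink"
VARIABLES = [SRC, SAN, SINK]

# Prior probability that each variable is true (values from the slide's table).
PRIORS = {SRC: 0.95, SAN: 0.5, SINK: 0.85}

# Assumed factor values: a violated triple constraint is unlikely, not impossible.
LOW, HIGH = 0.1, 0.9


def prior_factor(name, value):
    """Unary factor encoding the prior for one variable."""
    p = PRIORS[name]
    return p if value else 1.0 - p


def triple_factor(assignment):
    """Low weight when a source reaches a sink with no sanitizer in between."""
    violated = assignment[SRC] and not assignment[SAN] and assignment[SINK]
    return LOW if violated else HIGH


def marginals():
    """Marginal probability of each variable being true, by enumeration."""
    totals = {name: 0.0 for name in VARIABLES}
    normalizer = 0.0
    for values in product([False, True], repeat=len(VARIABLES)):
        assignment = dict(zip(VARIABLES, values))
        weight = triple_factor(assignment)
        for name, value in assignment.items():
            weight *= prior_factor(name, value)
        normalizer += weight
        for name, value in assignment.items():
            if value:
                totals[name] += weight
    return {name: totals[name] / normalizer for name in VARIABLES}
```

With these assumed factor values, the marginal for Cleanse being a sanitizer rises above its .5 prior. This is the direction of the change in the slide's table, but the slide's numbers come from the full model and are not reproduced by this toy example.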
EXPERIMENTAL RESULTS
We have chosen 10 line-of-business applications written in C# using ASP.NET.
[Figure: Summary of Discovered Specifications. Bar chart of specification counts (axis 0 to 160) for sources, sanitizers, and sinks; series: Original, With Merlin.]
[Figure: Summary of Discovered Vulnerabilities. Bar chart (axis -50 to 400) with series Original, With Merlin, and Eliminated; data labels 89, 335, 13.]
Analyze This: a routine from one of our benchmarks that shows how Merlin affects vulnerabilities.
[Figure: code listing with a callout marking a known sink]
Starting with an initial specification really helps, but Merlin can work with no specification at all.
Executive summary of experimental results.
• 10 large Web apps in .NET
• New specs: 167
• New vulnerabilities: 302
• False positives removed: 3
• Final false positive rate for CAT.NET after Merlin: <1%
Related work

Explicit information flow
• Analysis of Web apps (WebSSARI, Griffin, etc.) and Fortify, CAT.NET
• Asbestos, HiStar
• Hardware support

Mining security specifications
• AutoISES – security-sensitive data structures in Linux kernel
• Ganapathy

Specification mining
• Kremenek (belief inference for malloc/free)
• Perracotta, DynaMine, Daikon, Weimer