Highly Scalable Distributed Dataflow Analysis Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan Chelsea LeBlanc Todd Austin Valeria Bertacco CGO, Chamonix, France April 6, 2011

Software Errors Abound

NIST: Software errors cost the U.S. ~$60 billion/year as of 2002
FBI CCS: Security issues cost $67 billion/year as of 2005
  >⅓ from viruses, network intrusion, etc.

Goals of this Work

High-quality dynamic software analysis
  Find difficult bugs that other analyses miss
Distribute tests to large populations
  Low overhead, or users get angry
Accomplished by sampling the analyses
  Each user only tests part of the program

Dynamic Dataflow Analysis

Associate meta-data with program values
Propagate/Clear meta-data while executing
Check meta-data for safety & correctness
Forms dataflows of meta/shadow information

Example Dynamic Dataflow Analysis

[Figure: dataflow graph of a program fed by untrusted Input, drawn once for the Data and once for the Meta-data]

Associate: x = read_input()
Clear: validate(x)
Propagate: y = x * 1024
Other statements: w = x + 42, a += y, z = y * 75
Checks: Check w, Check a, Check z
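
The associate/propagate/clear/check cycle can be sketched in a few lines of Python. This is only an illustration, not the tool from the talk: the Shadow class, the one-bit-per-variable meta-data, and the statement order are assumptions made here, and the constant 7 stands in for read_input().

```python
# Minimal taint-tracking sketch: one shadow (meta-data) bit per variable.
# Every name below is hypothetical and chosen for illustration only.

class TaintError(Exception):
    """Raised when a check finds tainted (untrusted) meta-data."""


class Shadow:
    def __init__(self):
        self.tainted = set()            # names of variables carrying taint

    def associate(self, dst):
        """Mark a value that came straight from untrusted input."""
        self.tainted.add(dst)

    def propagate(self, dst, *srcs):
        """dst is tainted if and only if some source operand is tainted."""
        if any(s in self.tainted for s in srcs):
            self.tainted.add(dst)
        else:
            self.tainted.discard(dst)

    def clear(self, dst):
        """Validation removes the taint from a value."""
        self.tainted.discard(dst)

    def check(self, name):
        """Fail if a tainted value reaches a sensitive use."""
        if name in self.tainted:
            raise TaintError(f"{name} is tainted")


def run_example():
    shadow = Shadow()

    x = 7                               # stands in for x = read_input()
    shadow.associate("x")               # Associate

    y = x * 1024
    shadow.propagate("y", "x")          # Propagate
    a = 0
    a += y
    shadow.propagate("a", "a", "y")
    z = y * 75
    shadow.propagate("z", "y")

    shadow.clear("x")                   # Clear: validate(x) would run here
    w = x + 42
    shadow.propagate("w", "x")

    results = {}
    for name in ("w", "a", "z"):        # Check w, Check a, Check z
        try:
            shadow.check(name)
            results[name] = "ok"
        except TaintError:
            results[name] = "tainted"
    return results
```

Under these assumptions, w passes its check because x was validated before w was computed, while a and z are flagged because they derive from the unvalidated x.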

Distributed Dynamic Dataflow Analysis

Split analysis across large populations
  Observe more runtime states
  Report problems developer never thought to test

[Figure labels: Instrumented Program; Potential problems]

Problem: DDAs are Slow

Taint Analysis (e.g. TaintCheck)
Dynamic Bounds Checking
FP Accuracy Verification

[Figure: slowdown ranges shown for the analyses on the slide: 10-200x, 2-200x, 10-80x, 5-50x, 100-500x, 2-300x]

Our Solution: Sampling

Lower overheads by skipping some analyses

[Figure labels: Complete Analysis; No Analysis]

Sampling Allows Distribution

Developer
Beta Testers
End Users

Many users testing at little overhead see more errors than one user at high overhead.
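
A back-of-the-envelope model shows why this claim is plausible. The model is an assumption made here, not something stated in the talk: an error manifests in a fraction manifest_rate of runs, each user analyzes a fraction sample_rate of their execution, and users are independent. All numbers are hypothetical.

```python
def chance_any_detects(manifest_rate, sample_rate, users):
    """Probability that at least one of `users` independent testers observes
    the error, when each one sees it with probability manifest_rate * sample_rate."""
    per_user = manifest_rate * sample_rate
    return 1.0 - (1.0 - per_user) ** users


# One developer running the complete analysis on a single run.
one_developer = chance_any_detects(manifest_rate=0.001, sample_rate=1.0, users=1)

# A large population, each member analyzing only 1% of what they execute.
many_users = chance_any_detects(manifest_rate=0.001, sample_rate=0.01, users=1_000_000)

print(one_developer < many_users)
```

With these made-up rates the population is far more likely to observe the error than the lone developer, even though each user pays a small fraction of the analysis cost.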

Cannot Naïvely Sample Code

[Figure: the example dataflow graph (Input, x = read_input(), validate(x), y = x * 1024, w = x + 42, a += y, z = y * 75, Check w, Check z, Check a) with two instructions marked "Skip Instr."; one check is marked False Positive and another False Negative]
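
The failure mode can be reproduced with a small sketch. It assumes a one-bit taint per variable and a particular statement order, neither of which is taken from the slide; naive sampling is modeled as dropping the shadow update of chosen statements while the program itself still runs.

```python
def run(skip=()):
    """Execute the example, omitting the instrumentation of statements in `skip`.

    Returns the set of variables whose final check would fire.
    """
    tainted = set()

    def propagate(dst, *srcs):
        # dst is tainted if and only if some source is tainted
        if any(s in tainted for s in srcs):
            tainted.add(dst)
        else:
            tainted.discard(dst)

    # (statement, shadow update) pairs; the order is an assumption
    program = [
        ("x = read_input()", lambda: tainted.add("x")),
        ("y = x * 1024",     lambda: propagate("y", "x")),
        ("a += y",           lambda: propagate("a", "a", "y")),
        ("validate(x)",      lambda: tainted.discard("x")),
        ("w = x + 42",       lambda: propagate("w", "x")),
    ]
    for stmt, update_shadow in program:
        if stmt not in skip:            # naive code sampling
            update_shadow()
    return {v for v in ("w", "a") if v in tainted}


full = run()                                      # ground truth
false_positive = run(skip={"validate(x)"})        # stale taint stays on x
false_negative = run(skip={"y = x * 1024"})       # taint never reaches y
```

Skipping the instrumentation of validate(x) leaves x tainted, so w is reported even though it is safe. Skipping the instrumentation of y = x * 1024 loses the taint, so the report on a disappears.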

Our Solution: Sample Data, not Code

Sampling must be aware of meta-data
Remove meta-data from skipped dataflows
  Prevents false positives

Dataflow Sampling Example

[Figure: the example dataflow graph (Input, x = read_input(), validate(x), y = x * 1024, w = x + 42, a += y, z = y * 75, Check w, Check z, Check a) with two dataflows marked "Skip Dataflow"; the only incorrect result marked is a False Negative]
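
A sketch of sampling data rather than code, under the same assumptions as before (one taint bit per variable, an assumed statement order, hypothetical names). Every statement stays instrumented; a dataflow is skipped by deleting the meta-data of chosen variables, so later propagation sees nothing to follow.

```python
STATEMENTS = (
    "x = read_input()",
    "y = x * 1024",
    "a += y",
    "validate(x)",
    "w = x + 42",
)


def run(drop=None):
    """Execute the example with full instrumentation.

    `drop` maps a statement to the variables whose meta-data is removed
    right after that statement, which models skipping those dataflows.
    Returns the set of variables whose final check would fire.
    """
    drop = drop or {}
    tainted = set()

    def propagate(dst, *srcs):
        if any(s in tainted for s in srcs):
            tainted.add(dst)
        else:
            tainted.discard(dst)

    updates = {
        "x = read_input()": lambda: tainted.add("x"),
        "y = x * 1024":     lambda: propagate("y", "x"),
        "a += y":           lambda: propagate("a", "a", "y"),
        "validate(x)":      lambda: tainted.discard("x"),
        "w = x + 42":       lambda: propagate("w", "x"),
    }
    for stmt in STATEMENTS:
        updates[stmt]()                     # instrumentation is never skipped
        for name in drop.get(stmt, ()):
            tainted.discard(name)           # skip this dataflow: remove its meta-data
    return {v for v in ("w", "a") if v in tainted}


full = run()
sampled = run(drop={"y = x * 1024": ["y"]})   # stop following y
```

In this sketch a skipped dataflow can only lose reports (a false negative on a). Because validate(x) is always instrumented, the report set under sampling is always a subset of the full analysis, so no false positive appears.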

Mechanisms for Dataflow Sampling (1)

Start with demand analysis

[Figure: a Demand Analysis Tool, above the Operating System, switches between the Native Application and the Instrumented Application (e.g. Valgrind); transitions labeled "Meta-data" and "No meta-data"]
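
The switching behavior can be sketched as follows. This is a deliberately simplified model and an assumption made here: meta-data is tracked per 4 KiB page, and the mode is decided per access. The class and method names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class DemandAnalysis:
    """Run natively until an access touches a page that holds meta-data."""

    def __init__(self):
        self.pages_with_metadata = set()
        self.mode = "native"

    def add_metadata(self, addr):
        """Record that the page containing addr now holds meta-data."""
        self.pages_with_metadata.add(addr // PAGE_SIZE)

    def access(self, addr):
        """Return the mode in which this memory access would execute."""
        if addr // PAGE_SIZE in self.pages_with_metadata:
            self.mode = "instrumented"   # meta-data touched: analysis needed
        else:
            self.mode = "native"         # no meta-data: full speed
        return self.mode


tool = DemandAnalysis()
tool.add_metadata(0x2008)
modes = [tool.access(a) for a in (0x1000, 0x2004, 0x3000)]
```

Only the access to the page that holds meta-data pays for instrumentation; the other two run natively.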

Mechanisms for Dataflow Sampling (2)

Remove dataflows if execution is too slow

[Figure: a Sampling Analysis Tool, above the Operating System, switches between the Native Application and the Instrumented Application; transitions labeled "Meta-data", "Clear meta-data", and "OH Threshold"]
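
One way to picture the overhead threshold is a per-window time budget. The sketch below is an assumption, not the prototype's code: the OverheadManager class, its methods, and the use of simulated timestamps are all hypothetical. The 1% per 10 seconds setting is borrowed from the accuracy experiment later in the deck.

```python
class OverheadManager:
    """Allow at most budget_fraction of each window to be spent in analysis."""

    def __init__(self, budget_fraction, window):
        self.window = window
        self.budget = budget_fraction * window   # seconds of analysis per window
        self.window_start = 0.0
        self.spent = 0.0

    def charge(self, now, analysis_time):
        """Record analysis time; return True while the budget is not exceeded."""
        elapsed = now - self.window_start
        if elapsed >= self.window:               # a new window has begun
            self.window_start = now - elapsed % self.window
            self.spent = 0.0
        self.spent += analysis_time
        return self.spent <= self.budget


def sample(events, ohm):
    """events: (timestamp, analysis_cost) pairs for work that touched meta-data.

    Returns the timestamps at which the tool would clear meta-data and
    fall back to native execution.
    """
    cleared_at = []
    for now, cost in events:
        if not ohm.charge(now, cost):
            cleared_at.append(now)               # over threshold: clear meta-data
    return cleared_at


ohm = OverheadManager(budget_fraction=0.01, window=10.0)
cleared = sample([(0.0, 0.05), (1.0, 0.04), (2.0, 0.05), (12.0, 0.05)], ohm)
```

In this trace the third event pushes analysis time past the budget for the first window, so meta-data is cleared once, and analysis resumes in the next window.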

Prototype Setup

Taint analysis sampling system
  Network packets untrusted
Xen-based demand analysis
  Whole-system analysis with modified QEMU
Overhead Manager (OHM) is user-controlled

[Figure labels: Xen Hypervisor; OS and Applications (Linux; App, App, App, …); Shadow Page Table; Admin VM; Taint Analysis QEMU; Net Stack; OHM]

Benchmarks

Performance – Network Throughput
  Example: ssh_receive
Accuracy of Sampling Analysis
  Real-world Security Exploits

Name      Error Description
Apache    Stack overflow in Apache Tomcat JK Connector
Eggdrop   Stack overflow in Eggdrop IRC bot
Lynx      Stack overflow in Lynx web browser
ProFTPD   Heap smashing attack on ProFTPD Server
Squid     Heap smashing attack on Squid proxy server

Performance of Dataflow Sampling

[Figure: ssh_receive benchmark; reference line labeled "Throughput with no analysis"]

Accuracy at Very Low Overhead

Max time in analysis: 1% every 10 seconds
Always stop analysis after threshold
  Lowest probability of detecting exploits

Name      Chance of Detecting Exploit
Apache    100%
Eggdrop   100%
Lynx      100%
ProFTPD   100%
Squid     100%

Accuracy with Background Tasks

ssh_receive running in background

Conclusion & Future Work

Dynamic dataflow sampling gives users a knob to control accuracy vs. performance

Better methods of sample choices
Combine static information
New types of sampling analysis

BACKUP SLIDES

Outline

Software Errors and Security
Dynamic Dataflow Analysis
Sampling and Distributed Analysis
Prototype System
Performance and Accuracy

Detecting Security Errors

Static Analysis
  Analyze source, formal reasoning
  + Find all reachable, defined errors
  – Intractable, requires expert input, no system state
Dynamic Analysis
  Observe and test runtime state
  + Find deep errors as they happen
  – Only along traversed path, very slow

Security Vulnerability Example

Buffer overflows are a large class of security vulnerabilities

    void foo()
    {
        int local_variables;
        int buffer[256];
        …
        buffer = read_input();   /* slide shorthand: read_input() fills buffer */
        …
        return;
    }

[Figure: stack frame showing Return address, Local variables, and buffer; label: "Buffer Fill"]

If read_input() reads 200 ints: the input stays inside buffer
If read_input() reads >256 ints: New Return address, Bad Local variables
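
The overflow can be modeled without touching real memory. The sketch below is an illustration under an assumed frame layout (buffer first, then the local variables, then the return address); call_foo and its parameters are hypothetical names, and a Python list stands in for the stack.

```python
def call_foo(input_ints, buffer_len=256):
    """Model foo()'s stack frame as a flat list and copy input into it unchecked.

    Returns what ends up in the local-variables slot and the return-address slot.
    """
    frame = [0] * buffer_len + ["local_variables", "return_address"]
    # Unchecked copy: nothing stops the write at the end of the buffer.
    # The slice only keeps this model inside its own list.
    for i, value in enumerate(input_ints[:len(frame)]):
        frame[i] = value
    return frame[buffer_len], frame[buffer_len + 1]


safe = call_foo([1] * 200)       # fits inside the 256-entry buffer
smashed = call_foo([1] * 258)    # runs past the buffer
```

With 200 ints the two slots above the buffer are untouched; with 258 ints both are overwritten by attacker-controlled input.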

Performance of Dataflow Sampling (2)

[Figure: netcat_receive benchmark; reference line labeled "Throughput with no analysis"]

Performance of Dataflow Sampling (3)

[Figure: ssh_transmit benchmark; reference line labeled "Throughput with no analysis"]

Accuracy with Background Tasks

netcat_receive running with benchmark

Width Test