chronicler: lightweight recording to reproduce field failures · @_jon_bell_ 5/23/13 chronicler:...

47
@_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and Gail Kaiser Columbia University, New York, NY USA

Upload: others

Post on 19-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Chronicler: Lightweight Recording to Reproduce Field Failures

Jonathan Bell, Nikhil Sarda and Gail KaiserColumbia University, New York, NY USA

Page 2: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

What happens when software crashes?

Page 3: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and
Page 4: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and
Page 5: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Page 6: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Page 7: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Page 8: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Stack Traces Aren’t Enough

Method 1

Stack TraceMethod 1

Page 9: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Stack Traces Aren’t Enough

Method 1 Method 2

Stack TraceWrites

Method 1Method 2

Page 10: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Stack Traces Aren’t Enough

Method 1 Method 2

Stack TraceWrites

Method 1

Page 11: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Stack Traces Aren’t Enough

Method 1 Method 2 Method 3

Stack TraceWrites

Method 1Method 3

Page 12: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Stack Traces Aren’t Enough

Method 1 Method 2 Method 3

Stack TraceWrites Reads

*Crashes*

Method 1Method 3

Page 13: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Potential Solutions

• Frequent Snapshots [Artzi et al]

• Guided symbolic execution [Cheung et al], [Crameri et al], [Jin and Orso], [Zamfir and Candea]

Page 14: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Recording External Inputs

[Geels et al], [Clause and Orso], [Mickens et al], this work

Page 15: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Other applications on

the NetworkFiles on the user's disk

Inputs from the user (e.g. keyboard)

Our application

Recording External Inputs

Page 16: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Records all external interaction

Recording External InputsOther

applications on the Network

Files on the user's disk

Inputs from the user (e.g. keyboard)

Our application

Recording System

Page 17: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Record and Replay: Workflow

R+R System

Application

Instrumented for log

Instrumented for replay

R+R system generatestest case

Bug successfully reproduced in the lab

Used in the field

Crashes

Crashes

Page 18: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Recording Inputs

• Previous work looks at logging system calls

• e.g. read, write, open

• Can lead to very high overhead in VM-based languages, which perform many such operations just to setup the VM!

Page 19: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Our insight:API Level Logging

Page 20: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Reading a File: Log System Calls

public class ReadFile { public static void main(String[] args) throws FileNotFoundException { String text = new Scanner(new File("testFile")).useDelimiter("\\A").next(); }}

684 System Calls

$ java ReadFile

Page 21: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Reading a File: API Loggingpublic class ReadFile { public static void main(String[] args) throws FileNotFoundException { String text = new Scanner(new File("testFile")).useDelimiter("\\A").next(); }}

1 “External” API call

$ java ReadFile

Page 22: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Language VM (.NET CLR, JVM, etc)

Outside world (sources of external input)

Normal APIApplication

Chronicler

Language API

External-input API

API Level Logging: The Challenge

Page 23: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Language VM (.NET CLR, JVM, etc)

Outside world (sources of external input)

Normal APIApplication

Chronicler

Language API

External-input API

API Level Logging: The Challenge

Must determine reliably

Page 24: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Identifying External Inputs

Outside world (sources of external input)

Language VM (.NET CLR, JVM, etc)

"Native" interface

Page 25: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

External Inputs in Java

Outside world (sources of external input)

Native Methods in API

Language VM (.NET CLR, JVM, etc)

Language API

Page 26: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

External Inputs in Java

Outside world (sources of external input)

Native Methods in API

Language VM (.NET CLR, JVM, etc)Language API

API Calling Native Methods

Page 27: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Problem: A lot is done outside of the VM

Page 28: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Log explosion

public static native void arraycopy ...

Not all native methods return external input

1984 direct callers

Page 29: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Log explosion: System.arraycopy

125,714 indirect callers

public String concat(String str) { ... getChars(0, count, buf, 0); ...}

void getChars(char dst[], int dstBegin) {System.arraycopy(...);}

Page 30: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Managing Log Explosion

Outside world (external input)

Native methods in API receiving external input

Language VM (.NET CLR, JVM, etc)Language API

API calling external input methods

Outside world(not input-providing)

Page 31: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Managing Log Explosion

• Stop list: approx 70 native methods that do NOT gather external input

• Also includes StrictMath, some image processing

• Not necessary to be complete in the stop list

Page 32: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Managing Log Explosion

• Stop list: approx 70 native methods that do NOT gather external input

• Also includes StrictMath, some image processing

• Not necessary to be complete in the stop list

• Result: decrease log sites by ~25%

Page 33: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Implementation

• Targets Java

• Bytecode transformation (no source!)

• Logs all external inputs from the API

• Plenty of clever tricks: details in the paper (fast logging, callbacks, finalizers, and more!)

Page 34: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Evaluation

Page 35: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Functionality Evaluation

Find bug in bug tracker

Instrument with Chronicler

Manually Reproduce Bug

Chronicler Reproduces it

Again

Page 36: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Functionality Evaluation

• Attempted to reproduce 11 real bugs in:

• Jetty

• Apache Commons-Math

• Apache Commons-Lang

• Groovy

• Succeeded in all cases

Page 37: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

0 5000 10000 15000 20000 25000 30000 35000 40000

avrora

batik

eclipse

fop

h2

jython

luindex

lusearch

pmd

sunflow

tomcat

tradebeans

tradesoap

xalan

Average benchmark time (ms)

Baseline

Chronicler

Performance: DaCapo

<5% overhead (6)

DaCapo Absolute Performance

Page 38: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

0 5000 10000 15000 20000 25000 30000 35000 40000

avrora

batik

eclipse

fop

h2

jython

luindex

lusearch

pmd

sunflow

tomcat

tradebeans

tradesoap

xalan

Average benchmark time (ms)

Baseline

Chronicler

Performance: DaCapo

<10% overhead (4)

DaCapo Absolute Performance

Page 39: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

0 5000 10000 15000 20000 25000 30000 35000 40000

avrora

batik

eclipse

fop

h2

jython

luindex

lusearch

pmd

sunflow

tomcat

tradebeans

tradesoap

xalan

Average benchmark time (ms)

Baseline

Chronicler

Performance: DaCapo

<30% overhead (2)

DaCapo Absolute Performance

Page 40: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

0 5000 10000 15000 20000 25000 30000 35000 40000

avrora

batik

eclipse

fop

h2

jython

luindex

lusearch

pmd

sunflow

tomcat

tradebeans

tradesoap

xalan

Average benchmark time (ms)

Baseline

Chronicler

Performance: DaCapo

<40% overhead (2; nearly worst case)

DaCapo Absolute Performance

Page 41: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

0 5000 10000 15000 20000 25000 30000 35000 40000

avrora

batik

eclipse

fop

h2

jython

luindex

lusearch

pmd

sunflow

tomcat

tradebeans

tradesoap

xalan

Average benchmark time (ms)

Baseline

Chronicler

Performance: DaCapoDaCapo Absolute Performance

Page 42: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

0 5000 10000 15000 20000 25000 30000 35000 40000

avrora

batik

eclipse

fop

h2

jython

luindex

lusearch

pmd

sunflow

tomcat

tradebeans

tradesoap

xalan

Average benchmark time (ms)

Baseline

Chronicler

Performance: DaCapoDaCapo Absolute Performance

Page 43: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Performance: Best Case

0

500

1000

1500

2000

2500

3000

3500

Composite FFT SOR Monte Carlo

MatMult LU

Perf

orm

ance

(Meg

aflo

ps)

Baseline Chronicler

0.23%

1.15%

0.29%

0.19%

0.27%

0%

SciMark Absolute Performance

Page 44: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Performance: Worst Case

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

2 4 8 16 32 64 128 512 1024 2048 3072

Ove

rhea

d

Input File Size (MB)

I/O Microbenchmark Relative Performance

Logging overhead masked by JVM setup/teardown

Worst case performance

Page 45: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Limitations and Future Work

Page 46: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Limitations and Future Work

• Assumes VM correctness

• Thread Interleavings

• Employ developer-side race detectors [e.g. Naik]

• Integrate with existing thread access loggers [e.g. Huang]

• Privacy

• Symbolic execution on the client to generate new inputs [e.g. Castro, Clause]

Page 47: Chronicler: Lightweight Recording to Reproduce Field Failures · @_jon_bell_ 5/23/13 Chronicler: Lightweight Recording to Reproduce Field Failures Jonathan Bell, Nikhil Sarda and

@_jon_bell_ 5/23/13

Chronicler: Lightweight Recording to Reproduce Field Failures

Jonathan Bell, Nikhil Sarda and Gail KaiserColumbia University, New York, NY USA

https://github.com/Programming-Systems-Lab/chroniclerj

[email protected]