SherLog: Error Diagnosis by Connecting Clues from Run-time Logs
Post on 16-Jan-2015
Introduction
Scenario - production run failure
Failure reproduction: reproduce the failed execution to figure out what was going on in the program
Challenges
customers’ privacy concerns
difficulty in setting up the exact same execution environment
lack of a low-overhead logging mechanism for failure reproduction on multi-processors (why?)
Common Practice in Industry
customers send logs to vendors in case of failure
vendors analyze logs to find clues to the problem
Research Question
How can we locate the root cause of a failure by analyzing logs, even without reproducing the failed execution?
CSE 888, Dacong (Tony) Yan SherLog: Error Diagnosis by Connecting Clues from Run-time Logs 2/13
Approach
Idea
Ideal Goal: find out what exactly happened in the failure execution, i.e. the exact failure-inducing execution paths
Realistic Goal: identify the Must-Have, May-Have, and Must-Not-Have paths, and the states of variables on the possible paths
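The three categories can be illustrated with a small set-based sketch. It assumes we already have a list of candidate execution paths that are consistent with the log, each written as a sequence of code locations. The function name `classify_locations` and the location names are hypothetical, and this is only an illustration of the terminology, not SherLog's actual analysis.

```python
def classify_locations(candidate_paths, all_locations):
    """Split code locations into must-have / may-have / must-not-have.

    candidate_paths: list of paths (each a list of location names) that are
                     consistent with the observed log (assumed to be given).
    all_locations:   every location in the program under analysis.
    """
    path_sets = [set(path) for path in candidate_paths]
    # Locations on every candidate path must have been executed.
    on_every = set.intersection(*path_sets) if path_sets else set()
    # Locations on at least one candidate path may have been executed.
    on_some = set.union(*path_sets) if path_sets else set()
    return {
        "must_have": on_every,
        "may_have": on_some - on_every,
        "must_not_have": set(all_locations) - on_some,
    }


# Hypothetical example: two candidate paths that both end at the same log call.
paths = [
    ["main", "parse", "log_err"],
    ["main", "retry", "log_err"],
]
locations = {"main", "parse", "retry", "log_err", "cleanup"}
result = classify_locations(paths, locations)
```

Here `main` and `log_err` land in the must-have set, `parse` and `retry` in the may-have set, and `cleanup` in the must-not-have set.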
Usage Scenario
runs the tool to get an interesting path
queries or examines values of certain interesting variables along the path
repeats the previous step until the root cause is found
Design
Three main components:
Log Parsing: locates the source code lines printing the messages
Path Inference: infers the Must-Paths, May-Paths, and Pruned-Paths
Value Inference: infers the variable values on the paths
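The Log Parsing step can be sketched as follows, assuming log messages are produced by printf-style format strings. The idea shown is to turn each format string into a regular expression and report which source lines could have printed a given log line. The table `LOG_STATEMENTS`, the file name `example.c`, and both function names are hypothetical; this is a minimal illustration, not SherLog's implementation.

```python
import re

# Matches a simple printf conversion such as %d, %5s, %ld, %x.
# Assumption: a literal "%%" is not handled in this sketch.
SPEC = re.compile(r"%[-+ #0]*\d*(?:\.\d+)?(?:hh|h|ll|l|z)?([diuxXsc])")


def format_to_regex(fmt):
    """Convert a printf-style format string into an anchored regex."""
    parts = []
    pos = 0
    for m in SPEC.finditer(fmt):
        parts.append(re.escape(fmt[pos:m.start()]))  # literal text
        conv = m.group(1)
        if conv in "diu":
            parts.append(r"([-+]?\d+)")
        elif conv in "xX":
            parts.append(r"([0-9a-fA-F]+)")
        elif conv == "c":
            parts.append(r"(.)")
        else:  # %s
            parts.append(r"(.*?)")
        pos = m.end()
    parts.append(re.escape(fmt[pos:]))
    return re.compile("^" + "".join(parts) + "$")


def match_log_line(line, statements):
    """Return (file, line number, captured values) for every statement
    whose format string could have produced the log line."""
    hits = []
    for filename, lineno, fmt in statements:
        m = format_to_regex(fmt).match(line)
        if m:
            hits.append((filename, lineno, m.groups()))
    return hits


# Hypothetical logging statements extracted from source code.
LOG_STATEMENTS = [
    ("example.c", 42, "%s: cannot open %s"),
    ("example.c", 77, "store size %d exceeds limit %d"),
]

hits = match_log_line("store size 4096 exceeds limit 1024", LOG_STATEMENTS)
```

A log line may match more than one statement, so the function returns all candidates. The captured values are the kind of run-time information that the later inference steps could use as clues.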
Evaluation
Methodology
manually reproduce and diagnose the failure
collect path summaries at runtime
compare the result of SherLog with the reproduction
Terminology
useful: SherLog infers a subset of the summarized information
complete: SherLog infers all the information necessary for debugging
Overall Results
Case Studies
Three case studies to demonstrate the effectiveness of SherLog:
Case 1: ln of coreutils 4.5.1
Case 2: Squid web proxy cache server
Case 3: CVS Configuration Error
Squid Case Study
Performance
Discussion
What can we do with the results of SherLog? Can these successive steps be automated as well?
How helpful is the result of SherLog for debugging? Or, more generally, how do we evaluate automated debugging tools?
How useful is SherLog when it is not complete?