acm distinguished program: cooperative testing and analysis: human-tool, tool-tool, and human-human...
Post on 01-Nov-2014
Cooperative Testing and Analysis: Human-Tool, Tool-Tool, and Human-Human Cooperations to Get the Job Done
Tao Xie
North Carolina State University, Raleigh, NC, USA
Turing Test: Tell Machine and Human Apart
Human vs. Machine: Machine Better Than Human?
IBM's Deep Blue defeated chess champion Garry Kasparov in 1997
IBM Watson defeated top human Jeopardy! players in 2011
Human Better Than Machine? "What is Toronto?????"
Category U.S. CITIES: “Its largest airport was named for a World War II hero; its second largest, for a World War II battle”
Responses of Rutter and Jennings: “What is Chicago?”
Response of Watson: "What is Toronto?????"
CAPTCHA: Human is Better
"Completely Automated Public Turing test to tell Computers and Humans Apart"
Human Computer Interaction
Movie: Minority Report
CNN News
iPad
Human-Centric Software Engineering
…
Automation in Software Testing
2010 Dagstuhl Seminar 10111
Practical Software Testing: Tool Automation and Human Factors
http://www.dagstuhl.de/programm/kalender/semhp/?semnr=1011
Automated Test Generation
Recent advanced technique: Dynamic Symbolic Execution/Concolic Testing
Instrument code to explore feasible paths
Example tool: Pex from Microsoft Research (for .NET programs)
Patrice Godefroid, Nils Klarlund, and Koushik Sen. DART: directed automated random testing. In Proc. PLDI 2005
Koushik Sen, Darko Marinov, and Gul Agha. CUTE: a concolic unit testing engine for C. In Proc. ESEC/FSE 2005
Nikolai Tillmann and Jonathan de Halleux. Pex - White Box Test Generation for .NET. In Proc. TAP 2008
Dynamic Symbolic Execution
Code to generate inputs for:
    void CoverMe(int[] a)
    {
        if (a == null) return;
        if (a.Length > 0)
            if (a[0] == 1234567890)
                throw new Exception("bug");
    }

Constraints to solve                       | Data   | Observed constraints
(none)                                     | null   | a==null
a!=null                                    | {}     | a!=null && !(a.Length>0)
a!=null && a.Length>0                      | {0}    | a!=null && a.Length>0 && a[0]!=1234567890
a!=null && a.Length>0 && a[0]==1234567890  | {123…} | a!=null && a.Length>0 && a[0]==1234567890

[Figure: execution tree with branch conditions a==null, a.Length>0, a[0]==123…, each with T/F outcomes]
Loop: Choose next path, Solve (negated condition), Execute & Monitor
Done: There is no path left.
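The "Execute & Monitor" and "negated condition" steps in the table above can be sketched in a few lines of C#. This is only an illustration under stated assumptions: `Observe` is a hand-instrumented copy of CoverMe, `NextConstraint` and the `Negation` table are hypothetical helpers, and the concrete inputs are supplied by hand where a real tool such as Pex would call a constraint solver.

```csharp
using System;
using System.Collections.Generic;

public static class CoverMeTrace
{
    // Instrumented copy of CoverMe: same branches as the slide, but every
    // evaluated condition is recorded as an observed constraint.
    public static List<string> Observe(int[] a)
    {
        var path = new List<string>();
        if (a == null) { path.Add("a==null"); return path; }
        path.Add("a!=null");
        if (!(a.Length > 0)) { path.Add("!(a.Length>0)"); return path; }
        path.Add("a.Length>0");
        // The original method throws on the == branch; here we only record it.
        path.Add(a[0] == 1234567890 ? "a[0]==1234567890" : "a[0]!=1234567890");
        return path;
    }

    // Hypothetical lookup of negations for the conditions in this example.
    static readonly Dictionary<string, string> Negation = new Dictionary<string, string>
    {
        { "a==null", "a!=null" },
        { "!(a.Length>0)", "a.Length>0" },
        { "a[0]!=1234567890", "a[0]==1234567890" },
    };

    // Flip the last observed condition to get the constraint for the next
    // unexplored path; returns null when there is nothing left to flip.
    public static string NextConstraint(List<string> path)
    {
        var next = new List<string>(path);
        string last = next[next.Count - 1];
        if (!Negation.TryGetValue(last, out string flipped)) return null;
        next[next.Count - 1] = flipped;
        return string.Join(" && ", next);
    }

    public static void Main()
    {
        // Inputs chosen by hand to satisfy each constraint in turn.
        var inputs = new int[][] { null, new int[0], new[] { 0 }, new[] { 1234567890 } };
        foreach (var input in inputs)
        {
            var path = Observe(input);
            Console.WriteLine("observed: " + string.Join(" && ", path));
            Console.WriteLine("    next: " + (NextConstraint(path) ?? "(no path left)"));
        }
    }
}
```

Each "next" line reproduces the following row of the "Constraints to solve" column, and the last input leaves no condition to negate.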
Automating Test Generation
Method sequences: MSeqGen/Seeker [Thummalapenta et al. OOPSLA 11, ESEC/FSE 09], Covana [Xiao et al. ICSE 2011], OCAT [Jaygarl et al. ISSTA 10], Evacon [Inkumsah et al. ASE 08], Symclat [d'Amorim et al. ASE 06]
Environments (e.g., db, file systems, network, …): DBApp Testing [Taneja et al. ESEC/FSE 11], [Pan et al. ASE 11]; CloudApp Testing [Zhang et al. IEEE Soft 12]
Loops: Fitnex [Xie et al. DSN 09]
Code evolution: eXpress [Taneja et al. ISSTA 11]
@NCSU ASE
Pex on MSDN DevLabs: Incubation Project for Visual Studio
Download counts (20 months, Feb. 2008 - Oct. 2009)
Academic: 17,366 Devlabs: 13,022 Total: 30,388
http://research.microsoft.com/projects/pex/
Open Source Pex extensions: http://pexase.codeplex.com/
Publications: http://research.microsoft.com/en-us/projects/pex/community.aspx#publications
State-of-the-Art/Practice Testing Tools
Running Symbolic PathFinder ...
…
====================================================== results
no errors detected
====================================================== statistics
elapsed time: 0:00:02
states: new=4, visited=0, backtracked=4, end=2
search: maxDepth=3, constraints=0
choice generators: thread=1, data=2
heap: gc=3, new=271, free=22
instructions: 2875
max memory: 81MB
loaded code: classes=71, methods=884
…
Challenges Faced by Test Generation Tools
Object-creation problems (OCP): 65%
External-method call problems (EMCP): 27%
Total block coverage achieved is 50%, lowest coverage 16%.
Example: Dynamic Symbolic Execution/Concolic Testing
Instrument code to explore feasible paths
Challenge: path explosion
Example Object-Creation Problem
A graph example from QuickGraph library
Includes two classes: Graph, DFSAlgorithm
Graph: AddVertex; AddEdge requires both vertices to be in graph
00: class Graph : IVEListGraph { …
03:   public void AddVertex (IVertex v) {
04:     vertices.Add(v); // B1
      }
06:   public Edge AddEdge (IVertex v1, IVertex v2) {
07:     if (!vertices.Contains(v1))
08:       throw new VNotFoundException("");
09:     // B2
10:     if (!vertices.Contains(v2))
11:       throw new VNotFoundException("");
12:     // B3
14:     Edge e = new Edge(v1, v2);
15:     edges.Add(e);
      }
    }
    //DFS:DepthFirstSearch
18: class DFSAlgorithm { …
23:   public void Compute (IVertex s) { ...
24:     if (graph.GetEdges().Size() > 0) { // B4
25:       isComputed = true;
26:       foreach (Edge e in graph.GetEdges()) {
27:         ... // B5
28:       }
29:     }
      }
    }
[Thummalapenta et al. OOPSLA 11]
Test target: Cover true branch (B4) of Line 24
Desired object state: graph should include at least one edge
Target sequence:
    Graph ag = new Graph();
    Vertex v1 = new Vertex(0);
    Vertex v2 = new Vertex(1);
    ag.AddVertex(v1);
    ag.AddVertex(v2);
    ag.AddEdge(v1, v2);
    DFSAlgorithm algo = new DFSAlgorithm(ag);
    algo.Compute(v1);
Example External-Method Call Problems (EMCP)
Example 1: File.Exists has data dependencies on program input; the subsequent branch at Line 1 uses the return value of File.Exists.
Example 2: Path.GetFullPath has data dependencies on program input; Path.GetFullPath throws exceptions.
Example 3: String.Format does not cause any problem.
Human Can Help! Object-Creation Problems (OCP)
Tackle object-creation problems with Factory Methods
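A factory method for the graph example might look like the sketch below. It is an assumption-laden illustration, not the code from the slide: `Vertex`, `Edge`, and `Graph` are simplified stand-ins for the QuickGraph classes, and `GraphFactory.Create` is a hypothetical name. The idea is that the human encodes the valid call order (add vertices before edges) once, so the tool only has to vary two integers.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-ins for the QuickGraph classes (assumed shapes).
public class Vertex { public int Id; public Vertex(int id) { Id = id; } }
public class Edge
{
    public Vertex Source, Target;
    public Edge(Vertex s, Vertex t) { Source = s; Target = t; }
}

public class Graph
{
    readonly List<Vertex> vertices = new List<Vertex>();
    readonly List<Edge> edges = new List<Edge>();

    public void AddVertex(Vertex v) { vertices.Add(v); }

    // Mirrors the slide: both endpoints must already be in the graph.
    public Edge AddEdge(Vertex v1, Vertex v2)
    {
        if (!vertices.Contains(v1)) throw new ArgumentException("v1 not in graph");
        if (!vertices.Contains(v2)) throw new ArgumentException("v2 not in graph");
        var e = new Edge(v1, v2);
        edges.Add(e);
        return e;
    }

    public IList<Edge> GetEdges() { return edges; }
}

public static class GraphFactory
{
    // Hypothetical factory method: a test generator varies the two integers
    // instead of searching for the AddVertex/AddEdge call order itself.
    public static Graph Create(int vertexCount, int edgeCount)
    {
        var g = new Graph();
        var vs = new List<Vertex>();
        for (int i = 0; i < vertexCount; i++)
        {
            var v = new Vertex(i);
            vs.Add(v);
            g.AddVertex(v);
        }
        // Connect consecutive vertices (wrapping around), so both endpoints
        // of every edge are guaranteed to be in the graph.
        for (int i = 0; i < edgeCount && vertexCount > 0; i++)
            g.AddEdge(vs[i % vertexCount], vs[(i + 1) % vertexCount]);
        return g;
    }

    public static void Main()
    {
        Console.WriteLine(GraphFactory.Create(2, 1).GetEdges().Count);
    }
}
```

With a factory like this, any call with a positive vertex count and a positive edge count yields a graph that has at least one edge, which is the object state needed to reach the true branch B4.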
Human Can Help! External-Method Call Problems (EMCP)
Tackle external-method call problems with Mock Methods or Method Instrumentation
Mocking System.IO.File.ReadAllText
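One way to picture mocking `System.IO.File.ReadAllText` is the sketch below. It uses plain delegate injection, which is an assumption made for a self-contained example and not necessarily the mechanism shown on the slide; `ConfigReader` and `IsEnabled` are hypothetical names.

```csharp
using System;
using System.IO;

public static class ConfigReader
{
    // Hypothetical code under test. The file read is passed in as a delegate,
    // so a test can replace System.IO.File.ReadAllText with a stub.
    public static bool IsEnabled(string path, Func<string, string> readAllText)
    {
        string text = readAllText(path);
        return text.Trim() == "enabled";
    }

    // Production entry point: uses the real file system.
    public static bool IsEnabled(string path)
    {
        return IsEnabled(path, File.ReadAllText);
    }
}

public static class MockDemo
{
    public static void Main()
    {
        // Stubs ignore the path and return whatever content the test chooses,
        // so both branches are reachable without touching the disk.
        Func<string, string> enabledFile = p => "enabled\n";
        Func<string, string> emptyFile = p => "";
        Console.WriteLine(ConfigReader.IsEnabled("any.cfg", enabledFile));
        Console.WriteLine(ConfigReader.IsEnabled("any.cfg", emptyFile));
    }
}
```

Because the stub's return value is under the test's control, a test generator no longer has to explore the external method or depend on the state of the file system.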
State-of-the-Art/Practice Testing Tools
Tools Typically Don’t Communicate Challenges Faced by Them to Enable Cooperation between Tools and Users
Bigger Picture
Machine is better at task set A: mechanical, tedious, repetitive tasks, … Ex. solving constraints along a long path
Human is better at task set B: intelligence, human intent, abstraction, domain knowledge, … Ex. local reasoning after a loop, recognizing naming semantics
= A U B
Cooperation Between Human and Machine
Human-Assisted Computing: Driver: tool; Helper: human. Ex. Covana [Xiao et al. ICSE 2011]
Human-Centric Computing: Driver: human; Helper: tool. Ex. Coding duels @Pex for Fun
Interfaces are important. Contents are important too!
Human-Assisted Computing: Motivation
Tools are often not powerful enough
Human is good at some aspects that tools are not
What difficulties does the tool face?
How to communicate info to the user to get help?
How does the user help the tool based on the info?
Iterations to form Feedback Loop
Difficulties Faced by Automated-Structural-Test-Generation Tools
external-method call problems (EMCP)
object-creation problems (OCP)
Existing Solution of Problem Identification
Existing solution:
  identify all executed external-method calls
  report all object types of program inputs and fields
Limitations:
  the number is often high
  some identified problems are irrelevant for achieving higher structural coverage
DSE Challenges - Preliminary Study
Reported EMCPs: 44 vs. Real EMCPs: 0
Reported OCPs: 18 vs. Real OCPs: 5
Proposed Approach: Covana
Goal: Precisely identify problems faced by tools when achieving structural coverage
Insight: Partially-Covered Statements have data dependency on real problem candidates
[Xiao et al. ICSE 11]
Xusheng Xiao, Tao Xie, Nikolai Tillmann, and Jonathan de Halleux. Precise Identification of Problems for Structural Test Generation. In Proc. ICSE 2011
Overview of Covana
[Figure: Covana workflow. Components: Forward Symbolic Execution, Problem Candidate Identification, Data Dependence Analysis. Labeled inputs and outputs: Program, Generated Test Inputs, Runtime Events, Runtime Information, Problem Candidates, Coverage, Identified Problems.]
Problem Candidate Identification
Data Dependencies
External-method calls whose arguments have data dependencies on program inputs
Data Dependence Analysis
Symbolic Expression: return(File.Exists) == true
Element of EMCP Candidate: return(File.Exists)
Branch statement at Line 1 has a data dependency on File.Exists at Line 1
Partially-covered branch statements have data dependencies on EMCP candidates for return values
Evaluation – Subjects and Setup
Subjects:
  xUnit: unit testing framework for .NET (223 classes and interfaces with 11.4 KLOC)
  QuickGraph: C# graph library (165 classes and interfaces with 8.3 KLOC)
Evaluation setup:
  Apply Pex to generate tests for program under test
  Feed the program and generated tests to Covana
  Compare existing solution and Covana
Evaluation – Research Questions
RQ1: How effective is Covana in identifying the two main types of problems, EMCPs and OCPs?
RQ2: How effective is Covana in pruning irrelevant problem candidates of EMCPs and OCPs?
Evaluation – RQ1: Problem Identification
Covana identifies
• 43 EMCPs with only 1 false positive and 2 false negatives
• 155 OCPs with 20 false positives and 30 false negatives
Evaluation – RQ2: Irrelevant-Problem-Candidate Pruning
Covana prunes
• 97% (1567 in 1610) EMCP candidates with 1 false positive and 2 false negatives
• 66% (296 in 451) OCP candidates with 20 false positives and 30 false negatives
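As a quick consistency check (not part of the original slides), the pruning percentages follow from the candidate counts, and the candidates left after pruning match the problem counts reported for RQ1. Percentages are rounded to the nearest integer on the slide.

```latex
\begin{align*}
\text{EMCP: } & \frac{1567}{1610} \approx 97.3\%, & 1610 - 1567 &= 43 \text{ candidates remaining} \\
\text{OCP: }  & \frac{296}{451} \approx 65.6\%,  & 451 - 296 &= 155 \text{ candidates remaining}
\end{align*}
```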
Cooperation Between Human and Machine – Covana
Task: What needs to be automated? Test-input generation
What difficulties does the tool face?
  Doesn’t know which methods to instrument and explore
  Doesn’t know how to generate effective method sequences
How to communicate info to the user to get her help? Report encountered problems
How does the user help the tool based on the info?
  Instruct which external methods to instrument/write mock objects
  Write factory methods for generating objects
Iterations to form feedback loop? Yes, till the user is happy with coverage or impatient
[Xiao et al. ICSE 2011]
Microsoft Research Pex for Fun: Teaching and Learning CS via Social Gaming
1,083,640 clicked 'Ask Pex!'
www.pexforfun.com
Contributed the concept of Coding Duel games, the major game type of Pex for Fun since Summer 2010
Behind the Scene of Pex for Fun
Secret Implementation
    class Secret {
        public static int Puzzle(int x) {
            if (x <= 0) return 1;
            return x * Puzzle(x-1);
        }
    }
Player Implementation
    class Player {
        public static int Puzzle(int x) {
            return x;
        }
    }
    class Test {
        public static void Driver(int x) {
            if (Secret.Puzzle(x) != Player.Puzzle(x))
                throw new Exception("Mismatch");
        }
    }
behavior: Secret Impl == Player Impl
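The duel setup can be tried without the tool using the sketch below. It keeps the Secret and Player implementations from the slide, but replaces Pex's test generation with a brute-force scan over a small input range; `Duel.Mismatches` is a hypothetical helper introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class Secret
{
    public static int Puzzle(int x)
    {
        if (x <= 0) return 1;
        return x * Puzzle(x - 1);
    }
}

public static class Player
{
    // The player's first guess at the secret behavior.
    public static int Puzzle(int x) { return x; }
}

public static class Duel
{
    // Stand-in for the tool's test generation: scan a small input range and
    // collect every x on which the two implementations disagree.
    public static List<int> Mismatches(int from, int to)
    {
        var result = new List<int>();
        for (int x = from; x <= to; x++)
            if (Secret.Puzzle(x) != Player.Puzzle(x))
                result.Add(x);
        return result;
    }

    public static void Main()
    {
        Console.WriteLine("Mismatching inputs: " + string.Join(", ", Mismatches(-2, 5)));
    }
}
```

The inputs 1 and 2 happen to agree, which is why a player who only tries those values could wrongly conclude the duel is won; the mismatching inputs are the feedback that drives the next iteration.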
HCC: Pex for Fun
Coding duels at http://www.pexforfun.com/
Task for Human: write behavior-equivalent code
Human → Tool: Does my new code behave differently? How exactly?
Tool → Human: Could you fix your code to handle failed/passed tests?
Iterations to form feedback loop? Yes, till tool generates no failed tests/player is impatient
Human-Centric Computing
Coding duels at http://www.pexforfun.com/
Brain exercising/learning while having fun
Fun: iterative, adaptive/personalized, w/ win criterion
Abstraction/generalization, debugging, problem solving
Coding Duel Competition @ICSE 2011
Coding Duels for Automatic Grading @NCSU CSC 510
Especially valuable in Massive Open Online Courses (MOOC)
Human-Human Cooperation: Pex for Fun (Crowdsourcing)
    class Secret {
        public static int Puzzle(int x) {
            if (x <= 0) return 1;
            return x * Puzzle(x-1);
        }
    }
Everyone can contribute: coding duels, duel solutions
Human-Human Cooperation: Puzzle Games (Crowdsourcing)
Puzzle Games Made from Difficult Constraints or Object-Creation Problems
Supported by MSR SEIF Award
Ning Chen and Sunghun Kim. Puzzle-based Automatic Testing: bringing humans into the loop by solving puzzles. In Proc. ASE 2012
http://www.cs.washington.edu/verigames/
Human-Human/Tool Cooperation: Performance Debugging in the Large
[Figure: StackMine [Han et al. ICSE 12]. Labeled components: Trace collection, Trace Storage, Trace analysis, Pattern Matching, Problematic Pattern Repository, Bug Database, Bug filing, Bug update, Internet.]
Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie. Performance Debugging in the Large via Mining Millions of Stack Traces. In Proc. ICSE 2012
StackMine: Industry Impact
“We believe that the MSRA tool is highly valuable and much more efficient for mass trace (100+ traces) analysis. For 1000 traces, we believe the tool saves us 4-6 weeks of time to create new signatures, which is quite a significant productivity boost.” - from Development Manager in Windows
Highly effective new issue discovery on Windows mini-hang
Continuous impact on future Windows versions
Tool-Tool Cooperation
Static analysis + dynamic analysis: static checking + test generation, …
Dynamic analysis + static analysis: fix generation + fix validation, …
Static analysis + static analysis: …
Dynamic analysis + dynamic analysis: …
Example: Xiaoyin Wang, Lu Zhang, Tao Xie, Yingfei Xiong, and Hong Mei. Automating Presentation Changes in Dynamic Web Applications via Collaborative Hybrid Analysis. In Proc. FSE 2012
Summary: Cooperative Testing and Analysis
Human-Assisted Computing: Covana
Human-Centric Computing: Pex for Fun
Human-Human Cooperation: StackMine
Acknowledgment
Wonderful current/former students @NCSU ASE
Collaborators, especially those from Microsoft Research Redmond/Asia, Peking University
Colleagues who gave feedback and inspired me
NSF grants CCF-0845272, CCF-0915400, CNS-0958235, ARO grant W911NF-08-1-0443, an NSA Science of Security Lablet grant, a NIST grant, a 2011 Microsoft Research SEIF Award
Thank you!
Questions ?
https://sites.google.com/site/asergrp/