Software Analytics: Towards Software Mining that Matters (2014)

Tao Xie, Department of Computer Science, University of Illinois at Urbana-Champaign, USA. In collaboration with Microsoft Research.


Post on 27-Nov-2014

Page 1: Software Analytics:Towards Software Mining that Matters (2014)

Software Analytics: Towards Software Mining that Matters

Tao Xie
Department of Computer Science
University of Illinois at Urbana-Champaign, USA

In Collaboration with Microsoft Research

Page 2: Software Analytics:Towards Software Mining that Matters (2014)

Machine Learning that Matters

“The basic argument in her paper is that machine learning might be in danger of losing its impact because the community as a whole has become quite self-referential. People are probably solving real-world problems using ML methods, but there is little sharing of these results within the community. Instead, people focus on existing benchmarks which might have originally had some connection to real-world problems which has been long forgotten, however.”

“She proposes a number of tasks like $100M solved through ML based decision making or a human life saved through a diagnosis or an intervention recommended by an ML system to get ML back on track.”

ICML’12

http://icml.cc/2012/papers/298.pdf

http://blog.mikiobraun.de/2012/06/is-machine-learning-losing-impact.html

Page 3: Software Analytics:Towards Software Mining that Matters (2014)

2012 NSF Workshop on Formal Methods

• Goal: to identify future directions in formal methods research and its transition to industrial practice.

• Success examples mentioned by the attendees
– SLAM/SDV
– ASTREE
– SMT-based tools
– …

http://goto.ucsd.edu/~rjhala/NSFWorkshop/

Page 4: Software Analytics:Towards Software Mining that Matters (2014)

“What Happened to the Promise of Software Tools?” – Jim Larus

http://www.srl.inf.ethz.ch/workshop2014/eth-larus.pdf

https://www.youtube.com/watch?v=kO9OYnkeRTM

Page 5: Software Analytics:Towards Software Mining that Matters (2014)

Software Analytics

Software analytics enables software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services.

Dongmei Zhang, Yingnong Dang, Jian-Guang Lou, Shi Han, Haidong Zhang, and Tao Xie. Software Analytics as a Learning Case in Practice: Approaches and Experiences. In MALETS 2011.
http://research.microsoft.com/en-us/groups/sa/malets11-analytics.pdf

Page 6: Software Analytics:Towards Software Mining that Matters (2014)

http://research.microsoft.com/en-us/groups/sa/

http://research.microsoft.com/en-us/news/features/softwareanalytics-052013.aspx

Page 12: Software Analytics:Towards Software Mining that Matters (2014)

Performance debugging in the large

[Figure: workflow diagram. Components: Trace collection, Network, Trace Storage, Trace analysis, Pattern Matching, Problematic Pattern Repository, Bug filing, Bug update, Bug Database. Callouts: "Key to issue discovery"; "Bottleneck of scalability"; "How many issues are still unknown?"; "Which trace file should I investigate first?"]

Page 13: Software Analytics:Towards Software Mining that Matters (2014)

Technical highlights

• Data mining for software domain
– Discovery of problematic execution patterns formulated as callstack mining & clustering
– Domain knowledge incorporated systematically

• Interactive performance analysis system
– Parallel mining infrastructure based on HPC + MPI
– Visualization aided interactive exploration
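As a rough illustration of what "callstack mining & clustering" can look like, the sketch below groups call stacks whose frame sets overlap heavily. It is a minimal sketch under stated assumptions: the Jaccard measure, the greedy single-pass clustering, the 0.5 threshold, and every function and frame name are illustrative choices, not the algorithm used in the MSRA tool.

```python
from typing import List, Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two frame sets: |a & b| / |a | b| (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_callstacks(stacks: Sequence[Sequence[str]],
                       threshold: float = 0.5) -> List[List[int]]:
    """Greedy single-pass clustering (an assumed, simplified scheme).

    Each stack joins the first cluster whose representative (the cluster's
    first stack) has Jaccard similarity >= threshold; otherwise it starts
    a new cluster. Returns clusters as lists of indices into `stacks`.
    """
    clusters: List[List[int]] = []
    for i, stack in enumerate(stacks):
        frames = set(stack)
        for cluster in clusters:
            representative = set(stacks[cluster[0]])
            if jaccard(frames, representative) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical call stacks, outermost frame first.
stacks = [
    ["main", "OpenFile", "AcquireLock", "WaitForObject"],
    ["main", "SaveFile", "AcquireLock", "WaitForObject"],
    ["main", "Render", "DrawText"],
]
clusters = cluster_callstacks(stacks)
largest = max(clusters, key=len)  # candidate "problematic pattern" to inspect first
```

Here the two stacks that end in the same lock wait fall into one cluster, which is the kind of group an analyst might examine first. A production system would need a smarter similarity measure and domain knowledge, as the slide notes.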

Page 14: Software Analytics:Towards Software Mining that Matters (2014)

Impact: Debugging Productivity Boost

“We believe that the MSRA tool is highly valuable and much more efficient for mass trace (100+ traces) analysis. For 1000 traces, we believe the tool saves us 4-6 weeks of time to create new signatures, which is quite a significant productivity boost.”

Highly effective new issue discovery on Windows mini-hang

Continuous impact on future Windows versions

Page 16: Software Analytics:Towards Software Mining that Matters (2014)

XIAO: Code Clone Analysis

• Motivation
– Copy-and-paste is a common developer behavior
– A real tool widely adopted internally and externally

• XIAO enables code clone analysis in the following way
– High tunability
– High scalability
– High compatibility
– High explorability

Page 17: Software Analytics:Towards Software Mining that Matters (2014)

High tunability – what you tune is what you get

• Intuitive similarity metric
– Effective control of the degree of syntactical differences between two code snippets

• Tunable at fine granularity
– Statement similarity
– % of inserted/deleted/modified statements
– Balance between code structure and disordered statements

    for (i = 0; i < n; i++) {
        a++;
        b++;
        c = foo(a, b);
        d = bar(a, b, c);
        e = a + c;
    }

    for (i = 0; i < n; i++) {
        c = foo(a, b);
        a++;
        b++;
        d = bar(a, b, c);
        e = a + d;
        e++;
    }
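To make the tuning knobs concrete, the sketch below compares the two loop bodies from this slide using two parameters: a minimum per-statement similarity and a maximum fraction of unmatched (inserted or deleted) statements. This is only a sketch under assumptions: the character-level measure from Python's difflib, the greedy order-insensitive matching, and the names `min_stmt_sim` and `max_unmatched` are all hypothetical and are not XIAO's actual metric.

```python
from difflib import SequenceMatcher
from typing import List


def statement_similarity(s1: str, s2: str) -> float:
    """Character-level similarity of two statements, ignoring whitespace."""
    a = "".join(s1.split())
    b = "".join(s2.split())
    return SequenceMatcher(None, a, b).ratio()


def unmatched_fraction(snippet_a: List[str], snippet_b: List[str],
                       min_stmt_sim: float) -> float:
    """Fraction of statements left without a partner after greedy matching.

    Matching ignores statement order, so reordered statements still count
    as matched. A statement in A is paired with the first unused statement
    in B whose similarity is at least `min_stmt_sim`.
    """
    unused = list(snippet_b)
    matched = 0
    for stmt in snippet_a:
        for candidate in unused:
            if statement_similarity(stmt, candidate) >= min_stmt_sim:
                unused.remove(candidate)
                matched += 1
                break
    total = len(snippet_a) + len(snippet_b)
    return (total - 2 * matched) / total if total else 0.0


def is_clone(snippet_a: List[str], snippet_b: List[str],
             min_stmt_sim: float = 0.8, max_unmatched: float = 0.3) -> bool:
    """Report a clone when few enough statements remain unmatched."""
    return unmatched_fraction(snippet_a, snippet_b, min_stmt_sim) <= max_unmatched


# The two loop bodies from the slide, one statement per entry.
loop_a = ["a ++;", "b ++;", "c = foo(a, b);", "d = bar(a, b, c);", "e = a + c;"]
loop_b = ["c = foo(a, b);", "a ++;", "b ++;", "d = bar(a, b, c);",
          "e = a + d;", "e ++;"]

relaxed = is_clone(loop_a, loop_b)                     # tolerant settings
strict = is_clone(loop_a, loop_b, max_unmatched=0.05)  # almost no edits allowed
```

With the tolerant settings the reordered and slightly modified loop is reported as a clone; tightening `max_unmatched` rejects it because of the inserted `e ++;` statement. This sketch ignores statement order entirely, whereas the slide describes a tunable balance between code structure and disordered statements.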

Page 18: Software Analytics:Towards Software Mining that Matters (2014)

High explorability

[Screenshot with numbered callouts]
1. Clone navigation based on source tree hierarchy
2. Pivoting of folder level statistics
3. Folder level statistics
4. Clone function list in selected folder
5. Clone function filters
6. Sorting by bug or refactoring potential
7. Tagging

[Screenshot with numbered callouts]
1. Block correspondence
2. Block types
3. Block navigation
4. Copying
5. Bug filing
6. Tagging

Page 19: Software Analytics:Towards Software Mining that Matters (2014)

Scenarios & Solutions

Quality gates at milestones
• Architecture refactoring
• Code clone clean up
• Bug fixing

Post-release maintenance
• Security bug investigation
• Bug investigation for sustained engineering

Development and testing
• Checking for similar issues before check-in
• Reference info for code review
• Supporting tool for bug triage

Online code clone search

Offline code clone analysis

Page 20: Software Analytics:Towards Software Mining that Matters (2014)

Impact: Benefiting developer community

Available in Visual Studio 2012 RC

Searching for similar snippets to fix a bug once

Finding refactoring opportunities

Page 21: Software Analytics:Towards Software Mining that Matters (2014)

Impact: More secure Microsoft products

Code Clone Search service integrated into the workflow of the Microsoft Security Response Center (MSRC)

Over 590 million lines of code indexed across multiple products

Real security issues proactively identified and addressed

Page 22: Software Analytics:Towards Software Mining that Matters (2014)

Example – MS Security Bulletin MS12-034

Combined Security Update for Microsoft Office, Windows, .NET Framework, and Silverlight, published Tuesday, May 08, 2012

3 publicly disclosed vulnerabilities and 7 privately reported vulnerabilities were involved. Specifically, one was exploited by the Duqu malware to execute arbitrary code when a user opened a malicious Office document.

Insufficient bounds check within the font parsing subsystem of win32k.sys
Cloned copies in gdiplus.dll, ogl.dll (Office), Silverlight, and Windows Journal viewer

Microsoft TechNet Blog about this bulletin:
“However, we wanted to be sure to address the vulnerable code wherever it appeared across the Microsoft code base. To that end, we have been working with Microsoft Research to develop a “Cloned Code Detection” system that we can run for every MSRC case to find any instance of the vulnerable code in any shipping product. This system is the one that found several of the copies of CVE-2011-3402 that we are now addressing with MS12-034.”
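The idea of finding every copy of a vulnerable snippet across a large code base can be sketched with an inverted index over short runs of normalized lines. This is a minimal sketch and not the system described in the bulletin: the shingle size, the whitespace-only normalization, the `CloneIndex` class, and the file names and code fragments are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def normalize(line: str) -> str:
    """Drop all whitespace so formatting differences do not matter."""
    return "".join(line.split())


def shingles(lines: List[str], k: int) -> List[Tuple[str, ...]]:
    """All runs of k consecutive non-empty normalized lines."""
    norm = [n for n in map(normalize, lines) if n]
    return [tuple(norm[i:i + k]) for i in range(len(norm) - k + 1)]


class CloneIndex:
    """Inverted index: shingle -> names of the files that contain it."""

    def __init__(self, k: int = 2) -> None:
        self.k = k
        self.index: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)

    def add(self, name: str, source: str) -> None:
        for sh in shingles(source.splitlines(), self.k):
            self.index[sh].add(name)

    def search(self, snippet: str) -> Dict[str, float]:
        """Map each indexed file to the fraction of query shingles it contains."""
        query = set(shingles(snippet.splitlines(), self.k))
        hits: Dict[str, int] = defaultdict(int)
        for sh in query:
            for name in self.index.get(sh, ()):
                hits[name] += 1
        return {name: count / len(query) for name, count in hits.items()}


# Hypothetical snippet and files, invented for illustration only.
snippet = """
len = read_u16(buf);
memcpy(dst, buf + 2, len);
return dst;
"""
index = CloneIndex(k=2)
index.add("component_a.c",
          "int n;\nlen = read_u16(buf);\nmemcpy(dst, buf + 2, len);\nreturn dst;\n")
index.add("component_b.c",
          "len  =  read_u16( buf );\nmemcpy(dst, buf+2, len);\nlog(len);\n")
index.add("component_c.c", "return 0;\n")
results = index.search(snippet)
```

In this toy run `component_a.c` contains the whole snippet, `component_b.c` contains part of it despite different spacing, and `component_c.c` is not reported. Exact shingle matching misses renamed identifiers and edited statements, which is where a tunable similarity metric becomes necessary.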

Page 23: Software Analytics:Towards Software Mining that Matters (2014)

SAS: Incident management of online services

http://research.microsoft.com/apps/pubs/?id=202451

Page 24: Software Analytics:Towards Software Mining that Matters (2014)

Motivation

Incident Management (IcM) is a critical task to assure service quality

• Online services are increasingly popular & important

• High service quality is the key

Page 25: Software Analytics:Towards Software Mining that Matters (2014)

Incident Management: Workflow

1. Detect a service issue
2. Alert On-Call Engineers (OCEs)
3. Investigate the problem
4. Restore the service
5. Fix root cause via postmortem analysis

Page 26: Software Analytics:Towards Software Mining that Matters (2014)

SAS: Incident management of online services

SAS was developed and deployed to effectively reduce MTTR (Mean Time To Restore) by automatically analyzing monitoring data.

Design Principles of SAS
• Automating analysis
• Handling heterogeneity
• Accumulating knowledge
• Supporting human-in-the-loop (HITL)
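For readers unfamiliar with MTTR, the sketch below computes it as the average time between detecting an incident and restoring the service. Measuring from detection is an assumption made here for illustration; the slides do not define the start point, and the incident timestamps are invented.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def mean_time_to_restore(incidents: List[Tuple[datetime, datetime]]) -> timedelta:
    """Average of (restored - detected) over all incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((restored - detected for detected, restored in incidents),
                timedelta())
    return total / len(incidents)


# Hypothetical incidents as (detected, restored) pairs.
incidents = [
    (datetime(2013, 1, 1, 10, 0), datetime(2013, 1, 1, 10, 30)),  # 30 minutes
    (datetime(2013, 1, 2, 9, 0), datetime(2013, 1, 2, 10, 30)),   # 90 minutes
]
mttr = mean_time_to_restore(incidents)
```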

Page 27: Software Analytics:Towards Software Mining that Matters (2014)

Techniques Overview

• System metrics
– Identifying Incident Beacons
• Transaction logs
– Mining Suspicious Execution Patterns
• Historical incidents
– Mining Historical Workaround Solutions
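One simple way to picture "identifying incident beacons" from system metrics is to rank metrics by how far their values during the incident deviate from a healthy baseline. The sketch below uses a plain z-score for that. It is an assumed stand-in, not the technique deployed in SAS, and the metric names and values are invented.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def rank_beacons(baseline: Dict[str, List[float]],
                 incident: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank metrics by |z-score| of the incident-window mean against the baseline."""
    scores = []
    for name, history in baseline.items():
        sigma = pstdev(history)
        if sigma == 0 or not incident.get(name):
            continue  # constant or missing metrics carry no signal in this sketch
        z = (mean(incident[name]) - mean(history)) / sigma
        scores.append((name, abs(z)))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical metric samples: healthy baseline vs. the incident window.
baseline = {
    "cpu_percent": [40.0, 42.0, 38.0, 40.0],
    "sql_latency_ms": [10.0, 12.0, 8.0, 10.0],
}
incident = {
    "cpu_percent": [41.0, 43.0],
    "sql_latency_ms": [50.0, 70.0],
}
ranking = rank_beacons(baseline, incident)
top_metric = ranking[0][0]
```

In this toy data the latency metric deviates far more than CPU usage, so it is ranked first as the metric an on-call engineer might look at before the others.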

Page 28: Software Analytics:Towards Software Mining that Matters (2014)

Industry Impact of SAS

Deployment
• SAS deployed to worldwide datacenters for Service X (serving hundreds of millions of users) since June 2011
• OCEs now heavily depend on SAS

Usage
• SAS helped successfully diagnose ~76% of the service incidents for which it was used

Page 29: Software Analytics:Towards Software Mining that Matters (2014)

Coding Duels (Code Hunt/Pex4Fun)

Teaching/Learning Programming/Software Engineering via Interactive Gaming

http://web.engr.illinois.edu/~taoxie/publications/icse13see-pex4fun.pdf

Page 30: Software Analytics:Towards Software Mining that Matters (2014)

Code Hunt Competition for Students https://www.codehunt.com/

Precursor: http://www.pex4fun.com/

Page 31: Software Analytics:Towards Software Mining that Matters (2014)

A Fun and Engaging Game – Win by Writing Code
Supports Java and C#
Adapts to competitions as well as individual play

Users: 1,181,152
User Programs: 7,079,497

WWW.CODEHUNT.COM

Page 32: Software Analytics:Towards Software Mining that Matters (2014)

Behind the Scene of Coding Duel

Secret Implementation

    class Secret {
        // Hidden from the player: factorial for x > 0, 1 otherwise.
        public static int Puzzle(int x) {
            if (x <= 0) return 1;
            return x * Puzzle(x - 1);
        }
    }

Player Implementation

    class Player {
        // The player's current guess at the secret behavior.
        public static int Puzzle(int x) {
            return x;
        }
    }

    class Test {
        // Fails on any input where the two implementations disagree.
        public static void Driver(int x) {
            if (Secret.Puzzle(x) != Player.Puzzle(x))
                throw new Exception("Mismatch");
        }
    }

Behavior: Secret Impl == Player Impl
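The duel boils down to finding an input on which the secret and player implementations disagree. The sketch below ports the two `Puzzle` methods to Python and searches for such an input by plain enumeration. This is an assumption-laden simplification: Pex generates inputs with dynamic symbolic execution, whereas this sketch only tries a small hypothetical range of integers, and the Python function names are invented.

```python
from typing import Callable, Iterable, Optional


def secret_puzzle(x: int) -> int:
    """Port of Secret.Puzzle: factorial for x > 0, 1 otherwise."""
    if x <= 0:
        return 1
    return x * secret_puzzle(x - 1)


def player_puzzle(x: int) -> int:
    """Port of Player.Puzzle: the player's first guess."""
    return x


def find_mismatch(secret: Callable[[int], int],
                  player: Callable[[int], int],
                  inputs: Iterable[int]) -> Optional[int]:
    """Return the first input on which the two implementations differ, else None."""
    for x in inputs:
        if secret(x) != player(x):
            return x
    return None


# Enumerate a small, arbitrary range of candidate inputs.
counterexample = find_mismatch(secret_puzzle, player_puzzle, range(-3, 10))
positive_counterexample = find_mismatch(secret_puzzle, player_puzzle, range(1, 10))
```

A mismatching input is the feedback the player receives: it shows where the current guess diverges from the secret behavior, and the player revises the code until no mismatch can be found.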

Page 33: Software Analytics:Towards Software Mining that Matters (2014)

Experience Reports on Successful Tool Transfer

• Nikolai Tillmann, Jonathan de Halleux, and Tao Xie. Transferring an Automated Test Generation Tool to Practice: From Pex to Fakes and Code Digger. In Proceedings of ASE 2014, Experience Papers. http://web.engr.illinois.edu/~taoxie/publications/ase14-pexexperiences.pdf

• Jian-Guang Lou, Qingwei Lin, Rui Ding, Qiang Fu, Dongmei Zhang, and Tao Xie. Software Analytics for Incident Management of Online Services: An Experience Report. In Proceedings of ASE 2013, Experience Papers. http://web.engr.illinois.edu/~taoxie/publications/ase13-sas.pdf

• Dongmei Zhang, Shi Han, Yingnong Dang, Jian-Guang Lou, Haidong Zhang, and Tao Xie. Software Analytics in Practice. IEEE Software, Special Issue on the Many Faces of Software Analytics, 2013. http://web.engr.illinois.edu/~taoxie/publications/ieeesoft13-softanalytics.pdf

• Yingnong Dang, Dongmei Zhang, Song Ge, Chengyun Chu, Yingjun Qiu, and Tao Xie. XIAO: Tuning Code Clones at Hands of Engineers in Practice. In Proceedings of ACSAC 2012. http://web.engr.illinois.edu/~taoxie/publications/acsac12-xiao.pdf

Page 34: Software Analytics:Towards Software Mining that Matters (2014)

Ex: Human Consumption of Tool Outputs

• Developer: Your tool generated “\0”

• Pex team: What did you expect?

• Developer: Marc

Invariant candidates:
this.getPrice() > 0
this.getPrice() >= 0

http://www.agitar.com/

http://research.microsoft.com/projects/pex/

Page 35: Software Analytics:Towards Software Mining that Matters (2014)

Q & A

http://research.microsoft.com/en-us/groups/sa/

http://www.cs.illinois.edu/homes/taoxie/


Supported in part by a Microsoft Research Award, NSF grants CCF-1349666, CNS-1434582, CCF-1434596, CCF-1434590, CNS-1439481, and the USA National Security Agency (NSA) Science of Security Lablet.