Advancing Foundation and Practice of Software Analytics


Uploaded by tao-xie on 05-Dec-2014


DESCRIPTION

Vision Statement Presentation on "Advancing Foundation & Practice of Software Analytics" at the 2nd International NSF-sponsored Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE 2013): http://promisedata.org/raise/2013/

TRANSCRIPT

Page 1: Advancing Foundation and Practice of Software Analytics

Advancing Foundation & Practice of Software Analytics

Tao Xie

North Carolina State University

with Dongmei Zhang (Microsoft Research Asia), Xusheng Xiao (North Carolina State University), and Chunhua Weng (Columbia University)

RAISE 2013

Page 2

Software Analytics

Software analytics is to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services.

Dongmei Zhang, Yingnong Dang, Jian-Guang Lou, Shi Han, Haidong Zhang, and Tao Xie. Software Analytics as a Learning Case in Practice: Approaches and Experiences. In Proc. MALETS 2011.

• MSRA Software Analytics group founded in May 2009
• Term coined/defined, expanding the scope of previous work [Buse and Zimmermann, FoSER 10] [Hassan and Xie, FoSER 10]

http://research.microsoft.com/en-us/groups/sa/

http://research.microsoft.com/en-us/news/features/softwareanalytics-052013.aspx

Page 3

ICSE 2013

Five Dimensions

Research Topics

Technology Pillars

Target Audience

Connection to Practice

Output

Page 4

Research Topics – the Trinity View

Software Users

Software Development Process

Software System

• Covering different areas of the software domain

• Throughout the entire development cycle

• Enabling practitioners to obtain insights

Page 5

Data Sources

Runtime traces

Program logs

System events

Perf counters

Usage log

User surveys

Online forum posts

Blog & Twitter

Source code

Bug history

Check-in history

Test cases

Page 6

Target Audience – Software Practitioners

Developer

Tester

Program Manager

Usability engineer

Designer

Support engineer

Management personnel

Operation engineer

Page 7


Output – Insightful Information

Conveys meaningful and useful understanding or knowledge towards completing the target task

Not easily attainable via directly investigating raw data without aid of analytics technologies

Going from correlation to causality

Examples

It is easy to count the number of re-opened bugs, but how can we find out the primary reasons for these re-opened bugs?

When the availability of an online service drops below a threshold, how can we localize the problem?

Page 8


Output – Actionable Information

Enables software practitioners to come up with concrete solutions towards completing the target task

Examples

• Why were bugs re-opened?
  ▪ A list of bug groups, each with the same reason for re-opening

• Why did the availability of online services drop?
  ▪ A list of problematic areas with associated confidence values

• Which part of my code should be refactored?
  ▪ A list of cloned code snippets that can be easily explored from different perspectives
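To make the contrast between a raw count and an actionable list concrete, here is a minimal Python sketch. The `BugRecord` schema, its field names, and the reason labels are hypothetical toy data, and the sketch assumes each re-opened bug already carries a reason. Inferring that reason is the hard analytics step the slide refers to, and it is not shown here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BugRecord:
    # Hypothetical schema for illustration only.
    bug_id: int
    reopened: bool
    reopen_reason: str  # assumed to be labeled or inferred upstream


def group_reopened_bugs(bugs):
    """Group re-opened bugs by reason, largest group first."""
    groups = defaultdict(list)
    for bug in bugs:
        if bug.reopened:
            groups[bug.reopen_reason].append(bug.bug_id)
    # Largest groups first; ties keep first-seen order (sorted() is stable).
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)


# Toy data, not from any real bug database.
bugs = [
    BugRecord(1, True, "incomplete fix"),
    BugRecord(2, False, ""),
    BugRecord(3, True, "incomplete fix"),
    BugRecord(4, True, "environment difference"),
]
reopened_count = sum(bug.reopened for bug in bugs)  # the "easy" number
print(reopened_count)
print(group_reopened_bugs(bugs))
```

The count (3) says only that there is a problem. The grouped list tells a practitioner which reason to address first, which is what makes it actionable.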

Page 9

Research Topics & Technology Pillars

Vertical

Horizontal

Information Visualization

Data Analysis Algorithms

Large-scale Computing

Software Users

Software Development Process

Software System

Page 10


Connection to Practice

Software Analytics is naturally tied to software development practice

Getting real:

Real data

Real problems

Real users

Real tools

Page 11

Human/Tool Cooperation: Performance Debugging in the Large


[Diagram: StackMine [Han et al. ICSE 12]. Components: Internet, trace collection, trace storage, trace analysis, pattern matching, problematic pattern repository, bug database, bug filing, bug update. Annotations: "Key to issue discovery"; "Bottleneck of scalability"; "How many issues are still unknown?"; "Which trace file should I investigate first?"]

Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie. Performance Debugging in the Large via Mining Millions of Stack Traces. In Proc. ICSE 2012.
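The triage loop in the diagram (match incoming traces against known problematic patterns, update existing bugs, and decide which unknown traces to look at first) can be sketched roughly as follows. This is a hypothetical simplification, not StackMine's algorithm: StackMine mines costly call-stack patterns, whereas this sketch just groups unmatched traces by their innermost frames and ranks groups by total wait time. The trace tuple layout, `known_patterns` mapping, bug IDs, and `top_k` parameter are all assumptions made for illustration.

```python
from collections import defaultdict


def contains_pattern(stack, pattern):
    """True if `pattern` occurs as a contiguous run of frames in `stack`."""
    n = len(pattern)
    return any(tuple(stack[i:i + n]) == pattern
               for i in range(len(stack) - n + 1))


def triage(traces, known_patterns, top_k=2):
    """Split traces into known issues and ranked candidate new issues.

    traces: list of (trace_id, stack, wait_ms); each stack lists frames from
        outermost to innermost (an assumed representation).
    known_patterns: dict mapping a frame tuple to a bug ID (hypothetical).
    """
    matched = defaultdict(list)    # bug ID -> trace IDs (feeds "bug update")
    unmatched = defaultdict(list)  # grouping key -> [(trace_id, wait_ms)]
    for trace_id, stack, wait_ms in traces:
        bug = next((b for p, b in known_patterns.items()
                    if contains_pattern(stack, p)), None)
        if bug is not None:
            matched[bug].append(trace_id)
        else:
            # Crude grouping by the top_k innermost frames; StackMine itself
            # uses a much more elaborate costly-pattern mining step.
            unmatched[tuple(stack[-top_k:])].append((trace_id, wait_ms))
    # Costliest unknown groups first: a rough answer to
    # "Which trace file should I investigate first?"
    ranked = sorted(unmatched.items(),
                    key=lambda kv: sum(w for _, w in kv[1]), reverse=True)
    return dict(matched), ranked


# Toy traces and a toy pattern repository.
traces = [
    ("t1", ["main", "ui", "lock_wait"], 120),
    ("t2", ["main", "io", "disk_read"], 300),
    ("t3", ["main", "ui", "lock_wait"], 80),
    ("t4", ["main", "net", "dns_lookup"], 50),
]
known = {("ui", "lock_wait"): "BUG-1"}
matched, ranked = triage(traces, known)
print(matched)
print(ranked)
```

Ranking unknown groups by accumulated cost is one simple way to target the "bottleneck of scalability" annotation: instead of a person opening traces one by one, the costliest unexplained behavior surfaces first.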

Page 12

StackMine: Industry Impact

“We believe that the MSRA tool is highly valuable and much more efficient for mass trace (100+ traces) analysis. For 1000 traces, we believe the tool saves us 4-6 weeks of time to create new signatures, which is quite a significant productivity boost.”

- from a Development Manager in Windows

Highly effective new issue discovery on Windows mini-hang

Continuous impact on future Windows versions



Page 13

Dual Ends of the Road


Foundation: Science of Software Analytics?
• From correlation to causality

Practice: Software Analytics
• From pieces to a whole
• Bring humans into the loop
• Make real impact in practice

Foundation ↔ Practice

Page 14

Caricature: Standard Security Research

Choose random system component

Find vulnerability

Suggest defense

Analyze security or test performance

Are we making progress?

Positive aspect: most security research addresses real problems

@J. Mitchell

Page 15

Meaning of “Science”

Systematization of Knowledge: an organized body of knowledge gained through research
• Ad hoc point solutions vs. general understanding
• Repeating failures of the past with each new platform, type of vulnerability

Scientific Method: a system of acquiring knowledge based on the scientific method
• Process of hypothesis testing and experiments
• Building abstractions and models, theorems

Universal Laws: laws or theories that are predictive
• Widely applicable
• Make strong, quantitative predictions

@D. Evans, J. Mitchell

Page 16

[Chart: Percentage of bug-introducing changes for Eclipse, by day of the week]

Don’t program on Fridays ;-)

[Zimmermann et al. 05]
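The quantity behind this chart is a simple per-day ratio. A minimal sketch, assuming each change has already been labeled as bug-introducing or not (how that labeling is done is outside this sketch) and using toy input rather than Eclipse data; the function name is hypothetical:

```python
from collections import Counter


def bug_introducing_rate_by_weekday(changes):
    """Percentage of changes on each weekday that were labeled bug-introducing.

    changes: iterable of (weekday, is_bug_introducing) pairs. The labeling
    itself (e.g. linking later fixes back to earlier changes) is assumed
    to have been done upstream.
    """
    totals = Counter()
    buggy = Counter()
    for weekday, is_bug_introducing in changes:
        totals[weekday] += 1
        if is_bug_introducing:
            buggy[weekday] += 1
    return {day: 100.0 * buggy[day] / totals[day] for day in totals}


# Toy input for illustration, not Eclipse data.
toy_changes = [
    ("Mon", False), ("Mon", True), ("Mon", False), ("Mon", False),
    ("Fri", True), ("Fri", False),
]
rates = bug_introducing_rate_by_weekday(toy_changes)
print(rates)
```

A higher Friday percentage is a correlation. On its own it does not show that the day of the week causes bugs, since other differences between Friday changes and other changes could account for it. This is the gap that the "From correlation to causality" slides point at.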

Page 17

Failure is a 4-letter Word

[PROMISE’11 Zeller et al.]

Page 18

From Correlation to Causality

Analytic techniques are often used for applications that emphasize results over causation of the findings

Users may choose to act on the behavior without focusing on understanding it (or its causation), provided that the pattern has a high empirical probability of correctly identifying an issue
• E.g., smuggling, traveling with false documents, or predicting winning stocks

@L. Williams, M. Rappa

Page 19

From Correlation to Causality cont.

Analytic techniques are often not used to support the identification and advancement of fundamental scientific principles based upon an analysis of causation

Emphasize the use of analytics to advance science (e.g., producing insights) in addition to its use in providing mere observations

@L. Williams, M. Rappa

Page 20

Open Questions

How much science is there in a field (e.g., software analytics)?
• A field may be a means/solution, in contrast to a problem domain such as “security” or “design”

How can analytics/AI be used to help build a science of “X”?

How can a field be moved to a foundational level?
• How can foundation and practice be balanced?

Page 21

Dual Ends of the Road


Page 22

Fitnex Path-Exploration Strategy for Pex

Pex download counts in the initial 20 months of release:

Academic: 17,366

Industrial: 13,022

Total: 30,388

Released since 2008

Page 23

Analytics/AI is the Means to the End

Interesting results vs. Actionable results

Problem hunting vs. Problem driven

Page 24

Open Questions


Who should bring software analytics research results to the hands of practitioners?

How to do so?

Page 25

Dual Ends of the Road


Page 26

Thank you!

Questions?

https://sites.google.com/site/asergrp/

[email protected]

NSF grants CCF-0845272, CCF-0915400, CNS-0958235, ARO grant W911NF-08-1-0443, an NSA Science of Security Lablet grant, a NIST grant, a 2011 Microsoft Research SEIF Award