Carol V. Alexandru, Sebastiano Panichella, Harald C. Gall
Software Evolution and Architecture Lab
University of Zurich, Switzerland
{alexandru,panichella,gall}@ifi.uzh.ch

SANER’17, Klagenfurt, Austria
22.02.2017

The Problem Domain

• Static analysis (e.g. #Attr., McCabe, coupling...)
• Many revisions, fine-grained historical data

(Figure: release timeline v0.7.0, v1.0.0, v1.3.0, v2.0.0, v3.0.0, v3.3.0, v3.5.0)

A Typical Analysis Process

• select project (www) → clone
• select revision → checkout
• apply analysis tool → store analysis results (Res)
• more revisions? → select the next revision
• more projects? → select the next project

Redundancies all over...

Across Revisions
• Redundancies in historical code analysis: few files change; only small parts of a file change; changes may not even affect results
• Impact on code study tools: repeated analysis of "known" code; storing redundant results

Across Languages
• Redundancies in historical code analysis: each language has its own toolchain, yet they share many metrics
• Impact on code study tools: re-implementing identical analyses; generalizability is expensive

Important!

Techniques implemented in LISA
Your favourite analysis features

Pick what you like!

#1: Avoid Checkouts

With a checkout (clone → checkout → read → analyze):
• For every file: 2 read ops + 1 write op
• Checkout includes irrelevant files
• Need 1 CWD for every revision to be analyzed in parallel

Avoid checkouts

Without a checkout (clone → read → analyze):
• Only read relevant files in a single read op
• No write ops
• No overhead for parallelization

Git → File Abstraction Layer → Analysis Tool

E.g. for the JDK Compiler:

import java.net.URI
import java.nio.CharBuffer
import javax.tools.{JavaFileObject, SimpleJavaFileObject}

class JavaSourceFromCharrArray(name: String, val code: CharBuffer)
    extends SimpleJavaFileObject(URI.create("string:///" + name), JavaFileObject.Kind.SOURCE) {
  // Serve the source from memory instead of from a checked-out file
  override def getCharContent(ignoreEncodingErrors: Boolean): CharSequence = code
}

#2: Use a multi-revision representation of your sources

Merge ASTs

(Figure: the ASTs of rev. 1, rev. 2, rev. 3 and rev. 4 are merged one after another into a single graph; nodes carry revision ranges such as rev. range [1-4] and rev. range [1-2])

AspectJ (~440k LOC):
• 1 commit: 2.2M nodes
• All >7000 commits: 6.5M nodes
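The merged representation can be pictured with a small sketch. The class and method names below (MergedNode, markPresent, MergedAstDemo) are made up for illustration and are not LISA's actual data structures. The sketch also assumes that revisions are merged in ascending order and, as a deliberate simplification, that a child is identified by its node type alone.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical merged, multi-revision AST node (illustration only). */
class MergedNode {
    final String type;
    final List<int[]> ranges = new ArrayList<>();       // inclusive [start, end] revision ranges
    final List<MergedNode> children = new ArrayList<>();

    MergedNode(String type) { this.type = type; }

    /** Records that the node exists in rev; assumes ascending revision order. */
    void markPresent(int rev) {
        if (!ranges.isEmpty()) {
            int[] last = ranges.get(ranges.size() - 1);
            if (rev <= last[1]) return;                 // already covered
            if (rev == last[1] + 1) { last[1] = rev; return; }  // extend the open range
        }
        ranges.add(new int[] {rev, rev});               // node (re)appears: new range
    }

    /** Finds or creates a child, so unchanged subtrees are shared across revisions. */
    MergedNode child(String childType) {
        for (MergedNode c : children) {
            if (c.type.equals(childType)) return c;
        }
        MergedNode c = new MergedNode(childType);
        children.add(c);
        return c;
    }

    boolean existsIn(int rev) {
        for (int[] r : ranges) {
            if (r[0] <= rev && rev <= r[1]) return true;
        }
        return false;
    }

    String rangeString() {
        StringBuilder sb = new StringBuilder();
        for (int[] r : ranges) sb.append('[').append(r[0]).append('-').append(r[1]).append(']');
        return sb.toString();
    }
}

class MergedAstDemo {
    /** A class present in rev. 1 to 4 with a method that only exists in rev. 1 and 2. */
    static MergedNode build() {
        MergedNode cls = new MergedNode("Class");
        for (int rev = 1; rev <= 4; rev++) {
            cls.markPresent(rev);
            if (rev <= 2) cls.child("Method").markPresent(rev);
        }
        return cls;
    }
}
```

In this sketch, four revisions produce two nodes rather than six, which is the kind of saving the AspectJ node counts above point to.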

#3: Store AST nodes only if they're needed for analysis

public class Demo {
  public void run() {
    for (int i = 1; i < 100; i++) {
      if (i % 3 == 0 || i % 5 == 0) {
        System.out.println(i);
      }
    }
  }
}

What's the complexity (1+#forks) and name for each method and class?

parse: 140 AST nodes (using ANTLR)

CompilationUnit
  TypeDeclaration
    Modifiers: public
    Name: Demo
    Members
      Method
        Modifiers: public
        Name: run
        Parameters
        ReturnType: PrimitiveType VOID
        Body
          Statements
            ...

filtered parse: 7 AST nodes (using ANTLR)

TypeDeclaration
  Name: Demo
  Method
    Name: run
    ForStatement
      IfStatement
        ConditionalExpression
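A minimal sketch of the filtering idea, under stated assumptions: the node type names are taken from the slide, but AstNode, FilteredParse and the abbreviated demoTree are hypothetical stand-ins, not ANTLR or LISA classes. The filter is applied here to an already built tree for brevity. A filtered parser would avoid creating the dropped nodes in the first place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical AST node (illustration only). */
class AstNode {
    final String type;
    final List<AstNode> children = new ArrayList<>();

    AstNode(String type, AstNode... kids) {
        this.type = type;
        for (AstNode k : kids) children.add(k);
    }
}

class FilteredParse {
    // Node types needed to answer "complexity and name for each method and class".
    static final Set<String> RELEVANT = Set.of(
        "TypeDeclaration", "Method", "Name",
        "ForStatement", "IfStatement", "ConditionalExpression");
    static final Set<String> FORKS = Set.of(
        "ForStatement", "IfStatement", "ConditionalExpression");

    /** Copies only relevant nodes; kept descendants attach to the nearest kept ancestor. */
    static List<AstNode> filter(AstNode node) {
        List<AstNode> keptChildren = new ArrayList<>();
        for (AstNode c : node.children) keptChildren.addAll(filter(c));
        if (!RELEVANT.contains(node.type)) return keptChildren;
        AstNode copy = new AstNode(node.type);
        copy.children.addAll(keptChildren);
        return List.of(copy);
    }

    static int size(AstNode n) {
        int s = 1;
        for (AstNode c : n.children) s += size(c);
        return s;
    }

    /** Complexity as defined on the slide: 1 + number of forks at or below the node. */
    static int complexity(AstNode n) {
        return 1 + forks(n);
    }

    private static int forks(AstNode n) {
        int f = FORKS.contains(n.type) ? 1 : 0;
        for (AstNode c : n.children) f += forks(c);
        return f;
    }

    /** Heavily abbreviated stand-in for the full parse tree of Demo.run(). */
    static AstNode demoTree() {
        AstNode cond = new AstNode("ConditionalExpression", new AstNode("Literal"));
        AstNode ifStmt = new AstNode("IfStatement", cond, new AstNode("Block"));
        AstNode forStmt = new AstNode("ForStatement", new AstNode("Block", ifStmt));
        AstNode method = new AstNode("Method", new AstNode("Modifiers"), new AstNode("Name"),
            new AstNode("Body", new AstNode("Statements", forStmt)));
        AstNode type = new AstNode("TypeDeclaration", new AstNode("Modifiers"),
            new AstNode("Name"), new AstNode("Members", method));
        return new AstNode("CompilationUnit", type);
    }
}
```

Filtering the stand-in tree leaves the same seven node types as the filtered parse above, and the three forks (for, if, ||) still give run() a complexity of 4.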

#4: Use non-duplicative data structures to store your results

(Figure: merged graph for rev. 1 to rev. 4; the node InnerClass stores one record with label, #attr and mcc per revision range [1-1], [2-3] and [4-4]; values shown in the figure: 0 4 and 1 2 4)
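One way to read the range-keyed records is sketched below. RangeStore and RangeStoreDemo are hypothetical names, not LISA's storage layer. The sketch assumes revisions are recorded in ascending order, and the sample values are illustrative and not read off the slide.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical store keeping one value per revision range instead of one per revision. */
class RangeStore<V> {
    private static final class Span<V> {
        int end;            // last revision (inclusive) for which the value holds
        final V value;
        Span(int end, V value) { this.end = end; this.value = value; }
    }

    // Keyed by the first revision of each range.
    private final TreeMap<Integer, Span<V>> spans = new TreeMap<>();

    /** Records the value for rev; assumes revisions arrive in ascending order. */
    void put(int rev, V value) {
        Map.Entry<Integer, Span<V>> last = spans.lastEntry();
        if (last != null && last.getValue().end == rev - 1
                && Objects.equals(last.getValue().value, value)) {
            last.getValue().end = rev;          // unchanged result: extend the range
        } else {
            spans.put(rev, new Span<>(rev, value));
        }
    }

    /** Returns the value recorded for rev, or null if there is none. */
    V get(int rev) {
        Map.Entry<Integer, Span<V>> e = spans.floorEntry(rev);
        return (e != null && rev <= e.getValue().end) ? e.getValue().value : null;
    }

    int storedRanges() { return spans.size(); }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        spans.forEach((start, span) -> sb.append('[').append(start).append('-')
            .append(span.end).append("]=").append(span.value).append(' '));
        return sb.toString().trim();
    }
}

class RangeStoreDemo {
    /** Illustrative mcc values for four revisions of one class. */
    static RangeStore<Integer> mccOfInnerClass() {
        RangeStore<Integer> mcc = new RangeStore<>();
        mcc.put(1, 1);
        mcc.put(2, 2);
        mcc.put(3, 2);      // same as rev. 2, so no new record is created
        mcc.put(4, 4);
        return mcc;
    }
}
```

Four revisions end up as three records, [1-1], [2-3] and [4-4], and a lookup for any revision still resolves through floorEntry.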

LISA also does:
• #5: Parallel parsing
• #6: Asynchronous graph computation
• #7: Generic graph computations applying to ASTs from compatible languages

To Summarize...

The LISA Analysis Process

• select project (www) → clone
• parallel parse into merged graph
• async. compute: the analysis runs on the graph
• store analysis results (Res)
• more projects? → select the next project

Supporting parts:
• ANTLRv4 grammar: generates the parser
• Language mappings (grammar to analysis): determine which AST nodes are loaded
• Analysis formulated as graph computation: uses the language mappings, runs on the graph, determines which data is persisted

How well does it work, then?

Marginal cost for +1 revision

Average Parsing+Computation time per Revision when analyzing n revisions of AspectJ (10 common metrics)

# of revisions | 1 | 10 | 100 | 1000 | 2000 | 3000 | 4000 | 5000 | 6000 | 7000
Seconds | 41.670 | 4.633 | 0.525 | 0.109 | 0.082 | 0.071 | 0.052 | 0.041 | 0.032 | 0.033

Overall Performance Stats

Language | Java | C# | JavaScript
#Projects | 100 | 100 | 100
#Revisions | 646'261 | 489'764 | 204'301
#Files (parsed!) | 3'235'852 | 3'234'178 | 507'612
#Lines (parsed!) | 1'370'998'072 | 961'974'773 | 194'758'719
Total Runtime (RT)¹ | 18:43h | 52:12h | 29:09h
Median RT¹ | 2:15min | 4:54min | 3:43min
Tot. Avg. RT per Rev.² | 84ms | 401ms | 531ms
Med. Avg. RT per Rev.² | 30ms | 116ms | 166ms

¹ Including cloning and persisting results
² Excluding cloning and persisting results

What's the catch? (There are a few...)

The (not so) minor stuff

• Must implement analyses from scratch
• No help from a compiler
• Non-file-local analyses need some effort
• Moved files/methods etc. add overhead
• Uniquely identifying files/entities is hard
• (No impact on results, though)

Language matters

(Figure: per-language comparison of JavaScript, C# and Java)

E.g.: JavaScript takes longer because:
• Larger files, less modularization
• Slower parser (automatic semicolon-insertion)

LISA is EXTREME

(Figure: spectrum from "complex, feature-rich, heavyweight" to "simple, generic, lightweight")

Thank you for your attention

SANER '17, Klagenfurt, 22.02.2017

Read the paper: http://t.uzh.ch/Fj

Try the tool: http://t.uzh.ch/Fk

Get the slides: http://t.uzh.ch/Fm

Contact me: [email protected]

Parallelize Parsing

Single Git tree traversal: obtain sequence of Git blob ids for old versions of each unique path

src/Main.java {1: 1251a4}, {3: fc2452}, {4: 251929}
src/Settings.java {2: fa255a}
src/Foo.java {1: 512fc2}, {4: 791c2a}, {5: bcb215}
src/Bar.java {4: 8a23b2}, {5: b2399f}

• Parse files with different paths in parallel
• Some files will have more revisions, taking longer to parse in total
• Parsing only takes roughly as long as required for the file with the most revisions
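The per-path parallelism can be sketched as follows. The paths and blob ids are the ones listed on the slide. ParallelParseDemo and its parse method are hypothetical: parse only builds a string and stands in for reading a blob from Git and parsing it into the merged graph. Using a parallel stream is an assumption of this sketch, not a statement about how LISA schedules the work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-path parallel parsing (illustration only). */
class ParallelParseDemo {
    /** Path -> (revision number -> blob id), as collected by one history traversal. */
    static Map<String, TreeMap<Integer, String>> history() {
        Map<String, TreeMap<Integer, String>> h = new TreeMap<>();
        h.put("src/Main.java", new TreeMap<>(Map.of(1, "1251a4", 3, "fc2452", 4, "251929")));
        h.put("src/Settings.java", new TreeMap<>(Map.of(2, "fa255a")));
        h.put("src/Foo.java", new TreeMap<>(Map.of(1, "512fc2", 4, "791c2a", 5, "bcb215")));
        h.put("src/Bar.java", new TreeMap<>(Map.of(4, "8a23b2", 5, "b2399f")));
        return h;
    }

    /** Stand-in for "read blob from Git and parse it". */
    static String parse(String path, int rev, String blobId) {
        return path + "@" + rev + ":" + blobId;
    }

    /** Different paths run in parallel; the versions of one path stay sequential and ordered. */
    static Map<String, List<String>> parseAll(Map<String, TreeMap<Integer, String>> history) {
        Map<String, List<String>> parsed = new ConcurrentHashMap<>();
        history.entrySet().parallelStream().forEach(e -> {
            List<String> versions = new ArrayList<>();
            e.getValue().forEach((rev, blob) -> versions.add(parse(e.getKey(), rev, blob)));
            parsed.put(e.getKey(), versions);
        });
        return parsed;
    }
}
```

Keeping the versions of a single path sequential is why the total parsing time is bounded below by the path with the most revisions, as the last bullet notes.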

"Speed-up factor" for each technique

• Parallel parsing: Roughly 2
• Merged ASTs: >1000 for many revisions
• Filtered parsing: >10 during computation, depends on how much is filtered
• All depends on file sizes / parser speed

Asynchronous Graph Analysis

(Figure: merged graph for rev. 1 to rev. 4)

Depending on the node type:
- Signal specific data
- Collect specific data
- Store specific data
- Create new nodes & edges

Example - Method Count (NOM):
Signal: METHOD - nom: 1
Collect: CLASS-like - nom += s.nom
Store: CLASS-like - nom
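The NOM example can be sketched as a tiny signal/collect loop. Vertex, NomAnalysis and the sample graph are hypothetical and do not reflect LISA's API or that of any graph framework. The sketch delivers signals from a single queue for simplicity. Because the collect step is a plain addition, the delivery order does not change the result, which is what makes asynchronous execution possible.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical vertex of the graph (illustration only). */
class Vertex {
    final String type;      // "CLASS", "METHOD" or anything else
    final String name;
    final Vertex parent;    // signals travel from child to parent
    int nom = 0;            // state collected so far

    Vertex(String type, String name, Vertex parent) {
        this.type = type;
        this.name = name;
        this.parent = parent;
    }
}

class NomAnalysis {
    private static final class Signal {
        final Vertex target;
        final int nom;
        Signal(Vertex target, int nom) { this.target = target; this.nom = nom; }
    }

    /** Delivers signals until none are pending, then returns the stored value per class. */
    static Map<String, Integer> run(List<Vertex> vertices) {
        Deque<Signal> pending = new ArrayDeque<>();
        // Signal: every METHOD sends nom = 1 towards its parent.
        for (Vertex v : vertices) {
            if (v.type.equals("METHOD") && v.parent != null) {
                pending.add(new Signal(v.parent, 1));
            }
        }
        while (!pending.isEmpty()) {
            Signal s = pending.poll();
            if (s.target.type.equals("CLASS")) {
                s.target.nom += s.nom;                              // Collect: nom += s.nom
            } else if (s.target.parent != null) {
                pending.add(new Signal(s.target.parent, s.nom));    // forward through other nodes
            }
        }
        // Store: only CLASS vertices persist their nom.
        Map<String, Integer> stored = new TreeMap<>();
        for (Vertex v : vertices) {
            if (v.type.equals("CLASS")) stored.put(v.name, v.nom);
        }
        return stored;
    }

    /** Sample graph: class Demo with two methods and an inner class with one method. */
    static Map<String, Integer> demo() {
        Vertex demo = new Vertex("CLASS", "Demo", null);
        Vertex members = new Vertex("MEMBERS", "", demo);
        Vertex runMethod = new Vertex("METHOD", "run", members);
        Vertex stopMethod = new Vertex("METHOD", "stop", members);
        Vertex inner = new Vertex("CLASS", "Inner", members);
        Vertex helper = new Vertex("METHOD", "helper", inner);
        return run(List.of(demo, members, runMethod, stopMethod, inner, helper));
    }
}
```

In this sketch a signal stops at the nearest enclosing class, so methods of the inner class are not counted for the outer one. Whether nested classes should contribute is a design choice the slide does not settle.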

Static source code analysis?

• Simple Code Metrics (NOC, NOM, WMC, Complexity, ...)
• Structure (Coupling, Inheritance, ...)
• Practice: code smells, refactoring advice, hot-spot detection, bug prediction
• Research: understanding software evolution, identifying patterns & anti-patterns, code quality assessment techniques, ... → code studies

Page 85: Carol V. Alexandru, Sebastiano Panichella, Harald C. Gall · 2017. 2. 22. · 0.525 0.109 0.082 0.071 0.052 0.041 0.032 0.033 0 5 10 15 20 25 30 35 40 45 50 1 10 100 1000 2000 3000

(Many) existing studies

85

Page 86: Carol V. Alexandru, Sebastiano Panichella, Harald C. Gall · 2017. 2. 22. · 0.525 0.109 0.082 0.071 0.052 0.041 0.032 0.033 0 5 10 15 20 25 30 35 40 45 50 1 10 100 1000 2000 3000

(Many) existing studies

86

Page 87: Carol V. Alexandru, Sebastiano Panichella, Harald C. Gall · 2017. 2. 22. · 0.525 0.109 0.082 0.071 0.052 0.041 0.032 0.033 0 5 10 15 20 25 30 35 40 45 50 1 10 100 1000 2000 3000

(Many) existing studies

2g

• investigate a small number of projects

Page 88: Carol V. Alexandru, Sebastiano Panichella, Harald C. Gall · 2017. 2. 22. · 0.525 0.109 0.082 0.071 0.052 0.041 0.032 0.033 0 5 10 15 20 25 30 35 40 45 50 1 10 100 1000 2000 3000

(Many) existing studies

• investigate a small number of projects

source: github.com

source: openhub.net

88

Page 89:

(Many) existing studies

• investigate a small number of projects

• analyze a few snapshots of multi-year projects

v0.7.0 v1.0.0 v1.3.0 v2.0.0 v3.0.0 v3.3.0 v3.5.0

source: github.com

source: openhub.net

89

Page 90:

(Many) existing studies

• investigate a small number of projects

• analyze a few snapshots of multi-year projects

90

Page 91:

(Many) existing studies

• investigate a small number of projects

• analyze a few snapshots of multi-year projects

• focus on very few programming languages

91

Page 92:

Why?

92

Only a few languages

Only a few projects

Only a few snapshots

Page 93:

Why?

93

Only a few languages

Only a few projects

Only a few snapshots

Time & resource intensive analysis

Page 94:

Why?

94

Only a few languages

Only a few projects

Only a few snapshots

Time & resource intensive analysis

Each commit analyzed separately

Page 95:

Why?

95

Only a few languages

Only a few projects

Only a few snapshots

Manual work

Time & resource intensive analysis

Each commit analyzed separately

Page 96:

Why?

96

Only a few languages

Only a few projects

Only a few snapshots

Manual work

Time & resource intensive analysis

Each commit analyzed separately

Many preconditions for analyses

Page 98:

Why?

98

Only a few languages

Only a few projects

Only a few snapshots

Manual work

Time & resource intensive analysis

Each commit analyzed separately

Many preconditions for analyses

Analysis Tools are purpose-built, not designed for large-scale studies

Page 99:

Why?

99

Only a few languages

Only a few projects

Only a few snapshots

Manual work

Time & resource intensive analysis

Each commit analyzed separately

Many preconditions for analyses

Analysis Tools are purpose-built, not designed for large-scale studies

Possible solution: LISA

A fast, multi-purpose, graph-based analysis approach

Page 100:

#3: Use asynchronous graph analysis techniques

Page 101:

Rapid Analysis using LISA

101

Only a few languages

Only a few projects

Only a few snapshots

Manual work

Time & resource intensive analysis

Each commit analyzed separately

Many preconditions for analyses

Fast, general purpose code analysis tool aimed specifically at large scale

Page 102:

Rapid Analysis using LISA

102

Only a few languages

Only a few projects

Only a few snapshots

Manual work

Time & resource intensive analysis

Analyze all commits in parallel

Many preconditions for analyses

Fast, general purpose code analysis tool aimed specifically at large scale

Page 103:

Rapid Analysis using LISA

103

Only a few languages

Only a few projects

Only a few snapshots

«Point and Shoot»

Time & resource intensive analysis

Analyze all commits in parallel

Many preconditions for analyses

Fast, general purpose code analysis tool aimed specifically at large scale

Page 104:

Rapid Analysis using LISA

104

Only a few languages

Only a few projects

Only a few snapshots

«Point and Shoot»

Time & resource intensive analysis

Analyze all commits in parallel

Analysis directly on source code

Fast, general purpose code analysis tool aimed specifically at large scale

Page 105:

Rapid Analysis using LISA

105

All code is equal (given a parser)

Only a few projects

Only a few snapshots

«Point and Shoot»

Time & resource intensive analysis

Analyze all commits in parallel

Analysis directly on source code

Fast, general purpose code analysis tool aimed specifically at large scale

Page 106:

Rapid Analysis using LISA

106

All code is equal (given a parser)

Only a few projects

Only a few snapshots

«Point and Shoot»

Quick analysis of 1000s of commits

Analyze all commits in parallel

Analysis directly on source code

Fast, general purpose code analysis tool aimed specifically at large scale

Page 107:

Rapid Analysis using LISA

107

All code is equal (given a parser)

Only a few projects

All commits

«Point and Shoot»

Quick analysis of 1000s of commits

Analyze all commits in parallel

Analysis directly on source code

Fast, general purpose code analysis tool aimed specifically at large scale

Page 109:

# of Commits: Linear Scaling

MarchUpdate(Total)

109

Page 110:

Multi-Project Parallelization

27

LISA

AspectJ: 7642 commits, 440k LOC

Requirements: 20 GB memory, 45 min on 4 cores

Page 111:

Multi-Project Parallelization

27

LISA

AspectJ: 7642 commits, 440k LOC

Requirements: 20 GB memory, 45 min on 4 cores

[Diagram: ten VMs]

Parallelization scenario: 10x Amazon EC2 r3.8xlarge (10x 244 GB memory, 10x 32 cores)

Potential to analyze 160 AspectJ-sized projects per hour

(Most projects on GitHub are much smaller)
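
The figure of 160 projects per hour is consistent with a memory-bound estimate: 244 GB divided by 20 GB per analysis gives 12 concurrent analyses per machine, or 120 across ten machines, and each slot completes 60/45 analyses per hour. The slides do not state how the number was derived, so the sketch below is only a reconstruction of that arithmetic. The function and parameter names are illustrative assumptions. It also shows a stricter core-bound variant (32 cores at 4 cores per analysis), which yields a lower estimate.

```python
def projects_per_hour(machines, mem_gb, cores,
                      mem_per_project_gb=20, cores_per_project=4,
                      minutes_per_project=45, limit_by_cores=False):
    """Estimate how many AspectJ-sized analyses finish per hour."""
    # Concurrent analyses that fit into one machine's memory.
    slots = mem_gb // mem_per_project_gb
    if limit_by_cores:
        # Optionally also cap concurrency by the available cores.
        slots = min(slots, cores // cores_per_project)
    # Each slot completes 60 / minutes_per_project analyses per hour.
    return machines * slots * 60 / minutes_per_project


# Scenario from the slide: 10 machines with 244 GB memory and 32 cores each.
memory_bound = projects_per_hour(10, 244, 32)
core_bound = projects_per_hour(10, 244, 32, limit_by_cores=True)
print(memory_bound, round(core_bound, 1))
```

Under the memory-bound assumption the estimate matches the slide; if each analysis really needs four dedicated cores, throughput would be closer to the core-bound value.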

Page 112:

Benchmarking

10

Page 113:

Benchmarking

10

Is this good or bad?

What does it even mean?

Is this metric too high?

Here's a code smell - should I fix it?

Page 115:

Benchmarking

10

Create reference points for orientation
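
One simple way to turn a raw metric into a reference point is to rank it against the same metric measured across a corpus of other projects. The sketch below is a minimal illustration of that idea and is not taken from LISA; the function name and the sample values are hypothetical.

```python
from bisect import bisect_right


def percentile_rank(value, reference):
    """Percentage of reference values that are less than or equal to value."""
    ordered = sorted(reference)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


# Hypothetical complexity values of the same metric in five other projects.
corpus = [3, 5, 8, 12, 20]
rank = percentile_rank(12, corpus)
print(rank)
```

A value near the top of the distribution answers "is this metric too high?" relative to comparable projects rather than against an arbitrary threshold.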

Page 116:

Cluster & Compare

11

Page 117:

Cluster & Compare

11

• Discover "phenotypes"

• By metric values

Page 118:

Cluster & Compare

11

• Discover "phenotypes"

• By metric values

• By metric evolution over time

Page 119:

Cluster & Compare

11

• Discover "phenotypes"

• By metric values

• By metric evolution over time

• Compare programming languages / paradigms

Page 120:

Cluster & Compare

11

• Discover "phenotypes"

• By metric values

• By metric evolution over time

• Compare programming languages / paradigms

Find interesting projects for further study & identify evolution patterns to find (un-)desirable ones

Page 121:

Machine Learning

12

Sequence of Changes (observable)

Bug Reports (training)

HMM

Page 122:

Machine Learning

12

Sequence of Changes (observable)

Bug Reports (training)

HMM

Bug-Proneness (hidden)

Page 123:

Machine Learning

12

Sequence of Changes (observable)

Bug Reports (training)

HMM

Bug-Proneness (hidden)

Leverage big data to train classifiers and predictors
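
To make the HMM idea concrete, the sketch below decodes the most likely sequence of hidden bug-proneness states from an observed sequence of changes using the Viterbi algorithm. It is only an illustration: the state names, observation labels, and all probabilities are hypothetical and hand-set here, whereas the slides envision estimating such parameters from bug reports.

```python
import math

# Hypothetical model: hidden states and hand-set probabilities.
STATES = ("stable", "bug_prone")
START = {"stable": 0.8, "bug_prone": 0.2}
TRANS = {
    "stable": {"stable": 0.9, "bug_prone": 0.1},
    "bug_prone": {"stable": 0.3, "bug_prone": 0.7},
}
EMIT = {
    "stable": {"small_change": 0.7, "large_change": 0.3},
    "bug_prone": {"small_change": 0.2, "large_change": 0.8},
}


def viterbi(observations):
    """Return the most likely hidden-state sequence for the observations."""
    # Work in log space to avoid underflow on long commit histories.
    scores = {s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
              for s in STATES}
    back = []
    for obs in observations[1:]:
        prev, scores, pointers = scores, {}, {}
        for s in STATES:
            # Best predecessor state for landing in state s.
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            scores[s] = (prev[best] + math.log(TRANS[best][s])
                         + math.log(EMIT[s][obs]))
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


print(viterbi(["large_change", "large_change", "large_change"]))
```

With these made-up parameters, a run of large changes decodes to the bug-prone state, while a run of small changes decodes to the stable state.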

Page 124:

Study Replication

13

Conclusions X, Y, Z

Page 125:

Study Replication

13

Conclusions X, Y, Z

Replicable with 1000s of projects?

Page 126:

Study Replication

13

Conclusions X, Y, Z

Replicable with 1000s of projects?

Confirmation or rebuttal of existing assumptions