Page 1: Bimodal Software Documentation Christoph Treude - UCLcrest.cs.ucl.ac.uk/cow/55/slides/cow55_Treude.pdf · documentation of that type. Supervised Insight Sentence Extractor Augment

Bimodal Software Documentation

Christoph Treude

University of Adelaide

[1985]

Software Documentation

Software Documentation is everywhere

[C. Parnin and C. Treude. Measuring API Documentation on the Web. Web2SE ’11: 2nd Int’l. Workshop on Web 2.0 for Software Engineering, p. 25-30]

100% 74% 59% 44% 37%

162 different domains in the top 10 for 99 queries
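A count like the one on this slide can be derived from raw search results in a few lines. The sketch below is illustrative only: the `results` data, the function names, and the reading of coverage as the share of queries with at least one top-10 hit on a domain are assumptions, not details taken from the study.

```python
from urllib.parse import urlparse


def domain(url):
    """Return the lower-cased host part of a URL."""
    return urlparse(url).netloc.lower()


def distinct_domains(results_by_query, k=10):
    """Collect every domain that appears in the top-k results of any query."""
    return {domain(u) for urls in results_by_query.values() for u in urls[:k]}


def coverage(results_by_query, target, k=10):
    """Share of queries with at least one top-k result on `target` (assumed metric)."""
    hits = sum(
        any(domain(u) == target for u in urls[:k])
        for urls in results_by_query.values()
    )
    return hits / len(results_by_query)


# Hypothetical search results: query -> ranked result URLs.
results = {
    "jquery bind": ["https://api.jquery.com/bind/", "https://stackoverflow.com/q/1"],
    "jquery click": ["https://api.jquery.com/click/", "https://example.org/click"],
    "jquery hover": ["https://api.jquery.com/hover/", "https://stackoverflow.com/q/2"],
}

print(len(distinct_domains(results)))       # number of distinct domains
print(coverage(results, "api.jquery.com"))  # share of queries covered
```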

jQuery Event API: 75 different domains in the top 10 for 57 queries

100% 59% 36%

TensorFlow Python API: 309 different domains in the top 10 for 2,192 queries

100% 100% 98%

Navigating documentation is not trivial

Common Tasks (list of eight links)

verb noun adjective

Extracting tasks from documentation

[C. Treude, M. P. Robillard, and B. Dagenais. Extracting Development Tasks to Navigate Software Documentation. IEEE Trans. on Software Engineering, 41, 6, p. 565-581]

Grammatical dependencies

direct object: generate confirmation

direct object: generate receipt

Grammatical dependencies

passive nominal subject: set size

adjective modifier: set thumbnail size

preposition: set thumbnail size in templates
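The way a task phrase grows as more dependencies are followed can be sketched as below. This is a minimal illustration, not the authors' implementation: the triples are written by hand rather than produced by a parser, and the relation labels (`dobj`, `nsubjpass`, `amod`, `prep_in`) and the function name `build_task` are assumptions chosen for the example.

```python
def build_task(verb, deps):
    """Assemble 'verb [modifiers] object [preposition noun]' from triples.

    `deps` holds (relation, head, dependent) triples for one sentence.
    """
    # The object is the verb's direct object or, in passive voice, its subject.
    obj = next(
        dep for rel, head, dep in deps
        if head == verb and rel in ("dobj", "nsubjpass")
    )
    # Modifiers attached to the object are placed in front of it.
    mods = [dep for rel, head, dep in deps if head == obj and rel == "amod"]
    parts = [verb, *mods, obj]
    # Prepositional phrases attached to the verb or the object are appended.
    for rel, head, dep in deps:
        if rel.startswith("prep_") and head in (verb, obj):
            parts += [rel[len("prep_"):], dep]
    return " ".join(parts)


# Hand-written triples mirroring the examples on the slides.
thumbnail_deps = [
    ("nsubjpass", "set", "size"),
    ("amod", "size", "thumbnail"),
    ("prep_in", "set", "templates"),
]
receipt_deps = [("dobj", "generate", "receipt")]

print(build_task("set", thumbnail_deps))
print(build_task("generate", receipt_deps))
```

Dropping triples from `thumbnail_deps` reproduces the shorter phrases: with only the passive nominal subject the result is "set size", and adding the adjective modifier yields "set thumbnail size".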

[C. Treude, M. Sicard, M. Klocke, and M. P. Robillard. TaskNav: Task-based Navigation of Software Documentation. ICSE ’15: 37th Int’l. Conf. on Software Engineering, p. 649-652]

[C. Treude and M. P. Robillard. Augmenting API Documentation with Insights from Stack Overflow. ICSE ’16: 38th Int’l. Conference on Software Engineering, p. 392-403]

insight sentence: a sentence from Stack Overflow that is related to a particular API type and that provides insight not contained in the API documentation of that type

Supervised Insight Sentence Extractor

Augment API documentation with insights from Stack Overflow
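The two conditions in the definition of an insight sentence (related to an API type, and not already contained in that type's documentation) can be made concrete with a deliberately naive filter. This is not the supervised extractor named on the slide: the word-overlap heuristic, the `max_overlap` threshold, the function names, and the `ImageCache` type with its sentences are all hypothetical and serve only to illustrate the definition.

```python
import re


def words(text):
    """Lower-cased set of alphabetic words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def candidate_insights(type_name, so_sentences, api_doc, max_overlap=0.5):
    """Keep sentences that mention the type and mostly use words absent from its docs."""
    doc_words = words(api_doc)
    kept = []
    for sentence in so_sentences:
        # Condition 1: the sentence must be related to the API type.
        if type_name.lower() not in sentence.lower():
            continue
        # Condition 2 (crude proxy): little of its vocabulary is already in the docs.
        sentence_words = words(sentence)
        overlap = len(sentence_words & doc_words) / len(sentence_words)
        if overlap <= max_overlap:
            kept.append(sentence)
    return kept


# Hypothetical API documentation and Stack Overflow sentences.
api_doc = "ImageCache stores decoded images in memory."
posts = [
    "ImageCache stores decoded images in memory.",
    "ImageCache is not thread-safe, so guard every call with a lock.",
    "I switched to a different library last year.",
]

for sentence in candidate_insights("ImageCache", posts, api_doc):
    print(sentence)
```

A learned classifier replaces the fixed threshold in a supervised approach; the sketch only shows what the inputs and the filtering decision look like.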

Bimodal software documentation

[B. A. Campbell and C. Treude. NLP2Code: Code Snippet Content Assist via Natural Language Tasks. ICSME ’17: 33rd Int’l. Conf. on Software Maintenance and Evolution, to appear]

Challenges in Analyzing Documentation

• Software documentation is technical and often contains references to code elements

• Natural language text written by software developers may not obey all grammatical rules, e.g.,
  – sentences that are grammatically incomplete
  – content that has not been authored by a native speaker

[F. N. A. Al Omran and C. Treude. Choosing an NLP Library for Analyzing Software Documentation: A Systematic Literature Review and a Series of Experiments. MSR '17: 14th Int’l. Conf. on Mining Software Repositories, p. 187-197]
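To illustrate the first challenge, the sketch below flags tokens that look like code elements so that a tool could treat them separately from ordinary words. The regular expression, the name `find_code_like`, and the example sentence are assumptions made for this illustration; they are simple heuristics that will miss many code references and are not taken from the cited work.

```python
import re

# Heuristic patterns for code-like tokens (illustrative, far from complete).
CODE_LIKE = re.compile(
    r"""
    \b\w+(?:\.\w+)*\(\)                   # call syntax: foo() or a.b.c()
    | \b[a-z]+(?:[A-Z][a-z0-9]*)+\b       # camelCase
    | \b(?:[A-Z][a-z0-9]+){2,}\b          # PascalCase with two or more humps
    | \b[A-Za-z0-9]+(?:_[A-Za-z0-9]+)+\b  # snake_case
    """,
    re.VERBOSE,
)


def find_code_like(text):
    """Return the substrings of `text` that match one of the patterns above."""
    return CODE_LIKE.findall(text)


sentence = "Call getInstance() before using the HashMap or max_size."
print(find_code_like(sentence))
```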

Comparing NLP libraries

Library    Tokens                          Part-of-speech tags
CoreNLP    Returns the C + + variable .    NNS DT NN JJ CC JJ .
SyntaxNet  Returns the C++ variable .      VBZ DT NNP NN .
spaCy      Returns the C++ variable .      VBZ DT NNP NN .
NLTK       Returns the C++ variable .      NNS DT NN JJ .

1. different tokenization

2. general part of speech

3. specific part of speech

Only between 60% and 71% of tokens from Stack Overflow, GitHub, and the Java API Documentation were assigned the same part-of-speech tag by all four libraries.
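One way to quantify such agreement is sketched below for the example sentence "Returns the C++ variable." The token and tag sequences are copied from the slide and assumed to be listed in the same order as the libraries; the span-based alignment and the choice of denominator (every distinct token span produced by any library) are assumptions of this sketch, since the slides do not spell out the exact procedure.

```python
def spans(sentence, tokens):
    """Map each token to its (start, end) character offsets in the sentence."""
    offsets, pos = [], 0
    for tok in tokens:
        start = sentence.index(tok, pos)
        pos = start + len(tok)
        offsets.append((start, pos))
    return offsets


def full_agreement(sentence, outputs):
    """Return (agreed, total) over all distinct token spans.

    A span counts as agreed only if every library produced exactly that
    token and assigned it the same tag.
    """
    tagged = [
        dict(zip(spans(sentence, tokens), tags))
        for tokens, tags in outputs.values()
    ]
    all_spans = set().union(*tagged)
    agreed = sum(
        1 for s in all_spans
        if all(s in t for t in tagged) and len({t[s] for t in tagged}) == 1
    )
    return agreed, len(all_spans)


sentence = "Returns the C++ variable."
five = ["Returns", "the", "C++", "variable", "."]
outputs = {
    "CoreNLP": (["Returns", "the", "C", "+", "+", "variable", "."],
                ["NNS", "DT", "NN", "JJ", "CC", "JJ", "."]),
    "SyntaxNet": (five, ["VBZ", "DT", "NNP", "NN", "."]),
    "spaCy": (five, ["VBZ", "DT", "NNP", "NN", "."]),
    "NLTK": (five, ["NNS", "DT", "NN", "JJ", "."]),
}

agreed, total = full_agreement(sentence, outputs)
print(f"{agreed} of {total} token spans agree")
```

For this sentence only "the" and the final period receive the same token boundaries and the same tag from all four outputs, which shows how both tokenization and tagging differences reduce agreement.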

Bimodal software documentation

tasks, code

Code Snippet Content Assist

tasks, code

The integration of natural language and code in documentation

creates challenges & opportunities for software engineering tools.

Thank you

[email protected]