Bimodal Software Documentation

Christoph Treude

University of Adelaide

[1985]

Software Documentation

Software Documentation is everywhere

[C. Parnin and C. Treude. Measuring API Documentation on the Web. Web2SE ’11: 2nd Int’l. Workshop on Web 2.0 for Software Engineering, p. 25-30]

100% 74%

59% 44% 37%

162 different domains in the top 10 for 99 queries

jQuery Event API: 75 different domains in the top 10 for 57 queries

100% 59% 36%

Tensorflow Python API: 309 different domains in the top 10 for 2,192 queries

100% 100% 98%
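A minimal sketch of how a "different domains in the top 10" count could be computed from collected search results. The function name `distinct_domains`, the input shape (a mapping from query to ranked result URLs), and the sample URLs are assumptions made for illustration; they are not taken from the cited study.

```python
from urllib.parse import urlparse


def distinct_domains(results_by_query, top_n=10):
    """Collect the distinct host names among the top_n results of every query."""
    domains = set()
    for urls in results_by_query.values():
        for url in urls[:top_n]:
            host = urlparse(url).netloc.lower()
            # Treat "www.example.org" and "example.org" as the same domain.
            if host.startswith("www."):
                host = host[4:]
            if host:
                domains.add(host)
    return domains


# Made-up sample data: two queries with their ranked result URLs.
sample_results = {
    "bind event handler": [
        "https://docs.example.com/bind",
        "https://qa.example.net/questions/1",
        "https://www.blog.example.org/bind-tutorial",
    ],
    "remove event handler": [
        "https://docs.example.com/unbind",
        "https://qa.example.net/questions/2",
    ],
}

print(len(distinct_domains(sample_results)), "different domains for",
      len(sample_results), "queries")
```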

Navigating documentation is not trivial

[Screenshot: a "Common Tasks" list of links]

Extracting tasks from documentation

verb, noun, adjective

[C. Treude, M. P. Robillard, and B. Dagenais. Extracting Development Tasks to Navigate Software Documentation. IEEE Trans. on Software Engineering, 41, 6, p. 565-581]

Grammatical dependencies

direct object: generate confirmation

direct object: generate receipt

passive nominal subject: set size

adjective modifier: set thumbnail size

preposition: set thumbnail size in templates
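The build-up above (verb plus object, then adjective modifiers, then prepositional phrases) can be sketched as follows. This is an illustration under stated assumptions, not the extraction pipeline from the cited paper: the parse is supplied by hand as (relation, head, dependent) triples, the relation labels `dobj`, `nsubjpass`, `amod`, and `prep_in` are assumed abbreviations for the dependency names on the slides, and the words are assumed to be lemmatized already.

```python
def extract_tasks(dependencies):
    """Build task phrases from (relation, head, dependent) triples."""
    modifiers = {}     # noun -> adjective modifiers, in input order
    prepositions = {}  # verb -> list of (preposition, noun)
    pairs = []         # (verb, object) pairs that seed a task

    for rel, head, dep in dependencies:
        if rel in ("dobj", "nsubjpass"):
            # Direct object or passive nominal subject: "set size".
            pairs.append((head, dep))
        elif rel == "amod":
            # Adjective modifier of the object: "thumbnail size".
            modifiers.setdefault(head, []).append(dep)
        elif rel.startswith("prep_"):
            # Prepositional phrase attached to the verb: "in templates".
            prepositions.setdefault(head, []).append((rel[len("prep_"):], dep))

    tasks = []
    for verb, obj in pairs:
        words = [verb] + modifiers.get(obj, []) + [obj]
        for prep, noun in prepositions.get(verb, []):
            words += [prep, noun]
        tasks.append(" ".join(words))
    return tasks


# Hand-written parse for the running example on the slides.
parse = [
    ("nsubjpass", "set", "size"),
    ("amod", "size", "thumbnail"),
    ("prep_in", "set", "templates"),
]
print(extract_tasks(parse))  # ['set thumbnail size in templates']
```

In practice the triples would come from a dependency parser rather than being written by hand.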

[C. Treude, M. Sicard, M. Klocke, and M. P. Robillard. TaskNav: Task-based Navigation of Software Documentation. ICSE ’15: 37th Int’l. Conf. on Software Engineering, p. 649-652]

Augment API documentation with insights from Stack Overflow

insight sentence: a sentence from Stack Overflow that is related to a particular API type and that provides insight not contained in the API documentation of that type

Supervised Insight Sentence Extractor

[C. Treude and M. P. Robillard. Augmenting API Documentation with Insights from Stack Overflow. ICSE ’16: 38th Int’l. Conference on Software Engineering, p. 392-403]

Bimodal software documentation

[B. A. Campbell and C. Treude. NLP2Code: Code Snippet Content Assist via Natural Language Tasks. ICSME ’17: 33rd Int’l. Conf. on Software Maintenance and Evolution, to appear]

Challenges in Analyzing Documentation

• Software documentation is technical and often contains references to code elements

• Natural language text written by software developers may not obey all grammatical rules, e.g.,
– sentences that are grammatically incomplete
– content that has not been authored by a native speaker

[F. N. A. Al Omran and C. Treude. Choosing an NLP Library for Analyzing Software Documentation: A Systematic Literature Review and a Series of Experiments. MSR '17: 14th Int’l. Conf. on Mining Software Repositories, p. 187-197]

Comparing NLP libraries

Input sentence: Returns the C++ variable.

CoreNLP

SyntaxNet

spaCy

NLTK

Returns the C + + variable .

Returns the C++ variable .

Returns the C++ variable .

Returns the C++ variable .

NNS DT NN JJ CC JJ .

VBZ DT NNP NN .

VBZ DT NNP NN .

NNS DT NN JJ .

1. different tokenization

2. general part of speech

3. specific part of speech
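The tokenization difference shown above can be reproduced with two small regular-expression tokenizers. This is only a sketch: neither function is the tokenizer of any of the four libraries, and both names (`naive_tokenize`, `code_aware_tokenize`) are made up for illustration. The first splits every punctuation character into its own token; the second keeps a letter followed by `++` together.

```python
import re


def naive_tokenize(text):
    """Split into word tokens and single punctuation characters."""
    return re.findall(r"\w+|[^\w\s]", text)


def code_aware_tokenize(text):
    """Like naive_tokenize, but keep names such as 'C++' as one token."""
    return re.findall(r"[A-Za-z]\+\+|\w+|[^\w\s]", text)


sentence = "Returns the C++ variable."
print(" ".join(naive_tokenize(sentence)))       # Returns the C + + variable .
print(" ".join(code_aware_tokenize(sentence)))  # Returns the C++ variable .
```

Every later step (tagging, parsing) inherits whichever split is chosen, which is why the seven-token version ends up with seven tags.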

Only between 60% and 71% of tokens from Stack Overflow, GitHub, and the Java API Documentation were assigned the same part-of-speech tag by all four libraries.
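A token-level agreement figure of this kind can be sketched as the share of token positions where every tagger emits the same tag. The function below is an assumption about how such a share might be computed, not the procedure from the cited paper; in particular it requires all sequences to cover the same tokens, so it is applied here only to the three five-tag sequences from the example above (labelled generically, since the sketch does not depend on which library produced which).

```python
def tag_agreement(tag_sequences):
    """Share of token positions at which all taggers assign the same tag."""
    lengths = {len(seq) for seq in tag_sequences}
    if len(lengths) != 1:
        raise ValueError("all sequences must cover the same tokens")
    positions = list(zip(*tag_sequences))
    if not positions:
        return 0.0
    same = sum(1 for tags in positions if len(set(tags)) == 1)
    return same / len(positions)


# The three five-tag sequences for "Returns the C++ variable ."
sequences = [
    "VBZ DT NNP NN .".split(),
    "VBZ DT NNP NN .".split(),
    "NNS DT NN JJ .".split(),
]
print(tag_agreement(sequences))  # 0.4 (only "the" and "." are tagged alike)
```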

Bimodal software documentation

[Diagram labels: tasks, code]

Code Snippet Content Assist

[Diagram labels: tasks, code]
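One way to picture content assist driven by natural language tasks is a lookup from task phrases to code snippets. The sketch below is purely illustrative and is not the NLP2Code implementation: the `suggest` function, the word-matching rule, and the entries in `task_index` are all hypothetical.

```python
def suggest(query, task_index, limit=5):
    """Return (task, snippet) pairs whose task contains every query word."""
    words = query.lower().split()
    matches = [
        (task, snippet)
        for task, snippet in task_index.items()
        if all(word in task.lower().split() for word in words)
    ]
    # Sort alphabetically by task so suggestions appear in a stable order.
    return sorted(matches)[:limit]


# Hypothetical index pairing task phrases with snippets.
task_index = {
    "read file": "text = open(path).read()",
    "read file line by line": "for line in open(path): print(line)",
    "sort list": "items.sort()",
}

for task, snippet in suggest("read file", task_index):
    print(task, "->", snippet)
```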

The integration of natural language and code in documentation creates challenges & opportunities for software engineering tools.

Thank you!

christoph.treude@adelaide.edu.au
