Code-tagging and similarity-based retrieval with myCBR

Post on 15-Dec-2014

DESCRIPTION

This paper describes the code-tagging plug-in coTag, which lets developers annotate code snippets in the Eclipse integrated development environment. coTag offers an easy-to-use interface for tagging and searching. Because it uses the similarity-based search engine of the open-source tool myCBR, the user can search not only for exactly matching tags, as other code-tagging extensions allow, but also for similar tags and thus for similar code snippets. coTag also lets users add new similarity links between tags and change existing ones in context, supported by myCBR’s explanation component.

TRANSCRIPT

Code-tagging and similarity-based retrieval with myCBR

Thomas Roth-Berghofer & Daniel Bahls

Senior researcher, trb@dfki.de

German Research Centre for Artificial Intelligence DFKI GmbH

CAMBRIDGE, UK, 10 DEC 2008

Saturday, 18 July 2009

Programmer's dilemma

• Where is the code fragment I used to solve a similar problem in the past?

• Is this piece of code still available?

• Is it worth the effort to search for it?

• If so, what would be the right search term?

Personalised approach

• Personal vocabulary: tags

• Linking tags

• Case-based retrieval

• Work context

• Social dimension: tag exchange

CBR cycle

Agnar Aamodt and Enric Plaza. Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications, 7(1):39–59, 1994.

[Figure: CBR cycle diagram; labels on the slide: myCBR, CBR]

Code snippet & context

Work context

• java.net.URL

• java.net.URLConnection

• java.io.InputStream

• java.lang.StringBuffer

• java.io.BufferedReader

• java.lang.String

• java.lang.Exception

Java code snippet
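
The slides do not say how context items are compared. As a minimal sketch, assuming the work context is treated as a set of fully qualified type names and compared by plain set overlap (Jaccard), the comparison could look like this. The class name `ContextOverlap` and the choice of Jaccard are illustrative assumptions, not coTag's or myCBR's documented behaviour.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: compares two work contexts by set overlap (an assumption). */
final class ContextOverlap {

    /** Size of the intersection divided by size of the union; 0.0 if both sets are empty. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }
}
```

With this measure, two contexts that share `java.net.URL` but differ in one other type each would overlap in one of three distinct items.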

Case structure

Attribute | Value type | Category
Tags | String (multiple) | Problem description
Context items | String (multiple) | Problem description
Code snippet | String | Solution
Document type | String | Provenance
Project name | String | Provenance
File path | String | Provenance
Author ID | String | Provenance
Creation date | Long | Provenance
Rating | Float | Maintenance
Rating count | Integer | Maintenance

Legend: Set by user / Set by coTag
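
To make the case structure concrete, here is a sketch of how the ten attributes could be held in one Java type. The record name `CodeCase`, the field names, and the `withRating` helper are hypothetical. Treating the rating as a running mean over the rating count is an assumption; the slides only list the two attributes.

```java
import java.util.List;

/** Hypothetical mirror of the case structure; not coTag's actual class. */
record CodeCase(
        List<String> tags,          // problem description
        List<String> contextItems,  // problem description
        String codeSnippet,         // solution
        String documentType,        // provenance
        String projectName,         // provenance
        String filePath,            // provenance
        String authorId,            // provenance
        long creationDate,          // provenance; unit (e.g. epoch milliseconds) is assumed
        float rating,               // maintenance
        int ratingCount) {          // maintenance

    /** Returns a copy with one more rating folded into the running mean (assumed semantics). */
    CodeCase withRating(float newRating) {
        float mean = (rating * ratingCount + newRating) / (ratingCount + 1);
        return new CodeCase(tags, contextItems, codeSnippet, documentType, projectName,
                filePath, authorId, creationDate, mean, ratingCount + 1);
    }
}
```

Keeping the type immutable means a stored case is never changed in place; a new rating produces a new value that can replace the old one in the case base.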

Acquiring case

Query view

• Search for tags: init, logging, config

• Include context: take the currently selected code into account

Retrieval

• Result for: init, logging, config

• Ranked list of code snippets
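
One way to produce such a ranked list is to score every case against the query tags and sort by score. The sketch below assumes a score equal to the mean, over all query tags, of the best similarity to any tag of the case. This aggregation, the class name `TagRetrieval`, and the pluggable `tagSim` function are illustrative assumptions and do not reproduce myCBR's retrieval engine.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical ranking sketch; the scoring scheme is an assumption. */
final class TagRetrieval {

    /** Mean over query tags of the best similarity to any case tag. */
    static double score(List<String> queryTags, List<String> caseTags,
                        ToDoubleBiFunction<String, String> tagSim) {
        if (queryTags.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (String q : queryTags) {
            double best = 0.0;
            for (String t : caseTags) {
                best = Math.max(best, tagSim.applyAsDouble(q, t));
            }
            sum += best;
        }
        return sum / queryTags.size();
    }

    /** Returns case ids ordered from highest to lowest score. */
    static List<String> rank(List<String> queryTags, Map<String, List<String>> caseTagsById,
                             ToDoubleBiFunction<String, String> tagSim) {
        List<String> ids = new ArrayList<>(caseTagsById.keySet());
        ids.sort(Comparator.comparingDouble(
                (String id) -> score(queryTags, caseTagsById.get(id), tagSim)).reversed());
        return ids;
    }
}
```

Passing an exact-match function as `tagSim` reproduces conventional tag search; passing a graded similarity function lets cases with merely similar tags appear further down the list instead of being dropped.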

Presentation of cases

Situations in which explanations play a role

• Instructing explanations:
  • Novice users want to know how tagging and (similarity-based) retrieval work.

• Convincing explanations:
  • Regular users want to check the result when retrieval does not meet their expectations.

• Improving explanations:
  • Regular users want to correct coTag's behaviour.

Explanation of matching

• Search terms: init, logging, config

• Case tags: init, Logger

Graphical explanation of trigram matching

• Syntactical similarity
  • Typos
  • Stemming
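
Trigram matching compares the three-character substrings of two tags, which is why it tolerates typos and shared word stems. The sketch below uses the Dice coefficient over lower-cased trigram sets, which is one common formulation. The exact formula myCBR applies is not given in the slides, so the class `TrigramSimilarity` should be read as an assumption.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical trigram measure using the Dice coefficient (an assumed formula). */
final class TrigramSimilarity {

    /** Distinct lower-cased character trigrams; a shorter non-empty string counts as one gram. */
    static Set<String> trigrams(String s) {
        String t = s.toLowerCase(Locale.ROOT);
        Set<String> grams = new HashSet<>();
        if (t.length() < 3) {
            if (!t.isEmpty()) {
                grams.add(t);
            }
            return grams;
        }
        for (int i = 0; i + 3 <= t.length(); i++) {
            grams.add(t.substring(i, i + 3));
        }
        return grams;
    }

    /** Twice the number of shared trigrams divided by the total number of trigrams. */
    static double similarity(String a, String b) {
        Set<String> ga = trigrams(a);
        Set<String> gb = trigrams(b);
        if (ga.isEmpty() || gb.isEmpty()) {
            return 0.0;
        }
        Set<String> shared = new HashSet<>(ga);
        shared.retainAll(gb);
        return 2.0 * shared.size() / (ga.size() + gb.size());
    }
}
```

For the tags from the matching example, "logging" and "Logger" share the trigrams "log" and "ogg", so they score above zero even though they are not identical, whereas "init" and "config" share none.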

Similarity customisation

• Tag similarities:

• Updates personal and community similarity measure

unsimilar 0%

partly similar 25%

similar 50%

very similar 75%

identical 100%
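
The five levels above map naturally onto a small enumeration. The sketch below is hypothetical: the enum name `SimilarityLevel` and the `nearest` helper, which snaps an arbitrary similarity value to the closest level, are assumptions added for illustration.

```java
/** Hypothetical enum for the five similarity levels shown on the slide. */
enum SimilarityLevel {
    UNSIMILAR(0.0),
    PARTLY_SIMILAR(0.25),
    SIMILAR(0.5),
    VERY_SIMILAR(0.75),
    IDENTICAL(1.0);

    private final double value;

    SimilarityLevel(double value) {
        this.value = value;
    }

    /** Similarity as a fraction between 0.0 and 1.0. */
    double value() {
        return value;
    }

    /** Snaps a similarity value to the closest level; input is clamped to [0, 1]. */
    static SimilarityLevel nearest(double similarity) {
        double clamped = Math.max(0.0, Math.min(1.0, similarity));
        int index = (int) Math.round(clamped * (values().length - 1));
        return values()[index];
    }
}
```

A coarse scale like this keeps the user's judgement simple: a person only has to pick one of five labels rather than type a number.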

Three levels of similarity calculation

Personal

Imported

Trigram
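
The slides name the three levels but not how they are combined. A plausible reading, sketched below, is a lookup chain: a personal similarity link wins if one exists, an imported link is used next, and the trigram measure serves as the fallback. The precedence order, the symmetric pair key, and the class name `LayeredTagSimilarity` are all assumptions.

```java
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical three-level lookup: personal, then imported, then a computed fallback. */
final class LayeredTagSimilarity {
    private final Map<String, Double> personal;
    private final Map<String, Double> imported;
    private final ToDoubleBiFunction<String, String> fallback;

    LayeredTagSimilarity(Map<String, Double> personal, Map<String, Double> imported,
                         ToDoubleBiFunction<String, String> fallback) {
        this.personal = personal;
        this.imported = imported;
        this.fallback = fallback;
    }

    /** Order-independent key for a tag pair; assumes similarity is symmetric. */
    static String key(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "\t" + b : b + "\t" + a;
    }

    /** Uses the first level that has an entry for the pair (assumed precedence). */
    double similarity(String a, String b) {
        String k = key(a, b);
        Double p = personal.get(k);
        if (p != null) {
            return p;
        }
        Double i = imported.get(k);
        if (i != null) {
            return i;
        }
        return fallback.applyAsDouble(a, b);
    }
}
```

Under this reading, a user who sets "logging" and "Logger" to very similar overrides whatever value was imported from the community, while tag pairs nobody has linked still get a score from the syntactic fallback.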

Customised (personal) and imported similarity

Client-side architecture

Tag and exchange code snippets

Take home messages

• Re-finding information is a typical task in knowledge work.

• Tagging is a helpful and well-known technique.

• Similarity-based retrieval can improve searches.

• Explanation-aware development of applications helps you deal with the increased complexity of similarity-based retrieval.

Thank you!
