Code-tagging and similarity-based retrieval with myCBR
DESCRIPTION
This paper describes coTag, a code-tagging plug-in for annotating code snippets in the Eclipse integrated development environment. coTag offers an easy-to-use interface for tagging and searching. Using the similarity-based search engine of the open-source tool myCBR, users can search not only for exactly matching tags, as other code-tagging extensions allow, but also for similar tags and thus for similar code snippets. coTag also lets users add new similarity links between tags, or change existing ones, in context, supported by myCBR’s explanation component.
TRANSCRIPT
Code-tagging and similarity-based retrieval with myCBR
Thomas Roth-Berghofer & Daniel Bahls
Senior researcher, [email protected]
German Research Centre for Artificial Intelligence DFKI GmbH
CAMBRIDGE, UK, 10 DEC 2008
Saturday, 18 July 2009
Programmer’s dilemma
• Where is the code fragment I used to solve a similar problem in the past?
• Is this piece of code still available?
• Is it worth the effort to search for it?
• If so, what would be the right search term?
Personalised approach
• Personal vocabulary: tags
• Linking tags
• Case-based retrieval
• Work context
• Social dimension: tag exchange
CBR cycle
Agnar Aamodt and Enric Plaza. Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications, 7(1):39–59, 1994.
[Figure: CBR cycle, with labels “myCBR” and “CBR”]
Code snippet & context
Work context
• java.net.URL
• java.net.URLConnection
• java.io.InputStream
• java.lang.StringBuffer
• java.io.BufferedReader
• java.lang.String
• java.lang.Exception
Java code snippet
Case structure
Attribute       Value type          Category
Tags            String (multiple)   Problem description
Context items   String (multiple)   Problem description
Code snippet    String              Solution
Document type   String              Provenance
Project name    String              Provenance
File path       String              Provenance
Author ID       String              Provenance
Creation date   Long                Provenance
Rating          Float               Maintenance
Rating count    Integer             Maintenance
Legend: Set by user / Set by coTag
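The case structure above could be held in a plain value class. The following is a minimal sketch, not coTag's actual code: the class name `SnippetCase`, the field names, the reading of the creation date as epoch milliseconds, and the running-average rating update are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical value class mirroring the case structure table (not coTag's real class). */
final class SnippetCase {
    // Problem description
    final List<String> tags;
    final List<String> contextItems;
    // Solution
    final String codeSnippet;
    // Provenance
    final String documentType;
    final String projectName;
    final String filePath;
    final String authorId;
    final long creationDate; // assumption: epoch milliseconds
    // Maintenance
    final float rating;
    final int ratingCount;

    SnippetCase(List<String> tags, List<String> contextItems, String codeSnippet,
                String documentType, String projectName, String filePath,
                String authorId, long creationDate, float rating, int ratingCount) {
        // Defensive copies keep the case immutable after construction.
        this.tags = Collections.unmodifiableList(new ArrayList<>(tags));
        this.contextItems = Collections.unmodifiableList(new ArrayList<>(contextItems));
        this.codeSnippet = codeSnippet;
        this.documentType = documentType;
        this.projectName = projectName;
        this.filePath = filePath;
        this.authorId = authorId;
        this.creationDate = creationDate;
        this.rating = rating;
        this.ratingCount = ratingCount;
    }

    /** Returns a copy with one more rating folded into the average (assumed update rule). */
    SnippetCase withNewRating(float newRating) {
        float updated = (rating * ratingCount + newRating) / (ratingCount + 1);
        return new SnippetCase(tags, contextItems, codeSnippet, documentType, projectName,
                filePath, authorId, creationDate, updated, ratingCount + 1);
    }
}
```

Keeping the rating and the rating count together is what makes an incremental average possible without storing every individual rating.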
Acquiring case
Query view
• Search for tags: init, logging, config
• Include context: take the currently selected code into account
Retrieval
• Result for: init, logging, config
• Ranked list of code snippets
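One way to produce such a ranked list is sketched below. It is an illustration under assumptions, not myCBR's retrieval code: the class `TagRetrieval`, the scoring rule (for each query tag take the best-matching case tag, then average over the query tags), and the pluggable `tagSim` function are all hypothetical. Work context is left out of the sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical ranking helper; the scoring rule is an assumption, not myCBR's. */
final class TagRetrieval {

    /** Average, over all query tags, of the similarity to the best-matching case tag. */
    static double score(List<String> queryTags, List<String> caseTags,
                        ToDoubleBiFunction<String, String> tagSim) {
        if (queryTags.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (String q : queryTags) {
            double best = 0.0;
            for (String c : caseTags) {
                best = Math.max(best, tagSim.applyAsDouble(q, c));
            }
            sum += best;
        }
        return sum / queryTags.size();
    }

    /** Returns the case ids ordered from most to least similar to the query. */
    static List<String> rank(List<String> queryTags, Map<String, List<String>> caseTagsById,
                             ToDoubleBiFunction<String, String> tagSim) {
        List<String> ids = new ArrayList<>(caseTagsById.keySet());
        ids.sort(Comparator.comparingDouble(
                (String id) -> score(queryTags, caseTagsById.get(id), tagSim)).reversed());
        return ids;
    }
}
```

With an exact-match `tagSim`, a case tagged "init, Logger" matches only one of the three query tags "init, logging, config"; a similarity-aware `tagSim` would also give partial credit for "Logger" against "logging".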
Presentation of cases
Situations in which explanations play a role
• Instructing explanations:
  • Novice users want to know how tagging and (similarity-based) retrieval work.
• Convincing explanations:
  • Regular users want to check the result when retrieval does not meet their expectations.
• Improving explanations:
  • Regular users want to correct coTag’s behaviour.
Explanation of matching
• Search terms: init, logging, config
• Case tags: init, Logger
Graphical explanation of trigram matching
• Syntactical similarity
  • Typos
  • Stemming
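Trigram matching can be sketched as follows. This is a minimal illustration that assumes lower-cased character trigrams compared with the Dice coefficient; the slides do not say which trigram measure myCBR uses, and the class name `TrigramSimilarity` is hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical trigram measure (Dice coefficient); myCBR's exact formula may differ. */
final class TrigramSimilarity {

    /** Collects the distinct lower-cased character trigrams of a string. */
    static Set<String> trigrams(String s) {
        String t = s.toLowerCase(Locale.ROOT);
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 3 <= t.length(); i++) {
            grams.add(t.substring(i, i + 3));
        }
        return grams;
    }

    /** Dice coefficient over trigram sets: 2 * |shared| / (|A| + |B|), in [0, 1]. */
    static double similarity(String a, String b) {
        Set<String> ga = trigrams(a);
        Set<String> gb = trigrams(b);
        if (ga.isEmpty() || gb.isEmpty()) {
            // Strings shorter than three characters: fall back to exact comparison.
            return a.equalsIgnoreCase(b) ? 1.0 : 0.0;
        }
        Set<String> shared = new HashSet<>(ga);
        shared.retainAll(gb);
        return 2.0 * shared.size() / (ga.size() + gb.size());
    }
}
```

Under these assumptions, "logging" (5 trigrams) and "Logger" (4 trigrams) share "log" and "ogg", so their similarity is 4/9. This is how a purely syntactic measure can connect typos and word variants without any stemming dictionary.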
Similarity customisation
• Tag similarities:
• Updates personal and community similarity measure
unsimilar 0%
partly similar 25%
similar 50%
very similar 75%
identical 100%
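The five-step scale above maps naturally onto an enumeration. The sketch below is hypothetical: the enum name `TagSimilarityLevel` and the `nearest` helper, which snaps a computed similarity to the closest step, are assumptions and not part of coTag as presented.

```java
/** Hypothetical enum for the five similarity levels offered when linking two tags. */
enum TagSimilarityLevel {
    UNSIMILAR("unsimilar", 0.00),
    PARTLY_SIMILAR("partly similar", 0.25),
    SIMILAR("similar", 0.50),
    VERY_SIMILAR("very similar", 0.75),
    IDENTICAL("identical", 1.00);

    final String label;  // label as shown on the slide
    final double value;  // similarity in [0, 1]

    TagSimilarityLevel(String label, double value) {
        this.label = label;
        this.value = value;
    }

    /** Returns the level whose value is closest to the given similarity. */
    static TagSimilarityLevel nearest(double similarity) {
        TagSimilarityLevel best = UNSIMILAR;
        for (TagSimilarityLevel level : values()) {
            if (Math.abs(level.value - similarity) < Math.abs(best.value - similarity)) {
                best = level;
            }
        }
        return best;
    }
}
```

A coarse scale like this keeps manual similarity editing quick: the user picks one of five labels instead of entering a number.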
Three levels of similarity calculation
Personal
Imported
Trigram
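The three levels can be read as a lookup chain. The sketch below assumes that a personal similarity overrides an imported one, that the trigram measure is only the fallback, and that tag similarity is symmetric; none of this is spelled out on the slides. The class `LayeredTagSimilarity` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical three-level lookup: personal, then imported, then a syntactic fallback. */
final class LayeredTagSimilarity {
    private final Map<String, Double> personal = new HashMap<>();
    private final Map<String, Double> imported = new HashMap<>();
    private final ToDoubleBiFunction<String, String> syntactic; // e.g. a trigram measure

    LayeredTagSimilarity(ToDoubleBiFunction<String, String> syntactic) {
        this.syntactic = syntactic;
    }

    /** Order-independent, case-insensitive key, so (a, b) and (b, a) share one entry. */
    private static String key(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        return x.compareTo(y) <= 0 ? x + "\t" + y : y + "\t" + x;
    }

    void setPersonal(String a, String b, double value) {
        personal.put(key(a, b), value);
    }

    void setImported(String a, String b, double value) {
        imported.put(key(a, b), value);
    }

    /** Assumed precedence: personal link, else imported link, else syntactic measure. */
    double similarity(String a, String b) {
        String k = key(a, b);
        Double own = personal.get(k);
        if (own != null) {
            return own;
        }
        Double shared = imported.get(k);
        if (shared != null) {
            return shared;
        }
        return syntactic.applyAsDouble(a, b);
    }
}
```

Passing the syntactic measure in as a function keeps the lookup chain independent of how trigram matching is implemented.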
Customised (personal) and imported similarity
Client-side architecture
Tag and exchange code snippets
Take home messages
• Re-finding information is a typical task in knowledge work.
• Tagging is a helpful and well-known technique.
• Similarity-based retrieval can improve searches.
• Explanation-aware development of applications helps you deal with the increased complexity of similarity-based retrieval.
Thank you!