Model-Driven Retrieval of Model Repositories


Upload: marco-brambilla

Post on 24-Jan-2015


DESCRIPTION

Model-Driven Development (MDD) is a software development methodology that focuses on the creation and maintenance of domain models as the primary form of expression in the development cycle. One of the fundamental characteristics of this approach is the reuse of software artifacts through their model representation. However, reuse is impaired because current systems lack an efficient way to search model repositories: many existing solutions do not take the relationships between model artifacts into account, although these relationships are important for satisfying the user's information need in a model-driven development scenario. This thesis defines a model-driven methodology for creating model search engines. Unlike many related works, the methodology is metamodel-independent and exploits the metamodel of the searched project models to obtain more precise results. A prototype has been implemented to support the methodology. We address two case studies that deal with indexing and retrieving models from two collections of UML and WebML projects, respectively. Each case study involves several experiments adopting different indexing strategies. Finally, after manually building the ground truth for each repository, we evaluated the results with established Information Retrieval measures such as DCG, MRR, MAP, Precision and Recall.

TRANSCRIPT

Page 1: Model driven retrieval of model repositories

Model-Driven Retrieval of Model Repositories

Politecnico di Milano
POLO TERRITORIALE DI COMO

Master of Science in Computer Engineering

Supervisor: Prof. Marco Brambilla
Assistant Supervisor: Prof. Alessandro Bozzon

Master graduation thesis by:
Stefano Celentano, ID: 755287
Lorenzo Furrer, ID: 750213

Page 2: Model driven retrieval of model repositories

Introduction
• Retrieving software models is essential to the Model-Driven Development (MDD) paradigm
• Current systems lack efficient and standardized methodologies
• The metamodel is not taken into account
• Our contributions:
  • A methodology for model-driven retrieval of model repositories that takes the metamodels into account
  • The development of a prototype for such methodology
  • Two case studies
  • Evaluation of different test configurations

Page 3: Model driven retrieval of model repositories

Outline
• Model retrieval approaches
• MDD and Metamodeling
• Our Approach:
  • Abstract Solution
  • Design Dimensions
  • Indexing Strategies
• Prototype Architecture
• Case Studies
  • UML Case Study
  • WebML Case Study
• Tests and evaluation
• Future works

Introduction & Methodology

Prototype & Case studies

Page 4: Model driven retrieval of model repositories

Model Retrieval Approaches
• Text-based
  • Model representation: unstructured document (bag of words) (e.g., Vector Space Model, Tf-idf)
  • Query type: keyword-based
  • Matching algorithm: standard IR similarity measures (e.g., cosine similarity)
• Content-based
  • Model representation: model structure is taken into account (e.g., graph-based)
  • Query type: search by example
  • Matching algorithm: ad-hoc algorithms (depends on the model representation)
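The text-based approach listed above can be illustrated with a minimal sketch, assuming each model has already been flattened into a bag of words. The project names and term lists are hypothetical, and the tf-idf variant used here (raw term count times `log(N/df) + 1`) is only one of several common weightings; the slides do not state which one the prototype uses.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()


def build_index(docs):
    """Turn {name: text} into tf-idf vectors plus the idf table."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    n_docs = len(docs)
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    vectors = {
        name: {term: tf * idf[term] for term, tf in term_counts.items()}
        for name, term_counts in counts.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, vectors, idf):
    """Rank documents by cosine similarity to a keyword query."""
    query_counts = Counter(tokenize(query))
    query_vec = {t: tf * idf[t] for t, tf in query_counts.items() if t in idf}
    scored = [(cosine(query_vec, vec), name) for name, vec in vectors.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0.0]


# Hypothetical flattened models: one bag of words per project.
docs = {
    "library": "book author loan member",
    "webshop": "product cart order customer",
    "bookstore": "book order customer author",
}
vectors, idf = build_index(docs)
print(search("book author", vectors, idf))
```

Because the model is reduced to a bag of words, two projects with the same vocabulary but different structure receive the same score, which is the limitation the content-based approach addresses.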

Page 5: Model driven retrieval of model repositories

Model-Driven Development and Metamodeling
• A fundamental concept: «metamodel»
• MOF (Meta-Object Facility)

Modeling layers: Meta-metamodel, Metamodel, Model, Instance

Page 6: Model driven retrieval of model repositories


Our Approach (1/3): Abstract Solution

Page 7: Model driven retrieval of model repositories

Our Approach (2/3): Design Dimensions
• Segmentation Granularity
  • Whole project
  • Subproject
  • Project concept
• Index structure
  • Flat
  • Weighted
  • Multi-field
  • Hybrid (e.g., multi-field index containing weighted terms)
• Query type
  • Keyword-based search
  • Search by example
• Result presentation
  • Snippet visualization
  • Faceted search

Page 8: Model driven retrieval of model repositories

Our Approach (3/3): Indexing Strategies

Experiment | Segmentation Granularity | Index | Index terms weights
Experiment A | Whole project | Flat | No
Experiment B | Metamodel concept | Multi-field | No
Experiment C | Metamodel concept | Multi-field | Assigned according to the metamodel concept
Experiment D* | Metamodel concept | Multi-field | Assigned according to the metamodel concept

* The indexing phase includes a graph-based algorithm that enriches the document representation of a model element with information extracted from its neighboring elements.
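The enrichment step of Experiment D can be sketched as a breadth-first walk that copies terms from neighboring elements with reduced weight. This is only an illustration under stated assumptions: the slides do not give the actual algorithm, so the `decay` factor, the `enrich` function, the rule of keeping the highest weight per term, and the example relations between classes are all hypothetical.

```python
from collections import deque


def enrich(element, own_terms, neighbors, max_hops=1, decay=0.5):
    """Return {term: weight} for `element`, adding the terms of elements
    reachable within `max_hops`, attenuated by `decay` per hop (assumed)."""
    weights = dict(own_terms.get(element, {}))  # own terms keep their weight
    seen = {element}
    frontier = deque([(element, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in neighbors.get(node, ()):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            factor = decay ** (hops + 1)
            for term, weight in own_terms.get(neighbor, {}).items():
                # Simplification: keep the highest weight seen for a term.
                weights[term] = max(weights.get(term, 0.0), weight * factor)
            frontier.append((neighbor, hops + 1))
    return weights


# Hypothetical class terms and relations, for illustration only.
own_terms = {
    "Entry": {"name": 1.0, "type": 1.0},
    "Field": {"value": 1.0},
    "Predicate": {"expression": 1.0},
}
neighbors = {"Entry": ["Field"], "Field": ["Predicate"]}

print(enrich("Entry", own_terms, neighbors, max_hops=1))
print(enrich("Entry", own_terms, neighbors, max_hops=2))
```

Raising `max_hops` pulls in more context at the cost of making documents for neighboring elements look more alike.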

Page 9: Model driven retrieval of model repositories

Prototype Architecture
• Based on SMILA: an extensible framework for building search solutions to access unstructured information.
• Uses Apache Solr: a scalable search platform featuring full-text search.

Diagram components: Data Source, Crawler, Router, Queue, Listener, BPEL pipeline, BPELProcessor, Analyzers, Configurator, Apache Solr, Index


Page 11: Model driven retrieval of model repositories

Case Studies
• UML Class Diagram
  • 84 meta-models from AtlanMod
  • Small size
  • General purpose
• WebML
  • 12 real-life industrial projects
  • Large size
  • Large quantity of concepts
  • Domain specific

Page 12: Model driven retrieval of model repositories

UML Case Study: Experiment A
• Granularity: Project
• Index: Flat

Content Field: location commentsBefore commentsAfter entries predicates name type allFields fields predicate name expression field value LocatedElement Query Entry Field Predicate Expression

Page 13: Model driven retrieval of model repositories

UML Case Study: Experiment B
• Granularity: Class
• Index: Multi-Field

ClassName Field: Entry
ProjectName Field: BQL
AttributeNames Field: name type allFields fields predicate
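As a rough illustration of class-level granularity with a multi-field index, the sketch below emits one document per class. The field names `ClassName`, `ProjectName`, and `AttributeNames` come from the slide; the `id` field, the `class_documents` function, and the input dictionary layout are assumptions made for the example, not the prototype's actual schema.

```python
def class_documents(project_name, classes):
    """Yield one multi-field document per class of a project.

    `classes` maps a class name to the list of its attribute names.
    """
    for class_name, attributes in classes.items():
        yield {
            "id": f"{project_name}.{class_name}",  # assumed unique key
            "ProjectName": project_name,
            "ClassName": class_name,
            "AttributeNames": " ".join(attributes),
        }


# Values taken from the slide's example (project BQL, class Entry).
project = {"Entry": ["name", "type", "allFields", "fields", "predicate"]}
for doc in class_documents("BQL", project):
    print(doc)
```

Keeping the class name, project name, and attribute names in separate fields lets a query target or boost each one independently, which a flat index cannot do.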

Page 14: Model driven retrieval of model repositories

UML Case Study: Experiment C
• Granularity: Class
• Index: Multi-Field, Weighted

ClassName Field: Entry|1.7
ProjectName Field: BQL|1.0
AttributeNames Field: name|1.0 type|1.0 allFields|1.0 fields|1.5 predicate|1.6
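The `term|weight` notation shown on this slide can be produced and parsed with a few lines of code. This is a minimal sketch: the helper names are hypothetical, and the slides do not say how the prototype passes these weights to the index, so nothing here should be read as its actual mechanism.

```python
def encode_weighted(terms):
    """Serialize (term, weight) pairs as 'term|weight' tokens."""
    return " ".join(f"{term}|{weight:.1f}" for term, weight in terms)


def decode_weighted(text):
    """Parse 'term|weight' tokens back into (term, weight) pairs."""
    pairs = []
    for token in text.split():
        # Split on the last '|' so a term containing '|' is not truncated.
        term, _, weight = token.rpartition("|")
        pairs.append((term, float(weight)))
    return pairs


# Weights copied from the slide's AttributeNames field.
attribute_terms = [
    ("name", 1.0), ("type", 1.0), ("allFields", 1.0),
    ("fields", 1.5), ("predicate", 1.6),
]
field_value = encode_weighted(attribute_terms)
print(field_value)
print(decode_weighted(field_value))
```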

Page 15: Model driven retrieval of model repositories

UML Case Study: Experiment D
• Granularity: Class
• Index: Multi-Field, Weighted
• #HOP = 1

ClassName Field: Entry|1.7
ProjectName Field: BQL|1.0
AttributeNames Field: name|0.75 location|0.9 commentsBefore|0.9 commentsAfter|0.9 name|1.0 type|1.0 allFields|1.0 predicate|1.6 fields|1.3 Predicate|0.765 Query|0.816 Field|0.85 LocatedElement|0.9

Page 16: Model driven retrieval of model repositories

WebML Case Study: Experiment B
• Granularity: Area
• Index: Multi-Field

Content Field: Book requests Create book ConnectUserToBook New book request Newbook User Book request list
AreaName Field: Book requests

Page 17: Model driven retrieval of model repositories

WebML Case Study: Experiment C
• Granularity: Area
• Index: Multi-Field, Weighted

Content Field: Create|1.0 book|1.0 ConnectUserToBook|1.0 New|1.1 book|1.1 request|1.1 New|1.0 Book|1.0 User|1.0 Book|1.1 request|1.1 list|1.1
AreaName Field: Book|1.2 requests|1.2

Page 18: Model driven retrieval of model repositories

Tests and Evaluation: Meta-queries

Meta-query | Type of searched document | Information need
1 | Project | All projects related to one specific topic
2 | Project | All projects related to one general topic
3 | Pattern | Searching for a pattern, using as query string the terms belonging to different classes connected by some relation
4 | Class | Searching for a class, using as query string all (or some) of the terms belonging to a class
5 | Class | Searching for a class, using as query string some of the terms belonging to a class and some terms related to the project
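The thesis description names Precision, Recall, MRR and MAP among the evaluation measures. The sketch below shows the textbook definitions of these measures for ranked result lists checked against a manually built ground truth. The query identifiers and document names are hypothetical, and the code is not taken from the thesis.

```python
def precision_recall(ranked, relevant):
    """Precision and recall of a result list against a relevant set."""
    hits = len(set(ranked) & relevant)
    precision = hits / len(ranked) if ranked else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant result, or 0 if none is found."""
    for position, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / position
    return 0.0


def average_precision(ranked, relevant):
    """Mean of the precision values at each relevant result's position."""
    hits = 0
    total = 0.0
    for position, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Hypothetical runs: query id -> (ranked results, ground-truth relevant set).
runs = {
    "q1": (["p3", "p1", "p7"], {"p1", "p3"}),
    "q2": (["p5", "p2"], {"p2"}),
}
mrr = mean(reciprocal_rank(r, rel) for r, rel in runs.values())
map_score = mean(average_precision(r, rel) for r, rel in runs.values())
print(mrr, map_score)
```

MRR and MAP are averages over queries, so in this setting they would be computed across the instances of each meta-query.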

Page 19: Model driven retrieval of model repositories

UML Experiment A (Project Granularity, Flat Index)
• DCG and iDCG are very close in the first 3 positions.
• The most relevant document is always retrieved in the first position.
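For reference, DCG and its ideal counterpart iDCG can be computed as in the sketch below. It assumes graded relevance judgments and the common `gain / log2(position + 1)` discount; the slides do not state which DCG variant was used, and the gain values here are hypothetical.

```python
import math


def dcg(gains, k=None):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    if k is not None:
        gains = gains[:k]
    return sum(
        gain / math.log2(position + 1)
        for position, gain in enumerate(gains, start=1)
    )


def ideal_dcg(ground_truth_gains, k=None):
    """DCG of the best possible ordering of the ground-truth grades."""
    return dcg(sorted(ground_truth_gains, reverse=True), k)


# Hypothetical relevance grades of the returned documents, in rank order.
returned = [3, 2, 0, 1]
for k in range(1, len(returned) + 1):
    print(k, round(dcg(returned, k), 3), round(ideal_dcg(returned, k), 3))
```

Plotting both values for increasing `k` gives the kind of DCG versus iDCG comparison the slide refers to: the closer the two curves, the closer the ranking is to the ideal ordering.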

Page 20: Model driven retrieval of model repositories

Other UML Experiments
• The weighted experiment always performs better than the non-weighted one.
• Both Experiments B and C are close to the ideal curve in the first positions.
• Experiment D is meant to answer a different user need from the one captured by the ground truth used.

Page 21: Model driven retrieval of model repositories

WebML Experiments
• Experiments B and C perform identically up to the third position.
• After that, the experiment using weights always performs slightly better than the non-weighted one.

Page 22: Model driven retrieval of model repositories

Conclusions
• The system has been tested with both a general-purpose and a domain-specific modeling language.
• Good performance in the first rank positions.
• The weighted case always performs as well as or better than the others, albeit only slightly.
• The prototype has shown good results in retrieving documents that are relevant in terms of conceptual and terminological similarity.
• Structural similarity is difficult to capture in a text-based search.

Future Directions
• Integrating a content-based solution
• Metamodel integration
• Testing more configurations
• Weight training