Presentation at MTSR 2012

Date: 30/11/2012. SSONDE: Semantic Similarity On liNked Data Entities. Riccardo Albertoni, Ontology Engineering Group, Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid. Joint work with Monica De Martino (CNR-IMATI-GE). MTSR 2012, 6th Metadata and Semantics Research Conference, 28-30 November 2012, Cádiz (Spain).


Page 1: Presentation at MTSR 2012

Date: 30/11/2012

SSONDE: Semantic Similarity On liNked Data Entities

Riccardo Albertoni

Ontology Engineering Group, Departamento de Inteligencia Artificial, Facultad de Informática

Universidad Politécnica de Madrid

Joint work with Monica De Martino (CNR-IMATI-GE)

MTSR 2012,

6th Metadata and Semantics Research Conference

28-30 November 2012 - Cádiz (Spain)

Page 2: Presentation at MTSR 2012

Presentation Outline

1. How SSONDE fits with other Linked Data technologies
• What is it for? What is it not for?

2. Characteristics of instance similarity in SSONDE
• The theory behind SSONDE's similarity is detailed in Albertoni R. and De Martino M., Asymmetric and context dependent semantic similarity among ontology instances, Journal on Data Semantics, LNCS, 2008.

3. SSONDE architecture and examples on Linked Data

Page 3: Presentation at MTSR 2012

Linked Data crawling architectural pattern

[Figure: SSONDE in the Linked Data crawling architectural pattern, together with LDSpider/Fuseki and LDIF, used to build analysis services such as cluster analysis and explorative search on resources.]

Tom Heath and Christian Bizer (2011). Linked Data: Evolving the Web into a Global Data Space (1st edition), pp. 1-136. Morgan & Claypool.

Page 4: Presentation at MTSR 2012

SSONDE instance similarity

• is not meant to align ontologies/schemas, nor to interlink/consolidate entities;
• aims at
  • providing a method for comparing entities represented as instances in ontology-driven repositories or exposed as Linked Data;
  • supporting explorative searches;
• assumes all the integration steps are done: it works at the application layer of the Linked Data crawling architectural pattern;
• main characteristics (which make SSONDE unique in its kind):
  • context, to represent similarity criteria (algorithm parameters);
  • asymmetry, to emphasize containment between instances.

Example: comparing researchers


Page 6: Presentation at MTSR 2012

Example: researchers' comparison

[Figure: researchers linked to their publications, their research topics and their projects.]

Page 7: Presentation at MTSR 2012

Different contexts: the researchers, publications, … are instances.

Contexts and the researchers' features (data/object properties) considered in the similarity:
• Researcher's Experience: age, number of publications, number of projects, common publications, common research projects;
• Researchers' Scientific Interest: common publications, common research projects, similar research interests.

Common publications and common research projects are used in both contexts; similar research interests are used only in the Scientific Interest context.

Page 8: Presentation at MTSR 2012

Formalization of application context

A function that, for each recursion path, specifies which data/object properties to consider and which operation to apply to each of them: Inter considers the values the two instances have in common, Simil compares the values recursively.

Example: Researchers' Scientific Interest (common publications, common research projects, similar research interests)

[ResearchStaff] -> {{}, {(Publication, Inter), (WorkAtProject, Inter), (interest, Simil)}}
[ResearchStaff, Interest] -> {{(TopicName, Inter)}, {(RelatedTopic, Inter)}}
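The context formalization above can be encoded as a plain lookup table keyed by recursion path. The sketch below is only an illustration, assuming a path is a tuple of names and each entry lists (property, operation) pairs for data and object properties; `Operation`, `ContextEntry` and `properties_for` are hypothetical names, not part of SSONDE.

```python
from dataclasses import dataclass
from enum import Enum


class Operation(Enum):
    """Operations named on the slide: Inter (shared values), Simil (recursive similarity)."""
    INTER = "Inter"
    SIMIL = "Simil"


@dataclass(frozen=True)
class ContextEntry:
    """Properties considered at one recursion path (hypothetical structure)."""
    data_properties: tuple = ()    # (property, Operation) pairs
    object_properties: tuple = ()  # (property, Operation) pairs


# Researchers' Scientific Interest context, transcribed from the slide.
SCIENTIFIC_INTEREST = {
    ("ResearchStaff",): ContextEntry(
        data_properties=(),  # no data properties at this path
        object_properties=(
            ("Publication", Operation.INTER),
            ("WorkAtProject", Operation.INTER),
            ("interest", Operation.SIMIL),
        ),
    ),
    ("ResearchStaff", "Interest"): ContextEntry(
        data_properties=(("TopicName", Operation.INTER),),
        object_properties=(("RelatedTopic", Operation.INTER),),
    ),
}


def properties_for(context, path):
    """Return the entry for a recursion path; paths absent from the context consider nothing."""
    return context.get(tuple(path), ContextEntry())


if __name__ == "__main__":
    entry = properties_for(SCIENTIFIC_INTEREST, ["ResearchStaff"])
    # Object properties whose values are compared recursively:
    print([p for p, op in entry.object_properties if op is Operation.SIMIL])  # ['interest']
```

Keying on the whole path rather than on the class alone is what lets the same kind of entity be compared with different properties at different recursion depths.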

Page 9: Presentation at MTSR 2012

Why an asymmetric similarity?

Sim(A,B) might differ from Sim(B,A):
• Sim is not the inverse of a metric distance, so metric properties (such as symmetry and the triangle inequality) cannot be exploited to prune comparisons.

Here asymmetry is adopted to highlight the containment between instances A and B.

Example of containment (comparing with respect to publications only):
• A is a Ph.D. student who has always published with his tutor B.

[Figure: A and B with publications pub 1, pub 2 and pub 3; every publication of A is also a publication of B.]

• A is contained in B (A << B): A can be replaced by B.
• B is not contained in A: if B is replaced with A, some experience gets lost.

Page 10: Presentation at MTSR 2012

SSONDE's asymmetric similarity

Sim(A,B) ranges in [0,1] and is proportional to the number of data and object property values that A shares with B:
• if A is contained in B, Sim(A,B) = 1;
• if A is not contained in B, Sim(A,B) < 1;
• if A and B do not share any "features", Sim(A,B) = 0;
• if A has exactly the same characteristics as B (A << B and B << A), Sim(A,B) = Sim(B,A) = 1.
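The containment behaviour listed above can be reproduced with a deliberately simplified measure: the fraction of A's feature values that B also has. This sketch assumes features are flat sets of values, here the publications of the Ph.D. example, with A assumed to have authored pub1 and pub2 and B all three; SSONDE's actual measure combines data and object properties recursively according to the context, and `asym_sim` and `distance` are hypothetical names.

```python
def asym_sim(a: set, b: set) -> float:
    """Fraction of A's feature values that B also has.

    Convention chosen here: an entity with no feature values scores 0.
    """
    if not a:
        return 0.0
    return len(a & b) / len(a)


def distance(a: set, b: set) -> float:
    """1 - Sim: not symmetric, hence not a metric distance."""
    return 1.0 - asym_sim(a, b)


# Ph.D. student A has always published with tutor B (publications only; assumed values).
A = {"pub1", "pub2"}
B = {"pub1", "pub2", "pub3"}

if __name__ == "__main__":
    print(asym_sim(A, B))            # 1.0   (A << B)
    print(round(asym_sim(B, A), 3))  # 0.667 (B is not contained in A)
    print(round(distance(B, A), 3))  # 0.333, while distance(A, B) is 0.0
```

Because distance(A, B) and distance(B, A) differ, pruning strategies that rely on a symmetric metric (for instance triangle-inequality bounds) do not apply.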

Page 11: Presentation at MTSR 2012

Results comparing young and senior researchers of IMATI

[Figure: similarity matrices for the Research Experience and Research Interest contexts.]

The darker the matrix cell, the higher the similarity.


Page 13: Presentation at MTSR 2012

SSONDE architecture

[Figure: SSONDE architecture.]

Web of Data (Linked datasets served by third parties): RDF dumps, HTTP-dereferenceable URIs, SPARQL endpoints.

Linked Data consumption (crawling architectural pattern): LDIF, LDSpider + Fuseki, and a local data store/cache (TDB repository, SDB repository, RDF dumps).

SSONDE:
• Configuration: kind of store, list of instances (Java class to generate the list), reference to the context, reference to rules (e.g., Jena rules);
• Similarity: context layer, ontology layer, data layer;
• Data wrappers: Jena TDB, Jena SDB, Jena in-memory (JENA MEM), Virtuoso wrapper, …;
• Output: similarity matrix in CSV, n-most similar entities in JSON.
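The two outputs in the architecture, a similarity matrix in CSV and the n-most similar entities in JSON, can be derived from the same data. A minimal sketch, assuming the matrix is a dict of dicts where `sim[a][b]` stands for Sim(a, b); the CSV layout and the JSON field names (`entity`, `similar`, `uri`, `sim`) are hypothetical, not SSONDE's actual format, and the values are invented.

```python
import csv
import io
import json


def matrix_to_csv(entities, sim):
    """Row a, column b holds Sim(a, b); the matrix need not be symmetric."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([""] + list(entities))
    for a in entities:
        writer.writerow([a] + [f"{sim[a][b]:.3f}" for b in entities])
    return buf.getvalue()


def top_n_json(entities, sim, n):
    """For each entity a, the n other entities b with the highest Sim(a, b)."""
    result = []
    for a in entities:
        others = sorted((b for b in entities if b != a),
                        key=lambda b: sim[a][b], reverse=True)
        result.append({"entity": a,
                       "similar": [{"uri": b, "sim": sim[a][b]} for b in others[:n]]})
    return json.dumps(result, indent=2)


# Invented values, for illustration only.
entities = ["A", "B", "C"]
sim = {"A": {"A": 1.0, "B": 1.0, "C": 0.0},
       "B": {"A": 0.667, "B": 1.0, "C": 0.333},
       "C": {"A": 0.0, "B": 0.5, "C": 1.0}}

if __name__ == "__main__":
    print(matrix_to_csv(entities, sim))
    print(top_n_json(entities, sim, 1))
```

Ranking reads a row of the matrix, Sim(a, ·), so with an asymmetric measure "the entities most similar to a" and "the entities a is most similar to" are different lists.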

Page 14: Presentation at MTSR 2012

SSONDE: a building block for new analysis services

SSONDE applied on "real" Linked Data:
• Analysing habitats and species
  • data published in NatureSDIplus (ECP-2007-GEO-317007), a European project developing a Spatial Data Infrastructure for Nature Conservation;
  • to rank habitats according to the species they host, providing an insight into the inter-dependencies between habitats and species.
• Analysing overlaps among scientific interests
  • a subset of the Linked dataset provided by data.cnr.it as part of the SemanticScout framework by third parties (Gangemi et al.);
  • to compare IMATI-CNR researchers according to their research interests.

Page 15: Presentation at MTSR 2012

Applying SSONDE on data.cnr.it

1. Identify crawling seeds: URIs of the entities to be involved in the analysis.
2. Identify the RDF properties to be used in the instance comparison.
3. Run LDSpider, constraining the crawling to the selected properties.
4. Configure SSONDE: JSON configuration file and context definition.
5. Run SSONDE.
6. Analyse the results.
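Running LDSpider while "constraining the crawling to the selected properties" can be illustrated offline with a breadth-first traversal that, starting from the seed URIs, follows only the chosen predicates up to a maximum depth. This is a hypothetical sketch over an in-memory list of triples, not LDSpider's API (which dereferences URIs over HTTP); `constrained_crawl`, the `ex:` URIs and the `foaf:name` triple are made up for illustration.

```python
from collections import deque


def constrained_crawl(triples, seeds, allowed_predicates, max_depth=2):
    """Collect the triples reachable from the seeds through allowed predicates only.

    triples: iterable of (subject, predicate, object) strings.
    Returns the set of kept triples.
    """
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))

    kept = set()
    visited = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand nodes beyond the depth limit
        for p, o in by_subject.get(node, []):
            if p not in allowed_predicates:
                continue  # property not selected for the comparison: skip it
            kept.add((node, p, o))
            if o not in visited:
                visited.add(o)
                queue.append((o, depth + 1))
    return kept


# Hypothetical triples, for illustration only.
triples = [
    ("ex:res1", "dc:subject", "ex:topicA"),
    ("ex:topicA", "skos:broader", "ex:topicB"),
    ("ex:res1", "foaf:name", "Someone"),
    ("ex:res1", "pub:autoreCNRdi", "ex:pub1"),
]
allowed = {"dc:subject", "skos:broader", "pub:autoreCNRdi"}

if __name__ == "__main__":
    for t in sorted(constrained_crawl(triples, ["ex:res1"], allowed)):
        print(t)
```

Restricting predicates at crawl time keeps the local store small and aligned with the context that will later be defined over the same properties.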

Page 16: Presentation at MTSR 2012


Applying SSONDE on data.cnr.it

http://code.google.com/p/ssonde/wiki/RDF_statements_download

Page 17: Presentation at MTSR 2012

Configuration file 1

{
  "StoreConfiguration": {
    "KindOfStore": "JENATDB",
    "RDFDocumentURIs": [],
    "TDBDirectory": "data/CNRIT/TDB-0.8.9/CNRR/"
  },
  "InstanceConfiguration": {
    "InstanceURIsClass": "application.dataCNRIt.GetResearcherIMATIplusCoauthor"
  },
  "OutputConfiguration": {
    "KindOfOutput": "JSONOrderedResult",
    "NumberOfOrderedResult": "20",
    "FilePath": "conf/dataCNRIt/ComplexContextResearchInterest/CRRIIntPub.res.json"
  },
  "ContextConfiguration": {
    "ContextFilePath": "conf/dataCNRIt/ComplexContextResearchInterest/CCRIIntPub.ctx"
  }
}

• InstanceConfiguration: list of LOD entity URIs, provided by a Java class implementing ListOfInputInstances;
• OutputConfiguration: similarity matrix in CSV, or JSON encoding of the top n-most similar entities;
• ContextConfiguration: context encoded in an in-house text format (hopefully soon in JSON).
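A client reading this configuration needs little more than a JSON parser. The sketch below is a hypothetical Python loader, not SSONDE's Java code; `load_ssonde_config` and `REQUIRED_SECTIONS` are made-up names. It checks that the four sections of the example are present and converts `NumberOfOrderedResult`, which the example stores as a string, into an integer.

```python
import json

REQUIRED_SECTIONS = ("StoreConfiguration", "InstanceConfiguration",
                     "OutputConfiguration", "ContextConfiguration")


def load_ssonde_config(text):
    """Parse a configuration like the example and normalise a few fields (hypothetical helper)."""
    conf = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    missing = [s for s in REQUIRED_SECTIONS if s not in conf]
    if missing:
        raise ValueError(f"missing configuration sections: {missing}")
    output = conf["OutputConfiguration"]
    # The example stores the number of ordered results as a string ("20").
    output["NumberOfOrderedResult"] = int(output.get("NumberOfOrderedResult", 0))
    return conf


example = """{
  "StoreConfiguration": {"KindOfStore": "JENATDB", "RDFDocumentURIs": [],
                         "TDBDirectory": "data/CNRIT/TDB-0.8.9/CNRR/"},
  "InstanceConfiguration": {"InstanceURIsClass": "application.dataCNRIt.GetResearcherIMATIplusCoauthor"},
  "OutputConfiguration": {"KindOfOutput": "JSONOrderedResult", "NumberOfOrderedResult": "20",
                          "FilePath": "conf/dataCNRIt/ComplexContextResearchInterest/CRRIIntPub.res.json"},
  "ContextConfiguration": {"ContextFilePath": "conf/dataCNRIt/ComplexContextResearchInterest/CCRIIntPub.ctx"}
}"""

if __name__ == "__main__":
    conf = load_ssonde_config(example)
    print(conf["OutputConfiguration"]["NumberOfOrderedResult"] + 1)  # 21
```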

Page 18: Presentation at MTSR 2012

Data.cnr.it: defining a context

[Figure: excerpt of the crawled graph, with resources crawled from data.cnr.it and from DBpedia: researchers (Res 225, Res 226), publications (pub: 22, 26, 29) and topics (Topic: 2, 23, 25, 26, 27), linked by pub:autoreCNRdi (publications), dc:subject (interests) and skos:broader (interest hierarchy).]

PREFIX dc: <http://purl.org/dc/terms/>
PREFIX pub: <http://www.cnr.it/ontology/cnr/pubblicazioni.owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

[owl:Thing] -> {{}, {(pub:autoreCNRDi, Inter), (dc:subject, Simil)}}
[owl:Thing, dc:subject] -> {{}, {(skos:broader, Inter)}}

No data properties are considered in this context.
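To see how the two context entries interact, here is a toy recursive similarity: at path [owl:Thing] researchers are compared by the share of common publications (Inter on pub:autoreCNRDi) and by the similarity of their subjects (Simil on dc:subject); topics, at path [owl:Thing, dc:subject], are compared by shared broader topics (Inter on skos:broader). This is a simplified sketch, assuming an equal-weight average of the property scores and a best-match aggregation for Simil; SSONDE's actual formulas (see the Journal on Data Semantics paper) differ, and all function names and data below are hypothetical.

```python
def share(a, b):
    """Inter: fraction of A's values that B also has (0 when A has none)."""
    return len(a & b) / len(a) if a else 0.0


def topic_sim(t1, t2, broader):
    """Path [owl:Thing, dc:subject]: compare two topics by shared skos:broader values."""
    if t1 == t2:
        return 1.0
    return share(broader.get(t1, set()), broader.get(t2, set()))


def researcher_sim(r1, r2, authored, subjects, broader):
    """Path [owl:Thing]: Inter on publications, Simil on dc:subject, equal weights (assumed)."""
    pub_score = share(authored.get(r1, set()), authored.get(r2, set()))
    s1, s2 = subjects.get(r1, set()), subjects.get(r2, set())
    if s1 and s2:
        # Simil: recurse one level down, matching each topic of r1 with its best topic of r2.
        subj_score = sum(max(topic_sim(t, u, broader) for u in s2) for t in s1) / len(s1)
    else:
        subj_score = 0.0
    return (pub_score + subj_score) / 2


# Invented toy data (not the data.cnr.it graph).
authored = {"r1": {"p1", "p2"}, "r2": {"p1", "p2", "p3"}}
subjects = {"r1": {"t1"}, "r2": {"t2"}}
broader = {"t1": {"t3"}, "t2": {"t3", "t4"}}

if __name__ == "__main__":
    print(researcher_sim("r1", "r2", authored, subjects, broader))            # 1.0
    print(round(researcher_sim("r2", "r1", authored, subjects, broader), 3))  # 0.583
```

Containment survives the recursion: r1's publications and topic ancestry are covered by r2's, but not the other way round.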

Page 19: Presentation at MTSR 2012

Similarity matrix

[Figure: similarity matrix computed on data.cnr.it.]

• The data is more recent but less accurate,
• but more researchers are represented,
• and containment is still highlighted.

Page 20: Presentation at MTSR 2012

Hierarchical clustering: scientific clusters are discovered

[Figure: hierarchical clustering of researchers, with clusters labelled Computer graphics, Grid computing, Knowledge management, E-Learning, and Visiting researchers / Technicians / Associates.]

Hierarchical clustering performed with Hierarchical Clustering Explorer 3.0, Human-Computer Interaction Lab, University of Maryland. http://www.cs.umd.edu/hcil/multi-cluster/
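Standard agglomerative clustering works on a symmetric dissimilarity, whereas Sim(A,B) and Sim(B,A) may differ. The slide does not say how the asymmetric matrix was fed to Hierarchical Clustering Explorer; one simple option, assumed here, is to symmetrise with the mean of the two directions and cluster on 1 minus that mean. The sketch is a small average-linkage agglomerative clustering in plain Python; `cluster`, `symmetric_distance` and the toy matrix are hypothetical.

```python
from itertools import combinations


def symmetric_distance(sim, i, j):
    """Distance 1 - (Sim(i,j) + Sim(j,i)) / 2, symmetric by construction."""
    return 1.0 - (sim[i][j] + sim[j][i]) / 2.0


def cluster(sim, k):
    """Average-linkage agglomerative clustering of range(len(sim)) into k clusters."""
    clusters = [[i] for i in range(len(sim))]

    def linkage(c1, c2):
        # Mean of the pairwise symmetric distances between the two clusters.
        total = sum(symmetric_distance(sim, i, j) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the closest pair of clusters.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return sorted(sorted(c) for c in clusters)


# Invented asymmetric similarity matrix: entities 0-1 and 2-3 form two groups.
sim = [
    [1.0, 0.9, 0.1, 0.0],
    [0.7, 1.0, 0.0, 0.2],
    [0.1, 0.0, 1.0, 0.8],
    [0.0, 0.1, 0.6, 1.0],
]

if __name__ == "__main__":
    print(cluster(sim, 2))  # [[0, 1], [2, 3]]
```

Averaging discards the containment information that the asymmetric matrix carries; it is a pragmatic choice for clustering, not a property of SSONDE.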

Page 21: Presentation at MTSR 2012

What next?

(i) Semantic similarity optimization:
  • caching of intermediate similarity results;
  • adoption of the MapReduce paradigm to speed up the assessment of semantic similarity.
(ii) Domain-driven extensions at the data layer:
  • defining new data-layer measures suited for geo-referenced entities;
  • multilingual similarity.
(iii) Definition of interfaces sifting entities according to their similarity, exploiting visualization frameworks such as Exhibit, Google Visualization and the JavaScript InfoVis Toolkit.
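Caching intermediate similarity results pays off because, in a recursive similarity, the same pair of related entities (for example two topics) is compared again for every pair of researchers that refers to them. A minimal illustration with Python's `functools.lru_cache`, assuming the inner comparison depends only on the two entity identifiers; the topic data, the `calls` counter and the function names are hypothetical.

```python
from functools import lru_cache

# Hypothetical topic hierarchy (skos:broader-like), immutable so results can be cached.
BROADER = {"t1": frozenset({"t3"}), "t2": frozenset({"t3", "t4"}), "t5": frozenset({"t4"})}
calls = {"count": 0}


@lru_cache(maxsize=None)
def topic_sim(t1, t2):
    """Intermediate result: share of t1's broader topics that t2 also has."""
    calls["count"] += 1  # counts real computations, i.e. cache misses
    a, b = BROADER.get(t1, frozenset()), BROADER.get(t2, frozenset())
    return len(a & b) / len(a) if a else 0.0


def subject_sim(s1, s2):
    """Best-match aggregation over two researchers' topic lists."""
    return sum(max(topic_sim(t, u) for u in s2) for t in s1) / len(s1)


researchers = {"r1": ["t1", "t2"], "r2": ["t2", "t5"], "r3": ["t1", "t5"]}
for x in researchers:
    for y in researchers:
        subject_sim(researchers[x], researchers[y])

if __name__ == "__main__":
    # 36 topic comparisons are requested, but only the 9 distinct pairs are computed.
    print(calls["count"], topic_sim.cache_info().hits)
```

The cache is only valid while the underlying data does not change, which fits the crawling pattern where SSONDE works on a local snapshot.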

Page 22: Presentation at MTSR 2012

Thanks for your kind attention!

Questions / Discussion / Suggestions

SSONDE framework:
• pushes our instance similarity as a ready-to-go tool for the analysis of Linked Data;
• its Java code is available on Google Code: http://purl.oclc.org/NET/SSONDE;
• licensed as open source code (GNU GPL v3).

Do not hesitate to contact us if:
• SSONDE can be deployed in some of your future projects (proposals);
• you are interested in contributing to the SSONDE open framework.

Page 23: Presentation at MTSR 2012

A complete list of references on SSONDE and its instance similarity

SSONDE framework
• R. Albertoni, M. De Martino. SSONDE: Semantic Similarity On liNked Data Entities. 6th Metadata and Semantics Research Conference (MTSR 2012), 28-30 November 2012, Cádiz, Spain [to appear].
• Framework installation & use: http://code.google.com/p/ssonde/wiki/GettingStarted

Semantic similarity theoretical framework
• R. Albertoni, M. De Martino. Asymmetric and context dependent semantic similarity among ontology instances. Journal on Data Semantics, LNCS, 2008.
• R. Albertoni, M. De Martino. Semantic similarity of ontology instances tailored on the application context. On the Move to Meaningful Internet Systems 2006: CoopIS, DOA, GADA, and ODBASE, LNCS vol. 4275, pp. 1020–1038. Springer, 2006.

Issues adapting the theoretical framework to Linked Data
• R. Albertoni, M. De Martino. Semantic Similarity and Selection of Resources Published According to Linked Data Best Practice. OnToContent 2010, part of OTM 2010 (OTM'10).

Further applications
• Comparing EUNIS habitats with respect to their species: R. Albertoni, M. De Martino. Semantic Technology to Exploit Digital Content Exposed as Linked Data. eChallenges e-2011, 26-28 October 2011, Florence, Italy.
• Comparing shapes metadata (not Linked Data): R. Albertoni, M. De Martino. Using Context Dependent Semantic Similarity to Browse Information Resources: an Application for the Industrial Design. First Workshop on Multimedia Annotation and Retrieval enabled by Shared Ontologies, Genoa, Italy, 2007.