Interlinking educational data to Web of Data (thesis presentation)

International doctorate thesis: Interlinking Educational Data to Web of Data. Presented by: Enayat Rajabi. Supervisors: Salvador Sanchez-Alonso, Miguel-Ángel Sicilia. May 2015.

Upload: enayat-rajabi

Post on 28-Jul-2015





Page 1: Interlinking educational data to Web of Data (Thesis presentation)

International doctorate thesis

Interlinking Educational Data to Web of Data

Presented by: Enayat Rajabi
Supervisors: Salvador Sanchez-Alonso, Miguel-Ángel Sicilia

May 2015

Page 2: Interlinking educational data to Web of Data (Thesis presentation)


1. Research context
2. Motivation
3. State of the art
4. General objective & approach
5. Specific objectives
6. Studies & experimentations
7. Conclusion & future work

Page 3: Interlinking educational data to Web of Data (Thesis presentation)

Research context

Linked Data
• An approach for exposing structured data (triples) on the Web
• Currently, the LOD cloud includes ~10,000 datasets (88 billion triples) in different domains
• Datasets include metadata about objects
• Interlinking tools help publishers to link datasets

May 2007: 12 datasets!

Page 4: Interlinking educational data to Web of Data (Thesis presentation)

Research context

eLearning
• eLearning repositories (including educational data)
• eLearning metadata schemas (Dublin Core, IEEE LOM, …)
• Analysis of the largest educational repository, with around one million metadata records (GLOBE)

Page 5: Interlinking educational data to Web of Data (Thesis presentation)


Motivation

• An increasing number of educational resources are published on the Web.
• Some of these resources are implicitly or semantically related to each other.
• The Linked Data approach allows resources to be reusable and accessible for learners.
• A number of tools exist for exposing data and for semi-automatic linking between datasets.

Page 6: Interlinking educational data to Web of Data (Thesis presentation)

State of the art (background)

Practical steps for exposing data as Linked Data

1. Selecting a proper schema or ontology
2. Converting data into a structured format
3. Mapping data according to the ontology
4. Importing the RDF dump to a triple store

Questions and sub-tasks along the way:
• Is there any mapping tool to convert data?
• Is the (meta)data structure flat or hierarchical?
• Creating a dump file
• Selecting a proper triple store
• Setting up a SPARQL endpoint

Page 7: Interlinking educational data to Web of Data (Thesis presentation)

Linked Data exposure infrastructure

Page 8: Interlinking educational data to Web of Data (Thesis presentation)

State of the art

Interlinking educational data
• Studying the importance of interlinking in an educational context (Stefan Dietz, 2012)

Exposing IEEE LOM as RDF
• RDF binding of some IEEE LOM elements (Nilson & Palmér, 2002)

Interlinking tools
• Theoretical comparison of interlinking tools (Wolger et al., 2011)

Page 9: Interlinking educational data to Web of Data (Thesis presentation)

General objective

Investigating an interlinking approach in an educational context

Page 10: Interlinking educational data to Web of Data (Thesis presentation)

General approach

Page 11: Interlinking educational data to Web of Data (Thesis presentation)

Specific objectives

1. Analyzing an eLearning metadata schema for exposing it as Linked Open Data
2. Examining the datasets in the Linked Open Data cloud
3. Investigating existing interlinking tools in an educational context
4. Assessing the interlinking results and their advantages

Page 12: Interlinking educational data to Web of Data (Thesis presentation)

Objective 1: Analyzing a metadata schema for exposing it as Linked Open Data

Page 13: Interlinking educational data to Web of Data (Thesis presentation)

Exposing a flat schema

DCTerms:title

Mapping an RDB to Dublin Core
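Mapping a relational table to Dublin Core can be sketched as follows. This is a minimal illustration, not the tooling used in the thesis: the table layout, the `BASE_URI` used to mint subject URIs, and the column-to-property mapping are all assumptions made for the example. Only the DCTerms namespace is a real, published vocabulary.

```python
import sqlite3

DCTERMS = "http://purl.org/dc/terms/"
# Hypothetical base URI for minting resource identifiers (not from the slides).
BASE_URI = "http://example.org/resource/"

# Hypothetical mapping from flat table columns to Dublin Core properties.
COLUMN_TO_PROPERTY = {
    "title": DCTERMS + "title",
    "description": DCTERMS + "description",
    "language": DCTERMS + "language",
}


def escape_literal(value: str) -> str:
    """Escape a string so it can be used as an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(connection: sqlite3.Connection, table: str) -> list[str]:
    """Emit one triple per non-empty mapped column of every row."""
    connection.row_factory = sqlite3.Row
    triples = []
    for row in connection.execute(f"SELECT * FROM {table}"):
        subject = f"<{BASE_URI}{row['id']}>"
        for column, prop in COLUMN_TO_PROPERTY.items():
            value = row[column]
            if value:  # skip NULL and empty strings
                literal = escape_literal(str(value))
                triples.append(f'{subject} <{prop}> "{literal}" .')
    return triples


def build_demo_database() -> sqlite3.Connection:
    """Create an in-memory table standing in for a repository database."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE resources (id INTEGER PRIMARY KEY, title TEXT, "
        "description TEXT, language TEXT)"
    )
    connection.execute(
        "INSERT INTO resources VALUES (1, 'Nuclear Energy', NULL, 'en')"
    )
    connection.commit()
    return connection


if __name__ == "__main__":
    for line in rows_to_ntriples(build_demo_database(), "resources"):
        print(line)
```

Because the schema is flat, each column maps to exactly one property and no intermediate nodes are needed. The resulting lines form an N-Triples dump that could then be loaded into a triple store.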

Page 14: Interlinking educational data to Web of Data (Thesis presentation)

Exposing a complex schema (IEEE LOM)

Page 15: Interlinking educational data to Web of Data (Thesis presentation)

IEEE LOM ontology

The IEEE LOM schema has a hierarchical structure and supports different kinds of data types, so we had to:
• Map the data types
• Specify a correct element for the identifier (URI)
• Choose a strategy for exposing aggregated elements (e.g., keyword)
• Reuse existing vocabularies
• Test the ontology in an implementation
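One possible strategy for the hierarchical and aggregated elements is sketched below. It is an assumption for illustration, not the ontology designed in the thesis: the `LOM` namespace is a placeholder, the nested-dictionary record layout is invented, and emitting one triple per keyword value is only one of several reasonable choices.

```python
DCTERMS = "http://purl.org/dc/terms/"
# Placeholder namespace; the real ontology URI is not given in the slides.
LOM = "http://example.org/lom#"

# Invented record layout mimicking the nesting of the LOM "general" category.
EXAMPLE_RECORD = {
    "general": {
        "identifier": "http://example.org/resource/1",
        "title": {"en": "Nuclear Energy"},
        "keyword": [{"en": "energy"}, {"en": "physics"}],
    }
}


def lom_to_triples(record: dict) -> list[tuple[str, str, str]]:
    """Flatten part of a nested LOM-like record into (s, p, o) triples."""
    general = record["general"]
    # The identifier element is used as the subject URI.
    subject = general["identifier"]
    triples = []
    # Reuse an existing vocabulary (DCTerms) for the title.
    for lang, text in general.get("title", {}).items():
        triples.append((subject, DCTERMS + "title", f'"{text}"@{lang}'))
    # Aggregated element: every keyword value becomes its own triple.
    for keyword in general.get("keyword", []):
        for lang, text in keyword.items():
            triples.append((subject, LOM + "keyword", f'"{text}"@{lang}'))
    return triples


if __name__ == "__main__":
    for triple in lom_to_triples(EXAMPLE_RECORD):
        print(triple)
```

Keeping each keyword as a separate, language-tagged literal means that a later interlinking step can compare keywords one by one instead of parsing a concatenated string.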

Page 16: Interlinking educational data to Web of Data (Thesis presentation)

A case study based on the ontology

Page 17: Interlinking educational data to Web of Data (Thesis presentation)

Remarks of this investigation

Analyzing the IEEE LOM schema for the sake of:
• exposing its elements as Linked Open Data
• creating a complete ontology
• identifying the appropriate elements for interlinking

The exposing approach was applied to other schemas as well.

E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Interlinking Educational Resources to Web of Data through IEEE LOM,” Computer Science and Information Systems, vol. 12, no. 1, pp. 233–255, 2015.

Page 18: Interlinking educational data to Web of Data (Thesis presentation)

Objective 2: Examining the datasets in the Linked Open Data cloud

Page 19: Interlinking educational data to Web of Data (Thesis presentation)

The LOD datasets analysis

We analyzed the Linked Open Data cloud to determine:

1. Which datasets in the cloud are the most important to link to in an educational domain?
• Examining the LOD cloud using Social Network Analysis (SNA)

2. Which educational datasets are appropriate for interlinking?
• Selecting a set of educational datasets in datahub using some metrics

Page 20: Interlinking educational data to Web of Data (Thesis presentation)

The LOD datasets analysis

We considered the LOD cloud as a directed graph and analyzed its datasets according to the following SNA metrics:
• Betweenness Centrality (BC): if a dataset has a high BC value, then many datasets are connected through it to others.
• In-Degree: the number of datasets that point to the current dataset.
• Out-Degree: the number of datasets that the current dataset points to.

Dataset          In-Degree  Out-Degree  Betweenness Centrality
DBpedia          181        30          82,664
Geonames         55         0           10,958
DrugBank         8          12          7,446
Bio2rdf-goa      11         8           3,751
Ordnance-survey  16         0           3,272
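The three metrics can be computed on any directed edge list. The sketch below uses a tiny invented graph (the node names and links are placeholders, not the real LOD cloud) and Brandes' algorithm for unnormalized betweenness. The slides do not say which software or normalization produced the figures in the table, so the numbers here are not comparable to it.

```python
from collections import deque

# Invented toy links: two repositories point to a hub, which points onward.
EDGES = [("repo_a", "hub"), ("repo_b", "hub"), ("hub", "geo")]


def degrees(edges):
    """Return (in_degree, out_degree) dictionaries for a directed edge list."""
    nodes = {node for edge in edges for node in edge}
    in_degree = {node: 0 for node in nodes}
    out_degree = {node: 0 for node in nodes}
    for source, target in set(edges):  # repeated links count once
        out_degree[source] += 1
        in_degree[target] += 1
    return in_degree, out_degree


def betweenness(edges):
    """Unnormalized betweenness centrality (Brandes, directed, unweighted)."""
    nodes = {node for edge in edges for node in edge}
    successors = {node: set() for node in nodes}
    for source, target in edges:
        successors[source].add(target)

    centrality = {node: 0.0 for node in nodes}
    for start in nodes:
        # Breadth-first search that counts shortest paths from `start`.
        order = []
        predecessors = {node: [] for node in nodes}
        path_count = {node: 0 for node in nodes}
        distance = {node: -1 for node in nodes}
        path_count[start] = 1
        distance[start] = 0
        queue = deque([start])
        while queue:
            current = queue.popleft()
            order.append(current)
            for neighbour in successors[current]:
                if distance[neighbour] < 0:
                    distance[neighbour] = distance[current] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[current] + 1:
                    path_count[neighbour] += path_count[current]
                    predecessors[neighbour].append(current)
        # Accumulate dependencies from the farthest nodes back to `start`.
        dependency = {node: 0.0 for node in nodes}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1.0 + dependency[node])
            if node != start:
                centrality[node] += dependency[node]
    return centrality


if __name__ == "__main__":
    in_degree, out_degree = degrees(EDGES)
    print(in_degree, out_degree, betweenness(EDGES))
```

In the toy graph, `hub` lies on both shortest paths that lead from a repository to `geo`, which is the situation the BC definition above describes: other datasets are connected through it.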

Page 21: Interlinking educational data to Web of Data (Thesis presentation)

The LOD datasets analysis

High BC

Page 22: Interlinking educational data to Web of Data (Thesis presentation)

Selecting educational datasets

Exploring the LOD cloud to find educational datasets using the following steps:
• Finding the datasets in datahub tagged with educational subjects
• Checking the availability of their SPARQL endpoints or RDF dumps
• Retrieving their specification (size, metadata schema, language, …) from an interlinking point of view using SPARQL
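The availability check and the specification queries can be sketched with the Python standard library. The queries below are generic examples chosen for illustration (the slides do not show the queries actually used), and `http://example.org/sparql` is a placeholder endpoint. `run_query` performs a real HTTP request, so it is defined but not called here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Cheap availability probe: does the endpoint answer at all?
PROBE = "ASK { ?s ?p ?o }"
# Dataset size expressed as the number of triples.
COUNT_TRIPLES = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"
# Predicates in use, which hint at the metadata schema.
LIST_PREDICATES = "SELECT DISTINCT ?p WHERE { ?s ?p ?o } LIMIT 100"


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET request URL carrying `query`."""
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode({"query": query})


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> dict:
    """Send the query and decode a JSON result (needs network access)."""
    request = Request(
        sparql_get_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder endpoint; replace it with a dataset's real SPARQL endpoint.
    print(sparql_get_url("http://example.org/sparql", COUNT_TRIPLES))
```

A full `COUNT(*)` over a large dataset can time out on public endpoints, so a probe with `ASK` first is a cheap way to separate unavailable endpoints from slow ones.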

Page 23: Interlinking educational data to Web of Data (Thesis presentation)

Datahub endpoint

Exploring the datahub endpoint to find educational datasets

Page 24: Interlinking educational data to Web of Data (Thesis presentation)

Educational datasets bubble graph

Selecting 20 available educational datasets

Page 25: Interlinking educational data to Web of Data (Thesis presentation)

Getting datasets specification using SPARQL

Dataset                                     Size (triples)  SPARQL endpoint
Charles University in Prague                93,233,661      …
…                                           8,026,637       …
Achievement Standards Network (ASN)         7,494,201       …
…                                           6,619,847       …
… of Southampton                            5,726,668       -
… academic video search                     4,932,352       …
… of Muenster (LODUM)                       4,179,372       …
Open University in UK                       3,588,626       …
… of Huddersfield                           3,553,343       …
ISVU (Kent)                                 2,421,268       …
University of Bristol                       1,885,124       …
Aalto University                            1,589,122       http://data.aalto.fi/sparql
… Courseware Consortium metadata            636,453         …
… (University of Oxford)                    318,392         …
Thesaurus for the Social Sciences (GESIS)   305,329         …
…                                           62,375          …
Open Data @ Tor Vergata                     54,968          …
Vytautas Magnus University, Kaunas          39,279          …
…                                           3,906           …
… project                                   132             …

Page 26: Interlinking educational data to Web of Data (Thesis presentation)

Getting entities from the datasets

Open University of UK endpoint

Page 27: Interlinking educational data to Web of Data (Thesis presentation)

Remarks of this investigation

• Selecting the DBpedia dataset as the LOD hub for interlinking educational datasets
• Identifying a set of well-formed educational datasets for interlinking

E. Rajabi, S. Sanchez-Alonso, and M.-A. Sicilia, “Analyzing Broken Links on the Web of Data: an Experiment with DBpedia,” Journal of the Association for Information Science and Technology (JASIST), vol. 65, no. 8, pp. 1721–1727, 2014.

E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Discovering Duplicate and Related Resources using Interlinking Approach: The case of Educational Datasets,” Journal of Information Science, first published on March 10, 2015.

Page 28: Interlinking educational data to Web of Data (Thesis presentation)

Objective 3: Investigating existing interlinking tools in an educational context

Page 29: Interlinking educational data to Web of Data (Thesis presentation)

Interlinking tools (comparison)

Tool        Domain      SPARQL/RDF  …          …    …
GWAP        Multimedia  No          Manual     No   Unknown
LIMES       LOD         Yes         Automatic  Yes  Yes
LOD Refine  General     Yes         Automatic  Yes  Partially
RDF-IA      LOD         RDF Dump    Automatic  No   Unknown
SAI         Multimedia  No          Automatic  No   Unknown
Silk        LOD         Yes         Automatic  Yes  Yes
UCI         LOD         Yes         Manual     No   Unknown

Page 30: Interlinking educational data to Web of Data (Thesis presentation)

Interlinking tools (general idea)

Source
• Source data type: RDF dump
• Source entity: dcterms:title
• Filtering: English titles

Target
• Target data type: SPARQL endpoint
• Target entity: dcterms:title
• Filtering: English titles

Setting
• Matching algorithm: Trigrams
• Threshold of acceptance: 95%
• Output file format: N-Triples
• ...
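The configuration above (trigram matching on titles with a 95% acceptance threshold and N-Triples output) can be illustrated with a small sketch. The similarity used here is the Dice coefficient over character trigram sets, which is an assumption: Silk, LIMES and LOD Refine each define their own trigram measure, and the `owl:sameAs` link predicate is likewise chosen only for the example.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def trigrams(text: str) -> set[str]:
    """Character trigrams of a lower-cased string padded with spaces."""
    cleaned = text.lower().strip()
    if not cleaned:
        return set()
    padded = f"  {cleaned} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(left: str, right: str) -> float:
    """Dice coefficient of the two trigram sets, between 0.0 and 1.0."""
    a, b = trigrams(left), trigrams(right)
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def match_titles(source_titles, target_titles, threshold=0.95):
    """Compare every source title with every target title.

    Both arguments map a resource URI to its title. Pairs scoring at or
    above the threshold are returned as (source_uri, target_uri, score).
    """
    links = []
    for source_uri, source_title in source_titles.items():
        for target_uri, target_title in target_titles.items():
            score = trigram_similarity(source_title, target_title)
            if score >= threshold:
                links.append((source_uri, target_uri, score))
    return links


def to_ntriples(links):
    """Serialize accepted links as N-Triples lines."""
    return [f"<{s}> <{OWL_SAME_AS}> <{t}> ." for s, t, _ in links]


if __name__ == "__main__":
    # Invented URIs and titles, for illustration only.
    source = {"http://example.org/source/1": "Nuclear Energy"}
    target = {
        "http://example.org/target/a": "Nuclear energy",
        "http://example.org/target/b": "Solar power",
    }
    for line in to_ntriples(match_titles(source, target)):
        print(line)
```

The nested loop compares every pair, which is fine for a demonstration but quadratic in the number of titles. Dedicated tools reduce the number of comparisons with blocking or indexing.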

Page 31: Interlinking educational data to Web of Data (Thesis presentation)

Interlinking process

Page 32: Interlinking educational data to Web of Data (Thesis presentation)

The interlinking tools (SILK)

Page 33: Interlinking educational data to Web of Data (Thesis presentation)

The interlinking tools (LIMES)

Source & Target datasets

Page 34: Interlinking educational data to Web of Data (Thesis presentation)

The interlinking tools (LOD Refine)

Page 35: Interlinking educational data to Web of Data (Thesis presentation)

Sample interlinking results (exact matches)

Columns: Title in both datasets, Globe resource, Target URI, Dataset name

Recoverable cells: Nuclear Energy; http://www.globe-i…; 108450; …hub/publications/118933#pub; Bristol; Huddersfield; Bibliography; OpenUK; http://data.uni-muenster.de/context/istg/allegro/6/210/T0024

Page 36: Interlinking educational data to Web of Data (Thesis presentation)

Evaluating the interlinking tools

We used three tools to interlink GLOBE to DBpedia

GLOBE and DBpedia on title

Page 37: Interlinking educational data to Web of Data (Thesis presentation)

Evaluating the interlinking tools

Does the result change if we use more than one tool?

Common results among the tools

Page 38: Interlinking educational data to Web of Data (Thesis presentation)

Remarks of this investigation

• Applying the interlinking tools for linking datasets is a reliable method.
• Silk and LIMES were the efficient tools for similarity discovery among the LOD datasets.

E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “An empirical study on the evaluation of interlinking tools on the Web of Data,” Journal of Information Science, vol. 40, pp. 637–648, 2014, first published on June 11, 2014.

Page 39: Interlinking educational data to Web of Data (Thesis presentation)

Objective 4: Assessing the interlinking results and their advantages

Page 40: Interlinking educational data to Web of Data (Thesis presentation)

Evaluating the interlinking results

• Interlinking tools perform an interlinking process and print out the matched resources.
• The question under discussion is: to what extent are the results reliable?
• An important step after interlinking is the evaluation of the results by human and domain experts.

Page 41: Interlinking educational data to Web of Data (Thesis presentation)

The interlinking approach

Page 42: Interlinking educational data to Web of Data (Thesis presentation)

GLOBE metadata analysis

Creating criteria under which we can find appropriate elements for interlinking (datatype, completeness, content)

Page 43: Interlinking educational data to Web of Data (Thesis presentation)

GLOBE metadata analysis

Page 44: Interlinking educational data to Web of Data (Thesis presentation)

Interlinking results

            Title  Keyword  Taxon    Coverage
GLOBE       8,260  228,352  134,791  12,941
Percentage  2%     74%      76%      78%

Interlinking through the Keyword element

Page 45: Interlinking educational data to Web of Data (Thesis presentation)

Evaluating the interlinking results

We evaluated the interlinking results from the following perspectives:
• Reliability
  • Level of agreement between the raters
• Relationship among results (e.g., threshold 75%)
  • Is parent of, Is related to, Is part of
• Enrichment of content
  • Linking one resource to many datasets on the Web
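The level of agreement between two raters is commonly summarized with Cohen's kappa. The slides do not name the statistic that was used, so the sketch below is an assumption that shows one standard option. The labels and judgments are invented.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who judged the same items in order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty item list")
    total = len(rater_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / total
    # Expected agreement if both raters labeled at random with their own
    # label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a)
    expected /= total ** 2
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Invented judgments on four interlinking results.
    first = ["correct", "correct", "wrong", "wrong"]
    second = ["correct", "correct", "wrong", "correct"]
    print(cohen_kappa(first, second))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when most links are judged correct and two raters would agree often even without looking closely.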

Page 46: Interlinking educational data to Web of Data (Thesis presentation)

Remarks of this investigation

• Human experts (the raters of the results) agreed that the interlinking results are reliable.
• Interlinking a learning repository to several educational datasets in the LOD cloud leads to the enrichment of content.
• Interlinking results can lead to the discovery of duplicate metadata.

E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Discovering Duplicate and Related Resources using Interlinking Approach: The case of Educational Datasets,” Journal of Information Science, first published on March 10, 2015.

E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Interlinking Educational Data: an Experiment with Engineering-related Resources in GLOBE,” International Journal of Engineering and Education, vol. 31-3, 2015.

Page 47: Interlinking educational data to Web of Data (Thesis presentation)


Conclusions

1. Exposing eLearning metadata as Linked Open Data
• A complete analysis was done on exposing the IEEE LOM schema as RDF.
• A new ontology was designed for the RDF binding of IEEE LOM.
• Keyword, Coverage, Classification, and Title were appropriate elements for interlinking.

Page 48: Interlinking educational data to Web of Data (Thesis presentation)

Conclusions (cont.)

2. Evaluating Linked Data tools & datasets
• Silk and LIMES were the efficient frameworks in terms of discovering similarities between two datasets.
• DBpedia was identified as the hub of the LOD cloud.
• Twenty educational datasets were identified as the most suitable targets for interlinking.
• The Open University of UK includes a rich metadata schema and a reliable endpoint.

Page 49: Interlinking educational data to Web of Data (Thesis presentation)

Conclusions (cont.)

3. Enriching the educational datasets
• Interlinking results were reviewed and verified by human experts.
• Several educational resources were linked to more than one resource in the LOD cloud.
• A duplicate identification approach was proposed after the analysis of the interlinking results.

Page 50: Interlinking educational data to Web of Data (Thesis presentation)

Additional contributions

• Implementing several platforms for exposing data as Linked Data
  • Organic.Edunet
  • ARIADNE
  • Open Discovery Space (http://data.opendiscoveryspace.eu)
  • Agrega
• Submitting the IEEE LOM ontology to Linked Open Vocabularies (LOV) at …
• Developing an online mashup for interlinking eLearning objects to the Web of Data (research stay at Agroknow)
• Writing a book chapter about “Optimizing Big Data using the Linked Data approach”

Page 51: Interlinking educational data to Web of Data (Thesis presentation)

Additional contributions (cont.)

Page 52: Interlinking educational data to Web of Data (Thesis presentation)

Future work

Content
• Applying the interlinking approach to other educational repositories

Tools and software
• Extending the tools to link one dataset to several datasets at the same time
• Adding semantic similarity services to the tools to improve the interlinking results
• Linking educational resources by crawling datasets

Page 53: Interlinking educational data to Web of Data (Thesis presentation)

Publications (journal papers)

• E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Interlinking Educational Resources to Web of Data through IEEE LOM,” Computer Science and Information Systems, vol. 12, no. 1, pp. 233–255, 2015.
• E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Discovering Duplicate and Related Resources using Interlinking Approach: The case of Educational Datasets,” Journal of Information Science, first published on March 10, 2015, doi:10.1177/0165551515575922.
• E. Rajabi, S. Sanchez-Alonso, and M.-A. Sicilia, “Analyzing Broken Links on the Web of Data: an Experiment with DBpedia,” Journal of the Association for Information Science and Technology (JASIST), vol. 65, no. 8, pp. 1721–1727, 2014, doi:10.1002/…
• E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “An empirical study on the evaluation of interlinking tools on the Web of Data,” Journal of Information Science, vol. 40, pp. 637–648, 2014, first published on June 11, 2014, doi:10.1177/0165551514538151.
• E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Interlinking Educational Data: an Experiment with Engineering-related Resources in GLOBE,” International Journal of Engineering and Education, 2015. In press.
• E. Rajabi, W. Greller, K. Niemann, K. Kastrantas, and S. Sanchez-Alonso, “Social data interoperability in educational repositories and federations,” International Journal of Metadata, Semantics and Ontologies, vol. 8, no. 2, pp. 169–178, 2013.
• E. Rajabi, S. Sanchez-Alonso, M.-A. Sicilia, and N. Mouneselis, “A linked and open dataset from a network of learning repositories on organic agriculture,” British Journal of Educational Technology, submitted (under second review).
• M.-C. Valiente, M.-A. Sicilia, E. Garcia-Barriocanal, and E. Rajabi, “Adopting the metadata approach to improve the search and analysis of educational resources for online learning,” Computers in Human Behavior, 2015. In press.

Page 54: Interlinking educational data to Web of Data (Thesis presentation)

Publications (conference papers)

• E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Interlinking Educational Data: an Experiment with GLOBE Resources,” presented at the First International Conference on Technological Ecosystem for Enhancing Multiculturality, Salamanca, Spain, 2013.
• E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Research Objects Interlinking: The Case of Dryad Repository,” presented at Metadata and Semantics Research, Karlsruhe, Germany, 2014.
• E. Rajabi and S. Sanchez-Alonso, “Enriching the e-learning contents using interlinking,” presented at the 5th eLearning conference, Belgrade, Serbia, 2014. Link: …
• E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “A Simple Approach towards SKOSification of Digital Repositories,” Metadata and Semantics Research, pp. 67–74, 2013.
• M.-A. Sicilia, S. Sanchez-Alonso, E. Garcia-Barriocanal, J. Minguillón, and E. Rajabi, “Exploring the keyword space in large learning resource aggregations: the case of GLOBE,” Lacro workshop, April 2013.