interlinking educational data to web of data (thesis presentation)
International doctorate thesis
Interlinking Educational Data to Web of Data
Presented by: Enayat Rajabi
Supervisors: Salvador Sanchez-Alonso, Miguel-Ángel Sicilia
May 2015
Agenda
1. Research context
2. Motivation
3. State of the art
4. General objective & approach
5. Specific objectives
6. Studies & experimentations
7. Conclusion & future work
Research context: Linked Data
• An approach for exposing structured data (triples) on the Web
• Currently, the LOD cloud includes ~10,000 datasets (88 billion triples) in different domains
• Datasets include metadata about objects
• Interlinking tools help publishers to link datasets
(Figure: the LOD cloud in May 2007, 12 datasets)
Research context: eLearning
• eLearning repositories (including educational data)
• eLearning metadata schemas (Dublin Core, IEEE LOM, …)
• Analysis of the largest educational repository, with around one million metadata records (GLOBE)
Motivation
• An increasing number of educational resources are published on the Web.
• Some of these resources are implicitly or semantically related to each other.
• The Linked Data approach allows resources to be reusable and accessible for learners.
• There exist a number of tools for exposing data and for semi-automatic linking between datasets.
State of the art (background)
Practical steps for exposing data as Linked Data:
• Selecting a proper schema or ontology
• Converting data into a structured format
• Mapping data according to the ontology
• Importing the RDF dump to a triple store
• Setting up a SPARQL endpoint
Questions and sub-tasks noted alongside the steps:
• Is the (meta)data structure flat or hierarchical?
• Is there any mapping tool to convert data?
• Creating a dump file
• Selecting a proper triple store
Linked Data exposure infrastructure
State of the art
• Interlinking educational data: studying the importance of interlinking in an educational context (Stefan Dietz, 2012)
• Exposing IEEE LOM as RDF: RDF binding of some IEEE LOM elements (Nilson & Palmér, 2002)
• Interlinking tools: theoretical comparison of interlinking tools (Wolger et al., 2011)
General objective
Investigating an interlinking approach in an educational context
General approach
Specific objectives
1. Analyzing an eLearning metadata schema for exposing it as Linked Open Data
2. Examining the datasets in the Linked Open Data cloud
3. Investigating existing interlinking tools in an educational context
4. Assessing the interlinking results and their advantages
Objective 1: Analyzing a metadata schema for exposing it as Linked Open Data
Exposing a flat schema
Mapping an RDB to Dublin Core:
• DCTerms:title
• DCTerms:date
• DCTerms:publisher
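A minimal sketch of what mapping a relational table to Dublin Core terms could look like, using only the Python standard library. The table name `resources`, its columns, the base URI `http://example.org/resource/`, and the demo row are assumptions made for illustration; the slides do not say which mapping tool or database layout was used.

```python
import sqlite3

DCTERMS = "http://purl.org/dc/terms/"
# Hypothetical base URI used to mint one identifier per database row.
BASE_URI = "http://example.org/resource/"

# Column name -> Dublin Core property, following the three elements on the slide.
COLUMN_TO_PROPERTY = {
    "title": DCTERMS + "title",
    "date": DCTERMS + "date",
    "publisher": DCTERMS + "publisher",
}


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(connection):
    """Yield one N-Triples line per non-empty mapped column of each row."""
    cursor = connection.execute("SELECT id, title, date, publisher FROM resources")
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{BASE_URI}{record['id']}>"
        for column, prop in COLUMN_TO_PROPERTY.items():
            value = record[column]
            if value is None:
                continue  # A missing value produces no triple.
            yield f'{subject} <{prop}> "{escape_literal(str(value))}" .'


def build_demo_database():
    """Create an in-memory table with one invented row (hypothetical layout)."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE resources (id INTEGER PRIMARY KEY, title TEXT, date TEXT, publisher TEXT)"
    )
    connection.execute(
        "INSERT INTO resources VALUES (1, 'Nuclear Energy', '2015-05-01', NULL)"
    )
    return connection


demo_lines = list(rows_to_ntriples(build_demo_database()))
for line in demo_lines:
    print(line)
```

Because the schema is flat, each column maps to exactly one property and each row to one subject, which is why no intermediate nodes are needed here.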
Exposing a complex schema (IEEE LOM)
IEEE LOM ontology
The IEEE LOM schema has a hierarchical structure and supports different kinds of data types, so we had to:
• Map the data types
• Specify a correct element for the identifier (URI)
• Choose a strategy for exposing aggregated elements (e.g., keyword)
• Reuse existing vocabularies
• Test the ontology in an implementation
A case study based on the ontology
Remarks of this investigation
• Analyzing the IEEE LOM schema for the sake of:
  • exposing its elements as Linked Open Data
  • creating a complete ontology
  • identifying the appropriate elements for interlinking
• The exposing approach was applied to other schemas as well.
E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Interlinking Educational Resources to Web of Data through IEEE LOM,” Computer Science and Information Systems, vol. 12, no. 1, pp. 233–255, 2015.
Objective 2: Examining the datasets in the Linked Open Data cloud
The LOD datasets analysis
We analyzed the Linked Open Data cloud to find out:
1. Which datasets in the cloud are the most important to link to in an educational domain?
   • Examining the LOD cloud using Social Network Analysis (SNA)
2. Which educational datasets are appropriate for interlinking?
   • Selecting a set of educational datasets in datahub using some metrics
The LOD datasets analysis
We considered the LOD cloud as a directed graph and analyzed it according to the following SNA metrics:
• Betweenness Centrality (BC): if a dataset has a high BC value, then many datasets are connected through it to others.
• In-Degree: the number of datasets that point to the current dataset
• Out-Degree: the number of datasets that the current dataset points to

Dataset          In-Degree  Out-Degree  Betweenness Centrality
DBpedia          181        30          82,664
Geonames         55         0           10,958
DrugBank         8          12          7,446
Bio2rdf-goa      11         8           3,751
Ordnance-survey  16         0           3,272
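The three metrics above can be sketched on a toy link graph as follows. This is an illustration only: the graph is invented, the betweenness computation uses Brandes' algorithm for unweighted directed graphs without normalization, and the slides do not state which SNA software or normalization produced the reported values.

```python
from collections import deque


def degree_counts(edges):
    """Return (in_degree, out_degree) dicts for a list of (source, target) links."""
    nodes = {node for edge in edges for node in edge}
    in_degree = {node: 0 for node in nodes}
    out_degree = {node: 0 for node in nodes}
    for source, target in set(edges):
        out_degree[source] += 1
        in_degree[target] += 1
    return in_degree, out_degree


def betweenness_centrality(edges):
    """Unnormalized betweenness on a directed, unweighted graph (Brandes' algorithm)."""
    nodes = sorted({node for edge in edges for node in edge})
    successors = {node: [] for node in nodes}
    for source, target in sorted(set(edges)):
        successors[source].append(target)

    centrality = {node: 0.0 for node in nodes}
    for start in nodes:
        # Breadth-first search that counts shortest paths from `start`.
        order = []
        predecessors = {node: [] for node in nodes}
        path_count = {node: 0 for node in nodes}
        distance = {node: -1 for node in nodes}
        path_count[start] = 1
        distance[start] = 0
        queue = deque([start])
        while queue:
            current = queue.popleft()
            order.append(current)
            for neighbour in successors[current]:
                if distance[neighbour] < 0:
                    distance[neighbour] = distance[current] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[current] + 1:
                    path_count[neighbour] += path_count[current]
                    predecessors[neighbour].append(current)
        # Accumulate pair dependencies in reverse breadth-first order.
        dependency = {node: 0.0 for node in nodes}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1.0 + dependency[node])
            if node != start:
                centrality[node] += dependency[node]
    return centrality


# Invented toy graph: datasets A and B link to Hub, and Hub links to C.
toy_links = [("A", "Hub"), ("B", "Hub"), ("Hub", "C")]
in_deg, out_deg = degree_counts(toy_links)
bc = betweenness_centrality(toy_links)
print(in_deg["Hub"], out_deg["Hub"], bc["Hub"])
```

In the toy graph, both shortest paths that have an intermediate node (A to C and B to C) pass through Hub, so Hub is the only node with a non-zero betweenness value.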
The LOD datasets analysis
(Figure: the LOD cloud graph, with the high-BC datasets highlighted)
Selecting educational datasets
Exploring the LOD cloud to find educational datasets using the following steps:
• Finding the datasets in datahub tagged with educational subjects
• Checking the availability of their SPARQL endpoints or RDF dumps
• Retrieving their specification (size, metadata schema, language, …) from an interlinking point of view using SPARQL
Datahub endpoint
Exploring the datahub endpoint to find educational datasets
Educational datasets bubble graph
Selecting 20 available educational datasets
Getting datasets specification using SPARQL

Datasets                                          Size (triples)  SPARQL Endpoint
Charles University in Prague                      93,233,661      http://linked.opendata.cz/sparql
UNISTAT-KIS                                       8,026,637       http://data.linkedu.eu/kis/query
Achievement Standards Network (ASN)               7,494,201       http://sparql.jesandco.org:8890/sparql
Data.gov.uk                                       6,619,847       http://services.data.gov.uk/education/sparql
University of Southampton                         5,726,668       http://sparql.data.southampton.ac.uk/
Yovisto - academic video search                   4,932,352       http://sparql.yovisto.com/
University of Muenster (LODUM)                    4,179,372       http://data.uni-muenster.de/sparql/
Open University in UK                             3,588,626       http://data.open.ac.uk/sparql
University of Huddersfield                        3,553,343       http://data.linkedu.eu/hud/query
Semantic ISVU (Kent)                              2,421,268       http://kent.zpr.fer.hr:8080/educationalProgram/sparql
University of Bristol                             1,885,124       http://resrev.ilrt.bris.ac.uk/data-server-workshop/sparql
Aalto University                                  1,589,122       http://data.aalto.fi/sparql
Open Courseware Consortium metadata               636,453         http://data.linkedu.eu/ocw/query
OxPoints (University of Oxford)                   318,392         https://data.ox.ac.uk/sparql/
TheSoz Thesaurus for the Social Sciences (GESIS)  305,329         http://lod.gesis.org/thesoz/sparql
PROD                                              62,375          http://data.linkedu.eu/prod/query
Open Data @ Tor Vergata                           54,968          http://opendata.ccd.uniroma2.it/LMF/sparql/select
Vytautas Magnus University, Kaunas                39,279          http://kaunas.rkbexplorer.com/sparql/
MoreLab                                           3,906           http://www.morelab.deusto.es/joseki/articles
Forge project                                     132             http://data.linkedu.eu/forge/query
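A dataset's size in triples can be requested from its endpoint with a counting query. The sketch below only builds the request URL, following the SPARQL Protocol convention of passing the query text in a `query` parameter; it sends nothing over the network, since the endpoints listed in the table may no longer be reachable. The exact queries used in the thesis are not shown in the slides, so the query text here is an assumption.

```python
from urllib.parse import urlencode

# Counts every triple in the endpoint's default graph (assumed query, not from the slides).
COUNT_TRIPLES_QUERY = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"


def build_query_url(endpoint, query):
    """Build a SPARQL Protocol GET URL that carries the query in the `query` parameter."""
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode({"query": query})


url = build_query_url("http://data.open.ac.uk/sparql", COUNT_TRIPLES_QUERY)
print(url)
```

On large datasets a full count can be slow or rejected by the server, so a real run would likely need timeouts and a fallback such as dataset statistics published by the provider.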
Getting entities from the datasets
Open University of UK endpoint
Remarks of this investigation
• Selecting the DBpedia dataset as the LOD hub for interlinking educational datasets
• Identifying a set of well-formed educational datasets for interlinking
E. Rajabi, S. Sanchez-Alonso, and M.-A. Sicilia, “Analyzing Broken Links on the Web of Data: an Experiment with DBpedia,” Journal of the Association for Information Science and Technology (JASIST), vol. 65, no. 8, pp. 1721–1727, 2014.
E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Discovering Duplicate and Related Resources using Interlinking Approach: The case of Educational Datasets,” Journal of Information Science, first published on March 10, 2015.
Objective 3: Investigating existing interlinking tools in an educational context
Interlinking tools (comparison)

Tool        Domain      SPARQL/RDF Dump  Manual/Automatic  Well-documented  Customization flexibility
GWAP        Multimedia  No               Manual            No               Unknown
LIMES       LOD         Yes              Automatic         Yes              Yes
LOD Refine  General     Yes              Automatic         Yes              Partially
RDF-IA      LOD         RDF Dump         Automatic         No               Unknown
SAI         Multimedia  No               Automatic         No               Unknown
Silk        LOD         Yes              Automatic         Yes              Yes
UCI         LOD         Yes              Manual            No               Unknown
Interlinking tools (general idea)
Source
• Source data type: RDF dump
• Source entity: dcterms:title
• Filtering: English titles
Target
• Target data type: SPARQL endpoint
• Target entity: dcterms:title
• Filtering: English titles
Setting
• Matching algorithm: Trigrams
• Threshold of acceptance: 95%
• Output file format: N-TRIPLE
• ...
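To make the "Trigrams" matching setting and the 95% threshold concrete, here is a small sketch of title matching. It uses one common definition of trigram similarity, the Dice coefficient over padded character trigram sets; the padding and normalization inside Silk, LIMES, or LOD Refine may differ, and the URIs and the function names below are hypothetical.

```python
def trigrams(text):
    """Return the set of character trigrams of a lowercased, space-padded string."""
    padded = "  " + text.lower().strip() + "  "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(left, right):
    """Dice coefficient over character trigram sets, a value between 0 and 1."""
    left_set, right_set = trigrams(left), trigrams(right)
    return 2 * len(left_set & right_set) / (len(left_set) + len(right_set))


def link_candidates(source_titles, target_titles, threshold=0.95):
    """Pair each source title with the target titles scoring at or above the threshold."""
    links = []
    for source_uri, source_title in source_titles.items():
        for target_uri, target_title in target_titles.items():
            score = trigram_similarity(source_title, target_title)
            if score >= threshold:
                links.append((source_uri, target_uri, score))
    return links


# Invented example data with placeholder URIs.
source = {"urn:example:source/1": "Nuclear Energy"}
target = {
    "urn:example:target/a": "nuclear energy",
    "urn:example:target/b": "Nuclear Physics",
}
links = link_candidates(source, target)
print(links)
```

The nested loop compares every source title with every target title, which is fine for a demonstration but quadratic in the number of titles; the real tools use blocking or indexing to avoid that cost.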
Interlinking process
The interlinking tools (SILK)
The interlinking tools (LIMES)
(Figure labels: Source & Target datasets; Condition)
The interlinking tools (LOD Refine)
Sample interlinking results (exact matched)

Title in both datasets: Nuclear Energy
Globe resource: http://www.globe-info.org/ont/lom2owl#108450
Target URI (dataset name):
• http://schools.nyc.gov/NR/rdonlyres/6C64098F-0C24-4B27-A22F-F542A2F97DA0/130926/TTS_G11_LiteracySSandScience_NuclearEnergy.pdf (ASN)
• http://resrev.ilrt.bris.ac.uk/research-revealed-hub/publications/118933#pub (Bristol)
• http://data.linkedu.eu/hud/book/118555 (Huddersfield)

Title in both datasets: Bibliography
Globe resource: http://www.globe-info.org/ont/lom2owl#178214
Target URI (dataset name):
• http://resrev.ilrt.bris.ac.uk/research-revealed-hub/publications/15140#pub (OpenUK)
• http://data.uni-muenster.de/context/istg/allegro/6/210/T00244773 (Muenster)
Evaluating the interlinking tools
We used three tools to interlink GLOBE to DBpedia.
(Figure: interlinking GLOBE and DBpedia on title)
Evaluating the interlinking tools
Does the result change if we use more than one tool?
(Figure: common results among the tools)
Remarks of this investigation
• Applying the interlinking tools for linking datasets is a reliable method.
• Silk and LIMES were the efficient tools for similarity discovery among the LOD datasets.
E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “An empirical study on the evaluation of interlinking tools on the Web of Data,” Journal of Information Science, vol. 40, pp. 637–648, 2014, first published on June 11, 2014.
Objective 4: Assessing the interlinking results and their advantages
Evaluating the interlinking results
• Interlinking tools perform an interlinking process and print out the matched resources.
• The question under discussion is: to what extent are the results reliable?
• An important step after the interlinking is the evaluation of the results by human and domain experts.
The interlinking approach
GLOBE metadata analysis
Creating criteria under which we can find appropriate elements for interlinking (datatype, completeness, content)
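Of the three criteria, completeness is the easiest to illustrate: for each metadata element, measure the share of records that carry a non-empty value. The sketch below is one plausible reading of that criterion, not the procedure used in the thesis; the record layout, element names, and sample records are invented.

```python
def element_completeness(records, elements):
    """Return, per element, the share of records that have a non-empty value."""
    total = len(records)
    completeness = {}
    for element in elements:
        filled = sum(
            1 for record in records if str(record.get(element) or "").strip()
        )
        completeness[element] = filled / total if total else 0.0
    return completeness


# Invented sample of four metadata records.
sample_records = [
    {"title": "Nuclear Energy", "keyword": "energy"},
    {"title": "Bibliography", "keyword": ""},
    {"title": "   ", "keyword": "physics"},
    {"title": "Solar Power"},
]
report = element_completeness(sample_records, ["title", "keyword", "coverage"])
print(report)
```

An element that is rarely filled in offers few values to match on, so a low completeness score is a reason to deprioritize it as a linking key even if its datatype and content are suitable.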
Interlinking results

            Title  Keyword  Taxon    Coverage
GLOBE       8,260  228,352  134,791  12,941
Percentage  2%     74%      76%      78%

(Figure: interlinking through the Keyword element)
Evaluating the interlinking results
We evaluated the interlinking results from the following perspectives:
• Reliability: level of agreement between the raters
• Relationship among results (e.g., threshold 75%): Is parent of, Is related to, Is part of
• Enrichment of content: linking one resource to many datasets on the Web
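The slides do not say which statistic was used for the level of agreement between the raters. As an illustration only, the sketch below computes Cohen's kappa, a standard agreement measure for two raters that corrects the observed agreement for agreement expected by chance. The ratings are invented.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must label the same non-empty list of items.")
    n = len(ratings_a)
    # Share of items on which the two raters gave the same label.
    observed = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Agreement expected if each rater labelled at random with their own label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Invented judgements of four candidate links by two raters.
rater_1 = ["related", "related", "unrelated", "related"]
rater_2 = ["related", "related", "unrelated", "unrelated"]
kappa = cohen_kappa(rater_1, rater_2)
print(kappa)
```

Here the raters agree on three of four items (0.75 observed) while 0.5 agreement would be expected by chance, so kappa credits only the agreement above that baseline.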
Remarks of this investigation
• Human experts (the raters of the results) agreed that the interlinking results are reliable.
• Interlinking a learning repository to several educational datasets in the LOD cloud leads to the enrichment of content.
• Interlinking results can help to find duplicate metadata.
E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Discovering Duplicate and Related Resources using Interlinking Approach: The case of Educational Datasets,” Journal of Information Science, first published on March 10, 2015.
E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Interlinking Educational Data: an Experiment with Engineering-related Resources in GLOBE,” International Journal of Engineering and Education, vol. 31-3, 2015.
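One simple way interlinking output can surface duplicate candidates is to group source resources that were linked to the same target. The sketch below shows that idea under stated assumptions: the slides do not describe the actual duplicate identification procedure, and the URIs are placeholders.

```python
from collections import defaultdict


def duplicate_candidates(links):
    """Group source URIs that were linked to the same target URI."""
    sources_by_target = defaultdict(set)
    for source_uri, target_uri in links:
        sources_by_target[target_uri].add(source_uri)
    # Only targets reached from more than one source hint at duplicates.
    return {
        target: sorted(sources)
        for target, sources in sources_by_target.items()
        if len(sources) > 1
    }


# Invented interlinking output as (source, target) pairs.
example_links = [
    ("urn:example:globe/1", "urn:example:target/x"),
    ("urn:example:globe/2", "urn:example:target/x"),
    ("urn:example:globe/3", "urn:example:target/y"),
]
candidates = duplicate_candidates(example_links)
print(candidates)
```

Sharing a target only makes two records candidates; they may still describe different resources on the same topic, so a human check remains necessary.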
Conclusions
1. Exposing eLearning metadata as Linked Open Data
• A complete analysis was done on exposing the IEEE LOM schema as RDF.
• A new ontology was designed for RDF binding of IEEE LOM.
• Keyword, Coverage, Classification, and Title were appropriate elements for interlinking.
Conclusions (cont.)
2. Evaluating Linked Data tools & datasets
• Silk and LIMES were the efficient frameworks in terms of discovering similarities between two datasets.
• DBpedia was identified as the hub of the LOD cloud.
• Twenty educational datasets were identified as the most suitable targets for interlinking.
• The Open University of UK includes a rich metadata schema and a reliable endpoint.
Conclusions (cont.)
3. Enriching the educational datasets
• Interlinking results were reviewed and verified by human experts.
• Several educational resources were linked to more than one resource in the LOD cloud.
• A duplicate identification was proposed after the analysis of the interlinking results.
Additional contributions
• Implementing several platforms for exposing data as Linked Data:
  • Organic.Edunet (http://data.organic-edunet.eu)
  • ARIADNE (http://ariadne.grnet.gr)
  • Open Discovery Space (http://data.opendiscoveryspace.eu)
  • Agrega (http://agrega2.red.es/)
• Submitting the IEEE LOM ontology to Linked Open Vocabularies (LOV) at http://lov.okfn.org/dataset/lov/vocabs/lom
• Developing an online mashup for interlinking eLearning objects to the Web of Data (research stay in Agroknow)
• Writing a book chapter about “Optimizing Big Data using the Linked Data approach”
Additional contributions (cont.)
Future work
• Content
  • Applying the interlinking approach to other educational repositories
• Tools and software
  • Extending the tools to link one dataset to several datasets at the same time
  • Adding some semantic similarity services to the tools to improve the interlinking results
  • Linking educational resources by crawling datasets
Publications (journal papers)
• E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Interlinking Educational Resources to Web of Data through IEEE LOM,” Computer Science and Information Systems, vol. 12, no. 1, pp. 233–255, 2015.
• E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Discovering Duplicate and Related Resources using Interlinking Approach: The case of Educational Datasets,” Journal of Information Science, first published on March 10, 2015, doi:10.1177/0165551515575922.
• E. Rajabi, S. Sanchez-Alonso, and M.-A. Sicilia, “Analyzing Broken Links on the Web of Data: an Experiment with DBpedia,” Journal of the Association for Information Science and Technology (JASIST), vol. 65, no. 8, pp. 1721–1727, 2014, doi:10.1002/asi.23109.
• E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “An empirical study on the evaluation of interlinking tools on the Web of Data,” Journal of Information Science, vol. 40, pp. 637–648, 2014, first published on June 11, 2014, doi:10.1177/0165551514538151.
• E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Interlinking Educational Data: an Experiment with Engineering-related Resources in GLOBE,” International Journal of Engineering and Education, 2015. In press.
• E. Rajabi, W. Greller, K. Niemann, K. Kastrantas, and S. Sanchez-Alonso, “Social data interoperability in educational repositories and federations,” International Journal of Metadata, Semantics and Ontologies, vol. 8, no. 2, pp. 169–178, 2013.
• E. Rajabi, S. Sanchez-Alonso, M.-A. Sicilia, and N. Mouneselis, “A linked and open dataset from a network of learning repositories on organic agriculture,” British Journal of Educational Technology, submitted (under second review).
• M.-C. Valiente, M.-A. Sicilia, E. Garcia-Barriocanal, and E. Rajabi, “Adopting the metadata approach to improve the search and analysis of educational resources for online learning,” Computers in Human Behavior, 2015. In press.
Publications (conference papers)
• E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Interlinking Educational Data: an Experiment with GLOBE Resources,” presented at the First International Conference on Technological Ecosystem for Enhancing Multiculturality, Salamanca, Spain, 2013.
• E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “Research Objects Interlinking: The Case of Dryad Repository,” presented at Metadata and Semantics Research, Karlsruhe, Germany, 2014.
• E. Rajabi and S. Sanchez-Alonso, “Enriching the e-learning contents using interlinking,” presented at the 5th eLearning conference, Belgrade, Serbia, 2014. Link: https://scholar.google.es/scholar?oi=bibs&cluster=16249634834288673991&btnI=1&hl=en
• E. Rajabi, M.-A. Sicilia, and S. Sanchez-Alonso, “A Simple Approach towards SKOSification of Digital Repositories,” Metadata and Semantics Research, pp. 67–74, 2013.
• M.-A. Sicilia, S. Sanchez-Alonso, E. Garcia-Barriocanal, J. Minguillón, and E. Rajabi, “Exploring the keyword space in large learning resource aggregations: the case of GLOBE,” Lacro workshop, April 2013.