Towards a Linked Open Data Infrastructure for Science, Technology & Innovation Studies

Towards a Linked Open Data Infrastructure for Science, Technology & Innovation Studies. Ali Khalili, PhD, Department of Computer Science/Artificial Intelligence, Knowledge Representation & Reasoning Research Group

Upload: ali-khalili

Post on 20-Feb-2017

TRANSCRIPT

Page 1: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Towards a Linked Open Data Infrastructure for Science, Technology & Innovation Studies

Ali Khalili, PhD Department of Computer Science/Artificial Intelligence

Knowledge Representation & Reasoning Research Group

Page 2: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Outline

• Linked (Open) Data

• RISIS Project

• Semantically Mapping Science (SMS) Platform

• Workflow

• Use Cases

• Adaptive Functional Urban Areas (FUAs) to Study Innovative Activities

• Gendered Dimensions in Grant Selection

Page 3: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Evolution of the Web

https://mcgratha.wordpress.com/

Page 5: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Linked (Open) Data

• A set of best practices for publishing data on the Web.
• Follows 4 simple principles:
  • Use URIs as names (identifiers) for conceptual things.
  • Use HTTP URIs so that users can look up (dereference) those names.
  • When someone looks up a URI, provide useful information, using open standards (e.g. RDF, SPARQL).
  • Include links to other URIs, so that users can discover more things.

https://www.ted.com/talks/tim_berners_lee_on_the_next_web
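
A minimal sketch of what "looking up" a URI can mean in practice: the client sends an HTTP request and asks for an RDF representation through content negotiation. The DBpedia URI and the `text/turtle` media type are illustrative assumptions, and the request is only prepared here, not sent.

```python
from urllib.request import Request


def build_dereference_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare a GET request that asks the server for an RDF serialization."""
    # The Accept header states which representation of the resource we prefer.
    return Request(uri, headers={"Accept": media_type}, method="GET")


request = build_dereference_request("http://dbpedia.org/resource/Paris")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup; whether a given server honours the requested media type depends on that server.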

Page 6: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Linked (Open) Data: Principles

WWW World

Page 11: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

5 ★ Open Data

★ Make your stuff available on the Web (whatever format) under an open license.
★★ Make it available as structured data (e.g., Excel instead of an image scan of a table).
★★★ Make it available in a non-proprietary open format (e.g., CSV instead of Excel).
★★★★ Use Linked Data formats (URIs to identify things, RDF to represent data).
★★★★★ Link your data to other people’s data to provide context.

http://5stardata.info/

Page 13: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Linked Open Data Cloud

http://lod-cloud.net/

Page 22: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Linked Open Data: Statistics

http://lodlaundromat.org/

http://stats.lod2.eu/

more than 3426 datasets

Page 24: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Linked Open Data: Examples

Resource Property Value

https://en.wikipedia.org/wiki/Paris

Page 25: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Linked Open Data: Examples

http://dbpedia.org/resource/Paris
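
The Resource / Property / Value view of a page like this corresponds to RDF triples. As a rough sketch, the snippet below models two such triples and writes them as N-Triples lines. The `Triple` class and `to_ntriples` helper are hypothetical names, and the two statements about Paris are illustrative rather than copied from the slide.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    resource: str               # subject URI
    prop: str                   # property (predicate) URI
    value: str                  # object: a URI or a plain literal
    value_is_uri: bool = True


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as a single N-Triples line."""
    if t.value_is_uri:
        obj = f"<{t.value}>"
    else:
        # Escape backslashes and double quotes inside plain literals.
        escaped = t.value.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    return f"<{t.resource}> <{t.prop}> {obj} ."


triples = [
    Triple("http://dbpedia.org/resource/Paris",
           "http://dbpedia.org/ontology/country",
           "http://dbpedia.org/resource/France"),
    Triple("http://dbpedia.org/resource/Paris",
           "http://www.w3.org/2000/01/rdf-schema#label",
           "Paris", value_is_uri=False),
]
lines = [to_ntriples(t) for t in triples]
print("\n".join(lines))
```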

Page 26: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Linked Open Data: Examples
• Give me a list of capital cities in Europe with a population of more than 500,000.
• Who are the mayors of central European towns at an elevation above 1,000 m?
• Which movies star both Brad Pitt and Angelina Jolie?
• All soccer players who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who were born in a country with more than 10 million inhabitants.

• …
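
To make the first question concrete, here is one way it might be phrased in SPARQL and packaged as a SPARQL Protocol GET request. This is a sketch under assumptions: the endpoint URL, the properties `dbo:capital` and `dbo:populationTotal`, and the category `dbc:Countries_in_Europe` are guesses at suitable DBpedia terms, not something the slides specify. The URL is only built, not fetched.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "https://dbpedia.org/sparql"  # assumed public endpoint

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?city ?population WHERE {
  ?country dct:subject dbc:Countries_in_Europe ;
           dbo:capital ?city .
  ?city dbo:populationTotal ?population .
  FILTER (?population > 500000)
}
ORDER BY DESC(?population)
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as the `query` parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```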

Page 28: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Linked Open Data: Examples

Page 30: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Linked Open Data: Examples

https://www.google.com/cse/

Page 31: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Linked Open Data: Examples

http://www.wolframalpha.com/

Page 32: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

http://risis.eu

Page 33: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

RISIS EU Project (http://risis.eu)

http://datasets.risis.eu/

Page 34: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

RISIS Datasets: Entity Types

Entity types: Organization, Product, Agreement, Person, Policy, Evaluation, Location

Subtypes shown: Higher Education, Firm, Funding Body, Publication, Patent, Project, Investment, Funding Program

Datasets: CIB, ETER, EUPRO, JOREP, Leiden-Ranking, MORE I, Nano, Profile, SIPER, VICO

Page 36: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Semantically Mapping Science (SMS) Platform

http://sms.risis.eu

Page 37: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

RISIS WP9 Vision

Proposing S&T map of Europe

Page 38: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Functional Urban Areas (FUAs)

Page 41: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Functional Urban Areas (FUAs)

• Defined by the OECD in collaboration with EC/Eurostat.
• Consider factors beyond the predefined city boundaries to better reflect the economic geography of where people live and work.
• Indicators covered: population; area; GDP; environment (CO2 emissions and air pollution); labour market (employment and unemployment growth); innovation (patent intensity); urban form and territorial organization.

OECD Metropolitan eXplorer: http://measuringurban.oecd.org

Page 43: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

FUAs Example: Netherlands

Page 44: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

FUAs: Building Blocks

Municipalities

Page 51: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Problem

Address → FUA?

• Vrije Universiteit Amsterdam
• De Boelelaan 1105, 1081 HV Amsterdam → Amsterdam (NL002)

- Geocode to LAU (municipality)
- Shapefiles for FUAs or LAUs?

OECD FUAs List
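
Once an address has been geocoded to coordinates, assigning it to a LAU or FUA comes down to a point-in-polygon test against the boundary shapes. The sketch below uses the classic ray-casting test. The coordinates for the VU address are approximate, and the "NL002" polygon is a made-up rectangle standing in for a real boundary shape.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: count how many polygon edges a ray to the right crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_containing_area(point: Point, areas: Dict[str, List[Point]]) -> Optional[str]:
    """Return the code of the first area whose polygon contains the point."""
    for code, polygon in areas.items():
        if point_in_polygon(point, polygon):
            return code
    return None


# Hypothetical rectangle, NOT the real FUA boundary.
areas = {"NL002": [(4.7, 52.2), (5.1, 52.2), (5.1, 52.5), (4.7, 52.5)]}
vu_amsterdam = (4.87, 52.33)  # approximate coordinates
print(find_containing_area(vu_amsterdam, areas))
```

Real boundaries are multi-polygons with holes, so a production version would more likely rely on a geospatial library or on the spatial functions of the data store.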

Page 52: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Linked (Open) Data Lifecycle

Stages (a cycle): Extraction, Storage/Querying, Authoring, Interlinking, Enrichment, Quality Analysis, Evolution, Exploration

http://stack.linkeddata.org/

Page 55: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Linked Open Data Lifecycle: Exploration

• Search
• Browse
• Visualize

Page 56: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Search for Linked Data

http://lov.okfn.org/

Page 57: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Search for Linked Data

http://schema.org/

http://bl.ocks.org/danbri/1c121ea8bd2189cf411c

Page 58: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Search for Linked Data

Data hub (http://datahub.io): search for data, register published datasets, create and manage groups of datasets…

Page 59: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Search for Linked Data

http://lotus.lodlaundromat.org

Page 60: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Search for Linked Data

Example:

• OpenStreetMap (OSM)
• Database of Global Administrative Areas (GADM)
• Flickr Shapefiles Dataset
• Published Shapefiles for Individual Countries
• Published Geospatial RDF Datasets

Page 61: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

OpenStreetMap (OSM)

• https://www.openstreetmap.org
• Built by a community of mappers who contribute and maintain data about roads, trails, cafés, railway stations, and much more, all over the world.
• Administrative Boundaries
  • Level 1: supranational administrations, e.g. the European Union.
  • Level 2: country borders based on the political entities listed in the ISO 3166 standard.
  • Levels 3 to 11: subnational borders such as "state", "province", "region" and "district".
• Data Access
  • Nominatim Web API for querying OSM
  • The Overpass API for fetching specific OSM data
  • Planet.osm data (over 617 GB uncompressed!)

Page 62: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

OSM: Nominatim Web API

• A tool to search OSM data by name and address and to generate synthetic addresses of OSM points (reverse geocoding).
• Several companies provide hosted instances of the Nominatim query API, e.g. MapQuest Open Initiative, PickPoint or the OpenCage Geocoder.
• API documentation
• Example usage:
  • http://nominatim.openstreetmap.org/search.php?q=amsterdam&polygon=1&country=Netherlands&format=json&addressdetails=1
• MapQuest API
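
A small sketch of how such a search request could be assembled programmatically. It only builds the URL. The parameter names `polygon_geojson` and `countrycodes` follow the current public Nominatim documentation and differ from the older `polygon` and `country` parameters in the example above; the helper name and default values are assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(query: str, country_code: str = "nl") -> str:
    """Build a Nominatim search URL that also requests address details and a boundary polygon."""
    params = {
        "q": query,
        "format": "json",
        "addressdetails": 1,       # include the address breakdown in each result
        "polygon_geojson": 1,      # include the boundary as GeoJSON
        "countrycodes": country_code,
    }
    return NOMINATIM_SEARCH + "?" + urlencode(params)


url = build_search_url("De Boelelaan 1105, Amsterdam")
print(url)
```

When the request is actually sent to the public instance, its usage policy asks clients to identify themselves (for example with a descriptive User-Agent header) and to keep request rates low.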

Page 63: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

OSM Data: Example

Page 64: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

GADM (Global Administrative Areas)

• http://www.gadm.org
• GADM is developed by the University of California, Berkeley Museum of Vertebrate Zoology, the International Rice Research Institute and the University of California, Davis, with contributions from many others.
• Uses other existing sources: http://www.gadm.org/links
• Administrative Boundaries
  • Level 0: countries.
  • Levels 1 to 5: lower-level subdivisions such as provinces, departments, counties, etc., depending on the size of the underlying country and the availability of data for it.
• Data Access
  • Data is available globally and for each individual country, in different formats: GeoPackage, R SpatialPolygonsDataFrame, ESRI file geodatabase, Google Earth

Page 65: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Flickr geo-tagged pictures

• Data from 190M geo-tagged photos on Flickr
  • New smartphones have not only a camera but also the ability to capture location information.
  • All the geotagged photos associated with a particular place were plotted to generate a mostly accurate contour of that place (something more fine-grained than a bounding box!).
• Where On Earth (WOE) IDs
  • Correspond to the hierarchy of places where a photo was taken: from country (level 1), region (level 2), county (level 3) and locality (level 4) to neighborhood (level 5).
  • For a given WOE entity, the approximate shape of that place is inferred.
• Shapes in GeoJSON format
  • View shapes at http://polymaps.org/ex/flickr.html
  • Download at http://www.flickr.com/services/shapefiles/2.0.1/
  • More info: http://code.flickr.net/2012/10/24/2273/

Page 66: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Published Shapefiles for Individual Countries

• Local administrative offices or geo-related research centres might provide shapefiles specific to a country.

• E.g. for the Netherlands, shapefiles are provided by Centraal Bureau voor de Statistiek (CBS).

• Data collection needs to be done by a group of people in contact with the geo-related organizations in each country.

• Current status

Page 68: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Published Geospatial RDF Datasets

• http://linkedgeodata.org and http://geoknow.eu
  • A large spatial knowledge base (>400m geo elements) derived from OpenStreetMap.
  • Provides unique URIs and has mappings to DBpedia.
• GeoVocab.org
  • GADM-RDF: Global Administrative Areas
  • NUTS-RDF: EU's Nomenclature of Territorial Units for Statistics

Outdated!

No Shapefiles!

Page 69: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Extraction

Linked Open Data Lifecycle

Page 70: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

from Semi-structured sources

Resource Property Value

Page 72: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

DBpedia

Persian DBpedia?

Page 73: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Persian DBpedia (mapping Wiki)

Page 74: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Extraction from Semi-structured sources

• Ad-hoc
  • DBpedia extraction framework
• Generic
  • OpenRefine

Page 77: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

from Unstructured sources

…After leaving Apple, Jobs took a few of its members with him to found NeXT, a computer platform development company based in Redwood City, specializing in state-of-the-art computers for higher-education and business markets. In addition, Jobs helped to initiate the development of the visual effects industry when he funded the spinout of the computer graphics division of George Lucas's company Lucasfilm in 1986. The new company, Pixar, would eventually produce the first fully computer-animated film, Toy Story…

NLP, Text mining, Annotation

• Named Entity Recognition
• Relation Extraction (e.g. foundedBy)

Page 78: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Named Entity Recognition

http://spotlight.dbpedia.org

http://bioportal.bioontology.org/annotator

Page 79: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

from Structured sources: Triplification

• Relational Database to RDF
  • R2RML: RDB to RDF Mapping Language, http://www.w3.org/TR/r2rml/
  • D2R Server: accessing databases with SPARQL and as Linked Data, http://d2rq.org/
  • Sparqlify: defining RDF views on relational databases, http://sparqlify.org/
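
To illustrate the idea behind triplification, the following sketch turns rows of a relational table into N-Triples by hand. It is not R2RML, D2R or Sparqlify: those tools declare the mapping rather than hard-coding it. The table layout, the `http://example.org/` namespace and the `triplify` helper are all hypothetical.

```python
import sqlite3
from typing import List

BASE = "http://example.org/"  # hypothetical namespace


def triplify(conn: sqlite3.Connection) -> List[str]:
    """Map each row to a subject URI and each column to a property."""
    triples = []
    rows = conn.execute("SELECT id, name, city FROM organization ORDER BY id")
    for row_id, name, city in rows:
        subject = f"<{BASE}organization/{row_id}>"
        # Literal escaping is omitted; the sample values contain no quotes.
        triples.append(f'{subject} <{BASE}vocab/name> "{name}" .')
        triples.append(f'{subject} <{BASE}vocab/city> "{city}" .')
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organization (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO organization VALUES (1, 'Vrije Universiteit Amsterdam', 'Amsterdam')")
triples = triplify(conn)
print("\n".join(triples))
```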

Page 80: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

DATA EXTRACTION & CONVERSION

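[Figure: conversion pipeline. Inputs: OSM XML, PBF, ESRI shapes. Tools: osmosis, osmtogeojson, mapshaper. Intermediate format: GeoJSON. Final step: triplify, driven by Mapping Configurations and Enrichment Functions.]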

Page 81: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

DATA EXTRACTION & CONVERSION

Metadata about different levels provided by OSM: http://wiki.openstreetmap.org/wiki/Tag:boundary%3Dadministrative

Page 82: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Storage & Querying

Linked Open Data Lifecycle

Page 83: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Relational Databases vs. Triple Stores

• A relational database's (e.g. MySQL, PostgreSQL, Oracle) natural representation is a collection of interlinked tables.

• A triple store's (e.g. Sesame, AllegroGraph) natural representation is a multi-relational network, or graph; property-graph databases such as Neo4j share this graph representation without being RDF triple stores.

* Triple store: so called because in RDF, facts are represented in the form of a triple (Subject-Predicate-Object).
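
As a toy illustration of the triple model, the sketch below keeps a set of (subject, predicate, object) tuples and answers pattern queries in which `None` acts as a wildcard. The class name and the `ex:` identifiers are hypothetical; real triple stores add indexes, named graphs and SPARQL on top of this idea.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TinyTripleStore:
    """In-memory set of triples with wildcard pattern matching."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        # A position set to None matches any value in that position.
        for t in self._triples:
            if (s is None or t[0] == s) and (p is None or t[1] == p) \
                    and (o is None or t[2] == o):
                yield t


store = TinyTripleStore()
store.add("ex:VU", "ex:locatedIn", "ex:Amsterdam")
store.add("ex:UvA", "ex:locatedIn", "ex:Amsterdam")
store.add("ex:Amsterdam", "ex:partOf", "ex:NL002")

located = sorted(store.match(p="ex:locatedIn", o="ex:Amsterdam"))
print(located)
```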

Page 84: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Existing Triple Stores

• Native triple stores: 4Store, AllegroGraph, BigData, Jena TDB, Sesame, Stardog, OWLIM and uRiKa

• RDBMS-backed triple stores: Jena SDB, IBM DB2 and OpenLink Virtuoso

• NoSQL triple stores: CumulusRDF

Page 85: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

DATA STORAGE & QUERYING

Virtuoso Geo Spatial Geometry as the SMS internal representation for geo-data in RDF

Page 88: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

SPARQL – SQL for the Linked Data

What can be done with SPARQL that can't be done with SQL?

• SPARQL queries are considerably better aligned with users’ mental models of a domain.

Page 90: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

SPARQL – SQL for the Linked Data

• SPARQL allows the conceptual data model to be fully explored through queries.
  - example:workPhone rdfs:subPropertyOf example:phone
  - example:cellPhone rdfs:subPropertyOf example:phone
  - example:homePhone rdfs:subPropertyOf example:phone
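
The point of the sub-property statements is that a query for `example:phone` can also return work, cell and home numbers, because the data model itself is queryable. The sketch below imitates that behaviour in plain Python; it is not a SPARQL engine, the sample people and numbers are invented, and the hierarchy is assumed to be free of cycles.

```python
from typing import Dict, List, Optional, Tuple

# Schema: each property points to its direct super-property.
SUB_PROPERTY_OF: Dict[str, str] = {
    "example:workPhone": "example:phone",
    "example:cellPhone": "example:phone",
    "example:homePhone": "example:phone",
}

# Invented instance data.
DATA: List[Tuple[str, str, str]] = [
    ("example:alice", "example:workPhone", "555-0100"),
    ("example:alice", "example:homePhone", "555-0101"),
    ("example:bob", "example:cellPhone", "555-0102"),
    ("example:bob", "example:email", "bob@example.org"),
]


def is_subproperty(prop: str, ancestor: str) -> bool:
    """True if prop equals ancestor or reaches it via rdfs:subPropertyOf links."""
    current: Optional[str] = prop
    while current is not None:
        if current == ancestor:
            return True
        current = SUB_PROPERTY_OF.get(current)
    return False


def values_of(subject: str, prop: str) -> List[str]:
    """All values of prop for subject, including values of its sub-properties."""
    return sorted(o for s, p, o in DATA if s == subject and is_subproperty(p, prop))


print(values_of("example:alice", "example:phone"))
```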

Page 92: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

SPARQL – SQL for the Linked Data

• Queries that have to traverse a chain of connections are particularly complex in SQL while very simple in SPARQL.
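
In SPARQL 1.1 such a chain can be written as a property path (for example `?x ex:partOf+ ?y`), whereas SQL typically needs repeated self-joins or a recursive query. The sketch below shows the traversal such a path expresses, as a breadth-first walk over hypothetical `ex:partOf` links; the identifiers are invented for illustration.

```python
from collections import deque
from typing import Dict, List

# Hypothetical ex:partOf links: address -> municipality -> FUA -> country.
PART_OF: Dict[str, List[str]] = {
    "ex:DeBoelelaan1105": ["ex:Amsterdam_municipality"],
    "ex:Amsterdam_municipality": ["ex:FUA_NL002"],
    "ex:FUA_NL002": ["ex:Netherlands"],
}


def reachable(start: str) -> List[str]:
    """Everything reachable from start over one or more ex:partOf steps."""
    seen: List[str] = []
    queue = deque(PART_OF.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # guard against revisiting nodes
        seen.append(node)
        queue.extend(PART_OF.get(node, []))
    return seen


containers = reachable("ex:DeBoelelaan1105")
print(containers)
```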

Page 93: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

SPARQL – SQL for the Linked Data

• In addition to SELECT, INSERT and DELETE, SPARQL supports ASK queries.

• SPARQL includes syntax (i.e. SERVICE) to call two or more data sources within a single query.

• …

Page 94: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

SPARQL Query Interface

http://yasgui.org/

Page 96: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Interlinking

• The degree to which entities that represent the same concepts are linked to each other.
• “Connecting things that are somehow related”
• Methods
  • Automatic, Semi-automatic, Manual
  • Universal, Domain-specific

Example: <http://dbpedia.org/resource/VU_University_Amsterdam> sameAs <https://www.wikidata.org/entity/Q1065414>

Page 97: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Interlinking Methods

• Ontology Matching
  • Establish links between the ontologies underlying two data sources.
• Instance Matching (Link Discovery)
  • Discover links between instances contained in two data sources.

Page 98: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

DATA LINKAGE

- Query on metadata about the administrative boundaries

- Find the alignment between levels in different datasets

Page 99: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

DATA LINKAGE

- Use the possible mappings between datasets at different levels.

- Check the overlaps of areas at similar levels and, for the matching areas, apply string matching to make sure that they refer to the same administrative boundary.
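
The two checks described above can be sketched roughly as follows. This is a simplification under stated assumptions: areas are reduced to bounding boxes instead of real polygons, names are compared with the standard-library `difflib`, and the thresholds, helper names and sample boxes are all invented for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def bbox_overlap_ratio(a: BBox, b: BBox) -> float:
    """Intersection area divided by union area of two bounding boxes."""
    width = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    height = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = width * height
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two names, in the range 0..1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def same_boundary(x: Dict, y: Dict, min_overlap: float = 0.8,
                  min_name: float = 0.8) -> bool:
    """Link two areas only if both the spatial and the name check pass."""
    return (bbox_overlap_ratio(x["bbox"], y["bbox"]) >= min_overlap
            and name_similarity(x["name"], y["name"]) >= min_name)


# Invented sample records from two datasets.
osm_area = {"name": "Amsterdam", "bbox": (4.73, 52.28, 5.08, 52.43)}
gadm_area = {"name": "amsterdam", "bbox": (4.74, 52.29, 5.07, 52.43)}
other_area = {"name": "Rotterdam", "bbox": (4.38, 51.85, 4.60, 51.99)}

print(same_boundary(osm_area, gadm_area), same_boundary(osm_area, other_area))
```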

Page 100: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

DATA LINKAGE

Linked datasets: OECD FUAs, DBpedia, GeoNames, Wikidata, GADM, Flickr Shapes, OpenStreetMap Administrative Boundaries

Page 103: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Scientific Lenses

DBpedia, Wikidata, OrgRef, GRID, FundRef, GeoNames, ISNI, VIAF, Cordis, ?

Page 104: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Semantically Mapping Science (SMS) Platform

http://sms.risis.eu

Page 105: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Linked Data Services

http://api.sms.risis.eu/

Page 106: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

SERVICE TO APPLICATION

http://sms.risis.eu/demos

Page 107: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

SERVICE TO APPLICATION

https://docs.google.com/spreadsheets/d/1XhXzdAf-veqHPj0kIaeZoXE3AJa8nwNugW_nOHH1jtk/edit?usp=sharing

Page 108: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Use Cases

https://hyperir.cartodb.com/viz/13b5f3da-4356-11e6-a365-0e5db1731f59/public_map

(research) and innovation subsidies for organizations and companies in the Netherlands

Page 109: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Use Cases

(research) and innovation subsidies for organizations and companies in the Netherlands

[Map panels: People, Businesses, Hybrid, OECD FUAs]

Page 112: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Use Cases

Universities + Companies + Projects + Boundaries

• Properties of container administrative boundaries
• Collaboration between Universities and Companies
• Properties of Universities and Companies

Datasets: RVO-NL, DBpedia, Leiden-Ranking, ETER, OrgRef, Cordis, OpenStreetMap, GADM, Flickr, OECD FUAs, GRID, CBS-NL, Eurostat

Page 113: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

Summary of the Use Case

Address → (geocode) → Coordinates → Administrative Boundaries → FUA

Page 116: Towards a Linked Open Data Infrastructure  for Science, Technology & Innovation Studies

References

• http://slidewiki.org/deck/11936_semantic-data-web-lecture-series
• Introduction to linked data and its lifecycle on the web
• http://euclid-project.eu/
• http://videolectures.net/wims2011_auer_interlinked/
• https://vimeo.com/76257120
• http://www.slideshare.net/slidarko/evolving-the-web-into-a-giant-global-database-3880018
• http://www.dataversity.net/introduction-to-triplestores/
• http://www.topquadrant.com/2014/05/05/comparing-sparql-with-sql/