Post on 21-Dec-2015
The Web of Linked Data
Information Universe
Seongmin Lim
Dept. of Industrial Engineering
Seoul National University

Contents
Foundations of Dataspaces and Linked Data
- Where do they overlap?
The Web of Linked Data
- What data is out there?
Linked Data Applications
- What is being done with the data?
Remarks on
- Identity
- Self-descriptive Data
- Pay-as-you-go Integration

From data integration systems to dataspaces
In order to cope with a growing number of data sources
Properties of dataspaces
- may contain any kind of data (structured, semi-structured, unstructured)
- require no upfront investment into a global schema
- provide for data-coexistence
- give best-effort answers to queries
- rely on pay-as-you-go data integration
Linked data principles
For publishing structured data on the general Web
Tim Berners-Lee
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful RDF information.
4. Include RDF statements that link to other URIs so that they can discover related things.
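
The four principles can be sketched with a toy, in-memory stand-in for the Web. This is only an illustration: the `WEB` dictionary, the `look_up` and `discover` helpers, and every `example.org` URI are hypothetical, and a real client would issue HTTP requests instead of dictionary lookups. Objects are treated as URIs whenever they start with `http://`, which is a simplification.

```python
from collections import deque

FOAF = "http://xmlns.com/foaf/0.1/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Toy "Web": each HTTP URI maps to the RDF triples returned when it is looked up.
# All example.org URIs and the data behind them are made up for this sketch.
WEB = {
    "http://example.org/people/alice": [
        ("http://example.org/people/alice", FOAF + "name", "Alice"),
        ("http://example.org/people/alice", FOAF + "knows", "http://example.org/people/bob"),
    ],
    "http://example.org/people/bob": [
        ("http://example.org/people/bob", FOAF + "name", "Bob"),
        ("http://example.org/people/bob", FOAF + "based_near", "http://example.org/places/seoul"),
    ],
    "http://example.org/places/seoul": [
        ("http://example.org/places/seoul", RDFS + "label", "Seoul"),
    ],
}


def look_up(uri):
    """Principles 2 and 3: looking up a name returns RDF about it (here: a dict lookup)."""
    return WEB.get(uri, [])


def discover(start_uri, max_lookups=10):
    """Principle 4: follow links in the returned RDF to discover related things."""
    seen = set()
    queue = deque([start_uri])
    triples = []
    while queue and len(seen) < max_lookups:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        for s, p, o in look_up(uri):
            triples.append((s, p, o))
            # Simplification: any object that looks like an HTTP URI is a link.
            if o.startswith("http://") and o not in seen:
                queue.append(o)
    return seen, triples


visited, collected = discover("http://example.org/people/alice")
```

Starting from a single URI, the client reaches all three resources only because each description links to the next one.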

From classic Web to Web 2.0
Classic Web: single global information space
1. Small set of simple standards
2. Hyperlinks to connect everything
Web 2.0: no single global dataspace
1. APIs have proprietary interfaces
2. Mashups from a fixed set of data sources
3. No hyperlinks between different APIs

Can’t we just publish data as files?
PDF
- Easy to read and publish
Excel
- Allows further processing and analysis
CSV
- Processing without need for proprietary tools
But…
- Structure of data not explained
- No connection between different data sets (silos)
- Static and fixed: can’t retrieve just the slices relevant to a problem

Linked data
Extend the Web with a single global dataspace
- By using RDF to publish structured data on the Web
- By setting links between data items within different data sources

What is RDF?
Resource Description Framework
RDF is the data format for linked data
It’s about writing down relations between things
What is RDF for?
- For everyone to do the same for data as the Web did for documents
- To make the Web into a database
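
"Writing down relations between things" amounts to recording subject, predicate, object triples, and the "database" idea amounts to matching patterns against them. The following is a minimal sketch of that idea, not an RDF library; the `Triple` type, the `match` helper, and all `ex:` names are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical data; the "ex:" names are made up for this example.
TRIPLES = [
    Triple("ex:alice", "ex:name", "Alice"),
    Triple("ex:alice", "ex:knows", "ex:bob"),
    Triple("ex:bob", "ex:name", "Bob"),
    Triple("ex:bob", "ex:livesIn", "ex:seoul"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every position that is not None."""
    return [
        t for t in triples
        if subject in (None, t.subject)
        and predicate in (None, t.predicate)
        and obj in (None, t.obj)
    ]


names = match(TRIPLES, predicate="ex:name")        # every "name" statement
about_alice = match(TRIPLES, subject="ex:alice")   # everything said about one thing
```

Leaving a position as `None` acts as a wildcard, which is roughly what a variable does in a query over RDF data.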

Properties of the Web of linked data
Global, distributed dataspace built on a simple set of standards
- RDF, URIs, HTTP
Entities are connected by links
- enables the discovery of new data sources.
Provides for data-coexistence
- Everyone can publish data to the Web of Linked Data
- Everyone can express their personal view on things
- Everybody can use the schemata that they like for this

W3C Linking Open Data project (2007)
- Publish existing open license datasets as linked data
- Interlink things between different data sources

DBpedia
Community effort to extract structured information from Wikipedia.
Provides data about 3.4 million things
- 312,000 persons
- 140,000 organizations
- 413,000 places
- 94,000 music albums
- 49,000 films
- 146,000 species
- …
Provides identifiers for many common things
- http://dbpedia.org/resource/Calgary
Overlaps with many other data sources on the Web
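
An identifier such as the Calgary URI above is meant to be looked up over HTTP. A client usually asks for an RDF serialization through the `Accept` header. The sketch below only builds such a request with Python's standard library and does not send it; `build_rdf_request` is a hypothetical helper, and the choice of `text/turtle` as well as the assumption that the server honours it are not taken from the slides.

```python
from urllib.request import Request


def build_rdf_request(uri, media_type="text/turtle"):
    """Prepare (but do not send) a GET request that asks for RDF instead of HTML."""
    return Request(uri, headers={"Accept": media_type})


request = build_rdf_request("http://dbpedia.org/resource/Calgary")

# Sending it would be: urllib.request.urlopen(request)
# What comes back depends on the server, so no response is shown here.
```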

Uptake in many areas
Uptake in life sciences
- W3C Linking Open Drug Data effort
- Bio2RDF project
- Allen Brain Atlas
Governments, libraries, media industry, …

The structural continuum
The Web of linked data is interwoven with the classic Web.
- Unstructured data: HTML
- Semi-structured data: RDFa embedded in HTML
- Structured data: RDF/XML
Services using named entity recognition to annotate texts with Linked Data URIs
- Open Calais (Thomson Reuters) for news
- Zemanta (startup) for blog posts
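
To make the "semi-structured" middle of the continuum concrete, here is a deliberately tiny sketch of pulling triples out of RDFa-style attributes in HTML. It is not a conforming RDFa processor: `TinyRDFaParser` is a hypothetical class that understands only an `about` attribute for the subject and a `property` attribute on elements with text content, and it ignores prefixes, nesting, and scoping rules. The page snippet and the `ex:` property names are made up.

```python
from html.parser import HTMLParser


class TinyRDFaParser(HTMLParser):
    """Collects (subject, property, text) from `about` and `property` attributes only."""

    def __init__(self):
        super().__init__()
        self.subject = None           # most recent `about` value seen
        self.pending_property = None  # `property` waiting for its text content
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "about" in attributes:
            self.subject = attributes["about"]
        if "property" in attributes:
            self.pending_property = attributes["property"]

    def handle_data(self, data):
        text = data.strip()
        if self.subject and self.pending_property and text:
            self.triples.append((self.subject, self.pending_property, text))
            self.pending_property = None


PAGE = """<div about="http://example.org/people/alice">
  <span property="ex:name">Alice</span> works in
  <span property="ex:city">Seoul</span>.
</div>"""

parser = TinyRDFaParser()
parser.feed(PAGE)
parser.close()
extracted = parser.triples
```

The same page stays readable for people in a browser, while the attributes carry the statements a crawler can pick up.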

Linked data browsers
Provide for navigating between data sources in order to explore the dataspace.
- Tabulator Browser (MIT, USA)
- Marbles (FU Berlin, DE)
- OpenLink RDF Browser (OpenLink, UK)
- Zitgist RDF Browser (Zitgist, USA)
- Disco Hyperdata Browser (FU Berlin, DE)
- Fenfire (DERI, Ireland)

Web of data search engines
Crawl the dataspace and provide best-effort query answers over crawled data.
- Falcons (IWS, China)
- Sig.ma (DERI, Ireland)
- Swoogle (UMBC, USA)
- VisiNav (DERI, Ireland)
- Watson (Open University, UK)

What are the big players doing?
Yahoo! and Google have started to crawl Linked Data in its RDFa serialization as well as Microformats.
Yahoo!
- provides access to crawled data through the Yahoo BOSS API
- is using the data within Yahoo Search Monkey to make search results more useful and visually appealing.
Google
- uses crawled RDF data for its Social Graph API
- uses crawled data to enhance search results snippets for reviews and people.

Identity
Real world objects are identified with multiple URIs
- Coupling of identification and retrieval
- Data-coexistence: everybody can say everything about anything
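
When one real-world object has several URIs, a consumer typically needs to group the URIs that are declared to mean the same thing. The slides do not name a mechanism; the sketch below assumes identity links in the style of `owl:sameAs` and merges them with a small union-find. `merge_identities` and all `example.org` / `example.com` URIs are hypothetical.

```python
def merge_identities(same_as_links):
    """Group URIs connected, directly or transitively, by identity links."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in same_as_links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


# Assumed identity links published by different sources (made-up targets).
LINKS = [
    ("http://dbpedia.org/resource/Calgary", "http://example.org/geo/calgary"),
    ("http://example.org/geo/calgary", "http://example.com/cities/yyc"),
    ("http://example.org/people/alice", "http://example.com/staff/42"),
]

clusters = merge_identities(LINKS)
```

Because the links are followed transitively, the first and third Calgary URIs end up in the same cluster even though no source links them directly.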
Enable Clients to retrieve the Schema
Clients can resolve the URIs that identify vocabulary terms in order to get their RDFS or OWL definitions.
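
As a rough sketch of what "resolving" a term URI involves: for a hash URI the client drops the fragment before making the HTTP request, so many terms share one vocabulary document, while a slash URI is requested as it is. The helper `documents_to_fetch` is hypothetical; the two vocabularies are used only as familiar examples, and how their servers actually respond is not shown.

```python
from collections import defaultdict
from urllib.parse import urldefrag


def documents_to_fetch(term_uris):
    """Map each document URI (term URI without fragment) to the terms it should define."""
    by_document = defaultdict(list)
    for term in term_uris:
        document, _fragment = urldefrag(term)
        by_document[document].append(term)
    return dict(by_document)


TERMS = [
    "http://www.w3.org/2000/01/rdf-schema#label",
    "http://www.w3.org/2000/01/rdf-schema#comment",
    "http://xmlns.com/foaf/0.1/name",
]

plan = documents_to_fetch(TERMS)
```

Grouping by document means a client fetches each vocabulary file once, however many of its terms appear in the data.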

Reuse Terms from Common Vocabularies
Common Vocabularies
- Friend-of-a-Friend for describing people and their social network
- SIOC for describing forums and blogs
- SKOS for representing topic taxonomies
- Organization Ontology for describing the structure of organizations
- GoodRelations for describing products and business entities
- Music Ontology for describing artists, albums, and performances
- Review Vocabulary for representing reviews
Common sources of identifiers (URIs) for real world objects
- LinkedGeoData and Geonames: Locations
- GeneID and UniProt: Life science identifiers
- DBpedia: Wide range of things

Somebody Pays-As-You-Go
The overall data integration effort is split between the data publisher, the data consumer and third parties.
Data Publisher
- publishes data as RDF
- publishes data in a self-descriptive fashion
- sets links and publishes mappings
Third Parties
- set links pointing at your data
- publish mappings to the Web
Data Consumer
- has to do the rest
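
One way to picture "pay-as-you-go" on the consumer side: published term mappings are applied where they exist, and everything else is left untouched until someone provides a mapping. This is a minimal sketch under that assumption; `apply_mappings`, the `example.org` terms, and the single mapping are all hypothetical.

```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Data from two hypothetical sources using different terms for the same idea.
DATA = [
    ("http://example.org/people/alice", FOAF_NAME, "Alice"),
    ("http://example.org/people/bob", "http://example.org/schema/fullName", "Bob"),
    ("http://example.org/people/bob", "http://example.org/schema/shoeSize", "270"),
]

# A mapping someone (publisher or third party) has published; made up for this sketch.
MAPPINGS = {"http://example.org/schema/fullName": FOAF_NAME}


def apply_mappings(triples, mappings):
    """Rewrite predicates that have a known mapping; leave all others as they are."""
    return [(s, mappings.get(p, p), o) for s, p, o in triples]


integrated = apply_mappings(DATA, MAPPINGS)
names = [o for _, p, o in integrated if p == FOAF_NAME]
```

The unmapped `shoeSize` statement survives unchanged, so the data is usable immediately and becomes more integrated as further mappings appear.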

Summary
Linked Data moves the dataspace vision to a global scale and adds the social/community aspect to it.
The Web of Linked Data is growing rapidly
- active deployment communities in different domains
- might have exceeded the critical mass
Great playground for experimentation
- dataspace profiling
- probabilistic and approximate schema mapping
- data fusion, data quality, and trust
- What will the user interfaces look like?
- Will search engines turn into answer engines?