Linked Data Use Cases
DESCRIPTION
This slide deck describes use cases of Linked Data.
TRANSCRIPT
Linked Data & Semantic Web Technology
Linked Data Use Cases
Dr. Myungjin Lee
Agenda
• Introduction to Linked Data
• Linked Data for Cross-Domain
• Linked Geographic Data
• Linked Government Data
• Linked Media Data
• Linked Data for User Generated Content
• Linked Publication Data
• Linked Life Science Data
Introduction to Linked Data
What is Linked Data?
• a method of publishing structured data so that it can be interlinked and become more useful
• based on standard Web technologies such as HTTP, RDF, and URIs
• intended to share information in a way that computers can read automatically
Stack and Requirements for Linked Data
• XML: an elemental syntax for content structure within documents
• RDF: a simple language for expressing data models, which refer to objects ("resources") and their relationships
• RDF Schema: a vocabulary for describing properties and classes of RDF-based resources
• SPARQL: a protocol and query language for semantic web data sources
• URI: a string of characters used to identify a name or a resource
Four Principles of Linked Data
1. Use URIs to identify things.
2. Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents.
3. Provide useful information about the thing when its URI is dereferenced, using standard formats such as RDF/XML.
4. Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.
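Principles 2 and 3 are usually met through HTTP content negotiation: the client sends an Accept header asking for an RDF serialization of the thing a URI names. The Python sketch below only builds such a request and does not send it. The helper name is hypothetical, and the DBpedia URI is used purely as an illustrative HTTP URI.

```python
from urllib.request import Request


def build_dereference_request(uri, media_type="application/rdf+xml"):
    """Build (but do not send) a GET request that asks for RDF about `uri`."""
    return Request(uri, headers={"Accept": media_type}, method="GET")


# Illustrative URI only; any HTTP URI that identifies a thing would do.
request = build_dereference_request("http://dbpedia.org/resource/Seoul")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup; what comes back depends on the server.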
5 Star Linked Data
★ Available on the web (whatever format) but with an open licence, to be Open Data
★★ Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
★★★ All the above, plus: a non-proprietary format (e.g. CSV instead of Excel)
★★★★ All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★ All the above, plus: Link your data to other people’s data to provide context
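Because each level includes all the levels before it, a dataset's rating can be read as the number of consecutive criteria it meets, counted from the first. A minimal Python sketch of that reading follows; the function and parameter names are illustrative and not part of any official scheme.

```python
def star_rating(open_licence_on_web, machine_readable, non_proprietary_format,
                uses_rdf_and_sparql, linked_to_other_data):
    """Count how many consecutive 5-star criteria a dataset meets (0 to 5)."""
    criteria = [
        open_licence_on_web,
        machine_readable,
        non_proprietary_format,
        uses_rdf_and_sparql,
        linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break  # a missed level caps the rating, whatever comes later
        stars += 1
    return stars


# A scanned table under an open licence versus the same data published as CSV.
scanned_table = star_rating(True, False, False, False, False)
csv_file = star_rating(True, True, True, False, False)
print(scanned_table, csv_file)
```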
The Linking Open Data cloud diagram
(Diagram legend: Media, User Generated Content, Publications, Government, Geographic, Cross-Domain, Life Sciences)
Domain                  Number of datasets  Triples         (Out-)Links
Media                   25                  1,841,852,061   50,440,705
Geographic              31                  6,145,532,484   35,812,328
Government              49                  13,315,009,400  19,343,519
Publications            87                  2,950,720,693   139,925,218
Cross-domain            41                  4,184,635,715   63,183,065
Life Sciences           41                  3,036,336,004   191,844,090
User-generated Content  20                  134,127,413     3,449,143
Total                   295                 31,634,213,770  503,998,829
Linked Data for Cross-Domain
DBpedia
• a project aiming to extract structured content from the information created as part of the Wikipedia project
• as of September 2011, more than 3.64 million things, more than 6.5 million interlinks, and over 1 billion pieces of information (RDF triples)
The DBpedia Information Extraction Framework
• Source
  – an abstraction over a source of MediaWiki pages
• WikiParser
  – a parser which transforms a MediaWiki page source into an Abstract Syntax Tree (AST)
• Extractor
  – a mapping from a page node to a graph of statements about it
• Destination
  – an abstraction over a destination of RDF statements
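The four roles above form a small pipeline: pages flow from a source through a parser and an extractor into a destination. The Python toy below only illustrates that flow. It is not the DBpedia framework's own code or API: every class and function name, the regex-based "parser", and the example.org URIs are assumptions made for the sketch.

```python
import re


class ListSource:
    """Source: yields (title, wikitext) pairs, here from an in-memory list."""

    def __init__(self, pages):
        self.pages = pages

    def __iter__(self):
        return iter(self.pages)


def parse_page(title, wikitext):
    """WikiParser stand-in: collect '| key = value' lines into a tiny tree."""
    fields = re.findall(r"^\|\s*(\w+)\s*=\s*(.+?)\s*$", wikitext, flags=re.MULTILINE)
    return {"title": title, "infobox": dict(fields)}


def extract_infobox(node):
    """Extractor: map a page node to statements (triples) about the page."""
    subject = "http://example.org/resource/" + node["title"].replace(" ", "_")
    for key, value in node["infobox"].items():
        yield (subject, "http://example.org/property/" + key, value)


class ListDestination:
    """Destination: receives statements, here by appending them to a list."""

    def __init__(self):
        self.triples = []

    def write(self, triples):
        self.triples.extend(triples)


source = ListSource([
    ("Seoul", "{{Infobox city\n| country = South Korea\n| timezone = KST\n}}"),
])
destination = ListDestination()
for title, wikitext in source:
    destination.write(extract_infobox(parse_page(title, wikitext)))

for triple in destination.triples:
    print(triple)
```

Keeping the source and destination behind small interfaces is what lets the same extractor run over a dump file or a live wiki and write to a file or a triple store.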
Freebase
• a large collaborative knowledge base consisting of metadata composed mainly by its community members
• as of May 2012, approximately 22 million topics
"Freebase is the bridge between the bottom up vision of Web 2.0 collective
intelligence and the more structured world of the semantic web."
OpenCyc
• Cyc
  – an artificial intelligence project that attempts to assemble a comprehensive ontology and knowledge base of everyday common sense knowledge
• OpenCyc
  – mainly taxonomic assertions, not the complex rules available in Cyc
  – 239,000 concepts, 2,093,000 facts, and 69,000 owl:sameAs links to external (non-Cyc) semantic data
  – the RDF-compatible content extracted from OpenCyc using the open source Texai
Linked Geographic Data
GeoNames
• a geographical database available and accessible through various web services, under a Creative Commons attribution license
• over 10,000,000 geographical names corresponding to over 7,500,000 unique features
LinkedGeoData
• an effort to add a spatial dimension to the Web of Data / Semantic Web by publishing the information collected by the OpenStreetMap project according to the Linked Data principles
Dataset                #Triples
Ontology               8 thousand
RelevantNodes          66 million
RelevantWays           65 million
RelevantWayNodes       74 million
RelevantNodePositions  60 million
DBpedia Interlinks     101 thousand
GeoNames Interlinks    487 thousand
Other Geographic Datasets
• Linked Sensor Data
  – an RDF dataset containing expressive descriptions of ~20,000 weather stations in the United States
• U.S. Census
  – basic geographic data for the U.S., the states, counties, cities, ZCTAs, and congressional districts
  – 1,016,219 triples in N3 format
<http://www.rdfabout.com/rdf/usgov/geo/us/sc/counties/hampton_county>
    rdf:type usgovt:County ;
    usgovt:fipsCountyCode "049" ;
    usgovt:fipsStateCountyCode "45:049" ;
    dc:title "Hampton County" ;
    dcterms:isPartOf <http://www.rdfabout.com/rdf/usgov/geo/us/sc> ;
    geo:lat 32.796299 ;
    geo:long -81.131622 ;
    census:population 21386 ;
    census:households 8582 ;
    census:landArea "1449823309 m^2" ;
    census:waterArea "7369890 m^2" ;
    census:details <http://www.rdfabout.com/rdf/usgov/geo/us/sc/counties/hampton_county/censustables> .

<http://www.rdfabout.com/rdf/usgov/geo/us/sc>
    dcterms:hasPart <http://www.rdfabout.com/rdf/usgov/geo/us/sc/counties/hampton_county> .
Linked Government Data
Open Government Data
• By “open” we mean data that is free for anyone to use, re-use, and re-distribute.
• By “government data” we mean data and information produced or commissioned by government or government-controlled entities.
(Diagram labels: Open, Gov, Data, OpenData, OpenGov, DataGov, OpenGovData)
United States
• Data.gov
  – "The purpose of Data.gov is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government."
  – "a repository for all the information the government collects"
  – over 250,000 datasets
• Data-gov Wiki
  – a project investigating open government datasets using semantic web technologies
  – to translate datasets into RDF, to get them linked to the linked data cloud, and to develop interesting applications on linked government data
  – Dataset Statistics
    • 417 RDFized datasets and 6.46 billion RDF triples
    • 35 non-Data.gov datasets and 0.9 billion more RDF triples
United Kingdom
• Data.gov.uk
  – a UK Government project to make available non-personal UK government data as open data
  – over 9,000 datasets
  – the use of Linked Data standards for flexible and easy re-use
  – Datasets
    • Environment, Finance, Legislation, Location, Reference, Statistics, Transport, etc.
All around the world
Country         Official?  Rating  Datasets
Sweden          N          ★★      few
New Zealand     Y          ★★      many
Ireland         Y          ★★★     few
Canada          Y          ★★★     many
United States   Y          ★★★★    many
Spain           N          ★★★★★   few
United Kingdom  Y          ★★★★★   many
Korea           ?          ?       ?
Korea
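• Public Data Portal (공공데이터포털)
  – opens the various public information held by the government to citizens and helps them use it conveniently and easily
  – 1,717 datasets and 242 Open APIs
  – http://www.data.go.kr
• Public DB-pedia (공공 DB 피디아)
  – 24 datasets and 50,184 resources
  – http://lod.data.go.kr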
<rdf:RDF
    xmlns:ns1="http://lod.data.go.kr/sample/schema#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ns0="http://lod.data.go.kr/schema/dataset#">
  <rdf:Description rdf:about="http://lod.data.go.kr/sample/data/DS-0501">
    <ns0:sampleResource rdf:resource="http://lod.data.go.kr/sample/data/DS-0501/SportFacility/SD10209PUEF"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://lod.data.go.kr/sample/data/DS-0501/SportFacility/SD10209PUEF">
    <ns0:prefLabel>자전거체험장</ns0:prefLabel>
    <ns0:nodeLabel>자전거체험장</ns0:nodeLabel>
    <ns1:phone>02-2204-7634</ns1:phone>
    <ns1:name>자전거체험장</ns1:name>
    <ns1:manageOrg>성동구도시관리공단</ns1:manageOrg>
    <ns1:description>남녀노소 모두 편하게 이용할 수 있는 자전거체험장</ns1:description>
    <ns1:address>서울특별시 성동구 마장동 802-2 마장 2 교 ~ 사근램프 사이</ns1:address>
    <rdf:type rdf:resource="http://lod.data.go.kr/sample/schema#SportFacility"/>
  </rdf:Description>
</rdf:RDF>
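Each rdf:Description element in the document above is a subject, and each child element is one statement about it. The Python sketch below reads a shortened copy of that sample with the standard library and lists the triples. It is a simplification that assumes this flat layout only (rdf:about subjects, literal or rdf:resource objects) and is not a general RDF/XML parser; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Shortened copy of the sample record shown on the slide.
SAMPLE = """<rdf:RDF xmlns:ns1="http://lod.data.go.kr/sample/schema#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://lod.data.go.kr/sample/data/DS-0501/SportFacility/SD10209PUEF">
    <ns1:phone>02-2204-7634</ns1:phone>
    <rdf:type rdf:resource="http://lod.data.go.kr/sample/schema#SportFacility"/>
  </rdf:Description>
</rdf:RDF>"""


def simple_triples(rdf_xml):
    """Return (subject, predicate, object) tuples from flat RDF/XML."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree writes tags as "{namespace}local"; dropping the
            # braces joins them into the predicate URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            obj = prop.get(f"{{{RDF_NS}}}resource")
            if obj is None:
                obj = (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


triples = simple_triples(SAMPLE)
for triple in triples:
    print(triple)
```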
Seoul, Korea
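• Seoul Open Data Plaza (서울 열린 데이터 광장)
  – opens the Seoul city government's public information to the private sector and communicates about it, in order to raise public benefit, work efficiency, and transparency, and to create new services and public value through citizens' voluntary participation
  – http://data.seoul.go.kr
• Seoul Open Data Plaza Linked Data Beta service
  – about 13,600 items covering administrative districts (by administrative dong), cultural facilities, and cultural heritage
  – http://lod.seoul.go.kr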
KDATA (Linked Data for Korea)
• open foundational data that implements Linked Data using the W3C's Semantic Web standard technologies
• http://kdata.kr
• http://www.li-st.com
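Domain                                             Triples
Country codes                                      3,899
Entertainment                                      44,278
Administrative districts                           2,969
Elementary, middle, and high schools               126,469
Offices of education                               1,130
Universities                                       2,833
Social enterprises                                 5,539
Seoul open public restrooms                        47,340
Baseball players and teams                         228,872
Subway stations                                    4,450
History (역사)                                      5,392
Administrative data standard terms                 109,101
Hanok villages                                     1,155
Public WiFi installation information               1,671
KDATA classification terms                         808
Traditional markets                                4,535
National parks                                     10,605
Cultural heritage                                  80,156
Public sports facilities                           49,799
Biological taxonomy                                3,256
Cultural facilities                                9,418
Park information and programs                      2,429
Price-stabilization model businesses               16,212
Price-stabilization model business product lists   14,300
Certified public facility products                 6,931
Snow-removal box locations                         39,218
Wild flora and fauna information                   115,099
Wild flora and fauna occurrence information        139,608
Total                                              1,077,472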
Linked Media Data
MusicBrainz
• MusicBrainz
  – a project that aims to create an open content music database
  – information about 750,000 artists, 1 million releases, and 12 million recordings
• LinkedBrainz
  – to help MusicBrainz publish its database as Linked Data
  – mapped to concepts in the Music Ontology
Music Ontology
• main concepts and properties for describing music (e.g. artists, albums, and tracks, but also performances, arrangements, etc.) on the Semantic Web
Linked Data on BBC
• Problems
  – a lot of data (between 1,000 and 1,500 programmes broadcast a day)
  – hand-crafted, customized sites
  – often not maintained
  – often not persistent
• Build upon Open Data Repositories
  – such as MusicBrainz and Wikipedia
(Screenshot annotations: data from Wikipedia, data from MusicBrainz)
BBC Ontologies
• Programmes Ontology
  – every programme brand, series and episode broadcast by the BBC
  – the Programmes Ontology to expose data following the Linked Data approach, enabling the interchange of programme information on the Semantic Web
• Wildlife Ontology
  – a simple vocabulary for describing biological species and related taxa
  – terms for describing the names and ranking of taxa, as well as providing support for describing their habitats, conservation status, and behavioural characteristics, etc.
• Curriculum Ontology
  – a core data model for formally describing the national curricula across the UK
  – to provide a model of the national curricula across the UK
LinkedMDB
• publishing the first open semantic web database for movies, including a large number of interlinks to several datasets
Linked Data for User Generated Content
flickr™ wrappr
• to extend DBpedia with RDF links to photos posted on flickr
• to generate a collection of flickr photos for each of the 1.95 million DBpedia concepts
Revyu.com
• a web site where you can review and rate things
Open Graph Protocol
• to integrate web pages into Facebook's social graph, based on RDFa
<html xmlns:og="http://opengraphprotocol.org/schema/"
      xmlns:fb="http://www.facebook.com/2008/fbml">
  <head>
    <meta property="og:url" content="http://www.imdb.com/title/tt1285016/" />
    <meta property='og:image' content='http://ia.media-imdb.com/…140_.jpg'>
    <meta property='og:type' content='movie' />
    <meta property='fb:app_id' content='115109575169727' />
    <meta property='og:title' content='The Social Network (2010)' />
    <meta property='og:site_name' content='IMDb' />
    ...
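A consumer of such a page reads the meta elements whose property attribute starts with og: and takes their content values. The Python sketch below does this with the standard library's HTML parser on a shortened copy of the markup above. The class name is hypothetical, and the sketch is only one simple way to collect the properties, not how Facebook itself processes pages.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect og:* properties from <meta property=... content=...> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or ""
        if name.startswith("og:") and "content" in attributes:
            self.properties[name] = attributes["content"]


# Shortened copy of the IMDb markup shown on the slide.
PAGE = """<html><head>
<meta property="og:url" content="http://www.imdb.com/title/tt1285016/" />
<meta property='og:type' content='movie' />
<meta property='fb:app_id' content='115109575169727' />
<meta property='og:title' content='The Social Network (2010)' />
<meta property='og:site_name' content='IMDb' />
</head></html>"""

parser = OpenGraphParser()
parser.feed(PAGE)
for name, value in parser.properties.items():
    print(name, "=", value)
```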
Linked Life Science Data
Bio2RDF
• a biological database using Semantic Web technologies to provide interlinked life science data
Linked Life Data
• a semantic data integration platform for the bio-medical domain
• Search and explore over RDF statements from various sources including UniProt, PubMed, EntrezGene and so forth
Select drugs related to asthma that are linked to a molecular interaction
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX biopax2: <http://www.biopax.org/release/biopax-level2.owl#>
PREFIX uniprot: <http://purl.uniprot.org/core/>
PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/>
SELECT DISTINCT ?fullname ?drugname ?indication
WHERE {
  ?physicalEntity skos:semanticRelation ?protein .
  ?protein uniprot:recommendedName ?name .
  ?name uniprot:fullName ?fullname .
  ?target skos:exactMatch ?protein .
  ?drug drugbank:target ?target .
  ?drug drugbank:genericName ?drugname .
  ?drug drugbank:indication ?indication .
  FILTER(regex(?indication, "asthma", "i"))
}
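Under the SPARQL Protocol, a query like this can be sent to an endpoint as an HTTP GET with the query text in a URL-encoded query parameter. The Python sketch below only builds such a URL, using a shortened version of the query. The endpoint address is a placeholder, not the real Linked Life Data endpoint, and the helper name is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint, assumed for illustration only.
ENDPOINT = "http://example.org/sparql"

# Shortened version of the slide's query: drugs whose indication mentions asthma.
QUERY = """PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/>
SELECT DISTINCT ?drugname ?indication
WHERE {
  ?drug drugbank:genericName ?drugname .
  ?drug drugbank:indication ?indication .
  FILTER(regex(?indication, "asthma", "i"))
}"""


def sparql_get_url(endpoint, query):
    """Return a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)

# Decoding the URL again should give back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```

A client would typically also send an Accept header such as application/sparql-results+json to choose the result format.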
Dr. Myungjin Lee
e-Mail : [email protected]
Twitter : http://twitter.com/MyungjinLee
Facebook : http://www.facebook.com/mjinlee
SlideShare : http://www.slideshare.net/onlyjiny/
Thanks for your attention.