Hideaki Takeda, Fumi Kato / National Institute of Informatics
LOD Application Exemplar - A case study: LODAC Museum
Hideaki Takeda, Fumi Kato
National Institute of Informatics
takeda@nii.ac.jp
2012 INTERNATIONAL ASIAN SUMMER SCHOOL IN LINKED DATA IASLOD 2012, August 13-17, 2012, KAIST, Daejeon, Korea
Aim of this talk
• How to plan, design, and implement LOD?
• Learn from the case
LODAC Project
• Open Social Semantic Web Platform for Academic Resources
  – Providing platforms for Linked Open Data
  – Practicing data accumulation and publishing
• Areas of interest
  – Museum information
  – Geographical information, especially geographical names
  – Local information
  – Taxonomic information on species
  – …
http://lod.ac/
Linked Open Data Initiative
http://linkedopendata.jp/
• Non Profit Organization
  – (Under application for approval)
• Academia + IT People + local people
• Aim: facilitate LOD activities among local people
Museum data as LOD
• The state of the art of museum information in Japan (nearly 6,000 museums in Japan)
  – Distributed
    • Self maintained
    • Isolated
  – Opaque
    • Self designed
    • Messy
• Aggregating and associating museum information
  – LODAC-Museum
LODAC Museum – Main work
• Gathering of data
  – Thesaurus, museum collections, etc.
• Standardization of data
  – Representing data from different sources in a single, uniform form
• Integration of data
  – Identifying data
  – Associating the same data
• Consuming of data
LODAC Museum Architecture
Gathering of data
Standardization of data
Integration of data
Consuming of data
Gathering data
• No museums publish data as LOD!
• We use data published as Web pages
  – Scrape and translate data
  – License is not clear
    • It is a serious problem
    • We need permission from every site in principle
    • We got permission from some data publishers, not all
Dataset
Type | No. | Data source
Art work (lodac:Work) | ca. 80,000 | Catalog of the collections of 3 National Art Museums (25,180), National Museum of Western Art (4,373), Tokushima Pref. Art Museum (18,482) … over 100 museums; Database for National Treasure & Important Cultural Property of National Designated (915); The Japanese Art Thesaurus (266)
Specimen (lodac:Speciment) | ca. 1,690,000 | (100+ Museum collections) Science Net (National Science Museum)
Person (foaf:Person) | ca. 8,800 | The Japanese Art Thesaurus
Facilities (incl. Museum) | ca. 200,000 | The Japanese Art Thesaurus; Cultural Heritage Online; GIS data, National and Regional Planning Bureau
Extracting collection data from museum websites
[Figure: collection pages on museum websites are extracted into Property / Value tables]
Standardization of data
Re-organized common metadata.
[Figure: Raw Data → Re-organized Metadata, e.g. dc:title, crm:P45_consists_of, skos:prefLabel, lodac:era, ....]
Current organization policies
・ Use existing metadata
・ Define own metadata.
Namespaces
Prefix Metadata Name
crm CIDOC-CRM
dc11 Dublin Core 1.1
dc DCMI Terms
skos Simple Knowledge Organization System
rdfs Resource Description Framework Schema
foaf Friend of a Friend
rda2 Resource Description and Access
lodac LODAC Project
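The prefixes above can be kept in a small lookup table when generating or reading data. The following is a minimal Python sketch, assuming only the namespace URIs that appear in the RDF/XML and SPARQL examples of this talk (crm and rda2 are left out because their URIs are not given on the slides). The helper name `expand` is hypothetical.

```python
# Prefix table; URIs are the ones used in the RDF/XML and SPARQL examples of this talk.
NAMESPACES = {
    "dc": "http://purl.org/dc/terms/",
    "dc11": "http://purl.org/dc/elements/1.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "lodac": "http://lod.ac/ns/lodac#",
}


def expand(prefixed_name):
    """Expand a prefixed name such as 'dc:title' into a full URI."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in NAMESPACES:
        raise KeyError("unknown prefix: " + prefix)
    return NAMESPACES[prefix] + local


if __name__ == "__main__":
    print(expand("dc:title"))
    print(expand("lodac:genre"))
```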
Metadata schema for works
lodac:Work | Property
Genre | lodac:genre
Type of cultural assets | lodac:culturalAssets
Creator | dc:creator / dc11:creator
Nationality | crm:P7_took_place_at
Title | dc:title / skos:prefLabel
Title Pronunciation (yomi) | dc:title @ja-hrkt / skos:altLabel
Title in English | dc:title @en / skos:altLabel
Inscription | crm:P62I_is_depicted_by
Seal | crm:P65_shows_visual_item
No. of parts | crm:P57_has_number_of_parts
Collection | dc:isPartOf
Created year | dc:created
Estimated starting year | lodac:estimatedStartYear
Material | dc:medium / crm:P45_consists_of
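The schema can be read as a mapping from catalogue fields to RDF properties. The following is a minimal Python sketch, assuming a scraped record is a dict keyed by the field names in the left column. The function name `to_triples`, the sample record, and the example.org URI are made up for illustration. Where the schema lists two properties, this sketch simply emits the value under both, which is an assumption about how the pairs are meant to be used.

```python
# Field -> properties, copied from a subset of the schema for lodac:Work.
WORK_SCHEMA = {
    "Genre": ["lodac:genre"],
    "Creator": ["dc:creator", "dc11:creator"],
    "Title": ["dc:title", "skos:prefLabel"],
    "Collection": ["dc:isPartOf"],
    "Created year": ["dc:created"],
    "Material": ["dc:medium", "crm:P45_consists_of"],
}


def to_triples(subject, record):
    """Turn one scraped record into (subject, property, value) triples."""
    triples = []
    for field, value in record.items():
        # Fields outside the schema are skipped rather than guessed.
        for prop in WORK_SCHEMA.get(field, []):
            triples.append((subject, prop, value))
    return triples


if __name__ == "__main__":
    record = {"Title": "Sample work", "Material": "Ink on paper"}
    for triple in to_triples("http://example.org/work/1", record):
        print(triple)
```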
Integrating Data
• How to integrate data from different sources – sharing of responsibility
• Each source is responsible for its data
  – Identifying IDs for data and managing data with the IDs
• LODAC is only responsible for integration
  – Assigning original IDs and associating other IDs to them
[Figure: an ID-resource (Creator’s information) linked by dc:references to Ref-resources (Creator’s reference)]
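This division of responsibility can be sketched as a small registry: each source keeps its own identifiers, and the integrator only mints its own ID and records which source IDs refer to it. This is an illustrative Python sketch, not LODAC's implementation. The class name, the counter-based IDs, and the source identifiers are assumptions; the http://lod.ac/id/ pattern follows the URIs shown later in the talk.

```python
class IdRegistry:
    """Mint integrated IDs and associate source-side IDs with them."""

    def __init__(self, base="http://lod.ac/id/", start=1):
        self.base = base
        self.next_number = start
        self.by_source_id = {}   # (source, source_id) -> integrated URI
        self.references = {}     # integrated URI -> list of (source, source_id)

    def mint(self):
        """Create a new integrated URI with no associated source IDs yet."""
        uri = self.base + str(self.next_number)
        self.next_number += 1
        self.references[uri] = []
        return uri

    def associate(self, uri, source, source_id):
        """Record that a source's ID describes the integrated entity."""
        self.by_source_id[(source, source_id)] = uri
        self.references[uri].append((source, source_id))

    def lookup(self, source, source_id):
        """Return the integrated URI for a source ID, or None if unknown."""
        return self.by_source_id.get((source, source_id))


if __name__ == "__main__":
    registry = IdRegistry(start=359)
    uri = registry.mint()
    registry.associate(uri, "source-a", "A-001")  # hypothetical source IDs
    registry.associate(uri, "source-b", "B-042")
    print(uri, registry.references[uri])
```

The raw data stays under the source-side IDs; only the association lives in the registry.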
Integrating Data
[Figure: Data from Source A and Data from Source B (raw data for entities) are each linked by dc:references to the integrated data, which holds the minimum data to identify entities: Work, Museum, and Creator, connected by dc:creator and crm:P55_has_current_location]
Integration of Person Data
• Matching of Creators
  – Base: List of Artists from Thesaurus of Japanese Art
  – Target: Creators of collection in museums + DBpedia
  – Method: String match of names
  – Results: Links from artist nodes to work nodes are added
[Figure: LODAC data (basic information for creators, link to work) with links to DBpedia]
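String matching of creator names can be sketched as follows. The normalization step (Unicode NFKC plus whitespace removal) is an assumption added for illustration; the slides only say that names are matched as strings. The function names are hypothetical. In the usage example, 下村観山, http://lod.ac/id/359 and http://lod.ac/id/20029 come from the RDF example in this talk, and the second work is a placeholder.

```python
import unicodedata


def normalize(name):
    """Normalize width variants and drop all whitespace."""
    return "".join(unicodedata.normalize("NFKC", name).split())


def match_creators(base_artists, works):
    """Link artist URIs to work URIs whose creator string matches exactly.

    base_artists: dict of artist name -> artist URI (the base list)
    works: iterable of (work URI, creator name) pairs (the target)
    """
    index = {normalize(name): uri for name, uri in base_artists.items()}
    links = []
    for work_uri, creator in works:
        artist_uri = index.get(normalize(creator))
        if artist_uri is not None:
            links.append((artist_uri, "lodac:creates", work_uri))
    return links


if __name__ == "__main__":
    artists = {"下村観山": "http://lod.ac/id/359"}
    works = [
        ("http://lod.ac/id/20029", "下村 観山"),
        ("http://example.org/work/2", "unknown creator"),  # placeholder
    ]
    print(match_creators(artists, works))
```

Exact matching is cheap but cannot distinguish different people with the same name or join different spellings of one name, so the resulting links are candidates rather than certainties.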
Integrating Data
Integrate Item | Source | Amount of Data | Integration Data
Facilities | A. Japanese Art Thesaurus | 648 | 77
 | B. Cultural Heritage Online | 915 |
Title of important cultural properties | A. Japanese Art Thesaurus (Art work) | 3,800 | 74
 | B. DB for National Treasure (Art work) | 10,115 |
Creator information and Work Title | A. Japanese Art Thesaurus (Creator) | 1,332 | 15,020
 | B. All of art work (Work title string) | 61,861 |
Creator name | A. Japanese Art Thesaurus (Creator) | 1,332 | 615
 | B. All of art work title (using creator name) | 61,861 |
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:lodac="http://lod.ac/ns/lodac#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <foaf:Person rdf:about="http://lod.ac/id/359">
    <lodac:creates rdf:resource="http://lod.ac/id/20029"/>
    <lodac:creates rdf:resource="http://lod.ac/id/20128"/>
    <lodac:creates rdf:resource="http://lod.ac/id/20755"/>
    <lodac:creates rdf:resource="http://lod.ac/id/24768"/>
    <lodac:creates rdf:resource="http://lod.ac/id/26732"/>
    <!-- …… -->
    <dc:references rdf:resource="http://ja.dbpedia.org/resource/下村観山"/>
    <dc:references rdf:resource="http://lod.ac/ref/359"/>
    <rdfs:label xml:lang="ja">下村観山</rdfs:label>
    <skos:prefLabel xml:lang="ja">下村観山</skos:prefLabel>
    <foaf:name xml:lang="ja">下村観山</foaf:name>
  </foaf:Person>
</rdf:RDF>
Publishing data as RDF
• ID-resource URI (own address): http://lod.ac/id/359
• Ref-resource URI: http://lod.ac/ref/359
• External link: DBpedia Japanese
• Links to the URIs of her/his works
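A consumer can read such a description with any XML or RDF parser. The following is a minimal Python sketch using only the standard library, with a shortened copy of the record for http://lod.ac/id/359 embedded as a string. A real client would more likely use an RDF library, and the function name `describe_person` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LODAC = "http://lod.ac/ns/lodac#"
DC = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Shortened copy of the published record (fewer works and labels).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:lodac="http://lod.ac/ns/lodac#"
         xmlns:dc="http://purl.org/dc/terms/">
  <foaf:Person rdf:about="http://lod.ac/id/359">
    <lodac:creates rdf:resource="http://lod.ac/id/20029"/>
    <lodac:creates rdf:resource="http://lod.ac/id/20128"/>
    <dc:references rdf:resource="http://lod.ac/ref/359"/>
    <foaf:name xml:lang="ja">下村観山</foaf:name>
  </foaf:Person>
</rdf:RDF>"""


def describe_person(rdf_xml):
    """Extract the ID, name, work links and references of the first foaf:Person."""
    root = ET.fromstring(rdf_xml)
    person = root.find("{%s}Person" % FOAF)
    resource = "{%s}resource" % RDF
    return {
        "id": person.get("{%s}about" % RDF),
        "name": person.findtext("{%s}name" % FOAF),
        "works": [e.get(resource) for e in person.findall("{%s}creates" % LODAC)],
        "references": [e.get(resource) for e in person.findall("{%s}references" % DC)],
    }


if __name__ == "__main__":
    print(describe_person(SAMPLE))
```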
LODAC Museum Architecture
Gathering of data
Standardization of data
Integration of data
Consuming of data
LODAC Applications
• Photo BURARI Pro
• Yokohama Art Spot
• Go2Museum
• http://lod.ac/apps
Photo BURARI Pro
Photo App with SPARQL
(C)ATR-Promotions,Inc
Photo BURARI Pro
(C)ATR-Promotions,Inc
• SPARQL Endpoints
  – DBpedia
  – Linked Geo Data
  – LODAC
• Other data source
  – Sinsai.info
• Using JSON Result
  – JSON Framework for Objective C
An example in Objective C
// Placeholder: the slide does not give the endpoint URL.
NSString *endpoint = @"http://example.org/sparql";

// NW_lat, NW_long, SE_lat and SE_long stand for the bounding-box coordinates.
NSString *sparql =
    @"PREFIX dct: <http://purl.org/dc/terms/> "
    @"PREFIX omgeo: <http://www.ontotext.com/owlim/geo#> "
    @"PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> "
    @"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
    @"SELECT distinct ?link ?title ?lat ?long "
    @"WHERE { "
    @"  ?link dct:references ?ref . "
    @"  ?ref rdfs:label ?title . "
    @"  ?ref geo:lat ?lat . "
    @"  ?ref geo:long ?long . "
    @"  ?ref omgeo:within(NW_lat NW_long SE_lat SE_long) . "
    @"} LIMIT 30";

// Percent-encode the query so that it can be sent as a URL parameter.
NSString *query = (NSString *)CFURLCreateStringByAddingPercentEscapes(
    kCFAllocatorDefault, (CFStringRef)sparql, NULL,
    CFSTR(";,/?:@=+$#"), kCFStringEncodingUTF8);

NSURL *url = [NSURL URLWithString:
    [NSString stringWithFormat:@"%@?query=%@", endpoint, query]];
NSMutableURLRequest *req = [NSMutableURLRequest requestWithURL:url];
// Ask the endpoint for SPARQL results in JSON.
[req setValue:@"application/sparql-results+json" forHTTPHeaderField:@"Accept"];

NSURLResponse *resp;
NSError *err;
NSData *data = [NSURLConnection sendSynchronousRequest:req
                                     returningResponse:&resp
                                                 error:&err];
NSString *result = [[NSString alloc] initWithData:data
                                         encoding:NSUTF8StringEncoding];
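The body returned for the `application/sparql-results+json` Accept header follows the W3C SPARQL Query Results JSON Format: variable names under `head.vars` and one object per row under `results.bindings`. The following is a minimal Python sketch of flattening such a response into plain rows. The sample response is invented for illustration (its values are not real LODAC data), and `rows` is a hypothetical helper name.

```python
import json

# Invented response in the SPARQL Query Results JSON Format.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["link", "title", "lat", "long"]},
  "results": {"bindings": [
    {"link": {"type": "uri", "value": "http://example.org/id/1"},
     "title": {"type": "literal", "value": "Example Museum"},
     "lat": {"type": "literal", "value": "35.0"},
     "long": {"type": "literal", "value": "139.0"}}
  ]}
}
"""


def rows(response_text):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    data = json.loads(response_text)
    variables = data["head"]["vars"]
    result = []
    for binding in data["results"]["bindings"]:
        # A variable may be unbound in a row, so fall back to None.
        result.append({v: binding.get(v, {}).get("value") for v in variables})
    return result


if __name__ == "__main__":
    for row in rows(SAMPLE_RESPONSE):
        print(row["title"], row["lat"], row["long"])
```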
Yokohama Art Spot
– Application using museum and local data
– Data related to art in Yokohama
  • Collections
  • Events
  • Q&A
http://lod.ac/apps/yas/
LODAC Museum × Yokohama Art LOD × PinQA
System Architecture
[Figure: Yokohama Art Spot retrieves data from three sources: LODAC Museum (Work, Institution, Artist), Yokohama Art LOD (Artist, Institution, Event), and PinQA (Question, Answer, User), using SPARQL queries with JSON and XML results]
‣ Python + SPARQLWrapper
‣ Geolocation
PREFIX ical: <http://www.w3.org/2002/12/cal/icaltzd#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX event: <http://lod.ac/ns/event#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX lodacid: <http://lod.ac/id/>
PREFIX omgeo: <http://www.ontotext.com/owlim/geo#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>

SELECT distinct ?event ?lat ?long ?title ?location_name ?location ?fee ?dtstart ?dtend
WHERE {
  ?event a event:Event ;
         rdfs:label ?title ;
         event:fee ?fee ;
         ical:location ?location ;
         ical:dtstart ?dtstart ;
         ical:dtend ?dtend .
  ?location rdfs:label ?location_name ;
            dc:references ?locRef .
  ?locRef omgeo:within(%(NE_lat)s %(NE_long)s %(SW_lat)s %(SW_long)s) ;
          vcard:postal-code ?postalcode ;
          geo:lat ?lat ;
          geo:long ?long .
  FILTER (
    (?dtstart > "%(dtstart)s"^^xsd:dateTime && ?dtstart < "%(dtend)s"^^xsd:dateTime) ||
    (?dtend > "%(dtstart)s"^^xsd:dateTime && ?dtend < "%(dtend)s"^^xsd:dateTime) ||
    (?dtstart < "%(dtstart)s"^^xsd:dateTime && ?dtend > "%(dtend)s"^^xsd:dateTime)
  )
}
ORDER BY (omgeo:distance(?lat, ?long, %(C_lat)s, %(C_long)s))
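The `%(name)s` placeholders in this query are Python %-format fields, filled from a dictionary before the query is sent. The following is a minimal sketch with a shortened template. The coordinates are illustrative values and `build_query` is a hypothetical helper, not code from the application. Values are converted to numbers first, since plain string interpolation into SPARQL is otherwise open to injection.

```python
# Shortened template using the same placeholder style as the query above.
TEMPLATE = """PREFIX omgeo: <http://www.ontotext.com/owlim/geo#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?ref ?lat ?long
WHERE {
  ?ref omgeo:within(%(NE_lat)s %(NE_long)s %(SW_lat)s %(SW_long)s) ;
       geo:lat ?lat ;
       geo:long ?long .
}"""


def build_query(ne, sw):
    """Fill the bounding-box placeholders from (lat, long) pairs.

    float() raises ValueError for non-numeric input, so arbitrary
    strings never reach the query text.
    """
    params = {
        "NE_lat": float(ne[0]), "NE_long": float(ne[1]),
        "SW_lat": float(sw[0]), "SW_long": float(sw[1]),
    }
    return TEMPLATE % params


if __name__ == "__main__":
    # Illustrative coordinates only.
    print(build_query((35.5, 139.7), (35.4, 139.6)))
```

The resulting string is what would be handed to SPARQLWrapper's `setQuery` before running the query against the endpoint.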
PREFIX ical: <http://www.w3.org/2002/12/cal/icaltzd#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX event: <http://lod.ac/ns/event#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX lodacid: <http://lod.ac/id/>
PREFIX dc11: <http://purl.org/dc/elements/1.1/>

SELECT *
WHERE {
  ?link a event:Event ;
        rdfs:label ?title ;
        event:fee ?fee ;
        ical:categories ?cat ;
        ical:location %(museum_id)s ;
        ical:dtstart ?dtstart ;
        ical:dtend ?dtend .
  ?cat dc11:title ?category .
  OPTIONAL {
    ?link event:Credit ?crd .
    ?crd dc11:description ?credit .
  }
}
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX dc11: <http://purl.org/dc/elements/1.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX lodac: <http://lod.ac/ns/lodac#>
PREFIX lodacid: <http://lod.ac/id/>

SELECT ?link ?title ?creator ?created ?genre ?material ?size
WHERE {
  %(museum_id)s lodac:isProviderOf ?link .
  ?link rdfs:label ?title ;
        dc:references ?workRef .
  ?workRef lodac:genre %(genre)s ;
           dc11:creator ?creator ;
           dc:medium ?material ;
           dc:extent ?size .
  OPTIONAL {
    ?workRef dc:created ?created .
  }
}
LIMIT 100
Go2Museum
http://160.193.95.58/~ueda/go2museum/
Museum data from various web sites
[Figure: components LODAC Museum, LODAC Location, NDL Search, CiNii, Yahoo! Location, and Google Web/Map/Route, connected by "Link" and "Search" arrows]
Twitter: @go2museum
• “Today’s museum”
• Recommendation based on lat & long of tweets
Summary
• A life cycle of data is described
  – Scraping, standardizing, integrating, and publishing
• Important issues
  – Recognizing data
  – Designing schema
    • Good for data
    • Good for RDF Store and SPARQL
  – Developing applications
    • More people can be involved
    • Next cycle of data