data-mining the semantic web @tcd
Data-mining the Semantic Web and spatially visualising the results
DAH workshop, Trinity College Dublin, 27 May 2015
@flynam @bilusaurus
Workshop overview
• Morning session: Data-mining
  – Open Data
  – Linked Data
  – Linked Open Data implementation
  – Semantic Web and ontologies
  – Hands-on practical exercises
Workshop overview
• Afternoon session: Data visualisation
  – Data visualisation concepts introduction
  – Web maps and geo-tagging
  – Hands-on practical
  – Interpretations
  – Hermeneutic circle
But first, a very quick survey
• Your occupation
  – UG student
  – PG student
  – Professional academic
  – Non-academic
A quick survey
• Your age group
  – Under 16
  – 16-24
  – 25-34
  – 35-44
  – 45-54
  – 55 and over
A quick survey
• How familiar are you with Open Access?
  – 1 – Not familiar at all
  – 2
  – 3
  – 4
  – 5 – Very familiar
A quick survey
• How familiar are you with Open Data?
  – 1 – Not familiar at all
  – 2
  – 3
  – 4
  – 5 – Very familiar
A quick survey
• How familiar are you with Linked Data?
  – 1 – Not familiar at all
  – 2
  – 3
  – 4
  – 5 – Very familiar
A quick survey
• How familiar are you with the Semantic Web?
  – 1 – Not familiar at all
  – 2
  – 3
  – 4
  – 5 – Very familiar
A quick survey
• Have you ever published Open Data?
  – Yes
  – No
A quick survey
• Have you ever consumed Linked Open Data services?
  – Yes
  – No
A quick survey
• Please fill in your…
  – Name
  – Email address
Don’t worry – I’m not going to pass them on to anyone
From the horse’s mouth
(source: www.ted.com/talks/tim_berners_lee_on_the_next_web)
Terminology
• Open Access
• Open Data
• Big Data
• The web of data
• The Semantic Web
• Linked Data
• Data mining
Asking questions of digital datasets
Terminology
Open Access
Terminology
Design by Julie Beck for the Harvard University Neuroinformatics dept (source: www.juliebcreative.com/portfolio/open-data-logo/)
Linked Data
Terminology
The linkages between the major Linked Data datasets (source: lod-cloud.net)
Big Data
Terminology
Wordle of terms associated with Big Data activity (source: sfdata.startupweekend.org)
5 Stars of Open Data
★ Put your data online under an open license
★★ Make it structured (e.g. as an Excel file)
★★★ Use non-proprietary formats (e.g. XML and not Excel)
★★★★ Use URIs to identify resources
★★★★★ Link your data to external datasets
The RDF Triple
A Triple Example
‘…the boy’s name is Tom…’
subject: the boy
predicate: name
object: Tom
Triple Linking
‘…Tom is short for Thomas…’
subject: Tom
predicate: is short for
object: Thomas
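The two statements above can be sketched in code as plain (subject, predicate, object) tuples. This is only an illustration: the identifiers "boy", "hasName" and "isShortFor" are made-up labels rather than real URIs, and the helper names are hypothetical. It shows how the object of one triple becomes the subject of the next, which is what lets triples link into a graph.

```python
# Each triple is a (subject, predicate, object) tuple.
# The identifiers are illustrative labels, not real URIs.
triples = [
    ("boy", "hasName", "Tom"),
    ("Tom", "isShortFor", "Thomas"),
]


def objects_of(subject, predicate, graph):
    """Return every object that completes the pattern (subject, predicate, ?)."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def follow(subject, predicates, graph):
    """Walk a chain of predicates, starting from a single subject."""
    frontier = [subject]
    for predicate in predicates:
        frontier = [o for node in frontier for o in objects_of(node, predicate, graph)]
    return frontier


# "Tom" is the object of the first triple and the subject of the second,
# so following both predicates leads from the boy to the full name.
full_names = follow("boy", ["hasName", "isShortFor"], triples)
print(full_names)
```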
Graph data
Serialising RDF
• Turtle
• JSON
• RDF/XML
• N-Triples
RDF Turtle
@base <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rel: <http://www.perceive.net/schemas/relationship/> .

<green-goblin>
    rel:enemyOf <spiderman> ;
    a foaf:Person ;    # in the context of the Marvel universe
    foaf:name "Green Goblin" .

<spiderman>
    rel:enemyOf <green-goblin> ;
    a foaf:Person ;
    foaf:name "Spiderman", "Человек-паук"@ru .
As N-Triples
<http://example.org/green-goblin> <http://www.perceive.net/schemas/relationship/enemyOf> <http://example.org/spiderman> .
<http://example.org/green-goblin> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/green-goblin> <http://xmlns.com/foaf/0.1/name> "Green Goblin" .
<http://example.org/spiderman> <http://www.perceive.net/schemas/relationship/enemyOf> <http://example.org/green-goblin> .
<http://example.org/spiderman> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/spiderman> <http://xmlns.com/foaf/0.1/name> "Spiderman" .
<http://example.org/spiderman> <http://xmlns.com/foaf/0.1/name> "\u0427\u0435\u043B\u043E\u0432\u0435\u043A-\u043F\u0430\u0443\u043A"@ru .
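As a rough sketch of how one N-Triples line could be produced by hand, the snippet below writes a language-tagged literal with each non-ASCII character escaped as a \u sequence for its Unicode code point (not for its UTF-8 bytes). The helper names are hypothetical; in practice an RDF library would normally do this, and N-Triples also permits the characters to be written directly as UTF-8.

```python
def escape_literal(text):
    """Escape a string for use inside a double-quoted N-Triples literal."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\r":
            out.append("\\r")
        elif code > 0x7F:
            # One escape per code point: \uXXXX, or \UXXXXXXXX above U+FFFF.
            out.append("\\u%04X" % code if code <= 0xFFFF else "\\U%08X" % code)
        else:
            out.append(ch)
    return "".join(out)


def iri(value):
    """Wrap an absolute IRI in angle brackets."""
    return "<%s>" % value


def literal(text, lang=None):
    """Build a plain or language-tagged literal."""
    return '"%s"%s' % (escape_literal(text), "@" + lang if lang else "")


def triple_line(subject, predicate, obj):
    """Join three already-formatted terms into one N-Triples statement."""
    return "%s %s %s ." % (subject, predicate, obj)


line = triple_line(
    iri("http://example.org/spiderman"),
    iri("http://xmlns.com/foaf/0.1/name"),
    literal("Человек-паук", "ru"),
)
print(line)
```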
As JSON
{
  "http://example.org/green-goblin": {
    "http://www.perceive.net/schemas/relationship/enemyOf": [
      {"type": "uri", "value": "http://example.org/spiderman"}
    ],
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
      {"type": "uri", "value": "http://xmlns.com/foaf/0.1/Person"}
    ],
    "http://xmlns.com/foaf/0.1/name": [
      {"type": "literal", "value": "Green Goblin"}
    ]
  },
  "http://example.org/spiderman": {
    "http://www.perceive.net/schemas/relationship/enemyOf": [
      {"type": "uri", "value": "http://example.org/green-goblin"}
    ],
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
      {"type": "uri", "value": "http://xmlns.com/foaf/0.1/Person"}
    ],
    "http://xmlns.com/foaf/0.1/name": [
      {"type": "literal", "value": "Spiderman"},
      {"type": "literal", "value": "\u0427\u0435\u043b\u043e\u0432\u0435\u043a-\u043f\u0430\u0443\u043a", "lang": "ru"}
    ]
  }
}
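The JSON form nests subjects, then predicates, then a list of objects, so it can be walked back into triples with nothing more than a JSON parser. A minimal sketch, using a trimmed copy of the data above and a hypothetical helper called iter_triples:

```python
import json

# A trimmed copy of the structure shown above:
# subject -> predicate -> list of object descriptions.
rdf_json = r"""
{
  "http://example.org/spiderman": {
    "http://xmlns.com/foaf/0.1/name": [
      {"type": "literal", "value": "Spiderman"},
      {"type": "literal", "value": "\u0427\u0435\u043b\u043e\u0432\u0435\u043a-\u043f\u0430\u0443\u043a", "lang": "ru"}
    ]
  }
}
"""


def iter_triples(graph):
    """Yield (subject, predicate, object-description) for every statement."""
    for subject, predicates in graph.items():
        for predicate, objects in predicates.items():
            for obj in objects:
                yield subject, predicate, obj


graph = json.loads(rdf_json)

# Collect each name with its language tag (None when the literal has no tag).
names = [(obj["value"], obj.get("lang")) for _, _, obj in iter_triples(graph)]
print(names)
```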
As RDF/XML
<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:ns0="http://www.perceive.net/schemas/relationship/">

  <foaf:Person rdf:about="http://example.org/green-goblin">
    <ns0:enemyOf>
      <foaf:Person rdf:about="http://example.org/spiderman">
        <ns0:enemyOf rdf:resource="http://example.org/green-goblin"/>
        <foaf:name>Spiderman</foaf:name>
        <foaf:name xml:lang="ru">Человек-паук</foaf:name>
      </foaf:Person>
    </ns0:enemyOf>
    <foaf:name>Green Goblin</foaf:name>
  </foaf:Person>

</rdf:RDF>
Visualised as a Graph
Triplestores and Infrastructure
A server farm (source: www.cirrusinsight.com)
Practical: Making RDF
http://www.franklynam.com/blog.aspx?id=85
Q: Create RDF representations of yourself and your relationships
The Semantic Web and Ontologies
The stages of the Web (source: urenio.org)
Ontological Classes and Properties
The British Museum data mapping onto the CIDOC CRM (source: confluence.ontotext.com/display/ResearchSpace/BM+Mapping)
The CIDOC CRM basic entity types and their relationships (source: www.cidoc-crm.org/)
Vocabularies
Graph data
Minna Sundberg (source: www.sssscomic.com/comic.php?page=196)
Querying using SPARQL
SELECT *
WHERE {
  ?s ?p ?o
} LIMIT 10
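A query like this is usually sent to an endpoint over HTTP. The sketch below prepares (but deliberately does not send) a GET request following the SPARQL Protocol, with the query passed in a `query` parameter and JSON results requested through the Accept header. It uses the linkedarc.net endpoint named elsewhere in this workshop as the target; the function name is hypothetical, and individual endpoints may expect other parameters or result formats.

```python
from urllib.parse import urlencode
from urllib.request import Request

QUERY = """
SELECT *
WHERE {
  ?s ?p ?o
} LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Prepare, without sending, a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialisation of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request("http://linkedarc.net/sparql", QUERY)
print(request.full_url)

# To actually run the query, pass the request to urllib.request.urlopen()
# and decode the response body with json.loads().
```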
More complex SPARQL
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX letters1916: <http://letters1916.linkedarc.net/ontology/>
PREFIX letters1916data: <http://letters1916.linkedarc.net/data/>
PREFIX schema: <http://schema.org/>

SELECT DISTINCT ?letter ?letterName ?recipientPostalAddressName ?recipientLongitude ?recipientLatitude
WHERE {
  ?letter rdf:type letters1916:Letter ;
          schema:name ?letterName ;
          letters1916:recipientLocation ?recipientPostalAddress .

  ?recipientPostalAddress schema:addressRegion ?recipientPostalAddressRegion ;
                          schema:name ?recipientPostalAddressName .
  FILTER regex(?recipientPostalAddressRegion, 'Galway', 'i')

  ?recipientPlace schema:address ?recipientPostalAddress ;
                  schema:geo ?recipientGeoCoordinates .

  ?recipientGeoCoordinates schema:longitude ?recipientLongitude ;
                           schema:latitude ?recipientLatitude
}
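When an endpoint returns the results of a SELECT query such as this one in the W3C SPARQL JSON results format, each row arrives as a set of variable bindings. The sketch below flattens those bindings into (name, latitude, longitude) tuples ready for mapping. The response is entirely made up for illustration (the letter URI, address and coordinates are placeholders), and the function names are hypothetical.

```python
import json

# Made-up response in the SPARQL 1.1 Query Results JSON Format.
# The letter URI, address name and coordinates are placeholders, not real data.
sample = json.loads("""
{
  "head": {"vars": ["letter", "recipientPostalAddressName",
                    "recipientLongitude", "recipientLatitude"]},
  "results": {"bindings": [
    {
      "letter": {"type": "uri", "value": "http://example.org/letter/1"},
      "recipientPostalAddressName": {"type": "literal", "value": "Example Street, Galway"},
      "recipientLongitude": {"type": "literal", "value": "-9.05"},
      "recipientLatitude": {"type": "literal", "value": "53.27"}
    }
  ]}
}
""")


def rows(results):
    """Yield one {variable: value} dict per result row."""
    variables = results["head"]["vars"]
    for binding in results["results"]["bindings"]:
        # A variable can be unbound in a row, so skip any that are missing.
        yield {v: binding[v]["value"] for v in variables if v in binding}


def points(results):
    """Yield (address name, latitude, longitude) tuples for plotting."""
    for row in rows(results):
        yield (
            row["recipientPostalAddressName"],
            float(row["recipientLatitude"]),
            float(row["recipientLongitude"]),
        )


print(list(points(sample)))
```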
Practical: Universities on DBpedia
http://www.franklynam.com/blog.aspx?id=86
Q: Get a list of all of the universities that DBpedia knows about
SKOS
@prefix dct: <http://purl.org/dc/terms/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://linkedarc.net/vocabs/vessel-jar> a skos:Concept ;
    cc:license <http://creativecommons.org/licenses/by/3.0> ;
    cc:attributionURL <http://linkedarc.net> ;
    cc:attributionName "linkedarc.net" ;
    skos:inScheme <http://linkedarc.net/vocabs> ;
    skos:prefLabel "Jar" ;
    skos:scopeNote "A jar concept. Pottery. This isn't a great scope note." ;
    dct:publisher <http://linkedarc.net> ;
    dct:identifier <http://linkedarc.net/vocabs/vessel-jar> ;
    dct:issued "2015-02-23"^^xsd:date ;
    skos:exactMatch <http://purl.org/heritagedata/schemes/mda_obj/concepts/97609> .
SPARQL + FILTER
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT * WHERE {
  ?s rdfs:label ?label .
  FILTER langMatches(lang(?label), "en")
}
SPARQL + FILTER
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT * WHERE {
  ?s rdfs:label ?label .
  FILTER langMatches(lang(?label), "en") .
  FILTER regex(?label, "bell", "i")
}
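To make the effect of the two filters concrete, here is a rough Python imitation applied to a handful of made-up labels. It assumes the simple case of a language range such as "en", which matches the tag "en" and any tag beginning "en-", and treats regex with the "i" flag as a case-insensitive search. The data and function names are illustrative only.

```python
import re

# Made-up (label, language tag) pairs standing in for rdfs:label literals.
# An empty tag represents a literal with no language tag.
labels = [
    ("Bell", "en"),
    ("Church bell", "en-GB"),
    ("Cloche", "fr"),
    ("Jar", "en"),
    ("Bellows", ""),
]


def lang_matches(tag, lang_range):
    """Imitate langMatches for a simple range such as "en" or "*"."""
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        return tag != ""
    return tag == lang_range or tag.startswith(lang_range + "-")


def keep(label, tag):
    """Apply both filters: English-tagged AND containing 'bell' in any case."""
    return lang_matches(tag, "en") and re.search("bell", label, re.IGNORECASE) is not None


matches = [label for label, tag in labels if keep(label, tag)]
print(matches)
```

"Bellows" is dropped because it carries no language tag, and "Jar" because it does not contain the search string.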
SPARQL + FILTER
PREFIX dct: <http://purl.org/dc/terms/>

SELECT * WHERE {
  ?s dct:dateCreated ?dateCreated .
  FILTER (?dateCreated > '1900-01-01')
}
Practical: British Museum Sarcophagi
Q: Get the find spots of all of the sarcophagi in the British Museum collection
SPARQL endpoint: http://collection.britishmuseum.org/sparql
Practical: Archaeological stratigraphy
Q: Get the stratigraphic relationships between the contexts excavated at Priniatikos Pyrgos
SPARQL endpoint: http://linkedarc.net/sparql
Stratigraphy explained (very briefly…)
Sample stratigraphic sequence (source: www.lparchaeology.com)
The Priniatikos Pyrgos ontology
Practical: Archaeological stratigraphy
Q: Get the stratigraphic relationships between the contexts excavated at Priniatikos Pyrgos
SPARQL endpoint: http://linkedarc.net/sparql
Hint: you will need to traverse 2 levels of the ontology’s hierarchy to get at the stratigraphy data
Practical: Nomisma and Ancient Coins
Q: Get the geo-coordinates of all of the coin hoards stored in the Nomisma triplestore
SPARQL endpoint: http://nomisma.org/sparql
Geo-coding the Find Spots with Google Refine
The Google Maps API
Address String
Geo-coordinates as JSON
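The address-to-coordinates step can be sketched in a few lines. The snippet below builds a request URL for the Google Geocoding web service and pulls the first latitude/longitude pair out of a response body. It makes no network call: the response shown is a made-up, heavily trimmed example, the API key is a placeholder, and the function names are hypothetical. It also assumes the service's `address` and `key` parameters and its `results[0].geometry.location` response layout.

```python
import json
from urllib.parse import urlencode

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(address, api_key):
    """Build the request URL for one address string."""
    return GEOCODE_URL + "?" + urlencode({"address": address, "key": api_key})


def first_location(response_text):
    """Return (lat, lng) from the first result, or None if there is none."""
    payload = json.loads(response_text)
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Made-up response body, trimmed to the fields used here.
sample_response = (
    '{"status": "OK", "results": '
    '[{"geometry": {"location": {"lat": 53.27, "lng": -9.05}}}]}'
)

print(geocode_url("Galway, Ireland", "YOUR_API_KEY"))
print(first_location(sample_response))
```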
Export as CSV
Practical: Getty Concepts
Q: Get all of the Getty URIs that represent concepts related to amphorae
SPARQL endpoint: http://vocab.getty.edu/sparql
Additional Linked Data Resources
http://www.franklynam.com/blog.aspx?id=89
One final quick survey
• Please rank the practicals by how easy they were to complete (1 for hardest, 5 for easiest)
  – Making your FOAF profile
  – DBpedia universities
  – British Museum sarcophagi hunting
  – Getty vocabularies
  – Nomisma coin hoards
One final quick survey
• Would you consider publishing Linked Open Data in the future?
  – 1 – Absolutely not
  – 2
  – 3
  – 4
  – 5 – Definitely
One final quick survey
• Would you consider using Linked Open Data resources (using SPARQL or otherwise) in the future?
  – 1 – Absolutely not
  – 2
  – 3
  – 4
  – 5 – Definitely
One final quick survey
• Is Linked Open Data a feasible platform on which to undertake humanities research?
  – 1 – Absolutely not
  – 2
  – 3
  – 4
  – 5 – Definitely
One final quick survey
• Any final comments?
Thank you!
Martin Lemay (source: twitter.com/martinlemay)