Open data and linked data
DESCRIPTION
Providing open data is of interest for its societal and commercial value, for transparency, and because more people can do fun things with data. There is a growing number of initiatives to provide open data, for example from the UK government and the World Bank. However, much of this data is provided in formats such as Excel files, or even PDF files. This raises several questions: How best to provide access to data so it can be most easily reused? How to enable the discovery of relevant data within the multitude of available data sets? How to enable applications to integrate data from large numbers of formerly unknown data sources? One way to address these issues is to use the design principles of linked data (http://www.w3.org/DesignIssues/LinkedData.html), which suggest best practices for how to publish and connect structured data on the Web. This presentation gives an overview of linked data technologies (such as RDF and SPARQL), examples of how they can be used, and some starting points for people who want to provide and use linked data. The presentation was given on August 8 at the Hacknight event (http://hacknight.se/) of Forskningsavdelningen (http://forskningsavd.se/) (Swedish: “Research Department”), a hackerspace in Malmö.
TRANSCRIPT
Hans Rosling
http://www.flickr.com/photos/23176450@N08/2663925153/
"The database hugging in public institutions is hampering innovation."
Hans Rosling at the OECD World Forum in Istanbul, 2007
http://www.viddler.com/explore/JesseRobbins/videos/4/
Why open data?
• Transparency
• Value – for society as a whole and commercially
• More people can do fun things with data!
• How best to provide access to data so it can be most easily reused?
• How to enable the discovery of relevant data within the multitude of available data sets?
• How to enable applications to integrate data from large numbers of formerly unknown data sources?
http://www.flickr.com/photos/kateconsumption/3636784054/
★ Available on the web (whatever format), but with an open licence
★★ Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
★★★ As (2), plus a non-proprietary format (e.g. CSV instead of Excel)
★★★★ All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★ All the above, plus: link your data to other people’s data to provide context
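As a rough illustration of the step from ★★★ to ★★★★, the sketch below turns the rows of a CSV file into RDF triples written as N-Triples, giving each row its own HTTP URI. The base URIs under `http://example.org/`, the column names and the sample row are hypothetical; a real publisher would mint URIs under a domain they control and would prefer established vocabularies over a home-made one.

```python
import csv
import io

# Hypothetical base URIs; replace with a domain you control.
BASE = "http://example.org/city/"
VOCAB = "http://example.org/vocab#"


def csv_to_ntriples(text: str) -> list[str]:
    """Turn each CSV row into N-Triples lines, one triple per non-id column.

    Assumes an 'id' column whose values are already safe to use in a URI.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}{row['id']}>"
        for column, value in row.items():
            if column == "id":
                continue
            # Literals are quoted; escape backslashes, quotes and line breaks.
            escaped = (
                value.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r")
            )
            lines.append(f'{subject} <{VOCAB}{column}> "{escaped}" .')
    return lines


sample = "id,name,country\nmalmo,Malmö,Sweden\n"
for line in csv_to_ntriples(sample):
    print(line)
```

Reaching ★★★★★ would then mean replacing plain literals such as `"Sweden"` with links to URIs that other people already publish, for example a DBpedia resource for Sweden.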
RDF?
Resource Description Framework
“Description”?
Subject - Predicate - Object
Image from the book Semantic Web for the Working Ontologist by Allemang and Hendler.
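One minimal way to picture the subject-predicate-object model in code, using plain Python tuples rather than any RDF library: each statement is a (subject, predicate, object) triple, and a set of such triples is a graph. The `ex:` names are hypothetical shorthand for full URIs.

```python
# A graph is a set of (subject, predicate, object) triples.
# The "ex:" prefixed names are hypothetical shorthand for full URIs.
Triple = tuple[str, str, str]

graph: set[Triple] = {
    ("ex:Shakespeare", "ex:wrote", "ex:KingLear"),
    ("ex:Shakespeare", "ex:wrote", "ex:Macbeth"),
    ("ex:KingLear", "ex:setIn", "ex:Britain"),
}


def describe(g: set[Triple], subject: str) -> list[tuple[str, str]]:
    """Return the (predicate, object) pairs stated about one subject, sorted."""
    return sorted((p, o) for s, p, o in g if s == subject)


print(describe(graph, "ex:Shakespeare"))
```

Using a set means that stating the same triple twice adds nothing, which matches the idea that a graph is simply the collection of distinct statements.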
Multiple Sources
One graph...
What is what?
• If two sources use the same terminology, do they have the same thing in mind?
• URIs to the rescue!
• Two nodes are “the same” if they share the same URI.
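Merging data from multiple sources into one graph can be sketched as a set union of their triples: nodes that carry exactly the same URI become a single node, while different URIs stay apart even when their labels look alike. The two sources and the `example.org` URIs below are hypothetical; the DBpedia URIs are used only as illustrative shared identifiers.

```python
# Two hypothetical sources, each a set of (subject, predicate, object) triples.
CPH = "http://dbpedia.org/resource/Copenhagen"
VOCAB = "http://example.org/vocab#"

source_a = {
    ("http://example.org/people/kierkegaard", VOCAB + "bornIn", CPH),
}
source_b = {
    (CPH, VOCAB + "country", "http://dbpedia.org/resource/Denmark"),
    # Same label, different URI: this is NOT the same node as CPH.
    ("http://example.org/places/Copenhagen", VOCAB + "label", "Copenhagen"),
}

# Merging graphs is a set union of their triples.
merged = source_a | source_b


def neighbours(g, node):
    """Objects reachable from node by following one triple."""
    return {o for s, _, o in g if s == node}


# Because both sources use the same URI for Copenhagen, we can follow
# person -> city -> country across the boundary between the sources.
for city in neighbours(merged, "http://example.org/people/kierkegaard"):
    print(city, "->", neighbours(merged, city))
```

The `example.org` Copenhagen node stays disconnected from the rest: sharing a label is not enough, only sharing a URI joins the graphs.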
SPARQL
• SPARQL Protocol and RDF Query Language (recursive acronym...)
• W3C recommendation
• A query contains a set of triple patterns, called a basic graph pattern.
• Triple patterns are like RDF triples except that each of the subject, predicate and object may be a variable.
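To make the idea of a basic graph pattern concrete, here is a toy matcher in Python. It is not a SPARQL engine and not how any particular triple store works: a pattern is a triple whose terms starting with `?` are variables, and a basic graph pattern is answered by finding the variable bindings that satisfy all of its patterns at once. The `ex:` data is made up for illustration.

```python
def is_var(term):
    """Terms starting with '?' are variables; everything else must match exactly."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable already bound must keep the same value.
            if p_term in result and result[p_term] != t_term:
                return None
            result[p_term] = t_term
        elif p_term != t_term:
            return None
    return result


def match_bgp(patterns, graph):
    """Return all bindings satisfying every triple pattern (a join over patterns)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for triple in graph
            if (extended := match_pattern(pattern, triple, b)) is not None
        ]
    return bindings


graph = {
    ("ex:Kierkegaard", "ex:bornIn", "ex:Copenhagen"),
    ("ex:Bohr", "ex:bornIn", "ex:Copenhagen"),
    ("ex:Strindberg", "ex:bornIn", "ex:Stockholm"),
    ("ex:Copenhagen", "ex:country", "ex:Denmark"),
}

# Roughly: SELECT ?person WHERE { ?person ex:bornIn ?city . ?city ex:country ex:Denmark }
query = [("?person", "ex:bornIn", "?city"), ("?city", "ex:country", "ex:Denmark")]
print(sorted(b["?person"] for b in match_bgp(query, graph)))
```

The shared variable `?city` is what joins the two patterns; a real engine does the same logically but with indexes and query planning.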
Linked Data principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs so that they can discover more things.
http://www.w3.org/DesignIssues/LinkedData.html
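Principles 2 and 3 in practice: a client looks up an HTTP URI and uses the `Accept` header (content negotiation) to ask for RDF rather than an HTML page. The sketch below uses only Python's standard library. Asking for `text/turtle` is an assumption; which RDF formats a server offers, and whether it answers with a redirect, varies by publisher.

```python
import urllib.request


def build_lookup_request(uri: str, accept: str = "text/turtle") -> urllib.request.Request:
    """Prepare an HTTP GET for a linked-data URI, asking for RDF via content negotiation."""
    return urllib.request.Request(uri, headers={"Accept": accept})


def look_up(uri: str) -> str:
    """Dereference the URI (urllib follows redirects) and return the body as text."""
    with urllib.request.urlopen(build_lookup_request(uri), timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# Usage (needs network access):
# print(look_up("http://dbpedia.org/resource/Copenhagen")[:500])
```

Keeping request construction separate from the network call makes it easy to swap the `Accept` value, for example to `application/rdf+xml`, without touching the rest.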
Linked data
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
• DBpedia extracts structured information from Wikipedia and makes this information available on the Web.
• The DBpedia knowledge base currently describes more than 3.4 million things, out of which 1.5 million are classified in a consistent ontology, including 312,000 persons, 413,000 places, 94,000 music albums, 49,000 films, 15,000 video games, 140,000 organizations, 146,000 species
Possible Queries
• DBpedia allows you to find answers to questions where the information is spread across many different Wikipedia articles.
• For example...
People who were born in Copenhagen before 1900
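The transcript does not include the query itself. A query in the same spirit might look like the following, sent to DBpedia's public SPARQL endpoint using the standard SPARQL Protocol (HTTP GET with a `query` parameter). The properties `dbo:birthPlace` and `dbo:birthDate` are assumptions about DBpedia's ontology, people recorded without a full birth date would be missed, and results depend on the live data.

```python
import json
import urllib.parse
import urllib.request

# Public DBpedia endpoint; its availability and data change over time.
ENDPOINT = "https://dbpedia.org/sparql"

# Assumes DBpedia's dbo:birthPlace and dbo:birthDate properties.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?person ?birthDate WHERE {
  ?person dbo:birthPlace dbr:Copenhagen ;
          dbo:birthDate ?birthDate .
  FILTER (?birthDate < "1900-01-01"^^xsd:date)
}
ORDER BY ?birthDate
LIMIT 20
"""


def build_query_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode a query as a SPARQL Protocol GET request URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(query: str) -> list[dict]:
    """Send the query and return the JSON result bindings."""
    request = urllib.request.Request(
        build_query_url(query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]


# Usage (needs network access):
# for row in run_query(QUERY):
#     print(row["person"]["value"], row["birthDate"]["value"])
```

The answer combines facts from many separate Wikipedia articles (each person's birthplace and birth date), which is exactly the kind of question the slide describes.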
NYTimes – Linked Open Data
http://data.nytimes.com/
Example app: Who Went Where?
http://magicalnihilism.com/2009/11/07/get-excited-and-make-things/
Want to make data available?
Want to find data?
Want to build?
Read more
• Heath and Bizer (2011) Linked Data: Evolving the Web into a Global Data Space. http://linkeddatabook.com/editions/1.0/
• Allemang and Hendler (2011) Semantic Web for the Working Ontologist. http://workingontologist.org/
• http://open.blogs.nytimes.com/2010/03/30/build-your-own-nyt-linked-data-application/
• http://www.w3.org/2001/sw/wiki/Tools