The Web of Data: emerging industries
Michalis Vafopoulos, 04/04/2013
Contents
① The Web of documents vs. Web of data
– Some technology
– Some economics
– ..and action
② PSGR project
③ and more…
The Web of Documents
• Simple, big and unstructured
• Organized in silos
But humans:
• are interested in things, not documents, and these things might be in documents or elsewhere
• have limited capacity to extract meaning...
The Web of Data
• Analogy: a global file system --> a global database
• Designed for: human consumption --> machines first, humans later
• Primary objects: documents --> things (or descriptions of things)
• Links between: documents --> things
• Degree of structure in objects: fairly low --> high
• Semantics of content and links: implicit --> explicit
(Tom Heath)
The Web of Data: why?
• encourages reuse
• reduces redundancy
• maximizes its (real and potential) inter-connectedness
• enables network effects to add value to data
The Web of Data: how?
– current state on the Web
• Relational databases
• APIs
• XML
• CSV
• XLS
Computers can’t consume data because:
• Different formats & models
• Not inter-connected
The Web of Data: how?
– we need to create a standard way of publishing data on the Web (like HTML for docs)
This is the Resource Description Framework (RDF)
(a simple example here from Juan F. Sequeda; more next semester!)
Resource Description Framework (RDF)
• A data model
– A way to model data
– Inspired by relational databases and logic
• RDF is a triple data model
• Labeled graph (semantic networks)
• Subject, Predicate, Object
<Isidoro> <was born in> <Chios>
<Chios> <is part of> <Greece>
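The two triples on this slide can be sketched in a few lines of Python. This is only an illustration of the subject-predicate-object model using plain tuples, not an RDF library; the helper names `objects` and `birth_country` are made up for the example.

```python
from collections import namedtuple

# A triple is just (subject, predicate, object).
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# The two statements from the slide.
triples = [
    Triple("Isidoro", "was born in", "Chios"),
    Triple("Chios", "is part of", "Greece"),
]


def objects(graph, subject, predicate):
    """Return every object reachable from `subject` through `predicate`."""
    return [t.object for t in graph
            if t.subject == subject and t.predicate == predicate]


def birth_country(graph, person):
    """Follow two edges of the graph: person -> birthplace -> containing region."""
    results = []
    for place in objects(graph, person, "was born in"):
        results.extend(objects(graph, place, "is part of"))
    return results


print(birth_country(triples, "Isidoro"))
```

Because the data is a graph, the second statement can be reused by any other triple that points at Chios, without changing a schema.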
Example: Document on the Web
Databases back up documents

Isbn | Title | Author | PublisherID | ReleasedDate
978-0-596-15381-6 | Programming the Semantic Web | Toby Segaran | 1 | July 2009
… | … | … | … | …

PublisherID | PublisherName
1 | O’Reilly Media
… | …

This is a THING: a book titled “Programming the Semantic Web” by Toby Segaran, …
THINGS have PROPERTIES: a book has a title, an author, …
Data representation in RDF
[Figure: the book and publisher rows drawn as a labeled graph]
book --title--> Programming the Semantic Web
book --isbn--> 978-0-596-15381-6
book --author--> Toby Segaran
book --publisher--> Publisher
Publisher --name--> O’Reilly
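Everything on the web is identified by a URI!
link the data to other data
[Figure: the book graph with URIs as node identifiers]
http://…/isbn978 --title--> Programming the Semantic Web
http://…/isbn978 --isbn--> 978-0-596-15381-6
http://…/isbn978 --author--> Toby Segaran
http://…/isbn978 --publisher--> http://…/publisher1
http://…/publisher1 --name--> O’Reilly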
consider the data from Revyu.com
[Figure: the review graph]
http://…/isbn978 --hasReview--> http://…/review1
http://…/review1 --description--> Awesome Book
http://…/review1 --reviewer--> http://…/reviewer
http://…/reviewer --name--> Juan Sequeda
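start to link data
[Figure: the book graph and the review graph joined by a sameAs link]
http://…/isbn978 --title--> Programming the Semantic Web
http://…/isbn978 --isbn--> 978-0-596-15381-6
http://…/isbn978 --author--> Toby Segaran
http://…/isbn978 --publisher--> http://…/publisher1
http://…/publisher1 --name--> O’Reilly
http://…/isbn978 (book data) <--sameAs--> http://…/isbn978 (Revyu)
http://…/isbn978 --hasReview--> http://…/review1
http://…/review1 --description--> Awesome Book
http://…/review1 --hasReviewer--> http://…/reviewer
http://…/reviewer --name--> Juan Sequeda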
Juan Sequeda publishes data too
http://juansequeda.com/id --name--> Juan Sequeda
http://juansequeda.com/id --livesIn--> http://dbpedia.org/Austin
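Let’s link more data
[Figure: the review graph joined to Juan Sequeda's own data by a sameAs link]
http://…/isbn978 --hasReview--> http://…/review1
http://…/review1 --description--> Awesome Book
http://…/review1 --hasReviewer--> http://…/reviewer
http://…/reviewer --name--> Juan Sequeda
http://…/reviewer <--sameAs--> http://juansequeda.com/id
http://juansequeda.com/id --name--> Juan Sequeda
http://juansequeda.com/id --livesIn--> http://dbpedia.org/Austin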
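Linked data = internet + http + RDF
[Figure: the complete linked graph]
http://…/isbn978 --title--> Programming the Semantic Web
http://…/isbn978 --isbn--> 978-0-596-15381-6
http://…/isbn978 --author--> Toby Segaran
http://…/isbn978 --publisher--> http://…/publisher1
http://…/publisher1 --name--> O’Reilly
http://…/isbn978 (book data) <--sameAs--> http://…/isbn978 (Revyu)
http://…/isbn978 --hasReview--> http://…/review1
http://…/review1 --description--> Awesome Book
http://…/review1 --hasReviewer--> http://…/reviewer
http://…/reviewer --name--> Juan Sequeda
http://…/reviewer <--sameAs--> http://juansequeda.com/id
http://juansequeda.com/id --name--> Juan Sequeda
http://juansequeda.com/id --livesIn--> http://dbpedia.org/Austin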
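To see what the sameAs links buy us, here is a minimal Python sketch that merges the book, review and personal data and then asks where the book's reviewer lives. It is an assumption-laden illustration: the `ex-books:` and `ex-revyu:` identifiers are hypothetical stand-ins for the elided "http://…/" addresses, plain strings replace real RDF terms, and sameAs is resolved with a simple union-find rather than a reasoner.

```python
# Hypothetical short identifiers standing in for the elided URIs on the slides.
BOOK = "ex-books:isbn978"
REVYU_BOOK = "ex-revyu:isbn978"
REVIEW = "ex-revyu:review1"
REVIEWER = "ex-revyu:reviewer"
JUAN = "http://juansequeda.com/id"

triples = [
    (BOOK, "title", "Programming the Semantic Web"),
    (BOOK, "author", "Toby Segaran"),
    (REVYU_BOOK, "sameAs", BOOK),
    (REVYU_BOOK, "hasReview", REVIEW),
    (REVIEW, "description", "Awesome Book"),
    (REVIEW, "hasReviewer", REVIEWER),
    (REVIEWER, "sameAs", JUAN),
    (JUAN, "livesIn", "http://dbpedia.org/Austin"),
]


def merge(graph):
    """Collapse every sameAs group onto one representative identifier.

    Returns the merged triple set and the `find` function that maps any
    identifier to its representative.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for s, p, o in graph:
        if p == "sameAs":
            parent[find(s)] = find(o)

    merged = {(find(s), p, find(o)) for s, p, o in graph if p != "sameAs"}
    return merged, find


def objects(graph, subject, predicate):
    """Return the sorted objects linked from `subject` by `predicate`."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def reviewer_locations(graph, book):
    """Walk book -> review -> reviewer -> livesIn across the merged sources."""
    merged, find = merge(graph)
    places = []
    for review in objects(merged, find(book), "hasReview"):
        for reviewer in objects(merged, review, "hasReviewer"):
            places.extend(objects(merged, reviewer, "livesIn"))
    return places


print(reviewer_locations(triples, BOOK))
```

Without the two sameAs statements the walk returns nothing, because the review hangs off the Revyu identifier and the livesIn fact hangs off the personal one.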
Linked Data Principles
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up (dereference) those names.
3. When someone looks up a URI, provide useful information.
4. Include links to other URIs so that they can discover more things.
Web as a database
Linked Data makes the web exploitable as ONE GIANT HUGE GLOBAL DATABASE!
Is there any query language like SQL? SPARQL…
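The core idea behind a SPARQL query, matching triple patterns that contain variables, can be imitated in a few lines of Python. The sketch below is a toy matcher over tuples, not a SPARQL engine, and the `ex:` identifiers are hypothetical. The patterns correspond roughly to the query `SELECT ?title ?name WHERE { ?book ex:title ?title . ?book ex:publisher ?pub . ?pub ex:name ?name }`.

```python
# Toy data; "ex:" identifiers are hypothetical placeholders for real URIs.
graph = [
    ("ex:isbn978", "title", "Programming the Semantic Web"),
    ("ex:isbn978", "author", "Toby Segaran"),
    ("ex:isbn978", "publisher", "ex:publisher1"),
    ("ex:publisher1", "name", "O'Reilly"),
]


def is_var(term):
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match(graph, patterns, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if is_var(term):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(graph, rest, candidate)


patterns = [
    ("?book", "title", "?title"),
    ("?book", "publisher", "?pub"),
    ("?pub", "name", "?name"),
]

for row in match(graph, patterns):
    print(row["?title"], "is published by", row["?name"])
```

The shared variables `?book` and `?pub` play the role that join keys play in SQL.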
May 2007
What is a Linked Data application/service?
A software system that makes use of data on the Web from multiple datasets and that benefits from links between the datasets
Characteristics of Linked Data Applications
• Consume data that is published on the web following the Linked Data principles: an application should be able to request, retrieve and process the accessed data
• Discover further information by following the links between different data sources: the fourth principle enables this.
• Combine the consumed linked data with data from other sources (not necessarily Linked Data)
• Expose the combined data back to the web following the Linked Data principles
• Offer value to end-users
The 5 stars of open linked data
★ make your stuff available on the Web (whatever format)
★★ make it available as structured data (e.g. Excel instead of an image scan of a table)
★★★ use a non-proprietary format (e.g. CSV instead of Excel)
★★★★ use URLs to identify things, so that people can point at your stuff
★★★★★ link your data to other people’s data to provide context
http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/
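The step from three stars to five can be sketched as a small Python conversion: read a CSV row (three stars), mint a URL for each thing (four stars), then add links to someone else's data (five stars). Everything named here is an assumption made for illustration: the `http://example.org/` namespace, the column names, and the external URL used as a link target.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace for the minted URLs

# Three-star data: a non-proprietary, structured format.
CSV_TEXT = (
    "isbn,title,author,publisher_id\n"
    "978-0-596-15381-6,Programming the Semantic Web,Toby Segaran,1\n"
)


def csv_to_triples(text, base=BASE):
    """Four stars: give every book and publisher a URL and emit triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        book = f"{base}book/{row['isbn']}"
        publisher = f"{base}publisher/{row['publisher_id']}"
        triples.append((book, "isbn", row["isbn"]))
        triples.append((book, "title", row["title"]))
        triples.append((book, "author", row["author"]))
        triples.append((book, "publisher", publisher))
    return triples


def add_links(triples, links):
    """Five stars: connect local URLs to URLs in other people's datasets."""
    return triples + [(local, "sameAs", remote)
                      for local, remote in links.items()]


local = csv_to_triples(CSV_TEXT)
linked = add_links(local, {
    # Hypothetical external identifier for the same book.
    BASE + "book/978-0-596-15381-6": "http://other.example.net/isbn978",
})

for triple in linked:
    print(triple)
```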
Two magics of Web Science: the case of Linked Data
The (practical) question
contextualized & hands-on experience in Semantic Web & Business 3.0 on a unique, fast evolving and semantified dataset
PSGR project: the answer
The first attempt to generate, curate, interlink and distribute daily updated public spending data in LOD formats that can be useful to both expert (i.e. scientists and professionals) and naïve users.
The context first…
Economy after the Web
New form of property
• Public, Private, Peer (e.g. Wikipedia)
The right to:
• Use-modify-benefit-transfer resources
• Energetic & connected consumption
• Pro-sumption
Research question
Web economy: from potential to actual
Enable new virtuous cycles in the economy through Linked Open Data
Outline
① EU Unification: institutions-technology
② Why Linked Open Data?
③ Economic LOD
o the story so far
o how to start
o use cases
o engineering
④ Government Budget
⑤ Tenders
⑥ Spending
⑦ Business Information
⑧ Next steps
EU Unification: the institutions
Best in theory – poor in practice
a (complicated) market example
• monetary policy, currency, eurozone
• European Single Market
• fiscal policy FORTHCOMING
EU Unification: the technology
Linked Data or Web of data
• “publish once, use many times”
• different consumers extract different slices of the data for different purposes
• publish in context: value & “meaning”
EU Unification: the technology
• Linked Data (LD) + Open Data = LOD
• Economic LOD as “data currency”
Why LOD?
• Transparency & innovation
Network effects: enabling users to
• bidirectional & massively processable interconnections among data
• re-using the existing infrastructure in the government and business spheres
Economic LOD: the story so far
• Isolated/fragmented behind technological & institutional barriers
• General statistics: Eurostat etc.
• LOD2 case
• LOTED (Linked Open Tenders Electronic Daily)
Economic LOD: how to start
A general model
Economic LOD: use cases
• Business applications on top
• Users: citizens, gov., EU, business
• track the life-cycle of every financial flow: evaluate budget allocation, tenders, spending and their efficiency
• pre-allocate resources on provisional public works
• receive & submit information in real-time
Economic LOD: engineering
Government Budget
• heterogeneous repositories & methods (mainly PDF)
Tenders
• Closed data in HTML
• Public Contracts Ontology (PCO), e.g.
– pco:Contract and pco:AwardCriterion
• Common Procurement Vocabulary
• now working on linking our ontology to:
– Payments Ontology
– GoodRelations
– FOAF
Spending
• most dynamic & open part
• increasing number of countries/cities
• raw & structured data
• leader: the Greek Clarity project
• spending decisions ex-ante to execution
• Actually every decision
www.publicspending.gr (*****)
• based on Greek Clarity & Tax information
• semantify, interconnect, clean, visualize, SPARQL endpoint, daily update
• PSGR ontology
Links to
– WESO products classif.
– UK Payments Ontology
– DBpedia and Geonames
– …more to come
Business Information
• Registries: mainly closed
• Key standards
– Classification of Products by Activity (CPA)
– eXtensible Business Reporting Language (XBRL)
Business Information
Next steps
• Working on our basic ontology
• Real-life examples & apps
• Bad news: a long way to go
• Good news: we have started
PSGR
① why Linked Open Data (LOD)
② LOD in Greece
③ issues
④ WHERE MY MONEY GOES App
⑤ local spending in EU demo
⑥ to the future
Why public spending LOD
o more & better information
o objective and processable information for economic/political “dialogue”
• to promote competition
• to decrease cost
• to judge the efficiency of policy mixtures
• to enable participation
LOD in Greece: current status
• in its infancy – NO Apps yet
• 2-3 stars
• Open not Linked
• very limited public awareness
LOD in Greece: why it is important
• quality of information during economic crisis
• transparency & efficiency in funding development
Issues
o how can we initiate the virtuous cycle of creation?
demonstrate LOD’s added value
o how to get the most out of data?
local & global interconnections
In a few words,
Apps, Apps, Apps…
WHERE MY MONEY GOES in Greece publicspending.gr
• the first LOD App in Greece
• daily updates
• open spending linked data, endpoint & visualizations
WHERE MY MONEY GOES in Greece publicspending.gr
• Input
1. “Diavgeia” (all public spending decisions online daily)
API, average data quality, rich information
• Payer, payee (amount, VAT number, name)
• CPA 2008: Classification of Products by Activity
• CPV 2008: Common Procurement Vocabulary
• Original decision text in PDF
2. TAXIS (official Tax Information System)
VAT number validation and profile request
Checklist
① Ontology – enriching with core vocab.
② Basic visualizations
③ SPARQL endpoint - thedatahub
④ Interconnections
– Product classifications
– Open Corporates
– Greek LOD (e-proc, geodata, dbpedia)
– EU and US (CPV -> NAICS)
⑤ Demos & services
⑥ Public awareness - working with the media, hackathons, courses, theses
Architecture
publicspending.gr ontology
Network analysis
Betweenness Centrality: how often a node appears on shortest paths between nodes in the network
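As an illustration of the definition, the following Python sketch computes betweenness centrality with Brandes' algorithm on a small unweighted, undirected graph. The payer/payee network is invented for the example and is not PSGR data; the slides do not say which tool or variant was used for the actual analysis.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    `adj` maps each node to a list of its neighbours. The score of a node
    is the number of shortest paths between other node pairs that pass
    through it (fractional when several shortest paths exist).
    """
    score = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search from `source`, counting shortest paths.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical payment network: one ministry pays three suppliers,
# and one supplier pays a subcontractor.
network = {
    "Ministry": ["SupplierA", "SupplierB", "SupplierC"],
    "SupplierA": ["Ministry"],
    "SupplierB": ["Ministry"],
    "SupplierC": ["Ministry", "Subcontractor"],
    "Subcontractor": ["SupplierC"],
}

centrality = betweenness(network)
for node, value in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(node, value)
```

In this toy network the ministry lies on the most shortest paths, followed by the supplier that bridges to the subcontractor; the remaining nodes score zero.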
Node size: betweenness centrality. Node color: hub score (HITS)
Node size: weighted in-degree centrality. Node color: PageRank
Competition in telecoms
Comments, ideas and more
Additional material
History of LD
• Linked Data Design Issues by TimBL, July 2006
• Linked Open Data Project, WWW2007
• First LOD Cloud, May 2007
• 1st Linked Data on the Web Workshop, WWW2008
• 1st Triplification Challenge, 2008
• How to Publish Linked Data Tutorial, ISWC2008
• BBC publishes Linked Data, 2008
• 2nd Linked Data on the Web Workshop, WWW2009
• NY Times announcement, SemTech2009 - ISWC09
• 1st Linked Data-a-thon, ISWC2009
• 1st How to Consume Linked Data Tutorial, ISWC2009
• Data.gov.uk publishes Linked Data, 2010
• 2nd How to Consume Linked Data Tutorial, WWW2010
• 1st International Workshop on Consuming Linked Data, COLD2010
More Examples
• http://data-gov.tw.rpi.edu/wiki
• http://dbrec.net/
• http://fanhu.bz/
• http://data.nytimes.com/schools/schools.html
• http://sig.ma
• http://visinav.deri.org/semtech2010/
References
• Weaving the Economic Linked Open Data
• The Web Economy: Goods, Users, Models, and Policies
• Public Spending: Interconnecting and Visualizing Greek Public Expenditure Following Linked Open Data Directives
• A Framework for Linked Data Business Models