What is Open Data?
DATAVIZ: VISUAL REPRESENTATION OF COMPLEX PHENOMENA
data visualization & computational design
@ Better Nouveau Workshop, 14/12/2011
Lorenzo Benussi, TOP-IX
1
Research & Business Development
TOP-IX Consortium
2
Fellow, Department of Economics, University of Turin
Fellow, NEXA Centre
Polytechnic of Turin
About me
agenda
1. Background
2. Definitions
I. Open Knowledge Definition
II. Open Data Licenses
III. Pricing models
IV. Formats
3. Examples
3
Did you take the bus today?
4
Background
Ref: National Geographic http://ngm.nationalgeographic.com/big-idea/14/augmented-reality
5
BIG DATA stylized facts 1
• $600 to buy a disk drive that can store all the world's music.
• 5 billion mobile phones in use in 2010.
• 30 billion pieces of content shared on Facebook every month.
• 40% projected growth in global data generated per year vs. 5% growth in global IT spending.
• 235 terabytes of data collected by the US Library of Congress in April 2011.
• 15 out of 17 sectors in the United States have more data stored per company than the US Library of Congress.
McKinsey: Big data: The next frontier for innovation, competition, and productivity (May 2011)
6
• $300 billion potential annual value to US health care: more than double the total annual health care spending in Spain.
• €250 billion potential annual value to Europe's public sector administration: more than the GDP of Greece.
• $600 billion potential annual consumer surplus from using personal location data globally.
• 60% potential increase in retailers' operating margins possible with big data.
• 140,000-190,000 more deep analytical talent positions and 1.5 million more data-savvy managers needed to take full advantage of big data in the USA.
BIG DATA stylized facts 2
McKinsey: Big data: The next frontier for innovation, competition, and productivity (May 2011)
7
WEB(squared)
Ref: Tim O’Reilly and John Battelle (2009), Web Squared: Web 2.0 Five Years On. http://www.web2summit.com/web2009/public/schedule/detail/10194
1. Redefining Collective Intelligence: New Sensory Input
2. Cooperating Data Subsystems
3. How the Web Learns: Explicit vs. Implicit Meaning
4. Web Meets World: The "Information Shadow" and the Internet of Things
5. The Rise of Real Time: A Collective Mind
8
Digital technology could enable an extraordinary range of ordinary people to become part of a creative process. (The future of ideas, Lawrence Lessig)
9
When I say that innovation is being democratized, I mean that users of products and services (both firms and individual consumers) are increasingly able to innovate for themselves. (Democratizing Innovation, Eric von Hippel)
10
• Data
• Information
• Knowledge
• Value
11
Hal Varian, Google’s Chief Economist
The value of metrics
12
Data are not locked inside applications; they are consumed on demand as a service.
RESTful APIs make it possible to access data as a web resource (through a URI).
DATA as a SERVICE
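The idea of reaching data through a URI can be sketched in a few lines of Python. This is only an illustration, not a real service: the host `data.example.org`, the `/datasets/...` path layout and the sample records are all assumptions made up for the example, and the `get` function stands in for an HTTP GET without opening any network connection.

```python
import json
from urllib.parse import urlparse

# Hypothetical in-memory datasets standing in for a data provider's store.
DATASETS = {
    "bus-stops": [
        {"id": 1, "name": "Porta Nuova"},
        {"id": 2, "name": "Porta Susa"},
    ],
}


def get(uri):
    """Resolve a GET on a resource URI to a (status code, JSON body) pair."""
    parts = [p for p in urlparse(uri).path.split("/") if p]
    # /datasets -> names of the available datasets
    if parts == ["datasets"]:
        return 200, json.dumps(sorted(DATASETS))
    # /datasets/<name> -> the whole collection
    if len(parts) == 2 and parts[0] == "datasets" and parts[1] in DATASETS:
        return 200, json.dumps(DATASETS[parts[1]])
    # /datasets/<name>/<id> -> a single record
    if len(parts) == 3 and parts[0] == "datasets" and parts[1] in DATASETS:
        for record in DATASETS[parts[1]]:
            if str(record["id"]) == parts[2]:
                return 200, json.dumps(record)
    return 404, json.dumps({"error": "not found"})


status, body = get("http://data.example.org/datasets/bus-stops/2")
print(status, body)
```

Each dataset, and each record inside it, has its own address, so a client needs nothing more than the URI to consume the data.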
13
Business Models
A. Data owner: paid to publish / revenue share.
B. Data user: pays for data delivery/transformation/analysis services.
New Generation Marketplace
3. Works with open and non-open data.
4. Provides data on the fly through APIs (even custom ones).
5. Sometimes a community of data curators is involved in maintaining and expanding the data through crowd-sourcing (e.g. Factual).
6. Provides (web-based) tools to explore the data.
14
What does open data mean?
Open Data is a model for extracting value from public sector information by using the data to build new tools and create innovative services.
15
• The Public Sector produces and manages huge amounts of data; opening PSI in the EU produces economic growth of 140 billion € / year (aggregate)
• Public Data are the raw material to create new products and services
PSI (public sector information) mines
16
COURTESY/RON WHEELER. The 8,000-foot deep Homestake Gold Mine in South Dakota is the site where scientists, including UC Berkeley researchers, plan to construct the world's deepest research center.
“Openness will strengthen our democracy and promote efficiency and effectiveness in Government”
Transparency and Open Government, Memorandum for the Heads of Executive Departments and Agencies (2009)
data.gov
17
[…] As you know, transparency is at the heart of our agenda for Government. We recognise that transparency and open data can be a powerful tool to help reform public services, foster innovation and empower citizens. David Cameron - Letter to Cabinet Ministers (2011)
Information is the currency of democracy. Benjamin Franklin (attributed)
18
"... give us the unadulterated data, we want the data, we want unadulterated data. We have to ask for raw data now." Tim Berners-Lee, advisor data.gov.uk
Raw data now!
19
USA - data.gov
20
UK - data.gov.uk
Australia - data.gov.au
data.gov: leading examples
EUROPE: Directive 2003/98/EC of 17 November 2003
The evolution towards an information and knowledge society influences the life of every citizen in the Community, inter alia, by enabling them to gain new ways of accessing and acquiring knowledge.
DIRECTIVE 2003/98/EC OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 17 November 2003 on the re-use of public sector information
Legislation in EU, Italy and Piedmont
ITALY: Legislative Decree (Decreto Legislativo) no. 36 of 24 January 2006 and Law (L.) 96/2010.
PIEDMONT: Regional Government Resolution (Delibera di Giunta regionale) 36-1109, November 2010
21
• Accountability
• Transparency
• Collaboration
• Participation
22
WHY : civil society
WHY : (digital) market
• Innovation
• Cooperation
• Competition
• Digital commons
23
The first example in Italy - dati.piemonte.it
24
apps4italy
• All EU citizens can participate (!!) & €40K in cash prizes
• Building useful, innovative projects based on Italian public data (not only open data)
• Four main categories (growing):
1. Ideas
2. Apps
3. Visualization
4. Datasets
25
Ref: appsforitaly.org
Open Data: definitions
26
Open Knowledge Definition v.1.1 by OKF
A work is open if its manner of distribution satisfies the following conditions:
1. Access
2. Redistribution
3. Reuse
4. Absence of technological restriction
5. Attribution
6. Integrity
7. No discrimination (persons or groups)
8. No discrimination (fields of endeavor)
9. Distribution of license
10. License must not be specific to a package
11. License must not restrict the distribution of other works
27
Open Definition - http://opendefinition.org/okd/ (Version 1.1)
Terminology
The term knowledge is taken to include:
1. Content such as music, films, books
2. Data, be it scientific, historical, geographic or otherwise
3. Government and other administrative information
Software is excluded [...]
The term work will be used to denote the item or piece of knowledge which is being transferred.
The term package may also be used to denote a collection of works. [...]
The term license refers to the legal license under which the work is made available. Where no license has been made this should be interpreted as referring to the resulting default legal conditions under which the work is available (for example copyright).
28
The Definition - A work is open if its manner of distribution satisfies the following conditions:
1. ACCESS: The work shall be available as a whole and at no more than a reasonable reproduction cost, preferably downloading via the Internet without charge. The work must also be available in a convenient and modifiable form.
2. REDISTRIBUTION: The license shall not restrict any party from selling or giving away the work either on its own or as part of a package made from works from many different sources. The license shall not require a royalty or other fee for such sale or distribution.
3. REUSE: The license must allow for modifications and derivative works and must allow them to be distributed under the terms of the original work.
29
4. ABSENCE OF TECHNOLOGICAL RESTRICTION: The work must be provided in such a form that there are no technological obstacles to the performance of the above activities. This can be achieved by the provision of the work in an open data format, i.e. one whose specification is publicly and freely available and which places no restrictions, monetary or otherwise, upon its use.
5. ATTRIBUTION: The license may require as a condition for redistribution and re-use the attribution of the contributors and creators to the work. If this condition is imposed it must not be onerous. For example, if attribution is required, a list of those requiring attribution should accompany the work.
6. INTEGRITY: The license may require as a condition for the work being distributed in modified form that the resulting work carry a different name or version number from the original work.
30
7. NO DISCRIMINATION AGAINST PERSONS OR GROUPS: The license must not discriminate against any person or group of persons.
8. NO DISCRIMINATION AGAINST FIELDS OF ENDEAVOR: The license must not restrict anyone from making use of the work in a specific field of endeavor. For example, it may not restrict the work from being used in a business, or from being used for genetic research.
9. DISTRIBUTION OF LICENSE: The rights attached to the work must apply to all to whom it is redistributed without the need for execution of an additional license by those parties.
10. LICENSE MUST NOT BE SPECIFIC TO A PACKAGE: The rights attached to the work must not depend on the work being part of a particular package. If the work is extracted from that package and used or distributed within the terms of the work’s license, all parties to whom the work is redistributed should have the same rights as those that are granted in conjunction with the original package.
11. LICENSE MUST NOT RESTRICT THE DISTRIBUTION OF OTHER WORKS: The license must not place restrictions on other works that are distributed along with the licensed work. For example, the license must not insist that all other works distributed on the same medium are open.
31
Open Data: prices
32
• The transition from a physically-based to a knowledge-based economic environment made information a primary wealth-creating asset.
• Digital access to information seems to have changed the structure of many industries, promoting services-oriented business models based on disclosure and sharing of information and knowledge.
A paradigmatic shift: information economy
33
• The Public Sector holds and manages huge amounts of data and information. Fostering access to those repositories enables new business opportunities that can broaden market volumes in such sectors.
• PSI represents the raw material from which value added products and services can be designed.
A paradigmatic shift: PSI data mines
34
PSI can be used and reused in many ways (non-rivalry in consumption):
1. Broad range of sectors
2. Different sets of actors
3. PSI holders
4. Private re-users
5. Regulatory bodies
6. Citizens
The use/value of PSI
35
Several supply chain configurations.
1. Linear models (private re-users add value)
2. User-generated content
3. Information sharing between public bodies
• The peculiar cost structure of collecting, processing and delivering digital data (high fixed costs, zero marginal cost) strongly influences the pricing strategies available to PSI holders.
• Pollock (2008): a price equal to marginal cost (i.e. PSI free of charge) is socially optimal provided that the elasticity of demand and positive externalities exceed a given threshold.
✓ Empirics: those conditions are likely to hold in most PSI domains.
The price of PSI: the “free data” approach
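A small numeric sketch can make the cost structure above concrete. It is not Pollock's model: it assumes a linear demand curve Q(p) = a - b·p, zero marginal cost and made-up figures (a = 1000, b = 10, fixed cost 16,000), and it ignores both externalities and the cost of raising public funds. Under those assumptions it compares a free-of-charge regime with the lowest price that recovers the fixed cost.

```python
import math


def cost_recovery_price(a, b, fixed_cost):
    """Lowest price p with p * Q(p) = fixed_cost, for demand Q(p) = a - b * p."""
    discriminant = a * a - 4 * b * fixed_cost
    if discriminant < 0:
        raise ValueError("no price recovers the fixed cost under this demand")
    return (a - math.sqrt(discriminant)) / (2 * b)


def welfare(price, a, b, fixed_cost):
    """Consumer surplus plus revenue minus fixed cost (marginal cost is zero)."""
    quantity = a - b * price
    consumer_surplus = quantity ** 2 / (2 * b)
    return consumer_surplus + price * quantity - fixed_cost


# Hypothetical figures chosen only for illustration.
A, B, FIXED_COST = 1000, 10, 16_000

p_recovery = cost_recovery_price(A, B, FIXED_COST)
free_welfare = welfare(0, A, B, FIXED_COST)
recovery_welfare = welfare(p_recovery, A, B, FIXED_COST)
print(p_recovery, free_welfare, recovery_welfare)
```

With these figures the cost-recovery price is 20, and total welfare falls from 34,000 under free access to 32,000 under cost recovery: the users priced out of the market are the loss. Positive externalities, which the sketch leaves out, would widen that gap.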
36
• Although a cost recovery regime may bound potential demand and distort competition, several critical issues could trigger its adoption.
• Underestimation of downstream demand and network externalities.
✓ Lack of long-run commitment in subsidizing PSI collection.
✓ Short-term decision making.
✓ Moral hazard (?).
The price of PSI: cost recovery approach
37
Directive 2003/98/EC is aimed at fostering PSI reuse mainly by promoting:
1. PSI availability in digital format
2. Transparency of reuse conditions and pricing
3. Non-discrimination
Scenario | Directive impact | Main condition | Example
Closed shop | Minor. Public Sector bodies continue to control the supply chain. | Information is strongly linked with the functioning of public bodies. | Cadastral information
Battlefield | Non-negligible. New entrants step into the downstream market. | Information is important while not strategic for PA. | Meteorological data
Playground | Strong. Public Sector enlarges its influence over the downstream stages. | Digitalization offers new opportunities for value extraction. | Legal information
[...] | Non-negligible. Public Sector has the only role of information holder. | Information reuse generates high demand volumes from citizens and firms. | Traffic and transport information
38
MEPSIR (2006) Which market configurations are likely to emerge?
The price of PSI: possible scenarios
All pricing strategies carry potential risks of inefficiency for PSI holders (due to a lack of incentives to reduce costs and/or improve quality).
The importance of the regulatory framework
The price of PSI: Externalities & Policy
39
The Central Role of Externalities
Open Data: formats
40
Linked open data and Semantic web
The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data. (by Tim Berners-Lee)
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things.
Ref: http://www.w3.org/DesignIssues/LinkedData.html
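The four rules can be mimicked with a toy "web of data" held in memory. Everything here is an assumption for illustration: the `example.org` URIs, the `label` and `links` fields, and the `look_up` function, which stands in for an HTTP GET that would normally return RDF.

```python
# Hypothetical web of data: each HTTP URI names a thing (rules 1 and 2) and
# maps to the description returned when the URI is looked up (rule 3).
WEB = {
    "http://example.org/id/turin": {
        "label": "Turin",
        "links": ["http://example.org/id/piedmont"],
    },
    "http://example.org/id/piedmont": {
        "label": "Piedmont",
        "links": ["http://example.org/id/italy"],
    },
    "http://example.org/id/italy": {"label": "Italy", "links": []},
}


def look_up(uri):
    """Stand-in for an HTTP GET on the URI; returns its description."""
    return WEB[uri]


def discover(start_uri):
    """Follow links breadth-first (rule 4) and return labels in visit order."""
    seen, queue, labels = {start_uri}, [start_uri], []
    while queue:
        uri = queue.pop(0)
        description = look_up(uri)
        labels.append(description["label"])
        for linked in description["links"]:
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)
    return labels


print(discover("http://example.org/id/turin"))
```

Starting from a single URI, the client reaches related things it did not know about in advance, which is the point of rule 4.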
41
42
Linked open data: basic principles
1. Everything has a name (people, locations, etc.)
2. Every name starts with http://
3. All data are described using RDF (Resource Description Framework, a W3C standard).
Tim Berners-Lee talk on linked data: http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
43
Data as an RDF graph
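An RDF graph is a set of (subject, predicate, object) triples, where subjects and predicates are URIs and objects are URIs or literal values. A minimal sketch in plain Python, without any RDF library, might look as follows; the `example.org` identifiers and the `locatedIn` and `label` properties are invented for the example, and the `match` helper is only a rough analogue of a SPARQL triple pattern.

```python
# Hypothetical prefixes; real data would reuse published vocabularies.
EX = "http://example.org/id/"
VOCAB = "http://example.org/vocab/"

# The graph: each triple is one edge from a subject to an object,
# labelled by the predicate.
TRIPLES = {
    (EX + "turin", VOCAB + "locatedIn", EX + "piedmont"),
    (EX + "piedmont", VOCAB + "locatedIn", EX + "italy"),
    (EX + "turin", VOCAB + "label", "Turin"),
}


def match(subject=None, predicate=None, obj=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Everything the graph says about Turin.
for triple in match(subject=EX + "turin"):
    print(triple)
```

Because every node is a URI, two graphs published by different parties can be merged simply by taking the union of their triples.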
44
The Vision - A global interconnected database
45
The Vision - Mix data on-the-fly
46
Linked data - hands on
DBpedia provides information from Wikipedia as Linked Data. Example, Turin airport: http://dbpedia.org/page/Turin_Caselle_Airport
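One way to ask DBpedia about the airport programmatically is through its public SPARQL endpoint. The sketch below only builds the request URL and does not send it. It assumes that the identifier behind the `/page/` address follows DBpedia's `/resource/` convention and that the resource name is still current; the function names are made up for the example.

```python
from urllib.parse import urlencode

# The /page/ URL is the HTML view; the identifier is assumed to follow
# DBpedia's /resource/ convention.
RESOURCE = "http://dbpedia.org/resource/Turin_Caselle_Airport"
ENDPOINT = "https://dbpedia.org/sparql"  # DBpedia's public SPARQL endpoint


def build_query(resource, limit=10):
    """SPARQL query listing property/value pairs of one resource."""
    return (
        f"SELECT ?property ?value "
        f"WHERE {{ <{resource}> ?property ?value }} LIMIT {limit}"
    )


def build_request_url(resource, limit=10):
    """URL that asks the endpoint for the query results as JSON."""
    params = {
        "query": build_query(resource, limit),
        "format": "application/sparql-results+json",
    }
    return ENDPOINT + "?" + urlencode(params)


print(build_query(RESOURCE))
print(build_request_url(RESOURCE))
```

Fetching the printed URL with any HTTP client should return the airport's properties as JSON, provided the endpoint is reachable and the resource name has not changed.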
47
Open Data: license
48
Open Data license 1 (OKF)
Open Knowledge Foundation licenses
1. Public Domain Dedication and License (PDDL) — “Public Domain for data/databases”
2. Open Data Commons Attribution License (ODC-By) — “Attribution for data/databases”
3. Open Data Commons Open Database License (ODC-ODbL) — “Attribution Share-Alike for data/databases”
Ref: http://www.opendatacommons.org/licenses/
49
Open Data licenses 2 (CC and IODL)
Creative Commons Licenses (http://creativecommons.org/licenses/)
1. CC Zero
2. CC BY - Attribution
3. CC SA - Share alike
4. CC BY-SA - Attribution and Share alike
Italian open data license (http://www.formez.it/iodl/)
• IODL - Italian Open Data License (BY-SA)
50
examples
51
2 groups
I. Transparency
II. Information services
52
Transparency
• Public assembly (parliament, councils)
• Public Budget and expenses
• Public procurement
53
Info services
• Transportation
• Environment
• Cultural heritage
Ref: http://traintimes.org.uk/map/tube/
54
food
55
kids
56
environment
57
transportation
58
Ref: http://traintimes.org.uk/map/tube/
Data VIZ
59
Ref: http://www.gapminder.org/
Ref: http://webdesignledger.com/inspiration/15-stunning-examples-of-data-visualization
Where to find open data
Open (and non-open) data archives
http://ckan.net/
http://it.ckan.net/
Examples of Italian datasets:
Dati.gov.it: http://www.dati.gov.it/
5T: http://biennaledemocrazia.it/dataset/
Dati Piemonte: http://dati.piemonte.it
ISTAT: http://dati.istat.it/
Enel: http://data.enel.com/
60
Tools and links
ONLINE DATA VISUALIZATION
Google Visualization API: http://code.google.com/intl/it-IT/apis/chart/
Tableau Public: http://www.tableausoftware.com/public
Open Heat Map: http://www.openheatmap.com/
ONLINE STORAGE + VISUALIZATION
Google Public Data Explorer: http://www.google.com/publicdata/home
IBM Many Eyes: http://www-958.ibm.com/software/data/cognos/manyeyes/
Google Fusion Tables: http://www.google.com/fusiontables/Home
Impure: http://www.impure.com/
CURATION & LINKING
Google Refine
Data Wrangler: http://vis.stanford.edu/wrangler/
OFFLINE TOOLS
R: http://www.r-project.org/
JavaScript library for data viz: http://thejit.org/
This one too: http://vis.stanford.edu/protovis/
Network / graph analysis / visualization: http://gephi.org/
Turing-complete language for dataviz, aimed at visual artists: http://processing.org/
61
wrap-up
1. Not all public data are open data
2. Public data and gov data are often “broken” (strange formats and ambiguous IP)
3. Open Data makes sense if we put it in perspective: the rise of Big Data
62
everything is changing
63