open data and the programmable city 1_rob ktichin...the data revolution •conceptualisation of data...

The Impact of the Data Revolution on Official Statistics: Opportunities, Challenges and Risks Prof. Rob Kitchin National University of Ireland Maynooth @robkitchin


Post on 22-May-2020


Page 1: Open Data and the Programmable City 1_Rob Ktichin...The data revolution •Conceptualisation of data •Data infrastructures •Open and linked data •Big data •Data analytics •Data

The Impact of the Data Revolution on Official Statistics: Opportunities, Challenges and Risks

Prof. Rob Kitchin

National University of Ireland Maynooth

@robkitchin


Background

• All-Island Research Observatory (www.airo.ie)

• Dublin Dashboard (www.dublindashboard.ie)

• Digital Repository of Ireland (www.dri.ie)

• The Programmable City


The data revolution

• Conceptualisation of data

• Data infrastructures

• Open and linked data

• Big data

• Data analytics

• Data uses

• Data markets

• Data ethics

• Create disruptive innovations that offer opportunities, challenges and risks for government, business and academy


Data infrastructures

• Actively planned, curated and managed

• Enables storing, scaling, combining, sharing and consuming data across networked archives and repositories

• National statistical institutes (NSIs) have long operated as such trusted data infrastructures

• Now a need to organise into more coordinated platforms, i.e. National Data Infrastructures that extend across government departments, with:

• dedicated and integrated hardware and networked technologies

• interoperable software and middleware services and tools

• shared standards, protocols and metadata

• shared services, analysis tools and policies

• Also skilled data/statistical staffing operating across government departments/agencies

• Can handle big data streams and diverse forms of data ― administrative, survey, operational, services/infrastructure, spatial/planning, sensor/IoT, scientific, crowdsourced, locative and social media, derived, etc.

• Can federate into larger pan-national infrastructures (Eurostat, ESPON, UN, etc)


Open and linked data

• Opening PSI (and other) data for re-use

• Driven by arguments re. transparency, participation, collaboration, economic development

• Linking data/metadata using non-proprietary formats, URIs and RDF

• NSIs already very active in this space; other government data providers much further behind

• More to be done, especially:

• retro opening and linking of historical records

• producing APIs & machine-readable formats

• upgrading extent of openness (licensing re. re-use, reworking, redistribution, reselling)

• using non-proprietary formats

• opening data about the organizations themselves

• Creating user-friendly analysis tools
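As a rough illustration of what linking data with URIs and RDF can involve, the sketch below serialises a single statistical observation as N-Triples using only the Python standard library. The example.org namespace, the property names (refArea, refYear, population) and the figures are hypothetical placeholders, not drawn from any real NSI vocabulary or dataset.

```python
# Minimal sketch: express one observation as RDF triples in N-Triples syntax.
# All URIs and values below are hypothetical placeholders for illustration.

BASE = "http://example.org/stats/"  # assumed namespace, not a real vocabulary
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"


def uri(term: str) -> str:
    """Wrap a term in the assumed namespace as an N-Triples URI reference."""
    return f"<{BASE}{term}>"


def typed_literal(value: int) -> str:
    """Format an integer as a typed N-Triples literal."""
    return f'"{value}"^^<{XSD_INT}>'


def observation_triples(obs_id: str, area: str, year: int, population: int) -> list[str]:
    """Return the N-Triples lines describing a single observation."""
    subject = uri(f"observation/{obs_id}")
    return [
        f"{subject} {uri('refArea')} {uri('area/' + area)} .",
        f"{subject} {uri('refYear')} {typed_literal(year)} .",
        f"{subject} {uri('population')} {typed_literal(population)} .",
    ]


# Placeholder values only; not real statistics.
triples = observation_triples("obs1", "dublin", 2011, 1000)
print("\n".join(triples))
```

Because the area is itself a URI rather than a text label, another publisher could in principle point at the same identifier, which is what allows datasets to be combined.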


Usability and utility

Five levels of open and linked data

| Level | Form | Benefits | Costs |
|---|---|---|---|
| 1 * | Non-machine readable | Data is available | Data is locked in document and is difficult to release. |
| 2 ** | Machine-readable but using proprietary format (e.g., Excel) | Data can be analyzed with proprietary software; data can be exported in other formats | Depends on proprietary software to access and use. |
| 3 *** | Machine-readable using non-proprietary format (e.g., .CSV) | Data can be analyzed in any software package | Is data on the Web, not data in the Web, and is not linked in nature and so exists in isolation. |
| 4 **** | Machine-readable, using non-proprietary format and URIs and RDF | Data can be accessed from anywhere on Web, be easily linked to and combined with other data, and plugged into existing tools and libraries. | Can increase data preparation time and data management and curation. |
| 5 ***** | Machine-readable, using non-proprietary format and URIs and RDF, and linking to other data and metadata | As level 4, but data becomes more discoverable and users have full access to data schema/ontology | Needs active data management to maintain inward and outward links. |
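The five levels in the table above amount to a simple decision rule, which can be sketched as follows. This is an illustrative reading of the table rather than an official scoring tool; the `Dataset` class and its field names are assumptions introduced here, and the dataset is taken to be openly available already.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (field names are assumptions)."""
    machine_readable: bool     # can software parse it directly?
    open_format: bool          # non-proprietary format such as CSV
    uses_uris_and_rdf: bool    # published with URIs and RDF
    links_to_other_data: bool  # links out to other data and metadata


def openness_level(d: Dataset) -> int:
    """Return the level (1 to 5) from the table for an already open dataset."""
    if not d.machine_readable:
        return 1
    if not d.open_format:
        return 2
    if not d.uses_uris_and_rdf:
        return 3
    if not d.links_to_other_data:
        return 4
    return 5


# Illustrative cases: a scanned report, a spreadsheet, and a CSV download.
scanned_report = Dataset(False, False, False, False)
spreadsheet = Dataset(True, False, False, False)
csv_download = Dataset(True, True, False, False)

for name, d in [("scanned report", scanned_report),
                ("spreadsheet", spreadsheet),
                ("CSV download", csv_download)]:
    print(name, "-> level", openness_level(d))
```

Each level presupposes the one before it, which is why the checks are ordered and stop at the first unmet requirement.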


Big data

| Characteristic | Small data | Big data |
|---|---|---|
| Volume | Limited to large | Very large |
| Exhaustivity | Samples | Entire populations |
| Resolution and indexicality | Coarse & weak to tight & strong | Tight & strong |
| Relationality | Weak to strong | Strong |
| Velocity | Slow, freeze-framed | Fast |
| Variety | Limited to wide | Wide |
| Flexible and scalable | Low to middling | High |


Big data

• Diverse range of public and private generation of fine-scale data about citizens, activities and places in real time:

• utilities

• transport providers, logistics systems

• environmental agencies

• mobile phone operators

• app developers

• social media sites

• travel and accommodation websites

• home appliances and entertainment systems

• financial institutions and retail chains

• private surveillance and security firms

• remote sensing, aerial surveying

• emergency services


Big data and official statistics (source: ESSC 2014)

| Data source | Data type | Statistical domains |
|---|---|---|
| Mobile communication | Mobile phone data | Tourism statistics; population statistics |
| WWW | Web searches | Labour statistics; migration statistics |
| WWW | e-commerce websites | Price statistics |
| WWW | Businesses’ websites | Information society statistics; business registers |
| WWW | Job advertisements | Employment statistics |
| WWW | Real-estate websites | Price statistics (real estate) |
| WWW | Social media | Consumer confidence; GDP and beyond; information society statistics |
| Sensors | Traffic loops | Traffic/transport statistics |
| Sensors | Smart meters | Energy statistics |
| Sensors | Satellite images | Land use statistics; agricultural statistics; environment statistics |
| Sensors | Automatic vessel identification | Transport and emissions statistics |
| Transactions of process generated data | Flight movements | Transport and emissions statistics |
| Transactions of process generated data | Supermarket scanner and sales data | Price statistics; household consumption statistics |
| Crowdsourcing | Volunteered geographic information (VGI) websites (OpenStreetMap, Wikimapia, Geowiki) | Land use |
| Crowdsourcing | Community pictures collections (flickr, Instagram, Panoramio) | - |


Data analytics

• Challenge of making sense of big data is coping with its:

• abundance and exhaustivity

• timeliness and dynamism

• messiness and uncertainty

• semi-structured or unstructured nature

• Solution has been machine learning made possible by advances in computation

• Four broad classes of analytics:

• data mining and pattern recognition

• statistical analysis

• prediction, simulation, and optimization

• data visualization and visual analytics


New paradigms?

• Big data, coupled with new data analytics, challenges established epistemologies across the sciences, social sciences and humanities

• Transforming how we frame, ask and answer questions

• Some argue leading to new paradigms within and across disciplines

• For Kuhn (1962) paradigm shifts are driven by science being unable to account for particular phenomena or answer key questions

• For Gray (2009) paradigm shifts are also driven by new forms of measurement, data and analytical techniques. He charts the evolution of science through four broad paradigms

| Paradigm | Nature | Form | When |
|---|---|---|---|
| First | Experimental science | Empiricism; describing natural phenomena | pre-Renaissance |
| Second | Theoretical science | Modelling and generalization | pre-computers |
| Third | Computational science | Simulation of complex phenomena | pre-big data |
| Fourth | Exploratory science | Data-intensive; statistical exploration and data mining | Now |


End of theory vs data-driven science

• Some suggest that big data ushers in a new era of empiricism wherein data can speak for themselves free of theory

• ‘End of theory’ thesis challenges traditional statistical approach

• Anderson (2008) argues: ‘The data deluge makes the scientific method obsolete’; that the patterns and relationships contained within big data inherently produce meaningful and insightful knowledge

• For others it is leading to a new era of data-intensive science and a radically new extension of the established scientific method

• Differs from traditional, experimental deductive design in that it seeks to generate hypotheses and insights ‘born from the data’ rather than ‘born from the theory’

• The epistemological strategy is to use guided knowledge discovery techniques to identify potential questions worthy of further examination and testing

• Both differ from the traditional ways in which NSI data are analyzed


Big data and official statistics

• Given their scope, timeliness and resolution, big data have captured the interest of:

• NSIs

• Eurostat, the European Statistical System (ESS)

• United Nations Economic Commission for Europe (UNECE)

• United Nations Statistical Division (UNSD)

• In 2013 EU NSIs signed the Scheveningen Memorandum to examine the use of big data in official statistics

• Initial analysis indicates that whilst big data offer a number of opportunities for official statistics, they also offer a series of challenges and risks


Big data: opportunities

• Complement, replace, improve, and add to existing datasets/statistics

• Produce more timely outputs ― nowcasting

• Compensate for survey fatigue of citizens and companies

• Complement and extend micro-level and small area analysis

• Improve quality and ground truthing

• Refine existing statistical composition

• Easier cross-jurisdictional comparisons

• Better linking to other datasets

• New data analytics producing new and better insights

• Reduced costs

• Optimization of working practices and efficiency gains in production

• Redeployment of staff to higher value tasks

• Greater collaboration with computational social science, data science, and data industries

• Greater visibility and use of official statistics
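One way to picture the nowcasting opportunity listed above is a simple regression that maps a timely big data indicator onto an official statistic published with a lag. The sketch below is a minimal illustration under strong assumptions: the proxy index and the official series are synthetic numbers invented for the example, and a real nowcast would need far more careful modelling and validation.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic history: a timely proxy (e.g. an index built from job adverts)
# alongside the official figure later published for the same periods.
proxy_history = [1.0, 2.0, 3.0, 4.0]
official_history = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(proxy_history, official_history)

# The proxy for the current period is already available; the official
# figure is not, so the fitted line provides a provisional estimate.
current_proxy = 5.0
nowcast = slope * current_proxy + intercept
print(f"Provisional estimate for the current period: {nowcast:.1f}")
```

The estimate is only as good as the stability of the relationship between proxy and official series, which ties directly to the continuity and quality risks discussed later in the talk.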


Big data: challenges

• Forming strategic alliances with big data producers

• Gaining access to data, procurement and licensing

• Gaining access to associated methodology and metadata

• Establishing provenance and lineage of datasets

• Legal and regulatory issues, including intellectual property

• Establishing suitability for purpose

• Establishing dataset quality with respect to veracity (accuracy, fidelity), uncertainty, error, bias, reliability, and calibration

• Technological feasibility re. transferring, storing, cleaning, checking, and linking big data

• Methodological feasibility re. augmenting/producing official statistics (OSs)

• Experimenting and trialing big data analytics

• Institutional change management and staff re-skilling

• Ensuring inter-jurisdictional collaboration and common standards
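Some of the quality challenges above (error, reliability, suitability for purpose) start with very basic checks on an incoming dataset. The sketch below shows what such a first pass might look like; the record layout, the `id` and `kwh` field names and the plausible range are hypothetical, loosely modelled on smart meter readings, and the sample records are synthetic.

```python
def quality_report(records, field, low, high):
    """Count missing values, repeated ids and out-of-range values for one field."""
    missing = sum(1 for r in records if r.get(field) is None)

    seen_ids = set()
    duplicate_ids = 0
    for r in records:
        if r["id"] in seen_ids:
            duplicate_ids += 1  # the same id has appeared earlier
        seen_ids.add(r["id"])

    out_of_range = sum(
        1 for r in records
        if r.get(field) is not None and not (low <= r[field] <= high)
    )

    return {
        "total": len(records),
        "missing": missing,
        "duplicate_ids": duplicate_ids,
        "out_of_range": out_of_range,
    }


# Synthetic readings: one missing value, one repeated id, one negative value.
readings = [
    {"id": 1, "kwh": 2.5},
    {"id": 2, "kwh": None},
    {"id": 2, "kwh": 3.1},
    {"id": 3, "kwh": -4.0},
]

report = quality_report(readings, "kwh", 0.0, 100.0)
print(report)
```

Checks like these say nothing about bias or representativeness, which typically need the methodology and metadata that data producers may not share.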


Big data: risks

• Mission drift

• Data quality and losing control of generation / sampling / processing

• Inconsistent access and continuity (breaks in method/time-series)

• Privacy breaches and data security

• Damage to reputation and losing public trust

• Resistance of big data providers and populace

• Fragmentation of approaches across jurisdictions

• Resource constraints and cut-backs

• Privatisation and competition


Way forward

• UNECE Big data sandbox

• Hosted by the Central Statistics Office (CSO) and the Irish Centre for High-End Computing (ICHEC)

• Technical platform to:

• test the feasibility of remote access and processing

• test whether existing statistical standards / models / methods etc. can be applied to big data

• determine which big data software tools are most useful for statistical organisations

• learn more about the potential uses, advantages and disadvantages of big data sets ― ‘learning by doing’

• build an international collaboration community to share ideas and experiences on the technical aspects of using big data


Way forward

• Need to find common international positions on:

• conceptual and operational (management, technology, methodology) approach and dealing with risks

• other roles NSIs might adopt, such as becoming the arbiters or certifiers of big data quality, or becoming clearing houses for statistics from non-traditional sources

• resolving issues of access, procurement, licensing, and standards

• identifying and tackling privacy, ethics, security, legal, and governance issues

• establishing best practices for change management that will maintain quality standards, continuity and trust

• resourcing at national and international scales


Conclusion

• A data revolution is underway

• a fundamental shift in data openness and sharing

• the scaling into data infrastructures

• big data and new data analytics

• Creating a set of disruptive innovations that are producing opportunities, challenges and risks for NSIs and statistical systems

• It is important for NSIs to get ahead of the curve with respect to challenges and risks, becoming proactive not reactive and setting the agenda for new data and statistical innovations

• This requires conceptual, practical, technical and strategic thought and a coordinated approach



@robkitchin

Kitchin, R. (2014) Big data, new epistemologies and paradigm shifts. Big Data and Society 1 (April-June): 1-12.

Kitchin, R. (2015) The opportunities, challenges and risks of big data for official statistics. Statistical Journal of the International Association of Official Statistics 31(3): 471-481.

Kitchin, R. and Lauriault, T. (2015) Small data in the era of big data. GeoJournal 80(4): 463-475

Kitchin, R., Lauriault, T. and McArdle, G. (2015) Knowing and governing cities through urban indicators, city benchmarking & real-time dashboards. Regional Studies, Regional Science 2: 1-28

http://www.nuim.ie/progcity

@progcity