data science and official statistics myth or reality

33
Data Science and Official Statistics Myth or Reality ? Prof. Dr. Bertrand Loison Vice Director at Federal Statistical Office & Professor of Information Systems at University of Applied Sciences Western Switzerland Swiss Conference on Data Science (SDS2020) Online, 26 June 2020

Upload: others

Post on 25-Mar-2022

3 views

Category:

Documents


0 download

TRANSCRIPT

17th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

Data Science and Official Statistics Myth or Reality ?

Prof. Dr. Bertrand Loison

Vice Director at Federal Statistical Office & Professor of Information

Systems at University of Applied Sciences Western Switzerland

Swiss Conference on Data Science (SDS2020)

Online, 26 June 2020

27th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

« Ensuring statistics accurately reflect a changing economy is one of the hardest challenges NSIs face.

The economy’s complexity and structure are becoming increasingly difficult to capture within the basic conceptual framework that underpins the national accounts. When the statistical framework was first devised, the economy was one in which most businesses were engaged in the production of reasonably homogenous goods in a single country.

The reality today is rather different, with many businesses operating across national borders and producing a range of heterogeneous goods and services that may be tailored to the tastes of individual consumers. »Source: Charles Bean, Independent Review of UK Economic Statistics, (2016). https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/507081/2904936_Bean_Review_Web_Accessible.pdf, pp. 116.

Digital Transformation Leads to… A New Reality

37th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

1. The Need to adapt the way NSI measure our

How Do We Measure This New Reality ?

Source: NTTS 2015, (2015). https://www.researchgate.net/publication/273909550_Statistics_40

47th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

Can Big Data and Data Science Help Measure This New Reality ?

67th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

1. Demystifying the ‘big data’ hype2. Demystifying the ‘Internet of things’ hype3. Demystifying the two approaches of analytics4. Demystifying ‘analytics of things’5. Process models for continuous improvement6. Conclusion

Agenda

77th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

Data Collection (input) - Characteristics of Survey, Administrative and Big Data

Source: Rob Kitchin, The opportunities, challenges and risks of big data for official statistics, National University of Ireland, Maynooth, (2015). https://www.researchgate.net/publication/282421109_The_opportunities_challenges_and_risks_of_big_data_for_official_statistics, pp. 9.

1

2

3

4

Surveys

Registers andadministrative data

New data sources(big data)

2007

2017

87th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

• The term ‘big data’ — coined in 1997 by two researchers at the NASA — has acquired the trappings of a ‘religion’.

• But, what exactly are ‘big data’ ?

Big data are a data management IT infrastructure which should ensure that the underlying hardware, software and architecture have the ability to enable ‘learning from data’ or ‘making sense out of data’, i.e. ‘analytics’ ( ‘data-driven decision making’ and ‘data-informed policy making’).

The term ‘big data’ applies to an accumulation of data that can not be processed or handled using traditional data management processes or tools.

Big Data History

97th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

Big Data Characteristics

• ‘Volume’, ‘Variety’ and ‘Velocity’ are the ‘essential’ characteristics of (big) data.

• ‘Veracity’ and ‘Value’ are the ‘qualification for use’ characteristics of (big) data.

• The 5th V of big data: ‘Value’ , i.e. the ‘usefulness of data’.

Source: B. Loison, D. Kuonen, FSO’s Data Innovation Strategy, Seminar on Data Science Campus activities in Kigali, UNGWG, (2019). https://www.slideshare.net/BertrandLoison/fsos-data-innovation-strategy-kigali-Rwanda, slide 12.

107th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

Big Data (New Data Sources) – Tentative Taxonomy

Social Networks(human-sourced information)

Internet of Things(machine-generated data)

Traditional Business Systems(process-mediated data)

• Social Networks: Facebook, Twitter, Tumblr, etc.

• Blogs and comments• Personal documents• Pictures: Instagram, Flickr,

Picasa, etc.• Videos: YouTube, etc.• Internet searches• Mobile data content: text

messages• User-generated maps• E-Mail

• Data produced by Public Agencies

• Medical records

• Data produced by Businesses • Commercial transactions • Banking/stock records • E-commerce • Credit cards

• Fixed sensors• Home automation • Weather/pollution

sensors • Traffic sensors/webcam• Scientific sensors • Security/surveillance

videos/images • Mobile sensors (tracking)

• Mobile phone location • Cars• Satellite images

Source: Big Data: Potential, Challenges, and Statistical Implication, IMF, (2017). https://www.imf.org/en/Publications/Staff-Discussion-Notes/Issues/2017/09/13/Big-Data-Potential-Challenges-and-Statistical-Implications-45106, pp. 9 - 10.

117th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

Potential Use of Big Data in Official Statistics

Source: Big Data: Potential, Challenges, and Statistical Implication, IMF, (2017). https://www.imf.org/en/Publications/Staff-Discussion-Notes/Issues/2017/09/13/Big-Data-Potential-Challenges-and-Statistical-Implications-45106, pp. 11.

147th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

1. Demystifying the ‘big data’ hype2. Demystifying the ‘Internet of things’ hype3. Demystifying the two approaches of analytics4. Demystifying ‘analytics of things’5. Process models for continuous improvement6. Conclusion

Agenda

157th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

• The term ‘Internet of Things’ (IoT) — coined in 1999 by the technologist Kevin Ashton —starts acquiring the trappings of a ‘new religion’ !

Source: Christer Bodell, ‘SAS Institute and IoT’, May 30, 2017 (goo.gl/cVYCKJ)

Internet of Things (IoT)

167th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

• ‘Volume’, ‘Variety’ and ‘Velocity’ are the ‘essential’ characteristics of IoT (data).

• ‘Veracity’ and ‘Value’ are the ‘qualification for use’ characteristics of IoT (data).

• The five V of big data are the sameas the five V of IoT (data).

Internet of Things (IoT) Characteristics

Source: Statistical Journal of the IAOS, vol. 35, no 4, pp 615-622, 2019

177th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

1. Demystifying the ‘big data’ hype2. Demystifying the ‘Internet of things’ hype3. Demystifying the two approaches of analytics4. Demystifying ‘analytics of things’5. Process models for continuous improvement6. Conclusion

Agenda

187th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

Data Processing (throughput) – Two Approaches of Analytics

1

2

3

4

Surveys

Registers andadministrative data

New data sources(big data)

Statistical InformationSystems (SIS)

+Complementary

analytics methods

2007

2017

2017

6

5

20102017

197th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

Source: João Tapadinhas, VP Business Analytics and Data Science, Gartner, June 2014 (goo.gl/YmjFPB)

Questions Analytics try to answer & Analytics Continuum

«16. Big data breaks with the traditional method to search for causality.

Working with big data implies seeking patterns and correlations that may not tell us why something is happening, but rather alert us that it is happening […].

In this vein, new indicators can be developed to obtain real-time correlations and to establish a more comprehensive early-warning system (Kitchin 2015) that can monitor the buildup of country specific as well as systemic risks in the real, external, fiscal, and financial sectors. »Source: Big Data: Potential, Challenges, and Statistical Implication, IMF,(2017). https://www.imf.org/en/Publications/Staff-Discussion-Notes/Issues/2017/09/13/Big-Data-Potential-Challenges-and-Statistical-Implications-45106, pp. 11 - 12.

207th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

Primary analytics or top-down (i.e. explanatory and confirmatory) analytics.

‘Idea (hypothesis) evaluation or testing’.

Analytics’ paradigm: ‘deductive reasoning’ as ‘idea (theory) first’.

Statistics traditionally is concerned with analysing primary (e.g. experimental or ‘made’ or ‘designed’) data that have been collected (and designed) for statistical purposes to explain and check the validity of specific existing ‘ideas’ (‘hypotheses’), i.e. through the operationalisation of

theoretical concepts.

Deductive Reasoning

217th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

Secondary analytics or bottom-up (i.e. exploratory and predictive) analytics.

‘Idea (hypothesis) generation’.

Analytics’ paradigm: ‘inductive reasoning’ as ‘data first’.

Data science — a rebranding of ‘data mining’ and as a term coined in 1997 by a statistician —on the other hand, typically is concerned with analysing secondary (e.g. observational or ‘found’

or ‘organic’ or ‘convenience’) data that have been collected (and designed) for other reasons (and often not ‘under control’ or without supervision of the investigator) to create new ideas

(hypotheses or theories).

Inductive Reasoning

227th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

• The two approaches of analytics, i.e. deductive and inductive reasoning, are complementary and should proceed iteratively and side by side in order to enable data-driven decision making, data-informed policy making and proper continuous improvement.

• The inductive-deductive reasoning cycle:

Source: Box, G. E. P. (1976). Science and statistics. Journal of the American Statistical Association, 71, 791–799.

The Inductive - Deductive Reasoning Cycle - #1

« Neither exploratory nor confirmatory is sufficient alone. To try to replace either by the other is madness. We need them both. »

John W. Tukey, 1980

237th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

Theory

Hypothesis

Survey

Data

Pattern

Literature

Deductive approach(currently uses

analytical methods)

Start of data science approach

Start of classical approach

Inductive approach(uses complementaryanalytical methods)

The Inductive - Deductive Reasoning Cycle - #2

247th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

1. Demystifying the ‘big data’ hype2. Demystifying the ‘Internet of things’ hype3. Demystifying the two approaches of analytics4. Demystifying ‘analytics of things’5. Process models for continuous improvement6. Conclusion

Agenda

257th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

It becomes key to execute analytics and related ‘data quality processes’ on the data-gathering devices themselves, i.e. at the edge (or at the ‘endpoint’), or as close to the originating data source as possible.

The ‘Analytics of Things’ (AoT) corresponds to the ‘analytics layer’ that occurs with the IoT devices and their generated data.

For example, in practice, the most efficient way to control data quality is to do it at the point where the data are created, as cleaning up data downstream (and hence centralised) is expensive and not scalable.

‘Analytics at the edge’ or ‘edge analytics’ is based on a distributed ‘IT architecture layer’ called ‘edge computing’.

Analytics of Things (AoT)

267th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

Pushing Computation Out – Moving the Analytics to the Data - #1

287th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

1. Demystifying the ‘big data’ hype2. Demystifying the ‘Internet of things’ hype3. Demystifying the two approaches of analytics4. Demystifying ‘analytics of things’5. Process models for continuous improvement6. Conclusion

Agenda

307th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

1. “Plan–Do–Check–Act” (PDCA)

2. “Generic Statistical Business Process Model” (GSBPM)

The ‘Plan–Do–Check–Act’ (PDCA) cycle is often referred to as the Deming cycle, Deming wheel or the

Shewhart cycle.

The ‘Generic Statistical Business Process Model’ (GSBPM) — coordinated through the ‘United Nations Economic Commission for Europe’ (Version 5.1 as of January 2019) — is consistent with the PDCA cycle.

Process models for continuous improvement - #2

337th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

• Moreover, the GSBPM is a deductive reasoning and a sequential approach. For example, the first GSBPM steps are entirely focused on deductive reasoning for primary data collection and are not suited for inductive reasoning applied to (already existing) secondary data. Moreover, the evaluation (‘Evaluate’ step) is only performed at the end.

• This process model needs to be adapted to incorporate analytics by taking into account both approaches of analytics (i.e. inductive and deductive reasoning) and through the usage of, for example, data-informed continuous evaluation at any GSBPM step.

Current production processes of official statistics need to be augmented and empowered by analytics (of things) !

Process models for continuous improvement - #4

347th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

3. “CRoss Industry Standard Process for Data Mining”

4. The complementary cycles of developing and deploying “analytical assets”

The CRISP-DM (‘CRoss Industry Standard Process for Data Mining’) process — initially conceived in 1996 — is also consistent with the PDCA cycle

Source: Erick Brethenoux, Director, IBM Analytics Strategy & Initiatives, August 18, 2016 (goo.gl/AhsG1n).

Process models for continuous improvement - #5

357th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

Statistical Production Process for Official Statistics

Source: Swiss Federal Statistical Office (2018). https://www.slideshare.net/BertrandLoison/fsos-data-innovation-strategy.

Source: Statistical Journal of the IAOS, vol. 35, no 4, pp 615-622, 2019

367th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

1. Demystifying the ‘big data’ hype2. Demystifying the ‘Internet of things’ hype3. Demystifying the two approaches of analytics4. Demystifying ‘analytics of things’5. Process models for continuous improvement6. Conclusion

Agenda

377th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

Task Teams

1. Access and Partnerships2. Big Data and the Sustainable Development Goals3. Mobile Phone Data4. Satellite Imagery and Geo-Spatial Data5. Scanner Data6. Social Media Data7. Training, Skills and Capacity-building8. Committee on Global Platform for Data, Services and

Applications

United Nations - UN Global Working Group on Big Data for Official Statistics

387th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

Eurostat – Experimental Statistics (ESSnet Big Data)

Source: Eurostat, (2019). https://ec.europa.eu/eurostat/web/experimental-statistics..

397th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

Federal Statistical Office – Experimental Statistics

Source: Federal Statistical Office, (2019). https://www.bfs.admin.ch/bfs/en/home/services/recherche/experimental-statistics.html.

407th Swiss Conference on Data Science – Data Science and Official Statistics Myth or Reality | Swiss Federal Statistical Office | Prof. Dr. Bertrand Loison | Online| 26 June 2020

Data Science Competence Center

Source: Swiss Federal Council (2020). https://www.admin.ch/gov/de/start/dokumentation/medienmitteilungen/bundesrat.msg-id-79101.html