ess event: big data in official statistics · many important players like google. the official...


Post on 23-Jun-2020



ESS event: Big Data in Official Statistics



LEARNING AND DEVELOPMENT: CAPACITY BUILDING AND TRAINING FOR ESS HUMAN RESOURCES

FACILITATOR: JOSÉ CERVERA-FERRI

Parallel sessions 2A and 2B


Session 2: Related Scheveningen challenges

[SCH5] Short-term Human Resources needs: recruitment, professional training, secondment/re-deployment

[SCH5] Long-term needs: academic curricula for Data Scientists

[SCH6] Collaboration with academia for training Data Scientists for official statistics


Session 2: Topics for discussion

• Skills for Big Data

• Opportunities for building skills

• Proposal for a key input to the roadmap to be established by the ESS Task Force

• Cross-cutting: short-term vs long-term


Session 2: Organization

• Skills for Big Data: Session 2A

• Opportunities for acquiring skills: Session 2A (short-term), Session 2B (long-term)

• Proposal for a roadmap to acquire skills for Big Data in the ESS: Session 2B


SKILLS FOR BIG DATA

OPPORTUNITIES FOR ACQUIRING SKILLS

Parallel session 2A


Session 2A

Preliminary considerations (1): Can NSIs rely on existing skills?

• “Non-traditional set of skills to develop”

• Trained statisticians and IT staff in statistics are already close to the “data science” skills required for Big Data (data cleaning, cubes, analytical software, data mining, etc.). Staff are well trained in methodology and statistical domains (UNECE Sprint paper, SWOT analysis – strength).

• The Official Statistics Community has less knowledge of Big Data than many important players like Google. The Official Statistics Community has limited skills and limited IT resources when it comes to the new, non-traditional, technologies used to gather, process and analyse Big Data (UNECE Sprint paper, SWOT analysis – weakness).


Session 2A Preliminary considerations (1):

Can NSIs rely on existing skills? (cont.)

• Young staff coming in from universities may be very innovative, may already have a personal relationship with Big Data (Facebook, Google, Twitter trends) and may be less constrained by traditional IT and analysis (UNECE Sprint paper, SWOT analysis – opportunity).

• Failure to permit innovative methods might render OSC organizations less attractive workplaces for top talent (UNECE Sprint paper, SWOT analysis – threat).

• Cultural change:

– “a culture that values high quality and accurate information and regards the best way to achieve this through use of methods where the design can be controlled. Big Data doesn't allow this luxury”

– Innovative thinking, risk-taking (is it the realm of Civil Servants??)


Session 2A Preliminary considerations (2):

Learning methods

• Learning by doing in OS

• Training individuals or teams?

– The business analyst and project manager

– The mathematician who builds algorithms

– The data architect

– The statistician (data collection, editing, processing)

– The communicator (visualization)

• Data analyst

• Data scientist

• Data engineer

• Data integrator

• System manager


Session 2A Preliminary considerations (3):

Competition

• Competition with industry: better salaries for Data Scientists in the private sector?

• How to retain talent?


Session 2A Skills for Big Data

Data Scientist vs. Statistician

• Data Scientist as the “connective tissue” between data-processing technologies and data-driven decision making

Necessary skills: math/statistics, IT, visualization, subject matter specialization

• Math/stat: data mining techniques

• IT: Hadoop, MongoDB, NoSQL, …


Session 2A: IT Skills for Big Data

• R, SAS, SPSS

• Business Intelligence, Visual Analytics, Excel

• MapReduce

• Pig, Java

• SQL

• ETL (extract, transform, load)

• Linux…

• Which are the priorities?


Session 2A Statistical Skills for Big Data

• Computational statistics

• Analytical methods: correlations & causality, modelling, network analysis, information reduction

• Dissemination: data visualization

• Which are the priorities?


Session 2A Opportunities in the ESS

• ESS Learning and Development Framework

• ESTP 2014 course – Big Data: Effective Processing and Analysis of Very Large and Unstructured Data for Official Statistics

• Contents: classification of various massive data sets, ETL (extract, transform, load), specific challenges, privacy and statistical disclosure issues, computing base, overview of statistical methods. Focus on concrete examples.

• Course requirements:

– Database fundamentals and data manipulation languages

– Data collection and integration tools

– Data mining techniques for large data sets

– Object-oriented design and programming

– Probability and random variables

• Is there anyone with such a complete background in Official Statistics???

• European Masters in Official Statistics (EMOS): ESS certification of programmes offered by Universities – EMOS workshop 2014 (Helsinki, June 2014)

• Other methods for transfer of know-how within the ESS?


OPPORTUNITIES FOR ACQUIRING SKILLS (CONT.)

KEY INPUT TO THE ROADMAP TO BE ESTABLISHED BY THE ESS TASK FORCE

Parallel session 2B


Session 2B Opportunities outside the ESS

Grasping the opportunities outside:

• Diversity of academic programmes on Big Data, Business Analytics, Data Science… (certification?)

• Training offer from private companies (certification?)

• Opportunities within Horizon 2020


Session 2B [SCH6] Collaboration with Academia

• Academic collaborators: use of existing expertise in statistical analysis of large sets of data: astronomy, remote sensing, genetics, image processing…

• Source of training: need for mapping academic programmes on Big Data

• How can academics be integrated with NSI staff?

• How can training be financed? National or ESS level?


Session 2B Horizon 2020

• Marie Sklodowska-Curie actions: support for innovative training networks, mobility of researchers, inter-sectoral cooperation

• ICT 15-2014: Big data and Open Data Innovation and take-up:

– Objective: To contribute to capacity-building by designing and coordinating a network of European skills centres for big data analytics technologies and business development. The network is expected to identify knowledge/skills gaps in the European industrial landscape and produce effective learning curricula and documentation to train large numbers of European data analysts and business developers, capable of (co)operating across national borders on the basis of a common vision and methodology.

– Expected impact: Availability of deployable educational material for data scientists and data workers and thousands of European data professionals trained in state-of-the-art data analytics technologies and capable of (co)operating in cross-border, cross-lingual and cross-sector European data supply chains.

• Call on “Training and educating Data Scientists”

• More detailed linkages in Horizon 2020??


Session 2B Input to the Roadmap: The actions

• Ideas for actions (which term?):

– Identify existing skills in the ESS

– Recruit Data Scientists with the missing skills

– Establish a network of providers of Big Data skills within the ESS

– Map the offer of Data Science training programmes in the private sector and their applicability to OS

– Establish a repository of assessed training materials

– Establish agreements with the private sector and academia as providers of training

– …

• Who?

– NSIs, Eurostat, international organizations, private sector, academia?

– Working Groups? Gexp (EMOS), HLG, ESTP, ???

• Which source of financing?

– Horizon 2020? Eurostat? National budgets?


Session 2B Input to the Roadmap: The actors

• Ideas for actors:

– NSIs

– Eurostat

– International organizations

– Universities

– Private sector


Session 2B Input to the Roadmap for Big Data training

• Brainstorming of ideas for building skills

• Assessment: sort by impact and ease of implementation

• Discussion of term, actors and level (national/EU/global)

• Proposal of responsibilities and time frame for the “Input Rome Roadmap”
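The assessment step (sorting brainstormed ideas by impact and ease of implementation) can be sketched as a simple ranking. This is a minimal illustration under stated assumptions: the `Idea` record, the 1 to 5 scoring scale and the example scores are hypothetical placeholders, not outcomes of the session; only the idea labels are taken from the list of candidate actions.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    """A brainstormed idea with hypothetical 1-5 scores (5 = best)."""
    name: str
    impact: int  # hypothetical: expected effect on Big Data skills in the ESS
    ease: int    # hypothetical: how easy the action is to implement


def prioritise(ideas):
    """Rank ideas by impact first, then by ease, both in descending order."""
    return sorted(ideas, key=lambda idea: (idea.impact, idea.ease), reverse=True)


if __name__ == "__main__":
    # Placeholder scores for illustration only; not results of the session.
    ideas = [
        Idea("Identify existing skills in the ESS", impact=3, ease=4),
        Idea("Recruit Data Scientists with the missing skills", impact=5, ease=2),
        Idea("Establish a repository of assessed training materials", impact=4, ease=4),
    ]
    for idea in prioritise(ideas):
        print(f"{idea.name}: impact={idea.impact}, ease={idea.ease}")
```

Ranking lexicographically (impact first, ease as tie-breaker) is one possible reading of "sort by impact and ease"; a combined score such as the sum or product of the two would be an equally reasonable choice.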