26 March 2015 -- 09:30–11:00 Big data in official statistics Part II The case of mobile phone data for tourism statistics EMOS Spring School 23 – 27 March 2015 EUROSTAT Task Force on Big Data & Unit G-3 ‘Short-term business statistics and tourism’

26 March 2015 -- 09:30–11:00

Big data in official statistics

Part II The case of mobile phone data for tourism statistics

EMOS Spring School

23 – 27 March 2015

EUROSTAT Task Force on Big Data & Unit G-3 ‘Short-term business statistics and tourism’


Outline of the session

Feasibility study on using mobile phone data for tourism statistics

Rationale and objectives of the project

Barriers to access

Methodological challenges

Coherence

Opportunities and benefits

Conclusions and points for further discussion

Q&A


Why a project on using mobile phone data?

The world around us is changing

Changing geo-political environment

e.g. free movement of persons in the Schengen area (impact on border surveys)

Quickly evolving technology and large-scale adoption of tools/devices

Changing working environment of official statisticians

New technologies, new techniques, new sources and a new 'Zeitgeist' boost and stimulate a paradigm shift in official statistics


Why a project on using mobile phone data?

Potential of mobile positioning data, expectations

Making collection and compilation of data more efficient: reducing burden and improving quality?

e.g. reduction of data entry error, reduction of recall bias (short trips, same-day visits)

Partly replace data collection on tourism flows within the EU (domestic, outbound)?

Complete or enhance current data on domestic and outbound tourism flows (Regulation 692/2011) with data on total inbound tourism flows?


Why a project on using mobile phone data?

Potential of mobile positioning data, expectations (continued)

Further harmonisation?

e.g. use of algorithms rather than subjective opinion/memory of the respondent

Extension to other domains?

e.g. travel, passenger mobility, migration

Information previously not available

e.g. data at more detailed regional level or destination level, infra-monthly data (day, week, weekends)


Which were the main objectives of the project?

In a nutshell:

Getting answers to the many questions raised by "doubters"/"non-believers" (but also by "believers") in big data, in particular mobile phone data as a source for tourism statistics

Is this only a daydream nation or possibly a promised land for statisticians?

"What about those who don't use mobile phones?"

"I live near the border and sometimes connect to a foreign network!"

"Tourists buy foreign SIM cards when travelling, don't they?"


Which were the main objectives of the project?

[Photo] St. Peter's Square, Vatican City, 2005, Benedictus becomes the new pope

[Photo] St. Peter's Square, Vatican City, 2013, Franciscus becomes the new pope


But if the coverage is not complete, how can we use it as a reliable basis?

We should also look at how things are currently being done!

[Chart: Percentage of households having access to a mobile phone (2006), by country (IS, FI, SE, NO, LU, IT, DK, NL, CY, UK, SI, IE, AT, MT, ES, EE, PT, DE, CZ, SK, LV, EU, BE, HU, FR, LT, EL, PL, BG, RO); vertical axis from 0 to 100]

Italy: 93%

Penetration rate of fixed lines for CATI interviews by ISTAT: 49%


Which were the main objectives of the project?

"Vertical" objectives (task by task)

Assess the feasibility to access databases with mobile positioning data in European countries

Assess the feasibility to use mobile positioning data for tourism statistics in the European context

Identify, discuss and address the main challenges for implementation

Assess the potential impact on cost-efficiency of data production

Assess the possibility to expand the methodology to other domains and define joint algorithms


Which were the main objectives of the project?

"Horizontal" objectives (cross-cutting approach)

Mix of scientific/theoretical & practical/empirical/applied work!

Can the methodology/technology be applied to the particular case of tourism statistics (with its specific international definitions)?

Can it be applied across a wide group of countries in a similar way?

Can the outcomes be generalised to all countries?


Who carried out the feasibility study?

A multidisciplinary, international consortium (DE, EE, FR, FI)

National statistical institutes

Tourism researchers

Academics

Data scientists


Where can I find the reports?

All reports are publicly available for download from the Eurostat website

http://epp.eurostat.ec.europa.eu/portal/page/portal/tourism/methodology/projects_and_studies

1 consolidated report (50 pages, incl. 10 pages executive summary)

5 comprehensive reports:

Stock-taking

Feasibility of access

Feasibility of use (methodological issues)

Feasibility of use (coherence)

Opportunities and benefits


Stock-taking

Inventory of the work already done (focus on Europe)

Use of mobile positioning data for research, in particular for statistics on tourism flows or any other field of official statistics

Institutional set-up (users involved, MNOs involved, technological aspects)

Outcomes (success? failure?) and lessons to be learnt for this project

31 cases with access to data were documented (in official statistics, private or government initiatives, scientific research)


Access

Discussion of potential barriers (and how to overcome these)

privacy issues (operator, national law)

technical issues

financial and business related issues

Improving access to mobile positioning data is THE main short-term challenge in order to pave the way for a more generalised use of this source of big data!


Access

First things first … what is mobile positioning data?

Records of the activities of mobile devices, stored by the mobile network operator (MNO)

Types of data

Call detail records (CDR): on average 4 events per subscriber per day

Data detail records (e.g. internet usage): on average 200 events per subscriber per day

Location updates: on average 12 events per subscriber per day

Technical data: on average 100 events per subscriber per day
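
To get a feeling for the volumes behind these averages, the per-subscriber rates can be scaled up to a whole operator. The following is only a back-of-the-envelope sketch: the rates are the ones quoted above, while the function name and the subscriber count of 1 million are illustrative assumptions.

```python
# Average events per subscriber per day, as quoted on the slide.
EVENTS_PER_SUBSCRIBER_PER_DAY = {
    "call_detail_records": 4,
    "data_detail_records": 200,
    "location_updates": 12,
    "technical_data": 100,
}


def daily_events(subscribers: int) -> dict[str, int]:
    """Expected number of stored events per day, by type of data."""
    if subscribers < 0:
        raise ValueError("subscribers must be non-negative")
    return {
        kind: rate * subscribers
        for kind, rate in EVENTS_PER_SUBSCRIBER_PER_DAY.items()
    }


if __name__ == "__main__":
    # Hypothetical operator with 1 million subscribers (assumption).
    for kind, volume in daily_events(1_000_000).items():
        print(f"{kind}: {volume:,} events per day")
```

Under these assumptions, call detail records are by far the smallest of the four streams, which is one practical reason why they are easier to store and process than data detail records.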


Access

First things first … what is mobile positioning data?

(continued)

For the purpose of the feasibility study, call detail records were used

Good quality as MNOs use this for billing purposes, but nevertheless certain limitations (see further)

Basic information consists of

− subscriber ID

− country code

− time of the event

− type of event (call, sms, data)

− cell ID (location)
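
The five fields listed above can be pictured as one small record per event. The sketch below is purely illustrative: the field names, the semicolon-separated layout and the sample values are assumptions, not the format used by any MNO or by the feasibility study.

```python
from dataclasses import dataclass
from datetime import datetime

# Event types named on the slide.
ALLOWED_EVENT_TYPES = {"call", "sms", "data"}


@dataclass(frozen=True)
class CallDetailRecord:
    subscriber_id: str    # identifier of the subscriber (assumed pseudonymised)
    country_code: str     # country code as delivered by the MNO
    event_time: datetime  # time of the event
    event_type: str       # "call", "sms" or "data"
    cell_id: str          # cell (antenna) that handled the event


def parse_record(line: str) -> CallDetailRecord:
    """Parse one semicolon-separated line (hypothetical file layout)."""
    subscriber_id, country_code, timestamp, event_type, cell_id = (
        line.strip().split(";")
    )
    if event_type not in ALLOWED_EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    return CallDetailRecord(
        subscriber_id,
        country_code,
        datetime.fromisoformat(timestamp),
        event_type,
        cell_id,
    )


if __name__ == "__main__":
    # Made-up example line, for illustration only.
    print(parse_record("a1b2;EE;2012-07-14T09:30:00;sms;C-1042"))
```

Note that the record carries a cell ID rather than coordinates: the location of the subscriber is only known through the antenna that handled the event.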


Access

Barrier #1: privacy protection and legislation

Relevant legislation

Data Protection Directive (Directive 1995/46/EC and its successor, the General Data Protection Regulation)

Electronic Privacy Directive (Directive 2002/58/EC)

Data Retention Directive (Directive 2006/24/EC)

Opinions of the Article 29 Data Protection Working Party

But also…

European Statistics Regulation (Regulation (EC) No 223/2009)


Access

Barrier #1: privacy protection and legislation

Directly or indirectly identifiable personal data (e.g. mobile positioning data) can be used and processed for statistics if one of the following is true:

1. The subscriber has given his/her consent, or

2. National legislation allows the NSI and compels the MNOs, or

3. Data is processed in a fully anonymous way


Access

Barrier #1: privacy protection and legislation

Grey area in the interpretation of the key concept "personal data"

Because the end result of processing is, by itself, anonymous (aggregated data), the processing of personal data for such purpose can be interpreted as appropriate

Reluctance to grant access

Fear of public opinion

Big differences in the NSIs' rights to access data

Access to mobile positioning data can range from relatively easy to nearly impossible

Strong need for a harmonised legal and methodological framework for NSIs to access data from mobile network operators


Access

Barrier #2: technical feasibility

Complicated but possible

Not considered a hard barrier

Some possible issues

Differences in network systems

Patents

Processing (volume of the data)

Continuous data update (processing time)


Access

Barrier #2: technical feasibility

Choice of the data compilation process:

decentralised or centralised? who pays what? quality assessment?


Access

Barrier #3: financial and business related aspects

MNOs are interested if the following is considered

Legal aspects and regulations

Public opinion

Business secrets (e.g. sensitive data such as share in the country's roaming market)

Costs versus benefits (burden in terms of costs is significant: implementation of extraction system, maintenance, human resources, …): big data ≠ free data

MNOs need incentives / expect a mutually beneficial relation

Remuneration scheme for the provision of data, or

Ability to use the data for own purposes (internal, profit-making)


Methodological challenges

'Universal' issues

Data collection & compilation: sampling design, stratification, calibration

Issues that are inherent to mobile phone data

Representativeness (systematic / sampling bias?) of the technique, assessment compared to traditional techniques for data collection?

e.g. structural bias: increase in trips or only increase in use?

overcoverage / undercoverage (> 1 SIM card, foreign SIM card)

Applying tourism statistics scope and definitions?

exclude flows within the usual environment, longitudinal data, …

Not more significant than similar shortcomings of 'traditional' sources


Methodological challenges

Issues that are inherent to new technologies

Continuity of data access

flexibility of changing the data requirements (e.g. new breakdown)

robustness of series if one or more MNOs drop out

contingency planning if all MNOs stop providing data

Shifts in technology and consumer behaviour

new devices and their impact on the way people communicate

new services (e.g. relevance of call detail records in 2020?)

Bigger exposure to exogenous factors makes close monitoring and constant innovation essential conditions for using big data in official statistics


Methodological challenges

Location vs. antenna data: probabilistic geographical distribution
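
The idea behind a probabilistic geographical distribution can be sketched as follows: an event is known only at the level of the antenna (cell), so instead of assigning it to a single region it is spread over the regions that the cell covers. This is a minimal illustration, assuming that the weights are proportional to the share of each cell's coverage area falling in each region; the slide does not say which weighting the study used, and all names and numbers below are made up.

```python
def distribute_event(coverage: dict[str, float]) -> dict[str, float]:
    """Turn one cell's coverage per region into weights that sum to 1."""
    total = sum(coverage.values())
    if total <= 0:
        raise ValueError("coverage must sum to a positive value")
    return {region: area / total for region, area in coverage.items()}


def regional_totals(
    events: list[str], cell_coverage: dict[str, dict[str, float]]
) -> dict[str, float]:
    """Sum the fractional contributions of all events per region."""
    totals: dict[str, float] = {}
    for cell_id in events:
        for region, weight in distribute_event(cell_coverage[cell_id]).items():
            totals[region] = totals.get(region, 0.0) + weight
    return totals


if __name__ == "__main__":
    # Hypothetical coverage areas (km2) of two cells over two regions.
    cell_coverage = {
        "cell_A": {"R1": 3.0, "R2": 1.0},  # cell straddling a regional border
        "cell_B": {"R2": 2.0},             # cell entirely inside region R2
    }
    events = ["cell_A", "cell_A", "cell_B"]  # one cell ID per observed event
    print(regional_totals(events, cell_coverage))
```

In this made-up example the three events are split evenly between the two regions (1.5 each), although no single event can be located in R1 with certainty.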


Methodological challenges

Effect of using different administrative borders on usual environment


Using LAU-1 for defining usual environment

Using LAU-2 for defining usual environment
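
Why the choice of administrative level matters can be shown with a toy example. The sketch below assumes, as a deliberate simplification, that the usual environment is the administrative unit in which a subscriber is observed most often; the study's actual definition is not given on the slide. The cell-to-unit mappings and the event list are made up, with LAU-2 units (M1, M2, M3) nested inside larger LAU-1 units (K1, K2).

```python
from collections import Counter


def usual_unit(units: list[str]) -> str:
    """Most frequently observed unit, a stand-in for the usual environment."""
    return Counter(units).most_common(1)[0][0]


def share_outside(cell_ids: list[str], cell_to_unit: dict[str, str]) -> float:
    """Share of events observed outside the usual environment."""
    units = [cell_to_unit[cell] for cell in cell_ids]
    home = usual_unit(units)
    return sum(unit != home for unit in units) / len(units)


# Hypothetical mappings of cells to administrative units.
CELL_TO_LAU2 = {"c1": "M1", "c2": "M2", "c3": "M3"}  # smaller units
CELL_TO_LAU1 = {"c1": "K1", "c2": "K1", "c3": "K2"}  # larger units

# Made-up events of one subscriber, one cell ID per event.
EVENTS = ["c1", "c1", "c1", "c2", "c2", "c3"]

if __name__ == "__main__":
    print("LAU-2:", share_outside(EVENTS, CELL_TO_LAU2))
    print("LAU-1:", share_outside(EVENTS, CELL_TO_LAU1))
```

With the smaller LAU-2 units, half of the events fall outside the usual environment; with the larger LAU-1 units only one in six does. The same movements therefore yield a different number of potential tourism flows depending on the borders used.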


Methodological challenges

Limitations of mobile positioning data for tourism statistics

Not entirely compatible with existing definitions and breakdowns

Mostly unknown purpose of the trip

No information on expenditure

Mostly unknown means of transport

Generally no socio-demographic breakdowns

The need for longitudinal data (to determine usual place of residence & usual environment) is an additional barrier to getting access


Coherence

Analysis of the coherence of output based on mobile positioning data versus existing tourism statistics

Domain coverage: domestic, inbound, outbound

Breakdown into tourism trips and same-day visits

Coherence with existing indicators, and reasons for deviations
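
One simple way to quantify coherence between a series based on mobile positioning data and an existing reference series is to look at how closely the two move together (correlation) and at their relative level (ratio of totals). The sketch below is only an illustration of that idea; it is not the method of the study, and the monthly figures are made up.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("series must not be constant")
    return cov / sqrt(var_x * var_y)


def level_ratio(mobile: list[float], reference: list[float]) -> float:
    """Total of the mobile-based series divided by the reference total."""
    return sum(mobile) / sum(reference)


if __name__ == "__main__":
    # Made-up monthly numbers of inbound overnight trips (illustration only).
    mobile = [80_000, 95_000, 150_000, 310_000, 140_000, 90_000]
    reference = [60_000, 70_000, 120_000, 240_000, 110_000, 65_000]
    print(f"correlation: {pearson(mobile, reference):.3f}")
    print(f"level ratio: {level_ratio(mobile, reference):.2f}")
```

A high correlation combined with a level ratio well above 1 would point to the same seasonal pattern but a difference in coverage, which then needs to be explained (for instance by trips that the reference source does not capture).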


Coherence

[Figure: time-series panels comparing estimates based on mobile positioning data with existing sources, 2009 to 2012]

Inbound overnight trips (vs. accommodation statistics): MOB_IN(EU-27)_OVERNIGHT and SUPPLY_EE(EU-27)_ARR, monthly

Inbound, outbound overnight trips (vs. ferry passengers data)

Outbound overnight trips (vs. demand side data): MOB_OUT(EU-27)_OVERNIGHT and DEMAND_EE(EU-27)_OVERNIGHT, quarterly

Inbound overnight trips (vs. border control data): MOB_EE(RU) and BORDCONT_EE(RU), monthly

Better coverage

Less recall bias


Opportunities and benefits

Assessment of strengths and weaknesses

(as compared to the current production methodology)

Relevance

+ Completeness: better coverage, larger scope

+ New statistics, indicators, breakdowns previously not available (e.g. finer granularity of space and time)

− Lack of socio-demographic variables and some domain-specific variables (purpose of trips, expenditure, …)

Timeliness

+ Increased integration and automation leads to better timeliness, up to near-real-time data (but impact on the cost!)


Opportunities and benefits

Accuracy

+ Absence of non-response

+ Absence of memory effects or recall bias

− Some overcoverage and undercoverage issues

− Measurement error (# observations vs. precision of location/duration)

Coherence and comparability

+ Good coherence with existing series

+ Synergies with related domains (BOP travel, transport and urban mobility, etc.)

+ Use of joint algorithms leads to better comparability across domains (and over time)

+ Additional calibration source for 'traditional' data


Opportunities and benefits

Cost and burden

+ Elimination of direct respondent burden

+ Elimination of traditional data entry (important error source!)

+ Possibly more cost-efficient than traditional surveys

− Piloting and implementation cost (start up), regular production cost

− Possibly parallel processes (big data / traditional data) in a first phase

− New skills needed

− Dependency on external data providers (in casu MNOs)


Opportunities and benefits

Findings concerning the costs

Example: 3 MNOs (10, 5, 1 million subscribers), 15-day latency

High implementation cost, low annual running cost

Processing within the NSI is less costly (compared to decentralised)


Opportunities and benefits

Findings concerning the costs

Impact of desired latency and of number of subscribers

Cost of data extraction from MNOs increases proportionally with the number of subscribers and rises as the allowed latency is shortened.

Initial implementation and automation are expensive; maintaining the system is much less so.


Conclusions & points for further discussion #1

At present, mobile positioning data cannot replace current statistics but can give complementary and/or faster results

However… official statisticians have to think out of the box and leave their comfort zone

The existing scope and definitions are – besides user needs – based on the available sources and methodologies at the time of development

Do not repeat but do better!

Use of big data necessitates a revolution of the mindset rather than a simple evolution!

Re-thinking indicators, zero-base user need analysis instead of incremental changes in the existing frame


Conclusions & points for further discussion #2

Implications for the statistical community

Follow the developments (before others take over our business!)

Explore mixed-mode solutions (e.g. large samples based on big data + smaller follow-up survey to collect domain-specific information)

Need for horizontal (across domains) and international cooperation in this area (e.g. Task Force on Big Data)

Implementation will benefit from initiatives covering several countries (and domains)

Same market structure (MNOs as contact point!)

Same methodological challenges and limitations

Same user needs (at least at European level)


Conclusions & points for further discussion #3

Achievements of the study

Impressive set of reports addressing many questions

"Everything I always wanted to know about using mobile positioning data for statistics, but was afraid to ask"

This feasibility study should serve as a starting point & reference for many projects to come

In the area of mobile positioning data, but also other types of big data

In the area of tourism, but also in many other fields of statistics


Conclusions & points for further discussion #4

Next steps by the ESS/Eurostat?

Multi-country and multi-domain project in the pipeline (tbc)

Access is a critical factor, so the number of statistical domains analysed & assessed should be maximised (e.g. population, balance of payments, transport & urban mobility, tourism)

Involve several countries, possibly two-speed approach

Use of data stored by Mobile Network Operators

Expected output

Partnerships with MNOs

Studying data structures and defining data access standards

Testing data compilation and assessing quality


Thank you for your attention!