Big Data Visualization

Edwin de Jonge, December 3, 2013 Big Data Visualization “Turning Statistics into Knowledge”, Aguascalientes With thanks to Piet Daas, Martijn Tennekes and Alex Priem

Upload: edwin-de-jonge

Post on 14-Jun-2015


DESCRIPTION

Presentation given at Innovative Approaches to Turn Statistics into Knowledge, 2 - 4 December 2013, Aguascalientes, Mexico

TRANSCRIPT

Page 1: Big Data Visualization

Edwin de Jonge, December 3, 2013

Big Data Visualization

“Turning Statistics into Knowledge”, Aguascalientes

With thanks to Piet Daas, Martijn Tennekes and Alex Priem

Page 2: Big Data Visualization

Overview

• Big Data
  • Research ‘theme’ at Statistics Netherlands
  • Data-driven approach

• Visualization as a tool
  • Why?
  • Examples in our office
    • Census
    • Social Security
    • Social Media
    • Not shown: Traffic loops, Mobile phone data

Page 3: Big Data Visualization

Why Visualization?

October 1st 2013, Statistics Netherlands

Page 4: Big Data Visualization

Effective Display!

(see Tor Nørretranders, “Bandwidth of our senses”)

Page 5: Big Data Visualization

Anscombe’s quartet…

DS1           DS2           DS3           DS4
x    y        x    y        x    y        x    y
10   8.04     10   9.14     10   7.46     8    6.58
8    6.95     8    8.14     8    6.77     8    5.76
13   7.58     13   8.74     13   12.74    8    7.71
9    8.81     9    8.77     9    7.11     8    8.84
11   8.33     11   9.26     11   7.81     8    8.47
14   9.96     14   8.1      14   8.84     8    7.04
6    7.24     6    6.13     6    6.08     8    5.25
4    4.26     4    3.1      4    5.39     19   12.5
12   10.84    12   9.13     12   8.15     8    5.56
7    4.82     7    7.26     7    6.42     8    7.91
5    5.68     5    4.74     5    5.73     8    6.89

Page 6: Big Data Visualization

Anscombe’s quartet

Property                                    Value
Mean of x1, x2, x3, x4                      All equal: 9
Variance of x1, x2, x3, x4                  All equal: 11
Mean of y1, y2, y3, y4                      All equal: 7.50
Variance of y1, y2, y3, y4                  All equal: 4.1
Correlation for ds1, ds2, ds3, ds4          All equal: 0.816
Linear regression for ds1, ds2, ds3, ds4    All equal: y = 3.00 + 0.500x

Looks the same, right?
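The summary table can be checked directly from the four data sets. The following is a minimal sketch using only the Python standard library; the helper name `summarize` and the dictionary layout are assumptions made for illustration, not part of the original presentation. It uses the sample (n - 1) variance and an ordinary least-squares fit.

```python
import math
from statistics import mean, variance

# Anscombe's quartet; data sets 1-3 share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "DS1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "DS2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "DS3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "DS4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary statistics listed on the slide (hypothetical helper)."""
    mx, my = mean(x), mean(y)
    vx, vy = variance(x), variance(y)  # sample variance (n - 1)
    # Sample covariance of x and y.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    slope = sxy / vx  # least-squares slope
    return {
        "mean_x": mx,
        "var_x": vx,
        "mean_y": my,
        "var_y": vy,
        "corr": sxy / math.sqrt(vx * vy),  # Pearson correlation
        "slope": slope,
        "intercept": my - slope * mx,
    }


for name, (x, y) in quartet.items():
    stats = summarize(x, y)
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

The four data sets agree on every one of these numbers to two or three digits, which is why the summary alone cannot tell them apart.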

Page 7: Big Data Visualization

Let’s plot!

Page 8: Big Data Visualization

Visualization

For Big Data:

Use appropriate:

- Summarization

- Granularity

- Noise filtering

Research: What works for big data?

Page 9: Big Data Visualization

Scatter plot with 100 data points

Page 10: Big Data Visualization

Scatter plot with 100 000 data points

Page 11: Big Data Visualization

Example 1: Census

Page 12: Big Data Visualization

Example Virtual Census

‐ Every 10 years a Census needs to be conducted

‐ No longer with surveys in the Netherlands
  • Last traditional census was in 1971

‐ Now by (re-)using existing information
  • Linking administrative sources and available sample survey data at a large scale
  • Check result
  • How?
  • With a visualisation method: the Tableplot

Page 13: Big Data Visualization

Making the Tableplot

1. Load file (17 million records)

2. Sort records according to key variable (17 million records)
   • Age in this example

3. Combine records into 100 groups (170,000 records each)
   • Numeric variables: calculate average (avg. age)
   • Categorical variables: ratio between categories present (male vs. female)

4. Plot figure of a select number of variables (up to 12)
   • Colours used are important
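The sort, group and summarize steps can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the function name `tableplot_bins`, the record layout and the synthetic data are assumptions, and the plotting step is left out.

```python
import random
from collections import Counter
from statistics import mean


def tableplot_bins(records, sort_key, numeric_vars, categorical_vars, n_bins=100):
    """Sort records on sort_key, split them into n_bins groups, summarize each group."""
    ordered = sorted(records, key=lambda r: r[sort_key])
    n = len(ordered)
    bins = []
    for i in range(n_bins):
        # Integer arithmetic keeps the groups as equal in size as possible.
        chunk = ordered[i * n // n_bins:(i + 1) * n // n_bins]
        if not chunk:
            continue
        summary = {"n": len(chunk)}
        for var in numeric_vars:
            # Numeric variable: average within the group.
            summary[var] = mean(r[var] for r in chunk)
        for var in categorical_vars:
            # Categorical variable: share of each category within the group.
            counts = Counter(r[var] for r in chunk)
            summary[var] = {cat: c / len(chunk) for cat, c in counts.items()}
        bins.append(summary)
    return bins


# Synthetic stand-in for the census file (hypothetical values).
random.seed(0)
records = [
    {"age": random.randint(0, 99), "sex": random.choice(["male", "female"])}
    for _ in range(10_000)
]
bins = tableplot_bins(records, "age", numeric_vars=["age"], categorical_vars=["sex"])
print(len(bins), bins[0])
```

Each of the 100 summaries becomes one row in the plot, so the figure stays the same size however many records the file holds.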

Page 14: Big Data Visualization
Page 15: Big Data Visualization

Tableplot of the census test file

Page 16: Big Data Visualization

Tableplot: Monitor data quality

All data in the office passes through these stages:

‐ Raw data (collected)

‐ Preprocessed (technically correct)

‐ Edited (completed data)

‐ Final (removal of outliers etc.)

Page 17: Big Data Visualization

Processing of data

Raw (unedited) data

Edited data

Final data

Page 18: Big Data Visualization

Example 2: Social Security Register

Page 19: Big Data Visualization

Social Security Register

‐ Contains all financial data on jobs, benefits and pensions in the Netherlands

‐ Collected by the Dutch Tax office

‐ A total of 20 million records each month

‐ How to obtain insight into so much data?
  • With a visualisation method: a heat map
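A heat map reduces millions of records to one count per cell. The following is a minimal sketch of that binning step, assuming (age, income) pairs as input; the function name `heatmap_counts`, the bin widths and the synthetic data are hypothetical and not taken from the register.

```python
import random
from collections import Counter


def heatmap_counts(pairs, x_width, y_width):
    """Count records per (x-bin, y-bin) cell; a cell is keyed by its lower bounds."""
    counts = Counter()
    for x, y in pairs:
        cell = (int(x // x_width) * x_width, int(y // y_width) * y_width)
        counts[cell] += 1
    return counts


# Synthetic stand-in for (age, income) records (hypothetical values).
random.seed(1)
pairs = [(random.randint(15, 90), random.uniform(0, 100_000)) for _ in range(100_000)]

cells = heatmap_counts(pairs, x_width=5, y_width=10_000)
print(len(cells), "cells,", sum(cells.values()), "records")
```

The counts per cell are then mapped to a colour scale. The number of cells depends on the bin widths, not on the number of records.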

Page 20: Big Data Visualization

Heat map: Age vs. ‘Income’

[Figure: heat map; axes: Age, Income (euro)]

Page 21: Big Data Visualization

[Figure; axis label: amount]

Page 22: Big Data Visualization

Example 3: Social media

Page 23: Big Data Visualization

Daily Sentiment in Dutch Social Media

Social media: daily sentiment in Dutch messages

Page 24: Big Data Visualization

Granularity: From day to week

Social media: daily & weekly sentiment in Dutch messages

Page 25: Big Data Visualization

Granularity: From day to month

Social media: daily, weekly & monthly sentiment in Dutch messages
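Changing granularity amounts to averaging the daily values per week or per month. The following is a minimal sketch with a synthetic daily series; the helper names `aggregate`, `week_key` and `month_key` are assumptions, and the random values stand in for the sentiment data, which is not reproduced here.

```python
import random
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def aggregate(daily, key):
    """Average daily values over the period that key(day) maps each day to."""
    groups = defaultdict(list)
    for day, value in daily.items():
        groups[key(day)].append(value)
    return {period: mean(values) for period, values in sorted(groups.items())}


def week_key(day):
    iso = day.isocalendar()
    return (iso[0], iso[1])  # ISO year and ISO week number


def month_key(day):
    return (day.year, day.month)


# Synthetic noisy daily series for one year (hypothetical values).
random.seed(2)
start = date(2013, 1, 1)
daily = {start + timedelta(days=i): random.gauss(0, 1) for i in range(365)}

weekly = aggregate(daily, week_key)
monthly = aggregate(daily, month_key)
print(len(daily), "days,", len(weekly), "weeks,", len(monthly), "months")
```

Coarser periods average away more of the day-to-day noise, at the cost of fewer data points.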

Page 26: Big Data Visualization

Enter: Consumer confidence!

Social media: monthly sentiment in Dutch messages & Consumer confidence

Corr: 0.88

Page 27: Big Data Visualization

Conclusions

Big data is a very interesting data source for official statistics

Visualisation is a great way of getting/creating insight

Not only for data exploration, but also for finding errors

Page 28: Big Data Visualization

The future of statistics?