Toolbox of a data scientist: multiple approaches to work with behavioural data Philippe J. Giabbanelli, PhD Data Insight Meetup, February 5th 2015


Page 1

Toolbox of a data scientist: multiple approaches to work with behavioural data

Philippe J. Giabbanelli, PhD

Data Insight Meetup, February 5th 2015

Page 2

Outline


1 – What’s data science?

2 – What questions can we ask of behavioural data?

3 – How do we use data science tools to get answers?

Food behaviours, drinking behaviours, insurgencies

Page 3

What’s data science? Visualization, data mining, simulation and modelling

Page 4

Imagine that people have completed some kind of questionnaire. Typically you get an Excel spreadsheet. And you’d like to understand what relates to the target behaviour.


Tableau

Page 5

Imagine that you have a very complex system, where tons of variables interact… You may want to look at it as a network.

Gephi

Page 6

Page 7

What if you have a lot of text instead?


Page 8

Here I am primarily concerned with visualization as seen from a data scientist’s viewpoint. I would use…

Tool                                  Data
Tableau, Qlik, Spotfire               Relational (spreadsheet)
Gephi or Visone                       Network
Datawatch                             Streaming relational
Many-eyes                             A bit of everything
GeoTime                               Spatial data over time
Jigsaw, CZSaw, InSpire, Leximancer    Text

Viz as data scientist ≠ Making pretty pictures

If you’re producing a visual for an audience, you show what you found. When you start with viz as a data scientist, you want to find something!

Visual Capitalist

Page 9

Abusing the tool

If you watch CSI, you’ll see that when they search for a fingerprint match, the software shows all fingerprints it has!

Wasting computer resources for useless displays

Proper statistical testing

If it looks like your data is normally distributed, that must be it, right?

Relying on visuals instead of doing proper statistics
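To make the point about eyeballing a histogram concrete, here is a minimal sketch of a formal normality check. The choice of the Jarque-Bera test is an assumption made for illustration (the slides do not name a test), and the two samples are synthetic. The test combines sample skewness and excess kurtosis into one statistic that is approximately chi-square with 2 degrees of freedom for large samples, so the p-value is simply exp(-JB/2). The approximation is poor for small samples.

```python
import math
from statistics import NormalDist


def jarque_bera(sample):
    """Return (JB statistic, asymptotic p-value) for a list of numbers."""
    n = len(sample)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(sample) / n
    # Central moments of order 2, 3 and 4 (population form, divided by n).
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # Survival function of a chi-square with 2 degrees of freedom.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


# Synthetic, deterministic samples (hypothetical data, not from the talk):
n = 200
bell_shaped = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
right_skewed = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]

_, p_bell = jarque_bera(bell_shaped)
_, p_skewed = jarque_bera(right_skewed)
print(f"bell-shaped sample:  p = {p_bell:.3f}")
print(f"right-skewed sample: p = {p_skewed:.3g}")
```

A small p-value is evidence against normality; a large one only means the test found no evidence against it, which is still more than a histogram can tell you.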

Page 10

Abusing the tool

When all you have is a hammer, everything starts looking like a nail.

Page 11

What’s data science? Visualization, data mining, simulation and modelling

Page 12

What’s data science?


Imagine that you’re working for CSI (again!) and you want to identify the dude in the picture.

When you know what you’re after, and it can be mathematically expressed, data mining helps.

Page 13

[Figure: decision tree separating binge drinkers from non-binge drinkers using two variables, Rules (often, very often) and Communication (never, weekly, daily). Node A tests rules (≥ often / < often), node B tests communication (≥ daily / < daily), node C tests communication (≥ weekly / < weekly), node D tests rules (≥ very often / < very often).]


What’s data science?

Suggested tools: RapidMiner, Weka

Page 14

What’s data science?

Data mining involves automatically testing lots of hypotheses by searching for combinations of variables that might show a correlation.

Which variables are in the winning combination? You partly do data mining to answer this question…

A. Wood, Data Manager

« For every variable that you seek to collect, provide a detailed rationale. »

V. Lo, Ethics Board
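The idea of searching combinations of variables for a correlation with the target can be sketched in a few lines. This is only an illustration under stated assumptions: the column names, the data, and the choice to combine variables by summing them are all hypothetical, and real tools such as RapidMiner or Weka use far richer search strategies.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def rank_combinations(columns, target, max_size=2):
    """Score every combination of up to max_size columns against the target.

    A combination is scored by the absolute correlation between the
    row-wise sum of its columns and the target (an illustrative choice).
    """
    results = []
    for size in range(1, max_size + 1):
        for names in combinations(sorted(columns), size):
            combined = [sum(row) for row in zip(*(columns[name] for name in names))]
            results.append((names, abs(pearson(combined, target))))
    results.sort(key=lambda item: item[1], reverse=True)
    return results


# Hypothetical questionnaire columns and target behaviour.
columns = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 1, 4, 3, 6, 5],
    "c": [5, 1, 4, 2, 6, 3],
}
target = [3, 3, 7, 7, 11, 11]

ranking = rank_combinations(columns, target)
best_names, best_score = ranking[0]
print(best_names, round(best_score, 3))
```

The more combinations you test, the more likely one of them correlates by chance, which is one reason to hold back data for validation.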

Page 15

What’s data science? Visualization, data mining, simulation and modelling

Page 16

I offered coupons to some customers. Would they spend more? Who should I target?

I raised prices of fast foods. Would it curb obesity? Who would benefit the most?

I put people on antiretroviral therapy when they don’t have AIDS. Would it help? For whom?

There are lots of big questions for which you don’t necessarily have all the data. Also, methods that help you understand what happened may not be helpful to know what may happen if…

Page 17

What’s data science?

Imagine that you want to change the urban environment to see if it helps people exercise more.

You hopefully won’t be doing that.

Rather you might want to create a virtual environment that simplifies reality so you can test your hypothesis safely.

Page 18

What’s data science?

Page 19

What’s data science?

There are lots of ways to do modelling, depending on desired spatial & individual resolution.

The most common approaches are agent-based modelling (ABM) and system dynamics (SD).

Tool              Approach
Anylogic          ABM / SD
NetLogo           ABM
Vensim, iThink    SD

Page 20

Also: The emergence of Computational Sociology (J. of Math. Soc., ’95); Why model? (JASSS ’08)

What’s data science?


Page 21

Data Science as a Technique: Visualization, Modelling & Simulation, Data mining & Machine Learning

Applications: Defense, Health (chronic diseases, infectious diseases)

Page 22

Why?

Tell me what people will do in the future!


Page 23

Applications of Data Science

How would climate change policies impact the health of Canadians by 2030?

Inputs (simulated data for 2030): dietary patterns, built environment, socio-economics

Systems model

Outputs (expected health impacts): physical health, well-being

Page 24

Page 25

Applications of Data Science

There are many reasons other than prediction to do data science.

Explaining

To simulate far into the future, you need to understand what you have now and how it changes.

2014 2024 2044

1 - Explain 2 - Predict


Page 26

Applications of Data Science

There are many reasons other than prediction to do data science.

Explaining

“Electrostatics explains lightning, but we cannot predict when or where the next bolt will strike.”

“Plate tectonics explains earthquakes, but does not permit us to predict the time and place of their occurrence.”

Page 27

Applications of Data Science

There are many reasons other than prediction to do data science.

Explaining

Schelling’s model of segregation

A preference that one's neighbors be of the same color, or even a preference for a mixture "up to some limit", could lead to total segregation.
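A minimal sketch of a Schelling-style simulation shows how a mild preference can produce strong clustering. Everything specific here is an assumption chosen for illustration rather than taken from the talk: the 20×20 grid, 10% empty cells, a 30% like-neighbour threshold, and the rule that unhappy agents jump to a random empty cell.

```python
import random

EMPTY = None


def make_grid(size, empty_ratio, rng):
    """Square grid with two equally sized groups, 'A' and 'B', plus empty cells."""
    n_cells = size * size
    n_empty = int(n_cells * empty_ratio)
    n_agents = n_cells - n_empty
    cells = ["A"] * (n_agents // 2) + ["B"] * (n_agents - n_agents // 2)
    cells += [EMPTY] * n_empty
    rng.shuffle(cells)
    return [cells[r * size:(r + 1) * size] for r in range(size)]


def like_fraction(grid, r, c):
    """Share of occupied neighbouring cells holding the same group as (r, c)."""
    size = len(grid)
    same = total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr or dc) and 0 <= rr < size and 0 <= cc < size:
                if grid[rr][cc] is not EMPTY:
                    total += 1
                    same += grid[rr][cc] == grid[r][c]
    return same / total if total else 1.0


def segregation_index(grid):
    """Average like-neighbour share over all agents (about 0.5 when well mixed)."""
    size = len(grid)
    shares = [like_fraction(grid, r, c)
              for r in range(size) for c in range(size)
              if grid[r][c] is not EMPTY]
    return sum(shares) / len(shares)


def step(grid, threshold, rng):
    """Move every currently unhappy agent to a random empty cell."""
    size = len(grid)
    cells = [(r, c) for r in range(size) for c in range(size)]
    empties = [(r, c) for r, c in cells if grid[r][c] is EMPTY]
    unhappy = [(r, c) for r, c in cells
               if grid[r][c] is not EMPTY and like_fraction(grid, r, c) < threshold]
    if not empties:
        return 0
    rng.shuffle(unhappy)
    for r, c in unhappy:
        i = rng.randrange(len(empties))
        er, ec = empties[i]
        grid[er][ec], grid[r][c] = grid[r][c], EMPTY
        empties[i] = (r, c)
    return len(unhappy)


def simulate(size=20, empty_ratio=0.1, threshold=0.3, max_steps=100, seed=1):
    rng = random.Random(seed)
    grid = make_grid(size, empty_ratio, rng)
    before = segregation_index(grid)
    for _ in range(max_steps):
        if step(grid, threshold, rng) == 0:
            break
    return before, segregation_index(grid), grid


before, after, grid = simulate()
print(f"like-neighbour share: {before:.2f} before, {after:.2f} after")
```

The value of such a model is explanatory: nobody in it wants segregation, yet the pattern emerges from individual moves.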

Page 28

Applications of Data Science

There are many reasons other than prediction to do data science.

What are the core dynamics in my problem?

Where are the gaps? Where do I need to collect data?

What would happen if?

How can we best do monitoring and surveillance?

Page 29

Illuminate core dynamics

“There is increasing evidence that social influence and social network structures are significant factors in obesity.”

Eating Exercising


Page 30

Illuminate core dynamics

To what extent could social influences account for the dynamics of obesity?

Let’s tackle the question using modelling & simulation.

Page 31

Illuminate core dynamics

Page 32

Illuminate core dynamics

Page 33

Motivating question: to what extent is this model supported by interviewees?

Let’s tackle this question using interactive visualizations.

Illuminate core dynamics

Page 34

We measured the strength of a relationship between two factors as the number of responses in the interviews that used words relevant to both factors.
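The measure described above can be sketched as a co-occurrence count. The factor names, keyword lists and responses below are hypothetical, and matching on whole lowercase words is a simplifying assumption; the actual study may have matched terms differently.

```python
import re
from itertools import combinations


def mentions(response, keywords):
    """True if the response contains at least one of the factor's keywords."""
    words = set(re.findall(r"[a-z']+", response.lower()))
    return any(keyword in words for keyword in keywords)


def relationship_strength(responses, factor_keywords):
    """Map each pair of factors to the number of responses mentioning both."""
    strength = {}
    for a, b in combinations(sorted(factor_keywords), 2):
        strength[(a, b)] = sum(
            1 for response in responses
            if mentions(response, factor_keywords[a])
            and mentions(response, factor_keywords[b])
        )
    return strength


# Hypothetical factors and interview responses, for illustration only.
factor_keywords = {
    "eating": {"eat", "food", "snack"},
    "stress": {"stress", "stressed", "anxious"},
    "exercise": {"gym", "run", "exercise"},
}
responses = [
    "I eat more food when I am stressed.",
    "When stressed I skip the gym.",
    "I snack at night.",
    "Stress makes me snack, so I run less.",
]

strength = relationship_strength(responses, factor_keywords)
for pair, count in sorted(strength.items()):
    print(pair, count)
```

The resulting counts can serve directly as edge weights when the factors are drawn as a network.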


Page 35

Explaining

Process

You select peers with whom to drink…

…and then, their drinking habits influence yours.

Structure

Can we explain why people engage in binge drinking? Let’s start with modelling and simulation, and make some hypotheses.


Page 36

If we assume:

• that individuals select similar peers

• that individuals are prompted to drink if at least a fraction of their peers drink

• that one’s context, known from drinking motives, may deter or promote drinking

Then we can correctly infer the behaviour of half of the binge drinkers and 4 out of 5 non-binge drinkers.
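The three assumptions above can be sketched as a small threshold model. This is not the published model: the `trait` attribute used for similarity, the number of peers, the 0.5 threshold, and the additive `context` term are all hypothetical choices made to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    trait: float          # hypothetical attribute used to judge similarity
    drinks: bool          # observed behaviour
    context: float = 0.0  # > 0 promotes drinking, < 0 deters it (assumed scale)


def select_peers(person, population, k=2):
    """Assumption 1: individuals select the k most similar other people."""
    others = [p for p in population if p is not person]
    return sorted(others, key=lambda p: abs(p.trait - person.trait))[:k]


def predicts_drinking(person, peers, threshold=0.5):
    """Assumptions 2 and 3: drink if the share of drinking peers, shifted by
    the person's own context, reaches the threshold."""
    if not peers:
        return person.drinks
    drinking_share = sum(p.drinks for p in peers) / len(peers)
    return drinking_share + person.context >= threshold


population = [
    Person("Ann", 0.10, True),
    Person("Bob", 0.15, True),
    Person("Cat", 0.20, False),
    Person("Dan", 0.80, False),
    Person("Eve", 0.85, False),
    Person("Fay", 0.90, False, context=0.6),
]

predictions = {
    p.name: predicts_drinking(p, select_peers(p, population))
    for p in population
}
hits = sum(predictions[p.name] == p.drinks for p in population)
print(predictions)
print(f"{hits} of {len(population)} behaviours inferred correctly")
```

Comparing the predictions with the observed `drinks` values is the same kind of accuracy check as the one reported on the slide, only on toy data.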

Explaining


But without making any assumptions ourselves, if we just used data mining we would get roughly the same accuracy. The computer would build an explanation for us.

Page 37

March 2011: Emergence Escalation Early 2012: Militarisation

Monitoring

The situation might change as you are intervening.

How can you monitor changes and adapt?


Page 38

Visualizations allow the analyst to interactively explore the data and improve the model.

The model guides the analyst in the exploration of the new data.

Page 39

There is a lot of potential in the tight coupling of techniques (e.g., modelling / interactive visualizations) but currently you’d have to come up with a technical solution yourself for that.


Page 40

Data science in the world

Techniques: Visualization, Modelling & Simulation, Data mining & Machine Learning

Applications: Defense, Health (chronic diseases, infectious diseases)

Challenges

• Interdisciplinary work: a clash of cultures

• Getting good quality data

• Needing to understand a very wide range of tools

• Continuously needing to improve the tools

Page 41

Challenges – Need new tools

Page 42

Challenges – Interdisciplinary

Page 43

Challenges – Interdisciplinary


In my field, good papers are published in conferences.

In my field, good papers are published in journals.

In my field, we just put data on our website for others.

In my field, we own the data and selectively share it.

Why don’t I just pick a book and learn your whole field?

Why don’t I just watch a couple videos to learn your job?

We need to build mutual trust and accommodate each other in a system that’s unsupportive.

Page 44

Challenges – Getting good data


There is a lot of data out there. But most is unstructured (text, video…) and hard to deal with.

There are public repositories for data, but a lot of what they hold is lists of junk, locations, or population-level data split at best by age and gender.

http://ukdataservice.ac.uk
http://data.gouv.fr
http://data.gov
http://adsfree.com
http://kaggle.com

Page 45

Challenges – Getting good data

Kaggle

Page 46

PJ Giabbanelli

Investigator Scientist, University of Cambridge (@Addenbrooke’s)

Founder, Vancouver Computational Modelling

Get in touch? [email protected]

• PJ Giabbanelli. Modelling the spatial and social dynamics of insurgency. Security Informatics ’14. (Simulation & Modelling in Defense)

• Pratt, Giabbanelli & Mercier. Detecting unfolding crises with visual analytics and conceptual maps: emerging phenomena and big data. Proc. of IEEE ISI ’13. (Visual Analytics + Simulation & Modelling in Defense)

• Crutzen & Giabbanelli. Using classifiers to identify binge drinkers based on drinking motives. Substance Use & Misuse ’14. (Data mining in Health)

• Giabbanelli et al. Modeling the influence of social networks and environment on energy balance and obesity. Journal of Computational Science ’12. (Simulation & Modelling in Health)