Enhancing a Social Science Model-building Workflow with Interactive Visualisation

Cagatay Turkay, Aidan Slingsby, Kaisa Lahtinen, Sarah Butt and Jason Dykes

giCentre & Centre for Comparative Social Surveys at City University London

ESANN 2016, 29 April 2016

“We (social scientists) need (data-based) models that we can understand and explain so that we can defend them to our peers in full confidence.”

A quote that motivates this work (from collaborators within our AddResponse project)

Image from: Lahtinen, K. et al. (2015). Informing Non-Response Bias Model Creation in Social Surveys with Visualisation. Poster, VIS 2015

Numerical models either predict phenomena or act as a simulation of the phenomena being investigated.

Good predictive power is often desired in models, BUT (in some fields) explanatory power is also crucial (see Shmueli, 2010 [*] for a detailed discussion).

[*] Shmueli, Galit. "To explain or to predict?." Statistical science (2010): 289-310.

AddResponse Project -- https://blogs.city.ac.uk/addresponse/

… utilise organically generated auxiliary data (from commercial transactions, public administration and other sources) to understand propensity to respond and eventually tackle nonresponse bias (i.e., respondents differ from nonrespondents).

AddResponse - Details

• European Social Survey (ESS) UK 2012-13

• 4,520 households

• linked to auxiliary data from:

• administrative sources

• commercial consumer profiling

• open-source data

• 401 auxiliary variables

• 32 survey response variables (only for the respondents)

e.g., Proportion of house-sharing adults

e.g., Sports facilities within walking distance

Existing workflow

• Iteratively adding and/or removing variables from a logistic regression model

• Assess the changes through model fitness metrics (e.g., AIC, McFadden)

• Put up a sticker!

• Highly manual but involved!
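The two fitness metrics named above can be computed from log-likelihoods alone. The sketch below is a minimal illustration on made-up outcomes and predicted probabilities (not AddResponse data); the parameter count of 3 (an intercept plus two variables) is an assumption chosen for the example.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def aic(loglik, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * loglik

def mcfadden_r2(loglik_model, loglik_null):
    """McFadden pseudo R-squared: 1 minus the ratio of model to null log-likelihood."""
    return 1.0 - loglik_model / loglik_null

# Toy data: 1 = responded, 0 = did not respond (illustrative values only).
y = [1, 0, 1, 1, 0, 0, 1, 0]
p_model = [0.8, 0.3, 0.7, 0.9, 0.2, 0.4, 0.6, 0.1]  # hypothetical fitted model
p_null = [sum(y) / len(y)] * len(y)                  # intercept-only model

ll_model = log_likelihood(y, p_model)
ll_null = log_likelihood(y, p_null)

aic_model = aic(ll_model, n_params=3)  # assumed: intercept + two variables
aic_null = aic(ll_null, n_params=1)
r2 = mcfadden_r2(ll_model, ll_null)
```

Comparing `aic_model` with `aic_null` (or with the AIC of the previous iteration) mirrors the manual step of adding or removing a variable and checking whether the metric improves.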

Key roles for interactive visualisation

• Incorporating Theory

• Exploring variables

• Interactively building models

• Considering Geography

• Recording the model-building process, i.e., provenance

VarXplorer ModelBuilder

Prototype-1: VarXplorer

• Co-variation plot

• Correlations with indicators

• Theory-related meta-data

• Interactive modelling

Link to the Video: http://goo.gl/XNiOIX


Exploring variables – 1: Investigate Covariation

- Compute pairwise correlations among all 401 variables

- Use this as a distance matrix and project to 2D (using MDS)

- Visualise on a scatterplot where each point is a variable
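The three steps above can be sketched as follows. This is a minimal illustration on synthetic data, not the VarXplorer implementation: the conversion from correlation to distance (1 - |r|) and the choice of classical (Torgerson) MDS are assumptions, since the slides only state that correlations are used as a distance matrix and projected with MDS.

```python
import numpy as np

def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: embed items so that Euclidean
    distances between the embedded points approximate `dist`."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centred matrix
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_components]     # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # guard against negatives
    return eigvecs[:, top] * scale

# Synthetic stand-in for the auxiliary data: rows = households, columns = variables.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 6))
data[:, 1] = data[:, 0] + 0.1 * rng.normal(size=200)  # variables 0 and 1 covary

corr = np.corrcoef(data, rowvar=False)  # pairwise correlations between variables
dist = 1.0 - np.abs(corr)               # assumed correlation-to-distance mapping
coords = classical_mds(dist)            # one 2D point per variable
```

Each row of `coords` is one variable; plotting the rows as a scatterplot places strongly co-varying variables (here variables 0 and 1) close together.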

Exploring variables – 2: Correlation with indicators

- Compute correlations with all 32 response variables + response rate

- Use this as meta-data on variables to check whether they relate to indicators
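A minimal sketch of this step, assuming Pearson correlation and synthetic data: every auxiliary variable is correlated with every indicator, giving one row of meta-data per variable. Because the survey response variables exist only for respondents, the sketch assumes both matrices have already been restricted to the same respondent rows; the array names are illustrative.

```python
import numpy as np

def cross_correlation(aux, indicators):
    """Pearson correlation of every auxiliary column with every indicator column.
    Returns an array of shape (n_aux_variables, n_indicators)."""
    aux_z = (aux - aux.mean(axis=0)) / aux.std(axis=0)
    ind_z = (indicators - indicators.mean(axis=0)) / indicators.std(axis=0)
    return aux_z.T @ ind_z / aux.shape[0]

# Synthetic stand-ins: 300 respondents, 4 auxiliary variables, 2 indicators.
rng = np.random.default_rng(0)
aux = rng.normal(size=(300, 4))
indicators = rng.normal(size=(300, 2))
indicators[:, 0] = aux[:, 2] + 0.5 * rng.normal(size=300)  # indicator 0 tracks variable 2

table = cross_correlation(aux, indicators)

# Auxiliary variable most strongly related to indicator 0.
best_for_indicator_0 = int(np.argmax(np.abs(table[:, 0])))
```

Row `i` of `table` is the meta-data for auxiliary variable `i`, which can then be used to filter or colour variables by how strongly they relate to a chosen indicator.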

Incorporating Theory-related data

- Associate variables to social-science concepts and theory

- Concepts relate to theories

- Variables act as proxies for concepts

- Use these as meta-data on variables and visualise through histograms

Concepts, e.g., deprivation or quality of life

Theories, e.g., social isolation or social disorganisation
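One way to hold this meta-data is a pair of lookup tables, from variables to concepts and from concepts to theories, which can then be counted to produce the histograms. The sketch below is illustrative only: the variable names and the specific variable-to-concept and concept-to-theory links are hypothetical, chosen to reuse the concept and theory names on the slides.

```python
from collections import Counter

# Hypothetical links: which concept(s) each variable is a proxy for.
VARIABLE_CONCEPTS = {
    "prop_house_sharing_adults": ["deprivation"],
    "sports_facilities_walkable": ["quality of life"],
    "unemployment_rate": ["deprivation"],
}

# Hypothetical links: which theories each concept relates to.
CONCEPT_THEORIES = {
    "deprivation": ["social disorganisation"],
    "quality of life": ["social isolation"],
}

def concept_histogram(selected):
    """Count how many selected variables act as proxies for each concept."""
    return Counter(c for v in selected for c in VARIABLE_CONCEPTS[v])

def theory_histogram(selected):
    """Count how many selected variables relate to each theory via their concepts."""
    return Counter(t for v in selected
                   for c in VARIABLE_CONCEPTS[v]
                   for t in CONCEPT_THEORIES[c])

selection = list(VARIABLE_CONCEPTS)
concept_counts = concept_histogram(selection)
theory_counts = theory_histogram(selection)
```

The counters give the bar heights for the concept and theory histograms of the current variable selection.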

Prototype-2: ModelBuilder

• Variable selection

• Model provenance

• Interactive modelling (through R)

• Model quality metrics

Prototype-2: ModelBuilder

Link to the Video: http://goo.gl/itUlm2


Interactively building models & evaluating them

- R scripts are called with the variable selections and the variable to predict (response or ESS variable)

- Quality metrics (AIC, McFadden) & variable weights visualised
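How the interface hands a selection to R is not shown on the slides. As a rough sketch of one possibility, the selected variables and the target can be turned into an R model formula and passed to `Rscript` as a command-line argument. The script name `fit_model.R`, the variable names and the argument layout are all hypothetical.

```python
import subprocess

def build_rscript_call(script_path, target, variables):
    """Build the argument list for Rscript: script path plus an R model formula."""
    formula = f"{target} ~ {' + '.join(variables)}"
    return ["Rscript", script_path, formula]

def run_model(cmd):
    """Run the R script and return its standard output (requires R to be installed)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

# Hypothetical script and variable names, for illustration only.
cmd = build_rscript_call(
    "fit_model.R",
    target="responded",
    variables=["prop_house_sharing_adults", "sports_facilities_walkable"],
)
```

Calling `run_model(cmd)` would execute the script where R is available; the script itself would read the formula from its arguments, fit the model and print the quality metrics and variable weights for the interface to display.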

Interactive model building also in VarXplorer, with variable weights

Considering Geography

- Facet data (geographically) into 12 regions

- Build local models

- Evaluate locally
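The facet, build and evaluate loop can be sketched as below. This is an illustration under stated assumptions, not the ModelBuilder code: the data are synthetic, there are two made-up regions instead of 12, the local model is a one-variable logistic regression fitted with Newton's method, and the local evaluation uses McFadden's pseudo R-squared.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(x, y, n_iter=25):
    """Fit a logistic regression with intercept using Newton's method."""
    design = np.column_stack([np.ones(len(x)), x])
    beta = np.zeros(design.shape[1])
    ridge = 1e-8 * np.eye(design.shape[1])  # tiny ridge for numerical stability
    for _ in range(n_iter):
        p = sigmoid(design @ beta)
        gradient = design.T @ (y - p)
        hessian = design.T @ (design * (p * (1.0 - p))[:, None]) + ridge
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta, design

def mcfadden_r2(y, p):
    """McFadden pseudo R-squared against an intercept-only model."""
    ll_model = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    base = y.mean()
    ll_null = np.sum(y * np.log(base) + (1 - y) * np.log(1 - base))
    return 1.0 - ll_model / ll_null

# Synthetic regions: the predictor matters in region_a and not in region_b.
rng = np.random.default_rng(1)
regions = {}
for name, slope in [("region_a", 2.0), ("region_b", 0.0)]:
    x = rng.normal(size=400)
    y = (rng.random(400) < sigmoid(slope * x)).astype(float)
    regions[name] = (x, y)

# Build one local model per region and evaluate it on that region's data.
local_r2 = {}
for name, (x, y) in regions.items():
    beta, design = fit_logistic(x, y)
    local_r2[name] = mcfadden_r2(y, sigmoid(design @ beta))
```

Comparing the entries of `local_r2` shows where the same variable selection explains the outcome well and where it does not, which is the kind of local variation the regional view is meant to expose.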

Model provenance & annotations

- Save and analyse the model-building trail

- Mark dead-ends and good models

- Attach notes to models
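A provenance trail of this kind can be represented as a list of model records, each pointing to the model it was derived from. The sketch below is one possible data structure, not the one used in ModelBuilder; all class names, field names and the example values are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ModelRecord:
    variables: List[str]            # variables included in this model
    target: str                     # variable being predicted
    aic: float                      # quality metric recorded at fit time
    parent: Optional[int] = None    # index of the model this one was derived from
    status: str = "open"            # "open", "dead-end" or "good"
    notes: List[str] = field(default_factory=list)

class ModelTrail:
    """Append-only record of the models tried during a session."""

    def __init__(self):
        self.records: List[ModelRecord] = []

    def add(self, record: ModelRecord) -> int:
        self.records.append(record)
        return len(self.records) - 1

    def mark(self, index: int, status: str, note: Optional[str] = None) -> None:
        self.records[index].status = status
        if note is not None:
            self.records[index].notes.append(note)

    def good_models(self) -> List[ModelRecord]:
        return [r for r in self.records if r.status == "good"]

# Illustrative usage with made-up variables and metric values.
trail = ModelTrail()
first = trail.add(ModelRecord(["var_a"], "responded", aic=120.0))
second = trail.add(ModelRecord(["var_a", "var_b"], "responded", aic=112.5, parent=first))
trail.mark(first, "dead-end", "too few variables")
trail.mark(second, "good", "candidate to share with the team")
```

Keeping the `parent` index makes it possible to replay or visualise the path that led to a model marked as good.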

A brief example of the modelling process

1. Select two concepts, economic circumstances and quality of life

2. Select variables that are distinct and relevant

3. Select variables that correlate with an ESS indicator (happiness)

3.1 Observe that they relate to “Social Isolation”

4. Use these variables as a starting point, check local variations and plug into existing scripts

4.1 Model performs “better” in South-East UK and in Greater London

Lessons learned

• Enhanced analysis through informed use of computation

• Interactive visual methods improve reliability and interpretability

• Improved trust in models

• Tight integration enables quick hypothesis prototyping

• Important to communicate the certainty of the findings

Looking into the future

• Explanatory models, not only predictive models

• Incorporating more complex methods (already incorporated random forests)

• Other ways to make models more accessible?

• Use models & findings as scientific evidence?

Acknowledgments

• giCentre team @ City

• ADDResponse project funded by the UK Economic and Social Research Council (grant ES/L013118/1)

Thank you!

[email protected]

@cagatay_turkay

http://staff.city.ac.uk/cagatay.turkay.1/

https://blogs.city.ac.uk/addresponse/

http://www.gicentre.net/

!! We are hiring !!

* Researcher in visualisation of cyber-security data (H2020 funded RIA)

* PhD studentships

Deadlines in late May and June, check giCentre.net
