Results Matter. Trust NAG
1 July, 2008
Research Methods Festival, St Catherine's College, Oxford

New Directions in Analysis and Visualization

Dr Jeremy Walton
NAG Ltd, Oxford
[email protected]

[Visual Analytics]



Overview
  Introduction: NAG, HECToR
  Visualization: distribution, collaboration, steering
  Data mining: classification, exploratory analysis
  The ADVISE project: large data, interactive analysis


NAG profile
Products
  Mathematical, statistical, data analysis components
  3D visualization, compilers & tools
  HPC software engineering services
    HECToR support
Users
  Academic researchers
  Professional developers
  Analysts / modelers
Founded 1976
Not-for-profit company


High-End Computing Terascale Resource (HECToR)
Latest high-end computing service for UK
  funded by EPSRC, NERC & BBSRC
  will run from 2007-2013
Partners:
  Hardware: Cray Inc
  Service Provision: University of Edinburgh HPCx Ltd
    hardware hosting, user services, help desk
  CSE Support: NAG Ltd
    technical assessment of project applications
    porting / tuning / optimisation of user codes
    training courses (inc. visualization)
    best practice guides, documentation, FAQs


Visualization toolkits
Help construct visualization applications
  no wheel-reinvention, stone canoes, chocolate teapots
Proprietary supported commercial systems
  e.g. Excel, IRIS Explorer, Spotfire
Open source, freely available software
  e.g. OpenDX, InfoVis


NAG’s IRIS Explorer…
General purpose toolkit for data visualization
Reusable building blocks (modules)
Connect modules to build application
  Point-and-click development
Visual programming approach
  Build, execute, reshape
  Add new modules, if required


…in action
  Application in map editor
  Modules in module librarian
  Module chain: reads data, colormaps it, makes ribbon, displays it


Make the connections


Add more modules...

Adds axes


...and even more

Adds caption


Some examples


Trendalyzer (Gapminder)


Worldmapper: area


Worldmapper: deaths by disease


Many eyes: shared visualization


NAG Data Mining Tools
Data Cleaning
  Data imputation - adding missing values
  Outlier detection - finding suspect data records
Data Transformation
  Scaling Data - before distance computation
  Principal Component Analysis - reducing # of variables
Model fitting
  Cluster analysis - finding interesting groups
  Classification techniques - # of groups is known
  Regression - no groups, outcome is continuous
    Linear / Non-linear / Time series
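
The "Scaling Data - before distance computation" step can be illustrated with a minimal sketch. It assumes z-score standardization, which is one common choice; the slides do not say which scaling the NAG tools apply. The function names and the toy values are hypothetical.

```python
import math
import statistics

def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.pstdev(c) for c in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
        for row in rows
    ]

def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Toy data (illustrative only): column 2 is on a much larger scale than column 1.
raw = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
scaled = standardize(raw)

raw_distance = euclidean(raw[0], raw[1])            # dominated by column 2
scaled_distance = euclidean(scaled[0], scaled[1])   # both columns contribute
```

Without scaling, the variable with the largest numeric range decides which records look "close", which is why the slide places scaling before any distance-based method such as cluster analysis.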


Example: exploratory data analysis
How many species of water vole (Arvicola) in UK?
Measurement data
  Presence / absence of 13 skull characteristics
  300 observations, each in one of 14 regions
  3 groups:
    A. terrestris / A. sapidus / unclassified UK cases
Treatment
  Average data within each region
  Gives 14 data points in 13 dimensions
How to display dataset?
  2D scatterplots
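
The "Average data within each region" treatment can be sketched as follows. This is an illustration under assumptions: the record layout (a region label plus a list of 0/1 presence indicators) and the function name are hypothetical, and the toy input is not the water vole data.

```python
from collections import defaultdict

def region_means(records):
    """Average each characteristic within each region.

    records: iterable of (region, values) pairs, where values is a list of
    0/1 presence indicators. Returns {region: [mean per characteristic]}.
    """
    sums = {}
    counts = defaultdict(int)
    for region, values in records:
        if region not in sums:
            sums[region] = [0.0] * len(values)
        for k, v in enumerate(values):
            sums[region][k] += v
        counts[region] += 1
    return {r: [s / counts[r] for s in sums[r]] for r in sums}

# Toy input: 2 hypothetical regions, 3 characteristics.
toy = [("north", [1, 0, 1]), ("north", [1, 1, 0]), ("south", [0, 0, 1])]
means = region_means(toy)
```

Averaging presence / absence values gives, for each region, the proportion of specimens showing each characteristic. Applied to 300 observations in 14 regions with 13 characteristics, the result would be the 14 points in 13 dimensions described on the slide.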


Analysis
2D scatterplots?
  Structure is unclear
  (13 x 12) / 2 = 78 plots needed
Principal components analysis?
  2 PCs explain 49% of the variance
  3 PCs explain 65% of the variance
    Should be > 85% for confident representation
    Fisher’s iris dataset (4 variables) is 95%
Alternative technique
  Metric scaling
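
The arithmetic on this slide can be checked with a short sketch. The plot count follows the slide's formula; the variance fractions are computed from a list of PCA eigenvalues. The eigenvalues in the example are hypothetical placeholders, not the water vole results, and the function names are illustrative.

```python
def pairwise_plot_count(n_variables):
    """Number of distinct 2D scatterplots for n variables: n * (n - 1) / 2."""
    return n_variables * (n_variables - 1) // 2

def cumulative_variance_explained(eigenvalues):
    """Cumulative fraction of total variance explained by the leading PCs."""
    total = sum(eigenvalues)
    running = 0.0
    fractions = []
    for ev in sorted(eigenvalues, reverse=True):
        running += ev
        fractions.append(running / total)
    return fractions

def components_needed(eigenvalues, threshold=0.85):
    """Smallest number of PCs whose cumulative fraction reaches the threshold."""
    fractions = cumulative_variance_explained(eigenvalues)
    for k, fraction in enumerate(fractions, start=1):
        if fraction >= threshold:
            return k
    return len(eigenvalues)

plots = pairwise_plot_count(13)  # the slide's (13 x 12) / 2

# Hypothetical eigenvalues, for illustration only.
example_eigenvalues = [4.0, 3.0, 2.0, 1.0]
fractions = cumulative_variance_explained(example_eigenvalues)
needed = components_needed(example_eigenvalues, threshold=0.85)
```

With the slide's figures, neither 2 PCs (49%) nor 3 PCs (65%) reach the 85% guideline, which is the reason given for turning to metric scaling.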


Metric scaling
14 data points – one for each region
  Each point has values for 13 variables
Construct 14 by 14 dissimilarity matrix, Δ
  Δij = distance between points i & j in 13D space
  Δ is symmetric, with zero diagonal elements
Want to find a new matrix, Δ*
  set of 14 new data points in 3D space that preserve Δ
Project Δ to Δ* using metric scaling
  Display data points in 3D
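
The procedure on this slide can be sketched with classical metric scaling (principal coordinates analysis). This is a minimal pure-Python illustration, not NAG's implementation: it assumes Δ holds Euclidean distances, so that the double-centred matrix has no negative eigenvalues, and it finds the leading eigenvectors by power iteration with deflation. All function names and the toy points are hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def double_center(delta):
    """Form B = -1/2 * J * (squared distances) * J, with J the centering matrix."""
    n = len(delta)
    a = [[-0.5 * delta[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    # a is symmetric, so column means equal row means.
    return [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]

def mat_vec(m, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def top_eigenpairs(b, k, iterations=500):
    """Leading k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(b)
    b = [row[:] for row in b]  # work on a copy
    pairs = []
    for c in range(k):
        v = [math.sin(i + c + 1.0) for i in range(n)]  # deterministic start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = mat_vec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, mat_vec(b, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return pairs

def metric_scaling(delta, dims=3):
    """Return one dims-dimensional coordinate list per row of delta."""
    pairs = top_eigenpairs(double_center(delta), dims)
    n = len(delta)
    return [[math.sqrt(max(lam, 0.0)) * vec[i] for lam, vec in pairs]
            for i in range(n)]

# Toy check: 5 points that really lie in 3D (illustrative, not the vole data).
points = [(0, 0, 0), (4, 0, 0), (0, 2, 0), (0, 0, 1), (4, 2, 1)]
delta = [[euclidean(p, q) for q in points] for p in points]
coords = metric_scaling(delta, dims=3)
```

For the toy points the recovered coordinates reproduce Δ up to rotation and reflection, because the data are exactly three-dimensional. For the 14 regional points in 13D, three dimensions would only approximate Δ.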


Exploratory data analysis conclusions
2D scatterplots don’t indicate group structure
  cf. iris dataset
3D PCA unreliable here
Metric scaling of Δ used to reduce D from 13 to 3
3D visualization reveals group structure
  Distinct A. sapidus group
  UK sample represents only A. terrestris


The ADVISE project
DTI-funded research project, started March 2007
NAG / VSN / University of Leeds
Merge visualization & statistics (visual analytics)
  use statistics to identify key characteristics of dataset
  understand the characteristics through visualization
User community
  pharmaceuticals
  environmental science
  engineering
Initial user meeting held September 2007


Large datasets
Size matters (but isn’t everything)
Developer’s view:
  Too large for our current system
  Problems of
    performance
    robustness
User’s view:
  Too large for me to understand
Current ADVISE datasets are “only” a few GB
  complications (e.g. comparing several) could raise this
  HECToR users have TB datasets


ADVISE ideas
Retention of visual programming interface
Re-use of algorithmic base
  IRIS Explorer modules
  GenStat statistics functionality (from VSN)
Three layered architecture
  User interface
  Web service middleware
  Visualization components
    Distribution, tailored user interface, collaboration


ADVISE progress
Porting IE modules to standalone environment
  some of these use GenStat for statistics
New system used to revisit air quality demo
  early (IEEE Viz 96) web-based visualization
  new system more efficient
Working with real user data


Conclusions
NAG offers software components for developers
  no wheel-reinvention, stone canoes, chocolate teapots
Visualization & data mining crucial for analysis
  distribution, steering, classification, exploration
  interactivity / interrogation important
  integration is an ongoing field of activity
ADVISE project
  developing a new system for visual analysis
  working with real user problems
  improving understanding of data