Results Matter. Trust NAG
1 July, 2008, Research Methods Festival, St Catherine's College, Oxford
New Directions in Analysis and Visualization
Dr Jeremy Walton, NAG Ltd, Oxford
[Visual Analytics]
Overview
Introduction: NAG, HECToR
Visualization: distribution, collaboration, steering
Data mining: classification, exploratory analysis
The ADVISE project: large data, interactive analysis
NAG profile
Products: mathematical, statistical and data analysis components; 3D visualization; compilers and tools
HPC software engineering services
HECToR support
Users: academic researchers, professional developers, analysts / modelers
Founded 1976; not-for-profit company
HECToR: High-End Computing Terascale Resource
Latest high-end computing service for the UK
  funded by EPSRC, NERC and BBSRC; will run from 2007 to 2013
Partners
  Hardware: Cray Inc
  Service provision: University of Edinburgh, HPCx Ltd (hardware hosting, user services, help desk)
  CSE support: NAG Ltd (technical assessment of project applications; porting, tuning and optimisation of user codes; training courses, including visualization; best-practice guides, documentation, FAQs)
Overview: Visualization (distribution, collaboration, steering)
Visualization toolkits
Help construct visualization applications: no reinventing the wheel, no stone canoes, no chocolate teapots
Proprietary: supported commercial systems, e.g. Excel, IRIS Explorer, Spotfire
Open source: freely available software, e.g. OpenDX, InfoVis
NAG’s IRIS Explorer…
General-purpose toolkit for data visualization
Reusable building blocks (modules)
Connect modules to build an application
Point-and-click development
Visual programming approach: build, execute, reshape
Add new modules, if required
…in action
[Screenshot: modules in the module librarian; the application in the map editor reads data, colormaps it, makes a ribbon and displays it]
Make the connections
Add more modules...
[Screenshot: a new module adds axes]
...and even more
[Screenshot: a further module adds a caption]
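The map-building steps above (read data, colormap it, make a ribbon, display it, then add axes and a caption) amount to wiring reusable modules into a dataflow network. The following is a minimal sketch of that idea under stated assumptions: the `Module` and `Network` classes, the module names and the toy data are hypothetical and are not the IRIS Explorer API. Execution simply fires each module once all of its inputs are ready, i.e. in topological order.

```python
from collections import deque
from typing import Any, Callable, Dict, List


class Module:
    """Hypothetical reusable building block: a named function."""

    def __init__(self, name: str, func: Callable[..., Any]) -> None:
        self.name = name
        self.func = func


class Network:
    """Hypothetical 'map': modules plus the connections between them."""

    def __init__(self) -> None:
        self.modules: Dict[str, Module] = {}
        self.inputs: Dict[str, List[str]] = {}

    def add(self, module: Module) -> None:
        self.modules[module.name] = module
        self.inputs.setdefault(module.name, [])

    def connect(self, src: str, dst: str) -> None:
        # The output of `src` becomes the next positional input of `dst`.
        self.inputs[dst].append(src)

    def execute(self) -> Dict[str, Any]:
        # Fire each module once all of its upstream inputs are available
        # (Kahn's topological ordering); modules in a cycle never fire.
        pending = {name: len(srcs) for name, srcs in self.inputs.items()}
        ready = deque(name for name, n in pending.items() if n == 0)
        results: Dict[str, Any] = {}
        while ready:
            name = ready.popleft()
            args = [results[src] for src in self.inputs[name]]
            results[name] = self.modules[name].func(*args)
            for other, srcs in self.inputs.items():
                if name in srcs:
                    pending[other] -= srcs.count(name)
                    if pending[other] == 0:
                        ready.append(other)
        return results


# Build the ribbon example with toy data (all names are illustrative).
net = Network()
net.add(Module("ReadData", lambda: [0.0, 0.5, 1.0]))
net.add(Module("Colormap", lambda d: ["blue" if v < 0.5 else "red" for v in d]))
net.add(Module("Ribbon", lambda d: list(zip(d, d[1:]))))
net.add(Module("Render", lambda geom, cols: f"{len(geom)} segments, {len(cols)} colours"))
net.connect("ReadData", "Colormap")
net.connect("ReadData", "Ribbon")
net.connect("Ribbon", "Render")
net.connect("Colormap", "Render")

# Reshape the application: add one more module downstream of Render.
net.add(Module("Caption", lambda text: "Ribbon demo: " + text))
net.connect("Render", "Caption")

results = net.execute()
print(results["Caption"])  # prints: Ribbon demo: 2 segments, 3 colours
```

Adding the `Caption` module and re-running `execute` mirrors the "add more modules" step. Nothing upstream has to be rewritten, which is the point of building applications from connected modules.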
Some examples
Trendalyzer (Gapminder)
Worldmapper: area
Worldmapper: deaths by disease
Many Eyes: shared visualization
Overview: Data mining (classification, exploratory analysis)
NAG data mining tools
Data cleaning
  Data imputation: filling in missing values
  Outlier detection: finding suspect data records
Data transformation
  Scaling data: before distance computation
  Principal component analysis: reducing the number of variables
Model fitting
  Cluster analysis: finding interesting groups
  Classification techniques: the number of groups is known
  Regression: no groups; the outcome is continuous (linear / non-linear / time series)
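A minimal NumPy sketch of the cleaning and transformation steps listed above follows. It assumes the data sit in a numeric array with `NaN` marking missing values. Column-mean imputation and a |z| > 3 outlier rule are simple stand-ins chosen for illustration, not the methods implemented in NAG's routines, and all function names are hypothetical.

```python
import numpy as np


def impute_column_means(X: np.ndarray) -> np.ndarray:
    """Data imputation: replace each NaN with the mean of its column."""
    X = X.astype(float).copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


def flag_outliers(X: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """Outlier detection: flag records with any |z-score| above threshold.
    Assumes no constant columns (their standard deviation is zero)."""
    z = (X - X.mean(axis=0)) / X.std(axis=0)
    return np.any(np.abs(z) > threshold, axis=1)


def standardize(X: np.ndarray) -> np.ndarray:
    """Scaling: zero mean and unit variance per variable, so that no
    single variable dominates a later distance computation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def distance_matrix(X: np.ndarray) -> np.ndarray:
    """Euclidean distances between all pairs of rows."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

The order matters. If the data are not standardized before `distance_matrix`, a variable recorded in large units (for example, hundreds rather than units) swamps the others in the Euclidean distance.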
Example: exploratory data analysis
How many species of water vole (Arvicola) are there in the UK?
Measurement data
  Presence / absence of 13 skull characteristics
  300 observations, each from one of 14 regions
  3 groups: A. terrestris / A. sapidus / unclassified UK cases
Treatment
  Average data within each region
  Gives 14 data points in 13 dimensions
  How to display the dataset?
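The "average within each region" treatment turns 300 binary observations into one row of proportions per region. The sketch below assumes the scores are a 0/1 array and the region labels are integers. `region_means` is a hypothetical helper, and the data generated below are synthetic placeholders with the slide's dimensions, not the water vole measurements.

```python
from typing import Tuple

import numpy as np


def region_means(X: np.ndarray, region: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Average 0/1 presence/absence scores within each region.

    X      : (n_obs, n_vars) array of 0/1 scores
    region : (n_obs,) array of region labels
    Returns the sorted region labels and an (n_regions, n_vars) array of
    proportions: the share of skulls in each region showing each trait.
    """
    labels, idx = np.unique(region, return_inverse=True)
    idx = idx.ravel()                              # keep 1-D across NumPy versions
    sums = np.zeros((labels.size, X.shape[1]))
    np.add.at(sums, idx, X)                        # accumulate rows by region
    counts = np.bincount(idx, minlength=labels.size)
    return labels, sums / counts[:, None]


# Synthetic placeholder data with the slide's dimensions (not real measurements).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 13))             # 300 observations, 13 traits
region = np.arange(300) % 14                       # every one of 14 regions used
labels, M = region_means(X, region)
print(M.shape)                                     # (14, 13)
```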
Analysis
2D scatterplots?
  Structure is unclear
  (13 × 12) / 2 = 78 plots needed
Principal components analysis?
  2 PCs explain 49% of the variance
  3 PCs explain 65% of the variance
  Should be > 85% for a confident representation; Fisher’s iris dataset (4 variables) reaches 95%
Alternative technique: metric scaling
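Both numbers on this slide can be computed generically. The count of pairwise scatterplots is p(p-1)/2. The variance explained by the first k principal components is the cumulative share of the eigenvalues. The sketch below assumes PCA on the correlation matrix, since the slide does not say whether the correlation or the covariance matrix was used. The function names are hypothetical.

```python
from math import comb

import numpy as np


def n_scatterplots(p: int) -> int:
    """Distinct 2D scatterplots for p variables: p(p-1)/2."""
    return comb(p, 2)


def cumulative_variance(X: np.ndarray) -> np.ndarray:
    """Cumulative share of variance explained by successive principal
    components, assuming PCA on the correlation matrix of X.

    X is (n_observations, n_variables). Constant columns must be removed
    first, because their correlations are undefined (NaN).
    """
    R = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(R)[::-1]    # eigenvalues, largest first
    eigvals = np.clip(eigvals, 0.0, None)    # remove tiny negative round-off
    return np.cumsum(eigvals) / eigvals.sum()


print(n_scatterplots(13))   # 78
```

For the 14 × 13 regional averages, entries 2 and 3 of `cumulative_variance` would be the quantities behind the 49% and 65% quoted above. The sketch does not reproduce those figures, because the data are not given here.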
Metric scaling
14 data points, one for each region
Each point has values for 13 variables
Construct a 14 × 14 dissimilarity matrix, Δ
  Δ_ij = distance between points i and j in 13D space
  Δ is symmetric, with zero diagonal elements
Want a set of 14 new data points in 3D space whose distance matrix, Δ*, preserves Δ as closely as possible
Project Δ to Δ* using metric scaling
Display the data points in 3D
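The steps above match classical (Torgerson) scaling, one standard form of metric scaling. Square the dissimilarities, double-centre them to get B = -0.5 J Δ² J with J = I - (1/n)·11ᵀ, then use the top three eigenvectors of B, scaled by the square roots of their eigenvalues, as 3D coordinates. This is a sketch under that assumption. The talk does not name the routine it used, Euclidean distance is assumed for Δ (the slide says only "distance"), and `pairwise_distances` and `classical_scaling` are hypothetical names.

```python
import numpy as np


def pairwise_distances(X: np.ndarray) -> np.ndarray:
    """Dissimilarity matrix: Euclidean distance between rows i and j."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_scaling(D: np.ndarray, k: int = 3) -> np.ndarray:
    """Classical (Torgerson) metric scaling of a dissimilarity matrix.

    D : (n, n) symmetric matrix with zero diagonal (the slide's Δ)
    k : target dimension (3 for a 3D display)
    Returns (n, k) coordinates whose distance matrix (Δ*) approximates D.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(B)          # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:k]           # indices of the k largest
    lam = np.clip(eigvals[top], 0.0, None)        # negative ones carry no geometry
    return eigvecs[:, top] * np.sqrt(lam)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((14, 13))      # synthetic stand-in: 14 regions x 13 variables
    coords = classical_scaling(pairwise_distances(X), k=3)
    print(coords.shape)           # (14, 3)
```

Any symmetric dissimilarity matrix with a zero diagonal can be passed as `D`. The coordinates are unique only up to rotation and reflection, which does not affect a 3D display.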
Exploratory data analysis: conclusions
2D scatterplots don’t indicate group structure (cf. the iris dataset)
3D PCA unreliable here
Metric scaling of Δ used to reduce the dimension from 13 to 3
3D visualization reveals group structure
  Distinct A. sapidus group
  UK sample represents only A. terrestris
Overview: The ADVISE project (large data, interactive analysis)
The ADVISE project
DTI-funded research project, started March 2007
NAG / VSN / University of Leeds
Merge visualization and statistics (visual analytics)
  use statistics to identify key characteristics of a dataset
  understand those characteristics through visualization
User community: pharmaceuticals, environmental science, engineering
Initial user meeting held September 2007
Large datasets
Size matters (but isn’t everything)
Developer’s view: too large for our current system
  problems of performance and robustness
User’s view: too large for me to understand
Current ADVISE datasets are “only” a few GB
  complications (e.g. comparing several datasets) could raise this
  HECToR users have TB datasets
ADVISE ideas
Retention of the visual programming interface
Re-use of the algorithmic base
  IRIS Explorer modules
  GenStat statistics functionality (from VSN)
Three-layered architecture
  user interface
  web service middleware
  visualization components
Distribution, tailored user interface, collaboration
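The three-layered architecture can be sketched as follows to show how the layers separate. Everything in the sketch is hypothetical, because the talk does not describe the ADVISE interfaces. The service names, the JSON message format and the in-process "transport" are all assumptions, standing in for components, middleware and a real web-service call.

```python
import json
from typing import Any, Callable, Dict, List


# Layer 3: components (hypothetical stand-ins for visualization/statistics modules).
def summary_stats(values: List[float]) -> Dict[str, float]:
    n = len(values)
    return {"n": n, "mean": sum(values) / n, "min": min(values), "max": max(values)}


def histogram(values: List[float], bins: int = 2) -> Dict[str, List[int]]:
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0          # avoid zero width for constant data
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return {"counts": counts}


# Layer 2: middleware routing JSON requests to registered components.
class Middleware:
    def __init__(self) -> None:
        self.services: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, func: Callable[..., Any]) -> None:
        self.services[name] = func

    def handle(self, request: str) -> str:
        req = json.loads(request)
        func = self.services.get(req["service"])
        if func is None:
            return json.dumps({"error": "unknown service: " + req["service"]})
        return json.dumps({"result": func(**req["args"])})


# Layer 1: user-interface client; it only ever sees JSON messages.
class Client:
    def __init__(self, transport: Callable[[str], str]) -> None:
        # In-process call here; assumed to be a web-service request when distributed.
        self.transport = transport

    def call(self, service: str, **args: Any) -> Any:
        reply = json.loads(self.transport(json.dumps({"service": service, "args": args})))
        if "error" in reply:
            raise RuntimeError(reply["error"])
        return reply["result"]


middleware = Middleware()
middleware.register("summary", summary_stats)
middleware.register("histogram", histogram)
ui = Client(middleware.handle)
print(ui.call("summary", values=[1, 2, 3, 4]))
```

Because the client depends only on the message format, the same components could sit behind different, tailored user interfaces or on a remote server. That is the kind of distribution and collaboration the slide lists.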
ADVISE progress
Porting IRIS Explorer (IE) modules to a standalone environment
  some of these use GenStat for statistics
New system used to revisit an early air quality demo (IEEE Viz 96)
  web-based visualization
  new system is more efficient
Working with real user data
Conclusions
NAG offers software components for developers
  no reinventing the wheel, no stone canoes, no chocolate teapots
Visualization and data mining are crucial for analysis
  distribution, steering, classification, exploration
  interactivity / interrogation are important
  integration is an ongoing field of activity
ADVISE project
  developing a new system for visual analysis
  working with real user problems
  improving understanding of data