generating insight from data

15 July 2016 P2175-P-011 v0.2 Commercially Confidential Generating Insight from Data Tailoring Analytic Algorithms and Visualization to Address User Requirements


Page 1: Generating insight from data

15 July 2016 P2175-P-011 v0.2

Commercially Confidential

Generating Insight from Data

Tailoring Analytic Algorithms and Visualization to Address User Requirements

Page 2: Generating insight from data

The Challenge

How do you get from a user with data to one acting on the basis of said data?

We illustrate the process with a worked example

[Diagram: Data + User -> Needs -> Analysis -> Visualisation -> Action!]

Page 3: Generating insight from data

TfL Data Overview

The Transport for London (TfL) Tube travel data set is a large, open-access data set that we use to demonstrate the process

Over ½ million journeys logged, giving the location and time of the start and end of each.

– Needs cleaning (unstarted, unfinished, not applicable, etc.)
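The cleaning step can be sketched as below. This is a minimal illustration, not the method used in the presentation: the field names (`start_station`, `end_station`, `start_min`, `end_min`), the placeholder strings and the minutes-after-midnight time format are all assumptions, and the real TfL export may differ.

```python
# Hypothetical placeholder values that mark an unusable journey record.
INVALID_STATIONS = {"unstarted", "unfinished", "not applicable", ""}


def is_clean(journey):
    """Return True if a journey has a usable start and end station and time."""
    for key in ("start_station", "end_station"):
        name = journey.get(key)
        if name is None or name.strip().lower() in INVALID_STATIONS:
            return False
    for key in ("start_min", "end_min"):
        minute = journey.get(key)
        # Times are assumed to be whole minutes after midnight (0..1439).
        if not isinstance(minute, int) or not 0 <= minute < 24 * 60:
            return False
    return True


def clean_journeys(journeys):
    """Keep only the journeys that pass every check in is_clean."""
    return [j for j in journeys if is_clean(j)]


sample = [
    {"start_station": "Chorleywood", "end_station": "Canary Wharf",
     "start_min": 450, "end_min": 520},
    {"start_station": "Unstarted", "end_station": "Covent Garden",
     "start_min": 600, "end_min": 630},
    {"start_station": "Harrow", "end_station": "Unfinished",
     "start_min": 480, "end_min": None},
]
cleaned = clean_journeys(sample)
```

Only the first sample record survives, because the other two have a placeholder station or a missing time.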

While nominally about journeys, the data can be re-analysed to give information about:

– Stations
– Lines

Other metadata allows for potentially interesting analyses, for example by user type (elderly pass user, season ticket user, etc.)

Page 4: Generating insight from data

User: Needs

Consider a potential user. What questions do they want answers to? What information is of use to them?

We consider a user who wants to know about stations – not just in terms of usage but in terms of which stations are similar to other stations. This may be for several reasons:

– Interested in advertising based on likely users;
– Interested in appropriate staffing and rostering of stations;
– Interested in issue tracking, learning and applying lessons to similar stations;

Alternative users might be interested in the traffic flow on lines and how it is affected by station closures:

– Emergency / contingency planning;
– Sophisticated travel advice apps;

Page 5: Generating insight from data

Analysis: Station Profiles

We refocus the data set to give a profile of the usage of a station – recording both arrival and departure rates across the working day

Users can already compare the total usage of stations easily, so we scale each of these profiles to have a maximum value of 1. This keeps the shape of the usage pattern and discards its overall volume.
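A minimal sketch of building and scaling station profiles follows. The 15-minute bin width, the journey field names and the choice to scale arrivals and departures separately are assumptions made for illustration; the slides do not state these details.

```python
from collections import defaultdict

BIN_MINUTES = 15                   # assumed bin width, not given in the source
N_BINS = 24 * 60 // BIN_MINUTES    # number of time bins in one day


def station_profiles(journeys):
    """Count departures and arrivals per station in each time bin."""
    departures = defaultdict(lambda: [0.0] * N_BINS)
    arrivals = defaultdict(lambda: [0.0] * N_BINS)
    for j in journeys:
        departures[j["start_station"]][j["start_min"] // BIN_MINUTES] += 1
        arrivals[j["end_station"]][j["end_min"] // BIN_MINUTES] += 1
    return departures, arrivals


def scale_to_unit_max(profile):
    """Scale a profile so its largest value is 1 (all-zero profiles stay zero)."""
    peak = max(profile)
    return [v / peak for v in profile] if peak > 0 else list(profile)


journeys = [
    {"start_station": "Chorleywood", "end_station": "Canary Wharf",
     "start_min": 450, "end_min": 520},
    {"start_station": "Chorleywood", "end_station": "Canary Wharf",
     "start_min": 455, "end_min": 525},
    {"start_station": "Canary Wharf", "end_station": "Chorleywood",
     "start_min": 1050, "end_min": 1120},
]
departures, arrivals = station_profiles(journeys)
chorleywood_departures = scale_to_unit_max(departures["Chorleywood"])
```

Both Chorleywood departures fall in the same bin (07:30 to 07:45), so that bin holds a count of 2 before scaling and 1 after it.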

Page 6: Generating insight from data

Analysis: Dissimilarity Metric

User is interested in type of station (e.g. commuter source) but not interested in the precise timing of the commuter rushes

Stations closer to the centre (e.g. Harrow) have later morning departure peaks than stations further out (e.g. Chorleywood)

The reverse is true for the evening arrivals rush

Dissimilarity between stations is determined by the minimum Euclidean distance between their arrival and departure profiles, allowing for small time shifts

Time shifts must be applied in opposite directions for arrivals and departures
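One way to read this metric is sketched below. Several details are assumptions rather than taken from the slides: the shift is measured in whole bins, bins shifted in from outside the day are filled with zeros, the arrival and departure distances are combined as a single Euclidean distance over both profiles, and `max_shift` is a hypothetical parameter.

```python
import math


def shift(profile, k):
    """Shift a profile later by k bins (earlier if k < 0), padding with zeros."""
    n = len(profile)
    if k >= 0:
        return [0.0] * min(k, n) + list(profile[: max(n - k, 0)])
    return list(profile[-k:]) + [0.0] * min(-k, n)


def squared_distance(a, b):
    """Sum of squared differences between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dissimilarity(dep_a, arr_a, dep_b, arr_b, max_shift=2):
    """Minimum Euclidean distance over small, opposite time shifts."""
    best = math.inf
    for k in range(-max_shift, max_shift + 1):
        # Departures of station B move by k bins, its arrivals by -k bins.
        d = math.sqrt(
            squared_distance(dep_a, shift(dep_b, k))
            + squared_distance(arr_a, shift(arr_b, -k))
        )
        best = min(best, d)
    return best


# Toy 10-bin profiles: the outer station departs one bin earlier in the
# morning and receives arrivals one bin later in the evening.
inner_dep = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
inner_arr = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
outer_dep = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
outer_arr = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]

with_shift = dissimilarity(inner_dep, inner_arr, outer_dep, outer_arr, max_shift=2)
without_shift = dissimilarity(inner_dep, inner_arr, outer_dep, outer_arr, max_shift=0)
```

With shifts allowed, the two toy stations align exactly and the dissimilarity drops to zero. Without shifts, they look different even though both have the same commuter-source shape.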

Page 7: Generating insight from data

Analysis: Automatic Clustering

Agglomerative hierarchical clustering technique was used with group average linkage to merge clusters

Complete dendrogram is easy to calculate – deciding where to split is …

Splitting into 6 clusters provided useful insight (more clusters are also insightful)
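A minimal pure-Python sketch of agglomerative clustering with group average linkage is given below. It stops merging once the requested number of clusters remains, which is equivalent to cutting the dendrogram at that level. The function name and the toy distance matrix are illustrative; in practice the matrix would hold the station dissimilarities.

```python
def average_linkage_clusters(dist, n_clusters):
    """Agglomerative clustering with group average linkage.

    dist is a symmetric matrix (list of lists) of pairwise dissimilarities.
    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Group average: mean dissimilarity over all cross-cluster pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (ties go to the lowest indices).
        _, first, second = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        merged = sorted(clusters[first] + clusters[second])
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (first, second)]
        clusters.append(merged)
    return sorted(clusters)


# Toy example: five items placed on a line, with distance = absolute difference.
positions = [0, 1, 10, 11, 50]
dist = [[abs(a - b) for b in positions] for a in positions]
result = average_linkage_clusters(dist, n_clusters=3)
```

Here `result` groups the two nearby pairs and leaves the outlier alone, much as the two Heathrow stations end up in clusters of one. This naive version recomputes every linkage on each pass, so for a few hundred stations a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would likely be a better choice.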

Page 8: Generating insight from data

Analysis: 6 Clusters

Some insights can be gained just from looking at the clusters – e.g. the clusters were labelled by observing their membership

Commuter Source: 168 stations, characterised by a morning departures peak and an evening arrivals peak, mainly located in the suburbs (e.g. Barnet)

Commuter Destination: 44 stations, characterised by a morning arrivals peak and an evening departures peak, mainly central London (e.g. Canary Wharf)

Transit: 44 stations, with peaks as a commuter destination but also keeping high usage throughout the day, includes most rail/tube interchanges (e.g. Kings Cross)

Social: 3 stations, with peaks as a commuter destination, but with extra arrivals early evening and many departures very late in the evening (e.g. Covent Garden)

Heathrow Terminal 4: Cluster of one whose behaviour is highly variable, dependent upon flights rather than typical work patterns.

Heathrow Terminals 1, 2 & 3: Cluster of one whose behaviour is highly variable, dependent upon flights rather than typical work patterns.

…Text is a poor way of displaying these

Page 9: Generating insight from data

Geographic Visualisation

Further insight can be achieved by using an interactive, web-based visualisation tool to show the location, cluster and current usage of each station

Page 10: Generating insight from data

Geographic Visualisation

Rush-hour becomes startlingly clear as the size of each station is proportional to how busy it is
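The sizing rule can be sketched as a function that turns station usage at a chosen time into marker descriptions for a map layer. Everything here is hypothetical: the record fields, the colour palette, the `max_radius_px` parameter, the placeholder coordinates and the choice to scale the marker radius (rather than its area) are assumptions, since the slides do not describe how the tool was built.

```python
# Hypothetical colour per cluster label; any palette would do.
CLUSTER_COLOURS = {
    "Commuter Source": "#1f77b4",
    "Commuter Destination": "#d62728",
    "Transit": "#2ca02c",
    "Social": "#9467bd",
}


def station_markers(stations, time_bin, max_radius_px=30.0):
    """Build map markers whose radius is proportional to usage in time_bin."""
    busiest = max((s["usage"][time_bin] for s in stations), default=0)
    markers = []
    for s in stations:
        usage = s["usage"][time_bin]
        # The busiest station gets max_radius_px; others scale linearly.
        radius = max_radius_px * usage / busiest if busiest > 0 else 0.0
        markers.append({
            "name": s["name"],
            "lat": s["lat"],
            "lon": s["lon"],
            "colour": CLUSTER_COLOURS.get(s["cluster"], "#7f7f7f"),
            "radius_px": radius,
        })
    return markers


# Illustrative records with placeholder coordinates and usage counts.
stations = [
    {"name": "Station A", "lat": 51.50, "lon": -0.10,
     "cluster": "Commuter Destination", "usage": [10, 40]},
    {"name": "Station B", "lat": 51.60, "lon": -0.20,
     "cluster": "Commuter Source", "usage": [5, 20]},
]
markers = station_markers(stations, time_bin=1)
```

Re-running the function for each time bin would give the frames of an animation in which the rush hours show up as the markers grow and shrink.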

Page 11: Generating insight from data

Geographic Visualisation

Pan and zoom (inherited from Google Maps) allow a user to focus their interest in an intuitive manner – clicking on a station brings up details on the right

Page 12: Generating insight from data

Conclusions

To generate new insight you need to determine users' needs, apply appropriate analytics and display the results with suitable visualisations.

Data analysis without an understanding of the goal may just be empty maths;

Data visualisation on its own may be very pretty, but not useful;

New insights can be generated from analytics + visualisation

Target these to address user needs and you have something useful

Page 13: Generating insight from data

Cambridge UK, Boston USA, Singapore

Registered No. 1036296 England

Cambridge Consultants is part of the Altran group, a global leader in Innovation. www.Altran.com

www.CambridgeConsultants.com

The contents of this presentation are commercially confidential and the proprietary information of Cambridge Consultants

© 2016 Cambridge Consultants Ltd. All rights reserved.