Pivotal Digital Transformation Forum: Data Science Technical Overview
Data Science: A Technical Overview
Dr Carsten Riggselsen, Principal Data Scientist, Pivotal
2 © Copyright 2015 Pivotal. All rights reserved.
Pivotal in a Nutshell
Platform to Solution Level
• Ready to be incorporated into scalable, platform-independent applications:
  – Pivotal Cloud Foundry platform-as-a-service
  – Pivotal Labs gold standard for modern software development
  – World-class Data Science capability
Solution Level
• Pivotal Big Data Suite is the only enterprise-grade software distribution that contains all elements of the λ (Lambda) architecture
• Wrapped in a flexible and consumption-based commercial offer
• Truly enables enterprises to make the best decisions when it matters
Platform and Component Level
• GemFire is the leading in-memory data grid, processing 2 million events per second
• HAWQ brings 100% ANSI-compliant SQL, hundreds of concurrent queries, and JDBC and ODBC compliance to link to legacy databases
• Spring XD streaming workflow builds performant pipelines to consume and consolidate data from a variety of endpoints
Digital Transformation
• Store any type and size of data
• Discover insights
• Create analytics algorithms
• Deploy analytic apps and automation at scale
Value of Data and Information
[Figure: value of data versus time (µs, ms, s, hour, day, month, year, yr+), with Spring XD, Pivotal GemFire and Pivotal HD placed along the time axis.]
What We Do
• Translate business problems into mathematical/statistical problems
• Combine appropriate Data Science techniques
• Infer and estimate the parameters of statistical models
• Rely heavily on Pivotal's stack and open-source software (OSS)
Data Science Toolkit
Pivotal Big Data Suite: Pivotal HD, Pivotal Greenplum Database
Platform: Pivotal CF®
Key tools: Spring XD
Key languages: SQL
Real-time Dashboard: Destination Prediction
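The slide names the feature but not the model behind it. As a rough sketch of what destination prediction can mean, assume past trips are logged as hypothetical (start zone, hour of day, destination) records; a simple frequency-based predictor then ranks destinations by their empirical probability given where and when a trip starts. The class, its methods and the record layout are illustrative assumptions, not the model used in Pivotal's demo.

```python
from collections import Counter, defaultdict


class DestinationPredictor:
    """Hypothetical frequency-based estimate of P(destination | start zone, hour)."""

    def __init__(self):
        # (start_zone, hour) -> Counter of destinations seen in past trips
        self.counts = defaultdict(Counter)

    def train(self, trips):
        # trips: iterable of (start_zone, hour, destination) tuples
        for start, hour, dest in trips:
            self.counts[(start, hour)][dest] += 1

    def predict(self, start, hour):
        # Ranked (destination, probability) pairs; empty list if this context was never seen.
        # dict.get avoids creating empty entries in the defaultdict.
        seen = self.counts.get((start, hour))
        if not seen:
            return []
        total = sum(seen.values())
        return [(dest, n / total) for dest, n in seen.most_common()]


trips = [
    ("home", 8, "office"),
    ("home", 8, "office"),
    ("home", 8, "gym"),
    ("office", 18, "home"),
]
model = DestinationPredictor()
model.train(trips)
print(model.predict("home", 8))
```

A real dashboard would condition on richer context (GPS position, day of week, recent route) and update the counts as trips stream in, but the ranking-by-conditional-frequency idea stays the same.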
Understanding Driving Behavior
[Figure: vehicle speed (km/h, 0 to 60) versus distance (m, 0 to 1,400) for different drives.]
• Same car
• Same roads
• Different driving styles
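Speed-versus-distance profiles from the same car on the same roads differ mainly in how hard the driver accelerates and brakes. A minimal sketch of turning such a profile into per-trip features, assuming speed is sampled at known distances and acceleration is constant within each segment (so a segment's average speed is the mean of its endpoint speeds). The function name, the feature set and the two example drives are hypothetical.

```python
def drive_features(distance_m, speed_kmh):
    """Hypothetical per-trip features from a speed-vs-distance profile.

    Assumes constant acceleration within each segment, so the time to cover
    a segment is dx / ((v0 + v1) / 2).
    """
    if len(distance_m) != len(speed_kmh) or len(distance_m) < 2:
        raise ValueError("need two equally long sequences with at least 2 samples")
    v = [s / 3.6 for s in speed_kmh]  # km/h -> m/s
    total_time = 0.0
    total_dist = 0.0
    accels = []
    for i in range(1, len(v)):
        dx = distance_m[i] - distance_m[i - 1]
        v_avg = (v[i] + v[i - 1]) / 2
        if dx <= 0 or v_avg <= 0:
            continue  # skip standstill or non-advancing segments (time is undefined)
        dt = dx / v_avg
        total_time += dt
        total_dist += dx
        accels.append((v[i] - v[i - 1]) / dt)  # m/s^2
    if not accels:
        raise ValueError("no moving segments in profile")
    return {
        "duration_s": total_time,
        "mean_speed_kmh": total_dist / total_time * 3.6,
        "mean_abs_accel_ms2": sum(abs(a) for a in accels) / len(accels),
        "max_accel_ms2": max(accels),
        "max_decel_ms2": min(accels),
    }


# Two made-up drives over the same 400 m: one smooth, one with harder acceleration and braking.
smooth = drive_features([0, 100, 200, 300, 400], [30, 40, 45, 40, 30])
sporty = drive_features([0, 100, 200, 300, 400], [20, 50, 60, 50, 20])
print(smooth["mean_abs_accel_ms2"] < sporty["mean_abs_accel_ms2"])
```

Features like these could then feed a clustering or classification step that groups trips by driving style.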
[Diagram: vehicle data from an OBD/Bluetooth adapter, processed with Spring XD and Pivotal HD for real-time evaluation, batch training and data persistence.]
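The diagram separates real-time evaluation from batch training. One way to sketch that split, assuming a deliberately simple model (a z-score check of vehicle speed against a baseline fitted in batch; the class, method names and threshold are hypothetical), is to persist every reading, refit the model parameters only when the batch job runs, and score each new reading immediately against the most recently fitted parameters.

```python
import statistics


class SpeedScoringPipeline:
    """Hypothetical sketch of 'batch training + real-time evaluation'.

    The model is a z-score check on vehicle speed: batch training fits a mean and
    standard deviation; real-time evaluation flags readings far from that baseline.
    """

    def __init__(self, z_threshold=3.0):
        self.history = []   # stands in for the data persistence layer
        self.mean = None    # model parameters, updated only by batch_train()
        self.stdev = None
        self.z_threshold = z_threshold

    def persist(self, speed_kmh):
        self.history.append(speed_kmh)

    def batch_train(self):
        # Slow path: refit the parameters on everything persisted so far.
        if len(self.history) < 2:
            raise ValueError("need at least two readings to train")
        self.mean = statistics.mean(self.history)
        self.stdev = statistics.pstdev(self.history)

    def evaluate(self, speed_kmh):
        # Fast path: store the reading, then score it against the last trained model.
        self.persist(speed_kmh)
        if self.mean is None or self.stdev == 0:
            return False  # no usable model yet
        return abs(speed_kmh - self.mean) / self.stdev > self.z_threshold


pipeline = SpeedScoringPipeline()
for reading in [50, 52, 48, 50, 50]:
    pipeline.persist(reading)
pipeline.batch_train()
print(pipeline.evaluate(51), pipeline.evaluate(60))  # 51 is close to the baseline, 60 is not
```

The design point is that scoring never waits for training: new readings are stored and scored at once, and the next batch run folds them into the model.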
Example: Pivotal's Connected Car (λ Architecture)
[Diagram: Spring XD, Pivotal HD and Pivotal GemFire mapped onto the speed layer, batch layer and serving layer.]
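The connected-car example is organised around the three layers of the Lambda architecture. As a hedged illustration of the pattern itself (not of how Spring XD, Pivotal HD or GemFire implement it), the sketch below keeps an append-only master dataset, a batch view recomputed from scratch, and a real-time view updated per event; queries merge the two views. Counting events per vehicle ID is an assumed example workload, and all names are hypothetical.

```python
from collections import Counter


class LambdaCounter:
    """Minimal, single-process sketch of the Lambda architecture.

    Counts events per key (e.g. a hypothetical vehicle ID). Real deployments
    run the layers on separate systems; here they are plain attributes.
    """

    def __init__(self):
        self.master_dataset = []        # batch layer: immutable, append-only raw events
        self.batch_view = Counter()     # batch layer output: recomputed from scratch
        self.realtime_view = Counter()  # speed layer: incremental updates since last batch run

    def ingest(self, key):
        # Every event is appended to the master dataset AND applied to the speed layer,
        # so it becomes queryable immediately.
        self.master_dataset.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        # Recompute the batch view over the full master dataset, then discard the
        # real-time increments it now covers. (Single-threaded here; a real system
        # must not drop events that arrive while the batch job is running.)
        self.batch_view = Counter(self.master_dataset)
        self.realtime_view.clear()

    def query(self, key):
        # Serving layer: merge the precomputed batch view with the speed layer's delta.
        return self.batch_view[key] + self.realtime_view[key]


lc = LambdaCounter()
for vehicle in ["car-1", "car-1", "car-2"]:
    lc.ingest(vehicle)
print(lc.query("car-1"))  # answered entirely by the speed layer
lc.run_batch()
lc.ingest("car-1")
print(lc.query("car-1"))  # 2 from the batch view + 1 from the speed layer
```

Recomputing the batch view from raw events, rather than updating it in place, is what lets the batch layer correct any errors the faster, incremental speed layer may introduce.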
Video Recording
Pivotal’s Scalable Video Analytics Architecture
How We Do It
• Agile principles and practices
• Data Science enablement
• Collaboration is key
What Agile Data Science Means
• Agile is an iterative approach that lets you quickly change the kind of analysis you are doing, depending on what the data is telling you.
• Frequent interaction and pairing with the customer keep the project on track and reduce its risk.
Agile Data Science
[Diagram: an iterative cycle through Scoping, Data Review, Feature Engineering, Feature Review, Model Building, Model Evaluation and Operationalization.]
Frequent Feedback Removes Risk
Conclusions
• Pivotal's software stack enables Data Science
• Real-time, interactive and batch operations
• Different architectures can be realized
• Data Science is still a matter of thinking
• Collaborate and interact
Digital Transformation Forum
Disrupt or Be Disrupted
19 OCTOBER · BMW WELT EVENT CENTRE · MUNICH