©2019 dataiku, Inc. | dataiku.com | [email protected] | @dataiku
Where is your organization today?
Big Data Day 0
Initiation
Impact
Acceleration
Systematization
ML is for specialists
Ad-hoc analytics
Siloed approach
Demonstrate Value
Deliver Business Value In Actual Operations
Fully align data, organization and processes
Structure Execution and Self-Service
Enterprise AI
What Dataiku has had the opportunity to see
Manufacturing, Financial Services, Services, Consumer Goods, Technology, Consulting, E-Retail, Media, Healthcare, Travel
Unique Journeys, Shared Challenges turned Software
Across many industries
A wide array of use cases
Predictive Maintenance
Fraud Detection
Product Recommendation
Churn Prediction
Risk Analysis
Production Improvements
Logistics optimization
Market Analysis
Pricing
And many, many, many more
The Fundamental Challenge Is Not “Software”
Data Success at Your Company
The data usually already exists, but in many different forms and across many different systems, and it needs to be combined.
Success does not come from great business acumen OR fantastic data skills OR a great analytics mindset. You need them all, together.
Labs, IT, data, operations, business: each has its own processes, and delivering value requires bridging them.
Siloed Data Siloed People Siloed Processes
The “Tower of Babel” Effect
The Classic Data Project Silos
Business Analyst
DATA PREPARATION ML MODELING ML DEPLOYMENT
Data Preparation
Data Science Notebooks & API Platforms
AutoML Solutions
Data Scientist
Data Engineer
Bring Business, Engineers, and Data Scientists Together
Share a common environment to have an impact
DATA PREPARATION ML MODELING ML DEPLOYMENT
Business Analyst
Data Engineer
Data Scientist
Single Collaborative, Governable and Auditable Environment
From Data Access to Operationalized Models
Solution Overview: An End-to-End Solution
DATA MANAGEMENT MACHINE LEARNING MODEL DEPLOYMENT
Build plugins for….
Data Scientist, Business Analyst
Find, Understand, Prepare Data
Build plugins for….
Business Analyst
VISUAL AUTO PREP
CODING ENVIRONMENT(S)
VISUAL AUTO ML
VISUAL PIPELINE
VISUAL MODEL MONITORING
MODEL DEPLOYMENT
Data Scientist Business Analyst
Business Modelling Prototype
Data Scientist
Use For Productivity And Extend
Use as a Baseline
Use for Optimization
Analytics Leader Data Engineer Analytics Leader
Monitor Results
Optimize Speed
Monitor Results
Integrate Work Together in …
Integrate
Understand Progress
Build Business Monitoring Dashboard For…
Architecture for Rapid Experimentation and Deployment
Solution Overview: Architecture II
CRM
Finance
Transactions
Event logs
Customer Touch Points
External Data
Operations
DATABASES
APIs
LOG FILES
SOURCE DATA
LARGE-SCALE DATA STORAGE & PROCESSING SYSTEMS
OUTPUT
MPP
DATA PREPARATION
ML MODEL BUILDING
MODEL ASSESSMENT
EXPLORATION / ANALYTICS
ACQUISITION
DEPLOYMENT
AUTOMATION NODE
API NODE
COLLABORATION
✓ Project bundling and deployment
✓ Advanced automation scenarios
✓ Reporting and monitoring
✓ Management APIs
✓ Deploy models through REST APIs
✓ Model versioning
✓ HA & load balancing
✓ Logging
Data Scientists – Write Code
Business Analysts – Visual Tools
In-memory processing with Spark…
…or push compute to the big data store where the data lives
DEV OR PROD ENVIRONMENTS
Real-time Applications
Operational Systems
Reporting & Dashboards
Analytical Databases / DWH
HDFS
Big Data / Distributed
SUPPORT FOR KERBERIZED CLUSTERS
DESIGN NODE
Leverage Your Full Stack and Skills
Modern Architecture
LINUX SERVER, ON-PREMISES OR MANAGED CLOUD
CENTRALIZED OR AD-HOC DATA SOURCES, DATABASES, DATA LAKE
AVAILABLE OR SPUN-UP PROCESSING RESOURCES
Leveraging the best storage and compute resources
Dataiku deployment servers for enterprise grade operationalization
PRODUCTION SYSTEMS
Centralized server to facilitate access to data and resources, and to foster collaboration
Browser based interface
VISUAL DEVELOPMENT
COMPLETE CODING ENVIRONMENTS
VISUALIZATION
COLLABORATION AND PROJECT MANAGEMENT
AUDIT, MONITORING AND
SCHEDULING
User/task specific interaction modes
Goals at various levels of your enterprise: what comes first?
From Strategy to Software
STRATEGY
Collaboration, Productivity, Processes, Scaling
BUSINESS
SOFTWARE
Infrastructure Flexibility, Data Governance
Costs
Open Source Industrialization
Where can processing occur?
Dataiku: Enterprise Scaling for IT
| Capability | Local Server | In Hadoop / Spark (AWS EMR / …) | In SQL Database | In Kubernetes & Docker |
|---|---|---|---|---|
| Data Preparation (Interactive / Recipe in Workflow) | YES | YES (Spark, Hive, Impala) | YES | N/A |
| Coding: Python, R, Scala (Notebook / Recipe in Workflows) | YES | YES | YES | YES |
| SQL Analytics (Notebook / Recipe in Workflow) | N/A | YES (Hive, Impala, Pig, SparkSQL) | YES | N/A |
| Visualization (Charts) | YES | YES (most charts) | YES (most charts) | N/A |
| Machine Learning: Training | YES (scikit-learn, XGBoost, Keras/TensorFlow) | YES (MLlib, Sparkling Water) | YES (Vertica ML) | YES (scikit-learn, XGBoost, Keras/TensorFlow) |
| Machine Learning: Inference | YES (depending on algorithms) | YES (depending on algorithms) | YES (depending on algorithms) | YES (depending on algorithms) |

Coding also covers custom code with the DSS API.
End-to-End Platform Solution to Improve Processes
From Wild West to Best Practices
DATA CATALOGING AND CONNECTIVITY
DATA PREPARATION
(AUTO) MACHINE LEARNING
PRODUCTION DEPLOYMENT
VISUAL
CODE
GOVERNANCE, VERSIONING, AUDIT AND REUSE
D.R.Y. (Don't Repeat Yourself)
Maximize emergence, diffusion and reuse
Build once, use again, and again, and again, and again
Automate
Reuse
Navigate previous work in the centralized project directory
Copy/Paste flow components
Secure future understanding with In-built wiki
Accelerate repeated development with instant code snippets
Package complex operations in simple plugins for unlimited reuse
Automate key operations with APIs
Identify existing components to reuse with the Catalog
After the fact A priori
BARC: User Survey
Dataiku Reviews
Aligning Interests on Your Path to Enterprise AI
The Dataiku Journey
New Projects
Quarterly business review
Community
Dataiku Account Team + Partner
Implementation Manager and Professional Services
Customer Success Manager + Partner
Discovery Analysis Evaluation Subscription
Training
Deployment
First Project
Action Steps Toward Enterprise AI
Appendix: Maturity Model
Big Data Day 0
Initiation
Impact
Acceleration
Systematization
ML is for specialists
Ad-hoc analytics
Siloed approach
Demonstrate Value
Deliver Business Value In Actual Operations
Fully align data, organization and processes
Structure Execution and Self-Service
● Difficulty assembling a first team
● Shifting data infrastructure/IT systems
● Lack of traction with business owners
● Difficulty operationalizing models
● Difficulty getting business acceptance of models and impact from them
● Inability to onboard analysts
Main Risks
● Fragmented technologies
● Data is limited to ‘experts’
● Maintaining models in production is too costly, hindering new deployments
● Lack of capitalization on previous projects
● Fragmented initiatives that are difficult to reconcile
● Lack of manpower to expand projects
● Accumulated obsolescence of deployed projects
● Failure to leverage new technologies
● Data projects remain fairly specific and lack cultural pervasiveness
Enterprise AI
There is no shortcut to Enterprise AI. It is a journey that organisations need to undertake consciously, mastering each of the four key phases in turn.
= Software
= Consultant