©2019 dataiku, Inc. | dataiku.com | contact@dataiku.com | @dataiku

Post on 22-Jul-2020


Page 1: ©2019 dataiku, Inc. | dataiku.com | contact@dataiku.com ... · Modern Architecture LINUX SERVER ON PREMISE OR MANAGED CLOUD CENTRALIZED OR AD-HOC DATA SOURCES, DATABASES, DATA LAKE


Page 2

Where is your organization today?

Big Data Day 0

Initiation

Impact

Acceleration

Systematization

ML is for specialists

Ad-hoc analytics

Siloed Approach

Demonstrate Value

Deliver Business Value In Actual Operations

Fully align data, organization and processes

Structure Execution and Self-Service

Enterprise AI

Page 3

What Dataiku has had the opportunity to see

Manufacturing

Financial Services

Services

Consumer Goods

Technology

Consulting

E-Retail

Media

Healthcare

Travel

Unique Journeys, Shared Challenges turned Software

Across many industries

A wide array of use cases

Predictive Maintenance

Fraud Detection

Product Recommendation

Churn Prediction

Risk Analysis

Production Improvements

Logistics optimization

Market Analysis

Pricing

And many, many, many more

Page 4

The Fundamental Challenge is not “Software”

Data Success at your company

Siloed Data: Data is most often there, in many different forms and many different systems that need to be combined.

Siloed People: Success is not defined by having great business acumen OR fantastic data skills OR a great analytics mindset. You need them all, together.

Siloed Processes: Labs, IT, data, operations, business… all have their own processes, and delivering value requires bridging them.

Page 5

The “Tower of Babel” Effect

The Classic Data Project Silos

Business Analyst

DATA PREPARATION ML MODELING ML DEPLOYMENT

Data Preparation

Data Science Notebooks & API Platforms

AutoML Solutions

Data Scientist

Data Engineer

Page 6

Page 7

Bring Business, Engineers, and Data Scientists Together

Share a common environment to have an impact

DATA PREPARATION ML MODELING ML DEPLOYMENT

Business Analyst

Data Engineer

Data Scientist

Single Collaborative, Governable and Auditable Environment

Page 8

From Data Access to Operationalized Models

Solution Overview: An End-to-End Solution

Stages: DATA MANAGEMENT, MACHINE LEARNING, MODEL DEPLOYMENT

Capabilities: VISUAL AUTO PREP, CODING ENVIRONMENT(S), VISUAL AUTO ML, VISUAL PIPELINE, VISUAL MODEL MONITORING, MODEL DEPLOYMENT

Roles shown: Business Analyst, Data Scientist, Data Engineer, Analytics Leader

Activities shown: Find, Understand, Prepare Data; Business Modelling; Prototype; Build plugins for…; Use for Productivity and Extend; Use as a Baseline; Use for Optimization; Optimize Speed; Integrate; Work Together in…; Monitor Results; Understand Progress; Build Business Monitoring Dashboard for…

Page 9

Architecture for Rapid Experimentation and Deployment

Solution Overview: Architecture II

CRM

Finance

Transactions

Event logs

Customer Touch Points

External Data

Operations

DATABASES

API’s

LOG FILES

SOURCE DATA

LARGE SCALE DATA STORAGE & PROCESSING SYSTEMS

OUTPUT

MPP

DATA PREPARATION

ML MODEL BUILDING

MODEL ASSESSMENT

EXPLORATION / ANALYTICS

ACQUISITION

DEPLOYMENT

AUTOMATION NODE

✓ Project bundling and deployment
✓ Advanced automation scenario
✓ Reporting and monitoring
✓ Management API’s

API NODE

✓ Deploy model through REST API’s
✓ Model versioning
✓ HA & load balancing
✓ Logging

COLLABORATION

Data Scientists – Write Code

Business Analysts – Visual Tools

In-memory processing with Spark, or push compute to the big data store where the data lives

DEV OR PROD ENVIRONMENTS

Real-time Applications

Operational Systems

Reporting & Dashboards

Analytical Databases / DWH

HDFS

Big Data / Distributed

SUPPORT FOR KERBERIZED CLUSTERS

DESIGN NODE
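
The slide lists "Deploy model through REST API’s" as an API node capability. As a rough illustration only, the sketch below builds (but does not send) an HTTP prediction request using the Python standard library. The URL layout, the `{"features": ...}` payload shape, the bearer-token header, and the function name `build_prediction_request` are all assumptions made for this example; they are not taken from the slides or from Dataiku documentation.

```python
import json
from typing import Any, Dict, Optional
from urllib.request import Request


def build_prediction_request(
    base_url: str,
    service: str,
    endpoint: str,
    features: Dict[str, Any],
    api_key: Optional[str] = None,
) -> Request:
    """Build a POST request carrying one record to score.

    Hypothetical URL scheme: <base_url>/<service>/<endpoint>/predict
    Hypothetical payload:    {"features": {...}}
    """
    url = f"{base_url.rstrip('/')}/{service}/{endpoint}/predict"
    body = json.dumps({"features": features}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumed auth scheme; a real deployment may use something else.
        headers["Authorization"] = f"Bearer {api_key}"
    return Request(url, data=body, headers=headers, method="POST")
```

A caller would pass the returned object to `urllib.request.urlopen` against a real endpoint; keeping request construction separate from sending makes the payload easy to inspect and test.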

Page 10

Leverage your full stack and skills

Modern Architecture

LINUX SERVER ON PREMISE OR MANAGED CLOUD

CENTRALIZED OR AD-HOC DATA SOURCES, DATABASES, DATA LAKE

AVAILABLE OR SPUN-UP PROCESSING RESOURCES

Leveraging best storage and compute resources

Dataiku deployment servers for enterprise grade operationalization

PRODUCTION SYSTEMS

Centralized server to facilitate access to data and resources, and to foster collaboration

Browser based interface

VISUAL DEVELOPMENT

COMPLETE CODING ENVIRONMENTS

VISUALIZATION

COLLABORATION AND PROJECT MANAGEMENT

AUDIT, MONITORING AND SCHEDULING

User/task specific interaction modes

Page 11

Goals at various levels of your enterprise: what comes first?

From Strategy to Software

STRATEGY

Collaboration

Productivity, Processes, Scaling

BUSINESS

SOFTWARE

Infrastructure Flexibility

Data Governance

Costs

Open Source Industrialization

Page 12

Page 13

Where can processing occur?

Dataiku - Enterprise scaling for IT

Task | Local Server | In Hadoop / Spark (AWS EMR / …) | In SQL Database | In Kubernetes & Docker
Data Preparation (Interactive / Recipe in Workflow) | YES | YES (Spark, Hive, Impala) | YES | N/A
Coding: Python, R, Scala (Notebook / Recipe in Workflows) | YES | YES | YES (custom code with DSS API) | YES
SQL Analytics (Notebook / Recipe in Workflow) | N/A | YES (Hive, Impala, Pig, SparkSQL) | YES | N/A
Visualization (Charts) | YES | YES (most charts) | YES (most charts) | N/A
Machine Learning: Training | YES (scikit-learn, XGBoost, Keras/Tensorflow) | YES (MLlib, Sparkling Water) | YES (Vertica ML) | YES (scikit-learn, XGBoost, Keras/Tensorflow)
Machine Learning: Inference | YES (depending on algorithms) | YES (depending on algorithms) | YES (depending on algorithms) | YES (depending on algorithms)
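
The processing matrix on this slide can be read as a lookup from task type to the engines that can run it. The sketch below is one possible way to encode it in Python; the key names (`local`, `hadoop_spark`, and so on) and both helper functions are illustrative assumptions, and qualified entries such as "most charts" or "depending on algorithms" are simplified to a plain yes.

```python
from typing import Dict, List, Tuple

# Column order follows the slide: Local Server, Hadoop / Spark,
# SQL Database, Kubernetes & Docker. Names are illustrative only.
ENGINES: Tuple[str, ...] = (
    "local", "hadoop_spark", "sql_database", "kubernetes_docker",
)

# True = YES on the slide (qualifiers dropped), False = N/A.
SUPPORT: Dict[str, Tuple[bool, bool, bool, bool]] = {
    "data_preparation": (True, True, True, False),
    "coding": (True, True, True, True),
    "sql_analytics": (False, True, True, False),
    "visualization": (True, True, True, False),
    "ml_training": (True, True, True, True),
    "ml_inference": (True, True, True, True),
}


def engines_for(task: str) -> List[str]:
    """Return the engines the slide marks as supporting `task`."""
    return [e for e, ok in zip(ENGINES, SUPPORT[task]) if ok]


def common_engines(tasks: List[str]) -> List[str]:
    """Return the engines that support every task in `tasks`."""
    return [
        e for i, e in enumerate(ENGINES)
        if all(SUPPORT[t][i] for t in tasks)
    ]
```

For example, `common_engines(["data_preparation", "sql_analytics"])` narrows a pipeline that needs both steps to the engines where each is marked YES.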

Page 14

End-to-End Platform Solution to improve Processes

From Wild West to Best Practices

DATA CATALOGING AND CONNECTIVITY

DATA PREPARATION

(AUTO) MACHINE LEARNING

PRODUCTION DEPLOYMENT

VISUAL

CODE

GOVERNANCE, VERSIONING, AUDIT AND REUSE

Page 15

D.R.Y.

Page 16

Maximize emergence, diffusion and reuse

Build once, use again, and again, and again, and again

Automate

Reuse

Navigate previous work in the centralized project directory

Copy/Paste flow components

Secure future understanding with the built-in wiki

Accelerate repeated development with instant code snippets

Package complex operations in simple plugins for unlimited reuse

Automate key operations with APIs

Identify existing components to reuse with the Catalog

After the fact

A priori

Page 17

BARC - User Survey

Dataiku Reviews

Page 18

Aligning Interests on your Path to Enterprise AI

The Dataiku Journey

New Projects

Quarterly business review

Community

Dataiku Account Team + Partner

Implementation Manager and Professional Services

Customer Success Manager + Partner

Discovery Analysis Evaluation Subscription

Training

Deployment

First Project

Page 20

Action Steps toward Enterprise AI

Appendix: Maturity Model

Big Data Day 0

Initiation

Impact

Acceleration

Systematization

ML is for specialists

Ad-hoc analytics

Siloed Approach

Demonstrate Value

Deliver Business Value In Actual Operations

Fully align data, organization and processes

Structure Execution and Self-Service

Main Risks

● Difficulty assembling a first team

● Shifting data infrastructure/IT systems

● Lack of traction with business owners

● Difficulty operationalizing models

● Difficulty getting business acceptance of, and impact from, models

● Inability to onboard analysts

● Fragmented technologies

● Data is limited to ‘experts’

● Maintaining models in production is too costly, hindering new deployments

● Lack of capitalization on previous projects

● Fragmented initiatives that are difficult to reconcile

● Lack of manpower to expand projects

● Accumulated obsolescence of deployed projects

● Failure to leverage new technologies

● Data projects remain fairly specific, lacking cultural pervasiveness

Enterprise AI

There is no shortcut to Enterprise AI. It is a journey that organisations need to undertake consciously, mastering each of the four key phases, one after the other.

= Software

= Consultant