DataCanvas: Big Data Analytic Flow in Cloud

BIG DATA ANALYTIC FLOW IN CLOUD Lei Fang@ DataCanvas.io Empower big data analytics for business

Upload: fangleida

Post on 27-Jun-2015





DESCRIPTION

A presentation that explains what DataCanvas is. DataCanvas is a cloud service that allows businesses to create, manage, and share big data analytic pipelines.

TRANSCRIPT

Page 1: DataCanvas: Big Data Analytic Flow in Cloud

BIG DATA ANALYTIC FLOW IN CLOUD

Lei Fang @ DataCanvas.io

Empower big data analytics for business

Page 2: DataCanvas: Big Data Analytic Flow in Cloud

• 16.9B USD in 2015

• 40% Big data project

• Hadoop: CAGR 58%, 2.2B in 2020

Page 3: DataCanvas: Big Data Analytic Flow in Cloud

• Volume

• Velocity

• Variety

Super hot in

• Government

• Communication

• Media

• Banking

• Manufacturing

Page 4: DataCanvas: Big Data Analytic Flow in Cloud

Technology

Infrastructure: IaaS, SaaS, DaaS

Application: BI, social analytics, visualization…

Domain solution: Finance, Retail, Insurance

Development: Data scientist, DevOps

Business process: Operation, Support

Page 5: DataCanvas: Big Data Analytic Flow in Cloud

ANALYTICS IS THE

Make data live
• Data sitting in storage generates no value

Revenue and profit from data
• Application and solution to get insights from data
• Link insights with business
• Don’t stop at visualization or report

Advanced analytics is the engine of business solution
• Fraud detection
• Customer retention

Page 6: DataCanvas: Big Data Analytic Flow in Cloud

COMMON ANALYTICS SCENARIOS

Data analysis
• Example: Estimate a customer’s life cycle value
• User: data scientist
• Needs: flexibility to explore and faster iteration

Product analysis
• Example: How many female customers visit the website home page and leave within fewer than 5 clicks?
• User: product manager, data analyst, marketing team
• Needs: no complex coding, SQL queries at most

Predictive service
• Example: Is this transaction fraudulent?
• User: developer and data scientist
• Needs: pipeline processing
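The product analysis example can be made concrete with a small query. The sketch below is only an illustration of the "SQL query at most" idea: the `customers` and `clicks` tables, their columns, and the sample rows are all hypothetical, and "leaves within fewer than 5 clicks" is read here as "a visit that includes the home page and has fewer than 5 clicks in total". It uses Python's built-in `sqlite3` module rather than any DataCanvas interface.

```python
import sqlite3

# Hypothetical schema, for illustration only: one row per click,
# tagged with the visit (session) it belongs to.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, gender TEXT);
    CREATE TABLE clicks (
        session_id  INTEGER,
        customer_id INTEGER,
        click_seq   INTEGER,
        page        TEXT
    );
""")

# Made-up sample data: two female customers and one male customer.
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "F"), (2, "F"), (3, "M")])

clicks = [(10, 1, 1, "home"), (10, 1, 2, "cart")]        # customer 1: 2 clicks
clicks += [(11, 2, seq, "home" if seq == 1 else "item")  # customer 2: 7 clicks
           for seq in range(1, 8)]
clicks += [(12, 3, 1, "home")]                           # customer 3: 1 click
conn.executemany("INSERT INTO clicks VALUES (?, ?, ?, ?)", clicks)

# Count female customers with at least one visit that touched the
# home page and ended after fewer than 5 clicks.
QUERY = """
    SELECT COUNT(DISTINCT c.customer_id)
    FROM customers AS c
    JOIN (
        SELECT session_id, customer_id
        FROM clicks
        GROUP BY session_id, customer_id
        HAVING COUNT(*) < 5
           AND SUM(CASE WHEN page = 'home' THEN 1 ELSE 0 END) > 0
    ) AS s ON s.customer_id = c.customer_id
    WHERE c.gender = 'F'
"""
(short_visits,) = conn.execute(QUERY).fetchone()
print(short_visits)  # 1 for the sample rows above
```

Only customer 1 qualifies in this sample: customer 2 stays for 7 clicks and customer 3 is filtered out by gender.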

Page 7: DataCanvas: Big Data Analytic Flow in Cloud

WHAT DOES DATACANVAS ADDRESS

Powering all these scenarios
• Data analysis: flexible
• Product analysis: intuitive
• Prediction service: complex processing

Enable application, solution and business process

DataCanvas

Page 8: DataCanvas: Big Data Analytic Flow in Cloud

Application: Recommendation, Anomaly Detection, Operation Analytics

Service, Pipeline: Platform to enable application and connect infrastructure

Infrastructure: Hadoop (Hive/Pig), RDBMS, NoSQL, Spark

Page 9: DataCanvas: Big Data Analytic Flow in Cloud

• Big data challenges are across services, environments and even locations

Pipeline steps: Data Generation, Storage, Processing, Reporting

• An orchestration platform is required to manage and connect steps in the pipeline

• Bring Pipeline to the game
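To illustrate what "manage and connect steps in the pipeline" can mean in practice, here is a minimal sketch of an orchestrator that runs steps in dependency order. It is not DataCanvas code: the step functions, the `PIPELINE` dictionary, and `run_pipeline` are hypothetical names chosen for this example, and the step bodies are toy stand-ins for the four steps named on the slide. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Toy stand-ins for real steps. Each step receives a dict that maps
# the names of its upstream steps to their outputs.
def generate(_inputs):
    return [3, 1, 2]

def store(inputs):
    return sorted(inputs["generate"])

def process(inputs):
    return sum(inputs["store"])

def report(inputs):
    return f"total={inputs['process']}"

# Hypothetical pipeline definition: step name -> (function, upstream steps).
PIPELINE = {
    "generate": (generate, []),
    "store": (store, ["generate"]),
    "process": (process, ["store"]),
    "report": (report, ["process"]),
}

def run_pipeline(pipeline):
    """Run every step after all of its upstream steps have finished."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(deps) for name, (_, deps) in pipeline.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = pipeline[name]
        results[name] = func({dep: results[dep] for dep in deps})
    return results

results = run_pipeline(PIPELINE)
print(results["report"])  # total=6
```

Declaring the dependencies as data, rather than burying them in scripts, is what lets an orchestration layer schedule, inspect, and reuse the steps.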

Page 10: DataCanvas: Big Data Analytic Flow in Cloud

No more central data store, bring computation to data, not vice versa!

• Unify resource

• Optimize workload

• Automation

Page 11: DataCanvas: Big Data Analytic Flow in Cloud

Pain points
• Unmanageable
• Redundancy
• Hard to iterate quickly
• Gap between documentation and actual workflow
• Monster configuration
• Spaghetti scripts, no reuse
• No idea what’s actually running

Page 12: DataCanvas: Big Data Analytic Flow in Cloud

WHAT IS DATACANVAS

• Drag & drop to run data flow
• Public or private cloud
• Intuitive job management

• Module repository
• Built-in library
• Make your own recipe
• Powering advanced analytics

• Business solution template
• Address common applications
• Fully customizable

• Team collaboration
• Flow sharing
• Module sharing
• This is the BEST documentation

Page 13: DataCanvas: Big Data Analytic Flow in Cloud

VALUE

Workflow scheduling (Operation)
• User experience
• Production quality
• Easy ops

Module (Developer/Data scientist)
• Data ETL
• Machine learning
• Module repository

Solution template (Business)
• Business requirement
• Recommendation
• Fraud detection
• Sentiment analysis

Page 14: DataCanvas: Big Data Analytic Flow in Cloud

WHY CONTAINER MATTERS

• Seamlessly connect to any existing/upcoming computation infrastructure
• Enabler for module management and sharing
• Support Lambda: Processing + Serving + Visualization

Lambda Architecture

Page 15: DataCanvas: Big Data Analytic Flow in Cloud

COMPETITORS

Products compared: AWS DP, Oozie, AzureML, MortarData, Azkaban, DataCanvas

Comparison criteria:
• Workflow + Scheduling
• Module management
• Solution template
• Multiple Env support
• Collaboration + Sharing
• Cloud service

DataCanvas = ((Workflow + Scheduler) * Drag & drop * Module composition ) ^ Solution @ Cloud

Rating legend: Good, Not that great, Bad or not supported

Page 16: DataCanvas: Big Data Analytic Flow in Cloud

BUSINESS MODEL

Subscription

Services are charged by tier: Startup, Premium, Enterprise

Free
• 1 user
• Unlimited projects
• Limited workload, good for evaluation
• Forum support

Startup
• Unlimited users
• Unlimited projects
• Decent workload, 3-5 jobs in parallel
• Email support

Premium
• Unlimited users
• Unlimited projects
• Significant workload, >20 jobs in parallel
• Email support

Enterprise
• Unlimited users
• Unlimited projects
• Workload at scale
• Full support

Annual Support Package
• For Premium and Enterprise customers
• Forum support, email support with SLA, telephone support

Page 17: DataCanvas: Big Data Analytic Flow in Cloud

TARGET CUSTOMER

Data scientist
• Assembly line to facilitate exploration
• Team collaboration

Analyst
• Drag and drop to find insights, need any more reason?

Manager
• Faster iteration
• Shorter time to deliver project
• Easier to maintain

Page 18: DataCanvas: Big Data Analytic Flow in Cloud

WHERE ARE WE NOW

Demo upon request ([email protected])

DataCanvasIO @ GitHub

Page 19: DataCanvas: Big Data Analytic Flow in Cloud

THANK YOU