Hadoop in the Cloud
David Tishgart, Director of Product Marketing, Cloudera
© Cloudera, Inc. All rights reserved. Cloudera Confidential.
Cloudera Snapshot
MARKET & POSITIONING: The modern platform for data management and analytics, built on Apache Hadoop, Apache Spark, and the latest open-source technology
CUSTOMERS: >1,000 enterprise subscription software customers
LARGEST ECOSYSTEM: >2,600 partners
GLOBAL EXPANSION: Operations in 27 countries; 24x7 global support and education; customers in >50 countries
EMPLOYEES: >1,400
FUNDING TO DATE: $670 million (Intel, Accel Partners, institutional investors)
Drive Customer Insights
Improve Product & Services Efficiency
Lower Business Risk
The world’s largest taxi company owns ZERO vehicles.
The world’s largest accommodation provider owns ZERO real estate.
The world’s most popular media owner creates ZERO content.
The world’s leading music platform owns no music.
Data is abundant … and cheap.
Keep all data online as long as needed.
Computation is affordable.
Ask bigger questions as fast as you can.
Internet of Things (IoT) – A Revolution In The Making
IoT Markets, 2020
$1.7 trillion in value
20% annual growth
30 billion things
250 million connected vehicles
Source: IDC & Gartner estimates
A modern data architecture is needed to drive success from data
Current Data Architectures
Limited data. Single access. Platform silos.
(Diagram: separate silos to integrate & process, analyze, and serve data)
Modern Data Platform
Unlimited data. Diverse access. One platform.
(Diagram: one platform to integrate, store, process, analyze, and serve data, with unified services, operations, and data management layers)
(Diagram: the same platform layers, running on distributed compute over distributed data)
What’s Driving Hadoop to the Cloud?
Enterprise customers are using the cloud for big data analytics.
Hadoop deployments in cloud are accelerating:
● Executive mandate: minimize on-prem datacenter footprint
● Perceived lower overall TCO
● Increased agility: end-user self-service
● Elasticity: optimize infrastructure usage
Common workloads in the cloud
ETL/Modeling (Data Engineering): Reduce Operating Costs
Only pay for what you need, when you need it
▪ Transient clusters
▪ Elastic workload
▪ Object-storage-centric
▪ Cloud-native deployment
BI/Analytics (Analytic Database): New Insights, New Revenue
Explore and analyze all data, wherever it lives
▪ Transient or persistent clusters
▪ Sized to demand
▪ HDFS or object storage
▪ Lift-and-shift or cloud-native deployment
App Delivery (Operational Database): Run Without Risk
Enterprise-grade to protect your business, no matter what
▪ Fixed clusters
▪ Periodic sync
▪ All HDFS storage
▪ Lift-and-shift deployment
Crunching 1,000+ Business Metrics per Customer with Sub-Second Responses
• Enables granular targeting of customers
• 50% reduction in marketing execution costs at one bank by focusing on high-potential customers
• Stores and processes thousands of critical events at scale, at low cost
• Provides the flexibility and agility to support customer needs with Cloudera on Amazon Web Services and on premises
CUSTOMER 360 | FINANCIAL SERVICES » BEHAVIORAL ANALYTICS » PREDICTIVE ANALYTICS » SCALABLE PROCESSING
Providing a complete view of consumer watching and buying habits
• Helps customers optimize their ad spend for greater campaign ROI
• Improves processing performance as data volumes double
• Boosts agility and flexibility, and reduces risk, with a hybrid, multi-cloud strategy
CUSTOMER 360 | MEDIA » CUSTOMER 360° » OMNI-CHANNEL ANALYTICS » SCALABLE PROCESSING
Measure user interaction across the ecosystem to help direct R&D spend
• Virtuous cycle: identify features that facilitate content sharing and thereby drive new customers
• Real-time streaming and batch data from product logs, web analytics, channel data, and ERP
• Impala connects to third-party data-wrangling and BI tools for fast reporting
DATA-DRIVEN PRODUCTS | MANUFACTURING » CUSTOMER 360 » DATA-DRIVEN PRODUCTS » DATA-DRIVEN SERVICES
Key Requirements of Big Data in the Cloud
Elastic: Size compute and storage independently, grow and shrink clusters dynamically, and pay only for what you use on ad-hoc, transient workloads
Hybrid/Multi-Cloud: Preserve business flexibility and data portability, and minimize cloud lock-in, by running on any of the three major public cloud providers or in a private cloud
Enterprise Grade: Reduce risk with the comprehensive manageability, availability, security, and governance that production big data workloads require
Patterns of Cloud-Native Applications
Flexibility, Self-Service Models, and New Cost Dynamics
Embrace Transience for Lower Costs
Decoupled Storage and Compute for Elastic Scale
Compartmentalize for Greater Isolation
(Diagram: compute spins up, runs for about 1 hr, and spins down, while data stays in an object store)
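The cost case for transience is simple arithmetic: compute is billed only while the cluster runs, while the data stays in the object store. A minimal Python sketch, assuming a hypothetical flat rate of $0.50 per node-hour and a 10-node cluster (illustrative numbers, not real cloud prices), compares an always-on cluster with one that spins up for about 1 hr a day:

```python
# Illustrative assumptions only: real prices vary by provider, region, and instance type.
HOURLY_RATE_PER_NODE = 0.50  # hypothetical USD per node-hour
NODES = 10                   # hypothetical cluster size
DAYS = 30                    # one billing month


def compute_cost(hours_per_day: float, nodes: int = NODES,
                 rate: float = HOURLY_RATE_PER_NODE, days: int = DAYS) -> float:
    """Compute-only cost for a month: nodes x running hours x hourly rate."""
    return nodes * hours_per_day * days * rate


persistent = compute_cost(24)  # always-on cluster
transient = compute_cost(1)    # spin up, run a ~1 hr job, spin down, once a day

print(f"persistent: ${persistent:,.2f}")  # persistent: $3,600.00
print(f"transient:  ${transient:,.2f}")   # transient:  $150.00
print(f"savings:    {1 - transient / persistent:.0%}")  # savings:    96%
```

Object-store charges are left out of the sketch because they accrue whether or not compute is running; decoupling storage from compute is what makes it safe to stop paying for the cluster between jobs.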
Transient Applications
Transient cluster requirements:
● Object store integration
● Fast cluster provisioning
● Cluster metadata persistence
● Usage-based pricing
Examples of transient clusters in the cloud:
● ETL workflows
● Model training
● Ad hoc analytics
● Dev and test workflows
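The transient pattern hinges on two of the requirements above: data and metadata must live outside the cluster, and the cluster must be torn down reliably when the job ends. A Python sketch of that lifecycle, in which `HypotheticalProvider`, the dict-based `object_store`, and `metadata_store` are invented stand-ins for a real provisioning API, an object store such as S3, and a metadata service:

```python
from contextlib import contextmanager

# Hypothetical stand-ins: both stores outlive any individual cluster.
object_store = {"raw/events.csv": "a,1\nb,2\na,3\n"}
metadata_store = {}


class HypotheticalProvider:
    """Invented provisioning API for illustration; not a real SDK."""

    def __init__(self):
        self.running = set()

    def create_cluster(self, name: str, nodes: int) -> str:
        self.running.add(name)
        return name

    def terminate_cluster(self, name: str) -> None:
        self.running.discard(name)


@contextmanager
def transient_cluster(provider, name, nodes):
    """Spin a cluster up for one workload and always spin it down."""
    cluster = provider.create_cluster(name, nodes)
    try:
        yield cluster
    finally:
        provider.terminate_cluster(cluster)  # runs even if the job raises


def etl_job(cluster, src, dst):
    """Toy ETL: sum values per key and write the result back to the object store."""
    totals = {}
    for line in object_store[src].strip().splitlines():
        key, value = line.split(",")
        totals[key] = totals.get(key, 0) + int(value)
    object_store[dst] = totals
    # Record table metadata outside the cluster so the next cluster can find it.
    metadata_store[dst] = {"format": "dict", "written_by": cluster}


provider = HypotheticalProvider()
with transient_cluster(provider, "etl-nightly", nodes=5) as c:
    etl_job(c, "raw/events.csv", "curated/totals")

print(object_store["curated/totals"])  # {'a': 4, 'b': 2}
print(provider.running)                # set()
```

The context manager is one way to guarantee the "spin down" step; the results and their metadata survive the cluster because neither is stored on it.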
Persistent Applications
Persistent clusters have requirements similar to those of on-prem clusters:
• High availability and disaster recovery
• Cluster operational management
• Resource management
• Security
They can still acquire some benefits from cloud infrastructure:
• Cluster auto-scaling for peak demand
• Ad-hoc dev & test environments
• Capitalize on cheaper “blade” instances
Examples of persistent use cases in the cloud:
• HBase, Solr clusters
• Kafka clusters
• BI analytics
• Busy, multi-user clusters
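Auto-scaling a persistent cluster for peak demand can be expressed as a sizing rule. The Python sketch below is an assumption, not Cloudera's algorithm: it sizes the cluster to queued work using a hypothetical `tasks_per_node` capacity, and clamps the result between a floor that keeps long-running services such as HBase or Kafka healthy and a budget ceiling.

```python
import math


def desired_nodes(pending_tasks: int, tasks_per_node: int,
                  min_nodes: int, max_nodes: int) -> int:
    """Hypothetical auto-scaling rule for a persistent cluster.

    Scale to the number of nodes needed to run all queued tasks at once,
    but never below min_nodes (baseline for always-on services) or above
    max_nodes (budget cap).
    """
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    needed = math.ceil(pending_tasks / tasks_per_node)
    return max(min_nodes, min(max_nodes, needed))


# Quiet period, normal load, and a peak that hits the cap.
for pending in (0, 100, 1000):
    print(pending, desired_nodes(pending, tasks_per_node=8, min_nodes=3, max_nodes=20))
# 0 3
# 100 13
# 1000 20
```

Unlike a transient cluster, the floor never drops to zero, which reflects the always-on workloads listed above.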
Persistent application: streaming architecture
(Diagram components: data sources; Kafka/Flume; Spark Streaming; HBase or Impala/Kudu (beta) for real-time serving; Kafka feeding an application; S3; Hive/Spark/HoS (Hive on Spark) for batch data transformations; Impala for analytics)
Transient application
(Diagram components: data sources; Kafka/Flume; Spark Streaming; HBase or Impala/Kudu (beta) for real-time serving; Kafka feeding an application; S3 (twice); Hive/Spark/HoS for batch data transformations; batch analytics; Impala for BI & analytics)
Combining the two: lambda architecture
(Diagram components: data sources; Kafka/Flume; Spark Streaming; HBase or Impala/Kudu (beta) for real-time serving; Kafka feeding an application; S3 (twice); Hive/Spark/HoS for batch data transformations; Impala for BI & analytics)
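In a lambda architecture, a batch layer periodically recomputes views over the full dataset while a speed layer covers events that arrived since the last batch run, and queries merge the two. In this slide the batch layer is presumably the Hive/Spark path over S3 and the speed layer the Spark Streaming path into HBase or Kudu. A toy Python sketch of the merge, with in-memory lists and counters standing in for S3, the serving stores, and Impala:

```python
from collections import Counter

# Toy master dataset of (user, page) events; stands in for raw data in S3.
master_dataset = [("u1", "home"), ("u2", "home"), ("u1", "cart")]


def batch_view(events):
    """Batch layer: recompute page-view counts from scratch over all events."""
    return Counter(page for _, page in events)


class SpeedLayer:
    """Speed layer: incrementally count events that arrived since the last batch run."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        _, page = event
        self.counts[page] += 1

    def reset(self):
        # Simplification: assumes no events arrive while the batch job runs.
        self.counts.clear()


def query(page, batch, speed):
    """Serving: merge the complete-but-stale batch view with the fresh speed view."""
    return batch[page] + speed.counts[page]


batch = batch_view(master_dataset)
speed = SpeedLayer()
for event in [("u3", "home"), ("u3", "cart")]:  # new events streaming in
    master_dataset.append(event)  # kept for the next batch recomputation
    speed.on_event(event)

print(query("home", batch, speed))  # 3 (2 from batch + 1 from speed)

# The next batch run absorbs the recent events, so the speed view is dropped.
batch = batch_view(master_dataset)
speed.reset()
print(query("home", batch, speed))  # 3 (all from batch)
```

The answer stays the same before and after the batch run; what changes is which layer supplies it.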
Get started with Cloudera Enterprise in the cloud
Cloudera Director: Deploy and manage Cloudera Enterprise in the cloud environment of your choice
AWS Quickstart: Deploy an enterprise data hub on AWS
Azure Marketplace: Provision and deploy Cloudera Enterprise from the Azure Marketplace