The Elephant in the Clouds
Sanjay Radia
Chief Architect, Founder, Hortonworks
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Why Hadoop in the Cloud?
Unlimited Elastic Scale
Ephemeral & Long-Running
IT & Business Agility
No Upfront HW Costs ($0)
Today’s Hadoop Cloud Solutions
Figure: The Forrester Wave™: Big Data Hadoop Cloud Solutions, Q2 2016 (get it at //aka.ms/forresterwave). Vendors shown: Microsoft, Amazon Web Services, IBM, Qubole, Altiscale, Oracle, and Rackspace, rated on current offering and strategy (weak to strong) with market presence indicated, across the Leaders, Strong Performers, Contenders, and Challengers bands.
Key Architectural Considerations for Hadoop in the Cloud
Shared Data & Storage
On-Demand Ephemeral Workloads
Elastic Resource Management
Shared Metadata, Security & Governance
Prescriptive On-Demand Ephemeral Workloads
Diagram: four on-demand ephemeral workloads, each shown with R/W Tables and a Compute Fabric:
Data Science: R/W Tables, Compute Fabric
ETL: R/W Tables, Compute Fabric
Warehouse: R/W Tables, Compute Fabric
Search: R/W Tables, Compute Fabric
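One reading of this slide, together with the shared data and storage slide, is that each workload gets its own short-lived compute fabric while its tables live in shared storage that outlives every cluster. The following is a minimal Python sketch of that lifecycle under stated assumptions: `SharedStorage`, `ComputeFabric`, `ephemeral_fabric`, and the `clicks_raw`/`clicks_clean` tables are hypothetical stand-ins, not a Hortonworks or cloud-provider API, and provisioning is simulated with an in-memory list.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Hypothetical stand-in for the shared cloud data lake; outlives all clusters."""
    tables: dict = field(default_factory=dict)


@dataclass
class ComputeFabric:
    """Hypothetical per-workload compute cluster."""
    workload: str
    nodes: int
    running: bool = True


@contextmanager
def ephemeral_fabric(workload: str, nodes: int, active: list):
    """Provision a fabric for one workload and always tear it down afterwards."""
    fabric = ComputeFabric(workload, nodes)
    active.append(fabric)  # simulated provisioning
    try:
        yield fabric
    finally:
        fabric.running = False  # simulated teardown, even if the job fails
        active.remove(fabric)


def run_etl(storage: SharedStorage, active: list) -> None:
    # ETL reads raw data and writes a cleaned table back to shared storage.
    with ephemeral_fabric("ETL", nodes=8, active=active):
        raw = storage.tables.get("clicks_raw", [])
        storage.tables["clicks_clean"] = [row for row in raw if row.get("user")]


def run_warehouse_query(storage: SharedStorage, active: list) -> int:
    # A separate, smaller fabric reads the table the ETL fabric produced.
    with ephemeral_fabric("Warehouse", nodes=4, active=active):
        return len(storage.tables.get("clicks_clean", []))


storage = SharedStorage({"clicks_raw": [{"user": "a"}, {"user": None}, {"user": "b"}]})
active_fabrics: list = []
run_etl(storage, active_fabrics)
print(run_warehouse_query(storage, active_fabrics))  # prints 2
print(active_fabrics)  # prints []; no cluster outlives its workload
```

Because teardown happens in the `finally` clause, a failed job still releases its nodes, while whatever it wrote remains available to the next workload through shared storage.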
Shared Data and Storage
Understand and Leverage Unique Cloud Properties
– Shared data lake is cloud storage accessible by all apps
– Cloud storage segregated from compute
– Built-in geo-distribution and DR
Focus Areas
– Address cloud storage consistency and performance
– Enhance performance via memory and local storage
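In 2016, cloud object stores such as Amazon S3 offered only eventually consistent listings, so a job could miss files another job had just written. One common mitigation, similar in spirit to Hadoop's S3Guard, is to keep a strongly consistent record of recent writes and deletes and merge it with the store's listing. The sketch below assumes a toy in-memory store whose `propagate()` method simulates listing lag; the class names are hypothetical, and this is not presented as the approach the talk proposed.

```python
class EventuallyConsistentStore:
    """Toy object store whose LIST results lag behind writes and deletes."""

    def __init__(self):
        self.objects = {}
        self.listed = set()  # keys that LIST currently returns (possibly stale)

    def put(self, key, data):
        self.objects[key] = data

    def delete(self, key):
        self.objects.pop(key, None)

    def list(self, prefix):
        return sorted(k for k in self.listed if k.startswith(prefix))

    def propagate(self):
        # Simulates the store's listing eventually catching up.
        self.listed = set(self.objects)


class ConsistentListing:
    """Merges the store's listing with a strongly consistent record of recent changes."""

    def __init__(self, store):
        self.store = store
        self.added = set()
        self.deleted = set()  # a real system would expire these entries over time

    def put(self, key, data):
        self.store.put(key, data)
        self.added.add(key)
        self.deleted.discard(key)

    def delete(self, key):
        self.store.delete(key)
        self.deleted.add(key)
        self.added.discard(key)

    def list(self, prefix):
        keys = set(self.store.list(prefix))
        keys |= {k for k in self.added if k.startswith(prefix)}
        keys -= self.deleted
        return sorted(keys)


store = EventuallyConsistentStore()
fs = ConsistentListing(store)
fs.put("warehouse/clicks/part-0", b"data")
print(store.list("warehouse/clicks/"))  # prints []; raw listing has not caught up
print(fs.list("warehouse/clicks/"))     # prints ['warehouse/clicks/part-0']
```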
Enhance Performance via Caching
Tabular Data: LLAP Read + Write-Through Cache
– Cache only the needed columns
– Shared across jobs / apps and across engines
– Spills to SSD when memory is full (anti-caching)
– Read & write-through cache
– Security: column-level and row-level
HDFS Caching for Non-Tabular Data
– Cache data from cloud storage as needed
– Write-through cache
Diagram: Workloads, a Cache layer (LLAP for R/W tables, HDFS for files), and Cloud Storage.
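To make the LLAP caching points concrete, here is a toy read/write-through cache that stores data per (table, column), so only the requested columns are fetched, and that spills cold entries to a second tier instead of discarding them, loosely mirroring the spill-to-SSD (anti-caching) point. It is a hypothetical Python sketch, not LLAP's actual design: the `ColumnCache` class is invented for illustration, one dict stands in for cloud storage, another stands in for SSD, and capacity is counted in column chunks rather than bytes.

```python
from collections import OrderedDict


class ColumnCache:
    """Hypothetical read/write-through cache of (table, column) chunks."""

    def __init__(self, backing: dict, mem_capacity: int):
        self.backing = backing        # (table, column) -> values; stands in for cloud storage
        self.mem = OrderedDict()      # in-memory tier, kept in LRU order
        self.ssd = {}                 # spill tier (anti-caching)
        self.mem_capacity = mem_capacity
        self.backing_reads = 0

    def _admit(self, key, values):
        self.mem[key] = values
        self.mem.move_to_end(key)
        while len(self.mem) > self.mem_capacity:
            cold_key, cold_values = self.mem.popitem(last=False)
            self.ssd[cold_key] = cold_values  # spill instead of discarding

    def read(self, table, columns):
        result = {}
        for col in columns:  # only the requested columns are fetched and cached
            key = (table, col)
            if key in self.mem:
                self.mem.move_to_end(key)
                values = self.mem[key]
            elif key in self.ssd:
                values = self.ssd.pop(key)  # promote back to memory
                self._admit(key, values)
            else:
                self.backing_reads += 1
                values = self.backing[key]
                self._admit(key, values)
            result[col] = values
        return result

    def write(self, table, column, values):
        self.backing[(table, column)] = values  # write-through: durable store first
        self.ssd.pop((table, column), None)     # drop any stale spilled copy
        self._admit((table, column), values)


backing = {("clicks", "user"): ["a", "b"], ("clicks", "ts"): [1, 2], ("clicks", "url"): ["x", "y"]}
cache = ColumnCache(backing, mem_capacity=2)
cache.read("clicks", ["user", "ts"])  # two backing reads
cache.read("clicks", ["url"])         # third backing read; "user" spills to the SSD tier
cache.read("clicks", ["user"])        # served from the SSD tier, no backing read
print(cache.backing_reads)            # prints 3
```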
Shared Data Requires Shared Metadata, Security, and Governance
Shared Metadata Across All Workloads
Metadata considerations:
– Tabular data metastore
– Lineage and provenance metadata
– Pipeline and job management metadata
– Add upon ingest
– Update as processing modifies data
Access / tag-based policies and audit logs
– Centrally stored to facilitate use across apps
– E.g., backed by a cloud RDS (or a shared DB)
Diagram: Shared Metadata covering Streams, Pipelines, Feeds, Tables, Files, and Objects, with Policies based on Classification, Prohibition, Time, and Location.
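A minimal sketch of tag-based access policies with a central audit log, assuming policies are keyed by a classification tag (the hypothetical `PII` tag below) and may carry location and time-of-day conditions, echoing the Classification, Location, and Time labels in the diagram. `Asset`, `TagPolicy`, and `PolicyEngine` are invented names; the in-memory list stands in for a centrally stored log, such as one backed by a cloud RDS.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import Optional, Set, Tuple


@dataclass
class Asset:
    name: str
    tags: Set[str]


@dataclass
class TagPolicy:
    tag: str
    allowed_groups: Set[str]
    allowed_locations: Optional[Set[str]] = None  # None: any location
    hours: Optional[Tuple[time, time]] = None     # None: any time of day


@dataclass
class PolicyEngine:
    policies: dict                                 # tag -> TagPolicy
    audit_log: list = field(default_factory=list)  # stand-in for a central store

    def check(self, user: str, groups: Set[str], location: str, at: time, asset: Asset) -> bool:
        allowed = True
        for tag in asset.tags:
            policy = self.policies.get(tag)
            if policy is None:
                continue  # tags without a policy impose no restriction in this sketch
            if not groups & policy.allowed_groups:
                allowed = False
            if policy.allowed_locations is not None and location not in policy.allowed_locations:
                allowed = False
            if policy.hours is not None and not (policy.hours[0] <= at < policy.hours[1]):
                allowed = False
        # Every decision is audited, whichever app or engine asked.
        self.audit_log.append((user, asset.name, "ALLOW" if allowed else "DENY"))
        return allowed


engine = PolicyEngine({"PII": TagPolicy("PII", {"analysts"}, {"US"}, (time(8), time(18)))})
customers = Asset("customers", {"PII"})
print(engine.check("alice", {"analysts"}, "US", time(9), customers))  # prints True
print(engine.check("bob", {"marketing"}, "US", time(9), customers))   # prints False
```

Keying policies on tags rather than on individual tables or files means a newly ingested asset is governed as soon as it is tagged, which fits the "add upon ingest" point above.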
Elastic Resource Management in Context of Workload
Workload Management vs. Cluster Management
– Understand resource needs of different workload types
– Add / remove resources to meet workload SLAs
– Manage compute power and high-performance data access (e.g., LLAP)
– Pricing-aware: instances (spot, reserved), data, bandwidth
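The pricing-aware point can be illustrated with a small capacity planner: size the cluster so pending work finishes within the SLA, keep already-paid reserved nodes, and fill the gap with a capped share of cheaper spot instances. This Python sketch assumes each node completes one task-hour of work per hour, and all prices and the `plan_capacity` function are hypothetical; a real planner would also account for spot interruptions and for data and bandwidth costs.

```python
import math
from dataclasses import dataclass


@dataclass
class Prices:
    """Hypothetical $ per node-hour for each instance type."""
    reserved: float
    on_demand: float
    spot: float


def plan_capacity(pending_task_hours: float, sla_hours: float, reserved_nodes: int,
                  max_spot_fraction: float, prices: Prices) -> dict:
    """Size the cluster to meet the SLA, preferring paid-for and cheap capacity."""
    if sla_hours <= 0:
        raise ValueError("sla_hours must be positive")
    # Assumes one node completes one task-hour of work per hour.
    needed = math.ceil(pending_task_hours / sla_hours)
    extra = max(0, needed - reserved_nodes)       # reserved nodes are already paid for
    spot = math.floor(extra * max_spot_fraction)  # cap exposure to spot interruptions
    on_demand = extra - spot
    hourly_cost = (reserved_nodes * prices.reserved
                   + spot * prices.spot
                   + on_demand * prices.on_demand)
    return {
        "nodes": reserved_nodes + extra,
        "spot": spot,
        "on_demand": on_demand,
        "hourly_cost": round(hourly_cost, 2),
    }


prices = Prices(reserved=0.10, on_demand=0.50, spot=0.15)
print(plan_capacity(100, 4, reserved_nodes=10, max_spot_fraction=0.5, prices=prices))
```

For example, 100 task-hours due within 4 hours, with 10 reserved nodes and a 50% spot cap, yields 25 nodes: 10 reserved, 7 spot, and 8 on-demand.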
Ram Venkatesh
Senior Director of Engineering, Hortonworks
Demo of Cloud Tech Preview
Effectiveness of mobile ad spend (cross-device attribution)
Diagram: Clickstream ETL, BI & Reporting, and Data Science workloads; shared Data, Metadata, Security layer; Cloud Control Plane.
Vision: Connected Data Architecture Enables Enterprise Transformations
Diagram: Data in Motion and Data at Rest across the Cloud and the Data Center, with Edge Data and Edge Analytics, Stream Analytics, Machine Learning, and Deep Historical Analysis.
Recommended Sessions… Thursday
– Hadoop & Cloud Storage: Object Store Integration in Production
– LLAP: Sub-Second Analytical Queries in Hive
– Zeppelin + Livy: Bringing Multi-Tenancy to Interactive Data Analysis
CHECK OUT HORTONWORKS CLOUD TECH PREVIEW!
http://hortonworks.com/news-blogs/
Thank You