ctd taking advantage of cloud elasticity and flexibility...
TRANSCRIPT
© Cloudera, Inc. All rights reserved.
Fred Koopmans, Sr. Director of Product Management
Taking Advantage of Cloud Elasticity and Flexibility

Public cloud adoption is surging

Cloudera customers are leading the way

Hadoop was born for the cloud
• Speed
• Convenience
• Scale
• Self-Service
• TCO

But cloud comes with its own set of challenges
• Performance
• Bill Shock
• Application Portability
• Security
• Data Governance
• Data Sovereignty
• Hybrid Cloud
• Lock-in

A stepwise approach
• Lift and shift the platform
• Optimize each application individually
• Reconstruct an Enterprise Data Hub

Lift and shift the platform

Openness is even more important in the cloud
Open Environment
Run the same platform in different clouds or on bare metal, so customers can move as needed without migration or retraining
Open Ecosystem
An ecosystem of 450+ certified ISVs assures backward compatibility across releases, so customers can leverage their pre-existing investments
Open Source
Avoid vendor lock-in, and leverage components supported by the committers who drive the community roadmap

In on-prem environments, many applications typically share a single, multi-tenant cluster
[Diagram label: HDFS]

The cloud creates more and smaller clusters, specialized for each application
[Diagram labels: S3, Azure Data Lake, Google Storage*]

Where to store the data?
Object Storage generally best choice
• Performance often good enough
• Generally cheaper per TB than DAS
• Scales independently from compute
Not a drop-in replacement for HDFS
• Different data consistency models
• Different directory structure support
Not all Object Stores created equal
• Different access control models
• Different maturity levels
Not yet universally supported by CDH
• Mostly finished for S3
• Just getting started for ADLS
• Not yet started for GCS
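
One way to see why an object store is not a drop-in replacement for HDFS is the directory rename. The sketch below is a toy in-memory model, not the S3A connector: it assumes a flat key namespace in which a "directory" is only a shared key prefix, so renaming a directory means copying and then deleting every object under it. The class and method names are hypothetical.

```python
class ToyObjectStore:
    """Flat key -> bytes map; 'directories' exist only as key prefixes."""

    def __init__(self):
        self.objects = {}
        self.copy_ops = 0  # counts per-object copies, the expensive part

    def put(self, key, data):
        self.objects[key] = data

    def rename_prefix(self, src, dst):
        """Emulate a directory rename: copy each object, then delete the original."""
        moved = [k for k in self.objects if k.startswith(src)]
        for key in moved:
            self.objects[dst + key[len(src):]] = self.objects[key]
            self.copy_ops += 1
            del self.objects[key]
        return len(moved)


store = ToyObjectStore()
for i in range(3):
    store.put(f"tmp/job1/part-{i}", b"rows")

# Committing job output by renaming its directory touches every object.
moved = store.rename_prefix("tmp/job1/", "warehouse/sales/")
print(moved, store.copy_ops, sorted(store.objects))
```

In this model the work grows with the number of objects under the prefix, whereas a file system with real directories can rename one as a single metadata operation.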

Object Storage support is rapidly reaching maturity
Separation from HDFS
• S3A connector
• ADLS connector
Filling the gaps
• Performance
• Consistency
• Renames
Cloudera Functional Equivalence
• Security
• Governance
• Backup & Recovery
Cross-Cluster Sharing
• Permissions
• Catalogue
• Lineage
Support as of C5.11
Component       S3   ADLS
MapReduce       Y    Y
Hive            Y    Y
Hive on Spark   Y    -
Spark           Y    Y
HBase           -
Impala          Y    -
Hue             Y    -

How to provision and manage cloud infrastructure cost-effectively?
Provisioning requirements
• Spin clusters up & down quickly
• Grow & shrink clusters dynamically
• Select the right instance types for each service
• Leverage demand-based pricing whenever possible
Management requirements
• Fully automated and parallelized installation and configuration
• Manage all aspects of cluster security automatically
• Retain diagnostic and log information after the cluster is gone
• Support transient and long-lived clusters
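
The provisioning requirements above can be sketched as a small cluster description plus a resize rule. This is an illustration only and not any Cloudera API: the `ClusterSpec` fields, the `resize` function, the instance type strings, and the sizing constants are all hypothetical placeholders.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ClusterSpec:
    name: str
    master_type: str   # instance type chosen for master services
    worker_type: str   # instance type chosen for worker services
    workers: int
    use_spot: bool     # demand-based (spot) pricing for workers
    transient: bool    # torn down when the workload finishes


def resize(spec, pending_tasks, tasks_per_worker=10, min_workers=3, max_workers=50):
    """Return a copy of spec with the worker count sized to the backlog, within bounds."""
    wanted = -(-pending_tasks // tasks_per_worker)  # ceiling division
    return replace(spec, workers=max(min_workers, min(max_workers, wanted)))


# Placeholder values; pick instance types per service for a real workload.
etl = ClusterSpec("nightly-etl", "m4.xlarge", "c4.2xlarge",
                  workers=5, use_spot=True, transient=True)
grown = resize(etl, pending_tasks=95)
shrunk = resize(etl, pending_tasks=0)
print(grown.workers, shrunk.workers)
```

Keeping the spec immutable and returning a resized copy makes each grow or shrink step an explicit, loggable change, which fits the requirement to retain information after a transient cluster is gone.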

Cloudera Director automates cluster lifecycle management
Easy
• Single pane of glass for all cloud infrastructure
• Create templates to run applications in a pre-optimized manner
Flexible
• Multi-cloud: AWS, Azure, GCP
• Hourly pricing with auto billing & metering
• Spot instance/block support
Enterprise-grade
• Integration across Cloudera Enterprise
• Management of CDH deployments at scale
• Deeply integrated with Cloudera Manager

Cloudera Manager automates cluster operations
Easy administration
• Spot instance resiliency
• Automated security credential handling
Transient cluster operations
• Optimized cluster provisioning
• Automatic collection of diagnostics and logs
Long-lived cluster operations
• Downtime-less upgrade, patch, restart, and reconfiguration
• Monitoring, alerting, health checking, reporting, etc.
[Diagram label: Object Store]

Optimize each application independently

Really, four discrete applications on one unified platform
• Data Engineering: Modern data processing (ETL) at scale
• Analytic Database: Explore, analyze, and understand all your data
• Operational Database: Data-driven applications to deliver real-time insights
• Data Science: Exploratory data science and machine learning for the enterprise
Multi-Storage, Multi-Environment

Needs of each application can vary greatly
Data Science & Engineering
• Access Patterns: Batch; can be transient or persistent
• Performance Needs: Relatively insensitive to latency and data locality
• Security: Not required for many use cases
Operational Database
• Access Patterns: Real-time; typically persistent
• Performance Needs: Typically quite sensitive to latency and data locality
• Security: Fine-grained security often required
Analytic Database
• Access Patterns: Batch or interactive; can be transient or persistent
• Performance Needs: Relatively insensitive to latency and data locality
• Security: Fine-grained security often required

Data Science & Engineering in the cloud
Three architectural patterns to optimize price, convenience, performance
Transient Batch (most flexible)
Spin up clusters as needed
● On-demand/spot instances
● Usage-based pricing
● Sized for workload
● Cluster per tenant/user
Persistent Batch (most control)
Persistent cluster(s) for frequent ETL
● Reserved instances
● Node-based pricing
● Grow/shrink
● Cluster per tenant group
Persistent Batch on HDFS (fastest)
Top performance for frequent ETL
● Reserved instances
● Node-based pricing
● Grow/shrink
● Shared across tenant groups
[Diagram labels: Batch Cluster, Persistent Cluster, HDFS, Object Storage, Default Choice]
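
The three batch patterns can be read as a simple decision rule. The sketch below encodes the attributes listed on the slide in a dictionary and picks a pattern from two yes/no questions. The function name, the dictionary layout, and the rule itself are assumptions made for illustration; the slide describes the patterns but does not prescribe a selection procedure.

```python
# Attributes copied from the three patterns listed on the slide.
BATCH_PATTERNS = {
    "transient batch": {
        "instances": "on-demand/spot",
        "pricing": "usage-based",
        "tenancy": "cluster per tenant/user",
    },
    "persistent batch": {
        "instances": "reserved",
        "pricing": "node-based",
        "tenancy": "cluster per tenant group",
    },
    "persistent batch on HDFS": {
        "instances": "reserved",
        "pricing": "node-based",
        "tenancy": "shared across tenant groups",
    },
}


def choose_batch_pattern(frequent_etl, needs_top_performance):
    """Pick a pattern name; the rule is an assumed reading of the slide."""
    if not frequent_etl:
        # Occasional jobs: spin clusters up as needed and pay per use.
        return "transient batch"
    if needs_top_performance:
        # Frequent ETL where speed matters most: keep data on local HDFS.
        return "persistent batch on HDFS"
    return "persistent batch"


pattern = choose_batch_pattern(frequent_etl=True, needs_top_performance=False)
print(pattern, BATCH_PATTERNS[pattern])
```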

Analytic DB in the cloud
Presents new set of choices
BI/Analytics (New Insights, New Revenue)
Explore and analyze all data, wherever it lives
▪ Long-running clusters
▪ Object storage or local storage
▪ Lift-and-shift deployment
Only pay for what you need, when you need it
▪ Transient clusters
▪ Object storage centric
▪ Cloud-native deployment
ETL (Reduce Operating Costs)
Refer to Data Science & Engineering guidelines

BI/Analytics in the cloud
Three architectural patterns to optimize price, convenience, performance
Transient BI (infrequent usage)
Spin up clusters when needed
● On-demand instances
● Usage-based pricing
● Grow/shrink
● Cluster per tenant or user
Persistent BI (regular usage)
Persistent clusters for BI anytime
● Reserved instances
● Node-based pricing
● Grow/shrink
● Cluster per tenant group
Persistent BI with Local Storage (fastest)
Max speed for more regular workloads
● Reserved instances
● Node-based pricing
● Less frequent grow/shrink
● Shared cluster for shared local data
[Diagram labels: Transient Cluster, Persistent Cluster, HDFS and/or Kudu, Object Storage, Default Choice]
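
The choice between usage-based pricing (transient clusters) and node-based pricing (persistent clusters) comes down to how many hours per month the cluster is actually used. The sketch below works through that arithmetic. The hourly rates are made-up numbers, not real cloud prices, and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def usage_based_cost(nodes, hours_used, on_demand_rate):
    """Monthly cost when nodes are paid for only while the cluster runs."""
    return nodes * hours_used * on_demand_rate


def node_based_cost(nodes, reserved_rate):
    """Monthly cost when nodes are reserved and billed around the clock."""
    return nodes * HOURS_PER_MONTH * reserved_rate


def break_even_hours(on_demand_rate, reserved_rate):
    """Hours of use per month above which the reserved cluster is cheaper."""
    return HOURS_PER_MONTH * reserved_rate / on_demand_rate


# Hypothetical per-node hourly rates, for illustration only.
on_demand, reserved = 0.50, 0.30
threshold = break_even_hours(on_demand, reserved)
for hours in (100, 600):
    cheaper = "transient" if hours < threshold else "persistent"
    print(hours, cheaper)
```

The break-even point does not depend on the node count, since both costs scale linearly with it; only the ratio of the two rates matters.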

Operational DB in the cloud
Not as well suited for cloud, but targeted benefits are possible
Cost Goals
• Low-cost backup and disaster recovery
• Development and testing environments easy to deploy and decommission
Convenience Goals
• Elastic growth for tightly provisioned workloads makes expansion easy, and enables a lower-cost steady state
• Fast and easy provisioning of additional clusters helps projects move quickly

Reconstruct an Enterprise Data Hub

Many problems are a combination of SQL & predictive, batch & online
[Diagram: Traditional Architecture. Labels: Data Sources (Financial Ledger, P&L; Risks: Market, Counterparty, Ratings; Payments, Collections, Charges; Portfolio, Contracts; Unstructured), Applications, Operational Data Stores, Ingest, Process, Load, ETL, ELT, Storage #1, Storage #2, Enterprise Data Warehouse, Serve, Archive, BI System, Modeling, Reporting, HPC Grid]

Reimagining the Enterprise Data Hub in the cloud
Common: Operations, Governance, Security, Schema, Catalog
[Diagram labels: Object Store, Developer Workbench, SQL Workbench, Partner Ecosystem, Common Operations, Common Governance, Common Security]

Thank you
Fred Koopmans