Modernizing Business Intelligence and Analytics
Source: cdn.govexec.com/media/ctd_analytic_db_final.pdf
TRANSCRIPT
Modernizing Business Intelligence and Analytics
Justin Erickson, Senior Director, Product Management
© Cloudera, Inc. All rights reserved.
Agenda
• What benefits can I achieve from modernizing my analytic DB?
• When and how do I migrate from current systems?
• How does it work in the cloud?
Key Applications
• EDW Optimization: Use your EDW more efficiently by offloading workloads to Hadoop
• Data Preparation: Fast, flexible ETL over large data volumes, so data is always ready for your business
• Self-Service BI & Exploration: Fastest time-to-insights with a modern analytic database designed with Hadoop's flexibility and agility
Cloudera's Analytic Database
• Navigator Optimizer: Identify, offload, & optimize workloads to Hadoop
• Hue: Intelligent SQL editor
• Navigator: Audit, lineage, encryption, key management, & policy lifecycles
• BI Partners: Integration with the leading BI tools
• Impala: Interactive query engine for BI & SQL analytics
• Hive-on-Spark: Large-scale ETL & batch processing engine
• Kudu: Data Storage for Fast & Changing Data
• Multi-Storage, Multi-Environment
Key Benefits
An analytic database designed for Hadoop
• High-Performance BI and SQL Analytics
• Flexibility for Data and Use Case Variety
• Cost-effective Scale for Today and Tomorrow
• Go Beyond SQL with an Open Architecture
Analytic DB Anatomy
Built for self-service and hybrid cloud
Anatomy of an Analytic Database
Cloudera Decoupled by Design
Monolithic Analytic Database:
• Query Engine
• Storage Engine
• Catalog
Modern Analytic Database:
• Query Engine (Impala)
• Catalog (HMS)
• Storage (Kudu)
• Storage (S3)
• Storage (HDFS)
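The decoupling on this slide can be pictured with a small toy model. This is a minimal sketch and not Impala, HMS, or any Cloudera API: the class names `InMemoryStorage`, `Catalog`, and `QueryEngine`, the backend names, and the sample rows are all hypothetical. It only illustrates how one query engine can reach several storage backends through a shared catalog.

```python
from typing import Callable, Dict, Iterable, List, Protocol, Tuple


class Storage(Protocol):
    """Anything that can scan rows at a path (hypothetical interface)."""

    def scan(self, path: str) -> Iterable[dict]: ...


class InMemoryStorage:
    """Stand-in for a real backend such as HDFS, S3, or Kudu."""

    def __init__(self, data: Dict[str, List[dict]]) -> None:
        self._data = data

    def scan(self, path: str) -> Iterable[dict]:
        return iter(self._data[path])


class Catalog:
    """Maps table names to (backend name, path), in the role of a metastore."""

    def __init__(self) -> None:
        self._tables: Dict[str, Tuple[str, str]] = {}

    def register(self, table: str, backend: str, path: str) -> None:
        self._tables[table] = (backend, path)

    def locate(self, table: str) -> Tuple[str, str]:
        return self._tables[table]


class QueryEngine:
    """Resolves tables through the catalog, so storage can be swapped freely."""

    def __init__(self, catalog: Catalog, backends: Dict[str, Storage]) -> None:
        self.catalog = catalog
        self.backends = backends

    def select(self, table: str, predicate: Callable[[dict], bool]) -> List[dict]:
        backend, path = self.catalog.locate(table)
        return [row for row in self.backends[backend].scan(path) if predicate(row)]


# Two independent backends behind a single catalog and a single engine.
backends = {
    "object_store": InMemoryStorage(
        {"/sales": [{"id": 1, "amount": 10}, {"id": 2, "amount": 250}]}
    ),
    "local": InMemoryStorage({"/events": [{"id": 7, "kind": "click"}]}),
}
catalog = Catalog()
catalog.register("sales", "object_store", "/sales")
catalog.register("events", "local", "/events")
engine = QueryEngine(catalog, backends)
big_sales = engine.select("sales", lambda r: r["amount"] > 100)
```

Because the engine only depends on the catalog and the `scan` interface, moving a table to a different backend in this toy model means re-registering it, with no change to the engine.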
Pain Points
Traditional Monolithic Analytic Databases
Limited to SQL only
• Maintain data copies for non-SQL
Rigid Data Model
• Tightly coupled storage and compute
Static Sizing
• Major maintenance to add capacity/nodes
Poorly Designed for Cloud
• No elasticity or integration with object storage
[Graphic labels: COMPUTE, STORE]
Benefits of Cloudera's Modern Approach
Cloud-Native & On-Premise
Go Beyond SQL
• Open Architecture: Open formats and open storage
• Shared data across SQL and non-SQL workloads
Data Flexibility
• Faster, more agile data acquisition
• Data portability: Open formats and open storage
Cost-Effective Scalability
• Elastic scale on-prem or in the cloud
• Cloud-native pay-per-use and transience
• Proven at big data scale
Hybrid
• Runs across multi-cloud & on-prem
• Multi-storage over S3, HDFS, Kudu, Isilon, DSSD, etc.
[Graphic label: Shared Data]
EDW Optimization
Expand the Value of Your Data Warehousing Landscape
Motivations for Optimizing the EDW
• Cost containment for existing workloads
• Limited budget for expansion
• Unable to take on new workloads
• Unable to keep up with changing business needs
• Difficulty handling both fixed-SLA reports and self-service exploration
• Growing importance of self-service BI, advanced analytics, and cloud
Existing EDW Landscape
Diagram labels: Data Sources; ETL/Staging; EDW; Archive; Data Marts; Canned Reports; Dashboards/Analytic Applications; Non-SQL Workloads; Self-Service BI/Ad Hoc
Optimizing the EDW with Cloudera
• Cost-Effective Scale
  o Say yes to more without the risk
• Go Beyond SQL
  o Exploration, advanced analytics, and more all in one platform
• Modernize the Data Warehouse Landscape
  o Maximize the EDW while enabling iterative, self-service access/BI
  o Well-suited for on-prem, cloud, and hybrid deployments
Customer examples:
• 90% less per TB vs RDBMS and 75% less vs Netezza
• Augmented its Oracle EDW with a multi-tenant Cloudera system, with their BI tool configured to allow users to pull reports from both
• Media Research Firm: Saved tens of millions by offloading DBMS to Cloudera in the cloud
Modern Data Warehouse Environment
Diagram labels: Data Sources; EDW; Analytic Database; Operational Database; Data Science & Engineering; Shared Data Layer; Modern Data Platform; Fixed Reports; Dashboards/Analytic Applications; Non-SQL Workloads; Self-Service BI/Ad Hoc; Flexible Reporting
Navigator Optimizer
Built to help you through the optimization process
Phases:
• Evaluate: Evaluate the need for offload
• Plan: Impact analysis, prioritized plan
• Offload: Offload each workload
• Optimize: Optimize performance
Diagram labels: Estimate Effort; Risk Analysis; Schema Design; Fine Tuning Data Model on Hadoop; Optimize Queries for Performance; Test & Validate; Identify Use Cases; Impact Analysis; Objectives; Prioritized Plan; Validate ROI, Cost; Initial POC; Workload Visibility; Offload Actions
Workload Visibility
Get insights into what's happening today
Phase: Evaluate
Evaluate Queries
• Top queries
• Query duplication
• Query complexity
• Common access patterns
Evaluate Data Access
• Top tables, top columns
• Usage-based ER diagram
• All tables/columns in use
Evaluate POC
• Identify initial workload piece for PoC
• Get partitioning key suggestions
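To make "top queries", "query duplication", and "top tables" concrete, here is a minimal sketch of that kind of analysis over a plain list of SQL strings. It is not how Navigator Optimizer works internally. The function names, the regex-based normalization, and the sample log are all assumptions made for illustration, and the table extraction is deliberately rough (it ignores subqueries and quoted identifiers).

```python
import re
from collections import Counter


def normalize(sql: str) -> str:
    """Reduce a query to a rough 'shape' so near-duplicates compare equal."""
    s = sql.strip().rstrip(";").lower()
    s = re.sub(r"'[^']*'", "?", s)           # string literals -> ?
    s = re.sub(r"\b\d+(\.\d+)?\b", "?", s)   # numeric literals -> ?
    s = re.sub(r"\s+", " ", s)               # collapse whitespace
    return s


def top_queries(log, n=3):
    """Return the n most frequent query shapes with their counts."""
    return Counter(normalize(q) for q in log).most_common(n)


def tables_used(log):
    """Count identifiers that directly follow FROM or JOIN (rough heuristic)."""
    counts = Counter()
    for q in log:
        counts.update(
            re.findall(r"\b(?:from|join)\s+([a-z_][\w.]*)", normalize(q))
        )
    return counts


# Hypothetical query log; the first two entries differ only in literals and spacing.
sample_log = [
    "SELECT * FROM sales WHERE region = 'EMEA'",
    "select *  from sales where region = 'APAC';",
    "SELECT id FROM customers WHERE id = 42",
    "SELECT s.id FROM sales s JOIN customers c ON s.cid = c.id",
]

most_common_shape = top_queries(sample_log, 1)
table_counts = tables_used(sample_log)
```

Here the two `sales` filters collapse into one shape with a count of 2, which is the sort of duplication signal the slide describes.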
Impact Analysis & Prioritized Plan
Understand what it takes to offload
Phase: Plan
Impact Analysis
• Focus efforts by identifying duplication
• Workload risk assessment based on complexity and best practices
• Understand query compatibility
Prioritized Plan
• Estimate effort
• Identify easiest pieces to start for fast success
• Prioritize workloads for offload
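One way to picture "identify easiest pieces to start" is a simple ranking of candidate workloads by a risk score. The sketch below is illustrative only. The `Workload` fields, the 50/50 weighting of incompatibility and complexity, and the sample numbers are assumptions, not figures or logic from the deck or from Navigator Optimizer.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    query_count: int    # number of queries in the workload
    incompatible: int   # queries assumed to need rewriting on the target engine
    complexity: float   # 0.0 (simple) to 1.0 (complex), assumed scored upstream


def risk(w: Workload) -> float:
    """Blend the share of incompatible queries with complexity (assumed weights)."""
    share = w.incompatible / w.query_count if w.query_count else 1.0
    return 0.5 * share + 0.5 * w.complexity


def prioritize(workloads):
    """Lowest risk first; ties broken in favor of the larger workload."""
    return sorted(workloads, key=lambda w: (risk(w), -w.query_count))


# Hypothetical candidates for offload.
candidates = [
    Workload("finance_reports", 200, 60, 0.8),
    Workload("web_logs_etl", 500, 10, 0.2),
    Workload("adhoc_marketing", 120, 12, 0.4),
]
plan = [w.name for w in prioritize(candidates)]
```

With these made-up inputs the low-risk ETL workload sorts first and the complex finance reports last, matching the slide's advice to start with the easiest pieces.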
Predictable Offload
Remove the guesswork
Phase: Offload
Understand offload requirements
• Determine most common workload patterns
• Develop data-/usage-driven offload strategy
Actionable recommendations
• Complexity assessment for riskier areas
• Focus efforts by identifying duplication
• Design recommendations for best results
Optimizing within Hadoop
Maintain peak performance
Phase: Optimize
Understand usage and keep up with data needs
• Understand most common usage patterns
• Identify optimization opportunities
• Proactively adjust data models
Performance optimizations
• Best practice guidance for Hive and Impala
• Query performance optimization
• Increase platform adoption
Built for hybrid cloud
What's Driving Analytics to the Cloud?
Big data deployments in cloud are accelerating:
● Executive Mandate: Minimize on-prem data center footprint
● Increased Agility: End-user self-service
● Elasticity: Optimize infrastructure usage
● Lower Overall TCO
Most Organizations Are or Will Be Hybrid Cloud
• 76% will embrace hybrid cloud (Gartner [1])
• 82% will have a multi-cloud strategy (RightScale [2])
• 50% will "repatriate" at least one public cloud workload back to private cloud or on-prem for cost reasons (451 Research [3])
• 50% of Cloudera's cloud customers run a hybrid environment
Why is this a critical strategy?
• Portability & Cost
• Functionality
• Data Gravity
[1] Gartner, Market Trends: Cloud Adoption Trends Favor Public Cloud With a Hybrid Twist, 2015
[2] RightScale 2016 State of the Cloud Report
[3] 451 Research: AWS Lambda: new and exciting, old and rehashed, more vendor lock-in (or all the above)?, November 22, 2016
Cost-Efficiencies & Flexibility in the Cloud
Primary Analytic Database Patterns
ETL: Reduce Operating Costs
Only pay for what you need, when you need it
▪ Transient clusters
▪ Object storage centric
▪ Cloud-native deployment
BI/Analytics: New Insights, New Revenue
Explore and analyze all data, wherever it lives
▪ Long-running clusters
▪ Object storage or local storage
▪ Lift-and-shift deployment
Additive Benefits in the Cloud
Extending core performance, flexibility, scalability, and open architecture benefits
Add Use Cases, Analytics, and Data On-Demand
• Avoid the IT backlog with instant access to all data
• On-demand clusters query directly on shared object storage
Predictable Results Whenever You Want
• Consistent query performance, even during peak times
• Multi-tenancy via isolated clusters on shared data
Just-in-Time Resources
• Real-time capacity for your needs, as they change
• Elastically grow/shrink your cluster via decoupled architecture
Contention-Free ETL
• ETL anytime without impacting other workloads or risking SLAs
• Separate ETL clusters as-needed on shared data
BI/Analytics in the Cloud
Three Architecture Options to Optimize Price/Performance
Transient BI (infrequent usage): Spin up clusters when needed
● On-demand instances
● Usage-based pricing
● Grow/shrink
● Cluster per tenant or user
Persistent BI (regular usage), Default Choice: Persistent clusters for BI anytime
● Reserved instances
● Node-based pricing
● Grow/shrink
● Cluster per tenant group
Persistent BI with Local Storage (fastest): Max speed for more regular workloads
● Reserved instances
● Node-based pricing
● Less frequent grow/shrink
● Shared cluster for shared local data
Diagram labels: Object Storage; Transient Cluster; Persistent Cluster; HDFS and/or Kudu
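The three options can be read as a rough decision rule. The sketch below encodes one such reading, drawing on the "best when" criteria listed on the per-option slides that follow. The function name, its parameters, and the order in which the conditions are checked are assumptions for illustration, not guidance from Cloudera.

```python
from enum import Enum


class Architecture(Enum):
    TRANSIENT_BI = "Transient BI on object storage"
    PERSISTENT_BI = "Persistent BI on object storage"
    PERSISTENT_BI_LOCAL = "Persistent BI with local storage"


def choose_architecture(
    infrequent_or_scheduled: bool,
    tight_sla: bool = False,
    fast_changing_data: bool = False,
    object_storage_available: bool = True,
) -> Architecture:
    """Pick an option from coarse workload traits (assumed rule ordering)."""
    # Local storage: tight SLAs, fast-changing data (Kudu), or no object storage.
    if not object_storage_available or fast_changing_data or tight_sla:
        return Architecture.PERSISTENT_BI_LOCAL
    # Transient clusters: infrequent or scheduled usage, lowest TCO.
    if infrequent_or_scheduled:
        return Architecture.TRANSIENT_BI
    # Otherwise the deck's default: persistent clusters on object storage.
    return Architecture.PERSISTENT_BI


default_pick = choose_architecture(infrequent_or_scheduled=False)
nightly_pick = choose_architecture(infrequent_or_scheduled=True)
```

A real decision would also weigh cost during off-peak hours and how much tenant isolation is needed, which this sketch leaves out.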
Persistent BI on Object Storage
Best for elasticity (and speed vs transient)
● This is usually the best choice
● Best when workloads are:
  o Flexible and changing
  o Frequent during most working days
  o Not scheduled for fixed hours
● Benefits include:
  o Predictable results readily available
  o Full multi-tenant isolation
  o Common data in shared object storage
  o Grow/shrink for TCO efficiency
● Tradeoffs:
  o Per-node perf of object storage (use more, cheaper nodes)
Persistent BI (regular usage): Persistent clusters for ready availability
● Reserved instances
● Node-based pricing
● Grow/shrink
● Cluster per tenant group
Diagram labels: Object Storage; Shared HMS DB; Persistent Cluster; Default Choice
Persistent BI with Locally-Attached Storage
Best performance for consistent workloads
● Best when workloads are:
  o Regular and consistent
  o Consistently querying common data
  o Tight SLAs for performance
  o Fast changing data (that needs Kudu)
  o Running without object storage (e.g., Azure, GCE)
● Benefits include:
  o Faster performance per node on local data
  o Ability to query object storage for rest of data
● Tradeoffs:
  o Less elastic than object-store-based clusters
  o Less isolation for multi-tenant workloads using same HDFS data
  o Cost if there are off-peak hours
Persistent BI with HDFS (fastest): Max speed for more regular workloads
● Reserved instances
● Node-based pricing
● Less frequent grow/shrink
● Shared cluster for shared HDFS data
Diagram labels: Object Storage; Persistent Cluster; Local HMS DB; HDFS and/or Kudu
Transient BI on Object Storage
Best TCO for infrequent usage
● Best when workloads are:
  o Infrequent or scheduled
● Benefits include:
  o Lowest TCO with clusters only when needed
  o Full multi-tenant isolation
  o Common data in shared object storage
● Tradeoffs:
  o Delay to spin up clusters when needed
  o Capability of BI users to spin up clusters
  o Per-node perf of object storage (use more, cheaper nodes)
Transient BI (infrequent usage): Spin up clusters when needed
● On-demand instances
● Usage-based pricing
● Grow/shrink
● Cluster per tenant or user
Diagram labels: Object Storage; Cloudera Director; Shared HMS DB; Transient Cluster
Thank You
Justin Erickson