BIG DATA != HADOOP
Moving the Cheese for the Industry
Agenda
BTW2017- ©dibuco GmbH 2
1 Observation about Hadoop adoption in industry
2 Big Data has moved on
3 Business case on finding the cheese
4 Conclusions
Andreas Tönne
• CTO of dibuco
• Special interests in
  • Cloud and Big Data architecture
  • Business requirements for the new world
  • Programming language design
Observation: View of Big Data
1st of Its Kind
Massive Batch Processing
Impressive! Must Have!
Highly Scalable HDFS Storage
Big Data Adoption = Hadoop Adoption
Observation: Hadoop is the Golden Hammer
Companies that have invested in Hadoop
• look for Hadoop solutions for every Big Data problem
• think it is obvious to store Big Data in HDFS
• think it is obvious to use MapReduce
Find Hadoop!
What Makes the Industry Cling onto Hadoop?
"Hadoop adoption is driven primarily by technology executives, especially those in the C-suite, including the CFO or COO." (Gartner)
Large investment! (Often) Small return
• Time and personnel – Missing elsewhere
• Skills – Requirements and algorithms for MapReduce?
• IT operations – How to operate this alien?
Who dares to sack this investment driven by the C**?
Still Valid – Big Data = Hadoop?
Data: Gartner Hadoop Adoption Study 2015
54% No invest
18% Invest in 2 years
26% Have
Hadoop skills
57% Lack of skills
49% Looking for value in Hadoop
70% Hadoop 1-20 users
4% Hadoop zero users
28% Hadoop single user
Simpler programming model?
Simpler management and backup?
Approachable UI for analysis?
There are Good Reasons to Say – No!
“One of the core value propositions of Hadoop is that it is a lower cost option to traditional information infrastructure,” Heudecker (Gartner) ...
“However, the low numbers of users relative to the cost of cluster hardware, as well as any software support costs, may mean Hadoop is failing to live up to this promise.”
That was 2015 – What about today?
Big Data Evolved – Where Is Your Cheese Now?
Gartner Hype Cycle 2016 – What goes missing? Big Data!
• Understanding of Big Data value matured
Hadoop
Spark, Flink, Storm, Hana
Digging through huge batches of data (data lake)
Streaming (IoT), machine learning (big in 2017), natural language (Watson), ...
Big Data Evolved – Batch is only One Use Case
MapReduce is an excellent fit for embarrassingly parallel data processing
• ... real life is not embarrassingly parallel
• It's a stream of events, linked by history
Time is a critical factor for Big Data value gain
• Hadoop has a huge latency
• Batches are a historic extract of the real-life stream of data
NB: HDFS storage can still be an excellent choice
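The difference between the two processing styles can be sketched in a few lines of Python. This is a minimal illustration and not code from the talk: the word-count task, the sample data, and the names `map_count`, `batch_word_count`, and `DeltaTracker` are all assumptions made for the example. The batch part can be split across workers in any order, while the streaming part produces results that depend on what arrived before.

```python
from collections import Counter
from functools import reduce

# Hypothetical sample data, chosen only for this illustration.
DOCUMENTS = ["big data is not hadoop", "hadoop is batch", "data moves on"]


def map_count(doc):
    """Map step: count the words of one document, independent of all others."""
    return Counter(doc.split())


def batch_word_count(docs):
    """Embarrassingly parallel: the map steps may run anywhere, in any order;
    the reduce step only adds up the partial counts."""
    return reduce(lambda a, b: a + b, map(map_count, docs), Counter())


class DeltaTracker:
    """Streaming: the result for an event depends on the previous event
    with the same key, so events are linked by history."""

    def __init__(self):
        self.last_value = {}  # most recent value seen per key

    def on_event(self, key, value):
        previous = self.last_value.get(key)
        self.last_value[key] = value
        # The first event for a key has no history to compare against.
        return None if previous is None else value - previous


def run_stream(events):
    """Feed (key, value) events to a fresh tracker in arrival order."""
    tracker = DeltaTracker()
    return [tracker.on_event(key, value) for key, value in events]
```

Reordering `DOCUMENTS` leaves the batch result unchanged, whereas reordering the events passed to `run_stream` changes its output. That order dependence is what makes the streaming case hard to express as independent batch jobs.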
Decision Process For Digital Transformation
Digital Disruptors
New Business Goals
Big Data Strategy Requirements
Technology Choice
Solution
Start here!
Not here: wrong driver of goals
Business Case
Business Case – Getting Away From Hadoop
• Best possible linkage of data by textual contents
• Fast availability of new data
• Dealing with "language" changes
DB
Web
File
Events
Live Updates
Business Case – Upfront Technology Decisions
• Hadoop for everything global
  • Maintaining identity constraints
  • Keeping global language statistics current
  • Mass import and maintenance
• Single NoSQL DB choice (Titan)
  • Graph DB matched logical data model perfectly
• Microservice/queuing architecture for the rest
The Technology Started Biting our Goals
• Business value harmed by hard scaling problems
  • Single data swamp storage
  • Storage model inefficient
  • Oops, we built a monolith!
• Problems solved by Hadoop batch invasion
  • Time was running out. Literally!
It was time to ...
Concurrency/Distribution
Accuracy of linkage
Consistency requirements
Scaling
Think Again!
Rethinking the Solution
Data                  Split by service and usage needs
Data                  Consistency requirements reasonable?
Data                  Distribution of creation and usage?
Requirements          Balance of requirements and scalability
Requirements          Find scalable algorithmic solution or bin requirement
Business Goals        What are the risks?
Legal Considerations  What is allowed and accepted?
Data Governance       E.g. data ownership, IAM system vs. scalability
💡 Insights 💡
• We have a streaming situation
• Time is of high importance
• Ideal consistency requirements can be reduced
• New algorithms allow us to reduce the data model storage
Cost of storage
Throughput
Response time
Scalability
Solution
• True microservice architecture with polyglot persistence
  • The best technology and model for each service
• Modular streaming architecture
  • Multiple streaming topologies cut by service
• New streaming and big-data-optimized analysis algorithms
  • A lot of ad-hoc computation instead of global, aging information
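As a rough picture of "topologies cut by service" combined with polyglot persistence, the following Python sketch routes each event type to the one service that owns it, and each service keeps its data in the storage model that suits it. Everything here is hypothetical: the talk names neither the services nor the stores used in the final solution, so `DocumentService`, `LinkService`, `Router`, and the event fields are invented for illustration, and in-memory structures stand in for real databases.

```python
from collections import defaultdict


class DocumentService:
    """Owns a key-value style store (a dict stands in for a real database)."""

    def __init__(self):
        self.store = {}

    def handle(self, event):
        self.store[event["id"]] = event["text"]


class LinkService:
    """Owns a graph-style store (adjacency sets stand in for a graph database)."""

    def __init__(self):
        self.edges = defaultdict(set)

    def handle(self, event):
        # Links are stored in both directions so either end can be queried.
        self.edges[event["source"]].add(event["target"])
        self.edges[event["target"]].add(event["source"])


class Router:
    """Sends each event type to the single service topology that owns it."""

    def __init__(self):
        self.routes = {}

    def register(self, event_type, handler):
        self.routes[event_type] = handler

    def dispatch(self, event):
        self.routes[event["type"]](event)


# Hypothetical wiring: two independent services, each with its own store.
documents = DocumentService()
links = LinkService()
router = Router()
router.register("document", documents.handle)
router.register("link", links.handle)

router.dispatch({"type": "document", "id": "d1", "text": "who moved my cheese"})
router.dispatch({"type": "link", "source": "d1", "target": "d2"})
```

Because no service reads another service's store, each one can be scaled, replaced, or moved to a different storage technology without touching the others.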
Outcome
• Better results by ad-hoc computation 👍
• Incremental maintenance of aging linguistic information 👍
• Massive reduction of storage requirements (est. up to 70-80%) 👍
• True horizontal scalability by service 👍
• Complete removal of Hadoop batches 👍
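The talk does not describe the algorithms behind "incremental maintenance of aging linguistic information". To make the idea concrete, here is one possible sketch in Python that uses exponential decay, a technique substituted purely for illustration. The class `DecayingTermStats`, its `half_life` parameter, and the whitespace tokenization are all assumptions. Each new document updates only the terms it contains, so no global batch recomputation is needed, and older observations lose weight as time passes.

```python
import math


class DecayingTermStats:
    """Term weights that age over time and are updated one document at a time.

    Illustrative assumption: exponential decay with a configurable half-life.
    """

    def __init__(self, half_life):
        self.decay_rate = math.log(2) / half_life
        self.weights = {}     # weight of each term as of its last update
        self.updated_at = {}  # time of the last update per term

    def _aged(self, term, now):
        """Return the stored weight of a term, decayed to time `now`."""
        last = self.updated_at.get(term)
        if last is None:
            return 0.0
        return self.weights[term] * math.exp(-self.decay_rate * (now - last))

    def add_document(self, text, now):
        """Incremental update: only the terms of this document are touched."""
        for term in text.lower().split():
            self.weights[term] = self._aged(term, now) + 1.0
            self.updated_at[term] = now

    def weight(self, term, now):
        """Ad-hoc query: compute the current weight on demand."""
        return self._aged(term, now)


# Hypothetical usage with an arbitrary half-life of 10 time units.
stats = DecayingTermStats(half_life=10.0)
stats.add_document("cheese moved", now=0.0)
stats.add_document("cheese", now=10.0)
```

Only the last weight and update time are stored per term. The current value is computed when it is asked for, which matches the slide's preference for ad-hoc computation over stored global information.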
Conclusions
• Hadoop is seen as the role model of Big Data (our observation)
• Investment in Hadoop is one of the reasons to stick with Hadoop
• We observe that problems are crafted to be solvable by Hadoop
• Expectations for Big Data evolved beyond batch processing
• Allow rethinking of the business goals and solution requirements without technology in mind
THANK YOU FOR YOUR ATTENTION!
Franz-SchubertStraße [email protected]
Sources
• Cheese theme: "Who Moved My Cheese?: An Amazing Way to Deal with Change in Your Work and in Your Life" (Spencer Johnson), G.P. Putnam's Sons; 1st edition (September 8, 1998)
• Cheese picture: Wikimedia Commons (Christian Bauer)
• Big Data Landscape 2016: (C) Matt Turck, Jim Hao, FirstMark Capital
• Hammer: Malene Thyssen, http://commons.wikimedia.org/wiki/User:Malene
• Pocket watch: Wikimedia Commons (no user listed)