7/27/2019 Beyond Traditional Data Warehouse
Analytic Platforms:
Beyond the Traditional Data Warehouse
By Merv Adrian and Colin White
BeyeNETWORK Custom Research Report
Prepared for Vertica
Copyright 2010 TechTarget, BI Research, IT Market Strategy
Executive Summary
The once staid and settled database market has been disrupted by an upwelling of new entrants
targeting use cases that have nothing to do with transaction processing. Focused on making more
sophisticated, real-time business analysis available to more simultaneous users on larger, richer sets of
data, these analytic database management system (ADBMS) players have sought to upend the notion
and successfully introduced the analytic platform and proven its value.
newcomers successfully placed an additional thousand instances by the end of the decade, making it
from incumbent classic data warehouse relational database management system products.
Analytic platforms provide two key functions: they manage stored data and execute analytic programs against it. We describe them as follows:
An analytic platform is an integrated and complete solution for managing data and generating
business analytics from that data, which offers price/performance and time to value superior to
non-specialized offerings. This solution may be delivered as an appliance (software-only, packaged
hardware and software, virtual image), and/or in a cloud-based software-as-a-service (SaaS)
form.
platform to be the tools they use to perform the analysis. This may be a legacy of client-server days,
when analysis was performed outside the database on rich client software on desktops. But the
increasing requirement for the ADBMS to power the analysis is upending this thinking, and most agreed
with our description. We found:
The pace of adoption is strong and accelerating
of use cases, worldwide, in many industries.
The promises being made are being met. Adopters of analytic platforms report that they
The right selection process is essential.
numbers of users likely to be on the system. And real tests separate winners from losers: often, some
candidates can't get it done at all.
Introduction
We conducted an online survey of several hundred professionals worldwide, who shared their
experiences and opinions with us. Survey results are shown at the end of this report; we include some
Figure 1: Are You Using or Planning to Use an Analytic Platform?
We also conducted interviews with 8 analytic platform vendors, all of whom are targeting this market,
and with a nominated customer from each. The interviewees are quite different from the overall
(DBMS) products for analytic platforms in proportions that mirrored overall market shares, our
interviewees come from the leading edge of the disruptive analytic platform phenomenon. They work
many applications, including some of the business analytics being targeted by the vendors of analytic
platforms, but have opted to use specialty platforms for a variety of reasons.
What we learned was profound; businesses, more and more driven by their need for analytic processing
a few years, generating billions of dollars in revenue, herald the arrival of the analytic platform as a
category to be watched closely. It solves important problems, and customers are deriving enormous
value from it, creating new classes of business applications and driving top-line growth.
from the issues that led you to add an analytic platform data: the need for complex analyses, query
performance, and on-demand capacity topped the list. These issues are mirrored in the case study
interviewees.
This report examines the analytic platform, the business needs it meets, the technologies that drive it,
and the uses analytic platforms are being put to. It concludes with some guidance on making the right
choices and getting started with the products of choice.
The Business Case for Analytic Platforms
What is an Analytic Platform?
sophisticated demands for business analytics. It combines the tools for creating analyses with an engine
to execute them, a DBMS to keep and manage them for ongoing use, and mechanisms for acquiring
and preparing data that is not already stored. In this report, we focus on the DBMS component of the
platform. As noted below, separate providers also offer data sourcing and integration and tools for
analytics surrounding the DBMS; these will interact with the DBMS itself and often depend on it for
execution.
Why Do We Need Analytic Platforms?
A brief history demonstrates how we got here over several decades. The earliest computing used a
simple paradigm for simple analytic processing: business data created by transaction, manufacturing,
management reports about the state of the business. But routine, multiple, simultaneous use of the
things to a two-or-more-tier model, in which the analytic processing was done on data extracted to
a different platform, supporting one or many users working with local copies of the data that might themselves be saved or might go away when the session was done. But this created uncoordinated,
policy, and currency could be centrally managed. Diverse data sources were harvested and data was
authority in business units who desired autonomy. In front of these systems, data extraction and
transformation products managed feeding the data in; behind them, analytic tools for ad hoc
But the DBMS product in the middle of all this was usually the same one in use for everything else.1
Figure 2: Components of an Analytic Platform
benchmark made it clear that DBMSs were coming up short; new approaches were needed, and new
vendors emerged to meet them, creating new products that succeeded where the incumbents could not.
The forces driving the need for change are largely the same and drove the design of the newcomers who
Data Growth and New Types of Data
The largest data warehouses are now measured in petabytes. Terabytes are not at all unusual, and it's
terabytes was very or somewhat important in their planning or acquisition.
everyday toolkit for business analysts were not designed for these new forms of information and often
are not well-equipped to work with them.
Figure 3: What Data Sources are Used in Your Analytic Platforms?
Analytic platforms are designed to manage large data volumes, sophisticated analytics, and newer data
data for better compression. Some use smart storage to do some of the work at the storage layer to
free up the processor for the heavy analytic lifting. They lash together many commodity processors,
with larger memory spaces. They connect processors with one another and with data storage across
faster networks to scale processing and storage in sync. They are designed to handle new types of data,
always easy to update to leverage these new opportunities.
Advanced Analytics
Simple reporting, spreadsheets, and even fairly sophisticated drill-down analysis have become
commonplace expectations and are not considered advanced. While the term is frequently debated,
it's clear that even simple analysis is advanced when it needs to be performed on a massive scale. Even
example, is a performance challenge for many systems when run against today's extraordinary volumes of data while other activities run on the same system.
But increasingly, the nature of the analysis itself is more advanced. Sophisticated statistical work is
becoming commonplace for market basket analysis in retail, behavioral analysis in clickstreams for
websites, or risk analysis for trading desks. Building predictive models and running them against real-
variable shapes such as sales territories or watersheds that are not easily computed. Such ambitions
were using hand-coded programs as opposed to packaged tools.
those sources.
Scalability and Performance
of business intelligence (BI) thinkers and planners to involve more users in the corporate analysis
of performance. In the client-server era this was often handled by putting tools on their desktops
and moving data to them, creating coordination problems as computational models were duplicated.
Analytic platforms are designed to leverage higher bandwidth connections across a fabric of processors.
processes running in massively parallel scale-out architectures with more processors. They use
inexpensive hardware that can be added without taking systems down, so as demands scale, so can
that permit the elastic setup and teardown of sandboxes where new analyses and ideas can be tested.
Cost and Ease of Operation
As data volumes, analytic complexity, and the numbers of users all grow, so does cost. Even
users, security management, and the need to manage environments that cannot be taken down for
maintenance all create their own demands and costs.
device drivers, and operating systems. Each piece of a complex stack of software is frequently updated
expensive, and budget is consumed merely keeping the lights on.
Analytic platforms offer multiple deployment options that can reduce many of these costs. As they
generally move to commodity hardware, some of the pricing premium in older proprietary systems
smoother and more granular. It is simpler to add blades with processor, memory, and storage that snap
costs.
cost. They are increasingly maintained and updated by their suppliers in a way that is designed to
ensure that changes don't break things.
Finally, moving the analytic platform off premises in one fashion or another provides the maximum
reduction in cost of ownership and operation. Several vendors will host the system and the data as a
dedicated facility. Some will make it available in the cloud in a multi-tenant fashion, where tools are
shared but data is stored and managed for individual customers. They may take over the process of
importing the data from its source systems, such as retail or online gaming systems, and provide the
data integration as well as the storage and analytics.
An analytic platform is an integrated and complete solution for managing data and generating
business analytics from that data, which offers price/performance and time to value superior to
non-specialized offerings. This solution may be delivered as an appliance (software-only, packaged
hardware and software, virtual image), and/or in a cloud-based SaaS form.
In this report, we consider DBMS offerings that form the heart of the analytic platform.
Types of Analytic Platforms
alternative languages and data models.
There are numerous features and functions that differentiate ADBMSs from one another, but for the
Use of proprietary hardware
Hardware sharing model for processing and data: Increasingly, ADBMS vendors support
storage in a shared-nothing environment or may be connected to shared storage such as a storage
area network (SAN).
Storage format and smart data management: Many ADBMSs are using columnar storage,
support both row and column format in one hybrid form or another. Some also add intelligence at
the storage layer to pre-process some retrieval operations. All use a variety of encoding,
compression and distribution strategies.
SQL support
preventing some queries from running adequately or at all.
NoSQL too.
Programming extensibility: ADBMS engines offer varying degrees of support for the instal-
functions themselves and with partners, and some of these take advantage of system parallelism
for performance improvement.
Deployment models: ADBMSs may be delivered as an appliance: a complete package of
hardware and software; software-only products may be deployed on premises on commodity hard-
Hardware Directions
Things are changing fast. Several key elements of the hardware mix are undergoing enormous change,
with profound implications for system design and its impact on analytic performance.
Memory is the new disk; disk is the new tape
More cores, more threads, yield more processing power. The addition of more cores (and
processing threads) to chips has similar implications. As software smart enough to break up and
The speed of interconnects can be an
enormous bottleneck for system performance. Moving data around inside large systems or from one
Message from the Market: It's Time
Markets change rapidly, but the effects are often not felt for years. The value of already installed
software in most categories is several orders of magnitude larger than the spending on it in any given
year or two. Maintenance and support costs for installed software dwarf new spending. But at the
added another thousand or so. The several hundred million dollars spent with these newcomers
But in context, these numbers are hardly a blip on the radar. There are hundreds of thousands of DBMSs
installed; so-called data warehouse DBMS sales are estimated at $7 billion per year. The ADBMS is
Leaving aside Teradata and Sybase, ADBMS vendors collectively generate a few hundred million dollars
respondents told us that they typically begin their search for a platform with their incumbent DBMS
vendor.
We learned in our interviews that those adopting analytic platforms are agents of change. They are
creating new value, new business opportunities, and new customer opportunities. From a competitive
your problem, you can start fast. And get value fast. At lower cost than you may have thought possible.
Techniques and Technologies
In this section, we review some of the key techniques and technologies offered by analytic platforms,
and offer some suggestions about things to consider when evaluating these solutions.
ADBMS versus a General Purpose RDBMS
An analytic platform consists of three main software components: the data integration software
for transforming and loading source data into the platform's database, the database management
deliver analytics to users. In a traditional data warehousing environment, these three components are
purchased separately and integrated by the customer. A key difference with an analytic platform is that
the vendor does the integration and delivers a single package to the customer.
and analytic IT applications. Given the trend by many companies toward extreme processing at both
for a general purpose or classic
The broadening application processing spectrum is leading to vendors developing database
management software that focuses on a narrower subset of that spectrum. In this report, products that
target analytic processing are described as ADBMSs.
workload varies. The challenge in selecting an ADBMS is to match the workload to the product. This
is especially true in the case of extreme processing and also in business environments with constantly
In our survey and customer interviews we asked people about key technology requirements for
rated as very important
Figure 4: What Features are Important for Your Analytic Platform?
in-database processing was
exploiting the power of a parallel database engine to run certain performance critical components of a
data integration or analytic application.
Scores that were lower than expected were: support for open source data integration and BI tools
own hardware.
extreme processing
coupled with easy scaling were the main product selection criteria. Most of these customers also
required high availability.
ADBMS Application Development Considerations
separating the user, or logical, view of data from the way it is physically stored and managed. An
independence remains largely unique to relational technology.
From a development perspective, the factors to consider when selecting an analytic platform and its
concern to applications developers than the users of interactive analytic tools. For these latter users,
As already noted, most of the customers interviewed for this report were using extreme processing,
very important selection criterion for these customers. Several customers commented that some of the
course is contrary to one of the main tenets of the relational model.
push analytic functions and processing into
the ADBMS will usually boost performance and make complex analyses possible for users who have
the expertise to use such functions, but not the skills to program them. For many of the customers we interviewed, in-database processing was an important feature when choosing a product. The use of
such processing, however, can limit application portability between different ADBMS products because
of implementation differences.
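As a rough illustration of in-database processing, the sketch below registers a user-defined aggregate with the database engine so that the analysis runs inside the SQL query instead of in client code after the rows have been extracted. It uses Python's built-in sqlite3 module purely as a stand-in for an ADBMS; the function name stddev_pop, the sales table, and its contents are assumptions made for the example and do not come from any product discussed in this report.

```python
import math
import sqlite3


class PopulationStdDev:
    """User-defined aggregate: population standard deviation (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def step(self, value):
        # Called by the engine once per row in the group.
        if value is None:
            return
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def finalize(self):
        # Called by the engine once per group to produce the result.
        if self.count == 0:
            return None
        return math.sqrt(self.m2 / self.count)


def build_demo_db():
    """Create a small in-memory database with the aggregate installed."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("stddev_pop", 1, PopulationStdDev)
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10.0), ("east", 20.0), ("east", 30.0),
         ("west", 5.0), ("west", 5.0)],
    )
    return conn


def stddev_by_region(conn):
    """The analyst only writes SQL; the function executes inside the engine."""
    rows = conn.execute(
        "SELECT region, stddev_pop(amount) FROM sales "
        "GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)
```

The portability caveat shows up even in this toy: the registration call and the aggregate class protocol are specific to one engine, so the same function would have to be repackaged for another product.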
processing, it does not necessarily mean this processing is done in parallel. Some of the processing
functions may be run in parallel, while others may not. All of them provide more rapid implementation,
ADBMS Data Storage Options
ADBMS software supports a wide variety of different data storage options. Examples include:
partitioning, indexing, hashing, row-based storage, column-based storage, data compression,
in-memory data, etc. Also, some products support a shared-disk architecture, while others use a
shared-nothing approach. These options can have a big impact on performance, scalability, and data storage
requirements. They also cause considerable discussion between database experts as to which option
is the best to use. The current debate about row-based versus column-based storage is a good example
In an ideal world, an ADBMS would support all these various options and allow developers to choose
the most appropriate one to use for any given analytic workload. ADBMS products, however, vary
application deployment, and to database administration. A product could automatically select or
recommend the best option, and some products are beginning to support this. In general, however,
workloads.
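To make the row-based versus column-based distinction concrete, here is a minimal sketch in Python. It pivots a handful of rows into per-column lists and applies run-length encoding to one column, one simple example of why a column of similar values can compress well. The sample data, function names, and the choice of run-length encoding are illustrative assumptions; they do not describe the storage format of any particular ADBMS.

```python
from itertools import groupby


def rows_to_columns(rows, column_names):
    """Pivot row tuples (row-based layout) into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in runs for _ in range(count)]


def column_sum(columns, name):
    """An aggregate over one column reads only that column's values."""
    return sum(columns[name])


# Hypothetical sample data, sorted by region so that repeats are adjacent.
ROWS = [
    ("east", "2010-01", 100),
    ("east", "2010-02", 120),
    ("east", "2010-03", 90),
    ("west", "2010-01", 80),
    ("west", "2010-02", 95),
]
COLUMNS = rows_to_columns(ROWS, ["region", "month", "amount"])
```

In this toy layout, summing the amount column never touches the region or month values, and the sorted region column shrinks to two runs. A row-based layout, by contrast, keeps every field of a record together, which suits workloads that read or write whole records.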
The physical storage options supported by an ADBMS product should be completely transparent to the
performance. This was certainly the case for several of the customers interviewed for the report.
Another option of course is to go with a product that provides very little in the way of tuning options
and instead employ a brute-force approach of simply installing more hardware to satisfy performance needs. The theory is that hardware today is cheap compared to development and administration costs.
The Role of MapReduce and NoSQL Approaches
process massive amounts of data every day. A high percentage of this data is not well structured and
A landmark paper
MapReduce is a programming model and an associated implementation for processing and
generating large data sets. Programs written in this functional style are automatically
parallelized and executed on a large cluster of commodity machines. The runtime system takes
care of the details of partitioning the input data, scheduling the program's execution across a set of
machines, handling machine failures, and managing the required inter-machine communication.
This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.
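The programming model described in the quotation can be sketched in a few lines of Python. This is a single-process toy, not the distributed implementation the paper describes: the function names are our own, the word-count task is simply the customary illustration, and the partitioning, scheduling, and failure handling that a real runtime provides are deliberately absent.

```python
from collections import defaultdict


def map_words(document):
    """Map step: emit one (key, value) pair per word in the input record."""
    for word in document.lower().split():
        yield word, 1


def reduce_counts(key, values):
    """Reduce step: combine all values that share a key."""
    return key, sum(values)


def map_reduce(inputs, map_fn, reduce_fn):
    """Run map over every input, group the pairs by key, then reduce each group.

    A real runtime would spread the map and reduce calls across many
    machines; here they simply run one after another.
    """
    groups = defaultdict(list)
    for item in inputs:
        for key, value in map_fn(item):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())
```

For example, map_reduce(["a b a", "b c"], map_words, reduce_counts) groups the emitted pairs by word and sums them. Because each map call depends only on its own input record and each reduce call only on one key's values, a runtime is free to run them in parallel, which is the property the quotation emphasizes.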
key/value pairs. The records
are produced from source data by the map program. The value
type of arbitrary data. Google uses this approach to index large volumes of unstructured data. Note
management systems. Google has integrated it into its BigTable system, which is a proprietary DBMS
Several of the sponsors of this report provide this capability. This hybrid approach combines the
engine.
NoSQL movement.
reliability.
together in a hybrid environment.
customers want, and are building connectors between the two technologies. Maybe this is why the
website nosql-database.org prefers the pragmatic term Not only SQL
textual data, and subsets of this data were then brought into the analytic environment using a software
Administration and Support Considerations
customer interviews. Several of the customers interviewed also said that simple administration was
an important product selection criterion because they didn't want to employ an army of database
administrators. Easy administration was particularly important when designing databases and storage
structures, and when adding new hardware.
Several of the customers interviewed also noted that as workloads increased in volume and became more mixed in nature, the workload management capabilities of the ADBMS became more important.
All of the interviewed customers were happy with the support they received and the working
relationship they had with their vendors. Several also commented that the vendor was usually very
receptive to adding new features to the analytic platform to meet their needs.
Deployment Models
The deployment options offered by analytic platform vendors vary. Some vendors provide a complete
package of hardware and software, while others deliver an integrated pack of software and then let
customers deploy it on their own in-house commodity hardware. Some vendors also offer virtual
software images that are especially useful for building and testing prototype applications.
either the vendor's or a third-party data center or for use on an in-house private cloud. In some cases,
the vendor may also install and support a private cloud analytic platform on behalf of the customer.
Ideally, a vendor should support a variety of different deployment options for its analytic platform. This
customer may opt, for example, to develop and test an application in a public cloud and then deploy the
are run in house, while others are deployed in a public cloud depending on performance, cost and data
security needs.
Use Cases
Based on prior experience, the survey results and customer interviews from our research study, we can
1. Deploying an enterprise data warehousing environment that supports multiple business areas
and enables both intra- and inter-business area analytics.
use in analytic processing
Figure 5: Analytic Platform Use Cases
Before looking at each of these use cases in detail, it is important to comment about the survey results
and customer interviews used in this section of the report.
environments, and technology maturity. The customers interviewed for the report, on the other hand,
were recommended by each of the report sponsors, and were, in many cases, developing analytic solutions where it was not practical to maintain the data in a traditional data warehousing environment.
The results and opinions from the two groups therefore sometimes differ. The survey audience results
the interviewed customers demonstrate the disruptive forces taking place in the industry that enable
completely new types of analytic application to be developed.
Use Case 1: Enterprise Data Warehousing
This use case is well established and represents what can be considered to be the traditional data
warehousing approach. The environment consists of a central enterprise data warehouse (EDW) with
one or more virtual or dependent data marts. The data in the EDW and data marts has been cleansed
historical reporting and data analysis purposes by multiple business areas.
containing data extracted from an EDW.
data warehousing. For this customer, reducing software costs was the main reason for moving to an
and analytic processing).
Figure 6: What Use Cases are Being Deployed on Your Analytic Platform?
Use Case 2: Independent Analytic Solution
survey respondents were using an analytic platform for this use case. There are three main reasons why
In this situation,
the analytic platform may be the initial step in building out a traditional EDW environment.
but grow, via a set of scalable offerings, to provide a large EDW system that can support multiple
business areas.
an existing EDW. In this situation, an analytic platform offers the promise of deploying this
so-called independent data mart solution at a lower cost and a shorter time to value. In the future,
depending on business need, the data in the data mart may be integrated into an EDW. Many
companies have learned from experience, however, that independent data marts may save time and money in the short term, but may prove more costly in the long term because data marts
have a tendency to proliferate, which creates data consistency and data integration issues. As a
result, many experts have a negative view of the independent data mart approach.
c) The organization needs to support extreme processing where it is unnecessary or impractical to
incorporate the data into an EDW. Six of the customers interviewed for this research report
match this scenario. Depending on business need, the independent analytic solution may
acquire data from an EDW to augment the analyses and may also replicate the processing
results back into an EDW. Some independent analytic solutions may be experimental in nature
the traditional data warehousing life cycle. The extreme processing scenario, however, is relatively new
and represents the biggest potential for business growth and exploitation of analytics. It is important,
therefore, to look at extreme processing in more detail.
impossible, for cost, performance, or data latency reasons to load certain types of data (high volume
web event data, for example) into an EDW. In some cases it may not even be necessary. The application
may involve data that only has a useful lifespan of a few days or weeks. Note, however, that these latter
types of applications do not preclude the analytic results, or scored or aggregated data from being
stored in an EDW for use by other analytic applications.
Another factor driving extreme processing is the nature of the analytical processing itself. BI users are
detailed data as well as aggregated data. They are also building more complex analyses and more
advanced predictive models. There is also an increasing demand by these users for enabling ad hoc
Extreme data coupled with extreme analytical processing leads to the need for high performance
and elastic scalability. In data-driven companies, many analytic applications are mission critical, and
reliability and high availability are therefore also of great importance. Given constantly changing
business requirements and data volumes, the analytic platform in these situations needs to support
replace. These extreme needs require a new approach to data warehousing, and, in our opinion, this is the sweet spot for new and evolving analytic platforms. These analytic solutions do not replace the
To use the term independent data mart to describe the underlying data store and data management
system supporting extreme analytic application processing misrepresents this new breed of
extreme analytic platform.
Use Case 3: Filtering, Staging, and Transformation of Data
environments involving high volumes of data and/or a wide variety of data sources and types of data.
competitor to this approach.
The processing of the data in this use case is typically done using an ELTL approach where the:
Extract
First load
Transform
Second load step loads the transformed data into the ADBMS or a remote DBMS for analytic processing
the detailed data and the aggregated results from the ELTL processing.
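One plausible reading of the ELTL sequence is sketched below in Python, with two in-memory sqlite3 databases standing in for the staging platform and the target DBMS. Only the second load step is described in the text above; what the extract, first load, and transform steps do here (copy raw records into a staging table unchanged, then filter and aggregate them with SQL inside the staging engine) is our assumption, as are the table names, columns, and sample web event data.

```python
import sqlite3

# Hypothetical raw web events: (day, page, hits as text); one record is incomplete.
RAW_EVENTS = [
    ("2010-06-01", "home", "120"),
    ("2010-06-01", "checkout", "30"),
    ("2010-06-02", "home", None),
    ("2010-06-02", "home", "150"),
]


def run_eltl(source_rows):
    """Extract, load raw, transform in-database, then load the result elsewhere."""
    staging = sqlite3.connect(":memory:")  # stands in for the staging platform
    target = sqlite3.connect(":memory:")   # stands in for the ADBMS or a remote DBMS

    # Extract: pull the records from the source system (here, a Python list).
    extracted = list(source_rows)

    # First load: land the raw records in staging without changing them.
    staging.execute("CREATE TABLE raw_events (day TEXT, page TEXT, hits TEXT)")
    staging.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", extracted)

    # Transform: filter and aggregate with SQL inside the staging engine.
    transformed = staging.execute(
        "SELECT day, SUM(CAST(hits AS INTEGER)) FROM raw_events "
        "WHERE hits IS NOT NULL GROUP BY day ORDER BY day"
    ).fetchall()

    # Second load: move the transformed data into the target for analysis.
    target.execute("CREATE TABLE daily_hits (day TEXT PRIMARY KEY, hits INTEGER)")
    target.executemany("INSERT INTO daily_hits VALUES (?, ?)", transformed)
    target.commit()
    staging.close()
    return target
```

The point of the ordering is that the raw data is loaded before it is transformed, so the transformation can use the database engine's own processing power rather than a separate transformation server.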
The analytic processing performed in this use case supports data transformation and aggregation,
for example) is a strong candidate for this type of transformation.
This use case also offers an alternative to using extreme processing. Instead of loading high-volume
We can see from the research study survey results and customer interviews that analytic platforms
circumstances caused them to move to, or consider, an analytic platform for supporting these use cases.
The results are shown in Figure 7.
Figure 7: What Issues Led You to Use an Analytic Platform?
demonstrate that cost is not the main driving force behind using an analytic platform. This was
reason for replacing its existing EDW solution with a new analytic platform.
relationship to cost. Many of the customers interviewed for this report were building extreme analytic
solutions, and the reasons they gave for choosing any given analytic platform all matched one or more of
be built before. This was either because the application couldn't provide the required performance no
matter how much hardware was employed or because the amount of hardware required to achieve
acceptable performance was cost prohibitive.
platforms provide cost-effective solutions that extend, rather than replace, the existing data
warehousing environment. They enable applications that simply could not be built before.
Getting Started
Success in business is an elusive thing: solving today's question opens new possibilities and new
the procurement staff is a signed contract; for business analysts the challenge is greater. The following
are some thoughts on how to ensure that the platform selected works today and will continue to grow
and evolve as your needs do.
There are also best practices for the implementation of your analytics strategy that leverage the tools
you acquire, and following them will make success more likely. The vendors and users we interviewed
offered some valuable insights and we include them here, together with a quick review of the success
factors that make the difference.
Selecting the Right Platform
For analytic platform selection, there is only one place to start: understanding the analytics planned.
The use cases in this study hint at the possibilities, but they also make it clear that there is enormous
variety: in the skills and preferred tools of the users, the business problems being tackled, the types of
analysis required, the latency of the data, and the user volumes. Will standard reporting be enough?
Is ad hoc analysis with drill-down and slice-and-dice operations required? Does internal and external
data need to be combined? Will the analyses involve data mining and predictive model building?
Will temporary analytic data stores be set up, processed, and then torn down frequently? What are
the current and future data volumes and number of users? Know these answers before you begin.
however complex, can substitute for the one critical element: testing on your data, with your queries, on
the hardware and software platform you plan to use, with the number of concurrent users that matches
the expected usage patterns. As you decide who should be on your short list for actual tests, here are
some key aspects to consider as you draw up your requirements.
Ensure that you can load data at the speed you
Working with your languages and tools
skills, and answer more questions than you have before.
Supporting your toughest questions. If you did your homework, you should know the tough
from the most mature on down.
storage hardware, the speed of the interconnects, the level of pre-integration provided across the
software stack. Be sure that setting up test or development systems is no more complex than you are
comfortable with; analytics is an increasingly iterative process. Explore the possibility of doing such
testing in the cloud to production on your hardware?
involved don't seem knowledgeable, problems take a long time to resolve, and/or setup seems to take
a long time, you must consider what it will be like when the check has been signed and the purchase
made. Assess what services are available to you for design, training, and support. And be sure to
leave some surprises. Do not conduct your trial on prearranged queries and analyses only. Stress test
workloads that mirror your expected ones in number of users, volumes of data, and other processes
running if there will be any.
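A stress test of this kind can start from a very small harness. The Python sketch below runs the same list of queries from several simulated users at once and reports simple latency figures. The function names are ours, query_fn is a placeholder for whatever call submits a query to the platform under test, and a realistic trial would add mixed workloads, concurrent data loading, and the full data volumes discussed above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_user_session(query_fn, queries):
    """Simulate one user: run each query in turn and time it."""
    latencies = []
    for query in queries:
        start = time.perf_counter()
        query_fn(query)
        latencies.append(time.perf_counter() - start)
    return latencies


def stress_test(query_fn, queries, concurrent_users):
    """Run the query list from several simulated users at the same time.

    Assumes at least one query and one user; returns latency figures in seconds.
    """
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        futures = [
            pool.submit(run_user_session, query_fn, queries)
            for _ in range(concurrent_users)
        ]
        latencies = sorted(t for future in futures for t in future.result())
    return {
        "queries_run": len(latencies),
        "median_s": latencies[len(latencies) // 2],
        "max_s": latencies[-1],
    }
```

Repeating the run with increasing values of concurrent_users, and with queries that were not agreed with the vendor in advance, gives a first view of how response times degrade as the expected user population comes online.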
Conclusions
The right selection process involves understanding the likely analytical workloads, data volume and
often, some candidates can't get it done at all.
Support for open source data integration and BI tools, columnar data storage, and a complete hardware
adopters and survey data show that query performance, support for complex analytics, and on-demand
capacity are. As analytic platforms become mainstream, however, it's likely that ease of installation and
support and aggressive data compression strategies will begin to grow in importance.
and this development should drive increased awareness and growth. The analytic platform will drive
billions of dollars in revenue in the next decade, and transform expectations about the ability to use
data to improve business results.
Appendix: Detailed Survey Results
Q1:
generating business analytics from that data, which offers price/performance and time to value
superior to non-specialized offerings. This solution may be delivered as an appliance (software-only,
packaged hardware and software, virtual image) and/or in a cloud-based software-as-a-service
Value
Yes
No 14
Q2: Value Already using
No plans
Q3:
Value
1 business year (or the past 4 complete quarters) or less
Q4: Value
No
Yes, with hand-coded programs
Yes, with packaged tools
Q5: Which of the following use cases [architectural models] are being deployed for your analytic
Value
Enterprise data warehouse (EDW)
Data staging area for a data warehouse
Dependent data mart
Independent data mart or data store
7
Q6:
Value
Need for complex analyses
Need for on-demand capacity
Growth in number of concurrent users
Load times
Availability and fault tolerance
Archiving or backup times
Q7:
loaded into a data store before adding indexes, aggregate tables, materialized views and/or cubes
built from the raw data.)
Value
Less than 1 terabyte
18
Q8: How much data are you managing on your analytic platform after loading, tuning, enhancing, and
Value
Less than 1 terabyte
Q9:
Value
Q10:
Value
Structured legacy DBMS data 78
Web logs
Event or message data
Data from enterprise service bus or web service
11
11
Q11: Please rate the following features that were/are important in acquiring or planning your analytic
Very Important
Somewhat Important
Not Very Important
Good administration tools
Fault tolerance and high availability
Integration into IT environment
Easy scaling & hardware upgrades
Support for commercial DI/BI tools
In-database processing
Workload management
Data compression
Support for open source DI/BI tools
Support for cloud computing
In-memory data
Q12:
Value
Fully
No
Q13:
Value
17
Government
Education
Manufacturing/Industry (non-computer related)
4
Manufacturing consumer goods 4
Aerospace
Q14:
Value
18
17
11
Q15:
Value
Business department
Business division
Q16: Please tell us where you and your company are located.
Value
North America
Europe
Latin America
Where are you located?
Vertica Overview and Business Description
Vertica (www.vertica.com) led the way in combining two key technologies driving the new generation
several other key executives as it makes the transition from promising startup to a leading player in its
communications services, healthcare, social networking and online gaming, and retail and Web-based
of its time-value (e.g., immediate availability), query performance and broad access by users (e.g., high concurrency). Vertica touts extreme load performance, concurrent and high performance query
extremely favorable comparisons on total cost of ownership, especially hardware costs; one early
commodity-based platform running the same application.
The Vertica Analytic Database is available as software-only, as a hardware-based appliance, as a virtual
free trial version for download to press its case.
Architecture
As its name implies, Vertica was designed from the bottom up as a column-oriented storage and
enhances performance dramatically. Making time to deployment a key value drove a focus on automatic
database design: Vertica provides a physical design tool that generates and partitions data across nodes
based on the input of a logical design, sample data, and sample queries. The output is an automatic
various sort orders, encoding and compression, and the tool can be rerun to make incremental changes
very frequently used together) automatically as well. The automation frees database staff to focus on
strategies and results vary by data type: Vertica has found compression ratios for telco call data records
Vertica's high availability strategy is based on what is known as k-safety redundancy (replication
these replicas by storing the data in different sort orders for further performance improvements.
Automated node recovery and shared-nothing architecture eliminate a single point of failure. System
administration will soon sport a new user interface for Vertica's enhanced backup and disaster recovery
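
To make the k-safety idea concrete, here is a minimal Python sketch, assuming the common reading of the term: each data segment is kept on K+1 nodes, so the cluster can lose up to K nodes and still reach every segment. The ring-style placement and the names `place_segments` and `survives` are hypothetical illustrations, not Vertica's actual placement algorithm or API.

```python
from typing import Dict, List, Set


def place_segments(nodes: List[str], k: int) -> Dict[str, List[str]]:
    """Store segment i on node i and on the next k nodes in a ring.

    Hypothetical placement rule, chosen only to illustrate K+1 copies.
    """
    n = len(nodes)
    if not 0 <= k < n:
        raise ValueError("k must satisfy 0 <= k < number of nodes")
    return {
        f"segment_{i}": [nodes[(i + j) % n] for j in range(k + 1)]
        for i in range(n)
    }


def survives(placement: Dict[str, List[str]], failed: Set[str]) -> bool:
    """Return True if every segment still has at least one live copy."""
    return all(
        any(node not in failed for node in holders)
        for holders in placement.values()
    )


if __name__ == "__main__":
    placement = place_segments(["a", "b", "c", "d"], k=1)
    print(placement)
    print(survives(placement, {"a"}))
    print(survives(placement, {"a", "b"}))
```

With K=1 on four nodes, any single node failure leaves every segment reachable, while losing two adjacent nodes in this particular ring removes both copies of one segment.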
Analytic Functionality
one; analytic databases tend to be in more continuous use and require low latency. Vertica provides
Differentiation
in there yet, and some never will be. Vertica is trying to shift conversations to what kind of problem
customers want to solve and focus on how fast, at what scale, and at what price. It wins against large
incumbent, non-specialty databases, expects continued success against those that provide hardware as
Partnerships
for government networks.
How Should Customers Start, and What Matters Most?
Vertica points out that customers should not forget data cleansing and data quality challenges. It
believes that the time it saves on database tuning and physical design issues should be spent on these
challenging one.
Future/Road Map Exploitation of Trends
Vertica's expansion will focus on 4 core themes:
In-database analytics
(or subsets of data)
Ease of use
Vertica expects to see increased movement to the cloud. Its experience there dates back to its
volume data transfers.
Vertica Customer Case Study: Zynga
Company Background
The Business Problem
had reached the point in its growth that it needed a BI system to bring together all of the data from its
high priority by executive management.
improvements to game content and would also help them in designing new games.
system. Games are accessed from social networking sites and only played for a few minutes at a time.
The Analytic Platform Solution
company considered several competing analytic platforms and also looked at the possibility of using
Another important factor was the ability to compress the data to reduce disk storage requirements.
thing we learned from the evaluation process is that you really need to know your use cases up front in
order to select the right technology.
choose for these kind of environments if you want to achieve high performance and availability.
Detailed data is loaded into the Vertica database in real time using in-house developed software. The
noted that, Vertica's parallelism is ideally suited to this ELT approach because we can push the
users for producing high-level reports and analyses.
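
The ELT (extract, load, transform) pattern mentioned above can be sketched in a few lines of Python: raw detail rows are loaded first, and the transformation then runs as SQL inside the database rather than in an external tool. This is only an illustration under stated assumptions. SQLite stands in for the analytic database, and the table names, column names, and the `run_elt` function are hypothetical, not the customer's actual pipeline.

```python
import sqlite3
from typing import Iterable, List, Tuple


def run_elt(events: Iterable[Tuple[int, str, int]]) -> List[Tuple[str, int, int]]:
    """Load raw event rows, then build a summary table inside the database."""
    conn = sqlite3.connect(":memory:")  # stand-in for the analytic database

    # Load step: detailed data goes in untransformed.
    conn.execute(
        "CREATE TABLE raw_events ("
        "player_id INTEGER, game TEXT, seconds_played INTEGER)"
    )
    conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", events)

    # Transform step: the aggregation is pushed into the database engine.
    conn.execute(
        """
        CREATE TABLE game_summary AS
        SELECT game,
               COUNT(DISTINCT player_id) AS players,
               SUM(seconds_played)       AS total_seconds
        FROM raw_events
        GROUP BY game
        """
    )

    rows = conn.execute(
        "SELECT game, players, total_seconds FROM game_summary ORDER BY game"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    sample = [(1, "farm", 120), (2, "farm", 60), (1, "poker", 300), (1, "farm", 30)]
    for row in run_elt(sample):
        print(row)
```

The design point is that the summary is computed where the detailed data already lives, so a parallel database can spread the aggregation work across its nodes instead of shipping rows to a separate transformation server.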
Implementation Considerations
issues have now gone away and hardware reliability is now our main concern. To improve availability
we have now installed a second hardware cluster.
Following the initial installation, company growth and user adoption of the system caused a dramatic
the implementation team became concerned about scaling the system to manage what were likely to be
trials with several vendors. The results from these trials again led to the decision to use Vertica. So far
Vertica on scalability needs.
Benefits
Metrics from the system impact and enhance every game we produce. The ability to scale was crucial
to us being able to develop new metrics to better manage and grow our business. We feel there are few
systems out there that can provide this level of scalability. The system is also a crucial underpinning to
building new applications.
Summary
improve existing products and design new ones that provide customers with the experience they want
performance requirements were key selection criteria, but the ability of the system to scale to meet
About the Authors
technology research for several years, before returning to his roots as an analyst covering the software
Giga Information Group. Merv focused on facilitating collaborative research among analysts, and served
those events, as a guitarist and singer.
Colin White
educator and writer he is well known for his in-depth knowledge of data management, information
integration, and business intelligence technologies and how they can be used for building the smart