
  • 7/27/2019 Beyond Traditional Data Warehouse

    1/33

    Analytic Platforms:

    Beyond the Traditional Data Warehouse

    By Merv Adrian and Colin White

    BeyeNETWORK Custom Research Report

    Prepared for Vertica


    Copyright 2010 TechTarget, BI Research, IT Market Strategy

    Executive Summary

    The once staid and settled database market has been disrupted by an upwelling of new entrants targeting use cases that have nothing to do with transaction processing. Focused on making more sophisticated, real-time business analysis available to more simultaneous users on larger, richer sets of data, these analytic database management system (ADBMS) players have sought to upend the notion

    and successfully introduced the analytic platform and proven its value.

    newcomers successfully placed an additional thousand instances by the end of the decade, making it

    from incumbent classic data warehouse relational database management system products.

    Analytic platforms provide two key functions: they manage stored data and execute analytic programs against it. We describe them as follows:

    An analytic platform is an integrated and complete solution for managing data and generating business analytics from that data, which offers price/performance and time to value superior to non-specialized offerings. This solution may be delivered as an appliance (software-only, packaged hardware and software, virtual image), and/or in a cloud-based software-as-a-service (SaaS) form.

    platform to be the tools they use to perform the analysis. This may be a legacy of client-server days, when analysis was performed outside the database on rich client software on desktops. But the increasing requirement for the ADBMS to power the analysis is upending this thinking, and most agreed with our description. We found:

    • The pace of adoption is strong and accelerating.

    of use cases, worldwide, in many industries.

    • The promises being made are being met. Adopters of analytic platforms report that they

    • The right selection process is essential.

    numbers of users likely to be on the system. And real tests separate winners from losers: often, some candidates can't get it done at all.


    Introduction

    We conducted an online survey of several hundred professionals worldwide, who shared their experiences and opinions with us. Survey results are shown at the end of this report; we include some

    Figure 1: Are You Using or Planning to Use an Analytic Platform?

    We also conducted interviews with 8 analytic platform vendors, all of whom are targeting this market, and with a nominated customer from each. The interviewees are quite different from the overall

    (DBMS) products for analytic platforms in proportions that mirrored overall market shares, our interviewees come from the leading edge of the disruptive analytic platform phenomenon. They work

    many applications, including some of the business analytics being targeted by the vendors of analytic platforms, but have opted to use specialty platforms for a variety of reasons.

    What we learned was profound; businesses, more and more driven by their need for analytic processing

    a few years, generating billions of dollars in revenue, herald the arrival of the analytic platform as a category to be watched closely. It solves important problems, and customers are deriving enormous value from it, creating new classes of business applications and driving top-line growth.

    from the issues that led you to add an analytic platform data: the need for complex analyses, query performance, and on-demand capacity topped the list. These issues are mirrored in the case study interviewees.


    This report examines the analytic platform, the business needs it meets, the technologies that drive it, and the uses analytic platforms are being put to. It concludes with some guidance on making the right choices and getting started with the products of choice.

    The Business Case for Analytic Platforms

    What is an Analytic Platform?

    sophisticated demands for business analytics. It combines the tools for creating analyses with an engine to execute them, a DBMS to keep and manage them for ongoing use, and mechanisms for acquiring and preparing data that is not already stored. In this report, we focus on the DBMS component of the platform. As noted below, separate providers also offer data sourcing and integration and tools for analytics surrounding the DBMS; these will interact with the DBMS itself and often depend on it for execution.

    Why Do We Need Analytic Platforms?

    A brief history demonstrates how we got here over several decades. The earliest computing used a simple paradigm for simple analytic processing: business data created by transaction, manufacturing,

    management reports about the state of the business. But routine, multiple, simultaneous use of the

    things to a two-or-more-tier model, in which the analytic processing was done on data extracted to a different platform, supporting one or many users working with local copies of the data that might themselves be saved or might go away when the session was done. But this created uncoordinated,

    policy, and currency could be centrally managed. Diverse data sources were harvested and data was

    authority in business units who desired autonomy. In front of these systems, data extraction and transformation products managed feeding the data in; behind them, analytic tools for ad hoc

    But the DBMS product in the middle of all this was usually the same one in use for everything else.¹


    Figure 2: Components of an Analytic Platform

    benchmark made it clear that DBMSs were coming up short; new approaches were needed, and new vendors emerged to meet them, creating new products that succeeded where the incumbents could not.

    The forces driving the need for change are largely the same and drove the design of the newcomers who

    Data Growth and New Types of Data

    The largest data warehouses are now measured in petabytes. Terabytes are not at all unusual, and it's

    terabytes was very or somewhat important in their planning or acquisition.

    everyday toolkit for business analysts were not designed for these new forms of information and often are not well-equipped to work with them.

    Figure 3: What Data Sources are Used in Your Analytic Platforms?


    Analytic platforms are designed to manage large data volumes, sophisticated analytics, and newer data

    data for better compression. Some use smart storage to do some of the work at the storage layer to free up the processor for the heavy analytic lifting. They lash together many commodity processors,

    with larger memory spaces. They connect processors with one another and with data storage across faster networks to scale processing and storage in sync. They are designed to handle new types of data,

    always easy to update to leverage these new opportunities.

    Advanced Analytics

    Simple reporting, spreadsheets, and even fairly sophisticated drill-down analysis have become commonplace expectations and are not considered advanced. While the term is frequently debated, it's clear that even simple analysis is advanced when it needs to be performed on a massive scale. Even

    example, is a performance challenge for many systems when run against today's extraordinary volumes of data while other activities run on the same system.

    But increasingly, the nature of the analysis itself is more advanced. Sophisticated statistical work is becoming commonplace for market basket analysis in retail, behavioral analysis in clickstreams for websites, or risk analysis for trading desks. Building predictive models and running them against real-

    variable shapes such as sales territories or watersheds that are not easily computed. Such ambitions

    were using hand-coded programs as opposed to packaged tools.

    those sources.

    Scalability and Performance

    of business intelligence (BI) thinkers and planners to involve more users in the corporate analysis

    of performance. In the client-server era this was often handled by putting tools on their desktops and moving data to them, creating coordination problems as computational models were duplicated.


    Analytic platforms are designed to leverage higher bandwidth connections across a fabric of processors.

    processes running in massively parallel scale-out architectures with more processors. They use inexpensive hardware that can be added without taking systems down, so as demands scale, so can

    that permit the elastic setup and teardown of sandboxes where new analyses and ideas can be tested.
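To make the scale-out idea concrete, here is a minimal Python sketch of shared-nothing parallel aggregation. It is an illustration under stated assumptions, not how any particular ADBMS works: threads stand in for separate nodes, rows are hash-partitioned by key using CRC-32, and the function names (`node_for`, `partition`, `local_aggregate`, `parallel_sum_by_key`) are hypothetical.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def node_for(key, num_nodes):
    # Stable hash (CRC-32) so the same key always lands on the same node.
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition(rows, num_nodes):
    # Distribute (key, amount) rows across nodes; nodes share nothing.
    parts = [[] for _ in range(num_nodes)]
    for key, amount in rows:
        parts[node_for(key, num_nodes)].append((key, amount))
    return parts


def local_aggregate(part):
    # Each node sums only the rows it owns.
    totals = Counter()
    for key, amount in part:
        totals[key] += amount
    return totals


def parallel_sum_by_key(rows, num_nodes=4):
    parts = partition(rows, num_nodes)
    # Threads stand in for separate machines in this single-process sketch.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_aggregate, parts))
    # Coordinator merges the partial results; because rows were partitioned
    # by key, each key appears in exactly one partial result.
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return dict(merged)


if __name__ == "__main__":
    sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3)]
    print(sorted(parallel_sum_by_key(sales).items()))
    # [('east', 17), ('north', 3), ('west', 5)]
```

Because each key lives on exactly one node, the merge step is cheap; adding nodes spreads the same per-key work across more processors, which is the property the scale-out architectures described above rely on.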

    Cost and Ease of Operation

    As data volumes, analytic complexity, and the numbers of users all grow, so does cost. Even

    users, security management, and the need to manage environments that cannot be taken down for maintenance all create their own demands and costs.

    device drivers, and operating systems. Each piece of a complex stack of software is frequently updated

    expensive, and budget is consumed merely keeping the lights on.

    Analytic platforms offer multiple deployment options that can reduce many of these costs. As they generally move to commodity hardware, some of the pricing premium in older proprietary systems

    smoother and more granular. It is simpler to add blades with processor, memory, and storage that snap

    costs.

    cost. They are increasingly maintained and updated by their suppliers in a way that is designed to ensure that changes don't break things.

    Finally, moving the analytic platform off premises in one fashion or another provides the maximum reduction in cost of ownership and operation. Several vendors will host the system and the data as a dedicated facility. Some will make it available in the cloud in a multi-tenant fashion, where tools are shared but data is stored and managed for individual customers. They may take over the process of importing the data from its source systems, such as retail or online gaming systems, and provide the data integration as well as the storage and analytics.

    An analytic platform is an integrated and complete solution for managing data and generating business analytics from that data, which offers price/performance and time to value superior to non-specialized offerings. This solution may be delivered as an appliance (software-only, packaged hardware and software, virtual image), and/or in a cloud-based SaaS form.

    In this report, we consider DBMS offerings that form the heart of the analytic platform.


    Types of Analytic Platforms

    alternative languages and data models.

    There are numerous features and functions that differentiate ADBMSs from one another, but for the

    • Use of proprietary hardware

    • Hardware sharing model for processing and data: Increasingly, ADBMS vendors support

    storage in a shared-nothing environment or may be connected to shared storage such as a storage area network (SAN).

    • Storage format and smart data management: Many ADBMSs are using columnar storage,

    support both row and column format in one hybrid form or another. Some also add intelligence at the storage layer to pre-process some retrieval operations. All use a variety of encoding, compression and distribution strategies.

    • SQL support

    preventing some queries from running adequately or at all.

    • NoSQL too.

    • Programming extensibility: ADBMS engines offer varying degrees of support for the instal-

    functions themselves and with partners, and some of these take advantage of system parallelism for performance improvement.

    • Deployment models. ADBMSs may be delivered as an appliance: a complete package of hardware and software; software-only products may be deployed on premises on commodity hardware


    Hardware Directions

    Things are changing fast. Several key elements of the hardware mix are undergoing enormous change, with profound implications for system design and its impact on analytic performance.

    • Memory is the new disk; disk is the new tape.

    • More cores, more threads, yield more processing power. The addition of more cores (and processing threads) to chips has similar implications. As software smart enough to break up and

    • The speed of interconnects can be an enormous bottleneck for system performance. Moving data around inside large systems or from one

    Message from the Market: It's Time

    Markets change rapidly, but the effects are often not felt for years. The value of already installed software in most categories is several orders of magnitude larger than the spending on it in any given year or two. Maintenance and support costs for installed software dwarf new spending. But at the

    added another thousand or so. The several hundred million dollars spent with these newcomers

    But in context, these numbers are hardly a blip on the radar. There are hundreds of thousands of DBMSs installed; so-called data warehouse DBMS sales are estimated at $7 billion per year. The ADBMS is

    Leaving aside Teradata and Sybase, ADBMS vendors collectively generate a few hundred million dollars

    respondents told us that they typically begin their search for a platform with their incumbent DBMS vendor.

    We learned in our interviews that those adopting analytic platforms are agents of change. They are creating new value, new business opportunities, and new customer opportunities. From a competitive


    your problem, you can start fast. And get value fast. At lower cost than you may have thought possible.

    Techniques and Technologies

    In this section, we review some of the key techniques and technologies offered by analytic platforms, and offer some suggestions about things to consider when evaluating these solutions.

    ADBMS versus a General Purpose RDBMS

    An analytic platform consists of three main software components: the data integration software for transforming and loading source data into the platform's database, the database management

    deliver analytics to users. In a traditional data warehousing environment, these three components are purchased separately and integrated by the customer. A key difference with an analytic platform is that the vendor does the integration and delivers a single package to the customer.

    and analytic IT applications. Given the trend by many companies toward extreme processing at both

    for a general purpose or classic

    The broadening application processing spectrum is leading to vendors developing database management software that focuses on a narrower subset of that spectrum. In this report, products that target analytic processing are described as ADBMSs.

    workload varies. The challenge in selecting an ADBMS is to match the workload to the product. This is especially true in the case of extreme processing and also in business environments with constantly

    In our survey and customer interviews we asked people about key technology requirements for

    rated as very important


    Figure 4: What Features are Important for Your Analytic Platform?

    in-database processing was

    exploiting the power of a parallel database engine to run certain performance-critical components of a data integration or analytic application.

    Scores that were lower than expected were: support for open source data integration and BI tools

    own hardware.

    extreme processing

    coupled with easy scaling were the main product selection criteria. Most of these customers also required high availability.

    ADBMS Application Development Considerations

    separating the user, or logical, view of data from the way it is physically stored and managed. An

    independence remains largely unique to relational technology.

    From a development perspective, the factors to consider when selecting an analytic platform and its


    concern to application developers than the users of interactive analytic tools. For these latter users,

    As already noted, most of the customers interviewed for this report were using extreme processing,

    very important selection criterion for these customers. Several customers commented that some of the

    course is contrary to one of the main tenets of the relational model.

    push analytic functions and processing into the ADBMS will usually boost performance and make complex analyses possible for users who have the expertise to use such functions, but not the skills to program them. For many of the customers we interviewed, in-database processing was an important feature when choosing a product. The use of such processing, however, can limit application portability between different ADBMS products because of implementation differences.
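The difference between client-side analysis and in-database processing can be sketched with Python's built-in `sqlite3` module, used here only as a convenient stand-in for an ADBMS (SQLite is neither parallel nor analytics-specific). The table and function names are hypothetical; the point of the sketch is where the aggregation runs and how many rows have to leave the database.

```python
import sqlite3
from collections import defaultdict


def make_db():
    # Hypothetical sales table in an in-memory SQLite database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10.0), ("west", 5.0), ("east", 7.0), ("north", 3.0)],
    )
    return conn


def totals_client_side(conn):
    # Client-server style: every detail row is shipped to the client,
    # which then does the aggregation itself.
    totals = defaultdict(float)
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] += amount
    return dict(totals)


def totals_in_database(conn):
    # In-database style: the engine aggregates, and only one row per
    # region comes back to the client.
    cur = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    return dict(cur.fetchall())


if __name__ == "__main__":
    conn = make_db()
    print(totals_client_side(conn) == totals_in_database(conn))  # True
    conn.close()
```

On four rows the two approaches are indistinguishable; on very large tables, returning only the aggregated result is what lets the database engine do the heavy lifting. Plain SQL aggregation like this is portable, whereas the vendor-specific analytic functions discussed above generally are not.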

    processing, it does not necessarily mean this processing is done in parallel. Some of the processing functions may be run in parallel, while others may not. All of them provide more rapid implementation,

    ADBMS Data Storage Options

    ADBMS software supports a wide variety of data storage options. Examples include: partitioning, indexing, hashing, row-based storage, column-based storage, data compression, in-memory data, etc. Also, some products support a shared-disk architecture, while others use a shared-nothing approach. These options can have a big impact on performance, scalability, and data storage requirements. They also cause considerable discussion among database experts as to which option is the best to use. The current debate about row-based versus column-based storage is a good example.
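As a toy illustration of the row-versus-column trade-off, the Python sketch below pivots row-oriented records into columns and applies run-length encoding, one of the simplest compression schemes. The data and helper names are hypothetical, and real ADBMSs combine many encodings and physical layouts; the sketch only shows why a query that reads one column touches less data, and why sorted, repetitive columns compress well.

```python
from itertools import groupby

# Hypothetical fact rows: (day, region, amount), already sorted by day.
rows = [
    ("2010-01-01", "east", 10),
    ("2010-01-01", "east", 7),
    ("2010-01-01", "west", 5),
    ("2010-01-02", "west", 3),
]


def to_columns(rows, names):
    # Column layout: each attribute's values are stored together.
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def rle_encode(values):
    # Run-length encoding: each run of equal adjacent values becomes (value, length).
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    return [value for value, length in pairs for _ in range(length)]


columns = to_columns(rows, ["day", "region", "amount"])

# A query over "amount" alone scans one column instead of every whole row.
total = sum(columns["amount"])

# Sorted, repetitive columns shrink well: 4 day values become 2 runs.
encoded_day = rle_encode(columns["day"])

print(total, encoded_day)
# 25 [('2010-01-01', 3), ('2010-01-02', 1)]
```

A row store would keep each tuple together, which suits fetching or updating whole records; the column layout favors scans and aggregates over a few attributes, which is why hybrid designs that support both exist.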

    In an ideal world, an ADBMS would support all these various options and allow developers to choose the most appropriate one to use for any given analytic workload. ADBMS products, however, vary

    application deployment, and to database administration. A product could automatically select or recommend the best option, and some products are beginning to support this. In general, however,

    workloads.

    The physical storage options supported by an ADBMS product should be completely transparent to the


    performance. This was certainly the case for several of the customers interviewed for the report.

    Another option, of course, is to go with a product that provides very little in the way of tuning options and instead employ a brute-force approach of simply installing more hardware to satisfy performance needs. The theory is that hardware today is cheap compared to development and administration costs.

    The Role of MapReduce and NoSQL Approaches

    process massive amounts of data every day. A high percentage of this data is not well structured and

    A landmark paper

    MapReduce is a programming model and an associated implementation for processing and generating large data sets. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The runtime system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.

    key/value pairs. The records

    are produced from source data by the map program. The value

    type of arbitrary data. Google uses this approach to index large volumes of unstructured data. Note

    management systems. Google has integrated it into its BigTable system, which is a proprietary DBMS

    Several of the sponsors of this report provide this capability. This hybrid approach combines the

    engine.
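The MapReduce model described in the quoted paper is usually illustrated with word counting. The sketch below runs in a single Python process, so it shows only the programming model (map emits key/value pairs, a shuffle groups the values by key, reduce combines them); the partitioning, scheduling, and failure handling that a real runtime provides are deliberately omitted. The function names are illustrative, not the API of any MapReduce framework.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    # Map: turn one input record into intermediate (key, value) pairs.
    # doc_id is unused here but mirrors the (key, value) input of a map task.
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    # Reduce: merge all intermediate values that share the same key.
    return word, sum(counts)


def map_reduce(documents):
    # Shuffle: group intermediate values by key. A real runtime would also
    # partition the input, schedule tasks on many machines and retry failures.
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in map_fn(doc_id, text):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())


if __name__ == "__main__":
    docs = {"d1": "the cat sat", "d2": "The cat ran"}
    print(map_reduce(docs))
    # {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```

Because map and reduce are independent per record and per key, a runtime can run many copies of each in parallel; that independence is what the hybrid ADBMS-plus-MapReduce products mentioned above exploit inside the database engine.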


    NoSQL movement.

    reliability.

    together in a hybrid environment.

    customers want, and are building connectors between the two technologies. Maybe this is why the website nosql-database.org prefers the pragmatic term "Not only SQL"

    textual data, and subsets of this data were then brought into the analytic environment using a software

    Administration and Support Considerations

    customer interviews. Several of the customers interviewed also said that simple administration was an important product selection criterion because they didn't want to employ an army of database administrators. Easy administration was particularly important when designing databases and storage structures, and when adding new hardware.

    Several of the customers interviewed also noted that as workloads increased in volume and became more mixed in nature, the workload management capabilities of the ADBMS became more important.

    All of the interviewed customers were happy with the support they received and the working relationship they had with their vendors. Several also commented that the vendor was usually very receptive to adding new features to the analytic platform to meet their needs.


    Deployment Models

    The deployment options offered by analytic platform vendors vary. Some vendors provide a complete package of hardware and software, while others deliver an integrated package of software and then let customers deploy it on their own in-house commodity hardware. Some vendors also offer virtual software images that are especially useful for building and testing prototype applications.

    either the vendor's or a third-party data center or for use on an in-house private cloud. In some cases, the vendor may also install and support a private cloud analytic platform on behalf of the customer.

    Ideally, a vendor should support a variety of different deployment options for its analytic platform. This

    customer may opt, for example, to develop and test an application in a public cloud and then deploy the

    are run in house, while others are deployed in a public cloud depending on performance, cost and data security needs.

    Use Cases

    Based on prior experience, the survey results and customer interviews from our research study, we can

    1. Deploying an enterprise data warehousing environment that supports multiple business areas and enables both intra- and inter-business area analytics.

    use in analytic processing

    Figure 5: Analytic Platform Use Cases


    Before looking at each of these use cases in detail, it is important to comment on the survey results and customer interviews used in this section of the report.

    environments, and technology maturity. The customers interviewed for the report, on the other hand, were recommended by each of the report sponsors, and were, in many cases, developing analytic solutions where it was not practical to maintain the data in a traditional data warehousing environment. The results and opinions from the two groups therefore sometimes differ. The survey audience results

    the interviewed customers demonstrate the disruptive forces taking place in the industry that enable completely new types of analytic application to be developed.

    Use Case 1: Enterprise Data Warehousing

    This use case is well established and represents what can be considered to be the traditional data warehousing approach. The environment consists of a central enterprise data warehouse (EDW) with one or more virtual or dependent data marts. The data in the EDW and data marts has been cleansed

    historical reporting and data analysis purposes by multiple business areas.

    containing data extracted from an EDW.

    data warehousing. For this customer, reducing software costs was the main reason for moving to an

    and analytic processing).

    Figure 6: What Use Cases are Being Deployed on Your Analytic Platform?


    Use Case 2: Independent Analytic Solution

    survey respondents were using an analytic platform for this use case. There are three main reasons why

    In this situation, the analytic platform may be the initial step in building out a traditional EDW environment.

    but grow, via a set of scalable offerings, to provide a large EDW system that can support multiple business areas.

    an existing EDW. In this situation, an analytic platform offers the promise of deploying this so-called independent data mart solution at a lower cost and a shorter time to value. In the future, depending on business need, the data in the data mart may be integrated into an EDW. Many companies have learned from experience, however, that independent data marts may save time and money in the short term, but may prove more costly in the long term because data marts have a tendency to proliferate, which creates data consistency and data integration issues. As a result, many experts have a negative view of the independent data mart approach.

    c) The organization needs to support extreme processing where it is unnecessary or impractical to incorporate the data into an EDW. Six of the customers interviewed for this research report match this scenario. Depending on business need, the independent analytic solution may acquire data from an EDW to augment the analyses and may also replicate the processing results back into an EDW. Some independent analytic solutions may be experimental in nature

    the traditional data warehousing lifecycle. The extreme processing scenario, however, is relatively new and represents the biggest potential for business growth and exploitation of analytics. It is important, therefore, to look at extreme processing in more detail.

    impossible, for cost, performance, or data latency reasons, to load certain types of data (high-volume web event data, for example) into an EDW. In some cases it may not even be necessary. The application may involve data that only has a useful lifespan of a few days or weeks. Note, however, that these latter types of applications do not preclude the analytic results, or scored or aggregated data, from being stored in an EDW for use by other analytic applications.

    Another factor driving extreme processing is the nature of the analytical processing itself. BI users are

    detailed data as well as aggregated data. They are also building more complex analyses and more advanced predictive models. There is also an increasing demand by these users for enabling ad hoc


    Extreme data coupled with extreme analytical processing leads to the need for high performance and elastic scalability. In data-driven companies, many analytic applications are mission critical, and reliability and high availability are therefore also of great importance. Given constantly changing business requirements and data volumes, the analytic platform in these situations needs to support

    replace. These extreme needs require a new approach to data warehousing, and, in our opinion, this is the sweet spot for new and evolving analytic platforms. These analytic solutions do not replace the

    To use the term "independent data mart" to describe the underlying data store and data management system supporting extreme analytic application processing misrepresents this new breed of extreme analytic platform.

    Use Case 3: Filtering, Staging, and Transformation of Data

    environments involving high volumes of data and/or a wide variety of data sources and types of data.

    competitor to this approach.

    The processing of the data in this use case is typically done using an ELTL approach where the:

    • Extract

    • First load

    • Transform

    • Second load step loads the transformed data into the ADBMS or a remote DBMS for analytic processing.

    the detailed data and the aggregated results from the ELTL processing.
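The four ELTL steps can be sketched with two in-memory `sqlite3` databases, one playing the staging ADBMS and one the downstream analytic DBMS. The web-log records, table names, and helper functions are hypothetical; what the sketch shows is the ordering, with raw data landed first and the transformation performed by the database engine between the two loads.

```python
import sqlite3


def extract():
    # Extract: pull raw records from a source (hypothetical "<day> <page>" web-log lines).
    return [
        "2010-03-01 /home",
        "2010-03-01 /products",
        "2010-03-01 /home",
        "2010-03-02 /home",
    ]


def first_load(staging, lines):
    # First load: land the raw records, untransformed, in the staging database.
    staging.execute("CREATE TABLE raw_hits (day TEXT, page TEXT)")
    staging.executemany(
        "INSERT INTO raw_hits VALUES (?, ?)",
        [tuple(line.split(" ", 1)) for line in lines],
    )


def transform(staging):
    # Transform: let the database engine itself aggregate the detail rows.
    return staging.execute(
        "SELECT day, page, COUNT(*) FROM raw_hits "
        "GROUP BY day, page ORDER BY day, page"
    ).fetchall()


def second_load(target, summary):
    # Second load: move the transformed results into the analytic database.
    target.execute("CREATE TABLE daily_hits (day TEXT, page TEXT, hits INTEGER)")
    target.executemany("INSERT INTO daily_hits VALUES (?, ?, ?)", summary)


staging = sqlite3.connect(":memory:")  # plays the staging ADBMS
target = sqlite3.connect(":memory:")   # plays the downstream analytic DBMS
first_load(staging, extract())
second_load(target, transform(staging))
print(target.execute("SELECT * FROM daily_hits ORDER BY day, page").fetchall())
```

In practice the staging and target roles may be played by the same ADBMS, and the raw detail rows can be kept alongside the aggregates rather than discarded.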

    The analytic processing performed in this use case supports data transformation and aggregation,

    for example) is a strong candidate for this type of transformation.

    This use case also offers an alternative to using extreme processing. Instead of loading high-volume

    We can see from the research study survey results and customer interviews that analytic platforms


    circumstances caused them to move to, or consider, an analytic platform for supporting these use cases. The results are shown in Figure 7.

    Figure 7: What Issues Led You to Use an Analytic Platform?

    demonstrate that cost is not the main driving force behind using an analytic platform. This was

    reason for replacing its existing EDW solution with a new analytic platform.

    relationship to cost. Many of the customers interviewed for this report were building extreme analytic solutions, and the reasons they gave for choosing any given analytic platform all matched one or more of

    be built before. This was either because the application couldn't provide the required performance no matter how much hardware was employed or because the amount of hardware required to achieve acceptable performance was cost prohibitive.

    platforms provide cost-effective solutions that extend, rather than replace, the existing data warehousing environment. They enable applications that simply could not be built before.

    Getting Started

    Success in business is an elusive thing: solving today's question opens new possibilities and new

    the procurement staff is a signed contract; for business analysts the challenge is greater. The following are some thoughts on how to ensure that the platform selected works today and will continue to grow and evolve as your needs do.


    There are also best practices for the implementation of your analytics strategy that leverage the tools you acquire, and following them will make success more likely. The vendors and users we interviewed offered some valuable insights, and we include them here, together with a quick review of the success factors that make the difference.

    Selecting the Right Platform

    For analytic platform selection, there is only one place to start: understanding the analytics planned. The use cases in this study hint at the possibilities, but they also make it clear that there is enormous variety: in the skills and preferred tools of the users, the business problems being tackled, the types of analysis required, the latency of the data, and the user volumes. Will standard reporting be enough? Is ad hoc analysis with drill-down and slice-and-dice operations required? Do internal and external data need to be combined? Will the analyses involve data mining and predictive model building? Will temporary analytic data stores be set up, processed, and then torn down frequently? What are the current and future data volumes and numbers of users? Know these answers before you begin.

    however complex, can substitute for the one critical element: testing on your data, with your queries, on the hardware and software platform you plan to use, with the number of concurrent users that matches the expected usage patterns. As you decide who should be on your shortlist for actual tests, here are some key aspects to consider as you draw up your requirements.

    • Ensure that you can load data at the speed you

    • Working with your languages and tools

    skills, and answer more questions than you have before.

    • Supporting your toughest questions. If you did your homework, you should know the tough

    from the most mature on down.

    storage hardware, the speed of the interconnects, the level of pre-integration provided across the software stack. Be sure that setting up test or development systems is no more complex than you are comfortable with; analytics is an increasingly iterative process. Explore the possibility of doing such

    testing in the cloud to production on your hardware?

    involved don't seem knowledgeable, problems take a long time to resolve, and/or setup seems to take a long time, you must consider what it will be like when the check has been signed and the purchase made. Assess what services are available to you for design, training, and support. And be sure to

    leave some surprises. Do not conduct your trial on prearranged queries and analyses only. Stress-test workloads that mirror your expected ones in number of users, volumes of data, and other processes running if there will be any.
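A stress test of the kind recommended here needs little more than a harness that replays a query mix from many concurrent sessions and records latencies. The Python sketch below assumes a local SQLite file as the system under test, a two-query mix, and hypothetical names (`QUERIES`, `setup_db`, `simulated_user`, `stress_test`); in a real trial you would substitute the candidate platform's own driver, your own data, and your own queries, including a few the vendor has not seen in advance.

```python
import os
import sqlite3
import statistics
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical query mix; replace with your own representative queries.
QUERIES = [
    "SELECT region, SUM(amount) FROM sales GROUP BY region",
    "SELECT COUNT(*) FROM sales WHERE amount > 5000",
]


def setup_db(path, n_rows=10_000):
    # Build a small trial table; a real test would load your own data.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east" if i % 2 else "west", float(i)) for i in range(n_rows)],
    )
    conn.commit()
    conn.close()


def simulated_user(path, rounds):
    # Each simulated user has its own connection, like a separate client.
    conn = sqlite3.connect(path)
    latencies = []
    try:
        for _ in range(rounds):
            for sql in QUERIES:
                start = time.perf_counter()
                conn.execute(sql).fetchall()
                latencies.append(time.perf_counter() - start)
    finally:
        conn.close()
    return latencies


def stress_test(path, users, rounds):
    # Run all simulated users concurrently and summarize query latencies.
    with ThreadPoolExecutor(max_workers=users) as pool:
        results = pool.map(simulated_user, [path] * users, [rounds] * users)
        latencies = [t for per_user in results for t in per_user]
    return {
        "queries": len(latencies),
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "trial.db")
        setup_db(db_path)
        print(stress_test(db_path, users=8, rounds=5))
```

All simulated users here run as threads in one Python process; for a serious trial, driving the load from separate processes or machines gives a more realistic picture of concurrent clients.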

    Conclusions

    The right selection process involves understanding the likely analytical workloads, data volume and

    often, some candidates can't get it done at all.

    Support for open source data integration and BI tools, columnar data storage, and a complete hardware

    adopters and survey data show that query performance, support for complex analytics, and on-demand capacity are. As analytic platforms become mainstream, however, it's likely that ease of installation and support and aggressive data compression strategies will begin to grow in importance.

    and this development should drive increased awareness and growth. The analytic platform will drive billions of dollars in revenue in the next decade, and transform expectations about the ability to use data to improve business results.

    Appendix: Detailed Survey Results

    Q1:

    generating business analytics from that data, which offers price/performance and time to value superior to non-specialized offerings. This solution may be delivered as an appliance (software-only, packaged hardware and software, virtual image) and/or in a cloud-based software-as-a-service (SaaS) form.

    Value

    Yes

    No 14

    Q2: Value

    Already using

    No plans

    Q3:

    Value

    1 business year (or the past 4 complete quarters) or less

    Q4: Value

    No

    Yes, with hand-coded programs

    Yes, with packaged tools

    Q5: Which of the following use cases [architectural models] are being deployed for your analytic

    Value

    Enterprise data warehouse (EDW)

    Data staging area for a data warehouse

    Dependent data mart

    Independent data mart or data store

    7

    Q6:

    Value

    Need for complex analyses

    Need for on-demand capacity

    Growth in number of concurrent users

    Load times

    Availability and fault tolerance

    Archiving or backup times

    Q7:

    loaded into a data store before adding indexes, aggregate tables, materialized views and/or cubes built from the raw data.)

    Value

    Less than 1 terabyte

    18

    Q8: How much data are you managing on your analytic platform after loading, tuning, enhancing, and

    Value

    Less than 1 terabyte

    Q9:

    Value

    Q10:

    Value

    Structured legacy DBMS data 78

    Web logs

    Event or message data

    Data from enterprise service bus or web service

    11

    11

    Q11: Please rate the following features that were/are important in acquiring or planning your analytic

    Very Important / Somewhat Important / Not Very Important

    Good administration tools

    Fault tolerance and high availability

    Integration into IT environment

    Easy scaling & hardware upgrades

    Support for commercial DI/BI tools

    In-database processing

    Workload management

    Data compression

    Support for open source DI/BI tools

    Support for cloud computing

    In-memory data

    Q12:

    Value

    Fully

    No

    Q13:

    Value

    17

    Government

    Education

    Manufacturing/Industry (non-computer related)

    4

    Manufacturing consumer goods 4

    Aerospace

    Q14:

    Value

    18

    17

    11

    Q15:

    Value

    Business department

    Business division

    Q16: Please tell us where you and your company are located.

    Value

    North America

    Europe

    Latin America

    Where are you located?

    Vertica Overview and Business Description

    Vertica (www.vertica.com) led the way in combining two key technologies driving the new generation

    several other key executives as it makes the transition from promising startup to a leading player in its

    communications services, healthcare, social networking and online gaming, and retail and Web-based

    of its time-value (e.g., immediate availability), query performance and broad access by users (e.g., high concurrency). Vertica touts extreme load performance, concurrent and high-performance query

    extremely favorable comparisons on total cost of ownership, especially hardware costs; one early

    commodity-based platform running the same application.

    The Vertica Analytic Database is available as software-only, as a hardware-based appliance, as a virtual

    free trial version for download to press its case.

    Architecture

    As its name implies, Vertica was designed from the bottom up as a column-oriented storage and

    enhances performance dramatically. Making time to deployment a key value drove a focus on automatic database design: Vertica provides a physical design tool that generates and partitions data across nodes based on the input of a logical design, sample data, and sample queries. The output is an automatic

    various sort orders, encoding and compression, and the tool can be rerun to make incremental changes

    very frequently used together) automatically as well. The automation frees database staff to focus on

    strategies and results vary by data type: Vertica has found compression ratios for telco call data records

    Vertica's high availability strategy is based on what is known as k-safety redundancy (replication

    these replicas by storing the data in different sort orders for further performance improvements.
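The k-safety idea can be illustrated with a small Python sketch: every data segment is stored on k + 1 distinct nodes, so any k nodes can fail without losing data. The ring-style placement below (segment s on node s and the next k nodes) and the function names are assumptions for illustration only; this report does not describe how Vertica actually lays out its replicas, and the sketch ignores the different sort orders the replicas may use.

```python
from itertools import combinations


def place_segments(num_nodes: int, k: int) -> dict[int, set[int]]:
    """Map each data segment to the k+1 distinct nodes holding a copy.

    Hypothetical ring layout: segment s lives on node s and the next k nodes.
    """
    if not 0 <= k < num_nodes:
        raise ValueError("need 0 <= k < num_nodes")
    return {
        seg: {(seg + offset) % num_nodes for offset in range(k + 1)}
        for seg in range(num_nodes)
    }


def survives(placement: dict[int, set[int]], failed: set[int]) -> bool:
    """True if every segment still has at least one copy on a live node."""
    return all(nodes - failed for nodes in placement.values())


def is_k_safe(placement: dict[int, set[int]], num_nodes: int, k: int) -> bool:
    """Exhaustively check that ANY k simultaneous node failures lose no data."""
    return all(
        survives(placement, set(failed))
        for failed in combinations(range(num_nodes), k)
    )


placement = place_segments(num_nodes=6, k=1)
print(is_k_safe(placement, 6, 1))  # True: any single node can fail
print(is_k_safe(placement, 6, 2))  # False: two adjacent nodes hold both copies of a segment
```

The exhaustive check is only practical for small clusters, but it makes the definition explicit: a layout is k-safe exactly when no set of k failed nodes covers every copy of some segment.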

    Automated node recovery and shared-nothing architecture eliminate a single point of failure. System administration will soon sport a new user interface for Vertica's enhanced backup and disaster recovery

    Analytic Functionality

    one; analytic databases tend to be in more continuous use and require low latency. Vertica provides

    Differentiation

    in there yet, and some never will be. Vertica is trying to shift conversations to what kind of problem customers want to solve and focus on how fast, at what scale, and at what price. It wins against large incumbent, non-specialty databases, expects continued success against those that provide hardware as

    Partnerships

    for government networks.

    How Should Customers Start, and What Matters Most?

    Vertica points out that customers should not forget data cleansing and data quality challenges. It believes that the time it saves on database tuning and physical design issues should be spent on these

    challenging one.

    Future/Road Map Exploitation of Trends

    Vertica's expansion will focus on 4 core themes:

    In-database analytics

    (or subsets of data)

    Ease of use

    Vertica expects to see increased movement to the cloud. Its experience there dates back to its

    volume data transfers.

    Vertica Customer Case Study: Zynga

    Company Background

    The Business Problem

    had reached the point in its growth that it needed a BI system to bring together all of the data from its

    high priority by executive management.

    improvements to game content and would also help them in designing new games.

    system. Games are accessed from social networking sites and only played for a few minutes at a time.

    The Analytic Platform Solution

    company considered several competing analytic platforms and also looked at the possibility of using

    Another important factor was the ability to compress the data to reduce disk storage requirements.
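Why compression pays off so well in a column store can be sketched with run-length encoding (RLE), one common columnar technique. This is an illustrative assumption, not a description of Vertica's encodings, which the report does not detail, and the column values are hypothetical. Storing a low-cardinality column in sorted order turns it into a few long runs, each of which RLE keeps as a single (value, count) pair.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a column into (value, run_length) pairs."""
    return [(v, sum(1 for _ in run)) for v, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [v for v, n in pairs for _ in range(n)]


# Hypothetical low-cardinality column, e.g. which game each event came from.
pattern = ["game_a", "game_b", "game_a", "game_c",
           "game_b", "game_a", "game_b", "game_c"]
column = pattern * 1000

unsorted_runs = rle_encode(column)        # neighbors always differ: no savings
sorted_runs = rle_encode(sorted(column))  # sorting groups equal values together

print(len(column), len(unsorted_runs), len(sorted_runs))  # 8000 8000 3
```

Unsorted, every value differs from its neighbor, so RLE stores 8,000 pairs; sorted, the same 8,000 values collapse into three pairs. This is one reason the choice of sort order and the choice of encoding tend to be made together.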

    thing we learned from the evaluation process is that you really need to know your use cases up front in order to select the right technology.

    choose for these kind of environments if you want to achieve high performance and availability.

    Detailed data is loaded into the Vertica database in real time using in-house developed software. The

    noted that, Vertica's parallelism is ideally suited to this ELT approach because we can push the

    users for producing high-level reports and analyses.
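The ELT pattern described above (load the detailed data first, then transform it inside the database) can be sketched with Python's built-in sqlite3 module. sqlite3 is only a stand-in: it runs on a single node, so the sketch shows the ordering of the steps, not the parallel execution that makes the approach attractive on a platform such as Vertica. The table names, columns, and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw event records as they might arrive from game servers.
raw_events = [
    ("2010-06-01", "game_a", "u1", 30),
    ("2010-06-01", "game_a", "u2", 45),
    ("2010-06-01", "game_b", "u1", 12),
    ("2010-06-02", "game_a", "u1", 20),
]

conn = sqlite3.connect(":memory:")

# Extract + Load: land the detailed data, untransformed, in a staging table.
conn.execute(
    "CREATE TABLE raw_events (day TEXT, game TEXT, user_id TEXT, seconds INTEGER)"
)
conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?, ?)", raw_events)

# Transform: run the aggregation inside the database engine, not in client code.
conn.execute(
    """
    CREATE TABLE daily_game_summary AS
    SELECT day,
           game,
           COUNT(DISTINCT user_id) AS active_users,
           SUM(seconds)            AS total_seconds
    FROM raw_events
    GROUP BY day, game
    """
)
conn.commit()

for row in conn.execute("SELECT * FROM daily_game_summary ORDER BY day, game"):
    print(row)
# ('2010-06-01', 'game_a', 2, 75)
# ('2010-06-01', 'game_b', 1, 12)
# ('2010-06-02', 'game_a', 1, 20)
```

Because the raw detail stays in the database, new summaries can be derived later by running more SQL against `raw_events`, without reloading anything.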

    Implementation Considerations

    issues have now gone away and hardware reliability is now our main concern. To improve availability we have now installed a second hardware cluster.

    Following the initial installation, company growth and user adoption of the system caused a dramatic

    the implementation team became concerned about scaling the system to manage what were likely to be

    trials with several vendors. The results from these trials again led to the decision to use Vertica. So far

    Vertica on scalability needs.

    Benefits

    Metrics from the system impact and enhance every game we produce. The ability to scale was crucial to us being able to develop new metrics to better manage and grow our business. We feel there are few systems out there that can provide this level of scalability. The system is also a crucial underpinning to building new applications.

    Summary

    improve existing products and design new ones that provide customers with the experience they want

    performance requirements were key selection criteria, but the ability of the system to scale to meet

    About the Authors

    technology research for several years, before returning to his roots as an analyst covering the software

    Giga Information Group. Merv focused on facilitating collaborative research among analysts, and served

    those events, as a guitarist and singer.

    Colin White

    educator and writer, he is well known for his in-depth knowledge of data management, information integration, and business intelligence technologies and how they can be used for building the smart