8/12/2019 Integrated EDW Kimball
Essential Steps for the Integrated EDW
A Kimball Group White Paper
By Ralph Kimball
Table of Contents
Executive Summary
About the Author
What Does an Integrated Enterprise Data Warehouse (EDW) Deliver?
Drilling Across is the Ultimate Litmus Test for Integration
The Organizational Challenges of Providing an Integrated EDW
Conformed Dimensions and Facts
Using the Bus Matrix as a Way to Communicate with Executives
Managing the Backbone of the Integrated EDW
The Dimension Manager
The Fact Provider
Configuring BI Tools to Use the Integrated EDW
Advanced Topics
Conclusion
Essential Steps for the Integrated EDW. Copyright 2008 by Kimball Group. All rights reserved.
Executive Summary
In this white paper, we propose a specific architecture for building an integrated enterprise data warehouse (EDW). This architecture directly supports master data management efforts and provides the platform for consistent business analysis across the enterprise. We describe the scope and challenges of building an integrated enterprise data warehouse, and we provide detailed guidance for designing and administering the necessary processes that support integration. This white paper has been written in response to a lack of specific guidance in the industry as to what an integrated EDW actually is, and what necessary design elements are needed to achieve integration.
About the Author
Ralph Kimball founded the Kimball Group. Since the mid-1980s, he has been the data warehouse/business intelligence (DW/BI) industry's thought leader on the dimensional approach and has trained more than 10,000 IT professionals. Prior to working at Metaphor and founding Red Brick Systems, Ralph co-invented the Star workstation at Xerox's Palo Alto Research Center (PARC). Ralph has his Ph.D. in Electrical Engineering from Stanford University.
The Kimball Group is the source for dimensional DW/BI consulting and education, consistent with our best-selling Toolkit book series, Design Tips, and award-winning articles. Visit www.kimballgroup.com for more information.
What Does an Integrated Enterprise Data Warehouse (EDW) Deliver?
The mission statement for the integrated EDW is to provide the platform for business analysis to be applied consistently across the enterprise. Above all, this mission statement demands consistency across business process subject areas and their associated databases. Consistency requires detailed textual descriptions of entities such as customers, products, locations, and calendars to be applied uniformly across subject areas, using standardized data values. Of course, this is a fundamental tenet of master data management (MDM).
Consistency requires aggregated groupings such as types, categories, flavors, colors, and zones defined within entities to have the same interpretations across subject areas. This can be viewed as a higher level requirement on the textual descriptions described in the previous paragraph.
Consistency requires that constraints posed by BI applications which attempt to harvest the value of consistent text descriptions and groupings be applied with identical application logic across subject areas. For instance, constraining on a product category should always be driven from a field named Category found in the Product dimension.
Consistency requires that numeric facts are represented consistently across subject areas so that it makes sense to combine them in computations and compare them to each other, perhaps with ratios or differences. For instance, if Revenue is a numeric fact reported from multiple subject areas, then the definitions of each of these revenue instances must be the same.
Consistency requires that international differences in languages, location descriptions, time zones, currencies, and business rules be resolved to allow all of the above consistency requirements to be achieved!
Consistency requires that auditing, compliance, authentication, and authorization functions be applied in the same way across subject areas.
Finally, consistency implies coordination with industry standards for data content, data exchange, and reporting, where those standards impact the enterprise. Typical standards include ACORD (insurance), MISMO (mortgages), SWIFT and NACHA (financial services), HIPAA and HL7 (health care), RosettaNet (manufacturing), and EDI (procurement).
Drilling Across is the Ultimate Litmus Test for Integration
Even an EDW that meets all of the consistency requirements described above must additionally provide a mechanism for delivering integrated reports and analyses from BI tools, attached to many database instances, possibly hosted on remote, incompatible systems. We call this drilling across. Drilling across is the essential act of the integrated EDW. When we drill across, we gather results from separate business process subject areas and then align or combine these results into a single analysis.
For example, suppose our integrated EDW spans manufacturing, distribution and retail sales in a business that sells audio/visual systems. We'll assume that each of these subject areas is supported by a separate transaction processing system. A properly constructed drill across report could look like Figure 1.
Figure 1. A Three Fact Table Drill Across Report
The first two columns are row headers from the Product and Calendar conformed dimensions, respectively. The remaining three fact columns each come from separate databases, namely manufacturing, distribution, and retail sales. This deceptively simple report can only be produced in a properly integrated EDW. In particular, the Product and Calendar dimensions must be available in all three separate databases, and the Category and Period attributes within those dimensions must have identical contents and interpretations. Although the metrics in the three fact columns are different, the meaning of the metrics must be consistent across product categories and times.
You must understand and appreciate the tight constraints on the integrated EDW environment demanded by the above report. If you don't, you won't understand this white paper, and you won't have the patience to study the detailed steps described below. Or, to put the design challenge in other terms, if you eventually build a successful integrated EDW, you will have visited every issue in this paper. So, with those warnings, read on!
The Organizational Challenges of Providing an Integrated EDW
The integrated EDW deliverables described above are a daunting list indeed. But for these deliverables to even be possible, the enterprise must make a profound commitment, starting from the executive suite. The separate divisions of the enterprise must have a shared vision of the value of data integration, and they must anticipate the steps of compromise and decision making that will be required. This vision can only come from the senior executives of the enterprise, who must speak very clearly on the value of data integration.
Existing master data management projects provide an enormous boost for the integrated EDW, since presumably the executive team already understands and approves the commitment to building and maintaining master data. A good MDM resource greatly simplifies, but does not eliminate, the need for the EDW team to build the structures necessary for data warehouse integration.
In many organizations, a chicken-and-egg dilemma exists, as to whether MDM is required before an integrated EDW is possible, or whether the EDW team creates the MDM resources. Often, a low profile EDW effort to build conformed dimensions solely for data warehouse purposes morphs into a full-fledged MDM effort that is on the critical path to supporting mainline operational systems. In our classes since 1993, we have shown a backward pointing arrow leading from cleaned data warehouse data to operational systems. In the early days, we sighed wistfully and wished that the source systems cared about clean, consistent data. Now, more than fifteen years later, we seem to be getting our wish!
Conformed Dimensions and Facts
Since the earliest days of data warehousing, conformed dimensions have been used to consistently label and constrain separate data sources. We learned about conformed dimensions from A.C. Nielsen in 1983 when, at Metaphor Computer Systems, we brought Nielsen's syndicated scanner data together with product shipments data at consumer package goods companies. The idea behind conformed dimensions is very simple: two dimensions are conformed if they contain one or more common fields, whose contents are drawn from the same domains. That results in constraints and labels having the same content and meaning when applied against separate data sources.
Conformed facts are simply numeric measures that have the same business and mathematical interpretations so that they may be compared and computed against each other consistently. Using these names, we have taught the principles of conformed dimensions and conformed facts since 1993 in our books and articles.
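The definition of conformed dimensions can be made concrete with a small check. The following is only a sketch under stated assumptions: the tables, field names, and the `domains` mapping are hypothetical, and "drawn from the same domain" is interpreted here as "every value in both tables belongs to one agreed set of values".

```python
def conformed_attributes(dim_a, dim_b, domains):
    """Return shared fields whose values in both tables fall inside the agreed domain."""
    shared = set(dim_a[0]) & set(dim_b[0])
    conformed = []
    for field in sorted(shared):
        domain = domains.get(field)
        if domain is None:
            continue  # no agreed domain, so the field cannot be called conformed
        values = {row[field] for row in dim_a} | {row[field] for row in dim_b}
        if values <= domain:
            conformed.append(field)
    return conformed


# Hypothetical product dimensions held by two subject areas.
mfg_product = [
    {"category": "Audio", "plant_code": "P7"},
    {"category": "Video", "plant_code": "P9"},
]
retail_product = [
    {"category": "Audio", "shelf_zone": "A"},
    {"category": "Televisions", "shelf_zone": "C"},  # value outside the agreed domain
]
domains = {"category": {"Audio", "Video"}}

before = conformed_attributes(mfg_product, retail_product, domains)
retail_product[1]["category"] = "Video"  # retail adopts the agreed value
after = conformed_attributes(mfg_product, retail_product, domains)
```

In this sketch the private attributes (`plant_code`, `shelf_zone`) are ignored, which matches the point that only the common fields need to agree.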
Using the Bus Matrix as a Way to Communicate with Executives
When you combine the list of EDW subject areas with the notion of conformed dimensions, a powerful diagram emerges, which we call the enterprise data warehouse bus matrix. A typical bus matrix is shown in Figure 2.
Figure 2. A Bus Matrix for a Manufacturing EDW
The business process subject areas are shown along the left side of the matrix and the dimensions are shown across the top. An X marks where a subject area uses the dimension. Note that "subject area" in our vocabulary corresponds to a business process, typically revolving around a transactional data source. Thus "customer" is not a subject area.
At the beginning of an EDW implementation, this bus matrix is very useful as a guide, both to prioritize the development of separate subject areas, but also to identify the potential scope of the conformed dimensions. As we have often remarked, the columns of the bus matrix are the invitation list to the conformed dimension design meeting!
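As a rough illustration, a bus matrix can be held as a mapping from subject area to the set of dimensions it uses, and then read column by column. The subject areas and dimensions below are hypothetical placeholders and are not the contents of Figure 2.

```python
# Hypothetical bus matrix: rows are business process subject areas,
# and each set lists the dimensions marked with an X for that row.
bus_matrix = {
    "Purchase Orders": {"Date", "Product", "Vendor"},
    "Manufacturing Shipments": {"Date", "Product", "Warehouse"},
    "Retail Sales": {"Date", "Product", "Store", "Customer"},
}


def dimension_usage(matrix):
    """Invert the matrix: for each dimension (column), list the subject areas using it."""
    usage = {}
    for subject_area, dimensions in matrix.items():
        for dimension in dimensions:
            usage.setdefault(dimension, []).append(subject_area)
    return usage


def conformance_candidates(matrix):
    """Dimensions used by more than one subject area, which therefore need conforming."""
    usage = dimension_usage(matrix)
    return sorted(d for d, areas in usage.items() if len(areas) > 1)


candidates = conformance_candidates(bus_matrix)
invitation_list = dimension_usage(bus_matrix)["Product"]
```

Reading a column this way gives the subject areas whose stakeholders would attend the design meeting for that dimension.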
Before the conformed dimension design meeting occurs, this bus matrix should be presented to senior management, perhaps in exactly the form of Figure 2. Senior management must be able to visualize why these dimensions (master entities) attach to the various business process subject areas, and they must appreciate the organizational challenges of assembling the diverse interest groups together to agree on the conformed dimension content. If senior management is not interested in what the bus matrix implies, then to make a long story short, you have no hope of building an integrated EDW.
It is worth repeating the definition of a conformed dimension at this point to take some of the pressure off the conforming challenge. Two instances of a dimension are conformed if they contain one or more common fields, whose contents are drawn from the same domains. This means that the individual subject area proponents do not have to give up their cherished private descriptive attributes. It merely means that a set of master, universally agreed-upon attributes must be established. These master attributes then become the contents of the conformed dimension and become the basis for drilling across.
The Kimball Group books and our article and design tip archives contain a wealth of additional material on the steps of building the bus matrix for an enterprise and establishing conformed dimensions and facts. Please see www.kimballgroup.com.
Managing the Backbone of the Integrated EDW
The backbone of the integrated EDW is the set of conformed dimensions and conformed facts. Even if the enterprise executives support the integration initiative, and the conformed dimension design meeting goes well, there is a lot to the operational management of this backbone. This management can be visualized most clearly by describing two personality archetypes: the dimension manager and the fact provider. Briefly, the dimension manager is a centralized authority who builds and distributes a conformed dimension to the rest of the enterprise, and the fact provider is the client who receives and utilizes the conformed dimension, almost always while managing one or more fact tables within a subject area.
At this point in the white paper we must make three fundamental architectural claims to prevent false arguments arising that turn into distractions:
1) The need for dimension managers and fact providers arises solely from the natural re-use of dimensions across multiple fact tables (or OLAP cubes). Once the EDW community has committed to supporting cross-subject area analysis, there is no way to avoid all the steps described in this white paper!
2) Although we describe the handoff from the dimension manager to the fact provider as if it were occurring in a distributed environment where they are remote from each other, their respective roles and responsibilities are the same whether the EDW is fully centralized on a single machine or is profoundly distributed across many diverse machines in different locations.
3) The roles of dimension manager and fact provider, although obviously couched in dimensional modeling terms, do not arise from a particular modeling persuasion. All of the steps described in this white paper would be needed in a fully normalized environment. Actually, the management of primary, durable, and natural keys described later in this white paper is substantially more complicated in a normalized environment because of the need to propagate changing keys up and down the chain of linked normalized tables.
The next two sections describe the roles of the dimension manager and the fact provider.
The Dimension Manager
The dimension manager defines the content and structure of a conformed dimension, and delivers that conformed dimension to downstream clients known as fact providers. This role can definitely exist within an MDM framework, but the role is much more focused than just being the keeper of the single truth about an entity. The dimension manager has a list of deliverables and responsibilities, all oriented around creating and distributing physical versions of the dimension tables that represent the major entities of the enterprise. In many enterprises, key conformed dimensions include customer, product, service, location, employee, promotion, vendor, and calendar. In the following, as we describe the dimension manager's tasks, we will use customer as the example to keep the discussion from being too abstract. Here are the tasks of the customer dimension manager:
Define the content of the customer dimension. The dimension manager chairs the design meeting for the conformed customer dimension. At that meeting, all the stakeholders from the customer facing transaction systems come to agreement on a set of dimensional attributes that everyone will use when drilling across separate subject areas. Remember that these attributes are used as the basis for constraining and grouping customers. Typical conformed customer attributes include Type, Category, Location (multiple fields implementing an address), Primary Contact (name, title, address), First Contact Date, Credit Worthiness, Demographic Category, and others. Every customer of the enterprise appears in the conformed customer dimension.
Receive notification of new customers. The dimension manager is the keeper of the master list of dimension members, in this case customers. The dimension manager must be notified whenever a new customer is registered. In a full blown MDM environment, new customers should only be registered by using an MDM-supplied process which is under the direct control of the dimension manager. In a more modest data warehouse environment without a centralized MDM facility, each remote customer facing process has the potential for registering a new customer. In these cases, the dimension manager receives notifications of new customers after the fact. Without an MDM facility, the dimension manager is forced to maintain a list of natural keys of customers from each possible source. These natural keys are the only way to reliably distinguish a new customer from an old customer.
De-duplicate customer dimension. The dimension manager must de-duplicate the master list of customers. Customer lists in the real world are nearly impossible to de-duplicate completely. Even when customers are registered through a central MDM process, it is often possible to create duplicates, either for individual customers or business entities. The de-duplication problem is much worse when no central MDM resource exists, since the separate customer facing processes are by definition not well coordinated. Even worse, these separate customer facing processes may apply different business rules and have different database structures when collecting customer identity information.
Assigns unique durable key to each customer. The dimension manager must identify and keep track of a unique durable key for each customer. Many DBAs automatically assume that this is the natural key. But quickly choosing the natural key may be the wrong choice. A natural key may not be durable! Using our customer example, if there is any conceivable business rule that could change the natural key over time, then it is not durable. Also, in the absence of a formal MDM process, natural keys can arise from more than one customer facing process. In this case, different customers could have natural keys of very different formats. Finally, a source system's natural key may be a complex, multi-field data structure. For all these reasons, the dimension manager needs to step back from literal natural keys and assign a unique durable key that is completely under the control of the dimension manager. We recommend that this unique, durable key be a simple sequentially assigned integer, with no structure or semantics embedded in the key value. Note that the creation of such a unique, durable key does not preclude carrying original natural keys in the conformed dimension record, but of course this becomes complicated when there are multiple original sources registering customers, potentially with duplications.
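One possible shape for this bookkeeping is a registry that maps each (source system, natural key) pair to a sequentially assigned integer. This is a minimal sketch, not a prescribed design: the class name, method names, and source system labels are all hypothetical.

```python
import itertools


class DurableKeyRegistry:
    """Assign meaningless sequential integers as durable keys for customers."""

    def __init__(self):
        self._sequence = itertools.count(1)
        self._by_natural_key = {}  # (source_system, natural_key) -> durable key

    def register(self, source_system, natural_key):
        """Return the durable key for a natural key, assigning a new one if unseen."""
        identity = (source_system, natural_key)
        if identity not in self._by_natural_key:
            self._by_natural_key[identity] = next(self._sequence)
        return self._by_natural_key[identity]

    def link(self, source_system, natural_key, durable_key):
        """Record that a natural key from another source is an already known customer,
        for example after de-duplication has matched the two records."""
        self._by_natural_key[(source_system, natural_key)] = durable_key


registry = DurableKeyRegistry()
billing_key = registry.register("billing", "C-0042")
# A multi-field natural key from a second source is carried as a tuple.
crm_key = registry.register("crm", ("US", 9917))
# De-duplication decides the web account is the same customer as the billing one.
registry.link("web", "u77", billing_key)
web_key = registry.register("web", "u77")
```

Because the durable key carries no structure, natural keys of very different formats can sit side by side in the registry.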
Tracks time variance of customers with Type 1, 2, and 3 SCDs. The dimension manager must respond to changes in the conformed attributes describing a customer. Much has been written about tracking the time variance of dimension members using slowly changing dimensions (SCDs). A Type 1 change overwrites the changed attribute and therefore destroys history. A Type 2 change creates a new dimension record for that customer, properly time stamped as of the effective moment of the change. A Type 3 change creates a new field in the customer dimension that allows an alternate reality to be tracked. The dimension manager updates the customer dimension in response to change notifications received from various sources. See any of the Kimball Group books or our web site for an extensive discussion of SCDs.
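The three SCD responses can be sketched on an in-memory list of dimension rows. Everything here is an illustrative assumption: the column names (`customer_key`, `durable_key`, `row_start`, `row_end`), the half-open date interval, and the far-future end date are choices made for this sketch. The separate primary key on each row is the surrogate key discussed in the next task.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed marker for "still current"


def apply_type1(rows, durable_key, field, value):
    """Type 1: overwrite the attribute on every row for the customer; history is lost."""
    for row in rows:
        if row["durable_key"] == durable_key:
            row[field] = value


def apply_type2(rows, durable_key, field, value, effective):
    """Type 2: close the current row and append a new row with a new primary key."""
    current = next(r for r in rows
                   if r["durable_key"] == durable_key and r["row_end"] == HIGH_DATE)
    current["row_end"] = effective
    new_row = dict(current, **{field: value})
    new_row["customer_key"] = max(r["customer_key"] for r in rows) + 1
    new_row["row_start"] = effective
    new_row["row_end"] = HIGH_DATE
    rows.append(new_row)
    return new_row["customer_key"]


def apply_type3(rows, durable_key, field, value):
    """Type 3: keep the old value in a 'prior_<field>' column next to the new value."""
    for row in rows:
        if row["durable_key"] == durable_key:
            row["prior_" + field] = row[field]
            row[field] = value


customers = [{"customer_key": 1, "durable_key": 100, "city": "Austin",
              "credit": "B", "segment": "Retail",
              "row_start": date(2007, 1, 1), "row_end": HIGH_DATE}]
new_key = apply_type2(customers, 100, "city", "Boston", date(2008, 3, 1))
apply_type1(customers, 100, "credit", "A")
apply_type3(customers, 100, "segment", "Wholesale")
```

After these calls the customer has two rows sharing durable key 100, the old city survives on the closed row, and the old credit rating is gone from both.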
Assigns surrogate keys for the customer dimension. Type 2 is the most common and powerful of the SCD techniques since it provides precise synchronization of a customer description with that customer's transaction history. Since Type 2 creates a new record for the same customer, the dimension manager is forced to generalize the customer dimension primary key beyond the unique, durable key. The primary key should be a simple surrogate key, sequentially assigned as needed, with no structure or semantics in the key value. This primary key is separate from the unique durable key, which simply appears in the dimension as a normal field. The unique, durable key is the glue that binds the separate SCD2 records for a single customer together. See Figure 3 showing the complete recommended set of keys for the customer dimension, including natural, durable, and surrogate keys.
Figure 3. Recommended Key Structure For a Customer Dimension
Handles late arriving dimension data. When the dimension manager receives late notification of a Type 2 change affecting a customer, special processing is needed. A new dimension record must be created, and the effective dates of the changes adjusted. The changed attribute must be propagated forward in time through existing dimension records. Please see The Data Warehouse ETL Toolkit book [Wiley, 2004] for a complete description of these processing steps.
Provides version numbers for the dimension. Before releasing a changed dimension to the downstream fact providers, the dimension manager must update the dimension version number if Type 1 or Type 3 changes have occurred, or if late arriving Type 2 changes have occurred. The dimension version number does not change if only contemporary Type 2 changes have been made since the previous release of the dimension. We recommend embedding the dimension version number as a field in the dimension itself, where every record in the dimension contains the same version number value. In this way, all query tools and report writers attempting to drill across separate instances of the dimension can include the version number in the SQL SELECT list, and thereby automatically avoid aligning incompatible data from different dimension versions.
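The versioning rule above reduces to a small decision function. This is a sketch only: representing each change as a `(scd_type, is_late_arriving)` pair and naming the embedded field `dim_version` are assumptions made for illustration.

```python
def next_version(current_version, changes):
    """Bump the version for any Type 1 or Type 3 change, or any late arriving Type 2 change.

    `changes` is an iterable of (scd_type, is_late_arriving) pairs collected
    since the previous release of the dimension.
    """
    for scd_type, is_late_arriving in changes:
        if scd_type in (1, 3) or (scd_type == 2 and is_late_arriving):
            return current_version + 1
    return current_version


def stamp_version(rows, version):
    """Write the same version number into every record of the dimension."""
    for row in rows:
        row["dim_version"] = version


only_contemporary_type2 = next_version(7, [(2, False), (2, False)])
with_type1 = next_version(7, [(2, False), (1, False)])
with_late_type2 = next_version(7, [(2, True)])

release = [{"customer_key": 1}, {"customer_key": 2}]
stamp_version(release, with_type1)
```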
Adds private attributes to dimensions. The dimension manager must incorporate private departmental attributes in the release of the dimensions to the fact providers. These are attributes that are of interest to only a part of the EDW community, perhaps a single department. Paradoxically, these attributes must be part of the master dimension release so that such departments can use the attributes for constraining and grouping when performing drill across queries. If some of the private attributes have sensitive content, then other departments must be shielded from using these attributes via the authentication and authorization functions of the EDW.
Builds shrunken dimensions as needed. The dimension manager is responsible for building various shrunken dimensions that are needed by fact tables at high levels of granularity. For example, a customer dimension might be rolled up to Demographic Category to support a fact table that reports sales at this level. The dimension manager is responsible for creating this shrunken dimension and assigning its keys. Such a dimension cannot be created by defining a view on the lowest level customer dimension, since records in such a view would have to be drawn from the individual customer list, and these individual customers do not necessarily exist over all times. Thus a shrunken dimension must be a separate, independent dimension table with its own keys.
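A shrunken dimension maintained as its own table might be sketched as follows. The function and column names are hypothetical, and the policy shown (existing rows are never dropped and keep their keys, new values get the next integer) is one reasonable reading of "a separate, independent dimension table with its own keys".

```python
def extend_shrunken_dimension(existing_rows, observed_values,
                              attribute="demographic_category"):
    """Add rows for newly observed attribute values; never drop or re-key old rows."""
    known = {row[attribute] for row in existing_rows}
    next_key = max((row["demographic_key"] for row in existing_rows), default=0) + 1
    for value in sorted(set(observed_values) - known):
        existing_rows.append({"demographic_key": next_key, attribute: value})
        next_key += 1
    return existing_rows


# First load: categories seen in the customer dimension today.
demographic_dim = extend_shrunken_dimension([], ["Young Urban", "Retired", "Young Urban"])
# Later load: no current customer is "Young Urban", yet its row and key survive,
# so older fact rows that reference it still resolve.
demographic_dim = extend_shrunken_dimension(demographic_dim, ["Retired", "Suburban Family"])
```

A view over the current customer list would have lost the "Young Urban" row in the second load, which is the failure the text warns about.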
Replicates dimensions to fact providers. The dimension manager periodically replicates the dimension and its shrunken versions to all the downstream fact providers. All the fact providers should attach the new dimensions to their fact tables at the same time, especially if the version number has changed.
Documents and communicates changes. The dimension manager maintains metadata and documentation describing all the changes made to the dimension with each release.
Coordinates with other dimension managers. Although each conformed dimension can be administered separately, it makes sense for the dimension managers to coordinate their releases to lessen the impact on the downstream fact providers.
The Fact Provider
The fact provider sits downstream from the dimension manager and responds to each release of each dimension that is attached to a fact table under the provider's control.
Avoids changes to conformed attributes. The fact provider must not alter the values of any conformed dimension attributes, or the whole logic of drilling across diverse subject areas will be corrupted.
Responds to late arriving dimension updates. When the fact provider receives late arriving updates to a dimension, the primary keys of the newly created dimension records must be inserted into all fact tables using that dimension whose time spans overlap the date of the change. If these newly created keys are not inserted into the affected fact tables, then the new dimension record will not tie to the transactional history. The new dimension key must overwrite existing dimension keys in the affected fact tables from the time of the dimension change up to the next dimension change that was already correctly administered. This process is described in more detail in The Data Warehouse ETL Toolkit.
Ties conformed dimension release to local dimension. The dimension manager must provide to the fact provider a mapping that ties the fact provider's local natural key to the primary surrogate key assigned by the dimension manager. In the surrogate key pipeline (see below), the fact provider replaces the local natural keys in the relevant fact tables with the conformed dimension primary surrogate keys using this mapping.
Processes dimensions through surrogate key pipeline. The fact provider converts the natural keys attached to contemporary transaction records into the correct primary surrogate keys, and loads the fact records into the final tables with these surrogate keys.
Handles late arriving facts. The surrogate key pipeline described in the previous paragraph can be implemented in two different ways. Traditionally, the fact provider maintains a current key lookup table for each dimension that ties the natural keys to the contemporary surrogate keys. This works for the most current fact table data where you can be sure that the contemporary surrogate key is the one to use. But the lookup tables cannot be used for late arriving fact data since it is possible that one or more old surrogate keys must be used. In this traditional approach, the fact provider must revert to an inefficient dimension table lookup in order to figure out which old surrogate key applies.
A more modern approach to the surrogate key pipeline implements a dynamic cache of records looked up in the dimension table rather than a separately maintained lookup table. This cache handles contemporary fact records as well as late arriving fact records with a single mechanism. See The Data Warehouse ETL Toolkit book for more details.
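One way to read the dynamic cache idea is a lookup that pulls a customer's full row history from the dimension on first use and then resolves any transaction date against it. This is a sketch under assumptions, not the mechanism from the ETL Toolkit: the class name, the column names, and the half-open `[row_start, row_end)` interval are all illustrative.

```python
from datetime import date


class SurrogateKeyCache:
    """Resolve (natural key, transaction date) to the dimension row in effect on that date."""

    def __init__(self, dimension_rows):
        self._rows = dimension_rows
        self._cache = {}  # natural key -> sorted list of (row_start, row_end, surrogate key)

    def _versions(self, natural_key):
        # Read the dimension once per natural key, then reuse the cached history.
        if natural_key not in self._cache:
            self._cache[natural_key] = sorted(
                (row["row_start"], row["row_end"], row["customer_key"])
                for row in self._rows
                if row["natural_key"] == natural_key
            )
        return self._cache[natural_key]

    def key_for(self, natural_key, transaction_date):
        for row_start, row_end, surrogate_key in self._versions(natural_key):
            if row_start <= transaction_date < row_end:
                return surrogate_key
        raise LookupError(f"no dimension row for {natural_key!r} on {transaction_date}")


customer_dim = [
    {"customer_key": 1, "natural_key": "C-0042",
     "row_start": date(2007, 1, 1), "row_end": date(2008, 3, 1)},
    {"customer_key": 2, "natural_key": "C-0042",
     "row_start": date(2008, 3, 1), "row_end": date(9999, 12, 31)},
]
cache = SurrogateKeyCache(customer_dim)
contemporary_key = cache.key_for("C-0042", date(2008, 6, 1))  # fact dated after the change
late_key = cache.key_for("C-0042", date(2007, 11, 15))        # late arriving fact
```

The same call serves both cases, so the pipeline needs no separate path for late arriving facts.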
Synchronizes dimension releases with other fact providers. It is critically important for all the fact providers to respond to dimension releases at the same time. Otherwise a client application attempting to drill across subject areas will encounter dimensions with different version numbers. See the description of using dimension version numbers in the last paragraph of this white paper.
Configuring BI Tools to Use the Integrated EDW
There is no point in going to all the trouble of setting up dimension managers, fact providers, and conformed dimensions if you aren't going to perform drill across queries. In other words, you need to sort-merge separate answer sets on the row headers defined by the values from the conformed dimension attributes. There are many ways to do this in standard BI tools, and in straight SQL.
Mechanism for drill across. In SQL a drill across query bringing data from manufacturing shipments and retail sales could be implemented as follows:

    SELECT Mfg.ProductCategory, Mfg.Year, Mfg_Amount, Sales_Amount
    FROM
    -- Subquery Mfg returns total shipments from Manufacturing
    (SELECT Category AS ProductCategory, Year, SUM(Ship_Amount) Mfg_Amount
     FROM Mfg_Shipments A
     INNER JOIN Product C ON A.Product_Key = C.Product_Key
     INNER JOIN Date D ON A.Sales_Date_Key = D.Date_Key
     GROUP BY Category, Year) Mfg
    INNER JOIN
    -- Subquery Sales returns total sales from the Sales database
    (SELECT ProdCat_Name AS ProductCategory, Year, SUM(Amount) Sales_Amount
     FROM Sales_fact F
     INNER JOIN Product C ON F.Product_Key = C.Product_Key
     INNER JOIN Date D ON F.Sales_Date_Key = D.Date_Key
     GROUP BY ProdCat_Name, Year) Sales
    -- Join condition for our small result sets
    ON Mfg.ProductCategory = Sales.ProductCategory
    AND Mfg.Year = Sales.Year
This should perform almost as fast as doing the two individual queries against the separate fact tables because the join is on a relatively small subset of data that's already in memory.
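To try the drill across pattern end to end, the query shape can be run against a throwaway SQLite database from Python. This is a self-contained sketch with assumptions: the table and column names are simplified stand-ins, the sample rows are invented, and both fact tables live in one in-memory database purely for convenience, whereas the white paper expects them in separate databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE calendar (date_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE mfg_shipments (product_key INTEGER, date_key INTEGER, ship_amount INTEGER);
CREATE TABLE sales_fact (product_key INTEGER, date_key INTEGER, amount INTEGER);
INSERT INTO product VALUES (1, 'Audio'), (2, 'Video');
INSERT INTO calendar VALUES (20070101, 2007), (20080101, 2008);
INSERT INTO mfg_shipments VALUES (1, 20070101, 100), (1, 20070101, 50), (2, 20080101, 70);
INSERT INTO sales_fact VALUES (1, 20070101, 30), (2, 20080101, 20), (2, 20080101, 5);
""")

# Each subquery aggregates one fact table to the conformed row headers
# (category, year); the outer join then aligns the two small answer sets.
DRILL_ACROSS = """
SELECT mfg.category, mfg.year, mfg.mfg_amount, sales.sales_amount
FROM (SELECT p.category, c.year, SUM(f.ship_amount) AS mfg_amount
      FROM mfg_shipments f
      JOIN product p ON f.product_key = p.product_key
      JOIN calendar c ON f.date_key = c.date_key
      GROUP BY p.category, c.year) AS mfg
JOIN (SELECT p.category, c.year, SUM(f.amount) AS sales_amount
      FROM sales_fact f
      JOIN product p ON f.product_key = p.product_key
      JOIN calendar c ON f.date_key = c.date_key
      GROUP BY p.category, c.year) AS sales
  ON mfg.category = sales.category AND mfg.year = sales.year
ORDER BY mfg.category, mfg.year
"""

rows = conn.execute(DRILL_ACROSS).fetchall()
conn.close()
```

Both subqueries read the same `category` column here because the product dimension is conformed; the join itself only touches the two aggregated answer sets.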
Uses dimension version numbers where sort-merge (outer join) is supported by BI tool in drill across queries. A properly instrumented BI tool that sort-merges the final separate answer sets that compose a drill across query can provide valuable protection against erroneous results that come from accessing conformed dimensions that have different version numbers. If the BI tool does include the version number in the SELECT list, and the results are sort-merged (outer joined) then the results from the fact table queries will end up on separate rows of the answer set, properly labeled by the dimension version. This isn't much consolation to the end user, but at least the problem is diagnosed in an obvious way.
In Figure 4 we show a report drilling across the same three databases as in Figure 1, but where a dimension version mismatch occurs. Perhaps the definition of certain product categories has been adjusted between product dimension version 7 and version 8. In this case, the retail sales fact table is using version 8 whereas the other two fact tables are still using version 7. By including the product dimension version attribute in the SQL SELECT list, we automatically avoid merging potentially incompatible data. Such an error would be particularly insidious because without the rows being separated, the result would look perfectly reasonable, but it could be disastrously misleading.
Figure 4. A Drill Across Report With a Dimension Version Mismatch
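The row separation described for Figure 4 falls out naturally when the dimension version is part of the row header used for the sort-merge. The sketch below imitates that merge in plain Python; the answer sets, amounts, and column names are invented for illustration and are not the values in the figure.

```python
def sort_merge(answer_sets):
    """Full outer join of answer sets on their row headers.

    `answer_sets` maps a fact column name to {row_header_tuple: value}.
    Missing values are reported as None.
    """
    columns = list(answer_sets)
    headers = sorted(set().union(*(answers.keys() for answers in answer_sets.values())))
    return [header + tuple(answer_sets[c].get(header) for c in columns)
            for header in headers]


# Row header is (category, period, product dimension version).
answers = {
    "mfg_shipments": {("Audio", "2008 Q1", 7): 120},
    "distribution":  {("Audio", "2008 Q1", 7): 110},
    "retail_sales":  {("Audio", "2008 Q1", 8): 95},  # already on version 8
}
report = sort_merge(answers)
```

With the version in the header, the retail figure lands on its own row instead of being silently aligned with the version 7 figures.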
Advanced Topics
In this section we describe special refinements to the challenge of EDW integration that are beyond the basic steps presented in the previous sections.
Fact provider implements local SCDs in addition to conformed SCDs. A tricky problem occurs when a locally provided dimension attribute undergoes a change at a different time than any changes downloaded from the dimension manager. This is logically equivalent to handling late arriving dimensions, but requires the fact provider to create a surrogate key for the dimension that will not be used by the dimension manager. The dimension manager may need to partition the key space to assign a band of keys to the fact provider for this purpose.
Dimension managers and fact providers resolve international representation differences. A truly international EDW presents many challenges, which are explored in significant detail in The Data Webhouse Toolkit (Kimball and Merz, Wiley, 2000). These challenges include:
Foreign alphabets and character sets. Many of the international display and printing problems in an international EDW require being able to represent foreign characters, including not just the accented characters from western European alphabets, but Cyrillic, Arabic, Japanese, Chinese, and dozens of other less familiar writing systems. It is important to understand that this is not a font problem. This is a character set problem. A font is simply an artist's rendering of a set of characters. There are hundreds of fonts available for standard English. But standard English has a relatively small character set that is enough for anyone's use unless you are a professional typographer. This small character set is usually encoded in ASCII (American Standard Code for Information Interchange), a 7-bit encoding with 128 possible characters that is normally stored in 8-bit bytes, whose extended forms allow up to 256 values. Only about 100 of these characters have a standard interpretation that can be invoked from a normal English keyboard, but this is usually enough for English speaking computer users. It should be clear, though, that ASCII is woefully inadequate for representing the thousands of characters needed for non-English writing systems. An international body of system architects, the Unicode Consortium, has defined a standard known as Unicode for representing characters and alphabets in almost all of the world's languages and cultures. Their work can be accessed on the web at www.unicode.org. The primary use of Unicode has been a 16-bit encoding with 65,536 possible code values, although the standard's full code space is larger and is reached through encodings such as UTF-8 and UTF-16. The Unicode Standard, version 5.0, which is the published version of Unicode as of the writing of this white paper, now covers the principal written languages of the Americas, Europe, the Middle East, Africa, India, Asia, and Pacifica.
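The difference between a character set problem and a font problem is easy to demonstrate: text outside the ASCII repertoire cannot be encoded as ASCII at all, regardless of how it is displayed. The short sketch below uses only Python's built-in string encoding; the sample words are arbitrary choices for illustration.

```python
def fits_ascii(text):
    """True if every character of the text is in the ASCII character set."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


samples = {
    "English": "Boston",
    "Hungarian": "Pécs",
    "Russian": "Москва",
    "Japanese": "東京",
}

report = {
    language: {
        "ascii": fits_ascii(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_bytes": len(text.encode("utf-16-le")),
    }
    for language, text in samples.items()
}

# A Unicode encoding round-trips every sample without loss.
round_trips = all(t.encode("utf-8").decode("utf-8") == t for t in samples.values())
```

Storage size depends on the encoding chosen, so column widths in the warehouse should be planned in bytes or in characters deliberately, not by assumption.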
Addresses and their extensions to locations and maps. Names and addresses are the most difficult and far reaching international design problem in the international EDW. Toby Atkinson has written a remarkable book describing the intricacies of international names and addresses. In his Merriam-Webster's Guide to International Business Communications (Merriam-Webster, 1999) he gives the following example. Suppose you have a name and address like the following:

    Sándor Csilla
    Nemzetközi Kiadó Kft
    Rákóczi u. 73
    7626 PÉCS

Are you prepared to store this in a database? Is this a postally valid address? Does this represent a person or a company? Male or female?
Would the recipient be insulted by anything about this? Can your system parse it to determine the precise geographic locale? What salutation would be appropriate if you were greeting this entity in a letter or on the telephone? What is going to happen to the various special characters when it is printed? Can you even enter these characters from your various keyboards? If your EDW contains information about people or businesses located in multiple countries, then you need to plan carefully for a complete system spanning data input, transaction processing, address label and mailing production, real time customer response systems, and your marketing oriented data warehouse.
Numbers. Numbers are represented differently in different cultures. The number 100.456 is slightly larger than one hundred in the United States, but slightly larger than one hundred thousand in Germany. In India, a large number may be written as 23,34,789, since digits may be grouped by twos after the first group of three. In India, a lakh represents 100,000 and a crore represents 10,000,000. Other countries use periods, commas, and even apostrophes to separate the digits. An international EDW must be able to read and write numbers correctly, given an assigned cultural context.
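As a minimal sketch of reading and writing numbers under an assigned cultural context, the following Python fragment parses the same written text under two conventions and formats an integer with Indian digit grouping. The context codes, the SEPARATORS table, and the function names are assumptions made for this illustration; a production system would more likely rely on a maintained locale library.

```python
from decimal import Decimal

# Separator conventions per cultural context. The context codes are
# hypothetical labels for this sketch, not a standard locale registry.
SEPARATORS = {
    "US": {"group": ",", "decimal": "."},
    "DE": {"group": ".", "decimal": ","},
}


def parse_number(text: str, context: str) -> Decimal:
    """Read a written number using the separators of the given context."""
    sep = SEPARATORS[context]
    plain = text.replace(sep["group"], "").replace(sep["decimal"], ".")
    return Decimal(plain)


def format_indian(value: int) -> str:
    """Group digits Indian style: a final group of three, then twos."""
    digits = str(abs(value))
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while head:
        groups.insert(0, head[-2:])  # peel off two digits at a time
        head = head[:-2]
    groups.append(tail)
    sign = "-" if value < 0 else ""
    return sign + ",".join(groups)
```

With these definitions, parse_number("100.456", "US") and parse_number("100.456", "DE") return different values from identical text, which is why the cultural context has to travel with the data.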
Telephone Numbers. Telephone numbers, like postal addresses, have two basic representations. One is for domestic consumption, and one is for international use. To make matters worse, the international version is often interpreted in a different way by each international observer. A telephone number (randomly created for illustrative purposes) in South Africa, for example, is written as
021-222-3333
but must be dialed from the United States as
011-27-21-222-3333.
The leading 011 is the way the United States dials international numbers. This will not be the same in other countries.
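One way to handle both representations is to store the components of the number separately and render the dialing string on demand. The Python sketch below assumes a hypothetical PhoneNumber structure and a two-entry EXIT_PREFIX table introduced only for illustration; a real table of exit prefixes would come from a maintained source.

```python
from dataclasses import dataclass
from typing import Tuple

# International exit prefixes by calling country. Only two entries are
# listed for illustration.
EXIT_PREFIX = {"US": "011", "GB": "00"}


@dataclass(frozen=True)
class PhoneNumber:
    country_code: str        # e.g. "27" for South Africa
    trunk_prefix: str        # dialed only on domestic calls, e.g. "0"
    parts: Tuple[str, ...]   # national number, split for display

    def domestic(self) -> str:
        """Render the number as dialed inside its own country."""
        first, *rest = self.parts
        return "-".join([self.trunk_prefix + first, *rest])

    def dialed_from(self, calling_country: str) -> str:
        """Render the number as dialed from another country."""
        prefix = EXIT_PREFIX[calling_country]
        return "-".join([prefix, self.country_code, *self.parts])


# The randomly created South African number from the text.
number = PhoneNumber("27", "0", ("21", "222", "3333"))
```

Here number.domestic() gives 021-222-3333 and number.dialed_from("US") gives 011-27-21-222-3333, so neither rendering needs to be stored as the master value.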
Currencies. Multinational businesses often book transactions, collect revenues, and pay expenses in many different currencies. A good basic design for all of these situations is shown in Figure 5.
Figure 5. A Multinational Fact Table
The primary amount of the transaction is represented in the local currency.
In some sense, this is always the correct value of the transaction. For easy reporting purposes, a second field included in the transaction fact record expresses the same amount in a single standard currency, such as United States dollars. The conversion between the two amounts is a basic design decision for the fact table; it might be, for example, an agreed-upon daily spot rate for converting the local currency into the global currency. Now all transactions in a single currency can be added up easily from the fact table by constraining the currency dimension to a single currency type. Transactions from around the world can easily be added up by summing the standard currency field. Note that currencies and countries are closely correlated, but they are not the same. Countries may change the identity of their currency during periods of severe inflation.
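The two-amount design can be sketched in a few lines of Python. The record layout, the function names, and especially the spot rates below are hypothetical values chosen for illustration; they do not reproduce Figure 5 or any real exchange rate.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class TransactionFact:
    currency: str             # key into the currency dimension
    local_amount: Decimal     # amount in the currency of the transaction
    standard_amount: Decimal  # same amount converted at the agreed rate


# Hypothetical daily spot rates into US dollars, for illustration only.
SPOT_RATE_TO_USD = {
    "USD": Decimal("1"),
    "EUR": Decimal("1.50"),
    "JPY": Decimal("0.01"),
}


def make_fact(currency: str, local_amount: str) -> TransactionFact:
    """Build a fact row, converting once at load time."""
    local = Decimal(local_amount)
    return TransactionFact(currency, local, local * SPOT_RATE_TO_USD[currency])


def total_local(facts, currency: str) -> Decimal:
    """Sum local amounts after constraining to a single currency."""
    return sum((f.local_amount for f in facts if f.currency == currency), Decimal("0"))


def total_standard(facts) -> Decimal:
    """Sum all transactions worldwide in the standard currency."""
    return sum((f.standard_amount for f in facts), Decimal("0"))


facts = [
    make_fact("EUR", "100"),
    make_fact("EUR", "50"),
    make_fact("JPY", "20000"),
    make_fact("USD", "75"),
]
```

Summing local_amount without the currency constraint would mix euros, yen, and dollars, which is why the unconstrained total uses only the standard currency field.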
Time of Day. The calculation of the true wall clock time in a given location around the world is surprisingly complicated. Most people think there are 24 time zones, corresponding to the 24 possible hours per day. But with even a little foreign travel experience, one begins to realize that the situation is much more complex. The entire country of India, for instance, sits in between these hour boundaries, since it is 5.5 hours ahead of Greenwich Mean Time. The rules of when various locations go on and off daylight saving time are amazingly intricate. Until 2006, for example, parts of Indiana went on daylight saving time and other parts did not. The dates when daylight saving time goes into effect vary by location. The time difference between London, England and Sydney, Australia can vary by as much as two hours, depending on the time of year. In reality, there are more than 500 time zones in the world, and the list is constantly changing. The complexity of time zone calculations makes it clear that one cannot embed time zone assumptions in the code of applications or fixed queries. It is also pretty clear that each IT organization should not re-invent the wheel and derive all the time zone rules independently. Fortunately, the web comes to our rescue. A number of time zone conversion services, such as www.timezoneconverter.com, are available on-line that have up-to-date databases reflecting all the complexities of time zone calculations.
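As an illustration of delegating the rules to a maintained source rather than hard-coding offsets, the Python sketch below uses the standard zoneinfo module (Python 3.9 and later), which reads the IANA time zone database installed on the system. This is a substitute for the web services mentioned above, not something the paper prescribes; the function names and sample dates are assumptions.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # requires time zone data on the system


def noon_utc(year: int, month: int, day: int) -> datetime:
    """Return a timezone-aware instant at 12:00 UTC on the given date."""
    return datetime(year, month, day, 12, tzinfo=timezone.utc)


def utc_offset_hours(zone: str, when_utc: datetime) -> float:
    """Return the zone's offset from UTC, in hours, at that instant."""
    offset = when_utc.astimezone(ZoneInfo(zone)).utcoffset()
    return offset.total_seconds() / 3600


def hours_ahead(zone_a: str, zone_b: str, when_utc: datetime) -> float:
    """Return how many hours zone_a is ahead of zone_b at that instant."""
    return utc_offset_hours(zone_a, when_utc) - utc_offset_hours(zone_b, when_utc)


# Sydney versus London on three dates in 2024.
SYDNEY_VS_LONDON = [
    hours_ahead("Australia/Sydney", "Europe/London", noon_utc(2024, 1, 15)),
    hours_ahead("Australia/Sydney", "Europe/London", noon_utc(2024, 4, 3)),
    hours_ahead("Australia/Sydney", "Europe/London", noon_utc(2024, 7, 15)),
]
```

The three results differ because the two cities change their clocks on different dates and in opposite directions, which is exactly the kind of rule that should not be embedded in application code.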
Calendars. Each country has a unique list of holidays. In many cases the holidays do not occur on the same day in successive years. Some holidays, such as Easter, are based on very complex rules that involve the phases of the moon or other events. Some religious holidays are not celebrated on the same day in various parts of the same country. Holidays are so complicated that it probably does not make sense to try to define them more than ten or twenty years into the future. Thus, much as with time zones, the technical definition of holidays in the EDW needs to be driven from a service. At the time of this writing, some of the best publicly available sources of international holiday definitions can be found on the web by searching Google for "international holiday calendar."
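To give a sense of how intricate a single holiday rule can be, the following Python sketch computes the date of Western (Gregorian) Easter using a widely published arithmetic method often called the anonymous Gregorian algorithm. It is offered only as an illustration of the complexity; it does not cover Orthodox Easter or any other holiday, and it is not a replacement for the holiday service recommended above. The function name is an assumption.

```python
from datetime import date


def western_easter(year: int) -> date:
    """Return the date of Gregorian Easter Sunday for the given year."""
    a = year % 19                      # position in the 19-year lunar cycle
    b, c = divmod(year, 100)           # century and year within century
    d, e = divmod(b, 4)                # leap-century corrections
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30  # days to the paschal full moon
    i, k = divmod(c, 4)
    weekday_shift = (32 + 2 * e + 2 * i - h - k) % 7  # days to next Sunday
    m = (a + 11 * h + 22 * weekday_shift) // 451
    month, day_index = divmod(h + weekday_shift - 7 * m + 114, 31)
    return date(year, month, day_index + 1)
```

Even this compact rule depends on lunar cycles and century corrections, and every country layers its own civil and religious holidays on top of it.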
Reports, printing, and collating sequences. An interesting issue in multinational reporting is how to prepare a set of consistent reports for managers across such an organization in different languages. There are three basic issues that must be dealt with simultaneously: sorting (collating), grouping, and conforming.
Many language systems sort their special characters in a unique way.
Atkinson's book discusses the specific rules for sorting in Catalan, Czech, Danish, Finnish, German, Hungarian, Norwegian, Polish, Slovenian, Spanish, Swedish, and Turkish. And these are only languages using the Roman alphabet. A report could sort the same set of customer names differently in different languages.
Great care must be taken if a set of attributes in a dimension is translated from one language to another. For instance, if the category and department names for a large number of products are translated into more than one language, then the cardinality and the detailed many-to-many and many-to-one relationships must be identical between the two language versions of the dimension, or else the use of an attribute from the dimension as a row header (grouping criterion) will not produce the same results in the separate languages. Because the maintenance of two language versions of a large dimension table would be so subtle and difficult, we recommend against this approach.
If the same dimension table has several language versions in different countries, then it may be impossible to conform data sources across these versions, because at an SQL query level, the row headers of the separate answer sets in different languages could not be matched.
If we assume that we want a set of reports to span multiple languages, then we recommend implementing a two-layer architecture. In the lower layer, we store all data and produce all reports from a single base language system. In the upper layer, the finished report is augmented with translations in auxiliary reporting columns. These auxiliary reporting columns do not affect sorting, grouping, or the ability to conform reports across data sources located in different countries. If we adopt this approach, managers from different countries should be able to sit in the same room with their own versions of the same reports, but be able to understand each other's reports and compare them.
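The effect of language-specific collation can be shown with a simplified Python sketch. The two sort keys below are deliberately reduced approximations written for this illustration: the German key folds umlauts to their base letters in the dictionary style, and the Swedish key places å, ä, and ö after z. Neither is a complete implementation of either language's collation rules, and the sample names are assumptions.

```python
# Dictionary-style German folding: umlauts sort with their base letters.
GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})

# Swedish alphabet order: å, ä, ö are separate letters after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"


def german_key(name: str) -> str:
    """Approximate German dictionary ordering."""
    return name.lower().translate(GERMAN_FOLD)


def swedish_key(name: str) -> list:
    """Approximate Swedish ordering by position in the Swedish alphabet."""
    unknown = len(SWEDISH_ALPHABET)  # anything else sorts last
    return [
        SWEDISH_ALPHABET.index(ch) if ch in SWEDISH_ALPHABET else unknown
        for ch in name.lower()
    ]


names = ["Zetterlund", "Öberg", "Olsen"]
german_order = sorted(names, key=german_key)
swedish_order = sorted(names, key=swedish_key)
```

The same three names come out in two different orders, so a report sorted in one language cannot be assumed to line up row by row with the same report sorted in another.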
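A minimal Python sketch of the two-layer idea follows. The sales rows, the translation table, and the function names are hypothetical and exist only for illustration. The lower layer groups and sorts on the base-language attribute; the upper layer appends a translated label as an extra column without touching the grouping or the order.

```python
from collections import defaultdict

# Lower layer: facts keyed by base-language (English) category names.
SALES = [("Beverages", 120), ("Snacks", 80), ("Beverages", 30), ("Dairy", 50)]

# Upper layer: hypothetical translations used only as display columns.
TRANSLATIONS = {
    "de": {"Beverages": "Getränke", "Snacks": "Knabbereien", "Dairy": "Milchprodukte"},
    "fr": {"Beverages": "Boissons", "Snacks": "Collations", "Dairy": "Produits laitiers"},
}


def base_report(rows):
    """Group and sort in the base language only."""
    totals = defaultdict(int)
    for category, amount in rows:
        totals[category] += amount
    return sorted(totals.items())


def augment(report, language: str):
    """Add an auxiliary translated column; row order is left unchanged."""
    labels = TRANSLATIONS[language]
    return [(category, labels.get(category, category), total)
            for category, total in report]


report = base_report(SALES)
german_report = augment(report, "de")
french_report = augment(report, "fr")
```

Because both augmented reports keep the base-language row header in the same position and order, rows can be matched across languages even though the displayed labels differ.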
Dimension managers and fact providers ensure that auditing, compliance, authentication, authorization, and usage tracking functions are applied uniformly for all BI clients. This set of responsibilities is especially challenging since they are outside the scope of the steps described in this white paper. A centralized MDM resource may standardize clients' direct access to master data, such as customer. But such direct access probably occurs over an enterprise service bus (ESB), perhaps implemented on a service-oriented architecture (SOA) framework. Direct access to the MDM resource is very different from using a customer dimension in a BI report produced by the EDW. Even when modern role-enabled authentication and authorization safeguards are in place when using the EDW, subtle differences in the definition of roles may give rise to inconsistency. For example, a role named "senior analyst" may have different interpretations at different entry points to the EDW. Logically, the challenge of conforming these role definitions is similar to conforming dimensional attributes, but the role definitions are stored and maintained entirely differently. In many cases, these role definitions are stored and enforced in local LDAP directory servers that intercept end users' login requests all across the EDW landscape. And finally, the criteria for who qualifies to be a senior analyst may depend on local administration that is tied more to the human resources function than to business responsibility. The best that can be said for this difficult design challenge is that personnel responsible for defining the LDAP-enabled roles should be invited to the original dimension conforming
meetings so that they become aware of the scope of EDW integration.
Dimension managers and fact providers coordinate with industry standards for data content, data exchange, and reporting, such as ACORD (insurance), MISMO (mortgages), SWIFT and NACHA (financial services), HIPAA and HL7 (health care), RosettaNet (manufacturing), and EDI (procurement). The existence of industry standards is mostly good news for the EDW since each industry standard provides the definition of many conformed dimension attributes and facts. But often these standards are accompanied by legal restrictions on how the information is handled.
Conclusion
The integrated EDW promises a rational, consistent view of enterprise data. This promise has been repeated endlessly in the trade literature. But until now, there has been no specific design for actually implementing the integrated EDW. In this paper we have precisely identified the ability to drill across as the central deliverable of the integrated EDW. Then we have methodically described the required steps and responsibilities, which give rise to the archetypal roles of the dimension manager and the fact provider. Although this implementation of the integrated EDW surely must seem daunting, we believe that the steps and responsibilities we have described are basic and unavoidable, no matter how your data warehouse environment is organized. Finally, this architecture represents a distillation of more than two decades of experience in building data warehouses based on conformed dimensions and facts. If you carefully consider the detailed recommendations in this paper, you should avoid re-inventing the wheel when you are building your integrated EDW.