data wareouse primary concepts

Upload: pako-mogotsi

Post on 04-Jun-2018

218 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/13/2019 Data Wareouse Primary Concepts

    1/10TORONTO USERS GROUP for Power Systems March 200916

    A

    ccording to market research firm IDC, the totaldata warehouse market is expected to grow to

    $13.5 billion in 2009 at a nine percent compoundannual growth rate. Data warehousing technologyis currently at a point o acceptance such that or mostmedium to large companies, it satisfies its own discrete parto strategic corporate data requirements. Effectively, it has

    proven so useul that no C-suite (Chie Executive Officer,Chie Financial Officer etc.) team

    would want to run a businesswithout it. How does such

    a technology, withmore o a back-roomthan ront-officeconnotation,become so hot? Ina word, it boilsdown to business(as opposedto I).

    Data Warehouse:

    Primary ConceptsBy Tibault Dambrine

    http://-/?-http://-/?-http://-/?-http://-/?-
  • 8/13/2019 Data Wareouse Primary Concepts

    2/10TORONTO USERS GROUP for Power Systems March 2009 17

    Date Target Acquired By Valuation2003/12 Crystal Reports Business Objects $1.2 billion

    2006/02 FirstLogic Business Objects $96 million

    2006/12 Knightsbridge Solutions HP700-person DW consultancy,undisclosed fnancial terms

    2007/03 Hyperion Oracle $3.3 billion

    2007/05 OutlookSoft SAP Estimate: $4-500 million

    2007/09 Applix Cognos $339 million

    2007/10 Business Objects SAP $6.8 billion

    2007/12 Cognos IBM $5 billion

    2008/01 BEA Oracle $8.5 billion

    DW Consolidations and Buy-outs

    Table 1.

    Business DriversTe raw material or good business decision-making is simply good data. Good, orthe purpose o this topic, can be definedas accurate, reliable, organized and timely.I one o these elements is missing, thedecision maker will either have to do more

    work to veriy, organize, or update the databeore making a decisionor risk taking abad (read expensive) decision.

    In what circumstances could data beso bad, when computerized systemsare so prevalent? Here are some o theleading causes or having late, inaccurate,disorganized, or unverifiable data:

    Data Organization: Mergers and acquisitions or major

    organizational shifs Data silos resulting in conflicting

    inormation Competitive pressure to maximize

    effectiveness o marketing efforts andproduction resources

    Timeliness: Poor data access resulting in delayed

    decision making Inability to perorm real-time

    business analysis

    Accuracy and Reliability:

    Rising management costs ordisparate systems and questionabledata integrity verification (reliability)

    Qualifying QuestionsHow would you know i a data warehousecould help in YOUR business? Here

    are some litmus testquestions:

    Are you managingmultiple, disparatedatabases?

    Is your company lacking acommon data set thatacilitates decisionmaking?

    Does your I staffstruggle to satisybusiness requestsor access to data?

    Does your companyhave the capabilitiesto perorm real-time business analysis?

    Do your competitors use real-timebusiness analysis?

    Over and above these questions, threebusiness trends have come to dominate I

    requirements:1. Access to data 24/7, world-wide, or

    internal and external, web-basedinormation customers.

    2. Te demand or business decisionmaking data in real-time, at all levelso aggregation.

    3. Sarbanes-Oxley and other similarlegislation has led to an increasedemphasis on financial controls.

    Te ease o satisying the type o require-

    ments described above with the help oDW technology has effectively spurred thegrowth in this discipline. In this article, I

    will lay out some theory on data warehous-ing and expose the three phases o a data

    warehouse project: the planning, the imple-mentation and the operation.

    Data Warehouse TheoryBill Inmon first defined the term

    data warehouse. He definedit in the ollowing way: Adata warehouse is a subject-oriented, integrated, time-variant, and non-volatile

    collection o data in sup-port o managementsdecision making pro-cess. Bill Inmons viewo the data warehouse

    is also known as thetop down design method, as

    it involves a lot o up-ront end-in-mind planning beore any results can beextracted.

    Ralph Kimball, another well-known datawarehouse author, defines a data warehouseas a copy o transaction data specifically

    structured or query and analysis. Kimballis a proponent o the bottom-up approachto data warehouse design. In this approach,data marts, or mini data warehouse datastorage acilities are first created to providereporting and analytical capabilities orspecific business processes. Like bricksorming a wall, the data contained

    within these data marts can eventually becombined to create a more comprehensivedata warehouse.

    A large number o vendors have initiallyappeared on the market to bring to lieInmon and Kimball theories. As DWsofware vendors are maturing, a wave oconsolidations and buy-outs similar to whathas been seen in the ERP vendor space istaking shape. (See Table 1.)

    http://-/?-http://-/?-http://-/?-http://-/?-
  • 8/13/2019 Data Wareouse Primary Concepts

    3/10TORONTO USERS GROUP for Power Systems March 200918

    In the table presented, note the ollowingpoints:

    Several acquirers, like BusinessObjects, who bought Crystal Reportsand FirstLogic, and Cognos whobought Applix, were subsequentlybought out themselves.

    Te takeover events above are in ordero date. On can clearly see a trend onthe right side, to bigger and bigger

    valuations, pointing to a maturing andincreasing valuation o corporationsin this sector.

    Data WarehouseFoundations:the Star SchemaTe oundation o any data Warehousestarts with giving some thought as to howthe data will be organized within. Teclassic data organization method or DW

    data storage is called the Star Schema.Here is how it works:

    Te first aim o the data warehousesystem is to help decision makers find,earlier than their competitors, hithertounoreseen trends that may affect theirbusiness. o support these decision makingrequirements, the data in a data warehouseis divided into acts and dimensions.Facts are tangible events which also carryinherent characteristics. Dimensions are

    any data elements that may affect thebehavior o these acts.

    o illustrate this point, the diagram inFigure 1shows a retail star schema. At the center o the star are the acts.

    Facts are tangible events. In thiscase, the acts are individual salestransactions.

    Around the acts are five dimensions:1) Customer Loyalty Dimension2) Geographic Dimension3) Product Dimension4) HR Dimension5) ime Dimension

    Further defining the Geographicdimension are two sub-dimensions,also known as snowflake dimensionsbecause o the shape they give to the star:

    1) ax Snowflake Dimension,which depends on the geographiclocation and the time when the actoccurred

    2) Weather Snowflake Dimension,which also depends on thegeographic location and the time

    when the act occurred

    Deciding when to snowflake a dimension orto include it should be above all a practicaldecision. For example, i you get weatherdata or 1,000 retail outlet postal codesevery day rom an outside supplier, it maybe just easier to keep it in a separate table.Speed o retrieval is also a actor. I the extratime it takes to access snowflake data is tooexpensive, it may be advisable to add theinormation to the main dimension or evenin the act table i it is critical enougheffectively de-normalizing, trading storageor speed.

    Having divided the data into acts anddimensions, one can mine the data or

    trends. In a retail environment, one couldlook or questions such as: What distance will the average loyalty

    card holding customer travel romtheir home to one o the companyretail stores?

    Is there a correlation between thedistance and the requency o the

    visits? I a promotion flyer was distributed by

    mail to a given postal code, what wasthe loyalty card holder response?

    In the spring season, at what averagetemperature do customers purchasemore cold drinks, like ruit juices thanhot drinks, like coffee?

    I a customer bought a product inthe salty snack category, what is theprobability that they would also buyone or more cold drinks?

    Is there a typical basket o goodspurchased on certain weekdays?

    What is the profile o the employeeswith the best sales?

    I a sales education course wasprovided or employees o a giventerritory, can the results be measured?

    Tese are just examples o questions onecould ask and conveniently get answers tousing data stored in a act/dimension-basedstar schema. Better yet, beyond havinganswers to questions that the marketersmay be curious about, the secondary aim othe data warehouse star schema is to enable

    data mining. Effectively, data mining is

    the use o sofware to uncover hithertounknown trends, or trends not easily visibleotherwise.

    o make the point, here is an example: Alarge (big surace/many stores) retail grocerhas its retail transactions organized in astar schema-based data warehouse. Teyuse a sofware package to mine the data orcyclic sales trends. What i they discoveredthat eggplants sales go through the rooevery time a ull-moon is on a Tursday?Te result is that they could prepare andstock up on that item in advance o everyull-moon Tursday, enabling managementto take advantage o a cyclical, predictableevent.

    I the eggplant story is not convincing, hereis one that is real. In the book he wroteabout Leadership, Rudolph Giuliani,

    ormer mayor o New York, describes howdata mining actually helped reduce crimein the New York City prison system. Tesituation was as ollows: Te Prison systemhad point-o-sale systems to manage the

    prison concession stores, which sell itemssuch as cigarettes, chocolate bars and such.Tey also had a completely separate systemto handle criminal records, includingthe criminal history o each individualregistered in the system, with a track recordo their crimes both outside and inside the

    prison system.

    One could describe these two systemsas silos of information, as they wereindependent of each other. Using data

    warehouse technology and data miningtools, the NYPD systems analystsuncovered a hitherto un-noticed trend:

    prior to most prison riots or group crimeevents, there was a run up in concessionsales. When they reported their findingsto the wardens, effectively the businessusers of these systems, it made totalsense to them. What happened was thata few prison kingpins and gang leaders

    would first stock up on concession goods,and then trigger a riot. As a predictable

    punishment for such events, the prisonauthorities would clamp down on theinmates and shut down the concessionstores for a period of time, thereby creatinga shortage of goods. Te kingpins wouldthen have a captive black market to re-sell

    what they had bought at regular price, at a

    http://-/?-http://-/?-http://-/?-http://-/?-
  • 8/13/2019 Data Wareouse Primary Concepts

    4/10TORONTO USERS GROUP for Power Systems March 2009 19

    premium to other inmates. By monitoringsudden spikes in concession sales, theNY prison wardens managed to reducein-prison crime. When such trends werenoticeable, they would move the knowntrouble-makers to un-familiar cells, beefup the security, do more in-depth searchesand effectively destroy the ability of these

    gang leaders to take advantage of theirpositions. Te net effect was a significantreduction in prison rioting events.

    Making It All Happen:PlanningPrior to starting a data warehouse in an Idepartment that did not previously use one;

    the first step is to ensure the staff who willwork in this area have a strong understandingo data warehouse technologies. Plan toget ormal training or the team that willollow through with the implementation,development and maintenance o the DW.raining the Business Analysts is equallyimportant, as they will convey the value

    Figure 1.

    TIME(DIMENSION) Date

    Hour

    Week Day

    Week Number

    Month

    Season

    Public Holliday

    Rush Hour

    HR(DIMENSION) Sales Person #

    Age

    Seniority

    Education

    Sales Period

    Sales

    SALESTRANSACTIONS(FACTS) item number

    item quantity

    sale date

    sale time

    store number

    sales person #

    item price Frequent

    Customer #

    GEOGRAPHY(DIMENSION) Store Number

    Province/State

    County

    City

    Postal Code

    Time Zone

    PRODUCT(DIMENSION) Item number

    Brand Name

    Item Category

    Item Packaging

    Order by Unit of Measure

    Sold by Unit of Measure

    TAXES(SNOWFLAKEDIMENSION) Province/State

    County

    City

    Effective Start Date

    Effective End Date

    CUSTOMERLOYALTY(DIMENSION) Frequent Customer #

    Customer Name

    Customer Postal Code

    WEATHER(SNOWFLAKEDIMENSION) Date

    Hour

    Province/State

    County

    City Temperature

    Conditions

    Sample: Retail Star Schema

    http://-/?-http://-/?-http://-/?-http://-/?-
  • 8/13/2019 Data Wareouse Primary Concepts

    5/10

  • 8/13/2019 Data Wareouse Primary Concepts

    6/10TORONTO USERS GROUP for Power Systems March 2009 21

    Te process o bringing data rom your operational systems tothe data warehouse is commonly known as EL or Extract/ransorm/Load. Tese three verbs suggest exactly what you willneed to do to get that data into the data warehouse.

    1. Data extraction involves:a.Identiying extraction criteria and requency o extractionb.Cleaning the data (ensuring there are no duplicates,

    ensuring there is a deault value or missing elements, andensuring that all the data ollows the same rules.

    c.Extracting also means, most times, to physically pull datarom a number o possible operational systems to the data

    warehouse, thereby involving sofware interaces.

    2. Data transformation involves:a.Bringing the data to its most granular shape: For example,

    in your financial system, your monthly data is stored in12 columns per table. Making the data more granularmeans to transorm the shape o this data and produce12 individual rows (one or each month), Tis will enablejoining at the GL act data at year/month/account level

    to data rom other parts o the business stored at a similarlevel o granularity and make comparisons/trending atthat level.

    b.Making the data more meaningul, e.g. i 1 meantMale and 2 meant Female in the source system,having Male and Female in the data warehouse makesthese values more readable and more usable.

    c.Deriving values rom existing data e.g. sales = qty * pricethe result o which will be stored or quicker retrieval (noneed to calculate it at retrieval time).

    d.Data cleansing is also typically part of thetransformation. Tis activity includes omission of

    useless data as well as validation and possibly rejection(error reporting) for data that does not conform toestablished rules. Activities such as replacing nulls withdefault values or flagging and removing duplicate valuesare part of this discipline. On the topic of duplicates,one should keep in mind that some duplicates aretrickier to identify than others. For example, having

    GM and General Motors as two different clientswith the same address could be considered a duplicate,even if the names are completely different. Tis is thetype of data cleansing that has to be addressed for thedata warehouse to be effective. Even better, if possible,the cleansing routine should report such anomaliesand help the source systems clean their data if possible.Some companies actually specialize in data cleanup.

    3. Data Loading:a.Ensuring the new data is not a duplicate o the existing

    one already loaded in the DW.b.I the data is loaded multiple times per day, the load

    may involve re-calculating the running totals used insubsequent extract every time a new load is perormed.

    c.Reerential integrity is critical or the unctioning o adata warehouse. Tis aspect must be verified at load time.

    d.Source to target integrity checking veriying that whathas been loaded in the DW does reflect what is in thesource system.

    At the end o the interaces and EL processes, one final step shouldalways ollow: Te integrity check, which will ensure that the dataloaded in the data warehouse truly reflects the contents o thebusiness data. I that integrity is lost, then business decisions made

    with the help o the data warehouse will likely be unreliable thecredibility o the DW will be questioned, rightully so and this willimpact the support rom sponsors.

    SchedulingOne o the seldom-talked-about topics o data warehousing isscheduling. On the load side, gathering data on regular basis, asseldom as monthly and as ofen as hourly or better, does requireautomation. Most I departments make use o some orm o jobscheduler. Your data warehouse EL jobs will have to not onlybe scheduled but also be subject to business run dependencies.Attention to detail, when it comes to setting job schedulerdependencies is critical.

    http://-/?-http://-/?-http://-/?-http://-/?-
  • 8/13/2019 Data Wareouse Primary Concepts

    7/10TORONTO USERS GROUP for Power Systems March 200922

    On the data retrieval side, while a simpledata pull rom the Business system to thedata warehouse sounds like routine, manythings can go wrong. Make sure you check

    your integrity reports daily or better yet,schedule integrity checking jobs that willnotiy a pager i any integrity is out. Keepa keen eye on any disruption or any jobthat runs abnormally slowly or quickly. Ithere is no programming or schedulingmodifications in the DW systems, this isofen a sign o trouble with source data.Many times, the data warehouse integritieswill be the first to signal anomalies in thedata and make Business Systems aware oa problem even beore the Business Usersknow about it.

    24/7 accessibility is also a requirementto consider, especially or operations that

    work across many time zones. Tis should

    be taken into account when planningscheduled jobs and architecting solutions.Conversely, in limited time-zone DWimplementations, having the night hours

    with relatively low client demands opensthe opportunity to do data-preparationin off hours. Examples o these could be

    canned reports or cubes that take a longtime to process. Preparing these at nightor during off hours will ensure the data isready or the users. When they come in themorning, the users will not have to wait orthe reports, they will be pre-calculated.

    Retrieving the Data Providing the ValueTis is the visible part o total effort thatgoes into the data warehouse, indeed,the one that will get the most attention.Te inormation sourced rom the datawarehouse is broadly known as BusinessIntelligence (BI) - strategic decision-making data. Business decision makers

    will consume the reporting and datamining results rom the data warehouse.Being visible to the high end-users whoofen sponsor data warehouse operations,reporting and data mining get a lot o

    research and development dollars rommost BI vendors. Reporting rom the data

    warehouse is done in two broad methods:1. Reporting:

    which itsel is divided intoconventional reports, OLAP anddashboards

    2. Data Mining:which comprises analysis o pastevents and predicting uture events,on the base o the past trends

    Reporting Describingwhat happenedReliable and timely Reporting o what hashappened, in terms o business transactions

    will no doubt help management makeeducated decisions on what to do next. Inthis section, I will describe three levels oreporting: Conventional, two-

    dimensional reports (like one thatcould be printed on paper)

    OLAP cubes, which have theability to report on more than twodimensions as well as pre-calculatingaggregate values

    Dashboard reporting, which is

    typically geared to report at a highlevel (Chie Executive Officers, ChieFinancial officers etc.)

    Conventional Reports:wo-dimensional query toolsSimplyReportingmany corporations use thedata warehouse acility simply to enableusers to draw their own data. Te data,being typically stored in a star schemaormat, is in the best shape to acilitate act/dimension reporting with sub-totals and

    categorization by dimension. A variety ospecialized vendors offer typically point-and-click web-based tools which enableeven the most junior users to produce

    very usable reports. Tese reporting toolstypically offer scheduling options and theability to provide sophisticated results atlow user effort cost.

    Spreadsheets are generally looked-downupon as a reporting tool. Te truth howeveris that they are still widely used; primarilybecause o their low cost. Te ease o useand flexibility offered by this tool is alsoits downall. Spreadsheets enable usersto conveniently manipulate data andpotentially alter the original version o thetruth. Use o spreadsheets, because o theirflexibility and ease o change, also meansthat there will likely be inconsistenciesbetween methods o extraction, sourceo data, calculations, and results. Withso many potential inconsistencies, siloso inormation tend to appear. While

    http://-/?-http://-/?-http://-/?-http://-/?-
  • 8/13/2019 Data Wareouse Primary Concepts

    8/10TORONTO USERS GROUP for Power Systems March 2009 23

    spreadsheets are easy to use, most expertsagree on limiting their use or data

    warehouse analysis and reporting.

    OLAP:is short or On-Line Analytical Processing.Te website www.olapreport.com has amore descriptive acronym: FASMI orFast Analysis o Shared MultidimensionalInormation. OLAP Cubes are akin toreports, except that they can have morethan two dimensions. It could enable orexample, to see a representation o sales

    per cities over a time period (3 dimensionsrather than 2) which is the best a paperreport or even a conventional spreadsheetcan do. Cubes can only be visualized usingspecialized sofware, much in the same

    way that Microsof Excel can open an.xls document. OLAP Cube terminologyreers to measures (equivalent o acts),

    categorized by dimensions.

    Dashboards:

    In an automobile, the dashboard has a simpleunction: to inorm the driver at a glance othe status o all critical systems in the car. Tisincludes the speed, RPM, uel, and enginetemperature. In all cases, there is a red lineon each gauge to clearly indicate the unsaezone o operation. Corporate dashboards,in a similar way, are geared to provide vitalcorporate data at a glance to the top decision

    makers. Te inormation is provided in aormat that shows how close or how ar thecorporation is operating rom a pre-definedred line. When or example, inventory levelsare critically low or how ar rom the pre-defined norm the key perormance indicatorssuch as sales or cash balance are. Te idea isthat the executive level management wouldbe inormed early and accurately real timei possible about the state o the companyto enable action early on.

    One o the best-known digital dashboardmeasurements is the Norton KaplanBalanced Scorecard, which measures ourcritical aspects o a company: Financial perspective Customer perspective Internal process perspective Learning and growth perspective

    By setting up ranges within which thecompany should operate on digitaldashboards, the C-Suite staff can monitor

    the corporate health and trends withrelative ease. I one particular area wouldneed attention, the Chie can drill down(another word or get down to the details)to see what is happening. For example, ithe sales were unexpectedly down or agiven month, the action to take would beto drill down in the sales by territory, to seei any specific territory is a problem, or by

    product category to see i a given productline was in trouble. Corrective action canthen be taken early, beore the troublespreads.

    Data Mining: Analyzing andPredicting

    Data mining:suggests a rigorous, rule-based statisticalapproach to examining the data with

    purpose to identiy repeatable trends

    within. o effectively mine data, companiesuse purpose-built sofware tools. Te trendsthey are looking or are ofen not visibleto the naked eye. Tink o it as seismicprospecting in your data.

    Data mining has a somewhat serendipitousconnotation. Te general idea would bethat data mining sofware would be ableto auto-find trends hitherto un-noticed.

    While this does happen occasionally, mosttimes, the findings are not completely

    unexpected.

    Data mining is all about getting a betterunderstanding o the data. In effect, theend-result o data mining is literally a ormo inormation about data. o put thisin simpler words, i a user says I have ahypothesis about this data, the data miningsofware tool can be programmed to veriythe hypothesis, enabling stronger decision-making and possibly pro-active or pre-emptive action.

    Analyzing describing why ithappened:Te first purpose o data mining is todescribe why it happened. Effectively, tofigure out, out o the mass o organized data

    within the data warehouse, how a changein a dimension has impacted the acts. Isuch a trend can be reliably recognizedand i the change in dimension is cyclicalin nature, there is a good bet that the eventis predictable. A simple example could be

    to determine at what point, between thewinter and summer, sales o cold drinksbecome more prevalent than warm drinks.Te driver here would be the temperaturechange, which is cyclical in nature and thus

    predictable.

    Predicting describing whatwill happen or rather whatwill likely happen:Tis is a stage where the DW has enoughpast data to mine, and has attained a stageo maturity where it can reliably find

    predictable trends within the data. Terationale is to enable the decision makersto take pre-emptive action to capitalize onpredictable events beore the competitionwould.

    Data mining can be applied to look ortrends outside o the companys control,

    weather or demographic changes orexample. It can also be used to measurethe impact o actions within the companysown control like pricing changes, storerenovations

    Customer retention is one o these areasthat are ofen challenging. Do customersabout to switch to the competition haveidentifiable pattern behaviors? Couldcustomers about to leave be identified inadvance? Could a phone call rom a sales

    associate or customer care representativeincrease their chances o staying? Datamining can be used to look at past clientpurchasing patterns common to switchersand help recognize in advance those aboutto deect, enabling action prior to losingcustomers.

    The Critical Value of a DWGovernance Model

    We have now visited the design, theinterace and the data retrieval processes.Successul DW operations are typically busy.Managing the new requests as well as largedata volumes that flow in every day doesrequire strong coordination and leadershipthat can distinguish the difference between

    urgent and important.

    Setting up a DW Governance team,especially at the inception o the project,is counter-intuitive. Many will say whynow?. Te act is that the Governance eameffectively aligns the user requirements, the

    http://-/?-http://-/?-http://-/?-http://www.olapreport.com/http://www.olapreport.com/http://-/?-
  • 8/13/2019 Data Wareouse Primary Concepts

    9/10TORONTO USERS GROUP for Power Systems March 200924

    corporate requirements and the best DWsolutions or the aorementioned.

    By definition, a DW operation gathersdata rom many sources and typically alsohas clients rom all parts o the company.

    With such varieties o requirements andpotentially diverging driving interests, thepotential or going astray is strong. ypicalpitalls o data warehousing are the creationo inormation silos, or subject areas alsoknown as data marts designed in such a

    way that they cannot be correlated withother data marts within the same data

    warehouse. Inormation silos ofen lead tothe many versions o the truth syndrome.Constant, unrelenting governance is criticalto data warehousing success.

    o ensure these potentially diverginginterests are properly prioritized andunded, it is wise to establish early on a

    permanent governance team, composed oboth technical and business leaders.

    Te mission o the DW governance team isto ensure each new DW project or initiativeollows strict standards, is designed inharmony with the existing structure(avoiding isolated data silos) and with theduty to make recommendations or seniormanagement when new investments arenecessary. oo many times, when manyconflicting interests require help rom thedata warehouse, the squeaky wheel getsthe attention first.

    At a high level, the DW Governance eammust establish priorities or DW services,enabling work on what is important asopposed to what is deemed urgent bysome. At the ground level, to ensure thedata quality is monitored and maintained.Beyond this, establishing strong processes

    and a DW project lie cycle will add value.While it does require some extra resources,consistent methods o bringing new datain will ensure predictable results. As inall things in I, no surprises is desirable.Each new DW proposed initiative shouldalso be subjected to a governance review toensure prioritization, alignment with otherdata marts and methods o execution are inaccordance with the direction agreed uponby the Governance team.

    While all this sounds an awul lot like redtape, the mission o the governance teamis toabove allensure the DW teambest satisfies the Business requirements.Te data warehouse must be built withBusiness requirements in mind. It shouldbebeore anything else (including I) aBusiness-sponsored activity. I this is to bea successul, long-term project, it will haveto be done with the interest o the business

    A Data Warehouse Governance Model

    Request Governance Action

    Data Warehouse SteeringCommittee

    DW SPONSOR

    Data Stewards

    DW Team Lead

    DesignatedData Focal Points

    Data Consumers(Business users)

    Data WarehouseDay-to-Day

    Work Queue

    DW Architects

    DW IT TeamThe Focal Points orsuper-users will both helpusers and gatherrequirements, which will goto the Data Stewards

    The Business users are theconsumers of the data. Theywill get first help for any DWrequest from the Focal Pointpersons. They will also spawnnew DW requests, which willgo to the steering committeevia the Focal Points and theData Stewards

    The Sponsorwill finance the DataWarehouse Activities

    The steering committeewill gather requests and

    make recommendations togo ahead with whichproject.Stewards own theBusiness data. DW

    Architects own the DWdesign & methods

    UserRequests

    UserSupport

    ValidateNewInitiatives

    ImplementNewInitiatives

    Development withend-in-mind

    Figure 3.

    A Data Warehouse Governance Model

    http://-/?-http://-/?-http://-/?-http://-/?-
  • 8/13/2019 Data Wareouse Primary Concepts

    10/10TORONTO USERS GROUP for Power Systems March 2009 25

    as a top-o-mind, overriding requirement. Everything else will besecondary in the governance model.

    Te diagram in Figure 3proposes a data governance structure thatdistributes the day-to-day DW requests to super-users, known asData Warehouse Focal Points. Tese are typically business userswith a good understanding o the data. When necessary, they willbring concerns or new DW requirements to the Data Stewards, alsobusiness users, but a higher level, who effectively have responsibilityor the data. Te Data Stewards are part o the steering committee.ogether with the DW Data Architects, they will make therecommendations on what new DW projects will be implementedand how. Tese projects will be submitted to the sponsor orapproval. Note that the governance process involves all the partiesthat have any involvement with the data warehouse, rom financing(sponsor) all the way to data consumers (the decision makers whouse the data provided by the DW).

    Size and ScalabilityOne o the more salient properties o data warehouses at large isthat they typically are BIG systems. Tey require lots o disk storage,

    processing power and they are geared to grow every day as a mattero existence. I one ollows Bill Inmons principles, data should neverbe purged rom a DW database. On this point, while theory has tostand on principle, there are times when common sense may enableexceptions. I or example, data germane to an obsolete businessunit is eating storage but not yielding value, it should be archivedand removed rom the disk. Part o the governance is to guide theDW staff in constantly re-evaluating an ofen overlooked question

    how much data should be stored? How valuable is the oldest data?Is orever data really worth it? Maybe it is, maybe it is not, but thequestion should be asked.

    Scalability is also an issue that should be considered every time newcode is written. One ofen-ound problem is code that works wellwith small volumes o data but slows down exponentially with theincreased volume load. esting DW applications, whether or ELor reporting purposes, is critical. Te value o having proper indexes,especially on large volume tables, cannot be overestimated.

    ConclusionOne can say that data warehousing is ofen perceived as a simpleconcept. At a high level, operations consist in bringing data to acentral repository at the back-end and retrieving it to do reportingand analysis at the ront-end. Te reality is that to build andmaintain a successul data warehouse read: useul to its users andsponsors there are a lot o actors to consider.

    Planning:Number one recommendation dont just jump in. Tis is a goodspot to take a deep breath and ensure a strong plan is put in place.

    Building:Inmon vs. Kimball. You dont have to make a choice. Most datawarehouse models are hybrids o these two models. Te successactor is reflected by the usage and the benefits the DW brings tothe company.

    Operating:Ensure strong processes are in place, especially a governance model.Te need or governance is proportional to the diversity andnumber o requests coming rom all parts o the company. Withoutgovernance, DW operations will likely struggle to please too manycustomers and not achieve ull value potential.

    Data warehouses are expensive, large-scale operations. Sincedesigning and extracting value from a data warehouse is not anexact science, one could wager that at least a percentage of DWimplementations are mediocre to the point of not bringing anyreturn to their sponsoring business. Data warehousing is notrisk-free.

    In this article, I have described examples of customer data, salesdata, data related to the customer/business relationship. Inactual fact, DW technology can be pointed to just about anytype of data, including operational or security. Imagine what

    you could find if you mined the data produced by your firewallor Accounts Payable for example. What trends could you find?If it is well done, the DW may bring huge returns in terms of

    understanding data trendsboth internal [operational] andexternal, ahead of the competitors.

    For the Big Picture perspective, it is Human nature to notwant to be the last to know. Because of this simple fact, thetrend for more sophisticated Business Intelligence and data

    warehouse implementations will not slow down anytime soon.Te Internet, 24/7 access to data and Sarbanes-Oxley will onlyaccelerate this tendency for more information, delivered faster,covering more ground more accurately. IBM, HP, SAP andOracle now own most of the more established DW outfits. Nodoubt, they will put their weight behind this technology and

    soon bring new, more powerful, more sophisticated tools tothe market. Expect these vendors to take competition in thisarea to a whole new level. No doubt, they themselves use data

    warehousing services. Tis can only be good for current andfuture DW consumers. TG

    Thibault Dambrineworks for Shell Canada

    Limited as a seniorsystems analyst.

    He holds the ITILFoundations as wellas the Release and

    Control PractitionersCertifcates. His past

    articles can be foundat www.tylogix.com.

    http://-/?-http://-/?-http://-/?-http://-/?-