data wareouse primary concepts
TRANSCRIPT
-
8/13/2019 Data Wareouse Primary Concepts
1/10TORONTO USERS GROUP for Power Systems March 200916
A
ccording to market research firm IDC, the totaldata warehouse market is expected to grow to
$13.5 billion in 2009 at a nine percent compoundannual growth rate. Data warehousing technologyis currently at a point o acceptance such that or mostmedium to large companies, it satisfies its own discrete parto strategic corporate data requirements. Effectively, it has
proven so useul that no C-suite (Chie Executive Officer,Chie Financial Officer etc.) team
would want to run a businesswithout it. How does such
a technology, withmore o a back-roomthan ront-officeconnotation,become so hot? Ina word, it boilsdown to business(as opposedto I).
Data Warehouse:
Primary ConceptsBy Tibault Dambrine
http://-/?-http://-/?-http://-/?-http://-/?- -
8/13/2019 Data Wareouse Primary Concepts
2/10TORONTO USERS GROUP for Power Systems March 2009 17
Date Target Acquired By Valuation2003/12 Crystal Reports Business Objects $1.2 billion
2006/02 FirstLogic Business Objects $96 million
2006/12 Knightsbridge Solutions HP700-person DW consultancy,undisclosed fnancial terms
2007/03 Hyperion Oracle $3.3 billion
2007/05 OutlookSoft SAP Estimate: $4-500 million
2007/09 Applix Cognos $339 million
2007/10 Business Objects SAP $6.8 billion
2007/12 Cognos IBM $5 billion
2008/01 BEA Oracle $8.5 billion
DW Consolidations and Buy-outs
Table 1.
Business DriversTe raw material or good business decision-making is simply good data. Good, orthe purpose o this topic, can be definedas accurate, reliable, organized and timely.I one o these elements is missing, thedecision maker will either have to do more
work to veriy, organize, or update the databeore making a decisionor risk taking abad (read expensive) decision.
In what circumstances could data beso bad, when computerized systemsare so prevalent? Here are some o theleading causes or having late, inaccurate,disorganized, or unverifiable data:
Data Organization: Mergers and acquisitions or major
organizational shifs Data silos resulting in conflicting
inormation Competitive pressure to maximize
effectiveness o marketing efforts andproduction resources
Timeliness: Poor data access resulting in delayed
decision making Inability to perorm real-time
business analysis
Accuracy and Reliability:
Rising management costs ordisparate systems and questionabledata integrity verification (reliability)
Qualifying QuestionsHow would you know i a data warehousecould help in YOUR business? Here
are some litmus testquestions:
Are you managingmultiple, disparatedatabases?
Is your company lacking acommon data set thatacilitates decisionmaking?
Does your I staffstruggle to satisybusiness requestsor access to data?
Does your companyhave the capabilitiesto perorm real-time business analysis?
Do your competitors use real-timebusiness analysis?
Over and above these questions, threebusiness trends have come to dominate I
requirements:1. Access to data 24/7, world-wide, or
internal and external, web-basedinormation customers.
2. Te demand or business decisionmaking data in real-time, at all levelso aggregation.
3. Sarbanes-Oxley and other similarlegislation has led to an increasedemphasis on financial controls.
Te ease o satisying the type o require-
ments described above with the help oDW technology has effectively spurred thegrowth in this discipline. In this article, I
will lay out some theory on data warehous-ing and expose the three phases o a data
warehouse project: the planning, the imple-mentation and the operation.
Data Warehouse TheoryBill Inmon first defined the term
data warehouse. He definedit in the ollowing way: Adata warehouse is a subject-oriented, integrated, time-variant, and non-volatile
collection o data in sup-port o managementsdecision making pro-cess. Bill Inmons viewo the data warehouse
is also known as thetop down design method, as
it involves a lot o up-ront end-in-mind planning beore any results can beextracted.
Ralph Kimball, another well-known datawarehouse author, defines a data warehouseas a copy o transaction data specifically
structured or query and analysis. Kimballis a proponent o the bottom-up approachto data warehouse design. In this approach,data marts, or mini data warehouse datastorage acilities are first created to providereporting and analytical capabilities orspecific business processes. Like bricksorming a wall, the data contained
within these data marts can eventually becombined to create a more comprehensivedata warehouse.
A large number o vendors have initiallyappeared on the market to bring to lieInmon and Kimball theories. As DWsofware vendors are maturing, a wave oconsolidations and buy-outs similar to whathas been seen in the ERP vendor space istaking shape. (See Table 1.)
http://-/?-http://-/?-http://-/?-http://-/?- -
8/13/2019 Data Wareouse Primary Concepts
3/10TORONTO USERS GROUP for Power Systems March 200918
In the table presented, note the ollowingpoints:
Several acquirers, like BusinessObjects, who bought Crystal Reportsand FirstLogic, and Cognos whobought Applix, were subsequentlybought out themselves.
Te takeover events above are in ordero date. On can clearly see a trend onthe right side, to bigger and bigger
valuations, pointing to a maturing andincreasing valuation o corporationsin this sector.
Data WarehouseFoundations:the Star SchemaTe oundation o any data Warehousestarts with giving some thought as to howthe data will be organized within. Teclassic data organization method or DW
data storage is called the Star Schema.Here is how it works:
Te first aim o the data warehousesystem is to help decision makers find,earlier than their competitors, hithertounoreseen trends that may affect theirbusiness. o support these decision makingrequirements, the data in a data warehouseis divided into acts and dimensions.Facts are tangible events which also carryinherent characteristics. Dimensions are
any data elements that may affect thebehavior o these acts.
o illustrate this point, the diagram inFigure 1shows a retail star schema. At the center o the star are the acts.
Facts are tangible events. In thiscase, the acts are individual salestransactions.
Around the acts are five dimensions:1) Customer Loyalty Dimension2) Geographic Dimension3) Product Dimension4) HR Dimension5) ime Dimension
Further defining the Geographicdimension are two sub-dimensions,also known as snowflake dimensionsbecause o the shape they give to the star:
1) ax Snowflake Dimension,which depends on the geographiclocation and the time when the actoccurred
2) Weather Snowflake Dimension,which also depends on thegeographic location and the time
when the act occurred
Deciding when to snowflake a dimension orto include it should be above all a practicaldecision. For example, i you get weatherdata or 1,000 retail outlet postal codesevery day rom an outside supplier, it maybe just easier to keep it in a separate table.Speed o retrieval is also a actor. I the extratime it takes to access snowflake data is tooexpensive, it may be advisable to add theinormation to the main dimension or evenin the act table i it is critical enougheffectively de-normalizing, trading storageor speed.
Having divided the data into acts anddimensions, one can mine the data or
trends. In a retail environment, one couldlook or questions such as: What distance will the average loyalty
card holding customer travel romtheir home to one o the companyretail stores?
Is there a correlation between thedistance and the requency o the
visits? I a promotion flyer was distributed by
mail to a given postal code, what wasthe loyalty card holder response?
In the spring season, at what averagetemperature do customers purchasemore cold drinks, like ruit juices thanhot drinks, like coffee?
I a customer bought a product inthe salty snack category, what is theprobability that they would also buyone or more cold drinks?
Is there a typical basket o goodspurchased on certain weekdays?
What is the profile o the employeeswith the best sales?
I a sales education course wasprovided or employees o a giventerritory, can the results be measured?
Tese are just examples o questions onecould ask and conveniently get answers tousing data stored in a act/dimension-basedstar schema. Better yet, beyond havinganswers to questions that the marketersmay be curious about, the secondary aim othe data warehouse star schema is to enable
data mining. Effectively, data mining is
the use o sofware to uncover hithertounknown trends, or trends not easily visibleotherwise.
o make the point, here is an example: Alarge (big surace/many stores) retail grocerhas its retail transactions organized in astar schema-based data warehouse. Teyuse a sofware package to mine the data orcyclic sales trends. What i they discoveredthat eggplants sales go through the rooevery time a ull-moon is on a Tursday?Te result is that they could prepare andstock up on that item in advance o everyull-moon Tursday, enabling managementto take advantage o a cyclical, predictableevent.
I the eggplant story is not convincing, hereis one that is real. In the book he wroteabout Leadership, Rudolph Giuliani,
ormer mayor o New York, describes howdata mining actually helped reduce crimein the New York City prison system. Tesituation was as ollows: Te Prison systemhad point-o-sale systems to manage the
prison concession stores, which sell itemssuch as cigarettes, chocolate bars and such.Tey also had a completely separate systemto handle criminal records, includingthe criminal history o each individualregistered in the system, with a track recordo their crimes both outside and inside the
prison system.
One could describe these two systemsas silos of information, as they wereindependent of each other. Using data
warehouse technology and data miningtools, the NYPD systems analystsuncovered a hitherto un-noticed trend:
prior to most prison riots or group crimeevents, there was a run up in concessionsales. When they reported their findingsto the wardens, effectively the businessusers of these systems, it made totalsense to them. What happened was thata few prison kingpins and gang leaders
would first stock up on concession goods,and then trigger a riot. As a predictable
punishment for such events, the prisonauthorities would clamp down on theinmates and shut down the concessionstores for a period of time, thereby creatinga shortage of goods. Te kingpins wouldthen have a captive black market to re-sell
what they had bought at regular price, at a
http://-/?-http://-/?-http://-/?-http://-/?- -
8/13/2019 Data Wareouse Primary Concepts
4/10TORONTO USERS GROUP for Power Systems March 2009 19
premium to other inmates. By monitoringsudden spikes in concession sales, theNY prison wardens managed to reducein-prison crime. When such trends werenoticeable, they would move the knowntrouble-makers to un-familiar cells, beefup the security, do more in-depth searchesand effectively destroy the ability of these
gang leaders to take advantage of theirpositions. Te net effect was a significantreduction in prison rioting events.
Making It All Happen:PlanningPrior to starting a data warehouse in an Idepartment that did not previously use one;
the first step is to ensure the staff who willwork in this area have a strong understandingo data warehouse technologies. Plan toget ormal training or the team that willollow through with the implementation,development and maintenance o the DW.raining the Business Analysts is equallyimportant, as they will convey the value
Figure 1.
TIME(DIMENSION) Date
Hour
Week Day
Week Number
Month
Season
Public Holliday
Rush Hour
HR(DIMENSION) Sales Person #
Age
Seniority
Education
Sales Period
Sales
SALESTRANSACTIONS(FACTS) item number
item quantity
sale date
sale time
store number
sales person #
item price Frequent
Customer #
GEOGRAPHY(DIMENSION) Store Number
Province/State
County
City
Postal Code
Time Zone
PRODUCT(DIMENSION) Item number
Brand Name
Item Category
Item Packaging
Order by Unit of Measure
Sold by Unit of Measure
TAXES(SNOWFLAKEDIMENSION) Province/State
County
City
Effective Start Date
Effective End Date
CUSTOMERLOYALTY(DIMENSION) Frequent Customer #
Customer Name
Customer Postal Code
WEATHER(SNOWFLAKEDIMENSION) Date
Hour
Province/State
County
City Temperature
Conditions
Sample: Retail Star Schema
http://-/?-http://-/?-http://-/?-http://-/?- -
8/13/2019 Data Wareouse Primary Concepts
5/10
-
8/13/2019 Data Wareouse Primary Concepts
6/10TORONTO USERS GROUP for Power Systems March 2009 21
Te process o bringing data rom your operational systems tothe data warehouse is commonly known as EL or Extract/ransorm/Load. Tese three verbs suggest exactly what you willneed to do to get that data into the data warehouse.
1. Data extraction involves:a.Identiying extraction criteria and requency o extractionb.Cleaning the data (ensuring there are no duplicates,
ensuring there is a deault value or missing elements, andensuring that all the data ollows the same rules.
c.Extracting also means, most times, to physically pull datarom a number o possible operational systems to the data
warehouse, thereby involving sofware interaces.
2. Data transformation involves:a.Bringing the data to its most granular shape: For example,
in your financial system, your monthly data is stored in12 columns per table. Making the data more granularmeans to transorm the shape o this data and produce12 individual rows (one or each month), Tis will enablejoining at the GL act data at year/month/account level
to data rom other parts o the business stored at a similarlevel o granularity and make comparisons/trending atthat level.
b.Making the data more meaningul, e.g. i 1 meantMale and 2 meant Female in the source system,having Male and Female in the data warehouse makesthese values more readable and more usable.
c.Deriving values rom existing data e.g. sales = qty * pricethe result o which will be stored or quicker retrieval (noneed to calculate it at retrieval time).
d.Data cleansing is also typically part of thetransformation. Tis activity includes omission of
useless data as well as validation and possibly rejection(error reporting) for data that does not conform toestablished rules. Activities such as replacing nulls withdefault values or flagging and removing duplicate valuesare part of this discipline. On the topic of duplicates,one should keep in mind that some duplicates aretrickier to identify than others. For example, having
GM and General Motors as two different clientswith the same address could be considered a duplicate,even if the names are completely different. Tis is thetype of data cleansing that has to be addressed for thedata warehouse to be effective. Even better, if possible,the cleansing routine should report such anomaliesand help the source systems clean their data if possible.Some companies actually specialize in data cleanup.
3. Data Loading:a.Ensuring the new data is not a duplicate o the existing
one already loaded in the DW.b.I the data is loaded multiple times per day, the load
may involve re-calculating the running totals used insubsequent extract every time a new load is perormed.
c.Reerential integrity is critical or the unctioning o adata warehouse. Tis aspect must be verified at load time.
d.Source to target integrity checking veriying that whathas been loaded in the DW does reflect what is in thesource system.
At the end o the interaces and EL processes, one final step shouldalways ollow: Te integrity check, which will ensure that the dataloaded in the data warehouse truly reflects the contents o thebusiness data. I that integrity is lost, then business decisions made
with the help o the data warehouse will likely be unreliable thecredibility o the DW will be questioned, rightully so and this willimpact the support rom sponsors.
SchedulingOne o the seldom-talked-about topics o data warehousing isscheduling. On the load side, gathering data on regular basis, asseldom as monthly and as ofen as hourly or better, does requireautomation. Most I departments make use o some orm o jobscheduler. Your data warehouse EL jobs will have to not onlybe scheduled but also be subject to business run dependencies.Attention to detail, when it comes to setting job schedulerdependencies is critical.
http://-/?-http://-/?-http://-/?-http://-/?- -
8/13/2019 Data Wareouse Primary Concepts
7/10TORONTO USERS GROUP for Power Systems March 200922
On the data retrieval side, while a simpledata pull rom the Business system to thedata warehouse sounds like routine, manythings can go wrong. Make sure you check
your integrity reports daily or better yet,schedule integrity checking jobs that willnotiy a pager i any integrity is out. Keepa keen eye on any disruption or any jobthat runs abnormally slowly or quickly. Ithere is no programming or schedulingmodifications in the DW systems, this isofen a sign o trouble with source data.Many times, the data warehouse integritieswill be the first to signal anomalies in thedata and make Business Systems aware oa problem even beore the Business Usersknow about it.
24/7 accessibility is also a requirementto consider, especially or operations that
work across many time zones. Tis should
be taken into account when planningscheduled jobs and architecting solutions.Conversely, in limited time-zone DWimplementations, having the night hours
with relatively low client demands opensthe opportunity to do data-preparationin off hours. Examples o these could be
canned reports or cubes that take a longtime to process. Preparing these at nightor during off hours will ensure the data isready or the users. When they come in themorning, the users will not have to wait orthe reports, they will be pre-calculated.
Retrieving the Data Providing the ValueTis is the visible part o total effort thatgoes into the data warehouse, indeed,the one that will get the most attention.Te inormation sourced rom the datawarehouse is broadly known as BusinessIntelligence (BI) - strategic decision-making data. Business decision makers
will consume the reporting and datamining results rom the data warehouse.Being visible to the high end-users whoofen sponsor data warehouse operations,reporting and data mining get a lot o
research and development dollars rommost BI vendors. Reporting rom the data
warehouse is done in two broad methods:1. Reporting:
which itsel is divided intoconventional reports, OLAP anddashboards
2. Data Mining:which comprises analysis o pastevents and predicting uture events,on the base o the past trends
Reporting Describingwhat happenedReliable and timely Reporting o what hashappened, in terms o business transactions
will no doubt help management makeeducated decisions on what to do next. Inthis section, I will describe three levels oreporting: Conventional, two-
dimensional reports (like one thatcould be printed on paper)
OLAP cubes, which have theability to report on more than twodimensions as well as pre-calculatingaggregate values
Dashboard reporting, which is
typically geared to report at a highlevel (Chie Executive Officers, ChieFinancial officers etc.)
Conventional Reports:wo-dimensional query toolsSimplyReportingmany corporations use thedata warehouse acility simply to enableusers to draw their own data. Te data,being typically stored in a star schemaormat, is in the best shape to acilitate act/dimension reporting with sub-totals and
categorization by dimension. A variety ospecialized vendors offer typically point-and-click web-based tools which enableeven the most junior users to produce
very usable reports. Tese reporting toolstypically offer scheduling options and theability to provide sophisticated results atlow user effort cost.
Spreadsheets are generally looked-downupon as a reporting tool. Te truth howeveris that they are still widely used; primarilybecause o their low cost. Te ease o useand flexibility offered by this tool is alsoits downall. Spreadsheets enable usersto conveniently manipulate data andpotentially alter the original version o thetruth. Use o spreadsheets, because o theirflexibility and ease o change, also meansthat there will likely be inconsistenciesbetween methods o extraction, sourceo data, calculations, and results. Withso many potential inconsistencies, siloso inormation tend to appear. While
http://-/?-http://-/?-http://-/?-http://-/?- -
8/13/2019 Data Wareouse Primary Concepts
8/10TORONTO USERS GROUP for Power Systems March 2009 23
spreadsheets are easy to use, most expertsagree on limiting their use or data
warehouse analysis and reporting.
OLAP:is short or On-Line Analytical Processing.Te website www.olapreport.com has amore descriptive acronym: FASMI orFast Analysis o Shared MultidimensionalInormation. OLAP Cubes are akin toreports, except that they can have morethan two dimensions. It could enable orexample, to see a representation o sales
per cities over a time period (3 dimensionsrather than 2) which is the best a paperreport or even a conventional spreadsheetcan do. Cubes can only be visualized usingspecialized sofware, much in the same
way that Microsof Excel can open an.xls document. OLAP Cube terminologyreers to measures (equivalent o acts),
categorized by dimensions.
Dashboards:
In an automobile, the dashboard has a simpleunction: to inorm the driver at a glance othe status o all critical systems in the car. Tisincludes the speed, RPM, uel, and enginetemperature. In all cases, there is a red lineon each gauge to clearly indicate the unsaezone o operation. Corporate dashboards,in a similar way, are geared to provide vitalcorporate data at a glance to the top decision
makers. Te inormation is provided in aormat that shows how close or how ar thecorporation is operating rom a pre-definedred line. When or example, inventory levelsare critically low or how ar rom the pre-defined norm the key perormance indicatorssuch as sales or cash balance are. Te idea isthat the executive level management wouldbe inormed early and accurately real timei possible about the state o the companyto enable action early on.
One o the best-known digital dashboardmeasurements is the Norton KaplanBalanced Scorecard, which measures ourcritical aspects o a company: Financial perspective Customer perspective Internal process perspective Learning and growth perspective
By setting up ranges within which thecompany should operate on digitaldashboards, the C-Suite staff can monitor
the corporate health and trends withrelative ease. I one particular area wouldneed attention, the Chie can drill down(another word or get down to the details)to see what is happening. For example, ithe sales were unexpectedly down or agiven month, the action to take would beto drill down in the sales by territory, to seei any specific territory is a problem, or by
product category to see i a given productline was in trouble. Corrective action canthen be taken early, beore the troublespreads.
Data Mining: Analyzing andPredicting
Data mining:suggests a rigorous, rule-based statisticalapproach to examining the data with
purpose to identiy repeatable trends
within. o effectively mine data, companiesuse purpose-built sofware tools. Te trendsthey are looking or are ofen not visibleto the naked eye. Tink o it as seismicprospecting in your data.
Data mining has a somewhat serendipitousconnotation. Te general idea would bethat data mining sofware would be ableto auto-find trends hitherto un-noticed.
While this does happen occasionally, mosttimes, the findings are not completely
unexpected.
Data mining is all about getting a betterunderstanding o the data. In effect, theend-result o data mining is literally a ormo inormation about data. o put thisin simpler words, i a user says I have ahypothesis about this data, the data miningsofware tool can be programmed to veriythe hypothesis, enabling stronger decision-making and possibly pro-active or pre-emptive action.
Analyzing describing why ithappened:Te first purpose o data mining is todescribe why it happened. Effectively, tofigure out, out o the mass o organized data
within the data warehouse, how a changein a dimension has impacted the acts. Isuch a trend can be reliably recognizedand i the change in dimension is cyclicalin nature, there is a good bet that the eventis predictable. A simple example could be
to determine at what point, between thewinter and summer, sales o cold drinksbecome more prevalent than warm drinks.Te driver here would be the temperaturechange, which is cyclical in nature and thus
predictable.
Predicting describing whatwill happen or rather whatwill likely happen:Tis is a stage where the DW has enoughpast data to mine, and has attained a stageo maturity where it can reliably find
predictable trends within the data. Terationale is to enable the decision makersto take pre-emptive action to capitalize onpredictable events beore the competitionwould.
Data mining can be applied to look ortrends outside o the companys control,
weather or demographic changes orexample. It can also be used to measurethe impact o actions within the companysown control like pricing changes, storerenovations
Customer retention is one o these areasthat are ofen challenging. Do customersabout to switch to the competition haveidentifiable pattern behaviors? Couldcustomers about to leave be identified inadvance? Could a phone call rom a sales
associate or customer care representativeincrease their chances o staying? Datamining can be used to look at past clientpurchasing patterns common to switchersand help recognize in advance those aboutto deect, enabling action prior to losingcustomers.
The Critical Value of a DWGovernance Model
We have now visited the design, theinterace and the data retrieval processes.Successul DW operations are typically busy.Managing the new requests as well as largedata volumes that flow in every day doesrequire strong coordination and leadershipthat can distinguish the difference between
urgent and important.
Setting up a DW Governance team,especially at the inception o the project,is counter-intuitive. Many will say whynow?. Te act is that the Governance eameffectively aligns the user requirements, the
http://-/?-http://-/?-http://-/?-http://www.olapreport.com/http://www.olapreport.com/http://-/?- -
8/13/2019 Data Wareouse Primary Concepts
9/10TORONTO USERS GROUP for Power Systems March 200924
corporate requirements and the best DWsolutions or the aorementioned.
By definition, a DW operation gathersdata rom many sources and typically alsohas clients rom all parts o the company.
With such varieties o requirements andpotentially diverging driving interests, thepotential or going astray is strong. ypicalpitalls o data warehousing are the creationo inormation silos, or subject areas alsoknown as data marts designed in such a
way that they cannot be correlated withother data marts within the same data
warehouse. Inormation silos ofen lead tothe many versions o the truth syndrome.Constant, unrelenting governance is criticalto data warehousing success.
o ensure these potentially diverginginterests are properly prioritized andunded, it is wise to establish early on a
permanent governance team, composed oboth technical and business leaders.
Te mission o the DW governance team isto ensure each new DW project or initiativeollows strict standards, is designed inharmony with the existing structure(avoiding isolated data silos) and with theduty to make recommendations or seniormanagement when new investments arenecessary. oo many times, when manyconflicting interests require help rom thedata warehouse, the squeaky wheel getsthe attention first.
At a high level, the DW Governance eammust establish priorities or DW services,enabling work on what is important asopposed to what is deemed urgent bysome. At the ground level, to ensure thedata quality is monitored and maintained.Beyond this, establishing strong processes
and a DW project lie cycle will add value.While it does require some extra resources,consistent methods o bringing new datain will ensure predictable results. As inall things in I, no surprises is desirable.Each new DW proposed initiative shouldalso be subjected to a governance review toensure prioritization, alignment with otherdata marts and methods o execution are inaccordance with the direction agreed uponby the Governance team.
While all this sounds an awul lot like redtape, the mission o the governance teamis toabove allensure the DW teambest satisfies the Business requirements.Te data warehouse must be built withBusiness requirements in mind. It shouldbebeore anything else (including I) aBusiness-sponsored activity. I this is to bea successul, long-term project, it will haveto be done with the interest o the business
A Data Warehouse Governance Model
Request Governance Action
Data Warehouse SteeringCommittee
DW SPONSOR
Data Stewards
DW Team Lead
DesignatedData Focal Points
Data Consumers(Business users)
Data WarehouseDay-to-Day
Work Queue
DW Architects
DW IT TeamThe Focal Points orsuper-users will both helpusers and gatherrequirements, which will goto the Data Stewards
The Business users are theconsumers of the data. Theywill get first help for any DWrequest from the Focal Pointpersons. They will also spawnnew DW requests, which willgo to the steering committeevia the Focal Points and theData Stewards
The Sponsorwill finance the DataWarehouse Activities
The steering committeewill gather requests and
make recommendations togo ahead with whichproject.Stewards own theBusiness data. DW
Architects own the DWdesign & methods
UserRequests
UserSupport
ValidateNewInitiatives
ImplementNewInitiatives
Development withend-in-mind
Figure 3.
A Data Warehouse Governance Model
http://-/?-http://-/?-http://-/?-http://-/?- -
8/13/2019 Data Wareouse Primary Concepts
10/10TORONTO USERS GROUP for Power Systems March 2009 25
as a top-o-mind, overriding requirement. Everything else will besecondary in the governance model.
Te diagram in Figure 3proposes a data governance structure thatdistributes the day-to-day DW requests to super-users, known asData Warehouse Focal Points. Tese are typically business userswith a good understanding o the data. When necessary, they willbring concerns or new DW requirements to the Data Stewards, alsobusiness users, but a higher level, who effectively have responsibilityor the data. Te Data Stewards are part o the steering committee.ogether with the DW Data Architects, they will make therecommendations on what new DW projects will be implementedand how. Tese projects will be submitted to the sponsor orapproval. Note that the governance process involves all the partiesthat have any involvement with the data warehouse, rom financing(sponsor) all the way to data consumers (the decision makers whouse the data provided by the DW).
Size and ScalabilityOne o the more salient properties o data warehouses at large isthat they typically are BIG systems. Tey require lots o disk storage,
processing power and they are geared to grow every day as a mattero existence. I one ollows Bill Inmons principles, data should neverbe purged rom a DW database. On this point, while theory has tostand on principle, there are times when common sense may enableexceptions. I or example, data germane to an obsolete businessunit is eating storage but not yielding value, it should be archivedand removed rom the disk. Part o the governance is to guide theDW staff in constantly re-evaluating an ofen overlooked question
how much data should be stored? How valuable is the oldest data?Is orever data really worth it? Maybe it is, maybe it is not, but thequestion should be asked.
Scalability is also an issue that should be considered every time newcode is written. One ofen-ound problem is code that works wellwith small volumes o data but slows down exponentially with theincreased volume load. esting DW applications, whether or ELor reporting purposes, is critical. Te value o having proper indexes,especially on large volume tables, cannot be overestimated.
ConclusionOne can say that data warehousing is ofen perceived as a simpleconcept. At a high level, operations consist in bringing data to acentral repository at the back-end and retrieving it to do reportingand analysis at the ront-end. Te reality is that to build andmaintain a successul data warehouse read: useul to its users andsponsors there are a lot o actors to consider.
Planning:Number one recommendation dont just jump in. Tis is a goodspot to take a deep breath and ensure a strong plan is put in place.
Building:Inmon vs. Kimball. You dont have to make a choice. Most datawarehouse models are hybrids o these two models. Te successactor is reflected by the usage and the benefits the DW brings tothe company.
Operating:Ensure strong processes are in place, especially a governance model.Te need or governance is proportional to the diversity andnumber o requests coming rom all parts o the company. Withoutgovernance, DW operations will likely struggle to please too manycustomers and not achieve ull value potential.
Data warehouses are expensive, large-scale operations. Sincedesigning and extracting value from a data warehouse is not anexact science, one could wager that at least a percentage of DWimplementations are mediocre to the point of not bringing anyreturn to their sponsoring business. Data warehousing is notrisk-free.
In this article, I have described examples of customer data, salesdata, data related to the customer/business relationship. Inactual fact, DW technology can be pointed to just about anytype of data, including operational or security. Imagine what
you could find if you mined the data produced by your firewallor Accounts Payable for example. What trends could you find?If it is well done, the DW may bring huge returns in terms of
understanding data trendsboth internal [operational] andexternal, ahead of the competitors.
For the Big Picture perspective, it is Human nature to notwant to be the last to know. Because of this simple fact, thetrend for more sophisticated Business Intelligence and data
warehouse implementations will not slow down anytime soon.Te Internet, 24/7 access to data and Sarbanes-Oxley will onlyaccelerate this tendency for more information, delivered faster,covering more ground more accurately. IBM, HP, SAP andOracle now own most of the more established DW outfits. Nodoubt, they will put their weight behind this technology and
soon bring new, more powerful, more sophisticated tools tothe market. Expect these vendors to take competition in thisarea to a whole new level. No doubt, they themselves use data
warehousing services. Tis can only be good for current andfuture DW consumers. TG
Thibault Dambrineworks for Shell Canada
Limited as a seniorsystems analyst.
He holds the ITILFoundations as wellas the Release and
Control PractitionersCertifcates. His past
articles can be foundat www.tylogix.com.
http://-/?-http://-/?-http://-/?-http://-/?-