7/31/2019 Dw Tutorial
1/62
Recent Developments in Data Warehousing
Hugh J. Watson
Terry College of Business
University of Georgia
http://www.terry.uga.edu/~hwatson/dw_tutorial.ppt
-
Operational Data Store
An operational data store (ODS) consolidates data from multiple source systems and provides a near real-time, integrated view of volatile, current data.
Its purpose is to provide integrated data for operational purposes. It has add, change, and delete functionality.
It may be created to avoid a full-blown ERP implementation.
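As a rough illustration of the add, change, and delete functionality, the sketch below keeps one current, integrated record per customer, keyed on a shared customer ID. The source system names, the `customer_id` key, and the "last write wins" update rule are assumptions made for the example, not features of any particular ODS product.

```python
from typing import Any


class OperationalDataStore:
    """Minimal in-memory ODS sketch: one current record per customer."""

    def __init__(self) -> None:
        # customer_id -> integrated, current attribute values
        self.records: dict[str, dict[str, Any]] = {}

    def upsert(self, source: str, customer_id: str, attrs: dict[str, Any]) -> None:
        """Add a new customer or change an existing one (last write wins)."""
        record = self.records.setdefault(customer_id, {"customer_id": customer_id})
        record.update(attrs)
        # Remember which source systems have contributed to this record.
        record.setdefault("sources", set()).add(source)

    def delete(self, customer_id: str) -> None:
        """Remove a customer; volatile current data is not kept as history."""
        self.records.pop(customer_id, None)


ods = OperationalDataStore()
ods.upsert("billing", "C1", {"name": "Pat Lee", "balance": 120.0})
ods.upsert("crm", "C1", {"email": "pat@example.com"})
ods.upsert("billing", "C1", {"balance": 95.5})  # a change, not a new row
ods.upsert("crm", "C2", {"name": "Sam Roe"})
ods.delete("C2")
print(ods.records["C1"])
```

Unlike a warehouse, the sketch overwrites values in place, which matches the "volatile, current data" described above.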
-
[Figure: Data warehousing architecture. Data sources (transaction data from production, marketing, HR, finance, and accounting systems on IBM IMS, VSAM, Oracle, and Sybase; other internal data such as SAP ERP; web clickstream data in Informix; external demographic data from Harte-Hanks) feed ETL software (extract, clean/scrub, transform, and load; vendors shown include Ascential, Sagent, SAS, Firstlogic, and Informatica). The data stores are a staging area, an operational data store, the data warehouse (Teradata, IBM), meta data, and Finance, Marketing, and Sales data marts (Essbase, Microsoft). Data analysis tools and applications (SQL, Cognos, SAS, MicroStrategy, Siebel, BusinessObjects, web browser) support queries, reporting, DSS/EIS, and data mining for analysts, managers, executives, operational personnel, and customers/suppliers.]
-
Topics Covered
Definitions and concepts
Two case studies: Harrah's Entertainment (first) and Owens & Minor (last)
The data mart and enterprise-wide data warehouse strategies
Data extraction, cleansing, transformation, and loading
Meta data
Data stores
Online analytical processing (OLAP)
Warehouse users, tools, and applications
-
Harrah's Entertainment
Harrah's Entertainment: data warehousing supported a successful shift to a CRM-oriented corporate strategy.
Winner of the 2000 TDWI Leadership Award
Operates 21 casinos across the country
In 1993, the gaming laws changed, which allowed Harrah's to expand
Harrah's decided to compete using a brand strategy supported by information technology
Needed to know their customers exceptionally well
-
Harrah's Data Warehousing Architecture
WINet sources data from the casino, hotel, and event systems
The patron database (PDB) serves as an operational data store
The marketing workbench (MWB) serves as the data warehouse
-
Sample Applications
Operational personnel use PDB to check the preferences, history, and value of customers
Analysts use PDB and MWB to create offers to visit a Harrah's casino
Analysts use MWB to support predictive modeling efforts
-
[Figure: Marketing cycle with the stages Define (objectives, tests, control cells), Execute (right offer, right message, right time), Track, Measure (profit and loss, behavior change, new test report), and Learn (refine marketing approaches), linking customer treatment to customer action/non-action. Annotations: predict the value of a customer; market based on that expected value; track transactions that are linked to marketing initiatives; evaluate the effectiveness; track profitability.]
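The Define and Measure stages rest on comparing a test cell that receives an offer with a control cell that does not. A minimal sketch of that comparison, assuming per-customer profit figures are already available from the warehouse for each cell; the numbers are made up for illustration.

```python
from statistics import mean


def incremental_profit(test_profits: list[float], control_profits: list[float]) -> float:
    """Average profit per customer in the test cell minus the control cell.

    The control cell receives no offer, so the difference estimates the
    effect of the marketing treatment itself.
    """
    return mean(test_profits) - mean(control_profits)


# Hypothetical per-customer profit (in dollars) after a promotion.
test_cell = [120.0, 80.0, 100.0, 140.0]   # received the offer
control_cell = [90.0, 70.0, 95.0, 85.0]   # did not receive the offer

lift = incremental_profit(test_cell, control_cell)
print(f"Incremental profit per customer: ${lift:.2f}")  # prints "Incremental profit per customer: $25.00"
```

A real test would also subtract the cost of the offer and check that the difference is larger than normal variation between cells.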
-
Customer Relationship Lifecycle
[Figure: Annual revenue plotted against length of relationship, with the phases Establish, Strengthen, and Reinvigorate.]
-
The Data Mart Strategy
The most common approach
Begins with a single mart, and architected marts are added over time for more subject areas
Relatively inexpensive and easy to implement
Can be used as a proof of concept for data warehousing
Can perpetuate the "silos of information" problem
Can postpone difficult decisions and activities
Requires an overall integration plan
-
The Enterprise-wide Strategy
A comprehensive warehouse is built initially
An initial dependent data mart is built using a subset of the data in the warehouse
Additional data marts are built using subsets of the data in the warehouse
Like all complex projects, it is expensive, time-consuming, and prone to failure
When successful, it results in an integrated, scalable warehouse
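To make the idea of a dependent data mart concrete, the sketch below builds a small warehouse table in SQLite and derives a marketing mart from a subset of its rows and columns. The table and column names are hypothetical, and a production mart would usually live in its own database and be refreshed on a schedule rather than sit next to the warehouse table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Enterprise-wide warehouse: integrated data for every subject area.
cur.execute("""
    CREATE TABLE warehouse_sales (
        sale_id INTEGER PRIMARY KEY,
        region TEXT, product TEXT, channel TEXT,
        amount REAL, cost REAL
    )""")
cur.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "East", "TV", "web", 500.0, 350.0),
        (2, "West", "TV", "store", 450.0, 340.0),
        (3, "East", "Radio", "store", 80.0, 50.0),
    ],
)

# Dependent marketing mart: only the rows and columns marketing needs,
# taken from the warehouse rather than directly from source systems.
cur.execute("""
    CREATE TABLE marketing_mart AS
    SELECT region, product, channel, amount
    FROM warehouse_sales
    WHERE channel = 'web'
""")
rows = cur.execute("SELECT * FROM marketing_mart").fetchall()
print(rows)  # [('East', 'TV', 'web', 500.0)]
```

Because every mart is cut from the same warehouse, the marts share definitions, which is what distinguishes this approach from independently built marts.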
-
Data Sources and Types
Primarily from legacy, operational systems
Almost exclusively numerical data at the present time
External data may be included, often purchased from third-party sources
Technology exists for storing unstructured data, and this is expected to become more important over time
-
Extraction, Transformation, and Loading (ETL) Processes
The "plumbing" work of data warehousing
Data are moved from source to target databases
A very costly, time-consuming part of data warehousing
-
Recent Development: Clickstream Data
Results from clicks at web sites
A dialog manager handles user interactions. An ODS helps to custom-tailor the dialog
The clickstream data is filtered and parsed and sent to a data warehouse, where it is analyzed
Software is available to analyze the clickstream data
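The filtering and parsing step can be sketched for web server logs written in the Common Log Format. The regular expression, the list of ignored file extensions, and the output fields are assumptions for illustration; real clickstream tools handle many more log formats and also group hits into sessions.

```python
import re

# One Common Log Format line: host ident user [time] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
# Requests for page decorations are noise for clickstream analysis.
IGNORED_EXTENSIONS = (".gif", ".jpg", ".png", ".css", ".js")


def parse_clicks(lines):
    """Yield one dict per successful page view, dropping noise and bad lines."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # malformed line
        click = match.groupdict()
        if click["path"].lower().endswith(IGNORED_EXTENSIONS):
            continue  # image, stylesheet, or script request
        if not click["status"].startswith("2"):
            continue  # errors and redirects are not page views
        yield click


log = [
    '10.0.0.1 - - [15/Jan/2001:10:00:00 -0500] "GET /products/tv.html HTTP/1.0" 200 5120',
    '10.0.0.1 - - [15/Jan/2001:10:00:01 -0500] "GET /img/logo.gif HTTP/1.0" 200 900',
    '10.0.0.2 - - [15/Jan/2001:10:02:00 -0500] "GET /missing.html HTTP/1.0" 404 210',
]
clicks = list(parse_clicks(log))
print([c["path"] for c in clicks])  # ['/products/tv.html']
```

The resulting dicts are what would be loaded into the warehouse for analysis.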
-
Recent Development: Further Automation of ETL Processes
MetaRecon from Metagenix reverse-engineers data into information
Analyzes and profiles source systems
Uncovers problems in source systems
Recommends primary and secondary keys, dimensions and measures, etc.
Generates ETL scripts
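A much simpler version of source-system profiling fits in a few lines: count missing and distinct values per column and flag columns that could serve as primary keys. This is not how MetaRecon works internally. The heuristic (no missing values and no duplicates means a candidate key) and the sample rows are assumptions for illustration.

```python
def profile(rows: list[dict]) -> dict[str, dict]:
    """Return missing count, distinct count, and candidate-key flag per column."""
    columns = {name for row in rows for name in row}
    report = {}
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in (None, "")]
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            # Candidate key: fully populated and unique in this sample.
            "candidate_key": len(present) == len(rows)
            and len(set(present)) == len(rows),
        }
    return report


source_rows = [
    {"cust_no": "A17", "name": "Pat Lee", "zip": "30602"},
    {"cust_no": "A18", "name": "Sam Roe", "zip": ""},
    {"cust_no": "A19", "name": "Pat Lee", "zip": "30602"},
]
for column, stats in profile(source_rows).items():
    print(column, stats)
```

The missing ZIP code in the second row is the kind of source-system problem such profiling is meant to uncover before the ETL scripts are written.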
-
Data Extraction
Often performed by COBOL routines (not recommended because of high program maintenance and no automatically generated meta data)
Sometimes source data is copied to the target database using the replication capabilities of a standard RDBMS (not recommended because of dirty data in the source systems)
Increasingly performed by specialized ETL software
-
Sample ETL Tools
DataStage from Ascential Software
SAS System from SAS Institute
PowerMart/PowerCenter from Informatica
Sagent Solution from Sagent Software
Hummingbird Genio Suite from Hummingbird Communications
-
Data Cleansing
Source systems contain "dirty data" that must be cleansed
ETL software contains rudimentary data cleansing capabilities
Specialized data cleansing software is often used; it is important for performing name and address correction and householding functions
Leading data cleansing vendors include Vality (Integrity), Harte-Hanks (Trillium), and Firstlogic (i.d.Centric)
-
Steps in Data Cleansing
Parsing
Correcting
Standardizing
Matching
Consolidating
-
Parsing
Parsing locates and identifies individual data elements in the source files and then isolates these data elements in the target files.
Examples include parsing the first, middle, and last name; street number and street name; and city and state.
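A toy version of parsing, assuming names arrive as "First Middle Last", street lines as "Number Name", and the last address line as "City, ST ZIP". Commercial cleansing tools rely on large tables of name and address patterns; the simple rules below are assumptions for the example and would mis-parse names such as "Mary Ann Smith".

```python
import re


def parse_name(full_name: str) -> dict[str, str]:
    """Split a free-form name into first, middle, and last components."""
    parts = full_name.split()
    middle = " ".join(parts[1:-1])  # empty when there is no middle name
    return {"first": parts[0], "middle": middle, "last": parts[-1]}


def parse_street(street: str) -> dict[str, str]:
    """Separate the street number from the street name."""
    number, _, name = street.strip().partition(" ")
    return {"street_number": number, "street_name": name}


def parse_city_line(line: str) -> dict[str, str]:
    """Isolate city, state, and ZIP code from 'City, ST 12345'."""
    match = re.fullmatch(r"\s*(.+?),\s*([A-Za-z]{2})\s+(\d{5})\s*", line)
    if match is None:
        raise ValueError(f"unrecognized city line: {line!r}")
    city, state, zip_code = match.groups()
    return {"city": city, "state": state.upper(), "zip": zip_code}


print(parse_name("John Q Public"))        # {'first': 'John', 'middle': 'Q', 'last': 'Public'}
print(parse_street("123 Main Street"))    # {'street_number': '123', 'street_name': 'Main Street'}
print(parse_city_line("Athens, GA 30602"))  # {'city': 'Athens', 'state': 'GA', 'zip': '30602'}
```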
-
Correcting
Correcting fixes parsed individual data components using sophisticated data algorithms and secondary data sources.
Examples include replacing a vanity address and adding a ZIP code.
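Correcting depends on secondary data sources. In the sketch below, two small dictionaries stand in for them: one maps a vanity address to the official delivery address, and one supplies the ZIP code for a known street and city. Both tables and their contents are hypothetical; real tools consult postal reference files.

```python
# Hypothetical secondary data sources (real tools use postal reference files).
VANITY_TO_OFFICIAL = {"One Corporate Plaza": "100 Main St"}
ZIP_LOOKUP = {("100 Main St", "Athens"): "30601"}


def correct_address(record: dict[str, str]) -> dict[str, str]:
    """Return a corrected copy of a parsed address record."""
    fixed = dict(record)
    # Replace a vanity (marketing) address with the official delivery address.
    fixed["street"] = VANITY_TO_OFFICIAL.get(fixed["street"], fixed["street"])
    # Add a missing ZIP code when the reference data knows it.
    if not fixed.get("zip"):
        zip_code = ZIP_LOOKUP.get((fixed["street"], fixed["city"]))
        if zip_code is not None:
            fixed["zip"] = zip_code
    return fixed


print(correct_address({"street": "One Corporate Plaza", "city": "Athens", "zip": ""}))
# {'street': '100 Main St', 'city': 'Athens', 'zip': '30601'}
```

Correcting runs after parsing because the lookups need the street and city already isolated in separate fields.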
-
Standardizing
Standardizing applies conversion routines to transform data into its preferred (and consistent) format using both standard and custom business rules.
Examples include adding a prename, replacing a nickname, and using a preferred street name.
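A rule-based sketch of standardizing, assuming the business rules are simple lookup tables: one maps nicknames to preferred given names and one maps street-type variants to a preferred abbreviation. Both tables are illustrative assumptions, not a vendor's rule set.

```python
# Hypothetical business-rule tables.
NICKNAMES = {"bob": "Robert", "bill": "William", "peggy": "Margaret"}
STREET_TYPES = {"street": "St", "st": "St", "str": "St", "avenue": "Ave", "ave": "Ave"}


def standardize_first_name(name: str) -> str:
    """Replace a nickname with the preferred given name; fix capitalization."""
    return NICKNAMES.get(name.lower(), name.capitalize())


def standardize_street(street: str) -> str:
    """Use the preferred abbreviation for the street type (the last word)."""
    words = street.split()
    key = words[-1].lower().rstrip(".")
    words[-1] = STREET_TYPES.get(key, words[-1])
    return " ".join(words)


print(standardize_first_name("BOB"))          # Robert
print(standardize_street("123 Main Street"))  # 123 Main St
print(standardize_street("9 Oak ave."))       # 9 Oak Ave
```

Standardized values make the later matching step far more reliable, since "Bob Smith, 123 Main Street" and "Robert Smith, 123 Main St" become identical strings.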
-
Matching
Searching and matching records within and across the parsed, corrected, and standardized data based on predefined business rules to eliminate duplications.
Examples include identifying similar names and addresses.
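Matching rules are often fuzzy. The sketch below scores each pair of records with Python's `difflib.SequenceMatcher` similarity ratio on name and address and treats pairs above a threshold as matches. The 0.85 threshold and the equal weighting of name and address are assumptions chosen for the example; commercial tools use more specialized matching algorithms.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]; 1.0 means the strings are identical (ignoring case)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_matches(records: list[dict[str, str]], threshold: float = 0.85):
    """Return index pairs of records that probably describe the same party."""
    matches = []
    for (i, r1), (j, r2) in combinations(enumerate(records), 2):
        score = (similarity(r1["name"], r2["name"])
                 + similarity(r1["address"], r2["address"])) / 2
        if score >= threshold:
            matches.append((i, j))
    return matches


customers = [
    {"name": "Robert Smith", "address": "123 Main St"},
    {"name": "Robert Smyth", "address": "123 Main St"},
    {"name": "Alice Jones", "address": "9 Oak Ave"},
]
print(find_matches(customers))  # [(0, 1)]
```

Comparing every pair grows quadratically, so production matching usually compares only records that share a blocking key such as the ZIP code.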
-
Consolidating
Analyzing and identifying relationships between matched records and consolidating/merging them into ONE representation.
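Once records are matched, consolidation merges them. One simple survivorship rule, assumed here for illustration, keeps the most common non-empty value for each field and breaks ties in favor of the value seen first.

```python
from collections import Counter


def consolidate(matched: list[dict[str, str]]) -> dict[str, str]:
    """Merge matched records into one representation, field by field."""
    fields: list[str] = []
    for record in matched:
        for name in record:
            if name not in fields:
                fields.append(name)  # preserve first-seen field order
    merged = {}
    for name in fields:
        values = [r[name] for r in matched if r.get(name)]
        # most_common lists equal counts in first-encountered order,
        # so ties go to the value seen first.
        merged[name] = Counter(values).most_common(1)[0][0] if values else ""
    return merged


duplicates = [
    {"name": "Robert Smith", "address": "123 Main St", "phone": ""},
    {"name": "Robert Smyth", "address": "123 Main St", "phone": "706-555-0100"},
    {"name": "Robert Smith", "address": "", "phone": ""},
]
print(consolidate(duplicates))
# {'name': 'Robert Smith', 'address': '123 Main St', 'phone': '706-555-0100'}
```

The merged record fills gaps (the phone number) from whichever duplicate had them, which is the point of producing one representation.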
-
Data Staging
Often used as an interim step between data extraction and later steps
Accumulates data from asynchronous sources using native interfaces, flat files, FTP sessions, or other processes
At a predefined cutoff time, data in the staging file is transformed and loaded to the warehouse
There is usually no end-user access to the staging file
An operational data store may be used for data staging
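The cutoff behavior can be sketched as a staging area that accepts records from any source at any time and only hands them to the transform-and-load step once the cutoff time is reached. The `StagingArea` class, its method names, and the callback-style loader are hypothetical; the transformation itself is left to whatever function is passed in.

```python
from datetime import datetime
from typing import Callable


class StagingArea:
    """Accumulates extracted records until a predefined cutoff time."""

    def __init__(self, cutoff: datetime) -> None:
        self.cutoff = cutoff
        self.pending: list[dict] = []

    def receive(self, source: str, record: dict) -> None:
        """Called asynchronously as each source delivers data."""
        self.pending.append({"source": source, **record})

    def run_batch(self, now: datetime, load: Callable[[list[dict]], None]) -> bool:
        """At or after the cutoff, transform/load everything staged and clear it."""
        if now < self.cutoff:
            return False  # keep accumulating; end users never query this area
        load(self.pending)
        self.pending = []
        return True


warehouse: list[dict] = []
staging = StagingArea(cutoff=datetime(2001, 1, 15, 23, 0))
staging.receive("flat_file", {"order_id": 1, "amount": 25.0})
staging.receive("ftp", {"order_id": 2, "amount": 40.0})
print(staging.run_batch(datetime(2001, 1, 15, 22, 0), warehouse.extend))  # False
print(staging.run_batch(datetime(2001, 1, 15, 23, 5), warehouse.extend))  # True
print(len(warehouse), len(staging.pending))                                # 2 0
```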
-
Data Transformation
Transforms the data in accordance with the business rules and standards that have been established
Examples include format changes, deduplication, splitting up fields, replacement of codes, derived values, and aggregates
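Each of the listed transformations can be shown on a handful of source rows. The field names, the status-code table, and the MM/DD/YYYY source date format are assumptions made for this sketch.

```python
from collections import defaultdict
from datetime import datetime

STATUS_CODES = {"A": "Active", "C": "Closed"}  # hypothetical code table


def transform(rows: list[dict]) -> tuple[list[dict], dict[str, float]]:
    """Apply example business rules, then total revenue by region."""
    seen: set[str] = set()
    detail = []
    for row in rows:
        if row["order_id"] in seen:
            continue  # deduplication: keep the first copy of each order
        seen.add(row["order_id"])
        # Splitting up a field: "City, ST" -> city and state.
        city, state = (part.strip() for part in row["location"].split(","))
        detail.append({
            "order_id": row["order_id"],
            # Format change: MM/DD/YYYY -> ISO 8601 (YYYY-MM-DD).
            "order_date": datetime.strptime(row["date"], "%m/%d/%Y").date().isoformat(),
            # Replacement of codes with descriptive values.
            "status": STATUS_CODES.get(row["status"], "Unknown"),
            "city": city,
            "state": state,
            "region": row["region"],
            # Derived value computed from two source fields.
            "revenue": row["qty"] * row["unit_price"],
        })
    totals: dict[str, float] = defaultdict(float)
    for item in detail:
        totals[item["region"]] += item["revenue"]  # aggregate
    return detail, dict(totals)


order = {"order_id": "1001", "date": "01/15/2001", "status": "A",
         "location": "Athens, GA", "region": "South", "qty": 2, "unit_price": 300.0}
extracted = [
    order,
    dict(order),  # duplicate extract of the same order
    {"order_id": "1002", "date": "01/16/2001", "status": "C",
     "location": "Boston, MA", "region": "East", "qty": 1, "unit_price": 450.0},
]
detail, totals = transform(extracted)
print(detail[0])
print(totals)  # {'South': 600.0, 'East': 450.0}
```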
-
Meta Data
Data about data
Needed by both information technology personnel and users
IT personnel need to know data sources and targets; database, table, and column names; refresh schedules; data usage measures; etc.
Users need to know entity/attribute definitions; report/query tools available; report distribution information; help desk contact information; etc.
-
Recent Development: Meta Data Integration
A growing realization that meta data is critical to data warehousing success
Progress is being made on getting vendors to agree on standards and to incorporate the sharing of meta data among their tools
Vendors like Microsoft, Computer Associates, and Oracle have entered the meta data marketplace with significant product offerings
-
Database Vendors
High-end (i.e., terabyte-plus) vendors include IBM (DB2) and NCR-Teradata (Teradata)
Oracle (8i) and Microsoft (SQL Server 7) are major players for smaller databases
-
On-line Analytical Processing (OLAP)
A set of functionality that facilitates multidimensional analysis
Allows users to analyze data in ways that are natural to them
Comes in many varieties: ROLAP, MOLAP, DOLAP, etc.
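Multidimensional analysis amounts to viewing a measure by any combination of dimensions. The sketch below holds a tiny sales "cube" as Python tuples and shows a slice (fix one dimension to a value) and a roll-up (summarize away the other dimensions). The data and the function names are illustrative assumptions, not any OLAP product's API.

```python
from collections import defaultdict

# (region, product, quarter, units_sold): one cell per combination
CUBE = [
    ("East", "TV", "Q1", 120), ("East", "TV", "Q2", 150),
    ("East", "Radio", "Q1", 40), ("West", "TV", "Q1", 90),
    ("West", "Radio", "Q2", 60),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(cells, keep):
    """Sum units over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for cell in cells:
        totals[tuple(cell[i] for i in idx)] += cell[-1]
    return dict(totals)


def slice_cube(cells, dimension, value):
    """Keep only the cells where one dimension has a fixed value."""
    i = DIMENSIONS.index(dimension)
    return [cell for cell in cells if cell[i] == value]


print(roll_up(CUBE, ["region"]))                                # {('East',): 310, ('West',): 150}
print(roll_up(slice_cube(CUBE, "product", "TV"), ["quarter"]))  # {('Q1',): 210, ('Q2',): 150}
```

The varieties listed above differ mainly in where such cells live: in relational tables (ROLAP), in a specialized multidimensional store (MOLAP), or on the desktop (DOLAP).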
-
ROLAP
Relational OLAP
Uses an RDBMS to implement an OLAP environment
Typically involves a star schema to provide the multidimensional capabilities
The OLAP tool manipulates RDBMS star schema data
Called "slowlap" by MOLAP vendors
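In a ROLAP tool the user picks dimensions and a measure, and the tool translates that request into SQL against the star schema. A minimal sketch of that translation, assuming a hypothetical schema with a `sales` fact table whose foreign keys join to `product` and `store` dimension tables:

```python
# Hypothetical star schema: dimension table -> (fact foreign key, dimension key)
JOINS = {
    "product": ("sales.product_id", "product.product_id"),
    "store": ("sales.store_id", "store.store_id"),
}


def build_rolap_query(group_by: list[str], measure: str) -> str:
    """Translate a multidimensional request into a star-join GROUP BY query.

    group_by holds 'table.column' names from dimension tables; measure is a
    numeric column of the fact table that is summed.
    """
    tables = sorted({col.split(".")[0] for col in group_by})
    joins = " ".join(f"JOIN {t} ON {JOINS[t][0]} = {JOINS[t][1]}" for t in tables)
    cols = ", ".join(group_by)
    return (f"SELECT {cols}, SUM(sales.{measure}) AS total_{measure} "
            f"FROM sales {joins} GROUP BY {cols}")


query = build_rolap_query(["store.region", "product.category"], "amount")
print(query)
```

Because column names are pasted into the SQL text, a real tool would accept only names drawn from its own meta data, never raw user input.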
-
Star Schema
Creates non-normalized data structures
Easier for users to understand
Optimized for OLAP
Uses fact tables (the facts or measures in the business) and dimension tables (which establish the context of the facts)
-
OLAP Tools
Products come from vendors such as Brio, Cognos, Hyperion, and BusinessObjects
Typically available as a fat or thin (i.e., browser) client
In a web environment, the browser communicates with a web server, which talks to an application server, which connects to back-end databases
The application server provides query, reporting, and OLAP analysis functionality over the web
Java applets or downloaded components augment the thin client
A broadcast server may be used to schedule, run, publish, and broadcast reports, alerts, and responses over the LAN, by email, or to personal digital assistants
-
Dimension Table Examples
Retail: store name, ZIP code, product name, product category, day of week
Telecommunications: call origin, call destination
Banking: customer name, account number, branch, account officer
Insurance: policy type, insured party
-
Fact Table Examples
Retail: number of units sold, sales amount
Telecommunications: length of call in minutes, average number of calls
Banking: average monthly balance
Insurance: claims amount
-
The Fact Table Key Concatenates the Dimension Keys
Assume that you want to know the number of television sets sold to Best Buy on January 15, 2001.
The query might be:
SELECT CLIENT.CUSNAME, SALES.NOSOLD
FROM CLIENT, PRODUCT, TIME, SALES
WHERE CLIENT.CUSNAME = SALES.CUSNAME
  AND PRODUCT.PRODNAME = SALES.PRODNAME
  AND TIME.DATE = SALES.DATE
  AND CLIENT.CUSNAME = 'BEST BUY'
  AND PRODUCT.PRODNAME = 'TELEVISION'
  AND TIME.DATE = '2001-01-15'
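A version of this query can be tried against a tiny star schema in SQLite. The CLIENT, PRODUCT, TIME, and SALES tables and their key columns follow the slide; the extra attribute columns (REGION, CATEGORY, DAYOFWEEK) and all sample rows are made up, and SQLite compares the date as ISO-formatted text rather than as a DATE type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables, keyed by name as on the slide
    CREATE TABLE CLIENT  (CUSNAME TEXT PRIMARY KEY, REGION TEXT);
    CREATE TABLE PRODUCT (PRODNAME TEXT PRIMARY KEY, CATEGORY TEXT);
    CREATE TABLE TIME    (DATE TEXT PRIMARY KEY, DAYOFWEEK TEXT);
    -- Fact table: its key concatenates the three dimension keys
    CREATE TABLE SALES (
        CUSNAME  TEXT REFERENCES CLIENT,
        PRODNAME TEXT REFERENCES PRODUCT,
        DATE     TEXT REFERENCES TIME,
        NOSOLD   INTEGER,
        PRIMARY KEY (CUSNAME, PRODNAME, DATE)
    );
    INSERT INTO CLIENT  VALUES ('BEST BUY', 'Central'), ('ACME STORES', 'East');
    INSERT INTO PRODUCT VALUES ('TELEVISION', 'Video'), ('RADIO', 'Audio');
    INSERT INTO TIME    VALUES ('2001-01-15', 'Monday'), ('2001-01-16', 'Tuesday');
    INSERT INTO SALES VALUES
        ('BEST BUY', 'TELEVISION', '2001-01-15', 42),
        ('BEST BUY', 'RADIO', '2001-01-15', 10),
        ('BEST BUY', 'TELEVISION', '2001-01-16', 7),
        ('ACME STORES', 'TELEVISION', '2001-01-15', 5);
""")
rows = conn.execute("""
    SELECT CLIENT.CUSNAME, SALES.NOSOLD
    FROM CLIENT, PRODUCT, TIME, SALES
    WHERE CLIENT.CUSNAME = SALES.CUSNAME
      AND PRODUCT.PRODNAME = SALES.PRODNAME
      AND TIME.DATE = SALES.DATE
      AND CLIENT.CUSNAME = 'BEST BUY'
      AND PRODUCT.PRODNAME = 'TELEVISION'
      AND TIME.DATE = '2001-01-15'
""").fetchall()
print(rows)  # [('BEST BUY', 42)]
```

Because the fact table's primary key is the concatenation of the three dimension keys, fixing all three dimensions identifies at most one fact row.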
-
Warehouse Users
Analysts
Managers
Executives
Operational personnel
Customers and suppliers
-
Warehouse Tools and Applications
SQL queries
Managed query environments
Structured and ad hoc reports
DSS/EIS
Portals
Data mining
Packaged applications
Custom-built applications
-
Recent Development: Growing Dominance of MS SQL Server 7.0 with OLAP Services
Low cost, integration of bundled DSS components from one vendor, and extended SQL for OLAP
Competitors are either leaving the market or repositioning their products to be complementary
-
Owens & Minor
Owens & Minor: data warehousing has supported integration along the supply chain.
Winner of the 1999 TDWI Leadership Award
The nation's leading distributor of name-brand medical and surgical supplies
Has transformed its business model by integrating supply chain management, e-business, data warehousing, and Internet technologies
As part of this initiative, WISDOM (Web Intelligence Supporting Decisions from Owens & Minor) has been especially valuable
-
WISDOM
A web-based decision support system that provides information to O&M's employees, suppliers, and customers
Accesses data from a data warehouse that maintains supplier and customer transaction data
Sold to trading partners as a value-added product
WISDOM II provides data about the transactions that suppliers and customers have with all of their trading partners
-
Sample Applications
Supports reporting and queries for internal personnel
Supports an EIS for senior management
Suppliers can determine their market share in specific hospitals
Hospitals can identify which products are being bought off contract
WISDOM II extends data warehousing to trading partners through an outsourcing arrangement
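The supplier market-share application comes down to a simple ratio over the transaction data in the warehouse: a supplier's sales into a hospital divided by all suppliers' sales into that hospital. A sketch with made-up transactions; the hospital and supplier names are hypothetical.

```python
from collections import defaultdict


def market_share(transactions, supplier):
    """Return {hospital: supplier's share of that hospital's purchases}."""
    total = defaultdict(float)
    ours = defaultdict(float)
    for hospital, vendor, amount in transactions:
        total[hospital] += amount
        if vendor == supplier:
            ours[hospital] += amount
    return {h: ours[h] / total[h] for h in total if total[h] > 0}


sales = [
    ("General Hospital", "SupplierA", 300.0),
    ("General Hospital", "SupplierB", 100.0),
    ("County Hospital", "SupplierA", 50.0),
    ("County Hospital", "SupplierB", 150.0),
]
print(market_share(sales, "SupplierA"))
# {'General Hospital': 0.75, 'County Hospital': 0.25}
```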
-
Articles
Cooper, B.L., H.J. Watson, B.H. Wixom, and D.L. Goodhue, "Data Warehousing Supports Corporate Strategy at First American Corporation," MIS Quarterly (December 2000), pp. 547-567. Provides a case study of how the First American Corporation turned their strategy and fortunes around through the use of data warehousing.
Stoller, Wixom, and Watson, "WISDOM Provides Competitive Advantage at Owens & Minor" (http://terry.uga.edu/~watson/owens&minor.doc). Provides a case study of how data warehousing can support supply chain integration.
Watson, Wixom, Buonamica, and Revak, "Sherwin-Williams' Data Mart Strategy: Creating Intelligence Across the Supply Chain," Communications of ACIS (April 2001). Provides a textbook example of how to implement a data mart strategy.
Watson, H.J., D.A. Annino, B.H. Wixom, K.L. Avery, and M. Rutherford, "Current Practices in Data Warehousing," Information Systems Management (Winter 2001), pp. 47-55. Provides data on companies' data warehousing experiences, with an emphasis on the benefits being realized.
Watson, H.J. and L. Volonino, "Harrah's High Payoff from Customer Information" (http://www.terry.uga.edu/~hwatson/harrahs.doc). Provides a case study of how Harrah's Entertainment has implemented a CRM strategy facilitated by data warehousing.
-
Websites
http://www.olapreport.com (provides detailed information about the OLAP market, products, and applications)
http://www.firstlogic.com (includes an interactive demo of their data cleansing tool)
http://www.billinmon.com (a wealth of current information from the "father of data warehousing")
http://www.metagenix.com (illustrates recent advances in ETL tools)
http://www.microstrategy.com (excellent materials from one of the leading DSS vendors)
-
Questions