dw tutorial

Upload: robert-alberto-cornejo

Post on 05-Apr-2018


  • 7/31/2019 Dw Tutorial

    1/62

    Recent Developments in Data Warehousing

    Hugh J. Watson
    Terry College of Business
    University of Georgia
    [email protected]
    http://www.terry.uga.edu/~hwatson/dw_tutorial.ppt


    Operational Data Store

    An operational data store consolidates data from multiple source systems and provides a near real-time, integrated view of volatile, current data.

    Its purpose is to provide integrated data for operational purposes. It has add, change, and delete functionality.

    It may be created to avoid a full-blown ERP implementation.

    [Figure: data warehousing architecture, in five columns from left to right]

    Data Sources: transaction data (Prod, Mkt, HR, Fin, Acctg; IBM IMS, VSAM, Oracle, Sybase); other internal data (ERP: SAP; clickstream: Informix; web data); external data (demographic: Harte-Hanks)
    ETL Software: extract, clean/scrub, transform, load, with a staging area and an operational data store (Ascential, Sagent, SAS, Firstlogic, Informatica)
    Data Stores: data warehouse, meta data, and data marts for finance, marketing, and sales (Teradata, IBM, Essbase, Microsoft)
    Data Analysis Tools and Applications: SQL; queries, reporting, DSS/EIS, data mining; web browser (Cognos, SAS, MicroStrategy, Siebel, BusinessObjects)
    Users: analysts, managers, executives, operational personnel, customers/suppliers

    Topics Covered

    Definitions and concepts
    Two case studies: Harrah's Entertainment (first) and Owens & Minor (last)
    The data mart and enterprise-wide data warehouse strategies
    Data extraction, cleansing, transformation, and loading
    Meta data
    Data stores
    Online analytical processing (OLAP)
    Warehouse users, tools, and applications

    Harrah's Entertainment

    Harrah's Entertainment -- data warehousing supported a successful shift to a CRM-oriented corporate strategy. Winner of the 2000 TDWI Leadership Award
    Operates 21 casinos across the country
    In 1993, the gaming laws changed, which allowed Harrah's to expand
    Harrah's decided to compete using a brand strategy supported by information technology
    Needed to know their customers exceptionally well

    Harrah's Data Warehousing Architecture

    WINet sources data from the casino, hotel, and event systems
    The patron data base (PDB) serves as an operational data store
    The marketing workbench (MWB) serves as the data warehouse

    Sample Applications

    Operational personnel use PDB to check the preferences, history, and value of customers
    Analysts use PDB and MWB to create offers to visit a Harrah's casino
    Analysts use MWB to support predictive modeling efforts

    [Figure: marketing cycle diagram; only the labels are recoverable]

    Predict the value of a customer; market based on that expected value; execute (right offer, right message, right time); track transactions that are linked to marketing initiatives; evaluate the effectiveness; track profitability; refine marketing approaches
    Define (objectives, tests, control cells); customer treatment; customer action/non-action; track; measure (profit & loss, behavior change, new test report); learn

    Customer Relationship Lifecycle

    [Figure: annual revenue plotted against length of relationship, with phases labeled Establish, Strengthen, and Reinvigorate]

    The Data Mart Strategy

    The most common approach
    Begins with a single mart, and architected marts are added over time for more subject areas
    Relatively inexpensive and easy to implement
    Can be used as a proof of concept for data warehousing
    Can perpetuate the "silos of information" problem
    Can postpone difficult decisions and activities
    Requires an overall integration plan

    The Enterprise-wide Strategy

    A comprehensive warehouse is built initially
    An initial dependent data mart is built using a subset of the data in the warehouse
    Additional data marts are built using subsets of the data in the warehouse
    Like all complex projects, it is expensive, time consuming, and prone to failure
    When successful, it results in an integrated, scalable warehouse

    Data Sources and Types

    Primarily from legacy, operational systems
    Almost exclusively numerical data at the present time
    External data may be included, often purchased from third-party sources
    Technology exists for storing unstructured data, and this is expected to become more important over time

    Extraction, Transformation, and Loading (ETL) Processes

    The "plumbing" work of data warehousing
    Data are moved from source to target data bases
    A very costly, time-consuming part of data warehousing

    Recent Development: Clickstream Data

    Results from clicks at web sites
    A dialog manager handles user interactions. An ODS helps to custom tailor the dialog
    The clickstream data is filtered and parsed and sent to a data warehouse, where it is analyzed
    Software is available to analyze the clickstream data
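The filter-and-parse step for clickstream data might look roughly like the sketch below. It assumes the clicks arrive as web server log lines in the Common Log Format; the sample lines, the `parse_click` helper, and the rule that drops image and stylesheet requests are illustrative assumptions, not something taken from the tutorial.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [timestamp] "METHOD path protocol" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical filter rule: requests for page furniture are not meaningful clicks.
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")


def parse_click(line):
    """Return a dict for a meaningful page request, or None if it is filtered out."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # malformed line
    if match["status"] != "200":
        return None  # keep only successful requests
    if match["path"].lower().endswith(IGNORED_SUFFIXES):
        return None  # drop images, stylesheets, and scripts
    return {
        "host": match["host"],
        "timestamp": datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z"),
        "path": match["path"],
    }


# Invented sample log lines.
sample_log = [
    '10.0.0.1 - - [15/Jan/2001:10:02:11 -0500] "GET /products/tv.html HTTP/1.0" 200 5120',
    '10.0.0.1 - - [15/Jan/2001:10:02:12 -0500] "GET /images/logo.gif HTTP/1.0" 200 830',
    '10.0.0.2 - - [15/Jan/2001:10:03:40 -0500] "GET /missing.html HTTP/1.0" 404 210',
]

# Only the parsed, filtered clicks would be sent on to the warehouse.
clicks = [c for c in (parse_click(line) for line in sample_log) if c is not None]
```

Only the first sample line survives the filter; the image request and the failed request are discarded before loading.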

    Recent Development: Further Automation of ETL Processes

    MetaRecon from Metagenix reverse engineers data into information
    Analyzes and profiles source systems
    Uncovers problems in source systems
    Recommends primary and secondary keys, dimensions and measures, etc.
    Generates ETL scripts

    Data Extraction

    Often performed by COBOL routines (not recommended because of high program maintenance and no automatically generated meta data)
    Sometimes source data is copied to the target database using the replication capabilities of a standard RDBMS (not recommended because of "dirty data" in the source systems)
    Increasingly performed by specialized ETL software

    Sample ETL Tools

    DataStage from Ascential Software
    SAS System from SAS Institute
    PowerMart/PowerCenter from Informatica
    Sagent Solution from Sagent Software
    Hummingbird Genio Suite from Hummingbird Communications

    Data Cleansing

    Source systems contain "dirty data" that must be cleansed
    ETL software contains rudimentary data cleansing capabilities
    Specialized data cleansing software is often used. Important for performing name and address correction and householding functions
    Leading data cleansing vendors include Vality (Integrity), Harte-Hanks (Trillium), and Firstlogic (i.d.Centric)


    Steps in Data Cleansing

    Parsing

    Correcting

    Standardizing

    Matching

    Consolidating

    Parsing

    Parsing locates and identifies individual data elements in the source files and then isolates these data elements in the target files.
    Examples include parsing the first, middle, and last name; street number and street name; and city and state.
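As a rough illustration of parsing, the sketch below splits free-form name and address fields into separate target fields. The helper names, the field names, and the sample record are hypothetical, and commercial cleansing tools handle far more variation than these simple rules do.

```python
import re


def parse_name(full_name):
    """Split a free-form name into first, middle, and last components."""
    parts = full_name.split()
    first = parts[0] if parts else ""
    last = parts[-1] if len(parts) > 1 else ""
    middle = " ".join(parts[1:-1])
    return {"first": first, "middle": middle, "last": last}


def parse_street(street_line):
    """Split a line such as '123 Main St' into street number and street name."""
    match = re.match(r"\s*(\d+)\s+(.*\S)\s*$", street_line)
    if match is None:
        # No leading number found: keep the whole line as the street name.
        return {"street_number": "", "street_name": street_line.strip()}
    return {"street_number": match.group(1), "street_name": match.group(2)}


def parse_city_state(city_state):
    """Split a line such as 'Athens, GA' into city and state."""
    city, _, state = city_state.partition(",")
    return {"city": city.strip(), "state": state.strip()}


# Invented source record with combined fields.
source_record = {
    "name": "Beth Christine Parker",
    "street": "12800 W. Fourth St.",
    "city_state": "Athens, GA",
}

# Target record with each data element isolated in its own field.
target_record = {
    **parse_name(source_record["name"]),
    **parse_street(source_record["street"]),
    **parse_city_state(source_record["city_state"]),
}
```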

    Correcting

    Corrects parsed individual data components using sophisticated data algorithms and secondary data sources.
    Examples include replacing a vanity address and adding a zip code.
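A minimal sketch of the correcting step follows, assuming the secondary data source is a pair of small lookup tables. The table contents, the zip code, and the `correct_address` helper are invented for illustration; a real tool would consult postal reference files.

```python
# Hypothetical secondary data sources, standing in for postal reference files.
VANITY_CITY_TO_POSTAL_CITY = {
    ("Five Points", "GA"): "Athens",
}
ZIP_BY_CITY_STATE = {
    ("Athens", "GA"): "30605",
}


def correct_address(record):
    """Return a corrected copy of a parsed address record."""
    corrected = dict(record)
    key = (corrected["city"], corrected["state"])
    # Replace a vanity place name with the postal city, if one is known.
    corrected["city"] = VANITY_CITY_TO_POSTAL_CITY.get(key, corrected["city"])
    # Add a missing zip code from the reference table, if one is known.
    if not corrected.get("zip"):
        key = (corrected["city"], corrected["state"])
        corrected["zip"] = ZIP_BY_CITY_STATE.get(key, "")
    return corrected


parsed = {
    "street_number": "12800",
    "street_name": "W. Fourth St.",
    "city": "Five Points",
    "state": "GA",
    "zip": "",
}
corrected = correct_address(parsed)
```

The function returns a copy, so the parsed input record stays available for auditing what was changed.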

    Standardizing

    Standardizing applies conversion routines to transform data into its preferred (and consistent) format using both standard and custom business rules.
    Examples include adding a pre name, replacing a nickname, and using a preferred street name.
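Standardizing can be pictured as applying rule tables to each parsed field. The sketch below assumes three small, made-up rule tables (nicknames, pre names, and street words) and a hypothetical `standardize` helper; actual products ship much larger rule sets and allow custom rules.

```python
# Hypothetical business rules for illustration only.
NICKNAMES = {"Beth": "Elizabeth", "Bill": "William", "Bob": "Robert"}
PRENAME_BY_FIRST_NAME = {"Elizabeth": "Ms.", "William": "Mr.", "Robert": "Mr."}
STREET_WORDS = {
    "St.": "Street", "St": "Street",
    "Ave.": "Avenue", "Ave": "Avenue",
    "W.": "West", "W": "West",
    "Fourth": "4th",
}


def standardize(record):
    """Return a copy of a parsed record converted to the preferred format."""
    standardized = dict(record)
    # Replace a nickname with the formal given name.
    standardized["first"] = NICKNAMES.get(record["first"], record["first"])
    # Add a pre name when none is present and a rule supplies one.
    if not standardized.get("prename"):
        standardized["prename"] = PRENAME_BY_FIRST_NAME.get(standardized["first"], "")
    # Rewrite the street name word by word into the preferred form.
    words = record["street_name"].split()
    standardized["street_name"] = " ".join(STREET_WORDS.get(w, w) for w in words)
    return standardized


record = {
    "prename": "",
    "first": "Beth",
    "middle": "Christine",
    "last": "Parker",
    "street_number": "12800",
    "street_name": "W. Fourth St.",
}
result = standardize(record)
```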

    Matching

    Searching and matching records within and across the parsed, corrected, and standardized data based on predefined business rules to eliminate duplications.
    Examples include identifying similar names and addresses.
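One simple way to identify similar names and addresses is approximate string comparison. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for a vendor's matching algorithm; the 0.85 threshold, the choice of match key, and the sample records are assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical business rule: treat records as the same party at or above this score.
SIMILARITY_THRESHOLD = 0.85


def match_key(record):
    """Build the comparison string from standardized name and address fields."""
    return f'{record["first"]} {record["last"]} {record["street"]}'.lower()


def similarity(a, b):
    """Return a score between 0 and 1 for how alike two records are."""
    return SequenceMatcher(None, match_key(a), match_key(b)).ratio()


def find_matches(records):
    """Return pairs of record ids whose similarity meets the threshold."""
    return [
        (a["id"], b["id"])
        for a, b in combinations(records, 2)
        if similarity(a, b) >= SIMILARITY_THRESHOLD
    ]


# Invented sample records; record 2 has a misspelled last name.
records = [
    {"id": 1, "first": "Elizabeth", "last": "Parker", "street": "12800 West 4th Street"},
    {"id": 2, "first": "Elizabeth", "last": "Parkar", "street": "12800 West 4th Street"},
    {"id": 3, "first": "William", "last": "Jones", "street": "77 Oak Avenue"},
]
matches = find_matches(records)
```

Comparing every pair is quadratic in the number of records, so this only suits small batches; larger volumes typically need some way of limiting which records are compared.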

    Consolidating

    Analyzing and identifying relationships between matched records and consolidating/merging them into ONE representation.
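Consolidating matched records requires a rule for which value survives in each field. The sketch below assumes a hypothetical survivorship rule (take the first non-empty value, scanning from the most recently updated record) and invented sample data; other rules, such as preferring a trusted source system, are equally plausible.

```python
def consolidate(matched_records):
    """Merge records for the same customer into ONE representation.

    Hypothetical survivorship rule: for each field, keep the first
    non-empty value found when scanning from the newest record.
    """
    newest_first = sorted(matched_records, key=lambda r: r["updated"], reverse=True)
    merged = {}
    for record in newest_first:
        for field, value in record.items():
            if field == "id":
                continue  # source ids are collected separately below
            if value and not merged.get(field):
                merged[field] = value
    # Keep a trail back to the source records that were merged.
    merged["source_ids"] = sorted(r["id"] for r in matched_records)
    return merged


# Invented matched records; ISO dates sort correctly as plain strings.
matched = [
    {"id": 1, "name": "Elizabeth Parker", "phone": "", "zip": "30605", "updated": "2000-11-02"},
    {"id": 2, "name": "Elizabeth Parker", "phone": "555-0100", "zip": "", "updated": "2001-01-15"},
]
golden = consolidate(matched)
```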

    Data Staging

    Often used as an interim step between data extraction and later steps
    Accumulates data from asynchronous sources using native interfaces, flat files, FTP sessions, or other processes
    At a predefined cutoff time, data in the staging file is transformed and loaded to the warehouse
    There is usually no end user access to the staging file
    An operational data store may be used for data staging
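The cutoff-driven behavior of a staging area can be sketched as below. In-memory lists stand in for the staging file and the warehouse, and the source names, payloads, and upper-casing "transformation" are placeholders invented for the example.

```python
from datetime import datetime

staging_area = []  # stand-in for the staging file
warehouse = []     # stand-in for the warehouse tables


def stage(source, payload, arrived_at):
    """Accumulate a record from an asynchronous source."""
    staging_area.append({"source": source, "payload": payload, "arrived_at": arrived_at})


def load_at_cutoff(cutoff):
    """Transform and load every record staged before the cutoff time."""
    ready = [r for r in staging_area if r["arrived_at"] < cutoff]
    for record in ready:
        # Placeholder transformation: a real job would apply the business rules here.
        warehouse.append({"source": record["source"], "payload": record["payload"].upper()})
    # Records that arrived at or after the cutoff wait for the next cycle.
    staging_area[:] = [r for r in staging_area if r["arrived_at"] >= cutoff]
    return len(ready)


stage("flat_file", "order 1001", datetime(2001, 1, 15, 18, 30))
stage("ftp", "order 1002", datetime(2001, 1, 15, 23, 10))
stage("native_interface", "order 1003", datetime(2001, 1, 16, 0, 45))

# Midnight cutoff: the first two records load, the third stays staged.
loaded = load_at_cutoff(datetime(2001, 1, 16, 0, 0))
```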

    Data Transformation

    Transforms the data in accordance with the business rules and standards that have been established
    Examples include: format changes, deduplication, splitting up fields, replacement of codes, derived values, and aggregates
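The transformation types listed above can each be shown in a few lines. The sketch below is illustrative only: the column names, the gender code table, and the sample rows are assumptions, and a production ETL tool would express these rules in its own mapping language.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical source-system codes and their warehouse replacements.
GENDER_CODES = {"1": "F", "2": "M"}


def transform(rows):
    """Apply row-level transformations and drop duplicate orders."""
    seen_order_ids = set()
    cleaned = []
    for row in rows:
        if row["order_id"] in seen_order_ids:  # deduplication
            continue
        seen_order_ids.add(row["order_id"])
        # Splitting up a field: "Athens, GA" becomes city and state.
        city, state = (part.strip() for part in row["city_state"].split(","))
        cleaned.append({
            "order_id": row["order_id"],
            # Format change: MM/DD/YYYY becomes ISO YYYY-MM-DD.
            "order_date": datetime.strptime(row["order_date"], "%m/%d/%Y").date().isoformat(),
            "city": city,
            "state": state,
            # Replacement of codes; unknown codes become "U".
            "gender": GENDER_CODES.get(row["gender"], "U"),
            # Derived value.
            "amount": row["units"] * row["unit_price"],
        })
    return cleaned


def total_by_state(cleaned):
    """Aggregate: total sales amount per state."""
    totals = defaultdict(int)
    for row in cleaned:
        totals[row["state"]] += row["amount"]
    return dict(totals)


# Invented source rows; the second row duplicates the first.
rows = [
    {"order_id": 1, "order_date": "01/15/2001", "city_state": "Athens, GA", "gender": "1", "units": 2, "unit_price": 300},
    {"order_id": 1, "order_date": "01/15/2001", "city_state": "Athens, GA", "gender": "1", "units": 2, "unit_price": 300},
    {"order_id": 2, "order_date": "01/16/2001", "city_state": "Atlanta, GA", "gender": "2", "units": 1, "unit_price": 450},
    {"order_id": 3, "order_date": "01/16/2001", "city_state": "Memphis, TN", "gender": "9", "units": 3, "unit_price": 100},
]
cleaned = transform(rows)
totals = total_by_state(cleaned)
```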

    Meta Data

    Data about data
    Needed by both information technology personnel and users
    IT personnel need to know data sources and targets; database, table, and column names; refresh schedules; data usage measures; etc.
    Users need to know entity/attribute definitions; reports/query tools available; report distribution information; help desk contact information; etc.

    Recent Development: Meta Data Integration

    A growing realization that meta data is critical to data warehousing success
    Progress is being made on getting vendors to agree on standards and to incorporate the sharing of meta data among their tools
    Vendors like Microsoft, Computer Associates, and Oracle have entered the meta data marketplace with significant product offerings

    Database Vendors

    High-end (i.e., terabyte-plus) vendors include IBM (DB2) and NCR-Teradata (Teradata)
    Oracle (8i) and Microsoft (SQL Server 7) are major players for smaller databases

    On-line Analytical Processing (OLAP)

    A set of functionality that facilitates multidimensional analysis
    Allows users to analyze data in ways that are natural to them
    Comes in many varieties -- ROLAP, MOLAP, DOLAP, etc.

    ROLAP

    Relational OLAP
    Uses an RDBMS to implement an OLAP environment
    Typically involves a star schema to provide the multidimensional capabilities
    OLAP tool manipulates RDBMS star schema data
    Called "slowlap" by MOLAP vendors

    Star Schema

    Creates non-normalized data structures
    Easier for users to understand
    Optimized for OLAP
    Uses fact (facts or measures in the business) and dimension (establishes the context of the facts) tables
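A small star schema can be sketched with Python's built-in `sqlite3` module, as below. The table and column names, the surrogate keys, and the sample rows are invented for illustration and loosely follow the retail examples used elsewhere in this tutorial; they are not the schema of any system described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables establish the context of the facts.
    CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, store_name TEXT, zip_code TEXT);
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE date_dim    (date_key    INTEGER PRIMARY KEY, full_date TEXT, day_of_week TEXT);

    -- The fact table holds the measures; its key combines the dimension keys.
    CREATE TABLE sales_fact (
        store_key    INTEGER REFERENCES store_dim(store_key),
        product_key  INTEGER REFERENCES product_dim(product_key),
        date_key     INTEGER REFERENCES date_dim(date_key),
        units_sold   INTEGER,
        sales_amount REAL,
        PRIMARY KEY (store_key, product_key, date_key)
    );

    INSERT INTO store_dim   VALUES (1, 'Downtown', '30601'), (2, 'Mall', '30606');
    INSERT INTO product_dim VALUES (1, 'Television', 'Electronics'), (2, 'Radio', 'Electronics');
    INSERT INTO date_dim    VALUES (1, '2001-01-15', 'Monday'), (2, '2001-01-16', 'Tuesday');
    INSERT INTO sales_fact  VALUES (1, 1, 1, 3, 900.0), (1, 2, 1, 5, 250.0),
                                   (2, 1, 1, 2, 600.0), (2, 1, 2, 4, 1200.0);
""")

# A typical multidimensional question: units sold by product and day of week.
rollup = conn.execute("""
    SELECT p.product_name, d.day_of_week, SUM(f.units_sold)
    FROM sales_fact f
    JOIN product_dim p ON p.product_key = f.product_key
    JOIN date_dim d    ON d.date_key = f.date_key
    GROUP BY p.product_name, d.day_of_week
    ORDER BY p.product_name, d.day_of_week
""").fetchall()
```

Every analytical query follows the same shape: join the fact table to the dimensions of interest, then group by dimension attributes and aggregate the measures.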

    OLAP Tools

    Products come from vendors such as Brio, Cognos, Hyperion, and BusinessObjects
    Typically available as a fat or thin (i.e., browser) client
    In a web environment, the browser communicates with a web server, which talks to an application server, which connects to backend databases
    The application server provides query, reporting, and OLAP analysis functionality over the web
    Java applets or downloaded components augment the thin client
    A broadcast server may be used to schedule, run, publish, and broadcast reports, alerts, and responses over the LAN, email, or personal digital assistant.

    Dimension Table Examples

    Retail -- store name, zip code, product name, product category, day of week
    Telecommunications -- call origin, call destination
    Banking -- customer name, account number, branch, account officer
    Insurance -- policy type, insured party

    Fact Table Examples

    Retail -- number of units sold, sales amount
    Telecommunications -- length of call in minutes, average number of calls
    Banking -- average monthly balance
    Insurance -- claims amount

    The Fact Table Key Concatenates the Dimension Keys

    Assume that you want to know the number of television sets sold to Best Buy on January 15, 2001. The query might be:

    SELECT CLIENT.CUSNAME, SALES.NOSOLD
    FROM CLIENT, PRODUCT, TIME, SALES
    WHERE CLIENT.CUSNAME = SALES.CUSNAME
      AND PRODUCT.PRODNAME = SALES.PRODNAME
      AND TIME.DATE = SALES.DATE
      AND CLIENT.CUSNAME = 'BEST BUY'
      AND PRODUCT.PRODNAME = 'TELEVISION'
      AND TIME.DATE = #01/15/2001#
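To try the query, the sketch below builds the four tables in an in-memory SQLite database using Python's `sqlite3` module. The table and column names come from the slide; the sample rows, the ISO date strings, the explicit JOIN syntax, and the bound parameters are assumptions made so the example runs on SQLite, which does not accept the `#...#` date literal. TIME and DATE are quoted because they double as type names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CLIENT  (CUSNAME  TEXT PRIMARY KEY);
    CREATE TABLE PRODUCT (PRODNAME TEXT PRIMARY KEY);
    CREATE TABLE "TIME"  ("DATE"   TEXT PRIMARY KEY);

    -- The fact table key concatenates the three dimension keys.
    CREATE TABLE SALES (
        CUSNAME  TEXT REFERENCES CLIENT(CUSNAME),
        PRODNAME TEXT REFERENCES PRODUCT(PRODNAME),
        "DATE"   TEXT REFERENCES "TIME"("DATE"),
        NOSOLD   INTEGER,
        PRIMARY KEY (CUSNAME, PRODNAME, "DATE")
    );

    -- Invented sample rows.
    INSERT INTO CLIENT  VALUES ('BEST BUY'), ('OTHER RETAILER');
    INSERT INTO PRODUCT VALUES ('TELEVISION'), ('RADIO');
    INSERT INTO "TIME"  VALUES ('2001-01-15'), ('2001-01-16');
    INSERT INTO SALES   VALUES
        ('BEST BUY', 'TELEVISION', '2001-01-15', 40),
        ('BEST BUY', 'TELEVISION', '2001-01-16', 25),
        ('BEST BUY', 'RADIO', '2001-01-15', 10),
        ('OTHER RETAILER', 'TELEVISION', '2001-01-15', 7);
""")

query = """
    SELECT CLIENT.CUSNAME, SALES.NOSOLD
    FROM SALES
    JOIN CLIENT  ON CLIENT.CUSNAME   = SALES.CUSNAME
    JOIN PRODUCT ON PRODUCT.PRODNAME = SALES.PRODNAME
    JOIN "TIME"  ON "TIME"."DATE"    = SALES."DATE"
    WHERE CLIENT.CUSNAME = ? AND PRODUCT.PRODNAME = ? AND "TIME"."DATE" = ?
"""
# Bound parameters select one customer, one product, and one day.
rows = conn.execute(query, ("BEST BUY", "TELEVISION", "2001-01-15")).fetchall()
```

Because the three dimension values together form the fact table's key, the query returns at most one row.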

    Warehouse Users

    Analysts
    Managers
    Executives
    Operational personnel
    Customers and suppliers

    Warehouse Tools and Applications

    SQL queries
    Managed query environments
    Structured and ad hoc reports
    DSS/EIS
    Portals
    Data mining
    Packaged applications
    Custom-built applications

    Recent Development: Growing Dominance of MS SQL Server 7.0 with OLAP Services

    Low cost, integration of bundled DSS components from one vendor, and extended SQL for OLAP
    Competitors are either leaving the market or are repositioning their products to be complementary

    Owens & Minor

    Owens & Minor -- data warehousing has supported integration along the supply chain. Winner of the 1999 TDWI Leadership Award
    The nation's leading distributor of name-brand medical and surgical supplies
    Has transformed its business model by integrating supply chain management, e-business, data warehousing, and Internet technologies
    As part of this initiative, WISDOM (Web Intelligence Supporting Decisions from Owens & Minor) has been especially valuable

    WISDOM

    A Web-based decision support system that provides information to O&M's employees, suppliers, and customers
    Accesses data from a data warehouse that maintains supplier and customer transaction data
    Sold to trading partners as a value-added product
    WISDOM II provides data about the transactions that suppliers and customers have with all of their trading partners

    Sample Applications

    Supports reporting and queries for internal personnel
    Supports an EIS for senior management
    Suppliers can determine their market share in specific hospitals
    Hospitals can identify which products are being bought off contract
    WISDOM II extends data warehousing to trading partners through an outsourcing arrangement

    Articles

    Cooper, B.L., H.J. Watson, B.H. Wixom, and D.L. Goodhue, "Data Warehousing Supports Corporate Strategy at First American Corporation," MIS Quarterly (December 2000), pp. 547-567. Provides a case study of how the First American Corporation turned their strategy and fortunes around through the use of data warehousing.

    Stoller, Wixom, and Watson, "WISDOM Provides Competitive Advantage at Owens & Minor" (http://terry.uga.edu/~watson/owens&minor.doc). Provides a case study of how data warehousing can support supply chain integration.

    Watson, Wixom, Buonamica, and Revak, "Sherwin-Williams' Data Mart Strategy: Creating Intelligence Across the Supply Chain," Communications of ACIS, April 2001. Provides a textbook example of how to implement a data mart strategy.

    Watson, H.J., D.A. Annino, B.H. Wixom, K.L. Avery, and M. Rutherford, "Current Practices in Data Warehousing," Information Systems Management (Winter 2001), pp. 47-55. Provides data on companies' data warehousing experiences, with an emphasis on the benefits being realized.

    Watson, H.J. and L. Volonino, "Harrah's High Payoff from Customer Information" (http://www.terry.uga.edu/~hwatson/harrahs.doc). Provides a case study of how Harrah's Entertainment has implemented a CRM strategy facilitated by data warehousing.

    Websites

    http://www.olapreport.com (provides detailed information about the OLAP market, products, and applications)
    http://www.firstlogic.com (includes an interactive demo of their data cleansing tool)
    http://www.billinmon.com (a wealth of current information from the "father of data warehousing")
    http://www.metagenix.com (illustrates recent advances in ETL tools)
    http://www.microstrategy.com (excellent materials from one of the leading DSS vendors)


    Questions
