krithi talk impact

Upload: idhayasakthi

Post on 05-Apr-2018

  • 7/31/2019 Krithi Talk Impact

    DATA WAREHOUSING AND DATA MINING

    S. Sudarshan, Krithi Ramamritham

    IIT Bombay

    [email protected]@cse.iitb.ernet.in

    Course Overview

    The course: what and how

    0. Introduction
    I. Data Warehousing
    II. Decision Support and OLAP
    III. Data Mining
    IV. Looking Ahead

    Demos and Labs

    0. Introduction

    Data Warehousing, OLAP and data mining: what and why (now)?
    Relation to OLTP
    A case study

    demos, labs

    A producer wants to know...

    Which are our lowest/highest margin customers?
    Who are my customers and what products are they buying?
    Which customers are most likely to go to the competition?
    What impact will new products/services have on revenue and margins?
    What product promotions have the biggest impact on revenue?
    What is the most effective distribution channel?

    Data, Data everywhere yet ...

    I can't find the data I need
      data is scattered over the network
      many versions, subtle differences
    I can't get the data I need
      need an expert to get the data
    I can't understand the data I found
      available data poorly documented
    I can't use the data I found
      results are unexpected
      data needs to be transformed from one form to other

    What are the users saying...

    Data should be integrated across the enterprise
    Summary data has a real value to the organization
    Historical data holds the key to understanding data over time
    What-if capabilities are required

    What is Data Warehousing?

    A process of transforming data into information and making it available to users in a timely enough manner to make a difference
    [Forrester Research, April 1996]

    (Figure: Data -> Information)

    Evolution

    60s: Batch reports
      hard to find and analyze information
      inflexible and expensive, reprogram every new request
    70s: Terminal-based DSS and EIS (executive information systems)
      still inflexible, not integrated with desktop tools
    80s: Desktop data access and analysis tools
      query tools, spreadsheets, GUIs
      easier to use, but only access operational databases
    90s: Data warehousing with integrated OLAP engines and tools

    Warehouses are Very Large Databases

    (Figure: survey of warehouse database sizes, initial vs. projected, 2Q96. Y-axis: percentage of respondents, 0% to 35%. X-axis size bands: <5GB, 5-9GB, 10-19GB, 20-49GB, 50-99GB, 100-249GB, 250-499GB, 500GB-1TB.)

    Source: META Group, Inc.

    Very Large Data Bases

    Terabytes -- 10^12 bytes: Walmart -- 24 Terabytes
    Petabytes -- 10^15 bytes: Geographic Information Systems
    Exabytes -- 10^18 bytes: National Medical Records
    Zettabytes -- 10^21 bytes: Weather images
    Yottabytes -- 10^24 bytes: Intelligence Agency Videos

    Data Warehousing -- It is a process

    Technique for assembling and managing data from various sources for the purpose of answering business questions, thus making decisions that were not previously possible
    A decision support database maintained separately from the organization's operational database

    Data Warehouse

    A data warehouse is a
      subject-oriented
      integrated
      time-varying
      non-volatile
    collection of data that is used primarily in organizational decision making.

    -- Bill Inmon, Building the Data Warehouse 1996

    Explorers, Farmers and Tourists

    Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data
    Farmers: Harvest information from known access paths
    Tourists: Browse information harvested by farmers

    Data Warehouse Architecture

    (Figure: components -- source systems (Relational Databases, Legacy Data, Purchased Data, ERP Systems), Extraction/Cleansing, Optimized Loader, Data Warehouse Engine, Metadata Repository, Analyze/Query)

    Data Warehouse for Decision Support & OLAP

    Putting information technology to help the knowledge worker make faster and better decisions
      Which of my customers are most likely to go to the competition?
      What product promotions have the biggest impact on revenue?
      How did the share price of software companies correlate with profits over the last 10 years?

    Decision Support

    Used to manage and control business
    Data is historical or point-in-time
    Optimized for inquiry rather than update
    Use of the system is loosely defined and can be ad-hoc
    Used by managers and end-users to understand the business and make judgements

    Data Mining works with Warehouse Data

    Data Warehousing provides the Enterprise with a memory
    Data Mining provides the Enterprise with intelligence

    We want to know ...

    Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
    Which types of transactions are likely to be fraudulent given the demographics and transactional history of a particular customer?
    If I raise the price of my product by Rs. 2, what is the effect on my ROI?
    If I offer only 2,500 airline miles as an incentive to purchase rather than 5,000, how many lost responses will result?
    If I emphasize ease-of-use of the product as opposed to its technical capabilities, what will be the net effect on my revenues?
    Which of my customers are likely to be the most loyal?

    Data Mining helps extract such information

    Data Mining in Use

    The US Government uses Data Mining to track fraud
    A Supermarket becomes an information broker
    Basketball teams use it to track game strategy
    Cross Selling
    Warranty Claims Routing
    Holding on to Good Customers
    Weeding out Bad Customers

    What makes data mining possible?

    Advances in the following areas are making data mining deployable:
      data warehousing
      better and more data (i.e., operational, behavioral, and demographic)
      the emergence of easily deployed data mining tools and
      the advent of new data mining techniques.
    -- Gartner Group

    Why Separate Data Warehouse?

    Performance
      Operational databases are designed and tuned for known transactions and workloads.
      Complex OLAP queries would degrade performance for operational transactions.
      Special data organization, access and implementation methods are needed for multidimensional views and queries.
    Function
      Missing data: Decision support requires historical data, which operational databases do not typically maintain.
      Data consolidation: Decision support requires consolidation (aggregation, summarization) of data from many heterogeneous sources: operational databases, external sources.
      Data quality: Different sources typically use inconsistent data representations, codes, and formats which have to be reconciled.

    What are Operational Systems?

    They are OLTP systems
    Run mission critical applications
    Need to work with stringent performance requirements for routine tasks
    Used to run a business!

    RDBMS used for OLTP

    Database systems have been used traditionally for OLTP
      clerical data processing tasks
      detailed, up-to-date data
      structured repetitive tasks
      read/update a few records
      isolation, recovery and integrity are critical

    Operational Systems

    Run the business in real time
    Based on up-to-the-second data
    Optimized to handle large numbers of simple read/write transactions
    Optimized for fast response to predefined transactions
    Used by people who deal with customers, products -- clerks, salespeople etc.
    They are increasingly used by customers

    Examples of Operational Data

    Data               | Industry           | Usage                        | Technology                                             | Volumes
    Customer File      | All                | Track Customer Details       | Legacy application, flat files, mainframes             | Small-medium
    Account Balance    | Finance            | Control account activities   | Legacy applications, hierarchical databases, mainframe | Large
    Point-of-Sale data | Retail             | Generate bills, manage stock | ERP, Client/Server, relational databases               | Very Large
    Call Record        | Telecommunications | Billing                      | Legacy application, hierarchical database, mainframe   | Very Large
    Production Record  | Manufacturing      | Control Production           | ERP, relational databases, AS/400                      | Medium

    So, what's different?

    Application-Orientation vs. Subject-Orientation

    Application-Orientation: Operational Database -- Loans, Credit Card, Trust, Savings
    Subject-Orientation: Data Warehouse -- Customer, Vendor, Product, Activity

    OLTP vs. Data Warehouse

    OLTP systems are tuned for known transactions and workloads, while the workload is not known a priori in a data warehouse
    Special data organization, access methods and implementation methods are needed to support data warehouse queries (typically multidimensional queries)
      e.g., average amount spent on phone calls between 9AM-5PM in Pune during the month of December
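    Below is a minimal sketch of that multidimensional query using Python's built-in `sqlite3` module. The `calls` table, its columns and the sample rows are hypothetical. The point is that a single question restricts the data along three dimensions at once: city, hour of day and month.

```python
import sqlite3

# Hypothetical call-detail table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (city TEXT, call_time TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [
        ("Pune", "2018-12-03 10:15", 40.0),
        ("Pune", "2018-12-10 16:30", 20.0),
        ("Pune", "2018-12-11 21:00", 90.0),    # outside 9AM-5PM
        ("Mumbai", "2018-12-03 11:00", 55.0),  # different city
        ("Pune", "2018-11-20 12:00", 70.0),    # different month
    ],
)

# Hours '09' to '16' cover calls starting from 9:00 AM up to 4:59 PM.
avg_amount = conn.execute(
    """
    SELECT AVG(amount) FROM calls
    WHERE city = 'Pune'
      AND strftime('%m', call_time) = '12'
      AND strftime('%H', call_time) BETWEEN '09' AND '16'
    """
).fetchone()[0]
print(avg_amount)  # 30.0
```

    In a warehouse, the same question would more typically run against a fact table joined to time and location dimension tables, not a single flat table.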

    OLTP vs Data Warehouse

    OLTP                    | Warehouse (DSS)
    Application oriented    | Subject oriented
    Used to run business    | Used to analyze business
    Detailed data           | Summarized and refined
    Current, up to date     | Snapshot data
    Isolated data           | Integrated data
    Repetitive access       | Ad-hoc access
    Clerical user           | Knowledge user (manager)

    OLTP vs Data Warehouse

    OLTP                                   | Data Warehouse
    Performance sensitive                  | Performance relaxed
    Few records accessed at a time (tens)  | Large volumes accessed at a time (millions)
    Read/update access                     | Mostly read (batch update)
    No data redundancy                     | Redundancy present
    Database size 100MB - 100GB            | Database size 100GB - few terabytes

    OLTP vs Data Warehouse

    OLTP                                              | Data Warehouse
    Transaction throughput is the performance metric  | Query throughput is the performance metric
    Thousands of users                                | Hundreds of users
    Managed in entirety                               | Managed by subsets

    To summarize ...

    OLTP Systems are used to run a business
    The Data Warehouse helps to optimize the business

    Why Now?

    Data is being produced
    ERP provides clean data
    The computing power is available
    The computing power is affordable
    The competitive pressures are strong
    Commercial products are available

    Myths surrounding OLAP Servers and Data Marts

    Data marts and OLAP servers are departmental solutions supporting a handful of users
    Million dollar massively parallel hardware is needed to deliver fast time for complex queries
    OLAP servers require massive and unwieldy indices
    Complex OLAP queries clog the network with data
    Data warehouses must be at least 100 GB to be effective

    Source -- Arbor Software Home Page

    Wal*Mart Case Study

    Founded by Sam Walton
    One of the largest supermarket chains in the US
      Wal*Mart: 2000+ retail stores
      SAM's Clubs: 100+ wholesaler stores
    This case study is from Felipe Carino's (NCR Teradata) presentation made at the Stanford Database Seminar

    Old Retail Paradigm

    Wal*Mart
      Inventory Management
      Merchandise Accounts Payable
      Purchasing
      Supplier Promotions: National, Region, Store Level
    Suppliers
      Accept Orders
      Promote Products
      Provide special Incentives
      Monitor and Track The Incentives
      Bill and Collect Receivables
      Estimate Retailer Demands

    New (Just-In-Time) Retail Paradigm

    No more deals
    Shelf-Pass Through (POS Application)
      One unit price
      Suppliers paid once a week on ACTUAL items sold
      Wal*Mart Manager: daily inventory restock
      Suppliers (sometimes same day) ship to Wal*Mart
    Warehouse-Pass Through
      Stock some large items
      Delivery may come from supplier
      Distribution Center
      Supplier's merchandise unloaded directly onto Wal*Mart trucks

    Wal*Mart System

    NCR 5100M: 96 nodes; 24 TB raw disk; 700-1000 Pentium CPUs
    Number of rows: > 5 billion
    Historical data: 65 weeks (5 quarters)
    New daily volume: current apps 75 million; new apps 100 million +
    Number of users: thousands
    Number of queries: 60,000 per week

    Course Overview

    0. Introduction
    I. Data Warehousing
    II. Decision Support and OLAP
    III. Data Mining
    IV. Looking Ahead

    Demos and Labs

    I. Data Warehouses: Architecture, Design & Construction

    DW Architecture
    Loading, refreshing
    Structuring/Modeling
    DWs and Data Marts
    Query Processing

    demos, labs

    Data Warehouse Architecture

    (Figure: components -- source systems (Relational Databases, Legacy Data, Purchased Data, ERP Systems), Extraction/Cleansing, Optimized Loader, Data Warehouse Engine, Metadata Repository, Analyze/Query)

    Components of the Warehouse

    Data Extraction and Loading
    The Warehouse
    Analyze and Query -- OLAP Tools
    Metadata
    Data Mining tools

    Loading the Warehouse

    Cleaning the data before it is loaded

    Source Data

    Typically host based, legacy applications
      Customized applications, COBOL, 3GL, 4GL
    Point of Contact Devices
      POS, ATM, Call switches
    External Sources
      Nielsen, Acxiom, CMIE, Vendors, Partners

    (Figure: Operational/Source Data -- Sequential, Legacy, Relational, External)

    Data Quality - The Reality

    It is tempting to think that creating a data warehouse is simply extracting operational data and entering it into a data warehouse
    Nothing could be farther from the truth
    Warehouse data comes from disparate, questionable sources

    Data Quality - The Reality

    Legacy systems no longer documented
    Outside sources with questionable quality procedures
    Production systems with no built-in integrity checks and no integration
      Operational systems are usually designed to solve a specific business problem and are rarely developed to a corporate plan
      "And get it done quickly, we do not have time to worry about corporate standards..."

    Data Integration Across Sources

    (Figure: sources Trust, Credit card, Savings, Loans)

    Same data, different name
    Different data, same name
    Data found here, nowhere else
    Different keys, same data

    Data Transformation Example

    appl A - balance         appl B - bal             appl C - currbal          appl D - balcurr
    appl A - pipeline - cm   appl B - pipeline - in   appl C - pipeline - feet  appl D - pipeline - yds
    appl A - m,f             appl B - 1,0             appl C - x,y              appl D - male, female

    (Figure: each group of variants is mapped into one consistent form in the Data Warehouse)
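    The mappings above can be sketched as a small transformation step in Python. The dictionaries restate the example. Several details are assumptions, not taken from the source: which code means male or female for applications B and C, the record layouts, and centimetres as the warehouse unit.

```python
# Hypothetical per-application conventions, restating the example above.
BALANCE_FIELD = {"A": "balance", "B": "bal", "C": "currbal", "D": "balcurr"}
PIPELINE_UNIT = {"A": "cm", "B": "in", "C": "feet", "D": "yds"}
CM_PER_UNIT = {"cm": 1.0, "in": 2.54, "feet": 30.48, "yds": 91.44}
GENDER_CODE = {
    "A": {"m": "M", "f": "F"},
    "B": {"1": "M", "0": "F"},  # assumption: 1 = male, 0 = female
    "C": {"x": "M", "y": "F"},  # assumption: x = male, y = female
    "D": {"male": "M", "female": "F"},
}

def to_warehouse(app: str, record: dict) -> dict:
    """Map one source record onto the warehouse's single representation."""
    return {
        "balance": record[BALANCE_FIELD[app]],
        "pipeline_cm": record["pipeline"] * CM_PER_UNIT[PIPELINE_UNIT[app]],
        "gender": GENDER_CODE[app][str(record["gender"]).lower()],
    }

rows = [
    to_warehouse("A", {"balance": 100, "pipeline": 254, "gender": "f"}),
    to_warehouse("B", {"bal": 100, "pipeline": 100, "gender": 0}),
    to_warehouse("D", {"balcurr": 100, "pipeline": 1, "gender": "Female"}),
]
```

    Each source keeps its own conventions. Only the mapping tables need to change when a new source application is added.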

    Data Integrity Problems

    Same person, different spellings
      Agarwal, Agrawal, Aggarwal etc...
    Multiple ways to denote company name
      Persistent Systems, PSPL, Persistent Pvt. LTD.
    Use of different names
      mumbai, bombay
    Different account numbers generated by different applications for the same customer
    Required fields left blank
    Invalid product codes collected at point of sale
      manual entry leads to mistakes
      "in case of a problem use 9999999"

    Data Transformation Terms

    Extracting
    Conditioning
    Scrubbing
    Merging
    Householding
    Enrichment
    Scoring
    Loading
    Validating
    Delta Updating

    Data Transformation Terms

    Extracting
      Capture of data from operational source in "as is" status
      Sources for data generally in legacy mainframes in VSAM, IMS, IDMS, DB2; more data today in relational databases on Unix
    Conditioning
      The conversion of data types from the source to the target data store (warehouse) -- always a relational database

    Data Transformation Terms

    Householding
      Identifying all members of a household (living at the same address)
      Ensures only one mail is sent to a household
      Can result in substantial savings: 1 lakh catalogues at Rs. 50 each cost Rs. 50 lakhs. A 2% saving would save Rs. 1 lakh.
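    A rough Python sketch of householding, followed by the slide's arithmetic. The names and addresses are made up. The `address_key` normalization, which keeps only lowercase letters and digits, is a deliberately crude stand-in for real address standardization.

```python
from collections import defaultdict

# Hypothetical mailing list.
customers = [
    ("Anand Rao", "12, M.G. Road, Pune"),
    ("Latha Rao", "12 MG Road  Pune"),
    ("Seshadri S.", "4 Hill Road, Mumbai"),
]

def address_key(address: str) -> str:
    """Crude normalization: keep only lowercase letters and digits."""
    return "".join(ch for ch in address.lower() if ch.isalnum())

# Group customers living at the same (normalized) address into one household.
households = defaultdict(list)
for name, address in customers:
    households[address_key(address)].append(name)

mailings_saved = len(customers) - len(households)  # one catalogue per household

# Arithmetic from the slide: 1 lakh catalogues at Rs. 50 each, 2% saved.
catalogues, cost_each, saving_percent = 100_000, 50, 2
total_cost = catalogues * cost_each               # Rs. 50 lakh
savings = total_cost * saving_percent // 100      # Rs. 1 lakh
```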

    Data Transformation Terms

    Enrichment
      Bring data from external sources to augment/enrich operational data. Data sources include Dun & Bradstreet, A.C. Nielsen, CMIE, IMRA etc...
    Scoring
      Computation of a probability of an event, e.g., chance that a customer will defect to AT&T from MCI, chance that a customer is likely to buy a new product

    Loads

    After extracting, scrubbing, cleaning, validating etc., the data needs to be loaded into the warehouse
    Issues
      huge volumes of data to be loaded
      small time window available when warehouse can be taken off line (usually nights)
      when to build index and summary tables
      allow system administrators to monitor, cancel, resume, change load rates
      recover gracefully -- restart after failure from where you were and without loss of data integrity
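    One way to meet the "restart from where you were" requirement is to commit each batch together with a checkpoint that records how far the load has got. A restarted load then resumes after the last committed batch. The sketch below uses Python's `sqlite3`. The table names, `BATCH_SIZE` and the simulated failure are hypothetical.

```python
import sqlite3

BATCH_SIZE = 1000  # hypothetical tuning parameter

def load(conn, records, fail_after_batches=None):
    """Load (id, amount) records in batches; each batch commits with its checkpoint."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, amount REAL)")
    conn.execute("CREATE TABLE IF NOT EXISTS load_checkpoint (next_pos INTEGER)")
    row = conn.execute("SELECT next_pos FROM load_checkpoint").fetchone()
    if row is None:
        conn.execute("INSERT INTO load_checkpoint VALUES (0)")
        conn.commit()
        pos = 0
    else:
        pos = row[0]  # resume after the last committed batch
    batches_done = 0
    while pos < len(records):
        batch = records[pos:pos + BATCH_SIZE]
        with conn:  # one transaction: the rows and the new checkpoint, or neither
            conn.executemany("INSERT INTO sales VALUES (?, ?)", batch)
            conn.execute("UPDATE load_checkpoint SET next_pos = ?", (pos + len(batch),))
        pos += len(batch)
        batches_done += 1
        if fail_after_batches is not None and batches_done == fail_after_batches:
            raise RuntimeError("simulated failure")  # e.g. the load window closes

records = [(i, float(i)) for i in range(2500)]
conn = sqlite3.connect(":memory:")
try:
    load(conn, records, fail_after_batches=2)  # 2000 rows committed, then "crash"
except RuntimeError:
    pass
load(conn, records)  # restart: only the remaining 500 rows are inserted
```

    Because the checkpoint commits atomically with its batch, a restart never re-inserts committed rows. A duplicate insert would fail on the primary key.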

    Load Techniques

    Use SQL to append or insert new data
      record at a time interface
      will lead to random disk I/Os
    Use batch load utility

    Refresh

    Propagate updates on source data to the warehouse
    Issues:
      when to refresh
      how to refresh -- refresh techniques

    When to Refresh?

    periodically (e.g., every night, every week) or after significant events
    on every update: not warranted unless the warehouse requires current data (up-to-the-minute stock quotes)
    refresh policy set by administrator based on user needs and traffic
    possibly different policies for different sources

    Refresh Techniques

    Full Extract from base tables
      read entire source table: too expensive
      maybe the only choice for legacy systems
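    When a full extract is the only option, the changes can still be derived by comparing today's extract with the previous one. The Python sketch below is in-memory and assumes each extract fits in a dictionary keyed by the source primary key. Large extracts would instead be sorted and merged on disk. All rows are hypothetical.

```python
def diff_extracts(old: dict, new: dict):
    """Compare two full extracts keyed by primary key; return the delta."""
    inserts = {k: new[k] for k in new.keys() - old.keys()}
    deletes = set(old.keys() - new.keys())
    updates = {k: new[k] for k in new.keys() & old.keys() if new[k] != old[k]}
    return inserts, updates, deletes

yesterday = {1: ("Agarwal", 500), 2: ("Seshadri", 300), 3: ("Anand", 100)}
today = {1: ("Agarwal", 650), 3: ("Anand", 100), 4: ("Latha", 50)}
inserts, updates, deletes = diff_extracts(yesterday, today)
```

    Only the delta (one insert, one update, one delete here) then needs to be applied to the warehouse. The full read of the source is still paid for, however.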

    How To Detect Changes

    Create a snapshot log table to record ids of updated rows of source data and timestamp
    Detect changes by:
      defining after row triggers to update the snapshot log when the source table changes
      using the regular transaction log to detect changes to source data
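    The trigger-based approach can be sketched with SQLite, which supports `AFTER UPDATE ... FOR EACH ROW` triggers. The table and trigger names are hypothetical. Inserts and deletes would need analogous triggers; a delete trigger would log `OLD.id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL);
    -- Snapshot log: ids of updated source rows plus a timestamp.
    CREATE TABLE snapshot_log (id INTEGER, changed_at TEXT);
    -- After-row trigger: every UPDATE on account is recorded in the log.
    CREATE TRIGGER account_changed AFTER UPDATE ON account
    FOR EACH ROW BEGIN
        INSERT INTO snapshot_log VALUES (NEW.id, CURRENT_TIMESTAMP);
    END;
    INSERT INTO account VALUES (1, 100.0), (2, 200.0), (3, 300.0);
    """
)
conn.execute("UPDATE account SET balance = balance + 50 WHERE id = 2")

# The refresh reads only the logged ids instead of re-extracting the whole table.
changed_ids = [row[0] for row in conn.execute("SELECT DISTINCT id FROM snapshot_log")]
changed_rows = conn.execute(
    "SELECT id, balance FROM account WHERE id IN (SELECT id FROM snapshot_log)"
).fetchall()
```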

    Data Extraction and Cleansing

    Extract data from existing operational and legacy data
    Issues:
      Sources of data for the warehouse
      Data quality at the sources
      Merging different data sources
      Data transformation
      How to propagate updates (on the sources) to the warehouse
      Terabytes of data to be loaded

    Scrubbing Data

    Sophisticated transformation tools
    Used for improving the quality of data
    Clean data is vital for the success of the warehouse
    Example
      Seshadri, Sheshadri, Sesadri, Seshadri S., Srinivasan Seshadri, etc. are the same person
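    A crude sketch of the matching step using Python's standard `difflib` module. The normalization and the 0.8 similarity threshold are assumptions chosen for illustration. Commercial scrubbing tools use their own, richer rules.

```python
import difflib
import re

def normalize(name: str) -> list:
    """Lowercase, keep alphabetic tokens, drop one-letter initials such as 'S.'."""
    return [t for t in re.findall(r"[a-z]+", name.lower()) if len(t) > 1]

def same_person(a: str, b: str, threshold: float = 0.8) -> bool:
    """Crude rule (an assumption): some pair of name tokens is nearly identical."""
    return any(
        difflib.SequenceMatcher(None, x, y).ratio() >= threshold
        for x in normalize(a)
        for y in normalize(b)
    )

variants = ["Seshadri", "Sheshadri", "Sesadri", "Seshadri S.", "Srinivasan Seshadri"]
matches = [v for v in variants if same_person("Seshadri", v)]
```

    This rule would also link, say, two different people who share a surname. In practice, name similarity is combined with other evidence such as address or date of birth before records are merged.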

    Scrubbing Tools

    Apertus -- Enterprise/Integrator
    Vality -- IPE
    Postal Soft

    Structuring/Modeling Issues

    Data -- Heart of the Data Warehouse

    Heart of the data warehouse is the data itself!
    Single version of the truth
    Corporate memory
    Data is organized in a way that represents business -- subject orientation

    Data Warehouse Structure

    Subject Orientation -- customer, product, policy, account etc... A subject may be implemented as a set of related tables. E.g., customer may be five tables

    Data Warehouse Structure

    base customer (1985-87)
      custid, from date, to date, name, phone, dob
    base customer (1988-90)
      custid, from date, to date, name, credit rating, employer
    customer activity (1986-89) -- monthly summary
    customer activity detail (1987-89)
      custid, activity date, amount, clerk id, order no
    customer activity detail (1990-91)
      custid, activity date, amount, line item no, order no
    Time is part of key of each table
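    Keeping time in the key can be sketched in SQLite. The primary key is `(custid, from_date)`, so each version of a customer is a separate row, and a query can ask what was true on a given date. The table loosely follows `base customer` above; the values are made up.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
# Hypothetical versioned table: the start of the validity period is part of the key,
# so the same custid can appear once per period.
conn.execute(
    """CREATE TABLE base_customer (
           custid INTEGER, from_date TEXT, to_date TEXT, name TEXT, phone TEXT,
           PRIMARY KEY (custid, from_date))"""
)
conn.executemany(
    "INSERT INTO base_customer VALUES (?, ?, ?, ?, ?)",
    [
        (7, "1985-01-01", "1987-12-31", "Anand", "555-0100"),
        (7, "1988-01-01", "1990-12-31", "Anand", "555-0199"),
    ],
)

def phone_as_of(custid: int, day: str) -> Optional[str]:
    """Phone number valid on an ISO date (ISO date strings compare correctly as text)."""
    row = conn.execute(
        "SELECT phone FROM base_customer"
        " WHERE custid = ? AND from_date <= ? AND ? <= to_date",
        (custid, day, day),
    ).fetchone()
    return row[0] if row else None
```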

    Data Granularity in Warehouse

    Summarized data stored
      reduces storage costs
      reduces CPU usage
      increases performance since a smaller number of records is processed
      design around traditional high level reporting needs
      tradeoff with volume of data to be stored and detailed usage of data

    Granularity in Warehouse

    Cannot answer some questions with summarized data
      Did Anand call Seshadri last month? Not possible to answer if only the total duration of calls by Anand over a month is maintained and individual call details are not.
    Detailed data too voluminous

    Granularity in Warehouse

    Tradeoff is to have dual level of granularity
      Store summary data on disks
        95% of DSS processing done against this data
      Store detail on tapes
        5% of DSS processing against this data

    Vertical Partitioning

    Original table: Acct.No | Name | Balance | Date Opened | Interest Rate | Address
    Frequently accessed: Acct.No | Balance (smaller table and so less I/O)
    Rarely accessed: Acct.No | Name | Date Opened | Interest Rate | Address
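    The split above can be sketched in SQLite. Scans that only need balances touch the narrow table, and the full row is reassembled with a join on the shared key when needed. The column names are adapted from the slide; the data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Frequently accessed columns: a narrow table, so less I/O per scan.
    CREATE TABLE account_hot (acct_no INTEGER PRIMARY KEY, balance REAL);
    -- Rarely accessed columns, sharing the same key.
    CREATE TABLE account_cold (acct_no INTEGER PRIMARY KEY, name TEXT,
                               date_opened TEXT, interest_rate REAL, address TEXT);
    INSERT INTO account_hot VALUES (101, 5000.0), (102, 750.0);
    INSERT INTO account_cold VALUES
        (101, 'Agarwal', '1995-04-01', 4.5, 'Pune'),
        (102, 'Seshadri', '1997-09-15', 4.0, 'Mumbai');
    """
)
# Common query: touches only the narrow table.
total = conn.execute("SELECT SUM(balance) FROM account_hot").fetchone()[0]
# Rare query: reassemble the original row with a join on acct_no.
full_row = conn.execute(
    "SELECT h.acct_no, c.name, h.balance, c.date_opened, c.interest_rate, c.address"
    " FROM account_hot h JOIN account_cold c ON h.acct_no = c.acct_no"
    " WHERE h.acct_no = 101"
).fetchone()
```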

    Derived Data

    Introduction of derived (calculated) data may often help
    Have seen this in the context of dual levels of granularity
    Can keep auxiliary views and indexes to speed up query processing

    Schema Design

    Database organization
      must look like business
      must be recognizable by business user
      approachable by business user
      must be simple
    Schema Types
      Star Schema
      Fact Constellation Schema
      Snowflake schema

    Dimension Tables

    Dimension tables
      Define business in terms already familiar to users
      Wide rows with lots of descriptive text
      Small tables (about a million rows)
      Joined to fact table by a foreign key
      Heavily indexed
      Typical dimensions
        time periods, geographic region (markets, cities), products, customers, salesperson, etc.

    Fact Table

    Central table
      mostly raw numeric items
      narrow rows, a few columns at most
      large number of rows (millions to a billion)
      access via dimensions

    Star Schema

    A single fact table and, for each dimension, one dimension table
    Does not capture hierarchies directly

    (Figure: fact table "fact" (date, custno, prodno, cityname, ...) joined to dimension tables time, prod, cust, city)
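    A small star schema following the figure, sketched in SQLite. The dimension attributes (`month`, `category`, `segment`, `state`) and all rows are hypothetical. The city-to-state hierarchy is simply stored as a column of the city dimension, because a star schema does not model it as a separate table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE time_dim (date TEXT PRIMARY KEY, month TEXT, year INTEGER);
    CREATE TABLE prod (prodno INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE cust (custno INTEGER PRIMARY KEY, segment TEXT);
    CREATE TABLE city (cityname TEXT PRIMARY KEY, state TEXT);
    -- Fact table: a foreign key to every dimension plus a numeric measure.
    CREATE TABLE fact (date TEXT REFERENCES time_dim, custno INTEGER REFERENCES cust,
                       prodno INTEGER REFERENCES prod, cityname TEXT REFERENCES city,
                       amount REAL);
    INSERT INTO time_dim VALUES ('2018-12-01', '2018-12', 2018), ('2018-11-05', '2018-11', 2018);
    INSERT INTO prod VALUES (1, 'coffee'), (2, 'tea');
    INSERT INTO cust VALUES (10, 'retail'), (11, 'corporate');
    INSERT INTO city VALUES ('Pune', 'Maharashtra'), ('Chennai', 'Tamil Nadu');
    INSERT INTO fact VALUES
        ('2018-12-01', 10, 1, 'Pune', 120.0),
        ('2018-12-01', 11, 1, 'Chennai', 80.0),
        ('2018-11-05', 10, 2, 'Pune', 40.0),
        ('2018-12-01', 10, 2, 'Pune', 15.0);
    """
)
# Star join: restrict dimensions, join them to the fact table, aggregate the measure.
rows = conn.execute(
    """
    SELECT city.state, prod.category, SUM(fact.amount)
    FROM fact
    JOIN time_dim ON fact.date = time_dim.date
    JOIN prod ON fact.prodno = prod.prodno
    JOIN city ON fact.cityname = city.cityname
    WHERE time_dim.month = '2018-12'
    GROUP BY city.state, prod.category
    ORDER BY city.state, prod.category
    """
).fetchall()
```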

    Fact Constellation

    Fact Constellation
      Multiple fact tables that share many dimension tables
      Booking and Checkout may share many dimension tables in the hotel industry

    (Figure: fact tables Booking and Checkout sharing dimension tables Hotels, Travel Agents, Promotion, Room Type, Customer)

    Creating Arrays

    Many times each occurrence of a sequence of data is in a different physical location
    Beneficial to collect all occurrences together and store as an array in a single row
    Makes sense only if there is a stable number of occurrences which are accessed together
    In a data warehouse, such situations arise naturally due to time based orientation
      can create an array by month
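    Creating an array by month can be sketched in Python. Rows keyed by (account, month) are folded into one fixed-length array per account. The account ids and values are hypothetical, and `None` marks a month with no data.

```python
from collections import defaultdict

# Hypothetical monthly balances, one physical row per (account, month).
monthly_rows = [
    ("A1", 1, 100.0), ("A1", 2, 120.0), ("A1", 3, 90.0),
    ("A2", 1, 50.0), ("A2", 3, 70.0),
]

def to_arrays(rows, months=12):
    """Collect each account's monthly values into one fixed-length array."""
    arrays = defaultdict(lambda: [None] * months)
    for acct, month, value in rows:
        arrays[acct][month - 1] = value
    return dict(arrays)

arrays = to_arrays(monthly_rows, months=3)
# {'A1': [100.0, 120.0, 90.0], 'A2': [50.0, None, 70.0]}
```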

    Selective Redundancy

    Description of an item can be stored redundantly with order table -- most often item description is also accessed with order table
    Updates have to be done carefully

    Partitioning

    Breaking data into several physical units that can be handled separately
    Not a question of whether to do it in data warehouses but how to do it
    Granularity and partitioning are key to effective implementation of a warehouse

    Criteria for Partitioning

    Typically partitioned by
      date
      line of business
      geography
      organizational unit
      any combination of above
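    Partitioning by date can be sketched as routing each row to a (year, month) partition. A query for one month then scans only that partition. The sample rows are hypothetical. Adding the line of business to the partition key would give a combined criterion.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales facts: (sale date, line of business, amount).
sales = [
    (date(1998, 1, 15), "retail", 100.0),
    (date(1998, 2, 3), "retail", 40.0),
    (date(1998, 2, 20), "corporate", 60.0),
    (date(1999, 1, 7), "retail", 25.0),
]

# Horizontal partitioning by (year, month): each partition can be loaded,
# indexed, archived or scanned independently of the others.
partitions = defaultdict(list)
for row in sales:
    sale_date = row[0]
    partitions[(sale_date.year, sale_date.month)].append(row)

def total_for_month(year: int, month: int) -> float:
    """Partition pruning: only the one matching partition is scanned."""
    return sum(amount for _, _, amount in partitions.get((year, month), []))
```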

    Where to Partition?

    Application level or DBMS level
    Makes sense to partition at application level
      Allows different definition for each year
        Important since warehouse spans many years and as business evolves definition changes
      Allows data to be moved between processing complexes easily

    Data Warehouse vs. Data Marts

    What comes first?

    From the Data Warehouse to Data Marts

    (Figure: Individually Structured, Departmentally Structured and Organizationally Structured (Data Warehouse) layers, on a scale from Less to More of History, Normalized and Detailed; Data vs. Information)

    Data Warehouse and Data Marts

    OLAP Data Mart
      Lightly summarized
      Departmentally structured
    Data Warehouse Data
      Organizationally structured
      Atomic
      Detailed

    Characteristics of the Departmental Data Mart

    OLAP
    Small
    Flexible
    Customized by Department
    Source is departmentally structured data warehouse

    Techniques for Creating Departmental Data Mart

    (Figure: techniques -- Subset, Summarized, Superset, Indexed, Arrayed -- producing OLAP data marts for Sales, Mktg. and Finance)

    Data Mart Centric

    (Figure: Data Sources -> Data Marts -> Data Warehouse)

    Problems with Data Mart Centric Solution

    If you end up creating multiple warehouses, integrating them is a problem

    True Warehouse

    (Figure: Data Sources -> Data Warehouse -> Data Marts)

    Query Processing

    Indexing
    Pre-computed views/aggregates
    SQL extensions

    Indexing Techniques

    Exploiting indexes to reduce scanning of data is of crucial importance
      Bitmap Indexes
      Join Indexes
      Other Issues
        Text indexing
        Parallelizing and sequencing of index builds and incremental updates

    Indexing Techniques

    Bitmap index:
      A collection of bitmaps -- one for each distinct value of the column
      Each bitmap has N bits where N is the number of rows in the table
      A bit corresponding to a value v for a row r is set if and only if r has the value v for the indexed attribute

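    Customer Query: select * from customer where gender = 'F' and vote = 'Y'

    Bitmap Index
    Row | gender | vote | gender = F | vote = Y | AND
    1   | M      | Y    | 0          | 1        | 0
    2   | F      | Y    | 1          | 1        | 1
    3   | F      | Y    | 1          | 1        | 1
    4   | F      | N    | 1          | 0        | 0
    5   | F      | N    | 1          | 0        | 0
    6   | M      | N    | 0          | 0        | 0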

    Bit Map Index

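    Base Table
    Cust | Region | Rating
    C1   | N      | H
    C2   | S      | M
    C3   | W      | L
    C4   | W      | H
    C5   | S      | L
    C6   | W      | L
    C7   | N      | H

    Region Index
    Row ID | N | S | E | W
    1      | 1 | 0 | 0 | 0
    2      | 0 | 1 | 0 | 0
    3      | 0 | 0 | 0 | 1
    4      | 0 | 0 | 0 | 1
    5      | 0 | 1 | 0 | 0
    6      | 0 | 0 | 0 | 1
    7      | 1 | 0 | 0 | 0

    Rating Index
    Row ID | H | M | L
    1      | 1 | 0 | 0
    2      | 0 | 1 | 0
    3      | 0 | 0 | 1
    4      | 1 | 0 | 0
    5      | 0 | 0 | 1
    6      | 0 | 0 | 1
    7      | 1 | 0 | 0

    Customers where Region = W And Rating = M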
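    The indexes above can be reproduced with Python integers used as bitsets, where bit i stands for row i of the base table. The helper names are hypothetical; the data is the base table from the slide.

```python
# Base table from the slide: (customer, region, rating).
base_table = [
    ("C1", "N", "H"), ("C2", "S", "M"), ("C3", "W", "L"), ("C4", "W", "H"),
    ("C5", "S", "L"), ("C6", "W", "L"), ("C7", "N", "H"),
]

def build_bitmap_index(rows, column):
    """One bitmap per distinct value; bit i is set iff row i has that value."""
    index = {}
    for i, row in enumerate(rows):
        value = row[column]
        index[value] = index.get(value, 0) | (1 << i)
    return index

def matching_customers(bitmap):
    """Translate set bits back into customer ids."""
    return [row[0] for i, row in enumerate(base_table) if (bitmap >> i) & 1]

region_index = build_bitmap_index(base_table, 1)
rating_index = build_bitmap_index(base_table, 2)

# Each conjunctive query is a single bitwise AND of two bitmaps.
w_and_m = region_index.get("W", 0) & rating_index.get("M", 0)
w_and_l = region_index.get("W", 0) & rating_index.get("L", 0)
```

    With this data, the slide's query (Region = W And Rating = M) returns no customers. The sketch therefore also evaluates Region = W And Rating = L, which selects C3 and C6.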

    BitMap Indexes

    Comparison, join and aggregation operations are reduced to bit arithmetic with dramatic improvement in processing time
    Significant reduction in space and I/O (30:1)
    Adapted for higher cardinality domains as well
    Compression (e.g., run-length encoding) exploited
    Products that support bitmaps: Model 204, TargetIndex (Redbrick), IQ (Sybase), Oracle 7.3
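    Below is a minimal Python sketch of run-length encoding a bitmap. Bitmaps on high-cardinality columns are mostly zeros, so they collapse into a few (bit, run length) pairs. The products listed above use their own, more elaborate compressed formats; this sketch does not reproduce any of them.

```python
def run_length_encode(bits: str):
    """Encode a bitmap string such as '0001111000' as (bit, run length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return [(b, n) for b, n in runs]

def run_length_decode(runs) -> str:
    """Expand (bit, run length) pairs back into the original bitmap string."""
    return "".join(b * n for b, n in runs)

bitmap = "0" * 20 + "1" * 5 + "0" * 15
runs = run_length_encode(bitmap)
# [('0', 20), ('1', 5), ('0', 15)]: three runs instead of 40 bits
```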

    Join Indexes

    Join indexes can also span multiple dimension tables
      e.g., a join index on city and time dimension of calls fact table
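    A join index can be sketched as a precomputed map from dimension key values to the fact rows they join with. Here the key spans two dimensions, (city, month), of a hypothetical calls fact table. This is a simplification: a real join index would typically store row identifiers of the dimension and fact rows, not raw key values.

```python
from collections import defaultdict

# Hypothetical calls fact table: (row id, city key, time key, duration in minutes).
calls = [
    (0, "Pune", "1998-12", 5), (1, "Mumbai", "1998-12", 3),
    (2, "Pune", "1998-11", 7), (3, "Pune", "1998-12", 2),
]

# Join index over two dimensions: (city, month) -> fact row ids, built once.
join_index = defaultdict(list)
for row_id, city, month, _ in calls:
    join_index[(city, month)].append(row_id)

def total_minutes(city: str, month: str) -> int:
    """Answer a query by fetching only the pre-joined fact rows."""
    return sum(calls[r][3] for r in join_index.get((city, month), []))
```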

    Star Join Processing

    Use join indexes to join dimension and fact table

    (Figure: Calls is joined with Time (C+T), then Location (C+T+L), then Plan (C+T+L+P))

    Optimized Star Join Processing

    (Figure: dimension tables Time, Location and Plan; a virtual cross product of T, L and P is formed and selections are applied; fact table Calls)

    Bitmapped Join Processing

    (Figure: bitmaps on Calls for the Time, Location and Plan selections are combined with AND)

    Parallel Query Processing

    Parallel Query Processing

    Partitioned data: parallel scans yield I/O parallelism

    Parallel algorithms for relational operators: joins, aggregates, sort

    Parallel utilities: load, archive, update, parse, checkpoint, recovery

    Parallel query optimization
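The two-phase pattern behind parallel aggregation (aggregate each partition locally, then merge the partial results) in a small sketch. Threads are used only to keep the example self-contained; in CPython they do not give CPU parallelism, and a real system would run each partition on its own processor or node. The sales rows are hypothetical:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sales rows (region, amount), split round-robin into 3 partitions.
sales = [("N", 10), ("S", 5), ("N", 7), ("W", 3), ("S", 2), ("W", 8), ("N", 1)]
partitions = [sales[i::3] for i in range(3)]

def partial_sum(partition):
    """Local phase: aggregate one partition independently of the others."""
    totals = Counter()
    for region, amount in partition:
        totals[region] += amount
    return totals

with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(partial_sum, partitions))

# Global phase: merge the partial aggregates (Counter addition keeps positive totals only).
totals = sum(partials, Counter())
print(dict(sorted(totals.items())))  # {'N': 18, 'S': 7, 'W': 11}
```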


    Pre-computed Aggregates

    Keep aggregated data for efficiency (pre-computed queries)

    Questions:
      Which aggregates to compute?
      How to update aggregates?
      How to use pre-computed aggregates in queries?
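On using pre-computed aggregates in queries: a stored aggregate can answer any coarser group-by by aggregating further, without touching the base table. A sketch with a hypothetical (month, city) summary; as written this is valid only for distributive functions such as SUM and COUNT, not for MEDIAN:

```python
from collections import defaultdict

# Hypothetical pre-computed aggregate: total sales per (month, city).
agg_month_city = {
    ("Jan", "Mumbai"): 120, ("Jan", "Pune"): 40,
    ("Feb", "Mumbai"): 90,  ("Feb", "Pune"): 60,
}

def rollup(agg, keep):
    """Answer a coarser group-by from a finer pre-computed one.
    `keep` lists the positions of the grouping attributes to retain."""
    out = defaultdict(int)
    for key, value in agg.items():
        out[tuple(key[i] for i in keep)] += value
    return dict(out)

by_month = rollup(agg_month_city, keep=[0])
print(by_month)  # {('Jan',): 160, ('Feb',): 150}
```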


    Pre-computed Aggregates

    An aggregated table can be maintained by the
      warehouse server
      middle tier
      client applications

    Pre-computed aggregates are a special case of materialized views: the same questions and issues remain
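Maintaining an aggregate incrementally during load avoids recomputing it from the fact table. A minimal sketch for inserts only, assuming a hypothetical SUM/COUNT summary per product from which AVG is derived; deletes, and functions such as MIN/MAX, need more care:

```python
# Hypothetical pre-computed aggregate: product -> (SUM(amount), COUNT(*)).
aggregate = {}

def apply_insert(product, amount):
    """Fold one newly loaded fact row into the aggregate instead of recomputing it."""
    total, count = aggregate.get(product, (0, 0))
    aggregate[product] = (total + amount, count + 1)

def average(product):
    """AVG is not stored; it is derived from the maintained SUM and COUNT."""
    total, count = aggregate[product]
    return total / count

for product, amount in [("Cola", 10), ("Milk", 4), ("Cola", 6)]:
    apply_insert(product, amount)

print(aggregate)        # {'Cola': (16, 2), 'Milk': (4, 1)}
print(average("Cola"))  # 8.0
```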


    SQL Extensions

    Extended family of aggregate functions
      rank (top 10 customers)
      percentile (top 30% of customers)
      median, mode

    Object-relational systems allow addition of new aggregate functions
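What these functions compute, sketched in Python over hypothetical per-customer revenue. Ties are broken by input order, which is an assumption of this sketch, not SQL-standard behavior:

```python
import statistics

# Hypothetical revenue per customer.
revenue = {"C1": 500, "C2": 120, "C3": 900, "C4": 120, "C5": 300}
ranked = sorted(revenue, key=revenue.get, reverse=True)  # stable sort: ties keep input order

top_3 = ranked[:3]                                        # rank: top-N customers
median_value = statistics.median(list(revenue.values()))
mode_value = statistics.mode(list(revenue.values()))
top_40pct = ranked[:max(1, round(0.4 * len(ranked)))]     # crude percentile cut

print(top_3, median_value, mode_value, top_40pct)
# ['C3', 'C1', 'C5'] 300 120 ['C3', 'C1']
```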


    SQL Extensions

    Reporting features: running totals, cumulative totals

    Cube operator
      group by on all subsets of a set of attributes (month, city)
      redundant scan and sorting of data can be avoided
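A sketch of what the cube operator produces: one group-by per subset of the grouping attributes, computed here in a single pass over the rows rather than one scan and sort per group-by, followed by a running total. The sales rows are hypothetical:

```python
from collections import defaultdict
from itertools import accumulate, combinations

# Hypothetical fact rows: (month, city, dollars).
sales = [("Jan", "Mumbai", 10), ("Jan", "Pune", 5), ("Feb", "Mumbai", 7), ("Feb", "Pune", 3)]
dims = ("month", "city")

def cube(rows, dims):
    """GROUP BY every subset of `dims` (2**len(dims) group-bys) in one pass over rows."""
    subsets = [s for r in range(len(dims) + 1) for s in combinations(range(len(dims)), r)]
    result = {tuple(dims[i] for i in s): defaultdict(int) for s in subsets}
    for row in rows:
        for s in subsets:
            result[tuple(dims[i] for i in s)][tuple(row[i] for i in s)] += row[-1]
    return {k: dict(v) for k, v in result.items()}

c = cube(sales, dims)
print(c[()])          # {(): 25}
print(c[("month",)])  # {('Jan',): 15, ('Feb',): 10}

# Running (cumulative) total of dollars in row order.
running = list(accumulate(r[2] for r in sales))
print(running)        # [10, 15, 22, 25]
```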

    Red Brick Has an Extended Set of Aggregates


    select month, dollars, cume(dollars) as run_dollars,
           weight, cume(weight) as run_weights
    from sales, market, product, period t
    where year = 1993
      and product like 'Columbian%'
      and city like 'San Fr%'
    order by t.perkey
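In standard SQL, the running totals that RISQL writes as cume() are expressed with a window function. A sketch using Python's sqlite3 module on a hypothetical one-table sales schema (window functions need SQLite 3.25 or later):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (perkey INTEGER, month TEXT, dollars REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [(1, "Jan", 100.0), (2, "Feb", 50.0), (3, "Mar", 75.0)])  # hypothetical rows

# SUM(...) OVER (ORDER BY ...) plays the role of RISQL's cume(dollars).
rows = conn.execute("""
    SELECT month, dollars,
           SUM(dollars) OVER (ORDER BY perkey
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS run_dollars
    FROM sales
    ORDER BY perkey
""").fetchall()
print(rows)  # [('Jan', 100.0, 100.0), ('Feb', 50.0, 150.0), ('Mar', 75.0, 225.0)]
```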

    RISQL (Red Brick Systems) Extensions


    Aggregates: CUME, MOVINGAVG, MOVINGSUM, RANK, TERTILE, RATIOTOREPORT

    Calculating row subtotals: BREAK BY

    Sophisticated date-time support: DATEDIFF

    Using subqueries in calculations
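A rough Python sketch of what some of these aggregates compute. The semantics here (a window of the current row plus the n-1 preceding rows, tertiles by rank with 1 as the top third, each value's ratio to the total) are an interpretation of the names, not Red Brick's documented definitions:

```python
from collections import deque

def moving_sum(values, n):
    """Sum of the current value and up to n-1 preceding values."""
    window, out = deque(), []
    for v in values:
        window.append(v)
        if len(window) > n:
            window.popleft()
        out.append(sum(window))
    return out

def moving_avg(values, n):
    """Moving sum divided by the number of values actually in the window."""
    return [s / min(i + 1, n) for i, s in enumerate(moving_sum(values, n))]

def tertile(values):
    """Rank-based tertile (1 = top third); ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * 3 // len(values) + 1
    return labels

def ratio_to_report(values):
    """Each value as a fraction of the total."""
    total = sum(values)
    return [v / total for v in values]

dollars = [10, 20, 30, 40, 50, 60]  # hypothetical
print(moving_sum(dollars, 3))   # [10, 30, 60, 90, 120, 150]
print(moving_avg(dollars, 3))   # [10.0, 15.0, 20.0, 30.0, 40.0, 50.0]
print(tertile(dollars))         # [3, 3, 2, 2, 1, 1]
print(ratio_to_report([1, 3]))  # [0.25, 0.75]
```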


    Using SubQueries in Calculations

    select product, dollars as jun97_sales,
           (select sum(si.dollars)
              from market mi, product pi, period ti, sales si
             where pi.product = product.product
               and ti.year = period.year
               and mi.city = market.city) as total97_sales,
           100 * dollars /
           (select sum(si.dollars)
              from market mi, product pi, period ti, sales si
             where pi.product = product.product
               and ti.year = period.year
               and mi.city = market.city) as percent_of_yr
    from market, product, period, sales
    where year = 1997
      and month = 'June'
      and city like 'Ahmed%'
    order by product;
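The query can be tried on a toy star schema in SQLite. This sketch assumes hypothetical surrogate keys (prodkey, perkey, mktkey) to express the joins that the slide leaves implicit, and computes the percentage in Python to avoid repeating the subquery:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (prodkey INTEGER PRIMARY KEY, product TEXT);
    CREATE TABLE period  (perkey  INTEGER PRIMARY KEY, year INTEGER, month TEXT);
    CREATE TABLE market  (mktkey  INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE sales   (prodkey INTEGER, perkey INTEGER, mktkey INTEGER, dollars REAL);
    INSERT INTO product VALUES (1, 'Tea'), (2, 'Coffee');
    INSERT INTO period  VALUES (1, 1997, 'May'), (2, 1997, 'June');
    INSERT INTO market  VALUES (1, 'Ahmedabad');
    INSERT INTO sales   VALUES (1, 1, 1, 300), (1, 2, 1, 100), (2, 1, 1, 50), (2, 2, 1, 150);
""")

# The correlated subquery recomputes the product's full-year total in the same city.
query = """
    SELECT p.product, s.dollars AS jun97_sales,
           (SELECT SUM(si.dollars)
              FROM sales si
              JOIN product pi ON pi.prodkey = si.prodkey
              JOIN period  ti ON ti.perkey  = si.perkey
              JOIN market  mi ON mi.mktkey  = si.mktkey
             WHERE pi.product = p.product AND ti.year = t.year AND mi.city = m.city
           ) AS total97_sales
      FROM sales s
      JOIN product p ON p.prodkey = s.prodkey
      JOIN period  t ON t.perkey  = s.perkey
      JOIN market  m ON m.mktkey  = s.mktkey
     WHERE t.year = 1997 AND t.month = 'June' AND m.city LIKE 'Ahmed%'
     ORDER BY p.product
"""
results = [(prod, jun, total, round(100 * jun / total, 1))
           for prod, jun, total in conn.execute(query)]
print(results)  # [('Coffee', 150.0, 200.0, 75.0), ('Tea', 100.0, 400.0, 25.0)]
```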


    Course Overview

    The course: what and how

    0. Introduction
    I. Data Warehousing
    II. Decision Support and OLAP
    III. Data Mining
    IV. Looking Ahead

    Demos and Labs


    II. On-Line Analytical Processing (OLAP)

    Making Decision Support Possible


    Limitations of SQL

    "A freshman in business needs a Ph.D. in SQL"

    -- Ralph Kimball


    Typical OLAP Queries

    Write a multi-table join to compare sales for each product line YTD this year vs. last year.

    Repeat the above process to find the top 5 product contributors to margin.

    Repeat the above process to find the sales of a product line to new vs. existing customers.

    Repeat the above process to find the customers that have had negative sales growth.

    What Is OLAP?


    * Reference: http://www.arborsoft.com/essbase/wht_ppr/coddTOC.html

    Online Analytical Processing: coined by E. F. Codd in a 1994 paper contracted by Arbor Software *

    Generally synonymous with earlier terms such as Decision Support, Business Intelligence, Executive Information System

    OLAP = Multidimensional Database

    MOLAP: Multidimensional OLAP (Arbor Essbase, Oracle Express)

    ROLAP: Relational OLAP (Informix MetaCube, Microstrategy DSS Agent)


    The OLAP Market

    Rapid growth in the enterprise market: $700 million (1995), $2.1 billion (1997)

    Significant consolidation activity among major DBMS vendors
      10/94: Sybase acquires ExpressWay
      7/95: Oracle acquires Express
      11/95: Informix acquires Metacube
      1/97: Arbor partners up with IBM
      10/96: Microsoft acquires Panorama

    Result: OLAP shifted from a small vertical niche to a mainstream DBMS category


    Strengths of OLAP

    It is a powerful visualization paradigm

    It provides fast, interactive response times

    It is good for analyzing time series

    It can be useful to find some clusters and outliers

    Many vendors offer OLAP tools


    Nigel Pendse, Richard Creath - The OLAP Report

    OLAP Is FASMI

    Fast Analysis of Shared Multidimensional Information


    [Figure: sales cube with axes Month (1 to 7), Product (Toothpaste, Juice, Cola, Milk, Cream, Soap) and Region (W, S, N)]

    Dimensions: Product, Region, Time

    Hierarchical summarization paths:

    Product     Region     Time
    Industry    Country    Year
    Category    Region     Quarter
    Product     City       Month, Week
                Office     Day
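Rolling up along one of these paths means re-aggregating each measure under the parent member. A sketch for the Region path (Office, City, Region, Country) with hypothetical members and sales:

```python
from collections import defaultdict

# Hypothetical members of the Region path: Office -> City -> Region -> Country.
parent = {
    "Office-A": "Mumbai", "Office-B": "Pune", "Office-C": "Chennai",
    "Mumbai": "West", "Pune": "West", "Chennai": "South",
    "West": "India", "South": "India",
}

def roll_up(totals, levels=1):
    """Re-aggregate {member: measure} `levels` steps up the hierarchy."""
    for _ in range(levels):
        rolled = defaultdict(int)
        for member, value in totals.items():
            rolled[parent[member]] += value
        totals = dict(rolled)
    return totals

office_sales = {"Office-A": 40, "Office-B": 25, "Office-C": 35}
print(roll_up(office_sales, 1))  # {'Mumbai': 40, 'Pune': 25, 'Chennai': 35}
print(roll_up(office_sales, 2))  # {'West': 65, 'South': 35}
print(roll_up(office_sales, 3))  # {'India': 100}
```

Drilling down is the reverse direction and needs the more detailed data to be kept.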

    Multi-dimensional Data

    Hey... I sold $100M worth of goods


    Data Cube Lattice

    Cube lattice:
      ABC
      AB   AC   BC
      A    B    C
      none

    Can materialize some group-bys, compute others on demand

    Question: which group-bys to materialize?

    Question: what indices to create?

    Question: how to organize data (chunks, etc.)?
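A sketch of the lattice for A, B, C and of one way to answer a group-by that was not materialized: pick the smallest materialized group-by that contains all its attributes and aggregate from there. Using row count as the cost, and the sizes below, are assumptions:

```python
from itertools import combinations

dims = ("A", "B", "C")
# ABC, AB, AC, BC, A, B, C, none: every subset of the grouping attributes.
lattice = [frozenset(s) for r in range(len(dims), -1, -1) for s in combinations(dims, r)]

# Hypothetical row counts of the group-bys that were materialized.
materialized = {frozenset("ABC"): 1_000_000, frozenset("AB"): 50_000, frozenset("BC"): 80_000}

def cheapest_source(target):
    """Smallest materialized group-by containing every attribute of `target`;
    the target can be computed from it by aggregating further."""
    candidates = [(size, sorted(v)) for v, size in materialized.items() if target <= v]
    return min(candidates)[1] if candidates else None

print(len(lattice))                     # 8
print(cheapest_source(frozenset("B")))  # ['A', 'B']
print(cheapest_source(frozenset("C")))  # ['B', 'C']
print(cheapest_source(frozenset()))     # ['A', 'B']
```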


    Visualizing Neighbors is simpler

    [Figure: sales shown as a Month (Apr to Mar) by Store (1 to 8) grid, next to the same data as a flat Month, Store, Sales table]


    A Visual Operation: Pivot (Rotate)

    [Figure: sales grid with Product (Juice, Cola, Milk, Cream) and Date (3/1 to 3/4) axes, rotated so that the two axes swap places]
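Pivoting changes only which dimension runs along the rows and which along the columns; the cell values are unchanged. A sketch on a hypothetical product-by-date cross-tab:

```python
def pivot(row_labels, col_labels, cells):
    """Rotate a cross-tab: cells[row][j] becomes result[col_labels[j]][row]."""
    return {c: {r: cells[r][j] for r in row_labels} for j, c in enumerate(col_labels)}

# Hypothetical cross-tab: one row per product, one column per date.
dates = ["3/1", "3/2", "3/3"]
grid = {"Juice": [5, 8, 2], "Cola": [3, 6, 9]}

rotated = pivot(list(grid), dates, grid)
print(rotated)
# {'3/1': {'Juice': 5, 'Cola': 3}, '3/2': {'Juice': 8, 'Cola': 6}, '3/3': {'Juice': 2, 'Cola': 9}}
```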


    Slicing and Dicing

    [Figure: cube with dimensions Product (Household, Telecomm, Video, Audio), Sales Channel (Retail, Direct, Special) and Region (India, Far East, Europe)]

    The Telecomm Slice
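Slicing fixes one dimension to a single member (the Telecomm slice fixes Product = Telecomm); dicing restricts several dimensions to subsets of their members. A sketch over a hypothetical sparse cube keyed by (product, channel, region):

```python
# Hypothetical cube cells: (product, channel, region) -> sales.
cube = {
    ("Telecomm", "Retail", "India"): 12, ("Telecomm", "Direct", "Europe"): 7,
    ("Audio", "Retail", "India"): 4,     ("Video", "Special", "Far East"): 9,
    ("Telecomm", "Special", "Far East"): 3,
}

def slice_(cube, axis, value):
    """Fix one dimension to a single member; the result has one dimension fewer."""
    return {k[:axis] + k[axis + 1:]: v for k, v in cube.items() if k[axis] == value}

def dice(cube, allowed):
    """Keep cells whose member on each dimension is in the allowed set (None = any)."""
    return {k: v for k, v in cube.items()
            if all(a is None or m in a for m, a in zip(k, allowed))}

telecomm = slice_(cube, 0, "Telecomm")
print(telecomm)
# {('Retail', 'India'): 12, ('Direct', 'Europe'): 7, ('Special', 'Far East'): 3}
print(dice(cube, [None, {"Retail"}, {"India"}]))
# {('Telecomm', 'Retail', 'India'): 12, ('Audio', 'Retail', 'India'): 4}
```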


    Nature of OLAP Analysis

    Aggregation -- total sales, percent-to-total

    Comparison -- budget vs. expenses

    Ranking -- top 10, quartile analysis

    Access to detailed and aggregate data

    Complex criteria specification

    Visualization


    Multidimensional Spreadsheets

    Analysts need spreadsheets that support
      pivot tables (cross-tabs)
      drill-down and roll-up
      slice and dice
      sort
      selections
      derived attributes

    Popular in retail domain


    OLAP Data Cube

    Idea: analysts need to group data in many different ways

    e.g., Sales(region, product, prodtype, prodstyle, date, saleamount)

    saleamount is a measure attribute; the rest are dimension attributes

    group by every subset of the other attributes

    materialize (precompute and store) group-bys to give online response

    Also: hierarchies on attributes: date -> weekday, date -> month -> quarter -> year
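Deriving the levels of the date hierarchies is mechanical. A sketch; the label formats are arbitrary choices. Note that weekday does not roll up into month, which is why it sits on a separate path:

```python
from datetime import date

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")  # date.weekday(): Monday == 0

def date_levels(d):
    """Derive the levels of both date paths for one date value."""
    return {
        "weekday": WEEKDAYS[d.weekday()],                  # date -> weekday
        "month": f"{d.year}-{d.month:02d}",                # date -> month
        "quarter": f"{d.year}-Q{(d.month - 1) // 3 + 1}",  # month -> quarter
        "year": d.year,                                    # quarter -> year
    }

print(date_levels(date(1997, 6, 15)))
# {'weekday': 'Sun', 'month': '1997-06', 'quarter': '1997-Q2', 'year': 1997}
```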


    SQL Extensions

    Front-end tools require
      Extended family of aggregate functions: rank, median, mode
      Reporting features: running totals, cumulative totals
      Results of multiple group-bys: total sales by month and total sales by product
      Data cube


    Relational OLAP: 3 Tier DSS

    Database Layer (Data Warehouse): store atomic data in industry-standard RDBMS.

    Application Logic Layer (ROLAP Engine): generate SQL execution plans in the ROLAP engine to obtain OLAP functionality.

    Presentation Layer (Decision Support Client): obtain multidimensional reports from the DSS Client.
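A sketch of the middle tier's job: turn a multidimensional request into a SQL GROUP BY over the atomic data in the RDBMS. The function name rolap_sql and the request shape are hypothetical; table and column names are pasted into the SQL text, so they must come from trusted metadata, while filter values are bound as parameters:

```python
import sqlite3

def rolap_sql(fact, measure, group_by, filters):
    """Hypothetical ROLAP-engine step: (dimensions, equality filters) -> one SQL query.
    Requires at least one group-by column."""
    cols = ", ".join(group_by)
    where = " AND ".join(f"{c} = ?" for c in filters) or "1 = 1"
    sql = (f"SELECT {cols}, SUM({measure}) FROM {fact} "
           f"WHERE {where} GROUP BY {cols} ORDER BY {cols}")
    return sql, list(filters.values())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, year INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("W", "Cola", 1997, 10), ("W", "Milk", 1997, 4), ("N", "Cola", 1997, 6), ("W", "Cola", 1996, 9),
])  # hypothetical atomic data in the warehouse

sql, params = rolap_sql("sales", "amount", ["region", "product"], {"year": 1997})
result = conn.execute(sql, params).fetchall()
print(sql)
print(result)  # [('N', 'Cola', 6.0), ('W', 'Cola', 10.0), ('W', 'Milk', 4.0)]
```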


    Typical OLAP Problems: Data Explosion


    [Figure: Data Explosion Syndrome. Number of aggregations plotted against number of dimensions, with 4 levels in each dimension. Source: Microsoft TechEd 98]
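The growth in the figure is multiplicative. Assuming each dimension can be grouped at any of its 4 levels or aggregated away entirely, the number of possible aggregations is (4 + 1)^d for d dimensions; the figure's exact counting may differ:

```python
from math import prod

def num_aggregations(levels_per_dim):
    """Each dimension contributes its hierarchy levels plus 'all' (aggregated away)."""
    return prod(levels + 1 for levels in levels_per_dim)

for d in range(1, 7):
    print(d, num_aggregations([4] * d))
# 1 5, 2 25, 3 125, 4 625, 5 3125, 6 15625
```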


    Metadata Repository

    Administrative metadata
      source databases and their contents
      gateway descriptions
      warehouse schema, view & derived data definitions
      dimensions, hierarchies
      pre-defined queries and reports
      data mart locations and contents
      data partitions
      data extraction, cleansing, transformation rules, defaults
      data refresh and purging rules
      user profiles, user groups
      security: user authorization, access control

    Metadata Repository .. 2


    Business data
      business terms and definitions
      ownership of data
      charging policies

    Operational metadata
      data lineage: history of migrated data and sequence of transformations applied
      currency of data: active, archived, purged
      monitoring information: warehouse usage statistics, error reports, audit trails

    Recipe for a Successful Warehouse


    For a Successful Warehouse


    Look closely at the data extracting, cleaning, and loading tools

    Implement a user-accessible automated directory to information stored in the warehouse

    Determine a plan to test the integrity of the data in the warehouse

    From the start, get warehouse users in the habit of 'testing' complex queries

    For a Successful Warehouse


    Coordinate system roll-out with network administration personnel

    When in a bind, ask others who have done the same thing for advice

    Be on the lookout for small, but strategic, projects

    Market and sell your data warehousing systems

    Data Warehouse Pitfalls


    You are going to spend much time extracting, cleaning, and loading data

    Despite best efforts at project management, data warehousing project scope will increase

    You are going to find problems with systems feeding the data warehouse

    You will find the need to store data not being captured by any existing system

    You will need to validate data not being validated by transaction processing systems

    Data Warehouse Pitfalls


    Some transaction processing systems feeding the warehousing system will not contain detail

    Many warehouse end users will be trained and never or seldom apply their training

    After end users receive query and report tools, requests for IS-written reports may increase

    Your warehouse users will develop conflicting business rules

    Large scale data warehousing can become an exercise in data homogenizing

    Data Warehouse Pitfalls


    'Overhead' can eat up great amounts of disk space

    The time it takes to load the warehouse will expand to the amount of time in the available window... and then some

    Assigning security cannot be done with a transaction processing system mindset

    You are building a HIGH maintenance system

    You will fail if you concentrate on resource optimization to the neglect of project, data, and customer management issues and an understanding of what adds value to the customer

    DW and OLAP Research Issues


    Data cleaning
      focus on data inconsistencies, not schema differences
      data mining techniques

    Physical design
      design of summary tables, partitions, indexes
      tradeoffs in use of different indexes

    Query processing
      selecting appropriate summary tables
      dynamic optimization with feedback
      acid test for query optimization: cost estimation, use of transformations, search strategies
      partitioning query processing between OLAP server and backend server

    DW and OLAP Research Issues .. 2


    Warehouse management
      detecting runaway queries
      resource management
      incremental refresh techniques
      computing summary tables during load
      failure recovery during load and refresh
      process management: scheduling queries, load and refresh
      query processing, caching
      use of workflow technology for process management


    Products, References, Useful Links

    Reporting Tools


    Andyne Computing -- GQL
    Brio -- BrioQuery
    Business Objects -- Business Objects
    Cognos -- Impromptu
    Information Builders Inc. -- Focus for Windows
    Oracle -- Discoverer2000
    Platinum Technology -- SQL*Assist, ProReports
    PowerSoft -- InfoMaker
    SAS Institute -- SAS/Assist
    Software AG -- Esperant
    Sterling Software -- VISION:Data

    OLAP and Executive Information Systems


    Andyne Computing -- Pablo
    Arbor Software -- Essbase
    Cognos -- PowerPlay
    Comshare -- Commander OLAP
    Holistic Systems -- Holos
    Information Advantage -- AXSYS, WebOLAP
    Informix -- Metacube
    Microstrategy -- DSS/Agent
    Microsoft -- Plato
    Oracle -- Express
    Pilot -- LightShip
    Planning Sciences -- Gentium
    Platinum Technology -- ProdeaBeacon, Forest & Trees
    SAS Institute -- SAS/EIS, OLAP++
    Speedware -- Media

    Other Warehouse Related Products


    Data extract, clean, transform, refresh
      CA-Ingres replicator
      Carleton Passport
      Prism Warehouse Manager
      SAS Access
      Sybase Replication Server
      Platinum Inforefiner, Infopump

    Extraction and Transformation Tools


    Carleton Corporation -- Passport

    Evolutionary Technologies Inc. -- Extract

    Informatica -- OpenBridge

    Information Builders Inc. -- EDA Copy Manager

    Platinum Technology -- InfoRefiner

    Prism Solutions -- Prism Warehouse Manager

    Red Brick Systems -- DecisionScape Formation

    Scrubbing Tools


    Apertus -- Enterprise/Integrator
    Vality -- IPE
    Postal Soft

    Warehouse Products


    Computer Associates -- CA-Ingres
    Hewlett-Packard -- Allbase/SQL
    Informix -- Informix, Informix XPS
    Microsoft -- SQL Server
    Oracle -- Oracle7, Oracle Parallel Server
    Red Brick -- Red Brick Warehouse
    SAS Institute -- SAS
    Software AG -- ADABAS
    Sybase -- SQL Server, IQ, MPP

    Warehouse Server Products


    Oracle 8
    Informix
      Online Dynamic Server
      XPS -- Extended Parallel Server
      Universal Server for object relational applications
    Sybase
      Adaptive Server 11.5
      Sybase MPP
      Sybase IQ

    Warehouse Server Products


    Red Brick Warehouse
    Tandem Nonstop
    IBM
      DB2 MVS
      Universal Server
      DB2 400
    Teradata

    Other Warehouse Related Products


    Connectivity to Sources
      Apertus
      Information Builders EDA/SQL
      Platinum Infohub
      SAS Connect
      IBM Data Joiner
      Oracle Open Connect
      Informix Express Gateway

    Other Warehouse Related Products


    Query/Reporting Environments
      Brio/Query
      Cognos Impromptu
      Informix Viewpoint
      CA Visual Express
      Business Objects
      Platinum Forest and Trees

    4GL's, GUI Builders, and PC Databases


    Information Builders -- Focus
    Lotus -- Approach
    Microsoft -- Access, Visual Basic
    MITI -- SQR/Workbench
    PowerSoft -- PowerBuilder
    SAS Institute -- SAS/AF

    Data Mining Products


    DataMind -- neurOagent
    Information Discovery -- IDIS
    SAS Institute -- SAS/Neuronets

    Data Warehouse


    W. H. Inmon, Building the Data Warehouse, Second Edition, John Wiley and Sons, 1996

    W. H. Inmon, J. D. Welch, Katherine L. Glassey, Managing the Data Warehouse, John Wiley and Sons, 1997

    Barry Devlin, Data Warehouse: From Architecture to Implementation, Addison Wesley Longman, Inc., 1997

    Data Warehouse


    W. H. Inmon, John A. Zachman, Jonathan G. Geiger, Data Stores, Data Warehousing and the Zachman Framework, McGraw Hill Series on Data Warehousing and Data Management, 1997

    Ralph Kimball, The Data Warehouse Toolkit, John Wiley and Sons, 1996

    OLAP and DSS


    Erik Thomsen, OLAP Solutions, John Wiley and Sons, 1997

    Microsoft TechEd Transparencies from Microsoft TechEd 98

    Essbase Product Literature

    Oracle Express Product Literature

    Microsoft Plato Web Site

    Microstrategy Web Site

    Data Mining


    Michael J. A. Berry and Gordon Linoff, Data Mining Techniques, John Wiley and Sons, 1997

    Peter Adriaans and Dolf Zantinge, Data Mining, Addison Wesley Longman Ltd., 1996

    KDD Conferences

    Other Tutorials


    Donovan Schneider, Data Warehousing Tutorial, Tutorial at International Conference for Management of Data (SIGMOD 1996) and International Conference on Very Large Data Bases 97

    Umeshwar Dayal and Surajit Chaudhuri, Data Warehousing Tutorial at International Conference on Very Large Data Bases 1996

    Anand Deshpande and S. Seshadri, Tutorial on Datawarehousing and Data Mining, CSI-97

    Useful URLs


    Ralph Kimball's home page: http://www.rkimball.com

    Larry Greenfield's Data Warehouse Information Center: http://pwp.starnetinc.com/larryg/

    Data Warehousing Institute: http://www.dw-institute.com/

    OLAP Council
