90300_633579030311875000

Upload: puneet-garg

Post on 05-Apr-2018


TRANSCRIPT

  • 8/2/2019 90300_633579030311875000

    1/18

    Data Warehousing/Mining 1

    Data Warehousing/Mining

    Introduction


    Outline of Lecture

    Brief History of Data Warehousing

    What is a Data Warehouse?

    Need for Strategic Information

    Information Crisis

    Operational and Decision Support Systems

    Difference Between a Standard DB and a Data Warehouse


    Data Warehouse Evolution

    [Figure: timeline of data warehouse evolution. Time axis ticks: 1960, 1975, 1980, 1985, 1990, 1995, 2000. Eras: Prehistoric Times, Middle Ages, Data Revolution, Information-Based Management. Labels: Relational Databases, PCs and Spreadsheets, End-user Interfaces, Data Replication Tools, 1st DW Article, "Building the DW" (Inmon, 1992), DW Confs., Vendor DW Frameworks, Company DWs.]


    Escalating Need For Strategic Information

    Organizations need information to formulate business strategies, establish goals, and set objectives, e.g.:

    Increase the customer base by 10% over the next 5 years

    Gain market share by 15% in the next 2 years

    Increase product quality levels in the top five product groups


    The Information Crisis

    The volume of information is said to double every 18 months

    Organizations have tons of data available

    Then why is there an information crisis?

    Why can't organizations convert the data into useful information for strategic decision making?


    Problem: Heterogeneous Information Sources

    Heterogeneities are everywhere

    Different interfaces

    Different data representations

    Diverse structure of databases

    Duplicate and inconsistent information

    [Figure: example sources: Personal Databases, Digital Libraries, Scientific Databases, World Wide Web]
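
    To make "different data representations" and "duplicate and inconsistent information" concrete, here is a minimal Python sketch. The two source records, their field names, and the country-code mapping are all hypothetical; the point is only that each source needs its own normalization step before duplicates can be detected.

```python
from datetime import date, datetime

# Two hypothetical sources describing the same customer in different shapes.
personal_db_row = {"cust_name": "Ada Lovelace", "joined": "05/04/2018", "country": "UK"}
web_record = {"name": "LOVELACE, ADA", "signup_date": "2018-04-05", "country_code": "GB"}

COUNTRY_ALIASES = {"UK": "GB"}  # assumed mapping, for this sketch only


def from_personal_db(row):
    """Normalize a row that uses 'First Last' names and DD/MM/YYYY dates."""
    return {
        "name": row["cust_name"].strip().title(),
        "joined": datetime.strptime(row["joined"], "%d/%m/%Y").date(),
        "country": COUNTRY_ALIASES.get(row["country"], row["country"]),
    }


def from_web(record):
    """Normalize a record that uses 'LAST, FIRST' names and ISO dates."""
    last, first = [part.strip() for part in record["name"].split(",")]
    return {
        "name": f"{first} {last}".title(),
        "joined": date.fromisoformat(record["signup_date"]),
        "country": record["country_code"],
    }


def reconcile(records):
    """Drop duplicates once every record uses the same representation."""
    unique = {}
    for rec in records:
        unique.setdefault((rec["name"], rec["joined"], rec["country"]), rec)
    return list(unique.values())


customers = reconcile([from_personal_db(personal_db_row), from_web(web_record)])
print(customers)
```

    After normalization both records map to the same key, so only one consistent customer record remains.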


    About Some Definitions

    What is data?

    What is information?

    What is a warehouse?


    What is a Data Warehouse? A Practitioner's Viewpoint

    "A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context."

    -- Barry Devlin, IBM Consultant


    A Data Warehouse is...

    Stored collection of diverse data

    A solution to the data integration problem

    Single repository of information

    Subject-oriented

    Organized by subject, not by application

    Used for analysis, data mining, etc.

    Large volume of data (GB, TB)

    Non-volatile

    Historical

    Time attributes are important


    A Data Warehouse is... (continued)

    Updates infrequent

    Examples

    All transactions EVER at WalMart

    Complete client histories at an insurance firm

    Stockbroker financial information and portfolios
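
    The "non-volatile", "historical", and "time attributes" properties can be illustrated with a small sketch using Python's built-in sqlite3 module. The table name, columns, and premium figures are hypothetical, loosely modeled on the insurance example: changes are appended as new rows with a time attribute rather than overwriting old values, so the state at any past date can still be queried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE policy_history (
        client_id   INTEGER NOT NULL,
        premium     REAL    NOT NULL,
        valid_from  TEXT    NOT NULL   -- time attribute, ISO date
    )
    """
)

# Non-volatile: each change is appended as a new row; nothing is updated in place.
rows = [
    (1, 500.0, "2016-01-01"),
    (1, 550.0, "2017-01-01"),
    (1, 610.0, "2018-01-01"),
]
conn.executemany("INSERT INTO policy_history VALUES (?, ?, ?)", rows)


def premium_as_of(client_id, as_of):
    """Return the premium that was in effect for a client on a given date."""
    row = conn.execute(
        """
        SELECT premium FROM policy_history
        WHERE client_id = ? AND valid_from <= ?
        ORDER BY valid_from DESC
        LIMIT 1
        """,
        (client_id, as_of),
    ).fetchone()
    return row[0] if row else None


print(premium_as_of(1, "2017-06-30"))  # 550.0
print(premium_as_of(1, "2015-06-30"))  # None (no history before 2016)
```

    An operational system would typically keep only the current premium; keeping every dated row is what makes the historical query possible.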


    Summary

    [Figure labels, in order: Operational Systems, Data Warehouse Population, Data Warehouse, Business Information Interface]


    What are Operational and Decision Support Systems?

    Operational Systems

    Making the wheels of business turn

    Take an order

    Process a claim

    Make a shipment

    Generate an invoice

    Receive cash

    Reserve an airline seat


    What are Operational and Decision Support Systems? (Contd.)

    Decision Support System

    Watching the wheels of business turn

    Show the top-selling products

    Show the problem regions

    Tell me why (drill down)

    Let me see other data (drill across)

    Alert me when a district sells below target
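
    The decision-support requests above can be sketched in a few lines of Python. The sales rows, the column layout, and the per-district target of 100 units are all made up for illustration; a real system would run such queries against the warehouse rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, district, product, units sold)
sales = [
    ("North", "N1", "widget", 120),
    ("North", "N2", "widget", 30),
    ("North", "N2", "gadget", 25),
    ("South", "S1", "widget", 90),
    ("South", "S1", "gadget", 110),
]
DISTRICT_TARGET = 100  # assumed units target per district


def total_by(rows, key_index):
    """Sum units sold, grouped by one column of the fact rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[3]
    return dict(totals)


# "Show the top-selling products"
by_product = total_by(sales, 2)
top_products = sorted(by_product, key=by_product.get, reverse=True)

# "Tell me why (drill down)": break one region down into its districts
north_by_district = total_by([r for r in sales if r[0] == "North"], 1)

# "Alert me when a district sells below target"
by_district = total_by(sales, 1)
alerts = sorted(d for d, units in by_district.items() if units < DISTRICT_TARGET)

print(top_products)        # ['widget', 'gadget']
print(north_by_district)   # {'N1': 120, 'N2': 55}
print(alerts)              # ['N2']
```

    Drilling down is just re-aggregating the same facts at a finer grain (region to district), which is why the one helper serves all three requests.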


    Difference

                        Operational                  Informational
    Data Content        Current values               Archived, derived, optimized
    Data Structure      Optimized for transactions   Optimized for complex queries
    Access Frequency    High                         Medium to low
    Access Type         Read, update, delete         Read
    Usage               Predictable, repetitive      Ad hoc, random, heuristic
    Response Time       Sub-second                   Several seconds to minutes
    Users               Large number                 Relatively small number


    Warehouse is a Specialized DB

    Standard DB                                 Warehouse
    Mostly updates                              Mostly reads
    Many small transactions                     Queries are long and complex
    MB - GB of data                             GB - TB of data
    Current snapshot                            History
    Index/hash on primary key                   Lots of scans
    Raw data                                    Summarized, reconciled data
    Thousands of users (e.g., clerical users)   Hundreds of users (e.g., decision-makers, analysts)
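
    The two access patterns can be contrasted with a short sqlite3 sketch. The orders table and its rows are hypothetical, and both statements run against the same table only for brevity; in practice the summary query would run against a separate warehouse store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "East", 20.0, "open"),
        (2, "East", 35.0, "open"),
        (3, "West", 50.0, "open"),
        (4, "West", 15.0, "open"),
    ],
)

# Standard DB style: a small transaction that touches one row via the primary key.
with conn:
    conn.execute("UPDATE orders SET status = 'shipped' WHERE order_id = ?", (3,))

# Warehouse style: a read-only query that scans many rows and summarizes them.
summary = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(summary)  # [('East', 2, 55.0), ('West', 2, 65.0)]
```

    The update benefits from the primary-key index, while the summary has to read every row, which matches the "index/hash on primary key" versus "lots of scans" contrast in the table.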


    Warehousing and Industry

    Warehousing is big business

    $2 billion in 1995

    $3.5 billion in early 1997

    About $8 billion in 1998 [Metagroup]

    WalMart has largest warehouse

    900-CPU, 2,700 disk, 23 TB Teradata system

    ~7 TB in warehouse

    40-50 GB per day
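
    As a rough sense of scale, the figures above can be combined with simple arithmetic. This sketch assumes decimal units (1 TB = 1000 GB), reads "40-50 GB per day" as a daily load volume, and treats the 1995 and 1998 market figures as directly comparable; none of these assumptions is stated on the slide.

```python
# Figures quoted on the slide; the interpretation is an assumption of this sketch.
warehouse_gb = 7 * 1000          # ~7 TB in warehouse, decimal units assumed
daily_load_gb = (40, 50)         # 40-50 GB per day, read as daily load volume

days_at_high_rate = warehouse_gb / daily_load_gb[1]   # at 50 GB/day
days_at_low_rate = warehouse_gb / daily_load_gb[0]    # at 40 GB/day
print(f"{days_at_high_rate:.0f} to {days_at_low_rate:.0f} days of loads")

# Implied compound annual growth from $2 billion (1995) to $8 billion (1998).
years = 1998 - 1995
annual_growth = (8 / 2) ** (1 / years) - 1
print(f"{annual_growth:.1%} per year")
```

    Under these assumptions, roughly 140 to 175 days of loads would add up to the quoted warehouse size, and the market figures imply growth of a little under 60% per year.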


    Data Warehousing: Two Distinct Issues

    (1) How to get information into the warehouse

    Data warehousing

    (2) What to do with data once it's in the warehouse

    Warehouse DBMS

    Both rich research areas

    Industry has focused on (2)


    Thank You Very Much