dwh overview

Upload: svdontha

Post on 06-Jul-2018


TRANSCRIPT

  • 8/17/2019 DWH Overview

    1/35

    An Introduction to Data Warehousing

    • Adil Siddiqui
    • [email protected]@tcs.com


  • 8/17/2019 DWH Overview

    3/35

    Objectives

    • At the end of this lesson, you will know:
    + What is Data Warehousing
    + The evolution of Data Warehousing
    + Need for Data Warehousing
    + OLTP vs Warehouse Applications
    + Data marts vs Data Warehouses
    + Operational Data Stores
    + Overview of Warehouse Architecture

  • 8/17/2019 DWH Overview

    4/35

    What is a Data Warehouse?

    A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions.
    - WH Inmon

    WH Inmon - Regarded As Father Of Data Warehousing

    • Data stored for historical period. Data is populated in the data warehouse on daily/weekly basis depending upon the requirement.
    • Can I see credit report from Accounts, Sales from marketing and open order report from order entry for this customer?
    • Data from multiple sources is integrated for a subject.
    • Identical queries will give same results at different times. Supports analysis requiring historical data.

  • 8/17/2019 DWH Overview

    5/35

    Subject-Oriented - Characteristics of a Data Warehouse

    • Operational: Quotes, Leads, Orders, Prospects
    • Data Warehouse: Customers, Products, Regions, Time

    Focus is on Subject Areas rather than Applications

  • 8/17/2019 DWH Overview

    6/35

    Non-volatile - Characteristics of a Data Warehouse

    • Operational: insert, change, replace, delete
    • Data Warehouse: load, read only access

    Integrated View Is The Essence Of A Data Warehouse

  • 8/17/2019 DWH Overview

    7/35

    Time Variant - Characteristics of a Data Warehouse

    Operational: Current value data
    • time horizon: 60-90 days
    • key may not have element of time

    Data Warehouse: Snapshot data
    • time horizon: 5-10 years
    • key has an element of time
    • data warehouse stores historical data

    Data Warehouse Typically Spans Across Time
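A minimal sketch of the contrast drawn on the non-volatile and time-variant slides, assuming a simple in-memory model. The names `operational_update`, `warehouse_load`, and `balance_as_of`, and the customer balance data, are hypothetical and not part of the original deck.

```python
from datetime import date

# Operational store: one current value per key, changed in place.
operational = {}

def operational_update(customer_id, balance):
    operational[customer_id] = balance  # the previous value is overwritten

# Warehouse store: load-only snapshots; the key has an element of time.
warehouse = {}

def warehouse_load(customer_id, snapshot_date, balance):
    key = (customer_id, snapshot_date)
    if key in warehouse:
        raise ValueError("a loaded snapshot is never changed")
    warehouse[key] = balance

def balance_as_of(customer_id, snapshot_date):
    # Read-only access: the same query returns the same answer every time.
    return warehouse[(customer_id, snapshot_date)]

operational_update("C1", 100)
warehouse_load("C1", date(2019, 1, 31), 100)
operational_update("C1", 250)
warehouse_load("C1", date(2019, 2, 28), 250)

print(operational["C1"])                       # 250
print(balance_as_of("C1", date(2019, 1, 31)))  # 100
```

The operational store only knows the current balance, while the warehouse can still answer the January question after the February load.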


  • 8/17/2019 DWH Overview

    10/35

    Evolution of Data Warehousing

    1960 - 1985 : MIS Era

    • Unfriendly
    • Slow
    • Dependent on IS programmers
    • Inflexible
    • Analysis limited to defined reports

    Focus on Reporting

  • 8/17/2019 DWH Overview

    11/35

    Evolution of Data Warehousing

    1985 - 1990 : Querying Era

    • Ad hoc, unstructured access to corporate data
    • SQL as interface not scalable
    • Cannot handle complex analysis

    Focus on Online Querying

  • 8/17/2019 DWH Overview

    12/35

    Evolution of Data Warehousing

    1990 - 20xx : Analysis Era

    • Trend Analysis
    • What If?
    • Moving Averages
    • Cross Dimensional Comparisons
    • Statistical profiles
    • Automated pattern and rule discovery

    Focus on Online Analysis
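As one illustration of the analysis types listed above, a simple moving average over historical values might be sketched as follows. The function name `moving_average` and the monthly sales figures are hypothetical.

```python
def moving_average(values, window):
    """Return the simple moving average for each full window of `values`."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

monthly_sales = [10, 20, 30, 40, 50]
print(moving_average(monthly_sales, 3))  # [20.0, 30.0, 40.0]
```

The calculation needs several periods of history, which is the kind of data a warehouse retains and an operational system usually does not.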

  • 8/17/2019 DWH Overview

    13/35

    Need for Data Warehousing

    • Better business intelligence for end-users
    • Reduction in time to locate, access, and analyze information
    • Consolidation of disparate information sources
    • Strategic advantage over competitors
    • Faster time-to-market for products and services
    • Replacement of older, less-responsive decision support systems
    • Reduction in demand on IS to generate reports

  • 8/17/2019 DWH Overview

    14/35

    OLTP vs Warehouse

    Operational System | Data Warehouse
    Transaction Processing | Query Processing
    Time Sensitive | History Oriented
    Operator View | Managerial View
    Organized by transactions (Order, Input, Inventory) | Organized by subject (Customer, Product)
    Relatively smaller database | Large database size
    Many concurrent users | Relatively few concurrent users
    Volatile Data | Non Volatile Data
    Stores all data | Stores relevant data
    Not Flexible | Flexible
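The difference between transaction processing and query processing in the comparison above can be sketched with Python's standard `sqlite3` module. The `orders` table, its columns, and the sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "Acme", "Widget", 100.0),
        (2, "Acme", "Gadget", 50.0),
        (3, "Zenith", "Widget", 75.0),
    ],
)

# Transaction processing: read one record by its key (operator view).
order = conn.execute(
    "SELECT customer, amount FROM orders WHERE order_id = ?", (2,)
).fetchone()

# Query processing: summarise many records by subject (managerial view).
by_customer = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()

print(order)        # ('Acme', 50.0)
print(by_customer)  # [('Acme', 150.0), ('Zenith', 75.0)]
```

In practice the two workloads would run on separately configured systems; a single in-memory database is used here only to keep the sketch self-contained.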

  • 8/17/2019 DWH Overview

    15/35

    Capacity Planning

    [Chart: Processing Power against Time of day]

    Processing Load Peaks During the Beginning and End of Day

  • 8/17/2019 DWH Overview

    16/35

    Examples Of Some Applications

    • Target Marketing
    • Market Segmentation
    • Budgeting
    • Credit Rating Agencies
    • Financial Reporting and Consolidation
    • Market Basket Analysis - POS Analysis
    • Churn Analysis
    • Profitability Management
    • Event tracking

    Manufacturers, Customers, Retailers

    http://sheks/stuff/Data%20Warehousing%20Example%20-%20Amazon.htm

  • 8/17/2019 DWH Overview

    17/35

    Do we need a separate database?

    • OLTP and data warehousing require two very differently configured systems
    • Isolation of Production System from Business Intelligence System
    • Significant and highly variable resource demands of the data warehouse
    • Cost of disk space no longer a concern
    • Production systems not designed for query processing

  • 8/17/2019 DWH Overview

    18/35

    Data Marts

    • Enterprise wide data warehousing projects have a very large cycle time
    • Getting consensus between multiple parties may also be difficult

  • 8/17/2019 DWH Overview

    19/35

    Data Marts

    • Subject or Application Oriented Business View of Warehouse
    + Quick Solution to a specific Business Problem
    + Finance, Manufacturing, Sales etc.
    + Smaller amount of data used for Analytic Processing

    A Logical Subset of The Complete Data Warehouse
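A rough sketch of a data mart as a logical subset of the warehouse: one subject is selected and summarised into a smaller data set. The row layout and the function `build_mart` are hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical detailed warehouse rows covering several subject areas.
warehouse_rows = [
    {"subject": "Sales", "region": "North", "month": "2019-01", "amount": 120.0},
    {"subject": "Sales", "region": "North", "month": "2019-01", "amount": 80.0},
    {"subject": "Sales", "region": "South", "month": "2019-01", "amount": 50.0},
    {"subject": "Finance", "region": "North", "month": "2019-01", "amount": 999.0},
]

def build_mart(rows, subject):
    """Keep one subject and summarise its amounts by (region, month)."""
    totals = defaultdict(float)
    for row in rows:
        if row["subject"] == subject:
            totals[(row["region"], row["month"])] += row["amount"]
    return dict(totals)

sales_mart = build_mart(warehouse_rows, "Sales")
print(sales_mart)
# {('North', '2019-01'): 200.0, ('South', '2019-01'): 50.0}
```

The mart holds fewer, summarised rows for a single subject, which matches the "smaller amount of data" point on the slide.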

  • 8/17/2019 DWH Overview

    20/35

    Data Warehouses or Data Marts

    For companies interested in changing their corporate cultures or integrating separate departments, an enterprise wide approach makes sense.

    Companies that want a quick solution to a specific business problem are better served by a standalone data mart.

    Some companies opt to build a warehouse incrementally, data mart by data mart.

    A Logical Subset of The Complete Data Warehouse

  • 8/17/2019 DWH Overview

    21/35

    Data Warehouse and Data Mart

     | Data Warehouse | Data Marts
    Scope | Application Neutral; Centralized, Shared; Cross LOB/enterprise | Specific Application Requirement; LOB, department; Business Process Oriented
    Data Perspective | Historical Detailed data; Some summary | Detailed (some history); Summarized
    Subject | Multiple subject areas | Single Partial subject

  • 8/17/2019 DWH Overview

    22/35

    Data Warehouse and Data Mart

     | Data Warehouse | Data Marts
    Data Sources | Many; Operational/External Data | Few; Operational, external data
    Implementation Time Frame | 9-18 months for first stage; Multiple stage implementation | [...] months
    Characteristics | Flexible, extensible; Durable/Strategic; Data orientation | Restrictive, non extensible; Short life/tactical; Project

  • 8/17/2019 DWH Overview

    23/35

    Warehouse or Mart First?

    Data Warehouse First | Data Mart first
    Expensive | Relatively cheap
    Large development cycle | Delivered in 6 months
    Change management is difficult |

  • 8/17/2019 DWH Overview

    24/35

    OLTP Systems vs Data Warehouse

    Remember: between OLTP and Data Warehouse systems
    • users are different
    • data content is different
    • data structures are different
    • hardware is different

    Understanding The Differences Is The Key

  • 8/17/2019 DWH Overview

    25/35

    Operational Data Store - Definition

    [Diagram: Operational, ODS, DSS, Data Warehouse]

  • 8/17/2019 DWH Overview

    26/35

    Operational Data Store - Definition

    A subject oriented, integrated, volatile, current valued data store containing only corporate detailed data

    • Data stored only for current period. Old Data is either archived or moved to Data Warehouse.
    • Can I see credit report from Accounts, Sales from marketing and open order report from order entry for this customer?
    • Identical queries may give different results at different times. Supports analysis requiring current data.
    • Data from multiple sources is integrated for a subject.

  • 8/17/2019 DWH Overview

    27/35

    Operational Data Store

    • The ODS applies only to the world of operational systems.
    • The ODS contains current valued and near current valued data.
    • The ODS contains almost exclusively all detail data.
    • The ODS requires a full function, update, record oriented environment.

  • 8/17/2019 DWH Overview

    28/35

    Operational Data Store

    • Functions of an ODS
    + Converts Data,
    + Decides Which Data of Multiple Sources Is the Best,
    + Summarizes Data,
    + Decodes/encodes Data,
    + Alters the Key Structures,
    + Alters the Physical Structures,
    + Reformats Data,
    + Internally Represents Data,
    + Recalculates Data.
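A few of the ODS functions listed above (converting, decoding, altering key structures, reformatting, and choosing the best of multiple sources) could be sketched as follows. The record layouts, the `GENDER_CODES` table, and the "most recently updated wins" rule are assumptions for illustration, not something the deck specifies.

```python
from datetime import datetime

# Hypothetical code table: different sources encode the same value differently.
GENDER_CODES = {"M": "male", "F": "female", "1": "male", "2": "female"}

def convert(record):
    """Bring one source record into a common layout."""
    return {
        # Alters the key structure: one uniform key format for all sources.
        "customer_key": f"CUST-{int(record['id']):06d}",
        # Decodes source-specific codes into one representation.
        "gender": GENDER_CODES[str(record["gender"]).upper()],
        # Converts text or integer amounts to a float with two decimals.
        "balance": round(float(record["balance"]), 2),
        # Reformats source-specific date strings into datetime values.
        "updated": datetime.strptime(record["updated"], record["date_format"]),
    }

def pick_best(records):
    """Assumed rule: the most recently updated record is the best one."""
    return max(records, key=lambda r: r["updated"])

billing = {"id": "42", "gender": "m", "balance": "10.50",
           "updated": "2019-08-17", "date_format": "%Y-%m-%d"}
crm = {"id": 42, "gender": 1, "balance": 12,
       "updated": "16/08/2019", "date_format": "%d/%m/%Y"}

best = pick_best([convert(billing), convert(crm)])
print(best["customer_key"], best["gender"], best["balance"])
# CUST-000042 male 10.5
```

A real ODS would apply source-specific business rules when deciding which record is best; recency is only one possible choice.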

  • 8/17/2019 DWH Overview

    29/35

    Different kinds of Information Needs

    • Current: Is this medicine available in stock?
    • Recent: What are the tests this patient has completed so far?
    • Historical: Has the incidence of Tuberculosis increased in last 5 years in Southern region?

  • 8/17/2019 DWH Overview

    30/35

    OLTP vs ODS vs DWH

    Characteristic | OLTP | ODS | Data Warehouse
    Audience | Operating Personnel | Analysts | Managers and analysts
    Data access | Individual records, transaction driven | Individual records, transaction or analysis driven | Set of records, analysis driven
    Data content | Current, real-time | Current and near-current | Historical
    Data Structure | Detailed | Detailed and lightly summarized | Detailed and Summarized
    Data organization | Functional | Subject-oriented | Subject-oriented

  • 8/17/2019 DWH Overview

    31/35

    OLTP vs ODS vs DWH

    Characteristic | OLTP | ODS | Data Warehouse
    Data redundancy | Non-redundant within system; Unmanaged redundancy among systems | Somewhat redundant with operational databases | Managed redundancy
    Data update | Field by field | Field by field | Controlled batch
    Database size | Moderate | Moderate | Large to very large
    Development Methodology | Requirements driven, structured | Data driven, somewhat evolutionary | Data driven, evolutionary
    Philosophy | Support day-to-day operation | Support day-to-day decisions & operational activities | Support managing the enterprise

  • 8/17/2019 DWH Overview

    32/35

    Typical Data Warehouse Architecture

    [Diagram labels]
    • Operational Systems/Data
    • Data Preparation: Select, Extract, Transform, Integrate, Maintain
    • Middleware/API
    • Data Warehouse
    • Metadata
    • EIS/DSS, Query Tools, OLAP/ROLAP, Web Browsers, Data Mining
    • Data Marts

    Multi-tiered Data Warehouse without ODS

  • 8/17/2019 DWH Overview

    33/35

    Typical Data Warehouse Architecture

    [Diagram labels]
    • Operational Systems/Data
    • Data Preparation: Select, Extract, Transform, Integrate, Maintain
    • Data Marts
    • Data Warehouse
    • Metadata
    • ODS
    • Metadata
    • Data Preparation: Select, Extract, Transform, Load

    Multi-tiered Data Warehouse with ODS
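The data preparation steps named in the architecture diagrams can be sketched as a small pipeline. This assumes the flow runs from operational data into the ODS and then into the warehouse, which the extracted slide text does not state explicitly. All function names and sample rows are hypothetical.

```python
def extract(source_rows, wanted_fields):
    """Select and extract only the wanted fields from operational rows."""
    return [{f: row[f] for f in wanted_fields} for row in source_rows]

def transform(rows):
    """Transform to a common shape: trimmed upper-case names, float amounts."""
    return [
        {"customer": r["customer"].strip().upper(), "amount": float(r["amount"])}
        for r in rows
    ]

def integrate_into_ods(ods, rows):
    """Integrate and maintain: the ODS keeps one current value per customer."""
    for r in rows:
        ods[r["customer"]] = r["amount"]

def load_warehouse(warehouse, ods, load_date):
    """Load: append a dated snapshot of the ODS; earlier rows are untouched."""
    for customer, amount in sorted(ods.items()):
        warehouse.append((load_date, customer, amount))

ods, warehouse = {}, []
day1 = [{"customer": " acme ", "amount": "100", "clerk": "x"}]
day2 = [{"customer": "ACME", "amount": "120", "clerk": "y"}]

for load_date, source in (("2019-08-16", day1), ("2019-08-17", day2)):
    integrate_into_ods(ods, transform(extract(source, ["customer", "amount"])))
    load_warehouse(warehouse, ods, load_date)

print(ods)        # {'ACME': 120.0}
print(warehouse)  # [('2019-08-16', 'ACME', 100.0), ('2019-08-17', 'ACME', 120.0)]
```

After two loads the ODS holds only the current value, while the warehouse holds both dated snapshots.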


  • 8/17/2019 DWH Overview

    35/35

    Thank You