data processing & warehousing concepts

Upload: pcash-ad

Post on 07-Apr-2018

226 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/6/2019 Data Processing & Warehousing Concepts

    1/26

    Data Processing

    Data Warehousing

  • 8/6/2019 Data Processing & Warehousing Concepts

    2/26

    Data Processing

    act of handling or manipulating data

    assign meaning to data

    ultimate goal transform data into information

    Definition: Data processing is the process through which facts

    and figures are collected, assigned meaning,

    communicated to others and retained for future use.Hence we can define data processing as a series ofactions or operations that converts data into usefulinformation.

  • 8/6/2019 Data Processing & Warehousing Concepts

    3/26

    Data Processing Activities

  • 8/6/2019 Data Processing & Warehousing Concepts

    4/26

    Data Processing Cycle

  • 8/6/2019 Data Processing & Warehousing Concepts

    5/26

    Data Warehouse concept

    Distinction between data and information

    DATA: composed of observable and recordable

    facts Data only comes as valuable fact to end user

    when organized and presented as information

    INFORMATION: integrated collections of factsand used as a basis for decision making

  • 8/6/2019 Data Processing & Warehousing Concepts

    6/26

    Data warehouse definitions

    The data warehouse is that portion of an

    overall Architected Data Environment that

    serves as the single integrated source of data

    for processing information. The data

    warehouse has specific characteristics that

    include the following:

  • 8/6/2019 Data Processing & Warehousing Concepts

    7/26

    The data warehouse has specificcharacteristics that include the following:

    Subject-Oriented

    Integrated

    Non-Volatile

    Time-Variant

    Accessible

    Process-Oriented

  • 8/6/2019 Data Processing & Warehousing Concepts

    8/26

    Subject Oriented

    Information is presented according to specificsubjects or areas of interest, not simply ascomputer files.

    Data is manipulated to provide informationabout a particular subject.

    For example, the registrars record of student

    DB is not simply made accessible to end-users,but is provided structure and organizedaccording to the specific needs.

  • 8/6/2019 Data Processing & Warehousing Concepts

    9/26

    Integrated

    single source of information to understand

    multiple areas of interest.

    The data warehouse provides one-stopshopping and contains information about a

    variety of subjects.

  • 8/6/2019 Data Processing & Warehousing Concepts

    10/26

    Non-Volatile

    Stable information that doesnt change each

    time an operational process is executed.

    Information is consistent regardless of whenthe warehouse is accessed.

  • 8/6/2019 Data Processing & Warehousing Concepts

    11/26

    Time Variant

    Containing a history of the subject, as well as

    current information.

    Historical information is an importantcomponent of a data warehouse.

  • 8/6/2019 Data Processing & Warehousing Concepts

    12/26

    Accessible

    The primary purpose of a data warehouse is to

    provide readily accessible information to end-

    users.

  • 8/6/2019 Data Processing & Warehousing Concepts

    13/26

    View data warehousing as a process for

    delivery of information.

    maintenance ongoing and iterative in nature.

  • 8/6/2019 Data Processing & Warehousing Concepts

    14/26

    Some Definitions

    Data Warehouse: A data structure that is optimized for distribution. It collects and

    stores integrated sets of historical data from multipleoperational systems and feeds them to one or more data marts.It may also provide end-user access to support enterprise viewsof data.

    Data Mart: A data structure that is optimized for access. It is designed to

    facilitate end-user analysis of data. It typically supports a single,analytic application used by a distinct set of workers.

    Staging Area: Any data store that is designed primarily to receive data into a

    warehousing environment.

  • 8/6/2019 Data Processing & Warehousing Concepts

    15/26

    OperationalData Store: A collection of data that addresses operational needs of various

    operational units. It is not a component of a data warehousingarchitecture, but a solution to operational needs.

    OLAP (On-Line Analytical Processing): A method by which multidimensional analysis occurs.

    Multidimensional Analysis: The ability to manipulate information by a variety of relevant

    categories or dimensions to facilitate analysis andunderstanding of the underlying data. It is also sometimesreferred to as drilling-down, drilling-across and slicing anddicing

    Hypercube: A means of visually representing multidimensional data.

  • 8/6/2019 Data Processing & Warehousing Concepts

    16/26

    Star Schema: A means of aggregating data based on a set of known dimensions. It stores

    data multidimensionally in a two dimensional Relational DatabaseManagement System (RDBMS), such as Oracle.

    Snowflake Schema:

    An extension of the star schema by means of applying additional dimensionsto the dimensions of a star schema in a relational environment.

    MultidimensionalDatabase: Also known as MDDB or MDDBS. A class of proprietary, non-relational

    database management tools that store and manage data in amultidimensional manner, as opposed to the two dimensions associated withtraditional relational database management systems.

    OLAP Tools: A set of software products that attempt to facilitate multidimensional

    analysis. Can incorporate data acquisition, data access, data manipulation, orany combination thereof.

  • 8/6/2019 Data Processing & Warehousing Concepts

    17/26

    Operational Data Store v/s Data

    Warehouse

    Operational Data Store Data Warehouse

    Characteristics: Data Focused IntegrationFrom Transaction ProcessingFocused Systems

    Subject OrientedIntegratedNon-VolatileTime Variant

    Age Of The Data: Current, Near Term(Today, Last Weeks)

    Historic(Last Month, Qtrly, FiveYears)

    Primary Use: Day-To-Day DecisionsTactical Reporting

    Current Operational Results

    Long-Term DecisionsStrategic Reporting

    Trend DetectionFrequency Of Load: Twice Daily , Daily, Weekly Weekly, Monthly, Quarterly

  • 8/6/2019 Data Processing & Warehousing Concepts

    18/26

    The Data Warehousing Process Part

    1

    Determine Informational Requirements

    Identify and analyze existing informationalcapabilities.

    Identify from key users the significant businessquestions and key metrics that the target user. groupregards as their most important requirements forinformation.

    Decompose these metrics into their component partswith specific definitions.

    Map the component parts to the informational modeland systems of record.

  • 8/6/2019 Data Processing & Warehousing Concepts

    19/26

    The Data Warehousing Process Part

    2

    Evolutionary and Iterative Development Process

    When you begin to develop your first datawarehouse increment, the architecture is new and

    fresh. With the second and subsequent increments,the following is true:

    Start with one subject area (or subset or superset) and onetarget user group.

    Continue and add subject areas, user groups andinformational capabilities.

    Improved and learned from what was in previousincrements

  • 8/6/2019 Data Processing & Warehousing Concepts

    20/26

    Improvements are made from what was learned

    about warehouse operation and support.

    The technical environment may have changed.

    Results are seen very quickly after each iteration.

    The end user requirements are refined after each

    iteration.

  • 8/6/2019 Data Processing & Warehousing Concepts

    21/26

    evolutionary/iterative process that

    follows a spiral pattern

  • 8/6/2019 Data Processing & Warehousing Concepts

    22/26

    Population Process

    A data warehouse is populated through a seriesof steps that

    1) Remove data from the source environment

    (extract). 2) Change the data to have desired warehouse

    characteristics like subject-orientation and time-variance (transform).

    3)

    Place the data into a target environment (load).

    This process is represented by the acronym ETLfor Extract, Transform and Load.

  • 8/6/2019 Data Processing & Warehousing Concepts

    23/26

    Complexity of transformation and

    integration Requires change in technology

    Reformatted

    Cleansed

    Multiple input sources exists

    Default values needed

    Summarization needs to be done

    Data format conversion

    Massive volumes of input

    Perhaps the worst of all: Data relationships that have been built into old legacy program

    logic must be understood and unraveled before those files canbe used as input.

  • 8/6/2019 Data Processing & Warehousing Concepts

    24/26

    Data Warehouse Tools/Softwares

  • 8/6/2019 Data Processing & Warehousing Concepts

    25/26

    Future Trends In Data Warehouse

    Increased Data Mining Exploration

    Prove Hypothesis

    Increase Competitive Advantage

    (i.e., Identify Cross-sellingOpportunities)

    Integration into Supply Chain & e-Business

  • 8/6/2019 Data Processing & Warehousing Concepts

    26/26

    SubjectOriented

    Integrated

    Time Variant

    Non-volatile

    Summary

    Data Warehouse Concepts

    A Data Warehouse Is A Structured Repositoryof Historic Data.

    It Is:

    It Contains: Business Specified Data,

    To Answer Business Questions