data processing & warehousing concepts
TRANSCRIPT
-
8/6/2019 Data Processing & Warehousing Concepts
1/26
Data Processing
Data Warehousing
-
8/6/2019 Data Processing & Warehousing Concepts
2/26
Data Processing
act of handling or manipulating data
assign meaning to data
ultimate goal transform data into information
Definition: Data processing is the process through which facts
and figures are collected, assigned meaning,
communicated to others and retained for future use.Hence we can define data processing as a series ofactions or operations that converts data into usefulinformation.
-
8/6/2019 Data Processing & Warehousing Concepts
3/26
Data Processing Activities
-
8/6/2019 Data Processing & Warehousing Concepts
4/26
Data Processing Cycle
-
8/6/2019 Data Processing & Warehousing Concepts
5/26
Data Warehouse concept
Distinction between data and information
DATA: composed of observable and recordable
facts Data only comes as valuable fact to end user
when organized and presented as information
INFORMATION: integrated collections of factsand used as a basis for decision making
-
8/6/2019 Data Processing & Warehousing Concepts
6/26
Data warehouse definitions
The data warehouse is that portion of an
overall Architected Data Environment that
serves as the single integrated source of data
for processing information. The data
warehouse has specific characteristics that
include the following:
-
8/6/2019 Data Processing & Warehousing Concepts
7/26
The data warehouse has specificcharacteristics that include the following:
Subject-Oriented
Integrated
Non-Volatile
Time-Variant
Accessible
Process-Oriented
-
8/6/2019 Data Processing & Warehousing Concepts
8/26
Subject Oriented
Information is presented according to specificsubjects or areas of interest, not simply ascomputer files.
Data is manipulated to provide informationabout a particular subject.
For example, the registrars record of student
DB is not simply made accessible to end-users,but is provided structure and organizedaccording to the specific needs.
-
8/6/2019 Data Processing & Warehousing Concepts
9/26
Integrated
single source of information to understand
multiple areas of interest.
The data warehouse provides one-stopshopping and contains information about a
variety of subjects.
-
8/6/2019 Data Processing & Warehousing Concepts
10/26
Non-Volatile
Stable information that doesnt change each
time an operational process is executed.
Information is consistent regardless of whenthe warehouse is accessed.
-
8/6/2019 Data Processing & Warehousing Concepts
11/26
Time Variant
Containing a history of the subject, as well as
current information.
Historical information is an importantcomponent of a data warehouse.
-
8/6/2019 Data Processing & Warehousing Concepts
12/26
Accessible
The primary purpose of a data warehouse is to
provide readily accessible information to end-
users.
-
8/6/2019 Data Processing & Warehousing Concepts
13/26
View data warehousing as a process for
delivery of information.
maintenance ongoing and iterative in nature.
-
8/6/2019 Data Processing & Warehousing Concepts
14/26
Some Definitions
Data Warehouse: A data structure that is optimized for distribution. It collects and
stores integrated sets of historical data from multipleoperational systems and feeds them to one or more data marts.It may also provide end-user access to support enterprise viewsof data.
Data Mart: A data structure that is optimized for access. It is designed to
facilitate end-user analysis of data. It typically supports a single,analytic application used by a distinct set of workers.
Staging Area: Any data store that is designed primarily to receive data into a
warehousing environment.
-
8/6/2019 Data Processing & Warehousing Concepts
15/26
OperationalData Store: A collection of data that addresses operational needs of various
operational units. It is not a component of a data warehousingarchitecture, but a solution to operational needs.
OLAP (On-Line Analytical Processing): A method by which multidimensional analysis occurs.
Multidimensional Analysis: The ability to manipulate information by a variety of relevant
categories or dimensions to facilitate analysis andunderstanding of the underlying data. It is also sometimesreferred to as drilling-down, drilling-across and slicing anddicing
Hypercube: A means of visually representing multidimensional data.
-
8/6/2019 Data Processing & Warehousing Concepts
16/26
Star Schema: A means of aggregating data based on a set of known dimensions. It stores
data multidimensionally in a two dimensional Relational DatabaseManagement System (RDBMS), such as Oracle.
Snowflake Schema:
An extension of the star schema by means of applying additional dimensionsto the dimensions of a star schema in a relational environment.
MultidimensionalDatabase: Also known as MDDB or MDDBS. A class of proprietary, non-relational
database management tools that store and manage data in amultidimensional manner, as opposed to the two dimensions associated withtraditional relational database management systems.
OLAP Tools: A set of software products that attempt to facilitate multidimensional
analysis. Can incorporate data acquisition, data access, data manipulation, orany combination thereof.
-
8/6/2019 Data Processing & Warehousing Concepts
17/26
Operational Data Store v/s Data
Warehouse
Operational Data Store Data Warehouse
Characteristics: Data Focused IntegrationFrom Transaction ProcessingFocused Systems
Subject OrientedIntegratedNon-VolatileTime Variant
Age Of The Data: Current, Near Term(Today, Last Weeks)
Historic(Last Month, Qtrly, FiveYears)
Primary Use: Day-To-Day DecisionsTactical Reporting
Current Operational Results
Long-Term DecisionsStrategic Reporting
Trend DetectionFrequency Of Load: Twice Daily , Daily, Weekly Weekly, Monthly, Quarterly
-
8/6/2019 Data Processing & Warehousing Concepts
18/26
The Data Warehousing Process Part
1
Determine Informational Requirements
Identify and analyze existing informationalcapabilities.
Identify from key users the significant businessquestions and key metrics that the target user. groupregards as their most important requirements forinformation.
Decompose these metrics into their component partswith specific definitions.
Map the component parts to the informational modeland systems of record.
-
8/6/2019 Data Processing & Warehousing Concepts
19/26
The Data Warehousing Process Part
2
Evolutionary and Iterative Development Process
When you begin to develop your first datawarehouse increment, the architecture is new and
fresh. With the second and subsequent increments,the following is true:
Start with one subject area (or subset or superset) and onetarget user group.
Continue and add subject areas, user groups andinformational capabilities.
Improved and learned from what was in previousincrements
-
8/6/2019 Data Processing & Warehousing Concepts
20/26
Improvements are made from what was learned
about warehouse operation and support.
The technical environment may have changed.
Results are seen very quickly after each iteration.
The end user requirements are refined after each
iteration.
-
8/6/2019 Data Processing & Warehousing Concepts
21/26
evolutionary/iterative process that
follows a spiral pattern
-
8/6/2019 Data Processing & Warehousing Concepts
22/26
Population Process
A data warehouse is populated through a seriesof steps that
1) Remove data from the source environment
(extract). 2) Change the data to have desired warehouse
characteristics like subject-orientation and time-variance (transform).
3)
Place the data into a target environment (load).
This process is represented by the acronym ETLfor Extract, Transform and Load.
-
8/6/2019 Data Processing & Warehousing Concepts
23/26
Complexity of transformation and
integration Requires change in technology
Reformatted
Cleansed
Multiple input sources exists
Default values needed
Summarization needs to be done
Data format conversion
Massive volumes of input
Perhaps the worst of all: Data relationships that have been built into old legacy program
logic must be understood and unraveled before those files canbe used as input.
-
8/6/2019 Data Processing & Warehousing Concepts
24/26
Data Warehouse Tools/Softwares
-
8/6/2019 Data Processing & Warehousing Concepts
25/26
Future Trends In Data Warehouse
Increased Data Mining Exploration
Prove Hypothesis
Increase Competitive Advantage
(i.e., Identify Cross-sellingOpportunities)
Integration into Supply Chain & e-Business
-
8/6/2019 Data Processing & Warehousing Concepts
26/26
SubjectOriented
Integrated
Time Variant
Non-volatile
Summary
Data Warehouse Concepts
A Data Warehouse Is A Structured Repositoryof Historic Data.
It Is:
It Contains: Business Specified Data,
To Answer Business Questions