Lecture 9: BIS4435 – Data Warehousing. Dr. Nawaz Khan, e-mail: [email protected]
Dr. Nawaz Khan, School of Computing Science. E-mail: [email protected]
Data Warehousing: Reading Assignment
Reading Suggestion:
- Connolly, T.M., and Begg, C.E., Database Systems: A Practical Approach to Design, Implementation and Management, 4th Edition, Addison-Wesley, ISBN 0321210255 (chapters 31-33)
- Global campus materials on OASIS: http://oasis.mdx.ac.uk/ (unit 9)
More Reading:
- Elmasri, R., and Navathe, S.B., Fundamentals of Database Systems, 4th Edition, 2004, Addison-Wesley, ISBN 0-321-12226-7 (Chapter 28)
- Berson, A., and Smith, S.J., Data Warehousing, Data Mining, and OLAP, McGraw-Hill, 1997, ISBN 0-07-006272-2 (Chapters 6, 7)
Data Warehousing: Outline
- Definition
- Compare with operational systems
- Architecture
- Design issues - star schema
- Relation with DM
- Summary
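Data Warehousing: Definition
What is a data warehouse?
- A DW is an environment plus facilities
- It brings scattered data together so that the data can be queried
[Figure: analogy with a product warehouse. Finished products from plant 1, plant 2, ..., plant i, ..., plant n are collected in a warehouse, which serves query/delivery.]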
Data Warehousing: Definition
What is a data warehouse?
[Figure: operational systems (Financial Department, Human Resource Department, ..., R&D Department) feed the DW through data transformation; users reach the DW through an access tool.]
Data Warehousing: Compare with operational systems
Operational DB systems
- focus on day-to-day business - data structured around events
- run in an OLTP environment
- support a large number of transactions
- require quick response - small, focused DB
DW systems
- focus on business needs and requirements - data organised around trends and patterns in events
- run in an off-line environment
- support complex queries, ad hoc and static reports - based on historical data
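To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns and its rows are hypothetical and not part of the lecture. The first query is a small, focused lookup of the kind an operational (OLTP) system runs; the second scans the stored history and summarises it, as a DW query would.

```python
import sqlite3

# Hypothetical order table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, month TEXT, branch TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (month, branch, amount) VALUES (?, ?, ?)",
    [("1999-01", "ABC", 100), ("1999-01", "XYZ", 250),
     ("1999-02", "ABC", 120), ("1999-02", "XYZ", 200)],
)

# Operational style: fetch one record by its key.
single_order = conn.execute(
    "SELECT branch, amount FROM orders WHERE order_id = ?", (2,)
).fetchone()

# DW style: scan the history and summarise it per month.
monthly_totals = conn.execute(
    "SELECT month, SUM(amount) FROM orders GROUP BY month ORDER BY month"
).fetchall()

print(single_order)
print(monthly_totals)
```

In practice the two workloads would run on separate systems; a single in-memory database is used here only to keep the sketch self-contained.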
Data Warehousing: Compare with operational systems
Feature      Operational system                    DW system
Size         Small                                 Large - history of business
Performance  Speed - essential                     Better information
Content      Small work areas                      Cross-functional subjects
Tools        Restricted, standard reporting tools  Various, flexible - transform/present data as intelligence
Data Warehousing: Compare with operational systems
Separation of an Operational and a DW System
- Minimises the impact of reporting and complex query processing on operational systems
- Preserves operational data for re-use
- Manages data based on time; historical data is available to users
- Provides a data store that can be modified to conform to the way the user views the data
- Unifies data: one version
Data Warehousing: Compare with operational systems
The need for DW
- Consistent and quality data
- Cost reduction
- More timely data access
- Improved performance and productivity
Two distinct types of reporting are still required
- Operational systems derive notification-style reports
- DW systems generate general information reports
Data Warehousing: Architecture
Overall Architecture
- Operational data and processing are separated from data warehouse processing
- The DW is a central information repository surrounded by a number of components - the environment
[Figure: the DW environment. Components shown: Operational Data, Data Transformation, Metadata, Data Warehouse, Data Mart, Access Tools, Information Delivery System.]
Data Warehousing: Architecture
The DW
- The DW database is the cornerstone of the environment
- It is implemented on RDBMS technology
- It should support large size, ad hoc queries and user views
Data Transformation
- Significant effort goes into extracting data from operational systems and putting it into the DW system in a suitable format
- Functionality
  - Removing unwanted data from operational databases
  - Converting to common data names and definitions
  - Calculating summaries and derived data
  - Establishing defaults for missing data
  - Accommodating source data definition changes
- Difficulties
  - Database heterogeneity
  - Data heterogeneity
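The transformation functions listed above can be sketched in a few lines of plain Python. Everything named here (the two source systems, their field names, the `COMMON_NAMES` mapping and the `DEFAULTS` table) is an assumption made for illustration; a real transformation layer would be driven by the warehouse metadata.

```python
# Hypothetical rows from two operational systems that name the same
# fields differently; all names and values are illustrative only.
finance_rows = [
    {"cust_no": 1, "amt": 100.0, "debug_flag": "x"},
    {"cust_no": 2, "amt": None, "debug_flag": "y"},
]
sales_rows = [
    {"customer": 1, "sale_value": 50.0},
]

# Assumed mapping from source-specific names to common warehouse names.
COMMON_NAMES = {
    "cust_no": "customer_id", "customer": "customer_id",
    "amt": "amount", "sale_value": "amount",
}
# Assumed defaults for missing values.
DEFAULTS = {"amount": 0.0}


def transform(row):
    """Keep only mapped fields, rename them, and fill in missing values."""
    clean = {}
    for source_name, value in row.items():
        if source_name not in COMMON_NAMES:
            continue  # remove unwanted data
        common_name = COMMON_NAMES[source_name]  # convert to common names
        if value is None:
            value = DEFAULTS.get(common_name)  # default for missing data
        clean[common_name] = value
    return clean


warehouse_rows = [transform(r) for r in finance_rows + sales_rows]

# Calculate a summary (derived data): total amount per customer.
totals = {}
for r in warehouse_rows:
    totals[r["customer_id"]] = totals.get(r["customer_id"], 0.0) + r["amount"]

print(warehouse_rows)
print(totals)
```

Keeping the name mapping in a table rather than in code is one way to accommodate changes in source data definitions: a renamed source field then needs only a new mapping entry.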
12 Rules for Data Warehouse
1. Data warehouse and operational environments are separated
2. Data are integrated
3. Contains historical data
4. Represents snapshot data at a given point in time
5. Data are subject oriented
6. Data are read-only
7. Data warehouse life cycle is data driven
8. Contains summarised data
9. Involves read-only transactions
10. Involves data transformation
11. Meta data component is very critical
12. Ensures optimum use of data by end users
Data Warehousing: Architecture
Meta Data - data that describes the data warehouse
- Description of the data model
- Description of the database design
- Definition of the system managing the data items
- A map of the data location in the DW, including its origin, how it is transformed/aggregated, and where it went
- Specific database design definitions
- Data element definitions, including rules for derivations and summaries
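One way to picture such metadata is as a small catalogue with one record per data element. The sketch below is only an illustration: the class names, field names and the sample entry (including its derivation rule) are hypothetical and do not come from any particular metadata product.

```python
from dataclasses import dataclass, field


@dataclass
class ElementMetadata:
    """Describes one data element held in the warehouse (hypothetical layout)."""
    name: str                  # data element name used in the DW
    definition: str            # business definition of the element
    origin: str                # operational source the data came from
    transformation: str        # how it was transformed or aggregated
    location: str              # where it went in the DW
    derivation_rule: str = ""  # rule for derived or summarised values


@dataclass
class MetadataCatalogue:
    elements: dict = field(default_factory=dict)

    def register(self, element):
        self.elements[element.name] = element

    def lineage(self, name):
        """Return origin, transformation and location for one element."""
        e = self.elements[name]
        return (e.origin, e.transformation, e.location)


catalogue = MetadataCatalogue()
catalogue.register(ElementMetadata(
    name="forecast_variance",
    definition="Difference between actual and forecast sales",
    origin="sales operational system",
    transformation="aggregated to month level",
    location="sales_analysis.forecast_variance",
    derivation_rule="actual_sales - forecast_sales",  # assumed rule
))

print(catalogue.lineage("forecast_variance"))
```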
Data Warehousing: Architecture
Meta Data (cont.)
Information Directory - metadata that helps users to interactively access the DW, understand its content and find data
- A gateway to the DW environment
- Supports easy distribution and replication of its content
- Searchable by business-oriented key words
- Acts as a launch platform for user data access and analysis tools
- Supports information sharing
- Supports a variety of scheduling options
- Supports distribution of query results
- Provides an interface to other applications
- Supports end-user monitoring of the status of the DW environment
Data Warehousing: Architecture
Access Tools
- Production reporting tool - generates regular operational reports or supports high-volume batch jobs
- Report writer - designed for end users
- Managed query tool - a meta-layer between the end user and the database; provides point-and-click creation of SQL and formats the query results into easy-to-read reports or on-screen presentations
[Figure: query and reporting tools divide into managed query tools and reporting tools; reporting tools divide into production reporting tools and desktop report writers.]
Data Warehousing: Architecture
Access Tools (cont.)
- Application development tools - graphical data access environment
- EIS tools - high-level summarisation
- OLAP - multidimensional DB
- Data mining tools
- Data visualisation tools - display complex relationships and patterns; techniques include 3-D imaging and sound, and virtual reality
Data Warehousing: Architecture
Data Marts
- A data store that is subsidiary to a data warehouse of integrated data
- It is created for the use of a dedicated group of users for a subject area
- It can be placed on the DW database
- In most instances, the data mart is separated from the DW database and put on a separate database server
- Dependent data mart - data content is from the DW
- Independent data mart - an alternative to the DW
  - simple and inexpensive to build
  - inconsistent - each has its own assumptions
  - overlapping in data content, connectivity and management
  - scalability problem
Data Warehousing: Architecture
Data Marts (cont.)
Data integration issue - Ralph Kimball
- For any two data marts, common dimensions must conform to the equality or roll-up rule
[Figure labels: TimePeriod, Sales, Products; levels month, week, day.]
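A rough sketch of what checking this rule could look like is given below. It takes one reading of the rule: two time dimensions conform if they are identical, or if the finer one rolls up exactly onto the coarser one. The two marts, their contents and the function names are hypothetical.

```python
from datetime import date

# Hypothetical time dimensions of two data marts.
sales_days = [date(1999, 1, 4), date(1999, 1, 5), date(1999, 2, 1)]  # day level
product_months = ["1999-01", "1999-02"]                               # month level


def roll_up_to_month(days):
    """Roll a day-level dimension up to month level."""
    return sorted({d.strftime("%Y-%m") for d in days})


def conforms(dim_a, dim_b, roll_up):
    """True if the dimensions are equal, or dim_a rolled up equals dim_b."""
    return dim_a == dim_b or roll_up(dim_a) == dim_b


ok = conforms(sales_days, product_months, roll_up_to_month)
# A mart that labels its periods differently fails the check.
not_ok = conforms(sales_days, ["1999-Q1"], roll_up_to_month)

print(ok, not_ok)
```

Day-to-month is used because every day belongs to exactly one month; a week can straddle two months, so a week level would need its own treatment.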
Data Warehousing: Architecture
Information Delivery System
- It distributes data from the warehouse to other DWs and to end-user products such as spreadsheets and local DBs (via the Internet)
- Delivery is based on time or event
- Users receive a report or an analytical view of the data and are not aware of its location and maintenance
Data Warehousing: Design issues - star schema
Architecture Blueprint - mission, goals, objectives
Logical Architecture
- Enterprise Mission, Plan, Process
- Data Architecture (data)
- Application Architecture (tools)
- Technology Architecture (hardware, software & network)
Data Warehousing: Design issues - star schema
Star Schema
- It is used to model the data in a DW from the decision-makers' view of the business and of the operational aspects of the business
- It defines the join paths for accessing the facts of the business
- It allows users to filter, aggregate, drill down, and slice and dice the business facts
[Figure: star schema with the sales revenues fact at the centre, surrounded by the Time, Location, Age Group, Product and Other dimensions.]
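As an illustration of join paths, filtering and aggregation, here is a minimal sketch using Python's built-in sqlite3 module. Only two dimensions are modelled; the table names, column names and rows are hypothetical and chosen to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, region TEXT, country TEXT);
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    revenue      INTEGER
);
INSERT INTO time_dim VALUES (1, '1999-01', 1999), (2, '1999-02', 1999);
INSERT INTO location_dim VALUES
    (1, 'North-east', 'USA'), (2, 'Central', 'USA'), (3, 'Paris', 'France');
INSERT INTO sales_fact VALUES (1, 1, 100), (1, 2, 150), (2, 1, 120), (2, 3, 90);
""")

# Join path: the fact table reaches each dimension through its key.
JOIN = """
FROM sales_fact f
JOIN time_dim t     ON t.time_key = f.time_key
JOIN location_dim l ON l.location_key = f.location_key
"""

# Aggregate: revenue per country.
by_country = conn.execute(
    "SELECT l.country, SUM(f.revenue) " + JOIN
    + "GROUP BY l.country ORDER BY l.country"
).fetchall()

# Drill down: filter to one country and break it out by region.
usa_by_region = conn.execute(
    "SELECT l.region, SUM(f.revenue) " + JOIN
    + "WHERE l.country = 'USA' GROUP BY l.region ORDER BY l.region"
).fetchall()

# Slice: fix one member of the time dimension.
january = conn.execute(
    "SELECT SUM(f.revenue) " + JOIN + "WHERE t.month = '1999-01'"
).fetchone()[0]

print(by_country, usa_by_region, january)
```

Every query reuses the same join fragment; only the filter and the grouping columns change, which is what makes the star layout convenient for ad hoc analysis.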
Data Warehousing: Design issues - star schema
Star Schema - 3 logical entities
- Measure entities - centre
- Dimension entities - point
- Category (detail) entities - extended from point
Data Warehousing: Design issues - star schema
Measure entities - centre
- Focus of the users' query activity
- Factual information -> business intelligence
- Synonymous names are used - measures, analysis, indicators
- Quantitative data - numerical information
- Data contained in measure entities grows large over time
Month   Branch  Product  Sales forecast  Sales actual  Variance
199901  ABC     COLA     200000          1900000       -10000
199901  XYZ     COLA     150000          1550000       50000
199901  PQR     COLA     125000          1050000       -20000
…
Data Warehousing: Design issues - star schema
Dimension entities - point
- Allow users to browse measurement data from different angles - time, location, product ...
- Minimise the rows of data within a measure entity - filter
[Figure: location dimension hierarchy]
All Location
  Country: Canada, France, Germany, USA
    Area: Eastern Area, Western Area
      Region: North-east Region, Central Region, South-east Region
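Browsing a measure at different levels of such a hierarchy amounts to rolling values up from the lowest level. The sketch below uses the level names from the slide, but the figure does not say which region belongs to which area or which country the areas sit in, so those links are assumed here, as are the sales values.

```python
from collections import defaultdict

# Child -> parent links. Member names come from the slide; the specific
# links between them are assumed for illustration.
PARENT = {
    "North-east Region": "Eastern Area",
    "South-east Region": "Eastern Area",
    "Central Region": "Western Area",
    "Eastern Area": "USA",
    "Western Area": "USA",
    "USA": "All Location",
}

# Hypothetical measure values recorded at the lowest level (region).
region_sales = {
    "North-east Region": 100,
    "South-east Region": 40,
    "Central Region": 60,
}


def roll_up(values, levels_up):
    """Aggregate values by the ancestor that sits `levels_up` steps higher."""
    totals = defaultdict(int)
    for member, amount in values.items():
        for _ in range(levels_up):
            member = PARENT[member]
        totals[member] += amount
    return dict(totals)


by_area = roll_up(region_sales, 1)
by_country = roll_up(region_sales, 2)

print(by_area)
print(by_country)
```

Drilling down is the reverse walk: start from a country total and list the area or region values that make it up.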
Data Warehousing: Design issues - star schema
Category (detail) entities - extended from point
- Provide detailed information about a category within a dimension
- Textual/qualitative information
[Figure: the All Clients dimension (categories Region, Client, State, ...) extended by a CLIENT DETAILS category entity with columns CLIENT_KEY, COMPANY_NAME, ADDRESS, POST_CODE, CONTRY, NAME, PHONE.]
Data Warehousing: Design issues - star schema
Forming Star Schema: a star schema can be formed from an information package, which is constructed during the data gathering process
Information package (each column is a dimension; each cell is a category with its count):
All Time Periods  All Locations     All Products        All Age Groups  All Econ. Classes  All Genders
Year (5)          Country (20)      Classification (8)  Age Group (8)   Class (10)         Gender (3)
Quarter (20)      Area (80)         Group (40)
Month (60)        Region (400)      Product (200)
                  District (2,000)
                  Store (200,000)
Measures/Facts: Forecast Sales, Budget Sales, Actual Sales, Forecast Variance (calc.), Budget Variance (calc.)
Data Warehousing: Design issues - star schema
Forming Star Schema: Define measure entity
- the lowest category within each dimension, along with each of the measures/facts, defines the measure entity
- give it a name that reflects the business purpose
- put it in the centre of the star schema in a rectangular box
[Figure: rectangle labelled "Sales Analysis"]
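The rule can be applied mechanically to the information package from the lecture. The sketch below picks the lowest category of each dimension and lists the resulting columns of the Sales Analysis measure entity. It also multiplies the lowest-level counts, assuming that each count is the number of members in that category, to get an upper bound on the number of rows; that interpretation is an assumption, and real fact tables are usually far sparser.

```python
from math import prod

# Information package from the lecture: each dimension lists its categories
# from highest to lowest level, with the count shown on the slide.
information_package = {
    "Time Periods": [("Year", 5), ("Quarter", 20), ("Month", 60)],
    "Locations": [("Country", 20), ("Area", 80), ("Region", 400),
                  ("District", 2_000), ("Store", 200_000)],
    "Products": [("Classification", 8), ("Group", 40), ("Product", 200)],
    "Age Groups": [("Age Group", 8)],
    "Econ. Classes": [("Class", 10)],
    "Genders": [("Gender", 3)],
}
measures = ["Forecast Sales", "Budget Sales", "Actual Sales",
            "Forecast Variance", "Budget Variance"]

# The lowest category of each dimension becomes part of the measure entity key.
lowest = {dim: cats[-1] for dim, cats in information_package.items()}
key_columns = [name for name, _ in lowest.values()]
measure_entity_columns = key_columns + measures

# Upper bound on rows: one row per combination of lowest-level members
# (assumes the counts are member counts).
max_rows = prod(count for _, count in lowest.values())

print(key_columns)
print(max_rows)
```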
Data Warehousing: Design issues - star schema
Forming Star Schema: Define dimension entity
- each column of an information package defines a dimension entity
- place it on the periphery of the star in a diamond-shaped box
- consider its relationship to the measure entity: "measures based on dimension"
[Figure: the Sales Analysis measure entity surrounded by the Time, Location, Product, Age, Gender and E-class dimension entities.]
Data Warehousing: Design issues - star schema
Forming Star Schema: Define category entity
- examine each individual cell of an information package to determine whether it qualifies as a category detail entity
- category entities become extensions of dimension entities
- add them to the star schema in stop-sign-shaped boxes
Information package (each column is a dimension; each cell is a category with its count):
All Time Periods  All Locations     All Products        All Age Groups  All Econ. Classes  All Genders
Year (5)          Country (20)      Classification (8)  Age Group (8)   Class (10)         Gender (3)
Quarter (20)      Area (80)         Group (40)
Month (60)        Region (400)      Product (200)
                  District (2,000)
                  Store (200,000)
[Figure annotations: Store, Product and Customer are marked as category detail entities.]
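Data Warehousing: Design issues - star schema
Forming Star Schema:
[Figure: the completed star schema. The Sales Analysis measure entity is at the centre, with dimension entities Time, Product, Gender, Age, Econ. Class and Location, and category detail entities Product Details, Customer details and Store details.]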
Data Warehouse Example: Operational Data
Data Warehouse Example: Star Schema
Data Warehouse Example: Summary Report
Data Warehousing: Relation with DM
- They have the same purpose - decision support
- DW assembles, formats, and organises historical data to answer user queries as it is - depends on the content of the DW
- DW will not attempt to extract further information, nor will it predict trends and patterns from data
- DM will extract previously unknown and useful information as well as predict trends and patterns
- DM can be performed on a DW and/or a traditional DB
- DM: next lecture
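The difference can be illustrated with a toy example. The sales history below is hypothetical. The first computation answers a question from the stored data as it is; the second fits a straight line by least squares and extrapolates one month ahead. The line fit is used only as a simple stand-in for a predictive technique and is not presented as the method covered in the next lecture.

```python
# Hypothetical monthly sales history: (month number, sales).
history = [(1, 100.0), (2, 110.0), (3, 120.0), (4, 130.0)]

# DW-style question: report what is stored.
total_sales = sum(sales for _, sales in history)

# DM-style question: estimate a trend and predict an unseen value.
n = len(history)
mean_x = sum(x for x, _ in history) / n
mean_y = sum(y for _, y in history) / n
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in history)
    / sum((x - mean_x) ** 2 for x, _ in history)
)
intercept = mean_y - slope * mean_x
forecast_month_5 = intercept + slope * 5

print(total_sales)
print(forecast_month_5)
```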