unit-v data warehousing, data mining & olap
-
8/22/2019 UNIT-V Data Warehousing, Data Mining & OLAP
UNIT-V Data Warehousing
Data Warehousing Components. Building a Data Warehouse.
Mapping the Data Warehouse to a Multiprocessor Architecture.
DBMS Schemas for Decision Support.
Data Extraction, Cleanup & Transformation Tools.
Metadata.
Data Mining: Introduction to data mining
Kapil Tomar, IT Deptt. AKGEC 1
-
What is Data Warehousing
Data Warehousing is an architectural
construct of information systems that
provides users with current and historical
decision support information that is hard to
access or present in traditional operational
data stores.
-
Data Warehouse definition
A formal definition of the data warehouse is offered by W.H. Inmon:
A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile
collection of data in support of management
decisions.
-
Seven data warehouse components
Data sourcing, cleanup, transformation, and migration tools
Metadata repository
Warehouse/database technology
Data marts
Data query, reporting, analysis, and mining tools
Data warehouse administration and management
Information delivery system
-
Typically, the source data for the warehouse comes from the
operational applications [an exception might be an operational data
store (ODS)]. As the data enters the data warehouse, it is transformed
into an integrated structure and format. The transformation process may
involve conversion, summarization, filtering, and condensation of data.
Because data within the data warehouse contains a large historical
component (sometimes covering 5 to 10 years), the data warehouse
must be capable of holding and managing large volumes of data as well
as different data structures for the same database over time.
-
Sourcing, Acquisition, Cleanup,
and Transformation Tools
A significant portion of the data warehouse implementation effort is spent
extracting data from operational systems and putting it in a format suitable
for informational applications that will run off the data warehouse.
These tools perform all of the conversions, summarizations, key changes, structural
changes, and condensations needed to transform disparate data into
information that can be used by the decision support tool.
-
The functionality includes
Removing unwanted data from operational databases
Converting to common data names and definitions
Calculating summaries and derived data
Establishing defaults for missing data
Accommodating source data definition changes
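The functions listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the `COMMON_NAMES` mapping, the `DEFAULTS` table, and the sample records are all hypothetical and do not come from any particular tool.

```python
from collections import defaultdict

# Hypothetical mapping from source-specific column names to common names.
# Adding an entry here is one way to accommodate a source definition change.
COMMON_NAMES = {
    "cust_no": "customer_id", "CustID": "customer_id",
    "amt": "amount", "SaleAmount": "amount",
    "rgn": "region", "Region": "region",
}
DEFAULTS = {"region": "UNKNOWN"}               # defaults for missing data
WANTED = {"customer_id", "amount", "region"}   # all other fields are dropped


def clean_record(raw):
    """Rename fields to common names, drop unwanted ones, fill defaults."""
    record = {}
    for name, value in raw.items():
        common = COMMON_NAMES.get(name, name)
        if common in WANTED and value not in (None, ""):
            record[common] = value
    for name, default in DEFAULTS.items():
        record.setdefault(name, default)
    return record


def summarize_by_region(records):
    """Derived data: total amount per region."""
    totals = defaultdict(float)
    for record in records:
        totals[record["region"]] += float(record["amount"])
    return dict(totals)


# Two hypothetical operational sources with different naming conventions.
source_a = [{"cust_no": 1, "amt": "10.50", "rgn": "EAST", "teller_id": 7}]
source_b = [
    {"CustID": 2, "SaleAmount": 4.5, "Region": ""},
    {"CustID": 3, "SaleAmount": 5.0, "Region": "EAST"},
]

cleaned = [clean_record(r) for r in source_a + source_b]
totals = summarize_by_region(cleaned)
```

Here `teller_id` is removed as unwanted data, the empty region falls back to the default, and `totals` holds the summary per region.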
-
The data sourcing, cleanup, extract, transformation, and migration
tools have to deal with some significant issues as follows:
Database heterogeneity. DBMSs are very different in data
models, data access language, data navigation, operations,
concurrency, integrity, recovery, etc.
Data heterogeneity. This is the difference in the way data is
defined and used in different models: homonyms, synonyms, unit
incompatibility, different attributes for the same entity, and different
ways of modeling the same fact.
-
Metadata
Metadata is data about data that describes the data warehouse. It is
used for building, maintaining, managing, and using the data warehouse.
Metadata can be classified into
Technical metadata,
Business metadata
-
Technical metadata contains information about warehouse data
for use by warehouse designers and administrators when carrying out
warehouse development and management tasks. Technical metadata
documents include
Information about data sources
Transformation descriptions, i.e., the mapping method from operational
databases into the warehouse, and algorithms used to convert, enhance,
or transform data
Warehouse object and data structure definitions for data targets
The rules used to perform data cleanup and data enhancement
Data mapping operations when capturing data from source systems and
applying it to the target warehouse database
Access authorization, backup history, archive history, information
delivery history, data acquisition history, data access, etc.
-
Business metadata contains information that gives users an easy-to-understand
perspective of the information stored in the data warehouse.
Business metadata documents information about
Subject areas and information object type, including queries, reports,
images, video, and/or audio clips.
Internet home pages.
Other information to support all data warehousing components. For
example, the information related to the information delivery system (see
Sec. 6.8) should include subscription information, scheduling information,
details of delivery destinations, and the business query objects such as
predefined queries, reports, and analyses.
Data warehouse operational information, e.g., data history (snapshots,
versions), ownership, extract audit trail, usage data
-
Metadata repository management software can be used to map the
source data to the target database, generate code for data
transformations, integrate and transform the data, and control moving
data to the warehouse.
One of the important functional components of the metadata repository is
the information directory.
-
From a technical requirements point of view, the information directory
and the entire metadata repository
Should be a gateway to the data warehouse environment, and thus
should be accessible from any platform via transparent and seamless
connections
Should support an easy distribution and replication of its content for high
performance and availability
Should be searchable by business-oriented key words
Should act as a launch platform for end-user data access and analysis
tools
Should support the sharing of information objects such as queries,
reports, data collections, and subscriptions between users
Should support a variety of scheduling options for requests against the
data warehouse, including on-demand, one-time, repetitive, event-driven,
and conditional delivery (in conjunction with the information delivery
system)
Should support the distribution of the query results to one or more
destinations in any of the user-specified formats (in conjunction with the
information delivery system)
Should support and provide interfaces to other applications such as
e-mail, spreadsheets, and schedulers
-
Access Tools
The principal purpose of data warehousing is to provide information to
business users for strategic decision making. These users interact with
the data warehouse using front-end tools.
For the purpose of this discussion let's divide these tools into five main
groups:
Data query and reporting tools
Application development tools
Executive information system (EIS) tools
On-line analytical processing tools
Data mining tools
-
Data query and reporting tools
This category can be further divided into two groups:
1. reporting tools and
2. managed query tools.
1. Reporting tools can be divided into
i. production reporting tools and
ii. desktop report writers.
-
Production reporting tools let companies generate regular
operational reports or support high-volume batch jobs, such as calculating
and printing paychecks.
Report writers, on the other hand, are inexpensive desktop tools
designed for end users.
2. Managed query tools shield end users from the complexities of SQL and
database structures by inserting a metalayer between users and the
database. The metalayer is the software that provides subject-oriented
views of a database and supports point-and-click creation of SQL.
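A minimal sketch of the metalayer idea follows, assuming a hypothetical mapping from business terms to physical tables and columns. The names in `METALAYER` and `JOINS` are invented for illustration; real managed query tools are far more elaborate.

```python
# Hypothetical metalayer: business term -> (physical table, physical column).
METALAYER = {
    "Customer Name": ("customers", "cust_nm"),
    "Region": ("customers", "rgn_cd"),
    "Order Total": ("orders", "ord_tot_amt"),
}
# Hypothetical join rules known to the metalayer, keyed by the tables involved.
JOINS = {
    frozenset({"customers", "orders"}): "customers.cust_id = orders.cust_id",
}


def build_query(selected, filters=None):
    """Turn point-and-click selections into SQL text plus bound parameters."""
    filters = filters or {}
    columns = [f"{METALAYER[t][0]}.{METALAYER[t][1]}" for t in selected]
    tables = sorted({METALAYER[t][0] for t in list(selected) + list(filters)})
    # Add every join condition whose tables are all part of this query.
    conditions = [cond for pair, cond in JOINS.items() if pair <= set(tables)]
    params = []
    for term, value in filters.items():
        table, column = METALAYER[term]
        conditions.append(f"{table}.{column} = ?")
        params.append(value)
    sql = f"SELECT {', '.join(columns)} FROM {', '.join(tables)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


# The user picks two business terms and one filter; no SQL knowledge needed.
sql, params = build_query(["Customer Name", "Order Total"], {"Region": "EAST"})
```

The user only ever sees the business terms; table names, column names, and join conditions stay inside the metalayer.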
-
Application development tools
Tools used for in-house application development include
PowerBuilder from PowerSoft,
Visual Basic from Microsoft,
Forte from Forte Software, and
Business Objects from Business Objects.
-
On-line analytical processing tools
On-line analytical processing (OLAP) tools are based on the
concepts of multidimensional databases and allow a sophisticated user
to analyze the data using elaborate, multidimensional views.
Typical business applications for these tools include product
performance and profitability, effectiveness of a sales program or a
marketing campaign, sales forecasting, and capacity planning.
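To make the idea of a multidimensional view concrete, here is a small sketch in plain Python. The sales records, the three dimensions, and the helper names `roll_up` and `slice_rows` are assumptions chosen for illustration; they do not describe any specific OLAP product.

```python
from collections import defaultdict

# Hypothetical sales records: (product, market, quarter, units).
SALES = [
    ("widget", "north", "Q1", 10),
    ("widget", "south", "Q1", 5),
    ("widget", "north", "Q2", 7),
    ("gadget", "north", "Q1", 3),
    ("gadget", "south", "Q2", 8),
]
DIMENSIONS = ("product", "market", "quarter")


def roll_up(rows, keep):
    """Sum units over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in positions)] += row[3]
    return dict(totals)


def slice_rows(rows, dimension, value):
    """Fix one dimension to a single value."""
    position = DIMENSIONS.index(dimension)
    return [row for row in rows if row[position] == value]


# View 1: units by product, summed over markets and quarters.
by_product = roll_up(SALES, ["product"])
# View 2: the "north" slice, summarized by quarter.
north_by_quarter = roll_up(slice_rows(SALES, "market", "north"), ["quarter"])
```

The same records answer both a product performance question and a regional trend question, which is the kind of analysis the applications above rely on.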
-
Data Mining
A critical success factor for any business today is its ability to use
information effectively.
Knowing this information, an organization can formulate effective
business, marketing, and sales strategies; precisely target promotional
activity; discover and penetrate new markets; and successfully compete
in the marketplace from a position of informed strength.
A relatively new and promising technology aimed at achieving this
strategic advantage is known as data mining.
A major attraction of data mining is its ability to build predictive rather than retrospective models.
-
Most organizations engage in data mining to
Visualize data
Correct data
Discover knowledge. The goal of knowledge discovery is to determine
explicit hidden relationships, patterns, or correlations from data stored in
an enterprise's database. Specifically, data mining can be used to
perform:
Segmentation (e.g., grouping customer records for custom-tailored marketing)
Classification (assignment of input data to a predefined class, discovery and
understanding of trends, text document classification)
Association (discovery of cross-sales opportunities)
Preferencing (determining the preferences of the majority of customers)
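As one illustration of the association task, the sketch below counts item pairs in a handful of made-up market baskets and reports the usual support and confidence measures. The basket contents and the 0.5 support threshold are assumptions; production algorithms such as Apriori handle larger item sets and prune far more aggressively.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets; each set is one customer transaction.
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]


def pair_rules(baskets, min_support=0.5):
    """Return {(a, b): (support, confidence of a => b)} for frequent pairs."""
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / len(baskets)   # share of baskets holding both items
        if support >= min_support:
            rules[(a, b)] = (support, count / item_counts[a])
            rules[(b, a)] = (support, count / item_counts[b])
    return rules


rules = pair_rules(BASKETS)
```

In this toy data every basket with milk also contains butter, so the rule milk => butter has confidence 1.0 and suggests a cross-sales opportunity.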
-
Data Marts
The term data mart means different things to different people.
A rigorous definition of this term is a data store that is subsidiary to a
data warehouse of integrated data. The data mart is directed at a
partition of data (often called a subject area) that is created for the use of
a dedicated group of users.
A data mart might, in fact, be a set of denormalized, summarized, or
aggregated data. Sometimes, such a set could be placed on the data
warehouse database rather than in a physically separate store of data.
In most instances, however, the data mart is a physically separate store
of data and is normally resident on a separate database server, often on
the local area network serving a dedicated user group.
-
A data mart is often a necessary and valid solution to a pressing business
problem, thus achieving the goal of rapid delivery of enhanced
decision support functionality to end users. The business drivers
underlying such developments include
Extremely urgent user requirements
The absence of a budget for a full data warehouse strategy
The absence of a sponsor for an enterprise-wide decision support
strategy
The decentralization of business units
The attraction of easy-to-use tools and a mind-sized project
-
In summary, data marts present two problems:
(1) scalability in situations where an initial small data mart grows
quickly in multiple dimensions and
(2) data integration.
Therefore, when designing data marts, the organizations should pay
close attention to system scalability, data consistency, and
manageability issues.
The key to a successful data mart strategy is the development of an
overall scalable data warehouse architecture; and the key step in
that architecture is identifying and implementing the common
dimensions.
-
Data Warehouse Administration and Management
Security and priority management
Monitoring updates from multiple sources
Data quality checks
Managing and updating metadata
Auditing and reporting data warehouse usage and status (for managing
the response time and resource utilization, and providing chargeback
information)
Purging data
Replicating, subsetting, and distributing data
Backup and recovery
Data warehouse storage management [e.g., capacity planning,
hierarchical storage management (HSM), purging of aged data]
-
Information Delivery System
The information delivery component is used to enable the process of
subscribing for data warehouse information and having it delivered to
one or more destinations of choice according to some user-specified
scheduling algorithm.
In other words, the information delivery system distributes
warehouse-stored data and other information objects to other data warehouses and
end-user products such as spreadsheets and local databases.
Delivery of information may be based on time of day, or on a completion
of an external event.
The value of data warehousing is maximized when the right information
gets into the hands of those individuals who need it, where they need it,
and when they need it the most.
-
Building a Data Warehouse
-
Nine-Step Method in the Design of a
Data Warehouse
1. Choosing the subject matter
2. Deciding what a fact table represents
3. Identifying and conforming the dimensions
4. Choosing the facts
5. Storing precalculations in the fact table
6. Rounding out the dimension tables
7. Choosing the duration of the database
8. The need to track slowly changing dimensions
9. Deciding the query priorities and the query modes
-
Benefits of Data Warehousing
Locating the right information
Presentation of information (reports, graphs)
Testing of hypothesis
Discovery of information
Sharing the analysis
-
Tangible benefits
Product inventory turnover is improved.
Costs of product introduction are decreased with improved selection of target markets.
More cost-effective decision making is enabled by separating ad hoc query
processing from the operational databases.
Better business intelligence is enabled by increased quality and flexibility of market
analysis available through multilevel data structures, which may range from detailed
to highly summarized. For example, determining the effectiveness of marketing
programs allows the elimination of weaker programs and enhancement of stronger
ones.
Enhanced asset and liability management means that a data warehouse can
provide a "big picture" of enterprise-wide purchasing and inventory patterns, and
can indicate otherwise unseen credit exposure and opportunities for cost savings.
-
Intangible benefits
Improved productivity, by keeping all required data in a single location and
eliminating the rekeying of data
Reduced redundant processing, support, and software to support
overlapping decision support applications
Enhanced customer relations through improved knowledge of individual
requirements and trends, through customization, improved communications,
and tailored product offerings
Enabling business process reengineering - data warehousing can provide
useful insights into the work processes themselves, resulting in developing
breakthrough ideas for the reengineering of those processes
-
Mapping the Data Warehouse to a
Multiprocessor Architecture
Organizations that embark on data warehousing development deal with
ever-increasing amounts of data.
Generally speaking, the size of a data warehouse rapidly
approaches the point where the search for better
performance and scalability becomes a real necessity. This
search pursues two goals:
Speed-up: the ability to execute the same request on the
same amount of data in less time
Scale-up: the ability to obtain the same performance on the
same request as the database size increases
-
An additional and important goal is to achieve linear speed-up and
scale-up; doubling the number of processors cuts the response time
in half (linear speed-up) or provides the same performance on twice
as much data (linear scale-up).
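The two measures can be written down as simple ratios of elapsed times. The sketch below uses the common textbook definitions; the function names, the 5% tolerance, and the timings in the usage lines are assumptions for illustration, not measurements.

```python
def speed_up(time_one_processor, time_n_processors):
    """Same request, same data: ratio of elapsed times."""
    return time_one_processor / time_n_processors


def scale_up(time_small_system, time_large_system):
    """Base system on base data versus N-times system on N-times data."""
    return time_small_system / time_large_system


def is_linear_speed_up(time_one, time_n, processors, tolerance=0.05):
    """Linear speed-up: N processors give roughly an N-fold speed-up."""
    return abs(speed_up(time_one, time_n) - processors) <= tolerance * processors


def is_linear_scale_up(time_small, time_large, tolerance=0.05):
    """Linear scale-up: elapsed time stays roughly constant."""
    return abs(scale_up(time_small, time_large) - 1.0) <= tolerance


# Hypothetical timings in seconds.
two_way = speed_up(120.0, 60.0)                      # two processors
halved = is_linear_speed_up(120.0, 60.0, processors=2)
not_halved = is_linear_speed_up(120.0, 80.0, processors=2)
steady = is_linear_scale_up(120.0, 125.0)            # twice the data and CPUs
```

With these numbers, doubling the processors halves the response time in the first case, so the speed-up is linear; an 80-second result would fall short of that.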
These goals of linear performance and scalability can be satisfied by
parallel hardware architectures, parallel operating systems, and
parallel database management systems. Parallel hardware
architectures are based on multiprocessor systems designed as a
shared-memory model [symmetric multiprocessors (SMPs)],
shared-disk model, or distributed-memory model [massively parallel
processors (MPPs), and clusters of uniprocessors and/or SMPs].
-
Types of parallelism
Horizontal parallelism
Vertical parallelism
-
Horizontal parallelism, which means that the database is
partitioned across multiple disks, and parallel processing occurs
within a specific task (e.g., table scan) that is performed
concurrently on different processors against different sets of data.
Vertical parallelism, which occurs among different tasks: all
component query operations (e.g., scan, join, sort) are executed
in parallel in a pipelined fashion. In other words, an output
from one task (e.g., scan) becomes an input into another task
(e.g., join) as soon as records become available.
A truly parallel DBMS should support both horizontal and
vertical types of parallelism concurrently (see Fig. 8.1, case 4).
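The difference between the two types can be sketched in Python. This is a toy model under clear assumptions: threads stand in for processors, the in-memory lists stand in for disk partitions, and the generator chain only models the pipelined dataflow of vertical parallelism (a real DBMS would run each stage on its own processor). All names and data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table, already partitioned across three "disks".
PARTITIONS = [
    [{"id": 1, "amount": 40}, {"id": 2, "amount": 75}],
    [{"id": 3, "amount": 90}],
    [{"id": 4, "amount": 10}, {"id": 5, "amount": 60}],
]
NAMES = {1: "Ann", 2: "Bob", 3: "Cy", 4: "Di", 5: "Ed"}


def scan_partition(rows, minimum):
    """One task (a filtered table scan) applied to one partition."""
    return [row for row in rows if row["amount"] >= minimum]


def horizontal_scan(partitions, minimum):
    """Horizontal: the same scan runs against every partition concurrently."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        results = list(
            pool.map(lambda rows: scan_partition(rows, minimum), partitions)
        )
    return [row for part in results for row in part]


def scan(partitions):
    """Pipeline stage 1: emit records one at a time."""
    for rows in partitions:
        yield from rows


def join(rows, names):
    """Pipeline stage 2: consumes each record as soon as scan emits it."""
    for row in rows:
        yield {**row, "name": names[row["id"]]}


def vertical_pipeline(partitions, names):
    """Vertical (dataflow only): scan feeds join, which feeds sort."""
    return sorted(join(scan(partitions), names), key=lambda row: row["amount"])


large = horizontal_scan(PARTITIONS, minimum=50)
ranked = vertical_pipeline(PARTITIONS, NAMES)
```

Combining the two, as a truly parallel DBMS would, means running such a pipeline on every partition at once.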
-
Data partitioning
Hash partitioning
Key range partitioning
Schema partitioning
User-defined partitioning
-
Data partitioning
Hash partitioning. A hash algorithm is used to calculate the partition number
(hash value) based on the value of the partitioning key for each row.
Key range partitioning. Rows are placed and located in the partitions according
to the value of the partitioning key (all rows with the key value from A to K are in
partition 1, L to T are in partition 2, etc.).
Schema partitioning. This is an option not to partition a table across disks;
instead, an entire table is placed on one disk, another table is placed on a
different disk, etc. This is useful for small reference tables that are more
effectively used when replicated in each partition rather than spread across
partitions.
User-defined partitioning. This is a partitioning method that allows a table to be
partitioned on the basis of a user-defined expression (e.g., use state codes to
place rows in one of 50 partitions).
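The row-placement rules for three of these methods can be sketched as small Python functions. The choice of CRC-32 as the hash, the third key range for letters after T, and the shortened state list are assumptions made for the example; the slide itself only says "etc." for later ranges.

```python
import zlib


def hash_partition(key, partitions):
    """Hash partitioning: partition number computed from the key's hash.

    CRC-32 is used here only because it is deterministic across runs.
    """
    return zlib.crc32(str(key).encode("utf-8")) % partitions


def key_range_partition(key):
    """Key range partitioning: A to K in 1, L to T in 2, the rest in 3."""
    first = key[0].upper()
    if "A" <= first <= "K":
        return 1
    if "L" <= first <= "T":
        return 2
    return 3  # assumption: the source does not spell out later ranges


# Shortened for the example; a real list would hold all 50 state codes.
STATE_CODES = ["AL", "AK", "AZ"]


def user_defined_partition(row, codes=STATE_CODES):
    """User-defined partitioning: one partition per state code."""
    return codes.index(row["state"]) + 1


placements = {
    "Adams": key_range_partition("Adams"),
    "Lopez": key_range_partition("Lopez"),
    "Young": key_range_partition("Young"),
}
```

Schema partitioning needs no per-row function, since whole tables rather than rows are assigned to disks.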
-
Database Architectures
for Parallel Processing
Shared-memory architecture
Shared-disk architecture
Shared-nothing architecture
-
Shared-memory architecture
-
Shared-disk architecture
-
Shared-nothing architecture
-
Parallel DBMS Features
Scope and techniques of parallel DBMS operations
Optimizer implementation
Application transparency
The parallel environment
DBMS management tools
Price/performance
-
DBMS Schemas for
Decision Support
Data warehousing projects were forced to choose
between a data model and corresponding database
schema that is intuitive for analysis but performs poorly,
and a model and schema that performs better but is not well
suited for analysis.
The schema methodology that is gaining widespread
acceptance for data warehousing is the star schema.
-
Indeed, solving modern business problems such as
market analysis and financial forecasting requires query-
centric database schemas that are array oriented and
multidimensional in nature. These business problems
are characterized by the need to retrieve large numbers
of records from very large data sets (hundreds of
gigabytes and even terabytes) and summarize them on
the fly.
-
DBMS Schemas for Decision Support
Star Schema
Potential performance problems with star schemas
-
Star Schema
The multidimensional view of data that is expressed using relational
database semantics is provided by the database schema design called
star schema.
The basic premise of star schemas is that information can be classified
into two groups: facts and dimensions.
Facts are the core data elements being analyzed.
For example, units of individual items sold are facts,
while dimensions are attributes about the facts.
For example, dimensions are the product types purchased and the date
of purchase (see Fig 9.1).
-
In this example, the facts (UNITS) are viewed through a set of dimensions
(MARKETS, PRODUCTS, PERIOD).
It's important to notice that, in the typical star schema, the fact table is
much larger than any of its dimension tables.
This point becomes an important consideration of the performance
issues associated with star schemas.
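A star schema along these lines can be sketched with Python's built-in `sqlite3` module. The table names follow the UNITS, MARKETS, PRODUCTS, and PERIOD example; every column name and every data value is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: small, descriptive attributes about the facts.
CREATE TABLE markets  (market_id  INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_type TEXT);
CREATE TABLE period   (period_id  INTEGER PRIMARY KEY, quarter TEXT);
-- Fact table: one row per market/product/period combination.
CREATE TABLE units (
    market_id  INTEGER REFERENCES markets(market_id),
    product_id INTEGER REFERENCES products(product_id),
    period_id  INTEGER REFERENCES period(period_id),
    units_sold INTEGER,
    PRIMARY KEY (market_id, product_id, period_id)
);
INSERT INTO markets  VALUES (1, 'North'), (2, 'South');
INSERT INTO products VALUES (1, 'Cereal'), (2, 'Juice');
INSERT INTO period   VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO units VALUES (1, 1, 1, 100), (1, 2, 1, 40),
                         (2, 1, 1, 60),  (2, 1, 2, 80);
""")

# Summarize the facts along two of the three dimensions.
rows = conn.execute("""
    SELECT p.product_type, t.quarter, SUM(u.units_sold)
    FROM units u
    JOIN products p ON p.product_id = u.product_id
    JOIN period   t ON t.period_id  = u.period_id
    GROUP BY p.product_type, t.quarter
    ORDER BY p.product_type, t.quarter
""").fetchall()
```

Even in this toy case the fact table has more rows than any dimension table, and its primary key is built from all of the dimension keys.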
-
Potential performance problems
with star schemas
Indexing. Indexes can enforce the uniqueness of the keys, but a multipart key has drawbacks:
It requires multiple metadata definitions (one for each key component) to
define a single relationship (table); this adds to the design complexity
and slows performance.
Since the fact table must carry all key components as part of its primary
key, addition or deletion of levels in the hierarchy will require physical
modification of the affected table, which is a time-consuming process that
limits flexibility.
Carrying all the segments of the compound dimensional key in the fact
table increases the size of the index, thus impacting both performance
and scalability.
-
Metadata
Metadata is one of the most important aspects of data warehousing. It is
data about data stored in the warehouse and its users. At a minimum,
metadata contains:
The location and description of warehouse system and data components
(warehouse objects).
Names, definition, structure, and content of the data warehouse and
end-user views.
Identification of authoritative data sources (systems of record).
Integration and transformation rules used to populate the data warehouse;
these include the mapping method from operational databases into the
warehouse, and algorithms used to convert, enhance, or transform data.
Integration and transformation rules used to deliver data to end-user
analytical tools.
Subscription information for the information delivery to the analysis
subscribers.
Data warehouse operational information, which includes a history of
warehouse updates, refreshments, snapshots, versions, ownership
authorizations, and extract audit trail.
Metrics used to analyze warehouse usage and performance according to
end-user usage patterns.
Security authorizations, access control lists, etc.
-
Metadata Repository
Metadata repository management software can be used to map the
source data to the target database, generate code for data
transformations, integrate and transform the data, and control moving
data to the warehouse. This software, which typically runs on a
workstation, enables users to specify how the data should be
transformed, such as data mapping, conversion, and summarization.
Metadata is searched by users to find data definitions or subject areas.
In other words, metadata provides decision support oriented pointers to
warehouse data, and thus provides a logical link between warehouse
data and the decision support application.
-
Having such a metadata repository implemented as a part of the data
warehouse framework provides the following benefits:
It provides a comprehensive suite of tools for enterprise wide metadata
management.
It reduces or eliminates information redundancy, inconsistency, and
underutilization.
It simplifies management and improves organization, control, and
accounting of information assets.
It increases identification, understanding, coordination, and utilization of
enterprise wide information assets.
It provides effective data administration tools to better manage corporate
information assets with full-function data dictionary.
It increases flexibility, control, and reliability of the application development
process and accelerates internal application development.
It leverages investment in legacy systems with the ability to inventory and
utilize existing applications.
It provides a universal relational model for heterogeneous RDBMSs to
interact and share information.
It enforces CASE development standards and eliminates redundancy with
the ability to share and reuse metadata.
-
Metadata Management
A frequently occurring problem in data warehousing is the inability to
communicate to the end user what information resides in the data
warehouse and how it can be accessed.
The key to providing users and applications with a roadmap to the
information stored in the warehouse is the metadata.
It can define all data elements and their attributes, data sources and
timing, and the rules that govern data use and data transformations.
Since metadata describes the information in the warehouse from multiple
viewpoints (input, sources, transformation, access, etc.),
-
What data exists in the data warehouse
Where to find the data
What the original sources of the data are
How summarizations were created
What transformations were used
Who is responsible for correcting errors
What queries can be used to access the data
How business definitions have changed over time
What underlying business assumptions have been
made
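Several of the questions above can be answered by a simple catalog record. The sketch below is one hypothetical shape for such a record, with a keyword search of the kind an information directory might offer; the class name, field names, and sample entries are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataEntry:
    """One hypothetical catalog record; field names are illustrative."""
    name: str             # what data exists
    location: str         # where to find the data
    sources: list         # original sources of the data
    transformation: str   # how it was summarized or transformed
    steward: str          # who is responsible for correcting errors
    keywords: list = field(default_factory=list)


def search(catalog, keyword):
    """Return entries whose name or keywords mention the keyword."""
    needle = keyword.lower()
    return [
        entry for entry in catalog
        if needle in entry.name.lower()
        or any(needle in word.lower() for word in entry.keywords)
    ]


CATALOG = [
    MetadataEntry(
        name="monthly_sales_summary",
        location="warehouse.sales.monthly_summary",
        sources=["order_entry", "billing"],
        transformation="daily orders summed by month and region",
        steward="sales data administrator",
        keywords=["revenue", "sales"],
    ),
    MetadataEntry(
        name="customer_master",
        location="warehouse.reference.customer",
        sources=["crm"],
        transformation="duplicates merged on customer number",
        steward="customer data administrator",
        keywords=["customer", "account"],
    ),
]

hits = search(CATALOG, "Revenue")
```

A business user searching for "Revenue" is pointed to the monthly sales summary together with its sources, transformation, and responsible steward.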
-
Thank You