dmdw technical paper presentation

12
 Authors U.SANTOSH AYYAPPA S.VITTAL KUMAR  III B.TECH III B.TECH VIGNANA BHARATHI INSTITUE OF TECHNOGY GHATKESAR HYD-501301 E-mail: santu _ayya ppa@yaho o.com vital_ shree male@yaho o.com Contact : 90328 91259 Contact : 905240 10692

Upload: santosh-ayyappa

Post on 10-Apr-2018

219 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Dmdw Technical Paper Presentation

8/8/2019 Dmdw Technical Paper Presentation.

http://slidepdf.com/reader/full/dmdw-technical-paper-presentation 1/12

 Authors

U.SANTOSH AYYAPPA S.VITTAL KUMAR  

III B.TECH III B.TECH

VIGNANA BHARATHI INSTITUE OF TECHNOGY

GHATKESAR 

HYD-501301

E-mail: [email protected] [email protected]

Contact : 90328 91259 Contact : 905240 10692

Page 2: Dmdw Technical Paper Presentation

8/8/2019 Dmdw Technical Paper Presentation.

http://slidepdf.com/reader/full/dmdw-technical-paper-presentation 2/12

Abstract

Organisations are today suffering from a malaise of data overflow. The developments in the

transaction processing technology has given rise to a situation where the amount and rate of data

capture is very high, but the processing of this data into information that can be utilised for decision

making, is not developing at the same pace. Knowledge and data mining provide a technology that

enables the decision-maker in the corporate sector/govt. to process this huge amount of data in a

reasonable amount of time, to extract intelligence/knowledge in a near real time. The data warehouse,

also called knowledge, allows the storage of data in a format that facilitates its access, but if the tools

for deriving information and/or knowledge and presenting them in a format that is useful for decision

making are not provided the whole rationale for the existence of the warehouse disappears. Various

technologies for extracting new insight from the data warehouse have come up which we classify

loosely as "Data Mining Techniques".

Our paper focuses on the need for information repositories and discovery of knowledge and

thence the overview of, the so hyped, Data Warehousing and Data Mining.

Content Overview

Introduction

Warehouse with a database

What is Data-Warehousing?

Warehousing Functions

Architecture Of Data Warehouse

What is Data Mining?

Warehousing and Mining

Data Mining as a part of Knowledge Discovery

Technologies used in Data Mining

Page 3: Dmdw Technical Paper Presentation

8/8/2019 Dmdw Technical Paper Presentation.

http://slidepdf.com/reader/full/dmdw-technical-paper-presentation 3/12

Goals of Data Mining and Knowledge Discovery

Applications & Compendium

Bibliography 

Introduction

“Knowledge [no more Information] is not only power, but also has significant competitive

advantage” 

Organizations have lately realized that just processing transactions and/or information’s faster 

and more efficiently, no longer provide them with a competitive advantage vis-à-vis their competitors

for achieving business excellence. Information technology (IT) tools that are oriented towards

knowledge processing can provide the edge that organizations need to survive and thrive in the

current era of fierce competition. The increasing competitive pressures and the desire to leverage

information technology techniques have led many organizations to explore the benefits of new

emerging technology - "Data Warehousing and Data Mining". What is needed today is not just the

latest and updated to the nano-second information, but the cross-functional information that can help

decisions making activity as "on-line" process.

 Evolution of Information Technology Tools

The evolution of the information systems characterize the evolution of systems from data

maintenance systems, to systems that transform the data into "information" for use in the decision

making process. These systems supported the information acquisition from the database of 

transactional

data. The managerial knowledge acquisition function is/was not directly supported by these systems.

The evolution of new patterns in the changing scenario could not be provided by these systems

directly, the planner was supposed to do this from experience.

The Transformation of Data into Knowledge and associated tools.

Processing

Transactions Processing

systems

 In ormation  Knowled e

Mana ementData Mining Tools & On-Line Analytical

Processing Tools

Processing

Page 4: Dmdw Technical Paper Presentation

8/8/2019 Dmdw Technical Paper Presentation.

http://slidepdf.com/reader/full/dmdw-technical-paper-presentation 4/12

  Warehouse with a database

One thing that remains constant , especially in corporate world , is “ Change” 

  These days, change is occurring at an ever-increasing rate. A key challenge is

implementing an information infrastructure that allows your company to rapidly respond to change.

One solution to this challenge is the datawarehouse. Datawarehousing is an information infrastructure

 based on detail data that supports the decision-making process and provides businesses the ability to

access and analyze data to increase an organization's competitive advantage. Datawarehousing is a

 process, not an off-the-shelf solution you buy, but hardware--database and tools integrated into an

evolving information infrastructure--that changes with the dynamics of the business.

What is Data-Warehousing? 

The data warehouse makes an attempt to figure out "what we need" before we know we need it. What

it actually is?

• A data warehouse stores current and historical data

• This data is taken from various, perhaps incompatible, sources and stored in a uniform

format

• Several tools transform this data into meaningful business information for the purpose

of comparisons, trends and forecasting

• Data in a warehouse is not updates or changed in any way, but is only loaded and

accessed later on

Page 5: Dmdw Technical Paper Presentation

8/8/2019 Dmdw Technical Paper Presentation.

http://slidepdf.com/reader/full/dmdw-technical-paper-presentation 5/12

• Data is organized according to subject instead of application.

In general a database is not a data warehouse unless it has the following two features:

• It collects information from a number of different disparate sources and is the place

where this disparity is reconciled, and

• It allows several different applications to make use of the same information.

Conceptually, a Data Warehouse looks like this:

 Information Sources always include the core operational systems, which form the backbone

of day-to-day activities. It is these systems, which have traditionally provided management

information to support decision-making.

 Decision Support Tools are used to analyze the information stored in the warehouse, typically

to identify trends and new business opportunities.

The Data Warehouse itself is the bridge between the operational systems and the decision

support tools. It holds a copy of much of the operational system data in a logical structure,

which is more conducive to analysis. The Data Warehouse, which will be refreshed in

scheduled bursts from operational systems and from relevant external data sources, provides a

single, consistent view of corporate data, leaving operational systems unaffected.

Data – Warehouse Functions

The main function behind a data warehouse is to get the enterprise-wide data

in a format that is most useful to end-users, regardless of their locations

Page 6: Dmdw Technical Paper Presentation

8/8/2019 Dmdw Technical Paper Presentation.

http://slidepdf.com/reader/full/dmdw-technical-paper-presentation 6/12

. Data warehousing is used for:

• Increasing the speed and flexibility of analysis.

• Providing a foundation for enterprise-wide integration and access.

• Improving or re-inventing business processes.

• Gaining a clear understanding of customer behavior.

Data Warehouse Architecture

Each implementation of a data warehouse is different in its detailed design (as shown in figure

 below), but all are characterised by a handful of the following key components:

• A data model to define the warehouse contents.

• A carefully designed warehouse database, whether hierarchical, relational, or 

multidimensional. While choosing a DBMS it must be kept in view that the database

management system should be powerful enough to handle huge amount of data running up

to terabytes.

• A front end for  Decision Support System (DSS) for reporting and for structured

andunstructuredanalysis

Page 7: Dmdw Technical Paper Presentation

8/8/2019 Dmdw Technical Paper Presentation.

http://slidepdf.com/reader/full/dmdw-technical-paper-presentation 7/12

Data Mining

Data base mining or Data mining (DM) (formally termed Knowledge Discovery in Databases)

is a process that aims to use existing data to invent new facts and to uncover new relationships

 previously unknown even to experts thoroughly familiar with the data. It is based on filtration and

assaying of mountain of data “ore” in order to get “nuggets” of knowledge. The data mining process

is diagrammatically exemplified in Figure below

Transformed Data

 

1

2

 

 N

The Data Mining Process.

Data Mining and Data Warehousing

The goal of a data warehouse is to support decision making with data. Data mining can be

used in conjunction with a data warehouse to help with certain types of decisions. Data mining can be

applied to operational databases with individual transactions. To make data mining more efficient,

the data warehouse should have an aggregated or summarized collection of data .Data mining helps

AssimilatedInformation

 Data Sources

Data

Warehouse SelectedData

Select Transform Mine Assimilate

Extracted

Information

Page 8: Dmdw Technical Paper Presentation

8/8/2019 Dmdw Technical Paper Presentation.

http://slidepdf.com/reader/full/dmdw-technical-paper-presentation 8/12

in extracting meaningful new patterns that cannot be found necessarily by merely querying or 

 processing data or metadata in the data warehouse.

Data Mining as a Part of the Knowledge Discovery Process

 Knowledge Discovery in Databases, frequently abbreviated as KDD, typically

encompasses more than data mining. The knowledge discovery process comprises five phases:

Selection: creating possible segmentation criteria for selecting data

Preprocessing: Normalize, rationalize, and cleanse the data

Transformation: Create general representation and tables of metadata

Extraction: Extract patterns from the data warehouse, turning data into knowledge

Interpretation and Evaluation: Evaluate utility of extracted and identified patterns

Page 9: Dmdw Technical Paper Presentation

8/8/2019 Dmdw Technical Paper Presentation.

http://slidepdf.com/reader/full/dmdw-technical-paper-presentation 9/12

Technologies Used in Data Mining :

Artificial neural networks: Non-linear predictive models that learn through training and

resemble biological neural networks in structure.

Genetic algorithms: Optimization techniques that use processes such as genetic combination,

mutation, and natural selection in a design based on the concepts of natural evolution.

Decision trees: Tree-shaped structures that represent sets of decisions. These decisions

generate rules for the classification of a dataset.

Case Based reasoning (CBR):To forecast a situation, or to make a correct decision, such

systems find the closest past analogs of the present situation and choose the same solution, which

was the right one in those past situations. That is why this method is also called the nearest

neighbor method

Rule induction: The extraction of useful if-then rules from data based on statistical

significance.

Data visualization: The visual interpretation of complex relationships in multidimensional

data. Graphics tools are used to illustrate data relationships.

Goals of Data Mining and Knowledge Discovery

 The goals of data mining fall into the following classes:

• Prediction: Data mining can show how certain attributes within the

data will behave in the future.

• Identification: Data patterns can be used to identify the existence of an item,

an event, or an activity.

Page 10: Dmdw Technical Paper Presentation

8/8/2019 Dmdw Technical Paper Presentation.

http://slidepdf.com/reader/full/dmdw-technical-paper-presentation 10/12

• Classification: Data mining can partition the data so that different classes or 

categories can be identified based on combinations of parameters.

• Optimization :One eventual goal of data mining may be to optimize the use of 

limited resources such as time, space, money, or materials and to maximize

output variables such as sales or profits under a given set of constraints.

• Applications

• Using Data Mining and Warehousing For Knowledge Discovery

• Knowledge Discovery is a powerful new solution to information overload.

• Make predictions about new data.

• Identify and explain hidden patterns and trends in existing data.

• Summarize the contents of large databases to facilitate understanding and decision making

• Dependability and fault tolerance

• High Availability and Disaster Recovery

• Survivability of evaluative systems

• Reliability and Robustness Issues

• Accuracy and reliability of responses

• Reliable and Failure Tolerant Business Process Integration

• Failure Tolerant and trustworthy Sensor Networks

• Privacy and security policies and social impact of data mining

• Privacy preserving data integration

• Access control techniques and secure data models

• Encryption & Authentication

• Pseudonymization and Encryption

• Anonymization and pseudonymization

Compendium

Page 11: Dmdw Technical Paper Presentation

8/8/2019 Dmdw Technical Paper Presentation.

http://slidepdf.com/reader/full/dmdw-technical-paper-presentation 11/12

A new technological leap is needed to structure and prioritize information for specific end-user 

 problems.

The data mining tools can make this leap. Quantifiable business benefits have been proven through

the integration of data mining with current information systems

Over the next few years, the growth of data warehousing is going to be enormous with new products

and technologies coming out frequently.

Data mining helps in focusing business to servicing customers and to provide efficient business

 processes.

Data Warehousing provides the means to change raw data into information for making effective

 business decisions—the emphasis on information, not data. The data warehouse is the hub for 

decision support data. A good data warehouse will... provide the  RIGHT data... to the RIGHT

 people... at the RIGHT time: RIGHT NOW! While data warehouse organizes data for business

analysis, Internet has emerged as the standard for information sharing.

The paper concludes hoping that companies will use data mining and data warehousing

effectively in order to focus on serving the customers and serving themselves in doing so.

   Bibliography

  Arun k.Pujari , by “Data mining and warehousing”

• Mastering Data Mining by Michael J A Berry & Gordon S . Linoff 

• “Building The Datawarehousing” by John Wiley and sona,1993

 

• InternetE-books

•  blog4jntu.co.cc

• .http://www.spss.com/datamine/ocdm.html

Page 12: Dmdw Technical Paper Presentation

8/8/2019 Dmdw Technical Paper Presentation.

http://slidepdf.com/reader/full/dmdw-technical-paper-presentation 12/12