14 data mining and warehousing
8/3/2019 14 Data Mining and Warehousing
ABSTRACT
Data mining is a combination of database and artificial intelligence technologies. Although the AI field has taken a major dive in the last decade, this new emerging field has shown that AI can make major contributions to existing fields in computer science. In fact, many experts believe that data mining is the third hottest field in the industry, behind the Internet and data warehousing.
Data mining is really just the next step in the process of analyzing data. Instead of answering queries on standard or user-specified relationships, data mining goes a step further by finding meaningful relationships in the data: relationships that were not thought to exist, or ones that give a more insightful view of the data. For example, a computer-generated graph may not give the user any insight; data mining, however, can find trends in the same data that show the user more precisely what is going on, including trends that the end user would never have thought to query the computer about.
A data warehouse is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis. This classic definition of the data warehouse focuses on data storage. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform, and load data into the repository, and tools to manage and retrieve metadata.
A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but can include data from other sources. Data warehouses separate analysis workload from transaction workload and enable an organization to consolidate data from several sources. This helps in:
Maintaining historical records
Analyzing the data to gain a better understanding of the business and to improve the business.
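As a rough illustration of a query-and-analysis workload over consolidated historical data, the sketch below uses Python's built-in sqlite3 module as a stand-in for a warehouse. The table name, column names and figures are invented for the example and do not come from any particular system.

```python
import sqlite3

# An in-memory database stands in for the warehouse (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, source TEXT, amount REAL)")

# Historical rows consolidated from two hypothetical source systems.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (2017, "web", 120.0),
        (2017, "store", 80.0),
        (2018, "web", 150.0),
        (2018, "store", 90.0),
    ],
)

# An analytical query: total sales per year across all sources.
totals = conn.execute(
    "SELECT year, SUM(amount) FROM sales GROUP BY year ORDER BY year"
).fetchall()
conn.close()
```

A query like this scans and aggregates many rows at once, which is the kind of work a warehouse keeps away from the transaction systems.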
Introduction :-
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. Data mining tools can answer business questions that traditionally were too time consuming to resolve.
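Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently produced technologies that allow users to navigate through their data in real time. Data mining is supported by three technologies that are now sufficiently mature: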
Massive data collection
Powerful multiprocessor computers
Data mining algorithms
DATA MINING AND WAREHOUSING
Presented by :
P.Satya vathi
M.Divya
SRI SIVANI COLLEGE OF ENGINEERING, SRIKAKULAM
EMAIL ID : [email protected]
The Scope of Data Mining
Data mining derives its name from the similarities between searching for valuable business information in a large database (for example, finding linked products in gigabytes of store scanner data) and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material, or intelligently probing it to find exactly where the value resides.
Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases. Questions that traditionally required extensive hands-on analysis can now be answered directly from the data.
Automated discovery of previously unknown patterns. Data mining tools sweep through databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together.
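The retail example can be sketched in a few lines: count how often each pair of products appears in the same basket and keep the pairs that recur. The basket contents and the function name `frequent_pairs` are made up for illustration; production tools would use more scalable association-rule algorithms.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_count=2):
    """Return product pairs that occur together in at least min_count baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so that (a, b) and (b, a) are counted as the same pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical store-scanner data: one list of products per purchase.
baskets = [
    ["bread", "milk", "diapers"],
    ["beer", "diapers", "chips"],
    ["beer", "diapers", "milk"],
    ["bread", "milk"],
]
pairs = frequent_pairs(baskets)
```

Here `pairs` keeps only the combinations seen in two or more baskets, such as beer with diapers.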
Techniques:
Neural networks
Neural networks have broad applicability to real-world business problems and have already been successfully applied in many industries. Since neural networks are best at identifying patterns or trends in data, they are well suited for prediction or forecasting needs including:
sales forecasting
industrial process control
customer research
data validation
risk management
target marketing etc.
The bottom layer represents the input layer, in this case with 5 inputs labeled X1 through X5. In the middle is the hidden layer, with a variable number of nodes. It is the hidden layer that performs much of the work of the network. The output layer in this case has two nodes, Z1 and Z2, representing output values we are trying to determine from the inputs.
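The layered structure described above can be sketched as a forward pass through a small network with 5 inputs, one hidden layer and 2 outputs. The hidden layer size of 3, the sigmoid activation and the random (untrained) weights are assumptions made only for this illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum per node, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


def forward(x, hidden_w, hidden_b, output_w, output_b):
    hidden = layer(x, hidden_w, hidden_b)     # hidden layer
    return layer(hidden, output_w, output_b)  # output layer (Z1, Z2)


random.seed(0)
n_inputs, n_hidden, n_outputs = 5, 3, 2  # hidden size is an arbitrary choice
hidden_w = [[random.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
hidden_b = [0.0] * n_hidden
output_w = [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_outputs)]
output_b = [0.0] * n_outputs

# Inputs X1..X5 (made-up values); z holds the two outputs Z1 and Z2.
z = forward([0.2, 0.4, 0.1, 0.9, 0.5], hidden_w, hidden_b, output_w, output_b)
```

In practice the weights would be learned from historical data rather than drawn at random; the sketch only shows how values flow from the inputs to the outputs.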
Decision trees
Decision trees are a simple knowledge representation that classifies examples into a finite number of classes. The nodes are labeled with attribute names, the edges are labeled with the possible values of that attribute, and the leaves are labeled with the different classes.
The following is an example of objects that describe the weather at a given time. The objects contain information on the outlook, humidity etc. Some objects are positive examples, denoted by P, and others are negative, denoted by N.
Decision tree structure
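A tree of this kind can be sketched as nested tuples and dictionaries, with attribute names at the nodes, attribute values on the edges and the classes P and N at the leaves. The particular tree below, including the `windy` attribute, is an assumption chosen for illustration and is not taken from the paper's figure.

```python
# Internal nodes are (attribute, {value: subtree}); leaves are class labels.
# The shape of this tree is hypothetical.
TREE = ("outlook", {
    "sunny": ("humidity", {"high": "N", "normal": "P"}),
    "overcast": "P",
    "rain": ("windy", {True: "N", False: "P"}),
})


def classify(tree, example):
    """Follow the edge matching each attribute value until a leaf is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree


# A made-up weather object described by its attribute values.
saturday = {"outlook": "sunny", "humidity": "high", "windy": False}
label = classify(TREE, saturday)
```

Classifying an object only inspects the attributes along one path from the root to a leaf, which is why such trees are cheap to apply once they have been built.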
Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.
Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.
Rule induction: The extraction of useful if-then rules from data based on statistical significance.
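The nearest neighbor method lends itself to a short sketch. Assuming records are numeric feature tuples, Euclidean distance measures similarity and the classes of the k nearest records are combined by majority vote, it could look like this; the data and the name `knn_classify` are invented for the example.

```python
import math
from collections import Counter


def knn_classify(record, history, k=3):
    """Label a record by majority vote among its k nearest historical records."""
    nearest = sorted(history, key=lambda item: math.dist(record, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical historical dataset: (features, class) pairs.
history = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"),
    ((5.2, 4.9), "high"),
]
prediction = knn_classify((1.1, 1.0), history, k=3)
```

The choice of k trades off sensitivity to noise (small k) against blurring of class boundaries (large k).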
:-How Data Mining Works
How exactly is data mining able to tell you important things that you didn't know or what is going to happen next? The technique that is used to perform these feats in data mining is called modeling. For instance, if you were looking for a sunken Spanish galleon on the high seas the first thing you might do is to research the times when Spanish treasure had been found by others in the past. You might note that these ships often tend to be found off the coast of Bermuda and that there are certain characteristics to the ocean currents, and certain routes that have likely been taken by the ships' captains in that era. You note these similarities and build a model that includes the characteristics that are common to the locations of these sunken treasures. With these models in hand you sail off looking for treasure where your model indicates it is most likely to be, given similar situations in the past.
This act of model building is thus something that people have been doing for a long time, certainly before the advent of computers or data mining technology. What happens on computers, however, is not much different from the way people build models. Computers are loaded up with lots of information about a variety of situations where an answer is known, and then the data mining software on the computer must run through that data and distill the characteristics of the data that should go into the model. For example, say that you are the director of marketing for a telecommunications company and you'd like to acquire some new long distance phone customers.
                                                       Customers   Prospects
General information (e.g. demographic data)            Known       Known
Proprietary information (e.g. customer transactions)   Known       Target
Table 2 - Data Mining for Prospecting
Table 3 shows another common scenario for building models: predict what is going to happen in the future.
                                                                                Yesterday   Today   Tomorrow
Static information and current plans (e.g. demographic data, marketing plans)   Known       Known   Known
Table 3 - Data Mining for Predictions
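Tables 2 and 3 describe the same pattern: build a model on records where the target is known, then apply it where only the other information is known. A deliberately simple sketch of that pattern follows, using a per-group average as the model. The field names `region` and `monthly_minutes`, the data and both function names are hypothetical.

```python
from collections import defaultdict


def build_model(known_records, feature, target):
    """Average the target value for each value of a known feature."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for record in known_records:
        sums[record[feature]] += record[target]
        counts[record[feature]] += 1
    return {value: sums[value] / counts[value] for value in sums}


def predict(model, record, feature, default=None):
    """Estimate the unknown target from the record's known feature."""
    return model.get(record[feature], default)


# Hypothetical customers: general and proprietary information both known.
customers = [
    {"region": "urban", "monthly_minutes": 300},
    {"region": "urban", "monthly_minutes": 500},
    {"region": "rural", "monthly_minutes": 100},
]
# Hypothetical prospect: only general information is known.
prospect = {"region": "urban"}

model = build_model(customers, "region", "monthly_minutes")
estimate = predict(model, prospect, "region")
```

Real data mining tools fit far richer models, but the split between records with known answers and records to be scored is the same.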
Architecture for Data Mining :
To best apply these advanced techniques, they must be fully integrated with a data warehouse as well as flexible interactive business analysis tools. The resulting analytic data warehouse can be applied to improve business processes throughout the organization, in areas such as promotional campaign management, fraud detection, new product rollout, and so on. Figure 1 illustrates an architecture for advanced analysis in a large data warehouse.
Figure 1 - Integrated Data Mining Architecture
The ideal starting point is a data warehouse containing a combination of internal data tracking all customer contact coupled with external market data about competitor activity. Background information on potential customers also provides an excellent basis for prospecting.
:-Applications
A wide range of companies have deployed
successful applications of data mining.
Combating Terrorism
Data mining has been cited as the method by which the U.S. Army unit Able Danger identified the leader of the September 11, 2001 attacks, Mohamed Atta, and three other 9/11 hijackers as possible members of an Al Qaeda cell operating in the U.S. more than a year before the attack.
A pharmaceutical company can analyze its recent sales force activity and their results to improve targeting of high-value physicians and determine which marketing activities will have the greatest impact in the next few months.
A credit card company can leverage its vast warehouse of customer transaction data to identify customers most likely to be interested in a new credit product. Using a small test mailing, the attributes of customers with an affinity for the product can be identified.
A diversified transportation company with a large direct sales force can apply data mining to identify the best prospects for its services.
A large consumer package goods company can apply data mining to improve its sales process to retailers.
Table 3 (continued)
                                                   Yesterday   Today   Tomorrow
Dynamic information (e.g. customer transactions)   Known       Known   Target
Introduction:
Most firms want to set up transaction processing systems so that there is a high probability that transactions will be completed in what is judged to be an acceptable amount of time. Reports and queries can require a much greater range of limited server/disk resources than transaction processing; when they run on the servers/disks used by transaction processing systems, they can lower the probability that transactions complete in an acceptable amount of time. Moreover, running queries and reports, with their variable resource requirements, on those same servers/disks can make it quite complex to manage the servers/disks so that there is a high enough probability that acceptable response time can be achieved.
Definition:
Data Warehouse:
The term Data Warehouse was coined by Bill Inmon in 1990, which he defined in the following way: "A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process". He defined the terms in the sentence as follows:
Subject Oriented:
Data that gives information about a particular subject
instead of about a company's ongoing operations.
Integrated:
Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.
Time-variant:
All data in the data warehouse is identified with a
particular time period.
Non-volatile:
Data is stable in a data warehouse. More data is added but data is never removed. This enables management to gain a consistent picture of the business.
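Two of these properties, time-variant and non-volatile, can be sketched as a toy append-only store in which every row carries the period it describes and nothing is ever updated or deleted. The class name, method names and sample data are invented for illustration.

```python
class AppendOnlyWarehouse:
    """Toy store in which rows are only ever added, never updated or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, period, subject, value):
        # Time-variant: every row is tagged with the period it describes.
        self._rows.append((period, subject, value))

    def history(self, subject):
        # Subject-oriented: read everything known about one subject, in period order.
        return sorted((p, v) for p, s, v in self._rows if s == subject)


dw = AppendOnlyWarehouse()
dw.load("2018-Q4", "customer-42", 120.0)
dw.load("2019-Q1", "customer-42", 95.0)
dw.load("2019-Q1", "customer-7", 40.0)
trend = dw.history("customer-42")
```

Because earlier rows are never overwritten, the same question asked later returns the same historical answer, plus whatever has been added since.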
Data Warehouse Architectures
Data warehouses and their architectures vary depending upon the specifics of an organization's situation. Three common architectures are:
Data Warehouse Architecture: Basic
Data Warehouse Architecture: with a Staging Area
Data Warehouse Architecture: with a Staging Area and Data Marts
Data Warehouse Architecture: Basic
Figure 1-2 shows a simple architecture for a data warehouse. End users directly access data derived from several source systems through the data warehouse.
Figure 1-2 Architecture of a Data Warehouse
In Figure 1-2, the metadata and raw data of a traditional OLTP system are present, as is an additional type of data, summary data.
Data Warehouse Architecture: with a Staging Area
You need to clean and process your operational data before putting it into the warehouse, as shown in Figure 1-3.
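The cleaning step performed in a staging area can be sketched as a function that normalises fields, rejects rows it cannot repair and drops duplicates before anything reaches the warehouse. The field names, the cleaning rules and the sample rows are assumptions made for this example.

```python
def clean(raw_rows):
    """Staging step: normalise fields and drop rows that cannot be repaired."""
    seen = set()
    cleaned = []
    for row in raw_rows:
        customer = row.get("customer", "").strip().title()
        try:
            amount = float(row.get("amount", ""))
        except ValueError:
            continue  # unparseable amount: reject the row
        key = (row.get("order_id"), customer)
        if not customer or key in seen:
            continue  # missing customer or duplicate record
        seen.add(key)
        cleaned.append(
            {"order_id": row.get("order_id"), "customer": customer, "amount": amount}
        )
    return cleaned


# Hypothetical operational data with typical quality problems.
operational = [
    {"order_id": 1, "customer": "  alice ", "amount": "10.50"},
    {"order_id": 1, "customer": "Alice", "amount": "10.50"},  # duplicate
    {"order_id": 2, "customer": "bob", "amount": "n/a"},      # bad amount
    {"order_id": 3, "customer": "", "amount": "4.00"},        # no customer
    {"order_id": 4, "customer": "carol", "amount": "7"},
]
staged = clean(operational)
```

Only the rows in `staged` would then be loaded into the warehouse; rejected rows would normally be logged for follow-up rather than silently discarded.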
Figure 1-3 Architecture of a Data Warehouse with a Staging Area
Data Warehouse Architecture: with a Staging Area and Data Marts
Although the architecture in Figure 1-3 is quite common, you may want to customize your warehouse's architecture for different groups within your organization.
Figure 1-4 Architecture of a Data Warehouse with a Staging Area and Data Marts
Note:
Data marts are an important part of many data warehouses, but they are not the focus of this paper.
Data warehouse Components
Data warehousing is essentially what you need to do in order to create a data warehouse, and what you do with it. It is the process of creating, populating, and then querying a data warehouse and can involve a number of discrete technologies.
Application Uses
DW appliances provide solutions for many analytic application uses, including:
Enterprise data warehousing
Super-sized sandboxes that isolate power users with resource-intensive queries
Pilot projects or projects requiring rapid prototyping and rapid time-to-value
Off-loading projects from the enterprise data warehouse, i.e. large analytical query projects that affect the overall workload of the enterprise data warehouse
Applications with specific performance or loading requirements
Data marts that have outgrown their present environment
Turnkey data warehouses or data marts
Solutions for applications with high data growth and high performance requirements
Applications requiring data warehouse encryption
Disadvantages of data warehouses
There are also disadvantages to using a data warehouse. Some of them are:
Over their life, data warehouses can have high costs. The data warehouse is usually not static. Maintenance costs are high.
Data warehouses can get outdated relatively quickly. There is a cost of delivering suboptimal information to the organization.
There is often a fine line between data warehouses and operational systems. Duplicate, expensive functionality may be developed. Or, functionality may be developed in the data warehouse that, in retrospect, should have been developed in the operational systems, and vice versa.
The future of data warehousing
Data warehousing, like any technology niche, has a history of innovations that did not receive market acceptance. Developments that have been predicted for the field include:
Service Oriented Architecture
Search capabilities integrated into reporting and analysis technology
Software as a Service
Analytic tools that work in memory
Visualization
Another prediction is that data warehouse performance will continue to be improved by use of data warehouse appliances.
Difference between data mining and data warehousing :
Data mining: A method of comparing large amounts of data to find patterns. Normally this is used for models and forecasting.
Or: The process of discovering meaningful correlations, patterns, and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.
Data warehousing: The ability of a system to store data resulting from data mining to be used in future inquiries of that database.
Or: A data warehouse is a central repository (or storehouse) for data that an enterprise's various business systems collect. Data from various online applications and other sources is selectively extracted and organized in the data warehouse for useful analysis.
Conclusion :
Comprehensive data warehouses that integrate operational data with customer, supplier, and market information have resulted in an explosion of information. Competition requires timely and sophisticated analysis on an integrated view of the data. However, there is a growing gap between more powerful storage and retrieval systems and the users' ability to effectively analyze and act on the information they contain. A new technological leap is needed to structure and prioritize information for specific end-user problems. Data mining tools can make this leap. Quantifiable business benefits have been proven through the integration of data mining with current information systems, and new products are on the horizon that will bring this integration to an even wider audience of users.