![Page 1: OLAM and Data Mining: Concepts and Techniques. Introduction Data explosion problem: –Automated data collection tools and mature database technology lead](https://reader036.vdocument.in/reader036/viewer/2022082710/56649dde5503460f94ad6d40/html5/thumbnails/1.jpg)
OLAM and Data Mining: Concepts and Techniques
![Page 2: OLAM and Data Mining: Concepts and Techniques. Introduction Data explosion problem: –Automated data collection tools and mature database technology lead](https://reader036.vdocument.in/reader036/viewer/2022082710/56649dde5503460f94ad6d40/html5/thumbnails/2.jpg)
Introduction
• Data explosion problem:
  – Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses, and other information repositories
• We are drowning in data, but starving for knowledge!
• Data warehousing and data mining:
  – On-line analytical processing: query-driven data analysis
  – The efficient discovery of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
![Page 3: OLAM and Data Mining: Concepts and Techniques. Introduction Data explosion problem: –Automated data collection tools and mature database technology lead](https://reader036.vdocument.in/reader036/viewer/2022082710/56649dde5503460f94ad6d40/html5/thumbnails/3.jpg)
Evolution of Database Technology
• 1960s:
  – Data collection, database creation, IMS and network DBMS
• 1970s:
  – Relational data model, relational DBMS
• 1980s:
  – RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
• 1990s:
  – Data mining and data warehousing, multimedia databases, and Web technology
![Page 4: OLAM and Data Mining: Concepts and Techniques. Introduction Data explosion problem: –Automated data collection tools and mature database technology lead](https://reader036.vdocument.in/reader036/viewer/2022082710/56649dde5503460f94ad6d40/html5/thumbnails/4.jpg)
What is data mining?
• Data mining: the process of efficiently discovering previously unknown patterns, relationships, and rules in large databases and data warehouses
• Goal: help the human analyst understand the data
• SQL query:
  – How many bottles of wine did we sell in the 1st quarter of 1999 in Poland vs. Austria?
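
The SQL question above has one precisely defined answer, which a plain aggregate query returns. A minimal sketch using Python's built-in sqlite3 module follows; the `sales` table, its columns, and the demo rows are assumptions made for illustration and do not come from the slides.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a hypothetical `sales` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales ("
        "product TEXT, country TEXT, sale_date TEXT, bottles INTEGER)"
    )
    rows = [
        ("wine", "Poland", "1999-01-15", 120),
        ("wine", "Poland", "1999-03-02", 80),
        ("wine", "Austria", "1999-02-10", 200),
        ("wine", "Austria", "1999-04-05", 50),  # outside Q1, excluded
        ("beer", "Poland", "1999-01-20", 300),  # not wine, excluded
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    return conn


def bottles_sold_by_country(conn):
    """Total wine bottles sold in Q1 1999, per country (Poland vs. Austria)."""
    query = """
        SELECT country, SUM(bottles) AS total_bottles
        FROM sales
        WHERE product = 'wine'
          AND sale_date >= '1999-01-01' AND sale_date < '1999-04-01'
          AND country IN ('Poland', 'Austria')
        GROUP BY country
        ORDER BY country
    """
    # Each result row is a (country, total) pair, so dict() maps one to the other.
    return dict(conn.execute(query).fetchall())


if __name__ == "__main__":
    print(bottles_sold_by_country(make_demo_db()))
```

The query only reports totals the analyst already knew to ask for; it does not reveal anything about why the two markets differ.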
![Page 5: OLAM and Data Mining: Concepts and Techniques. Introduction Data explosion problem: –Automated data collection tools and mature database technology lead](https://reader036.vdocument.in/reader036/viewer/2022082710/56649dde5503460f94ad6d40/html5/thumbnails/5.jpg)
What is data mining?
• Data mining query:
  – How do the buyers of wine in Poland and Austria differ?
  – What else do the buyers of wine in Austria buy along with wine?
  – How can the buyers of wine be characterized?
![Page 6: OLAM and Data Mining: Concepts and Techniques. Introduction Data explosion problem: –Automated data collection tools and mature database technology lead](https://reader036.vdocument.in/reader036/viewer/2022082710/56649dde5503460f94ad6d40/html5/thumbnails/6.jpg)
What is data mining?
• Data mining (knowledge discovery in databases):
  – Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) information from data in large databases
• Alternative names and their “inside stories”:
  – Knowledge discovery in databases (KDD: SIGKDD), knowledge extraction, data archeology, data dredging, information harvesting, business intelligence, etc.
  – Data mining: a misnomer?
• What is not data mining?
  – Expert systems or small statistical programs
  – OLAP
![Page 7: OLAM and Data Mining: Concepts and Techniques. Introduction Data explosion problem: –Automated data collection tools and mature database technology lead](https://reader036.vdocument.in/reader036/viewer/2022082710/56649dde5503460f94ad6d40/html5/thumbnails/7.jpg)
Data Mining: A KDD Process
• Steps of a KDD process:
  – Learning the application domain:
    • relevant prior knowledge and goals of the application
  – Creating a target data set: data selection
  – Data cleaning and preprocessing (may take 60% of the effort!)
  – Data reduction and projection:
    • finding useful features, dimensionality/variable reduction, invariant representation
  – Choosing the functions of data mining:
    • summarization, classification, regression, association, clustering
  – Choosing the mining algorithm(s)
  – Data mining: search for patterns of interest
  – Interpretation: analysis of results
    • visualization, transformation, removing redundant patterns, etc.
  – Use of discovered knowledge
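
The KDD steps above can be read as a chain of small transformations. The sketch below is one possible toy rendering in plain Python, not a prescribed implementation: the record fields (`country`, `age_group`, `spend`, `store`), the function names, and the threshold are all hypothetical, and the "mining" step is only a summarization (mean spend per group).

```python
from statistics import mean


def select_target(records, country):
    """Creating a target data set: keep only records for one country."""
    return [r for r in records if r.get("country") == country]


def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def reduce_features(records, keep):
    """Data reduction and projection: keep only the listed attributes."""
    return [{k: r[k] for k in keep} for r in records]


def mine_summary(records, group_key, value_key):
    """Data mining (summarization): mean of value_key per group_key."""
    groups = {}
    for r in records:
        groups.setdefault(r[group_key], []).append(r[value_key])
    return {g: mean(vs) for g, vs in groups.items()}


def interpret(summary, min_value):
    """Interpretation: keep only groups whose mean reaches min_value."""
    return {g: v for g, v in summary.items() if v >= min_value}


def run_kdd(records):
    """Run the steps in order on a list of dict records."""
    target = select_target(records, "Austria")
    cleaned = clean(target)
    reduced = reduce_features(cleaned, ["age_group", "spend"])
    summary = mine_summary(reduced, "age_group", "spend")
    return interpret(summary, min_value=50)


# Hypothetical demo records, invented for this sketch.
DEMO_RECORDS = [
    {"country": "Austria", "age_group": "18-35", "spend": 40, "store": "A"},
    {"country": "Austria", "age_group": "18-35", "spend": 60, "store": "B"},
    {"country": "Austria", "age_group": "36-60", "spend": 90, "store": "A"},
    {"country": "Austria", "age_group": "36-60", "spend": None, "store": "A"},
    {"country": "Poland", "age_group": "36-60", "spend": 70, "store": "C"},
    {"country": "Austria", "age_group": "60+", "spend": 20, "store": "B"},
]

if __name__ == "__main__":
    print(run_kdd(DEMO_RECORDS))
```

Keeping each step a separate function mirrors the slide: any stage (for example the mining function) can be swapped without touching selection or cleaning.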
![Page 8: OLAM and Data Mining: Concepts and Techniques. Introduction Data explosion problem: –Automated data collection tools and mature database technology lead](https://reader036.vdocument.in/reader036/viewer/2022082710/56649dde5503460f94ad6d40/html5/thumbnails/8.jpg)
Data Mining and Business Intelligence
[Figure: layered pyramid; the potential to support business decisions increases toward the top]
Layers, bottom to top:
• Data Sources: paper, files, database systems, OLTP, WWW
• Data Warehouses/Data Marts: OLAP, MDA
• Data Exploration: statistical analysis, reporting
• Data Mining: information discovery
• Data Presentation: visualization
• Making Decisions
Roles shown alongside the layers: End User, Business Analyst, Data Analyst, DBA
![Page 9: OLAM and Data Mining: Concepts and Techniques. Introduction Data explosion problem: –Automated data collection tools and mature database technology lead](https://reader036.vdocument.in/reader036/viewer/2022082710/56649dde5503460f94ad6d40/html5/thumbnails/9.jpg)
An OLAM Architecture
[Figure: layered architecture diagram. Labels, top to bottom:]
• Mining query, mining result
• User GUI API
• OLAM Engine and OLAP Engine
• Data Cube API
• MDDB and Meta Data
• Filtering & Integration, Database API, Filtering
• Data cleaning, data integration
• Databases and Data Warehouse
![Page 10: OLAM and Data Mining: Concepts and Techniques. Introduction Data explosion problem: –Automated data collection tools and mature database technology lead](https://reader036.vdocument.in/reader036/viewer/2022082710/56649dde5503460f94ad6d40/html5/thumbnails/10.jpg)
Data Mining: Confluence of Multiple Disciplines
• Database systems, data warehouse and OLAP
• Statistics
• Machine learning
• Visualization
• Information science
• High performance computing
• Other disciplines:
– Neural networks, mathematical modeling, information retrieval, pattern recognition, etc.
![Page 11: OLAM and Data Mining: Concepts and Techniques. Introduction Data explosion problem: –Automated data collection tools and mature database technology lead](https://reader036.vdocument.in/reader036/viewer/2022082710/56649dde5503460f94ad6d40/html5/thumbnails/11.jpg)
Data Mining: On What Kind of Data?
• Relational databases
• Data warehouses
• Transactional databases
• Advanced DB systems and information repositories
  – Object-oriented and object-relational databases
  – Spatial databases
  – Time-series data and temporal data
  – Text databases and multimedia databases
  – Heterogeneous and legacy databases
  – WWW
![Page 12: OLAM and Data Mining: Concepts and Techniques. Introduction Data explosion problem: –Automated data collection tools and mature database technology lead](https://reader036.vdocument.in/reader036/viewer/2022082710/56649dde5503460f94ad6d40/html5/thumbnails/12.jpg)
Data Mining Functionality
Data mining methods may be classified into six basic classes:
• Associations
  – Finding rules like “if the customer buys mustard, sausage, and beer, then the probability that he/she buys chips is 50%”
• Classifications
  – Classify data based on the values of the decision attribute, e.g. classify patients based on their “state”
• Clustering
  – Group data to form new classes, e.g. cluster customers based on their behavior to find common patterns
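
The "50%" in the association rule above is the rule's confidence: among baskets that contain mustard, sausage, and beer, the fraction that also contain chips. A minimal sketch of that calculation follows; the basket data and function names are invented for illustration.

```python
def count_containing(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def support(transactions, itemset):
    """Fraction of all transactions that contain itemset."""
    if not transactions:
        return 0.0
    return count_containing(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) from co-occurrence counts."""
    n_antecedent = count_containing(transactions, antecedent)
    if n_antecedent == 0:
        return 0.0  # the rule never fires, so no confidence can be estimated
    both = set(antecedent) | set(consequent)
    return count_containing(transactions, both) / n_antecedent


# Hypothetical market baskets, invented for this sketch.
BASKETS = [
    {"mustard", "sausage", "beer", "chips"},
    {"mustard", "sausage", "beer"},
    {"mustard", "sausage", "beer", "chips", "bread"},
    {"mustard", "sausage", "beer", "bread"},
    {"beer", "chips"},
    {"bread"},
]

if __name__ == "__main__":
    rule_from = {"mustard", "sausage", "beer"}
    rule_to = {"chips"}
    print("support:", support(BASKETS, rule_from | rule_to))
    print("confidence:", confidence(BASKETS, rule_from, rule_to))
```

Here four baskets contain the antecedent and two of those also contain chips, so the confidence is 2/4. Finding such rules in large databases additionally requires an efficient search over candidate itemsets, which this sketch does not attempt.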
![Page 13: OLAM and Data Mining: Concepts and Techniques. Introduction Data explosion problem: –Automated data collection tools and mature database technology lead](https://reader036.vdocument.in/reader036/viewer/2022082710/56649dde5503460f94ad6d40/html5/thumbnails/13.jpg)
Data Mining Functionality
• Sequential patterns
  – Finding rules like “if the customer buys a TV and, a few days later, a camera, then the probability that he/she will buy a video within 1 month is 50%”
• Time-series similarities
  – Finding similar sequences (or subsequences) in time series (e.g. stock analysis)
• Outlier detection
  – Finding anomalies/exceptions/deviations in data
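
As one simple illustration of outlier detection, values can be flagged when they lie far from the mean in units of standard deviation (a z-score test). This is only one of many possible techniques and is not taken from the slides; the threshold of 2.0 and the demo values are assumptions.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return (index, value) pairs whose absolute z-score exceeds threshold."""
    if not values:
        return []
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing deviates
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


# Hypothetical daily purchase amounts; the last one is unusually large.
DEMO_VALUES = [10, 11, 9, 10, 12, 10, 50]

if __name__ == "__main__":
    print(find_outliers(DEMO_VALUES))
```

A z-score test is sensitive to the very outliers it is looking for, because they inflate the mean and the standard deviation; median-based measures are a common, more robust alternative.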