chapter one(2004)
TRANSCRIPT
-
8/2/2019 Chapter One(2004)
1/60
1
Department of Information
Technology
Course Title: Data Warehousing and Data Mining
Course No: IT4204
Credit Hr: 3
By: TWW
-
8/2/2019 Chapter One(2004)
2/60
2
Chapter one
Introduction In this chapter we will cover the following issues in brief
Motivation: Why data mining?
What is data mining & data warehousing?
Architecture of a typical DM system
Why DM: Application of data mining
Data Mining: On what kind of data?
Data mining functionalities
Are all the patterns interesting?
Classification of data mining systems
Major issues in data mining
-
8/2/2019 Chapter One(2004)
3/60
3
Motivation:
Necessity is the Mother of Invention
Our capacity of generating and collecting data have been increased
rapidly in the last several decades
Huge amount of data is available at the tip of our hand
It is predicted that more data will be produced in the next year than
has been generated during the entire existence of humankind!
A study by IBM predicts that by 2005 the amount of information on
the planet will increase from 3.2 million exabytes to 43 million
exabytes (1 exabyte=250 bytes) .
-
8/2/2019 Chapter One(2004)
4/60
4
Motivation:
Necessity is the Mother of Invention
Contributing factors include
Widespread use of bar code for most commercial products,
Computerization of many business, scientific, and governmental
transactions, Advances in data collection tools (audio, video, satellite remote
sensing, scanning, image capturing tools)
Usage of WWW as a global information system (the Internet in
general), Development comprehensive application software,
new computing and storage technologies
-
8/2/2019 Chapter One(2004)
5/60
5
Motivation:
Necessity is the Mother of Invention
All this have made it easier to create, collect, and store all types of
data.
As a result it creates a problem what is called data exposition
Data explosion is the problem of having huge amount of data in anenterprise stored in databases, data warehouses and other information
repositories generated by automated data collection tools and mature
database technology in large databases which has to be processed to
make a decision.
As the size of data get larger, analyzing the data becomes very
difficult
-
8/2/2019 Chapter One(2004)
6/60
6
Motivation:
Necessity is the Mother of Invention
Data can be managed and stored in
structured relational databases;
in semi-structured file systems, such as e-mail;
unstructured fixed content, like documents and graphic files.
Companies rely on this enterprise data to improve decision-makingand to gain a competitive advantage;
Data has indeed become a highly valued business asset.
The huge amount of data exceeds our human ability to make
comprehension on the data and to put the best decision without tools
Generating and storing of large volumes of data has reached a critical
mass and appropriate tools for comprehend the data becomes vital.
-
8/2/2019 Chapter One(2004)
7/60
7
Motivation:
Necessity is the Mother of Invention
We are drowning in data, but starving for knowledge!
The Solution: Data warehousing and data mining
Data mining can be viewed as a result of the natural evolution of information
technology.
This can be more explained if we look at the evolution of database technology
since 19th century.
-
8/2/2019 Chapter One(2004)
8/60
8
Motivation:
Necessity is the Mother of Invention 1960s:
Known to be the era of primitive file processing
There were activities such as
Data collection,
database creation (No DBMS),
Information management system (IMS), mainly using COBOL
1970s:
Relational data model, relational DBMS implementation
Data modeling tools like ER diagram
Indexing and data organization techniques such as B+ tree, hashing, etc
Query language such as SQL
User interfaces, forms and reports
Query processing and optimization techniques
Transaction management: recovery, concurrency control, etc
Online Transaction processing (OLTP)
-
8/2/2019 Chapter One(2004)
9/60
9
Motivation:
Necessity is the Mother of Invention
1980s:
Period of advanced DB Systems
advanced data models
extended-relational, Object Oriented, Object-Relational, deductive, etc.)
application-oriented DBMS
spatial, temporal, multimedia, active, scientific, engineering, Knowledgebase,
etc.)
1990s2000s:
Data mining and data warehousing, Knowledge discovery, OLAP and Web
based databases
-
8/2/2019 Chapter One(2004)
10/60
10
What is Data Mining & Data warehousing?
Data mining is extraction of interesting (non-trivial, implicit, previously unknownand potentially useful) information or patterns from data in large databases (data
warehouse)
The term Data mining is a misnomer as it doesnt directly related to what is does.
For exampling mining gold from rock is called Gold mining but not rock mining.
Similarly oil mining is mining oil from the ground.
Data mining should best describe as knowledge mining from data rather that datamining
Any way, we will use the term with this understanding
-
8/2/2019 Chapter One(2004)
11/60
11
What is Data Mining & Data warehousing?
Alternative names
Knowledge discovery(mining) from databases (KDD), knowledge extraction,
data/pattern analysis,
data archeology,
data dredging,
information harvesting,
business intelligence, etc.
Note that: query processing systems, Expert systems (knowledge base systems) or
Information retrieval systems are not data mining tasks
-
8/2/2019 Chapter One(2004)
12/60
12
What is Data Mining & Data warehousing?
Data can now be stored in many different types of databases.
Special DB architecture that has recently emerged is data warehouse
Data warehouse is a repository of multiple heterogeneous data sources organized
under a unified schema at a single site in order to facilitate management decisionmaking process
Data warehouse technology includes
Data cleansing (removing noise and inconsistent data)
Data integration (combining multiple data sources into one data warehouse)
On-Line Analytical Processing (OLAP)
-
8/2/2019 Chapter One(2004)
13/60
13
What is Data Mining & Data warehousing?
OLAP is analysis technique which have functionalities such as
Summarization
Consolidation
Aggregation as well as
The ability to view information from different angle
Although OLAP tools support multidimensional analysis and decision making,
additional data analysis tools are required for in depth data analysis such as
Data classification
Clustering and
Characterization of data changes over time
The abundance of data, coupled with the need for powerful data analysis tools has
been described as data rich but information poor situation
-
8/2/2019 Chapter One(2004)
14/60
14
What is Data Mining & Data warehousing?
Data mining (Knowledge Discovery in Databases) consists of iterative sequences
of the following seven steps
1. Learning the application domain:
Learn relevant prior knowledge to be used in the DM process
Learn the goals of DM application
2. Create target data set (data selection)
Data cleaning and preprocessing: (may take 60% of effort!)
Data reduction and transformation:
Find useful features, dimensionality/variable reduction, invariant representation.
3. Choosing functions of data mining
summarization, classification, regression, association, clustering.
-
8/2/2019 Chapter One(2004)
15/60
15
What is Data Mining & Data warehousing?
4. Choosing the mining algorithm(s)
Data mining: search for patterns of interest
5. Identify the relevant knowledge by measuring the mining resultinterestingness
Pattern evaluation
6. Present the knowledge to the user
knowledge presentation visualization, transformation, removing redundant patterns, etc.
7. Use of discovered knowledge
-
8/2/2019 Chapter One(2004)
16/60
16April 7, 2012 16
Data mining Models
The above seven steps are usually standardized with various data models
The most common standard in data mining is the CRoss-Industry Standard
Process for Data Mining (CRISP-DM),
CRISP-DM has the following steps
Business/research understanding (Learning the application domain),
Data understanding (data selection for the problem)
Data preparation which involves
collecting, cleaning, consolidating and amalgamating records, summarizing fields,
checking for data integrity, detecting irregularities and illegal attributes, filling in for
missing values, trimming outliers.
Data modeling involves
selecting data mining tools, transforming the data if the tools require it, generating
samples for training and testing the model and finally using the tools to build and
select a model
Evaluating the model and
Deploying the model
-
8/2/2019 Chapter One(2004)
17/60
17April 7, 2012 17
Data mining Models
-
8/2/2019 Chapter One(2004)
18/60
18April 7, 2012 18
Data Mining: A KDD Process
Data mining: the core of knowledgediscovery in Database
Data Cleaning
Data Integration
Databases
Data Warehouse
Task-relevant Data
Selection & Transformation
Data Mining
Pattern Evaluation
-
8/2/2019 Chapter One(2004)
19/60
19April 7, 2012 19
Data Mining and Business Intelligence
Increasing potentialto support
business decisions End User
Business Analyst
Data Analyst
DBA
Making
Decisions
Data PresentationVisualization Techniques
Data Mining
Information Discovery
Data Exploration
OLAP
Statistical Analysis, Querying and Reporting
Data Warehouses / Data Marts
Data SourcesPaper, Files, Information Providers, Database Systems, OLTP
-
8/2/2019 Chapter One(2004)
20/60
20April 7, 2012 20
Architecture of a Typical Data Mining
System
Data
Warehouse
Data cleaning & data integration Filtering
Databases
Databaseordata
warehouseserver
Dataminingengine
Patternevaluation
Graphical user interface
Knowledge-base
-
8/2/2019 Chapter One(2004)
21/60
21April 7, 2012 21
Why Data Mining?
Potential Applications
Data mining has many and varied fields of application
some of which are listed below.1 Retail/Marketing
2 Banking
3 Insurance and Health Care
4 Transportation
5 Medicine
-
8/2/2019 Chapter One(2004)
22/60
22April 7, 2012 22
Potential Applications: Retail/Marketing Analysis
Market analysis is a tool companies use in order to better understand the
environment in which they operate.
It is one of the main steps in the development of a marketing plan which
involves critically reviewing and organizing collected data so that it can be
used in making strategic marketing decisions
Retailers can use information collected through affinity programs (e.g.,
shoppers club cards, frequent flyer points, contests) to assess the effectiveness
of product selection and placement decisions, coupon offers, and which
products are often purchased together.
-
8/2/2019 Chapter One(2004)
23/60
23April 7, 2012 23
Potential Applications: Retail/Marketing Analysis
Companies such asbanking service providers andmusic clubs can use data
mining to create a churn analysis, to assess which customers are likely to
remain as subscribers and which ones are likely to switch to a competitor
The source of data can be credit card transactions, loyalty cards, discount
coupons, customer complaint calls, plus (public) lifestyle studies
-
8/2/2019 Chapter One(2004)
24/60
24April 7, 2012 24
Potential Applications: Retail/Marketing Analysis
The potential application involves
Target marketing analysis which involves finding clusters model
customers who share the same characteristics, interest, income level,
spending habits, etc to participate in specific market.
Predict response to mailing, calling, advertizing campaigns
Determine customer purchasing patterns over time: Finding pattern of
customer on buying items: based on attribute or combination of attributes
such as marital status, age, salary, citizen, type of credit card, family
condition, etc
Cross-market analysis Associations/co-relations between product sales
Prediction based on the association information
-
8/2/2019 Chapter One(2004)
25/60
25April 7, 2012 25
Potential Applications: Retail/Marketing Analysis
The potential application involves
Customer profiling
what types of customers buy what products (clustering or classification)
Find associations among customer demographic characteristics (age, name,
address, profession, sex, etc)
Identify buying patterns from customers
Identifying customer requirements
identifying the best products for different customers
use prediction to find what factors will attract new customers
Provides summary information
various multidimensional summary reports for decision making
statistical summary information (data central tendency and variation)
-
8/2/2019 Chapter One(2004)
26/60
26April 7, 2012 26
Potential Applications: Retail/Marketing Analysis
The potential application involves
market basket analysis:
identifying group of items that customers usually buy together
Market segmentation
Gathering demographic, geographic, behavioral and physiological information
about a customer and cluster them for proper handling
Risk analysis and management
Forecasting, customer retention, improved underwriting, quality control,
competitive analysis
-
8/2/2019 Chapter One(2004)
27/60
27April 7, 2012 27
Potential Applications: Fraud detection andmanagement
Detecting outliers and manage them before they destroy the organization
environment
Widely used in health care, retail, credit card services, telecommunications
(phone card fraud), etc.
Use historical data to build models of fraudulent behavior and use data
mining to help identify similar instances
Potential Applications: Fraud detection and
-
8/2/2019 Chapter One(2004)
28/60
28April 7, 2012 28
Potential Applications: Fraud detection andmanagement
Examples
Auto insurance:
detect a group of people who stage accidents to collect on insurance
Money laundering:
detect suspicious money transactions in banking network
Medical insurance:
detect professional patients and ring of doctors and ring of references
Detecting inappropriate medical treatment
Detecting telephone fraud
detect users of telephone line which either hijacked or stolen from acustomer
Potential Applications: Fraud detection and
-
8/2/2019 Chapter One(2004)
29/60
29April 7, 2012 29
Potential Applications: Fraud detection andmanagement
Examples
Telephone call model:
destination of the call, duration, time of day or week. Analyze patterns that deviatefrom an expected norm.
Telecom can identify discrete groups of callers with frequent intra-groupcalls, especially mobile phones, and broke a possible multimillion dollar
fraud.
Retail: Analysts estimate that 38% of retail shrink is due to dishonest employees.
-
8/2/2019 Chapter One(2004)
30/60
30April 7, 2012 30
Potential Applications: Banking
Detect patterns of fraudulent credit card use
Identify loyal' customers
Predict customers likely to change their credit card affiliation
Determine credit card spending by customer groups
Find hidden correlations between different financial indicators Identify stock trading rules from historical market data
-
8/2/2019 Chapter One(2004)
31/60
31April 7, 2012 31
Potential Applications: Insurance and Health Care
Claims analysis - i.e which medical procedures are claimed together
Predict which customers will buy new policies
Identify behavior patterns of risky customers
Identify fraudulent behavior
-
8/2/2019 Chapter One(2004)
32/60
32April 7, 2012 32
Potential Applications: Transportation
Determine the distribution schedules among outlets
Analyze loading patterns
-
8/2/2019 Chapter One(2004)
33/60
P t ti l A li ti
-
8/2/2019 Chapter One(2004)
34/60
34April 7, 2012 34
Potential Applications: Others
Text mining (news group, email, documents) and Web analysis.
Intelligent query answering
Sports
Astronomy
-
8/2/2019 Chapter One(2004)
35/60
35April 7, 2012 35
Potential Applications: Corporate Analysisand Risk Management
Finance planning and asset evaluation
cash flow analysis and prediction
contingent claim analysis to evaluate assets
cross-sectional and time series analysis (financial-ratio, trend analysis,
etc.) Resource planning:
summarize and compare the resources and spending
Competition:
monitor competitors and market directions
group customers into classes and a class-based pricing procedure
set pricing strategy in a highly competitive market
-
8/2/2019 Chapter One(2004)
36/60
36April 7, 2012 36
Data Mining: On What Kind of Data?
Relational databases
Data warehouses
Transactional databases
Advanced DB and information repositories
Object-oriented and object-relational databases
Spatial databases
Time-series data and temporal data
Text databases and multimedia databases
Heterogeneous and legacy databases
WWW
-
8/2/2019 Chapter One(2004)
37/60
37April 7, 2012 37
Data Mining Functionalities
Data mining can be performed on various types of data stores and
Databases
Data mining functionalities are used to specify the kind of patterns to
be found in data mining task
The kind of pattern mined form a given data is not known for the
user.
Techniques should be implemented to extract various pattern from
the available data so that user can choose what they need to use.
There are different kinds of data mining functionalities (tasks) that
can be used to extract various types of pattern from data
-
8/2/2019 Chapter One(2004)
38/60
38April 7, 2012 38
Data Mining Functionalities
Generally data mining task can be broadly classified as
Descriptive (un supervised)
Predictive (supervised)
Descriptive data mining task characterize the general properties of the data
in a database.
This kind of data mining is usually unsupervised
Predictive data mining task perform inference on the current data in order tomake prediction to the future reference
This is usually supervised technique
-
8/2/2019 Chapter One(2004)
39/60
39April 7, 2012 39
Data Mining Functionalities
The supervised predictive data mining functionalities includes
Classification
Regression
Time series
Estimation
The unsupervised descriptive data mining functionalities includes Concept /class description: Characterization (summarization) and discrimination
Association Analysis
Clustering analysis
Outlier analysis
Evolution analysis
Data Mining Functionalities:
-
8/2/2019 Chapter One(2004)
40/60
40April 7, 2012 40
Data Mining Functionalities:Classification
Classification is the process of finding a set of models (or functions) that
describe and distinguish data classes or concepts for the purpose of being able
to use the model to predict the class of an object whose class is unknown
The derived class is based on training data set and can be represented in
various forms such as classification IFTHEN rule, decision tree,
mathematical formulae or neural networks
Prediction is the process of predicting some missing or unavailable data values
rather than class labels.
Finding models (functions) that describe and distinguish classes or concepts
for future prediction
Data Mining Functionalities:
-
8/2/2019 Chapter One(2004)
41/60
41April 7, 2012 41
Data Mining Functionalities:Regression
Regression is the process of predicting some missing or unavailable data values
rather than class labels.
Finding models (functions) that describe and distinguish classes or concepts
for future prediction
Data Mining Functionalities
-
8/2/2019 Chapter One(2004)
42/60
42April 7, 2012 42
Data Mining FunctionalitiesConcept/class description: Characterization and discrimination
Given a class/classes with data that belongs to the class, describe the class by
making observation of its members. Hence one can describe individual classes or concepts in a summarized,
concise and yet precise terms which is called class/concept description.
These description can be derived via data characterization or data
discrimination or both
Data characterization refers to summarizing the data of the class under
consideration (target class) in general term
For example one may characterize the item class as a class in which 90% of the objects are
computer and its peripheral
Data discrimination is description made by making comparative analysisbetween the target class with the other comparative class (contrasting classes)
For example one may discriminate item class from other class like customer and order class
by saying the item class attributes get modified more frequently than others
a a n ng unc ona es:
-
8/2/2019 Chapter One(2004)
43/60
43April 7, 2012 43
a a n ng unc ona es:Association Analysis
Association analysis is the discovery of association rules showing attribute-
value conditions that occur frequently together in a given set of data
Association rules are of the form XY [Support = s%, confidence = c%] where
X is conjunctions of attributes and Y is conjunctions of values and interpreted as
if X then it is likely to happen Y with support s% and confidence c%.
For example
age(X, 20..29) ^ income(X, 20..29K)buys(X, PC) [support = 2%,confidence = 60%]
Interpreted as any one whose age ranges from 20 to 29 and income
range is from 20 to 29K likely buy PC with support 2% and confidence
of 60%
Support shows the probability that all the predicates in X and Y fulfill
together. i.e. P(X U Y)
Confidence shows if predicates in X fulfilled then the predicate in Y is also
fulfilled with the stated percentage. i.e. P(Y | X)
Data Mining Functionalities
-
8/2/2019 Chapter One(2004)
44/60
44April 7, 2012 44
Data Mining FunctionalitiesAssociation Analysis
Example 2:
contains(T, computer)contains(T, software) [1%, 75%]
Interpreted as if Item T contains computer it is also likely to contain
software with support 1% and confidence 75%
In the above two examples, Age, Income, buys and Contains are calledattributes or predicates
An attribute is a value if it is after the implication sign
Association rule can be Multi-dimensional (more than 1 predicate in X and Y)
or single-dimensional association rule (only one predicate in both X and Y)
For example the association rule in example 1 is multi-dimensional where as in
example two is single dimensional
Data Mining Functionalities
-
8/2/2019 Chapter One(2004)
45/60
45April 7, 2012 45
Data Mining Functionalities
Cluster analysis
In cluster Analysis, class labels are unknown and a group of data is
given to be classified.
Cluster analysis group data to form new classes, e.g., cluster houses
to find distribution patterns
Clustering based on the principle:
maximizing the intra-class similarity and minimizing the interclass
similarity
-
8/2/2019 Chapter One(2004)
46/60
46April 7, 2012 46
Data Mining Functionalities
Outlier analysis
Database may contain data object that do not comply with the general
behavior or model of the data.
These data objects are outliers.
Usually outlier data items are considered as noise or exception in many
data mining applications
However, in some application such as fraud detection, the rare events can
be more interesting than the more regularly occurring ones.
The analysis of outlier data is referred to as outlier mining
-
8/2/2019 Chapter One(2004)
47/60
47April 7, 2012 47
Data Mining Functionalities
Trend and evolution analysis
Describe and model regularities or trends for objects whose
behavior changes over time.
Though this may include characterization, discrimination,
association or clustering of time related data, distinct features of
such an analysis include time series data analysis, sequence or
periodicity pattern matching and similarity based data analysis.
It is also referred as regression analysis, sequential pattern mining,
periodicity analysis, similarity-based analysis
A All th Di d P tt I t ti ?
-
8/2/2019 Chapter One(2004)
48/60
48April 7, 2012 48
Are All the Discovered Patterns Interesting?
A data mining system/query may generate thousands of patterns, not all of them
are interesting Questions
What makes a pattern interesting?
Can a data mining system generate all of the interesting patterns?
Can a data mining system generate only interesting patterns?
Answers for all the three questions will be given bellow
Q i 1
-
8/2/2019 Chapter One(2004)
49/60
49April 7, 2012 49
A pattern is interesting if it is easily understood by humans, valid on
new or test data with some degree of certainty, potentially useful, novel,
or validates some hypothesis that a user seeks to confirm
An interesting pattern represents knowledge
Measure of Interestingness measures
Two types (Objective vs. subjective)
Objective: based on statistics and structures of patterns, e.g., support,
confidence, etc.
Subjective:based on users belief in the data, e.g., unexpectedness
(contradicting a users belief), novelty, actionability, etc.
Question 1
Are All the Discovered Patterns Interesting?
Q i 2
-
8/2/2019 Chapter One(2004)
50/60
50April 7, 2012 50
Referred as Completeness of the data mining algorithm
No single data mining system is complete but users can set a constraint on
the type of pattern they are looking for in which the data mining function
generate all the pattern with the specified constraints
Association algorithms dont find classification pattern and others for
example
Question 2
Can We Find All Interesting Patterns?
-
8/2/2019 Chapter One(2004)
51/60
51April 7, 2012 51
This is an Optimization problem in data mining system
it remain an challenging issue
Approaches in the research topic
First generate all the patterns and then filter out the uninteresting
ones.
Generate only the interesting patternsmining query optimization
Question 3
Can We Find Only Interesting Patterns?
-
8/2/2019 Chapter One(2004)
52/60
52April 7, 2012 52
Data Mining: Confluence of Multiple Disciplines
Data Mining
Database
TechnologyStatistics
Other
Disciplines
Information
Science
Machine
LearningVisualization
D Mi i Cl ifi i S h
-
8/2/2019 Chapter One(2004)
53/60
53April 7, 2012 53
Data Mining: Classification Schemes
General functionality
Descriptive data mining
Predictive data mining
Different views, different classifications Kinds of databases to be mined
Kinds of knowledge to be discovered
Kinds of techniques utilized
Kinds of applications adapted
-
8/2/2019 Chapter One(2004)
54/60
54April 7, 2012 54
A Multi-Dimensional View of Data Mining Classification
Databases to be mined Relational, transactional, object-oriented, object-relational, active,
spatial, time-series, text, multi-media, heterogeneous, legacy, WWW,etc.
Knowledge to be mined
Characterization, discrimination, association, classification, clustering,trend, deviation and outlier analysis, etc.
Multiple/integrated functions and mining at multiple levels
Techniques utilized
Database-oriented, data warehouse (OLAP) oriented, machinelearning, statistics, visualization, neural network, etc.
Applications adapted
Retail, telecommunication, banking, fraud analysis, DNA mining, stockmarket analysis, Web mining, Weblog analysis, etc.
-
8/2/2019 Chapter One(2004)
55/60
55April 7, 2012 55
Major Issues in Data Mining
The major issue in data mining includes the following aspects Mining methodology and user interaction
Performance and scalability
Issues related to the diversity of data types
Issues related to applications and social impacts These will be briefly discussed in the following slides
a or ssues:
-
8/2/2019 Chapter One(2004)
56/60
56April 7, 2012 56
Mining methodology and user interaction Reflects the kind of knowledge mined, the ability to mine knowledge at different
granularities, the use of domain knowledge, and knowledge visualization
Mining different kinds of knowledge in databases for different users
Require different data mining functionalities and algorithms
Interactive mining ofknowledge at multiple levels of abstraction
Enable to verify the discovered knowledge is relevant or not for the intended purpose
Incorporation of background knowledge Background knowledge is vital in refining the data mining performance and evaluate its
relevance
Data mining query languages and ad-hoc data mining
Such query language enable to retrieve knowledge from a data warehouse just like what SQL
does from the database in a DBMS
Presentation and visualization of data mining results
Result should be presented visually or using high level natural language
Handling noise, incomplete data and exceptions
Pattern evaluation: the interestingness problem
Major Issues:
-
8/2/2019 Chapter One(2004)
57/60
57April 7, 2012 57
Major Issues:Performance and scalability
This includes Efficiency and scalability of data mining algorithms
Parallel, distributed and incremental mining methods for huge amount of
data
Major Issues:
-
8/2/2019 Chapter One(2004)
58/60
58April 7, 2012 58
Major Issues:Issues related to the diversity of data types
This involves
Handling relational and complex types of data.
Relational data need efficient and effective data mining system as it is
widely used
Complex data refers to data such as hyper text, multimedia, spatial, temporaland transactional which further require attention to do data mining on such
data
Mining information from heterogeneous databases and
global information systems (WWW)
Which is a major issue to use as a source of data in data mining
Major Issues:
-
8/2/2019 Chapter One(2004)
59/60
59April 7, 2012 59
Major Issues:Issues related to applications and social impacts
Data mining also concerned on issues related to its application
and the impact it may have on the social aspect
These includes
Identifying the application of discovered knowledge which can be
Domain-specific data mining tools
Intelligent query answering
Process control and decision making
How to integrate the discovered knowledge with existing knowledge:A
knowledge fusion problem
Protection of data security, integrity, and privacy that may have socialimpact
-
8/2/2019 Chapter One(2004)
60/60
Summary Data mining: discovering interesting patterns from large amounts of data
A natural evolution of database technology, in great demand, with wideapplications
A KDD process includes data cleaning, data integration, data selection,
transformation, data mining, pattern evaluation, and knowledge presentation
Mining can be performed in a variety of information repositories
Data mining functionalities: characterization, discrimination, association,
classification, clustering, outlier and trend analysis, etc.
Classification of data mining systems
Major issues in data mining