Data Mining – A First View Roiger & Geatz

Upload: evan-crawford

Post on 12-Jan-2016



Data Mining – A First View

Roiger & Geatz


Definition

Data mining is the process of employing one or more computer learning techniques to automatically analyze and extract knowledge contained within a database.
Knowledge Discovery in Databases (KDD) is used as another name for data mining.
The knowledge gained from a data mining session takes the form of a model or generalization of the data.
Induction-based learning – generalizing by observing specifics.
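To make "generalize by observing specifics" concrete, here is a minimal sketch in the style of the Find-S procedure, which is a swapped-in technique and not something the slides name. The attribute names, the example customers, and the `generalize` and `covers` helpers are all hypothetical.

```python
# Induction sketch: start from one specific example and relax only the
# attributes that later examples contradict. All data here is invented.

def generalize(positive_examples):
    """Return the most specific attribute pattern covering all examples."""
    hypothesis = dict(positive_examples[0])
    for example in positive_examples[1:]:
        for attr, value in example.items():
            if hypothesis[attr] != value:
                hypothesis[attr] = "?"  # "?" means any value is acceptable
    return hypothesis

def covers(hypothesis, instance):
    """Check whether an instance matches the generalized pattern."""
    return all(v == "?" or instance[a] == v for a, v in hypothesis.items())

# Hypothetical specifics: customers who responded to a promotion.
responders = [
    {"age": "young", "owns_card": "yes", "region": "east"},
    {"age": "young", "owns_card": "yes", "region": "west"},
]

model = generalize(responders)
print(model)
print(covers(model, {"age": "young", "owns_card": "yes", "region": "north"}))
```

The resulting pattern is the "model or generalization" in miniature: it keeps what the observed cases share and drops what varies among them.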


What Can Computers Learn?

Facts
Concepts
Procedures
Principles
Computers are good at learning concepts – concepts are the outputs of a data mining session.


Three Concept Views

Classical view – all concepts have definite defining properties.
Probabilistic view – concepts are represented by properties that are probable of concept members.
Exemplar view – a given instance is determined to be an example of a particular concept if the instance is similar enough to a set of one or more known examples of that concept.
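The difference between the classical and exemplar views can be sketched in a few lines. This is only an illustration under stated assumptions: the "good credit risk" concept, the attribute names, the thresholds, and the distance scaling are all invented and do not come from the slides.

```python
# Hypothetical concept: "good credit risk". Attribute names, thresholds,
# and example values are invented for illustration.

def classical_view(person):
    """Classical view: membership follows from definite defining properties."""
    return person["income"] >= 30000 and person["years_employed"] >= 5

def exemplar_view(person, exemplars, max_distance=1.0):
    """Exemplar view: membership follows from similarity to known examples."""
    def distance(a, b):
        # Scale each attribute so income and years contribute comparably.
        return (abs(a["income"] - b["income"]) / 10000
                + abs(a["years_employed"] - b["years_employed"]) / 5)
    return any(distance(person, e) <= max_distance for e in exemplars)

known_good_risks = [
    {"income": 45000, "years_employed": 8},
    {"income": 32000, "years_employed": 6},
]
applicant = {"income": 29000, "years_employed": 4}

print(classical_view(applicant))                   # False
print(exemplar_view(applicant, known_good_risks))  # True
```

The applicant narrowly misses the hard rule but is close to a known member, so the two views disagree. A probabilistic view would sit between them, scoring how typical each property is among concept members.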


Supervised Learning

Also known as induction-based supervised concept learning.
Attribute-value matrix – Table 1.1.
Decision tree.
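As a rough illustration of supervised learning from an attribute-value matrix, the sketch below builds a one-level decision tree (a decision stump) by picking the attribute whose values best predict the class. The rows are hypothetical and are not the book's Table 1.1; `build_stump` and `classify` are illustrative names, and a real decision tree learner would split recursively using a measure such as information gain.

```python
from collections import Counter, defaultdict

# Hypothetical attribute-value matrix: each row is an instance and
# "diagnosis" is the class to predict. Not the book's Table 1.1.
rows = [
    {"fever": "yes", "swollen_glands": "yes", "diagnosis": "strep throat"},
    {"fever": "no",  "swollen_glands": "no",  "diagnosis": "allergy"},
    {"fever": "yes", "swollen_glands": "no",  "diagnosis": "cold"},
    {"fever": "no",  "swollen_glands": "yes", "diagnosis": "strep throat"},
    {"fever": "no",  "swollen_glands": "no",  "diagnosis": "cold"},
    {"fever": "yes", "swollen_glands": "no",  "diagnosis": "cold"},
]

def build_stump(rows, target):
    """Pick the one attribute whose values best predict the target class."""
    best = None
    for attr in rows[0]:
        if attr == target:
            continue
        by_value = defaultdict(Counter)
        for row in rows:
            by_value[row[attr]][row[target]] += 1
        # Each branch predicts the majority class among its training rows.
        branches = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        correct = sum(c.most_common(1)[0][1] for c in by_value.values())
        if best is None or correct > best[2]:
            best = (attr, branches, correct)
    return best[0], best[1]

def classify(stump, instance):
    """Follow the single test in the stump to a predicted class."""
    attr, branches = stump
    return branches[instance[attr]]

stump = build_stump(rows, "diagnosis")
print(stump[0])
print(classify(stump, {"fever": "yes", "swollen_glands": "yes"}))
```

The class column is what makes this supervised: the learner is told the correct answer for every training row and only has to find the attribute tests that reproduce it.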


Unsupervised Clustering

Builds models without predefined classes.
Table 1.3.
Example questions.
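One common way to build a model without predefined classes is k-means clustering. The slides do not name an algorithm, so this is a swapped-in example, and the customer data (age, trades per month) is hypothetical rather than taken from Table 1.3. The sketch uses a deterministic start (the first k points) to keep the result reproducible.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=10):
    """Tiny k-means: group points around k centers without class labels."""
    centers = list(points[:k])  # deterministic start: the first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical customers: (age, trades per month). No class column is given.
customers = [(25, 2), (27, 3), (24, 1), (52, 20), (55, 22), (50, 19)]

centers, clusters = kmeans(customers, k=2)
for cluster in clusters:
    print(cluster)
```

Unlike the supervised case, nothing tells the algorithm what the groups mean. Interpreting them (for example, "younger occasional traders" versus "older frequent traders") is left to the analyst.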


Data Mining?

Can we clearly define the problem?
Does potentially meaningful data exist?
Does the data contain hidden knowledge? Or is the data factual and useful for reporting purposes only?


Data Mining or Data Query

Shallow knowledge – factual, easily stored and manipulated. SQL is a good tool.
Multidimensional knowledge – also factual, but organized along multiple dimensions. OLAP tools are a good fit.
Hidden knowledge – patterns and regularities in data that are not easily found with SQL. Data mining algorithms are needed.
Deep knowledge – knowledge in a database that can be found only with some direction about what to look for. Current data mining tools are ineffective.
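The first two knowledge types can be illustrated with ordinary queries. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `sales` table. The `GROUP BY` query is only a stand-in for the kind of roll-up an OLAP tool provides, not an OLAP tool itself.

```python
import sqlite3

# Hypothetical sales table; names and figures are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", "Q1", 100.0), ("east", "Q2", 150.0),
        ("west", "Q1", 200.0), ("west", "Q2", 50.0),
    ],
)

# Shallow knowledge: a stored fact, retrieved directly with SQL.
shallow = conn.execute(
    "SELECT amount FROM sales WHERE region = ? AND quarter = ?",
    ("east", "Q1"),
).fetchone()[0]

# Multidimensional knowledge: the same facts summarized along one dimension.
by_region = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
)

print(shallow)    # 100.0
print(by_region)
conn.close()
```

In both queries the analyst already knows what to ask for. Hidden knowledge is different in that the pattern itself is unknown in advance, which is why a query language alone is not enough.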


Expert Systems or Data Mining

Data Mining: Data → data mining tool → knowledge

Expert Systems: Human expert → knowledge engineer → ES building tool → knowledge


Data Mining Applications

Fraud detection
Health care
Business and finance
Scientific applications
Sports and gaming