Post on 04-Nov-2014

WXES2103 DATABASE

DATA MINING

PRODUCED BY:

1) MAZNI YAHYA

2) NUR FARIDZATUL HASHIMAH SHAHRI IES080061

3) MURNIANA SHAZWEN SHAFIEN

4) SITI BASYIRAH AWANG

DEFINITION

Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses.

5 MAJOR ELEMENTS OF DATA MINING

Extract, transform, and load transaction data onto the data warehouse system.

Store and manage the data in a multidimensional database system.

Provide data access to business analysts and information technology professionals.

Analyze the data by application software.

Present the data in a useful format, such as a graph or table.

HOW DOES DATA MINING WORK?

Classes

Clusters

Association

Sequential patterns

Classes

Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.
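The restaurant example can be sketched in code. The snippet below is a minimal, hypothetical version: purchase records are assigned to predetermined classes (lunch vs. dinner) and the typical order per class is counted. The data and the class boundary are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical purchase records: (hour of visit, item ordered).
purchases = [
    (12, "burger"), (12, "salad"), (13, "burger"),
    (18, "pizza"), (19, "pizza"), (19, "pasta"), (20, "pizza"),
]

def classify_visit(hour):
    """Assign each visit to a predetermined class (lunch vs dinner)."""
    return "lunch" if hour < 15 else "dinner"

# Group orders by class and find the most typical item per class.
by_class = defaultdict(Counter)
for hour, item in purchases:
    by_class[classify_visit(hour)][item] += 1

for cls, counts in sorted(by_class.items()):
    item, n = counts.most_common(1)[0]
    print(f"{cls}: top item = {item} ({n} orders)")
```

Knowing that pizza dominates dinner orders is exactly the kind of finding a chain could turn into a daily special.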

Clusters

Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.
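A minimal clustering sketch of the segmentation idea, assuming one-dimensional spending data and two hypothetical segments; this is a bare-bones k-means in pure Python, not any particular product's algorithm.

```python
# Minimal 1-D k-means sketch over hypothetical customer spending data.
def kmeans_1d(values, centers, iters=10):
    """Group values around k centers by nearest-center assignment."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to its closest current center.
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Move each center to the mean of its assigned values.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Two plausible market segments: low spenders vs high spenders.
spend = [10, 12, 11, 95, 99, 102]
centers, clusters = kmeans_1d(spend, centers=[0.0, 50.0])
print(centers)  # the two segment means
```

The two recovered centers are the "market segments" the slide describes: groups discovered from the data rather than defined in advance.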

Associations

Data can be mined to identify associations. The

beer-diaper example is an example of associative mining.
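The beer-diaper rule can be made concrete with support and confidence counts over a hypothetical list of market baskets:

```python
# Support/confidence sketch for the beer-diaper association rule,
# computed over an invented list of market baskets.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "bread"},
    {"diapers", "milk"},
    {"bread", "milk"},
]

def support(itemset):
    """Fraction of baskets containing every item in the itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent):
    """P(consequent in basket | antecedent in basket)."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"beer", "diapers"}))        # 0.4
print(confidence({"diapers"}, {"beer"}))   # 2 of 3 diaper baskets have beer
```

A rule is reported when both numbers clear chosen thresholds, e.g. support ≥ 0.3 and confidence ≥ 0.6.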

Sequential Patterns

• Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.
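A rough sketch of the idea, using hypothetical ordered purchase histories to estimate how often a backpack purchase follows a sleeping-bag purchase:

```python
# Sequential-pattern sketch: from invented, ordered purchase histories,
# estimate how often "backpack" follows a "sleeping bag" purchase.
histories = [
    ["sleeping bag", "hiking shoes", "backpack"],
    ["sleeping bag", "backpack"],
    ["sleeping bag", "tent"],
    ["hiking shoes", "socks"],
]

def follows(history, earlier, later):
    """True if `later` appears somewhere after `earlier` in the history."""
    if earlier in history and later in history:
        return history.index(later) > history.index(earlier)
    return False

with_bag = [h for h in histories if "sleeping bag" in h]
rate = sum(follows(h, "sleeping bag", "backpack") for h in with_bag) / len(with_bag)
print(rate)  # share of sleeping-bag buyers who later bought a backpack
```

Unlike plain association mining, the order of purchases matters here: the rule fires only when the backpack comes after the sleeping bag.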

EVOLUTIONARY STEPS IN DATA MINING

Data Collection (1960s)

• Business question: "What was my total revenue in the last five years?"
• Enabling technologies: computers, tapes, disks
• Product providers: IBM, CDC
• Characteristics: retrospective, static data delivery

Data Access (1980s)

• Business question: "What were unit sales in New England last March?"
• Enabling technologies: relational databases (RDBMS), Structured Query Language (SQL), ODBC
• Product providers: Oracle, Sybase, Informix, IBM, Microsoft
• Characteristics: retrospective, dynamic data delivery at record level

Data Warehousing & Decision Support (1990s)

• Business question: "What were unit sales in New England last March? Drill down to Boston."
• Enabling technologies: on-line analytic processing (OLAP), multidimensional databases, data warehouses
• Product providers: Pilot, Comshare, Arbor, Cognos, Microstrategy
• Characteristics: retrospective, dynamic data delivery at multiple levels

Data Mining (Emerging Today)

• Business question: "What’s likely to happen to Boston unit sales next month? Why?"
• Enabling technologies: advanced algorithms, multiprocessor computers, massive databases
• Product providers: Pilot, Lockheed, IBM, SGI, numerous startups (nascent industry)
• Characteristics: prospective, proactive information delivery

SCOPE OF DATA MINING

• Automated prediction of trends and behaviors

Data mining automates the process of finding predictive information in large databases. Questions that traditionally required extensive hands-on analysis can now be answered directly from the data, and quickly. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events.

• AUTOMATED DISCOVERY OF PREVIOUSLY UNKNOWN PATTERNS

Data mining tools sweep through databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors.

Techniques

Neural Network

Decision Tree

Visualisation

Link Analysis

Neural Network

• Used in a black-box fashion.

• One creates a test data set, lets the neural network learn patterns based on known outcomes, then sets the neural network loose on huge amounts of data.

• For example, a credit card company has 3,000 records, 100 of which are known fraud records.

• The data set trains the neural network so that it learns the difference between the fraud records and the legitimate ones.
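As a rough illustration of that train-then-apply workflow (not any vendor's actual network), the sketch below trains a single logistic neuron on a hypothetical labelled set of transactions, then scores new ones. The feature and amounts are invented.

```python
import math

# Feature: transaction amount in thousands; label: 1 = known fraud.
train = [(0.1, 0), (0.2, 0), (0.3, 0), (2.5, 1), (3.0, 1), (2.8, 1)]

# Train a single logistic neuron by stochastic gradient descent on log loss.
w, b = 0.0, 0.0
for _ in range(2000):
    for x, y in train:
        p = 1 / (1 + math.exp(-(w * x + b)))
        w -= 0.1 * (p - y) * x   # gradient step for the weight
        b -= 0.1 * (p - y)       # gradient step for the bias

def score(x):
    """Estimated probability that a transaction is fraudulent."""
    return 1 / (1 + math.exp(-(w * x + b)))

print(score(0.15), score(2.9))  # low-risk vs high-risk transaction
```

A real neural network stacks many such neurons in layers, but the loop is the same: fit to known outcomes, then score unseen records.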

Link analysis

• This is another technique for associating like records.

• Not used very much, but there are some tools created just for this.

• As the name suggests, the technique tries to find links, whether between customers or between transactions, and to demonstrate those links.

Visualisation

• Helps users understand their data.

• Bridges the gap from text-based to graphical presentation.

• Decision tree, rule, cluster and pattern visualizations help users see data relationships rather than read about them.

• Many of the stronger data mining programs have made strides in improving their visual content over the past few years.

Decision Tree

• Uses real data mining algorithms.

• Decision trees help with classification and produce very descriptive output, helping users to understand their data.

• A decision tree process will generate the rules followed in a process.

• For example, a lender at a bank goes through a set of rules when approving a loan.

• Based on the loan data a bank has, the outcomes of those loans and the acceptable levels of default, the decision tree can set up the guidelines for the lending institution.
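The lending example can be illustrated with a one-level decision stump that derives a rule from hypothetical loan outcomes; real decision tree tools grow many such splits, but each split is found the same way.

```python
# Decision-stump sketch of rule generation from loan outcomes: find the
# income threshold that best separates repaid from defaulted loans,
# then state it as a lending rule. The data is invented.
loans = [(20, "default"), (25, "default"), (30, "default"),
         (45, "repaid"), (60, "repaid"), (80, "repaid")]  # (income, outcome)

def accuracy(threshold):
    """How well 'approve if income >= threshold' matches past outcomes."""
    correct = sum((income >= threshold) == (outcome == "repaid")
                  for income, outcome in loans)
    return correct / len(loans)

# Try each observed income as a candidate threshold; keep the best.
best = max((income for income, _ in loans), key=accuracy)
print(f"RULE: approve loan if income >= {best} "
      f"(matches {accuracy(best):.0%} of past outcomes)")
```

The output is a human-readable rule, which is why the slide calls decision trees "very descriptive": the model is the guideline.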

PROCESS STAGES

1) The initial exploration

2) Model building or pattern identification with validation/verification

3) Deployment

Stage 1: Exploration

• This stage usually starts with data preparation, which may involve cleaning the data, data transformations, selecting subsets of records and, in the case of data sets with large numbers of variables ("fields"), selecting an initial subset of the variables.

Stage 2: Model building and validation

• This stage involves considering various models and choosing the best one based on their predictive performance, i.e. explaining the variability in question and producing stable results across samples.
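A minimal sketch of that selection step, comparing two hypothetical candidate models by accuracy on a held-out sample; the models and data are toy stand-ins.

```python
# Stage 2 in miniature: score each candidate model on held-out
# (feature, label) pairs and keep the better performer.
holdout = [(1.5, 0), (3.5, 1), (2.5, 1)]

candidates = {
    "threshold@2.5": lambda x: int(x >= 2.5),  # simple cut-off model
    "always-zero":   lambda x: 0,              # trivial baseline
}

def accuracy(model, data):
    """Fraction of held-out examples the model predicts correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

best = max(candidates, key=lambda name: accuracy(candidates[name], holdout))
print(best)  # the model that generalizes better on the held-out sample
```

Scoring on data the models never saw is what makes the comparison a test of stability across samples rather than of fit to the training set.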

Process Models

• CRISP-DM: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment

• Six Sigma (DMAIC): Define, Measure, Analyze, Improve, Control

• SEMMA: Sample, Explore, Modify, Model, Assess

Stage 3: Deployment

This final stage involves taking the model selected as best in the previous stage and applying it to new data in order to generate predictions or estimates of the expected outcome.

DATA MINING SOFTWARE

• KDnuggets and Rexer Analytics have run surveys asking people involved in data mining which software they use most.

• While it is not necessarily true that the most popular software is the best for a particular purpose, such surveys can help guide us in choosing which software to evaluate.

VisuMap

• Helps people to understand high-dimensional, non-linear (multivariate) data.

• Provides the following three groups of services:

- Data Mapping Services

- Clustering Services

- Dynamically Linked Data Views

Mil Shield

• Protects our privacy by removing all tracks of our online/offline PC activities.

• Shreds the contents of the infamous INDEX.DAT file.

• Automatically cleans cookies, history, cache, index.dat files and many others.

Mozenda

• Enables users of all types to easily and affordably extract and manage web data.

• Can set up agents that routinely extract data and publish it to multiple destinations.

• Users can format, repurpose and mash up the data for use in other online/offline applications or intelligence.

• All data is secure and is hosted in a class A data warehouse.

• Can only be accessed over the web securely, via the Mozenda Web Console.

Screen-Scraper

• For extracting data from websites.

• Works much like a database: allows you to mine the data of the world wide web (WWW).

• Provides a graphical interface that lets us designate the URLs and the data elements to extract.

• Mines data from web pages and is fully scriptable.

DataDetective

• An open and flexible data mining solution.

• Easy to integrate with existing processes and information infrastructure.

• Specifically created for non-statisticians and non-experts.

• Features include: building production models, clustering, profiling, network analysis, fuzzy matching, and creating graphs and geographical maps.

Weka

• Includes a wide variety of methods.

• Easy-to-use interface makes it accessible for general users.

• Flexibility and extensibility make it suitable for academic users.

• Is written in Java and released under the GNU General Public License (GPL).

• Can be run on Windows, Linux, Mac and other platforms.

SAS Enterprise Miner

• Part of the SAS suite of analysis software; uses a client-server architecture with a Java-based client, allowing parallel processing and grid computing.

• Can be deployed on both Windows and Linux/Unix platforms.

• User interface: an easy-to-use data-flow GUI.

• Can integrate code written in the SAS language.

• A data mining package with multiple techniques and a data-flow interface.

THANK YOU
