ppt(15 slides)

Optimized Distributed Data Mining

Upload: karthik-jalla

Post on 06-Apr-2018


8/3/2019 Ppt(15 Slides)

http://slidepdf.com/reader/full/ppt15-slides 1/18

Optimized Distributed Data Mining


Introduction

With the explosive growth of information sources available on the World Wide Web, it has become increasingly necessary for users to utilize automated tools to find the desired information resources and to track and analyze their usage patterns.


Applications of Data Mining:

Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions.

Data mining tools can answer business questions that traditionally were too time-consuming to resolve.

Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought online.


What do we mean by data mining?


The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases.

The extraction of hidden predictive information from large databases.


Brief description:

ODAM (Optimized Distributed Association Mining) is a distributed algorithm for geographically distributed data sets that reduces communication costs.

Distributed Association Rule Mining (D-ARM) algorithms have been developed to mine patterns across distributed databases.

Existing D-ARM algorithms cannot discover rules based on higher-order associations between items in distributed textual documents.


ARM (Association Rule Mining)

Association rule mining is an active data mining research area.

Most ARM algorithms are designed for a sequential or centralized environment.

Association rule mining finds interesting associations and/or correlation relationships among a large set of data items.

Association rules provide this kind of information in the form of "if-then" statements.


In addition to the antecedent (the "if" part) and the consequent (the "then" part), an association rule has two numbers that express the degree of uncertainty about the rule:

- Support
- Confidence

Example:

bread => milk | 80%

Association rules can involve more than two items.

bread, milk => jam | 60%


Working of Association Rules

• Itemset X = {x1, x2, x3, …, xn}

• Find all rules X => Y that satisfy a minimum support and a minimum confidence.

• Support, s: the probability that a transaction contains X ∪ Y.

• Confidence, c: the conditional probability that a transaction containing X also contains Y.
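As a small illustration of these two measures, the Java sketch below computes support and confidence by counting matching transactions. It assumes each transaction is a set of item names and uses the five transactions of the market-basket table in this deck (Transaction-ids 10 to 50) as data; the class and method names are hypothetical and not part of ODAM. Both measures are returned as fractions rather than percentages.

```java
import java.util.*;

// Hypothetical helper: support and confidence of a rule X => Y over a list of transactions.
class RuleMeasures {
    // Number of transactions that contain every item in `items`.
    static int countContaining(List<Set<String>> db, Set<String> items) {
        int n = 0;
        for (Set<String> t : db)
            if (t.containsAll(items)) n++;
        return n;
    }

    // support(X => Y) = |transactions containing X union Y| / |all transactions|
    static double support(List<Set<String>> db, Set<String> x, Set<String> y) {
        Set<String> xy = new HashSet<>(x);
        xy.addAll(y);
        return (double) countContaining(db, xy) / db.size();
    }

    // confidence(X => Y) = |transactions containing X union Y| / |transactions containing X|
    // (NaN if no transaction contains X)
    static double confidence(List<Set<String>> db, Set<String> x, Set<String> y) {
        Set<String> xy = new HashSet<>(x);
        xy.addAll(y);
        return (double) countContaining(db, xy) / countContaining(db, x);
    }

    public static void main(String[] args) {
        // The five transactions of the market-basket table (Transaction-id 10 to 50).
        List<Set<String>> db = List.of(
                Set.of("A", "B", "D"), Set.of("A", "C", "D"), Set.of("A", "D", "E"),
                Set.of("B", "E", "F"), Set.of("B", "C", "D", "E", "F"));
        Set<String> a = Set.of("A"), d = Set.of("D");
        System.out.printf("A => D: support %.2f, confidence %.2f%n",
                support(db, a, d), confidence(db, a, d));
        System.out.printf("D => A: support %.2f, confidence %.2f%n",
                support(db, d, a), confidence(db, d, a));
    }
}
```

For the rule A => D this gives a support of 3/5 = 60% and a confidence of 3/3 = 100%, while D => A has the same support but a confidence of only 3/4 = 75%: support is symmetric in X and Y, confidence is not.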


Aim of ARM:

To reduce communication cost and synchronization in the distributed data mining system.

We introduced this new system mainly to address two major issues:

Communication

Synchronization

Reducing communication cost is one of the major advantages of this new system.


Data mining mainly includes the following methods:

a. Association algorithm:

An association rule implies a certain association relationship among a set of objects in a database.

b. Classification:

The process of dividing a dataset into mutually exclusive groups such that the members of each group are as close as possible to one another and different groups are as far as possible from one another, where distance is measured with respect to a specific variable.

c. Clustering algorithm:

The process is the same as above, but the distance is measured with respect to all variables.


Existing system contains


Operation occurrence

An example of a transaction database in tabular form (market basket analysis):

Transaction-id   Items bought
10               A, B, D
20               A, C, D
30               A, D, E
40               B, E, F
50               B, C, D, E, F

How often items occur together in association rule mining is shown in the Venn diagram as follows:

[Venn diagram: "Customer buys milk", "Customer buys bread", "Customer buys both"]


The Apriori Algorithm: An Example

Database TDB (Supmin = 2)

Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

1st scan: C1

Itemset   sup
{A}       2
{B}       3
{C}       3
{D}       1
{E}       3

L1

Itemset   sup
{A}       2
{B}       3
{C}       3
{E}       3

C2 (generated from L1)

Itemset
{A, B}
{A, C}
{A, E}
{B, C}
{B, E}
{C, E}

2nd scan: C2

Itemset   sup
{A, B}    1
{A, C}    2
{A, E}    1
{B, C}    2
{B, E}    3
{C, E}    2

L2

Itemset   sup
{A, C}    2
{B, C}    2
{B, E}    3
{C, E}    2

C3 (generated from L2)

Itemset
{B, C, E}

3rd scan: L3

Itemset     sup
{B, C, E}   2
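The passes above can be reproduced with a short Java sketch. It is a plain textbook Apriori (join, prune, count), written in modern Java (9+) rather than the J2SDK 1.4 listed in the requirements; the class and method names are hypothetical, itemsets are represented as sorted lists of item names, and the minimum support is an absolute count.

```java
import java.util.*;

// Textbook Apriori sketch (hypothetical class name); itemsets are sorted lists of item names.
class AprioriExample {
    // Lexicographic order on sorted itemsets, so candidates are processed deterministically.
    static final Comparator<List<String>> LIST_ORDER = (x, y) -> {
        for (int i = 0; i < Math.min(x.size(), y.size()); i++) {
            int c = x.get(i).compareTo(y.get(i));
            if (c != 0) return c;
        }
        return Integer.compare(x.size(), y.size());
    };

    // Returns every frequent itemset mapped to its support count.
    static Map<List<String>, Integer> apriori(List<Set<String>> db, int minSup) {
        Map<List<String>, Integer> frequent = new LinkedHashMap<>();
        // C1: every distinct item is a candidate 1-itemset.
        Set<List<String>> candidates = new TreeSet<>(LIST_ORDER);
        for (Set<String> t : db)
            for (String item : t) candidates.add(List.of(item));
        while (!candidates.isEmpty()) {
            // One database scan per pass counts all candidates of the current length.
            Map<List<String>, Integer> counts = new HashMap<>();
            for (Set<String> t : db)
                for (List<String> c : candidates)
                    if (t.containsAll(c)) counts.merge(c, 1, Integer::sum);
            // L_k: candidates whose support count reaches minSup.
            List<List<String>> level = new ArrayList<>();
            for (List<String> c : candidates) {
                int sup = counts.getOrDefault(c, 0);
                if (sup >= minSup) {
                    level.add(c);
                    frequent.put(c, sup);
                }
            }
            candidates = nextCandidates(level);
        }
        return frequent;
    }

    // Join L_k with itself on the first k-1 items, then prune any candidate
    // that has an infrequent k-subset (the Apriori property).
    static Set<List<String>> nextCandidates(List<List<String>> level) {
        Set<List<String>> frequentK = new HashSet<>(level);
        Set<List<String>> next = new TreeSet<>(LIST_ORDER);
        for (int i = 0; i < level.size(); i++) {
            for (int j = i + 1; j < level.size(); j++) {
                List<String> a = level.get(i), b = level.get(j);
                int k = a.size();
                if (!a.subList(0, k - 1).equals(b.subList(0, k - 1))) continue;
                List<String> joined = new ArrayList<>(a);
                joined.add(b.get(k - 1));
                Collections.sort(joined);
                boolean allSubsetsFrequent = true;
                for (int drop = 0; drop < joined.size() && allSubsetsFrequent; drop++) {
                    List<String> subset = new ArrayList<>(joined);
                    subset.remove(drop); // remove by index
                    allSubsetsFrequent = frequentK.contains(subset);
                }
                if (allSubsetsFrequent) next.add(joined);
            }
        }
        return next;
    }

    public static void main(String[] args) {
        List<Set<String>> tdb = List.of(
                Set.of("A", "C", "D"), Set.of("B", "C", "E"),
                Set.of("A", "B", "C", "E"), Set.of("B", "E"));
        apriori(tdb, 2).forEach((itemset, sup) -> System.out.println(itemset + " " + sup));
    }
}
```

Run on TDB with a minimum support count of 2, it should list the four frequent 1-itemsets, the four frequent 2-itemsets and {B, C, E} with support 2, matching the tables: {D} is dropped after the first scan, {A, B} and {A, E} after the second, and the join step produces {B, C, E} as the only 3-itemset candidate.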


Proposed System:

To reduce the communication cost in the new system, we highlight several message optimization techniques:

- Direct support count exchange
- Indirect support count exchange

Reducing communication is one of the most important D-ARM objectives.

All sites share a common set of globally frequent itemsets with identical support counts.


Algorithm design:

ODAM works like centralized association rule mining, but it broadcasts the support counts of candidate itemsets after every pass.

ODAM first computes the support counts of 1-itemsets at each site in the same manner.

It then broadcasts those counts to the other sites and discovers the globally frequent 1-itemsets.

Subsequently, each site generates candidate 2-itemsets and computes their support counts.
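The count exchange for 1-itemsets can be sketched in Java as below. It simulates the sites in one process instead of over the network and, as an assumption for illustration, splits TDB from the Apriori example into two hypothetical sites (transactions 10 and 20 at site 1, 30 and 40 at site 2) with an absolute global minimum support count of 2. All names are illustrative, not ODAM's.

```java
import java.util.*;

// Hypothetical sketch: local 1-itemset counts per site, summed into global counts.
class GlobalCountExchange {
    // Each site counts 1-itemsets over its own partition only.
    static Map<String, Integer> localCounts(List<Set<String>> partition) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> t : partition)
            for (String item : t) counts.merge(item, 1, Integer::sum);
        return counts;
    }

    // After every site has broadcast its local counts, each site sums all messages
    // (including its own), so every site ends up with the same global counts.
    static Map<String, Integer> globalCounts(List<Map<String, Integer>> broadcasts) {
        Map<String, Integer> global = new TreeMap<>();
        for (Map<String, Integer> msg : broadcasts)
            msg.forEach((item, c) -> global.merge(item, c, Integer::sum));
        return global;
    }

    // Globally frequent 1-itemsets: items whose summed count reaches minSup.
    static Set<String> globallyFrequent(Map<String, Integer> global, int minSup) {
        Set<String> result = new TreeSet<>();
        global.forEach((item, c) -> { if (c >= minSup) result.add(item); });
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> site1 = localCounts(List.of(Set.of("A", "C", "D"), Set.of("B", "C", "E")));
        Map<String, Integer> site2 = localCounts(List.of(Set.of("A", "B", "C", "E"), Set.of("B", "E")));
        Map<String, Integer> global = globalCounts(List.of(site1, site2));
        System.out.println(global + " frequent: " + globallyFrequent(global, 2));
    }
}
```

Summing the two sites' counts gives {A} 2, {B} 3, {C} 3, {D} 1, {E} 3, so the globally frequent 1-itemsets are {A, B, C, E}, the same L1 as the centralized run.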


ODAM also eliminates all globally infrequent 1-itemsets from every transaction and inserts the new (reduced) transaction into main memory.

After the support counts of candidate 2-itemsets have been generated at each site:

- ODAM generates the globally frequent 2-itemsets and then iterates through main memory.

- It then generates the support counts of candidate itemsets of the respective length.


Hence, it reduces the transaction size (the number of items) and finds more identical transactions.

Finally, it writes all main-memory entries for this partition into a temporary file.

Each local site then generates support counts and broadcasts them to all other sites, so that every site can calculate the globally frequent itemsets for that pass.
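The trimming and merging steps can be pictured with the Java sketch below. It is an illustration under stated assumptions rather than ODAM's actual code: main memory is modeled as a hash map from each reduced transaction to the number of identical transactions it replaces, the temporary-file step is omitted, transactions that become empty are simply dropped, and the class and method names are hypothetical.

```java
import java.util.*;

// Hypothetical sketch of in-memory transaction reduction at one site.
class TransactionReducer {
    // Drop globally infrequent items from each transaction, then merge identical
    // reduced transactions into a single entry that records how often it occurred.
    static Map<Set<String>, Integer> trimAndMerge(List<Set<String>> partition,
                                                  Set<String> globallyFrequentItems) {
        Map<Set<String>, Integer> memory = new HashMap<>();
        for (Set<String> t : partition) {
            Set<String> reduced = new TreeSet<>(t);
            reduced.retainAll(globallyFrequentItems);
            if (!reduced.isEmpty()) memory.merge(reduced, 1, Integer::sum);
        }
        return memory;
    }

    // Count candidates by iterating over main memory instead of the raw partition;
    // each entry adds its multiplicity to every candidate it contains.
    static Map<Set<String>, Integer> countCandidates(Map<Set<String>, Integer> memory,
                                                     List<Set<String>> candidates) {
        Map<Set<String>, Integer> counts = new HashMap<>();
        for (Map.Entry<Set<String>, Integer> entry : memory.entrySet())
            for (Set<String> c : candidates)
                if (entry.getKey().containsAll(c)) counts.merge(c, entry.getValue(), Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical partition: once the infrequent items D and F are removed,
        // the four transactions collapse into two entries, each with count 2.
        List<Set<String>> partition = List.of(
                Set.of("A", "C", "D"), Set.of("A", "C"), Set.of("B", "E", "F"), Set.of("B", "E"));
        Map<Set<String>, Integer> memory = trimAndMerge(partition, Set.of("A", "B", "C", "E"));
        System.out.println(memory);
        System.out.println(countCandidates(memory,
                List.of(Set.of("A", "C"), Set.of("B", "E"), Set.of("A", "B"))));
    }
}
```

Applied to TDB from the Apriori example with the globally frequent items {A, B, C, E}, counting the candidate 2-itemsets over the reduced entries gives the same supports as the second Apriori scan (for example {B, E} 3 and {A, C} 2); the saving appears when many transactions reduce to the same item set.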


Implementation

This new system includes all the activities present in the existing system, plus some new features. It is implemented in Java.

We established a socket-based, client-server distributed environment to evaluate ODAM’s message reduction techniques. Each site has a receiving unit and a sending unit, and is assigned a specific port for sending and receiving candidate support counts.
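A sending unit and a receiving unit of this kind could look like the Java sketch below, which exchanges one map of support counts over a TCP socket. The wire format (entry count, then itemset/count pairs written with DataOutputStream), the string encoding of itemsets such as "B,C", and all names are assumptions for illustration; the demo runs both ends in one process on a free loopback port rather than across real sites.

```java
import java.io.*;
import java.net.*;
import java.util.*;

// Hypothetical sending/receiving units for candidate support counts.
class SupportCountChannel {
    // Assumed wire format: number of entries, then (itemset, count) pairs.
    static void writeCounts(DataOutputStream out, Map<String, Integer> counts) throws IOException {
        out.writeInt(counts.size());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeInt(e.getValue());
        }
        out.flush();
    }

    static Map<String, Integer> readCounts(DataInputStream in) throws IOException {
        int n = in.readInt();
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < n; i++) counts.put(in.readUTF(), in.readInt()); // key read before value
        return counts;
    }

    // Sending unit: connect to a peer's port and push this site's counts.
    static void send(String host, int port, Map<String, Integer> counts) throws IOException {
        try (Socket s = new Socket(host, port);
             DataOutputStream out = new DataOutputStream(new BufferedOutputStream(s.getOutputStream()))) {
            writeCounts(out, counts);
        }
    }

    // Receiving unit: accept one connection on this site's port and read the counts.
    static Map<String, Integer> receive(ServerSocket server) throws IOException {
        try (Socket s = server.accept();
             DataInputStream in = new DataInputStream(new BufferedInputStream(s.getInputStream()))) {
            return readCounts(in);
        }
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // 0 = any free port
            int port = server.getLocalPort();
            Map<String, Integer> local = new TreeMap<>(Map.of("B,C", 2, "B,E", 3));
            Thread sender = new Thread(() -> {
                try {
                    send("127.0.0.1", port, local);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            sender.start();
            System.out.println("received " + receive(server));
            sender.join();
        }
    }
}
```

In a multi-site setup, each site would presumably keep its ServerSocket open on its assigned port and call send once per peer after every pass.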


  Requirements 


Hardware requirements:

Processor : Intel processor IV

RAM : 128MB

Hard disk : 20GB

Monitor : 15" color

Keyboard : 108-key Mercury keyboard

Mouse : Logitech mouse

Software requirements:

Operating system : Windows XP/2000

Language used : Java (J2SDK 1.4.0), JCreator