Parallel Mining of Closed Sequential Patterns

Shengnan Cong, Jiawei Han, David Padua. Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, Chicago, Illinois, USA, 2005. Advisor: Jia-Ling Koh. Speaker: Chun-Wei Hsieh.

Page 1: Parallel Mining of Closed Sequential Patterns

Parallel Mining of Closed Sequential Patterns

Shengnan Cong, Jiawei Han, David Padua

Proceeding of the 11th ACM SIGKDD international conference on Knowledge discovery in data mining Chicago, Illinois, USA, 2005

Advisor ： Jia-Ling Koh Speaker ： Chun-Wei Hsieh

Page 2: Parallel Mining of Closed Sequential Patterns

Introduction

Numerous applications:
– DNA sequences, analysis of web logs, customer shopping sequences, XML query access patterns…

Closed sequential patterns:
– carry all the information of the full set of frequent sequential patterns
– are more compact

Many applications are time-critical and involve huge volumes of data.

Page 3: Parallel Mining of Closed Sequential Patterns

Sequential Algorithm: BIDE

Step 1: Identify the frequent 1-sequences.
Step 2: Project the dataset along each frequent 1-sequence.
Step 3: Mine each resulting projected dataset.

Page 4: Parallel Mining of Closed Sequential Patterns

Sequential Algorithm: BIDE

The projected dataset for sequence AB is {C, CB, C, BCA}.
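The projection step can be sketched in a few lines of Python. This is only an illustration, not BIDE itself: the function name `project` and the four-sequence dataset are assumptions, with the dataset chosen so that the result matches the set quoted on the slide. Each sequence is treated as a string of single items.

```python
def project(dataset, prefix):
    """Return the suffix of every sequence that contains `prefix`
    as a subsequence, cut after the earliest match of the prefix."""
    projected = []
    for seq in dataset:
        pos = 0
        for item in prefix:
            idx = seq.find(item, pos)   # earliest occurrence at or after pos
            if idx == -1:
                break                   # prefix not contained in this sequence
            pos = idx + 1
        else:
            projected.append(seq[pos:])
    return projected


# Hypothetical dataset (an assumption, not taken from the slides).
dataset = ["CAABC", "ABCB", "CABC", "ABBCA"]
print(project(dataset, "AB"))
```

Mining then continues recursively inside each projected dataset, which is what makes the projections natural units of parallel work.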

Page 5: Parallel Mining of Closed Sequential Patterns

Task Decomposition

1. Each processor counts the occurrences of 1-sequences in a different part of the dataset. A global add reduction is executed to obtain the overall counts.

2. Build pseudo-projections. This is done in parallel by assigning a different part of the dataset to each processor. The pseudo-projections are communicated to all processors via an all-to-all broadcast.

3. Use dynamic scheduling to distribute the processing of the projections across processors.
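Step 1 can be illustrated with a single-process simulation. This is a rough sketch under several assumptions: the processors are simulated by list partitions, support is counted once per sequence, and the names `local_counts`, `global_add_reduction` and `frequent_1_sequences` are made up for the example. A real implementation would use a message-passing reduction instead.

```python
from collections import Counter
from functools import reduce


def local_counts(partition):
    """Support counts of 1-sequences in one processor's part of the dataset.
    Each item is counted once per sequence (assumed support definition)."""
    counts = Counter()
    for seq in partition:
        counts.update(set(seq))
    return counts


def global_add_reduction(partials):
    """Element-wise sum of the per-processor counts."""
    return reduce(lambda a, b: a + b, partials, Counter())


def frequent_1_sequences(dataset, num_processors, min_support):
    # Simulated processors: processor p gets every num_processors-th sequence.
    partitions = [dataset[p::num_processors] for p in range(num_processors)]
    totals = global_add_reduction(local_counts(p) for p in partitions)
    return {item: n for item, n in totals.items() if n >= min_support}


# Hypothetical dataset, split across two simulated processors.
print(frequent_1_sequences(["CAABC", "ABCB", "CABC", "ABBCA"], 2, 4))
```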

Page 6: Parallel Mining of Closed Sequential Patterns

Task Decomposition

In the second step, it is more efficient to implement the broadcast using a virtual ring structure.

Assume there are N processors. Processor K
– only receives packages from Processor ((K-1) mod N)
– only sends packages to Processor ((K+1) mod N)

The broadcast needs (N-1) send-receive steps and consumes no more than 0.5% of the mining time.
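The ring broadcast can be checked with a small simulation. This is a sketch only: the function name `ring_all_to_all` is an assumption, and the processors and their packages are plain Python lists rather than real network messages. In every step each processor forwards the package it received most recently, so after N-1 steps every processor holds all N packages.

```python
def ring_all_to_all(local_data):
    """Simulate an all-to-all broadcast over a virtual ring.

    local_data[k] is the package initially owned by processor k.
    Returns, for each processor, a dict {owner: package} of what it holds.
    """
    n = len(local_data)
    known = [{k: local_data[k]} for k in range(n)]
    outgoing = list(range(n))          # owner id of the package each processor sends next
    for _ in range(n - 1):             # N-1 send-receive steps
        # Processor k receives only from processor (k-1) mod n.
        incoming = [outgoing[(k - 1) % n] for k in range(n)]
        for k, owner in enumerate(incoming):
            known[k][owner] = local_data[owner]
        outgoing = incoming            # forward the newly received package next
    return known


held = ring_all_to_all(["proj0", "proj1", "proj2", "proj3"])
print(all(len(h) == 4 for h in held))
```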

Page 7: Parallel Mining of Closed Sequential Patterns

Task Scheduling

1. A master processor maintains a queue of pseudo-projection identifiers. Each of the other processors is initially assigned a projection.

2. After mining a projection, a processor sends a request to the master processor for another projection.

3. This process continues until the queue of projections is empty.
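The effect of this scheme on the total mining time can be explored with a toy simulation. It is a sketch under stated assumptions: mining times are known in advance (in practice they are not), communication with the master costs nothing, the master itself is not modelled, and the name `dynamic_schedule` is made up for the example.

```python
import heapq


def dynamic_schedule(task_times, num_workers):
    """Hand out queued tasks in order to whichever worker becomes free first
    and return the time at which the last worker finishes."""
    free_at = [(0.0, w) for w in range(num_workers)]   # (time worker is free, worker id)
    heapq.heapify(free_at)
    for duration in task_times:
        time_free, worker = heapq.heappop(free_at)      # next worker to request a task
        heapq.heappush(free_at, (time_free + duration, worker))
    return max(time_free for time_free, _ in free_at)


# One long projection followed by four short ones, two workers.
print(dynamic_schedule([4, 1, 1, 1, 1], 2))
```

In this toy run the long task alone determines the finishing time, which is the problem the next slide addresses.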

Page 8: Parallel Mining of Closed Sequential Patterns

Task Scheduling

If the largest subtask takes 25% of the total mining time, the best possible speedup is only 4 regardless of the number of processors available.

To improve the dynamic scheduling, the approach is to identify which projections require a long mining time and to decompose them.
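The bound follows because no schedule can finish before the largest indivisible subtask does, so the speedup is at most the total time divided by the largest subtask time. A short sketch (the function name `speedup_upper_bound` and the task times are illustrative assumptions):

```python
def speedup_upper_bound(task_times, num_processors):
    """Best possible speedup when tasks cannot be split:
    limited by the processor count and by the largest task."""
    total = sum(task_times)
    return min(num_processors, total / max(task_times))


# Largest subtask is 25 out of 100 time units, i.e. 25% of the mining time.
tasks = [25, 15, 15, 15, 10, 10, 10]
print(speedup_upper_bound(tasks, 64))
```

Splitting the 25-unit task into smaller pieces raises the bound, which is why long projections are decomposed.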

Page 9: Parallel Mining of Closed Sequential Patterns

Relative Mining Time Estimation

Random sampling
– selects a random subset of the projections
– is not accurate if the overhead is kept small

Selective sampling
– uses every sequence of the projections
– discards infrequent 1-sequences and the last L frequent 1-sequences of each sequence (L = a given fraction t × the average length of the sequences in the dataset)

Page 10: Parallel Mining of Closed Sequential Patterns

Selective sampling

For example,
– assume (A : 4), (B : 4), (C : 4), (D : 3), (E : 3), (F : 3), (G : 1) are the 1-sequences with their support counts
– the support threshold = 4
– the average length of the sequences in the dataset = 4
– suppose t = 75%

L = 4 × 0.75 = 3. Given the sequence AABCACDCFDB, selective sampling will reduce it to AABCA.
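The example can be reproduced with a short function. This is a sketch, not the authors' code: the name `selective_sample` is an assumption, and so is rounding L down when t × average length is not an integer, since the slides do not say how that case is handled.

```python
import math


def selective_sample(sequence, support, min_support, avg_length, t):
    """Drop infrequent items, then drop the last L frequent items,
    where L = floor(t * avg_length) (rounding down is an assumption)."""
    cut = math.floor(t * avg_length)
    kept = [item for item in sequence if support.get(item, 0) >= min_support]
    return "".join(kept[:max(0, len(kept) - cut)])


support = {"A": 4, "B": 4, "C": 4, "D": 3, "E": 3, "F": 3, "G": 1}
print(selective_sample("AABCACDCFDB", support, min_support=4, avg_length=4, t=0.75))
```

Removing D and F leaves AABCACCB, and cutting the last three items gives AABCA, as on the slide.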

Page 11: Parallel Mining of Closed Sequential Patterns

Relative Mining Time Estimation

Page 12: Parallel Mining of Closed Sequential Patterns

Par-CSP Algorithm

Page 13: Parallel Mining of Closed Sequential Patterns

Experiments

• 64 nodes
• OS: Red Hat Linux 7.2
• CPU: 1 GHz Intel Pentium III
• RAM: 1 GB
• Compiler: GNU g++ 2.96

Page 14: Parallel Mining of Closed Sequential Patterns

Experiments

• Synthetic dataset: IBM dataset generator

• Real dataset: Gazelle, Web click-stream

Page 15: Parallel Mining of Closed Sequential Patterns

Experiments

Page 16: Parallel Mining of Closed Sequential Patterns

Experiments

Page 17: Parallel Mining of Closed Sequential Patterns

Experiments

Page 18: Parallel Mining of Closed Sequential Patterns

Experiments
