
Upload: hakien

Post on 31-Aug-2018


Page 1: CHAPTER 3 Mining Sequential Access patterns - …shodhganga.inflibnet.ac.in/bitstream/10603/2194/5/05_chapter 3.pdf · CHAPTER 3 Mining Sequential ... In particular, it investigates


CHAPTER 3

Mining Sequential Access Patterns

This chapter focuses on mining sequential access patterns. In particular, it investigates algorithms for mining sequential access patterns from web logs. As a preliminary exercise, an efficient algorithm, CSMA (Conditional Sequence Mining Algorithm) [50], a Web Access Pattern (WAP) tree based mining algorithm, is investigated and compared with other WAP-tree-based algorithms. In this comparison, CSMA does not incur much cost for reconstructing the conditional WAP trees at intermediate stages of the mining process. Subsequently, an enhancement is made to CSMA that eliminates the need for the WAP trees otherwise used for mining sequential access patterns from web logs. This new method is named TCSMA (Temporal Conditional Sequence Mining Algorithm) [50]. TCSMA is not only able to mine periodic sequential access patterns, it is also more efficient. With the help of periodic sequential access patterns, the current research work provides the predictive power to anticipate the use of a web page in a stipulated time period.

The current chapter focuses on TCSMA and its performance. The aim of TCSMA is to mine access patterns that are both periodic and sequential; common sequential access patterns are mined along with the periodic sequential access patterns. The remainder of the chapter describes the mining process for sequential access patterns and then elaborates TCSMA, which is corroborated by experimental results. The chapter concludes with a summary.

3.1 PERIODIC SEQUENTIAL ACCESS PATTERN MINING

As may be surmised from Chapter 2, many approaches [67, 68, 48] have been proposed for mining sequential patterns from web logs, based either on Apriori algorithms or on Web Access Pattern trees. The major thrust of earlier research was on mining common sequential access patterns, that is, patterns that occur frequently over the entire duration of the web access transactions. In practice, however, many useful sequential access patterns occur frequently only in a particular periodic time interval, for example the morning of every weekend, and not at other times. This can be attributed to user browsing behavior and habit. Such sequential access patterns are termed periodic sequential access patterns. The periodic time intervals refer to real-life time concepts such as year, month, week or day.

Recent studies have presented mining algorithms that focus on temporal association rules. One line of work investigated the problem of association rules that exhibit periodic, recurrent variation over time [71]. The main idea is to partition the data into many disjoint segments according to a stated time interval such as day, week or month. These rules are known as cyclic association rules. Their drawback is that they cannot handle multiple granularities in time intervals, such as the morning of every weekday. Calendar algebra defines a set of time intervals for mining calendric association rules, with the aim of widening the kinds of association rules that can be mined [72]. To find patterns in the data that approximately match user-defined patterns, the discovery of fuzzy patterns was also suggested for association rules. Unfortunately, this requires a priori knowledge of the temporal patterns in the transaction database; only then is it possible to define the calendar expressions.

Subsequently, calendar schemas were proposed, which make temporal association rules easier to understand [73]. The advantage of this work is that it needs less a priori knowledge of the data: the only prerequisite is a calendar-based pattern that refers to a particular calendar schema. The algorithm for discovering calendar-based temporal association rules builds on the existing Apriori principle. These works have explored several interesting ways of specifying time constraints for mining temporal association rules. However, the algorithms mine association rules and therefore neglect the sequential nature of web access patterns. In addition, they suffer from the same problem as the conventional Apriori-based algorithms, namely costly repeated scans of the database to find the frequent events.

This chapter therefore focuses on mining periodic sequential access patterns, and an effective mining algorithm known as TCSMA is proposed.

The next section describes the calendar-based periodic time constraints, which must be specified before TCSMA is run.

3.2 CALENDAR-BASED PERIODIC TIME CONSTRAINTS

This section defines calendar-based periodic time constraints, which are used to describe real-life time concepts. A calendar-based periodic time constraint consists of a calendar model (template) and a calendar instance.

Definition 3.1

A periodic calendar-based model is defined as

PCBT = (PCU1 INT1, PCU2 INT2, …, PCUn INTn).

Each PCUj is a calendar unit such as year, month, week, day or hour, and INTj is the closed interval of valid time values of PCUj, all of which are non-negative integers. A calendar model represents a hierarchy of calendar units and their valid time intervals.

For example, a calendar model may take the form (Year [2007, 2008], Month [1, 12], Day [1, 31]) or (Week_Day [1, 7], Hour_time [0, 23]).

Definition 3.2

Given a periodic calendar-based model PCBT = (PCU1 INT1, PCU2 INT2, …, PCUn INTn), a calendar instance is denoted by (INT1', INT2', …, INTn'), where each INTj' is either a set of non-negative integers with INTj' ⊆ INTj, or the wild-card symbol *, which stands for all valid time values in INTj. A calendar instance is obtained from the calendar model by assigning specific values to its calendar units, and it is used to represent a real-life time concept.

For instance, given PCBT = (Week_Day [1, 7], Hour_time [0, 23]), the instance CJ = ({6, 7}, {5, 6, 7, 8}) represents the early morning of every weekend, and CJ = (*, {19, 20, 21}) represents the evening of every day.
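
Definitions 3.1 and 3.2 can be illustrated with a minimal Python sketch. The representation below (a list of unit/interval pairs for the model, a list of value sets or "*" for the instance) and the name is_valid_instance are assumptions made for illustration, not part of the original formulation.

```python
from typing import List, Set, Tuple, Union

# A calendar model: ordered (unit name, (low, high)) pairs, as in Definition 3.1.
Model = List[Tuple[str, Tuple[int, int]]]
# A calendar instance: one entry per unit, either a set of values or the wild-card "*".
Instance = List[Union[Set[int], str]]

WILDCARD = "*"


def is_valid_instance(model: Model, instance: Instance) -> bool:
    """Check that every entry is '*' or a set of values inside the unit's interval."""
    if len(model) != len(instance):
        return False
    for (_unit, (low, high)), entry in zip(model, instance):
        if entry == WILDCARD:
            continue
        if not isinstance(entry, set):
            return False
        if any(value < low or value > high for value in entry):
            return False
    return True


# The two instances quoted in the text for the (Week_Day, Hour_time) model.
model = [("Week_Day", (1, 7)), ("Hour_time", (0, 23))]
weekend_early_morning = [{6, 7}, {5, 6, 7, 8}]
every_evening = [WILDCARD, {19, 20, 21}]

print(is_valid_instance(model, weekend_early_morning))
print(is_valid_instance(model, every_evening))
```

An instance such as [{6, 7}, {24}] is rejected by this check because hour 24 lies outside Hour_time [0, 23].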

Real-life time concepts such as morning and evening may be understood differently by different people, depending on their individual interests and activities. For example, some people may regard morning as the period from sunrise to noon, whereas others may take 6 AM to 10 AM as morning. Calendar instances can therefore be defined according to the needs of the user. Examples of calendar instances are given in Table 3.1.

Definition 3.3

A periodic calendar-based time constraint is denoted by

PC = [PCBT, CJ],

where PCBT is a calendar model and CJ is a calendar instance of that model.

For instance, PC = [(Week_Day [1, 7], Hour_time [0, 23]), ({1, 2, 3, 4, 5}, {10, 11})] denotes “10:00 AM to 11:59 AM of all weekdays”.

Given PC = [PCBT, CJ], a time T is said to be covered by PC if T falls within the time values specified by PC. For instance, Td1 = “2007-11-10 21:10:10 Saturday” and Td2 = “2007-11-04 21:45:22 Sunday” are covered by a weekend-evening constraint such as PC = [(Week_Day [1, 7], Hour_time [0, 23]), ({6, 7}, {21, 22})], but not by the weekday constraint above. The constraint PCall = [PCBT, (*, ..., *)], whose instance consists only of wild-cards, is the periodic calendar-based time constraint that covers every valid time value of the calendar model PCBT.
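
The covering test of Definition 3.3 can be sketched in Python as follows. This is a minimal illustration that assumes a two-unit model (Week_Day, Hour_time) and assumes that Week_Day follows ISO numbering (Monday = 1, Sunday = 7), which agrees with {6, 7} denoting the weekend; the function name covers is hypothetical.

```python
from datetime import datetime

WILDCARD = "*"


def covers(instance, timestamp):
    """Return True if `timestamp` is covered by a (Week_Day, Hour_time) instance."""
    week_days, hours = instance
    # isoweekday(): Monday = 1 ... Sunday = 7 (assumed to match Week_Day [1, 7])
    day_ok = week_days == WILDCARD or timestamp.isoweekday() in week_days
    hour_ok = hours == WILDCARD or timestamp.hour in hours
    return day_ok and hour_ok


weekend_evening = ({6, 7}, {21, 22})
weekday_late_morning = ({1, 2, 3, 4, 5}, {10, 11})
pc_all = (WILDCARD, WILDCARD)

td1 = datetime(2007, 11, 10, 21, 10, 10)  # Saturday
td2 = datetime(2007, 11, 4, 21, 45, 22)   # Sunday

print(covers(weekend_evening, td1), covers(weekend_evening, td2))
print(covers(weekday_late_morning, td1))
print(covers(pc_all, td1))
```

Both weekend timestamps are covered by the weekend-evening instance and by the all-wild-card instance, but not by the weekday instance.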

3.3 THE TCSMA (TEMPORAL CONDITIONAL SEQUENCE MINING ALGORITHM)

In this section, an approach referred to as TCSMA (Temporal Conditional Sequence Mining Algorithm) is proposed for mining common and periodic sequential access patterns from a given web access transaction database.

3.3.1 Problem Statement

Web logs can generally be regarded as a collection of sequences of access events, each belonging to one user or session, in ascending order of timestamp. Preprocessing methods [53], including data cleaning, user identification, session identification and transaction identification, are applied to the raw web logs to obtain the web access sequences. These preprocessing methods are discussed in Chapter 2.

Let SUAE be a set of unique access events, which describe the web resources accessed by users, i.e., web pages or URLs.

A web access sequence pattern WASP = e1e2…en (ej ∈ SUAE for 1 ≤ j ≤ n) is a sequence of access events, and |WASP| = n denotes the length of WASP. It is not required that ej ≠ ek for j ≠ k in WASP; that is, repetition of items is allowed.

A web access transaction event WATE = (Td, WASP) consists of a transaction time Td and a web access sequence pattern WASP.

The web access transactions in a database may belong to a single user or to several users (i.e., server-side logs). The proposed algorithm does not depend on the type of web log from which the web access transactions are taken. Let SUAE = {p, q, r, s, t, u} be the set of access events of a set of web access transactions. An example web access transaction database is shown in Table 3.2.

In a web access sequence pattern WASP = e1e2…el el+1…en, WASPprefix = e1e2…el is called a prefix sequence of WASP, or a prefix sequence of el+1 in WASP, and WASPsuffix = el+1el+2…en is called a suffix sequence of WASP, or a suffix sequence of el in WASP. A web access sequence pattern can thus be written as WASP = WASPprefix + WASPsuffix.

For instance, WASP = pqspr can be written as WASP = p + qspr = pq + spr = … = pqsp + r.

Let SS1 and SS2 be two suffix sequences of ej in WASP, such that SS1 is also a suffix sequence of ej in SS2. Then SS1 is called a sub-suffix sequence of SS2, and SS2 is called a super-suffix sequence of SS1. The suffix sequence of ej in WASP that has no super-suffix sequence is called the long suffix sequence of ej in WASP.

For instance, let WASP = pqsprq. Then SS1 = rq is a sub-suffix sequence of SS2 = qsprq, and SS2 is a super-suffix sequence of SS1. SS2 is also the long suffix sequence of p in WASP.
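
The suffix definitions can be checked with a short Python sketch. It assumes that a web access sequence is represented as a string of single-character events; the function names are illustrative only.

```python
def suffix_sequences(sequence, event):
    """All suffix sequences of `event` in `sequence`, one per occurrence, longest first."""
    return [sequence[i + 1:] for i, e in enumerate(sequence) if e == event]


def long_suffix(sequence, event):
    """The suffix after the first occurrence of `event`, or None if it never occurs."""
    suffixes = suffix_sequences(sequence, event)
    return suffixes[0] if suffixes else None


print(suffix_sequences("pqsprq", "p"))  # suffixes after each occurrence of p
print(long_suffix("pqsprq", "p"))       # the one with no super-suffix
```

For WASP = pqsprq the event p occurs twice, so it has two suffix sequences; the longer one has no super-suffix and is therefore the long suffix sequence.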

Let WATEDB be a web access transaction database, WATEDB = {(Td1, WASP1), (Td2, WASP2), …, (Tdm, WASPm)}, where WASPj (1 ≤ j ≤ m) is a web access sequence and Tdj is the corresponding transaction time. Given a calendar-based periodic time constraint PC as defined in Section 3.2, WATEDB(PC) = {(Tdj, WASPj) | Tdj is covered by PC, 1 ≤ j ≤ m} is the subset of WATEDB under PC, and |WATEDB(PC)| is called the length of WATEDB under PC. The support of WASP in WATEDB under PC is defined in equation (3.1):

$$\mathrm{support}(WASP, PC) = \frac{\left|\{WASP_j \mid WASP \in WASP_j,\ (Td_j, WASP_j) \in WATEDB(PC)\}\right|}{\left|WATEDB(PC)\right|} \tag{3.1}$$

Here WASP ∈ WASPj is read as "WASP occurs as a subsequence of WASPj". WASP is called a periodic sequential access pattern if support(WASP, PC) ≥ Support_Minimum, where Support_Minimum is the given support threshold.

Consider the example database in Table 3.2.


Let Support_Minimum = 70% and let the calendar-based periodic time constraint be PC = [(Week_Day [1, 7], Hour_time [0, 23]), ({6, 7}, {21, 22})]. The task is then to find the web access sequence patterns that are supported by at least 70% of the web access sequences whose transaction times fall between 9:00 PM and 10:59 PM on weekends in the example database.

When PCall is used as the calendar-based periodic time constraint, the mining results are the common sequential access patterns, that is, the patterns that satisfy the given support threshold over the whole database.
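
The support of equation (3.1) can be computed with the following Python sketch. It assumes that containment of a pattern in a sequence means subsequence containment (events in order, gaps allowed) and that the constraint-satisfied sequences have already been selected; the four sequences are the ones quoted for the initial conditional sequence base in Section 3.3.2.2, and the function names are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the events of `pattern` occur in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(event in remaining for event in pattern)


def support(pattern, sequences):
    """Fraction of the constraint-satisfied sequences that contain `pattern`."""
    if not sequences:
        return 0.0
    hits = sum(1 for seq in sequences if is_subsequence(pattern, seq))
    return hits / len(sequences)


# Constraint-satisfied sequences, i.e. the sequences of WATEDB(PC).
watedb_pc = ["pqspr", "tptqrpr", "qpqupt", "puqprur"]

print(support("pqr", watedb_pc))  # contained in three of the four sequences
print(support("pp", watedb_pc))   # contained in all four sequences
```

With Support_Minimum = 70%, both pqr (support 0.75) and pp (support 1.0) would qualify as periodic sequential access patterns under this reading.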

3.3.2 Proposed Approach

As shown in Fig. 3.1, the proposed approach TCSMA involves the following steps:

1. Constraint Preprocessing;

2. Constructing Event Queues for Conditional Web Access Sequence Base;

3. Single Sequence Testing for Conditional Web Access Sequence Base;

4. Constructing Sub-Conditional Web Access Sequence Base; and

5. Recursive Mining for Sub-Conditional Web Access Sequence Base.


3.3.2.1 Constraint Preprocessing

The first step of TCSMA is to filter the web access transaction database by discarding all transactions that do not satisfy the given calendar-based periodic time constraint. The remaining constraint-satisfied transactions are used to construct the initial conditional sequence base. The initial conditional sequence base and the conditional sequence base are defined below.

Definition 3.4

The initial conditional sequence base, denoted by Init-CWSB, is the set of all constraint-satisfied transactions in the given web access transaction database, where the constraint-satisfied transactions are those whose transaction times are covered by the given calendar-based periodic time constraint.

Definition 3.5

The conditional web access sequence base of an event ej based on the prefix sequence WASPprefix, denoted by CWSB(STc), where STc = WASPprefix + ej, is the set of all long suffix sequences of ej in the sequences of a certain dataset. If WASPprefix = Ø, the dataset is the initial conditional sequence base of the given web access transaction database; otherwise, it is CWSB(WASPprefix).

CWSB(STc) is also called the conditional sequence base of the conditional prefix STc. When STc = Ø, CWSB(Ø) is the initial conditional sequence base.

The Preprocessing_Constraint algorithm, which selects the constraint-satisfied transactions from the web access transaction database WATEDB, is shown in Fig. 3.2.

Preprocessing_Constraint Algorithm

Input:

1: PC = [PCBT, CJ] – calendar-based periodic time constraint consisting of calendar model PCBT and calendar instance CJ

2: WATEDB = {WATEj | WATEj = (Tdj, WASPj), 1 ≤ j ≤ n} – web access transaction database, where WATEj is a web access transaction consisting of transaction time Tdj and web access sequence WASPj

Output:

1: Init-CWSB – initial conditional sequence base of WATEDB

Method:

1: Assign Init-CWSB = Ø.

2: For each WATEj ∈ WATEDB, if Tdj is covered by PC, insert WASPj into Init-CWSB.

3: Return Init-CWSB.

Example

Consider the calendar-based periodic time constraint PC = [(Week_Day [1, 7], Hour_time [0, 23]), ({6, 7}, {21, 22})]. The time of one of the transactions in Table 3.2 is “2007-11-07 18:23:24 Wednesday”, which is not covered by PC, so its web access sequence qqrpr is discarded. After the preprocessing of the web access transaction database, Init-CWSB contains the sequences {pqspr, tptqrpr, qpqupt, puqprur}.
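
The Preprocessing_Constraint step can be sketched in Python as below. This is an illustration under stated assumptions: the constraint is passed in as a predicate, Week_Day is assumed to follow ISO numbering (Saturday = 6, Sunday = 7), and only the Wednesday transaction is quoted in the text, so the pairing of the two weekend timestamps with sequences is hypothetical.

```python
from datetime import datetime


def preprocess_constraint(watedb, is_covered):
    """Keep the sequence of every transaction whose time satisfies the constraint."""
    init_cwsb = []
    for td, wasp in watedb:
        if is_covered(td):
            init_cwsb.append(wasp)
    return init_cwsb


def weekend_21_to_22(td):
    # Instance ({6, 7}, {21, 22}); isoweekday(): Saturday = 6, Sunday = 7 (assumed)
    return td.isoweekday() in {6, 7} and td.hour in {21, 22}


# Hypothetical pairing of timestamps and sequences, except for the last entry,
# which is the Wednesday transaction quoted in the example.
watedb = [
    (datetime(2007, 11, 10, 21, 10, 10), "pqspr"),
    (datetime(2007, 11, 4, 21, 45, 22), "tptqrpr"),
    (datetime(2007, 11, 7, 18, 23, 24), "qqrpr"),
]

print(preprocess_constraint(watedb, weekend_21_to_22))
```

The Wednesday transaction is filtered out, so its sequence qqrpr does not enter the initial conditional sequence base.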

3.3.2.2 Constructing Event Queues for Conditional Sequence Base

The second step of TCSMA is to construct the event queues for CWSB(STc) (for Init-CWSB, STc = Ø). This step performs the following four operations:

(1) finding the conditional frequent events in CWSB(STc);

(2) creating a Head Table;

(3) constructing the event queues; and

(4) deleting the non-frequent events.

The conditional frequent event is defined as follows.

Definition 3.6

A conditional frequent event is an event whose support in the given conditional sequence base is not less than the support threshold, Support_Minimum. To find the conditional frequent events in CWSB(STc), it is necessary to identify the events whose support is greater than or equal to Support_Minimum, as given in equation (3.2):

$$\mathrm{support}(e_j) = \frac{\left|\{SS_k \mid e_j \in SS_k,\ SS_k \in CWSB(STc)\}\right|}{\left|\text{Init-CWSB}\right|} \geq \mathrm{Support\_Minimum} \tag{3.2}$$

In equation (3.2), |{SSk | ej ∈ SSk, SSk ∈ CWSB(STc)}| is the number of sequences in CWSB(STc) that contain an item labeled ej, and |Init-CWSB| is the length of Init-CWSB. Next, a Head Table for CWSB(STc) is created from the conditional frequent events. For each conditional frequent event ej, a linked-list structure called the ej-queue is created; each item of the ej-queue is the first item labeled ej in a sequence of CWSB(STc). The head pointer of each event queue is recorded in the Head Table. Finally, all non-frequent events are deleted from the sequences in CWSB(STc), since they are not needed in further processing. The event queue construction algorithm is given in Fig. 3.3.

Event_Queue_Construction Algorithm


Input:

1: Support_Minimum – minimum support threshold

2: CWSB(STc) – conditional web access sequence base of STc

3: SUAE = {ej | 1 ≤ j ≤ n} – all web access events in CWSB(STc)

Output:

1: CWSB(STc) with Head Table HT and event queues

Method:

1: Create an empty Head Table HT for CWSB(STc).

2: For each ej ∈ SUAE, if support(ej) ≥ Support_Minimum, insert ej into HT.

3: For each web access sequence ∈ CWSB(STc) do

a) For each ej ∈ HT, insert the first item labeled ej in this sequence into the ej-queue.

b) Delete all items of events ∉ HT from this web access sequence.

4: Return CWSB(STc) with HT and event queues.

For instance, the result of constructing the Head Table and the event queues for Init-CWSB = {pqspr, tptqrpr, qpqupt, puqprur} is shown in Fig. 3.4.

Each access event is written as (event: count), where event is the name of the event and count is the number of sequences in Init-CWSB that contain an item labeled event. To be a conditional frequent event, an event must have a count of at least 3 (with Support_Minimum = 75% and |Init-CWSB| = 4, since 0.75 × 4 = 3). Thus, the conditional frequent events in Init-CWSB are (p:4), (q:4) and (r:3).

The dashed lines starting from the Head Table represent the p-queue, q-queue and r-queue. The items labeled with the non-frequent events s, t and u are deleted from each sequence. In the same manner, the Head Table and the event queues can be constructed for any subsequent conditional sequence base using the Event_Queue_Construction algorithm.
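
The Head Table and event queue construction can be sketched in Python as follows. This is a simplified illustration, not the linked-list structure of Fig. 3.4: sequences are assumed to be strings of single-character events, the Head Table is a dictionary of counts, each queue records (sequence index, position) pairs in the pruned sequences, and the absolute threshold min_count = Support_Minimum × |Init-CWSB| is passed in directly. All function names are hypothetical.

```python
from collections import Counter


def build_head_table(cwsb, min_count):
    """Return (head_table, pruned_sequences) for a conditional sequence base."""
    # Count each event once per sequence that contains it.
    counts = Counter()
    for seq in cwsb:
        counts.update(set(seq))
    head_table = {e: c for e, c in counts.items() if c >= min_count}
    # Delete the items of non-frequent events from every sequence.
    pruned = ["".join(e for e in seq if e in head_table) for seq in cwsb]
    return head_table, pruned


def event_queues(pruned, head_table):
    """For each frequent event, list (sequence index, position) of its first item."""
    queues = {e: [] for e in head_table}
    for i, seq in enumerate(pruned):
        for e in head_table:
            pos = seq.find(e)
            if pos >= 0:
                queues[e].append((i, pos))
    return queues


init_cwsb = ["pqspr", "tptqrpr", "qpqupt", "puqprur"]
head, pruned = build_head_table(init_cwsb, min_count=3)  # 0.75 * 4 = 3
queues = event_queues(pruned, head)

print(head)
print(pruned)
print(queues["r"])
```

The events s, t and u fall below the threshold and are removed, which leaves the pruned sequences pqpr, pqrpr, qpqp and pqprr.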

3.3.2.3 Constructing Sub-Conditional Sequence Base

The sub-conditional web access sequence base is defined as follows.

Definition 3.7

CWSB(WASPprefix + ej) is called a sub-conditional web access sequence base of CWSB(WASPprefix), if ej ≠ Ø.

For each event ej in the Head Table of CWSB(STc), the Sub_CWSB_Construction algorithm for constructing CWSB(STc+ej) from CWSB(STc) is shown in Fig. 3.5.

Sub_CWSB_Construction Algorithm

Input:

1: CWSB(STc) – conditional web access sequence base of STc

2: ej – an event in the Head Table of CWSB(STc)

Output:

1: CWSB(STc+ej) – conditional web access sequence base of ej based on CWSB(STc)

Method:

1: Assign CWSB(STc+ej) = Ø.

2: For each item in the ej-queue of CWSB(STc), insert its suffix web access sequence into CWSB(STc+ej).

3: Return CWSB(STc+ej).

For instance, Fig. 3.4 shows the Init-CWSB. From it, all suffix web access sequences of p can be obtained by following the p-queue. They form CWSB(p), which is one of the sub-conditional web access sequence bases of Init-CWSB. The result is shown in Fig. 3.6. CWSB(p) consists of {qpr:1, qrpr:1, qp:1, qprr:1}. The notation qpr:1 is an abbreviation of (q:1)(p:1)(r:1).
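
The Sub_CWSB_Construction step can be sketched in Python as below. The sketch assumes that non-frequent events have already been deleted from the sequences, that sequences are strings of single-character events, and that every sequence has count 1; the function name sub_cwsb is hypothetical.

```python
def sub_cwsb(pruned_cwsb, event):
    """Suffix after the first item labeled `event` in every sequence that contains it."""
    result = []
    for seq in pruned_cwsb:
        pos = seq.find(event)
        if pos >= 0:
            result.append(seq[pos + 1:])
    return result


# Init-CWSB after the non-frequent events s, t and u have been deleted.
init_pruned = ["pqpr", "pqrpr", "qpqp", "pqprr"]

cwsb_p = sub_cwsb(init_pruned, "p")
cwsb_pp = sub_cwsb(cwsb_p, "p")

print(cwsb_p)
print(cwsb_pp)
```

Applying the same function to CWSB(p) yields the suffixes that make up CWSB(pp); the empty suffix produced by the sequence qp carries no events.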

3.3.2.4 Single Sequence Testing for Conditional Sequence Base

In this step, if all the web access sequences in CWSB(STc) can be merged into a single web access sequence, the mining of CWSB(STc) can be terminated. The single sequence is then used to form part of the final periodic sequential access patterns. Otherwise, sub-conditional sequence bases are constructed for CWSB(STc) and mined recursively. The Conditional_Sequence_Base_Testing algorithm, which tests whether all the web access sequences can be merged into a single web access sequence, is given in Fig. 3.7.

Conditional_Sequence_Base_Testing_Algorithm

Input:

1: CWSB(STc) – conditional web access sequence base of STc

2: HT – Head Table of CWSB(STc)

Output:

1: outcome – successful_flag or failed_flag

2: Single_Sequence – single sequence of CWSB(STc)

Method:

1: Assign Single_Sequence = Ø.

2: If CWSB(STc) = Ø, return successful_flag and Single_Sequence = Ø.

3: For j = 1 to the maximum length of the web access sequences ∈ CWSB(STc) do

a) If all the jth elements of the web access sequences in CWSB(STc) are the same event e, then, if the total count of these elements ≥ Support_Minimum × |Init-CWSB|, create a new element e with that count and insert it into Single_Sequence.

b) Otherwise, return failed_flag and Single_Sequence = Ø.

4: Return successful_flag and Single_Sequence.


For instance, in CWSB(p) = {qpr:1, qrpr:1, qp:1, qprr:1}, the first elements of all the sequences can be merged into one element (q:4), but the second elements cannot be merged. The merging therefore stops and failed_flag is returned. For CWSB(pp) = {r:2, rr:1}, the web access sequences can be merged into the single web access sequence r:3, and successful_flag is returned.
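
The single sequence test can be sketched in Python as follows. The sketch adopts one reading of step 3 that reproduces both examples above: the test fails only when the jth elements differ, while a shared jth element whose count is below the threshold is simply not added to the single sequence. This reading, the representation of sequences as strings with count 1 each, and the function name are assumptions.

```python
def single_sequence_test(cwsb, min_count):
    """Return (ok, single_sequence); single_sequence is a list of (event, count)."""
    single = []
    max_len = max((len(seq) for seq in cwsb), default=0)
    for j in range(max_len):
        # j-th items of the sequences that are long enough to have one
        items = [seq[j] for seq in cwsb if len(seq) > j]
        if len(set(items)) != 1:
            return False, []  # different events at position j: cannot merge
        if len(items) >= min_count:
            single.append((items[0], len(items)))
    return True, single


min_count = 3  # Support_Minimum (75%) x |Init-CWSB| (4)

print(single_sequence_test(["qpr", "qrpr", "qp", "qprr"], min_count))  # CWSB(p)
print(single_sequence_test(["r", "r", "", "rr"], min_count))           # CWSB(pp)
```

For CWSB(p) the second elements are p, r, p and p, so the test fails; for CWSB(pp) the first elements merge into (r, 3) and the lone second r is below the threshold, so the single sequence is r:3.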

3.3.2.5 TCSM for Mining Periodic Sequential Access Patterns

The complete TCSM algorithm is shown in Fig. 3.8.

TCSM Algorithm

Input:

1: PC = [PCBT, CJ] – calendar-based periodic time constraint consisting of calendar model PCBT and calendar instance CJ

2: Support_Minimum – minimum support threshold

3: WATEDB = {WATEj | WATEj = (Tdj, WASPj), 1 ≤ j ≤ n} – web access transaction database, where WATEj is a web access transaction consisting of transaction time Tdj and web access sequence pattern WASPj

4: TE = {tej | 1 ≤ j ≤ n} – all access events in WATEDB

Output:

1: PSAPE – the set of periodic sequential access patterns

Method:

1: Assign PSAPE = Ø.

2: Use Preprocessing_Constraint to construct Init-CWSB (CWSB(STc), STc = Ø).

3: Use Event_Queue_Construction to construct the event queues for CWSB(STc).

4: Use Conditional_Sequence_Base_Testing to test for a single sequence in CWSB(STc).

a) If the test is successful, insert all ordered combinations of the items in the frequent sequence FSI = STc + Single_Sequence into PSAPE.

b) Otherwise, for each event tej in the Head Table of CWSB(STc), use Sub_CWSB_Construction to construct CWSB(STc+tej). Set STc = STc+tej and recursively mine CWSB(STc) from step 3.

5: Return PSAPE.

For instance, the complete set of periodic sequential access patterns mined with PC = [(Week_Day [1, 7], Hour_time [0, 23]), ({6, 7}, {21, 22})] and Support_Minimum = 75% is given in Table 3.3.
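
The recursive mining steps 3 to 5 can be sketched end to end in Python as below. This is a simplified reading of the algorithm under several assumptions: the constraint preprocessing has already produced Init-CWSB, sequences are strings of single-character events with count 1, the threshold is given as an absolute count, each prefix STc+tej is itself recorded as a pattern when it is extended, and "ordered combinations" is taken to mean the prefix followed by any non-empty subsequence of the single sequence. All function names are hypothetical, and the output of the sketch has not been checked against Table 3.3.

```python
from collections import Counter
from itertools import combinations


def head_and_prune(cwsb, min_count):
    """Frequent events of the base and the sequences with other events deleted."""
    counts = Counter()
    for seq in cwsb:
        counts.update(set(seq))
    head = sorted(e for e, c in counts.items() if c >= min_count)
    pruned = ["".join(e for e in seq if e in head) for seq in cwsb]
    return head, pruned


def single_sequence(cwsb, min_count):
    """Merged single sequence, or None if the sequences differ at some position."""
    single = ""
    for j in range(max((len(s) for s in cwsb), default=0)):
        items = [s[j] for s in cwsb if len(s) > j]
        if len(set(items)) != 1:
            return None
        if len(items) >= min_count:
            single += items[0]
    return single


def ordered_combinations(seq):
    """All non-empty subsequences of `seq`, keeping the original order."""
    return {"".join(c) for n in range(1, len(seq) + 1) for c in combinations(seq, n)}


def mine(cwsb, prefix, min_count, patterns):
    head, pruned = head_and_prune(cwsb, min_count)
    if not head:
        return
    single = single_sequence(pruned, min_count)
    if single is not None:
        patterns.update(prefix + c for c in ordered_combinations(single))
        return
    for event in head:
        patterns.add(prefix + event)  # the extended prefix is itself frequent
        sub = [s[s.find(event) + 1:] for s in pruned if event in s]
        mine(sub, prefix + event, min_count, patterns)


def tcsm(init_cwsb, min_count):
    patterns = set()
    mine(init_cwsb, "", min_count, patterns)
    return sorted(patterns)


print(tcsm(["pqspr", "tptqrpr", "qpqupt", "puqprur"], min_count=3))
```

On the example Init-CWSB with a minimum count of 3 (75% of 4 sequences), the sketch returns 13 patterns, each of which is contained as a subsequence in at least three of the four sequences.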

3.4 PERFORMANCE EVALUATION

This section compares the performance of the proposed approach with that of a conventional approach to sequential access pattern mining.


The performance of TCSM is compared with that of TWAPM, a version of the conventional WAP-mine algorithm adapted for mining periodic sequential access patterns. The underlying WAP-mine algorithm is a highly efficient algorithm that mines common sequential web access patterns using an effective data structure called the Web Access Pattern (WAP) tree, and it is an order of magnitude faster than the traditional Apriori-based algorithms. Therefore, only the TCSM and TWAPM algorithms are compared.

To handle calendar-based periodic time constraints, TWAPM first applies the Preprocessing_Constraint algorithm to obtain all the constraint-satisfied web access transactions from the original web access transaction database. The WAP-tree is then constructed from the constraint-satisfied transactions, and the WAP-mine algorithm is used to mine the periodic sequential access patterns.

The algorithms for sequential access pattern mining were implemented in Java. The experiments were run on a 3.0 GHz Pentium 4 PC with 512 MB of RAM, running Microsoft Windows XP Professional. The dataset is taken from the Microsoft anonymous web data, which was prepared for mining association rules. The data consist of a set of sessions, each of which holds the sequence of web pages referenced in that session.


The constraint PC = [(Week_Day [1, 7], Hour_time [0, 23]), ({6, 7}, *)] represents every hour of every weekend. Around 22,717 constraint-satisfied web access transactions are used for the performance measurement. Two experiments are carried out to evaluate the performance.

The first experiment measures the scalability of the two algorithms with respect to different support thresholds. It uses 22,716 constraint-satisfied web access transactions, with support thresholds ranging from 0.2% to 2.4%. Fig. 3.9 shows that the run time of TWAPM rises sharply as the support threshold decreases, and that TCSM takes less time than TWAPM.

The second experiment measures the scalability of the two algorithms with respect to different numbers of constraint-satisfied web access sequences. It uses a constant support threshold (0.2%) with databases of different sizes (from 5,000 to 22,727 constraint-satisfied web access sequences). The results in Fig. 3.9(b) show that the TCSM scales better

than the TWAPM although the

3.5 SUMMARY

In summary, this chapter has proposed an efficient approach, TCSM, for mining common and periodic sequential access patterns based on calendar-based periodic time constraints, which are used to define real-life time concepts. The performance of the TCSM algorithm and of the conventional WAP-mine-based algorithm (TWAP-mine) has been evaluated and compared. The experiments show that the TCSM algorithm is efficient and performs much better than the TWAP-mine algorithm, especially when the support threshold decreases and the number of web access sequences increases.