    Efficient frequent pattern mining based on Linear Prefix tree

Gwangbum Pyun a, Unil Yun a,*, Keun Ho Ryu b

a Department of Computer Engineering, Sejong University, Seoul, Republic of Korea
b Department of Computer Science, Chungbuk National University, Cheongju, Republic of Korea

Article info

    Article history:

    Received 24 April 2013

    Received in revised form 11 October 2013

    Accepted 12 October 2013

    Available online 24 October 2013

    Keywords:

    Data mining

    Frequent pattern mining

    Linear tree

    Pattern growth

    Knowledge discovery

Abstract

Outstanding frequent pattern mining guarantees both fast runtime and low memory usage with respect to various data of different types and sizes. However, it is hard to improve both elements since runtime is inversely proportional to memory usage in general. Researchers have made efforts to overcome this problem and have proposed mining methods which can improve both through various approaches. Many state-of-the-art mining algorithms use tree structures; they create nodes independently and connect them with pointers when constructing their own trees. Accordingly, these methods keep pointers for each node in their trees, which is inefficient since they must manage and maintain numerous pointers. In this paper, we propose a novel tree structure to overcome this limitation. Our new structure, LP-tree (Linear Prefix tree), is composed of array forms and minimizes pointers between nodes. In addition, LP-tree uses the minimum information required in the mining process and accesses the corresponding nodes linearly. We also suggest an algorithm applying LP-tree to the mining process. The algorithm is evaluated through various experiments, and the experimental results show that our approach outperforms previous algorithms in terms of runtime, memory, and scalability.

© 2013 Elsevier B.V. All rights reserved.

    1. Introduction

As a part of association rule mining, frequent pattern mining is a method for finding frequent patterns in large data [15]. The patterns obtained from mining operations are usefully utilized to analyze data characteristics or to gain information needed for decision-making. In addition, it can be applied in a variety of real data analyses such as web data [20], customer data in finance, correlation of product data, vehicle and communication data [9], bio data [13], hardware monitoring of computer systems [45], and regular pattern mining [28]. In pattern mining, a pattern is a set of items in a certain database, and the support of a pattern is defined as the number of transactions containing the pattern, where we regard patterns satisfying a given minimum support threshold as frequent ones. Apriori [1] and FP-growth [14] are fundamental algorithms in frequent pattern mining, and current studies proceed on the basis of these two algorithms. Moreover, numerous other methods have been suggested. First, there are methods using closed patterns such as BMCIF [8] and CEMiner [9], and those for maximal patterns such as MAFIA [5], FP-MAX [12], LFIMiner [16], MCWP [41], and MWS [42]. Furthermore, there exist other approaches for stream environments such as WMFP-SW [19], BSM [30], CPS-tree [31], and RPS-tree [32], and for utility patterns such as HUIPM [2], HUPMS [3], and UP-growth [34]. The following techniques apply item weights in the mining process: WARM [33], WAS [39], and MWFIM [40] are weight-based algorithms, and TIWS [7] adds weights with time. In addition, there is an approach which finds frequent patterns from the top support to the kth support without any given minimum support threshold. The method is called top-k pattern mining, and typical studies are MinSummary [18], PND [24], Chernoff [35], Topk-PU [43], SpiderMine [44], etc. In sequential pattern mining, which considers item sequence, there are SeqStream [6], StreamCloseq [10], ApproxMAP [17], TD-seq [22], CSP [27], WSpan [38], and so forth. U2P-Miner [23] mines uncertain data, and GAMiner [36] gives meaning to interesting patterns and then extracts those patterns. Developing an improved algorithm for frequent pattern mining can contribute to advancing mining performance in various mining fields.

FP-growth-based frequent pattern mining, such as FP-growth [12], Patricia-tree [26], and IFP-growth [21], has the following characteristics. FP-growth keeps connection information among all nodes in the FP-tree in order to search the nodes. Therefore, it has many pointers for connecting nodes, thereby using a lot of runtime and memory resources. In this paper, we therefore propose a novel tree structure, LP-tree (Linear Prefix tree), and an algorithm using the tree, called LP-growth, which can conduct mining operations more quickly and efficiently than previous algorithms. Our LP-tree can overcome the above limitation due to its special structure based on the linear form. We can obtain advantages by converting the tree's nodes into array forms. It can increase memory efficiency through arrayed nodes since they can reduce connection

0950-7051/$ - see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.knosys.2013.10.013

    Corresponding author. Tel.: +82 234082902.

E-mail addresses: [email protected] (G. Pyun), [email protected] (U. Yun), [email protected] (K.H. Ryu).

Knowledge-Based Systems 55 (2014) 125–139


information. We can also speed up item traversal times since LP-tree does not use pointers in most cases and generates a large number of nodes at once due to its linear structure. By applying the features of LP-tree to the mining process, we can obtain the following benefits: (1) The tree generation rate of our approach becomes faster than that of FP-growth since ours can create multiple nodes at once by a series of array operations, whereas FP-growth makes nodes one by one. (2) We can access parent or child nodes without corresponding pointers when searching trees since the nodes are stored in an array form. (3) Memory usage for each node becomes relatively small since LP-tree does not require internal node pointers. (4) It is possible to traverse trees more quickly compared to searching with pointers since our approach directly accesses the corresponding memory due to the feature of the array structure. This paper is organized as follows. In Section 2, we introduce related work with respect to LP-tree and LP-growth, and we describe details of our techniques and algorithm in Section 3. Next, we compare the performance of our algorithm with those of previous algorithms through various experiments in Section 4, and we finally conclude this paper in the last section.

    2. Related work

Frequent pattern mining extracts patterns with supports higher than or equal to a minimum support threshold, and many mining methods have been researched as mentioned above, but Apriori [1] and FP-growth [14] are still regarded as the underlying algorithms. Apriori is the oldest conventional mining algorithm, and it performs mining operations by extending pattern lengths. The algorithm generates candidate patterns through pattern extension in advance, and then confirms whether the candidates are actually frequent patterns by scanning the database. Consequently, Apriori has no choice but to scan the database as many times as the maximum length among the frequent patterns. UT-Miner [37] is an improved Apriori algorithm specialized for sparse data, where sparse data indicate that most transactions are different from each other. The algorithm uses an array structure, the unit triple, storing relations between items and transactions in a database to improve mining performance. However, UT-Miner does not guarantee fine performance in terms of runtime and memory usage since the algorithm is based on the Apriori method. On the other hand, FP-growth [14] solved the above problem by scanning the database only twice. It uses a tree structure, called FP-tree, which prevents the algorithm from generating candidate patterns. FP-tree consists of a tree for storing database information and a header table containing item names, supports, and node links. The tree is composed of nodes, where each of them includes an item name, a support, a parent pointer, a child pointer, and a node link. The node link is a pointer that connects all nodes with the same item to each other. Since the FP-growth algorithm was proposed, various algorithms have been published on the basis of it. FP-growth-goethals [11] is an FP-growth implementation optimized by Bart Goethals. To increase the efficiency of the search space in FP-growth, FP-growth-tiny [25] generates conditional FP-trees using conditional patterns without creating any conditional database. In CT-PRO [29], the authors suggested the Compressed FP-tree, adding a count array into the nodes of the FP-tree, where each entry of the array corresponds to the number of itemset occurrences. The algorithm mines frequent patterns using the information added in the tree without recursive calls. IFP-growth [21] enhanced the pruning effect with a new tree structure, FP-tree+, where the tree adds an address table to the FP-tree. Therefore, the algorithm decreases the number of conditional FP-trees, thereby improving mining speed. Meanwhile, it needs more information than the original FP-tree. In addition, IFP-growth does not improve memory efficiency although it contributes to reducing runtime. MAFIA-FI [5] saves data information in a bitmap form so as to reduce the number of tree searches. The bitmap is made up of two dimensions, where the x-axis means items and the y-axis means transactions. For example, a point (2,4) of a certain bitmap means that the second item exists in the fourth transaction. Thus, MAFIA-FI can compute the supports of patterns or items through AND operations on the bitmap without tree traversals. In addition, the algorithm can prevent creating needless trees with infrequent patterns and maximize pruning efficiency. However, MAFIA-FI requires more memory although its runtime is faster than that of the original method. Patricia-tree [26] also applies an array structure to a part of the FP-tree, where the algorithm generates paths with the same support as an array. Meanwhile, the LP-tree proposed in this paper constructs all paths as arrays regardless of the items' supports, where the shapes of the arrays vary depending on each transaction's form. FP-growth [12] proposed the FP-array with pattern information and increased pruning efficiency with it. The approach calculates the supports of patterns to be expanded in advance, and eliminates infrequent patterns effectively through the proposed FP-array. However, FP-growth also does not reduce the size of the trees since it still uses the original FP-tree-based structures. As a result, we need to develop a new tree structure to improve the fundamental performance of the mining algorithm. Consequently, we propose a novel tree and algorithm satisfying both runtime and memory efficiency. In our LP-tree, runtime and memory performance are more outstanding than those of FP-growth due to its special tree structure based on the array.
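The bitmap representation described for MAFIA-FI lends itself to a compact sketch. The following is only an illustration of the idea (integers as per-item bit rows, popcount as support), not the authors' implementation:

```python
# Illustrative sketch of bitmap-based support counting (MAFIA-FI style).
# Each item maps to an integer bitmap: bit t is set when the item occurs
# in transaction t. A pattern's support is then the popcount of the AND
# of its items' bitmaps -- no tree traversal is required.

def build_bitmaps(transactions):
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps

def support(pattern, bitmaps):
    # AND together the bitmaps of all items in the pattern.
    result = bitmaps.get(pattern[0], 0)
    for item in pattern[1:]:
        result &= bitmaps.get(item, 0)
    return bin(result).count("1")  # popcount = number of transactions

transactions = [["a", "b", "c"], ["a", "b"], ["b", "c", "d"]]
bitmaps = build_bitmaps(transactions)
print(support(["a", "b"], bitmaps))  # prints 2 ({a,b} is in transactions 0 and 1)
```

This mirrors the trade-off noted above: the AND operations are fast, but a bitmap row is kept per item, which costs extra memory.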

    3. Frequent pattern mining based on Linear Prefix-tree

In this section, we present the details of the LP-growth algorithm and its related techniques. The algorithm conducts mining operations with LP-tree and the corresponding pattern growth methods.

    3.1. Preliminaries

Given a transaction database D, I = {i_1, i_2, . . ., i_n} is the set of items composing D, and D consists of multiple transactions. Each transaction has a unique set of items, and D assigns a unique ID, called a TID, to each transaction. A pattern is defined as a subset (or the whole) of I. Assuming that a pattern P has several items and its first and last ones are i_b and i_e respectively, P is denoted as follows:

P = {i_b, . . ., i_e}, 1 ≤ b < e ≤ n.

P's support means the number of transactions in D containing P; in other words, it indicates how often P occurs in D. Let |P| be the number of transactions including P and |D| be the number of all transactions in D. Then, we can calculate P's support rate, sup(P), as follows:

sup(P) = |P| / |D|,

where 0 ≤ sup(P) ≤ 1. P is regarded as a frequent pattern if sup(P) is not smaller than a given minimum support threshold (or minsup). Denoting the frequent pattern as L, it is included in I and satisfies sup(L) ≥ minsup, where 0 ≤ minsup ≤ 1:

L = {P ⊆ I | sup(P) ≥ minsup}.

For instance, given a database {{TID1: a,b,c}, {TID2: a,b}, {TID3: b,c,d,e}}, I becomes {a, b, c, d, e}. If the minimum support threshold is 60%, the pattern {a,b} is frequent since it appears in TID1 and TID2; thus, its support is higher than the threshold. Meanwhile, another



pattern, {a,c}, is infrequent because it is contained only in TID1; therefore, its support is lower than the threshold.
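These definitions can be written out directly in Python; the sketch below uses the three-transaction toy database and the 60% threshold from the example (the brute-force enumeration of L is for illustration only and is exponential in |I|):

```python
from itertools import combinations

# Toy database from the example: TID -> set of items.
database = {1: {"a", "b", "c"}, 2: {"a", "b"}, 3: {"b", "c", "d", "e"}}
minsup = 0.6  # minimum support threshold (60%)

def sup(pattern):
    # sup(P) = |P| / |D|: the fraction of transactions containing P.
    count = sum(1 for items in database.values() if pattern <= items)
    return count / len(database)

print(sup({"a", "b"}))  # 2/3: frequent, since it is at least 0.6
print(sup({"a", "c"}))  # 1/3: infrequent, only TID1 contains it

# L = {P subset of I | sup(P) >= minsup}, enumerated by brute force:
I = sorted(set().union(*database.values()))
L = [set(p) for r in range(1, len(I) + 1)
     for p in combinations(I, r) if sup(set(p)) >= minsup]
print(len(L))  # 5 frequent patterns: {a}, {b}, {c}, {a,b}, {b,c}
```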

    3.2. LP-tree: a novel tree structure for mining frequent patterns

There are several limitations in regard to previous frequent pattern mining methods. Basically, frequent pattern mining has to find all frequent patterns in a transaction database. Thus, in the worst case, the method should extract 2^n − 1 patterns, since all of the n items in the database may be frequent. Moreover, Apriori-based algorithms consume more time and memory due to the generation of candidate patterns, while FP-growth spends most of its time traversing and generating trees. Note that the problem of Apriori is not under consideration here; we focus on the FP-growth approach because FP-growth-based algorithms generally perform better than Apriori-based algorithms. To improve performance, we should decrease tree traversal and generation times. To do that, tree structures need to have a simple form, and each node in the trees has to occupy less memory space. Our LP-tree satisfies both requirements, so LP-growth with this tree structure can conduct tree creation and search efficiently.

Definition 1 (LP-tree (Linear Prefix tree)). LP-tree has the following structure: (1) a header list consisting of item names, supports, and node links, (2) Linear Prefix Nodes (LPNs) for storing the frequent items of each transaction together with a corresponding header, and (3) a Branch Node List (BNL) including information on branch nodes and their child nodes. An LP-tree consisting of c LPNs has the following structure:

LP-tree = {Header list, BNL, LPN_1, LPN_2, . . ., LPN_c}.

LP-tree entirely has a linear structure. Each set of frequent items is saved into nodes composed in an array form, where we use multiple arrays since one array structure cannot express items as a tree form with many branches. To connect the arrays, every array has a header in its first part, where the header indicates its parent array. An LPN contains a header and an array node storing a pattern, and the array node consists of several internal nodes. Moreover, the header of an LPN can indicate the root of the tree when the LPN is the first one inserted in the tree. Details of LPN are given in Definition 2. LP-tree is composed of one or more LPNs as shown in Fig. 1.

Definition 2 (LPN (Linear Prefix Node)). LPN is the fundamental structure of LP-tree. In an LPN, there are multiple internal nodes and a header in the top position of the LPN. Let Parent_Link, i, S, L, and b be a parent node pointer connected to another LPN (the parent LPN), an item, a support, a node link, and branch information, respectively. Then, the following Eq. (1) represents how an LPN is composed, where the information of each internal node is described between ⟨ and ⟩:

LPN = {⟨Parent_Link⟩, ⟨i_1, S, L, b⟩, ⟨i_2, S, L, b⟩, . . ., ⟨i_n, S, L, b⟩}.  (1)

LPN stores item information in each node. That is, if certain items {i_1, i_2, . . ., i_n} are added to an LPN, its array node has n internal nodes. In this process, finite internal nodes are generated according to the number of inserted items, thereby dealing with the whole pattern information. In Eq. (1), the parent node of ⟨i_n, S, L, b⟩ is ⟨i_{n−1}, S, L, b⟩, and the child node of ⟨i_{n−1}, S, L, b⟩ is ⟨i_n, S, L, b⟩. Parent_Link is a pointer indicating the parent node of ⟨i_1, S, L, b⟩, which is the first node of the LPN. The parent node connected to the Parent_Link becomes either a root or a node of another LPN. We define the symbol p_{c,k} to express a pointer to a specific node inside an LPN. Given the cth LPN with n nodes, LPN_c, p_{c,k} indicates the kth node of LPN_c (k = [0, . . ., n]), and p_root is a pointer to the root (the 0th node is the header node storing Parent_Link). If the parent of the first node in any LPN is the 5th node of LPN_1, then its Parent_Link becomes p_{1,5}. An internal node of an LPN has four elements as in Eq. (1), where each subscript indicates the corresponding ordinal number. The header of an LPN is linked to a branch node of its parent; thus, we can obtain patterns by tracking headers. A node link, L, plays a role in concatenating nodes with the same item. LPN does not have any pointers for connections among internal nodes. A branch node has two or more child nodes; therefore, LPN uses a branch node in order to express nodes having two or more child nodes. The b is used as a flag value to mark whether branch nodes exist or not. An LPN cannot manage two or more child nodes due to the array's limitation. For this reason, we propose and use BNL for managing the branch nodes in order to deal with multiple child nodes.
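Read literally, Eq. (1) maps onto a plain array-of-records layout. The sketch below is one possible rendering in Python (class and field names are ours, chosen to mirror the notation above; it is not the authors' code):

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Pointer = Optional[Tuple[int, int]]  # (LPN index, node index); None = root

@dataclass
class InternalNode:
    # One <i, S, L, b> entry: item name, support, node link to the next
    # node holding the same item, and a branch flag.
    item: str
    support: int = 1
    node_link: Pointer = None
    branch: bool = False

@dataclass
class LPN:
    # Header (Parent_Link) followed by an array of internal nodes.
    # Parent/child relations inside the array are implicit: the parent of
    # nodes[k] is nodes[k - 1] and its child is nodes[k + 1] -- no pointers.
    parent_link: Pointer = None
    nodes: List[InternalNode] = field(default_factory=list)

# An LPN holding the item sequence {e, a, f}, hanging off the root:
lpn = LPN(parent_link=None,
          nodes=[InternalNode("e"), InternalNode("a"), InternalNode("f")])
k = 1
print(lpn.nodes[k].item, lpn.nodes[k - 1].item, lpn.nodes[k + 1].item)
# prints: a e f  (a node, its parent, and its child, by index arithmetic)
```

Note how the node itself carries no parent or child pointer; only the header's Parent_Link crosses LPN boundaries.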

Definition 3 (BNL (Branch Node List)). BNL helps manage numerous branch nodes when generating an LP-tree. When the items of each transaction are inserted, they are sequentially input from the root, and several branches can occur in this process. If the current position reaches a branch node during the insertion, we confirm all child nodes of the branch node and then move to the appropriate location by referring to the BNL information. We can easily access multiple child nodes through BNL, which is constructed in list form and stores only the information of branch nodes and their child nodes. BNL is composed of a branched node table and child node lists, where the branched node table stores pointers to all branched nodes, and each element (pointer) stored in it has one child node list. The child node list has the child node pointers of the corresponding branched node. Therefore, assuming that LP-tree has i branched nodes, B_i is the pointer indicating the ith branched node, and C_{i,j} is a pointer to the jth child node of B_i. Hence, BNL consists of the branched node table = {B_1, B_2, . . ., B_i} and the child node lists = {{C_{1,1}, C_{1,2}, . . .}, {C_{2,1}, C_{2,2}, . . .}, . . ., {C_{i,1}, C_{i,2}, . . ., C_{i,j}}}. After matching them using the symbol →, we can denote BNL as follows, where {B_i → C_{i,1}, C_{i,2}, . . ., C_{i,j}} means that B_i indicates the set of child node pointers {C_{i,1}, C_{i,2}, . . ., C_{i,j}}:

BNL = {{B_1 → C_{1,1}, C_{1,2}, . . .}, {B_2 → C_{2,1}, C_{2,2}, . . .}, . . ., {B_i → C_{i,1}, C_{i,2}, . . ., C_{i,j}}}.  (2)

Fig. 2 shows the entire BNL structure, where the structure is mapped to Eq. (2). We can also express the set of child node pointers for B_i stored in BNL as the following equation:

Fig. 1. Structure of Linear Prefix Nodes (LPNs).


BNL(B_i) = {C_{i,1}, C_{i,2}, . . ., C_{i,j}}.

BNL has as many child node pointers as there are child nodes of the branch nodes. The child node pointers stored in BNL are sorted in item name order so that a binary search can be conducted. Thus, we can directly access internal child nodes with no branches, while the other child nodes, which involve branches, are accessed indirectly through BNL.
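One possible reading of this lookup in Python: a table mapping each branch-node pointer to its child pointers, kept sorted by the child's item name so the matching child can be located by binary search. The (lpn_id, node_index) pointer encoding and the sample entries are hypothetical:

```python
import bisect

# Sketch of a BNL: branch-node pointer -> child pointers sorted by the
# child's item name. Pointers and entries here are made-up examples.
bnl = {
    ("root", 0): [("LPN1", 0, "b"), ("LPN2", 0, "e")],
}

def find_child(branch_ptr, item):
    # Binary-search the sorted child list of a branch node.
    children = bnl.get(branch_ptr, [])
    keys = [child[2] for child in children]
    pos = bisect.bisect_left(keys, item)
    if pos < len(keys) and keys[pos] == item:
        return children[pos]
    return None  # no matching child: insertion would create a new LPN here

print(find_child(("root", 0), "e"))   # ('LPN2', 0, 'e')
print(find_child(("root", 0), "x"))   # None
```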

Example 1. Given the database shown in Table 1, when the 4 sorted transactions from TID 1 to TID 4 are inserted into an LP-tree, the tree has 4 LPNs: LPN_1 = {e,a,f,c,d,g,h}, LPN_2 = {b,a,f}, LPN_3 = {c,h}, and LPN_4 = {b,f,g}, where the root of the tree and the node of LPN_1 having item e become branch nodes. Then, B_1 of the branched node table in Fig. 2 becomes p_root. C_{1,1} becomes the second child node of the root, i.e., the node of LPN_4 with b, p_{4,1}. B_2 is assigned to the node of LPN_1 with e, denoted as p_{1,1}, and C_{2,1} becomes the node of LPN_2 with b, which is the second child node of the node of LPN_1 having e, denoted as p_{2,1}. Similarly, C_{2,2} is p_{3,1}, which is the node of LPN_3 with c.

Definition 4 (Header list). One of the elements of LP-tree, the header list, has the information needed for mining patterns from the tree, where the list consists of item names, item supports, and node links. The item name denotes the name of an item contained in the LP-tree, and the item support means the number of nodes with that item name in the tree. For example, if an item name is a and its support is 5, item a occurs 5 times in the tree. The node link is connected to the first node among all of the nodes with the same item in the tree, and the first node is then concatenated with the second node, and so on. When the connections terminate, one chain has been generated concatenating all the nodes with the same item.

    3.3. Constructing LP-tree

    In this section, we describe a method for creating LP-tree. Treeconstruction is conducted as follows. We scan a database and

    count all item supports. Thereafter, we sort all items in their sup-

    port descending order and then generate a corresponding header

    list, where the list stores items according to the sorted order.

    Namely, the upper items in the list have greater supports while

    the lower ones have smaller values. The insertion approach of

    LP-trees transaction is divided into the two cases. The former

    one is that the first transaction is inserted into the LP-tree. Its pro-

    cedure is as follows. First, we generate LP-tree by scanning the

    database again and sort the first transaction depending on the se-

    quence of the header list. That is, its items with smaller support

    than the minimum support are deleted, and the remaining items

    are sorted in support descending order. After that, we insert the

    sorted transaction into the tree, where LPN is created and con-

    nected to a root since the tree is initially empty. Then, the first

    transaction is entered into one LPN, which has internal array nodes

    as many as the transaction length. That is, if any transaction length

    is n and all items of the transaction are inserted in one LPN, the size

    of LPN isn + 1 including a header. We connect LPNs header to its

    parent after inserting the transaction items, where the header is

    linked to a root since the current LPN is initially added to the tree.

    We add a pointer of the root to thebranch node table and store the

    first node of the newly created LPN into the child node list con-

    nected to the root pointer. The second case is when all of the trans-

    actions except for the first one are added in the tree. Its insertion is

    performed as follows. We remove infrequent items in the inserted

    transaction and sort its frequent items in support descending or-

    der. Next, we add into BNL the addresses of the root and the first

    node (i.e. header) of the current LPN since the root makes a child

    node and thereby a branch occurs. Then, we insert the transaction

    comparing corresponding paths from the root. Thereafter, we con-

    firm all the child nodes of the root with BNL information since the

    previous transaction is already added in the tree, i.e. the root has

    one or more child nodes, where, we initially check the internal

    child node of the current LPN. If the item to be inserted is the same

    as the item of the checked node (the internal node), the current

    location moves to that node, and its support is increased by 1.

    Otherwise, to confirm the other child nodes, we read the corre-sponding branch information in BNL and then increase support of

    the current item by 1 if the item is equal to the node derived from

    BNL. If it is not equal to that, we generate a new LPN and insert

    remaining items of the transaction in the new LPN, where the cur-

    rent node becomes a branch node and is added in BNL. Assuming

    thatn is the length of any transaction and ris the number of items

    already inserted in the previous LPN, we store the remaining items

    in the new LPN at the same time, where the number of array nodes

    in the LPN is nr+ 1 (including a header). To store all transactions

    with no problems, LP-tree connects all of its nodes in one of two

    ways. First, internal nodes of LPN are directly connected to each

    other without any pointer. Second, when any branch occurs, LP-

    tree links corresponding child nodes utilizing BNL. Processing all

    transactions, we can gain a complete LP-tree. Once the LP-tree con-struction terminates, BNL is eliminated because it is not used any

    longer. LP-tree generated by the above processes can store all

    transactions in a given database, and all internal and external

    nodes of LPNs can be connected by the following Lemma 1.
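The two-case insertion procedure above can be condensed into a small executable model. This is a simplification for illustration only (LPNs as Python lists, pointers as (lpn index, node index) pairs, BNL as a dict keyed by branch pointer); it follows the described control flow but omits node links, the sorted child lists, and the final BNL removal:

```python
from collections import Counter

ROOT = ("root", 0)

def build_lp_tree(database, minsup_count):
    # Scan 1: count item supports and fix the support-descending order.
    counts = Counter(it for tx in database for it in tx)
    rank = {it: r for r, (it, c) in enumerate(
                sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))
            if c >= minsup_count}

    lpns = []  # LPN i: {"parent": ptr, "nodes": [[item, support], ...]}
    bnl = {}   # branch pointer -> {first child item: child pointer}

    def new_lpn(parent, items):
        # Create all remaining nodes of a transaction at once (one array).
        lpns.append({"parent": parent, "nodes": [[it, 1] for it in items]})
        bnl.setdefault(parent, {})[items[0]] = (len(lpns) - 1, 0)

    # Scan 2: insert each transaction, filtered and sorted by header order.
    for tx in database:
        items = sorted((it for it in tx if it in rank), key=rank.get)
        cur, i = ROOT, 0
        while i < len(items):
            item = items[i]
            # (1) direct access to the internal child of the current LPN
            if cur != ROOT:
                li, ni = cur
                if (ni + 1 < len(lpns[li]["nodes"])
                        and lpns[li]["nodes"][ni + 1][0] == item):
                    lpns[li]["nodes"][ni + 1][1] += 1
                    cur, i = (li, ni + 1), i + 1
                    continue
            # (2) otherwise look up the branch children through BNL
            child = bnl.get(cur, {}).get(item)
            if child is not None:
                lpns[child[0]]["nodes"][child[1]][1] += 1
                cur, i = child, i + 1
                continue
            # (3) no match: store all remaining items in one new LPN
            new_lpn(cur, items[i:])
            break
    return lpns, bnl

lpns, bnl = build_lp_tree([["a", "b", "c"], ["a", "b", "d"], ["b", "e"]], 1)
print(lpns[0]["nodes"])  # [['b', 3], ['a', 2], ['c', 1]] -- shared prefix path
```

On this toy input, the shared prefix b, a lives in one LPN, and the divergent suffixes d and e each become a one-node LPN registered under their branch points.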

Lemma 1. We can access all internal nodes in an LPN without any pointer connecting parent nodes with child nodes, while nodes of other LPNs can be linked through the LPN's header and BNL.

Proof. Since an LPN is composed of array nodes, we can find nodes directly without pointers due to the characteristics of the array. Given any node d, its parent node and child node are denoted as d − 1 and d + 1, respectively. However, d + 2 indicates d's descendant node, not d's second child node, since the array has only one child
    Fig. 2. Structure of Branch Node List (BNL).

Table 1
A transaction database.

TID | Original items | Sorted items
1 | a, c, d, e, f, g, h | e, a, f, c, d, g, h
2 | a, b, e, f | e, b, a, f
3 | c, e, h | e, c, h
4 | b, f, g | b, f, g
5 | a, b, d, e, g | e, b, a, d, g
6 | e, g | e, g
7 | b, c, e, f | e, b, f, c
8 | a, b, c, e, f | e, b, a, f, c
9 | a, d, e | e, a, d
10 | b, d, e | e, b, d



is allocated, both of the trees use O(|freq(Trans)|) with respect to the time to store item information. Thus, the runtime after storing the transaction becomes O(2 · |freq(Trans)|) in FP-tree and O(|freq(Trans)| + 1) in LP-tree. In other words, since FP-tree generates and records nodes one by one, it uses time O(2 · |freq(Trans)|). However, since LP-tree generates its nodes at once, node generation needs only O(1), and the total time becomes O(|freq(Trans)| + 1) after adding the information recording time. In the transaction insertion step, LP-tree is more efficient than FP-tree when the nodes related to the items of an inserted transaction have 3 or fewer child nodes on average, according to the following Lemma 2.

    Lemma 2. Let a and b be search times needed when we insert acertain transaction in LP-tree, and FP-tree, respectively. Then, it is true

    thata < bif the average number of child nodes related to the inserted

    transaction is not greater than 3.

    Proof. In the transaction insertion, we first confirm whether or not

    a certain itemto be inserted after the current node exists among its

    child nodes, and then, the current position moves to a correspond-

    ing node or a new child node is created according to the result. Let

    n,c, and K be the number of nodes which we have to visit to inserta transaction, the number of child nodes for each visited node, and

    a set of c, respectively. Then, K is denoted as the followingequation.

    K fc1; c2; . . . ; cng; cP 1:

    To check whether the next inserted item exists among the child

    nodes, FP-tree finds child nodes through the binary search method.

    Since it accesses child node pointers and then visits corresponding

    nodes, these processes require 2 Pn

    i1lgcitimes, andn times areadditionally considered because we should move to the next nodes

    as many as the number of the visited nodes, n. Therefore, the total

    search time of FP-tree, b is as follows.

    b 2 Xni1

    lgci n:

LP-tree directly accesses the internal child node of the current LPN, while it indirectly traverses the other child nodes through BNL. That is, LP-tree first checks whether the item to be inserted is equal to that of the internal node, and then it accesses the other child nodes using BNL if there is no matching item. Since searching for child nodes in BNL is based on binary search as in FP-tree, LP-tree needs n steps for traversing the child nodes in BNL and 2·Σ_{i=1}^{n} lg(c_i − 1) steps to search BNL, where LP-tree needs lg(c_i − 1) instead of lg(c_i) due to the advantage of the internal nodes. In addition, when the current location moves to the next inserted nodes, LP-tree requires n − 1 steps, not n, since it can directly access the nodes if they exist in the current LPN. Accordingly, the total time of LP-tree, a, is denoted as follows:

a = 2·Σ_{i=1}^{n} lg(c_i − 1) + (n − 1) + n.

The relation a < b is then equal to

2·Σ_{i=1}^{n} lg(c_i − 1) + (n − 1) + n < 2·Σ_{i=1}^{n} lg c_i + n.

Solving this is as follows:

Σ_{i=1}^{n} lg c_i − Σ_{i=1}^{n} lg(c_i − 1) > (n − 1)/2,

Σ_{i=1}^{n} (lg c_i − lg(c_i − 1) − 1/2) > −1/2.

In the above inequality, each term lg c_i − lg(c_i − 1) − 1/2 should not be smaller than 0 for the formula to be guaranteed. Thus, if c_i is less than approximately 3.414215 (= 2 + √2), the formula lg c_i − lg(c_i − 1) > 1/2 is satisfied. Consequently, it is certain that a < b when the average number of child nodes is not greater than 3. □

In Section 4.2, we will show the experimental results of calculating the average number of child nodes for the LP-trees generated from various datasets, where we will see that the number does not exceed 3 in any case.

Next, we compare runtimes with regard to searching the trees. FP-tree uses O(1) since it can find any target node directly using the node link. LP-tree also consumes O(1) because of the node link. However, a difference occurs when the two trees search from the item selected by the node link up to the root. Here, we have to consider the time cost depending on whether the current structure is an array-based or a pointer-based form. In the array-based form (LP-tree), all data is stored contiguously; therefore, it can directly access any node by addressing the corresponding real memory location. On the other hand, using the pointer-based form (FP-tree), we have to access a node indirectly: we first confirm where the pointer is stored, and then we approach the corresponding memory. That is, the first requires one access, while the second needs two tasks [4]. Thus, assuming that t is the memory access time, the direct approach uses O(t) while the indirect one uses O(2t). Therefore, considering all of the above results, we know that LP-tree is faster than FP-tree in most cases.

Fig. 4. A state of LP-tree after inserting 310 transactions additionally to Fig. 3.

130 G. Pyun et al. / Knowledge-Based Systems 55 (2014) 125–139
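The one-access versus two-access contrast described above can be illustrated with hypothetical stand-ins for the two storage forms (the names are ours, not the paper's implementation):

```python
# Contiguous (LPN-style): the parent of the node at slot k in the same array
# is simply slot k - 1, so a single direct access suffices.
lpn = ["i1", "i2", "i3", "i4"]  # array nodes, root side first

def parent_in_lpn(index):
    return lpn[index - 1] if index > 0 else None  # one memory access

# Pointer-based (FP-tree-style): each node stores a reference to its parent,
# so we first read the pointer field and then dereference it: two accesses.
class FPNode:
    def __init__(self, item, parent=None):
        self.item = item
        self.parent = parent  # pointer field read before the node itself

root = FPNode("root")
n1 = FPNode("i1", root)
n2 = FPNode("i2", n1)

assert parent_in_lpn(2) == "i2"      # direct slot access
assert n2.parent.item == "i1"        # pointer read, then dereference
```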

3.4.2. Integrating LPN

In the previous section, we learned how to create LP-tree. However, this method can cause fragmentation of LPNs since each transaction is processed individually without comprehensive considerations. That is, a transaction may be stored in multiple LPNs even though it could be inserted into only one LPN. For example, assuming that we insert two transactions, {a, b, c} and {a, b, c, d}, into an empty LP-tree, the first one is fully stored in one LPN. Thereafter, for the second transaction, a branch occurs at item c; a new LPN is then created, and the remaining item d is added to that LPN. Thus, the second LPN has a small number of array nodes. To generate as many internal nodes as possible for each LPN, we can consider attaching nodes at the very end of the LPN. Through the LPN integrating operations, certain nodes are inserted at the very end of an LPN. Let I = {i_1, i_2, ..., i_n} be any itemset to be added and a be an item of the internal nodes of the LPN, i.e., LPN = {⟨Parent_Link⟩, ⟨a_1, S, L, b⟩, ⟨a_2, S, L, b⟩, ..., ⟨a_m, S, L, b⟩}, m < n.

Then, in order to apply the operation, the following conditions have to be satisfied: (1) the length of the inserted itemset is longer than that of the target LPN (i.e., m < n); (2) the items of the internal nodes in the LPN are equal to the upper part of the inserted items; and (3) the sequence of the common part is consistent (i.e., i_1 = a_1, i_2 = a_2, ..., i_m = a_m, 1 ≤ m < n). If these conditions are completely satisfied, we conduct the item insertion according to the following process: (1) the supports of the common part between them increase by 1; (2) we assign an array for a new LPN with the length computed from the inserted items; (3) all the nodes of the previous LPN are inserted into the new LPN; (4) the remainder of the itemset is added at the very end of the new LPN; and (5) the previous LPN is deleted. By applying this technique, we can make LPNs have more array nodes compared with the previous LPNs. If the shapes of the transactions composing a database are similar to each other, the LPN integrating operations are needed more often, and the LPNs' lengths become longer whenever these operations are performed. Since the LPN integrating technique is used only when the length of an inserted transaction is longer than that of the target LPN, the longer the LPN is, the lower the possibility of an LPN integrating operation becomes.
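The three conditions and five steps above can be sketched as follows. This is a simplified, hypothetical model in which an LPN is just a list of [item, support] array nodes; parent links, node links, and BNL bookkeeping are omitted.

```python
def try_integrate(lpn, itemset):
    """Merge `itemset` into `lpn` if the three integrating conditions hold;
    return the (possibly new) LPN."""
    m, n = len(lpn), len(itemset)
    # Conditions (1)-(3): itemset longer than LPN, and common prefix matches
    if m < n and all(lpn[i][0] == itemset[i] for i in range(m)):
        for node in lpn:                    # step (1): raise common supports
            node[1] += 1
        new_lpn = [list(node) for node in lpn]           # steps (2)-(3): copy
        new_lpn += [[item, 1] for item in itemset[m:]]   # step (4): append rest
        return new_lpn                      # step (5): caller drops the old LPN
    return lpn                              # conditions not met: no change

lpn = [["a", 1], ["b", 1], ["c", 1]]
merged = try_integrate(lpn, ["a", "b", "c", "d"])
assert merged == [["a", 2], ["b", 2], ["c", 2], ["d", 1]]
```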

Example 4. Let {a, b, c, d, e, f} be a set of items to be inserted into LP-tree. Then, assume that, as shown in Fig. 5(a), LPN_1 is connected to the root, and LPN_2 and LPN_3 are linked to the node with b in LPN_1. Inserting the set of items without the LPN integrating technique, the corresponding LP-tree is shown in Fig. 5(b); in short, the number of LPNs increases from 3 to 4. In contrast, using the technique, the resulting LP-tree becomes Fig. 5(c). That is, the number of LPNs is not increased since LPN_1 is rebuilt through the series of tasks mentioned above.

3.5. Mining frequent patterns based on LP-tree (LP-growth)

LP-growth searches LP-tree and creates a conditional LP-tree for mining frequent patterns. To do that, our algorithm first selects the bottom item from the header list and traverses the nodes connected to the corresponding node links. Then, the supports of the visited nodes are stored, and the nodes from each linked node up to the root are searched. Each node can be accessed directly if the search is conducted within one LPN. In other words, given a current node N_k, the algorithm immediately accesses N_{k−1} to reach the parent node of N_k. Iterating the traversal over one LPN, the algorithm reaches the header of the LPN, which refers to its parent node, i.e., another LPN. LP-growth stops if the next position of the header is the root; otherwise, it continues to find nodes by tracking the other LPN linked from the header. If a header indicates the root, the corresponding path has been searched completely, and the items in the path become a conditional transaction with the support of the first visited node. After visiting all of the other nodes referred to by the node links, the algorithm constructs a conditional pattern base (conditional database) from the obtained results. After that, we compute the item supports in the conditional database, and items are eliminated from the database if their supports are less than the given minimum support threshold. Each transaction of the conditional database is sorted in support-descending order, and then a new LP-tree is generated from the sorted database. The new one becomes a conditional LP-tree and includes a prefix itemset, i.e., a frequent item or pattern selected in the previous phase.

Fig. 5. Item insertion applying the LPN integrating strategy.

If a certain LP-tree forms a single path, all combinations of the tree are considered as frequent patterns, in common with the FP-growth approach. Therefore, in this case, the algorithm extracts frequent patterns by joining the prefix itemset and each of the combinations.
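The single-path shortcut above can be sketched in a few lines. This is an illustrative Python stand-in, not the paper's C++ implementation; `prefix` and `path` are hypothetical names.

```python
from itertools import combinations

def single_path_patterns(prefix, path):
    """If a conditional LP-tree collapses to a single path, every combination
    of the path items joined with the prefix is a frequent pattern."""
    patterns = []
    for r in range(1, len(path) + 1):
        for combo in combinations(path, r):
            patterns.append(tuple(prefix) + combo)
    return patterns

# Path {a, b} under prefix {e} yields {e,a}, {e,b}, and {e,a,b}:
pats = single_path_patterns(("e",), ("a", "b"))
assert pats == [("e", "a"), ("e", "b"), ("e", "a", "b")]
```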

Searching trees in FP-tree generally requires numerous pointer uses since it must use a pointer to move from any node to another. Meanwhile, LP-tree can minimize the number of pointer uses through the LPN strategy, which is proved in the following Lemma 3.

Lemma 3. When any tree is traversed in a bottom-up manner, the number of pointer uses in LP-tree is lower than or equal to that of FP-tree.

Proof. Assuming that n is the length of a certain path from any node to the root, FP-tree needs n pointer uses in any case since it has to pass through n pointers along the path. LP-tree, in contrast, consists of one or more LPNs and uses pointers (i.e., headers) only when new branches occur. Thus, LP-growth uses a pointer only when visiting the header of each LPN. Here, we can consider the two cases shown in Fig. 6(a) and (b). The first is when every visited LPN has one node. Let |K_c| be the number of headers, i.e., LPNs, and |N| be the number of nodes. Then, the number of visited locations, R, is denoted as R = |K_c| + |N|. In the first case in Fig. 6, we have to visit the headers of the LPNs |N| times. FP-tree refers to the variables in which the parent pointers are stored so as to access the parent nodes, as shown in Fig. 6(c). Therefore, in FP-tree, the total number of pointer uses from a certain node to the root becomes 2·|N|, considering not only the pointer accesses but also the node visits. In case 1, LP-tree refers to headers in order to approach parent nodes just as FP-tree does, where R = 2·|N| because |K_c| = |N|. Therefore, this is regarded as the worst case, and LP-tree needs as many pointer uses as FP-tree. The number decreases continuously as the size of the LPNs increases. In case 2 of Fig. 6, we can calculate R = 1 + |N| since all the visited nodes belong to a single LPN and we visit only one header. Thus, if the path from any node to the root lies within one LPN, we visit one header regardless of the number of items. This is the best case: FP-tree uses |N| pointers while LP-tree needs only one pointer use. □

Fig. 6. Cases of tree searches in LP-tree and FP-tree.

Fig. 7. Algorithm for LP-tree construction.

Example 5. In Fig. 6, (a) and (b) show certain LPNs in LP-tree, and (c) is a part of FP-tree. Note that they contain the same data (items). When they search items from i4 to their own roots, Fig. 6(a) traverses the LPNs in the following sequence: ⟨i4, 1, NULL, false⟩, ⟨gp_{3,1}⟩, ⟨i3, 1, NULL, false⟩, ⟨gp_{2,1}⟩, ⟨i2, 1, NULL, false⟩, ⟨gp_{1,1}⟩, ⟨i1, 1, NULL, false⟩, and ⟨gp_root⟩. Thus, the number of memory accesses (i.e., pointer uses) needed for searching the nodes is 8. Meanwhile, Fig. 6(b) searches them in the following sequence: ⟨i4, 1, NULL, false⟩, ⟨i3, 1, NULL, false⟩, ⟨i2, 1, NULL, false⟩, ⟨i1, 1, NULL, false⟩, and ⟨gp_root⟩, thereby accessing the memory 5 times. Since Fig. 6(c) uses pointers to traverse 4 nodes, it needs 4 pointer accesses and 4 node accesses; thus, its total number of memory accesses is 8.

3.6. LP-growth algorithm

To mine frequent patterns, LP-growth constructs LP-tree with the algorithm Construct_LP-tree shown in Fig. 7. The algorithm first scans D to calculate the items' supports and generates a header list (lines 1–3), and thereafter, D is scanned again to construct LP-tree (line 4). LP-growth performs item insertion starting from the root (lines 5–42). If there is an item matched with one of the child nodes of the root in BNL (line 6), the algorithm moves to the corresponding node and increases its support (line 7). Otherwise, a new LPN is generated (lines 9–12). After that, the algorithm confirms whether the current node, gp_{c,r}, is a branch node (line 15). If gp_{c,r} is not a branch node, it has one child node or none, so the algorithm refers to the next node, gp_{c,r+1}. It increases the corresponding node's support by 1 if gp_{c,r+1} is equal to i_k, i.e., the child node has the same item as the inserted one (line 17). If they are not equal, LP-growth creates a new LPN, where gp_{c,r} becomes a new branch node (lines 25–29). After generating the new LPN with the size of the inserted itemset, the algorithm inserts the remaining items into the new LPN (line 26). Then, its branch information is recorded in BNL (line 27). In the LPN integrating procedure (lines 18–23), LP-growth conducts the LPN integrating operations if there are remaining items after the insertion into the current LPN. Since i_n is the last item, if i_k is not the last one, there are still items to be inserted. The algorithm generates a new LPN with the length of the inserted itemset (line 19) and copies the node information of the previous LPN to the new one (line 20). Thereafter, it stores the remaining items in the new LPN (line 21) and deletes the previous LPN (line 22). When the current node is a branch node, the steps corresponding to the branch node operations are performed (lines 31–42). If the next array node has an item equal to the inserted item (line 32), the item's support is increased by 1, and the current position moves to the next one (line 33). In case gp_{c,r+1} is not equal to the item to be inserted, the algorithm checks the child nodes with BNL. LP-growth finds the location corresponding to gp_{c,r} in BNL and searches the child nodes. If any of the found child nodes has an item matched with i_k (line 34), the algorithm increases its support by 1 and regards the corresponding node as the new current node (line 35). If no such node exists in BNL, the algorithm makes a new LPN and inserts the remaining items (lines 38–42).

Fig. 8 shows the overall LP-growth algorithm, which is performed as follows. First, LP-growth checks whether the current LP-tree is a single path (line 1). If so, the algorithm combines all the items in the path with the prefix (lines 2–4), and the results become valid frequent patterns. On the other hand, if the tree has multiple paths, the algorithm traverses the current tree and creates a conditional LP-tree (lines 5–19). Our LP-growth first selects an item, i, from the header list in order to search the tree (line 5), and it finds the nodes from each node with the selected item up to the root using the corresponding node links (line 6). The items of the visited nodes are stored into L (line 10), where LP-growth directly accesses the immediately preceding node if the current location is inside an LPN (line 11). If the current gp_{c,r} is the first node (header) of an LPN (line 12), the current position shifts to the parent node pointer stored in the header of the LPN (lines 13–14). Iterating the traversal, we obtain all conditional transactions including i, and the set of conditional transactions becomes a conditional database, L′ (line 15). The Construct_LP-tree procedure is called to generate a conditional LP-tree (line 16). After that, the algorithm removes BNL since it is no longer needed in the current step (line 17) and then calls LP-growth recursively to extend the pattern (line 19).
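To make the overall divide-and-conquer flow concrete, here is a simplified pattern-growth sketch over a plain list-of-transactions database. It mirrors the recursion described above (select a frequent item, project the conditional database, recurse) but is not the LP-tree implementation; the names and the canonical item ordering are our own simplifications.

```python
from collections import Counter

def pattern_growth(db, prefix, minsup, results):
    # Count each item once per transaction in the (conditional) database.
    counts = Counter(item for tx in db for item in set(tx))
    for item in sorted(counts):
        sup = counts[item]
        if sup < minsup:
            continue
        pattern = prefix + (item,)
        results[pattern] = sup
        # Project on `item`: keep only items after it in the canonical
        # (sorted) order, from transactions that contain it.
        projected = [[x for x in tx if x > item] for tx in db if item in tx]
        pattern_growth(projected, pattern, minsup, results)

results = {}
pattern_growth([["a", "b", "c"], ["a", "b"], ["a", "c"]], (), 2, results)
assert results[("a",)] == 3
assert results[("a", "b")] == 2 and results[("a", "c")] == 2
```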

4. Performance evaluation

In this section, we present experimental results comparing our algorithm, LP-growth, with the state-of-the-art algorithms. In order to show that the experiments are reasonable, we evaluate their performance based on three important criteria: runtime, memory usage, and scalability. In addition, we also present tests of the average number of child nodes to prove the efficiency of LP-tree.

4.1. Test environment and datasets

LP-growth is written in C++, compiled with gcc 3.4.4, and run on a 3.3 GHz Intel processor with 8 GB of memory under Windows 7. Based on this environment, we compare our algorithm, LP-growth,

Fig. 8. LP-growth algorithm.


number with various settings of the minimum support threshold for each dataset. The average number of child nodes is the sum of the numbers of child nodes over all nodes except the leaf nodes, divided by the number of all nodes excluding the leaves; the leaf nodes are not considered since they have no child nodes. Figs. 9 and 10 show the results for the dense datasets (Chess, Connect, and Pumsb) and the sparse ones (Retail, Kosarak, T10I4D1000K, BMS-WebView-1, and Chain-store), respectively. From these results, we can observe that all of the LP-trees generated from these datasets have an average number of child nodes (ACN) of less than 3 regardless of the minimum support threshold. Note that, when the minimum support is 10%, the results of T10I4D1000K, BMS-WebView-1, and Chain-store are not shown because all the items mined from these datasets have smaller supports than the given minimum support threshold. That is, no tree is constructed and no frequent patterns are generated under this threshold setting.
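The ACN metric described above can be computed with a few lines. This is a sketch; the dict-of-children tree representation is ours, not the paper's.

```python
def average_child_nodes(tree):
    """ACN: sum of child counts over non-leaf nodes, divided by the number
    of non-leaf nodes. `tree` maps each node to its list of children."""
    internal = [kids for kids in tree.values() if kids]  # skip the leaves
    return sum(len(kids) for kids in internal) / len(internal)

tree = {
    "root": ["a", "b"],
    "a": ["c", "d", "e"],
    "b": ["f"],
    "c": [], "d": [], "e": [], "f": [],
}
assert average_child_nodes(tree) == 2.0  # (2 + 3 + 1) / 3
```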

4.3. Runtime evaluation

Figs. 11–18 show the results of the runtime experiments on the real and synthetic datasets shown in Tables 2 and 3, respectively. In these figures, we can observe that our LP-growth outperforms the others in almost all of the cases. LP-growth uses the proposed linear structure for its trees instead of the previous tree form in order to minimize the access times needed to search nodes. As a result, its advantages have a positive effect on reducing runtime in all the experiments. Especially as the minimum support threshold becomes lower, the difference in runtime between our algorithm and the others becomes bigger.

FP-growth shows the worst performance with respect to all the datasets except for Retail. The mining times of the FP-growth algorithm are 3305 s when the minimum support threshold is 80% for the Connect dataset and 1065 s when the threshold is 70% for the Pumsb dataset. Note that the algorithm did not operate normally for the Kosarak dataset because its memory usage exceeded the limit allowed in our test environment. As an improved version of FP-growth, FP-growth-goethals shows better performance than the FP-growth algorithm on the dense datasets such as Connect, Pumsb, and Chess. Since FP-growth-tiny can reduce the size of the FP-tree, it presents more improved runtime performance compared to the above two algorithms. Its speed is also similar to that of FP-growth in many cases but lags behind that of CT-PRO and our LP-growth. CT-PRO has outstanding runtime performance in many cases due to its technique, which focuses on increasing mining speed by storing and utilizing additional data in a bit form. However, the CT-PRO algorithm generally uses enormous amounts of memory with respect to almost all of the used datasets, and thus, its overall efficiency falls remarkably behind that of LP-growth. Especially when the threshold was less than 0.1% for Chain-store and 0.7% for Kosarak, the algorithm failed to mine frequent patterns successfully because of heavy memory consumption that our system could not bear. For this reason, the results of CT-PRO for these datasets are not shown in our graph figures. FP-growth* shows fine runtime results in general since it uses its own technique, FP-array, for reducing the number of tree traversals. Nevertheless, the efficiency of FP-growth* falls behind that of our algorithm, as shown in the figures, since the benefit of LP-tree is higher than that of FP-array. MAFIA-FI requires more execution

Fig. 12. Runtime test (Retail).
Fig. 13. Runtime test (Pumsb).
Fig. 14. Runtime test (Kosarak).
Fig. 15. Runtime test (Chess).


time than that of LP-growth and FP-growth in all the cases. Especially on the sparse datasets such as Retail, Kosarak, T10I4D1000K, and BMS-WebView-1, MAFIA-FI performs worse than FP-growth. Moreover, we could not evaluate the runtime performance of the algorithm for Chain-store due to its enormous memory consumption. The reason is that it uses a vertical bitmap representation to increase mining performance, but this effect does not apply to sparse datasets, in contrast to dense ones. In Fig. 16, MAFIA-FI needs 1054 s with a minimum support of 0.01%, so its runtimes are not shown in these figures since we can infer that the algorithm continues to have the worst performance in all the cases of the T10I4D1000K dataset. This tendency is also similarly represented in Fig. 12. From all of the runtime results shown in Figs. 11–18, we can observe that LP-growth and CT-PRO present outstanding runtime performance. However, CT-PRO is unstable and not available in several cases, as shown in the figures, because it requires a very large amount of memory. Therefore, LP-growth is regarded as better than CT-PRO in terms of the overall capability of mining frequent patterns.

4.4. Memory usage evaluation

Fig. 16. Runtime test (T10I4D1000K).
Fig. 17. Runtime test (Chain-store).
Fig. 18. Runtime test (BMS-WebView-1).
Fig. 19. Memory test (Connect).
Fig. 20. Memory test (Retail).
Fig. 21. Memory test (Pumsb).

In this section, we evaluate memory usage for each algorithm with the same datasets as in the runtime tests. In Figs. 19–26, LP-growth and FP-growth present outstanding memory performance, while CT-PRO shows the worst results in almost all cases.

Although our algorithm does not show the best memory usage in a few cases, it guarantees memory consumption as good as that of the state-of-the-art algorithm, FP-growth. Moreover, our algorithm presents the most outstanding results in many cases. Especially in Fig. 20 for the Retail dataset, our LP-growth outperforms the others, including FP-growth, in all of the cases. For the Kosarak dataset, CT-PRO could not operate normally when the minimum support threshold was 0.6% or less, and for the Chain-store dataset, it did not run successfully when the threshold was 0.01% or less since it consumed too much memory, which is the reason why the results of the algorithm are not included in Figs. 22 and 25. In addition, CT-PRO uses a lot of memory compared to the other algorithms. Meanwhile, since LP-growth uses very little memory compared to CT-PRO and the others, it can be more effective in memory-constrained environments.

FP-growth also requires a lot of memory in many cases, in common with the CT-PRO algorithm. FP-growth used too much memory while mining frequent patterns, so we could not express the results in the graph figures for the following situations: it used 833 MB when the minimum support threshold was 80% for the Connect dataset and 1633 MB when the threshold was 80% for the Pumsb dataset. Furthermore, FP-growth could not mine patterns normally when the threshold was 0.7% or less for the Kosarak dataset since it required more memory than the limit allowed by our system. In addition, the algorithm consumed more memory than CT-PRO with respect to the Chess dataset, as shown in Fig. 23. Meanwhile, FP-growth-goethals, which is an optimized version of FP-growth, showed relatively fine memory efficiency, although its performance still lags behind that of ours. Due to its technique for saving memory space by not generating conditional databases, FP-growth-tiny reduces memory usage for the relatively large datasets such as Pumsb and Kosarak compared to FP-growth-goethals, although this effect is not available for the small datasets such as Connect and Chess. Since the LP-tree of LP-growth minimizes its tree sizes by using the linear structure, it guarantees outstanding performance, as shown in the figures. Moreover, LP-growth has almost constant and stable memory consumption regardless of the threshold in comparison to the other algorithms. Meanwhile, MAFIA-FI requires relatively large amounts of memory since the bitmap proposed in the algorithm needs more memory resources than the structures used by the others, such as LP-tree and FP-tree. FP-growth also shows these characteristics in many cases, but it does not guarantee them for the Retail dataset shown in Fig. 20. As in the previous runtime test in Fig. 16, the memory results of MAFIA-FI are not provided in Fig. 24. The reason is that, when the minimum support is 0.01%, it consumes 487 MB to perform its mining operations, and therefore, we can expect that the algorithm requires even more memory for the other minimum supports; thus, we do not need to denote the results in the figure.

Fig. 22. Memory test (Kosarak).
Fig. 23. Memory test (Chess).
Fig. 24. Memory test (T10I4D1000K).
Fig. 25. Memory test (Chain-store).
Fig. 26. Memory test (BMS-WebView-1).

4.5. Scalability evaluation

Figs. 27–30 show the results of the scalability tests performed with the datasets in Table 3. Note that MAFIA-FI is excluded from the test on the datasets with increasing numbers of transactions since it needs longer runtimes and more memory than the others in these tests, so it is hard to express its scalability results in the figures together with those of the other algorithms. In addition, FP-growth, MAFIA-FI, and CT-PRO are not evaluated in the test on the other datasets with increasing numbers of items because they cannot run normally on these datasets due to lack of memory. The minimum support threshold is fixed at 0.1% in these tests. In Fig. 27, the runtime increase of LP-growth is far smaller and more stable than that of the others since LP-tree allows LP-growth to perform mining operations effectively regardless of the increase in transactions. FP-growth* shows better scalability than FP-growth and FP-growth-goethals due to its special structure, FP-array, although its efficiency is lower than ours. FP-growth-tiny also presents fine scalability similar to that of FP-growth. CT-PRO shows an outstanding scalability result, but our LP-growth is still better. Our algorithm also guarantees the best runtime scalability in the test shown in Fig. 29. FP-growth-goethals has the worst result, while FP-growth* shows fine runtime scalability, although its performance falls behind ours. FP-growth-tiny presents good performance similar to that of FP-growth* in the beginning, but its scalability becomes drastically worse as the number of items gradually increases. In Fig. 28, all of the algorithms have almost constant memory usage since all of them are tree-based algorithms, but their absolute values differ from each other; in particular, LP-growth presents the smallest memory consumption due to the advantages of LP-tree. On the other hand, Fig. 30 shows results different from Fig. 28. Since the necessary tree sizes gradually become larger as the number of attributes is increased, the memory usages of the algorithms become bigger, as shown in the figure. However, LP-growth shows the best memory scalability while the others have relatively poor performance, which indicates that our LP-tree can store these increasing attributes more efficiently than the structures of the competitor algorithms. Through the above experimental results, we know that the proposed algorithm, LP-growth, outperforms the others with respect to increasing transactions and items in terms of scalability as well as runtime and memory usage for the real datasets.

Fig. 27. Scalability test (Runtime).
Fig. 28. Scalability test (Memory).
Fig. 29. Scalability test (Runtime).
Fig. 30. Scalability test (Memory).

5. Conclusion

In this paper, we proposed a new tree structure, LP-tree, and an algorithm, LP-growth, applying it to the mining process. The main goal of the proposed algorithm was to reduce not only the memory usage needed for building trees but also the time to traverse them by applying a linear structure instead of the previous form used in FP-growth. LP-tree contributed to improving the performance of frequent pattern mining since it spent less memory generating nodes compared to FP-tree and accessed them without any pointers in many cases. Our experimental results showed that LP-growth presents outstanding performance in terms of runtime, memory usage, and scalability. We could also observe that our algorithm outperformed the previous algorithms, especially in the runtime experiments, due to the reduced pointer accesses. The techniques and strategies described in this paper can be applied not only to general frequent pattern mining but also to a variety of pattern mining fields such as closed/maximal pattern mining, top-k pattern mining, and graph mining. We expect that this future research will lead to improvements in mining performance in various areas.

Acknowledgements

This research was supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF Nos. 2013005682 and 20080062611).

