TRANSCRIPT
8/12/2019 Efficient Frequent Mining — Frequent Patterns without Candidate FP-Growth (2004) / Frequent Pattern Mining Based on Linear Prefix Tree (2014)
Efficient frequent pattern mining based on Linear Prefix tree
Gwangbum Pyun a, Unil Yun a,*, Keun Ho Ryu b
a Department of Computer Engineering, Sejong University, Seoul, Republic of Korea
b Department of Computer Science, Chungbuk National University, Cheongju, Republic of Korea
Article info
Article history:
Received 24 April 2013
Received in revised form 11 October 2013
Accepted 12 October 2013
Available online 24 October 2013
Keywords:
Data mining
Frequent pattern mining
Linear tree
Pattern growth
Knowledge discovery
Abstract
Outstanding frequent pattern mining guarantees both fast runtime and low memory usage with respect to various data of different types and sizes. However, it is hard to improve both elements since runtime is, in general, inversely proportional to memory usage. Researchers have made efforts to overcome this problem and have proposed mining methods which improve both through various approaches. Many state-of-the-art mining algorithms use tree structures; they create nodes independently and connect them with pointers when constructing their own trees. Accordingly, these methods keep pointers for each node in the trees, which is inefficient since they must manage and maintain numerous pointers. In this paper, we propose a novel tree structure to solve this limitation. Our new structure, LP-tree (Linear Prefix tree), is composed of array forms and minimizes the pointers between nodes. In addition, LP-tree uses the minimum information required in the mining process and accesses corresponding nodes linearly. We also suggest an algorithm applying LP-tree to the mining process. The algorithm is evaluated through various experiments, and the experimental results show that our approach outperforms previous algorithms in terms of runtime, memory, and scalability.
© 2013 Elsevier B.V. All rights reserved.
1. Introduction
As a part of association rule mining, frequent pattern mining is a method for finding frequent patterns in large data [15]. The patterns obtained from mining operations are usefully utilized to analyze data characteristics or to gain information needed for decision-making. In addition, it can be applied in a variety of real data analyses such as web data [20], customer data in finance, correlation of product data, vehicle and communication data [9], bio data [13], hardware monitoring of computer systems [45], and regular pattern mining [28]. In pattern mining, a pattern is a set of items in a certain database, and the support of a pattern is defined as the number of transactions containing the pattern; patterns satisfying a given minimum support threshold are regarded as frequent. Apriori [1] and FP-growth [14] are fundamental algorithms in frequent pattern mining, and current studies proceed on the basis of these two algorithms. Moreover, numerous other methods have been suggested. First, there are methods using closed patterns such as BMCIF [8] and CEMiner [9], and those for maximal patterns such as MAFIA [5], FP-MAX [12], LFIMiner [16], MCWP [41], and MWS [42]. Furthermore, there exist other approaches for stream environments such as WMFP-SW [19], BSM [30], CPS-tree [31], and RPS-tree [32], and for utility patterns such as HUIPM [2], HUPMS [3], and UP-growth [34]. The following techniques apply item weights to the mining process: WARM [33], WAS [39], and MWFIM [40] are weight-based algorithms, and TIWS [7] adds weights over time. In addition, there is an approach which finds frequent patterns from the top support down to the kth support without any given minimum support threshold. This method is called top-k pattern mining, and typical studies are MinSummary [18], PND [24], Chenoff [35], Topk-PU [43], SpiderMine [44], etc. In sequential pattern mining, which considers item sequence, there are SeqStream [6], StreamCloseq [10], ApproxMAP [17], TD-seq [22], CSP [27], WSpan [38], and so forth. U2P-Miner [23] mines uncertain data, and GAMiner [36] gives meaning to interesting patterns and then extracts them. Developing an improved algorithm for frequent pattern mining can contribute to advancing mining performance in various mining fields. FP-growth-based frequent pattern mining, such as FP-growth [12], patricia-tree [26], and IFP-growth [21], has the following characteristic: FP-growth keeps connection information among all nodes in the FP-tree in order to search the nodes. Therefore, it has many pointers for connecting nodes, thereby using a lot of runtime and memory resources. In this paper, we therefore propose a novel tree structure, LP-tree (Linear Prefix tree), and an algorithm using the tree, called LP-growth, which can conduct mining operations more quickly and efficiently than previous algorithms. Our LP-tree can solve the above limitation due to its special structure based on the linear form. We can obtain advantages by converting tree nodes into array forms. It can increase memory efficiency through arrayed nodes since they can reduce connection
0950-7051/$ - see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.knosys.2013.10.013
* Corresponding author. Tel.: +82 234082902.
E-mail addresses: [email protected] (G. Pyun), [email protected] (U. Yun), [email protected] (K.H. Ryu).
Knowledge-Based Systems 55 (2014) 125–139
information. We can also speed up item traversal since LP-tree does not use pointers in most cases and generates a large number of nodes at once due to its linear structure. By applying the features of LP-tree to the mining process, we can obtain the following benefits: (1) the tree generation rate of our approach becomes faster than that of FP-growth since ours can create multiple nodes at once by a series of array operations, whereas FP-growth makes nodes one by one; (2) we can access parent or child nodes without corresponding pointers when searching trees since the nodes are stored in an array form; (3) memory usage for each node becomes relatively small since LP-tree does not require internal node pointers; (4) it is possible to traverse trees more quickly compared to searching them with pointers since our approach directly accesses the corresponding memory due to the nature of the array structure. This paper is organized as follows. In Section 2, we introduce related work with respect to LP-tree and LP-growth, and we describe the details of our techniques and algorithm in Section 3. Next, we compare the performance of our algorithm with those of previous algorithms through various experiments in Section 4, and we finally conclude this paper in the last section.
2. Related work
Frequent pattern mining extracts specific patterns with supports higher than or equal to a minimum support threshold. Many mining methods have been researched as mentioned above, but Apriori [1] and FP-growth [14] are still regarded as the underlying algorithms. Apriori is the oldest conventional mining algorithm, and it performs mining operations by extending pattern lengths. The algorithm generates candidate patterns through pattern extension in advance, and then confirms whether the candidates are actually frequent patterns by scanning a database. Consequently, Apriori has no choice but to scan the database as many times as the maximum length among the frequent patterns. UT-Miner [37] is an improved Apriori algorithm specialized for sparse data, where sparse data indicate that most transactions are different from each other. The algorithm uses an array structure, unit triple, storing relations between items and transactions in a database to improve mining performance. However, UT-Miner does not guarantee fine performance in terms of runtime and memory usage since the algorithm is based on the Apriori method. On the other hand, FP-growth [14] solved the above problem by scanning a database only twice. It uses a tree structure, called FP-tree, which can prevent the algorithm from generating candidate patterns. FP-tree consists of a tree for storing database information and a header table containing item names, supports, and node links. The tree is composed of nodes, each of which includes an item name, a support, a parent pointer, a child pointer, and a node link. The node link is a pointer that connects all nodes with the same item to each other. Since the FP-growth algorithm was proposed, various algorithms have been published on the basis of it. FP-growth-goethals [11] is an FP-growth implementation optimized by Bart Goethals. To increase the efficiency of the search space in FP-growth, FP-growth-tiny [25] generates conditional FP-trees using conditional patterns without creating any conditional database. In CT-PRO [29], the authors suggested the Compressed FP-tree, which adds a count array to the nodes of the FP-tree, where each entry of the array corresponds to the number of itemset occurrences. The algorithm mines frequent patterns using the information added in the tree without recursive calls. IFP-growth [21] enhanced the pruning effect with a new tree structure, FP-tree+, where the tree adds an address table to the FP-tree. Therefore, the algorithm decreases the number of conditional FP-trees, thereby improving mining speed. Meanwhile, it needs more information than the original FP-tree. In addition, IFP-growth does not improve memory efficiency although it contributes to reducing runtime. MAFIA-FI [5] saves data information in a bitmap form so as to reduce the number of tree searches. The bitmap is made up of two dimensions, where the x-axis means items and the y-axis is transactions. For example, a point (2,4) of a certain bitmap means that the second item exists in the fourth transaction. Thus, MAFIA-FI can compute pattern or item supports through AND operations on the bitmap without tree traversals. In addition, the algorithm can prevent creating needless trees with infrequent patterns and maximize pruning efficiency. However, MAFIA-FI requires more memory although its runtime is faster than that of the original method. Patricia-tree [26] also applies an array structure to a part of the FP-tree, where the algorithm generates paths with the same support as an array. Meanwhile, the LP-tree proposed in this paper constructs all paths as arrays regardless of item supports, where the shapes of the arrays vary depending on each transaction's form. FP-growth [12] proposed the FP-array with pattern information and increased pruning efficiency with it. The approach calculates the supports of patterns to be expanded in advance, and eliminates infrequent patterns effectively through the proposed FP-array. However, FP-growth also does not reduce the size of trees since it still uses the original FP-tree-based structures. As a result, we need to develop a new tree structure to improve the fundamental performance of the mining algorithm. Consequently, we propose a novel tree and algorithm satisfying both runtime and memory efficiency. The runtime and memory performance of our LP-tree is more outstanding than that of FP-growth due to its special tree structure based on the array.
3. Frequent pattern mining based on Linear Prefix-tree
In this section, we present the details of the LP-growth algorithm and related techniques. The algorithm conducts mining operations with LP-tree and the corresponding growth methods.
3.1. Preliminaries
Given a transaction database D, I = {i_1, i_2, ..., i_n} is the set of items composing D, and D consists of multiple transactions. Each transaction has a unique set of items. D includes a unique ID, called a TID, for each transaction. A pattern is defined as a subset (or the whole set) of I. Assuming that a pattern P has several items and its first and last ones are i_b and i_e respectively, P is denoted as follows:

P = {i_b, ..., i_e}, 1 ≤ b < e ≤ n.

P's support means the number of transactions containing P in D. In other words, this indicates how often P occurs in D. Let |P| be the number of transactions including P and |D| be the number of all transactions in D. Then, we can calculate P's support rate, sup(P), as follows:

sup(P) = |P| / |D|,

where 0 ≤ sup(P) ≤ 1. P is regarded as a frequent pattern if sup(P) is not smaller than a given minimum support (or minsup). Denoting the frequent P as L, it is also included in I and satisfies sup(L) ≥ minsup, where 0 ≤ minsup ≤ 1:

L = {P ⊆ I | sup(P) ≥ minsup}.

For instance, given a database {{TID1: a,b,c}, {TID2: a,b}, {TID3: b,c,d,e}}, I becomes I = {a,b,c,d,e}. If the minimum support threshold is 60%, the pattern {a,b} is frequent since it appears in TID1 and TID2; thus, its support is higher than the threshold. Meanwhile, another
pattern, {a,c}, is infrequent because it is contained in only TID1, and therefore its support is lower than the threshold.
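The support computation above can be sketched in a few lines of Python. This is a minimal illustration over the three-transaction example database from the text, not the paper's implementation:

```python
# sup(P) = |P| / |D|: the fraction of transactions that contain pattern P.
def sup(pattern, db):
    contain = sum(1 for items in db.values() if set(pattern) <= set(items))
    return contain / len(db)

db = {"TID1": ["a", "b", "c"], "TID2": ["a", "b"], "TID3": ["b", "c", "d", "e"]}
minsup = 0.6

print(sup(["a", "b"], db))  # 2/3, frequent at minsup = 60%
print(sup(["a", "c"], db))  # 1/3, infrequent
```

At minsup = 0.6, {a,b} is kept (sup = 2/3 ≥ 0.6) while {a,c} is discarded (sup = 1/3 < 0.6), matching the worked example.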
3.2. LP-tree: a novel tree structure for mining frequent patterns
There are several limitations in regard to previous frequent pattern mining methods. Basically, frequent pattern mining has to find all frequent patterns in a transaction database. Thus, in the worst case, the method should extract 2^n − 1 patterns when all of the n items in a database are frequent. Moreover, Apriori-based algorithms consume more time and memory due to the generation of candidate patterns. Meanwhile, FP-growth spends most of its time traversing and generating trees. Note that the problem of Apriori is not under consideration here; we focus on the FP-growth approach because FP-growth-based algorithms generally perform better than Apriori-based algorithms. To improve performance, we should decrease the tree traversal and generation times. To do that, tree structures need to have a simple form, and each node in the trees has to occupy a smaller memory space. Our LP-tree satisfies both requirements, so LP-growth with this tree structure can conduct tree creation and search efficiently.
Definition 1 (LP-tree (Linear Prefix tree)). LP-tree has the following structure: (1) a header list consisting of item names, supports, and node links; (2) Linear Prefix Nodes (LPNs) for storing the frequent items of each transaction and a corresponding header; and (3) a Branch Node List (BNL) including information on branch nodes and their child nodes. An LP-tree consisting of c LPNs has the following structure:

LP-tree = {Header list, BNL, LPN_1, LPN_2, ..., LPN_c}.

LP-tree has an entirely linear structure. Each set of frequent items is saved into nodes composed in an array form, where we use multiple arrays since one array structure cannot express items as a tree form with many branches. To connect the arrays, every array has a header in its first part, where the header indicates its parent array. An LPN contains a header and an array node storing a pattern, and the array node consists of several internal nodes. Moreover, the header of an LPN indicates the root of the tree when the LPN is the first one inserted in the tree. Details of LPN are mentioned in Definition 2. LP-tree is composed of one or more LPNs as shown in Fig. 1.
Definition 2 (LPN (Linear Prefix Node)). LPN is the fundamental structure of LP-tree. In an LPN, there are multiple internal nodes and a header in the top position of the LPN. Let Parent_Link, i, S, L, and b be a parent node pointer connected to another LPN (the parent LPN), an item, a support, a node link, and branch information, respectively. Then, the following Eq. (1) represents how an LPN is composed, where the information of each internal node is described between ⟨ and ⟩:

LPN = {⟨Parent_Link⟩, ⟨i_1, S, L, b⟩, ⟨i_2, S, L, b⟩, ..., ⟨i_n, S, L, b⟩}. (1)

LPN stores item information in each node. That is, if certain items {i_1, i_2, ..., i_n} are added to an LPN, its array node has n internal nodes. In this process, a finite number of internal nodes is generated according to the number of inserted items, thereby dealing with whole pattern information. In Eq. (1), the parent node of ⟨i_n, S, L, b⟩ is ⟨i_{n−1}, S, L, b⟩, and the child node of ⟨i_{n−1}, S, L, b⟩ is ⟨i_n, S, L, b⟩. Parent_Link is a pointer indicating the parent node of ⟨i_1, S, L, b⟩, which is the first node of the LPN. The parent node connected to the Parent_Link becomes either the root or a node of another LPN. We define the symbol gp_{c,k} to express a pointer to a specific node inside an LPN. Given a certain cth LPN with n nodes, LPN_c, gp_{c,k} indicates the kth node of LPN_c (k = 0, ..., n), and gp_root is a pointer to the root (the 0th node is the header node storing Parent_Link). If the parent of the first node in an LPN is the 5th node of LPN_1, then its Parent_Link becomes gp_{1,5}. An internal node of an LPN has the four elements in Eq. (1), where each subscript indicates the corresponding ordinal number. The header of an LPN is linked to a branch node of its parent; thus, we can obtain patterns by tracking headers. A node link, L, plays a role in concatenating nodes with the same item. An LPN does not have any pointers for connections among its internal nodes. A branch node has two or more child nodes; therefore, LP-tree uses a branch node in order to express nodes having two or more child nodes. The b is used as a flag value to mark whether branch nodes exist or not. An LPN cannot manage two or more child nodes due to the array's limitation. For this reason, we propose and use BNL for managing the branch nodes in order to deal with multiple child nodes.
Definition 3 (BNL (Branch Node List)). BNL helps manage numerous branch nodes when generating LP-tree. When the items of each transaction are inserted, they are sequentially inputted from the root, and several branches can occur in this process. If the current position reaches a branch node during the insertion, we confirm all child nodes of the branch node and then move to the appropriate location by referring to the BNL information. We can easily access multiple child nodes through BNL, which is constructed as a list form and stores only the information of branch nodes and their child nodes. BNL is composed of a branched node table and child node lists, where the branched node table stores the pointers of all branched nodes and each element (pointer) stored in it has one child node list. The child node list has the child node pointers of the corresponding branched node. Assuming that LP-tree has i branched nodes, B_i is a pointer indicating the ith branched node, and C_{i,j} is a pointer to the jth child node of B_i. Hence, BNL consists of the branched node table = {B_1, B_2, ..., B_i} and the child node lists = {{C_{1,1}, C_{1,2}, ...}, {C_{2,1}, C_{2,2}, ...}, ..., {C_{i,1}, C_{i,2}, ..., C_{i,j}}}. After matching them using the symbol →, we can denote BNL as follows, where {B_i → C_{i,1}, C_{i,2}, ..., C_{i,j}} means that B_i indicates the set of child node pointers {C_{i,1}, C_{i,2}, ..., C_{i,j}}:

BNL = {{B_1 → C_{1,1}, C_{1,2}, ...}, {B_2 → C_{2,1}, C_{2,2}, ...}, ..., {B_i → C_{i,1}, C_{i,2}, ..., C_{i,j}}}. (2)

Fig. 2 shows the entire BNL structure, where the structure is mapped with Eq. (2).

Fig. 1. Structure of Linear Prefix Nodes (LPNs).

We can also express the set of child node pointers for B_i stored in BNL as the following equation:
BNL(B_i) = {C_{i,1}, C_{i,2}, ..., C_{i,j}}.

BNL has as many child node pointers as the number of child nodes of the branch nodes. The pointers of child nodes stored in BNL are sorted in item name order so that a binary search can be conducted. Thus, we can directly access internal child nodes with no branches, while we must access the other child nodes, those with branches, indirectly through BNL.
Example 1. Given the database shown in Table 1, when the 4 sorted transactions from TID 1 to 4 are inserted into an LP-tree, the tree has 4 LPNs: LPN_1 = {e,a,f,c,d,g,h}, LPN_2 = {b,a,f}, LPN_3 = {c,h}, and LPN_4 = {b,f,g}, where the root of the tree and the node of LPN_1 having item e become branch nodes. Then, B_1 of the branched node table in Fig. 2 becomes gp_root. C_{1,1} becomes the second child node of the root, i.e., the node of LPN_4 with b, gp_{4,1}. B_2 is assigned as the node of LPN_1 with e, denoted as gp_{1,1}, and C_{2,1} becomes the node of LPN_2 with b, which is the second child node of the node of LPN_1 having e, denoted as gp_{2,1}. Similarly, C_{2,2} is gp_{3,1}, which is the node of LPN_3 with c.
Definition 4 (Header list). One of the elements of LP-tree, the header list, has the information needed for mining patterns from the tree, where the list consists of item-name, item-support, and node-link. Item-name denotes the names of the items composing LP-tree, and item-support means the number of items with the same name in the tree. For example, if an item's name is a and its support is 5, item a occurs 5 times in the tree. The node-link is connected to the first node among all of the nodes with the same item in the tree, and the first node is then concatenated with the second node, and so on. Terminating the connection, one chain is generated concatenating all the nodes with the same item.
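As a concrete, deliberately simplified illustration, the three components of Definitions 1-4 can be modelled with plain Python arrays and dictionaries. Field names such as parent_link and node_link are our own labels for the ⟨Parent_Link⟩ and L elements, not code from the paper:

```python
# One internal node of an LPN: <item, support, node-link, branch-flag>.
# A real array layout would pack these contiguously; a dict keeps the sketch readable.
def make_node(item):
    return {"item": item, "support": 1, "node_link": None, "branch": False}

# An LPN: a header (parent pointer into another LPN, or the root) plus a
# contiguous array of internal nodes -- parent/child are simply index-1/index+1.
def make_lpn(parent_link, items):
    return {"parent_link": parent_link, "nodes": [make_node(i) for i in items]}

# BNL: branched-node table mapping a branch node to its child pointers.
# A pointer ("LPNc", k) stands for gp_{c,k}; "root" stands for gp_root.
bnl = {}

# Header list: item -> [support, link to the first node with that item].
header_list = {}

# Build the one-LPN tree for a single transaction {e, a, f}:
lpn1 = make_lpn("root", ["e", "a", "f"])
bnl["root"] = [("LPN1", 0)]          # the root's child list holds LPN1's first node
print(lpn1["nodes"][1]["item"])      # 'a' -- the child of 'e' is just index + 1
```

The point of the sketch is the access rule of Lemma 1: inside one LPN, moving between parent and child is index arithmetic on the array, and only crossings between LPNs go through the header or BNL.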
3.3. Constructing LP-tree
In this section, we describe the method for creating LP-tree. Tree construction is conducted as follows. We scan a database and count all item supports. Thereafter, we sort all items in support descending order and then generate the corresponding header list, where the list stores items according to the sorted order. Namely, the upper items in the list have greater supports while the lower ones have smaller values. The insertion of a transaction into LP-tree is divided into two cases. The first is when the first transaction is inserted into the LP-tree. Its procedure is as follows. First, we generate LP-tree by scanning the database again and sort the first transaction according to the sequence of the header list. That is, its items with smaller support than the minimum support are deleted, and the remaining items are sorted in support descending order. After that, we insert the sorted transaction into the tree, where an LPN is created and connected to the root since the tree is initially empty. Then, the first transaction is entered into one LPN, which has as many internal array nodes as the transaction length. That is, if the transaction length is n and all items of the transaction are inserted in one LPN, the size of the LPN is n + 1 including a header. We connect the LPN's header to its parent after inserting the transaction items, where the header is linked to the root since the current LPN is the first one added to the tree. We add a pointer of the root to the branched node table and store the first node of the newly created LPN in the child node list connected to the root pointer. The second case is when the transactions other than the first one are added to the tree. Their insertion is performed as follows. We remove infrequent items from the inserted transaction and sort its frequent items in support descending order. Next, we add to BNL the addresses of the root and the first node (i.e., the header) of the current LPN since the root gains a child node and thereby a branch occurs. Then, we insert the transaction by comparing the corresponding paths from the root. Thereafter, we confirm all the child nodes of the root with the BNL information since the previous transaction is already added to the tree, i.e., the root has one or more child nodes; here, we initially check the internal child node of the current LPN. If the item to be inserted is the same as the item of the checked node (the internal node), the current location moves to that node, and its support is increased by 1. Otherwise, to confirm the other child nodes, we read the corresponding branch information in BNL and then increase the support of the current item by 1 if the item is equal to the node derived from BNL. If it is not equal, we generate a new LPN and insert the remaining items of the transaction into the new LPN, where the current node becomes a branch node and is added to BNL. Assuming that n is the length of the transaction and r is the number of items already inserted in the previous LPN, we store the remaining items in the new LPN at the same time, where the number of array nodes in the LPN is n − r + 1 (including a header). To store all transactions with no problems, LP-tree connects all of its nodes in one of two ways. First, the internal nodes of an LPN are directly connected to each other without any pointer. Second, when a branch occurs, LP-tree links the corresponding child nodes utilizing BNL. Processing all transactions, we can obtain a complete LP-tree. Once the LP-tree construction terminates, BNL is eliminated because it is not used any longer. The LP-tree generated by the above processes can store all transactions in a given database, and all internal and external nodes of LPNs can be connected by the following Lemma 1.
Lemma 1. We can access all internal nodes in an LPN without any pointer connecting parent nodes with child ones, while nodes of other LPNs can be linked through the LPN's header and BNL.

Proof. Since an LPN is composed of array nodes, we can find nodes directly without pointers due to the characteristics of the array. Given any node, d, its parent node and child node are denoted as d − 1 and d + 1, respectively. However, d + 2 indicates d's descendant node, not d's second child node, since the array has only one child
Fig. 2. Structure of Branch Node List (BNL).
Table 1
A transaction database.
TID Original items Sorted items
1 a, c, d, e, f, g, h e, a, f, c, d, g, h
2 a, b, e, f e, b, a, f
3 c, e, h e, c, h
4 b, f, g b, f, g
5 a, b, d, e, g e, b, a, d, g
6 e, g e, g
7 b, c, e, f e, b, f, c
8 a, b, c, e, f e, b, a, f, c
9 a, d, e e, a, d
10 b, d, e e, b, d
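The preprocessing that produces the "Sorted items" column of Table 1 (one scan to count supports, then each transaction reordered by descending support) can be sketched as follows. The tie-breaking rule, item-name order among items with equal support, is our assumption since the paper does not state one, but it reproduces the column exactly:

```python
from collections import Counter

# Table 1, "Original items" column.
db = {
    1: ["a", "c", "d", "e", "f", "g", "h"],
    2: ["a", "b", "e", "f"],
    3: ["c", "e", "h"],
    4: ["b", "f", "g"],
    5: ["a", "b", "d", "e", "g"],
    6: ["e", "g"],
    7: ["b", "c", "e", "f"],
    8: ["a", "b", "c", "e", "f"],
    9: ["a", "d", "e"],
    10: ["b", "d", "e"],
}

# First database scan: count each item's support.
support = Counter(item for items in db.values() for item in items)

def sort_transaction(items, min_count=1):
    """Drop infrequent items, then order by descending support (ties: item name)."""
    kept = [i for i in items if support[i] >= min_count]
    return sorted(kept, key=lambda i: (-support[i], i))

print(sort_transaction(db[1]))  # ['e', 'a', 'f', 'c', 'd', 'g', 'h']
print(sort_transaction(db[7]))  # ['e', 'b', 'f', 'c']
```

Here the supports are e:9, b:6, a:5, f:5, c:4, d:4, g:4, h:2, which is why, e.g., TID 7 {b,c,e,f} becomes e,b,f,c as in the table.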
is allocated, both of the trees use O(|freq(Trans)|) with respect to the time to store item information. Thus, the runtime after storing the transaction becomes O(2 × |freq(Trans)|) in FP-tree and O(|freq(Trans)| + 1) in LP-tree. In other words, since FP-tree generates and records nodes one by one, it uses time of O(2 × |freq(Trans)|). However, since LP-tree generates its nodes at once, it needs O(1) for generation, and the total time becomes O(|freq(Trans)| + 1) by adding the information record time. In the step of transaction insertion, LP-tree is more efficient than FP-tree when the nodes related to the items of an inserted transaction have 3 or fewer child nodes on average, according to the following Lemma 2.
Lemma 2. Let a and b be the search times needed when we insert a certain transaction into LP-tree and FP-tree, respectively. Then, it is true that a < b if the average number of child nodes related to the inserted transaction is not greater than 3.

Proof. In the transaction insertion, we first confirm whether or not the item to be inserted after the current node exists among its child nodes; then, the current position moves to the corresponding node, or a new child node is created, according to the result. Let n, c, and K be the number of nodes which we have to visit to insert a transaction, the number of child nodes of each visited node, and the set of c, respectively. Then, K is denoted as the following equation:

K = {c_1, c_2, ..., c_n}, c ≥ 1.

To check whether the next inserted item exists among the child nodes, FP-tree finds child nodes through the binary search method. Since it accesses child node pointers and then visits the corresponding nodes, these processes require 2 × Σ_{i=1}^{n} lg(c_i) times, and n times are additionally considered because we should move to the next nodes as many times as the number of visited nodes, n. Therefore, the total search time of FP-tree, b, is as follows:

b = 2 × Σ_{i=1}^{n} lg(c_i) + n.

LP-tree directly accesses the internal child nodes of the current LPN while it indirectly traverses the other child nodes through BNL. That is, LP-tree first checks whether the item to be inserted is equal to that of the internal node, and then it accesses the other child nodes using BNL if there is no matching item. Since searching for child nodes in BNL is based on the binary search as in FP-tree, LP-tree needs n times for traversing child nodes in BNL and 2 × Σ_{i=1}^{n} lg(c_i − 1) times to search BNL, where LP-tree needs lg(c_i − 1) instead of lg(c_i) due to the advantage of the internal nodes. In addition, when the current location moves to the next inserted nodes, LP-tree requires n − 1 times, not n, since it can directly access the nodes if they exist in the current LPN. Accordingly, the total time of LP-tree, a, is denoted as follows:

a = 2 × Σ_{i=1}^{n} lg(c_i − 1) + (n − 1) + n.

The relation a < b is equal to

2 × Σ_{i=1}^{n} lg(c_i − 1) + (n − 1) + n < 2 × Σ_{i=1}^{n} lg(c_i) + n.

Solving this is as follows:

Σ_{i=1}^{n} lg(c_i) − Σ_{i=1}^{n} lg(c_i − 1) > (n − 1) / 2,
Σ_{i=1}^{n} (lg(c_i) − lg(c_i − 1) − 1/2) > −1/2.

In the above inequality, lg(c_i) − lg(c_i − 1) − 1/2 should not be smaller than 0 so that the formula is true. Thus, if c_i is less than approximately 3.414215, the formula lg(c_i) − lg(c_i − 1) > 1/2 is satisfied. Consequently, it is certain that a < b when the average number of child nodes is not greater than 3. □
Fig. 4. A state of LP-tree inserting 310 transactions additionally to Fig. 3.

In Section 4.2, we will show the experimental results of calculating the average number of child nodes for the LP-trees generated from various datasets, where we will see that the number does not exceed 3 in any case.
Next, we compare runtimes with regard to searching the trees. FP-tree uses O(1) since it can find any target node directly using the node link. LP-tree also consumes O(1) because of the node link. However, a difference occurs when the two trees search from the item selected by the node link up to the root. Here, we have to consider the time calculation depending on whether the current structure is an array-based or a pointer-based form. The array-based form (LP-tree) has a structure in which all data is stored contiguously. Therefore, it can directly access any node by reaching the real memory immediately. On the other hand, using the pointer-based form (FP-tree), we have to access a certain node indirectly since we first confirm where the pointer is stored and then approach the corresponding memory. That is, the first requires one access, while the second needs two tasks [4]. Thus, assuming that
t is the memory access time, the direct approach uses O(t) while the indirect one uses O(2 × t). Therefore, considering all of the above results, we know that LP-tree is faster than FP-tree in most cases.
3.4.2. Integrating LPN
In the previous section, we learned how to create LP-tree. However, this method can cause fragmentation of LPNs since each transaction is processed individually without comprehensive consideration. That is, a transaction may be stored across multiple LPNs even though it could fit in a single LPN. For example, assume that we insert two transactions, {a, b, c} and {a, b, c, d}, into an empty LP-tree. The first one is fully stored in one LPN. Then, in the second transaction, a branch occurs at item c, so a new LPN is created and the remaining item d is added to it. Thus, the second LPN has a small number of array nodes. To generate as many internal nodes as possible for each LPN, we can consider attaching nodes at the very end of an LPN. Through the LPN integrating operations, certain nodes are inserted at the very end of an LPN. Let I = {i1, i2, ..., in} be an itemset to be added and ai be an item of the internal nodes of an LPN, i.e., LPN = {⟨Parent_Link⟩, ⟨a1, S, L, b⟩, ⟨a2, S, L, b⟩, ..., ⟨am, S, L, b⟩}, m < n.
Then, in order to apply the operation, the following conditions have to be satisfied: (1) the length of the inserted itemset is longer than that of the target LPN (i.e., m < n); (2) the items of the internal nodes in the LPN are equal to the upper part of the inserted items; and (3) the sequence of the common part is consistent (i.e., i1 = a1, i2 = a2, ..., im = am, 1 ≤ m < n). If these conditions are completely satisfied, we conduct the item insertion steps according to the following process: (1) supports of the common part increase by 1; (2) we allocate an array for a new LPN with the length computed from the inserted items; (3) all the nodes of the previous LPN are inserted into the new LPN; (4) the remainder of the itemset is added at the very end of the new LPN; (5) the previous LPN is deleted. By applying this technique, we can make LPNs have more array nodes compared with the previous LPNs. If the shapes of the transactions composing a database are similar to each other, the LPN integrating operations are needed more often, and LPN length becomes longer whenever these operations are performed.
Since the LPN integrating technique is used only when the length of an inserted transaction is longer than that of the target LPN, the longer the LPN is, the lower the possibility of an LPN integrating operation becomes.
Example 4. Let {a, b, c, d, e, f} be a set of items to be inserted into LP-tree. Assume that, as shown in Fig. 5(a), LPN1 is connected to the root, and LPN2 and LPN3 are linked to the node with b in LPN1. Inserting the set of items without the LPN integrating technique yields the LP-tree shown in Fig. 5(b); in short, the number of LPNs increases from 3 to 4. In contrast, using the technique, the resulting LP-tree becomes Fig. 5(c). That is, the number of LPNs is not increased, since LPN1 is rebuilt through the series of tasks mentioned above.
3.5. Mining frequent patterns based on LP-tree (LP-growth)
LP-growth searches LP-tree and creates a conditional LP-tree for mining frequent patterns. To do that, our algorithm first selects the bottom item from the header list and traverses the nodes connected to the corresponding node links. Then, supports of the visited nodes are stored, and the nodes from each linked node to the root are searched. Each node can be accessed directly if the search is conducted within one LPN. In other words, given a current node, Nk, the algorithm immediately accesses Nk−1 to reach the parent node of Nk. Iterating the traversal over one LPN, the algorithm reaches the header of the LPN, which refers to its parent node, i.e., another LPN. LP-growth stops if the next position of the header is the root; otherwise, it continues to find nodes by tracking the other LPN linked from the header. If a header indicates the root, the corresponding path has been searched completely, and the items in the path become a conditional transaction with the support of the first visited node. After visiting all of the other nodes referred to by the node links, the algorithm constructs a conditional pattern base (conditional database) with the obtained results. After that, we compute item supports in the conditional database, and items are eliminated from the database if their supports are less than a given minimum support threshold. Each transaction
Fig. 5. Item insertion applying the LPN integrating strategy.
of the conditional database is sorted in support-descending order, and then a new LP-tree is generated from the sorted database. The new one becomes a conditional LP-tree and includes a prefix itemset, i.e., the frequent item or pattern selected in the previous phase. If a certain LP-tree forms a single path, all combinations of the tree are considered frequent patterns, in common with the FP-growth approach. Therefore, in this case, the algorithm extracts frequent patterns by joining the prefix itemset and each of the combinations.
Searching trees in FP-tree requires numerous pointer usages in general, since pointers must be used to move from one node to another. Meanwhile, LP-tree can minimize the number of pointer usages through the LPN strategy, which is proved in the following Lemma 3.

Lemma 3. When any tree is traversed in a bottom-up manner, the number of pointer usages in LP-tree is lower than or equal to that of FP-tree.
Proof. Assume that n is the length of a certain path from a node to the root. FP-tree needs n pointer usages in any case, since it has to pass through n pointers along the path. LP-tree, in contrast, consists of one or more LPNs and uses pointers (i.e., headers) only when new branches occur. Thus, LP-growth uses a pointer when visiting the header of each LPN. Here, we can consider the two cases shown in Fig. 6(a and b). The first is when every visited LPN has one node. Let |Kc| be the number of headers, i.e., LPNs, and |N| be the number of nodes. Then, the number of node visits, R, is denoted as R = |Kc| + |N|. In the first case in Fig. 6, we have to visit the headers of LPNs |N| times. FP-tree refers to variables where parent pointers are stored so as to
Fig. 6. Cases of tree searches in LP-tree and FP-tree.
Fig. 7. Algorithm for LP-tree construction.
access parent nodes, as shown in Fig. 6(c). Therefore, in the FP-tree, the total number of memory accesses from a certain node to the root becomes 2|N|, considering not only the pointer accesses but also the node visits. In case 1, LP-tree refers to headers in order to approach parent nodes, as in the case of FP-tree, where R = 2|N| because |Kc| = |N|. Therefore, this is regarded as the worst case and needs as many pointer usages as FP-tree. The number decreases continuously as the size of each LPN increases. In case 2 of Fig. 6, we can calculate R = 1 + |N|, since all the visited nodes belong to a single LPN and we visit only one header. Thus, if the itemset from a node to the root lies in one LPN, we visit one header regardless of the number of items. This is the best case: FP-tree uses |N| pointers while LP-tree needs only one pointer usage. □
Example 5. In Fig. 6, (a) and (b) show certain LPNs in LP-tree, and (c) is a part of an FP-tree. Note that they have the same data (items). When they search items from i4 to their own roots, Fig. 6(a) traverses the LPNs in the following sequence: ⟨i4, 1, NULL, false⟩, ⟨gp3,1⟩, ⟨i3, 1, NULL, false⟩, ⟨gp2,1⟩, ⟨i2, 1, NULL, false⟩, ⟨gp1,1⟩, ⟨i1, 1, NULL, false⟩, and ⟨gproot⟩. Thus, the number of memory accesses (i.e., pointer usages) needed for searching the nodes is 8. Meanwhile, Fig. 6(b) searches them in the following sequence: ⟨i4, 1, NULL, false⟩, ⟨i3, 1, NULL, false⟩, ⟨i2, 1, NULL, false⟩, ⟨i1, 1, NULL, false⟩, and ⟨gproot⟩, thereby accessing the memory 5 times. Since Fig. 6(c) uses pointers to traverse 4 nodes, it needs 4 pointer accesses and 4 node accesses. Thus, the total number of memory accesses is 8.
3.6. LP-growth algorithm
To mine frequent patterns, LP-growth constructs LP-tree with the algorithm Construct_LP-tree shown in Fig. 7. The algorithm first scans D to calculate item supports and generates a header list (lines 1–3); thereafter, D is scanned again to construct LP-tree (line 4). LP-growth performs item insertion starting from the root (lines 5–42). If an item in BNL matches one of the child nodes of the root (line 6), the algorithm moves to the corresponding node and increases its support (line 7). Otherwise, a new LPN is generated (lines 9–12). After that, the algorithm checks whether the current node, gpc,r, is a branch node (line 15). If gpc,r is not a branch node, it has one child node or none, so the algorithm refers to the next node, gpc,r+1. It increases the corresponding node's support by 1 if gpc,r+1 is equal to ik, i.e., the child node holds the same item as the inserted one (line 17). If they are not equal, LP-growth creates a new LPN, where gpc,r becomes a new branch node (lines 25–29). After generating the new LPN with the size of the inserted itemset, the algorithm inserts the remaining items into the new LPN (line 26). Then, its branch information is recorded in BNL (line 27).
In the LPN integrating procedure (lines 18–23), LP-growth conducts the LPN integrating operations if there are remaining items after the insertion into the current LPN. Since in is the last item, if ik is not the last one, there are still items to be inserted. The algorithm generates a new LPN with the length of the inserted itemset (line 19) and copies the node information of the previous LPN to the new one (line 20). Thereafter, it stores the remaining items in the new LPN (line 21) and deletes the previous LPN (line 22). When the current node is a branch node, the steps corresponding to the branch node operations are performed (lines 31–42). If the next array node has an item equal to the inserted item (line 32), that item's support is increased by 1, and the current position moves to the next one (line 33). If gpc,r+1 is not equal to the item to be inserted, the algorithm checks the child nodes with BNL: LP-growth finds the location corresponding to gpc,r in BNL and searches its child nodes. If any of the found child nodes holds the item matching ik (line 34), the algorithm increases its support by 1 and regards that node as the new current node (line 35). If no such node exists in BNL, the algorithm makes a new LPN and inserts the remaining items (lines 38–42).
Fig. 8 shows the overall LP-growth algorithm, which is performed as follows. First, LP-growth checks whether the current LP-tree is a single path (line 1). If so, the algorithm combines all the items in the path with the prefix (lines 2–4), and the results become valid frequent patterns. On the other hand, if the tree has multiple paths, the algorithm traverses the current tree and creates a conditional LP-tree (lines 5–19). Our LP-growth first selects an item, i, in the header list in order to search the tree (line 5), and it finds the nodes from each node with the selected item to the root using the corresponding node links (line 6). Items of the visited nodes are stored into L (line 10), where LP-growth directly accesses the immediately preceding node if the current location is inside an LPN (line 11). If the current gpc,r is the first node (header) of an LPN (line 12), the current position shifts to the parent node pointer stored in the header of the LPN (lines 13–14). Iterating the traversal, we obtain all conditional transactions including i, and this set of conditional transactions becomes a conditional database, L′ (line 15). The Construct_LP-tree procedure is called to generate a conditional LP-tree (line 16). After that, the algorithm removes BNL, since it is no longer needed in the current step (line 17), and then calls LP-growth recursively to extend the pattern (line 19).
4. Performance evaluation
In this section, we present experimental results comparing our algorithm, LP-growth, with the state-of-the-art algorithms. To show that the experiments are reasonable, we evaluate performance based on three important criteria: runtime, memory usage, and scalability. In addition, we present tests of the average number of child nodes to prove the efficiency of LP-tree.
4.1. Test environment and datasets
LP-growth is written in C++, compiled with gcc 3.4.4, and run on a 3.3 GHz Intel processor with 8 GB of memory under Windows 7.
Based on the environment, we compare our algorithm, LP-growth
Fig. 8. LP-growth algorithm.
number with various settings of the minimum support threshold for each dataset. The average number of child nodes is the sum of the child counts of each non-leaf node divided by the number of non-leaf nodes; leaf nodes are not considered, since they have no child nodes. Figs. 9 and 10 show the results for the dense datasets, such as Chess, Connect, and Pumsb, and the sparse ones, including Retail, Kosarak, T10I4D1000K, BMS-WebView-1, and Chain-store, respectively. From those results, we can observe that all of the LP-trees generated from these datasets have an average number of child nodes (ACN) of less than 3, regardless of the minimum support threshold. Note that, when the minimum support is 10%, the results of T10I4D1000K, BMS-WebView-1, and Chain-store are not shown, because all the items mined from these datasets have smaller support than the given minimum support threshold. That is, no tree is constructed and no frequent patterns are generated under this threshold setting.
4.3. Runtime evaluation
Figs. 11–18 show the results of the runtime experiments on the real and synthetic datasets shown in Tables 2 and 3, respectively. In these figures, we can observe that our LP-growth outperforms the others in almost all cases. LP-growth uses the proposed linear structure for its trees instead of the previous tree form in order to minimize access times when searching nodes. As a result, these advantages have a positive effect on reducing runtime throughout the experiments. Especially as the minimum support threshold becomes lower, the runtime difference between our algorithm and the others grows.
FP-growth shows the worst performance on all the datasets except Retail. Mining times of the FP-growth algorithm are 3305 s when the minimum support threshold is 80% for the Connect dataset and 1065 s when the threshold is 70% for the Pumsb dataset. Note that the algorithm did not operate normally for the Kosarak dataset, because its memory usage exceeded the limit allowed in our test environment. As an improved version of FP-growth, FP-growth-goethals shows better performance than the FP-growth algorithm on the dense datasets such as Connect, Pumsb, and Chess. Since FP-growth-tiny can reduce the size of the FP-tree, it presents better runtime performance than the above two algorithms. Its speed is also similar to that of FP-growth* in many cases but lags behind that of CT-PRO and our LP-growth. CT-PRO has outstanding runtime performance in many cases due to its technique of storing and utilizing additional data in bit form to increase mining speed. However, the CT-PRO algorithm generally uses enormous memory on almost all of the used datasets, and thus its overall efficiency falls remarkably behind that of LP-growth. Especially when the threshold was less than 0.1% for Chain-store and 0.7% for Kosarak, the algorithm failed to mine frequent patterns because of heavy memory consumption that our system could not bear. For this reason, the results of CT-PRO for these datasets are not shown in our graphs. FP-growth* shows fine runtime results in general, since it uses its own technique, the FP-array, to reduce the number of tree traversals. Nevertheless, the efficiency of FP-growth* falls behind that of our algorithm, as shown in the figures, since the benefit of LP-tree is higher than that of the FP-array. MAFIA-FI requires more execution
Fig. 12. Runtime test (Retail).
Fig. 13. Runtime test (Pumsb).
Fig. 14. Runtime test (Kosarak).
Fig. 15. Runtime test (Chess).
time than that of LP-growth and FP-growth* in all cases. Especially on the sparse datasets such as Retail, Kosarak, T10I4D1000K, and BMS-WebView-1, MAFIA-FI performs even worse than FP-growth. Moreover, we could not evaluate the runtime performance of the algorithm for Chain-store due to its enormous memory consumption. The reason is that it uses a vertical bitmap representation to increase mining performance, but this is not effective on sparse datasets, in contrast to dense ones. In Fig. 16, MAFIA-FI needs 1054 s at a minimum support of 0.01%, so its runtimes are not shown in these figures, since we can infer that the algorithm continues to have the worst performance in all cases of the T10I4D1000K dataset. This tendency also appears in Fig. 12. From all of the runtime results shown in Figs. 11–18, we can observe that LP-growth and CT-PRO present outstanding runtime performance. However, CT-PRO is unstable and unavailable in several cases, as shown in the figures, because it requires a very large amount of memory. Therefore, LP-growth is considered better than CT-PRO in terms of overall frequent pattern mining capability.
4.4. Memory usage evaluation
In this section, we evaluate memory usage for each algorithm with the same datasets as in the runtime tests. In Figs. 19–26,
Fig. 16. Runtime test (T10I4D1000K).
Fig. 17. Runtime test (Chain-store).
Fig. 18. Runtime test (BMS-WebView-1).
Fig. 19. Memory test (Connect).
Fig. 20. Memory test (Retail).
Fig. 21. Memory test (Pumsb).
LP-growth and FP-growth* present outstanding memory performance, while CT-PRO shows the worst results in almost all cases. Although our algorithm does not show the best memory usage in a few cases, it guarantees memory consumption as good as that of the state-of-the-art algorithm, FP-growth*. Moreover, our algorithm presents the most outstanding results in many cases. Especially in Fig. 20, for the Retail dataset, our LP-growth outperforms the others, including FP-growth*, in all cases. For the Kosarak dataset, CT-PRO could not operate normally when the minimum support threshold was 0.6% or less, and for the Chain-store dataset, it did not run successfully when the threshold was 0.01% or less, since it consumed too much memory; this is why the results of the algorithm are not included in Figs. 22 and 25. In addition, CT-PRO uses a lot of memory compared to the other algorithms. Meanwhile, since LP-growth uses very little memory compared to CT-PRO and the others, it can be more effective in memory-constrained environments.
FP-growth also requires a lot of memory in many cases, in common with the CT-PRO algorithm. FP-growth used too much memory mining frequent patterns, so we could not express the results in the graphs in the following situations: it used 833 MB when the minimum support threshold was 80% for the Connect dataset and 1633 MB when the threshold was 80% for the Pumsb dataset. Furthermore, FP-growth could not mine patterns normally when the threshold was 0.7% or less for the Kosarak dataset, since it required more memory than the limit allowed in our system. In addition, the algorithm consumed more memory than CT-PRO on the Chess dataset, as shown in Fig. 23. Meanwhile, FP-growth-goethals, which is an optimized version of FP-growth, showed relatively fine memory efficiency, although its performance still lags behind ours. Owing to its techniques for saving memory space by generating no conditional databases, FP-growth-tiny reduces memory usage for the relatively large datasets such as Pumsb and Kosarak compared to FP-growth-goethals, although this effect is not available for the small datasets such as Connect and Chess. Since the LP-tree of LP-growth minimizes its tree size by using the linear structure, it guarantees outstanding performance, as shown in the figures. Moreover, LP-growth has almost constant and stable memory consumption regardless of the threshold, in comparison to the other algorithms. Meanwhile, MAFIA-FI requires relatively much memory, since the bitmap proposed in the algorithm needs more
Fig. 22. Memory test (Kosarak).
Fig. 23. Memory test (Chess).
Fig. 24. Memory test (T10I4D1000K).
Fig. 25. Memory test (Chain-store).
Fig. 26. Memory test (BMS-WebView-1).
memory resources than the others, such as LP-tree and FP-tree. FP-growth* also shows these characteristics in many cases, but it does not guarantee them on the Retail dataset shown in Fig. 20. As in the previous runtime test in Fig. 16, the memory results of MAFIA-FI are not provided in Fig. 24. The reason is that, when the minimum support is 0.01%, it consumes 487 MB to perform its mining operations; therefore, we can expect that the algorithm requires even more memory at the other minimum supports, so we do not need to plot the results in the figure.
4.5. Scalability evaluation
Figs. 27–30 show the results of the scalability tests performed with the datasets in Table 3. Note that MAFIA-FI is excluded from the test on the datasets with increasing transactions, since it needs longer runtimes and more memory than the others in these tests, so it is hard to present its scalability results in the figures together with those of the other algorithms. In addition, FP-growth, MAFIA-FI, and CT-PRO are not evaluated in the test on the datasets with increasing items, because they cannot run normally on these datasets due to lack of memory. The minimum support threshold is fixed at 0.1% in these tests. In Fig. 27, the runtime increase of LP-growth is far smaller and more stable than that of the others, since LP-tree allows LP-growth to perform mining operations effectively regardless of the increase in transactions.
FP-growth* shows better scalability than FP-growth and FP-growth-goethals due to its special structure, the FP-array, although its efficiency is lower than ours. FP-growth-tiny also presents fine scalability, similar to that of FP-growth*. CT-PRO shows an outstanding scalability result, but our LP-growth is still better. Our algorithm also guarantees the best runtime scalability in the test shown in Fig. 29. FP-growth-goethals has the worst result, while FP-growth* shows fine runtime scalability, although its performance falls behind ours. FP-growth-tiny presents good performance similar to that of FP-growth* in the beginning, but its scalability becomes drastically low as the number of items gradually increases. In Fig. 28, all of the algorithms have almost constant memory usage, since all of them are tree-based, but their absolute values differ from each other, and LP-growth presents the smallest memory consumption due to the advantages of LP-tree. On the other hand, Fig. 30 shows results different from Fig. 28. Since the necessary tree sizes gradually become larger as the number of attributes increases, the memory usages of the algorithms grow, as shown in the figure. However, LP-growth shows the best memory scalability while the others have relatively poor performance, which indicates that our LP-tree can store these increasing attributes more efficiently than the structures of the competitor algorithms. Through the above experimental results, we know that the proposed algorithm, LP-growth, outperforms the others with respect to increasing transactions and items in terms of scalability as well as runtime and memory usage on the real datasets.
5. Conclusion
In this paper, we proposed a new tree structure, LP-tree, and an algorithm, LP-growth, that applies it to the mining process. The main goal of the proposed algorithm was to reduce not only the memory usage needed for building trees but also the time to traverse them by
Fig. 27. Scalability test (Runtime).
Fig. 28. Scalability test (Memory).
Fig. 29. Scalability test (Runtime).
Fig. 30. Scalability test (Memory).
applying a linear structure instead of the form previously used in FP-growth. LP-tree contributed to improving the performance of frequent pattern mining, since it spent less memory generating nodes compared to FP-tree and accessed them without any pointers in many cases. Our experimental results showed that LP-growth presented outstanding performance in terms of runtime, memory usage, and scalability. We could also observe that our algorithm outperformed the previous algorithms, especially in the runtime experiments, due to the reduced pointer accesses. The techniques and strategies described in this paper can be applied not only to general frequent pattern mining but also to a variety of pattern mining fields, such as closed/maximal pattern mining, top-k pattern mining, and graph mining. We expect that this future research will lead to improvements in mining performance in various areas.
Acknowledgements
This research was supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF Nos. 2013005682 and 20080062611).
References
[1] R. Agrawal, R. Srikant, Fast algorithms for mining association rules, Very Large
Data Bases(VLDB) (1994) 487499.
[2] C.F. Ahmed, S.K. Tanbeer, B. Jeong, H. Choi, A framework for mining interesting
high utility patterns with a strong frequency affinity, Information Science
(ISCI) 181 (21) (2011) 48784894.
[3] C.F. Ahmed, S.K. Tanbeer, B.S. Jeong, Y.K. Lee, Interactive mining of high utility
patterns over data streams, Expert System with Applications (ESWA) 39 (15)
(2012) 1197911991.
[4] B. Andres, U. kothe, T. Kroger, F.A. Hamprecht, Runtime-flexible multi-
dimensional arrays and views for C++98 and C++0x, Software: Practice and
Experience 35 (2) (2010) 159188.
[5] D. Burdick, M. Calimlim, J. Flanick, J. gehrke, T. Yiu, MAFIA: a maximal frequent
itemset algorithm, Transactions on Knowledge and Data Engineering (TKDE)
17 (11) (2005) 14901503.
[6] L. Chang, T. Wang, D. Yang, H. Luan, SeqStream: mining closed sequential
patterns over stream sliding windows, International Conference on Data
Mining (ICDM) (2008) 8392.[7] J.H. Chang, N.H. Park, Comparative analysis of sequence weighting approaches
for mining time-interval weighted sequential patterns, Expert System with
Applications (ESWA) 39 (3) (2012) 38673873.
[8] G.P. Chen, Y.B. Yang, Y. Zhang, MapReduce-based balanced mining for closed
frequent itemset, International Conference Web Services (2012) 652653.
[9] Y. Chen, W. Peng, S. Lee, CEMiner an efficient algorithm for mining closed
patterns from time interval-based data, in: International Conference on Data
Mining (ICDM), 2011, pp. 121130.
[10] C. Gao, J. Wang, Q. Yang, Efficient mining of closed sequential patterns on
stream sliding window, in: International Conference on Data Mining (ICDM),
2011, pp. 10441049.
[11] B. Goethals. .[12] G. Grahne, J. Zhu, Fast algorithms for frequent itemset mining using FP-trees,
Transactions on Knowledge and Data Engineering (TKDE) 17 (10) (2005)
13471362.
[13] M. Hamada, K. Tsuda, T. Kudo, T. Kin, K. Asai, Mining frequent stem patterns
from unaligned RNA sequences, Bioinformatics 22 (20) (2006) 24802487.
[14] J. Han, J. Pei, Y. Yin, R. Mao, Mining frequent patterns without candidate
generation: a frequent pattern tree approach, Data Mining and KnowledgeDiscovery (DMKD) 8 (1) (2004) 5387.
[15] J. Han, H. Cheng, D. Xin, X. Yan, Frequent pattern mining: current status and
future directions, Data Mining and Knowledge Discovery (DMKD) 15 (1)
(2007) 5586.
[16] T. Hu, S.Y. Sung, H. Xiong, Q. Fu, Discovery of maximum length frequent
itemsets, Information Sciences 178 (1) (2008) 6987.
[17] H.C. Kum, J.H. Changa, W. Wang, Sequential pattern mining in multi-databases
via multiple alignment, Data Mining and Knowledge Discovery (DMKD) 12 (2)
(2006) 151180.
[18] H.T. Lam, T. Calders, Mining top-K frequent items in a data streamwith flexible
sliding windows, Knowledge Discovery and Data Mining (KDD) (2010) 283
292.
[19] G. Lee, U. Yun, K. Ryu, Sliding window based weighted maximal frequent
pattern mining over data streams, Expert Systems with Applications 41 (2)
(2014) 694708.
[20] H.F. Li, S. Lee, Mining top-K path traversal patterns over streaming web click-
sequences, Journal of Information Science and Engineering 25 (4) (2009)
11211133.
[21] K. Lin, I. Liao, Z. Chen, An improved frequent pattern growth method for
mining association rules, Expert System with Applications (ESWA) 38 (5)(2011) 51545161.
[22] H. Liu, F. Lin, J. He, Y. Cai, New approach for the sequential pattern mining of
high-dimensional sequence databases, Decision Support Systems 50 (1) (2010)
270280.
[23] Y. Liu, Mining frequent patterns from univariate uncertain data, Data and
Knowledge Engineering (DKE) 71 (1) (2012) 4768.
[24] C. Lucchese, S. Orlando, R. Perego, Mining top-K patterns from binary datasets
in presence of noise, in: Proceedings of the SIAM International Conference on
Data Mining (SDM), 2010, pp. 165176.
[25] E. Ozkural, C. Aykanat, A space optimization for FP-growth, in: FIMI 04
Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining
Implementations, November 2004.
[26] A. Pietracaprina, D. Zandolin, Mining frequent itemsets using patricia tries, in:
Workshop on Frequent Itemset Mining Implementations, 2003.
[27] C. Raissi, T. Calders, P. poncelet, Mining conjunctive sequential patterns, Data
Mining and Knowledge Discovery (DMKD) 17 (1) (2008) 7793.
[28] S. Ruggieri, Frequent regular itemset mining, Knowledge Discovery and Data
Minin (KDD) (2010) 263272.
[29] Y.G. Sucahyo, R.P. Gopalan, CT-PRO: a bottom-up non recursive frequent
itemset mining algorithm using compressed FP-tree data structure, in: FIMI
04 Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining
Implementations, November 2004.
[30] S.K. Tanbeer, C.F. Ahmed, B.S. Jeong, Y. Lee, Efficient single-pass frequent
pattern mining using a prefix-tree, Information Sciences 179 (5) (2009) 559–583.
[31] S.K. Tanbeer, C.F. Ahmed, B.S. Jeong, Y.K. Lee, Sliding window-based frequent
pattern mining over data streams, Information Sciences 179 (22) (2009) 3843–3865.
[32] S.K. Tanbeer, C.F. Ahmed, B.S. Jeong, Y.K. Lee, Mining regular patterns in data
streams, Database Systems for Advanced Applications (2010) 399–413.
[33] F. Tao, Weighted association rule mining using weighted support and
significant framework, Knowledge Discovery and Data Mining (KDD) (2003) 661–666.
[34] V.S. Tseng, C.W. Wu, B.E. Shie, P.S. Yu, UP-Growth: an efficient algorithm for
high utility itemset mining, Knowledge Discovery and Data mining (KDD)
(2010) 253–262.
[35] R.C. Wong, A.W. Fu, Mining top-K frequent itemsets from data streams, Data
Mining and Knowledge Discovery (DMKD) 13 (2) (2006) 193–217.
[36] T. Wu, Y. Chen, J. Han, Re-examination of interestingness measures in pattern
mining: a unified framework, Data Mining and Knowledge Discovery (DMKD)
21 (3) (2010) 371–397.
[37] F.Y. Ye, J.D. Wang, B.L. Shao, New algorithm for mining frequent itemsets in
sparse database, in: Proc. the Fourth International Conference on Machine
Learning and Cybernetics, 2005, pp. 1554–1558.
[38] U. Yun, K.H. Ryu, Discovering important sequential patterns with length-
decreasing weighted support constraints, International Journal of Information
Technology and Decision Making 9 (4) (2010) 575–599.
[39] U. Yun, K. Ryu, E. Yoon, Weighted approximate sequential pattern mining
within tolerance factors, Intelligent Data Analysis 15 (4) (2011) 551–569.
[40] U. Yun, H. Shin, K. Ryu, E. Yoon, An efficient mining algorithm for maximal
weighted frequent patterns in transactional databases, Knowledge-Based
Systems 33 (2012) 53–64.
[41] U. Yun, K. Ryu, Efficient mining of maximal correlated weight frequent
patterns, Intelligent Data Analysis 17 (5) (2013) 917–939.
[42] U. Yun, G. Lee, K. Ryu, Mining maximal frequent patterns by considering
weight conditions over data streams, Knowledge-Based Systems 55 (2014) 49–65.
[43] X. Zhang, Y. Zhang, Sliding-window top-K pattern mining on uncertain
streams, Journal of Computational Information Systems 7 (3) (2011) 984–992.
[44] F. Zhu, Q. Qu, D. Lo, X. Yan, J. Han, P.S. Yu, Mining top-K large structural patterns
in a massive network, Very Large Data Bases (VLDB) (2011) 807–818.
[45] J. Zou, J. Xiao, R. Hou, Y. Wang, Frequent instruction sequential pattern
mining in hardware sample data, International Conference on Data Mining
(ICDM) (2010) 1205–1210.
G. Pyun et al. / Knowledge-Based Systems 55 (2014) 125–139