1
Maintaining Knowledge-Bases of Navigational Patterns from Streams of
Navigational Sequences
Ajumobi Udechukwu, Ken Barker, Reda Alhajj
Proceedings of the 15th International Workshop on Research Issues in Data Engineering: Stream Data Mining and Applications (RIDE-SDMA’05)
Advisor: Jia-Ling Koh
Speaker: Chun-Wei Hsieh
2
Introduction
Navigational patterns: traversal patterns, i.e. sequences of pages that users visit
Two broad techniques for mining navigational patterns:
– 1. Level-wise, Apriori-based techniques
– 2. Tree-based techniques
3
Methodology
Sliding window
Batch-update strategy
– Batch: the web log in the base time unit
Example
B1
4
B1 B2
B1 B2 B3
B1 B2 B3 B4
B1 B2 B3 B4 B5
B1 B2 B3 B4 B5 B6
4
Adapted GST
Adapted generalized suffix tree
Appending a stop symbol to all strings
Mining without thresholds
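The idea of indexing every suffix of every sequence, terminated by a stop symbol, can be sketched as follows. This is a minimal illustration only: it uses a plain, uncompressed suffix trie rather than the adapted GST from the paper, and the names `build_suffix_trie` and `occurrences` are hypothetical. Because no minimum-support threshold is applied, every substring keeps its count. Note that the count is the number of occurrences, so a symbol repeated within one sequence is counted each time.

```python
STOP = "$"  # stop symbol appended to every sequence, as on the slide


def build_suffix_trie(sequences):
    """Insert every suffix of every sequence (plus STOP) into a nested-dict trie.

    Each node's 'count' is the number of suffixes passing through it, which
    equals the number of occurrences of the substring spelled by its path.
    """
    root = {"count": 0, "children": {}}
    for seq in sequences:
        terminated = list(seq) + [STOP]
        for start in range(len(terminated)):
            node = root
            for symbol in terminated[start:]:
                node = node["children"].setdefault(
                    symbol, {"count": 0, "children": {}}
                )
                node["count"] += 1
    return root


def occurrences(root, pattern):
    """Return how often a non-empty pattern occurs; 0 if it is absent."""
    node = root
    for symbol in pattern:
        node = node["children"].get(symbol)
        if node is None:
            return 0
    return node["count"]


# Example strings taken from the slide figure: LQR and LQ.
trie = build_suffix_trie(["LQR", "LQ"])
print(occurrences(trie, "LQ"))
```

A real suffix tree compresses single-child chains into one edge, which is what gives rise to the counting issue discussed on the "Challenge" slide.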
5
Adapted GST
[Figure: step-by-step construction of the adapted GST. Example strings shown: LQR, LQ. Edge labels are built from the symbols L, Q, R and the stop symbol $; nodes are annotated with number pairs such as 1,1 and 2,1.]
6
Adapted GST
7
The Challenge of Adapted GST
“LQ” occurs in B1 with a support count of 4,
and “L” occurs independently in B2 with a support count of 2.
Because “L” is a prefix of “LQ”, the total count of “L” should be 4 + 2 = 6.
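The arithmetic above can be sketched as a prefix sum over stored labels. This is only an illustration under a stated assumption: each stored label carries the count recorded at its own node, so the support of a pattern is the sum over all stored labels that begin with it. The function name `total_support` and the dictionaries `b1` and `b2` are hypothetical.

```python
from collections import Counter


def total_support(pattern, stored_counts):
    """Sum the counts of every stored label that starts with `pattern`."""
    return sum(
        count
        for label, count in stored_counts.items()
        if label.startswith(pattern)
    )


# Counts from the slide: "LQ" with support 4 in B1, "L" alone with support 2 in B2.
b1 = {"LQ": 4}
b2 = {"L": 2}

merged = Counter(b1) + Counter(b2)
print(total_support("L", merged))   # 4 + 2
print(total_support("LQ", merged))  # only the B1 occurrences
```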
8
AC-NAP tree 1
9
AC-NAP tree 2
Output all node labels and counts to a database
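Writing node labels and counts to a database might look like the sketch below. It is a hedged illustration, not the authors' implementation: the experiments used Java and MySQL, whereas this uses Python's built-in `sqlite3` as a stand-in, and the table name `patterns`, its columns, and the function `store_batch` are assumptions made for the example.

```python
import sqlite3


def store_batch(conn, batch_id, label_counts):
    """Persist one batch's (label, count) pairs, keyed by batch id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS patterns ("
        "batch_id INTEGER, label TEXT, support INTEGER, "
        "PRIMARY KEY (batch_id, label))"
    )
    conn.executemany(
        "INSERT INTO patterns (batch_id, label, support) VALUES (?, ?, ?)",
        [(batch_id, label, count) for label, count in label_counts.items()],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for illustration
store_batch(conn, 1, {"LQ": 4})
store_batch(conn, 2, {"L": 2})
for row in conn.execute(
    "SELECT batch_id, label, support FROM patterns ORDER BY batch_id"
):
    print(row)
```

Keeping the batch id alongside each label makes it possible to discard a whole batch later without rebuilding the tree.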
10
Maintaining patterns within a window
11
Maintaining patterns within a window
Count total support
Remove out_of_date patterns
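The two steps above (count total support, remove out-of-date patterns) can be sketched in memory as follows. This is an assumption-laden illustration: the class `PatternWindow`, its attributes, and the per-batch dictionaries of label counts are hypothetical, and the slides do not specify how the original system performs these steps.

```python
from collections import Counter, deque


class PatternWindow:
    """Keep per-batch pattern counts for the most recent `window_size` batches."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.batches = deque()   # (batch_id, Counter) pairs, oldest first
        self.totals = Counter()  # total support of each pattern in the window

    def add_batch(self, batch_id, label_counts):
        # Count total support: add the new batch's counts to the running totals.
        self.batches.append((batch_id, Counter(label_counts)))
        self.totals.update(label_counts)
        # Remove out-of-date patterns: subtract batches that left the window.
        while len(self.batches) > self.window_size:
            _, expired = self.batches.popleft()
            self.totals.subtract(expired)
            for label in expired:
                if self.totals[label] <= 0:
                    del self.totals[label]


window = PatternWindow(window_size=2)
window.add_batch(1, {"LQ": 4})
window.add_batch(2, {"L": 2, "LQ": 1})
window.add_batch(3, {"R": 3})  # batch 1 slides out of the window
print(dict(window.totals))
```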
12
Experiments
OS: Microsoft Windows XP Professional Edition
CPU: 2 GHz Intel Pentium 4
RAM: 512 MB
Programming language: Java
DBMS: MySQL
Data: real-world web logs of “msnbc.com”
13
Experiments
14
Experiments
15
Experiments