A Survey on Web Usage Mining: Theory and Applications
the tree. This technique is more precise and scalable for
mining frequent access patterns of differing lengths.
Qianhui Althea Liang et al., [15] proposed and refined the
concept of Web service usage patterns and their discovery
through service mining. The authors described three levels
of service usage data: (a) the user demand level, (b) the
template level and (c) the instance level. At each level,
the authors examined the patterns in the service usage data
and how these patterns can be discovered. A technique for
service pattern discovery at the template level was
developed.
Web servers automatically collect huge amounts of data and
accumulate them in access log files. Analysis of server
access data can yield considerable and valuable
information. WUM is the process of applying data mining
approaches to discover usage patterns from Web data, and it
is oriented towards applications. It extracts the secondary
data generated by users' interactions during their Web
sessions. Because of its value for significant
applications, WUM has seen a rapid rise in attention from
both the research and application fields.
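To make the notion of secondary data more concrete, the following minimal Python sketch parses access log entries and groups them into sessions. It is only an illustration under stated assumptions: the log is taken to be in Common Log Format, entries are assumed to be in chronological order, each host address is treated as one user, and a 30-minute inactivity timeout separates sessions. The names `parse_line`, `sessionize` and `SAMPLE_LOG` are hypothetical and do not come from any of the surveyed papers.

```python
import re
from datetime import datetime, timedelta

# Assumed Common Log Format:
# host ident user [time] "method path protocol" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

def parse_line(line):
    """Return (host, timestamp, path) for a successful GET request, else None."""
    m = LOG_PATTERN.match(line)
    if m is None or m.group("method") != "GET" or m.group("status")[0] != "2":
        return None
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("host"), ts, m.group("path")

def sessionize(lines, timeout=timedelta(minutes=30)):
    """Group requests per host; a gap longer than `timeout` starts a new session."""
    sessions = []    # list of (host, [paths]) in order of first request
    last_seen = {}   # host -> (time of last request, index into sessions)
    for host, ts, path in filter(None, map(parse_line, lines)):
        prev = last_seen.get(host)
        if prev is None or ts - prev[0] > timeout:
            sessions.append((host, [path]))
            index = len(sessions) - 1
        else:
            index = prev[1]
            sessions[index][1].append(path)
        last_seen[host] = (ts, index)
    return sessions

# Hypothetical sample entries, invented for illustration only.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Jul/2012:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Jul/2012:10:05:00 +0000] "GET /courses.html HTTP/1.1" 200 734',
    '10.0.0.2 - - [10/Jul/2012:10:06:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Jul/2012:11:00:00 +0000] "GET /contact.html HTTP/1.1" 200 301',
    '10.0.0.2 - - [10/Jul/2012:10:07:00 +0000] "GET /missing.html HTTP/1.1" 404 0',
]

for host, pages in sessionize(SAMPLE_LOG):
    print(host, pages)
```

On the sample entries this yields three sessions: the first host is split into two because of the 55-minute gap, and the failed request is discarded. Real preprocessing would also need to handle proxies, robots and embedded resources, which this sketch ignores.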
Etminani et al., [16] applied Kohonen's SOM (Self-Organizing
Map) to the preprocessed Web server logs of a leading
university (http://www.um.ac.ir/) and mined frequent
patterns.
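The general idea of a SOM can be sketched in a few lines of Python. This is a generic one-dimensional SOM, not the implementation of Etminani et al.: the binary encoding of sessions (1 if a page was visited), the number of units, the learning rate, the neighbourhood radius and the names `bmu` and `train_som` are all assumptions made for illustration.

```python
import random
from math import exp

def bmu(weights, vector):
    """Index of the best-matching unit (smallest squared Euclidean distance)."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - x) ** 2 for w, x in zip(weights[i], vector)),
    )

def train_som(data, units=4, epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """Train a 1-D SOM; learning rate and radius shrink linearly per epoch."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(units)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay
        radius = max(radius0 * decay, 0.01)
        for vector in data:
            winner = bmu(weights, vector)
            for i, unit in enumerate(weights):
                # Gaussian neighbourhood: units near the winner move the most.
                influence = exp(-((i - winner) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    unit[d] += lr * influence * (vector[d] - unit[d])
    return weights

# Hypothetical sessions over four pages (1 = page visited in the session).
sessions = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
som = train_som(sessions)
for s in sessions:
    print(s, "-> unit", bmu(som, s))
```

After training, sessions mapped to the same unit can be read as one group of similar navigation behaviour; how well the groups separate depends on the data and on the assumed parameters.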
Nina et al., [17] presented a complete scheme for pattern
discovery in WUM. Web site developers are expected to have a
clear understanding of user profiles and site objectives, as
well as of the ways in which users browse the Web pages.
Developers can learn visitors' behavior through Web analysis
and discover patterns in visitors' activities. This Web
analysis involves transforming and interpreting the Web log
data to uncover hidden information or predictive patterns
by means of data mining and knowledge discovery methods.
Cooley et al., [18] developed information and pattern
discovery techniques for the WWW. The application of data
mining approaches to the WWW, known as Web mining, has
been the focus of numerous recent research projects. The
term Web mining has been used in two distinct ways. The
first, called Web content mining, is the process of
information discovery from sources across the WWW. The
second, called WUM, is the process of mining for user
browsing behavior and access patterns. Cooley et al. also
briefly described WEBMINER, a system for WUM.
2.3. Pattern Analysis
Klos et al., [19] carried out research on a technique for
the analysis and assessment of Web pages. This technique is
built on a tacit contract between Web developers and Web
users. The main features of this contract are Web patterns,
which Web developers use in their Web page designs. Using
this technique, it is easy to determine, with a higher level
of significance, whether a pattern is present on a page.
Web applications are subject to continuous and rapid
development. Developers often duplicate Web pages
indiscriminately, without regard for systematic development
and maintenance techniques. This practice produces code
clones that make Web applications difficult to maintain and
reuse. De Lucia et al., [20] proposed a technique for
reengineering Web applications based on clone analysis,
which aims to identify and generalize static and dynamic
pages and navigational patterns of a Web application. Clone
analysis also helps in identifying literals that can be
generated from a database. The authors present a case study
that demonstrates how this technique can be used to
restructure the navigational pattern of a Web application
by removing redundant code.
Kudelka et al., [21] proposed a novel technique for the
semantic analysis of Web pages. The analysis is based on
the accepted and empirically verified contract between
users and Web developers, expressed through Web patterns
[26]. The technique extracts patterns that are
characteristic of a particular domain. Patterns formalize
the contract and allow semantics to be assigned to segments
of Web pages. Experimental observations confirm the
effectiveness of this technique.
Most of the approaches that have been used for pattern
discovery from Web Usage Data (WUD) are clustering
techniques. In e-commerce applications, clustering
techniques can be used to formulate marketing strategies,
product assistance, personalization and Web site revision.
Raju et al., [22] developed a novel partition-based
technique for dynamically grouping Web users according to
their Web access patterns, using the Adaptive Resonance
Theory 1 Neural Network (ART1 NN) clustering approach.
Experimental results show that ART1 NN clustering performs
better, in terms of intra-cluster and inter-cluster
distances, than the K-Means and SOM clustering approaches.
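The comparison criterion mentioned above can be illustrated with a short Python sketch. The exact definitions used by Raju et al. are not given here, so the following is an assumption: intra-cluster distance is taken as the mean Euclidean distance of each session vector to its cluster centroid (lower is better), and inter-cluster distance as the mean Euclidean distance between centroids (higher is better). The function names and the sample data are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
from itertools import combinations
from math import dist

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def cluster_quality(vectors, labels):
    """Return (mean intra-cluster distance, mean inter-cluster distance)."""
    clusters = {}
    for vector, label in zip(vectors, labels):
        clusters.setdefault(label, []).append(vector)
    centroids = {label: centroid(members) for label, members in clusters.items()}
    # Intra: average distance from each vector to the centroid of its cluster.
    intra = sum(dist(v, centroids[l]) for v, l in zip(vectors, labels)) / len(vectors)
    # Inter: average distance over all pairs of cluster centroids.
    pairs = list(combinations(centroids.values(), 2))
    inter = sum(dist(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return intra, inter

# Hypothetical binary session vectors over four pages.
vectors = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
print(cluster_quality(vectors, [0, 0, 1, 1]))  # groups identical sessions
print(cluster_quality(vectors, [0, 1, 0, 1]))  # mixes dissimilar sessions
```

Under these assumed definitions, the first labelling gives compact, well-separated clusters, while the second gives a larger intra-cluster distance and coinciding centroids. Any clustering method (ART1 NN, K-Means or SOM) that outputs labels can be scored this way.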
Owing to the inherent correlation between Web objects and
the lack of a standardized representation of Web documents,
Web community mining and analysis has become a significant
area of Web data management and analysis. The study of Web
communities spans a number of research fields such as Web
mining, clustering, Web search and text retrieval. Yanchun
Zhang et al., [23] present some recent studies in this
area, which cover finding relevant Web pages on the basis of
linkage information, discovering user access patterns by
analyzing Web log files, co-clustering Web objects and
analyzing social networks from Web data.
Design patterns are among the objects that are significant
for reuse. Kudelka et al., [24] focus on the use of web
design patterns in analyzing the structure and contents of
web pages, and have developed the Pattrio technique for
pattern discovery on web pages. The patterns identified on
web pages describe the web page structure from the external
point of view of the user. Knowledge of this structure can
be used in different contexts. Experiments that use this
technique to detect the composition of web pages
automatically have been discussed at numerous conferences.
The experimental evaluation compares this technique with
other selected techniques. The evaluation results confirm
that web design patterns can play a key role in the analysis
of web page composition and contents.
P Nithya et al ,Int.J.Computer Technology & Applications,Vol 3 (4), 1625-1629
IJCTA | July-August 2012 Available [email protected]
1627
ISSN:2229-6093
III. PROBLEMS AND DIRECTIONS
The main difficulty with Web mining in general, and WUM in
particular, is the nature of the data involved. Apart from
its sheer volume, the data is not fully structured. It is
in a semi-structured form and therefore needs several
preprocessing steps before the essential information can be
extracted. Further research is needed on preprocessing the
data and on the following problems.
Reducing the Paths of High-Visit Pages: Pages that are
frequently visited by users can be seen to follow a
particular path. These pages can be placed in an easily
accessible branch of the Website, thereby reducing the
navigation path length.
Eliminating or Integrating Low-Visit Pages: Pages that are
rarely visited by users can either be eliminated or have
their content integrated into frequently accessed pages.
Redesigning Pages to Facilitate User Navigation: To help
users browse the website in the best possible way, the
information acquired can be used to redesign the structure
of the Website.
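The three directions above all start from simple usage statistics. The Python sketch below shows one possible way to derive them from sessionized data; it is an illustration only. The support thresholds (`high=0.5`, `low=0.2`), the function names `classify_pages` and `mean_depth`, and the sample sessions are assumptions and are not taken from the surveyed work.

```python
from collections import Counter

def classify_pages(sessions, high=0.5, low=0.2):
    """Split pages into high-visit and low-visit sets.

    A page's support is the share of sessions that contain it at least once.
    """
    support = Counter(page for session in sessions for page in set(session))
    total = len(sessions)
    high_visit = {p for p, n in support.items() if n / total >= high}
    low_visit = {p for p, n in support.items() if n / total <= low}
    return high_visit, low_visit

def mean_depth(sessions, page):
    """Average number of clicks before `page` is first reached in a session."""
    depths = [session.index(page) for session in sessions if page in session]
    return sum(depths) / len(depths)

# Hypothetical sessions: ordered lists of visited pages.
sessions = [
    ["/", "/dept", "/dept/cs", "/dept/cs/timetable"],
    ["/", "/dept", "/dept/cs", "/dept/cs/timetable"],
    ["/", "/news"],
    ["/", "/dept", "/dept/cs", "/dept/cs/timetable"],
    ["/", "/about", "/about/history"],
]

high_visit, low_visit = classify_pages(sessions)
print(sorted(high_visit))
print(sorted(low_visit))
print(mean_depth(sessions, "/dept/cs/timetable"))
```

In this invented example the timetable page is frequently visited but sits three clicks deep, which makes it a candidate for a shorter path, while the rarely visited pages are candidates for removal or merging. Suitable thresholds depend on the site and would have to be chosen empirically.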
IV. CONCLUSION
The increasing popularity of the Web has drawn considerable
attention to Web mining technology. A vital research area
in Web mining is Web usage mining, which focuses mainly on
the discovery of patterns in the browsing and navigation
data of Web users. WUM has become a promising technology
for understanding the behavior of users on the Web.
Different researchers have proposed several techniques for
web usage mining. This paper discussed various techniques
available for web usage mining, focusing mainly on the
three vital steps in WUM: preprocessing, pattern discovery
and pattern analysis. Improved cluster recovery can provide
a highly accurate prediction of a Web user's future visits,
provided that the user's cluster can be determined exactly.
REFERENCES
[1] Dr. G. K. Gupta, “Introduction to Data Mining with Case
Studies”, PHI Publication, 2005.
[2] Jaideep Srivastava, Robert Cooley, Mukund Deshpande,
Pang-Ning Tan, “Web Usage Mining: Discovery and
Applications of Usage Patterns from Web Data”, SIGKDD
Explorations, Vol. 1, No. 2, Pp. 12-23, 2000.
[3] Adel T. Rahmani and B. Hoda Helmi, “EIN-WUM an AIS-
based Algorithm for Web Usage Mining”, Proceedings of
GECCO’08, Atlanta, Georgia, USA, ACM 978-1-60558-130-9/08/07,
Pp. 291-292, 2008.
[4] Shailey Minocha, Nicola Millard, Lisa Dawson, “Integrating
Customer Relationship Management Strategies in (B2C) E-
Commerce Environments”, IFIP Conference on Human-
Computer Interaction- INTERACT, 2003.
[5] C. Ramya, G. Kavitha, K. S. Shreedhara, “Preprocessing: A
Prerequisite for Discovering Patterns in Web Usage Mining
Process”, Computing Research Repository - CORR, vol.
abs/1105.0, 2011.
[6] V. Chitraa, Antony Selvdoss Davamani, “A Survey on
Preprocessing Methods for Web Usage Data”, Computing
Research Repository-CORR, Vol. abs/1004.1, 2010.
[7] Nizar R. Mabroukeh, Christie I. Ezeife, “A taxonomy of
sequential pattern mining algorithms”, ACM Computing
Surveys - CSUR, Vol. 43, No. 1, Pp. 1-41, 2010.
[8] Francesco Moscato, Nicola Mazzocca, Valeria Vittorini,
Giusy Di Lorenzo, Paola Mosca, Massimo Magaldi,
“Workflow Pattern Analysis in Web Services”, High
Performance Computing and Communications - HPCC, Pp.
395-400, 2005.
[9] Hussain, T.; Asghar, S.; Masood, N.; “Web usage mining: A
survey on preprocessing of web log file”, International
Conference on Information and Emerging Technologies
(ICIET), Pp. 1 – 6, 2010.
[10] Tanasa, D.; Trousse, B.; “Advanced data preprocessing for
intersites Web usage mining”, IEEE Intelligent Systems, Vol.
19, No. 2, Pp. 59 – 65, 2004.
[11] Othman, Z.A.; Abu Bakar, A.; Hamdan, A.R.; Omar, K.;
Shuib, N.L.M.; “Agent based preprocessing”, International
Conference on Intelligent and Advanced Systems (ICIAS),
Pp. 219 – 223, 2007.
[12] Khasawneh, N.; Chien-Chung Chan; “Active User-Based and
Ontology-Based Web Log Data Preprocessing for Web Usage
Mining”, IEEE/WIC/ACM International Conference on Web
Intelligence, Pp. 325 – 328, 2006.
[13] Tanasa, D.; Trousse, B.; “Data preprocessing for WUM”,
IEEE Potentials, Vol. 23, No. 3, Pp. 22 – 25, 2004.
[14] Xidong Wang; Yiming Ouyang; Xuegang Hu; Yan Zhang;
“Discovery of user frequent access patterns on Web usage
mining”, The Proceedings 8th International Conference on
Computer Supported Cooperative Work in Design, Vol. 1, Pp.
765 – 769, 2004.
[15] Qianhui Althea LIANG; Jen-Yao CHUNG; Steven MILLER;
Yang OUYANG; “Service Pattern Discovery of Web Service
Mining in Web Service Registry-Repository”, IEEE
International Conference on e-Business Engineering (ICEBE
'06), Pp. 286 – 293, 2006.
[16] Etminani, K.; Delui, A.R.; Yanehsari, N.R.; Rouhani, M.;
“Web usage mining: Discovery of the users' navigational
patterns using SOM”, First International Conference on
Networked Digital Technologies (NDT '09), Pp. 224 – 249,
2009.
[17] Nina, S.P.; Rahman, M.; Bhuiyan, K.I.; Ahmed, K.; “Pattern
Discovery of Web Usage Mining”, International Conference
on Computer Technology and Development (ICCTD '09),
Vol. 1, Pp. 499 – 503, 2009.
[18] Cooley, R.; Mobasher, B.; Srivastava, J.; “Web mining:
information and pattern discovery on the World Wide Web”,
Proceedings Ninth IEEE International Conference on Tools
with Artificial Intelligence, Pp. 558 – 567, 1997.
[19] Klos, K.; Kocibova, J.; Lehecka, O.; Kudelka, M.; Snasel,
V.; “Web Page Analysis: Experiments Based on Web
Patterns”, 4th International Conference on Innovations in
Information Technology (IIT '07), Pp. 16 – 20, 2007.
[20] De Lucia, A.; Francese, R.; Scanniello, G.; Tortora, G.;
“Reengineering Web applications based on cloned pattern
analysis”, Proceedings 12th IEEE International Workshop on
Program Comprehension, Pp. 132 – 141, 2004.
[21] Kudelka, M.; Snasel, V.; Lehecka, O.; El-Qawasmeh, E.;
“Semantic Analysis of Web Pages Using Web Patterns”,
IEEE/WIC/ACM International Conference on Web
Intelligence, Pp. 329 – 333, 2006.
[22] Raju, G.T.; Sudhamani, M.V.; “A novel approach for
extraction of cluster patterns from Web Usage Data and its
performance analysis”, International Conference on Emerging
Trends in Electrical and Computer Technology (ICETECT),
Pp. 718 – 723, 2011.
[23] Yanchun Zhang; Guandong Xu; “Using Web Clustering for
Web Communities Mining and Analysis”, IEEE/WIC/ACM
International Conference on Web Intelligence and Intelligent
Agent Technology (WI-IAT '08), Vol. 1, Pp. 20 – 31, 2008.
[24] Kudelka, Milos; Snasel, Vaclav; Lehecka, Ondrej; El-
Qawasmeh, Eyas; “Web content mining using web design
patterns”, IEEE International Conference on Information
Reuse and Integration (IRI), Pp. 232 – 237, 2008.
[25] Kudelka, Milos; Lehecka, Ondrej; Snasel, Vaclav; El-
Qawasmeh, Eyas; “Web pages clustering based on web
patterns”, 2nd International Conference on Digital
Information Management (ICDIM '07), Vol. 2, Pp. 657 – 664,
2007.
[26] Rui Wu; “Clustering Web Access Patterns Based on Hybrid
Approach”, Fifth International Conference on Fuzzy Systems
and Knowledge Discovery (FSKD '08), Vol. 1, Pp. 52 – 56,
2008.