A Survey on Web Usage Mining: Theory and Applications

1 P. Nithya, 2 Dr. P. Sumathi
1 Doctoral student, Manonmaniam Sundaranar University, Tirunelveli, Tamil Nadu, India
2 Asst. Professor, Chikkanna Govt. Arts College, Tirupur, Tamil Nadu, India
E-mail: [email protected]

Abstract--- The World Wide Web continues to grow at an incredible pace. The information available on the WWW serves as a gateway and a medium for conducting business. Web mining is the extraction of interesting and useful facts and implicit information from artifacts or activities related to the WWW. Web usage mining (WUM) attempts to discover valuable information from the secondary data obtained from users' interactions with the Web. WUM has become extremely significant for effective Web site organization, the creation of adaptive Web sites, business and support services, personalization, network traffic flow analysis and so on. WUM comprises three steps, namely preprocessing, pattern discovery, and pattern analysis. Because of its importance, WUM has become an active area of research in the field of data mining. This paper provides a comprehensive discussion of all the phases of WUM and of related work in this field.

Keywords--- Web Usage Mining (WUM), Customer Relationship Management (CRM), Preprocessing, Pattern Discovery, Pattern Analysis

I. INTRODUCTION

Data for Web usage mining can be obtained from server logs, browser logs, proxy logs, or an organization's database. These data collections differ in the location of the data source, the kinds of data available, the segment of the population from which the data was obtained, and the techniques of implementation [1]. WUM is a branch of Web mining, which in turn is a part of data mining. Data mining is the process of extracting significant and valuable information from very large databases [2]. WUM mines the usage characteristics of the users of Web applications.

The data obtained can then be applied in various ways, such as checking for fraudulent elements. WUM is considered a component of Business Intelligence in an organization [3]. It is used to decide business strategies through the effective use of Web applications. It is vital for Customer Relationship Management (CRM) because it can help ensure customer satisfaction as far as the interaction between the customer and the organization is concerned [4].

Many kinds of data can be used in Web mining.

1. Content: The visible data in the Web pages, or the data intended to be provided to the users. This mostly comprises text and graphics (images).
2. Structure: Data that describes the organization of the Web site. It falls into two categories. Intra-page structure data consists of the arrangement of the various HyperText Markup Language (HTML) or Extensible Markup Language (XML) tags within a given page. The main kind of inter-page structure information is the hyperlinks used for site navigation.
3. Usage: Data that describes the usage patterns of Web pages, such as IP addresses, page references, the date and time of accesses, and other information depending on the log format.

The main processes in WUM are:

Preprocessing: Data preprocessing refers to any kind of processing performed on raw data to prepare it for a subsequent processing step [5]. It transforms the data into a format that can be processed more easily and effectively for the user's purpose. The preprocessing steps used in WUM are [6]:

1. Usage Pre-Processing: Pre-processing of the usage patterns of users.
2. Content Pre-Processing: Pre-processing of the content accessed.
3. Structure Pre-Processing: Pre-processing of the structure of the Web site.

Pattern Discovery: WUM can be used to uncover patterns in server logs, but it is often carried out only on samples of data. The mining process will be ineffective if the samples are not a good representation of the larger body of data [7]. The pattern discovery methods are:

1. Statistical Analysis
2. Association Rules
3. Clustering
4. Classification
5. Sequential Patterns
6. Dependency Modeling

Pattern Analysis: This is the final step in the WUM process. After preprocessing and pattern discovery are complete, the discovered usage patterns are analyzed to filter out insignificant information and obtain the valuable information.
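As an illustration of usage preprocessing, the following minimal Python sketch parses server log lines and groups page requests into user sessions. It is not taken from any of the surveyed papers. It assumes the logs are in the Common Log Format, that a user can be approximated by an IP address, and that a 30-minute gap of inactivity ends a session. The sample log lines and the names `parse_line` and `sessionize` are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Common Log Format: host ident authuser [date] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Embedded resources that are usually dropped during data cleaning.
RESOURCE_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")


def parse_line(line):
    """Return (ip, timestamp, url, status), or None if the line does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("ip"), ts, m.group("url"), int(m.group("status"))


def sessionize(lines, timeout=timedelta(minutes=30)):
    """Group successful page requests into per-IP sessions split by inactivity."""
    by_ip = defaultdict(list)
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        ip, ts, url, status = rec
        # Data cleaning: keep only successful requests for pages.
        if status != 200 or url.lower().endswith(RESOURCE_SUFFIXES):
            continue
        by_ip[ip].append((ts, url))

    sessions = []
    for ip, hits in by_ip.items():
        hits.sort()
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                # Inactivity gap exceeded: close the session and start a new one.
                sessions.append((ip, current))
                current = []
            current.append(url)
        sessions.append((ip, current))
    return sessions


# Hypothetical sample data for illustration only.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2011:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2011:13:55:37 +0000] "GET /logo.gif HTTP/1.0" 200 512',
    '10.0.0.1 - - [10/Oct/2011:13:58:02 +0000] "GET /products.html HTTP/1.0" 200 4100',
    '10.0.0.2 - - [10/Oct/2011:14:01:10 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2011:15:10:00 +0000] "GET /contact.html HTTP/1.0" 200 900',
]

if __name__ == "__main__":
    for ip, pages in sessionize(SAMPLE_LOG):
        print(ip, pages)
```

Identifying users by IP address alone is a simplification; proxies and shared addresses can merge several users into one, which is one reason the preprocessing stage receives so much attention in the literature.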


the tree. This technique is more precise and scalable for mining frequent access patterns of different lengths.
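To make the notion of frequent access patterns of different lengths concrete, the sketch below counts contiguous page sequences across sessions and keeps those that reach a minimum support. It is a naive counting approach for illustration, not the tree-based technique discussed above. The function name `frequent_paths`, the parameters `min_support` and `max_len`, and the sample sessions are all assumptions.

```python
from collections import Counter


def frequent_paths(sessions, min_support=2, max_len=3):
    """Return contiguous page sequences (length 1..max_len) whose support,
    measured as the number of sessions containing them, is >= min_support."""
    support = Counter()
    for pages in sessions:
        seen = set()
        for n in range(1, max_len + 1):
            for i in range(len(pages) - n + 1):
                seen.add(tuple(pages[i:i + n]))
        # Each session contributes at most once to a pattern's support.
        support.update(seen)
    return {path: count for path, count in support.items() if count >= min_support}


# Hypothetical sessions for illustration only.
SESSIONS = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/contact"],
    ["/products", "/cart"],
]

if __name__ == "__main__":
    for path, count in sorted(frequent_paths(SESSIONS).items()):
        print(" -> ".join(path), count)
```

Enumerating every subsequence grows quickly with session length, which is why more scalable structures are of interest for real logs.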

Qianhui Althea Liang et al. [15] proposed and refined the concept of Web service usage patterns and their discovery through service mining. The authors described three levels of service usage data: (a) user demand level, (b) template level and (c) instance level. At each level, they examined the patterns in the service usage data and how those patterns can be discovered. A technique for service pattern discovery at the template level has been developed.

Huge amounts of data are collected automatically by Web servers and accumulated in access log files. Analysis of server access data can yield considerable and valuable information. WUM is the application of data mining approaches to the discovery of usage patterns from Web data, and it is oriented towards applications. It extracts the secondary data produced by users' interactions during their Web sessions. As the value of these applications has become apparent, WUM has seen a rapid rise in attention from both the research and application communities. Etminani et al. [16] applied Kohonen's Self-Organizing Map (SOM) to the preprocessed Web logs of a leading university's Web server (http://www.um.ac.ir/) and mined frequent patterns.

Nina et al. [17] presented a complete scheme for pattern discovery in WUM. Web site developers are expected to have a clear understanding of the users' profiles and the site's objectives, as well as knowledge of how users will browse the Web pages. Developers can learn visitors' behavior through Web analysis and discover patterns in visitors' activities. This Web analysis involves transforming and interpreting the Web log data to uncover hidden information or predictive patterns using data mining and knowledge discovery methods.

Cooley et al. [18] developed information and pattern discovery techniques for the WWW. The application of data mining approaches to the WWW, known as Web mining, has been the focus of numerous recent research projects. The term Web mining has been used in two distinct ways. The first, called Web content mining, is the process of information discovery from sources across the WWW. The second, called WUM, is the process of mining for user browsing behavior and access patterns. Cooley et al. also briefly described WEBMINER, a system for WUM.

2.3. Pattern Analysis

Klos et al. [19] reported experiments with a technique for the analysis and evaluation of Web pages. The technique is built on a tacit contract between Web developers and Web users. The main elements of this contract are Web patterns, which Web developers use in their Web page designs. Using this technique, it is easy to determine, with a good level of significance, whether a pattern is present on a page.

Web applications are subject to continuous and rapid development. Developers frequently duplicate Web pages without applying systematic development and maintenance techniques. This practice produces code clones that make Web applications difficult to maintain and reuse. De Lucia et al. [20] proposed a technique for reengineering Web applications based on clone analysis that aims to identify and generalize static and dynamic pages and navigational patterns of a Web application. Clone analysis is also helpful for identifying literals that can be generated from a database. The authors present a case study that demonstrates how this technique can be used to restructure the navigational pattern of a Web application by removing redundant code.

Kudelka et al. [21] proposed a novel technique for semantic analysis of Web pages. The analysis is based on the accepted and empirically verified contract between users and Web developers expressed through Web patterns [26]. The technique is designed to extract patterns that are characteristic of a particular domain. Patterns formalize the contract and make it possible to assign semantics to segments of Web pages. Experimental observations confirm the effectiveness of this technique.

Most of the approaches that have been used for pattern discovery from Web Usage Data (WUD) are clustering techniques. In e-commerce applications, clustering techniques can be used for formulating marketing strategies, product support, personalization and Web site modification. Raju et al. [22] developed a novel partition-based technique for dynamically grouping Web users according to their Web access patterns using the Adaptive Resonance Theory 1 Neural Network (ART1 NN) clustering approach. Experimental results confirm that this ART1 NN clustering technique performs better in terms of intra-cluster and inter-cluster distances than the K-Means and SOM clustering approaches.
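The two evaluation criteria mentioned above can be sketched as follows. The exact definitions used by Raju et al. are not given here, so this is one common formulation under stated assumptions: intra-cluster distance is the mean Euclidean distance from each session vector to its own cluster centroid, and inter-cluster distance is the mean pairwise Euclidean distance between centroids. The binary session vectors and all function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def intra_cluster_distance(clusters):
    """Mean distance from each vector to its own centroid (lower means tighter)."""
    total, count = 0.0, 0
    for members in clusters:
        c = centroid(members)
        total += sum(euclidean(v, c) for v in members)
        count += len(members)
    return total / count


def inter_cluster_distance(clusters):
    """Mean pairwise distance between centroids (higher means better separated).
    Requires at least two clusters."""
    centroids = [centroid(members) for members in clusters]
    pairs = list(combinations(centroids, 2))
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)


# Hypothetical data: each vector marks which of four pages a user visited.
CLUSTERS = [
    [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0]],
    [[0, 0, 1, 1], [0, 0, 1, 1]],
]

if __name__ == "__main__":
    print("intra:", intra_cluster_distance(CLUSTERS))
    print("inter:", inter_cluster_distance(CLUSTERS))
```

A clustering is generally considered better when the intra-cluster value is small relative to the inter-cluster value, which is the sense in which two algorithms can be compared on the same data.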

Owing to the inherent correlation between Web objects and the lack of a standardized representation of Web documents, Web community mining and analysis has become an important area for Web data management and analysis. The analysis of Web communities spans a number of research fields such as Web mining, clustering, Web search and text retrieval. Yanchun Zhang et al. [23] present some recent studies in this area, which cover finding relevant Web pages based on linkage information, discovering user access patterns by analyzing Web log files, co-clustering Web objects and analyzing social networks from Web data.

One of the artifacts that is important for reuse is the design pattern. Kudelka et al. [24] focus on the use of web design patterns when analyzing the structure and content of web pages, and have developed a technique, called Pattrio, for discovering patterns on web pages. The patterns identified on web pages describe the web page structure from the external point of view of the user. Knowledge of this structure can be used in different contexts. Experiments that use the technique to determine the composition of web pages automatically have been discussed at numerous conferences. The experimental evaluation compares this technique with other selected techniques. The evaluation results confirm that web design patterns can play a key role in the analysis of web page composition and content.

P Nithya et al ,Int.J.Computer Technology & Applications,Vol 3 (4), 1625-1629

IJCTA | July-August 2012 Available [email protected]

ISSN:2229-6093

III. PROBLEMS AND DIRECTIONS

The most important difficulty with Web mining in general, and WUM in particular, is the nature of the data involved. Apart from its sheer volume, the data is not fully structured. It is in a semi-structured form, so it needs several preprocessing steps before the essential information can be extracted. Further research is needed on preprocessing the data and on the following problems.

Reducing the Paths of High-Visit Pages: Pages that are frequently visited by users tend to be reached along a particular path. These pages can be placed in an easily accessible branch of the Web site, thereby reducing the navigation path length.

Eliminating or Integrating Low-Visit Pages: Pages that are not regularly visited by users can either be eliminated or have their content merged into frequently accessed pages.

Redesigning Pages to Facilitate User Navigation: To help users browse the Web site in the best possible way, the information acquired can be used to redesign the structure of the Web site.
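The first two directions above depend on knowing which pages are visited often and which are not. The following sketch shows one simple way to label pages from session data, assuming that a page's popularity is measured as the share of sessions that visit it at least once. The thresholds `high_share` and `low_share`, the function name `classify_pages`, and the sample sessions are hypothetical choices, not values from the literature.

```python
from collections import Counter


def classify_pages(sessions, high_share=0.5, low_share=0.25):
    """Return (high_visit_pages, low_visit_pages), each sorted by URL.

    A page's share is the fraction of sessions in which it appears."""
    visits = Counter()
    for pages in sessions:
        # Count each page at most once per session.
        visits.update(set(pages))
    total = len(sessions)
    high = sorted(p for p, c in visits.items() if c / total >= high_share)
    low = sorted(p for p, c in visits.items() if c / total <= low_share)
    return high, low


# Hypothetical sessions for illustration only.
SITE_SESSIONS = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/faq"],
    ["/products", "/cart"],
    ["/home", "/products", "/cart"],
]

if __name__ == "__main__":
    high, low = classify_pages(SITE_SESSIONS)
    print("high-visit pages:", high)
    print("low-visit pages:", low)
```

Pages in the first list are candidates for a shorter navigation path, and pages in the second list are candidates for removal or merging, subject to the site owner's judgment.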

IV. CONCLUSION

The increasing popularity of the Web has drawn considerable attention to Web mining technology. A vital research area in Web mining is Web usage mining, which focuses mainly on the discovery of patterns in the browsing and navigation data of Web users. WUM has become a promising technology for understanding the behavior of users on the Web.

Different researchers have proposed several techniques for Web usage mining. This paper discussed various techniques available for Web usage mining, focusing on its three vital steps: preprocessing, pattern discovery and pattern analysis. Better cluster recovery yields more accurate prediction of a Web user’s future visits, provided the user’s cluster can be determined exactly.

REFERENCES

[1] Dr. G. K. Gupta, “Introduction to Data Mining with Case Studies”, PHI Publication, 2005.

[2] Jaideep Srivastava, Robert Cooley, Mukund Deshpande, Pang-Ning Tan, “Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data”, SIGKDD Explorations, Vol. 1, No. 2, Pp. 12-23, 2000.

[3] Adel T. Rahmani and B. Hoda Helmi, “EIN-WUM an AIS-based Algorithm for Web Usage Mining”, Proceedings of GECCO’08, Atlanta, Georgia, USA, ACM 978-1-60558-130-9/08/07, Pp. 291-292, 2008.

[4] Shailey Minocha, Nicola Millard, Lisa Dawson, “Integrating Customer Relationship Management Strategies in (B2C) E-Commerce Environments”, IFIP Conference on Human-Computer Interaction - INTERACT, 2003.

[5] C. Ramya, G. Kavitha, K. S. Shreedhara, “Preprocessing: A Prerequisite for Discovering Patterns in Web Usage Mining Process”, Computing Research Repository - CORR, Vol. abs/1105.0, 2011.

[6] V. Chitraa, Antony Selvdoss Davamani, “A Survey on Preprocessing Methods for Web Usage Data”, Computing Research Repository - CORR, Vol. abs/1004.1, 2010.

[7] Nizar R. Mabroukeh, Christie I. Ezeife, “A taxonomy of sequential pattern mining algorithms”, ACM Computing Surveys - CSUR, Vol. 43, No. 1, Pp. 1-41, 2010.

[8] Francesco Moscato, Nicola Mazzocca, Valeria Vittorini, Giusy Di Lorenzo, Paola Mosca, Massimo Magaldi, “Workflow Pattern Analysis in Web Services”, High Performance Computing and Communications - HPCC, Pp. 395-400, 2005.

[9] Hussain, T.; Asghar, S.; Masood, N.; “Web usage mining: A survey on preprocessing of web log file”, International Conference on Information and Emerging Technologies (ICIET), Pp. 1-6, 2010.

[10] Tanasa, D.; Trousse, B.; “Advanced data preprocessing for intersites Web usage mining”, IEEE Intelligent Systems, Vol. 19, No. 2, Pp. 59-65, 2004.

[11] Othman, Z.A.; Abu Bakar, A.; Hamdan, A.R.; Omar, K.; Shuib, N.L.M.; “Agent based preprocessing”, International Conference on Intelligent and Advanced Systems (ICIAS), Pp. 219-223, 2007.

[12] Khasawneh, N.; Chien-Chung Chan; “Active User-Based and Ontology-Based Web Log Data Preprocessing for Web Usage Mining”, IEEE/WIC/ACM International Conference on Web Intelligence, Pp. 325-328, 2006.

[13] Tanasa, D.; Trousse, B.; “Data preprocessing for WUM”, IEEE Potentials, Vol. 23, No. 3, Pp. 22-25, 2004.

[14] Xidong Wang; Yiming Ouyang; Xuegang Hu; Yan Zhang; “Discovery of user frequent access patterns on Web usage mining”, The Proceedings 8th International Conference on Computer Supported Cooperative Work in Design, Vol. 1, Pp. 765-769, 2004.

[15] Qianhui Althea LIANG; Jen-Yao CHUNG; Steven MILLER; Yang OUYANG; “Service Pattern Discovery of Web Service Mining in Web Service Registry-Repository”, IEEE International Conference on e-Business Engineering (ICEBE '06), Pp. 286-293, 2006.

[16] Etminani, K.; Delui, A.R.; Yanehsari, N.R.; Rouhani, M.; “Web usage mining: Discovery of the users' navigational patterns using SOM”, First International Conference on Networked Digital Technologies (NDT '09), Pp. 224-249, 2009.

[17] Nina, S.P.; Rahman, M.; Bhuiyan, K.I.; Ahmed, K.; “Pattern Discovery of Web Usage Mining”, International Conference on Computer Technology and Development (ICCTD '09), Vol. 1, Pp. 499-503, 2009.

[18] Cooley, R.; Mobasher, B.; Srivastava, J.; “Web mining: information and pattern discovery on the World Wide Web”, Proceedings Ninth IEEE International Conference on Tools with Artificial Intelligence, Pp. 558-567, 1997.

[19] Klos, K.; Kocibova, J.; Lehecka, O.; Kudelka, M.; Snasel, V.; “Web Page Analysis: Experiments Based on Web Patterns”, 4th International Conference on Innovations in Information Technology (IIT '07), Pp. 16-20, 2007.


[20] De Lucia, A.; Francese, R.; Scanniello, G.; Tortora, G.; “Reengineering Web applications based on cloned pattern analysis”, Proceedings 12th IEEE International Workshop on Program Comprehension, Pp. 132-141, 2004.

[21] Kudelka, M.; Snasel, V.; Lehecka, O.; El-Qawasmeh, E.; “Semantic Analysis of Web Pages Using Web Patterns”, IEEE/WIC/ACM International Conference on Web Intelligence, Pp. 329-333, 2006.

[22] Raju, G.T.; Sudhamani, M.V.; “A novel approach for extraction of cluster patterns from Web Usage Data and its performance analysis”, International Conference on Emerging Trends in Electrical and Computer Technology (ICETECT), Pp. 718-723, 2011.

[23] Yanchun Zhang; Guandong Xu; “Using Web Clustering for Web Communities Mining and Analysis”, IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT '08), Vol. 1, Pp. 20-31, 2008.

[24] Kudelka, Milos; Snasel, Vaclav; Lehecka, Ondrej; El-Qawasmeh, Eyas; “Web content mining using web design patterns”, IEEE International Conference on Information Reuse and Integration (IRI), Pp. 232-237, 2008.

[25] Kudelka, Milos; Lehecka, Ondrej; Snasel, Vaclav; El-Qawasmeh, Eyas; “Web pages clustering based on web patterns”, 2nd International Conference on Digital Information Management (ICDIM '07), Vol. 2, Pp. 657-664, 2007.

[26] Rui Wu; “Clustering Web Access Patterns Based on Hybrid Approach”, Fifth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD '08), Vol. 1, Pp. 52-56, 2008.
