evolving dynamic web pages using web mining

Post on 31-Dec-2015



Evolving dynamic web pages using web mining

Kartik Menon
Smart Engineering Systems Laboratory
Engineering Management Department
University of Missouri-Rolla

Overview

• Goal
• Web Mining
• General Principle behind web mining
• Web Data
• Web Access Pattern Clustering
• Evolving web pages using cluster information
• Clustering Techniques
• Fuzzy C means
• Experimental Set-up
• Results
• Conclusion and Future work
• Questions

Goal

Cluster similar web access traversal patterns, train the system to understand the needs and demands of the different users accessing the website, and use this information to evolve the web pages.

Web Mining

• Web mining: learning about the different users accessing a web page
• The needs and requirements of the user
• Web access traversal patterns
• Links which are more popular than others
• For example, www.yahoo.com
» Emails
» Search engine
» News
» Greeting cards

General Principle behind web mining

• Gather web data from Web Log servers
• Cluster web traversal patterns
• Evolve web pages

Web Data

• What information is important for mining
– Links traversed (URLs requested)
– Documents downloaded
– Time spent on the web page as compared to total time spent
– Web traffic
– GET or POST messages
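As a minimal sketch of how such fields might be pulled out of a server log, the snippet below parses one line in the Common Log Format. The log format is an assumption (the slides do not say how the web log files are laid out), and the example entry and the name `parse_log_line` are made up for illustration.

```python
import re
from typing import Optional

# Common Log Format: host ident user [time] "METHOD url PROTOCOL" status bytes
# ASSUMPTION: the slides do not state the log layout; CLF is used as an example.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_log_line(line: str) -> Optional[dict]:
    """Return the fields of interest for mining, or None if the line is malformed."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # CLF writes "-" when no bytes were sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Hypothetical log entry, invented for this example.
example = '192.0.2.10 - - [31/Dec/2015:10:15:32 -0600] "GET /students HTTP/1.0" 200 5120'
record = parse_log_line(example)
```

The requested URL and the GET or POST method come straight from the request field. Time spent on a page is not logged directly and would have to be estimated, for instance from the gap between consecutive timestamps of the same visitor.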

Web Access Pattern Clustering

• Find users with similar web access patterns
• Grouping and separating users
• Concise representation of a system's behavior
• Generalize about user needs and interests

Evolving Web Pages using cluster information

• The cluster information can be used
– To know about users
– Modify the web page
– Web personalization
– Evolving Web pages

Clustering Techniques

• Neural Nets
– Kohonen’s Self Organizing Maps (SOMs)
• Statistical
– K-Means
• Fuzzy Logic
– Fuzzy C Means
– Fuzzy ISODATA

Fuzzy C Means

• Fuzzy c-means is a data clustering technique in which each data point belongs to a cluster to a degree specified by a membership function
• Let
– X be a set of n data sample vectors
– U be a partition of X into c parts
– V be the set of cluster centers
– d^2 be an inner-product-induced norm
– u_ik be the grade of membership of x_k in cluster i, between 0 and 1
– m be a parameter that increases or decreases the fuzziness

Fuzzy C Means (contd)

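Objective function:

$$J_m(U, V) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^m \, d^2(x_k, v_i)$$

Membership update:

$$u_{ik} = \frac{1}{\sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{jk}} \right)^{\frac{2}{m-1}}}$$

Distance:

$$d_{ik}^2 = \| x_k - v_i \|^2$$

Cluster center update:

$$v_i = \frac{\sum_{k=1}^{n} (u_{ik})^m \, x_k}{\sum_{k=1}^{n} (u_{ik})^m}$$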
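The alternating center and membership updates of fuzzy c-means can be sketched in plain Python for scalar data points. This is an illustrative sketch and not the implementation used in the experiment: the random initialisation, the stopping tolerance `tol`, the default m = 2 and the sample data are all assumptions.

```python
import random
from typing import List, Tuple


def fuzzy_c_means(
    xs: List[float], c: int, m: float = 2.0,
    max_iter: int = 100, tol: float = 1e-6, seed: int = 0,
) -> Tuple[List[float], List[List[float]]]:
    """Cluster scalar data; returns (centers v, memberships u[i][k])."""
    rng = random.Random(seed)
    n = len(xs)
    # Random initial memberships, normalised so each point's grades sum to 1.
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        total = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= total
    centers = [0.0] * c
    for _ in range(max_iter):
        # Center update: weighted mean of the data with weights (u_ik)^m.
        for i in range(c):
            weights = [u[i][k] ** m for k in range(n)]
            centers[i] = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        # Membership update from the distances d_ik = |x_k - v_i|.
        change = 0.0
        for k in range(n):
            dists = [abs(xs[k] - centers[i]) for i in range(c)]
            if min(dists) == 0.0:
                # The point sits on a center: split full membership among such centers.
                hits = [i for i in range(c) if dists[i] == 0.0]
                new = [1.0 / len(hits) if i in hits else 0.0 for i in range(c)]
            else:
                new = [
                    1.0 / sum((dists[i] / dists[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                    for i in range(c)
                ]
            for i in range(c):
                change = max(change, abs(new[i] - u[i][k]))
                u[i][k] = new[i]
        if change < tol:  # memberships have stopped moving
            break
    return centers, u


# Made-up scalar data with two visible groups.
data = [0.10, 0.12, 0.11, 0.90, 0.88, 0.92]
centers, u = fuzzy_c_means(data, c=2)
```

Each column of `u` sums to 1, so a point near the boundary between two groups keeps a meaningful grade in both clusters instead of being forced into one.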

Experimental Set-up

• Target the website http://campus.umr.edu.
• Mine the web log files for web data.
• The main problem is to convert the web sites accessed into numeric values.
• Identify all the URLs that can be reached from this web page.
• Number these URLs from 1 to N, where URL N is the last URL that can be accessed.
• Assign a fuzzy weight w(j) to each URL that can be accessed.
• Define a Boolean variable s(j), which is set to 1 if the jth URL is accessed by the user; otherwise s(j) is set to null.

Experimental Set-up (contd.)

• Define the data point x as the number derived from the weights w(j) and indicators s(j) of all the sites accessed by the user in that particular user session.

• Apply fuzzy c-means, calculating the Euclidean distance between a data sample and a cluster center as d_ij = |x_j - c_i|, where x_j is the data point and c_i is the center of cluster i.
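The encoding of a session can be sketched as follows. The weights are taken from the site diagram in these slides, but the way they are combined is an ASSUMPTION: the slides do not show the exact formula for x, so the sketch simply sums w(j) * s(j) over the URLs. The function names are hypothetical, and 0 stands in for "null".

```python
from typing import Dict, Iterable, List

# Fuzzy weights w(j) for a few URLs, copied from the site diagram in the slides.
WEIGHTS: Dict[str, float] = {
    "/students": 0.1,
    "/registrar": 0.11,
    "/registrar/star": 0.111,
    "/departments": 0.13,
}
# The position of a URL in this list plays the role of its number j (1..N).
URLS: List[str] = list(WEIGHTS)


def indicator(session: Iterable[str]) -> List[int]:
    """s(j) = 1 if the j-th URL was accessed in the session, else 0 (for 'null')."""
    visited = set(session)
    return [1 if url in visited else 0 for url in URLS]


def data_point(session: Iterable[str]) -> float:
    """ASSUMPTION: x is the sum of w(j) * s(j); the slides omit the exact formula."""
    s = indicator(session)
    return sum(WEIGHTS[url] * s_j for url, s_j in zip(URLS, s))


def distance(x: float, center: float) -> float:
    """d = |x - c|, the one-dimensional Euclidean distance from the slides."""
    return abs(x - center)


session = ["/students", "/registrar", "/registrar/star"]
s = indicator(session)
x = data_point(session)
```

Under this assumed encoding, sessions that follow the same branch of the site land close together on the number line, which is what lets a scalar distance separate them.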

Site hierarchy, with the fuzzy weight of each URL in parentheses:

http://campus.umr.edu (0)
  /parents (0.3)
  /community (0.5)
  /faculty (0.4)
  /staff (0.2)
  /students (0.1)
    /registrar (0.11)
      /registrar/star (0.111)
      /registrar/courseinfo (0.112)
    www.umr.edu/~career (0.120)
      /fairs (0.121)
      /jobtrack/* (0.122)
    /departments (0.13)
      /academic.html#art_science (0.131)
      /academic.html#engineering (0.132)

IP Address | URLs accessed by the user
131.151.9.999 | http://campus.umr.edu, /students, /departments, /departments/academic.html#arts_science
181.147.7.970 | http://campus.umr.edu, /students, /registrar, /registrar/star
181.147.7.972 | http://campus.umr.edu, /students, http://web.umr.edu/~career, /jobtrak/*
181.148.7.979 | http://campus.umr.edu, /students, http://web.umr.edu/~career, /fairs
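A table like the one above can be built by grouping individual requests by IP address. The sketch below assumes that one IP address corresponds to one user session, which mirrors the table but is a simplification. The request stream and the name `sessions_by_ip` are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def sessions_by_ip(requests: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (ip, url) pairs into one ordered URL list per IP address."""
    sessions: Dict[str, List[str]] = defaultdict(list)
    for ip, url in requests:
        sessions[ip].append(url)
    return dict(sessions)


# Hypothetical request stream; addresses come from the documentation range 192.0.2.0/24.
requests = [
    ("192.0.2.1", "/students"),
    ("192.0.2.2", "/students"),
    ("192.0.2.1", "/registrar"),
    ("192.0.2.1", "/registrar/star"),
    ("192.0.2.2", "/departments"),
]
sessions = sessions_by_ip(requests)
```

Keeping the URLs in request order preserves the traversal path, so the same structure can serve both the clustering step and any later path analysis.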

Results: For 2 and 3 clusters

[Figures: clustering results for 2 and 3 clusters]

Web Page Evolution

• Use the clustered information as an input to modify the web page, so that users with similar access patterns are served the same version of the page, which may differ from the version served to other groups

• Adjust the placement of links

• Remove certain links (if possible)
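One way the link adjustments listed above might look in code is sketched below: the links on a page are reordered by how often the sessions of one cluster visited them, and rarely visited links can be dropped. The function name, the `min_visits` threshold and the sample sessions are hypothetical; the slides do not specify how placement or removal is decided.

```python
from collections import Counter
from typing import List


def order_links_for_cluster(
    links: List[str], cluster_sessions: List[List[str]], min_visits: int = 0
) -> List[str]:
    """Sort a page's links by how many sessions in one cluster visited them.

    Links visited by fewer than min_visits sessions are dropped; min_visits
    is a hypothetical knob standing in for 'remove certain links'.
    """
    # Count each URL at most once per session.
    visits = Counter(url for session in cluster_sessions for url in set(session))
    kept = [link for link in links if visits[link] >= min_visits]
    # sorted() is stable, so ties keep the page's original link order.
    return sorted(kept, key=lambda link: -visits[link])


# Top-level links of the target site, with made-up sessions for one cluster.
links = ["/parents", "/community", "/faculty", "/staff", "/students"]
cluster_sessions = [
    ["/students", "/registrar"],
    ["/students", "/faculty"],
    ["/students"],
]
reordered = order_links_for_cluster(links, cluster_sessions)
trimmed = order_links_for_cluster(links, cluster_sessions, min_visits=1)
```

Running this per cluster gives each group of users its own link order while all groups still share the same underlying set of pages.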

Conclusions

• Fuzzy c-means is an easy way of clustering similar web access patterns for different user sessions.
• The use of Euclidean distance was very helpful for learning more about these web access patterns.
• The experiment provided results and plots that were easy to obtain and highly interpretable.
• We observe that fuzzy c-means provided stable results for the different data sets we took.

Future Work

• Use other clustering algorithms and compare the results

• Developing self evolving web sites - sites that improve themselves by learning from user access patterns

• The results we obtained with the fuzzy clustering algorithms could be used to make recommendations to the webmaster of http://campus.umr.edu

• Increase the popularity of the web page by tailoring it more to the needs of the users accessing it

Questions?
