Shohreh Kazemi, kazemi@ce.aut.ac.ir, Intelligent Systems Laboratory (...

Post on 14-Dec-2015

Intelligent Systems Laboratory (http://ce.aut.ac.ir/islab)

Shohreh Kazemi, kazemi@ce.aut.ac.ir

Project Progress Report

Markov Model

Modeling and Predicting a User’s Browsing Behavior

Modeling and predicting a user’s browsing behavior on a Web site can be used to:

improve Web cache performance [1; 2; 3]

recommend related pages [4; 5]

improve search engines [6]

understand and influence buying patterns [7]

personalize the browsing experience [8]

Markov models

Markov models [9] have been used for studying and understanding stochastic processes

They have been shown to be well suited for modeling and predicting a user’s browsing behavior on a Web site.

Markov models

In general, the input for these problems is the sequence of Web pages accessed by a user

The goal is to build Markov models that can be used to predict the Web page that the user will most likely access next

Markov Models for Predicting Next-Accessed Page

The act of a user browsing a Web site is commonly modeled by observing the set of pages that he or she visits [10]

This set of pages is referred to as a Web session

$W = (P_1, P_2, \ldots, P_l)$

Markov Models for Predicting Next-Accessed Page

The next-page prediction problem can be solved using a probabilistic framework as follows:

Let $W$ be a user’s Web session of length $l$

Let $P(p_i \mid W)$ be the probability that the user visits page $p_i$ next

Then the page $p_{l+1}$ that the user will visit next is given by

$$p_{l+1} = \arg\max_{p_i} P(p_i \mid W) = \arg\max_{p_i} P(p_i \mid P_l, P_{l-1}, \ldots, P_1)$$

Markov Models for Predicting Next-Accessed Page

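Under the Markov assumption, the probability of visiting a page $p_i$ does not depend on all the pages in the Web session, but only on a small set of $k$ preceding pages, where $k \ll l$

Then we have:

$$p_{l+1} = \arg\max_{p_i} P(p_i \mid P_l, P_{l-1}, \ldots, P_{l-(k-1)})$$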

Markov Models for Predicting Next-Accessed Page

The number of preceding pages k that the next page depends on is called the order of the Markov model, and the resulting model M is called the kth-order Markov model
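As a worked instance of this definition, the first- and second-order cases can be written out explicitly. The last line is an assumption rather than something stated on the slides: it is the usual frequency-ratio estimate, where $\mathrm{Fr}(\cdot)$ denotes a count in the training set and $S$ is a state of $k$ consecutive pages.

```latex
\begin{align*}
k = 1:&\quad p_{l+1} = \arg\max_{p_i} P(p_i \mid P_l) \\
k = 2:&\quad p_{l+1} = \arg\max_{p_i} P(p_i \mid P_l, P_{l-1}) \\
\text{estimate}:&\quad P(p_i \mid S) \approx
  \frac{\mathrm{Fr}(S \text{ followed by } p_i)}{\mathrm{Fr}(S)}
\end{align*}
```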

Markov Models for Predicting Next-Accessed Page

Figure: the site map for a sample Web site as a directed graph (nodes P1, P2, P3, P4, P5)

Markov Models for Predicting Next-Accessed Page

A set of Web sessions that were generated on this Web site

Training set:

W1 : <P1 , P3 , P4>

W2 : <P1 , P2 , P3 , P5>

W3 : <P1 , P2>

W4 : <P1 , P3 , P4 , P3 , P1 , P2>

W5 : <P1 , P2 , P3 , P5 , P3>

W6 : <P1 , P3 , P1 , P2 , P1 , P3 , P4>

Test set (hidden page shown after each session):

Wt1 : <P1 , P2 , P3 , ?> <P5>

Wt2 : <P1 , P2 , P3 , P5 , P3 , ?> <P4>

Markov Models for Predicting Next-Accessed Page

the frequencies of different states for first-order Markov models

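1st-Order State | Fr. | P1 | P2 | P3 | P4 | P5

S(1,1) = <P1>   |  9  |  0 |  5 |  4 |  0 |  0

S(1,2) = <P2>   |  3  |  1 |  0 |  2 |  0 |  0

S(1,3) = <P3>   |  7  |  2 |  0 |  0 |  3 |  2

S(1,4) = <P4>   |  1  |  0 |  0 |  1 |  0 |  0

S(1,5) = <P5>   |  1  |  0 |  0 |  1 |  0 |  0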
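The first-order frequencies can be recomputed directly from the training sessions. The sketch below assumes that page Pn is encoded as the integer n and that an occurrence of a state is counted only when another page follows it; the name `first_order_counts` is illustrative and does not come from the slides.

```python
from collections import Counter, defaultdict

# Training sessions W1..W6; page Pn is written as the integer n (an assumption).
TRAINING = [
    [1, 3, 4],
    [1, 2, 3, 5],
    [1, 2],
    [1, 3, 4, 3, 1, 2],
    [1, 2, 3, 5, 3],
    [1, 3, 1, 2, 1, 3, 4],
]

def first_order_counts(sessions):
    """Map each page to a Counter of the pages that immediately follow it."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    return counts

counts = first_order_counts(TRAINING)
for page in sorted(counts):
    row = [counts[page][nxt] for nxt in range(1, 6)]
    print(f"<P{page}>  Fr. = {sum(row)}  next-page counts P1..P5 = {row}")
```

Under these assumptions the row for <P1> has frequency 9 with next-page counts 0, 5, 4, 0, 0, and the row for <P3> has frequency 7 with counts 2, 0, 0, 3, 2.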

Markov Models for Predicting Next-Accessed Page

the frequencies of different states for second-order Markov models
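Second-order frequencies can be derived from the training sessions in the same way, using pairs of consecutive pages as states. This is a minimal sketch under two assumptions: pages are encoded as integers, and a state is counted only when a further page follows it. The helper name `kth_order_counts` is hypothetical.

```python
from collections import Counter, defaultdict

# Training sessions W1..W6; page Pn is written as the integer n (an assumption).
TRAINING = [
    [1, 3, 4],
    [1, 2, 3, 5],
    [1, 2],
    [1, 3, 4, 3, 1, 2],
    [1, 2, 3, 5, 3],
    [1, 3, 1, 2, 1, 3, 4],
]

def kth_order_counts(sessions, k):
    """Map each run of k consecutive pages to a Counter of the pages that follow it."""
    counts = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - k):
            state = tuple(session[i:i + k])
            counts[state][session[i + k]] += 1
    return counts

second = kth_order_counts(TRAINING, 2)
for state in sorted(second):
    label = ", ".join(f"P{p}" for p in state)
    followers = {f"P{p}": n for p, n in sorted(second[state].items())}
    print(f"<{label}>  Fr. = {sum(second[state].values())}  followed by {followers}")
```

With this counting convention the training set yields eight second-order states; for example, <P2, P3> occurs twice and is followed by P5 both times.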

Markov Models for Predicting Next-Accessed Page

How these models are used to predict the most probable page for Web session Wt1
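The prediction step for Wt1 can be sketched as follows: take the last k visited pages as the state and return the page that most often followed that state in training. The functions `train` and `predict` and the integer page encoding are assumptions made for this illustration.

```python
from collections import Counter, defaultdict

# Training sessions W1..W6; page Pn is written as the integer n (an assumption).
TRAINING = [
    [1, 3, 4],
    [1, 2, 3, 5],
    [1, 2],
    [1, 3, 4, 3, 1, 2],
    [1, 2, 3, 5, 3],
    [1, 3, 1, 2, 1, 3, 4],
]

def train(sessions, k):
    """Count, for every run of k pages, how often each page follows it."""
    model = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - k):
            model[tuple(session[i:i + k])][session[i + k]] += 1
    return model

def predict(model, k, visited):
    """Return the most frequent successor of the last k pages, or None if unseen."""
    state = tuple(visited[-k:])
    if len(visited) < k or state not in model:
        return None
    return model[state].most_common(1)[0][0]

first, second = train(TRAINING, 1), train(TRAINING, 2)
wt1 = [1, 2, 3]  # visible part of Wt1; the hidden page is P5
print("first-order prediction: ", predict(first, 1, wt1))
print("second-order prediction:", predict(second, 2, wt1))
```

With these counts the first-order model picks P4 (3 of the 7 occurrences of <P3>), while the second-order model picks P5, the hidden page, because <P2, P3> was followed by P5 both times it appeared.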

Performance Measures for Markov Models

The first is the accuracy of the model: the ratio of the number of Web sessions for which the model correctly predicts the hidden page to the total number of Web sessions in the test set

The second is the number of states of the model: the total number of states for which the Markov model has made estimates

The third is the coverage of the model: the ratio of the number of Web sessions whose state required for making a prediction was found in the model to the total number of Web sessions in the test set
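The three measures can be computed as in the sketch below, which uses the training and test sessions from the earlier slides. The function names, the integer page encoding, and the convention that an uncovered session counts as an incorrect prediction are assumptions of this example.

```python
from collections import Counter, defaultdict

# Training sessions W1..W6; page Pn is written as the integer n (an assumption).
TRAINING = [
    [1, 3, 4],
    [1, 2, 3, 5],
    [1, 2],
    [1, 3, 4, 3, 1, 2],
    [1, 2, 3, 5, 3],
    [1, 3, 1, 2, 1, 3, 4],
]
# Test sessions as (visible pages, hidden page).
TEST = [([1, 2, 3], 5), ([1, 2, 3, 5, 3], 4)]

def train(sessions, k):
    """Count, for every run of k pages, how often each page follows it."""
    model = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - k):
            model[tuple(session[i:i + k])][session[i + k]] += 1
    return model

def evaluate(model, k, test):
    """Return accuracy, number of states, and coverage of a kth-order model."""
    covered = correct = 0
    for visible, hidden in test:
        state = tuple(visible[-k:])
        if len(visible) < k or state not in model:
            continue  # the required state is missing: no prediction is made
        covered += 1
        if model[state].most_common(1)[0][0] == hidden:
            correct += 1
    return {
        "accuracy": correct / len(test),
        "states": len(model),
        "coverage": covered / len(test),
    }

for k in (1, 2):
    print(k, evaluate(train(TRAINING, k), k, TEST))
```

On this tiny example the first-order model covers both test sessions but predicts only one correctly, while the second-order model predicts Wt1 correctly and cannot make a prediction for the second test session.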

Lower-order Markov models

Lower-order Markov models (first or second order) are not successful in accurately predicting the next page to be accessed by the user, because these models do not look far into the past

Higher-order Markov models

In order to obtain better predictions, higher-order models must be used

However, these higher-order models have a number of limitations:

(i) high state-space complexity

(ii) reduced coverage

(iii) sometimes even worse accuracy due to the lower coverage
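To make the state-space limitation concrete, the sketch below counts the distinct states observed in the training sessions for each order and compares them with the upper bound of n^k possible states for n pages. The helper name `observed_states` and the integer page encoding are assumptions; the training set is so small that the growth is modest here.

```python
# Training sessions W1..W6; page Pn is written as the integer n (an assumption).
TRAINING = [
    [1, 3, 4],
    [1, 2, 3, 5],
    [1, 2],
    [1, 3, 4, 3, 1, 2],
    [1, 2, 3, 5, 3],
    [1, 3, 1, 2, 1, 3, 4],
]

def observed_states(sessions, k):
    """Distinct runs of k pages that are followed by at least one more page."""
    return {
        tuple(session[i:i + k])
        for session in sessions
        for i in range(len(session) - k)
    }

num_pages = len({page for session in TRAINING for page in session})
for k in (1, 2, 3):
    seen = len(observed_states(TRAINING, k))
    print(f"order {k}: {seen} observed states, at most {num_pages ** k} possible")
```

Each added order multiplies the number of possible states by the number of pages, while the same training data is spread over more states, so individual states are observed less often.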

Comparing accuracy, coverage and model size with the order of Markov model

All-Kth-Order Markov model

One method to overcome the coverage problem is to train varying-order Markov models and then combine them for prediction [8]

For each test instance, the highest-order Markov model that covers the instance is used for prediction

This scheme is called the All-Kth-Order Markov model

However, it makes the model-size problem worse, because the states of all the different-order models must be stored
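The scheme can be sketched as a fallback loop: try the highest order first and drop to lower orders until a model contains the required state. The function names and the integer page encoding are assumptions of this example, which reuses the training sessions from the earlier slides.

```python
from collections import Counter, defaultdict

# Training sessions W1..W6; page Pn is written as the integer n (an assumption).
TRAINING = [
    [1, 3, 4],
    [1, 2, 3, 5],
    [1, 2],
    [1, 3, 4, 3, 1, 2],
    [1, 2, 3, 5, 3],
    [1, 3, 1, 2, 1, 3, 4],
]

def train(sessions, k):
    """Count, for every run of k pages, how often each page follows it."""
    model = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - k):
            model[tuple(session[i:i + k])][session[i + k]] += 1
    return model

def train_all_kth(sessions, max_order):
    """Train one model per order 1..max_order."""
    return {k: train(sessions, k) for k in range(1, max_order + 1)}

def predict_all_kth(models, visited):
    """Use the highest-order model that covers the session; return (order, page)."""
    for k in sorted(models, reverse=True):
        state = tuple(visited[-k:])
        if len(visited) >= k and state in models[k]:
            return k, models[k][state].most_common(1)[0][0]
    return None

models = train_all_kth(TRAINING, 2)
print(predict_all_kth(models, [1, 2, 3]))        # visible part of Wt1
print(predict_all_kth(models, [1, 2, 3, 5, 3]))  # visible part of the second test session
print("total states:", sum(len(m) for m in models.values()))
```

For the second test session the second-order state <P5, P3> is missing, so the prediction falls back to the first-order model. The combined model stores the states of both orders, which illustrates the increase in model size.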

Some techniques have been developed to intelligently combine different-order Markov models

The resulting model:

Has low state complexity

Retains the coverage of the All-Kth-Order Markov model

Achieves comparable accuracies

Frequency based

These techniques are based on the observation that states occurring with low frequency in the training set also tend to have low prediction accuracies

These low frequency states can be eliminated without affecting the accuracy of the resulting model

Frequency based

The amount of pruning is controlled by the parameter Φ referred to as the frequency threshold

Note that a state is never pruned from the first-order Markov model, so that the coverage of the original model is not reduced
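Frequency-based pruning can be sketched as a filter over the states of each order. The example assumes that a state is kept when its frequency is at least the threshold Φ (the slides do not say whether the comparison is strict), that pages are encoded as integers, and that `frequency_prune` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

# Training sessions W1..W6; page Pn is written as the integer n (an assumption).
TRAINING = [
    [1, 3, 4],
    [1, 2, 3, 5],
    [1, 2],
    [1, 3, 4, 3, 1, 2],
    [1, 2, 3, 5, 3],
    [1, 3, 1, 2, 1, 3, 4],
]

def train(sessions, k):
    """Count, for every run of k pages, how often each page follows it."""
    model = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - k):
            model[tuple(session[i:i + k])][session[i + k]] += 1
    return model

def frequency_prune(models, threshold):
    """Drop higher-order states seen fewer than `threshold` times (assumed rule)."""
    pruned = {}
    for k, model in models.items():
        if k == 1:
            pruned[k] = dict(model)  # first-order states are never pruned
        else:
            pruned[k] = {
                state: followers
                for state, followers in model.items()
                if sum(followers.values()) >= threshold
            }
    return pruned

models = {k: train(TRAINING, k) for k in (1, 2)}
pruned = frequency_prune(models, threshold=2)
print({k: len(m) for k, m in models.items()}, "->", {k: len(m) for k, m in pruned.items()})
```

With a threshold of 2, half of the second-order states in this example are removed while all five first-order states remain, so every session that was covered before pruning is still covered.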

Frequency based

Frequency threshold | Accuracy | # states

 0 | 30.24 | 126464

 2 | 30.68 | 44528

 4 | 31.32 | 20914

 6 | 31.56 | 14164

 8 | 31.65 | 10899

10 | 31.71 | 8952

12 | 31.74 | 7661

14 | 31.73 | 6716

16 | 31.72 | 5969

18 | 31.72 | 5389

20 | 31.72 | 4965

22 | 31.67 | 4609

24 | 31.67 | 4296

Error based

The final predictions are computed by using only the states of the model that have the smallest estimated error rate

the error associated with each state is estimated by a validation step

A higher-order state is pruned by comparing its error rate with the error rate of its lower-order states
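The validation step can be sketched as follows: each state predicts its maximum-frequency page, and its error rate is the fraction of its occurrences in the validation sessions for which that prediction is wrong. The validation sessions below are hypothetical, as are the function names and the integer page encoding.

```python
from collections import Counter, defaultdict

# Training sessions W1..W6; page Pn is written as the integer n (an assumption).
TRAINING = [
    [1, 3, 4],
    [1, 2, 3, 5],
    [1, 2],
    [1, 3, 4, 3, 1, 2],
    [1, 2, 3, 5, 3],
    [1, 3, 1, 2, 1, 3, 4],
]
# Hypothetical validation sessions, invented for this illustration.
VALIDATION = [[1, 2, 3, 5], [1, 3, 4], [1, 2, 3, 4]]

def train(sessions, k):
    """Count, for every run of k pages, how often each page follows it."""
    model = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - k):
            model[tuple(session[i:i + k])][session[i + k]] += 1
    return model

def error_rates(model, k, validation):
    """Per-state fraction of validation occurrences that are predicted wrongly."""
    seen, wrong = Counter(), Counter()
    for session in validation:
        for i in range(len(session) - k):
            state = tuple(session[i:i + k])
            if state not in model:
                continue
            seen[state] += 1
            if model[state].most_common(1)[0][0] != session[i + k]:
                wrong[state] += 1
    return {state: wrong[state] / seen[state] for state in seen}

rates = error_rates(train(TRAINING, 1), 1, VALIDATION)
for state, rate in sorted(rates.items()):
    print(state, round(rate, 2))
```

States that never occur in the validation sessions receive no estimate in this sketch; how such states should be treated is left open here.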

Error based

For example, to prune the state S(3,q) = <Pi , Pj , Pk>, its error rate is compared with the error rates of the states S(2,r) = <Pj , Pk> and S(1,s) = <Pk>; the state S(3,q) is pruned if its error rate is higher than any of them.
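The pruning rule in this example can be sketched as a comparison between a state and its suffixes. The error rates below are hypothetical numbers chosen only for illustration, and "higher than any of them" is read here as "higher than at least one lower-order state", which is an interpretation rather than something the slides spell out.

```python
# Hypothetical validation error rates per state; pages are encoded as integers.
ERROR_RATE = {
    (5,): 0.40,
    (3, 5): 0.30,
    (4, 5): 0.50,
    (1, 3, 5): 0.35,
    (2, 3, 5): 0.20,
    (2, 4, 5): 0.45,
}

def lower_order_states(state):
    """Proper suffixes of a state, e.g. (Pi, Pj, Pk) -> (Pj, Pk) and (Pk,)."""
    return [state[i:] for i in range(1, len(state))]

def error_prune(error_rate):
    """Keep a state unless some lower-order state has a smaller error rate."""
    kept = {}
    for state, err in error_rate.items():
        lower = [error_rate[s] for s in lower_order_states(state) if s in error_rate]
        if lower and err > min(lower):
            continue  # a lower-order state predicts better: prune this one
        kept[state] = err
    return kept

kept = error_prune(ERROR_RATE)
print(sorted(kept, key=len))
```

First-order states have no lower-order states to compare against, so they are always kept in this sketch, which preserves coverage.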

Training and validating Web sessions

Various order Markov states with their maximum frequency page

Error rates for Markov states

Figure: third-order states <P1,P3,P5>, <P2,P4,P5>, <P2,P3,P5>, their second-order states <P3,P5>, <P4,P5>, and the first-order state <P5>

References

[1] SCHECHTER, S., KRISHNAN, M., AND SMITH, M. D. 1998. Using path profiles to predict HTTP requests. In 7th International World Wide Web Conference.

[2] BESTAVROS, A. 1995. Using speculation to reduce server load and service time on the WWW. In Proceedings of the 4th ACM International Conference on Information and Knowledge Management. ACM Press.

[3] PADMANABHAN, V. AND MOGUL, J. 1996. Using predictive prefetching to improve World Wide Web latency. Comput. Commun. Rev.

[4] DEAN, J. AND HENZINGER, M. R. 1999. Finding related pages in the World Wide Web. In Proceedings of the 8th International World Wide Web Conference.

[5] PIROLLI, P., PITKOW, J., AND RAO, R. 1996. Silk from a sow’s ear: Extracting usable structures from the web. In Proceedings of ACM Conference on Human Factors in Computing Systems (CHI-96).

[6] BRIN, S. AND PAGE, L. 1998. The anatomy of a large-scale hypertextual Web search engine. In Proceedings of the 7th International World Wide Web Conference.

[7] CHI, E., PITKOW, J., MACKINLAY, J., PIROLLI, P., GOSSWEILER, R., AND CARD, S. 1998. Visualizing the evolution of web ecologies. In Proceedings of ACM Conference on Human Factors in Computing Systems (CHI 98).

[8] PITKOW, J. AND PIROLLI, P. 1999. Mining longest repeating subsequence to predict world wide web surfing. In 2nd USENIX Symposium on Internet Technologies and Systems. Boulder, CO.

[9] PAPOULIS, A. 1991. Probability, Random Variables, and Stochastic Processes. McGraw Hill.

[10] SRIVASTAVA, J., COOLEY, R., DESHPANDE, M., AND TAN, P.-N. 2000. Web usage mining: Discovery and applications of usage patterns from web data. SIGKDD Explor. 1, 2.
