WEB PERSONALIZATION NLP Course Seminar Group 14 Vishaal Jatav (04d05013) Varun Garg (04d05015)


Page 1: WEB PERSONALIZATION

WEB PERSONALIZATION

NLP Course Seminar

Group 14
Vishaal Jatav (04d05013)

Varun Garg (04d05015)

Page 2: WEB PERSONALIZATION

Roadmap

Motivation
Introduction
The Personalization Process
Personalization Approaches
Personalization Techniques
Issues
Conclusion

Page 3: WEB PERSONALIZATION

Motivation

Some facts
  Overwhelming amount of information on the web
  Not all documents are relevant to the user
  Users cannot convey their information needs
  Users never find any document 100% relevant

Users expect more personal behavior
  "I don't want results for Delhi when I am in Bombay."
  "I was looking for crane (the bird), not crane (the machine)."

Page 4: WEB PERSONALIZATION

Google Customization

Page 5: WEB PERSONALIZATION

Google (without personalization)

Page 6: WEB PERSONALIZATION

Google (with personalization)

Page 7: WEB PERSONALIZATION

Google Search History

Page 8: WEB PERSONALIZATION

Google Search History

Page 9: WEB PERSONALIZATION

Introduction

Personalization
  React differently to different users
  The system reacts in the way each user wants it to
  Ultimately brings the user back to the system

Web Personalization
  Apply machine learning and data mining
  Build models of user behavior (called profiles)
  Predict the user's needs and expectations
  Adaptively estimate better models

Page 10: WEB PERSONALIZATION

The Personalization Process

Consider the following pieces of information
  Geographical location
  Age, gender, ethnicity, religion, etc.
  Interests
  Previous reviews of products
  ...

How could these pieces of information help?

How can this information be collected?

Page 11: WEB PERSONALIZATION

The Personalization Process (Contd.)

Collect lots of information on user behavior
  Information must be attributable to a single user

Decide on a user model
  Featuring user needs, lifestyle, situations, etc.

Create a user profile for each user of the system
  The profile captures the individuality of the user
  Habits, browsing behavior, lifestyle, etc.

With every interaction, modify the user profile

Page 12: WEB PERSONALIZATION

The Personalization Process More Formally

The Web is a collection of $n$ items $I = \{i_1, i_2, \ldots, i_n\}$

Users come from a set $U = \{u_1, u_2, \ldots, u_m\}$

Each user $u_k$ rates items through a function $r_{u_k} : I \to [0,1] \cup \{\bot\}$, where $r_{u_k}(i_j) = \bot$ means that $i_j$ is not rated by the user

$I_k^{(u)}$ is the set of items not yet rated by user $u_k$

$I_k^{(r)}$ is the set of items rated by user $u_k$

GOAL: recommend to user $u_a$ those items $i_j$ that are present in $I_a^{(u)}$ and might be of interest to that user
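The formal setup can be made concrete with a small sketch. The item names, user names, and rating values below are hypothetical toy data, and `None` stands in for the "not rated" symbol; this is only one possible encoding of the definitions on this slide.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy data: RATINGS[user][item] is a value in [0, 1],
# or None when the user has not rated the item.
ITEMS: List[str] = ["i1", "i2", "i3", "i4"]
RATINGS: Dict[str, Dict[str, Optional[float]]] = {
    "u1": {"i1": 0.9, "i2": None, "i3": 0.4, "i4": None},
    "u2": {"i1": 0.8, "i2": 0.7, "i3": None, "i4": 0.1},
}


def rated_items(user: str) -> Set[str]:
    """Items the user has already rated."""
    return {i for i in ITEMS if RATINGS[user].get(i) is not None}


def unrated_items(user: str) -> Set[str]:
    """Items the user has not rated yet: the pool to recommend from."""
    return set(ITEMS) - rated_items(user)


print(sorted(unrated_items("u1")))  # ['i2', 'i4']
```

Any recommender discussed later only has to rank the contents of `unrated_items(active_user)`.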

Page 13: WEB PERSONALIZATION

Classification of Personalization Approaches

Individual Vs Collaborative

Reactive Vs Proactive

User Vs Item Information

Page 14: WEB PERSONALIZATION

Classification of Personalization Approaches

Individual Vs Collaborative

Individual approach (Google Personalized Search)
  Uses only the individual user's data
  Generates the user profile by analyzing
    The user's browsing behavior
    The user's active feedback on the system
  Advantage
    Can be implemented on the client side, so no privacy violation
  Disadvantage
    Based only on past interactions, so it lacks serendipity

Page 15: WEB PERSONALIZATION

Classification of Personalization Approaches

Individual Vs Collaborative (Contd.)

Collaborative approach (Amazon recommendations)
  Find the neighborhood of the active user
  React according to an assumption
    If A is like B, then B likes the same things that A likes
  Disadvantages
    New item rating problem
    New user problem
  Advantage
    Better than the individual approach once these two problems are solved

Page 16: WEB PERSONALIZATION

Classification of Personalization Approaches

Reactive Vs Proactive

Reactive approach
  Explicitly ask the user for preferences
    Either in the form of a query or feedback

Proactive approach
  Learn user preferences from user behavior
    No explicit preference is demanded from the user
  Behavior is extracted from
    Click-through rates
    Navigational patterns

Page 17: WEB PERSONALIZATION

Classification of Personalization Approaches

User Vs Item Information

User information
  Geographic location (from IP address)
  Age, gender, marital status, etc. (explicit query)
  Lifestyle, etc. (inference from past behavior)

Item information
  Content of topics: movie genre, etc.
  Product/domain ontology

Page 18: WEB PERSONALIZATION

Personalization Techniques

Content-Based Filtering

Collaborative Filtering

Model-Based Personalization
  Rule based
  Graph theoretic
  Language model

Page 19: WEB PERSONALIZATION

Content-Based Filtering

Syskill and Webert use explicit feedback
  Individual, reactive, item information
  Uses naïve Bayes to distinguish likes from dislikes
  Initial probabilities are updated with new interactions
  Uses the 128 most informative words from each item

Letizia uses implicit feedback
  Individual, proactive, item information
  Finds likes/dislikes based on tf-idf similarity

Others use nearest-neighbor methods for similarity
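As an illustration of tf-idf similarity in content-based filtering, the sketch below scores candidate pages against one page the user liked. It is not Letizia's actual implementation: the page texts are hypothetical, and the idf variant (`log(n / df) + 1`) is just one common choice.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: tf-idf weight} mapping per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * (math.log(n / df[t]) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical data: one liked page and two candidate pages.
liked = "crane bird wetland habitat".split()
candidates = {
    "bird_page": "crane bird migration".split(),
    "machine_page": "crane machine construction site".split(),
}

vectors = tfidf_vectors([liked] + list(candidates.values()))
profile, candidate_vectors = vectors[0], vectors[1:]
scores = {name: cosine(profile, vec) for name, vec in zip(candidates, candidate_vectors)}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

The page about the bird shares more weighted terms with the liked page, so it ranks above the page about the machine.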

Page 20: WEB PERSONALIZATION

Collaborative Filtering

Found successful in recommendation systems

General technique
  For every user, a user neighborhood is computed
    The neighborhood contains users who have rated several items almost equally
  Get candidate items for recommendation
    Items seen by the neighborhood but not by the active user $u_a$
  Data is stored in the form of a rating matrix
    Items as rows and users as columns

Page 21: WEB PERSONALIZATION

Collaborative Filtering (Contd.)

The system must provide the following algorithms
  Measure similarity between users
    For creation of the neighborhood
    Pearson and Spearman correlation, cosine similarity, etc.
  Predict the rating of an item not yet rated by the user
    To decide the order in which these items will be presented
    Weighted sum of ratings: most common
  Select a neighborhood subset for prediction
    To reduce the large amount of computation
    Threshold on the similarity value: most common
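The three algorithms listed here can be sketched together. This is a minimal illustration under stated assumptions: the rating matrix is hypothetical toy data, similarity is Pearson correlation over co-rated items, the neighborhood is selected by a similarity threshold, and the prediction is a similarity-weighted average of neighbor ratings.

```python
import math
from typing import Dict, Optional, Set

# Hypothetical rating matrix: RATINGS[user][item] in [0, 1]; a missing key means "not rated".
RATINGS: Dict[str, Dict[str, float]] = {
    "ua": {"i1": 0.9, "i2": 0.8, "i3": 0.2},
    "u1": {"i1": 1.0, "i2": 0.7, "i3": 0.1, "i4": 0.9},
    "u2": {"i1": 0.2, "i2": 0.3, "i3": 0.9, "i4": 0.1},
    "u3": {"i1": 0.8, "i2": 0.9, "i3": 0.3, "i4": 0.8},
}


def pearson(a: str, b: str) -> float:
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(RATINGS[a]) & set(RATINGS[b]))
    if len(common) < 2:
        return 0.0
    xs = [RATINGS[a][i] for i in common]
    ys = [RATINGS[b][i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0


def neighborhood(active: str, threshold: float = 0.5) -> Dict[str, float]:
    """Users whose similarity to the active user meets the threshold."""
    result = {}
    for user in RATINGS:
        if user == active:
            continue
        sim = pearson(active, user)
        if sim >= threshold:
            result[user] = sim
    return result


def candidate_items(active: str, neighbors: Dict[str, float]) -> Set[str]:
    """Items seen by the neighborhood but not by the active user."""
    seen = set()
    for user in neighbors:
        seen |= set(RATINGS[user])
    return seen - set(RATINGS[active])


def predict(active: str, item: str, threshold: float = 0.5) -> Optional[float]:
    """Similarity-weighted average of the neighbors' ratings for the item."""
    raters = {u: s for u, s in neighborhood(active, threshold).items() if item in RATINGS[u]}
    total = sum(raters.values())
    if total == 0:
        return None
    return sum(s * RATINGS[u][item] for u, s in raters.items()) / total


neighbors = neighborhood("ua")
print(sorted(neighbors), candidate_items("ua", neighbors), predict("ua", "i4"))
```

User `u2` rates in the opposite direction to `ua`, so it falls below the threshold and does not influence the prediction for `i4`.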

Page 22: WEB PERSONALIZATION

Model Based Personalization Approaches

Executed in two stages
  Offline process: create the actual model
  Online process: use the model together with the current interaction

Common data used for model generation
  Web usage data (web history, click-through rates, etc.)
  Item structure and content data

Examples
  Rule-based models
  Graph-theoretic models
  Language models

Page 23: WEB PERSONALIZATION

Model Based Personalization

Rule Based Models

Association rule-based
  Item $i_a$ is in unordered association with $i_b$
  If the user considers $i_b$, then $i_a$ is a good recommendation

Sequence rule-based
  Item $i_a$ is in sequential association with $i_b$
  If the user considers $i_a$, then $i_b$ is a good recommendation

Associations between items can be stored as a dependency graph
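A rough sketch of the unordered (association rule) case follows. The sessions, the confidence measure, and the `min_conf` threshold are illustrative assumptions; the dependency graph is stored as a nested dictionary where `graph[a][b]` is the confidence of the rule "a implies b". A sequence rule variant would count ordered pairs within a session instead.

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, List

# Hypothetical browsing sessions (offline usage data).
SESSIONS: List[List[str]] = [
    ["home", "cameras", "lenses"],
    ["home", "cameras", "tripods"],
    ["cameras", "lenses"],
    ["home", "laptops"],
]


def association_graph(sessions: List[List[str]], min_conf: float = 0.5) -> Dict[str, Dict[str, float]]:
    """Offline stage: build graph[a][b] = confidence(a -> b) from co-occurrence counts."""
    item_count: Dict[str, int] = defaultdict(int)
    pair_count: Dict[tuple, int] = defaultdict(int)
    for session in sessions:
        seen = set(session)
        for item in seen:
            item_count[item] += 1
        for a, b in permutations(seen, 2):
            pair_count[(a, b)] += 1
    graph: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (a, b), count in pair_count.items():
        confidence = count / item_count[a]
        if confidence >= min_conf:
            graph[a][b] = confidence
    return graph


def recommend(graph: Dict[str, Dict[str, float]], item: str) -> List[str]:
    """Online stage: items associated with the one being considered, best first."""
    rules = graph.get(item, {})
    return sorted(rules, key=lambda other: -rules[other])


graph = association_graph(SESSIONS)
print(recommend(graph, "lenses"))  # ['cameras', 'home']
```

Every session containing `lenses` also contains `cameras` (confidence 1.0), while `home` co-occurs in half of them (confidence 0.5).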

Page 24: WEB PERSONALIZATION

Model Based Personalization

Graph Theoretic Model

Ratings data is transformed into a directed graph
  Nodes are users
  An edge between $u_i$ and $u_j$ means that $u_i$ predicts $u_j$
  Weights on edges represent the predictability

To predict whether an item $i_k$ will be of interest to $u_i$
  Calculate the shortest path from $u_i$ to any user $u_r$, where $u_r$ has rated $i_k$
  The predicted rating is calculated as a function of the path between $u_i$ and $u_r$
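The slide does not specify the path function, so the sketch below makes its own assumptions: edge weights are predictability values in (0, 1], the cost of an edge is the negative log of its weight (so the shortest path is the most reliable one), and the prediction is simply the rating of the nearest user who rated the item, returned together with the path reliability. The graph and ratings are hypothetical.

```python
import heapq
import math
from typing import Dict, Optional, Tuple

# Hypothetical directed graph: EDGES[ui][uj] = predictability weight in (0, 1].
EDGES: Dict[str, Dict[str, float]] = {
    "ua": {"ub": 0.9, "uc": 0.5},
    "ub": {"ud": 0.8},
    "uc": {"ud": 0.9},
}
# Hypothetical ratings: only ud has rated item ik.
RATINGS: Dict[str, Dict[str, float]] = {"ud": {"ik": 0.7}}


def predict(active: str, item: str) -> Optional[Tuple[float, float]]:
    """Dijkstra from the active user to the nearest user who rated the item.

    Returns (rating of that user, reliability of the path), or None if no
    rater is reachable. Reliability is the product of edge weights (assumed).
    """
    dist = {active: 0.0}
    heap = [(0.0, active)]
    while heap:
        d, user = heapq.heappop(heap)
        if d > dist.get(user, math.inf):
            continue  # stale queue entry
        if user != active and item in RATINGS.get(user, {}):
            return RATINGS[user][item], math.exp(-d)
        for neighbor, weight in EDGES.get(user, {}).items():
            nd = d - math.log(weight)
            if nd < dist.get(neighbor, math.inf):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return None


print(predict("ua", "ik"))
```

Here the path through `ub` (reliability 0.9 × 0.8) is preferred over the path through `uc` (0.5 × 0.9).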

Page 25: WEB PERSONALIZATION

Model Based Personalization

Language Modeling Approaches

Without using the user's relevance feedback
  Simple language modeling

Using the user's relevance feedback
  N-gram based methods
  Noisy channel model based method

Page 26: WEB PERSONALIZATION

Language Model Approach

Simple Language Modeling

Without using the user's feedback
  History consists of all the words in the past queries
  Learn the user profile as $\{(w_1, P(w_1)), \ldots, (w_n, P(w_n))\}$
  where [equation image not preserved]
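One plausible way to estimate the word probabilities in such a profile is relative frequency over all words in the past queries. That estimator is an assumption here, as are the example queries.

```python
from collections import Counter
from typing import Dict, List


def unigram_profile(past_queries: List[str]) -> Dict[str, float]:
    """Map each word in the query history to its relative frequency."""
    words = [w for query in past_queries for w in query.lower().split()]
    counts = Counter(words)
    total = len(words)
    return {w: c / total for w, c in counts.items()}


# Hypothetical query history for one user.
profile = unigram_profile(["crane bird", "crane habitat", "bird migration"])
print(profile["crane"])
```

With six words in the history and `crane` appearing twice, its profile probability is 2/6.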

Page 27: WEB PERSONALIZATION

Language Model Approach

Simple Language Modeling

Sample User Profile

Page 28: WEB PERSONALIZATION

Language Model Approach

Simple Language Modeling

Re-ranking of unpersonalized results
  Re-ranking is done according to $P(Q \mid D, u)$
  $\alpha$ is a weighting parameter between 0 and 1
  UP is the user profile

Page 29: WEB PERSONALIZATION

Language Model Approach

N gram based approach

Using the user's relevance feedback

Learn the user profile
  Let $H_u$ represent the search history of user $u$
  $H_u = \{(q_1, rf_1), (q_2, rf_2), (q_3, rf_3), \ldots, (q_n, rf_n)\}$

Unigram
  The user profile now consists of
  $\{(w_1, P(w_1)), (w_2, P(w_2)), (w_3, P(w_3)), \ldots, (w_n, P(w_n))\}$

Page 30: WEB PERSONALIZATION

Language Model Approach

N gram based approach

Sample Unigram User Profile

Page 31: WEB PERSONALIZATION

Language Model Approach

N gram based approach

Bigram
  The user profile consists of
  $\{(w_1 w_2, P(w_2 \mid w_1)), (w_2 w_3, P(w_3 \mid w_2)), \ldots, (w_{n-1} w_n, P(w_n \mid w_{n-1}))\}$

Page 32: WEB PERSONALIZATION

Language Model Approach

N gram based approach

Sample Bigram User Profile

Page 33: WEB PERSONALIZATION

Language Model Approach

N gram based approach

Re-ranking unpersonalized results

Based on unigrams ($\alpha$ = weighting parameter)
  $Q = q_1 q_2 q_3 \ldots q_n$
  $P(q_1 q_2 q_3 \ldots q_n) = P(q_1)\, P(q_2)\, P(q_3) \cdots P(q_n)$

Page 34: WEB PERSONALIZATION

Language Model Approach

N gram based approach

Based on bigrams
  $Q = q_1 q_2 q_3 \ldots q_n$
  $P(q_1 q_2 q_3 \ldots q_n) = P(q_1)\, P(q_2 \mid q_1)\, P(q_3 \mid q_2) \cdots P(q_n \mid q_{n-1})$
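The unigram and bigram query probabilities can be compared with a short sketch. The estimates below are unsmoothed relative frequencies, each bigram probability conditions a word on the word before it, and the history is hypothetical toy data; a real system would need smoothing for unseen words.

```python
from collections import Counter
from typing import Dict, List, Tuple

Unigram = Dict[str, float]
Bigram = Dict[Tuple[str, str], float]


def build_profiles(history: List[List[str]]) -> Tuple[Unigram, Bigram]:
    """Estimate P(w) and P(w2 | w1) from tokenized history by relative frequency."""
    unigram_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for tokens in history:
        unigram_counts.update(tokens)
        bigram_counts.update(zip(tokens, tokens[1:]))
    total = sum(unigram_counts.values())
    unigram = {w: c / total for w, c in unigram_counts.items()}
    # How often each word appears as the first word of a bigram.
    context: Counter = Counter()
    for (w1, _), c in bigram_counts.items():
        context[w1] += c
    bigram = {(w1, w2): c / context[w1] for (w1, w2), c in bigram_counts.items()}
    return unigram, bigram


def unigram_query_prob(query: List[str], unigram: Unigram) -> float:
    """P(q1) P(q2) ... P(qn)."""
    p = 1.0
    for q in query:
        p *= unigram.get(q, 0.0)
    return p


def bigram_query_prob(query: List[str], unigram: Unigram, bigram: Bigram) -> float:
    """P(q1) P(q2 | q1) ... P(qn | qn-1)."""
    if not query:
        return 1.0
    p = unigram.get(query[0], 0.0)
    for prev, cur in zip(query, query[1:]):
        p *= bigram.get((prev, cur), 0.0)
    return p


# Hypothetical history of tokenized texts for one user.
history = [["crane", "bird"], ["crane", "bird", "habitat"], ["crane", "machine"]]
unigram, bigram = build_profiles(history)
query = ["crane", "bird"]
print(unigram_query_prob(query, unigram), bigram_query_prob(query, unigram, bigram))
```

For this user, `bird` follows `crane` in two of three cases, so the bigram model assigns the query a higher probability (2/7) than the unigram model does (6/49).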

Page 35: WEB PERSONALIZATION

Language Model Approach

Noisy Channel based approach

Uses the user's feedback (implicit)

User history is represented as
  $H_i = (Q_1, D_1), (Q_2, D_2), \ldots, (Q_N, D_N)$
  $D_i$ is the document visited for $Q_i$
  $D$ consists of words $w_1, w_2, \ldots, w_m$

Basic idea: statistical machine translation
  Given parallel text in languages $S$ and $T$
  We get $P(t_i \mid s_i)$ for all $s_i \in S$ and $t_i \in T$
  Using EM we get the optimized model $P(T \mid S)$

Page 36: WEB PERSONALIZATION

Language Model Approach

Noisy Channel based approach

Similarly
  $T$ = past queries $Q_1, Q_2, \ldots, Q_K$
  $S$ = text of the relevant documents for the queries in $T$
  We learn the model $P(Q \mid D)$, or more precisely $P(q_i \mid w_j)$

Assumption
  The ideal (information-containing) document is translated into a query
  Document: a verbose language
  Query: a compact language

The user profile is stored as
  Tuples $\langle q_i, w_j, P(q_i \mid w_j) \rangle$
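To show how EM can learn word-level probabilities $P(q_i \mid w_j)$ from (query, document) pairs, here is a simplified sketch in the style of IBM Model 1. It is not the cited authors' implementation: it omits the NULL word, starts from a uniform table, and uses two hypothetical training pairs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[List[str], List[str]]  # (query words, words of the visited document)


def train_translation_model(pairs: List[Pair], iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """Estimate t[(q, w)] ~ P(q | w) with EM over query/document pairs."""
    query_vocab = {q for query, _ in pairs for q in query}
    uniform = 1.0 / len(query_vocab)
    t: Dict[Tuple[str, str], float] = defaultdict(lambda: uniform)
    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)
        total: Dict[str, float] = defaultdict(float)
        # E-step: distribute each query word over the document words.
        for query, doc in pairs:
            for q in query:
                norm = sum(t[(q, w)] for w in doc)
                for w in doc:
                    fraction = t[(q, w)] / norm
                    count[(q, w)] += fraction
                    total[w] += fraction
        # M-step: renormalize the expected counts per document word.
        t = defaultdict(float, {(q, w): c / total[w] for (q, w), c in count.items()})
    return t


# Hypothetical user history: queries paired with words of the documents visited.
pairs = [
    (["crane", "habitat"], ["bird", "wetland"]),
    (["crane"], ["bird"]),
]
table = train_translation_model(pairs)
profile = [(q, w, p) for (q, w), p in sorted(table.items())]
print(profile)
```

The second pair ties `crane` to `bird`, which in turn pushes `habitat` toward `wetland` in the first pair. The resulting `(q, w, p)` tuples have the same shape as the stored user profile described above.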

Page 37: WEB PERSONALIZATION

Language Model Approach

Noisy Channel based approach

Sample Noisy Channel User Profile

Page 38: WEB PERSONALIZATION

Language Model Approach

Noisy Channel based approach

Re-ranking
  Re-rank the documents using $P(Q \mid D, u)$
  $\alpha$ = weighting parameter
  $P(q_i \mid GE)$ is the lexical probability of $q_i$
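The scoring formula itself is not preserved in this transcript. A common form for translation-based retrieval, assumed here, mixes a translation term with a background term: $P(Q \mid D, u) = \prod_i \left[ \alpha \sum_j P(q_i \mid w_j)\, P(w_j \mid D) + (1 - \alpha)\, P(q_i \mid GE) \right]$. The sketch below applies that assumed formula with a hypothetical translation table and background probabilities.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def score(
    query: List[str],
    doc: List[str],
    translation: Dict[Tuple[str, str], float],
    background: Dict[str, float],
    alpha: float = 0.7,
    floor: float = 1e-9,
) -> float:
    """log P(Q | D, u) under the assumed mixture of translation and background terms."""
    tf = Counter(doc)
    length = len(doc)
    log_p = 0.0
    for q in query:
        # Sum over document words of P(q | w) * P(w | D).
        translated = sum(translation.get((q, w), 0.0) * c / length for w, c in tf.items())
        p = alpha * translated + (1 - alpha) * background.get(q, floor)
        log_p += math.log(p)
    return log_p


def rerank(query, docs, translation, background, alpha=0.7):
    """Order document names by descending personalized score."""
    return sorted(
        docs,
        key=lambda name: score(query, docs[name], translation, background, alpha),
        reverse=True,
    )


# Hypothetical user profile entries (q, w) -> P(q | w) and background probabilities.
translation = {("crane", "bird"): 0.6, ("crane", "machine"): 0.05}
background = {"crane": 0.001}
docs = {
    "bird_doc": ["bird", "wetland", "bird"],
    "machine_doc": ["machine", "construction"],
}
print(rerank(["crane"], docs, translation, background))
```

Because this user's profile links `crane` strongly to `bird`, the document about birds is ranked first even though neither document contains the query word.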

Page 39: WEB PERSONALIZATION

Issues in Personalization

Cold Start Problem (new user problem)

Latency Problem (new item problem)

Data sparseness

Scalability

Privacy

Recommendation list diversity

Robustness

Page 40: WEB PERSONALIZATION

Conclusion

Web personalization is the need of the hour for e-businesses

A relatively new research topic
  Several issues are yet to be solved effectively

Data should be collected without invading user privacy

Creating user models effectively and scaling them to a large number of users and items is at the core of personalization

Page 41: WEB PERSONALIZATION

Bibliography

Rohini U, Vamshi Ambati and Vasudeva Varma. Statistical Machine Translation Models for Personalized Search. In Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP 2008), January 7-12, 2008, Hyderabad, India.

Sarabjot S. Anand and Bamshad Mobasher. Intelligent techniques for web personalization. In Intelligent Techniques for Web Personalization, pages 1-36. Springer, 2005.

Vasudeva Varma. Personalization in Information Retrieval, Extraction and Access. In Workshop on Ontology, NLP, Personalization and IE/IR, IIT Bombay, Mumbai, 15-17 July 2008.

http://en.wikipedia.org/wiki/Personalisation

Snapshots from Google Inc.

Page 42: WEB PERSONALIZATION

Questions