information filtering si650: information retrieval

2010 © University of Michigan

Information Filtering

SI650: Information Retrieval

Winter 2010

School of Information

University of Michigan

Many slides are from Prof. ChengXiang Zhai’s lecture

1


Lecture Plan

• Filtering vs. Retrieval

• Content-based filtering (adaptive filtering)

• Collaborative filtering (recommender systems)

2


Short vs. Long Term Info Need

• Short-term information need (Ad hoc retrieval)

– “Temporary need”, e.g., info about used cars

– Information source is relatively static

– User “pulls” information

– Application example: library search, Web search

• Long-term information need (Filtering)

– “Stable need”, e.g., new data mining algorithms

– Information source is dynamic

– System “pushes” information to user

– Applications: news filter, recommender systems

3


Examples of Information Filtering

• News filtering

• Email filtering

• Movie/book recommenders

• Literature recommenders

• And many others …

4


Content-based Filtering vs.

Collaborative Filtering

• Basic filtering question: Will user U like item X?

• Two different ways of answering it

– Look at what U likes

– Look at who likes X

• Can be combined

characterize X content-based filtering

characterize U collaborative filtering

5


Content-based filtering is also called

1. “Adaptive Information Filtering” in TREC

2. “Selective Dissemination of Information (SDI)

in Library & Information Science

Collaborative filtering is also called

“Recommender Systems”

6


Why Bother?

• Why do we need recommendation?

• Why should amazon provide recommendation?

• Why does Netflix spend millions to improve their

recommendation algorithm?

• Why can‟t we rely on a search engine for all the

recommendation?

7


Chris Anderson, „The Long Tail‟, Wired, Issue 12.10 - October 2004

Need a way to discover items that match our interests & tastes among

tens or hundreds of thousands

The Long Tail

8

http://www.wired.com/wired/archive/12.10/


Part I: Adaptive Filtering

9


Adaptive Information Filtering

• Stable & long term interest, dynamic info. source

• System must make a delivery decision

immediately as a document “arrives”

Filtering

System …

my interest:

10


AIF vs. Retrieval & Categorization

• Like retrieval over a dynamic stream of docs, but

(global) ranking is impossible

• Like online binary categorization, but with no initial

training data and with limited feedback

• Typically evaluated with a utility function

– Each delivered doc gets a utility value

– Good doc gets a positive value

– Bad doc gets a negative value

– E.g., Utility = 3* #good - 2 *#bad (linear utility)

11


A Typical AIF System

... Binary

Classifier

User

Interest

Profile

User

Doc Source

Accepted Docs

Initialization

Learning Feedback Accumulated

Docs

utility func.

User profile

text

12


Three Basic Problems in AIF

• Making filtering decision (Binary classifier)

– Doc text, profile text yes/no

• Initialization

– Initialize the filter based on only the profile text or very

few examples

• Learning from

– Limited relevance judgments (only on “yes” docs)

– Accumulated documents

• All trying to maximize the utility

13


Major Approaches to AIF

• “Extended” retrieval systems

– “Reuse” retrieval techniques to score documents

– Use a score threshold for filtering decision

– Learn to improve scoring with traditional feedback

– New approaches to threshold setting and learning

• “Modified” categorization systems

– Adapt to binary, unbalanced categorization

– New approaches to initialization

– Train with “censored” training examples

14


Difference between Ranking, Classification,

and Recommendation

• What is the query?

– Ranking: no query

– Classification: labeled data (a.k.a, training data)

– Recommendation: user + items

• What is the target?

– Ranking: order of objects

– Classification: labels of objects

– Recommendation: Score? Label? Order?

15


A General Vector-Space Approach

doc

vector

profile vector

Scoring Thresholding

yes

no

Feedback

Information

Vector Learning

Threshold Learning

threshold

Utility Evaluation

16


Difficulties in Threshold Learning

36.5 R

33.4 N

32.1 R

29.9 ?

27.3 ?

…

...

=30.0

• Censored data

• Little/none labeled data

• Scoring bias due to vector learning

• Exploration vs. Exploitation

17


Threshold Setting

in Extended Retrieval Systems

• Utility-independent approaches (generally not

working well, not covered in this lecture)

• Indirect (linear) utility optimization

– Logistic regression (score prob. of relevance)

• Direct utility optimization

– Empirical utility optimization

– Expected utility optimization given score distributions

• All try to learn the optimal threshold

18


Logistic Regression (Robertson & Walker. 00)

• General idea: convert score of D to p(R|D)

• Fit the model using feedback data

• Linear utility is optimized with a fixed prob. cutoff

• But, – Possibly incorrect parametric assumptions

– No positive examples initially

– Censored data and limited positive feedback

– Doesn‟t address the issue of exploration

)()|(log DsDRO

19


Direct Utility Optimization

• Given

– A utility function U(CR+ ,CR- ,CN+ ,CN-)

– Training data D={<si, {R,N,?}>}

• Formulate utility as a function of the threshold and training data: U=F(,D)

• Choose the threshold by optimizing F(,D), i.e.,

),(maxarg DF

20


Empirical Utility Optimization

• Basic idea

– Compute the utility on the training data for each candidate threshold (score of a training doc)

– Choose the threshold that gives the maximum utility

• Difficulty: Biased training sample!

– We can only get an upper bound for the true optimal threshold.

• Solutions:

– Heuristic adjustment(lowering) of threshold

– Lead to “beta-gamma threshold learning”

21


Score Distribution Approaches ( Aramptzis & Hameren 01; Zhang & Callan 01)

• Assume generative model of scores p(s|R),

p(s|N)

• Estimate the model with training data

• Find the threshold by optimizing the expected

utility under the estimated model

• Specific methods differ in the way of defining

and estimating the scoring distributions

22


Gaussian-Exponential Distributions

• P(s|R) ~ N(,2) p(s-s0|N) ~ E()

)(

)(

)|()|( 02

2

02

1 ss

s

eNsspeRsp

(From Zhang & Callan 2001)

23


Score Distribution Approaches (cont.)

• Pros

– Principled approach

– Arbitrary utility

– Empirically effective

• Cons

– May be sensitive to the scoring function

– Exploration not addressed

24


Part II: Collaborative Filtering

25


What is Collaborative Filtering (CF)?

• Making filtering decisions for an individual user

based on the judgments of other users

• Inferring individual‟s interest/preferences from

that of other similar users

• General idea

– Given a user u, find similar users {u1, …, um}

– Predict u‟s preferences based on the preferences of

u1, …, um

26


CF: Assumptions

• Users with a common interest will have similar

preferences

• Users with similar preferences probably share the

same interest

• Examples

– “interest is IR” “favor SIGIR papers”

– “favor SIGIR papers” “interest is IR”

• Sufficiently large number of user preferences are

available

27


CF: Intuitions

• User similarity (Paul Resnick vs. Rahul Sami)

– If Paul liked the paper, Rahul will like the paper

– ? If Paul liked the movie, Rahul will like the movie

– Suppose Paul and Rahul viewed similar movies in the

past six months …

• Item similarity

– Since 90% of those who liked Star Wars also liked

Independence Day, and, you liked Star Wars

– You may also like Independence Day

The content of items “didn’t matter”!

28


Rating-based vs. Preference-based

• Rating-based: User‟s preferences are

encoded using numerical ratings on items

– Complete ordering

– Absolute values can be meaningful

– But, values must be normalized to combine

• Preference-based: User‟s preferences are

represented by partial ordering of items

(Learning to Rank!)

– Partial ordering

– Easier to exploit implicit preferences

29


A Formal Framework for Rating

u1

u2

…

ui

...

um

Users: U

Objects: O

o1 o2 … oj … on

3 1.5 …. … 2

2

1

3

Xij=f(ui,oj)=?

?

The task

Unknown function

f: U x O R

• Assume known f values for

some (u,o)’s

• Predict f values for other

(u,o)’s

• Essentially function

approximation, like other

learning problems

30


Where are the intuitions?

• Similar users have similar preferences

– If u u‟, then for all o‟s, f(u,o) f(u‟,o)

• Similar objects have similar user preferences

– If o o‟, then for all u‟s, f(u,o) f(u,o‟)

• In general, f is “locally constant”

– If u u‟ and o o‟, then f(u,o) f(u‟,o‟)

– “Local smoothness” makes it possible to predict

unknown values by interpolation or extrapolation

• What does “local” mean?

31


Two Groups of Approaches

• Memory-based approaches

– f(u, o) = g(u)(o) g(u‟)(o) if u u‟ (g =preference function)

– Find “neighbors” of u and combine g(u‟)(o)‟s

• Model-based approaches (not covered)

– Assume structures/model: object cluster, user cluster, f‟ defined on clusters

– f(u, o) = f‟(cu, co)

– Estimation & Probabilistic inference

32


Memory-based Approaches (Resnick et al. 94)

• General ideas:

– xij: rating of object j by user i

– ni: average rating of all objects by user i

– Normalized ratings:

– Memory-based prediction

• Specific approaches differ in w(a, i) -- the

distance/similarity between user a and i

m

i

ijdj viawkv1

),(

m

i

iawk1

),(/1 aajaj nvx

iijij nxv

33


User Similarity Measures

• Pearson correlation coefficient (sum over

commonly rated items)

• Cosine measure

• Many other possibilities!

j

iij

j

aaj

j

iijaaj

p

nxnx

nxnx

iaw22 )()(

))((

),(

n

j

ij

n

j

aj

n

j

ijaj

c

xx

xx

iaw

1

2

1

2

1),(

34


Improving User Similarity Measures (Breese et al. 98)

• Dealing with missing values: default ratings

• Inverse User Frequency (IUF): similar to IDF

• Case Amplification: use w(a, i)p, e.g., p=2.5

35


A Network Interpretation

• If your neighbors love it,

you are likely to love it

• Closer neighbors make

a larger impact

• Measure the closeness

by the items you and

her liked/disliked

36


Network based approaches

• No clear notion of “distance”

now, but the “scores” are

estimated based on some

propagation process.

• Score propagation based

on random walk

• Two approaches:

– Starting from the user, how

likely/how long can I reach the

object?

– Starting from the object, how

likely/how long can I hit the

user?

37


Random Walk and Hitting Time

i

k

A

j P = 0.7

P = 0.3

• Hitting Time

– TA: the first time that the random

walk is at a vertex in A

• Mean Hitting Time

– hiA: expectation of TA given that

the walk starts from vertex i

0.3

0.7

38


Computing Hitting Time

i

k A

j

TA: the first time that the random

walk is at a vertex in A

}0,:min{ tAXtT t

A

A ifor ,1)( Vj

A

jhjip

A

ihA ifor ,0

Iterative

Computation

hiA: expectation of TA given that the

walk starting from vertex i

A i

h = 0

hiA = 0.7 hj

A + 0.3 hkA + 1

0.7

0.7

Apparently, hiA = 0 for those

39


Bipartite Graph and Hitting Time

Expected proximity of query i to the

query A : hitting time of i A, hiA

Bipartite Graph: - Edges between V1 and V2

- No edge inside V1 or V2

- Edges are weighted

- e.g., V1 = query; V2 = Url

A

i j w(i, j) = 3

4

5

0.7

0.4 V1 V2

7 1

)73(

3),()(

id

jiwjip

A

i j

4

5

0.7

0.4 V1 V2

7 1

)13(

3),()(

jd

jiwijp

A

k

i j

4

5

0.7

0.4 V1 V2

7 1

2

),(),()(

Vj ji d

jkw

d

jiwkip

• convert to a directed graph, even collapse one group

40


Example: Query Suggestion

Welcome to the

hotel california

Suggestions

hotel california

eagles hotel california

hotel california band

hotel california by the eagles

hotel california song

lyrics of hotel california

listen hotel california eagle


Generating Query Suggestion using Click

Graph (Mei et al. 08)

42

A

aa

american airline

mexiana

www.aa.com

www.theaa.com/travelwatch/ planner_main.jsp

en.wikipedia.org/wiki/Mexicana

300

15

Query Url • Construct a (kNN)

subgraph from the

query log data (of a

predefined number of

queries/urls)

• Compute transition

probabilities p(i j)

• Compute hitting time hiA

• Rank candidate queries

using hiA

freq(q, url)


Result: Query Suggestion

Hitting time

wikipedia friends

friends tv show wikipedia

friends home page

friends warner bros

the friends series

friends official site

friends(1994)

Google

friendship

friends poem

friendster

friends episode guide

friends scripts

how to make friends

true friends

Yahoo

secret friends

friends reunited

hide friends

hi 5 friends

find friends

poems for friends

friends quotes

Query = friends

43


Summary

• Filtering is “easy”

– The user‟s expectation is low

– Any recommendation is better than none

– Making it practically useful

• Filtering is “hard”

– Must make a binary decision, though ranking is

also possible

– Data sparseness

– “Cold start”

44


Example: NetFlix

45

2010 © University of Michigan 46

References (Adaptive Filtering)

General papers on TREC filtering evaluation D. Hull, The TREC-7 Filtering Track: Description and Analysis , TREC-7 Proceedings.

D. Hull and S. Robertson, The TREC-8 Filtering Track Final Report, TREC-8 Proceedings.

S. Robertson and D. Hull, The TREC-9 Filtering Track Final Report, TREC-9 Proceedings.

S. Robertson and I. Soboroff, The TREC 2001 Filtering Track Final Report, TREC-10

Proceedings

Papers on specific adaptive filtering methods

Stephen Robertson and Stephen Walker (2000), Threshold Setting in Adaptive Filtering .

Journal of Documentation, 56:312-331, 2000

Chengxiang Zhai, Peter Jansen, and David A. Evans, Exploration of a heuristic approach to

threshold learning in adaptive filtering, 2000 ACM SIGIR Conference on Research and

Development in Information Retrieval (SIGIR'00), 2000. Poster presentation.

Avi Arampatzis and Andre van Hameren The Score-Distributional Threshold Optimization for

Adaptive Binary Classification Tasks , SIGIR'2001.

Yi Zhang and Jamie Callan, 2001, Maximum Likelihood Estimation for Filtering Threshold,

SIGIR 2001.

T. Ault and Y. Yang, 2002, kNN, Rocchio and Metrics for Information Filtering at TREC-10,

TREC-10 Proceedings.


References (Collaborative Filtering)

• Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., and Riedl, J., "GroupLens: An

Open Architecture for Collaborative Filtering of Netnews," CSCW. 175-186.

• P Resnick, HR Varian, Recommender Systems, CACM 1997

• Cohen, W.W., Schapire, R.E., and Singer, Y. (1999) "Learning to Order Things",

Journal of AI Research, Volume 10, pages 243-270.

• Freund, Y., Iyer, R.,Schapire, R.E., & Singer, Y. (1999). An efficient boosting algorithm

for combining preferences. Machine Learning Journal. 1999.

• Breese, J. S., Heckerman, D., and Kadie, C. (1998). Empirical Analysis of Predictive

Algorithms for Collaborative Filtering. In Proceedings of the 14th Conference on

Uncertainty in Articial Intelligence, pp. 43-52.

• Alexandrin Popescul and Lyle H. Ungar, Probabilistic Models for Unified

Collaborative and Content-Based Recommendation in Sparse-Data Environments,

UAI 2001.

• N. Good, J.B. Schafer, J. Konstan, A. Borchers, B. Sarwar, J. Herlocker, and J. Riedl.

"Combining Collaborative Filtering with Personal Agents for Better

Recommendations." Proceedings AAAI-99. pp 439-446. 1999.

• Qiaozhu Mei, Dengyong Zhou, Kenneth Church. Query Suggestion Using Hitting

Time, CIKM'08, pages 469-478

47

information filtering si650: information retrieval

Documents