Multiple Domain User Personalization. Deepak Agarwal (Yahoo! Research), Yucheng Low (Carnegie Mellon University), Alexander J. Smola (Yahoo! Research).

Post on 29-Mar-2015

TRANSCRIPT

Page 1:

Multiple Domain User Personalization

Deepak Agarwal, Yahoo! Research

Yucheng Low, Carnegie Mellon University

Alexander J. Smola, Yahoo! Research

Page 2:

Information Flood

Page 3:

Personalization

Golf Reader, Tech. Reader

Can we provide personalization to new users?

Page 4:

One Domain Cold-Start

Movies: User 1, User 2

Impossible when you have only one domain. The best you can do is to have a good baseline.

Page 5:

Multiple Domains Cold Start

Movies, News, Music

Possible when you have many domains.

Page 6:

Personalization across all domains

User: Reads Golf News, Watches MTV

Combine tokens from all spaces, ignoring the source domain:

Golf, Tiger, Music, Song

Expand the token space to include the source domain:

Golf:1, Tiger:1, Music:2, Song:2

Feed the result to Your Favorite Personalization Algorithm.
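The two token-pooling schemes on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the integer domain IDs, and the example user are assumptions made for the sketch.

```python
from typing import Dict, List


def combine_ignoring_domain(events: Dict[int, List[str]]) -> List[str]:
    """Pool tokens from every domain into one bag, dropping the domain label."""
    return [tok for domain in sorted(events) for tok in events[domain]]


def combine_with_domain_tag(events: Dict[int, List[str]]) -> List[str]:
    """Pool tokens but suffix each with its source domain, e.g. 'Golf:1'."""
    return [f"{tok}:{domain}" for domain in sorted(events) for tok in events[domain]]


# Hypothetical user: domain 1 = news (reads golf news), domain 2 = video (watches MTV).
user_events = {1: ["Golf", "Tiger"], 2: ["Music", "Song"]}

pooled = combine_ignoring_domain(user_events)
tagged = combine_with_domain_tag(user_events)
```

Either list would then be handed to a single personalization algorithm. Both schemes pool all domains into one bag of tokens.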

Page 7:

Personalization across all domains

Domains with more observations will swamp out all other domains.

What is a good personalization algorithm that will work for all domains?

Page 8:

Solution: Meta-Profile

User Meta Profile

User Music Profile: Personalized Music

User News Profile: Personalized News

Isolates each domain: prevents larger domains from swamping out smaller domains.

Page 9:

Solution: Meta-Profile

User Meta Profile

User Music Profile

User News Profile

User Movie Profile

Extensible: domains can be added or removed easily.

Page 10:

Latent Dirichlet Allocation

Topic 1: Basketball, NBA, hoop, Train, 3-point

Topic 2: Golf, Tiger, Woods, Club, Green, Hole-in-one

Topic 3: Machine, Learning, Neural, Network, Train

Document (mixture of Topic 1, Topic 2, Topic 3): Michael I. Jordan trains a Neural Network to play golf

Example topic assignments: Golf -> Topic 2, Network -> Topic 3

Page 11:

Latent Dirichlet Allocation

[Figure: plate diagram; plate labels: Document, N]

1. Each document has a mixture over topics.

2. For each word in each document:

a) Draw a topic.

b) Draw a word from the topic.

A document is a bag of words. A topic is a mixture of words.
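The generative steps listed on this slide can be written out as a short sampler. This is a minimal sketch using only the Python standard library; the vocabulary, the two toy topics, and the symmetric prior `alpha` are illustrative assumptions, not values from the talk.

```python
import random
from typing import List, Sequence, Tuple


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(
    topics: Sequence[Sequence[float]],  # topics[k][v] = P(word v | topic k)
    vocab: Sequence[str],
    alpha: Sequence[float],
    n_words: int,
    rng: random.Random,
) -> Tuple[List[float], List[str]]:
    # Step 1: the document gets its own mixture over topics.
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        # Step 2a: draw a topic from the document's mixture.
        z = rng.choices(range(len(topics)), weights=theta)[0]
        # Step 2b: draw a word from that topic's word distribution.
        words.append(rng.choices(vocab, weights=topics[z])[0])
    return theta, words


# Toy setup (assumed): a golf topic and a machine-learning topic.
vocab = ["golf", "tiger", "neural", "network"]
topics = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
theta, words = generate_document(topics, vocab, [0.5, 0.5], 8, random.Random(0))
```

Because the document is a bag of words, only the word counts matter; the order in which the words were drawn is discarded.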

Page 12:

Latent Dirichlet Allocation

Document

Page 13:

Latent Dirichlet Allocation

Document

Sample From:

Page 14:

Latent Dirichlet Allocation

Topic 1: Basketball, Michael, Jordan

Topic 2: Golf, Tiger, Woods, Club, Green

Topic 3: Machine, Learning, Neural

Page 15:

Latent Dirichlet Allocation

Topics which make up each document

Words which make up each topic

Page 16:

Single Domain Personalization

[Figure: plate diagram; plate labels: User, N]

1. Each user has a mixture over topics.

2. For each word in each document:

a) Draw a topic.

b) Draw a word from the topic.

A user’s interaction with a domain is a bag of words. A topic is a mixture of words.

Topics each user is interested in

Words which make up each topic

Page 17:

Multiple Domain Personalization

[Figure: plate diagram; plate labels: User, N; annotation: user u’s interaction with domain d]

A user’s interaction with a domain is a bag of words. A topic is a mixture of words.

Each user has a meta-profile: [formula not preserved in the transcript]

Each domain has a latent matrix: [formula not preserved in the transcript]

User’s prior interest in a domain is: [formula not preserved in the transcript]

Page 18:

Solution: Meta-Profile

User Meta Profile

User Music Profile

User News Profile

User Movie Profile

Page 19:

Users

Music

News

Movies

Topic->word table

Topic->word table

Topic->word table

Page 20:

Gibbs Sampling

[Figure: plate diagram of user u’s interaction with domain p (plate label N), with the LDA part of the model labeled]

Page 21:

Gibbs Sampling

[Figure: plate diagram of user u’s interaction with domain p (plate label N)]

Step 1: Sample, using the LDA sampler, while the other two groups of variables are held constant.
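For reference, a standard collapsed Gibbs sampler for plain LDA looks roughly like the sketch below. It is a generic textbook version, not the distributed sampler used in the talk: the symmetric hyperparameters `alpha` and `beta`, the function name, and the toy corpus are assumptions. In the multi-domain model the fixed `alpha` would presumably be replaced by the user's domain-specific prior, which is held constant during this step.

```python
import random
from typing import List


def gibbs_lda(docs: List[List[int]], n_topics: int, vocab_size: int,
              alpha: float = 0.1, beta: float = 0.01,
              n_iters: int = 50, seed: int = 0):
    """Collapsed Gibbs sampling for LDA over documents given as word-id lists."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # n_dk
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # n_kw
    topic_total = [0] * n_topics                              # n_k
    assignments = []
    # Random initialization of every token's topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional p(z = k | everything else), up to a constant.
                weights = [
                    (doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its new topic.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return assignments, doc_topic, topic_word


# Toy corpus (assumed): word ids 0-1 and 2-3 tend to co-occur.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2, 3]]
assignments, doc_topic, topic_word = gibbs_lda(docs, n_topics=2, vocab_size=4)
```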

Page 22:

Gibbs Sampling

[Figure: plate diagram of user u’s interaction with domain p (plate label N)]

Step 1: Sample

Step 2: Sample, using Langevin diffusion, while the other two groups of variables are held constant.
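The transcript does not preserve which variable is sampled here or its target density, so the sketch below only illustrates the general technique: an unadjusted Langevin update, x <- x + (eps/2) * grad log p(x) + sqrt(eps) * noise. The standard normal target, the step size, and the function names are stand-in assumptions.

```python
import math
import random
from typing import Callable, List


def langevin_step(x: List[float],
                  grad_log_p: Callable[[List[float]], List[float]],
                  step_size: float,
                  rng: random.Random) -> List[float]:
    """One unadjusted Langevin update: gradient drift plus Gaussian noise."""
    grad = grad_log_p(x)
    noise_scale = math.sqrt(step_size)
    return [xi + 0.5 * step_size * gi + noise_scale * rng.gauss(0.0, 1.0)
            for xi, gi in zip(x, grad)]


def grad_log_standard_normal(x: List[float]) -> List[float]:
    """Gradient of log N(0, I); a stand-in for the model's real log posterior."""
    return [-xi for xi in x]


rng = random.Random(0)
x = [3.0]  # deliberately start away from the mode
samples = []
for _ in range(20000):
    x = langevin_step(x, grad_log_standard_normal, 0.1, rng)
    samples.append(x[0])
```

Only the gradient of the log density is needed, which is why this kind of update suits continuous variables such as a real-valued profile vector.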

Page 23:

Gibbs Sampling

[Figure: plate diagram of user u’s interaction with domain p (plate label N)]

Step 1: Sample

Step 2: Sample

Step 3: Optimize, using LBFGS, while the other two groups of variables are held constant.

Page 24:

Experiments

Page 25:

Experiments @ Yahoo!

2 domain dataset: Frontpage and News clicks of 5.6 million users. Frontpage/News: article text for each click.

3 domain dataset: Frontpage, News and MyYahoo clicks of 5.6 million users. MyYahoo: only has article IDs for each click, with no text. Not semantically meaningful.

All user information was anonymized.

Page 26:

Test Protocol

Hold out a proportion of the users who see more than one domain. Hide one of those domains and try to predict the words.

Prediction metric is cosine similarity. Baseline is “mean prediction”.
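The metric and baseline can be sketched as below. The slide does not define "mean prediction"; this sketch assumes it means predicting, for every held-out user, the average word-count vector of the training users in the hidden domain. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_prediction(training_bags: List[List[str]]) -> Dict[str, float]:
    """Assumed baseline: average word counts over all training users."""
    total = Counter()
    for bag in training_bags:
        total.update(bag)
    n_users = len(training_bags)
    return {word: count / n_users for word, count in total.items()}


# Toy data (assumed): two training users and one held-out user's hidden domain.
training = [["obama", "senate"], ["obama", "nfl"]]
hidden_truth = Counter(["obama", "obama", "senate"])

baseline = mean_prediction(training)
score = cosine_similarity(hidden_truth, baseline)
```

A model's prediction for the hidden domain would be scored with the same `cosine_similarity` call and compared against the baseline's score.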

Page 27:

Implementation

Distributed implementation in C++ using Memcached for communication.

Alex Smola, Shravan Narayanamurthy, “An Architecture for Parallel Topic Models”, VLDB 2010.

Distributed LBFGS line search: implement standard MPI-like primitives in Memcached: Broadcast, Reduce, Barrier.

Takes 2-3 days for 500 iterations on 30 machines.
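One way MPI-like primitives can be layered on a key-value store is sketched below. This is not the authors' C++ code: `KVStore` is a hypothetical in-process stand-in exposing only get/set/incr, threads play the role of machines, and the key names are made up. Unlike this stand-in, a real Memcached `incr` requires the key to exist already.

```python
import threading
import time
from typing import List, Optional


class KVStore:
    """In-process stand-in for a Memcached server (get/set/incr only)."""

    def __init__(self) -> None:
        self._data = {}
        self._lock = threading.Lock()

    def set(self, key: str, value) -> None:
        with self._lock:
            self._data[key] = value

    def get(self, key: str):
        with self._lock:
            return self._data.get(key)

    def incr(self, key: str, delta: int = 1) -> int:
        with self._lock:
            self._data[key] = self._data.get(key, 0) + delta
            return self._data[key]


def wait_for(store: KVStore, key: str):
    """Poll until some worker has written the key."""
    while True:
        value = store.get(key)
        if value is not None:
            return value
        time.sleep(0.001)


def broadcast(store: KVStore, name: str, rank: int, value=None):
    """Rank 0 publishes a value; every rank reads it back."""
    if rank == 0:
        store.set(name, value)
    return wait_for(store, name)


def reduce_sum(store: KVStore, name: str, rank: int, n_workers: int, value):
    """Each rank writes its part; rank 0 collects and sums them."""
    store.set(f"{name}/{rank}", value)
    if rank != 0:
        return None
    return sum(wait_for(store, f"{name}/{r}") for r in range(n_workers))


def barrier(store: KVStore, name: str, n_workers: int) -> None:
    """Single-use barrier: each call site needs a fresh name."""
    store.incr(name)
    while store.get(name) < n_workers:
        time.sleep(0.001)


def worker(store: KVStore, rank: int, n_workers: int, results: List[Optional[float]]) -> None:
    step = broadcast(store, "step", rank, value=0.5)
    total = reduce_sum(store, "grad", rank, n_workers, value=step * (rank + 1))
    barrier(store, "iter0", n_workers)
    results[rank] = total


store = KVStore()
n_workers = 4
results: List[Optional[float]] = [None] * n_workers
threads = [threading.Thread(target=worker, args=(store, r, n_workers, results))
           for r in range(n_workers)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In a line search, the broadcast would carry the candidate step size and the reduce would sum each machine's share of the objective; polling keeps the sketch simple at the cost of extra round trips.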

Page 28:

2 Property Sanity Check

Page 29:

2 Property

Page 30:

3 Property

Page 31:

3 Property

Page 32:

Frontpage -> News

sandra, oscar, oscars, red, carpet, bullock, golden, gown, bullocks, nominee, bestactress, sparkles, stunning

vienna, bachelor, jake, pavelka, giraldi, finale, show, stars, dancing, love, season, time, abc

bacteria, fight, super, struggling, developed, doctors, resistant, lethal, virtually, drugs, antibiotic, competitors, chad

film, movie, movies, films, director, story, avatar, james, time, hollywood, big, make, hes, star

Topic labels on the slide: Celebrity, Entertainment, Science, Science Fiction

Page 33:

News -> Frontpage

iphone, apple, app, apps, ipod, google, store, apples, android, mac, mobile, touch, ipad, device, phone

college, year, earn, years, 000, bestpaid, average, 129, colleges, graduates, ten, alums, schools, actor, likes

health, care, bill, obama, president, rep, house, republican, senate, news, sen, democrats, fox, congress, reform

drafts, player, nfl, scouts, team, riskiest, peril, bryant, dez, pick, talented, nba, james, news

home, bank, facing, ceo, gomez, eviction, rosalina, bought, cleaning, foreclosed, client, janitor, offices, surprising, video

captured, inside, mountain, terrorist, observers, impresses, alqaidas, complexity, base, features, hideout, size, special, secret, struck

Topic labels on the slide: Politics, Devices, College

Page 34:

Extension

User Meta Profile

User Music Profile: Latent Dirichlet Allocation

User News Profile: Latent Dirichlet Allocation

User Movie Profile: Latent Dirichlet Allocation

Page 35:

Extension

User Meta Profile

User Music Profile

User News Profile

User Movie Profile

Per-domain algorithms shown: Linear Model, Matrix Factorization, fLDA

Flexible: allows a different algorithm for each domain.

Page 36:

It Is How You Use It

User Meta Profile

User Music Profile

Personalized with Algorithm X

Use the Meta Profile for initialization.

Page 37:

It Is How You Use It

User Meta Profile

User Music Profile

Personalized with Algorithm X

Periodically update the Meta Profile and the Domain Latent Matrix.

Page 38:

Conclusion

A generic, extensible model for combining domain personalization schemes.

Scalable inference procedure that extends to millions of users.

Demonstrates strong predictive performance on large real-world data.

Page 39:

Questions?