Post on 29-Mar-2015

Yucheng Low Multiple Domain User Personalization

Multiple Domain User Personalization

Deepak Agarwal, Yahoo! Research

Yucheng Low, Carnegie Mellon University

Alexander J. Smola, Yahoo! Research

Information Flood

Personalization

Golf Reader, Tech. Reader

Can we provide personalization to new users?

One Domain Cold-Start

Movies: User 1, User 2

Impossible when you have only one domain. The best you can do is to have a good baseline.

Multiple Domains Cold-Start

Movies, News, Music

Possible when you have many domains.

Personalization across all domains

User: reads Golf News, watches MTV.

Combine tokens from all spaces, ignoring the source domain: Golf, Tiger, Music, Song

Expand the token space to include the source domain: Golf:1, Tiger:1, Music:2, Song:2

Either token set is passed to your favorite personalization algorithm.

Domains with more observations will swamp out all other domains
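
A minimal sketch of the two token schemes and of the swamping effect, assuming a made-up user whose news activity is nine times larger than their music activity. The event counts, the `domain_ids` mapping, and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

def combine_ignoring_domain(events):
    """Pool tokens from every domain into one bag, dropping the domain."""
    return Counter(token for _, token in events)

def expand_with_domain(events, domain_ids):
    """Tag each token with its source domain, e.g. 'Golf:1'."""
    return Counter(f"{token}:{domain_ids[domain]}" for domain, token in events)

def domain_share(events):
    """Fraction of all observed tokens contributed by each domain."""
    counts = Counter(domain for domain, _ in events)
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.items()}

# Hypothetical user: heavy news reader, occasional music listener.
events = (
    [("news", "Golf")] * 90 + [("news", "Tiger")] * 90
    + [("music", "Music")] * 10 + [("music", "Song")] * 10
)
domain_ids = {"news": 1, "music": 2}

pooled = combine_ignoring_domain(events)
tagged = expand_with_domain(events, domain_ids)
share = domain_share(events)

print(pooled)
print(tagged)
print(share)
```

Under either scheme the news tokens make up 90% of this user's bag of words, so a single model fitted to the combined bag is driven mostly by the larger domain.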

What is a good personalization algorithm that will work for all domains?

Solution: Meta-Profile

User Meta Profile -> User Music Profile -> Personalized Music

User Meta Profile -> User News Profile -> Personalized News

Isolates each domain: prevents larger domains from swamping out smaller domains.

Extensible: domains can be added or removed easily, for example a User Movie Profile alongside the Music and News profiles.

Latent Dirichlet Allocation

Topic 1: Basketball, NBA, hoop, Train, 3-point

Topic 2: Golf, Tiger, Woods, Club, Green, Hole-in-one

Topic 3: Machine, Learning, Neural, Network, Train

Document: "Michael I. Jordan trains a Neural Network to play golf"

Word assignments: Golf -> Topic 2, Network -> Topic 3

Latent Dirichlet Allocation

1. Each document has a mixture over topics.

2. For each word in each document:

a) Draw a topic.

b) Draw a word from the topic.

A document is a bag of words. A topic is a mixture of words.
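
The generative steps above can be sketched in a few lines. This is an illustrative sketch rather than the authors' code: the three toy topics, the symmetric Dirichlet parameter, and the function names are assumptions, and the Dirichlet draw uses the standard construction of normalizing independent Gamma draws.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topic_word, alpha, n_words, rng):
    """Generate one document: returns (topic mixture, topics, word ids)."""
    # 1. Each document has a mixture over topics.
    theta = sample_dirichlet(alpha, rng)
    topics, words = [], []
    # 2. For each word in the document:
    for _ in range(n_words):
        # a) Draw a topic from the document's mixture.
        z = rng.choices(range(len(theta)), weights=theta)[0]
        # b) Draw a word from that topic's distribution over the vocabulary.
        w = rng.choices(range(len(topic_word[z])), weights=topic_word[z])[0]
        topics.append(z)
        words.append(w)
    return theta, topics, words

# Toy vocabulary and topics (assumed for illustration).
vocab = ["basketball", "nba", "golf", "tiger", "neural", "network"]
topic_word = [
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],  # topic 0: basketball
    [0.0, 0.0, 0.5, 0.5, 0.0, 0.0],  # topic 1: golf
    [0.0, 0.0, 0.0, 0.0, 0.5, 0.5],  # topic 2: machine learning
]

rng = random.Random(0)
theta, topics, words = generate_document(topic_word, [1.0, 1.0, 1.0], 8, rng)
print([vocab[w] for w in words])
```

Word order carries no information here, which is what "bag of words" means: only the counts of each word in the document matter.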

Sample From:

Topic 1: Basketball, Michael, Jordan

Topic 2: Golf, Tiger, Woods, Club, Green

Topic 3: Machine, Learning, Neural

Topics which make up each document.

Words which make up each topic.

Single Domain Personalization

1. Each user has a mixture over topics.

2. For each word in each document:

a) Draw a topic.

b) Draw a word from the topic.

A user’s interaction with a domain is a bag of words. A topic is a mixture of words.

Topics each user is interested in.

Words which make up each topic.

Multiple Domain Personalization

User u’s interaction with domain d

A user’s interaction with a domain is a bag of words. A topic is a mixture of words.

Each user has a meta-profile:

Each domain has a latent matrix:

User’s prior interest in a domain is
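
A rough sketch of how a user meta-profile and a per-domain latent matrix could combine into a prior interest over that domain's topics. The functional form used here, a softmax of the matrix-vector product, is an assumption made purely for illustration and is not taken from the slides; the names `meta_profile`, `news_matrix`, and `music_matrix` and all numbers are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def domain_prior(domain_matrix, meta_profile):
    """Map a user's meta-profile to a prior over one domain's topics.

    ASSUMPTION: prior = softmax(domain_matrix @ meta_profile).
    This form is illustrative only.
    """
    scores = [
        sum(a * z for a, z in zip(row, meta_profile))
        for row in domain_matrix
    ]
    return softmax(scores)

# Hypothetical 2-dimensional meta-profile shared across domains.
meta_profile = [1.0, -0.5]
# One latent matrix per domain; rows index that domain's topics.
news_matrix = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]  # 3 news topics
music_matrix = [[0.0, 1.0], [1.0, 0.0]]             # 2 music topics

news_prior = domain_prior(news_matrix, meta_profile)
music_prior = domain_prior(music_matrix, meta_profile)
print(news_prior)
print(music_prior)
```

The point of the sketch is structural: one small vector per user is shared, while each domain keeps its own matrix and its own topic space, so domains of different sizes never share a token space.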

Users

Music: Topic->word table

News: Topic->word table

Movies: Topic->word table

Gibbs Sampling

User u’s interaction with domain p

1: Sample using the LDA sampler, holding the other variables constant.

2: Sample using Langevin diffusion, holding the other variables constant.

3: Optimize using LBFGS, holding the other variables constant.
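
For readers unfamiliar with the second step, here is a minimal sketch of a generic (unadjusted) Langevin update on a toy target. The slides do not say which variable is sampled this way or with what step size, so the Gaussian target, the step size of 0.2, and the function names are all assumptions.

```python
import math
import random

def langevin_step(x, grad_log_density, step_size, rng):
    """One unadjusted Langevin update:
    x' = x + (step_size / 2) * grad log p(x) + sqrt(step_size) * noise."""
    grad = grad_log_density(x)
    return [
        xi + 0.5 * step_size * gi + math.sqrt(step_size) * rng.gauss(0.0, 1.0)
        for xi, gi in zip(x, grad)
    ]

# Toy target: independent unit-variance Gaussians centred at `mu`,
# for which grad log p(x) = mu - x.
mu = [2.0, -1.0]

def grad_log_density(x):
    return [m - xi for m, xi in zip(mu, x)]

rng = random.Random(0)
x = [0.0, 0.0]
samples = []
for i in range(20000):
    x = langevin_step(x, grad_log_density, 0.2, rng)
    if i >= 1000:  # discard burn-in
        samples.append(x)

mean = [sum(s[d] for s in samples) / len(samples) for d in range(2)]
print(mean)
```

Each update follows the gradient of the log density and adds Gaussian noise, so the chain drifts toward high-probability regions while still exploring around them. In the alternating scheme above, a step like this would run while the other blocks of variables are held constant.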

Experiments

Experiments @ Yahoo!

2 domain dataset: Frontpage and News clicks of 5.6 million users.

Frontpage/News: article text for each click.

3 domain dataset: Frontpage, News and MyYahoo clicks of 5.6 million users.

MyYahoo: only has article IDs for each click, with no text. Not semantically meaningful.

All user information was anonymized.

Test Protocol

Hold out a proportion of the users who see more than one domain. Hide one of those domains and try to predict the words.

The prediction metric is cosine similarity. The baseline is “mean prediction”.
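
A small sketch of the metric and the baseline, assuming each user's hidden domain is represented as a word-count vector and that "mean prediction" means predicting the average training vector for every user. That reading of the baseline, the toy counts, and the `personalized` vector are assumptions for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def mean_prediction(training_vectors):
    """Baseline: predict the per-word average over training users."""
    n = len(training_vectors)
    return [sum(column) / n for column in zip(*training_vectors)]

# Toy word-count vectors over a 4-word vocabulary (made up).
train = [[3, 1, 0, 0], [2, 2, 0, 0], [0, 0, 4, 1]]
held_out = [4, 1, 0, 0]      # hidden-domain words of a test user
personalized = [3, 1, 0, 0]  # hypothetical model prediction for that user

baseline = mean_prediction(train)
baseline_score = cosine_similarity(baseline, held_out)
personalized_score = cosine_similarity(personalized, held_out)
print(baseline_score, personalized_score)
```

Cosine similarity ignores vector length, so a prediction is rewarded for getting the relative word proportions right rather than the total number of words.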

Implementation

Distributed implementation in C++ using Memcached for communication.

Alex Smola, Shravan Narayanamurthy, “An Architecture for Parallel Topic Models”, VLDB 2010

Distributed LBFGS line search: implement standard MPI-like primitives in Memcached:

Broadcast, Reduce, Barrier

Takes 2-3 days for 500 iterations on 30 machines.
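
To make the idea of MPI-like primitives over a key-value store concrete, here is a single-process simulation. A plain in-memory class stands in for the shared cache; it is not the Memcached client API, and the key naming scheme, function names, and shard data are hypothetical.

```python
class KeyValueStore:
    """In-memory stand-in for a shared cache (NOT the Memcached API)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

def broadcast(store, name, value):
    """The coordinator publishes a value for all workers to read."""
    store.set(f"bcast/{name}", value)

def receive(store, name):
    """A worker reads the broadcast value (None if not yet published)."""
    return store.get(f"bcast/{name}")

def contribute(store, name, worker_id, value):
    """A worker writes its partial result under its own key."""
    store.set(f"reduce/{name}/{worker_id}", value)

def reduce_sum(store, name, n_workers):
    """Sum all partial results; returns None until every worker has written.

    The 'all present' check plays the role of a barrier."""
    parts = [store.get(f"reduce/{name}/{w}") for w in range(n_workers)]
    if any(p is None for p in parts):
        return None
    return sum(parts)

# Simulated line-search evaluation: broadcast a step size, then let each
# worker compute a partial sum over its own data shard.
store = KeyValueStore()
shards = [[1.0, 2.0], [3.0], [4.0, 5.0]]
broadcast(store, "step", 0.5)

for worker_id in range(2):  # only two of three workers have finished
    step = receive(store, "step")
    contribute(store, "loss", worker_id, sum(step * x for x in shards[worker_id]))
incomplete = reduce_sum(store, "loss", len(shards))

contribute(store, "loss", 2, sum(receive(store, "step") * x for x in shards[2]))
total = reduce_sum(store, "loss", len(shards))
print(incomplete, total)
```

A real deployment would have workers on separate machines polling the cache, which is where a proper barrier and key expiry start to matter; the simulation only shows the data flow.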

[Figure: 2 Property Sanity Check]

[Figure: 2 Property]

[Figures: 3 Property, two slides]

Frontpage -> News

Celebrity: sandra, oscar, oscars, red, carpet, bullock, golden, gown, bullocks, nominee, bestactress, sparkles, stunning

Entertainment: vienna, bachelor, jake, pavelka, giraldi, finale, show, stars, dancing, love, season, time, abc

Science: bacteria, fight, super, struggling, developed, doctors, resistant, lethal, virtually, drugs, antibiotic, competitors, chad

Science Fiction: film, movie, movies, films, director, story, avatar, james, time, hollywood, big, make, hes, star

News -> Frontpage

Devices: iphone, apple, app, apps, ipod, google, store, apples, android, mac, mobile, touch, ipad, device, phone

College: college, year, earn, years, 000, bestpaid, average, 129, colleges, graduates, ten, alums, schools, actor, likes

Politics: health, care, bill, obama, president, rep, house, republican, senate, news, sen, democrats, fox, congress, reform

drafts, player, nfl, scouts, team, riskiest, peril, bryant, dez, pick, talented, nba, james, news

home, bank, facing, ceo, gomez, eviction, rosalina, bought, cleaning, foreclosed, client, janitor, offices, surprising, video

captured, inside, mountain, terrorist, observers, impresses, alqaidas, complexity, base, features, hideout, size, special, secret, struck

Extension

User Meta Profile

User Music Profile: Latent Dirichlet Allocation

User News Profile: Latent Dirichlet Allocation

User Movie Profile: Latent Dirichlet Allocation

Flexible: allows a different algorithm for each domain, for example a linear model, matrix factorization, or fLDA (shown for the User Movie Profile).

It Is How You Use It

User Meta Profile -> User Music Profile -> Personalized with Algorithm X

Use the Meta Profile for initialization.

Periodically update the Meta Profile and Domain Latent Matrix.

Conclusion

A generic, extensible model for combining domain personalization schemes.

A scalable inference procedure that extends to millions of users.

Demonstrates strong predictive performance on large real-world data.

Questions?
