probabilistic models in recommender systems: time variant models

Post on 06-Jan-2017


2015-12-10

Eliezer de Souza da Silva (State-space models, Dynamic PMF via HDP)

Tomasz Kuśmierczyk (Tensor factorization)

Session 3: Time variant models

Tensor factorization

State-space models

Dynamic Bayesian PMF (via HDP)

Approximate and Scalable Inference for Complex Probabilistic Models in Recommender Systems

Part 1: Models and Representations

Literature / Sources

● Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization -- Xiong, L., Chen, X., Huang, T. K., Schneider, J. G., & Carbonell, J. G. 2010. SDM Proceedings.

● Dynamic Matrix Factorization: A State-Space Approach -- John Z. Sun, Kush R. Varshney and Karthik Subbian. 2012. ICASSP.

● Dynamic Bayesian Probabilistic Matrix Factorization -- Sotirios P. Chatzis. 2014. AAAI.

Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization

Matrix Factorization (previous cases)

[Figure: the N x M matrix of normalized ratings (N users, M items) factorized into user factors (N x D) and item factors (M x D), with D latent dimensions]

Tensors generalization (multi-way data)

- P-mode tensor of dimensions M1 x … x Mp (example: observations x measurements x time x equipments)
- Multiple relationships between multidimensional variables
- Focus on 3-way (canonical decomposition or parallel factor analysis - CP)

CP Tensor Factorization (current case: 3-way analysis)

[Figure: the N x M x K tensor of normalized ratings (N users, M items, K contexts) factorized into user factors (N x D), item factors (M x D), and context-value factors (K x D), with D latent dimensions]
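To make the CP factorization concrete, here is a minimal sketch of how a rating could be predicted from the three factor matrices: each entry is the sum over latent dimensions of the element-wise product of one user row, one item row, and one context row. The function names and the toy sizes are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def cp_predict(U, V, T, i, j, k):
    """Predicted rating for user i, item j, context k: sum_d U[i,d] * V[j,d] * T[k,d]."""
    return float(np.sum(U[i] * V[j] * T[k]))

def cp_reconstruct(U, V, T):
    """Full N x M x K tensor of predictions built from the three factor matrices."""
    return np.einsum("id,jd,kd->ijk", U, V, T)

# Toy sizes (assumed for illustration): N users, M items, K contexts, D latent dims.
rng = np.random.default_rng(0)
N, M, K, D = 4, 5, 3, 2
U = rng.normal(size=(N, D))   # user factors    (N x D)
V = rng.normal(size=(M, D))   # item factors    (M x D)
T = rng.normal(size=(K, D))   # context factors (K x D)
R_hat = cp_reconstruct(U, V, T)
```

With K = 1 and a context row of ones, this reduces to the usual matrix factorization prediction, the inner product of a user row and an item row.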

Temporal ...

● 1 additional type of context = time (3D tensor instead of 2D matrix R)

● In practice:
  ○ ECCO sales: two context values per season (early/late season)
  ○ Netflix, Movielens: one context value per month

MAP Approach: what’s new to PMF

MAP Approach

argmax log p(U, V, T, T0 | R)
= argmax log p(R | U, V, T, T0) + log p(U, V, T, T0)

(by Bayes' rule; the term log p(R) is dropped because it does not depend on the parameters)

● Four params (lambdas)
● SGD
● Block Coordinate Descent
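A minimal sketch of what MAP estimation with SGD could look like for the 3-way model. It assumes an objective of the form squared error plus four weighted penalties: on the user factors, on the item factors, on the differences between consecutive time factors, and on the initial time factor. The names `lam_U`, `lam_V`, `lam_dT`, `lam_0`, the zero prior mean for the initial time factor, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def predict(U, V, T, i, j, k):
    return np.sum(U[i] * V[j] * T[k])

def map_objective(obs, U, V, T, lam):
    """Assumed negative log-posterior (up to constants); obs holds (i, j, k, rating)."""
    lam_U, lam_V, lam_dT, lam_0 = lam
    sq_err = sum((r - predict(U, V, T, i, j, k)) ** 2 for i, j, k, r in obs)
    return (sq_err
            + 0.5 * lam_U * np.sum(U ** 2)
            + 0.5 * lam_V * np.sum(V ** 2)
            + 0.5 * lam_dT * np.sum((T[1:] - T[:-1]) ** 2)  # smooth time factors
            + 0.5 * lam_0 * np.sum(T[0] ** 2))              # initial time factor

def time_prior_grad(T, k, lam_dT, lam_0):
    """Gradient of the two time-factor penalties with respect to T[k]."""
    g = np.zeros_like(T[k])
    if k > 0:
        g += lam_dT * (T[k] - T[k - 1])
    if k < len(T) - 1:
        g += lam_dT * (T[k] - T[k + 1])
    if k == 0:
        g += lam_0 * T[0]
    return g

def sgd(obs, U, V, T, lam, lr=0.02, epochs=200, seed=0):
    lam_U, lam_V, lam_dT, lam_0 = lam
    rng = np.random.default_rng(seed)
    # Per-row observation counts: each penalty gradient is split across the
    # observations touching that row, so it is applied once per epoch in total.
    cu = np.bincount([o[0] for o in obs], minlength=len(U))
    cv = np.bincount([o[1] for o in obs], minlength=len(V))
    ct = np.bincount([o[2] for o in obs], minlength=len(T))
    for _ in range(epochs):
        for idx in rng.permutation(len(obs)):
            i, j, k, r = obs[idx]
            e = r - predict(U, V, T, i, j, k)
            gU = -2 * e * V[j] * T[k] + lam_U * U[i] / cu[i]
            gV = -2 * e * U[i] * T[k] + lam_V * V[j] / cv[j]
            gT = -2 * e * U[i] * V[j] + time_prior_grad(T, k, lam_dT, lam_0) / ct[k]
            U[i] -= lr * gU
            V[j] -= lr * gV
            T[k] -= lr * gT
    return U, V, T

# Synthetic ratings from hypothetical "true" factors, about 60% observed.
rng = np.random.default_rng(1)
N, M, K, D = 6, 5, 4, 2
U_true, V_true, T_true = (rng.uniform(-1, 1, size=(n, D)) for n in (N, M, K))
obs = [(i, j, k, float(predict(U_true, V_true, T_true, i, j, k)))
       for i in range(N) for j in range(M) for k in range(K) if rng.random() < 0.6]
U, V, T = (0.5 * rng.uniform(-1, 1, size=(n, D)) for n in (N, M, K))
lam = (0.01, 0.01, 0.01, 0.01)
before = map_objective(obs, U, V, T, lam)
sgd(obs, U, V, T, lam)
after = map_objective(obs, U, V, T, lam)
```

Block coordinate descent would instead fix two of the factor matrices and solve for the third, which is a regularized least-squares problem in each block.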

Bayesian approach

● Predictions for unobserved ratings: integrate over all params, weighting by the a posteriori distribution of the params given the observed evidence
● The prediction is an expectation over the posterior distribution
● MCMC estimate: sample from the posterior distribution
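The MCMC estimate replaces the integral over parameters with an average over posterior samples. The sketch below shows only that averaging step. The "samples" are synthetic perturbations of fixed factors, used as a stand-in for the output of a real sampler, and `mc_predict` is a hypothetical helper name.

```python
import numpy as np

def mc_predict(samples, i, j, k):
    """Monte Carlo estimate of E[R_ijk | data] ~ (1/L) * sum_l <U_l[i], V_l[j], T_l[k]>.

    `samples` is a list of (U, V, T) tuples, assumed to be draws from the posterior.
    Returns the mean prediction and its spread across samples.
    """
    preds = [np.sum(U[i] * V[j] * T[k]) for U, V, T in samples]
    return float(np.mean(preds)), float(np.std(preds))

# Stand-in for sampler output (assumption): noisy copies of fixed factors.
rng = np.random.default_rng(0)
base = [rng.normal(size=(n, 2)) for n in (3, 4, 5)]
samples = [tuple(f + 0.05 * rng.normal(size=f.shape) for f in base)
           for _ in range(200)]
mean, spread = mc_predict(samples, 0, 1, 2)
```

Unlike the MAP point estimate, the spread across samples gives a rough measure of uncertainty for each predicted rating.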

Linear state-space approach

- User latent factors are time dependent
- User latent factors are hidden states in a state-space system
- Gaussian assumptions for the dynamics allow exact inference
- Item latent factors are stationary
- Ratings are time dependent and observed

[Figure: stationary item factors, time-dependent ratings, time-dependent user features]

Kalman filters: combining new information

- System dynamics
- Prediction
- Kalman gain
- Update
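The prediction, gain, and update steps can be sketched with the standard linear-Gaussian Kalman filter equations. In the recommender reading assumed here, the state `x` is one user's latent factor vector, the rows of `H` are the factors of the items rated at this time step, and `y` holds those ratings. The toy numbers are illustrative.

```python
import numpy as np

def kalman_step(x, P, y, A, H, Q, R):
    """One Kalman filter step for x_t = A x_{t-1} + w, y_t = H x_t + v.

    x, P: previous state mean and covariance
    Q, R: process and measurement noise covariances
    """
    # Prediction: propagate mean and covariance through the dynamics.
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Kalman gain: how much to trust the new measurement.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    # Update: correct the prediction with the measurement residual.
    x_new = x_pred + K @ (y - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Toy example (assumed values): one user, D = 2 latent factors, 3 rated items.
rng = np.random.default_rng(0)
D = 2
x, P = np.zeros(D), np.eye(D)             # prior on the user's factors
A, Q = np.eye(D), 0.01 * np.eye(D)        # slowly drifting preferences
H = rng.normal(size=(3, D))               # factors of the rated items
R = 0.1 * np.eye(3)
y = H @ np.array([1.0, -0.5])             # ratings from a hypothetical true state
x, P = kalman_step(x, P, y, A, H, Q, R)
```

In the scalar case with `A = H = 1`, `P = 1`, `Q = 0`, `R = 1`, the gain is 0.5, so the updated mean lands halfway between the prediction and the measurement.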

PMF meets Kalman

[Figure: stationary item factors, time-dependent ratings, time-dependent user features]

- Parameters are time-independent
- Initial state: iid zero-mean Gaussian for all users, with a similar scaling of preferences σU
- Process noise (time evolution of user preferences) and measurement noise (estimation of the rating from user and item latent factors) are iid zero-mean Gaussians with scales σQ and σR
- Transitions (A) and measurements (item latent factors H) can be estimated by maximizing the log-likelihood
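Written out, the assumptions above correspond to a linear-Gaussian state-space model of roughly the following form. The symbols x (user state), r (observed ratings), w and v (noise terms), and the indices i (user) and t (time) are notation introduced here for illustration. A, H, σU, σQ and σR are the quantities named on the slide.

```latex
\begin{align}
  x_{i,0} &\sim \mathcal{N}\!\left(0,\ \sigma_U^2 I\right)
    && \text{initial user state} \\
  x_{i,t} &= A\, x_{i,t-1} + w_{i,t},
    \quad w_{i,t} \sim \mathcal{N}\!\left(0,\ \sigma_Q^2 I\right)
    && \text{process (preference drift)} \\
  r_{i,t} &= H\, x_{i,t} + v_{i,t},
    \quad v_{i,t} \sim \mathcal{N}\!\left(0,\ \sigma_R^2 I\right)
    && \text{measurement (ratings)}
\end{align}
```

In practice only the rows of H for the items a user actually rated at time t enter the measurement equation.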

PMF meets Kalman: learning the parameters

- EM with expected joint likelihood maximization
- Other approaches: minimizing the residual prediction error, maximizing the prediction likelihood, maximizing the measurement likelihood, optimizing the performance after smoothing

Dynamic Bayesian Probabilistic Matrix Factorization

- User patterns change over time
- Groups of users share latent structure (clustering of user features)
- Captures the dynamics of the generative process of the group structure
- dHDP: dynamic hierarchical Dirichlet process

Dirichlet distribution

Dirichlet process

- Distribution over distributions (each draw is a discrete distribution with infinitely many atoms)
- Clustering effect: the rich get richer
- Chinese restaurant process
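The rich-get-richer effect is easy to see by simulating the Chinese restaurant process: customer n + 1 joins an existing table with probability proportional to its current size, or opens a new table with probability proportional to the concentration parameter. This is a minimal sketch. The function name and the parameter values are illustrative.

```python
import random

def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Simulate table assignments under a CRP with concentration alpha > 0."""
    rng = random.Random(seed)
    tables = []        # tables[k] = number of customers seated at table k
    assignments = []   # assignments[n] = table index of customer n
    for _ in range(n_customers):
        # Existing table k has weight tables[k]; a new table has weight alpha.
        weights = tables + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)   # open a new table
        else:
            tables[k] += 1     # join an existing table
        assignments.append(k)
    return assignments, tables

assignments, tables = chinese_restaurant_process(100, alpha=1.0)
```

Larger values of `alpha` tend to produce more tables. In the user-grouping reading, tables play the role of clusters of users.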

Hierarchical Dirichlet Process (HDP)

HDP for time domain

[Figure labels: Bayesian PMF, dHDP, groups of users]
