Probabilistic models in recommender systems: time variant models

2015-12-10

Eliezer de Souza da Silva (State-space models, Dynamic PMF via HDP)

Tomasz Kuśmierczyk (Tensor factorization)

Session 3: Time variant models

Tensor factorization

State-space models

Dynamic Bayesian PMF (via HDP)

Approximate and Scalable Inference for Complex Probabilistic Models in Recommender Systems

Part 1: Models and Representations

Literature / Sources

● Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization -- Xiong, L., Chen, X., Huang, T. K., Schneider, J. G., & Carbonell, J. G. 2010. SDM Proceedings.

● Dynamic Matrix Factorization: A State-Space Approach -- John Z. Sun, Kush R. Varshney and Karthik Subbian. 2012. ICASSP.

● Dynamic Bayesian Probabilistic Matrix Factorization -- Sotirios P. Chatzis. 2014. AAAI.

Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization

Matrix Factorization (previous cases)

[Figure: the normalized ratings matrix over users and M items, factorized into user factors (N x D) and item factors (M x D), with latent dimensions 1 to D.]

Tensor generalization (multi-way data)
- P-mode tensor of dimensions M1 x … x MP (example: observations x measurements x time x equipment).
- Multiple relationships between multidimensional variables
- Focus on 3-way (canonical decomposition or parallel factor analysis - CP)

CP Tensor Factorization (current case: 3-way analysis)

[Figure: the ratings tensor over users, M items and K context values, factorized into user factors (N x D), item factors (M x D) and context-value factors (K x D), with latent dimensions 1 to D.]
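
The CP factorization above can be illustrated with a minimal sketch. It assumes the three factor matrices are stored as NumPy arrays U (N x D), V (M x D) and T (K x D), and that a rating is predicted as the sum over the D latent dimensions of the three-way product of the matching rows. The function name `cp_reconstruct` is hypothetical and only serves the illustration.

```python
import numpy as np

def cp_reconstruct(U, V, T):
    """Rebuild the full N x M x K tensor from CP factors.

    Entry (i, j, k) is sum_d U[i, d] * V[j, d] * T[k, d].
    """
    return np.einsum("id,jd,kd->ijk", U, V, T)

# Toy factors: 2 users, 3 items, 4 context values, D = 2 latent dimensions.
rng = np.random.default_rng(0)
U = rng.normal(size=(2, 2))
V = rng.normal(size=(3, 2))
T = rng.normal(size=(4, 2))

R_hat = cp_reconstruct(U, V, T)
print(R_hat.shape)  # one predicted rating per (user, item, context) triple
```

With a single context value whose factor row is all ones, the same expression reduces to the matrix product U V^T of the previous (matrix factorization) case.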

Temporal ...

● 1 additional type of context = time (3D tensor instead of 2D matrix R)

● In practice:
○ ECCO sales: two context values per season (early/late season)
○ Netflix, MovieLens: one context value per month

MAP Approach: what's new to PMF

$$\arg\max_{U,V,T,T_0} \log p(U, V, T, T_0 \mid R) = \arg\max_{U,V,T,T_0} \left[ \log p(R \mid U, V, T, T_0) + \log p(U, V, T, T_0) \right]$$

● Four params (lambdas)
● SGD
● Block Coordinate Descent
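
To make the role of the four lambdas concrete, here is a minimal sketch of a regularized objective of this kind. It rests on assumptions that are not spelled out on the slides: Gaussian observation noise, zero-mean Gaussian priors on U and V, a Gaussian random-walk prior linking consecutive time factors (T0 to T[0], T[0] to T[1], and so on), and a Gaussian prior on T0 centered at `mu_T`. Under those assumptions the negative log posterior is, up to constants, a squared error plus four weighted penalties. The function name, the argument layout and the 1/2 scaling are illustrative choices.

```python
import numpy as np

def map_objective(R_obs, U, V, T, T0, lam_U, lam_V, lam_dT, lam_0, mu_T=None):
    """Negative log posterior up to constants (assumed Gaussian model).

    R_obs: list of (user i, item j, time k, rating) tuples.
    """
    idx = np.array([(i, j, k) for i, j, k, _ in R_obs])
    r = np.array([obs[3] for obs in R_obs], dtype=float)

    # CP prediction for every observed entry.
    pred = np.sum(U[idx[:, 0]] * V[idx[:, 1]] * T[idx[:, 2]], axis=1)
    fit = np.sum((r - pred) ** 2)

    if mu_T is None:
        mu_T = np.zeros_like(T0)

    # Random-walk penalty: each time factor should stay close to its predecessor.
    T_prev = np.vstack([T0, T[:-1]])
    smooth = np.sum((T - T_prev) ** 2)

    penalty = (lam_U * np.sum(U ** 2) + lam_V * np.sum(V ** 2)
               + lam_dT * smooth + lam_0 * np.sum((T0 - mu_T) ** 2))
    return fit + 0.5 * penalty

# Toy data: two observed ratings, all factors set to one, D = 3.
R_obs = [(0, 0, 0, 1.0), (1, 1, 1, 2.0)]
ones = np.ones((2, 3))
value = map_objective(R_obs, ones, ones, ones, np.ones(3), 1.0, 1.0, 1.0, 1.0)
print(value)
```

An objective of this form is differentiable in every factor, which is what makes SGD and block coordinate descent applicable.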

Bayesian approach

- Predictions for unobserved ratings: integrate over all params
- A posteriori distribution of params, given the observed evidence

Bayesian approach: expectation over the posterior distribution

Bayesian approach: MCMC estimate

- Sample from the posterior distribution
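
The MCMC estimate can be sketched as a plain Monte Carlo average: the integral over all params is replaced by the mean of the predictions made with each posterior sample. The sketch below assumes the sampler (for example a Gibbs sampler, not shown here) has already produced a list of (U, V, T) draws. The function name `mc_predict` and the stand-in samples are hypothetical, and the stand-ins are random arrays rather than real posterior draws.

```python
import numpy as np

def mc_predict(samples, i, j, k):
    """Average the CP prediction for entry (i, j, k) over posterior samples.

    samples: list of (U, V, T) tuples, one per retained MCMC draw.
    """
    preds = [float(np.dot(U[i] * V[j], T[k])) for U, V, T in samples]
    return float(np.mean(preds))

# Stand-in "samples" only to show the call; a real run would use sampler output.
rng = np.random.default_rng(1)
samples = [(rng.normal(size=(2, 3)), rng.normal(size=(4, 3)), rng.normal(size=(5, 3)))
           for _ in range(100)]
estimate = mc_predict(samples, i=0, j=2, k=4)
print(estimate)
```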

Linear state-space approach

- User latent factors are time dependent
- User latent factors are hidden states in a state-space system
- Gaussian assumptions for the dynamics allow exact inference
- Item latent factors are stationary
- Ratings are time dependent and observed

[Figure: time-dependent user features, stationary item factors, time-dependent ratings.]

Kalman filters: combining new information

- System dynamics
- Prediction
- Kalman gain
- Update
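
The steps named above (prediction, Kalman gain, update) can be sketched as one step of the standard Kalman filter. The sketch assumes linear-Gaussian system dynamics x_t = A x_{t-1} + w_t and measurements y_t = H x_t + v_t, with process noise covariance Q and measurement noise covariance R. The symbol names follow common textbook notation and may differ from the original slides.

```python
import numpy as np

def kalman_step(x, P, y, A, H, Q, R):
    """One predict/update cycle of a Kalman filter.

    x, P: previous state mean and covariance.
    y:    new measurement vector.
    """
    # Prediction: push the state estimate through the dynamics.
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q

    # Kalman gain: how much to trust the new measurement.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)

    # Update: correct the prediction with the measurement residual.
    x_new = x_pred + K @ (y - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Scalar example: prior N(0, 1), noiseless dynamics, unit measurement noise.
I1 = np.array([[1.0]])
x_new, P_new = kalman_step(np.array([0.0]), I1, np.array([2.0]),
                           A=I1, H=I1, Q=np.array([[0.0]]), R=I1)
print(x_new, P_new)
```

In the scalar example the prior and the measurement are equally uncertain, so the gain is 0.5 and the updated mean lands halfway between the prediction (0) and the measurement (2).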

PMF meets Kalman

- Parameters are time-independent
- Initial state is iid zero-mean Gaussian for all users, with similar scaling of preferences σU
- Process noise (time evolution of user preferences) and measurement noise (estimation of rating from user and item latent factors) are iid zero-mean Gaussians, σQ, σR
- Transitions (A) and measurements (item latent factors H) can be calculated to maximize the log-likelihood.
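
The assumptions listed above can be read as a generative model, sketched here for a single user. The sketch makes simplifying assumptions that are not on the slides: the noise terms are isotropic with standard deviations sigma_U, sigma_Q and sigma_R, and the user rates every item at every time step, so H holds all M item factor rows. The function name `simulate_user` is hypothetical.

```python
import numpy as np

def simulate_user(A, H, sigma_U, sigma_Q, sigma_R, n_steps, rng):
    """Simulate one user's latent trajectory and ratings.

    A: (D x D) transition matrix; H: (M x D) stationary item factors.
    """
    D = A.shape[0]
    x = rng.normal(0.0, sigma_U, size=D)  # initial state: zero-mean Gaussian
    states, ratings = [], []
    for _ in range(n_steps):
        # Process: user preferences drift over time.
        x = A @ x + rng.normal(0.0, sigma_Q, size=D)
        # Measurement: ratings are noisy inner products with item factors.
        y = H @ x + rng.normal(0.0, sigma_R, size=H.shape[0])
        states.append(x)
        ratings.append(y)
    return np.array(states), np.array(ratings)

rng = np.random.default_rng(2)
A = 0.9 * np.eye(3)
H = rng.normal(size=(4, 3))
states, ratings = simulate_user(A, H, 1.0, 0.1, 0.5, n_steps=5, rng=rng)
print(states.shape, ratings.shape)
```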

PMF meets Kalman: learning the parameters

- EM with expected joint likelihood maximization
- Other approaches: minimizing the residual prediction error, maximizing the prediction likelihood, maximizing the measurement likelihood, optimizing the performance after smoothing.

Dynamic Bayesian Probabilistic Matrix Factorization

- User patterns change over time
- Groups of users share latent structure (clustering of user features)
- Capture the dynamics of the generative process of the group structure
- dHDP - dynamic hierarchical Dirichlet process

Dirichlet distribution

Dirichlet process

- Distribution over distributions (each draw is a discrete distribution with infinitely many atoms)
- Clustering effect: the rich get richer
- Chinese Restaurant Process
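
The rich-get-richer clustering effect can be illustrated with a minimal sketch of the Chinese Restaurant Process. It assumes the standard formulation with a concentration parameter alpha: customer n (counting from zero) joins an existing table with probability proportional to the number of customers already seated there, and opens a new table with probability proportional to alpha. The function name `crp` is hypothetical.

```python
import numpy as np

def crp(n_customers, alpha, rng):
    """Sample table assignments from a Chinese Restaurant Process."""
    assignments = []  # table index chosen by each customer
    counts = []       # number of customers at each table
    for n in range(n_customers):
        # Existing tables weighted by occupancy, new table weighted by alpha.
        probs = np.array(counts + [alpha], dtype=float) / (n + alpha)
        table = int(rng.choice(len(probs), p=probs))
        if table == len(counts):
            counts.append(1)   # open a new table
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments, counts

rng = np.random.default_rng(3)
assignments, counts = crp(50, alpha=1.0, rng=rng)
print(counts)  # table sizes, typically a few large tables and several small ones
```

In the user-grouping setting, customers play the role of users and tables the role of groups, so the number of groups does not need to be fixed in advance.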

Hierarchical Dirichlet Process (HDP)

HDP for time domain

Bayesian PMF

[Figure: Bayesian PMF with groups of users.]
