
Page 1:

Recommender Systems and their Effects

Jakub Mareček, IBM Research - Ireland

TCD, March 20th, 2019

Page 2:

The Agenda

1. “Up the wall game”: Matrix completion with interval uncertainty sets for some entries. Sparse noise on top. Online variants.

2. “Now for something completely different”: What if everyone used the recommenders?

3. “What would the user do without the recommender?”: User modelling, when users interact with recommenders.

Based on joint work with a number of fabulous colleagues, incl. Albert Akhriev (IBM), Jonathan Epperlein (IBM), Andre Fioravanti (UNICAMP), Mark Kozdoba (Technion), Shie Mannor (Technion), Peter Richtarik (Edinburgh/KAUST), Robert Shorten (IBM/UCD), Andrea Simonetto (IBM), Matheus Souza (UNICAMP), Martin Takac (Lehigh), Tigran Tchrakian (IBM), Fabian Wirth (Passau), Jing Xu (Penn/SingTel), and Jia Yuan Yu (Concordia).

Page 3:

Matrix Completion in Recommender Systems

• Let us consider the simplest abstraction of a recommender system:

• There is a matrix, where each row corresponds to one user and each column corresponds to a product or service

• Every user rates only a modest number of products or services, i.e., only some elements of the matrix are known

• Without imposing any further requirements on the matrix, there are infinitely many completions

• The search for the most succinct explanation corresponds to rank minimisation

• Netflix prize: A 2006 challenge with a grand prize of US $1,000,000
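As an aside on the rank-minimisation view, the following is a minimal sketch of a generic nuclear-norm-style completion (soft-impute, i.e., iterative singular-value shrinkage) on a toy ratings matrix. This is a textbook baseline added for illustration, not the algorithm of this talk; the threshold tau and the iteration count are arbitrary assumptions.

```python
import numpy as np

# Toy ratings matrix: rows = users, columns = products; np.nan = unrated.
M = np.array([[5.0, 4.0, np.nan, 1.0],
              [4.0, np.nan, 1.0, 1.0],
              [1.0, 1.0, 5.0, np.nan],
              [np.nan, 1.0, 4.0, 5.0]])
mask = ~np.isnan(M)

X = np.where(mask, M, 0.0)         # initial completion: zeros at unknown entries
tau = 1.0                          # singular-value shrinkage threshold (assumed)
for _ in range(200):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s = np.maximum(s - tau, 0.0)   # soft-threshold the spectrum: a nuclear-norm proximal step
    X = (U * s) @ Vt
    X[mask] = M[mask]              # keep the observed ratings fixed

print(np.round(X, 2))              # low-rank guesses at the missing ratings
```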

Page 4:

Our Extensions

• The matrix changes over time: $M_k$ at time $k$.

• Instead of some elements $(i, j)$ of $M_k$, there are intervals $[\underline{M}_{k,ij}, \overline{M}_{k,ij}]$, e.g., $[x - \Delta, x + \Delta]$.

• Sparse noise in the observations, block-wise (e.g., RGB in video, sensor readings from one site).

• Upper and lower bounds on all elements, e.g., for a scale $[0, B]$ one has $[\max\{0, x - \Delta\}, \min\{x + \Delta, B\}]$.

Page 5:

Our Extensions

This is based on joint work with Albert Akhriev (IBM), Dimitrios Gunopulos (Athens), Vana Kalogeraki (Athens), Stathis Maroulis (Athens), Peter Richtarik (Edinburgh/KAUST), Andrea Simonetto (IBM), and Martin Takac (Lehigh):

• Matrix Completion under Interval Uncertainty
  European Journal of Operational Research 256(1), 35–43, 2017
  https://arxiv.org/abs/1408.2467

• Low-rank Methods in Event Detection
  https://arxiv.org/abs/1802.03649

• Pursuit of Low-Rank Models of Time-Varying Matrices Robust to Sparse and Measurement Noise
  https://arxiv.org/abs/1809.03550

Page 6:

The Off-line Setting

There exists $R_k \in \mathbb{R}^{r \times nN}$ such that our observations $x_d \in \mathbb{R}^{nN}$ for row $d$ are

$$(x_d)_i = (1_n - I_{i,k}) \circ \left[ (R_k^T c_d)_i + (e_d)_i \right] + I_{i,k} \circ s_i, \quad \text{for block } i, \qquad (1)$$

where

• the vector $c_d \in \mathbb{R}^r$ weighs the rows of matrix $R_k$,

• $e_d \in \mathbb{R}^{nN}$ is the noise vector, where each entry is uniformly distributed between known, fixed $-\Delta$ and $\Delta$,

• $s_i \in \mathbb{R}^n$ is an arbitrary noise vector,

• the Boolean vector $I_i \in \{0, 1\}^n$ has entries that are all ones or all zeros, depending on whether we receive a measurement belonging to our model or not,

• $\circ$ denotes element-wise multiplication.

Page 7:

The Off-line Setting

Assumption

For each $(i, j)$ of $M_k$ there is a finite element-wise upper bound $\overline{M}_{k,ij}$ and a finite element-wise lower bound $\underline{M}_{k,ij}$.

This assumption is satisfied even for missing values at $(i, j)$ whenever the measurements naturally lie in a bounded set, e.g., $[0, 255]$ in many computer-vision applications.

Page 8:

Algorithm 1: MACO
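The algorithm listing itself did not survive the export. As a hedged sketch of the idea behind MACO (alternating updates of the factors C and R, fitted against a reconstruction clipped into the interval uncertainty sets), one might write the following; the ridge term, iteration count, and function names are illustrative assumptions, not the paper's exact listing.

```python
import numpy as np

def maco_sketch(L, U, mask, rank=3, iters=100, lam=1e-3, seed=0):
    """Toy interval matrix completion: seek X = C @ R with
    L[i, j] <= X[i, j] <= U[i, j] on the observed entries (mask == True)."""
    rng = np.random.default_rng(seed)
    m, n = L.shape
    C = rng.standard_normal((m, rank))
    R = rng.standard_normal((rank, n))
    for _ in range(iters):
        T = C @ R                                     # current reconstruction
        T[mask] = np.clip(T[mask], L[mask], U[mask])  # project into the intervals
        # Alternating ridge-regularised least squares against the target T.
        C = T @ R.T @ np.linalg.inv(R @ R.T + lam * np.eye(rank))
        R = np.linalg.inv(C.T @ C + lam * np.eye(rank)) @ C.T @ T
    return C, R
```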

Page 9:

Algorithm 2: The Sparse Noise on top

Page 10:

Convergence Rate

Theorem

There exists $\tau > 0$ such that Algorithm 1, initialized with the all-zero vector, after at most $T = O(\log \frac{1}{\varepsilon})$ steps has $f(C_T, R_T) \le f^* + \varepsilon$ with probability 1.

Notice that we do not assume $\alpha$-MSC, $\beta$-MSS, and $C$-robust bistability, but rather prove them.

Page 11:

The On-Line Setting

Assumption

The variation of the observation matrix $M_k$ at two subsequent instants $k$ and $k-1$ is such that

$$|f(C_k, R_k; M_k) - f(C_k, R_k; M_{k-1})| \le e,$$

for all instants $k > 0$.

Page 12:

A Bound on the Tracking Error

Theorem

Let the two assumptions above hold. Then Algorithm 2, starting from all-zero matrices, generates a sequence of matrices $\{(C_k, R_k)\}$ for which

$$f(C_k, R_k; M_k) - f(C_k^*, R_k^*; M_k) \le \eta_0 \left( f(C_{k-1}, R_{k-1}; M_{k-1}) - f(C_{k-1}^*, R_{k-1}^*; M_{k-1}) \right) + \eta_0 e, \qquad (2)$$

for some $\eta_0 < 1$. And in the limit,

$$\limsup_{k \to \infty} \; f(C_k, R_k; M_k) - f(C_k^*, R_k^*; M_k) \le \frac{\eta_0 e}{1 - \eta_0} =: E. \qquad (3)$$

Page 13:

Our Evaluation

• A benchmark in image in-painting

• Small Netflix: a 95,526 × 3,561 matrix, completed at rank 2 or 3

• Well-known dataset: 100,198,805 ratings from 480,189 users considering 17,770 products

• Yelp’s Academic Dataset¹, from which we have extracted a 252,898 × 41,958 matrix with 1,125,458 non-zeros, again on the 1–5 scale

• changedetection.net

¹ https://www.yelp.co.uk/academic_dataset

Page 14:

Our Results: A Benchmark in In-painting

Inst./Alg.   SVT      SVP      SoftImpute  LMaFit   ADMiRA   JS       OR1MP    EOR1MP   MACO
Barbara      26.9635  25.2598  25.6073     25.9589  23.3528  23.5322  26.5314  26.4413  23.8015
Camer.       25.6273  25.9444  26.7183     24.8956  26.7645  24.6238  27.8565  27.8283  28.9670
Clown        28.5644  19.0919  26.9788     27.2748  25.7019  25.2690  28.1963  28.2052  29.0057
Couple       23.1765  23.7974  26.1033     25.8252  25.6260  24.4100  27.0707  27.0310  27.1824
Crowd        26.9644  22.2959  25.4135     26.0662  24.0555  18.6562  26.0535  26.0510  26.1705
Girl         29.4688  27.5461  27.7180     27.4164  27.3640  26.1557  30.0878  30.0565  30.4110
Goldhill     28.3097  16.1256  27.1516     22.4485  26.5647  25.9706  28.5646  28.5101  28.6265
Lenna        28.1832  25.4586  26.7022     23.2003  26.2371  24.5056  28.0115  27.9643  28.3581
Man          27.0223  25.3246  25.7912     25.7417  24.5223  23.3060  26.5829  26.5049  26.5990
Peppers      25.7202  26.0223  26.8475     27.3663  25.8934  24.0979  28.0781  28.0723  28.8469

Page 15:

Our Results: Small Netflix

Figure: RMSE per epoch on Small Netflix, for rank 2 and rank 3, each with ∆ = 0 and ∆ = 1.

Page 16:

Our Results: A Well-Known Instance

Figure: Test and training RMSE against elapsed wall-clock time [s] on the well-known instance; curves labelled Test/Train 1, 2, 4, 8, and 16.

Page 17:

Our Results: A Well-Known Instance

Figure: Test and training RMSE against the number of epochs on the well-known instance; curves labelled Test/Train 1, 2, 4, 8, and 16.

Page 18:

Our Results: Kaggle

• We performed 10-fold cross-validation on the training set, using varying rank.

• For ranks 1, 2, 4, 8, 16, 32, and 50, the average error was 1.7958, 1.8284, 1.6464, 1.4590, 1.3395, 1.2702, and 1.2454, respectively.

• This seems to be comparable to the best results from the 2013 Recommender Systems Challenge².

² https://www.kaggle.com/c/yelp-recsys-2013

Page 19:

Our Results: changedetection.net

Page 20:

A Summary of Part 1

• Robust optimisation improves the statistical performance and does not cost much, computationally.

• Present-best statistical performance in matrix completion on an in-painting benchmark.

• In the off-line case, wall-clock run-times on a single computer comparable to those previously achieved on a large cluster.

• In the on-line case, present-best guarantees and a method practical for video data.

• What would happen if everyone received the same recommendations?

Page 21:

The Agenda

1. “Up the wall game”: Matrix completion with interval uncertainty sets for some entries. Sparse noise on top. Online variants.

2. “Now for something completely different”: What if everyone used the recommenders?

3. “What would the user do without the recommender?”: User modelling, when users interact with recommenders.

Based on joint work with a number of fabulous colleagues, incl. Albert Akhriev (IBM), Jonathan Epperlein (IBM), Andre Fioravanti (UNICAMP), Mark Kozdoba (Technion), Shie Mannor (Technion/Ford), Peter Richtarik (Edinburgh/KAUST), Robert Shorten (IBM/UCD), Andrea Simonetto (IBM), Matheus Souza (UNICAMP), Martin Takac (Lehigh), Tigran Tchrakian (IBM), Fabian Wirth (Passau), Jing Xu (Penn/SingTel), and Jia Yuan Yu (Concordia).

Page 22:

Closed-Loop Performance of Recommender Systems

Figure: Block diagram of the closed loop. A central authority broadcasts signals {µt} to Agent 1, Agent 2, …, Agent N (with parameters ω1, ω2, …, ωN); their actions a¹t, a²t, …, aᴺt are multiplexed (mux) into a histogram, i.e., the congestion profile nt; the per-resource costs c¹(n¹t), …, cᴹ(nᴹt) form c(nt), which is fed back through a one-step delay z⁻¹ to produce the next signal st.

Page 23:

Closed-Loop Performance of Recommender Systems

• Consider route recommendations to drivers in the form of public signalling.

• Even if transport authorities provide only information about travel times, the drivers pick their routes based on that information.

• Assume that travel time depends only on the number of concurrent users of the road (degree-4 polynomial link-performance functions in HCM 2010, the “Bureau of Public Roads functions”).

• Stability is sought while some measure of system performance is to be regulated.

• Solutions have to be acceptable to road users, e.g., each user is treated fairly, independent of the initial state.

Page 24:

A Simplified Model

• Transport can be modelled as a discrete-time dynamical system:
  • N agents want to travel from origin O to a destination D
  • A, B, …, denote the alternative routes from O to D
  • travel time on route A is a function $c_A : \mathbb{N} \to \mathbb{R}_+$ of the number of users on the road
  • $n^A_t$ agents pick A at time t
  • the social cost $C(n_t)$ weights the costs of actions by the proportions of agents taking them, i.e., $C(n_t) \triangleq \frac{n^A_t}{N}\, c_A(n^A_t) + \frac{n^B_t}{N}\, c_B(n^B_t) + \ldots$

Figure: An example. Left: $c_A \triangleq 1.2 + x/N$ (dashed line) and $c_B \triangleq 1 + (1.08 - x/N)^{-1/22}$ (solid line), plotted against $n/N$. Right: the social cost $C(n)$.

Page 25:

A Simplified Model

• A central agent has information about the state of the system and picks signals to send to users: $s^i_t := (y^{A,i}_t, y^{B,i}_t)$, where

$$y^{A,i}_t \triangleq c_A(n^A_{t-1}), \qquad y^{B,i}_t \triangleq c_B(n^B_{t-1})$$

(we will call this 0-scalar signalling later).

• Each driver i picks action $a^i_t$,

$$a^i_t = \begin{cases} \arg\min_{X \in \{A, B, \ldots\}} y^{X,i}_{t-1}, & \text{if } t > 1, \\ A, & \text{otherwise}, \end{cases} \qquad (4)$$

i.e., optimises a myopic objective.

Page 26:

A Simplified Model

• Let us consider two roads from O to D, e.g.:
  ▸ a local road, where the travel time increases sharply with congestion,
  ▸ a highway, where the free-flow travel time is higher than for the local road, but does not increase as much with congestion.

• Let us consider a number of agents who start travelling from O to D at the same time.

• If all the agents decide that the travel time on the local road was less than on the highway, all will use the local road.

• One obtains a limit cycle, in which the social cost (travel time, summed across all users) can be arbitrarily higher than in the social optimum; see the simulation sketch below.
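To make the limit cycle concrete, here is a minimal simulation sketch of the two-road story under the 0-scalar signalling of the previous slide, with every agent myopically choosing yesterday's cheaper road; the cost functions are illustrative assumptions, not those of the talk.

```python
N = 1000                                   # number of agents
c_local = lambda n: 1.0 + 3.0 * (n / N)    # local road: fast when empty, congests sharply
c_highway = lambda n: 2.0 + 0.5 * (n / N)  # highway: slower free-flow, congests mildly

n_local = N                                # everyone starts on the local road
for t in range(8):
    y_local, y_highway = c_local(n_local), c_highway(N - n_local)
    print(f"t={t}: n_local={n_local:4d}  cost_local={y_local:.2f}  cost_highway={y_highway:.2f}")
    # 0-scalar signalling: all agents see the same last-step costs,
    # so all of them flip to whichever road was cheaper -- a limit cycle.
    n_local = N if y_local <= y_highway else 0
```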

Page 27:

The Actual Model

• There is a finite set $\{1, 2, \ldots, M\}$ of alternative roads, from which each driver chooses exactly one at every time t.

• The travel time on road m is $c_m(n^m_t)$, where $n^m_t$ denotes the number of drivers choosing m at time t.

• The vector $n_t := [n^1_t \; \cdots \; n^M_t]$ is the congestion profile.

• The central authority knows the history of congestion profiles $\{n_\tau\}_{\tau=1}^{t-1}$.

• The goal is to minimise the aggregate travel time $\sum_{m=1}^M (n^m_t / N)\, c_m(n^m_t)$.

• Two scalars $u^m_t$ and $v^m_t$ are used to describe resource m.

Page 28:

r-Extreme Signalling

• Send an interval per road segment m at each time t:

$$u^m_t := \min_{j = t-r, \ldots, t-1} \; c_m(n^m_j) \qquad (5)$$

$$v^m_t := \max_{j = t-r, \ldots, t-1} \; c_m(n^m_j) \qquad (6)$$

• Each agent has its own myopic policy parametrized by $\omega \in [0, 1]$:

$$\pi_\omega(s_t) := \arg\min_{m = 1, \ldots, M} \; \omega u^m_t + (1 - \omega) v^m_t. \qquad (7)$$
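A small sketch of how the signals (5)–(6) and the policy (7) might be computed; the handling of the first r steps is left out as an assumption.

```python
import numpy as np

def r_extreme_signal(cost_history, r):
    """cost_history: array of shape (t, M) with past costs c_m(n^m_j).
    Returns (u, v): per-road minimum and maximum cost over the last r steps,
    as in (5) and (6)."""
    window = cost_history[-r:]               # costs at j = t-r, ..., t-1
    return window.min(axis=0), window.max(axis=0)

def myopic_policy(u, v, omega):
    """Road chosen by an agent with risk-aversion parameter omega, as in (7)."""
    return int(np.argmin(omega * u + (1.0 - omega) * v))

costs = np.array([[1.9, 2.1], [2.4, 2.0], [1.6, 2.2]])   # 3 steps, M = 2 roads
u, v = r_extreme_signal(costs, r=3)
print(myopic_policy(u, v, omega=0.5))        # a risk-neutral agent compares interval midpoints
```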

Page 29:

Weak Convergence for r-Extreme Signalling

Assumption (i.i.d. µt , “Population Renewal”)

The distribution $\mu_t$ is an i.i.d. sequence of random variables with $P(\mu_t = \eta_k) = d_k$, $0 < d_k < 1$, for all t and k.

Theorem (Marecek et al., IJC 2016)

There exists a constant $k'$ such that, under the assumption above, if the functions $\{c_m : m = 1, \ldots, M\}$ are $\ell$-Lipschitz continuous for $\ell < 1/(Nk')$, then there exists a unique limit random variable Z such that the congestion profile $n_t/N$ converges to Z in distribution as $t \to \infty$.

Page 30:

Exponential Smoothing

• Send an interval signal employing exponential smoothing on the past costs of resource m to obtain $u^m_t$, and a measure $v^m_t$ of their volatility:

$$u^m_t := (1 - q_1)\left( c_m(n^m_{t-1}) + q_1\, c_m(n^m_{t-2}) + \cdots + q_1^{t-1}\, c_m(n^m_1) \right)$$
$$v^m_t := (1 - q_2)\left( \left| c_m(n^m_{t-1}) - u^m_{t-1} \right| + q_2 \left| c_m(n^m_{t-2}) - u^m_{t-2} \right| + \cdots + q_2^{t-1} \left| c_m(n^m_1) - u^m_1 \right| \right) \qquad (8)$$
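In recursive form, (8) amounts to two exponentially weighted moving averages; a sketch might compute them as follows (the initialisation at the first step is an assumption):

```python
def smoothed_signal(costs, q1=0.8, q2=0.9):
    """costs: past costs c_m(n^m_1), ..., c_m(n^m_{t-1}) for one road m.
    Returns the sequences u, v of (8) via the equivalent recursions
    u_t = q1 u_{t-1} + (1 - q1) c_{t-1} and v_t = q2 v_{t-1} + (1 - q2) |c_{t-1} - u_{t-1}|."""
    u, v = [costs[0]], [0.0]                  # assumed initialisation
    for c in costs[1:]:
        v.append(q2 * v[-1] + (1.0 - q2) * abs(c - u[-1]))
        u.append(q1 * u[-1] + (1.0 - q1) * c)
    return u, v

u, v = smoothed_signal([2.0, 2.4, 1.6, 2.2])
print(u[-1], v[-1])    # smoothed cost and its volatility, broadcast for the next step
```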

Page 31:

Population Dynamics

• Each driver has its own myopic policy parametrized by $\omega \in [0, 1]$:

$$\pi_\omega(s_t) := \arg\min_{m = 1, \ldots, M} \; \omega u^m_t + (1 - \omega) v^m_t. \qquad (9)$$

• There is a family of possible distributions of the levels ω of risk aversion, with a finite index set $J = \{1, \ldots, K\}$.

• A Markov chain with K states and transition-probability matrix $P \in [0, 1]^{K \times K}$.

• The probability of appearance of population $\eta_j$, $j \in J$, at iteration $t + 1$ is now given by $P(\mu_{t+1} = \eta_j \mid \mu_t = \eta_i) = p_{ij}$, i.e., the probability of a specific $\eta_j$ depends on what the last observed population was.

Page 32:

Population Dynamics

A simple example:

$$P = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{pmatrix}$$

Figure: The corresponding state diagram, a cycle over four populations 1 → 2 → 3 → 4 → 1, labelled “morning”, “noon”, “evening”, and “night”.

Page 33:

Weak Convergence for Exponential Smoothing

Theorem (Epperlein, JM, Allerton 2017)

There exists a set of constants $\kappa'_m$, $m = 1, \ldots, M$, such that if the costs $c_m(\cdot)$ are $1/\kappa'_m$-Lipschitz (with respect to the 1-norm) and $q_2 > q_1$, then the signal $s_t$ and the congestion profile $n_t$ converge in distribution as $t \to \infty$.

Page 34:

Population Dynamics II

A more complicated example:

• The probability of the next population being $\eta_j$, given that the current one is $\eta_i$, should depend on “how different” they are.

• Let $\Delta(\cdot, \cdot)$ be a metric on the space of populations, e.g., the Wasserstein metric or a substitution metric.

• The probability $p_{ij}$ should be a decreasing function of $\Delta(\eta_i, \eta_j)$.

• With a parameter $\psi \in (0, 1)$ that reflects the probability of an agent changing its policy, $p_{ij} := \psi^{\Delta(\eta_i, \eta_j)}$.

Page 35:

Weak Convergence for Exponential Smoothing

Theorem (“Asymptotic Stability with Population Dynamics II”)

There exists a set of constants $\kappa'_m$, $m = 1, \ldots, M$, such that if the costs $c_m(\cdot)$ are $1/\kappa'_m$-Lipschitz (with respect to the 1-norm) and $q_2 > q_1$, then the signal $s_t$ and the congestion profile $n_t$ converge in distribution as $t \to \infty$.

Page 36:

Further Pointers

This part is based on joint work with Jonathan Epperlein, Robert Shorten, and Jia Yuan Yu:

• Signalling and Obfuscation for Congestion Control
  https://arxiv.org/abs/1406.7639
  http://dx.doi.org/10.1080/00207179.2015.1033758

• r-Extreme Signalling for Congestion Control
  https://arxiv.org/abs/1404.2458
  http://dx.doi.org/10.1080/00207179.2016.1146968

• Distributional Robustness in Congestion Control
  https://arxiv.org/abs/1705.09152 (ITS)

• Resource Allocation with Population Dynamics
  https://arxiv.org/abs/1703.07308
  http://dx.doi.org/10.1109/ALLERTON.2017.8262886

Page 37:

The Regulation Problem

• A controller C, which represents the central authority with private state $x_c(k) \in \mathbb{R}^{n_c}$, produces a signal $\pi(k) \in \Pi \subseteq \mathbb{R}^{n_\pi}$ at time k.

• $x_i(k)$ is the use of agent i at time k (a random variable).

• $y(k) \stackrel{\text{def}}{=} \sum_{i=1}^N x_i(k)$ is the aggregate resource utilisation at time k (a random variable).

• $\bar{y}(k)$ is the observable output of a filter F applied to $y(k)$.

• r is the desired utilisation of the resource.

• $e(k)$ is the error signal $\bar{y}(k) - r$ utilised by the controller.

Page 38:

The Regulation Problem

Page 39:

The Regulation Problem

With probability 1, we would like to see:

G1 feasibility: for all $k \in \mathbb{N}$,

$$\sum_{i=1}^N x_i(k) = y(k) \le r. \qquad (10)$$

G2 predictability: for each agent i there exists a constant $\bar{r}_i$ such that

$$\lim_{k \to \infty} \frac{1}{k+1} \sum_{j=0}^k x_i(j) = \bar{r}_i, \qquad (11)$$

where this latter limit is independent of initial conditions.

Page 40:

A Generalisation of Agents’ Response

In particular:

• $W_i \in \mathbb{N}$ state-transition maps $w_{ij} : \mathbb{R}^{n_i} \to \mathbb{R}^{n_i}$, $j = 1, \ldots, W_i$, and $H_i \in \mathbb{N}$ output maps $h_{il} : \mathbb{R}^{n_i} \to \mathbb{R}$, $l = 1, \ldots, H_i$.

•

$$x_i(k+1) \in \{ w_{ij}(x_i(k)) \mid j = 1, \ldots, W_i \} \qquad (12)$$
$$y_i(k) \in \{ h_{il}(x_i(k)) \mid l = 1, \ldots, H_i \} \qquad (13)$$

where the choice of agent i’s response at time k is governed by probability functions $p_{ij} : \Pi \to [0, 1]$, $j = 1, \ldots, W_i$, respectively $p'_{il} : \Pi \to [0, 1]$, $l = 1, \ldots, H_i$. Specifically, we have for all $k \in \mathbb{N}$ and all $\pi(k) \in \Pi$ that

$$P(x_i(k+1) = w_{ij}(x_i(k))) = p_{ij}(\pi(k)), \qquad (14a)$$
$$P(y_i(k) = h_{il}(x_i(k))) = p'_{il}(\pi(k)), \qquad (14b)$$
$$\sum_{j=1}^{W_i} p_{ij}(\pi) = \sum_{l=1}^{H_i} p'_{il}(\pi) = 1. \qquad (14c)$$

Page 41:

Negative Results

• The convergence of the error $e = r - y$, $\lim_{k \to \infty} e(k) = 0$, is often assumed to be assured by controllers with integral action, such as the proportional–integral (PI) controller:

$$\pi(k) = \pi(k-1) + \kappa \left[ e(k) - \alpha e(k-1) \right], \qquad (15)$$

which means its transfer function from e to π is given by

$$C(z) \stackrel{\text{def}}{=} \frac{\pi(z)}{e(z)} = \kappa\, \frac{1 - \alpha z^{-1}}{1 - z^{-1}}. \qquad (16)$$

• The integral action may be heavily dependent on the controller’s initial state.

• Our results apply to any controller with any sort of integral action, i.e., a pole at z = 1. We also have negative results for switched controllers, etc.

Page 42:

Negative Results: An Example

• Sets $A_i$, r and the coefficients of F are rational; $p_{ij}$ continuous.

• N = 10 agents, whose states $x_i$ are in the set $\{0, 1\}$, with $x_1$ to $x_5$ having ($i = 1, \ldots, 5$)

$$p_{i1}(x_i(k+1) = 1) = 0.02 + \frac{0.95}{1 + \exp(-100(\pi(k) - 5))},$$

whereas the remaining agents ($i = 6, \ldots, 10$) have

$$p_{i1}(x_i(k+1) = 1) = 0.98 - \frac{0.95}{1 + \exp(-100(\pi(k) - 1))}.$$

• r = 5. If the control signal $\pi(k) \gg 5$, then the first five agents are more likely to be active. On the other hand, if $\pi(k) \ll 1$, then the remaining ones are more likely to take the resource.

• PI controller with κ = 0.1 and α = −4.

• Lag controller with κ = 0.1, α = −4.01 and β = 0.99.
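As a rough illustration of the example, one might simulate it as follows; the error convention e(k) = r − y(k), the identity filter, and the initial conditions are assumptions made for this sketch.

```python
import math
import numpy as np

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def p_active(i, pi):
    """Probability that agent i takes the resource next, given the broadcast pi."""
    if i < 5:
        return 0.02 + 0.95 * sigmoid(100.0 * (pi - 5.0))
    return 0.98 - 0.95 * sigmoid(100.0 * (pi - 1.0))

rng = np.random.default_rng(0)
N, r, kappa, alpha = 10, 5.0, 0.1, -4.0
x = np.zeros(N)              # agent states in {0, 1}
pi_k, e_prev = 0.0, 0.0      # assumed initial controller state and error
for k in range(1000):
    e = r - x.sum()                               # error signal (identity filter assumed)
    pi_k = pi_k + kappa * (e - alpha * e_prev)    # PI update, cf. (15)
    e_prev = e
    x = (rng.random(N) < np.array([p_active(i, pi_k) for i in range(N)])).astype(float)

print("final aggregate:", int(x.sum()), " signal:", round(pi_k, 3))
```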

Page 43:

Negative Results: A Sample Trajectory

Figure: Filter output for a single simulation.

Page 44:

Negative Results: The Ergodic Aggregate Behaviour

Figure: Average number of active systems. Regulation is observed for the PI and for the lag controllers, within a given precision.

Page 45:

Negative Results: Non-Ergodic for the Individual

Figure: Average trajectory of the first agent for both controllers and for both initial controller states. Predictability is lost for the PI controller.

Page 46:

Negative Results: Non-Ergodic for the Individual

Figure: Average value for x1(1000) for different initial conditions of both controllers.

Page 47:

Negative Results: The Non-Ergodic Signal

Figure: Average value of the broadcast signal π(k) for both controllers and for both initial controller states.

Page 48:

Negative Results Formalised

Theorem

Consider N agents with states $x_i$, $i = 1, \ldots, N$. Assume that there is an upper bound L on the number of different values the agents can attain, i.e., for each i we have $x_i \in A_i = \{a_1, \ldots, a_{L_i}\} \subset \mathbb{R}$ for a given set $A_i$ and $1 \le L_i \le L$. Consider a system where $F : y \mapsto \bar{y}$ is a finite-memory moving-average (FIR) filter. Assume the controller $C_L$ is a linear, marginally stable single-input single-output (SISO) system with a pole $s_1 = e^{qi\pi}$ on the unit circle, where q is a rational number. In addition, let the probability functions $p_{ij} : \mathbb{R} \to [0, 1]$ be continuous for all $i = 1, \ldots, N$, $j = 1, \ldots, M_i$, i.e., if $\pi(k)$ is the output of $C_L$ at time k, then $P(x_i(k+1) = a_j) = p_{ij}(\pi(k))$. Then:

(i) The set $O_F$ of possible output values of the filter F is finite.

(ii) If the real additive group E generated by $\{r - \bar{y} \mid \bar{y} \in O_F\}$ is discrete, then the closed loop cannot be ergodic.

Page 49:

Positive Results I: Assumptions I

• An alternative linear controller:

$$C : \begin{cases} x_c(k+1) = A_c x_c(k) + B_c e(k), \\ \pi(k) = C_c x_c(k) + D_c e(k), \end{cases} \qquad (17)$$

where $x_c : \mathbb{N} \to \mathbb{R}^{n_c}$ is its internal state.

• A linear model for the filter F, e.g., for $y(k) := \sum_{i=1}^N x_i(k)$:

$$F : \begin{cases} x_f(k+1) = A_f x_f(k) + F_1 y(k) + F_2\, \mathbf{y}(k), \\ \bar{y}(k) = C_f x_f(k), \end{cases} \qquad (18)$$

where $\mathbf{y}$ stores the previous M values of y.

Page 50:

Positive Results II: Assumptions II

• That is, $\mathbf{y}$ evolves by $\mathbf{y}(k+1) = J\, y(k) + L\, \mathbf{y}(k)$ with

$$J = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \qquad L = \begin{bmatrix} 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}. \qquad (19)$$

Page 51:

Positive Results III

Theorem

Consider the system with C and F above. Assume that each agent $i \in \{1, \ldots, N\}$ has state $x_i$ with dynamics governed by the following affine stochastic difference equation:

$$x_i(k+1) = w_{ij}(x_i), \qquad (20)$$

where the affine mapping $w_{ij}$ is chosen at each step of time according to a Dini-continuous probability function $p_{ij}(x_i, \pi(k))$, out of $w_{ij}(x_i) \stackrel{\text{def}}{=} A_i x_i + b_{ij}$, where $A_i$ is a Schur matrix and, for all $i, \pi(k)$, $\sum_j p_{ij}(x_i, \pi(k)) = 1$. In addition, suppose that there exist scalars $\delta_i > 0$ such that $p_{ij}(x_i, \pi) \ge \delta_i > 0$; that is, the probabilities are bounded away from zero. Then, for every stable linear controller C and every stable linear filter F, the feedback loop converges in distribution to a unique invariant measure.

Page 52:

Positive Results IV

Consider a non-linear system with:

$$\begin{cases} x_i(k+1) \in \{ w_{ij}(x_i(k)) \mid j = 1, \ldots, W_i \}, \\ y_i(k) \in \{ h_{il}(x_i(k)) \mid l = 1, \ldots, H_i \}, \end{cases} \qquad (21)$$

$$y(k) = \sum_{i=1}^N y_i(k), \qquad (22)$$

$$F : \begin{cases} x_f(k+1) = w_f(x_f(k), y(k)), \\ \bar{y}(k) = h_f(x_f(k), y(k)), \end{cases} \qquad (23)$$

$$C : \begin{cases} x_c(k+1) = w_c(x_c(k), \bar{y}(k), r), \\ \pi(k) = h_c(x_c(k), \bar{y}(k), r). \end{cases} \qquad (24)$$

Page 53:

Positive Results V

• Let $X_i$, $i = 1, \ldots, N$, $X_C$ and $X_F$ be the state spaces of the agents, the controller and the filter.

• The system evolves on $X := \prod_{i=1}^N X_i \times X_C \times X_F$ as

$$x(k+1) := \begin{pmatrix} (x_i)_{i=1}^N \\ x_f \\ x_c \end{pmatrix}(k+1) \in \{ F_m(x(k)) \mid m \in \mathcal{M} \}. \qquad (25)$$

• Each map $F_m$ is of the form

$$F_m(x(k)) = \begin{pmatrix} (w_{ij_i}(x_i(k)))_{i=1}^N \\ w_f\!\left(x_f(k), \sum_{i=1}^N h_{il_i}(x_i(k))\right) \\ w_c(x_c(k), h_f(x_f(k))) \end{pmatrix}$$

• Maps $F_m$ are indexed by $m$ from (26), chosen with probability $q_m(\pi(k))$:

$$\prod_{i=1}^N \{(i, 1), \ldots, (i, W_i)\} \times \prod_{i=1}^N \{(i, 1), \ldots, (i, H_i)\}, \qquad (26)$$

$$P(x(k+1) = F_m(x(k))) = \left( \prod_{i=1}^N p_{i j_i}(\pi(k)) \right) \left( \prod_{i=1}^N p'_{i l_i}(\pi(k)) \right) =: q_m(\pi(k)).$$

Page 54:

Positive Results VI

Theorem

Assume that each agent $i$ has a state $x_i(k+1) = w_{ij}(x_i(k))$, $y_i(k) = h_{il}(x_i(k))$, where $w_{ij}$ and $h_{il}$ are globally Lipschitz-continuous functions with constants $l_{ij}$, resp. $l'_{il}$, with Dini-continuous probability functions $p_{ij}$, $p'_{il}$ and scalars $\delta, \delta' > 0$ such that $p_{ij}(\pi) \ge \delta > 0$, $p'_{il}(\pi) \ge \delta' > 0$ for all $(i, j)$, and one of the following holds:

(a) “contractivity”: for all $1 \le i \le N$, $1 \le j \le J$, $l_{ij} < 1$ and $l'_{ij} < 1$.

(b) “average contractivity”: for all $1 \le i \le N$, $\sum_{j=1}^J p_{ij}(x_i)\, l_{ij} < 1$, and for all $1 \le i \le N$, $\sum_{j=1}^J p'_{ij}(x_i)\, l'_{ij} < 1$.

(c) “marginal contractivity”: for all $1 \le i \le N$, $1 \le j \le J$, $l_{ij} \le 1$, and, with probability 1, there exist $i, j$ such that $l_{ij} < 1$; notice $p_{ij}(x_i) \ge \delta_i > 0$ by definition. Likewise for $l'_{ij}$.

Then, for every stable linear controller C and every stable linear filter F, the loop has a unique attractive invariant measure.

Page 55:

Positive Results VII

Next:

• Agents’ actions are limited to a finite set.

• The Lipschitz conditions of the previous theorems cannot be satisfied except in trivial cases.

• $X_S := \prod_{i=1}^N A_i$ is finite, and we consider the graph $G = (X_S, E)$, where there is an edge from a tuple $(x_i) \in X_S$ to $(y_i) \in X_S$ if there is a choice of maps $w_{ij}$ such that $(w_{ij}(x_i)) = (y_i)$.

Page 56:

Positive Results VIII

Theorem

Consider the feedback system depicted in Figure 1. Assume that $A_i$ is finite for each i. Assume that each agent $i \in \{1, \ldots, N\}$ has a state governed by the non-linear stochastic difference equations above. Assume we have Dini-continuous probability functions $p_{ij}$, $p'_{il}$ so that the probabilistic laws above are satisfied. Assume furthermore that there are scalars $\delta, \delta' > 0$ such that $p_{ij}(\pi) \ge \delta > 0$, $p'_{il}(\pi) \ge \delta' > 0$ for all $(i, j)$. Then, for every stable linear controller C and every stable linear filter F, the following holds: if the graph $G = (X_S, E)$ is strongly connected, then there exists an invariant measure for the feedback loop. If, in addition, the adjacency matrix of the graph is primitive, then the invariant measure is attractive and the system is ergodic.

Page 57:

Further Pointers

This part is based on joint work with Andre Fioravanti, Robert Shorten, Matheus Souza, and Fabian Wirth:

• On Classical Control and Smart Cities
  https://arxiv.org/abs/1703.07308
  https://dx.doi.org/10.1109/CDC.2017.8263852

• On the Ergodic Control of Ensembles
  https://arxiv.org/pdf/1807.03256

Page 58:

The Agenda

1. “Up the wall game”: Matrix completion with interval uncertainty sets for some entries. Sparse noise on top. Online variants.

2. “Now for something completely different”: What if everyone used the recommenders?

3. “What would the user do without the recommender?”: User modelling, when users interact with recommenders.

Based on joint work with a number of fabulous colleagues, incl. Albert Akhriev (IBM), Jonathan Epperlein (IBM), Andre Fioravanti (UNICAMP), Mark Kozdoba (Technion), Shie Mannor (Technion), Peter Richtarik (Edinburgh/KAUST), Robert Shorten (IBM/UCD), Andrea Simonetto (IBM), Matheus Souza (UNICAMP), Martin Takac (Lehigh), Tigran Tchrakian (IBM), Fabian Wirth (Passau), Jing Xu (Penn/SingTel), and Jia Yuan Yu (Concordia).

Page 59:

“What would the user do without the recommender?”

• If we want to claim that our recommender results in better performance than someone else’s recommender, we need a model of what the user would have done without the recommender.

• Such a “plain vanilla” user model is often not available: users already receive some recommendations.

• For example, consider advanced traveller information systems (ATIS) or satellite navigation systems.

Page 60:

Markov-Modulated Markov Chains

• Consider a Markov chain R with state space [R] and state $r_t \in [R]$, in which the transition probabilities

$$P(r_t = j \mid r_{t-1} = i) = a^R_{ij}(s_t)$$

depend on a latent random variable $s_t$.

• “The Markov chain is modulated by the random variable $s_t$.”

• “Markov-modulated Markov chain” (MMMC): $s_t$ is the state of another Markov chain S with transition matrix $A^S$ and state space [S].

Page 61:

Closed-Loop Markov-Modulated Markov Chains

• The current state of the visible Markov chain R also modulates the transition probabilities in S.

• The closed-loop MMMC (clMMMC) is a tuple $\mu = (A^R(\cdot), A^S(\cdot))$, where $A^R$ and $A^S$ both have pages.

• There is a partition $\Gamma = \{\Gamma_1, \ldots, \Gamma_p\}$ of [R] such that there is a page in $A^S$ for each $\Gamma_i$; $A^S : [p] \to [0, 1]^{S \times S}$, with $P(s_t = j \mid s_{t-1} = i) = a^S_{ij}(\gamma(r_t))$.
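To fix ideas, here is a minimal sketch of sampling a trajectory from a clMMMC; the two-state chains, the pages, and the partition map gamma are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Pages of A^R: one transition matrix for R per latent state s in {0, 1}.
A_R = np.array([[[0.9, 0.1], [0.2, 0.8]],
                [[0.3, 0.7], [0.6, 0.4]]])
# Pages of A^S: one transition matrix for S per part gamma(r) of the partition of [R].
A_S = np.array([[[0.95, 0.05], [0.05, 0.95]],
                [[0.50, 0.50], [0.50, 0.50]]])
gamma = lambda r: r            # toy partition: each visible state is its own part

r, s, trajectory = 0, 0, []
for t in range(20):
    r = rng.choice(2, p=A_R[s][r])            # visible chain, modulated by s
    s = rng.choice(2, p=A_S[gamma(r)][s])     # latent chain, modulated by gamma(r)
    trajectory.append(r)
print(trajectory)    # only r is observed; the pages are what estimation must recover
```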

Page 62:

Parameter Estimation

• A parameter-estimation problem: given a sequence of observations, i.e., a realization $(r_1 r_2 \cdots r_T)$ of the process defined by S and R, what are the underlying transition probabilities $A^S(\ell)$, $\ell = 1, \ldots, p$, and $A^R(\ell)$, $\ell = 1, \ldots, S$?

• An EM-type algorithm provides local optima of the maximum likelihood.

• More sophisticated algorithms allow for regret bounds.

• Notice, however, that until recently, no regret bounds (against Kalman filters) were available even for linear dynamical systems.

Page 63:

A Linear Dynamical System

Consider a simple linear dynamical system (G, F, v, W):

$$\phi_t = G \phi_{t-1} + \omega_t \qquad (27)$$
$$Y_t = F' \phi_t + \nu_t, \qquad (28)$$

where

• $Y_t$ are scalar observations,

• $\phi_t \in \mathbb{R}^{n \times 1}$ is the hidden state,

• $G \in \mathbb{R}^{n \times n}$ is the state-transition matrix,

• $F \in \mathbb{R}^{n \times 1}$ is the observation direction,

• $\omega_t$ is the process noise, $\mathcal{N}(0, W)$,

• $\nu_t$ is the observation noise, $\mathcal{N}(0, v)$,

• $\phi_0$ is the initial state, $\mathcal{N}(m_0, C_0)$.

Page 64:

Kalman Filter

• The Kalman filter is a key tool for time-series forecasting and analysis.

• An estimate of the current hidden state, given the observations, for $t \ge 1$:

$$m_t = E(\phi_t \mid Y_0, \ldots, Y_t), \qquad (29)$$

and let $C_t$ be the covariance matrix of $\phi_t$ given $Y_0, \ldots, Y_t$.

• Forecast of the next observation, given the current data:

$$f_{t+1} = E(Y_{t+1} \mid Y_t, \ldots, Y_0) = F' G m_t. \qquad (30)$$
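A minimal numpy sketch of the filter recursions behind (29)–(30), for scalar observations; the toy system matrices are assumptions.

```python
import numpy as np

def kalman_step(m, C, Y, G, F, W, v):
    """One Kalman-filter update: returns the posterior (m_t, C_t) and the
    one-step forecast f_t = F' G m_{t-1}, cf. (30)."""
    a = G @ m                        # prior mean of the state
    R = G @ C @ G.T + W              # prior covariance
    f = float(F.T @ a)               # forecast of Y_t
    Q = float(F.T @ R @ F) + v       # forecast variance
    A = (R @ F) / Q                  # Kalman gain
    m_new = a + A.flatten() * (Y - f)
    C_new = R - Q * np.outer(A, A)   # posterior covariance
    return m_new, C_new, f

# Toy two-dimensional system (illustrative values only).
G = np.array([[0.9, 0.1], [0.0, 0.8]])
F = np.array([[1.0], [0.0]])
W, v = 0.1 * np.eye(2), 0.5
m, C = np.zeros(2), np.eye(2)
for Y in [1.0, 0.7, 1.2, 0.9]:
    m, C, f = kalman_step(m, C, Y, G, F, W, v)
    print(f"forecast {f:+.3f}, observed {Y:+.3f}")
```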

Page 65:

Kalman Filter Unrolled

$$f_{t+1} = \underbrace{F' G A_t Y_t + F' \sum_{j=0}^{s-1} \left[ \left( \prod_{i=0}^{j} Z_{t-i} \right) G A_{t-j-1} Y_{t-j-1} \right]}_{\text{AR}(s+1)} + \underbrace{F' \left( \prod_{i=0}^{s} Z_{t-i} \right) a_{t-s}}_{\text{remainder term}}. \qquad (31)$$

Page 66:

The Results

Theorem

If the covariance matrix of the process noise is non-zero, then there is $\gamma = \gamma(W, v, F, G) < 1$ such that for every $x \in \mathbb{R}^n$,

$$[(I - A \otimes F) U x, (I - A \otimes F) U x] \le \gamma\, [x, x], \qquad (32)$$

where $[x, y] = \langle R x, y \rangle$ is the inner product induced on $\mathbb{R}^n$ by the limit R of $R_t$.

Page 67:

The Results

Theorem (LDS Approximation)

Let $L = L(F, G, v, W)$ be an observable LDS with $W > 0$.

1. For any $\varepsilon > 0$ and any $B_0 > 0$, there are $T_0 > 0$, $s > 0$ and $\theta \in \mathbb{R}^s$ such that for every sequence $Y_t$ with $|Y_t| \le B_0$, and for every $t > T_0$,

$$\left| f_{t+1} - \sum_{i=0}^{s-1} \theta_i Y_{t-i} \right| \le \varepsilon. \qquad (33)$$

2. For any $\varepsilon, \delta > 0$ and any $B_1 > 0$, there are $T_0 > 0$, $s > 0$ and $\theta \in \mathbb{R}^s$ such that for every sequence $Y_t$ with $|Y_{t+1} - Y_t| \le B_1$, and for every $t > T_0$,

$$\left| f_{t+1} - \sum_{i=0}^{s-1} \theta_i Y_{t-i} \right| \le 2 \max\left( \varepsilon, \delta |Y_t| \right). \qquad (34)$$

Page 68:

The Algorithm

1: Input: regression length s, domain bound D; observations $\{Y_t\}_{t=0}^\infty$, given sequentially.
2: Set the learning rate $\eta_t = t^{-1/2}$.
3: Initialize $\theta_s$ arbitrarily in D.
4: for t = s to ∞ do
5:     Predict $y_t = \sum_{i=0}^{s-1} \theta_{t,i} Y_{t-i-1}$
6:     Observe $Y_t$ and compute the loss $\ell_t(\theta_t)$
7:     Update $\theta_{t+1} \leftarrow \pi_D(\theta_t - \eta_t \nabla \ell_t(\theta_t))$
8: end for
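A hedged sketch of this online learner over AR(s) predictors, instantiated with the squared loss and Euclidean projection onto the ball of radius D; both instantiations are assumptions, as the slide does not spell out ℓt or πD.

```python
import numpy as np

def online_ar(Y, s=4, D=10.0):
    """Online gradient descent over AR(s) predictors, cf. the algorithm above."""
    theta = np.zeros(s)                          # start at the centre of the domain
    predictions = []
    for t in range(s, len(Y)):
        window = Y[t - s:t][::-1]                # Y_{t-1}, ..., Y_{t-s}
        predictions.append(theta @ window)       # step 5: predict
        grad = 2.0 * (predictions[-1] - Y[t]) * window   # gradient of squared loss (step 6)
        theta = theta - (t + 1) ** -0.5 * grad   # step 7: gradient step, eta_t = t^(-1/2) ...
        norm = np.linalg.norm(theta)
        if norm > D:
            theta *= D / norm                    # ... followed by projection onto ||theta|| <= D
    return np.array(predictions), theta

Y = np.sin(0.3 * np.arange(200)) + 0.1 * np.random.default_rng(2).standard_normal(200)
preds, theta = online_ar(Y)
print("final AR coefficients:", np.round(theta, 3))
```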

Page 69:

The Regret Bound

Theorem

Let S be a finite family of LDSs such that every $L = L(F, G, v, W) \in S$ is observable and has $W > 0$. Let $B_0$ be given. For any $\varepsilon > 0$, there are s, D, and $C_S$ such that the following holds: for every sequence $Y_t$ with $|Y_t| \le B_0$, if $\theta_t$ is the sequence produced by the algorithm with parameters s and D, then for every $T > 0$,

$$\sum_{t=0}^T \ell_t(\theta_t) - \min_{L \in S} \sum_{t=0}^T \ell(Y_t, f_t(L)) \le C_S + 2(D^2 + B_0^2)\sqrt{T} + \varepsilon T. \qquad (35)$$

Page 70:

Further Pointers

This part was based on joint work with Jonathan Epperlein, Mark Kozdoba, Shie Mannor, Robert Shorten, Tigran Tchrakian, and Jing Xu:

• On-Line Learning of Linear Dynamical Systems: Exponential Forgetting in Kalman Filters
  AAAI 2019, https://arxiv.org/abs/1809.05870

• Parameter Estimation in Gaussian Mixture Models with Malicious Noise, without Balanced Mixing Coefficients
  Allerton 2019, https://arxiv.org/abs/1711.08082

• Recovering Markov Models from Closed-Loop Data
  https://doi.org/10.1016/j.automatica.2019.01.022
  https://arxiv.org/pdf/1706.06359.pdf

• Parameter Estimation for Closed-Loop Markov Modulated Markov Chains with Applications in Recommender Systems
  submitted

Page 71:

The Conclusions

• “Data knots”.

• Closed-loop analyses of the effects of recommender systems.

• Even if the models of decision making are linear (e.g., Markovian), the closed-loop interactions with the recommenders render the parameter estimation non-convex.

• Questions and comments most welcome!

• We are seeking PhD students for internships!

• This work has been supported in part by the EU H2020 project VaVeL [grant agreement 688380].

Page 72:

Dini Continuity from Wikipedia

• Let X be a compact subset of a metric space (such as $\mathbb{R}^n$), and let $f : X \to X$ be a function from X into itself.

• The modulus of continuity of f is $\omega_f(t) = \sup_{d(x,y) \le t} d(f(x), f(y))$.

• The function f is called Dini-continuous if $\int_0^1 \frac{\omega_f(t)}{t}\, dt < \infty$.

• An equivalent condition is that, for any $\theta \in (0, 1)$, $\sum_{i=1}^\infty \omega_f(\theta^i a) < \infty$, where a is the diameter of X.
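As a quick worked example (added here, not from the deck): any Hölder-continuous $f$, with $\omega_f(t) \le C t^\alpha$ for some $\alpha > 0$, is Dini-continuous, since

$$\int_0^1 \frac{\omega_f(t)}{t}\, dt \le C \int_0^1 t^{\alpha - 1}\, dt = \frac{C}{\alpha} < \infty.$$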

Page 73:

Weak Convergence

• Convergence in distribution of x(1), x(2), x(3), … to a random variable X: $\lim_{n \to \infty} F_n(x) = F(x)$ for all x at which F is continuous, where $F_n$ and F are the cumulative distribution functions of x(n) and X.

• Almost sure convergence to a limit: the measure of the set of sample paths which converge is 1 (with respect to a probability measure on the set of sample paths).

• Convergence in probability: for every ε > 0 and δ > 0 there exists a $k_0$ such that for all $k > k_0$ the probability of being further away from the limit than δ is smaller than ε.

• Almost sure convergence implies convergence in distribution and convergence in probability.

• Patrick Billingsley: Convergence of Probability Measures (Wiley Series in Probability and Statistics).

Page 74:

Coupling Arguments I

• The proof relies on a theorem of Hairer, a Fields Medal laureate.

• Coupling arguments provide criteria for the existence of a unique invariant measure, essentially linking the existence of a coupling with the forgetfulness of initial conditions.

• $\Sigma^\infty$ (the “path space”) is the space of trajectories of a Σ-valued Markov chain $\{X(k)\}_{k \in \mathbb{N}}$, i.e., the space of all sequences $(x(0), x(1), x(2), \ldots)$ with $x(k) \in \Sigma$, $k \in \mathbb{N}$.

• A coupling of two measures $P_{\mu_1}, P_{\mu_2} \in M(\Sigma^\infty)$ is a measure on $\Sigma^\infty \times \Sigma^\infty$ whose marginals coincide with $P_{\mu_1}, P_{\mu_2}$.

• The set $C(P_{\mu_1}, P_{\mu_2})$ of couplings of $P_{\mu_1}, P_{\mu_2} \in M(\Sigma^\infty)$ is then defined by

$$\{ \Gamma \in M(\Sigma^\infty \times \Sigma^\infty) : \Pi^{(1)} \Gamma = P_{\mu_1}, \; \Pi^{(2)} \Gamma = P_{\mu_2} \}.$$

Page 75:

Coupling Arguments II

• We say that a coupling Γ is an asymptotic coupling if Γ has full measure on the pairs of convergent sequences. To make this precise, consider the following set, denoted D:

$$D := \left\{ (x_1, x_2) \in \Sigma^\infty \times \Sigma^\infty : \lim_{k \to \infty} \| x_1(k) - x_2(k) \| = 0 \right\};$$

Γ is an asymptotic coupling if $\Gamma(D) = 1$.

Theorem (Hairer et al.)

Let P be a Markov operator admitting two ergodic invariant measures $\mu_1$ and $\mu_2$. The following are equivalent:

(i) $\mu_1 = \mu_2$.

(ii) There exists an asymptotic coupling of $P_{\mu_1}$ and $P_{\mu_2}$.

If no asymptotic coupling of $P_{\mu_1}$ and $P_{\mu_2}$ exists, then $\mu_1$ and $\mu_2$ are distinct.