Bilinear Prediction Using Low-Rank Models

Inderjit S. Dhillon
Dept of Computer Science, UT Austin

26th International Conference on Algorithmic Learning Theory
Banff, Canada, Oct 6, 2015

Joint work with C-J. Hsieh, P. Jain, N. Natarajan, H. Yu and K. Zhong


Page 2:

Outline

Multi-Target Prediction

Features on Targets: Bilinear Prediction

Inductive Matrix Completion
  1 Algorithms
  2 Positive-Unlabeled Matrix Completion
  3 Recovery Guarantees

Experimental Results

Page 3:

Sample Prediction Problems

Predicting stock prices

Predicting risk factors in healthcare

Page 4:

Regression

Real-valued responses (target) t

Predict response for given input data (features) a

Page 5:

Linear Regression

Estimate the target by a linear function of the given data a, i.e. t ≈ t̂ = a^T x.

Page 6:

Linear Regression: Least Squares

Choose x that minimizes

J(x) = (1/2) Σ_{i=1}^n (a_i^T x − t_i)^2

Closed-form solution: x* = (A^T A)^{−1} A^T t.
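The closed form above can be checked numerically. This is a minimal NumPy sketch on synthetic data (the sizes and variable names here are illustrative, not from the talk); it solves the normal equations with a linear solve rather than an explicit inverse.

```python
import numpy as np

# Rows of A are the inputs a_i; t holds the targets t_i.
rng = np.random.default_rng(0)
n, d = 100, 5
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
t = A @ x_true  # noiseless targets, so the closed form recovers x_true

# x* = (A^T A)^{-1} A^T t, via a solve instead of inverting A^T A
x_star = np.linalg.solve(A.T @ A, A.T @ t)

assert np.allclose(x_star, x_true)
```

In practice `np.linalg.lstsq` is preferred over forming A^T A explicitly, since it avoids squaring the condition number.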

Page 7:

Prediction Problems: Classification

Spam detection

Character Recognition

Page 8:

Binary Classification

Categorical responses (target) t

Predict response for given input data (features) a

Linear methods — decision boundary is a linear surface or hyperplane

Page 9:

Linear Methods for Prediction Problems

Regression:

  Ridge Regression: J(x) = (1/2) Σ_{i=1}^n (a_i^T x − t_i)^2 + λ‖x‖_2^2

  Lasso: J(x) = (1/2) Σ_{i=1}^n (a_i^T x − t_i)^2 + λ‖x‖_1

Classification:

  Linear Support Vector Machines: J(x) = (1/2) Σ_{i=1}^n max(0, 1 − t_i a_i^T x) + λ‖x‖_2^2

  Logistic Regression: J(x) = (1/2) Σ_{i=1}^n log(1 + exp(−t_i a_i^T x)) + λ‖x‖_2^2
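Of these, ridge regression also has a closed form. A minimal sketch on synthetic data, under the slide's convention J(x) = (1/2) Σ (a_i^T x − t_i)^2 + λ‖x‖_2^2: setting the gradient A^T(Ax − t) + 2λx to zero gives x* = (A^T A + 2λI)^{−1} A^T t.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lam = 200, 10, 0.5
A = rng.standard_normal((n, d))
t = rng.standard_normal(n)

# x* = (A^T A + 2*lam*I)^{-1} A^T t
x_ridge = np.linalg.solve(A.T @ A + 2 * lam * np.eye(d), A.T @ t)

# Sanity check: the gradient of J vanishes at x_ridge.
grad = A.T @ (A @ x_ridge - t) + 2 * lam * x_ridge
assert np.allclose(grad, 0.0, atol=1e-8)
```

The factor of 2 disappears if the regularizer is written (λ/2)‖x‖_2^2, a common alternative convention.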

Page 10:

Linear Prediction

Page 12:

Multi-Target Prediction

Page 13:

Modern Prediction Problems in Machine Learning

Ad-word Recommendation

Page 14:

Modern Prediction Problems in Machine Learning

Ad-word Recommendation

geico auto insurance

geico car insurance

car insurance

geico insurance

need cheap auto insurance

geico com

car insurance coupon code

Page 15:

Modern Prediction Problems in Machine Learning

Wikipedia Tag Recommendation

Learning in computer vision

Machine learning

Learning

Cybernetics...

Page 16:

Modern Prediction Problems in Machine Learning

Predicting causal disease genes

Page 17:

Prediction with Multiple Targets

In many domains, the goal is to simultaneously predict multiple target variables

Multi-target regression: targets are real-valued

Multi-label classification: targets are binary

Page 18:

Prediction with Multiple Targets

Applications

Bid word recommendation

Tag recommendation

Disease-gene linkage prediction

Medical diagnoses

Ecological modeling

. . .

Page 19:

Prediction with Multiple Targets

Input data a_i is associated with m targets, t_i = (t_i^(1), t_i^(2), . . . , t_i^(m))

Page 20:

Multi-target Linear Prediction

Basic model: Treat targets independently

Estimate regression coefficients x_j for each target j

Page 21:

Multi-target Linear Prediction

Assume targets t^(j) are independent

Linear predictive model: t_i^T ≈ a_i^T X

Page 22:

Multi-target Linear Prediction

Assume targets t^(j) are independent

Linear predictive model: t_i^T ≈ a_i^T X

Multi-target regression problem has a closed-form solution:

arg min_X ‖T − AX‖_F^2 = V_A Σ_A^{−1} U_A^T T

where A = U_A Σ_A V_A^T is the thin SVD of A
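This SVD-based closed form can be verified in a few lines of NumPy on synthetic data (sizes here are illustrative); for A with full column rank it agrees with solving each target's least-squares problem independently.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, m = 100, 8, 6
A = rng.standard_normal((n, d))
T = rng.standard_normal((n, m))

# Thin SVD A = U_A S_A V_A^T, then X = V_A S_A^{-1} U_A^T T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
X_svd = Vt.T @ np.diag(1.0 / s) @ (U.T @ T)

# Agrees with per-target least squares (one column of T at a time).
X_lstsq = np.linalg.lstsq(A, T, rcond=None)[0]
assert np.allclose(X_svd, X_lstsq)
```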


In multi-label classification: Binary Relevance (an independent binary classifier for each label)

Page 24:

Multi-target Linear Prediction: Low-rank Model

Exploit correlations between targets T , where T ≈ AX

Reduced-Rank Regression [A. J. Izenman, 1975] — model the coefficient matrix X as low-rank

A. J. Izenman. Reduced-rank regression for the multivariate linear model. Journal of Multivariate Analysis 5(2): 248–264 (1975).

Page 25:

Multi-target Linear Prediction: Low-rank Model

X is rank-k

Linear predictive model: t_i^T ≈ a_i^T X


Low-rank multi-target regression problem has a closed-form solution:

X* = arg min_{X: rank(X) ≤ k} ‖T − AX‖_F^2

   = V_A Σ_A^{−1} U_A^T T_k    if A is full row rank,
     V_A Σ_A^{−1} M_k          otherwise,

where A = U_A Σ_A V_A^T is the thin SVD of A, M = U_A^T T, and T_k, M_k are the best rank-k approximations of T and M respectively.
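A sketch of the second branch (A with full column rank, the typical n > d case) in NumPy, on synthetic data; `best_rank_k` is a small helper introduced here for the Eckart–Young truncation. As a sanity check, the reduced-rank optimum is at least as good as naively truncating the unconstrained least-squares solution to rank k.

```python
import numpy as np

def best_rank_k(M, k):
    """Best rank-k approximation via truncated SVD (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

rng = np.random.default_rng(3)
n, d, m, k = 100, 8, 6, 2
A = rng.standard_normal((n, d))
T = rng.standard_normal((n, m))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
M = U.T @ T
X_star = Vt.T @ np.diag(1.0 / s) @ best_rank_k(M, k)  # rank-k optimum

obj = lambda X: np.linalg.norm(T - A @ X) ** 2
X_naive = best_rank_k(Vt.T @ np.diag(1.0 / s) @ M, k)  # truncated full LS
assert np.linalg.matrix_rank(X_star) <= k
assert obj(X_star) <= obj(X_naive) + 1e-6
```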

Page 27:

Modern Challenges

Page 28:

Multi-target Prediction with Missing Values

In many applications, several observations (targets) may be missing

E.g. recommending tags for images and Wikipedia articles

Page 30:

Multi-target Prediction with Missing Values


Low-rank model: t_i^T = a_i^T X, where X is low-rank

Page 32:

Canonical Correlation Analysis

Page 33:

Bilinear Prediction

Page 34:

Bilinear Prediction

Augment multi-target prediction with features on targets as well

Motivated by modern applications of machine learning — bioinformatics, auto-tagging articles

Need to model dyadic or pairwise interactions

Move from linear models to bilinear models — linear in input features as well as target features

Page 35:

Bilinear Prediction

Bilinear predictive model: T_ij ≈ a_i^T X b_j


Corresponding regression problem has a closed-form solution:

arg min_X ‖T − A X B^T‖_F^2 = V_A Σ_A^{−1} U_A^T T U_B Σ_B^{−1} V_B^T

where A = U_A Σ_A V_A^T and B = U_B Σ_B V_B^T are the thin SVDs of A and B
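The bilinear closed form can also be checked directly in NumPy (synthetic, noiseless data, so the formula recovers the ground-truth X exactly):

```python
import numpy as np

rng = np.random.default_rng(4)
n1, n2, d1, d2 = 60, 50, 7, 5
A = rng.standard_normal((n1, d1))
B = rng.standard_normal((n2, d2))
X_true = rng.standard_normal((d1, d2))
T = A @ X_true @ B.T  # noiseless observations

# Thin SVDs, then X = V_A S_A^{-1} U_A^T T U_B S_B^{-1} V_B^T.
Ua, sa, Vat = np.linalg.svd(A, full_matrices=False)
Ub, sb, Vbt = np.linalg.svd(B, full_matrices=False)
X_star = Vat.T @ np.diag(1.0 / sa) @ Ua.T @ T @ Ub @ np.diag(1.0 / sb) @ Vbt

assert np.allclose(X_star, X_true)
```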

Page 39:

Bilinear Prediction: Low-rank Model

X is rank-k

Bilinear predictive model: T_ij ≈ a_i^T X b_j


Corresponding regression problem has a closed-form solution:

X* = arg min_{X: rank(X) ≤ k} ‖T − A X B^T‖_F^2

   = V_A Σ_A^{−1} U_A^T T_k U_B Σ_B^{−1} V_B^T    if A, B are full row rank,
     V_A Σ_A^{−1} M_k Σ_B^{−1} V_B^T              otherwise,

where A = U_A Σ_A V_A^T, B = U_B Σ_B V_B^T are the thin SVDs of A and B, M = U_A^T T U_B, and T_k, M_k are the best rank-k approximations of T and M

Page 41:

Modern Challenges in Multi-Target Prediction

Millions of targets

Correlations among targets

Missing values

Page 42:

Modern Challenges in Multi-Target Prediction

Millions of targets (Scalable)

Correlations among targets (Low-rank)

Missing values (Inductive Matrix Completion)

Page 43:

Bilinear Prediction with Missing Values

Page 44:

Matrix Completion

Missing value estimation problem — recover a low-rank matrix from its observed entries

Exact recovery requires O(kn log^2(n)) samples, under the assumptions of:
  1 Uniform sampling
  2 Incoherence

Page 45:

Inductive Matrix Completion

Inductive Matrix Completion: bilinear low-rank prediction with missing values

Degrees of freedom in X are O(kd)

Can we get better sample complexity (than O(kn))?

Page 46:

Algorithm 1: Convex Relaxation

1 Nuclear-norm Minimization:

  min ‖X‖_*  s.t.  a_i^T X b_j = T_ij, (i, j) ∈ Ω

Computationally expensive

Sample complexity for exact recovery: O(kd log d log n)

Conditions for exact recovery:
  C1. Incoherence on A, B.
  C2. Incoherence on A U_* and B V_*, where X_* = U_* Σ_* V_*^T is the SVD of the ground truth X_*

C1 and C2 are satisfied with high probability when A, B are Gaussian

Page 47:

Algorithm 1: Convex Relaxation

Theorem (Recovery Guarantees for Nuclear-norm Minimization)

Let X_* = U_* Σ_* V_*^T ∈ R^{d×d} be the SVD of X_* with rank k. Assume A, B are orthonormal matrices w.l.o.g., satisfying the incoherence conditions. Then if Ω is uniformly observed with

  |Ω| ≥ O(kd log d log n),

the solution of the nuclear-norm minimization problem is unique and equal to X_* with high probability.

The incoherence conditions are

  C1. max_{i∈[n]} ‖a_i‖_2^2 ≤ μd/n,  max_{j∈[n]} ‖b_j‖_2^2 ≤ μd/n

  C2. max_{i∈[n]} ‖U_*^T a_i‖_2^2 ≤ μ_0 k/n,  max_{j∈[n]} ‖V_*^T b_j‖_2^2 ≤ μ_0 k/n

Page 48:

Algorithm 2: Alternating Least Squares

Alternating Least Squares (ALS):

min_{Y∈R^{d1×k}, Z∈R^{d2×k}}  Σ_{(i,j)∈Ω} (a_i^T Y Z^T b_j − T_ij)^2

Non-convex optimization — alternately minimize w.r.t. Y and Z
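An illustrative ALS sketch for this objective on a small synthetic instance (dense least-squares solves and random initialization only — the guarantees on these slides additionally require a good initialization and incoherence, which this toy demo does not enforce):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d1, d2, k = 100, 8, 6, 2
A = rng.standard_normal((n, d1))
B = rng.standard_normal((n, d2))
X_true = rng.standard_normal((d1, k)) @ rng.standard_normal((k, d2))
T = A @ X_true @ B.T
omega = [(i, j) for i in range(n) for j in range(n) if rng.random() < 0.2]

def objective(Y, Z):
    return sum((A[i] @ Y @ Z.T @ B[j] - T[i, j]) ** 2 for i, j in omega)

Y = rng.standard_normal((d1, k))
Z = rng.standard_normal((d2, k))
rhs = np.array([T[i, j] for i, j in omega])
objs = []
for _ in range(15):
    # Fix Y, solve for vec(Z): T_ij = b_j^T Z (Y^T a_i),
    # so each observation contributes the row kron(b_j, Y^T a_i).
    rows = np.array([np.kron(B[j], Y.T @ A[i]) for i, j in omega])
    Z = np.linalg.lstsq(rows, rhs, rcond=None)[0].reshape(d2, k)
    # Fix Z, solve for vec(Y): T_ij = a_i^T Y (Z^T b_j).
    rows = np.array([np.kron(A[i], Z.T @ B[j]) for i, j in omega])
    Y = np.linalg.lstsq(rows, rhs, rcond=None)[0].reshape(d1, k)
    objs.append(objective(Y, Z))

# Each half-step solves its subproblem exactly, so the objective
# never increases across iterations.
assert all(objs[t + 1] <= objs[t] + 1e-6 * (1 + objs[t])
           for t in range(len(objs) - 1))
```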

Page 49:

Algorithm 2: Alternating Least Squares

Computational complexity of ALS.

At the h-th iteration, fixing Y_h, solve the least-squares problem for Z_{h+1}:

  Σ_{(i,j)∈Ω} (ā_i^T Z_{h+1}^T b_j) b_j ā_i^T = Σ_{(i,j)∈Ω} T_ij b_j ā_i^T,

where ā_i = Y_h^T a_i. Similarly solve for Y_{h+1} when fixing Z_{h+1}.

1 Closed form: O(|Ω| k^2 d × (nnz(A) + nnz(B))/n + k^3 d^3).
2 Vanilla conjugate gradient: O(|Ω| k × (nnz(A) + nnz(B))/n) per iteration.
3 Exploit the structure for conjugate gradient:

  Σ_{(i,j)∈Ω} (ā_i^T Z^T b_j) b_j ā_i^T = B^T D Ā

where D is a sparse matrix with D_ji = ā_i^T Z^T b_j for (i, j) ∈ Ω, and Ā = A Y_h.
Only O((nnz(A) + nnz(B) + |Ω|) × k) per iteration.
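The structured identity behind step 3 can be verified numerically. A small NumPy check on synthetic data (D is stored dense here purely for the check; in the algorithm it would be sparse):

```python
import numpy as np

rng = np.random.default_rng(6)
n1, n2, d1, d2, k = 30, 20, 6, 5, 3
A = rng.standard_normal((n1, d1))
B = rng.standard_normal((n2, d2))
Y = rng.standard_normal((d1, k))
Z = rng.standard_normal((d2, k))
omega = [(i, j) for i in range(n1) for j in range(n2) if rng.random() < 0.3]

Abar = A @ Y  # rows are abar_i^T = a_i^T Y

# Left side: sum over observed (i,j) of (abar_i^T Z^T b_j) * b_j abar_i^T.
lhs = sum((Abar[i] @ Z.T @ B[j]) * np.outer(B[j], Abar[i]) for i, j in omega)

# Right side: B^T D Abar, with D_{ji} = abar_i^T Z^T b_j on Omega, else 0.
D = np.zeros((n2, n1))
for i, j in omega:
    D[j, i] = Abar[i] @ Z.T @ B[j]
rhs = B.T @ D @ Abar

assert np.allclose(lhs, rhs)
```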

Page 50:

Algorithm 2: Alternating Least Squares

Theorem (Convergence Guarantees for ALS)

Let X_* be a rank-k matrix with condition number β, and T = A X_* B^T. Assume A, B are orthogonal w.l.o.g. and satisfy the incoherence conditions. Then if Ω is uniformly sampled with

  |Ω| ≥ O(k^4 β^2 d log d),

after H iterations of ALS we have ‖Y_H Z_{H+1}^T − X_*‖_2 ≤ ε, where H = O(log(‖X_*‖_F/ε)).

The incoherence conditions are:

  C1. max_{i∈[n]} ‖a_i‖_2^2 ≤ μd/n,  max_{j∈[n]} ‖b_j‖_2^2 ≤ μd/n

  C2'. max_{i∈[n]} ‖Y_h^T a_i‖_2^2 ≤ μ_0 k/n,  max_{j∈[n]} ‖Z_h^T b_j‖_2^2 ≤ μ_0 k/n,

for all the Y_h's and Z_h's generated by ALS.

Page 51:

Algorithm 2: Alternating Least Squares

Proof sketch for ALS

Consider the case when the rank k = 1:

min_{y∈R^{d1}, z∈R^{d2}}  Σ_{(i,j)∈Ω} (a_i^T y z^T b_j − T_ij)^2

Page 52:

Algorithm 2: Alternating Least Squares

Proof sketch for rank-1 ALS

min_{y∈R^{d1}, z∈R^{d2}}  Σ_{(i,j)∈Ω} (a_i^T y z^T b_j − T_ij)^2

(a) Let X_* = σ_* y_* z_*^T be the thin SVD of X_* and assume A and B are orthogonal w.l.o.g.
(b) In the absence of missing values, ALS = power method:

  ∂‖A y_h z^T B^T − T‖_F^2 / ∂z = 2 B^T (B z y_h^T A^T − T^T) A y_h = 2 (z ‖y_h‖^2 − B^T T^T A y_h)

  z_{h+1} ← (A^T T B)^T y_h ; normalize z_{h+1}
  y_{h+1} ← (A^T T B) z_{h+1} ; normalize y_{h+1}

Note that A^T T B = A^T A X_* B^T B = X_*, and the power method converges to the optimum.
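A quick NumPy check of this equivalence on synthetic rank-1 data (orthonormal A, B built via QR; all sizes illustrative): A^T T B reads off X_* exactly, and the rank-1 ALS/power iteration recovers the top singular pair up to sign.

```python
import numpy as np

rng = np.random.default_rng(7)
n, d1, d2 = 50, 6, 4
A = np.linalg.qr(rng.standard_normal((n, d1)))[0]  # orthonormal columns
B = np.linalg.qr(rng.standard_normal((n, d2)))[0]
sigma = 2.0
y_star = rng.standard_normal(d1); y_star /= np.linalg.norm(y_star)
z_star = rng.standard_normal(d2); z_star /= np.linalg.norm(z_star)
X_star = sigma * np.outer(y_star, z_star)  # rank-1 ground truth
T = A @ X_star @ B.T                       # fully observed

# With orthonormal A, B: A^T T B = A^T A X_* B^T B = X_*.
assert np.allclose(A.T @ T @ B, X_star)

# Rank-1 ALS = power iteration on A^T T B; converges up to sign.
y = rng.standard_normal(d1); y /= np.linalg.norm(y)
for _ in range(50):
    z = (A.T @ T @ B).T @ y; z /= np.linalg.norm(z)
    y = (A.T @ T @ B) @ z;   y /= np.linalg.norm(y)
assert np.isclose(abs(y @ y_star), 1.0)
```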

Page 53:

Algorithm 2: Alternating Least Squares

Proof sketch for rank-1 ALS

min_{y∈R^{d1}, z∈R^{d2}}  Σ_{(i,j)∈Ω} (a_i^T y z^T b_j − T_ij)^2

(c) With missing values, ALS is a variant of the power method with noise in each iteration:

  z_{h+1} ← QR( X_*^T y_h − σ_* N^{−1} ((y_*^T y_h) N − Ñ) z_* )

where the first term is the power-method step and the second is the noise term g, with

  N = Σ_{(i,j)∈Ω} b_j a_i^T y_h y_h^T a_i b_j^T,  Ñ = Σ_{(i,j)∈Ω} b_j a_i^T y_h y_*^T a_i b_j^T.

(d) Given C1 and C2', the noise term g = σ_* N^{−1} ((y_*^T y_h) N − Ñ) z_* becomes smaller as the iterate gets close to the optimum:

  ‖g‖_2 ≤ (1/99) √(1 − (y_h^T y_*)^2)

Page 54:

Algorithm 2: Alternating Least Squares

Proof sketch for rank-1 ALS

min_{y∈R^{d1}, z∈R^{d2}}  Σ_{(i,j)∈Ω} (a_i^T y z^T b_j − T_ij)^2

(e) Given C1 and C2', the first iterate y_0 is well initialized, i.e. y_0^T y_* ≥ 0.9, which guarantees the initial noise is small enough.
(f) The iterates can then be shown to converge linearly to the optimum:

  1 − (z_{h+1}^T z_*)^2 ≤ (1/2) (1 − (y_h^T y_*)^2)
  1 − (y_{h+1}^T y_*)^2 ≤ (1/2) (1 − (z_{h+1}^T z_*)^2)


Similarly, the rank-k case can be proved.

Page 56:

Inductive Matrix Completion: Sample Complexity

Sample complexity of Inductive Matrix Completion (IMC) and MatrixCompletion (MC).

Method          IMC                    MC
Nuclear-norm    O(kd log d log n)      O(kn log^2 n)  (Recht, 2011)
ALS             O(k^4 β^2 d log d)     O(k^3 β^2 n log n)  (Hardt, 2014)

where β is the condition number of X_*

In most cases, n ≫ d

Incoherence conditions on A,B are required

Satisfied e.g. when A,B are Gaussian (no assumption on X needed)

B. Recht. A simpler approach to matrix completion. Journal of Machine Learning Research 12: 3413–3430 (2011).

M. Hardt. Understanding alternating minimization for matrix completion. FOCS 2014, pp. 651–660.

Page 57:

Inductive Matrix Completion: Sample Complexity Results

All matrices are sampled from a Gaussian distribution.

Left two figures: fix k = 5, n = 1000 and vary d.

Right two figures: fix k = 5, d = 50 and vary n.

Darkness of the shading is proportional to the number of failures (repeated 10 times).

[Four phase-transition plots: |Ω| vs. d (ALS), |Ω| vs. d (Nuclear), |Ω| vs. n (ALS), |Ω| vs. n (Nuclear); the vertical axis is the number of measurements, shaded by failure rate.]

Sample complexity is proportional to d while almost independent of n for both the Nuclear-norm and ALS methods.

Page 58:

Positive-Unlabeled Learning

Page 59:

Modern Prediction Problems in Machine Learning

Predicting causal disease genes

Page 60:

Bilinear Prediction: PU Learning

In many applications, only “positive” labels are observed

Page 61:

PU Learning

No observations of the “negative” class available

Page 62:

PU Inductive Matrix Completion

Guarantees so far assume observations are sampled uniformly

What can we say about the case when the observations are all 1's ("positives")?

Typically, 99% entries are missing (“unlabeled”)

Page 63:

PU Inductive Matrix Completion

Inductive Matrix Completion:

  min_{X: ‖X‖_* ≤ t}  Σ_{(i,j)∈Ω} (a_i^T X b_j − T_ij)^2

Commonly used PU strategy: Biased Matrix Completion

  min_{X: ‖X‖_* ≤ t}  α Σ_{(i,j)∈Ω} (a_i^T X b_j − T_ij)^2 + (1 − α) Σ_{(i,j)∉Ω} (a_i^T X b_j − 0)^2

Typically, α > 1 − α (α ≈ 0.9).
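The biased objective amounts to a simple per-entry weighting, which is how gradient-based solvers typically implement it. A small NumPy sketch (evaluation only, on synthetic data — a full solver would minimize this under the nuclear-norm constraint):

```python
import numpy as np

rng = np.random.default_rng(8)
n1, n2, d1, d2 = 40, 30, 6, 5
A = rng.standard_normal((n1, d1))
B = rng.standard_normal((n2, d2))
X = rng.standard_normal((d1, d2))
T = (rng.random((n1, n2)) < 0.05).astype(float)  # sparse 1s ("positives")
obs = T == 1                                     # Omega: only 1s observed
alpha = 0.9

P = A @ X @ B.T
# Observed entries weighted alpha; unobserved entries pulled toward 0
# with weight 1 - alpha.
biased = alpha * np.sum((P[obs] - T[obs]) ** 2) \
       + (1 - alpha) * np.sum(P[~obs] ** 2)

# Equivalent mask-weighted form (T is 0 off Omega).
W = np.where(obs, alpha, 1 - alpha)
assert np.isclose(biased, np.sum(W * (P - T) ** 2))
```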

V. Sindhwani, S. S. Bucak, J. Hu, A. Mojsilovic. One-class matrix completion with low-density factorizations. ICDM 2010, pp. 1055–1060.

Page 64:

PU Inductive Matrix Completion

We can show guarantees for the biased formulation


Page 65

PU Learning: Random Noise Model

Can be formulated as learning with “class-conditional” noise

N. Natarajan, I. S. Dhillon, P. Ravikumar, and A. Tewari. Learning with Noisy Labels. In Advances in Neural Information Processing Systems, pp. 1196-1204 (2013).

Page 66

PU Inductive Matrix Completion

A deterministic PU learning model

$$T_{ij} = \begin{cases} 1 & \text{if } M_{ij} > 0.5,\\ 0 & \text{if } M_{ij} \le 0.5. \end{cases}$$

Page 67

PU Inductive Matrix Completion

A deterministic PU learning model

$P(\tilde{T}_{ij} = 0 \mid T_{ij} = 1) = \rho$ and $P(\tilde{T}_{ij} = 0 \mid T_{ij} = 0) = 1$.

We are given only $\tilde{T}$, but not $T$ or $M$

Goal: Recover $T$ given $\tilde{T}$ (recovering $M$ is not possible!)
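This observation process is easy to simulate. The sketch below (NumPy; the matrix sizes and value of ρ are illustrative, not from the talk) draws T from a random M under the deterministic model and then hides each 1 independently with probability ρ:

```python
import numpy as np

rng = np.random.default_rng(0)

# Underlying matrix M and its thresholded label matrix T (deterministic model)
M = rng.random((100, 80))
T = (M > 0.5).astype(int)              # T_ij = I[M_ij > 0.5]

# One-sided (PU) noise: a 1 is observed as 0 with probability rho,
# while a 0 is always observed as 0 -- every observed entry is a "positive".
rho = 0.3
hide = rng.random(M.shape) < rho
T_tilde = np.where((T == 1) & hide, 0, T)
```

Note that T̃ never exceeds T entrywise and no 0 of T ever becomes a 1; the learner sees only T̃.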

Page 68

Algorithm 1: Biased Inductive Matrix Completion

$$\hat{X} = \arg\min_{X:\,\|X\|_*\le t}\ \alpha\sum_{(i,j)\in\Omega}\bigl(a_i^\top X b_j - 1\bigr)^2 + (1-\alpha)\sum_{(i,j)\notin\Omega}\bigl(a_i^\top X b_j - 0\bigr)^2$$

Rationale:

(a) Fix $\alpha = (1+\rho)/2$ and define $\hat{T}_{ij} = \mathbb{I}\bigl[(A\hat{X}B^\top)_{ij} > 0.5\bigr]$

(b) The above problem is equivalent to:

$$\hat{X} = \arg\min_{X:\,\|X\|_*\le t}\ \sum_{i,j} \ell_\alpha\bigl((AXB^\top)_{ij}, \tilde{T}_{ij}\bigr)$$

where $\ell_\alpha(x, \tilde{T}_{ij}) = \alpha \tilde{T}_{ij}(x-1)^2 + (1-\alpha)(1-\tilde{T}_{ij})\,x^2$

(c) Minimizing the $\ell_\alpha$ loss is equivalent to minimizing the true error, i.e.

$$\frac{1}{mn}\sum_{i,j} \ell_\alpha\bigl((AXB^\top)_{ij}, \tilde{T}_{ij}\bigr) = C_1\,\frac{1}{mn}\,\|T - \hat{T}\|_F^2 + C_2$$
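As a concrete reading of the objective, here is a minimal NumPy sketch of the biased loss and the thresholding step; the function names are mine, and the nuclear-norm-constrained solver itself is omitted:

```python
import numpy as np

def biased_pu_loss(A, X, B, T_tilde, alpha):
    """alpha-weighted squared loss pulling observed 1's toward 1, plus
    (1 - alpha)-weighted loss pushing all other entries toward 0."""
    P = A @ X @ B.T                  # predicted matrix A X B^T
    pos = T_tilde == 1               # Omega: the observed "positive" entries
    return (alpha * np.sum((P[pos] - 1.0) ** 2)
            + (1.0 - alpha) * np.sum(P[~pos] ** 2))

def recover_labels(A, X_hat, B):
    """Thresholding step: T_hat_ij = I[(A X_hat B^T)_ij > 0.5]."""
    return (A @ X_hat @ B.T > 0.5).astype(int)
```

With noise rate ρ one would set alpha = (1 + rho) / 2 and minimize biased_pu_loss over {X : ‖X‖_* ≤ t}, e.g. by projected or proximal gradient descent.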

Page 69

Algorithm 1: Biased Inductive Matrix Completion

Theorem (Error Bound for Biased IMC)

Assume the ground-truth $X$ satisfies $\|X\|_* \le t$ (where $M = AXB^\top$). Define $\hat{T}_{ij} = \mathbb{I}\bigl[(A\hat{X}B^\top)_{ij} > 0.5\bigr]$, $A = \max_i \|a_i\|$ and $B = \max_i \|b_i\|$. If $\alpha = \frac{1+\rho}{2}$, then with probability at least $1-\delta$,

$$\frac{1}{n^2}\,\|\hat{T} - T\|_F^2 = O\!\left(\frac{\eta\sqrt{\log(2/\delta)}}{(1-\rho)\,n} + \frac{\eta\, t A B \sqrt{\log 2d}}{(1-\rho)\,n^{3/2}}\right)$$

where $\eta = 4(1+2\rho)$.

C-J. Hsieh, N. Natarajan, and I. S. Dhillon. PU Learning for Matrix Completion. In Proceedings of The 32nd International Conference on Machine Learning, pp. 2445-2453 (2015).

Page 70

Experimental Results

Page 71

Multi-target Prediction: Image Tag Recommendation

NUS-Wide Image Dataset

161,780 training images

107,879 test images

1,134 features

1,000 tags

Page 72

Multi-target Prediction: Image Tag Recommendation

H. F. Yu, P. Jain, P. Kar, and I. S. Dhillon. Large-scale Multi-label Learning with Missing Labels. In Proceedings of The 31st International Conference on Machine Learning, pp. 593-601 (2014).

Page 73

Multi-target Prediction: Image Tag Recommendation

Low-rank Model with k = 50:

              time(s)   prec@1   prec@3   AUC
  LEML(ALS)       574    20.71    15.96   0.7741
  WSABIE        4,705    14.58    11.37   0.7658

Low-rank Model with k = 100:

              time(s)   prec@1   prec@3   AUC
  LEML(ALS)     1,097    20.76    16.00   0.7718
  WSABIE        6,880    12.46    10.21   0.7597
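prec@k here is presumably the usual top-k precision: the fraction of an image's k highest-scoring tags that are correct, averaged over test images. A small sketch of that metric (my own helper under that assumption, not code from the paper):

```python
import numpy as np

def precision_at_k(scores, labels, k):
    """Mean fraction of correct tags among each example's top-k predictions.

    scores: (n_examples, n_tags) array of predicted scores
    labels: (n_examples, n_tags) binary ground-truth tag matrix
    """
    topk = np.argsort(-scores, axis=1)[:, :k]        # indices of the k largest scores per row
    hits = np.take_along_axis(labels, topk, axis=1)  # 1 where a top-k tag is correct
    return hits.mean()                               # equals the mean per-row precision
```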

H. F. Yu, P. Jain, P. Kar, and I. S. Dhillon. Large-scale Multi-label Learning with Missing Labels. In Proceedings of The 31st International Conference on Machine Learning, pp. 593-601 (2014).

Page 74

Multi-target Prediction: Wikipedia Tag Recommendation

Wikipedia Dataset

881,805 training wiki pages

10,000 test wiki pages

366,932 features

213,707 tags

Page 75

Multi-target Prediction: Wikipedia Tag Recommendation

Low-rank Model with k = 250:

              time(s)   prec@1   prec@3   AUC
  LEML(ALS)     9,932    19.56    14.43   0.9086
  WSABIE       79,086    18.91    14.65   0.9020

Low-rank Model with k = 500:

              time(s)   prec@1   prec@3   AUC
  LEML(ALS)    18,072    22.83    17.30   0.9374
  WSABIE      139,290    19.20    15.66   0.9058

H. F. Yu, P. Jain, P. Kar, and I. S. Dhillon. Large-scale Multi-label Learning with Missing Labels. In Proceedings of The 31st International Conference on Machine Learning, pp. 593-601 (2014).

Page 76

PU Inductive Matrix Completion: Gene-Disease Prediction

N. Natarajan, and I. S. Dhillon. Inductive matrix completion for predicting gene-disease associations. Bioinformatics, 30(12), i60-i68 (2014).

Page 77

PU Inductive Matrix Completion: Gene-Disease Prediction

[Figure: left panel plots P(hidden gene among genes looked at) against the number of genes looked at (0 to 100); right panel plots Precision (%) against Recall (%); methods compared: Catapult, Katz, Matrix Completion on C, IMC, LEML, Matrix Completion.]

Predicting gene-disease associations in the OMIM data set (www.omim.org).

N. Natarajan, and I. S. Dhillon. Inductive matrix completion for predicting gene-disease associations. Bioinformatics, 30(12), i60-i68 (2014).

Page 78

PU Inductive Matrix Completion: Gene-Disease Prediction

[Figure: two panels plotting P(hidden gene among genes looked at) against the number of genes looked at (0 to 100); methods compared: Catapult, Katz, Matrix Completion on C, IMC, LEML.]

Predicting genes for diseases with no training associations.

N. Natarajan, and I. S. Dhillon. Inductive matrix completion for predicting gene-disease associations. Bioinformatics, 30(12), i60-i68 (2014).

Page 79

Conclusions and Future Work

Inductive Matrix Completion:

Scales to millions of targets
Captures correlations among targets
Overcomes missing values
Extension to PU learning

Much work to do:

Other structures: low-rank+sparse, low-rank+column-sparse (outliers)?
Different loss functions?
Handling “time” as one of the dimensions, incorporating smoothness through graph regularization?
Incorporating non-linearities?
Efficient (parallel) implementations?
Improved recovery guarantees?

Page 80

References

[1] P. Jain, and I. S. Dhillon. Provable inductive matrix completion. arXiv preprint arXiv:1306.0626 (2013).

[2] K. Zhong, P. Jain, I. S. Dhillon. Efficient Matrix Sensing Using Rank-1 Gaussian Measurements. In Proceedings of The 26th Conference on Algorithmic Learning Theory (2015).

[3] N. Natarajan, and I. S. Dhillon. Inductive matrix completion for predicting gene-disease associations. Bioinformatics, 30(12), i60-i68 (2014).

[4] H. F. Yu, P. Jain, P. Kar, and I. S. Dhillon. Large-scale Multi-label Learning with Missing Labels. In Proceedings of The 31st International Conference on Machine Learning, pp. 593-601 (2014).

[5] C-J. Hsieh, N. Natarajan, and I. S. Dhillon. PU Learning for Matrix Completion. In Proceedings of The 32nd International Conference on Machine Learning, pp. 2445-2453 (2015).
