matrix completion with als - stanford...

17
Reza Zadeh Matrix Completion with ALS @Reza_Zadeh | http://reza-zadeh.com

Upload: others

Post on 04-Dec-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD

Reza Zadeh

Matrix Completion with ALS

@Reza_Zadeh | http://reza-zadeh.com

Page 2: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD

Collaborative Filtering Goal: predict users’ movie ratings based on past ratings of other movies

R  =  

1  ?  ?  4  5  ?  3  ?  ?  3  5  ?  ?  3  5  ?  5  ?  ?  ?  1  4  ?  ?  ?  ?  2  ?  

Movies  

Users  

Page 3: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD

Don’t mistake this with SVD.

Both are matrix factorizations, however SVD cannot handle missing entries.

Page 4: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD

Optimization problem

Page 5: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD

Alternating Least Squares 1.  Start with random A1, B1 2.  Solve for A2 to minimize ||R – A2B1

T|| 3.  Solve for B2 to minimize ||R – A2B2

T|| 4.  Repeat until convergence

R A = BT

Page 6: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD

Attempt 1: Broadcast All

Page 7: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD

Attempt 2: Data Parallel

Page 8: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD

Attempt 3: Fully Parallel

Page 9: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD

ALS on Spark Cache 2 copies of R in memory, one partitioned by rows and one by columns Keep A & B partitioned in corresponding way Operate on blocks to lower communication

R A = BT

Matei Zaharia,"Joey Gonzales,"Virginia Smith

Page 10: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD

ALS Results

4208

481 297 0

1000

2000

3000

4000

5000

Tota

l Tim

e (s)

Mahout / Hadoop Spark (Scala) GraphLab (C++)

Page 11: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD

State of the Spark ecosystem

Page 12: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD

Most active open source community in big data

200+ developers, 50+ companies contributing

Spark Community

Giraph Storm

0

50

100

150

Contributors in past year

Page 13: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD

Project Activity M

apRe

duce

YA

RN HD

FS

Stor

m

Spar

k

0

200

400

600

800

1000

1200

1400

1600

Map

Redu

ce

YARN

HDFS

St

orm

Spar

k

0

50000

100000

150000

200000

250000

300000

350000

Commits Lines of Code Changed

Activity in past 6 months

Page 14: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD

Continuing Growth

source: ohloh.net

Contributors per month to Spark

Page 15: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD

Conclusions

Page 16: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD

Spark and Research Spark has all its roots in research, so we hope to keep incorporating new ideas!

Page 17: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD

Conclusion Data flow engines are becoming an important platform for numerical algorithms While early models like MapReduce were inefficient, new ones like Spark close this gap More info: spark.apache.org