![Page 1: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD](https://reader033.vdocument.in/reader033/viewer/2022060816/60950955916dc0560d1a6213/html5/thumbnails/1.jpg)
Reza Zadeh
Matrix Completion with ALS
@Reza_Zadeh | http://reza-zadeh.com
![Page 2: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD](https://reader033.vdocument.in/reader033/viewer/2022060816/60950955916dc0560d1a6213/html5/thumbnails/2.jpg)
Collaborative Filtering Goal: predict users’ movie ratings based on past ratings of other movies
R =
1 ? ? 4 5 ? 3 ? ? 3 5 ? ? 3 5 ? 5 ? ? ? 1 4 ? ? ? ? 2 ?
Movies
Users
![Page 3: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD](https://reader033.vdocument.in/reader033/viewer/2022060816/60950955916dc0560d1a6213/html5/thumbnails/3.jpg)
Don’t mistake this with SVD.
Both are matrix factorizations, however SVD cannot handle missing entries.
![Page 4: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD](https://reader033.vdocument.in/reader033/viewer/2022060816/60950955916dc0560d1a6213/html5/thumbnails/4.jpg)
Optimization problem
![Page 5: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD](https://reader033.vdocument.in/reader033/viewer/2022060816/60950955916dc0560d1a6213/html5/thumbnails/5.jpg)
Alternating Least Squares 1. Start with random A1, B1 2. Solve for A2 to minimize ||R – A2B1
T|| 3. Solve for B2 to minimize ||R – A2B2
T|| 4. Repeat until convergence
R A = BT
![Page 6: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD](https://reader033.vdocument.in/reader033/viewer/2022060816/60950955916dc0560d1a6213/html5/thumbnails/6.jpg)
Attempt 1: Broadcast All
![Page 7: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD](https://reader033.vdocument.in/reader033/viewer/2022060816/60950955916dc0560d1a6213/html5/thumbnails/7.jpg)
Attempt 2: Data Parallel
![Page 8: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD](https://reader033.vdocument.in/reader033/viewer/2022060816/60950955916dc0560d1a6213/html5/thumbnails/8.jpg)
Attempt 3: Fully Parallel
![Page 9: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD](https://reader033.vdocument.in/reader033/viewer/2022060816/60950955916dc0560d1a6213/html5/thumbnails/9.jpg)
ALS on Spark Cache 2 copies of R in memory, one partitioned by rows and one by columns Keep A & B partitioned in corresponding way Operate on blocks to lower communication
R A = BT
Matei Zaharia,"Joey Gonzales,"Virginia Smith
![Page 10: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD](https://reader033.vdocument.in/reader033/viewer/2022060816/60950955916dc0560d1a6213/html5/thumbnails/10.jpg)
ALS Results
4208
481 297 0
1000
2000
3000
4000
5000
Tota
l Tim
e (s)
Mahout / Hadoop Spark (Scala) GraphLab (C++)
![Page 11: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD](https://reader033.vdocument.in/reader033/viewer/2022060816/60950955916dc0560d1a6213/html5/thumbnails/11.jpg)
State of the Spark ecosystem
![Page 12: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD](https://reader033.vdocument.in/reader033/viewer/2022060816/60950955916dc0560d1a6213/html5/thumbnails/12.jpg)
Most active open source community in big data
200+ developers, 50+ companies contributing
Spark Community
Giraph Storm
0
50
100
150
Contributors in past year
![Page 13: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD](https://reader033.vdocument.in/reader033/viewer/2022060816/60950955916dc0560d1a6213/html5/thumbnails/13.jpg)
Project Activity M
apRe
duce
YA
RN HD
FS
Stor
m
Spar
k
0
200
400
600
800
1000
1200
1400
1600
Map
Redu
ce
YARN
HDFS
St
orm
Spar
k
0
50000
100000
150000
200000
250000
300000
350000
Commits Lines of Code Changed
Activity in past 6 months
![Page 14: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD](https://reader033.vdocument.in/reader033/viewer/2022060816/60950955916dc0560d1a6213/html5/thumbnails/14.jpg)
Continuing Growth
source: ohloh.net
Contributors per month to Spark
![Page 15: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD](https://reader033.vdocument.in/reader033/viewer/2022060816/60950955916dc0560d1a6213/html5/thumbnails/15.jpg)
Conclusions
![Page 16: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD](https://reader033.vdocument.in/reader033/viewer/2022060816/60950955916dc0560d1a6213/html5/thumbnails/16.jpg)
Spark and Research Spark has all its roots in research, so we hope to keep incorporating new ideas!
![Page 17: Matrix Completion with ALS - Stanford Universitystanford.edu/~rezab/slides/sparksummit2015/lec5.pdf · Don’t mistake this with SVD. Both are matrix factorizations, however SVD](https://reader033.vdocument.in/reader033/viewer/2022060816/60950955916dc0560d1a6213/html5/thumbnails/17.jpg)
Conclusion Data flow engines are becoming an important platform for numerical algorithms While early models like MapReduce were inefficient, new ones like Spark close this gap More info: spark.apache.org