Data-driven modeling: Lecture 09
Data-driven modeling
APAM E4990
Jake Hofman
Columbia University
April 2, 2012
Jake Hofman (Columbia University) Data-driven modeling April 2, 2012 1 / 30
Personalized recommendations
http://netflixprize.com
http://netflixprize.com/rules
http://netflixprize.com/faq
Netflix prize: results
http://en.wikipedia.org/wiki/Netflix_Prize
Netflix prize: results
See [TJB09] and [Kor09] for more gory details.
Recommendation systems
High-level approaches:
• Content-based methods (e.g., $w_{\text{genre: thrillers}} = +2.3$, $w_{\text{director: Coen brothers}} = +1.7$)
• Collaborative methods (e.g., “Users who liked this also liked”)
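For instance, a content-based score is just a weighted sum over an item's features. A minimal sketch, where the feature names and weights echo the slide's hypothetical numbers:

```python
# Hypothetical learned feature weights, matching the slide's example.
weights = {"genre: thrillers": 2.3, "director: coen brothers": 1.7}

def content_score(features, weights):
    """Score an item as a weighted sum over its observed features."""
    return sum(weights.get(f, 0.0) for f in features)

score = content_score({"genre: thrillers", "director: coen brothers"}, weights)
```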
Netflix prize: data
(userid, movieid, rating, date)
Netflix prize: data
(movieid, year, title)
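For illustration, tuples in this schema might be parsed as follows, assuming a simple comma-separated layout (the actual prize files group ratings by movie, so this is only a sketch of the schema):

```python
import csv
import io

# Hypothetical CSV in the (userid, movieid, rating, date) layout above.
raw = io.StringIO("1,42,5,2005-09-06\n2,42,3,2005-09-07\n")

ratings = {}  # (userid, movieid) -> rating
for userid, movieid, rating, date in csv.reader(raw):
    ratings[(int(userid), int(movieid))] = int(rating)
```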
Collaborative filtering
Memory-based (e.g., k-nearest neighbors)
Model-based (e.g., matrix factorization)
http://research.yahoo.com/pub/2859
Problem statement
• Given a set of past ratings Rui that user u gave item i
  • Users may explicitly assign ratings, e.g., Rui ∈ {1, …, 5} is the number of stars for a movie rating
  • Or we may infer implicit ratings from user actions, e.g., Rui = 1 if u purchased i; otherwise Rui = ?
• Make recommendations of several forms
  • Predict unseen item ratings for a particular user
  • Suggest items for a particular user
  • Suggest items similar to a particular item
  • …
• Compare to natural baselines
  • Guess the global average for item ratings
  • Suggest globally popular items
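As a concrete illustration, the two baselines can be sketched on a toy ratings dict (all users, items, and values here are made up):

```python
# Toy explicit ratings, R[u][i] = stars; the data is hypothetical.
R = {
    "alice": {"matrix": 5, "fargo": 4},
    "bob":   {"matrix": 3, "up": 5},
    "carol": {"fargo": 2, "up": 4},
}

def global_average(R):
    """Baseline: predict the mean over all observed ratings."""
    ratings = [r for items in R.values() for r in items.values()]
    return sum(ratings) / len(ratings)

def popular_items(R, n=2):
    """Baseline: suggest the n most-rated items (ties broken alphabetically)."""
    counts = {}
    for items in R.values():
        for i in items:
            counts[i] = counts.get(i, 0) + 1
    return sorted(counts, key=lambda i: (-counts[i], i))[:n]
```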
k-nearest neighbors
Key intuition: Take a local popularity vote amongst “similar” users
k-nearest neighbors: User similarity

Quantify similarity as a function of users’ past ratings, e.g.

• Fraction of items u and v have in common:

$$S_{uv} = \frac{|r_u \cap r_v|}{|r_u \cup r_v|} = \frac{\sum_i R_{ui} R_{vi}}{\sum_i \left( R_{ui} + R_{vi} - R_{ui} R_{vi} \right)} \tag{1}$$

Retain the top-k most similar neighbors v for each user u
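This overlap measure can be sketched directly on sets of rated items; a minimal version, assuming binary (implicit) ratings so each user is just the set of items they rated:

```python
def jaccard(items_u, items_v):
    """Eq. (1) for binary ratings: |intersection| / |union|
    of two users' rated-item sets."""
    if not items_u and not items_v:
        return 0.0  # neither user has rated anything
    return len(items_u & items_v) / len(items_u | items_v)
```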
k-nearest neighbors: User similarity

Quantify similarity as a function of users’ past ratings, e.g.

• Angle between rating vectors:

$$S_{uv} = \frac{r_u \cdot r_v}{|r_u|\,|r_v|} = \frac{\sum_i R_{ui} R_{vi}}{\sqrt{\sum_i R_{ui}^2}\,\sqrt{\sum_j R_{vj}^2}} \tag{1}$$

Retain the top-k most similar neighbors v for each user u
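A sketch of the cosine version, assuming each user's ratings are stored as a sparse dict mapping item to rating:

```python
import math

def cosine(R_u, R_v):
    """Cosine of the angle between two users' rating vectors,
    each a sparse dict mapping item -> rating."""
    dot = sum(R_u[i] * R_v[i] for i in R_u.keys() & R_v.keys())
    norm_u = math.sqrt(sum(x * x for x in R_u.values()))
    norm_v = math.sqrt(sum(x * x for x in R_v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a user with no ratings has no direction
    return dot / (norm_u * norm_v)
```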
k-nearest neighbors: Predicted ratings

Predict unseen ratings $\hat{R}_{ui}$ as a weighted vote over u’s neighbors’ ratings for item i:

$$\hat{R}_{ui} = \frac{\sum_v R_{vi} S_{uv}}{\sum_v S_{uv}} \tag{2}$$
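The weighted vote of eq. (2) can be sketched as follows; the dict-based layout of R (user to {item: rating}) and S (user pairs to similarity) is an assumption for illustration:

```python
def predict(u, i, R, S, k=10):
    """Eq. (2): weighted vote over the k most similar neighbors of u
    that have rated item i."""
    neighbors = sorted((v for v in R if v != u and i in R[v]),
                       key=lambda v: -S[(u, v)])[:k]
    den = sum(S[(u, v)] for v in neighbors)
    if den == 0.0:
        return None  # no informative neighbors; fall back to a baseline
    return sum(R[v][i] * S[(u, v)] for v in neighbors) / den
```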
k-nearest neighbors: Practical notes

We expect most users have nothing in common, so calculate similarities as:

    for each item i:
        for all pairs of users u, v that have rated i:
            calculate Suv (if not already calculated)
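The item-indexed loop above might look like this in Python; `pairwise_similarities` and its dict layout are illustrative, not the lecture's code:

```python
from itertools import combinations

def pairwise_similarities(R, sim):
    """Compute S only for user pairs that co-rated at least one item,
    by looping over each item's raters (an inverted index)."""
    raters = {}  # item -> set of users that rated it
    for u, items in R.items():
        for i in items:
            raters.setdefault(i, set()).add(u)
    S = {}
    for users in raters.values():
        for u, v in combinations(sorted(users), 2):
            if (u, v) not in S:  # if not already calculated
                S[(u, v)] = sim(set(R[u]), set(R[v]))
    return S
```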
k-nearest neighbors: Practical notes

Alternatively, we can make recommendations using an item-based approach [LSY03]:

• Compute similarities Sij between all pairs of items
• Predict ratings with a weighted vote $\hat{R}_{ui} = \sum_j R_{uj} S_{ij} / \sum_j S_{ij}$
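A sketch of the item-based vote, with an assumed precomputed item-similarity dict `S_items` mapping (target item, rated item) pairs to similarities:

```python
def predict_item_based(u, i, R, S_items):
    """Item-based weighted vote: average u's own ratings, weighted by
    each rated item's similarity to the target item i."""
    rated = [j for j in R[u] if (i, j) in S_items]
    den = sum(S_items[(i, j)] for j in rated)
    if den == 0.0:
        return None
    return sum(R[u][j] * S_items[(i, j)] for j in rated) / den
```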
k-nearest neighbors: Practical notes
Several (relatively) simple ways to scale:
• Sample a subset of ratings for each user (by, e.g., recency)
• Use MinHash to cluster users [DDGR07]
• Distribute calculations with MapReduce
Matrix factorization
Key intuition: Model item attributes as belonging to a set of unobserved “topics” and user preferences across these “topics”
Matrix factorization: Linear model

Start with a simple linear model:

$$\hat{R}_{ui} = \underbrace{b_0}_{\text{global average}} + \underbrace{b_u}_{\text{user bias}} + \underbrace{b_i}_{\text{item bias}} \tag{3}$$
Matrix factorization: Linear model

For example, we might predict that a harsh critic would score a popular movie as

$$\hat{R}_{ui} = \underbrace{3.6}_{\text{global average}} + \underbrace{(-0.5)}_{\text{user bias}} + \underbrace{0.8}_{\text{item bias}} = 3.9 \tag{4}$$
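The bias model of eq. (3) reduces to a few dictionary lookups; names like `b_user` and the "critic"/"hit" keys are illustrative:

```python
def predict_linear(b0, b_user, b_item, u, i):
    """Eq. (3): global average plus user and item biases
    (unknown users or items fall back to a bias of 0)."""
    return b0 + b_user.get(u, 0.0) + b_item.get(i, 0.0)

# The slide's harsh critic scoring a popular movie: 3.6 - 0.5 + 0.8
r_hat = predict_linear(3.6, {"critic": -0.5}, {"hit": 0.8}, "critic", "hit")
```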
Matrix factorization: Low-rank approximation

Add an interaction term:

$$\hat{R}_{ui} = \underbrace{b_0}_{\text{global average}} + \underbrace{b_u}_{\text{user bias}} + \underbrace{b_i}_{\text{item bias}} + \underbrace{W_{ui}}_{\text{user-item interaction}} \tag{5}$$

where $W_{ui} = p_u \cdot q_i = \sum_k P_{uk} Q_{ik}$

• $P_{uk}$ is user u’s preference for topic k
• $Q_{ik}$ is item i’s association with topic k
Matrix factorization: Loss function

Measure quality of the model fit with squared loss:

$$\mathcal{L} = \sum_{(u,i)} \left( \hat{R}_{ui} - R_{ui} \right)^2 \tag{6}$$

$$= \sum_{(u,i)} \left( \left[ P Q^T \right]_{ui} - R_{ui} \right)^2 \tag{7}$$
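With numpy, the loss in eq. (7), summed over observed ratings only, might be computed as follows; the sparse dict layout for R and the array shapes are assumptions:

```python
import numpy as np

def squared_loss(R, P, Q):
    """Eq. (7): squared error of the low-rank reconstruction PQ^T,
    summed over observed ratings only. R maps (u, i) -> rating;
    P is n_users x k, Q is n_items x k."""
    return sum((P[u] @ Q[i] - r) ** 2 for (u, i), r in R.items())
```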
Matrix factorization: Optimization

The loss is non-convex in (P, Q), so there is no guarantee of finding a global minimum

Instead we can optimize L iteratively, e.g.:

• Alternating least squares: update each row of P, holding Q fixed, and vice-versa
• Stochastic gradient descent: update individual rows pu and qi for each observed Rui
Matrix factorizationAlternating least squares
L is convex in rows of P with Q fixed, and Q with P fixed, soalternate solutions to the normal equations:
pu =[Q(u)TQ(u)
]−1Q(u)T r(u) (8)
qi =[P(i)TP(i)
]−1P(i)T r(i) (9)
where:
• Q(u) is the item association matrix restricted to items ratedby user u
• P(i) is the user preference matrix restricted to users that haverated item i
• r(u) are ratings by user u and r(i) are ratings on item i
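One half of the alternation (eq. 8) might be sketched as follows. This is illustrative, not the lecture's implementation: it solves the normal equations with `np.linalg.solve` rather than forming the inverse explicitly, and it assumes $Q^{(u)T} Q^{(u)}$ is invertible (in practice a ridge term is usually added):

```python
import numpy as np

def als_update_users(R, P, Q):
    """One ALS half-step: re-solve each row of P with Q held fixed.
    R maps (u, i) -> rating; P is n_users x k, Q is n_items x k."""
    by_user = {}
    for (u, i), r in R.items():
        by_user.setdefault(u, []).append((i, r))
    for u, obs in by_user.items():
        Qu = np.array([Q[i] for i, _ in obs])    # Q restricted to u's items
        ru = np.array([r for _, r in obs])       # u's ratings
        P[u] = np.linalg.solve(Qu.T @ Qu, Qu.T @ ru)  # normal equations
    return P
```

The symmetric update for Q follows eq. (9) with the roles of users and items swapped.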
Matrix factorization: Stochastic gradient descent

Alternatively, we can avoid inverting matrices by taking steps in the direction of the negative gradient for each observed rating:

$$p_u \leftarrow p_u - \eta \frac{\partial \mathcal{L}}{\partial p_u} = p_u + \eta \left( R_{ui} - \hat{R}_{ui} \right) q_i \tag{10}$$

$$q_i \leftarrow q_i - \eta \frac{\partial \mathcal{L}}{\partial q_i} = q_i + \eta \left( R_{ui} - \hat{R}_{ui} \right) p_u \tag{11}$$

for some step size η (absorbing the constant factor of 2 into η)
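A minimal single-epoch SGD sketch of eqs. (10)-(11), omitting the bias terms for brevity so that $\hat{R}_{ui} = p_u \cdot q_i$; the dict layout for R is an assumption:

```python
import random
import numpy as np

def sgd_epoch(R, P, Q, eta=0.01):
    """One pass over observed ratings in random order.
    R maps (u, i) -> rating; P is n_users x k, Q is n_items x k."""
    observed = list(R.items())
    random.shuffle(observed)
    for (u, i), r in observed:
        err = r - P[u] @ Q[i]  # R_ui - R_hat_ui
        # simultaneous update: both right-hand sides use the old p_u, q_i
        P[u], Q[i] = P[u] + eta * err * Q[i], Q[i] + eta * err * P[u]
    return P, Q
```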
Matrix factorization: Practical notes
Several ways to scale:
• Distribute matrix operations with MapReduce [GHNS11]
• Parallelize stochastic gradient descent [ZWSL10]
• Expectation-maximization for pLSI with MapReduce [DDGR07]
Datasets
• Movielens: http://www.grouplens.org/node/12
• Reddit: http://bit.ly/redditdata
• CU “million songs”: http://labrosa.ee.columbia.edu/millionsong/
• Yahoo Music KDD Cup: http://kddcup.yahoo.com/
• AudioScrobbler: http://bit.ly/audioscrobblerdata
• Delicious: http://bit.ly/deliciousdata
• …
Photo recommendations
http://koala.sandbox.yahoo.com
References I
[DDGR07] A. S. Das, M. Datar, A. Garg, and S. Rajaram. Google news personalization: scalable online collaborative filtering. In WWW, 2007.

[GHNS11] R. Gemulla, P. J. Haas, E. Nijkamp, and Y. Sismanis. Large-scale matrix factorization with distributed stochastic gradient descent. In KDD, 2011.

[Kor09] Yehuda Koren. The BellKor solution to the Netflix Grand Prize. Aug 2009.

[LSY03] G. Linden, B. Smith, and J. York. Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76–80, 2003.
References II
[TJB09] A. Töscher, M. Jahrer, and R. M. Bell. The BigChaos solution to the Netflix Grand Prize. 2009.

[ZWSL10] M. Zinkevich, M. Weimer, A. Smola, and L. Li. Parallelized stochastic gradient descent. In Neural Information Processing Systems (NIPS), 2010.