Algorithmic Music Recommendations at Spotify
DESCRIPTION
In this presentation I introduce various machine learning methods that we utilize for music recommendations and discovery at Spotify. Specifically, I focus on implicit matrix factorization for collaborative filtering, how to implement a small-scale version using Python, numpy, and scipy, as well as how to scale up to 20 million users and 24 million songs using Hadoop and Spark.
TRANSCRIPT
January 13, 2014
Algorithmic Music Discovery at Spotify
Chris Johnson (@MrChrisJohnson)
Monday, January 13, 14
Who am I?
• Chris Johnson
– Machine learning guy from NYC
– Focused on music recommendations
– Formerly a graduate student at UT Austin
What is Spotify?
• On-demand music streaming service
• “iTunes in the cloud”
Data at Spotify
• 20 Million songs
• 24 Million active users
• 6 Million paying users
• 8 Million daily active users
• 1 TB of compressed data generated from users per day
• 700-node Hadoop cluster
• 1 Million years’ worth of music streamed
• 1 Billion user-generated playlists
Challenge: 20 Million songs... how do we recommend music to users?
Recommendation Features
• Discover (personalized recommendations)
• Radio
• Related Artists
• Now Playing
How can we find good recommendations?
• Manual curation
• Manually tagged attributes
• Audio content, metadata, text analysis
• Collaborative filtering
Collaborative Filtering - “The Netflix Prize”
Collaborative Filtering

Hey, I like tracks P, Q, R, S!
Well, I like tracks Q, R, S, T!
Then you should check out track P!
Nice! Btw try track T!

Image via Erik Bernhardsson
Difference between movie and music recs
• Scale of catalog: 60,000 movies vs. 20,000,000 songs
Difference between movie and music recs
• Repeated consumption
Difference between movie and music recs
• Music is more niche
“The Netflix Problem” vs. “The Spotify Problem”
• Netflix: users explicitly “rate” movies
• Spotify: feedback is implicit through streaming behavior
Explicit Matrix Factorization

[Figure: users x movies ratings matrix, e.g. the user Chris rating the movie Inception]

• Users explicitly rate a subset of the movie catalog
• Goal: predict how users will rate new movies
Explicit Matrix Factorization

    ?  3  5  ?
    1  ?  ?  1
    2  ?  3  2
    ?  ?  ?  5
    5  2  ?  4

• Approximate the ratings matrix R by the product of low-dimensional user and movie matrices, R ≈ X Y
• Minimize RMSE (root mean squared error):

    \min_{x, y} \sum_{u, i \in \text{observed}} \left( r_{ui} - x_u^T y_i - b_u - b_i \right)^2 + \lambda \left( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \right)

• r_{ui} = user u’s rating for movie i
• x_u = user latent factor vector
• y_i = item latent factor vector
• b_u = bias for user
• b_i = bias for item
• \lambda = regularization parameter
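As a concrete illustration of the objective above, here is a minimal numpy sketch that evaluates the regularized squared error on a toy ratings matrix; all sizes and values are illustrative, not Spotify data:

```python
import numpy as np

# Toy explicit-MF setup: evaluate the regularized squared error
# over the observed ratings only. All sizes here are illustrative.
rng = np.random.default_rng(0)
n_users, n_items, f, lam = 5, 4, 2, 0.1

R = rng.integers(1, 6, size=(n_users, n_items)).astype(float)  # ratings 1-5
observed = rng.random((n_users, n_items)) < 0.6                # rated cells only

X = rng.normal(size=(n_users, f))   # user latent factor vectors x_u
Y = rng.normal(size=(n_items, f))   # item latent factor vectors y_i
b_u = np.zeros(n_users)             # user biases
b_i = np.zeros(n_items)             # item biases

pred = X @ Y.T + b_u[:, None] + b_i[None, :]
err = (R - pred)[observed]
loss = np.sum(err ** 2) + lam * (np.sum(X ** 2) + np.sum(Y ** 2))
```

Gradient descent or alternating least squares would then minimize this loss over X, Y, and the biases.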
Implicit Matrix Factorization

    1 0 0 0 1 0 0 1
    0 0 1 0 0 1 0 0
    1 0 1 0 0 0 1 1
    0 1 0 0 0 1 0 0
    0 0 1 0 0 1 0 0
    1 0 0 0 1 0 0 1

• Replace stream counts with binary labels: 1 = streamed, 0 = never streamed
• Minimize weighted RMSE (root mean squared error), using a function of stream counts as weights:

    \min_{x, y} \sum_{u, i} c_{ui} \left( p_{ui} - x_u^T y_i - b_u - b_i \right)^2 + \lambda \left( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \right)

• p_{ui} = 1 if user u streamed track i, else 0
• c_{ui} = confidence weight, a function of the stream count (e.g. c_{ui} = 1 + \alpha r_{ui})
• x_u = user latent factor vector
• y_i = item latent factor vector
• b_u = bias for user
• b_i = bias for item
• \lambda = regularization parameter
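The binarization and confidence weighting above take only a few lines of numpy; the confidence function c_ui = 1 + α·count is one common choice, and α here is purely illustrative:

```python
import numpy as np

# Sketch: binary preferences p_ui and stream-count-based confidence
# weights c_ui, then the weighted least-squares objective.
# alpha and all sizes are illustrative choices.
rng = np.random.default_rng(1)
counts = rng.poisson(0.5, size=(6, 8)).astype(float)  # toy user x track stream counts
alpha, lam, f = 40.0, 0.1, 3

P = (counts > 0).astype(float)  # 1 = streamed, 0 = never streamed
C = 1.0 + alpha * counts        # streamed more => trusted more

X = rng.normal(scale=0.1, size=(6, f))  # user latent factors
Y = rng.normal(scale=0.1, size=(8, f))  # track latent factors

loss = np.sum(C * (P - X @ Y.T) ** 2) + lam * (np.sum(X ** 2) + np.sum(Y ** 2))
```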
Alternating Least Squares
• Initialize user and item vectors to random noise
• Fix item vectors and solve for the optimal user vectors
– Take the derivative of the loss function with respect to the user’s vector, set it equal to 0, and solve
– This results in a system of linear equations with a closed-form solution!
• Fix user vectors and solve for the optimal item vectors
• Repeat until convergence

code: https://github.com/MrChrisJohnson/implicitMF
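The per-user solve in the steps above can be sketched directly in numpy. This naive version builds the full per-user confidence matrix, assuming the standard implicit-feedback update x_u = (Yᵀ Cᵘ Y + λI)⁻¹ Yᵀ Cᵘ p(u); all data is toy data:

```python
import numpy as np

# One ALS half-step (naive version): with item vectors Y fixed, solve the
# closed-form linear system for every user vector.
rng = np.random.default_rng(2)
n_users, n_items, f, alpha, lam = 6, 8, 3, 40.0, 0.1
counts = rng.poisson(0.5, size=(n_users, n_items)).astype(float)
P = (counts > 0).astype(float)   # binary preferences p(u)
C = 1.0 + alpha * counts         # confidence weights

Y = rng.normal(scale=0.1, size=(n_items, f))
X = np.empty((n_users, f))
for u in range(n_users):
    Cu = np.diag(C[u])                    # per-user confidence matrix C^u
    A = Y.T @ Cu @ Y + lam * np.eye(f)    # Y^T C^u Y + lambda*I
    b = Y.T @ Cu @ P[u]                   # Y^T C^u p(u)
    X[u] = np.linalg.solve(A, b)          # closed-form user vector
```

The same loop with the roles of users and items swapped solves for the item vectors.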
Alternating Least Squares
• With the item vectors Y fixed (and dropping the bias terms for simplicity), each user vector has the closed-form solution:

    x_u = (Y^T C^u Y + \lambda I)^{-1} Y^T C^u p(u)

• Note that:

    Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y

• Then, we can pre-compute Y^T Y once per iteration
– (C^u - I) and p(u) only contain non-zero elements for tracks that the user streamed
– Using sparse matrix operations we can then compute each user’s vector efficiently in O(f^2 n_u + f^3) time, where n_u is the number of tracks the user streamed and f is the number of latent factors

code: https://github.com/MrChrisJohnson/implicitMF
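That speed-up can be sketched with scipy sparse matrices: Y^T Y is computed once, and each user only touches the rows of Y for the tracks they streamed. This again assumes the standard update x_u = (Yᵀ Cᵘ Y + λI)⁻¹ Yᵀ Cᵘ p(u), on toy data:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Efficient ALS user update: precompute Y^T Y once per iteration, then
# add the correction Y^T (C^u - I) Y using only the n_u streamed tracks.
rng = np.random.default_rng(3)
n_users, n_items, f, alpha, lam = 6, 8, 3, 40.0, 0.1
counts = csr_matrix(rng.poisson(0.5, size=(n_users, n_items)).astype(float))

Y = rng.normal(scale=0.1, size=(n_items, f))
YtY = Y.T @ Y                          # computed once per iteration

X = np.empty((n_users, f))
for u in range(n_users):
    idx = counts[u].indices            # the n_u tracks this user streamed
    cu = 1.0 + alpha * counts[u].data  # their confidence weights
    Yu = Y[idx]                        # only those rows of Y (n_u x f)
    A = YtY + Yu.T @ ((cu - 1.0)[:, None] * Yu) + lam * np.eye(f)
    b = Yu.T @ cu                      # Y^T C^u p(u): p(u) = 1 on streamed tracks
    X[u] = np.linalg.solve(A, b)       # O(f^2 n_u + f^3) per user
```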
Alternating Least Squares

code: https://github.com/MrChrisJohnson/implicitMF
How do we use the learned vectors?
• User-item score is the dot product: score(u, i) = x_u · y_i
• Item-item similarity is the cosine similarity: sim(i, j) = (y_i · y_j) / (‖y_i‖ ‖y_j‖)
• Both operations are cheap: their cost depends only on the number of latent factors
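Both operations are one or two lines of numpy once the factor matrices are in memory; X and Y below are random stand-ins for learned vectors:

```python
import numpy as np

# Using learned latent factor vectors: dot-product scores and
# cosine item-item similarity.
rng = np.random.default_rng(4)
X = rng.normal(size=(6, 3))   # user latent vectors
Y = rng.normal(size=(8, 3))   # item latent vectors

score = X[0] @ Y[2]           # user-item score: dot product

norms = np.linalg.norm(Y, axis=1)
cos_sim = (Y @ Y.T) / np.outer(norms, norms)   # item-item cosine similarity
```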
Latent Factor Vectors in 2 Dimensions
Scaling up Implicit Matrix Factorization with Hadoop
Hadoop at Spotify, 2009
Hadoop at Spotify, 2014
700 nodes in our London data center
Implicit Matrix Factorization with Hadoop

[Figure: the map step partitions all log entries into K × L blocks by (u % K, i % L); the reduce step aggregates them into user vector blocks (u % K = 0, 1, ..., K-1) and item vector blocks (i % L = 0, 1, ..., L-1)]

Figure via Erik Bernhardsson
Implicit Matrix Factorization with Hadoop

[Figure: one map task reads input tuples (u, i, count) where u % K = x and i % L = y, holds all user vectors with u % K = x and all item vectors with i % L = y in the distributed cache, and emits contributions; a reducer then aggregates the contributions into a new vector]

Figure via Erik Bernhardsson
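The blocking scheme in the figures can be sketched in plain Python: each (user, item, count) log entry is routed to one of K × L blocks, so a map task only ever needs one user-vector block and one item-vector block in its distributed cache. K, L, and the log entries below are illustrative:

```python
# Route log entries to map-task blocks keyed by (u % K, i % L).
K, L = 4, 3  # illustrative block counts

def block_key(u, i):
    """A (user, item) log entry only needs user block u % K and item block i % L."""
    return (u % K, i % L)

logs = [(17, 5, 3), (2, 8, 1), (17, 11, 2)]  # toy (user, item, count) tuples
blocks = {}
for u, i, count in logs:
    blocks.setdefault(block_key(u, i), []).append((u, i, count))
```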
Implicit Matrix Factorization with Spark

Spark vs. Hadoop

http://www.slideshare.net/Hadoop_Summit/spark-and-shark
Approximate Nearest Neighbors
code: https://github.com/Spotify/annoy
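annoy answers these queries approximately using forests of random-projection trees; as a reference point for what it approximates, here is an exact brute-force cosine nearest-neighbour lookup in numpy, on illustrative random vectors:

```python
import numpy as np

# Exact nearest neighbours by cosine similarity: the brute-force
# baseline that annoy approximates at much larger scale.
rng = np.random.default_rng(5)
Y = rng.normal(size=(100, 10))                     # toy item latent vectors
Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)  # unit-normalize rows

def nearest(item, k=5):
    """Top-k items most similar to `item` (excluding itself)."""
    sims = Yn @ Yn[item]            # cosine similarity to every item
    order = np.argsort(-sims)       # most similar first
    return [int(j) for j in order if j != item][:k]

neighbours = nearest(3)
```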
Ensemble of Latent Factor Models
Figure via Erik Bernhardsson
A/B-Testing Recommendations
Open Problems
• How do we go from a predictive model to related artists? (learning to rank?)
• How do we learn from user feedback?
• How do we deal with observation bias in the user feedback? (active learning?)
• How do we factor in temporal information?
• How much value is there in content-based recommendations?
• How do we best evaluate model performance?
• How do we best train an ensemble?
Thank You!