Algorithmic Music Recommendations at Spotify
DESCRIPTION
In this presentation I introduce various machine learning methods that we utilize for music recommendations and discovery at Spotify. Specifically, I focus on implicit matrix factorization for collaborative filtering, how to implement a small-scale version using Python, numpy, and scipy, as well as how to scale up to 20 million users and 24 million songs using Hadoop and Spark.
TRANSCRIPT
January 13, 2014
Algorithmic Music Discovery at Spotify
Chris Johnson (@MrChrisJohnson)
Monday, January 13, 14
Who am I?
• Chris Johnson
– Machine learning guy from NYC
– Focused on music recommendations
– Formerly a graduate student at UT Austin
What is Spotify?
• On-demand music streaming service
• “iTunes in the cloud”
Data at Spotify
• 20 Million songs
• 24 Million active users
• 6 Million paying users
• 8 Million daily active users
• 1 TB of compressed data generated from users per day
• 700-node Hadoop cluster
• 1 Million years’ worth of music streamed
• 1 Billion user-generated playlists
Challenge: 20 Million songs... how do we recommend music to users?
Recommendation Features
• Discover (personalized recommendations)
• Radio
• Related Artists
• Now Playing
How can we find good recommendations?
• Manual curation
• Manually tagged attributes
• Audio content, metadata, text analysis
• Collaborative filtering
Collaborative Filtering - “The Netflix Prize”
Collaborative Filtering

Hey, I like tracks P, Q, R, S!
Well, I like tracks Q, R, S, T!
Then you should check out track P!
Nice! Btw try track T!

Image via Erik Bernhardsson
Difference between movie and music recs
• Scale of catalog: 60,000 movies vs. 20,000,000 songs
Difference between movie and music recs
• Repeated consumption
Difference between movie and music recs
• Music is more niche
“The Netflix Problem” vs. “The Spotify Problem”
• Netflix: users explicitly “rate” movies
• Spotify: feedback is implicit through streaming behavior
Explicit Matrix Factorization

[Figure: users x movies ratings matrix, e.g. the user Chris rating the movie Inception]

• Users explicitly rate a subset of the movie catalog
• Goal: predict how users will rate new movies
Explicit Matrix Factorization

    ?  3  5  ?
    1  ?  ?  1
    2  ?  3  2
    ?  ?  ?  5
    5  2  ?  4

• Approximate the ratings matrix R by the product of low-dimensional user and movie matrices, R ≈ X Y
• Minimize RMSE (root mean squared error):

    \min_{x, y} \sum_{u, i \in \text{observed}} \left( r_{ui} - x_u^T y_i - b_u - b_i \right)^2 + \lambda \left( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \right)

• r_{ui} = user u’s rating for movie i
• x_u = user latent factor vector
• y_i = item latent factor vector
• b_u = bias for user
• b_i = bias for item
• \lambda = regularization parameter
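As a concrete illustration of the objective above, here is a minimal numpy sketch that evaluates the regularized squared error on a toy ratings matrix; all sizes and values are illustrative, not Spotify data:

```python
import numpy as np

# Toy explicit-MF setup: evaluate the regularized squared error
# over the observed ratings only. All sizes here are illustrative.
rng = np.random.default_rng(0)
n_users, n_items, f, lam = 5, 4, 2, 0.1

R = rng.integers(1, 6, size=(n_users, n_items)).astype(float)  # ratings 1-5
observed = rng.random((n_users, n_items)) < 0.6                # rated cells only

X = rng.normal(size=(n_users, f))   # user latent factor vectors x_u
Y = rng.normal(size=(n_items, f))   # item latent factor vectors y_i
b_u = np.zeros(n_users)             # user biases
b_i = np.zeros(n_items)             # item biases

pred = X @ Y.T + b_u[:, None] + b_i[None, :]
err = (R - pred)[observed]
loss = np.sum(err ** 2) + lam * (np.sum(X ** 2) + np.sum(Y ** 2))
```

Gradient descent or alternating least squares would then minimize this loss over X, Y, and the biases.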
Implicit Matrix Factorization

    1 0 0 0 1 0 0 1
    0 0 1 0 0 1 0 0
    1 0 1 0 0 0 1 1
    0 1 0 0 0 1 0 0
    0 0 1 0 0 1 0 0
    1 0 0 0 1 0 0 1

• Replace stream counts with binary labels: 1 = streamed, 0 = never streamed
• Minimize weighted RMSE (root mean squared error), using a function of stream counts as weights:

    \min_{x, y} \sum_{u, i} c_{ui} \left( p_{ui} - x_u^T y_i - b_u - b_i \right)^2 + \lambda \left( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \right)

• p_{ui} = 1 if user u streamed track i, else 0
• c_{ui} = confidence weight, a function of the stream count (e.g. c_{ui} = 1 + \alpha r_{ui})
• x_u = user latent factor vector
• y_i = item latent factor vector
• b_u = bias for user
• b_i = bias for item
• \lambda = regularization parameter
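The binarization and confidence weighting above take only a few lines of numpy; the confidence function c_ui = 1 + α·count is one common choice, and α here is purely illustrative:

```python
import numpy as np

# Sketch: binary preferences p_ui and stream-count-based confidence
# weights c_ui, then the weighted least-squares objective.
# alpha and all sizes are illustrative choices.
rng = np.random.default_rng(1)
counts = rng.poisson(0.5, size=(6, 8)).astype(float)  # toy user x track stream counts
alpha, lam, f = 40.0, 0.1, 3

P = (counts > 0).astype(float)  # 1 = streamed, 0 = never streamed
C = 1.0 + alpha * counts        # streamed more => trusted more

X = rng.normal(scale=0.1, size=(6, f))  # user latent factors
Y = rng.normal(scale=0.1, size=(8, f))  # track latent factors

loss = np.sum(C * (P - X @ Y.T) ** 2) + lam * (np.sum(X ** 2) + np.sum(Y ** 2))
```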
Alternating Least Squares
• Initialize user and item vectors to random noise
• Fix item vectors and solve for the optimal user vectors
– Take the derivative of the loss function with respect to the user’s vector, set it equal to 0, and solve
– This results in a system of linear equations with a closed-form solution!
• Fix user vectors and solve for the optimal item vectors
• Repeat until convergence

code: https://github.com/MrChrisJohnson/implicitMF
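The per-user solve in the steps above can be sketched directly in numpy. This naive version builds the full per-user confidence matrix, assuming the standard implicit-feedback update x_u = (Yᵀ Cᵘ Y + λI)⁻¹ Yᵀ Cᵘ p(u); all data is toy data:

```python
import numpy as np

# One ALS half-step (naive version): with item vectors Y fixed, solve the
# closed-form linear system for every user vector.
rng = np.random.default_rng(2)
n_users, n_items, f, alpha, lam = 6, 8, 3, 40.0, 0.1
counts = rng.poisson(0.5, size=(n_users, n_items)).astype(float)
P = (counts > 0).astype(float)   # binary preferences p(u)
C = 1.0 + alpha * counts         # confidence weights

Y = rng.normal(scale=0.1, size=(n_items, f))
X = np.empty((n_users, f))
for u in range(n_users):
    Cu = np.diag(C[u])                    # per-user confidence matrix C^u
    A = Y.T @ Cu @ Y + lam * np.eye(f)    # Y^T C^u Y + lambda*I
    b = Y.T @ Cu @ P[u]                   # Y^T C^u p(u)
    X[u] = np.linalg.solve(A, b)          # closed-form user vector
```

The same loop with the roles of users and items swapped solves for the item vectors.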
Alternating Least Squares
• With the item vectors Y fixed (and dropping the bias terms for simplicity), each user vector has the closed-form solution:

    x_u = (Y^T C^u Y + \lambda I)^{-1} Y^T C^u p(u)

• Note that:

    Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y

• Then, we can pre-compute Y^T Y once per iteration
– (C^u - I) and p(u) only contain non-zero elements for tracks that the user streamed
– Using sparse matrix operations we can then compute each user’s vector efficiently in O(f^2 n_u + f^3) time, where n_u is the number of tracks the user streamed and f is the number of latent factors

code: https://github.com/MrChrisJohnson/implicitMF
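That speed-up can be sketched with scipy sparse matrices: Y^T Y is computed once, and each user only touches the rows of Y for the tracks they streamed. This again assumes the standard update x_u = (Yᵀ Cᵘ Y + λI)⁻¹ Yᵀ Cᵘ p(u), on toy data:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Efficient ALS user update: precompute Y^T Y once per iteration, then
# add the correction Y^T (C^u - I) Y using only the n_u streamed tracks.
rng = np.random.default_rng(3)
n_users, n_items, f, alpha, lam = 6, 8, 3, 40.0, 0.1
counts = csr_matrix(rng.poisson(0.5, size=(n_users, n_items)).astype(float))

Y = rng.normal(scale=0.1, size=(n_items, f))
YtY = Y.T @ Y                          # computed once per iteration

X = np.empty((n_users, f))
for u in range(n_users):
    idx = counts[u].indices            # the n_u tracks this user streamed
    cu = 1.0 + alpha * counts[u].data  # their confidence weights
    Yu = Y[idx]                        # only those rows of Y (n_u x f)
    A = YtY + Yu.T @ ((cu - 1.0)[:, None] * Yu) + lam * np.eye(f)
    b = Yu.T @ cu                      # Y^T C^u p(u): p(u) = 1 on streamed tracks
    X[u] = np.linalg.solve(A, b)       # O(f^2 n_u + f^3) per user
```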
Alternating Least Squares

code: https://github.com/MrChrisJohnson/implicitMF
How do we use the learned vectors?
• User-item score is the dot product: score(u, i) = x_u · y_i
• Item-item similarity is the cosine similarity: sim(i, j) = (y_i · y_j) / (‖y_i‖ ‖y_j‖)
• Both operations are cheap: their cost depends only on the number of latent factors
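Both operations are one or two lines of numpy once the factor matrices are in memory; X and Y below are random stand-ins for learned vectors:

```python
import numpy as np

# Using learned latent factor vectors: dot-product scores and
# cosine item-item similarity.
rng = np.random.default_rng(4)
X = rng.normal(size=(6, 3))   # user latent vectors
Y = rng.normal(size=(8, 3))   # item latent vectors

score = X[0] @ Y[2]           # user-item score: dot product

norms = np.linalg.norm(Y, axis=1)
cos_sim = (Y @ Y.T) / np.outer(norms, norms)   # item-item cosine similarity
```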
Latent Factor Vectors in 2 Dimensions
Scaling up Implicit Matrix Factorization with Hadoop
Hadoop at Spotify, 2009
Hadoop at Spotify, 2014
700 nodes in our London data center
Implicit Matrix Factorization with Hadoop

[Figure: the map step partitions all log entries into K × L blocks by (u % K, i % L); the reduce step aggregates them into user vector blocks (u % K = 0, 1, ..., K-1) and item vector blocks (i % L = 0, 1, ..., L-1)]

Figure via Erik Bernhardsson
Implicit Matrix Factorization with Hadoop

[Figure: one map task reads input tuples (u, i, count) where u % K = x and i % L = y, holds all user vectors with u % K = x and all item vectors with i % L = y in the distributed cache, and emits contributions; a reducer then aggregates the contributions into a new vector]

Figure via Erik Bernhardsson
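The blocking scheme in the figures can be sketched in plain Python: each (user, item, count) log entry is routed to one of K × L blocks, so a map task only ever needs one user-vector block and one item-vector block in its distributed cache. K, L, and the log entries below are illustrative:

```python
# Route log entries to map-task blocks keyed by (u % K, i % L).
K, L = 4, 3  # illustrative block counts

def block_key(u, i):
    """A (user, item) log entry only needs user block u % K and item block i % L."""
    return (u % K, i % L)

logs = [(17, 5, 3), (2, 8, 1), (17, 11, 2)]  # toy (user, item, count) tuples
blocks = {}
for u, i, count in logs:
    blocks.setdefault(block_key(u, i), []).append((u, i, count))
```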
Implicit Matrix Factorization with Spark

Spark vs. Hadoop

http://www.slideshare.net/Hadoop_Summit/spark-and-shark
Approximate Nearest Neighbors
code: https://github.com/Spotify/annoy
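annoy answers these queries approximately using forests of random-projection trees; as a reference point for what it approximates, here is an exact brute-force cosine nearest-neighbour lookup in numpy, on illustrative random vectors:

```python
import numpy as np

# Exact nearest neighbours by cosine similarity: the brute-force
# baseline that annoy approximates at much larger scale.
rng = np.random.default_rng(5)
Y = rng.normal(size=(100, 10))                     # toy item latent vectors
Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)  # unit-normalize rows

def nearest(item, k=5):
    """Top-k items most similar to `item` (excluding itself)."""
    sims = Yn @ Yn[item]            # cosine similarity to every item
    order = np.argsort(-sims)       # most similar first
    return [int(j) for j in order if j != item][:k]

neighbours = nearest(3)
```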
Ensemble of Latent Factor Models
Figure via Erik Bernhardsson
A/B-Testing Recommendations
Open Problems
• How do we go from a predictive model to related artists? (learning to rank?)
• How do we learn from user feedback?
• How do we deal with observation bias in the user feedback? (active learning?)
• How do we factor in temporal information?
• How much value is there in content-based recommendations?
• How do we best evaluate model performance?
• How do we best train an ensemble?
Thank You!