CS158: Final Project
Music Recommendation
Evan Casey and Erin Coughlan
Problem
● Million Song Dataset Challenge (Kaggle)
○ 110k users, 1M+ unique songs
● Music Recommendation
○ Recommend songs for each user based on a larger training set of user listening histories
● Winner: 0.17910 MAP (17.9%)
● Benchmark: 0.02079 MAP (2.1%)
Data
● Million Song Dataset
● Two subsets of 1000 users (random and most active)
● Echonest API to get metadata
Echonest Data
Metadata we obtained:
● Tempo
● Danceability
● Energy
● Speech
● Acousticness
Unavailable metadata:
● Genre
● Artist popularity
● Song popularity
● Location
● Year released
Previous Approaches
Dynamic K-Means:
● Kim et al. (6th Int’l Conference on ML)
● Li et al. (University of Michigan)
Item- and user-based collaborative filtering:
● Niu et al. (Stanford)
● Lu et al. (Stanford)
K-Means
K-Means for our Problem
Step 1: Run K-Means on the metadata of all songs listened to by all users
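Step 1 can be sketched as plain Lloyd's k-means over per-song feature vectors. This is a minimal, dependency-free illustration, not the project's code; the feature order (tempo, danceability, energy, speech, acousticness, each pre-scaled to [0, 1]), the toy song IDs, and the helper names `sq_dist` and `kmeans` are assumptions made for the example.

```python
import random

# Assumption: each song is a vector of Echonest features in a fixed order
# (tempo, danceability, energy, speech, acousticness), already scaled to [0, 1]
# so that raw tempo in BPM does not dominate the distance.

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(iterations):
        # Assignment step: every song joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no song changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member songs.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Toy data with hypothetical song IDs: two low-valued and two high-valued songs.
songs = {
    "song_a": [0.1, 0.1, 0.1, 0.1, 0.1],
    "song_b": [0.2, 0.1, 0.1, 0.1, 0.1],
    "song_c": [0.9, 0.9, 0.9, 0.9, 0.9],
    "song_d": [0.8, 0.9, 0.9, 0.9, 0.9],
}
ids = list(songs)
centroids, labels = kmeans([songs[s] for s in ids], k=2)
cluster_of = dict(zip(ids, labels))
print(cluster_of)
```

Random initialization means different seeds can give different clusterings on real data, so results are usually compared across several restarts.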
Step 2: Run K-Means on each user's listening history
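One possible reading of Step 2, stated as an assumption: summarize a user's listening history as a centroid in the same feature space, optionally weighting each song by its play count. The "Weighted Centroids" row in the results suggests some weighting was used, but the slides do not say how. The function name `user_centroid` and the `{song_id: play_count}` input format are hypothetical. The "Multiple Centroids (2)" variant would instead run k-means with k = 2 over the user's own songs.

```python
def user_centroid(history, features, weighted=True):
    """Mean feature vector of the songs in a user's history.

    history:  dict song_id -> play count (hypothetical input format)
    features: dict song_id -> list of floats
    weighted: if True, each song counts in proportion to its play count
    """
    acc, total = None, 0.0
    for song, plays in history.items():
        if song not in features:
            continue  # skip songs with no metadata
        w = plays if weighted else 1.0
        vec = features[song]
        if acc is None:
            acc = [0.0] * len(vec)
        for i, x in enumerate(vec):
            acc[i] += w * x
        total += w
    if total == 0:
        return None  # no usable songs for this user
    return [x / total for x in acc]

features = {"song_a": [0.0, 1.0], "song_b": [1.0, 0.0]}
history = {"song_a": 3, "song_b": 1}
print(user_centroid(history, features))                  # weighted by plays
print(user_centroid(history, features, weighted=False))  # simple mean
```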
Step 3: Predict songs based on the location of the user's centroids
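Step 3 might then rank every song the user has not heard by its distance to the nearest of the user's centroids. This sketch assumes Euclidean distance, treats `n = 500` as an assumed list length, and uses the hypothetical name `recommend_near_centroids`; it is an illustration, not the project's prediction code.

```python
import math

def recommend_near_centroids(centroids, features, heard, n=500):
    """Rank songs the user has not heard by distance to their closest centroid.

    centroids: list of user centroid vectors (one or more)
    features:  dict song_id -> feature vector (same space as the centroids)
    heard:     set of song_ids already in the user's history
    n:         list length to return (500 is an assumed cutoff)
    """
    def distance_to_user(vec):
        # With several centroids, a song only needs to be near one of them.
        return min(math.dist(vec, c) for c in centroids)

    candidates = [s for s in features if s not in heard]
    candidates.sort(key=lambda s: distance_to_user(features[s]))
    return candidates[:n]

features = {"a": [0.0, 1.0], "b": [1.0, 0.0], "c": [0.2, 0.8],
            "d": [0.9, 0.1], "e": [0.5, 0.5]}
print(recommend_near_centroids([[0.25, 0.75]], features, heard={"a", "b"}, n=2))
```

Taking the minimum over centroids lets a user with two distinct tastes get songs near either one, rather than songs near an average of both.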
Mean Average Precision
[Figure: example of predicted vs. actual song lists]
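Average precision rewards placing a user's actual songs near the top of the predicted list, and MAP averages it over users. The sketch below uses a truncated form (precision at each hit, divided by the smaller of the number of actual songs and the cutoff `k`); that exact variant, the cutoff `k = 500`, and the function names are assumptions for illustration.

```python
def average_precision(predicted, actual, k=500):
    """Truncated AP for one user.

    Sums the precision at every rank where an actual song appears, then divides
    by min(len(actual), k). Repeated predictions are counted only once.
    """
    if not actual:
        return 0.0
    hits, score, seen = 0, 0.0, set()
    for rank, song in enumerate(predicted[:k], start=1):
        if song in actual and song not in seen:
            hits += 1
            score += hits / rank  # precision at this rank
        seen.add(song)
    return score / min(len(actual), k)

def mean_average_precision(predictions, truths, k=500):
    """Average of per-user AP over every user with held-out songs."""
    users = list(truths)
    return sum(average_precision(predictions.get(u, []), truths[u], k)
               for u in users) / len(users)

predictions = {"u1": ["a", "x", "b", "y"], "u2": ["c"]}
truths = {"u1": {"a", "b", "c"}, "u2": {"c"}}
print(average_precision(predictions["u1"], truths["u1"]))  # (1/1 + 2/3) / 3
print(mean_average_precision(predictions, truths))
```

For user `u1`, hits at ranks 1 and 3 contribute precisions 1/1 and 2/3; the missed song `c` still counts in the denominator, which is why low-recall lists score poorly.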
What are the Results?
All Metadata 0.00200326282427
Weighted Centroids 0.00375567272976
Multiple Centroids (2) 0.00364834470835
Modified Metadata 0.00994279218087
All Improvements 0.01008282844
More Data 0.00266295400221
Number of Clusters?
Collaborative Filtering
[Figure: example listening data for users Shawn, Billy, and Paul]
User-based Collaborative Filtering
Step 1: Obtain each user's listening-history profile
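Step 1 sketched under the assumption that the listening data arrives as tab-separated `user, song, play count` lines; `build_profiles` is a hypothetical helper, and repeated (user, song) lines are summed.

```python
from collections import defaultdict

def build_profiles(lines):
    """Turn 'user<TAB>song<TAB>count' lines into {user: {song: total_count}}."""
    profiles = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        user, song, count = line.split("\t")
        profiles[user][song] = profiles[user].get(song, 0) + int(count)
    return dict(profiles)

raw = ["u1\ts1\t3", "u1\ts2\t1", "u2\ts1\t5", "u1\ts1\t2"]
profiles = build_profiles(raw)
print(profiles)
```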
Step 2: Calculate the similarity between users to find each user's nearest neighbors
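The slides do not name the similarity measure. One common choice for implicit listening data, used here as an assumption, is cosine similarity between users' binary listened/not-listened vectors, which reduces to the size of the overlap of their song sets divided by the square root of the product of the set sizes. Profiles are taken to be sets of song IDs; the names and `k = 10` neighbors are illustrative.

```python
import math

def cosine_similarity(songs_u, songs_v):
    """Cosine similarity of two users viewed as binary listened/not-listened vectors."""
    if not songs_u or not songs_v:
        return 0.0
    return len(songs_u & songs_v) / math.sqrt(len(songs_u) * len(songs_v))

def nearest_neighbors(user, profiles, k=10):
    """Return up to k (similarity, other_user) pairs, most similar first."""
    target = profiles[user]
    scored = [(cosine_similarity(target, songs), other)
              for other, songs in profiles.items() if other != user]
    scored = [pair for pair in scored if pair[0] > 0]  # drop users with no overlap
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by user ID
    return scored[:k]

profiles = {
    "shawn": {"s1", "s2", "s3"},
    "billy": {"s1", "s2"},
    "paul": {"s4"},
}
print(nearest_neighbors("shawn", profiles))
```

Comparing one user against everyone is linear in the number of users, which is what makes the 110k-user run expensive without parallelism.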
Step 3: Compute a weighted average of the neighbors' ratings and recommend the items with the highest averages
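Step 3 as a sketch: each candidate song's score is the similarity-weighted average of the neighbors' ratings, where a neighbor who never played the song contributes 0. Treating play counts or 1/0 flags as "ratings" is an assumption, as are the function and parameter names.

```python
from collections import defaultdict

def recommend(neighbors, ratings, heard, n=500):
    """Score songs by the similarity-weighted average of neighbors' ratings.

    neighbors: list of (similarity, neighbor_id) pairs
    ratings:   dict neighbor_id -> {song: rating}; a missing song counts as 0
    heard:     songs the target user already has (excluded from the output)
    """
    total_weight = sum(sim for sim, _ in neighbors)
    if total_weight == 0:
        return []
    scores = defaultdict(float)
    for sim, v in neighbors:
        for song, r in ratings.get(v, {}).items():
            if song not in heard:
                scores[song] += sim * r
    ranked = sorted(scores, key=lambda s: (-scores[s], s))  # ties broken by song ID
    # Dividing by the constant total_weight does not change the ranking,
    # but it turns the weighted sums into weighted averages.
    return [(s, scores[s] / total_weight) for s in ranked[:n]]

neighbors = [(0.8, "billy"), (0.2, "paul")]
ratings = {"billy": {"s1": 1, "s4": 1}, "paul": {"s4": 1, "s5": 1}}
result = recommend(neighbors, ratings, heard={"s1"})
print(result)
```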
Implementation Details
MRJob: We used MRJob on Amazon EMR to parallelize the collaborative-filtering algorithm across multiple machines
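MRJob jobs are written as mapper and reducer functions. To keep the example runnable without MRJob or EMR, the sketch below emulates one MapReduce step in a single process and uses two steps to count how many songs each pair of users shares, a raw ingredient of user similarity. It illustrates the map/shuffle/reduce split and is not the project's job; in MRJob the functions would become `mapper` and `reducer` methods of an `MRJob` subclass, chained with `steps()`. All names here are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def run_mapreduce(records, mapper, reducer):
    """Single-process stand-in for one MapReduce step: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group values by key
    out = []
    for key in sorted(groups):
        out.extend(reducer(key, groups[key]))
    return out

# Step 1: group listeners by song, then emit every pair of users sharing that song.
def map_by_song(triplet):
    user, song, _count = triplet
    yield song, user

def reduce_pairs(song, users):
    for u, v in combinations(sorted(set(users)), 2):
        yield (u, v), 1

# Step 2: sum the shared-song counts for each user pair.
def map_identity(pair_count):
    yield pair_count

def reduce_sum(pair, counts):
    yield pair, sum(counts)

triplets = [("shawn", "s1", 3), ("billy", "s1", 1), ("shawn", "s2", 2),
            ("billy", "s2", 4), ("paul", "s3", 1)]
pairs = run_mapreduce(triplets, map_by_song, reduce_pairs)
overlaps = dict(run_mapreduce(pairs, map_identity, reduce_sum))
print(overlaps)
```

A very popular song emits a pair for every two of its listeners, so this pair explosion tends to dominate the cost; real jobs often cap or sample listeners per song.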
What are the Results?
User Collaborative Filtering (1k Users) 0.008223545412
User Collaborative Filtering (10k Users) 0.012654713312
User Collaborative Filtering (110k Users) 0.112794360446
Compiled Results
Benchmark (1k Users) 0.0104030562401
K-means 0.01008282844
Benchmark (110k Users) 0.02079
User Collaborative Filtering 0.112794360446
Improvements?
● Ensemble techniques
● More metadata from Echonest (genre, artist popularity, etc.)
● MapReduce for k-means
Questions?
References
● Niu et al. (Stanford): http://cs229.stanford.edu/proj2012/NiuYinZhang-MillionSongDatasetChallenge.pdf
● Lu et al. (Stanford): http://cs229.stanford.edu/proj2012/LuXiongLiu-MusicRecommenderSystemUtilizingUsers%E2%80%99ListeningHistoryandSocialNetworkInformation.pdf
● Kim et al.: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4457263&tag=1
● Li et al. (University of Michigan): http://www-personal.umich.edu/~yjli/content/projectreport.pdf
GitHub Repo: https://github.com/erinkidd01/CS158FinalProject.git