cs158: final project

Music Recommendation
Evan Casey and Erin Coughlan

Problem
● Million Song Dataset Challenge (Kaggle)
  ○ 110k users, 1M+ unique songs
● Music Recommendation
  ○ Recommend songs for each user based on a larger training set of user listening histories
● Winner - 0.17910 (17.9%)
● Benchmark - 0.02079 (2.1%)

Data
● Million Song Dataset
● Two subsets of 1000 users (random and most active)
● Echonest API to get metadata

Echonest Data
Metadata we obtained:
● Tempo
● Danceability
● Energy
● Speech
● Acousticness

Unavailable metadata:
● Genre
● Artist popularity
● Song popularity
● Location
● Year released

Previous Approaches
Dynamic K-Means:
● Kim et al. (6th Int’l Conference on ML)
● Li et al. (University of Michigan)

Item and user-based collaborative filtering:
● Niu et al. (Stanford)
● Lu et al. (Stanford)

K-Means

K-Means for our Problem
Step 1: K-Means from all songs listened to by all users

Step 2: K-Means from user listening history

Step 3: Predict based on location of user centroids

Mean Average Precision
[Figure: predicted list vs. actual list]
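
For reference, mean average precision can be computed along these lines. This is a sketch of the commonly used truncated definition; the default cutoff `k=500` and the normalization by min(k, number of relevant songs) are assumptions, since the slides do not show the formula.

```python
def average_precision(predicted, actual, k=500):
    """Average precision of one ranked prediction list against the actual songs."""
    actual_set = set(actual)
    if not actual_set:
        return 0.0
    hits = 0
    score = 0.0
    seen = set()
    for rank, song in enumerate(predicted[:k], start=1):
        # Count each relevant song once, at the rank where it first appears.
        if song in actual_set and song not in seen:
            hits += 1
            score += hits / rank  # precision at this rank
        seen.add(song)
    return score / min(len(actual_set), k)

def mean_average_precision(predictions, truths, k=500):
    """Mean of per-user average precision; users without predictions score 0."""
    users = list(truths)
    if not users:
        return 0.0
    total = sum(average_precision(predictions.get(u, []), truths[u], k) for u in users)
    return total / len(users)
```

For example, predicting `["a", "b", "c", "d"]` when the user actually listened to `{"a", "c"}` gives hits at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2.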

What are the Results?
K-Means variant           Mean Average Precision
All Metadata              0.00200326282427
Weighted Centroids        0.00375567272976
Multiple Centroids (2)    0.00364834470835
Modified Metadata         0.00994279218087
All Improvements          0.01008282844
More Data                 0.00266295400221

Number of Clusters?

Collaborative Filtering
[Figure: example row for user "Shawn" with values 1, 4, 3, 8, 9]

User-based Collaborative Filtering

Step 1: Obtain user history profile

Step 2: Calculate similarity between users to find their nearest neighbors

Step 3: Compute the weighted average of the neighbors' ratings and recommend the items with the highest average
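
The three steps above could look roughly like this. A minimal sketch, assuming user profiles are dicts mapping song IDs to ratings (or play counts), cosine similarity as the similarity measure, and songs a neighbor has not heard counting as 0; the slides do not say which similarity measure was used, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse profiles (song -> rating)."""
    dot = sum(a[s] * b[s] for s in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend_from_neighbors(profiles, user, n_neighbors=2, n_items=3):
    # Step 1: the user's history profile.
    target = profiles[user]
    # Step 2: similarity to every other user, keep the nearest neighbors.
    sims = sorted(
        ((cosine(target, prof), other)
         for other, prof in profiles.items() if other != user),
        reverse=True,
    )
    neighbors = [(s, o) for s, o in sims[:n_neighbors] if s > 0]
    total = sum(s for s, _ in neighbors)
    if total == 0:
        return []
    # Step 3: similarity-weighted average rating for each unheard song.
    scores = defaultdict(float)
    for sim, other in neighbors:
        for song, rating in profiles[other].items():
            if song not in target:
                scores[song] += sim * rating / total
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    return ranked[:n_items]
```

Comparing every pair of users is quadratic in the number of users, which is one reason to distribute the similarity step across machines.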

Implementation Details

MRJob
Used Amazon EMR with MRJob to parallelize the algorithm across multiple machines
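
The slides do not show the job itself. As a rough illustration of how a user-similarity step might decompose into map and reduce phases, here is a sketch written as plain Python generators so it runs without a cluster; in MRJob these would become mapper and reducer methods on a job class. The input format (tab-separated user, song, play count) and every function name are assumptions.

```python
from collections import defaultdict
from itertools import combinations

def map_listens(line):
    """Map phase: key each listening record by song."""
    user, song, _count = line.rstrip("\n").split("\t")
    yield song, user

def reduce_song(song, users):
    """Reduce phase 1: emit every pair of users who share this song."""
    for u, v in combinations(sorted(set(users)), 2):
        yield (u, v), 1

def reduce_pair(pair, ones):
    """Reduce phase 2: count the songs each user pair has in common."""
    yield pair, sum(ones)

def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def co_listen_counts(lines):
    by_song = shuffle(kv for line in lines for kv in map_listens(line))
    by_pair = shuffle(kv for song, users in by_song.items()
                      for kv in reduce_song(song, users))
    return dict(kv for pair, ones in by_pair.items()
                for kv in reduce_pair(pair, ones))
```

The resulting shared-song counts are the numerators a cosine-style similarity between users would need; each phase only ever sees one key at a time, which is what lets the work spread across machines.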

What are the Results?

User Collaborative Filtering (1k Users) 0.008223545412

Compiled Results
Benchmark (1k Users)            0.0104030562401
K-means                         0.01008282844
Benchmark (110k Users)          0.02079
User Collaborative Filtering    0.112794360446

Improvements?
● Ensemble techniques
● More metadata from Echonest (genre, artist popularity, etc.)
● MapReduce for k-means

Questions?

References
http://cs229.stanford.edu/proj2012/NiuYinZhang-MillionSongDatasetChallenge.pdf

http://cs229.stanford.edu/proj2012/LuXiongLiu-MusicRecommenderSystemUtilizingUsers%E2%80%99ListeningHistoryandSocialNetworkInformation.pdf

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4457263&tag=1

http://www-personal.umich.edu/~yjli/content/projectreport.pdf

GitHub Repo
https://github.com/erinkidd01/CS158FinalProject.git
