cs158: final project

Music Recommendation Evan Casey and Erin Coughlan

Upload: evan-casey

Post on 26-Dec-2014

Page 1: CS158: Final Project

Music Recommendation
Evan Casey and Erin Coughlan

Page 2: CS158: Final Project

Problem
● Million Song Dataset Challenge (Kaggle)
  ○ 110k users, 1M+ unique songs
● Music Recommendation
  ○ Recommend songs for each user based on a larger training set of user listening histories
● Winner - 0.17910 (17.9%)
● Benchmark - 0.02079 (2.1%)

Page 3: CS158: Final Project

Data
● Million Song Dataset
● Two subsets of 1000 users (random and most active)
● Echonest API to get metadata

Page 4: CS158: Final Project

Echonest Data
Metadata we obtained:
● Tempo
● Danceability
● Energy
● Speech
● Acousticness

Unavailable metadata:
● Genre
● Artist popularity
● Song popularity
● Location
● Year released

Page 5: CS158: Final Project

Previous Approaches
Dynamic K-Means:
● Kim et al. (6th Int’l Conference on ML)
● Li et al. (University of Michigan)

Item- and user-based collaborative filtering:
● Niu et al. (Stanford)
● Lu et al. (Stanford)

Page 6: CS158: Final Project

K-Means

Page 7: CS158: Final Project

K-Means for our Problem
Step 1: K-Means from all songs listened to by all users

Page 8: CS158: Final Project

K-Means for our Problem
Step 2: K-Means from user listening history

Page 9: CS158: Final Project

K-Means for our Problem
Step 3: Predict based on location of user centroids
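The three K-Means steps can be sketched roughly as follows. This is a minimal illustration, not the project's code: the feature matrix, the cluster counts, and the rule used in Step 3 (rank unheard songs from the global cluster nearest to each user centroid) are all assumptions made for the example, and `kmeans` and `recommend_by_centroids` are hypothetical names.

```python
import numpy as np


def nearest(points, centroids):
    """Index of the closest centroid for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(points), size=k, replace=False)
    centroids = points[start].astype(float)
    for _ in range(iters):
        labels = nearest(points, centroids)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # leave an empty cluster's centroid in place
                centroids[j] = members.mean(axis=0)
    return centroids, nearest(points, centroids)


def recommend_by_centroids(features, history, global_k=2, user_k=1, n_recs=3):
    """features: (n_songs, n_features) array; history: indices of heard songs."""
    heard = set(history)
    # Step 1: cluster every song in the catalogue.
    global_centroids, song_cluster = kmeans(features, global_k)
    # Step 2: cluster only the songs in this user's listening history.
    user_centroids, _ = kmeans(features[history], user_k)
    # Step 3 (assumed rule): for each user centroid, take the closest global
    # cluster and rank its unheard songs by distance to that user centroid.
    best = {}
    for centroid in user_centroids:
        cluster = np.linalg.norm(global_centroids - centroid, axis=1).argmin()
        for song in np.flatnonzero(song_cluster == cluster):
            song = int(song)
            if song not in heard:
                dist = float(np.linalg.norm(features[song] - centroid))
                best[song] = min(dist, best.get(song, float("inf")))
    return sorted(best, key=best.get)[:n_recs]
```

In this sketch each row of `features` would hold a song's metadata (for example tempo, danceability, and energy), scaled so that no single attribute dominates the distance.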

Page 10: CS158: Final Project

Mean Average Precision
Predicted:

Actual:

Page 11: CS158: Final Project

Mean Average Precision
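As a rough illustration of the metric, the sketch below computes one common truncated form of mean average precision. The cutoff `k=500`, the normalization by `min(len(actual), k)`, and the function names are assumptions for this example; the slides do not state which variant was used.

```python
def average_precision(predicted, actual, k=500):
    """Average precision of one ranked list against a set of relevant items."""
    relevant = set(actual)
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    counted = set()
    for rank, item in enumerate(predicted[:k], start=1):
        # Count each relevant item once, at the rank where it first appears.
        if item in relevant and item not in counted:
            counted.add(item)
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(len(relevant), k)


def mean_average_precision(predictions, actuals, k=500):
    """Mean of the per-user average precision scores."""
    scores = [average_precision(p, a, k) for p, a in zip(predictions, actuals)]
    return sum(scores) / len(scores)
```

For a user whose predicted list is `["a", "b", "c"]` and whose actual songs are `["a", "c"]`, the hits fall at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2.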

Page 12: CS158: Final Project

What are the Results?
K-means variant           Mean Average Precision
All Metadata              0.00200326282427
Weighted Centroids        0.00375567272976
Multiple Centroids (2)    0.00364834470835
Modified Metadata         0.00994279218087
All Improvements          0.01008282844
More Data                 0.00266295400221

Page 13: CS158: Final Project

Number of Clusters?

Page 14: CS158: Final Project

Collaborative Filtering
[Figure: example table of per-song values for the users Shawn, Billy, and Paul; the cell layout is not recoverable from the transcript]

Page 15: CS158: Final Project

Collaborative Filtering

Page 16: CS158: Final Project

User-based Collaborative Filtering

Step 1: Obtain user history profile

Page 17: CS158: Final Project

User-based Collaborative Filtering

Step 2: Calculate similarity between users to find their nearest neighbors

Page 18: CS158: Final Project

User-based Collaborative Filtering

Step 3: Compute the similarity-weighted average of the neighbors' ratings and recommend the items with the highest average
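The three user-based steps could look something like the sketch below. It is an illustration under assumptions, not the project's implementation: play counts stand in for ratings, cosine similarity is the similarity measure, and the names `cosine` and `recommend_user_based` as well as the sample histories are made up for the example.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two {song: play_count} profiles."""
    dot = sum(a[s] * b[s] for s in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_user_based(histories, user, n_neighbors=2, n_recs=3):
    # Step 1: the user's history profile.
    target = histories[user]
    # Step 2: similarity to every other user, keeping the nearest neighbors.
    neighbors = sorted(
        ((cosine(target, profile), other)
         for other, profile in histories.items() if other != user),
        reverse=True,
    )[:n_neighbors]
    # Step 3: similarity-weighted average of the neighbors' counts per song.
    weighted = defaultdict(float)
    weight = defaultdict(float)
    for sim, other in neighbors:
        if sim <= 0:
            continue  # a neighbor with no overlap carries no information
        for song, count in histories[other].items():
            if song not in target:
                weighted[song] += sim * count
                weight[song] += sim
    scores = {song: weighted[song] / weight[song] for song in weighted}
    # Highest average first; ties broken by song id for a stable order.
    return sorted(scores, key=lambda s: (-scores[s], s))[:n_recs]


# Hypothetical play counts, for illustration only.
histories = {
    "shawn": {"s1": 1, "s2": 1},
    "billy": {"s1": 1, "s2": 1, "s3": 5, "s4": 1},
    "paul": {"s1": 1, "s4": 4},
}
print(recommend_user_based(histories, "shawn"))
```

Dividing by the summed similarity of only those neighbors who played a song is one of several possible normalizations; dividing by the summed similarity of all neighbors would instead favor songs that many neighbors share.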

Page 19: CS158: Final Project

Implementation Details

MRJob
Used Amazon EMR with MRJob to parallelize the algorithm across multiple machines
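To give a feel for how the similarity computation might be split into map and reduce steps, here is a small local stand-in written in plain Python. It does not use MRJob or EMR and is not the project's job definition: `run_step` merely imitates the map, shuffle, and reduce phases in memory, and the tab-separated `user, song, play count` input lines, the two-step design, and all function names are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def run_step(records, mapper, reducer):
    """In-memory imitation of one map -> shuffle -> reduce step."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Step A: group listeners by song, then emit every pair of co-listeners.
def song_mapper(line):
    user, song, _count = line.split("\t")
    yield song, user


def pair_reducer(song, users):
    for pair in combinations(sorted(set(users)), 2):
        yield pair, 1


# Step B: sum the pair counts to get the number of songs two users share.
def pair_mapper(record):
    yield record  # already a (pair, 1) tuple


def sum_reducer(pair, ones):
    yield pair, sum(ones)


def shared_song_counts(lines):
    pairs = run_step(lines, song_mapper, pair_reducer)
    return dict(run_step(pairs, pair_mapper, sum_reducer))
```

The resulting counts are the numerators of a cosine similarity over binary listening vectors, so a later step would only need each user's song count to finish the calculation.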

Page 20: CS158: Final Project

What are the Results?

Method                                       Mean Average Precision
User Collaborative Filtering (1k Users)      0.008223545412
User Collaborative Filtering (10k Users)     0.012654713312
User Collaborative Filtering (110k Users)    0.112794360446

Page 21: CS158: Final Project

Compiled Results
Method                          Mean Average Precision
Benchmark (1k Users)            0.0104030562401
K-means                         0.01008282844
Benchmark (110k Users)          0.02079
User Collaborative Filtering    0.112794360446

Page 22: CS158: Final Project

Improvements?
● Ensemble techniques
● More metadata from echonest (genre, artist popularity, etc.)
● MapReduce for k-means

Page 23: CS158: Final Project

Questions?

Page 25: CS158: Final Project

GitHub Repo
https://github.com/erinkidd01/CS158FinalProject.git