cs158: final project

Posted on 26-Dec-2014

Music Recommendation
Evan Casey and Erin Coughlan

Problem
● Million Song Dataset Challenge (Kaggle)
  ○ 110k users, 1M+ unique songs
● Music recommendation
  ○ Recommend songs for each user based on a larger training set of user listening histories
● Winner: 0.17910 (17.9%) mean average precision
● Benchmark: 0.02079 (2.1%)

Data
● Million Song Dataset
● Two subsets of 1,000 users (random and most active)
● Echonest API to get metadata

Echonest Data
Metadata we obtained:
● Tempo
● Danceability
● Energy
● Speech
● Acousticness

Unavailable metadata:
● Genre
● Artist popularity
● Song popularity
● Location
● Year released
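K-means works on Euclidean distance, so a feature with a large range (tempo in BPM) would swamp features that sit between 0 and 1. A minimal sketch of rescaling every feature to [0, 1] before clustering; the field names (including "speechiness" for the Speech attribute) are our assumption, and this is not necessarily what the "Modified Metadata" run did.

```python
# Hypothetical field names for the five Echonest attributes listed above.
FEATURES = ["tempo", "danceability", "energy", "speechiness", "acousticness"]

def min_max_scale(songs):
    """Rescale every feature to [0, 1] across the given songs.

    songs: dict mapping song_id -> dict of raw feature values.
    Returns a dict mapping song_id -> list of floats in FEATURES order.
    """
    lows = {f: min(s[f] for s in songs.values()) for f in FEATURES}
    highs = {f: max(s[f] for s in songs.values()) for f in FEATURES}
    scaled = {}
    for song_id, s in songs.items():
        vec = []
        for f in FEATURES:
            span = highs[f] - lows[f]
            # A feature that is constant across songs carries no information; map it to 0.
            vec.append((s[f] - lows[f]) / span if span else 0.0)
        scaled[song_id] = vec
    return scaled
```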

Previous Approaches
Dynamic K-Means:
● Kim et al. (6th Int’l Conference on ML)
● Li et al. (University of Michigan)

Item- and user-based collaborative filtering:
● Niu et al. (Stanford)
● Lu et al. (Stanford)

K-Means

K-Means for our Problem
Step 1: K-means over all songs listened to by all users
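Step 1 is ordinary k-means over song feature vectors. A minimal sketch of Lloyd's algorithm in plain Python; the function names and the random initialisation are our own choices, not taken from the project's code.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means. Returns (centroids, labels); labels[i] is the cluster of points[i]."""
    rng = random.Random(seed)
    # Initialise with k of the input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels
```

Here `points` would be the scaled feature vectors of every song in the training histories, and `labels` maps each song back to its cluster.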

Step 2: K-means over each user's listening history
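With one cluster per user, step 2 reduces to a mean of the user's song vectors. A minimal sketch assuming "Weighted Centroids" means weighting each song by its play count (our reading, not confirmed by the slides); the "Multiple Centroids (2)" variant would instead run k-means with k = 2 on the user's songs.

```python
def user_centroid(history, song_vectors, weighted=True):
    """Mean feature vector of the songs in one user's history.

    history: dict song_id -> play count
    song_vectors: dict song_id -> list of floats
    weighted: if True, each song counts as many times as it was played
    """
    total = 0.0
    acc = None
    for song_id, plays in history.items():
        if song_id not in song_vectors:
            continue  # no metadata for this song, so skip it
        w = plays if weighted else 1
        vec = song_vectors[song_id]
        if acc is None:
            acc = [0.0] * len(vec)
        for i, x in enumerate(vec):
            acc[i] += w * x
        total += w
    if not total:
        return None  # the user has no songs with metadata
    return [x / total for x in acc]
```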

Step 3: Predict based on the location of the user centroids
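One possible reading of step 3, sketched under our own assumptions: find the global cluster whose centroid lies closest to the user's centroid, then recommend that cluster's unheard songs in order of distance to the user. The function name and arguments are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def recommend_from_clusters(user_vec, cluster_centroids, song_vectors, song_labels, heard, n=10):
    """Recommend up to n unheard songs from the cluster nearest the user.

    cluster_centroids: list of centroid vectors from the global k-means (step 1)
    song_vectors: dict song_id -> feature vector
    song_labels: dict song_id -> cluster index from step 1
    heard: set of song ids already in the user's history
    """
    nearest = min(
        range(len(cluster_centroids)),
        key=lambda c: squared_distance(user_vec, cluster_centroids[c]),
    )
    candidates = [s for s, lab in song_labels.items() if lab == nearest and s not in heard]
    # Closest songs to the user's taste come first.
    candidates.sort(key=lambda s: squared_distance(user_vec, song_vectors[s]))
    return candidates[:n]
```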

Mean Average Precision
[Figure: example predicted song list compared with the user's actual songs]
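Mean average precision rewards putting a user's held-out songs near the top of the recommendation list. A minimal sketch assuming a cutoff of k = 500 and normalisation by min(number of relevant songs, k), which we believe matches the Kaggle challenge but have not confirmed; the predicted list is assumed to contain no duplicates.

```python
def average_precision(predicted, actual, k=500):
    """Truncated average precision for one user.

    predicted: ranked list of recommended song ids (no duplicates)
    actual: set of song ids the user really listened to
    """
    if not actual:
        return 0.0
    hits = 0
    score = 0.0
    for rank, song in enumerate(predicted[:k], start=1):
        if song in actual:
            hits += 1
            score += hits / rank  # precision at this rank, counted only at hits
    return score / min(len(actual), k)

def mean_average_precision(predictions, truths, k=500):
    """Average of per-user AP; users with no predictions score 0."""
    return sum(
        average_precision(predictions.get(u, []), truths[u], k) for u in truths
    ) / len(truths)
```

For example, predicting [a, b, c, d] when the user actually played {a, c} gives hits at ranks 1 and 3, so AP = (1/1 + 2/3) / 2.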

What are the Results?

K-means variant           MAP
All Metadata              0.00200326282427
Weighted Centroids        0.00375567272976
Multiple Centroids (2)    0.00364834470835
Modified Metadata         0.00994279218087
All Improvements          0.01008282844
More Data                 0.00266295400221

Number of Clusters?

Collaborative Filtering
[Figure: example user-song matrix for users Shawn, Billy and Paul]

User-based Collaborative Filtering

Step 1: Obtain each user's listening-history profile
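A profile here is just the set of songs a user played, with play counts. A minimal sketch assuming the histories arrive as tab-separated user, song, play-count triplets (the layout of the challenge's training file, as far as we know); `load_profiles` is a hypothetical name.

```python
from collections import defaultdict

def load_profiles(lines):
    """Parse 'user<TAB>song<TAB>play_count' lines into {user: {song: count}}."""
    profiles = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, song, count = line.split("\t")
        profiles[user][song] = int(count)
    return dict(profiles)
```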

Step 2: Calculate the similarity between users to find each user's nearest neighbors
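The slides do not name the similarity measure. A minimal sketch assuming cosine similarity between users treated as binary song vectors, which reduces to shared songs divided by the square root of the product of history sizes; the neighbor count k = 50 is an arbitrary placeholder.

```python
import math

def cosine_similarity(songs_a, songs_b):
    """Cosine similarity between two users seen as binary song vectors."""
    if not songs_a or not songs_b:
        return 0.0
    overlap = len(songs_a & songs_b)
    return overlap / math.sqrt(len(songs_a) * len(songs_b))

def nearest_neighbors(target, profiles, k=50):
    """Return up to k (similarity, user) pairs, most similar first.

    profiles: dict user_id -> dict song_id -> play count
    """
    mine = set(profiles[target])
    scored = []
    for other, history in profiles.items():
        if other == target:
            continue
        sim = cosine_similarity(mine, set(history))
        if sim > 0:  # users with no songs in common are not neighbors
            scored.append((sim, other))
    scored.sort(reverse=True)
    return scored[:k]
```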

Step 3: Compute a similarity-weighted average of the neighbors' ratings and recommend the items with the highest averages
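A minimal sketch of step 3, assuming implicit ratings: a neighbor "rates" a song 1 if they played it and 0 otherwise (play counts are ignored here, which is our simplification). Each unheard song's score is the sum of the similarities of neighbors who played it, divided by the total neighbor similarity.

```python
from collections import defaultdict

def score_songs(neighbors, profiles, heard, n=10):
    """Rank unheard songs by a similarity-weighted average of neighbor ratings.

    neighbors: list of (similarity, user_id) pairs
    profiles: dict user_id -> dict song_id -> play count
    heard: set of songs the target user already has
    Returns up to n (song_id, score) pairs, best first.
    """
    total_sim = sum(sim for sim, _ in neighbors)
    if total_sim == 0:
        return []
    weighted = defaultdict(float)
    for sim, user in neighbors:
        for song in profiles[user]:
            if song not in heard:
                weighted[song] += sim  # rating 1 for every song the neighbor played
    # Ties are broken by song id so the output is deterministic.
    ranked = sorted(weighted, key=lambda s: (-weighted[s], s))
    return [(s, weighted[s] / total_sim) for s in ranked[:n]]
```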

Implementation Details

MRJob
Used Amazon EMR with MRJob to parallelize the collaborative-filtering algorithm across multiple machines
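The expensive part of user-based collaborative filtering is counting, for every pair of users, how many songs they share. A minimal sketch of one common map/reduce split for that count, not necessarily how the project's job was structured: the mapper keys each listen by song, and the reducer emits one co-listen per user pair. In a real MRJob job these would be `mapper(self, _, line)` and `reducer(self, key, values)` methods on an `MRJob` subclass, with a second step summing the counts; the code below does not import mrjob and simulates the shuffle in-process.

```python
from collections import defaultdict
from itertools import combinations

def mapper(line):
    """Map: key each 'user<TAB>song<TAB>count' listen by song."""
    user, song, _count = line.split("\t")
    yield song, user

def reducer(song, users):
    """Reduce: every pair of users who played this song gets one co-listen."""
    for a, b in combinations(sorted(set(users)), 2):
        yield (a, b), 1

def run_locally(lines):
    """In-process stand-in for the shuffle plus a final summing reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    overlaps = defaultdict(int)
    for song, users in grouped.items():
        for pair, one in reducer(song, users):
            overlaps[pair] += one
    return dict(overlaps)
```

The resulting overlap counts, divided by the square root of the product of the two users' history sizes, give the cosine similarities used in step 2.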

What are the Results?

User collaborative filtering    MAP
1k users                        0.008223545412
10k users                       0.012654713312
110k users                      0.112794360446

Compiled Results

Method                                      MAP
Benchmark (1k users)                        0.0104030562401
K-means                                     0.01008282844
Benchmark (110k users)                      0.02079
User collaborative filtering (110k users)   0.112794360446

Improvements?
● Ensemble techniques
● More metadata from Echonest (genre, artist popularity, etc.)
● MapReduce for k-means

Questions?

GitHub Repo
https://github.com/erinkidd01/CS158FinalProject.git