how apache drives music recommendations at spotify

How Apache Drives Music Recommendations At Spotify Josh Baer ([email protected]) Note: The view expressed is my own and does not necessarily represent that of Spotify

Upload: josh-baer

Post on 08-Jan-2017

TRANSCRIPT

Page 1: How Apache Drives Music Recommendations At Spotify

How Apache Drives Music Recommendations At Spotify

Josh Baer ([email protected])

Note: The view expressed is my own and does not necessarily represent that of Spotify

Page 2: How Apache Drives Music Recommendations At Spotify

Who Am I?

• Technical Product Owner at Spotify

• Working with batch and fast processing infrastructure

@l_phant

Page 3: How Apache Drives Music Recommendations At Spotify

Music Discovery in the 90s

Page 4: How Apache Drives Music Recommendations At Spotify

What is Spotify?

• Music Streaming Service

• Launched in 2008

• Free and Premium Tiers

• Available in 58 Countries

Page 5: How Apache Drives Music Recommendations At Spotify

75+ Million Active Users

Page 6: How Apache Drives Music Recommendations At Spotify

30+ Million Songs

Page 7: How Apache Drives Music Recommendations At Spotify

1+ Billion Plays/Day

Page 8: How Apache Drives Music Recommendations At Spotify

Music Recommendations with Apache

Page 9: How Apache Drives Music Recommendations At Spotify
Page 10: How Apache Drives Music Recommendations At Spotify
Page 11: How Apache Drives Music Recommendations At Spotify

How do we recommend a personalized playlist of new music to 75+ million users?

Page 12: How Apache Drives Music Recommendations At Spotify

10.123.133.333 - - [Mon, 3 June 2015 11:31:33 GMT] "GET /api/admin/job/aggregator/status HTTP/1.1" 200 1847 "https://my.analytics.app/admin" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"

10.123.133.222 - - [Mon, 3 June 2015 11:31:43 GMT] "GET /api/admin/job/aggregator/status HTTP/1.1" 200 1984 "https://my.analytics.app/admin" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"

10.123.133.222 - - [Mon, 3 June 2015 11:33:02 GMT] "GET /dashboard/courses/1291726 HTTP/1.1" 304 - "https://my.analytics.app/admin" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"

10.321.145.111 - - [Mon, 3 June 2015 11:33:03 GMT] "GET /api/loggedInUser HTTP/1.1" 304 - "https://my.analytics.app/dashboard/courses/1291726" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"

10.112.322.111 - - [Mon, 3 June 2015 11:33:03 GMT] "POST /api/instrumentation/events/new HTTP/1.1" 200 2 "https://my.analytics.app/dashboard/courses/1291726" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"

10.123.133.222 - - [Mon, 3 June 2015 11:33:02 GMT] "GET /dashboard/courses/1291726 HTTP/1.1" 304 - "https://my.analytics.app/admin" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"

It begins with a log
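As a rough illustration of how raw log lines like the ones on this slide become structured records, here is a minimal Python sketch. The regular expression, the `parse_line` helper, and the field names are assumptions made for this example; they are not taken from the talk and say nothing about how Spotify actually parses its logs. The timestamp is kept as a plain string because the slide uses a nonstandard date layout.

```python
import re
from collections import Counter

# Assumed layout, inferred from the sample lines on the slide:
# host ident user [time] "METHOD path protocol" status size "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no response body was sent (for example on a 304).
    record["size"] = None if record["size"] == "-" else int(record["size"])
    return record


# Two of the sample lines shown on the slide.
AGENT = ('"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 '
         '(KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"')
SAMPLE_LINES = [
    '10.123.133.333 - - [Mon, 3 June 2015 11:31:33 GMT] '
    '"GET /api/admin/job/aggregator/status HTTP/1.1" 200 1847 '
    '"https://my.analytics.app/admin" ' + AGENT,
    '10.123.133.222 - - [Mon, 3 June 2015 11:33:02 GMT] '
    '"GET /dashboard/courses/1291726 HTTP/1.1" 304 - '
    '"https://my.analytics.app/admin" ' + AGENT,
]

# Skip lines that fail to parse, then count requests per path.
records = [r for r in map(parse_line, SAMPLE_LINES) if r is not None]
requests_per_path = Counter(r["path"] for r in records)
```

Returning `None` for unparseable lines, rather than raising, is one possible design choice: it lets a batch job count or skip malformed input without stopping.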

Page 13: How Apache Drives Music Recommendations At Spotify
Page 14: How Apache Drives Music Recommendations At Spotify
Page 15: How Apache Drives Music Recommendations At Spotify

Apache Kafka at Spotify

• 340 Kafka-related nodes

• 30 TB/day from logs

Page 16: How Apache Drives Music Recommendations At Spotify
Page 17: How Apache Drives Music Recommendations At Spotify

How do we store TBs of new data every day?

Page 18: How Apache Drives Music Recommendations At Spotify
Page 19: How Apache Drives Music Recommendations At Spotify

Apache Hadoop at Spotify

• 1700 Nodes

• 60 PB of Data

• 70 TB of Memory

• Over 1 Million jobs run in Q3, 2015

Page 20: How Apache Drives Music Recommendations At Spotify

[Chart: Hadoop at Spotify. Processing growth (y-axis ticks from 150% to 550%) by quarter, Q4-2013 through Q3-2015.]

Page 21: How Apache Drives Music Recommendations At Spotify

Processing Toolbox

• Apache Crunch

• Scalding

• Apache Hive

• Apache Spark

• Apache Storm

• Hadoop Streaming

• Apache Pig

Page 22: How Apache Drives Music Recommendations At Spotify

Storage Formats

• Apache Avro

• Apache Parquet

Page 23: How Apache Drives Music Recommendations At Spotify

How do we personalize the playlists?

Page 24: How Apache Drives Music Recommendations At Spotify
Page 25: How Apache Drives Music Recommendations At Spotify

Collaborative Filtering

Justin Bieber, Drake, Avicii, Major Lazer

Anna Listened Listened

Gustav Listened Listened Listened

Mary Listened Listened Listened Listened

Michael Listened Listened Suggest
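The idea behind the collaborative filtering slide can be sketched in a few lines of Python: suggest artists that similar listeners played and the target user has not. The listening matrix below is hypothetical, because the transcript does not preserve which cells of the slide's table were filled in. The Jaccard similarity and the `suggest` helper are assumptions chosen for illustration, not the method Spotify uses in production.

```python
from collections import Counter

# Hypothetical listening history. The artist and user names come from the
# slide, but who listened to what is invented for this example.
listens = {
    "Anna": {"Justin Bieber", "Drake"},
    "Gustav": {"Drake", "Avicii", "Major Lazer"},
    "Mary": {"Justin Bieber", "Drake", "Avicii", "Major Lazer"},
    "Michael": {"Drake", "Avicii"},
}


def jaccard(a, b):
    """Overlap between two sets of artists, from 0.0 (disjoint) to 1.0 (equal)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest(user, listens, top_n=1):
    """Rank artists the user has not heard, weighted by listener similarity."""
    mine = listens[user]
    scores = Counter()
    for other, theirs in listens.items():
        if other == user:
            continue
        similarity = jaccard(mine, theirs)
        # Every unheard artist gets credit proportional to how similar
        # the other listener is to the target user.
        for artist in theirs - mine:
            scores[artist] += similarity
    return [artist for artist, _ in scores.most_common(top_n)]


print(suggest("Michael", listens))
```

With this made-up data, Michael overlaps most with Gustav and Mary, who both listened to Major Lazer, so that artist ranks first. A user who has already heard everything, such as Mary here, gets an empty list.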

Page 26: How Apache Drives Music Recommendations At Spotify
Page 27: How Apache Drives Music Recommendations At Spotify

How do we serve new playlists to all our users every week?

Page 28: How Apache Drives Music Recommendations At Spotify
Page 29: How Apache Drives Music Recommendations At Spotify

Apache Cassandra at Spotify

• Number of Clusters: 113

• Number of Machines: 1155

• Largest Cluster: 60 Nodes

Page 30: How Apache Drives Music Recommendations At Spotify

Driven By Data

Page 31: How Apache Drives Music Recommendations At Spotify

Driven By Apache

Page 32: How Apache Drives Music Recommendations At Spotify

Thank YOU for your contributions to Apache products!

Page 33: How Apache Drives Music Recommendations At Spotify

One Last Thing…

Page 34: How Apache Drives Music Recommendations At Spotify

Spotify Luigi

• Workflow Manager

• Over 150 contributors

• Used by 10s, possibly 100s of companies

Page 35: How Apache Drives Music Recommendations At Spotify

Maybe… Apache Luigi?

Sponsors/mentors/contributors wanted!

Page 36: How Apache Drives Music Recommendations At Spotify

Think this stuff is interesting?

We have a great time building it!

spotify.com/jobs

Page 37: How Apache Drives Music Recommendations At Spotify

Better Spotify ML Presentations

• Algorithmic Music Recommendations at Spotify (Chris Johnson)

• Interactive Recommender Systems with Netflix and Spotify (Chris Johnson)

• Music recommendations @ MLConf 2014 (Erik Bernhardsson)

• Machine learning @ Spotify (Andy Sloane)

• Recommending music on Spotify with deep learning (Sander Dieleman)

• Scala Data Pipelines @ Spotify (Neville Li)

• Spotify's Music Recommendations Lambda Architecture (Esh Kumar and Emily Samuels)