Machine Learning and Data at Meetup
DESCRIPTION
Presentation given for Tech Talks at Meetup event on 8/27/13.
TRANSCRIPT
My Background
● Software Engineer/Data Scientist
● Machine learning team
● At Meetup since May 2012
● BS Computer Science
  ○ Information Retrieval
  ○ Data Mining
  ○ Math
    ■ Linear Algebra
    ■ Graph Theory
You
● Data Scientists?
● Engineers?
● Statisticians?
● Students?
● Non-technical?
What this talk is
● Super secret peek into Meetup!
● Meetup recommendation examples
● How we do recommendations (model/features)
● Lessons learned/what’s next
What this talk isn’t
● What is a data scientist?
● What is big data?
● How do matrix factorization, gradient-boosted decision trees, MapReduce, or this framework I hope you’ll use work?
Why Meetup data is cool
● Real people meeting up
● Every meetup could change someone's life
● No ads, just do the best thing
● Oh, and 114 million RSVPs by >14 million members
● 2.7 million RSVPs in the last 30 days
  ○ ~1/second
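The "~1/second" rate follows directly from the 30-day count, taking a 30-day window as 30 × 86,400 seconds:

$$
\frac{2.7 \times 10^{6}\ \text{RSVPs}}{30\ \text{days} \times 86{,}400\ \text{s/day}}
= \frac{2.7 \times 10^{6}}{2.592 \times 10^{6}\ \text{s}}
\approx 1.04\ \text{RSVPs/s}
$$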
Data at Meetup
● User data
● Site monitoring/performance
● A/B testing
● Recommendations*
“Everything is a recommendation”
● Not my phrase
● Not actually true yet
● Working on it
Recommendation
Topic Recommendations
● New registrant
● Don’t know anything about you yet!
● Most popular is boring/repetitive
● Algorithm:
  ○ Group local meetups by topic
  ○ Select topic with most groups
  ○ Remove those groups
  ○ Repeat
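The four-step algorithm above is a greedy cover: each pick favors the topic that explains the most local groups not yet covered, which keeps the list from repeating near-duplicate topics. A minimal Python sketch, assuming a hypothetical input shape (a dict from local group id to its set of topic names) and a hypothetical `max_topics` cutoff; ties are broken alphabetically only to make the output deterministic.

```python
from collections import defaultdict


def recommend_topics(groups, max_topics=10):
    """Greedy topic selection over local groups.

    groups: dict of group id -> set of topic names (hypothetical input shape).
    Returns topic names in the order they were picked.
    """
    remaining = {gid: set(topics) for gid, topics in groups.items()}
    picked = []
    while remaining and len(picked) < max_topics:
        # Group the remaining local meetups by topic.
        by_topic = defaultdict(set)
        for gid, topics in remaining.items():
            for topic in topics:
                by_topic[topic].add(gid)
        if not by_topic:
            break  # only untagged groups are left
        # Select the topic with the most groups (first alphabetically on ties).
        best = max(sorted(by_topic), key=lambda t: len(by_topic[t]))
        picked.append(best)
        # Remove those groups, then repeat.
        for gid in by_topic[best]:
            del remaining[gid]
    return picked


if __name__ == "__main__":
    local = {
        "g1": {"hiking", "outdoors"},
        "g2": {"hiking"},
        "g3": {"python", "tech"},
        "g4": {"tech", "startups"},
        "g5": {"outdoors", "camping"},
    }
    print(recommend_topics(local))  # → ['hiking', 'tech', 'camping']
```

Note that "outdoors" is never picked even though it initially covers two groups: once "hiking" claims g1, "outdoors" only adds g5, which "camping" covers just as well. That is the intended contrast with a plain most-popular list.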
Group/Event Recommendations
● Replaced a topic-only system
● Inputs:
  ○ Member, location, topics, Facebook friends? demographics?
● Outputs:
  ○ Ranking
Collaborative Filtering
● Classic recommendations approach
● Users who like this also like that
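"Users who like this also like that" can be sketched as item-to-item co-occurrence: count how often two groups share a member, then suggest the groups most often co-joined with one the member already belongs to. This is a minimal illustration of the general technique, not Meetup's implementation; the input shape (member id to set of group ids) is an assumption, and production systems usually normalize raw counts (for example with cosine similarity) so that huge groups do not dominate.

```python
from collections import Counter
from itertools import combinations


def co_membership_counts(memberships):
    """Count how often each ordered pair of groups shares a member.

    memberships: dict of member id -> set of group ids (hypothetical input shape).
    """
    counts = Counter()
    for groups in memberships.values():
        for a, b in combinations(sorted(groups), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def also_joined(group, counts, top_n=3):
    """Groups most often joined by members of `group`, highest count first."""
    scores = [(other, n) for (g, other), n in counts.items() if g == group]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))  # ties broken by name
    return [other for other, _ in scores[:top_n]]


if __name__ == "__main__":
    members = {
        "alice": {"hiking", "camping", "python"},
        "bob": {"hiking", "camping"},
        "carol": {"hiking", "python"},
        "dave": {"hiking", "camping"},
    }
    print(also_joined("hiking", co_membership_counts(members)))  # → ['camping', 'python']
```

Every co-occurrence count depends on members belonging to several groups, which is exactly what the sparsity figures below make scarce at Meetup.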
Why Recs at Meetup are hard
● Incomplete data (topics)
● Cold start
● Asking users for data is hard
● Going to meetups is scary
● Sparsity
  ○ Location
  ○ Groups/person
  ○ Membership: 0.001%
  ○ Compare to Netflix: 1%
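The membership percentage reads as the density of the member × group matrix: filled cells divided by all possible (member, group) cells. At 0.001%, roughly 1 cell in 100,000 is filled, versus about 1 in 100 for Netflix at 1%, so by these slide figures Meetup's matrix is on the order of 1,000× sparser. A small helper for computing that density, assuming the same hypothetical member-to-groups dict as elsewhere in these examples:

```python
def membership_density(memberships, num_groups):
    """Fraction of the member x group matrix that is filled.

    memberships: dict of member id -> set of group ids (hypothetical input shape).
    num_groups: total number of groups, including ones no listed member joined.
    """
    filled = sum(len(groups) for groups in memberships.values())
    return filled / (len(memberships) * num_groups)


if __name__ == "__main__":
    # 3 memberships out of 2 members x 4 groups = 8 possible cells
    print(membership_density({"a": {"x"}, "b": {"x", "y"}}, num_groups=4))  # → 0.375
```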
Supervised Learning/Classification
● “Inferring a function from labeled training data”
● Joined Meetup/Didn’t join Meetup
● “Features”
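Framed as supervised learning, each candidate (member, group) pair becomes one training example: a dict of feature values plus a label of 1 if the member joined and 0 if not. The sketch below borrows two feature names from the weights slide (`TopicMatch`, `StateMatchFeature`), but their binary definitions, the `Member`/`Group` classes, and the `labeled_examples` helper are all hypothetical; the talk does not spell out how each feature is computed.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    member_id: str
    state: str
    topics: set = field(default_factory=set)


@dataclass
class Group:
    group_id: str
    state: str
    topics: set = field(default_factory=set)


def features(member, group):
    # Hypothetical binary definitions; only the names come from the slides.
    return {
        "TopicMatch": 1.0 if member.topics & group.topics else 0.0,
        "StateMatchFeature": 1.0 if member.state == group.state else 0.0,
    }


def labeled_examples(pairs, joined):
    """pairs: (Member, Group) candidates; joined: set of (member_id, group_id) that joined."""
    return [
        (features(m, g), 1 if (m.member_id, g.group_id) in joined else 0)
        for m, g in pairs
    ]
```

Which pairs count as negatives (groups the member was shown, or any local group) is itself a design choice, and it ties into the "selection/sampling" caveat on the preprocessing slide.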
Topic Match
State Match
Logistic Regression
● Score
  ○ “Probability”
  ○ Ranking
● Fast + easy
● Weights!
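Logistic regression scores a feature vector x as σ(b + Σ wᵢxᵢ), where σ is the logistic function: the output behaves like a probability and sorting by it gives a ranking. A minimal pure-Python sketch trained with batch gradient descent on log loss; the learning rate, epoch count, and lack of regularization are illustrative assumptions, not the settings used in the talk. Each learned weight is a log-odds contribution, which is what makes the weights readable.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(x, w, b):
    """Predicted "probability" of joining; sort candidates by this to rank them."""
    return sigmoid(b + sum(w.get(name, 0.0) * value for name, value in x.items()))


def train_logistic_regression(examples, lr=0.5, epochs=500):
    """Batch gradient descent on log loss, no regularization.

    examples: list of (feature dict, label) with label 0 or 1.
    Returns (weights dict, intercept).
    """
    names = sorted({name for x, _ in examples for name in x})
    w = {name: 0.0 for name in names}
    b = 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w = {name: 0.0 for name in names}
        grad_b = 0.0
        for x, y in examples:
            err = score(x, w, b) - y  # gradient of log loss w.r.t. the logit
            grad_b += err
            for name, value in x.items():
                grad_w[name] += err * value
        b -= lr * grad_b / n
        for name in names:
            w[name] -= lr * grad_w[name] / n
    return w, b
```

On toy data where two of three topic-matched pairs joined and one of three unmatched pairs joined, the trained scores approach 2/3 and 1/3, and `TopicMatch` gets a clearly positive weight.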
Group recommendation weights
● TopicMatch: 1.21
● TopicMatchExtended: 0.17
● FacebookFriends: 0.15
● SecondDegreeFacebook: 0.79
● AgeUnmatch: -2.20
● GenderUnmatch: -2.6
● StateMatchFeature: 0.44
● CityMatch: 0.02
● DistanceBucket <2: 1.39
● DistanceBucket 2-5: 0.83
● DistanceBucket 5-10: 0.60
● DistanceBucket >10: n/a
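To see how these weights rank groups, the sketch below sums the weights of the active features into a logit. Two assumptions are labeled in the code: every feature is treated as a 0/1 indicator (FacebookFriends, for example, might really be a count), and the intercept, which the slide does not give, is set to 0. A constant intercept shifts every candidate's logit equally, so it changes the "probability" values but not the ranking. The n/a bucket (>10) presumably serves as the baseline the other distance buckets are measured against.

```python
# Weights copied from the slide above.
WEIGHTS = {
    "TopicMatch": 1.21,
    "TopicMatchExtended": 0.17,
    "FacebookFriends": 0.15,
    "SecondDegreeFacebook": 0.79,
    "AgeUnmatch": -2.20,
    "GenderUnmatch": -2.6,
    "StateMatchFeature": 0.44,
    "CityMatch": 0.02,
    "DistanceBucket <2": 1.39,
    "DistanceBucket 2-5": 0.83,
    "DistanceBucket 5-10": 0.60,
    # "DistanceBucket >10" is n/a on the slide: treated here as the baseline (weight 0).
}
INTERCEPT = 0.0  # hypothetical: not shown on the slide


def logit(active_features):
    """Sum of weights for the active (assumed 0/1) features, plus the intercept."""
    return INTERCEPT + sum(WEIGHTS.get(f, 0.0) for f in active_features)


def rank(candidates):
    """candidates: dict of group name -> list of active features; best first."""
    return sorted(candidates, key=lambda name: logit(candidates[name]), reverse=True)


if __name__ == "__main__":
    groups = {
        # 1.21 + 0.44 + 1.39 = 3.04
        "nearby_topic_group": ["TopicMatch", "StateMatchFeature", "DistanceBucket <2"],
        # 0.15 + 0.79 + 0.44 = 1.38
        "friends_group": ["FacebookFriends", "SecondDegreeFacebook", "StateMatchFeature"],
        # 1.21 - 2.20 + 0.83 = -0.16
        "age_mismatch_group": ["TopicMatch", "AgeUnmatch", "DistanceBucket 2-5"],
    }
    print(rank(groups))  # → ['nearby_topic_group', 'friends_group', 'age_mismatch_group']
```

The example also shows how strongly the mismatch penalties dominate: a topic match in a nearby bucket still ends up below zero once AgeUnmatch fires.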
Making up features
● “Zipscore”
● All topics not created equal
● Facebook likes
Real data is gross
● Preprocessing is critical!
  ○ Missing data
  ○ Outliers
  ○ Log scale
  ○ Bucketing
  ○ Selection/sampling (not introducing bias)
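Three of these steps can be sketched in a few lines: bucketing a raw distance into the DistanceBucket ranges from the weights slide, and log-scaling a heavy-tailed count after clipping outliers, with an explicit policy for missing values. The units (miles), the boundary handling (2 falls into "2-5"), the missing-value policies, and the helper names are assumptions for illustration; the slides give only the bucket ranges.

```python
import math


def distance_bucket(distance):
    """Map a member-to-group distance onto the buckets from the weights slide.

    Units (miles) and boundary handling are assumptions; the slide gives only the ranges.
    """
    if distance is None:
        return None  # missing data: let the caller impute or drop the example
    if distance < 2:
        return "DistanceBucket <2"
    if distance < 5:
        return "DistanceBucket 2-5"
    if distance < 10:
        return "DistanceBucket 5-10"
    return "DistanceBucket >10"


def log_count(count, cap=None):
    """Log-scale a heavy-tailed count (e.g. group size), optionally clipping outliers first."""
    if count is None:
        count = 0  # assumption: treat a missing count as zero
    if cap is not None:
        count = min(count, cap)  # clip outliers before compressing
    return math.log1p(max(count, 0))
```

Bucketing lets a linear model like logistic regression learn a non-linear distance effect (1.39, 0.83, 0.60 on the slide) instead of a single slope.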
Cleaning data
● Schenectady
● Beverly Hills
● Astronaut
● Fake RSVP boosts (+100 guests!)
● RSVP hogs
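For the last two items, one simple cleaning pass drops RSVPs with implausible guest counts and drops every RSVP from members whose RSVP volume is far above normal. The `Rsvp` record, both thresholds, and the reading of "RSVP hogs" as unusually high RSVP counts are hypothetical; in practice thresholds would come from looking at the real distributions, and capping guests instead of dropping the RSVP is an equally reasonable choice.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Rsvp:
    member_id: str
    event_id: str
    guests: int


def clean_rsvps(rsvps, max_guests=10, max_rsvps_per_member=50):
    """Drop inflated-guest RSVPs and all RSVPs from suspected RSVP hogs.

    Both thresholds are hypothetical defaults, not values from the talk.
    """
    per_member = Counter(r.member_id for r in rsvps)
    hogs = {m for m, n in per_member.items() if n > max_rsvps_per_member}
    return [
        r for r in rsvps
        if r.guests <= max_guests and r.member_id not in hogs
    ]
```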
TO THE FUTURE!
● Hadoop
● Clicks
● Impressions
● People-to-people recommendations?
● Recommending people to groups?
Thanks!
Smart people, come work with me.
http://www.meetup.com/jobs/
Special thanks:
● Chris Halpert
● Victor J Wang