Yelper Helper concept
Personalized Review Engine for Yelp Users
Yelper Helper
Alex Ruiz-Euler, 08/2014
PROBLEM SOLUTION
Yelper Helper: Overview.
Determine usefulness of new reviews
Compute user similarity
User making query
Yelp Reviews
Useful tags
Predicting Number of “Useful” Tags
Data structure (Las Vegas):
363,691 reviews
112,702 users
3,536 businesses
(source: Yelp Academic Dataset)
Review | User | Review attributes | User attributes | Business attributes | Useful tags
1 | Abe | Vocabulary richness, stars... | No. of reviews, average rating... | Average rating... | 3
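The table lists "vocabulary richness" as a review attribute without defining it. A minimal sketch, assuming it is measured as a type-token ratio (distinct words divided by total words); `vocabulary_richness` and `review_features` are hypothetical names, not the project's code:

```python
import re

def vocabulary_richness(text: str) -> float:
    """Type-token ratio: distinct words / total words (0.0 for empty text).

    Hypothetical definition; the slides do not say how richness was measured.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

def review_features(text: str, stars: int) -> dict:
    """Assemble a hypothetical review-level feature row."""
    return {"vocab_richness": vocabulary_richness(text), "stars": stars}

row = review_features("Great tacos, great salsa, friendly staff.", stars=5)
print(row)  # 6 tokens, 5 distinct -> richness 5/6
```

User and business attributes (number of reviews, average rating) would be joined onto this row by user and business ID.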
Problem: ~75% of Yelp reviews have 0 “useful” tags*.
(* Las Vegas sample.)
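Why the zeros matter: a plain Poisson model with mean m predicts a zero share of e^(-m), which is far below 75% unless m is tiny. A sketch on hypothetical counts (not Yelp data) that mimic the ~75% zero rate:

```python
import math

def zero_excess(counts):
    """Return (observed share of zeros, share a Poisson of the same mean predicts)."""
    n = len(counts)
    mean = sum(counts) / n
    observed_zero = sum(1 for c in counts if c == 0) / n
    poisson_zero = math.exp(-mean)  # P(K = 0) under Poisson(mean)
    return observed_zero, poisson_zero

# Hypothetical "useful" counts for 20 reviews: 15 zeros, a few popular reviews.
counts = [0] * 15 + [1, 2, 3, 5, 9]
obs, pois = zero_excess(counts)
print(f"observed zeros: {obs:.2f}, Poisson-implied zeros: {pois:.2f}")
```

Here the mean is 1.0, so a Poisson model expects about 37% zeros against the 75% observed.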
Predicting the number k of “useful” tags.
Zero-Inflated Poisson Distribution
* Feature selection
* Model selection (10-fold CV)
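The zero-inflated Poisson mixes a point mass at zero (probability pi, a review that is never tagged) with an ordinary Poisson(lam) count, so P(K = 0) = pi + (1 - pi) e^(-lam) and P(K = k) = (1 - pi) e^(-lam) lam^k / k! for k >= 1. A minimal intercept-only sketch; in the regression version, lam and pi would each depend on the review, user, and business features. The parameter values are illustrative, not estimates:

```python
import math

def zip_pmf(k: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson probability P(K = k).

    pi:  probability a review is a structural zero (never tagged "useful")
    lam: Poisson mean for the remaining reviews
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson

def zip_log_likelihood(counts, lam: float, pi: float) -> float:
    """Sum of log P(K = k) over observed counts."""
    return sum(math.log(zip_pmf(k, lam, pi)) for k in counts)

lam, pi = 2.0, 0.6  # illustrative values only
print([round(zip_pmf(k, lam, pi), 3) for k in range(5)])
print(sum(zip_pmf(k, lam, pi) for k in range(50)))  # should be very close to 1
```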
Yelper Helper: Overview.
Predict usefulness of new reviews
Compute user similarity
User-taste matrix / Restaurant-category matrix
U: Ratings (stars)
Rest 1 Rest 2 Rest 3 Rest 4
User 1 1 3 2
User 2 2 4 1
User 3 2 1
User 4 1 2 1
V: Restaurant profile
Hipster Divey Upscale Intimate Touristy Classy Romantic
Rest 1 1 1
Rest 2 1 1
Rest 3 1 1 1
Rest 4 1 1 1
User profile matrix UV (rows: User 1 to User 4; columns: Hipster, Divey, Upscale, Intimate, Touristy, Classy, Romantic)
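The user profile matrix is the product of the ratings matrix U (users x restaurants) and the restaurant profile V (restaurants x categories), so each entry sums a user's ratings of the restaurants carrying that category. A plain-Python sketch, assuming unrated restaurants count as 0; the matrix entries are hypothetical stand-ins, not the slide's values:

```python
CATEGORIES = ["Hipster", "Divey", "Upscale", "Intimate", "Touristy", "Classy", "Romantic"]

def matmul(a, b):
    """Plain-Python matrix product of row-major lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

# Hypothetical U (users x restaurants); 0 = not rated (an assumption).
U = [
    [1, 3, 0, 2],
    [2, 0, 4, 1],
    [0, 2, 0, 1],
    [1, 0, 2, 1],
]
# Hypothetical V (restaurants x categories); 1 = restaurant has the tag.
V = [
    [1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 1],
]
UV = matmul(U, V)  # users x categories
for name, row in zip(["User 1", "User 2", "User 3", "User 4"], UV):
    print(name, dict(zip(CATEGORIES, row)))
```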
Similarity Matrix: Euclidean Distance Over UV (lower = more similar).
User 1 User 2 User 3 User 4
User 1 0
User 2 1.5 0
User 3 2 3.4 0
User 4 7.2 1 2 0
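Each entry above is the Euclidean distance between two users' rows of UV: 0 on the diagonal, smaller for closer tastes. A sketch using `math.dist` (Python 3.8+) on hypothetical profiles, printing the lower triangle as on the slide:

```python
import math

def distance_matrix(rows):
    """Pairwise Euclidean distances between user-profile rows."""
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]

# Hypothetical user profiles (rows of UV): users x 7 taste categories.
UV = [
    [3, 1, 3, 0, 2, 3, 2],
    [3, 2, 0, 4, 1, 4, 5],
    [1, 0, 2, 0, 1, 2, 1],
    [2, 1, 0, 2, 1, 2, 3],
]
D = distance_matrix(UV)
for i, row in enumerate(D):
    # lower triangle only, matching the slide's layout
    print(f"User {i + 1}", " ".join(f"{d:.1f}" for d in row[: i + 1]))
```

Because the rows are sums of star ratings, heavy raters sit farther from light raters; normalizing each row first is one possible variant, not something the slides specify.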
About Me – Alex Ruiz-Euler (PhD Political Science, 2014)
Thank You.
Validation: Poisson regression / Comparing AIC.
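AIC = 2p - 2 ln L rewards fit and penalizes the number of parameters p; lower is better. A sketch comparing an intercept-only Poisson model (p = 1) with an intercept-only zero-inflated Poisson (p = 2) fitted by expectation-maximisation, on hypothetical counts; the slides' comparison used regression models with features, which this does not reproduce:

```python
import math

def poisson_loglik(counts, lam):
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)

def zip_loglik(counts, lam, pi):
    total = 0.0
    for k in counts:
        if k == 0:
            total += math.log(pi + (1 - pi) * math.exp(-lam))
        else:
            total += math.log(1 - pi) + k * math.log(lam) - lam - math.lgamma(k + 1)
    return total

def fit_zip_em(counts, iters=1000):
    """Intercept-only zero-inflated Poisson fitted by EM."""
    n = len(counts)
    n_zero = sum(1 for k in counts if k == 0)
    pi, lam = 0.5, sum(counts) / n
    for _ in range(iters):
        # E-step: posterior probability that an observed zero is structural
        z0 = pi / (pi + (1 - pi) * math.exp(-lam))
        # M-step: update mixing weight and Poisson mean
        pi = n_zero * z0 / n
        lam = sum(counts) / (n - n_zero * z0)
    return lam, pi

def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik

counts = [0] * 15 + [1, 2, 3, 5, 9]  # hypothetical, 75% zeros
lam_p = sum(counts) / len(counts)  # Poisson MLE is the sample mean
aic_pois = aic(poisson_loglik(counts, lam_p), 1)
lam_z, pi_z = fit_zip_em(counts)
aic_zip = aic(zip_loglik(counts, lam_z, pi_z), 2)
print(f"Poisson AIC {aic_pois:.1f} | ZIP AIC {aic_zip:.1f} (lam={lam_z:.2f}, pi={pi_z:.2f})")
```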
Feature Selection
Model Selection
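Model selection used 10-fold cross-validation. A generic sketch, assuming held-out Poisson log-likelihood as the score (the slides do not name the metric); `k_fold_indices`, `cross_validate`, and the counts are hypothetical:

```python
import math
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle row indices and deal them into k folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(data, fit, score, k=10, seed=0):
    """Mean held-out score over k folds.

    fit(train) returns fitted parameters; score(test, params) returns a number.
    """
    scores = []
    for test_idx in k_fold_indices(len(data), k, seed):
        held_out = set(test_idx)
        train = [data[i] for i in range(len(data)) if i not in held_out]
        test = [data[i] for i in test_idx]
        scores.append(score(test, fit(train)))
    return sum(scores) / k

def poisson_fit(train):
    return sum(train) / len(train)  # Poisson MLE: sample mean

def poisson_heldout_loglik(test, lam):
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in test)

counts = [0] * 30 + [1, 2, 3, 5, 9] * 2  # hypothetical: 40 reviews, 75% zeros
print(round(cross_validate(counts, poisson_fit, poisson_heldout_loglik), 3))
```

A competing model (for example a zero-inflated Poisson) plugs in through its own `fit`/`score` pair; the one with the higher average held-out log-likelihood is kept.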
Issues with data
For similarity:
Yelp's user attributes describe activity, not preferences.
→ Uncover taste preferences with collaborative filtering.
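With the distance matrix from the similarity slide, the users most similar to the one making a query are those at the smallest distance. A sketch using the slide's illustrative values; how Yelper Helper then combines these neighbours with predicted usefulness is not specified, so surfacing neighbours' reviews first is an assumption:

```python
# Lower triangle of the slide's distance matrix (illustrative values).
LOWER = {
    ("User 2", "User 1"): 1.5,
    ("User 3", "User 1"): 2.0,
    ("User 3", "User 2"): 3.4,
    ("User 4", "User 1"): 7.2,
    ("User 4", "User 2"): 1.0,
    ("User 4", "User 3"): 2.0,
}
USERS = ["User 1", "User 2", "User 3", "User 4"]

def distance(a, b):
    """Symmetric lookup into the lower-triangular table."""
    if a == b:
        return 0.0
    return LOWER.get((a, b), LOWER.get((b, a)))

def nearest_neighbours(user, n=2):
    """Other users sorted by ascending distance (most similar taste first)."""
    others = [u for u in USERS if u != user]
    return sorted(others, key=lambda u: distance(user, u))[:n]

for u in USERS:
    print(u, "->", nearest_neighbours(u))
```

Ties (User 3 is 2.0 from both User 1 and User 4) keep list order because Python's sort is stable.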
For prediction:
Predicting review usefulness:
a) Too many zeros (zero-inflated!). Unexpected results when comparing the null and full models.
→ Zero-inflated Poisson model.