Real-Time Recommendations for Retail: Architecture, Algorithms, and Design
DESCRIPTION
Users are constantly searching for new content, and to stay competitive organizations must act immediately on up-to-date data. Outdated recommendations decrease the likelihood of presenting the right offer and make it harder to maintain customer loyalty. To provide the most relevant recommendations and increase engagement, organizations must track customer interactions and re-score recommendations on the fly. Data sources have expanded dramatically to include a wealth of historical data and a constant influx of behavior data. The key to moving from predictive models applied in batch to models that respond in real time is to focus on the efficiency of model application. The speed at which recommendations can be served is influenced by:
Architecture of the recommendation serving platform
Choice of recommendation algorithm
Datastore access patterns
In this presentation, we’ll discuss how developers can use open source components like HBase and Kiji to develop low-latency recommendation models that can be easily deployed by e-commerce companies. We will give practical advice on how to choose models and design data stores that make use of the architecture and quickly serve new recommendations.
TRANSCRIPT
![Page 1: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/1.jpg)
REAL-TIME RECOMMENDATIONS FOR RETAIL: ARCHITECTURE, ALGORITHMS, AND DESIGN
Juliet Hougland and Jonathan Natkins
![Page 2: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/2.jpg)
Who Are We?
Jonathan Natkins
Field Engineer at WibiData
Before that, Cloudera Software Engineer
Before that, Vertica Software/Field Engineer

Juliet Hougland
Data Scientist, previously at WibiData
MS in Applied Math
BA in Math-Physics
![Page 3: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/3.jpg)
Recommendations in Retail
Personalized versus Non-Personalized
![Page 6: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/6.jpg)
Recommender Contexts
Taste History
Based on everything you know about a user
Interests over months/years

Current Taste
Based on a user’s immediate history
Interests over minutes/hours

Ephemeral
Extreme version of current taste
For example, location

Demographic*
Similar to taste history, but less subjective
Geographic region, age bracket, etc.
![Page 7: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/7.jpg)
Why Does Real-Time Matter?
Relevancy
![Page 8: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/8.jpg)
I am a Special Snowflake
Natty
![Page 9: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/9.jpg)
Requirements for a Real-Time System
General System Requirements
Handle millions of customers/users
Support collection and storage of complex data (static and event-series)

Real-Time System Requirements
Quickly retrieve subsets of data for a single user
Aggregate/derive new, first-class data per user
![Page 10: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/10.jpg)
What is Kiji?
The Kiji project is a modular, open-source framework for building real-time applications that collect, store, and analyze entity-centric data
kiji.org
github.com/kijiproject
![Page 12: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/12.jpg)
Three Challenges
Developing models for use in real-time
Scoring models in real-time
Deploying models into a production environment
![Page 15: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/15.jpg)
How Can We Make Real-Time Models?
Population interests change slowly
Individual interests change quickly
Models don’t need to be retrained frequently
Application of a model should be fast
![Page 16: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/16.jpg)
A Common Workflow
Train a model over the entire dataset
Save fitted model parameters to a file or another table
Access the model parameters when generating new recommendations based on new data
This is EXPENSIVE
![Page 17: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/17.jpg)
Developing Models
KijiExpress
Scala interface for interacting with Kiji data
Uses Scalding for designing complex dataflows

Model Lifecycle
Allows analysts and data scientists to break apart a model into phases
![Page 21: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/21.jpg)
Scoring Models in Real-Time
Batch isn’t real-time
[Figure: Number of Users versus Number of Interactions. Annotations: a few users with many interactions; a lot of users with few interactions.]
![Page 26: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/26.jpg)
Fresheners Compute Lazily
Client: Read a column
KijiScoring Server: Get from HBase
Freshness Policy: If the data is fresh (yes), return to client; if not (no), run the Scorer
Scorer: Return to client, and write back for next time
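The lazy read path above can be sketched in a few lines of Python. This is a minimal illustration, not the KijiScoring API: the class `LazyFreshener`, the age-based freshness rule, and the plain dictionary standing in for HBase are all assumptions made for the example.

```python
import time
from dataclasses import dataclass


@dataclass
class Cell:
    value: float
    written_at: float  # seconds, as reported by the clock


class LazyFreshener:
    """Hypothetical stand-in for the slide's flow; not the KijiScoring API."""

    def __init__(self, store, scorer, max_age_seconds, clock=time.time):
        self.store = store                      # dict playing the role of HBase
        self.scorer = scorer                    # recomputes a value for one user
        self.max_age_seconds = max_age_seconds  # assumed age-based policy
        self.clock = clock

    def is_fresh(self, cell):
        """Freshness policy: the cell exists and is recent enough."""
        if cell is None:
            return False
        return self.clock() - cell.written_at <= self.max_age_seconds

    def read(self, user_id):
        cell = self.store.get(user_id)          # get from the store
        if self.is_fresh(cell):
            return cell.value                   # yes: return to client
        value = self.scorer(user_id)            # no: run the scorer
        self.store[user_id] = Cell(value, self.clock())  # write back for next time
        return value
```

Because scoring happens only on reads that fail the policy, users who never return cost nothing to keep up to date, which is the point of computing lazily.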
![Page 27: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/27.jpg)
Kiji Application Stack
![Page 28: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/28.jpg)
Deployment Challenges
![Page 29: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/29.jpg)
Kiji Model Repository
Link between application and models
Stores Freshener metadata
FreshnessPolicy, Scorer, attached column
Location of trained model
Stores Scorer code
Code repository makes model scoring code available to the application from a central location
New models can be deployed to the Model Repository and made immediately available to the application
![Page 30: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/30.jpg)
Kiji Model Repository
![Page 31: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/31.jpg)
Retail Recommendation
![Page 32: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/32.jpg)
Types of Recommenders
[Diagram: Recommendation Algorithms divide into Collaborative Filtering Methods and Content-Based Methods; the diagram also shows Memory-Based and Model-Based methods.]
![Page 33: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/33.jpg)
Content-Based Recommenders
Build models around entities using features that we think reflect inherent characteristics
Example features: Orange-Nosed, Lab Assistant, Meeps a lot
![Page 34: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/34.jpg)
Content-Based Recommenders
safer
faster knife
![Page 35: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/35.jpg)
Pandora: Content-Based
Expertly-Characterized Music
![Page 36: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/36.jpg)
Collaborative Filtering
Represent user-item affinities as a sparse matrix
Users ≈ Rows (e.g., Beaker)
Items ≈ Columns (e.g., Banana Slicer, Pineapple Slicer)
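A sparse matrix can be held as nested dictionaries that store only the observed entries. The sketch below is illustrative: the user `bunsen` and all affinity values are made up, and the `density` helper is a hypothetical name.

```python
from collections import defaultdict

# Observed (user, item, affinity) events; the numbers are made up for illustration.
events = [
    ("beaker", "banana_slicer", 5.0),
    ("beaker", "pineapple_slicer", 3.0),
    ("bunsen", "banana_slicer", 4.0),
]

# Users as rows: user -> {item: affinity}. Unobserved entries are simply absent.
by_user = defaultdict(dict)
# The transpose (items as rows) is the layout item-item methods scan.
by_item = defaultdict(dict)
for user, item, affinity in events:
    by_user[user][item] = affinity
    by_item[item][user] = affinity


def density(rows, n_columns):
    """Fraction of the full matrix that is actually filled in."""
    filled = sum(len(row) for row in rows.values())
    return filled / (len(rows) * n_columns)
```

With real catalogs the density is typically far below this toy example, which is why storing only the observed cells matters.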
![Page 37: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/37.jpg)
Aspirational Ratings
I put in my queue… I actually watch
![Page 39: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/39.jpg)
Collaborative Filtering: How It Works
Simple aggregate predictors
Similar Users
Similar Products
![Page 40: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/40.jpg)
Similar Entities
What do we mean by similar?
Jaccard Index: a measure of set similarity
Cosine Similarity: the cosine of the angle between two vectors
Pearson Correlation: statistical measure, similar to cosine
Naively, we could compare every entity with every other entity
…But that would not scale well with increasing numbers of entities
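The three measures named above can be written compactly in Python. This is a sketch under stated assumptions: entities are represented as sets (for Jaccard) or as sparse dictionaries of ratings (for cosine and Pearson), and Pearson is computed over co-rated keys only, which is one common convention rather than anything the slides specify.

```python
import math


def jaccard(a, b):
    """Set similarity: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(x, y):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(value * y[key] for key, value in x.items() if key in y)
    norm_x = math.sqrt(sum(value * value for value in x.values()))
    norm_y = math.sqrt(sum(value * value for value in y.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def pearson(x, y):
    """Pearson correlation over co-rated keys: cosine after mean-centering."""
    common = set(x) & set(y)
    if len(common) < 2:
        return 0.0
    mean_x = sum(x[key] for key in common) / len(common)
    mean_y = sum(y[key] for key in common) / len(common)
    centered_x = {key: x[key] - mean_x for key in common}
    centered_y = {key: y[key] - mean_y for key in common}
    return cosine(centered_x, centered_y)
```

Comparing every pair naively means n(n-1)/2 calls to one of these functions for n entities, which is why the all-pairs approach stops being practical as n grows.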
![Page 41: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/41.jpg)
Building the Similarity Matrix
![Page 42: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/42.jpg)
Collaborative Filtering: Is This Useful?
Problem: Too much data!
Tracking user preferences and all their events generates huge amounts of data

Problem: Too little data!
Dimensions of user-space and item-space are usually very large
More variables make it more difficult to generate user preferences

Problem: Cold start
If you don’t know anything about a user, what should you recommend?

Problem: More ratings mean slower computations
Identifying neighborhoods of entities is expensive
![Page 43: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/43.jpg)
Collaborative Filtering: Why Is It Useful?
Because it works
Content-agnostic
All that matters is co-occurrence of events
![Page 44: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/44.jpg)
Amazon: Item-Item Collaborative Filtering
Used for personalized recommendations
Fill screen real estate with related items
Produces specific, but non-creepy recommendations
Linden, G.; Smith, B.; York, J., "Amazon.com recommendations: item-to-item collaborative filtering," IEEE Internet Computing, vol. 7, no. 1, pp. 76-80, Jan/Feb 2003
![Page 45: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/45.jpg)
Item-Item Collaborative Filtering
Beaker buys a banana slicer
Then:
Generate list of candidate items to predict ratings for
Predict ratings for candidate items
Select Top-N items
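The three steps can be sketched as a single function. This is a minimal sketch, not the presenters' implementation: it assumes a precomputed `similar_items` lookup (each item mapped to its neighbors in descending similarity order), a similarity-weighted average as the predictor, and hypothetical names throughout.

```python
import heapq


def recommend(user_ratings, similar_items, top_n=3, neighbors=10):
    """Item-item recommendations for one user.

    user_ratings:  {item: rating} for the user being served.
    similar_items: {item: [(other_item, similarity), ...]}, precomputed in
                   batch and sorted by descending similarity (an assumption).
    """
    numerator, denominator = {}, {}
    for item, rating in user_ratings.items():
        # Step 1: candidates are neighbors of items the user already rated.
        for other, similarity in similar_items.get(item, [])[:neighbors]:
            if other in user_ratings or similarity <= 0:
                continue  # skip items already rated and non-positive similarity
            # Step 2: accumulate a similarity-weighted average of known ratings.
            numerator[other] = numerator.get(other, 0.0) + similarity * rating
            denominator[other] = denominator.get(other, 0.0) + similarity
    predicted = {item: numerator[item] / denominator[item] for item in numerator}
    # Step 3: keep only the N highest predicted ratings.
    return heapq.nlargest(top_n, predicted.items(), key=lambda pair: pair[1])
```

The work done per request depends on how many items the user has rated and on `neighbors`, not on the size of the catalog, which is what makes this shape of computation suitable for serving.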
![Page 46: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/46.jpg)
Accessing External Data
KeyValueStore API enables external data access when applying a model

External data might be…
Trained model parameters
Hierarchical/Taxonomic data
Geo-lookup

Store external data flexibly
Text files, sequence files, Kiji tables, etc.
Data access is decoupled from use during execution

If the data doesn’t fit in memory, put it in a table
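The decoupling of data access from use can be illustrated with a small interface. The sketch below is not the Kiji KeyValueStore API: the `KeyValueStore` protocol, the two store classes, and the `score` function are hypothetical names chosen for the example.

```python
import json
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """Hypothetical read-only lookup interface (not the Kiji API)."""

    def get(self, key: str) -> Optional[float]: ...


class InMemoryStore:
    """Backs lookups with a dictionary that is already in memory."""

    def __init__(self, data: Dict[str, float]):
        self._data = dict(data)

    def get(self, key: str) -> Optional[float]:
        return self._data.get(key)


class JsonFileStore:
    """Loads a small JSON object from disk once, then serves lookups from memory."""

    def __init__(self, path: str):
        with open(path, encoding="utf-8") as handle:
            self._data = json.load(handle)

    def get(self, key: str) -> Optional[float]:
        return self._data.get(key)


def score(item: str, weights: KeyValueStore, default: float = 0.0) -> float:
    """Scoring code depends only on the interface, not on where the data lives."""
    value = weights.get(item)
    return default if value is None else value
```

Swapping a file-backed store for a table-backed one then requires no change to `score`, only a different object passed in.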
![Page 49: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/49.jpg)
How Much Less Work Can We Do?
We can choose a predictor that allows us to truncate a sum
There are two ways terms in the sum of our predictor can be small
No rating
Small similarity
Ignore unrated items
Ignore dissimilar items
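The predictor formula on the slide did not survive in this transcript. As an illustration only, assume the common similarity-weighted form for item-item collaborative filtering, where $s_{ij}$ is the similarity between items $i$ and $j$, $r_{uj}$ is user $u$'s rating of item $j$, $R_u$ is the set of items $u$ has rated, and $N_k(i)$ is the set of the $k$ items most similar to $i$ (all of these symbols are introduced here, not taken from the slides):

$$
\hat{r}_{ui} \;=\; \frac{\sum_{j \in R_u \cap N_k(i)} s_{ij}\, r_{uj}}{\sum_{j \in R_u \cap N_k(i)} \lvert s_{ij} \rvert}
$$

Restricting the sum to $R_u$ drops the terms with no rating, and restricting it to $N_k(i)$ drops the terms with small similarity, so the number of terms is bounded by $\min(\lvert R_u \rvert, k)$ instead of the size of the catalog.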
![Page 51: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/51.jpg)
How Much Less Work Can We Do?
If we only present a few recommendations, we don’t need to predict ratings for all items
Choose the candidate set for rating estimation wisely, or infer it from nearest neighbors
![Page 52: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/52.jpg)
Organizing Data in Item-Item CF
![Page 53: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/53.jpg)
Accessing Data During Freshening
![Page 54: Real-time Recommendations for Retail: Architecture, Algorithms, and Design](https://reader036.vdocument.in/reader036/viewer/2022062319/558b82a8d8b42a9c3b8b4649/html5/thumbnails/54.jpg)
Want to Know More?
The Kiji Project
kiji.org
github.com/kijiproject

Questions about this presentation?
Twitter: @JulietHougland or @nattyice
Email: [email protected]