REAL-TIME RECOMMENDATIONS FOR RETAIL: ARCHITECTURE, ALGORITHMS, AND DESIGN

Juliet Hougland and Jonathan Natkins

Who Are We?

Jonathan Natkins
  Field Engineer at WibiData
  Before that, Cloudera Software Engineer
  Before that, Vertica Software/Field Engineer

Juliet Hougland
  Data Scientist, previously at WibiData
  MS in Applied Math
  BA in Math-Physics

Recommendations in Retail

Personalized versus Non-Personalized

Recommender Contexts

Taste History
  Based on everything you know about a user
  Interests over months/years

Current Taste
  Based on a user’s immediate history
  Interests over minutes/hours

Ephemeral
  Extreme version of current taste
  For example, location

Demographic*
  Similar to taste history, but less subjective
  Geographic region, age bracket, etc.

Why Does Real-Time Matter?

Relevancy

I am a Special Snowflake

Requirements for a Real-Time System

General System Requirements
  Handle millions of customers/users
  Support collection and storage of complex data
    Static and event-series

Real-Time System Requirements
  Quickly retrieve subsets of data for a single user
  Aggregate/derive new, first-class data per user

What is Kiji?

The Kiji project is a modular, open-source framework for building real-time applications that collect, store, and analyze entity-centric data

kiji.org
github.com/kijiproject

Three Challenges

Developing models for use in real-time
Scoring models in real-time
Deploying models into a production environment

How Can We Make Real-Time Models?

Population interests change slowly

Individual interests change quickly

Models don’t need to be retrained frequently

Application of a model should be fast

A Common Workflow

Train a model over the entire dataset
Save fitted model parameters to a file or another table
Access the model parameters when generating new recommendations based on new data

This is EXPENSIVE
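
The three steps of this workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "model" is a trivial item-popularity count, the parameters go to a JSON file, and the function names, user names, and items are made up for the example rather than taken from Kiji. The full pass over the dataset in `train` is presumably the step that makes the workflow costly.

```python
import json
import os
import tempfile
from collections import Counter


def train(events):
    """Batch step: fit a toy model (purchase counts per item) over the entire dataset."""
    return dict(Counter(item for _user, item in events))


def save_params(params, path):
    """Save the fitted parameters to a file."""
    with open(path, "w") as f:
        json.dump(params, f)


def load_params(path):
    """Read the fitted parameters back when a recommendation is needed."""
    with open(path) as f:
        return json.load(f)


def recommend_popular(params, already_seen, n=2):
    """Scoring step: rank items by fitted count, skipping items the user already has."""
    candidates = [item for item in params if item not in already_seen]
    return sorted(candidates, key=lambda item: (-params[item], item))[:n]


# Toy (user, item) purchase events; all names are hypothetical.
events = [
    ("beaker", "banana slicer"), ("bunsen", "banana slicer"),
    ("bunsen", "pineapple slicer"), ("beaker", "lab coat"),
    ("bunsen", "lab coat"), ("kermit", "lab coat"),
]

path = os.path.join(tempfile.mkdtemp(), "model.json")
save_params(train(events), path)
top = recommend_popular(load_params(path), already_seen={"lab coat"})
print(top)
```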

Developing Models

KijiExpress
  Scala interface for interacting with Kiji data
  Uses Scalding for designing complex dataflows

Model Lifecycle
  Allows analysts and data scientists to break apart a model into phases

Scoring Models in Real-Time

Batch isn’t real-time

[Figure: number of users plotted against number of interactions. A few users have many interactions; a lot of users have few interactions.]

Fresheners Compute Lazily

[Diagram: Client, Kiji Scoring Server, HBase]

Client: Read a column
Get from HBase
Freshness Policy: if the data is fresh, return to client
Scorer: otherwise compute a new value and return to client
Write back for next time
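
The lazy read path in the diagram can be sketched as below. This is not the Kiji scoring API: `InMemoryTable`, `is_fresh`, `fresh_read`, and the age-based policy are hypothetical stand-ins chosen for the example, with a dictionary playing the role of HBase.

```python
import time


class InMemoryTable:
    """Stand-in for the backing store; maps (row, column) to (value, timestamp)."""

    def __init__(self):
        self.cells = {}

    def get(self, row, column):
        return self.cells.get((row, column))

    def put(self, row, column, value, timestamp):
        self.cells[(row, column)] = (value, timestamp)


def is_fresh(cell, now, max_age_seconds):
    """Assumed freshness policy: the cell exists and is no older than max_age_seconds."""
    return cell is not None and now - cell[1] <= max_age_seconds


def fresh_read(table, row, column, scorer, max_age_seconds, now=None):
    """Read a column, running the scorer only when the stored value is stale or missing."""
    now = time.time() if now is None else now
    cell = table.get(row, column)           # get from the store
    if is_fresh(cell, now, max_age_seconds):
        return cell[0]                      # fresh: return to client
    value = scorer(row)                     # stale: compute a new value
    table.put(row, column, value, now)      # write back for next time
    return value


calls = []


def toy_scorer(row):
    """Hypothetical scorer that records each invocation."""
    calls.append(row)
    return ["pineapple slicer"]


table = InMemoryTable()
first = fresh_read(table, "beaker", "recs", toy_scorer, max_age_seconds=60, now=1000.0)
second = fresh_read(table, "beaker", "recs", toy_scorer, max_age_seconds=60, now=1030.0)
third = fresh_read(table, "beaker", "recs", toy_scorer, max_age_seconds=60, now=1100.0)
print(len(calls))
```

In this sketch the scorer runs on the first read (nothing stored yet) and on the third (the stored value is older than 60 seconds), while the second read is served straight from the table. Work is only done for users who actually show up.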

Kiji Application Stack

Deployment Challenges

Kiji Model Repository

Link between application and models
Stores Freshener metadata
  FreshnessPolicy, Scorer, attached column
  Location of trained model
Stores Scorer code
  Code repository makes model scoring code available to the application from a central location
New models can be deployed to the Model Repository and made immediately available to the application

Retail Recommendation

Types of Recommenders

Recommendation Algorithms
  Collaborative Filtering Methods
    Memory-Based
    Model-Based
  Content-Based Methods

Content-Based Recommenders

Orange-Nosed

Lab Assistant

Meeps a lot

Build models around entities using features that we think reflect inherent characteristics

Content-Based Recommenders

faster knife

Pandora: Content-Based

Expertly-Characterized Music

Collaborative Filtering

Represent user-item affinities as a sparse matrix
Users ≈ Rows
Items ≈ Columns

[Figure: example matrix with Beaker as a user and Banana Slicer and Pineapple Slicer as items]

Aspirational Ratings

I put in my queue… I actually watch

Collaborative Filtering

Simple aggregate predictors

Collaborative Filtering: How It Works

Similar Users
Similar Products

Similar Entities

What do we mean by similar?
  Jaccard Index: a measure of set similarity
  Cosine Similarity: the cosine of the angle between two vectors
  Pearson Correlation: statistical measure, similar to cosine
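
A minimal sketch of the three measures, using only the Python standard library. The function names are chosen for this example. Pearson correlation is written here as cosine similarity applied to mean-centered vectors, which is one way to see why the two are described as similar.

```python
import math


def jaccard(a, b):
    """Size of the intersection over size of the union of two sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0


def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centered vectors."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return cosine([p - mean_x for p in x], [q - mean_y for q in y])


print(jaccard({"a", "b", "c"}, {"b", "c", "d"}))
print(cosine([1, 0], [0, 1]))
print(pearson([1, 2, 3], [3, 2, 1]))
```

Jaccard suits binary data such as "bought or did not buy", while cosine and Pearson work on numeric ratings.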

Naively, we could compare every entity to each other

…But that would not scale well with increasing numbers of entities

Building the Similarity Matrix
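
One common way to avoid comparing every item with every other item is to walk over each user's items and count only the pairs that actually co-occur. The sketch below does this for binary purchase data and is an assumption about how such a matrix might be built, not a reconstruction of the slide's figure. The function name, user names, and items are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def item_similarities(baskets):
    """baskets: {user: set of items}. Returns a sparse {(item_i, item_j): cosine similarity}."""
    item_counts = defaultdict(int)   # number of users who have each item
    co_counts = defaultdict(int)     # number of users who have both items of a pair
    for items in baskets.values():
        for item in items:
            item_counts[item] += 1
        # Only pairs that share at least one user are ever touched.
        for i, j in combinations(sorted(items), 2):
            co_counts[(i, j)] += 1
    # For binary vectors, cosine similarity is co-occurrences / sqrt(count_i * count_j).
    return {
        (i, j): co / math.sqrt(item_counts[i] * item_counts[j])
        for (i, j), co in co_counts.items()
    }


baskets = {
    "beaker": {"banana slicer", "pineapple slicer"},
    "bunsen": {"banana slicer", "pineapple slicer", "lab coat"},
    "kermit": {"banjo"},
}
sims = item_similarities(baskets)
print(sims)
```

Pairs that never co-occur (here, anything involving "banjo") are simply absent from the result, so the output stays sparse. The cost is driven by basket sizes rather than by the square of the catalog size.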

Collaborative Filtering: Is This Useful?

Problem: Too much data!
  Tracking user preferences and all their events generates huge amounts of data

Problem: Too little data!
  Dimensions of user-space and item-space are usually very large
  More variables make it more difficult to generate user preferences

Problem: Cold start
  If you don’t know anything about a user, what should you recommend?

Problem: More ratings mean slower computations
  Identifying neighborhoods of entities is expensive

Collaborative Filtering: Why Is It Useful?

Because it works
Content-agnostic
  All that matters is co-occurrence of events

Amazon: Item-Item Collaborative Filtering

Used for personalized recommendations
Fill screen real estate with related items
Produces specific, but non-creepy recommendations

Linden, G.; Smith, B.; York, J., "Amazon.com recommendations: item-to-item collaborative filtering," IEEE Internet Computing, vol. 7, no. 1, pp. 76-80, Jan/Feb 2003

Item-Item Collaborative Filtering

Beaker buys a banana slicer
Then:
  Generate list of candidate items to predict ratings for
  Predict ratings for candidate items
  Select Top-N items
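
The three steps can be sketched as follows. The similarity-weighted average used for prediction is a common choice and an assumption here, since the deck does not spell out its predictor. The function name, the `neighbors` structure, and the toy items and similarity values are all hypothetical.

```python
def recommend_top_n(user_ratings, neighbors, n=2):
    """user_ratings: {item: rating}; neighbors: {item: {similar_item: similarity}}."""
    # Step 1: candidates are unrated items that are similar to something the user rated.
    candidates = {
        other
        for item in user_ratings
        for other in neighbors.get(item, {})
        if other not in user_ratings
    }

    # Step 2: predict each candidate's rating as a similarity-weighted average
    # of the user's existing ratings.
    predictions = {}
    for candidate in candidates:
        weighted_sum = 0.0
        total_weight = 0.0
        for item, rating in user_ratings.items():
            similarity = neighbors.get(item, {}).get(candidate, 0.0)
            weighted_sum += similarity * rating
            total_weight += abs(similarity)
        if total_weight > 0:
            predictions[candidate] = weighted_sum / total_weight

    # Step 3: keep the N highest predictions (ties broken by item name).
    ranked = sorted(predictions.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


user_ratings = {"banana slicer": 5.0, "lab coat": 1.0}
neighbors = {
    "banana slicer": {"pineapple slicer": 0.9, "safety goggles": 0.1},
    "lab coat": {"safety goggles": 0.9, "pineapple slicer": 0.1},
}
top = recommend_top_n(user_ratings, neighbors)
print(top)
```

With these toy numbers the pineapple slicer ranks first, because it is most similar to the item the user rated highly.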

Accessing External Data

KeyValueStore API enables external data access when applying a model
External data might be…
  Trained model parameters
  Hierarchical/Taxonomic data
  Geo-lookup

Store external data flexibly
  Text files, sequence files, Kiji tables, etc.
  Data access is decoupled from use during execution

If the data doesn’t fit in memory, put it in a table

How Much Less Work Can We Do?

We can choose a predictor that allows us to truncate a sum

There are two ways terms in the sum of our predictor can be small
  No rating
  Small similarity

Ignore unrated items

Ignore dissimilar items

If we only present a few recommendations, we don’t need to predict ratings for all items
Choose your candidate set to estimate ratings wisely or infer from nearest neighbors
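
To make the truncation concrete, assume a weighted-sum predictor in which the predicted rating for a candidate is the sum of similarity times rating over the user's items, divided by the sum of the similarities. That formula is an assumption for illustration, as are the function name, the `k` and `min_similarity` parameters, and the toy numbers. Similarities are assumed non-negative.

```python
import heapq


def predict(user_ratings, similarity_to_candidate, k=None, min_similarity=0.0):
    """Weighted-sum prediction for one candidate item, with optional truncation."""
    # Ignore unrated items: only items the user rated contribute a term at all.
    terms = [
        (similarity_to_candidate.get(item, 0.0), rating)
        for item, rating in user_ratings.items()
    ]
    # Ignore dissimilar items: drop terms at or below the similarity threshold.
    terms = [(s, r) for s, r in terms if s > min_similarity]
    # Optionally keep only the k most similar rated items.
    if k is not None:
        terms = heapq.nlargest(k, terms, key=lambda term: term[0])
    total_similarity = sum(s for s, _ in terms)
    if total_similarity == 0:
        return None  # nothing to base a prediction on
    return sum(s * r for s, r in terms) / total_similarity


user_ratings = {"banana slicer": 5.0, "pineapple slicer": 4.0, "lab coat": 1.0}
# Similarity of each rated item to a hypothetical candidate, "melon baller".
similarity_to_candidate = {"banana slicer": 0.9, "pineapple slicer": 0.8, "lab coat": 0.01}

full = predict(user_ratings, similarity_to_candidate)
top_two = predict(user_ratings, similarity_to_candidate, k=2)
thresholded = predict(user_ratings, similarity_to_candidate, min_similarity=0.05)
print(full, top_two, thresholded)
```

In this toy case, dropping the barely similar item changes the prediction by only about 0.02, which is the sense in which the small terms of the sum can be skipped.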

Organizing Data in Item-Item CF

Accessing Data During Freshening

Want to Know More?

The Kiji Project
  kiji.org
  github.com/kijiproject

Questions about this presentation?
  Twitter: @JulietHougland or @nattyice
  Email: natty@wibidata.com
