Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Uploaded by: mongodb

Posted on 08-Sep-2014

DESCRIPTION

During this session we will cover the best practices for implementing the insight component with MongoDB. This includes efficiently ingesting and managing a large volume of user activity logs, such as clickstreams, views, likes and sales. We'll dive into how you can derive user statistics, product maps and trends using different analytics tools like the aggregation framework, map/reduce or the Hadoop connector. We will also cover operational considerations, including low-latency data ingestion and seamless aggregation queries.

TRANSCRIPT

Page 1

Retail Reference Architecture with MongoDB

Antoine Girbal, Principal Solutions Engineer, MongoDB Inc., @antoinegirbal

Page 2

Introduction

Page 3

MongoDB Overview

Page 4

MongoDB Strategic Advantages

Horizontally Scalable - Sharding

Agile, Flexible

High Performance & Strong Consistency

Application

Highly Available - Replica Sets

{ customer: "roger", date: new Date(), comment: "Spirited Away", tags: ["Tezuka", "Manga"] }

Page 5

Documents let you build your data to fit your application

Relational:

CustomerID | First Name | Last Name | City
0 | John | Doe | New York
1 | Mark | Smith | San Francisco
2 | Jay | Black | Newark
3 | Meagan | White | London
4 | Edward | Danields | Boston

Order Number | Store ID | Product | Customer ID
10 | 100 | Tablet | 0
11 | 101 | Smartphone | 0
12 | 101 | Dishwasher | 0
13 | 200 | Sofa | 1
14 | 200 | Coffee table | 1
15 | 201 | Suit | 2

MongoDB:

{ customer_id: 1,
  name: "Mark Smith",
  city: "San Francisco",
  orders: [
    { order_number: 13,
      store_id: 10,
      date: "2014-01-03",
      products: [
        { SKU: 24578234, Qty: 3, Unit_price: 350 },
        { SKU: 98762345, Qty: 1, Unit_price: 110 }
      ]
    },
    { <...> }
  ]
}

Page 6

Notions

RDBMS | MongoDB
Database | Database
Table | Collection
Row | Document
Column | Field

Page 7

Architecture Overview

Page 8

Architecture Overview

(Diagram) Customer channels: Amazon, Ebay…; Stores (POS, Kiosk); Mobile (Smartphone, Tablet); Website; Contact Center; Social (Facebook, Twitter…).

Web Servers, Application Servers, and the API, Data and Service Integration layer.

Information Management: Merchandising, Content, Inventory, Customer, Channel, Sales & Fulfillment, Insight, Social.

Also shown: Data Warehouse, Analytics, Supply Chain Management System, Suppliers, 3rd Party, In Network.

Page 9

Commerce Functional Components

(Diagram) Information Layer: Look & Feel, Navigation, Customization, Personalization, Branding, Promotions, Chat, Ads.

Customer's Perspective: Research / Browse / Search; Select / Shopping Cart; Purchase / Checkout; Receive / Track; Use / Feedback / Maintain; Dialog / Assist.

Market / Offer: Guide, Offer, Semantic Search, Recommend, Rule-based Decisions, Pricing, Coupons.

Sell / Fulfill: Orders, Payments, Fraud Detection, Fulfillment, Business Rules.

Insight: Session Capture, Activity Monitoring.

Customer, Enterprise.

Information Management: Merchandising, Content, Inventory, Customer, Channel, Sales & Fulfillment, Insight, Social.

Page 10

Merchandising

Page 11

Merchandising

(Diagram) Merchandising components backed by MongoDB: Product Variation, Product Hierarchy, Pricing, Promotions, Ratings & Reviews, Calendar, Semantic Search, Product Definition, Localization.

Page 12

Merchandising - principles

• Single view of a product: one scalable catalog service used by all services and channels

• Read volume is high and sustained

• Write volume spikes during catalog updates, while real-time updates of individual products must also be possible

• Advanced indexing and querying is a requirement: find products by SKU, category, color, etc.

• Geographical distribution and low latency are achieved through replication

• Scaling is achieved through sharding

Page 13

Merchandising - requirements

Requirement | Example Challenge | MongoDB
Single view of product | Blended description and hierarchy of product to ensure availability on all channels | Flexible document-oriented storage
High sustained read volume with low latency | Constant querying from online users and sales associates, requiring immediate response | Fast indexed querying; replication allows a local copy of the catalog; sharding for scaling
Spiky and real-time write volume | Bulk update of the full catalog without impacting production; real-time touch updates | Fast in-place updating, real-time indexing, sharding for scaling
Advanced querying | Find products based on color, size, description | Ad-hoc querying on any field, advanced secondary and compound indexing

Page 14

Merchandising - Product Page

(Figure: a product page annotated with Product images, General Information, List of Variations, External Information and Localized Description.)

Page 15

Merchandising - Product Definition

> db.definitions.findOne()
{ productId: "301671", // main product id
  department: "Shoes",
  category: "Shoes/Women/Pumps",
  brand: "Guess",
  thumbnail: "http://cdn…/pump.jpg",
  image: "http://cdn…/pump1.jpg", // larger version of thumbnail
  title: "Evening Platform Pumps",
  description: "Those evening platform pumps put the perfect finishing touches on your most glamorous night-on-the-town outfit",
  shortDescription: "Evening Platform Pumps",
  style: "Designer",
  type: "Platform",
  rating: 4.5, // user rating
  lastUpdated: ISODate("2014-04-01"), // last update time
  … }

Page 16

Merchandising - Product Definition

• Get item from Product Id

db.definitions.findOne( { productId: "301671" } )

• Get items from Product Ids

db.definitions.find( { productId: { $in: [ "301671", "301672" ] } } )

• Get items by department

db.definitions.find( { department: "Shoes" } )

• Get items by category prefix

db.definitions.find( { category: /^Shoes\/Women/ } )

• Indices

productId, department, category, lastUpdated
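The category-prefix query stays index-friendly only if the regex is anchored with `^` and case-sensitive. As a minimal sketch, a hypothetical helper `categoryQuery` could build such a filter from a user-supplied prefix, escaping regex metacharacters so that category names are matched literally. It only builds the filter object that could be passed to `db.definitions.find()`; it does not connect to MongoDB.

```javascript
// Hypothetical helper: escape characters that have a special meaning in regular
// expressions so the category prefix is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build { category: /^<prefix>/ }. The pattern is anchored with ^ and has no
// "i" flag: a case-sensitive prefix regex is the form MongoDB can answer by
// scanning a bounded range of the index on `category`.
function categoryQuery(prefix) {
  return { category: new RegExp("^" + escapeRegex(prefix)) };
}

const shoes = categoryQuery("Shoes/Women");
console.log(shoes.category.test("Shoes/Women/Pumps")); // true
console.log(shoes.category.test("Kids/Shoes/Women")); // false

// Parentheses are escaped, so they match literally instead of forming a group
const tops = categoryQuery("Tops (Women)");
console.log(tops.category.test("Tops (Women)/Shirts")); // true
console.log(tops.category.test("Tops Women/Shirts")); // false
```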

Page 17

Merchandising - Product Variation

> db.variations.findOne()
{
  _id: "730223104376", // the sku
  productId: "301671", // references product id
  thumbnail: "http://cdn…/pump-red.jpg",
  image: "http://cdn…/pump-red.jpg", // larger version of thumbnail
  size: 6.0,
  color: "Red",
  width: "B",
  heelHeight: 5.0,
  lastUpdated: ISODate("2014-04-01") // last update time
}

Page 18

Merchandising - Product Variation

• Get variation from SKU

db.variations.findOne( { _id: "730223104376" } )

• Get all variations for a product, sorted by SKU

db.variations.find( { productId: "301671" } ).sort( { _id: 1 } )

• Indices

productId, lastUpdated

Page 19

Merchandising – Pricing

Price: {
  _id: "sku730223104376_store123",
  currency: "USD",
  price: 89.95,
  lastUpdated: ISODate("2014-04-01") // last update time
}

_id: concatenation of item and store.

Store: can be a store group or a store id.

Item: can be an item id or a SKU.

Indices: lastUpdated

Page 20

Merchandising - Pricing

• Get all prices for a given item

db.prices.find( { _id: /^p301671_/ } )

• Get all prices for a given sku (price could be at item level)

db.prices.find( { _id: { $in: [ /^sku730223104376_/, /^p301671_/ ] } } )

• Get minimum and maximum prices for a sku

db.prices.aggregate( [
  { $match: { _id: { $in: [ /^sku730223104376_/, /^p301671_/ ] } } },
  { $group: { _id: null, min: { $min: "$price" }, max: { $max: "$price" } } }
] )

• Get price for a sku and store id (returns up to 4 prices)

db.prices.find( { _id: { $in: [ "sku730223104376_store1234",
                                "sku730223104376_sgroup0",
                                "p301671_store1234",
                                "p301671_sgroup0" ] } }, { price: 1 } )
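The four `_id` values in the last query follow one naming pattern, so an application can generate them and then choose among the returned prices. Below is a minimal sketch, assuming the application prefers the most specific price in the order SKU + store, SKU + store group, item + store, item + store group. The slides only say that the query returns up to 4 prices, so this precedence, the helper names `priceKeys` and `resolvePrice`, and the sample prices are illustrative assumptions.

```javascript
// Build the candidate _id values for a SKU in a store, most specific first.
// Key format follows the examples: "sku<sku>_<store>" and "p<itemId>_<store>".
function priceKeys(sku, itemId, storeId, storeGroup) {
  return [
    `sku${sku}_${storeId}`,
    `sku${sku}_${storeGroup}`,
    `p${itemId}_${storeId}`,
    `p${itemId}_${storeGroup}`,
  ];
}

// Pick the most specific price among the documents returned by
// db.prices.find({ _id: { $in: keys } }, { price: 1 }) (assumed precedence).
function resolvePrice(priceDocs, keys) {
  const byId = new Map(priceDocs.map((doc) => [doc._id, doc]));
  for (const key of keys) {
    if (byId.has(key)) return byId.get(key);
  }
  return null; // no price defined at any level
}

const keys = priceKeys("730223104376", "301671", "store1234", "sgroup0");
// Hypothetical query result: no store-specific price exists for this SKU
const returned = [
  { _id: "p301671_sgroup0", price: 99.95 },
  { _id: "sku730223104376_sgroup0", price: 89.95 },
];
console.log(resolvePrice(returned, keys).price); // 89.95
```

Resolving in the application keeps the database work to a single `$in` lookup on `_id`, which the default `_id` index serves directly.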

Page 21

Merchandising – Product Hierarchy

• The hierarchy of items typically follows:
  • Company
    – Division
      • Department: Women's shoe store
        – Class: Pumps
          » Item: Guess classic pump
            • Variation: size 6 black

Page 22

Merchandising – Browse and Search products

(Figure: a product listing page with Browse by category, Special Lists and Filter by attributes, listing hundreds of item summaries.)

Ideally, a single query is issued to the database to obtain all items and metadata to display.

Page 23

Merchandising – Browse and Search products

The previous page presents many challenges:

• A response is needed within milliseconds for hundreds of items

• Faceted search on many attributes of an item: department, brand, category, etc.

• Attributes to match may be at the variation level (color, size, etc.), in which case the matching variation should be shown

• One item may have thousands of variations, but only one item should be displayed even if many variations match

• Efficient sorting on several attributes: price, popularity

• Pagination, which requires deterministic ordering

Page 24

Merchandising – Browse and Search products

(Figure: one item offered in dozens of colors and hundreds of sizes.)

A single item may have thousands of variations.

Page 25

Merchandising – Browse and Search products

(Figure: a search results page annotated with Hierarchy, Sort parameter and Faceted Search.)

Images of the matching variations are displayed.

Page 26

Merchandising – Traditional Architecture

(Diagram) A Relational DB (System of Records), a Full Text Search Engine fed by Indexing, and an Application Cache holding data pre-joined into objects. #1 obtain search result IDs; #2 obtain objects by ID.

Page 27

Merchandising – Traditional Architecture

The traditional architecture presents issues:

• 3 different systems to maintain: RDBMS, search engine, caching layer

• A search returns a list of IDs, which are then looked up in the cache as a batch or one by one; this significantly increases response latency

• The RDBMS schema is complex and static

• The search index needs to be refreshed at intervals

• The setup does not allow efficient pagination

Page 28

Merchandising - Architecture

(Diagram) A MongoDB Data Store holding Product Summaries, Product Definitions, Product Variations, Pricing, Promotions, Ratings & Reviews. #1 Obtain results.

Page 29

Merchandising – Product Summaries

The product index relies on the following parameters:

• The department (required): the main component of the category, e.g. "Shoes"

• An indexed attribute (optional)

– Category path, e.g. "Shoes/Women/Pumps"

– Price range (based on online prices)

– List of item attributes, e.g. Brand = Guess

– List of variation attributes, e.g. Color = red

• A non-indexed attribute (optional)

– List of item secondary attributes, e.g. Style = Designer

– List of variation secondary attributes, e.g. Heel height = 5.0

• A sort order, e.g. Price Low to High

Page 30

Merchandising – Product Summaries

> db.summaries.findOne()
{ "_id": "p39",
  "title": "Evening Platform Pumps 39",
  "department": "Shoes",
  "category": "Shoes/Women/Pumps",
  "thumbnail": "http://cdn…/pump-small-39.jpg",
  "image": "http://cdn…/pump-39.jpg",
  "price": 145.99,
  "rating": 0.95,
  "attrs": [ { "brand": "Guess" }, … ],
  "sattrs": [ { "style": "Designer" }, { "type": "Platform" }, … ],
  "vars": [
    { "sku": "sku2441",
      "thumbnail": "http://cdn…/pump-small-39.jpg.Blue",
      "image": "http://cdn…/pump-39.jpg.Blue",
      "attrs": [ { "size": 6.0 }, { "color": "Blue" }, … ],
      "sattrs": [ { "width": "B" }, { "heelHeight": 5.0 }, … ]
    }, … Many more skus …
  ] }

Indices: vars.sku, department + attrs + category, department + vars.attrs + category, department + category, department + price, department + rating

Page 31

Merchandising - Product Summaries

• Get summary from item id

db.summaries.find( { _id: "p301671" } )

• Get a summary's specific variation from SKU

db.summaries.find( { "vars.sku": "730223104376" }, { "vars.$": 1 } )

• Get summaries by department, sorted by rating

db.summaries.find( { department: "Shoes" } ).sort( { rating: 1 } )

• Get summaries with a mix of parameters

db.summaries.find( { department: "Shoes",
                     "vars.attrs": { "color": "Gray" },
                     "category": /^Shoes\/Women/,
                     "price": { "$gte": 65.99, "$lte": 180.99 } } )
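The mixed-parameter query can be assembled from the request parameters listed for the product index: a required department, then optional category, price range and attributes. Below is a minimal sketch with a hypothetical `buildSummaryQuery` whose parameter names are illustrative. It only produces the filter and sort objects that could be passed to `db.summaries.find(filter).sort(sort)`. Appending `_id` as a final sort key is an added assumption, meant to give pagination the deterministic order it requires.

```javascript
// Escape regex metacharacters so the category prefix is matched literally
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical builder: assembles plain objects, does not query MongoDB
function buildSummaryQuery(params) {
  if (!params.department) throw new Error("department is required");
  const filter = { department: params.department };

  if (params.categoryPrefix) {
    // Anchored, case-sensitive prefix so a department + category index applies
    filter.category = new RegExp("^" + escapeRegex(params.categoryPrefix));
  }
  if (params.minPrice !== undefined || params.maxPrice !== undefined) {
    filter.price = {};
    if (params.minPrice !== undefined) filter.price.$gte = params.minPrice;
    if (params.maxPrice !== undefined) filter.price.$lte = params.maxPrice;
  }
  // attrs / vars.attrs hold one-field documents such as { brand: "Guess" }, so an
  // equality match on the whole sub-document finds the matching array element.
  if (params.itemAttr) filter.attrs = params.itemAttr;
  if (params.variationAttr) filter["vars.attrs"] = params.variationAttr;

  // _id as the last sort key gives a deterministic order for pagination
  const sort = { ...(params.sort || {}), _id: 1 };
  return { filter, sort };
}

const { filter, sort } = buildSummaryQuery({
  department: "Shoes",
  categoryPrefix: "Shoes/Women",
  minPrice: 65.99,
  maxPrice: 180.99,
  variationAttr: { color: "Gray" },
  sort: { price: 1 },
});
console.log(JSON.stringify(Object.keys(filter))); // ["department","category","price","vars.attrs"]
console.log(JSON.stringify(sort)); // {"price":1,"_id":1}
```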

Page 32

Merchandising – Query stats

Department | Category | Price | Primary attribute | Average time (ms) | 90th percentile (ms) | 95th percentile (ms)
1 | 0 | 0 | 0 | 2 | 3 | 3
1 | 1 | 0 | 0 | 1 | 2 | 2
1 | 0 | 1 | 0 | 1 | 2 | 3
1 | 1 | 1 | 0 | 1 | 2 | 2
1 | 0 | 0 | 1 | 0 | 1 | 2
1 | 1 | 0 | 1 | 0 | 1 | 1
1 | 0 | 1 | 1 | 1 | 2 | 2
1 | 1 | 1 | 1 | 0 | 1 | 1
1 | 0 | 0 | 2 | 1 | 3 | 3
1 | 1 | 0 | 2 | 0 | 2 | 2
1 | 0 | 1 | 2 | 10 | 20 | 35
1 | 1 | 1 | 2 | 0 | 1 | 1

Page 33

Content

Page 34

Content

(Diagram) Content components backed by MongoDB: Metadata, Asset Repository, Digital Right Mgt, Access Control, Processing / Encoding.

Page 35

Inventory

Page 36

Inventory

(Diagram) Inventory components backed by MongoDB: External Inventory, Internal Inventory, Regional Inventory, Purchase Orders, Fulfillment, Promotions.

Page 37

Demonstration Document Model

(Diagram) Product:

Definitions • id: p0

Variations • id: sku0 • pId: p0

Summary • id: p0 • vars: [sku0, sku1, …]

Stores • id: s1 • Loc: [22, 33]

Inventory • store: s1 • pId: p0 • vars: [{sku: sku0, q: 3}, {sku: sku2, q: 2}]

Page 38

Inventory - Stores

db.stores.findOne()
{ "_id": ObjectId("53549fd3e4b0aaf5d6d07f35"),
  "className": "catalog.Store",
  "storeId": "store0",
  "name": "Bessemer store",
  "address": {
    "addr1": "1st Main St",
    "city": "Bessemer",
    "state": "AL",
    "zip": "12345",
    "country": "US"
  },
  "location": [ -86.95444, 33.40178 ],
  … }

Page 39

Inventory - Stores

• Get a store by storeId

db.stores.findOne( { storeId: "store0" } )

• Get nearby stores sorted by distance

db.runCommand( { geoNear: "stores", near: [ -82.800672, 40.090844 ], maxDistance: 10.0, spherical: true } )
// with legacy [lng, lat] pairs and spherical: true, maxDistance is expressed in radians

Page 40

Inventory - Quantities

> db.inventory.findOne()
{ "_id": "5354869f300487d20b2b011d",
  "storeId": "store0",
  "location": [ -86.95444, 33.40178 ],
  "productId": "p0",
  "vars": [
    { "sku": "sku1", "q": 14 },
    { "sku": "sku3", "q": 7 },
    { "sku": "sku7", "q": 32 },
    { "sku": "sku14", "q": 65 },
    ...
  ]
}

Page 41

Inventory - Stores

• Get all items in a store

db.inventory.find( { storeId: "store100" } )

• Get quantity for an item at a store

db.inventory.find( { storeId: "store100", productId: "p200" } )

• Get quantity for a sku at a store

db.inventory.find( { storeId: "store100", productId: "p200", "vars.sku": "sku11736" }, { "vars.$": 1 } )

• Increment / decrement inventory for an item at a store

db.inventory.update( { storeId: "store100", productId: "p200", "vars.sku": "sku11736" }, { $inc: { "vars.$.q": 20 } } )

• Indices: productId, storeId + productId, location (geo) + productId

Page 42

Inventory - Stores

• Aggregate total quantity for an item

db.inventory.aggregate( [
  { $match: { productId: "p200" } },
  { $unwind: "$vars" },
  { $group: { _id: "result", count: { $sum: 1 } } }
] )
{ "_id" : "result", "count" : 101752 }

• Aggregate total quantity for a store

db.inventory.aggregate( [
  { $match: { storeId: "store100" } },
  { $unwind: "$vars" },
  { $group: { _id: "result", count: { $sum: 1 } } }
] )
{ "_id" : "result", "count" : 29347 }
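In the two pipelines above, `{ $sum: 1 }` adds 1 for each unwound `vars` entry. The result therefore counts store/SKU entries rather than units in stock. Summing the quantity field instead, for example `{ $group: { _id: "result", total: { $sum: "$vars.q" } } }`, would add up the quantities. The in-memory sketch below imitates `$match`, `$unwind` and `$group` on plain arrays so the two results can be compared. The sample data is made up, and the code illustrates the pipeline's semantics, not how MongoDB executes it.

```javascript
// Imitates { $unwind: "$vars" }: one record per element of the vars array
function unwindVars(docs) {
  return docs.flatMap((doc) => doc.vars.map((v) => ({ ...doc, vars: v })));
}

// Imitates $match (simple equality) followed by two $group accumulators
function aggregateInventory(docs, match) {
  const matched = docs.filter((doc) =>
    Object.entries(match).every(([field, value]) => doc[field] === value)
  );
  const rows = unwindVars(matched);
  return {
    entries: rows.length, // what { $sum: 1 } returns
    units: rows.reduce((total, row) => total + row.vars.q, 0), // { $sum: "$vars.q" }
  };
}

// Hypothetical sample data shaped like the inventory documents above
const inventory = [
  { storeId: "store0", productId: "p200", vars: [{ sku: "sku1", q: 14 }, { sku: "sku3", q: 7 }] },
  { storeId: "store1", productId: "p200", vars: [{ sku: "sku1", q: 32 }] },
  { storeId: "store1", productId: "p300", vars: [{ sku: "sku9", q: 5 }] },
];

console.log(aggregateInventory(inventory, { productId: "p200" })); // { entries: 3, units: 53 }
console.log(aggregateInventory(inventory, { storeId: "store1" })); // { entries: 2, units: 37 }
```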

Page 43

Inventory - Stores

• Get inventory for an item near a point

db.runCommand( { geoNear: "inventory", near: [ -82.800672, 40.090844 ], maxDistance: 10.0, spherical: true, limit: 10,
                 query: { productId: "p200", "vars.sku": "sku11736" } } )

• Get closest store with available sku

db.runCommand( { geoNear: "inventory", near: [ -82.800672, 40.090844 ], maxDistance: 10.0, spherical: true, limit: 10,
                 query: { productId: "p200", vars: { $elemMatch: { sku: "sku11736", q: { $gt: 0 } } } } } )
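The second query wraps the SKU and quantity conditions in `$elemMatch`. With an array of sub-documents, separate conditions on `"vars.sku"` and `"vars.q"` can each be satisfied by a different element. The plain-JavaScript sketch below mimics both matching rules on one hypothetical inventory document in which the requested SKU is out of stock while another SKU is not.

```javascript
// Rule for { "vars.sku": sku, "vars.q": { $gt: 0 } }: each condition may be
// satisfied by a different array element
function matchesSeparately(doc, sku) {
  return doc.vars.some((v) => v.sku === sku) && doc.vars.some((v) => v.q > 0);
}

// Rule for { vars: { $elemMatch: { sku: sku, q: { $gt: 0 } } } }: a single
// element must satisfy both conditions
function matchesElemMatch(doc, sku) {
  return doc.vars.some((v) => v.sku === sku && v.q > 0);
}

// Hypothetical inventory document
const store = {
  storeId: "store7",
  productId: "p200",
  vars: [
    { sku: "sku11736", q: 0 }, // requested sku is out of stock here
    { sku: "sku11737", q: 4 },
  ],
};

console.log(matchesSeparately(store, "sku11736")); // true  (false positive)
console.log(matchesElemMatch(store, "sku11736")); // false (correct: no stock)
```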

Page 44

Customer

Page 45

Customer

(Diagram) Customer components backed by MongoDB: Profile, Market Segment, Demographics, Wish List, Preference, Inbox, Sales / Support Chat, Content Subscription.

Page 46

Channels

Page 47

Channels

(Diagram) Channels components backed by MongoDB: Location, Store, Assortment, Point of Sale, Channel Definition, Planogram.

Page 48

Sales & Fulfillment

Page 49

Sales & Fulfillment

(Diagram) Sales & Fulfillment components backed by MongoDB: Sales Transaction, Shipping, Tracking, Return & Exchange, Business Rule, Audit, Shopping Cart.

Page 50

Insight

Page 51

Insight

(Diagram) Insight components backed by MongoDB: Advertising metrics, Clickstream, Recommendations, Session Capture, Activity Logging, Geo Tracking, Product Analytics, Customer Insight, Application Logs.

Page 52

Activity Logging – Data of interest

• Many user activities can be of interest:

– Search

– Product view, like or wish

– Shopping cart add / remove

– Sharing on social network

– Ad impression, Clickstream

• Those will be used to compute:

– Product Map (relationships, etc.)

– User Preferences

– Recommendations

– Trends

Page 53

Activity logging - Architecture

(Diagram) All user activity is recorded through the HVDF API into MongoDB (Activity Logging, User History).

Internal Analytics: Aggregation, MR.

External Analytics: Hadoop, Spark, Storm, via the MongoDB – Hadoop Connector.

Results: User Preferences, Recommendations, Trends, Product Map, used by Apps for Personalization.

Page 54

Activity Logging

Page 55

Activity Logging – Problem statement

• You need to store and manage an incoming stream of data samples (views, impressions, orders, …)

– High arrival rate of data from many sources

– Variable schema of arriving data

– Control over the retention period of data

• You need to compute derivative data sets based on these samples

– Aggregations and statistics based on data

– Roll-up of data into pre-computed reports and summaries

• You need low latency access to up-to-date data (user history)

– Flexible indexing of raw and derived data sets

– Rich querying based on time + metadata fields in samples

Page 56

Activity logging - Requirements

Requirement | MongoDB
Ingestion of hundreds of thousands of writes / sec | Fast C++ process, multi-threads, multi-locks. Horizontal scaling via sharding. Sequential IO via time partitioning.
Flexible schema | Dynamic schema: each document is independent. Data is stored in the same format and size as it is inserted.
Fast querying on varied fields, sorting | Secondary B-tree indexes can look up and sort the data in milliseconds.
Easy clean up of old data | Deletes are typically as expensive as inserts. Getting free deletes via time partitioning.

Page 57

Activity Logging using HVDF

HVDF (High Volume Data Feed):

• Open source reference implementation of high volume writing with MongoDB

• REST API server written in Java using popular libraries

• Public project: issues can be logged

• Can be run as-is, or customized as needed

Page 58

High volume data feed architecture

(Diagram) A Feed in which Sources send Samples into a Channel, and Processors apply inline, batch or stream processing.

The Channel is the sequence of data samples that a sensor sends into the platform.

Sources send samples into the Channel.

Processors generate derivative Channels from other Channel data.

Page 59

HVDF – Reference implementation

(Diagram) HVDF, the High Volume Data Feed engine:

REST Service API: incoming sample stream via POST /feed/channel/data; real-time queries via GET /feed/channeldata?time=XXX&range=YYY

Processor Plugins: Inline, Batch, Stream

Channel Data Storage: Raw Channel Data, Aggregated Rollup T1, Aggregated Rollup T2

Query Processor, Streaming spout, Custom Stream Processing Logic

Page 60

User Activity - Model

{ _id: ObjectId(),
  geoCode: 1, // used to localize write operations
  sessionId: "2373BB…",
  device: { id: "1234",
            type: "mobile/iphone",
            userAgent: "Chrome/34.0.1847.131"
  },
  type: "VIEW|CART_ADD|CART_REMOVE|ORDER|…", // type of activity
  itemId: "301671",
  sku: "730223104376",
  order: { id: "12520185",
           … },
  location: [ -86.95444, 33.40178 ],
  tags: [ "smartphone", "iphone", … ], // associated tags
  timeStamp: Date("2014/04/01 …")
}

Page 61

Dynamic schema for sample data

(Diagram) Samples in one Channel:

Sample 1 { deviceId: XXXX, time: Date(…), type: "VIEW", … }

Sample 2 { deviceId: XXXX, time: Date(…), type: "CART_ADD", cartId: 123, … }

Sample 3 { deviceId: XXXX, time: Date(…), type: "FB_LIKE" }

Each sample can have variable fields.

Page 62

Channels are sharded

(Diagram) A Channel distributed across five shards with shard key customer_id; sample document: { customer_id: XXXX, time: Date(…), type: "VIEW" }

You choose how to partition samples.

Samples can have a dynamic schema.

Scale horizontally by adding shards.

Each shard is highly available.

Page 63

Channels are time partitioned

(Diagram) A Channel's samples split by time into Partition 1, Partition 2, … Partition N (- 2 days, - 1 day, today).

Each partition is a separate collection.

Partitioning keeps indexes manageable.

The newest partition (today) is where all of the writes happen.

Older partitions are read only for best possible concurrency.

Queries are routed only to the needed partitions.

Efficient, space-reclaiming purging of old data.
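Routing a time-range query to partitions and purging old data both come down to computing partition names. The sketch below assumes one collection per UTC day named `<channel>_<YYYYMMDD>`. This naming scheme and the helper names are illustrative assumptions, not HVDF's actual layout. In the shell, purging an expired partition would then be a single `db.getCollection(name).drop()` rather than many individual deletes.

```javascript
// Assumed layout: one collection per UTC day, named "<channel>_<YYYYMMDD>".
const DAY_MS = 24 * 60 * 60 * 1000;

function dayStart(ms) {
  return Math.floor(ms / DAY_MS) * DAY_MS; // midnight UTC of the day containing ms
}

function partitionName(channel, date) {
  const ymd = new Date(dayStart(date.getTime())).toISOString().slice(0, 10).replace(/-/g, "");
  return `${channel}_${ymd}`;
}

// Partitions a query for [start, end] has to visit; all other partitions are skipped
function partitionsForRange(channel, start, end) {
  const names = [];
  for (let t = dayStart(start.getTime()); t <= end.getTime(); t += DAY_MS) {
    names.push(partitionName(channel, new Date(t)));
  }
  return names;
}

// Partitions whose whole day lies before the retention cutoff day. Each one can
// be dropped as a unit instead of deleting its samples one by one. Plain string
// comparison works because the date suffix has a fixed YYYYMMDD width.
function partitionsToPurge(channel, existing, now, retentionDays) {
  const cutoff = partitionName(channel, new Date(now.getTime() - retentionDays * DAY_MS));
  return existing.filter((name) => name < cutoff);
}

const now = new Date("2014-04-10T12:00:00Z");
console.log(partitionName("activity", now)); // activity_20140410
console.log(partitionsForRange("activity", new Date("2014-04-08T18:00:00Z"), now));
// → activity_20140408, activity_20140409, activity_20140410
const existing = ["activity_20140401", "activity_20140402", "activity_20140409", "activity_20140410"];
console.log(partitionsToPurge("activity", existing, now, 7));
// → activity_20140401, activity_20140402
```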

Page 64

Dynamic queries on Channels

(Diagram) Apps access Channel samples through indexes, queries, aggregation pipelines and map-reduce.

Create custom indexes on Channels.

Use the full MongoDB query language to access samples.

Use MongoDB aggregation pipelines to access samples.

Use MongoDB inline map-reduce to access samples.

Full access to field, text, and geo indexing.

Page 65

Geographically distributed system

(Diagram) Sources in North America - West, North America - East and Europe write samples into a single Channel.

Geo shards per location.

Clients write to local nodes.

A single view of the channel is available globally.

Page 66

Insight

Page 67

Insight – Useful Data

• Useful data for better shopping:

– User history (e.g. recently seen products)

– User statistics (e.g. total purchases, visits)

– User interests (e.g. likes video games and sci-fi)

– User social network

– Cross-selling: people who bought this item tended to buy these other items (e.g. bought an iPhone, then an iPhone case)

– Up-selling: people who looked at this item eventually bought these items (an alternative product that may be better)

Page 68

User Activity – Computing User Stats

Example of real-time aggregation with the Aggregation Framework

Page 69

User Activity – Computing User Stats

Example of real-time aggregation with the Aggregation Framework

Page 70

User Activity – Items frequently bought together

Let's simplify each recorded activity as follows:

{ userId: 123, type: "order", itemId: 2, time }
{ userId: 123, type: "order", itemId: 3, time }
{ userId: 234, type: "order", itemId: 7, time }

To calculate the items bought by each user over a period of time, let's use MongoDB's Map Reduce:

- Match activities of type "order" for the past 2 weeks

- map: emit the itemId, keyed by userId

- reduce: push all itemIds into a list

- Output looks like { _id: userId, items: [2, 3, 8] }
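The first job can be tried out without a database. The sketch below is a small in-memory imitation of the query, map and reduce phases; it is not MongoDB's implementation. In the shell, the equivalent job would run through `db.collection.mapReduce(map, reduce, { query: …, out: … })`, where `map` reads the current document through `this`. MongoDB wraps each result as `{ _id, value }` and may call `reduce` again on its own output, so here `reduce` returns the same `{ items: [...] }` shape that `map` emits. The sample activities and the fixed `now` date are made up.

```javascript
// Tiny in-memory imitation of MongoDB's mapReduce phases: query -> map -> reduce.
// reduce is only called for keys with more than one emitted value, as in MongoDB.
function mapReduce(docs, query, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.filter(query).forEach((doc) => map(doc, emit));
  return [...groups].map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

const TWO_WEEKS_MS = 14 * 24 * 60 * 60 * 1000;
const now = new Date("2014-04-15T00:00:00Z"); // fixed "current time" for the example

// Hypothetical activity records in the simplified form shown above
const activities = [
  { userId: 123, type: "order", itemId: 2, time: new Date("2014-04-10T00:00:00Z") },
  { userId: 123, type: "order", itemId: 3, time: new Date("2014-04-11T00:00:00Z") },
  { userId: 234, type: "order", itemId: 7, time: new Date("2014-04-12T00:00:00Z") },
  { userId: 123, type: "view", itemId: 9, time: new Date("2014-04-12T00:00:00Z") }, // not an order
  { userId: 234, type: "order", itemId: 5, time: new Date("2014-03-01T00:00:00Z") }, // too old
];

const itemsByUser = mapReduce(
  activities,
  (a) => a.type === "order" && now - a.time <= TWO_WEEKS_MS, // query
  (a, emit) => emit(a.userId, { items: [a.itemId] }), // map: key = userId
  (key, values) => ({ items: values.flatMap((v) => v.items) }) // reduce: concatenate lists
);
console.log(JSON.stringify(itemsByUser));
// [{"_id":123,"value":{"items":[2,3]}},{"_id":234,"value":{"items":[7]}}]
```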

Page 71

User Activity – Items frequently bought together

Then run a 2nd map/reduce job that, for each of the previous results:

- map: emits every combination of 2 items, with the lowest itemId first

- reduce: sums up the total

- output looks like { _id: { a: 2, b: 3 }, count: 36 }
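The logic of the second job can be sketched directly in plain JavaScript: emit each unordered pair once, with the lower itemId as `a`, and sum per pair. The input follows MongoDB's map/reduce output shape `{ _id: userId, value: { items } }`, and the documents are hypothetical. Removing duplicate itemIds within one user's list, so that buying the same item twice does not create a self-pair, is an added assumption.

```javascript
// Step 2 sketch: count how many users bought each unordered pair of items.
// Pairs are keyed with the lower itemId as `a`, so (3, 2) and (2, 3) are one pair.
function pairCounts(userItems) {
  const counts = new Map(); // "a,b" -> number of users who bought both
  for (const { value } of userItems) {
    // Assumption: duplicate purchases of one item by the same user count once
    const items = [...new Set(value.items)].sort((x, y) => x - y);
    for (let i = 0; i < items.length; i++) {
      for (let j = i + 1; j < items.length; j++) {
        const key = `${items[i]},${items[j]}`;
        counts.set(key, (counts.get(key) || 0) + 1);
      }
    }
  }
  return [...counts].map(([key, count]) => {
    const [a, b] = key.split(",").map(Number);
    return { _id: { a, b }, count };
  });
}

// Hypothetical output of the first job
const userItems = [
  { _id: 123, value: { items: [3, 2, 8] } },
  { _id: 234, value: { items: [2, 3, 3] } },
  { _id: 345, value: { items: [8] } },
];
console.log(JSON.stringify(pairCounts(userItems)));
// [{"_id":{"a":2,"b":3},"count":2},{"_id":{"a":2,"b":8},"count":1},{"_id":{"a":3,"b":8},"count":1}]
```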

Page 72

User Activity – Items frequently bought together

The output collection can then be queried per item Id, sorted by count, and cut off at a threshold.

This needs indexes on { "_id.a": 1, count: 1 } and { "_id.b": 1, count: 1 }.

You then obtain an affiliation collection with documents like:

{ itemId: 2, affil: [ { id: 3, weight: 36 }, { id: 8, weight: 23 } ] }
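To turn pair counts into affiliation documents, collect for each item the partners it appears with as either `a` or `b`. This is why both `_id.a` and `_id.b` need an index. In the shell, each lookup would look like `db.pairs.find({ "_id.a": 2, count: { $gte: 10 } }).sort({ count: -1 })`, plus the same query on `"_id.b"`. The collection name `pairs` and the threshold of 10 are hypothetical. The plain-JavaScript sketch below performs the same grouping, filtering and sorting in memory.

```javascript
// Builds affiliation documents from pair counts: for each item, collect the
// partner items from pairs where it appears as either a or b, keep pairs at or
// above the threshold, and sort partners by weight, highest first.
function buildAffiliations(pairs, threshold) {
  const affil = new Map(); // itemId -> [{ id, weight }]
  const add = (itemId, id, weight) => {
    if (!affil.has(itemId)) affil.set(itemId, []);
    affil.get(itemId).push({ id, weight });
  };
  for (const { _id: { a, b }, count } of pairs) {
    if (count < threshold) continue; // cut off weak associations
    add(a, b, count);
    add(b, a, count);
  }
  return [...affil].map(([itemId, list]) => ({
    itemId,
    affil: list.sort((x, y) => y.weight - x.weight),
  }));
}

// Hypothetical output of the second job
const pairs = [
  { _id: { a: 2, b: 3 }, count: 36 },
  { _id: { a: 2, b: 8 }, count: 23 },
  { _id: { a: 3, b: 8 }, count: 4 },
];
const affiliations = buildAffiliations(pairs, 10);
console.log(JSON.stringify(affiliations.find((d) => d.itemId === 2)));
// {"itemId":2,"affil":[{"id":3,"weight":36},{"id":8,"weight":23}]}
```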

Page 73

User Activity – Hadoop integration

Example of Hadoop integration

Page 74

Social

Page 75

Social

(Diagram) Social components backed by MongoDB: Social Channels, User Network, Activity, Chat, Social Profiles, Community Mgt, Rewards / Gamification.

Page 76

Conclusion

Page 77

Appendix

Page 78

Single View of Product Cluster Topology

(Diagram) Three data centers (West DC, Center DC, East DC) hosting three shards ("West", "Center", "East"), each shard with one primary.

Page 79

Single View of Product Cluster Topology

(Topology diagram) The primary node replicates data to all secondaries in the shard as fast as possible.

Page 80

Single View of Product Cluster Topology

(Topology diagram) The Center shard contains all the data for stores in the Center region.

Page 81

Single View of Product Cluster Topology

(Topology diagram) The Center shard contains all the data for stores in the Center region. Local writes enable very high throughput of updates.

Page 82

Single View of Product Cluster Topology

(Topology diagram) Each region is able to see the data of all stores from its "local" DC.

Page 83

Single View of Product Cluster Topology

(Topology diagram) Two nodes in each DC allow painless maintenance with zero downtime.

Page 84

Single View of Product Cluster Topology

(Topology diagram) Even if a DC goes down, the database remains fully available thanks to automated failover.

Page 85

Single View of Product Cluster Topology

(Topology diagram) The data set can grow and shards can be added without any rewrite of the application code.

Page 86

Thank You!

Antoine Girbal, Senior Solutions Engineer, MongoDB Inc., @antoinegirbal
