real-time big data analytics based on product recommendations case study

Real-time big data analytics based on product recommendations case study IT Business Solutions B2B Conference October 2015 © deep.bi

Upload: deepbi

Post on 11-Apr-2017


TRANSCRIPT

Page 1: Real-time big data analytics based on product recommendations case study

Real-time big data analytics based on product recommendations case study

IT Business Solutions B2B Conference October 2015

© deep.bi

Page 2: Real-time big data analytics based on product recommendations case study

We started as an ad network

The challenge was to recommend the best product (out of millions)

to the right person in a given moment (thousands of users within a second)

Page 3: Real-time big data analytics based on product recommendations case study

5 billion ad views delivered in 24 months

Page 4: Real-time big data analytics based on product recommendations case study

To put the scale in context:

If we served 1 ad per second, it would take

about 160 years to serve 5 billion ads
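The arithmetic behind this slide can be checked with a short sketch. The average year length and the helper name `years_to_serve` are assumptions made for illustration. The exact result is a little over 158 years, which the slide rounds to 160.

```python
# Assumption: an average year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_to_serve(total_ads: int, ads_per_second: float = 1.0) -> float:
    """Return how many years it takes to serve total_ads at a constant rate."""
    return total_ads / ads_per_second / SECONDS_PER_YEAR


years = years_to_serve(5_000_000_000)

# Average rate implied by 5 billion ad views in 24 months.
average_rate = 5_000_000_000 / (2 * SECONDS_PER_YEAR)

print(f"{years:.1f} years at 1 ad per second")
print(f"{average_rate:.1f} ads per second on average over 24 months")
```

The second figure is only an average; the deck itself mentions thousands of users within a second at peak.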

Page 5: Real-time big data analytics based on product recommendations case study

So we needed a solution

SQL databases did not work

Popular NoSQL databases did not work

Standard data warehouse approaches (pre-aggregations, creating schemas) - did not work

Page 6: Real-time big data analytics based on product recommendations case study

Re-thinking all the problems with huge data streams flowing to us every second

we have built a complete solution based on open-source technologies

and fresh, smart ideas from our engineering team

It is called deep.bi and now we make it available to other companies

Page 7: Real-time big data analytics based on product recommendations case study

DEEP.BI = BIG DATA (high volume) + FAST DATA (high velocity) SOLUTION

Page 8: Real-time big data analytics based on product recommendations case study

deep.bi lets high-growth companies solve fast data problems by providing

scalable, flexible and real-time data collection, enrichment and analytics

Page 9: Real-time big data analytics based on product recommendations case study

deep.bi – complete data processing flow

collect: Unstructured, raw data from many sources (page views, IoT events, IP, URL, cookie, transactions, call detail records, etc.)

enrich: Data enrichment, transformation and integration

analyze: Find patterns, build models, predict behavior

Page 10: Real-time big data analytics based on product recommendations case study

How to predict the best offer based on online data – case study.

Page 11: Real-time big data analytics based on product recommendations case study

Collect website, campaigns and CRM data

Website: Google Analytics

Campaigns: Agency reports

Apps: Dedicated monitoring tools

Other systems: Call center IVR, emails

Instead of integrating the current reporting tools, we need to gather every single event that our customers generate.

Data is stored in silos. Reporting tools provide aggregated reports that are impossible to integrate around a single customer.

Page 12: Real-time big data analytics based on product recommendations case study

Collecting raw web data is not enough

2015-05-15T00:26:41.328Z,3,D,[ip_hidden],i1xszg0f-19hqrje,"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 Safari/537.36","[url_hidden]",7279848891,@906,"https://www.google.pl/",vuser-history-allegro-1-hc20150509.1,"122_100003_Park@700:html_620x100_single_banner:See offer"

IP, URL, cookie, user-agent, timestamp
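As a rough illustration of how little such a raw line carries on its own, the sketch below parses it with Python's `csv` module. The column positions in `KNOWN_FIELDS` are assumptions inferred from the sample line and the field names on the slide; the meaning of the remaining columns is not documented in the deck, so they are left unnamed.

```python
import csv
from datetime import datetime

RAW_LINE = (
    '2015-05-15T00:26:41.328Z,3,D,[ip_hidden],i1xszg0f-19hqrje,'
    '"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/42.0.2311.152 Safari/537.36","[url_hidden]",7279848891,@906,'
    '"https://www.google.pl/",vuser-history-allegro-1-hc20150509.1,'
    '"122_100003_Park@700:html_620x100_single_banner:See offer"'
)

# Assumed positions of the fields named on the slide.
KNOWN_FIELDS = {0: "timestamp", 3: "ip", 4: "cookie", 5: "user_agent", 6: "url"}


def parse_raw_event(line: str) -> dict:
    """Split one CSV log line and label the columns we can identify."""
    row = next(csv.reader([line]))  # handles the quoted, comma-containing user agent
    event = {name: row[index] for index, name in KNOWN_FIELDS.items()}
    event["timestamp"] = datetime.strptime(event["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
    event["other_columns"] = [v for i, v in enumerate(row) if i not in KNOWN_FIELDS]
    return event


event = parse_raw_event(RAW_LINE)
print(event["timestamp"], event["cookie"])
```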

Page 13: Real-time big data analytics based on product recommendations case study

Enrich raw web and mobile data

50+ pieces of information from one interaction:

•  Purchase intent
•  Device
•  Time
•  Location
•  ISP
•  Online context
•  Weather*
•  Demographics

* Coming soon

Page 14: Real-time big data analytics based on product recommendations case study

We can learn quite a few things from user IP

Example use:
•  international travellers
•  townspeople
•  people in mountains
•  rainy day

•  Country
•  Region
•  City
•  ZIP Code
•  Population
•  Latitude & Longitude
•  Time zone
•  IDD prefix to call the city from another country
•  Phone area code
•  Mobile Country Code (MCC)
•  Mobile Network Code (MNC)
•  Elevation
•  Weather at the moment of event
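A minimal sketch of IP-based enrichment, assuming a small in-memory lookup table. The table contents, the thresholds and the function names are all hypothetical; a real deployment would use a commercial or open IP geolocation database. The address ranges are reserved documentation ranges, not real users.

```python
import ipaddress

# Hypothetical sample data keyed by documentation-only IP ranges.
GEO_TABLE = [
    (ipaddress.ip_network("203.0.113.0/24"),
     {"country": "PL", "city": "Example City A", "population": 1_700_000, "elevation_m": 100}),
    (ipaddress.ip_network("198.51.100.0/24"),
     {"country": "AT", "city": "Example Town B", "population": 12_000, "elevation_m": 1_200}),
]


def enrich_with_geo(event: dict) -> dict:
    """Return a copy of the event with geo_info attached (None when unknown)."""
    ip = ipaddress.ip_address(event["ip_address"])
    for network, geo in GEO_TABLE:
        if ip in network:
            return {**event, "geo_info": dict(geo)}
    return {**event, "geo_info": None}


def segments(event: dict, home_country: str = "PL") -> set:
    """Derive example segments from geo_info; thresholds are illustrative."""
    geo = event.get("geo_info")
    if not geo:
        return set()
    tags = set()
    if geo["country"] != home_country:
        tags.add("international traveller")
    if geo["population"] >= 100_000:
        tags.add("townspeople")
    if geo["elevation_m"] >= 1_000:
        tags.add("people in mountains")
    return tags


print(segments(enrich_with_geo({"ip_address": "198.51.100.7"})))
```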

Page 15: Real-time big data analytics based on product recommendations case study

ISP tells us more than we could expect

Example use:
•  competitors’ users -> acquisition
•  our users -> retention/up-selling/cross-selling
•  people from a particular company or company type

•  ISP name or Organization name
•  Organization type:
    •  Commercial
    •  Organization
    •  Government
    •  Military
    •  University/College/School
    •  Library
    •  Content Delivery Network
    •  Fixed Line ISP
    •  Mobile ISP
    •  Data Center/Web Hosting/Transit
    •  Search Engine Spider
    •  Reserved
•  Mobile brand
•  Net speed

Page 16: Real-time big data analytics based on product recommendations case study

Detailed information about user device

Example use:
•  smartphone users
•  Apple users
•  Samsung Galaxy users
•  Google browser users

•  Device Type
•  Device Brand
•  Device Model
•  Device Operating System
•  Operating System Producer
•  Browser
•  Browser Producer
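The device attributes above are typically derived from the user-agent string. The sketch below is a deliberately naive, regex-based illustration and not the detection logic used by deep.bi; production systems rely on maintained device databases. The function name and the returned keys are assumptions.

```python
import re


def parse_user_agent(ua: str) -> dict:
    """Extract a few device attributes with simple heuristics (illustrative only)."""
    info = {"device_type": "desktop", "os": None, "device_model": None, "browser": None}

    # Android user agents usually look like "(Linux; Android 4.2.2; GT-S7580 Build/JDQ39)".
    android = re.search(r"Android ([\d.]+); ([^;)]+?) Build/", ua)
    if android:
        info["os"] = f"Android {android.group(1)}"
        info["device_model"] = android.group(2)
        info["device_type"] = "smartphone" if "Mobile" in ua else "tablet"
    elif "Windows NT" in ua:
        info["os"] = "Windows"

    chrome = re.search(r"Chrome/([\d.]+)", ua)
    if chrome:
        info["browser"] = f"Chrome {chrome.group(1)}"
    return info


mobile = parse_user_agent(
    "Mozilla/5.0 (Linux; Android 4.2.2; GT-S7580 Build/JDQ39) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/43.0.2357.93 Mobile Safari/537.36"
)
print(mobile)
```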

Page 17: Real-time big data analytics based on product recommendations case study

Besides user features, track user behavior too.

Deeper understanding of people’s behavior:
•  RFM Segmentation (Recency, Frequency, Monetary)
•  Shopping cart analysis
•  Purchase sequence analysis
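To make the RFM idea concrete, here is a minimal sketch that scores one user on recency, frequency and monetary value. The bucket thresholds (30/180 days, 2/5 purchases, 100/500 in currency units) and the function names are assumptions chosen for illustration; real thresholds are usually derived from the data, for example from quantiles.

```python
from datetime import date


def rfm(purchases, today):
    """purchases: list of (purchase_date, amount). Returns (recency_days, frequency, monetary)."""
    recency_days = (today - max(day for day, _ in purchases)).days
    frequency = len(purchases)
    monetary = sum(amount for _, amount in purchases)
    return recency_days, frequency, monetary


def bucket(value, low, high, reverse=False):
    """Score 1-3: below `low` is 1, below `high` is 2, otherwise 3 (flipped when reverse)."""
    score = 1 if value < low else (2 if value < high else 3)
    return 4 - score if reverse else score


def rfm_segment(purchases, today):
    recency_days, frequency, monetary = rfm(purchases, today)
    r = bucket(recency_days, 30, 180, reverse=True)  # recent buyers score high
    f = bucket(frequency, 2, 5)
    m = bucket(monetary, 100, 500)
    return f"{r}{f}{m}"


history = [(date(2015, 9, 20), 120.0), (date(2015, 10, 1), 60.0)]
segment = rfm_segment(history, today=date(2015, 10, 10))
print(segment)
```

A segment string such as "333" would then mark the most recent, most frequent and highest-spending customers.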

Page 18: Real-time big data analytics based on product recommendations case study

User behavior and characteristics help predict the next best action/offer

What product should we recommend?

How could this purchase path end?

Page 19: Real-time big data analytics based on product recommendations case study

So, how to build tailored recommendations? Pick an algorithm that is suitable for the problem

Product [ feature_1, feature_2, …, feature_N]

User [ feature_1, feature_2, …, feature_N]

User [ product_1, product_2, …, product_N]

Page 20: Real-time big data analytics based on product recommendations case study

•  Simple rules: if a user has some features, serve this group of products
•  Manual segment creation: analysts find segments of users and match them with product segments
•  Simple feature matching: get the user's weighted feature vector and match it with product feature vectors

Manual / people managed rules
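The "simple feature matching" approach can be sketched as a dot product between a weighted user feature vector and each product's feature vector. The feature names, weights and product names below are made up for illustration, and dot-product scoring is one possible matching function among several (cosine similarity is another).

```python
def dot(user_vector: dict, product_vector: dict) -> float:
    """Dot product over sparse feature dictionaries."""
    return sum(weight * product_vector.get(feature, 0.0)
               for feature, weight in user_vector.items())


def recommend(user_vector: dict, products: dict, top_n: int = 2) -> list:
    """Rank product names by how well their features match the user's weights."""
    ranked = sorted(products, key=lambda name: dot(user_vector, products[name]), reverse=True)
    return ranked[:top_n]


# Hypothetical user and catalogue.
user = {"smartphone": 0.9, "samsung": 0.6, "travel": 0.2}
products = {
    "galaxy_phone": {"smartphone": 1.0, "samsung": 1.0},
    "phone_case": {"smartphone": 0.5},
    "suitcase": {"travel": 1.0},
}

print(recommend(user, products))
```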

Page 21: Real-time big data analytics based on product recommendations case study

•  Find segments automatically (e.g. k-means)
•  Product features based recommendations
•  User features based recommendations
•  Combined product and user based recommendations (collaborative filtering, deep learning)

Machine learning-supported recommendations
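As one small example of the collaborative filtering family mentioned above, the sketch below implements item-based filtering with cosine similarity over sets of users. It is a toy under stated assumptions (binary "user interacted with product" data, hypothetical user and product names), not the model used in the case study.

```python
from math import sqrt


def item_vectors(user_products: dict) -> dict:
    """Invert {user: {products}} into {product: {users}}."""
    vectors = {}
    for user, owned in user_products.items():
        for product in owned:
            vectors.setdefault(product, set()).add(user)
    return vectors


def cosine(a: set, b: set) -> float:
    """Cosine similarity between two binary vectors stored as sets."""
    return len(a & b) / sqrt(len(a) * len(b))


def recommend_for(user: str, user_products: dict, top_n: int = 1) -> list:
    """Score each product the user lacks by its similarity to products they have."""
    vectors = item_vectors(user_products)
    owned = user_products[user]
    scores = {
        candidate: sum(cosine(users, vectors[p]) for p in owned)
        for candidate, users in vectors.items()
        if candidate not in owned
    }
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]


# Hypothetical interaction data.
interactions = {
    "alice": {"phone", "case"},
    "bob": {"phone", "case", "charger"},
    "carol": {"phone", "charger"},
    "dave": {"tent"},
}

print(recommend_for("alice", interactions))
```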

Page 22: Real-time big data analytics based on product recommendations case study

[Chart: Product popularity (y-axis) vs. Products (x-axis); annotation: "The most interesting recommendations"]

Recommendations long tail phenomenon

Page 23: Real-time big data analytics based on product recommendations case study

Technology behind Deep BI

Page 24: Real-time big data analytics based on product recommendations case study

 Complex data model for query optimization

 split dimensions in several tables based on reports made

 cherry-pick in advance the dimensions we can aggregate, based on cardinality

  indexing every dimension column is a must

  Impossible to add high-cardinality dimensions

 no way to analyze per user (millions of them)

 no way to even add all of user-agent, url, geo-info, ...

Problems with SQL and NoSQL databases

Page 25: Real-time big data analytics based on product recommendations case study

 Complex data loading process

 needs to pre-aggregate in memory

 non-trivial reliability issues

 hard to parallelize

  There is always latency

 pre-aggregation in job loading memory

Problems with SQL and NoSQL databases

Page 26: Real-time big data analytics based on product recommendations case study

deep.bi – real-time big data architecture

Components shown in the diagram:

•  Customer databases
•  Event sources*
•  Raw data stream
•  Real-time data ingestion: Kafka
•  Data Transformation & Enrichment: Node.js, Spark Streaming
•  Transformed data stream
•  Real-time OLAP Store: Druid
•  Operational Store: Cassandra
•  High performance, multi-purpose storage
•  Raw Data Store: Hadoop, Parquet, Spark
•  deep.bi API
•  ETL
•  Web analytics dashboard
•  Customer analytics dashboard

*e.g. mobile apps, websites, marketing campaigns, IoT (beacons, wearables)

Page 27: Real-time big data analytics based on product recommendations case study

DEEP Data enrichment, storage & analytics

Client’s DEEP Data Space

1  Data Collection APIs

•  End-user browser
•  Web Data Collection API (HTML or JS): trackers pass event data with <DEEP tracker>
•  Mobile Data Collection API (HTML, JS or Native SDK): trackers pass event data with
•  Ingestion API

[diagram markers: <D> <D>]

Page 28: Real-time big data analytics based on product recommendations case study

Events are represented with the full flexibility of JSON

{
  "data": {
    "event_type": "CLICK",
    "ad_request_event": {
      "ctx": {
        "event_time": "2015-07-10T06:15:50.819Z",
        "ip_address": "XX.XX.XX.XX",
        "geo_info": {
          "country": "US",
          "region": "California",
          "city": "San Francisco",
          "timezone": "PST",
          "isp": "XXX",
          "population": 849774
        },
        "page": {
          "raw_url": "XXX",
          "standardized_domain": "XXX"
        },
        "page_info": {
          "page_raw_url": "XXX",
          "product_categories": [
            { "id": 20585 },
            { "id": 100126 }
          ]
        },
        "cookie": "ibx8axlw-17j287o",
        "user_agent": "Mozilla/5.0 (Linux; Android 4.2.2; GT-S7580 Build/JDQ39) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.93 Mobile Safari/537.36",
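The event on the slide is cut off, so the sketch below uses a shortened, closed subset of it. It shows one way nested JSON events might be flattened into dotted column names before loading them into a denormalized analytics store. The flattening step and the `flatten` helper are assumptions for illustration; the deck does not describe how deep.bi maps events to columns.

```python
import json


def flatten(obj: dict, prefix: str = "") -> dict:
    """Turn nested dictionaries into a single level with dotted keys; lists are kept as-is."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


event = json.loads("""
{
  "data": {
    "event_type": "CLICK",
    "ad_request_event": {
      "ctx": {
        "event_time": "2015-07-10T06:15:50.819Z",
        "geo_info": {"country": "US", "city": "San Francisco", "population": 849774},
        "page_info": {"product_categories": [{"id": 20585}, {"id": 100126}]},
        "cookie": "ibx8axlw-17j287o"
      }
    }
  }
}
""")

flat_event = flatten(event)
for column, value in flat_event.items():
    print(column, "=", value)
```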

Page 29: Real-time big data analytics based on product recommendations case study

•  Publish-subscribe service
•  The nervous system of enterprise data
    •  decouples producers from consumers
    •  reliably buffers data
    •  send now, process later
•  Scalable, distributed, replicated log system
•  Pause components, restart processing
•  Powered by:
    •  web giants like LinkedIn, Twitter, Netflix, Uber, Spotify or Pinterest
    •  >10M messages/second

Apache Kafka
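The "send now, process later" and "pause components, restart processing" points follow from the log abstraction: messages are appended once, and each consumer tracks its own read offset. The toy class below illustrates that idea in plain Python. It is not the Kafka client API, and the class and method names are invented for this sketch.

```python
class TopicLog:
    """In-memory append-only log with one independent read offset per consumer."""

    def __init__(self):
        self._messages = []
        self._offsets = {}

    def publish(self, message) -> int:
        """Append a message and return its offset; producers never wait for consumers."""
        self._messages.append(message)
        return len(self._messages) - 1

    def poll(self, consumer: str, max_messages: int = 10) -> list:
        """Return the next batch for this consumer and advance only its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._messages[start:start + max_messages]
        self._offsets[consumer] = start + len(batch)
        return batch


log = TopicLog()
for event_type in ["VIEW", "VIEW", "CLICK"]:
    log.publish({"event_type": event_type})

first = log.poll("dashboard", max_messages=2)   # the dashboard reads two events, then pauses
log.publish({"event_type": "PURCHASE"})         # producers keep writing meanwhile
resumed = log.poll("dashboard")                 # the dashboard resumes where it stopped
archive = log.poll("archiver")                  # a new consumer replays from the start
print(len(first), len(resumed), len(archive))
```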

Page 30: Real-time big data analytics based on product recommendations case study

•  Scalable, fault-tolerant stream processing system
•  With a simple programming model, rich API and integrations
•  Powered by:
    •  Yahoo, Netflix, eBay
    •  NASA, Intel, Cisco
•  It is our fundamental technology for streaming applications:
    •  sessionize events
    •  detect fraud
    •  attribute purchases to clicks or views
    •  load & read external stores like Druid, Hadoop, Cassandra

Apache Spark Streaming
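Sessionization, the first task listed above, groups each user's events into visits separated by periods of inactivity. The sketch below shows the core logic on a small batch in plain Python, assuming a 30-minute inactivity gap and events given as `(cookie, timestamp)` pairs. A streaming job would apply the same rule incrementally with per-key state rather than sorting a full batch.

```python
from datetime import datetime, timedelta


def sessionize(events, gap=timedelta(minutes=30)):
    """Group (cookie, timestamp) events into per-cookie sessions split by inactivity gaps."""
    sessions = {}
    for cookie, ts in sorted(events, key=lambda e: (e[0], e[1])):
        user_sessions = sessions.setdefault(cookie, [])
        if user_sessions and ts - user_sessions[-1][-1] <= gap:
            user_sessions[-1].append(ts)   # continues the current session
        else:
            user_sessions.append([ts])     # first event, or gap exceeded: new session
    return sessions


events = [
    ("a", datetime(2015, 7, 10, 10, 0)),
    ("b", datetime(2015, 7, 10, 10, 5)),
    ("a", datetime(2015, 7, 10, 10, 10)),
    ("a", datetime(2015, 7, 10, 11, 0)),
]

result = sessionize(events)
print({cookie: len(s) for cookie, s in result.items()})
```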

Page 31: Real-time big data analytics based on product recommendations case study

•  Open Source Streaming Data Store for Interactive Analytics at Scale
    •  denormalized data
    •  no more snowflake or star schema!
•  Build real-time dashboards, analytic applications and exploratory tools on it.
•  It’s FAST!
    •  aggregate, drill down, slice and dice in under a second
    •  advanced column store with compression
    •  sophisticated approximate algorithms
•  It’s SCALABLE
    •  horizontally scalable: just add more machines
    •  replicated, highly available
    •  Over 100 PBs of data, millions of events/second

Druid – Real-time OLAP Store
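For a feel of what "aggregate, drill-down, slice-n-dice" looks like in practice, the sketch below builds a Druid native `groupBy` query as a Python dictionary. The data source name `events` and the dimension names `device_brand` and `event_type` are assumptions; the actual deep.bi schema is not shown in the deck. Such a JSON body would normally be sent in a POST request to a Druid Broker's `/druid/v2/` endpoint.

```python
import json


def clicks_by_device_brand(interval: str) -> dict:
    """Build a native groupBy query counting CLICK rows per device brand."""
    return {
        "queryType": "groupBy",
        "dataSource": "events",            # hypothetical data source name
        "granularity": "all",
        "dimensions": ["device_brand"],    # hypothetical dimension
        "filter": {"type": "selector", "dimension": "event_type", "value": "CLICK"},
        "aggregations": [{"type": "count", "name": "rows"}],
        "intervals": [interval],
    }


query = clicks_by_device_brand("2015-07-10T00:00:00Z/2015-07-11T00:00:00Z")
print(json.dumps(query, indent=2))
```

Note that the `count` aggregator counts stored Druid rows, which can differ from the number of raw events when rollup is enabled at ingestion time.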

Page 32: Real-time big data analytics based on product recommendations case study

•  Ingest historical & real-time data
    •  data available for exploration in milliseconds
    •  can store years of data in very optimized storage
•  Powered by:
    •  eBay, Netflix, PayPal, Yahoo
    •  Cisco
•  It is our core data store for all events, historical and real-time data

Druid – Real-time OLAP Store

Page 33: Real-time big data analytics based on product recommendations case study

•  Apache Spark for batch processing: a fast and general engine for large-scale data processing
    •  Replaces MapReduce, up to 10x-100x faster!
    •  Number 1 open-source project in the big data space (contributors, commits)
    •  In-memory processing (if possible)
    •  Spark SQL for SQL processing
•  Apache Parquet: an optimized storage format
    •  columnar: read only the columns you need
    •  compressed: specialized compression per data type plus generic compression
    •  2x-4x smaller: 600 GB data -> 150 GB data
•  Hadoop can be optimized by 2 orders of magnitude: from hours to seconds!

Hadoop Optimized

Page 34: Real-time big data analytics based on product recommendations case study

Thank you!

Share your thoughts, challenges or case studies with us.

Or drop us a line: [email protected]


Page 35: Real-time big data analytics based on product recommendations case study

Backup slides

Page 36: Real-time big data analytics based on product recommendations case study

Let’s assume we want to find users who:

•  Were interested in smartphones
•  Use a Samsung product
•  Live in cities with a population over 1M people
•  Are women
•  Were traveling abroad
•  Came from our display campaign

So, we have a combination of 6 (k) dimensions from 50 (n).

Using the combination formula C(n, k) = n! / (k! (n - k)!), we will have…

Complexity of multidimensional queries

Page 37: Real-time big data analytics based on product recommendations case study

… 15,890,700 possible combinations,

a number similar to Lotto (6 from 49: 13,983,816).
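The count can be verified with Python's standard library (`math.comb`, available from Python 3.8). The variable names are chosen for this sketch.

```python
from math import comb

# Choosing 6 filter dimensions out of 50 available ones, order ignored.
dimension_combinations = comb(50, 6)

# Lotto draw for comparison: 6 numbers out of 49.
lotto_combinations = comb(49, 6)

print(f"{dimension_combinations:,} ways to pick 6 of 50 dimensions")
print(f"{lotto_combinations:,} ways to pick 6 of 49 Lotto numbers")
```

This counts only which 6 dimensions are chosen; each dimension can also take many values, so the space of concrete queries is far larger, which is why pre-aggregating every combination in advance is impractical.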
