Hippo GetTogether: The architecture behind Hippo's relevance platform

Post on 26-Jan-2015

Category:

Technology

DESCRIPTION

These slides are from my Hippo GetTogether 2013 presentation. In it I went into detail about the architecture behind our high-performance relevance platform. The talk also covered why we chose Couchbase for storage and how Elasticsearch can be used for search and analytics. I showed how we integrated both products and leverage them full circle from within our Hippo CMS product.

TRANSCRIPT

Building a relevance platform with Couchbase and Elasticsearch

Hippo GetTogether, 21 June 2013

Jeroen Reijn | @jreijn | #hgt2013

Hippo GetTogether 2013

follow the Hippo trail

About me

• Architect @ Hippo

• DevOps guy

• Blogger @ http://blog.jeroenreijn.com

OneHippo @ Goto

Relevance?

“The capability of a search engine or function to retrieve data appropriate to a user's needs.”

http://www.thefreedictionary.com/relevance

How we deliver relevant content @Hippo

Registration

Visitor - entity making HTTP requests

Collector - records data about a visitor or his behavior

Example: location collector (GeoIPCollector)

Targeting Data - all data about a specific visitor

Example: IP address is located in Amsterdam

MatchingCharacteristic - a type of fact about visitors

Example: "comes from a city", "experiences a type of weather"

Target Group - the specification of a Characteristic

Example: "comes from a European city", "comes from Amsterdam"

Persona - one or more target groups that describe a certain type of visitor

Example: "Jim, the European urban consumer", "Alice, the Pet owner"

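The terms on the two slides above can be made concrete with a small data-model sketch. Everything in it is hypothetical and chosen for illustration only: the class names, the `matches` predicate, the sample data, and the rule that a persona applies when all of its target groups match. It is not Hippo's actual API, and the real product may score personas rather than match them strictly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Targeting data: everything the collectors recorded for one visitor,
# keyed by collector id (hypothetical shape).
TargetingData = Dict[str, dict]


@dataclass
class TargetGroup:
    """A characteristic narrowed to a concrete value, e.g. 'comes from Amsterdam'."""
    name: str
    collector_id: str
    matches: Callable[[dict], bool]

    def applies_to(self, data: TargetingData) -> bool:
        collected = data.get(self.collector_id)
        return collected is not None and self.matches(collected)


@dataclass
class Persona:
    """One or more target groups that together describe a type of visitor."""
    name: str
    target_groups: List[TargetGroup]

    def applies_to(self, data: TargetingData) -> bool:
        # Assumption: every target group has to match.
        return all(group.applies_to(data) for group in self.target_groups)


from_amsterdam = TargetGroup(
    "comes from Amsterdam", "geo", lambda geo: geo.get("city") == "Amsterdam"
)
jim = Persona("Jim, the European urban consumer", [from_amsterdam])

visitor_data = {"geo": {"city": "Amsterdam", "country": "NL"}}
print(jim.applies_to(visitor_data))
```

Keeping the predicate on the target group means a new characteristic only needs a new collector id and a new predicate; the persona logic stays unchanged.
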
What do we store?

Request log

Targeting data

Statistics

Averages, e.g. how many visitors became which persona

BIG DATA !!

Real-time analysis

Architecture

[Diagram: an app server running the Hippo Delivery Tier and the Hippo Repository, backed by an RDBMS; output as XML, JSON or (X)HTML]

Delivery Tier

Request → URL Matching → Fetch content → Compose output → Response

Delivery Tier

Stages shown: URL Matching, Targeting Data Collection, Scoring, Fetch content, Compose output (Request in, Response out)
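
The request flow on the two Delivery Tier slides can be sketched as a chain of stages that each enrich a shared context. This is only an illustration: the function names, the context keys, the placeholder scoring rule and the position of the scoring step relative to content fetching are all assumptions, not taken from the Hippo code base.

```python
from urllib.parse import urlparse


def match_url(ctx):
    # URL matching: derive the path that selects the page to render.
    ctx["path"] = urlparse(ctx["url"]).path
    return ctx


def collect_targeting_data(ctx):
    # A real collector would inspect the request (IP address, cookies, ...).
    ctx["targeting_data"] = {"channel": "English Website"}
    return ctx


def score(ctx):
    # Placeholder scoring: 1.0 when the visitor is on the English channel.
    on_english = ctx["targeting_data"].get("channel") == "English Website"
    ctx["persona_scores"] = {"english-reader": 1.0 if on_english else 0.0}
    return ctx


def fetch_content(ctx):
    ctx["content"] = f"content for {ctx['path']}"
    return ctx


def compose_output(ctx):
    ctx["response"] = f"<html>{ctx['content']}</html>"
    return ctx


# Assumed order of the stages named on the slide.
PIPELINE = [match_url, collect_targeting_data, score, fetch_content, compose_output]


def handle(url):
    ctx = {"url": url}
    for stage in PIPELINE:
        ctx = stage(ctx)
    return ctx


result = handle("http://localhost:8080/site/news")
print(result["response"])
```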

Scaling

Scaling out

[Diagram: two app servers, each running a Hippo Delivery Tier and a Hippo Repository, with one RDBMS]

Scaling out

[Diagram: two app servers, each running a Delivery Tier and a Repository, with one RDBMS and a Targeting Datastore]

What kind of ‘storage’?

Question?

Distributed Cache?

We have a winner!

Requirements change!

NoSQL to the rescue

Suitable types

• Key-value store

• Document database

Assessment Criteria

Maturity

Data model

Consistency model

Performance

Replication

Caching model

Query model

Monitoring

Scalability

Reliability

Support

Selection Criteria

• Performance

• Scalability

• Schema flexibility

• Simplicity

• Monitoring

• Support

Performance !!

Performance !!!!

Scalability

Schema flexibility

{
  "visitorId": "7a1c7e75-8539-40",
  "pageUrl": "http://localhost:8080/site/news",
  "pathInfo": "/news",
  "remoteAddr": "127.0.0.1",
  "referer": "http://localhost:8080/site/",
  "timestamp": 1371419505909,
  "collectorData": {
    "geo": {
      "country": "",
      "city": "",
      "latitude": 0,
      "longitude": 0
    },
    "returningvisitor": false,
    "channel": "English Website"
  },
  "personaIdScores": [],
  "globalPersonaIdScores": []
}

Request log document

{
  "geo": {
    "collectorId": "geo",
    "city": "",
    "country": "",
    "latitude": 0,
    "longitude": 0
  },
  "channel": {
    "collectorId": "channel",
    "channels": [
      "English Website"
    ],
    "lastVisitedChannel": "English Website"
  }
}

Visitor document
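
As a rough illustration of what schema flexibility means for these documents, the sketch below builds a request log document with the field names from the slide and then adds data from a new collector without any schema change. The helper `new_request_log`, the use of a full UUID as visitor id and the `weather` collector are assumptions made for the example.

```python
import json
import time
import uuid


def new_request_log(page_url, path_info, remote_addr, referer=None, collector_data=None):
    """Build a request log document using the field names shown on the slide."""
    doc = {
        "visitorId": str(uuid.uuid4()),          # assumption: ids are UUIDs
        "pageUrl": page_url,
        "pathInfo": path_info,
        "remoteAddr": remote_addr,
        "timestamp": int(time.time() * 1000),    # epoch milliseconds
        "collectorData": collector_data or {},
        "personaIdScores": [],
        "globalPersonaIdScores": [],
    }
    if referer is not None:
        doc["referer"] = referer                  # optional field, simply omitted if unknown
    return doc


doc = new_request_log(
    "http://localhost:8080/site/news", "/news", "127.0.0.1",
    collector_data={"channel": "English Website"},
)

# A collector added later just contributes a new key; older documents stay valid.
doc["collectorData"]["weather"] = {"type": "rainy"}  # hypothetical collector

restored = json.loads(json.dumps(doc))
print(restored["collectorData"].get("weather", {}).get("type"))
```

Readers of such documents use `.get()` with defaults, so documents written before a collector existed can still be processed.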

Simplicity

Monitoring

Support

Couchbase

Why Couchbase?

• Drop-in replacement for memcached

• Read/Write-through cache

• High throughput

• Easy scalability

• Schema flexibility

• Low latency

Couchbase

• Open Source

• Document-oriented

• Easily scalable

• Consistently high performance

• Apache license

Performance

• Object managed cache

• Write Queue to disk

• Avoids Cold Cache
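
The three points above can be pictured with a toy model: reads and writes are served from memory, writes are queued and persisted later, and a restarted node preloads what is on disk instead of starting with an empty cache. The class `WriteBehindCache` and its methods are invented for this sketch and say nothing about how Couchbase implements these mechanisms internally.

```python
from collections import deque


class WriteBehindCache:
    """Toy model: reads and writes hit memory, disk writes happen later."""

    def __init__(self, disk=None):
        self.disk = {} if disk is None else disk  # stand-in for persistent storage
        self.memory = dict(self.disk)             # warm-up: preload what is on disk
        self.write_queue = deque()                # keys waiting to be persisted

    def set(self, key, value):
        # The write is complete once it is in memory; persistence is queued.
        self.memory[key] = value
        self.write_queue.append(key)

    def get(self, key):
        return self.memory.get(key)

    def flush(self):
        """Drain the write queue to disk and return the number of writes."""
        written = 0
        while self.write_queue:
            key = self.write_queue.popleft()
            self.disk[key] = self.memory[key]
            written += 1
        return written


cache = WriteBehindCache()
cache.set("visitor::42", {"channel": "English Website"})
print(cache.get("visitor::42"))  # served from memory
print(len(cache.disk))           # nothing persisted before the flush
print(cache.flush())             # queue drained to disk

# Simulated restart: the new instance warms up from disk.
restarted = WriteBehindCache(disk=cache.disk)
print(restarted.get("visitor::42"))
```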

Source: http://www.slideshare.net/Couchbase/benchmarking-couchbase Copyright © Altoros Systems, Inc.

Easily scalable

• Auto sharding

• Cross cluster replication (XDCR)

• Master - Master replication
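
To give a feel for auto sharding, here is a simplified sketch of the general idea: every document key hashes to a fixed logical shard (Couchbase calls these vBuckets), and a separate map assigns shards to nodes, so growing the cluster only changes the map. The hash expression, the round-robin map and the node names are simplifications and assumptions; the real client libraries and rebalancer work differently in detail.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed here; commonly documented as the Couchbase default


def vbucket_for(key):
    """Hash a document key to a vBucket (logical shard)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_vbucket_map(nodes):
    """Spread vBuckets over nodes round-robin (naive stand-in for rebalancing)."""
    return [nodes[i % len(nodes)] for i in range(NUM_VBUCKETS)]


def node_for(key, vbucket_map):
    return vbucket_map[vbucket_for(key)]


two_nodes = build_vbucket_map(["cb1", "cb2"])
three_nodes = build_vbucket_map(["cb1", "cb2", "cb3"])

key = "visitor::7a1c7e75"
# The vBucket of a key never changes; only its owning node may.
print(vbucket_for(key), node_for(key, two_nodes), node_for(key, three_nodes))
```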

Flexible data model

• Native JSON support

• Incremental Map Reduce

• Gives power to the developer

How we run Couchbase @Hippo

Load Balancer

Hippo Delivery Tier

Database cluster

Couchbase cluster

• Request log data

• Targeting data

• Statistics data

Query capabilities

• Querying via views

• Secondary indexes via views

• Views based on Map - Reduce

• Lacks some advanced query capabilities
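
The idea behind a Map-Reduce view can be shown in a few lines. Couchbase views are written in JavaScript and their indexes are maintained incrementally by the server; the Python below is only a stand-in that recomputes the result on every call. The view itself (page views per `pathInfo`) and the function names are made up for this example.

```python
from collections import defaultdict


def map_page_views(doc):
    """Map step: emit (key, value) pairs for documents that carry a path."""
    if "pathInfo" in doc:
        yield doc["pathInfo"], 1


def reduce_count(values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)


def query_view(docs, map_fn, reduce_fn):
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # Views return rows ordered by key.
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


request_logs = [
    {"pathInfo": "/news", "remoteAddr": "127.0.0.1"},
    {"pathInfo": "/news", "remoteAddr": "127.0.0.1"},
    {"pathInfo": "/about", "remoteAddr": "127.0.0.1"},
    {"geo": {"collectorId": "geo"}},  # a visitor document: emits nothing
]

page_views = query_view(request_logs, map_page_views, reduce_count)
print(page_views)
```

Anything that cannot be phrased as a key lookup or key range over such emitted pairs, for example free-text search, falls outside what views offer.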

Elasticsearch

• Built on Apache Lucene

• Designed to be distributed

• Schema free

• Apache license

• RESTful API

Added value of ES

• Full text search

• Faceted search

• Geospatial search

• All in (near) real-time
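
As an impression of what full text and faceted search look like in practice, the sketch below builds a query body that searches request log documents and counts hits per channel. It only constructs the JSON; sending it to a cluster is left out. The field paths depend on how the documents are mapped in the index and are assumptions here, and the `aggs` syntax is the one used by later Elasticsearch releases (versions from the time of this talk expressed the same idea with facets).

```python
import json


def channel_facet_query(text, size=10):
    """Build a search body: match on page URL, bucket hits per channel."""
    return {
        "size": 0,  # return only the buckets, no individual hits
        "query": {"match": {"pageUrl": text}},
        "aggs": {
            "visits_per_channel": {
                "terms": {"field": "collectorData.channel", "size": size}
            }
        },
    }


body = channel_facet_query("news")
print(json.dumps(body, indent=2))
```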

Replicating to ES

[Diagram labels: Hippo Delivery Tier; Java API (Write, Read); Couchbase Server Cluster; XDCR; Couchbase ES Transport plugin; Elasticsearch Server Cluster]

What’s Next?

Advanced analytics

Demo time!

Thank you!

Questions?

j.reijn@onehippo.com | @jreijn

ps. We’re hiring!
