couchbase_john_bryce_israel_training_couchbase_hadoop


Upload: couchbase

Post on 17-Jul-2015

TRANSCRIPT

Page 1: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Couchbase and Hadoop

Perry Krug

Sr. Solutions Architect

Page 2: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Agenda

• View basics

• Lifecycle of a view

• Index definition, build, and query phase

• Indexing details

• Replica indexes, failover and compaction

• Primary and Secondary indexes

• View best practices

• Couchbase and Elastic Search

• Couchbase and Hadoop

Page 3: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

pol·y·glot /ˈpäliˌglät/

Adjective: Knowing or using several languages.

Noun: A person who knows several languages.

Synonyms: multilingual

per·sist·ence /pərˈsistəns/

Noun: The continued or prolonged existence of something.

Synonyms: perseverance - tenacity - pertinacity - stubbornness

Page 4: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Couchbase Views – The basics

• Define materialized views on JSON documents and then query across the data set

• Using views you can define

– Primary indexes

– Simple secondary indexes (most common use case)

– Complex secondary, tertiary and composite indexes

– Aggregations (reduction)

• Documents are eventually indexed

• Queries are eventually consistent with respect to documents

• Built using Map/Reduce technology

– Map and Reduce functions are written in JavaScript

Page 5: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

View Lifecycle

Define -> Build -> Query

Page 6: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Buckets & Design docs & Views

• Create design documents on a bucket

• Create views within a design document

BUCKET 1

Design document 1: View 1, View 2, View 3

Design document 2: View 4, View 5

Design document 3: View 6, View 7

BUCKET 2
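The nesting above (bucket, design document, views) can be pictured as a small JSON structure. The sketch below is illustrative only: the view names `by_city` and `by_brewery` and the map bodies are assumptions, not taken from the slides. It shows one design document carrying two views, one of them with a built-in reduce.

```javascript
// Hypothetical design document; view names and map bodies are illustrative.
const designDoc = {
  views: {
    by_city: {
      map: "function (doc, meta) { if (doc.city) { emit(doc.city, null); } }"
    },
    by_brewery: {
      map: "function (doc, meta) { if (doc.brewery_id) { emit(doc.brewery_id, null); } }",
      reduce: "_count"
    }
  }
};

// List the views that a design document defines.
function viewNames(ddoc) {
  return Object.keys(ddoc.views);
}

console.log(viewNames(designDoc));
console.log(JSON.stringify(designDoc, null, 2));
```

Map functions are stored as strings because the design document is itself JSON.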

Page 7: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Distributed Indexing and Querying

[Diagram: a Couchbase Server cluster of three servers with a user-configured replica count of 1. Each server holds active and replica documents. Two app servers, each with a Couchbase client library and cluster map, create indexes / views and send queries.]

• Indexing work is distributed amongst nodes

• Parallelize the effort

• Each node has index for data stored on it

• Queries combine the results from required nodes
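To make "queries combine the results from required nodes" concrete, here is a minimal scatter-gather sketch. It is not Couchbase's implementation: the per-server row arrays, the document IDs, and the `mergeNodeResults` helper are all made up for illustration. Each node contributes rows from its own index and the combined result is returned in key order.

```javascript
// Each array stands for the index rows held by one node (sample data).
const server1 = [{ key: "Anchorage", id: "doc2" }, { key: "Juneau", id: "doc5" }];
const server2 = [{ key: "Boston", id: "doc1" }, { key: "Portland", id: "doc3" }];
const server3 = [{ key: "Denver", id: "doc4" }, { key: "Seattle", id: "doc6" }];

// Hypothetical helper: gather rows from every node and order them by key.
function mergeNodeResults(perNodeRows) {
  const merged = [].concat(...perNodeRows);
  // Plain string comparison is a simplification of real view collation.
  merged.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return merged;
}

const mergedRows = mergeNodeResults([server1, server2, server3]);
console.log(mergedRows.map((r) => r.key));
```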

Page 8: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Eventually indexed Views – Data flow

[Diagram: data flow for Doc 1 from the App Server into a Couchbase Server node. Components shown: managed cache, replication queue (to other node), disk queue, disk, and view engine.]

Page 9: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

DEFINE Index / View Definition in JavaScript

CREATE INDEX City ON Brewery.City;
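As a rough sketch, the index above could be written as a view map function that emits each brewery's city as the key. The `emit` stub, the `indexDocs` helper, the `type` check, and the sample documents are assumptions added so the snippet runs outside Couchbase; inside the server, `emit` is supplied by the view engine.

```javascript
// Local stand-ins for the view engine (assumptions, not Couchbase APIs).
const cityRows = [];
let currentDocId = null;
function emit(key, value) {
  cityRows.push({ id: currentDocId, key: key, value: value });
}

// Map function in the (doc, meta) shape used by Couchbase views.
const mapByCity = function (doc, meta) {
  if (doc.type === "brewery" && doc.city) {
    emit(doc.city, null);
  }
};

// Hypothetical helper: run the map function over sample documents.
function indexDocs(mapFn, docs) {
  for (const d of docs) {
    currentDocId = d.meta.id;
    mapFn(d.doc, d.meta);
  }
  // Simplified ordering; the real view engine applies its own collation.
  cityRows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return cityRows;
}

const sampleDocs = [
  { meta: { id: "brewery_1" }, doc: { type: "brewery", city: "Juneau" } },
  { meta: { id: "brewery_2" }, doc: { type: "brewery", city: "Boston" } },
  { meta: { id: "beer_1" }, doc: { type: "beer", name: "Pale Ale" } }
];
console.log(indexDocs(mapByCity, sampleDocs));
```

Documents without a `city` (the beer here) emit nothing, so they do not appear in the index.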

Page 10: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

BUILD Distributed Index Build Phase

• Optimized for lookups, in-order access and aggregations

• View reads are from disk (different performance profile than GET/SET)

• Views built against every document on every node

– Group them in a design document

• Views are automatically kept up to date

Page 11: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

QUERY Dynamic Queries with Optional Aggregation

• Eventually consistent with respect to document updates

• Efficiently fetch a document or group of similar documents

• Queries will use cached values from B-tree inner nodes when possible

• Take advantage of in-order tree traversal with group_level queries

Query: ?startkey="J"&endkey="K"

Result: {"rows":[{"key":"Juneau","value":null}]}
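The range query above can be imitated over a sorted list of rows. This is only a sketch: `queryRange` and the sample cities other than Juneau are made up, and plain string comparison stands in for the view engine's collation. It shows why a `startkey` of "J" and an `endkey` of "K" return Juneau but not a city such as Kodiak, which sorts after "K".

```javascript
// Sample index rows, already sorted by key (illustrative data).
const cityIndex = [
  { key: "Boston", value: null },
  { key: "Juneau", value: null },
  { key: "Kodiak", value: null },
  { key: "Seattle", value: null }
];

// Hypothetical helper: keep rows whose key lies in [startkey, endkey].
function queryRange(sortedRows, startkey, endkey) {
  return sortedRows.filter((r) => r.key >= startkey && r.key <= endkey);
}

const rangeResult = { rows: queryRange(cityIndex, "J", "K") };
console.log(JSON.stringify(rangeResult));
```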

Page 12: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Simple Primary and Secondary Indexing

Page 13: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Example Document

[Image: example document with its Document ID labeled]

Page 14: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Define a primary index on the bucket

• Lookup the document ID / key by key, range, prefix, suffix

Index definition
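A primary index of this kind is usually sketched as a map function that emits the document ID as the key. In the snippet below the `emit` stub, the sample IDs, and the `prefixRange` helper are assumptions for local illustration. The prefix lookup is expressed as a key range, using a high code point as the upper bound, which is a common idiom rather than something stated on the slide.

```javascript
// Local stand-in for the view engine's emit() (an assumption).
const primaryRows = [];
function emit(key, value) {
  primaryRows.push({ key: key, value: value });
}

// Primary index: the document ID itself is the key.
const mapPrimary = function (doc, meta) {
  emit(meta.id, null);
};

// Feed a few sample document IDs through the map function.
["beer_1554", "brewery_New_Belgium", "beer_Abbey"].forEach(function (id) {
  mapPrimary({}, { id: id });
});
primaryRows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

// Hypothetical helper: prefix lookup as a startkey/endkey style range.
function prefixRange(rows, prefix) {
  const end = prefix + "\uefff"; // high code point used as an upper bound
  return rows.filter((r) => r.key >= prefix && r.key <= end);
}

console.log(prefixRange(primaryRows, "beer_").map((r) => r.key));
```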

Page 15: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Define a secondary index on the bucket

• Lookup an attribute by value, range, prefix, suffix

Index definition

Page 16: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Find documents by a specific attribute

• Let's find beers by brewery_id!

Page 17: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

The index definition

[Image: index definition with the emitted Key and Value labeled]

Page 18: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

The result set: beers keyed by brewery_id

Page 19: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Query Pattern: Basic Aggregations

Page 20: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Use a built-in reduce function with a group query

• Let's find average abv for each brewery!

Page 21: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Group reduce (reduce by unique key)
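A group reduce can be pictured as "collect the emitted values per unique key, then summarize each group". The sketch below assumes a map function that emits `brewery_id` as the key and `abv` as the value. `groupStats` is a simplified local stand-in for a built-in statistics reduce run with grouping: it tracks only sum, count, min and max. The average is derived from sum and count. The sample beers are made up.

```javascript
// Local stand-in for the view engine's emit() (an assumption).
const abvRows = [];
function emit(key, value) {
  abvRows.push({ key: key, value: value });
}

// Map: one row per beer, keyed by brewery, carrying the abv as the value.
const mapAbv = function (doc, meta) {
  if (doc.type === "beer" && doc.brewery_id && typeof doc.abv === "number") {
    emit(doc.brewery_id, doc.abv);
  }
};

// Hypothetical helper: summarize values per unique key, ordered by key.
function groupStats(rows) {
  const groups = new Map();
  for (const row of rows) {
    const s = groups.get(row.key) || { sum: 0, count: 0, min: row.value, max: row.value };
    s.sum += row.value;
    s.count += 1;
    s.min = Math.min(s.min, row.value);
    s.max = Math.max(s.max, row.value);
    groups.set(row.key, s);
  }
  return [...groups.keys()].sort().map((key) => ({ key: key, value: groups.get(key) }));
}

[
  { type: "beer", brewery_id: "new_belgium", abv: 5.5 },
  { type: "beer", brewery_id: "anchor", abv: 4.9 },
  { type: "beer", brewery_id: "new_belgium", abv: 7.0 }
].forEach((doc) => mapAbv(doc, {}));

const abvStats = groupStats(abvRows);
for (const g of abvStats) {
  console.log(g.key, "average abv:", g.value.sum / g.value.count);
}
```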

Page 22: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Query Pattern: Time-based Rollups

Page 23: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Find patterns in beer comments by time

{
  "type": "comment",
  "about_id": "beer_Enlightened_Black_Ale",
  "user_id": 525,
  "text": "tastes like college!",
  "updated": "2010-07-22 20:00:20"
}

{
  "id": "f1e62"
}

timestamp

Page 24: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Query with group_level=2 to get monthly rollups

Page 25: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

group_level=3 - daily results - great for graphing
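The rollups above rely on an array key such as [year, month, day, ...] and a counting reduce; `group_level` then decides how many leading elements of the key define a group. The following sketch simulates that locally. `toDateArray`, `groupCount`, and the sample timestamps (apart from the one in the example comment) are assumptions for illustration, not the view engine's own functions.

```javascript
// Hypothetical helper: "2010-07-22 20:00:20" -> [2010, 7, 22, 20, 0, 20].
function toDateArray(timestamp) {
  return timestamp.match(/\d+/g).map(Number);
}

// Rows as a map function might emit them: date array as key, 1 as value.
const commentRows = [
  "2010-07-22 20:00:20",
  "2010-07-22 21:15:00",
  "2010-07-30 09:00:00",
  "2010-08-01 12:00:00"
].map((ts) => ({ key: toDateArray(ts), value: 1 }));

// Hypothetical helper: count rows per key prefix of length `level`.
function groupCount(rows, level) {
  const counts = new Map();
  for (const row of rows) {
    const prefix = JSON.stringify(row.key.slice(0, level));
    counts.set(prefix, (counts.get(prefix) || 0) + row.value);
  }
  return [...counts].map(([prefix, value]) => ({ key: JSON.parse(prefix), value: value }));
}

const monthly = groupCount(commentRows, 2); // like group_level=2
const daily = groupCount(commentRows, 3);   // like group_level=3
console.log(JSON.stringify(monthly));
console.log(JSON.stringify(daily));
```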

Page 26: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Query Pattern: Leaderboard

Page 27: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Aggregate value stored in a document

• Let's find the top-rated beers!

{
  "brewery": "New Belgium Brewing",
  "name": "1554 Enlightened Black Ale",
  "abv": 5.5,
  "description": "Born of a flood...",
  "category": "Belgian and French Ale",
  "style": "Other Belgian-Style Ales",
  "updated": "2010-07-22 20:00:20",
  "ratings": {
    "jchris": 5,
    "scalabl3": 4,
    "damienkatz": 1
  },
  "comments": [
    "f1e62",
    "6ad8c"
  ]
}

ratings

Page 28: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Sort each beer by its average rating

• Let's find the top-rated beers!

average
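One way to sketch this leaderboard is a map function that averages the values in each document's `ratings` object and emits that average as the key, so the index is ordered by rating and can be read from the top down. The `emit` stub, the second sample beer, and the `topRated` helper are assumptions for local illustration.

```javascript
// Local stand-in for the view engine's emit() (an assumption).
const ratingRows = [];
function emit(key, value) {
  ratingRows.push({ key: key, value: value });
}

// Map: average the per-user ratings and emit (average, beer name).
const mapAvgRating = function (doc, meta) {
  if (doc.ratings) {
    let total = 0;
    let count = 0;
    for (const user in doc.ratings) {
      total += doc.ratings[user];
      count += 1;
    }
    if (count > 0) {
      emit(total / count, doc.name);
    }
  }
};

[
  { name: "1554 Enlightened Black Ale", ratings: { jchris: 5, scalabl3: 4, damienkatz: 1 } },
  { name: "Sample Pale Ale", ratings: { alice: 5, bob: 4 } },
  { name: "Unrated Beer" }
].forEach((doc) => mapAvgRating(doc, {}));

// Hypothetical helper: read the index from the highest key downward.
function topRated(rows, limit) {
  return rows.slice().sort((a, b) => b.key - a.key).slice(0, limit);
}

console.log(topRated(ratingRows, 10).map((r) => r.value));
```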

Page 29: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Couchbase and Elastic Search

Page 30: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Full Text Search

Page 31: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

{ "name": "Abbey Belgian Style Ale", "description": "Winner of four World Beer Cup medals and eight medals at the Great American Beer Fest, Abbey Belgian Ale is the Mark Spitz of New Belgium’s lineup – but it didn’t start out that way."}

Search Across Full JSON Body

Search term: abbey

Page 33: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Faceted Search

Categories

Items with Counts

Range Facets

Page 34: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Learning Portal – Proof of Concept

Page 35: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Couchbase and Hadoop

Page 36: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Operational vs. Analytic Databases

• Real-time, Interactive Databases (NoSQL: Couchbase): Fast access to data

• Analytic Databases (Cloudera, etc.): Get insights from data

Page 37: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

What is Sqoop?

Sqoop is a tool designed to transfer data between Hadoop and [OLTP] databases. You can use Sqoop to import data from [an OLTP] database management system (RDBMS) such as MySQL or Oracle [or Couchbase] into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back.

sqoop.apache.org

Page 38: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Traditional ETL

[Diagram labels: Application, Data, T (Transform), Data]

Page 39: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

A different paradigm

[Diagram labels: Data, Application, Data]

Page 40: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

A very scalable different paradigm

[Diagram labels: Application (x3), Data (x4)]

Page 41: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Where did the Transform go?

[Diagram labels: Application, Data, and four groups of T (Transform) steps]

Page 42: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Couchbase Import and Export

$ sqoop import --connect http://localhost:8091/pools --table DUMP

$ sqoop import --connect http://localhost:8091/pools --table BACKFILL_5

$ sqoop export --connect http://localhost:8091/pools \
    --table DUMP --export-dir DUMP

• For imports, table must be:

– DUMP: All keys currently in Couchbase

– BACKFILL_n: All key mutations for n minutes

• Specified --username maps to bucket

– By default set to "default" bucket

Page 43: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Hadoop and Couchbase – Ad Targeting

[Diagram with three numbered flows; labels: click stream events; profiles, campaigns; profiles, real time campaign statistics]

40 milliseconds to respond with the decision.

Page 44: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Moving Parts

Page 45: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Content & Recommendation Targeting

Page 46: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Moving Parts

Page 47: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Thank you

Couchbase NoSQL Document Database

Page 48: Couchbase_John_Bryce_Israel_Training_couchbase_hadoop