couchbase 2.0: indexing and querying

58
1 1 Couchbase Server 2.0 Indexing and Querying Dipti Borkar Director, Product Management

Upload: couchbase

Post on 22-Jun-2015

6.085 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Couchbase 2.0: Indexing and Querying

1 1

Couchbase Server 2.0 Indexing and Querying

Dipti BorkarDirector, Product Management

Page 2: Couchbase 2.0: Indexing and Querying

Couchbase Server 2.0 - Webinar Series

Couchbase Server 2.0 and Indexing/Querying

Couchbase Server 2.0 and Incremental Map Reduce for Real-Time Analytics

Couchbase Server 2.0 and Cross Data Center Replication

Couchbase Server 2.0 and Full-Text Search Integration

Couchbase Server 2.0 Use Cases Overview

Introducing Couchbase Server 2.0

http://www.couchbase.com/webinarshttp://www.couchbase.com/webinars

Page 3: Couchbase 2.0: Indexing and Querying

New in Two

JSON support

Indexing and Querying

Cross data center replication

Incremental Map Reduce

Page 4: Couchbase 2.0: Indexing and Querying

4

What we’ll talk about

• Records vs Documents • Lifecycle of a view

Index definition, build, and query phase Indexing details

• Replica indexes, failover and compaction• Patterns

Primary and Secondary indexes Basic aggregations (avg ratings by brewery) Time-based analytics with group_level Leaderboard

Page 5: Couchbase 2.0: Indexing and Querying

5

Relational vs Document data model

Relational data model Document data modelCollection of complex documents with

arbitrary, nested data formats andvarying “record” format.

Highly-structured table organization with rigidly-defined data formats and

record structure.

JSONJSON

JSON

C1 C2 C3 C4

{

}

Page 6: Couchbase 2.0: Indexing and Querying

6

Example: User Profile

Address Info

1 DEN 30303CO

2 MV 94040CA

3 CHI 60609IL

User Info

KEY First ZIP_idLast

4 NY 10010NY

1 Dipti 2Borkar

2 Joe 2Smith

3 Ali 2Dodson

4 John 3Doe

ZIP_id CITY ZIPSTATE

1 2

2 MV 94040CA

To get information about specific user, you perform a join across two tables To get information about specific user, you perform a join across two tables

Page 7: Couchbase 2.0: Indexing and Querying

7

All data in a single documentAll data in a single document

Document Example: User Profile

{ “ID”: 1, “FIRST”: “Dipti”, “LAST”: “Borkar”, “ZIP”: “94040”, “CITY”: “MV”, “STATE”: “CA” }

JSON

= +

Page 8: Couchbase 2.0: Indexing and Querying

8

Making the Same Change with a Document Database

{ “ID”: 1, “FIRST”: “Dipti”, “LAST”: “Borkar”, “ZIP”: “94040”, “CITY”: “MV”, “STATE”: “CA”, “STATUS”: { “TEXT”: “At Conf” }

}

“GEO_LOC”: “134” },“COUNTRY”: ”USA”

Just add information to a documentJust add information to a document

JSON

,}

Page 9: Couchbase 2.0: Indexing and Querying

9

JSON Documents

• Maps more closely to objects or entities• CRUD Operations, lightweight schema

• Stored under an identifier key

{ “fields” : [“with basic types”, 3.14159, true], “like” : “your favorite language”}

client.set(“mydocumentid”, myDocument);mySavedDocument = client.get(“mydocumentid”);

Page 10: Couchbase 2.0: Indexing and Querying

Couchbase Server 2.0: Views

• Views can cover a few different use cases–Primary Index –Simple secondary indexes (the most common)–Complex secondary, tertiary and composite indexes–Aggregation functions (reduction)• Example: count the number of North American Ales

–Organizing related data

• Built using Map/Reduce–Map function creates a matrix from document fields–Reduce function summarizes (reduces) information–Written using superfast Javascript

Page 11: Couchbase 2.0: Indexing and Querying

What are Views?

• Extract fields from JSON documents and produce an index of the selected information

Page 12: Couchbase 2.0: Indexing and Querying

12

Development vs. Production Views

• Development views index a subset of the data.• Publishing a view builds the

index across the entire cluster.• Queries on production

views are scattered to all cluster members and results are gathered and returned to the client.

Page 13: Couchbase 2.0: Indexing and Querying

View LifecycleDefine -> Build -> Query

1313

Page 14: Couchbase 2.0: Indexing and Querying

• Create design documents on a bucket• Create views within a design document

Buckets & Design docs & Views

14

BUCKET 1

Design document

1

View 1View 1

View 2View 2

View 3View 3

Design document

2

View 4View 4

View 5View 5

Design document

3

View 6View 6

View 7View 7

BUCKET 2

Page 15: Couchbase 2.0: Indexing and Querying

15

Couchbase Server Cluster

Distributed Indexing and Querying

User Configured Replica Count = 1

Active

Doc 5

Doc 2

Doc

Doc

Doc

Server 1

REPLICA

Doc 3

Doc 1

Doc 7

Doc

Doc

Doc

App Server 1

COUCHBASE Client LibraryCOUCHBASE Client Library

Cluster Map

COUCHBASE Client LibraryCOUCHBASE Client Library

Cluster Map

App Server 2

Doc 9

• Indexing work is distributed amongst nodes

• Parallelize the effort

• Each node has index for data stored on it

• Queries combine the results from required nodes

Active

Doc 3

Doc 1

Doc

Doc

Doc

Server 2

REPLICA

Doc 6

Doc 4

Doc 9

Doc

Doc

Doc

Doc 8

Active

Doc 4

Doc 6

Doc

Doc

Doc

Server 3

REPLICA

Doc 2

Doc 5

Doc 8

Doc

Doc

Doc

Doc 7

Query

Create Index / View

Page 16: Couchbase 2.0: Indexing and Querying

16

DEFINE Index / View Definition in JavaScript

CREATE INDEX City ON Brewery.City;

16

Page 17: Couchbase 2.0: Indexing and Querying

17

BUILD Distributed Index Build Phase

• Optimized for lookups, in-order access and aggregations• All view reads from disk (different performance profile)• View builds against every document on every node–This is why you should group them in a design document

• Automatically kept up to date

Page 18: Couchbase 2.0: Indexing and Querying

18

• Efficiently fetch a document or group of similar documents • Queries use cached values from B-tree inner nodes when possible• Take advantage of in-order tree traversal with group_level queries

QUERY Dynamic Queries with Optional Aggregation

Query ?startkey=“J”&endkey=“K ”{ “rows”:[{“key”:“Juneau”,“value”:null}]}

Page 19: Couchbase 2.0: Indexing and Querying

–All the views within a design document are incrementally updated when the view is accessed or auto-indexing kicks in–Automatic view updates• In addition to forcing an index build, active & replica indexes are

updated every 3 seconds of inactivity if there are at least 5000 new changes (configurable)

–The entire view is recreated if the view definition has changed–Views can be conditionally updated by specifying the “stale”

argument to the view query–The index information stored on disk consists of the

combination of both the key and value information defined within your view.

Index building details

Page 20: Couchbase 2.0: Indexing and Querying

20

Queries run against stale indexes by default

• stale=update_after (default if nothing is specified)–always get fastest response–can take two queries to read your own writes

• stale=ok–auto update will trigger eventually–might not see your own writes for a few minutes– least frequent updates -> least resource impact

• stale=false–Use with “set with persistence” if data needs to be included in

view results–BUT be aware of delay it adds, only use when really required

Page 21: Couchbase 2.0: Indexing and Querying

Views and Replica indexes

• In addition to replicas for data (up to 3 copies), optionally create replica for indexes–Set at a bucket level –Replica index populated from replica data–Replica index is used after a failover

• Each node manages replica index data structures– Index structure optimized to handle sharded data (vBuckets)–One for all active shards on the node–One for all replica shards on the node

Page 22: Couchbase 2.0: Indexing and Querying

Views and failover

• Replica indexes enabled on failover• Replicas indexes are rebuilt on replica nodes–Automatically incrementally built based on replica data–Updated every 3 seconds of inactivity if there are at least 5000

new changes–Not copied/moved to be consistent with persisted replica data

Page 23: Couchbase 2.0: Indexing and Querying

View Compaction

• Compaction is ONLINE • Reclaims empty allocated space from disk• Indexes are stored on disk for active vBuckets on each

node and updated in append-only manner• Auto-compaction performed in the background–Set the database fragmentation levels–Set the index fragmentation levels–Choose a schedule–Global and bucket specific settings

Page 24: Couchbase 2.0: Indexing and Querying

24 24

Primary and Secondary Indexing

Page 25: Couchbase 2.0: Indexing and Querying

Example Document

25

Document ID

Page 26: Couchbase 2.0: Indexing and Querying

26

Define a primary index on the bucket

• Lookup the document ID / key by key, range, prefix, suffix

Index definitio

n

Page 27: Couchbase 2.0: Indexing and Querying

27

Define a secondary index on the bucket

• Lookup an attribute by value, range, prefix, suffix

Index definitio

n

Page 28: Couchbase 2.0: Indexing and Querying

Query PatternsFind by Attribute

28 28

Page 29: Couchbase 2.0: Indexing and Querying

29

Find documents by a specific attribute

• Lets find beers by brewery_id!

Page 30: Couchbase 2.0: Indexing and Querying

30

The index definition

ValueKey

Page 31: Couchbase 2.0: Indexing and Querying

31

The result set: beers keyed by brewery_id

Page 32: Couchbase 2.0: Indexing and Querying

Query PatternBasic Aggregations

32 32

Page 33: Couchbase 2.0: Indexing and Querying

33

Use a built-in reduce function with a group query

• Lets find average abv for each brewery!

Page 34: Couchbase 2.0: Indexing and Querying

34 34

We are reducing doc.abv with _stats

Page 35: Couchbase 2.0: Indexing and Querying

35 35

Group reduce (reduce by unique key)

Page 36: Couchbase 2.0: Indexing and Querying

Query PatternTime-based Rollups

36 36

Page 37: Couchbase 2.0: Indexing and Querying

37

Find patterns in beer comments by time

{   "type": "comment",   "about_id": "beer_Enlightened_Black_Ale",   "user_id": 525,   "text": "tastes like college!",   "updated": "2010-07-22 20:00:20"}{   "id": "f1e62"}

timestamp

Page 38: Couchbase 2.0: Indexing and Querying

38

Query with group_level=2 to get monthly rollups

Page 39: Couchbase 2.0: Indexing and Querying

39

dateToArray() is your friend

• String or Integer based timestamps• Output optimized for group_level queries• array of JSON numbers:

[2012,9,21,11,30,44]

dateT

oArra

y()

Page 40: Couchbase 2.0: Indexing and Querying

40

group_level=2 results

• Monthly rollup• Sorted by time—sort the query results in your

application if you want to rank by value—no chained map-reduce

Page 41: Couchbase 2.0: Indexing and Querying

41

group_level=3 - daily results - great for graphing

• Daily, hourly, minute or second rollup all possible with the same index.

Page 42: Couchbase 2.0: Indexing and Querying

42 42

Query PatternLeaderboard

Page 43: Couchbase 2.0: Indexing and Querying

43

Aggregate value stored in a document

• Lets find the top-rated beers!

{   "brewery": "New Belgium Brewing",   "name": "1554 Enlightened Black Ale",   "abv": 5.5,   "description": "Born of a flood...",   "category": "Belgian and French Ale",   "style": "Other Belgian-Style Ales",   "updated": "2010-07-22 20:00:20", “ratings” : { “jchris” : 5, “scalabl3” : 4, “damienkatz” : 1 }, “comments” : [ “f1e62”, “6ad8c” ]}

ratings

Page 44: Couchbase 2.0: Indexing and Querying

44 44

Sort each beer by its average rating

• Lets find the top-rated beers!

average

Page 45: Couchbase 2.0: Indexing and Querying

45 45

What NOT to Write

Page 46: Couchbase 2.0: Indexing and Querying

46

Most common mistakes

• Reduces that don’t reduce• Trying to do too many things with one view• Emitting too much data into a view value• Expecting view query performance to be as fast as get/set• Recursive queries require application code.

Page 47: Couchbase 2.0: Indexing and Querying

47 47

Full Text index

Page 48: Couchbase 2.0: Indexing and Querying

48

Elastic Search Adapter

ElasticSearch

• Elastic Search is good for ad-hoc queries and faceted browsing• Our adapter is aware of changing Couchbase topology• Indexed by Elastic Search after stored to disk in Couchbase

Couchbase Server 2.0 and Full-Text Search Integration

Page 49: Couchbase 2.0: Indexing and Querying

Geographic index

49 49

Page 50: Couchbase 2.0: Indexing and Querying

50

Experimental Status

• Not yet using Superstar trees • (only fast on large clusters)

• Optimized for bulk loading

Page 51: Couchbase 2.0: Indexing and Querying

Questions?

51 51

Page 52: Couchbase 2.0: Indexing and Querying

THANK YOU

[email protected]@DBORKAR

52 52

Page 53: Couchbase 2.0: Indexing and Querying

53

The B-tree Index

• Helps to know what's efficient• Superstar

http://damienkatz.net/2012/05/stabilizing_couchbase_server_2.html

Page 54: Couchbase 2.0: Indexing and Querying

54

• Incremental reduce values are stored in the tree

Logical View B-tree

REDUCES

REDUCES

Page 55: Couchbase 2.0: Indexing and Querying

55

• Incremental reduce values are stored in the tree

Logical View B-tree

7 5 5 3 2 3 7 5 5 3 2 3

2525 REDUCES

REDUCES

Page 56: Couchbase 2.0: Indexing and Querying

56

• Incremental reduce values are stored in the tree

Reduce!

7 5 5 3 2 3 7 5 5 3 2 3

2525

_count

function(keys, values) { return keys ? values.length : sum(values);}

_count

function(keys, values) { return keys ? values.length : sum(values);}

Page 57: Couchbase 2.0: Indexing and Querying

57

• You can query that tree dynamically• Lots of the patterns are about pulling value from this data structure

Dynamic Queries

2525

7 5 5 3 2 3 7 5 5 3 2 3

{ }{ }?startkey=“abba”&endkey=“robot”{ “value”:19}?startkey=“abba”&endkey=“robot”{ “value”:19}

_count

function(keys, values) { return keys ? values.length : sum(values);}

_count

function(keys, values) { return keys ? values.length : sum(values);}

Page 58: Couchbase 2.0: Indexing and Querying

58

Lookup via key-range

• Find tables during yesterdays lunch shift• Find shifts owned by which manager

7 5 5 3 2 3 7 5 5 3 2 3

2525

?startkey=“abba”&endkey=“robot”{ “value”:19}?startkey=“abba”&endkey=“robot”{ “value”:19}