indexing and querying 2_couchbasesf_2013


Upload: couchbase

Posted on 10-May-2015


Page 1: Indexing and querying 2_CouchbaseSF_2013

Power Techniques in Indexing and Querying (Part 2)

Jasdeep Jaitla

Technical Evangelist


Agenda

• More about aggregates and JSON
  - What are schemas in a Document Database?

• The modeling behind our sample database
  - Document structure, metadata in Couchbase Server
  - An example of how to handle new requirements as needed
  - Addressing concurrent modifications to documents


View Lifecycle: Define – Build – Query
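The three lifecycle stages can be illustrated with a minimal in-memory Python sketch. It assumes a simplified map function that takes only the document (real Couchbase views are JavaScript functions that also receive metadata and call emit()); `by_city`, `build_index` and `query` are hypothetical names for illustration.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

Row = Tuple[Any, str, Any]  # (key, doc_id, value)
MapFn = Callable[[dict], Iterator[Tuple[Any, Any]]]


# Define: a map function inspects one document and yields (key, value) pairs,
# playing the role that emit(key, value) plays in a JavaScript view.
def by_city(doc: dict) -> Iterator[Tuple[Any, Any]]:
    if doc.get("type") == "user" and "city" in doc:
        yield doc["city"], None


# Build: run the map function over every document, then sort rows by key
# (ties broken by document id) so the index supports in-order access.
def build_index(docs: Dict[str, dict], map_fn: MapFn) -> List[Row]:
    rows = [(key, doc_id, value)
            for doc_id, doc in docs.items()
            for key, value in map_fn(doc)]
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows


# Query: return every row whose key equals the requested key
# (a linear scan for brevity; a real index would use its B-tree).
def query(index: List[Row], key: Any) -> List[dict]:
    return [{"id": doc_id, "key": k, "value": v}
            for k, doc_id, v in index if k == key]


docs = {
    "u1": {"type": "user", "city": "Juneau"},
    "u2": {"type": "user", "city": "Boston"},
    "p1": {"type": "post", "title": "Hello"},
}
index = build_index(docs, by_city)
print(query(index, "Juneau"))  # [{'id': 'u1', 'key': 'Juneau', 'value': None}]
```

Note that the post document emits nothing, so it never appears in the index; a view only contains what its map function chooses to emit.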


Distributed Index Build Phase


• Optimized for lookups, in-order access and aggregations

• All view reads come from disk (a different performance profile from in-memory key-value gets)

• The view is built against every document on every node

• Automatically kept up to date (on writes and reads)

[Diagram: Doc 1 to Doc 9 spread across SERVER-1, SERVER-2 and SERVER-3; each server holds Active Docs and Replica Docs]
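To make "built on every node" and "automatically kept up to date" concrete, here is a toy Python sketch of the index one server might keep over its own active documents, updated one changed document at a time. The back-index from document id to emitted keys is an assumption of this sketch (it is how the toy knows which old rows to drop), and `LocalViewIndex` and `by_city` are hypothetical names; values are omitted for brevity.

```python
from bisect import insort
from typing import Any, Dict, Iterator, List, Optional, Tuple


def by_city(doc: dict) -> Iterator[Tuple[Any, Any]]:
    """Hypothetical map function: index documents by their "city" field."""
    if "city" in doc:
        yield doc["city"], None


class LocalViewIndex:
    """Toy index that one server keeps over the documents it holds as active."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.rows: List[Tuple[Any, str]] = []       # (key, doc_id), kept sorted
        self.back_index: Dict[str, List[Any]] = {}  # doc_id -> keys it emitted

    def update(self, doc_id: str, doc: Optional[dict]) -> None:
        """Apply one changed document incrementally; doc=None means deleted."""
        # Drop whatever this document emitted last time.
        for key in self.back_index.pop(doc_id, []):
            self.rows.remove((key, doc_id))
        if doc is None:
            return
        # Re-run the map function on the new version only.
        keys = [key for key, _ in self.map_fn(doc)]
        for key in keys:
            insort(self.rows, (key, doc_id))
        self.back_index[doc_id] = keys


server1 = LocalViewIndex(by_city)
server1.update("doc4", {"city": "Juneau"})
server1.update("doc2", {"city": "Boston"})
server1.update("doc4", {"city": "Austin"})  # doc4 edited: its old row is replaced
print(server1.rows)  # [('Austin', 'doc4'), ('Boston', 'doc2')]
```

Because only the changed document is re-mapped, the cost of keeping the index current scales with the number of writes rather than with the size of the dataset.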


Dynamic Range Queries with Optional Aggregation

• Efficiently fetch a row or group of related rows.

• Queries use cached values from B-tree inner nodes when possible.

• Take advantage of in-order tree traversal with group_level queries.
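The second point (reusing reductions cached in inner nodes) can be illustrated with a toy two-level structure in Python: leaf chunks of sorted keys, each with a precomputed count standing in for an inner node's cached `_count` reduction. A range count uses the cached value for every chunk the range fully covers and only scans the chunks at the edges. `ReducedIndex` and the chunk size are hypothetical; a real B-tree descends its nodes instead of looping over all chunks.

```python
from bisect import bisect_left, bisect_right
from typing import Any, List


class ReducedIndex:
    """Toy two-level index: sorted leaf chunks plus a cached count per chunk.

    The cached counts stand in for reduce values stored in B-tree inner nodes.
    """

    def __init__(self, sorted_keys: List[Any], chunk_size: int = 4):
        self.chunks = [sorted_keys[i:i + chunk_size]
                       for i in range(0, len(sorted_keys), chunk_size)]
        self.cached_counts = [len(chunk) for chunk in self.chunks]
        self.leaf_scans = 0  # how many chunks had to be read row by row

    def count(self, startkey: Any, endkey: Any) -> int:
        """Count keys in [startkey, endkey], reusing cached counts where possible."""
        total = 0
        for chunk, cached in zip(self.chunks, self.cached_counts):
            if chunk[-1] < startkey or chunk[0] > endkey:
                continue                  # chunk lies entirely outside the range
            if startkey <= chunk[0] and chunk[-1] <= endkey:
                total += cached           # fully covered: use the cached reduction
            else:
                self.leaf_scans += 1      # partially covered: look inside the leaf
                total += bisect_right(chunk, endkey) - bisect_left(chunk, startkey)
        return total


idx = ReducedIndex(list(range(20)))      # chunks: 0-3, 4-7, 8-11, 12-15, 16-19
print(idx.count(2, 13), idx.leaf_scans)  # 12 2
```

Only the two edge chunks were scanned; the two middle chunks contributed their cached counts directly.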

[Diagram: the same SERVER-1, SERVER-2 and SERVER-3 cluster holding Active Docs and Replica Docs]

Query: ?startkey="J"&endkey="K"
Result: {"rows":[{"key":"Juneau","value":null}]}
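The query above is a range scan over sorted keys. A minimal Python sketch, assuming the index is simply a sorted list of city names, finds the start and end positions with binary search. `inclusive_end` mirrors the view query parameter of that name; note that this sketch uses Python's code-point string ordering, while Couchbase views apply their own collation rules, so the order of mixed-case or non-ASCII keys may differ.

```python
from bisect import bisect_left, bisect_right
from typing import List


def range_query(sorted_keys: List[str], startkey: str, endkey: str,
                inclusive_end: bool = True) -> dict:
    """Return rows with startkey <= key <= endkey (or < endkey if not inclusive)."""
    lo = bisect_left(sorted_keys, startkey)
    if inclusive_end:
        hi = bisect_right(sorted_keys, endkey)
    else:
        hi = bisect_left(sorted_keys, endkey)
    return {"rows": [{"key": key, "value": None} for key in sorted_keys[lo:hi]]}


cities = sorted(["Seattle", "Juneau", "Boston", "Kansas City", "Austin"])
print(range_query(cities, "J", "K"))
# {'rows': [{'key': 'Juneau', 'value': None}]}
```

"Kansas City" is excluded because "K" is a strict prefix of it and therefore sorts before it; to match every key starting with "J", the end key must sort after all of them, which "K" does.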


Queries run against stale indexes by default

• stale = UPDATE_AFTER (default if nothing is specified)
  - Always get the fastest response
  - Can take two queries to read your own writes

• stale = OK
  - Auto update will trigger eventually
  - Might not see your own writes for a few minutes
  - Least frequent updates -> least resource impact

• stale = FALSE
  - Use with observe (waiting for persistence) if just-written data needs to be included in the view results
  - But be aware of the delay it adds; only use it when really required
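The difference between the three settings comes down to whether the index is updated before the answer, after it, or not at all. The toy Python model below assumes a single index and a list of unindexed writes, and ignores persistence and the background auto-updater; `StaleAwareView` is a hypothetical name. The values use the lowercase spelling of the `stale` query parameter (`ok`, `update_after`, `false`).

```python
from typing import List


class StaleAwareView:
    """Toy model of the three stale settings; not Couchbase's actual implementation."""

    def __init__(self) -> None:
        self.indexed: List[str] = []  # keys already reflected in the index
        self.pending: List[str] = []  # writes the index has not processed yet

    def write(self, key: str) -> None:
        self.pending.append(key)

    def _update_index(self) -> None:
        self.indexed = sorted(self.indexed + self.pending)
        self.pending = []

    def query(self, stale: str = "update_after") -> List[str]:
        if stale not in ("ok", "update_after", "false"):
            raise ValueError(f"unknown stale value: {stale}")
        if stale == "false":
            self._update_index()      # bring the index up to date first: freshest, slowest
            return list(self.indexed)
        result = list(self.indexed)   # answer immediately from the current index
        if stale == "update_after":
            self._update_index()      # then update, so the next query sees the write
        # stale == "ok": leave updating to a background auto-updater (not modeled here)
        return result


view = StaleAwareView()
view.write("Juneau")
print(view.query())  # []
print(view.query())  # ['Juneau']
```

The two printed results show the "two queries to read your own writes" behavior of the default setting.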


Development vs. Production Views

• Development views index a subset of the data.

• Publishing a view builds the index across the entire cluster.

• Queries on production views are scattered to all cluster members and results are gathered and returned to the client.
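Both behaviors can be sketched in Python under loudly simplified assumptions: documents are hash-partitioned with a hypothetical CRC32-modulo scheme (not Couchbase's exact vBucket mapping), a development view indexes only a sample of partitions on one node, and a production query gathers every node's sorted results and merges them into one key-ordered list. `partition_of`, `development_view` and `production_query` are hypothetical names.

```python
import heapq
import zlib
from typing import Callable, Dict, List, Set, Tuple

Row = Tuple[str, str]  # (key, doc_id)
NUM_PARTITIONS = 8     # hypothetical; real clusters spread data over many vBuckets


def partition_of(doc_id: str) -> int:
    """Hypothetical hash partitioning (not Couchbase's exact vBucket mapping)."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_PARTITIONS


def build_local_index(node_docs: Dict[str, dict],
                      keep: Callable[[str], bool] = lambda doc_id: True) -> List[Row]:
    """Index one node's documents by "city", optionally keeping only some of them."""
    return sorted((doc["city"], doc_id)
                  for doc_id, doc in node_docs.items() if keep(doc_id))


def development_view(node_docs: Dict[str, dict], sample: Set[int]) -> List[Row]:
    """Development view: index only documents in a sample of partitions on one node."""
    return build_local_index(node_docs, lambda doc_id: partition_of(doc_id) in sample)


def production_query(cluster: List[Dict[str, dict]]) -> List[Row]:
    """Scatter to every node (each index is built on the fly here for brevity),
    then gather by merging the already-sorted per-node results."""
    local_results = [build_local_index(node_docs) for node_docs in cluster]
    return list(heapq.merge(*local_results))


cluster = [
    {"doc1": {"city": "Juneau"}, "doc4": {"city": "Austin"}},  # SERVER-1
    {"doc2": {"city": "Boston"}},                              # SERVER-2
]
print(production_query(cluster))
# [('Austin', 'doc4'), ('Boston', 'doc2'), ('Juneau', 'doc1')]
```

Since each node returns its rows already sorted, the gather step only needs a k-way merge rather than a full re-sort.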


Emergent Schema

JSON.org

GitHub API

"Capture the user's intent"

• Falls out of your key-value usage

• Helps to know what's efficient

• Deal with unstructured data more easily
  - Different schemas/APIs
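One practical consequence of an emergent schema is that documents of different shapes coexist, and the code that reads them (including view map functions) must tolerate each shape. A minimal Python sketch, assuming two hypothetical user-document shapes (an older single `name` field and a newer `first`/`last` pair); `display_name` and `users_by_name` are illustrative names only.

```python
from typing import Any, Iterator, Optional, Tuple


def display_name(doc: dict) -> Optional[str]:
    """Read a user's name from either of two hypothetical document shapes."""
    if "name" in doc:                          # older shape: {"name": "Ada Lovelace"}
        return doc["name"]
    if "first" in doc and "last" in doc:       # newer shape: {"first": ..., "last": ...}
        return f"{doc['first']} {doc['last']}"
    return None                                # neither shape: skip the document


def users_by_name(doc: dict) -> Iterator[Tuple[Any, Any]]:
    """Map-style function that indexes users regardless of which shape they use."""
    name = display_name(doc)
    if doc.get("type") == "user" and name is not None:
        yield name, None


old = {"type": "user", "name": "Ada Lovelace"}
new = {"type": "user", "first": "Grace", "last": "Hopper"}
print(list(users_by_name(old)), list(users_by_name(new)))
# [('Ada Lovelace', None)] [('Grace Hopper', None)]
```

Handling both shapes at read time lets new requirements change the document structure without migrating every existing document first.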


Query Pattern: Collation of Related Docs


Join Through Collation

See Bradley Holt’s presentation from CouchConf Boston: http://www.couchbase.com/couchconf-boston
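Join through collation relies on related documents emitting keys that share a prefix, so that they sort next to each other and one range scan returns them together. The Python sketch below assumes a hypothetical blog model (posts, and comments carrying a `post_id`); tuples stand in for the array keys a view would emit. In a real view this is commonly written as emitting `[post_id, 0]` and `[post_id, 1, created]` and querying with `startkey=["post1"]&endkey=["post1",{}]`, relying on objects collating after other values; the `float("inf")` sentinel plays that role here.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, Iterator, List, Tuple

Row = Tuple[tuple, str]  # (collation key, doc_id)


def post_and_comments(doc_id: str, doc: dict) -> Iterator[tuple]:
    """Map-style function: a post and its comments share a key prefix."""
    if doc["type"] == "post":
        yield (doc_id, 0)                             # the post sorts first...
    elif doc["type"] == "comment":
        yield (doc["post_id"], 1, doc["created"])     # ...then its comments, oldest first


def build_index(docs: Dict[str, dict]) -> List[Row]:
    return sorted((key, doc_id) for doc_id, doc in docs.items()
                  for key in post_and_comments(doc_id, doc))


def post_with_comments(index: List[Row], post_id: str) -> List[str]:
    """One range scan returns the post followed by all of its comments."""
    keys = [key for key, _ in index]
    lo = bisect_left(keys, (post_id,))                  # smallest key with this prefix
    hi = bisect_right(keys, (post_id, float("inf")))    # past every key with this prefix
    return [doc_id for _, doc_id in index[lo:hi]]


docs = {
    "post1": {"type": "post"},
    "c2": {"type": "comment", "post_id": "post1", "created": "2013-01-02"},
    "c1": {"type": "comment", "post_id": "post1", "created": "2013-01-01"},
    "post2": {"type": "post"},
}
print(post_with_comments(build_index(docs), "post1"))  # ['post1', 'c1', 'c2']
```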


Anti-patterns

• Emitting the document or too much data into a view
  - Especially avoid including the doc itself in an emit() call

• Reduces that don’t reduce
  - If you implement a custom reduce, make sure its output doesn’t expand!

• Expecting a query on an index to be as fast as a key-value lookup
  - Secondary indexes need to be built, are updated asynchronously, and are (currently) cached at the filesystem level

• Trying to do too much with one view
  - Instead, co-locate views in design documents, or have separate design documents

• Note that sometimes you may need to make requests of multiple views
  - There is no direct method of doing a join, but there is a technique
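The "reduces that don't reduce" anti-pattern is easiest to see by running a reduce the way a tree-shaped index does: first over leaf-sized chunks, then again (a rereduce) over the partial results. The Python sketch below mirrors the three arguments a JavaScript view reduce function receives, including the rereduce flag; `good_reduce`, `bad_reduce`, `reduce_tree` and the chunk size are hypothetical. The built-in reduces (such as `_count`, `_sum` and `_stats`) are bounded in the same way as `good_reduce`.

```python
from typing import Any, List, Sequence, Tuple


def good_reduce(keys: Any, values: Sequence[Any], rereduce: bool) -> Tuple[int, int]:
    """Bounded output: always a single (count, total) pair, however many rows."""
    if rereduce:
        # values are (count, total) pairs produced by earlier reduce calls
        return (sum(c for c, _ in values), sum(t for _, t in values))
    return (len(values), sum(values))


def bad_reduce(keys: Any, values: Sequence[Any], rereduce: bool) -> List[Any]:
    """Anti-pattern: the output grows with the input, so nothing is reduced."""
    if rereduce:
        return [v for partial in values for v in partial]
    return list(values)


def reduce_tree(values: List[int], reduce_fn, chunk_size: int = 4) -> Any:
    """Reduce leaf-sized chunks, then rereduce the partial results once."""
    partials = [reduce_fn(None, values[i:i + chunk_size], False)
                for i in range(0, len(values), chunk_size)]
    return reduce_fn(None, partials, True)


values = list(range(10))
print(reduce_tree(values, good_reduce))       # (10, 45)
print(len(reduce_tree(values, bad_reduce)))   # 10
```

With `good_reduce` every cached inner-node value stays two numbers; with `bad_reduce` the value at the top of the tree is as large as the whole dataset, which defeats the point of storing reductions in the index.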


What about Geo?

• Experimental in the 2.0 release

• Currently being completely rewritten internally

• Supports GeoJSON; richer queries will be supported soon.

• Java SDK contains Geo support right now!


Couchbase Integration


Integration with ElasticSearch

ElasticSearch

1. ElasticSearch Query

2. ElasticSearch Result

3. Couchbase Multi-GET

4. Couchbase Result
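The four-step flow can be sketched in Python with in-memory stand-ins: the search engine returns only matching document IDs, and a single multi-GET against Couchbase fetches the full documents. Both `make_toy_search` (a substring match, not a real full-text engine) and `make_toy_multi_get` are hypothetical placeholders rather than the ElasticSearch or Couchbase client APIs.

```python
from typing import Callable, Dict, List


def make_toy_search(docs: Dict[str, dict]) -> Callable[[str], List[str]]:
    """Hypothetical stand-in for ElasticSearch: returns IDs of docs whose text contains a term."""
    def search(term: str) -> List[str]:
        return [doc_id for doc_id, doc in docs.items()
                if term.lower() in doc["text"].lower()]
    return search


def make_toy_multi_get(store: Dict[str, dict]) -> Callable[[List[str]], Dict[str, dict]]:
    """Hypothetical stand-in for a Couchbase multi-GET: fetches many keys in one call."""
    def get_multi(keys: List[str]) -> Dict[str, dict]:
        return {key: store[key] for key in keys if key in store}
    return get_multi


def search_then_fetch(search, get_multi, term: str) -> List[dict]:
    ids = search(term)                             # 1-2. query search engine, get matching IDs
    found = get_multi(ids)                         # 3. one multi-GET to Couchbase for those IDs
    return [found[i] for i in ids if i in found]   # 4. full documents, in search-result order


store = {
    "course::1": {"text": "Indexing with Couchbase views"},
    "course::2": {"text": "Hadoop connector basics"},
}
print(search_then_fetch(make_toy_search(store), make_toy_multi_get(store), "views"))
# [{'text': 'Indexing with Couchbase views'}]
```

Keeping Couchbase as the source of the full documents means the search index only has to answer "which IDs match", and the final result order follows the search engine's ranking.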


The Learning Portal

• Designed and built as a collaboration between MHE Labs and Couchbase

• Serves as proof-of-concept and testing harness for Couchbase + ElasticSearch integration

• Available for download and further development as open source code

https://github.com/couchbaselabs/learningportal


Integration with Hadoop


Summary

Couchbase has Views for Indexing and Querying
Views are incremental map-reduce code that runs across all documents.

Views Allow Common Methods of Querying
Common patterns such as simple secondary indexes, count and average aggregations, and time series rollups are simple and fast.

Couchbase Integrates for Full Text and Large Analytics
Couchbase integrates with ElasticSearch, Hadoop and other systems.
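A time series rollup with group_level can be sketched in Python, assuming a map function that emits `[year, month, day]` array keys with a numeric value (for example, page views). Grouping by the first `group_level` key elements gives monthly (2) or yearly (1) rollups, and each group carries a count and sum from which the average follows; the built-in `_stats` reduce likewise returns count and sum, among other fields. `group_level_query` is a hypothetical name, and the data are made up for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def group_level_query(rows: List[Tuple[List[int], float]], group_level: int) -> List[Dict[str, Any]]:
    """Group rows by the first `group_level` elements of their array keys and reduce each group."""
    groups: Dict[Tuple[int, ...], List[float]] = defaultdict(list)
    for key, value in rows:
        groups[tuple(key[:group_level])].append(value)
    result = []
    for prefix, vals in sorted(groups.items()):
        total = sum(vals)
        result.append({"key": list(prefix),
                       "value": {"count": len(vals), "sum": total, "avg": total / len(vals)}})
    return result


# Rows as a map function might emit them: key = [year, month, day], value = page views.
rows = [([2013, 1, 1], 10), ([2013, 1, 2], 20), ([2013, 2, 1], 30)]
print(group_level_query(rows, 2))  # monthly rollup
print(group_level_query(rows, 1))  # yearly rollup
```

Because rows with a shared key prefix are adjacent in the sorted index, a real view can compute each group during one in-order traversal instead of a separate query per month.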


Q&A


Thanks!
