tuning for performance: indexes and queries – couchbase connect 2016
TRANSCRIPT
©2016 Couchbase Inc. 1
Tuning for Performance: Indexes and Queries
Agenda
• Data Model
• Query Execution
• Indexing Options
• Index Design
• Query Tuning
• Deployment & Configuration
Data Model
Document Data Modeling for N1QL
• Define document boundaries
  • Identifying parent and child objects
  • Deciding whether to embed child objects
• Defining relationships
  • Parent-child relationships
  • Independent relationships
• Expressing relationships
Identifying parent and child objects
• A parent object has an independent lifecycle
  • It is not deleted as part of deleting any other object
  • E.g. a registered user of a site
• A child object has a dependent lifecycle; it has no meaningful existence without its parent
  • It must be deleted when its parent is deleted
  • E.g. a comment on a blog (child of the blog object)
Deciding whether to embed child objects
• Couchbase provides per-document atomicity
  • If the child and parent must be atomically updated or deleted together, the child must be embedded
• There is no key-value lookup for embedded objects. If the child requires key-value lookup, it should not be embedded.
• Performance trade-off
  • Embedding the child makes it faster to read the parent together with all its children (single document fetch)
  • If the child has high cardinality, embedding the child makes the parent bigger and slower to store and fetch
Defining & Expressing Relationships
• Defining relationships
  • Parent-child relationships
    • If we model the child as a separate document rather than embedding it, we have defined a parent-child relationship
  • Independent relationships
    • Relationships between two independent objects
• Expressing relationships
  • 3 ways to express relationships in Couchbase
    • Parent contains keys of children (outbound)
    • Children contain key of parent (inbound)
    • Both of the above (dual)
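The outbound and inbound styles can be sketched with plain Python dictionaries standing in for a bucket. This is only an illustration: the blog/comment documents, the key names, and the `comment_keys` and `blog_key` fields are hypothetical and do not come from the slides.

```python
# Hypothetical blog/comment documents; keys and field names are illustrative only.

# Outbound: the parent holds the keys of its children.
outbound = {
    "blog_1": {"type": "blog", "title": "Hello", "comment_keys": ["comment_1", "comment_2"]},
    "comment_1": {"type": "comment", "text": "First!"},
    "comment_2": {"type": "comment", "text": "Nice post"},
}

# Inbound: each child holds the key of its parent.
inbound = {
    "blog_1": {"type": "blog", "title": "Hello"},
    "comment_1": {"type": "comment", "text": "First!", "blog_key": "blog_1"},
    "comment_2": {"type": "comment", "text": "Nice post", "blog_key": "blog_1"},
}


def children_outbound(store, parent_key):
    """Follow the parent's key list: one key-value lookup per child."""
    return [store[k] for k in store[parent_key]["comment_keys"]]


def children_inbound(store, parent_key):
    """Find children that point at the parent.

    Here this is a full scan of the dictionary; in a real deployment an
    index on the parent-key field would serve this lookup instead.
    """
    return [doc for doc in store.values() if doc.get("blog_key") == parent_key]


outbound_texts = [c["text"] for c in children_outbound(outbound, "blog_1")]
inbound_texts = [c["text"] for c in children_inbound(inbound, "blog_1")]
```

A dual relationship simply stores both `comment_keys` on the parent and `blog_key` on each child, at the cost of keeping the two in sync on every update.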
Using JSON to Store Data

DocumentKey: CBL2015
{
  "Name": "Jane Smith",
  "DOB": "1990-01-30",
  "Billing": [
    { "type": "visa", "cardnum": "5827-2842-2847-3909", "expiry": "2019-03" },
    { "type": "master", "cardnum": "6274-2842-2847-3909", "expiry": "2019-03" }
  ],
  "Connections": [
    { "CustId": "XYZ987", "Name": "Joe Smith" },
    { "CustId": "PQR823", "Name": "Dylan Smith" },
    { "CustId": "PQR823", "Name": "Dylan Smith" }
  ],
  "Purchases": [
    { "id": 12, "item": "mac", "amt": 2823.52 },
    { "id": 19, "item": "ipad2", "amt": 623.52 }
  ]
}

Relational tables:

Customer
CustomerID  Name        DOB
CBL2015     Jane Smith  1990-01-30

Billing
CustomerID  Type    Cardnum  Expiry
CBL2015     visa    5827…    2019-03
CBL2015     master  6274…    2018-12

Connections
CustomerID  ConnId  Name
CBL2015     XYZ987  Joe Smith
CBL2015     SKR007  Sam Smith

Purchases
CustomerID  item   amt
CBL2015     mac    2823.52
CBL2015     ipad2  623.52

Contacts
CustomerID  ConnId  Name
CBL2015     XYZ987  Joe Smith
CBL2015     SKR007  Sam Smith
Travel-Sample
key: airline_24
{ "id": "24", "type": "airline", "callsign": "AMERICAN", "iata": "AA" }

key: airport_3577
{ "id": 3577, "type": "airport", "faa": "SEA", "icao": "KSEA" }

key: route_5966
{ "id": 5966, "type": "route", "airlineid": "airline_24", "sourceairport": "SEA" }

key: landmark_21661
{ "id": 21661, "type": "landmark", "country": "France", "email": null }

key: hotel_25592
{ "id": 25592, "type": "hotel", "country": "San Francisco", "phone": "+1 415 440-5600" }

Key reference: the airlineid value in route_5966 ("airline_24") is the key of the airline document.
Document types: airline, airport, route, landmark, hotel
Travel-sample: Hotel Document

"docid": "hotel_25390"
{
  "address": "321 Castro St",
  …
  "city": "San Francisco",
  "country": "United States",
  "description": "An upscale bed and breakfast in a restored house.",
  "directions": "at 16th",
  "geo": { "accuracy": "ROOFTOP", "lat": 37.7634, "lon": -122.435 },
  "id": 25390,
  "name": "Inn on Castro",
  "phone": "+1 415 861-0321",
  "price": "$95–$190",
  "public_likes": ["John Smith", "Joe Carl", "Jane Smith", "Kate Smith"],
  "reviews": [
    {
      "author": "Mason Koepp",
      "content": "blah-blah",
      "date": "2012-08-23 16:57:56 +0300",
      "ratings": {
        "Check in / front desk": 3,
        "Cleanliness": 3,
        "Location": 4,
        "Overall": 2,
        "Rooms": 2,
        "Service": -1,
        "Value": 2
      }
    }
  ],
  "state": "California",
  "type": "hotel",
  "url": "http://www.innoncastro.com/"
}

• docid: Document key
• city: Attributes (key-value pairs)
• geo: Object. 1:1 relationship
• public_likes: Array of strings. Embedded 1:many relationship
• reviews: Array of objects. Embedded 1:N relationship
• ratings: Object within an array
N1QL Access Methods and Performance
Fastest to slowest, 1 to 5

Rank  Method              Description
1     USE KEYS            Document fetch, no index scan
2     COVERED Index Scan  The query (or part of the query, during a JOIN) is processed with the index scan only
3     Index Scan          Partial index scan, then fetches
4     JOIN                Fetch of the left-hand side, then fetches of the right-hand side
5     Primary Scan        Full bucket scan, then fetches
Child Representation and Access Method
Child Representation: Access Method, with notes

1. Embedded: USE KEYS
  • Parent with children loaded via USE KEYS
  • Child can be surfaced via UNNEST
2. Outbound relationship: JOIN
  • Parent contains child keys
  • Children loaded via JOIN
3. Inbound relationship: Index scan
  • Children contain parent key
  • child.parent_key is indexed
  • Index is scanned to load children
4. Not modeled: Primary scan
  • Relationship not explicitly modeled
Query Execution
NoSQL
Input: JSON documents (CUSTOMER, Orders, LoyaltyInfo). The CUSTOMER document is the Jane Smith document shown under "Using JSON to Store Data".
Output: JSON documents (result documents)
N1QL: Query Execution Flow
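Participants: Clients, Query Service, Index Service, Data Service

1. Client submits the query over the REST API
2. Query Service: parse, analyze, create plan
3. Query Service to Index Service: scan request with index filters
4. Index Service returns the qualified document keys
5. Query Service to Data Service: fetch request with the document keys
6. Data Service returns the fetched documents
7. Query Service: evaluate (filter, join, aggregate, sort, paginate)
8. Query result is returned to the client

Example query:
SELECT c_id, c_first, c_last, c_max
FROM CUSTOMER
WHERE c_id = 49165;

Example result:
{ "c_first": "Joe", "c_id": 49165, "c_last": "Montana", "c_max": 50000 }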
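The scan, fetch, and evaluate steps can be mimicked with a toy in-memory model. This is a sketch, not Couchbase code: the dictionaries below merely stand in for the Data and Index services, and the document keys and the second customer are made up for illustration.

```python
# Toy stand-ins for the Data Service and Index Service; not Couchbase APIs.
DATA_SERVICE = {
    "cust_49165": {"c_id": 49165, "c_first": "Joe", "c_last": "Montana", "c_max": 50000},
    "cust_10001": {"c_id": 10001, "c_first": "Ann", "c_last": "Lee", "c_max": 20000},
}

# Index on c_id: maps the indexed value to the keys of matching documents.
INDEX_SERVICE = {}
for doc_key, doc in DATA_SERVICE.items():
    INDEX_SERVICE.setdefault(doc["c_id"], []).append(doc_key)


def run_query(c_id, fields):
    """Mimic: SELECT <fields> FROM CUSTOMER WHERE c_id = <c_id>."""
    # Steps 3-4: scan request with the index filter, get qualified doc keys.
    keys = INDEX_SERVICE.get(c_id, [])
    # Steps 5-6: fetch the documents for those keys.
    docs = [DATA_SERVICE[k] for k in keys]
    # Step 7: evaluate the predicate again on the fetched documents.
    matches = [d for d in docs if d["c_id"] == c_id]
    # Step 8: project the requested fields into the result.
    return [{f: d[f] for f in fields} for d in matches]


result = run_query(49165, ["c_id", "c_first", "c_last", "c_max"])
```

Only the document whose key the index returned is ever fetched, which is why pushing predicates to the index matters so much in the later slides.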
Inside a Query Service
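[Diagram: the Client sends a query to the Query Service, which works with the Index Service and the Data Service. Operators inside the Query Service: Parse, Plan, Scan, Fetch, Join, Filter, Pre-Aggregate, Aggregate, Sort, Offset, Limit, Project.]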
Indexing Options
Index Options
Index Type: Description

1. Primary Index: Index on the document key, over the whole bucket
2. Named Primary Index: A primary index with a name. Allows multiple primary indexes in the cluster
3. Secondary Index: Index on a key-value or the document key
4. Secondary Composite Index: Index on more than one key-value
5. Functional Index: Index on a function or expression over key-values
6. Array Index: Index on individual elements of arrays
7. Partial Index: Index on a subset of the items in the bucket
8. Covering Index: The query is answered using the data in the index and skips retrieving the item
9. Duplicate Index: Not a type of index, but a feature of indexing that allows load balancing, providing scale-out, multi-dimensional scaling, performance, and high availability
Design Query & Index
Get the 6th through 15th lowest ids between 0 and 1000 for airlines in the "United States". Also return the country and name values along with the ids.

Sample document: META().id: "airline_9833"
{
  "callsign": "Epic",
  "country": "United States",
  "iata": "FA",
  "icao": "4AA",
  "id": 9833,
  "name": "Epic Holiday",
  "type": "airline"
}

Type of document  Count
airline           187
airport           1968
route             24024
landmark          4495
hotel             917
total             31591

Predicate: Count
type = "airline" AND country = "United States": 127
type = "airline" AND country = "United States" AND id BETWEEN 0 AND 1000: 18
Design Query & Index
• Using the primary index
  • The data source has 31,591 documents
  • The primary index scan gets all the document keys from the index, fetches the documents, applies the predicates, sorts, and then paginates to return 10 documents
• Using a secondary index
  • The predicate (type = "airline") is pushed to the indexer; 187 documents are fetched
  • Two predicates are not pushed to the indexer: (country = "United States" AND id BETWEEN 0 AND 1000)

SELECT country, id, name
FROM `travel-sample`
WHERE type = "airline"
  AND country = "United States"
  AND id BETWEEN 0 AND 1000
ORDER BY id
LIMIT 10
OFFSET 5;

CREATE INDEX ts_ix1 ON `travel-sample`(type);
Design Query & Index
Composite index on all attributes in the query predicates
• All predicates are pushed to the indexer; fetches 18+ documents

CREATE INDEX ts_ix1 ON `travel-sample`(type, id, country);

Partial composite index
• The document type can be an index condition
• Because the document type check is an equality predicate, remove it from the index keys and keep it only in the index condition (WHERE clause)
• A leaner index performs better (saves I/O, memory, CPU, network)

CREATE INDEX ts_ix1 ON `travel-sample`(id, country) WHERE type = "airline";

Covering partial composite index
• Add all referenced attributes to the index keys, e.g. name
• A covered query avoids the document fetch

CREATE INDEX ts_ix1 ON `travel-sample`(id, country, name) WHERE type = "airline";
Design Query & Index
ORDER BY optimization
• The index stores data pre-sorted by the index keys
• The ORDER BY list should match the index key list order, left to right
• Exploit index order to avoid an additional fetch and sort

LIMIT pushdown to the indexer

"spans": [
  {
    "Range": {
      "High": [ "1000", "successor(\"United States\")" ],
      "Inclusion": 1,
      "Low": [ "0", "\"United States\"" ]
    }
  }
]

Pushing LIMIT to the indexer improves efficiency and performance. Conditions:
• Exact predicates are pushed down to the indexer
• ORDER BY matches the index key order
• The indexer evaluates all of the predicates
• Unsupported: JOINs, GROUP BY
Design Query & Index
Optimizing for ORDER BY with LIMIT
• The query has an equality predicate on country; id has a range predicate
• These exact predicates will produce exact results
• Changing to ORDER BY country, id gives the same result, and LIMIT can be pushed down to the indexer

OFFSET pushdown
• Pushed to the indexer as (limit + offset); the query then skips the first OFFSET entries

CREATE INDEX ts_ix1 ON `travel-sample`(country, id, name) WHERE type = "airline";

SELECT country, id, name
FROM `travel-sample`
WHERE type = "airline"
  AND country = "United States"
  AND id BETWEEN 0 AND 1000
ORDER BY country, id
LIMIT 10
OFFSET 5;
Final Query & Index
CREATE INDEX ts_ix1 ON `travel-sample`(country, id, name) WHERE type = "airline";
SELECT country, id, name
FROM `travel-sample`
WHERE type = "airline"
  AND country = "United States"
  AND id BETWEEN 0 AND 1000
ORDER BY country, id
LIMIT 10
OFFSET 5;
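To see why this index and query pair is cheap, the covering partial index can be modeled as a sorted list of (country, id, name) tuples. The sketch below is a simplified model and not how the indexer is implemented: the airline documents are synthetic, ids are assumed to be integers, and `covered_scan` is a hypothetical helper.

```python
import bisect

# Synthetic documents; only the fields the query touches are modeled.
docs = [
    {"type": "airline", "country": "United States", "id": i, "name": "Airline %d" % i}
    for i in range(10, 400, 10)
]
docs.append({"type": "airline", "country": "France", "id": 15, "name": "Air Exemple"})
docs.append({"type": "airport", "country": "United States", "id": 20, "name": "Some Airport"})

# Partial covering index: keys (country, id, name), only for type = "airline".
# Keeping the entries sorted is what makes ORDER BY country, id free.
index = sorted(
    (d["country"], d["id"], d["name"]) for d in docs if d["type"] == "airline"
)


def covered_scan(country, low, high, limit, offset):
    """Range scan on (country, id) with LIMIT/OFFSET applied during the scan."""
    # Assumes integer ids, so (country, high + 1) bounds an inclusive range.
    start = bisect.bisect_left(index, (country, low))
    end = bisect.bisect_left(index, (country, high + 1))
    # Pushdown: read only limit + offset entries from the qualifying range.
    pushed = index[start:end][: limit + offset]
    # The query then skips the first `offset` entries; no document fetch is
    # needed because country, id, and name are all in the index entry.
    rows = [{"country": c, "id": i, "name": n} for c, i, n in pushed[offset:]]
    return rows, len(pushed)


rows, entries_read = covered_scan("United States", 0, 1000, limit=10, offset=5)
```

In this model only 15 index entries are read to return 10 rows, and no sort or fetch step is involved.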
Design Query & Index
Get the 6th through 15th highest ids between 0 and 1000 for airlines in the "United States". Also return the country and name values along with the ids.

Index & query (for numbers only):

CREATE INDEX ts_ix1 ON `travel-sample`(country, -id, name) WHERE type = "airline";

SELECT country, -(-id), name
FROM `travel-sample`
WHERE type = "airline"
  AND country = "United States"
  AND -id BETWEEN -1000 AND 0
ORDER BY country, -id
LIMIT 10
OFFSET 5;

The straightforward form of the same query, for comparison:

SELECT country, id, name
FROM `travel-sample`
WHERE type = "airline"
  AND country = "United States"
  AND id BETWEEN 0 AND 1000
ORDER BY country, id DESC
LIMIT 10
OFFSET 5;
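The negation trick works because sorting -id in ascending order is the same as sorting id in descending order. A minimal sketch with made-up ids:

```python
# Made-up ids for illustration.
ids = [10, 250, 999, 3, 1000, 42]

# The index stores -id, sorted ascending like any other index key.
index_keys = sorted(-i for i in ids)

# -id BETWEEN -1000 AND 0 selects the same entries as id BETWEEN 0 AND 1000.
in_range = [k for k in index_keys if -1000 <= k <= 0]

# Reading the first entries in index order and negating again, as -(-id)
# does in the projection, yields the highest ids first.
top3 = [-k for k in in_range[:3]]
```

This only applies to numeric keys, which is why the slide restricts it to numbers.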
Index Design
Advice on Index Design. Part 1
• Standard GSI: smaller mutation rate, larger index
• MOI (memory-optimized index): 100% of the data in memory; large number of mutations; better performance
• Avoid primary index scans in production
  • Avoid creating the primary index itself
  • A primary scan is the equivalent of a table scan
• The query must have the right predicates for the right index to be chosen
  • The query needs predicates on the leading index keys
• Explore all combinations of index options
• Divide and conquer with partial indexes. They support complex expressions.
• An index can have a large number of keys, with a maximum total key size of 4096
• Create the index with predicate attributes as the leading keys, followed by non-predicate attributes for covering
• If the query is not covered, the index keys should only be attributes used in query predicates
Advice on Index Design. Part 2
• Index key order:
  • Attributes typically used with EQUALITY and IN predicates
  • Followed by BETWEEN ({>, >=} AND {<, <=})
  • Followed by less than (<, <=)
  • Followed by greater than (>, >=)
• If the partial index condition has an equality predicate on a field, don't include that field in the index keys; this keeps the index LEAN (4.5.0+)
• META().id is always present. If META().id is not part of the predicate, don't include it in the index keys.
• The only indexable META() field is META().id; all others require a fetch of the items
• Remove unused indexes
• If the index doesn't fit in memory (for MOI), use a partial index
• If an index is heavily used, create a duplicate index
• Add index nodes
Query Tuning
Advice on Query Performance
• Use EXPLAIN to analyze the query plan
  • Index selection; spans that push down as many predicates as possible. The more the merrier
  • Pushdown of LIMIT, OFFSET
  • Index order for ORDER BY
  • Covering index
• Simple COUNT queries can take advantage of the index count
• Exploit the index for MIN queries
• For ANY, ANY AND EVERY, and WITHIN predicates, use an ARRAY index
• For UNNEST, use an ARRAY index. The array key has to be the leading key (only for UNNEST)
• Use IN instead of WITHIN
• Use pretty=false (4.5.1) and max_parallelism when queries return a large result set
• Improve fetch performance by increasing pipeline-cap and pipeline-batch
• Exploit array fetch by rewriting the query
• Execute the query and examine the monitoring stats for each phase of the query
• Monitor CPU and memory usage and adjust the number of Query Service nodes
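As a sketch of the pretty=false and max_parallelism advice, the per-request settings can be added to the form data posted to the query endpoint. The `build_request` helper is hypothetical, and the example value 4 is arbitrary; the parameter names `pretty` and `max_parallelism` are assumed here to be accepted as request-level parameters by the Query Service REST API, so check them against your server version.

```python
def build_request(statement, pretty=False, max_parallelism=None):
    """Build form data for a POST to the query endpoint (e.g. :8093/query).

    Assumption: `pretty` and `max_parallelism` are request-level parameters.
    """
    payload = {
        "statement": statement,
        # Turning off pretty-printing shrinks large results on the wire.
        "pretty": "true" if pretty else "false",
    }
    if max_parallelism is not None:
        payload["max_parallelism"] = str(max_parallelism)
    return payload


payload = build_request(
    "SELECT country, id, name FROM `travel-sample` WHERE type = \"airline\"",
    pretty=False,
    max_parallelism=4,
)
```

The resulting dictionary could be passed as `data=payload` to `requests.Session.post`, in the same way the scripts later in this deck post their statements.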
SELECT: JOIN
SELECT COUNT(1)
FROM `beer-sample` beer
INNER JOIN `beer-sample` brewery ON KEYS beer.brewery_id
WHERE state = 'CA'

• The JOIN operation stitches two keyspaces together
• The JOIN criteria are based on the ON KEYS clause
• The outer table uses the index scan, if possible
• The fetch of the inner table (brewery) is done document by document
• 4.6 improves this by fetching in batches
SELECT: JOIN
• Why not get all of the required document IDs from the index scan, then do a big bulk get on the outer table?
• Two ways to do it:
  a) Use the array aggregate (ARRAY_AGG()) to create the list
  b) Use RAW to create the array, and then use that to JOIN

Using RAW:

SELECT COUNT(1)
FROM (
  SELECT RAW META().id
  FROM `beer-sample` beer
  WHERE state = 'CA'
) AS blist
INNER JOIN `beer-sample` brewery ON KEYS blist;

Using ARRAY_AGG():

SELECT COUNT(1)
FROM (
  SELECT ARRAY_AGG(META().id) karray
  FROM `beer-sample` beer
  WHERE state = 'CA'
) AS b
INNER JOIN `beer-sample` brewery ON KEYS b.karray;
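The benefit of a bulk get over document-by-document fetches comes down to the number of round trips. The sketch below counts them for a toy in-memory store; the store, the key names, and the batch size of 512 are assumptions for illustration and say nothing about how the Query Service actually batches.

```python
def fetch_one_by_one(store, keys):
    """One round trip per key, like a document-by-document inner fetch."""
    docs, round_trips = [], 0
    for key in keys:
        docs.append(store[key])
        round_trips += 1
    return docs, round_trips


def fetch_in_batches(store, keys, batch_size):
    """One round trip per batch of keys, like a bulk get."""
    docs, round_trips = [], 0
    for i in range(0, len(keys), batch_size):
        batch = keys[i:i + batch_size]
        docs.extend(store[key] for key in batch)
        round_trips += 1
    return docs, round_trips


# Toy store with made-up brewery documents.
store = {"brewery_%d" % i: {"name": "Brewery %d" % i} for i in range(1000)}
keys = list(store)

single_docs, single_trips = fetch_one_by_one(store, keys)
batch_docs, batch_trips = fetch_in_batches(store, keys, batch_size=512)
```

Both functions return the same documents; only the number of requests differs, which is the cost the query rewrite is trying to reduce.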
DISTINCT

1. SELECT DISTINCT type FROM `travel-sample`;
2. SELECT MIN(type) FROM `travel-sample` WHERE type IS NOT MISSING;
3. SELECT MIN(type) FROM `travel-sample` WHERE type > "airline";

import requests

url = "http://localhost:8093/query"
s = requests.Session()
s.auth = ('Administrator', 'password')

# Start with the smallest value of type in the index.
query = {'statement': 'SELECT MIN(type) minval FROM `travel-sample` WHERE type IS NOT MISSING;'}
r = s.post(url, data=query, stream=False, headers={'Connection': 'close'})
result = r.json()['results'][0]
lastval = result['minval']

# Repeatedly ask for the next value greater than the last one seen.
# MIN() returns null when no values remain, which ends the loop.
while lastval is not None:
    print(lastval)
    stmt = 'SELECT MIN(type) minval FROM `travel-sample` WHERE type > "' + lastval + '";'
    query = {'statement': stmt}
    r = s.post(url, data=query, stream=False, headers={'Connection': 'close'})
    result = r.json()['results'][0]
    lastval = result['minval']
GROUP, COUNT()

SELECT type, count(type)
FROM `travel-sample`
GROUP BY type;

SELECT type, count(type)
FROM `travel-sample`
WHERE type IS NOT MISSING
GROUP BY type;

Step 1: Get the first entry in the index for the type.
Step 2: Then, COUNT() from the data set where type = first-value.
Step 3: Now use the index to find the next value for type.
Step 4: Repeat steps 2 and 3 for all the values of type.
GROUP, COUNT()

import requests

url = "http://localhost:8093/query"
s = requests.Session()
s.auth = ('Administrator', 'password')

# Step 1: get the first (smallest) value of type from the index.
query = {'statement': 'SELECT MIN(type) minval FROM `travel-sample` WHERE type IS NOT MISSING;'}
r = s.post(url, data=query, stream=False, headers={'Connection': 'close'})
result = r.json()['results'][0]
lastval = result['minval']

while lastval is not None:
    # Step 2: count the documents for the current value of type.
    stmt = 'SELECT COUNT(type) tcount FROM `travel-sample` WHERE type = "' + lastval + '";'
    query = {'statement': stmt}
    r = s.post(url, data=query, stream=False, headers={'Connection': 'close'})
    result = r.json()['results'][0]
    tcount = result['tcount']
    print(lastval, tcount)

    # Step 3: use the index to find the next value of type.
    stmt = 'SELECT MIN(type) minval FROM `travel-sample` WHERE type > "' + lastval + '";'
    query = {'statement': stmt}
    r = s.post(url, data=query, stream=False, headers={'Connection': 'close'})
    result = r.json()['results'][0]
    lastval = result['minval']
Deployment & Configuration
Deployment
• Couchbase Cluster Services
  • Data
  • Index
  • Query
  • FTS
  • Analytics
• Data Service
  • Enough RAM to cache reads
  • Enough disk to eventually persist writes
  • CPU primarily for View and XDCR
  • At least 3 nodes; replication at the bucket level
  • Minimum requirements: 4GB RAM, 8 cores CPU
• Index Service
  • Primarily RAM and disk I/O bound
  • ForestDB persistence engine
  • MOI: Memory Optimized Index
  • At least 2 nodes for HA; each index replicated individually
  • Minimum requirements: 8GB RAM, 8 cores CPU, fast disk
• Query Service
  • Primarily CPU bound
  • Very low disk requirements
  • At least 2 nodes for HA; queries automatically load balanced by CB SDKs
  • Minimum requirements: 8GB RAM, 16+ cores CPU
Deployment
• Multi-Dimensional Scalability (MDS)
  • Option 1: All services enabled on all the nodes
  • Option 2: Separated services; node sizing depends on the workload
Query Configuration

curl -u Administrator:password http://localhost:8093/admin/settings > z.json

{
  "completed-limit": 4000,
  "completed-threshold": 1000,
  "cpuprofile": "",
  "debug": false,
  "keep-alive-length": 16384,
  "loglevel": "INFO",
  "max-parallelism": 1,
  "memprofile": "",
  "pipeline-batch": 16,
  "pipeline-cap": 512,
  "request-size-cap": 67108864,
  "scan-cap": 0,
  "servicers": 32,
  "timeout": 0
}
Query Configuration
{
  "completed-limit": 4000,
  "completed-threshold": 1000,
  "cpuprofile": "",
  "debug": false,
  "keep-alive-length": 16384,
  "loglevel": "INFO",
  "max-parallelism": 1,
  "memprofile": "",
  "pipeline-batch": 1024,
  "pipeline-cap": 4096,
  "request-size-cap": 67108864,
  "scan-cap": 0,
  "servicers": 32,
  "timeout": 0
}

curl -u Administrator:password http://localhost:8093/admin/settings -XPOST -d@./z.json
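Editing z.json by hand is error-prone, so the change between the two settings documents can be scripted. This is a minimal sketch: `tune_pipeline` is a hypothetical helper, and the abbreviated settings string stands in for the contents of z.json.

```python
import json


def tune_pipeline(settings, batch=1024, cap=4096):
    """Return a copy of the settings with larger pipeline-batch and pipeline-cap."""
    updated = dict(settings)
    updated["pipeline-batch"] = batch
    updated["pipeline-cap"] = cap
    return updated


# Abbreviated stand-in for the z.json downloaded from /admin/settings.
defaults = json.loads('{"max-parallelism": 1, "pipeline-batch": 16, "pipeline-cap": 512}')

tuned = tune_pipeline(defaults)

# This string is what would be written back to z.json before the POST.
body = json.dumps(tuned, indent=2)
```

Leaving every other key untouched keeps the POST from accidentally resetting unrelated settings.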
Query Configuration

curl -X POST -u Administrator:<password> http://127.0.0.1:9000/diag/eval/ -d 'ns_config:set({node, node(), {query, extra_args}}, ["--pipeline-batch=1024", "--pipeline-cap=4096"])'

• Updating parameters via ns_server changes the values permanently
• The values survive a restart
• You can change any of the parameters over the command line
Query Configuration

-acctstore="gometrics:": Accounting store address (http://URL or stub:)
-certfile="": HTTPS certificate file
-completed-limit=4000: maximum number of completed requests
-completed-threshold=1000: cache completed query lasting longer than this many milliseconds
-configstore="stub:": Configuration store address (http://URL or stub:)
-cpuprofile="": write cpu profile to file
-datastore="": Datastore address (http://URL or dir:PATH or mock:)
-debug=false: Debug mode
-enterprise=true: Enterprise mode
-http=":8093": HTTP service address
-https=":18093": HTTPS service address
-keep-alive-length=16384: maximum size of buffered result
-keyfile="": HTTPS private key file
-logger="": Logger implementation
-loglevel="info": Log level: debug, trace, info, warn, error, severe, none
-max-parallelism=1: Maximum parallelism per query; use zero or negative value to disable
-memprofile="": write memory profile to this file
-metrics=true: Whether to provide metrics
-mutation-limit=0: Maximum LIMIT for data modification statements; use zero or negative value to disable
-namespace="default": Default namespace
-order-limit=0: Maximum LIMIT for ORDER BY clauses; use zero or negative value to disable
-pipeline-batch=16: Number of items execution operators can batch
-pipeline-cap=512: Maximum number of items each execution operator can buffer
-plus-servicers=256: Plus servicer count
-pretty=true: Pretty output
-readonly=false: Read-only mode
-request-cap=1024: Maximum number of queued requests per logical CPU
-request-size-cap=67108864: Maximum size of a request
-scan-cap=0: Maximum buffer size for primary index scans; use zero or negative value to disable
-servicers=64: Servicer count
-signature=true: Whether to provide signature
-ssl_minimum_protocol="tlsv1": TLS minimum version ('tlsv1'/'tlsv1.1'/'tlsv1.2')
-static-path="static": Path to static content
-timeout=0: Server execution timeout, e.g. 500ms or 2s; use zero or negative value to disable
Thank You!