eventually elasticsearch: eventual consistency in the real world
eventually elasticsearch
dealing with temporal inconsistencies in the real world ™
Anne Veling | @anneveling | March 25, 2015
agenda
• Introduction
• Bol.com Plaza / Square project
• Using ElasticSearch in a mixed DB landscape
– ES as a DB free-text index or as a separate DB
• Consistency issues and solutions
• Lessons learned
bol.com
• Leading ecommerce platform in The Netherlands and Belgium
– 5M active customers
– 1M visits every day
– 9M products
– €680M revenue
• Growing (pains)
– 750 employees, 37 scrum teams
– moving towards continuous deployment, team independence
• Plaza / Square Seller platform
– 7k sellers, 16% of total revenue
Square ElasticSearch
• Using ElasticSearch to combine Offer and Product information
– Offers from Oracle
– Products from MongoDB
• Replacing Oracle SQL queries
– Too slow for faceting and result sets (for sellers with over 2k offers)
• About 12M productoffer documents
• Scala, Team 1B
• ElasticSearch 1.4
– With Search, Master and Data nodes
• In production now, rolling out to sellers
option: right
• ElasticSearch as a free-text DB index on Offers
• Each DB update updates ES too
– In the same ‘transaction’
• Benefits
– easier
• Drawbacks
– Less service independence
– Slower (b/c refresh)
[Diagram: SDD, PCS, STEP, SSY, ES]
option: left
[Diagram: SDD, PCS, STEP, SSY, ES]
• ElasticSearch as a separate database
• Updates from DB sent to ES via async queues
• Benefits
– Architecture more loosely coupled
– Search performance
• Drawbacks
– some latency between DB and ES: eventual consistency
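The difference between the two options can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: `Store`, `update_right`, `update_left` and `drain` are hypothetical names, and neither store is a real Oracle or Elasticsearch client.

```python
from collections import deque


class Store:
    """Toy key-value store standing in for either the DB or ES (assumption)."""

    def __init__(self):
        self.docs = {}

    def put(self, doc_id, doc):
        self.docs[doc_id] = dict(doc)

    def get(self, doc_id):
        return self.docs.get(doc_id)


def update_right(db, es, doc_id, doc):
    """Option 'right': write to the DB and to ES in the same call."""
    db.put(doc_id, doc)
    es.put(doc_id, doc)


def update_left(db, queue, doc_id, doc):
    """Option 'left': write to the DB and enqueue the change for ES."""
    db.put(doc_id, doc)
    queue.append((doc_id, dict(doc)))


def drain(queue, es):
    """Hypothetical queue consumer: apply pending changes to ES in order."""
    while queue:
        doc_id, doc = queue.popleft()
        es.put(doc_id, doc)


db, es, queue = Store(), Store(), deque()

update_right(db, es, 42, {"stock": 1})
right_consistent = db.get(42) == es.get(42)

update_left(db, queue, 42, {"stock": 0})
left_before_drain = es.get(42)  # ES still holds the old value
drain(queue, es)
left_after_drain = es.get(42)   # ES has caught up with the DB
```

Between `update_left` and `drain`, a reader of ES sees the old stock value: that window is the eventual-consistency latency the drawback refers to.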
“immediate” consistency?
• Relational databases
– User view vs. DB view
– Take it or leave it
– Only vertical scaling
• ElasticSearch
– Read snapshots by refresh interval
– Caching
– Write once, read many
[Diagram: user 1, db, user 2]
START TRANSACTION; UPDATE OFFERS SET STOCK=1 WHERE ID=42; COMMIT TRANSACTION;
sources of temporal inconsistencies
• Internal inconsistencies
– within ElasticSearch
• External inconsistencies
– nature of ElasticSearch
– between Database and ElasticSearch
– between User expectations and Application behavior
[Sequence diagram: user, app, master, replica]
send data to index API → receives new data → updates index → quorum says ‘ok’ → got ‘ok’
curl -XPOST localhost:9200/demo/drinks -d '{"brand":"Glenlivet", "age":18}'
{"_index":"demo","_type":"drinks","_id":"AUxKuw5pxgWzNUrImnD4","_version":1,"created":true}
[Sequence diagram: user, app, master, search]
curl -XPOST localhost:9200/demo/drinks -d '{"brand":"Glenlivet", "age":18}'
{"_index":"demo","_type":"drinks","_id":"AUxKuw5pxgWzNUrImnD4","_version":1,"created":true}
curl -XPOST localhost:9200/demo/drinks/_search -d '{"query":{"match":{"brand":"Glenlivet"}}}'
{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
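Why the search returns zero hits right after a successful index call can be illustrated with a toy model. This is a minimal sketch, not the Elasticsearch API: `ToyIndex` and its methods are hypothetical, and it only models the idea that searches read a snapshot that is replaced on refresh, while get-by-id reads the latest document.

```python
class ToyIndex:
    """Toy model of near-real-time search; not the Elasticsearch API."""

    def __init__(self):
        self._docs = {}        # every indexed document (real-time get-by-id)
        self._searchable = {}  # snapshot that searches see

    def index(self, doc_id, doc, refresh=False):
        self._docs[doc_id] = dict(doc)
        if refresh:
            self.refresh()

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def refresh(self):
        # Publish a new snapshot containing everything indexed so far.
        self._searchable = {k: dict(v) for k, v in self._docs.items()}

    def search(self, field, value):
        return [d for d in self._searchable.values() if d.get(field) == value]


idx = ToyIndex()
idx.index("1", {"brand": "Glenlivet", "age": 18})
hits_before = idx.search("brand", "Glenlivet")  # empty: no refresh yet
by_id = idx.get("1")                            # found: get-by-id is real time
idx.refresh()
hits_after = idx.search("brand", "Glenlivet")   # one hit after the refresh
```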
[Timeline diagram: refresh, refresh, index.refresh_interval]
influencing search refresh
• Set index.refresh_interval
curl -XPUT localhost:9200/demo/_settings -d '{"index":{"refresh_interval":"30s"}}'
• Refresh on demand
curl -XPOST localhost:9200/demo/_refresh
• Refresh after index (be careful!)
curl -XPOST 'localhost:9200/demo/drinks?refresh=true' -d '{"brand":"Famous Grouse", "age":12}'
dealing with search delay
For a user updating a single item in the UI
• On the client
– Wait until refresh_interval has passed before searching again
– Do a get-by-id for changed item (=real time)
• And only change the single item (but: aggregations out of sync)
• On the server
– Wait until refresh_interval has passed
– Show a “done” message and hope user is slow
– Refresh all searchers upon index (all searches slower!)
– Add queue priority
– Update ES too
• Or: accept eventual consistency
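The client-side option of fetching the changed item by id and swapping it into an otherwise stale result list might look roughly like this. It is a sketch under assumptions: `patch_results` and the `id` field are hypothetical, and `fresh` stands for whatever a real get-by-id call would return.

```python
def patch_results(results, fresh_doc, id_field="id"):
    """Return a copy of results with the changed item replaced by fresh_doc."""
    return [
        fresh_doc if item[id_field] == fresh_doc[id_field] else item
        for item in results
    ]


# Search results fetched before the next refresh: item 42 is stale.
stale_results = [
    {"id": 41, "brand": "Famous Grouse", "stock": 3},
    {"id": 42, "brand": "Glenlivet", "stock": 5},
]

# Stand-in for the document returned by a real-time get-by-id.
fresh = {"id": 42, "brand": "Glenlivet", "stock": 1}

patched = patch_results(stale_results, fresh)
```

Only the single item is corrected this way; any aggregations computed by the original search still reflect the old snapshot.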
async queue issue
[Diagram: app, db, queue, ES]
Measure DB → ES latency
{"drinks": {"_timestamp": {"enabled": true, "store": "yes"}}}
localhost:9200/demo/_search?fields=_timestamp,_version,_source
measuring DB → ES latency
POST /productoffer-005/_search?fields=_timestamp,_source
{
  "size": 0,
  "query": {
    "range": {"modificationDate": {"from": "now-7d"}}
  },
  "aggs": {
    "hokje": {
      "date_histogram": {"field": "modificationDate", "interval": "10m"},
      "aggs": {
        "q": {"stats": {"script": "doc['_timestamp'].value - doc['modificationDate'].value"}}
      }
    }
  }
}
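What this aggregation computes can be mirrored in a few lines of plain Python: group documents into 10-minute buckets by modification time and report statistics on the index time minus the modification time. This is a sketch, assuming both values are epoch milliseconds; `latency_stats` and the sample documents are hypothetical.

```python
from collections import defaultdict

BUCKET_MS = 10 * 60 * 1000  # 10-minute buckets


def latency_stats(docs):
    """Per-bucket stats of _timestamp - modificationDate (both in epoch ms)."""
    buckets = defaultdict(list)
    for doc in docs:
        # Round the modification time down to the start of its bucket.
        key = doc["modificationDate"] // BUCKET_MS * BUCKET_MS
        buckets[key].append(doc["_timestamp"] - doc["modificationDate"])
    return {
        key: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
        }
        for key, values in sorted(buckets.items())
    }


# Made-up sample data for illustration only.
docs = [
    {"modificationDate": 0, "_timestamp": 1500},
    {"modificationDate": 60000, "_timestamp": 62500},
    {"modificationDate": 600000, "_timestamp": 604000},
]
stats = latency_stats(docs)
```

A rising `max` or `avg` in recent buckets would suggest the queue between the DB and ES is falling behind.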
queue order issue
[Diagram: app, db, queue, ES]
• Only update if newer (w/ optimistic locking)
– read (with _version) → update index (with expected _version) → retry
• version_type=external, use DB last-modified timestamp
curl -XPUT 'localhost:9200/demo/drinks/1?version=1427279177904&version_type=external' -d '{"brand": "Glenlivet", "age": 12}'
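How external versioning protects against out-of-order queue messages can be sketched with a toy index that only accepts a write when its version is strictly greater than the stored one. This is an illustration, not the Elasticsearch API: `ExternallyVersionedIndex`, `VersionConflict` and `apply_message` are hypothetical names, and the second timestamp is made up.

```python
class VersionConflict(Exception):
    """Raised when a write carries a version that is not newer."""


class ExternallyVersionedIndex:
    """Toy model of version_type=external; not the Elasticsearch API."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, doc)

    def index(self, doc_id, doc, version):
        current = self._docs.get(doc_id)
        # Accept the write only if the supplied version is strictly newer.
        if current is not None and version <= current[0]:
            raise VersionConflict(f"{version} is not newer than {current[0]}")
        self._docs[doc_id] = (version, dict(doc))

    def get(self, doc_id):
        return self._docs[doc_id][1]


def apply_message(index, doc_id, doc, last_modified):
    """Apply a queue message, using the DB last-modified time as version."""
    try:
        index.index(doc_id, doc, version=last_modified)
        return True
    except VersionConflict:
        return False  # stale message: drop it, the index already has newer data


idx = ExternallyVersionedIndex()
# The newer update happens to arrive first, the older one arrives late.
newer = apply_message(idx, "1", {"brand": "Glenlivet", "age": 18}, 1427279177904)
older = apply_message(idx, "1", {"brand": "Glenlivet", "age": 12}, 1427279170000)
final_doc = idx.get("1")
```

The late, older message is rejected, so the index keeps the state that matches the most recent DB modification.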
conclusions
• Compromises hurt someone
• Are you sure you want an eventually consistent database?
– Lots of patch work needed by bol.com…
– Choose left, make it look like you chose right
• In real life, consistency concerns
– more than just ES writes
– also ES reads
– how you get data in and keep it fresh matters too
[Two diagrams, each showing DB and ES]
right: as a free-text index
left: as a separate DB
ES Consistency
knobs to control “consistency level”
[Chart: axes eventual ↔ immediate and faster ↔ slower, with four numbered positions]
1. Optimistic locking & refresh=true
2. -
3. -
4. Eventually consistent
[Diagram: DB, ES, searcher (R), indexer (CUD); knobs: refresh_interval, ?consistency, _version, action.write_consistency, ?refresh]
lessons learned
• Make assumptions even more clear
• There is more to eventual consistency than you think
– User-oriented round-trip consistency latency in a mixed DB context
• Use the ES knobs and dials to make it
– as consistent as you need
– while keeping it as fast as you can
• You have to know what you’re doing