Elasticsearch in Production (London version)

Post on 27-Aug-2014

DESCRIPTION

Elasticsearch in production, or an overview of things you want to know about before happening upon them in production.

TRANSCRIPT

Elasticsearch in Production

Alex Brasetvik alex@found.no @alexbrasetvik

Who?

• Co-founder of Found AS

• 8+ years search, 3+ Elasticsearch

• Herding hundreds of Elasticsearch clusters

Agenda

• Anti-patterns

• Memory / Resource Usage

• Distributed problems

• Security

• Client concerns

• Changing a cluster

found.no/foundation

• Elasticsearch in Production

• Elasticsearch as a NoSQL Database

• Intro to Function Scoring

• All About Analyzers

• Securing your Elasticsearch Cluster

• Snapshot / Restore

• Circuit breakers

• Document values

• Aggregations

• Distributed percolation

• Suggesters

Anti-Patterns

Arbitrary Keys

• “Schema Free”

• One field per value

• Ever-growing cluster state

acls:
  1234: READ
  42: WRITE
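
One way to avoid a new mapped field per user ID is to turn the keys into values. The following is a minimal sketch in Python; the field names `user_id` and `permission` are assumptions made for illustration and do not come from the talk.

```python
def acls_to_entries(acls):
    """Turn {1234: "READ", 42: "WRITE"} into a list of fixed-shape objects.

    The keys become values of a single `user_id` field, so the mapping (and
    with it the cluster state) stays the same size however many users exist.
    `user_id` and `permission` are illustrative names, not from the talk.
    """
    return [
        {"user_id": str(user_id), "permission": permission}
        for user_id, permission in sorted(acls.items(), key=lambda kv: str(kv[0]))
    ]


# Document body with a bounded set of field names.
doc = {"acls": acls_to_entries({1234: "READ", 42: "WRITE"})}
```

If a query has to match a user ID and a permission on the same entry, the `acls` field would likely need the `nested` type, because plain arrays of objects are flattened at index time.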

Heavy Updating

• Update = Delete + Reindex

• Be careful with counters
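
Because every update is a delete plus a reindex, one option for counters is to coalesce increments in the application and write each document once per flush. A rough sketch, assuming a hypothetical `flush_fn` callback that applies a `{doc_id: delta}` dict to Elasticsearch (for example as one bulk request):

```python
from collections import Counter


class CounterBuffer:
    """Collect increments in memory and flush them as one delta per document.

    Hypothetical helper: `flush_fn` is whatever the application uses to
    apply `{doc_id: delta}` to Elasticsearch. It is not an Elasticsearch API.
    """

    def __init__(self, flush_fn, max_pending=1000):
        self._flush_fn = flush_fn
        self._max_pending = max_pending  # flush once this many documents are dirty
        self._pending = Counter()

    def increment(self, doc_id, delta=1):
        self._pending[doc_id] += delta
        if len(self._pending) >= self._max_pending:
            self.flush()

    def flush(self):
        if not self._pending:
            return
        batch, self._pending = dict(self._pending), Counter()
        self._flush_fn(batch)


flushed = []
buf = CounterBuffer(flushed.append)
for _ in range(500):
    buf.increment("page-1")
buf.increment("page-2", 3)
buf.flush()  # 501 increments become two document updates
```

The trade-off is that increments still sitting in the buffer are lost if the process dies before a flush, so this only fits counters that tolerate some loss.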

Slow queries

• WHERE foo ILIKE '%bar%'

• {"query_string": {"query": "foo:*bar*"}}

• Don’t ask for 3300 results :)

Arbitrary searches

query:
  filtered:
    filter:
      term:
        user_id: 42
    query: [user's query here]
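
The wrapping above can be sketched as a small server-side helper. This is an illustration only: the `MAX_SIZE` cap and the `title` field in the usage line are assumptions, and the `filtered` query is the 1.x query DSL shown on the slide (later releases express the same idea with a `bool` query and a `filter` clause).

```python
MAX_SIZE = 100  # hypothetical cap on results per request


def scoped_search(user_id, user_query, size=10):
    """Wrap a user-supplied query in a `filtered` query restricted to one user.

    The user's query can only narrow the result set; the term filter on
    `user_id` is always added by server-side code, never by the client.
    """
    return {
        "size": max(0, min(size, MAX_SIZE)),
        "query": {
            "filtered": {
                "filter": {"term": {"user_id": user_id}},
                "query": user_query,
            }
        },
    }


# A request for 3300 results is clamped to MAX_SIZE.
body = scoped_search(42, {"match": {"title": "elasticsearch"}}, size=3300)
```

Note that this limits which documents are visible but does not by itself make the inner query cheap or safe; an expensive user query is still expensive.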

Memory

• Field caches

• Filter caches

• Page caches

• Aggregations

• Index building

Page Cache

• Keeping index pages in memory

• Can’t have too much

• Outgrow: Gradual slowdown

Heap Space

• Memory used by Elasticsearch process

• Field / Filter caches

• Aggregations

Time Bomb

OutOfMemoryError

Woah there I ate all the memories

Your cluster may or may not work any more

OutOfMemory

• Growing too big

• Selecting too big timespan in Kibana

• Document ingestion peak

Preventing OOMs

• Have enough memory :-)

• Understand your search’s memory profile

• Bulk / Circuit breaker settings

• Monitoring

• Document values
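
On the client side, one way to keep ingestion peaks from turning into huge bulk requests is to bound each batch by serialized size as well as by document count. A minimal sketch; both limits are illustrative assumptions, not values recommended in the talk:

```python
import json


def chunk_bulk(docs, max_bytes=5 * 1024 * 1024, max_docs=1000):
    """Yield lists of documents whose serialized size stays under max_bytes.

    Both limits are illustrative defaults. A single document larger than
    max_bytes is still yielded, in a batch of its own.
    """
    batch, batch_bytes = [], 0
    for doc in docs:
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        if batch and (batch_bytes + doc_bytes > max_bytes or len(batch) >= max_docs):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += doc_bytes
    if batch:
        yield batch


docs = [{"id": i, "body": "x" * 100} for i in range(10)]
batches = list(chunk_bulk(docs, max_bytes=400))
```

Each batch would then be sent as one bulk request, so the memory a single request can claim on the cluster is bounded by `max_bytes` rather than by how much data happened to arrive.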

Marvel (/_stats)

"my_field": { "type": "string", "fielddata": { "format": "doc_values" } }

Document Values

• Rely on page cache

• Only caches doc values actually used
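
To apply the mapping shown earlier to several fields at once, a helper along these lines could generate it. This is a sketch using the mapping syntax from the slide; the `index: not_analyzed` setting is an assumption added here, on the understanding that doc values for string fields apply to values that are not analyzed, and the field names are hypothetical.

```python
def doc_values_mapping(field_names):
    """Build a `properties` mapping that keeps field data in doc values.

    Uses the `fielddata.format` syntax from the slide. `not_analyzed` is an
    assumption: these fields are sorted or aggregated on as whole values.
    """
    return {
        "properties": {
            name: {
                "type": "string",
                "index": "not_analyzed",
                "fielddata": {"format": "doc_values"},
            }
            for name in field_names
        }
    }


# `country` and `status` are hypothetical field names.
mapping = doc_values_mapping(["country", "status"])
```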

Sizing

• Test, don’t guess

• Start big, scale down

• Index, search, monitor

Glitch Meltdown

• Tie-breaker can be a cheap master-node

• Applies to data centers / availability zones too
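
The case for a tie-breaker follows from majority arithmetic, sketched below. In Elasticsearch 1.x the majority is what one would normally configure as `discovery.zen.minimum_master_nodes`; the helper names are made up for this example.

```python
def minimum_master_nodes(master_eligible):
    """Smallest majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible):
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible - minimum_master_nodes(master_eligible)


for n in (1, 2, 3, 5):
    print(n, minimum_master_nodes(n), tolerated_failures(n))
```

With two master-eligible nodes the majority is two, so losing either one stops master election. Adding a third, cheap master-only node raises the tolerated failures from zero to one. The same arithmetic applies when counting data centers or availability zones instead of nodes.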

Data-only nodes

Master-only nodes

Jepsen

• Kyle Kingsbury’s series on distributed systems

• Distributed systems are hard

• aphyr.com

Security

• “Not my job!” – Elasticsearch

• That’s fine!

Dynamic Scripts

• Scoring

• Aggregations

• Updating

Dynamic Scripts

Runtime.getRuntime().exec(…)

<script src="http://127.0.0.1:9200/_search?callback=capture&…

Security

• Disable dynamic scripts (On by default in ≤1.1)

• Mind index patterns

• Even then, don’t accept arbitrary requests
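
For the index-pattern point, a proxy or application layer might validate index names before building a request. The sketch below is one possible check; the naming scheme and the `allowed_prefix` parameter are assumptions for illustration.

```python
import re

# Hypothetical whitelist: lowercase names such as "logs-2014.08.27".
_INDEX_NAME = re.compile(r"[a-z0-9][a-z0-9._-]*")


def safe_index_name(name, allowed_prefix):
    """Return `name` if it is a single concrete index under `allowed_prefix`.

    Rejects wildcards, comma-separated lists and names such as "_all", which
    could otherwise expand a request to indices the caller should not see.
    """
    if not _INDEX_NAME.fullmatch(name) or not name.startswith(allowed_prefix):
        raise ValueError("index name not allowed: %r" % (name,))
    return name
```

A call such as `safe_index_name("logs-2014.08.27", "logs-")` returns the name unchanged, while `"logs-*"`, `"_all"` or `"logs-a,secrets"` raise `ValueError`. This covers only the index part of the URL; request bodies still need their own checks.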

Client Concerns

• Connection pools

• Idempotent requests

• Have sane syncing/indexing strategies
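
One common way to make indexing requests idempotent is to derive the document ID from the record's identity in the source system, so that a retry after a timeout overwrites the same document instead of creating a duplicate under an auto-generated ID. A minimal sketch; the `source` and `natural_key` parameters are hypothetical:

```python
import hashlib


def document_id(source, natural_key):
    """Derive a stable document ID from the record's identity in the source system.

    Indexing the same record twice with this ID replaces one document rather
    than adding a second copy.
    """
    raw = "%s:%s" % (source, natural_key)
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


first_try = document_id("orders", 17)
retry = document_id("orders", 17)  # same ID, so the retry is safe to repeat
```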

# BOOM !

Cluster changes

• Make new nodes join existing cluster

• No rolling restarts

• Easy rollback if things go bad

v1.0.0 v1.0.1

Cluster changes

• Test first

• Mind the recover_* settings (gateway.recover_after_nodes and related)

Multi-Cluster Workflows

• Snapshot/Restore

• Operations across clusters

• Swap clusters!

• Works well with good syncing strategy

• Rolling restarts: Risky, fast

• Grow and shrink: Less risky, copies lots of data

• Multiple clusters: Least risky, copies lots of data

Misc

• Same JVM

• ulimits

• Unicast

• Kernel-settings like IO-scheduler

?

@foundsays
