elastic meetup june16


Upload: miguel-bosin

Posted on 13-Jan-2017


Page 1: Elastic meetup june16


Miguel Bosin Support Engineer, @miguelbosin

Hot/Warm Architecture + Sizing

Page 2: Elastic meetup june16

Intro

• Miguel Bosin
  – Support engineer
  – Joined in 2015
  – Interested in technology
  – Passionate about support

• Elastic
  – Founded in 2012
  – Distributed company
  – Elasticsearch: what is it?
  – Open source: Elasticsearch (ES), Logstash (LS), Kibana and Beats
  – Commercial: X-Pack


Page 4: Elastic meetup june16

What is it? An open-source, distributed and scalable, highly available, document-oriented (JSON), RESTful full-text (FT) search engine with real-time search and analytics capabilities.

Page 5: Elastic meetup june16

Agenda

1 Elastic overview
2 Elasticsearch basic architecture
3 Sizing introduction
4 Hot/Warm architecture

Page 6: Elastic meetup june16

Overview of Elastic's current products

Page 7: Elastic meetup june16

Agenda: 2 Elasticsearch basic architecture

Page 8: Elastic meetup june16

Elasticsearch terminology

A node is a single Elasticsearch instance (a single JVM). Multiple nodes can form a cluster.

A cluster (or a single node) can manage multiple indices. An index is a container for data.

A shard is a single piece of an Elasticsearch index. A shard is either a primary or a replica.
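To make the primary/replica relationship concrete, here is a small POSIX shell sketch; the helper names are hypothetical. It counts the shard copies an index produces and prints the JSON body for creating an index with the standard number_of_shards and number_of_replicas index settings.

```shell
#!/bin/sh
# Hypothetical helpers illustrating primaries vs. replicas.

# Total shard copies in the cluster: every primary plus its replicas.
total_shards() {
  local primaries=$1 replicas=$2
  echo $(( primaries * (1 + replicas) ))
}

# JSON body for creating an index with a given primary/replica layout.
index_settings_body() {
  local primaries=$1 replicas=$2
  printf '{"settings":{"number_of_shards":%d,"number_of_replicas":%d}}\n' \
    "$primaries" "$replicas"
}

# Example: 5 primaries with 1 replica each gives 10 shard copies in total.
echo "total shard copies: $(total_shards 5 1)"
index_settings_body 5 1
```

For example, curl -XPUT 'http://localhost:9200/my_index' -d "$(index_settings_body 5 1)" would create a hypothetical my_index with that layout; Elasticsearch 6.x and later also require -H 'Content-Type: application/json'.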

Page 9: Elastic meetup june16


Elasticsearch terminology II

Page 10: Elastic meetup june16


Elasticsearch terminology III

Page 11: Elastic meetup june16

Elasticsearch Architecture: Node roles

Master node:
– coordinates the cluster
– is the only node able to apply changes to the cluster state
– publishes the updated cluster state to all nodes

Data node:
– performs indexing
– can have shards allocated to it locally
– knows the cluster state

Page 12: Elastic meetup june16

Elasticsearch Architecture: Node roles II

Client node:
– does NOT perform indexing or hold shards locally
– does NOT perform cluster management operations
– knows the cluster state
– acts as a smart load balancer (e.g. for load-balancing Kibana searches)
– redirects operations to the nodes that hold the relevant data
– calculates the final aggregation results

Page 13: Elastic meetup june16

Elasticsearch Architecture: Node roles III

Node roles are set in elasticsearch.yml.
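As an illustration of the roles above, this hypothetical helper prints the elasticsearch.yml role settings for each node type, using the ES 2.x setting names node.master and node.data (a client node has both set to false). Later releases add further role settings (for example node.ingest in 5.x), so check the reference for your version.

```shell
#!/bin/sh
# Prints an elasticsearch.yml role fragment (ES 2.x setting names).
# The role names follow the slides; the function is a hypothetical helper.
role_config() {
  case "$1" in
    master) printf 'node.master: true\nnode.data: false\n' ;;   # dedicated master
    data)   printf 'node.master: false\nnode.data: true\n' ;;   # holds shards
    client) printf 'node.master: false\nnode.data: false\n' ;;  # routes requests only
    *) echo "unknown role: $1" >&2; return 1 ;;
  esac
}

for role in master data client; do
  echo "# $role node"
  role_config "$role"
done
```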

Page 14: Elastic meetup june16


Architecture: node roles

Page 15: Elastic meetup june16


Architecture: node roles

Page 16: Elastic meetup june16


Architecture special case: dedicated master nodes

Page 17: Elastic meetup june16

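Dedicated master nodes: why? / minimum_master_nodes

Indexing and searching data is CPU-, memory- and I/O-intensive work that can put pressure on a node’s resources. A dedicated master node does none of this work, so it stays responsive for cluster management.

Avoiding split brain: two nodes acting as master at the same time in the same cluster can cause DATA LOSS.

Set discovery.zen.minimum_master_nodes to a quorum of the master-eligible nodes:

(master_eligible_nodes / 2) + 1

using integer division (e.g. 3 master-eligible nodes give a quorum of 2).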
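The quorum formula is easy to check by hand, but a tiny helper (hypothetical, POSIX shell) shows how it behaves for small clusters and prints the matching elasticsearch.yml line.

```shell
#!/bin/sh
# Quorum of master-eligible nodes: (master_eligible_nodes / 2) + 1.
# Shell arithmetic truncates, which gives the integer division we want.
quorum() {
  local master_eligible=$1
  echo $(( master_eligible / 2 + 1 ))
}

# elasticsearch.yml line for a given number of master-eligible nodes.
min_master_nodes_line() {
  echo "discovery.zen.minimum_master_nodes: $(quorum "$1")"
}

for n in 1 2 3 4 5; do
  echo "$n master-eligible nodes -> $(min_master_nodes_line "$n")"
done
```

Note that 2 master-eligible nodes also need a quorum of 2, so losing either one blocks master election; that is why 3 master-eligible nodes is a common minimum. The setting applies to the 2.x/5.x releases of this talk's era; from 7.x it is ignored and Elasticsearch manages the voting configuration itself.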

Page 18: Elastic meetup june16

Agenda: 3 Sizing introduction

Page 19: Elastic meetup june16

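Sizing: general factors (server capacity)

• Disks (SSD vs. HDD)

• RAM
  – 1/2 of the total RAM for the ES heap (the rest is left to the OS file-system cache, which Lucene relies on)
  – ES heap size max: 30.5 GB (below the ~32 GB limit for compressed object pointers)

• # CPU cores
  – ES thread pools concept

1 shard -> gets 1 thread -> runs on 1 core (all threads live inside the single ES JVM process)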
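The rules of thumb on this slide can be turned into a rough calculator. This is a minimal sketch under the slide's own assumptions (half the RAM for the heap, a 30.5 GB cap, one shard searched by one thread on one core); the helper names and the idea of search "waves" are illustrative, not Elasticsearch terms.

```shell
#!/bin/sh
# Rough sizing helpers based on this slide's rules of thumb.
# Values are in MB so that integer shell arithmetic stays exact.

MAX_HEAP_MB=31232   # 30.5 GB * 1024

# Heap = half of the machine's RAM, capped at 30.5 GB.
heap_mb() {
  local total_ram_mb=$1
  local half=$(( total_ram_mb / 2 ))
  if [ "$half" -gt "$MAX_HEAP_MB" ]; then
    echo "$MAX_HEAP_MB"
  else
    echo "$half"
  fi
}

# If each core searches one shard at a time, a node needs
# ceil(shards / cores) rounds ("waves") to cover all its shards.
search_waves() {
  local shards=$1 cores=$2
  echo $(( (shards + cores - 1) / cores ))
}

echo "16 GB RAM -> heap $(heap_mb 16384) MB"
echo "64 GB RAM -> heap $(heap_mb 65536) MB"
echo "10 shards on 4 cores -> $(search_waves 10 4) waves"
```

On ES 2.x the heap would be set with ES_HEAP_SIZE (for example ES_HEAP_SIZE=8g ./bin/elasticsearch); from 5.x it is set with -Xms/-Xmx in config/jvm.options.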

Page 20: Elastic meetup june16

Sizing: Elasticsearch factors (logging case)

• Size of shards
• Number of shards on each node
• Retention period of the data
• Mapping configuration: which fields are searchable, whether _source is enabled or not, etc.
• Average size of the documents
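A first, deliberately crude storage estimate combines several of these factors. This sketch assumes that raw JSON size equals on-disk size, which mapping choices (searchable fields, _source) change in either direction, so the number is only a starting point for capacity testing; the workload figures are invented.

```shell
#!/bin/sh
# Back-of-the-envelope storage estimate for a logging cluster:
#   docs_per_day * avg_doc_bytes * retention_days * (1 + replicas)
# Ignores compression and mapping overhead (assumption: raw size = disk size).
estimate_storage_gb() {
  local docs_per_day=$1 avg_doc_bytes=$2 retention_days=$3 replicas=$4
  local bytes=$(( docs_per_day * avg_doc_bytes * retention_days * (1 + replicas) ))
  echo $(( bytes / 1024 / 1024 / 1024 ))   # whole GB, rounded down
}

# Hypothetical workload: 1M docs/day, ~1000 bytes each, 30 days, 1 replica.
echo "$(estimate_storage_gb 1000000 1000 30 1) GB"
```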

Page 21: Elastic meetup june16

Sizing: Capacity planning test I

FIRST: test on a single node, with a single index that has one shard and no replicas.

THEN: index as many documents as you can and run some typical queries.

At some point, queries will slow down past a threshold where they no longer meet your requirements.

That document count (and its size on disk) is the ideal capacity of a single shard.

NEXT: find the ideal number of primary shards by dividing your dataset size by the ideal shard size.

FINALLY: add replicas for HA (high availability) and to improve read throughput.
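The NEXT and FINALLY steps are simple arithmetic; a sketch with hypothetical helper names and invented numbers (rounding up, since a partial shard's worth of data still needs a shard):

```shell
#!/bin/sh
# Primary shards = ceil(dataset size / ideal shard size), where the ideal
# shard size comes from the single-shard test described on this slide.
primary_shards() {
  local dataset_gb=$1 ideal_shard_gb=$2
  echo $(( (dataset_gb + ideal_shard_gb - 1) / ideal_shard_gb ))
}

# Total shard copies once replicas are added for HA / read throughput.
shard_copies() {
  local primaries=$1 replicas=$2
  echo $(( primaries * (1 + replicas) ))
}

# Hypothetical numbers: 500 GB of data, one shard tops out at 40 GB.
p=$(primary_shards 500 40)
echo "primaries: $p, copies with 1 replica: $(shard_copies "$p" 1)"
```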

Page 22: Elastic meetup june16

Sizing: Capacity planning test II

Each experiment tries to accomplish a discrete goal and builds upon the previous ones:

1. Determine disk utilization for various configurations
2. Determine the breaking point of a shard
3. Determine the saturation point of a node
4. Test the desired configuration on a two-node cluster
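Finding the breaking point of a shard means feeding a test index with many documents between rounds of queries. One way to produce that load is to generate _bulk request bodies (newline-delimited JSON). The index name sizing_test, type logs and document fields below are hypothetical; the _type field matches the 2.x/5.x releases of this talk and must be dropped on 7.x and later.

```shell
#!/bin/sh
# Generates a _bulk body with N synthetic log documents:
# one action line followed by one source line per document.
bulk_body() {
  local count=$1 index=${2:-sizing_test} i=1
  while [ "$i" -le "$count" ]; do
    printf '{"index":{"_index":"%s","_type":"logs"}}\n' "$index"
    printf '{"message":"synthetic log line %d","level":"INFO"}\n' "$i"
    i=$(( i + 1 ))
  done
}

bulk_body 2
```

A possible run: bulk_body 5000 > bulk.json, then curl -XPOST 'http://localhost:9200/_bulk' --data-binary @bulk.json (plus -H 'Content-Type: application/x-ndjson' on 6.x and later), checking growth with curl 'http://localhost:9200/_cat/indices/sizing_test?v' between rounds of queries.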

Page 23: Elastic meetup june16

Agenda: 4 Hot/Warm architecture

Page 24: Elastic meetup june16

Hot / Warm architecture

When to use it?

• Using Elasticsearch for larger time-based data analytics use cases
• Using time-based indices
• Able to run an architecture with 3 different types of nodes
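Hot/warm relies on time-based indices, typically one index per day. The sketch below (the logs- prefix, the helper names and the cutoff rule are assumptions for illustration) builds a daily index name and decides whether an index is still hot or old enough for the warm nodes.

```shell
#!/bin/sh
# Turn a YYYY-MM-DD date into a daily index name, e.g. logs-2016.06.16.
daily_index() {
  echo "logs-$(echo "$1" | sed 's/-/./g')"
}

# Indices dated on or after the cutoff day stay "hot"; older ones can move
# to "warm". Stripping the separators from a date gives an integer
# (e.g. 20160616) that sorts in date order.
tier_for_index() {
  local index=$1
  local day cutoff
  day=$(echo "${index#logs-}" | sed 's/\.//g')
  cutoff=$(echo "$2" | sed 's/-//g')
  if [ "$day" -lt "$cutoff" ]; then
    echo warm
  else
    echo hot
  fi
}

echo "$(daily_index 2016-06-16) -> $(tier_for_index logs-2016.06.16 2016-06-14)"
echo "$(daily_index 2016-06-10) -> $(tier_for_index logs-2016.06.10 2016-06-14)"
```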

Page 25: Elastic meetup june16

Hot / Warm architecture: Types of nodes

Master, hot and warm nodes:

• Master nodes: 3 dedicated master nodes
• Hot data nodes: perform all indexing and also hold the most recent daily indices (the data queried most frequently). Powerful machines with SSD storage
• Warm data nodes: handle a large number of read-only indices that are not queried frequently. Very large attached spinning disks
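Putting the three node types side by side, a hypothetical per-node elasticsearch.yml fragment could look like the output of the sketch below. It uses ES 2.x names (on 5.x the custom attribute is written node.attr.box_type), the box_type tag used later in this deck, and the quorum for 3 dedicated masters from the minimum_master_nodes formula.

```shell
#!/bin/sh
# Prints an elasticsearch.yml fragment per node type in a hot/warm cluster.
node_config() {
  case "$1" in
    # With 3 dedicated masters the quorum is (3 / 2) + 1 = 2.
    master) printf 'node.master: true\nnode.data: false\ndiscovery.zen.minimum_master_nodes: 2\n' ;;
    hot)    printf 'node.master: false\nnode.data: true\nnode.box_type: hot\n' ;;
    warm)   printf 'node.master: false\nnode.data: true\nnode.box_type: warm\n' ;;
    *) echo "unknown node type: $1" >&2; return 1 ;;
  esac
}

for t in master hot warm; do
  echo "# $t node"
  node_config "$t"
done
```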

Page 26: Elastic meetup june16

Hot / Warm architecture: tagging

Which node is doing what?

ES needs to know which servers contain the hot nodes and which servers contain the warm nodes.

This can be achieved by assigning arbitrary tags to each node (e.g. hot/warm):

Tag the node with node.box_type: xxx in elasticsearch.yml

OR start a node using ./bin/elasticsearch --node.box_type xxx
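On its own, a tag does nothing; tags are normally consumed through shard allocation filtering with the index.routing.allocation.require.box_type setting. The sketch below is an assumption about how that wiring might look (the template name logs_hot, the logs-* pattern and the index name are hypothetical): it prints, rather than runs, one request that pins new indices to hot nodes and one that moves an older index to the warm nodes. The template body uses the 2.x/5.x "template" field; 6.x and later use index_patterns instead.

```shell
#!/bin/sh
ES_URL=${ES_URL:-http://localhost:9200}

# Index template so every new logs-* index is allocated on hot nodes only.
hot_template_body() {
  printf '{"template":"logs-*","settings":{"index.routing.allocation.require.box_type":"hot"}}\n'
}

# Settings update that moves an existing index to the warm nodes;
# Elasticsearch then relocates its shards in the background.
move_to_warm_body() {
  printf '{"index.routing.allocation.require.box_type":"warm"}\n'
}

# Print (not run) the curl commands so they can be reviewed first.
echo "curl -XPUT '$ES_URL/_template/logs_hot' -d '$(hot_template_body)'"
echo "curl -XPUT '$ES_URL/logs-2016.06.10/_settings' -d '$(move_to_warm_body)'"
```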

Page 27: Elastic meetup june16

Hot / Warm architecture: Force Merge API

Optimizing your indices on the warm nodes

The force merge API forces a merge of one or more indices. This optimizes the index for faster search operations.

The merge relates to the number of segments a Lucene index holds within each shard.

The force merge operation reduces the number of segments by merging them:

$ curl -XPOST 'http://localhost:9200/my_index/_forcemerge'
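For read-only indices on the warm nodes, the same call is often combined with the max_num_segments parameter of _forcemerge to merge each shard down to a single segment. This is a minimal sketch; the helper names, the DRY_RUN switch and the index name are hypothetical.

```shell
#!/bin/sh
ES_URL=${ES_URL:-http://localhost:9200}

# URL for merging an index down to N segments per shard (default 1).
forcemerge_url() {
  local index=$1 segments=${2:-1}
  echo "$ES_URL/$index/_forcemerge?max_num_segments=$segments"
}

# Runs the merge only when DRY_RUN=0; by default it just prints the command.
forcemerge() {
  local url
  url=$(forcemerge_url "$@")
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "curl -XPOST '$url'"
  else
    curl -XPOST "$url"
  fi
}

DRY_RUN=1 forcemerge logs-2016.06.10
```

Defaulting to a dry run keeps the sketch from touching a cluster by accident. Merging to one segment is usually reserved for indices that no longer receive writes, which is why it fits the warm nodes.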