elastic meetup june16

1

Miguel Bosin Support Engineer, @miguelbosin

Hot/Warm Architecture + Sizing

2

Intro

• Miguel Bosin
– Support engineer
– Joined in 2015
– Interested in technology
– Passionate about support

• Elastic
– Founded in 2012
– Distributed company
– Elasticsearch: what is it?
– Open source: Elasticsearch, Logstash, Kibana and Beats
– Commercial: X-Pack


What is it? An open-source, distributed and scalable, highly available, document-oriented (JSON), RESTful full-text search engine with real-time search and analytics capabilities


Agenda

1. Elastic overview
2. Elasticsearch basic architecture
3. Sizing introduction
4. Hot/Warm architecture


Overview of Elastic's current products


Elasticsearch terminology

A node is a single Elasticsearch instance running in a single JVM. Multiple nodes can form a cluster

A cluster (or a single node) can manage multiple indices. An index is a container for data

A shard is a single piece of an Elasticsearch index. A shard is either a primary or a replica


Elasticsearch terminology II


Elasticsearch terminology III


Elasticsearch Architecture: Node roles

Master node:

• coordinates the cluster
• is the only node able to apply changes to the cluster state
• publishes the updated cluster state to all nodes

Data node:

• performs indexing
• holds the shards allocated to it locally
• knows the cluster state


Elasticsearch Architecture: Node roles II

Client node:

• does NOT perform indexing or hold shards locally
• does NOT perform cluster management operations
• knows the cluster state
• acts as a smart load balancer (e.g. for load balancing Kibana searches)
• redirects operations to the nodes that hold the relevant data
• calculates aggregation results


Elasticsearch Architecture: Node roles III

Node roles are set in elasticsearch.yml
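A minimal sketch of what those role settings look like, assuming the Elasticsearch 2.x-era boolean settings node.master and node.data (later versions changed this syntax). The helper name role_config is hypothetical; it only prints the lines you would put in elasticsearch.yml for each role described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the elasticsearch.yml role settings for a node type.
# Assumes the ES 2.x-era boolean settings node.master and node.data.
role_config() {
  case "$1" in
    master) printf 'node.master: true\nnode.data: false\n' ;;   # dedicated master node
    data)   printf 'node.master: false\nnode.data: true\n' ;;   # data-only node
    client) printf 'node.master: false\nnode.data: false\n' ;;  # client node: neither master nor data
    *) echo "unknown role: $1" >&2; return 1 ;;
  esac
}

# Example: append the client-node settings to a local config file.
# role_config client >> config/elasticsearch.yml
```

A node with both settings left at their defaults (true) is master-eligible and holds data at the same time; the three variants above are the split used in the rest of this talk.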


Architecture: node roles


Architecture special case: dedicated master nodes


Dedicated master nodes: why, and minimum_master_nodes

Indexing and searching data is CPU-, memory-, and I/O-intensive work that can put pressure on a node's resources; dedicated master nodes keep cluster coordination away from that load

Avoiding split brain: if two nodes act as master of the same cluster at the same time, the result can be DATA LOSS

Set discovery.zen.minimum_master_nodes to the quorum of master-eligible nodes:

(master_eligible_nodes / 2) + 1, using integer division
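The quorum formula as a small shell function, a sketch for checking the value before writing it into elasticsearch.yml (for example discovery.zen.minimum_master_nodes: 2 with three master-eligible nodes). The function name quorum is hypothetical.

```shell
#!/usr/bin/env bash
# Quorum for discovery.zen.minimum_master_nodes:
# (master_eligible_nodes / 2) + 1, with bash integer (rounded-down) division.
quorum() {
  local master_eligible_nodes="$1"
  echo $(( master_eligible_nodes / 2 + 1 ))
}

for n in 1 2 3 4 5; do
  echo "$n master-eligible nodes -> minimum_master_nodes: $(quorum "$n")"
done
```

With 2 master-eligible nodes the quorum is 2, so losing either one blocks master election; with 3 dedicated master nodes the quorum is also 2 and the cluster tolerates losing one of them, which is one reason 3 is a common choice.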


Sizing: general factors (server capacity)

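• Disks (SSD vs. spinning HDD)

• RAM
- 1/2 of total RAM for the ES heap (the other half is left to the OS filesystem cache, which Lucene relies on)
- ES heap size max: 30.5 GB (to stay below the JVM compressed object pointers limit)

• # CPU cores
- ES thread pools concept

**1 shard (per search request) → 1 thread in the single ES JVM process → 1 CPU core**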
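The RAM and CPU rules of thumb above can be turned into two quick calculations. This is a sketch, not official tooling: heap_mb and search_threads are hypothetical names, 30.5 GB is taken as 31232 MB, and the search thread pool formula int((cores * 3) / 2) + 1 is the one documented for Elasticsearch of this era (treat it as an assumption and check the docs for your version).

```shell
#!/usr/bin/env bash
# Rule-of-thumb sizing helpers (a sketch under the assumptions stated above).

# Heap: half of the machine's RAM, capped at 30.5 GB (31232 MB).
heap_mb() {
  local ram_mb="$1" cap_mb=31232
  local half=$(( ram_mb / 2 ))
  if (( half > cap_mb )); then echo "$cap_mb"; else echo "$half"; fi
}

# Search thread pool size: int((cores * 3) / 2) + 1.
# Each shard-level search request occupies one of these threads.
search_threads() {
  local cores="$1"
  echo $(( cores * 3 / 2 + 1 ))
}

cores=$(nproc 2>/dev/null || echo 4)   # nproc is GNU coreutils; fall back to 4
echo "64 GB RAM -> heap $(heap_mb 65536) MB"
echo "$cores cores -> $(search_threads "$cores") search threads"
```

In Elasticsearch 2.x the heap value is typically passed through the ES_HEAP_SIZE environment variable (for example ES_HEAP_SIZE=8192m on a 16 GB machine).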


Sizing: Elasticsearch factors (logging case)

• Size of shards
• Number of shards on each node
• Retention period of data
• Mapping configuration: which fields are searchable, whether _source is enabled or not, etc.
• Average size of the documents
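For a logging case, several of these factors combine into a back-of-the-envelope disk estimate. A minimal sketch, assuming daily_gb is the on-disk size of one day's index as measured after indexing (mapping choices such as _source and which fields are searchable change it, so measure rather than guess); storage_gb is a hypothetical name and the example values are made up.

```shell
#!/usr/bin/env bash
# Hypothetical storage estimate for time-based logging data.
# daily_gb:       measured on-disk size of one day's primary data (GB)
# retention_days: how many days of indices are kept
# replicas:       number of replica copies per primary
storage_gb() {
  local daily_gb="$1" retention_days="$2" replicas="$3"
  # every primary is stored once, plus once more per replica
  echo $(( daily_gb * retention_days * (1 + replicas) ))
}

echo "Needed: $(storage_gb 50 30 1) GB"
```

This ignores free-space headroom for segment merges and disk watermarks, so the real cluster needs more disk than the number printed.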


Sizing: Capacity planning test I

FIRST: test on a single node with a single index that has one shard and no replicas

THEN: index as many documents as you can while running your typical queries

At some point, queries will slow down past a threshold where they no longer meet your requirements

That document count is the ideal number of documents a single shard can hold

NEXT: find the ideal number of primary shards by dividing your dataset size by the ideal shard size

FINALLY: add replicas for high availability and to improve read throughput
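The NEXT and FINALLY steps are simple arithmetic. A sketch with hypothetical function names: dataset and ideal shard size can be in any consistent unit (documents or GB), the division rounds up so no shard exceeds the measured limit, and 500 / 30 are made-up example values.

```shell
#!/usr/bin/env bash
# Primary shards = ceil(dataset size / ideal shard size from the single-shard test).
primary_shards() {
  local dataset="$1" ideal_shard="$2"
  # integer ceiling division
  echo $(( (dataset + ideal_shard - 1) / ideal_shard ))
}

# Total shard copies once replicas are added for HA and read throughput.
total_shards() {
  local primaries="$1" replicas="$2"
  echo $(( primaries * (1 + replicas) ))
}

p=$(primary_shards 500 30)
echo "primaries: $p, total with 1 replica: $(total_shards "$p" 1)"
```

Replicas do not change the primary count, which is fixed at index creation in this era of Elasticsearch; they only multiply the number of shard copies the cluster has to hold.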


Sizing: Capacity planning test II

Each experiment tries to accomplish a discrete goal and builds upon the previous one

1. Determine disk utilization under various configurations
2. Determine the breaking point of a shard
3. Determine the saturation point of a node
4. Test the desired configuration on a two-node cluster


Hot / Warm architecture

When to use it?

• Larger time-based data analytics use cases with Elasticsearch
• Data stored in time-based indices
• Infrastructure able to run an architecture with 3 different types of nodes
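Time-based indices are usually one index per day, which makes it easy to tell which indices have aged out of the hot tier. A minimal sketch, assuming daily names like logs-2016.06.16 (the logs prefix is hypothetical) and GNU date for the -d option; index_for_day and index_aged_out are hypothetical names.

```shell
#!/usr/bin/env bash
# Daily time-based index names, e.g. logs-2016.06.16 ("logs" prefix is hypothetical).
# Requires GNU date (coreutils) for -d; -u keeps everything in UTC.
index_for_day() {
  local day="$1" prefix="${2:-logs}"
  date -u -d "$day" +"${prefix}-%Y.%m.%d"
}

# Index created hot_days before the given day: a candidate to move to warm nodes.
index_aged_out() {
  local today="$1" hot_days="$2" prefix="${3:-logs}"
  date -u -d "$today $hot_days days ago" +"${prefix}-%Y.%m.%d"
}

index_for_day 2016-06-16
index_aged_out 2016-06-16 7
```

Run once a day, the second function names the index that has just left a 7-day hot window.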


Hot / Warm architecture: Type of nodes

Master, Hot and Warm nodes:

Master nodes: 3 dedicated master nodes

Hot data nodes: perform all indexing and also hold the most recent daily indices (the data queried most frequently). Powerful machines with SSD storage

Warm data nodes: hold a large number of read-only indices that are not queried frequently. Very large attached spinning disks


Hot / Warm architecture: tagging

Which node is doing what?

ES needs to know which servers contain the hot nodes and which servers contain the warm nodes

This can be achieved by assigning arbitrary tags to each server (Hot/Warm)

Tag the node with node.box_type: xxx in elasticsearch.yml

OR start a node using ./bin/elasticsearch --node.box_type xxx
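Once nodes carry a box_type tag, indices are pinned to them with the index-level shard allocation filter index.routing.allocation.require.box_type. A sketch, assuming an endpoint in ES_URL (defaulting to http://localhost:9200); allocation_body and move_index are hypothetical helper names. New daily indices can be kept on hot nodes the same way, e.g. by setting the filter to "hot" in an index template.

```shell
#!/usr/bin/env bash
# Sketch: pin an index to nodes tagged with node.box_type (hot or warm).
ES_URL="${ES_URL:-http://localhost:9200}"

# JSON body for the allocation filter that requires a given box_type.
allocation_body() {
  printf '{"index.routing.allocation.require.box_type": "%s"}' "$1"
}

# Move an existing index to the nodes tagged with the given box_type.
# Elasticsearch relocates the index's shards in the background.
move_index() {
  local index="$1" box_type="$2"
  curl -s -XPUT "$ES_URL/$index/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(allocation_body "$box_type")"
}

# Usage (needs a running cluster):
# move_index logs-2016.06.09 warm
```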


Hot / Warm architecture: Force Merge API

Optimizing your indices in the Warm Node

The force merge API forces a merge of one or more indices through a single API call, which optimizes them for faster search operations

The merge relates to the number of segments a Lucene index holds within each shard

The force merge operation reduces the number of segments by merging them:

$ curl -XPOST 'http://localhost:9200/my_index/_forcemerge'
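For read-only indices on the warm nodes, the force merge is usually asked to go all the way down to one segment per shard with the max_num_segments parameter. A sketch, assuming an endpoint in ES_URL (defaulting to http://localhost:9200); forcemerge_url and forcemerge are hypothetical helper names, and merging should only be run on indices that are no longer being written to.

```shell
#!/usr/bin/env bash
# Sketch: force merge a read-only (warm) index down to N segments per shard.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the force merge URL; defaults to one segment per shard.
forcemerge_url() {
  local index="$1" segments="${2:-1}"
  echo "$ES_URL/$index/_forcemerge?max_num_segments=$segments"
}

forcemerge() {
  curl -s -XPOST "$(forcemerge_url "$@")"
}

# Usage (needs a running cluster):
# forcemerge logs-2016.06.09
```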


Hot / Warm architecture: Demo time!!

DEMO

https://zoom.us/recording/play/BAOH3cNWg15IVAUfSth4JjhXWQVb7D-EYR09GrEN2ZfsIOdoJvzoClgzp0a9tatE
