Elasticsearch cluster deep dive

Post on 22-Jan-2018



NoSQL: Text Search and Document

Elasticsearch cluster

A distributed cluster (see the cluster documentation) with several node roles:

● Client Nodes
● Data Nodes
● Master Nodes
● Ingest Nodes

Today's view of the cluster: Master Nodes and Other Nodes.

What happens when a node starts?

(The slides illustrate this with five nodes, A through E, where C is the current master.)

Starting:

1. Get a list of nodes to ping from the config.
2. Each ping response contains:
   a. cluster name
   b. node details
   c. master node details
   d. cluster state version
3. Only keep master-eligible responses, based on discovery.zen.master_election.ignore_non_master_pings.

● List of master nodes: [C, C]
● List of eligible master nodes: [A, B, C]

Joining (cluster state update):

1. Join the master node (C) by sending internal:discovery/zen/join.
2. The master validates the join by sending internal:discovery/zen/join/validate.
3. The master updates the cluster state with the new node.
4. The master waits for discovery.zen.minimum_master_nodes master-eligible nodes to respond.
5. The change is committed and a confirmation is sent.
6. The new node checks the received state for:
   a. a new master node
   b. no master node in the state
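The ping-filtering step above can be sketched as follows. This is an illustrative model, not the actual Elasticsearch implementation; the names `PingResponse` and `pick_candidates` are hypothetical, and we assume the master's own response is not counted in the active-masters list.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PingResponse:
    cluster_name: str
    node_id: str
    master_eligible: bool
    master_node: Optional[str]      # who this node thinks the master is
    cluster_state_version: int

def pick_candidates(responses, ignore_non_master_pings=False):
    """Mimic discovery.zen.master_election.ignore_non_master_pings:
    optionally drop pings from nodes that are not master-eligible."""
    if ignore_non_master_pings:
        responses = [r for r in responses if r.master_eligible]
    # Nodes that reported an active master, and the master-eligible candidates.
    active_masters = [r.master_node for r in responses if r.master_node]
    eligible = sorted(r.node_id for r in responses if r.master_eligible)
    return active_masters, eligible

# Node E starts and pings A-D; C is the master (C's response names no master
# for itself in this sketch).
pings = [
    PingResponse("prod", "A", True,  "C",  18),
    PingResponse("prod", "B", True,  "C",  18),
    PingResponse("prod", "C", True,  None, 18),
    PingResponse("prod", "D", False, "C",  18),
]
masters, eligible = pick_candidates(pings, ignore_non_master_pings=True)
# masters -> ["C", "C"], eligible -> ["A", "B", "C"], matching the slide.
```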

Master fault detection

● Every discovery.zen.fd.ping_interval, each node pings the master (default 1s)
● Timeout is discovery.zen.fd.ping_timeout (default 30s)
● Retries: discovery.zen.fd.ping_retries (default 3)

Node fault detection

● The same mechanism in the other direction: the master pings every node, using the same discovery.zen.fd.ping_interval, ping_timeout, and ping_retries settings.
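The interval/timeout/retry behaviour can be sketched as a small loop. This is a toy model of one detection round, not Elasticsearch code; `detect_failure` is a hypothetical name, and each `ping()` call stands for one attempt bounded by ping_timeout.

```python
def detect_failure(ping, ping_retries=3):
    """Declare the target failed only after `ping_retries`
    consecutive failed pings (each attempt bounded by ping_timeout)."""
    failures = 0
    while failures < ping_retries:
        if ping():          # target responded: round ends, no failure
            return False
        failures += 1       # one more consecutive failed ping
    return True             # ping_retries consecutive failures

# A target that fails twice, then answers: not declared failed.
attempts = iter([False, False, True])
assert detect_failure(lambda: next(attempts)) is False

# A target that never answers: declared failed after 3 tries.
assert detect_failure(lambda: False) is True
```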

Master election

A minimum number of master-eligible candidates is required for an election.

Network partition (nodes A–F, master C):

● On the side without enough candidates, a master election cannot happen and the existing master steps down.
● On the other side, master fault detection triggers a new master election.

How the new master is chosen:

1. Based on the list of master-eligible nodes, a node chooses, in priority order:
   a. the node with the highest cluster state version (part of the ping response)
   b. a master-eligible node
   c. the first node after sorting the ids of the remaining nodes alphabetically
2. It sends a join to this new master. In the meantime, the candidate accumulates join requests.

If the current node elected itself as master, it waits for the minimum number of join requests (discovery.zen.minimum_master_nodes) before declaring itself master.

In case of master failure detection, each node removes the failed master from the candidates.
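The ordering above can be sketched as a sort key: highest cluster state version first, ties broken by the alphabetically first node id, with no winner unless the quorum is met. `choose_master` is a hypothetical name for illustration, not an Elasticsearch API.

```python
def choose_master(candidates, minimum_master_nodes):
    """candidates: dicts with id, version (cluster state), master_eligible."""
    eligible = [c for c in candidates if c["master_eligible"]]
    if len(eligible) < minimum_master_nodes:
        return None   # not enough candidates: no election possible
    # Highest cluster state version wins; ties broken by lowest node id.
    best = sorted(eligible, key=lambda c: (-c["version"], c["id"]))[0]
    return best["id"]

nodes = [
    {"id": "B", "version": 20, "master_eligible": True},
    {"id": "A", "version": 20, "master_eligible": True},
    {"id": "C", "version": 18, "master_eligible": True},   # stale state
    {"id": "D", "version": 21, "master_eligible": False},  # never a candidate
]
assert choose_master(nodes, minimum_master_nodes=2) == "A"      # v20, first id
assert choose_master(nodes[:1], minimum_master_nodes=2) is None  # no quorum
```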

Latest cluster version

Lost update: partially fixed in 5.0, found by a Jepsen test.

(Diagram sequence with nodes A–F: on one side of a partition the cluster state advances from v18 to v19 and then v20, while the isolated node stays behind on an older version. A node holding a stale cluster state version cannot become the master.)

Shard allocation

Shard assigned to a new node:

1. The master rebalances shard allocation to have:
   a. the same average number of shards per node
   b. the same average number of shards per index per node, avoiding two shards with the same id on the same node
2. It uses deciders to decide which shard goes where, based on:
   a. hot/warm setup (time-based indices)
   b. disk usage allocation (low watermark and high watermark)
   c. throttling (if a node is already recovering, the master might try again later)
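The decider chain above can be sketched as a list of functions returning YES/NO/THROTTLE, where NO vetoes the node and THROTTLE defers the allocation. The decider names and node representation are illustrative, not Elasticsearch's actual classes.

```python
YES, NO, THROTTLE = "YES", "NO", "THROTTLE"

def same_shard_decider(node, shard):
    # Never put two copies of the same shard on one node.
    return NO if shard["id"] in node["shard_ids"] else YES

def disk_watermark_decider(node, shard, high_watermark=0.90):
    # Refuse nodes above the high disk watermark.
    return NO if node["disk_used"] >= high_watermark else YES

def throttle_decider(node, shard, max_recoveries=2):
    # Defer if the node is already busy recovering shards.
    return THROTTLE if node["recovering"] >= max_recoveries else YES

def can_allocate(node, shard, deciders):
    decisions = [d(node, shard) for d in deciders]
    if NO in decisions:
        return NO
    if THROTTLE in decisions:
        return THROTTLE   # master will try again later
    return YES

deciders = [same_shard_decider, disk_watermark_decider, throttle_decider]
node = {"shard_ids": {"logs[0]"}, "disk_used": 0.50, "recovering": 0}
assert can_allocate(node, {"id": "logs[0]"}, deciders) == NO   # copy already here
assert can_allocate(node, {"id": "logs[1]"}, deciders) == YES
```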

Shard initialization (Primary)

1. The master communicates a new shard assignment through the cluster state.
2. The node initializes an empty shard.
3. The node notifies the master.
4. The master marks the shard as started.
5. If this is the first shard with a specific id, it is marked as primary and receives requests.

Shard initialization (Replica)

1. The master communicates a new shard assignment through the cluster state.
2. The node initializes recovery from the primary.
3. The node notifies the master.
4. The master marks the replica as started.
5. The node activates the replica.

Shard recovery

(Diagram: anatomy of a shard. On disk, a shard is a set of Lucene segments (S1, S2, S3) plus a commit point; in memory, it has an in-memory indexing buffer; operations are also recorded in the translog.)
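The anatomy in the diagram can be modelled as a toy class: documents land in the in-memory buffer and the translog; a refresh turns the buffer into a searchable segment; a flush commits segments (new commit point) and clears the translog. The method names follow Elasticsearch terminology, but this is only a sketch of the concepts, not real index machinery.

```python
class Shard:
    def __init__(self):
        self.buffer = []        # in-memory indexing buffer
        self.translog = []      # durable operation log
        self.segments = []      # committed "on-disk" Lucene segments

    def index(self, doc):
        self.buffer.append(doc)     # buffered in memory...
        self.translog.append(doc)   # ...and recorded in the translog

    def refresh(self):
        # Buffer becomes a searchable segment; the translog is kept.
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Write a new commit point and clear the translog.
        self.refresh()
        self.translog.clear()

s = Shard()
s.index({"id": 1})
s.index({"id": 2})
s.refresh()
assert len(s.segments) == 1 and len(s.translog) == 2  # refreshed, not flushed
s.flush()
assert s.translog == []  # commit point written, translog cleared
```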

Recovery from primary

Between the node with the primary and the node with the replica:

1. The replica node sends a "start recovery" request.
2. The primary node validates the request, prevents the translog from being deleted, and snapshots Lucene.
3. Segments are copied to the replica.
4. Translog operations are sent to the replica.
5. The replica node notifies the master.
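The steps above can be sketched as a single function: copy a stable snapshot of the segment files, then send the retained translog operations, then notify the master. `recover_replica` and the dict shapes are hypothetical, chosen only to make the flow concrete.

```python
def recover_replica(primary, replica, notify_master):
    # 1-2. Validate and snapshot: take a stable view of the segments
    # while translog deletion is prevented on the primary.
    segments_snapshot = list(primary["segments"])
    # 3. Copy segment files to the replica.
    replica["segments"] = segments_snapshot
    # 4. Send the retained translog operations to the replica.
    for op in primary["translog"]:
        replica["ops"].append(op)
    # 5. Tell the master the replica is ready to be started.
    notify_master("shard started")

events = []
primary = {"segments": ["s1", "s2"], "translog": ["op1"]}
replica = {"segments": [], "ops": []}
recover_replica(primary, replica, events.append)
assert replica["segments"] == ["s1", "s2"]
assert replica["ops"] == ["op1"]
assert events == ["shard started"]
```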

Thank you !
