Tech Talk @ Google on Flink Fault Tolerance and HA
TRANSCRIPT
![Page 1: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/1.jpg)
Apache Flink Streaming
Resiliency and Consistency
Paris Carbone - PhD Candidate, KTH Royal Institute of Technology
Technical Talk @ Google
![Page 2: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/2.jpg)
Overview
• The Flink Execution Engine Architecture
• Exactly-once-processing challenges
• Using Snapshots for Recovery
• The ABS Algorithm for DAGs and Cyclic Dataflows
• Recovery, Cost and Performance
• Job Manager High Availability
![Page 3: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/3.jpg)
Current Focus
Figure labels (Flink stack): Table, ML, Gelly; Streaming API, Batch API; Flink Optimiser; Flink Runtime; State Management
![Page 4: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/4.jpg)
Executing Pipelines
• Streaming pipelines translate into job graphs
Figure labels: User Pipeline, Flink Client, Flink Job Graph Builder/Optimiser, Job Graph, Flink Runtime
![Page 5: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/5.jpg)
Unbounded Data Processing Architectures
1) Streaming (Distributed Data Flow)
• Long-lived task execution
• State is kept inside tasks
2) Micro-Batch
Figure labels: (Hadoop, Spark), (Spark Streaming)
![Page 6: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/6.jpg)
The Flink Runtime Engine
• Tasks run operator logic in a pipelined fashion
• They are scheduled among workers
• State is kept within tasks
Figure labels: long-lived task execution; Job Manager (scheduling, monitoring)
![Page 7: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/7.jpg)
Task Failures
• Task failures are guaranteed to occur
• We need to make them transparent
• Can we simply recover a task from scratch?
Figure labels: task, state; properties: no loss, no duplicates, consistent state
![Page 8: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/8.jpg)
Record Acknowledgements
• We can monitor the consumption of every event
• It guarantees no loss
• Duplicates and inconsistent state are not handled
Figure labels: task, state, bookkeeping; properties: no loss, no duplicates, consistent state
![Page 9: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/9.jpg)
Atomic transactions per update
• It guarantees no loss and consistent state.
• The absence of duplicates can be achieved, e.g. with Bloom filters or caching
• Fine grained recovery
• Non-constant load in external storage can lead to unsustainable execution
Figure labels: task, state, Distributed Storage, Trident; properties: no loss, no duplicates, consistent state
![Page 10: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/10.jpg)
Lessons Learned from Batch
Figure labels: batch-1, batch-2
• If a batch computation fails, simply repeat computation as a transaction
• Transaction rate is constant
• Can we apply these principles to a true streaming execution?
![Page 20: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/20.jpg)
Distributed Snapshots
“A collection of operator states and records in transit (channels) that reflects a moment at a valid execution”
Figure labels: processes p1, p2, p3, p4; consistent cut; inconsistent cut; causality violation
Idea: We can resume from a snapshot that defines a consistent cut
Assumptions
• repeatable sources
• reliable FIFO channels
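The distinction between a consistent and an inconsistent cut can be illustrated with a small check. This is only a sketch under assumed names: `Message`, `is_consistent_cut`, and the event indices are hypothetical and not taken from the talk. A cut is treated as consistent when every message whose receive event lies inside the cut also has its send event inside the cut.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    send_index: int   # position of the send event in the sender's local history
    receiver: str
    recv_index: int   # position of the receive event in the receiver's local history


def is_consistent_cut(cut, messages):
    """cut[p] is the number of local events of process p included in the cut.

    The cut is consistent if no message is received inside the cut
    while its send event lies outside the cut.
    """
    for m in messages:
        received_in_cut = m.recv_index < cut[m.receiver]
        sent_in_cut = m.send_index < cut[m.sender]
        if received_in_cut and not sent_in_cut:
            return False  # causality violation: effect without its cause
    return True


# Hypothetical execution: p1 sends to p2, p3 sends to p4.
messages = [Message("p1", 1, "p2", 2), Message("p3", 0, "p4", 1)]

consistent = {"p1": 2, "p2": 3, "p3": 1, "p4": 2}
inconsistent = {"p1": 1, "p2": 3, "p3": 1, "p4": 2}  # p2 saw a message p1 has not sent yet

result_consistent = is_consistent_cut(consistent, messages)
result_inconsistent = is_consistent_cut(inconsistent, messages)
```

Resuming from the second cut would restore p2 with the effect of a message that p1, restored to an earlier point, would send again.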
![Page 29: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/29.jpg)
Distributed Snapshots
Figure labels: timeline t1, t2, t3; snap - t1, snap - t2; reset from snap t2
Assumptions
• repeatable sources
• reliable FIFO channels
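Why a repeatable source matters for resetting from a snapshot can be sketched with a toy single-task pipeline. Everything here is an assumption made for illustration: `ReplayableSource`, `SummingTask`, and `run` are hypothetical names, and a snapshot is simply the pair of source offset and task state.

```python
class ReplayableSource:
    """Hypothetical repeatable source: records can be re-read from any offset."""

    def __init__(self, records):
        self.records = list(records)
        self.offset = 0

    def next(self):
        record = self.records[self.offset]
        self.offset += 1
        return record


class SummingTask:
    """Toy stateful task whose state is a running sum."""

    def __init__(self):
        self.total = 0

    def process(self, record):
        self.total += record


def run(source, task, snapshot_every, fail_at=None):
    snapshot = (source.offset, task.total)  # initial snapshot of the empty state
    while source.offset < len(source.records):
        if fail_at is not None and source.offset == fail_at:
            fail_at = None                         # simulate a single failure
            source.offset, task.total = snapshot   # reset source and state together
            continue
        task.process(source.next())
        if source.offset % snapshot_every == 0:
            snapshot = (source.offset, task.total)  # snapshot: offset plus state
    return task.total


with_failure = run(ReplayableSource(range(1, 11)), SummingTask(), snapshot_every=4, fail_at=6)
without_failure = run(ReplayableSource(range(1, 11)), SummingTask(), snapshot_every=4)
```

Because the offset and the state are restored as one unit, records 5 and 6 are replayed exactly once into the restored state, so both runs end with the same sum.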
![Page 30: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/30.jpg)
Taking Snapshots
Figure labels: t1, t2; execution snapshots
Initial approach (see Naiad)
• Pause execution on t1, t2, ...
• Collect state
• Restore execution
![Page 31: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/31.jpg)
Lamport to the Rescue
“The global-state-detection algorithm is to be superimposed on the underlying computation:
it must run concurrently with, but not alter, this underlying computation”
![Page 32: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/32.jpg)
Chandy-Lamport Snapshots
Figure labels: p1, p2; markers
Using Markers/Barriers
• Triggers snapshots
• Separates pre-shot and post-shot events
• Leads to a consistent execution cut
Markers propagate the snapshot execution under continuous ingestion
![Page 33: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/33.jpg)
Asynchronous Snapshots
Figure labels: t1, t2; snap - t1, snap - t2; snapshotting; propagating markers
![Page 34: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/34.jpg)
Asynchronous Barrier Snapshotting
Figure sequence (animation across slides 34 to 50) labels: barriers, pending, aligning
![Page 51: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/51.jpg)
Asynchronous Barrier Snapshotting Benefits
• Taking advantage of the execution graph structure
• No records in transit included in the snapshot
• Aligning has lower impact than halting
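The alignment step can be sketched for a task with several input channels. This is a simplified illustration, not Flink's implementation: `AligningTask`, `BARRIER`, and the counting state are hypothetical. Once a barrier arrives on an input, later items on that input are held back; when barriers have arrived on all inputs, the task snapshots its state only, forwards the barrier, and releases the held-back items.

```python
from collections import deque

BARRIER = object()  # hypothetical marker object injected into the streams


class AligningTask:
    """Counts records; snapshots the count once a barrier arrived on every input."""

    def __init__(self, n_inputs):
        self.count = 0
        self.blocked = [False] * n_inputs                  # inputs whose barrier has arrived
        self.buffers = [deque() for _ in range(n_inputs)]  # items held back while aligning
        self.snapshots = []
        self.output = []

    def receive(self, channel, item):
        if self.blocked[channel]:
            self.buffers[channel].append(item)   # aligning: hold back post-barrier items
        elif item is BARRIER:
            self.blocked[channel] = True
            if all(self.blocked):
                self._snapshot_and_release()
        else:
            self._process(item)

    def _process(self, record):
        self.count += 1
        self.output.append(record)

    def _snapshot_and_release(self):
        self.snapshots.append(self.count)   # operator state only, no in-transit records
        self.output.append(BARRIER)         # forward the barrier downstream
        self.blocked = [False] * len(self.blocked)
        held, self.buffers = self.buffers, [deque() for _ in self.blocked]
        for channel, items in enumerate(held):
            for item in items:
                self.receive(channel, item)  # replay held-back items in channel order


task = AligningTask(n_inputs=2)
task.receive(0, "a1")
task.receive(0, BARRIER)   # input 0 is now aligning
task.receive(0, "a2")      # held back: belongs after the snapshot
task.receive(1, "b1")      # still processed: belongs before the snapshot
task.receive(1, BARRIER)   # all barriers arrived: snapshot, forward, release
```

Only the input that already delivered its barrier is paused, which is why aligning affects the pipeline less than halting the whole execution.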
![Page 52: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/52.jpg)
Snapshots on Cyclic Dataflows
Figure sequence (animation across slides 52 to 57) label: deadlock
![Page 58: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/58.jpg)
Snapshots on Cyclic Dataflows
• Checkpointing should eventually terminate
• Records in transit within loops should be included in the checkpoint
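One way to meet both requirements is sketched below, loosely following the idea of logging back-edge records. The class `LoopHeadTask` and its method names are hypothetical. The task snapshots as soon as the barrier arrives on its regular input, without waiting on the back-edge, then logs every record arriving over the back-edge until the barrier comes back around the loop.

```python
BARRIER = object()  # hypothetical marker object


class LoopHeadTask:
    """Toy loop head with one regular input and one back-edge input; state is a sum."""

    def __init__(self):
        self.total = 0
        self.logging = False
        self.state_copy = None
        self.backedge_log = []
        self.completed = []   # finished snapshots: (state, records in transit in the loop)

    def on_regular_input(self, item):
        if item is BARRIER:
            self.state_copy = self.total   # snapshot now; do not block on the back-edge
            self.backedge_log = []
            self.logging = True
            return BARRIER                 # barrier is forwarded into the loop
        self.total += item
        return item

    def on_back_edge(self, item):
        if item is BARRIER:
            # The barrier made it around the loop: the snapshot is complete.
            self.logging = False
            self.completed.append((self.state_copy, list(self.backedge_log)))
            return None                    # not forwarded again, so the checkpoint terminates
        if self.logging:
            self.backedge_log.append(item)  # record was in transit inside the loop
        self.total += item
        return item


head = LoopHeadTask()
head.on_regular_input(5)
head.on_regular_input(BARRIER)
head.on_back_edge(7)         # sent around the loop before the barrier
head.on_back_edge(BARRIER)
```

On recovery, the logged back-edge records would be replayed on top of the restored state, which is how records in transit within the loop become part of the checkpoint.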
![Page 59: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/59.jpg)
Snapshots on Cyclic Dataflows
Figure sequence (animation across slides 59 to 66) label: downstream backup
![Page 67: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/67.jpg)
Implementation in Flink
• Coordinator/Driver Actor per job in JM
  • sends periodic markers to sources
  • collects acknowledgements with state handles and registers complete snapshots
  • injects references to latest consistent state handles and back-edge records in pending execution graph tasks
• Tasks
  • snapshot state transparently in given backend upon receiving a barrier and send ack to coordinator
  • propagate barriers further like normal records
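The coordinator's bookkeeping can be sketched as follows. This is a minimal illustration with hypothetical names (`CheckpointCoordinator`, `trigger`, `acknowledge`), not Flink's actual classes: a snapshot is registered as complete only after every task has acknowledged it with a state handle.

```python
class CheckpointCoordinator:
    """Toy per-job coordinator that tracks acknowledgements per checkpoint."""

    def __init__(self, task_ids):
        self.task_ids = set(task_ids)
        self.pending = {}             # checkpoint id -> {task id: state handle}
        self.latest_complete = None   # (checkpoint id, {task id: state handle})
        self.next_id = 1

    def trigger(self):
        """Start a checkpoint; a real coordinator would now send markers to the sources."""
        checkpoint_id = self.next_id
        self.next_id += 1
        self.pending[checkpoint_id] = {}
        return checkpoint_id

    def acknowledge(self, checkpoint_id, task_id, state_handle):
        """Record one task's ack; return True when the snapshot becomes complete."""
        handles = self.pending.get(checkpoint_id)
        if handles is None or task_id not in self.task_ids:
            return False              # unknown checkpoint or unknown task
        handles[task_id] = state_handle
        if set(handles) != self.task_ids:
            return False              # still waiting for other tasks
        del self.pending[checkpoint_id]
        if self.latest_complete is None or checkpoint_id > self.latest_complete[0]:
            self.latest_complete = (checkpoint_id, handles)
        return True


coord = CheckpointCoordinator(["source", "map", "sink"])
cid = coord.trigger()
first = coord.acknowledge(cid, "source", "handle-source")
second = coord.acknowledge(cid, "map", "handle-map")
third = coord.acknowledge(cid, "sink", "handle-sink")
```

The handles stored in `latest_complete` are what would be injected into the tasks of a pending execution graph on recovery.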
![Page 68: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/68.jpg)
Recovery
• Full rescheduling of the execution graph from the latest checkpoint - simple, no further modifications
• Partial rescheduling of the execution graph for all upstream dependencies - trickier, duplicate elimination should be guaranteed
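For partial rescheduling, the set of tasks to restart can be computed by walking the execution graph upstream from the failed task. The function name and the example graph below are hypothetical, and the sketch assumes the graph is given as a list of (upstream, downstream) edges.

```python
def tasks_to_restart(edges, failed):
    """Return the failed task plus all of its transitive upstream dependencies.

    edges: iterable of (upstream_task, downstream_task) pairs.
    """
    upstream = {}
    for src, dst in edges:
        upstream.setdefault(dst, set()).add(src)

    to_restart, stack = set(), [failed]
    while stack:
        task = stack.pop()
        if task not in to_restart:
            to_restart.add(task)
            stack.extend(upstream.get(task, ()))
    return to_restart


# Hypothetical job graph with two sinks sharing one source.
edges = [
    ("src1", "map"), ("src2", "map"), ("map", "sink1"),
    ("src2", "filter"), ("filter", "sink2"),
]

partial = tasks_to_restart(edges, "sink2")
```

In this example `src2` is restarted but `map` and `sink1` are not, so they may see replayed records from `src2`. This is the reason duplicate elimination has to be guaranteed in the partial case.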
![Page 69: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/69.jpg)
Performance Impact
http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
![Page 74: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/74.jpg)
Node Failures
Figure labels: Client, optimiser, job graph; Job Manager; three Task Managers running tasks
JM State
• Pending Execution Graphs
• Snapshots (State Handles)
Single Point Of Failure
![Page 75: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/75.jpg)
Passive Standby JM
Figure labels: Client; three JMs
Zookeeper State
• Leader JM Address
• Pending Execution Graphs
• State Snapshots
![Page 80: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/80.jpg)
Eventual Leader Election
Figure labels: time axis with t0, t1, t2, t3; three JMs at each step; …recovering
![Page 86: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/86.jpg)
Scheduling Jobs
Figure labels: Client, optimiser; Job Manager; three Task Managers running tasks; Zookeeper (job-1, execution graph, task)
blocking write in Zookeeper
![Page 95: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/95.jpg)
Logging Snapshots
Figure labels: Job Manager; three Task Managers running tasks; State Backend (checkpointing states, state handle); Zookeeper (global snapshot)
asynchronous writes in Zookeeper
![Page 96: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/96.jpg)
HA in Zookeeper
• Task Managers query ZK for the current leader before connecting (with lease)
• Upon standby failover, restart pending execution graphs
• Inject state handles from the last global snapshot
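The failover steps can be sketched with an in-memory stand-in for the shared state. `CoordinationStore` is not a Zookeeper client, and all names are hypothetical. The standby that becomes leader restarts each pending execution graph and injects the state handles from the last global snapshot.

```python
class CoordinationStore:
    """In-memory stand-in for the state kept in Zookeeper (illustration only)."""

    def __init__(self):
        self.leader = None
        self.pending_graphs = {}     # job id -> list of task ids (toy execution graph)
        self.global_snapshots = {}   # job id -> {task id: state handle}


class JobManager:
    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.running = {}            # job id -> {task id: injected state handle}

    def try_become_leader(self):
        """Take leadership if it is vacant, then recover pending jobs."""
        if self.store.leader is not None:
            return False
        self.store.leader = self.name
        self._recover()
        return True

    def _recover(self):
        for job_id, graph in self.store.pending_graphs.items():
            handles = self.store.global_snapshots.get(job_id, {})
            # Restart every task with the handle from the last global snapshot, if any.
            self.running[job_id] = {task: handles.get(task) for task in graph}


store = CoordinationStore()
store.pending_graphs["job-1"] = ["source", "sink"]
store.global_snapshots["job-1"] = {"source": "handle-a", "sink": "handle-b"}

jm1, jm2 = JobManager("jm-1", store), JobManager("jm-2", store)
jm1_elected = jm1.try_become_leader()
jm2_elected_early = jm2.try_become_leader()   # stays passive while jm-1 leads

store.leader = None                           # simulate the leader failing
jm2_elected_after_failure = jm2.try_become_leader()
```

A real deployment would rely on Zookeeper's own leader election and leases rather than the unconditional check used here.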
![Page 97: Tech Talk @ Google on Flink Fault Tolerance and HA](https://reader034.vdocument.in/reader034/viewer/2022042723/587137651a28abf0568b6093/html5/thumbnails/97.jpg)
Summary
• The Flink execution engine monitors long-running tasks
• Exactly-once-processing guarantees can be established with snapshots, without halting the execution
• The ABS algorithm persists the minimal state at a low cost
• Master HA is achieved through Zookeeper and passive failover