data pipeline monitoring - michiel kalkman · data pipeline reporting monitoring diagnostics alerting...

Data Pipeline Monitoring Michiel Kalkman


Post on 27-Aug-2020



Page 1: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporting Monitoring Diagnostics Alerting Accounting Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs

Data Pipeline Monitoring

Michiel Kalkman


Mental model of a pipeline

Figure 1: Actually a duct (Source: Wikimedia)


Map of a real pipeline

Figure 2: Typical pipeline (Source: Wikimedia)


Notes

Pipelines,
▶ are systems
▶ cross multiple political zones
▶ cross multiple technical zones
▶ have multiple inputs (providers, sources)
▶ have multiple outputs (consumers, sinks)
▶ carry payloads in multiple stages (refinements)


Break it down

By administrative zones

Defines supportability, frames arguments over responsibility


Observability


Pillars of Observability

Columns: Logs, Metrics, Tracing (the column position of each X was lost in extraction)

Accounting     X X
Reporting      X X
Alerting       X X
Testing        X X X
Diagnostics    X X X
Verification   X X
Auditing       X


What gets measured gets managed

Products

Data

Pipeline

Reporting Monitoring Diagnostics Auditing Alerting Accounting

Metrics Logs Tracing

Pipeline component

Figure 3: Component observability


What to monitor

▶ Data flowing across platform boundaries
▶ Cycles in the pipeline
▶ Data flow pressure points
▶ Baseline operation separate from service operation
▶ Infrastructure separate from service operation
▶ Quality control gateways for change


Feed

Products

Observability Data

Asset

Reporting Monitoring Diagnostics Alerting

Metrics

Pipeline component

Figure 4: Observability - Metrics


Metrics focus

A wide variety of metrics is out there, and it's easy to get lost. Define high-level metrics that can be compared consistently across the entire landscape. Focus on two distinct areas, different sides of the same coin:

Utilization, Saturation, Errors (USE)

These are resource focused and provide technical information
▶ “Which servers are overloaded?”

Rate, Errors, Duration (RED)

These are service focused and provide business information
▶ “Am I meeting my SLA targets?”
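As a rough illustration of the RED side, the sketch below derives rate, error ratio and mean duration from a handful of request records. The `Request` record layout, the `red_metrics` helper and the two-second window are assumptions made for this example; they do not come from the slides.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Request:
    duration_s: float  # how long the request took, in seconds
    ok: bool           # False when the request ended in an error


def red_metrics(requests, window_s):
    """Return (rate, error_ratio, mean_duration) for one observation window."""
    if not requests:
        return 0.0, 0.0, 0.0
    rate = len(requests) / window_s                                    # Rate
    error_ratio = sum(1 for r in requests if not r.ok) / len(requests)  # Errors
    mean_duration = mean(r.duration_s for r in requests)               # Duration
    return rate, error_ratio, mean_duration


# Hypothetical sample: four requests observed in a 2 second window.
sample = [
    Request(0.25, True),
    Request(0.5, True),
    Request(0.75, False),
    Request(0.5, True),
]
rate, error_ratio, mean_duration = red_metrics(sample, window_s=2.0)
print(rate, error_ratio, mean_duration)
```

A real service would more likely report duration as percentiles than as a mean; the mean is used here only to keep the sketch short.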


Four Golden Signals (Google SRE)

1. Latency
2. Traffic
3. Errors
4. Saturation


Throughput - components

Diagram labels: Downstream, Input, Forwarder, Output, Upstream

Count bytes, count events (shown at two points in the diagram)

Figure 5: Component metrics


Throughput

Counter          t1     t2
input bytes      100    200
output bytes     150    270
input events     20     30
output events    30     55

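▶ Throughput rate is the counter value at t2 minus the counter value at t1: $\mathit{Rate} = \mathit{Counter}(t_2) - \mathit{Counter}(t_1)$
▶ Average event size
  ▶ $\mathit{In} = \frac{\mathit{Rate}(\mathit{BytesIn})}{\mathit{Rate}(\mathit{EventsIn})}$
  ▶ $\mathit{Out} = \frac{\mathit{Rate}(\mathit{BytesOut})}{\mathit{Rate}(\mathit{EventsOut})}$
▶ Internal buffer pressure
  ▶ $\mathit{Rate}(\mathit{EventsIn}) - \mathit{Rate}(\mathit{EventsOut})$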
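The quantities on this slide can be worked through with the counter values from the table. The sketch below is only an illustration; the dictionary layout and the names `counters` and `rate` are assumptions, not part of the deck.

```python
# Counter samples from the table: (value at t1, value at t2).
counters = {
    "input_bytes": (100, 200),
    "output_bytes": (150, 270),
    "input_events": (20, 30),
    "output_events": (30, 55),
}


def rate(name):
    """Counter increase over one sampling interval (t1 to t2)."""
    t1, t2 = counters[name]
    return t2 - t1


# Average event size: bytes moved divided by events moved in the same interval.
avg_event_size_in = rate("input_bytes") / rate("input_events")
avg_event_size_out = rate("output_bytes") / rate("output_events")

# Internal buffer pressure: events arriving minus events leaving.
buffer_pressure = rate("input_events") - rate("output_events")

print(avg_event_size_in, avg_event_size_out, buffer_pressure)
```

With the sample numbers the pressure comes out negative, meaning more events left than arrived during this interval.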


Tracing


Feed

Products

Data

Pipeline

Reporting Monitoring Diagnostics Alerting Accounting

Tracing

Pipeline component

Figure 6: Observability - Tracing


Three Ts

                  Inputs    Outputs    Transformation
Transaction       1         1          New data
Transportation    1         1+         Enrichment
Transformation    1+        1+         New data, enrichment


Transportation tracing

Downstream

Forwarder

Upstream

Figure 7: Transportation


Transaction tracing

User

Component A Component B

Figure 8: Distributed transaction


Transformation tracing

Source A Source B

Transformer

Upstream Target

Figure 9: Transformation


Monitoring


Plan for failure

Figure 10: Hopefully not this bad (Source: Wikimedia)


Key monitoring points

▶ Integrity
  ▶ Packet/event/record drops
  ▶ Timeouts, queue expiries
  ▶ Data loss scenarios
▶ Capacity
  ▶ Backpressure signaling
  ▶ Backlog processing
  ▶ Peak hour spikes


Heartbeats

▶ Add a dummy input channel to each input
▶ Continuously generate fixed data at fixed rate
▶ Monitor dummy channel on each boundary
▶ Alert on dummy channel rate at each boundary
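One way to picture the last step is a check that compares the heartbeat count seen at each boundary against the fixed rate the generator emits. This is a minimal sketch under assumptions: the boundary names, the 20% tolerance and the `heartbeat_alerts` helper are hypothetical and not taken from the slides.

```python
def heartbeat_alerts(observed, expected_per_interval, tolerance=0.2):
    """Return the boundaries whose heartbeat count is outside the expected band.

    observed maps a boundary name to the number of heartbeat events
    counted at that boundary during one interval.
    """
    low = expected_per_interval * (1 - tolerance)
    high = expected_per_interval * (1 + tolerance)
    return sorted(
        name for name, count in observed.items() if not low <= count <= high
    )


# Hypothetical counts for one interval; the generator emits 60 heartbeats.
observed = {"ingest": 60, "transform": 58, "export": 12}
alerts = heartbeat_alerts(observed, expected_per_interval=60)
print(alerts)
```

Because the heartbeat data and rate are fixed, a deviation at one boundary but not at the previous one points at the stage in between.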


Buffers, Backlogs and Backpressure


MQ pipeline with push - dataflow

Topic A

Topic B

Topic C

Topic D

Transform

Transform

Transform

Producer MQ Handler 1 Handler 2 Handler 3 Consumer

Figure 11: MQ pipeline with push - dataflow


MQ pipeline with push - sequence

Participants: Producer, MQ, Handler 1, Handler 2, Handler 3, Consumer

Sequence: PUSH Topic A, PUSH Topic A, Pressure point, Process, PUSH Topic B, PUSH Topic B, Pressure point, Process, PUSH Topic C, PUSH Topic C, Pressure point, Process, PUSH Topic D, PUSH Topic D, Pressure point

Figure 12: MQ pipeline with push - sequence


MQ pipeline notes

▶ This design is active: it sends data as it comes in
▶ Server-push model for moving data
  ▶ Yes, you can also poll a queue
▶ Complex programming model
  ▶ MQ-specific protocol
  ▶ Requires registration of callback
  ▶ Handler process might be unavailable


Model

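def next(records_in, buffer_size, output_capacity):
    # Add this tick's incoming records to the buffer.
    buffer_size = buffer_size + records_in

    if buffer_size - output_capacity >= 0:
        # Enough records buffered to use the full output capacity.
        records_out = output_capacity
        buffer_size = buffer_size - output_capacity
    else:
        # Fewer records than the output capacity: drain the buffer.
        records_out = buffer_size
        buffer_size = 0

    # plot() is a plotting helper that is not defined on this slide.
    plot(records_in, buffer_size, records_out)
    return buffer_size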
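To see how the model behaves over time without a plotting helper, it can be run over a short input sequence. The sketch below restates the same update rule in a function that returns its values instead of plotting them; the names `step` and `simulate` and the input burst are assumptions for illustration.

```python
def step(records_in, buffer_size, output_capacity):
    """One tick of the buffer model: returns (new_buffer_size, records_out)."""
    buffer_size += records_in
    # Output is limited by both what is buffered and what the output can take.
    records_out = min(buffer_size, output_capacity)
    return buffer_size - records_out, records_out


def simulate(inputs, output_capacity):
    """Run the model over a sequence of per-tick input counts."""
    buffer_size = 0
    history = []
    for records_in in inputs:
        buffer_size, records_out = step(records_in, buffer_size, output_capacity)
        history.append((records_in, buffer_size, records_out))
    return history


# A burst of 10 records per tick for three ticks, then silence.
history = simulate([10, 10, 10, 0, 0, 0], output_capacity=5)
for row in history:
    print(row)  # (records_in, buffer_size, records_out)
```

In this run the buffer peaks at 15 records when the burst ends, and the output keeps running at full capacity for three more ticks while the backlog drains.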


Input rate <= output capacity

Figure 13: Output capacity = 15 eps


Backlog processing

Figure 14: Output capacity = 5 eps


Backlog processing with finite buffer

Figure 15: Limit reached with no backpressure means data loss
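The data-loss case in the caption can be sketched by giving the buffer model a hard limit. This is an illustrative variant, not the author's code: `step_bounded`, the limit of 12 records, and the choice to emit output before checking for overflow are all assumptions.

```python
def step_bounded(records_in, buffer_size, output_capacity, buffer_limit):
    """One tick with a bounded buffer: returns (new_buffer_size, records_out, dropped)."""
    buffer_size += records_in
    records_out = min(buffer_size, output_capacity)
    buffer_size -= records_out
    # Without backpressure, whatever does not fit in the buffer is lost.
    dropped = max(0, buffer_size - buffer_limit)
    return buffer_size - dropped, records_out, dropped


buffer_size = 0
total_dropped = 0
for records_in in [10, 10, 10, 10]:
    buffer_size, records_out, dropped = step_bounded(
        records_in, buffer_size, output_capacity=5, buffer_limit=12
    )
    total_dropped += dropped
    print(records_in, buffer_size, records_out, dropped)
```

Once the limit is reached the buffer size goes flat, so the buffer metric alone no longer shows the problem; the dropped count does.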


Observing buffer change rate

                          t1     t2     t3     t4
Counter(In)               5      12     19     26
Counter(Out)              5      10     15     20
RateIn(t)                 N/A    7      7      7
RateOut(t)                N/A    5      5      5
RateIn(t) − RateOut(t)    N/A    2      2      2

$\mathit{Rate}(n) = \mathit{Counter}(n) - \mathit{Counter}(n-1)$

$\mathit{Buffer}(n) = \mathit{RateIn}(n) - \mathit{RateOut}(n)$
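The table rows can be reproduced directly from the two counters. A minimal sketch, with the list layout and the helper name `rates` chosen for this example:

```python
counter_in = [5, 12, 19, 26]   # Counter(In) at t1..t4
counter_out = [5, 10, 15, 20]  # Counter(Out) at t1..t4


def rates(counter):
    """Per-interval increase of a monotonically increasing counter."""
    return [later - earlier for earlier, later in zip(counter, counter[1:])]


rate_in = rates(counter_in)
rate_out = rates(counter_out)

# Buffer(n) = RateIn(n) - RateOut(n): positive values mean the buffer is growing.
buffer_change = [i - o for i, o in zip(rate_in, rate_out)]
print(rate_in, rate_out, buffer_change)
```

There is one fewer rate than there are counter samples, which is why the t1 column reads N/A in the table.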


Buffer change rate

Figure 16: Long term average of the red line should approach zero


Kafka pipeline - Dataflow - by asset

Schema A Topic A

Schema B Topic B

Schema C Topic C

Schema D Topic D

Transform

Transform

Transform

Producer Kafka Spark 1 Spark 2 Spark 3 Consumer

Figure 17: Kafka pipeline Dataflow


Kafka pipeline - connection initiation - by asset

Participants: Producer, Kafka, Spark 1, Spark 2, Spark 3, Consumer

Sequence: PUSH Topic A, PULL Topic A, Process, PUSH Topic B, PULL Topic B, Process, PUSH Topic C, PULL Topic C, Process, PUSH Topic D, PULL Topic D

Figure 18: Kafka pipeline sequence


Kafka pipeline - Dataflow - by service

Topic A

Transform

Topic B

Transform

Topic C

Transform

Topic D

Producer, Kafka Topic A, Spark 1, Kafka Topic B, Spark 2, Kafka Topic C, Spark 3, Kafka Topic D

Consumer

Figure 19: Kafka pipeline Dataflow


Kafka pipeline - connection initiation - by service

Participants: Producer, Topic A, Spark 1, Topic B, Spark 2, Topic C, Spark 3, Topic D, Consumer (each of the four topics is labelled Kafka)

Sequence: PUSH, PULL, Process, PUSH, PULL, Process, PUSH, PULL, Process, PUSH, PULL

Figure 20: Kafka pipeline sequence


Kafka pipeline notes

▶ This design is passive, does not send data unless asked
▶ Client-pull model for moving data
▶ All persistence is done on Kafka
▶ Very simple programming model
▶ Well understood wire protocol (Kafka's binary protocol over TCP; HTTP when going through a REST proxy)