it’s a match! - grafanacon...infrastructure monitoring component method elasticsearch clusters...

51
It’s A Match! &

Upload: others

Post on 25-May-2020

21 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

It’s A Match!

&

Page 2: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

032

Brief Introduction

Page 3: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

Wenting Gong

Software Engineer, Observability, Tinder

3

Github: https://github.com/christine-gongLinkedin: https://www.linkedin.com/in/wentingg/Email: [email protected]

Page 4: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

034

Agenda

Page 5: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

5

Agenda

● Our Monitoring Journey with Grafana- Applications running as VMs and Containers- Infrastructure resources- Datasource and Dashboard Automation

● Demo

Page 6: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

036

2 Years Ago

Page 7: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

037

Let’s meet some of our Tinder Engineering team members:

Observability Team

Cloud Infrastructure Team

Backend Team

Alice

Charlie

Bob

Page 8: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

038

Let’s build our observability infrastructure to monitor the health of all the services!

Cloud Infrastructure Team

Charlie

AliceObservability Team

Hmmm, we also need a central place for engineers to check and view the real-time metrics!

Page 9: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

9

Initial Scenario

Around fifty microservices running with AWS EC2 instances and ELBs

Page 10: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0310

AliceObservability Team

AWS Cloudwatch and Grafana should help with that!

+

Page 11: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

11

Solution

Page 12: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

12

Solution

Page 13: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0313

AliceObservability Team

Ok! Now we can view and check the services real-time health metrics from Grafana!

Bob Backend

Team

Great!

CharlieCloud Infra

Team

Page 14: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0314

We backend engineers also want to know the details about our services! E.g. the P50, P95, P99 latency, the requests status for different routes, etc

Bob Backend Team

AliceObservability Team

That sounds fair enough! We need to investigate some open source monitoring solutions!

Page 15: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0315

AliceObservability Team

Page 16: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0316

AliceObservability Team

Prometheus is the best option for us!

Page 17: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

17

Solution

Page 18: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0318

A few months later...

Page 19: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0319

Our business grows so fast that one prometheus server could not handle them all.

Bob Backend Team

Yeah, we should have a better solution to make our monitoring infra more scalable.

AliceObservability Team

Page 20: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0320

AliceObservability Team

Let’s get each service an assigned prometheus server!

Page 21: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

21

Solution

Page 22: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

22

Solution

Prometheus Datasource

Page 23: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0323

Bob Backend

Team

CharlieCloud Infra

Team

AliceObservability

Team

Good Job!!!

Page 24: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0324Picture from https://blogs.oracle.com/marketingcloud/how-to-decide-what-marketing-metrics-to-track-and-report

Page 25: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0325

The metrics are great! Can we keep them longer so that we can see the trends and compare them when necessary?

Bob Backend Team

That’s a good point!

AliceObservability Team

Page 26: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

26

Compare two approaches

Approach Pros Cons

Increase the retention period for all prometheus servers Super EASY

● Not all metrics are needed for longer retention

● Increased costs● $$$ and resources wasted

for unnecessary metrics

Have a separate prometheus archive server for long-term metrics

Uses the resources and $$$ more efficiently

● Extra setup● Module owners need to

understand and update existing configuration

Page 27: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

27

Compare two approaches

Approach Pros Cons

Increase the retention period for all prometheus servers Super EASY

● Not all metrics are needed for longer retention

● Increased costs● $$$ and resources wasted

for unnecessary metrics

Have a separate prometheus archive server for long-term metrics

Uses the resources and $$$ more efficiently

● Extra setup● Module owners need to

understand and update existing configuration

Page 28: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

28

Solution

AliceObservability Team

Let’s only archive those key metrics for longer time!

Page 29: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0329

Bob Backend

Team

CharlieCloud Infra

Team

AliceObservability

Team

Well Done!

Page 30: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0330Picture from Adam Tow, AllThingsD.com

Page 31: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0331Picture from Adam Tow, AllThingsD.com

Page 32: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0332

Our services expand so aggressively, it is time to move to Kubernetes for deploying and managing containerized applications at scale.

CharlieCloud Infrastructure Team

Page 33: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0333

Our services expand so aggressively, it is time to move to Kubernetes for deploying and managing containerized applications at scale.

Kubernetes was designed to give developers more velocity, efficiency and agility.

CharlieCloud Infrastructure Team

Page 34: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0334

Our services expand so aggressively, it is time to move to Kubernetes for deploying and managing containerized applications at scale.

Kubernetes was designed to give developers more velocity, efficiency and agility. Charlie

Cloud Infrastructure Team

AliceObservability Team

Then we should definitely support the monitoring for K8S environment!

Page 35: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

35

CharlieCloud Infrastructure

Team

Page 36: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0336

+ = ?

Prometheus Operator

AliceObservability Team

Developers should not redo the metrics instrumenting, we need to stick with Prometheus in Kubernetes as well!

Page 37: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

37

Solution - metrics for all services

prom-operator

Prometheus datasources

Page 38: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

38

Solution: archived metrics for longer retention

prom-operator

Page 39: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0339

Bob Backend

Team

CharlieCloud Infra

Team

AliceObservability

Team

Awesome!!!

Page 40: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0340

CharlieCloud Infrastructure Team

Speaking of , sometimes issue happens inside k8s cluster. We need the capability to monitor the k8s health itself.

AliceObservability Team

Sure thing! Kubernetes has a large, rapidly growing ecosystem. Its services, support, and tools are widely available [1]

Page 41: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

41

Solution

Page 42: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0342

Bob Backend

Team

CharlieCloud Infra

Team

AliceObservability

Team

Lovely!

Page 43: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0343

Are infrastructure resources being taken care of? Do we have metrics for them?

Bob Backend Team

Let me look into that!

AliceObservability Team

Page 44: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

44

Infrastructure Monitoring

Component Method

Elasticsearch Clusters Prometheus elasticsearch exporter

Kafka Clusters Prometheus kafka exporter, Prometheus kafka consumer group exporter

AWS RDS Instances Prometheus cloudwatch exporter

AWS Dynamodb Prometheus cloudwatch exporter

Page 45: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0345

AliceObservability Team

We should definitely do that! Also Grafana has many useful APIs, I can develop that automation utils!

Now everything gets settled! With the help of k8s, we can scale the testing environments with monitoring! Could we reduce manual dashboard creation work?

CharlieCloud Infrastructure Team

Page 46: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

46

AliceObservability Team

Wrote my Grafana dashboard automation utils with the help of Grafanalib in Python with a Http client wrapper. Would help a lot when duplicating similar dashboards for all clusters and all environments

Page 47: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

47

Summary

- Have Grafana as a central monitoring place - Detailed real-time metrics for each service- Longer term historical key metrics available

- Monitor the health of all the microservices running in containers and VMs- Monitor the health of Kubernetes clusters - Monitor the health of all of our infrastructure resources, e.g.

Elasticsearch, Dynamodb, Redis, etc- Datasource auto-discovery and Dashboard auto-creation

Page 48: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

0348

Demo Time

Page 49: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

49

Reference

[1] https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/

[2] Emoji icons supplied by EmojiOne

[3] Kubernetes related icons https://github.com/octo-technology/kubernetes-icons

Page 50: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

QUESTIONS?

Page 51: It’s A Match! - GrafanaCon...Infrastructure Monitoring Component Method Elasticsearch Clusters Prometheus elasticsearch exporter Kafka Clusters Prometheus kafka exporter, Prometheus

THANKS