monitoring nginx (plus): key metrics and how-to
DESCRIPTION
NGINX just works and that's why we use it. That does not mean that it should be left unmonitored. As a web server, it plays a central role in a modern infrastructure. As a gatekeeper, it sees every interaction with the application. If you monitor it properly it can explain a lot about what is happening in the rest of your infrastructure. In this talk you will learn more about NGINX (plus) metrics, what they mean and how to use them. You will also learn different methods (status, statsd, logs) to monitor NGINX with their pros and cons, illustrated with real data coming from real servers.
TRANSCRIPT
![Page 1: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/1.jpg)
Monitoring nginx
Alexis Lê-Quôc, Datadog
@alq
![Page 2: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/2.jpg)
Agenda
• Dramatis personae
• Observations
• Monitoring 1 nginx (plus) with logs
• Monitoring 1 nginx (plus) with metrics
• Monitoring N nginx effectively
![Page 3: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/3.jpg)
@alq CTO at Datadog
![Page 4: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/4.jpg)
Datadog == monitoring
• Monitoring as a service
• Works really well with large, dynamic environments (e.g. clouds)
• Aggregates performance metrics
• Correlates nginx performance with the rest of your infrastructure
![Page 5: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/5.jpg)
![Page 6: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/6.jpg)
![Page 7: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/7.jpg)
Observations
From the field
![Page 8: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/8.jpg)
Some stats
• Across all monitored servers
  • nginx ~10%
  • Apache ~5%
• CPU and CPU/$ is the dominant resource
![Page 9: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/9.jpg)
[Figure: bar chart of % of instances per core count. X-axis: core count (1, 2, 4, 8, 12, 16, 24, 32); y-axis: % of instances (0% to 40%).]
![Page 10: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/10.jpg)
[Figure: bar chart of % of instances per type (AWS only). X-axis: EC2 type (c3.l, c3.2xl, c1.xl, c3.8xl, m3.l, c3.xl, m3.m, cc2.8xl, t2.m, c3.4xl, rest); y-axis: % of instances (0% to 30%).]
![Page 11: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/11.jpg)
Monitoring nginx
1. Monitoring with logs
2. Monitoring with status
3. Monitoring with statsd
![Page 12: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/12.jpg)
Monitoring with logs
• Canonical example of log indexers
• Your choice of:
  • logstash
  • splunk
  • logentries, sumologic, loggly, etc.
nginx log forwarder indexer UI
![Page 13: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/13.jpg)
Monitoring with logs
nginx log forwarder indexer UI
| Strengths | Weaknesses |
| --- | --- |
| forensics & anomalies | low signal-to-noise ratio |
| content-driven analysis | “black box” |
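As an illustration of content-driven analysis, the sketch below counts responses per request path and status code from access log lines. It assumes the stock nginx `combined` log format; the sample lines are made up for the example, and `count_status_by_path` is a hypothetical helper, not part of any log indexer named above.

```python
import re
from collections import Counter

# nginx "combined" log format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<addr>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_status_by_path(lines):
    """Count log lines per (path, status) pair; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue
        # "$request" looks like "GET /path HTTP/1.1"; keep only the path.
        parts = match.group("request").split()
        path = parts[1] if len(parts) >= 2 else "-"
        counts[(path, match.group("status"))] += 1
    return counts


# Made-up sample lines, for illustration only.
SAMPLE_LINES = [
    '203.0.113.7 - - [27/Apr/2022:15:00:01 +0000] "GET /api/items HTTP/1.1" 200 512 "-" "curl/7.79.1"',
    '203.0.113.7 - - [27/Apr/2022:15:00:02 +0000] "GET /api/items HTTP/1.1" 502 166 "-" "curl/7.79.1"',
    '203.0.113.9 - - [27/Apr/2022:15:00:03 +0000] "POST /login HTTP/1.1" 302 0 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for (path, status), n in sorted(count_status_by_path(SAMPLE_LINES).items()):
        print(path, status, n)
```

In practice a log indexer does this parsing and counting at scale; the point here is only that the log line carries the content (path, status, client) that status counters do not.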
![Page 14: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/14.jpg)
Monitoring with metrics
• open-source: ngx_http_stub_status_module
  • bare-bones metrics
  • human-readable text presentation
• plus: ngx_http_status_module
  • a lot more metrics for each function
  • JSON format
• Your choice of…
  • Datadog, Nagios, Zabbix, etc. for open-source
  • Datadog for nginx plus
nginx status collector aggregator UI/alerts
![Page 15: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/15.jpg)
Monitoring with metrics
nginx status collector aggregator UI/alerts
| Strengths | Weaknesses |
| --- | --- |
| lightweight & real-time | no insight into content |
| “white box” | |
![Page 16: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/16.jpg)
Simple metrics taxonomy
1. What it measures
• Work or resource
  • Focus on work because work == value
  • Resource analysis useful to understand performance
• Use Brendan Gregg’s USE
  • Utilization (% over time)
  • Saturation (queue length)
  • Errors (count over time)
2. Type
• Gauge: sample
• Counter: accumulated sample, needs to be derived to be meaningful
http://www.brendangregg.com/usemethod.html
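To make the gauge/counter distinction concrete: a counter only ever accumulates, so a single reading says little, and the useful number is its rate of change between two samples. A minimal sketch, assuming two readings taken at known timestamps (`counter_to_rate` is a hypothetical helper name):

```python
def counter_to_rate(prev_value, prev_time, curr_value, curr_time):
    """Turn two readings of a monotonic counter into a per-second rate.

    Returns None when no rate can be computed: no time elapsed, or the
    counter went backwards (for example after the server restarted).
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        return None
    delta = curr_value - prev_value
    if delta < 0:
        return None
    return delta / elapsed


if __name__ == "__main__":
    # 600 more requests over 60 seconds.
    print(counter_to_rate(1000, 0.0, 1600, 60.0))
```

A gauge, by contrast, is plotted as sampled, with no derivation step.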
![Page 17: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/17.jpg)
Open-source metrics
| Class | Type | Resource/Work | Notes |
| --- | --- | --- | --- |
| Current connections | Gauge | Resource | reading, writing, idle |
| Accepted connections | Counter | Resource | |
| Handled connections | Counter | Resource | <= accepted (lower only if a resource limit is hit) |
| Requests | Counter | Work | True purpose of the server |

• Latency must be measured using logs or statsd.
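The open-source metrics above come from the plain-text page served by ngx_http_stub_status_module. A minimal parsing sketch follows; `parse_stub_status` is a hypothetical helper, the sample text follows the layout shown in the module's documentation, and `Waiting` is the state listed as idle in the table.

```python
import re

# Sample output in the layout used by ngx_http_stub_status_module.
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text):
    """Parse stub_status text into a dict of integer metrics."""
    active = re.search(r"Active connections:\s+(\d+)", text)
    # The three counters sit alone on one line: accepts, handled, requests.
    counters = re.search(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s*$", text, re.MULTILINE)
    states = re.search(
        r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text
    )
    if not (active and counters and states):
        raise ValueError("unrecognized stub_status output")
    accepts, handled, requests = (int(v) for v in counters.groups())
    reading, writing, waiting = (int(v) for v in states.groups())
    return {
        "active": int(active.group(1)),   # gauge
        "accepts": accepts,               # counter
        "handled": handled,               # counter
        "requests": requests,             # counter
        "reading": reading,               # gauge
        "writing": writing,               # gauge
        "waiting": waiting,               # gauge (idle keep-alive connections)
        "dropped": accepts - handled,     # > 0 only when a resource limit is hit
    }


if __name__ == "__main__":
    print(parse_stub_status(SAMPLE))
```

The counters here still need to be turned into rates before they are graphed or alerted on.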
![Page 18: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/18.jpg)
Key “plus” metrics
| Class | Type | Resource/Work | Notes |
| --- | --- | --- | --- |
| 5xx Errors | Counter | Work | without log analysis |
| 5xx/sum(Nxx) | Gauge | Work | error rate % |
| idle/dropped connections | Gauge | Resource | saturation |
| active/total connections | Gauge | Resource | upstream capacity |
| Requests | Counter | Work | true purpose of the server |

• Latency must be measured using logs or statsd.
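The 5xx/sum(Nxx) error rate can be sketched as follows. Because the per-class response counts are counters, the rate is computed on the deltas between two samples rather than on the raw totals. The dictionary keys (`1xx` to `5xx`) are an assumption about how the counters are named once extracted from the status JSON, and `error_rate_5xx` is a hypothetical helper.

```python
RESPONSE_CLASSES = ("1xx", "2xx", "3xx", "4xx", "5xx")


def error_rate_5xx(prev_responses, curr_responses):
    """Percentage of responses that were 5xx between two counter samples.

    Both arguments map response class ("1xx".."5xx") to a cumulative count.
    Returns None if any counter went backwards (e.g. after a restart).
    """
    deltas = {
        cls: curr_responses[cls] - prev_responses[cls]
        for cls in RESPONSE_CLASSES
    }
    if any(delta < 0 for delta in deltas.values()):
        return None
    total = sum(deltas.values())
    if total == 0:
        return 0.0  # no traffic in the interval
    return 100.0 * deltas["5xx"] / total


if __name__ == "__main__":
    # Made-up counter samples, for illustration only.
    prev = {"1xx": 0, "2xx": 900, "3xx": 50, "4xx": 40, "5xx": 10}
    curr = {"1xx": 0, "2xx": 1850, "3xx": 70, "4xx": 60, "5xx": 20}
    print(error_rate_5xx(prev, curr))
```

Computed this way the result behaves like a gauge: it can be graphed and alerted on directly.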
![Page 19: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/19.jpg)
Monitoring with statsd
nginx statsd UI/alerts
| Strengths | Weaknesses |
| --- | --- |
| lightweight, real-time, standard | not comprehensive |
| custom metrics, content-aware | |
https://github.com/zebrafishlabs/nginx-statsd
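For readers unfamiliar with statsd, the sketch below shows the plain-text datagram format (`name:value|type`) that a statsd daemon accepts over UDP. It illustrates the wire format only and is not the configuration of the nginx-statsd module linked above; the metric names, host, and port 8125 (the conventional statsd port) are assumptions.

```python
import socket

# Common statsd metric types: counter, gauge, timer (milliseconds).
METRIC_TYPES = ("c", "g", "ms")


def format_statsd(name, value, metric_type):
    """Build one statsd datagram payload, e.g. 'nginx.requests:1|c'."""
    if metric_type not in METRIC_TYPES:
        raise ValueError(f"unsupported metric type: {metric_type!r}")
    return f"{name}:{value}|{metric_type}"


def send_statsd(payload, host="127.0.0.1", port=8125):
    """Send one payload over UDP (fire-and-forget, no delivery guarantee)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Hypothetical metric names, for illustration only.
    send_statsd(format_statsd("nginx.requests", 1, "c"))
    send_statsd(format_statsd("nginx.request_time", 12, "ms"))
```

Because the transport is UDP, emitting a metric costs the sender very little, which is what makes the approach lightweight; the timer type is one way to capture the latency that the status pages do not expose.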
![Page 20: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/20.jpg)
Example
![Page 21: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/21.jpg)
Monitoring nginx
1. Logs for content analysis (forensics, anomalies, marketing)
2. Status for (white box) performance monitoring
3. statsd for custom metrics
No single method gives you everything you need.
![Page 22: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/22.jpg)
Monitoring a lot of nginx
1. Requires aggregation
2. It’s all about metadata (“pet-to-cattle” mindset)
3. Correlation
![Page 23: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/23.jpg)
Aggregation
• By default for log-based monitoring
• Not by default for metric-based monitoring
![Page 24: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/24.jpg)
Metadata
• Analyze by properties that are not the host identity
• Find anomalies that are not obvious
• Pet-to-cattle evolution: hosts don’t matter, services do
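A small sketch of what analyzing by properties other than host identity can look like: each sample carries tags, and values are summed per tag value instead of per host. The sample structure, tag names, and numbers are all made up for illustration, and `aggregate_by_tag` is a hypothetical helper.

```python
from collections import defaultdict


def aggregate_by_tag(samples, tag_key):
    """Sum sample values per value of one tag, ignoring which host sent them."""
    totals = defaultdict(float)
    for sample in samples:
        group = sample["tags"].get(tag_key, "untagged")
        totals[group] += sample["value"]
    return dict(totals)


# Made-up requests-per-second samples from three nginx hosts.
SAMPLES = [
    {"host": "web-1", "tags": {"role": "edge", "az": "us-east-1a"}, "value": 120.0},
    {"host": "web-2", "tags": {"role": "edge", "az": "us-east-1b"}, "value": 80.0},
    {"host": "web-3", "tags": {"role": "api", "az": "us-east-1a"}, "value": 50.0},
]

if __name__ == "__main__":
    print(aggregate_by_tag(SAMPLES, "role"))
    print(aggregate_by_tag(SAMPLES, "az"))
```

The same samples can be sliced by role, availability zone, or any other tag, so individual hosts can come and go without changing the view of the service.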
![Page 25: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/25.jpg)
Correlation
• nginx is only one piece of the infrastructure
![Page 27: Monitoring NGINX (plus): key metrics and how-to](https://reader033.vdocument.in/reader033/viewer/2022042715/559452351a28abd34f8b4672/html5/thumbnails/27.jpg)
Thank you!
Questions/Comments? @alq