simple practices in performance monitoring and evaluation
![Page 1: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/1.jpg)
Simple Practices in Performance Monitoring and Evaluation
Schubert Zhang 2016.3.24
![Page 2: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/2.jpg)
SLA
Service Level Agreements
https://en.wikipedia.org/wiki/Service-level_agreement
SLAs commonly include segments to address: a definition of services, performance measurement, problem management, customer duties,
warranties, disaster recovery, termination of agreement.
![Page 3: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/3.jpg)
• APIIM SLA
• Performance
• Performance-oriented SLA
![Page 4: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/4.jpg)
Metrics SLA Performance SLA
Performance Metrics
e.g.1: API
• (99%)
e.g.2: Call Center
• Abandonment Rate: Percentage of calls abandoned while waiting to be answered.
• ASA (Average Speed to Answer): Average time it takes for a call to be answered by the service desk.
• TSF (Time Service Factor): Percentage of calls answered within a definite timeframe, e.g., 80% in 20 seconds.
• FCR (First-Call Resolution): Percentage of incoming calls that can be resolved without the use of a callback or without having the caller call back the helpdesk to finish resolving the case.
• TAT (Turn-Around Time): Time taken to complete a certain task.
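As a rough illustration of how some of the call-center metrics above could be derived from raw call records, here is a minimal Python sketch. The `Call` record and its field name are hypothetical, the 20-second threshold is simply the example value from the TSF bullet, and TSF is computed here relative to answered calls, which is an assumption (some definitions use all offered calls).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Call:
    # Seconds the caller waited before being answered;
    # None means the caller hung up while still waiting (abandoned).
    wait_seconds: Optional[float]


def call_center_metrics(calls, tsf_threshold=20.0):
    """Compute abandonment rate, ASA and TSF.

    Assumes at least one call was offered and at least one was answered.
    """
    answered = [c.wait_seconds for c in calls if c.wait_seconds is not None]
    abandoned = len(calls) - len(answered)
    return {
        "abandonment_rate": abandoned / len(calls),
        "asa_seconds": sum(answered) / len(answered),
        "tsf": sum(1 for w in answered if w <= tsf_threshold) / len(answered),
    }


# Made-up sample: four answered calls and one abandoned call.
calls = [Call(5.0), Call(15.0), Call(30.0), Call(10.0), Call(None)]
metrics = call_center_metrics(calls)
print(metrics)
```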
Metrics
Performance Metrics
![Page 5: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/5.jpg)
Benchmarking
The quality of a service must be measured, evaluated, … benchmarked.
And we must have a set of approaches for benchmarking.
![Page 6: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/6.jpg)
Metrics to be monitored
![Page 7: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/7.jpg)
Throughput
QPS, TPS, CPS
per second, per minute, per hour …
![Page 8: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/8.jpg)
Concurrency
![Page 9: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/9.jpg)
Latency
Response Time, Round-Trip Time (RTT) …
Average, Median, Min., Max., Percentile …
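To make these latency statistics concrete, the following sketch computes them for a list of samples. It uses the nearest-rank definition of a percentile, which is one common convention and an assumption here; the sample values are made up.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that
    at least p percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms):
    return {
        "avg": statistics.mean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "min": min(latencies_ms),
        "max": max(latencies_ms),
        "p99": percentile(latencies_ms, 99),
    }


# Made-up sample: 98 fast requests plus two slow ones.
latencies_ms = [10] * 98 + [500, 2000]
summary = summarize(latencies_ms)
print(summary)
```

In this sample the average is pulled well above the median by just two slow requests, while the 99th percentile exposes the slow tail directly. This is why an SLA is often stated on a percentile rather than on the average.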
![Page 10: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/10.jpg)
Quantile / Percentile
Refer to the Google Sawzall paper.
![Page 11: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/11.jpg)
A Summary of these Concepts
[Diagram: Clients (Client-1, Client-2, Client-3, … Client-N) send requests to a Server running a pool of Work Threads; the diagram is labeled with Throughput, Latency and Concurrency.]
![Page 12: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/12.jpg)
A Real-World Example
![Page 13: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/13.jpg)
Example-1: The Amazon Dynamo Paper
![Page 14: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/14.jpg)
![Page 15: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/15.jpg)
![Page 16: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/16.jpg)
Average
99.9% quantile
![Page 17: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/17.jpg)
Example-2: Evaluation Report of a NoSQL DB
Cassandra
![Page 18: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/18.jpg)
Benchmark for Write API
[Charts: Benchmark for Writes, cluster overview; panels: Throughput, Latency]
• Each node runs 6 clients (threads), 54 clients in total.
• Each client generates random CDRs for 50 million users/phone numbers, and puts them into DaStor one by one.
– Key space: 50 million
– Size of a CDR: Thrift-compacted encoding, ~200 bytes
✓ Throughput: average ~80K ops/s; per node: average ~9K ops/s
✓ Latency: average ~0.5 ms
• Bottleneck: network (and memory)
![Page 19: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/19.jpg)
Benchmark for Read API
• Each node runs 8 clients (threads), 72 clients in total.
• Each client randomly picks a user-id/phone-number out of the 50-million space and gets its 20 most recent CDRs (one page) from DaStor.
• All clients read CDRs of the same day/bucket.
[Chart: percentage of read ops (y-axis, 0% to 25%) by latency (x-axis, in 100 ms units from 1 to 61); the average and the 97% quantile are marked.]
✓ Throughput: average ~140 ops/s; per node: average ~16 ops/s
✓ Latency: average ~500 ms, 97% < 2 s (SLA)
• Bottleneck: disk IO (random seek); CPU load is very low
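One sanity check that is not part of the original report, but can be applied to its numbers, is Little's Law: the average number of requests in flight equals throughput multiplied by average latency. The sketch below applies it to the approximate figures quoted for the write and read benchmarks. It assumes each client thread issues one request at a time, so the result should not exceed the client count.

```python
def in_flight(throughput_ops_per_s, avg_latency_s):
    """Little's Law: average concurrency = throughput x average latency."""
    return throughput_ops_per_s * avg_latency_s


# Approximate figures taken from the write and read benchmark slides.
write_concurrency = in_flight(80_000, 0.0005)  # ~80K ops/s, ~0.5 ms
read_concurrency = in_flight(140, 0.5)         # ~140 ops/s, ~500 ms

print(f"write: ~{write_concurrency:.0f} requests in flight vs 54 clients")
print(f"read:  ~{read_concurrency:.0f} requests in flight vs 72 clients")
```

Both estimates (roughly 40 and 70) stay below the client counts, so the reported throughput and latency are mutually consistent. The read estimate is close to the full 72 clients, which would fit clients spending nearly all their time waiting on the server.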
![Page 20: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/20.jpg)
Total & Delta
Total: Delta:
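The slide contrasts a cumulative total with per-interval deltas. Assuming the monitored value is a monotonically increasing counter sampled at a fixed interval (an assumption, since the slide's own details did not survive), the delta is the difference between consecutive samples, and dividing it by the interval gives a rate. A minimal sketch with made-up sample values:

```python
def deltas(totals):
    """Per-interval increments from consecutive samples of a cumulative counter."""
    return [curr - prev for prev, curr in zip(totals, totals[1:])]


# Hypothetical cumulative request counts, sampled every 10 seconds.
interval_s = 10
totals = [0, 1200, 2500, 2600, 4100]

per_interval = deltas(totals)
rates = [d / interval_s for d in per_interval]

print(per_interval)
print(rates)
```

The total only ever rises, so the slowdown in the third interval is easy to miss on a total curve but obvious in the delta series.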
![Page 21: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/21.jpg)
Generate the metrics and monitor them
![Page 22: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/22.jpg)
• On the server side
• Add an operation count and the time cost for every client call
• At every monitoring interval, pull the current Throughput and Latency and push them to the monitoring tool (Ganglia/Zabbix) or the console.
• Throughput = sum of count / time interval
• Latency = average (sum of latency / sum of count), max, min, quantile …
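The bullets above can be sketched as a small thread-safe recorder. This is only an illustration under stated assumptions, not the code referenced on the slides: the `OpMetrics` class, `handle_call` wrapper and field names are hypothetical, and pushing the snapshot to Ganglia or Zabbix is left out.

```python
import threading
import time


class OpMetrics:
    """Accumulates operation count and time cost; snapshot() reports and resets."""

    def __init__(self):
        self._lock = threading.Lock()
        self._reset()

    def _reset(self):
        self._count = 0
        self._total = 0.0
        self._min = float("inf")
        self._max = 0.0

    def record(self, seconds):
        """Add one finished operation and its time cost."""
        with self._lock:
            self._count += 1
            self._total += seconds
            self._min = min(self._min, seconds)
            self._max = max(self._max, seconds)

    def snapshot(self, interval_s):
        """Return throughput and latency for the elapsed interval, then reset."""
        with self._lock:
            count, total, lo, hi = self._count, self._total, self._min, self._max
            self._reset()
        if count == 0:
            return {"throughput": 0.0, "avg": None, "min": None, "max": None}
        return {
            "throughput": count / interval_s,  # sum of count / time interval
            "avg": total / count,              # sum of latency / sum of count
            "min": lo,
            "max": hi,
        }


metrics = OpMetrics()


def handle_call(work):
    """Wrap a client call so that its time cost is always recorded."""
    start = time.perf_counter()
    try:
        return work()
    finally:
        metrics.record(time.perf_counter() - start)


result = handle_call(lambda: sum(range(1000)))
print(metrics.snapshot(interval_s=1.0))
```

In a real server a timer would call `snapshot` once per monitoring interval and hand the result to the monitoring tool or print it to the console. Quantiles would additionally require keeping the samples or a histogram, which this sketch omits.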
Code in Gitlab and Gerrit
![Page 23: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/23.jpg)
Code for Spring Project
![Page 24: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/24.jpg)
• Java
• JMX (Java Management Extensions, a simple example at https://github.com/schubertzhang/jsketch)
• javaagent (java -javaagent:jarpath[=options]; the agent's premain method runs before the application's main)
• jmxetric (uses JMX and javaagent to report metrics to Ganglia, https://github.com/schubertzhang/jmxetric)
•
• Ganglia
• Zabbix
• …
![Page 25: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/25.jpg)
Ganglia Zabbix etc.
![Page 26: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/26.jpg)
Performance Benchmark Programming
Demo: test and evaluate the Throughput and Latency of http://www.fangdd.com
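The demo program itself is not part of this transcript. As a hedged sketch of what such a benchmark program might look like, the following runs an operation from several client threads and reports throughput and latency. `run_benchmark` and `fake_request` are hypothetical names, and `fake_request` only sleeps to simulate a request; it would need to be replaced by a real HTTP call against a service you are permitted to load-test.

```python
import math
import threading
import time


def run_benchmark(operation, num_clients, ops_per_client):
    """Run `operation` from several client threads.

    Returns (throughput in ops/s, sorted list of per-operation latencies in seconds).
    """
    latencies = []
    lock = threading.Lock()

    def client():
        local = []
        for _ in range(ops_per_client):
            start = time.perf_counter()
            operation()
            local.append(time.perf_counter() - start)
        with lock:  # merge once per client to keep lock contention low
            latencies.extend(local)

    threads = [threading.Thread(target=client) for _ in range(num_clients)]
    started = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - started
    return len(latencies) / elapsed, sorted(latencies)


def fake_request():
    # Stand-in for a real request; sleeps about 2 ms to simulate service time.
    time.sleep(0.002)


throughput, latencies = run_benchmark(fake_request, num_clients=4, ops_per_client=25)
avg = sum(latencies) / len(latencies)
p95 = latencies[math.ceil(95 * len(latencies) / 100) - 1]  # nearest-rank 95th percentile

print(f"throughput: {throughput:.0f} ops/s")
print(f"latency: avg {avg * 1000:.2f} ms, 95% {p95 * 1000:.2f} ms, max {latencies[-1] * 1000:.2f} ms")
```

Raising `num_clients` increases concurrency, which makes it possible to observe how throughput and latency change together as load grows.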
![Page 27: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/27.jpg)
Demo Time …
![Page 28: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/28.jpg)
demo screenshots
![Page 29: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/29.jpg)
demo screenshots
[Chart from the demo; annotations: Average, 95%]
The long tail …
![Page 30: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/30.jpg)
Statistical Monitoring for Outliers
usually for trouble-shooting
![Page 31: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/31.jpg)
Captured from UTStarcom mSwitch R5 system, Guangxi Site, 2004.
The magic matrix:
![Page 32: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/32.jpg)
• Redis, Memcache
• Just add at a point, very low-cost
• Very
• Logs, ELK
![Page 33: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/33.jpg)
Heavy Logs & ELK
It’s another topic!
![Page 34: Simple practices in performance monitoring and evaluation](https://reader033.vdocument.in/reader033/viewer/2022052706/5a64d5357f8b9a735d8b4b4b/html5/thumbnails/34.jpg)
Thank You!