Performance Monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorks

ANALYSING PERFORMANCE METRICS: Steps beyond performance testing - Adoniram Mishra and Rupesh Dubey, ThoughtWorks

Uploaded by ThoughtWorks; posted on 20 Aug 2015.



PERFORMANCE - AN INTRODUCTION

Agenda:

• How is it related to, and different from, performance testing?
• What parameters should we monitor?
• What tools do we use for performance monitoring?
• How do we analyse live data?

The basics are the same

The 9 most important performance monitoring parameters:
• Uptime
• Page speed
• Full page load time
• Geographic performance
• Disk free space
• Memory utilization (heap memory, overall memory)
• Database performance
• CPU usage
• Internal jobs / cron jobs
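As a minimal sketch of checking one parameter from this list, disk free space, the Python snippet below uses the standard library's `shutil.disk_usage`. The 10% alert threshold and the helper names `free_percent` and `check_disk` are illustrative assumptions, not values from the talk.

```python
import os
import shutil

# Hypothetical alert threshold: flag a volume when less than 10% of it is free.
MIN_FREE_PERCENT = 10.0

def free_percent(total, free):
    """Return free space as a percentage of total capacity."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * free / total

def check_disk(path=os.path.abspath(os.sep), min_free=MIN_FREE_PERCENT):
    """Return (free percent, alert flag) for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple of bytes: total, used, free
    pct = free_percent(usage.total, usage.free)
    return pct, pct < min_free

if __name__ == "__main__":
    pct, alert = check_disk()
    print(f"free: {pct:.1f}%  alert: {alert}")
```

In practice a check like this would run on a schedule on every server and feed its result to the monitoring tool, rather than being run by hand.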

HOW IS IT DIFFERENT FROM PERFORMANCE TESTING?

PERFORMANCE MONITORING AND ANALYSIS OF COMPLEX SYSTEMS

GOOD OLD DAYS... BACK THEN
• Simpler architecture
• Centralised monitoring
• Limited bottlenecks

AND NOW...

• Distributed architecture
• Multiple technologies
• Multiple vendors
• Scattered information
• Rich UI

COMPLEX APPLICATION
• Daily traffic: 2 million+ page views
• 7 back-end systems
• 1 front-end system
• Rich UI + responsiveness
• 25 servers

CURRENT ARCHITECTURE LAYOUT

SIMPLIFIED VIEW

HOW DO WE MONITOR PERFORMANCE ON A DAILY BASIS?

GANGLIA

[Figure: Ganglia monitoring architecture. A client reads aggregated data from a hierarchy of gmetad daemons; gmetad polls the gmond agent running on each node in Cluster 1 and Cluster 2.]
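In this setup gmetad polls each gmond. By default, gmond answers a TCP connection on port 8649 with an XML dump of the metrics it knows about, which is what gmetad reads. The sketch below, assuming that default port and the usual GANGLIA_XML / CLUSTER / HOST / METRIC element layout, fetches and flattens such a dump; `fetch_gmond_xml` and `parse_gmond_xml` are hypothetical helper names.

```python
import socket
import xml.etree.ElementTree as ET

def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the XML dump gmond sends on connect (8649 is gmond's default TCP port)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after sending the dump
                break
            chunks.append(data)
    return b"".join(chunks)

def parse_gmond_xml(xml_bytes):
    """Flatten GANGLIA_XML into {host name: {metric name: value string}}."""
    # Parsing bytes (not str) lets the parser honour the document's declared encoding.
    root = ET.fromstring(xml_bytes)
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = {
            metric.get("NAME"): metric.get("VAL") for metric in host.iter("METRIC")
        }
    return result
```

For example, `parse_gmond_xml(fetch_gmond_xml("gmond-host.example.internal"))` (a hypothetical host name) would return each reporting host's metrics, such as `load_one` or `mem_free`, as strings keyed by metric name.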

SPLUNK

[Figure: Splunk architecture. Forwarders on Data Sources 1, 2 and 3 send data to the Splunk server, which provides alerts, search, dashboards and reports.]
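The slide shows forwarders shipping data from each source to the Splunk server. Where a forwarder cannot be installed, Splunk's HTTP Event Collector (HEC) is an alternative ingestion path that accepts JSON events over HTTPS; this is a swapped-in technique, not necessarily what the team used. The sketch only builds the request. The base URL, token and `build_hec_request` helper are placeholders; 8088 is HEC's default port.

```python
import json
import urllib.request

def build_hec_request(base_url, token, event, sourcetype="_json"):
    """Build (but do not send) a POST to Splunk's HEC event endpoint."""
    body = json.dumps({"event": event, "sourcetype": sourcetype}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=body,
        headers={
            "Authorization": "Splunk " + token,  # HEC token auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Hypothetical host and token, for illustration only.
req = build_hec_request(
    "https://splunk.example.com:8088",
    "00000000-0000-0000-0000-000000000000",
    {"path": "/api/magazines_details", "status": 404},
)
```

Sending it would then be `urllib.request.urlopen(req)`, which needs a valid token and, for a self-signed certificate, a suitable SSL context.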

CASE STUDIES - X FILES

CASE STUDY I: CURIOUS CASE OF MEMORY UTILIZATION

Case Study

War Room Notes:

Apr 25, 9:00 AM: [Agent SPLUNK] Memory usage on 3 back-end servers has crossed 90%. Mayhem! Mayhem!

SOLUTION:
• Should we add more memory?

- This would be a temporary fix, made without even investigating the root cause.

• Let's see what Splunk gives us. Does it have additional information?

• The real fix lies with ....

Let's refer to our architecture
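Before adding memory, one question a trend can answer is whether usage is still climbing (pointing towards a leak or a runaway job) or has levelled off at a high value (pointing more towards capacity). The sketch below fits a least-squares slope to recent samples. The 90% threshold comes from the alert above; the cutoff of 0.5 percentage points per hour and the function names are assumptions for illustration.

```python
from statistics import mean

def slope(samples):
    """Least-squares slope of (time, value) pairs, in value units per time unit."""
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in samples)
    den = sum((x - mx) ** 2 for x in xs)
    return num / den if den else 0.0

def classify_memory(samples, threshold=90.0, leak_slope=0.5):
    """Return 'ok', 'climbing' (above threshold and rising) or 'flat-high'."""
    latest = samples[-1][1]
    if latest < threshold:
        return "ok"
    return "climbing" if slope(samples) >= leak_slope else "flat-high"

# Hourly memory-utilization samples (hour, percent) for one hypothetical server.
rising = [(0, 70), (1, 74), (2, 78), (3, 82), (4, 86), (5, 91)]
print(classify_memory(rising))  # → climbing
```

A "climbing" result argues for looking at what the process is holding on to before buying hardware; a "flat-high" result makes extra memory a more defensible short-term fix.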

CASE STUDY II: RATE MY REVIEWS - "I WOULD SAY IT SUCKS:("

A recent integration with a third-party system

We wrote a custom UI for it

In performance testing, the page load time was about 7 seconds.

What's the root cause?

Solution:

• Is our API call slow?

• Is our custom UI slow?

• Or is it? ....
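One way to start separating these suspects is to time each stage of the page on its own instead of only the total. The sketch below wraps hypothetical `fetch_reviews` and `render` callables in a timing context manager. In a real browser page the UI portion would be measured client-side (for example with the browser's Navigation Timing and Resource Timing APIs), so treat this as a server-side approximation.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, timings):
    """Record the wall-clock duration of the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start

def profile_page(fetch_reviews, render):
    """Time the API call and the rendering step separately, plus the total."""
    timings = {}
    with timed("total", timings):
        with timed("api", timings):
            data = fetch_reviews()
        with timed("render", timings):
            render(data)
    # Whatever is left is overhead outside the two measured stages.
    timings["other"] = timings["total"] - timings["api"] - timings["render"]
    return timings

# Hypothetical stand-ins: a slow third-party call and a fast renderer.
def fake_fetch_reviews():
    time.sleep(0.05)
    return ["great", "meh"]

def fake_render(reviews):
    return "".join(f"<li>{r}</li>" for r in reviews)

print(profile_page(fake_fetch_reviews, fake_render))
```

If "api" dominates, the third-party call is the bottleneck; if "render" or "other" dominates, the problem is on our side.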

LEARNINGS

DB INDEXES

Add indexes for the most heavily used DB queries

Identify which queries are slow
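A quick way to confirm that a heavily used query actually benefits from an index is to compare its query plan before and after creating the index. This sketch uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN`. The `reviews` table and index name are hypothetical, and other databases expose the same idea through their own `EXPLAIN` output and slow-query logs.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's plan description for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)  # last column is the detail text

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, product_id INTEGER, rating INTEGER)")
conn.executemany(
    "INSERT INTO reviews (product_id, rating) VALUES (?, ?)",
    [(i % 100, i % 5) for i in range(1000)],
)

sql = "SELECT rating FROM reviews WHERE product_id = ?"
before = query_plan(conn, sql, (42,))
conn.execute("CREATE INDEX idx_reviews_product_id ON reviews (product_id)")
after = query_plan(conn, sql, (42,))
print("before:", before)
print("after: ", after)
```

Before the index, SQLite reports a full scan of `reviews`; afterwards it reports a search using `idx_reviews_product_id`. The exact wording of the plan text varies between SQLite versions.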

PROPER LOGGING

This is an important aspect to check during functional testing

Check how, and what, message is logged, not just the log level
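To illustrate the difference between a message that merely has the right level and one that is actually useful when analysing live data, here is a sketch using Python's standard `logging` module. The logger name, the `log_api_failure` helper and the choice of fields (endpoint, status, latency, request id) are assumptions.

```python
import logging

logger = logging.getLogger("reviews")

def log_api_failure(url, status, elapsed_ms, request_id):
    """Log a failed upstream call with enough context to search for it later."""
    # Too vague to act on, even though ERROR is the right level:
    #   logger.error("API call failed")
    # Informative: which endpoint, what status, how slow, and an id to correlate with other logs.
    logger.error(
        "reviews API call failed url=%s status=%d elapsed_ms=%.0f request_id=%s",
        url, status, elapsed_ms, request_id,
    )

if __name__ == "__main__":
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )
    log_api_failure("/api/reviews", 504, 7012.4, "req-42")
```

Key=value pairs like these are easy for a log server to extract as fields, which makes the later queries much simpler.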

PROPER LOGGING (CONT.)

E.g.:

127.0.0.1 [01/jan/2014:16:38:24 -0600] "GET /api/books_details HTTP/1.1" 200 476

127.0.0.1 [01/jan/2014:16:39:24 -0600] "GET /api/magazines_details HTTP/1.1" 404 500

From the log server, we can query for the GET calls that returned a 404 status code.
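Assuming the log lines follow exactly the format shown above (client IP, bracketed timestamp, quoted request line, status, response size), a minimal Python sketch of that query is a regular expression plus a filter. On a log server such as Splunk the same question would be a search over extracted fields. The regex and the `failing_calls` helper are illustrative and would need adjusting for other formats, such as the full Apache Common Log Format with its ident and user fields.

```python
import re

# Matches the format shown above: IP, [timestamp], "METHOD path protocol", status, size.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)

def failing_calls(lines, method="GET", status=404):
    """Return (timestamp, path) for every request with the given method and status."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m["method"] == method and int(m["status"]) == status:
            hits.append((m["timestamp"], m["path"]))
    return hits

sample = [
    '127.0.0.1 [01/jan/2014:16:38:24 -0600] "GET /api/books_details HTTP/1.1" 200 476',
    '127.0.0.1 [01/jan/2014:16:39:24 -0600] "GET /api/magazines_details HTTP/1.1" 404 500',
]
print(failing_calls(sample))
# → [('01/jan/2014:16:39:24 -0600', '/api/magazines_details')]
```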