Continuous Performance Monitoring of a Distributed Application [CON4730]

Post on 26-May-2015

DESCRIPTION

JavaOne 2013. Copyright © 2013, Oracle and/or its affiliates. All rights reserved.

TRANSCRIPT

Copyright © 2012, Oracle and/or its affiliates. All rights reserved.

Continuous Performance Monitoring of a Distributed Application

Ashish Srivastava, Principal Member of Technical Staff

Diana Yuryeva, Senior Member of Technical Staff

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Session Goal

Arrive at solution for extreme performance monitoring

§ Components:

– Design patterns

– Tools

§ Qualities:

– Continuous

– Light-weight

– Recordable

Session Agenda

§ Use Case

§ Software Patterns

§ Tools

§ Pitfalls and Advice

§ Q&A

About Us

§ Oracle Billing and Revenue Management Elastic Charging Engine

– 100% real-time charging application

– Java

– Distributed grid

– Oracle Coherence

– Oracle NoSQL

– Focus on extreme performance

Operating Conditions

§ Low latency expectations

§ Heavy system load

§ Distributed environment

§ Multi-level software stack

Monitoring Requirements

Functional

§ Detailed insight about performance

– Latency

– Throughput

§ View over time

§ Reporting

§ Bottleneck detection

§ View of system as cohesive unit

Monitoring Requirements

Non-Functional

§ Minimal impact on processing

§ Ease of use

§ Separation of concerns

Approach

How do I address these requirements?

§ Off-the-shelf software not sufficient

§ Custom development needed

– Incorporate monitoring into system

– Collect, analyze and present metrics

Collecting Metrics

Problem overview

§ Goal

– Incorporate metrics collection into general processing

§ Approach

– Enhance domain model with monitoring-related data structures

Collecting Metrics

Model of sample system

[Diagram: ECE Client and Network Mediation exchange a request and a response with Node1, Node2 and Node3; points A, B, C, C', B', A' are marked along the path.]

― Debatch the requests

― Data lookups

― Apply Tariff

― Save Session

― Prepare Response

― ..

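Collecting Metrics

Solution

[Diagram: the sample system model (ECE Client, Network Mediation, Node1, Node2, Node3, with points A, B, C, C', B', A' along the request and response path). Processing steps: Debatch the requests, Data lookups, Apply Tariff, Save Session, Prepare Response, ..]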

[Diagram: the Client Node and the Processing Node exchange an Envelope. Labeled elements: Routing Context, Payload, Tracking Context, Chronicler (TimePoints), Stat Reporter (harvest).]
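The element names on this slide suggest a message wrapper that carries timing data along with the request. The following is a minimal sketch of that idea, not the ECE implementation: the `Point` labels, the string routing key and all method names are assumptions made for illustration. Timestamps are passed in explicitly because `System.nanoTime()` values are only comparable within one JVM, so each span should start and end on the same node.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical labels for the points at which a timestamp is recorded. */
enum Point { CLIENT_SEND, NODE_RECEIVE, NODE_REPLY, CLIENT_RECEIVE }

/** Records one timestamp per point; travels with the request inside the envelope. */
final class Chronicler {
    private final Map<Point, Long> timePoints = new LinkedHashMap<>();

    void mark(Point point, long nanoTime) {
        timePoints.put(point, nanoTime);
    }

    /** Elapsed nanoseconds between two recorded points (both must have been marked). */
    long elapsedNanos(Point from, Point to) {
        return timePoints.get(to) - timePoints.get(from);
    }
}

/** Wraps the payload together with routing and tracking data. */
final class Envelope<T> {
    final String routingContext;                       // assumed: a simple routing key
    final T payload;
    final Chronicler trackingContext = new Chronicler();

    Envelope(String routingContext, T payload) {
        this.routingContext = routingContext;
        this.payload = payload;
    }
}

/** Harvests completed envelopes and keeps running totals. */
final class StatReporter {
    private long count;
    private long totalLatencyNanos;

    void harvest(Envelope<?> envelope) {
        count++;
        totalLatencyNanos += envelope.trackingContext
                .elapsedNanos(Point.CLIENT_SEND, Point.CLIENT_RECEIVE);
    }

    long count() {
        return count;
    }

    double avgLatencyMillis() {
        return count == 0 ? 0.0 : totalLatencyNanos / 1_000_000.0 / count;
    }
}
```

The business code only sees the payload; the tracking data rides along in the envelope, which is one way to keep the separation of concerns listed in the requirements.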

Collecting Metrics

Result

##### Elapsed time = 3600 seconds

##### Avg throughput = 20000 ops/sec

##### Avg latency = 50 ms

Granular Reporting

Problem overview

§ Goal

– I need more granular reporting of performance over time

§ Approach

– Enhance reporting of collected metrics

Granular Reporting

Solution – data structure

[Diagram: moving reporting window, with labels "Chronicler added" and "Chronicler removed"]

― A moving reporting window

― 100% reporting

― Sampled reporting

― Stats exposed over JMX

― Fixed data set for a window
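A moving window with a fixed data set can be sketched as a bounded queue of per-interval aggregates: starting a new interval evicts the oldest one, so memory use stays constant however long the run lasts. The class and method names below are hypothetical, the sketch is not thread-safe, and it leaves out the JMX exposure mentioned on the slide.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Aggregated latency statistics for one reporting interval. */
final class IntervalStats {
    long count;
    long minNanos = Long.MAX_VALUE;
    long maxNanos = Long.MIN_VALUE;
    long sumNanos;

    void record(long latencyNanos) {
        count++;
        minNanos = Math.min(minNanos, latencyNanos);
        maxNanos = Math.max(maxNanos, latencyNanos);
        sumNanos += latencyNanos;
    }
}

/** Keeps at most `capacity` intervals; starting a new one evicts the oldest. */
final class MovingWindow {
    private final int capacity;
    private final Deque<IntervalStats> intervals = new ArrayDeque<>();

    MovingWindow(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.capacity = capacity;
        roll();
    }

    /** Starts a new interval, dropping the oldest one if the window is full. */
    void roll() {
        if (intervals.size() == capacity) {
            intervals.removeFirst();
        }
        intervals.addLast(new IntervalStats());
    }

    /** Adds a latency sample to the current (newest) interval. */
    void record(long latencyNanos) {
        intervals.getLast().record(latencyNanos);
    }

    int size() {
        return intervals.size();
    }

    /** Number of samples across all intervals still in the window. */
    long totalCount() {
        long total = 0;
        for (IntervalStats s : intervals) {
            total += s.count;
        }
        return total;
    }

    /** Largest latency in the window, or Long.MIN_VALUE if the window is empty. */
    long maxNanos() {
        long max = Long.MIN_VALUE;
        for (IntervalStats s : intervals) {
            if (s.count > 0) {
                max = Math.max(max, s.maxNanos);
            }
        }
        return max;
    }
}
```

In a real system `roll()` would be driven by a timer (for example once per second), and `record()` would be called for every request or for a sample of them.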

Granular Reporting

Solution – class diagram

Granular Reporting

Result

§ I can see min/max/avg latency and throughput over time

§ My throughput reporting is quite good: I can see whether I had stable or erratic throughput

Latency Percentile Report

Problem overview

§ Goal

– Latencies are still not detailed enough. I need to know more than the average/min/max latencies

– Need to guarantee that 99.999% of the requests take less than 55ms

§ Approach

– Introduce range bucketing to count latencies

Latency Percentile Report

Solution

§ Pre-defined buckets of latency percentiles

§ Data set does not grow. Each bucket is updated

§ Multiple percentile breakdown

– End-to-end

– Server side processing

– Per batch reporting
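Range bucketing can be sketched as a fixed array of counters, one per latency range: recording a sample increments one counter, and a percentile is read by walking the cumulative counts. The slides do not state the bucket boundaries, so the 1 ms bucket width, the overflow bucket and every name below are assumptions.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Fixed set of 1 ms buckets; the last bucket also collects everything slower. */
final class LatencyBuckets {
    private final AtomicLongArray counts;

    LatencyBuckets(int maxMillis) {
        counts = new AtomicLongArray(maxMillis + 1);
    }

    /** Increments the bucket for this latency; the data set never grows. */
    void record(long latencyMillis) {
        long clamped = Math.min(Math.max(latencyMillis, 0), counts.length() - 1);
        counts.incrementAndGet((int) clamped);
    }

    long total() {
        long total = 0;
        for (int i = 0; i < counts.length(); i++) {
            total += counts.get(i);
        }
        return total;
    }

    /** Smallest bucket such that at least `percentile` percent of samples are at or below it. */
    long percentileMillis(double percentile) {
        long total = total();
        if (total == 0) {
            return 0;
        }
        long needed = (long) Math.ceil(total * percentile / 100.0);
        long seen = 0;
        for (int i = 0; i < counts.length(); i++) {
            seen += counts.get(i);
            if (seen >= needed) {
                return i;
            }
        }
        return counts.length() - 1;
    }

    /** True if the given percentile of requests completed in under `limitMillis`. */
    boolean meets(double percentile, long limitMillis) {
        return percentileMillis(percentile) < limitMillis;
    }
}
```

A check such as `buckets.meets(99.999, 55)` then expresses the goal stated earlier. Separate instances could be kept for the end-to-end, server-side and per-batch breakdowns.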

Latency Percentile Report

Result – printed report

2013-03-21 08:44:29.112 PDT INFO ##### Latency statistics based on percentiles:

Percentile: 0.1, Latency: 1ms, Total Count: 1173148

Percentile: 1.0, Latency: 2ms, Total Count: 100909763

Percentile: 10.0, Latency: 2ms, Total Count: 100909763

Percentile: 95.0, Latency: 26ms, Total Count: 685176664

Percentile: 99.0, Latency: 50ms, Total Count: 713029967

Percentile: 99.5, Latency: 58ms, Total Count: 716355711

Percentile: 99.9, Latency: 78ms, Total Count: 719217619

Percentile: 99.99, Latency: 104ms, Total Count: 719836971

Percentile: 99.999, Latency: 128ms, Total Count: 719897850

Percentile: 100.0, Latency: 169ms, Total Count: 719904814

Latency Percentile Report

Result – heat map

Method Breakdown

Problem overview

§ Goal

– I want to measure the impact of a new method under varying load

– End-to-end latency always ON

– Minimum performance impact

§ Approach

– Method annotations

– Aspect

Method Breakdown

Solution

public enum LabelEnum {
    APPLY_TARIFF,
    ...
    DEBATCH
}

public class ClassToBeTracked {
    @Track(pointLabel = LabelEnum.APPLY_TARIFF)
    private <ReturnObject> method(<Parameters>) {
        ...
    }
}

<pointcut name="scope" expression="within(ClassName)"/>
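The slide pairs a `@Track` annotation with an aspect and a pointcut. To stay self-contained, the sketch below swaps in a different mechanism, a JDK dynamic proxy, to show the same idea of timing only annotated methods. It is not the approach from the talk: a proxy intercepts interface methods only, so unlike the aspect it cannot time the private method shown on the slide. The `Charging` interface, `ChargingImpl` and `Tracker` are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.EnumMap;
import java.util.Map;

enum Label { DEBATCH, APPLY_TARIFF }

/** Marks a method whose execution time should be recorded under a label. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Track {
    Label pointLabel();
}

/** Hypothetical service interface; only the annotated method is timed. */
interface Charging {
    @Track(pointLabel = Label.APPLY_TARIFF)
    long applyTariff(long units);

    long untracked(long units);
}

final class ChargingImpl implements Charging {
    @Override
    public long applyTariff(long units) {
        return units * 3;
    }

    @Override
    public long untracked(long units) {
        return units;
    }
}

final class Tracker {
    /** Accumulated nanoseconds per label (not thread-safe in this sketch). */
    final Map<Label, Long> nanosByLabel = new EnumMap<>(Label.class);

    /** Returns a proxy that times calls to methods annotated with @Track. */
    <T> T wrap(T target, Class<T> type) {
        Object proxy = Proxy.newProxyInstance(
                type.getClassLoader(),
                new Class<?>[] {type},
                (p, method, args) -> invokeTimed(target, method, args));
        return type.cast(proxy);
    }

    private Object invokeTimed(Object target, Method method, Object[] args) throws Throwable {
        Track track = method.getAnnotation(Track.class);
        long start = System.nanoTime();
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();  // rethrow the original exception from the target
        } finally {
            if (track != null) {
                nanosByLabel.merge(track.pointLabel(), System.nanoTime() - start, Long::sum);
            }
        }
    }
}
```

Usage would look like `Charging c = new Tracker().wrap(new ChargingImpl(), Charging.class);`. Reflection adds overhead per call, so a woven aspect is likely the better fit for the "minimum performance impact" goal.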

Method Breakdown

Result – detailed breakdown report

2013-07-15 16:29:24.953 PDT Chronicler Breakdown:

DEBATCH -> 64149 nanoseconds

LOOKUP_DATA -> 1056748 nanoseconds

APPLY_TARIFF -> 99994 nanoseconds

SAVE_SESSION -> 12989 nanoseconds

PREPARE_RESPONSE -> 15998 nanoseconds
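Reading this report for bottlenecks is simple arithmetic: the five steps sum to 1,249,878 ns (about 1.25 ms), and LOOKUP_DATA alone accounts for about 85% of that. The helper below is a small illustrative sketch of that calculation; the `Breakdown` class and its methods are hypothetical, and only the numbers come from the report.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class Breakdown {
    /** The per-step timings from the printed report, in nanoseconds. */
    static Map<String, Long> sample() {
        Map<String, Long> nanos = new LinkedHashMap<>();
        nanos.put("DEBATCH", 64_149L);
        nanos.put("LOOKUP_DATA", 1_056_748L);
        nanos.put("APPLY_TARIFF", 99_994L);
        nanos.put("SAVE_SESSION", 12_989L);
        nanos.put("PREPARE_RESPONSE", 15_998L);
        return nanos;
    }

    static long total(Map<String, Long> nanosByLabel) {
        long total = 0;
        for (long nanos : nanosByLabel.values()) {
            total += nanos;
        }
        return total;
    }

    /** Label of the step that took the longest, or null for an empty map. */
    static String slowest(Map<String, Long> nanosByLabel) {
        String slowest = null;
        long max = Long.MIN_VALUE;
        for (Map.Entry<String, Long> e : nanosByLabel.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                slowest = e.getKey();
            }
        }
        return slowest;
    }

    /** Fraction of the total time spent in the given step. */
    static double share(Map<String, Long> nanosByLabel, String label) {
        return (double) nanosByLabel.get(label) / total(nanosByLabel);
    }
}
```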

Storing and Presenting Metrics

Problem overview

§ Goal

– I collect detailed performance metrics, but I need to report them too

– I need a tool which stores these metrics and presents them in a unified view

§ Approach

– Create monitoring dashboard

– Technologies: JRDS, RRD and in-house development

Storing and Presenting Metrics

Result – monitoring dashboard

Configuration: Topology: 24 servers, Throughput: 20000 ops/sec, Duration: 10 hrs

Storing and Presenting Metrics

Solution qualities

§ Graphical

§ Supports various metrics

– Application-specific

– Machine-specific

– JVM-specific

§ Consolidated view

– All graphs on one page

Storing and Presenting Metrics

Solution qualities

§ Easy to use

– Collects and saves data automatically

§ Easy to share

– Includes configuration for future reference

– Send links to web pages

– Print page as PDF

Storing and Presenting Metrics

Solution qualities

§ Stores data without losing precision

§ Supports drilling down

§ Light-weight

§ Customizable

Pitfalls and Advice

Take into consideration

§ Distributed system monitoring != Single JVM monitoring

– Consolidated view is critical

§ Consistency of tools across team is important

– Same language across development, QE and Performance teams saves hours

§ Solution should enable you to be agile

– Run monitoring on laptop AND realistic setup

Pitfalls and Advice

Some things to pay attention to

§ These hide problems

– Averaging

– Sampling

§ GC has big impact, so include it in your metrics

§ Watch out for processes sharing the same host

§ Always run long-duration tests
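One way to include GC in the metrics is to read the JVM's standard garbage collector MXBeans at the start and end of each reporting interval and report the difference alongside latency and throughput. The sketch below uses only `java.lang.management`; the `GcSnapshot` class is a hypothetical name and not something shown in the talk.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Cumulative GC activity of this JVM at one point in time. */
final class GcSnapshot {
    final long collections;
    final long timeMillis;

    private GcSnapshot(long collections, long timeMillis) {
        this.collections = collections;
        this.timeMillis = timeMillis;
    }

    /** Sums collection counts and accumulated collection time over all collectors. */
    static GcSnapshot take() {
        long collections = 0;
        long timeMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report the value.
            collections += Math.max(0, gc.getCollectionCount());
            timeMillis += Math.max(0, gc.getCollectionTime());
        }
        return new GcSnapshot(collections, timeMillis);
    }

    /** GC activity between an earlier snapshot and this one. */
    GcSnapshot since(GcSnapshot earlier) {
        return new GcSnapshot(collections - earlier.collections,
                              timeMillis - earlier.timeMillis);
    }
}
```

Plotting the per-interval GC time next to the latency percentiles makes it easier to see whether a latency spike coincides with a collection.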

Session Summary

Let's see how we addressed original requirements

§ Detailed insight about performance

– Latency

– Throughput

§ View over time

§ Reporting

§ Bottleneck detection

§ View of system as cohesive unit

Links

§ ECE

– http://www.oracle.com/us/products/applications/communications/elastic-charging-engine

§ JRDS

– http://www.jrds.fr
