
Hotspot Detection in a Service Oriented Architecture

Pranay Anchuri, anchupa@cs.rpi.edu, http://cs.rpi.edu/~anchupa, Rensselaer Polytechnic Institute, Troy, NY

Roshan Sumbaly, roshan@coursera.org Coursera, Mountain View, CA

Sam Shah, samshah@linkedin.com LinkedIn, Mountain View, CA


Introduction


LinkedIn: the largest professional network.

300M members from 200 countries.

2 new members per second.

Service Oriented Architecture



What is a Hotspot?

Hotspot: a service responsible for the suboptimal performance of a user-facing functionality.

Performance measures: latency, cost to serve, error rate.

Who uses hotspot detection?

Engineering teams: minimize latency for the user; increase the throughput of the servers.

Operations teams: reduce the cost of serving user requests.

Goal


Data - Service Call Graphs

Service call metrics are logged into a central system.

The call graph structure is reconstructed from a randomly generated per-request trace id (a reconstruction sketch follows below).
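Below is a minimal sketch of this reconstruction, assuming a hypothetical flat log schema (trace_id, call_id, parent_id, service, start, end); the deck does not specify the actual logging format.

```python
# A minimal sketch under an assumed log schema: each record is a dict with
# trace_id, call_id, parent_id (None for the frontend), service, start, end.
# All records passed in share a single trace id.
from collections import defaultdict

def build_call_graph(records):
    """Rebuild one request's call tree from its logged service calls."""
    children = defaultdict(list)
    root = None
    for r in records:
        if r["parent_id"] is None:
            root = r  # the user-facing frontend call
        else:
            children[r["parent_id"]].append(r)
    # Order each service's subcalls by start time to recover the call sequence.
    for subs in children.values():
        subs.sort(key=lambda s: s["start"])
    return root, children
```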

Example of Service Call Graph

[Figure: call graph for a "Read profile" request; the frontend calls Content Service, Context Service, a second Content Service, Entitlements, and Visibility, with latency annotations on the calls.]


Challenges in mining hotspots


Structure of call graphs

The structure of call graphs changes rapidly across requests. It depends on the member's attributes, A/B testing, and changes to the code base.

Over 90% of structures are unique for the most requested services.

Asynchronous service calls

Calls A→B and A→C can be:

Serial: C is called after B returns to A.

Parallel: B and C are called at the same time, or within a brief time span.

Parallel service calls are particularly difficult to handle; the degree of parallelism is ~20 for some services. (A timestamp-based classification sketch follows.)
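A minimal sketch of the serial/parallel distinction under the same assumed record format; EPSILON stands in for the "brief time span" above and is an assumed threshold.

```python
EPSILON = 0.05  # assumed "brief time span" threshold, in the log's time units

def call_relation(b, c):
    """Return 'serial' or 'parallel' for sibling subcalls b and c."""
    if b["start"] > c["start"]:
        b, c = c, b  # ensure b starts first
    # Parallel if c starts before b returns, or almost immediately after.
    return "parallel" if c["start"] < b["end"] + EPSILON else "serial"
```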

Related Work

Hu et al. [SIGCOMM 04, INFOCOM 05]: tools to detect bottlenecks along network paths.

Mann et al. [USENIX 11]: models to estimate latency as a function of RPCs' latencies.

Why don't existing methods work?

The metric cannot be controlled, as it can be in bottleneck detection algorithms.

We analyze millions of small networks, not one large one.

Parallel service calls.

Our approach



Optimize and summarize approach

● Given call graphs

● Hotspots in each call graph

● Ranking hotspots

What are the top-k hotspots in a call graph?

Hotspots in a specific call graph, irrespective of other call graphs for the same type of request.

Key Idea

Which k services, had they already been optimized, would have led to the maximum reduction in the latency of the request? (Specific to a particular call graph.)


Quantifying impact of a service

What if a service were optimized by a factor of θ? (Think after the fact.)

Its internal computations become θ times faster.

There is no effect on the overall latency if its parent is waiting on another service call to return. (A simulation sketch follows.)
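A minimal simulation sketch of this "what if" question, reusing the record format assumed earlier. For simplicity it treats all subcalls as serial; parallel subcalls, which the paper handles, would require tracking overlapping intervals.

```python
# A minimal sketch (not the paper's full algorithm): estimate a request's
# end-to-end latency if `target`'s internal computation ran theta times
# faster. Assumes serial subcalls only.
def speedup_latency(node, children, target, theta):
    """Latency of `node`'s subtree with service `target` optimized by theta."""
    subs = children.get(node["call_id"], [])
    sub_time = sum(speedup_latency(s, children, target, theta) for s in subs)
    # Computation time = own latency minus time spent waiting on subcalls.
    comp = (node["end"] - node["start"]) - sum(s["end"] - s["start"] for s in subs)
    if node["service"] == target:
        comp /= theta  # internal computations become theta times faster
    return comp + sub_time
```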

Example

[Figure: timing diagram of the example call graph; each service is annotated with its (start, end) interval: [0,11], [0,3], [1,2], [1.3, 1.6], [2.1, 2.5], [4,11], [6,9], [7,8]. Subsequent builds mark one service as 2x faster and show the effect of the 2x speedup on the overall latency.]

Local effect of optimization

Latency: the sum of computation and waiting times.

Effect: shorter computation times and earlier subcalls. If $s$ is a service and $c$ is its subcall issued after computation intervals $I_1, \ldots, I_m$, then optimizing $s$ by a factor $\theta$:

1) shrinks each computation interval from $|I_j|$ to $|I_j| / \theta$;

2) leaves the waiting times on subcalls unchanged;

3) starts $c$ earlier by the computation time saved before it:

$$\mathrm{start}'(c) = \mathrm{start}(c) - \left(1 - \frac{1}{\theta}\right) \sum_{j=1}^{m} |I_j|$$
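As an illustrative instance of the formula (numbers invented for this example): with $\theta = 2$, a subcall issued at time $4$ after computation intervals totaling $3$ time units starts $1.5$ units earlier:

$$\mathrm{start}'(c) = 4 - \left(1 - \tfrac{1}{2}\right) \cdot 3 = 2.5$$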

Negative example

[Figure: the same timing diagram; here the optimized service's speedup does not reach the frontend, because its parent is waiting on another subcall to return.]


Under the propagation assumption

Computing the optimal set of services is NP-hard, by reduction from a variation of the subset sum problem. Construction and proof in the paper.

Relaxation

A variation of the propagation assumption that allows a service to propagate fractional effects to its parent. This leads to a greedy algorithm.

Greedy algorithm to compute top-k hotspots

Given an optimization factor θ:

Repeatedly select the service that has the maximum impact on the frontend service.

Update the times after each selection.

Stop after k iterations.

(A sketch of this loop follows.)
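A minimal sketch of the greedy loop, reusing speedup_latency from the earlier sketch; the per-iteration time update is simplified here, as noted in the comment.

```python
# A minimal sketch of the greedy selection: pick the k services whose
# theta-fold speedup most reduces the frontend's latency.
def greedy_top_k(root, children, services, theta, k):
    baseline = speedup_latency(root, children, target=None, theta=theta)
    hotspots = []
    for _ in range(k):
        candidates = [s for s in services if s not in hotspots]
        best = max(candidates,
                   key=lambda s: baseline - speedup_latency(root, children, s, theta))
        hotspots.append(best)
        # A full implementation would update the call graph's timestamps here,
        # so that later iterations see the effect of optimizing `best`.
    return hotspots
```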

Ranking hotspots

Top services change significantly across different call graphs.

Rank hotspots on:

Frequency (via itemset mining).

Impact on the frontend service.

(An aggregation sketch follows.)
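A minimal aggregation sketch; plain frequency counting stands in for the paper's itemset mining, and the per-graph (service, impact) pairs are assumed to come from the top-k step above.

```python
# A minimal sketch: merge per-call-graph hotspots into one global ranking.
from collections import Counter, defaultdict

def rank_hotspots(per_graph_hotspots):
    """per_graph_hotspots: one [(service, impact), ...] list per call graph."""
    freq, impact = Counter(), defaultdict(float)
    for hotspots in per_graph_hotspots:
        for service, imp in hotspots:
            freq[service] += 1
            impact[service] += imp
    # Rank by frequency, breaking ties by total impact on the frontend service.
    return sorted(freq, key=lambda s: (freq[s], impact[s]), reverse=True)
```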

Rest of the paper

A similar approach applied to the cost-of-request metric.

A generalized framework for optimizing arbitrary metrics.

Other ranking schemes.

Results


Dataset

Request type | Avg # call graphs per day* | Avg # service calls per request | Avg # subcalls per service | Max # parallel subcalls
Home    | 10.2 M | 16.90 | 1.88 |  9.02
Mailbox | 3.33 M | 23.31 | 1.90 |  8.88
Profile | 3.14 M | 17.31 | 1.86 | 11.04
Feed    | 1.75 M | 16.29 | 1.87 |  8.97

* Scaled down by a constant factor

vs Baseline algorithm


User of the system


Consistency over a time period



Conclusions

Defined hotspots in service-oriented architectures.

A framework to mine hotspots with respect to various performance metrics.

Experiments on real-world, large-scale datasets.

Thanks! Questions?
