Post on 31-Dec-2015
Hotspot Detection in a Service Oriented Architecture
Pranay Anchuri, anchupa@cs.rpi.edu, http://cs.rpi.edu/~anchupa Rensselaer Polytechnic Institute, Troy, NY
Roshan Sumbaly, roshan@coursera.org Coursera, Mountain View, CA
Sam Shah, samshah@linkedin.com LinkedIn, Mountain View, CA
www.rpi.edu
Introduction
Largest professional network.
300M members from 200 countries.
2 new members per second.
Service Oriented Architecture
What is a Hotspot
Hotspot: a service responsible for suboptimal performance of a user-facing functionality.
Performance measures: latency, cost to serve, error rate.
Who uses hotspot detection?
Engineering teams: minimize latency for the user; increase the throughput of the servers.
Operations teams: reduce the cost of serving user requests.
Goal
Data - Service Call Graphs
Service call metrics are logged into a central system.
Call graph structure is reconstructed from a random trace id.
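A minimal sketch of how call graphs might be rebuilt from centrally logged records that share a trace id. The record layout (trace id, caller, callee, start, end) and the service names are assumptions for illustration; the actual log schema is not given in the slides.

```python
from collections import defaultdict

# Hypothetical log records: (trace_id, caller, callee, start, end).
# The field layout is an assumption; the real schema is not specified.
LOG = [
    ("t1", None, "read-profile", 0, 11),
    ("t1", "read-profile", "content", 0, 3),
    ("t1", "read-profile", "context", 4, 11),
    ("t2", None, "read-profile", 0, 5),
    ("t2", "read-profile", "content", 1, 4),
]

def build_call_graphs(records):
    """Group call records by trace id. For each trace, return a mapping
    caller -> list of (callee, start, end)."""
    graphs = defaultdict(lambda: defaultdict(list))
    for trace_id, caller, callee, start, end in records:
        graphs[trace_id][caller].append((callee, start, end))
    # Convert nested defaultdicts to plain dicts for the caller.
    return {tid: dict(g) for tid, g in graphs.items()}

graphs = build_call_graphs(LOG)
```

Here the entry with caller `None` marks the frontend request, and every other entry is an edge of that request's call graph.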
Example of Service Call Graph
[Figure: service call graph for a "Read profile" request with nodes Content Service, Context Service, Content Service, Entitlements, and Visibility. Numeric labels as extracted: 3 712, 10 11.]
Challenges in mining hotspots
Structure of call graphs
The structure of call graphs changes rapidly across requests. It depends on:
Member’s attributes.
A/B testing.
Changes to the code base.
Over 90% unique structures for the most requested services.
Asynchronous service calls
Calls A→B and A→C are:
Serial: C is called after B returns to A.
Parallel: B and C are called at the same time or within a brief time span.
Parallel service calls are particularly difficult to handle.
Degree of parallelism is ~20 for some services.
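A small sketch of how serial and parallel subcalls could be told apart from logged (start, end) times, and how a degree of parallelism could be measured. It assumes two calls are parallel when their intervals overlap; the "brief time span" criterion from the slide is not modeled, and the function names are hypothetical.

```python
def is_parallel(b, c):
    """True if calls b and c, given as (start, end), overlap in time,
    i.e. one is issued before the other has returned."""
    return b[0] < c[1] and c[0] < b[1]

def degree_of_parallelism(subcalls):
    """Maximum number of subcalls of one parent in flight at once."""
    events = []
    for start, end in subcalls:
        events.append((start, 1))   # call issued
        events.append((end, -1))    # call returned
    # At equal timestamps, process returns before new calls so that
    # back-to-back serial calls are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    in_flight = best = 0
    for _, delta in events:
        in_flight += delta
        best = max(best, in_flight)
    return best

serial_calls = [(0, 3), (4, 11)]
overlapping_calls = [(0, 3), (1, 2), (2, 5)]
```

With these inputs, `serial_calls` has degree 1 and `overlapping_calls` has degree 2.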
Related Work
Hu et al. [SIGCOMM 04, INFOCOM 05]: tools to detect bottlenecks along network paths.
Mann et al. [USENIX 11]: models to estimate latency as a function of RPC latencies.
Why don’t existing methods work?
The metric cannot be controlled, as it can in bottleneck detection algorithms.
Analyzing millions of small networks.
Parallel service calls.
Our approach
Optimize and summarize approach
● Given call graphs
● Hotspots in each call graph
● Ranking hotspots
What are the top-k hotspots in a call graph?
Hotspots in a specific call graph, irrespective of other call graphs for the same type of request.
Key Idea
Which k services, had they already been optimized, would have led to the maximum reduction in the latency of the request? (Specific to a particular call graph.)
Quantifying impact of a service
What if a service had been optimized by θ? (Think after the fact.)
Its internal computations are θ times faster.
No effect on the overall latency if its parent is waiting on another service call to return.
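A toy illustration of the two points above, under a deliberately simplified model that is an assumption and not the update rule from the paper: the parent does some computation of its own, issues all subcalls in parallel, and waits for the slowest one. Subcalls are treated as leaves whose whole latency is internal computation.

```python
def parent_latency(own_compute, parallel_subcalls):
    """Latency of a parent that does own_compute units of work and then
    waits for all of its parallel subcalls (hypothetical model)."""
    return own_compute + max(parallel_subcalls.values(), default=0.0)

def latency_reduction(own_compute, parallel_subcalls, service, theta):
    """Reduction in parent latency if `service` had been theta times
    faster, with every other subcall left unchanged."""
    before = parent_latency(own_compute, parallel_subcalls)
    sped_up = dict(parallel_subcalls)
    sped_up[service] = sped_up[service] / theta
    return before - parent_latency(own_compute, sped_up)

# Hypothetical subcall latencies for one parent.
subcalls = {"B": 3.0, "C": 7.0}
gain_c = latency_reduction(1.0, subcalls, "C", theta=2.0)  # 3.5
gain_b = latency_reduction(1.0, subcalls, "B", theta=2.0)  # 0.0
```

Speeding up C shortens the parent because the parent was waiting on C. Speeding up B changes nothing, because the parent is still waiting on C to return.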
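Example
[Figure: call graph annotated with [start, end] intervals: [0,11], [0,3], [1,2], [1.3, 1.6], [2.1, 2.5], [4,11], [6,9], [7,8]. One service is marked "2x faster", and the figure shows the effect of the 2x speedup.]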
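Local effect of optimization
Latency: sum of computation and waiting times.
Effect: shorter computation times and earlier subcalls.
[Equations 1) to 3) are not recoverable from the transcript. The accompanying definition refers to a service and to its subcall issued after a number of computation intervals.]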
Negative example
[Figure: call graph annotated with [start, end] intervals: [0,11], [0,3], [1,2], [1.3, 1.6], [2.1, 2.5], [4,11], [6,9], [7,8].]
Example
Under the propagation assumption
Computing the optimal services is NP-hard.
Reduction from a variation of the subset sum problem.
Construction and proof in the paper.
Relaxation
Variation of the propagation assumption that allows a service to propagate fractional effects to its parent.
Leads to a greedy algorithm.
Greedy algorithm to compute top-k hotspots
Given an optimization factor θ:
Repeatedly select the service that has the maximum impact on the frontend service.
Update the times after each selection.
Stop after k iterations.
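The loop above can be sketched as follows. This is an illustration under a simplified latency model, not the paper's update rules (which are not in the slides): each service has its own computation time, its children run either all in parallel or all serially, and optimizing a service divides its own computation time by θ. The `Service` class and all names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Service:
    name: str
    compute: float                 # own computation time
    parallel: bool = False         # True: children run in parallel
    children: list = field(default_factory=list)

def latency(node, factor):
    """Latency of `node`; factor[name] divides that service's own
    computation time (1.0 when the service is not optimized)."""
    own = node.compute / factor.get(node.name, 1.0)
    child = [latency(c, factor) for c in node.children]
    if not child:
        return own
    return own + (max(child) if node.parallel else sum(child))

def all_names(node):
    yield node.name
    for c in node.children:
        yield from all_names(c)

def greedy_hotspots(root, k, theta):
    """Pick up to k services, each time the one whose theta-times
    speedup most reduces the latency of the root (frontend) service."""
    chosen, factor = [], {}
    for _ in range(k):
        base = latency(root, factor)
        best_name, best_gain = None, 0.0
        for name in all_names(root):
            if name in factor:
                continue  # already selected in an earlier iteration
            gain = base - latency(root, {**factor, name: theta})
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:
            break  # no remaining service reduces the latency
        factor[best_name] = theta  # update the times for the next round
        chosen.append((best_name, best_gain))
    return chosen

root = Service("frontend", 1.0, parallel=True, children=[
    Service("B", 3.0),
    Service("C", 6.0, children=[Service("D", 2.0)]),
])
top2 = greedy_hotspots(root, k=2, theta=2.0)  # [("C", 3.0), ("D", 1.0)]
```

In this example B is never selected: the frontend waits on the slower branch through C, so speeding up B alone does not change the end-to-end latency.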
Ranking hotspots
Top services change significantly across different call graphs.
Rank hotspots on:
Frequency (itemset mining).
Impact on the frontend service.
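A sketch of the two ranking criteria, assuming each call graph has already produced a list of (service, impact) hotspots. Frequency is counted per single service here, which is only the simplest case of itemset mining; the input data and function name are hypothetical.

```python
from collections import Counter

def rank_hotspots(per_graph_hotspots):
    """per_graph_hotspots: one list of (service, impact) per call graph.
    Returns two rankings, each a list of (service, score) sorted in
    descending order: by frequency and by total impact."""
    freq, impact = Counter(), Counter()
    for hotspots in per_graph_hotspots:
        for service, gain in hotspots:
            freq[service] += 1       # in how many graphs it is a hotspot
            impact[service] += gain  # summed latency reduction
    return freq.most_common(), impact.most_common()

# Hypothetical per-call-graph results.
results = [
    [("C", 3.0), ("D", 1.0)],
    [("C", 2.0), ("B", 4.5)],
    [("B", 2.5), ("C", 1.0)],
]
by_frequency, by_impact = rank_hotspots(results)
```

The two criteria can disagree: in this data C is the most frequent hotspot, while B has the largest total impact.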
Rest of the paper
Similar approach applied to the cost-of-request metric.
Generalized framework for optimizing arbitrary metrics.
Other ranking schemes.
Results
Dataset
Request type | Avg # of call graphs per day* | Avg # of service calls per request | Avg # of subcalls per service | Max # of parallel subcalls
Home | 10.2 M | 16.90 | 1.88 | 9.02
Mailbox | 3.33 M | 23.31 | 1.9 | 8.88
Profile | 3.14 M | 17.31 | 1.86 | 11.04
Feed | 1.75 M | 16.29 | 1.87 | 8.97
* Scaled down by a constant factor
vs Baseline algorithm
User of the system
Consistency over a time period
Conclusion
Conclusions
Defined hotspots in service oriented architectures.
Framework to mine hotspots w.r.t. various performance metrics.
Experiments on real-world, large-scale datasets.
Thanks
Questions?