Systems Support for End-to-End Performance Management
Sandip Agarwala
PhD Advisor: Karsten Schwan
College of Computing
Georgia Tech
Reasons for Complexity
• Application diversity
• Interdependencies
• Heterogeneous components
– Too many different technologies and platforms
• Too few “hints” from the system to the administrators
– Legacy issues; application-specific solutions
• Insufficient information about the system to drive self-management
Lack of Automation
Online System Management
[Figure: management loop with Monitor, Analyze, Control, and Execute stages, driven by the workload]
• Scheduling
• Capacity and SLA management
• Design evaluation and tuning
• Bottleneck detection
• Resource provisioning, accounting, etc.
Proposed Approach: Service Path
[Figure: a service path from the Internet through a proxy server, front-end web servers, a middle-tier servlet server, application logic (EJBs, etc.), and a database back-end]
• System abstractions that describe the dynamic dependencies between the different distributed application components
• Service Class: Application-level request class, e.g. SLA class
Outline
• Background
• Motivation
• Service path
– Discovery with E2EProf
– Refinement with SysProf
– Automated SLA Enforcement
• Related Work
• Future Plans
E2EProf
• Black-box approach
• Correlate per-edge time series signals
• Monitor network packet traces (source, destination, timestamps)
• Model traces as per-edge time series signals or density functions
[Figure: graph with labels A, X, B, C, D; per-edge signals D1 on (A→B) and D2 on (B→C), each plotted against time]
Basic Approach
• Compute the cross-correlation of D1 and D2
• (A→B) with (B→C): spike
– Spike indicates causality
– Spike's position gives the delay at B
• (A→B) with (B→D): no spike
[Figure: graph with labels A, X, B, C, D]
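The spike test on this slide can be illustrated with a small sketch. It is not E2EProf's implementation: the function names, the bin width, the 0.5 spike threshold, and the packet timestamps below are all assumptions made for illustration. Packet timestamps on each edge are binned into a time series, the series are cross-correlated over a range of lags, and a strong peak is read as a causal relation whose lag approximates the delay at B.

```python
from collections import Counter

def to_series(timestamps, bin_width, n_bins):
    """Count packets per time bin: a per-edge time series signal."""
    counts = Counter(int(t // bin_width) for t in timestamps)
    return [counts.get(i, 0) for i in range(n_bins)]

def cross_correlation(x, y, max_lag):
    """Normalized cross-correlation of x with y for lags 0..max_lag."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    norm = (sum(v * v for v in dx) * sum(v * v for v in dy)) ** 0.5
    if norm == 0:
        return [0.0] * (max_lag + 1)
    return [sum(dx[i] * dy[i + lag] for i in range(n - lag)) / norm
            for lag in range(max_lag + 1)]

def find_spike(corr, threshold=0.5):
    """Return the lag of the strongest peak, or None if no peak reaches the threshold."""
    lag = max(range(len(corr)), key=corr.__getitem__)
    return lag if corr[lag] >= threshold else None

# Hypothetical packet timestamps (seconds) on three edges.
a_to_b = [1, 4, 9, 15, 22, 30, 31, 40]
b_to_c = [t + 3 for t in a_to_b]           # B forwards to C after 3 s
b_to_d = [0, 7, 13, 17, 26, 28, 37, 47]    # unrelated traffic

ab = to_series(a_to_b, 1.0, 50)
bc = to_series(b_to_c, 1.0, 50)
bd = to_series(b_to_d, 1.0, 50)

delay_bc = find_spike(cross_correlation(ab, bc, max_lag=10))  # 3: spike, causal
delay_bd = find_spike(cross_correlation(ab, bd, max_lag=10))  # None: no spike
```

A fixed threshold is the simplest possible spike detector; a real deployment would likely need a noise-aware criterion.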
Evaluation with 4-tier RUBiS¹
[Figure: clients issuing "comment" and "bidding" requests to an Apache web server, Tomcat servers 1 and 2, EJB servers 1 and 2, and a MySQL server; workloads labelled CPU-bound and I/O-bound]
¹ http://rubis.objectweb.org/
Service Path Detection in RUBiS
[Figure: detected service paths with the highest-delay nodes marked; panels: static server assignment, round-robin load balancer]
Delta Air Lines’ Application
• Revenue Pipeline
• Total traffic: 1.34 million / day (56k / hour)
[Figure: pipeline with TACSIN & TACSOUT, XIN & XOUT, APEXIN & APEXOUT, and error/warning (Tivoli) logs; plot of latency (sec) against time of the day]
Delta Air Lines’ Application
[Figure: client requests, TACS, and servers S1, S2, S3, S7, S8; annotation: huge request burst]
Outline
• Background
• Motivation
• Service path
– Discovery with E2EProf
– Refinement with SysProf
– Automated SLA Enforcement
• Related Work
• Future Plans
Beyond dependency and latency…
[Figure: clients C1, C2 and servers S1 to S6]
Solution: Zoom into the service path with SysProf
• No application hints or instrumentation
• Monitor resource usage on a per-class basis
SysProf Methodology
[Figure: instrumentation points across user and kernel level: applications A1, A2, ..., AN and app functions in user space; system calls (system call parameters, PID), scheduler (context switches), FS/VM/etc., network stack (net softirq), eth driver, and BDD (disk I/O) in the kernel; CID initialized on the path from the client, with a return path to the client]
• Track request context
– Work done for processing a request class
– May span user level and kernel level
– May execute in more than one context (e.g., processes, threads, softirqs)
– Occurs within system-visible events (e.g., system calls)
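As a rough illustration of per-class tracking, the sketch below charges CPU time to a request class from a stream of context-switch events. It is a toy model and not SysProf's kernel code: the event tuple layout, the `switch_in`/`switch_out` names, and the class labels are assumptions chosen for the example.

```python
from collections import defaultdict

def cpu_time_per_class(events):
    """Sum CPU time per class ID from time-ordered scheduler events.

    Each event is a (time, kind, pid, cid) tuple, where kind is
    'switch_in' or 'switch_out' and cid is the class the process
    is serving when it is scheduled in (hypothetical format).
    """
    usage = defaultdict(float)
    running = {}  # pid -> (start time, class ID at switch-in)
    for time, kind, pid, cid in events:
        if kind == "switch_in":
            running[pid] = (time, cid)
        elif kind == "switch_out" and pid in running:
            start, start_cid = running.pop(pid)
            usage[start_cid] += time - start
    return dict(usage)

# The same process (PID 101) serves two different classes over time.
events = [
    (0.0, "switch_in", 101, "gold"),
    (2.0, "switch_out", 101, "gold"),
    (2.0, "switch_in", 102, "silver"),
    (3.5, "switch_out", 102, "silver"),
    (3.5, "switch_in", 101, "silver"),
    (4.5, "switch_out", 101, "silver"),
]
usage = cpu_time_per_class(events)  # {'gold': 2.0, 'silver': 2.5}
```

Charging time to the class held at switch-in, rather than to the process, is what lets one process be billed to several classes.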
Class ID Propagation
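[Figure: class ID (CID) propagation across the front tier, middle tier, and end tier, at user and kernel level: the CID is initialized for a request arriving from the client, carried as process CID, message CID, and packet CID, and inherited downstream; the response returns to the client]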
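The propagation idea can be sketched as a toy model in which a packet is classified once at the front tier, a process inherits the class ID of the packet it receives, and every message it sends carries that ID onward. The `Packet` and `Process` types, the `init_cid` helper, and the classification rule are hypothetical; SysProf performs this inside the kernel, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Packet:
    payload: str
    cid: Optional[str] = None  # class ID carried with the packet

@dataclass
class Process:
    name: str
    cid: Optional[str] = None  # class ID the process currently serves

    def receive(self, packet: Packet) -> None:
        # The receiving process inherits the packet's class ID.
        self.cid = packet.cid

    def send(self, payload: str) -> Packet:
        # Outgoing messages are tagged with the sender's class ID.
        return Packet(payload, self.cid)

def init_cid(packet: Packet, classify: Callable[[str], str]) -> Packet:
    """Assign a class ID once, where the request enters the system."""
    packet.cid = classify(packet.payload)
    return packet

def classify(payload: str) -> str:
    # Hypothetical rule: request classes named after the RUBiS workloads.
    return "bidding" if "bid" in payload else "comment"

front, middle, end = Process("front-tier"), Process("middle-tier"), Process("end-tier")

request = init_cid(Packet("GET /bid?item=42"), classify)
front.receive(request)
to_middle = front.send("invoke bid logic")
middle.receive(to_middle)
to_end = middle.send("SELECT ... FROM bids")
end.receive(to_end)
# All three tiers now account their work to the same class ID.
```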
Applications of SysProf
• Resource Accounting
• Utility Billing
• Bottleneck detection
• Capacity Estimation
• Root-Cause Analysis
• Black-Box SLA management
Resource-Aware Adaptive Control
[Figure: request classes 1, 2, and 3 served by Tomcat servers 1 and 2, EJB servers 1 and 2, and a MySQL server]
Cluster workloads contending for the same resources
Separate Queue/Controller for each cluster
[Equation, garbled in extraction: $W(i,j)$, defined over $k \in$ set of resources in terms of the ratios $r_{i,k}/R_k$ and $r_{j,k}/R_k$]
[Figure label: Front-end Controller + Scheduler]
Summary
• Service Path
– System abstractions to represent dependencies and request path
• E2EProf and Pathmap
– Dependency and latency analysis
• SysProf
– Service-based resource analysis
• Aid human operators and automate end-to-end performance management