Locality-Aware Request Distribution in Cluster-based Network Servers
Presented by: Kevin Boos
Authors: Vivek S. Pai, Mohit Aron, et al.
Rice University
ASPLOS 1998
*** Figures adapted from original presentation ***
Time Warp to 1998
  Rapid Internet growth
  Bandwidth limitations
  “Cheap” PCs and “fast” LANs
  Need for increased throughput
Clustered Servers
  [Figure: cluster diagram with two clients, a front-end node, a LAN (switch), and three back-end nodes]
Weighted Round Robin (WRR)
Pure Locality-Based Distribution
Motivation for Change
  Weighted Round Robin
    Disregards content on back-end nodes
    Many cache misses
    Limited by disk performance
  Pure Locality-Based Distribution
    Disregards current load on back-end nodes
    Uneven load distribution
    Inefficient use of resources
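The contrast between the two earlier strategies can be sketched in a few lines. This is a minimal illustration rather than code from the paper: the function names, the static per-node weights, and the use of a hash to pin each target to a node are all assumptions made for the example.

```python
import itertools
import zlib


def weighted_round_robin(nodes, weights):
    """Return an endless iterator over nodes, ignoring the requested target.

    `weights` (hypothetical) is the number of turns each node gets per cycle.
    """
    schedule = [n for n, w in zip(nodes, weights) for _ in range(w)]
    return itertools.cycle(schedule)


def locality_based(nodes, target):
    """Pick a node from the target name alone, ignoring current load.

    Hashing the target is an assumption of this sketch; any fixed
    target-to-node partition would behave the same way.
    """
    return nodes[zlib.crc32(target.encode()) % len(nodes)]


nodes = ["be0", "be1", "be2"]

# WRR: three requests for the same target land on three different nodes,
# so every node's cache ends up holding the same content.
wrr = weighted_round_robin(nodes, [1, 1, 1])
wrr_choices = [next(wrr) for _ in range(3)]

# Locality-based: the same target always lands on the same node,
# even if that node is already the busiest one.
lb_choices = [locality_based(nodes, "/a.html") for _ in range(3)]
```

Neither function looks at both pieces of information at once, which is the gap LARD is meant to close.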
LARD Concepts
  Locality-Aware Request Distribution
  Goal: improve performance
    Higher throughput
    Higher cache hit rates
    Reduced disk access
  Even load distribution + content-based distribution
    The best of both algorithms
Outline
  Basic LARD Algorithm
  Improvements to LARD
  TCP Handoff Protocol
  Simulation and Results
  Prototype Implementation and Testing
Basic LARD Algorithm
  Front-end maps target content to back-end nodes
    1-to-1 mapping (each target maps to exactly one back-end node)
  First request for each target is assigned to the least-loaded back-end node
  Subsequent requests are distributed to the same back-end node based on target content mapping
    Unless that node is overloaded
    In that case the front-end re-assigns the target content to a new back-end node
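The mapping steps above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function name, the dictionary layout, and the `is_overloaded` hook are hypothetical, and the overload test itself is left to the caller.

```python
def basic_lard_assign(target, mapping, load, is_overloaded):
    """Return the back-end node for `target`, updating `mapping` in place.

    mapping: dict target -> node (one node per target)
    load: dict node -> current load
    is_overloaded: callable(node) -> bool, supplied by the caller (hypothetical)
    """
    node = mapping.get(target)
    if node is None or is_overloaded(node):
        # First request, or the assigned node is overloaded:
        # (re-)assign the target to the least-loaded node.
        node = min(load, key=load.get)
        mapping[target] = node
    return node


load = {"be0": 3, "be1": 1, "be2": 2}
mapping = {}

first = basic_lard_assign("/a.html", mapping, load, lambda n: False)
again = basic_lard_assign("/a.html", mapping, load, lambda n: False)

# Simulate be1 becoming overloaded; the target moves to another node.
load["be1"] = 9
moved = basic_lard_assign("/a.html", mapping, load, lambda n: load[n] > 8)
```

Here the first request goes to the least-loaded node, the repeat request follows the mapping, and the target is re-assigned only once the caller's overload test fires.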
Flow of Basic LARD
  [Figure: request flow between a client and the front-end node]
Determining Load in Basic LARD
  Ask the server?
    Introduces unnecessary communication
  Current load = number of open connections
    Tracked in the front-end node
  Use thresholds to determine when to re-balance
    Low, High, and Limit (Tlow, Thigh, Tlimit)
    Re-balance when (load > Tlimit) or (load > Thigh and there is a “free” node with load < Tlow)
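The re-balance condition on this slide translates directly into a predicate. The sketch below assumes load is the per-node count of open connections kept by the front-end; the function name and the threshold values in the example are illustrative only.

```python
def needs_rebalance(node, load, t_low, t_high, t_limit):
    """Re-balance test from the slide; load maps node -> open connections."""
    over_limit = load[node] > t_limit
    # A "free" node is any other node whose load is below t_low.
    free_node_exists = any(
        conns < t_low for other, conns in load.items() if other != node
    )
    return over_limit or (load[node] > t_high and free_node_exists)


# Illustrative thresholds, chosen only for this example.
T_LOW, T_HIGH, T_LIMIT = 25, 65, 130

busy_with_free_node = needs_rebalance(
    "be0", {"be0": 70, "be1": 10, "be2": 40}, T_LOW, T_HIGH, T_LIMIT
)
busy_without_free_node = needs_rebalance(
    "be0", {"be0": 70, "be1": 30, "be2": 40}, T_LOW, T_HIGH, T_LIMIT
)
over_hard_limit = needs_rebalance(
    "be0", {"be0": 131, "be1": 30, "be2": 40}, T_LOW, T_HIGH, T_LIMIT
)
```

A node above Thigh keeps its targets when no other node is below Tlow, which avoids shuffling targets between nodes that are all busy.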
LARD Needs Improvement
  Only one back-end node per target content
    Working set is a single node
    Front-end must limit total connections
  Still need to increase throughput
    One node per content type is unrealistic
    …add more back-end nodes?
LARD/R
  LARD with Replication
  Maps target content to a set of back-end nodes
    Working set is several nodes with similar cache content
  Sends new requests to least-loaded node in set
  Moves nodes to/from sets based on load imbalance
    Idle nodes in a low-load set are moved to higher-load set
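A minimal sketch of the replication idea, under stated assumptions: each target maps to a set of nodes, requests go to the least-loaded member, and the set grows when that member is too busy. The function name and the `needs_more_capacity` hook are hypothetical, and removing nodes from a set is left out for brevity.

```python
def lard_r_assign(target, server_sets, load, needs_more_capacity):
    """Pick a node for `target` from its server set, growing the set if needed.

    server_sets: dict target -> set of nodes
    load: dict node -> current load
    needs_more_capacity: callable(node) -> bool (hypothetical overload test)
    """
    nodes = server_sets.setdefault(target, set())
    if not nodes:
        # First request for this target: start with the least-loaded node.
        nodes.add(min(load, key=load.get))
    # sorted() only makes tie-breaking deterministic.
    chosen = min(sorted(nodes), key=load.get)
    if needs_more_capacity(chosen):
        spare = [n for n in load if n not in nodes]
        if spare:
            # Add the least-loaded node from outside the set.
            chosen = min(spare, key=load.get)
            nodes.add(chosen)
    return chosen


load = {"be0": 5, "be1": 1, "be2": 3}
server_sets = {}
too_busy = lambda n: load[n] > 4

a = lard_r_assign("/a.html", server_sets, load, too_busy)
load["be1"] = 6                      # be1 becomes busy
b = lard_r_assign("/a.html", server_sets, load, too_busy)
c = lard_r_assign("/a.html", server_sets, load, too_busy)
```

After the second call the target is served by two nodes, and later requests pick whichever of the two is less loaded.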
Flow of LARD/R
  [Figure: request flow between a client and the front-end node]
Determining Content Type
  How do we determine content in the front-end?
    Front-end must see network traffic
  Standard TCP Assumptions
    Requests are small and light
    Responses are big and heavy
  How do we forward requests?
Potential TCP Solutions
  Simple TCP Proxy
    Everything must flow through front-end node
    Can inspect all incoming content
    Cannot respond directly from back-end to client
    But front-end can also inspect all outgoing content
    Better for persistent connections
TCP Connection Handoff
  Front-end connects to client
  Inspects content
  Forwards request to back-end node
  Response is returned directly to the client from the back-end node
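To see why handoff matters when requests are small and responses are large, the sketch below compares how many payload bytes the front-end relays under each approach. It is a deliberately simplified model and an assumption of this example: ACKs and connection setup are ignored, and the request and response sizes are made up.

```python
def front_end_bytes(requests, mode):
    """Payload bytes relayed by the front-end for (request, response) sizes.

    Simplified model (assumption): ACKs and connection setup are ignored.
    """
    if mode == "proxy":
        # Both directions flow through the front-end.
        return sum(req + resp for req, resp in requests)
    if mode == "handoff":
        # Only the request passes through; the back-end replies directly.
        return sum(req for req, _ in requests)
    raise ValueError(f"unknown mode: {mode}")


requests = [(300, 12_000), (250, 48_000)]   # made-up sizes in bytes
proxy_bytes = front_end_bytes(requests, "proxy")
handoff_bytes = front_end_bytes(requests, "handoff")
```

With these sizes the proxy relays 60,550 bytes while the handoff front-end relays 550, so the front-end is far less likely to become the bottleneck.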
Evaluation Goals
  Throughput
    Requests/second served by entire cluster
  Hit rate
    (Requests that hit memory cache) / (total requests)
  Underutilization time
    Time that a node’s load is ≤ 40% of Tlow
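The three metrics can be written down as small helper functions. This is a sketch under assumptions: the function names are hypothetical, and underutilization is estimated from load samples taken at a fixed interval, which the slides do not specify.

```python
def throughput(total_requests, elapsed_seconds):
    """Requests per second served by the entire cluster."""
    return total_requests / elapsed_seconds


def hit_rate(cache_hits, total_requests):
    """Fraction of requests that hit the memory cache."""
    return cache_hits / total_requests


def underutilization_time(load_samples, t_low, interval_seconds):
    """Time during which a node's load is at most 40% of t_low.

    load_samples: loads sampled every `interval_seconds`
    (fixed-interval sampling is an assumption of this sketch).
    """
    threshold = 0.4 * t_low
    return sum(interval_seconds for load in load_samples if load <= threshold)


rps = throughput(1200, 60)                                  # 20.0 requests/s
hits = hit_rate(900, 1200)                                  # 0.75
idle = underutilization_time([0, 5, 9, 11, 30], 25, 1.0)    # 3.0 seconds
```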
Simulation Model
  300MHz Pentium II
  32MB Memory (cache)
  100Mbps Ethernet
  Traces from web servers at Rice and IBM
Simulation Results – Prior Work
  Weighted Round Robin
    Lowest throughput
    Highest cache miss ratio
    But lowest idle time
  Pure Locality-Based
    An increase in nodes yields a decrease in cache miss ratio
    But idle time increases (unbalanced load)
    Only minor improvement over WRR
Simulation Results – LARD & LARD/R
  Throughput ~4x better (8 nodes)
    WRR would need nodes with a 10x larger cache size
  CPU bound after 8 nodes
  Cache miss rate decreases
  Only 1% idle time on average
Simulation Results – Throughput
Simulation Results – Cache Misses
Simulation Results – Idle Time
What Affects Performance?
  WRR is disk-bound, LARD/R is CPU-bound
    Increasing CPU speed improves LARD/R, not WRR
    Adding more disks improves WRR, not LARD/R
    LARD/R shows no improvement if a node has > 2 disks
  WRR is not scalable
Prototype Implementation
  One front-end PC
    300MHz Pentium II, 128MB RAM
  6 back-end PCs
  7 client PCs
    166MHz Pentium Pro, 64MB RAM
  100Mb Ethernet, 24-port switch
Prototype Testing Results
Evaluation Shortcomings
  What influences the results more?
    LARD/R protocol?
    TCP handoff protocol?
Conclusion
  LARD and LARD/R significantly better than WRR
    Higher throughput
    Better CPU utilization
    More frequent cache hits
    Reduced disk access
  Benefits of Locality-Based and Load-Balanced
  Scalable at low cost