locality-aware request distribution in cluster-based network servers

Locality-Aware Request Distribution in Cluster-based Network Servers
Presented by: Kevin Boos

Authors: Vivek S. Pai, Mohit Aron, et al.
Rice University
ASPLOS 1998
*** Figures adapted from original presentation ***

Time Warp to 1998

Rapid Internet growth
Bandwidth limitations
“Cheap” PCs and “fast” LANs
Need for increased throughput

Clustered Servers

[Figure: two clients, a front-end node, and three back-end nodes connected through a LAN switch]

Weighted Round Robin (WRR)


Pure Locality-Based Distribution


Motivation for Change

Weighted Round Robin: disregards the content cached on back-end nodes, which causes many cache misses; performance is limited by the disks

Pure Locality-Based Distribution: disregards the current load on back-end nodes, which causes uneven load distribution and inefficient use of resources
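A toy trace-driven sketch of the cache-miss argument above; it is not the paper's simulator. Assumptions: each back-end node is modeled as an LRU cache holding a fixed number of whole targets, WRR uses equal weights (plain round robin), and the locality policy is a hypothetical static partition of targets that ignores load. The trace and cache sizes are made up for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Toy model of one back-end node's memory cache, holding whole targets."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # target -> None, least recently used first

    def access(self, target: str) -> bool:
        """Return True on a hit; on a miss, insert the target and evict the LRU entry if full."""
        if target in self.items:
            self.items.move_to_end(target)
            return True
        self.items[target] = None
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)
        return False


def hit_rate(trace, num_nodes, capacity, choose_node):
    """Replay a request trace through a distribution policy and return the overall hit rate."""
    caches = [LRUCache(capacity) for _ in range(num_nodes)]
    hits = 0
    for i, target in enumerate(trace):
        node = choose_node(i, target)
        if caches[node].access(target):
            hits += 1
    return hits / len(trace)


def round_robin(num_nodes):
    """Equal-weight round robin: ignores content and cycles through the nodes."""
    return lambda i, target: i % num_nodes


def partition_by_target(num_nodes):
    """Hypothetical pure locality policy: each target is pinned to one node, ignoring load."""
    assignment = {}

    def choose(i, target):
        if target not in assignment:
            assignment[target] = len(assignment) % num_nodes
        return assignment[target]

    return choose


# Made-up trace: 8 distinct targets requested cyclically; 3 nodes caching 3 targets each.
trace = [f"/doc{k % 8}" for k in range(80)]
print("round robin:", hit_rate(trace, 3, 3, round_robin(3)))
print("by target:", hit_rate(trace, 3, 3, partition_by_target(3)))
```

With this deliberately adversarial cyclic trace, round robin never hits (every node sees all 8 targets, more than its cache holds) while the partition hits 90% of the time (0.0 versus 0.9). Real traces are far less extreme, but the direction of the effect is the same.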


LARD Concepts

Locality-Aware Request Distribution (LARD) goal: improve performance
Higher throughput
Higher cache hit rates
Reduced disk access

Even load distribution + content-based distribution: the best of both algorithms

Outline

Basic LARD Algorithm
Improvements to LARD
TCP Handoff Protocol
Simulation and Results
Prototype Implementation and Testing


Basic LARD Algorithm

The front-end maps each target (requested content) to exactly one back-end node

The first request for a target is assigned to the least-loaded back-end node

Subsequent requests for that target go to the same back-end node, unless that node is overloaded; then the target is re-assigned to a new back-end node
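A minimal sketch of the basic LARD mapping described above, assuming load is the count of active connections per back-end node and that "overloaded" means reaching a single hypothetical threshold `overload_limit`. The class and method names are illustrative, not taken from the paper.

```python
class BasicLard:
    """Sketch of basic LARD: every target is mapped to exactly one back-end node."""

    def __init__(self, nodes, overload_limit):
        self.load = {node: 0 for node in nodes}  # active connections per back-end node
        self.server = {}                          # target -> assigned back-end node
        self.overload_limit = overload_limit      # hypothetical overload threshold

    def least_loaded(self):
        # Ties go to the node listed first.
        return min(self.load, key=self.load.get)

    def dispatch(self, target):
        """Pick a back-end node for a request and count the new connection."""
        node = self.server.get(target)
        if node is None or self.load[node] >= self.overload_limit:
            # First request for this target, or its node is overloaded:
            # (re-)assign the target to the least-loaded node.
            node = self.least_loaded()
            self.server[target] = node
        self.load[node] += 1
        return node

    def finish(self, node):
        """Called when a connection to `node` closes."""
        self.load[node] -= 1


lard = BasicLard(["b1", "b2", "b3"], overload_limit=2)
print([lard.dispatch(t) for t in ["/a", "/a", "/b", "/a"]])
```

Here the third request for `/a` finds `b1` at the limit, so `/a` is re-assigned to `b3`, the least-loaded node, and later `/a` requests follow it there.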


Flow of Basic LARD

[Figure: a client's requests flow through the front-end to back-end nodes]

Determining Load in Basic LARD

Ask the back-end servers? That introduces unnecessary communication

Current load = number of active connections to each back-end node, tracked in the front-end node

Thresholds T_low, T_high, and T_limit determine when to re-balance. Re-balance when
(load > T_limit) or
(load > T_high and there is a “free” node with load < T_low)
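The re-balancing rule above can be written as a small predicate. This is a sketch assuming T_low < T_high < T_limit and that `all_loads` holds the front-end's connection counts for every back-end node; the function name and the example threshold values are hypothetical.

```python
def should_rebalance(load, all_loads, t_low, t_high, t_limit):
    """Return True if a target should move off a node that currently has `load` connections.

    Re-balance when the node is past the hard limit T_limit, or when it is
    above T_high while some "free" node has fewer than T_low connections.
    """
    if load > t_limit:
        return True
    return load > t_high and any(other < t_low for other in all_loads)


# Hypothetical thresholds: T_low = 15, T_high = 25, T_limit = 50.
print(should_rebalance(30, [30, 10, 40], 15, 25, 50))  # a free node exists
print(should_rebalance(30, [30, 20, 40], 15, 25, 50))  # no free node, below T_limit
```

The second call shows the role of T_limit: a node above T_high keeps its targets while no other node is free, and only gives them up once it exceeds T_limit.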


LARD Needs Improvement

Only one back-end node per target
Each target's working set lives on a single node
The front-end must limit the total number of connections

Still need to increase throughput
One node per content type is unrealistic
…add more back-end nodes?

LARD/R

LARD with Replication: maps each target to a set of back-end nodes

The working set is spread over several nodes with similar cache content

New requests go to the least-loaded node in the set; nodes are moved to/from sets based on load imbalance, and idle nodes in a low-load set are moved to a higher-load set
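A sketch of LARD/R under stated assumptions: load is again the front-end's count of active connections; a set grows by adding the cluster's least-loaded node when the set's best node trips the basic LARD re-balance rule; and it shrinks by dropping its most-loaded member once it has been unchanged for `shrink_after` time units. The shrink rule, its parameter, and all names here are assumptions for illustration, and a node may belong to several sets at once.

```python
class LardR:
    """Sketch of LARD with replication: each target maps to a set of back-end nodes."""

    def __init__(self, nodes, t_low, t_high, t_limit, shrink_after, clock):
        self.load = {node: 0 for node in nodes}  # active connections per node
        self.server_set = {}                      # target -> list of member nodes
        self.last_change = {}                     # target -> time its set last changed
        self.t_low, self.t_high, self.t_limit = t_low, t_high, t_limit
        self.shrink_after = shrink_after          # hypothetical stability window
        self.clock = clock                        # callable returning the current time

    def _least_loaded(self, nodes):
        return min(nodes, key=self.load.get)

    def _overloaded(self, node):
        # Same re-balance rule as basic LARD.
        load = self.load[node]
        free_node_exists = any(other < self.t_low for other in self.load.values())
        return load > self.t_limit or (load > self.t_high and free_node_exists)

    def dispatch(self, target):
        now = self.clock()
        members = self.server_set.get(target)
        if not members:
            # First request: start the set with the least-loaded node overall.
            node = self._least_loaded(self.load)
            self.server_set[target] = [node]
            self.last_change[target] = now
        else:
            # Shrink (assumed rule): drop the busiest member once the set has been stable.
            if len(members) > 1 and now - self.last_change[target] > self.shrink_after:
                members.remove(max(members, key=self.load.get))
                self.last_change[target] = now
            node = self._least_loaded(members)
            if self._overloaded(node):
                # Grow: the least-loaded node in the cluster joins the set.
                node = self._least_loaded(self.load)
                if node not in members:
                    members.append(node)
                    self.last_change[target] = now
        self.load[node] += 1
        return node

    def finish(self, node):
        """Called when a connection to `node` closes."""
        self.load[node] -= 1


now = [0]
lard_r = LardR(["b1", "b2", "b3"], t_low=1, t_high=2, t_limit=4,
               shrink_after=10, clock=lambda: now[0])
print([lard_r.dispatch("/a") for _ in range(5)], lard_r.server_set["/a"])
```

With these made-up thresholds, the fourth request finds `b1` above T_high while `b2` is free, so `b2` joins the set for `/a` and takes the next requests. Advancing the clock past `shrink_after` (for example `now[0] = 20`) makes the following request drop the busier `b1` from the set.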


Flow of LARD/R

[Figure: a client's requests flow through the front-end to a set of back-end nodes]


Determining Content Type

How does the front-end determine the requested content? The front-end must see the network traffic
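To choose a node, the front-end only needs the target named in the client's request. A minimal sketch, assuming HTTP/1.x requests whose first bytes (the request line) have already been read from the client connection; the function name is hypothetical and the parsing is deliberately simplified (no handling of absolute-form URLs or partial reads).

```python
def request_target(raw: bytes) -> str:
    """Return the target named on the request line of an HTTP/1.x request.

    Simplified parser (assumption): expects "METHOD TARGET HTTP/x.y" on the first line.
    """
    request_line = raw.split(b"\r\n", 1)[0].decode("ascii")
    parts = request_line.split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        raise ValueError(f"malformed request line: {request_line!r}")
    _method, target, _version = parts
    return target


print(request_target(b"GET /images/logo.gif HTTP/1.0\r\nHost: example.com\r\n\r\n"))
```

The returned target is the key the front-end would look up in LARD's target-to-node mapping.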

Standard TCP assumptions: requests are small and light; responses are big and heavy

How do we forward requests?

Potential TCP Solutions

Simple TCP Proxy: everything must flow through the front-end node

Can inspect all incoming content

Cannot respond directly from back-end to client, but the front-end can also inspect all outgoing content

Better for persistent connections

TCP Connection Handoff: the front-end accepts the client connection

Inspects the request content and forwards the request to a back-end node

The response is returned directly to the client from the back-end node
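A back-of-the-envelope sketch of why handoff matters, given the assumption that requests are small and responses are large. The request and response sizes below are made-up numbers, and the model counts only payload bytes (TCP/IP headers and acknowledgments are ignored).

```python
def front_end_payload_bytes(request_sizes, response_sizes, handoff):
    """Payload bytes the front-end must relay for a batch of requests.

    With a simple TCP proxy, requests and responses both pass through the
    front-end. With connection handoff, the front-end still reads each
    request, but responses go straight from the back-end to the client.
    """
    inbound = sum(request_sizes)
    outbound = sum(response_sizes)
    return inbound if handoff else inbound + outbound


# Hypothetical workload: 1,000 requests of 300 bytes, each answered with 10,000 bytes.
requests = [300] * 1000
responses = [10_000] * 1000
print(front_end_payload_bytes(requests, responses, handoff=False))
print(front_end_payload_bytes(requests, responses, handoff=True))
```

In this example the proxy relays 10,300,000 bytes versus 300,000 with handoff, roughly 34 times less traffic through the front-end.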


Evaluation Goals

Throughput: requests/second served by the entire cluster

Hit rate: (requests that hit the memory cache) / (total requests)

Underutilization time: time during which a node’s load is ≤ 40% of T_low
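The three metrics can be computed from simple counters. A sketch assuming the simulator records total requests served, cache hits, and each node's load sampled at a fixed interval; the sampling approach and the function names are assumptions, while the 40% factor comes from the definition above.

```python
def throughput(requests_served: int, elapsed_s: float) -> float:
    """Requests per second served by the entire cluster."""
    return requests_served / elapsed_s


def cache_hit_rate(cache_hits: int, total_requests: int) -> float:
    """Fraction of requests served from a node's memory cache."""
    return cache_hits / total_requests


def underutilization_time(load_samples, t_low, sample_interval_s):
    """Seconds during which a node's sampled load was at most 40% of T_low (sampling is assumed)."""
    threshold = 0.4 * t_low
    return sum(sample_interval_s for load in load_samples if load <= threshold)


print(throughput(12_000, 10.0))                           # cluster requests/second
print(cache_hit_rate(900, 1_000))                         # hit rate
print(underutilization_time([0, 10, 2, 25, 3], 10, 0.5))  # idle seconds for one node
```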


Simulation Model

300MHz Pentium II
32MB memory (cache)
100Mbps Ethernet
Traces from web servers at Rice and IBM

Simulation Results – Prior Work

Weighted Round Robin: lowest throughput and highest cache miss ratio, but lowest idle time

Pure Locality-Based: adding nodes decreases the cache miss ratio, but idle time increases (unbalanced load); only a minor improvement over WRR

Simulation Results – LARD & LARD/R

Throughput ~4x better than WRR (8 nodes)
WRR would need nodes with a 10x larger cache size

CPU-bound after 8 nodes
Cache miss rate decreases
Only 1% idle time on average

Simulation Results – Throughput


Simulation Results – Cache Misses


Simulation Results – Idle Time


What Affects Performance?

WRR is disk-bound; LARD/R is CPU-bound
Increasing CPU speed improves LARD/R, not WRR
Adding more disks improves WRR, not LARD/R

LARD/R shows no improvement if a node has more than 2 disks

WRR is not scalable


Prototype Implementation

One front-end PC: 300MHz Pentium II, 128MB RAM

6 back-end PCs and 7 client PCs: 166MHz Pentium Pro, 64MB RAM

100Mbps Ethernet, 24-port switch

Prototype Testing Results


Evaluation Shortcomings

What influences the results more: the LARD/R strategy or the TCP handoff protocol?

Conclusion

LARD and LARD/R perform significantly better than WRR
Higher throughput
Better CPU utilization
More frequent cache hits
Reduced disk access

Combines the benefits of locality-based and load-balanced distribution
Scalable at low cost
