
1

Porcupine: A Highly Available Cluster-based Mail Service

Yasushi Saito, Brian Bershad, Hank Levy

University of Washington, Department of Computer Science and Engineering, Seattle, WA

http://porcupine.cs.washington.edu/

2

Why Email?

Mail is important: real demand

Mail is hard: write intensive, low locality

Mail is easy: well-defined API, large parallelism, weak consistency

3

Goals

Use commodity hardware to build a large, scalable mail service

Three facets of scalability:
• Performance: linear increase with cluster size
• Manageability: react to changes automatically
• Availability: survive failures gracefully

4

Conventional Mail Solution

Static partitioning

Performance problems: no dynamic load balancing

Manageability problems: manual data partition decisions

Availability problems: limited fault tolerance

[Figure: SMTP/IMAP/POP front ends statically mapped to per-user mailboxes (Bob's, Ann's, Joe's, Suzy's mbox) on NFS servers]

5

Presentation Outline

• Overview
• Porcupine Architecture: key concepts and techniques, basic operations and data structures, advantages
• Challenges and solutions
• Conclusion

6

Key Techniques and Relationships

Framework: Functional Homogeneity ("any node can perform any task")

Techniques: Automatic Reconfiguration, Load Balancing, Replication

Goals: Manageability, Performance, Availability

7

Porcupine Architecture

[Figure: Node A, Node B, ..., Node Z]

Every node runs the same components: SMTP server, POP server, IMAP server, user map, mail map, mailbox storage, user profile, load balancer, replication manager, and membership manager, all communicating over RPC.
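
Functional homogeneity follows directly from this picture: every node is assembled from the same parts. The sketch below shows that composition in code; the class names are purely illustrative and are not Porcupine's actual source types.

```cpp
// Sketch only: illustrative class names, not Porcupine's real code.
// Every node bundles the same services, so any node can perform any task.
struct SmtpServer {};  struct PopServer {};   struct ImapServer {};
struct UserMap {};     struct MailMap {};     struct MailboxStorage {};
struct UserProfileDb {};
struct ReplicationManager {};  struct MembershipManager {};
struct LoadBalancer {};        struct RpcLayer {};

struct PorcupineNode {
    SmtpServer smtp;  PopServer pop;  ImapServer imap;   // front-end protocol handlers
    UserMap user_map;          MailMap mail_map;         // soft state (rebuilt on membership change)
    MailboxStorage mailboxes;  UserProfileDb profiles;   // hard state (persistent, replicated)
    ReplicationManager replication;                      // keeps hard-state replicas consistent
    MembershipManager membership;                        // tracks which nodes are alive
    LoadBalancer balancer;                               // picks where new messages go
    RpcLayer rpc;                                        // node-to-node communication
};
```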

8

Porcupine Operations

[Figure: mail delivery path across cluster nodes]

1. "Send mail to bob" arrives from the Internet at a node chosen by DNS-RR selection (protocol handling).
2. Who manages bob? The user map names the managing node, A (user lookup).
3. "Verify bob" is sent to A.
4. A answers: "OK, bob has msgs on C and D."
5. The load balancer picks the best node to store the new msg: C (load balancing).
6. "Store msg" is sent to C (message store).
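
Read as control flow on the node that accepted the SMTP connection, the six steps might look like the sketch below. All helper names (hash_user, rpc_verify_user, pick_least_loaded, rpc_store_message) are invented for illustration; the data structures they consult appear on the next slide.

```cpp
// Control-flow sketch of message delivery; every helper here is hypothetical.
#include <set>
#include <string>
#include <vector>

extern std::vector<int> user_map;                     // bucket -> managing node id
int  hash_user(const std::string& user);              // user name -> bucket
bool rpc_verify_user(int manager, const std::string& user,
                     std::set<int>* holders);         // steps 3-4
int  pick_least_loaded(const std::set<int>& nodes);   // step 5 (load balancer)
void rpc_store_message(int node, const std::string& user,
                       const std::string& msg);       // step 6

// Runs on whichever node the DNS-RR selection handed the connection to (step 1).
void deliver(const std::string& user, const std::string& msg) {
    int manager = user_map[hash_user(user)];          // step 2: who manages bob?
    std::set<int> holders;                            // nodes already storing his mail
    if (!rpc_verify_user(manager, user, &holders))    // steps 3-4
        return;                                       // unknown user: bounce the message
    int target = pick_least_loaded(holders);          // step 5: best node for the new msg
    rpc_store_message(target, user, msg);             // step 6
}
```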

9

Basic Data Structures

[Figure: the three data structures, laid out across nodes A, B, and C]

User map (replicated on every node): apply a hash function to "bob" to get a bucket; the bucket's entry in a small table such as B C A C A B A C names the node that manages bob.

Mail map / user info (kept by each user's managing node): bob: {A,C}, ann: {B}, suzy: {A,C}, joe: {B}.

Mailbox storage: node A holds Bob's and Suzy's msgs, node B holds Ann's and Joe's msgs, node C holds Bob's and Suzy's msgs.
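
A self-contained toy version of the three structures, assuming a three-node cluster (A, B, C) and an 8-bucket user map; the values mirror the figure, but the types and the hash function are only illustrative.

```cpp
// Toy model of Porcupine's per-node data structures (illustrative only).
#include <functional>
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>

enum Node { A = 0, B = 1, C = 2 };

// User map: replicated on every node; a small bucket array naming each
// bucket's managing node (soft state).
std::vector<Node> user_map = {B, C, A, C, A, B, A, C};

// Mail map / user info: kept by the managing node; which nodes hold
// each user's messages (soft state).
std::map<std::string, std::set<Node>> mail_map = {
    {"bob", {A, C}}, {"ann", {B}}, {"suzy", {A, C}}, {"joe", {B}}};

// Mailbox storage: the messages physically stored on one node (hard state).
std::map<std::string, std::vector<std::string>> mailbox_on_A = {
    {"bob", {"msg1"}}, {"suzy", {"msg2"}}};

// "Apply hash function": user name -> bucket -> managing node.
Node lookup_manager(const std::string& user) {
    size_t bucket = std::hash<std::string>{}(user) % user_map.size();
    return user_map[bucket];
}

int main() {
    Node m = lookup_manager("bob");
    std::cout << "bob is managed by node " << char('A' + m) << "; his mail lives on";
    for (Node n : mail_map["bob"]) std::cout << ' ' << char('A' + n);
    std::cout << "\n";
}
```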

10

Porcupine Advantages

Advantages:
• Optimal resource utilization
• Automatic reconfiguration and task re-distribution upon node failure/recovery
• Fine-grain load balancing

Results: better availability, better manageability, better performance

11

Presentation Outline

• Overview
• Porcupine Architecture
• Challenges and solutions: scaling performance; handling failures and recoveries (automatic soft-state reconstruction, hard-state replication); load balancing
• Conclusion

12

Performance

Goal: scale performance linearly with cluster size

Strategy: avoid creating hot spots
• Partition data uniformly among nodes
• Fine-grain data partition

13

Measurement Environment

• 30-node cluster of not-quite-all-identical PCs
• 100Mb/s Ethernet + 1Gb/s hubs
• Linux 2.2.7
• 42,000 lines of C++ code
• Synthetic load
• Compared to sendmail+popd

14

How does Performance Scale?

[Figure: throughput (messages/second, 0-800) vs. cluster size (0-30 nodes). Porcupine scales roughly linearly, reaching about 68m messages/day at 30 nodes; sendmail+popd reaches about 25m messages/day.]

15

Availability

Goals:
• Maintain function after failures
• React quickly to changes regardless of cluster size
• Graceful performance degradation / improvement

Strategy: two complementary mechanisms
• Hard state (email messages, user profile): optimistic fine-grain replication
• Soft state (user map, mail map): reconstruction after membership change

16

Soft-state Reconstruction

[Figure: timeline of soft-state reconstruction on nodes A and B. After a membership change, each node recomputes the user map (the bucket-to-node table changes); mail map entries whose buckets kept their manager (bob: {A,C}, joe: {C}) are untouched, while entries for reassigned buckets (suzy, ann) start out empty and are refilled by the disk scan (suzy: {A,B}, ann: {B}).]

1. Membership protocol: user map recomputation

2. Distributed disk scan
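
One way to read the two steps: once the membership protocol has agreed on the live node set, every node recomputes the user map deterministically, and only buckets whose manager changed trigger a disk scan that rebuilds their mail map entries. The sketch below follows that reading; the reassignment policy (round-robin over live nodes) and all names are assumptions, not Porcupine's actual algorithm.

```cpp
// Simplified sketch of soft-state reconstruction (illustrative only).
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

using UserMap = std::vector<int>;  // bucket -> managing node id

// Step 1: after the membership protocol agrees on the live nodes, recompute
// the user map deterministically (here: round-robin over the live set, which
// is NOT Porcupine's actual policy).
UserMap recompute_user_map(size_t buckets, const std::set<int>& live_nodes) {
    UserMap m(buckets);
    size_t i = 0;
    for (size_t b = 0; b < buckets; ++b) {
        auto it = live_nodes.begin();
        std::advance(it, i++ % live_nodes.size());
        m[b] = *it;
    }
    return m;
}

// Step 2: every node scans its own disk and reports, for each user whose
// bucket changed manager, "I hold messages for this user"; the new manager
// rebuilds that user's mail map entry from these reports.
void rebuild_mail_map(const UserMap& old_map, const UserMap& new_map,
                      const std::map<std::string, int>& user_bucket,    // user -> bucket
                      const std::set<std::string>& users_on_this_disk,  // from local scan
                      int this_node,
                      std::map<std::string, std::set<int>>* mail_map_out) {
    for (const std::string& user : users_on_this_disk) {
        int b = user_bucket.at(user);
        if (old_map[b] != new_map[b])                    // only reassigned buckets are rebuilt
            (*mail_map_out)[user].insert(this_node);     // stand-in for an RPC to new_map[b]
    }
}
```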

17

How does Porcupine React to Configuration Changes?

[Figure: throughput (messages/second, 300-700) over time (0-800 seconds) under no failure, one node failure, three node failures, and six node failures. Annotations mark the points where nodes fail, new membership is determined, nodes recover, and new membership is determined again.]

18

Hard-state Replication

Goals:
• Keep serving hard state after failures
• Handle unusual failure modes

Strategy: exploit Internet semantics
• Optimistic, eventually consistent replication
• Per-message, per-user-profile replication
• Efficient during normal operation
• Small window of inconsistency
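
In code, "optimistic, eventually consistent" roughly means: apply an update at whichever replica received it, push it to the peers asynchronously, and let a per-object timestamp decide between conflicting versions. The sketch below illustrates that idea with a last-writer-wins rule; the types and the exact rule are assumptions for the example, not Porcupine's actual replication protocol.

```cpp
// Sketch of optimistic, per-object replication with eventual consistency.
// Types, names, and the last-writer-wins rule are assumptions for illustration.
#include <cstdint>
#include <map>
#include <string>

struct ReplicatedObject {            // one message or one user-profile entry
    std::string value;
    uint64_t timestamp = 0;          // logical clock of the latest update
    bool deleted = false;            // tombstone so deletes also propagate
};

struct Replica {
    std::map<std::string, ReplicatedObject> store;

    // Local update: apply immediately, without blocking on other replicas ...
    ReplicatedObject update(const std::string& key, const std::string& value,
                            uint64_t ts, bool deleted = false) {
        ReplicatedObject obj;
        obj.value = value;
        obj.timestamp = ts;
        obj.deleted = deleted;
        apply(key, obj);
        return obj;                  // ... then push this object to the other replicas
    }

    // Remote apply: keep the newer version (last writer wins), so replicas
    // converge once all updates have been exchanged.
    void apply(const std::string& key, const ReplicatedObject& incoming) {
        auto it = store.find(key);
        if (it == store.end() || incoming.timestamp > it->second.timestamp)
            store[key] = incoming;
    }
};

// Usage: either replica can accept the write; pushing the returned object to
// the peer makes both converge, leaving only a small window of inconsistency.
int main() {
    Replica a, b;
    ReplicatedObject msg = a.update("bob/msg-41", "Hello Bob", /*ts=*/7);
    b.apply("bob/msg-41", msg);      // asynchronous propagation in the real system
}
```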

19

How Efficient is Replication?

[Figure: throughput (messages/second, 0-800) vs. cluster size (0-30 nodes). Porcupine with no replication reaches about 68m messages/day at 30 nodes; Porcupine with replication=2 reaches about 24m messages/day.]

20

How Efficient is Replication?

[Figure: same plot with a third curve added. Porcupine with replication=2 and NVRAM reaches about 33m messages/day at 30 nodes, versus 68m/day with no replication and 24m/day with replication=2.]

21

Load balancing: Deciding where to store messages

Goals:
• Handle skewed workloads well
• Support hardware heterogeneity
• No voodoo parameter tuning

Strategy: spread-based load balancing
• Spread: soft limit on # of nodes per mailbox
• Large spread: better load balance; small spread: better affinity
• Load balanced within the spread
• Use # of pending I/O requests as the load measure
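
Spread-based balancing, read as code: start from the nodes that already hold the user's mailbox, widen the candidate set only while it is below the spread limit, then pick the candidate with the fewest pending I/O requests. The sketch below follows that description; the widening policy (admitting the globally least-loaded nodes) is an assumption made for the example.

```cpp
// Sketch of spread-based load balancing (widening policy is an assumption).
#include <algorithm>
#include <set>
#include <vector>

// Pick where to store a new message for one user.
//   current_holders: nodes already storing this user's mailbox (its "spread")
//   pending_io:      per-node count of pending disk I/O requests (the load measure)
//   spread_limit:    soft limit on how many nodes a mailbox may occupy
int pick_storage_node(std::set<int> current_holders,
                      const std::vector<int>& pending_io,
                      size_t spread_limit) {
    // If the mailbox has not yet reached its spread limit, allow new nodes:
    // admit the least-loaded nodes in the cluster until the limit is reached.
    std::vector<int> by_load(pending_io.size());
    for (size_t n = 0; n < by_load.size(); ++n) by_load[n] = n;
    std::sort(by_load.begin(), by_load.end(),
              [&](int x, int y) { return pending_io[x] < pending_io[y]; });
    for (int n : by_load) {
        if (current_holders.size() >= spread_limit) break;
        current_holders.insert(n);
    }

    // Within the spread, pick the node with the fewest pending I/O requests.
    int best = *current_holders.begin();
    for (int n : current_holders)
        if (pending_io[n] < pending_io[best]) best = n;
    return best;
}
```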

22

How Well does Porcupine Support Heterogeneous Clusters?

[Figure: throughput increase (%, 0-30%) vs. number of fast nodes (0%, 3%, 7%, 10% of total). With Spread=4, Porcupine gains up to about +16.8m messages/day (+25%); with static partitioning the gain is only about +0.5m messages/day (+0.8%).]

23

Conclusions

Fast, available, and manageable clusters can be built for write-intensive services

Key ideas can be extended beyond mail:
• Functional homogeneity
• Automatic reconfiguration
• Replication
• Load balancing

24

Ongoing Work

• More efficient membership protocol
• Extending Porcupine beyond mail: Usenet, BBS, Calendar, etc.
• More generic replication mechanism