
Scalable Wikipedia with Erlang

Thorsten Schütt 1

Thorsten Schütt, Florian Schintke , Alexander Reinefeld

Zuse Institute Berlin (ZIB)

onScale solutions

Scaling


Web 2.0 Hosting

1. Step

• Single Server

– Webserver

– DB Server

Clients


2. Step

• Single Webserver

Clients


• Single DB server

3. Step

• Load Balancer

• n Webservers

Clients


• Single DB server


4. Step

• Load Balancer

• n Webservers

Clients


• Master DB server

• Slave DB servers


5. Step

• Load Balancer

• n Webservers

Clients


• DB partitioning

• DB clusters

• n DB servers

• memcached

DB Directory


What did others do?

• SimpleDB

• BigTable

• Dynamo

• MySQL Cluster


• MapReduce

• Google FS

Our Approach: P2P in the Data Center

Clients

Key/Value Store

(simple DBMS)


P2P Overlay

Transactions

Replication

ACID

Scalable Wikipedia with Erlang

• Chord#

• Transactional Layer

• Load Balancing


• Wikipedia

Chord# = Chord \ Hash


Chord#

• A dictionary has 3 ops:

• insert(key, value)

• delete(key)

• lookup(key)

• Chord# implements a distributed dictionary


Key      Value
Clarke   2007
Allen    2006
Bachman  1973
Thompson 1983
Knuth    1974
Codd     1981

[Ring diagram: nodes A, B, C, D at keys Backus, Codd, Dijkstra, Floyd, Rivest, Wirth, Yao; each node only stores part of the data]

Chord#

• Key space is an arbitrary set with a total order, e.g. strings


• Every node is assigned a random key

• Nodes form a logical ring over the key space


Distributed Dictionary

• Items are stored on their successor, i.e. the first node encountered in clockwise direction


[Ring diagram: items (Thompson, 1983), (Clarke, 2007), (Allen, 2006), (Bachman, 1973) stored on nodes Backus, Codd, Ritchie, Rivest, Wirth, Yao]

– Thompson on Wirth

– Allen on Backus

– Bachman on Backus

– Clarke on Codd

– …

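The successor rule above can be sketched in Python (a sketch only; the real system is implemented in Erlang, and the node names are just those from the ring figure):

```python
import bisect

# Nodes are identified by keys drawn from the same totally ordered key
# space as the items (here: strings). An item lives on its successor:
# the first node whose key is >= the item's key, wrapping around the ring.
nodes = sorted(["Backus", "Codd", "Ritchie", "Rivest", "Wirth", "Yao"])

def successor(key):
    i = bisect.bisect_left(nodes, key)
    return nodes[i % len(nodes)]  # wrap around: keys after "Yao" go to "Backus"

# Placements from the slide:
assert successor("Thompson") == "Wirth"
assert successor("Allen") == "Backus"
assert successor("Bachman") == "Backus"
assert successor("Clarke") == "Codd"
```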

• Each node puts the successor and log2(N) exponentially spaced fingers in its routing table

Distributed Dictionary


• With each routing step using greedy routing, the distance between the current node and the destination is halved.

=> Yields O(log2 N) hops.

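The finger spacing and greedy routing can be sketched as follows; numeric ring positions stand in for the ordered keys, and everything here is illustrative rather than the Chord# implementation:

```python
N = 64  # ring size; positions stand in for ordered keys
# log2(N) exponentially spaced finger steps: 1, 2, 4, ..., N/2
FINGER_STEPS = [2**i for i in range((N - 1).bit_length())]

def route(src, dst):
    """Greedy routing: each hop takes the largest finger that does not
    overshoot the destination, so the remaining distance at least halves."""
    node, hops = src, 0
    while node != dst:
        dist = (dst - node) % N
        step = max(s for s in FINGER_STEPS if s <= dist)
        node = (node + step) % N
        hops += 1
    return hops

assert route(0, 63) == 6              # worst case: log2(64) hops
assert max(route(0, d) for d in range(N)) <= 6
```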

Summary: Chord#

DHTs/Chord:
• DHT is a fully decentralized data structure
• DHTs self-organize as nodes join, leave, and fail
• All operations only require local knowledge

Chord#:
• Only one hop for calculating a routing pointer, while Chord needs log(N) hops
• Max. log(N) hops, while Chord guarantees this only with “high probability”
• Can adapt to imbalances in the query load; Chord can’t
• Supports range queries

Scalable Wikipedia with Erlang

• Chord#

• Transactional Layer

• Load Balancing


• Wikipedia

ACID

Transactions on DHTs are Challenging

• high churn rate

• nodes may leave, join, or crash at any time

=> changing responsibilities

• “crash stop” fault model


• no perfect “failure detector”

• never know whether a node crashed or the network is just slow

Transactions + Replicas

START
  debit (a, 100);
  deposit (b, 100);
COMMIT

With 3 replicas per item:

START
  debit (a1, 100); debit (a2, 100); debit (a3, 100);
  deposit (b1, 100); deposit (b2, 100); deposit (b3, 100);
COMMIT
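The fan-out from logical operations to per-replica operations might be sketched like this (Python; the a1/a2/a3 key naming is taken from the slide, everything else is illustrative):

```python
# With replication degree 3, each logical operation inside a transaction
# fans out to one operation per replica key; the commit protocol then
# decides atomically over all of them.
REPLICAS = 3

def expand(ops):
    """Expand [(op, key, amount), ...] to one entry per replica key."""
    return [(op, f"{key}{i}", amount)
            for (op, key, amount) in ops
            for i in range(1, REPLICAS + 1)]

tx = [("debit", "a", 100), ("deposit", "b", 100)]
assert expand(tx) == [
    ("debit", "a1", 100), ("debit", "a2", 100), ("debit", "a3", 100),
    ("deposit", "b1", 100), ("deposit", "b2", 100), ("deposit", "b3", 100),
]
```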

Adapted Paxos Commit

[Message diagram: Leader, Transaction Manager (TM), Items (TPs); steps 1-6 with messages GetTP, InitTM, RegisterRTM, RegisterTP, Prepare, Prepared, Ack Prepared, Commit]

• Optimistic CC with fallback option

• Fast read operation
  – just 1 round
  – reads a majority (of the last versions) of all replicas

• Write operation
  – 3 rounds
  – non-blocking (fallback)
  – succeeds when > f/2 nodes alive
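The one-round majority read can be sketched as follows (Python; the reply format and version-number scheme are assumptions for illustration, not the actual protocol messages):

```python
# Sketch of the "fast read": query all f replicas, wait for a majority of
# answers, and return the value with the highest version number. Any
# majority intersects the majority written by the last committed write,
# so the newest committed version is always among the replies.
def majority_read(replies, f):
    """replies: list of (version, value) pairs received so far."""
    if len(replies) <= f // 2:
        raise RuntimeError("no majority of replicas reachable")
    return max(replies)  # highest version wins

# 3 of f = 5 replicas answered; one lags one version behind:
replies = [(7, "old"), (8, "new"), (8, "new")]
assert majority_read(replies, 5) == (8, "new")
```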

Summary “Transactions”

• Consistent way to update several items and their replicas


• Mitigates some of the Overlay Oddities

• Node Failures

• Asynchronous programming model

Scalable Wikipedia with Erlang

• Chord#

• Transactional Layer

• Load Balancing


• Wikipedia

Load-Balancing

[Ring diagram: Backus, Codd, Dijkstra, Floyd, Gray, Karp, Knuth, Milner, Minsky, Naur, Nygaard, Perlis, Ritchie, Rivest, Wirth, Yao; node „Wirth“ is overloaded]

1. Pick 2 random nodes („Naur“, „Wirth“)
2. IF Load(„Wirth“) >> Load(„Naur“)
   1. „Naur“ leaves system
   2. „Naur“ joins as „Tarjan“

Load can be any metric

D. Karger, M. Ruhl. „Simple Efficient Load Balancing Algorithms for Peer-to-Peer Systems“. IPTPS 2004.

[Ring diagram after balancing: „Naur“ has left and rejoined as „Tarjan“ next to the overloaded „Wirth“]
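The balancing step can be sketched like this (Python; the factor-4 imbalance threshold and the in-place load bookkeeping are assumptions, and the handover of the light node's old items to its successor is omitted):

```python
import random

IMBALANCE = 4  # "Load(Wirth) >> Load(Naur)" modeled as a factor-4 threshold

def balance_step(loads, rng=random):
    """One Karger/Ruhl-style step: compare two random nodes; if one is far
    more loaded, the lighter node leaves and rejoins inside the heavy
    node's range (under a new key, e.g. "Tarjan"; renaming omitted here)."""
    a, b = rng.sample(sorted(loads), 2)
    light, heavy = sorted((a, b), key=lambda n: loads[n])
    if loads[heavy] > IMBALANCE * loads[light]:
        half = loads[heavy] // 2       # lighter node takes half the items
        loads[heavy] -= half
        loads[light] = half
    return loads

# Deterministic demo matching the slide's example:
loads = {"Naur": 2, "Wirth": 20}
class PickNaurWirth:                   # stand-in for random node sampling
    def sample(self, population, k):
        return ["Naur", "Wirth"]
balance_step(loads, PickNaurWirth())
```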

Multi Data-Center Scenario

• Multi-data-center scenarios
  – Optimize for latency
  – Increase availability

• Prefix articles with
  – Language
  – Replica number

• E.g. „de:Main Page“ – 5 replicas

– 2 in Germany

– 1 in UK

– 1 in USA

– 1 in Asia

„0de:Main Page“, „1de:Main Page“, „2de:Main Page“, …

Scalable Wikipedia with Erlang

• Chord#

• Transactional Layer

• Load Balancing


• Wikipedia

Wikipedia

Top 10 Web sites

1. Yahoo!
2. Google
3. YouTube
4. Windows Live
5. MSN
6. Myspace
7. Wikipedia
8. Facebook
9. Blogger.com
10. Yahoo!カテゴリ

Source: alexa.com

Wikipedia is the top-1 open Web site:
– source code is open
– architecture is open
– dumps available

Wikipedia

50,000 requests/sec
– 95% are answered by squid proxies
– 2,000 req./sec hit the backend

The Wikipedia System Architecture

[Architecture diagram: web servers, search servers, NFS, and other components]

Erlang Implementation



• Implemented in Erlang

– Chord#

– Load-Balancing

– Transaction Framework

• OTP behaviours

– Supervisor


– gen_server

• Distributed Erlang

– Security Problems

– Scalability Problems

-> Own Transport-Layer on top of TCP

Data Model

Wikipedia

– SQL DB

Chord#

– Key-Value Store


Map Relations to Key-Value Pairs

– (Title, List of Versions)

– (CategoryName, List of Titles)

– (Title, List of Titles) //Backlinks

CREATE TABLE /*$wgDBprefix*/page (
  page_id int unsigned NOT NULL auto_increment,
  page_namespace int NOT NULL,
  ...

Data Model

void updatePage(string title, int oldVersion, string newText)
{
  // new transaction
  Transaction t = new Transaction();
  // read old version
  Page p = t.read(title);
  // check for concurrent update
  if (p.currentVersion != oldVersion)
    t.abort();
  else {
    // write new text
    t.write(p.add(newText));
    // update categories
    foreach (Category c in p)
      t.write(t.read(c.name).add(title));
    // commit
    t.commit();
  }
}

Wiki

Database:

• Chord#

• Mapping

– Wiki -> Key-Value Store


Renderer:

• Java

– Tomcat

– Plog4u

• Jinterface

– Interface to Erlang

IEEE Scale Challenge 2008

Live Demos
• Bavarian
• Simple English
• Browsing
• Editing with full History
• Category-Pages

Deployments
– Planet-Lab
  • 20 nodes
– Cluster
  • 320 nodes in Berlin

Summary

• Previously, P2P was mainly used for file sharing (read only).

• We support consistent, distributed write operations.

DHT + Transactions = Scalable, Reliable, Efficient Key/Value Store


• Numerous applications

• Internet databases, transactional online-services, …

Team

• Thorsten Schütt,

• Florian Schintke,

• Monika Moser,

• Stefan Plantikow,

• Alexander Reinefeld,


• Nico Kruber,

• Christian von Prollius,

• Seif Haridi (SICS),

• Ali Ghodsi (SICS),

• Tallat Shafaat (SICS)

Detailed Descriptions

Wiki: S. Plantikow, A. Reinefeld, F. Schintke.

Transactions for Distributed Wikis on Structured Overlays.

DSOM, October 2007.

Transactions: M. Moser, S. Haridi.

Atomic Commitment in Transactional DHTs.

1st CoreGRID Symposium, August 2007.


T. Shafaat, M. Moser, A. Ghodsi, S. Haridi, T. Schütt, A. Reinefeld.

Key-Based Consistency and Availability in Structured Overlay Networks.

Infoscale, June 2008.

DHT: T. Schütt, F. Schintke, A. Reinefeld.

A Structured Overlay for Multi-dimensional Range Queries.

Euro-Par, August 2007.

T. Schütt, F. Schintke, A. Reinefeld.

Structured Overlay without Consistent Hashing: Empirical Results.

GP2PC, May 2006.

Questions
