©2005, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis




Information Systems: A Short Overview of Peer-2-Peer Systems

Karl Aberer, EPFL-IC

lsirwww.epfl.ch


Overview

1. P2P Systems - Motivation
2. Unstructured P2P Overlay Networks
3. Hierarchical P2P Overlay Networks
4. Structured P2P Overlay Networks
5. Small World Graphs
6. Conclusions


Client-Server Information Systems

• Web search engine
– Global scale application
• Example: Google
– 200-300 million searches/day
– 4 × 10^9 Web pages
– 100,000 processors, 261,000 disks

[Figure: many clients connected to a single Google server. (1) A client sends the query: Find "aberer". (2) The server returns the result: home page of Karl Aberer …]

• Strengths
– Global ranking
– Fast response time
• Weaknesses
– Infrastructure, administration, cost
– A new company for every global application?


(Semi-)Decentralized Information Systems

• P2P music file sharing
– Global scale application
• Example: Napster (February 2001)
– 1.57 million users
– 10 terabytes of data (2 million songs, 220 songs per user)
– 100 servers

[Figure: peers connected to the central Napster server.
(1) A peer sends a query that follows a schema: Find <title> "brick in the wall" <artist> "pink floyd" <size> "1 MB" <category> "rock".
(2) The server returns the result: you find f.mp3 at peer X.
(3) The peer requests and transfers file f.mp3 from peer X directly.]


Lessons Learned from Napster

• Strengths: Resource Sharing
– Every node “pays” for its participation by providing access to its resources
  • physical resources (disk, network), knowledge (annotations), ownership (files)
– Every participating node acts as both a client and a server (“servent”): P2P
– Global information system without huge investment
– Decentralization of cost and administration = avoiding resource bottlenecks
• Weaknesses: Centralization
– Server is a single point of failure
– Unique entity required for controlling the system = design bottleneck
– Copying copyrighted material made Napster the target of legal attack

[Figure: spectrum from Centralized System to Decentralized System, with increasing degree of resource sharing and decentralization.]


2. Unstructured P2P Overlay Networks

• P2P file sharing
– Global scale application
• Example: Gnutella
– 40,000 nodes, 3 million files (August 2000)
– No servers: a self-organizing system
• Strengths
– Good response time, scalable
– No infrastructure, no administration
– No single point of failure
• Weaknesses
– High network traffic
– No structured search
– Free-riding

[Figure: a peer sends Find "brick in the wall" into the network; another peer answers: I have "brick_in_the_wall.mp3".]
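The trade-off between "no servers" and "high network traffic" can be illustrated with a small flooding search. This is a minimal sketch, not the Gnutella protocol itself: the five-peer topology, the file names, and the `ttl` hop limit are assumptions made for illustration, and each peer simply forwards the query to all of its neighbors (including the one it heard the query from).

```python
from collections import deque


def flood_search(neighbors, files, start, keyword, ttl):
    """Flood a query from `start`; return (peers with a hit, messages sent)."""
    seen = {start}
    hits = set()
    messages = 0
    queue = deque([(start, ttl)])
    while queue:
        peer, hops_left = queue.popleft()
        # Each peer checks its own shared files against the keyword.
        if any(keyword in name for name in files.get(peer, [])):
            hits.add(peer)
        if hops_left == 0:
            continue
        for nxt in neighbors[peer]:
            messages += 1  # every forwarded copy costs bandwidth
            if nxt not in seen:  # duplicate copies are received but dropped
                seen.add(nxt)
                queue.append((nxt, hops_left - 1))
    return hits, messages


# Hypothetical overlay: peer name -> list of neighbor peers.
neighbors = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "e"],
    "d": ["b", "e"],
    "e": ["c", "d"],
}
# Hypothetical shared files per peer.
files = {"d": ["brick_in_the_wall.mp3"], "e": ["other.mp3"]}

hits, messages = flood_search(neighbors, files, "a", "brick_in_the_wall", ttl=2)
print(hits, messages)
```

Even in this tiny network a single query generates more messages than there are peers, which is the bandwidth weakness listed above.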


Self-Organization

• Self-organized systems are well known from physics, biology, cybernetics
– distribution of control (= decentralization = P2P)
– local interactions, information and decisions
– emergence of global structures
– failure resilience
• Self-organization in information systems
– new hot topic in research
– strongly motivated by P2P systems


Connectivity in Gnutella

• Follows a power-law distribution: $P(k) \sim k^{-\gamma}$
– k is the number of links a node is connected to, γ is a constant (e.g. γ = 2)
– distribution is independent of the number of nodes N
– explanation: preferential attachment (self-organization process)
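Preferential attachment can be sketched as a growth process in which each new node links to existing nodes with probability proportional to their current degree. The code below is a minimal illustration under assumed parameters (`n`, `m`, and the fixed `seed` are not from the slides); it is not a model of how Gnutella clients actually pick neighbors.

```python
import random
from collections import Counter


def preferential_attachment(n, m, seed=0):
    """Grow a graph to n nodes; each new node links to m existing nodes
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a small fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `endpoints` once per incident link, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    endpoints = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


edges = preferential_attachment(n=2000, m=2)
degree = Counter(v for e in edges for v in e)
histogram = Counter(degree.values())  # k -> number of nodes with k links
for k in sorted(histogram)[:8]:
    print(k, histogram[k])
```

Most nodes keep a degree close to `m`, while a few early nodes collect many links, which is the heavy tail that a power law describes.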


3. Hierarchical P2P Overlay Networks

• Dedicated servers provide index information, i.e. they know which peer holds which file (Napster)
• Simplest approach
– one central server
– users register files
– service (file exchange) is organized as a P2P architecture

[Figure: a peer sends the query k="madonna" to the index server.]
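The central-index approach can be sketched as a lookup table from file names to the peers that hold them; the server stores no files, and the transfer itself happens between peers. The class name `IndexServer`, its methods, and the file names are hypothetical and only serve the illustration.

```python
from collections import defaultdict


class IndexServer:
    """Central index: knows which peer holds which file, stores no files."""

    def __init__(self):
        self._index = defaultdict(set)  # file name -> peers holding it

    def register(self, peer, file_names):
        """A peer announces the files it is willing to share."""
        for name in file_names:
            self._index[name].add(peer)

    def unregister(self, peer):
        """Remove a departing peer from every index entry."""
        for holders in self._index.values():
            holders.discard(peer)

    def search(self, keyword):
        """Return {file name: sorted peers} for names containing keyword."""
        return {
            name: sorted(holders)
            for name, holders in self._index.items()
            if keyword in name and holders
        }


server = IndexServer()
server.register("peer1", ["madonna_holiday.mp3"])
server.register("peer2", ["madonna_holiday.mp3", "pink_floyd_wall.mp3"])
result = server.search("madonna")
print(result)
```

The requesting peer would then contact one of the returned peers directly; that step is outside the sketch.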


Superpeer Networks

• Improvement of the central index server (Morpheus, Kazaa)
– multiple index servers build a P2P network
– clients are associated with one (or more) superpeers
– superpeers use message flooding to forward search requests


4. Structured P2P Overlay Networks

• Unstructured overlay networks – what we learned
– simplicity (simple protocol)
– robustness (almost impossible to “kill” – no central authority)
• Performance
– search latency O(log N)
– update cost low
• Drawbacks
– tremendous bandwidth consumption
• Can we do better?

Search Trees

• Search tree: search keys are binary keys

[Figure: binary search tree over 3-bit keys. Root ???, inner nodes 0?? and 1??, then 00?, 01?, 10?, 11?, and leaves 000 to 111, with peers 1 to 4 below the leaves. A query for key 101 (101?) is passed down through the index level by level until it is resolved (101!). A centrally held index is the "Napster" bottleneck.]


Non-scalable Distribution of Search Tree

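• Distribute search tree over peers

[Figure: search tree with root ???, inner nodes 0?? and 1??, then 00?, 01?, 10?, 11?, and leaves 000 to 111, with its nodes assigned to peers 1 to 4. The peer responsible for the upper part of the tree becomes a bottleneck.]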


Scalable Distribution of Search Tree (P-Grid)

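• Associate each peer with a complete path

[Figure: search tree with root ???, inner nodes 0?? and 1??, then 00?, 01?, 10?, 11?, and leaves 000 to 111. Each of peers 1 to 4 is associated with a complete path from the root down to its part of the leaves.]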


Routing Information

[Figure: path ??? → 1?? → 10? with leaves 100 and 101, and peers 1 to 4. For each part of the tree that branches off its own path, a peer keeps a reference to a peer that knows more about that part of the tree.]


Prefix Routing

[Figure: a query for key 101 (101?) arrives at peer 4, whose path is ??? → 1?? → 11? (leaves 110 and 111). Peer 4 sends a message to peer 3, whose path is ??? → 1?? → 10? (leaves 100 and 101), and the query is resolved there (101!).]

Routing table of peer 4:

prefix   peer
0??      peer 1, peer 2
10?      peer 3

search(p, k):
  find in the routing table of p the peer_i with the longest prefix matching k
  if last entry then found else search(peer_i, k)
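The search procedure can be made concrete with a small runnable sketch. It assumes four peers with the 2-bit paths 00, 01, 10, 11 and 3-bit keys, which reproduces the routing table of peer 4 shown on the slide; the helper names `build_routing_table` and `search` and the `hops` counter are illustrative, and the termination test ("this peer's path is a prefix of the key") is one way to read "if last entry then found".

```python
def build_routing_table(path, all_paths):
    """For each bit of `path`, list the peers in the subtree on the other side."""
    table = {}
    for level in range(len(path)):
        flipped = path[:level] + ("1" if path[level] == "0" else "0")
        table[flipped] = [p for p, q in all_paths.items() if q.startswith(flipped)]
    return table


def search(peer, key, paths, tables, hops=0):
    """Route `key` starting at `peer`; return (responsible peer, hops used)."""
    if key.startswith(paths[peer]):
        return peer, hops  # this peer is responsible for the key
    # Pick the routing entry with the longest prefix that matches the key.
    matching = [prefix for prefix in tables[peer] if key.startswith(prefix)]
    best = max(matching, key=len)
    return search(tables[peer][best][0], key, paths, tables, hops + 1)


# Assumed assignment of peers to paths (peer name -> binary path).
paths = {"peer1": "00", "peer2": "01", "peer3": "10", "peer4": "11"}
tables = {p: build_routing_table(q, paths) for p, q in paths.items()}

print(tables["peer4"])                       # entries for prefixes 0 and 10
print(search("peer4", "101", paths, tables))
```

Each hop fixes at least one more leading bit of the key, so the number of hops is bounded by the length of the paths.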


Efficient Resource Location

[Figure: resource location approaches compared along three axes, each ranging from low to high: search cost, maximal bandwidth, update cost. Approaches shown: BROADCAST (e.g. Gnutella), SERVER (e.g. Napster), FULL REPLICATION, STRUCTURED OVERLAY NETWORKS (e.g. prefix routing).]


5. Small World Graphs

• Each overlay network can be interpreted as a directed graph
– peers correspond to nodes
– routing table entries correspond to directed links
• Task
– Find a decentralized algorithm (greedy routing) to route a message from any node A to any other node B with few hops compared to the size of the graph
– Requires the existence of short paths in the graph


Milgram’s Experiment

• Finding short chains of acquaintances linking pairs of people in the USA who didn’t know each other
– Source person in Nebraska
– Sends message with first name and location
– Target person in Massachusetts
• Average length of the chains that were completed was between 5 and 6 steps
• “Six degrees of separation” principle
• BIG QUESTION:
– WHY should there be short chains of acquaintances linking together arbitrary pairs of strangers?


Random Graphs

• For many years the typical explanation was random graphs
– Vertices are selected uniformly at random
– Low diameter: expected distance between two nodes is $\log_k N$ (where k is the outdegree and N the number of nodes)
• But there are some inaccuracies
– If A and B have a common friend C, it is more likely that they themselves will be friends! (clustering)
– Many real-world networks (social networks, biological networks in nature, artificial networks such as the power grid and the WWW) exhibit this clustering property
– Random networks are NOT clustered.


Clustering

• Clustering measures the fraction of neighbors of a node that are connected themselves
• Regular graphs have a high clustering coefficient
– but also a high diameter
• Random graphs have a low clustering coefficient
– but a low diameter

[Figure: two example graphs with k = 4.
Regular graph: long paths, $L \sim N/(2k)$; highly clustered, $C \sim 3/4$.
Random graph: short path length, $L \sim \log_k N$; almost no clustering, $C \sim k/N$.]
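The clustering coefficient defined above can be computed directly. The sketch below assumes that "regular graph" means a ring lattice in which every node is linked to its k/2 nearest nodes on each side; the function names are illustrative. For such a lattice the coefficient is 3(k-2)/(4(k-1)), which gives 0.5 for k = 4 and approaches the 3/4 quoted on the slide only as k grows.

```python
def ring_lattice(n, k):
    """Regular graph: node i is linked to its k/2 nearest nodes on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def clustering_coefficient(adj):
    """Average, over all nodes, of the fraction of neighbor pairs that are linked."""
    total = 0.0
    for nbrs in adj.values():
        nbrs = sorted(nbrs)
        pairs = len(nbrs) * (len(nbrs) - 1) // 2
        if pairs == 0:
            continue  # nodes with fewer than two neighbors contribute 0
        linked = sum(
            1
            for a in range(len(nbrs))
            for b in range(a + 1, len(nbrs))
            if nbrs[b] in adj[nbrs[a]]
        )
        total += linked / pairs
    return total / len(adj)


lattice = ring_lattice(n=100, k=4)
c_regular = clustering_coefficient(lattice)
print(c_regular)
```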


Small-World Networks

• Random rewiring of a regular graph (by Watts and Strogatz)
– With probability p, rewire each link in a regular graph to a randomly selected node
– Resulting graph has high clustering and short path length
• BUT! Watts-Strogatz explains the structure of the graph
– existence of short paths, high clustering
• It does not explain how the shortest paths are found
– Gnutella networks are also small-world graphs
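The effect of rewiring on path length can be checked with a short experiment. This is a simplified sketch of the rewiring idea, not a faithful reimplementation of the Watts-Strogatz procedure: the graph size, `p = 0.1`, the fixed `seed`, and the rule "keep one endpoint, pick a new non-neighbor uniformly" are assumptions. Path length is measured with breadth-first search from every node.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Regular graph: node i is linked to its k/2 nearest nodes on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def rewire(adj, p, seed=0):
    """With probability p, replace each link (i, j) by (i, random other node)."""
    rng = random.Random(seed)
    n = len(adj)
    new = {i: set(nbrs) for i, nbrs in adj.items()}
    links = [(i, j) for i in adj for j in adj[i] if i < j]
    for i, j in links:
        if rng.random() < p:
            candidates = [v for v in range(n) if v != i and v not in new[i]]
            if not candidates:
                continue
            target = rng.choice(candidates)
            new[i].remove(j)
            new[j].remove(i)
            new[i].add(target)
            new[target].add(i)
    return new


def average_path_length(adj):
    """Mean BFS distance over all ordered pairs of mutually reachable nodes."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


lattice = ring_lattice(n=200, k=4)
small_world = rewire(lattice, p=0.1)
l_regular = average_path_length(lattice)
l_small = average_path_length(small_world)
print(l_regular, l_small)
```

The lattice value is close to N/(2k), and a small fraction of rewired links is enough to shorten the average path noticeably.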


P2P Overlay Networks as Graphs

• Each structured overlay network can be interpreted as a directed graph …
– peers correspond to nodes
– routing table entries correspond to directed links
• … embedded in some space (each node has a coordinate!)
– P-Grid: interval [0,1]
– others: d-dimensional space
– etc.
• Task
– Find a decentralized algorithm (greedy routing) to route a message from any node A to any other node B with few hops compared to the size of the graph


Kleinberg’s Small-World Model

• Kleinberg’s small-world model
– Embed the graph into an r-dimensional grid
– Constant number of short-range links (neighborhood)
– q long-range links: choose long-range links such that the probability of having a long-range contact at distance d is proportional to $1/d^r$
• Importance of r!
– Decentralized (greedy) routing performs best iff r = dimension of the space

[Figure: example grid with r = 2]
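Greedy routing in this model can be sketched in a few lines. To keep the example short it uses a one-dimensional ring instead of the two-dimensional grid in the figure, so the matching exponent is r = 1; the ring size, `q = 1`, the fixed `seed`, and the function names are assumptions for illustration only.

```python
import random


def ring_distance(a, b, n):
    """Shortest distance between positions a and b on a ring of n nodes."""
    d = abs(a - b)
    return min(d, n - d)


def build_links(n, q, r, seed=0):
    """Each node gets its two ring neighbors plus q long-range links,
    drawn with probability proportional to 1 / distance**r."""
    rng = random.Random(seed)
    links = {}
    for u in range(n):
        others = [v for v in range(n) if v != u]
        weights = [ring_distance(u, v, n) ** -r for v in others]
        long_range = rng.choices(others, weights=weights, k=q)
        links[u] = {(u - 1) % n, (u + 1) % n, *long_range}
    return links


def greedy_route(links, source, target, n):
    """Always forward to the known contact closest to the target; count hops."""
    hops = 0
    current = source
    while current != target:
        current = min(links[current], key=lambda v: ring_distance(v, target, n))
        hops += 1
    return hops


n = 512
links = build_links(n, q=1, r=1)
hops = [greedy_route(links, 0, t, n) for t in range(n)]
print(sum(hops) / len(hops))
```

Because a ring neighbor always brings the message one step closer, the route never takes more hops than the ring distance; the long-range links are what make it shorter on average.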


Influence of “r”

• $A_i$ consists of all nodes whose distance from u is between $2^i$ and $2^{i+1}$, for $i = 0, \dots, \log N - 1$
• Given r = dim, each long-range contact of u is nearly equally likely to belong to any of the sets $A_i$
• When q = log N, on average each node will have a link in each set $A_i$

[Figure: regions A1, A2, A3, A4 at increasing distance from node u]
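The "nearly equally likely" claim can be checked numerically in the simplest setting. The sketch assumes a one-dimensional space (so the matching exponent is r = 1) in which the number of nodes at each distance d is constant; it sums the weights $1/d^r$ over each distance range $[2^i, 2^{i+1})$ and normalizes. The function name and the choice of eight sets are illustrative.

```python
def mass_per_set(n_sets, r):
    """Share of long-range link probability that falls into each set A_i
    in one dimension, where A_i covers distances 2**i .. 2**(i+1) - 1."""
    masses = [
        sum(d ** -r for d in range(2 ** i, 2 ** (i + 1)))
        for i in range(n_sets)
    ]
    total = sum(masses)
    return [m / total for m in masses]


for r in (0, 1, 2):
    shares = mass_per_set(n_sets=8, r=r)
    print(r, [round(s, 3) for s in shares])
```

With r = 1 every set receives a similar share. With r = 0 almost all links go to the most distant sets, and with r = 2 almost all stay in the nearest ones, so in both cases some distance scales are left without useful contacts.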


Structured Overlay Networks and Kleinberg's model

[Figure: side-by-side comparison of P-Grid’s model and Kleinberg’s model]


6. Conclusions

• P2P systems

– started out from some "hacker-type" applications

– initiated lots of original research

– basis for novel, highly scalable information systems


Small Guide to Information Systems courses

• Introduction to information systems (Sem 6, mandatory)
– Basis of all: relational databases and Web technology, fun project
• Middleware (Masters)
– Everything you need in industry
• Distributed Information systems (Masters)
– Fun part: Web, P2P, Search Engines, and much more


Masters Specialization "Internet Computing"

Area  Course                                    Lecturer      Credits  Semester
INF   Natural Language                          Rajman et al  6(4+2)   SS
INF   Advanced Databases                        Spaccapietra  6(3+3)   SS
INF   Multimedia Documents                      Vanoirbeek    6(4+2)   SS
ALG   Distributed Inf. Sys.                     Aberer        6(4+2)   WS
ALG   Intelligent Agents                        Faltings      6(3+3)   WS
ALG   Distributed algorithms: message passing   Schiper       4(2+1)   WS
SYS   Middleware                                Guerraoui     6(4+2)   SS
SYS   Performance                               LeBoudec      6(4+2)   SS
SYS   Mobile Networks                           Hubaux        4(2+2)   SS
SYS   Cryptography and Security                 Vaudenay      6(4+2)   WS
HIS   Human-Computer Interaction                Pu            4(2+1)   SS
HIS   Enterprise Architecture                   Wegmann       6(4+2)   WS
HIS   E-Business                                Pigneur       6(4+2)   WS

66 Credits Total