©2005, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis P2P Systems - 1
Information Systems: A Short Overview of Peer-2-Peer Systems
Karl Aberer, EPFL-IC
lsirwww.epfl.ch
Overview
1. P2P Systems - Motivation
2. Unstructured P2P Overlay Networks
3. Hierarchical P2P Overlay Networks
4. Structured P2P Overlay Networks
5. Small World Graphs
6. Conclusions
Client-Server Information Systems
• Web search engine
  – Global scale application
• Example: Google
  – 200-300 million searches/day
  – 4 · 10^9 Web pages
[Figure: many clients around one Google server (100,000 processors, 261,000 disks). 1. A client sends the query: Find "aberer". 2. The server returns the result: home page of Karl Aberer …]
• Strengths
  – Global ranking
  – Fast response time
• Weaknesses
  – Infrastructure, administration, cost
  – A new company for every global application?
(Semi-)Decentralized Information Systems
• P2P music file sharing
  – Global scale application
• Example: Napster
  – 1.57 million users
  – 10 terabytes of data (2 million songs, 220 songs per user) (February 2001)
[Figure: many peers around the Napster server (100 servers).
 1. A peer sends a query that follows a schema: Find <title> "brick in the wall" <artist> "pink floyd" <size> "1 MB" <category> "rock"
 2. The server returns the result: you find f.mp3 at peer X
 3. The peer requests and transfers file f.mp3 from peer X directly]
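The three steps of the Napster figure (query the server, receive a peer address, fetch the file from that peer) can be sketched as follows. This is a minimal illustration only: the class `IndexServer` and its methods `register` and `find` are hypothetical names, not part of any real Napster protocol, and the actual file transfer between peers is left out.

```python
from collections import defaultdict


class IndexServer:
    """Central index: knows which peer holds which file, but stores no files."""

    def __init__(self):
        # title (lower-cased) -> set of peers offering it
        self._holders = defaultdict(set)

    def register(self, peer, titles):
        """A peer announces the files it is willing to share."""
        for title in titles:
            self._holders[title.lower()].add(peer)

    def find(self, title):
        """Return the peers holding the title; the download itself is peer-to-peer."""
        return sorted(self._holders.get(title.lower(), ()))


server = IndexServer()
server.register("peer_x", ["brick in the wall"])
server.register("peer_y", ["like a prayer"])
holders = server.find("Brick in the Wall")  # step 2: server answers with peer ids
```

The server holds only metadata, which is why it is cheap to run and also why it remains a single point of failure.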
Lessons Learned from Napster
• Strengths: Resource Sharing
  – Every node “pays” for its participation by providing access to its resources
    • physical resources (disk, network), knowledge (annotations), ownership (files)
  – Every participating node acts as both a client and a server (“servent”): P2P
  – Global information system without huge investment
  – Decentralization of cost and administration = avoiding resource bottlenecks
• Weaknesses: Centralization
  – The server is a single point of failure
  – A unique entity is required for controlling the system = design bottleneck
  – Copying copyrighted material made Napster the target of legal attack
[Figure: spectrum from centralized system to decentralized system, with increasing degree of resource sharing and decentralization]
2. Unstructured P2P Overlay Networks
• P2P file sharing
  – Global scale application
• Example: Gnutella
  – 40,000 nodes, 3 million files (August 2000)
  – Gnutella: no servers
• Strengths
  – Good response time, scalable
  – No infrastructure, no administration
  – No single point of failure
• Weaknesses
  – High network traffic
  – No structured search
  – Free-riding
[Figure: self-organizing system; one peer sends the query Find "brick in the wall" and another peer answers I have "brick_in_the_wall.mp3"]
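The "high network traffic" weakness follows from how a query spreads without any index: each peer forwards it to all its neighbours. The sketch below counts messages for a flooded query with a hop limit. It is a simplified model, not the Gnutella wire protocol; the function name `flood_search`, the `ttl` parameter, and the five-peer example network are assumptions made for illustration.

```python
from collections import deque


def flood_search(neighbours, files, start, query, ttl):
    """Flood a query from `start`; return (peers with a hit, messages sent)."""
    seen = {start}
    frontier = deque([(start, ttl)])
    hits, messages = [], 0
    while frontier:
        peer, remaining = frontier.popleft()
        if query in files.get(peer, ()):
            hits.append(peer)
        if remaining == 0:
            continue  # hop limit reached, do not forward further
        for nxt in neighbours[peer]:
            messages += 1  # every forwarded copy costs bandwidth
            if nxt not in seen:  # duplicates are received but not processed again
                seen.add(nxt)
                frontier.append((nxt, remaining - 1))
    return hits, messages


# Hypothetical overlay: a - b - d - e and a - c - d
neighbours = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
files = {"e": {"brick_in_the_wall.mp3"}}
hits, messages = flood_search(neighbours, files, "a", "brick_in_the_wall.mp3", ttl=3)
```

In this toy network the file is three hops away, so a hop limit of 2 misses it while a limit of 3 finds it at the price of more messages.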
Self-Organization
• Self-organized systems are well known from physics, biology, cybernetics
  – Distribution of control (= decentralization = P2P)
  – Local interactions, information and decisions
  – Emergence of global structures
  – Failure resilience
• Self-organization in information systems
  – New hot topic in research
  – Strongly motivated by P2P systems
Connectivity in Gnutella
• Follows a power-law distribution: $P(k) \sim k^{-\gamma}$
  – $k$: number of links a node is connected to; $\gamma$: constant (e.g. $\gamma = 2$)
  – Distribution independent of the number of nodes $N$
  – Explanation: preferential attachment (self-organization process)
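Preferential attachment can be illustrated with a small growth simulation in which each joining node links to existing nodes with probability proportional to their current degree. This is a generic sketch of that process, not a model of the real Gnutella join procedure; the function name `preferential_attachment` and the parameters `n` (final size) and `m` (links per new node) are assumptions.

```python
import random


def preferential_attachment(n, m, rng):
    """Grow a graph to n nodes; return a dict node -> degree."""
    # Start from a fully connected core of m + 1 nodes (each has degree m).
    degree = {i: m for i in range(m + 1)}
    # Each node appears once per incident link, so a uniform draw from
    # this list picks a node with probability proportional to its degree.
    endpoints = [i for i in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        degree[new] = m
        for t in targets:
            degree[t] += 1
            endpoints.extend((new, t))
    return degree


degree = preferential_attachment(2000, 2, random.Random(0))
largest = max(degree.values())   # a few hubs collect many links
smallest = min(degree.values())  # most nodes stay near the minimum m
```

Most nodes keep a degree close to `m` while a few early nodes become hubs, which is the heavy-tailed shape a power law describes.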
3. Hierarchical P2P Overlay Networks
• Dedicated servers provide index information, i.e. know which peer holds which file (Napster)
• Simplest approach
  – One central server
  – Users register files
  – The service (file exchange) is organized as a P2P architecture
[Figure: a peer sends the query k="madonna" to the index server]
Superpeer Networks
• Improvement of the central index server (Morpheus, Kazaa)
  – Multiple index servers build a P2P network
  – Clients are associated with one (or more) superpeers
  – Superpeers use message flooding to forward search requests
4. Structured P2P Overlay Networks
• Unstructured overlay networks: what we learned
  – Simplicity (simple protocol)
  – Robustness (almost impossible to “kill”, no central authority)
• Performance
  – Search latency $O(\log N)$
  – Update cost low
• Drawbacks
  – Tremendous bandwidth consumption
• Can we do better?
Search Trees
• Search tree: search keys are binary keys
[Figure: binary search tree (index) over 3-bit keys, from root to leaves:
 ???
 0?? 1??
 00? 01? 10? 11?
 000 001 010 011 100 101 110 111
 Peers 1 to 4 send the query 101? to the index, which resolves it (101!). The single index is the "Napster" bottleneck.]
Non-scalable Distribution of Search Tree
• Distribute search tree over peers
[Figure: the same tree (levels ???; 0?? 1??; 00? 01? 10? 11?; leaves 000 to 111) with its nodes distributed over peers 1 to 4; the upper part of the tree remains a bottleneck]
Scalable Distribution of Search Tree (P-Grid)
• Associate each peer with a complete path
[Figure: the same tree (levels ???; 0?? 1??; 00? 01? 10? 11?; leaves 000 to 111), with each of peers 1 to 4 associated with a complete path from the root to the leaves]
Routing Information
[Figure: one peer's path through the tree (???, 1??, 10?, leaves 100 and 101). At each level of its path the peer keeps references to other peers (peer 1, peer 2, peer 3, peer 4), each of which knows more about the other part of the tree at that level.]
Prefix Routing
[Figure: the query 101? is issued at peer 4 (path ???, 1??, 11?, leaves 110 and 111). Using its routing table, peer 4 sends a message to peer 3 (path ???, 1??, 10?, leaves 100 and 101), where the key is found (101!).]
Routing table of peer 4:
prefix   peer
0??      peer 1, peer 2
10?      peer 3
search(p, k)
  find in the routing table of p the peer_i with the longest prefix matching k
  if last entry then found else search(peer_i, k)
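The search(p, k) procedure above can be tried out on a four-peer example. The sketch below assumes that peers 1 to 4 are responsible for the key prefixes 00, 01, 10 and 11, and that each routing-table entry holds a single reference (the slide lists two for prefix 0??). The names `PATHS`, `ROUTING` and the routing tables of peers 1 to 3 are assumptions for illustration; only the table of peer 4 is taken from the slide.

```python
# Path (key prefix) each peer is responsible for: an assumed assignment.
PATHS = {"peer1": "00", "peer2": "01", "peer3": "10", "peer4": "11"}

# Routing tables: prefix -> one peer known to be responsible for that prefix.
ROUTING = {
    "peer1": {"1": "peer3", "01": "peer2"},
    "peer2": {"1": "peer4", "00": "peer1"},
    "peer3": {"0": "peer1", "11": "peer4"},
    "peer4": {"0": "peer2", "10": "peer3"},
}


def search(peer, key, hops=None):
    """Route `key` starting at `peer`; return the list of peers visited."""
    hops = [] if hops is None else hops
    hops.append(peer)
    if key.startswith(PATHS[peer]):
        return hops  # this peer is responsible for the key: found
    # Otherwise forward to the entry with the longest prefix matching the key.
    matching = [prefix for prefix in ROUTING[peer] if key.startswith(prefix)]
    best = max(matching, key=len)
    return search(ROUTING[peer][best], key, hops)


route = search("peer4", "101")  # peer 4 forwards the query to peer 3
```

Each hop resolves at least one more bit of the key, so the number of hops is bounded by the depth of the tree rather than by the number of peers.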
Efficient Resource Location
[Figure: approaches compared along three axes, each ranging from low to high: search cost, maximal bandwidth, update cost. Approaches shown: BROADCAST (e.g. Gnutella), SERVER (e.g. Napster), FULL REPLICATION, STRUCTURED OVERLAY NETWORKS (e.g. prefix routing)]
5. Small World Graphs
• Each overlay network can be interpreted as a directed graph
  – Peers correspond to nodes
  – Routing table entries correspond to directed links
• Task
  – Find a decentralized algorithm (greedy routing) to route a message from any node A to any other node B with few hops compared to the size of the graph
  – Requires the existence of short paths in the graph
Milgram’s Experiment
• Finding short chains of acquaintances linking pairs of people in the USA who did not know each other
  – Source person in Nebraska
  – Sends message with first name and location
  – Target person in Massachusetts
• The average length of the chains that were completed was between 5 and 6 steps
• “Six degrees of separation” principle
• BIG QUESTION:
  – WHY should there be short chains of acquaintances linking together arbitrary pairs of strangers?
Random Graphs
• For many years the typical explanation was random graphs
  – Vertices are selected uniformly at random
  – Low diameter: the expected distance between two nodes is $\log_k N$ (where $k$ is the outdegree and $N$ the number of nodes)
• But there are some inaccuracies
  – If A and B have a common friend C, it is more likely that they themselves will be friends! (clustering)
  – Many real-world networks (social networks, biological networks in nature, artificial networks such as the power grid and the WWW) exhibit this clustering property
  – Random networks are NOT clustered.
Clustering
• Clustering measures the fraction of neighbors of a node that are connected themselves
• Regular graphs have a high clustering coefficient
  – but also a high diameter
• Random graphs have a low clustering coefficient
  – but a low diameter
[Figure: two example graphs with k = 4]
• Random graph (k = 4): short path length, $L \sim \log_k N$; almost no clustering, $C \sim k/N$
• Regular graph (k = 4): long paths, $L \sim N/(2k)$; highly clustered, $C \sim 3/4$
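The clustering coefficient can be computed directly from its definition. The sketch below does so for a regular ring lattice, assuming that "regular graph" means each node is linked to its k/2 nearest neighbours on each side; the function names `ring_lattice` and `clustering` are chosen for illustration. For k = 4 this gives exactly 0.5; the value 3/4 is approached as k grows, following the standard ring-lattice result C = 3(k-2)/(4(k-1)).

```python
from itertools import combinations


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k/2 nearest neighbours per side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def clustering(adj):
    """Average, over all nodes, of the fraction of neighbour pairs that are linked."""
    total = 0.0
    for nbrs in adj.values():
        pairs = list(combinations(nbrs, 2))
        if pairs:
            linked = sum(1 for a, b in pairs if b in adj[a])
            total += linked / len(pairs)
    return total / len(adj)


c4 = clustering(ring_lattice(20, 4))     # 3 of the 6 neighbour pairs are linked
c50 = clustering(ring_lattice(400, 50))  # closer to the large-k limit of 3/4
```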
Small-World Networks
• Random rewiring of a regular graph (by Watts and Strogatz)
  – With probability p, rewire each link in a regular graph to a randomly selected node
  – The resulting graph has high clustering and short path length
• BUT! Watts-Strogatz explains the structure of the graph
  – Existence of short paths, high clustering
• It does not explain how the shortest paths are found
  – Gnutella networks are also small-world graphs
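The effect of rewiring on path length can be observed in a small experiment. The following is a sketch of the rewiring idea, not a faithful reproduction of the original Watts-Strogatz procedure: the function names, the ring size of 200, and the choice p = 0.1 are assumptions, and the average path length is taken over reachable pairs only.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k/2 nearest neighbours per side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def rewire(adj, k, p, rng):
    """With probability p, move the far end of each lattice link to a random node."""
    n = len(adj)
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            if j in adj[i] and rng.random() < p:
                candidates = [c for c in range(n) if c != i and c not in adj[i]]
                if not candidates:
                    continue
                new = rng.choice(candidates)
                adj[i].discard(j)
                adj[j].discard(i)
                adj[i].add(new)
                adj[new].add(i)
    return adj


def avg_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS per node)."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


regular = avg_path_length(ring_lattice(200, 4))
small_world = avg_path_length(rewire(ring_lattice(200, 4), 4, 0.1, random.Random(1)))
```

A handful of rewired links act as shortcuts, so `small_world` comes out well below `regular` even though most links are still local.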
P2P Overlay Networks as Graphs
• Each structured overlay network can be interpreted as a directed graph …
  – Peers correspond to nodes
  – Routing table entries correspond to directed links
• … embedded in some space (each node has a coordinate!)
  – P-Grid: interval [0,1]
  – Others: d-dimensional space
  – etc.
• Task
  – Find a decentralized algorithm (greedy routing) to route a message from any node A to any other node B with few hops compared to the size of the graph
Kleinberg’s Small-World Model
• Kleinberg’s small-world model
  – Embed the graph into an r-dimensional grid
  – Constant number of short-range links (neighborhood)
  – q long-range links: choose long-range links such that the probability of having a long-range contact at distance $d$ is proportional to $1/d^r$
• Importance of r!
  – Decentralized (greedy) routing performs best iff r = dimension of the space
[Figure: example grid with r = 2]
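Greedy routing over such links can be sketched in a few lines. To keep the example short it assumes a one-dimensional ring instead of a grid, so the matching exponent is r = 1; the names `build_ring` and `greedy_route` and the parameter values are hypothetical. Each node forwards the message to whichever of its contacts is closest to the target, using only local information.

```python
import random


def build_ring(n, q, r, rng):
    """Ring of n nodes: links to both neighbours plus q long-range links,
    each drawn with probability proportional to 1 / d**r (d = ring distance)."""

    def dist(a, b):
        return min((a - b) % n, (b - a) % n)

    links = {}
    for u in range(n):
        others = [v for v in range(n) if v != u]
        weights = [dist(u, v) ** -r for v in others]
        long_range = rng.choices(others, weights=weights, k=q)
        links[u] = {(u - 1) % n, (u + 1) % n, *long_range}
    return links, dist


def greedy_route(links, dist, src, dst):
    """Forward to the contact closest to dst until dst is reached; return hop count."""
    hops, current = 0, src
    while current != dst:
        # A ring neighbour is always one step closer, so progress is guaranteed.
        current = min(links[current], key=lambda v: dist(v, dst))
        hops += 1
    return hops


links, dist = build_ring(256, 8, 1, random.Random(0))
hops = greedy_route(links, dist, 0, 128)
```

Without long-range links the route from node 0 to node 128 would need 128 hops; with them the greedy route can only be shorter or equal.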
Influence of “r”
• Given r = dim, each long-range contact of u is nearly equally likely to belong to any of the sets $A_i$
• When q = log N, on average each node will have a link in each set $A_i$
• $A_i$ consists of all nodes whose distance from u is between $2^i$ and $2^{i+1}$, for $i = 0, \dots, \log N - 1$
[Figure: the sets A1, A2, A3, A4 around node u]
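Why the sets $A_i$ are nearly equally likely can be checked with a rough estimate. The sketch below assumes that in an r-dimensional grid the number of nodes at distance about $x$ from u grows like $c\,x^{r-1}$; the constant $c$ and the normalization $Z$ are introduced here only for illustration.

```latex
\Pr[\text{contact of } u \in A_i]
  = \frac{1}{Z} \sum_{v \in A_i} d(u,v)^{-r}
  \approx \frac{1}{Z} \int_{2^i}^{2^{i+1}} c\, x^{r-1} \cdot x^{-r} \, dx
  = \frac{c}{Z} \ln \frac{2^{i+1}}{2^{i}}
  = \frac{c \ln 2}{Z}
```

The result does not depend on $i$. Summing over the $\log N$ sets gives $Z \approx c \ln 2 \cdot \log N$, so each set is hit with probability of about $1/\log N$, and $q = \log N$ long-range links yield on average one link per set.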
Structured Overlay Networks and Kleinberg's model
[Figure: P-Grid’s model shown next to Kleinberg’s model]
6. Conclusions
• P2P systems
– started out from some "hacker-type" applications
– initiated lots of original research
– basis for novel, highly scalable information systems
Small Guide to Information Systems courses
• Introduction to information systems (Sem 6, mandatory)
  – Basis of all: relational databases and Web technology, fun project
• Middleware (Masters)
  – Everything you need in industry
• Distributed Information systems (Masters)
  – Fun part: Web, P2P, Search Engines, and much more
Masters Specialization "Internet Computing"
INF  Natural Language                            Rajman et al   6(4+2)  SS
INF  Advanced Databases                          Spaccapietra   6(3+3)  SS
INF  Multimedia Documents                        Vanoirbeek     6(4+2)  SS
ALG  Distributed Inf. Sys.                       Aberer         6(4+2)  WS
ALG  Intelligent Agents                          Faltings       6(3+3)  WS
ALG  Distributed algorithms: message passing     Schiper        4(2+1)  WS
SYS  Middleware                                  Guerraoui      6(4+2)  SS
SYS  Performance                                 LeBoudec       6(4+2)  SS
SYS  Mobile Networks                             Hubaux         4(2+2)  SS
SYS  Cryptography and Security                   Vaudenay       6(4+2)  WS
HIS  Human-Computer Interaction                  Pu             4(2+1)  SS
HIS  Enterprise Architecture                     Wegmann        6(4+2)  WS
HIS  E-Business                                  Pigneur        6(4+2)  WS
66 Credits Total