p2p tutorial
TRANSCRIPT
8/3/2019 P2P Tutorial
http://slidepdf.com/reader/full/p2p-tutorial 1/180
Digital Enterprise Research Institute www.deri.ie
Introduction to Peer-to-Peer
Manfred Hauswirth, Marcel Karnstedt
Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
Goals of the Tutorial
Position the P2P paradigm in the design space of distributed systems
Get an overview of P2P systems and the underlying concepts
Decentralized data management in P2P systems
What is P2P?
Clay Shirky (The Accelerator Group):
“Peer-to-peer is a class of applications that take advantage of resources (storage, cycles, content, human presence) available at the edges of the Internet. Because accessing these decentralized resources means operating in an environment of unstable connectivity and unpredictable IP addresses, peer-to-peer nodes must operate outside the DNS and have significant or total autonomy of central servers.”
P2P “litmus test:”
– Does it allow for variable connectivity and temporary network addresses?
– Does it give the nodes at the edges of the network significant autonomy?
P2P in a historical Context
The original Internet was designed as a P2P system
– no firewalls / no network address translation
– no asymmetric connections (V.90, ADSL, cable, etc.)
– the back-then “killer apps” FTP and telnet are C/S, but anyone could telnet/FTP anyone else
– servers acted as clients and vice versa
– cooperation was a central goal and “value”: no spam or exhaustive bandwidth consumption
Usenet News
DNS
The emergence of P2P can be seen as a renaissance
of the original Internet model
What is P2P?
Every participating node acts as both a client and a server (“servent”)
Every node “pays” its participation by providing access to (some of) its resources
no central coordination
no central database
no peer has a global view of the system
global behavior emerges from local interactions
all existing data and services are accessible from any peer
peers are autonomous
peers and connections are unreliable
Where is P2P – System layers ?
[Figure: stack of system layers, each with its own QoS; every layer uses/exploits the layer below.]
User layer: users
Application layer: e-commerce systems can be P2P or centralized
Information management layer: P2P or centralized
Network layer (Internet): networks are P2P
Types of P2P Systems
E-commerce systems: …
File sharing systems: Napster, Gnutella, Freenet, …
Distributed databases: Mariposa [Stonebraker96], …
Networks: Arpanet, …
How much P2P is involved?
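[Table: extent of P2P per system. Recoverable column headings: information management, interaction. Recoverable rows: Napster (no, yes, yes); Gnutella, Freenet.]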
Related Approaches
Related distributed information system approaches:
Event-based systems
Push systems
Mobile agents
Event-based (publish/subscribe)
System model
Components (peers) interact by generating and receiving events
Components declare interest in receiving specific (patterns of) events and are notified upon their occurrence
Supports a highly flexible interaction between loosely-coupled components
[Figure: a component subscribes to the event pattern “X followed by Y”; events X and Y are generated by other components.]
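The subscription model above can be illustrated with a minimal in-process sketch. The `EventBroker` class and its method names are hypothetical and not part of the tutorial; the sketch only shows a plain subscription and a subscription to the pattern "X followed by Y".

```python
from collections import defaultdict
from typing import Callable


class EventBroker:
    """Hypothetical, in-process stand-in for an event distribution infrastructure."""

    def __init__(self) -> None:
        self._simple = defaultdict(list)  # event name -> list of callbacks
        self._sequences = []              # [first, second, callback, armed]

    def subscribe(self, event: str, callback: Callable[[str], None]) -> None:
        """Declare interest in a single event."""
        self._simple[event].append(callback)

    def subscribe_sequence(self, first: str, second: str,
                           callback: Callable[[str], None]) -> None:
        """Declare interest in the pattern 'first followed by second'."""
        self._sequences.append([first, second, callback, False])

    def publish(self, event: str) -> None:
        """Generate an event and notify every matching subscriber."""
        for cb in self._simple[event]:
            cb(event)
        for seq in self._sequences:
            first, second, cb, armed = seq
            if armed and event == second:
                cb(f"{first} followed by {second}")
                seq[3] = False            # pattern consumed, wait for next 'first'
            elif event == first:
                seq[3] = True             # first half of the pattern seen


log = []
broker = EventBroker()
broker.subscribe("X", log.append)
broker.subscribe_sequence("X", "Y", log.append)
for e in ["Y", "X", "Y"]:
    broker.publish(e)
```

The leading "Y" does not trigger the pattern because no "X" preceded it; subscribers stay passive and are only called back when a match occurs.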
Event-based vs. Peer-to-Peer
Common properties:
dynamic binding between producers and consumers
Subscription to events ~ “passive” queries
EB: notification
P2P: active discovery
Subscription language supports more sophisticated queries and pattern matching (event patterns with time dependencies)
Event-based systems typically have a specialized event distribution infrastructure
EB: 2 node types, P2P: 1 node type
EB infrastructure must be deployed
Push Systems
A set of designated broadcasters offer information that is pre-grouped in channels (weather, news, etc.)
Receivers subscribe to channels of their interest and receive channel information as it is being “broadcast” (timely distribution)
Receivers may have to pay prior to receiving the information (pay-per-view, flat fee, etc.)
[Figure: pull versus push.]
Push Systems vs. Peer-to-Peer
Asymmetric communication style (P2P: symmetric)
Focus is on timely data distribution (not on discovery)
Filtering may be deployed to reduce data
Subscription to channels is prerequisite
Producer/consumer binding is static
Push systems require a specialized distribution infrastructure
Push: 2 node types, P2P: 1 node type
Push infrastructure must be deployed
Mobile Agents
A mobile agent is a computational entity that moves around in a network at its own volition to accomplish a task on behalf of its owner
“learns” (“Whom to visit next?”)
Mobility (heterogeneous network!)
Weak: code, data
Strong: code, data, execution stack
Mobile Agents vs. Peer-to-Peer
Very similar in terms of search and navigation
MA: the nodes propagate the agents
Mobile agent ~ “active” query
Mobile agent systems require a considerably more sophisticated environment
security (protect the receiving node from malicious mobile agents and vice versa)
In many domains P2P systems can take over
more apt for distributed data management (… bandwidth, security, etc.)
Distributed Databases
Fragmenting large databases (e.g., relational) over …
Efficient processing of complex queries (e.g., SQL) by decomposing them
Efficient update strategies (e.g., lazy vs. eager)
Consistent transactions (e.g., 2-phase commit)
Normally approaches rely on central coordination
Distributed Databases vs. Peer-to-Peer
Data distribution is a key issue for P2P systems
Approaches in distributed DB that address scalability
LH* family of scalable hash index structures [Litwin97]
Snowball: scalable storage system for workstation clusters [Vingralek98]
Fat-Btree: a scalable B-Tree for parallel DB [Yokota9…]
Approaches in distributed DB that address autonomy (and scalability)
Mariposa: distributed relational DBMS based on an underlying economic model [Stonebraker96]
P2P Data Management has to address both scalability and autonomy
Usage Patterns to Position P2P
Discovering information is the predominant problem
…
– ad hoc requests, irregular
– E.g., new town: where is the next car rental?
Notification: event-based systems
– notification for (correlated) events (event patterns)
– E.g., notify me when my stocks drop below a threshold
– push
Systematic discovery: P2P systems
– find certain type of information on a regular basis
– search engines, MA
– E.g., …
Continuous information feed: push systems
– subscription to a certain information type
– event-based
– E.g., sports channel, updates are sent as soon as available
The Interaction Spectrum
[Figure: spectrum from passive to active interaction, positioning push systems, event-based systems, peer-to-peer systems, and mobile agents.]
Peer-to-Peer vs. C/S and Web
                    Session-based   Web-based       Peer-to-Peer
Coupling            tight           loose           very loose
Comm. style         asymmetric      asymmetric      symmetric
Number of clients   … (1000)        … (1,000,000)   high (1,000,000)
Number of servers   few (10)        many (…)        none (0)
Coupling vs. Scalability
[Figure: coupling (vertical axis) versus scalability (horizontal axis) for session-based, push-based, web-based, and peer-to-peer systems.]
P2P System Models
Centralized model
– … (single point of failure)
– direct contact between requestors and providers
– Example: Napster
Decentralized model
– Examples: Freenet, Gnutella
– no global index, no central coordination, global behavior emerges from local interactions, etc.
– direct contact between requestors and providers (Gnutella) or mediated by a chain of intermediaries (Freenet)
Hierarchical model
– introduction of “super-peers”
– mix of centralized and decentralized model
Centralized Information Systems
Web search engine
Example: Google
– 150 Mio. searches/day
– 1-2 Terabytes of data (April 2001)
[Figure: clients around a central Google server (Google: 15000 servers); (1) a client sends the query "aberer", (2) the server returns "home page of Karl Aberer …".]
Strengths
– Global ranking
– Fast response time
Weaknesses
– Infrastructure, administration, cost
– A new company for every global application ?
(Semi-)Decentralized Information Systems
P2P music file sharing
Example: Napster
– 1.57 Mio. users
– 10 Terabytes of data (2 Mio. songs, 220 songs per user) (February 2001)
[Figure: peers around a central Napster server (Napster: 100 servers). (1) Request: Find <title> "brick in the wall" <artist> "pink floyd" <size> "1 MB" <category> "rock" (schema). (2) Result: you find f.mp3 at peer X. (3) Request and transfer file from peer X directly.]
Lessons Learned from Napster
Strengths: Resource Sharing
– Every node “pays” its participation by providing access to its resources
– physical resources (disk, network), knowledge (annotations), ownership (files)
– Every participating node acts as both a client and a server (“servent”): P2P
– global information system without huge investment
– decentralization of cost and administration = avoiding resource bottlenecks
Weaknesses: Centralization
– server is single point of failure
– copying copyrighted material made Napster target of legal attack
[Figure: axis of increasing degree of resource sharing and decentralization, from Centralized System to Decentralized System.]
Fully Decentralized Information Systems
P2P file sharing
Example: Gnutella
– 40,000 nodes, 3 Mio. files (August 2000)
Strengths
– Good response time, scalable
– No infrastructure, no administration
– No single point of failure
Weaknesses
– High network traffic
– No structured search
– Free-riding
[Figure: self-organizing system without servers (Gnutella: no servers); one peer floods Find "brick in the wall", another answers I have "brick_in_the_wall.mp3".]
Self-Organization
Self-organized systems well known from physics, …
distribution of control (= decentralization = symmetry in roles = P2P)
local interactions, information and decisions
emergence of global structures
failure resilience
P2P Architectures
Principle of self-organization can be applied at different layers
Layer               Function                                               Examples
Networking Layer    Internet Routing                                       TCP/IP, DNS
Data Access Layer   Overlay Networks, Resource Location                    Gnutella, FreeNet
Service Layer       P2P applications: Messaging, Distributed Processing    Napster, Seti, Groove
User Layer          User Communities, Collaboration                        eBay, Ciao
Original Internet designed as decentralized system:
– … the Internet
– support application-specific addresses
Resource Location in P2P Systems
Problem: Peers need to locate distributed information
– Peers with address p store data items d that are identified by a key k_d
– Given a key k_d (or a predicate on k_d) locate a peer that stores d, i.e. locate the index information (k_d, p)
Thus, the data we have to manage consists of the key-value pairs (k_d, p)
Can such a distributed database be maintained and accessed by a set of peers without central control ?
[Figure: peers P1 to P8; one peer asks k_d = "jingle" ?; P8 stores d = "jingle-bells.mp3" with k_d = "jingle-bells", p = "P8"; the answer is ("jingle", P8).]
Resource Location Problem
Operations
– search for a key at a peer: p->search(k)
– update a key at a peer: p->update(k,p')
– peers joining and leaving the network: p->join(p')
Performance Criteria (for search)
– message bandwidth, e.g. messages(query) ≈ Log(size(database)), messages(update) ≈ Log(size(database))
– storage space used, e.g. storagespace(peer) ≈ Log(size(database))
– resilience to failures (network, peers)
Qualitative Criteria
– complex search predicates: equality, prefix, containment, similarity search
– use of global knowledge
– peer autonomy
– peer anonymity and trust
– security (e.g. denial of service attacks)
Summary
What is a P2P System ?
What is emergence ?
At which layers can the P2P architecture occur ?
… location system ?
Unstructured P2P Overlay Networks
No index information is used
– i.e. the information (k, p) is only available directly from p
Simplest approach: Message Flooding (Gossiping)
– send query message to C neighbors
– forwarding is bounded by a time-to-live (TTL)
– messages have IDs to eliminate cycles
[Figure: flooding a query for k="jingle-bells".]
Example: C=3, TTL=2
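A minimal sketch of TTL-bounded flooding, assuming an in-memory overlay given as an adjacency dictionary. The function name `flood_search`, the peer names, and the toy topology are illustrative assumptions, not Gnutella code; a `seen` set plays the role of message IDs.

```python
from collections import deque


def flood_search(neighbors, data, start, key, ttl):
    """Flood a query from `start`; return (peers holding `key`, messages sent).

    neighbors: dict peer -> list of neighbor peers (the overlay)
    data:      dict peer -> set of keys stored at that peer
    """
    seen = {start}                    # peers that already saw this message ID
    hits = set()
    messages = 0
    queue = deque([(start, None, ttl)])
    while queue:
        peer, sender, remaining = queue.popleft()
        if key in data.get(peer, ()):
            hits.add(peer)
        if remaining == 0:            # TTL exhausted: do not forward further
            continue
        for nxt in neighbors[peer]:
            if nxt == sender:         # never send the query straight back
                continue
            messages += 1             # every forwarded copy costs one message
            if nxt not in seen:       # duplicates are recognized by ID and dropped
                seen.add(nxt)
                queue.append((nxt, peer, remaining - 1))
    return hits, messages


# Hypothetical overlay: P1 has C = 3 neighbors.
overlay = {
    "P1": ["P2", "P3", "P4"], "P2": ["P1", "P5"], "P3": ["P1", "P6"],
    "P4": ["P1", "P7"], "P5": ["P2", "P8"], "P6": ["P3"],
    "P7": ["P4"], "P8": ["P5"],
}
stored = {"P6": {"jingle-bells"}, "P8": {"jingle-bells"}}
near = flood_search(overlay, stored, "P1", "jingle-bells", ttl=2)
far = flood_search(overlay, stored, "P1", "jingle-bells", ttl=3)
```

With TTL=2 only the copy at P6 is found; P8 is three hops away and needs TTL=3, which illustrates that a flooding search gives no guarantee of finding existing data.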
Gnutella
Developed in a 14 days “quick hack” by Nullsoft (winamp)
Originally intended for exchange of recipes
Evolution of Gnutella
– Published under GNU General Public License on the Nullsoft web server
– Taken off after a couple of hours by AOL (owner of Nullsoft)
– Gnutella protocol was reverse engineered from downloaded versions of the original Gnutella software
– Third-party clients were published and Gnutella started to spread
Based on message flooding
– Typical values C=4, TTL=7
– One request leads to $2 \cdot \sum_{i=0}^{TTL} C \cdot (C-1)^i = 26{,}240$ messages
– A new servent must know at least one Gnutella host (gnutellahosts.com:6346; outside the Gnutella protocol specification)
– Neighbors are found using a basic discovery protocol (ping-pong messages)
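The message count for the typical values can be checked numerically. The sketch below assumes the expression on this slide is twice the sum of C·(C−1)^i for i from 0 to TTL, a reading that reproduces the quoted 26,240; the function name is illustrative.

```python
def flooding_messages(c: int, ttl: int) -> int:
    """Messages caused by one request under the assumed formula.

    2 * sum_{i=0}^{ttl} c * (c - 1)**i, with c connections per peer.
    """
    return 2 * sum(c * (c - 1) ** i for i in range(ttl + 1))


typical = flooding_messages(c=4, ttl=7)
```

Because the sum is geometric in (C−1), each extra unit of TTL roughly triples the traffic when C=4.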
Gnutella: Protocol Message Types
Type      Description                                             Contained information
Ping      Announce availability and probe for other servents      None
Pong      Response to a ping                                      IP address and port# of responding servent; …
Query     Search request                                          Minimum network bandwidth of responding servent; search criteria
QueryHit  Returned by servents that have …                        IP address, port# and network bandwidth of …; result set
Push      File download requests for servents behind a firewall   Servent identifier; index of requested file; IP address and port to send file to
Gnutella: Meeting Peers (Ping/Pong)
[Figure: servents A, B, C, D, E; A's ping is forwarded through the network and the pongs of B, C, D, and E are returned to A.]
Gnutella: Searching (Query/QueryHit/GET)
[Figure: servents A, B, C, D, E; a query for X.mp3 is flooded, C's and E's query hits for X.mp3 are returned, and the file is fetched directly with GET X.mp3.]
Popularity of Queries [Sripanidkulchai01]
Very popular documents are approximately equally popular
Less popular documents follow a Zipf-like distribution (i.e., the probability of seeing a query for the i-th most popular query is proportional to $1/i^{\alpha}$)
Access frequency of web documents also follows Zipf-like distributions: caching might work for Gnutella
Free-riding on Gnutella [Adar00]
24 hour sampling period:
… of Gnutella users share no files
50% of all responses are returned by top 1% of sharing hosts
Problems:
– Degradation of system performance: collapse?
– Increase of system vulnerability
– “Centralized” (“backbone”) Gnutella: copyright issues?
H1: A significant portion of Gnutella peers are free riders.
H2: Free riders are distributed evenly across domains
H3: Often hosts share files nobody is interested in (are not downloaded)
Free-riding Statistics - 1 [Adar00]
H1: Most Gnutella users are free riders
22,084 (66%) of the peers share no files
24,347 (73%) share ten or less files
Top 1 percent (333) hosts share 37% (1,142,645) of total files shared
Top 5 percent (1,667) hosts share 70% (1,142,645) of total files shared
Top 10 percent (3,334) hosts share 87% (2,692,082) of total files shared
Free-riding Statistics - 2 [Adar00]
H3: Many servents share files nobody downloads
Of 11,585 sharing hosts:
– Top 1% of sites provide nearly 47% of all answers
– Top 25% of sites provide 98% of all answers
– 7,349 (63%) never provide a query response
Topology of Gnutella [Jovanovic01]
Small-world properties verified (“find everything close by”)
Backbone + outskirts
Gnutella Backbone [Jovanovic01]
Categories of Queries [Sripanidkulchai01]
Categorized top 20 queries
Caching in Gnutella [Sripanidkulchai01]
Average bandwidth consumption in tests: 3.5Mbps
Best case: trace 2 (73% hit rate = 3.7 times traffic reduction)
Gnutella: Bandwidth Barriers
Clip2 measured Gnutella over 1 month:
– typical query is 560 bits long (including headers)
– 25% of the traffic are queries, 50% pings, 25% other
– on average each peer seems to have 3 other peers actively connected
Clip2 found a scalability barrier with substantial …
  10 queries/sec
* 560 bits/query
* 4 (to account for the other 3 quarters of message traffic)
* 3 simultaneous connections
= 67,200 bps
10 queries/sec maximum in the presence of many dialup users
won’t improve (more bandwidth - larger files)
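The barrier arithmetic can be reproduced in a few lines. The factor of 4 is inferred from queries being 25% of the traffic, and the 56,000 bps dial-up capacity used for comparison is an assumption added here, not a figure from the slide.

```python
def required_bps(queries_per_sec: int, bits_per_query: int,
                 traffic_multiplier: int, connections: int) -> int:
    """Bandwidth a peer needs to keep up with the given query rate."""
    return queries_per_sec * bits_per_query * traffic_multiplier * connections


# Queries are 25% of all traffic, so total traffic is 4x the query traffic.
barrier_bps = required_bps(queries_per_sec=10, bits_per_query=560,
                           traffic_multiplier=4, connections=3)

ASSUMED_DIALUP_BPS = 56_000   # assumption: nominal capacity of a dial-up modem
saturated = barrier_bps > ASSUMED_DIALUP_BPS
```

Under these assumptions a dial-up peer is already saturated at about 10 queries per second, which matches the barrier stated on the slide.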
Gnutella: Summary
Completely decentralized
High fault tolerance
Adapts well and dynamically to changing peer populations
Protocol causes high network traffic (e.g., 3.5Mbps). For example:
– 4 connections (C) per peer, TTL = 7
– 1 ping packet can cause $2 \cdot \sum_{i=0}^{TTL} C \cdot (C-1)^i = 26{,}240$ packets
No estimates on the duration of queries can be given
No probability for successful queries can be given
Free riding is a problem
Reputation of peers is not addressed
Simple, robust, and scalable (at the moment)
Modern Gnutella
Lots of improvements
Hybrid Super-Peer architecture
“Gnutella + DHT”
Improvements of Message Flooding
Expanding Ring
– start search with small TTL (e.g. TTL = 1)
– if no success, iteratively increase TTL (e.g. TTL = TTL + 2)
k-Random Walkers
– forward query to one randomly chosen neighbor only, with large TTL
– start k random walkers
– random walker periodically checks with requester whether to continue
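The expanding-ring strategy can be sketched as repeated TTL-bounded floods. The helper names, the line-shaped toy overlay, and the choice to count contacted peers instead of individual messages are illustrative assumptions.

```python
from collections import deque


def flood(neighbors, data, start, key, ttl):
    """TTL-bounded flood; return (peers holding key, number of peers contacted)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        if dist[peer] == ttl:          # do not forward beyond the TTL horizon
            continue
        for nxt in neighbors[peer]:
            if nxt not in dist:
                dist[nxt] = dist[peer] + 1
                queue.append(nxt)
    hits = {p for p in dist if key in data.get(p, ())}
    return hits, len(dist) - 1


def expanding_ring(neighbors, data, start, key, max_ttl, step=2):
    """Start with TTL = 1 and enlarge the ring by `step` until something is found."""
    ttl, contacted_total = 1, 0
    while ttl <= max_ttl:
        hits, contacted = flood(neighbors, data, start, key, ttl)
        contacted_total += contacted   # earlier rings are searched again
        if hits:
            return hits, ttl, contacted_total
        ttl += step
    return set(), None, contacted_total


line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
found = expanding_ring(line, {"D": {"x"}}, "A", "x", max_ttl=7)
```

Nearby data is found cheaply with a small ring, while distant data costs more than a single large flood because the inner rings are visited repeatedly.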
Discussion Unstructured Networks
Performance
Message Bandwidth: high
– improvements through random walkers, but essentially the …
Storage cost: low (only local neighborhood)
Update and maintenance cost: low (only local updates)
Resilience to failures good: multiple paths are explored and data is replicated
Qualitative Criteria
search predicates: very flexible, any predicate is possible
global knowledge: none required
peer autonomy: high
Summary
How are unstructured P2P networks characterized ?
What is the purpose of the ping/pong messages in Gnutella ?
Why is search latency in Gnutella low ?
Which are methods to reduce message bandwidth …
Hierarchical P2P Overlay Networks
Servers provide index information, i.e. the … servers
Simplest Approach
– one central server
– users register files
– index server
– … as P2P architecture
[Figure: peers query a central index server for k="jingle-bells".]
Napster
Central (virtual) database which holds an index of offered MP3/WMA files
Clients connect to this server, identify themselves (account) and send a list of MP3/WMA files they are sharing (C/S)
Other clients can search the index and learn from which …
Additional services at server (chat etc.)
[Figure: A sends register (user, files) to the server; B searches and is told “A has X.mp3”; B downloads X.mp3 from A.]
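A central index of this kind can be sketched in a few lines. The class `IndexServer` and its methods are hypothetical names for illustration and do not reflect the actual Napster protocol; only the index lives on the server, while the file transfer would happen between peers.

```python
class IndexServer:
    """Hypothetical central index: file name -> users currently offering it."""

    def __init__(self) -> None:
        self._index = {}

    def register(self, user: str, files: list) -> None:
        """A client connects and announces the files it shares (C/S step)."""
        for name in files:
            self._index.setdefault(name, set()).add(user)

    def unregister(self, user: str) -> None:
        """A client disconnects; its entries disappear from the index."""
        for holders in self._index.values():
            holders.discard(user)

    def search(self, term: str) -> dict:
        """Return matching file names with the users to download from (P2P step)."""
        term = term.lower()
        return {name: sorted(holders)
                for name, holders in self._index.items()
                if term in name.lower() and holders}


server = IndexServer()
server.register("A", ["X.mp3", "Y.mp3"])
server.register("C", ["X.mp3"])
result = server.search("x.mp3")
```

Substring matching is an arbitrary choice here; because the whole index sits in one place, any search predicate is easy to support, at the price of a single point of failure.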
Superpeer Networks
Improvement of Central Index Server (Morpheus, Kazaa)
– multiple index servers build a P2P network
– clients are associated with one (or more) superpeers
– superpeers use message flooding to forward search requests
Experiences
– redundant superpeers are good
– superpeers should have high outdegree (> 20)
Discussion
Performance
Message Bandwidth: low
– with superpeers flooding occurs, but the number of …
Storage cost: low at client, high at index server
Update cost: low (no replication)
Resilience to failures: bad (system has single point of failure)
Qualitative Criteria
search predicates: very flexible, any predicate is possible
global knowledge: server
peer autonomy: low
Summary
Which are the two levels of P2P networks in … are they related ?
Which problem of distribution is avoided in superpeer networks and addressed in structured …
… between nodes and functional layers ?
Structured P2P Overlay Networks
Unstructured overlay networks – what we learned
robustness (almost impossible to “kill” – no central authority)
Performance
– search latency O(log n), n number of peers
Drawbacks
– tremendous bandwidth consumption for search
– free riding
Efficient Resource Location
[Figure: update cost (low to high) versus search cost (low to high, up to maximal bandwidth), positioning full replication, structured P2P overlay networks (e.g. prefix routing), unstructured P2P overlay networks (e.g. Gnutella), and server-based systems (e.g. Napster).]
Distribution of Index Information
Goal: provide efficient search using few messages without using designated servers
… maintains and provides part of the index information (k, p)
Difficult: distributing the data access structure to support efficient search
[Figure: left, a server holds the data access structure and the index information I above the peers (storing data), and search starts at the server; right, peers store data and index information I1 to I4, raising the question of where to start the search.]
Approaches
Different strategies
– Chord: constructing a distributed hash table
– CAN: Routing in a d-dimensional space
– Freenet: caching index information along search paths
Commonalities
– each peer maintains a small part of the index information (routing table)
– searches performed by directed message forwarding
Differences
– performance and qualitative criteria
P-Grid
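Search tree (prefix tree)
[Figure: binary prefix tree with root ???, inner nodes 0??, 1?? and 00?, 01?, 10?, 11?, and leaves 000 to 111; a search for data key 101 descends from the root to the leaf 101.]
N objects: log2(N) steps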
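The log2(N) step count can be illustrated with a small binary prefix tree. The nested-dictionary representation and the function names are illustrative choices, not part of P-Grid.

```python
import math


def build_prefix_tree(keys):
    """Build a binary prefix tree (trie) as nested dicts keyed by '0' and '1'."""
    root = {}
    for key in keys:
        node = root
        for bit in key:
            node = node.setdefault(bit, {})
        node["data"] = key            # leaf marker; cannot clash with '0'/'1'
    return root


def search(root, key):
    """Descend one level per bit; return (stored key or None, steps taken)."""
    node, steps = root, 0
    for bit in key:
        if bit not in node:
            return None, steps
        node = node[bit]
        steps += 1
    return node.get("data"), steps


keys = [format(i, "03b") for i in range(8)]   # 000 ... 111, N = 8 objects
tree = build_prefix_tree(keys)
found, steps = search(tree, "101")
```

For N = 8 keys of 3 bits the search takes 3 = log2(8) steps, one per tree level.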
Scalable data access structures
Assume number of data objects >> storage of one node
– Distributed storage
Given a data access structure
– Size of data access structure = number of data objects
– Size of data access structure >> storage of one node
Non-scalable Distribution of Search Tree
• Distribute search tree over peers
???
0?? 1??
00? 01? 10? 11?
000 001 010 011 100 101 110 111
peer 1 peer 2 peer 3 peer 4
Scalable Distribution of Search Tree
"Napster"
bottleneck
???
0?? 1??
00? 01? 10? 11?
000 001 010 011 100 101 110 111
peer 1 peer 2 peer 3 peer 4
Scalable data access structures
Associate each peer with a complete path
???
0?? 1??
00? 01? 10? 11?
000 001 010 011 100 101 110 111
peer 1 peer 2 peer 3 peer 4
Scalable data access structures
[Figure: the complete path ??? / 1?? / 10? / 100, 101 associated with one peer; at each level of the path the peer references another peer (peers 1 to 4 are shown) that “knows more about this part of the tree”.]
The result is P-Grid
Peers cooperate in search
[Figure: a query “101 ?” is issued at one peer; because that peer's path does not match, the query is sent as a message to peer 3 using the references of the routing table; peer 3 answers “101 !”.]
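The cooperative search can be sketched as prefix routing between peers that each hold one complete path. This is a simplified illustration: the assignment of the paths 00, 01, 10, 11 to peers 1 to 4, the `Peer` class, and the single reference per level are assumptions, and keys are assumed to be at least as long as the peer paths.

```python
class Peer:
    """Peer responsible for all keys starting with `path`."""

    def __init__(self, name: str, path: str) -> None:
        self.name = name
        self.path = path
        self.refs = {}   # level -> a peer whose path differs from ours at that bit


def common_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def route(peer, key, hops=None):
    """Forward the query until a peer whose path is a prefix of `key` is reached."""
    hops = [] if hops is None else hops
    hops.append(peer.name)
    level = common_prefix_len(peer.path, key)
    if level == len(peer.path):          # our path matches: we are responsible
        return peer, hops
    # The key differs from our path at bit `level`; ask a peer on the other side.
    return route(peer.refs[level], key, hops)


p1, p2, p3, p4 = (Peer("peer 1", "00"), Peer("peer 2", "01"),
                  Peer("peer 3", "10"), Peer("peer 4", "11"))
p1.refs = {0: p3, 1: p2}
p2.refs = {0: p4, 1: p1}
p3.refs = {0: p1, 1: p4}
p4.refs = {0: p2, 1: p3}

owner, hops = route(p1, "101")
```

Every hop fixes at least one more bit of the key, so the number of hops is bounded by the path length, and no peer has to store the whole tree.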
Construction
Splitting Approach (P-Grid)
– peers meet and decide whether to extend search tree by splitting the data space
– peers can perform load balancing considering their storage load
– networks with different origins can merge, like Gnutella, Freenet
Node Insertion Approach (Chord, CAN, …)
– nodes route from a gateway node to their node-id to populate the routing table
Replication of data items and routing table entries is used to increase failure resilience
P-Grid Discussion
Performance
Message Bandwidth: O(log n) (selective routing)
Storage cost: O(log n) (routing table)
Update cost: low (like search)
Qualitative Criteria
search predicates: prefix searches
global knowledge: key hashing
peer autonomy: peers can locally decide on their role (splitting decision)
DHT example: Chord
Hashing of search keys AND peer addresses on binary keys of length m
– e.g. m=8, key("jingle-bells.mp3")=17, key(196.178.0.1)=3
Data keys are stored at next larger node key
– peer with hashed identifier p, data with hashed identifier k, then k ∈ ]predecessor(p), p]
[Figure: identifier circle (labels: m=8, 32 keys) with peers p, p2, p3; key k is stored at p, which follows its predecessor on the circle.]
Search possibilities
1. every peer knows every other: O(n) routing table size
2. peers know their successor: O(n) search cost
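The placement rule "data keys are stored at next larger node key" can be sketched as follows. Truncating a SHA-1 digest to m bits and the example node keys are assumptions for illustration; the key values on the slide (17 and 3) are illustrative and are not reproduced by this hash.

```python
import hashlib
from bisect import bisect_left

M = 8  # key length in bits, as in the example above


def chord_key(name: str, m: int = M) -> int:
    """Hash a data name or a peer address onto an m-bit key (assumed scheme)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def successor(node_keys, k):
    """First node key >= k on the identifier circle, wrapping around at the end.

    `node_keys` must be sorted in ascending order.
    """
    i = bisect_left(node_keys, k)
    return node_keys[i % len(node_keys)]


nodes = [3, 40, 120, 200]          # hypothetical hashed peer addresses
responsible = successor(nodes, 17)  # peer that stores the data item with key 17
```

Key 17 lies in ]3, 40], so it is stored at the peer with key 40; keys above 200 wrap around to the peer with key 3.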
Routing Tables
Every peer knows m peers with exponentially increasing distance
Each peer p stores a routing table
– First peer with hashed identifier s_i such that $s_i = successor(p + 2^{i-1})$ for i=1,..,m
– We write also s_i = finger(i, p)
[Figure: identifier circle with the points p+1, p+2, p+4, p+8, p+16; s1, s2, s3 point to p2, s4 to p3, s5 to p4.]
Routing table of p:
i   s_i
1   p2
2   p2
3   p2
4   p3
5   p4
O(log n) routing table size
Search
search(p, k)
  find in routing table largest (i, p*) such that p* ∈ [p, k[
  /* largest peer key smaller than the searched data key */
  if such a p* exists then search(p*, k)
  else return (successor(p)) // found
[Figure: identifier circle with p, the fingers s1 to s5, and two searched keys k1 and k2.]
O(log n) search cost (with high probability)
– RT with exp. increasing distance
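A runnable sketch of finger tables and finger-based search on a small identifier circle. The `Ring` class holds global knowledge of all node keys purely for simulation; it uses the open interval ]p, k[ when picking the forwarding finger, and the node keys are made-up values.

```python
from bisect import bisect_left


class Ring:
    """Simulated identifier circle with m-bit keys (global view, for testing only)."""

    def __init__(self, node_keys, m):
        self.m = m
        self.size = 2 ** m
        self.nodes = sorted(node_keys)

    def successor(self, k):
        """First node key >= k, wrapping around the circle."""
        i = bisect_left(self.nodes, k % self.size)
        return self.nodes[i % len(self.nodes)]

    def fingers(self, p):
        """Routing table of p: s_i = successor(p + 2^(i-1)) for i = 1..m."""
        return [self.successor(p + 2 ** (i - 1)) for i in range(1, self.m + 1)]

    def _between(self, x, a, b):
        """True if x lies in the open circle interval ]a, b[."""
        if a < b:
            return a < x < b
        return x > a or x < b          # interval wraps around zero

    def search(self, p, k, hops=0):
        """Return (peer responsible for key k, number of forwarding hops)."""
        succ = self.successor(p + 1)
        if k == succ or self._between(k, p, succ):   # k in ]p, succ]: found
            return succ, hops
        for f in reversed(self.fingers(p)):          # try the farthest finger first
            if self._between(f, p, k):               # largest peer key before k
                return self.search(f, k, hops + 1)
        return succ, hops


ring = Ring([1, 4, 9, 14, 21, 28], m=5)   # 32 keys, six hypothetical peers
table = ring.fingers(1)
owner, hops = ring.search(1, 26)
```

Starting at peer 1, the query for key 26 jumps directly to the finger 21, whose successor 28 is responsible, so a single forwarding hop suffices.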
Node Insertion
New node q joining the network
– q asks existing node p to find predecessor and fingers
– cost: O(log² n)
[Figure: identifier circle with q inserted between p+2 and p+4, followed by p2, p3, p4.]
Routing table of p:      Routing table of q:
i   s_i                  i   s_i
1   q                    1   p2
2   q                    2   p2
3   p2                   3   p3
4   p3                   4   …
5   p4                   5   p4
Load Balancing in Chord
10^4 nodes, 5·10^5 keys
uniform data distribution: 50 keys per node?
NO, as IP addresses do not map uniformly into data key space.
Length of Search Paths
2^12 nodes, 100·2^12 keys
Path length ½ Log2(n)
RTs can be seen as an embedding of search trees into the network, and thus search starts at a randomly selected tree depth
Chord Discussion
8/3/2019 P2P Tutorial
http://slidepdf.com/reader/full/p2p-tutorial 75/180
ta te se esea c st tute www.deri.ie
Performance
Node join/leave cost: O(log^2 n)
Resilience to failures: replication to successor nodes
Qualitative Criteria
search predicates: equality of keys only
global knowledge: key hashing, network origin
peer autonomy: nodes have by virtue of their address a specific role in the network
Topological Routing (CAN)
Based on hashing of keys into a d-dimensional space (a torus)
Each peer is responsible for keys of a subvolume of the space (a zone)
Each peer stores the addresses of peers responsible for the neighboring zones for routing
Search requests are greedily forwarded to the peers in the closest zones
Assignment of peers to zones depends on a random selection made by the peer
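Greedy forwarding on the torus can be sketched as follows. This is a simplified illustration, assuming all zones are equal-sized cells of a d-dimensional grid with `side` cells per dimension (real CAN zones differ in size); the function names are chosen for this example.

```python
def torus_dist(a, b, side):
    """Manhattan distance between two zones on a torus with `side` cells per dimension."""
    return sum(min((x - y) % side, (y - x) % side) for x, y in zip(a, b))

def neighbors(zone, side):
    """Zones adjacent to `zone`: one step forward or backward in each dimension."""
    for dim in range(len(zone)):
        for step in (-1, 1):
            n = list(zone)
            n[dim] = (n[dim] + step) % side
            yield tuple(n)

def greedy_route(start, target, side):
    """Forward to the neighbouring zone closest to the target until it is reached."""
    path = [start]
    while path[-1] != target:
        path.append(min(neighbors(path[-1], side),
                        key=lambda z: torus_dist(z, target, side)))
    return path

path = greedy_route((0, 0), (3, 6), side=8)
print(len(path) - 1)  # 5 hops: 3 steps in the first dimension, 2 (wrapping) in the second
```

Each peer only needs its 2d neighbours, which is why the routing table size depends on d and not on the number of peers.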
Network Search and Join
Node joins the network by choosing a coordinate in the volume => O(d) updates of RTs
CAN Refinements
Multiple Realities
Nodes hold a zone in each of them
Creates r replicas of the (key, value) pairs
Increases robustness
Reduces path length as search can be continued in the reality where the target is closest
Overloading zones
Different peers are responsible for the same zone
Splits are only performed if a maximum occupancy (e.g. 4) is reached
Nodes know all other nodes in the same zone
But only one of the neighbors
CAN Path Length
CAN Discussion
Performance
Search latency: O(d n^(1/d)) (with high probability, provable)
Message Bandwidth: O(d n^(1/d)) (selective routing)
Update cost: low (like search)
Node join/leave cost: O(d n^(1/d))
Resilience to failures: realities and overloading
Qualitative Criteria
search predicates: spatial distance of multidimensional keys
global knowledge: key hashing, network origin
peer autonomy: nodes can decide on their position in the key space
Dynamical Clustering (Freenet)
Freenet Background
P2P system which supports publication, replication, and retrieval
Protects anonymity of authors and readers: infeasible to determine the origin or destination of data
Nodes are not aware of what they store (keys and files are sent and stored encrypted)
Uses an adaptive routing and caching strategy
Key                   Data                  Address
8e47683isdd0932uje89  ZT38hwe01h02hdhgdzu   tcp/125.45.12.56:6474
456r5wero04d903iksd0  Rhweui12340jhd091230  tcp/67.12.4.65:4711
f3682jkjdn9ndaqmmxia  eqwe1089341ih0zuhge3  tcp/127.156.78.20:8811
wen09hjfdh03uhn4218   erwq038382hjh3728ee7  tcp/78.6.6.7:2544
712345jb89b8nbopledh                        tcp/40.56.123.234:1111
d0ui43203803ujoejqhh                        tcp/128.121.89.12:9991
Freenet Routing
If a search request arrives
Either the data is in the table
Or the request is forwarded to the addresses with the most similar keys (lexicographic similarity, edit distance) till an answer is found or TTL reached (e.g. TTL = 500)
If an answer arrives
The key, address and data of the answer are inserted into the table
Quality of routing should improve over time
Node is listed under certain key in routing tables
Therefore gets more requests for similar keys
Therefore tends to store more entries with similar keys (clustering) when receiving results and caching them
Dynamic replication of data
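The routing and caching behaviour described above can be sketched as a depth-first search with backtracking. This is a toy model, not Freenet's implementation: the `Node` class, the in-memory links between nodes, and the use of `difflib.SequenceMatcher` as the key similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Stand-in for the key similarity measure (assumption: string similarity ratio)."""
    return SequenceMatcher(None, a, b).ratio()

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}    # key -> data stored or cached locally
        self.routes = {}  # key -> node believed to hold that key

    def search(self, key, ttl, visited=None):
        """Return (data, holder) or None; try the most similar keys first."""
        visited = visited if visited is not None else set()
        visited.add(self.name)
        if key in self.data:
            return self.data[key], self
        if ttl == 0:
            return None
        ranked = sorted(self.routes, key=lambda k: similarity(k, key), reverse=True)
        for k in ranked:
            nxt = self.routes[k]
            if nxt.name in visited:
                continue
            found = nxt.search(key, ttl - 1, visited)
            if found is not None:
                value, holder = found
                self.data[key] = value     # cache the answer
                self.routes[key] = holder  # new link to the holder
                return found
        return None

a, b, c = Node("a"), Node("b"), Node("c")
a.routes = {"abc": b, "xyz": c}
c.data["xyy"] = "file"
print(a.search("xyy", ttl=5)[0])  # file
```

After the search, node `a` holds both the data and a direct reference to `c` under the key `xyy`, which is the clustering effect the slide describes.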
Freenet Routing
[Figure: peer p has k, peer p' has k'; a search for k returns the response (k, p), and a new link to p is established]
Freenet: Inserting Files
First the key of the file is calculated
An insert message with this proposed key and a hops-to-live value is sent to the neighbor with the most similar key
Then every peer checks whether the proposed key is already present in its local store
yes: return stored file (original requester must propose new key)
no: route to next peer for further checking (routing uses the same key similarity measure as searching)
continue until hops-to-live are 0 or failure
Freenet: Evolution of Path Length
1000 identical nodes
max 50 data items/node
max 200 references/node
Initial references: (i-1, i-2, i+1, i+2) mod n
each time-step:
- randomly insert
- TTL=20
every 100 time-steps: 300 requests from random nodes and measure actual path length (failure=500)
median path length: 500 => 6
Freenet Discussion
Performance
Message Bandwidth: low (selective routing)
Storage cost: relatively low (experimentally not validated!)
Update cost: low (like search)
– but a bootstrapping phase is required
Resilience to failures: good (replication of data and keys)
Qualitative Criteria
search predicates: with encryption only equality of keys
global knowledge: none
peer autonomy: high (with encryption risk of storing undesired data)
Comparison
System    Paradigm                       Search Type  Search Cost (messages)
Gnutella  Breadth-first                  String       2 * sum_{i=0..TTL} C * (C-1)^i
Freenet   Depth-first search on graph    Equality     O(Log n) ?
Chord     Implicit binary search trees   Equality     O(Log n)
CAN       d-dimensional space            Equality     O(d n^(1/d))
P-Grid    Binary prefix trees            Prefix       O(Log n)
Small World Graphs
Each P2P system can be interpreted as a directed graph (overlay network)
peers correspond to nodes
routing table entries as directed links
Find a decentralized algorithm (greedy routing) to route a message from any node A to any other node B with few hops compared to the size of the graph
Requires the existence of short paths in the graph
Milgram’s Experiment
Finding short chains of acquaintances linking pairs of people in the USA who didn’t know each other
Source person in Nebraska
Target person in Massachusetts
Average length of the chains that were completed was between 5 and 6 steps
“Six degrees of separation” principle
BIG QUESTION: WHY should there be short chains of acquaintances linking together arbitrary pairs of strangers?
Random Graphs
For many years the typical explanation was random graphs
Low diameter: expected distance between two nodes is log_k N, where k is the outdegree and N the number of nodes
When pairs of vertices are selected uniformly at random they are connected by a short path with high probability
But there are some inaccuracies
If A and B have a common friend C it is more likely that they themselves will be friends! (clustering)
Many real world networks (social networks, biological networks in nature, artificial networks – power grid, WWW) exhibit this clustering property
Random networks are NOT clustered.
Clustering
Clustering measures the fraction of neighbors of a node that are connected themselves
Regular Graphs have a high clustering coefficient but also a high diameter
Random Graphs have a low clustering coefficient and a low diameter
Neither model matches the properties expected from real networks!
Regular Graph (k=4): long paths, L ~ n/(2k); highly clustered, C ~ 3/4
Random Graph (k=4): short path length, L ~ log_k N; almost no clustering
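The clustering measure can be computed directly from its definition. The sketch below builds a regular ring lattice (every node linked to its k nearest neighbours) and averages, over all nodes, the fraction of neighbour pairs that are themselves linked; the function names are chosen for this example. Note that for a ring lattice the exact value is 3(k-2)/(4(k-1)), which is 0.5 for k = 4 and approaches the 3/4 quoted above only as k grows.

```python
from itertools import combinations

def ring_lattice(n, k):
    """Regular graph: each node linked to its k nearest neighbours on a ring (k even)."""
    half = k // 2
    return {v: {(v + d) % n for d in range(-half, half + 1) if d != 0}
            for v in range(n)}

def clustering(graph):
    """Average fraction of a node's neighbour pairs that are connected themselves."""
    total = 0.0
    for nbrs in graph.values():
        pairs = list(combinations(nbrs, 2))
        linked = sum(1 for a, b in pairs if b in graph[a])
        total += linked / len(pairs)
    return total / len(graph)

print(clustering(ring_lattice(100, 4)))   # 0.5
print(clustering(ring_lattice(100, 20)))  # about 0.71, moving towards 3/4
```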
Small-World Networks
Random rewiring of regular graph (by Watts and Strogatz)
With probability p rewire each link in a regular graph to a randomly selected node
Resulting graph has properties of both regular and random graphs
– High clustering and short path length
Freenet has been shown to result in small world graphs
Flashback: Freenet Search Performance
Modifying routing tables in Freenet through "..."
Studies show that Freenet graphs have small-world properties
Explains improving search performance
Regular graph:
n nodes, k nearest neighbors
path length ~ n/2k
4096/16 = 256
Random graph:
path length ~ log (n)/log(k)
~ 4
Rewired graph (1% of nodes):
path length ~ random graph
clustering ~ regular graph
Small World Graph
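The two path-length estimates can be checked with a short calculation. The value k = 8 is an inference from the slide's "4096/16 = 256" (n/2k with n = 4096), not something stated explicitly.

```python
import math

def regular_path_length(n, k):
    """Estimate for a regular graph: n / (2k)."""
    return n / (2 * k)

def random_path_length(n, k):
    """Estimate for a random graph: log(n) / log(k)."""
    return math.log(n) / math.log(k)

n, k = 4096, 8  # k inferred from 4096/16 = 256
print(regular_path_length(n, k))           # 256.0
print(round(random_path_length(n, k), 6))  # 4.0
```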
Search in Small World Graphs
BUT! Watts-Strogatz can provide a model for the existence of short paths
It does not explain how the shortest paths are found
also Gnutella networks are small-world graphs
P2P Overlay Networks as Graphs
Each P2P system can be interpreted as a directed graph …
peers correspond to nodes
routing table entries as directed links
… embedded in some space
P-Grid: interval [0,1]
Chord: ring [0,1)
CAN: d-dimensional torus
Freenet: lexicographical distance
Kleinberg’s Small-World Model
Kleinberg’s Small-World model
Embed the graph into an r-dimensional grid
constant number p of short range links (neighborhood)
q long range links: choose long-range links such that the probability to have a long range contact is proportional to 1/d^r
Decentralized (greedy) routing performs best iff r = dimension of space
r = 2
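The long-range link rule can be made concrete with a small sketch. It assumes nodes placed on a line with distance |u - v| (a one-dimensional stand-in for the grid); the function names and the example node set are chosen for illustration.

```python
import random

def link_probabilities(u, nodes, dist, r):
    """Probability of each v != u being chosen as long-range contact, proportional to 1/d(u,v)^r."""
    others = [v for v in nodes if v != u]
    weights = [dist(u, v) ** -r for v in others]
    total = sum(weights)
    return {v: w / total for v, w in zip(others, weights)}

def sample_contact(u, nodes, dist, r, rng=random):
    """Draw one long-range contact for u according to those probabilities."""
    probs = link_probabilities(u, nodes, dist, r)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

line = lambda a, b: abs(a - b)
print(link_probabilities(0, range(5), line, r=1))  # closer nodes are more likely
print(link_probabilities(0, range(5), line, r=0))  # uniform: 0.25 each
```

With r = 0 the choice is uniform, which corresponds to the random-graph case; larger r concentrates the links on nearby nodes.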
Influence of “r” (1)
• Each peer u has a link to the peer v with probability proportional to 1/d(u,v)^r, where d(u,v) is the distance between u and v.
• Optimal value: r = dim = dimension of the space
• If r < dim we tend to choose more far away neighbors (algorithm can quickly approach the neighborhood of the target, but then slows down till it finally reaches the target itself).
• If r > dim we tend to choose more close neighbors (algorithm quickly finds the target in its neighborhood, but reaches it slowly if it is far away).
• When r = 0 – long range contacts are chosen uniformly. Random graph theory proves that there exist short paths between every pair of vertices, BUT there is no decentralized algorithm capable of finding these paths
Influence of “r” (2)
Given node u we can partition the remaining peers into sets A1, A2, A3, …, AlogN, where Ai consists of all nodes whose distance from u is between 2^i and 2^(i+1), i = 0..log N - 1.
Then given r = dim each long range contact of u is nearly equally likely to belong to any of the sets Ai
When q = log N – on average each node will have a link in each set Ai
[Figure: sets A1, A2, A3, A4 at increasing distance around node u]
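The "nearly equally likely" claim can be checked numerically in one dimension, where r = dim = 1. The sketch sums the weights 1/d^r over each distance band [2^i, 2^(i+1)) on one side of u; the helper name `set_mass` is chosen for this example.

```python
def set_mass(i, r=1):
    """Total link weight of nodes at distance d with 2**i <= d < 2**(i+1) (one side of u)."""
    return sum(d ** -r for d in range(2 ** i, 2 ** (i + 1)))

# r = dim = 1: every band carries roughly the same weight (tending to ln 2, about 0.693)
print([round(set_mass(i), 3) for i in range(6)])

# r = 2 > dim: the weight of distant bands shrinks quickly, so far contacts become rare
print([round(set_mass(i, r=2), 3) for i in range(6)])
```

Because each band has about the same total weight when r = dim, q = log N long-range links give on average one contact per band, which is what lets greedy routing halve the remaining distance at each step.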
DHTs and Kleinberg model
[Figure: P-Grid’s model compared with Kleinberg’s model]
Conclusions from Kleinberg's Model
With respect to the Watts and Strogatz model
there is no decentralized algorithm capable of performing effective search
J. Kleinberg presented an infinite family of Small World networks that generalizes the Watts and Strogatz model and showed that decentralized search algorithms can find short paths with high probability
There exists only one unique model within that family for which decentralized algorithms are effective.
With respect to overlay networks
Many of the structured P2P overlay networks are similar to Kleinberg’s model (e.g. Chord, randomized version, q = log N, r = 1)
Unstructured overlay networks also fit into the model (e.g. Gnutella q = 5, r = 0)
Some variants of structured P2P overlay networks have no neighborhood lattice (e.g. P-Grid, p = 0)
Extensions to spaces beyond regular grids are possible (e.g. arbitrary metric spaces)
Summary
How can we characterize P2P overlay networks such that we can study them using graph-theoretic approaches?
What is the main difference between a random graph and a SW graph?
What is the main difference between the Watts/Strogatz and the Kleinberg model?
What is the relationship between structured overlay networks and small world graphs?
What are possible variations of the small world graph model?
Specific problems: Identity management
Definition:
Consistent mapping of a set of attributes onto an identifier in a unique, deterministic, and secure way
Identification is an essential building block in distributed (information) systems
Examples: directory services
DNS: symbolic host names -> IP address
X.500: distinguished name -> object attributes
UDDI: query -> web service specification
Identity management issues
Data management
Centralized vs. distributed data management
– Degree of decentralization (orthogonal to the distribution of data)
Update consistency
Security
Access permissions (+ management)
Requires unique identification
Resilience against attacks
Infrastructure
Third party
Scalability, robustness, 24/7 availability, etc.
Use case: Dynamic IP addresses
Most computers on the Internet have a dynamic IP address
Limited number of IP addresses
Dynamic Host Configuration Protocol (lease time)
Host mobility (physical mobility)
Problem for any system that builds a distributed index
Use Case:
P2P systems (Chord, DKS, Pastry, P-Grid, etc.)
P2P systems and dynamic IP addresses
Structured P2P systems (Chord, Pastry, P-Grid, DKS, etc.)
These systems construct a distributed index (routing tables)
Dynamic addresses -> routing tables become inconsistent -> system can break down
Unstructured (Gnutella) and hierarchical (FastTrack) systems
Less of a problem, but at a price:
– high bandwidth consumption, or
– single points of failure,
– etc.
Problems to address
How to find out that an IP address has become invalid?
No response
– Network problem or did the peer get a new address?
Response
– Is it still the same peer? (authenticity, replay, man-in-the-middle attacks)
Frequency of address changes is crucial
Peers can join and leave at any moment
IP address can change at any moment
Security: DOS attacks are very simple
EvilHacker.org participates in the overlay and thus finds out IP addresses
... random hosts or itself
A scalable and secure infrastructure is required
Problems of current P2P approaches
Third-party infrastructures are required
Maintenance protocols may compromise structural properties (e.g., load-balancing)
Previous knowledge is lost (e.g., reputation of the peer, QoS, etc.)
No current approach addresses security
Only the owner should be allowed to update the mapping
DOS, replay, man-in-the-middle, etc. are not addressed
IPv6 as an alternative
With IPv6 dynamic addresses or NAT are no longer necessary
IPv6 address space is ~3.4 * 10^38 (or 10^30 addresses per person on the planet)
IPv4: current address space is 2^32 (~4.3 * 10^9)
IPsec (included in IPv6)
solves authentication problem
DOS attacks are more difficult
Mobility is addressed
IPv6: home/foreign address
IPv4: mobility extension but not supported on a large scale
DNS as an alternative
DNS extensions
DNS could support secure updates
Problems
– Very heavy-weight
– Configuration is very difficult and error-prone
– Not for the “normal user” (as in a P2P system)
DynDNS (and similar services)
Hosts maintain a consistent name/address mapping in a special DNS domain via a special client
Problems
– Centralized -> scalability
Basic idea
[Figure: two lookup schemes side by side]
DNS: static mapping FQDN -> IP address; lookup of IP address, then routing based on FQDN (any overlay)
P-Grid: DYNAMIC mapping logical identifier -> IP address; lookup of IP address, then routing based on logical identifier
Informal discussion
Use unique logical identifiers (UUIDs) instead of physical identifiers (IP addresses):
Peer identifiers
Routing based on UUIDs
Mapping between the logical and the physical identifier
- Rate of changes < self-healing rate
Dynamic equilibrium
Advantages:
General identification facility disentangling logical identifiers from network structure
Tracking of changes (for example, reputation)
Maintenance and routing
Universally Unique Identifiers (UUIDs) are generated locally
Cryptographically secure hash function -> global uniqueness
Index / routing tables: UUID
Peers maintain up-to-date UUID-IP mappings in P-Grid
Routing
Peers cache known UUID-IP mappings
Mapping exists in cache -> identity of target peer is checked before forwarding
No mapping is known -> query P-Grid for mapping
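The cache-then-directory lookup can be sketched as follows. This is an illustration only: a plain dictionary stands in for the P-Grid directory, the identity check is a caller-supplied function, and the names `new_uuid`, `Peer`, and `resolve` are hypothetical.

```python
import hashlib

def new_uuid(seed: bytes) -> str:
    """Locally generated identifier: cryptographic hash of a seed (e.g. a public key)."""
    return hashlib.sha256(seed).hexdigest()

class Peer:
    def __init__(self, directory):
        self.directory = directory  # stand-in for the UUID -> IP mappings kept in P-Grid
        self.cache = {}             # locally cached UUID -> IP mappings

    def resolve(self, uuid, identity_ok):
        """Return the current address for uuid, preferring a verified cache entry."""
        addr = self.cache.get(uuid)
        if addr is not None and identity_ok(uuid, addr):
            return addr                  # cached mapping still points at the right peer
        addr = self.directory.get(uuid)  # otherwise query the directory for the mapping
        if addr is not None:
            self.cache[uuid] = addr
        return addr

uid = new_uuid(b"example public key")
directory = {uid: "10.0.0.2:4000"}
peer = Peer(directory)
peer.cache[uid] = "10.0.0.1:4000"        # stale entry from before an address change
check = lambda u, a: directory.get(u) == a
print(peer.resolve(uid, check))          # 10.0.0.2:4000
```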
Security issues
Peer generates public/private key
UUID - public key -> P-Grid
Why is it secure?
Data in P-Grid is stored at a number of random replicas
Hard to attack
Requests are
– signed (private key) -> authenticity & access permissions
– time-stamped -> no replay possible
Quorums are required for each request
Better security than ...
Quorums
Independent, random paths avoid weakest link problem
Revoke and update of security relevant information is possible
Directory maintenance: self-healing
Repair strategies
Lazy repair: repair a routing table when all references at one level become stale
[Figure: DNS lookup of an IP address with routing based on FQDN (any overlay), next to lookup of an IP address with routing based on logical identifier; the logical identifier -> IP address mapping is maintained eagerly or lazily]
Eager repair strategy
System is in a dynamic equilibrium if the rate of changes due to changing mappings and the rate of repairs are equal
LHS: rate of repairs
r_up changes per 1 - r_up queries
N_rec – 1 additional recursive queries
Repair makes sense only if the routing entry to be repaired is stale
A repair is possible only if recursive query succeeds
RHS: rate of entries turning stale
r_up changes
1 - p_dyn: probability of non-stale references (only these can turn stale)
r references at each peer for each of log2 n levels
Lazy repair strategy
Repair only if all references of a level are stale
Not all routing entries are treated uniformly
The number of stale entries for each routing level at each peer defines the state of that level
Markovian model
[Figure: Markov chain with states "0 ref stale", "1 ref stale", "2 ref stale", …, "r ref stale"; each ID change moves a level to the next state, repairs lead from "r ref stale" back to "0 ref stale"]
Dynamic equilibrium equation
inflow = outflow (for each state)
At dynamic equilibrium, the number of routing levels with a given number of stale entries over the whole system should not change
N.B. We distinguish stale entries from offline peers
Lazy repair: Analytical vs. simulation results
Number of messages vs. rate of change (N=128,256,512,1024, replication factor is 8)
[Figure: lazy repair, number of messages vs. r_up for p_on = 1; analytical (ana) and simulated (sim) curves for N = 128, 256, 512, 1024; r_up from 0.025 to 0.15]
Effect of p_on on stability and message overhead
In networks with more online peers the lazy strategy is advantageous but collapses earlier
[Figure: number of messages, lazy vs. eager repair, r_up = 0.2, log2 n = 5; x-axis from 0.3 to 0.9; regions marked Directory "unstable" and Directory "stable"]
Summary
Decentralized, self-maintaining, light-weight, and secure directory service
Robust and applicable in unreliable environments
Contributions
Logical independence of identity from network properties
General approach for identification
Structural properties are maintained
Dynamic resilience of a P2P system under “churn”
GridVine: Peer Data Management
Searching semantically richer objects in large scale heterogeneous networks
<xap:CreateDate>2001-12-19T18:49:03Z</xap:CreateDate>
<xap:ModifyDate>2001-12-19T20:09:28Z</xap:ModifyDate>
<es:DofCreation> 05/08/2004 </es:DofCreation>
<myRDF:Date> Jan 1, 2005 </myRDF:Date>
date?
➠ Lack of semantic interoperability
Information Heterogeneity
Syntactic discrepancies
Image     cDate
A0657B25  05.08.04
VS
<es:cDate> 05/08/2004 </es:cDate>
Extensible standards (XML, RDF, XMP, PSA, WinFS...)
Semantic heterogeneity
<rdf:Property rdf:ID="width">
  <rdfs:subPropertyOf rdf:resource="#length"/>
</rdf:Property>
VS
<rdf:Property rdf:ID="Length-Y">
  <rdfs:subPropertyOf rdf:resource="#length"/>
</rdf:Property>
(e.g., protein information, picture metadata)
➠ Shared representation is not enough
Integrating Data in Distributed Databases
The Wrapper-Mediator architecture
[Figure: wrapper-mediator architecture; labels not recoverable]
Integrating Data in the new Web Ecology
                Mediated Architectures                Large Scale Information Systems (e.g., WWW)
Scale           Number of sources < 100               Number of sources > 1000
Uncertainty     Consistent data                       Uncertain data
                - Manually curated data               - Semi-automatic creation of data
                - Schemas created by administrators   - Schemas created by end users
Dynamicity      Relatively stable set of sources      Network churn
                - Stable mediator                     - Node failures
                - Sources known a priori              - Unknown sources
Expressiveness  Relational data                       Semi-structured data
                - Structured schemas                  - Schemata
                - Integrity constraints               - Few integrity constraints
Decentralized Interoperability
Q1 =
  <GUID>$p/GUID</GUID>
  FOR $p IN /Photoshop_Image
  [condition on $p/Creator, not recoverable]
Q2 =
  <GUID>$p/GUID</GUID>
  FOR $p IN T12
  [condition on $p/Creator, not recoverable]
Photoshop (own schema):
  <Photoshop_Image>
    <GUID>178A8CD8865</GUID>
    <Creator>Robinson</Creator>
    <Subject>
      <Bag>
        <Item>Tunbridge Wells</Item>
        <Item>Royal Council</Item>
      </Bag>
    </Subject>
    …
  </Photoshop_Image>
WinFS (known schema):
  <WinFSImage>
    <GUID>178A8CD8866</GUID>
    <Author>
      <DisplayName>Henry Peach Robinson</DisplayName>
      <Role>Photographer</Role>
    </Author>
    <Keyword>Tunbridge</Keyword>
    <Keyword>Council</Keyword>
    …
  </WinFSImage>
Mapping T12 =
  <Photoshop_Image>
    <GUID>$fs/GUID</GUID>
    <Creator>$fs/Author/DisplayName</Creator>
  </Photoshop_Image>
  FOR $fs IN /WinFSImage
➠ Extending integration techniques to decentralized settings
Peer Data Management Systems
<xap:CreateDate>2001-12-19T18:49:03Z</xap:CreateDate>
<xap:ModifyDate>2001-12-19T20:09:28Z</xap:ModifyDate>
<myRDF:Date> Jan 1, 2005 </myRDF:Date>
date?
Pairwise mappings (e.g., between es:cDate and xap:CreateDate)
Peer Data Management Systems (PDMS)
Local mappings overcome global heterogeneity
Iterative query reformulation
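Iterative reformulation over pairwise mappings amounts to following mapping links from schema to schema. The sketch below does this for a single attribute name with a breadth-first traversal. The attribute names come from the slides, but the specific mapping links (including the second hop and the cycle) are hypothetical, as is the function name `reformulate`.

```python
from collections import deque

def reformulate(attribute, mappings):
    """All attribute names reachable from `attribute` by chaining pairwise mappings."""
    seen = [attribute]
    queue = deque([attribute])
    while queue:
        current = queue.popleft()
        for target in mappings.get(current, []):
            if target not in seen:  # guard against mapping cycles
                seen.append(target)
                queue.append(target)
    return seen

# Hypothetical local mappings between three schemas
mappings = {
    "es:cDate": ["xap:CreateDate"],
    "xap:CreateDate": ["myRDF:Date"],
    "myRDF:Date": ["es:cDate"],
}
print(reformulate("es:cDate", mappings))
# ['es:cDate', 'xap:CreateDate', 'myRDF:Date']
```

A query posed against one schema can thus reach data in schemas that its originator has no direct mapping to, as long as a chain of local mappings exists.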
3-Tier Network
Semantic Mediation Layer
Overlay Layer: Jupp / P-Grid / DHTs
Internet Layer
Data-Centric P2P Systems
Piazza / Hyperion
– LAV-style query reformulation in P2P settings?
– Network-intensive
– Large-scale deployment
– Perfect mappings
PIER
Scales a relational engine on top of a DHT
Fixed schema
RDFPeers
Indexes RDF triples in a DHT
Conclusions
More and more machine-processable (semi-structured) data available
Peer Production
Human Computat ion
Top-down efforts to align data have largely failed
SUMO
Emergent Semantics
Bottom-up
Best-Effort
➠ Only resort to foster interoperability in the large scale decentralized data spaces currently emerging
Credits
Karl Aberer
Anwitaman Datta
Roman Schmidt
Are you still awake?
Introduction to Peer-to-Peer–
Manfred Hauswirth, Marcel Karnstedt
Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
Outline – Part 2
The Vision: A Universal Storage for ...
A Distributed Universal Storage
Operators
Query Engine
Mappings & Query Expansion
The Praxis: Implementation
Examples
... dynamic, structured, deep, linked, …
Clique project:
Integrated structured storage
Data pre-processing, integration, restructuring, indexing
Data access API and query processor for complex queries
Marcel Karnstedt, IFIP Database Meeting, Nicosia, Cyprus, 2009
Public Data Management
Semantic & social Web, encyclopedias, ...
Datasets, which are
Maintained by large communities in a distributed way
Of public interest
„Homogenized“ database, extensible and flexible, ...
Main Challenges
Data management
Security, trust, fairness
Guarantees: consistency, integrity
CAP-Theorem [Gilbert et al. 2002], ACID vs. BASE
Query expressiveness
DB-like queries with advanced functionality
Support of IR queries and similarity is mandatory
Efficient processing
Efficient query operators
Cost awareness in changing situations
Approaches
Who pays the load?
Who owns the data?
views over 100.000
Efficient query processing?
Do we trust them?
Sindice, YARS
Jena, Oracle, SW-Store
PIER, PeerDB
AmbientDB, RDFPeers
UniStore
Influences
[Figure: research areas influencing the approach and what each contributes]
Areas: P2P paradigm, PDMS, DHTs & SDDS, index structures, distributed DBS, sensor networks, data streams
Contributions: robustness, self-organization, scalability; efficient lookups; transparency, query processing
The Big Picture
Who wrote an article for cool ...
Wikipedia(article, author): “Pulp Fiction”, ”MK”
Del.icio.us(bookmark, tag, creator): “http://…pfiction”, ”cool”, ”MKa”
DBPedia(link, wikilink, category): “Pulp Fictoon”, ”Q. Tarantino”, ”movie”
Layers of Processing
Processing Strategies (Scheduling, Adaptation, Costs)
Query Operators (Similarity / Approximate, Aggregation, Range)
Routing (Multicast)
Universal Relation Model
Since the eight ies: Model for simpl if ied retrieval inrelat ional databases
Universal relat ion containing all attributes
8/3/2019 P2P Tutorial
http://slidepdf.com/reader/full/p2p-tutorial 144/180
Universal relat ion containing all attributes
Simplif ies navigation over mult iple relat ions dur ingquery formulat ion
SW- Store
1 2 3 1 2 1...
144 of XYZ

Triple Store
Universal relation model
Triples (OID, attribute, value)
Similar to RDF: subject, predicate, object
Relational tuple:
OID   Car         Mileage   HP    Price
232   Volvo V70   34.000    180   28.000
Triples:
232   Car       Volvo V70
232   Mileage   34.000
232   HP        180
232   Price     28.000
Extensible
Self-descriptive
No need for representing null values
SW-Store
Sindice, YARS...
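The decomposition on this slide can be sketched in a few lines of Python. This is an illustrative sketch only, not UniStore code: the function name `to_triples`, the dictionary layout and the extra `Color` attribute are assumptions made for the example, and the slide's values 34.000 and 28.000 are written as plain integers.

```python
from typing import Any, Dict, List, Optional, Tuple

Triple = Tuple[Any, str, Any]  # (OID, attribute, value)


def to_triples(oid: Any, row: Dict[str, Optional[Any]]) -> List[Triple]:
    """Decompose one relational tuple into (OID, attribute, value) triples.

    Attributes whose value is None are skipped, so no null value
    has to be represented in the triple store.
    """
    return [(oid, attr, val) for attr, val in row.items() if val is not None]


# The car tuple from the slide, plus a hypothetical attribute without a value.
car = {"Car": "Volvo V70", "Mileage": 34000, "HP": 180,
       "Price": 28000, "Color": None}
triples = to_triples(232, car)

for t in triples:
    print(t)
```

Adding a new attribute to an object only adds one more triple, which is the sense in which the layout is extensible and self-descriptive.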

Indexing
Indexing of attributes = key for hashing
For tuple (oid, v1, v2, ...) of R(OID, A1, A2, ...)
– h(oid) for object lookup
– h(A1 || v1), h(A2 || v2) for Ai ≥ v... (prefix search)
Example triples:
232   Car       Volvo V70
232   Mileage   34.000
232   Price     28.000
...trade-off storage vs. performance
YARS, Hexastore
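A minimal single-process sketch of this indexing scheme follows. It assumes an order-preserving hash, modelled here by the identity function `h`, so that keys sharing a prefix stay adjacent. The class `ToyIndex` and the helper `index_keys` are hypothetical names for illustration, and the sorted list merely stands in for the distributed key space.

```python
import bisect
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (oid, attribute, value), all strings here


def h(key: str) -> str:
    # Stand-in for an order-preserving hash function (assumption):
    # the identity keeps lexicographic order, so prefix search works.
    return key


def index_keys(triple: Triple) -> List[str]:
    """Keys under which one triple is stored: h(oid) and h(A || v)."""
    oid, attr, value = triple
    return [h(oid), h(attr + "||" + value)]


class ToyIndex:
    """Sorted in-memory list standing in for the distributed key space."""

    def __init__(self) -> None:
        self._keys: List[str] = []     # kept sorted
        self._vals: List[Triple] = []  # parallel to _keys

    def insert(self, triple: Triple) -> None:
        # Each triple is stored once per key: storage traded for lookup speed.
        for key in index_keys(triple):
            pos = bisect.bisect_right(self._keys, key)
            self._keys.insert(pos, key)
            self._vals.insert(pos, triple)

    def lookup(self, key: str) -> List[Triple]:
        """Exact-match lookup, e.g. all triples of one object."""
        lo = bisect.bisect_left(self._keys, h(key))
        hi = bisect.bisect_right(self._keys, h(key))
        return self._vals[lo:hi]

    def prefix(self, prefix: str) -> List[Triple]:
        """All triples whose key starts with the given prefix."""
        out = []
        for i in range(bisect.bisect_left(self._keys, h(prefix)), len(self._keys)):
            if not self._keys[i].startswith(h(prefix)):
                break
            out.append(self._vals[i])
        return out


idx = ToyIndex()
for t in [("232", "Car", "Volvo V70"), ("232", "Mileage", "34.000"),
          ("232", "Price", "28.000"), ("233", "Car", "Volvo S60")]:
    idx.insert(t)

print(idx.lookup("232"))         # object lookup via h(oid)
print(idx.prefix("Car||Volvo"))  # prefix search via h(A || v)
```

Every triple appears under two keys in this sketch, which illustrates the storage vs. performance trade-off named on the slide.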

P-Grid: Range Queries
[Datta et al. 2005]

VQL
Query language VQL
Conjunctive queries
Enhanced by advanced operators
Operations both on instance and schema level
Basic query form
SELECT ?oid, ?val
WHERE { ?oid price ?val }
...
LIMIT ...
SPARQL

Similarity Queries
FILTER (edist(?value, v) < 2) }
String similarity:
Edit distance using (positional) q-grams [Gravano et al. 2001, Schallehn et al. 2004]
Requires additional key-value pairs in P-Grid
– h(q-gram_i(A)) → oid
– h(q-gram_i(v)) → oid
LSH forest, SWAM
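The two ingredients named on this slide, positional q-grams and the edit distance `edist`, can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the padding characters `#` and `$`, the default q = 3 and the function names are assumptions, and the hash function h is left out, so the index entries are plain (q-gram, oid) pairs.

```python
from typing import List, Tuple


def positional_qgrams(s: str, q: int = 3) -> List[Tuple[int, str]]:
    """All q-grams of s with their positions; s is padded at both ends."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return [(i, padded[i:i + q]) for i in range(len(padded) - q + 1)]


def edist(a: str, b: str) -> int:
    """Levenshtein edit distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def qgram_index_entries(oid: str, value: str, q: int = 3) -> List[Tuple[str, str]]:
    """Additional key-value pairs for one value: q-gram -> oid."""
    return [(gram, oid) for _, gram in positional_qgrams(value, q)]


print(positional_qgrams("Pulp"))
print(edist("Pulp Fiction", "Pulp Fictoon"))  # the two spellings from the example data
print(qgram_index_entries("232", "Pulp")[:2])
```

A filter such as `edist(?value, v) < 2` would then only need to be evaluated on values that share q-grams with v, which is what the extra index entries make possible.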

String Similarity
[Figure: P-Grid trie over the key space from 00…0 to 11…1]
“All values in distance d = 1…”
Query range for attribute vs. query d+1 q-grams
[NetDB06]

More Operators
Similarity joins [NetDB06]
WHERE { ?o1 attr1 ?v1 . ?o2 attr2 ?v2
FILTER (edist(?v1, ?v2) < k) }
Ranking queries: top-k, skyline [DBRank07]
ORDER BY ?v LIMIT k
WHERE { ?o attr ?v }
ORDER BY ?v NN “A String” LIMIT k
WHERE { ?o attr1 ?x . ?o attr2 ?y }
SKYLINE OF ?x MIN, ?y MAX

String Similarity Joins
[Figure panels: Doubled sequential, Doubled parallel, Parallel and sequential, Sequential and parallel]
Cloud services

Skyline Queries
Objects that are not “dominated” by other objects
Scoring function on multiple attributes, no weighting
[Figure: objects plotted by price and mileage, with dominated objects marked]
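The notion of dominance can be made concrete with a small, centralized sketch. It assumes that every attribute is to be minimized and uses made-up (price, mileage) pairs; the function names are chosen for this example only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points: List[Point]) -> List[Point]:
    """Naive skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(o, p) for o in points)]


# Hypothetical cars as (price, mileage); lower is better for both.
cars = [(28000, 34000), (25000, 60000), (30000, 20000),
        (31000, 40000), (26000, 70000)]
result = skyline(cars)
print(result)
```

No weighting of the attributes is needed: a car is dropped only if some other car is at least as good on both attributes and better on one.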

Skylines: Basic Idea
“Frame Skyline” algorithm
Minimum of first dimension defines maximum for second dimension
a frame narrowing the search space
DSL

Skylines: Processing
[Figure: search frame in the x/y plane, bounded via Min x and Min y]
1. Find minimum in one selective dimension x
2. Use y value of min(x) to limit search range
3. ...
4. Always ship current skyline with query
5. Determine global skyline at one peer
6. Optionally: distributed range querying “on the way” to min(x)
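The frame idea can be illustrated with a small simulation in which each peer is just a list of points. This is a sketch under assumptions, not the algorithm from [DBRank07]: both dimensions are minimized, the minimum lookups that the overlay would perform are modelled by a plain `min` over all points, and the symmetric bound taken from min(y) is an assumption standing in for the step that is missing in the list above.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), both minimized


def dominates(a: Point, b: Point) -> bool:
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def skyline(points: List[Point]) -> List[Point]:
    return [p for p in points if not any(dominates(o, p) for o in points)]


def frame_bounds(points: List[Point]) -> Tuple[float, float]:
    """Upper bounds (x_max, y_max) of the search frame."""
    min_x_point = min(points, key=lambda p: (p[0], p[1]))  # step 1
    min_y_point = min(points, key=lambda p: (p[1], p[0]))  # assumed symmetric step
    # Anything with y above the y of min(x) is dominated by min(x) (step 2),
    # and likewise for x above the x of min(y).
    return min_y_point[0], min_x_point[1]


def frame_skyline(peers: List[List[Point]]) -> List[Point]:
    all_points = [p for peer in peers for p in peer]
    x_max, y_max = frame_bounds(all_points)
    current: List[Point] = []
    for peer in peers:
        # Only points inside the frame are candidates at this peer.
        local = [p for p in peer if p[0] <= x_max and p[1] <= y_max]
        # The current skyline travels with the query (step 4).
        current = skyline(current + local)
    return current  # final result determined at the last peer (step 5)


peers = [[(28000, 34000), (31000, 40000)],
         [(25000, 60000), (26000, 70000)],
         [(30000, 20000)]]
all_points = [p for peer in peers for p in peer]
result = frame_skyline(peers)
print(frame_bounds(all_points), result)
```

Points outside the frame are dominated by one of the two corner points, so pruning them does not change the result, which the comparison against the naive `skyline` over all points confirms.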

Skylines: More Dimensions
All projections to 2d sub-spaces are skyline candidates
Objects of the searched frame can dominate projections
... dominate objects of the searched frame

Query Execution
Goal: stateless processing
“Push” approach (based on Mutant Query Plans [Papadimos et al. 2002])
Receiver peer is identified by applying the hash function
Multiple instances of the plan travel through the network
[Figure: plan instances travelling between peers p0 to p5, carrying partial results such as {(A,2)}, {(B,2),(C,1),(B,2),(C,4)} and {(A,2,B,2,C,1), (A,2,B,2,C,4)}]
PIER, PeerDB, DARQ

Guarantees: Completeness
Problem: ... complete results
Goal:
Allow estimation of result completeness
– For the user (“98% of all possible answers”)
– For blocking query operators (aggregators, ranking-based operators) in order to guarantee a certain level of completeness
Idea:
Not feasible on data level, but perhaps on peer level?!

Completeness Estimation
Estimation on peer level
Support of probabilistic guarantees
Accuracy improved by Milestone messages (MiMes)
[CIKM08, WIDM08]
[Figure: query graph (Join(A=B) over Extract(A), sequential, and Extract(B), range) mapped onto the routing graph with peers P1A ... PnA and P1B ... PmB; labels: level 0, routing point Px]
Seaweed

CERQ: Initial Estimation
[Figure: P-Grid trie with peers P0 to P4]
Example: P-Grid range query 00100-1101 at P0
Predict trie on information from:
1) ...
2) local routing table (at least one node per level/sub-tree)
=> estimates 8 (out of 10)
...the better, the more information is kept in each routing table
[P2P07]

CERQ: Estimation Refinement
The initial estimate might not be correct
Piggy-back information
Query contains estimation of peers in sub-tree
Query replies can contain corrections
[Figure: P-Grid trie with peers P0 to P4. Sub-query P0 -> P3 for sub-tree 001*, estimate: 3 peers; P3’s routing table contains peer(s) of sub-tree 0010*]

CERQ: Further Improvements
Use of structural replication
Each entry might have a different path
Every path allows peers to learn more about sub-trees
More entries mean better initial estimates
P2P networks are dynamic
Though the structure is likely to be stable
The learned structure can be cached for later estimates

CERQ: Other Overlays
SkipGraphs
Most similar to P-Grid
Routing information of multiple levels
Unknown number of peers in bucket layer
Prefix hash tree (Chord)
Peers build a tree-hierarchy
Only applicable if number of children is known
Forwarding along neighbors
No estimation can be given
=> The idea can be mapped under certain conditions

Representing Mappings
Simple kind of attribute correspondences
[Figure: correspondences (e.g. subsumes) between attributes A1 ... A6]
Triple representation
(A4, equiv, A5)
(A3, subsumes, A6)
[Ideas08]
PeerDB, SQPeer

Query Expansion
[Figure: query plans labelled Unexpanded query, First mapping, Map operators added, Expanded query]
GridVine, PDMS
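How mapping triples such as (A4, equiv, A5) and (A3, subsumes, A6) could drive query expansion is sketched below. The semantics are assumptions made for this example: `equiv` is treated as symmetric, and `subsumes` is followed only from the more general to the more specific attribute. The function name `expand` is hypothetical, and the actual system adds map operators to the query plan rather than calling such a function.

```python
from collections import deque
from typing import Iterable, Set, Tuple

Mapping = Tuple[str, str, str]  # (attribute, relation, attribute)


def expand(attr: str, mappings: Iterable[Mapping]) -> Set[str]:
    """All attributes whose values may answer a query on attr."""
    mappings = list(mappings)
    seen = {attr}
    queue = deque([attr])
    while queue:
        cur = queue.popleft()
        for subj, rel, obj in mappings:
            nxt = None
            if rel == "equiv" and subj == cur:
                nxt = obj
            elif rel == "equiv" and obj == cur:
                nxt = subj   # equivalence holds in both directions
            elif rel == "subsumes" and subj == cur:
                nxt = obj    # only from general to specific
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


mappings = [("A4", "equiv", "A5"), ("A3", "subsumes", "A6")]
print(sorted(expand("A4", mappings)))
print(sorted(expand("A3", mappings)))
print(sorted(expand("A6", mappings)))
```

A triple pattern `?o A3 ?v` would then be evaluated for every attribute in the expanded set and the partial results combined.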

UniStore

Evaluation: Similarity Joins
c1: seq & par/seq
c3: seq & par/par
c4: par & par/par
c5: par & local

Evaluation: CERQ
[Figure panels: 1 reference, 3 references, 5 references]
More references lead to (as expected)
– Better initial estimate
– Fewer corrections
– Smaller errors
Replication factor 1 (similar results for factor 2)

Evaluation: Completeness
[Figure panels: “Min, max, avg 74 peers” and “Min, max, avg 50 peers”; Without MiMes and With MiMes]

Summary & Outlook
Web data is huge, heterogeneous, structured, linked
Modern applications require a universal and flexible storage
DB-like and RDF-liking
DHTs well-suited for large-scale data management
UniStore as one solution
Robust and scalable, universal and light-weight
Sophisticated query capabilities
Adaptive, cost-based, stateless and parallel QP
Guarantees, semantic layer
…and all on totally decentralised and self-organising P2P!!
Privacy & Trust, reputation
Guarantees, consistency, integrity

Acknowledgements
Kai-Uwe Sattler, Manfred Hauswirth, Katja Hose, ... Sapkota, Conor Hayes, ...
Students: Martin Richtarsky, Michael Haß, Jessica Müller, Mario Wiegandt, Stefan Schwalm, Matthias ..., ...
Supported by the Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2) and under Grant No. 08/SRC/I1407 (Clique: Graph & Network ...)

Related Systems
Sindice: Sindice. The semantic web index. http://sindice.com/
YARS: A. Harth, J. Umbrich, A. Hogan, and S. Decker. YARS2: A federated repository for querying graph structured data from the web. In ISWC/ASWC, 2007.
Jena: K. Wilkinson, C. Sayers, H. Kuno, and D. Reynolds. Efficient RDF storage and retrieval in Jena2. In SWDB, pages 131–150, 2003.
Oracle: E. I. Chong, S. Das, G. Eadon, and J. Srinivasan. An efficient SQL-based RDF querying scheme. In VLDB, pages 1216–1227, 2005.
SW-Store: D. J. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach. SW-Store: a vertically partitioned DBMS for Semantic Web data management. The VLDB Journal, 18:385–406, 2009.
PIER: R. Huebsch, J. M. Hellerstein, N. Lanham, B. Thau Loo, S. Shenker, and I. Stoica. Querying the Internet with PIER. In VLDB’03, pages 321–332, 2003.
RDFPeers: M. Cai and M. Frank. RDFPeers: a scalable distributed RDF repository based on a structured peer-to-peer network. In WWW’04, pages 650–657, 2004.
PeerDB: W. S. Ng, B. Ch. Ooi, and K.-L. Tan. PeerDB: A P2P-based System for Distributed Data Sharing. In ICDE’03, pages 633–644, 2003.
AmbientDB: P. Boncz and C. Treijtel. AmbientDB: Relational Query Processing in a P2P Network. In Workshop on Databases, Information Systems and P2P Computing (DBISP2P’03), pages 153–168, 2003.
Hexastore: C. Weiss, P. Karras, and A. Bernstein. Hexastore: sextuple indexing for semantic web data management. In VLDB, 2008.
SPARQL: ... W3C Candidate Recommendation.

Related Systems /2
LSH forest: M. Bawa, T. Condie, and P. Ganesan. LSH forest: self-tuning indexes for similarity search. In WWW’05, pages 651–660, 2005.
SWAM: F. Banaei-Kashani and C. Shahabi. SWAM: a family of access methods for similarity-search in peer-to-peer data networks. In ..., pages ..., ...
Cloud services: M. Brantner, D. Florescu, D. Graf, D. Kossmann, and T. Kraska. Building a database on S3. In SIGMOD’08, pages 251–264, 2008.
DSL: P. Wu, C. Zhang, Y. Feng, B. Zhao, D. Agrawal, and A. El Abbadi. Parallelizing skyline queries for scalable distribution. In EDBT’06, pages 112–130, 2006.
Skyframe: S. Wang, Q. H. Vu, B. Ch. Ooi, A. K. H. Tung, and L. Xu. Skyframe: a framework for skyline query processing in peer-to-peer systems. The VLDB Journal, 18(1):345–362, 2009.
DARQ: B. Quilitz and U. Leser. Querying Distributed RDF Data Sources with SPARQL. In ..., pages ..., ...
ObjectGlobe: R. Braumandl, M. Keidl, A. Kemper, D. Kossmann, A. Kreutz, S. Seltzsam, and K. Stocker. ObjectGlobe: Ubiquitous query processing on the Internet. VLDB Journal, 10(1):48–71, 2001.
Seaweed: D. Narayanan, A. Donnelly, R. Mortier, and A. Rowstron. Delay aware querying with Seaweed. In ..., pages ..., ...
SQPeer: G. Kokkinidis, E. Sidirourgos, and V. Christophides. Query Processing in RDF/S-Based P2P Database Systems. Semantic Web and Peer-to-Peer, chapter 4, pages 59–81. Springer, 2006.
GridVine: K. Aberer, P. Cudré-Mauroux, M. Hauswirth, and T. Van Pelt. GridVine: Building Internet-Scale Semantic Overlay Networks. In ISWC’04, pages 107–121, 2004.
PDMS: A. Y. Halevy, Z. G. Ives, P. Mork, and I. Tatarinov. Piazza: data management infrastructure for semantic web applications. In WWW’03, pages 556–567, 2003.

Thank you!
[CIKM08]: M. Karnstedt, K. Sattler, M. Haß, M. Hauswirth, B. Sapkota, R. Schmidt: Estimating the Number of Answers with Guarantees for Structured Queries in P2P Databases, CIKM 2008, Napa, USA.
[WIDM08]: M. Karnstedt, K. Sattler, M. Haß, M. Hauswirth, B. Sapkota, R. Schmidt: ... Applications, WIDM'08 (in conjunction with CIKM'08), Napa, USA, 2008.
[Ideas08]: M. Karnstedt, K. Sattler, M. Hauswirth, B. Sapkota, R. Schmidt: Ad-hoc Integration and Querying of Semantic Web Data, Ideas 2008, Coimbra, Portugal.
[DBRank07]: M. Karnstedt, J. Müller, K. Sattler: Cost-Aware Skyline Queries in Structured ...
[ICDE07]: M. Karnstedt, K. Sattler, M. Richtarsky, J. Müller, M. Hauswirth, R. Schmidt, R. John: UniStore: Querying a DHT-based Universal Storage, ICDE 2007 Demonstration.
[P2P07]: M. Karnstedt, K. Sattler, R. Schmidt: Completeness Estimation of Range Queries in Structured Overlays, P2P 2007, Galway, Ireland.
[...]: ... Queries in Structured Overlays, P2P 2006, Cambridge, UK.
[NetDB06]: M. Karnstedt, K. Sattler, M. Hauswirth, R. Schmidt: Similarity Queries on Structured Data in Structured Overlays, NetDB'06 @ ICDE 2006, Atlanta, GA.
[Gilbert et al. 2002]: S. Gilbert and N. Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News, 33(2):51–59, 2002.
[Datta et al. 2005]: A. Datta, M. Hauswirth, R. Schmidt, R. John, and K. Aberer. Range queries in trie-structured overlays. In P2P’05, pages 57–66, 2005.
[Papadimos et al. 2002]: V. Papadimos, D. Maier. Mutant Query Plans. Information and Software Technology, 44(4):197–206, April 2002.
[Schallehn et al. 2004]: E. Schallehn, I. Geist, K. Sattler: Supporting Similarity Operations based on Approximate String Matching on the Web, CoopIS 2004, Larnaca.
[Gravano et al. 2001]: L. Gravano et al.: Approximate String Joins in a Database (almost) for Free, VLDB 2001, Roma.