peer-to-peer data management
Peer-To-Peer Data Management. Hector Garcia-Molina, ICDE Conference, February 28, 2002.
1
Peer-To-Peer Data Management
Hector Garcia-Molina
ICDE Conference, February 28, 2002
2
What is P2P?
napster
gnutella
morpheus
kazaa
bearshare
seti@home
folding@home
ebay
limewire
icq
fiorana
mojo nation
jxta
united devices
open cola
uddi
process tree
can
chord
ocean store
farsite
pastry
tapestry
?
groove
netmeeting
freenet
popular power
aim
jabber
3
Napster
central index
join
query
answer
get
file
...
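The flow on this slide (join, query, answer, get file) can be sketched as a toy in-memory model. The class and method names below (CentralIndex, join, query) are hypothetical and do not reproduce Napster's actual protocol; the sketch only shows that the index is central while the file itself would be fetched from another peer.

```python
from collections import defaultdict


class CentralIndex:
    """Toy central index: maps a file name to the peers that share it."""

    def __init__(self):
        self._peers_by_file = defaultdict(set)

    def join(self, peer, files):
        # A joining peer registers the names of the files it shares.
        for name in files:
            self._peers_by_file[name].add(peer)

    def query(self, name):
        # The answer is a list of peers; the index never stores file contents.
        return sorted(self._peers_by_file.get(name, ()))


index = CentralIndex()
index.join("peer1", ["song.mp3"])
index.join("peer2", ["song.mp3", "talk.ppt"])

answer = index.query("song.mp3")  # query and answer go through the central index
source = answer[0]                # "get file" would then contact this peer directly
```

In this model, losing the index stops all searches even though every file is still held by some peer.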
4
Gnutella
query
5
Morpheus
...
super peer
6
Seti@Home
satellite dish
...
raw data chunk
analyzed data
central site
7
Lockss
library A
library B
library C
library D
library E
D1
D2
D3
8
PeerCast
[Figure: two panels, "before" and "after", each showing a source at Stanford]
9
What is a P2P System?
Multiple sites (at edge)
Distributed resources
Sites are autonomous (different owners)
Sites are both clients and servers
Sites have equal functionality
P2P Purity
10
P2P is a BAD IDEA!!
Distribution is expensive!
Specialized functionality is good!
11
Example: Distributed Data Management
Distribution is expensive
If you must distribute:
build centralized directory, index
• use backups for reliability
for replicated data, use primary copy
• use backups for reliability
12
Computational Efficiency is NOT Main Goal
Main driving force in a P2P system:
exploiting existing (often free) resources
sharing costs among many
legal protection
autonomy
anonymity
13
Should We Do P2P Research?
Should we help people break the law?
Analogy: Should we develop pillows, knives, hammers, drugs, bath tubs, cars, airplanes, ... ??
14
Should We Do P2P Research?
YES: P2P not exclusively for breaking the law
Remember the VCR
YES: P2P can liberate us from culture “plantation owners” (Lessig)
15
Is “Free Culture” Feasible?
Example: Legal texts
Can we afford it?
economic activity
rules of the game
today
16
Should DB community work on P2P?
YES
17
P2P Challenges
Easier to list NON-Research-Topics:
Color schemes for P2P Nodes
Impact of P2P on Moroccan 15th Century Literature
18
P2P Challenges
Search
Resource Management
Security & Privacy
19
Search Taxonomy
[Figure: systems arranged by query type (lookup, content queries, search) against scope of index (single site, regional, global). Systems shown: freenet, gnutella, napster, morpheus, can, routing, replicated SP, partial]
20
Index Implementation Taxonomy
[Figure: systems arranged by whether index location is correlated with content location (yes, no) against nature of index (centralized, distributed, P2P). Systems shown: freenet, gnutella, napster, morpheus, can, routing, replicated SP, partial]
21
Content Addressable Network (CAN)
[Figure labels: 1, 2; Nodes, Data]
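The slide shows only the figure, so the following is a minimal sketch of the general CAN idea rather than the talk's own example: keys are hashed to points in a coordinate space, and each node owns a zone of that space. The zone layout, node names, and use of SHA-1 are assumptions for illustration; real CAN also routes between neighboring zones, which is omitted here.

```python
import hashlib


def key_to_point(key):
    """Hash a key to a point in the unit square [0, 1) x [0, 1)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    x = int.from_bytes(digest[:4], "big") / 2**32
    y = int.from_bytes(digest[4:8], "big") / 2**32
    return x, y


# Hypothetical zones: node -> (x_min, x_max, y_min, y_max), together covering the square.
ZONES = {
    "node1": (0.0, 0.5, 0.0, 1.0),
    "node2": (0.5, 1.0, 0.0, 0.5),
    "node3": (0.5, 1.0, 0.5, 1.0),
}


def owner_of(point, zones=ZONES):
    """Return the node whose zone contains the point."""
    x, y = point
    for node, (x0, x1, y0, y1) in zones.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return node
    raise ValueError("zones do not cover the point")


point = key_to_point("song.mp3")
home = owner_of(point)  # the node responsible for storing and answering this key
```

Because the hash decides where a key lives, the index location is tied to the content's key rather than to whichever peer happens to hold the file.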
22
Can We Improve Flooding?
[Figure: systems arranged by whether index location is correlated with content location (yes, no) against nature of index (centralized, distributed, P2P). Systems shown: freenet, gnutella, napster, morpheus, can, routing, replicated SP, partial]
23
Directed BFS in Gnutella
Heuristics for Selecting Direction
>RES: Returned most results
<TIME: Shortest satisfaction time
<HOPS: Min hops for results
>MSG: Sent us most messages (all types)
<QLEN: Shortest queue
<LAT: Shortest latency
>DEG: Highest degree
query?...
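A minimal sketch of how such a heuristic could pick the direction for a query: instead of flooding every neighbor, the node ranks neighbors by one statistic and forwards to the top k. The NeighborStats fields, the sample numbers, and the select_neighbors helper are hypothetical, and only four of the seven heuristics are modeled.

```python
from dataclasses import dataclass


@dataclass
class NeighborStats:
    name: str
    results: int    # results returned for past queries
    hops: float     # average hops taken by those results
    latency: float  # observed latency in seconds
    degree: int     # number of neighbors it reports


# Each heuristic maps to (sort key, prefer_larger), following the > and < prefixes above.
HEURISTICS = {
    ">RES": (lambda n: n.results, True),
    "<HOPS": (lambda n: n.hops, False),
    "<LAT": (lambda n: n.latency, False),
    ">DEG": (lambda n: n.degree, True),
}


def select_neighbors(neighbors, heuristic, k=1):
    """Return the names of the k neighbors the heuristic ranks best."""
    key, prefer_larger = HEURISTICS[heuristic]
    ranked = sorted(neighbors, key=key, reverse=prefer_larger)
    return [n.name for n in ranked[:k]]


neighbors = [
    NeighborStats("n1", results=40, hops=3.0, latency=0.20, degree=4),
    NeighborStats("n2", results=10, hops=1.5, latency=0.05, degree=9),
    NeighborStats("n3", results=25, hops=2.0, latency=0.10, degree=6),
]

by_results = select_neighbors(neighbors, ">RES", k=2)
by_latency = select_neighbors(neighbors, "<LAT", k=1)
```

Different heuristics can point in different directions for the same neighbor set, which is why they need to be compared experimentally.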
24
How Does One Evaluate?
Live Gnutella?
Use real Gnutella as “laboratory”
25
Time to Satisfaction for Directed BFS
26
Routing Index
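[Figure: nodes A, B, C, D, each holding a routing index table with per-neighbor document counts for the topics AI and DB; example query Q(DB). The table entries did not survive extraction.]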
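A minimal sketch of how a node might use a routing index to forward a query such as Q(DB): for each neighbor, the index records how many documents per topic are reachable through it, and the query goes first to the neighbor with the most matches. The counts below are illustrative and are not taken from the slide's figure; rank_neighbors is a hypothetical helper.

```python
# Hypothetical routing index at one node: neighbor -> documents reachable per topic.
routing_index = {
    "B": {"DB": 20, "AI": 65},
    "C": {"DB": 50, "AI": 25},
    "D": {"DB": 15, "AI": 0},
}


def rank_neighbors(index, topic):
    """Order neighbors by how many documents on the topic lie along each path."""
    candidates = [(counts.get(topic, 0), name) for name, counts in index.items()]
    # Drop neighbors with no matching documents; best neighbor first.
    return [name for count, name in sorted(candidates, reverse=True) if count > 0]


order = rank_neighbors(routing_index, "DB")
```

Unlike flooding, the query is sent along one path at a time and falls back to the next neighbor only if the first does not yield enough results.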
27
Types of Routing Indexes
Compound
Hop Count
Exponential Decay
Strategies for Cycles
Ignore (for Hop-Count, exponential)
Avoid Update Cycles
Detect Update Cycles and Recover
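The slide gives no formulas, so the following is only one plausible reading of a hop-count index with exponential decay: documents j hops away are discounted by the fanout raised to j - 1, on the assumption that reaching them costs that many more messages. The function name, the fanout value, and the counts are assumptions.

```python
def exponential_goodness(docs_per_hop, fanout):
    """Score a path; docs_per_hop[0] is 1 hop away, docs_per_hop[1] is 2 hops away, etc."""
    # enumerate starts at 0, so documents at hop j are divided by fanout ** (j - 1).
    return sum(count / fanout ** j for j, count in enumerate(docs_per_hop))


# Two hypothetical paths: X has fewer documents in total, but they are closer.
path_x = exponential_goodness([13, 10], fanout=3)
path_y = exponential_goodness([0, 31], fanout=3)
prefer_x = path_x > path_y
```

The decay also bounds the influence of distant documents, which is consistent with cycles being ignorable for the hop-count and exponential variants.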
28
Effect of Index Compression
[Figure: messages (y-axis, 0 to 600) against index compression (x-axis: 0%, 50%, 67%, 75%, 80%, 83%). Series: CRI, HRI, ERI, No-RI]
29
Effect of Network Topology
[Figure: messages (y-axis, 0 to 700) for CRI, HRI, ERI, and No RI. Series by topology: Tree, Tree+Cycle, Powerlaw]
30
Resource Management
Resource:
storage (lockss)
CPU processing (seti@home)
bandwidth (PeerCast)
Issues:
fairness
load balancing
31
Example: Data Trading
site 1 site 2 site 3
A1 B1 C1
A2 B2 C2
B1 A1
trade
B2 A2
trade
32
Example: Data Trading
site 1 site 2 site 3
A1 B1 C1
A2 B2 C2
B1 A1
trade
C1 A2
trade
C2 B2
trade
33
Data Trading
Order of trades impacts reliability
Issues:
Swaps vs. Deeds
Fixed price vs. bids
Preference to
• sites with a lot of space?
• reliable sites?
• “desperate” sites?
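A minimal sketch of why the order of trades matters, using the three sites and six collections from the two preceding examples. It assumes each trade is a swap in which both owners store a copy of the other's collection, and that each site has room for only two traded copies, so the first sequence leaves site 3 with no partner. The copies_after helper and the exact trade pairings are assumptions reconstructed from the diagrams.

```python
from collections import Counter


def copies_after(trades, owners):
    """Count copies of each collection after a sequence of pairwise swaps.

    `owners` maps a collection to its owner site; a trade (x, y) means the
    owners of x and y each store a copy of the other's collection.
    """
    stored = {site: set() for site in owners.values()}
    for coll, site in owners.items():
        stored[site].add(coll)  # every owner keeps its original
    for x, y in trades:
        stored[owners[x]].add(y)
        stored[owners[y]].add(x)
    return Counter(c for held in stored.values() for c in held)


OWNERS = {"A1": "site1", "A2": "site1", "B1": "site2", "B2": "site2",
          "C1": "site3", "C2": "site3"}

# Sites 1 and 2 trade only with each other; site 3 is left out.
first = copies_after([("A1", "B1"), ("A2", "B2")], OWNERS)
# Trades are spread across all three sites.
second = copies_after([("A1", "B1"), ("A2", "C1"), ("B2", "C2")], OWNERS)
```

In the first sequence C1 and C2 exist only at site 3, so a single failure loses them; in the second every collection has two copies.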
34
Effect of Bid Policies
[Figure: MTTF difference vs. FixedPrice (percent), y-axis from -50 to 350, against local storage factor (F) = 2, 3, 4, 5. Series: FreeSpace, InverseRareCollection, RareCollection, UsedSpace]
bid more (ask more in return) when I have more free space
bid more (ask more in return) when I have less free space
35
Effect of One Maverick Site
[Figure: MTTF difference vs. no maverick sites (percent), y-axis from -40 to 140, against local storage factor (F) = 2, 3, 4, 5. Series: Normal, No maverick sites, Maverick]
always bids high
36
Security & Privacy
Issues:
Anonymity
Reputation
Accountability
Information Preservation
Information Quality
Trust
Denial of service attacks
37
Information Preservation
Example Policy: make 3 copies of documents
A1 make copies
What can go wrong?
38
What Can Go Wrong?
“Bad” sites make copies
“Bad” site alters copy
“Bad” site publishes fake
“Bad” site makes many copies of other docs
...
A1 make copies
A’1
A1
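To make the altered-copy case (A’1) concrete, here is a minimal sketch of one possible check that is not taken from the talk: sites compare digests of their three copies and flag any copy that disagrees with the majority. The site names, document contents, and suspicious_sites helper are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content):
    """Fingerprint a document so copies can be compared without shipping them."""
    return hashlib.sha256(content).hexdigest()


def suspicious_sites(copies):
    """Return sites whose copy disagrees with the majority digest."""
    digests = {site: digest(content) for site, content in copies.items()}
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(site for site, d in digests.items() if d != majority)


copies = {
    "site1": b"original text of A1",
    "site2": b"original text of A1",
    "site3": b"altered text of A1",  # the A'1 case: one site changed its copy
}
flagged = suspicious_sites(copies)
```

A check like this only helps while honest copies outnumber bad ones; if "bad" sites hold most of the copies, the majority digest is the altered one.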
39
Conclusion
P2P systems popular today
P2P systems vulnerable and inefficient
Many challenges ahead
Search
Resource Management
Security and Privacy
40
For Additional Information
Google: “Stanford Peers”
http://www-db.stanford.edu/peers/