Peer-To-Peer Data Management

Hector Garcia-Molina

ICDE Conference, February 28, 2002




What is P2P?

napster

gnutella

morpheus

kazaa

bearshare

seti@home

folding@home

ebay

limewire

icq

fiorana

mojo nation

jxta

united devices

open cola

uddi

process tree

can

chord

ocean store

farsite

pastry

tapestry

?

grove

netmeeting

freenet

popular power

aim

jabber


Napster

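(Diagram: a central index and many peers; a peer joins, sends a query to the index, receives an answer, and then gets the file directly from another peer.)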
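To make the Napster flow concrete, here is a minimal Python sketch of a central index. The class and method names (`CentralIndex`, `join`, `leave`, `query`) are hypothetical, and substring matching on file names is an assumption; the point is only that the index answers queries while the file transfer itself happens directly between peers.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical Napster-style central index: file name -> peers sharing it."""

    def __init__(self):
        self.locations = defaultdict(set)

    def join(self, peer, files):
        # A joining peer registers every file it is willing to share.
        for name in files:
            self.locations[name].add(peer)

    def leave(self, peer):
        # A departing peer's entries are dropped so answers stop listing it.
        for peers in self.locations.values():
            peers.discard(peer)

    def query(self, keyword):
        # The answer maps each matching file name to the peers that hold it;
        # the file itself is then fetched directly from one of those peers.
        return {name: set(peers) for name, peers in self.locations.items()
                if keyword in name and peers}


index = CentralIndex()
index.join("peer1", ["song_a.mp3", "song_b.mp3"])
index.join("peer2", ["song_b.mp3"])
```

The index is a single point of failure and of legal exposure, which is exactly what the decentralized designs that follow try to avoid.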


Gnutella

(Diagram: a query flooded from peer to peer.)
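Gnutella has no index, so a query is flooded to neighbors hop by hop until its TTL runs out. The sketch below assumes a small hypothetical topology (`graph`) and a `has_file` predicate; it counts every query copy sent, including duplicates that the receiving peer discards, which is where the cost of flooding comes from.

```python
from collections import deque


def flood(graph, origin, has_match, ttl):
    """Gnutella-style flooding: breadth-first forwarding with a hop limit (TTL).

    Returns (peers that found a match, number of query messages sent).
    """
    seen = {origin}
    frontier = deque([(origin, 0, None)])  # (peer, hops from origin, sender)
    hits, messages = [], 0
    while frontier:
        peer, hops, came_from = frontier.popleft()
        if peer != origin and has_match(peer):
            hits.append(peer)
        if hops == ttl:
            continue  # TTL exhausted: do not forward further
        for neighbor in graph[peer]:
            if neighbor == came_from:
                continue  # never send the query back to its sender
            messages += 1  # duplicates count too; the receiver drops them
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1, peer))
    return hits, messages


graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
         "D": ["B", "C", "E"], "E": ["D"]}


def has_file(peer):
    return peer in {"D", "E"}
```

Raising the TTL from 2 to 3 finds one more answer here but also raises the message count, the trade-off the later slides try to improve.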


Morpheus

(Diagram: groups of ordinary peers, each group attached to a super peer.)
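A super peer keeps an index of its attached clients' files, so ordinary peers are shielded from query traffic. In this sketch the `SuperPeer` class is hypothetical, and queries travel only one hop among super peers, an assumption made for brevity.

```python
from collections import defaultdict


class SuperPeer:
    """Hypothetical super peer that indexes the files of its attached clients."""

    def __init__(self, name):
        self.name = name
        self.index = defaultdict(set)  # file name -> clients holding it
        self.neighbors = []            # other super peers

    def attach(self, client, files):
        # A client joining this super peer uploads its file list.
        for f in files:
            self.index[f].add(client)

    def local_lookup(self, name):
        return set(self.index.get(name, ()))

    def query(self, name):
        # Answer from the local index, then ask neighboring super peers
        # (one hop only, for brevity).
        hits = self.local_lookup(name)
        for sp in self.neighbors:
            hits |= sp.local_lookup(name)
        return hits


sp1, sp2 = SuperPeer("sp1"), SuperPeer("sp2")
sp1.neighbors.append(sp2)
sp2.neighbors.append(sp1)
sp1.attach("c1", ["x.mp3"])
sp2.attach("c2", ["x.mp3", "y.mp3"])
```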


Seti@Home

(Diagram: data from a satellite dish goes to a central site, which sends raw data chunks to many peers and collects the analyzed data.)
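The SETI@home pattern is a central work queue that borrows CPU time from peers. This hypothetical `CentralSite` hands out raw data chunks, accepts analyzed results, and requeues chunks held by peers that disappear; it does not try to validate results, for example by sending the same chunk to several peers.

```python
from collections import deque


class CentralSite:
    """Hypothetical coordinator: hands out raw chunks, collects analyzed data."""

    def __init__(self, chunks):
        self.pending = deque(chunks)
        self.assigned = {}  # chunk -> peer currently working on it
        self.results = {}   # chunk -> analyzed data

    def request_work(self, peer):
        if not self.pending:
            return None
        chunk = self.pending.popleft()
        self.assigned[chunk] = peer
        return chunk

    def submit(self, peer, chunk, result):
        # Accept a result only from the peer the chunk was assigned to.
        if self.assigned.get(chunk) == peer:
            del self.assigned[chunk]
            self.results[chunk] = result

    def reclaim(self, peer):
        # A peer went away: put its unfinished chunks back in the queue.
        for chunk, owner in list(self.assigned.items()):
            if owner == peer:
                del self.assigned[chunk]
                self.pending.append(chunk)


site = CentralSite(["c1", "c2"])
```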


Lockss

(Diagram: libraries A, B, C, D, and E holding copies of documents D1, D2, and D3.)


PeerCast

(Diagram, before and after: a stream sent from a source to Stanford.)


What is a P2P System?

Multiple sites (at edge)

Distributed resources

Sites are autonomous (different owners)

Sites are both clients and servers

Sites have equal functionality

P2P Purity


P2P Is a BAD IDEA!!

Distribution is expensive!

Specialized functionality is good!


Example: Distributed Data Management

Distribution is expensive

If you must distribute:

• build a centralized directory, index

• use backups for reliability

• for replicated data, use a primary copy
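The advice for replicated data is the primary-copy scheme: one designated site applies every update and pushes it to its backups. This sketch assumes synchronous propagation and promotes the first backup when the primary fails; `PrimaryCopy` and its methods are illustrative names, not an existing API.

```python
class PrimaryCopy:
    """Hypothetical primary-copy replication: the primary applies every write
    and then pushes it to each backup (synchronously, for simplicity)."""

    def __init__(self, primary, backups):
        self.primary = primary
        self.backups = list(backups)
        self.replicas = {site: {} for site in [primary, *self.backups]}

    def write(self, key, value):
        self.replicas[self.primary][key] = value
        for site in self.backups:
            self.replicas[site][key] = value

    def read(self, key):
        # Reads go to the primary, so they always see the latest write.
        return self.replicas[self.primary].get(key)

    def fail_primary(self):
        # Drop the failed primary and promote the first backup.
        del self.replicas[self.primary]
        self.primary = self.backups.pop(0)


store = PrimaryCopy("s1", ["s2", "s3"])
store.write("x", 1)
store.fail_primary()
```

Because every update funnels through one site, there is no conflict resolution to do; the cost is that the primary is a bottleneck, which is the kind of specialization a "pure" P2P system gives up.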


Computational Efficiency is NOT Main Goal

Main driving force in a P2P system:

• exploiting existing (often free) resources

• sharing costs among many

• legal protection

• autonomy

• anonymity


Should We Do P2P Research?

Should we help people break the law?

Analogy: Should we develop pillows, knives, hammers, drugs, bath tubs, cars, airplanes, ... ??


Should We Do P2P Research?

YES: P2P is not exclusively for breaking the law; remember the VCR

YES: P2P can liberate us from culture “plantation owners” (Lessig)


Is “Free Culture” Feasible?

Example: Legal texts

Can we afford it?

(Diagram labels: economic activity; rules of the game; today.)


Should DB community work on P2P?

YES


P2P Challenges

Easier to list NON-Research-Topics:

• Color schemes for P2P Nodes

• Impact of P2P on Moroccan 15th Century Literature


P2P Challenges

Search

Resource Management

Security & Privacy


Search Taxonomy

(Figure: query type (lookup vs. content queries/search) against scope of index (single site, regional, global), placing freenet, gnutella, napster, morpheus, can, routing, replicated SP, and partial.)


Index Implementation Taxonomy

(Figure: nature of index (centralized, distributed, P2P) against whether index location is correlated with content location (yes/no), placing freenet, gnutella, napster, morpheus, can, routing, replicated SP, and partial.)


Content Addressable Network (CAN)

(Diagram: nodes and data placed in a shared coordinate space.)
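In a CAN, each key hashes to a point in a coordinate space, the node whose zone contains that point stores it, and lookups are routed greedily toward the point. The sketch fixes four hypothetical nodes in the quadrants of a unit square (a real CAN splits zones as nodes join) and uses SHA-1 for hashing; both choices are assumptions.

```python
import hashlib

# Hypothetical 2-D CAN: four nodes, each owning one quadrant of the unit square.
ZONES = {  # node -> ((x_low, x_high), (y_low, y_high)), half-open intervals
    "n1": ((0.0, 0.5), (0.0, 0.5)),
    "n2": ((0.5, 1.0), (0.0, 0.5)),
    "n3": ((0.0, 0.5), (0.5, 1.0)),
    "n4": ((0.5, 1.0), (0.5, 1.0)),
}
NEIGHBORS = {"n1": ["n2", "n3"], "n2": ["n1", "n4"],
             "n3": ["n1", "n4"], "n4": ["n2", "n3"]}


def key_to_point(key):
    """Hash a key to a point in [0, 1) x [0, 1)."""
    digest = hashlib.sha1(key.encode()).digest()
    x = int.from_bytes(digest[:4], "big") / 2**32
    y = int.from_bytes(digest[4:8], "big") / 2**32
    return x, y


def owns(node, point):
    (x0, x1), (y0, y1) = ZONES[node]
    return x0 <= point[0] < x1 and y0 <= point[1] < y1


def distance_to_zone(node, point):
    # Squared Euclidean distance from the point to the node's rectangle.
    (x0, x1), (y0, y1) = ZONES[node]
    dx = max(x0 - point[0], 0.0, point[0] - x1)
    dy = max(y0 - point[1], 0.0, point[1] - y1)
    return dx * dx + dy * dy


def route(start, point):
    """Greedy routing: hop to the neighbor closest to the target point,
    preferring a neighbor that actually owns it."""
    path = [start]
    while not owns(path[-1], point):
        nxt = min(NEIGHBORS[path[-1]],
                  key=lambda n: (not owns(n, point), distance_to_zone(n, point)))
        path.append(nxt)
    return path
```

Unlike Napster or Gnutella, the location of each item is determined by its key, so a lookup needs no index and no flooding.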


Can We Improve Flooding?

(Figure: the same index implementation taxonomy: nature of index (centralized, distributed, P2P) against whether index location is correlated with content location.)


Directed BFS in Gnutella

Heuristics for Selecting Direction

>RES: Returned most results

<TIME: Shortest satisfaction time

<HOPS: Min hops for results

>MSG: Sent us most messages (all types)

<QLEN: Shortest queue

<LAT: Shortest latency

>DEG: Highest degree

(Diagram: which neighbors should receive the query?)
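Directed BFS sends the query to only a few neighbors chosen by one of the heuristics above, instead of to all of them. This sketch ranks neighbors by three of the listed heuristics; `NeighborStats`, its fields, and the value of k are hypothetical, and the selected neighbors are assumed to forward the query as in ordinary flooding.

```python
from dataclasses import dataclass


@dataclass
class NeighborStats:
    """Hypothetical per-neighbor statistics a peer might keep from past queries."""
    name: str
    results_returned: int   # used by >RES
    avg_latency_ms: float   # used by <LAT
    degree: int             # used by >DEG


# Heuristic name -> (statistic to rank by, whether larger values are better)
HEURISTICS = {
    ">RES": (lambda s: s.results_returned, True),
    "<LAT": (lambda s: s.avg_latency_ms, False),
    ">DEG": (lambda s: s.degree, True),
}


def select_neighbors(stats, heuristic, k=1):
    """Pick the k neighbors that receive the query under the given heuristic."""
    key, larger_is_better = HEURISTICS[heuristic]
    return [s.name for s in sorted(stats, key=key, reverse=larger_is_better)[:k]]


stats = [NeighborStats("A", 12, 80.0, 4),
         NeighborStats("B", 30, 200.0, 2),
         NeighborStats("C", 5, 40.0, 9)]
```

Different heuristics can pick different neighbors for the same query, which is why the choice has to be evaluated empirically.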


How Does One Evaluate?

Live Gnutella?

Use real Gnutella as “laboratory”


Time to Satisfaction for Directed BFS


Routing Index

(Diagram: peers A, B, C, and D, each with a routing index table whose columns are labeled AI and DB and whose rows give per-neighbor counts; a query Q(DB) is routed using these tables.)
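A routing index lets a peer estimate how many matching documents lie behind each neighbor and forward the query to the most promising one. The sketch below is a compound-style index with hypothetical counts; its estimator multiplies per-topic fractions, which assumes topics are independent.

```python
from math import prod

# Hypothetical routing index at one peer: for each neighbor, the number of
# documents reachable through it in total and per topic.
routing_index = {
    "B": {"total": 100, "AI": 10, "DB": 60},
    "C": {"total": 80, "AI": 40, "DB": 30},
    "D": {"total": 20, "AI": 0, "DB": 15},
}


def goodness(entry, topics):
    # Estimated matching documents: total * product of (topic count / total),
    # i.e. assuming the query's topics occur independently.
    total = entry["total"]
    if total == 0:
        return 0.0
    return total * prod(entry.get(t, 0) / total for t in topics)


def rank_neighbors(index, topics):
    """Neighbors ordered from most to least promising for the query."""
    return sorted(index, key=lambda n: goodness(index[n], topics), reverse=True)
```

For a query on DB alone the ranking follows the DB counts, but a query on both AI and DB favors C, whose documents are split more evenly between the two topics.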


Types of Routing Indexes

Compound

Hop Count

Exponential Decay

Strategies for Cycles:

• Ignore (for Hop-Count, exponential)

• Avoid Update Cycles

• Detect Update Cycles and Recover
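Hop-count and exponential-decay indexes also account for how far away the documents are. This sketch assumes a regular fanout F, so a document h hops away is discounted by F to the power h - 1 (the first hop is h = 1); the function names and sample numbers are illustrative. The discount is also why cycles can be ignored for these two types: a count that loops back is divided by F on every hop.

```python
def hop_count_goodness(docs_per_hop, fanout):
    """Goodness of one neighbor in a hop-count routing index.

    docs_per_hop[0] counts matching documents 1 hop away through this
    neighbor, docs_per_hop[1] those 2 hops away, and so on up to the horizon.
    Each extra hop is discounted by the (assumed regular) fanout.
    """
    return sum(n / fanout**h for h, n in enumerate(docs_per_hop))


def exponential_decay_entry(local_docs, other_entries, fanout):
    """Value a peer reports upstream in an exponential-decay routing index:
    its own matching documents plus its other neighbors' entries divided by
    the fanout."""
    return local_docs + sum(other_entries) / fanout


near = hop_count_goodness([30, 0, 0], fanout=3)  # 30 documents one hop away
far = hop_count_goodness([0, 0, 90], fanout=3)   # 90 documents three hops away
```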


Effect of Index Compression

(Chart: number of messages, from 0 to 600, against index compression from 0% to 83%, for CRI, HRI, ERI, and No-RI.)


Effect of Network Topology

(Chart: number of messages, from 0 to 700, for CRI, HRI, ERI, and No RI on Tree, Tree+Cycle, and Powerlaw topologies.)


Resource Management

Resource:

• storage (lockss)

• CPU processing (seti@home)

• bandwidth (PeerCast)

Issues:

• fairness

• load balancing


Example: Data Trading

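(Diagram: site 1 holds A1 and A2, site 2 holds B1 and B2, site 3 holds C1 and C2. Sites 1 and 2 trade twice: site 1 stores B1 and B2, and site 2 stores A1 and A2.)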


Example: Data Trading

(Diagram: the same starting data, with three trades: site 1 stores B1 and site 2 stores A1; site 1 stores C1 and site 3 stores A2; site 2 stores C2 and site 3 stores B2.)
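One way to see why the order of trades matters is to compute, for each site's collection, the probability that every block survives when sites fail independently with probability p. The placements below are our reading of the two trading diagrams, and this survival probability is a simple stand-in for reliability measures such as MTTF, not the metric used in the original work.

```python
from itertools import product

SITES = (1, 2, 3)

# Sites storing each block after trading (original owner plus trading partner).
first_scheme = {   # sites 1 and 2 trade twice; site 3 never gets to trade
    "A1": {1, 2}, "A2": {1, 2},
    "B1": {2, 1}, "B2": {2, 1},
    "C1": {3}, "C2": {3},
}
second_scheme = {  # every site trades once with each of the others
    "A1": {1, 2}, "A2": {1, 3},
    "B1": {2, 1}, "B2": {2, 3},
    "C1": {3, 1}, "C2": {3, 2},
}


def survival_probability(placement, collection, p_fail):
    """Probability that every block of a collection survives, enumerating
    all combinations of independent site failures."""
    total = 0.0
    for failed in product((False, True), repeat=len(SITES)):
        down = {s for s, f in zip(SITES, failed) if f}
        weight = 1.0
        for f in failed:
            weight *= p_fail if f else 1 - p_fail
        if all(placement[block] - down for block in collection):
            total += weight
    return total


def worst_collection(placement, p_fail):
    """Survival probability of the least protected collection (A, B, or C)."""
    return min(survival_probability(placement, [c + "1", c + "2"], p_fail)
               for c in "ABC")
```

With p = 0.1, the first scheme leaves collection C with a survival probability of only 0.9 because site 3 found no partner, while the second scheme gives every collection 0.981.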


Data Trading

Order of trades impacts reliability

Issues:

• Swaps vs. Deeds

• Fixed price vs. bids

• Preference to:

  • sites with a lot of space?

  • reliable sites?

  • “desperate” sites?


Effect of Bid Policies

(Chart: MTTF difference vs. FixedPrice (percent), from -50 to 350, against local storage factor F from 2 to 5, for the FreeSpace, InverseRareCollection, RareCollection, and UsedSpace bid policies.)

bid more (ask more in return) when I have more free space

bid more (ask more in return) when I have less free space


Effect of One Maverick Site

(Chart: MTTF difference vs. no maverick sites (percent), from -40 to 140, against local storage factor F from 2 to 5, for Normal, No maverick sites, and Maverick.)

Maverick: always bids high


Security & Privacy

Issues:

• Anonymity

• Reputation

• Accountability

• Information Preservation

• Information Quality

• Trust

• Denial of service attacks


Information Preservation

Example Policy: make 3 copies of documents

(Diagram: copies are made of document A1.)

What can go wrong?


What Can Go Wrong?

“Bad” sites make copies

“Bad” site alters copy

“Bad” site publishes fake

“Bad” site makes many copies of other docs

...

(Diagram: copies are made of A1, but one copy has been altered into A’1.)
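If the copies of a document are fingerprinted, an altered copy such as A’1 can be spotted by comparing digests across sites. The sketch takes a majority vote over SHA-256 digests; `audit_copies` is a hypothetical helper, this is a simplification rather than the protocol of any particular system, and it picks the wrong version if bad sites control most of the copies.

```python
import hashlib
from collections import Counter


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def audit_copies(copies: dict) -> list:
    """Return the sites whose copy disagrees with the majority digest."""
    digests = {site: sha256_hex(data) for site, data in copies.items()}
    majority, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) // 2:
        raise ValueError("no majority: cannot tell which copy is authentic")
    return sorted(site for site, d in digests.items() if d != majority)


# Policy from the slide: 3 copies; here one site holds an altered copy.
copies = {"s1": b"A1 contents", "s2": b"A'1 contents", "s3": b"A1 contents"}
```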


Conclusion

P2P systems popular today

P2P systems vulnerable and inefficient

Many challenges ahead:

• Search

• Resource Management

• Security and Privacy


For Additional Information

Google: “Stanford Peers”

http://www-db.stanford.edu/peers/