
Peer-to-Peer

EE 122: Intro to Communication Networks

Fall 2010 (MW 4-5:30 in 101 Barker)

Scott Shenker

TAs: Sameer Agarwal, Sara Alspaugh, Igor Ganichev, Prayag Narula

http://inst.eecs.berkeley.edu/~ee122/

Materials with thanks to Jennifer Rexford, Ion Stoica, Vern Paxson, and other colleagues at Princeton and UC Berkeley

Today’s Lecture

• The Opening Act (~30 minutes)
  – Scott talking about Peer-to-Peer Systems

• The Main Event (rest of the time, and beyond)
  – Igor Ganichev talking about Networking Libraries


Ground Rules

• No slides (for me)
  – No laptops for you

• I won’t test you on anything from this lecture
  – This is for context

• I will give you homework on a P2P system
  – But based on material in the book
  – Won’t be hard
  – Some material covered in section


Peer-to-Peer

• Design paradigm: No central control
  – Large number of identical nodes
  – Highly resilient and scalable
  – Just what you need to run a large datacenter

• Economic model: leverage user nodes
  – No need for huge investment
  – Broad geographic distribution
  – Self-scaling

• Will discuss both, but start with economic model….
  – A continuing struggle for control

In the beginning…..

• AT&T created the telephone network

• First large-scale person-to-person communication infrastructure
  – The patent dispute over the telephone makes our patent litigation battles seem like child’s play

• The Telephone network dominated for two generations….


The Telephone Model

• Functionality controlled by network operator
  – They sink the money into the infrastructure
  – They get to decide what that infrastructure does
  – But government-regulated company (set ROI, etc.)

• End-user only has “dumb terminal”
  – Legally restricted in its use of that terminal
  – Until the courts finally gave some freedom to users

• Regulated monopoly led to glacial innovation in functionality but extreme reliability and polish
  – Why spend money on features no one knows they want?
  – Spend money improving what people notice (failures)


Then came the Internet…

• End points had complete freedom, and substantial computing power
  – Infrastructure just carried bits

• Completely different economic model
  – Small guys can innovate
  – Big guys run dumb infrastructure (like utilities)

• Result:
  – Rapid innovation in applications (e.g., email, web)
  – Diversity of content (on web)
  – Low barrier to entry

• And finally, even the big boys noticed….

The Empire Strikes Back

• Zipf’s law restores order to the universe
  – Popularity ~ 1/rank
  – Lots of weight at top (people like the same things)
  – Lots of weight in tail (but lots of idiosyncratic tastes)

• A Tale of Two Markets
  – Lots of action in the tail (anyone can play)
  – But only a few really big guys (hard to enter this market)

• High barrier to entry: CDNs
  – Bandwidth
  – Servers
  – Management
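
To make the "Popularity ~ 1/rank" claim on this slide concrete, here is a minimal sketch that normalizes 1/rank weights and compares the head of the distribution with the tail. The function names and the catalog size of 10,000 items are illustrative assumptions, not numbers from the lecture.

```python
def zipf_shares(n_items: int) -> list[float]:
    """Return request shares where share(rank) is proportional to 1/rank."""
    weights = [1.0 / rank for rank in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def head_share(shares: list[float], top_k: int) -> float:
    """Fraction of all requests that go to the top_k most popular items."""
    return sum(shares[:top_k])


if __name__ == "__main__":
    # Hypothetical catalog of 10,000 items; the size is an assumption.
    shares = zipf_shares(10_000)
    top = head_share(shares, 100)
    print(f"top 100 items:       {top:.1%} of requests")
    print(f"remaining 9,900:     {1 - top:.1%} of requests")
```

Under these assumptions the top 1% of items and the remaining 99% each draw a large fraction of the requests, which is the "two markets" picture: a heavy head that favors the few big players, and a tail that is collectively large but spread thin.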


Revenge of the Nerds

• Peer-to-Peer restores the balance
  – Takes “contributed” nodes from participants
  – Together they provide enough aggregate bandwidth

• The key is coordinating these peer nodes
  – First: Napster (Shawn Fanning)

• Academia followed (as it always does)

• My lecture on how academia has missed out on everything?
  – We are really good at solving problems
  – We are really terrible at figuring out what people want…..


Coordination Mechanisms

• Must be:
  – Scalable
  – Fault-tolerant
  – Able to use commodity parts

• A good way to build systems in general!

• Now finished with P2P as economic model
  – Moving on to……..


Peer-to-Peer as Design Paradigm

• Once you can coordinate many disparate peers
  – You can certainly coordinate many co-located peers
  – Now the dominant design style in datacenters
  – DHT-like data structures are everywhere

• This is what made Google work: (like Jobs at App)
  – Design as if failure is the typical case
  – Recover from failure only at the highest possible layer
    o If routing fails, use another server; don’t wait for routing to recover
    o This is hard to accept for some people….
  – Low cost components
  – Scale out, not up


P2P Systems Do Three Main Things

• Help user determine which content they want
  – Some form of search
  – P2P form of Google

• Then locate that content
  – Locate where that content is on the Internet
  – P2P form of DNS (map name to location)

• Then download that content
  – P2P form of Akamai


We need P2P forms of

• Search (keyword)

• Directory

• CDN

• What kinds of coordination mechanisms do we need for these tasks?


P2P Search

• Basic approach:
  – Since search can be complicated, just do it on each machine independently, and keep going for as long as you need

• Examples:
  – Broadcast
  – Broadcast among superpeers
  – Random walk (theory)

• Cannot match efficiency of Google
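
The broadcast approach above can be sketched as a flood with a hop limit: every peer checks its own files, then forwards the query to its neighbors. This is only an illustrative sketch; the overlay graph, the file lists, and the `ttl` parameter are assumptions made for the example, not details of any particular P2P system.

```python
from collections import deque


def flood_search(graph, files, start, keyword, ttl):
    """Flood a keyword query from `start` for at most `ttl` hops.

    graph: dict mapping peer -> list of neighbor peers (hypothetical overlay)
    files: dict mapping peer -> list of file names stored on that peer
    Returns (hits, messages), where hits is a list of (peer, file name)
    and messages counts every query forwarded between peers.
    """
    hits = []
    seen = {start}
    queue = deque([(start, ttl)])
    messages = 0
    while queue:
        peer, hops_left = queue.popleft()
        # Each peer runs the (possibly complicated) match locally.
        for name in files.get(peer, []):
            if keyword in name:
                hits.append((peer, name))
        if hops_left == 0:
            continue
        for neighbor in graph.get(peer, []):
            messages += 1  # a forward costs a message even if it is a duplicate
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops_left - 1))
    return hits, messages


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"],
             "d": ["b", "c", "e"], "e": ["d"]}
    files = {"b": ["zipf_notes.txt"], "e": ["zipf_slides.pdf", "dht.pdf"]}
    print(flood_search(graph, files, "a", "zipf", ttl=2))
    print(flood_search(graph, files, "a", "zipf", ttl=3))
```

Raising the hop limit finds more copies but also sends more messages, many of them duplicates. That cost is one way to see why flooding cannot match a centralized index for efficiency.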

P2P Directory

• In most cases, a few centralized servers will do

• If you need to scale further, then use DHT
  – Put/Get interface

• DHT: simple version is consistent hashing
  – Everyone knows set of servers
  – Map key to server using the successor rule
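
A minimal sketch of the simple version described above: every client knows the full server set, servers and keys are hashed onto the same ring, and a key belongs to the first server at or after its position (wrapping around). The class name, the use of SHA-1, and the 32-bit ring size are assumptions chosen for illustration.

```python
import bisect
import hashlib


def ring_position(value: str, bits: int = 32) -> int:
    """Hash a server name or key onto a ring of size 2**bits."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class ConsistentHashRing:
    """Put/Get directory over a fixed, globally known set of servers."""

    def __init__(self, servers):
        self._ring = sorted((ring_position(s), s) for s in servers)
        self._positions = [pos for pos, _ in self._ring]
        # In a real system each dict would live on a separate machine.
        self._store = {s: {} for s in servers}

    def successor(self, key: str) -> str:
        """Successor rule: first server clockwise from the key's position."""
        idx = bisect.bisect_left(self._positions, ring_position(key))
        return self._ring[idx % len(self._ring)][1]

    def put(self, key: str, value) -> None:
        self._store[self.successor(key)][key] = value

    def get(self, key: str):
        return self._store[self.successor(key)].get(key)


if __name__ == "__main__":
    ring = ConsistentHashRing(["s1", "s2", "s3", "s4"])
    ring.put("song.mp3", "10.0.0.7")  # hypothetical name -> location entry
    print(ring.successor("song.mp3"), ring.get("song.mp3"))
```

The reason for using a ring is that removing a server only reassigns the keys that server owned; every other key keeps its successor.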


P2P Download

• The first key here is self-scaling

• If every person who downloads something also has to upload it to someone else, the system works

• The second key here is asymmetric bandwidth
  – That’s where chunks come in
  – Downloading many chunks
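
The self-scaling argument can be illustrated with the standard textbook lower bounds on distribution time. This is a sketch under stated assumptions, not something derived in the lecture: it uses a fluid model in which a peer can forward a chunk as soon as it has it, all peers have the same upload rate, and the example rates are made up.

```python
def client_server_time(file_bits, n_peers, server_up, peer_down):
    """Lower bound (seconds) when the server uploads a full copy to each peer."""
    return max(n_peers * file_bits / server_up, file_bits / peer_down)


def p2p_time(file_bits, n_peers, server_up, peer_up, peer_down):
    """Lower bound (seconds) when peers also upload the chunks they hold."""
    total_up = server_up + n_peers * peer_up  # capacity grows with every peer
    return max(
        file_bits / server_up,           # server must send each bit at least once
        file_bits / peer_down,           # each peer must receive the whole file
        n_peers * file_bits / total_up,  # all copies must fit in total upload
    )


if __name__ == "__main__":
    # Assumed numbers: 1 GB file, 100 Mbps server, and asymmetric home
    # links with 1 Mbps upload and 10 Mbps download.
    F, us, up, down = 8e9, 100e6, 1e6, 10e6
    for n in (10, 1_000, 100_000):
        cs = client_server_time(F, n, us, down)
        p2p = p2p_time(F, n, us, up, down)
        print(f"N={n:>7}: client-server >= {cs:,.0f} s, P2P >= {p2p:,.0f} s")
```

In this model the client-server bound grows linearly with the number of peers, while the P2P bound levels off near file size divided by peer upload rate. Because upload is the scarce side of an asymmetric link, a downloader fills its faster download link by fetching different chunks from many peers at once.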


Modern P2P Systems Use a Mixture

• Search to find name (wildcard search)
  – Flood among superpeers

• Directory lookup to find host given exact name
  – DHT-like structure

• Chunked download
  – Self-scaling
  – Asymmetric bandwidth

