Peer-to-Peer
EE 122: Intro to Communication Networks
Fall 2010 (MW 4-5:30 in 101 Barker)
Scott Shenker
TAs: Sameer Agarwal, Sara Alspaugh, Igor Ganichev, Prayag Narula
http://inst.eecs.berkeley.edu/~ee122/
Materials with thanks to Jennifer Rexford, Ion Stoica, Vern Paxson, and other colleagues at Princeton and UC Berkeley
Today’s Lecture
• The Opening Act (~30 minutes)
  – Scott talking about Peer-to-Peer Systems
• The Main Event (rest of the time, and beyond)
  – Igor Ganichev talking about Networking Libraries
2
Ground Rules
• No slides (for me)
  – No laptops for you
• I won’t test you on anything from this lecture
  – This is for context
• I will give you homework on a P2P system
  – But based on material in the book
  – Won’t be hard
  – Some material covered in section
3
Peer-to-Peer
• Design paradigm: No central control
  – Large number of identical nodes
  – Highly resilient and scalable
  – Just what you need to run a large datacenter
• Economic model: leverage user nodes
  – No need for huge investment
  – Broad geographic distribution
  – Self-scaling
• Will discuss both, but start with economic model….
  – A continuing struggle for control
4
In the beginning…..
• AT&T created the telephone network
• First large-scale person-to-person communication infrastructure
  – The patent dispute over the telephone makes our patent litigation battles seem like child’s play
• The Telephone network dominated for two generations….
5
The Telephone Model
• Functionality controlled by network operator
  – They sink the money into the infrastructure
  – They get to decide what that infrastructure does
  – But a government-regulated company (set ROI, etc.)
• End-user only has “dumb terminal”
  – Legally restricted in its use of that terminal
  – Until the courts finally gave some freedom to users
• Regulated monopoly led to glacial innovation in functionality but extreme reliability and polish
  – Why spend money on features no one knows they want?
  – Spend money improving what people notice (failures)
6
Then came the Internet…
• End points had complete freedom, and substantial computing power
  – Infrastructure just carried bits
• Completely different economic model
  – Small guys can innovate
  – Big guys run dumb infrastructure (like utilities)
• Result:
  – Rapid innovation in applications (e.g., email, web)
  – Diversity of content (on web)
  – Low barrier to entry
• And finally, even the big boys noticed….
7
The Empire Strikes Back
• Zipf’s law restores order to the universe
  – Popularity ~ 1/rank
  – Lots of weight at top (people like the same things)
  – Lots of weight in tail (but lots of idiosyncratic tastes)
• A Tale of Two Markets
  – Lots of action in the tail (anyone can play)
  – But only a few really big guys (hard to enter this market)
• High barrier to entry: CDNs
  – Bandwidth
  – Servers
  – Management
8
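A rough sketch of the Zipf claim above, assuming popularity is exactly proportional to 1/rank. The catalog size, the cutoff for "the top", and the function name `zipf_shares` are illustrative assumptions, not from the lecture.

```python
def zipf_shares(n_items: int, top_k: int) -> tuple[float, float]:
    """Return (share of requests going to the top_k items, share going to the rest)
    when the popularity of an item is proportional to 1/rank."""
    weights = [1.0 / rank for rank in range(1, n_items + 1)]
    total = sum(weights)
    head = sum(weights[:top_k]) / total
    return head, 1.0 - head


if __name__ == "__main__":
    # Hypothetical catalog: one million items, "the top" = the 1,000 most popular.
    head, tail = zipf_shares(n_items=1_000_000, top_k=1_000)
    print(f"top 1,000 items:       {head:.0%} of requests")
    print(f"remaining 999,000:     {tail:.0%} of requests")
```

Under these assumed numbers, the top 0.1% of items draw roughly half of all requests and the long tail draws the other half, which is the "two markets" picture: a few very large players at the top, and plenty of room for anyone in the tail.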
Revenge of the Nerds
• Peer-to-Peer restores the balance
  – Takes “contributed” nodes from participants
  – Together they provide enough aggregate bandwidth
• The key is coordinating these peer nodes
  – First: Napster (Shawn Fanning)
• Academia followed (as it always does)
• My lecture on how academia has missed out on everything?
  – We are really good at solving problems
  – We are really terrible at figuring out what people want…..
9
Coordination Mechanisms
• Must be:
  – Scalable
  – Fault-tolerant
  – Able to use commodity parts
• A good way to build systems in general!
• Now finished with P2P as economic model
  – Moving on to……..
10
Peer-to-Peer as Design Paradigm
• Once you can coordinate many disparate peers
  – You can certainly coordinate many co-located peers
  – Now the dominant design style in datacenters
  – DHT-like data structures are everywhere
• This is what made Google work (like Jobs at Apple):
  – Design as if failure is the typical case
  – Recover from failure only at the highest possible layer
    o If routing fails, use another server; don’t wait for routing to recover
    o This is hard to accept for some people….
  – Low-cost components
  – Scale out, not up
11
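A minimal sketch of "if routing fails, use another server", assuming the caller holds a list of interchangeable replicas. The names `fetch_with_failover` and `fetch` are hypothetical; `fetch` stands in for whatever request the application makes and is assumed to raise `ConnectionError` when a replica is unreachable.

```python
def fetch_with_failover(replicas, fetch):
    """Try each replica in turn and return the first successful result.

    Recovery happens here, at the application layer: the caller does not
    wait for the network to repair the path to a failed replica.
    """
    errors = []
    for replica in replicas:
        try:
            return fetch(replica)
        except ConnectionError as exc:
            errors.append((replica, exc))  # remember the failure, move on
    raise ConnectionError(f"all {len(replicas)} replicas failed: {errors}")


if __name__ == "__main__":
    def fake_fetch(replica):
        # Hypothetical stand-in: pretend the first replica is unreachable.
        if replica == "server-a":
            raise ConnectionError("no route to host")
        return f"data from {replica}"

    print(fetch_with_failover(["server-a", "server-b"], fake_fetch))
```

The design choice being illustrated is that failure is treated as the normal case: the loop expects some replicas to be down and simply moves on to the next one.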
P2P Systems Do Three Main Things
• Help user determine which content they want
  – Some form of search
  – P2P form of Google
• Then locate that content
  – Locate where that content is on the Internet
  – P2P form of DNS (map name to location)
• Then download that content
  – P2P form of Akamai
12
We need P2P forms of
• Search (keyword)
• Directory
• CDN
• What kinds of coordination mechanisms do we need for these tasks?
13
P2P Search
• Basic approach:
  – Since search can be complicated, just do it on each machine independently, and keep going for as long as you need
• Examples:
  – Broadcast
  – Broadcast among superpeers
  – Random walk (theory)
• Cannot match efficiency of Google
14
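The broadcast approach can be sketched as a hop-limited flood over the overlay. This is an illustration only: the overlay graph `OVERLAY`, the `ttl` hop limit, and the function name `flood_search` are assumptions, and each peer is assumed to answer the query locally through a `has_item` predicate.

```python
from collections import deque

# Hypothetical overlay: each peer lists the peers it is connected to.
OVERLAY = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}


def flood_search(neighbors, start, has_item, ttl):
    """Flood a query outward from `start` for at most `ttl` hops.

    Returns (peers that matched, number of query messages sent).
    """
    seen = {start}
    hits = [start] if has_item(start) else []
    queue = deque([(start, ttl)])
    messages = 0
    while queue:
        node, hops_left = queue.popleft()
        if hops_left == 0:
            continue  # hop limit reached, do not forward further
        for peer in neighbors[node]:
            messages += 1  # every forwarded copy costs a message
            if peer in seen:
                continue  # duplicate copy, dropped by the receiving peer
            seen.add(peer)
            if has_item(peer):  # each peer evaluates the query locally
                hits.append(peer)
            queue.append((peer, hops_left - 1))
    return hits, messages


if __name__ == "__main__":
    print(flood_search(OVERLAY, "a", lambda peer: peer == "e", ttl=2))
    print(flood_search(OVERLAY, "a", lambda peer: peer == "e", ttl=3))
```

In this toy overlay the query misses peer `e` with a hop limit of 2 and finds it with a hop limit of 3, at the cost of more messages, many of them duplicates. That message overhead is one way to see why flooding cannot match a centralized index for efficiency.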
P2P Directory
• In most cases, a few centralized servers will do
• If you need to scale further, then use a DHT
  – Put/Get interface
• DHT: simple version is consistent hashing
  – Everyone knows the set of servers
  – Map key to server using the successor rule
15
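A minimal sketch of the simple version described above: consistent hashing with a put/get interface, where every client knows the full server set. The class name `ConsistentHashDirectory`, the 32-bit ring, and the use of SHA-1 are assumptions made for illustration; a real DHT also has to handle nodes joining and leaving and partial knowledge of the ring.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Hash a server name or key onto a 2**32-point ring (SHA-1 is an arbitrary choice)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:4], "big")


class ConsistentHashDirectory:
    def __init__(self, servers):
        # Everyone knows the set of servers, so the ring is built locally.
        self._ring = sorted((ring_position(s), s) for s in servers)
        self._points = [point for point, _ in self._ring]
        self._store = {s: {} for s in servers}  # stand-in for per-server storage

    def successor(self, key: str) -> str:
        """Successor rule: the first server at or after the key's ring position."""
        i = bisect.bisect_left(self._points, ring_position(key))
        return self._ring[i % len(self._ring)][1]  # wrap around past the top

    def put(self, key: str, value) -> None:
        self._store[self.successor(key)][key] = value

    def get(self, key: str):
        return self._store[self.successor(key)].get(key)


if __name__ == "__main__":
    directory = ConsistentHashDirectory(["server-1", "server-2", "server-3"])
    directory.put("song.mp3", "10.0.0.7")  # hypothetical name -> location entry
    print(directory.successor("song.mp3"), directory.get("song.mp3"))
```

One reason to prefer the successor rule over something like `hash(key) % n` is that removing a server only moves the keys that server owned; every other key keeps its owner.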
P2P Download
• The first key here is self-scaling
• If every person who downloads something also has to upload it to someone else, the system works
• The second key here is asymmetric bandwidth
  – That’s where chunks come in
  – Downloading many chunks
16
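A back-of-the-envelope sketch of self-scaling, using the standard textbook lower bounds on distribution time for client-server versus P2P delivery. All parameter names and the example numbers are assumptions: one server, `n_peers` identical peers, rates in bits per second, and a file that can be split into arbitrarily small chunks.

```python
def client_server_time(file_bits, n_peers, server_up, peer_down):
    """Lower bound when the server alone uploads a full copy to every peer."""
    return max(n_peers * file_bits / server_up, file_bits / peer_down)


def p2p_time(file_bits, n_peers, server_up, peer_up, peer_down):
    """Lower bound when every peer also uploads the chunks it already has."""
    total_up = server_up + n_peers * peer_up  # capacity grows with the swarm
    return max(
        file_bits / server_up,            # server must send each bit at least once
        file_bits / peer_down,            # each peer must receive the whole file
        n_peers * file_bits / total_up,   # all copies must fit in total upload
    )


if __name__ == "__main__":
    # Hypothetical numbers: 1 GB file, 100 Mb/s server uplink,
    # asymmetric peers with 1 Mb/s up and 10 Mb/s down.
    for n in (10, 1_000, 100_000):
        cs = client_server_time(8e9, n, 100e6, 10e6)
        p2p = p2p_time(8e9, n, 100e6, 1e6, 10e6)
        print(f"{n:>7} peers: client-server {cs:>12.0f} s, P2P {p2p:>8.0f} s")
```

The client-server bound grows linearly with the number of peers, while the P2P bound levels off near `file_bits / peer_up`, because each new downloader also brings upload capacity. With asymmetric links a single peer's uplink is much slower than another peer's downlink, so one reading of the chunk remark is that a downloader fills its downlink by pulling different chunks from many peers at once.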
Modern P2P Systems Use a Mixture
• Search to find name (wildcard search)
  – Flood among superpeers
• Directory lookup to find host given exact name
  – DHT-like structure
• Chunked download
  – Self-scaling
  – Asymmetric bandwidth
17