peer to peer inf 123 – software architecture [email protected] 1
2
Outline
• Some theory
• Napster
• Gnutella/torrents
• Skype
• Sensor networks
• I won’t talk about
– NAT punching/Firewall (ICE, STUN, TURN)
– Distributed system algorithms/graph walks
3
SOME THEORY
4
Peer to peer
• Identical components running independently on different networked hosts
• Similar to distributed systems
– The topology can change
– Many paths from A to B
• Each peer is client and server to other peers
• Each peer provides resources: data, code, CPU, …
5
[Figure: the same peer network now (Peers 1, 2, 3, 4) and later (Peers 2, 3, 4)]
6
Pros and cons
• Pros
– Scaling: each node provides CPU and storage
– Robust if one node fails
• Cons
– Complex protocols for resource discovery
– Security, trust management
7
Examples
• Skype
• Sensor networks (house, environment, …)
• DNS caching
• File sharing
8
NAPSTER: HYBRID CS AND P2P
9
Napster
• Open source
• Resource location
– Client-server
– Custom protocol on top of TCP
• Resource retrieval
– P2P
– HTTP GET
10
Communication diagram
Participants: Peer A, Peer B, Content Directory
1) A to Directory: I have Gangnam Style
2) B to Directory: Who has Gangnam Style?
3) Directory to B: A has Gangnam Style
4) B to A: Give me Gangnam Style
5) A to B: Here is Gangnam Style
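The five steps above can be sketched in a few lines of Python. This is a minimal in-memory model, not Napster's wire protocol: the class and method names (ContentDirectory, Peer, announce, lookup, serve, download) are hypothetical, and plain method calls stand in for the TCP and HTTP exchanges.

```python
from collections import defaultdict


class ContentDirectory:
    """Central index: maps a song title to the peers that announced it."""

    def __init__(self):
        self._owners = defaultdict(set)

    def announce(self, peer_name, title):
        # Step 1: "I have <title>"
        self._owners[title].add(peer_name)

    def lookup(self, title):
        # Steps 2-3: "Who has <title>?" -> list of owners
        return sorted(self._owners.get(title, ()))


class Peer:
    def __init__(self, name, directory):
        self.name = name
        self.directory = directory
        self.files = {}

    def share(self, title, data):
        self.files[title] = data
        self.directory.announce(self.name, title)

    def serve(self, title):
        # Step 5: "Here is <title>" (stands in for the HTTP GET response)
        return self.files[title]

    def download(self, title, peers):
        owners = self.directory.lookup(title)
        if not owners:
            return None
        # Step 4: the transfer goes peer to peer, bypassing the directory
        return peers[owners[0]].serve(title)


directory = ContentDirectory()
peer_a = Peer("A", directory)
peer_b = Peer("B", directory)
peers = {"A": peer_a, "B": peer_b}

peer_a.share("Gangnam Style", b"mp3 bytes")
song = peer_b.download("Gangnam Style", peers)
```

Only the lookup touches the directory; the file bytes never pass through it, which is what makes the design a hybrid of client-server and P2P.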
11
Behind Firewall? Sequence diagram
12
Protocol: I have a song
“filename”, md5, size (B), bitrate (kbps), frequency (Hz), time (s)
“C:\music\OMGPSY\gangnam.mp3” 9e107d9d372bb6826bd81d3542a419d6 1234567 128 44100 253
• To class: why md5?
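A small sketch of how a client might build and parse the "I have a song" line shown above. The names SharedFile, content_md5, format_share and parse_share are hypothetical, and the example uses straight quotes around the filename. One plausible reason for including the hash is that it identifies the content regardless of the filename; hashing the whole file, as done here, is an assumption.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass
class SharedFile:
    filename: str
    md5: str
    size_bytes: int
    bitrate_kbps: int
    frequency_hz: int
    duration_s: int


def content_md5(data):
    """Hex MD5 digest of the file content (assumption: the whole file is hashed)."""
    return hashlib.md5(data).hexdigest()


def format_share(f):
    """Serialize the fields in the order given on the slide."""
    return (f'"{f.filename}" {f.md5} {f.size_bytes} '
            f'{f.bitrate_kbps} {f.frequency_hz} {f.duration_s}')


_SHARE_RE = re.compile(
    r'^"(?P<filename>.*)" (?P<md5>[0-9a-f]{32}) '
    r'(?P<size>\d+) (?P<bitrate>\d+) (?P<freq>\d+) (?P<time>\d+)$'
)


def parse_share(line):
    """Parse one share line back into a SharedFile."""
    m = _SHARE_RE.match(line)
    if m is None:
        raise ValueError(f"not a share line: {line!r}")
    return SharedFile(m["filename"], m["md5"], int(m["size"]),
                      int(m["bitrate"]), int(m["freq"]), int(m["time"]))


line = (r'"C:\music\OMGPSY\gangnam.mp3" '
        '9e107d9d372bb6826bd81d3542a419d6 1234567 128 44100 253')
shared = parse_share(line)
```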
13
Protocol: Search query
[FILENAME CONTAINS "artist name"] MAX_RESULTS <max> [FILENAME CONTAINS "song"] [LINESPEED <compare> <link-type>] [BITRATE <compare> "<br>"] [FREQ <compare> "<freq>"] [WMA-FILE] [LOCAL_ONLY]
MAX_RESULTS 100 FILENAME CONTAINS "Gangnam Style" BITRATE "AT LEAST" "128"
MAX_RESULTS 10000 # browse
• To class: query for songs from Psy?
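The query grammar above is simple enough to assemble with string formatting. The helper below is a hypothetical sketch (build_search and its parameters are not part of any Napster client); it covers only the MAX_RESULTS, FILENAME CONTAINS and BITRATE clauses and uses straight quotes.

```python
def build_search(max_results, artist=None, song=None, bitrate=None):
    """Assemble a search query string in the clause order of the grammar.

    bitrate is an optional (compare, kbps) pair such as ("AT LEAST", 128).
    """
    parts = []
    if artist is not None:
        parts.append(f'FILENAME CONTAINS "{artist}"')
    parts.append(f"MAX_RESULTS {max_results}")
    if song is not None:
        parts.append(f'FILENAME CONTAINS "{song}"')
    if bitrate is not None:
        compare, kbps = bitrate
        parts.append(f'BITRATE "{compare}" "{kbps}"')
    return " ".join(parts)


query = build_search(100, song="Gangnam Style", bitrate=("AT LEAST", 128))
browse = build_search(10000)
```

Here query reproduces the Gangnam Style example and browse reproduces the browse request.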
14
Achilles’ heel
• Server = bottleneck
• Swamped with requests for the location of a popular song
– Answer location requests slowly
• Heavy load if the only owner is behind a firewall
– Cap number of simultaneous transfers
15
Achilles’ heel #2
• Server = single point of failure
• Without it, the peers are blind and useless
• Shutdown mandated by court order
16
GNUTELLA: FULL P2P
17
Gnutella
• Open source
• Resource retrieval
– P2P
– HTTP GET
• Resource location, peer discovery, …
– P2P!
– Custom protocol over TCP
• Purely distributed/decentralized search engine
• Peer also called servent (server + client)
18
Flooding
19
Flood prevention
• Every message has a Time to Live (TTL) field
• Decreased by 1 at every hop
• When it reaches 0, don’t forward
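A minimal sketch of TTL-limited flooding over an adjacency-list graph. The function name flood and the example topology are hypothetical. The "reached" set also drops duplicates, which is an assumption beyond the TTL rule listed above.

```python
from collections import deque


def flood(graph, origin, ttl):
    """Return the set of peers that receive a message flooded from origin.

    graph maps each peer to the list of its neighbors.
    """
    reached = {origin}
    queue = deque([(origin, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this peer does not forward
        for neighbor in graph[peer]:
            if neighbor not in reached:  # assumption: duplicates are dropped
                reached.add(neighbor)
                queue.append((neighbor, remaining - 1))  # one hop costs 1 TTL
    return reached


# Hypothetical line topology: A - B - C - D
line_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
one_hop = flood(line_graph, "A", ttl=1)
two_hops = flood(line_graph, "A", ttl=2)
```

With a TTL of 1 the message stops at the direct neighbors; each extra unit of TTL extends the reach by one hop.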
20
Peer discovery
• Send ping
• When receiving ping
– Send pong where it came from
– And forward the ping to other nodes
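The ping/pong rules above can be simulated with synchronous method calls standing in for network messages. The Servent class, its method names and the three-node topology are hypothetical; remembering which neighbor delivered each ping, so that pongs can travel back along the same path, is one way to implement "send pong where it came from".

```python
class Servent:
    def __init__(self, name):
        self.name = name
        self.neighbors = []
        self.came_from = {}      # ping id -> neighbor that delivered it
        self.discovered = set()  # names learned from pongs

    def start_ping(self, ping_id, ttl):
        self.came_from[ping_id] = None  # None marks the originator
        for neighbor in self.neighbors:
            neighbor.on_ping(ping_id, ttl, self)

    def on_ping(self, ping_id, ttl, sender):
        if ping_id in self.came_from:
            return  # already handled this ping
        self.came_from[ping_id] = sender
        sender.on_pong(ping_id, self.name)  # pong goes back where the ping came from
        if ttl - 1 > 0:
            for neighbor in self.neighbors:
                if neighbor is not sender:
                    neighbor.on_ping(ping_id, ttl - 1, self)

    def on_pong(self, ping_id, peer_name):
        previous = self.came_from[ping_id]
        if previous is None:
            self.discovered.add(peer_name)  # we sent the ping: record the peer
        else:
            previous.on_pong(ping_id, peer_name)  # relay along the reverse path


# Hypothetical topology: a - b - c
a, b, c = Servent("a"), Servent("b"), Servent("c")
a.neighbors = [b]
b.neighbors = [a, c]
c.neighbors = [b]
a.start_ping(ping_id=1, ttl=2)
```

After the ping, a knows about both b and c even though it is only connected to b.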
21
Search query
• Query message payload
• Response: QueryHit message payload
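As a rough sketch of a Query payload, assuming the Gnutella 0.4 layout (a 2-byte little-endian minimum speed followed by a NUL-terminated search string). The function names are hypothetical, the message header is left out, and the ASCII encoding is an assumption.

```python
import struct


def encode_query(min_speed, criteria):
    """Pack a Query payload: minimum speed, then the search string and a NUL."""
    return struct.pack("<H", min_speed) + criteria.encode("ascii") + b"\x00"


def decode_query(payload):
    """Unpack a Query payload into (min_speed, criteria)."""
    (min_speed,) = struct.unpack_from("<H", payload, 0)
    criteria = payload[2:].split(b"\x00", 1)[0].decode("ascii")
    return min_speed, criteria


payload = encode_query(128, "gangnam style")
decoded = decode_query(payload)
```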
22
Query hit
23
Seeds
• How do I find peers when I join for the first time?
– Aka bootstrapping, seeding
• Need a reliable list of nodes/seeds
– Same idea as web page seeds for web crawlers
• Ship the list in the software
• Also: IRC, mailing lists, websites with seeds
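Bootstrapping from a shipped seed list can be sketched as a loop that tries each seed until enough connections succeed. Everything here is hypothetical: the bootstrap function, the seed addresses, and the can_connect callback that stands in for a real connection attempt.

```python
def bootstrap(seeds, can_connect, wanted=2):
    """Try seeds in order and return the first `wanted` reachable ones."""
    connected = []
    for address in seeds:
        if can_connect(address):
            connected.append(address)
            if len(connected) == wanted:
                break
    return connected


# Hypothetical seed list shipped with the software
SEEDS = ["seed1.example.org:6346", "seed2.example.org:6346",
         "seed3.example.org:6346", "seed4.example.org:6346"]

# Stand-in for a real connection attempt: only these seeds are online
online = {"seed2.example.org:6346", "seed4.example.org:6346"}
first_peers = bootstrap(SEEDS, lambda address: address in online)
```

Once connected to a few seeds, a new node can discover further peers through them and no longer depends on the shipped list.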
24
SKYPE: OVERLAY P2P
25
26
Skype supernodes
• Promote “strong” peers to supernodes
– Based on topology, bandwidth, …
– Trade secret
• Directory distributed and replicated in supernodes
– Robust and scalable directory
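Since the real promotion criteria are a trade secret, the following is purely illustrative and not Skype's algorithm. It assumes two made-up criteria, public reachability and bandwidth, and all names (PeerStats, pick_supernodes) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PeerStats:
    name: str
    bandwidth_kbps: int
    publicly_reachable: bool  # assumption: peers behind NAT are not promoted


def pick_supernodes(peers, count):
    """Promote the `count` reachable peers with the highest bandwidth."""
    candidates = [p for p in peers if p.publicly_reachable]
    candidates.sort(key=lambda p: p.bandwidth_kbps, reverse=True)
    return [p.name for p in candidates[:count]]


peers = [
    PeerStats("laptop", 2_000, False),
    PeerStats("campus-pc", 50_000, True),
    PeerStats("home-pc", 8_000, True),
    PeerStats("server", 100_000, True),
]
supernodes = pick_supernodes(peers, count=2)
```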
27
Privacy and security
• Calls relayed through supernodes
– Proprietary protocol
– Encrypted for privacy
• Closed-source client
– Very hard to inspect the code
– Even harder to hack it
– Therefore: no malicious peers
http://recon.cx/en/f/vskype-part1.pdf
http://www1.cs.columbia.edu/~salman/publications/skype1_4.pdf
28
SENSOR NETWORKS
29
Basic idea
30
Constraints
• Each node can
– Sense
– Compute
– Communicate with its neighbors
• Non-functional properties
– Low energy consumption
– Fault-tolerance
– Scalability
– Low latency
31
Usage
• Home automation
• Army communication
• Forest fire, tsunami, or tornado detection
• Airport weather conditions
• …
32
London Heathrow Airport
33
Base Station (aka sink or gateway)
Analog-Digital Converter
Deployment architecture
System architecture
34
Communication: routing
• Flat
• Hierarchical (super nodes)
• Location-based
• Depends on network topology
35
Communication: protocol
• Multipath
– Send data through multiple paths
– Good fault-tolerance, but high energy cost
• Query-based
– The gateway sends a query to the network
– A node with the data answers the query
• Quality of Service-based
– Balance energy consumption and data quality
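The multipath trade-off can be made concrete with a back-of-the-envelope model. It assumes that each path delivers independently with the same probability p and that every hop costs the same energy. Both are simplifications, and the function names are hypothetical.

```python
def delivery_probability(p, paths):
    """Chance that at least one of `paths` independent paths delivers."""
    return 1 - (1 - p) ** paths


def energy_cost(hops_per_path, paths, cost_per_hop=1.0):
    """Total transmission energy: every path carries its own copy."""
    return paths * hops_per_path * cost_per_hop


# With 90% reliable paths of 5 hops each:
single = (delivery_probability(0.9, 1), energy_cost(5, 1))
double = (delivery_probability(0.9, 2), energy_cost(5, 2))
```

Under these assumptions a second path raises the delivery probability from 0.9 to about 0.99 while doubling the energy spent, which is the balance a Quality of Service-based protocol has to strike.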
36
Read more
• https://connect.innovateuk.org/web/network-of-sensors/article-view/-/blogs/advanced-sensor-networks-developed-for-heathrow-airport
• Routing Techniques in Wireless Sensor Networks: A Survey, Al-Karaki et al., 2004