P2P Computing MIRA YUN September 16, 2005

Upload: irma-perry

Post on 05-Jan-2016


Page 1: P2P Computing MIRA YUN September 16, 2005. Outline What is P2P P2P taxonomies Characteristics Different P2P systems Conclusion

P2P Computing

MIRA YUN

September 16, 2005


Outline

What is P2P P2P taxonomies Characteristics Different P2P systems Conclusion


P2P

“Peer-to-peer” (P2P) refers to a class of systems and applications that employ distributed resources to perform a function in a decentralized manner

Generally opposed to the client/server architecture


Peers

A peer gives some resources and obtains other resources in return.

"Peer" means the participants are alike: in the pure form of a P2P network, all participants are peers.

Each peer depends on other peers; a peer on its own is meaningless.

Peers are autonomous (self-governing): they are not wholly controlled by each other or by one common authority.


What is P2P?

“The sharing of computer resources and services by direct exchange between systems” [p2pwg, 2001].

“Systems and applications that employ distributed resources to perform critical functions in a decentralized manner”

P2P enables peers to share their resources (information, processing, presence, etc.) with at most limited interaction with a centralized server.


Taxonomy of computer systems


P2P Models : pure, hybrid, super-peers

Pure: peers have the same capabilities and responsibilities, and communication is symmetric. No host is superior; every host can act as client or server. Examples: Gnutella, Freenet.

Hybrid: servers facilitate the interaction between peers. Addressing bypasses the DNS, but a central server acts as a directory. Examples: Napster, ICQ, Jabber.


P2P Models : pure, hybrid, super-peers

Super-peers: a super-peer is a node in a peer-to-peer network that operates both as a server to a set of clients and as an equal in a network of super-peers.

Super-peer networks try to balance the efficiency of centralized search against the autonomy, load balancing, and robustness to attacks provided by distributed search.

example: Kazaa
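The super-peer arrangement can be sketched as follows. This is a minimal illustration under stated assumptions, not Kazaa's actual protocol: the class `SuperPeer`, its methods, and the peer and file names are all hypothetical. Each super-peer indexes the files of its own clients (the "server" role) and asks the other super-peers only when its own index has no answer (the "equal among super-peers" role).

```python
class SuperPeer:
    """Hypothetical super-peer: a server to its clients, an equal to other super-peers."""

    def __init__(self, name):
        self.name = name
        self.index = {}    # file name -> set of client names holding it
        self.others = []   # the other super-peers this one knows

    def register(self, client, files):
        # A client uploads only its file list, never the files themselves.
        for f in files:
            self.index.setdefault(f, set()).add(client)

    def local_search(self, file_name):
        return sorted(self.index.get(file_name, set()))

    def search(self, file_name):
        """Answer from the local index first, else ask the other super-peers."""
        found = self.local_search(file_name)
        if found:
            return found
        for sp in self.others:
            found = sp.local_search(file_name)
            if found:
                return found
        return []


sp1, sp2 = SuperPeer("sp1"), SuperPeer("sp2")
sp1.others, sp2.others = [sp2], [sp1]
sp1.register("alice", ["a.mp3"])
sp2.register("bob", ["b.mp3"])
```

A query for a file held by one of a super-peer's own clients never leaves that super-peer, which is where the efficiency of centralized search comes from; only misses are passed on to the super-peer network.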


P2P search models

Centralized directory model
- There is a central index.
- Once the requested file is located, the exchange takes place directly between peers.
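A minimal sketch of the centralized directory model, assuming a simple in-memory index. The names `CentralIndex`, `Peer`, and `fetch` are hypothetical and chosen only for illustration. The point it shows is the split of roles: the server answers "who has the file", and the transfer itself happens between peers.

```python
class CentralIndex:
    """Central server: stores only which peer holds which file, never the files."""

    def __init__(self):
        self.locations = {}  # file name -> set of peer ids

    def register(self, peer_id, file_names):
        for name in file_names:
            self.locations.setdefault(name, set()).add(peer_id)

    def lookup(self, file_name):
        return sorted(self.locations.get(file_name, set()))


class Peer:
    def __init__(self, peer_id, files):
        self.peer_id = peer_id
        self.files = dict(files)  # file name -> content

    def serve(self, file_name):
        return self.files[file_name]


def fetch(index, peers, file_name):
    """Ask the index where the file is, then download directly from a peer."""
    holders = index.lookup(file_name)
    if not holders:
        return None
    return peers[holders[0]].serve(file_name)


peers = {"a": Peer("a", {"song.mp3": b"abc"}), "b": Peer("b", {})}
index = CentralIndex()
for p in peers.values():
    index.register(p.peer_id, p.files)
```

If the index goes down, no lookups are possible even though every file is still present on some peer, which is the single point of failure this model accepts in exchange for fast search.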


P2P search models: Napster

- Created in 1999 by Shawn Fanning, then a freshman at Northeastern University.
- Purpose: to get MP3 music files freely.
- Central index server, P2P exchange.
- Sued several times, then suspended.
- The music industry opposes Napster because people can get music for free instead of paying for a CD.
- Napster's defense is that the files are personal files that people maintain on their own machines, and therefore Napster is not responsible.


P2P search models

Flooded requests model
- Each request from a peer is flooded (broadcast) to its directly connected peers (1), which in turn flood their own peers (2).
- The request is propagated until a maximum number of floods occurs (typically 5 to 9) or the request is answered.
- Used by Gnutella.
- Requires a lot of bandwidth and does not scale.
- Good for company networks.
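The flooding behaviour can be sketched as a breadth-first search with a hop limit. This is an illustrative model, not Gnutella's wire protocol: the function `flood_search`, the `graph` layout, and the peer names are assumptions. It counts every forwarded copy of the request, which makes the bandwidth cost visible.

```python
from collections import deque


def flood_search(graph, has_file, start, ttl=7):
    """Breadth-first flooding with a hop limit.

    graph: dict peer -> list of directly connected peers
    has_file: set of peers that can answer the request
    Returns (peers that answered, number of messages sent).
    """
    seen = {start}
    queue = deque([(start, ttl)])
    hits, messages = [], 0
    while queue:
        peer, remaining = queue.popleft()
        if remaining == 0:
            continue                 # hop limit reached: stop forwarding
        for neighbor in graph[peer]:
            messages += 1            # every forwarded copy costs bandwidth
            if neighbor in seen:
                continue             # duplicate request, dropped by the receiver
            seen.add(neighbor)
            if neighbor in has_file:
                hits.append(neighbor)  # answered: this branch stops here
            else:
                queue.append((neighbor, remaining - 1))
    return hits, messages


graph = {
    "p": ["a", "b"],
    "a": ["p", "c"],
    "b": ["p", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
```

On this five-peer graph, finding the file on `d` from `p` with `ttl=3` takes 9 messages, most of them duplicates that the receivers discard; with `ttl=2` the request never reaches `d` at all. Both effects, wasted messages and a limited search horizon, grow with network size.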


P2P search models

Document routing model
- Each peer is assigned a random ID, and each peer knows a number of other peers.
- When a document is published, an ID is computed by hashing the document's contents and name.
- Each peer routes the document to the node with the most similar ID, until the nearest peer ID is the current peer's ID.
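The publishing steps above can be sketched as a hash followed by greedy routing. Several things here are assumptions made for illustration: the 16-bit ID space, SHA-1 as the hash, "most similar" interpreted as numerically closest, and the names `doc_id`, `distance`, and `route`. Real systems define their own ID spaces and similarity metrics.

```python
import hashlib

ID_BITS = 16  # assumption: a small ID space, for readability


def doc_id(name, contents):
    """Hash of document name and contents, reduced to the ID space."""
    digest = hashlib.sha1(name.encode() + contents).digest()
    return int.from_bytes(digest, "big") % (1 << ID_BITS)


def distance(a, b):
    return abs(a - b)  # assumption: "most similar" means numerically closest


def route(neighbors, start, target_id):
    """Greedy routing: hop to the known peer closest to target_id.

    Stops when no known peer is closer than the current one.
    Returns the list of peers visited.
    """
    path = [start]
    current = start
    while True:
        best = min(neighbors[current], key=lambda p: distance(p, target_id))
        if distance(best, target_id) >= distance(current, target_id):
            return path
        current = best
        path.append(current)


# peer id -> ids of the peers it knows
neighbors = {
    100: [4000, 20000],
    4000: [100, 9000],
    9000: [4000, 20000],
    20000: [100, 9000, 40000],
    40000: [20000],
}
```

Because each hop must strictly reduce the distance to the target ID, the walk always terminates; a document with ID 9500 published at peer 100 ends up at peer 9000 after two hops.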


P2P search models

Document routing model
- When a peer requests the document, the request goes to the peer with the ID most similar to the document ID.
- This process is repeated until a copy of the document is found.
- The document is then transferred back to the request originator, while each peer participating in the routing keeps a local copy.
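A sketch of the request path with caching, under the same kind of assumptions as any toy model of this scheme: integer peer IDs, numeric closeness as the similarity measure, and a hypothetical `lookup` function. It shows why repeated requests get cheaper: after the first lookup, every peer on the route holds a copy.

```python
def lookup(neighbors, stores, start, doc_id):
    """Route a request toward doc_id; cache the document on the way back.

    neighbors: dict peer id -> list of known peer ids
    stores: dict peer id -> dict of doc id -> contents (mutated by caching)
    Returns (contents or None, path of peers visited).
    """
    path = [start]
    current = start
    while doc_id not in stores[current]:
        closest = min(neighbors[current], key=lambda p: abs(p - doc_id))
        if abs(closest - doc_id) >= abs(current - doc_id):
            return None, path        # no closer peer is known: give up
        current = closest
        path.append(current)
    contents = stores[current][doc_id]
    for peer in path:                # every peer on the route keeps a copy
        stores[peer][doc_id] = contents
    return contents, path


neighbors = {10: [40], 40: [10, 70], 70: [40, 95], 95: [70]}
stores = {10: {}, 40: {}, 70: {}, 95: {90: b"report"}}
first = lookup(neighbors, stores, 10, 90)
second = lookup(neighbors, stores, 40, 90)
```

Here the first request from peer 10 travels three hops to peer 95; the second request, from peer 40, is answered from peer 40's own cache without any hops.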


P2P search models

Document routing model
- Efficient for large communities.
- But the document ID must be known before posting a request.
- Used in Freenet.
- Four improved algorithms: Chord, CAN, Tapestry, and Pastry.


Characteristics

Decentralization
- Centralized systems
  • Ideal for some applications
  • Bottlenecks
  • Inefficient use of resources
  • Expensive to set up
  • Hard to maintain
- Decentralized systems
  • P2P emphasizes the users' ownership and control of data and resources.
  • Full decentralization is difficult in practice.
  • Hybrid approach


Characteristics

Scalability
- Limited by factors:
  • The amount of centralized operations
  • The amount of state
  • The inherent parallelism an application exhibits
- Scalability also depends on the ratio of communication to computation between the nodes.
- Napster: can scale up to over 6 million users
- SETI@home: close to 3.5 million users so far


Characteristics

Anonymity
- One goal of P2P is to allow people to use systems without concern for legal issues.
- Three different kinds of anonymity: sender anonymity, receiver anonymity, and mutual anonymity.
- Gnutella: a request is broadcast and rebroadcast until it reaches a peer with the content.
- Freenet: a request is sent and forwarded to the peer that is most likely to have the content.


Characteristics

Self-Organization
- Needed because of scalability, fault resilience, and the cost of ownership.
- Adaptation is required to handle the changes caused by peers connecting to and disconnecting from the P2P system.

Cost of Ownership
- P2P reduces the cost of owning the systems and the content, and the cost of maintaining them.
- SETI@home: faster than the fastest supercomputer in the world, at 1% of the cost.

Ad-Hoc Connectivity
- Has a strong effect on all classes of P2P systems.


Characteristics

Performance
- Influenced by three types of resources: processing, storage, and networking.
- Three key approaches to optimizing performance:
  • Replication: puts copies of objects/files closer to the requesting peers.
  • Caching: reduces the path length required to fetch a file/object, and therefore the number of messages exchanged between the peers.
  • Intelligent routing and network organization


Taxonomy of P2P systems


Distributed Computing

- Processing scalability in massive multi-parameter systems
- Run by a central controller
- Fork and join mechanism
- Limitations
  • Independent small parts
  • Internet latencies
- Intel claims speed-ups from 15 hours to 30 minutes for interest rate swap modeling by using P2P
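The fork and join mechanism can be sketched on a single machine, with a thread pool standing in for the participating peers. This is only an analogy under stated assumptions: `work_unit`, `fork_join`, and the sum-of-squares workload are hypothetical, and a real deployment would ship work units over the network, where the Internet latencies listed above dominate.

```python
from concurrent.futures import ThreadPoolExecutor


def work_unit(chunk):
    """One independent piece of work; here, a sum of squares."""
    return sum(x * x for x in chunk)


def fork_join(data, n_workers=4, chunk_size=250):
    """Central controller: split the job, hand out the parts, combine the results."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(work_unit, chunks))   # fork
    return sum(partials)                               # join


total = fork_join(list(range(1000)))
```

The scheme only works because each chunk can be processed without talking to any other chunk, which is the "independent small parts" limitation: a job whose parts need each other's intermediate results does not fit this pattern.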


Distributed Computing

SETI@home (Search for Extraterrestrial Intelligence)
- SETI is a collection of research projects aimed at discovering alien civilizations.
- Goal: to search for extraterrestrial radio emissions.
- Design: two major components, the data server and the client.
- Decentralization and scalability: the server distributes files (350 KB each) to its users.


Jay Sheth

Collaboration

- Application-level collaboration between users
- Event-based applications such as instant messaging, chat, and online games
- Challenges
  • Locating other peers (e.g., NetMeeting requires knowing the other peers' IP addresses)
  • Real-time constraints (e.g., the game DOOM)



Platforms

- Platforms support the primary P2P components: naming, discovery, communication, security, and resource aggregation
- Candidates for a future P2P platform: .NET, JXTA


Platforms (JXTA)

JXTA = juxtapose = side by side

Open-source initiative from Sun (Java)

“JXTA™ technology is a set of open protocols that allow any connected device on the network ranging from cell phones and wireless PDAs to PCs and servers to communicate and collaborate in a P2P manner.”

“JXTA peers create a virtual network where any peer can interact with other peers and resources directly even when some of the peers and resources are behind firewalls and NATs or are on different network transports.”

Objectives:
- Interoperability: across systems and communities
- Platform independence: multiple/diverse languages, systems, and networks
- Ubiquity: every device with a digital heartbeat


Platforms (JXTA)

Architecture
- JXTA application layer
- JXTA service layer
- JXTA core layer

Set of 6 protocols
- Peer Endpoint Protocol: available routes to a destination
- Peer Rendezvous Protocol: sign in/out, authentication
- Peer Resolver Protocol: send/receive search queries for peers
- Pipe Binding Protocol: binds a pipe advertisement to a pipe endpoint
- Peer Information Protocol: learn a peer's status/properties
- Peer Discovery Protocol: find peers, groups, advertisements


File Sharing

- Content storage and exchange is where P2P is most successful
  • Napster, Gnutella, Kazaa


Gnutella Protocol v0.4 (1/5)

- One of the most popular file-sharing protocols.
- Operates without a central index server (unlike Napster).
- Clients (downloaders) are also servers, hence "servents".
- Clients may join or leave the network at any time, which makes it highly fault-tolerant, but at a cost!
- Searches are done within the virtual network, while actual downloads are done outside it (with HTTP).
- The core of the protocol consists of 5 descriptors (PING, PONG, QUERY, QUERYHIT, and PUSH).
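Every one of the five descriptors travels behind a common header. The sketch below assumes the 23-byte layout of the Gnutella v0.4 specification (16-byte descriptor ID, one byte each for payload type, TTL, and hops, then a 4-byte little-endian payload length) together with its payload type codes. The helper names `pack_header`, `unpack_header`, and `forward` are hypothetical and are not part of any Gnutella library.

```python
import os
import struct

# Payload descriptor codes, assumed from the Gnutella v0.4 specification.
PING, PONG, PUSH, QUERY, QUERYHIT = 0x00, 0x01, 0x40, 0x80, 0x81

# descriptor id, payload type, TTL, hops, payload length (little-endian)
HEADER = struct.Struct("<16sBBBI")


def pack_header(descriptor_id, kind, ttl, hops, payload_length):
    return HEADER.pack(descriptor_id, kind, ttl, hops, payload_length)


def unpack_header(data):
    return HEADER.unpack(data[:HEADER.size])


def forward(header):
    """Return the header as a servent would relay it, or None once the TTL is spent."""
    descriptor_id, kind, ttl, hops, length = unpack_header(header)
    if ttl <= 1:
        return None
    return pack_header(descriptor_id, kind, ttl - 1, hops + 1, length)


ping = pack_header(os.urandom(16), PING, 7, 0, 0)
relayed = forward(ping)
```

Each relay decrements the TTL and increments the hop count, so the two fields together bound how far a descriptor can spread through the virtual network.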


Gnutella Protocol (2/5)
- A peer (p) needs to connect to one or more other Gnutella peers in order to participate in the virtual network.
- p initially doesn't know the IP addresses of its fellow file-sharers.

[Figure: Gnutella network N and servent p, with a question mark between them]


Gnutella Protocol (3/5)

a. HostCaches: the initial connection
- p connects to a HostCache H to obtain a set of IP addresses of active peers.
- p might alternatively probe its own cache to find peers it was connected to in the past.

[Figure: (1) servent p requests and receives a set of active peers from HostCache server H, e.g. connect1.gnutellahosts.com:6346; (2) p connects to Gnutella network N]


Gnutella Protocol (4/5)

b. Ping/Pong: the communication overhead
- Although p is already connected, it must discover new peers, since its current connections may break.
- Thus, it periodically sends PING messages, which are broadcast (message flooding).
- If a host, e.g. p2, is available, it responds with a PONG (routed back only along the path the PING came from).
- p might use this response and attempt a connection to p2 in order to increase its degree.

[Figure: (1) servent p sends a PING through Gnutella network N; (2) servent p2 returns a PONG]
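The reverse-path rule for PONGs can be sketched with a small in-memory model. The `Servent` class and its methods are hypothetical; direct method calls stand in for network messages. Each servent remembers which neighbor delivered a given PING, and a PONG carrying the same descriptor ID is handed back to that neighbor, hop by hop, until it reaches the originator.

```python
class Servent:
    """Hypothetical in-memory servent; method calls stand in for network messages."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []
        self.came_from = {}   # descriptor id -> neighbor that delivered the PING
        self.pongs = []       # (descriptor id, responder) received as originator

    def ping(self, descriptor_id, ttl=7):
        self.came_from[descriptor_id] = None           # None marks the originator
        for n in self.neighbors:
            n.on_ping(descriptor_id, ttl, self)

    def on_ping(self, descriptor_id, ttl, sender):
        if descriptor_id in self.came_from:
            return                                     # duplicate PING: drop it
        self.came_from[descriptor_id] = sender
        sender.on_pong(descriptor_id, self.name)       # reply along the reverse path
        if ttl > 1:
            for n in self.neighbors:
                if n is not sender:
                    n.on_ping(descriptor_id, ttl - 1, self)

    def on_pong(self, descriptor_id, responder):
        if descriptor_id not in self.came_from:
            return                                     # PONG for an unknown PING: drop
        previous = self.came_from[descriptor_id]
        if previous is None:
            self.pongs.append((descriptor_id, responder))
        else:
            previous.on_pong(descriptor_id, responder)


# A chain p - a - b: b's PONG must travel back through a.
p, a, b = Servent("p"), Servent("a"), Servent("b")
p.neighbors, a.neighbors, b.neighbors = [a], [p, b], [a]
p.ping("id-1")
```

PINGs are flooded, but each PONG follows exactly one path, so only intermediate servents that forwarded the PING ever see the reply.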


Gnutella Protocol (5/5)

c. Query/QueryHit: the utilization
- Query descriptors contain unstructured queries, e.g. "celine dion mp3".
- Like PINGs, they are broadcast, with a typical TTL of 7.
- If a host, e.g. p2, matches the query, it responds with a QueryHit descriptor.

d. Push: enables downloads from peers that are firewalled
- If a peer is firewalled, we can't connect to it. Hence we ask it to establish a connection to us and send us the file.
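A sketch of the two decisions described above: whether a servent answers a query, and how the download is then set up. The matching rule (every keyword must appear in the file name) is an assumption made for illustration, since how a servent interprets the search string is left to the implementation. The names `matches`, `query_hits`, and `download_plan` are hypothetical.

```python
def matches(query, file_name):
    """Assumed rule: every keyword of the query occurs in the file name."""
    name = file_name.lower()
    return all(word in name for word in query.lower().split())


def query_hits(query, shared_files):
    """Files a servent would report in its QueryHit for this query."""
    return [f for f in shared_files if matches(query, f)]


def download_plan(requester_firewalled, holder_firewalled):
    """Who opens the connection for the file transfer."""
    if not holder_firewalled:
        return "direct"       # the requester connects to the holder
    if not requester_firewalled:
        return "push"         # the holder is asked to connect back to the requester
    return "unreachable"      # neither side can accept a connection


shared = ["Celine_Dion-My_Heart.mp3", "notes.txt", "dion_live.mp3"]
```

The last branch shows the limit of Push: it only helps when exactly one side is behind a firewall.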


Conclusions

- Nothing really new, but the right time to:
  • Take advantage of available resources
  • Find an alternative to centralized client/server solutions
- There is something attractive about the defiance or avoidance of authority.
- Raised legal copyright issues.
- Currently, 60% to 89% of all Internet traffic is P2P traffic => source of revenue => marketing argument.
- Potential good match between ad hoc networks and P2P.
- Interesting architectural and technical issues behind it, and challenging requirements.


Summary of P2P computing