FeedTree: Sharing Web Micronews with Peer-to-Peer Event Notification
D. Sandler, A. Mislove, A. Post, P. Druschel
Presented by: Andrew Sutton
Contributions
• Propose alternative to RSS distribution architecture
• Use peer-to-peer technology to reduce network load
RSS Distribution
• RSS (Really Simple Syndication) - XML format for publishing micronews
• Feed - a source of RSS items
• Content Provider - responsible for publishing RSS feeds
• Reader/Aggregator - user agent responsible for RSS acquisition and display
RSS Distribution Network
• Readers poll content providers
• Request RSS files every ~30 minutes
• Readers can be online, requesting 24/7
Problems with Distribution
• Polling - Requests occur on schedule
• Superfluity - Full response per request
• Stickiness - RSS traffic persists even if web traffic subsides
• 24 Hour Traffic - requests occur all day long
Network Load Example
• Updates occur every 30 minutes
• Slashdot
  – Subscribers: > 17,000
  – RSS file size: ~15KB
  – ~11.6GB/Day of RSS data
• Difficult to measure accurately
• No reliable statistics
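The Slashdot figure can be reproduced with a short calculation. This is a minimal sketch using only the slide's numbers (17,000 subscribers, a 15KB file, one poll every 30 minutes). The helper name `daily_polling_bytes` is hypothetical, and the result matches the slide only if "GB" is read as a binary gigabyte.

```python
# Back-of-the-envelope estimate of daily RSS polling traffic for one feed.
# Inputs are the slide's figures; the function name is illustrative only.

def daily_polling_bytes(subscribers: int, feed_bytes: int, poll_interval_min: int) -> int:
    """Bytes served per day if every subscriber fetches the full file on every poll."""
    polls_per_day = (24 * 60) // poll_interval_min
    return subscribers * polls_per_day * feed_bytes

total = daily_polling_bytes(subscribers=17_000, feed_bytes=15 * 1024, poll_interval_min=30)
print(f"{total / 2**30:.2f} GiB/day")  # roughly the ~11.6GB/Day quoted above
```

The estimate assumes every reader polls on schedule around the clock and always receives the full file, which is the worst case described on the previous slide.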
Related Work
• Improved Polling
• Outsourced Aggregation
Improved Polling
• Improved Polling
  – Restrict reader polling via RSS
  – Use HTTP caching to reduce superfluous responses
  – Use compression to reduce response size
• Delta Encoding
  – Only transmit what’s changed [RFC 3229]
  – Seemingly ideal for RSS
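The HTTP caching bullet can be illustrated with a conditional request. The sketch below assumes the provider derives an ETag from the feed bytes and compares it with the reader's `If-None-Match` header; `etag_for` and `respond` are hypothetical helpers, not part of any reader or server described in the paper.

```python
import hashlib
from typing import Optional, Tuple

def etag_for(body: bytes) -> str:
    """Derive a validator from the feed content (an assumed scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, bytes]:
    """Return (status, payload): 304 with no body when the reader's copy is current."""
    if if_none_match == etag_for(body):
        return 304, b""
    return 200, body

feed = b"<rss><channel><item>story 1</item></channel></rss>"
status, payload = respond(feed, None)                   # first poll: full file
status2, payload2 = respond(feed, etag_for(feed))       # unchanged feed: empty 304
```

This removes the superfluous payload but not the request itself, so the provider still handles every poll.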
Outsourced Aggregation
• Content Providers supply RPC interface to aggregator
• Users' readers query a central server instead of providers
Outsourcing Problems
• Central aggregator allows
  – Single point of failure for readers
  – Censorship of original content
  – Modification of original content (e.g., ads)
• May not be reliable or trustworthy
FeedTree
• Eliminate network/provider load
• Use peer-to-peer subscription
• Use hybrid push/pull mechanism for timely distribution/update of micronews
• Signed documents to enable trust
FeedTree Architecture
Pastry
• Enables Peer-to-Peer networking applications
  – Self-organizing - nodes added, removed dynamically
  – Network overlay - efficiently routes messages among participating nodes
• Applications: Scribe, SplitStream
Overlay Network
• Logical network built on top of actual network
• Can define virtual routes between nodes
• Common approach for P2P networks
Pastry Network
• Based on a circular namespace of node IDs (not tree-oriented)
• Routing
  – Shortest-path based on routing
  – Non-receivers forward message to next-closest (proximity) node
  – Routes messages in O(log n) time
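The logarithmic hop count can be seen in a toy model. Pastry's published design forwards a message to a node whose ID shares a longer prefix with the key at each hop. The sketch below mimics only that idea and makes strong simplifying assumptions: IDs are 8 hex digits (`DIGITS` is hypothetical; real Pastry IDs are 128 bits), every node knows every other node, and proximity is ignored.

```python
import random

DIGITS = 8  # hypothetical ID length in hex digits; real Pastry IDs are 128 bits

def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hex digits that two IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(nodes, start: str, key: str):
    """Return the node IDs visited while routing `key` from `start`."""
    path = [start]
    current = start
    while True:
        have = shared_prefix_len(current, key)
        # Nodes that match the key in more leading digits than the current node.
        better = [n for n in nodes if shared_prefix_len(n, key) > have]
        if not better:
            return path  # no node is closer by prefix: deliver here
        # Take the smallest improvement, mimicking one routing-table row per hop.
        current = min(better, key=lambda n: shared_prefix_len(n, key))
        path.append(current)

rng = random.Random(0)
nodes = sorted({f"{rng.getrandbits(4 * DIGITS):0{DIGITS}x}" for _ in range(2000)})
key = f"{rng.getrandbits(4 * DIGITS):0{DIGITS}x}"
path = route(nodes, nodes[0], key)
print(len(path) - 1, "hops for", len(nodes), "nodes")
```

Each hop fixes at least one more digit of the key, so the hop count is bounded by the number of digits that any node shares with the key, which grows like log base 16 of the number of nodes when IDs are random.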
Scribe
• Group Communication and Event Notification
  – Highly dynamic groups (based on topics)
  – Uses publish/subscribe model
  – Allows application-level multicast and anycast
• Applications: FeedTree, ???
Scribe Multicast
• Subscribing to a topic
  – Subscriber knows publisher’s node id
  – Sends “subscribe” message
  – Forwarding nodes become parents in the multicast tree (each keeps track of its children)
• Notification of event
  – Events are multicast to all children of the publisher and of the forwarders
• One multicast tree per topic
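The tree construction described above can be sketched as follows. This is a simplified model, not Scribe's actual API: the class `TopicTree` and its methods are hypothetical, and the overlay route from each subscriber to the publisher is passed in as a ready-made list rather than computed by Pastry.

```python
from collections import defaultdict

class TopicTree:
    """One multicast tree for one topic, rooted at the publisher's node."""

    def __init__(self, root: str):
        self.root = root
        self.children = defaultdict(set)  # parent node -> set of child nodes

    def subscribe(self, path):
        """`path` is the overlay route from the subscriber to the root, inclusive."""
        for child, parent in zip(path, path[1:]):
            # A node that is the root or already has children is already in the tree.
            was_forwarder = parent == self.root or len(self.children[parent]) > 0
            self.children[parent].add(child)
            if was_forwarder:
                break  # the rest of the path already exists in the tree

    def multicast(self):
        """Return every node an event reaches when pushed down from the root."""
        delivered, frontier = [], [self.root]
        while frontier:
            node = frontier.pop()
            for c in sorted(self.children.get(node, ())):
                delivered.append(c)
                frontier.append(c)
        return delivered

tree = TopicTree("P")
tree.subscribe(["A", "X", "P"])  # X becomes a forwarder for A
tree.subscribe(["B", "X", "P"])  # stops at X, which is already in the tree
tree.subscribe(["C", "Y", "P"])
print(sorted(tree.multicast()))
```

The publisher sends one copy per direct child rather than one per subscriber, and that is the source of the reduced load.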
FeedTree Distribution
• Subscription
  – Readers subscribe to a feed (i.e., Scribe topic)
• Publication
  – Each item is given timestamp, sequence id
  – Document is signed with publisher’s private key
FeedTree Delivery
• Bootstrap Delivery
  – Signed RSS document is multicast to overlay network
  – Essentially, a combined subscribe/request operation
• Incremental Delivery
  – Only new items are multicast
  – If no changes, multicast a “heartbeat”
Missed Deliveries
• If reader is missing sequence numbers
  – Query parent for missing items
  – Nodes must buffer last n items to make redelivery more efficient
  – If items still missing, query publisher
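The recovery steps above might look like the following. This is a minimal sketch under stated assumptions: sequence numbers are consecutive integers, the parent exposes its buffer directly, and `ItemBuffer`, `missing_sequences`, `recover`, and `fetch_from_publisher` are all hypothetical names rather than FeedTree's interfaces.

```python
from collections import deque

class ItemBuffer:
    """Keeps the last n items a node forwarded, keyed by sequence number."""

    def __init__(self, n: int):
        self.items = deque(maxlen=n)  # oldest entries fall off automatically

    def add(self, seq: int, item: str) -> None:
        self.items.append((seq, item))

    def lookup(self, wanted):
        return {s: it for s, it in self.items if s in wanted}

def missing_sequences(last_seen: int, received: int):
    """Sequence numbers skipped between the last item seen and the one just received."""
    return list(range(last_seen + 1, received))

def recover(missing, parent_buffer, fetch_from_publisher):
    """Ask the parent first; fall back to the publisher for anything it no longer holds."""
    found = parent_buffer.lookup(set(missing))
    from_publisher = [s for s in missing if s not in found]
    for s in from_publisher:
        found[s] = fetch_from_publisher(s)
    return found, from_publisher

parent = ItemBuffer(n=3)
for seq in range(1, 7):
    parent.add(seq, f"item-{seq}")       # buffer now holds 4, 5, 6

gap = missing_sequences(last_seen=2, received=7)
items, asked_publisher = recover(gap, parent, lambda s: f"item-{s}")
```

Heartbeats matter here because a reader that receives nothing cannot otherwise tell a quiet feed from a lost item.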
Publisher Delivery Tree
Network Overhead
• Assume an RSS feed generating 4KB/hour
• Interior node in tree with 16 children forwards < 20B/sec
• However…
  – Unknown how this scales for large providers, large readers
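The forwarding figure follows from simple arithmetic. The sketch below takes the slide's numbers (4KB/hour, 16 children) and assumes 1KB = 1024 bytes and no per-message overhead; `forwarding_rate` is a hypothetical helper.

```python
def forwarding_rate(feed_bytes_per_hour: float, children: int) -> float:
    """Average bytes per second an interior node sends when it copies the feed to each child."""
    return feed_bytes_per_hour * children / 3600

rate = forwarding_rate(4 * 1024, 16)
print(f"{rate:.1f} B/sec")  # under the 20B/sec bound quoted above
```

Signatures, heartbeats, and overlay maintenance traffic are not included, so this is a lower bound on what a node actually sends.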
Implementation
• Implemented both publisher/reader software (proxies)
• Created testbed website for real distribution of RSS feeds
• No substantial experimentation
http://www.feedtree.net
Advantages/Disadvantages
• Benefits - lower cost of delivering micronews
  – (Significantly) reduced provider load
  – No fear of RSS feeds being “slashdotted”
• Differentiated services - different feeds for headlines/full news
Disadvantages
• Requires specialized software for publishers/subscribers
• P2P denial of service attacks
  – Malicious nodes may not forward events
Conclusions
• End users receive better service than currently possible
• Foresee new services based on RSS
  – Storing every single RSS item published on the internet
  – Anonymous feeds using anonymizing p2p routing algorithms
  – Cooperative multicast to distribute realtime media
Evaluation
• Good
  – Appears to be well-reasoned idea
  – Developed software to test hypothesis
  – Good workshop paper
• What’s needed for research
  – More detailed description of protocol
  – Substantiate claims about performance (i.e., experiment)
Questions
1. List four problems with the current RSS feed distribution model.
2. Which two of these four problems have the largest impact on network load?
3. How long does it take Pastry to route a message if there are n nodes in the network?
4. Suppose Slashdot has 50,000 RSS subscribers through FeedTree. What is the approximate depth of the multicast tree for the Slashdot topic?
5. Assume that there are 100,000 FeedTree topics on a Pastry network that all update at 4KB/Hour. An interior node with 16 children will send 20B/sec. Suppose an interior node participates in all feeds. What is the expected output (in B/sec) of this node?