PUBSUBHUBBUB TECHNICAL SEMINAR
CONTENTS
1. ABSTRACT
2. INTRODUCTION
3. SECURITY MODEL
4. MOTIVATION
5. NEED FOR ANOTHER PROTOCOL
6. SCALE
7. PROGRESS
8. REHASH
9. APPLICATIONS
10. CONCLUSION
11. BIBLIOGRAPHY
1. ABSTRACT
PubSubHubbub is an “open, server-to-server web-hook-based
publish/subscribe protocol as an extension to Atom (and RSS)”. The protocol
lets interested parties receive near-instant notifications when a feed is
updated. It was developed at Google and is hosted as a Google Code project
of the same name. Instead of a client polling a server at regular intervals
to find out whether a feed has changed, PubSubHubbub turns the pull approach
into a push one: the client subscribes to a Hub and is notified almost
immediately when the feed is updated. Google has created a reference
implementation of a Hub that can be used to try out the publishing and
subscribing process. A subscriber (a server that is interested in a topic)
initially fetches the Atom URL as normal. If the Atom file declares its hubs,
the subscriber can avoid repeated polling of the URL and instead register
with the feed's hub(s) to receive updates. Pushing syndicated content in this
way enables real-time messaging: events reach all subscribers at nearly the
same time, which encourages immediate conversation and is valuable for
businesses that depend on timely information.
2. INTRODUCTION
PubSubHubbub is a simple publish/subscribe protocol that turns
Atom and RSS feeds into real-time streams. It provides web-scale,
low-latency messaging among three participants: Publishers,
Subscribers, and Hubs. The basic design goals are decentralization, with
no single company in control; scaling to the size of the whole web;
making publishing and subscribing as easy as possible; and pragmatism
(i.e., not theoretically perfect, but solving huge, known use cases with
minimal effort).
FOR PUBLISHERS:
Add a declaration in your feed with your Hub of choice
<link rel="hub" href="https://pubsubhubbub.appspot.com/"/>
Send a ping to the Hub with the feed URL
POST / HTTP/1.1
Content-Type: application/x-www-form-urlencoded
hub.mode=publish&hub.url=<your feed>
204 response = Success, 4xx = Bad request, 5xx = Try again
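The two publisher steps above can be sketched in Python with only the standard library. This is a minimal illustration, not the reference client: the function names and the 10-second timeout are assumptions made for this example, while the form fields and the meaning of the status codes are taken from the lines above.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_publish_ping(hub_url, feed_url):
    """Build the form-encoded POST that tells the hub a feed has changed."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def classify_status(status):
    """Map an HTTP status code to the outcomes listed in the text."""
    if status == 204:
        return "success"
    if 400 <= status < 500:
        return "bad request"
    if 500 <= status < 600:
        return "try again"
    return "unexpected"


def send_publish_ping(hub_url, feed_url):
    """Send the ping and report the outcome (hypothetical helper)."""
    request = build_publish_ping(hub_url, feed_url)
    try:
        with urlopen(request, timeout=10) as response:
            return classify_status(response.status)
    except HTTPError as error:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return classify_status(error.code)
```

A publisher would call `send_publish_ping(hub_url, feed_url)` after each feed update and retry later when the result is "try again".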
FOR SUBSCRIBERS:
Detect the Hub declaration in a feed
Send a subscribe request to the feed's Hub
POST / HTTP/1.1
Content-Type: application/x-www-form-urlencoded
hub.mode=subscribe&hub.verify=sync&hub.topic=<feed URL>&hub.callback=<callback URL>
Hub will send a request to verify the subscription
GET /callback?hub.challenge=<random> HTTP/1.1
HTTP/1.1 200 ...
<echo random>
Process new content from the Hub
POST /callback HTTP/1.1
Content-Type: application/atom+xml
...
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Awesome feed</title>
<link rel="hub" href="http://pubsubhubbub.appspot.com"/>
...
<entry>
...
</entry>
</feed>
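The subscriber side of this exchange can be sketched as three small Python functions: one builds the subscribe request body, one answers the hub's verification GET by echoing the challenge, and one reads entries from a pushed Atom document. The function names, the 404 reply for a request without a challenge, and the choice to extract only entry titles are assumptions made for illustration; they are not prescribed by the text above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlencode

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def build_subscribe_body(topic_url, callback_url):
    """Form-encoded body for the subscribe POST sent to the hub."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.verify": "sync",
        "hub.topic": topic_url,
        "hub.callback": callback_url,
    })


def handle_verification(query_string):
    """Answer the hub's GET: echo hub.challenge with a 200 status."""
    params = parse_qs(query_string)
    challenge = params.get("hub.challenge", [None])[0]
    if challenge is None:
        # Assumption: reject requests that carry no challenge.
        return 404, ""
    return 200, challenge


def extract_entry_titles(atom_payload):
    """Return the titles of all <entry> elements in a pushed Atom feed."""
    root = ET.fromstring(atom_payload)
    return [
        entry.findtext(ATOM_NS + "title", default="")
        for entry in root.iter(ATOM_NS + "entry")
    ]
```

In a real subscriber these functions would sit behind an HTTP server that routes GET requests on the callback URL to `handle_verification` and POST bodies to `extract_entry_titles`.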
ROLE OF THE HUB:
Logical component
o Publishers may be their own Hub
o Combined Hub/Publisher has p2p speed-up
Distinct functions
o Accept and verify subscriptions to new topics
o Receive pings from publishers, retrieve content
o Extract new/updated items from feed
o Send all subscribers the new content
Scalability
o # of subscribers & feeds, update frequency
o Delegation of content distribution (= bandwidth)
Reliability
o Retry fetch, delivery, idempotence
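The hub's distinct functions can be illustrated with a small in-memory sketch. It is not how the reference hub is built: the class name `TinyHub` and the three caller-supplied hooks (`fetch_feed`, `deliver`, `verify`) are hypothetical, and retries and persistence are left out. Changed entries are detected by comparing a SHA-1 hash of each entry's content, which also makes a repeated ping for unchanged content a no-op.

```python
import hashlib
from collections import defaultdict


class TinyHub:
    """In-memory illustration of the hub functions listed above."""

    def __init__(self, fetch_feed, deliver, verify):
        # Hypothetical hooks supplied by the caller:
        #   fetch_feed(topic) -> list of (entry_id, content) tuples
        #   deliver(callback, entries) -> None
        #   verify(callback, topic) -> bool (was the challenge echoed?)
        self.fetch_feed = fetch_feed
        self.deliver = deliver
        self.verify = verify
        self.subscribers = defaultdict(set)
        self.content_hashes = defaultdict(dict)

    def subscribe(self, topic, callback):
        """Accept a subscription only if the callback verifies."""
        if not self.verify(callback, topic):
            return False
        self.subscribers[topic].add(callback)
        return True

    def publish(self, topic):
        """Handle a publisher ping: fetch, diff, and fan out new entries."""
        changed = []
        for entry_id, content in self.fetch_feed(topic):
            digest = hashlib.sha1(content.encode("utf-8")).hexdigest()
            if self.content_hashes[topic].get(entry_id) != digest:
                self.content_hashes[topic][entry_id] = digest
                changed.append((entry_id, content))
        if changed:
            for callback in sorted(self.subscribers[topic]):
                self.deliver(callback, changed)
        return changed
```

Because only entries whose hash changed are delivered, subscribers receive the delta rather than the whole feed.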
3. SECURITY MODEL
Subscriber verification prevents DoS attacks (a hub cannot be used to flood a callback URL that never asked for updates)
Declaration of the Hub is a delegation of trust
o Subscribers may trust the Hub to deliver content on
publisher's behalf
o v0.2 supports shared-secret HMACs for subscribers to verify that
notifications came from the hub
Privacy through HTTPS for hubs, feeds, and callbacks
o URLs and payloads can be sent via encrypted channel
o Subscribed topics are not discoverable
o Unguessable, capability URLs (e.g., from OAuth)
Publishers can run their own hub!
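The shared-secret HMAC check mentioned above can be sketched as follows. The text only says that subscribers verify notifications with an HMAC over a shared secret; the `sha1=<hex digest>` header value format used here follows later drafts of the specification and should be treated as an assumption, as should the function names.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Compute the signature a hub would attach to a notification body."""
    return "sha1=" + hmac.new(secret, body, hashlib.sha1).hexdigest()


def verify_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Check a received signature against the locally computed one."""
    expected = sign_payload(secret, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(
        expected.encode("ascii"), header_value.encode("utf-8")
    )
```

A subscriber would discard any notification for which `verify_signature` returns `False`, since it did not come from a hub that knows the secret.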
4. MOTIVATION
TCP maximizes the throughput of a link
Dump data in, it will be received
The window means no waiting for acks!
When acks are missed, the sender will retransmit
Receivers reassemble the message in-order, de-dupe.
Good citizenship with congestion control.
[Figure: transfer WITH WINDOW compared to transfer WITHOUT WINDOW]
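The effect of the window can be illustrated with the standard back-of-the-envelope throughput estimates. This is a simplified model that ignores loss and slow start, and the segment size, window size, round-trip time, and link rate below are hypothetical numbers chosen for the example.

```python
def stop_and_wait_throughput(segment_bytes, rtt_seconds):
    """Without a window: one segment per round trip, then wait for the ack."""
    return segment_bytes / rtt_seconds


def windowed_throughput(window_bytes, rtt_seconds, link_bytes_per_second):
    """With a window: up to one full window per round trip, capped by the link."""
    return min(window_bytes / rtt_seconds, link_bytes_per_second)


# Hypothetical link: 125 ms round trip, 10 Mbit/s (1,250,000 bytes/s).
without_window = stop_and_wait_throughput(1460, 0.125)
with_window = windowed_throughput(65535, 0.125, 1_250_000)
print(without_window, with_window)
```

Under these assumptions the windowed sender moves roughly 45 times more data per second, which is the gap the later sections attribute to protocols that notify first and make the receiver fetch afterwards.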
5. NEED FOR ANOTHER PROTOCOL
We want interoperable, web-scale messaging
Almost every company already has an internal system
o TIBCO, WebsphereMQ, ActiveMQ, RabbitMQ, ...
o Proprietary message payloads, topics, networks
Existing attempts at a standard haven't caught on
o XMPP weirds people out; started in 1999, still isn't
used for interop widely beyond IM
o These standards are too complex or not pragmatic
(XEP-0060, WS-*, AMQP, RestMS, new REST-*)
Build the simplest interoperable messaging protocol that can scale to
the size of the web
Make the base specification bare-bones, easy-to-use
Target Atom/RSS initially as a payload format; everyone uses them for
time-based, idempotent streams
In the future, add extensions for cool stuff
Proof of simplicity is in the code
o Bret Taylor added PubSubHubbub subscription to FriendFeed in
a single evening.
6. SCALE
GOALS:
World-wide RSS publishing currently.
o ~X,000 updates per second
Legitimate email currently
o ~X,000,000 per second
Need to scale by at least 1000x; hopefully more.
Trying to enable new use-cases.
LIGHT PINGING:
Protocols exist for faster Atom/RSS
o Ping-o-Matic, changes.xml, SUP, rssCloud
All only indicate the feed URL that has changed
o Still need to go and fetch the content
o These protocols are just optimized polling.
o Equivalent to killing the TCP window!
Optimized polling is still worse
o Latency is high: 3 round trips.
o Thundering herd as subscribers fetch published feeds.
o Unpredictable, bursty load pattern.
o More bandwidth, CPU, connection star-pattern.
LIGHT PINGING AT SCALE:
Send out pings slowly to reduce the herd
Herd causes all feeds to be fully regenerated
o Invalidates existing caches
Bandwidth increases extremely fast
o (average updates per feed) * (# feeds) * (# subscribers) *
(average feed size)
o Often 99.5%+ more than you needed
CPU costs increase for subscribers with update frequency.
Consider a single master replication scheme
After each update, wait for copying to all replicas.
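The light-ping bandwidth formula above can be worked through in a few lines of Python. The feed counts and sizes in the example are hypothetical, chosen so that a new entry is 0.5% of the full feed, which yields the "99.5%" overhead figure quoted in the list.

```python
def light_ping_bandwidth(updates_per_feed, feeds, subscribers, feed_bytes):
    """Bytes transferred when every subscriber refetches the whole feed."""
    return updates_per_feed * feeds * subscribers * feed_bytes


def wasted_fraction(feed_bytes, update_bytes):
    """Share of each refetch that the subscriber had already seen."""
    return 1 - update_bytes / feed_bytes


# Hypothetical workload: 10 updates per feed, 1,000 feeds,
# 100 subscribers each, 40,000-byte feeds with 200-byte new entries.
total = light_ping_bandwidth(10, 1_000, 100, 40_000)
print(total)                                  # 40000000000
print(f"{wasted_fraction(40_000, 200):.1%}")  # 99.5%
```

The waste grows with the size of the feed relative to a single update, so large, frequently updated feeds are hit hardest.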
FAT PINGING:
Compared to light pings
Latency: 1/3 as much
Based on reasonable averages
o Bandwidth: ~20x less
o CPU: ~20x less
Never wait for replication delays
FAT PINGING AT SCALE:
Run your own hub.
Compute feed deltas at update time; no need to regenerate a whole
feed (or churn your caches).
Send out new content at sustained network rate.
Bandwidth is minimum possible per subscriber
o (update size) * (# feeds) * (# subscribers).
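A small sketch makes the fat-ping formula concrete and compares it, per update, with refetching the whole feed as in the LIGHT PINGING AT SCALE section. The feed and update sizes are hypothetical; they were picked to reproduce a 20x ratio and are not the averages behind the "~20x less" figure, which the text does not give.

```python
def fat_ping_bandwidth(update_bytes, feeds, subscribers):
    """Bytes pushed when the hub sends only the new content."""
    return update_bytes * feeds * subscribers


def full_refetch_bandwidth(feed_bytes, feeds, subscribers):
    """Bytes pulled for one update when subscribers refetch whole feeds."""
    return feed_bytes * feeds * subscribers


def bandwidth_ratio(feed_bytes, update_bytes, feeds, subscribers):
    """How many times more bytes a full refetch moves than a fat ping."""
    return (
        full_refetch_bandwidth(feed_bytes, feeds, subscribers)
        / fat_ping_bandwidth(update_bytes, feeds, subscribers)
    )


# Hypothetical averages: 20,000-byte feeds, 1,000-byte updates.
print(fat_ping_bandwidth(1_000, 1_000, 100))       # 100000000
print(bandwidth_ratio(20_000, 1_000, 1_000, 100))  # 20.0
```

The feed and subscriber counts cancel in the ratio, so the saving depends only on how small an update is compared with the full feed.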
Advanced protocol pieces.
Connection reuse from HTTP/1.1
Pipeline HTTP requests for feed fetching
Use aggregated content delivery
o Many Atom feeds in a single <feed> XML doc
o Fewer connections
7. PROGRESS
PUBSUBHUBBUB STATUS:
Over 100 Million feeds are PubSubHubbub-enabled
Companies: Google, FriendFeed (FB), livedoor, Six Apart, LiveJournal,
LazyFeed, Superfeedr, ...
Google products: FeedBurner, Blogger, Reader shared items,
Google Alerts,...
Cool apps: Socnode, Reader2Twitter, chat gateways ...
More publishers, subscribers, hubs, apps on the way
Publisher clients: Perl, PHP, Python, Ruby, Java, Haskell, C#,
MovableType, WordPress, Django, Zend
Active mailing list with 240+ members
GETTING INVOLVED:
Review the spec; recommend improvements
o Open process, will be licensed by Open Web Foundation
Write some sample code for your favorite language or CMS
Contribute to one of the open source Hub implementations
Write on your blog about why we need push for the future.
PRESENT SCENARIO OF FACEBOOK:
Subscribe to feeds that are PubSubHubbub-enabled
o Put that great UI to work.
o Maybe reuse the FriendFeed index pipeline?
o Call Bret and Ben.
Enable PubSubHubbub for activity streams
o Provide Facebook app developers with real-time updates to
users' home streams.
o Speeds up surfacing Facebook in other apps.
Detecting new events could trigger the app to take action in real-time
(send an email, classify a photo, initiate an action in a game, etc).
FUTURE SCENARIO OF FACEBOOK:
Figure out if private feeds will work with this model
o Run your own hub
o Use capability URLs (OAuth token in the query string).
Give your developers more feeds to consume and syndicate.
8. REHASH
Push for the future! Scale to new use-cases
Decentralized, open spec: no company owns it.
One API for all stream-based content.
Project page: http://pubsubhubbub.googlecode.com
o Full Hub source code with tests
o Example publisher and subscriber apps
o Demo hub at http://pubsubhubbub.appspot.com
HUB STORAGE SPACE:
Manageable cost: ~10 million feeds, ~1 million subscribers
Assume 1 billion events per day (~11,000/second); thar be dragons
FeedEntryRecord
Key name
o "FeedEntryRecord" + entry_id_hash + parent key
o 400 bytes, could be smaller
Indexed properties
o Entry ID hash (again-- doh!): 160 bytes
o Entry content hash: 160 bytes
o Update time: 8 bytes
Unindexed properties
o Entry ID: 2048 bytes maximum, 200 on average
Result
~1KB per entry
27TB per month at ~11,000 req/sec -- no sweat!
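The storage estimate can be checked with a short calculation. The per-field sizes and the event rate come from the list above; the 30-day month, the rounding of one entry to 1,000 bytes, and the use of binary terabytes (2^40 bytes) are assumptions that happen to land close to the quoted 27TB, not details stated in the text.

```python
KEY_NAME_BYTES = 400
INDEXED_BYTES = 160 + 160 + 8   # entry ID hash + content hash + update time
UNINDEXED_AVG_BYTES = 200       # average entry ID length
EVENTS_PER_DAY = 1_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def entry_bytes():
    """Sum of the per-entry sizes listed for FeedEntryRecord."""
    return KEY_NAME_BYTES + INDEXED_BYTES + UNINDEXED_AVG_BYTES


def events_per_second(events_per_day=EVENTS_PER_DAY):
    """Average event rate implied by the daily volume."""
    return events_per_day / SECONDS_PER_DAY


def monthly_storage_tib(bytes_per_entry, events_per_day=EVENTS_PER_DAY, days=30):
    """Storage for one month of entries, in units of 2**40 bytes."""
    return events_per_day * bytes_per_entry * days / 2**40


print(entry_bytes())                       # 928
print(round(events_per_second()))          # 11574
print(round(monthly_storage_tib(1000), 1))
```

The summed fields come to just under 1KB, and one billion events per day averages to about 11,600 per second, in line with the rounded figures in the list.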
WEBFINGER:
Unified discovery for email addresses
Transform an email address into XRD
XRD defines all the services that address has
Helps provide social networking as a protocol
E.g., Simple way to discover if an account has a Portable Contacts
interface
9. APPLICATIONS
Amazonca API: Amazonca.com provides users with online access to a
barcode database that it maintains.
Aonaware CountCheatService API: Aonaware CountCheatService allows
users to find solutions to simple anagram problems, for example those posed
by the UK Channel 4 game ‘Countdown’.
Bitlan API: Bitlan provides a URL shortening tool. The site is in Dutch.
Cleeng API: Cleeng helps publishers monetize their content via
subscriptions, advertisements, or direct micro-payments (between 0.15 and
19.99 EUR). Plugins are available for popular CMSes and payment gateways.
DEA Filter API: DEAFilter is a free RESTful API that can be implemented into
an existing site with a couple lines of code. It is used to get rid of all
Disposable Email Address services that are used to infest a site with fake
users and spammers.
DoStuffMedia API: DoStuff Media is a company that helps local media
companies and music festivals monetize their web sites by providing a
technology that aggregates, organizes and displays critical information about
the festival, all while incorporating social networking, sharing and interactive
features.
Flite Advertising API: Flite is a cloud-based ad platform. The Flite API
allows developers to integrate custom Flash content into Flite ads.
Fog Creek FogBugz XML API: FogBugz is bug tracking software used in
software project management.
Fog Creek Kiln API: The provider offers a complete read/write API for
interacting with Mercurial repositories hosted by Kiln.
Gistpoint API: Gistpoint is a service that lets users find and submit
summaries for online articles.
Glocal Focal US Mortality Data API: Glocal Focal is a social networking
site for social causes. The Glocal Focal US Mortality Data API provides United
States mortality data based on user selectable parameters.
Grid5000 API: Grid’5000 is a scientific instrument for the study of large
scale parallel and distributed systems.
Guzzle Ayup! API: Guzzle Ayup! is a PubSubHubbub Hub. Publishers can
notify the Guzzle Ayup! hub that new content is available on their feeds, in
turn Guzzle Ayup! hub will notify all Subscribers that the feeds they
subscribed to just got updated, by directly pushing the new entries to them.
HotelBeds API: Hotelbeds accommodation & destination services is a
provider of incoming travel services to travel trade professionals worldwide
currently serving over 2,500 resorts across more than 80 countries.
Kayako API: Kayako is help desk software that lets users manage email,
tickets, live chat, calls and remote support.
Kronovia Compliance Cloud Service API: The Kronovia Compliance Cloud
Service allows Social CRM companies to offer social compliance, governance
and brand protection as a part of their solution. Using their API, developers
can integrate the service with any Social CRM application or custom social
media solution.
Massachusetts Port Authority API: The Massachusetts Port Authority
(Massport) provides an API for developers interested in building mobile
applications that integrate accurate, up-to-date content for Massport's
facilities. Content is provided for Boston Logan International Airport. With the
API, users can make RESTful calls to retrieve data feeds in the Atom
Syndication Format.
Mibbit API: Mibbit is an online chat client that can be embedded on third
party websites.
n0tice API: n0tice.com is a public notice board. It offers APIs for pulling and
adding data to news reports, events and offers. The APIs use RESTful calls
and responses are formatted in XML, JSON, RSS and KML.
NAC Real-time Conversion API: The Natural Area Coding System is a
geodetic system that defines a code, the Natural Area Code (NAC), to
unify the representations of geographic coordinates, area codes, street
addresses, postal codes, map grids and property identifiers of every location
or area in the world, and to make the information from all maps, GPS receivers
and location-based services and products connected, efficient and universal.
NHXS API: NHXS is a provider of contract compliance and point-of-service
adjudication workflow automation.
Nike+ API: Nike+ is a service that lets users track their running statistics
and improvement over time, and track their individual goals along the way.
10. CONCLUSION
PubSubHubbub extends Atom and RSS feeds with push delivery. The protocol
is relatively new, but it is already a milestone for many companies today.
Its applications are exposed as APIs with rigorous, clearly specified behavior.
It goes beyond light-ping protocols by delivering the new content itself
rather than only a change notice, which helps in many development scenarios.
It is more efficient and performs better than polling an RSS feed, and it is
used mainly for fast delivery of information to clients. Because subscribers
no longer poll the publisher, it reduces the load on the server and thus
improves overall performance. It is easy to operate, and hubs retry fetches
and deliveries to make delivery reliable. It is also easy to implement, which
matters when the load on the server is likely to be very high. It is a
server-to-server protocol for near-instant notifications, an extension to
Atom (and RSS), and an open specification with open source Hub implementations.
It is expected that this protocol will support arbitrary content types,
effectively allowing subscription to any web resource. Similarly,
work is being done to support private or protected data.
11. BIBLIOGRAPHY
http://techno-weenie.net/2010/10/5/nub-nub
http://code.google.com/p/pubsubhubbub/wiki/ComparingProtocols
http://code.google.com/p/pubsubhubbub/w/list
http://pubsubhubbub.appspot.com
http://scripting.com/stories/2009/07/10/googlesPubsubhubbub.html
http://code.google.com/p/pubsubhubbub
http://en.wikipedia.org/wiki/PubSubHubbub