pub sub hubbub doc


Upload: sravani-sravz

Post on 22-Oct-2014

PUBSUBHUBBUB TECHNICAL SEMINAR

CONTENTS

1. ABSTRACT
2. INTRODUCTION
3. SECURITY MODEL
4. MOTIVATION
5. NEED FOR ANOTHER PROTOCOL
6. SCALE
7. PROGRESS
8. REHASH
9. APPLICATIONS
10. CONCLUSION
11. BIBLIOGRAPHY

1. ABSTRACT

PubSubHubbub is an "open, server-to-server web-hook-based publish/subscribe protocol as an extension to Atom (and RSS)". It lets interested parties receive near-instant notifications when a feed is updated. The protocol was developed by Google and is hosted as the Google Code project of the same name. Instead of a client polling a server at regular intervals to find out whether a feed has changed, PubSubHubbub replaces the pull approach with a push one: the client subscribes to a hub and is notified almost immediately when the feed is updated. Google has created a reference implementation of a hub that can be used to try out the publishing and subscribing process.

A subscriber (a server that is interested in a topic) initially fetches the Atom URL as normal. If the Atom file declares its hubs, the subscriber can avoid repeated polling of the URL and instead register with the feed's hub(s) and subscribe to updates. The result is real-time messaging built on syndication feeds: events reach all subscribers at nearly the same time, so conversations and follow-up actions can start immediately, which is valuable for businesses that depend on timely information.

2. INTRODUCTION

PubSubHubbub is a simple publish/subscribe protocol that turns Atom and RSS feeds into real-time streams. It provides web-scale, low-latency messaging among three participants: publishers, subscribers, and hubs. The basic design goals are:

o Decentralization: no single company is in control.

o Scale to the size of the whole web.

o Make publishing and subscribing as easy as possible.

o Pragmatism: not theoretically perfect, but solves huge, known use cases with minimal effort.

FOR PUBLISHERS:

Add a declaration in your feed with your Hub of choice

<link rel="hub" href="https://pubsubhubbub.appspot.com/"/>

Send a ping to the Hub with the feed URL

    POST / HTTP/1.1
    Content-Type: application/x-www-form-urlencoded

    hub.mode=publish&hub.url=<your feed>

204 response = Success, 4xx = Bad request, 5xx = Try again
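The publisher ping described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference client: the feed URL is a hypothetical placeholder, the helper names are invented for this example, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

HUB_URL = "https://pubsubhubbub.appspot.com/"  # hub taken from the declaration above


def build_publish_ping(hub_url: str, feed_url: str) -> Request:
    """Build (but do not send) the form-encoded publish ping."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def interpret_status(status: int) -> str:
    """Map the hub's HTTP status code to the outcomes listed above."""
    if status == 204:
        return "success"
    if 400 <= status < 500:
        return "bad request"
    if 500 <= status < 600:
        return "try again"
    return "unexpected"


# Hypothetical feed URL, used only for illustration.
ping = build_publish_ping(HUB_URL, "http://example.com/feed.atom")
print(ping.get_method(), ping.full_url)
print(ping.data.decode("ascii"))
```

Passing the built request to urllib.request.urlopen would actually send the ping; the sketch stops short of that so it can run without network access.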

FOR SUBSCRIBERS:

Detect the Hub declaration in a feed

Send a subscribe request to the feed's Hub

    POST / HTTP/1.1
    Content-Type: application/x-www-form-urlencoded

    hub.mode=subscribe&hub.verify=sync&hub.topic=<feed URL>&hub.callback=<callback URL>

Hub will send a request to verify the subscription

    GET /callback?hub.challenge=<random> HTTP/1.1

    HTTP/1.1 200 ...

    <echo random>

Process new content from the Hub

    POST /callback HTTP/1.1
    Content-Type: application/atom+xml
    ...

    <?xml version="1.0" encoding="utf-8"?>
    <feed xmlns="http://www.w3.org/2005/Atom">
      <title>Awesome feed</title>
      <link rel="hub" href="http://pubsubhubbub.appspot.com"/>
      ...
      <entry>
        ...
      </entry>
    </feed>
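The subscriber steps above (detect the hub, subscribe, answer the verification request) can be sketched as three small Python functions. This is an illustrative sketch, not a complete subscriber: the function names and the callback URL are hypothetical, the 404 rejection status is an assumption of this example, and a real subscriber would also confirm that the topic being verified is one it actually asked for.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple
from urllib.parse import parse_qs, urlencode

ATOM_NS = "{http://www.w3.org/2005/Atom}"

SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Awesome feed</title>
  <link rel="hub" href="http://pubsubhubbub.appspot.com"/>
</feed>"""


def discover_hubs(atom_xml: str) -> List[str]:
    """Return the href of every <link rel="hub"> directly under <feed>."""
    root = ET.fromstring(atom_xml)
    return [
        link.get("href")
        for link in root.findall(f"{ATOM_NS}link")
        if link.get("rel") == "hub" and link.get("href")
    ]


def build_subscribe_body(topic_url: str, callback_url: str) -> str:
    """Form-encode the subscribe parameters shown in the request above."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.verify": "sync",
        "hub.topic": topic_url,
        "hub.callback": callback_url,
    })


def handle_verification(query_string: str) -> Tuple[int, str]:
    """Answer the hub's verification GET by echoing hub.challenge."""
    challenge = parse_qs(query_string).get("hub.challenge")
    if not challenge:
        return 404, ""  # assumption: any non-2xx status rejects the request
    return 200, challenge[0]


print(discover_hubs(SAMPLE_FEED))
# Hypothetical topic and callback URLs.
print(build_subscribe_body("http://example.com/feed.atom",
                           "http://subscriber.example.org/callback"))
print(handle_verification("hub.challenge=abc123"))
```

Keeping the verification handler a pure function of the query string makes it easy to mount under any web framework's GET route for the callback URL.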

ROLE OF THE HUB:

Logical component

o Publishers may be their own Hub

o Combined Hub/Publisher has p2p speed-up

Distinct functions

o Accept and verify subscriptions to new topics

o Receive pings from publishers, retrieve content

o Extract new/updated items from feed

o Send all subscribers the new content

Scalability

o # of subscribers & feeds, update frequency

o Delegation of content distribution (= bandwidth)

Reliability

o Retry fetch, delivery, idempotence
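The "extract new/updated items" and "send all subscribers the new content" functions listed above can be sketched as follows. This is a simplified in-memory illustration, not the reference hub's code: entries are modelled as (ID, content) pairs, the change test is a SHA-1 hash of the content, and all names are hypothetical. Processing the same fetch twice yields nothing the second time, which is the idempotence the list mentions.

```python
import hashlib
from typing import Dict, List, Tuple

Entry = Tuple[str, str]  # (entry_id, entry_content)


def extract_changed_entries(seen: Dict[str, str], entries: List[Entry]) -> List[Entry]:
    """Return entries that are new or whose content hash changed; update `seen`."""
    changed = []
    for entry_id, content in entries:
        digest = hashlib.sha1(content.encode("utf-8")).hexdigest()
        if seen.get(entry_id) != digest:
            seen[entry_id] = digest
            changed.append((entry_id, content))
    return changed


def plan_fan_out(changed: List[Entry], callbacks: List[str]) -> List[Tuple[str, List[str]]]:
    """Plan one delivery per subscriber callback, carrying only the changed entry IDs."""
    ids = [entry_id for entry_id, _ in changed]
    return [(callback, ids) for callback in callbacks] if ids else []


seen_hashes: Dict[str, str] = {}
first_fetch = [("tag:1", "hello"), ("tag:2", "world")]
second_fetch = [("tag:1", "hello"), ("tag:2", "world, edited")]

print(extract_changed_entries(seen_hashes, first_fetch))   # both entries are new
print(extract_changed_entries(seen_hashes, second_fetch))  # only the edited entry
print(extract_changed_entries(seen_hashes, second_fetch))  # nothing: already seen
```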

3. SECURITY MODEL

Subscriber verification prevents DoS attacks (a hub only delivers to callbacks that have confirmed the subscription)

Declaration of the Hub is a delegation of trust

o Subscribers may trust the Hub to deliver content on

publisher's behalf

o v0.2 supports shared-secret HMACs for subscribers to verify that

notifications came from the hub

Privacy through HTTPS for hubs, feeds, and callbacks

o URLs and payloads can be sent via encrypted channel

o Subscribed topics are not discoverable

o Unguessable, capability URLs (e.g., from OAuth)

Publishers can run their own hub!
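The shared-secret HMAC check mentioned above can be sketched with Python's standard hmac module. This is a minimal illustration under assumptions: the signature format "sha1=<hex digest>" (carried in an X-Hub-Signature header in later drafts of the spec) is not stated in this document, and the secret and payload are made-up values.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Compute the signature a hub would attach to a notification body."""
    return "sha1=" + hmac.new(secret, body, hashlib.sha1).hexdigest()


def verify_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Check a received signature header using a constant-time comparison."""
    expected = sign_body(secret, body).encode("ascii")
    return hmac.compare_digest(expected, header_value.encode("utf-8"))


# Hypothetical secret agreed at subscription time, and a made-up payload.
secret = b"shared-secret"
payload = b"<feed>...</feed>"
header = sign_body(secret, payload)

print(verify_signature(secret, payload, header))        # genuine notification
print(verify_signature(secret, payload + b"x", header)) # tampered body
```

The constant-time comparison avoids leaking how many leading characters of a forged signature were correct.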

4. MOTIVATION

TCP maximizes the throughput of a link

Dump data in, it will be received

The window means no waiting for acks!

When acks are missed, the sender will retransmit

Receivers reassemble the message in-order, de-dupe.

Good citizenship with congestion control.

[Figures not reproduced in this transcript: data transfer WITH WINDOW and WITHOUT WINDOW]

5. NEED FOR ANOTHER PROTOCOL

We want interoperable, web-scale messaging

Almost every company already has an internal system

o TIBCO, WebsphereMQ, ActiveMQ, RabbitMQ, ...

o Proprietary message payloads, topics, networks

Existing attempts at a standard haven't caught on

o XMPP weirds people out; started in 1999, still isn't

used for interop widely beyond IM

o These standards are too complex or not pragmatic

(XEP-0060, WS-*, AMQP, RestMS, new REST-*)

Build the simplest interoperable messaging protocol that can scale to

the size of the web

Make the base specification bare-bones, easy-to-use

Target Atom/RSS initially as a payload format; everyone uses them for

time-based, idempotent streams

In the future, add extensions for cool stuff

Proof of simplicity is in the code

o Bret Taylor added PubSubHubbub subscription to FriendFeed in a single evening.

6. SCALE

GOALS:

World-wide RSS publishing currently.

o ~X,000 updates per second

Legitimate email currently

o ~X,000,000 per second

Need to scale by at least 1000x; hopefully more.

Trying to enable new use-cases.

LIGHT PINGING:

Protocols exist for faster Atom/RSS

o Ping-o-Matic, changes.xml, SUP, rssCloud

All only indicate the feed URL that has changed

o Still need to go and fetch the content

o These protocols are just optimized polling.

o Equivalent to killing the TCP window!

Optimized polling is still worse

o Latency is high: 3 round trips.

o Thundering herd as subscribers fetch published feeds.

o Unpredictable, bursty load pattern.

o More bandwidth, CPU, connection star-pattern.

LIGHT PINGING AT SCALE:

Send out pings slowly to reduce the herd

Herd causes all feeds to be fully regenerated

o Invalidates existing caches

Bandwidth increases extremely fast

o (average updates per feed) * (# feeds) * (# subscribers) *

(average feed size)

o Often 99.5%+ more than you needed

CPU costs increase for subscribers with update frequency.

Consider a single master replication scheme

After each update, wait for copying to all replicas.

FAT PINGING:

Compared to light pings

Latency: 1/3 as much

Based on reasonable averages

o Bandwidth: ~20x less

o CPU:~20x less

Never wait for replication delays

FAT PINGING AT SCALE:

Run your own hub.

Compute feed deltas at update time; no need to regenerate a whole

feed (or churn your caches).

Send out new content at sustained network rate.

Bandwidth is minimum possible per subscriber

o (update size) * (# feeds) * (# subscribers).
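The two bandwidth formulas (light pinging and fat pinging) can be compared with a short calculation. The numbers below are hypothetical and chosen only for illustration; they are not the "reasonable averages" behind the figures quoted earlier. The sketch reads "# subscribers" as subscribers per feed and multiplies the fat-ping formula by the update count so that both results cover the same period, which is an assumption of this example.

```python
def light_ping_bytes(updates_per_feed: int, feeds: int,
                     subscribers: int, feed_size: int) -> int:
    """Every subscriber refetches the whole feed after every update."""
    return updates_per_feed * feeds * subscribers * feed_size


def fat_ping_bytes(update_size: int, feeds: int,
                   subscribers: int, updates_per_feed: int = 1) -> int:
    """The hub pushes only the new content to every subscriber."""
    return update_size * feeds * subscribers * updates_per_feed


# Hypothetical workload: 1,000 feeds, 100 subscribers each, 10 updates per feed,
# a 20 KB feed document and a 1 KB delta per update.
light = light_ping_bytes(updates_per_feed=10, feeds=1_000,
                         subscribers=100, feed_size=20_000)
fat = fat_ping_bytes(update_size=1_000, feeds=1_000,
                     subscribers=100, updates_per_feed=10)

print(f"light pings: {light / 1e9:.1f} GB")
print(f"fat pings:   {fat / 1e9:.1f} GB")
print(f"ratio:       {light / fat:.0f}x")
```

With these inputs the ratio reduces to (feed size) / (update size), so the saving grows as feeds get longer relative to each new entry.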

Advanced protocol pieces.

Connection reuse from HTTP/1.1

Pipeline HTTP requests for feed fetching

Use aggregated content delivery

o Many Atom feeds in a single <feed> XML doc

o Fewer connections

7. PROGRESS

PUBSUBHUBBUB STATUS:

Over 100 Million feeds are PubSubHubbub-enabled

Companies: Google, FriendFeed (FB), livedoor, Six Apart, LiveJournal,

LazyFeed, Superfeedr, ...

Google products: FeedBurner, Blogger, Reader shared items,

Google Alerts,...

Cool apps: Socnode, Reader2Twitter, chat gateways ...

More publishers, subscribers, hubs, apps on the way

Publisher clients: Perl, PHP, Python, Ruby, Java, Haskell, C#,

MovableType, WordPress, Django, Zend

Active mailing list with 240+ members

GETTING INVOLVED:

Review the spec; recommend improvements

o Open process, will be licensed by Open Web Foundation

Write some sample code for your favorite language or CMS

Contribute to one of the open source Hub implementations

Write on your blog about why we need push for the future.

PRESENT SCENARIO OF FACEBOOK:

Subscribe to feeds that are PubSubHubbub-enabled

o Put that great UI to work.

o Maybe reuse the FriendFeed index pipeline?

o Call Bret and Ben.

Enable PubSubHubbub for activity streams

o Provide Facebook app developers with real-time updates to users' home streams.

o Speeds up surfacing Facebook in other apps.

Detecting new events could trigger the app to take action in real-time

(send an email, classify a photo, initiate an action in a game, etc).

FUTURE SCENARIO OF FACEBOOK:

Figure out if private feeds will work with this model

o Run your own hub

o Use capability URLs (OAuth token in the query string).

Give your developers more feeds to consume and syndicate.

8. REHASH

Push for the future! Scale to new use-cases

Decentralized, open spec: no company owns it.

One API for all stream-based content.

Project page: http://pubsubhubbub.googlecode.com

o Full Hub source code with tests

o Example publisher and subscriber apps.

o Demo hub at http://pubsubhubbub.appspot.com

HUB STORAGE SPACE:

Manageable cost

o ~10 million feeds

o ~1 million subscribers

Assume 1 billion events per day (~11,000/second); thar be dragons

FeedEntryRecord

Key name

o "FeedEntryRecord" + entry_id_hash + parent key

o 400 bytes, could be smaller

Indexed properties

o Entry ID hash (again-- doh!): 160 bytes

o Entry content hash: 160 bytes

o Update time: 8 bytes

Unindexed properties

o Entry ID: 2048 bytes maximum, 200 on average

Result

~1KB per entry

27TB per month at ~11,000 req/sec -- no sweat!
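The storage estimate above can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch: it takes the field sizes exactly as listed, assumes a 30-day month and decimal terabytes (both assumptions of this example), and lands in the same range as the 27TB figure rather than reproducing it exactly.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# 1 billion events per day, expressed per second.
events_per_second = 1_000_000_000 / SECONDS_PER_DAY

# Field sizes in bytes, as listed for FeedEntryRecord above.
record_bytes = {
    "key name": 400,
    "entry ID hash (indexed)": 160,
    "entry content hash (indexed)": 160,
    "update time (indexed)": 8,
    "entry ID (unindexed, average)": 200,
}
bytes_per_entry = sum(record_bytes.values())


def monthly_storage_tb(requests_per_second: float, entry_bytes: int, days: int = 30) -> float:
    """Storage written in `days` days, in decimal terabytes (10**12 bytes)."""
    return requests_per_second * SECONDS_PER_DAY * days * entry_bytes / 1e12


print(f"events per second: {events_per_second:,.0f}")
print(f"bytes per entry:   {bytes_per_entry}")
print(f"monthly storage:   {monthly_storage_tb(11_000, bytes_per_entry):.1f} TB "
      f"(summed fields), {monthly_storage_tb(11_000, 1_000):.1f} TB (rounded to 1 KB)")
```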

WEBFINGER:

Unified discovery for email addresses

Transform an email address into XRD

XRD defines all the services that address has

Helps provide social networking as a protocol

E.g., Simple way to discover if an account has a Portable Contacts

interface

9. APPLICATIONS

Amazonca API: Amazonca.com provides users with online access to a

barcode database that it maintains.

Aonaware CountCheatService API: Aonaware CountCheatService allows

users to find solutions to simple anagram problems, for example those posed

by the UK Channel 4 game ‘Countdown’.

Bitlan API: Bitlan provides a url shortening tool. The site is in Dutch.

Cleeng API: Cleeng helps publishers monetize their content via subscriptions, advertisements, or direct micro-payments (between 0.15 and 19.99 EUR). Plugins are available for popular CMSes and payment gateways.

DEA Filter API: DEAFilter is a free RESTful API that can be implemented into

an existing site with a couple lines of code. It is used to get rid of all

Disposable Email Address services that are used to infest a site with fake

users and spammers.

DoStuffMedia API: DoStuff Media is a company that helps local media companies and music festivals monetize their web sites by providing a technology that aggregates, organizes and displays critical information about the festival, all while incorporating social networking, sharing and interactive features.

Flite Advertising API: Flite is a cloud-based ad platform. The Flite API

allows developers to integrate custom Flash content into Flite ads.

Fog Creek FogBugz XML API: FogBugz is bug tracking software used in

software project management.

Fog Creek Kiln API: The provider offers a complete read/write API for

interacting with Mercurial repositories hosted by Kiln.

Gistpoint API: Gistpoint is a service that lets users find and submit

summaries for online articles.

Glocal Focal US Mortality Data API: Glocal Focal is a social networking

site for social causes. The Glocal Focal US Mortality Data API provides United

States mortality data based on user selectable parameters.

Grid5000 API: Grid’5000 is a scientific instrument for the study of large

scale parallel and distributed systems.

Guzzle Ayup! API: Guzzle Ayup! is a PubSubHubbub Hub. Publishers can

notify the Guzzle Ayup! hub that new content is available on their feeds, in

turn Guzzle Ayup! hub will notify all Subscribers that the feeds they

subscribed to just got updated, by directly pushing the new entries to them.

HotelBeds API: Hotelbeds accommodation & destination services is a

provider of incoming travel services to travel trade professionals worldwide

currently serving over 2,500 resorts across more than 80 countries.

Kayako API: Kayako is help desk software that lets users manage email,

tickets, live chat, calls and remote support.

Kronovia Compliance Cloud Service API: The Kronovia Compliance Cloud

Service allows Social CRM companies to offer social compliance, governance

and brand protection as a part of their solution. Using their API, developers

can integrate the service with any Social CRM application or custom social

media solution.

Massachusetts Port Authority API: The Massachusetts Port Authority (Massport) provides an API for developers interested in building mobile applications that integrate accurate, up-to-date content for Massport's facilities. Content is provided for Boston Logan International Airport. With the API, users can make RESTful calls to retrieve data feeds in the Atom Syndication format.

Mibbit API: Mibbit is an online chat client that can be embedded on third

party websites.

n0tice API: n0tice.com is a public notice board. It offers APIs for pulling and adding data to news reports, events and offers. The APIs use RESTful calls and responses are formatted in XML, JSON, RSS and KML.

NAC Real-time Conversion API: The Natural Area Coding System is a

geodetic system that has generated code called Natural Area Code (NAC) to

unify the representations of geographic coordinates, area codes, street

addresses, postal codes, map grids and property identifiers of every location

or area in the world, and make the information from all maps, GPS receivers

and location based services and products connected, efficient and universal.

NHXS API: NHXS is a provider of contract compliance and point-of-service

adjudication workflow automation.

Nike+ API: Nike+ is a service that lets users track their running statistics

and improvement over time, and track their individual goals along the way.

10. CONCLUSION

PubSubHubbub is an advanced form of the RSS feed: it extends feeds with push delivery rather than replacing them. The protocol is relatively new, but it is already a milestone for many of the companies that have adopted it. The applications of this protocol are APIs that are rigorous and clearly specified.

It offers more than a simple ping, because the hub delivers the new content itself and not just a notice that something changed, and this helps in many development processes. It is more efficient and performs better than polling an RSS feed. Its main use is fast delivery of information to clients. Because subscribers no longer poll, it reduces the load on the server and so improves overall performance. It is easy to handle, and hubs retry failed fetches and deliveries so that information reaches subscribers reliably. It is also easy to implement, which matters most when the load on the server is likely to be very high. It is a server-to-server protocol for near-instant notifications, an innovative extension to Atom, and an open protocol.

The protocol is expected to support arbitrary content in the future, effectively allowing subscription to any web resource. Similarly, work is being done to support private or protected data.

11. BIBLIOGRAPHY

techno-weenie.net/2010/10/5/nub-nub

http://code.google.com/p/pubsubhubbub/wiki/ComparingProtocols

http://code.google.com/p/pubsubhubbub/w/list

http://pubsubhubbub.appspot.com

http://scripting.com/stories/2009/07/10/googlesPubsubhubbub.html

http://code.google.com/p/pubsubhubbub

http://en.wikipedia.org/wiki/PubSubHubbub