Thialfi: A Client Notification Service for Internet-Scale Applications


Page 1: Thialfi: A Client Notification Service for Internet-Scale Applications

Thialfi: A Client Notification Service for Internet-Scale Applications

Atul Adya, Gregory Cooper, Daniel Myers, Michael Piatek

Google Seattle

Page 2: Thialfi: A Client Notification Service for Internet-Scale Applications

A Case for Notifications

Problem: Ensuring cached data is fresh across users and devices

Page 3: Thialfi: A Client Notification Service for Internet-Scale Applications

Common Application Patterns

• Clients poll to detect changes
  – Simple and reliable, but slow and inefficient

• Push updates to the client
  – Fast but complex
  – Add backup polling to get reliability
  – Tail latencies can be high: masks bugs
  – Application-specific protocol

sacrifice reliability

Page 4: Thialfi: A Client Notification Service for Internet-Scale Applications

Our Solution: Thialfi

• Scalable: tracks millions of clients and objects

• Fast: notifies clients in less than a second

• Reliable: even when entire data centers fail

• Easy to use: deployed in Chrome Sync, Contacts, Google Plus

Page 5: Thialfi: A Client Notification Service for Internet-Scale Applications

Talk Outline

• Thialfi’s abstraction: reliable signaling

• Delivering notifications in the common case

• Detecting and recovering from failures

• Evaluation and experience

Page 6: Thialfi: A Client Notification Service for Internet-Scale Applications

Thialfi Overview

[Diagram: Clients C1 and C2 each run the Thialfi client library and register for object X with the Thialfi service in the data center, which records X: C1, C2. When X is updated, the application backend passes Update X to the Thialfi service, which sends Notify X to C1 and C2.]

Page 7: Thialfi: A Client Notification Service for Internet-Scale Applications

Thialfi Abstraction

• Objects have unique IDs and version numbers, monotonically increasing on every update

• Delivery guarantee
  – Registered clients learn latest version number
  – Reliable signal only: cached object ID X at version Y
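
The guarantee above (clients learn the latest version, not every intermediate one) can be illustrated with a minimal sketch. It relies only on the slide's statement that versions increase monotonically; the class and method names (`VersionTracker`, `observe`, `latest`) are hypothetical and are not part of Thialfi's API.

```python
class VersionTracker:
    """Hypothetical helper: remembers the highest version seen per object ID."""

    def __init__(self):
        self._latest = {}  # object ID -> highest version observed so far

    def observe(self, object_id, version):
        """Record a signalled version; return True only if it is newer."""
        current = self._latest.get(object_id)
        if current is not None and version <= current:
            return False  # stale or duplicate signal, safe to ignore
        self._latest[object_id] = version
        return True

    def latest(self, object_id):
        return self._latest.get(object_id)


tracker = VersionTracker()
# Signals may arrive out of order or more than once; only the maximum matters.
results = [tracker.observe("X", v) for v in (5, 7, 6, 7)]
```

Because versions are monotonic, dropping an older signal loses nothing: the client only needs to know that its cached copy is behind the newest version.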

Page 8: Thialfi: A Client Notification Service for Internet-Scale Applications

Why Signal, Not Data?

• Developers want reliable, in-order data delivery

• Adds complexity to Thialfi and application, e.g.,
  – Hard state, arbitrary buffering
  – Offline applications flooded with data on wakeup

• For most applications, reliable signal is enough
  – Invoke polling path on signal: simplifies integration

Page 9: Thialfi: A Client Notification Service for Internet-Scale Applications

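API Without Failure Recovery

[Diagram: The application backend calls Publish(objectId, version) on the Thialfi service. The application calls Register(objectId) and Unregister(objectId) on the client library, and the client library delivers Notify(objectId, version) to the application.]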
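
As a rough illustration of how these four calls fit together, here is a single-process sketch. Only the operation names Register, Unregister, Publish, and Notify come from the slide; the class `InMemoryNotifier`, the `add_client` method, and the callback-based delivery are assumptions made for the example and say nothing about how the real service is built.

```python
from collections import defaultdict


class InMemoryNotifier:
    """Toy stand-in for the service: everything lives in one process."""

    def __init__(self):
        self._clients_by_object = defaultdict(set)  # object ID -> client IDs
        self._callbacks = {}                         # client ID -> notify callback

    def add_client(self, client_id, notify):
        # Hypothetical setup step: attach the client's Notify handler.
        self._callbacks[client_id] = notify

    def register(self, client_id, object_id):
        self._clients_by_object[object_id].add(client_id)

    def unregister(self, client_id, object_id):
        self._clients_by_object[object_id].discard(client_id)

    def publish(self, object_id, version):
        # Deliver Notify(objectId, version) to every registered client.
        for client_id in sorted(self._clients_by_object[object_id]):
            self._callbacks[client_id](object_id, version)


received = []


def make_callback(client_id):
    return lambda object_id, version: received.append((client_id, object_id, version))


service = InMemoryNotifier()
for cid in ("C1", "C2"):
    service.add_client(cid, make_callback(cid))
    service.register(cid, "X")

service.publish("X", 1)       # both clients are notified
service.unregister("C2", "X")
service.publish("X", 2)       # only C1 is still registered
```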

Page 10: Thialfi: A Client Notification Service for Internet-Scale Applications

Talk Outline

• Thialfi’s abstraction: reliable signaling

• Delivering notifications in the common case

• Detecting and recovering from failures

• Evaluation and experience

Page 11: Thialfi: A Client Notification Service for Internet-Scale Applications

Architecture

• Matcher: Object ID → registered clients, version
• Registrar: Client ID → registered objects, notifications

[Diagram: Inside the data center, the Registrar is backed by the Client Bigtable and the Matcher by the Object Bigtable. The client library exchanges registrations, notifications, and acknowledgments with the Registrar; the application backend sends notifications to the Matcher.]

Page 12: Thialfi: A Client Notification Service for Internet-Scale Applications

Life of a Notification

[Diagram: The application backend calls Publish(x, v7). The Matcher updates x from v5 to v7 in the Object Bigtable (registered clients: C1, C2) and passes x, v7 to the Registrar. The Registrar updates the Client Bigtable entries for C1 and C2 from x, v5 to x, v7 and sends Notify: x, v7. Client C2 replies with Ack: x, v7.]
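
The flow on this slide can be approximated with two small classes. This is a sketch under several assumptions: plain dictionaries stand in for the Object and Client Bigtables, registrations are written straight to the Matcher, stale versions are ignored, and an acknowledgment clears the pending entry only when it matches the pending version. None of these details are confirmed by the slides.

```python
class Registrar:
    def __init__(self):
        self.pending = {}  # client ID -> {object ID: version} (Client Bigtable stand-in)
        self.sent = []     # log of (client ID, object ID, version) notifications

    def enqueue(self, client_id, object_id, version):
        # Remember the notification until the client acknowledges it.
        self.pending.setdefault(client_id, {})[object_id] = version
        self.sent.append((client_id, object_id, version))

    def ack(self, client_id, object_id, version):
        # Assumption: only an ack for the currently pending version clears it.
        if self.pending.get(client_id, {}).get(object_id) == version:
            del self.pending[client_id][object_id]


class Matcher:
    def __init__(self, registrar):
        self.objects = {}  # object ID -> {"version", "clients"} (Object Bigtable stand-in)
        self.registrar = registrar

    def _row(self, object_id):
        return self.objects.setdefault(object_id, {"version": None, "clients": set()})

    def register(self, client_id, object_id):
        self._row(object_id)["clients"].add(client_id)

    def publish(self, object_id, version):
        row = self._row(object_id)
        if row["version"] is not None and version <= row["version"]:
            return  # assumption: older versions are dropped
        row["version"] = version
        for client_id in sorted(row["clients"]):
            self.registrar.enqueue(client_id, object_id, version)


registrar = Registrar()
matcher = Matcher(registrar)
matcher.register("C1", "x")
matcher.register("C2", "x")
matcher.publish("x", 5)
matcher.publish("x", 7)
registrar.ack("C2", "x", 7)  # C2 acknowledged; C1 still has v7 pending
```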

Page 13: Thialfi: A Client Notification Service for Internet-Scale Applications

Talk Outline

• Thialfi’s abstraction: reliable signaling

• Delivering notifications in the common case

• Detecting and recovering from failures

• Evaluation and experience

Page 14: Thialfi: A Client Notification Service for Internet-Scale Applications

Possible Failures

[Diagram: The Thialfi service spans data centers 1 to n, each with a Registrar, a Matcher, a Client Bigtable, and an Object Bigtable. The client library keeps a client store, and the application publishes through a publish feed. Failure points marked: data center loss, server state loss/schema migration, partial storage unavailability, publish feed, network failures, client restart, client state loss.]

Page 15: Thialfi: A Client Notification Service for Internet-Scale Applications

Failures Addressed by Thialfi

• Client restart
• Client state loss
• Network failures
• Partial storage unavailability
• Server state loss / schema migration
• Publish feed loss
• Data center outage

Page 16: Thialfi: A Client Notification Service for Internet-Scale Applications

Main Principle: No Hard State

• Thialfi remains correct even if all state is lost
  – All registrations
  – All object versions

• Detect and reconstruct after failures using:
  – ReissueRegistrations() client event
  – Registration Sync Protocol
  – NotifyUnknown() client event

Page 17: Thialfi: A Client Notification Service for Internet-Scale Applications

Recovering Client Registrations

[Diagram: The Registrar has lost the registrations for x and y. It sends ReissueRegistrations() to the client, which responds with Register(x); Register(y).]

ReissueRegistrations: Not a burden for applications
  – Application stores objects in its cache, or
  – Object list is implicit, e.g., bookmarks for user X
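
A short sketch of the second case, where the object list is implicit. The `BookmarkClient` class, the `bookmarks/<user>/<id>` ID scheme, and the `on_reissue_registrations` handler name are hypothetical; only the ReissueRegistrations event and the Register call come from the slides.

```python
class BookmarkClient:
    """Hypothetical application client whose registrations are implied by its data."""

    def __init__(self, user, bookmark_ids, register):
        self.user = user
        self.bookmark_ids = list(bookmark_ids)
        self._register = register  # callable standing in for Register(objectId)

    def object_ids(self):
        # The object list is derived from application data, not stored separately.
        return [f"bookmarks/{self.user}/{b}" for b in self.bookmark_ids]

    def register_all(self):
        for object_id in self.object_ids():
            self._register(object_id)

    def on_reissue_registrations(self):
        # Handler for the ReissueRegistrations() event: simply register again.
        self.register_all()


server_registrations = set()  # stand-in for the service's registration state
client = BookmarkClient("X", ["b1", "b2"], server_registrations.add)
client.register_all()

server_registrations.clear()        # simulate the service losing all registrations
client.on_reissue_registrations()   # the client rebuilds them from its own data
```

The application never has to persist a separate registration list, which is why the slide argues the event is not a burden.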

Page 18: Thialfi: A Client Notification Service for Internet-Scale Applications

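Syncing Client Registrations

[Diagram: The client, registered for x and y, sends Register: x, y to the Registrar and attaches Hash(x, y) to its messages. When the hash does not match the Registrar's own state, the Registrar starts Reg sync and the client resends its registrations.]

• Goal: Keep client-registrar registration state in sync
• Every message contains hash of registered objects
• Registrar initiates protocol when it detects out-of-sync state
• Allows simpler reasoning about registration state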
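
The hash comparison can be sketched as follows. The slides do not say which hash function or encoding Thialfi uses, so SHA-256 over the sorted, length-prefixed object IDs is purely an assumption, as are the function names `registration_digest` and `in_sync`. The final step, where the registrar adopts the client's set, is also a simplification of whatever the real sync protocol does.

```python
import hashlib


def registration_digest(object_ids):
    """Order-independent digest of a set of registered object IDs."""
    h = hashlib.sha256()
    for object_id in sorted(object_ids):
        encoded = object_id.encode("utf-8")
        # Length prefix keeps ("ab", "c") distinct from ("a", "bc").
        h.update(len(encoded).to_bytes(4, "big"))
        h.update(encoded)
    return h.hexdigest()


def in_sync(client_digest, registrar_objects):
    """Registrar-side check: does the client's digest match our view?"""
    return client_digest == registration_digest(registrar_objects)


client_regs = {"x", "y"}
registrar_regs = {"x"}  # the registrar has lost the registration for y

digest = registration_digest(client_regs)
needs_sync = not in_sync(digest, registrar_regs)

if needs_sync:
    # Simplified outcome of the sync: the registrar adopts the client's set.
    registrar_regs = set(client_regs)

synced = in_sync(digest, registrar_regs)
```

Sending a fixed-size digest on every message keeps the common case cheap; the full registration list only needs to be exchanged when the digests disagree.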

Page 19: Thialfi: A Client Notification Service for Internet-Scale Applications

Recovering From Lost Versions

• Versions may be lost, e.g., during schema migration

• Refreshing from backend requires tight coupling

• Inform client with NotifyUnknown(objectId)
  – Client must refresh, regardless of its current state
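
The difference between the two client events can be shown with a small cache sketch. The `ObjectCache` class, its handler names, and the `fetch` callable (standing in for the application's existing polling path) are hypothetical; the behavior follows the slide: a normal notification can be compared against the cached version, while NotifyUnknown always forces a refresh.

```python
class ObjectCache:
    """Hypothetical client-side cache driven by notification events."""

    def __init__(self, fetch):
        self._fetch = fetch  # polling path: object ID -> (version, data)
        self.entries = {}    # object ID -> (version, data)

    def _refresh(self, object_id):
        self.entries[object_id] = self._fetch(object_id)

    def on_notify(self, object_id, version):
        # Known version: refetch only if the cache is missing or behind.
        cached = self.entries.get(object_id)
        if cached is None or version > cached[0]:
            self._refresh(object_id)

    def on_notify_unknown(self, object_id):
        # Version unknown: refetch regardless of what is cached.
        self._refresh(object_id)


backend = {"x": (7, "payload-v7")}
fetches = []


def fetch(object_id):
    fetches.append(object_id)
    return backend[object_id]


cache = ObjectCache(fetch)
cache.on_notify("x", 7)        # cache empty: fetch
cache.on_notify("x", 7)        # already at v7: no fetch
cache.on_notify_unknown("x")   # always fetch
```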

Page 20: Thialfi: A Client Notification Service for Internet-Scale Applications

Talk Outline

• Thialfi’s abstraction: reliable signaling

• Delivering notifications in the common case

• Detecting and recovering from failures

• Evaluation and experience

Page 21: Thialfi: A Client Notification Service for Internet-Scale Applications

Notification Latency Breakdown

[Chart: Breakdown of notification latency (ms), axis from 0 to 300, with components: App Backend to Bridge; Bridge to Matcher RPC (Batched); Matcher Bigtable Write (Batched); Matcher Bigtable Read; Matcher to Registrar RPC (Batched).]

Batching accounts for a significant fraction of latency

Page 22: Thialfi: A Client Notification Service for Internet-Scale Applications

Thialfi Usage by Applications

Application          Language    Network Channel      Client Lines of Code (Semi-colons)
Chrome Sync          C++         XMPP                 535
Contacts             JavaScript  Hanging GET          40
Google+              JavaScript  Hanging GET          80
Android Application  Java        C2DM + Standard GET  300
Google BlackBerry    Java        RPC                  340

Page 23: Thialfi: A Client Notification Service for Internet-Scale Applications

Some Lessons Learned

• Add complexity at the server, not the client
  – Deploy at server: minutes. Upgrade clients: years+

• Asynchronous events, not callbacks
  – Spontaneous events occur: need to handle them

• Initial applications have few objects per client
  – Earlier use of polling forces such a model

Page 24: Thialfi: A Client Notification Service for Internet-Scale Applications

Thialfi Summary

• Fast, scalable notification service
• Reliable even when data centers fail
• Two key ideas simplify failure handling
  – Deliver a reliable signal, not data
  – No hard state: reconstruct after failure

• Deployed in Chrome Sync, Contacts, Google+