Large-scale Incremental Processing Using Distributed Transactions and Notifications Daniel Peng and Frank Dabek Google, Inc. OSDI 2010 15 Feb 2012 Presentation @ IDB Lab. Seminar Presented by Jee-bum Park


Large-scale Incremental Processing Using Distributed Transactions and Notifications

Daniel Peng and Frank Dabek
Google, Inc.
OSDI 2010

15 Feb 2012
Presentation @ IDB Lab. Seminar

Presented by Jee-bum Park

Outline

Introduction
Design
– Bigtable overview
– Transactions
– Notifications
Evaluation
Conclusion
Good and Not So Good Things

Introduction

How can Google find the documents on the web so fast?

Introduction

Google uses an index, built by the indexing system, that can be used to answer search queries

Introduction

What does the indexing system do?
– Crawling every page on the web
– Parsing the documents
– Extracting links
– Clustering duplicates
– Inverting links
– Computing PageRank
– ...

Introduction

Compute PageRank using MapReduce
– Job 1: compute R(1)
– Job 2: compute R(2)
– Job 3: compute R(3)
– ...

R(t) =
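To make the sequence of jobs concrete, here is a minimal sketch of iterative PageRank in which each loop iteration stands in for one MapReduce job that computes R(t) from R(t-1) over the whole graph. The update rule used, R(t)(p) = (1 - d)/N + d * sum of R(t-1)(u)/outdegree(u) over pages u linking to p, is the common textbook form and is an assumption here, as are the damping factor d = 0.85, the toy graph, and the function names.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # page -> pages it links to


def pagerank_job(graph: Graph, rank: Dict[str, float], d: float = 0.85) -> Dict[str, float]:
    """One 'job': compute R(t) from R(t-1) by visiting every page in the graph."""
    n = len(graph)
    new_rank = {page: (1.0 - d) / n for page in graph}
    for page, links in graph.items():
        if not links:
            # Dangling page: spread its rank evenly over all pages.
            for target in graph:
                new_rank[target] += d * rank[page] / n
        else:
            for target in links:
                new_rank[target] += d * rank[page] / len(links)
    return new_rank


def pagerank(graph: Graph, jobs: int = 20) -> Dict[str, float]:
    """Run a fixed number of jobs, starting from a uniform R(0)."""
    rank = {page: 1.0 / len(graph) for page in graph}
    for _ in range(jobs):
        rank = pagerank_job(graph, rank)
    return rank


toy_web: Graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_web)
```

Every job touches every page, which is why the cost of one pass depends on the size of the repository rather than on how much of it changed.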

Introduction

Now, consider how to update that index after recrawling some small portion of the web

Is it okay to run the MapReduces over just the new pages?

Nope, there are links between the new pages and the rest of the web

Well, how about this?

MapReduces must be run again over the entire repository

Introduction

Google’s web search index was produced in this way
– Running over the entire set of pages

It was not a critical issue
– Because, given enough computing resources, MapReduce’s scalability makes this approach feasible

However, reprocessing the entire web
– Discards the work done in earlier runs
– Makes latency proportional to the size of the repository, rather than the size of an update

Introduction

An ideal data processing system for the task of maintaining the web search index would be optimized for incremental processing

Incremental processing system: Percolator


Design

Percolator is built on top of the Bigtable distributed storage system

A Percolator system consists of three binaries that run on every machine in the cluster
– A Percolator worker
– A Bigtable tablet server
– A GFS chunkserver

All observers (user applications) are linked into the Percolator worker

Design

Dependencies (stack, top to bottom)
– Observers
– Percolator worker
– Bigtable tablet server
– GFS chunkserver

Design

System architecture
– Node 1, Node 2, Node ...: each runs Observers, a Percolator worker, a Bigtable tablet server, and a GFS chunkserver
– Timestamp oracle service
– Lightweight lock service

Design

The Percolator worker
– Scans the Bigtable for changed columns
– Invokes the corresponding observers as a function call in the worker process

The observers
– Perform transactions by sending read/write RPCs to Bigtable tablet servers

Figure: the Observers / Percolator worker / Bigtable tablet server / GFS chunkserver stack, annotated with 1: scan, 2: invoke, 3: RPC

Design

The timestamp oracle service
– Provides strictly increasing timestamps
  – A property required for correct operation of the snapshot isolation protocol

The lightweight lock service
– Workers use it to make the search for dirty notifications more efficient
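A minimal sketch of what "strictly increasing" means for callers, assuming a single-process stand-in for what is really a separate service. The class name `TimestampOracle` and the method `next_timestamp` are hypothetical and chosen only for illustration.

```python
import threading


class TimestampOracle:
    """Toy in-process stand-in for the timestamp oracle service."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last = 0

    def next_timestamp(self) -> int:
        # The lock makes increment-and-return atomic, so no two callers
        # receive the same value and values never go backwards.
        with self._lock:
            self._last += 1
            return self._last


oracle = TimestampOracle()
start_ts = oracle.next_timestamp()   # taken when a transaction begins
commit_ts = oracle.next_timestamp()  # taken when it commits
```

Because every commit timestamp is larger than every timestamp handed out before it, a transaction can treat its start timestamp as a cut: everything committed at or below it is visible, everything above it is not.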

Design

Percolator provides two main abstractions
– Transactions
  – Cross-row, cross-table with ACID snapshot-isolation semantics
– Observers
  – Similar to database triggers or events

Design – Bigtable overview

Percolator is built on top of the Bigtable distributed storage system

Bigtable presents a multi-dimensional sorted map to users
– Keys are (row, column, timestamp) tuples

Bigtable provides lookup, update operations, and transactions on individual rows

Bigtable does not provide multi-row transactions
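As a rough illustration of this data model, the sketch below treats a table as a map keyed by (row, column, timestamp) and reads the newest version at or before a given timestamp. `ToyTable` and its methods are hypothetical names for this example and are not Bigtable's API; the row and column names are made up.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str, int]  # (row, column, timestamp)


class ToyTable:
    """In-memory stand-in for a map keyed by (row, column, timestamp)."""

    def __init__(self) -> None:
        self._cells: Dict[Key, str] = {}

    def write(self, row: str, column: str, timestamp: int, value: str) -> None:
        self._cells[(row, column, timestamp)] = value

    def read(self, row: str, column: str, at_ts: int) -> Optional[str]:
        """Return the newest value whose timestamp is <= at_ts, or None."""
        versions = [(ts, v) for (r, c, ts), v in self._cells.items()
                    if r == row and c == column and ts <= at_ts]
        return max(versions)[1] if versions else None

    def scan(self) -> List[Tuple[Key, str]]:
        """Return all cells in sorted key order."""
        return sorted(self._cells.items())


pages = ToyTable()
pages.write("com.example/index.html", "contents", 5, "v1")
pages.write("com.example/index.html", "contents", 9, "v2")
old = pages.read("com.example/index.html", "contents", at_ts=7)
new = pages.read("com.example/index.html", "contents", at_ts=9)
```

Keeping several timestamped versions of a cell is what later lets a transaction read a consistent snapshot while newer versions are being written.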

Design – Transactions

Percolator provides cross-row, cross-table transactions with ACID snapshot-isolation semantics

Design – Transactions

Percolator stores multiple versions of each data item using Bigtable’s timestamp dimension
– Multiple versions are required to provide snapshot isolation

Figure: snapshot isolation, showing transactions 1, 2, and 3

Design – Transactions

Case 1: use exclusive locks

Figure: animation with transactions 1 and 2

Design – Transactions

Case 2: do not use any locks

Figure: animation with transactions 1 and 2

Design – Transactions

Case 3: use multiple versioning & timestamp

Figure: animation with transactions 1 and 2
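The third case can be sketched in a few lines, assuming a toy in-memory store: each transaction reads as of its start timestamp, buffers its writes, and aborts at commit time if another transaction has committed a newer version of a key it wants to write. The names `MVStore` and `Txn` are hypothetical, and for brevity a transaction here does not read its own buffered writes.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class MVStore:
    """Toy multi-version store: key -> list of (commit_ts, value)."""

    def __init__(self) -> None:
        self._clock = itertools.count(1)
        self.versions: Dict[str, List[Tuple[int, str]]] = {}

    def next_ts(self) -> int:
        return next(self._clock)

    def begin(self) -> "Txn":
        return Txn(self, self.next_ts())


class Txn:
    def __init__(self, store: MVStore, start_ts: int) -> None:
        self.store, self.start_ts = store, start_ts
        self.writes: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        # Only versions committed at or before start_ts are visible.
        visible = [(ts, v) for ts, v in self.store.versions.get(key, [])
                   if ts <= self.start_ts]
        return max(visible)[1] if visible else None

    def set(self, key: str, value: str) -> None:
        self.writes[key] = value

    def commit(self) -> bool:
        # Abort on a write-write conflict with a commit after our snapshot.
        for key in self.writes:
            if any(ts > self.start_ts for ts, _ in self.store.versions.get(key, [])):
                return False
        commit_ts = self.store.next_ts()
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((commit_ts, value))
        return True


store = MVStore()
setup = store.begin()
setup.set("x", "old")
setup.commit()

t1 = store.begin()
t2 = store.begin()
t2.set("x", "new")
t2_ok = t2.commit()
t1_sees = t1.get("x")    # t1 still reads its snapshot
t1.set("x", "other")
t1_ok = t1.commit()      # conflicts with t2's newer version
```

Readers never block writers in this scheme, which is the contrast with the exclusive-lock case, while the commit-time check prevents the lost updates that the no-lock case allows.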

Design – Transactions

Percolator stores its locks in special in-memory columns in the same Bigtable

Design – Transactions

Percolator transaction demo
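Since the demo frames are not reproduced in this transcript, the following is a simplified single-process sketch in the spirit of the two-phase commit described in the Percolator paper: a prewrite phase that writes data and a lock at the start timestamp, then a commit phase that writes a "write" record at the commit timestamp and removes the lock. The column names `data`, `lock`, and `write` follow the paper; the `Transaction` class, the dictionary-backed `table`, and the omission of failure handling, lock cleanup, and the primary-lock check are assumptions made to keep the example short.

```python
import itertools
from typing import Dict, List, Optional, Tuple

Cell = Tuple[str, str, int]      # (row, column, timestamp)
table: Dict[Cell, object] = {}   # one toy table holds data, lock, and write columns
_clock = itertools.count(1)      # stand-in for the timestamp oracle


def _timestamps(row: str, column: str, lo: float, hi: float) -> List[int]:
    """Sorted timestamps of the cells in `column` of `row` within [lo, hi]."""
    return sorted(ts for (r, c, ts) in table
                  if r == row and c == column and lo <= ts <= hi)


class Transaction:
    def __init__(self) -> None:
        self.start_ts = next(_clock)
        self.writes: Dict[str, object] = {}  # buffered until commit

    def set(self, row: str, value: object) -> None:
        self.writes[row] = value

    def get(self, row: str) -> Optional[object]:
        # A lock at or below start_ts may belong to a writer that is about
        # to commit a version this snapshot should see.
        if _timestamps(row, "lock", 0, self.start_ts):
            raise RuntimeError("locked: a real client would wait or clean up")
        committed = _timestamps(row, "write", 0, self.start_ts)
        if not committed:
            return None
        data_ts = table[(row, "write", committed[-1])]  # write cell points at data
        return table[(row, "data", data_ts)]

    def _prewrite(self, row: str, value: object, primary: str) -> bool:
        if _timestamps(row, "write", self.start_ts, float("inf")):
            return False  # a newer commit exists: write-write conflict
        if _timestamps(row, "lock", 0, float("inf")):
            return False  # another transaction holds a lock on this row
        table[(row, "data", self.start_ts)] = value
        table[(row, "lock", self.start_ts)] = primary
        return True

    def commit(self) -> bool:
        rows = list(self.writes)
        if not rows:
            return True
        primary = rows[0]
        for row in rows:
            if not self._prewrite(row, self.writes[row], primary):
                for r in rows:  # roll back whatever this transaction prewrote
                    table.pop((r, "lock", self.start_ts), None)
                    table.pop((r, "data", self.start_ts), None)
                return False
        commit_ts = next(_clock)
        for row in rows:  # the primary row is first
            table[(row, "write", commit_ts)] = self.start_ts  # make data visible
            del table[(row, "lock", self.start_ts)]           # release the lock
        return True


setup = Transaction()
setup.set("Bob", 10)
setup.set("Joe", 2)
setup.commit()

transfer = Transaction()
bob, joe = transfer.get("Bob"), transfer.get("Joe")
transfer.set("Bob", bob - 7)
transfer.set("Joe", joe + 7)
transfer.commit()

reader = Transaction()
balances = (reader.get("Bob"), reader.get("Joe"))
```

The point of the sketch is that locks are ordinary cells in the same table as the data, so taking a lock and writing a value are both single-row updates of the kind Bigtable already supports.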

Design – Notifications

In Percolator, the user writes code (“observers”) to be triggered by changes to the table

Each observer registers a function and a set of columns

Percolator invokes the functions after data is written to one of those columns in any row

Figure: the Observers / Percolator worker / Bigtable tablet server / GFS chunkserver stack, annotated with 1: scan, 2: invoke, 3: RPC

Design – Notifications

Percolator applications are structured as a series of observers
– Each observer completes a task and creates more work for “downstream” observers by writing to the table

Figure: a Percolator application (Observers 1 through 6)

Design – Notifications

Figure: Google’s new indexing system, with observers Document Processor (parse, extract links, etc.), Clustering, and Exporter, shown beside the stack annotated with 1: scan, 2: invoke, 3: RPC
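The observer model from the preceding slides can be sketched as follows, assuming a toy in-memory table: each observer registers a function for a set of columns, and writing to an observed column queues work that triggers the next observer in the chain. The observer names mirror the figure, but `ToyPercolator`, the column names, and the bodies of the observers are invented for illustration and say nothing about how Google's observers are actually written.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class ToyPercolator:
    """In-memory stand-in: a table plus observers keyed by column."""

    def __init__(self) -> None:
        self.table: Dict[Tuple[str, str], str] = {}
        self.observers: Dict[str, List[Callable]] = {}
        self.pending: Deque[Tuple[str, str]] = deque()  # written, not yet observed

    def register(self, columns: List[str], fn: Callable) -> None:
        for column in columns:
            self.observers.setdefault(column, []).append(fn)

    def write(self, row: str, column: str, value: str) -> None:
        self.table[(row, column)] = value
        if column in self.observers:
            self.pending.append((row, column))

    def run(self) -> None:
        # Keep going until no observer has created further work.
        while self.pending:
            row, column = self.pending.popleft()
            for fn in self.observers[column]:
                fn(self, row, self.table[(row, column)])


exported: List[Tuple[str, str]] = []


def document_processor(p: ToyPercolator, row: str, raw: str) -> None:
    links = ",".join(w for w in raw.split() if w.startswith("http"))
    p.write(row, "links", links)            # creates work for clustering


def clustering(p: ToyPercolator, row: str, links: str) -> None:
    p.write(row, "cluster", "cluster-of:" + row)  # creates work for exporter


def exporter(p: ToyPercolator, row: str, cluster: str) -> None:
    exported.append((row, cluster))


p = ToyPercolator()
p.register(["raw"], document_processor)
p.register(["links"], clustering)
p.register(["cluster"], exporter)
p.write("example.com/a", "raw", "see http://example.com/b for more")
p.run()
```

A single write to the `raw` column flows through all three observers, and only the rows that changed are touched.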

Design – Notifications

To implement notifications, Percolator needs to efficiently find dirty cells with observers that need to be run

To identify dirty cells, Percolator maintains a special “notify” Bigtable column, containing an entry for each dirty cell
– When a transaction writes an observed cell, it also sets the corresponding notify cell

Design – Notifications

Each Percolator worker chooses a portion of the table to scan by picking a region of the table randomly
– To avoid running observers on the same row concurrently, each worker acquires a lock from a lightweight lock service before scanning the row
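A minimal sketch of the notify column and the randomized scan, assuming an in-memory table and using one `threading.Lock` per row as a stand-in for the lightweight lock service. `NotifyTable`, `scan_once`, the region size, and the row names are hypothetical.

```python
import random
import threading
from typing import Callable, Dict, List, Set, Tuple


class NotifyTable:
    def __init__(self, rows: List[str], observed_columns: Set[str]) -> None:
        self.rows = sorted(rows)
        self.observed_columns = observed_columns
        self.data: Dict[Tuple[str, str], str] = {}
        self.notify: Set[Tuple[str, str]] = set()  # one entry per dirty cell
        self.row_locks = {row: threading.Lock() for row in self.rows}

    def write(self, row: str, column: str, value: str) -> None:
        self.data[(row, column)] = value
        if column in self.observed_columns:
            self.notify.add((row, column))  # mark the cell dirty


def scan_once(table: NotifyTable, observers: Dict[str, Callable[[str, str], None]],
              rng: random.Random, region_size: int = 2) -> int:
    """One worker pass: pick a random region and run observers on its dirty cells."""
    start = rng.randrange(len(table.rows))
    handled = 0
    for row in table.rows[start:start + region_size]:
        lock = table.row_locks[row]
        if not lock.acquire(blocking=False):
            continue  # another worker is already scanning this row
        try:
            for cell in sorted(c for c in table.notify if c[0] == row):
                table.notify.discard(cell)
                observers[cell[1]](row, table.data[cell])
                handled += 1
        finally:
            lock.release()
    return handled


seen: List[Tuple[str, str]] = []
notify_table = NotifyTable(rows=["a", "b", "c", "d", "e"], observed_columns={"raw"})
notify_table.write("a", "raw", "page A")
notify_table.write("d", "raw", "page D")
notify_table.write("b", "score", "0.5")  # not an observed column, so not dirty

observers = {"raw": lambda row, value: seen.append((row, value))}
rng = random.Random(0)
for _ in range(1000):  # bounded loop; stops once nothing is dirty
    if not notify_table.notify:
        break
    scan_once(notify_table, observers, rng)
```

The scan only inspects the small set of notify entries rather than the data itself, and the per-row lock keeps two workers from running observers on the same row at once.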


Evaluation

Experiences with converting a MapReduce-based indexing pipeline to use Percolator

Latency
– 100x faster than the previous system

Simplification
– The number of observers in the new system: 10
– The number of MapReduces in the previous system: 100

Easier to operate
– Far fewer moving parts: tablet servers, Percolator workers, chunkservers
– In the old system, each of a hundred different MapReduces needed to be individually configured and could independently fail

Evaluation

Crawl rate benchmark on 240 machines

Evaluation

Versus Bigtable

Evaluation

Fault-tolerance


Conclusion

Percolator provides two main abstractions
– Transactions
  – Cross-row, cross-table with ACID snapshot-isolation semantics
– Observers
  – Similar to database triggers or events


Good and Not So Good Things

Good things
– Simple and neat design
– Clear purpose
– Detailed description based on a real example: Google’s indexing system

Not so good things
– Lack of observer examples (Google’s indexing system in particular)


Thank You!

Any Questions or Comments?