couchconf-sf-introduction-to-mapreduce


Upload: couchbase

Post on 20-Jun-2015


TRANSCRIPT

Page 1: CouchConf-SF-Introduction-to-MapReduce

1

Wednesday, August 3, 11

Page 2: CouchConf-SF-Introduction-to-MapReduce

Introduction to MapReduce


Page 3: CouchConf-SF-Introduction-to-MapReduce

A 'View' Is:

A named pair of functions:
    a 'map' function
    a 'reduce' function (optional)
An entry in a 'design document'
A disk file of indexed results

Page 4: CouchConf-SF-Introduction-to-MapReduce

A 'Map Function' Is:

Called with every database document
An 'emitter' of a key and a value

Page 5: CouchConf-SF-Introduction-to-MapReduce

A 'Reduce Function' Is:

Called once with the 'map' results
A simplifier (it 'reduces' map output)
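To make the map and reduce definitions concrete, here is a minimal in-memory sketch of how the two functions fit together. It is not how CouchDB is implemented: `runView`, the sample documents, and the `mapNames` function are assumptions made for illustration, and `emit` is passed in as an argument, whereas in CouchDB it is a global.

```javascript
// Minimal in-memory sketch of a view (NOT CouchDB's implementation).
// runView is a hypothetical helper: it calls `map` with every document,
// collects what the map function emits, sorts the rows by key, and
// optionally hands the results to `reduce`.
function runView(docs, map, reduce) {
  const rows = [];
  for (const doc of docs) {
    // The map function sees one document at a time and may emit 0..n rows.
    const emit = (key, value) => rows.push({ id: doc._id, key, value });
    map(doc, emit);
  }
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  if (!reduce) return { rows };
  const keys = rows.map((r) => [r.key, r.id]);
  const values = rows.map((r) => r.value);
  return { rows: [{ key: null, value: reduce(keys, values) }] };
}

// Sample documents, shaped like the talk's data (ids are made up).
const docs = [
  { _id: "b", type: "president", name: "John Adams", took_office: 1797 },
  { _id: "a", type: "president", name: "George Washington", took_office: 1789 },
  { _id: "c", type: "event", year: 1791,
    event: "The independent Vermont Republic becomes the 14th state" },
];

// Assumed map function: key is the year, value is the name.
const mapNames = (doc, emit) => {
  if (doc.type === "president") emit(doc.took_office, doc.name);
};
// A reduce that simplifies the map output to a single number.
const count = (keys, values) => values.length;

const mapped = runView(docs, mapNames);
const reduced = runView(docs, mapNames, count);
console.log(JSON.stringify(mapped.rows));
console.log(JSON.stringify(reduced.rows));
```

The event document produces no row because the map function emits only for presidents, and the reduce collapses the two remaining rows into one value.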

Page 6: CouchConf-SF-Introduction-to-MapReduce

Example "President" Document (1 of 44)

{
  "_id": "5d5f25254ef8fd62d6b9f2db642a8fc2",
  "_rev": "1-157b2928bec2def71485cc751af7de37",
  "type": "president",
  "presidency": 1,
  "name": "George Washington",
  "wikipedia_entry": "http://en.wikipedia.org/wiki/George_Washington",
  "took_office": 1789,
  "left_office": 1797,
  "party": "Independent",
  "home_state": "Virginia"
}

Page 7: CouchConf-SF-Introduction-to-MapReduce

Example "Event" Document (1 of 883)

{
  "_id": "5d5f25254ef8fd62d6b9f2db642a9f7d",
  "_rev": "1-9630b35932dedbd4d31138aaf3385847",
  "type": "event",
  "year": 1791,
  "event": "The independent Vermont Republic becomes the 14th state"
}

Page 8: CouchConf-SF-Introduction-to-MapReduce

Design Document

{
  ...
  "_id": "_design/design_document",
  "_rev": "1-9630b35932dedbd4d31138aaf3385847",
  "views": {
    "party_state_name": { "map": "function ... ", "reduce": " ... " },
    "president_events": { "map": "function ... " },
    "president_names": { "map": "function ... " },
    "presidents": { "map": "function ... ", "reduce": " ... " },
    "time_in_office": { "map": "function .... ", "reduce": " ... " },
    "total_time_in_office": { "map": "function .... ", "reduce": " ... " }
  },
  ...
}

Slide annotations on the "_id": "_design/" is special; "design_document" is a name you choose.

Page 9: CouchConf-SF-Introduction-to-MapReduce

president_names


Page 10: CouchConf-SF-Introduction-to-MapReduce

Invoke a View

Page 11: CouchConf-SF-Introduction-to-MapReduce

Invoke a View

curl -X GET http://localhost:5984/presidents/_design/design_doc/_view/president_names

{"total_rows":44,"offset":0,"rows":[
{"id":"...","key":1789,"value":"George Washington"},
{"id":"...","key":1797,"value":"John Adams"},
{"id":"...","key":1801,"value":"Thomas Jefferson"},
{"id":"...","key":1809,"value":"James Madison"},
...
]}

The ids of the emitting documents are always included.

Page 12: CouchConf-SF-Introduction-to-MapReduce

Under the Hood: Views

[Diagram of CouchDB components: Erlang HTTP, mod_couch, storage engine, view, query server, SpiderMonkey, ICU, disk]

Request: http://localhost:5984/presidents/_design/design_doc/_view/president_names
Response: {"total_rows":44, "offset":0, "rows":[...]}

Page 13: CouchConf-SF-Introduction-to-MapReduce

Under the Hood: Views

[Diagram of CouchDB components: SpiderMonkey, ICU, disk]

{"id":"...","key":1797,"value":"John Adams"},
{"id":"...","key":1789,"value":"George Washington"},
...

{"total_rows":44, "offset":0, "rows":[...]}

Page 14: CouchConf-SF-Introduction-to-MapReduce

Fetch Documents Matching a Key

GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?key=1993

The key can be any valid JSON.

{"total_rows":44,"offset":41,"rows":[
{"id":"...","key":1993,"value":"Bill Clinton"}
]}

total_rows: total view rows
offset: offset into rows
key: matching key

Page 15: CouchConf-SF-Introduction-to-MapReduce

Get a Key Range of Documents

GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?startkey=1790&endkey=1810

{"total_rows":44,"offset":1,"rows":[
{"id":"...","key":1797,"value":"John Adams"},
{"id":"...","key":1801,"value":"Thomas Jefferson"},
{"id":"...","key":1809,"value":"James Madison"}
]}

Page 16: CouchConf-SF-Introduction-to-MapReduce

Limit the Number of Documents

GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?limit=2

{"total_rows":44,"offset":0,"rows":[
{"id":"...","key":1789,"value":"George Washington"},
{"id":"...","key":1797,"value":"John Adams"}
]}
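Because `key`, `startkey` and `endkey` take JSON values, a string key has to be sent with its quotes and then percent-encoded. The sketch below shows one way to build such URLs. `viewUrl` and `JSON_PARAMS` are hypothetical names, not part of any CouchDB client, and the `time_in_office` example assumes that view is keyed by president name, as its map function later in the talk suggests.

```javascript
// Hypothetical helper that builds a view URL from query options.
// key, startkey and endkey are JSON-encoded and then percent-encoded;
// other options (for example limit) are passed through as plain text.
const JSON_PARAMS = new Set(["key", "startkey", "endkey"]);

function viewUrl(base, db, designDoc, view, options = {}) {
  const query = Object.entries(options)
    .map(([name, value]) => {
      const text = JSON_PARAMS.has(name) ? JSON.stringify(value) : String(value);
      return `${name}=${encodeURIComponent(text)}`;
    })
    .join("&");
  const path = `${base}/${db}/_design/${designDoc}/_view/${view}`;
  return query ? `${path}?${query}` : path;
}

const base = "http://localhost:5984";
const byKey = viewUrl(base, "presidents", "design_doc", "president_names", {
  key: 1993,
});
const byRange = viewUrl(base, "presidents", "design_doc", "president_names", {
  startkey: 1790,
  endkey: 1810,
});
// A string key keeps its JSON quotes, encoded as %22.
const byName = viewUrl(base, "presidents", "design_doc", "time_in_office", {
  key: "John Adams",
  limit: 2,
});
console.log(byKey);
console.log(byRange);
console.log(byName);
```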

Page 17: CouchConf-SF-Introduction-to-MapReduce

presidents (_count)


Page 18: CouchConf-SF-Introduction-to-MapReduce

Invoke a View

Page 19: CouchConf-SF-Introduction-to-MapReduce

Reduce: _count

{
  "_id": "_design/design_doc",
  "_rev": "1-157b2928bec2def71485cc751af7de37",
  "views": {
    "presidents": {
      "map": "function(doc) { if (doc.type == 'president') { emit(doc.took_office, doc) } }",
      "reduce": "_count"
    }
  },
  ...
}

Page 20: CouchConf-SF-Introduction-to-MapReduce

_count Function

GET http://localhost:5984/presidents/_design/design_doc/_view/presidents

{"rows":[
{"key":null,"value":44}
]}

Page 21: CouchConf-SF-Introduction-to-MapReduce

total_time_in_office (_sum)


Page 22: CouchConf-SF-Introduction-to-MapReduce

Invoke a View

Page 23: CouchConf-SF-Introduction-to-MapReduce

A Map Function for _sum

{
  "_id": "_design/design_doc",
  "_rev": "1-157b2928bec2def71485cc751af7de37",
  "views": {
    "total_time_in_office": {
      "map": "function(doc) { if (doc.type == 'president') { emit(doc.name, doc.left_office - doc.took_office) } }",
      "reduce": "_sum"
    }
  },
  ...
}

The key is doc.name, so rows will be sorted by name.
The value is the number of years in office.

Page 24: CouchConf-SF-Introduction-to-MapReduce

Reduce: _sum

{
  "_id": "_design/design_doc",
  "_rev": "1-157b2928bec2def71485cc751af7de37",
  "views": {
    "total_time_in_office": {
      "map": "function(doc) { if (doc.type == 'president') { emit(doc.name, doc.left_office - doc.took_office) } }",
      "reduce": "_sum"
    }
  },
  ...
}

_sum requires numeric values.

Page 25: CouchConf-SF-Introduction-to-MapReduce

_sum Function

GET http://localhost:5984/presidents/_design/design_doc/_view/total_time_in_office

{"rows":[
{"key":null,"value":232}
]}

Page 26: CouchConf-SF-Introduction-to-MapReduce

time_in_office (_stats)


Page 27: CouchConf-SF-Introduction-to-MapReduce

Invoke a View

Page 28: CouchConf-SF-Introduction-to-MapReduce

Reduce: _stats

{
  "_id": "_design/design_doc",
  "_rev": "1-157b2928bec2def71485cc751af7de37",
  "views": {
    "time_in_office": {
      "map": "function(doc) { if (doc.type == 'president') { emit(doc.name, doc.left_office - doc.took_office) } }",
      "reduce": "_stats"
    }
  },
  ...
}

Page 29: CouchConf-SF-Introduction-to-MapReduce

_stats Function

GET http://localhost:5984/presidents/_design/design_doc/_view/time_in_office

{"rows":[
{"key":null,"value":{"sum":232,"count":43,"min":0,"max":12,"sumsqr":1546}}
]}
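The three built-in reduce functions shown so far (`_count`, `_sum`, `_stats`) can be imitated in a few lines of plain JavaScript, which may help in reading their output. The function names below are made up for this sketch, and CouchDB's own implementations are native rather than JavaScript. The sample values are the terms of the first three presidents, derived from the took_office years in the president_names output.

```javascript
// Plain-JavaScript stand-ins for the built-in reduce functions.
// These are illustrative assumptions that mirror the shape of the
// output; they are not CouchDB's implementation.
const countReduce = (values) => values.length;
const sumReduce = (values) => values.reduce((total, v) => total + v, 0);
const statsReduce = (values) => ({
  sum: sumReduce(values),
  count: values.length,
  min: Math.min(...values),
  max: Math.max(...values),
  sumsqr: sumReduce(values.map((v) => v * v)), // sum of squares
});

// left_office - took_office for Washington, Adams and Jefferson:
// 1797 - 1789, 1801 - 1797, 1809 - 1801.
const yearsInOffice = [8, 4, 8];

console.log(countReduce(yearsInOffice));
console.log(sumReduce(yearsInOffice));
console.log(JSON.stringify(statsReduce(yearsInOffice)));
```

The `sumsqr` field is the sum of the squared values, which lets a client derive a variance from `sum`, `count` and `sumsqr` without fetching every row.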

Page 30: CouchConf-SF-Introduction-to-MapReduce

View Trees


Page 31: CouchConf-SF-Introduction-to-MapReduce

Disk-Based View Tree

root:            A-R
interior nodes:  A-H  I-R
interior nodes:  A-C  D-F  G-H  I-L  N-R
leaves:          A B C D F G H I K L N O Q R

k = size of interior node
n = number of keys
depth = log_k(n)   (logarithm of n to base k)
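The depth formula explains why lookups stay cheap as a view grows. The sketch below evaluates it with integer arithmetic, rounding up to a whole number of levels. `treeDepth` is a hypothetical helper, and the assumption that every interior node is completely full is an idealization.

```javascript
// Smallest depth d such that k^d >= n, i.e. log_k(n) rounded up.
// Assumes k >= 2 and that every interior node holds exactly k entries.
function treeDepth(n, k) {
  let depth = 0;
  let capacity = 1; // number of keys reachable at the current depth
  while (capacity < n) {
    capacity *= k;
    depth += 1;
  }
  return depth;
}

console.log(treeDepth(14, 3)); // 14 keys, 3 entries per node
console.log(treeDepth(1000000, 100)); // a million keys, 100 entries per node
```

With wide nodes, even a million keys need only a handful of levels, so a lookup touches very few nodes on disk.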

Page 32: CouchConf-SF-Introduction-to-MapReduce

_count Nodes

root:        14
reductions:  7  7
reductions:  3  2  2  3  4
keys:        A B C D F G H I K L N O Q R

Page 33: CouchConf-SF-Introduction-to-MapReduce

_count Nodes

root:        A-R 14
reductions:  A-H 7   I-R 7
reductions:  A-C 3   D-F 2   G-H 2   I-L 3   N-R 4
keys:        A B C D F G H I K L N O Q R

Page 34: CouchConf-SF-Introduction-to-MapReduce

Inserting a New Document

new key:         M
new reductions:  M-R 5   I-R 8
new root:        A-R 15

[Tree diagram before the insert]
A-R 14
A-H 7   I-R 7
A-C 3   D-F 2   G-H 2   I-L 3   N-R 4
A B C D F G H I K L N O Q R

Page 35: CouchConf-SF-Introduction-to-MapReduce

Committing the Change

[Tree diagram: new nodes]
A-R 15
I-R 8
M-R 5
M

[Tree diagram: existing nodes]
A-R 14
A-H 7   I-R 7
A-C 3   D-F 2   G-H 2   I-L 3   N-R 4
A B C D F G H I K L N O Q R

Page 36: CouchConf-SF-Introduction-to-MapReduce

Getting a Key Range

[Tree diagram, with startkey and endkey marking a range of keys]
A-R 14
A-H 7   I-R 7
A-C 3   D-F 2   G-H 2   I-L 3   M-R 5
A B C D F G H I K L M N O Q R

Page 37: CouchConf-SF-Introduction-to-MapReduce

Key Range Reduction

[Tree diagram, with startkey and endkey marking a range of keys]
15
7   8
3   2   2   3   5
A B C D F G H I K L M N O Q R

Values shown in parentheses on the slide: (3) (1) (5) (2) (8)
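The point of storing a reduction in every node is that a range query can reuse whole subtrees instead of visiting every key. The sketch below models that idea for `_count` in memory. `buildTree` and `countRange` are hypothetical names, the fanout of 3 is arbitrary, and the real on-disk tree differs in many ways (this sketch also does not handle an empty key list or a fanout below 2).

```javascript
// Sketch of a tree that stores a _count reduction in every node.
// This only models the idea of reusing stored reductions; it is not
// CouchDB's on-disk B-tree.
function buildTree(sortedKeys, fanout) {
  // Leaf level: one node per key, each with a count of 1.
  let level = sortedKeys.map((k) => ({ min: k, max: k, count: 1, children: [] }));
  // Group nodes `fanout` at a time until a single root remains.
  while (level.length > 1) {
    const next = [];
    for (let i = 0; i < level.length; i += fanout) {
      const children = level.slice(i, i + fanout);
      next.push({
        min: children[0].min,
        max: children[children.length - 1].max,
        // The stored reduction combines the children's reductions.
        count: children.reduce((total, c) => total + c.count, 0),
        children,
      });
    }
    level = next;
  }
  return level[0];
}

// Count keys in [startkey, endkey], using a node's stored count
// whenever the node lies entirely inside the range.
function countRange(node, startkey, endkey) {
  if (node.max < startkey || node.min > endkey) return 0; // no overlap
  if (node.min >= startkey && node.max <= endkey) return node.count; // fully inside
  return node.children.reduce(
    (total, c) => total + countRange(c, startkey, endkey),
    0
  );
}

// The 15 keys from the slides, including the inserted key M.
const keys = "A B C D F G H I K L M N O Q R".split(" ");
const root = buildTree(keys, 3);
console.log(root.count); // total number of keys
console.log(countRange(root, "C", "N")); // keys from C through N
```

Only the nodes that straddle `startkey` or `endkey` are opened; everything between them contributes its stored count directly.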

Page 38: CouchConf-SF-Introduction-to-MapReduce

More Ways to Use Views


Page 39: CouchConf-SF-Introduction-to-MapReduce

Skip the Reduce Function

GET http://localhost:5984/presidents/_design/design_doc/_view/time_in_office?reduce=false

{"total_rows":44,"offset":0,"rows":[
{"id":"...","key":"Abraham Lincoln","value":5},
{"id":"...","key":"Andrew Jackson","value":8},
{"id":"...","key":"Andrew Johnson","value":4},
...
]}

Page 40: CouchConf-SF-Introduction-to-MapReduce

Reversing the Order of Results

GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?descending=true

{"total_rows":44,"offset":0,"rows":[
{"id":"...","key":2009,"value":"Barack Obama"},
{"id":"...","key":2001,"value":"George W. Bush"},
{"id":"...","key":1993,"value":"Bill Clinton"},
{"id":"...","key":1989,"value":"George H. W. Bush"},
...
]}

Page 42: CouchConf-SF-Introduction-to-MapReduce

Ignore a Given Number of Rows

GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?limit=10&skip=1

Avoid large values of skip.

Page 43: CouchConf-SF-Introduction-to-MapReduce

Paginating (Initial Page)

// first page of documents
GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?limit=2

{"total_rows":44,"offset":0,"rows":[
{"id":"...","key":1789,"value":"George Washington"},
{"id":"...","key":1797,"value":"John Adams"}
]}

Slide annotation: "last key of result" (here, 1797).
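The slides that follow the initial page are not in this transcript, so the sketch below is only one plausible way to continue from the "last key of result": start the next query at that key and skip one row. It runs against an in-memory array rather than CouchDB. `queryRows` is a hypothetical helper that imitates the `startkey`, `skip` and `limit` options, and it assumes keys are unique (with duplicate keys a real query would also need to identify the last document, for example with CouchDB's `startkey_docid` option).

```javascript
// In-memory stand-in for a view query with startkey, skip and limit.
// queryRows is hypothetical; rows must already be sorted by key.
function queryRows(rows, { startkey, skip = 0, limit = rows.length } = {}) {
  let first = 0;
  if (startkey !== undefined) {
    first = rows.findIndex((row) => row.key >= startkey);
    if (first === -1) first = rows.length; // startkey is past the last row
  }
  return rows.slice(first + skip, first + skip + limit);
}

const rows = [
  { key: 1789, value: "George Washington" },
  { key: 1797, value: "John Adams" },
  { key: 1801, value: "Thomas Jefferson" },
  { key: 1809, value: "James Madison" },
];

// First page: no startkey, just a limit.
const firstPage = queryRows(rows, { limit: 2 });
const lastKey = firstPage[firstPage.length - 1].key;

// Next page: start at the last key seen and skip that one row.
const secondPage = queryRows(rows, { startkey: lastKey, skip: 1, limit: 2 });
console.log(secondPage.map((row) => row.value));
```

Skipping a single row stays cheap, unlike paging with an ever-growing `skip` value.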

Page 46: CouchConf-SF-Introduction-to-MapReduce

Using a Stale View

GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?stale=ok

Page 47: CouchConf-SF-Introduction-to-MapReduce

Updating the View Immediately After

GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?stale=update_after

Page 48: CouchConf-SF-Introduction-to-MapReduce

party_state_name (group & group_level)

Page 49: CouchConf-SF-Introduction-to-MapReduce

Invoke a View

Page 50: CouchConf-SF-Introduction-to-MapReduce

Group Level

GET http://localhost:5984/my_db/_design/ddoc/_view/v1?group_level=2

Map Keys     Group Level 1   Group Level 2
["a",1,1]    ["a"]           ["a",1]
["a",3,4]                    ["a",3]
["a",3,8]
["b",2,6]    ["b"]           ["b",2]
["b",2,6]
["c",1,5]    ["c"]           ["c",1]
["c",4,2]                    ["c",4]

group_level only applies to views that have a reduce function.

Page 51: CouchConf-SF-Introduction-to-MapReduce

Invoke a View

Page 52: CouchConf-SF-Introduction-to-MapReduce

Group

GET http://localhost:5984/my_db/_design/greeting/_view/v1?group=true

One output row for each unique key
Equivalent to group_level=infinity
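Grouping can be pictured as truncating each array key to its first `group_level` elements and reducing the rows that share the truncated key. The sketch below applies that to the map keys from the Group Level slide. `groupReduce` is a hypothetical helper, the value of 1 per row is an assumption (the slide shows only keys), and a count is used as the reduce function.

```javascript
// Sketch of group_level: truncate each array key to `groupLevel`
// elements, then reduce the values that share a truncated key.
// Rows are assumed to arrive sorted by key, as view rows are.
function groupReduce(rows, groupLevel, reduce) {
  const groups = new Map();
  for (const { key, value } of rows) {
    const groupKey = key.slice(0, groupLevel);
    const id = JSON.stringify(groupKey); // arrays cannot be Map keys by value
    if (!groups.has(id)) groups.set(id, { key: groupKey, values: [] });
    groups.get(id).values.push(value);
  }
  return [...groups.values()].map((g) => ({ key: g.key, value: reduce(g.values) }));
}

// Map keys from the slide; the value 1 per row is made up.
const mapKeys = [
  ["a", 1, 1], ["a", 3, 4], ["a", 3, 8],
  ["b", 2, 6], ["b", 2, 6],
  ["c", 1, 5], ["c", 4, 2],
];
const rows = mapKeys.map((key) => ({ key, value: 1 }));
const countValues = (values) => values.length;

const level1 = groupReduce(rows, 1, countValues);
const level2 = groupReduce(rows, 2, countValues);
// group=true behaves like an unbounded group level: one row per unique key.
const exact = groupReduce(rows, Infinity, countValues);

console.log(JSON.stringify(level1));
console.log(JSON.stringify(level2));
console.log(JSON.stringify(exact));
```

The duplicate key `["b",2,6]` is what makes the unbounded grouping return six rows rather than seven.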

Page 53: CouchConf-SF-Introduction-to-MapReduce

Including Full Documents

GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?include_docs=true

{"total_rows":44,"offset":0,"rows":[
{"id":"...","key":1789,"value":"George Washington","doc":{"_id":"...","_rev":"1-...","presidency":"1","wikipedia_entry":"http://en.wikipedia.org/...","took_office":1789,"left_office":1797,"party":"Independent","home_state":"Virginia","name":"George Washington","type":"president"}},
...
]}

Page 54: CouchConf-SF-Introduction-to-MapReduce

Emitting with include_docs=true

function(doc) { emit("key", aValue) }
    includes the latest rev of the emitter

function(doc) { emit("key", {"_id": "foo", "value": doc}) }
    includes document with id 'foo'

function(doc) { emit("key", {"_rev": doc._rev, "value": doc}) }
    includes this rev of the emitter
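The rule for which document `include_docs=true` attaches can be sketched as a lookup: use the `_id` inside the emitted value when there is one, otherwise use the id of the emitting document. The code below models only that `_id` case against an in-memory map. `includeDocs`, `docsById` and the sample documents are made up for illustration, and the `_rev` variant from the slide is not modelled.

```javascript
// Sketch of how include_docs=true picks the document for each row.
// includeDocs is a hypothetical helper, not a CouchDB API.
function includeDocs(rows, docsById) {
  return rows.map((row) => {
    const linked =
      row.value !== null && typeof row.value === "object" && "_id" in row.value;
    // An "_id" in the emitted value points at another document;
    // otherwise the emitting document itself is attached.
    const docId = linked ? row.value._id : row.id;
    return { ...row, doc: docsById.get(docId) ?? null };
  });
}

// Made-up documents keyed by id.
const docsById = new Map([
  ["washington", { _id: "washington", name: "George Washington" }],
  ["foo", { _id: "foo", note: "linked document" }],
]);

const rows = [
  { id: "washington", key: "key", value: 1 },
  { id: "washington", key: "key", value: { _id: "foo" } },
];

const withDocs = includeDocs(rows, docsById);
console.log(JSON.stringify(withDocs, null, 2));
```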

Page 55: CouchConf-SF-Introduction-to-MapReduce

Requesting Specific Keys

curl -X POST -H "Content-Type: application/json" "http://localhost:5984/presidents/_design/design_doc/_view/president_names" -d '{"keys":[1789, 1929, 1993, ... ]}'

curl -X POST -H "Content-Type: application/json" "http://localhost:5984/presidents/_design/design_doc/_view/president_names?include_docs=true" -d '{"keys":[1789, 1929, 1993, ... ]}'

Page 56: CouchConf-SF-Introduction-to-MapReduce

president_events ('join')


Page 57: CouchConf-SF-Introduction-to-MapReduce

Invoke a View

Page 58: CouchConf-SF-Introduction-to-MapReduce

Collating 'Joins'

"views": {
  "president_events": {
    "map": "function(doc) {
      if (doc.type == 'president') {
        emit([doc.took_office], doc.name);
      } else if (doc.type == 'event') {
        emit([doc.year, 0], doc.event);
      }
    }"
  }
}

Presidents: the key is a one-element array holding the year the president took office.
Events: the key holds the year of the event, plus a second array element (0).

Page 59: CouchConf-SF-Introduction-to-MapReduce

'Join' Presidents and Events

GET http://localhost:5984/presidents/_design/design_doc/_view/president_events

{"total_rows":883,"offset":0,"rows":[
{"id":"...","key":[1789],"value":"George Washington"},
{"id":"...","key":[1790,0],"value":"Rhode Island ratifies the Constitution and becomes 13th state"},
{"id":"...","key":[1791,0],"value":"Bill of Rights ratified"},
{"id":"...","key":[1791,0],"value":"First Bank of the United States chartered"},
...
]}
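The 'join' works because of how array keys sort: arrays are compared element by element, and a shorter array sorts before a longer one that starts with the same elements. A president's `[year]` therefore lands just ahead of any `[year, 0]` event from the same year. The sketch below reproduces that ordering in memory and then groups events under the preceding president. `compareKeys` is a simplified comparison written for this example (arrays of numbers only), not CouchDB's full collation, and the grouping loop is an assumption about how a client might consume the rows.

```javascript
// Simplified key comparison for this example only: arrays of numbers.
// Elements are compared in order; if one array is a prefix of the
// other, the shorter array sorts first.
function compareKeys(a, b) {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i += 1) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return a.length - b.length;
}

// Rows as the map function might emit them, in no particular order.
const emitted = [
  { key: [1791, 0], value: "Bill of Rights ratified" },
  { key: [1797], value: "John Adams" },
  { key: [1789], value: "George Washington" },
  { key: [1790, 0], value: "Rhode Island ratifies the Constitution and becomes 13th state" },
];
const sorted = [...emitted].sort((x, y) => compareKeys(x.key, y.key));

// Walk the sorted rows: a one-element key starts a new president,
// and every event row is attached to the president before it.
const joined = [];
for (const row of sorted) {
  if (row.key.length === 1) {
    joined.push({ president: row.value, events: [] });
  } else if (joined.length > 0) {
    joined[joined.length - 1].events.push(row.value);
  }
}
console.log(JSON.stringify(joined, null, 2));
```

No second query is needed: one pass over the collated rows yields each president followed by the events of that term.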

Page 60: CouchConf-SF-Introduction-to-MapReduce

End
