couchconf-sf-introduction-to-mapreduce
TRANSCRIPT
1
Wednesday, August 3, 11
Introduction to MapReduce
2
3
A 'View' Is:
A named pair of functions:
  a 'map' function
  a 'reduce' function (optional)
An entry in a 'design document'
A disk file of indexed results
4
A 'Map Function' Is:
Called with every database document
An 'emitter' of a key and a value
5
A 'Reduce Function' Is:
Called once with the 'map' results
A simplifier (it 'reduces' map output)
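The two definitions above can be tried outside CouchDB with a small sketch. The `runMap` harness and the sample documents are assumptions made for illustration; in CouchDB itself `emit` is a global provided by the query server, whereas here it is passed in as an argument so the code runs standalone.

```javascript
// Sample documents modeled on the deck's "president" and "event" types.
const docs = [
  { _id: "a", type: "president", name: "George Washington", took_office: 1789 },
  { _id: "b", type: "president", name: "John Adams", took_office: 1797 },
  { _id: "c", type: "event", year: 1791, event: "Bill of Rights ratified" },
];

// Hypothetical harness: calls the map function with every document and
// collects whatever it emits, tagging each row with the emitting doc's id.
function runMap(mapFn, allDocs) {
  const rows = [];
  for (const doc of allDocs) {
    const emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc, emit);
  }
  return rows;
}

// Map: emits a key and a value, but only for president documents.
const mapPresidents = (doc, emit) => {
  if (doc.type === "president") emit(doc.took_office, doc.name);
};

// Reduce: simplifies the map output to a single value (here, a row count).
const reduceCount = (rows) => rows.length;

const mapped = runMap(mapPresidents, docs);
console.log(mapped);
console.log(reduceCount(mapped));
```

The event document produces no row because the map function never calls `emit` for it.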
6
Example "President" Document (1 of 44)
{
  "_id": "5d5f25254ef8fd62d6b9f2db642a8fc2",
  "_rev": "1-157b2928bec2def71485cc751af7de37",
  "type": "president",
  "presidency": 1,
  "name": "George Washington",
  "wikipedia_entry": "http://en.wikipedia.org/wiki/George_Washington",
  "took_office": 1789,
  "left_office": 1797,
  "party": "Independent",
  "home_state": "Virginia"
}
7
Example "Event" Document (1 of 883)
{
  "_id": "5d5f25254ef8fd62d6b9f2db642a9f7d",
  "_rev": "1-9630b35932dedbd4d31138aaf3385847",
  "type": "event",
  "year": 1791,
  "event": "The independent Vermont Republic becomes the 14th state"
}
8
Design Document
{
  ...
  "_id": "_design/design_document",
  "_rev": "1-9630b35932dedbd4d31138aaf3385847",
  "views": {
    "party_state_name": { "map": "function ... ", "reduce": " ... " },
    "president_events": { "map": "function ... " },
    "president_names": { "map": "function ... " },
    "presidents": { "map": "function ... ", "reduce": " ... " },
    "time_in_office": { "map": "function ... ", "reduce": " ... " },
    "total_time_in_office": { "map": "function ... ", "reduce": " ... " }
  },
  ...
}
In the "_id", the "_design/" prefix is special; the name after it is one you choose.
president_names
9
10
Invoke a View
11
Invoke a View
curl -X GET http://localhost:5984/presidents/_design/design_doc/_view/president_names
{"total_rows":44,"offset":0,"rows":[
  {"id":"...","key":1789,"value":"George Washington"},
  {"id":"...","key":1797,"value":"John Adams"},
  {"id":"...","key":1801,"value":"Thomas Jefferson"},
  {"id":"...","key":1809,"value":"James Madison"},
  ...
]}
The ids of the emitting documents are always included.
12
Under the Hood: Views
[Diagram: CouchDB components labeled Erlang HTTP, mod_couch, view, query server, storage engine, SpiderMonkey, ICU, and Disk]
Request: http://localhost:5984/presidents/_design/design_doc/_view/president_names
Response: {"total_rows":44, "offset":0, "rows":[...]}
13
Under the Hood: Views
[Diagram: CouchDB components labeled SpiderMonkey, ICU, and Disk]
{"id":"...","key":1797,"value":"John Adams"},
{"id":"...","key":1789,"value":"George Washington"},
...
{"total_rows":44, "offset":0, "rows":[...]}
14
Fetch Documents Matching a Key
GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?key=1993
{"total_rows":44,"offset":41,"rows":[
  {"id":"...","key":1993,"value":"Bill Clinton"}
]}
key: any valid JSON
total_rows: total view rows
offset: offset into rows
rows: the rows with the matching key
15
Get a Key Range of Documents
GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?startkey=1790&endkey=1810
{"total_rows":44,"offset":1,"rows":[
  {"id":"...","key":1797,"value":"John Adams"},
  {"id":"...","key":1801,"value":"Thomas Jefferson"},
  {"id":"...","key":1809,"value":"James Madison"}
]}
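How key, startkey and endkey select rows, and where offset comes from, can be sketched over an in-memory list. The `queryView` helper is hypothetical and handles only numeric keys; the sample holds five rows rather than the deck's 44, so total_rows differs from the slides.

```javascript
// Sample rows, already sorted by key the way a view returns them.
const viewRows = [
  { id: "...", key: 1789, value: "George Washington" },
  { id: "...", key: 1797, value: "John Adams" },
  { id: "...", key: 1801, value: "Thomas Jefferson" },
  { id: "...", key: 1809, value: "James Madison" },
  { id: "...", key: 1817, value: "James Monroe" },
];

// Hypothetical helper mimicking key, startkey and endkey (all inclusive).
function queryView(rows, { key, startkey, endkey } = {}) {
  const lo = key !== undefined ? key : startkey;
  const hi = key !== undefined ? key : endkey;
  const matches = rows.filter(
    (r) => (lo === undefined || r.key >= lo) && (hi === undefined || r.key <= hi)
  );
  // offset: how many rows sort before the first returned row.
  const offset = lo === undefined ? 0 : rows.filter((r) => r.key < lo).length;
  return { total_rows: rows.length, offset, rows: matches };
}

const exact = queryView(viewRows, { key: 1801 });
const range = queryView(viewRows, { startkey: 1790, endkey: 1810 });
console.log(exact.offset, exact.rows.map((r) => r.value));
console.log(range.offset, range.rows.map((r) => r.value));
```

With this sample the range query returns Adams, Jefferson and Madison at offset 1, the same rows and offset as the slide.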
16
Limit the Number of Documents
GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?limit=2
{"total_rows":44,"offset":0,"rows":[
  {"id":"...","key":1789,"value":"George Washington"},
  {"id":"...","key":1797,"value":"John Adams"}
]}
presidents (_count)
17
18
Invoke a View
19
Reduce: _count
{
  "_id": "_design/design_doc",
  "_rev": "1-157b2928bec2def71485cc751af7de37",
  "views": {
    "presidents": {
      "map": "function(doc) { if (doc.type == 'president') { emit(doc.took_office, doc) } }",
      "reduce": "_count"
    }
  },
  ...
}
20
_count Function
GET http://localhost:5984/presidents/_design/design_doc/_view/presidents
{"rows":[
  {"key":null,"value":44}
]}
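A rough JavaScript equivalent of what _count computes is sketched below. It uses the (keys, values, rereduce) signature of CouchDB reduce functions, but it is a stand-in written for illustration, not CouchDB's own code; the sample values are made up.

```javascript
// Stand-in for the built-in _count reduce.
function count(keys, values, rereduce) {
  if (rereduce) {
    // values are counts produced by earlier reduce calls: add them up.
    return values.reduce((a, b) => a + b, 0);
  }
  // values are raw map values: one per emitted row.
  return values.length;
}

// Two partial reductions, then a rereduce that combines them.
const part1 = count(null, ["doc1", "doc2", "doc3"], false);
const part2 = count(null, ["doc4", "doc5"], false);
const total = count(null, [part1, part2], true);
console.log(part1, part2, total);
```

The rereduce branch matters because the reduce step is not guaranteed to see all rows in one call; partial counts have to be summed rather than counted again.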
total_time_in_office (_sum)
21
22
Invoke a View
23
A Map Function for _sum
{
  "_id": "_design/design_doc",
  "_rev": "1-157b2928bec2def71485cc751af7de37",
  "views": {
    "total_time_in_office": {
      "map": "function(doc) { if (doc.type == 'president') { emit(doc.name, doc.left_office - doc.took_office) } }",
      "reduce": "_sum"
    }
  },
  ...
}
key (doc.name): rows will be sorted by name
value: the number of years in office
24
Reduce: _sum
{
  "_id": "_design/design_doc",
  "_rev": "1-157b2928bec2def71485cc751af7de37",
  "views": {
    "total_time_in_office": {
      "map": "function(doc) { if (doc.type == 'president') { emit(doc.name, doc.left_office - doc.took_office) } }",
      "reduce": "_sum"
    }
  },
  ...
}
_sum requires number values
25
_sum Function
GET http://localhost:5984/presidents/_design/design_doc/_view/total_time_in_office
{"rows":[
  {"key":null,"value":232}
]}
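The map-then-sum pipeline can be sketched on three sample presidents. The `toRow` and `sum` helpers are hypothetical: `toRow` mirrors the deck's map function but returns the row instead of calling `emit`, and `sum` is a simplified stand-in for _sum that only accepts plain numbers.

```javascript
const presidents = [
  { type: "president", name: "George Washington", took_office: 1789, left_office: 1797 },
  { type: "president", name: "John Adams", took_office: 1797, left_office: 1801 },
  { type: "president", name: "Thomas Jefferson", took_office: 1801, left_office: 1809 },
];

// Key is the name, value is the number of years in office.
const toRow = (doc) => ({ key: doc.name, value: doc.left_office - doc.took_office });

// Simplified stand-in for _sum: rejects anything that is not a finite number.
function sum(values) {
  return values.reduce((acc, v) => {
    if (typeof v !== "number" || !Number.isFinite(v)) {
      throw new TypeError("_sum requires number values");
    }
    return acc + v;
  }, 0);
}

const years = presidents
  .filter((doc) => doc.type === "president")
  .map(toRow)
  .map((row) => row.value);
console.log(years, sum(years));
```

A document with no left_office would yield NaN from the subtraction, which this sketch rejects rather than silently adding.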
time_in_office (_stats)
26
27
Invoke a View
28
Reduce: _stats
{
  "_id": "_design/design_doc",
  "_rev": "1-157b2928bec2def71485cc751af7de37",
  "views": {
    "time_in_office": {
      "map": "function(doc) { if (doc.type == 'president') { emit(doc.name, doc.left_office - doc.took_office) } }",
      "reduce": "_stats"
    }
  },
  ...
}
29
_stats Function
GET http://localhost:5984/presidents/_design/design_doc/_view/time_in_office
{"rows":[
  {"key":null,
   "value":{"sum":232,"count":43,"min":0,"max":12,"sumsqr":1546}}
]}
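The five fields in the _stats result can be reproduced with a short sketch. The `stats` function is an illustrative stand-in, not CouchDB's implementation, and the input values are the three sample terms 8, 4 and 8 rather than the deck's data.

```javascript
// Stand-in for _stats: one pass that tracks sum, count, min, max and sum of squares.
function stats(values) {
  return values.reduce(
    (s, v) => ({
      sum: s.sum + v,
      count: s.count + 1,
      min: Math.min(s.min, v),
      max: Math.max(s.max, v),
      sumsqr: s.sumsqr + v * v,
    }),
    { sum: 0, count: 0, min: Infinity, max: -Infinity, sumsqr: 0 }
  );
}

const summary = stats([8, 4, 8]);

// The summary is enough to derive the mean and the population variance.
const mean = summary.sum / summary.count;
const variance = summary.sumsqr / summary.count - mean * mean;
console.log(summary, mean, variance);
```

Keeping sumsqr alongside sum and count is what allows the variance to be computed later without revisiting the individual rows.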
View Trees
30
31
Disk-Based View Tree
root:            A-R
interior nodes:  A-H  I-R
interior nodes:  A-C  D-F  G-H  I-L  N-R
leaves:          A B C D F G H I K L N O Q R
k = size of interior node
n = number of keys
depth = log_k(n)
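The depth formula can be evaluated with the change-of-base identity log_k(n) = ln(n) / ln(k). The values of k below are assumptions chosen for illustration (the slide does not state a node size); n = 14 matches the number of keys in the figure.

```javascript
// depth = log_k(n), where k is the interior node size and n the number of keys.
function treeDepth(n, k) {
  return Math.log(n) / Math.log(k);
}

// 14 keys with about 3 entries per node: between 2 and 3 levels.
console.log(treeDepth(14, 3).toFixed(2));

// A larger node size keeps the tree shallow even for many keys.
console.log(treeDepth(1e6, 100).toFixed(2));
```

Because depth grows logarithmically, a lookup touches only a handful of nodes even when the view holds millions of keys.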
32
_count Nodes
root:        14
reductions:  7  7
reductions:  3  2  2  3  4
keys:        A B C D F G H I K L N O Q R
33
_count Nodes
root:        A-R 14
reductions:  A-H 7   I-R 7
reductions:  A-C 3   D-F 2   G-H 2   I-L 3   N-R 4
keys:        A B C D F G H I K L N O Q R
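The numbers stored in the tree can be recomputed from the leaves. In this sketch the grouping of keys into leaves is taken from the figure, while the code structure (arrays and a `rereduce` helper) is an assumption made for illustration.

```javascript
// Leaf nodes as drawn on the slide.
const leaves = [
  ["A", "B", "C"],
  ["D", "F"],
  ["G", "H"],
  ["I", "K", "L"],
  ["N", "O", "Q", "R"],
];

// Reduce step: each leaf stores the count of its own keys.
const leafCounts = leaves.map((keys) => keys.length);

// Rereduce step: a parent stores the sum of its children's counts.
const rereduce = (counts) => counts.reduce((a, b) => a + b, 0);

// A-H covers the first three leaves, I-R the last two.
const interior = [rereduce(leafCounts.slice(0, 3)), rereduce(leafCounts.slice(3))];
const root = rereduce(interior);

console.log(leafCounts, interior, root);
```

The result reproduces the slide: 3, 2, 2, 3, 4 at the leaves, 7 and 7 above them, and 14 at the root.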
34
Inserting a New Document
new key: M
new reductions: M-R 5, I-R 8
new root: A-R 15
[Diagram: the new nodes drawn next to the existing tree]
root:        A-R 14
reductions:  A-H 7   I-R 7
reductions:  A-C 3   D-F 2   G-H 2   I-L 3   N-R 4
keys:        A B C D F G H I K L N O Q R
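35
Committing the Change
[Diagram: the existing tree and the nodes added for key M]
root:        A-R 14
reductions:  A-H 7   I-R 7
reductions:  A-C 3   D-F 2   G-H 2   I-L 3   N-R 4
keys:        A B C D F G H I K L N O Q R
added key: M
added nodes: M-R 5, I-R 8, A-R 15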
36
Getting a Key Range
[Diagram: the tree with startkey and endkey marked on the key row]
root:        A-R 14
reductions:  A-H 7   I-R 7
reductions:  A-C 3   D-F 2   G-H 2   I-L 3   M-R 5
keys:        A B C D F G H I K L M N O Q R
37
Key Range Reduction
[Diagram: the tree with startkey and endkey marked; partial reductions shown for the range: (3), (1), (5), (2), (8)]
root:        15
reductions:  7  8
reductions:  3  2  2  3  5
keys:        A B C D F G H I K L M N O Q R
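The idea behind a key range reduction is to reuse stored reductions for nodes that lie entirely inside the range and to look at individual keys only at the edges. The sketch below works on the leaf level only, and the range C to N is a hypothetical choice (the slide's exact startkey and endkey are not recoverable from the transcript).

```javascript
// Leaves after inserting M, each with its stored _count reduction.
const leafNodes = [
  { keys: ["A", "B", "C"], count: 3 },
  { keys: ["D", "F"], count: 2 },
  { keys: ["G", "H"], count: 2 },
  { keys: ["I", "K", "L"], count: 3 },
  { keys: ["M", "N", "O", "Q", "R"], count: 5 },
];

function countRange(nodes, startkey, endkey) {
  let total = 0;
  for (const node of nodes) {
    const first = node.keys[0];
    const last = node.keys[node.keys.length - 1];
    if (last < startkey || first > endkey) continue; // node is outside the range
    if (first >= startkey && last <= endkey) {
      total += node.count; // node is fully inside: reuse the stored reduction
    } else {
      // node straddles a boundary: count only the keys inside the range
      total += node.keys.filter((k) => k >= startkey && k <= endkey).length;
    }
  }
  return total;
}

console.log(countRange(leafNodes, "C", "N"));
console.log(countRange(leafNodes, "A", "R"));
```

Only the first and last leaves touched by the range need their keys inspected; every node in between contributes its precomputed count.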
More Ways to Use Views
38
39
Skip the Reduce Function
GET http://localhost:5984/presidents/_design/design_doc/_view/time_in_office?reduce=false
{"total_rows":44,"offset":0,"rows":[
  {"id":"...","key":"Abraham Lincoln","value":5},
  {"id":"...","key":"Andrew Jackson","value":8},
  {"id":"...","key":"Andrew Johnson","value":4},
  ...
]}
40
Reversing the Order of Results
GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?descending=true
{"total_rows":44,"offset":0,"rows":[
  {"id":"...","key":2009,"value":"Barack Obama"},
  {"id":"...","key":2001,"value":"George W. Bush"},
  {"id":"...","key":1993,"value":"Bill Clinton"},
  {"id":"...","key":1989,"value":"George H. W. Bush"},
  ...
]}
41
Reversing the Order of a Range
GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?descending=true&startkey=1850&endkey=1790
{"total_rows":44,"offset":0,"rows":[
  {"id":"...","key":2009,"value":"Barack Obama"},
  {"id":"...","key":2001,"value":"George W. Bush"},
  {"id":"...","key":1993,"value":"Bill Clinton"},
  {"id":"...","key":1989,"value":"George H. W. Bush"},
  ...
]}
startkey and endkey are reversed, too
42
Ignore a Given Number of Rows
GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?limit=10&skip=1
skip: avoid large values
43
Paginating (Initial Page)
// first page of documents
GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?limit=2
{"total_rows":44,"offset":0,"rows":[
  {"id":"...","key":1789,"value":"George Washington"},
  {"id":"...","key":1797,"value":"John Adams"}
]}
last key of result: 1797
44
Paginating (Successive Pages)
// successive pages
GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?startkey=1797&skip=1&limit=2
{"total_rows":44,"offset":2,"rows":[
  {"id":"...","key":1801,"value":"Thomas Jefferson"},
  {"id":"...","key":1809,"value":"James Madison"}
]}
startkey=1797: last key of previous result
skip=1: don't include the first document
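The paging scheme (start at the last key of the previous page, skip one row, take limit rows) can be sketched as a loop over an in-memory list. The `page` helper is hypothetical and assumes ascending, unique, numeric keys; with duplicate keys, skipping a single row would not be enough.

```javascript
const allRows = [
  { key: 1789, value: "George Washington" },
  { key: 1797, value: "John Adams" },
  { key: 1801, value: "Thomas Jefferson" },
  { key: 1809, value: "James Madison" },
  { key: 1817, value: "James Monroe" },
];

// Hypothetical helper mimicking startkey, skip and limit on sorted rows.
function page(rows, { startkey, skip = 0, limit }) {
  const from = startkey === undefined ? 0 : rows.findIndex((r) => r.key >= startkey);
  const start = (from === -1 ? rows.length : from) + skip;
  return rows.slice(start, start + limit);
}

const pages = [];
let current = page(allRows, { limit: 2 }); // initial page: no startkey
while (current.length > 0) {
  pages.push(current.map((r) => r.value));
  const lastKey = current[current.length - 1].key;
  // next page: begin at the last key seen and skip that one row
  current = page(allRows, { startkey: lastKey, skip: 1, limit: 2 });
}
console.log(pages);
```

Seeking by key keeps skip at 1 on every request, which fits the earlier advice to avoid large skip values.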
45
Paginating in Reverse Order
// first page of documents
GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?descending=true&limit=2
// successive pages
GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?descending=true&startkey=1797&skip=1&limit=2
startkey=1797: last key of previous result
skip=1: don't include the first document
46
Using a Stale View
GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?stale=ok
47
Updating the View Immediately After
GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?stale=update_after
party_state_name (group & group_level)
48
49
Invoke a View
50
Group Level
GET http://localhost:5984/my_db/_design/ddoc/_view/v1?group_level=2
Map Keys:       ["a",1,1] ["a",3,4] ["a",3,8] ["b",2,6] ["b",2,6] ["c",1,5] ["c",4,2]
Group Level 1:  ["a"] ["b"] ["c"]
Group Level 2:  ["a",1] ["a",3] ["b",2] ["c",1] ["c",4]
only applies to reduce views
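Grouping by level amounts to truncating each array key to its first N elements and reducing the rows that share a prefix. The sketch below uses the map keys from the slide and a count as the reduce; the `groupCount` helper is hypothetical.

```javascript
// Map keys from the slide, in sorted order.
const mapKeys = [
  ["a", 1, 1], ["a", 3, 4], ["a", 3, 8],
  ["b", 2, 6], ["b", 2, 6],
  ["c", 1, 5], ["c", 4, 2],
];

// Truncate every key to `level` elements and count the rows per prefix.
function groupCount(keys, level) {
  const groups = new Map(); // Map keeps insertion order, so output stays sorted
  for (const key of keys) {
    const prefix = JSON.stringify(key.slice(0, level));
    groups.set(prefix, (groups.get(prefix) || 0) + 1);
  }
  return [...groups].map(([prefix, value]) => ({ key: JSON.parse(prefix), value }));
}

console.log(groupCount(mapKeys, 1));
console.log(groupCount(mapKeys, 2));
```

Level 1 yields three groups and level 2 yields five, matching the group keys listed on the slide.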
51
Invoke a View
52
Group
GET http://localhost:5984/my_db/_design/greeting/_view/v1?group=true
one output row for each unique key
equivalent to group_level=infinity
53
Including Full Documents
GET http://localhost:5984/presidents/_design/design_doc/_view/president_names?include_docs=true
{"total_rows":44,"offset":0,"rows":[
  {"id":"...","key":1789,"value":"George Washington","doc":{
    "_id":"...","_rev":"1-...","presidency":"1",
    "wikipedia_entry":"http://en.wikipedia.org/...",
    "took_office":1789,"left_office":1797,"party":"Independent",
    "home_state":"Virginia","name":"George Washington","type":"president"}},
  ...
]}
54
Emitting with include_docs=true
function(doc) { emit("key", aValue) }
  includes the latest rev of the emitter
function(doc) { emit("key", {"_id":"foo", "value":doc}) }
  includes document with id 'foo'
function(doc) { emit("key", {"_rev":doc._rev, "value":doc}) }
  includes this rev of the emitter
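The first two cases can be sketched as a lookup rule: if the emitted value carries an "_id", that document is attached; otherwise the emitting document is. The in-memory `db` map and the `attachDoc` helper are hypothetical, and the "_rev" case is left out because this sketch keeps no revision history.

```javascript
// Hypothetical in-memory database keyed by document id.
const db = new Map([
  ["foo", { _id: "foo", note: "linked document" }],
  ["p1", { _id: "p1", name: "George Washington" }],
]);

// Decide which document to attach to a view row.
function attachDoc(row) {
  const isObject = row.value !== null && typeof row.value === "object";
  const linkedId = isObject ? row.value._id : undefined;
  const docId = linkedId !== undefined ? linkedId : row.id;
  return { ...row, doc: db.get(docId) || null };
}

const plain = attachDoc({ id: "p1", key: "key", value: 42 });
const linked = attachDoc({ id: "p1", key: "key", value: { _id: "foo" } });
console.log(plain.doc, linked.doc);
```

Emitting an "_id" in the value is what lets a row pull in a document other than the one that emitted it.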
55
Requesting Specific Keys
curl -X POST -H "Content-Type: application/json" http://localhost:5984/presidents/_design/design_doc/_view/president_names -d '{"keys":[1789, 1929, 1993, ... ]}'
curl -X POST -H "Content-Type: application/json" http://localhost:5984/presidents/_design/design_doc/_view/president_names?include_docs=true -d '{"keys":[1789, 1929, 1993, ... ]}'
president_events ('join')
56
57
Invoke a View
58
Collating 'Joins'
"views": {
  "president_events": {
    "map": "function(doc) {
      if (doc.type == 'president') {
        emit([doc.took_office], doc.name);
      } else if (doc.type == 'event') {
        emit([doc.year, 0], doc.event);
      }
    }"
  }
}
[doc.took_office]: a one-element array holding the year the president took office
[doc.year, 0]: the year of the event, plus a second array element
59
'Join' Presidents and Events
GET http://localhost:5984/presidents/_design/design_doc/_view/president_events
{"total_rows":883,"offset":0,"rows":[
  {"id":"...","key":[1789],"value":"George Washington"},
  {"id":"...","key":[1790,0],"value":"Rhode Island ratifies the Constitution and becomes 13th state"},
  {"id":"...","key":[1791,0],"value":"Bill of Rights ratified"},
  {"id":"...","key":[1791,0],"value":"First Bank of the United States chartered"},
  ...
]}
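The interleaving above follows from how array keys are ordered: element by element, with a shorter array sorting before a longer one that starts with the same elements. The `compareKeys` function below is a simplified sketch that handles numeric elements only, not CouchDB's full collation rules; the rows are taken from the deck's examples.

```javascript
// Compare two array keys; a prefix sorts before any longer key it begins.
function compareKeys(a, b) {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length;
}

// Emitted rows in arbitrary order.
const emitted = [
  { key: [1791, 0], value: "Bill of Rights ratified" },
  { key: [1797], value: "John Adams" },
  { key: [1790, 0], value: "Rhode Island ratifies the Constitution and becomes 13th state" },
  { key: [1789], value: "George Washington" },
];

const joined = [...emitted].sort((x, y) => compareKeys(x.key, y.key));
console.log(joined.map((r) => JSON.stringify(r.key) + " " + r.value));

// A president's one-element key sorts ahead of an event key for the same year.
console.log(compareKeys([1789], [1789, 0]) < 0);
```

Each president row therefore lands directly before the events that follow in the sort order, which is what makes the single view read like a join.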
End
60