Map/Confused? A practical approach to Map/Reduce with MongoDB


Uploaded by: uwe-seiler

Posted on 15-Jan-2015


Category: Technology



DESCRIPTION

Talk given at MongoDB Munich on 16.10.2012 about the different approaches MongoDB offers for applying the Map/Reduce algorithm. The talk compares the performance of MongoDB's built-in Map/Reduce, group(), aggregate(), find() and the MongoDB-Hadoop Adapter on a practical use case.

TRANSCRIPT

Page 1: Map/Confused? A practical approach to Map/Reduce with MongoDB
Page 19: Sample document from the tweets collection

{
    "_id" : ObjectId("4fb9fb91d066d657de8d6f36"),
    "text" : "MongoDB uses Map/Reduce #epic #win",
    "user" : {
        "friends_count" : 73,
        "followers_count" : 102,
        "id" : 53507833
    }
}

Page 20: Test setup (a sharded cluster) and the approaches compared

mongod --rest --shardsvr --port 27017 --dbpath /tmp/shard1/ --smallfiles

mongod --rest --shardsvr --port 27018 --dbpath /tmp/shard2/ --smallfiles

mongod --configsvr --port 10000 --dbpath /tmp/config/ --smallfiles

mongos --port 22222 --configdb localhost:10000

1. db.tweets.mapReduce()
2. db.tweets.group()
3. db.tweets.aggregate()
4. MongoDB-Hadoop Adapter
5. db.tweets.find()

Page 21: Helper for timing each approach in the mongo shell

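// Calls c and returns its results together with the elapsed time in ms.
// Note: find() returns a lazy cursor; have c call .toArray() if the query
// itself should be included in the measured time.
var measure = function(c) {
    var start = Date.now();
    var results = c();
    var duration = Date.now() - start;
    return { results: results, duration: duration };
};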
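If the comparison is driven from Python rather than the mongo shell, the same measure idea could be sketched as below. This is an illustrative Python counterpart, not part of the talk: it uses time.perf_counter() instead of Date.now(), and the workload in the usage lines is only a placeholder.

```python
import time


def measure(c):
    """Call the zero-argument callable c; return its result and the elapsed time in ms."""
    start = time.perf_counter()
    results = c()
    duration_ms = (time.perf_counter() - start) * 1000.0
    return {"results": results, "duration": duration_ms}


if __name__ == "__main__":
    # Placeholder workload; substitute the query being timed.
    timing = measure(lambda: sum(range(1_000_000)))
    print(timing["results"], "in", round(timing["duration"], 1), "ms")
```

perf_counter() is monotonic, so the duration cannot go negative if the system clock is adjusted mid-measurement.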

Page 23: Approach 1, db.tweets.mapReduce(): map function

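var mapFn = function() {
    // One emit per tweet that has a user; every value goes to the single key "user"
    if (this.user != null) {
        emit("user",
             {userName: this.user.name,
              followers: this.user.followers_count});
    }
};

Page 24: Approach 1, db.tweets.mapReduce(): reduce function

var reduceFn = function(key, values) {
    // Keep the value with the highest follower count. The result has the same
    // shape as the emitted values, so it can safely be reduced again.
    var result = null;
    values.forEach(function(value) {
        if (result == null ||
            result.followers < value.followers) {
            result = value;
        }
    });
    return result;
};

db.tweets.mapReduce(mapFn, reduceFn, { out: { inline: 1 } })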
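MongoDB may call a reduce function several times for the same key and feed an earlier result back in as one of the input values, so reduce has to return a value shaped like the emitted values and give the same answer when re-applied. The following is a toy, single-process Python imitation of that contract, not MongoDB's implementation; map_tweet, reduce_top and run_map_reduce are hypothetical names, and the chunked re-reduce only mimics how partial results might be combined.

```python
from collections import defaultdict


def map_tweet(doc, emit):
    # Same logic as the map function: tweets with a user emit under the single key "user".
    user = doc.get("user")
    if user is not None:
        emit("user", {"userName": user.get("name"),
                      "followers": user["followers_count"]})


def reduce_top(key, values):
    # Same logic as the reduce function: keep the value with the most followers.
    result = None
    for value in values:
        if result is None or result["followers"] < value["followers"]:
            result = value
    return result


def run_map_reduce(docs, mapper, reducer, chunk_size=2):
    """Toy imitation: map every document, reduce each key's values in chunks,
    then reduce the partial results again, as MongoDB is allowed to do."""
    emitted = defaultdict(list)
    for doc in docs:
        mapper(doc, lambda k, v: emitted[k].append(v))
    out = {}
    for key, values in emitted.items():
        partials = [reducer(key, values[i:i + chunk_size])
                    for i in range(0, len(values), chunk_size)]
        out[key] = reducer(key, partials)
    return out


if __name__ == "__main__":
    tweets = [
        {"user": {"name": "a", "followers_count": 102}},
        {"text": "no user"},
        {"user": {"name": "b", "followers_count": 500}},
    ]
    print(run_map_reduce(tweets, map_tweet, reduce_top))
```

Because the map function emits everything under one key, all values for the whole collection end up in a single reduce chain for that key.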

Page 26: Approach 2, db.tweets.group()

db.tweets.group({
    // key: {} puts every document into one single group
    key: {},
    initial: { name: '', followers_count: 0 },
    // called once per document; prev is the running result for the group
    reduce: function(obj, prev) {
        if (obj.user != null &&
            prev.followers_count < obj.user.followers_count) {
            prev.name = obj.user.name;
            prev.followers_count = obj.user.followers_count;
        }
    }
})
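With key: {}, group() treats the whole collection as one group: it starts from a copy of initial and calls reduce(obj, prev) once per document, mutating prev in place, and the shell helper returns an array of groups. A minimal Python sketch of that folding behaviour, assuming a single group and ignoring all server-side details; group and keep_top_user are hypothetical names.

```python
import copy


def group(docs, initial, reduce_fn):
    """Toy imitation of group() with key: {}: every document joins one group.
    prev starts as a deep copy of initial and reduce_fn(obj, prev) mutates it."""
    prev = None
    for obj in docs:
        if prev is None:
            prev = copy.deepcopy(initial)
        reduce_fn(obj, prev)
    # Return a list of groups; empty here if no documents were seen.
    return [] if prev is None else [prev]


def keep_top_user(obj, prev):
    # Same logic as the reduce function passed to db.tweets.group().
    user = obj.get("user")
    if user is not None and prev["followers_count"] < user["followers_count"]:
        prev["name"] = user["name"]
        prev["followers_count"] = user["followers_count"]


if __name__ == "__main__":
    tweets = [{"user": {"name": "a", "followers_count": 73}},
              {"text": "no user"},
              {"user": {"name": "b", "followers_count": 102}}]
    print(group(tweets, {"name": "", "followers_count": 0}, keep_top_user))
```

Copying initial matters: mutating the caller's dict directly would leak state between runs.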

Page 28: Approach 3, db.tweets.aggregate()

db.tweets.aggregate([
    // highest follower count per user name
    {$group: {
        _id: {user_name: "$user.name"},
        followers_count: {$max: "$user.followers_count"}
    }},
    {$sort: {"followers_count": -1}},
    {$limit: 1},
    // flatten the result: drop _id, lift user_name out of it
    {$project: {
        _id: 0,
        user_name: "$_id.user_name",
        followers_count: "$followers_count"
    }}
])
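To make the stage order concrete, here is a rough in-memory Python walk-through of the same pipeline: group by user name keeping the maximum follower count, sort descending, keep one document, then reshape it. It is an illustrative sketch only; aggregate_top_user is a hypothetical name, and letting missing values sort last is an assumption modelled on MongoDB ordering null below numbers.

```python
def aggregate_top_user(docs):
    """Toy, in-memory walk through the pipeline: $group, $sort, $limit, $project."""
    # $group by user name; like $max, ignore missing follower counts.
    groups = {}
    for doc in docs:
        user = doc.get("user") or {}
        name = user.get("name")
        followers = user.get("followers_count")
        best = groups.get(name)
        if followers is not None and (best is None or followers > best):
            best = followers
        groups[name] = best
    grouped = [{"_id": {"user_name": n}, "followers_count": f}
               for n, f in groups.items()]

    # $sort: {followers_count: -1}; None sorts last, like null in a descending sort.
    grouped.sort(key=lambda g: (g["followers_count"] is not None,
                                g["followers_count"] or 0),
                 reverse=True)

    # $limit: 1
    grouped = grouped[:1]

    # $project: drop _id, lift user_name out of it.
    return [{"user_name": g["_id"]["user_name"],
             "followers_count": g["followers_count"]} for g in grouped]


if __name__ == "__main__":
    tweets = [{"user": {"name": "a", "followers_count": 73}},
              {"user": {"name": "a", "followers_count": 150}},
              {"user": {"name": "b", "followers_count": 102}}]
    print(aggregate_top_user(tweets))
```

Unlike the map/reduce and group() versions, the pipeline keeps one row per user until $limit, which is why the sort has to come before the limit.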

Page 30: Approach 4, MongoDB-Hadoop Adapter: mapper.py

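#!/usr/bin/env python
# encoding: utf-8
# Python 2 script (print >> syntax), run by Hadoop streaming via pymongo_hadoop
import sys
sys.path.append(".")
from pymongo_hadoop import BSONMapper

def mapper(documents):
    # one record per tweet that has a user: user name as _id plus follower count
    for doc in documents:
        if doc.get('user') is not None:
            yield {'_id': doc['user']['name'].encode('utf-8'),
                   'followers': doc['user']['followers_count']}

BSONMapper(mapper)
print >> sys.stderr, "Done Mapping!"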

Page 31: MongoDB-Hadoop Adapter: reducer.py

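#!/usr/bin/env python
# encoding: utf-8
# Python 2 script (print >> syntax), run by Hadoop streaming via pymongo_hadoop
import sys
sys.path.append('.')
from pymongo_hadoop import BSONReducer

def reducer(key, values):
    # key is a user name; values are that user's records from the mapper
    print >> sys.stderr, "Processing key %s" % key.encode('utf-8')
    _count = 0
    for v in values:
        # keep the highest follower count seen for this user
        if _count < v['followers']:
            _count = v["followers"]
    return {"_id": key.encode('utf-8'), "count": _count}

BSONReducer(reducer)
print >> sys.stderr, "Done Reducing!"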
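Outside Hadoop, the streaming job can be approximated in a single process: run the mapper, sort its output by _id (standing in for Hadoop's shuffle), then hand each key's records to the reducer. The sketch below reuses the logic of mapper.py and reducer.py but drops the pymongo_hadoop BSON plumbing and the .encode('utf-8') calls; simulate_streaming_job is a hypothetical helper. As the reducer is written, the job yields one document per user name, not just the single top user.

```python
from itertools import groupby
from operator import itemgetter


def mapper(documents):
    # Same logic as mapper.py, without the BSON/stdin plumbing.
    for doc in documents:
        if doc.get("user") is not None:
            yield {"_id": doc["user"]["name"],
                   "followers": doc["user"]["followers_count"]}


def reducer(key, values):
    # Same logic as reducer.py: highest follower count seen for this key.
    _count = 0
    for v in values:
        if _count < v["followers"]:
            _count = v["followers"]
    return {"_id": key, "count": _count}


def simulate_streaming_job(documents):
    """Toy stand-in for Hadoop's map -> sort/shuffle -> reduce cycle, in one process."""
    mapped = sorted(mapper(documents), key=itemgetter("_id"))
    return [reducer(key, group)
            for key, group in groupby(mapped, key=itemgetter("_id"))]


if __name__ == "__main__":
    tweets = [{"user": {"name": "b", "followers_count": 10}},
              {"user": {"name": "a", "followers_count": 73}},
              {"text": "no user"},
              {"user": {"name": "a", "followers_count": 102}}]
    per_user = simulate_streaming_job(tweets)
    print(per_user)
    print(max(per_user, key=itemgetter("count")))
```

Picking the overall top user from the per-user output then takes one more step, for example max(per_user, key=itemgetter("count")) as in the usage lines, or a sort on count over the top_user collection.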

Page 32: Running the Hadoop streaming job

hadoop jar /usr/lib/hadoop/lib/mongo-hadoop-streaming-assembly-1.1.0-SNAPSHOT.jar \
    -files mapper.py,reducer.py \
    -inputURI mongodb://localhost:27017/twitter.tweets \
    -outputURI mongodb://localhost:27017/twitter.top_user \
    -mapper mapper.py \
    -reducer reducer.py
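When the job is launched from a script, building the argument list explicitly avoids a common -files pitfall: Hadoop expects the file list as one comma-separated value, and writing "mapper.py, reducer.py" with a space makes the shell pass two separate arguments. A small Python sketch, assuming Python 3.8+ for shlex.join; the jar path and URIs are copied from the command above, and build_streaming_command is a hypothetical helper.

```python
import shlex

# Jar path copied from the command above; adjust to the local installation.
STREAMING_JAR = "/usr/lib/hadoop/lib/mongo-hadoop-streaming-assembly-1.1.0-SNAPSHOT.jar"


def build_streaming_command(input_uri, output_uri,
                            mapper="mapper.py", reducer="reducer.py"):
    """Return the argv list for the streaming job; -files gets one value, no spaces."""
    return [
        "hadoop", "jar", STREAMING_JAR,
        "-files", f"{mapper},{reducer}",
        "-inputURI", input_uri,
        "-outputURI", output_uri,
        "-mapper", mapper,
        "-reducer", reducer,
    ]


if __name__ == "__main__":
    cmd = build_streaming_command(
        "mongodb://localhost:27017/twitter.tweets",
        "mongodb://localhost:27017/twitter.top_user",
    )
    print(shlex.join(cmd))
    # To actually launch it (needs a Hadoop installation):
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run bypasses shell word splitting altogether, so the comma-separated value reaches Hadoop intact.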

Page 34: Approach 5, db.tweets.find()

db.tweets.find().sort( {"user.followers_count": -1} ).limit(1)
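find() answers the question without any aggregation at all: sort descending on user.followers_count and keep the first document. A Python sketch of the same semantics on an in-memory list, assuming documents without the field should come last, as missing or null values do in a descending MongoDB sort; find_top_tweet is a hypothetical name.

```python
def find_top_tweet(docs):
    """Toy equivalent of find().sort({"user.followers_count": -1}).limit(1) on a list.
    Documents lacking the field sort after all numbers, as missing/null values do
    in a descending MongoDB sort."""
    def sort_key(doc):
        followers = (doc.get("user") or {}).get("followers_count")
        return (followers is not None, followers if followers is not None else 0)

    return sorted(docs, key=sort_key, reverse=True)[:1]


if __name__ == "__main__":
    tweets = [{"_id": 1, "user": {"followers_count": 73}},
              {"_id": 2, "text": "no user"},
              {"_id": 3, "user": {"followers_count": 102}}]
    print(find_top_tweet(tweets))
```

On the server, an index on user.followers_count lets MongoDB satisfy this sort by walking the index instead of sorting the whole collection in memory.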

Page 36: Comparing the five approaches

db.tweets.mapReduce()

db.tweets.group()

db.tweets.aggregate()

MongoDB-Hadoop Adapter

db.tweets.find()

