1403 app dev series - session 5 - analytics
![Page 1: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/1.jpg)
Application Development Series
Back to Basics
Reporting & Analytics
Daniel Roberts @dmroberts
#MongoDBBasics
![Page 2: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/2.jpg)
Agenda
• Recap from last session
• Reporting / Analytics options
• Map Reduce
• Aggregation Framework introduction
– Aggregation explain
• mycms application reports
• Geospatial with Aggregation Framework
• Text Search with Aggregation Framework
![Page 3: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/3.jpg)
Q & A
• Virtual Genius Bar
– Use the chat to post questions
– EMEA Solution Architecture / Support team are on hand
– Make use of them during the sessions!!!
![Page 4: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/4.jpg)
Recap from last time….
![Page 5: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/5.jpg)
Indexing
• Indexes
• Multikey, compound, 'dot.notation'
• Covered, sorting
• Text, GeoSpatial
• Btrees
> db.articles.ensureIndex( { author : 1, tags : 1 } )
> db.user.find( { user : "danr" }, { _id : 0, password : 1 } )
> db.articles.ensureIndex( { location : "2dsphere" } )
> db.articles.ensureIndex( { "$**" : "text" }, { name : "TextIndex" } )
db.col.ensureIndex( { key : type }, options )
![Page 6: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/6.jpg)
Index performance / efficiency
• Examine index plans
• Identify slow queries
• n / nscanned ratio
• Which index used.
Operators: .explain(), db profiler
> db.articles.find( { author : 'Dan Roberts' } ).sort( { date : -1 } ).explain()
> db.setProfilingLevel(1, 100)
{ "was" : 0, "slowms" : 100, "ok" : 1 }
> db.system.profile.find().pretty()
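The n / nscanned ratio compares the documents returned with the index entries or documents examined; a value close to 1 suggests the index is doing the filtering. A minimal Python sketch of that check follows. The helper name and both sample explain documents are hypothetical, and the field names assume the legacy explain format that the shell commands above return.

```python
def index_efficiency(explain_doc):
    """Return n / nscanned for a legacy-format explain document."""
    n = explain_doc.get("n", 0)
    nscanned = explain_doc.get("nscanned", 0)
    if nscanned == 0:
        # Nothing was scanned, so there is no ratio to report.
        return None
    return n / nscanned


# Hypothetical explain results, not taken from the slides.
indexed = {"cursor": "BtreeCursor author_1_tags_1", "n": 6, "nscanned": 6}
unindexed = {"cursor": "BasicCursor", "n": 6, "nscanned": 1200}

print(index_efficiency(indexed))    # 1.0
print(index_efficiency(unindexed))  # 0.005
```

A ratio far below 1, as in the second case, is usually the cue to look at which index (if any) the query used.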
![Page 7: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/7.jpg)
Reporting / Analytics options
![Page 8: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/8.jpg)
Access data for reporting, options
• Query Language
– Leverage pre aggregated documents
• Aggregation Framework
– Calculate new values from the data that we have
– For instance: average views, comments count
• MapReduce
– Internal JavaScript based implementation
– External Hadoop, using the MongoDB connector
• A combination of the above
![Page 9: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/9.jpg)
Pre Aggregated Reports
• Immediate results
– Simple from a query perspective.
– Interactions collection
{
  '_id' : ObjectId(..),
  'article_id' : ObjectId(..),
  'section' : 'schema',
  'date' : ISODate(..),
  'daily' : { 'view' : 45, 'comments' : 150 },
  'hourly' : {
    0 : { 'view' : 10 },
    1 : { 'view' : 2 },
    …
    23 : { 'view' : 14, 'comments' : 10 }
  }
}
> db.interactions.find(
    { "article_id" : ObjectId("…..") },
    { _id : 0, hourly : 1 }
  )
![Page 10: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/10.jpg)
Pre Aggregated Reports
• Use query result to display directly in application
– Create new REST API
– D3.js library or similar in UI
{
  "hourly" : {
    "0" : { "view" : 1 },
    "1" : { "view" : 1 },
    ……
    "22" : { "view" : 5 },
    "23" : { "view" : 3 }
  }
}
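Before handing the hourly sub-document to a charting library, it usually helps to flatten it into a fixed-length series. The following Python sketch shows one way to do that; the function name is hypothetical, and it assumes hours with no activity are simply absent from the document.

```python
def hourly_series(doc, field="view"):
    """Turn the 'hourly' sub-document into a list of 24 counts."""
    hourly = doc.get("hourly", {})
    # Hours with no recorded activity are missing, so default them to 0.
    return [hourly.get(str(hour), {}).get(field, 0) for hour in range(24)]


# Hypothetical query result shaped like the output on the slide.
result = {"hourly": {"0": {"view": 1}, "1": {"view": 1},
                     "22": {"view": 5}, "23": {"view": 3}}}

series = hourly_series(result)
print(series[0], series[22], series[23], sum(series))  # 1 5 3 10
```

The resulting list can be returned from a REST endpoint as JSON and bound directly to a bar or line chart.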
![Page 11: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/11.jpg)
Map Reduce
![Page 12: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/12.jpg)
Map Reduce
• Map Reduce
– MongoDB – JavaScript
• Incremental Map Reduce
// Map Reduce Example
> db.articles.mapReduce(
    function() { emit(this.author, this.comment_count); },
    function(key, values) { return Array.sum(values); },
    {
      query : {},
      out : { merge : "comment_count" }
    }
  )
Output
{ "_id" : "Dan Roberts", "value" : 6 }
{ "_id" : "Jim Duffy", "value" : 1 }
{ "_id" : "Kunal Taneja", "value" : 2 }
{ "_id" : "Paul Done", "value" : 2 }
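To make the map and reduce steps concrete, here is a small pure-Python sketch of the same idea: emit a key and value per document, group by key, then reduce each group. It is an illustration of the pattern only, not how the server executes mapReduce, and the function names and sample articles are hypothetical.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    """Run map_fn over docs, group emitted pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def emit_comment_count(doc):
    # Mirrors emit(this.author, this.comment_count) in the shell example.
    yield doc["author"], doc["comment_count"]


def sum_values(key, values):
    # Mirrors Array.sum(values).
    return sum(values)


# Hypothetical articles, not the data behind the slide's output.
articles = [
    {"author": "Dan Roberts", "comment_count": 4},
    {"author": "Dan Roberts", "comment_count": 2},
    {"author": "Jim Duffy", "comment_count": 1},
]

print(map_reduce(articles, emit_comment_count, sum_values))
# {'Dan Roberts': 6, 'Jim Duffy': 1}
```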
![Page 13: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/13.jpg)
Hadoop Integration
MongoDB – Hadoop Connector
[Diagram: Application nodes and Dash Boards / Reporting; MongoS routers; four replica sets, each with one Primary and two Secondaries; HDFS and MapReduce nodes.]
1) Data Flow, Input / Output via Application Tier
![Page 14: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/14.jpg)
Aggregation Framework
![Page 15: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/15.jpg)
Aggregation Framework
• Multi-stage pipeline
– Like a unix pipe: "ps -ef | grep mongod"
– Aggregate data, Transform documents
– Implemented in the core server
// Find out which are the most popular tags…
db.articles.aggregate([
  { $unwind : "$tags" },
  { $group : { _id : "$tags", number : { $sum : 1 } } },
  { $sort : { number : -1 } }
])
Output
{ "_id" : "mongodb", "number" : 6 }
{ "_id" : "nosql", "number" : 3 }
{ "_id" : "database", "number" : 1 }
{ "_id" : "aggregation", "number" : 1 }
{ "_id" : "node", "number" : 1 }
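For readers new to pipelines, the three stages above can be mimicked in plain Python to see what each one contributes. This is a sketch for intuition only: the function name and sample articles are hypothetical, and the real work happens inside the server.

```python
from collections import Counter


def popular_tags(articles):
    """Emulate $unwind on tags, $group with $sum: 1, then $sort descending."""
    counts = Counter()
    for article in articles:
        # $unwind: one logical document per tag in the array.
        for tag in article.get("tags", []):
            # $group with $sum: 1 counts one per unwound document.
            counts[tag] += 1
    # $sort: highest count first.
    return [{"_id": tag, "number": n} for tag, n in counts.most_common()]


# Hypothetical articles.
articles = [
    {"title": "a", "tags": ["mongodb", "nosql"]},
    {"title": "b", "tags": ["mongodb"]},
    {"title": "c", "tags": []},
]

print(popular_tags(articles))
# [{'_id': 'mongodb', 'number': 2}, {'_id': 'nosql', 'number': 1}]
```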
![Page 16: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/16.jpg)
In our mycms application..
# Our new python route
@app.route('/cms/api/v1.0/tag_counts', methods=['GET'])
def tag_counts():
    pipeline = [
        { "$unwind" : "$tags" },
        { "$group" : { "_id" : "$tags", "number" : { "$sum" : 1 } } },
        { "$sort" : { "number" : -1 } }
    ]
    cur = db['articles'].aggregate(pipeline, cursor={})
    # Check everything ok
    if not cur:
        abort(400)
    # iterate the cursor and add docs to a list
    tags = [tag for tag in cur]
    return jsonify({'tags' : json.dumps(tags, default=json_util.default)})
![Page 17: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/17.jpg)
Aggregation operators
• Pipeline and Expression operators
Pipeline: $match, $sort, $limit, $skip, $project, $unwind, $group, $geoNear, $text, $search
Expression: $addToSet, $first, $last, $max, $min, $avg, $push, $sum
Arithmetic: $add, $divide, $mod, $multiply, $subtract
Conditional: $cond, $ifNull
Variables: $let, $map
Tip: Other operators for date, time, boolean and string manipulation
![Page 18: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/18.jpg)
Application Reports
• What reports and analytics do we need in our application?
– Popular Tags
– Popular Articles
– Popular Locations – integration with Geo Spatial
– Average views per hour or day
![Page 19: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/19.jpg)
Popular Tags
• Unwind each ‘tags’ array
• Group and count each one, then Sort
• Output to new collection
– Query from the new collection so we don’t need to compute for every request.
db.articles.aggregate([
  { $unwind : "$tags" },
  { $group : { _id : "$tags", number : { $sum : 1 } } },
  { $sort : { number : -1 } },
  { $out : "tags" }
])
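The choice between computing on each request and storing the result comes down to whether the pipeline ends in $out. One way to keep both options open in the Python application is a small builder like the sketch below; the function name and its parameter are hypothetical.

```python
def tag_count_pipeline(out_collection=None):
    """Build the popular-tags pipeline; append $out when a target is named."""
    pipeline = [
        {"$unwind": "$tags"},
        {"$group": {"_id": "$tags", "number": {"$sum": 1}}},
        {"$sort": {"number": -1}},
    ]
    if out_collection is not None:
        # $out must be the last stage of the pipeline.
        pipeline.append({"$out": out_collection})
    return pipeline


on_the_fly = tag_count_pipeline()
stored = tag_count_pipeline("tags")
print(len(on_the_fly), stored[-1])  # 3 {'$out': 'tags'}
```

Either list can be passed to the collection's aggregate call; with the stored variant, later requests read from the "tags" collection instead of recomputing.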
![Page 20: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/20.jpg)
Popular Articles
• Top 5 articles by average daily views
– Use the $avg operator
– Use $match to constrain the date range
• Utilise with $gt and $lt operators
db.interactions.aggregate([
  { $match : { date : { $gt : ISODate("2014-02-20T00:00:00.000Z") } } },
  { $group : { _id : "$article_id", a : { $avg : "$daily.view" } } },
  { $sort : { a : -1 } },
  { $limit : 5 }
]);
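The effect of each stage in this pipeline can be traced with a plain-Python equivalent. The sketch below is for intuition only; the function name and the sample interaction documents are hypothetical, with article ids shown as plain strings.

```python
from collections import defaultdict
from datetime import datetime


def top_articles(interactions, since, limit=5):
    """Emulate $match on date, $group with $avg, $sort descending and $limit."""
    views = defaultdict(list)
    for doc in interactions:
        if doc["date"] > since:  # $match with $gt
            views[doc["article_id"]].append(doc["daily"]["view"])
    averages = [
        {"_id": article_id, "a": sum(v) / len(v)}  # $group with $avg
        for article_id, v in views.items()
    ]
    averages.sort(key=lambda row: row["a"], reverse=True)  # $sort
    return averages[:limit]  # $limit


# Hypothetical interaction documents.
interactions = [
    {"article_id": "a1", "date": datetime(2014, 2, 21), "daily": {"view": 10}},
    {"article_id": "a1", "date": datetime(2014, 2, 22), "daily": {"view": 20}},
    {"article_id": "a2", "date": datetime(2014, 2, 21), "daily": {"view": 40}},
    {"article_id": "a2", "date": datetime(2014, 2, 19), "daily": {"view": 1000}},
]

print(top_articles(interactions, since=datetime(2014, 2, 20)))
# [{'_id': 'a2', 'a': 40.0}, {'_id': 'a1', 'a': 15.0}]
```

Note how the document dated before the cut-off never reaches the average, which is why filtering early in the pipeline matters.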
![Page 21: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/21.jpg)
Aggregation Framework Explain
• Use Explain plan to ensure the efficient use of the index when querying.
db.interactions.aggregate([
    { $group : { _id : "$article_id", a : { $avg : "$daily.view" } } },
    { $sort : { a : -1 } },
    { $limit : 5 }
  ],
  { explain : true }
);
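With explain enabled, the server returns a description of each stage instead of results, and a "BasicCursor" plan means no index was used. A short Python sketch of scanning such a document for that case follows. The helper name and the sample document are hypothetical, and the layout assumes the legacy explain format with a "stages" array and a "$cursor" entry.

```python
def collection_scans(explain_doc):
    """Return the positions of $cursor stages whose plan is a BasicCursor."""
    scans = []
    for position, stage in enumerate(explain_doc.get("stages", [])):
        cursor_stage = stage.get("$cursor")
        if cursor_stage is None:
            continue  # Only $cursor stages carry a query plan.
        plan = cursor_stage.get("plan", {})
        if plan.get("cursor") == "BasicCursor":
            scans.append(position)
    return scans


# Hypothetical, trimmed-down explain document.
explain_doc = {
    "stages": [
        {"$cursor": {"query": {}, "plan": {"cursor": "BasicCursor", "isMultiKey": False}}},
        {"$group": {"_id": "$article_id"}},
    ],
    "ok": 1,
}

print(collection_scans(explain_doc))  # [0]
```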
![Page 22: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/22.jpg)
Explain output…
{
  "stages" : [
    {
      "$cursor" : {
        "query" : …,
        "fields" : { … },
        "plan" : {
          "cursor" : "BasicCursor",
          "isMultiKey" : false,
          "scanAndOrder" : false,
          "allPlans" : [
            {
              "cursor" : "BasicCursor",
              "isMultiKey" : false,
              "scanAndOrder" : false
            }
          ]
        }
      }
    },
    …
  ],
  "ok" : 1
}
![Page 23: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/23.jpg)
Geo Spatial & Text Search Aggregation
![Page 24: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/24.jpg)
Text Search
• $text operator with aggregation framework
– All articles with MongoDB
– Group by author, sort by comments count
db.articles.aggregate([
  { $match : { $text : { $search : "mongodb" } } },
  { $group : { _id : "$author", comments : { $sum : "$comment_count" } } },
  { $sort : { comments : -1 } }
])
![Page 25: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/25.jpg)
Utilise with Geo spatial
• $geoNear operator with aggregation framework
– $geoNear is a pipeline stage in its own right and must come first in the pipeline, rather than sitting inside $match.
– Group by author, and article count.
db.articles.aggregate([
  { $geoNear : {
      near : { type : "Point", coordinates : [-0.128, 51.507] },
      distanceField : "distance",   // output field name chosen for illustration
      maxDistance : 5000,
      spherical : true
  } },
  { $group : { _id : "$author", articleCount : { $sum : 1 } } }
])
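The query on this slide selects articles within 5000 metres of a point and counts them per author. The same idea can be sketched in pure Python with a haversine distance, as below. This is an approximation for intuition only, not the server's geospatial implementation; the function names, the Earth radius constant and the sample articles are assumptions.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (an approximation)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def articles_near(articles, centre, max_distance_m):
    """Count articles per author within max_distance_m of centre ([lon, lat])."""
    counts = Counter()
    for article in articles:
        lon, lat = article["location"]["coordinates"]
        if haversine_m(centre[0], centre[1], lon, lat) <= max_distance_m:
            counts[article["author"]] += 1
    return dict(counts)


# Hypothetical articles with GeoJSON points ([longitude, latitude]).
articles = [
    {"author": "Dan Roberts", "location": {"type": "Point", "coordinates": [-0.128, 51.507]}},
    {"author": "Dan Roberts", "location": {"type": "Point", "coordinates": [-0.120, 51.510]}},
    {"author": "Jim Duffy", "location": {"type": "Point", "coordinates": [2.352, 48.857]}},
]

print(articles_near(articles, [-0.128, 51.507], 5000))  # {'Dan Roberts': 2}
```

GeoJSON stores coordinates as longitude first, then latitude, which is the order used throughout the sketch.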
![Page 26: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/26.jpg)
Summary
![Page 27: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/27.jpg)
Summary
• Aggregating Data…
– Map Reduce
– Hadoop
– Pre-Aggregated Reports
– Aggregation Framework
• Tune with Explain plan
• Compute on the fly or Compute and store
• Geospatial
• Text Search
![Page 28: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/28.jpg)
Next Session – 3rd April
– Operations for your application
– Scale out
– Availability
– How do we prepare for production
– Sizing
![Page 29: 1403 app dev series - session 5 - analytics](https://reader033.vdocument.in/reader033/viewer/2022061218/54b6bd024a7959fa048b45cc/html5/thumbnails/29.jpg)