mongodb: queries and aggregation framework with nba game data
TRANSCRIPT
MongoDB Queries and Aggregation
Valeri KarpovKernel Tools Engineer, MongoDB
www.thecodebarbarian.comgithub.com/vkarpov15
@code_barbarian
*
Introducing an Awesome Data Set
•Scraped basketball-reference.com
•Mad props to NPM module Cheerio
•Box scores for all 31,686 NBA games since 1985
•Download: http://bit.ly/1jlgs9u via S3
•Untar and run mongorestore
*
Data Set Structure
•Contains final score
•Contains box score for teams and players
*
Data Set Structure - High Level
•Contains _id, date
•Info on winning team and losing team
*
Data Set Structure - Box
•Box score contains detailed stats by team
*
Data Set Structure - Box
•And also for individual players:
*
Queries and Aggregation
•MongoDB has a rich query framework
•Aggregation framework is like SQL’s group by
*
Query Basics - findOne()
•When was Kobe Bryant’s 81 point game?
*
Query Basics - find()
•Which teams have lost despite scoring more than 150 points?
*
Query Basics - count()
•How many games did the Lakers win in the 1999-2000 season?
*
Query Basics - distinct()
•Which teams have lost a game despite having a player make at least 10 3 pointers?
*
Query Basics - $elemMatch operator
•When did Michael Jordan score 60 points in a losing effort?
*
Query Basics - $elemMatch operator
*
Query Basics - .sort() and .limit()
•What are the 5 highest point totals for a losing team?
*
Query Basics - .sort() and .limit()
•What are the 5 highest point totals for a losing team?
*
Aggregation
•Similar to SQL group by
•Filters and transforms data in pipeline stages
•Stages are chainable
•Accessible via the .aggregate() function in shell
*
Aggregation - Lakers Season PPG
•How many points did the Lakers average in games they won in the 2008-2009 season?
*
Aggregation - Lakers Season PPG
•How many points did the Lakers average in games they won in the 2008-2009 season?
*
Aggregation - $sort and $limit
•Compute the teams with the 5 best records in the 1999-2000 season
*
Aggregation - $sort and $limit
*
Aggregation - $sort and $limit
*
Aggregation - $unwind
•Random statistic: player with highest scoring average in games their team lost
*
Aggregation - $unwind
•Random statistic: player with highest scoring average in games their team lost
*
Aggregation - Fun With Steals
•How often does a team win when they record more steals than the other team?
*
Aggregation - Fun With Steals
*
Aggregation - Fun With Steals
*
Thanks for Listening!
Slides on Twitter, @code_barbarian