
Time Series Data in MongoDB

Senior Solutions Architect, MongoDB Inc.

Dave Erickson

#mongodb

Agenda

•  What is time series data?

•  Schema design considerations

•  Broader use case: operational intelligence

•  MMS Monitoring schema design

•  Thinking ahead

•  Questions

What is time series data?

Time Series Data is Everywhere

•  Financial markets pricing (stock ticks)

•  Sensors (temperature, pressure, proximity)

•  Industrial fleets (location, velocity, operational)

•  Social networks (status updates)

•  Mobile devices (calls, texts)

•  Systems (server logs, application logs)

Time Series Data at a Higher Level

•  Widely applicable data model

•  Applies to several different “data use cases”

•  Various schema and modeling options

•  Application requirements drive schema design

Time Series Data Considerations

•  Resolution of raw events

•  Resolution needed to support
   –  Applications
   –  Analysis
   –  Reporting

•  Data retention policies
   –  Data ages out
   –  Retention requirements

Schema Design Considerations

Designing For Writing and Reading

•  Document per event

•  Document per minute (average)

•  Document per minute (by second)

•  Document per hour

Document Per Event

{
    server: "server1",
    load: 92,
    ts: ISODate("2013-10-16T22:07:38.000-0500")
}

•  Relational-centric approach

•  Insert-driven workload

•  Aggregations computed at application-level
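For illustration, a minimal mongo shell sketch of the insert-per-event pattern (the metrics collection name is an assumption, not from the slides):

    // Hypothetical collection: one insert per raw sample.
    db.metrics.insert({
        server: "server1",
        load: 92,
        ts: ISODate("2013-10-16T22:07:38.000-0500")
    })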

Document Per Minute (Average)

{
    server: "server1",
    load_num: 92,
    load_sum: 4500,
    ts: ISODate("2013-10-16T22:07:00.000-0500")
}

•  Pre-aggregate to compute the average per minute more easily

•  Update-driven workload

•  Resolution at the minute-level
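A sketch of the corresponding write, assuming an upsert into a metrics collection (both names are assumptions); each sample becomes one in-place update, and the average is load_sum / load_num:

    // Fold one sample (load = 45) into the current minute's document.
    db.metrics.update(
        { server: "server1", ts: ISODate("2013-10-16T22:07:00.000-0500") },
        { $inc: { load_num: 1, load_sum: 45 } },
        { upsert: true }  // create the minute document on the first sample
    )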

Document Per Minute (By Second)

{
    server: "server1",
    load: { 0: 15, 1: 20, …, 58: 45, 59: 40 },
    ts: ISODate("2013-10-16T22:07:00.000-0500")
}

•  Store per-second data at the minute level

•  Update-driven workload

•  Pre-allocate structure to avoid document moves
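One way to pre-allocate, sketched in the mongo shell (the zero-filled placeholders and collection name are assumptions):

    // Create the minute document up front with all 60 seconds present,
    // so later $set updates never grow the document and force a move.
    var seconds = {};
    for (var s = 0; s < 60; s++) {
        seconds[s] = 0;
    }
    db.metrics.insert({
        server: "server1",
        load: seconds,
        ts: ISODate("2013-10-16T22:07:00.000-0500")
    })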

Document Per Hour (By Second)

{
    server: "server1",
    load: { 0: 15, 1: 20, …, 3598: 45, 3599: 40 },
    ts: ISODate("2013-10-16T22:00:00.000-0500")
}

•  Store per-second data at the hourly level

•  Update-driven workload

•  Pre-allocate structure to avoid document moves

•  Updating the last second requires 3599 steps – BSON fields are scanned sequentially, so reaching field 3599 means stepping over every field before it

Document Per Hour (By Second)

{
    server: "server1",
    load: {
        0:  { 0: 15, …, 59: 45 },
        …
        59: { 0: 25, …, 59: 75 }
    },
    ts: ISODate("2013-10-16T22:00:00.000-0500")
}

•  Store per-second data at the hourly level with nesting

•  Update-driven workload

•  Pre-allocate structure to avoid document moves

•  Updating the last second requires 59 + 59 steps – skip 59 minute fields, then 59 second fields inside the last minute
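A sketch of that update using dot notation (field names follow the slide; the collection name is an assumption):

    // Set second 59 of minute 59 in one atomic in-place update.
    db.metrics.update(
        { server: "server1", ts: ISODate("2013-10-16T22:00:00.000-0500") },
        { $set: { "load.59.59": 75 } }
    )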

Characterizing Write Differences

•  Example: data generated every second

•  Capturing data per minute requires:
   –  Document per event: 60 writes
   –  Document per minute: 1 write, 59 updates

•  Transition from insert-driven to update-driven
   –  Individual writes are smaller
   –  Performance and concurrency benefits

Characterizing Read Differences

•  Example: data generated every second

•  Reading data for a single hour requires:
   –  Document per event: 3600 reads
   –  Document per minute: 60 reads

•  Read performance is greatly improved
   –  Optimal with tuned block sizes and read ahead
   –  Fewer disk seeks
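A hedged sketch of the one-hour read against the per-minute schema (an index on { server, ts } is assumed); the same range over per-event documents would return 3600 documents instead:

    // Returns 60 per-minute documents covering the hour.
    db.metrics.find({
        server: "server1",
        ts: {
            $gte: ISODate("2013-10-16T22:00:00.000-0500"),
            $lt:  ISODate("2013-10-16T23:00:00.000-0500")
        }
    })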

MMS Monitoring Schema Design

MMS Monitoring

•  MongoDB Management Service (MMS) Monitoring

•  Available in two flavors
   –  Free cloud-hosted monitoring
   –  On-premise with MongoDB Enterprise

•  Monitor single node, replica set, or sharded cluster deployments

•  Metric dashboards and custom alert triggers

MMS Monitoring (dashboard screenshots)

MMS Application Requirements

•  Resolution defines the granularity of stored data

•  Range controls the retention policy, e.g. after 24 hours keep only 5-minute resolution

•  Display dictates the stored pre-aggregations, e.g. total and count

Monitoring Schema Design

•  Per-minute document model

•  Documents store individual metrics and counts

•  Supports “total” and “avg/sec” display

{
    timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
    num_samples: 58,
    total_samples: 108000000,
    type: "memory_used",
    values: {
        0: 999999,
        …
        59: 1800000
    }
}

Monitoring Data Updates

•  Single update required to add new data and increment associated counts

db.metrics.update(
    {
        timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
        type: "memory_used"
    },
    {
        $set: { "values.59": 2000000 },
        $inc: { num_samples: 1, total_samples: 2000000 }
    }
)

Monitoring Data Management

•  Data stored at different granularity levels for read performance

•  Collections are organized into specific intervals

•  Retention is managed by simply dropping collections as they age out

•  Document structure is pre-created to maximize write performance
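As a sketch of interval-based retention (the per-day collection naming is an assumption, not from the talk):

    // Dropping an aged-out collection is a cheap metadata operation,
    // unlike deleting millions of individual documents.
    db.getCollection("metrics_daily_20131010").drop()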

Use Case: Operational Intelligence

What is Operational Intelligence?

•  Storing log data
   –  Capturing application and/or server generated events

•  Hierarchical aggregation
   –  Rolling approach to generate rollups
   –  e.g. hourly > daily > weekly > monthly

•  Pre-aggregated reports
   –  Processing data to generate reporting from raw events

Storing Log Data

{
    _id: ObjectId('4f442120eb03305789000000'),
    host: "127.0.0.1",
    user: 'frank',
    time: ISODate("2000-10-10T20:55:36Z"),
    path: "/apache_pb.gif",
    request: "GET /apache_pb.gif HTTP/1.0",
    status: 200,
    response_size: 2326,
    referrer: "http://www.example.com/start.html",
    user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)"
}

The raw Apache log line the document above was parsed from:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"

Pre-Aggregation

•  Analytics across raw events can involve many reads

•  Alternative schemas can improve read and write performance

•  Data can be organized into more coarse buckets

•  Transition from insert-driven to update-driven workloads

Pre-Aggregated Log Data

{
    timestamp_minute: ISODate("2000-10-10T20:55:00Z"),
    resource: "/index.html",
    page_views: {
        0: 50,
        …
        59: 250
    }
}

•  Leverage time-series style bucketing

•  Track individual metrics (ex. page views)

•  Improve performance for reads/writes

•  Minimal processing overhead
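A sketch of the per-request write (the pageviews collection name, the specific second, and the upsert are assumptions):

    // One atomic increment per request, bucketed into second 36
    // of the current minute.
    db.pageviews.update(
        {
            timestamp_minute: ISODate("2000-10-10T20:55:00Z"),
            resource: "/index.html"
        },
        { $inc: { "page_views.36": 1 } },
        { upsert: true }
    )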

Hierarchical Aggregation

•  Analytical approach as opposed to schema approach
   –  Leverage built-in Aggregation Framework or MapReduce

•  Execute multiple tasks sequentially to aggregate at varying levels

•  Raw events → Hourly → Weekly → Monthly

•  Rolling approach distributes the aggregation workload
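A minimal Aggregation Framework sketch of one rollup step, grouping the per-minute monitoring documents from earlier into hourly totals (a scheduled job that writes the results into an hourly collection is assumed):

    // Roll per-minute pre-aggregates up to hourly totals.
    db.metrics.aggregate([
        { $match: { type: "memory_used" } },
        { $group: {
            _id: {
                year: { $year: "$timestamp_minute" },
                day:  { $dayOfYear: "$timestamp_minute" },
                hour: { $hour: "$timestamp_minute" }
            },
            num_samples:   { $sum: "$num_samples" },
            total_samples: { $sum: "$total_samples" }
        } }
    ])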

Thinking Ahead

Before You Start

•  What are the application requirements?

•  Is pre-aggregation useful for your application?

•  What are your retention and age-out policies?

•  What are the gotchas?
   –  Pre-create document structure to avoid fragmentation and performance problems
   –  Organize your data for growth – time series data grows fast!

Down The Road

•  Scale-out considerations
   –  Vertical vs. horizontal (with sharding)

•  Understanding the data
   –  Aggregation
   –  Analytics
   –  Reporting

•  Deeper data analysis
   –  Patterns
   –  Predictions

Scaling Time Series Data in MongoDB

•  Vertical growth
   –  Larger instances with more CPU and memory
   –  Increased storage capacity

•  Horizontal growth
   –  Partitioning data across many machines
   –  Dividing and distributing the workload

Time Series Sharding Considerations

•  What are the application requirements?
   –  Primarily collecting data
   –  Primarily reporting data
   –  Both

•  Map those back to
   –  Write performance needs
   –  Read/write query distribution
   –  Collection organization (see MMS Monitoring)

•  Example shard key: { metric name, coarse timestamp } – see the sketch below
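A hedged mongo shell sketch of that compound shard key (the database name and field names are assumptions, mapped onto the earlier monitoring schema):

    // Metric name first so queries for one metric stay targeted;
    // coarse timestamp second so one metric's writes still spread
    // across chunks over time.
    sh.enableSharding("monitoring")
    sh.shardCollection("monitoring.metrics", { type: 1, timestamp_minute: 1 })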

Aggregates, Analytics, Reporting

•  Aggregation Framework can be used for analysis
   –  Does it work with the chosen schema design?
   –  What sorts of aggregations are needed?

•  Reporting can be done on a predictable, rolling basis
   –  See "Hierarchical Aggregation"

•  Consider secondary reads for analytical operations
   –  Minimize load on production primaries

Deeper Data Analysis

•  Leverage MongoDB-Hadoop connector
   –  Bi-directional support for reading/writing
   –  Works with online and offline data (e.g. backup files)

•  Compute using MapReduce
   –  Patterns
   –  Recommendations
   –  Etc.

•  Explore data
   –  Pig
   –  Hive

Questions?

Resources

•  Schema Design for Time Series Data in MongoDB http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb

•  Operational Intelligence Use Case http://docs.mongodb.org/ecosystem/use-cases/#operational-intelligence

•  Data Modeling in MongoDB http://docs.mongodb.org/manual/data-modeling/

•  Schema Design (webinar) http://www.mongodb.com/events/webinar/schema-design-oct2013