High Performance Applications with MongoDB
High Performance with MongoDB
or "how to design fast applications"
Asya Kamsky, Lead Product Manager, MongoDB Inc
#MongoDB @asya999 #askAsya
What Is Fast?
• Must understand
  – what "fast" means
  – how to measure it
  – what the requirements are
  – what the context is
George Washington Bridge
Is It Fast?
• In the context of crossing the bridge, fast means:
  – how long will it take one car
  – how many cars can do it "at the same time"
Is It Fast? Facts & Info
Opened to traffic
  Upper level: October 25, 1931
  Lower level: August 29, 1962
Bus Station opened: January 17, 1963
Length of bridge between anchorages: 4,760 feet
Width of bridge: 119 feet
Width of roadway: 90 feet
Height of tower above water: 604 feet
Water clearance at midspan: 212 feet
Number of toll lanes
  Upper level: 12
  Lower level: 10
  Palisades Interstate Parkway: 7*
* E-ZPass only overnight
2013 Traffic Volumes
Total New York-bound (eastbound) traffic: 49,402,245 vehicles
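The annual traffic figure can be turned into an average throughput with simple arithmetic. The sketch below assumes traffic is spread evenly over the 365 days of 2013, which real traffic is not, so it says nothing about rush hour or about how long one car takes to cross.

```javascript
// Figure from the slide: 2013 New York-bound (eastbound) traffic.
const vehiclesPerYear = 49402245;

// Average throughput, assuming an even spread over 365 days (an assumption).
const perDay = vehiclesPerYear / 365;
const perHour = perDay / 24;
const perSecond = perHour / 3600;

console.log(Math.round(perDay));   // 135349 vehicles per day
console.log(Math.round(perHour));  // 5640 vehicles per hour
console.log(perSecond.toFixed(2)); // "1.57" vehicles per second
```

This is a throughput number only. The latency of a single crossing has to be measured separately.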
What Is Fast?
Latency: how long "it" takes
Throughput: how many "per unit of time"
Orthogonal, but highly interdependent
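One standard way to see the interdependence is Little's Law, which the slides do not mention by name: on average, the number of requests in flight equals throughput times latency. The sketch below uses hypothetical helper names and made-up numbers purely for illustration.

```javascript
// Little's Law (long-run averages): inFlight = throughput * latency.
// Both helpers are hypothetical; latency is given in milliseconds.
function throughputPerSecond(inFlight, latencyMs) {
  return (inFlight * 1000) / latencyMs;
}

function inFlightNeeded(targetPerSecond, latencyMs) {
  return (targetPerSecond * latencyMs) / 1000;
}

// 8 requests in flight at 2 ms each
console.log(throughputPerSecond(8, 2)); // 4000
// Same concurrency, latency doubles: throughput halves
console.log(throughputPerSecond(8, 4)); // 2000
// Concurrency needed for 10,000 ops/s at 2 ms
console.log(inFlightNeeded(10000, 2)); // 20
```

Adding concurrency raises throughput only while latency stays flat; once a shared resource saturates, latency climbs instead.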
What Is Fast?
[Diagram: New Jersey to New York]
Must address the "limiting factor"
[Diagram, built up across slides: Application; Driver; DB Requests; Schema; Indexes; Storage Engine; File System; OS; labeled Conceptual and Physical]
Schema, Indexes, Storage Engine
Schema
Schema Patterns
Schema Anti-Patterns
Parent Object
OVER-NORMALIZATION OVER-EMBEDDING
Schema Anti-Patterns: over-embedding
Unbounded growth
Deeply nested arrays
Really large documents
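One possible guard against unbounded growth is to cap an embedded array at write time with `$push` plus `$each` and `$slice`. This is a sketch, not something the slides prescribe; the helper name, the `posts` collection, and the `comments` field are all hypothetical.

```javascript
// Hypothetical helper: builds an update document that appends one item and
// keeps only the newest `maxItems` entries of the embedded array.
function boundedPush(field, item, maxItems) {
  return { $push: { [field]: { $each: [item], $slice: -maxItems } } };
}

const update = boundedPush("comments", { author: "asya", text: "hi" }, 50);

// In the mongo shell this would be used as (collection name is an assumption):
//   db.posts.update({ _id: 1 }, update)
console.log(JSON.stringify(update));
```

Older entries that fall off the array would need to live in a separate collection if they must be kept.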
Schema Anti-Patterns: over-normalizing
You are over-normalizing if you are doing JOINs in your application instead of "finds"
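The difference can be sketched without a server by counting round trips against an in-memory stand-in. Everything here (the `find` stand-in, the order and line-item data) is invented for illustration and only mimics the shape of the problem.

```javascript
// In-memory stand-in for a database call; each call counts as one round trip.
let roundTrips = 0;
function find(collection, predicate) {
  roundTrips += 1;
  return collection.filter(predicate);
}

// Over-normalized: an order and its line items live in separate collections.
const orders = [{ _id: 1, customer: "ann" }];
const lineItems = [
  { orderId: 1, sku: "a", qty: 2 },
  { orderId: 1, sku: "b", qty: 1 },
];

function loadOrderWithJoin(orderId) {
  const [order] = find(orders, (o) => o._id === orderId);
  const items = find(lineItems, (li) => li.orderId === orderId);
  // The "JOIN" happens here, in application code.
  return { ...order, items: items.map(({ sku, qty }) => ({ sku, qty })) };
}

// Embedded: the same order stored as one document.
const ordersEmbedded = [
  { _id: 1, customer: "ann", items: [{ sku: "a", qty: 2 }, { sku: "b", qty: 1 }] },
];

function loadOrderEmbedded(orderId) {
  return find(ordersEmbedded, (o) => o._id === orderId)[0];
}

roundTrips = 0;
const joined = loadOrderWithJoin(1);
const joinTrips = roundTrips;

roundTrips = 0;
const embedded = loadOrderEmbedded(1);
const embeddedTrips = roundTrips;

console.log(joinTrips, embeddedTrips); // 2 1
```

Both paths return the same data; the application-side join pays for an extra round trip on every read.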
Schema Anti-Patterns: signs of trouble
reads vs writes
polymorphic collections
polymorphic fields
Schema Anti-Patterns: can't use indexes
bad regex queries
lots of indexes
no indexes
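For the regex case, MongoDB can use an index efficiently when a case-sensitive pattern is anchored to a literal prefix, while unanchored or case-insensitive patterns generally cannot. The function below is a rough heuristic written for this note (its name and rules are assumptions), not the server's actual logic; `explain()` on the real query is the authoritative check.

```javascript
// Rough heuristic (an assumption, not MongoDB's planner): a regex is
// index-friendly when it is case-sensitive, single-line, and starts with
// "^" followed by a literal letter or digit.
function looksLikePrefixRegex(regex) {
  if (regex.flags.includes("i") || regex.flags.includes("m")) return false;
  return /^\^[A-Za-z0-9]/.test(regex.source);
}

console.log(looksLikePrefixRegex(/^Kam/));  // true: anchored literal prefix
console.log(looksLikePrefixRegex(/Kam/));   // false: must examine every key
console.log(looksLikePrefixRegex(/^kam/i)); // false: case-insensitive
```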
Indexes
Storage Engine
Storage Engine: compression
Is It Fast?
Disk IOPS a factor?
Data compressible?
Extra CPU cycles available?
- Yes WT FTW!
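When the answers above are yes, WiredTiger's block compressor can be chosen per collection. The sketch below builds the options as a plain object so it runs without a server; picking zlib over the default snappy is an assumption made for illustration, trading more CPU for smaller data.

```javascript
// Collection options selecting zlib block compression for WiredTiger.
// The choice of zlib here is illustrative, not a recommendation from the slides.
const collectionOptions = {
  storageEngine: {
    wiredTiger: { configString: "block_compressor=zlib" },
  },
};

// In the mongo shell (collection name taken from the metrics example):
//   db.createCollection("metrics", collectionOptions)
// Comparing `size` and `storageSize` in db.metrics.stats() afterwards shows
// how well the data actually compressed.
console.log(collectionOptions.storageEngine.wiredTiger.configString);
```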
Storage Engine: concurrency
MMAPv1: granularity low, latency low
WiredTiger: granularity high, latency higher
[Diagram: New Jersey to New York]
Is It Fast?
Multiple threads?
Same collection?
Not the same document(s)?
Extra CPU cycles available?
Not "too many" workers?
- Yes WT FTW!
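Why the checklist matters can be shown with a toy model of lock granularity. It assumes every write takes the same time, that there are enough cores for every worker, and that nothing else contends, none of which holds in production; the function and its parameters are hypothetical.

```javascript
// Toy model: `writes` is a list of target document ids, all issued at once.
// Collection-level locking serializes every write; document-level locking
// serializes only the writes that hit the same document.
function timeToCompleteMs(writes, lockGranularity, msPerWrite) {
  if (lockGranularity === "collection") {
    return writes.length * msPerWrite;
  }
  const writesPerDocument = new Map();
  for (const docId of writes) {
    writesPerDocument.set(docId, (writesPerDocument.get(docId) || 0) + 1);
  }
  return Math.max(0, ...writesPerDocument.values()) * msPerWrite;
}

// Four threads, four different documents
console.log(timeToCompleteMs([1, 2, 3, 4], "collection", 1)); // 4
console.log(timeToCompleteMs([1, 2, 3, 4], "document", 1));   // 1
// Four threads, all on the same document: finer locks do not help
console.log(timeToCompleteMs([1, 1, 1, 1], "document", 1));   // 4
```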
[Chart: Throughput: 50/50 Workload in RAM. Categories: Uniform, Latest, Zipfian. Axis range: 0 to 60,000]
[Screenshot: htop]
Storage Engine: write-pattern
{
  timestamp_hour: ISODate("2015-11-10T23:00:00.000Z"),
  type: "memory_used",
  values: {
    0: { 0: 999999, 1: 999999, …, 59: 1000000 },
    1: { 0: 2000000, 1: 2000000, …, 59: 1000000 },
    …,
    58: { 0: 1600000, 1: 1200000, …, 59: 1100000 },
    59: { 0: 1300000, 1: 1400000, …, 59: 1500000 }
  }
}
db.metrics.update(
  { timestamp_hour: ISODate("2015-11-10T23:00:00.000Z"), type: "memory_used" },
  { $set: { "values.59.59": 2000000 } }
)
{
  timestamp_hour: ISODate("2015-11-10T23:00:00.000Z"),
  type: "memory_used",
  values: {
    0: { 0: 999999, 1: 999999, …, 59: 1000000 },
    1: { 0: 2000000, 1: 2000000, …, 59: 1000000 },
    …,
    58: { 0: 1600000, 1: 1200000, …, 59: 1100000 },
    59: { 0: 1300000, 1: 1400000, …, 59: 2000000 }
  }
}
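The hour-per-document layout above can be driven by two small helpers: one that preallocates the 60 x 60 grid and one that turns a sample's timestamp into the `values.<minute>.<second>` path. Both helper names are hypothetical, and filling the grid with zeros is an assumption; the slides only show a populated document.

```javascript
// Preallocate one document per hour with a slot for every second.
// Zero as the placeholder value is an assumption.
function newHourDocument(hourStart, type) {
  const values = {};
  for (let minute = 0; minute < 60; minute++) {
    values[minute] = {};
    for (let second = 0; second < 60; second++) {
      values[minute][second] = 0;
    }
  }
  return { timestamp_hour: hourStart, type, values };
}

// Build the filter and $set update for a single sample.
function updateForSample(timestamp, type, value) {
  const hourStart = new Date(timestamp);
  hourStart.setUTCMinutes(0, 0, 0); // truncate to the start of the hour (UTC)
  const path = `values.${timestamp.getUTCMinutes()}.${timestamp.getUTCSeconds()}`;
  return {
    filter: { timestamp_hour: hourStart, type },
    update: { $set: { [path]: value } },
  };
}

const sample = new Date("2015-11-10T23:59:59.000Z");
const { filter, update } = updateForSample(sample, "memory_used", 2000000);

// In the mongo shell: db.metrics.update(filter, update)
console.log(filter.timestamp_hour.toISOString(), JSON.stringify(update));
```

With the sample above, the generated update matches the one on the slide: `{ $set: { "values.59.59": 2000000 } }`.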
MongoDB Cloud Monitoring
Benchmark your own application
Use realistic workload
Use real data
Measure throughput and latency
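A minimal harness that reports both numbers might look like the sketch below. It runs operations one at a time from a single client, and the `fakeOperation` stand-in is purely illustrative; a real benchmark would call the application's own code path with real data and realistic concurrency.

```javascript
// Summarize measured latencies into throughput and percentile latency.
function summarize(latenciesMs, elapsedSeconds) {
  const sorted = latenciesMs.slice().sort((a, b) => a - b);
  const percentile = (p) =>
    sorted[Math.min(sorted.length - 1, Math.floor(p * sorted.length))];
  return {
    throughputPerSec: sorted.length / elapsedSeconds,
    p50Ms: percentile(0.5),
    p99Ms: percentile(0.99),
  };
}

// Run `operation` totalOps times sequentially, timing each call.
async function benchmark(operation, totalOps) {
  const latenciesMs = [];
  const start = process.hrtime.bigint();
  for (let i = 0; i < totalOps; i++) {
    const t0 = process.hrtime.bigint();
    await operation(i);
    latenciesMs.push(Number(process.hrtime.bigint() - t0) / 1e6);
  }
  const elapsedSeconds = Number(process.hrtime.bigint() - start) / 1e9;
  return summarize(latenciesMs, elapsedSeconds);
}

// Stand-in for a real database call (an assumption for this sketch).
const fakeOperation = () => new Promise((resolve) => setTimeout(resolve, 1));

benchmark(fakeOperation, 50).then((result) => console.log(result));
```

Reporting percentiles rather than an average keeps slow outliers visible, which is usually where the limiting factor shows up.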