mythbusting: understanding how we measure performance at mongodb
DESCRIPTION
Benchmarking, benchmarking, benchmarking. We all do it, mostly it tells us what we want to hear but often hides a mountain of misinformation. In this talk we will walk through the pitfalls that you might find yourself in by looking at some examples where things go wrong. We will then walk through how MongoDB performance is measured, the processes and methodology and ways to present and look at the information.TRANSCRIPT
VP, Engineering, MongoDB
Andrew Erlichson
#MongoDBDays
Mythbusting: Understanding How We Measure the Performance of MongoDB
Before we start…
• We are going to look a lot at– C++ kernel code– Java benchmarks– JavaScript tests
• And lots of charts
• And its going to be awesome!
Goals of Benchmarking
– You have an idea– It must be tested– Others must be able
to reproduce your work.
– Explanations that contribute to knowledge are key.
– Should have practical applications
Academia
Industry Benchmarketing
• Prove you are faster than the competition
• Emphasize your best attributes
• Repeatable
• Explanations
Industry Benchmarketing
• Prove you are faster than the competition
• Emphasize your best attributes
• Repeatable*
• Explanations*
*OPTIONAL
Goals of Internal Benchmarking
• Always be improving
• Understand our bottlenecks
• Main customer is engineering
• Explanations are somewhat important.
Overview – Internal Benchmarking
• Some common traps
• Performance measurement & diagnosis
• What's next
Part OneSome Common Traps
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
BasicDBObject doc = new BasicDBObject();
doc.put("_id",id);
doc.put("k",rand.nextInt(numMaxInserts)+1);
String cVal = "…"
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i]=doc;
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
#1 Time taken to Insert x Documents
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
BasicDBObject doc = new BasicDBObject();
doc.put("_id",id);
doc.put("k",rand.nextInt(numMaxInserts)+1);
String cVal = "…"
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i]=doc;
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
#1 Time taken to Insert x Documents
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
BasicDBObject doc = new BasicDBObject();
doc.put("_id",id);
doc.put("k",rand.nextInt(numMaxInserts)+1);
String cVal = "…"
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i]=doc;
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
#1 Time taken to Insert x Documents
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
BasicDBObject doc = new BasicDBObject();
doc.put("_id",id);
doc.put("k",rand.nextInt(numMaxInserts)+1);
String cVal = "…"
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i]=doc;
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
#1 Time taken to Insert x Documents
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
BasicDBObject doc = new BasicDBObject();
doc.put("_id",id);
doc.put("k",rand.nextInt(numMaxInserts)+1);
String cVal = "…"
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i]=doc;
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
#1 Time taken to Insert x Documents
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
BasicDBObject doc = new BasicDBObject();
doc.put("_id",id);
doc.put("k",rand.nextInt(numMaxInserts)+1);
String cVal = "…"
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i]=doc;
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
So that looks ok, right?
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
BasicDBObject doc = new BasicDBObject();
doc.put("_id",id);
doc.put("k",rand.nextInt(numMaxInserts)+1);
String cVal = "…"
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i]=doc;
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
What are else you measuring?
Object creation and GC management?
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
BasicDBObject doc = new BasicDBObject();
doc.put("_id",id);
doc.put("k",rand.nextInt(numMaxInserts)+1);
String cVal = "…"
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i]=doc;
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
What are else you measuring?
Thread contention on nextInt()?
Object creation and GC management?
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
BasicDBObject doc = new BasicDBObject();
doc.put("_id",id);
doc.put("k",rand.nextInt(numMaxInserts)+1);
String cVal = "…"
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i]=doc;
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
What are else you measuring?
Time to synthesize data?
Object creation and GC management?
Thread contention on nextInt()?
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
BasicDBObject doc = new BasicDBObject();
doc.put("_id",id);
doc.put("k",rand.nextInt(numMaxInserts)+1);
String cVal = "…"
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i]=doc;
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
What are else you measuring?
Object creation and GC management?
Thread contention on addAndGet()?
Thread contention on nextInt()?
Time to synthesize data?
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
BasicDBObject doc = new BasicDBObject();
doc.put("_id",id);
doc.put("k",rand.nextInt(numMaxInserts)+1);
String cVal = "…"
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i]=doc;
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
What are else you measuring?
Object creation and GC management?
Clock resolution?
Thread contention on nextInt()?
Time to synthesize data?
Thread contention on addAndGet()?
// Pre Create the Object outside the Loop
BasicDBObject[] aDocs = new BasicDBObject[documentsPerInsert];
for (int i=0; i < documentsPerInsert; i++) {
BasicDBObject doc = new BasicDBObject();
String cVal = "…";
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i] = doc;
}
Solution: Pre-Create the objects
Pre-create non varying data outside the timing loop
Alternative
• Pre-create the data in a file; load from file
// Use ThreadLocalRandom generator or an instance of java.util.Random per thread
java.util.concurrent.ThreadLocalRandom rand;
for (long roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
doc = aDocs[i];
doc.put("_id",id);
doc.put("k", nextInt(rand, numMaxInserts)+1);
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
}
// Maintain count outside the loop
globalInserts.addAndGet(documentsPerInsert * roundNum);
Solution: Remove contention
Remove contention nextInt() by making Thread local
// Use ThreadLocalRandom generator or an instance of java.util.Random per thread
java.util.concurrent.ThreadLocalRandom rand;
for (long roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
doc = aDocs[i];
doc.put("_id",id);
doc.put("k", nextInt(rand, numMaxInserts)+1);
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
}
// Maintain count outside the loop
globalInserts.addAndGet(documentsPerInsert * roundNum);
Solution: Remove contention
Remove contention on addAndGet()
Remove contention nextInt() by making Thread local
long startTime = System.currentTimeMillis();
…
long endTime = System.currentTimeMillis();
long startTime = System.nanoTime();
…
long endTime = System.nanoTime() - startTime;
Solution: Timer resolution
"resolution is at least as good as that of currentTimeMillis()"
"granularity of the value depends on the underlying operating system and may be larger"
Source
• http://docs.oracle.com/javase/7/docs/api/java/lang/System.html
General Principal #1Know what you are measuring
BasicDBObject doc = new BasicDBObject();
doc.put("v", str); // str is a 2k string
for (int i=0; i < 1000; i++) {
doc.put("_id",i); coll.insert(doc);
}
BasicDBObject predicate = new BasicDBObject();
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate);
DBObject foundObj;
while (cur.hasNext()) {
foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
#2 Response time to return all results
BasicDBObject doc = new BasicDBObject();
doc.put("v", str); // str is a 2k string
for (int i=0; i < 1000; i++) {
doc.put("_id",i); coll.insert(doc);
}
BasicDBObject predicate = new BasicDBObject();
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate);
DBObject foundObj;
while (cur.hasNext()) {
foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
#2 Response time to return all results
BasicDBObject doc = new BasicDBObject();
doc.put("v", str); // str is a 2k string
for (int i=0; i < 1000; i++) {
doc.put("_id",i); coll.insert(doc);
}
BasicDBObject predicate = new BasicDBObject();
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate);
DBObject foundObj;
while (cur.hasNext()) {
foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
#2 Response time to return all results
BasicDBObject doc = new BasicDBObject();
doc.put("v", str); // str is a 2k string
for (int i=0; i < 1000; i++) {
doc.put("_id",i); coll.insert(doc);
}
BasicDBObject predicate = new BasicDBObject();
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate);
DBObject foundObj;
while (cur.hasNext()) {
foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
#2 Response time to return all results
BasicDBObject doc = new BasicDBObject();
doc.put("v", str); // str is a 2k string
for (int i=0; i < 1000; i++) {
doc.put("_id",i); coll.insert(doc);
}
BasicDBObject predicate = new BasicDBObject();
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate);
DBObject foundObj;
while (cur.hasNext()) {
foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
So that looks ok, right?
BasicDBObject doc = new BasicDBObject();
doc.put("v", str); // str is a 2k string
for (int i=0; i < 1000; i++) {
doc.put("_id",i); coll.insert(doc);
}
BasicDBObject predicate = new BasicDBObject();
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate);
DBObject foundObj;
while (cur.hasNext()) {
foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
What else are you measuring?
Each doc is is 4080 bytes on disk with powerOf2Sizes
BasicDBObject doc = new BasicDBObject();
doc.put("v", str); // str is a 2k string
for (int i=0; i < 1000; i++) {
doc.put("_id",i); coll.insert(doc);
}
BasicDBObject predicate = new BasicDBObject();
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate);
DBObject foundObj;
while (cur.hasNext()) {
foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
What else are you measuring?
Each doc is is 4080 bytes on disk with powerOf2Sizes
Unrestricted predicate?
BasicDBObject doc = new BasicDBObject();
doc.put("v", str); // str is a 2k string
for (int i=0; i < 1000; i++) {
doc.put("_id",i); coll.insert(doc);
}
BasicDBObject predicate = new BasicDBObject();
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate);
DBObject foundObj;
while (cur.hasNext()) {
foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
What else are you measuring?
Each doc is is 4080 bytes on disk with powerOf2Sizes
Measuring• Time to parse &
execute query• Time to retrieve all
documentBut also• Cost of shipping
~4MB data through network stack
Unrestricted predicate?
BasicDBObject predicate = new BasicDBObject();
predicate.put("_id", new BasicDBObject("$gte", 10).append("$lte", 20));
BasicDBObject projection = new BasicDBObject();
projection.put("_id", 1);
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate, projection );
DBObject foundObj;
while (cur.hasNext()) {
foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
Solution: Limit the projection
Return fixed range
BasicDBObject predicate = new BasicDBObject();
predicate.put("_id", new BasicDBObject("$gte", 10).append("$lte", 20));
BasicDBObject projection = new BasicDBObject();
projection.put("_id", 1);
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate, projection );
DBObject foundObj;
while (cur.hasNext()) {
foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
Solution: Limit the projection
Only project _id
Return fixed range
BasicDBObject predicate = new BasicDBObject();
predicate.put("_id", new BasicDBObject("$gte", 10).append("$lte", 20));
BasicDBObject projection = new BasicDBObject();
projection.put("_id", 1);
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate, projection );
DBObject foundObj;
while (cur.hasNext()) {
foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
Solution: Limit the projection
Only project _id
Only 46k transferred through network stack
Return fixed range
General Principal #2Measure only what you need to measure
Part TwoPerformance measurement & diagnosis
Broad categories
• Micro Benchmarks
• Workloads
Micro benchmarks: mongo-perf
mongo-perf: goals
• Measure– commands
• Configure– Single mongod, ReplSet size (1 -> n), Sharding– Single vs. Multiple DB– O/S
• Characterize– Throughput (ops/second) by thread count
• Compare
What do you get?
Better
Thread Count
Ops/
Sec
What do you get?
Measured improvement
between rc0 and rc2
Better
tests.push( { name: "Commands.CountsIntIDRange", pre: function( collection ) { collection.drop(); for ( var i = 0; i < 1000; i++ ) { collection.insert( { _id : i } ); } collection.getDB().getLastError(); }, ops: [ { op: "command", ns : "testdb", command : { count : "mycollection", query : { _id : { "$gt" : 10, "$lt" : 100 } } } } ] } );
Benchmark source code
tests.push( { name: "Commands.CountsIntIDRange", pre: function( collection ) { collection.drop(); for ( var i = 0; i < 1000; i++ ) { collection.insert( { _id : i } ); } collection.getDB().getLastError(); }, ops: [ { op: "command", ns : "testdb", command : { count : "mycollection", query : { _id : { "$gt" : 10, "$lt" : 100 } } } } ] } );
Benchmark source code
tests.push( { name: "Commands.CountsIntIDRange", pre: function( collection ) { collection.drop(); for ( var i = 0; i < 1000; i++ ) { collection.insert( { _id : i } ); } collection.getDB().getLastError(); }, ops: [ { op: "command", ns : "testdb", command : { count : "mycollection", query : { _id : { "$gt" : 10, "$lt" : 100 } } } } ] } );
Benchmark source code
tests.push( { name: "Commands.CountsIntIDRange", pre: function( collection ) { collection.drop(); for ( var i = 0; i < 1000; i++ ) { collection.insert( { _id : i } ); } collection.getDB().getLastError(); }, ops: [ { op: "command", ns : "testdb", command : { count : "mycollection", query : { _id : { "$gt" : 10, "$lt" : 100 } } } } ] } );
Benchmark source code
Code Change
Workloads
• "public" workloads– YCSB– Sysbench
• "real world" simulations– Inbox fan in/out– Message Stores– Content Management
Example: Bulk Load Performance16m Documents
Better
55% degradation 2.6.0-rc1 vs
2.4.10
Ouch… where's the tree in the woods?
• 2.4.10 -> 2.6.0– 4495 git commits
git-bisect
• Bisect between good/bad hashes
• git-bisect nominates a new githash– Build against githash– Re-run test– Confirm if this githash is good/bad
• Rinse and repeat
Code Change - Bad Githash
Code Change - Fix
Bulk Load Performance - Fix
Better
11% improvement
2.6.1 vs 2.4.10
The problem with measurement
• Observability– What can you observe on the system?
• Effect– What effects does the observation cause?
mtools
mtools
• MongoDB log file analysis– Filter logs for operations, events– Response time, lock durations– Plot
• https://github.com/rueckstiess/mtools
Code Change – Yielding Policy
Code Change
Response TimesBulk Insert 2.6.0 vs 2.6.1
Ceiling similar, lower floor resulting in 40% improvement in
throughput
Secondary effects of Yield policy changeWrite lock time reduced
Order of magnitude reduction of write lock
duration
> db.serverStatus()
Yes – will cause a read lock to be acquired
> db.serverStatus({recordStats:0})
No – lock is not acquired
> mongostat
Yes - until SERVER-14008 resolved, uses db.serverStatus()
Unexpected side effects of measurement?
CPU sampling
• Get an impression of– Call Graphs– CPU time spent on node and called nodes
> sudo apt-get install google-perftools
> sudo apt-get install libunwind7-dev
> scons --use-cpu-profiler mongod
Setup & building with google-profiler
> mongod –dbpath <…>
Note: Do not use –fork
> mongo
> use admin
> db.runCommand({_cpuProfilerStart: {profileFilename: 'foo.prof'}})
Execute some commands that you want to profile
> db.runCommand({_cpuProfilerStop: 1})
Start the profiling
Sample start vs. end of workload
Sample start vs. end of workload
Code change
Public Benchmarks – Not all forks are the same…
• YCSB– https://github.com/achille/YCSB
• sysbench-mongodb– https://github.com/mdcallag/sysbench-mongodb