performance tuning on the fly at cmp.ly

Post on 18-Jul-2015

June 2014

Performance Tuning on the Fly at CMP.LY

Michael De Lorenzo

CTO, CMP.LY Inc.

michael@cmp.ly

@mikedelorenzo

Agenda

• CMP.LY and CommandPost

• What is MongoDB Management Service?

• Performance Tuning

• MongoDB Issues we’ve faced

    • Slow response times and delayed writes

    • Unindexed queries

    • Increased Replication Lag and Plummeting oplog Window

• Keep your deployment healthy with MMS

    • Using MMS Alerts

    • Using MMS Backups

CMP.LY: A venture-funded NYC startup that offers proprietary social media monitoring, measurement, insight and compliance solutions for the Fortune 100.

CommandPost: A Monitoring, Measurement & Insights (MMI) tool for managed social communications.

Use CommandPost to:

• Track and measure cross-platform in real time

• Identify and attribute high-value engagement

• Analyze and segment engaged audience

• Optimize content and engagement strategies

• Address compliance needs

What is MongoDB Management Service?

MongoDB Management Service

• Free MongoDB Monitoring

• MongoDB Backup in the Cloud

• Free Cloud service or available to run On-Prem for Standard or Enterprise Subscriptions

• Automation coming soon (FTW!)

Ops

Makes MongoDB easier to use and manage

Who Is MMS for?

• Developers

• Ops Team

• MongoDB Technical Service Team

Performance Tuning

How To Do Performance Tuning?

• Assess the problem and establish acceptable behavior.

• Measure the performance before modification.

• Identify the bottleneck.

• Remove the bottleneck.

• Measure performance after modification to confirm.

• Keep it or revert it and repeat.

Adapted from [http://en.wikipedia.org/wiki/Performance_tuning]
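
As a rough illustration of the measure, modify, re-measure loop above, the sketch below times an operation and decides whether a change is worth keeping. The helper names (`medianMs`, `keepOrRevert`) and the 10% improvement threshold are assumptions made for this example, not something taken from the talk.

```javascript
// Hypothetical helpers for the measure -> modify -> measure loop.

// Run `fn` several times and return the median duration in milliseconds.
function medianMs(fn, runs = 5) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = process.hrtime.bigint();
    fn();
    samples.push(Number(process.hrtime.bigint() - start) / 1e6);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}

// Keep a change only if it beats the baseline by at least `minGain`
// (a fraction; 0.1 means "10% faster"). Otherwise revert and try again.
function keepOrRevert(beforeMs, afterMs, minGain = 0.1) {
  return afterMs <= beforeMs * (1 - minGain) ? 'keep' : 'revert';
}

const baselineMs = medianMs(() => JSON.stringify(new Array(10000).fill('x')));
console.log(`baseline: ${baselineMs.toFixed(3)} ms`);
console.log(keepOrRevert(100, 80)); // 'keep'
console.log(keepOrRevert(100, 95)); // 'revert'
```

Using the median rather than a single run makes the before/after comparison less sensitive to one-off spikes.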

What We’ve Faced

Issues We’ve Faced

• Concurrency Issues

    • Slow response times and delayed writes

• Querying without indexes

    • Slow reads, timeouts

• Increasing Replication Lag + Plummeting oplog Window

Concurrency

Slow responses and delayed writes

Concurrency

• What is it?

• How did it affect us?

• How did MMS help identify it?

• How did we diagnose the issue in our app and fix it?

• Today

Concurrency in MongoDB

• MongoDB uses a readers-writer lock

• Many read operations can share a read lock

• When a write lock exists, a single write operation holds the lock exclusively

    • No other read or write operations can share the lock

• Locks are “writer-greedy”: waiting writes take priority over reads
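
The behavior described above can be sketched as a small in-memory state machine. This is a toy model for illustration only, not MongoDB's implementation; the class and method names are hypothetical.

```javascript
// Toy model of a writer-greedy readers-writer lock (not MongoDB's code).
class ReadersWriterLock {
  constructor() {
    this.readers = 0;        // read operations currently sharing the lock
    this.writing = false;    // true while one writer holds it exclusively
    this.waitingWriters = 0; // writers queued behind active readers
  }

  // Readers share the lock, but back off if a writer holds it or is
  // waiting for it. That back-off is the "writer-greedy" part.
  tryRead() {
    if (this.writing || this.waitingWriters > 0) return false;
    this.readers += 1;
    return true;
  }

  releaseRead() {
    this.readers -= 1;
  }

  // Register a writer; returns true if it got the lock immediately.
  requestWrite() {
    this.waitingWriters += 1;
    return this.tryPromoteWriter();
  }

  // Grant the lock to a waiting writer once no one else holds it.
  tryPromoteWriter() {
    if (this.writing || this.readers > 0 || this.waitingWriters === 0) return false;
    this.waitingWriters -= 1;
    this.writing = true;
    return true;
  }

  releaseWrite() {
    this.writing = false;
  }
}

const lock = new ReadersWriterLock();
console.log(lock.tryRead());          // true: first reader
console.log(lock.tryRead());          // true: readers share the lock
console.log(lock.requestWrite());     // false: writer must wait for readers
console.log(lock.tryRead());          // false: new readers yield to the waiting writer
lock.releaseRead();
lock.releaseRead();
console.log(lock.tryPromoteWriter()); // true: writer now holds the lock exclusively
console.log(lock.tryRead());          // false: nothing shares a write lock
```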

How Did This Affect Us?

• Slow API response times due to slow database operations

• Delayed writes

• Backed up queues

MMS: Identify Concurrency Issues

Lock % Greater than 100%?!?!?

• Lock % is the time spent in the write lock state

• It is a derived metric: % of time in the global lock (a small number) + % of time locked by the hottest (“most locked”) database at that time

• Because the data is sampled and combined, it is possible to see values over 100%
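
A small worked example of that sum, with made-up sample values; MMS computes the real figure on its side, and the function name here is hypothetical.

```javascript
// Derived lock % = global lock % + lock % of the hottest database.
// The input numbers below are invented for illustration.
function displayedLockPercent(globalLockPct, lockPctByDatabase) {
  const hottest = Math.max(0, ...Object.values(lockPctByDatabase));
  return globalLockPct + hottest;
}

// 4% in the global lock plus 98% on the busiest database reads as 102%.
console.log(displayedLockPercent(4, { app: 98, logs: 35 })); // 102
```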

Diagnosis

• Identified the write-heavy collections in our applications

• Used application logs to identify slow API responses

• Analyzed MongoDB logs to identify slow database queries

Our Remedies

• Schema changes

• Message queues

• Multiple databases

• Sharding

Schema Changes

• Denormalized our schema

    • Allowed for atomic updates

• Customized documents’ _id attribute

    • Leveraged existing index on _id attribute

Modeling for Atomic Operations

Document

    {
      _id: 123456789,
      title: "MongoDB: The Definitive Guide",
      author: [ "Kristina Chodorow", "Mike Dirolf" ],
      published_date: ISODate("2010-09-24"),
      pages: 216,
      language: "English",
      publisher_id: "oreilly",
      available: 3,
      checkout: [ { by: "joe", date: ISODate("2012-10-15") } ]
    }

Update Operation

    db.books.update(
      { _id: 123456789, available: { $gt: 0 } },
      {
        $inc: { available: -1 },
        $push: { checkout: { by: "abc", date: new Date() } }
      }
    )

Result

    WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Message Queues

• Controlled writes to specific collections using Pub/Sub

    • We chose Amazon SQS

    • Other options include Redis, Beanstalkd, IronMQ or any other message queue

• Created consistent flow of writes versus bursts

• Reduced length and frequency of write locks by controlling flow/speed of writes
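
To make the "consistent flow versus bursts" idea concrete, here is a minimal simulation of a queue sitting between producers and the database. It does not use SQS or any real queue client; the function name, the tick model and the numbers are all assumptions for illustration.

```javascript
// Simulate a queue that absorbs bursts and releases writes at a capped rate.
// arrivalsPerTick[i] = writes produced during tick i (hypothetical numbers).
function smoothWrites(arrivalsPerTick, maxWritesPerTick) {
  if (maxWritesPerTick <= 0) throw new RangeError('maxWritesPerTick must be positive');
  const appliedPerTick = [];
  let backlog = 0;
  for (const arrivals of arrivalsPerTick) {
    backlog += arrivals;
    const batch = Math.min(backlog, maxWritesPerTick);
    appliedPerTick.push(batch);
    backlog -= batch;
  }
  // Keep draining after producers go quiet until the queue is empty.
  while (backlog > 0) {
    const batch = Math.min(backlog, maxWritesPerTick);
    appliedPerTick.push(batch);
    backlog -= batch;
  }
  return appliedPerTick;
}

// A burst of 10 writes reaches the database as a steady 3 per tick.
console.log(smoothWrites([10, 0, 0, 2], 3)); // [ 3, 3, 3, 3 ]
```

The database sees the same total number of writes, but never more than the cap in any one tick, which is what keeps individual write-lock periods short.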

Using Multiple Databases

• As of version 2.2, MongoDB implements locks at a per-database granularity for most read and write operations

    • Planned to be at the document level in version 2.8

• Moved write-heavy collections to new (separate) databases
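
A back-of-the-envelope sketch of why separating write-heavy collections helps under per-database locking: it totals how long each lock scope is held for a set of writes. The workload, database names and function name are hypothetical.

```javascript
// Sum write-lock time per lock scope for a list of writes.
// granularity 'global' models one lock for everything;
// 'database' models one lock per database.
function lockHeldMsByScope(writes, granularity) {
  const totals = {};
  for (const { db, lockMs } of writes) {
    const scope = granularity === 'global' ? '*' : db;
    totals[scope] = (totals[scope] || 0) + lockMs;
  }
  return totals;
}

// Invented workload: 'events' is the write-heavy database.
const sampleWrites = [
  { db: 'app', lockMs: 40 },
  { db: 'events', lockMs: 300 },
  { db: 'app', lockMs: 20 },
  { db: 'events', lockMs: 200 },
];

console.log(lockHeldMsByScope(sampleWrites, 'global'));   // { '*': 560 }
console.log(lockHeldMsByScope(sampleWrites, 'database')); // { app: 60, events: 500 }
```

With one lock, operations on `app` compete with all 560 ms of write-lock time; with per-database locks they compete with only the 60 ms held on `app`.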

Using Sharding

• Improves concurrency by distributing databases across multiple mongod instances

• Locks are per-mongod instance

Lock %: Today

Queries without Indexes

Slow responses and timeouts

Indexing

• What is it?

• How did it affect us?

• How did MMS help identify it?

• How did we diagnose the issue in our app and fix it?

• Today

Indexing with MongoDB

• Support for efficient execution of queries

• Without indexes, MongoDB must scan every document

• Example

Wed Jul 17 13:40:14 [conn28600] query x.y [snip] ntoreturn:16 ntoskip:0 nscanned:16779 scanAndOrder:1 keyUpdates:0 numYields: 906 locks(micros) r:46877422 nreturned:16 reslen:6948 38172ms

38 seconds! Scanned 17k documents, returned 16

• Create indexes to support all queries, especially common and user-facing ones

• Collection scans can push the entire working set out of RAM
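
The numbers called out above can be pulled from the log line mechanically. The sketch below is an ad hoc parser for this one line format (the function name and regular expressions are this example's own, and other MongoDB versions log differently); for real log analysis a dedicated tool is a better fit.

```javascript
// The slow-query log line from the slide, verbatim.
const slowQueryLine =
  'Wed Jul 17 13:40:14 [conn28600] query x.y [snip] ntoreturn:16 ntoskip:0 ' +
  'nscanned:16779 scanAndOrder:1 keyUpdates:0 numYields: 906 ' +
  'locks(micros) r:46877422 nreturned:16 reslen:6948 38172ms';

// Extract documents scanned, documents returned and duration from one line.
function parseSlowQuery(logLine) {
  const num = (re) => {
    const match = logLine.match(re);
    return match ? Number(match[1]) : null;
  };
  const nscanned = num(/\bnscanned:\s*(\d+)/);
  const nreturned = num(/\bnreturned:\s*(\d+)/);
  const durationMs = num(/\s(\d+)ms$/);
  // A high scanned-to-returned ratio suggests a missing or unsuitable index.
  const scannedPerReturned =
    nscanned !== null && nreturned ? nscanned / nreturned : null;
  return { nscanned, nreturned, durationMs, scannedPerReturned };
}

const parsed = parseSlowQuery(slowQueryLine);
console.log(parsed.nscanned, parsed.nreturned, parsed.durationMs); // 16779 16 38172
console.log(Math.round(parsed.scannedPerReturned)); // 1049 documents scanned per document returned
```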

How Did This Affect Us?

• Our web apps became slow

• Queries began to time out

• Longer operations mean longer lock times

MMS: Identifying Indexing Issues

Page Faults

• The number of times that MongoDB requires data not located in physical memory, and must read from virtual memory.

Diagnosis

• Log Analysis

• Use mtools to analyze MongoDB logs

    • mlogfilter: filter logs for slow queries, collection scans, etc.

    • mplotqueries: graph query response times and volumes

    • https://github.com/rueckstiess/mtools

Diagnosis

• Monitoring application logs

• Enabling ‘notablescan’ option in development and testing versions of apps

• MongoDB profiling

The MongoDB Profiler

• Collects fine-grained data about MongoDB write operations, cursors and database commands on a running mongod instance.

• The default slow-operation threshold (slowOpThresholdMs) is 100 ms; it can be changed from the mongo shell
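
As a sketch of what turning these options on looks like, the functions below build the command documents one would pass to `db.runCommand(...)` and `db.adminCommand(...)` in the mongo shell. The function names are hypothetical and the 50 ms threshold is an arbitrary example; the shell helper `db.setProfilingLevel(1, 50)` is the usual shorthand for the first command. Check the field names against the manual for your MongoDB version before relying on them.

```javascript
// Profiling level 1 records only operations slower than `slowMs`.
// Intended use in the mongo shell: db.runCommand(profileSlowOpsCommand(50))
function profileSlowOpsCommand(slowMs) {
  return { profile: 1, slowms: slowMs };
}

// With notablescan on, queries that would need a collection scan fail.
// Meant for development and test deployments, as the slide above says.
// Intended use in the mongo shell: db.adminCommand(noTableScanCommand(true))
function noTableScanCommand(enabled) {
  return { setParameter: 1, notablescan: enabled ? 1 : 0 };
}

// Query document for db.system.profile.find(...): profiled operations
// that took longer than `slowMs` milliseconds.
function slowProfileEntriesFilter(slowMs) {
  return { millis: { $gt: slowMs } };
}

console.log(profileSlowOpsCommand(50));
console.log(noTableScanCommand(true));
console.log(slowProfileEntriesFilter(50));
```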

Our Remedies

• Add indexes!

• Make sure queries are covered

• Utilize the projection specification to limit fields (data) returned

Adding Indexes

• Improved performance for common queries

• Alleviates the need to go to disk for many operations

Projection Specification

Controls the amount of data that needs to be (de-)serialized for use in your app

• We used it to limit data returned in embedded documents and arrays

    db.inventory.find( { type: 'food' }, { item: 1, qty: 1 } )
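
To show the effect of the projection in the query above without a running database, here is a simplified stand-in that applies an inclusion-only projection to a plain object. Real MongoDB projections support much more (dotted paths, exclusions, array operators); the `project` function and the sample document are assumptions for illustration.

```javascript
// Simplified inclusion-only projection on top-level fields.
// Like MongoDB, it keeps _id unless the spec sets _id: 0.
function project(doc, spec) {
  const out = {};
  if (spec._id !== 0 && '_id' in doc) out._id = doc._id;
  for (const [field, include] of Object.entries(spec)) {
    if (field !== '_id' && include === 1 && field in doc) out[field] = doc[field];
  }
  return out;
}

// Hypothetical inventory document with a bulky embedded array.
const inventoryDoc = {
  _id: 1,
  type: 'food',
  item: 'apple',
  qty: 25,
  history: Array.from({ length: 200 }, (_, i) => ({ day: i, sold: i % 7 })),
};

const projected = project(inventoryDoc, { item: 1, qty: 1 });
console.log(projected); // { _id: 1, item: 'apple', qty: 25 }
console.log(JSON.stringify(inventoryDoc).length, 'vs', JSON.stringify(projected).length);
```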

Page Faults: Today

Increasing Replication Lag + Plummeting oplog Window

Replication

• What is it?

• How did it affect us?

• How did MMS help identify it?

• How did we diagnose the issue in our app?

• How did we fix it?

• Today

What is Replication?

• A replica set is a group of mongod processes that maintain the same data set.

• Replica sets provide redundancy and high availability, and are the basis for all production deployments

What Is the Oplog?

• A special capped collection that keeps a rolling record of all operations that modify the data stored in your databases.

• Operations are first applied on the primary and then recorded to its oplog.

• Secondary members then copy and apply these operations in an asynchronous process.
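
The "rolling record" behavior, and why large operations shrink the time span the oplog covers, can be illustrated with a toy capped log. This is a simulation with invented sizes and timestamps, not how mongod stores its oplog; the class name is hypothetical.

```javascript
// Toy capped log: once the byte cap is exceeded, the oldest entries roll off.
class CappedOplog {
  constructor(maxBytes) {
    this.maxBytes = maxBytes;
    this.usedBytes = 0;
    this.entries = []; // oldest first
  }

  append(tsSeconds, sizeBytes) {
    this.entries.push({ tsSeconds, sizeBytes });
    this.usedBytes += sizeBytes;
    while (this.usedBytes > this.maxBytes && this.entries.length > 1) {
      this.usedBytes -= this.entries.shift().sizeBytes;
    }
  }

  // The "oplog window": time between the oldest and newest retained entry.
  windowSeconds() {
    if (this.entries.length === 0) return 0;
    return this.entries[this.entries.length - 1].tsSeconds - this.entries[0].tsSeconds;
  }
}

// Same write rate (one operation per minute), different operation sizes.
const smallOps = new CappedOplog(1000);
const largeOps = new CappedOplog(1000);
for (let minute = 0; minute < 10; minute++) {
  smallOps.append(minute * 60, 100);
  largeOps.append(minute * 60, 400);
}
console.log(smallOps.windowSeconds()); // 540
console.log(largeOps.windowSeconds()); // 60
```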

What is Replication Lag?

• A delay between an operation on the primary and the application of that operation from the oplog to the secondary.

• Effects of excessive lag

    • “Lagged” members ineligible to quickly become primary

    • Increases the possibility that distributed read operations will be inconsistent.
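
Lag per secondary is simply the difference between the primary's last applied operation time and the secondary's. The sketch below computes it from objects shaped like the `members` array returned by `rs.status()` (using its `name`, `stateStr` and `optimeDate` fields); the host names and timestamps are invented, and the function name is hypothetical.

```javascript
// Compute replication lag in seconds for each secondary.
function replicationLagSeconds(members) {
  const primary = members.find((m) => m.stateStr === 'PRIMARY');
  if (!primary) throw new Error('no primary in member list');
  return members
    .filter((m) => m.stateStr === 'SECONDARY')
    .map((m) => ({
      name: m.name,
      lagSeconds: (primary.optimeDate - m.optimeDate) / 1000,
    }));
}

// Invented sample data for illustration.
const sampleMembers = [
  { name: 'db1:27017', stateStr: 'PRIMARY', optimeDate: new Date('2014-06-01T12:00:30Z') },
  { name: 'db2:27017', stateStr: 'SECONDARY', optimeDate: new Date('2014-06-01T12:00:28Z') },
  { name: 'db3:27017', stateStr: 'SECONDARY', optimeDate: new Date('2014-06-01T11:58:30Z') },
];

console.log(replicationLagSeconds(sampleMembers));
// db2 is 2 seconds behind; db3 is 120 seconds behind
```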

How Did This Affect Us?

• Degraded overall health of our production deployment.

• Distributed reads are no longer eventually consistent.

• Unable to bring new secondary members online.

• Caused MMS Backups to do full re-syncs.

Identifying Replication Lag Issues with MMS

The Replication Lag chart displays the lag for your deployment

Diagnosis

• Possible causes of replication lag include network latency, disk throughput, concurrency and/or write concern

• Size of operations to be replicated

• Confirmed non-issues for us

    • Network latency

    • Disk throughput

• Possible issues for us

    • Concurrency/write concern

    • Size of operations, because the entire document is written to the oplog

Concurrency/Write Concern

• Our applications apply many updates very quickly

• All operations need to be replicated to secondary members

• We use the default write concern: Acknowledged

    • The mongod confirms receipt of the write operation

    • Allows clients to catch network, duplicate key and other errors

Concurrency Wasn’t the Issue

Lock Percentage

Operation Size Was the Issue

Collection A (most active)

Total Updates: 3,373

Total Size of updates: 6.5 GB

Activity accounted for nearly 87% of total traffic

Collection B (next most active)

Total Updates: 85,423

Total Size of updates: 740 MB
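
Dividing total size by update count makes the difference between the two collections explicit. The figures come from the slide; the function name and the choice of binary units (1 GB = 1024 MB, 1 MB = 1024 KB) are assumptions of this example.

```javascript
// Average size of one update in KB, given total MB and number of updates.
function averageUpdateKB(totalMB, updateCount) {
  return (totalMB * 1024) / updateCount;
}

const collectionA = averageUpdateKB(6.5 * 1024, 3373); // 6.5 GB over 3,373 updates
const collectionB = averageUpdateKB(740, 85423);       // 740 MB over 85,423 updates

console.log(collectionA.toFixed(0), 'KB per update'); // roughly 2 MB each
console.log(collectionB.toFixed(1), 'KB per update'); // roughly 9 KB each
console.log((collectionA / collectionB).toFixed(0), 'times larger on average');
```

Collection A made about 25 times fewer updates than Collection B, but each one was over two hundred times larger on average.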

Fast Growing oplog Causes Issues

Replication oplog Window: approximate hours available in the primary’s oplog

How We Fixed It

• Changed our schema

• Changed the types of updates that were made to documents

• Both allowed us to utilize atomic operations

    • Led to smaller updates

    • Smaller updates == less oplog space used
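
A rough way to see the size difference is to compare a whole-document rewrite with a targeted operator update. The sketch uses JSON length as a stand-in for BSON size, which is only an approximation, and the document shape is invented. What actually lands in the oplog depends on the update type and MongoDB version, so treat this as an illustration of the idea rather than a measurement.

```javascript
// Approximate payload size in bytes (JSON used as a rough proxy for BSON).
const approxBytes = (value) => Buffer.byteLength(JSON.stringify(value), 'utf8');

// Hypothetical document with a large embedded array.
const postDoc = {
  _id: 'post-1',
  likes: 41,
  comments: Array.from({ length: 500 }, (_, i) => ({
    by: `user${i}`,
    text: 'x'.repeat(100),
  })),
};

// Option 1: read the document, change one field, write the whole thing back.
const fullRewrite = { ...postDoc, likes: postDoc.likes + 1 };

// Option 2: an atomic operator update that touches only the changed field.
const targetedUpdate = { $inc: { likes: 1 } };

console.log(approxBytes(fullRewrite), 'bytes for the full rewrite');
console.log(approxBytes(targetedUpdate), 'bytes for the targeted update'); // 20
```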

Replication Lag: Today

oplog Window: Today

Keeping Your Deployment Healthy

MMS Alerts

Watch for Warnings

• Be warned if you:

    • Are running outdated versions

    • Have startup warnings

    • Have a mongod that is publicly visible

• Pay attention to these warnings

MMS Backups

• Engineered by MongoDB

• Continuous backup with point-in-time recovery

• Fully managed backups

Using MMS Backups

• Seeding new secondaries

• Repairing replica set members

• Development and testing databases

• Restores are free!

Summary

• Know what’s expected and “normal” in your systems

• Know when and what changes in your systems

• Utilize MMS alerts, visualizations and warnings to keep things running smoothly

Questions?

Michael De Lorenzo

CTO, CMP.LY Inc.

michael@cmp.ly

@mikedelorenzo
