
Trust but verify: A year with Cassandra and the hunt for native memory JVM leaks

Chris Burroughs

Clearspring

2011-08-22

Chris Burroughs (Clearspring) Trust but verify 2011-08-22 1 / 34

1 Introduction

2 Cassandra at Clearspring

3 Some Definitions

4 Time-line of the Hunt

5 Conclusions

1 Introduction

Hello!

Chris Burroughs [email protected]

Active in the Apache Cassandra and (incubating) Kafka communities

A few mostly minor tickets: 1966, 2082, 2551

http://www.meetup.com/Cassandra-DC-Meetup/

We are hiring

http://www.clearspring.com/about/careers

What is this talk about?

Some of what we learned after using Cassandra for a year.

Particularly as we struggled with unbounded RES (resident memory) growth. Most of this is applicable to any JVM program.

I’ve tried to explain things when they make sense, not chronologically when we figured them out. (But feel free to ask questions.)

Disclaimers

I have come out of this with a generally positive view of Cassandra, even though getting there sucked.

This is mostly about what I learned; to the extent that there were “discoveries”, they were made by others.

2 Cassandra at Clearspring

Sharecounter

Capacity planning conundrum: The counter will account for between 0 and 100% of views within ? days?/weeks?/months?

Primary considerations: Proven, incremental, horizontal scalability. Tolerance to individual node failures.

Tangent: Counters

We did not use CASSANDRA-1072 counters.

(Probably will in the future, depending on the results of SSTable compression tests.)

3 Some Definitions

JVM: Cocoon

JVM, on heap: “Normal” place for allocation. You can set a max size of n bytes.

- Max heap size seems to get a reasonable amount of respect from the JVM.

- But the heap can fragment and take up more than n bytes. This is difficult to detect.

JVM, off heap: Give me some bytes! You can either use DirectByteBuffers yourself, or it’s likely that you use a library that does (NIO).

JVM, permgen: Classes and stuff like that.

Other: Hotspot is a C++ program. It can use memory for whatever it needs to do.
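To make the regions above concrete, here is a minimal sketch (not from the talk) that asks the JVM for its own view of heap and non-heap usage and then makes one off-heap allocation. The class and method names are made up for illustration. Note that the "non-heap" figure from MemoryMXBean covers JVM-managed areas such as class metadata and the code cache; direct buffers are not part of it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.nio.ByteBuffer;

// Hypothetical helper class, for illustration only.
class MemoryRegionsSketch {

    /** Heap bytes currently in use, as reported by the JVM itself. */
    static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** Max heap size (what -Xmx controls), or -1 if the JVM reports it as undefined. */
    static long heapMaxBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
    }

    /** Off-heap allocation: the bytes live outside the Java heap. */
    static ByteBuffer allocateOffHeap(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ByteBuffer offHeap = allocateOffHeap(16 * 1024 * 1024);

        System.out.println("heap used     = " + heapUsedBytes());
        System.out.println("heap max      = " + heapMaxBytes());
        // JVM-managed non-heap pools only; the direct buffer above is not counted here.
        System.out.println("non-heap used = " + memory.getNonHeapMemoryUsage().getUsed());
        System.out.println("direct buffer = " + offHeap.capacity() + " bytes, isDirect=" + offHeap.isDirect());
    }
}
```

None of these numbers include memory that Hotspot itself allocates natively, which is exactly why they can look healthy while the process keeps growing.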

Linux: Harsh Reality

Resident set size (RSS): The least bad measure of how much memory a process is using.

mmap(2): mmap-ed files are counted as part of your PID’s RSS. Reduces visibility (have fun with pmap and friends), may be faster.

Linux does not care about your nice heap abstractions; it’s just another process.
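One way to see the gap between the two views is to read the process's RSS from inside the JVM and print it next to the heap size. This is a rough, Linux-only sketch that assumes /proc/self/status is available and contains a VmRSS line; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper class, for illustration only.
class RssSketch {

    /** Extracts the VmRSS value (in kB) from the lines of /proc/<pid>/status, or -1 if absent. */
    static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                // The line looks like "VmRSS:    204800 kB".
                String[] fields = line.substring("VmRSS:".length()).trim().split("\\s+");
                return Long.parseLong(fields[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Linux-specific: /proc/self/status describes the current process.
        List<String> lines = Files.readAllLines(Paths.get("/proc/self/status"), StandardCharsets.UTF_8);
        long rssKb = parseVmRssKb(lines);
        long heapCommittedKb = Runtime.getRuntime().totalMemory() / 1024;

        System.out.println("RSS (kernel view)         = " + rssKb + " kB");
        System.out.println("heap committed (JVM view) = " + heapCommittedKb + " kB");
    }
}
```

Graphing both values over time is a cheap way to tell heap growth apart from growth somewhere else in the process.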

Wizard is Out Of Mana

JVM: OutOfMemoryError → Nice log messages with a clue to what happened.

Linux: The kernel needs more memory → it kills processes until it’s satisfied. Check dmesg.

4 Time-line of the Hunt

First test stack failure

A node in the test stack dies on 2010-10-10 at 3:15pm.

Around this time there was a large and unexplained increase in CPU utilization.

$ dmesg | grep -i oom

syslogd invoked oom-killer: gfp_mask=0x200d2, order=0, oomkilladj=0

java invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0

That was weird, decrease max heap size and forget about it.
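If you want to alert on this instead of grepping by hand, a small scanner over kernel log lines is enough. The sketch below assumes lines of the form shown above ("<process> invoked oom-killer: ..."); the class name and regular expression are illustrative, and other kernel versions may word the message differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class OomKillerScan {

    // Captures the process name in front of "invoked oom-killer".
    private static final Pattern INVOKED = Pattern.compile("(\\S+) invoked oom-killer");

    /** Returns the process names that triggered the OOM killer, in log order. */
    static List<String> invokers(List<String> kernelLogLines) {
        List<String> names = new ArrayList<>();
        for (String line : kernelLogLines) {
            Matcher m = INVOKED.matcher(line);
            if (m.find()) {
                names.add(m.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "syslogd invoked oom-killer: gfp_mask=0x200d2, order=0, oomkilladj=0",
                "java invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0");
        System.out.println(invokers(lines));
    }
}
```

Keep in mind that the process named on these lines is the one whose allocation triggered the OOM killer, which is not necessarily the process that was killed.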

About a month later . . .

All production servers die within an hour or so of each other.

On failures

We often model as if failures are uncorrelated.

This isn’t really true for hardware (i.e. same model disks), but it definitely is not true for software.

Monitoring!

We get a graph like this:

More Monitoring!

Start rolling restarts every few weeks.

January: reduced cached mem; resident set size growth

Armed with graphs we started posting on cassandra-users.

1 Hotspot version (we upgraded, no difference)

2 permgen? (nope, checked that)

3 mmap? (nope, disabled that a long time ago)

4 swap? (Not currently swapping)

5 Heap Fragmentation (Well that’s interesting, have fun with jemalloc)

This smelled like a JVM/glibc/kernel bug, but we are faced with the fact that it only occurs when we are running Cassandra.
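For the permgen check in the list above, one option is to ask the JVM for its non-heap memory pools and watch whether any of them grows. This is only a sketch of how such a check could look, not what we actually ran; pool names vary by JVM version and collector (for example "PS Perm Gen" on older Hotspot, "Metaspace" on newer ones), so the code makes no assumption about them.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class NonHeapPools {

    /** Used bytes for every non-heap pool the JVM reports, keyed by pool name. */
    static Map<String, Long> nonHeapUsedBytes() {
        Map<String, Long> used = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.NON_HEAP) {
                MemoryUsage usage = pool.getUsage();
                if (usage != null) { // getUsage() returns null for an invalid pool
                    used.put(pool.getName(), usage.getUsed());
                }
            }
        }
        return used;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Long> entry : nonHeapUsedBytes().entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue() + " bytes");
        }
    }
}
```

If these pools stay flat while RSS climbs, the growth is somewhere the JVM does not account for.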

Tangent: Rolling restarts and caches

Refresher on caches:

key cache: Caches location of keys

row cache: Caches entire rows

Also, the OS page cache

Cassandra can persist the entire key cache, and can persist the row keys for the row cache, but not the rows themselves.

Tangent: Rolling restarts and caches, shoot ourselves in the foot

Before cache savings:

Size the row cache to get the best hit rate vs heap size trade-off.

Restart node.

Node can’t handle reads, drops messages for a while. Not safe to restart another one until it stops.

After:

Size the row cache to get the best hit rate vs heap size trade-off.

Persist row cache keys.

Restart node.

Wait half an hour for all rows to be read; node now has a pile of hinted handoffs to deal with.

Tangent: Rolling restarts and caches, right answer

Options:

1 Something hacky to save row values along with row keys and be inconsistent.

2 Something hacky to save a random set of row keys and hope that helps.

3 Modify CLHM (ConcurrentLinkedHashMap) to allow traversal in hotness order.

4 Recognize that this is a sign you need more capacity.
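To illustrate what "traversal in hotness order" buys you, here is a toy LRU cache that can hand back only its N most recently used keys, so a restart would warm the hottest rows instead of all of them. It uses java.util.LinkedHashMap in access order as a stand-in; it is not CLHM, it is not thread-safe, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a row cache, for illustration only. Not thread-safe.
class HotKeysSketch<K, V> {

    private final LinkedHashMap<K, V> rows;

    HotKeysSketch(final int maxRows) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.rows = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxRows; // evict the least recently used row
            }
        };
    }

    V get(K key) {
        return rows.get(key); // counts as an access and moves the key to the hot end
    }

    void put(K key, V row) {
        rows.put(key, row);
    }

    /** Up to n keys, hottest (most recently used) first. */
    List<K> hottestKeys(int n) {
        List<K> keys = new ArrayList<>(rows.keySet()); // iteration does not change the order
        Collections.reverse(keys);
        return new ArrayList<>(keys.subList(0, Math.min(n, keys.size())));
    }

    public static void main(String[] args) {
        HotKeysSketch<String, String> cache = new HotKeysSketch<>(3);
        cache.put("a", "row-a");
        cache.put("b", "row-b");
        cache.put("c", "row-c");
        cache.get("a");          // "a" becomes hot again
        cache.put("d", "row-d"); // evicts "b", the coldest key
        System.out.println(cache.hottestKeys(2));
    }
}
```

Saving only the keys returned by something like hottestKeys keeps the post-restart warm-up bounded, at the cost of colder rows missing the cache until they are read again.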

Tangent: Rolling restarts and caches, CASSANDRA-1966

Ben Manes Google Alert:

This example it would be a fair usage and justification of ordered iteration. Its a trivial change, but its an enhancement I’ve avoided eagerly performing until a project considers it a worthwhile feature.

1.0 will have a row cache keys to save option.

CASSANDRA-2654

Work around native heap leak in sun.nio.ch.Util affecting IncomingTcpConnection

Java bug #6210541

Deep in the bowels of Java NIO is a cache of weak references to direct byte buffers.

That’s a painfully broken design.

CASSANDRA-2654 works around it. But this isn’t really a “leak”, since eventually a full GC should clean them up.

More attempts

Tried to audit the use of DirectByteBuffers

- -XX:MaxDirectMemorySize

Opened a ticket with Oracle.

- Has not gone anywhere yet.

Survey on the user list

- No pattern among kernel, OS, hotspot, or other software versions.
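On newer JVMs the direct buffer audit is easier than it was for us. The sketch below assumes Java 7 or later, where BufferPoolMXBean reports a pool named "direct"; that API was not what we used at the time, and the class and method names are hypothetical.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper class, for illustration only. Requires Java 7+.
class DirectBufferAudit {

    /** Bytes currently held by the JVM's "direct" buffer pool, or -1 if no such pool is reported. */
    static long directBytesUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = directBytesUsed();
        ByteBuffer buffer = ByteBuffer.allocateDirect(8 * 1024 * 1024);
        long after = directBytesUsed();

        System.out.println("direct bytes before = " + before);
        System.out.println("direct bytes after  = " + after);
        System.out.println("allocated           = " + buffer.capacity());
    }
}
```

If this number stays small and stable while RSS keeps growing, direct buffers are probably not the culprit, which matches where our audit ended up.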

Hark, a Tweet!

http://twitter.com/#!/kimchy/status/90861039930970113

Java Bug 7066129

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import java.util.List;

    public class TestMemoryLeak {
        public static void main(String[] args) throws Exception {
            // Loop forever so the growth in RSS is easy to observe from outside the JVM.
            while (true) {
                List<GarbageCollectorMXBean> gcMxBeans = ManagementFactory.getGarbageCollectorMXBeans();
                for (GarbageCollectorMXBean gcMxBean : gcMxBeans) {
                    // The cast exposes the Sun-specific getLastGcInfo(), the call implicated in bug 7066129.
                    ((com.sun.management.GarbageCollectorMXBean) gcMxBean).getLastGcInfo();
                }
            }
        }
    }

CASSANDRA-2868

Several people verified that disabling the GCInspector (which calls GarbageCollectorMXBean#getLastGcInfo) keeps RSS from increasing.

There is a patch that tries to get similar data through another set of methods.
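As an illustration of what "another set of methods" could look like, the standard java.lang.management.GarbageCollectorMXBean interface exposes cumulative collection counts and times without going through getLastGcInfo. The sketch below is not the actual patch; it only assumes that sampling these totals and diffing them gives roughly similar information, and the class and method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, for illustration only.
class GcTotalsSketch {

    /** Total number of collections across all collectors; undefined (-1) values are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds; undefined (-1) values are skipped. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long countBefore = totalCollections();
        long millisBefore = totalCollectionMillis();

        System.gc(); // only a hint; the JVM may ignore it

        System.out.println("collections since last sample = " + (totalCollections() - countBefore));
        System.out.println("gc millis since last sample   = " + (totalCollectionMillis() - millisBefore));
    }
}
```

Diffing cumulative totals loses the per-collection detail that getLastGcInfo provides, but it is enough to notice long or frequent collections.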

5 Conclusions

Conclusions

Happy to be spending less time with Cassandra for a while.

There are bugs in Hotspot, your file system, RHEL5 and everything else you think is infallible.

I think page cache management is the open question right now.

Thoughts on Upcoming Cassandra changes

Very excited about the alternative SSTable format in CASSANDRA-674 and friends (type specific data compression, compressed index, row cache as row+filter, etc.)

Once burned, twice shy: Terrified of off heap data structures, but it looks like we didn’t go down that path after all. (CASSANDRA-2252)

Questions?
