Dealing with JVM limitations in Apache Cassandra (FOSDEM 2012)
![Page 1: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/1.jpg)
Dealing with JVM limitations in Apache Cassandra
Jonathan Ellis / @spyced
![Page 2: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/2.jpg)
Pain points for Java databases
✤ GC
✤ GC
✤ GC
![Page 3: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/3.jpg)
Pain points for Java databases
✤ GC
✤ Platform-specific code
![Page 4: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/4.jpg)
GC
✤ Concurrent and compacting: choose one
✤ G1
✤ Azul C4 / Zing?
![Page 5: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/5.jpg)
Fragmentation
✤ Bloom filter arrays
✤ Compression offsets
![Page 6: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/6.jpg)
Automatic mitigation?
✤ http://www.research.ibm.com/people/d/dfb/papers/Bacon03Controlling.pdf
✤ http://researcher.ibm.com/files/us-hirzel/pldi10-arraylets.pdf
![Page 7: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/7.jpg)
Fragmentation, 2
✤ Arena allocation for memtables
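Arena allocation can be sketched roughly as follows: rather than letting every value keep its own small array on the heap, values are copied into large fixed-size regions, so that dropping a flushed memtable frees whole regions at once and leaves fewer small holes behind. The class name `ArenaAllocator`, the region size, and the handling of oversized values are assumptions made for illustration; this is not Cassandra's actual allocator.

```java
import java.nio.ByteBuffer;

/** Hypothetical bump-pointer arena: hands out slices of large regions. */
final class ArenaAllocator {
    private final int regionSize;
    private ByteBuffer current;    // region currently being carved up
    private int regionsAllocated;  // number of large regions created so far

    ArenaAllocator(int regionSize) {
        this.regionSize = regionSize;
    }

    /** Returns a buffer of exactly size bytes carved out of a region. */
    synchronized ByteBuffer allocate(int size) {
        if (size > regionSize) {
            // Assumption: oversized values get a dedicated buffer.
            return ByteBuffer.allocate(size);
        }
        if (current == null || current.remaining() < size) {
            current = ByteBuffer.allocate(regionSize);
            regionsAllocated++;
        }
        ByteBuffer view = current.duplicate();
        view.limit(view.position() + size);
        current.position(current.position() + size); // bump the pointer
        return view.slice();
    }

    /** Copies value into the arena so the original array can die young. */
    ByteBuffer copy(byte[] value) {
        ByteBuffer target = allocate(value.length);
        target.duplicate().put(value); // write without moving target's position
        return target;
    }

    synchronized int regionsAllocated() {
        return regionsAllocated;
    }
}
```

Many small values end up sharing one region, so the old generation sees a few large allocations instead of many small, long-lived ones.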
![Page 8: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/8.jpg)
(Memtables?)
[Diagram: write(k1, c1:v1) arrives. Memory: Memtable. Hard drive: Commit log.]
![Page 9: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/9.jpg)
[Diagram: write(k1, c1:v)]
Memory (Memtable): k1 c1:v
Hard drive (Commit log): k1 c1:v
![Page 10: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/10.jpg)
[Diagram: write(k1, c2:v)]
Memory (Memtable): k1 c1:v c2:v
Hard drive (Commit log): k1 c1:v; k1 c2:v
![Page 11: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/11.jpg)
[Diagram: write(k2, c1:v c2:v)]
Memory (Memtable): k1 c1:v c2:v; k2 c1:v c2:v
Hard drive (Commit log): k1 c1:v; k1 c2:v; k2 c1:v c2:v
![Page 12: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/12.jpg)
[Diagram: write(k1, c1:v c3:v)]
Memory (Memtable): k1 c1:v c2:v c3:v; k2 c1:v c2:v
Hard drive (Commit log): k1 c1:v; k1 c2:v; k2 c1:v c2:v; k1 c1:v c3:v
![Page 13: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/13.jpg)
[Diagram: flush from Memory to an SSTable on the Hard drive; labels: index, cleanup]
SSTable: k1 c1:v c2:v c3:v; k2 c1:v c2:v
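The write path in the preceding diagrams can be sketched as follows: each write is appended to a commit log and merged into a sorted in-memory memtable, and a flush turns the memtable into an immutable sorted table. Everything here is a simplification for illustration (the class `TinyStore`, in-memory lists standing in for files on disk, string keys and values), and reading "cleanup" as discarding the flushed commit log entries is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical toy model of the memtable / commit log / SSTable cycle. */
final class TinyStore {
    // Stand-in for the on-disk commit log: one entry per write, in arrival order.
    final List<String> commitLog = new ArrayList<>();
    // Stand-in for SSTables on disk: immutable sorted snapshots.
    final List<SortedMap<String, SortedMap<String, String>>> sstables = new ArrayList<>();
    // Memtable: row key -> (column name -> value), both kept sorted.
    private SortedMap<String, SortedMap<String, String>> memtable = new TreeMap<>();

    void write(String key, Map<String, String> columns) {
        commitLog.add(key + " " + columns);               // append for durability first
        memtable.computeIfAbsent(key, k -> new TreeMap<>())
                .putAll(columns);                         // then merge into the row
    }

    void flush() {
        sstables.add(Collections.unmodifiableSortedMap(memtable));
        memtable = new TreeMap<>();                       // start a fresh memtable
        commitLog.clear();                                // assumption: "cleanup" step
    }

    SortedMap<String, String> memtableRow(String key) {
        return memtable.get(key);
    }
}
```

Replaying the slide sequence (two writes to k1, one to k2) leaves three commit log entries but only two merged rows in the memtable, which is the asymmetry the diagrams show.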
![Page 14: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/14.jpg)
“Java is a memory hog”
✤ Large overhead for typical objects and collections
✤ How large?
✤ java.lang.instrument.Instrumentation
✤ JAMM: Java Agent for Memory Measurements
✤ https://github.com/jbellis/jamm
![Page 15: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/15.jpg)
org.apache.cassandra.cache.SerializingCache
✤ Live objects are about 85% JVM bookkeeping
✤ org.apache.cassandra.cache.FreeableMemory using reference counting
✤ Considering doing reference-counted, off-heap memtables as well
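A minimal sketch of reference-counted off-heap memory, in the spirit of the bullets above: the bytes live in a direct buffer outside the Java heap, and readers bracket their access with `reference()` and `unreference()` so the owner knows when the memory may be released. The class name `RefCountedMemory` and its methods are illustrative assumptions, not the actual `FreeableMemory` API. At count zero this sketch merely drops its reference to the buffer, where a real implementation would presumably free the native memory explicitly.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reference-counted wrapper around an off-heap buffer. */
final class RefCountedMemory {
    private final AtomicInteger references = new AtomicInteger(1); // the owner's reference
    private volatile ByteBuffer buffer;

    RefCountedMemory(int size) {
        buffer = ByteBuffer.allocateDirect(size); // allocated outside the Java heap
    }

    /** Tries to take a reference; fails once the memory has been released. */
    boolean reference() {
        while (true) {
            int n = references.get();
            if (n <= 0) {
                return false;
            }
            if (references.compareAndSet(n, n + 1)) {
                return true;
            }
        }
    }

    /** Drops a reference; whoever drops the last one releases the memory. */
    void unreference() {
        if (references.decrementAndGet() == 0) {
            buffer = null; // sketch only: a real version would free native memory here
        }
    }

    boolean isFreed() {
        return buffer == null;
    }

    void put(int index, byte value) {
        live().put(index, value);
    }

    byte get(int index) {
        return live().get(index);
    }

    private ByteBuffer live() {
        ByteBuffer b = buffer;
        if (b == null) {
            throw new IllegalStateException("memory already freed");
        }
        return b;
    }
}
```

A cache can drop its own reference when an entry is evicted, and the memory stays valid until the last in-flight reader calls `unreference()`.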
![Page 16: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/16.jpg)
Don’t forget about young gen
✤ Always stop-the-world for ~100ms
![Page 17: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/17.jpg)
Platform-specific code
✤ OS
✤ JVM
![Page 18: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/18.jpg)
m[un]map
✤ Log-structured storage wants to remove old files post-compaction; some platforms disallow deleting open files
✤ Old workaround (pre-1.0):
✤ use PhantomReference to tell when the mmap’d file has been GC’d (hence unmapped)
✤ Poor user experience and messy corner cases
✤ New workaround:
✤ Class.forName("sun.nio.ch.DirectBuffer").getMethod("cleaner")
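The reflective call named in the last bullet might be wrapped roughly like this. It is a sketch under assumptions: the helper name `Unmapper.tryClean` is made up, the code relies on JDK-internal classes that are not a supported API, and on newer JVMs with strong encapsulation the reflective access is likely to be refused, in which case the method simply reports `false`.

```java
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

/** Hypothetical helper: best-effort explicit release of a direct or mapped buffer. */
final class Unmapper {
    /**
     * Returns true if the buffer's cleaner was invoked, false if the buffer is
     * not direct or the JVM does not allow the reflective access.
     * After a successful call the buffer must never be touched again.
     */
    static boolean tryClean(ByteBuffer buffer) {
        if (buffer == null || !buffer.isDirect()) {
            return false; // heap buffers have nothing to unmap
        }
        try {
            // JDK-internal interface implemented by direct buffers (unsupported API).
            Method cleanerMethod =
                    Class.forName("sun.nio.ch.DirectBuffer").getMethod("cleaner");
            Object cleaner = cleanerMethod.invoke(buffer);
            if (cleaner == null) {
                return false; // e.g. a slice or duplicate that does not own the memory
            }
            cleaner.getClass().getMethod("clean").invoke(cleaner);
            return true;
        } catch (ReflectiveOperationException | RuntimeException e) {
            return false; // internals missing or inaccessible on this JVM
        }
    }
}
```

Touching a buffer after it has been cleaned can crash the JVM, which is exactly the segfault risk the talk returns to later.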
![Page 19: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/19.jpg)
mmap part 2
✤ 2GB limit via ByteBuffer: public abstract byte get(int index)
✤ Workaround: MmappedSegmentedFile: public Iterator<DataInput> iterator(long position)
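Because `ByteBuffer` is indexed with an `int`, a single buffer cannot address more than `Integer.MAX_VALUE` bytes. The usual way around that is to split a `long` address space into fixed-size segments and translate each position into a segment index plus an offset. The sketch below shows only that translation. The class `SegmentedBuffer` is hypothetical, it is not Cassandra's `MmappedSegmentedFile`, and heap buffers stand in for mapped ones so that it runs without a file.

```java
import java.nio.ByteBuffer;

/** Hypothetical long-addressable buffer built from int-addressable segments. */
final class SegmentedBuffer {
    private final ByteBuffer[] segments;
    private final int segmentSize;
    private final long length;

    SegmentedBuffer(long length, int segmentSize) {
        this.length = length;
        this.segmentSize = segmentSize;
        int count = (int) ((length + segmentSize - 1) / segmentSize); // round up
        segments = new ByteBuffer[count];
        for (int i = 0; i < count; i++) {
            long remaining = length - (long) i * segmentSize;
            // With a real file, each segment would come from FileChannel.map instead.
            segments[i] = ByteBuffer.allocate((int) Math.min(segmentSize, remaining));
        }
    }

    byte get(long position) {
        return segments[segmentIndex(position)].get(offset(position));
    }

    void put(long position, byte value) {
        segments[segmentIndex(position)].put(offset(position), value);
    }

    /** Which segment holds the given absolute position. */
    int segmentIndex(long position) {
        if (position < 0 || position >= length) {
            throw new IndexOutOfBoundsException("position " + position);
        }
        return (int) (position / segmentSize);
    }

    private int offset(long position) {
        return (int) (position % segmentSize);
    }
}
```

With a real file, each segment could be created with `FileChannel.map(mode, start, size)`, whose `start` and `size` arguments are already `long` values.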
![Page 20: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/20.jpg)
link
✤ Used for snapshots
✤ Old workaround: JNA
✤ New workaround: supported directly by Java 7
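In Java 7 the hard-link call is `java.nio.file.Files.createLink`. A snapshot built on it might look like the sketch below, which links every file of a data directory into a snapshot directory so that the snapshot survives deletion of the originals. The class `SnapshotLinks`, the directory layout, and the file name `example-Data.db` are assumptions for illustration, and the filesystem must support hard links.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical snapshot helper based on hard links. */
final class SnapshotLinks {
    /** Hard-links every regular file in dataDir into snapshotDir; returns the count. */
    static int snapshot(Path dataDir, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        int linked = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dataDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    // createLink(link, existing): the new name shares the same data.
                    Files.createLink(snapshotDir.resolve(file.getFileName()), file);
                    linked++;
                }
            }
        }
        return linked;
    }

    /** Usage example: the snapshot still holds the data after the original is deleted. */
    static String demo() {
        try {
            Path root = Files.createTempDirectory("snapshot-demo");
            Path data = Files.createDirectory(root.resolve("data"));
            Path original = data.resolve("example-Data.db"); // made-up file name
            Files.write(original, "k1 c1:v".getBytes(StandardCharsets.UTF_8));
            snapshot(data, root.resolve("snapshot"));
            Files.delete(original); // e.g. removed after compaction
            Path kept = root.resolve("snapshot").resolve("example-Data.db");
            return new String(Files.readAllBytes(kept), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A hard link copies no data, so a snapshot costs almost nothing however large the files are.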
![Page 21: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/21.jpg)
mlockall
✤ swappiness: pissing off database developers since 2001 (?)
✤ mlockall(MCL_CURRENT)
![Page 22: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/22.jpg)
Low-level i/o
✤ posix_fadvise
✤ mincore/fincore
✤ fcntl
✤ ... JNA
![Page 23: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/23.jpg)
A plug for JNA
✤ https://github.com/twall/jna
static { try { Native.register("c"); ...
private static native int mlockall(int flags) throws LastErrorException;
![Page 24: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/24.jpg)
The fallacy of choosing portability over power
✤ Applets have been dead for years
✤ Python gets it right
✤ import readline
![Page 25: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/25.jpg)
The fallacy of choosing safety over power
✤ Allowing munmap would expose developers to segfaults
✤ But, relying on the GC to clean up external resources is a well-known antipattern
✤ File.close
✤ We need munmap badly enough that we resort to unnatural and unportable code to get it
✤ You haven’t kept us from risking segfaults, you’ve just made us miserable
![Page 26: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/26.jpg)
Compatibility through obscurity?
✤ sun.misc.Unsafe
✤ Used by high-profile libraries like high-scale-lib
![Page 27: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/27.jpg)
... even public options
http://blogs.oracle.com/dave/entry/false_sharing_induced_by_card
![Page 28: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/28.jpg)
Too negative?
![Page 29: Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)](https://reader033.vdocument.in/reader033/viewer/2022052619/555c2666d8b42a09438b4ccc/html5/thumbnails/29.jpg)
Still true
✤ "Many concurrent algorithms are very easy to write with a GC and totally hard (to down right impossible) using explicit free." -- Cliff Click
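One way to read the quote is through a lock-free stack. The version below is the textbook Treiber stack, not code from the talk. It stays this short in Java because a popped node cannot be recycled while another thread still holds a reference to it, so the compare-and-set never succeeds against a reused node. With explicit free, the same algorithm needs extra machinery such as hazard pointers to be safe.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Textbook Treiber stack; relies on the GC for safe node reclamation. */
final class LockFreeStack<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    void push(T value) {
        Node<T> oldHead;
        Node<T> newHead;
        do {
            oldHead = head.get();
            newHead = new Node<>(value, oldHead);
        } while (!head.compareAndSet(oldHead, newHead)); // retry if another thread won
    }

    /** Returns the most recently pushed value, or null if the stack is empty. */
    T pop() {
        Node<T> oldHead;
        do {
            oldHead = head.get();
            if (oldHead == null) {
                return null;
            }
        } while (!head.compareAndSet(oldHead, oldHead.next));
        return oldHead.value; // the node itself is simply left for the GC
    }
}
```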