WORK WITH MULTIPLE HOT TERABYTES IN JVMS
PER MINBORG (@PMinborg), CTO, SPEEDMENT, INC.
See all the presentations from the In-Memory Computing Summit at http://imcsummit.org
ABOUT PER
SCENARIO
>1 TB
Application
Source of Truth
In-JVM-Cache
In-Memory Solution
Web Shop, Stock Trade, Bank, Machine Learning, Etc.
PROS OF IN-MEMORY
• Improved performance
• Consistent performance
• Cost reduction (server, AWS and licenses)
CHALLENGES OF IN-MEMORY
• Optimized Speed
• Cost and size of Memory
• Consistency, Restart, DB impact, etc.
• Organization and size of JVMs
OPTIMIZED SPEED
No matter how advanced a database you use, it is really data locality that counts
Eventually, memory will cost less than x $/GB (Pick any x)
LATENCIES USING THE SPEED OF LIGHT
• Database query (1 s)
• Disk Seek – LA
• TCP (DC) – SJ
• SSD – Oakland
• Main Memory
• CPU L3 Cache
• CPU L2 Cache
• CPU L1 Cache
CHALLENGES OF IN-MEMORY
• Optimized Speed
• Cost and size of Memory
• Consistency, Restart, DB impact, etc.
• Organization and size of JVMs
How much does 1 GB cost?
BACK TO THE FUTURE
1 GB of memory: $67,000,000,000 -> $720,000 -> $5 -> $0.04
Source: http://www.jcmit.com/memoryprice.htm
CHALLENGES OF IN-MEMORY
• Optimized Speed
• Cost and size of Memory
• Consistency, Restart, DB impact, etc.
• Organization and size of JVMs
CACHE SYNCHRONIZATION STRATEGIES
DUMP AND LOAD
• Dumps are reloaded periodically
• All data elements are reloaded
• Data remains unchanged between reloads
• System restart is just a reload

POLL
• Data is evicted, refreshed or marked as old
• Evicted elements are reloaded
• Data changes all the time
• System restart either warms up the cache or uses a cold cache
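The dump-and-load strategy above can be sketched in a few lines: the whole data set is replaced atomically at each reload, and a restart is just another load. The names (DumpAndLoadCache, loadSnapshot) and the one-minute period are illustrative assumptions, not part of any product API.

```java
import java.util.Map;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Minimal sketch of "Dump and Load" caching: the entire dump is reloaded
// periodically, data remains unchanged between reloads, and readers never
// observe a half-loaded state thanks to the atomic snapshot swap.
public final class DumpAndLoadCache {
    private final AtomicReference<Map<Long, String>> snapshot =
            new AtomicReference<>(Map.of());

    public void start(ScheduledExecutorService scheduler) {
        // Reload the full dump on a fixed schedule; a restart is just a reload.
        scheduler.scheduleAtFixedRate(this::reload, 0, 1, TimeUnit.MINUTES);
    }

    public void reload() {
        snapshot.set(loadSnapshot()); // all data elements replaced at once
    }

    public String get(long id) {
        return snapshot.get().get(id);
    }

    // Stand-in for reading a database dump; a real loader would stream rows.
    static Map<Long, String> loadSnapshot() {
        return Map.of(1L, "alpha", 2L, "beta");
    }
}
```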
CACHE SYNCHRONIZATION STRATEGIES
REACTIVE PERSISTENT CACHING
• Changed data is captured in the database
• Change events are pushed into the cache
• Events are grouped in transactions
• Cache updates are persisted
• Data changes all the time
• On system restart, replay the missed events
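The reactive flow described above can be sketched as a consumer that applies change events one committed transaction at a time and remembers the last applied transaction id so missed events can be replayed after a restart. The event shape here is a hypothetical illustration, not Speedment's actual API.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of reactive persistent caching: change events captured in the
// database are pushed to the cache and applied grouped by transaction.
public final class ReactiveCache {
    public enum Op { INSERT, UPDATE, DELETE }

    public record Event(Op op, long id, String row) {}

    private final Map<Long, String> cache = new ConcurrentHashMap<>();
    // In a real persistent cache this id is itself persisted, so that after
    // a restart only the events missed during down time need replaying.
    private long lastAppliedTx = -1;

    /** Applies all events of one committed transaction as a group. */
    public void applyTransaction(long txId, List<Event> events) {
        if (txId <= lastAppliedTx) return; // already applied; replay is safe
        for (Event e : events) {
            switch (e.op()) {
                case INSERT, UPDATE -> cache.put(e.id(), e.row());
                case DELETE -> cache.remove(e.id());
            }
        }
        lastAppliedTx = txId;
    }

    public String get(long id) {
        return cache.get(id);
    }
}
```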
COMPARISON
                      Dump and Load Caching | Poll Caching                                 | Reactive Persistent Caching
Max Data Age          Dump period           | Eviction time                                | Replication latency (~ms)
Lookup Performance    Consistently instant  | ~20% slower                                  | Consistently instant
Consistency           Eventually consistent | Inconsistent (stale data)                    | Eventually consistent
Database Load         Total size per dump   | Depends on eviction time and access pattern  | Rate of change
Restart               Complete reload       | Eviction time                                | Down time x update rate (~10% of down time)
CHALLENGES OF IN-MEMORY
• Optimized Speed
• Cost and size of Memory
• Consistency, Restart, DB impact, etc.
• Organization and size of JVMs
BIG JVMS WITH TERABYTES OF DATA
Scale Up
• One large JVM handles all data
• Map memory to (SSD-backed) files
• Several JVMs can share data via the file system
• Instant restart

Scale Out
• Have several JVMs in a network
• Use sharding between nodes
• Redundant nodes
CONVENTIONAL JAVA APPLICATIONS
• Java objects live on the heap and are garbage collected periodically
• Garbage collection times increase with the Java heap size
• Garbage collection times increase with the Java heap mutation rate ("the app has hit the GC wall")
• Hard to meet reasonable SLAs with JVMs larger than ~16 GB
• 10 TB of data in 10 GB JVMs -> ~1,000 JVMs
OFF HEAP STORAGE
• Stores data outside of the Java heap
• The garbage collector does not see the content
• Scales up to terabytes of main memory in a single JVM
• Use any number of nodes for scale-out solutions
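The core off-heap idea can be shown with a plain JDK direct buffer: the memory lives outside the Java heap, so the garbage collector never scans its contents. Real off-heap engines (Chronicle Map, etc.) add hashing, persistence, and concurrency on top of the same mechanism; this class is only a minimal illustration.

```java
import java.nio.ByteBuffer;

// A fixed-size array of longs stored in native (off-heap) memory.
// The GC tracks only the small buffer object, never the data itself.
public final class OffHeapLongArray {
    private final ByteBuffer buffer; // native memory, invisible to the GC

    public OffHeapLongArray(int length) {
        this.buffer = ByteBuffer.allocateDirect(length * Long.BYTES);
    }

    public void set(int index, long value) {
        buffer.putLong(index * Long.BYTES, value);
    }

    public long get(int index) {
        return buffer.getLong(index * Long.BYTES);
    }
}
```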
PERSISTENT SCALE OUT CACHE
• Persists data in files or memory-mapped files (SSD backing device recommended)
• 1.3 GB/s reload per node: 10 GB in 6 s, 100 GB in 1 min, 1 TB in 10 min
• 6.5 GB/s reload in a system with 10 nodes (1 active and 1 backup): 10 GB in 1 s, 100 GB in 12 s, 1 TB in 2 min
• 65 GB/s reload in a system with 100 nodes: 1 TB in 12 s
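File mapping, the mechanism behind these reload figures and the "instant restart" claim, is available directly in the JDK: data written through a MappedByteBuffer is backed by the file, so a restarted JVM can map the same file and see the data again without a full reload. The file name and size below are illustrative.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Maps a file into memory for read/write access. The OS pages data in and
// out, so the "cache" survives JVM restarts in the backing file.
public final class MappedStore {
    public static MappedByteBuffer map(Path file, long size) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The mapping remains valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }
}
```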
COMPRESSED OOPS IN JAVA 8
Using –XX:+UseCompressedOops (on by default in 64-bit JVMs) together with –XX:ObjectAlignmentInBytes=16:
A 64-bit JVM can use "compressed" 32-bit memory references. This allows the heap to be up to 64 GB without the overhead of 64-bit object references. Because every object must be 8- or 16-byte aligned, the lower 3 or 4 bits of an address are always zero and do not need to be stored. A 32-bit reference can therefore address 4 billion × 16 bytes, or 64 GB. (With the default 8-byte alignment, the limit is 32 GB.)
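The 64 GB figure follows from simple arithmetic: a 32-bit reference distinguishes 2^32 object slots, and each slot is one alignment unit wide. A small helper (illustrative only, not a JVM API) makes the calculation explicit:

```java
// Maximum heap addressable by a 32-bit compressed reference, given the
// object alignment in bytes: 2^32 slots, each `alignment` bytes apart.
public final class CompressedOops {
    public static long maxHeapBytes(int alignment) {
        return (1L << 32) * alignment;
    }
}
```

With the default 8-byte alignment this yields 32 GB; raising the alignment to 16 bytes doubles the limit to 64 GB at the cost of slightly larger objects.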
JVM SIZE SWEET SPOT
• 50 GB off-heap per node
• 20 nodes per terabyte
• 40 nodes per terabyte with minimum redundancy
CONCLUSIONS
• Get speed by keeping your data close to the application
• RAM is cheap, getting bigger, and ever cheaper
• A consistent solution is possible with Reactive Persistent Caching
• Reactive Persistent Caching imposes minimal load on restart and on the DB
• Scale-up solutions can reach terabytes with virtual memory or file-mapped memory
• Scale-out solutions can use ~50 GB nodes
SOLUTION
>1 TB
Application
In-JVM-Cache
Web Shop, Stock Trade, Bank, Machine Learning, Etc.
Source of Truth
SPEEDMENT
• Java application development tool
• In-JVM-memory cache
• Database SQL Reflector (CDC, Change Data Capture)
• Pluggable storage engines (Speedment, Chronicle Map, Hazelcast, GridGain, etc.)
• Code generation tool -> automatic domain model extraction from databases
• Transaction-aware
SPEEDMENT SCALE UP ULTRA-LOW LATENCY CACHE
• Ultra-low latency (runs in the same JVM as the application)
• Millions of TPS; latencies measured in microseconds
• Supports file mapping; terabytes of data
• O(1) for equality operations, O(log(N)) for other operations
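The complexity claims can be illustrated with plain JDK structures: a hash index answers equality lookups in O(1), while an ordered index answers range and comparison queries in O(log N). This shows the general indexing technique, not Speedment's internal implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Two indexes over the same rows: a HashMap for O(1) equality lookups
// and a TreeMap for O(log N) range/comparison operations.
public final class IndexedCache {
    private final Map<Long, String> byId = new HashMap<>();        // O(1) equality
    private final TreeMap<Long, String> ordered = new TreeMap<>(); // O(log N) ranges

    public void put(long id, String row) {
        byId.put(id, row);
        ordered.put(id, row);
    }

    public String findById(long id) {          // equality: O(1)
        return byId.get(id);
    }

    public SortedMap<Long, String> findGreaterThan(long id) { // range: O(log N)
        return ordered.tailMap(id, false);
    }
}
```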
SPEEDMENT SQL REFLECTOR
• Detects changes in a database
• Buffers the changes
• Can replay the changes later on
• Will preserve order
• Will preserve transactions
• Sees data as it was persisted
• Detects changes from any source
(Diagram: Database emitting INSERT / UPDATE / DELETE events)
DOWNLOAD TRIAL @ WWW.SPEEDMENT.COM
CONNECT TO YOUR EXISTING SQL DB
AUTOMATIC SCHEMA ANALYSIS
PUSH AND PLAY
OFFERINGS
• Complete solutions for in-memory hot big data
• Software licenses
• Service and support
• Consulting