IMC Summit 2016 Breakout - Per Minborg - Work with Multiple Hot Terabytes in JVMs

Post on 09-Jan-2017

WORK WITH MULTIPLE HOT TERABYTES IN JVMS

PER MINBORG
@PMINBORG
CTO, SPEEDMENT, INC.

See all the presentations from the In-Memory Computing Summit at http://imcsummit.org

SPEEDMENT, INC.

ABOUT PER

SCENARIO

>1 TB

Application

Source of Truth

In-JVM-Cache

In-Memory Solution

Web Shop, Stock Trade, Bank, Machine learning, etc.

PROS OF IN-MEMORY

• Improved performance
• Consistent performance
• Cost reduction (server, AWS and licenses)

CHALLENGES OF IN-MEMORY

• Optimized speed
• Cost and size of memory
• Consistency, restart, DB impact, etc.
• Organization and size of JVMs

OPTIMIZED SPEED

No matter how advanced a database you use, it is data locality that really counts

Eventually, memory will cost less than x $/GB (Pick any x)

LATENCIES USING THE SPEED OF LIGHT

[Figure built up over four slides; recovered labels, slowest to fastest:]
• Database query (1 s)
• Disk seek: LA
• TCP (DC): SJ
• SSD: Oakland
• Main memory
• CPU L3 cache
• CPU L2 cache
• CPU L1 cache

HOW MUCH DOES 1 GB COST?

BACK TO THE FUTURE

$ 5

$ 0.04

$ 720,000

$ 67,000,000,000

Source: http://www.jcmit.com/memoryprice.htm

CACHE SYNCHRONIZATION STRATEGIES

DUMP AND LOAD
• Dumps are reloaded periodically
• All data elements are reloaded
• Data remains unchanged between reloads
• System restart is just a reload

POLL
• Data is evicted, refreshed or marked as old
• Evicted elements are reloaded
• Data changes all the time
• System restart either warms up the cache or uses a cold cache
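The dump-and-load strategy can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the speaker's implementation: the class name `DumpAndLoadCache` and the `dumpLoader` supplier (assumed to read a complete dump from the source of truth) are hypothetical, and `Map.copyOf` requires Java 10 or later.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical dump-and-load cache: the whole content is replaced on every reload. */
final class DumpAndLoadCache<K, V> {
    // Assumption: the supplier reads a full dump from the source of truth.
    private final Supplier<Map<K, V>> dumpLoader;
    private volatile Map<K, V> snapshot = Map.of();

    DumpAndLoadCache(Supplier<Map<K, V>> dumpLoader) {
        this.dumpLoader = dumpLoader;
    }

    /** Loads a complete dump and swaps it in atomically; a restart is just this call. */
    void reload() {
        snapshot = Map.copyOf(dumpLoader.get());
    }

    /** Lookups never touch the database; data is unchanged between reloads. */
    V get(K key) {
        return snapshot.get(key);
    }

    /** Schedules periodic reloads; the caller is responsible for shutting the executor down. */
    ScheduledExecutorService reloadEvery(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::reload, 0, period, unit);
        return scheduler;
    }
}
```

Because readers only ever see a complete immutable snapshot, the maximum data age equals the dump period, which is the trade-off the slide describes.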

CACHE SYNCHRONIZATION STRATEGIES

REACTIVE PERSISTENT CACHING
• Changed data is captured in the database
• Changed data events are pushed into the cache
• Events are grouped in transactions
• Cache updates are persisted
• Data changes all the time
• On system restart, the missed events are replayed
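A rough sketch of how a reactive cache might apply change events that are grouped in transactions, and how replay after a restart can be made safe. All names here (`ChangeEvent`, `ReactiveCache`, the monotonically increasing `txId`) are assumptions made for illustration and do not describe Speedment's actual API; persistence of the cache itself is left out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical change event; a null value stands for a DELETE. */
final class ChangeEvent {
    final String key;
    final String value;

    ChangeEvent(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

/** Hypothetical reactive cache that applies committed transactions in order. */
final class ReactiveCache {
    private final Map<String, String> data = new HashMap<>();
    // Assumption: a real system would persist this marker together with the data.
    private long lastAppliedTx = 0;

    /** Applies all events of one transaction as a unit; already applied transactions are skipped. */
    synchronized void apply(long txId, List<ChangeEvent> events) {
        if (txId <= lastAppliedTx) {
            return; // makes replaying missed events after a restart idempotent
        }
        for (ChangeEvent e : events) {
            if (e.value == null) {
                data.remove(e.key);
            } else {
                data.put(e.key, e.value);
            }
        }
        lastAppliedTx = txId;
    }

    synchronized String get(String key) {
        return data.get(key);
    }

    synchronized long lastAppliedTx() {
        return lastAppliedTx;
    }
}
```

After a restart, the event source only needs to replay transactions newer than `lastAppliedTx()`, so the work is proportional to the changes missed rather than to the total data size.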

COMPARISON

                             Dump and Load Caching   Poll Caching                                  Reactive Persistent Caching
Max Data Age                 Dump period             Eviction time                                 Replication latency (ms)
Lookup Performance           Consistently instant    ~20% slow                                     Consistently instant
Consistency                  Eventually consistent   Inconsistent (stale data)                     Eventually consistent
Database Cache Update Load   Total size              Depends on eviction time and access pattern   Rate of change
Restart                      Complete reload         Eviction time                                 Down time update rate -> 10% of down time *

BIG JVMS WITH TERABYTES OF DATA

Scale Up
• One large JVM handles all data
• Map memory to (SSD-backed) files
• Several JVMs can share data via the file system
• Instant restart

Scale Out
• Have several JVMs in a network
• Use sharding between nodes
• Redundant nodes
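For the scale-out case, sharding with redundant nodes can be illustrated by mapping each key to one primary node and a number of backups. This is a simple modulo scheme chosen for brevity; the slides do not say which sharding scheme is used, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical modulo sharding: one primary node plus (copies - 1) backup nodes per key. */
final class Sharding {
    static List<Integer> nodesFor(Object key, int nodeCount, int copies) {
        if (nodeCount <= 0 || copies <= 0 || copies > nodeCount) {
            throw new IllegalArgumentException("need 1 <= copies <= nodeCount");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        int primary = Math.floorMod(key.hashCode(), nodeCount);
        List<Integer> nodes = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            nodes.add((primary + i) % nodeCount); // backups live on the following nodes
        }
        return nodes;
    }
}
```

With 4 nodes and 2 copies, a key whose hash code is 7 lands on node 3 with its backup on node 0. Production systems often prefer consistent hashing so that adding a node moves fewer keys.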

CONVENTIONAL JAVA APPLICATIONS

• Java objects live on the heap and are garbage collected periodically
• Garbage collection times increase with the Java heap size
• Garbage collection times increase with the Java heap mutation rate
• “The app has hit the GC wall”
• Hard to meet reasonable SLAs with JVMs larger than about 16 GB
• 10 TB of data and 10 GB JVMs -> ~1000 JVMs

OFF HEAP STORAGE

• Stores data outside of the Java heap
• The garbage collector does not see the content
• Scales up to terabytes of main memory in a single JVM
• Use any number of nodes for scale-out solutions
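One standard way to keep data outside the garbage-collected heap in plain Java is a direct `ByteBuffer`. The sketch below is only an illustration of the idea; the class `OffHeapLongTable` is hypothetical, and a single buffer is limited to 2 GB, so terabyte-scale stores would need many buffers or a dedicated off-heap library.

```java
import java.nio.ByteBuffer;

/** Hypothetical fixed-size table of long values stored outside the Java heap. */
final class OffHeapLongTable {
    private final ByteBuffer buffer;
    private final int capacity;

    OffHeapLongTable(int capacity) {
        this.capacity = capacity;
        // Direct buffers are allocated outside the garbage-collected heap and start zeroed.
        this.buffer = ByteBuffer.allocateDirect(Math.multiplyExact(capacity, Long.BYTES));
    }

    void put(int index, long value) {
        buffer.putLong(offset(index), value);
    }

    long get(int index) {
        return buffer.getLong(offset(index));
    }

    private int offset(int index) {
        if (index < 0 || index >= capacity) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return index * Long.BYTES;
    }
}
```

The garbage collector only tracks the small `ByteBuffer` object, not the values behind it, so the stored data adds nothing to the heap that has to be scanned.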

PERSISTENT SCALE OUT CACHE

• Persists data in files or memory-mapped files
• SSD backing device recommended
• 1.3 GB/s reload per node: 10 GB in 6 s, 100 GB in 1 min, 1 TB in 10 min
• 6.5 GB/s reload in a system with 10 nodes (1 active and 1 backup): 10 GB in 1 s, 100 GB in 12 s, 1 TB in 2 min
• 65 GB/s reload in a system with 100 nodes: 1 TB in 12 s
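Persisting cache data in a memory-mapped file can be sketched with the standard `FileChannel.map` call. The helper name `MappedStore` is hypothetical and the sketch does not reflect how any particular product lays out its files. Each mapping is limited to 2 GB, so larger data sets need several mappings.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper that maps a file region into memory for reading and writing. */
final class MappedStore {
    static MappedByteBuffer open(Path file, int sizeBytes) {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping remains valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, sizeBytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Values written through the buffer (followed by `force()` when durability matters) are visible to a later mapping of the same file, which is what makes restart a matter of re-mapping rather than reloading from the database.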

COMPRESSED OOPS IN JAVA 8

Using -XX:+UseCompressedOops (on by default) with -XX:ObjectAlignmentInBytes=16

A 64-bit JVM can use “compressed” memory references. This allows the heap to be up to 64 GB without the overhead of 64-bit object references. As all objects must be 8- or 16-byte aligned, the lower 3 or 4 bits of the address are always zero and don’t need to be stored. A 32-bit reference can therefore address 4 billion * 16 bytes, or 64 GB.
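The arithmetic behind the heap limit is easy to check: a 32-bit reference distinguishes 2^32 aligned slots, so the addressable heap is 2^32 times the object alignment. The helper below is a hypothetical illustration, not part of any JVM API.

```java
/** Hypothetical helper: heap size addressable by 32-bit compressed references. */
final class CompressedOops {
    static long maxHeapBytes(int alignmentBytes) {
        if (alignmentBytes <= 0 || Integer.bitCount(alignmentBytes) != 1) {
            throw new IllegalArgumentException("alignment must be a positive power of two");
        }
        // 2^32 distinct references, each pointing at an aligned address.
        return (1L << 32) * alignmentBytes;
    }
}
```

With HotSpot's default 8-byte alignment this gives 32 GB; with 16-byte alignment it gives the 64 GB quoted on the slide, at the cost of more padding per object.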

JVM SIZE SWEET SPOT

• 50 GB off heap per node
• 20 nodes per terabyte
• 40 nodes per terabyte with minimum redundancy
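The node counts follow from simple division, treating 1 TB as 1000 GB as the slide does. The helper below is a hypothetical sketch of that sizing rule; "minimum redundancy" is assumed to mean two copies of every shard.

```java
/** Hypothetical sizing helper: nodes needed to hold a data set with a given number of copies. */
final class ClusterSizing {
    static long nodesNeeded(long dataGb, long gbPerNode, int copies) {
        if (dataGb < 0 || gbPerNode <= 0 || copies <= 0) {
            throw new IllegalArgumentException("sizes and copies must be positive");
        }
        long nodesPerCopy = (dataGb + gbPerNode - 1) / gbPerNode; // round up
        return nodesPerCopy * copies;
    }
}
```

For example, `nodesNeeded(1000, 50, 1)` is 20 and `nodesNeeded(1000, 50, 2)` is 40, while 10 TB on conventional 10 GB heaps, `nodesNeeded(10_000, 10, 1)`, needs 1000 JVMs.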

CONCLUSIONS

• Get speed by keeping your data close to the application
• RAM is cheap and getting bigger and ever cheaper
• Consistent solution with Reactive Persistent Caching
• Reactive Persistent Caching imposes minimum load on restart and on the DB
• Scale-up solutions can be in the terabytes with virtual memory or file-mapped memory
• Scale-out solutions can use nodes of about 50 GB

SOLUTION

>1 TB

Application

In-JVM-Cache

Web Shop, Stock Trade, Bank, Machine learning, etc.

Source of Truth

SPEEDMENT

• Java application development tool
• In-JVM-memory cache
• Database SQL Reflector (CDC, Change Data Capture)
• Pluggable storage engines (Speedment, Chronicle Map, Hazelcast, GridGain, etc.)
• Code generation tool -> automatic domain model extraction from databases
• Transaction-aware

SPEEDMENT SCALE UP ULTRA-LOW LATENCY CACHE

• Ultra-low latency (runs in the same JVM as the application)
• Millions of TPS
• Latencies measured in microseconds
• Supports file mapping
• Terabytes of data
• O(1) for equality operations
• O(log(N)) for other operations
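The two complexity classes correspond to two familiar index types: a hash index answers equality lookups in expected O(1), and a sorted index locates the start of a range in O(log N). The sketch below uses `HashMap` and `TreeMap` purely to illustrate that distinction; it is an assumption-laden example and says nothing about how Speedment's cache is built.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical store that keeps a hash index and a sorted index over the same entries. */
final class IndexedStore {
    private final Map<Integer, String> hashIndex = new HashMap<>();
    private final NavigableMap<Integer, String> sortedIndex = new TreeMap<>();

    void put(int key, String value) {
        hashIndex.put(key, value);
        sortedIndex.put(key, value);
    }

    /** Equality lookup: expected O(1) through the hash index. */
    String equalTo(int key) {
        return hashIndex.get(key);
    }

    /** Range lookup: O(log N) to find the start, then one step per returned element. */
    List<String> between(int fromInclusive, int toExclusive) {
        return List.copyOf(
                sortedIndex.subMap(fromInclusive, true, toExclusive, false).values());
    }
}
```

Keeping both indexes costs extra memory per entry, which is the usual price for serving both kinds of query quickly.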

SPEEDMENT SQL REFLECTOR

• Detects changes in a database
• Buffers the changes
• Can replay the changes later on
• Will preserve order
• Will preserve transactions
• Sees data as it was persisted
• Detects changes from any source

Database

INSERT, UPDATE, DELETE

DOWNLOAD TRIAL @ WWW.SPEEDMENT.COM

CONNECT TO YOUR EXISTING SQL DB

AUTOMATIC SCHEMA ANALYSIS

PUSH AND PLAY

OFFERINGS

• Complete solutions for in-memory hot big data
• Software licenses
• Service and support
• Consulting

sales@speedment.com

@Speedment

www.speedment.com
