Tuning Large Scale Java Platforms

By Emad Benjamin, Principal Architect at VMware, and Jamie O'Meara, Community Engineer at Pivotal

© 2014 SpringOne 2GX. All rights reserved. Do not distribute without permission.

DESCRIPTION

Speakers: Emad Benjamin (VMware) and Jamie O'Meara (Pivotal), Applied Spring Track. The session covers various GC tuning techniques, with a particular focus on tuning large-scale JVM deployments. Come to this session to learn a GC tuning recipe that can give you the best configuration for latency-sensitive applications. While most enterprise-class Java workloads fit into a scaled-out set of JVM instances of less than 4GB heap each, there are workloads in the in-memory database space that require fairly large JVMs. In this session we take a deep dive into the issues and the optimal configurations for tuning large JVMs in the range of 4GB to 128GB. The GC tuning recipe shared is a refinement from 15 years of GC engagements, adapted in recent years for tuning some of the largest JVMs in the industry using plain HotSpot and the CMS GC policy. You should walk away with the ability to commence a decent GC tuning exercise on your own. The session summarizes the techniques and the JVM options needed to accomplish this task. Naturally, when tuning large-scale JVM platforms the underlying hardware cannot be ignored, hence the session takes a detour from the traditional GC tuning talks and dives into how to optimally size a platform for improved memory consumption. Lastly, the session covers the Pivotal Application Suite reference architecture, for which a comprehensive performance study was done.

TRANSCRIPT

Page 1: Tuning Large Scale Java Platforms

© 2014 SpringOne 2GX. All rights reserved. Do not distribute without permission.

By Emad Benjamin, Principal Architect at VMware, and Jamie O'Meara, Community Engineer at Pivotal

Page 2: Tuning Large Scale Java Platforms

Speaker Bio – Emad Benjamin, [email protected], @vmjavabook, vmjava.com

• 1993 – Graduated with a BE; published undergraduate thesis
• 1994-2005 – Independent consultant on C++ and Java; open source contributions
• 2005-2010 – VMware IT; virtualized all Java systems
• 2010-2012 – Tech lead for vFabric Reference Architecture, http://tinyurl.com/mvtyoq7
• 2013-2014 – Livefire trainer

Page 3: Tuning Large Scale Java Platforms

Speaker Bio – Jamie O'Meara, [email protected]

• 20 years as a Software Engineer, building Commercial and Federal software solutions.

• 4 Years @ VMware, now Pivotal

• Latest project is Cloud Foundry

• Twitter: JamieOMeara

• Blog: www.littlepivots.com

Page 4: Tuning Large Scale Java Platforms

Agenda

• Defining Large Scale Java Platforms

• Design and Sizing Large Scale Java Platforms

• Tuning Large Scale Java Platforms

• Modern PAAS Platforms

• Q and A


Page 5: Tuning Large Scale Java Platforms

Defining Large Scale Java Platforms


Page 6: Tuning Large Scale Java Platforms

Conventional Application Platforms

Java platforms are multitier and multi-org. Each tier is owned by a different organizational key stakeholder department:

• Load Balancer Tier (Load Balancers) – IT Operations, Network Team
• Web Server Tier (Web Servers) – IT Operations, Server Team
• Java App Tier (Java Applications) – IT Apps, Java Dev Team
• DB Server Tier (DB Servers) – IT Ops & Apps Dev Team

Page 7: Tuning Large Scale Java Platforms

Requests flow from the Load Balancer Tier through the Web Server Tier (web server pool) and the Java App Tier (app server pool, with a DB connection pool) to the DB Server Tier. Static HTML lookup requests put load on the web servers; dynamic requests to the DB create Java threads, putting load on the Java app server and the DB.

Page 8: Tuning Large Scale Java Platforms

Platform Engineer?

• Platform engineers play a key role in correctly mapping development software artifacts onto appropriate infrastructure services
• They know when to scale up vs. scale out
• They know how to size the software components, the application runtimes (JVMs), and the underlying infrastructure to guarantee SLAs
• An ideal platform engineer has deep knowledge in each of these three areas: developer code, deployment (JVM/app runtime), and infrastructure

Page 9: Tuning Large Scale Java Platforms

Java Platform Categories – Category 1 (many smaller JVMs)

Category 1: 100s to 1000s of JVMs

• Smaller JVMs: < 4GB heap, 4.5GB Java process, and 5GB for the VM
• vSphere hosts with < 96GB RAM are more suitable: by the time you stack the many JVM instances, you are likely to reach the CPU boundary before you can consume all of the RAM. For example, if you instead chose a vSphere host with 256GB RAM, then 256/4.5GB => 57 JVMs, which would clearly hit the CPU boundary
• Multiple JVMs per VM
• Use resource pools to manage different LOBs
• Consider using 4-socket servers to get more cores
• The most common workloads in this category are web apps

Page 10: Tuning Large Scale Java Platforms

Java Platform Categories – Category 1

Category 1: 100s to 1000s of JVMs

Consider using 4-socket servers instead of 2-socket servers to get more cores.

Page 11: Tuning Large Scale Java Platforms

Java Platform Categories – Category 2 (fewer, larger JVMs)

Category 2: a dozen or so very large JVMs

• Fewer JVMs: < 20
• Very large JVMs: 32GB to 128GB
• Always deploy 1 VM per NUMA node and size it to fit perfectly
• 1 JVM per VM
• Choose 2-socket vSphere hosts to get larger NUMA nodes, and install ample memory: 128GB to 512GB
• Examples are in-memory databases, like SQLFire and GemFire
• Apply latency-sensitive best practices: disable interrupt coalescing on the pNIC and vNIC
• Use a dedicated vSphere cluster

Page 12: Tuning Large Scale Java Platforms

Java Platform Categories – Category 3

Many smaller JVMs accessing information from fewer large JVMs

Category 3: Category-1 JVMs accessing data from Category-2 JVMs. Use resource pools to separate lines of business, e.g. Resource Pool 1 for the Gold LOB and Resource Pool 2 for the Silver LOB.

Page 13: Tuning Large Scale Java Platforms

Designing and Sizing Large Scale Java Platforms


Page 14: Tuning Large Scale Java Platforms

Design and Sizing of Application Platforms

Step 1 – Establish load profile

From production logs/monitoring reports, measure: concurrent users, requests per second, peak response time, and average response time. Establish your response time SLA.

Step 2 – Establish benchmark

Iterate through the benchmark test until you are satisfied with the load profile metrics and your intended SLA. After each benchmark iteration you may have to adjust the application configuration, and adjust the vSphere environment to scale out/up in order to achieve your desired number of VMs, vCPUs, and RAM configuration.

Step 3 – Size production environment

The size of the production environment will have been established in Step 2; either roll out the environment from Step 2 or build a new one based on the numbers established.

Page 15: Tuning Large Scale Java Platforms

Step 2 – Establish Benchmark

ESTABLISH BUILDING BLOCK VM (Scale Up Test)

Establish vertical scalability: how many JVMs fit on a VM, and how large the VM should be in terms of vCPU and memory.

DETERMINE HOW MANY VMs (Scale Out Test)

Establish horizontal scalability: how many building-block VMs do you need to meet your response time SLAs without reaching 70%-80% saturation of CPU? Establish your horizontal scalability factor before bottlenecks appear in your application.

Iterate the tests: if the SLA is OK, the test is complete. Otherwise, investigate the bottlenecked layer (network, storage, application configuration, or vSphere). If scaling out removes the bottlenecked layer, iterate the scale-out test; if it is a building-block app/VM configuration problem, adjust and iterate.

Page 16: Tuning Large Scale Java Platforms

HotSpot JVMs on VM

VM memory is composed of guest OS memory plus JVM memory. JVM memory in turn comprises the heap (-Xms initial heap up to -Xmx max heap, the non-direct "heap" memory), Perm Gen (-XX:MaxPermSize), Java stacks (-Xss per thread), and other memory, including direct native "off-the-heap" memory.

Page 17: Tuning Large Scale Java Platforms

HotSpot JVMs on VM

Guest OS memory: approx 1GB (depends on the OS and other processes).

Perm size is an area additional to the -Xmx (max heap) value and is not GC-ed, because it contains class-level information.

"Other mem" is additional memory required for NIO buffers, the JIT code cache, classloaders, socket buffers (receive/send), JNI, and GC internal info.

VM Memory = Guest OS Memory + JVM Memory

JVM Memory = JVM Max Heap (-Xmx value) + JVM Perm Size (-XX:MaxPermSize) + NumberOfConcurrentThreads * (-Xss) + "other mem"

If you have multiple JVMs (N JVMs) on a VM, then: VM Memory = Guest OS Memory + N * JVM Memory
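As a rough illustration of this formula, here is a minimal sizing sketch in Java (the class and method names are ours, not from the deck; it plugs in the numbers of the sizing example on the next slide):

// Hypothetical helper illustrating the VM memory formula above.
// All values are in megabytes.
public class VmSizing {

    static long jvmMemoryMb(long xmxMb, long maxPermSizeMb,
                            long threads, double xssMb, long otherMemMb) {
        // JVM Memory = -Xmx + -XX:MaxPermSize + threads * -Xss + "other mem"
        return xmxMb + maxPermSizeMb + Math.round(threads * xssMb) + otherMemMb;
    }

    static long vmMemoryMb(long guestOsMb, int nJvms, long jvmMemoryMb) {
        // VM Memory = Guest OS memory + N * JVM Memory
        return guestOsMb + (long) nJvms * jvmMemoryMb;
    }

    public static void main(String[] args) {
        // -Xmx4096m, -XX:MaxPermSize=256m, 100 threads at -Xss256k, ~217m other
        long jvm = jvmMemoryMb(4096, 256, 100, 0.25, 217);
        long vm = vmMemoryMb(500, 1, jvm); // ~500m used by the guest OS
        System.out.printf("JVM memory ~%dm, VM memory ~%dm%n", jvm, vm);
    }
}

This prints roughly 4594m and 5094m, within a few megabytes of the 4588m/5088m figures the deck quotes (the deck rounds its components slightly differently).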

Page 18: Tuning Large Scale Java Platforms

Sizing Example

• JVM max heap: -Xmx 4096m, initial heap -Xms 4096m
• Perm Gen: -XX:MaxPermSize 256m
• Java stacks: -Xss per thread, 256k * 100 threads
• Other mem: ~217m
• JVM memory: 4588m
• Guest OS memory: ~500m used by the OS
• VM memory: 5088m; set the memory reservation to 5088m

Page 19: Tuning Large Scale Java Platforms

Larger JVMs for In-Memory Data Grids

• JVM max heap: -Xmx 30g, initial heap -Xms 30g
• Perm Gen: -XX:MaxPermSize 0.5g
• Java stacks: -Xss per thread, 1M * 500 threads
• Other mem: ~1g
• JVM memory for SQLFire: 32g
• Guest OS memory: 0.5-1g used by the OS
• VM memory for SQLFire: 34g; set the memory reservation to 34g

Page 20: Tuning Large Scale Java Platforms

With 96GB RAM on the server, each NUMA node has roughly 94/2 => 47GB available, so use 8-vCPU VMs with less than 45GB RAM each. The ESX scheduler then keeps each VM local to a NUMA node. If a VM is sized greater than 45GB or 8 CPUs, NUMA interleaving occurs and can cause a 30% drop in memory throughput performance.

Page 21: Tuning Large Scale Java Platforms

NUMA Local Memory with Overhead Adjustment

NUMA Local Memory = (Physical RAM on vSphere host - (Number of VMs on vSphere host * 1% RAM overhead * Physical RAM) - 1GB vSphere RAM overhead) / Number of Sockets on vSphere host

Page 22: Tuning Large Scale Java Platforms

NUMA Local Memory with Overhead Adjustment

For production environments you obviously don't want to run this close to the NUMA local memory ceiling; instead stay within 95% of the above NUMA local memory:

Prod NUMA Local Memory (Intel) per VM = 0.95 * NUMA Local Memory

Prod NUMA Local Memory (AMD) per VM = (0.95 * NUMA Local Memory) / 2
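A minimal sketch of these two formulas in Java (the names are ours; it reproduces the ~44.2GB per-VM figure used in the worked examples on Pages 25 and 58):

// Hypothetical calculator for the NUMA local memory formulas above.
public class NumaSizing {

    // (RAM - RAM * nVMs * 1% - 1GB vSphere overhead) / sockets
    static double numaLocalMemoryGb(double physicalRamGb, int nVms, int sockets) {
        double afterVmOverhead = physicalRamGb - physicalRamGb * nVms * 0.01;
        return (afterVmOverhead - 1.0) / sockets;
    }

    public static void main(String[] args) {
        double local = numaLocalMemoryGb(96, 2, 2); // 96GB host, 2 VMs, 2 sockets
        double prodIntel = 0.95 * local;            // stay within 95% in production
        double prodAmd = 0.95 * local / 2;          // AMD: the deck halves the figure
        System.out.printf("NUMA local: %.1fGB, prod Intel: %.1fGB, prod AMD: %.1fGB%n",
                local, prodIntel, prodAmd);
    }
}

For the 96GB, 2-socket, 2-VM case this yields about 46.5GB of NUMA local memory and 44.2GB per production VM on Intel.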

Page 23: Tuning Large Scale Java Platforms

NUMA Local Memory with Overhead Adjustment

The nVMs (number of VMs) is best selected to equal the number of sockets, as a starting point; you can of course run an n-multiple of these. Examples:

• 2-socket Intel server: nVMs = 2 or more; 2 is the most optimal
• 4-socket Intel server: nVMs = 4 or more; 4 is the most optimal
• 2-socket AMD server: nVMs = 4 or more; 4 is the most optimal
• 4-socket AMD server: nVMs = 8 or more; 8 is the most optimal. The overhead calculation is a conservative over-estimate; in this case the overhead is likely less than 8% for 8 VMs

Page 24: Tuning Large Scale Java Platforms

Comparing VM Configurations on 2 socket 6 core Architecture

2 VMs of 6 vCPU each yielded the best results, i.e. 1 VM per socket.

Page 25: Tuning Large Scale Java Platforms

Host: 96GB RAM, 2 sockets, 8 pCPU per socket. The middleware components run in 45GB VMs with 8 vCPU each; a separate VM runs the locator/heartbeat for the middleware, and that VM must NOT be vMotioned.

Memory available for VMs = 96 * 0.98 - 1GB => 93GB => 93 * 0.95 => 88.5GB. Per-NUMA memory => 88.5GB/2 => approx. 45GB for each VM.

Page 26: Tuning Large Scale Java Platforms

Tuning Large Scale Java Platforms


Page 27: Tuning Large Scale Java Platforms

Which GC?

VMware doesn't care which GC you select, because of the degree of independence of Java from the OS, and of the OS from the hypervisor.

Page 28: Tuning Large Scale Java Platforms

Tuning GC – Art Meets Science!

You either tune for throughput or for reduced latency, one at the cost of the other:

• Reduce latency (e.g. web workloads): improved R/T and reduced latency impact, at the cost of slightly reduced throughput
• Increase throughput (e.g. job/batch workloads): improved throughput, at the cost of longer R/T and increased latency impact

Page 29: Tuning Large Scale Java Platforms

Sizing The Java Heap

The JVM max heap (-Xmx, 4096m here) divides into the YoungGen (-Xmn, 1350m), which holds the Eden space plus survivor spaces 1 and 2 and is collected by quick minor GCs, and the OldGen (the remaining 2746m), which is collected by slower full GCs.

Page 30: Tuning Large Scale Java Platforms

Inside the Java Heap

Page 31: Tuning Large Scale Java Platforms

Parallel Young Gen and CMS Old Gen

• Young generation (-Xmn, with survivor spaces S0 and S1): minor GC runs in parallel using -XX:+UseParNewGC and -XX:ParallelGCThreads
• Old generation (Xmx minus Xmn): major GC is concurrent mark and sweep, running alongside the application threads, using -XX:+UseConcMarkSweepGC

Page 32: Tuning Large Scale Java Platforms

High Level GC Tuning Recipe

Step A – Young Gen Tuning (applies to Category-1 and Category-2 platforms): measure minor GC duration and frequency, then adjust -Xmn (young gen size) and/or -XX:ParallelGCThreads.

Step B – Old Gen Tuning (applies to Category-2 platforms): measure major GC duration and frequency, then adjust heap space (-Xmx).

Step C – Survivor Spaces Tuning: adjust -Xmn and/or the survivor spaces.

Page 33: Tuning Large Scale Java Platforms

Why are Duration and Frequency of GC Important?

Young gen minor GCs and old gen major GCs each have a duration and a frequency. We want to ensure regular application user threads get a chance to execute in between GC activity.

Page 34: Tuning Large Scale Java Platforms

Impact of Increasing Young Generation (-Xmn)

• Less frequent minor GC, but longer duration
• Potentially increased major GC frequency
• You can mitigate the increase in major GC frequency by increasing -Xmx
• You can mitigate the increase in minor GC duration by increasing -XX:ParallelGCThreads

Page 35: Tuning Large Scale Java Platforms

Impact of Reducing Young Generation (-Xmn)

• More frequent minor GC, but shorter duration
• Potentially increased major GC duration
• You can mitigate the increase in major GC duration by decreasing -Xmx

Page 36: Tuning Large Scale Java Platforms

Survivor Spaces

• Survivor Space Size = -Xmn / (-XX:SurvivorRatio + 2)
• Decreasing the survivor ratio increases the survivor space size
• An increase in survivor space size reduces the Eden space, hence:
  o Minor GC frequency will increase
  o More frequent minor GCs cause objects to age quicker
  o Use -XX:+PrintTenuringDistribution to measure how effectively objects age in the survivor spaces
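A quick worked example of the formula, using the -Xmn1350m young generation from the earlier heap-sizing slide (the arithmetic is ours):

// Worked example of: Survivor Space Size = -Xmn / (SurvivorRatio + 2)
public class SurvivorSizing {
    public static void main(String[] args) {
        double xmnMb = 1350;        // -Xmn1350m
        int survivorRatio = 8;      // -XX:SurvivorRatio=8
        double survivorMb = xmnMb / (survivorRatio + 2); // 135m per survivor space
        double edenMb = xmnMb - 2 * survivorMb;          // 1080m of Eden
        System.out.printf("Each survivor: %.0fm, Eden: %.0fm%n", survivorMb, edenMb);
    }
}

So at -XX:SurvivorRatio=8, each survivor space is 135m and Eden is 1080m, i.e. Eden is 8 times one survivor space.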

Page 37: Tuning Large Scale Java Platforms

Decrease Survivor Spaces by Increasing Survivor Ratio

Increasing the survivor ratio shrinks the survivor spaces (S0 and S1) and enlarges Eden; hence minor GC frequency is reduced, with a slight increase in minor GC duration.

Page 38: Tuning Large Scale Java Platforms

Increasing Survivor Ratio Impact on Old Generation

Smaller survivor spaces mean increased tenuring/promotion to the old gen, hence increased major GC.

Page 39: Tuning Large Scale Java Platforms

CMS Collector Example

• Full GC every 2 hrs and overall heap utilization down by 30%

java -Xms50g -Xmx50g -Xmn16g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+ScavengeBeforeFullGC -XX:TargetSurvivorRatio=80 -XX:SurvivorRatio=8 -XX:+UseBiasedLocking -XX:MaxTenuringThreshold=15 -XX:ParallelGCThreads=6 -XX:+OptimizeStringConcat -XX:+UseCompressedStrings -XX:+UseStringCache

Page 40: Tuning Large Scale Java Platforms


Parallel Young Gen and CMS Old Gen

Application user threads run alongside the collectors: minor GC threads collect the young gen in parallel (-XX:+UseParNewGC, -XX:ParallelGCThreads), while the old gen is collected by concurrent mark and sweep (-XX:+UseConcMarkSweepGC).

Page 41: Tuning Large Scale Java Platforms

CMS Collector Example

• The customer chose not to use LargePages: they were content with the performance they had already achieved and did not want to make OS-level changes that might impact the amount of total memory available to other processes that may or may not be using LargePages.
• The -XX:+UseNUMA JVM option also does not work with -XX:+UseConcMarkSweepGC.
• An alternative would be to experiment with: numactl --cpunodebind=0 --membind=0 myapp
• However, we found the ESX NUMA locality algorithms were doing great at localizing, and we did not need further NUMA tuning.

Page 42: Tuning Large Scale Java Platforms

CMS Collector Example

java -Xms30g -Xmx30g -Xmn10g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+ScavengeBeforeFullGC -XX:TargetSurvivorRatio=80 -XX:SurvivorRatio=8 -XX:+UseBiasedLocking -XX:MaxTenuringThreshold=15 -XX:ParallelGCThreads=4 -XX:+UseCompressedOops -XX:+OptimizeStringConcat -XX:+UseCompressedStrings -XX:+UseStringCache

This JVM configuration scales up and down effectively:

• -Xmx = -Xms, and -Xmn is 33% of -Xmx
• -XX:ParallelGCThreads: minimum 2, but less than 50% of the vCPUs available to the JVM. NOTE: ideally use it on VMs with 4 or more vCPUs; if used on 2-vCPU VMs, drop the -XX:ParallelGCThreads option and let Java select it

Page 43: Tuning Large Scale Java Platforms

CMS Collector Example

-Xmn10g: Fixed-size young generation.

-XX:+UseConcMarkSweepGC: The concurrent collector is used to collect the tenured generation and does most of the collection concurrently with the execution of the application. The application is paused for short periods during the collection. A parallel version of the young generation copying collector is used with the concurrent collector.

-XX:+UseParNewGC: Sets whether to use multiple threads in the young generation (with CMS only). By default, this is enabled in Java 6u13, probably any Java 6, when the machine has multiple processor cores.

-XX:CMSInitiatingOccupancyFraction=75: Sets the percentage of the heap that must be full before the JVM starts a concurrent collection in the tenured generation. The default is somewhere around 92 in Java 6, but that can lead to significant problems. Setting this lower allows CMS to run more often (all the time, sometimes), but it often clears more quickly and avoids fragmentation.

Page 44: Tuning Large Scale Java Platforms

CMS Collector Example

-XX:+UseCMSInitiatingOccupancyOnly: Indicates that all concurrent CMS cycles should start based on -XX:CMSInitiatingOccupancyFraction=75.

-XX:+ScavengeBeforeFullGC: Do a young generation GC prior to a full GC.

-XX:TargetSurvivorRatio=80: Desired percentage of survivor space used after scavenge.

-XX:SurvivorRatio=8: Ratio of eden/survivor space size.

Page 45: Tuning Large Scale Java Platforms

CMS Collector Example

-XX:+UseBiasedLocking: Enables a technique for improving the performance of uncontended synchronization. An object is "biased" toward the thread which first acquires its monitor via a monitorenter bytecode or synchronized method invocation; subsequent monitor-related operations performed by that thread are relatively much faster on multiprocessor machines. Some applications with significant amounts of uncontended synchronization may attain significant speedups with this flag enabled; some applications with certain patterns of locking may see slowdowns, though attempts have been made to minimize the negative impact.

-XX:MaxTenuringThreshold=15: Sets the maximum tenuring threshold for use in adaptive GC sizing. The current largest value is 15. The default value is 15 for the parallel collector and 4 for CMS.

Page 46: Tuning Large Scale Java Platforms

CMS Collector Example

-XX:ParallelGCThreads=4: Sets the number of garbage collection threads in the young/minor garbage collector. The default value varies with the platform on which the JVM is running.

-XX:+UseCompressedOops: Enables the use of compressed pointers (object references represented as 32-bit offsets instead of 64-bit pointers) for optimized 64-bit performance with Java heap sizes less than 32GB.

-XX:+OptimizeStringConcat: Optimize String concatenation operations where possible. (Introduced in Java 6 Update 20.)

-XX:+UseCompressedStrings: Use a byte[] for Strings which can be represented as pure ASCII. (Introduced in Java 6 Update 21 Performance Release.)

-XX:+UseStringCache: Enables caching of commonly allocated strings.

Page 47: Tuning Large Scale Java Platforms

IBM JVM – GC Choice

-Xgcpolicy:optthruput (default): Performs the mark and sweep operations during garbage collection when the application is paused, to maximize application throughput. Mostly not suitable for multi-CPU machines. Suited to apps that demand high throughput but are not very sensitive to the occasional long garbage collection pause.

-Xgcpolicy:optavgpause: Performs the mark and sweep concurrently while the application is running, to minimize pause times; this provides the best application response times. There is still a stop-the-world GC, but the pause is significantly shorter. After GC, the app threads help out and sweep objects (concurrent sweep). Suited to apps sensitive to long latencies, e.g. transaction-based systems where response times are expected to be stable.

-Xgcpolicy:gencon: Treats short-lived and long-lived objects differently to provide a combination of lower pause times and high application throughput. Before the heap fills up, each app thread helps out and marks objects (concurrent mark). Suited to latency-sensitive apps where objects in a transaction don't survive beyond the transaction commit.
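For instance, picking the generational-concurrent policy for a latency-sensitive app would look like this (the heap sizes and jar name are ours, purely illustrative):

java -Xgcpolicy:gencon -Xms4g -Xmx4g -Xmn1350m -jar myapp.jar

IBM JVMs accept the same -Xms/-Xmx/-Xmn sizing options used throughout this deck.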

Page 48: Tuning Large Scale Java Platforms

XYZCarRegistry.com – Current Java Platform

• 25 unique REST services
• Xyzcars.com deployed each REST service on a dedicated JVM
• The 25 JVMs are deployed on a physical box with 12 cores total (2 sockets, 6 cores per socket) and 96GB RAM
• There are a total of 16 hosts/physical boxes, hence a total of 400 JVMs servicing peak transactions for their business
• The current peak CPU utilization across all hosts is 15%
• Each JVM has a heap size of -Xmx 1024MB
• The majority of transactions performed on xyzcars.com traverse ALL of the REST services, and hence all of the 25 JVMs

Page 49: Tuning Large Scale Java Platforms

XYZCarRegistry.com – Current Java Platform

(Diagram: one REST service per JVM; R1...R25 each denote a JVM hosting a single REST service, fronted by a load balancer layer.)

Page 50: Tuning Large Scale Java Platforms

Solution 1 – Virtualize 1 REST : 1 JVM, with 25 JVMs Per VM, 2 VMs Per Host

(Diagram: 25 JVMs, 1 REST service per JVM, on each VM.)

Page 51: Tuning Large Scale Java Platforms

Solution 1 (400GB) – Virtualize 1 REST : 1 JVM, with 25 JVMs Per VM, 2 VMs Per Host

• Sized for current workload: 400GB heap space
• Deployed 25 JVMs on each VM; each JVM is 1GB
• Accounting for JVM off-the-heap overhead: 25GB * 1.25 = 31.25GB
• Add 1GB for the guest OS: 31.25 + 1 = 32.25GB
• 8 hosts
• 25 unique REST services, each deployed in its own JVM
• The original call paradigm has not changed

50 JVMs on 12 cores may be an issue: while CPU utilization was originally at 15%, you can assume 30%+ CPU utilization is the new level. In actual fact, response time may suffer significantly due to coinciding GC cycles that can cause CPU contention.

Page 52: Tuning Large Scale Java Platforms

XYZCarRegistry.com – Current Java Platform

(Recap of the Page 49 diagram: one REST service per JVM, R1...R25, behind the load balancer layer.)

Page 53: Tuning Large Scale Java Platforms

Solution 1 (800GB) – Virtualize 1 REST : 1 JVM, with 25 JVMs Per VM, 2 VMs Per Host

• Sized for future workload: 800GB heap space
• Deployed 25 JVMs on each VM; each JVM is 1GB
• Accounting for JVM off-the-heap overhead: 25GB * 1.25 = 31.25GB
• Add 1GB for the guest OS: 31.25 + 1 = 32.25GB
• 16 hosts
• 25 unique REST services, each deployed in its own JVM
• The original call paradigm has not changed

50 JVMs on 12 cores may be an issue: while CPU utilization was originally at 15%, you can assume 30%+ CPU utilization is the new level. In actual fact, response time may suffer significantly due to coinciding GC cycles that can cause CPU contention.

THIS SOLUTION IS NOT GREAT, BUT IT'S THE LEAST INTRUSIVE.

NOTE: We had to use 16 hosts, as the 8 hosts in the 400GB case already had 50 JVMs per host, which is significant.

Page 54: Tuning Large Scale Java Platforms

Solution 2 – Virtualize 25 REST : 1 JVM, with 1 JVM Per VM, 2 VMs Per Host

(Diagram: the 25 REST services consolidated into a single JVM on each VM.)

Page 55: Tuning Large Scale Java Platforms

Solution 2 – Virtualize 25 REST : 1 JVM, with 1 JVM Per VM, 2 VMs Per Host

Page 56: Tuning Large Scale Java Platforms

Solution 2 – Virtualize 25 REST : 1 JVM, with 1 JVM Per VM, 2 VMs Per Host

(Diagram: before, each host ran 25 single-service JVMs, R1...R25, behind the load balancer layer; after, each VM runs one JVM hosting all services R1...R25, still behind the load balancer layer.)

Page 57: Tuning Large Scale Java Platforms

Solution 2 – Virtualize 25 REST : 1 JVM, with 1 JVM Per VM, 2 VMs Per Host

All the REST transactions across the 25 services run within one JVM instance at a time; that JVM has the usual layout of heap, Perm Gen, Java stacks, and guest OS memory around it.

Page 58: Tuning Large Scale Java Platforms

Solution 2 – Virtualize 25 REST : 1 JVM, with 1 JVM Per VM, 2 VMs Per Host

• VM size (theoretical ceiling, NUMA optimized): today and future [96 - {(96 * 0.02) + 1}]/2 = 46.5GB. Using the NUMA overhead equation, this VM of 46.5GB and 6 vCPU will be NUMA-local.
• VM size for prod: today and future 46.5 * 0.95 = 44.2GB.
• JVM heap allowable: today and future (44.2 - 1)/1.25 = 34.56GB. Working backwards from the NUMA size of 44.2GB, minus 1GB for the guest OS, then accounting for 25% JVM overhead by dividing by 1.25.
• Number of JVMs needed: today 400/34.56 = 11.59 => 12 JVMs; future 800/34.56 = 23.15 => 24 JVMs. Total heap needed, divided by how much heap can be placed in each NUMA node.
• Number of hosts: today 6; future 12. 1 JVM per VM, 1 VM per NUMA node.
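A minimal sketch of the table's arithmetic (the class and variable names are ours; the constants are the ones from the table):

// Hypothetical sketch reproducing the Solution 2 sizing math above.
public class Solution2Sizing {
    public static void main(String[] args) {
        double numaCeilingGb = (96 - (96 * 0.02 + 1)) / 2;    // ~46.5GB per NUMA node
        double prodVmGb = numaCeilingGb * 0.95;               // ~44.2GB prod VM size
        double heapPerJvmGb = (prodVmGb - 1) / 1.25;          // ~34.56GB heap allowable
        int jvmsToday = (int) Math.ceil(400 / heapPerJvmGb);  // 12 JVMs for 400GB
        int jvmsFuture = (int) Math.ceil(800 / heapPerJvmGb); // 24 JVMs for 800GB
        System.out.printf("heap/JVM %.2fGB, today %d JVMs, future %d JVMs%n",
                heapPerJvmGb, jvmsToday, jvmsFuture);
    }
}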

Page 59: Tuning Large Scale Java Platforms

Solution 2 – Virtualize 25 REST : 1 JVM, with 1 JVM Per VM, 2 VMs Per Host

All REST services run in one heap:

• JVM max heap: -Xmx 34.5GB, initial heap -Xms 34.5GB
• Perm Gen: -XX:MaxPermSize 1GB
• Java stacks: -Xss per thread, 256k * 1000 threads (the thread pool is increased to 1000 to take on more load, since the heap is much larger)
• Other mem: ~1GB
• Total JVM memory: max = 42.5GB, min = 37.5GB
• Guest OS memory: 1GB used by the OS
• VM memory: 43.5GB; set the memory reservation to 43.5GB

Page 60: Tuning Large Scale Java Platforms

Another Example (360GB JVM)

A monitoring system that does not scale out runs in a single large JVM of -Xmx360g, i.e. 360GB. The server has 512GB RAM and 2 sockets of 10 cores each.

360GB + 1GB for the OS + 25% * 360GB for off-the-heap overhead => 360GB + 1GB + 90GB => 451GB is the VM's memory reservation. The VM has 20 vCPUs.

java -Xms360g -Xmx360g -Xmn10g -Xss1024k -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+ScavengeBeforeFullGC -XX:TargetSurvivorRatio=80 -XX:SurvivorRatio=8 -XX:+UseBiasedLocking -XX:MaxTenuringThreshold=15 -XX:ParallelGCThreads=10 -XX:+OptimizeStringConcat -XX:+UseCompressedStrings -XX:+UseStringCache -XX:+DisableExplicitGC -XX:+AlwaysPreTouch

Page 61: Tuning Large Scale Java Platforms

The 360GB JVM Downsized to 200GB

The same monitoring system downsized to a single JVM of -Xmx200g, i.e. 200GB. The server has 512GB RAM and 2 sockets of 10 cores each.

200GB + 1GB for the OS + 25% * 200GB for off-the-heap overhead => 200GB + 1GB + 50GB => 251GB is the VM's memory reservation. The VM has 20 vCPUs.

java -Xms200g -Xmx200g -Xmn60g -Xss1024k -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+ScavengeBeforeFullGC -XX:TargetSurvivorRatio=80 -XX:SurvivorRatio=8 -XX:+UseBiasedLocking -XX:MaxTenuringThreshold=15 -XX:ParallelGCThreads=10 -XX:+OptimizeStringConcat -XX:+UseCompressedStrings -XX:+UseStringCache -XX:+DisableExplicitGC -XX:+AlwaysPreTouch

Page 62: Tuning Large Scale Java Platforms

Scale Out Monitoring Tools

For large environments, scale-out monitoring tools are essential. VMware vC Ops uses GemFire to scale out and has many self-learning algorithms to detect trends.

Page 63: Tuning Large Scale Java Platforms

Comparing Scenario-1 of 4 JVMs vs. Scenario-2 of 2 JVMs

• Scenario-1: 4 JVMs of 1GB heap each, average R/T 166ms
• Scenario-2: 2 JVMs of 2GB heap each, average R/T 123ms

(Chart: response time per test run, 1 through 25, for both scenarios.) Scenario-2 has 26% better response time.

Page 64: Tuning Large Scale Java Platforms

Comparing 2 JVMs vs. 4 JVMs

Scenario-2 (2 JVMs) has 60% less CPU utilization than Scenario-1 (4 JVMs).

Page 65: Tuning Large Scale Java Platforms

Cloud Foundry Architecture

Page 66: Tuning Large Scale Java Platforms

Spring Trader Application

Page 67: Tuning Large Scale Java Platforms

Cloud Foundry Demo
