java performance monitoring & tuning

Immensely Passionate about Technology

Upload: muhammed-shakir-misarwala

Post on 26-Jan-2015


DESCRIPTION

It is easy to monitor the performance of JVM if one knows how GC and Threads work in JVM. This presentation throws light on Collector types, HotSpot Collection Algorithms, Thread Monitoring, Method Profiling and Heap Profiling

TRANSCRIPT

Page 1: Java Performance Monitoring & Tuning

Immensely Passionate about Technology

Page 2: Java Performance Monitoring & Tuning

Me: Muhammed Shakir, CoE Lead - Java & Liferay

@MuhammedShakir

www.mslearningandconsulting.com


17 Yrs Exp | 40+ Projects | 300+ Training Programs

๏ Monitoring Java Applications

๏ Tuning GC & Heap

Java Performance Tuning

Page 3: Java Performance Monitoring & Tuning

Java Performance Tuning

In this module we will cover the following:

๏Garbage Collection & Threads in JVM

๏What is method profiling & why it is important

๏Object Creation Profiling & why it is important

๏Gross memory monitoring

#1 Module Coverage

Monitoring JVM

Page 4: Java Performance Monitoring & Tuning

Java Performance Tuning

๏ About thread profiling

๏ Client Server Communications

๏ We will summarize with “All in all - What to monitor”

#2 Module Coverage

Monitoring JVM

Page 5: Java Performance Monitoring & Tuning

Java Performance Tuning

GC in JVM

There is no point in discussing monitoring and tuning without understanding the fundamentals of GC & Threads.

We will discuss in general how GC works.

We will also discuss in general how Threads behave in the JVM.

To understand GC we first need to understand the memory structure. Hence we will start with the memory model of Java.

Why discuss GC & Threads ?

Monitoring Java Applications

Page 6: Java Performance Monitoring & Tuning

Java Performance Tuning

Classloader is the subsystem that loads classes.

Heap is where object allocation is done.

The Non-Heap area typically comprises the Method Area, Code Cache and Permanent Generation.

PC registers are per-thread program counters that track the point of execution in each thread's stack.

The Execution Engine is the part of the JVM that runs the code and provides services to the Java application.

#1 Memory Structure

Monitoring Java Applications

GC in JVM

Page 7: Java Performance Monitoring & Tuning

Java Performance Tuning

The Classloader loads the class.

It creates an object of class Class and stores the bytecode information in fields, methods etc. All this metadata is stored in perm gen.

Static variables come into existence when the class is loaded.

If a static variable is a reference variable, the object is in the heap and the reference is in perm gen.

Objects of class Class are created in perm gen.

#2 Method Area & Heap

Monitoring Java Applications

GC in JVM

Page 8: Java Performance Monitoring & Tuning

Java Performance Tuning

Each thread is allocated one stack.

Each method invocation is allocated a frame on that stack.

The Program Counter tracks the flow of execution in the thread.

Native method calls do not use the Java stack; they run on a separate native stack.

#3 Runtime Data Areas exclusive to each thread

Monitoring Java Applications

GC in JVM

Page 9: Java Performance Monitoring & Tuning

Java Performance Tuning

Heap stores all the application objects.

Program never frees memory

GC frees memory.

The way to think about GC in Java is that it’s a “lazy bachelor” that hates taking out the trash and typically postpones the process for some period of time. However, if the trash can begins to overflow, Java immediately takes it out. In other words, if memory becomes scarce, Java immediately runs GC to free memory.

#4 What is GC & When it happens

Monitoring Java Applications

GC in JVM

Page 10: Java Performance Monitoring & Tuning

Java Performance Tuning

More time in GC means more pauses of application threads.

The more objects there are, the higher the memory footprint and the more work for GC.

Large heap - each collection takes more time.

Small heap - each collection takes less time, but collections are more frequent.

Memory leaks (loitering objects) can make GC kick in very often.

#10 Why is GC Monitoring important !

Monitoring Java Applications

GC in JVM

Page 11: Java Performance Monitoring & Tuning

Java Performance Tuning

GC is compute intensive - CPU overhead. The more time GC takes, the slower your application will be.

Throughput: The percentage of total time not spent in GC.

Pause Time: The time for which the app threads are stopped while collecting.

Footprint: Working size of the JVM, measured in terms of pages and cache lines (see glossary in notes).

Promptness: The time between an object's death and its collection.
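The throughput definition above can be estimated from inside a running JVM. The following is a minimal sketch, not a precise measurement: it assumes the standard `java.lang.management` beans, and the class and method names (`GcThroughputSketch`, `throughput`, `totalGcMillis`) are made up for illustration. Note that `getCollectionTime()` is only an approximate accumulated time and, for concurrent collectors, is not the same as pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcThroughputSketch {

    /** Fraction of elapsed time NOT spent in GC, e.g. 0.97 means 97% throughput. */
    static double throughput(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 1.0; // nothing measured yet
        }
        return 1.0 - ((double) gcMillis / uptimeMillis);
    }

    /** Sums the approximate accumulated collection time over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if undefined for this collector
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gcTime = totalGcMillis();
        System.out.printf("GC time: %d ms of %d ms uptime, throughput: %.2f%%%n",
                gcTime, uptime, 100.0 * throughput(gcTime, uptime));
    }
}
```

Sampling these values periodically, rather than once, gives a better picture of how throughput changes under load.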

#11 Why is GC Monitoring Important !

Monitoring Java Applications

GC in JVM

Page 12: Java Performance Monitoring & Tuning

Java Performance Tuning

Reference Counting: Each object has a reference count.

The collector collects objects with 0 references.

Simple, but requires significant assistance from the compiler - the moment a reference is modified, the compiler must generate code to change the count.

Unable to collect objects with cyclic references - like a doubly linked list, or a tree where a child maintains a reference to its parent node.

Java does not use Reference Counting; its collectors are stop-the-world (STW) tracing collectors.

#5 Types of Collectors - Reference Counting

Monitoring Java Applications

GC in JVM

Page 13: Java Performance Monitoring & Tuning

Java Performance Tuning

The collector takes a snapshot of root objects - objects that are referred to from the stack (local variables) and perm gen (static variables).

It starts tracing objects reachable from the root objects and marks them as reachable.

The rest is garbage.

All collectors in Java are tracing collectors.

Stop-the-world collector.

#6 Types of Collectors - Tracing Collectors

Monitoring Java Applications

GC in JVM

Page 14: Java Performance Monitoring & Tuning

Java Performance Tuning

This is the basic tracing collector.

Marking: Each object has a mark bit in its block header; the collector clears the mark of all objects and then marks those that are reachable.

Sweep: The collector runs through all the allocated objects to read the mark value and collects all objects that are not marked.

There are two challenges with this collector:

1. The collector has to walk through all allocated objects in the sweep phase.

2. It leaves the heap fragmented.
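The two phases can be illustrated with a toy model. This is only a sketch of the idea on an in-memory object graph, not how HotSpot implements it; `Node`, `mark`, `sweep` and `demo` are hypothetical names. The example also shows why a tracing collector, unlike reference counting, reclaims cyclic garbage.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class MarkSweepSketch {

    /** A toy heap object: a mark bit plus outgoing references. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;

        Node(String name) {
            this.name = name;
        }
    }

    /** Mark phase: clear all marks, then mark everything reachable from the roots. */
    static void mark(List<Node> heap, List<Node> roots) {
        for (Node n : heap) {
            n.marked = false;
        }
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (!n.marked) {
                n.marked = true;
                pending.addAll(n.refs);
            }
        }
    }

    /** Sweep phase: walk every allocated object and drop the unmarked ones. */
    static int sweep(List<Node> heap) {
        int freed = 0;
        for (Iterator<Node> it = heap.iterator(); it.hasNext(); ) {
            if (!it.next().marked) {
                it.remove();
                freed++;
            }
        }
        return freed;
    }

    /** Builds a tiny heap: root -> a -> b, plus an unreachable cycle c <-> d. */
    static int demo() {
        Node root = new Node("root");
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        Node d = new Node("d");
        root.refs.add(a);
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c); // cyclic garbage: reference counting would never free it
        List<Node> heap = new ArrayList<>(List.of(root, a, b, c, d));
        mark(heap, List.of(root));
        return sweep(heap);
    }

    public static void main(String[] args) {
        System.out.println("Objects freed: " + demo());
    }
}
```

The sweep loop visits every allocated object, live or dead, which is exactly the first challenge listed above.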

#7 Types of Collectors - Mark-Sweep

Monitoring Java Applications

GC in JVM

Page 15: Java Performance Monitoring & Tuning

Java Performance Tuning

Overcomes the challenges of Mark-Sweep. (This collection is also known as Scavenge.)

Creates two spaces - active and inactive.

Moves surviving objects from the active to the inactive space.

The roles of the spaces are then flipped.

Advantages - a) Does not have to visit garbage objects to read their mark. b) Solves the reference locality issue.

Disadvantages - a) Overhead of copying objects. b) Overhead of adjusting references to point to the new location.

#8 Types of Collectors - Copying Collector

Monitoring Java Applications

GC in JVM

Interesting downside: When standing on its own, it needs twice as much memory as the heap to be reliable, because when the collector starts, it does not know how many live objects there will be in the from-space.

Page 16: Java Performance Monitoring & Tuning

Java Performance Tuning

Overcomes the challenges of Copy (twice the size is not needed) & Mark-Sweep (no fragmentation).

Marking - Same as Mark-Sweep, i.e. visits each live object and marks it as reachable.

Compaction - Marked objects are copied such that all live objects end up at the bottom of the heap.

Clear demarcation between the active portion of the heap and the free area.

Long-lived objects tend to accumulate at the bottom of the heap, so they are not copied again as they are in a copying collector.

#9 Types of Collectors - Mark-Compact

Monitoring Java Applications

GC in JVM

CMS (Concurrent Mark Sweep ) garbage collection does not do compaction. ParallelOld garbage collection performs only whole-heap compaction, which results in considerable pause times.


Page 17: Java Performance Monitoring & Tuning

Java Performance Tuning

#10 Copy Vs Mark Sweep Compact

Monitoring Java Applications

GC in JVM

2x refers to “Twice the memory”

Page 18: Java Performance Monitoring & Tuning

Java Performance Tuning

#11 Very important

Monitoring Java Applications

GC in JVM

Page 19: Java Performance Monitoring & Tuning

Java Performance Tuning

Thread in JVM

Threads are used for better performance; however, the more threads there are, the more challenges there are.

When threads do not share data, there are fewer challenges.

Challenges with threads - race condition, deadlock, starvation, livelock.

Deadlocked threads are dreaded - they can eat up CPU time.

For monitoring threads, understanding thread states is important.

#1 Threads - Understanding is Important

Monitoring Java Applications

Page 20: Java Performance Monitoring & Tuning

Java Performance Tuning

New - Created but not yet started.

Runnable - Executing, but may be waiting for OS resources like CPU time.

Blocked - Waiting for the monitor lock to enter a synchronized block, or to re-enter it after being recalled from the wait-set on encountering notify.

Waiting - As a result of Object.wait(), Thread.join(), LockSupport.park().

#2 Thread States

Monitoring Java Applications

Thread in JVM

Page 21: Java Performance Monitoring & Tuning

Java Performance Tuning

Timed Waiting: As a result of Thread.sleep(), Object.wait(timeout), Thread.join(timeout).

Terminated: Execution completed.
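Some of these states can be observed directly with `Thread.getState()`. The sketch below is illustrative only (the class name `ThreadStateSketch` and the 60-second sleep are arbitrary choices); it captures the state of a worker thread before it starts, while it sleeps, and after it finishes.

```java
class ThreadStateSketch {

    /** Returns the states seen before start, while sleeping, and after termination. */
    static Thread.State[] observe() {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000); // stays in TIMED_WAITING until interrupted
            } catch (InterruptedException e) {
                // woken up by the observer; just finish
            }
        });
        Thread.State beforeStart = worker.getState(); // NEW
        worker.start();
        try {
            while (worker.getState() != Thread.State.TIMED_WAITING) {
                Thread.sleep(10); // poll until the worker is asleep
            }
            Thread.State whileSleeping = worker.getState();
            worker.interrupt();
            worker.join();
            return new Thread.State[] {beforeStart, whileSleeping, worker.getState()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("observer interrupted", e);
        }
    }

    public static void main(String[] args) {
        for (Thread.State state : observe()) {
            System.out.println(state);
        }
    }
}
```

The same state names appear in JConsole, Visual VM and jstack output, which is why they are worth recognising.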

#3 Thread States

Monitoring Java Applications

Thread in JVM

Page 22: Java Performance Monitoring & Tuning

Java Performance Tuning

Two concurrent threads changing the state of the same object.

While one thread has not finished writing to a memory location, the other thread reads from it.

Synchronization is the solution.

We all know what synchronization is. Really? Read on....

#4 Race Condition

Monitoring Java Applications

Thread in JVM

Page 23: Java Performance Monitoring & Tuning

Java Performance Tuning

The semantics of synchronization include:

๏Mutual exclusion of execution based on the state of a semaphore.

๏Rules about synchronizing a thread's interaction with main memory. In particular, the acquisition and release of a lock triggers a memory barrier -- a forced synchronization between the thread's local memory and main memory.

The last point is the one that developers very often do not know.
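Both aspects can be seen in a small counter example. This is a sketch with hypothetical names (`SynchronizedCounterSketch`, `run`): `synchronized` gives mutual exclusion for `count++`, and because the reader also takes the lock in `get()`, it is guaranteed to see the writers' updates.

```java
class SynchronizedCounterSketch {

    private int count;

    /** Mutual exclusion, plus a memory barrier on lock acquire and release. */
    synchronized void increment() {
        count++;
    }

    /** Reading under the same lock guarantees visibility of earlier increments. */
    synchronized int get() {
        return count;
    }

    /** Runs the given number of threads, each incrementing the shared counter. */
    static int run(int threads, int incrementsPerThread) {
        SynchronizedCounterSketch counter = new SynchronizedCounterSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 100_000));
    }
}
```

Removing `synchronized` from `increment()` typically makes the final count fall short, which is the race condition described on the previous slide.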

#5 Synchronization Semantics

Monitoring Java Applications

Thread in JVM

Page 24: Java Performance Monitoring & Tuning

Java Performance Tuning

Deadlock occurs when two or more threads are blocked forever, waiting for each other.

Consider objects o1 and o2, and threads t1 and t2 that start together.

Thread t1 starts and locks o1 and then, without releasing the lock on o1, tries to lock o2 after 100ms.

Thread t2 starts and locks o2 and then, without releasing the lock, tries to lock o1.

There is a sure deadlock - t1 holds the monitor of o1, hence t2 will not get access to o1, and t2 holds the monitor of o2, hence t1 will not get access to o2.
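The scenario can be reproduced with a few lines of Java. The sketch below is not the presenter's program: it replaces the 100ms delay with a `CountDownLatch` so the deadlock is deterministic, uses daemon threads so the JVM can still exit, and detects the result with `ThreadMXBean.findDeadlockedThreads()`. The class and helper names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class DeadlockSketch {

    /** Starts t1 (o1 then o2) and t2 (o2 then o1); returns how many threads deadlock. */
    static int deadlockedThreadCount() {
        final Object o1 = new Object();
        final Object o2 = new Object();
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread t1 = new Thread(() -> lockInOrder(o1, o2, bothHoldFirstLock), "t1");
        Thread t2 = new Thread(() -> lockInOrder(o2, o1, bothHoldFirstLock), "t2");
        // Daemon threads, so the JVM can exit although they never finish.
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = threads.findDeadlockedThreads(); // null when no deadlock exists
            if (ids != null) {
                return ids.length;
            }
            sleepQuietly(50);
        }
        return 0;
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            awaitQuietly(latch); // wait until the other thread holds its first lock
            synchronized (second) { // blocks forever: the other thread owns this monitor
                System.out.println("never reached");
            }
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + deadlockedThreadCount());
    }
}
```

Acquiring the two locks in the same order in both threads would remove the deadlock.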

#6 DeadLock - Can Hurt Performance Badly

Monitoring Java Applications

Thread in JVM

Page 25: Java Performance Monitoring & Tuning

Java Performance Tuning

#7 DeadLock - Can Bring JVM to Knees

Monitoring Java Applications

Thread in JVM

Page 26: Java Performance Monitoring & Tuning

Java Performance Tuning

#8 DeadLock - Can Bring JVM to Knees

Monitoring Java Applications

Thread in JVM

Page 27: Java Performance Monitoring & Tuning

Java Performance Tuning

Monitoring Java Applications

Starvation - A less common situation than DeadLock; starvation happens when one thread is deprived of a resource (a shared object, for instance) because another thread has occupied it for a very long time and is not releasing it.

LiveLock - Again a less common situation, where two threads keep responding to each other’s actions and are unable to proceed.

#9 Starvation & LiveLock

Thread in JVM

Page 28: Java Performance Monitoring & Tuning

Java Performance Tuning

Method Profiling

Monitoring Java Applications

What if your application is running slow at one point of execution?

You can pinpoint the exact execution path where the performance is bad.

There is probably a method that is taking more time than expected.

You need to profile the application to trace method calls.

Visual VM is a good tool - let's use it.

#1 Monitoring Methods

Page 29: Java Performance Monitoring & Tuning

Java Performance Tuning

Method Profiling

Monitoring Java Applications

Get the test program from here:

http://www.mslearningandconsulting.com/documents/28301/83860/MethodCallProfileTest.java

Study the program, run it and start visual VM

1.Select the process

2.Go to Profiler

3.Select settings and remove all the package names from “Do not profile classes” and save the settings

4.Run CPU Profiler

5.Go back to application console and hit “enter” twice to start runThreads method

6.Let the profiling complete and save the snapshot

#2 Tracing Methods using Visual VM

Page 30: Java Performance Monitoring & Tuning

Java Performance Tuning

Method Profiling

Monitoring Java Applications

Select the Hot Spots from the tabs below the Snapshot.

Note which method and from which thread is taking maximum time.

You will notice that FloatingDecimal.dtoa method is taking max time.

Select Combined option from the tab. Now double click on FloatingDecimal.dtoa and see the trace to FloatingDecimal.dtoa

#3 Observations of Method Profile

Page 31: Java Performance Monitoring & Tuning

Java Performance Tuning

Profiling Obj Creation

Monitoring Java Applications

The more objects in memory, the more work for GC.

Object creation itself is a compute-intensive job.

Leaking (loitering) objects can be all the more dangerous and can lead to OOME (OutOfMemoryError).

Memory profiling can help find the objects that take the most space. We can also get the number of instances of a given class.

We will use Visual VM for this purpose.

#1 Objects Churned in memory

Page 32: Java Performance Monitoring & Tuning

Java Performance Tuning

Profiling Obj Creation

Monitoring Java Applications

Download the code from here: http://www.mslearningandconsulting.com/documents/28301/83860/Object+Creation+Profiling.zip

Run the code and select the Java Process in Visual VM.

Now hit the Enter key on the console.

Go to the Sampler option and select Memory (if not present, then VM >> Tools >> Plugins >> Install Sampler).

Monitor the amount of memory taken by LargeObject.

Also monitor the byte array objects - these will take the most memory.

#2 Visual VM Memory Profiler

Page 33: Java Performance Monitoring & Tuning

Java Performance Tuning

Profiling Obj Creation

Monitoring Java Applications

What is a memory leak? An object is created in the heap and there is a reference to it; at some point in time, the application loses access to the reference variable (you would call it a pointer in C) before reclaiming the memory that was allocated for the object.

Is a memory leak possible in Java? No & Yes.

No - There is no way that an object loses all references and GC does not collect it.

Yes - There can be an object that has a strong reference to it, but the design of the application is such that the application will never use the reference - such objects are loitering objects.

#3 Can Java Application Leak Memory ?

Page 34: Java Performance Monitoring & Tuning

Java Performance Tuning

Profiling Obj Creation

Monitoring Java Applications

Consider that ClassA is instantiated and has a life equal to the life of the JVM. Now if ClassA refers to an instance of ClassB, and ClassB is a UI widget, it is quite likely that the UI is eventually dismissed by the user. In such a case the ClassB instance will always be held in memory, as it is still referred to by the instance of ClassA. The instance of ClassB is considered to be loitering.

You cannot find loitering objects by simply looking at memory utilization in Activity Monitor or Task Manager.

You need better tools, e.g. JProbe, YourKit etc.
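The ClassA/ClassB scenario can be sketched as follows. `Registry` stands in for the long-lived ClassA and `Widget` for the dismissed ClassB; both names, and the 1 MB payload, are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class LoiteringObjectSketch {

    /** Stands in for ClassB: a UI widget that the user eventually dismisses. */
    static final class Widget {
        final byte[] pixels = new byte[1024 * 1024]; // makes the retained memory visible
    }

    /** Stands in for ClassA: lives as long as the JVM and remembers every widget. */
    static final class Registry {
        private final List<Widget> widgets = new ArrayList<>();

        Widget open() {
            Widget widget = new Widget();
            widgets.add(widget);
            return widget;
        }

        /** The fix: drop the reference when the widget is dismissed. */
        void close(Widget widget) {
            widgets.remove(widget);
        }

        int retained() {
            return widgets.size();
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        Widget dialog = registry.open();
        dialog = null; // the application forgets the widget, the registry does not
        System.out.println("Retained after dismiss: " + registry.retained());
    }
}
```

As long as `close` is never called, every dismissed widget stays strongly reachable through the registry and GC cannot reclaim it.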

#4 Possible Leaking Objects in Java

Page 35: Java Performance Monitoring & Tuning

Java Performance Tuning

Collection classes, such as hashtables and vectors, are common places to find the cause of a memory leak.

Use static variables thoughtfully, especially final static ones.

If registering an instance of an ActionListener class, do not forget to unregister it once the event is invoked (some programming platforms like ActionScript support registration by WeakReference).

#6 General Tips to Avoid Memory Leaks

Profiling Obj Creation

Monitoring Java Applications

Page 36: Java Performance Monitoring & Tuning

Java Performance Tuning

Avoid static references esp. final fields.

Avoid calling str.intern() on lengthy Strings, as this would put the string object referred to by str in the String pool.

Avoid storing large objects in ServletContext in

web applications.

Unclosed open streams can cause problems.

Unclosed database connections can cause

problems.

#7 General Tips.... (Contd.)

Profiling Obj Creation

Monitoring Java Applications

Page 37: Java Performance Monitoring & Tuning

Java Performance Tuning

Tomcat server crashes after several redeployments.

The ClassLoader object does not get unloaded, thereby maintaining references to all the metadata. Result: OOME - PermGen space error.

Each ClassLoader object maintains a cache of all the classes it loads.

Each object maintains a reference to its Class object.

#8 ClassLoader Leak

Profiling Obj Creation

Monitoring Java Applications

Page 38: Java Performance Monitoring & Tuning

Java Performance Tuning

Consider this :

1. A long-running Thread

2. Loads a class with a custom ClassLoader

3. An object of the loaded class is created and a reference to that object is stored in a ThreadLocal (say through the constructor of the loaded class)

4. Now even if you clear the newly created object, the class reference object and the loader, the loader will remain, along with all the classes it loaded

#8 ClassLoader Leak (contd.)

Profiling Obj Creation

Monitoring Java Applications

Page 39: Java Performance Monitoring & Tuning

Java Performance Tuning

#9 ClassLoader Leak Code

Profiling Obj Creation

Monitoring Java Applications

[Code slide; callouts: New Thread, Custom ClassLoader, Instance in ThreadLocal]

Page 40: Java Performance Monitoring & Tuning

Java Performance Tuning

On destroy of the container, LeakServlet loses its reference and hence it is collected.

AppClassLoader is not collected because LeakServlet$1.class is referencing it.

LeakServlet$1.class is not collected because the CUSTOMLEVEL object is referencing it.

The CUSTOMLEVEL object is not collected because Level.class (through its static variable called known) is referencing it.

Level.class is not collected as it is loaded by the BootStrapClassLoader.

Since AppClassLoader is not collected, OOME Perm......

#10 ClassLoader Leak - java.util.logging.Level

Profiling Obj Creation

Monitoring Java Applications

Page 41: Java Performance Monitoring & Tuning

Java Performance Tuning

#10 Strings Leaking because of substring

Profiling Obj Creation

Monitoring Java Applications

// Instead use the following
this.muchSmallerString = new String(veryLongString.substring(0, 1));

Page 42: Java Performance Monitoring & Tuning

Java Performance Tuning

#10 Closing of Streams is Important

Profiling Obj Creation

Monitoring Java Applications

Create a BigJar. Read the contents.

Note that the stream is not closed - check memory consumption in Visual VM.
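For comparison, here is a minimal sketch of reading a file with the stream reliably closed. It is not the slide's BigJar program: the temporary file is a hypothetical stand-in, and the class and method names are made up. try-with-resources closes the stream even when reading throws.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class CloseStreamSketch {

    /** Counts the bytes in a file; try-with-resources closes the stream even on failure. */
    static long countBytes(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            long total = 0;
            int read;
            while ((read = in.read(buffer)) != -1) {
                total += read;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a temporary file (a stand-in for the BigJar), reads it back and deletes it. */
    static long demo() {
        try {
            Path tmp = Files.createTempFile("bigjar-standin", ".bin");
            try {
                Files.write(tmp, new byte[100_000]);
                return countBytes(tmp);
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo() + " bytes read");
    }
}
```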

Page 43: Java Performance Monitoring & Tuning

Java Performance Tuning

Profiling Obj Creation

Monitoring Java Applications

#5 Monitoring Memory Leak

Where is the leak ?

Peak Load Concept: To distinguish between a memory leak and an application that simply needs more memory, we need to look at the "peak load" concept. When a program has just started, no users have yet used it, and as a result it typically needs much less memory than when thousands of users are interacting with it. Thus, measuring memory usage immediately after a program starts is not the best way to gauge how much memory it needs! To measure how much memory an application needs, memory size measurements should be taken at the time of peak load, when it is most heavily used.

Page 44: Java Performance Monitoring & Tuning

Java Performance Tuning

Gross Memory Monitoring

Monitoring Java Applications

Objects are allocated in the heap.

At any point in time, if the memory available to create objects is less than what is needed, you will encounter the dreaded OOME.

Monitoring gross memory usage is important so that you can identify the memory limits for your application.

It is important to understand how memory is used, claimed and freed by the JVM.... Be engaged....

#1 Why Gross Memory Monitoring ?

Page 45: Java Performance Monitoring & Tuning

Java Performance Tuning

Gross Memory Monitoring

Monitoring Java Applications

Initial size: -Xms and max size: -Xmx

Runtime.getRuntime().totalMemory() returns the memory currently claimed by the JVM.

If the JVM needs more memory, expansion happens - up to the limit set by -Xmx.

OOME if memory needs go beyond -Xmx.

OOME if expansion fails because the OS does not have memory to provide (rare case).

We will revisit this topic while discussing more on tuning.
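These figures can be read from `java.lang.Runtime` at any time. The sketch below is not the downloadable MonitorHeapExpansion class; it is a small stand-in with hypothetical names that prints used, committed and maximum heap before and after allocating 16 MB.

```java
class HeapUsageSketch {

    /** Returns {used, committed, max} heap sizes in bytes as reported by Runtime. */
    static long[] snapshot() {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory();       // currently claimed from the OS
        long used = committed - rt.freeMemory(); // occupied by objects not yet collected
        long max = rt.maxMemory();               // upper bound, governed by -Xmx
        return new long[] {used, committed, max};
    }

    private static void print(String label, long[] s) {
        long mb = 1024 * 1024;
        System.out.printf("%s: used=%d MB, committed=%d MB, max=%d MB%n",
                label, s[0] / mb, s[1] / mb, s[2] / mb);
    }

    public static void main(String[] args) {
        print("before", snapshot());
        byte[][] blocks = new byte[16][];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = new byte[1024 * 1024]; // allocate 16 MB in 1 MB blocks
        }
        print("after " + blocks.length + " MB", snapshot());
    }
}
```

Running it with a small -Xms and a larger -Xmx makes the growth of the committed size easier to see.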

#2 Heap Memory Usage by JVM

Download and run this class : http://www.mslearningandconsulting.com/documents/28301/83860/MonitorHeapExpansion.java

Page 46: Java Performance Monitoring & Tuning

Java Performance Tuning

Thread Profiling

Monitoring Java Applications

Re-run the DeadLock program you wrote earlier.

Start JConsole >> Threads.

Click on “Detect DeadLock”. You will find two threads identified as being in deadlock.

Study the other details, like thread state, which can help to detect LiveLock or Starvation, if any.

Recollect the discussion we had on Thread states.

#1 Monitoring Threads - JConsole

Page 47: Java Performance Monitoring & Tuning

Java Performance Tuning

Thread Profiling

Monitoring Java Applications

Use jstack to get a thread dump while your JVM is running.

jstack prints stack traces of the Java threads of a given process.

Run the MonitorHeapExpansion program and use jstack to study the stack trace.
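A similar dump can also be produced from inside the application. The sketch below uses `ThreadMXBean.dumpAllThreads`; it only approximates what jstack prints, and the class name and output layout are assumptions made for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class ThreadDumpSketch {

    /** Builds a jstack-like text dump of all live threads in this JVM. */
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip details about locked monitors and ownable synchronizers
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Unlike jstack, this needs no process id, but it only works when the application itself is still responsive enough to run the code.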

#2 Monitoring Threads - jstack

Page 48: Java Performance Monitoring & Tuning

Java Performance Tuning

Client Server Monitoring

Monitoring Java Applications

Monitor the time taken for incoming requests to

be processed

Monitor the average amount of data sent in each

request

Monitor the number of worker threads

Monitor the state of thread and the timeout set

for each thread in the pool

#1 What to monitor ?

Page 49: Java Performance Monitoring & Tuning

Java Performance Tuning

All in All - What to Monitor ?

Monitoring Java Applications

GC - Number of GCs, time taken by GC, amount of memory freed after GC (remember there can be loitering objects which can make GC kick in very often).

Thread - State of the Threads - look for DeadLock, Starvation, LiveLock.

Hotspots - The methods taking the most time.

Object Allocation - Probing the number of objects churned, especially looking for loitering objects.

Finalizers - Objects pending finalization.

#1 Summary of What to Monitor !

Page 50: Java Performance Monitoring & Tuning

Java Performance Tuning

Observability API

Monitoring Java Applications

JVM PI in Java 1.2. JVM TI from 1.5

Use JConsole to see the list of Management

Beans

Let us monitor our DeadLockDemo code to detect deadlocked threads

#1 JVM TI

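import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

ThreadMXBean threadMB = ManagementFactory.getThreadMXBean();
long[] threadIds = threadMB.findDeadlockedThreads(); // null when there is no deadlock
if (threadIds != null) {
    for (long id : threadIds) {
        System.out.println("The deadLock Thread id is : " + id + " > " + threadMB.getThreadInfo(id).getThreadName());
    }
}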

Page 51: Java Performance Monitoring & Tuning

Java Performance Tuning

Observability API

Monitoring Java Applications

There are many tools that are bundled with Sun JDK and they are as follows:

1.jmap (use sudo jmap on mac) : prints shared object memory maps or heap memory details of a given process.

2.jstack: prints Java stack traces of Java threads for a given Java process

3.jinfo (use sudo jinfo on mac): prints Java configuration information for a given Java process.

4.JConsole: provides much of the above.

#1 JDK Tools

Page 52: Java Performance Monitoring & Tuning

Java Performance Tuning

Profiling Tools

Monitoring Java Applications

1.JProfiler: This is a paid product and has a very nice user interface. Gives all the information on GC, Object Creation and Allocation and CPU Utilization

2.Yourkit: This is also a paid product. Quite comprehensive.

3.AppDynamics: This is my favorite. It works with distributed systems and very intelligently understands the different components that make up your application.

#1 Overview of Profiling Tools

Visual VM - Let's run MonitorHeapExpansion and monitor memory & threads in Visual VM

Page 53: Java Performance Monitoring & Tuning

Java Performance Tuning

Tuning GC & Heap

In this module we will cover the following (both of these topics go hand in hand)

๏Monitoring & Tuning GC

๏Monitoring & Tuning the Heap

#1 Module Coverage

Page 54: Java Performance Monitoring & Tuning

Java Performance Tuning

Sizing Heap

Serial GC - One thread is used. Good for uniprocessor systems; throughput will be lost on multi-processor systems.

Ergonomics - The goal is to provide good performance with little or no tuning by automatically selecting the GC, heap size and compiler. Introduced in J2SE 5.0.

Generations - Most objects are short lived and die young. Long-lived objects are kept in different generations.

Tuning GC & Heap go hand in hand.

#1 Things to Know before Tuning

Tuning GC & Heap

Page 55: Java Performance Monitoring & Tuning

Java Performance Tuning

Throughput: The percentage of total time not spent in GC.

Pause Time: The time for which the app threads are stopped while collecting.

Footprint: Working size of the JVM, measured in terms of pages and cache lines (see glossary in notes).

Promptness: The time between an object's death and its collection.

#2 Performance Considerations (Revisited)

Tuning GC & Heap

Sizing Heap

Page 56: Java Performance Monitoring & Tuning

Java Performance Tuning

-verbose:gc: prints heap and GC info on each collection.

The example shows 2 minor and 1 major collections.

The numbers before and after the arrow indicate the size of live objects before and after the collection.

The after number also includes garbage that could not be reclaimed, either because the objects are in tenured or because they are referenced from tenured or perm gen.

The number in parentheses is the committed heap size - Runtime.getRuntime().totalMemory().

0.2300771 is the time taken for the collection, in seconds.
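The fields described above can be pulled out of a log line with a small parser. The slide's example output is an image and is not in the transcript, so the sample line below is an assumption: it follows the classic (pre-Java 9) HotSpot -verbose:gc layout and reuses the 0.2300771 figure. JVMs that use unified logging (-Xlog:gc) print a different format. All class and field names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VerboseGcLineSketch {

    // Assumed layout: [GC <before>K-><after>K(<committed>K), <seconds> secs]
    private static final Pattern LINE = Pattern.compile(
            "\\[(Full GC|GC) (\\d+)K->(\\d+)K\\((\\d+)K\\), (\\d+\\.\\d+) secs\\]");

    static final class Entry {
        final boolean major;    // true for "Full GC"
        final long beforeKb;    // occupancy before the collection
        final long afterKb;     // occupancy after the collection
        final long committedKb; // committed heap size
        final double seconds;   // time taken by the collection

        Entry(boolean major, long beforeKb, long afterKb, long committedKb, double seconds) {
            this.major = major;
            this.beforeKb = beforeKb;
            this.afterKb = afterKb;
            this.committedKb = committedKb;
            this.seconds = seconds;
        }
    }

    /** Parses one log line in the assumed layout, or throws if it does not match. */
    static Entry parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized GC line: " + line);
        }
        return new Entry(
                m.group(1).equals("Full GC"),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)),
                Long.parseLong(m.group(4)),
                Double.parseDouble(m.group(5)));
    }

    public static void main(String[] args) {
        Entry e = parse("[GC 325407K->83000K(776768K), 0.2300771 secs]");
        System.out.println((e.major ? "major" : "minor") + ": freed "
                + (e.beforeKb - e.afterKb) + "K in " + e.seconds + " s");
    }
}
```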

#3 Monitoring GC -verbose:gc

Tuning GC & Heap

Sizing Heap

Page 57: Java Performance Monitoring & Tuning

Java Performance Tuning

Additional info as compared to -verbose:gc

Prints information about young generation.

DefNew: Shows the live objects before & after a minor collection in the young gen.

The second line shows the status of the entire heap and the time taken.

-XX:+PrintGCTimeStamps will add a time stamp at the start of each collection.

Use of -verbose:gc is important with these options.

#4 Monitoring GC -XX:+PrintGCDetails

Tuning GC & Heap

Sizing Heap

Page 58: Java Performance Monitoring & Tuning

Java Performance Tuning

Many parameters change the generation sizes.

Not all space is committed - uncommitted space is labelled as Virtual.

Generations can grow and shrink; the heap can grow up to the limit of -Xmx.

Some of the parameters are ratios, like NewRatio & SurvivorRatio.

#5 Sizing Generations

Tuning GC & Heap

Sizing Heap

Page 59: Java Performance Monitoring & Tuning

Java Performance Tuning

Defaults are different for serial and parallel.

Collections occur when generations fill up, so the frequency of collections is inversely proportional to the amount of memory available.

Total memory is the most important factor in GC performance.

Heap grows and shrinks based on -XX:MinHeapFreeRatio and -XX:MaxHeapFreeRatio

MinHeapFreeRatio is 40 by default and MaxHeapFreeRatio is 70 by default.

Defaults scaled by approx 30% in 64 bit

#6 Total Heap

Tuning GC & Heap

Max must always be smaller than what the OS can afford to give, to avoid paging

Sizing Heap

Page 60: Java Performance Monitoring & Tuning

Java Performance Tuning

Defaults have problems on large servers - the defaults are small and will result in several expansions and contractions.

Recommendations

1. If pauses can be tolerated, give the heap as much memory as possible.

2. Consider setting -Xms and -Xmx to the same value.

3. Increase memory as the number of processors increases, since memory allocation can be parallelized.

#7 Total Heap - (contd.)

Tuning GC & Heap

Sizing Heap

Page 61: Java Performance Monitoring & Tuning

Java Performance Tuning

The proportion of the heap dedicated to Young is crucial.

The bigger the Young Gen, the fewer the minor collections.

A bigger Young will make tenured smaller (if heap size is limited), which will result in frequent major collections.

Young Gen size is controlled by NewRatio: -XX:NewRatio=3 means the ratio of Young to Tenured is 1:3, i.e. Young (Eden + Survivors) will be 1/4th of the total heap.

-XX:NewSize=100M will set the initial size of Young to 100 MB.

-XX:MaxNewSize=200M will set the max size.

#8 Young Generation

Tuning GC & Heap

Sizing Heap

Page 62: Java Performance Monitoring & Tuning

Java Performance Tuning

-XX:SurvivorRatio=6 will set the ratio between each survivor space and eden to 1:6, i.e. each survivor is 1/8th of Young (not 1/7th, because there are 2 survivor spaces).

You will rarely need to change this. Defaults are OK.

Small survivors will push objects into tenured early.

Bigger survivors will be a waste.

Ideally survivors should be half full - this is the factor that determines the threshold for objects to be promoted.

-XX:+PrintTenuringDistribution shows the age of objects in the Young Generation.
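The NewRatio and SurvivorRatio arithmetic is easy to get wrong, so here is a worked sketch. It assumes a fixed 1024 MB heap and ignores the rounding and alignment a real JVM applies; the class and method names are hypothetical.

```java
class GenerationSizingSketch {

    /** NewRatio is tenured/young, so young = heap / (NewRatio + 1). */
    static long youngSize(long heap, int newRatio) {
        return heap / (newRatio + 1);
    }

    /** SurvivorRatio is eden/survivor and there are two survivors: survivor = young / (ratio + 2). */
    static long survivorSize(long young, int survivorRatio) {
        return young / (survivorRatio + 2);
    }

    /** Eden is whatever remains of young after the two survivor spaces. */
    static long edenSize(long young, int survivorRatio) {
        return young - 2 * survivorSize(young, survivorRatio);
    }

    public static void main(String[] args) {
        long heapMb = 1024;                       // assumed heap size for the example
        long youngMb = youngSize(heapMb, 3);      // -XX:NewRatio=3
        long survivorMb = survivorSize(youngMb, 6); // -XX:SurvivorRatio=6
        long edenMb = edenSize(youngMb, 6);
        System.out.printf("young=%d MB, eden=%d MB, each survivor=%d MB, tenured=%d MB%n",
                youngMb, edenMb, survivorMb, heapMb - youngMb);
    }
}
```

With these inputs young is a quarter of the heap (256 MB), each survivor is an eighth of young (32 MB), and eden is six times a survivor (192 MB).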

#9 Survivor Space Sizing

Tuning GC & Heap

Sizing Heap

Page 63: Java Performance Monitoring & Tuning

Java Performance Tuning

Identify max heap size you can afford

Plot your performance metric and identify Young Size

Do not increase Young such that tenured becomes too small to accommodate application cache data plus some 20% extra

Subject to above considerations increase the size of young to avoid frequent minor gc.

#10 Recommendations on New Gen Sizing

Tuning GC & Heap

Sizing Heap

Page 65: Java Performance Monitoring & Tuning

Java Performance Tuning

Serial Collector: Single thread, no overhead of coordinating threads; suited to uniprocessor machines and apps with small data sets (approx. 100M). -XX:+UseSerialGC

Parallel Collector: Can take advantage of multiple processors; efficient for systems with large data sets; aka throughput collector. -XX:+UseParallelGC

The Parallel Collector is by default used on New only. For Old, use -XX:+UseParallelOldGC

Concurrent Collector: Performs most of its work concurrently, with minimal pauses. -XX:+UseConcMarkSweepGC

#1 Available Collectors

Tuning GC & Heap

Selecting Collector

CMS (Concurrent Mark Sweep ) garbage collection does not do compaction. ParallelOld garbage collection performs only whole-heap compaction, which results in considerable pause times.

Concurrent Collector does not do compaction.


Page 66: Java Performance Monitoring & Tuning

Java Performance Tuning

-XX:+UseSerialGC if the application has a small data set, pause times are not required to be strict, and it runs on a uniprocessor.

-XX:+UseParallelGC with multiple processors.

-XX:+UseParallelOldGC for parallel compaction in the tenured generation (whole-heap compaction - considerable pause times).

-XX:+UseConcMarkSweepGC if pause times must be less than 1 second. Note that this works only on the Old Generation - no compaction; results in a fragmented heap.

#3 General Collector Selection Guidelines

Tuning GC & Heap

Selecting Collector
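
The guidelines above can be read as a small decision function. This is only one possible reading of the slide, not an official selection algorithm; the class name `CollectorChooser` and its parameters are made up for illustration.

```java
// Hypothetical sketch: encodes the selection guidelines above as a decision function.
final class CollectorChooser {

    /** Returns the HotSpot flag(s) the guidelines suggest for the given situation. */
    static String suggestFlag(int processors, boolean smallDataSet, boolean needSubSecondPauses) {
        if (needSubSecondPauses) {
            // Low pauses, at the price of no compaction in the old generation.
            return "-XX:+UseConcMarkSweepGC";
        }
        if (processors <= 1 || smallDataSet) {
            // No benefit from coordinating several GC threads.
            return "-XX:+UseSerialGC";
        }
        // Throughput collector on young, parallel compaction on old.
        return "-XX:+UseParallelGC -XX:+UseParallelOldGC";
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(cpus + " processors: " + suggestFlag(cpus, false, false));
    }
}
```

Treat the output as a starting point for measurement, not as a final answer.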

Page 67: Java Performance Monitoring & Tuning

Java Performance Tuning

-XX:ParallelGCThreads=4 will create 4 threads to collect in parallel.

Ideally the number of threads should equal the number of processors.

Auto tuning based on Ergonomics

Generations in Parallel GC: the arrangement of generations and their names may differ between collectors

Serial calls it Tenured and Parallel calls it Old

#4 Parallel Collector in Detail

Tuning GC & Heap

Selecting Collector
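
To relate -XX:ParallelGCThreads to the machine at hand, the sketch below prints the processor count the JVM sees and the thread count HotSpot ergonomics would pick when the flag is not set. The formula (all processors up to 8, then 5/8 of the additional ones) follows the HotSpot tuning documentation for JDK 7/8 as far as I know; treat it as an assumption for other versions. The class name `GcThreads` is made up.

```java
// Sketch: default number of parallel GC threads for a given processor count.
final class GcThreads {

    /** Assumed ergonomics rule: N threads for N <= 8 processors, else 8 + 5/8 of the rest. */
    static int defaultParallelGcThreads(int cpus) {
        return cpus <= 8 ? cpus : 8 + (cpus - 8) * 5 / 8;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("Processors visible to the JVM: " + cpus);
        System.out.println("Default parallel GC threads:   " + defaultParallelGcThreads(cpus));
    }
}
```

On machines shared between several JVMs it can make sense to set -XX:ParallelGCThreads lower than this default so that the collectors do not compete for processors.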

Page 68: Java Performance Monitoring & Tuning

Java Performance Tuning

Instead of changing generation sizes etc. yourself, you specify the goal and let the JVM auto tune the generation sizes, number of threads etc.

There are 3 types of goals that can be specified

1. Pause Time

2. Throughput

3. Footprint

#5 Parallel Collector Auto Tuning with Goals

Tuning GC & Heap

Selecting Collector

Page 69: Java Performance Monitoring & Tuning

Java Performance Tuning

-XX:MaxGCPauseMillis=<N>

A pause time of <N> milliseconds or less is desired

Generation sizes adjusted automatically.

Throughput may be affected.

Meeting the goal is not guaranteed.

#5 Parallel Collector Pause Time Goal

Tuning GC & Heap

Selecting Collector

Page 70: Java Performance Monitoring & Tuning

Java Performance Tuning

Throughput goal is measured in terms of time spent doing gc vs. time spent outside gc (application time)

-XX:GCTimeRatio=<N> sets the ratio of gc time to application time to 1:N, i.e. gc may take 1 / (1 + N) of the total time

E.g. if <N> is 19 then 1 / (1 + 19) is 1/20, i.e. 5% of time spent in GC is acceptable

Default value of <N> is 99: 1 / (1 + 99) is 1/100, i.e. 1% of time in GC is acceptable

#6 Parallel Collector Throughput Goal

Tuning GC & Heap

Selecting Collector
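
The throughput arithmetic above, as a minimal sketch. The class name `GcTimeRatio` is made up for illustration; only the formula 1 / (1 + N) comes from the slide.

```java
// Sketch: fraction of total time that may be spent in GC for a given -XX:GCTimeRatio=N.
final class GcTimeRatio {

    /** GC time : application time is 1 : N, so GC gets 1 / (1 + N) of the total. */
    static double gcFraction(int n) {
        return 1.0 / (1 + n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {19, 99}) {
            System.out.printf("GCTimeRatio=%d allows %.1f%% of time in GC%n", n, gcFraction(n) * 100);
        }
    }
}
```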

Page 71: Java Performance Monitoring & Tuning

Java Performance Tuning

Specified with none other than -Xmx.

GC tries to minimize the heap size as long as the other goals are met

Goals are addressed in the order a) Pause time b) Throughput and finally c) Footprint

#6 Parallel Collector Footprint Goal

Tuning GC & Heap

Selecting Collector

Page 72: Java Performance Monitoring & Tuning

Java Performance Tuning

Generation size adjustments are done automatically as per goals specified.

-XX:YoungGenerationSizeIncrement=<Y> where Y is the percentage by which the Young Generation grows at each increment

-XX:TenuredGenerationSizeIncrement=<T> does the same for tenured

-XX:AdaptiveSizeDecrementScaleFactor=<D> controls shrinking of both generations: the decrement percentage is the growth increment divided by D

OOME: Parallel Collector will throw OOME if more than 98% of the total time is spent in GC and less than 2% of the heap is recovered.

#6 Controlling the Auto Tuning

Tuning GC & Heap

Selecting Collector
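
A simplified model of how the increment and decrement flags above interact. The arithmetic (grow by X percent, shrink by X/D percent) follows the HotSpot tuning documentation as I understand it; the real adaptive sizing policy has more inputs. The class name `AdaptiveSizing` and the sample values are assumptions.

```java
// Simplified sketch of one grow step and one shrink step of a generation.
final class AdaptiveSizing {

    /** Grow by incrementPercent (e.g. -XX:YoungGenerationSizeIncrement). */
    static long grow(long bytes, int incrementPercent) {
        return bytes + bytes * incrementPercent / 100;
    }

    /** Shrink by incrementPercent / decrementScaleFactor percent (-XX:AdaptiveSizeDecrementScaleFactor). */
    static long shrink(long bytes, int incrementPercent, int decrementScaleFactor) {
        return bytes - bytes * incrementPercent / decrementScaleFactor / 100;
    }

    public static void main(String[] args) {
        long young = 100L * 1024 * 1024; // assumed starting size: 100 MB
        // Assumed values: 20% growth increment, scale factor 4, i.e. 5% shrink steps.
        System.out.println("After growing:   " + grow(young, 20) + " bytes");
        System.out.println("After shrinking: " + shrink(young, 20, 4) + " bytes");
    }
}
```

Growing in larger steps than shrinking means the heap reacts quickly to load and gives memory back slowly.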

Page 73: Java Performance Monitoring & Tuning

Java Performance Tuning

4 Phases

Initial Mark: Pauses all application threads and marks the root objects and the objects reachable from young.

Concurrent Mark: Marks the rest of the objects reachable from the roots, concurrently with application threads

Remark: Again pauses application threads to mark those objects whose references changed during the preceding concurrent phase.

Concurrent Sweep: Sweeps the garbage concurrently with application threads. Note it does NOT compact memory

#7 Concurrent Collector

Tuning GC & Heap

Selecting Collector
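
The mark and sweep idea behind the phases above can be shown with a toy, single-threaded sketch. It is not how HotSpot implements CMS (there is no concurrency, no remark and no generations here); the class `ToyMarkSweep` and its `Node` type are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy model: a "heap" is a list of nodes, each holding references to other nodes.
final class ToyMarkSweep {

    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    /** Mark phase: flag every node reachable from the roots. */
    static void mark(List<Node> roots) {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (n.marked) continue;
            n.marked = true;
            pending.addAll(n.refs);
        }
    }

    /** Sweep phase: drop unmarked nodes in place (no compaction) and reset the marks. */
    static List<String> sweep(List<Node> heap) {
        List<String> freed = new ArrayList<>();
        for (Iterator<Node> it = heap.iterator(); it.hasNext();) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false; // ready for the next cycle
            } else {
                freed.add(n.name);
                it.remove();
            }
        }
        return freed;
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b); // a -> b, and a is a root
        c.refs.add(d); // c -> d, but nothing reaches c
        List<Node> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        mark(Collections.singletonList(a));
        System.out.println("Swept: " + sweep(heap)); // prints "Swept: [c, d]"
    }
}
```

Because the sweep only removes entries and never moves survivors, the free space ends up scattered between live objects, which is the fragmentation the slides warn about.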

Page 74: Java Performance Monitoring & Tuning

Java Performance Tuning

Initial Mark is always done with a single thread.

Remarking can be tuned to use multiple threads.

Pauses are very short and occur only during the initial mark and remark phases.

Concurrent mode failure: If the collector cannot finish reclaiming space before the tenured generation fills up, or an allocation cannot be satisfied, all application threads are stopped until the collection completes.

Floating Garbage: Objects traced by gc as live may become unreachable before gc completes collection. They are cleared in the next collection cycle

#7 Concurrent Collector - (contd.)

Tuning GC & Heap

Selecting Collector

Page 75: Java Performance Monitoring & Tuning

Java Performance Tuning

#8 Available Collectors

Tuning GC & Heap

Young / New

UseSerialGC: Copy Collector, single threaded, low throughput

UseParallelGC: PS Scavenge, multiple threads, high throughput, optimized

UseConcMarkSweepGC: ParNewGC (mandatory), multiple threads, copy collector

Tenured / Old

UseSerialGC: MarkSweepCompact, single threaded, whole heap compaction

UseParallelGC: PS MarkSweep, multiple threads, compaction with ParallelOldGC - but whole heap

UseConcMarkSweepGC: ConcurrentMarkSweep, low pause times at cost of throughput, no compaction (fragmented heap)

Selecting Collector
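
Collector names such as those listed above are what a running JVM reports through its management beans. The sketch below uses the standard `java.lang.management` API to list the active collectors; the class name `ActiveCollectors` is made up, and the exact names printed depend on the JVM version and the flags in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Sketch: list the collectors this JVM is actually running with.
final class ActiveCollectors {

    /** Names of the registered collectors, typically one for young and one for old. */
    static List<String> names() {
        List<String> result = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            result.add(gc.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and accumulated time since JVM start; -1 means undefined for this collector.
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
    }
}
```

Running it once per candidate flag is a quick way to confirm that the collector you selected is the one in effect.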

Page 76: Java Performance Monitoring & Tuning

Java Performance Tuning

Target - servers with multiprocessors & large memories

Meets pause time goals with high probability while achieving high throughput

It is concurrent, parallel and compacting.

Global marking is concurrent.

Designed to avoid interruptions proportional to heap or live-data size.

#8 G1 Collector

Tuning GC & Heap

Selecting Collector

Page 77: Java Performance Monitoring & Tuning

Java Performance Tuning

Divides the heap into equal-sized regions, each a contiguous range of virtual memory.

Concurrent Global Marking to determine liveness of objects throughout the heap.

G1 knows which regions are mostly empty - collects these regions first; hence the name - Garbage First.

Collecting mostly empty regions is very fast as there are fewer objects to copy

#9 G1 Collector - How it Works

Tuning GC & Heap

Selecting Collector

Page 78: Java Performance Monitoring & Tuning

Java Performance Tuning

Uses a Pause Prediction Model to meet user-defined pause-time goals and selects the regions to collect based on this goal.

Concentrates on collection and compaction of regions that are full of dead matter (ripe for collection) - Again: fewer objects to copy.

Copies live objects from one or more regions to a single region - in the process compacts and frees memory - this is evacuation.

Evacuating regions with mostly dead matter again means fewer copies.

#10 G1 Collector - How it Works

Tuning GC & Heap

Selecting Collector
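
The region selection described above can be sketched as follows: prefer regions with the least live data, and stop once the estimated copying work no longer fits the pause goal. This is a deliberately naive model; G1's real pause prediction accounts for far more than bytes copied. The class `GarbageFirstSketch`, its `Region` type and the copy-rate parameter are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Naive model of "garbage first" region selection under a pause budget.
final class GarbageFirstSketch {

    static final class Region {
        final int id;
        final long liveBytes; // data that must be copied if the region is evacuated
        Region(int id, long liveBytes) { this.id = id; this.liveBytes = liveBytes; }
    }

    /** Picks the regions that are cheapest to evacuate until the pause budget is used up. */
    static List<Integer> chooseCollectionSet(List<Region> regions, long pauseBudgetMs, long bytesCopiedPerMs) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparingLong((Region r) -> r.liveBytes)); // least live data first
        long budgetBytes = pauseBudgetMs * bytesCopiedPerMs;
        long used = 0;
        List<Integer> chosen = new ArrayList<>();
        for (Region r : sorted) {
            if (used + r.liveBytes > budgetBytes) break; // would exceed the pause goal
            used += r.liveBytes;
            chosen.add(r.id);
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
                new Region(0, 900), new Region(1, 100), new Region(2, 0), new Region(3, 400));
        // Assumed budget: 10 ms at 60 bytes/ms, i.e. room to copy 600 bytes.
        System.out.println(chooseCollectionSet(regions, 10, 60)); // prints [2, 1, 3]
    }
}
```

Region 0 is skipped because copying its 900 live bytes would break the budget, while the three cheaper regions are reclaimed almost for free.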

Page 79: Java Performance Monitoring & Tuning

Java Performance Tuning

Evacuation is done with multiple threads - decreasing pause times and increasing throughput.

Advantages

Continuously works to reduce fragmentation.

Strives to work within user defined pause times.

CMS does not do compaction which results in fragmented heap

ParallelOld performs whole heap-compaction which results in considerable pause times

#11 G1 Collector - How it Works

Tuning GC & Heap

Selecting Collector

Page 80: Java Performance Monitoring & Tuning

Java Performance Tuning

#8 Available Collectors

Tuning GC & Heap

UseSerialGC: No parallelism, resulting in loss of throughput on multiprocessors

UseParallelGC: Whole heap compaction

UseConcMarkSweepGC: No compaction, resulting in fragmented heap

Selecting Collector

๏ Regions

๏ Global Marking to get region liveness

๏ Collects mostly empty regions

๏ Vigilant on regions that have the most dead matter - evacuates such regions first

๏ Evacuation is based on user defined pause-time requirements (Pause Prediction Model)

๏ Evacuating regions that are mostly empty and those with the most dead matter means fewer objects to copy - less overhead of copying

๏ Evacuation is parallel

G1

๏ Global Marking to determine liveness is concurrent

๏ Evacuation is parallel

๏ During evacuation, compacts while copying to other regions

๏ Algo ensures there are fewer objects to copy

Page 81: Java Performance Monitoring & Tuning

Java Performance Tuning

JVM Monitoring

Few more tips

Permanent Generation - Use -XX:MaxPermSize=<N> if your application dynamically generates classes (JSPs, for example). If perm gen runs out of space you will encounter OOME: PermGen space.

Beware of Finalizers. GC needs two cycles to clear objects with finalizers. Also, the JVM may exit before finalize is ever called.

Explicit GC: System.gc() can force major collections when not needed

Other important considerations
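
To see how full the permanent generation (or any other memory pool) is before raising its limit, the standard `java.lang.management` API can be queried from inside the application. This is a minimal sketch; the class name `MemoryPools` is made up, and which pools appear depends on the JVM version and collector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Sketch: report used and maximum size of every memory pool in this JVM.
final class MemoryPools {

    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) continue; // pool is no longer valid
            // getMax() returns -1 when no maximum is defined for the pool.
            lines.add(pool.getName() + ": used=" + usage.getUsed() + " max=" + usage.getMax());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```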

Page 82: Java Performance Monitoring & Tuning

Java Performance Tuning

JVM Monitoring

Summary

Monitoring includes

GC Monitoring - Look for gc pauses, throughput and footprint.

Threads Monitoring - Look for deadlocks, starvation.

Method Profiling - Look for hot spots

Object Creation - Look for memory leaks

Summary
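
For the thread monitoring item above, deadlocks can also be detected programmatically. The sketch below uses the standard `ThreadMXBean`; the class name `DeadlockCheck` is made up, and a profiler or thread dump gives the same information interactively.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Sketch: ask the JVM whether any threads are currently deadlocked.
final class DeadlockCheck {

    /** Number of threads involved in a deadlock, 0 if there is none. */
    static int deadlockedThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no deadlock is found
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads();
        if (ids == null) {
            System.out.println("No deadlocked threads");
            return;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info == null) continue; // thread ended between the two calls
            System.out.println(info.getThreadName() + " is waiting on " + info.getLockName());
        }
    }
}
```

A check like this can run periodically in a background thread and raise an alert, which catches deadlocks that only show up under production load.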

Page 83: Java Performance Monitoring & Tuning

A big Thank You

Still, this is not so much about me as about the countless other developers who have helped perfect my craft by sharing their experience with me

www.mslearningandconsulting.com
