tomcatx-troubleshooting-production


Page 1: tomcatx-troubleshooting-production

Copyright 2007 SpringSource. Copying, publishing or distributing without express written permission is prohibited.

Tomcat Expert Series

Troubleshooting in Production

Filip Hanik SpringSource

2009

Page 2: tomcatx-troubleshooting-production

2

Topics in this Session

• Brief overview of JVM memory layout

• Understanding Out Of Memory Errors
  – Causes
  – Solutions

• Error logs and stack traces

• The tale of the thread dump

Page 3: tomcatx-troubleshooting-production

3

Topics in this Session

• Using OS utilities to narrow down the problem

• JMX – what it can do for you

Page 4: tomcatx-troubleshooting-production

4

The JVM process heap

Diagram: OS memory (RAM) holds the process heap of the java/java.exe process; inside that process heap, the Java Object Heap sits alongside everything else.

Page 5: tomcatx-troubleshooting-production

5

Storing data in memory

• JVM manages the process heap (in most cases)
  – JNI managed memory would be an exception, and there are others

• No shared memory between processes
  – At least not available through the Java API

• JVM creates a Java Heap
  – Part of the process heap

• Configured through –Xmx and –Xms settings
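As a concrete illustration of those two flags, a minimal sketch, assuming the usual $CATALINA_BASE/bin/setenv.sh hook and example sizes of 512m/1024m:

# Start the Java Object Heap at 512 MB and let it grow to at most 1 GB
CATALINA_OPTS="$CATALINA_OPTS -Xms512m -Xmx1024m"
export CATALINA_OPTS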

Page 6: tomcatx-troubleshooting-production

6

JVM Process Heap

• Maximum size is limited
  – 32 bit, roughly 2GB
  – 32 bit JVM on 64 bit OS, roughly 3.7GB
  – 64 bit, much much larger

• If 2GB is the max for the process
  – -Xmx1800m –Xms1800m – not very good
  – Leaves no room for anything else

Page 7: tomcatx-troubleshooting-production

7

Gotcha #1

• -Xmx and -Xms
  – Only controls the Java Object Heap

• Often misunderstood to control the process heap

• Confusion leads to incorrect tuning
  – And in some cases, the situation worsens

Page 8: tomcatx-troubleshooting-production

8

Java Object Heap Allocation

• Aggressive Heap Allocation

• -XX:MinHeapFreeRatio=
  – Default is 40 (40%)
  – When the JVM allocates memory, it allocates enough to get 40% free

• -XX:MaxHeapFreeRatio=
  – Default 70%
  – To give back memory when a majority of it is not used

• Not important when –Xms == -Xmx

Page 9: tomcatx-troubleshooting-production

9

Java Object Heap

Diagram: the Java Object Heap (-Xmx/-Xms) is divided into a Young Generation and an Old Generation. A good size for the YG is 33% of the total heap.

Page 10: tomcatx-troubleshooting-production

10

Java Object Heap

• Young Generation
  – All new objects are created here
  – Only moved to Old Gen if they survive one or more minor GC

• Sized using
  – -Xmn – fixed value
  – -XX:NewRatio=<value> – dynamic sizing
  – -XX:MaxNewSize/-XX:NewSize – similar to -Xmx/-Xms

• Survivor Spaces
  – 2, used during the GC algorithm (minor collections)
  – Mainly to alleviate fragmentation
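Tying this back to the 33% guideline on the previous slide, a hedged sketch of two equivalent ways to size the young generation (the 900m heap is an example value, not a recommendation):

# Fixed young generation: 300m is one third of a 900m heap
CATALINA_OPTS="$CATALINA_OPTS -Xms900m -Xmx900m -Xmn300m"

# Or dynamic sizing: -XX:NewRatio=2 means old:young = 2:1,
# i.e. the young generation gets roughly one third of the heap
CATALINA_OPTS="$CATALINA_OPTS -Xms900m -Xmx900m -XX:NewRatio=2"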

Page 11: tomcatx-troubleshooting-production

11

Young Generation

Diagram: new objects land in the Eden Space (2Mb default); two Survivor Spaces, "From" and "To" (64Kb default each), are used during minor collections. -XX:SurvivorRatio sets the Eden/survivor split and -XX:NewRatio sets the young generation size.

Page 12: tomcatx-troubleshooting-production

12

Old Generation

Diagram: the Old Generation (Tenured Space), 5Mb min / 44Mb max by default. The garbage collection section will explain in detail how these spaces are used during the GC process.

Page 13: tomcatx-troubleshooting-production

13

Java Heap Space

• java.lang.OutOfMemoryError: Java heap space
  – Most common out of memory error
  – Caused by too many objects in the Java heap

• Solution
  – Increase -Xmx if possible
  – Add -XX:+HeapDumpOnOutOfMemoryError
  – Fix memory leak if application is consuming more memory than expected

• Side effects
  – Increasing -Xmx can have other side effects
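A hedged sketch of those solution flags in setenv.sh (the 1024m value and the dump directory are examples only; -XX:HeapDumpPath is a standard HotSpot option, but check it against your JVM version):

# Cap the heap and write an .hprof dump to the logs directory when the OOM hits
CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m -XX:+HeapDumpOnOutOfMemoryError"
CATALINA_OPTS="$CATALINA_OPTS -XX:HeapDumpPath=$CATALINA_BASE/logs"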

Page 14: tomcatx-troubleshooting-production

14

JVM Process Heap

• Yes, there is more...
  – Permanent Space
  – Code Generation
  – Socket Buffers
  – Thread stacks
  – Direct Memory Space
  – JNI Code
  – Garbage Collection
  – JNI allocated memory

Page 15: tomcatx-troubleshooting-production

15

Permanent Space

• Permanent Generation
  – Permanent Space (name for it)
  – 4Mb initial, 64Mb max

• Stores classes, methods and other meta data
  – -XX:PermSize=<value> (initial)
  – -XX:MaxPermSize=<value> (max)

• Common OOM for webapp reloads
  – Separate space for pre-historic reasons
  – Early days of Java, class GC was not common, reduces size of the Java Heap
  – Does not exist in IBM JVM
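For illustration, a hedged sketch of raising the permanent generation limits for an application that loads a lot of classes (128m/256m are example values, not recommendations):

# Initial and maximum permanent generation size
CATALINA_OPTS="$CATALINA_OPTS -XX:PermSize=128m -XX:MaxPermSize=256m"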

Page 16: tomcatx-troubleshooting-production

16

Permanent Space

• Permanent Space Memory Errors
  – Too many classes loaded
  – Classes not unloaded/not being GC:ed
  – Unaffected by –Xmx flag

• java.lang.OutOfMemoryError: PermGen space
  – In many situations, increasing max perm size will help
  – i.e. no leak, just not enough memory
  – Others will require fixing the leak

Page 17: tomcatx-troubleshooting-production

17

Socket Buffers

• Each connection contains two buffers
  – Receive buffer ~37k
  – Send buffer ~25k

• Configured in Java code
  – Default and max limits set in kernel

• Very common tuning parameter for content delivery

• You usually hit limits other than memory before you run out of memory
  – IOException: Too many open files (for example)
  – SocketException: No buffer space available
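As an illustration of where the kernel-side limits live, a sketch for Linux (the sysctl names are Linux-specific, and the 256 KB value is only an example):

# Inspect the default and maximum socket buffer sizes (bytes)
sysctl net.core.rmem_default net.core.rmem_max
sysctl net.core.wmem_default net.core.wmem_max

# Example: raise the maximum receive and send buffer sizes to 256 KB
sysctl -w net.core.rmem_max=262144
sysctl -w net.core.wmem_max=262144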

Page 18: tomcatx-troubleshooting-production

18

Thread Stacks

• Each thread has a separate memory space called “thread stack”

• Configured by –Xss

• Default value depends on OS/JVM

• As the number of threads increases, memory usage increases
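A hedged back-of-the-envelope sketch of that trade-off on a 32-bit system (all numbers are illustrative, not measurements):

# Example stack size of 256 KB per thread
CATALINA_OPTS="$CATALINA_OPTS -Xss256k"

# Rough arithmetic: with -Xmx1500m inside a ~2GB process, only ~500 MB
# remains for permgen, code, socket buffers and thread stacks;
# at 256 KB per stack that bounds the thread count well below ~2000.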

Page 19: tomcatx-troubleshooting-production

19

Thread Stacks

• java.lang.OutOfMemoryError: unable to create new native thread

• Solution
  – Decrease –Xmx (or other space) and/or
  – Decrease –Xss
  – Or, you have a thread leak, fix the program

• Gotcha
  – Increasing –Xmx (32bit systems) will leave less room for threads if it is being used, hence the opposite of the solution
  – Too low an –Xss value can cause java.lang.StackOverflowError

• Thread dump will lead to an instant answer
  – Returns all the threads

Page 20: tomcatx-troubleshooting-production

20

Garbage Collection

• However, if there is excessive GC

• java.lang.OutOfMemoryError: GC overhead limit exceeded
  – 98% of the time is spent in GC
  – less than 2% of the heap is recovered

• To disable
  – -XX:-UseGCOverheadLimit

• To enable
  – -XX:+UseGCOverheadLimit

Page 21: tomcatx-troubleshooting-production

21

GC: How It Works

Diagram: Eden Space, two Survivor Spaces ("From" and "To"), and the Tenured Space.

1. New object is created
2. When Eden is full – minor collection
3. Copy surviving objects into 1st survivor space
4. Next time Eden is full
   – Copy from Eden to 2nd
   – Copy from 1st to 2nd
5. If 2nd fills and objects remain in Eden or 1st, these get copied to the tenured space

Page 22: tomcatx-troubleshooting-production

22

GC: Debugging it

• -Xloggc:%CATALINA_BASE%\logs\gc.log

• -XX:+PrintGCDetails

• -XX:+PrintGC

• -XX:+PrintGCApplicationStoppedTime

• -XX:+PrintGCTimeStamps

• -XX:+PrintHeapAtGC
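Putting the flags together, a hedged sketch for a Unix-style setenv.sh (the slide shows the Windows %CATALINA_BASE% form; the path and the flag selection are examples):

# Log every collection with timestamps, details and pause times
CATALINA_OPTS="$CATALINA_OPTS -Xloggc:$CATALINA_BASE/logs/gc.log"
CATALINA_OPTS="$CATALINA_OPTS -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
CATALINA_OPTS="$CATALINA_OPTS -XX:+PrintGCApplicationStoppedTime"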

Page 23: tomcatx-troubleshooting-production

23

Troubleshooting steps

• What are the symptoms

• How does the problem manifest itself

• How can I dig into the problem

Page 24: tomcatx-troubleshooting-production

24

Tomcat Logs

• Tomcat is really good at logging

• It doesn’t spit out info you don’t need to know

• When an error happens it can, however, generate tons of log entries

• Common IO exceptions are swallowed
  – Normal TCP behavior

Page 25: tomcatx-troubleshooting-production

25

Tomcat Logs

• Unexpected errors are always logged
  – Application and container errors

• Logs are always a resource

• Log entries are categorized
  – DEBUG
  – INFO
  – WARNING
  – SEVERE
  – FATAL

Page 26: tomcatx-troubleshooting-production

26

Tomcat Logs

• INFO – no error, just information given to you

• WARNING – you might care a little bit

• SEVERE – yes, now you got an error

• FATAL – whatever this is, it can’t be good!

• Always pay attention to the log level

Page 27: tomcatx-troubleshooting-production

27

Tomcat Logs

• So what do I need to look at

• SEVERE – yes, now you got an error

• It’s easy to ‘grep’ logs for these entries
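For example (log file names and dates are illustrative):

# List every SEVERE entry, with one line of context, across the catalina logs
grep -n -A 1 "SEVERE" $CATALINA_BASE/logs/catalina.*.log

# Count how often each level shows up in a given day's log
grep -o -E "SEVERE|WARNING|INFO" $CATALINA_BASE/logs/catalina.2008-03-28.log | sort | uniq -c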

Page 28: tomcatx-troubleshooting-production

28

Tomcat Logs

• Very basic error

• So can Tomcat log application errors?
  – Only if the application doesn’t ‘trap’ the error will Tomcat catch it and spit out some info

• Found in catalina.2008-03-28.log
  – Uncaught application exception

// An uncaught application error could look like this
SEVERE: Servlet.service() for servlet jsp threw exception
java.lang.NullPointerException
    at org.apache.jsp.npe_jsp._jspService(npe_jsp.java:55)

Page 29: tomcatx-troubleshooting-production

29

Java Stack Traces

// An uncaught application error could look like this
SEVERE: Servlet.service() for servlet jsp threw exception
java.lang.NullPointerException
    at org.apache.jsp.npe_jsp._jspService(npe_jsp.java:55)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:374)
    at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:337)
    at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.valves.RequestDumperValve.invoke(RequestDumperValve.java:151)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:595)

• Shows the code execution path up until the error happened

Page 30: tomcatx-troubleshooting-production

30

Java Stack Traces

// An uncaught application error could look like this
SEVERE: Servlet.service() for servlet jsp threw exception
java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.jsp.npe_jsp._jspService(npe_jsp.java:59)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
    …
Caused by: java.lang.NullPointerException
    at org.apache.jsp.npe_jsp._jspService(npe_jsp.java:57)
    ... 19 more

• Traces can be chained; only the root cause is the real error

• Always find the top most error and the bottom most root cause

Page 31: tomcatx-troubleshooting-production

31

Tomcat Logs

• If the logs are not yielding any information
  – Gather more information about the error

• Can you reproduce it

• Is it happening on your server
  – Trace the request down

Page 32: tomcatx-troubleshooting-production

32

Viewing Requests

• Access logs can help
  – They record every request and its response code
  – You can print out headers, cookies, request and session information

• Often very useful to see how traffic is flowing

• When using httpd in front
  – httpd access log combined with Tomcat access log
  – Excellent way to consolidate requests
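As a sketch of mining the access log, assuming the AccessLogValve's common log pattern and its default file name (adjust the field numbers to your own pattern):

# Requests that returned a 5xx status: timestamp, URI and status code
awk '$9 ~ /^5/ {print $4, $7, $9}' $CATALINA_BASE/logs/localhost_access_log.*.txt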

Page 33: tomcatx-troubleshooting-production

33

Seeing traffic

• When using httpd in front
  – httpd has a mod_dumpio module
  – Prints out everything one wants to know
  – Useful when privileges for a sniffer are not present

• RequestDumper Valve
  – Valve that spits out everything, similar to mod_dumpio
  – Poorly designed, it breaks the output into multiple log statements
  – Not recommended

Page 34: tomcatx-troubleshooting-production

34

Seeing traffic

• Network sniffers (client/server)
  – Nothing compares to getting exact data
  – Wireshark, Ethereal, tcpdump, etc

• Many choices, just pick one
  – Often one is already installed
  – Requires root privileges

• Client side visualizers
  – MS Fiddler is an excellent tool for Windows users
  – Firefox Firebug
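For instance, a hedged server-side capture sketch (the interface name and connector port 8080 are examples; requires root):

# Capture full packets to/from the Tomcat connector into a file
# that Wireshark can open later
tcpdump -i eth0 -s 0 -w /tmp/tomcat-traffic.pcap tcp port 8080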

Page 35: tomcatx-troubleshooting-production

35

Seeing traffic

• Ability to map
  – Request to error
  – Error to a time frame
  – Error to a client

• Traffic pattern can
  – Help you reproduce the error
  – Resolve the issue faster

Page 36: tomcatx-troubleshooting-production

36

Thread dumps

• Displays the state of all threads in a virtual machine

• Provides plenty of information about activity and any dead locks

• Provides a trace from where each thread started to its current point in execution

Page 37: tomcatx-troubleshooting-production

37

Thread dumps

• On Unix -> kill -3 <tomcat pid>
  – jstack -l also works
  – On Solaris, powerful pstack utility

• On Windows Ctrl+Break
  – JDK 1.6+ you have jstack to help

• Tanuki Wrapper
  – telnet <host:port> D

• Thread dump is printed to stdout
  – Or wherever stdout is redirected to
  – Don't send it to /dev/null !
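For example, on a Unix-like system (the process lookup and output paths are illustrative; with the standard scripts, stdout usually ends up in catalina.out):

# Find the Tomcat process id and ask the JVM for a thread dump
PID=$(ps -ef | grep '[o]rg.apache.catalina.startup.Bootstrap' | awk '{print $2}')
kill -3 $PID                                  # dump goes to stdout / catalina.out

# With JDK 1.6+, jstack (-l adds lock information) writes it to your terminal
jstack -l $PID > /tmp/threaddump-$(date +%H%M%S).txt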

Page 38: tomcatx-troubleshooting-production

38

The tale of Thread dumps

Heap
 def new generation   total 9088K, used 5497K [0x04070000, 0x04a40000, 0x067d0000)
  eden space 8128K,  60% used [0x04070000, 0x045402a0, 0x04860000)
  from space 960K,   59% used [0x04950000, 0x049de428, 0x04a40000)
  to   space 960K,    0% used [0x04860000, 0x04860000, 0x04950000)
 tenured generation   total 121024K, used 1656K [0x067d0000, 0x0de00000, 0x24070000)
   the space 121024K,  1% used [0x067d0000, 0x0696e068, 0x0696e200, 0x0de00000)
 compacting perm gen  total 12288K, used 4482K [0x24070000, 0x24c70000, 0x2c070000)
   the space 12288K,  36% used [0x24070000, 0x244d0920, 0x244d0a00, 0x24c70000)
    ro space 8192K,   66% used [0x2c070000, 0x2c5bd978, 0x2c5bda00, 0x2c870000)
    rw space 12288K,  52% used [0x2c870000, 0x2ceb9cb8, 0x2ceb9e00, 0x2d470000)

• Alternate way to dump
  – jstack – tool that comes with the JDK
  – Use the -l option with jstack to get lock information

• More than just threads

• JDK 1.6+ also prints out memory stats

Page 39: tomcatx-troubleshooting-production

39

The tale of Thread dumps

Found one Java-level deadlock:
=============================
"pool-2-thread-6":
  waiting for ownable synchronizer 0x482eaeb0,
  (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
  which is held by "http-8081-2"
"http-8081-2":
  waiting to lock monitor 0x08143b14 (object 0x482eade0,
  a org.apache.catalina.ha.session.DeltaRequest),
  which is held by "pool-2-thread-6"

• More than just threads

• Dead lock detection

Page 40: tomcatx-troubleshooting-production

40

The tale of Thread dumps

• Thread information

• First line for each thread contains critical info

"http-8082-7"               thread name
daemon                      type of thread [daemon, empty means non daemon]
prio=10                     thread priority [1..10]
tid=0x09142000              C++ pointer to JVM OSThread object
nid=0x67c                   kernel thread identifier
in Object.wait()            what the thread is doing
[0xba3ff000..0xba3ff4e0]    address space

Page 41: tomcatx-troubleshooting-production

41

The tale of Thread dumps

nid=0x67c – kernel thread identifier

• Thread information
  – CPU usage very high, caused by a spinning thread, but which one?
  – On Linux for example, one can get CPU usage per thread

• Lists all threads that run inside a java process with a CPU usage higher than 0.0%:

ps -eL -o pid,%cpu,lwp | grep -i `ps -ef | grep -v grep | grep java | awk '{print $2}'` | grep -v 0.0
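To tie the two together: the nid in the thread dump is the hexadecimal form of the LWP id that ps prints in decimal. A small sketch (the LWP value and dump file are examples):

# Suppose the one-liner shows LWP 1660 burning CPU; convert it to hex
printf 'nid=0x%x\n' 1660          # prints nid=0x67c

# ...then find that thread in the dump
grep "nid=0x67c" /tmp/threaddump.txt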

Page 42: tomcatx-troubleshooting-production

42

The tale of Thread dumps

• Thread dumps can help you identify

• What threads are waiting for a lock

• Dead locks

• Spinning threads, and what code they belong to

• Memory usage

• Cause of an unresponsive JVM

Page 43: tomcatx-troubleshooting-production

43

The tale of Thread dumps

• When taking thread dumps ALWAYS take two or more dumps

• This will help you see if threads are changing execution path

• Single thread dump can cause a lot of “false positives”, where you think a thread is stuck but it’s not

Page 44: tomcatx-troubleshooting-production

44

The tale of Thread dumps

• When taking thread dumps ALWAYS take two or more dumps

• Sometimes when an OS is overloaded, maybe swapping, threads move, but very slowly

• If you didn't have two dumps, you'd never know that!

Page 45: tomcatx-troubleshooting-production

45

The tale of Thread dumps

• Examples of thread dumps and GC logs
  – Stuck but not dead locked (jvm-locked.txt)
  – What went wrong? (five-thread-dumps.txt)
  – Memory leak confirmed (gc.log/gc-logs-explained.txt)
  – Excessive GC (gc-excessive-object.creation.txt)

Page 46: tomcatx-troubleshooting-production

46

OS utilities

• Troubleshooting a Tomcat server
  – Often involves more than just Tomcat and JVM troubleshooting
  – OS utilities come in very handy

• Tomcat logs internal errors
  – If they spawned an exception
  – Container bugs are hard to troubleshoot, but rare in frequency

Page 47: tomcatx-troubleshooting-production

47

OS utilities

• File descriptors are a common problem
  – Most OSes have a limit on FDs
  – An FD can be an open file or a socket

• File descriptor leaks are also very common in web applications
  – java.io.IOException: Too many open files

• Use utilities to track it down

Page 48: tomcatx-troubleshooting-production

48

OS utilities

• File descriptors
  – Sockets
  – Open files

• On Linux – lsof –p <process id>
  – Will list all open FDs
  – This will often put you on the right path to what is going wrong
  – Solaris – pfiles
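A quick hedged sketch of chasing a descriptor leak on Linux (the PID is an example; pfiles <pid> is the rough Solaris equivalent):

# Per-process limit on open file descriptors
ulimit -n

# Count the open descriptors of the Tomcat process, then break them down by type
lsof -p 12345 | wc -l
lsof -p 12345 | awk '{print $5}' | sort | uniq -c | sort -rn    # REG, IPv4, FIFO, ...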

Page 49: tomcatx-troubleshooting-production

49

OS utilities

• Active Connections
  – Can be HTTP/AJP connections
  – Useful to track database or other connections

• netstat is an excellent tool to view sockets and their current state

• On Unix
  – Able to track socket buffers and their usage
  – Send buffer filling up: slow client or bad network
  – Receive buffer filling up: application is not reading fast enough
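For example (port 8080 is an example connector port; column layout varies a little between netstat versions):

# Recv-Q and Send-Q show the per-socket buffer backlog in bytes
netstat -tan | grep :8080

# Count connections to the connector by TCP state (ESTABLISHED, TIME_WAIT, ...)
netstat -tan | grep :8080 | awk '{print $6}' | sort | uniq -c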

Page 50: tomcatx-troubleshooting-production

50

OS utilities

• Linux
  – nmon
  – Very nice system stats collector
  – CPU, memory, disk, network and more

• Windows
  – Expand your task manager
  – It reports as much as you need (thread counts, IO activity, virtual vs resident memory)

Page 51: tomcatx-troubleshooting-production

51

JMX

• Both Tomcat and the JVM
  – Make information available through JMX

• jconsole
  – Utility that comes with the JDK
  – Lets you attach to a JVM and get information

• JVM or application inoperable
  – JMX may not report accurately
  – Don't rely on it exclusively
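A hedged sketch of exposing JMX so jconsole can attach remotely (port 9010 is an example; authentication and SSL are disabled here purely for illustration, which is not something to do on an exposed network):

# Standard com.sun.management properties understood by the Sun JVM
CATALINA_OPTS="$CATALINA_OPTS -Dcom.sun.management.jmxremote"
CATALINA_OPTS="$CATALINA_OPTS -Dcom.sun.management.jmxremote.port=9010"
CATALINA_OPTS="$CATALINA_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
CATALINA_OPTS="$CATALINA_OPTS -Dcom.sun.management.jmxremote.ssl=false"

# Then, from a workstation:
jconsole <host>:9010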

Page 52: tomcatx-troubleshooting-production

52

So far…

• We’ve covered tons of options to troubleshoot your systems
  – Without actually adding any software
  – This is usually a requirement for production troubleshooting

• There are additional options available
  – Such as profilers and monitoring applications

Page 53: tomcatx-troubleshooting-production

53

Profilers

• Come in all shapes and sizes (and prices)

• My preference
  – www.yourkit.com
  – Very biased, as a Tomcat developer, I get it for free

• But it has many advantages such as:
  – Inexpensive
  – Works fairly well in production environments
  – Excellent support
  – Great tool to use

Page 54: tomcatx-troubleshooting-production

54

Profilers

• Only a last resort for production
  – When you are unable to reproduce in UAT/QA/etc
  – But sometimes required to solve the problem

Page 55: tomcatx-troubleshooting-production

55

Summary

• Gather all the information first
  – Even if you think you don't need it
  – Create a check list
    • Thread dumps
    • Logs
    • Configuration files
    • OS statistics
    • etc

• Chasing a problem without all that, you can easily miss something
  – And you'll chase a needle in a haystack
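As one way to turn that check list into a habit, a hedged collection-script sketch (the paths, the process lookup and the file names are all examples to adapt):

#!/bin/sh
# Grab a snapshot of the evidence before anyone restarts anything
OUT=/tmp/tomcat-diag-$(date +%Y%m%d-%H%M%S)
mkdir -p $OUT
PID=$(ps -ef | grep '[o]rg.apache.catalina.startup.Bootstrap' | awk '{print $2}')

jstack -l $PID > $OUT/threads-1.txt; sleep 10
jstack -l $PID > $OUT/threads-2.txt            # always take at least two dumps
cp $CATALINA_BASE/logs/catalina.*.log $CATALINA_BASE/logs/gc.log $OUT/ 2>/dev/null
cp $CATALINA_BASE/conf/server.xml $OUT/
netstat -tan > $OUT/netstat.txt
lsof -p $PID > $OUT/lsof.txt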