Take Your Oracle WebLogic Applications to The
Next Level with Oracle Enterprise Manager 12c
Mojahedul Hoque Abul Hasanat CTO, Therap Services
Neelima Bawa Consulting Tech. Lead, SCP, EM, Oracle
Agenda
• Background of Therap Services
• The problem
• Application Performance Management
• Quick description of Oracle’s APM offering
• How OEM and RUEI helped us
• Actual scenarios
• The future
• Tips
• Q&A
Therap Services - OOW 2013 2
Therap Services, LLC
• Documentation and Communication Software for MR/DD
• Closest description: an EHR for the DD industry
• Niche segment in the health sector
• Improve quality of life for people with DD by improving efficiency of delivery through communication
• SaaS business model
• 150K+ active users
• 1000+ providers in 48 states
• State customers
• Extensive usage for DD in DHS ND and DHHS NE
• 150+ employees
• Based in CT, dev center in Bangladesh
• http://www.therapservices.net
The Application
• The application is our business
• 1M+ lines of code
• 60+ modules
• 1M+ sustained HTTP requests/hour
• 30K+ peak requests/minute
• 6000+ concurrent users
• Based on JEE and the Spring Framework
• Hibernate
• Seam
• Grails
Delivery Platform
• 2 identical sites in two states
• Primary hosts (per site):
• 4 WebLogic application servers in cluster
• 1 Memory based data server (in-house, java)
• 1 Oracle database server
• 1 NetApp storage (SAN)
• 1 F5 Load balancer
• Supporting hosts
• Use Dyn for site high availability
• Data replication with Oracle GoldenGate
What Matters
• Availability
• Application is used 24x7
• Application use is critical to the business of our customers
• Performance
• A user needs to spend as little time as possible in our application
• Most users use it daily, multiple times
• Data integrity
• Fast development turnaround
Evolution of Therap
• Improved testing
• Formal code review
• Improved processes
• Removed repeating problems
• Now all problems we face are new
• Acquired large customers
• Availability and performance have become critical factors
Before OEM & RUEI
• Heavy use of logging
• Nagios
• Cacti
• kill -3
The Problem
• Diagnosis of performance issues
• Has become much harder with the growing system
• Application availability
• With a larger customer base, uptime has become a major factor
• Complexity of the system increases difficulty
• Limits of logging
• Works for known unknowns
• Need infrastructure to visualize and store historic data
• Limits of OS based monitoring
• Limited metrics
• Limits of simple JMX monitoring
Application Performance Management
• Deep insight into running application
• Profiling at runtime
• Some bottlenecks are only visible at runtime
• Historic data
• Invaluable for preventing performance regression
Which Vendor?
• We were moving from JBoss to WebLogic
• JVM Diagnostics
• Extensive WebLogic metrics
• Probably the best database diagnostics
• Our team was already familiar with OEM
• Deep integration with the database
• Integration of JVMD and app server metrics
• Expect better support from Oracle
Oracle’s APM
• Oracle Enterprise Manager 12c
• WebLogic Metrics
• Middleware Diagnostics Advisor
• JVM Diagnostics
• Configuration Management
• Incident Management
• Lifecycle Management
• Oracle Real User Experience Insight
• Oracle Business Transaction Management
WebLogic Metrics
• Pro-active monitoring
• Helps us in avoiding downtime
• Correlation between various metrics
• Middleware Diagnostics Advisor
JVM Diagnostics
• Deep insight into the JVM
• Invaluable for understanding application performance issues
• Helped us in identifying log4j bottleneck
• Early identification of performance problems
Oracle Real User Experience Insight
• Measure performance seen from the customer end
• Detect performance regression
• Enables shorter release cycles
• Quick and real feedback for performance tuning operations
Our Journey
Timeline
• Evaluation of various vendors – July 2012
• Purchase of WebLogic, OEM, RUEI – Nov 2012
• Start JBoss to WebLogic migration
• Start building expertise on OEM
• Start using OEM in test environment
• Fix problems found through OEM, JVMD
• Production deployment – Mar 2013
How OEM and RUEI helped us
The log4j bottleneck
• During load testing, we could not increase load beyond a certain point
• CPU load was low
• JVMD showed us something that we could hardly believe
• Many threads were contending for the lock on writes to the log file
• The contention only shows up at high loads
• Used JVMD heavily to find the best logging backend and the best configuration
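The contention pattern described in these bullets can be sketched with the Java standard library alone. This is an illustration, not Therap's code or log4j's implementation: the class names `SynchronizedLog` and `AsyncLog` are hypothetical, and the slides do not say which logging backend or configuration was finally chosen. The first logger makes every caller take one lock around the write; the second lets callers enqueue and leaves the writing to a single background thread, which is the general idea behind asynchronous appenders such as log4j's `AsyncAppender`.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

final class LoggingContentionSketch {

    /** Every caller takes the same lock, so busy request threads queue up behind each other. */
    static final class SynchronizedLog {
        private final Writer out;
        private int lines;

        SynchronizedLog(Writer out) { this.out = out; }

        synchronized void log(String msg) {
            try {
                out.write(msg);
                out.write('\n');
                lines++;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        synchronized int lines() { return lines; }
    }

    /** Callers only enqueue; one background thread owns the writer, so callers never wait on I/O. */
    static final class AsyncLog {
        private static final String STOP = new String("stop"); // unique sentinel, compared by identity
        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        private final Thread writerThread;
        private int lines; // touched only by writerThread, read by callers after join()

        AsyncLog(Writer out) {
            writerThread = new Thread(() -> {
                try {
                    for (String m = queue.take(); m != STOP; m = queue.take()) {
                        out.write(m);
                        out.write('\n');
                        lines++;
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "async-log-writer");
            writerThread.start();
        }

        void log(String msg) { queue.add(msg); }

        /** Drains the queue, stops the writer thread, and returns the number of lines written. */
        int closeAndCount() {
            queue.add(STOP);
            try {
                writerThread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return lines;
        }
    }

    /** Starts the given number of workers; each sends perThread messages to the sink. */
    static void runWorkers(int threads, int perThread, Consumer<String> sink) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) sink.accept("worker " + id + " message " + i);
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    static int runSynchronized(int threads, int perThread) {
        SynchronizedLog log = new SynchronizedLog(new StringWriter());
        runWorkers(threads, perThread, log::log);
        return log.lines();
    }

    static int runAsync(int threads, int perThread) {
        AsyncLog log = new AsyncLog(new StringWriter());
        runWorkers(threads, perThread, log::log);
        return log.closeAndCount();
    }
}
```

Both variants write the same number of lines; the difference is where threads wait. With a slow file or disk behind the writer, the synchronized variant shows up in a thread view as many threads blocked on one monitor while CPU stays low, which matches the symptom on this slide.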
EJB Transaction Optimization
• Noticed an abnormally high number of bean transaction commits
• We had forgotten to optimize some frequently used EJBs
• Read-only methods do not need to be transactional
Unexpected Top Method
• Noticed a JMS listener in the top method list
• In production!
• Did not show up during synthetic load testing
• We forgot to add a “message selector” on the listener
The MDA Catch
• MDA reported an unexpected “The EJB is taking too long to execute”
• Related method was showing in the top methods list
• There were extraneous calls to the EJB
• The method did not need to be in an EJB
The Slow Library
• A library call for producing JSON showed on the top method list
• JSON is needed for AJAX
• It was totally unexpected
• The library was old and inefficient
• Replaced it with a newer and more efficient library
Inefficient Network Write
• Initially discovered in production through JVMD
• There were instances of high network waits
• Methods of a certain module in the application showed up in the top list during the high network wait periods
• Discovered a three-level nested loop that writes data
• Further inspection through JProfiler confirmed it
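Why a nested loop of small writes hurts can be sketched as follows. The slides do not show the actual module or the fix that was applied, so everything here is an assumption for illustration: `CountingStream` is a hypothetical stand-in for a socket stream, and wrapping the stream in `BufferedOutputStream` is one common remedy, not necessarily the one Therap used.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class BufferedWriteSketch {

    /** Hypothetical stand-in for a socket stream: counts the write calls that reach it. */
    static final class CountingStream extends OutputStream {
        int writeCalls;
        long bytes;

        @Override public void write(int b) { writeCalls++; bytes++; }

        @Override public void write(byte[] b, int off, int len) { writeCalls++; bytes += len; }
    }

    /** Three nested loops, one small write per innermost iteration. */
    static void writeCells(OutputStream out, int a, int b, int c) {
        try {
            for (int i = 0; i < a; i++) {
                for (int j = 0; j < b; j++) {
                    for (int k = 0; k < c; k++) {
                        out.write((i + "," + j + "," + k + "\n").getBytes(StandardCharsets.US_ASCII));
                    }
                }
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Write calls reaching the underlying stream when nothing buffers them. */
    static int unbufferedCalls(int a, int b, int c) {
        CountingStream sink = new CountingStream();
        writeCells(sink, a, b, c);
        return sink.writeCalls;
    }

    /** Same loop, but small writes are collected in an 8 KiB buffer first. */
    static int bufferedCalls(int a, int b, int c) {
        CountingStream sink = new CountingStream();
        writeCells(new BufferedOutputStream(sink, 8192), a, b, c);
        return sink.writeCalls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered write calls: " + unbufferedCalls(10, 10, 10));
        System.out.println("buffered write calls:   " + bufferedCalls(10, 10, 10));
    }
}
```

On a real socket each unbuffered call can become its own system call and possibly its own packet, so the number of calls reaching the stream is a reasonable proxy for the network wait a thread sampler would report.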
Automatic Thread Snapshots
• Previously, relied on kill -3
• Manual, missed dumps at crucial moments
• Now, JVMD takes thread snapshots when an abnormal thread state is reached on any WebLogic server
• Combined with auto-restart from WebLogic, eliminated unplanned downtime
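For readers without JVMD, the idea of capturing a dump automatically instead of running kill -3 by hand can be approximated in-process with the standard `java.lang.management` API. This sketch is not how JVMD works internally; the class name, the "abnormal" rule (a count of BLOCKED threads), and the threshold are all assumptions chosen for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Optional;

final class ThreadSnapshotSketch {

    /** Captures stack and lock information for every live thread in this JVM. */
    static ThreadInfo[] captureAll() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only request lock details the running JVM says it can provide.
        return mx.dumpAllThreads(mx.isObjectMonitorUsageSupported(),
                                 mx.isSynchronizerUsageSupported());
    }

    /** Counts threads that are currently blocked waiting for a monitor. */
    static long countBlocked(ThreadInfo[] infos) {
        long blocked = 0;
        for (ThreadInfo info : infos) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) blocked++;
        }
        return blocked;
    }

    /**
     * Returns a textual dump when at least blockedThreshold threads are BLOCKED,
     * otherwise an empty Optional. The rule is a hypothetical stand-in for
     * whatever "abnormal thread state" means in a given deployment.
     */
    static Optional<String> snapshotIfAbnormal(long blockedThreshold) {
        ThreadInfo[] infos = captureAll();
        if (countBlocked(infos) < blockedThreshold) return Optional.empty();
        StringBuilder dump = new StringBuilder();
        for (ThreadInfo info : infos) {
            if (info != null) dump.append(info); // ThreadInfo.toString() truncates deep stacks
        }
        return Optional.of(dump.toString());
    }

    public static void main(String[] args) {
        // A threshold of 0 always triggers, which is handy for trying the sketch out.
        snapshotIfAbnormal(0).ifPresent(System.out::println);
    }
}
```

A scheduled task could call `snapshotIfAbnormal` every few seconds and write any result to disk, so a dump exists for the moment the problem occurred rather than for the moment someone noticed it.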
Compliance
• Helps us identify patches needed for:
• WebLogic
• Oracle Database
• OEM
• Downtime log
• Useful for tracking operations improvement
The JDK Upgrade
• Upgraded JDK from 1.6.0_29 to 1.6.0_45
• At that time, we had not yet brought RUEI into our regular operations process
• Started getting slowness complaints after a few days
• A look into RUEI instantly revealed the performance regression
• Downgrading the JDK fixed the regression completely
The JDK Upgrade…
• We could not reproduce the performance regression in the lab environment
• We now have a new procedure to upgrade JDK
• Do normal tests and load tests as before
• In production, upgrade the JDK of one server only
• Wait a few days
• Compare performance
• Decide whether to upgrade or rollback
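The "compare performance, then decide" step of this procedure can be sketched as a small comparison of response-time samples from the one upgraded server against the servers still on the old JDK. How the samples are obtained (for example, exported from RUEI) is not modeled here; the median statistic, the tolerance parameter, and all names are assumptions for illustration, not the rule Therap applies.

```java
import java.util.Arrays;

final class CanaryComparisonSketch {

    enum Decision { ROLL_OUT, ROLL_BACK }

    /** Median of the samples; the input array is not modified. */
    static double median(double[] samples) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    /**
     * Rolls out when the canary's median response time is within the allowed
     * fractional slowdown (hypothetical tolerance, e.g. 0.10 for 10%) of the baseline.
     */
    static Decision decide(double[] baselineMs, double[] canaryMs, double tolerance) {
        double baseline = median(baselineMs);
        double canary = median(canaryMs);
        return canary <= baseline * (1.0 + tolerance) ? Decision.ROLL_OUT : Decision.ROLL_BACK;
    }

    public static void main(String[] args) {
        double[] oldJdk = {100, 110, 120};   // made-up page load times in ms
        double[] newJdk = {130, 140, 150};   // made-up page load times in ms
        System.out.println(decide(oldJdk, newJdk, 0.10));
    }
}
```

Comparing servers that take the same production traffic over the same days sidesteps the problem noted above, that the regression could not be reproduced in the lab.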
The Results
• Only 1 unplanned downtime incident in the last 4 months!
• Improving DevOps culture
• Foster collaboration between dev, ops and DBA
• No more fighting between dev and DBA
• With RUEI, even the business team joins the fun
Most Importantly
I can do this!
Future with OEM and RUEI
Future
• Involve more people
• Use KPI in RUEI
• Integrate KPI with OEM by defining a Business Application
• Leverage alerting and incident management in OEM
Challenges
• Steep learning curve
• Took us time to understand the role of BTM and JRF
• Better documentation available now
• Lack of full WebLogic 12c support when we started
• Should be solved in latest release of OEM
• Support has been fanatical!
• Thank you Oracle!
Tips
• Understand what you need
• Any web app with performance requirements needs RUEI
• If you face performance issues, you will need JVMD
• Engage as many people in your team as possible
• Give wide access to many
• The learning curve
• The problems these tools help you with are also complex
Q&A
Other Sessions
• JVM Diagnostics: Java Profiling in Production Environments [CON9571]
• Thursday 2:00pm – 3:00pm @ Moscone North - 130
Contact
http://neelimabawa.blogspot.com
@Masum6