Autonomous Recovery in Componentized Internet Applications
Candea et al.
Vikram Negi
Introduction
• Autonomic Problem
• Approach
• Results
• Discussion
The Autonomic Problem
• Allow the application to recover automatically from transient and intermittent software failures.
The Approach
• Introduce the idea:
  – Macroanalysis (fault detection)
  – Microrebooting (rapid recovery)
  – External management (recovery action)
• Integrate and test with JBoss
Design Overview
• Autonomous process
  – Monitoring
    • Java probes
  – Fault detection
    • Generates anomaly report
  – Recovery
    • Takes action
• Total time to recovery
J2EE Review
• J2EE enterprise apps = collection of reusable Java modules
• JSPs / servlets invoke EJBs, which invoke other EJBs, ...
• EJB = Java component that complies with a certain interface and provides a service
• Deployment descriptor (per-bean XML file) conveys run-time characteristics and dependencies; used in deploying the application
JBoss Design
• Open-source J2EE app server
• Written entirely in Java
• Microkernel with components held together by JMX (management support)
JAGR = ROC-ified JBoss with Application-Generic Recovery
• 3-tier architecture
• Key components
  – Macroanalysis engine
  – Microrebooting hook
  – Recovery manager
Pinpoint : Detection and Localization
• Store observations
  – IP address of machine, timestamp
  – Globally unique request ID
  – Number of calls to and returns from EJBs
  – Association between sender and receiver
  – Collect SQL queries (updates, reads)
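As a rough illustration of the kind of record listed above, the sketch below models one observation and aggregates calls per sender/receiver pair. The field names, the `web-tier` sender, and the sample values are assumptions made for illustration; they are not Pinpoint's actual schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One logged event; field names are illustrative, not Pinpoint's schema."""
    request_id: str   # globally unique request ID
    host_ip: str      # IP address of the reporting machine
    timestamp: float  # seconds since the start of the trace
    sender: str
    receiver: str
    kind: str         # "call" or "return"


def call_counts(observations):
    """Count calls per (sender, receiver) pair, ignoring returns."""
    counts = Counter()
    for obs in observations:
        if obs.kind == "call":
            counts[(obs.sender, obs.receiver)] += 1
    return counts


# Hypothetical trace: two requests touching two session beans.
observations = [
    Observation("req-1", "10.0.0.5", 0.000, "web-tier", "SB_ViewItem", "call"),
    Observation("req-1", "10.0.0.5", 0.012, "SB_ViewItem", "web-tier", "return"),
    Observation("req-2", "10.0.0.5", 0.020, "web-tier", "SB_ViewItem", "call"),
    Observation("req-2", "10.0.0.5", 0.031, "web-tier", "SB_CommitBid", "call"),
]
counts = call_counts(observations)
```

Per-pair counts like these are the sort of input a later statistical comparison could consume.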
Pinpoint : Analysis
• Analysis engine
  – Centralized engine
  – Plugin-based architecture
• Modeling components
  – Assume that present component behavior and historical (normal) behavior follow the same probability distribution.
  – Chi-square (χ²) test to detect when the two distributions differ.
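The chi-square comparison can be sketched as a plain goodness-of-fit test. This is a minimal illustration, assuming the historical behavior is summarized as fractions of calls per callee; the sample numbers are invented, and the slides do not say how Pinpoint bins its data or picks a significance level.

```python
def chi_square_statistic(observed_counts, expected_fractions):
    """Pearson chi-square statistic of observed counts against expected fractions."""
    total = sum(observed_counts)
    statistic = 0.0
    for observed, fraction in zip(observed_counts, expected_fractions):
        expected = total * fraction
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Critical value of the chi-square distribution for 2 degrees of freedom
# at a 0.05 significance level (three categories minus one).
CRITICAL_VALUE_DF2 = 5.991

# Hypothetical historical call mix of one component across three callees.
historical_fractions = [0.5, 0.3, 0.2]

normal_window = [50, 30, 20]      # matches history exactly
suspicious_window = [20, 30, 50]  # call mix has shifted

normal_stat = chi_square_statistic(normal_window, historical_fractions)
suspicious_stat = chi_square_statistic(suspicious_window, historical_fractions)

normal_is_anomalous = normal_stat > CRITICAL_VALUE_DF2
suspicious_is_anomalous = suspicious_stat > CRITICAL_VALUE_DF2
```

With these invented numbers the first window is not flagged and the second one is, because its statistic lies far above the critical value.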
Recovery: microreboot is not expensive
• State segregation
  – Store important state outside the application, in a database.
  – Persistent state
    • CMP (container-managed persistence, J2EE) is a requirement for the prototype.
  – Session state
    • Stored in a modified SSM (external session state store)
• Containment and reintegration
  – Microreboot the transitive closure of all inter-EJB references
  – XML deployment descriptors determine the grouping for the closure
  – Complete reboot or microreboot
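The grouping step can be pictured as a graph traversal. The sketch below assumes the references have already been extracted from the deployment descriptors into a dictionary, and it treats a reference as linking both beans regardless of direction; both choices are assumptions, and the reference graph itself is hypothetical.

```python
from collections import deque


def recovery_group(references, faulty_bean):
    """Return every bean connected to faulty_bean through inter-EJB references."""
    # Build a symmetric adjacency map (assumption: direction is ignored).
    neighbors = {}
    for source, targets in references.items():
        for target in targets:
            neighbors.setdefault(source, set()).add(target)
            neighbors.setdefault(target, set()).add(source)

    # Breadth-first search collects the transitive closure.
    group = {faulty_bean}
    queue = deque([faulty_bean])
    while queue:
        bean = queue.popleft()
        for other in neighbors.get(bean, ()):
            if other not in group:
                group.add(other)
                queue.append(other)
    return group


# Hypothetical reference graph, as it might be read from deployment descriptors.
references = {
    "SB_CommitBid": ["Item", "User"],
    "SB_ViewItem": ["Item"],
    "SB_BrowseCategories": ["Category"],
}
group = recovery_group(references, "SB_ViewItem")
```

In this example a fault in `SB_ViewItem` pulls in `Item`, `SB_CommitBid`, and `User`, while `SB_BrowseCategories` and `Category` stay untouched.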
Recovery
• Enabling microreboot
  – Method in the JBoss EJB container
  – Preserve the class loader
Manage Recovery
• Recovery Policy
  – Read the failure report; consider components scoring above 1.0
  – Microreboot the top n components, or all components above 1.0
  – Allow a delay (~30 sec)
  – If the error is still present, retry a few times or reboot completely
  – Finally, report it to the sysadmin
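The policy above can be sketched as a small escalation loop. This is an illustration under stated assumptions: the callback names, the retry limit of three, and the report format (a dictionary of component name to score) are all hypothetical, and only the 1.0 threshold and the roughly 30-second delay come from the slide.

```python
def suspects(report, threshold=1.0):
    """Components scoring above the threshold, worst first."""
    flagged = [name for name, score in report.items() if score > threshold]
    return sorted(flagged, key=lambda name: report[name], reverse=True)


def run_policy(read_report, microreboot, full_reboot, notify_admin, wait,
               top_n=1, max_microreboots=3, delay_seconds=30):
    """Escalate from microreboots to a full reboot to a human; return the actions taken."""
    actions = []
    for _ in range(max_microreboots):
        flagged = suspects(read_report())
        if not flagged:
            return actions
        targets = flagged[:top_n]
        microreboot(targets)
        actions.append(("microreboot", tuple(targets)))
        wait(delay_seconds)  # give the system time to settle
    if suspects(read_report()):
        full_reboot()
        actions.append(("full_reboot", ()))
        wait(delay_seconds)
        if suspects(read_report()):
            notify_admin()
            actions.append(("notify_admin", ()))
    return actions


# Hypothetical scenario: one microreboot cures the faulty bean.
scores = {"SB_ViewItem": 2.4, "SB_CommitBid": 0.3}


def cure(targets):
    for name in targets:
        scores[name] = 0.1


log = run_policy(lambda: dict(scores), cure, lambda: None, lambda: None,
                 wait=lambda seconds: None)
```

Passing `wait` as a callback keeps the sketch testable without real delays; a deployment would sleep or poll instead.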
Evaluation Test Framework
• Applications
  – Petstore 1.1 (12 components, 233 Java files, 11K LOC)
  – Petstore 1.3.1 (47 components, 310 Java files, 10K LOC)
  – RUBiS (21 components, 500 Java files, 25K LOC)
• Workload
  – Client simulators implemented with a transition table
  – 350 clients (max utilization principle)
• Faultload
  – Based on industry experience
  – No low-level hardware or OS faults
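A transition-table client simulator can be pictured as a random walk over pages. The table below is entirely hypothetical (page names and probabilities are invented for illustration); the slides only say that the simulators use a transition table.

```python
import random

# Hypothetical transition table: current page -> [(next page, probability), ...].
TRANSITIONS = {
    "Home": [("BrowseCategories", 0.7), ("ViewItem", 0.3)],
    "BrowseCategories": [("ViewItem", 0.8), ("Home", 0.2)],
    "ViewItem": [("CommitBid", 0.4), ("BrowseCategories", 0.4), ("Home", 0.2)],
    "CommitBid": [("Home", 1.0)],
}


def simulate_session(start_page, steps, rng):
    """Walk the table for a fixed number of steps and return the visited pages."""
    page = start_page
    path = [page]
    for _ in range(steps):
        next_pages, weights = zip(*TRANSITIONS[page])
        page = rng.choices(next_pages, weights=weights, k=1)[0]
        path.append(page)
    return path


# One emulated client; a seeded generator keeps the run repeatable.
session = simulate_session("Home", steps=10, rng=random.Random(42))
```

Running 350 such sessions concurrently would approximate the client load used in the evaluation.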
Evaluation Detection
• Results similar to other detectors
• No discussion of absolute numbers?
• Faults: forced Java runtime/declared exceptions, call omission, and source code bugs
• Two questions: (1) how well was the fault detected, and (2) how well was a major outage detected?
Evaluation : Localization
• Localization % per algorithm and fault type: CIA > 85%
• No absolute data again?
Evaluation : Recovery
• Introduce faults in SSM-RUBiS.
• Restart SSM-RUBiS, or microreboot the component.
• Observations from 10 trials with 350 concurrent clients.
Full reboot vs. microreboot
• Injected a null reference fault in SB CommitBid, then a corrupt User-Item, SB BrowseCategories and SB CommitUserFeedback.
• Microreboot maintains steady response.
• 425 vs. 3916 failed requests (microreboot vs. full reboot)
• 61527 vs. 56028 successful requests
• What error conditions did the other trials have?
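To put the request counts in proportion, the sketch below converts them to failure rates. It assumes the first figure in each pair belongs to the microreboot run and the second to the full reboot, which fits the steadier response reported for microreboots.

```python
def failure_rate(failed, succeeded):
    """Fraction of all requests that failed."""
    return failed / (failed + succeeded)


# Counts from the slide; the run labels are an assumption (see above).
microreboot_rate = failure_rate(failed=425, succeeded=61527)
full_reboot_rate = failure_rate(failed=3916, succeeded=56028)

print(f"microreboot: {microreboot_rate:.2%} of requests failed")
print(f"full reboot: {full_reboot_rate:.2%} of requests failed")
```

Under that assumption, roughly 0.7% of requests fail with microreboots against roughly 6.5% with full reboots.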
Total Recovery Time
• Corrupt SB_ViewItem by setting it to null.
• 19.4 sec TRT
• 18.5 sec in analysis
• Pinpoint is the bottleneck in microreboot recovery.
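The bottleneck claim follows from simple arithmetic on the two timings. The sketch below uses only the figures from the slide; the variable names are illustrative.

```python
total_recovery_time_s = 19.4  # total time to recovery (TRT)
analysis_time_s = 18.5        # portion spent in analysis

# Whatever is left covers everything other than analysis.
remaining_s = total_recovery_time_s - analysis_time_s
analysis_share = analysis_time_s / total_recovery_time_s

print(f"analysis share of TRT: {analysis_share:.1%}")
print(f"time left for everything else: {remaining_s:.1f} s")
```

Analysis accounts for about 95% of the total, leaving under a second for the rest of the recovery.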
Is Pinpoint app-generic?
• Upgrade to Petstore v1.3.2
  – Works for the confidence interval
• How different was the updated version?
Performance Overhead
• Results for a 30-minute fault-free run with 350 clients
• In-memory vs. external (SSM) session state
• Marshalling costs
Assumptions
• Well-defined interfaces for components (.NET, J2EE)
• Deterministic call paths between components
• No critical service requests
• Training data for the statistical model
• Guidelines (crash-only software)
Discussion
• Overall a good paper, though the introduction is perhaps a bit verbose.
• Integrating framework for earlier work by Candea.
• Limitations of the present statistical model.
• Shared EJB state
  – Modify JIT, disable microreboots (references, static variables)
• Application
  – Global data not scrubbed.
• Cost-benefit: microreboot vs. total reboot
Supplementary
• Application server = operating system for Internet applications (instantiates app components in containers, provides runtime system services, integrates with the web server to make the app web-accessible)
• http://people.epfl.ch/george.candea