Autonomous Recovery in Componentized Internet Applications (Candea et al.), Vikram Negi
![Page 1: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/1.jpg)
Autonomous Recovery in Componentized Internet Applications
Candea et al.
Vikram Negi
![Page 2: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/2.jpg)
Introduction
• Autonomic Problem
• Approach
• Results
• Discussion
![Page 3: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/3.jpg)
The Autonomic Problem
• To allow the application to recover automatically from transient and intermittent software failures.
![Page 4: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/4.jpg)
The Approach
• Introduce the ideas:
– Macroanalysis (fault detection)
– Microrebooting (rapid recovery)
– External management (recovery action)
• Integrate and test with JBoss
![Page 5: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/5.jpg)
Design Overview
• Autonomous Process
– Monitoring
• Java probes
– Fault detection
• Generates anomaly report
– Recovery
• Takes action
• Total time to recovery
![Page 6: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/6.jpg)
J2EE Review
• J2EE enterprise apps = collection of reusable Java modules
• JSPs / servlets invoke EJBs, which invoke other EJBs, ...
• EJB = Java component that complies with a certain interface and provides a service
• Deployment descriptor (per-bean XML file) conveys run-time characteristics and dependencies; used in deploying the application
![Page 7: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/7.jpg)
JBoss Design
• Open-source J2EE app server
• Written entirely in Java
• Microkernel with components held together by JMX (management support)
![Page 8: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/8.jpg)
JAGR = ROC-ified JBoss with Application-Generic Recovery
• 3-Tier Architecture
• Key Components
– Macroanalysis Engine
– Microrebooting Hook
– Recovery Manager
![Page 9: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/9.jpg)
Pinpoint: Detection and Localization
• Stored Observations
– IP address of machine, timestamp
– Globally unique request ID
– Number of calls/returns to EJBs
– Association between sender and receiver
– Collected SQL queries: updates, reads
![Page 10: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/10.jpg)
Pinpoint: Analysis
• Analysis Engine
– Centralized engine
– Plugin-based architecture
• Modeling Components
– Assume that present component behavior and historical (normal) behavior have the same probability distribution.
– Chi-square (χ²) test to determine whether the two distributions differ.
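The chi-square comparison above can be sketched as follows. This is a minimal illustration, not Pinpoint's actual code: the bean names, call counts, and the choice of a 95% critical value are all assumptions made for the example.

```python
def chi_square_statistic(observed, expected_probs):
    """Pearson chi-square statistic of observed counts against expected proportions.

    Assumes every expected proportion is > 0 and that all observed
    categories appear in expected_probs.
    """
    total = sum(observed.values())
    stat = 0.0
    for target, prob in expected_probs.items():
        expected = total * prob
        diff = observed.get(target, 0) - expected
        stat += diff * diff / expected
    return stat


# Hypothetical historical (normal) distribution of one component's outgoing calls.
historical = {"ItemBean": 0.6, "UserBean": 0.3, "BidBean": 0.1}

# Hypothetical call counts seen in two recent observation windows.
normal_window = {"ItemBean": 62, "UserBean": 28, "BidBean": 10}
faulty_window = {"ItemBean": 20, "UserBean": 75, "BidBean": 5}

# Chi-square critical value for 2 degrees of freedom at alpha = 0.05.
CRITICAL_95_DF2 = 5.991

for name, window in [("normal", normal_window), ("faulty", faulty_window)]:
    stat = chi_square_statistic(window, historical)
    verdict = "anomalous" if stat > CRITICAL_95_DF2 else "consistent with history"
    print(f"{name}: chi-square = {stat:.2f} -> {verdict}")
```

A statistic above the critical value rejects the assumption that current and historical behavior share one distribution, which is what would flag the component in the anomaly report.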
![Page 11: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/11.jpg)
Recovery: Microreboot Is Not Expensive
• State Segregation
– Store important state outside the application, in a database.
– Persistent State
• CMP (container-managed persistence, J2EE) is a requirement for the prototype.
– Session State
• Store in a modified SSM (external session state store)
• Containment and Reintegration
– Microreboot the transitive closure of all inter-EJB references
– XML deployment descriptors determine the grouping for the closure
– Complete reboot or microreboot
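The transitive-closure grouping can be sketched as a graph traversal. The bean names and the reference map below are hypothetical; in the real system the references would come from the XML deployment descriptors. Treating a reference in either direction as joining two EJBs into one group is also an assumption of this sketch.

```python
from collections import deque


def microreboot_group(start, references):
    """Return every EJB reachable from `start` through inter-EJB references.

    `references` maps an EJB name to the names it refers to. References are
    followed in both directions (an assumption made for this sketch).
    """
    neighbors = {}
    for src, targets in references.items():
        for dst in targets:
            neighbors.setdefault(src, set()).add(dst)
            neighbors.setdefault(dst, set()).add(src)

    group = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbors.get(current, ()):
            if nxt not in group:
                group.add(nxt)
                queue.append(nxt)
    return group


# Hypothetical reference map, as it might be read from deployment descriptors.
references = {
    "CartBean": ["InventoryBean"],
    "InventoryBean": ["CatalogBean"],
    "SignOnBean": ["AccountBean"],
}

print(sorted(microreboot_group("InventoryBean", references)))
print(sorted(microreboot_group("AccountBean", references)))
```

Rebooting the whole group at once avoids leaving one EJB holding a stale reference to a neighbor that has just been restarted.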
![Page 12: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/12.jpg)
Recovery
• Enabling Microreboot
– Method in the JBoss EJB container
– Preserve the class loader
![Page 13: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/13.jpg)
Manage Recovery
• Recovery Policy
– Read the failure report; consider components with a score > 1.0
– Microreboot the top n components, or all those scoring > 1.0
– Allow a delay (~30 sec)
– If the error is still present, retry a few times or reboot completely
– Finally, report it to the sysadmin
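The policy above can be sketched as an escalation loop. This is an illustrative sketch only: the function names, the callback-style interface, and the retry count of 3 are assumptions, not the recovery manager's real API.

```python
def pick_suspects(scores, threshold=1.0, top_n=None):
    """Components scoring above the threshold, highest score first."""
    suspects = sorted(
        (name for name, score in scores.items() if score > threshold),
        key=lambda name: scores[name],
        reverse=True,
    )
    return suspects if top_n is None else suspects[:top_n]


def run_recovery(read_report, microreboot, full_reboot, notify_admin, wait,
                 max_microreboots=3, delay_seconds=30, top_n=None):
    """Escalate: microreboot suspects, then full reboot, then a human."""
    for _ in range(max_microreboots):
        suspects = pick_suspects(read_report(), top_n=top_n)
        if not suspects:
            return "healthy"
        microreboot(suspects)
        wait(delay_seconds)          # let the system settle before re-checking

    if not pick_suspects(read_report()):
        return "healthy"

    full_reboot()                    # microreboots did not cure the problem
    wait(delay_seconds)

    report = read_report()
    if pick_suspects(report):
        notify_admin(report)         # automatic recovery is exhausted
        return "escalated"
    return "healthy"
```

In a deployment, `wait` would be `time.sleep`; passing it in as a parameter keeps the sketch testable without real delays.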
![Page 14: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/14.jpg)
Evaluation Test Framework
• Applications
– Petstore 1.1 (12 components, 233 Java files, 11K LOC)
– Petstore 1.3.1 (47 components, 310 Java files, 10K LOC)
– RUBiS (21 components, 500 Java files, 25K LOC)
• Workload
– Implement simulators with a transition table.
– 350 clients (max utilization principle)
• Faultload
– Based on industry experience
– No low-level hardware or OS faults.
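A transition-table workload simulator, as mentioned in the workload item above, can be sketched as a random walk over page states. The states and probabilities here are invented for illustration and are not the tables used in the evaluation.

```python
import random

# Hypothetical transition table: state -> [(next_state, probability), ...]
TRANSITIONS = {
    "home": [("browse", 0.7), ("search", 0.3)],
    "browse": [("view_item", 0.6), ("home", 0.4)],
    "search": [("view_item", 0.8), ("home", 0.2)],
    "view_item": [("bid", 0.3), ("browse", 0.5), ("home", 0.2)],
    "bid": [("home", 1.0)],
}


def simulate_session(transitions, start, steps, rng):
    """Walk the transition table for `steps` requests and return the path."""
    state = start
    path = [state]
    for _ in range(steps):
        next_states, weights = zip(*transitions[state])
        state = rng.choices(next_states, weights=weights, k=1)[0]
        path.append(state)
    return path


# One simulated client; a load generator would run many of these concurrently.
print(simulate_session(TRANSITIONS, "home", 10, random.Random(42)))
```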
![Page 15: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/15.jpg)
Evaluation Detection
• Results similar to other detectors
• No discussion of absolute numbers?
• Forced Java runtime/declared exceptions, call omission, and source code bugs
• (1) How well was the fault detected? (2) How well was a major outage detected?
![Page 16: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/16.jpg)
Evaluation: Localization
Localization % for an algorithm per fault type
CIA > 85%
No absolute data again?
![Page 17: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/17.jpg)
Evaluation: Recovery
• Introduce faults in SSM-RUBiS.
• Restart SSM-RUBiS or microreboot the component.
• Observations from 10 trials with 350 concurrent clients.
![Page 18: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/18.jpg)
Full vs. Micro Reboot
• Injected a null reference fault in SB CommitBid, then a corrupt User-Item, SB BrowseCategories and SB CommitUserFeedback.
• Microreboot maintains a steady response.
• 425 (microreboot) vs. 3916 (full reboot) failed requests
• 61527 (microreboot) vs. 56028 (full reboot) successful requests
• What error conditions did the other trials have?
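The request counts above can be turned into failure rates with a few lines of arithmetic. The sketch assumes the first number in each pair belongs to the microreboot run and the second to the full reboot run, which matches the "steady response" bullet but is not stated explicitly on the slide.

```python
def failure_rate(failed, succeeded):
    """Fraction of all requests in a run that failed."""
    return failed / (failed + succeeded)


# Counts from the slide; the (microreboot, full reboot) mapping is an assumption.
micro = failure_rate(failed=425, succeeded=61527)
full = failure_rate(failed=3916, succeeded=56028)

print(f"microreboot failure rate: {micro:.2%}")
print(f"full reboot failure rate: {full:.2%}")
print(f"failed-request ratio (full / micro): {3916 / 425:.1f}x")
```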
![Page 19: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/19.jpg)
Total Recovery Time
• Corrupt SB_ViewItem by setting it to NULL.
• 19.4 sec total recovery time (TRT)
• 18.5 sec of that in analysis
• Pinpoint is the bottleneck in microreboot recovery.
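A quick breakdown of the two timings above shows why analysis is called the bottleneck. The slide gives only the total and the analysis time, so everything else (including the microreboot itself) is lumped into one remainder here.

```python
total_recovery_s = 19.4   # total recovery time from the slide
analysis_s = 18.5         # time spent in analysis from the slide

remaining_s = total_recovery_s - analysis_s      # everything except analysis
analysis_share = analysis_s / total_recovery_s   # fraction of TRT spent analyzing

print(f"non-analysis time: {remaining_s:.1f} s")
print(f"analysis share of recovery time: {analysis_share:.0%}")
```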
![Page 20: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/20.jpg)
Is Pinpoint App-Generic?
• Upgrade to Petstore v.1.3.2
– Works for the confidence interval
How different was the updated version?
![Page 21: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/21.jpg)
Performance Overhead
• Results for a 30-min fault-free run with 350 clients
• In-memory vs. external (SSM) session state
• Marshalling costs
![Page 22: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/22.jpg)
Assumptions
• Well-defined interfaces for components (.NET, J2EE)
• Deterministic call paths between components
• No critical service request
• Training data for statistical model
• Guidelines (Crash Only Software)
![Page 23: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/23.jpg)
Discussion
• Overall, one of the good papers; maybe a bit verbose in the introduction!
• Integrating framework for earlier work by Candea.
• Limitation of the present statistical model.
• Shared EJB state
– Modify JIT, disable microreboots (ref, static var)
• Application
– Global data not scrubbed.
• Cost-benefit: microreboot vs. total reboot
![Page 24: Autonomous Recovery in Componentized Internet Application Candea et. al Vikram Negi](https://reader036.vdocument.in/reader036/viewer/2022081603/568140a1550346895dac5555/html5/thumbnails/24.jpg)
Supplementary
• Application server = operating system for Internet applications (instantiates app components in containers, provides runtime system services, integrates with web server to make app web-accessible)
• http://people.epfl.ch/george.candea