Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

Page 1: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

Saarbrücken, Aug. 26th, 2005

Max Planck Institute for Informatics
AG5: Databases and Information Systems

Integrated Data, Process, and Message Recovery for Failure Masking in Web Services

Doctoral Thesis Colloquium

German Shegalov

funded by

Page 2: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

Outline

Problem Statement and Background
Interaction Contracts Framework
• Formal Specification of the Committed IC
• Verification of IC's with model checking
• Verification of Web Service IC Model
Implementation: Exactly-Once Web Service (EOS)
• Overview
• EOS-PHP
• Demo
Summary

Page 3: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

Problem Statement

Non-idempotence (math)
• $f^n(x) \neq f(x)$, $n > 1$

Non-idempotence (Web, ERP, etc.)
• "Request timeout" ≠ "request failure"
• "Request send" ≠ "request resend"
• 8 Medicare cards for a 3 member family
• Order one, get many, pay many
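The non-idempotence condition can be illustrated with a minimal sketch. The functions `debit`, `set_status`, and `apply_n` are hypothetical names chosen for this example and do not come from the talk; the point is only that re-executing a non-idempotent request changes the outcome, while re-executing an idempotent one does not.

```python
def debit(balance: int, amount: int = 100) -> int:
    """A non-idempotent operation: every application changes the result."""
    return balance - amount


def set_status(order: dict, status: str = "paid") -> dict:
    """An idempotent operation: repeating it leaves the result unchanged."""
    return {**order, "status": status}


def apply_n(f, x, n: int):
    """Compute f^n(x), i.e. f applied n times to x."""
    for _ in range(n):
        x = f(x)
    return x


once = apply_n(debit, 1000, 1)
twice = apply_n(debit, 1000, 2)   # a resent request executes the debit again
order = {"id": 7, "status": "new"}
```

Here `twice` differs from `once`, which is the "order one, pay many" situation, whereas applying `set_status` any number of times yields the same order.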

Page 4: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

Transaction Recovery

BEGIN TRANSACTION
/* LSN = 1: log for undo and redo in MM buffer */
UPDATE Accounts SET balance = balance - 100.00 WHERE Number = 1
/* LSN = 2: log for undo and redo in MM buffer */
UPDATE Accounts SET balance = balance + 100.00 WHERE Number = 2
/* LSN = 3: log commit and force (5-6 orders of magnitude slower) */
COMMIT TRANSACTION

Accounts (LSN=0)
Number  Balance
1       1000.00
2       2000.00

Accounts (LSN=3)
Number  Balance
1       900.00
2       2100.00

At most once semantics

Redo Committed, Undo Uncommitted
• LSN test guarantees idempotence
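The LSN test can be sketched as follows. This is a simplified illustration under stated assumptions, not the recovery algorithm of any particular database: each page remembers the LSN of the last log record applied to it, and redo skips records that are not newer. The names `Page`, `LogRecord`, and `redo` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    balance: float
    page_lsn: int = 0          # LSN of the last log record applied to this page


@dataclass(frozen=True)
class LogRecord:
    lsn: int
    account: int
    delta: float


def redo(pages: dict, log: list) -> None:
    """Replay the log; the LSN test skips records that are already applied."""
    for rec in log:
        page = pages[rec.account]
        if page.page_lsn < rec.lsn:        # LSN test
            page.balance += rec.delta
            page.page_lsn = rec.lsn


pages = {1: Page(1000.00), 2: Page(2000.00)}
log = [LogRecord(1, 1, -100.00), LogRecord(2, 2, +100.00)]

redo(pages, log)
redo(pages, log)   # a second recovery pass changes nothing
```

Because the second pass finds every record already reflected in the page LSNs, repeated recovery leaves the balances at 900.00 and 2100.00.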

Page 5: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

However, …

[Figure: message timeline between Web Client, Web Application Server, and Database Server. Labels: Purchase Request, Start Transaction, SQL Request, SQL Response, SQL Request, SQL Response, Commit Transaction, ACK, Order Confirmation; after a failure: Transaction Restart, ACK, Purchase Request Resubmission.]

Non-idempotent execution!

Transactions alone are not a panacea!

Page 6: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

Real-World n-Tier App

[Figure: n-tier architecture with a Client, a Web Server, the Expedia, Sabre, and Amadeus app servers, and the databases DB1, DB2, DB3, DB4.]

Don't panic! Peer-to-peer apps may be even worse.

Page 8: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

IC Framework

Components and Guarantees
• Persistent Pcom: Persistent, testable state and messages
• External Xcom (e.g., humans): No guarantees

Interaction Contracts
• Xcom ↔ Pcom = External IC (XIC)
• Pcom ↔ Pcom = Committed IC (CIC)

Exactly-Once Semantics
• Forget rollbacks, exactly-once execution is guaranteed

Page 9: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

Pcom Design

Redo Log & Recovery Managers
Piecewise determinism + Logging = Full Determinism
Deterministic replay recovers Pcom's
Installation Points speed up replay

Failure model
• Crashes
• Message losses
• No malicious manipulations
• No disk corruption (sufficient redundancy)

Transient failures due to nondeterministic Heisenbugs
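The idea "piecewise determinism + logging = full determinism" can be sketched in a few lines. This is an illustrative toy, assuming the only nondeterministic input is a random number; the class `ReplayableComponent` and its methods are hypothetical and not part of EOS.

```python
import random


class ReplayableComponent:
    """Logs the results of nondeterministic calls so a replay repeats them."""

    def __init__(self, log=None):
        self.log = [] if log is None else list(log)
        self.cursor = 0            # next log entry to consume during replay
        self.state = 0

    def nondeterministic(self, fn):
        if self.cursor < len(self.log):
            value = self.log[self.cursor]     # replay: reuse the logged value
        else:
            value = fn()                      # normal run: execute and log
            self.log.append(value)
        self.cursor += 1
        return value

    def step(self):
        self.state += self.nondeterministic(lambda: random.randint(1, 100))
        return self.state


original = ReplayableComponent()
for _ in range(5):
    original.step()

# "Crash": volatile state is lost, only the log survives. Replay rebuilds it.
recovered = ReplayableComponent(log=original.log)
for _ in range(5):
    recovered.step()
```

After replay, `recovered.state` equals `original.state` even though the steps depend on random numbers. An installation point would correspond to saving `state` together with `cursor`, so that replay can start from there instead of from the beginning of the log.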

Page 10: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

CIC's Informal Design

CIC sender (Pcom) obligations
• Persist state before send
• Tag message with an MSN
• Resend on timeout until stable ack
• Resend on receiver's "get msg"
• Forget interaction on installed ack

CIC receiver (Pcom) obligations
• Eliminates duplicates by MSN's
• Persists interaction before stable ack
• "gets msg" if msg is not in log after failure
• Ensures autonomous recovery before installed ack
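A few of these obligations can be sketched as follows. This is a deliberately reduced illustration, not the CIC protocol itself: it assumes MSN stands for a per-sender message sequence number, models only the stable ack (the installed ack, "get msg", and crash recovery are left out), and uses in-memory dictionaries in place of a stable log. All class and function names are hypothetical.

```python
class Receiver:
    def __init__(self):
        self.log = {}            # MSN -> message; stands in for the stable log
        self.executed = []       # side effects, must happen exactly once

    def deliver(self, msn, msg):
        if msn not in self.log:          # eliminate duplicates by MSN
            self.log[msn] = msg          # persist interaction before the ack
            self.executed.append(msg)
        return ("stable", msn)           # the ack is sent for duplicates too


class Sender:
    def __init__(self, link):
        self.link = link
        self.next_msn = 0
        self.pending = {}        # MSN -> message, persisted before send

    def send(self, msg, max_attempts=10):
        msn = self.next_msn
        self.next_msn += 1
        self.pending[msn] = msg                  # persist state before send
        for attempt in range(1, max_attempts + 1):
            ack = self.link(msn, msg)            # None models a timeout
            if ack == ("stable", msn):
                return attempt                   # stop resending on stable ack
        raise TimeoutError(f"no stable ack for MSN {msn}")


def make_lossy_link(receiver, drop_acks):
    """Link that delivers every message but loses the first `drop_acks` acks."""
    state = {"dropped": 0}

    def link(msn, msg):
        ack = receiver.deliver(msn, msg)
        if state["dropped"] < drop_acks:
            state["dropped"] += 1
            return None
        return ack

    return link


receiver = Receiver()
sender = Sender(make_lossy_link(receiver, drop_acks=2))
attempts = sender.send("order one book")
```

The sender needs three attempts because two acks are lost, yet the receiver executes the order once. The message stays in `pending` after the stable ack, since forgetting it is tied to the installed ack, which this sketch does not model.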

Page 11: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

Committed IC Activities

Activitychart = Functional View

[Figure: activity chart CIC_AC, controlled by statechart @CIC_SC, with the activities CIC_SNDR_AC (@CIC_SNDR_SC) and CIC_RCVR_AC (@CIC_RCVR_SC). External actors: FAILURE_PRONE_ENVIRONMENT, EXTERNAL_APP_LOGIC, SYSTEM_ADMINISTRATOR. Event and data flows: RCVR_CRASH, SNDR_CRASH, LINK_OUTAGE, SEND_MSG, STABLE, INSTALLED, GET_MSG, SNDR_TRIGGER, MSG_PROCESSED, ICIC, TIMEOUTS.]

Page 12: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

Committed IC Monitor

Statechart = Behavioral View
• Finite State Automaton (FSA) +
• Nesting + Orthogonal substates +
• E[C]/A transitions: on Event while Condition
  Leave source, enter target, execute Action
  E.g., A = E' means generate event E'
• Configuration = set of entered states
• Execution context = variable valuation

Step_i: $(conf_i, ctxt_i) \rightarrow (conf_{i+1}, ctxt_{i+1})$

[Figure: statechart CIC_SC with orthogonal substates SNDR_S (state SENDING) and RCVR_S (state RECEIVING). Transition labels:
SENDING: (not SNDR_CRASH)[not active(CIC_SNDR_AC)]/start!(CIC_SNDR_AC)
RECEIVING: (not RCVR_CRASH)[not active(CIC_RCVR_AC)]/start!(CIC_RCVR_AC)]
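The E[C]/A step semantics can be sketched for a flat automaton. The sketch below ignores nesting and orthogonal substates, so a configuration is a single state name rather than a set; the states WAITING and FINISHED, the events ACK and DONE, and all Python names are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Transition:
    source: str
    target: str
    event: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], list]      # may update ctxt; returns generated events


def step(conf: str, ctxt: dict, event: str, transitions):
    """One step: (conf_i, ctxt_i) -> (conf_i+1, ctxt_i+1) plus generated events."""
    for t in transitions:
        if t.source == conf and t.event == event and t.condition(ctxt):
            new_ctxt = dict(ctxt)       # leave source, enter target, run action
            generated = t.action(new_ctxt)
            return t.target, new_ctxt, generated
    return conf, ctxt, []               # no enabled transition: stay put


def log_and_notify(ctxt):
    ctxt["last_logged"] = "INSTALLED"
    return ["DONE"]                     # A = E': generate event DONE


transitions = [
    Transition("WAITING", "FINISHED", "ACK",
               condition=lambda c: c["last_logged"] == "",
               action=log_and_notify),
]
conf, ctxt, events = step("WAITING", {"last_logged": ""}, "ACK", transitions)
```

The transition fires only when the event arrives while the condition holds; a second ACK in state FINISHED finds no enabled transition and leaves configuration and context unchanged.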

Page 13: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

Committed IC Sender

[Figure: statechart CIC_SNDR_SC. States: SENDING, STABLE_S, INSTALLED_S, RECOVERY, MSG_LOOKUP, PREPARE_PERSISTENCE. Transition labels:
SNDR_TRIGGER
[SNDR_LAST_LOGGED=='']/SNDR_ND
SNDR_ND/SEND_MSG
SNDR_MSG_TM and not (STABLE_OK or INSTALLED_OK)/SEND_MSG
MSG_RECOVERED_TM/SEND_MSG
GET_MSG_OK
STABLE_OK
SNDR_STABLE_TM and not (INSTALLED_OK or GET_MSG_OK)/IS_INSTALLED
INSTALLED_OK/SNDR_LAST_LOGGED:='INSTALLED'
[SNDR_LAST_LOGGED=='INSTALLED']
SNDR_CRASH]

* EVENT_OK = EVENT and not LINK_OUTAGE
_TM means TIMEOUT

Page 14: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

Committed IC Receiver

[Figure: statechart CIC_RCVR_SC. States: MSG_RECOVERY, MSG_RECEIVED, STABLE_R, INSTALLED_R, MSG_PROCESSED, RECOVERY. Transition labels:
SEND_MSG_OK
SEND_MSG_OK[RCVR_LAST_LOGGED=='']
MSG_EXEC_TM/RECEIVED;
(RCVR_STABLE_TM or RCVR_ND [MSG_ORDER_MATTERS])[not ICIC and RCVR_LAST_LOGGED=='']/RCVR_LAST_LOGGED:='STABLE'; STABLE
[ICIC]/RCVR_LAST_LOGGED:='INSTALLED'; INSTALLED
RCVR_INSTALL_TM/RCVR_LAST_LOGGED:='INSTALLED'; INSTALLED
SEND_MSG or IS_INSTALLED/STABLE
SEND_MSG or IS_INSTALLED/INSTALLED
[RCVR_LAST_LOGGED=='INSTALLED']
[RCVR_LAST_LOGGED=='STABLE']
[RCVR_LAST_LOGGED=='STABLE']/GET_MSG
not SEND_MSG_OK and GET_MSG_TM/GET_MSG
RCVR_CRASH]

* EVENT_OK = EVENT and not LINK_OUTAGE, _TM means TIMEOUT

Page 15: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

Execution Abstraction

Kripke structure $K = (S, R, L)$ over $P$
• $P$ is a finite set of atomic propositions
• Software: $P$ is a union of all memory bits
• $S$ finite set of states
• $R \subseteq S \times S$ state transitions
• $L: S \times P \rightarrow \{true, false\}$ valuation
• Non-determinism to determinism

Computation Tree vs. Sequence

[Figure: computation tree and computation sequence with nodes labeled p and q; $p, q \in P$]

Page 16: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

Computation Tree Logic

Basic Syntax
• Atomic propositions $P \subseteq \mathrm{CTL}(P)$
• If $p, q \in \mathrm{CTL}(P)$, then so are
  Propositional logic formulas ($\neg p$, $p \lor q$, etc.)
  Path quantifiers Exists, All + modality neXt, Until: $EX\,p$, $\{E, A\}\,(p\,U\,q)$

Derived Syntax
  $AX\,p \equiv \neg EX\,\neg p$
  A Finally p: $AF\,p \equiv A(true\,U\,p)$
  $EF\,p \equiv E(true\,U\,p)$
  A Globally p: $AG\,p \equiv \neg E(true\,U\,\neg p)$
  $EG\,p \equiv \neg A(true\,U\,\neg p)$

Page 17: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

CIC Verification

Safety
For all log values $v \in \{\text{'stable'}, \text{'installed'}\}$

$AG\,(written(log) \land log = v \rightarrow AX\,AG\,\neg(written(log) \land log = v))$

i.e., a value is written at most once

Liveness for timeouts < 30 steps
• $F_{<n}$: eventually after at most $n$ steps
• $AF_{<500}\,AG\,\neg failures \rightarrow AF_{<700}\,\text{CIC installed}$

Page 18: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

IC's & Web Service

[Figure: activity chart of the web service model with CUSTOMER and USER1_REQ (@USER1_SC). IC activities: BROWSER_INPUT<XIC_I_AC, BROWSER_OUTPUT<XIC_O_AC, WEBSRVR_REQ<CIC_AC, WEBSRVR_REP<CIC_AC, APPSRVR1_REQ<CIC_AC, APPSRVR1_REP<CIC_AC, APPSRVR2_REQ<CIC_AC, APPSRVR2_REP<CIC_AC, XACT_UPDATE<TIC_AC. Events: HTML_PROMPT, BUTTON_CLICKED, HTML_REPLY, CLICK_CAPTURED, WEBSRVR_REQ_RCVD, APPSRVR1_REQ_RCVD, APPSRVR2_REQ_RCVD, APPSRVR1_REP_RCVD, APPSRVR2_REP_RCVD, WEBSRVR_REP_RCVD, XACT_COMMITTED.]

LOCAL_FAILURES: BROWSER_CRASH, XACT_{USER, INTERNAL}_ABORT, BROWSER_WEBSRVR_LINK_OUTAGE

GLOBAL_FAILURES: WEBSERVER_CRASH, APPSERVER{1,2}_CRASH, DBSRVR_CRASH, WEB_APP{1,2}_LINK_OUTAGE, APP1_DB_LINK_OUTAGE

Web server reply's SNDR_ND = App server replies' RCVR_ND = WEBSRVR_ND, i.e., commits app server reply order

$AG\,\big(websrvr\_rep{:}send\_msg \rightarrow \bigwedge_{i=1,2} (appsrvr_i{:}rcvr\_log = \text{'stable'} \lor appsrvr_i{:}rcvr\_log = \text{'installed'})\big)$

Page 19: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

Explicit Model Checking

For $K = (S, R, L)$ over $P$, $s \in S$, $f \in \mathrm{CTL}(P)$
• $s \models f$, $f \in P$ $\Leftrightarrow$ $L(s, f) = true$
• $s \models f$, $f = \neg f_1$ $\Leftrightarrow$ $s \not\models f_1$
• $s \models f$, $f = f_1 \lor f_2$ $\Leftrightarrow$ $s \models f_1$ or $s \models f_2$
• $s \models f$, $f = EX\,f_1$ $\Leftrightarrow$ $\exists (s, r) \in R$ with $r \models f_1$
• $s \models f$, $f = E(f_1\,U\,f_2)$
  if $s$ already checked then false, else check
  if $s \models f_2$ then true
  if $s \models f_1$ and $\exists (s, r) \in R$ with $r \models f$ then true
• $s \models f$, $f = A(f_1\,U\,f_2)$
  if $s$ already checked then false, else check
  if $s \models f_2$ then true
  if $s \models f_1$ and $\forall (s, r) \in R$: $r \models f$ then true
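The rules for EX and the two Until operators can be sketched as a small explicit-state checker. Note the swap: instead of the recursive "already checked" search on the slide, this sketch uses the standard least-fixpoint labeling over sets of states, which computes the same satisfaction sets on a finite structure. The four-state Kripke structure and all names below are made up for illustration.

```python
def check_ex(R, sat_f):
    """States with some successor in sat_f (EX f)."""
    return {s for (s, r) in R if r in sat_f}


def check_eu(R, sat_f1, sat_f2):
    """Least fixpoint for E(f1 U f2): start from f2-states, grow backwards."""
    result = set(sat_f2)
    changed = True
    while changed:
        changed = False
        for (s, r) in R:
            if s in sat_f1 and r in result and s not in result:
                result.add(s)
                changed = True
    return result


def check_au(S, R, sat_f1, sat_f2):
    """Least fixpoint for A(f1 U f2): all successors must already qualify."""
    succ = {s: {r for (x, r) in R if x == s} for s in S}
    result = set(sat_f2)
    changed = True
    while changed:
        changed = False
        for s in S:
            if s not in result and s in sat_f1 and succ[s] and succ[s] <= result:
                result.add(s)
                changed = True
    return result


# Hypothetical Kripke structure: p holds in states 0, 1, 2 and q holds in 3.
S = {0, 1, 2, 3}
R = {(0, 1), (0, 2), (1, 3), (2, 2), (3, 3)}
sat_p, sat_q = {0, 1, 2}, {3}

ex_q = check_ex(R, sat_q)
e_p_until_q = check_eu(R, sat_p, sat_q)
a_p_until_q = check_au(S, R, sat_p, sat_q)
ag_p = S - check_eu(R, S, S - sat_p)     # AG p = not E(true U not p)
```

State 0 satisfies E(p U q) via the path 0, 1, 3 but not A(p U q), because the path through state 2 loops forever without reaching q. The last line applies the derived-syntax identity for AG.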

Page 20: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

Verification Run-Times

Property           Specification Type         OBDD size   Verification Time
IC-level safety    Integer Timeout            ~10^4       ~5 seconds
IC-level safety    Nondeterministic Timeout   ~10^3       ~1 sec.
IC-level liveness  Integer Timeout            ~10^6       ~10 hours
IC-level liveness  Nondeterministic Timeout   ~10^5       ~10 hours
1-user WS safety   Integer Timeout            ~10^7       Not terminated
1-user WS safety   Nondeterministic Timeout   ~10^6       ~10 hours

Page 22: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

PHP and Zend Engine

[Figure: Web Clients accessing PHP servers; each server runs the Zend Engine with the Session and CURL modules.]

<html>
<?php
session_start();
// per-session counter kept in the PHP session state
$HTTP_SESSION_VARS["count"]++;
printf("Script called %d times", $HTTP_SESSION_VARS["count"]);
// call the other server and capture its reply as a string
$ch = curl_init("http://eos-php.net/b2b.php");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$b2b_reply = curl_exec($ch);
printf("Other server reports: %s", $b2b_reply);
curl_close($ch);
?>
</html>

Output:

<html>
Script called 5 times
Other server reports: Script called 1000 times
</html>

Page 23: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

EOS

Exactly-once semantics with
• Transparent browser recovery
• Concurrent accesses to shared data
• Nondeterm. functions: time, curl_exec, rand
• Any n in n-tier, any fanout
• Failure masking: no changes to app code
  neither to PHP scripts, nor to the browser

Performance enhancements (side effects)
• Log structured data access (sequential I/O)
• LRU buffers for state and log data
• Latches (Shared/Exclusive)
• session_start(bool $read_only)

Page 24: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

Experiment Setup

[Figure: Web Client, Frontend Server (P4 3 GHz, 1 GB), and Backend Server (P4 3 GHz, 1 GB). The frontend holds per-user private counts (e.g., 2, then 3); the backend holds the shared count (1234, then 1235). Messages: POST (ICIC) action=increment from the Web Client; POST (ICIC) action=increment b2b=true to the backend, which replies 1235; the reply page is <html><p>Private Count: 3<p>Shared Count: 1235</html>.]

eBay-like auction service
User settings at frontend (private)
Auction items at backend (shared)
5 concurrent end users, synthetic load

Page 25: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

Run-Time Overhead

Session                            1 step    5 steps   10 steps
PHP elapsed time [sec]             0.1560    0.7900    1.6100
EOS-PHP elapsed time [sec]         0.3140    1.6850    3.1000
Overhead (elapsed time) [%]        101%      113%      93%
PHP frontend CPU time [sec]        0.0390    0.2708    0.5727
EOS-PHP frontend CPU time [sec]    0.0815    0.6000    1.1545
Overhead (frontend CPU) [%]        109%      122%      102%
PHP backend CPU time [sec]         0.0090    0.0550    0.1200
EOS-PHP backend CPU time [sec]     0.0130    0.0750    0.1600
Overhead (backend CPU) [%]         44%       36%       33%
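The overhead rows follow from the measured times as (EOS-PHP / PHP - 1) × 100, rounded to whole percent. The short calculation below reproduces them from the table values; the function and variable names are chosen for this example only.

```python
def overhead_percent(baseline: float, measured: float) -> int:
    """Relative overhead of `measured` over `baseline`, rounded to whole percent."""
    return round((measured / baseline - 1.0) * 100)


# (PHP, EOS-PHP) times in seconds, keyed by the number of session steps.
elapsed = {1: (0.1560, 0.3140), 5: (0.7900, 1.6850), 10: (1.6100, 3.1000)}
frontend_cpu = {1: (0.0390, 0.0815), 5: (0.2708, 0.6000), 10: (0.5727, 1.1545)}
backend_cpu = {1: (0.0090, 0.0130), 5: (0.0550, 0.0750), 10: (0.1200, 0.1600)}


def overheads(times: dict) -> dict:
    return {steps: overhead_percent(php, eos) for steps, (php, eos) in times.items()}


elapsed_overhead = overheads(elapsed)
frontend_overhead = overheads(frontend_cpu)
backend_overhead = overheads(backend_cpu)
```

For the 1-step session, for instance, 0.3140 / 0.1560 ≈ 2.013, i.e. roughly twice the elapsed time and an overhead of 101%.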

Page 27: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

Summary

Generic IC framework specification

Formal verification at IC and app level
• To do: Overcome "model checking" non-scalability

Efficient implementation: EOS
• Rigorous recovery guarantees
  Based on the formally verified models
• Many enhancements to PHP
  LRU buffer management
  Mostly sequential disk accesses
  Concurrency control with latches

Page 28: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

EOS Demo

[Figure: demo setup with USER 1, Frontend Server, and Backend Server; links B2C_LINK and B2B_LINK.]

Page 29: Integrated Data, Message, and Process Recovery for Failure Masking in Web Services

Thank You!