07/09/04 Windows Reliability Team 1

CLR Reliability under Memory Exhaustion

Solomon Boulos

Temporary Memory Exhaustion causes failures

• Out of Memory (OOM) is temporary
• Shouldn’t cause failure
  – Just wait for memory to become available
  – System takes action to free up memory
• All managed code depends on the CLR
• Testing is difficult
  – Exceptions are objects
  – Boxing (converting a value type to object)
  – JIT compilation
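The last three bullets name places where managed code allocates without an explicit `new` at the call site, which is why an OOM can surface almost anywhere. The following is a minimal sketch of two of them (boxing and exception objects); the class and method names are hypothetical and chosen only for illustration.

```csharp
using System;

static class HiddenAllocations
{
    // Boxing copies the value into a fresh heap object each time,
    // so two boxes of the same int are never the same reference.
    public static bool BoxingAllocatesNewObject()
    {
        int n = 42;
        object first = n;   // allocation #1
        object second = n;  // allocation #2
        return !ReferenceEquals(first, second);
    }

    // An exception is an ordinary reference-type object, so creating one
    // needs heap memory at exactly the moment something has gone wrong.
    public static bool ExceptionIsHeapObject()
    {
        object e = new InvalidOperationException("example");
        return e is Exception && !e.GetType().IsValueType;
    }

    static void Main()
    {
        Console.WriteLine(BoxingAllocatesNewObject());  // True
        Console.WriteLine(ExceptionIsHeapObject());     // True
    }
}
```

JIT compilation is not shown because it cannot be observed as simply: the first call to any method may allocate inside the runtime itself.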

Overview

• Previous Work
  – Reliability Working Group
  – Improvements for Whidbey
• OOM behavior
  – Everett (CLR v1.1)
  – Whidbey (CLR v2.0)
  – WinFX
• Solutions
  – Transactions
  – Recovery

Reliability Working Group

• Discussion of CLR reliability issues

• Interaction with Yukon and Avalon teams

• FailFast Behavior

• Controversial Decisions

• Fault Injection

Improvements for Whidbey

• CLR hardened to Out of Memory (OOM)

• Constrained Execution Regions (CERs)
  – Eagerly Prepared (No JIT Compiling)
  – Blocks ThreadAbort
• Reliability Contracts
  – Describes reliability attributes of code
  – Allows for function calls within CER

• Unhandled Exception Policy
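As a rough illustration of the CER and reliability-contract bullets, here is a minimal sketch using the APIs as they shipped in .NET Framework 2.0 (`RuntimeHelpers.PrepareConstrainedRegions` and `ReliabilityContractAttribute`). The `CerSketch` class, its fields, and its methods are hypothetical. On modern .NET (5 and later) both APIs are marked obsolete and have no effect, so the code compiles with warnings and runs as an ordinary try/finally.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.ConstrainedExecution;

static class CerSketch
{
    static int committed;  // state that must stay consistent
    static int pending;

    // The contract is a promise by the author: this method will not corrupt
    // state and will not fail. That promise is what permits calling it
    // from inside a CER.
    [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
    static void Publish()
    {
        committed = pending;
    }

    public static int Update(int value)
    {
        // Must immediately precede the try block. The runtime eagerly
        // prepares (JIT-compiles) the finally block and its call graph.
        RuntimeHelpers.PrepareConstrainedRegions();
        try
        {
            // The try body itself is NOT part of the constrained region.
            pending = value;
        }
        finally
        {
            // Constrained region: no JIT needed here, and thread aborts
            // are held off until it completes.
            Publish();
        }
        return committed;
    }

    static void Main()
    {
        Console.WriteLine(Update(5));  // 5
    }
}
```

Nothing verifies that `Publish` actually honors its contract; the attribute only declares the intent.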

My Approach

• Exhaust Memory (Not fault injection)

• Find failure points

• Consistently reproduce results

• Examine underlying causes

• Develop solutions
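The slides do not show the harness used to exhaust memory. One way such a step could look is sketched below, assuming a simple "allocate and hold blocks until a cap or an OOM is hit" strategy; `MemoryPressure.Consume` and its parameters are hypothetical names, not taken from the talk.

```csharp
using System;
using System.Collections.Generic;

static class MemoryPressure
{
    // Allocates up to maxChunks blocks of chunkBytes each and keeps them
    // alive. Stops early if the runtime reports an OutOfMemoryException.
    public static List<byte[]> Consume(int chunkBytes, int maxChunks)
    {
        // Pre-size the list so that Add() itself does not need to grow it.
        List<byte[]> held = new List<byte[]>(maxChunks);
        try
        {
            for (int i = 0; i < maxChunks; i++)
            {
                byte[] block = new byte[chunkBytes];
                // Touch one byte per 4 KB so the pages are really committed.
                for (int j = 0; j < block.Length; j += 4096)
                {
                    block[j] = 1;
                }
                held.Add(block);
            }
        }
        catch (OutOfMemoryException)
        {
            // Memory is exhausted: keep what was obtained and stop.
        }
        return held;
    }

    static void Main()
    {
        // Small cap for demonstration; a real run would raise it.
        List<byte[]> held = Consume(1 << 20, 16);
        Console.WriteLine("Holding " + held.Count + " MB");
        GC.KeepAlive(held);
    }
}
```

Holding the returned list keeps the pressure in place while the code under test runs; dropping it releases the memory again, which makes runs easier to repeat.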

Everett OOM Behavior

• Different classes of failures
  – Catchable Out of Memory (OOM) Exception
  – Type Initialization Exception
  – Invalid Program exception from JIT compiler
  – Fatal OOM Error
  – Fatal Execution Engine error

Supporting Data

void ManagedFunction()
{
    Regex* myReg = new Regex("*");
}

Available Memory    Observed Behavior
0-5860K             Fatal Error
5892-5912K          InvalidProgram
5924-5960K          TypeInit
5890-Above          Success

Fault Injection Example

static void Main(string[] args)
{
    try
    {
        // operations in here
    }
    catch (OutOfMemoryException)
    {
        Console.WriteLine("Nothing should get past me.");
    }
}

Whidbey OOM Behavior

• See OOM Exception instead of
  – TypeInit
  – InvalidProgram
• Exception to Native host is COMPlusException
  – Not very helpful
• Fatal OOM only during initialization
  – Initialization can be large though (e.g. 10MB)
• CERs provide defense, but dangerous
  – CER { for (;;) } cannot be stopped

• Reliability Contracts = Honor System

WinFX Case Studies

• Swallows exceptions
• Shell
  – Crashes and restarts
• WinFS
  – Silent Process Failure
• Indigo
  – False Completion

[Figure: layer diagram showing Base OS, Whidbey, and WinFX]

Shell Failure

• Exhaust System Memory

• CLR throws OOM Exception

• Shell doesn’t catch

• Escalates to unhandled Win32 exception

• Shell crashes and restarts
  – Major disruption to user
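The failure chain above starts with an OOM exception that nobody catches. As a sketch of the alternative the talk argues for (treat OOM as temporary and wait for memory to become available), a caller could wrap an operation in a bounded retry. `TransientOom.Retry` is a hypothetical helper, and the `Main` method simulates the pressure by throwing the exception directly. Handling an OOM can itself allocate, so this is an illustration of the idea rather than a hardened pattern.

```csharp
using System;
using System.Threading;

static class TransientOom
{
    // Runs operation; on OutOfMemoryException waits and tries again,
    // giving up (rethrowing) after maxAttempts attempts.
    public static T Retry<T>(Func<T> operation, int maxAttempts, int delayMs)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (OutOfMemoryException)
            {
                if (attempt >= maxAttempts) throw;
                GC.Collect();             // give the runtime a chance to free memory
                Thread.Sleep(delayMs);    // wait for pressure to ease
            }
        }
    }

    static void Main()
    {
        int calls = 0;
        int result = Retry(() =>
        {
            calls++;
            // Simulated memory pressure on the first two calls.
            if (calls < 3) throw new OutOfMemoryException();
            return 7;
        }, 5, 10);
        Console.WriteLine(result + " after " + calls + " calls");  // 7 after 3 calls
    }
}
```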

WinFS Test

• Simple Contact Store Functions
  – AddContact
  – RenameContact
  – RemoveContact
  – ListContacts
  – ReachMemory

WinFS Test Normal Execution

• ListContacts(): “No Contacts Found”
• AddContact(“Shane”): Shane is added
• ListContacts(): “Shane”
• RenameContact(“Shane”, “Bob”): Shane is now Bob
• ListContacts(): “Bob”
• RemoveContact(“Bob”): Bob is now deleted
• ListContacts(): “No Contacts Found”

WinFS Test Stressed Execution

• ListContacts() : “No Contacts Found”

• ReachMemory(8MB): 8MB Available

• AddContact(“Shane”) : Shane should be added

• ListContacts(): “No Contacts Found”

• Process Exits

Indigo Test Specifications

• Client::SendMessage():
  – Sends message to server and prints confirmation of sending.
• Client::ReceiveMessage():
  – Prints received message.
• Server::SendMessage():
  – Sends message to client and prints confirmation of sending.
• Server::ReceiveMessage():
  – Prints message and responds with SendMessage()

Indigo Test Behavior

• Normal Execution
  – Client::SendMessage()
  – Server::ReceiveMessage()
  – Server::SendMessage()
  – Client::ReceiveMessage()
• Execution with Memory Pressure
  – Client::SendMessage()
  – Server::ReceiveMessage()
  – Server::ExhaustMemory()
  – Server::SendMessage()
  – Client never receives message

Solutions

• Transactions
  – In Memory
  – Durable (backed by disk)
• Recovery
  – Creates Recovery Log
  – Allows state restore

Transaction Participant

public TransactionParticipant(String _originalValue)
{
    originalValue = _originalValue;
    result = originalValue;
}

public void Prepare(IPreparingEnlistment pe)
{
    // do work for transaction
    result = "New Value";
    // all is well, vote prepared
    pe.Prepared();
}

Transaction Participant Continued

public void Commit(IEnlistment e)
{
    // no work to do, vote done
    e.EnlistmentDone();
}

public void Rollback(IEnlistment e)
{
    // restore originalValue
    result = originalValue;
    if ( null != e ) e.EnlistmentDone();
}

Simple Transaction Example

TransactionParticipant tp = new TransactionParticipant(txtInput.Text);
try
{
    using (TransactionScope s = new TransactionScope())
    {
        Transaction.Current.VolatileEnlist(tp, false);
        s.Consistent = true;
    }
}
catch (TransactionAbortedException) {}
txtInput.Text = tp.Result;
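The participant and scope code on these slides uses pre-release names (`IPreparingEnlistment`, `IEnlistment`, `VolatileEnlist`, `s.Consistent`). For readers who want to try the same idea today, here is a sketch of a roughly equivalent in-memory participant written against the released `System.Transactions` API (`IEnlistmentNotification`, `EnlistVolatile`, `TransactionScope.Complete`). The `ValueParticipant` and `TransactionSketch` names are hypothetical, and the mapping from the slide's API to the released one is an assumption based on the similar shape of the calls. On .NET Framework the project needs a reference to the System.Transactions assembly.

```csharp
using System;
using System.Transactions;

class ValueParticipant : IEnlistmentNotification
{
    private readonly string originalValue;
    public string Result { get; private set; }

    public ValueParticipant(string originalValue)
    {
        this.originalValue = originalValue;
        Result = originalValue;
    }

    public void Prepare(PreparingEnlistment pe)
    {
        Result = "New Value";  // do the work for the transaction
        pe.Prepared();         // all is well, vote prepared
    }

    public void Commit(Enlistment e)
    {
        e.Done();              // no further work to do
    }

    public void Rollback(Enlistment e)
    {
        Result = originalValue;  // restore the original value
        e.Done();
    }

    public void InDoubt(Enlistment e)
    {
        e.Done();
    }
}

static class TransactionSketch
{
    // Runs one transaction over the participant and returns its final value.
    public static string Run(string input, bool complete)
    {
        ValueParticipant tp = new ValueParticipant(input);
        try
        {
            using (TransactionScope scope = new TransactionScope())
            {
                Transaction.Current.EnlistVolatile(tp, EnlistmentOptions.None);
                if (complete) scope.Complete();  // vote to commit
            }
        }
        catch (TransactionAbortedException) { }
        return tp.Result;
    }

    static void Main()
    {
        Console.WriteLine(Run("Old Value", true));
        Console.WriteLine(Run("Old Value", false));
    }
}
```

When the scope is completed, the participant is asked to prepare and its value becomes "New Value"; when it is not, the transaction rolls back and the original value is kept.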

rNotepad Techniques

• Log user work
  – KeyPressed Records
  – Resize Records
• Write work to log file every second
• Write checkpoint every 30 seconds
• Upon startup, recover
  – Checkpoint speeds up recovery
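The slides do not give rNotepad's log format, so the following is only a sketch of why a checkpoint speeds up recovery. It assumes a hypothetical line-based log in which `C|<text>` is a checkpoint holding the full document text and `K|<chars>` is a key-press record. Resize records are left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class RecoverySketch
{
    // Rebuilds the document text from the log. Recovery starts at the most
    // recent checkpoint, so older records never have to be replayed.
    public static string Recover(IList<string> log)
    {
        StringBuilder text = new StringBuilder();
        int start = 0;

        // Scan backwards for the last checkpoint record.
        for (int i = log.Count - 1; i >= 0; i--)
        {
            if (log[i].StartsWith("C|", StringComparison.Ordinal))
            {
                text.Append(log[i].Substring(2));
                start = i + 1;
                break;
            }
        }

        // Replay only the key presses logged after that checkpoint.
        for (int i = start; i < log.Count; i++)
        {
            if (log[i].StartsWith("K|", StringComparison.Ordinal))
            {
                text.Append(log[i].Substring(2));
            }
        }
        return text.ToString();
    }

    static void Main()
    {
        List<string> log = new List<string> { "K|h", "K|i", "C|hi", "K|!" };
        Console.WriteLine(Recover(log));  // hi!
    }
}
```

Without any checkpoint the loop replays the whole log from the first record, which gives the same result but takes time proportional to the full editing history.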

Conclusion

• Testing is difficult but possible

• Temporary memory pressure shouldn’t cause failures

• Transactions and Recovery can provide resilient and recoverable solutions

Questions?

• More info at http://windows/sites/reliavuls/CLR/default.aspx
