fine-grained fault tolerance using device...

104
Fine-Grained Fault Tolerance using Device Checkpoints Asim Kadav with Matthew Renzelmann and Michael M. Swift University of Wisconsin-Madison 1

Upload: lyxuyen

Post on 05-May-2018

218 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Fine-Grained Fault Tolerance using Device Checkpoints

Asim Kadavwith Matthew Renzelmann and Michael M. Swift

University of Wisconsin-Madison

1

Page 2: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

The (old) elephant in the room

2

device drivers

(majority of kernel code)

3rd party developers

+

OSkernel

2

Page 3: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

The (old) elephant in the room

2

device drivers

(majority of kernel code)

3rd party developers

+

OSkernel

2

Page 4: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

The (old) elephant in the room

2

device drivers

(majority of kernel code)

3rd party developers

+

OSkernel

Recipe for

disaster

2

Page 5: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Improvement System Validation Validation Validation Improvement SystemDrivers Bus Classes

Isolation Nooks [SOSP 03] 6 1 2

XFI [OSDI 06] 2 1 1

CuriOS [OSDI 08] 2 1 2

Type Safety SafeDrive [OSDI 06] 6 2 3

Singularity [Eurosys 06] 1 1 1

Specification Nexus [OSDI 08] 2 1 2

Termite [SOSP 09] 2 1 2

Recovery Shadow Drivers [OSDI 04] 13 1 3

Static analysis tools Windows SDV [Eurosys 06] All All All

Coverity [CACM 10] All All All

Cocinelle [Eurosys 08] All All All

3

Extensive past work on reliability research

3

Page 6: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Improvement System Validation Validation Validation Improvement SystemDrivers Bus Classes

Isolation Nooks [SOSP 03] 6 1 2

XFI [OSDI 06] 2 1 1

CuriOS [OSDI 08] 2 1 2

Type Safety SafeDrive [OSDI 06] 6 2 3

Singularity [Eurosys 06] 1 1 1

Specification Nexus [OSDI 08] 2 1 2

Termite [SOSP 09] 2 1 2

Recovery Shadow Drivers [OSDI 04] 13 1 3

Static analysis tools Windows SDV [Eurosys 06] All All All

Coverity [CACM 10] All All All

Cocinelle [Eurosys 08] All All All

3

Extensive past work on reliability research

3

Page 7: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Improvement System Validation Validation Validation Improvement SystemDrivers Bus Classes

Isolation Nooks [SOSP 03] 6 1 2

XFI [OSDI 06] 2 1 1

CuriOS [OSDI 08] 2 1 2

Type Safety SafeDrive [OSDI 06] 6 2 3

Singularity [Eurosys 06] 1 1 1

Specification Nexus [OSDI 08] 2 1 2

Termite [SOSP 09] 2 1 2

Recovery Shadow Drivers [OSDI 04] 13 1 3

Static analysis tools Windows SDV [Eurosys 06] All All All

Coverity [CACM 10] All All All

Cocinelle [Eurosys 08] All All All

3

Extensive past work on reliability research

3

Page 8: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Improvement System Validation Validation Validation Improvement SystemDrivers Bus Classes

Isolation Nooks [SOSP 03] 6 1 2

XFI [OSDI 06] 2 1 1

CuriOS [OSDI 08] 2 1 2

Type Safety SafeDrive [OSDI 06] 6 2 3

Singularity [Eurosys 06] 1 1 1

Specification Nexus [OSDI 08] 2 1 2

Termite [SOSP 09] 2 1 2

Recovery Shadow Drivers [OSDI 04] 13 1 3

Static analysis tools Windows SDV [Eurosys 06] All All All

Coverity [CACM 10] All All All

Cocinelle [Eurosys 08] All All All

3

Observation 1: Solutions that limit changes to kernel and apply to lots of drivers have real impact

Extensive past work on reliability research

3

Page 9: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Improvement System Validation Validation Validation Improvement SystemDrivers Bus Classes

Isolation Nooks [SOSP 03] 6 1 2

XFI [OSDI 06] 2 1 1

CuriOS [OSDI 08] 2 1 2

Type Safety SafeDrive [OSDI 06] 6 2 3

Singularity [Eurosys 06] 1 1 1

Specification Nexus [OSDI 08] 2 1 2

Termite [SOSP 09] 2 1 2

Recovery Shadow Drivers [OSDI 04] 13 1 3

Static analysis tools Windows SDV [Eurosys 06] All All All

Coverity [CACM 10] All All All

Cocinelle [Eurosys 08] All All All

3

Extensive past work on reliability research

Observation 2: Most systems focus on improving isolation and detection and not on recovery

3

Page 10: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Driver failure recovery limited to driver restart

★ Restart driver upon failure★ Safedrive and MINIX approach★ Can break applications

Device Driver

Device

Driver-Kernel Interface

4

Applications

Kernel

Shadow drivers

4

Page 11: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Driver failure recovery limited to driver restart

★ Restart driver upon failure★ Safedrive and MINIX approach★ Can break applications

Device Driver

Device

Driver-Kernel Interface

4

Applications

Kernel

Shadow drivers

4

Page 12: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Driver failure recovery limited to driver restart

★ Restart driver upon failure★ Safedrive and MINIX approach★ Can break applications

Device Driver

Device

Shadow Driver

Driver-Kernel Interface

4

Applications

Kernel

★ Restart and replay upon failure★ Shadow driver approach ★ Always record state of driver★ Perform restart and log replay

upon failure★ Transparent to applications

Shadow drivers

4

Page 13: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Problem 1: Restart based driver recovery is slow

5

0ms

500ms

1,000ms

1,500ms

2,000ms

8139too e1000 ens1371 psmouse

Restart times

net net sound input

5

Page 14: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Problem 1: Restart based driver recovery is slow

5

Shadow drivers restart the driver upon failure which can be slow

0ms

500ms

1,000ms

1,500ms

2,000ms

8139too e1000 ens1371 psmouse

Restart times

net net sound input

5

Page 15: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Driver re-initialization probes hardware again

6

Allocate device structures

Set chipset specific ops

Map BAR and I/O ports

Register device operations

Detect chipset capabilities

Cold boot device Verify EEPROM checksum

Device self test

Configure device

Device ready

6

Page 16: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Driver re-initialization probes hardware again

6

Allocate device structures

Set chipset specific ops

Map BAR and I/O ports

Register device operations

Detect chipset capabilities

Cold boot device Verify EEPROM checksum

Device self test

Configure device

Device ready

6

Page 17: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Driver re-initialization probes hardware again

6

★ What does slow device re-initialization hurt?★ Fault tolerance: Driver recovery★ Virtualization: Live migration ★ OS functions: Fast reboot

Allocate device structures

Set chipset specific ops

Map BAR and I/O ports

Register device operations

Detect chipset capabilities

Cold boot device Verify EEPROM checksum

Device self test

Configure device

Device ready

6

Page 18: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Problem 2: Shadow drivers assume drivers follow class behavior

7

★ Class definition includes:★ Callbacks registered with the bus,

device and kernel subsystem

networkdriver

bus

net devicesubsystem

kernel

probe

xmit

confignetwork

card

shadow drivers

7

Page 19: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Problem 2: Shadow drivers assume drivers follow class behavior

7

How many drivers follow class behavior and how much code does this add and

★ Class definition includes:★ Callbacks registered with the bus,

device and kernel subsystem

networkdriver

bus

net devicesubsystem

kernel

probe

xmit

confignetwork

card

shadow drivers

7

Page 20: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Problem 2(a): Drivers do behave outside class definitions

★ Non-class behavior that affects recovery:- procfs/sysfs interactions and unique ioctls

8

$  echo  1  >  /sys/class/sound/mixer/device/enable

Windows WLAN card config via private ioctls

Linux sound card config via sysfs

8

Page 21: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Problem 2(a): Drivers do behave outside class definitions

★ Non-class behavior that affects recovery:- procfs/sysfs interactions and unique ioctls

8

At least 16% of drivers have non-class behavior and may not recover correctly using shadow drivers

$  echo  1  >  /sys/class/sound/mixer/device/enable

Windows WLAN card config via private ioctls

Linux sound card config via sysfs

8

Page 22: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Problem 2(b): Too many classes

9★ “Understanding Modern Device Drivers” ASPLOS 2012

ata (1%)

cdrom

ide

md (RAID)

mmc

network RAID

mtd (1.5%)scsi (9.6%)floppy

tape

acpiblue tooth

crypto

fire wire

gpu (3.9%)

inputjoy stick

key board

mouse

touch screentablet game port

serio

leds

media (10.5%)

isdn (3.4%)

sound (10%)

pcm

midi

mixer

thermal

tty

char (52%)

block (16%)net (27%)

other (5%)

atm

ethernet

infiniband

wireless

wimax

token ring

Linux

Device Drivers

gpio

tpmserial

display

lcd

back light

video (5.2%)

pata

disk

sata

disk

fiber channel

iscsi

usb-storageosd

raid

drm

vga

bus drivers

xen/lguest

dma/pci libs

video

radio

digital video broadcasting

wan

uwb

driver libraries

9

Page 23: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Problem 2(b): Too many classes

9★ “Understanding Modern Device Drivers” ASPLOS 2012

ata (1%)

cdrom

ide

md (RAID)

mmc

network RAID

mtd (1.5%)scsi (9.6%)floppy

tape

acpiblue tooth

crypto

fire wire

gpu (3.9%)

inputjoy stick

key board

mouse

touch screentablet game port

serio

leds

media (10.5%)

isdn (3.4%)

sound (10%)

pcm

midi

mixer

thermal

tty

char (52%)

block (16%)net (27%)

other (5%)

atm

ethernet

infiniband

wireless

wimax

token ring

Linux

Device Drivers

gpio

tpmserial

display

lcd

back light

video (5.2%)

pata

disk

sata

disk

fiber channel

iscsi

usb-storageosd

raid

drm

vga

bus drivers

xen/lguest

dma/pci libs

video

radio

digital video broadcasting

wan

uwb

driver libraries

Class-specific driver recovery leads to a large kernel recovery subsystem

9

Page 24: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Fine-Grained Fault Tolerance (FGFT)

10

10

Page 25: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Fine-Grained Fault Tolerance (FGFT)

10

Fine-grained Isolation

★ Runs driver entry points like transactions

★ Relies on code generation to limit new code in kernel

10

Page 26: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Fine-Grained Fault Tolerance (FGFT)

10

Fine-grained Isolation

★ Runs driver entry points like transactions

★ Relies on code generation to limit new code in kernel

Checkpoint-based recovery

★ Provides fast and correct recovery semantics

10

Page 27: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Fine-Grained Fault Tolerance (FGFT)

10

Fine-grained Isolation

★ Runs driver entry points like transactions

★ Relies on code generation to limit new code in kernel

★ Requires incremental overhead/changes to drivers

★ Shifts burden of fault tolerance to faulty code

Checkpoint-based recovery

★ Provides fast and correct recovery semantics

10

Page 28: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Outline

11

Introduction

Evaluation and Conclusions

Fine-grained isolation

Checkpoint-based recovery

11

Page 29: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Unit of fault tolerance: Driver entry point

12

networkdriver

network card

probe

xmit

config

12

Page 30: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Unit of fault tolerance: Driver entry point

12

networkdriver

network card

probe

xmit

config

whole driver isolation

12

Page 31: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Unit of fault tolerance: Driver entry point

12

networkdriver

network card

probe

xmit

config

12

Page 32: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Unit of fault tolerance: Driver entry point

12

networkdriver

network card

probe

xmit

config

FGFT isolation

12

Page 33: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Unit of fault tolerance: Driver entry point

12

★ Provide fault tolerance to specific driver entry points

networkdriver

network card

probe

xmit

config

FGFT isolation

12

Page 34: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Unit of fault tolerance: Driver entry point

12

★ Provide fault tolerance to specific driver entry points

networkdriver

network card

probe

xmit

config

★ Can be applied to untested code or code marked suspicious by static or runtime tools

FGFT isolation

12

Page 35: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

netdev

Transactional support through code generation

13

networkdriver

get ringparam

netdev

13

Page 36: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

netdev

Transactional support through code generation

13

networkdriver

get ringparam

netdev

SFInetwork

driver

stubs

stubs

13

Page 37: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

netdev

Transactional support through code generation

13

networkdriver

get ringparam SFInetwork

driver

stubs

stubs

netdev

13

Page 38: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

netdev

Transactional support through code generation

13

Range Table

Address Access rights

0xffffa000 Read

0xffffa008 Write

0xffffa00a Readnetworkdriver

get ringparam SFInetwork

driver

stubs

stubs

netdev

13

Page 39: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

netdev

Transactional support through code generation

13

Range Table

Address Access rights

0xffffa000 Read

0xffffa008 Write

0xffffa00a Read

★ Detects and recovers from: ★ Memory errors like invalid pointer accesses★ Structural errors like malformed structures★ Processor exceptions like divide by zero, stack corruption

networkdriver

get ringparam SFInetwork

driver

stubs

stubs

netdev

13

Page 40: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

result

netdev

Transactional support through code generation

13

Range Table

Address Access rights

0xffffa000 Read

0xffffa008 Write

0xffffa00a Read

★ Detects and recovers from: ★ Memory errors like invalid pointer accesses★ Structural errors like malformed structures★ Processor exceptions like divide by zero, stack corruption

networkdriver

get ringparam SFInetwork

driver

stubs

stubs

netdev

netdev

13

Page 41: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Outline

14

Introduction

Conclusion

Fine-grained isolation

Checkpoint-based recovery

14

Page 42: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Checkpointing drivers is hard★Easy to capture memory state

15

networkdriver

network card

15

Page 43: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Checkpointing drivers is hard★Easy to capture memory state

15

networkdriver

network card

checkpoint

15

Page 44: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Checkpointing drivers is hard★Easy to capture memory state

15

networkdriver

network card

checkpoint

★ Device state is not captured★ Device configuration space

15

Page 45: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Checkpointing drivers is hard★Easy to capture memory state

15

networkdriver

network card

checkpoint

★ Device state is not captured★ Device configuration space★ Internal device registers and counters

15

Page 46: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Checkpointing drivers is hard★Easy to capture memory state

15

networkdriver

network card

checkpoint

★ Device state is not captured★ Device configuration space★ Internal device registers and counters★ Memory buffer addresses used for DMA

15

Page 47: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Checkpointing drivers is hard★Easy to capture memory state

15

networkdriver

network card

checkpoint

★ Device state is not captured★ Device configuration space★ Internal device registers and counters★ Memory buffer addresses used for DMA

★ Unique for every device

15

Page 48: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Checkpointing drivers is hard★Easy to capture memory state

15

networkdriver

network card

checkpoint

★ Device state is not captured★ Device configuration space★ Internal device registers and counters★ Memory buffer addresses used for DMA

★ Unique for every device

Intuition: Operating systems already capture device state during power management

15

Page 49: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Intuition with power management

16

★ Refactor power management code for device checkpoints★ Correct: Developer captures unique device semantics ★ Fast: Avoids probe and latency critical for applications

★ Ask developers to export checkpoint/restore in their drivers

16

Page 50: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Device checkpoint/restore from PM code

17

Save config state

Save register state

Disable device

Save DMA state

Suspend device

Restore config state

Restore register state

Restore or reset DMA state

Re-attach/Enable device

Device Ready

Suspend Resume

17

Page 51: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Device checkpoint/restore from PM code

17

Save config state

Save register state

Save DMA state

Suspend device

Restore config state

Restore register state

Restore or reset DMA state

Re-attach/Enable device

Device Ready

Suspend Resume

17

Page 52: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Device checkpoint/restore from PM code

17

Save config state

Save register state

Save DMA state

Restore config state

Restore register state

Restore or reset DMA state

Re-attach/Enable device

Device Ready

Suspend Resume

17

Page 53: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Device checkpoint/restore from PM code

17

Save config state

Save register state

Save DMA state

Restore config state

Restore register state

Restore or reset DMA state

Re-attach/Enable device

Device Ready

Suspend Resume

17

Page 54: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Device checkpoint/restore from PM code

17

Save config state

Save register state

Save DMA state

Restore config state

Restore register state

Restore or reset DMA state

Re-attach/Enable device

Device Ready

Resume Checkpoint

17

Page 55: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Device checkpoint/restore from PM code

17

Save config state

Save register state

Save DMA state

Restore config state

Restore register state

Restore or reset DMA state

Re-attach/Enable device

Resume Checkpoint

17

Page 56: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Device checkpoint/restore from PM code

17

Save config state

Save register state

Save DMA state

Restore config state

Restore register state

Restore or reset DMA state

Resume Checkpoint

17

Page 57: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Device checkpoint/restore from PM code

17

Save config state

Save register state

Save DMA state

Restore config state

Restore register state

Restore or reset DMA state

RestoreCheckpoint

17

Page 58: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Device checkpoint/restore from PM code

17

Save config state

Save register state

Save DMA state

Restore config state

Restore register state

Restore or reset DMA state

Suspend/resume code provides device checkpoint functionality

RestoreCheckpoint

17

Page 59: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Synergy of isolation and fast checkpoints

18

netdev

networkdriver

netdev

18

Page 60: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Synergy of isolation and fast checkpoints

18

xmit

netdev

networkdriver

netdev

18

Page 61: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Synergy of isolation and fast checkpoints

18

netdev

networkdriver

netdev

get ringparam

18

Page 62: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Synergy of isolation and fast checkpoints

18

netdev

networkdriver

netdev

C

get ringparam

18

Page 63: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Synergy of isolation and fast checkpoints

18

netdev

networkdriver

netdev

SFInetwork

driver

stubs

stubs

C

get ringparam

18

Page 64: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Synergy of isolation and fast checkpoints

18

netdev

networkdriver

netdev

SFInetwork

driver

stubs

stubs

C

get ringparam

18

Page 65: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Synergy of isolation and fast checkpoints

18

netdev netdevnetdev

networkdriver

SFInetwork

driver

stubs

stubs

C

get ringparam

18

Page 66: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Synergy of isolation and fast checkpoints

18

netdev netdevnetdev

networkdriver

SFInetwork

driver

stubs

stubs

C

get ringparam

18

Page 67: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Synergy of isolation and fast checkpoints

18

netdev netdevnetdev Range Table

Address Access rights

0xffffa000 Read

0xffffa008 Write

0xffffa00a Readnetworkdriver

SFInetwork

driver

stubs

stubs

C

get ringparam

18

Page 68: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Synergy of isolation and fast checkpoints

18

netdev netdevnetdev Range Table

Address Access rights

0xffffa000 Read

0xffffa008 Write

0xffffa00a Readnetworkdriver

SFInetwork

driver

stubs

stubs

C

get ringparam

18

Page 69: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Synergy of isolation and fast checkpoints

18

netdev netdevnetdev Range Table

Address Access rights

0xffffa000 Read

0xffffa008 Write

0xffffa00a Readnetworkdriver

SFInetwork

driver

stubs

stubs

C

get ringparam

18

Page 70: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Synergy of isolation and fast checkpoints

18

err R

netdev netdevnetdev Range Table

Address Access rights

0xffffa000 Read

0xffffa008 Write

0xffffa00a Readnetworkdriver

SFInetwork

driver

stubs

stubs

C

get ringparam

18

Page 71: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Synergy of isolation and fast checkpoints

18

err R

FGFT provides transactional execution of driver entry points

netdev netdevnetdev Range Table

Address Access rights

0xffffa000 Read

0xffffa008 Write

0xffffa00a Readnetworkdriver

SFInetwork

driver

stubs

stubs

C

get ringparam

18

Page 72: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

How does this give us transactional execution?

19

19

Page 73: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

How does this give us transactional execution?

19

★ Atomicity: All or nothing execution★ Driver state: Run code in SFI module★ Device state: Explicitly checkpoint/restore state

19

Page 74: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

How does this give us transactional execution?

19

★ Atomicity: All or nothing execution★ Driver state: Run code in SFI module★ Device state: Explicitly checkpoint/restore state

★ Isolation: Serialization to hide incomplete transactions★ Re-use existing device locks to lock driver★ Two phase locking

19

Page 75: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

How does this give us transactional execution?

19

★ Atomicity: All or nothing execution★ Driver state: Run code in SFI module★ Device state: Explicitly checkpoint/restore state

★ Isolation: Serialization to hide incomplete transactions★ Re-use existing device locks to lock driver★ Two phase locking

★ Consistency: Only valid (kernel, driver and device) states★ Higher level mechanisms to rollback external actions★ At most once device action guarantee to applications

19

Page 76: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Outline

20

Introduction

Evaluation & Conclusions

Fine-grained isolation

Checkpoint-based recovery

20

Page 77: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Evaluation platform

21

★ Criterion :★ Latency of recovery: How fast is it?★ Correctness of recovery: How well does it work?★ Incremental effort: How much work is it?★ Performance: How much does it cost?

21

Page 78: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Evaluation platform

21

★ Platform : ★ Implemented in Linux 2.6.29★ 2.5 GHz Intel Core 2 Quad

core w/ 4 GB DDR2 DRAM ★ Six drivers across three classes

★ Criterion :★ Latency of recovery: How fast is it?★ Correctness of recovery: How well does it work?★ Incremental effort: How much work is it?★ Performance: How much does it cost?

Driver Class Bus

8139too net PCIe1000 net PCI

r8169 net PCI

pegasus net USB

psmouse sound PCI

ens1371 input serio

21

Page 79: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Recovery speedup

22

8139too e1000 pegasus r8169 ens1371 psmouse0ms

500ms

1,000ms

1,500ms

2,000msRestart recoveryFGFT recovery

Recovery times

22

Page 80: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Recovery speedup

22

8139too e1000 pegasus r8169 ens1371 psmouse0ms

500ms

1,000ms

1,500ms

2,000ms

680.00

1030.00

120.00150.00

1800.00

310.00

Restart recoveryFGFT recovery

Recovery times

22

Page 81: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Recovery speedup

22

8139too e1000 pegasus r8169 ens1371 psmouse0ms

500ms

1,000ms

1,500ms

2,000ms

680.00

1030.00

120.00150.00

1800.00

310.00410.00

115.000.045.00

295.00

0.07

Restart recoveryFGFT recovery

Recovery times

22

Page 82: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Recovery speedup

22

FGFT provides significant speedup in driver recovery and improves system availability

8139too e1000 pegasus r8169 ens1371 psmouse0ms

500ms

1,000ms

1,500ms

2,000ms

680.00

1030.00

120.00150.00

1800.00

310.00410.00

115.000.045.00

295.00

0.07

Restart recoveryFGFT recovery

Recovery times

22

Page 83: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Static and dynamic fault injection

Driver Injected Faults

Native Crashes

8139too 43 43e1000 47 47

r8169 36 36pegasus 34 33ens1371 22 21

psmouse 46 46TOTAL 258 256

23

23

Page 84: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Static and dynamic fault injection

Driver Injected Faults

Native Crashes

FGFT Crashes

8139too 43 43 NONEe1000 47 47 NONE

r8169 36 36 NONEpegasus 34 33 NONEens1371 22 21 NONE

psmouse 46 46 NONETOTAL 258 256 NONE

23

23

Page 85: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Static and dynamic fault injection

Driver Injected Faults

Native Crashes

FGFT Crashes

8139too 43 43 NONEe1000 47 47 NONE

r8169 36 36 NONEpegasus 34 33 NONEens1371 22 21 NONE

psmouse 46 46 NONETOTAL 258 256 NONE

23

FGFT recovers from multiple failures : 1) restores non-class state and 2) does not affect other threads

23

Page 86: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Programming effort

Driver LOC Isolation annotationsIsolation annotations Recovery additionsRecovery additions

Driverannotations

Kernelannotations

LOC Moved LOC Added

8139too 1, 904 15 20 26 4

e1000 13, 973 32 32 10r8169 2, 993 10 17 5pegasus 1, 541 26 12 22 5ens1371 2, 110 23 66 16 6psmouse 2, 448 11 19 19 6

24

24

Page 87: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Programming effort

Driver LOC Isolation annotationsIsolation annotations Recovery additionsRecovery additions

Driverannotations

Kernelannotations

LOC Moved LOC Added

8139too 1, 904 15 20 26 4

e1000 13, 973 32 32 10r8169 2, 993 10 17 5pegasus 1, 541 26 12 22 5ens1371 2, 110 23 66 16 6psmouse 2, 448 11 19 19 6

24

FGFT requires a loadable kernel module (1200 LOC) and 38 lines of kernel changes to trap processor exceptions

24

Page 88: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Throughput with isolation and recovery

NativeFGFT-­‐I/O-­‐allFGFT-­‐off-­‐I/OFGFT-­‐I/O-­‐1/2

netperf on Intel quad-core machines25

25

Page 89: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Throughput with isolation and recovery

0

25

50

75

100

Thr

ough

put

%ag

e (B

asel

ine

844

Mbp

s)

e1000 Network Card

NativeFGFT-­‐I/O-­‐allFGFT-­‐off-­‐I/OFGFT-­‐I/O-­‐1/2

netperf on Intel quad-core machines25

25

Page 90: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Throughput with isolation and recovery

0

25

50

75

100100

Thr

ough

put

%ag

e (B

asel

ine

844

Mbp

s)

e1000 Network Card

NativeFGFT-­‐I/O-­‐allFGFT-­‐off-­‐I/OFGFT-­‐I/O-­‐1/2

netperf on Intel quad-core machines25

CPU: 2.4%

25

Page 91: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Throughput with isolation and recovery

0

25

50

75

100100

93

Thr

ough

put

%ag

e (B

asel

ine

844

Mbp

s)

e1000 Network Card

NativeFGFT-­‐I/O-­‐allFGFT-­‐off-­‐I/OFGFT-­‐I/O-­‐1/2

netperf on Intel quad-core machines25

CPU: 2.4% 2.4%

25

Page 92: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Throughput with isolation and recovery

0

25

50

75

100100

93100

Thr

ough

put

%ag

e (B

asel

ine

844

Mbp

s)

e1000 Network Card

NativeFGFT-­‐I/O-­‐allFGFT-­‐off-­‐I/OFGFT-­‐I/O-­‐1/2

netperf on Intel quad-core machines25

CPU: 2.4% 2.4% 3.4%

25

Page 93: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Throughput with isolation and recovery

0

25

50

75

100100

93100

96

Thr

ough

put

%ag

e (B

asel

ine

844

Mbp

s)

e1000 Network Card

NativeFGFT-­‐I/O-­‐allFGFT-­‐off-­‐I/OFGFT-­‐I/O-­‐1/2

netperf on Intel quad-core machines25

CPU: 2.4% 2.4% 2.9%3.4%

25

Page 94: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Throughput with isolation and recovery

0

25

50

75

100100

93100

96

Thr

ough

put

%ag

e (B

asel

ine

844

Mbp

s)

e1000 Network Card

NativeFGFT-­‐I/O-­‐allFGFT-­‐off-­‐I/OFGFT-­‐I/O-­‐1/2

netperf on Intel quad-core machines25

CPU: 2.4% 2.4% 2.9%3.4%

FGFT can isolate and recover high bandwidth devices at low overhead without adding kernel subsystems

25

Page 95: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Summary

26

26

Page 96: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Summary

26

★ FGFT runs driver code as transactions★ Provides fault tolerance at incremental

performance and programmer efforts

★ Introduced device checkpoints★ Provides fast and complete recovery semantics

★ Fast device checkpoints should be explored in other domains like fast reboot, upgrade etc.

26

Page 97: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Questions

Asim Kadav★ http://cs.wisc.edu/~kadav★ [email protected]★ Graduating in spring!

27

Page 98: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Extra slides

★ Unlike suspend, devices continue to be accessed after a checkpoint★ Rely on drivers following ACPI specifications for

correctness

28

Page 99: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Latency for device checkpoint/restore

Driver Class Bus Checkpoint Times

Restore Times

8139too net PCI 33μs 62μse1000 net PCI 32μs 280msr8169 net PCI 26μs 30μspegasus net USB 0μs 4msens1371 sound PCI 33μs 111mspsmouse input serio 0μs 390ms

29

Fast checkpoint/restore using suspend/resume

29

Page 100: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Transforming drivers to run as FGFT

If  (c==0)  {.print  (“Driver  init”);}..

Driver with annotations

Static modifications30

30

Page 101: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Transforming drivers to run as FGFT

If  (c==0)  {.print  (“Driver  init”);}..

Driver with annotations

Static modifications30

User supplied annotations

Source transformation (adds driver transactions)

30

Page 102: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Transforming drivers to run as FGFT

If  (c==0)  {.print  (“Driver  init”);}..

Driver with annotations

Static modifications30

If  (c==0)  {.print  (“Driver  init”);}..

If  (c==0)  {.print  (“Driver  init”);}..

User supplied annotations

Source transformation (adds driver transactions)

Main driver module

SFI driver module

SFI = software fault isolated

30

Page 103: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Transforming drivers to run as FGFT

If  (c==0)  {.print  (“Driver  init”);}..

Driver with annotations

Static modifications Run-time support30

If  (c==0)  {.print  (“Driver  init”);}..

If  (c==0)  {.print  (“Driver  init”);}..

User supplied annotations

Source transformation (adds driver transactions)

Main driver module

SFI driver module

SFI = software fault isolated

30

Page 104: Fine-Grained Fault Tolerance using Device Checkpointspages.cs.wisc.edu/~kadav/app/fgft-slides.pdf · Fine-Grained Fault Tolerance using Device Checkpoints ... Cold boot device Verify

Transforming drivers to run as FGFT

If  (c==0)  {.print  (“Driver  init”);}..

Driver with annotations

Communication and recovery

support

Static modifications Run-time support30

If  (c==0)  {.print  (“Driver  init”);}..

If  (c==0)  {.print  (“Driver  init”);}..

1200 LOC

User supplied annotations

Source transformation (adds driver transactions)

Object tracking

Marshaling/Demarshaling

Kernel undo log

Main driver module

SFI driver module

SFI = software fault isolated

30