Dept. of Computer Science & Engineering, CUHK

Fault Tolerance and Performance Analysis in Wireless CORBA

Chen Xinyu

2002-12-09

Supervisor: Prof. Michael R. Lyu

Markers: Prof. Jerome Yen, Prof. John C.S. Lui



Outline

Motivation

Wireless CORBA

Fault Tolerant Wireless CORBA

Performance and Availability Analysis

Conclusions and Future Work

Motivation

Mobile Computing
  Permanent failures
    Physical damage
  Transient failures
    Mobile host
    Wireless link
    Environmental conditions

Fault Tolerant CORBA
  Entity replication

Wireless CORBA Architecture

[Figure, shown over several slides: Wireless CORBA architecture. The Visited Domain and the Home Domain each contain Access Bridges and Static Hosts; the Visited Domain's Access Bridges are labelled ab1 and ab2, and the Home Domain also contains the Home Location Agent. The Terminal Domain contains the mobile host mh1 with its Terminal Bridge. The Terminal Bridge is connected to an Access Bridge by a GIOP Tunnel carrying GTP Messages.]


Basic Concepts

Checkpoint: the program's state, saved during failure-free execution

Repair: brings the failed device back to normal operation

Rollback: reloads the program's state saved at the most recent checkpoint

Recovery: reprocessing of the program, starting from the most recent checkpoint and applying the logged messages, up to the point just before the failure
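The four terms above can be illustrated with a minimal sketch. It is not the presentation's implementation: the class `CheckpointedProgram`, its toy state (a running total), and the in-memory checkpoint and log are assumptions made for illustration. In a real system the checkpoint and the log would live in stable storage.

```python
import copy


class CheckpointedProgram:
    """Toy program whose state is a running total of processed messages."""

    def __init__(self):
        self.state = {"total": 0, "processed": 0}
        # Stand-ins for stable storage (hypothetical: plain attributes here).
        self.checkpoint = copy.deepcopy(self.state)
        self.log = []  # messages received since the last checkpoint

    def _apply(self, value):
        self.state["total"] += value
        self.state["processed"] += 1

    def process(self, value):
        self.log.append(value)  # log the message, then process it
        self._apply(value)

    def take_checkpoint(self):
        # Checkpoint: save the state during failure-free execution.
        self.checkpoint = copy.deepcopy(self.state)
        self.log.clear()  # messages before the checkpoint are no longer needed

    def fail(self):
        self.state = None  # volatile state is lost

    def rollback(self):
        # Rollback: reload the state saved at the most recent checkpoint.
        self.state = copy.deepcopy(self.checkpoint)

    def recover(self):
        # Recovery: rollback, then re-apply the logged messages in order.
        self.rollback()
        for value in self.log:
            self._apply(value)


program = CheckpointedProgram()
program.process(1)
program.process(2)
program.take_checkpoint()
program.process(3)
program.fail()
program.recover()  # state is back to what it was just before the failure
```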

Device, Wireless & Mobile Issues

Device Issues
  Slow processor
  Small memory
  Small disk space
  Low power supply
  Physical damage

Wireless Issues
  High bit error rate
  Little bandwidth
  Long transfer delay

Mobile Issue
  Handoff

Applying mobile host as stable storage

A large number of system messages, or a large amount of information carried in one message

Checkpoints and logs collection

Applying Access Bridge as stable storage

Uncoordinated checkpointing

Pessimistic message logging
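Pessimistic message logging with the Access Bridge as stable storage can be sketched as follows. This is an illustration under assumptions, not the presentation's design: the class `AccessBridge`, the method `forward`, and the in-memory list standing in for stable storage are all hypothetical. The only property shown is the defining one of pessimistic logging: a message is logged before it is delivered.

```python
class AccessBridge:
    """Hypothetical Access Bridge that logs every message before delivery."""

    def __init__(self):
        self.stable_log = []  # stands in for stable storage on the fixed network
        self.next_sn = 0      # sequence number assigned to the next message

    def forward(self, message, deliver):
        # Pessimistic logging: write the log entry first, deliver second,
        # so a crash during delivery never loses the message.
        sn = self.next_sn
        self.stable_log.append((sn, message))
        self.next_sn += 1
        deliver(message)
        return sn


received = []
bridge = AccessBridge()
bridge.forward("request-1", received.append)
```

Because the entry is written before `deliver` runs, a failed delivery over the wireless link still leaves the message available for a later replay.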

Fault Tolerance Architecture

[Figure: fault tolerance architecture, divided into a Mobile Side and a Fixed Side.
  Mobile Host: Client Object, Terminal Bridge, Recovery Mechanism, ORB, Platform
  Access Bridge (Mobile Support Station): Recovery Mechanism, Logging Mechanism, ORB, Platform
  Static Server: Object Replica, Recovery Mechanism, Logging Mechanism, ORB, Platform
  Connections: GIOP Tunnel, Multicast Messages]

Mobile Host Handoff

[Figure, shown over two slides: Access Bridge 1, Access Bridge 2, Access Bridge 3 and the Home Location Agent. Labels: Handoff, Location Update.]

Mobile Host Crash

[Figure: Access Bridge 1, Access Bridge 2, Access Bridge 3 and the Home Location Agent.]

Mobile Host Recovery

[Figure: Access Bridge 1, Access Bridge 2 and Access Bridge 3, with the following labels.]

Collect the last checkpoint and the succeeding message logs

Sorted by Ack. SN

Reconnect

Messages Replay
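The recovery labels above can be read as a small procedure, sketched below under assumptions: "Ack. SN" is taken to mean an acknowledgement sequence number, each Access Bridge is modelled as a dictionary holding the checkpoints and log entries it stored, and the program state is a toy list of processed messages. The function name `recover_mobile_host` and the data layout are hypothetical.

```python
def recover_mobile_host(bridges):
    """Rebuild the mobile host's state from data scattered over Access Bridges.

    bridges: list of dicts with
      "checkpoints": list of (ack_sn, state) pairs
      "logs":        list of (ack_sn, message) pairs
    """
    # Collect the last checkpoint, i.e. the one with the highest sequence number.
    checkpoints = [cp for bridge in bridges for cp in bridge["checkpoints"]]
    last_sn, saved_state = max(checkpoints, key=lambda cp: cp[0])

    # Collect the message logs that follow that checkpoint.
    logs = [entry for bridge in bridges for entry in bridge["logs"]
            if entry[0] > last_sn]

    # Sort by acknowledgement sequence number, then replay in that order.
    logs.sort(key=lambda entry: entry[0])
    state = list(saved_state)
    for _, message in logs:
        state.append(message)  # toy replay: processing appends the message
    return state


bridges = [
    {"checkpoints": [(2, ["m1", "m2"])], "logs": [(1, "m1"), (2, "m2")]},
    {"checkpoints": [], "logs": [(4, "m4"), (3, "m3")]},
    {"checkpoints": [(0, [])], "logs": []},
]
state = recover_mobile_host(bridges)
```

Sorting matters because handoffs leave the logs spread over several Access Bridges, so the collected entries do not arrive in processing order.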


Assumptions

Failure occurrences, message arrivals and handoff events are homogeneous Poisson processes, each with its own rate parameter

Failures do not occur while the program is in the repair or rollback process

A failure is detected as soon as it occurs
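The Poisson assumption can be made concrete with a short sampling sketch. A homogeneous Poisson process with a given rate has independent, exponentially distributed gaps between events, which is what the function below uses. The function name `poisson_event_times` and the three rate values are hypothetical and are not taken from the presentation.

```python
import random


def poisson_event_times(rate, horizon, rng):
    """Event times of a homogeneous Poisson process on [0, horizon)."""
    times = []
    t = rng.expovariate(rate)  # first exponential inter-event gap
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


rng = random.Random(2002)
# Hypothetical rates (events per unit time), chosen only for illustration.
failures = poisson_event_times(0.01, 1000.0, rng)
messages = poisson_event_times(1.0, 1000.0, rng)
handoffs = poisson_event_times(0.1, 1000.0, rng)
```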

Execution without Checkpointing

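[Figure: timeline of an execution without checkpointing, running from 0 to t. Labels: failures F1 ... Fj; handoffs H1 ... Hk; intervals X0, Y0, Z0; messages m0(N), m1(n1), mj(1) ... mj(N); X(N). Legend: R (Repair), H (Handoff).]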

Conditional Execution Time & LST (Laplace-Stieltjes Transform)

LST and Expectation of Program Execution Time

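Execution with Equi-number Checkpointing

[Figure: timeline of the i-th checkpointing interval, between checkpoints Ci-1 and Ci, running from 0 to t. Labels: failures Fi(1) ... Fi(j); handoffs Hi(1) ... Hi(k); intervals Xi(0), Yi(0), Zi(0); messages mi0(a), mi1(ni1), mij(1) ... mij(a); Xi(N,a). Legend: R+C (Repair + Rollback), H (Handoff), C (Checkpointing).]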

Conditional Execution Time & LST

LST and Expectation of Program Execution Time

Average Availability

Uptime interval: the program produces useful work towards its completion

Downtime interval:
  Repair and rollback
  Handoff
  Checkpoint creation
  Wasted computation

Average availability: the fraction of an execution's total time during which the mobile host (MH) is in an uptime interval
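The definition above amounts to a ratio of uptime to total time. The sketch below computes that ratio from measured durations. It is only the plain definition, not the closed-form expression derived in the presentation, and the function name and argument names are hypothetical.

```python
def average_availability(useful, repair_rollback, handoff, checkpointing, wasted):
    """Fraction of the total execution time spent in uptime intervals.

    All arguments are accumulated durations in the same time unit.
    """
    downtime = repair_rollback + handoff + checkpointing + wasted
    total = useful + downtime
    if total <= 0:
        raise ValueError("total execution time must be positive")
    return useful / total


# Hypothetical durations: 80 units of useful work out of 100 in total.
availability = average_availability(80.0, 5.0, 5.0, 4.0, 6.0)
```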

Optimal Checkpointing Interval

Beneficial Condition

Equi-number Checkpointing

Equi-number checkpointing with respect to message number
  The number of messages in each checkpointing interval is not changed

Equi-number checkpointing with respect to checkpoint number
  The number of checkpoints is not changed
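One reading of the two variants is sketched below: either the number of messages per interval is held fixed and the number of checkpoints follows from it, or the number of checkpoints is held fixed and the interval length follows. Both function names are hypothetical, and the sketch assumes that no checkpoint is taken after the final message. The presentation does not state this.

```python
def checkpoints_fixed_messages(total_messages, per_interval):
    """Checkpoint after every `per_interval` messages (message number fixed)."""
    if per_interval < 1:
        raise ValueError("per_interval must be at least 1")
    return list(range(per_interval, total_messages, per_interval))


def checkpoints_fixed_count(total_messages, num_checkpoints):
    """Spread `num_checkpoints` checkpoints evenly (checkpoint number fixed)."""
    if not 0 <= num_checkpoints < total_messages:
        raise ValueError("need 0 <= num_checkpoints < total_messages")
    intervals = num_checkpoints + 1
    return [i * total_messages // intervals for i in range(1, intervals)]


# Positions are message counts after which a checkpoint is taken.
by_messages = checkpoints_fixed_messages(10, 3)
by_count = checkpoints_fixed_count(100, 4)
```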

Equi-number Checkpointing with respect to Checkpoint Number

Equi-number Checkpointing with respect to Message Number

Comparison Between Checkpointing and Without Checkpointing

Average Availability vs. Message Arrival Rate and Handoff Rate

Conclusions

Fault tolerant wireless CORBA

Equi-number checkpointing strategy

LST and expectation of program execution time

Average availability

Optimal checkpointing interval

Beneficial condition

Future Work

Analysis model
  The message queuing effect during repair and recovery

Failure detector
  Distributed consensus with link failures, process failures, and mobile disconnections
  Leads to a faster solution
  Reduces communication costs

Fault tolerance in ad hoc networks
  Without infrastructure support
  Self-organizing and adaptive

Thank You