Lecture 4: State- Based Methods CS 7040 Trustworthy System Design, Implementation, and Analysis Spring 2015, Dr. Rozier Adapted from slides by WHS at UIUC

Upload: william-benson

Post on 04-Jan-2016



Introduction

Availability: A motivation for State-Based Methods

• Recall that availability quantifies the alternation between proper and improper service.
– A(t) is 1 if service is proper, 0 otherwise.
– E[A(t)] is the probability that the service is proper at time t.
– A(0,t) is the fraction of time the system delivers proper service during [0,t].

Availability

• For many systems, availability is a more “user-oriented” measure than reliability.
– It is often more difficult to compute, since it must account for repair and/or replacement.

Availability: Example

• A radio transmitter has an expected time to failure of 500 days. Replacement takes an average of 48 hours.

• A cheaper, and less reliable, transmitter has an expected time to failure of 150 days. Due to cost, a replacement is readily available and replacement takes 8 hours on average.

• Reliability:
– for the first transmitter, MTTF = 500 days
– for the second transmitter, MTTF = 150 days

The first transmitter is 3.33x more reliable.

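• Availability = MTTF / (MTTF + mean replacement time), with replacement times converted to days: 500 / (500 + 2) ≈ 0.996 for the more reliable transmitter, and 150 / (150 + 1/3) ≈ 0.998 for the less reliable transmitter.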

Higher reliability doesn’t necessarily mean higher availability!
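
The comparison can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the steady-state relation availability = MTTF / (MTTF + mean replacement time); the function name is illustrative and not part of the lecture.

```python
def steady_state_availability(mttf_days, replacement_hours):
    """Long-run fraction of time in service: MTTF / (MTTF + replacement time)."""
    replacement_days = replacement_hours / 24.0  # convert hours to days
    return mttf_days / (mttf_days + replacement_days)

# Figures taken from the transmitter example
reliable = steady_state_availability(500, 48)  # 500 / 502
cheap = steady_state_availability(150, 8)      # 150 / (150 + 1/3)

print(f"more reliable transmitter: {reliable:.5f}")  # 0.99602
print(f"less reliable transmitter: {cheap:.5f}")     # 0.99778
```

The cheaper transmitter fails more often, but its short replacement time more than compensates.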

Availability Modeling using Combinatorial Methods

• Availability modeling can be done with combinatorial methods, but only with independent repair assumptions, and exponential life-time distributions.

• This uses the theory of ON/OFF processes

“On” time distribution is reliability distribution, obtained using combinatorial methods. We represent mean as E[On].

“Off” distribution is the repair time distribution. Mean is E[Off].

Availability Modeling using Combinatorial Methods

Availability is the fraction of the time in the On state.

• Asymptotically, if instances of On periods are independent and identically distributed (i.i.d.) and instances of Off periods are i.i.d., then:

A = E[On] / (E[On] + E[Off])

Availability Modeling using Combinatorial Methods

• Lots of assumptions that may not hold!
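
The long-run ratio E[On] / (E[On] + E[Off]) can be checked by simulating an ON/OFF process. The sketch below assumes exponentially distributed On and Off periods purely for illustration (any i.i.d. periods with finite means would do), reuses the cheaper transmitter's figures, and uses a fixed seed; the function name is hypothetical.

```python
import random

def simulate_on_off(mean_on, mean_off, cycles, seed=0):
    """Estimate availability as total On time / total elapsed time."""
    rng = random.Random(seed)
    total_on = 0.0
    total_off = 0.0
    for _ in range(cycles):
        # expovariate takes a rate, i.e. 1 / mean
        total_on += rng.expovariate(1.0 / mean_on)
        total_off += rng.expovariate(1.0 / mean_off)
    return total_on / (total_on + total_off)

mean_on = 150.0        # days until failure (assumed exponential)
mean_off = 8.0 / 24.0  # days to replace (assumed exponential)

estimate = simulate_on_off(mean_on, mean_off, cycles=200_000)
exact = mean_on / (mean_on + mean_off)
print(f"simulated: {estimate:.5f}  formula: {exact:.5f}")
```

The simulation only confirms the formula under the stated independence assumptions; it says nothing about systems where repairs or failures are coupled.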

State-Based Methods

• More accurate modeling can be achieved with state-based methods.

– State-based methods relax the independence assumptions needed for combinatorial modeling!

– Failures need not be independent. Failure of one component may make another component more or less likely to fail!

– Repairs need not be independent. Repair and replacement strategies are an important component that must be modeled in high-availability systems.

– High availability systems may operate in a degraded mode. In a degraded mode, the system may deliver only a fraction of its services. The repair process may start only when a system is sufficiently degraded.

Repairs and Independence

NCSA’s Blue Waters

• 288 cabinets
• 26.4 PB of usable storage
• Let’s say the MTTF of a disk is 3 years.
• 2 TB per disk
• > 27,000 TB of usable storage
• ~ 34,000 TB of actual storage (RAID)

… 17,000 disks

Rate of failure for ANY disk: 17,000 disks / (3 years × 365 days per year) ≈ 15.5 failures per day

NCSA’s Blue Waters

• We are losing 15+ drives every day.

• But loss of a drive isn’t necessarily loss of data, right?
– 17,000 drives in 8+2 means 1,700 RAID groups.
– Chance of RAID failure on day 1 is only 6%.
– 24% on day 2.

• Rather than replace every drive immediately when it fails, let’s replace many drives all at once.
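
The disk counts and the daily failure rate follow from simple arithmetic. This is a rough sketch, assuming each disk fails independently at a constant rate of 1/MTTF, a 365-day year, and that "8+2" means ten disks per RAID group; it does not attempt to reproduce the 6% and 24% RAID-failure figures, whose underlying model is not given here.

```python
raw_storage_tb = 34_000    # actual (RAID) storage
disk_size_tb = 2
disk_mttf_years = 3
days_per_year = 365        # assumption: ignore leap years
disks_per_raid_group = 10  # assumption: 8 data + 2 parity disks

num_disks = raw_storage_tb // disk_size_tb
# Superposition of independent constant-rate failures: rates simply add
failures_per_day = num_disks / (disk_mttf_years * days_per_year)
raid_groups = num_disks // disks_per_raid_group

print(num_disks, round(failures_per_day, 1), raid_groups)  # 17000 15.5 1700
```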

State-Based Methods

• We use random processes to model these systems.

• We use “state” to “remember” the conditions leading to dependencies.

Random Processes

• A random process is a collection of random variables indexed by time.
– Useful for characterizing the behavior of real systems.

• X(t) is a random process. Let X(1) be the result of tossing a die. Let X(2) be the result of tossing a die plus X(1), and so on. Notice that the time index set is T = {1, 2, 3, …}.

P[X(2) = 12] = 1/36
P[X(3) = 14 | X(1) = 2] = 1/36
E[X(n)] = 3.5n

Etc.
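
These values can be verified exactly by building the distribution of the running sum one roll at a time. The sketch below uses exact fractions; the helper names are illustrative. Conditioning on X(1) = 2 is handled by starting the sum at 2, which is valid because later rolls are independent of earlier ones.

```python
from fractions import Fraction

# A fair six-sided die
DIE = {face: Fraction(1, 6) for face in range(1, 7)}

def add_roll(dist):
    """Distribution of (value + one more die roll), given the distribution of value."""
    out = {}
    for value, p in dist.items():
        for face, q in DIE.items():
            out[value + face] = out.get(value + face, Fraction(0)) + p * q
    return out

def distribution_after(rolls, start=None):
    """Distribution of the running sum after `rolls` further rolls."""
    dist = dict(start) if start is not None else {0: Fraction(1)}
    for _ in range(rolls):
        dist = add_roll(dist)
    return dist

p_x2_is_12 = distribution_after(2)[12]
# Given X(1) = 2, X(3) is 2 plus two more rolls
p_x3_is_14_given_x1_is_2 = distribution_after(2, start={2: Fraction(1)})[14]
mean_x3 = sum(v * p for v, p in distribution_after(3).items())

print(p_x2_is_12, p_x3_is_14_given_x1_is_2, mean_x3)  # 1/36 1/36 21/2
```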

Random Processes

• If X is a random process, X(t) is a random variable.

• A random variable Y is a function that maps elements of the sample space Ω to elements of S ⊆ ℝ.

• A random process X maps elements in the two-dimensional space Ω × T to elements in S ⊆ ℝ.

Random Processes

• A sample path of X is the history of sample space values X adopts as a function of time.

• When we fix t, X becomes a function of Ω (a random variable).
• When we fix ω, X becomes a function of T (a sample path).
• By fixing ω (e.g., the system is available) and observing X as a function of T, we see a trajectory of the process sampling ω or not.

Describing a Random Process

• Recall: for a random variable X, we can use the cumulative distribution to describe the random variable.

• In general, no such simple description exists for a random process.

Describing a Random Process

• A random process can often be described succinctly in various ways. For example, say Y is a random variable representing the roll of a die, and X(t) is the sum after t rolls. We can describe X(t) as:

X(t) – X(t-1) = Y,
P[X(t) = i | X(t-1) = j] = P[Y = i – j],

or X(t) = Y_1 + Y_2 + … + Y_t, where each Y_i is independent and distributed like Y.

Classifying a Random Process: Characteristics of T

• If the number of time points defined for a random process, i.e. |T|, is finite or countable, then the random process is said to be a discrete-time random process.

• If |T| is uncountable, then the random process is said to be a continuous-time random process.

Let X(t) be the number of fault arrivals in a system up to time t. What type of process is X(t)?

Classifying a Random Process: State Space Type

• Let X be a random process. The state space of a random process is the set of all possible values that the process can take on, i.e.

S = {y : X(t) = y for some t ∈ T}

If X is a random process that models a system, then the state space of X can represent the set of all possible configurations that the system could be in.

Classifying a Random Process: State Space Type

• If the state space, S, of a random process, X, is finite or countable, then X is said to be a discrete-state random process.

• If the state space S of a random process X is uncountable, then X is said to be a continuous-state random process.

Classifying a Random Process: State Space Type

• Let X be a random process that represents the number of bad packets received over a network.
– Classify the state space.

Classifying a Random Process: State Space Type

• Let X be a random process that represents the voltage on a telephone line.
– Classify the state space.

Classifying a Random Process: State Space Type

We will concern ourselves primarily with discrete-state processes.

Random Process Classification Examples

Markov Process

• A special type of random process that we will examine is called a Markov process. A Markov process can be defined, informally, as follows:

Given the state of a Markov process X at time t, the future behavior of X can be described completely in terms of X(t).

Markov processes have the very useful property that, given the current state, their future behavior is independent of past values.

Markov Chains

• A Markov chain is a Markov process with a discrete state space.

We will always make the assumption that a Markov chain has a state space in {1, 2, …} and that it is time-homogeneous.

A Markov chain is time-homogeneous if its future behavior does not depend on what time it is, only on the current state.

We formalize this property by looking at a discrete-time Markov chain (DTMC). A DTMC X has the following property:

P[X(t+k) = j | X(t) = i, X(t-1) = n_(t-1), …, X(0) = n_0] = P[X(t+k) = j | X(t) = i] = P_ij^(k)

DTMCs

• Given i, j, and k, P_ij^(k) is a number.
• P_ij^(k) can be interpreted as the probability that, if X has value i, then after k time-steps X will have value j.

Frequently we write P_ij to mean P_ij^(1).

DTMC

• Suppose we have a processor that can be working or failed at some time T.

DTMC

• Suppose we have two processors, which can both be working or failed, independently, at time T.
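
Because the two processors behave independently, the joint chain has four states, and each joint transition probability is the product of the two single-processor probabilities. The sketch below illustrates this; the per-step failure and repair probabilities (0.1 and 0.5) are hypothetical values chosen for the example, not figures from the lecture.

```python
from itertools import product

# Hypothetical one-step probabilities for a single processor (not from the slides)
SINGLE = {
    "working": {"working": 0.9, "failed": 0.1},
    "failed":  {"working": 0.5, "failed": 0.5},
}

def joint_chain(single):
    """Build the chain for two independent copies of a single-processor chain."""
    states = list(product(single, repeat=2))  # pairs such as ("working", "failed")
    matrix = {
        (a, b): {(c, d): single[a][c] * single[b][d] for (c, d) in states}
        for (a, b) in states
    }
    return states, matrix

states, P2 = joint_chain(SINGLE)
for s in states:
    print(s, [round(P2[s][t], 3) for t in states])
```

If failures or repairs were not independent, the product rule would no longer apply and the four-state matrix would have to be specified directly, which is exactly the flexibility state-based methods provide.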

State Occupancy Probability Vector

• Let π be a row vector. We denote π_i to be the i-th element of the vector. If π(k) is a state occupancy probability vector, then π_i(k) is the probability that a DTMC is in state i at time-step k.

• Assume that a DTMC X has a state-space of size n, i.e. S = {1, 2, …, n}. We say formally:

π_i(k) = P[X(k) = i]

Note that π_1(k) + π_2(k) + … + π_n(k) = 1 for every time-step k.

State Occupancy Probability Vector

Computing a single step forward in time.

Given π(0), the initial probability vector (where we are likely to be at t = 0), and the transition probabilities P_ij, how do we compute π(1)?

Recall the definition of P_ij: P_ij = P[X(k+1) = j | X(k) = i].

State Occupancy Probability Vector

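Since the events X(k) = i, for i = 1, …, n, are mutually exclusive and cover all possibilities, the law of total probability gives:

π_j(k+1) = P[X(k+1) = j] = Σ_i P[X(k+1) = j | X(k) = i] P[X(k) = i] = Σ_i π_i(k) P_ij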

State Occupancy Probability Vector

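• We have π_j(k+1) = Σ_i π_i(k) P_ij, which holds for all j.

• This resembles vector-matrix multiplication. If we arrange the P_ij into a matrix P whose entry in row i and column j is P_ij,

then π(k+1) = π(k) P.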

State Occupancy Probability Vector

• The important consequence of this is that we can easily specify a DTMC in terms of an occupancy probability vector, and a transition probability matrix P.
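
Stepping the occupancy vector forward is a row-vector times matrix product, π(k+1) = π(k) P. The sketch below shows this on a two-state chain; the transition probabilities and the `step` helper are hypothetical and serve only to illustrate the mechanics.

```python
def step(pi, P):
    """One time-step: component j of the result is sum_i pi[i] * P[i][j]."""
    n = len(pi)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical two-state chain: index 0 = working, index 1 = failed
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = [1.0, 0.0]  # start in the working state with probability 1

history = [pi]
for _ in range(3):
    pi = step(pi, P)
    history.append(pi)

for k, vec in enumerate(history):
    print(k, [round(x, 4) for x in vec])
```

Each vector in `history` sums to 1, as an occupancy probability vector must; repeating the step k times is equivalent to multiplying π(0) by the k-th power of P.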

Transient Behavior of DTMCs

A Simple Example

• Suppose the weather in Cincinnati can be modeled the following way:

Simple Example, cont.

Simple Example, Solution

Solution, cont.

Graphical Representation

Simple Computer Example

Limiting Behavior of DTMCs

For next time

• Apply for a Mobius account.
• Use your uc.edu e-mail address, and specifically mention this class and instructor.