
SOFTWARE RELIABILITY AND QUALITY

Presentation of Software Project Management

Submitted to: Sir Kashif
Submitted by: Usman Mukhtar

Shahbaz Khan

BS(IT) 7th Semester

Outline

Software Reliability

Relationship of Software Reliability & Failure

Measures of Reliability & Availability

Software Reliability models

Statistical testing

Case Study

Reliability and quality assurance

WHAT IS SOFTWARE RELIABILITY?

• The probability of failure-free operation of a computer program in a specified environment for a specified time.

• Reliability is the probability of not failing in a specified length of time.

• It is a quality factor that can be directly measured and estimated using historical and development data.

• It measures how often the software encounters a data input or other condition that it does not process correctly, and so fails to produce the correct answer.

• If program X has a reliability of 0.96 (over 8 processing hours), it means that if program X runs 100 times, it will operate correctly 96 times.

SOFTWARE RELIABILITY

• Perceived software reliability is observer-dependent

– If you do not encounter a problem, you do not report it

– Different users have different views of the system, and thus different quality and reliability assessments

• The software reliability keeps changing as the defects are detected and fixed

Reliability Metrics

• Probability of failure on demand (POFOD)

– A measure of the likelihood that the system will fail when a service request is made

– POFOD = 0.001 means 1 out of 1000 service requests results in failure

– Relevant for safety-critical or non-stop systems

• Rate of occurrence of failures (ROCOF)

– Frequency of occurrence of unexpected behaviour

– ROCOF of 0.02 means 2 failures are likely in each 100 operational time units

– Relevant for operating systems, transaction processing systems
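
Both metrics can be read off a simple failure log. A minimal sketch, assuming hypothetical counts of service requests, failures, and observed operational time (none of these numbers come from the slides):

```python
# Hypothetical observation data: counts gathered during a test or operation period.
service_requests = 10_000      # total service requests observed
failed_requests = 12           # requests that resulted in failure
operational_time_units = 500   # e.g. hours of observed operation
observed_failures = 9          # unexpected behaviours seen in that period

# POFOD: fraction of service requests that fail.
pofod = failed_requests / service_requests

# ROCOF: failures per operational time unit.
rocof = observed_failures / operational_time_units

print(f"POFOD = {pofod:.4f} ({pofod * 1000:.1f} failures per 1000 requests)")
print(f"ROCOF = {rocof:.3f} ({rocof * 100:.1f} failures per 100 time units)")
```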

Reliability Metrics

• Mean time to failure (MTTF)

– Measure of the average time between observed failures

– MTTF of 500 means that the average time between failures is 500 time units

– Relevant for systems with long transactions, e.g. CAD systems

• Availability

– Measure of how likely the system is to be available for use; takes repair/restart time into account

– Availability of 0.998 means the software is available for 998 out of 1000 time units

– Relevant for continuously running systems, e.g. telephone switching systems

Relationship of Reliability & Failure

• When we discuss reliability, we must also consider the term 'failure'.

• Non-conformance to the software requirements leads to failures.

• Wrong results, or in the worst case no output at all, constitute failure.

• Failures may be of different levels of complexity.

• Some failures can be corrected in seconds, some in weeks and others in months.

• Correction of a failure may introduce new errors (and eventually new failures).

Relationship of Reliability & Failure

• Mathematical representation:

F(n) = 1 - R(n)

• where

– F(n) = probability of failing in a specified length of time

– R(n) = reliability, i.e. the probability of not failing

– n = number of time units; if the time unit is assumed to be days, then the probability of not failing in 1 day is R(1)
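
A small worked example of this relationship, reusing the figure from the earlier slide (R = 0.96 over 8 processing hours); treating the 100 runs as independent trials is an assumption made here for illustration:

```python
R = 0.96                      # reliability over one 8-hour period (earlier slide)
F = 1 - R                     # probability of failing in that period
runs = 100
expected_failures = F * runs  # expected number of failing runs out of 100

print(f"F = {F:.2f}; out of {runs} runs, about {expected_failures:.0f} are expected to fail")
```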

Failure Classification

• Transient - only occurs with certain inputs

• Permanent - occurs on all inputs

• Recoverable - system can recover without operator help

• Unrecoverable - operator has to help

• Non-corrupting - failure does not corrupt system state or data

• Corrupting - system state or data are altered

Measures of Reliability & Availability

• Early work in software reliability attempted to extrapolate the mathematics of hardware reliability theory to the prediction of software reliability. However:

– Most hardware reliability models are based on failure due to physical wear (decomposition effects, shock, temperature, etc.) rather than design defects.

– The opposite is true for software: all software failures can be traced to design or implementation problems.

– The following measures nonetheless apply to both kinds of system.

Measures of Reliability & Availability

• Measure of Reliability

– Consider a computer-based system. A simple measure of reliability for such a system is mean time between failure (MTBF):

MTBF = MTTF + MTTR

where MTTF = mean time to failure and MTTR = mean time to repair.

– Many researchers argue that MTBF is a more useful measure than defects/KLOC or defects/FP, because the user is more concerned with the failure rate than with the defect count.

– Each defect does not have the same failure rate, and the total defect count gives little indication of the reliability of a system.
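
A minimal sketch of estimating MTTF, MTTR and MTBF from an event log; the failure/restore timestamps (in hours) are invented purely for illustration:

```python
# Hypothetical (failure_time, restore_time) pairs, in hours since observation started.
incidents = [(120.0, 121.5), (410.0, 410.5), (700.0, 703.0)]

up_times, down_times = [], []
previous_restore = 0.0
for failed_at, restored_at in incidents:
    up_times.append(failed_at - previous_restore)  # time spent running before this failure
    down_times.append(restored_at - failed_at)     # time spent repairing/restarting
    previous_restore = restored_at

mttf = sum(up_times) / len(up_times)      # mean time to failure
mttr = sum(down_times) / len(down_times)  # mean time to repair
mtbf = mttf + mttr                        # mean time between failures

print(f"MTTF = {mttf:.1f} h, MTTR = {mttr:.1f} h, MTBF = {mtbf:.1f} h")
```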


Measures of Reliability & Availability

• Measure of Availability

– Software availability is the probability that a program is operating according to requirements at a given point in time.

– It is defined as

Availability = [MTTF / (MTTF + MTTR)] * 100%

– The availability measure is sensitive to MTTR.
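
Continuing with the illustrative MTTF and MTTR values from the MTBF sketch above, the availability formula can be applied directly:

```python
mttf, mttr = 232.7, 1.7  # illustrative values (hours) from the earlier sketch
availability = mttf / (mttf + mttr) * 100
print(f"Availability = {availability:.2f}%")  # note how strongly MTTR drives this figure
```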

Reliability models

Many software reliability models contain:

• Assumptions

• Factors

• Mathematical function

Software reliability models can be divided into two categories:

Prediction model: uses historical data.

Estimation model: uses data from the current software development effort.

Reliability Growth Modeling

• Step Function Model

– The simplest reliability growth model is a step function model.

– The basic assumption: reliability increases by a constant amount each time an error is detected and repaired.

– It therefore assumes that all errors contribute equally to reliability growth.

– This is highly unrealistic: we already know that different errors contribute differently to reliability growth.
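
A tiny numeric sketch of the step-function assumption: every repair raises reliability by the same fixed amount (the starting reliability and the step size are arbitrary illustrative values):

```python
reliability = 0.80  # assumed reliability before debugging starts
step = 0.02         # assumed constant gain per repaired error

for repaired in range(1, 6):
    reliability = min(1.0, reliability + step)
    print(f"after {repaired} repairs: R = {reliability:.2f}")
```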

Reliability Growth Modeling

• Jelinski and Moranda Model

– Recognizes that reliability does not increase by a constant amount each time an error is repaired.

– Reliability improvement due to fixing of an error is assumed to be proportional to the number of errors present in the system at that time.
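
A small sketch of that assumption: in the Jelinski-Moranda model the failure intensity after i-1 fixes is proportional to the number of faults still present, so each fix buys a smaller improvement than the last. N and phi are illustrative parameters, not values from the slides:

```python
# Jelinski-Moranda: with N initial faults and per-fault hazard rate phi,
# the failure intensity before the i-th failure is lambda_i = phi * (N - i + 1),
# and the expected time to that failure is 1 / lambda_i.
N = 30       # assumed initial number of faults
phi = 0.005  # assumed per-fault hazard rate (failures per hour per fault)

for i in range(1, 6):
    remaining = N - i + 1
    intensity = phi * remaining   # failures per hour while those faults remain
    expected_gap = 1 / intensity  # expected hours until the i-th failure
    print(f"failure {i}: {remaining} faults left, "
          f"intensity = {intensity:.3f}/h, expected gap = {expected_gap:.0f} h")
```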

Reliability Growth Modeling

• Littlewood and Verrall's Model

– Assumes different faults have different sizes, thereby contributing unequally to failures.

– Allows for negative reliability growth.

– Large faults tend to be detected and fixed earlier.

– As the number of errors is driven down as testing progresses, so is the average error size, causing a law of diminishing returns in debugging.
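
Not the Littlewood-Verrall mathematics itself, but a short simulation of the intuition behind it: when faults have different "sizes" (per-fault failure rates), the large ones tend to surface first, so the average size of the remaining faults falls as testing proceeds. All rates below are invented for illustration:

```python
import random

random.seed(1)

# Assumed per-fault failure rates ("sizes"): a few large faults, many small ones.
fault_sizes = [0.5] * 3 + [0.05] * 10 + [0.005] * 20

# Time until a fault first causes a failure is exponential with its own rate,
# so larger faults are, on average, detected earlier.
detection_order = sorted((random.expovariate(rate), rate) for rate in fault_sizes)

for found in (5, 15, 30):
    remaining = [rate for _, rate in detection_order[found:]]
    avg_size = sum(remaining) / len(remaining)
    print(f"after {found:2d} faults found, average remaining fault size = {avg_size:.3f}")
```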

Reliability Growth Modeling

• Applicability of models:

– There is no universally applicable reliability growth model.

– Reliability growth is not independent of application.

– Fit observed data to several growth models.

• Take the one that best fits the data.
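
A minimal sketch of "fit several models and take the best": two simple one-parameter curves for cumulative failures are fitted by least squares and compared by residual error. The weekly failure counts and the two candidate curve shapes are illustrative only:

```python
import math

# Hypothetical test data: cumulative failures observed at the end of each week.
weeks = list(range(1, 11))
cumulative = [9, 16, 21, 25, 28, 30, 32, 33, 34, 35]

def fit(basis):
    """Least-squares fit of n(t) = a * basis(t); returns (a, sum of squared residuals)."""
    g = [basis(t) for t in weeks]
    a = sum(n * gi for n, gi in zip(cumulative, g)) / sum(gi * gi for gi in g)
    sse = sum((n - a * gi) ** 2 for n, gi in zip(cumulative, g))
    return a, sse

candidates = {
    "linear growth  n = a*t": lambda t: t,
    "slowing growth n = a*ln(1+t)": lambda t: math.log(1 + t),
}

results = {name: fit(basis) for name, basis in candidates.items()}
for name, (a, sse) in results.items():
    print(f"{name}: a = {a:.2f}, SSE = {sse:.1f}")

best = min(results, key=lambda name: results[name][1])
print("best-fitting model:", best)
```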

Statistical Testing

• Testing for reliability rather than fault detection

• Test data selection should follow the predicted usage profile for the software

• Measuring the number of errors allows the reliability of the software to be predicted

• An acceptable level of reliability should be specified, and the software tested and amended until that level of reliability is reached

Statistical Testing

• Different users have different operational profiles:

– i.e. they use the system in different ways

– formally, operational profile:

• probability distribution of input

• Divide the input data into a number of input classes:

– e.g. create, edit, print, file operations, etc.

• Assign a probability value to each input class:

– a probability for an input value from that class to be selected.
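
A minimal sketch of turning such a profile into test cases: each input class gets an assumed probability and tests are drawn according to that distribution. The class names and weights are invented for illustration:

```python
import random

random.seed(42)

# Assumed operational profile: probability that a request falls into each input class.
operational_profile = {
    "create": 0.10,
    "edit": 0.55,
    "print": 0.20,
    "file operations": 0.15,
}

classes = list(operational_profile)
weights = [operational_profile[c] for c in classes]

# Draw 1000 test cases so their class frequencies follow the profile.
test_cases = random.choices(classes, weights=weights, k=1000)

for c in classes:
    print(f"{c:16s} {test_cases.count(c):4d} test cases")
```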

Statistical Testing

• Determine the operational profile of the software:

– this can be determined by analyzing the usage pattern.

• Manually select or automatically generate a set of test data:

– corresponding to the operational profile.

• Apply test cases to the program:

– record the execution time between each failure

– it may not be appropriate to use raw execution time

• After a statistically significant number of failures have been observed:

– reliability can be computed (a small sketch follows below)
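
A small sketch of that last step: the recorded execution times between failures give an MTTF estimate, and, under an assumed exponential failure model (an assumption, not something stated on the slides), a reliability figure for a chosen mission time:

```python
import math

# Hypothetical execution time (hours) between successive observed failures.
inter_failure_times = [12.0, 7.5, 20.0, 15.5, 30.0, 25.0, 41.0, 38.5]

mttf = sum(inter_failure_times) / len(inter_failure_times)

# Assuming exponentially distributed failure times: R(t) = exp(-t / MTTF).
mission_time = 8.0  # hours of failure-free operation of interest
reliability = math.exp(-mission_time / mttf)

print(f"estimated MTTF = {mttf:.1f} h, R({mission_time:.0f} h) = {reliability:.2f}")
```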

Statistical Testing

• Relies on using large test data set.

• Assumes that only a small percentage of test inputs is likely to cause system failure.

• It is straightforward to generate tests corresponding to the most common inputs:

– but a statistically significant percentage of unlikely inputs should also be included.

• Creating these may be difficult:

– especially if test generators are used.

• Pros and cons

– consider these for yourself

CASE STUDY

BANK AUTO-TELLER SYSTEM

• Each machine in a network is used 300 times a day

• Bank has 1000 machines

• Lifetime of software release is 2 years

• Each machine handles about 200,000 transactions over the lifetime of the release

• About 300,000 database transactions in total per day
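
A quick check of how these figures hang together (the only assumption added here is treating the 2-year lifetime as roughly 730 days):

```python
machines = 1000
uses_per_machine_per_day = 300
lifetime_days = 2 * 365

transactions_per_day = machines * uses_per_machine_per_day       # network-wide, per day
per_machine_lifetime = uses_per_machine_per_day * lifetime_days  # per machine, over 2 years

print(f"transactions per day across the network: {transactions_per_day:,}")    # 300,000
print(f"transactions per machine over the lifetime: {per_machine_lifetime:,}") # ~219,000
```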

EXAMPLES OF A RELIABILITY SPEC.

Failure class: Permanent, non-corrupting
Example: The system fails to operate with any card which is input. Software must be restarted to correct the failure.
Reliability metric: ROCOF, 1 occurrence per 1000 days

Failure class: Transient, non-corrupting
Example: The magnetic stripe data cannot be read on an undamaged card which is input.
Reliability metric: POFOD, 1 in 1000 transactions

Failure class: Transient, corrupting
Example: A pattern of transactions across the network causes database corruption.
Reliability metric: Unquantifiable! Should never happen in the lifetime of the system

SPECIFICATION VALIDATION

• It is impossible to empirically validate very high reliability specifications

• No database corruptions means POFOD of less than 1 in 200 million

• If a transaction takes 1 second, then simulating one day's transactions takes 3.5 days

• It would take longer than the system’s lifetime to test it for reliability
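
The 3.5-day figure follows directly from the case-study numbers; a quick check, assuming one simulated day means 300,000 transactions at 1 second each:

```python
transactions_per_day = 300_000
seconds_per_transaction = 1

simulation_seconds = transactions_per_day * seconds_per_transaction
simulation_days = simulation_seconds / (24 * 3600)
print(f"simulating one day's transactions takes about {simulation_days:.1f} days")  # ~3.5
```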

COSTS OF INCREASING RELIABILITY

SOFTWARE QUALITY

What is Software Quality?

• Software quality is:

- The degree to which a system, component, or process meets specified requirements.

- The degree to which a system, component, or process meets customer or user needs or expectations.

Software Quality Criteria

• Correctness

• Efficiency

• Flexibility

• Robustness

• Interoperability

• Maintainability

Software Quality Criteria

• Performance

• Portability

• Reliability

• Reusability

• Testability

• Usability

• Availability

• Understandability

Software Quality Management System

• A quality management system is a principal methodology used by organizations to ensure that the products they develop have the desired quality

• A quality system is the responsibility of the organization as a whole, and the full support of top management is a must

• A good quality system must be well documented

Software Quality Management System

• The quality system activities encompass the following:

– Auditing of projects

– Review of the quality system

– Development of standards, procedures and guidelines… etc

– Production of reports for the top management summarizing the effectiveness of the quality system in the organization

Product Metrics versus Process Metrics

• Product metrics help measure the characteristics of the product being developed, whereas process metrics help measure how the process is performing

• Product metrics: LOC, FP, PM, time to develop the product, the complexity of the system, etc.

• Process metrics: review effectiveness, inspection efficiency, etc.

CONCLUSIONS

• Software reliability is a key part of software quality

• Software reliability improvement is hard

• There are no generic models.

• Measurement is very important for finding the correct model.

• Statistical testing should be used, but again it is not easy.

• Software Reliability Modelling is not as simple as it looks.

THANK YOU