SAS 04 / GSFC/SATC-NSWCDD
System and Software Reliability Technical Presentation
Naval Surface Warfare Center Dahlgren Division, Software Assurance Technology Center, & University of Connecticut
Drs. William H. Farr & John R. Crigler (NSWCDD), Dolores Wallace (SATC), & Dr. Swapna Gokhale (UC)
NASA OSMA SAS '04


TRANSCRIPT

Page 1 (title slide)

Page 2

Outline of the Presentation

I. FY03 & FY04 Research Initiatives
II. FY03 Research
  A. Description of SMERFS^3
  B. Description of the Models Implemented
  C. Application of the Models to GSFC Data
  D. Lessons Learned
III. FY04 Research
  A. Literature Search
  B. System Model Taxonomy
  C. Description of GSFC System Data
  D. Plans for SMERFS^3
IV. Technology Readiness
V. Barriers to Research or Application

Page 3

2003 & 2004 Research

2003 (Software Based)
• Literature search
• Selection of new models
• Build new software models into SMERFS^3
• Test new models with Goddard project data
• Make latest version of SMERFS^3 available

2004 (System Based)
• Conduct similar research effort for System Reliability
• Enhance and validate system models

Page 4

2003 Research

Page 5

SMERFS^3

• Current version features:
  – 6 software reliability models
  – 2D and 3D plots of the input data and of each model's fit
  – Various reliability estimates
  – User queries for predictions

• Update constraints:
  – Employ data from the integration, system test, or operational phase
  – Use the existing graphics of SMERFS^3
  – Integrate with the existing user interfaces, goodness-of-fit tests, and prediction capabilities

Page 6

Hypergeometric Model Assumptions

Test instance, t(i): a collection of input test data.
N: total number of initial faults in the software.
• Faults detected by a test instance are removed before the next test instance is exercised.
• No new fault is inserted into the software during removal of a detected fault.
• A test instance t(i) senses w(i) initial faults. w(i) may vary with the condition of the test instances over i; it is sometimes referred to in the authors' papers as a "sensitivity" factor. This w(i) can take any number of forms.
• The initial faults actually sensed by t(i) depend upon t(i) itself. The w(i) initial faults are taken randomly from the N initial faults.
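Under the constant-sensitivity form of these assumptions (w(i) = w for every test instance, with faults sensed at random from the N initial faults), the expected cumulative detections can be traced instance by instance. A minimal sketch; the function name and parameter values are illustrative, not part of the SMERFS^3 implementation:

```python
def expected_cumulative_faults(N, w, num_instances):
    """Expected cumulative faults detected after each of num_instances
    test instances, assuming a constant sensitivity w(i) = w and random
    sampling of w faults from the N initial faults."""
    remaining = float(N)   # expected undetected faults so far
    cumulative = []
    total = 0.0
    for _ in range(num_instances):
        # Each instance senses w of the N initial faults at random, so
        # the expected number of NEWLY detected faults is
        # (expected remaining undetected) * (w / N).
        newly = remaining * (w / N)
        total += newly
        remaining -= newly
        cumulative.append(total)
    return cumulative
```

With N = 100 and w = 10, the first instance is expected to find 10 new faults and the second only 9, since some of its sensed faults were already removed.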

Page 7

Schneidewind Model

• There are three versions:
  – Model 1: All of the fault counts for each testing period are treated the same.
  – Model 2: Ignore the first s−1 testing periods and their associated fault counts; only use the data from periods s to n.
  – Model 3: Combine the fault counts of intervals 1 to s−1 into the first data point, then use the counts from periods s to n individually, giving n − s + 2 data points.
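The three data treatments are just different transformations of the same per-period fault-count series, which makes them easy to state in code. A minimal sketch under that reading; the function name is illustrative:

```python
def schneidewind_treatment(counts, s, model):
    """Transform a list of per-period fault counts according to the
    three Schneidewind data treatments.
    counts: fault counts for periods 1..n; s: 1-based starting period."""
    if model == 1:
        # Model 1: use all periods unchanged.
        return list(counts)
    if model == 2:
        # Model 2: ignore periods 1..s-1 entirely.
        return list(counts[s - 1:])
    if model == 3:
        # Model 3: pool periods 1..s-1 into a single first data point,
        # then keep periods s..n individually (n - s + 2 points).
        return [sum(counts[: s - 1])] + list(counts[s - 1:])
    raise ValueError("model must be 1, 2, or 3")
```

For example, with counts [5, 4, 3, 2, 1] and s = 3, Model 2 keeps [3, 2, 1] while Model 3 yields [9, 3, 2, 1].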

Page 8

Schneidewind Assumptions

• The numbers of faults detected in the respective intervals are independent.

• The fault correction rate is proportional to the number of faults to be corrected.

• The intervals over which the software is tested are all taken to be of the same length.

• The cumulative number of faults by time t, M(t), follows a Poisson process with mean value function μ(t). The mean value function is such that the expected number of fault occurrences for any time period is proportional to the expected number of undetected faults at that time.

• The failure intensity function, λ(t), is assumed to be an exponentially decreasing function of time; that is, λ(t) = αexp(−βt) for some α, β > 0.
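Under the exponential-intensity assumption, integrating λ(t) from 0 to t gives the mean value function μ(t) = (α/β)(1 − exp(−βt)), so the expected total number of faults is α/β. A small sketch of both functions (parameter values in the comments are illustrative):

```python
import math

def failure_intensity(t, alpha, beta):
    """Schneidewind failure intensity: lambda(t) = alpha * exp(-beta * t)."""
    return alpha * math.exp(-beta * t)

def mean_faults(t, alpha, beta):
    """Mean value function mu(t) = (alpha/beta) * (1 - exp(-beta * t)),
    obtained by integrating lambda(t) from 0 to t.
    As t -> infinity, mu(t) -> alpha / beta (expected total faults)."""
    return (alpha / beta) * (1.0 - math.exp(-beta * t))
```

For example, with α = 2 faults per period and β = 0.5, the intensity starts at 2 and decays, and the expected total number of faults is α/β = 4.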

Page 9

Models Incorporated in 2003

The Hypergeometric model and enhancements to the Schneidewind model were incorporated into SMERFS^3.

• These two models require error-count failure data.
• For the Hypergeometric model, only the constant form (w(i) = c with c ≥ 0) was implemented.
  – Only error-count data was captured in the GSFC project database. The available data included fault occurrence date, life-cycle phase, and severity level for three separate builds. No data was available on testing-intensity measures (number of tests, number of testing personnel, testing hours expended, etc.).
• For the Schneidewind Model Type II, the globally optimal "s" was obtained, and the risk criterion measures were implemented for all interval-data models.
  – The risk criterion measures address the important question of when the software can be released so as to minimize the number of remaining faults and to maximize the chance that a fault will not manifest itself over a specified mission-critical time. Risk measures implemented during the last half of 2003 included:
    • Operational quality at time t
    • Risk criterion metric for the remaining faults at time t
    • Risk criterion metric for the time to next failure at time t
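One common formulation of the remaining-faults risk criterion, in the style of Schneidewind's work, compares the predicted remaining faults r(t) against a critical threshold r_c: the metric (r(t) − r_c)/r_c is negative once the prediction falls below the threshold. A minimal sketch assuming the exponential-intensity model from the previous slide; the function names, parameterization, and threshold are illustrative:

```python
import math

def remaining_faults(t, alpha, beta):
    """Predicted faults remaining after testing time t under an
    exponential-intensity model: r(t) = (alpha / beta) * exp(-beta * t)."""
    return (alpha / beta) * math.exp(-beta * t)

def risk_criterion_remaining(t, alpha, beta, r_critical):
    """Risk criterion metric for remaining faults: (r(t) - r_c) / r_c.
    Positive: predicted remaining faults still exceed the threshold.
    Negative: the release criterion is met (hypothetical parameters)."""
    return (remaining_faults(t, alpha, beta) - r_critical) / r_critical
```

Scanning this metric over t answers the "when can I release?" question: the first t at which it turns negative is the earliest release time that meets the remaining-faults goal.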

Page 10

2003 Software Available Data

• Large GSFC project, but confidentiality required
• Several subsystems
• Data in flat files – much effort went into building a spreadsheet/database
• Operational failures only
• Removed specific faults and sorted others
• Three builds were used (called A, B, & C), consisting of faults aggregated by month for the activity phases Integration Testing, Operability Testing, System Testing, and Operations, and for severity levels 1, 2, and 3. This gave resulting data sets of 201 faults for Build A, 249 for B, and 187 for C.

Bottom line: organizing the data required substantial effort – effort that would be minimized if a project person prepared the data.

Page 11

GSFC Build A, B & C Faults per Month Data

[Plot: faults per month by month when found, July 1998 through November 2002, y-axis 0–30 faults, with one series per build (A, B, and C).]

Page 12

Build A, B & C Model Results

[Plots of the model fits for Builds A, B, and C, each showing the Schneidewind Type II and Yamada S-shaped model results.]
Page 13

Build A Risk Assessment Criteria

[Plots of the risk criterion metrics against the desired goals of remaining faults = 5 and mission duration = 2. Reaching the desired remaining-faults goal requires 27 months of testing; reaching the desired mission-duration goal requires 48 months of testing.]

Page 14

Lessons Learned

1. There is a need to capture additional information on the faults:
  a) Description of the particular activity that found the faults,
  b) Duration of the activity,
  c) Number of individuals involved, etc.

2. Schneidewind's Treatment Type 2 and the Yamada S-shaped models consistently did the best job of fitting the data:
  a) Early fault data was not reflective of the current failure rate.
  b) Both models tend to factor out the early behavior.
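The Yamada delayed S-shaped model discounts early behavior by construction: its mean value function μ(t) = a(1 − (1 + bt)e^(−bt)) grows slowly at first, which matches the lesson above about early fault data. A minimal sketch of that function (the parameter values used below are illustrative):

```python
import math

def yamada_s_shaped_mean(t, a, b):
    """Yamada delayed S-shaped mean value function:
    mu(t) = a * (1 - (1 + b*t) * exp(-b*t)),
    where a is the expected total number of faults and b is the
    fault-detection rate. The curve is S-shaped: nearly flat early
    on, then steep, then saturating toward a."""
    return a * (1.0 - (1.0 + b * t) * math.exp(-b * t))
```

The flat start is what "factors out" unrepresentative early fault counts, in contrast to the exponential models, which assign their highest intensity to the earliest intervals.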

Page 15

2004 Research

Page 16

Literature Search

Limited to papers published from 1990 onward; few papers addressed total-system reliability before then.

The initial search revealed that the focus should be on system availability rather than reliability. Reviewed 72 journal and conference papers.

Availability = the proportion of some specified period of time during which the system is operating satisfactorily.

Availability = uptime/total time = MTBF/(MTBF + MTTR)

Availability is the fundamental quantity of interest for repairable systems and is a more appropriate measure than reliability for measuring the effectiveness of maintained systems.

The literature is also replete with special availability measures that have been proposed for specific application systems.
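The steady-state form of the definition above can be computed directly. A minimal sketch (the function name is illustrative):

```python
def availability(mtbf, mttr):
    """Steady-state availability of a repairable system:
    A = MTBF / (MTBF + MTTR),
    where MTBF is the mean time between failures (uptime per cycle)
    and MTTR is the mean time to repair (downtime per cycle)."""
    return mtbf / (mtbf + mttr)
```

For example, a system averaging 99 hours between failures and 1 hour to repair has availability 99/(99 + 1) = 0.99, i.e., it is up 99% of the time.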

Page 17

System Model Taxonomy

System model criteria for incorporation into SMERFS^3:
• Must use failure data (i.e., time-between-failures) from system testing or operation. Need both hardware and software failures.
• Dates of failures and their closures must be included. The date on which a fault was corrected should be provided.
• The candidate model must integrate well with the existing graphics and interface capabilities in SMERFS^3.

Two types of availability and reliability modeling approaches:
• Model-based analysis
• Measurement-based analysis

Page 18

Model-based analysis

A model relating failure and repair events of the components to failure and repair events of the system, based on its structure.

Model types:
• Combinatorial (fault tree, reliability block diagram)
• State space (Markov chain)
• Hierarchical

Advantages:
• Can be performed early, before the system is available
• Facilitates "what-if"/predictive and sensitivity analysis

Disadvantages:
• Complex models with many parameters
• Data availability to estimate the parameters is an issue
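The combinatorial approach can be illustrated with the two basic reliability-block-diagram compositions: a series arrangement is up only when every block is up, while a parallel (redundant) arrangement is up when at least one block is up. A minimal sketch, assuming independent blocks with known availabilities (the component values below are illustrative):

```python
from functools import reduce

def series(avails):
    """Series RBD: system is up only if every block is up,
    so availability is the product of the block availabilities."""
    return reduce(lambda acc, a: acc * a, avails, 1.0)

def parallel(avails):
    """Parallel RBD: system is up if at least one block is up,
    so unavailability is the product of the block unavailabilities."""
    return 1.0 - reduce(lambda acc, a: acc * (1.0 - a), avails, 1.0)
```

For example, two redundant 0.9-available units feeding one 0.99-available unit give series([parallel([0.9, 0.9]), 0.99]) = 0.99 × 0.99 ≈ 0.98, a "what-if" that can be run before any system data exists.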

Page 19

Measurement-based analysis

Collect data during system operation; obtain reliability and availability estimates directly from the data.

Advantages:
• Provides true estimates of reliability and availability
• Verifies assumptions underlying model-based analysis
• Reveals model structure and helps build new models
• Estimates the values of model parameters

Disadvantages:
• Requires an operational system or at least a prototype
• No consensus or uniformity in the type of data required and its collection
• Expensive to perform predictive and sensitivity analysis
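In the measurement-based approach, availability drops straight out of the operation log: sum the downtime between each failure and its restoration, and divide the remaining uptime by the total observation period. A minimal sketch, assuming a hypothetical log format of (failure time, restore time) pairs:

```python
def measured_availability(events, period_start, period_end):
    """Estimate availability directly from operational data.
    events: list of (fail_time, restore_time) pairs, all within
    [period_start, period_end] (hypothetical log format).
    Availability = uptime / total time."""
    total = period_end - period_start
    downtime = sum(restore - fail for fail, restore in events)
    return (total - downtime) / total
```

For example, two outages of 2 and 3 hours over a 100-hour window give 95/100 = 0.95. Note this requires the restoration times the GSFC data (next slide) lacks, which is exactly the kind of data-collection gap the disadvantages above refer to.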

Page 20

Description of GSFC System Data

• Requirements for availability measurement
  – Time of each failure
  – Time system restored to service after each failure

• GSFC system data
  – Several spacecraft with severity level specified
  – Acceptance testing through operation
  – Hardware (predominant) and software failures
  – After accounting for each spacecraft, severity, and activity, each data set has only a few data points
  – Date of each failure
  – Date correction officially accepted
  – Exact downtime not available

Page 21

Technology Readiness

I. The prototype tool for SMERFS^3 incorporating the software models and their updates, the existing hardware models, and the basic systems-modeling capability will be completed by the end of this year.
  A. Additional needs for general availability of the tool and ease of use by the general practitioner include:
    1. A User's Manual
    2. A training package and supporting documentation
    3. A way of making Program Managers, developers, etc. aware of the technology, the supporting tool, and the technology's strengths and weaknesses
    4. A distribution medium if the tool is desired
II. Demonstrated Return-on-Investment using the technology

Page 22

Plans for SMERFS^3 in 2004

• Identify 1–2 candidate models (August)
• Formulate models for coding (August – September)
• Code models in SMERFS^3 (October – November)
• Apply models to GSFC data (November – December)
• Write up final report & distribute SMERFS^3 (December)

Page 23

Barriers to Research or Application

I. Data availability
II. When data was available we encountered:
  A. Confidentiality concerns
  B. The right kinds of data were not being collected (for example, some measures relating to testing intensity)
  C. Lack of consistency among and within data sets (definitions and quality were particularly troublesome)
III. Complexity of the models. The more complex the models are, the more parameters are used to define them. This necessitates advanced computational algorithms and larger data sets with more recorded information.
IV. Validation of these models on large systems. There are so many factors (size, type, environment, etc.) to consider that real validation is a serious concern.
V. Management support for the use of this technology. To gain that we must demonstrate real Return-on-Investment.