Can We Trust the Computer?

Upload: randell-nicholson

Post on 29-Dec-2015


1

Can We Trust the Computer?

What Can Go Wrong?
Case Study: The Therac-25
Increasing Reliability and Safety
Perspectives on Failures, Dependence, Risk, and Progress
Computer Models

2

What Can Go Wrong?

• Facts About Computer Errors
  • Error-free software is not possible.
  • Errors are often caused by more than one factor.
  • Errors can be reduced by following good procedures and professional practices.

Q: How do we distinguish between tolerable or unavoidable errors in software versus careless software development?

3

What Can Go Wrong?

• The Roles of People in Computer-related Problems:
  – Computer User
    • At home or work, users should understand the limitations of computers and the need for proper training and responsible use.
  – Computer Professional
    • Understanding the source and consequences of computer failures is valuable when buying, developing, or managing a complex system.
  – Educated Member of Society
    • Personal decisions and political, social, and ethical decisions depend on understanding computer risks.

4

What Can Go Wrong?

• Categories of Computer Errors and Failures
  – Problems for Individuals:
    • usually in their role as consumers.
    • who are incorrectly identified by inaccurate law enforcement databases.
  – System Failures:
    • affecting large numbers of people and/or costing large amounts of money.
  – Safety-Critical Applications:
    • where people may be injured or killed.

5

What Can Go Wrong?

• Problems for Individuals
  – Billing Errors
    • Lack of tests for inconsistencies and inappropriate amounts.
  – Database Accuracy Problems
    • Incorrect information resulting in wrongful treatment or acts.

Q: Describe a computer error or failure that has affected you.
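
The "tests for inconsistencies and inappropriate amounts" that the billing-errors bullet says are often missing can be pictured as a plausibility check run before a bill goes out. The sketch below is illustrative only: the function name `flag_suspicious_bill`, the 10x ratio, the absolute ceiling, and the sample amounts are all assumptions made for this example, not details from the slides.

```python
from statistics import median

def flag_suspicious_bill(amount, history, max_ratio=10.0, hard_limit=10_000.0):
    """Return a list of reasons a bill looks implausible (empty list = looks fine).

    All thresholds are hypothetical and chosen only for illustration.
    """
    reasons = []
    if amount < 0:
        reasons.append("negative amount")
    if amount > hard_limit:
        reasons.append("exceeds absolute ceiling")
    if history:
        typical = median(history)  # customer's typical past bill
        if typical > 0 and amount > max_ratio * typical:
            reasons.append("far above customer's typical bill")
    return reasons

# Made-up data: a customer who usually pays about 50 is billed millions.
print(flag_suspicious_bill(6_300_000.0, [48.0, 52.5, 50.0]))
```

A flagged bill would be held for a person to review, which puts back the human common sense that automated processing leaves out.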

6

What Can Go Wrong?

• Problems for Individuals (cont’d)
  – Causes:
    • Large population.
    • Human common sense not part of automated processing.
    • Overconfidence in the accuracy of data from a computer.
    • Errors in data entry.
    • Information not updated or corrected.
    • Lack of accountability for errors.

7

What Can Go Wrong?

• System Failures
  – Communications:
    • Telephone, online, and broadcast services.
  – Business:
    • Inventory and management software.
  – Financial:
    • Stock exchanges, brokerages, banks, etc.
  – Transportation:
    • Reservations, ticketing, and baggage handling.

8

What Can Go Wrong?

• System Failures (cont’d)
  – Causes:
    • Insufficient testing and debugging time.
    • Significant changes in specifications (made during and after the start of the project).
    • Overconfidence in the system.
    • Mismanagement of the project.

Q: Describe a recent system failure that affected many people or resulted in a great monetary loss.

9

What Can Go Wrong?

• Safety-Critical Applications
  – Military
  – Power Plants
  – Aircraft
  – Trains
  – Automated Factories
  – Medicine
  – …others.

10

What Can Go Wrong?

• Safety-Critical Applications
  – Causes:
    • Overconfidence.
    • Lack of override features.
    • Insufficient testing.
    • Sheer complexity of system.
    • Mismanagement.

Q: What activities do you participate in that are controlled by safety-critical applications?

11

Case Study: The Therac-25

• The Therac-25 was a software-controlled radiation-therapy machine used to treat people with cancer.
  – Overdoses of radiation
    • Normal dosage is 100–200 rads.
    • It is estimated that between 13,000 and 25,000 rads were given to six people.
    • Three of the six people died.

Q: What determines whether the risks associated with a safety-critical application are acceptable?

12

Case Study: The Therac-25

• Therac-25 Radiation Overdose
  – Multiple Causes:
    • Poor safety design.
    • Insufficient testing and debugging.
    • Software errors.
    • Lack of safety interlocks.
    • Overconfidence.
    • Inadequate reporting and investigation of accidents.
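
To make "safety interlock" concrete: an interlock refuses to act when the machine's sensed state disagrees with what was requested. The sketch below shows that idea in software only. Every name in it (`Mode`, `REQUIRED_POSITION`, `fire_beam`, the position strings) is hypothetical, and the 200-rad ceiling simply reuses the upper end of the normal dosage range quoted earlier as an illustrative limit. It is not the Therac-25's actual design, and a check like this would complement, not replace, independent hardware interlocks.

```python
from enum import Enum

class Mode(Enum):
    ELECTRON = "electron"
    XRAY = "xray"

class InterlockError(Exception):
    """Raised when the machine state is inconsistent with the requested treatment."""

# Hypothetical mapping from treatment mode to the hardware position it requires.
REQUIRED_POSITION = {
    Mode.ELECTRON: "scan_magnets",
    Mode.XRAY: "target_and_flattener",
}

def fire_beam(mode, sensed_position, dose_rads, max_dose_rads=200):
    """Refuse to fire unless the sensed hardware position and dose are consistent."""
    if sensed_position != REQUIRED_POSITION[mode]:
        raise InterlockError(
            f"position {sensed_position!r} does not match mode {mode.value}"
        )
    if not 0 < dose_rads <= max_dose_rads:
        raise InterlockError(f"dose {dose_rads} rads outside allowed range")
    return f"beam fired: {mode.value}, {dose_rads} rads"

# Consistent request: allowed.
print(fire_beam(Mode.XRAY, "target_and_flattener", 150))

# Inconsistent request: blocked instead of silently proceeding.
try:
    fire_beam(Mode.XRAY, "scan_magnets", 150)
except InterlockError as err:
    print("blocked:", err)
```

The design choice is that the function fails loudly. An inconsistent state stops the treatment and must be investigated, rather than being overridden with a keystroke.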

13

Increasing Reliability and Safety

• What Goes Wrong?
  – Computer Systems Fail Because:
    • The job they are doing is inherently difficult, and
    • The job is done poorly.
  – Compounding the Reliability Issue:
    • Developers and users exhibit overconfidence in the system.
    • Reused system software may not work in different environments.

Q: Identify the elements needed as an incentive to increase reliability and safety.

14

Increasing Reliability and Safety

• Professional Techniques
  • Follow good software practices.
  • Exhibit professional responsibility at all levels of development and use.
  • Construct well-designed user interfaces.
  • Take human factors into account.
  • Include built-in redundancy.
  • Incorporate self-checking where appropriate.
  • Follow good testing principles and techniques.

Q: What human interface features should be considered for ordinary business applications?
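
Two of the professional techniques listed above, built-in redundancy and self-checking, can be combined in a small voting routine: several independent sources report a value, and the program accepts it only when a strict majority agree. This is a minimal sketch. The function name `majority_vote` and the sample readings are assumptions for illustration, not part of the slides.

```python
from collections import Counter

def majority_vote(readings):
    """Return the value reported by a strict majority of redundant sources.

    Raises ValueError when no value has a strict majority, so the caller
    must handle the disagreement instead of silently trusting one source.
    """
    if not readings:
        raise ValueError("no readings supplied")
    value, count = Counter(readings).most_common(1)[0]
    if count * 2 <= len(readings):
        raise ValueError(f"no majority among {readings!r}")
    return value

# Made-up readings from three redundant sensors; one has failed.
print(majority_vote([120, 120, 999]))
```

With three sources the routine tolerates one faulty reading. When all three disagree it raises an error, which is the self-checking part: the system reports that it cannot trust its own inputs.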

15

Increasing Reliability and Safety

• Law and Regulation
  – Criminal and Civil Penalties:
    • to recover loss from faulty or unsafe systems.
  – Liability and Civil Laws:
    • to provide incentives to produce reliable and safe systems.
  – Warranties:
    • to guarantee a certain level of quality.
  – Federal or State Regulations:
    • to protect the public.

16

Increasing Reliability and Safety

• Law and Regulation (cont’d)
  – Database Accuracy Enforcement:
    • to protect the public from inaccurate information maintained by private companies and government.
  – Mandatory Licensing of Software Developers:
    • to ensure proper training, competency, and continuing education.

Q: How can consumers protect themselves from faulty software?

17

Perspectives on Failures, Dependence, Risk, and Progress

• Failures
  • What are acceptable rates of failures?
  • How accurate should software be?
• Dependence
  • How dependent on computer systems are our ordinary activities?
  • How useful are computer systems to our ordinary activities?
• Risk and Progress
  • How do new technologies become safer?
  • Can progress in software safety keep up with the pace of change in computer technology?

18

Computer Models

• Points to Consider:
  • Models are simplifications of either physical or intangible systems.
  • Those who design and develop models must be honest and accurate with results.
  • Computer professionals and the general public must be able to evaluate the claims of the developers.

Q: What problems in your community have been or could be studied with computer models?

19

Computer Models

• Evaluating Models
  – Why Models Might Not Be Accurate:
    • Developers have incomplete knowledge of the system being modeled.
    • Data might be incomplete or inaccurate.
    • Power of the computer might be inadequate.
    • Variables are difficult to numerically quantify.
    • Political and economic motivation to distort results.

Q: For each item above, give an example of a model or simulation that was inaccurate.

20

Computer Models

• Evaluating Models (cont’d)
  – Regarding the Car-Crash Models Described in the Text:
    • How well do the modelers understand the system and/or materials being studied? How accurate and complete are the data?
    • What are the assumptions and simplifications in the model?
    • Do the results or predictions correspond with the real world?

21

Computer Models

• Evaluating Models (cont’d)
  – Regarding the Climate Models Discussed in the Text:
    • How well do the modelers understand the system and/or materials being studied? How accurate and complete are the data?
    • What are the assumptions and simplifications in the model?
    • Do the results or predictions correspond with the real world?
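
The last evaluation question, whether predictions correspond with the real world, can be given a number by comparing a model's outputs with measurements. The sketch below uses root-mean-square error (RMSE), one common choice among several. The function name `rmse` and the sample values are assumptions for illustration and are not taken from the car-crash or climate models in the text.

```python
import math

def rmse(predicted, observed):
    """Root-mean-square error between model predictions and real measurements."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up example: model predictions versus measured values.
predicted = [10.0, 12.5, 15.0, 17.5]
observed = [10.4, 12.1, 15.9, 16.8]
print(f"RMSE = {rmse(predicted, observed):.3f}")
```

A small error on past data does not settle the other two questions. The modelers' understanding of the system and the model's simplifying assumptions still have to be examined separately.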