CIS 573 Software Engineering, Carl A. Gunter, Fall 1999
Post on 25-Dec-2015
Contact Information
Course Web Page: http://www.cis.upenn.edu/~cis573
Course announcements.
Lecturer: Carl Gunter.
» Office hour: Thursday, 12:30-1:30, 370 Moore, 898-9506
Graduate Assistant: Mike McDougall.
» Office hour: Wednesday, 1:00-2:00, Moore 057a, 898-8116
What will I learn in the course?
Software engineering generally.
Safety engineering as it offers lessons and ideas for software.
General principles for building safety-critical software systems.
Techniques to achieve high confidence.
How to analyze accidents.
Pre-requisites
Interest in both software and the systems in which it is used.
Programming in Java. Basic skill in mathematics; ability to learn some logic.
What am I Expected to Do?
Participate in classes.
Read designated materials.
Projects: individually or on a team.
Final Exam: assesses understanding of lectures, reading, and project presentations.
Participation and Reading
Slides distributed on course web page.
Textbook: Leveson, Safeware: System Safety and Computers.
Other materials will be distributed.
Projects
Achieving confidence. Verifying software. Specifying software. Coding from a specification. Testing software.
Project Rules
No partner on first project.
Groups of two are allowed on all subsequent projects, but your partner must be different on each project.
Partners provide equal effort on a project.
Verification
Computer hardware and software can be mathematically described.
Hence, computers can be used to automate the verification of computer hardware and software systems.
Verification and Testing
Testing is like verification since each successfully-passed test is like a little theorem that has been proved about the implementation.
Verification has the capacity to cover large sets of cases exhaustively, eliminating the need for coverage conditions or statistical measures of confidence.
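The contrast between a handful of tests and exhaustive coverage can be sketched on a tiny finite domain. The example below is an illustration only: `midpoint_8bit` and `spec_midpoint` are hypothetical names, and the 8-bit wraparound is an assumed bug chosen so that a few plausible tests pass while an exhaustive check over all 256 × 256 inputs finds counterexamples.

```python
def spec_midpoint(lo: int, hi: int) -> int:
    """Specification: the integer midpoint of two values in 0..255."""
    return (lo + hi) // 2


def midpoint_8bit(lo: int, hi: int) -> int:
    """Hypothetical implementation whose intermediate sum wraps at 8 bits."""
    return ((lo + hi) & 0xFF) // 2


def agrees(lo: int, hi: int) -> bool:
    """One 'little theorem': the implementation meets the spec on this input."""
    return midpoint_8bit(lo, hi) == spec_midpoint(lo, hi)


# Testing: a few sample inputs, all of which happen to pass.
sample_tests = [(0, 10), (3, 7), (100, 120)]
tests_pass = all(agrees(lo, hi) for lo, hi in sample_tests)

# Exhaustive check: every input pair in the 8-bit domain.
counterexamples = [
    (lo, hi)
    for lo in range(256)
    for hi in range(256)
    if not agrees(lo, hi)
]

print("sample tests pass:", tests_pass)
print("counterexamples found:", len(counterexamples))
print("first counterexample:", counterexamples[0])
```

Brute-force enumeration only works here because the input space is small; for realistic systems the same idea needs symbolic or model-based techniques.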
Verification by Reading
[Figure: verification activities. Labels: Testing, Simulation, Formal Verification, Walkthrough, Audit, Technical Review, Software Inspection, Product, Project, Management Review.]
From the IEEE Standard for Software Reviews and Audits
Verification and Validation
Verification can be used to show that the software or hardware conforms to a rigorous description of its expected behavior.
It cannot show that the behavior described is the one the user wanted.
Verification: building the system right.
Validation: building the right system.
First Assignments
Reading: Chapters 1, 2, 3, 4 of Leveson.
Project: Dekker, correctness of mutual exclusion algorithms.
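As a rough idea of what checking a mutual exclusion algorithm can look like, the sketch below enumerates every reachable state of a two-process model and looks for states with both processes in the critical section. It is not the course's project solution: the program-counter encoding, the names `dekker_step`, `naive_step`, and `explore`, and the assumption that each read or write is one atomic step are all choices made for this illustration. A deliberately flawed lock is included so the checker has something to find.

```python
from collections import deque

CRITICAL = 5  # program counter value that models the critical section


def dekker_step(state, i):
    """Return the state after process i takes one atomic step of Dekker's algorithm."""
    pcs, flags, turn = list(state[0]), list(state[1]), state[2]
    j = 1 - i
    pc = pcs[i]
    if pc == 0:            # flag[i] := true
        flags[i] = True
        pcs[i] = 1
    elif pc == 1:          # while flag[j] ... (enter if flag[j] is false)
        pcs[i] = 2 if flags[j] else CRITICAL
    elif pc == 2:          # if turn != i, back off; otherwise re-test flag[j]
        pcs[i] = 3 if turn != i else 1
    elif pc == 3:          # flag[i] := false
        flags[i] = False
        pcs[i] = 4
    elif pc == 4:          # wait until turn == i, then raise the flag again
        pcs[i] = 0 if turn == i else 4
    elif pc == CRITICAL:   # leave the critical section: turn := j
        turn = j
        pcs[i] = 6
    else:                  # pc == 6: flag[i] := false, start over
        flags[i] = False
        pcs[i] = 0
    return (tuple(pcs), tuple(flags), turn)


def naive_step(state, i):
    """A deliberately flawed lock: test the other flag first, set our own later."""
    pcs, flags, turn = list(state[0]), list(state[1]), state[2]
    j = 1 - i
    if pcs[i] == 0:        # wait until flag[j] is false
        pcs[i] = 0 if flags[j] else 1
    elif pcs[i] == 1:      # flag[i] := true, then enter
        flags[i] = True
        pcs[i] = CRITICAL
    else:                  # leave: flag[i] := false
        flags[i] = False
        pcs[i] = 0
    return (tuple(pcs), tuple(flags), turn)


def explore(step):
    """Breadth-first search over all interleavings; return (state count, violations)."""
    start = ((0, 0), (False, False), 0)
    seen = {start}
    queue = deque([start])
    violations = []
    while queue:
        state = queue.popleft()
        if state[0] == (CRITICAL, CRITICAL):
            violations.append(state)
        for i in (0, 1):
            nxt = step(state, i)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen), violations


print("Dekker violations:", explore(dekker_step)[1])
print("naive lock violations:", len(explore(naive_step)[1]))
```

The search checks only mutual exclusion; freedom from deadlock and starvation would need separate properties.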
Recommended for Fun and Profit
The Psychology of Everyday Things. Donald A. Norman, Basic Books, 1988.
Computer Related Risks. Peter G. Neumann, Addison Wesley, 1995. Drawn from the bulletin board news:comp.risks.
Normal Accidents. Charles Perrow, Basic Books, 1984.
The Cuckoo's Egg: Tracking a Spy Through the Maze of Computer Espionage. Clifford Stoll, Mass Market Paperback, 1995.
Bad Bytes: Why We Should Not Depend on Software. Lauren Ruth Wiener, Addison Wesley, 1993.
What is Risk?
Risk = Probability of Failure × Loss from Failure
Mitigate risk by increasing reliability or decreasing severity.
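A small worked example of the product of probability and loss, showing the two mitigation routes named above. The function name `risk` and all figures are invented for illustration; they are not data from the course.

```python
def risk(probability_of_failure: float, loss_from_failure: float) -> float:
    """Expected loss: probability of failure times loss from failure."""
    return probability_of_failure * loss_from_failure


# Hypothetical baseline: 1% chance of a failure costing 1,000,000.
baseline = risk(0.01, 1_000_000)

# Mitigation 1: increase reliability (failure ten times less likely).
more_reliable = risk(0.001, 1_000_000)

# Mitigation 2: decrease severity (failure ten times less costly).
less_severe = risk(0.01, 100_000)

print(baseline, more_reliable, less_severe)
```

Under this measure the two mitigations are interchangeable, which is one reason the slides go on to ask when risk is low enough: the number alone does not say who bears the loss.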
Risk and Opportunity
Many opportunities are held back by the expense of high risk.
Better assurance techniques break these barriers.
When is the risk low enough?
Risk and Opportunity Now
Fuel injection and anti-lock braking
Fly-by-wire aircraft and computer-controlled landings
Reduced time gaps between trains
Credit card purchases on the web and banking online
Online shareholder voting
Risk and Opportunity in the Future
Intelligent vehicles and highways
Electronic wallets
Genetically engineered organisms
Course Strategy
VV&T is a technology for increasing confidence in a system.
Its most vigorous application is in areas where the cost of failure is high.
We will focus primarily on software of this kind.
“High Risk” Computer Systems
Safety critical
» Transportation and power systems
» Medical and emergency systems
Security critical
» Military systems
» Electronic commerce
Mission critical
» Key information systems
» Key control systems
Low Risk Systems
When is a system non-critical? It is subjective and depends on use.
Some software is not strongly backed by its maker. Here is a standard industry disclaimer:
The entire risk as to the quality and performance of the program is with you. Should the program prove defective, you … assume the entire cost of all necessary servicing, repair or correction
Low Risk Systems continued
A refreshingly straightforward disclaimer:
We don't claim EasyFlow is good for anything---if you think it is, great, but it's up to you to decide. If EasyFlow doesn't work; tough. If you lose a million because EasyFlow messes up, it's you that's out the million, not us. If you don't like this disclaimer; tough. We reserve the right to do the absolute minimum provided by law, up to and including nothing.
Rigid Distinctions?
There are significant differences between the classes of high-risk systems.
» Analysis of energy for safety systems.
» Concept of an adversary in security systems.
But there are also many common themes.
» Reliability of components.
» Replication.
» Backup.
» Controlled failure modes.
Clayton Tunnel
[Figure: Clayton Tunnel signaling. Labels: Tunnel, A, B, Needle Telegraph, Signal Man, Semaphore.]
Signals
» In! Train in tunnel.
» Clear! Tunnel is free.
» Ok? Has the train left the tunnel?
Gerard Holzmann, Design and Validation of Computer Protocols
Classes of Risks
Business Risk
» Inadequate consumer interest in product
» Standard for product controlled by competitor(s)
Project Risk
» Inadequate time
» Inappropriate personnel
Operational Failure Risk
» Unavailability
» Erroneous operation
Risk Factors
Appearance of new hazards
Increasing complexity
Increasing exposure
Increasing amounts of energy
Increasing automation of manual operations
Increasing centralization and scale
Increasing pace of technological change
Acceptable Risk
When is risk low enough?
Risk-benefit analysis does not resolve moral issues.
Often the people taking the risk are not the ones benefiting from the opportunity.
Can we walk away from technical opportunity?
Role of Computers
Providing information or advice to a human operator upon request.
Interpreting data and displaying it to the controller, who makes the control decisions.
Issuing commands directly, but with a human monitor of the computer’s actions providing varying levels of input.
Eliminating the human from the control loop.
What Makes it Hard to Build Software?
Complexity of the functions required.
Conformity to existing artifacts and standards.
Changeability of functions required.
Invisibility of the software artifact, making it hard to model or visualize.
Fred Brooks, No Silver Bullet---Essence and Accident in Software Engineering
What Makes Software Different?
Software is primarily a design, with no manufacturing variation, wear, corrosion or aging aspects.
It has a much greater capacity to contain complexity. It is perceived to be easy to change.
Software errors are systematic, not random. It is intangible.
Motor Industry Software Reliability Association (MISRA), Development Guidelines for Vehicle Based Software.
The Concept of Causality
The cause of an event is a set of conditions, each of which is necessary and which together are sufficient for the event to occur.
Individual conditions are called causal conditions or factors.
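The definition can be made concrete by treating an event as a Boolean function of named conditions and checking both halves: the conditions are jointly sufficient, and the event never occurs when any one of them is absent. The helper `is_cause` and the two toy events (`fire`, `alarm`) are hypothetical examples written for this sketch, not models from the course.

```python
from itertools import product


def is_cause(event, conditions):
    """True if the conditions are jointly sufficient and each one is necessary."""
    # Sufficient: with every condition present, the event occurs.
    sufficient = event({c: True for c in conditions})
    # Necessary: the event never occurs in a world where some condition is absent.
    each_necessary = True
    for values in product([False, True], repeat=len(conditions)):
        world = dict(zip(conditions, values))
        if event(world) and not all(world.values()):
            each_necessary = False
    return sufficient and each_necessary


def fire(world):
    """Toy event: fire needs fuel, oxygen, and an ignition source together."""
    return world["fuel"] and world["oxygen"] and world["ignition"]


def alarm(world):
    """Toy event: an alarm triggered by smoke or by heat alone."""
    return world["smoke"] or world["heat"]


print(is_cause(fire, ["fuel", "oxygen", "ignition"]))  # each necessary, jointly sufficient
print(is_cause(alarm, ["smoke", "heat"]))              # neither condition is necessary
```

The second case shows why singling out one factor can mislead: when several conditions can each trigger the event, no single one fits the definition.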
Chemical Process Industry
Classification of hazards:
» fire
» explosion
» toxic release.
Factors influencing risk:
» Size of inventory
» Energy
» Time
» Intensity/distance relationship
» Exposure
Bhopal
In December 1984, a release of methyl isocyanate (MIC) from the Union Carbide chemical plant in Bhopal, India, resulted in the worst industrial accident in history.
MIC is used in pesticides.
Demand for MIC pesticides had dropped after 1981, so the plant was experiencing budgetary cutbacks.
Storage of MIC
[Figure: MIC storage tanks 610, 611, and 619. Capacity: 60 tons. Limit: half full. Temperature: 0°C. Pressure: 3 psi. Contents shown: 50 tons, 21 tons, and 1 ton, with the annotations "(thought to contain 20 tons)" and "(thought to contain 15 tons)".]
Events
10.30pm, December 2, 1984. A new worker was cleaning some valves.
11.00pm. Pressure was 10 psi, temperature was 20°C.
11.30pm. Leak was discovered, workers noticed eye irritation.
12.40am. 40 psi, 25°C, rumbling noise in tank, concrete casing cracked. Then 400°C, began release of 50,000 pounds of MIC gas.
12.50am-12.55am. Siren sounded when MIC seen escaping from vent stack.
Cause and Effect
2,000 to 3,000 people killed, 10,000 with permanent disabilities, 200,000 injured.
Blamed by management on 'human error'. Masking the complexity of causal factors.
Over-simplification
Human error.
Technical failures.
Organizational factors.
Multiplicity of factors.
Legal financial responsibility.
Legal View
Cause in fact is established by evidence showing that a defendant’s act or omission was a necessary antecedent to plaintiff’s injury.
Legal (or proximate) cause is a device for limiting liability of a defendant to consequences bearing some reasonable relationship to the risks he or she created.
Legal Cause
Example: car is negligently driven, strikes another car which hits a lamp post, thereby causing a power outage in a region. Legal responsibility may be limited to only part of the total consequences.
Primary classes of limitation
» Unforeseen consequences
» Intervening causes
DC-10 Cargo Door
In March 1974 a Turkish Airlines DC-10 crashed near Paris, resulting in 346 deaths.
Flight control cables in the DC-10 are routed under the cabin floor rather than along the airframe.
Cargo hold depressurization could collapse cabin floor.
Improperly closed cargo door caused flight control cables to be cut.
Root cause: operator error?
[Figure: cabin floor]
Root Causes of Accidents
Flaws in Safety Culture
Ineffective Organizational Structure
Ineffective Technical Activities
Safety Culture
Discounting risk
Excessive reliance on redundancy
Unrealistic risk assessment
Ignoring high-consequence, low-probability events
Safety Culture, continued
Assuming risk decreases over time
Underestimating software-related risks
Low priority for safety
Ignoring warning signs
Flawed resolution of goals
Organizational Activities
Diffusion of responsibility and authority
Lack of independence and inadequate rank of safety personnel
Limited communication
Technical Activities
Superficial safety efforts
Ineffective risk control
Failure to eliminate basic design flaws
Basing safeguards on false assumptions
Technical Activities, continued
Complexity
Using safety devices to reduce safety margins
Inadequate collection and recording of information
Failure to use information
Failure to evaluate changes
Flixborough
In 1974 an explosion occurred at the Nypro Ltd. chemical works at Flixborough, killing 28 people working at the plant, including all 18 people in the control room.
The plant was making caprolactam, an intermediate product for manufacturing nylon.
The process used cyclohexane, a chemical with properties similar to gasoline.
The plant was under commercial pressure; competitors held the patent on a safer process for making caprolactam.
Events
Six reactors connected by 28-inch pipes were used. An escape was detected in reactor 5.
A change was made to bypass the reactor using 20-inch pipe with a dogleg.
This appeared to work for two months but eventually escaping cyclohexane created a vapor cloud that was ignited by a discharge tower.
Causal Factors
Changes that were not reviewed. Conflicting priorities. Organizational structure. Superficial safety activities.
Seveso
In 1976 a cloud of dioxin was produced by the Icmesa chemical factory in northern Italy and was washed by rain onto the town of Seveso. Numerous people were affected and a large region was contaminated.
Trichlorophenol is used to make bactericides and herbicides.
During processing tetrachlorodibenzodioxine (dioxin) can be produced. Dioxin is very toxic.
Changes were made in the production system to save money. The new process had increased risk of heat release and dioxin formation.
Events
The reaction and distillation cycle was started 10 hours later than usual on a Friday, and the reactor was left to run unattended for the weekend.
Heat increased to 450°-500° and created conditions for the production of dioxin.
A valve released a toxic cloud that was carried by rain into Seveso.
Effects were first noticed in burned vegetation and sores on children.
It took some time to recognize that dioxin from the factory was the cause and act on this.
Causal Factors
Changes that were not communicated or reviewed.
Discounting risk.
Ineffective safety measures.
» Inadequate warning.
» Slow analysis.
» Valve release unsafe.
Therac-25
Between June 1985 and January 1987, six patients received massive overdoses from a computer-controlled radiation therapy machine called the Therac-25.
History of the development of the device.
» AECL and CGR
» Electron and X-ray
» Dual mode electron accelerator, Therac 20 and 25
Hazard Analysis Assumptions
Programming errors have been reduced by extensive testing on a hardware simulator and under field conditions. Residual software errors are not included in the hazard analysis.
Program software does not degrade due to wear, fatigue, or reproduction process.
Computer execution errors are caused by faulty hardware components and by random errors induced by alpha particles and electromagnetic noise.
Events
Kennestone Regional Oncology Center, June 1985
Ontario Cancer Foundation, July 1985
Yakima Valley Memorial Hospital, December 1985
East Texas Cancer Center, March 1986
East Texas Cancer Center, April 1986
Code Blamed for the Tyler Accidents
Program written in PDP-11 assembly language using its own standalone realtime operating system.
Four major components:
» Stored data
» Scheduler
» Critical and non-critical tasks
» Interrupt services
Pseudo-code for Key Routines

Datent
  if mode/energy specified then
    begin
      calculate table index
      repeat
        fetch parameter
        output parameter
        point to next parameter
      until all parameters set
      call Magnet
      if mode/energy changed then return
    end
  if data entry is complete then set Tphase to 3
  if data entry is not complete then
    if reset command entered then set Tphase to 0
  return

Magnet
  Set bending magnet flag
  repeat
    set next magnet
    call Ptime
    if mode/energy has changed then exit
  until all magnets are set
  return

Ptime
  repeat
    if bending magnet flag is set then
      if editing taking place then
        if mode/energy has changed then exit
  until hysteresis delay has expired
  Clear bending magnet flag
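As the pseudocode is written, Ptime looks for an edit only while the bending magnet flag is set, and it clears that flag at the end of its first call, so an edit made during a later magnet's delay would go unnoticed. The sketch below simulates that reading. It is not the PDP-11 code: the tick-based clock, the three magnets, the four-tick delay, the names `ptime` and `magnet`, and the merging of "editing taking place" with "mode/energy has changed" into one variable are all simplifying assumptions.

```python
def ptime(state, clock, edit_tick, delay_ticks):
    """Wait out one hysteresis delay; return True if an edit is noticed."""
    for _ in range(delay_ticks):
        clock[0] += 1
        if clock[0] == edit_tick:
            state["mode_changed"] = True   # operator edits mode/energy at this tick
        # As in the pseudocode, edits are examined only while the flag is set.
        if state["bending_flag"] and state["mode_changed"]:
            return True
    state["bending_flag"] = False          # cleared once the delay has expired
    return False


def magnet(edit_tick, magnets=3, delay_ticks=4):
    """Set each magnet in turn and report what happened to the operator's edit."""
    state = {"bending_flag": True, "mode_changed": False}
    clock = [0]                            # mutable tick counter shared with ptime
    for _ in range(magnets):
        if ptime(state, clock, edit_tick, delay_ticks):
            return "edit detected"         # Magnet exits so the data can be re-entered
    return "edit missed" if state["mode_changed"] else "no edit"


print(magnet(edit_tick=2))   # edit during the first magnet's delay
print(magnet(edit_tick=6))   # edit during the second magnet's delay
print(magnet(edit_tick=99))  # no edit while the magnets are being set
```

In this model the outcome depends only on when the operator types, which is the kind of timing-dependent fault that a fixed set of system tests can easily fail to exercise.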
Events, continued
FDA declares the Therac 25 defective, 2 May 1986
Yakima Valley Memorial Hospital, January 1987
Causal Factors
Overconfidence in software.
Confusing reliability with safety.
Lack of defensive design.
Failure to address root causes.
Inadequate investigation or follow-up on accident reports.
Software reuse.
Safe versus friendly user interfaces.
Government and user oversight and standards.
Causal Factors, continued
Inadequate software engineering practices.
» Software specifications and documentation should not be an afterthought.
» Rigorous quality assurance needed.
» Design needs to be simple.
» Error detection needed from the beginning.
» More than system testing needed.
» Error messages and displays need careful design.