Death by Software The Therac-25 Radio-Therapy Device Brian MacKay ESE6361 - Requirements Engineering – Fall 2013

Upload: albert-nichols

Post on 26-Dec-2015



Death by Software
The Therac-25 Radio-Therapy Device

Brian MacKay

ESE6361 - Requirements Engineering – Fall 2013


The Atomic Age

• World War II ushered in the atomic age
    • The start of the nuclear arms race
• In many countries…
    • The question was how to harness this power for peaceful purposes


In Canada: AECL

• Atomic Energy of Canada Limited is a “Crown Corporation”
• Designed and implemented a Heavy Water nuclear reactor
    • The CANDU system
• It also included AECL-Medical
    • Harnessing the atom for medical reasons


AECL & CGR – Medical Accelerator Technology

• AECL-Medical and the French company la Compagnie Générale de Radiologie (CGR)
• Worked together during the 1970s on using linear accelerators for radio-therapy
    • High energy, low dose, Electron beams, or
    • A stream of photons in the X-Ray spectrum
• The two companies’ partnership produced
    • The 6 MeV, X-Ray only “Therac-6”
    • The dual mode, 20 MeV “Therac-20”


Therac-6 & Therac-20

• Stand-alone electro-mechanical units
• Operator could
    • Set all settings manually
    • Position beam devices manually
    • Once everything was set, and system was “safe”, deliver the dose
• The system had an optional computer that allowed a simpler UI
    • A Digital Equipment PDP-11
    • 32 kilobytes of memory
    • All assembly code


True Innovation: the Therac-25

• AECL only – CGR partnership had dissolved
• Used a Double-Pass accelerator
    • Halved the space that the Therac-6 & Therac-20 had occupied
• Made the computer the primary controller
    • No stand-alone manual mode
• Shipped in 1983
    • Still used a DEC PDP-11


It was the best on the market…

• Except…

• It seriously injured 6 patients between 1985 and 1987

• Killing 3 of those patients

• All because of software


Hubris

• When an engineer graduates in Canada, he/she attends The Ritual of the Calling of an Engineer
• And gets an Iron Ring
• Rudyard Kipling wrote the ceremony
    • Instills a sense of professionalism
    • And humility


Supreme Faith in Software

• It appears that this device had rigorous safety engineering on the hardware side
    • Complete hazard analysis – fault tree
• On the software side, the likelihood of error was described in insanely low terms
    • Fault probabilities on the order of 10⁻⁹ and 10⁻¹¹
• “Software does not degrade due to wear, fatigue or the reproduction process”
• They had no expectation that a bug could cause a problem


Malfunction 54

• When there was a problem, the UI displayed the word “Malfunction” followed by a number 1-64
    • There was NO documentation of what these codes were in the user manual
    • An internal AECL service manual described #54 as “dose input 2” and pointed out that this error code was only there for internal diagnostic reasons
• Under normal conditions, an operator might see as many as 40 malfunction codes in a day
    • But Malfunction 54 was very rare
    • Malfunction codes were easily dismissed by pressing [P] (for “Proceed”)


Electron Mode vs. X-Ray Mode

• In Electron Mode a low power beam is scanned across the patient
• In X-Ray mode a high power beam is aimed at a target, producing X-Rays, which then irradiate the patient
• The electron scanning mechanism and X-Ray target were mounted on a turntable
    • The position was controlled by the computer


Usability

• User interface was a VT-100 Green Screen
• Contained the Prescription
    • Entered by the operator
• Originally, on error, the prescription had to be re-entered
    • Usability studies changed this, near the end of the dev cycle
• This change introduced a major error

PATIENT NAME   : JOHN DOE
TREATMENT MODE : FIX        BEAM TYPE: X        ENERGY (MeV): 25

                            ACTUAL      PRESCRIBED
UNIT RATE/MINUTE            0           200
MONITOR UNITS               50 50       200
TIME (MIN)                  0.27        1.00

GANTRY ROTATION (DEG)       0.0         0           VERIFIED
COLLIMATOR ROTATION (DEG)   359.2       359         VERIFIED
COLLIMATOR X (CM)           14.2        14.3        VERIFIED
COLLIMATOR Y (CM)           27.2        27.3        VERIFIED
WEDGE NUMBER                1           1           VERIFIED
ACCESSORY NUMBER            0           0           VERIFIED

DATE   : 84-OCT-26      SYSTEM : BEAM READY     OP.MODE: TREAT AUTO
TIME   : 12:55. 8       TREAT  : TREAT PAUSE             X-RAY 173777
OPR ID : T25VO2-RO3     REASON : OPERATOR       COMMAND:


A Race Condition – UI & Operations Threads

• In the Therac-25, the operator entered, in order
    • The prescription information
    • The Electron/X-Ray mode
    • Then a command to execute
• If the operator
    • Entered an X-Ray command in error
    • Re-edited the page and changed it to Electron
    • Then executed the dose, all within 8 seconds
• Then the patient was given the high power X-Ray beam directly through the Electron turntable element, with no X-Ray target in the beam path

Message shown on the operator screen: Malfunction 54
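The timing window described above can be sketched as a small deterministic simulation. This is only an illustration of the general pattern (one task latches a shared value and does not re-read it while busy), not a reconstruction of AECL's assembly code. The 8-second figure comes from the slide; `SetupTask`, `run_session` and the `shared` dictionary are hypothetical names invented for the example.

```python
SETUP_SECONDS = 8.0  # the window named in the slides; all other names are hypothetical


class SetupTask:
    """Hypothetical operations task: it copies the beam type once, then stays busy."""

    def __init__(self):
        self.latched_beam = None
        self.busy_until = 0.0

    def start(self, shared, now):
        self.latched_beam = shared["beam_type"]  # unsynchronized read of shared state
        self.busy_until = now + SETUP_SECONDS

    def notices_edit(self, edit_time):
        # The flaw being illustrated: edits made while the task is busy are never re-read.
        return edit_time >= self.busy_until


def run_session(edit_time):
    """Return (beam type shown on screen, beam type the machine is set up for)."""
    shared = {"beam_type": "X"}  # operator types X by mistake
    setup = SetupTask()
    setup.start(shared, now=0.0)

    shared["beam_type"] = "E"  # operator corrects the entry at edit_time
    if setup.notices_edit(edit_time):
        setup.start(shared, now=edit_time)  # setup is redone with the corrected value

    return shared["beam_type"], setup.latched_beam


if __name__ == "__main__":
    print(run_session(edit_time=3.0))   # edit inside the 8-second window
    print(run_session(edit_time=10.0))  # edit after the window
```

With an edit at 3 seconds the screen and the machine disagree (`E` on screen, `X` latched); with an edit at 10 seconds they agree. A slow, careful operator would rarely land inside the window, which is one plausible reason a bug of this shape only shows up once operators become fast.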


Why Have One Deadly Bug?

• A second deadly bug was eventually found in the Therac-25
• The system periodically tested whether everything was positioned properly, setting a variable with the result of the test
    • A zero indicated OK
• Instead of simply setting the value to 1 or 0, the program incremented the value
    • And, the variable was a byte
• The result was that on every 256th test of the positioning, the byte wrapped around to zero and the system would falsely indicate that everything was ready to proceed.
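The wrap-around is easy to reproduce. The sketch below assumes a persistent fault (so every test wants to report "not OK") and models the one-byte variable by masking to 8 bits; the function names are hypothetical and the original was PDP-11 assembly, not Python.

```python
def false_ok_passes(passes):
    """Pass numbers on which an incremented 8-bit flag reads as 'OK' (zero)."""
    flag = 0
    hits = []
    for n in range(1, passes + 1):
        flag = (flag + 1) & 0xFF  # increment, truncated to one byte
        if flag == 0:             # zero is interpreted as "everything is positioned"
            hits.append(n)
    return hits


def false_ok_passes_fixed(passes):
    """Same loop, but the flag is assigned a constant instead of incremented."""
    hits = []
    for n in range(1, passes + 1):
        flag = 1                  # fault present: set, never increment
        if flag == 0:
            hits.append(n)
    return hits


if __name__ == "__main__":
    print(false_ok_passes(1000))        # [256, 512, 768]
    print(false_ok_passes_fixed(1000))  # []
```

Assigning a fixed non-zero value removes the overflow entirely, which is why the slide contrasts "setting the value to 1 or 0" with incrementing it.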


Noteworthy: The Users Found the Bugs

• It’s worth noting that AECL’s reaction to the problems initially was denial
    • Eventually, they got to the stage where they did piecemeal fixes
• Without the efforts of the staff at the East Texas Cancer Center in Tyler, AECL might never have acknowledged the first bug
    • After two accidents, with the same operator, the hospital staff spent time trying to recreate the race condition
• After the Therac-25, the FDA changed the way it evaluated software (and software engineering) in medical devices.


The Scorecard

Cause                            Total Accidents    Deaths
Malfunction 54 Race Condition    3                  2
Incorrect Increment Logic        3                  1*
Total                            6                  3

* One patient died of cancer, but would have died of radiation poisoning in a few weeks had the cancer not killed him


Not the Bugs – The Software Engineering

• All software systems have bugs
    • Even Knuth hands out the occasional $2.56 check
• AECL coalesced their entire operator interface, control system and safety system into one program
• They apparently had very little in the way of formal requirements gathering, design or development standards
    • All of the software was developed by one programmer
• Their reaction to the problems was to fix them one at a time


Software Reuse

• The Therac-20 reused some of the software from the Therac-6
• The Therac-25 reused software from both of the previous models
• But
    • The earlier models had hardware interlocks to prevent overdosing
• The desire to reuse previous software resulted in a
    • Home-made real-time operating system
    • On an expensive, 10-year-old computer system
    • Running a program written entirely in assembly language
    • That relied on global variables for inter-task communication, without synchronization
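For contrast with unsynchronized globals, here is a minimal sketch of one common alternative: shared state guarded by a lock, plus a version counter so a consumer can tell whether anything changed since it last looked. It uses Python's standard `threading.Lock`; the `Prescription` class and `safe_to_fire` helper are hypothetical names for illustration, and nothing here claims to reflect what was feasible on the original PDP-11.

```python
import threading


class Prescription:
    """Shared prescription data; every read and write goes through one lock."""

    def __init__(self, beam_type, energy_mev):
        self._lock = threading.Lock()
        self._beam_type = beam_type
        self._energy_mev = energy_mev
        self._version = 0  # bumped on every edit

    def edit(self, beam_type, energy_mev):
        with self._lock:
            self._beam_type = beam_type
            self._energy_mev = energy_mev
            self._version += 1

    def snapshot(self):
        # A consistent copy: all three fields are read under the same lock.
        with self._lock:
            return (self._version, self._beam_type, self._energy_mev)


def safe_to_fire(prescription, configured):
    """True only if nothing was edited since the hardware was configured."""
    return prescription.snapshot() == configured


if __name__ == "__main__":
    rx = Prescription("X", 25)
    configured = rx.snapshot()           # hardware set up from this snapshot
    rx.edit("E", 25)                     # operator edits afterwards
    print(safe_to_fire(rx, configured))  # False: setup must be redone
```

The design point is that the consumer compares against the exact snapshot it configured from, so a late edit forces a fresh setup instead of being silently missed.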


No Requirement to Separate Layers

• AECL architected the Therac-25’s software into a single point of failure
• This was far from accepted practice in the early 1980s
    • Safety systems were migrating from hardware to software
    • But… they were usually separate, simpler systems – e.g. PLCs
• By the early 80s, there were usually three distinct layers
    • Safety and integrity
    • Control and positioning
    • Operator interface and supervisory
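A rough sketch of why a separate safety layer matters: the control layer acts on what it believes the turntable position to be, while an independent check acts on what is actually sensed. The two permitted combinations follow the Electron/X-Ray description earlier in the deck; the enum names, `deliver` function and return strings are hypothetical, and a real safety layer would be separate hardware or a PLC rather than another function in the same program.

```python
from enum import Enum


class Turntable(Enum):
    ELECTRON_SCANNER = "electron scanner"
    XRAY_TARGET = "x-ray target"


class BeamPower(Enum):
    LOW = "low"
    HIGH = "high"


# The only combinations described as valid in the deck.
ALLOWED = {
    (Turntable.ELECTRON_SCANNER, BeamPower.LOW),
    (Turntable.XRAY_TARGET, BeamPower.HIGH),
}


def control_layer_accepts(believed_position, power):
    """Control layer: works from its own, possibly stale, idea of the position."""
    return (believed_position, power) in ALLOWED


def interlock_permits(sensed_position, power):
    """Safety layer: works only from the independently sensed position."""
    return (sensed_position, power) in ALLOWED


def deliver(believed_position, sensed_position, power):
    if not control_layer_accepts(believed_position, power):
        return "rejected by control layer"
    if not interlock_permits(sensed_position, power):
        return "blocked by safety layer"
    return "beam on"


if __name__ == "__main__":
    # Control software wrongly believes the X-Ray target is in place.
    print(deliver(Turntable.XRAY_TARGET, Turntable.ELECTRON_SCANNER, BeamPower.HIGH))
```

When both checks live in one program and share one set of variables, a single bug can defeat both, which is the single point of failure the slide describes.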


Testability – Auditing

• AECL’s task architecture and real time OS made adequate testing nearly impossible
    • Look at the deadly errors: one depends on operator timing and the other on a counter overflow, so neither is readily discoverable through ordinary testing
• No auditing of operations or failures was included in the system
• After all the issues with the Therac-25, a check was done on the Therac-20 system and the same bugs were found
    • But, because that system had mechanical interlocks, no injuries resulted


References

• “Medical Devices – The Therac-25”, Leveson, Nancy. http://sunnyday.mit.edu/papers/therac.pdf
• “An Investigation of the Therac-25 Accidents”, Leveson, Nancy and Turner, Clark S., IEEE Computer, Vol. 26, No. 7, July 1993, pp. 18-41. http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Therac_1.html
• “Fatal Dose - Radiation Deaths linked to AECL Computer Errors”, Rose, Barbara Wade, Saturday Night (magazine), June 1994. http://www.ccnr.org/fatal_dose.html
• “Safety-Critical Computing: Hazards, Practices, Standards, and Regulation”, Jacky, Jonathan. http://staff.washington.edu/jon/pubs/safety-critical.html
• “Therac-25”, Wikipedia. http://en.wikipedia.org/wiki/Therac-25
• “PDP-11”, Wikipedia. http://en.wikipedia.org/wiki/PDP-11
• “PDP-11 architecture”, Wikipedia. http://en.wikipedia.org/wiki/PDP-11_architecture