Computer Mishaps Lead to Search for Failsafe Software

July 27, 1995

FATAL DEFECT: CHASING KILLER COMPUTER BUGS

Ivars Peterson

Times Books

260 pp., $25

IF the generals had believed their computers, World War III could have begun on the morning of June 3, 1980, the result of a poorly designed computer system.

On that morning at 1:26, writes Ivars Peterson in his new book ''Fatal Defect,'' ''displays at the Strategic Air Command (SAC) post at Offutt Air Force Base near Omaha, Nebraska, suddenly indicated that two submarine-launched ballistic missiles were heading toward the United States. Eighteen seconds later, the early warning system displayed that even more missiles were en route.

''Then, strangely, it announced that no missiles were coming. Shortly thereafter, new information appeared showing that Soviet intercontinental ballistic missiles had been launched. Meanwhile, displays at the National Military Command Center in the Pentagon started to register the launch of enemy missiles from submarines.''

Of course, there was no Soviet missile launch. The problem was a faulty integrated circuit on a communications link within the Strategic Air Command. Prior to June 3, Peterson explains, the military had checked the integrity of its communications link ''by sending fake attack messages, with a zero filled in for the number of missiles detected.'' But when the chip failed, it started filling in random numbers.

Who is to blame? The company that manufactured the chip? The Air Force, for failing to service its equipment? Neither, Peterson writes. The fault lies with the system's designers, who chose to test the communications link with genuine attack-message formats - differing from real alerts only in the zero missile count - yet failed to build error-detecting codes into the messages themselves. The phantom missile attack was an accident waiting to happen. It's also one of the many design flaws and resulting computer foul-ups described in this new book.
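How would such an error-detecting code have helped? A minimal sketch, in Python and with an invented message format (the real SAC protocol is not public), shows the idea: the receiver verifies a checksum before trusting the missile-count field, so a message scrambled by a failing chip is rejected instead of being displayed as an attack.

```python
import zlib

def encode_message(missile_count: int) -> bytes:
    # Pack a 4-byte count field and append a CRC-32 error-detecting code.
    # (Illustrative format only; not the actual SAC message layout.)
    payload = missile_count.to_bytes(4, "big")
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def decode_message(message: bytes) -> int:
    # Verify the checksum before trusting the count field.
    payload, crc = message[:-4], message[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != crc:
        raise ValueError("checksum mismatch: message corrupted in transit")
    return int.from_bytes(payload, "big")

# A test message reporting zero missiles, as in the pre-1980 checks.
msg = bytearray(encode_message(0))
# Simulate the failing chip overwriting the count with random bits.
msg[3] = 0x2A
try:
    decode_message(bytes(msg))
except ValueError as err:
    print(err)  # the corrupted message is rejected, not shown as an attack
```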

Others include the breakdown of the AT&T long-distance network on Jan. 15, 1990, and of the Therac-25, a medical radiation-therapy machine that accidentally overdosed six patients - three of them fatally.

These cases have become causes célèbres among computer cognoscenti in recent years, largely through the efforts of two people: Peter G. Neumann, a scientist at SRI International who has spent the last decade moderating an electronic mailing list called RISKS, devoted to reporting and commenting on cases of computer-related malaise; and Nancy G. Leveson, a professor at the University of Washington in Seattle, whose detailed analysis of the Therac-25 episode has served as a landmark study of what can go wrong when computers are used in safety-critical applications.

Peterson spends his first two chapters recounting Neumann's and Leveson's work. In his third chapter, he writes about the potential for disaster in computer-controlled nuclear plants, focusing on the career of David L. Parnas, a software engineer who is best known for his criticism of the US government's ''star wars'' Strategic Defense Initiative.

By following the careers of these researchers, Peterson shows that software safety is a pressing problem that more often than not is ignored. He also tells a good story. But then something goes wrong, and in the chapters that follow, Peterson fails to deliver on his promise of chasing down killer computer bugs.

Instead of digging into the question of why computer programs invariably contain bugs, and exploring ways to minimize the damage those bugs can do, Peterson looks at people who have tried - and mostly failed - to make computers more reliable.

It makes for disjointed reading. After convincingly demonstrating that there are no ''silver bullets'' in developing safe software, he burdens the reader with a slow and somewhat confusing chapter about Victor R. Basili, a computer science professor at the University of Maryland who has spent much of his career searching for just such a silver bullet for NASA's unmanned space-exploration program.

Peterson's book is an entertaining introduction to the problems of creating reliable computer systems. But for an indication of how widespread those problems are, one would do better to read Neumann's recently published book, ''Computer-Related Risks'' (Addison-Wesley).

Besides recounting literally hundreds of computer-related disasters, ''Computer-Related Risks'' also offers provocative essays on threats to privacy and on the threat that computer-controlled voting machines pose to democracy.

Unfortunately, neither Peterson nor Neumann does much to suggest better ways of developing software - techniques that would allow computers to be used with confidence to fly aircraft, drive cars, and control power plants. To find those alternatives, check out Leveson's new book, ''Safeware'' (Addison-Wesley).

Read any of these books, and you will make an unsettling discovery. What gets blamed on human error - a plane crash, a train wreck, or simply a lost file on your desktop PC - is in fact frequently designed-in failure.