Phase 2 of an Architectural Study for a Self-Repairing Computer
Abstract:
The architecture and organization of a self-repairing computer that operates correctly even if some of its components malfunction are described. Using currently available components, this computer has a 0.997 probability of correct operation on a 10,000-hour mission with a 25 percent duty cycle. First, the problem is stated, an overview of previous work is given, and the results of the investigation are summarized. Several current computer component technologies are then appraised, and the assumption of a Poisson failure distribution for these components in reliability models is justified. Next, the development of several algorithms for generating and evaluating diagnostic test patterns for computer circuits is described; these algorithms are illustrated by examples and by detailed APL programs. Mathematical reliability evaluation models are then devised for a variety of functional computer units, their APL implementations are given, and their use in the design process is illustrated. The effects of failure coverage, duty cycle, number of spares, failure tolerance, lower component failure rate with power off, and other parameters are tabulated and delineated. Finally, the architecture and operation of an automatically repaired computer are described at different levels of organization, including failure-tolerant storage, ROS, and arithmetic logical organs. The design of reconfiguration switches, status registers, and a fully checked decoder is also given. These techniques require about 2.5 times as much hardware as an equivalent simplex computer, and they yield the same benefit as a 600-fold decrease in component failure rates.
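As a rough illustration of how the parameters named above (duty cycle, power-off failure rate, number of spares, failure coverage) can enter a reliability model, the following Python sketch uses the textbook standby-sparing formula under a constant failure rate. It is not the report's APL model. The function names, the time-weighted averaging of power-on and power-off rates, the assumption that dormant spares do not fail, and all numeric values are assumptions made only for this example.

```python
import math


def effective_rate(rate_on, rate_off, duty_cycle):
    """Time-averaged failure rate (per hour) for a unit powered a fraction
    `duty_cycle` of the time. Assumption: the rates simply average by time."""
    return duty_cycle * rate_on + (1.0 - duty_cycle) * rate_off


def simplex_reliability(rate, hours):
    """Probability that a single unit with constant failure rate survives."""
    return math.exp(-rate * hours)


def standby_reliability(rate, hours, spares, coverage):
    """Reliability of one active unit backed by `spares` standby units.

    Assumptions (hypothetical, for illustration): failures of the active unit
    form a Poisson process, dormant spares never fail, and each failure is
    detected and repaired by switching with probability `coverage`.
    The system survives k failures (k <= spares) only if all k were covered.
    """
    x = rate * hours
    total = sum((coverage * x) ** k / math.factorial(k)
                for k in range(spares + 1))
    return math.exp(-x) * total


def equivalent_rate_improvement(rate, hours, reliability):
    """Factor by which a simplex unit's failure rate would have to drop
    to reach the same mission reliability."""
    return rate * hours / -math.log(reliability)


if __name__ == "__main__":
    # All values below are hypothetical and are not taken from the report.
    mission_hours = 10_000
    lam = effective_rate(rate_on=1e-4, rate_off=1e-5, duty_cycle=0.25)
    r_simplex = simplex_reliability(lam, mission_hours)
    r_spared = standby_reliability(lam, mission_hours, spares=2, coverage=0.99)
    print("simplex reliability:", r_simplex)
    print("spared reliability: ", r_spared)
    print("equivalent rate improvement:",
          equivalent_rate_improvement(lam, mission_hours, r_spared))
```

With zero spares the standby formula reduces to the simplex case, and with imperfect coverage the gain from additional spares saturates, which is one reason coverage appears alongside the number of spares in the list of parameters studied.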