Structure-Based Design and Analysis for Concurrent Error Detection and Recovery in Reliable Electronic Systems
Abstract:
The complex digital electronic systems needed to accomplish the SDIO mission will be composed of extremely large numbers of fine-geometry active devices, densely integrated with potentially submicron interconnect. A major problem with integrating such complex microelectronic circuits on VLSI chips and at wafer scale is that they become highly susceptible to physical failures and environmental disturbances, especially intermittent and transient failures. The problem is aggravated when these systems are deployed in harsh environments, such as those projected for the SDIO mission, where transient failures can cause erroneous computation. Failures can produce disastrously incorrect computational results or a complete system shutdown. The purpose of this research has been to develop techniques for the design of highly reliable digital electronic systems based on the concurrent detection of errors and rapid recovery from failures and environmentally induced upsets. Concurrent error detection continuously monitors the correctness of computational results and therefore allows the detection of errors due to transient environmental upsets as well as permanent failures. This research obtained basic scientific results concerning the derivation of design and analysis principles for concurrent error detection and recovery in electronic systems.