Effectiveness Evaluation of Fault-Tolerant Multiprocessor Systems.
Final rept. 1 Oct 83-30 Aug 87
DUKE UNIV DURHAM NC DEPT OF COMPUTER SCIENCE
An important area of research is the analysis of the coverage of a fault-tolerant system, that is, the probability that the system can recover from a fault. The author has studied a variety of models, from simple phase-type models to very complex stochastic Petri net models, and has investigated solution techniques for each model type. His methodology allows consideration of external events that can interfere with recovery, such as a hard limit on recovery time or the occurrence of a second, near-coincident fault. It was discovered that a policy of attempting transient recovery upon detection of an error, as opposed to automatically reconfiguring the affected component out of the system, may actually increase the unreliability of the system. This result holds if error detectability is not nearly perfect, so that the risk of producing an undetectable error if the transient error is present is greater than the benefit gained by not discarding the component. Keywords: Bibliographies.
- Computer Programming and Software