A Theory for Algorithm- Based Fault Tolerance in Array Processor Systems.
ILLINOIS UNIV AT URBANA COMPUTER SYSTEMS GROUP
Pagination or Media Count:
The concept of algorithm-based fault tolerance deals with system-level methods for obtaining reliable results from computations, especially when performed on array processor systems. In this scheme, algorithms have their outputs encoded in a system-level error-detecting or -correcting code. This report deals with a theoretical study of the scheme of algorithm-based fault tolerance and addresses four issues. First, it deals with some design issues of specific fault-tolerant and fault-secure schemes. Algorithms are classified into broad classes called paradigms which are determined exclusively by the communication patterns of the processors. The second part deals with the development of a model which can be used to analyze the fault-detecting and -locating capabilities of such algorithms. The model uses a broad interpretation of errors, faults and checks, which are represented as a tripartite graph. In the third part, some graph-theoretic bounds are presented on various useful characteristics in algorithm-based fault tolerance. The model is used to determine bounds on the number of data elements that a processor may affect while allowing t-fault detection or t-fault location. The last part deals with a probabilistic study of the scheme. Expressions for the reliability of the results of the computation, and the time for completion of the computation using a particular algorithm, are derived in terms of various parameters.
- Theoretical Mathematics
- Computer Hardware