Fault-Tolerance in Distributed and Multiprocessor Real-Time Systems
Final rept. 1 Sep 1992-31 Aug 1993
TEXAS ENGINEERING EXPERIMENT STATION COLLEGE STATION
New schemes for fault tolerance in multiprocessor and distributed systems have been developed in the areas described below. We have investigated a number of fault-tolerance schemes to evaluate performance, reliability, and availability trade-offs. Fault-tolerance schemes are being developed for various fault models (the fail-stop, fail-slow, and arbitrary failure models) and for two classes of applications: those that provide results only at the end of the computation, and long-running applications that must also provide results during the computation. In the area of software-implemented fault tolerance, we are studying user-transparent mechanisms, with the goal of designing and implementing a software library that users can link with existing application software to achieve the desired level of fault tolerance. We are developing a new tool, the Reliable Architecture Characterization Tool (REACT), for evaluating the reliability and availability of distributed multiprocessor systems that use various fault-tolerance techniques. This tool will facilitate evaluation of the fault-tolerance schemes that we develop.
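The reliability and availability trade-offs mentioned above can be illustrated with standard textbook combinatorial models. The Python sketch below is not REACT and does not reproduce its method. It assumes independent modules with constant (exponential) failure rates, a perfect majority voter, and hypothetical function names. It compares a single module against triple modular redundancy (TMR) and computes steady-state availability from mean time to failure (MTTF) and mean time to repair (MTTR).

```python
import math


def simplex_reliability(failure_rate: float, t: float) -> float:
    """R(t) = exp(-lambda * t) for one module with a constant failure rate (assumption)."""
    return math.exp(-failure_rate * t)


def k_of_n_reliability(k: int, n: int, r: float) -> float:
    """Probability that at least k of n independent modules, each with reliability r, work."""
    return sum(math.comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


def tmr_reliability(failure_rate: float, t: float) -> float:
    """TMR survives while at least 2 of 3 modules work; the voter is assumed perfect."""
    r = simplex_reliability(failure_rate, t)
    return k_of_n_reliability(2, 3, r)


def steady_state_availability(mttf: float, mttr: float) -> float:
    """Long-run fraction of time a repairable system is up: MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


if __name__ == "__main__":
    lam = 1e-3  # hypothetical failure rate per hour
    for t in (100.0, 1000.0):
        print(t, simplex_reliability(lam, t), tmr_reliability(lam, t))
    print(steady_state_availability(mttf=1000.0, mttr=10.0))
```

Under these assumptions TMR is more reliable than a single module only while each module's reliability exceeds 0.5 (that is, while λt < ln 2). Beyond that point the redundant configuration is less reliable than one module, which is one reason such trade-offs need to be evaluated rather than assumed.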
- Computer Programming and Software
- Computer Systems Management and Standards