Accession Number:
ADA279664
Title:
System Level Fault Tolerance in Parallel and Distributed Computing Systems
Descriptive Note:
Final rept. 1 Jul 1988-31 Dec 1993
Corporate Author:
TEXAS UNIV AT AUSTIN
Report Date:
1993-12-31
Pagination or Media Count:
15.0
Abstract:
The major thrust of our effort was focused on the theory and practice of responsive fault-tolerant, real-time computing in parallel and distributed processing environments. New efficient methods of system testing have been developed which shorten multiprocessor testing time by orders of magnitude and, therefore, can be used at system booting; previous techniques were prohibitively long. A new design framework for responsive computing was designed and is being implemented for validation. This framework is based on consensus, which can be used to provide synchronization, reliable communication, fault diagnosis, checkpointing and even scheduling in multiprocessor environments. We have formalized and quantified the space-time tradeoff for efficient fault recovery. The system model is a graph, and we were especially successful in analysis of meshes and hypercubes. We developed a new method called naturally redundant algorithms which allows efficient implementation of application-specific techniques
Distribution Statement:
APPROVED FOR PUBLIC RELEASE