A Method and Tool for Analyzing Fault-Tolerance in Systems

reportActive / Technical Report | Accession Number: ADA587646 | Open PDF

Abstract:

As computers are integrated into systems that have stringent fault-tolerance requirements, there is a growing need for techniques to establish that these systems actually satisfy those requirements. Informal arguments do not supply the desired level of assurance for critical systems. This dissertation presents a rigorous, automated approach to analyzing distributed systems, with a focus on checking fault-tolerance requirements, and describes a prototype implementation of the analysis. The analysis is a novel hybrid of ideas from stream-processing semantics of networks of processes, abstract interpretation of programs, and symbolic computation. The underlying principles of the analysis method are general, but specialized techniquessuch as the use of perturbations to represent changes to normal behavior caused by failuresare developed to deal efficiently with the types of systems and requirements that arise in establishing fault-tolerance. The method is illustrated with three examples the Oral Messages algorithm for Byzantine Agreement, due to Lamport, Shostak and Pease, a standard protocol for FIFO reliable broadcast, and a subtly awed protocol for fault-tolerant moving agents.

Security Markings

DOCUMENT & CONTEXTUAL SUMMARY

Distribution:
Approved For Public Release
Distribution Statement:
Approved For Public Release; Distribution Is Unlimited.

RECORD

Collection: TR
Identifying Numbers
Subject Terms