An Aspect-Oriented Approach to Assessing Fault Tolerance
RAYTHEON BBN TECHNOLOGIES CAMBRIDGE MA
Fault tolerance and survivability are important aspects of many business-critical and mission-critical systems, but it is still difficult to assess how well fault tolerance techniques work. Ensuring fault tolerance in military communication systems is particularly important for two reasons. Hardware failure, data corruption, and service interruption are inevitable, and cascading failures could jeopardize critical military operations. In this paper, we present a fault tolerance assessment framework for distributed systems. It automatically injects faults without changes to client or server code, and it automatically assesses whether the injected faults are tolerated. The framework applies aspect-oriented programming, specifically AspectJ, to inject faults and weave in assessment criteria. Like traditional fault injectors, the framework supports assessing tolerance of direct faults, such as crashes and corruption. It also supports conditional faults, which can be injected probabilistically, randomly, or periodically at runtime. Fault injectors have not historically supported this latter class of faults. Supporting it enables assessment of tolerance to many important classes of faults that threaten modern distributed military communication systems and are traditionally difficult to tolerate and assess: timing faults, resource exhaustion (e.g., denial of service), and integrity faults. Additionally, the framework provides users with a centralized view for monitoring and scripting coordinated tests. These tests combine performance metrics and injected faults and can span services, applications, and hosts.
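To make the idea of conditional fault injection concrete, the sketch below wraps a service so that calls fail either periodically or probabilistically, while the service implementation itself stays unchanged. It then checks whether a client-side tolerance mechanism masks those faults. This is only an illustration under stated assumptions. It does not use AspectJ. A plain JDK dynamic proxy (`java.lang.reflect.Proxy`) stands in for AspectJ "around" advice. All names are hypothetical and are not taken from the framework described above: `EchoService`, `InjectedFault`, `periodic`, `probabilistic`, `inject`, and `echoWithRetry`.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

// Illustrative sketch only: a JDK dynamic proxy stands in for AspectJ "around"
// advice. All names here (EchoService, InjectedFault, periodic, probabilistic,
// inject, echoWithRetry) are hypothetical, not the framework's actual API.
public class ConditionalFaultInjection {

    /** Hypothetical service interface; its implementation is never modified. */
    public interface EchoService {
        String echo(String msg);
    }

    /** Signals a simulated fault injected by the proxy. */
    public static class InjectedFault extends RuntimeException {
        public InjectedFault(String method) {
            super("injected fault in " + method);
        }
    }

    /** Periodic condition: fires on every n-th intercepted call. */
    public static BooleanSupplier periodic(long n) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        AtomicLong calls = new AtomicLong();
        return () -> calls.incrementAndGet() % n == 0;
    }

    /** Probabilistic condition: fires with probability p (seeded for repeatable runs). */
    public static BooleanSupplier probabilistic(double p, long seed) {
        Random rng = new Random(seed);
        return () -> rng.nextDouble() < p;
    }

    /** Wraps target so each call through iface fails whenever the condition fires. */
    public static <T> T inject(Class<T> iface, T target, BooleanSupplier condition) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Only intercept the service's own methods, not equals/hashCode/toString.
            if (method.getDeclaringClass() == iface && condition.getAsBoolean()) {
                throw new InjectedFault(method.getName());
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception unchanged
            }
        };
        Object wrapped = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(wrapped);
    }

    /** Hypothetical tolerance mechanism under assessment: bounded retry. */
    public static String echoWithRetry(EchoService svc, String msg, int maxAttempts) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        InjectedFault last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return svc.echo(msg);
            } catch (InjectedFault f) {
                last = f; // fault observed; try again
            }
        }
        throw last; // not tolerated: every attempt hit an injected fault
    }

    public static void main(String[] args) {
        EchoService real = msg -> msg;
        // Every 3rd call fails; two consecutive attempts never both hit a multiple of 3.
        EchoService faulty = inject(EchoService.class, real, periodic(3));
        int tolerated = 0;
        for (int i = 0; i < 10; i++) {
            if ("ping".equals(echoWithRetry(faulty, "ping", 2))) tolerated++;
        }
        System.out.println("tolerated " + tolerated + "/10 calls");
    }
}
```

Running `main` prints `tolerated 10/10 calls`: the periodic fault fires on every third call, and a two-attempt retry always reaches a non-faulting call. The proxy keeps the fault logic out of the service implementation, in the spirit of "no changes to client or server code." Unlike AspectJ weaving, however, it only intercepts calls that go through the proxy reference.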
- Computer Programming and Software