Fault-Tolerant Wait-Free Shared Objects
CORNELL UNIV ITHACA NY DEPT OF COMPUTER SCIENCE
Pagination or Media Count:
A concurrent system consists of processes and shared objects. Previous research focused on the problem of tolerating process failures. We study the complementary problem of tolerating object failures. We divide object failures into two broad classes responsive and non-responsive. With responsive failures, a faulty object responds to every invocation, but responses may be incorrect. With non-responsive failures, a faulty object may also hang without responding. For each class, we consider crash, omission, and arbitrary types of failures. For each type of failure, we are seeking a universal implementation for fault-tolerant wait-free shared objects. We present deterministic implementations for all types of responsive failures, including arbitrary failures. In contrast, we show that even the most benign type of non-responsive failures requires the use of randomization. Of special interest is the problem of implementing fault-tolerant objects using only objects of the same type. We present such fault-tolerant self-implementations for many common object types.
- Computer Hardware