Fault-Tolerant Wait-Free Shared Objects
CORNELL UNIV ITHACA NY DEPT OF COMPUTER SCIENCE
Pagination or Media Count:
A concurrent system consists of processes communicating via shared objects, such as shared variables, queues, etc. The concept of wait-freedom was introduced to cope with process failures each process that accesses a wait-free object is guaranteed to get a response even if all the other processes crash. But what if these wait-free objects themselves fail For example, if a wait-free object crashes , all the processes that access that object are prevented from making progress. In this paper, we introduce the concept of fault-tolerant wait-free objects, and study the problem of implementing them. We give a universal method to construct fault-tolerant wait-free objects, for all types of responsive failures including one in which faulty objects may lie . In sharp contrast, we prove that many common and interesting object types such as queues, sets, and test and set have no fault-tolerant wait-free implementations even under the most benign of the non-responsive types of failure. We also introduce several concepts and techniques that are central to the design of fault-tolerant concurrent systems the concepts of self-implementation and graceful degradation, and techniques to automatically increase the fault-tolerance of implementations. We prove matching lower bounds on the resource complexity of most of our algorithms.
- Computer Hardware