Finding the Maximum Recoverable System State in Optimistic Rollback Recovery Methods
RICE UNIV HOUSTON TX DEPT OF COMPUTER SCIENCE
Pagination or Media Count:
In a distributed system using rollback recovery, information saved on stable storage during failure-free execution allows certain states of each process to be recovered after a failure. For example, in a deterministic system using message logging and checkpointing, a process state can be recovered only if all messages received by the process since its previous checkpoint have been logged. In a nondeterministic system using checkpointing alone, a process state can be recovered only if it has been recorded in a checkpoint. Optimistic rollback recovery methods in general record this information asynchronously, assuming that a suitable recoverable system state can be constructed for use during recovery. A system is called recoverable if and only if it is consistent and the state of each individual process in that system state can be recovered. This paper shows that in any system using optimistic rollback recovery, there is always a unique maximum recoverable system state, extending our previous result for systems using message logging and checkpointing. We also present a simple new algorithm for finding the maximum recoverable system state, and describe some experience with its implementation. These results can be applied to deterministic and to nondeterministic systems.
- Computer Systems