Causal Distributed Breakpoints
RICE UNIV HOUSTON TX DEPT OF COMPUTER SCIENCE
A causal distributed breakpoint is initiated by a sequential breakpoint in one process of a distributed computation, and restores each process in the computation to its earliest state that reflects all events that happened before the breakpoint. A causal distributed breakpoint is the natural extension for distributed programs of the conventional notion of a breakpoint in a sequential program. We present an algorithm for finding the causal distributed breakpoint given a sequential breakpoint in one of the processes. Approximately consistent checkpoint sets are used for efficiently restoring each process to its state in a causal distributed breakpoint. Causal distributed breakpoints assume deterministic processes that communicate solely by messages. The dependencies that arise from communication between processes are logged. Dependency logging and approximately consistent checkpoint sets have been implemented on a network of SUN workstations running the V-System. Dependency logging adds between 1 and 14 percent overhead to the message passing primitive. For a 200 x 200 Gaussian elimination, execution time overhead is less than 4 percent, and the run generates a dependency log of 288 kilobytes.
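The idea of restoring each process to its earliest state that reflects everything that happened before the breakpoint can be illustrated with a minimal sketch. This is not the report's algorithm; it is a simple backward traversal of a dependency log under assumed conventions: events in each process are numbered 0, 1, 2, ..., each log entry is a hypothetical tuple (sender, send_event, receiver, recv_event), and the function name `causal_breakpoint` is invented for illustration.

```python
from collections import defaultdict


def causal_breakpoint(num_procs, messages, bp_proc, bp_event):
    """Return, for each process, the index of its last event that happened
    before (or is) the sequential breakpoint; -1 means the initial state.

    messages: iterable of (sender, send_event, receiver, recv_event),
    an assumed format for the dependency log.
    """
    # Group logged messages by receiving process.
    incoming = defaultdict(list)
    for sender, send_event, receiver, recv_event in messages:
        incoming[receiver].append((recv_event, sender, send_event))

    # frontier[q] is the latest event of q known to precede the breakpoint.
    frontier = [-1] * num_procs
    frontier[bp_proc] = bp_event

    # Follow dependencies backward until no frontier entry changes.
    work = [bp_proc]
    while work:
        receiver = work.pop()
        for recv_event, sender, send_event in incoming[receiver]:
            # A message received at or before the frontier makes its
            # send event (and everything before it) a causal predecessor.
            if recv_event <= frontier[receiver] and send_event > frontier[sender]:
                frontier[sender] = send_event
                work.append(sender)
    return frontier
```

Under these assumptions, each process would be restored to the state immediately after the event index returned for it. Messages received after a process's frontier are ignored, which is what keeps the result the earliest such state rather than a later one.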
- Computer Programming and Software