Accession Number : ADA259426


Title : The Performance of Cache-Based Error Recovery in Multiprocessors


Corporate Author : ILLINOIS UNIV AT URBANA COORDINATED SCIENCE LAB


Personal Author(s) : Janssens, Bob ; Fuchs, W F


Full Text : https://apps.dtic.mil/dtic/tr/fulltext/u2/a259426.pdf


Report Date : 25 Aug 1992


Pagination or Media Count : 24


Abstract : Several variations of cache-based checkpointing for rollback error recovery in shared-memory multiprocessors have been recently developed. By modifying the cache replacement policy, these techniques use the inherent redundancy in the memory hierarchy to periodically checkpoint the computation state. Three schemes, different in the manner in which they avoid rollback propagation, are evaluated in this paper. By simulation with address traces from parallel applications running on an Encore Multimax shared-memory multiprocessor, we evaluate the performance effect of integrating the recovery schemes in the cache coherence protocol. Our results indicate that the cache-based schemes can provide checkpointing capability with low performance overhead but uncontrollable high variability in the checkpoint interval. Keywords: Fault-tolerant computing, Cache-based checkpointing and rollback recovery, Shared-memory multiprocessors, Trace-driven simulation.


Descriptors : *MEMORY DEVICES , *MULTIPROCESSORS , *FAULT TOLERANT COMPUTING , SIMULATION , PROPAGATION , RECOVERY , REPLACEMENT , INTERVALS , FAULTS , REDUNDANCY , ERRORS , COMPUTATIONS , POLICIES , COHERENCE


Subject Categories : Computer Hardware


Distribution Statement : APPROVED FOR PUBLIC RELEASE