Checkpointing and Error Recovery in Distributed Systems

McDermid, J. A.

Active / Technical Report | Accession Number: ADA093463

Abstract:

This paper discusses some of the problems of producing fault-tolerant distributed computer systems, in particular those of software error recovery. It shows how checkpoints may be used in error recovery, defines the information that checkpoints must contain, and discusses alternative strategies for checkpointing. It describes models of error recovery and extends an existing recovery protocol to cater for certain types of checkpoint inconsistency. The paper defines protocols for systematically generating checkpoints so that they can be used by the recovery protocols. It also defines a protocol for discarding checkpoints once they are no longer of use, which prevents the set of checkpoints from growing indefinitely. The paper concludes by considering some of the problems of implementing the protocols. (Author)

Author(s):

McDermid, J. A.

Author Organization(s):

ROYAL SIGNALS AND RADAR ESTABLISHMENT MALVERN (ENGLAND)

Pagination:

27

Security Markings

DOCUMENT & CONTEXTUAL SUMMARY

Distribution:

Approved For Public Release

RECORD

Collection: TR

Identifying Numbers

Report Number(s):

RSRE-MEMO-3271, DRIC-BR-76154

Monitor Series:

BR-76154

Subject Terms

Joint Capability Areas:

JCA_8.1.3_Influence Adversary and Competitor Audiences; JCA_8.1_Communicate; JCA_8_Building Partnerships; JCA_6_Net Centric; JCA_6.1.3_Switching and Routing; JCA_6.1_Information Transport; JCA_5_Command and Control; JCA_5.3_Planning; JCA_4.3_Maintain; JCA_1_Force Support; JCA_6.2.1_Information Sharing; JCA_6.2.2_Computing Services; JCA_6.4.1_Secure Information Exchange

Communities of Interest:

Materials and Manufacturing Processes

Descriptor(s):

*COMPUTER ARCHITECTURE, *ERROR ANALYSIS, *FAULT TOLERANT COMPUTING, RECOVERY, COMPUTER COMMUNICATIONS, DATA TRANSMISSION SYSTEMS, NODES, HIGH LEVEL LANGUAGES, CHECKOUT PROCEDURES, NETWORK FLOWS, UNITED KINGDOM

Field(s)/Group(s):

Computer Programming and Software, Computer Hardware, Computer Systems

Keyword(s):

*Distributed data processing, Flex computer system

Report Date:

1980 Sep 01