CORNELL UNIV ITHACA NY DEPT OF COMPUTER SCIENCE
This chapter focuses on the use of data replication and replicated execution to obtain faster response times or fault tolerance in distributed programs. These techniques can be critical in determining whether a network-based solution to an application problem is feasible. For example, modular expansion and price-performance considerations argue for the use of distributed systems in factory automation settings. However, many factories contain devices controlled by dedicated processors that require real-time response, so any delay imposed on the controllers by the network must be bounded. This is hard to ensure because of possible packet loss and unpredictable load on remote servers. Consequently, such systems are forced to replicate or cache the data needed by the controllers. This chapter explores a number of approaches to replication and distributed consistency issues. The treatment is applicable to a conventional local area network or a loosely coupled multiprocessor. The programs and computers in such systems are assumed to fail benignly, by crashing without sending out incorrect messages. Processors do not have synchronized clocks, so the failure of an entire site can be detected only unreliably, using timeouts. Message communication is assumed to be reliable but bursty, because packets can be lost and may have to be retransmitted.
- Computer Programming and Software