Analysis of Static Recovery Schemes for Distributed Computing : COMPARATIVE ANALYSIS OF STATIC RECOVERY SCHEMES FOR DISTRIBUTED COMPUTING

Book by Syed Muhammad Husnain Kazmi
Availability is a key feature of fault-tolerant distributed systems such as clusters, and it can be achieved with failover techniques. For cluster availability, simple strategies such as cold backup and warm backup rely on spare resources. The high cost of keeping many standby computers in a large cluster to reach a predefined level of availability can be a drawback. An alternative for failure detection and recovery is hot backup, although it is hard to decide which computer should take over the tasks of a failed computer while maintaining load balance. Dynamic monitoring facilities and central scheduling are the usual solutions to this. In practice, that approach has proved problematic for fault tolerance and scalability, because the mandatory scheduler acts as both a single point of failure and a coordination bottleneck.