Overview

This dissertation, Adaptive Recovery With Hierarchical Checkpointing on Workstation Clusters by 周志賢 (Chi-yin Edward Chow), was obtained from The University of Hong Kong (Pokfulam, Hong Kong) and is being sold under the Creative Commons Attribution 3.0 Hong Kong License. The content of this dissertation has not been altered in any way. We have altered the formatting to make the dissertation easier to print and read. All rights not granted by the above license are retained by the author.

Abstract of thesis entitled Adaptive Recovery with Hierarchical Checkpointing on Workstation Clusters, submitted by Edward, Chi-yin Chow for the degree of Master of Philosophy at the University of Hong Kong in August 1999.

Abstract: Fault tolerance is indispensable for pushing the scalability limits of reliable network-based computing systems. This thesis proposes an adaptive checkpointing and recovery scheme for both disk-based and diskless checkpointing, with reduced recovery latency and performance overhead. The purpose is to build fault tolerance and recovery capability into clusters with a minimal architectural upgrade. Improving on traditional disk-based checkpointing, which stores checkpoints on local disk, this thesis proposes a hierarchical checkpointing scheme for adaptive rollback recovery. Checkpoints at three architectural levels are suggested. The costs of checkpointing at the various levels are characterized and analyzed quantitatively. Guidelines for optimizing the checkpoint hierarchy are given. Potential performance gains of the fault-tolerant cluster design are presented, and drawbacks are also discussed.

Improving on Li and Plank's diskless checkpointing and Vaidya's 2-level approach, this thesis develops an interleaved checkpointing scheme for adaptive recovery. The idea is based on interleaved mirroring in network memory and in stable storage.
This interleaved scheme provides wider fault coverage than traditional schemes and is designed to tolerate single, double, and some multiple faults adaptively. Through theoretical analysis, we prove that much reduced latency is expected in this adaptive recovery from the most frequently encountered failure types. Using a Markov cost model, the checkpoint overheads and recovery latency of the various schemes are quantified. Our results reveal how their relative performance depends on the cluster size, interleaving degree, job length, and failure rate. Possible implementations of these adaptive schemes are discussed with a tradeoff study. These schemes appeal especially to the construction of large-scale Unix workstation clusters or Beowulf PC/Linux clusters.

DOI: 10.5353/th_b2981291
Subjects: Fault-tolerant computing, Client/server computing, Computer networks

Full Product Details

Author: 周志賢, Chi-Yin Edward Chow
Publisher: Open Dissertation Press
Imprint: Open Dissertation Press
Dimensions: Width: 21.60cm, Height: 1.00cm, Length: 27.90cm
Weight: 0.608kg
ISBN: 9781374719316
ISBN 10: 1374719315
Publication Date: 27 January 2017
Audience: General/trade, General
Format: Hardback
Publisher's Status: Active
Availability: Temporarily unavailable. The supplier advises that this item is temporarily unavailable. It will be ordered for you and placed on backorder. Once it comes back in stock, we will ship it out to you.
Countries Available: All regions