Institute of Computer Science - Graduation Theses Registry

Fault Tolerant Distributed Computing Framework for Scientific Algorithms

Name

Ilja Kromonov

Abstract

The physical limitations of computing hardware have halted the growth of a single processor core's computing power. Moore's law nevertheless still holds, sustained by the ever-increasing parallelism of computing architectures. At the same time, the demand for computational power has grown relentlessly, forcing people to adapt the algorithms they use to these parallel architectures. One of the downsides of parallel architectures is that, as the number of components rises, so does the chance that one of them fails. For embarrassingly parallel, data-intensive algorithms, Map-Reduce has gone a long way toward ensuring that users can easily utilize large amounts of distributed computing resources without fear of losing work. However, this does not apply to the iterative, communication-intensive algorithms common in the scientific computing domain. This work proposes a new programming model inspired by BSP (Bulk Synchronous Parallel), which adopts an approach similar to continuation passing for implementing parallel algorithms and facilitates the fault tolerance inherent in the BSP program structure. The distributed computing framework NEWT, which is based on the proposed model, is described and used to validate the approach. The framework retains most of the advantages that Map-Reduce provides, yet efficiently supports a larger assortment of algorithms, such as the aforementioned iterative ones.
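As a rough illustration of the idea in the abstract, the sketch below simulates a BSP computation in a single Python process. It is not NEWT's API: every name here (`run_bsp`, `send_value`, `sum_values`, `fail_at`) is hypothetical, and the all-to-all sum is only a toy workload. Each superstep is a function that returns the next function to run after the barrier, in the spirit of continuation passing, and the state saved at each barrier is assumed to be enough to restart a failed superstep.

```python
import copy


def run_bsp(initial_states, first_step, fail_at=None):
    """Run a toy BSP computation in one process (illustrative only).

    A step is a function (pid, state, inbox) -> (state, outbox, next_step):
    outbox maps destination pid -> message, and next_step is the continuation
    to run after the barrier (None to stop). All processes are assumed to
    return the same continuation. fail_at=(superstep, pid) injects one
    simulated failure.
    """
    states = copy.deepcopy(initial_states)
    inboxes = [[] for _ in states]
    step = first_step
    superstep = 0
    failed_once = False

    while step is not None:
        # Checkpoint at the barrier: process states, pending messages,
        # and the continuation that is about to run.
        checkpoint = (copy.deepcopy(states), copy.deepcopy(inboxes), step)
        try:
            new_states, outboxes, next_steps = [], [], []
            for pid, (state, inbox) in enumerate(zip(states, inboxes)):
                if fail_at == (superstep, pid) and not failed_once:
                    failed_once = True
                    raise RuntimeError("simulated worker failure")
                new_state, outbox, next_step = step(pid, state, inbox)
                new_states.append(new_state)
                outboxes.append(outbox)
                next_steps.append(next_step)
        except RuntimeError:
            # Recovery: roll back to the last barrier and redo the superstep.
            states, inboxes, step = checkpoint
            continue

        # Barrier: deliver all messages, then move on to the continuation.
        inboxes = [[] for _ in new_states]
        for outbox in outboxes:
            for dest, message in outbox.items():
                inboxes[dest].append(message)
        states = new_states
        step = next_steps[0]
        superstep += 1

    return states


def send_value(pid, state, inbox):
    # Superstep 0: send the local value to every other process.
    outbox = {dest: state["value"] for dest in range(state["n"]) if dest != pid}
    return state, outbox, sum_values


def sum_values(pid, state, inbox):
    # Superstep 1: combine the local value with the received ones, then stop.
    total = state["value"] + sum(inbox)
    return {**state, "total": total}, {}, None


states = [{"n": 3, "value": v} for v in (1, 2, 3)]
result = run_bsp(states, send_value, fail_at=(1, 2))
print([s["total"] for s in result])  # [6, 6, 6]
```

Because the continuation is saved together with the data at each barrier, the injected failure in superstep 1 only costs a rerun of that superstep rather than a restart of the whole computation. A real framework would distribute the processes and store checkpoints durably; this sketch does neither.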

Graduation Thesis language

English

Graduation Thesis type

Master - Computer Science

Supervisor(s)

Pelle Jakovits, Satish Narayana Srirama

Defence year

2014
