Framework for Automated Partitioning of Scientific Workflows on the Cloud

Jaagup Viil
Scientific workflows have become a standardized way for scientists to represent a set of tasks to overcome or solve a certain problem. Usually these workflows consist of numerous amount of jobs that are both CPU heavy and I/O intensive that are executed using some kind of workflow management system either on clouds, grids, supercomputers, etc. Previously, it has been shown that using k-way partitioning algorithm to distribute a workflow's tasks between multiple machines in the cloud reduces the overall data communication and therefore lowers the cost of the bandwidth usage. In this thesis, a framework was built in order to automate this process - partition any workflow submitted by a scientist that is meant to be run on Pegasus workflow management system in the cloud with ease. The framework provisions the instances in the cloud using CloudML, configures and installs all the software needed for the execution, runs and partitions the scientific workflow and finally shows the time estimation of the workflow, so that the user would have an approximate guidelines on, how many resources one should provision in order to finish an experiment under a certain time-frame.
Graduation Thesis language
Graduation Thesis type
Master - Software Engineering
Satish Narayana Srirama
Defence year